File: ReleaseNotes.txt

=== v1.26.0 === (6 February 2025)

ISPC release featuring improved ARM support, new "generic" targets that simplify
ISPC's internal design and streamline the addition of new targets, improved code
generation across x86 and ARM, and multiple stability fixes. This release is
based on a patched LLVM 18.1.8.

ARM Support Changes:

 - The `--arch=arm` flag, which previously mapped to ARMv7 (32-bit), now maps to
   ARMv8 (32-bit). There are no changes to `--arch=aarch64`, which continues to
   map to ARMv8 (64-bit).
 - The CPU definitions for the ARMv7 architecture have been removed: `cortex-a9`
   and `cortex-a15`.
 - New CPU definitions have been introduced, including `cortex-a55`,
   `cortex-a78`, `cortex-a510`, and `cortex-a520`, along with support for new
   Apple devices.
 - New double-pumped targets have been introduced: `neon-i16x16` and
   `neon-i8x32`.
 - Dot product operations are now supported using native ARM instructions
   (`sdot`/`udot`).
 - Performance on ARMv8 has been improved by an average of 13%.

Generic Targets:

In this release, generic targets were introduced in ISPC. Their main goal is to
simplify ISPC target management and serve as the foundation for
hardware-specific targets, requiring only selective tuning when performance
expectations are not met.

ARM targets have been refactored to use generic targets as a baseline, resulting
in cleaner code and improved performance. This change also makes it easier to
add support for new architectures, such as RISC-V or any other LLVM-supported
target.

Generic targets can also be used as standalone targets in cases where no native
target exists with the required width for a particular CPU (e.g., a 32-wide
target for SSE4). This can be done by specifying the following options in ISPC:

`--target=generic-i1x32 --cpu=penryn`

A complete list of all generic targets and the architectures they support can
be found in the output of:

`ispc --support-matrix`

Code Generation:

 - The `-O1` optimization pipeline has been further optimized for size: loop
   unrolling and function inlining have been adjusted accordingly.
 - Improved generated code for the `count_leading_zeros` and
   `count_trailing_zeros` functions by producing native instructions (e.g.,
   `vplzcntq`).
 - Improved generated code for masked load/stores for int8/int16 types on AVX512
   by generating native instructions (`vmovdqu8`, `vmovdqu16`).
 - Improved code generation when returning structs from functions by eliminating
   unnecessary `mov` instructions.
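
As a point of reference for the `count_leading_zeros` and
`count_trailing_zeros` item above, the following C sketch shows the scalar
semantics such operations are commonly defined with. The helper names `clz32`
and `ctz32` are hypothetical, and returning 32 for a zero input is an
assumption that mirrors hardware instructions such as `vplzcntd`; the ISPC
documentation remains the authority on the exact standard library contract.

```c
#include <stdint.h>

/* Hypothetical reference helper: number of zero bits above the highest set
 * bit of a 32-bit value. Assumption: a zero input yields 32. */
unsigned clz32(uint32_t x) {
    unsigned n = 0;
    if (x == 0) {
        return 32;
    }
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        ++n;
    }
    return n;
}

/* Hypothetical reference helper: number of zero bits below the lowest set
 * bit of a 32-bit value. Assumption: a zero input yields 32. */
unsigned ctz32(uint32_t x) {
    unsigned n = 0;
    if (x == 0) {
        return 32;
    }
    while ((x & 1u) == 0) {
        x >>= 1;
        ++n;
    }
    return n;
}
```

A native instruction computes the same result for every lane at once, which is
why mapping these functions to hardware support avoids a loop like the one
above.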

Language Changes:

 - Enhanced support for LLVM intrinsics when the `--enable-llvm-intrinsics` flag
   is used, including support for intrinsics with no arguments (#3112) and
   overloaded intrinsics.
 - Added user-visible macro definitions for the LLVM version that ISPC is based
   on.
 - The `__attribute__((deprecated))` attribute can now be applied to functions,
   generating a warning when the function is called.

Deprecated Targets:

 - The KNL (`avx512knl-x16`) target has been removed.

Compiler Switches Behavior:

 - The `--darwin-version-min` option has been added to specify the minimum
   deployment target version for macOS and iOS applications. This addresses a
   new linker behavior introduced in Xcode 15.0, which issues a warning when no
   version is provided.
 - The `--nocpp` command-line flag is now deprecated and will be removed in a
   future release.

Dispatch Behavior:

 - The behavior of user programs when no supported ISA is detected in the
   auto-dispatch code has changed. Instead of raising the `SIGABRT` signal, the
   system will now raise `SIGILL`.  This affects users who rely on `SIGABRT` in
   their signal handlers for error handling or recovery.  Such users must update
   their code to handle `SIGILL` instead. This change improves predictability and
   removes the dispatcher's reliance on the C standard library.
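
For applications that previously installed a `SIGABRT` handler to detect an
unsupported ISA, the following C sketch shows one possible way to catch
`SIGILL` instead. It assumes a POSIX system (`sigaction`, `sigsetjmp`). The
helper `run_with_sigill_guard` and the stand-in functions
`simulate_unsupported_isa` and `supported_path` are hypothetical names used
only for illustration; `raise(SIGILL)` merely simulates the signal described
above and is not the dispatcher's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Jump target used to leave the signal handler (POSIX only). */
static sigjmp_buf g_sigill_env;

/* Handler: jump back to the guard instead of returning to the faulting code. */
static void on_sigill(int signo) {
    (void)signo;
    siglongjmp(g_sigill_env, 1);
}

/* Hypothetical helper: runs fn() with a temporary SIGILL handler installed.
 * Returns 1 if SIGILL was caught, 0 if fn() returned normally,
 * and -1 if the handler could not be installed. */
int run_with_sigill_guard(void (*fn)(void)) {
    struct sigaction sa, old;
    int result;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0) {
        return -1;
    }

    /* Saving the signal mask (second argument 1) lets siglongjmp restore it,
     * so SIGILL is not left blocked after the handler is abandoned. */
    if (sigsetjmp(g_sigill_env, 1) == 0) {
        fn();
        result = 0;
    } else {
        result = 1;
    }

    sigaction(SIGILL, &old, NULL); /* restore the previous handler */
    return result;
}

/* Stand-in for calling an ISPC entry point on a CPU without a supported ISA. */
void simulate_unsupported_isa(void) {
    raise(SIGILL);
}

/* Stand-in for a call that succeeds. */
void supported_path(void) {
}
```

In a real application, the function passed to the guard would call the
ISPC-exported entry point, and a result of 1 could be used to fall back to a
non-ISPC code path or to report the error.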

Bug Fixes:

 - Fixed a crash for functions returning pointers.
 - Fixed incorrect values for some predefined macros.
 - Fixed a crash when using sizeof as a global variable initializer.
 - Fixed function template overload resolution issues.
 - Fixed incorrect behavior in short vector casts inside templates.
 - Fixed incorrect zero handling in the `ldexp` standard library function.

Recommended versions of Runtime Dependencies when targeting GPU:

Linux:

 - Intel(R) Graphics Compute Runtime
   https://github.com/intel/compute-runtime/releases/tag/24.35.30872.22
 - Level Zero Loader
   https://github.com/oneapi-src/level-zero/releases/tag/v1.17.28
 - Threading Building Blocks (TBB)

Alternatively, you can use a validated gfx driver stack supporting Intel® Arc™
available at https://dgpu-docs.intel.com/driver/installation.html

Windows:

 - Intel(R) Graphics Windows(R) DCH Drivers 32.0.101.6083
   https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html
 - Level Zero Loader
   https://github.com/oneapi-src/level-zero/releases/tag/v1.17.28
 - OpenCL™ Offline Compiler (OCLOC)
   https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html
   (this is needed for AoT compilation on Windows only)
 - Supported GPU platforms: Intel(R) Arc Graphics, 11th-13th Gen Intel(R) Core
   processor graphics

Components revisions used in GPU-enabled build:
 - https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/43fb73fe
 - https://github.com/intel/vc-intrinsics/commit/4f5bc1bb
 - https://github.com/oneapi-src/level-zero/commit/c1f6e28 (v1.17.28)
 - https://github.com/llvm/llvm-project/commit/3b5b5c1 (llvmorg-18.1.8) +
   patches from llvm_patches folder

=== v1.25.0 === (15 October 2024)

ISPC release featuring new attribute syntax support, extended template handling
for short vectors and arrays, added support for new Intel GPUs and CPUs, and
significant code generation enhancements. This release is based on patched LLVM
18.1.8, with the minimum required glibc version for Linux binaries raised to
2.27.

Language changes:

 - Added support for `__attribute__` syntax in variable and function
   declarations.  The following attributes are now supported: `noescape`,
   `address_space(N)`, `external_only`, and `unmangled`.
 - Added template support for short vectors and extended it for array
   declarations, allowing the use of type and non-type parameters to specify
   their types and dimensions.
 - Added error messages for variable declarations with incompatible type
   specifiers.
 - Added support for a typedef with the same name as a struct tag.

Code generation:

 - Improved generated code for cases when the `foreach` loop iteration domain is
   equal to 3.
 - Reduced the number of copies in dispatcher logic for cases where a user
   application consists of several multi-target modules.
 - Improved the generated code for `foreach` loop counters by propagating
   LLVM IR flags (e.g., `nsw`, `nuw`) in ISPC optimization passes.
 - Improved shuffles generation for some AVX2 and AVX512 targets, resulting in
   up to 80% faster code.
 - Produced IEEE 754 compliant instructions (`fminnm`/`vminnm`) on ARM for
   min/max operations.

Compile time:

 - Updated standard library (stdlib) processing: ISPC no longer parses the
   stdlib source code for each compiled source file. Instead, the stdlib is
   precompiled into an LLVM IR module during the ISPC build process and is loaded
   and linked on-demand when needed by user code. This change significantly reduces
   compilation time, with improvements ranging from 5% to 60%, depending on the
   size of the user's code (the smaller the code, the greater the improvement).
   More details about this change are available here:
   https://github.com/ispc/ispc/blob/main/docs/design/TargetRedesign.md

New targets:

 - Added `xe2hpg-x16` and `xe2hpg-x32` targets for Xe2 Battlemage Intel(R) GPUs.
 - Added `xe2lpg-x16` and `xe2lpg-x32` targets for Xe2 Lunar Lake Intel(R) GPUs.
 - Added CPU definitions for `ArrowLake`, `LunarLake` and `GraniteRapids`:
   `arrowlake`, `lunarlake` and `graniterapids` respectively.

Deprecated targets:

 - `avx512knl-x16`, `gen9-x8` and `gen9-x16` targets are deprecated and will be
   removed in future releases.
 - Adjusted device names for `xelpg-x8` and `xelpg-x16` to `mtl-u` and `mtl-h`.

Standard library:

 - Fixed the implementation of `exp` for `-Inf`, `NaN`, and large numbers, which
   previously resulted in highly inaccurate results.
 - Added a function for the floating-point remainder of division (`fmod`) for
   floating-point types.
 - Added support for float/double atomics (add, sub, min, max).
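
The ISPC function names for the new float/double atomics are not listed in
these notes. As a conceptual illustration only, the C11 sketch below shows how
an atomic floating-point add can be expressed as a compare-and-swap loop. The
helper name `atomic_add_float` is hypothetical, and this is not necessarily how
ISPC implements the operation.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically performs *target += value and returns the
 * value that was stored before the addition. */
float atomic_add_float(_Atomic float *target, float value) {
    float expected = atomic_load(target);
    /* On failure, atomic_compare_exchange_weak writes the currently stored
     * value into `expected`, so the sum is recomputed on every retry. */
    while (!atomic_compare_exchange_weak(target, &expected, expected + value)) {
        /* retry with the refreshed `expected` */
    }
    return expected;
}
```

The same loop shape works for sub, min, and max by changing the expression
that computes the new value.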

Compiler switches behavior:

 - Added support for the `-ffunction-sections` command line flag to generate
   each function in a separate section.

Infrastructure/build changes:

 - ISPC can now be built in two modes: slim and composite. The default mode is
   composite, which corresponds to the current distribution practice of
   packaging ISPC as a single binary. The slim mode can be enabled by setting
   `ISPC_SLIM_BINARY=ON`. In this mode, the stdlib and binutils libraries are
   excluded from the binary, resulting in a smaller binary footprint. Instead,
   these libraries are placed alongside the ISPC binary. This mode simplifies the
   ISPC build process, making it more suitable for development purposes.
 - Removed the build dependency on `ncurses/terminfo`.
 - Added dead code stripping for macOS binaries during linking to reduce the
   final binary size.
 - Fixed build logic for determining the list of system libraries required by
   LLVM.
 - Fixed issue with installation of test files for ISPC Runtime.
 - Added numerous new GitHub Actions workflows with security testing. ISPC now
   follows OpenSSF best practices for development.

Bug fixes:

 - Fixed crashes found by fuzzing.
 - Fixed C/C++ header generation for typedefs with anonymous structs.
 - Fixed the loss of the uniform qualifier in nested templates.
 - Added dump of module verifier's output when it reports an error.

Recommended versions of Runtime Dependencies when targeting GPU:

Linux:

 - Intel(R) Graphics Compute Runtime
   https://github.com/intel/compute-runtime/releases/tag/24.35.30872.22
 - Level Zero Loader
   https://github.com/oneapi-src/level-zero/releases/tag/v1.17.28
 - Threading Building Blocks (TBB)

Alternatively, you can use a validated gfx driver stack supporting Intel® Arc™
available at https://dgpu-docs.intel.com/driver/installation.html

Windows:

 - Intel(R) Graphics Windows(R) DCH Drivers 32.0.101.6083
   https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html
 - Level Zero Loader
   https://github.com/oneapi-src/level-zero/releases/tag/v1.17.28
 - OpenCL™ Offline Compiler (OCLOC)
   https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html
   (this is needed for AoT compilation on Windows only)
 - Supported GPU platforms: Intel(R) Arc Graphics, 11th-13th Gen Intel(R) Core
   processor graphics

Components revisions used in GPU-enabled build:
 - https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/43fb73fe
 - https://github.com/intel/vc-intrinsics/commit/4f5bc1bb
 - https://github.com/oneapi-src/level-zero/commit/c1f6e28 (v1.17.28)
 - https://github.com/llvm/llvm-project/commit/3b5b5c1 (llvmorg-18.1.8) +
   patches from llvm_patches folder

=== v1.24.0 === (16 May 2024)

ISPC release with dot product functions, non-type template parameters, and
generated code improvements. This release is based on patched LLVM 17.0.6.

Language changes:

 - Added support for non-type template parameters. Uniform integers and enums
   can now be used as template parameters.
 - Added dot product functions for unsigned and signed `int8` and `int16`
   types. They leverage AVX-VNNI and AVX512-VNNI instructions if supported by
   targets.
 - Added macro definitions for numeric limits.
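
As a rough illustration of what the dot product item above refers to, the
following C sketch shows a scalar reference for a 4-element 8-bit dot product
accumulated into a 32-bit value, the kind of operation that VNNI instructions
accelerate. The helper names `dot4_add_i8` and `dot4_add_u8` are hypothetical
and are not the ISPC standard library names, which are not listed in these
notes.

```c
#include <stdint.h>

/* Hypothetical reference: acc + sum of four signed 8-bit products. */
int32_t dot4_add_i8(const int8_t a[4], const int8_t b[4], int32_t acc) {
    for (int i = 0; i < 4; ++i) {
        /* Widen before multiplying so each product is computed in 32 bits. */
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Hypothetical reference: acc + sum of four unsigned 8-bit products. */
uint32_t dot4_add_u8(const uint8_t a[4], const uint8_t b[4], uint32_t acc) {
    for (int i = 0; i < 4; ++i) {
        acc += (uint32_t)a[i] * (uint32_t)b[i];
    }
    return acc;
}
```

A hardware dot product instruction performs this widening multiply and
accumulate for every 32-bit lane of a vector register in one step.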

New targets:

 - Added targets:

   1. `avx2vnni-i32x4`, `avx2vnni-i32x8`, `avx2vnni-i32x16` with AVX-VNNI
   instruction support,
   2. `avx512icl-x4`, `avx512icl-x8`, `avx512icl-x16`, `avx512icl-x32` and
   `avx512icl-x64` with AVX512-VNNI instruction support.

Code generation:

 - Fixed code generation for GPU in cases where unnecessary vectorized
   instructions were used during address arithmetic, e.g., for accessing fields
   of varying structures.
 - Improved generated code for cases when the `foreach` loop iteration domain
   is less than the target width.

Compiler switches behavior:

 - The `--pic` command line flag now corresponds to the `-fpic` flag of Clang
   and GCC, whereas the newly introduced `--PIC` corresponds to `-fPIC`.

Bug fixes:

 - The implementation of the `round` standard library function was aligned
   across all targets. This may affect the results of code that uses this
   function on the following targets: `avx2-i16x16`, `avx2-i8x32` and all
   `avx512` targets.
 - Fixed cases when unwind info was not generated for functions. This impacted
   debugging and profiling on Windows.
 - Fixed broken targets `sse4-i8xN` and `avx2-i8xN`.

Recommended versions of Runtime Dependencies when targeting GPU:

Linux:

 - Intel(R) Graphics Compute Runtime
   https://github.com/intel/compute-runtime/releases/tag/24.05.28454.6
 - Level Zero Loader
   https://github.com/oneapi-src/level-zero/releases/tag/v1.16.9
 - Threading Building Blocks (TBB)

Alternatively, you can use a validated gfx driver stack supporting Intel® Arc™
available at https://dgpu-docs.intel.com/driver/installation.html

Windows:

 - Intel(R) Graphics Windows(R) DCH Drivers 31.0.101.5518
   https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html
 - Level Zero Loader
   https://github.com/oneapi-src/level-zero/releases/tag/v1.16.9
 - OpenCL™ Offline Compiler (OCLOC)
   https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html
   (this is needed for AoT compilation on Windows only)
 - Supported GPU platforms: Intel(R) Arc Graphics, 11th-13th Gen Intel(R) Core
   processor graphics

Components revisions used in GPU-enabled build:
 - https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/89ab5df3
 - https://github.com/intel/vc-intrinsics/commit/f9c34404
 - https://github.com/oneapi-src/level-zero/commit/61c97c87 (v1.16.9)
 - https://github.com/llvm/llvm-project/commit/6009708 (llvmorg-17.0.6) +
   patches from llvm_patches folder

=== v1.23.0 === (14 February 2024)

ISPC release with bug fixes and a few language improvements. The release is based on patched LLVM 16.0.6.

Language changes:

 - Improved initialization of `const` variables:

    1. Variables with const qualifiers can be initialized using the values of previously initialized const variables,
       including arithmetic operations on them.
    2. Enum values can be used as constants.

 - The result of the selection operator can now be used as an lvalue.

Compiler switches behavior:

 - `--dump-file=<dir>` now forces dumping of the whole IR module after each pass.

ISPC Runtime improvements:

 - Added the `ISPCRT_GPU_DRIVER` environment variable, which allows choosing a specific driver. If more than
   one supported GPU is present in the system, they may be managed by several GPU drivers. The user can
   select the GPU driver using this variable.

Infrastructure/build changes:

 - Removed the build dependency on `llvm-dis`.
 - Locked the time zone to UTC to fix build reproducibility.

Bug fixes:

 - Fixed ABI compatibility of `bool` types returned to C/C++ code.
 - Fixed build error when bison emulates POSIX Yacc.
 - Fixed target definition for `neon-i16x8`, `sse2-i32x8` and `ps5`.
 - Fixed ICE when generating unwind info for `aarch64` code on Windows.

Recommended versions of Runtime Dependencies when targeting GPU:

Linux:

 - Intel(R) Graphics Compute Runtime
   https://github.com/intel/compute-runtime/releases/tag/23.48.27912.11
 - Level Zero Loader
   https://github.com/oneapi-src/level-zero/releases/tag/v1.15.1
 - Threading Building Blocks (TBB)

Alternatively, you can use a validated gfx driver stack supporting Intel® Arc™
available at https://dgpu-docs.intel.com/driver/installation.html

Windows:

 - Intel(R) Graphics Windows(R) DCH Drivers 31.0.101.5194_101.5252
   https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html
 - Level Zero Loader
   https://github.com/oneapi-src/level-zero/releases/tag/v1.15.1
 - OpenCL™ Offline Compiler (OCLOC)
   https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html
   (this is needed for AoT compilation on Windows only)
 - Supported GPU platforms: Intel(R) Arc Graphics, 11th-13th Gen Intel(R) Core
   processor graphics

Components revisions used in GPU-enabled build:
 - https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/d1c69c33
 - https://github.com/intel/vc-intrinsics/commit/b16218b8
 - https://github.com/oneapi-src/level-zero/commit/ea5be99 (v1.15.1)
 - https://github.com/llvm/llvm-project/commit/7cbf1a25 (llvmorg-16.0.6) +
   patches from llvm_patches folder

=== v1.22.0 === (15 November 2023)

ISPC release with template operators support; improved debugging experience of ISPC code on Windows; multiple
stability and performance fixes and more. The release is based on patched LLVM 16.0.6.

ISPC distribution changes:

 - ISPC binaries were compiled with LTO by the Clang/LLVM toolchain on all supported platforms and architectures using
   the superbuild. As a result, ISPC binaries became a few percent faster on average.
 - Examples were excluded from ISPC archives. They are placed alongside as separate archives
   `ispc-examples-v1.22.0.zip` and `ispc-examples-v1.22.0.tar.gz`.

Language changes:

 - Added support for template operators.
 - Revised the usage of function specifiers with templates. For more details, please refer to the Function Templates
   section of the documentation.

Infrastructure changes:

 - Release built with LTO (except `aarch64` Linux).
 - Added support for building ISPC with LLVM 17, although GPU support was not tested.

New compiler switches:

 - The `--dwarf-version` switch now accepts DWARF version 5.
 - The `--dwarf-version` switch forces generation of DWARF format debug info on Windows. This allows debugging ISPC
   code linked with MinGW-generated code.

Bug fixes:

 - Fixed performance regression caused by missed memory effects for genx intrinsics declarations.
 - Fixed performance regression caused by change in the loop unswitch LLVM pass.
 - Fixed C compatibility of ISPC generated headers.
 - Added an unwind table to ISPC-generated functions for Windows targets. This fixes issues with incorrect backtraces
   during debugging and profiling.
 - Fixed emitted code for negation of short float vectors.
 - Fixed several issues that were related to the usage of bool in different cases.

Recommended versions of Runtime Dependencies when targeting GPU:

Linux:

 - Intel(R) Graphics Compute Runtime
   https://github.com/intel/compute-runtime/releases/tag/23.30.26918.9
 - Level Zero Loader
   https://github.com/oneapi-src/level-zero/releases/tag/v1.14.0
 - Threading Building Blocks (TBB)

Alternatively, you can use a validated gfx driver stack supporting Intel® Arc™
available at https://dgpu-docs.intel.com/driver/installation.html

Windows:

 - Intel(R) Graphics Windows(R) DCH Drivers 31.0.101.4826
   https://www.intel.com/content/www/us/en/download/785597/788440/intel-arc-iris-xe-graphics-windows.html
 - Level Zero Loader
   https://github.com/oneapi-src/level-zero/releases/tag/v1.14.0
 - OpenCL™ Offline Compiler (OCLOC)
   https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html
   (this is needed for AoT compilation on Windows only)
 - Supported GPU platforms: Intel(R) Arc Graphics, 11th-13th Gen Intel(R) Core
   processor graphics

Components revisions used in GPU-enabled build:
 - https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/8ac46249
 - https://github.com/intel/vc-intrinsics/commit/77f069b7
 - https://github.com/oneapi-src/level-zero/commit/37363a45 (v1.14.0)
 - https://github.com/llvm/llvm-project/commit/7cbf1a25 (llvmorg-16.0.6) +
   patches from llvm_patches folder

=== v1.21.0 === (18 August 2023)

ISPC release with template function specializations support; changed rules for signed integer overflow, which match
C/C++ behavior and lead to more aggressive optimizations; an enhanced ISPC Runtime; multiple stability and performance
fixes and more. The release is based on patched LLVM 15.0.7.

Language changes:

 - Added support for function template specializations with explicit template arguments.
   For more details, please refer to the Function Templates section of the documentation.

 - Modified behavior for signed integer overflow.
   In case of signed integer overflow, `ispc` now assumes undefined behavior, similar to C and C++. This change may
   cause compatibility issues. You can manage this behavior by using the `--[no-]wrap-signed-int` compiler switch. The
   previous default behavior (before version 1.21.0) can be preserved by using `--wrap-signed-int`, which maintains
   defined wraparound behavior for signed integers, though it may limit some compiler optimizations.
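
Code that relied on signed wraparound can either be compiled with `--wrap-signed-int` or be rewritten so that it does
not depend on overflow. The C sketch below illustrates the second option using the same rules that C applies to signed
overflow. The helper names `wrapping_add_i32` and `add_overflows_i32` are hypothetical, and the code is plain C rather
than ISPC.

```c
#include <stdint.h>

/* Hypothetical helper: two's-complement wraparound addition without relying
 * on signed overflow. Unsigned arithmetic is defined modulo 2^32. */
int32_t wrapping_add_i32(int32_t a, int32_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    if (sum <= (uint32_t)INT32_MAX) {
        return (int32_t)sum;
    }
    /* Map the upper half of the unsigned range onto negative values without
     * an out-of-range conversion. */
    return (int32_t)(sum - 0x80000000u) + INT32_MIN;
}

/* Hypothetical helper: returns 1 if a + b would overflow int32_t, else 0.
 * The check itself never overflows. */
int add_overflows_i32(int32_t a, int32_t b) {
    return (b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b);
}
```

Checking with `add_overflows_i32` before adding keeps the optimizations that the new default enables, while
`wrapping_add_i32` keeps the old results for values that do wrap.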

New hardware support:

Added support for Intel Meteor Lake Xe-LPG graphics:

 - Added two new ISPC targets: `xelpg-x16` and `xelpg-x8`
 - Added two new device names: `mtl-m` and `mtl-p`

Infrastructure changes:

 - ISPC now uses LLVM's new pass manager. The optimization pipeline was modified by introducing an early
   LoopFullUnrollPass, so that loops unrolled by ISPC match manually unrolled loops in many cases.
 - Introduced ISPC superbuild, which facilitates building ISPC with Xe dependencies (LLVM, L0, vc-intrinsics,
   SPIRV-Translator). It can generate an archive with dependencies or consume a pre-built archive to build ISPC only.
   It also enables generating LTO or LTO+PGO enabled builds of LLVM and ISPC.
 - Supported building ISPC with LLVM 16.

New compiler switches:

 - `--mcmodel` switch, which accepts `small` and `large` values. The definition is similar to gcc/clang. The `large`
   model enables programs larger than 2 GB.
 - `--opt=disable-gathers` and `--opt=disable-scatters` options, which disable generation of gather and scatter
   instructions on platforms that support them (for performance experiments).
 - `--[no-]wrap-signed-int` switches: `--wrap-signed-int` preserves wraparound behavior on signed integer overflow,
   and `--no-wrap-signed-int` does not.

ISPC Runtime improvements:

 - Added `ispcrtSetTaskingCallbacks` to the ISPCRT API, allowing the override of default implementations of
   `ISPCLaunch`, `ISPCAlloc`, and `ISPCSync`.
 - Removed compile-time Level Zero dependency from ISPCRT, no longer necessary after the ISPCRT split into CPU and GPU
   parts.


Recommended versions of Runtime Dependencies when targeting GPU:

Linux:

 - Intel(R) Graphics Compute Runtime
   https://github.com/intel/compute-runtime/releases/tag/23.22.26516.18
 - Level Zero Loader
   https://github.com/oneapi-src/level-zero/releases/tag/v1.13.5
 - Threading Building Blocks (TBB)

Alternatively, you can use a validated gfx driver stack supporting Intel® Arc™
available at https://dgpu-docs.intel.com/driver/installation.html

Windows:

 - Intel(R) Graphics Windows(R) DCH Drivers 31.0.101.4644
   https://www.intel.com/content/www/us/en/download/726609/intel-arc-iris-xe-graphics-whql-windows.html
 - Level Zero Loader
   https://github.com/oneapi-src/level-zero/releases/tag/v1.13.5
 - OpenCL™ Offline Compiler (OCLOC)
   https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html
   (this is needed for AoT compilation on Windows only)
 - Supported GPU platforms: Intel(R) Arc Graphics, 11th-13th Gen Intel(R) Core
   processor graphics

Components revisions used in GPU-enabled build:
 - https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/e82ecc2
 - https://github.com/intel/vc-intrinsics/commit/910db48
 - https://github.com/oneapi-src/level-zero/commit/e1f09b4 (v1.13.5)
 - https://github.com/llvm/llvm-project/commit/8dfdcc7 (llvmorg-15.0.7) +
   patches from llvm_patches folder

=== v1.20.0 === (5 May 2023)

ISPC release with compile time improvements, enhancements in the ISPC Runtime,
and a number of code generation fixes. The release is based on patched LLVM
15.0.7.

ISPC distribution changes.

ISPC binaries got faster and smaller: approximately 1/3 smaller and a few
percent faster. The macOS distribution now includes x86_64, arm64, and
Universal Binaries. On Linux, a snap package with the latest ISPC is
available.

ISPC Runtime.

 - ispcrt was split under the hood into GPU and CPU parts, which are loaded
   dynamically. This means you don't need GPU dependencies when running CPU-only
   code using ispcrt.
 - ispcrt got support for fences to enable CPU/GPU asynchronous computations.
 - ispcrt does not depend on OpenMP runtime anymore, but requires TBB.

New targets.

For better fine-tuning when targeting old platforms, sse4 targets were split
into sse4.1 and sse4.2 targets. All changes are backward compatible: sse4
targets are aliased to sse4.2, and multi-target compilation allows only one
sse4 target, so build systems are not confused.

Improvements for contributors.

We got a brand new GitHub Codespaces config, so you are welcome to start
hacking on ISPC in the browser. Give it a try!

Recommended versions of Runtime Dependencies when targeting GPU.

Linux:

 - Intel(R) Graphics Compute Runtime
   https://github.com/intel/compute-runtime/releases/tag/23.09.25812.14
 - Level Zero Loader
   https://github.com/oneapi-src/level-zero/releases/tag/v1.10.0
 - Threading Building Blocks (TBB)

Alternatively, you can use a validated gfx driver stack supporting Intel® Arc™
available at https://dgpu-docs.intel.com/releases/stable_602_20230323.html

Windows:

 - Intel(R) Graphics Windows(R) DCH Drivers 31.0.101.4146
   https://www.intel.com/content/www/us/en/download/726609/772016/intel-arc-iris-xe-graphics-whql-windows.html
 - Level Zero Loader
   https://github.com/oneapi-src/level-zero/releases/tag/v1.10.0
 - OpenCL™ Offline Compiler (OCLOC)
   https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html
   (this is needed for AoT compilation on Windows only)
 - Supported platforms: Intel(R) Arc Graphics, 11th-13th Gen Intel(R) Core
   processor graphics

Components revisions used in GPU-enabled build:
 - https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/855eb27
 - https://github.com/intel/vc-intrinsics/commit/29fe787
 - https://github.com/oneapi-src/level-zero/commit/0d56d8e (v1.10.0)
 - https://github.com/llvm/llvm-project/commit/8dfdcc7 (llvmorg-15.0.7) +
   patches from llvm_patches folder


=== v1.19.0 === (28 February 2023)

ISPC release with long-awaited function templates technical preview; new
hardware support for 4th generation Intel® Xeon® Scalable (codename Sapphire
Rapids) CPUs, Intel® Data Center GPU Max (codename Ponte Vecchio), and updated
support for Intel® Arc™ GPUs; improved performance and compile time; an enhanced
ISPC Runtime; a bunch of stability fixes and more. The release is based on
patched LLVM 14.0.6.

Language changes.

Function templates support was introduced in ISPC. It is currently in
technical preview, meaning that the current language definition might change in
future versions. For more details please refer to the Function Templates
section of the documentation.

ISPC also gained several other language changes needed for ISPC/SYCL
interoperability (an experimental feature):
1. Support for the `__regcall` attribute.
2. A new language construct `invoke_sycl`, which is used to call a SYCL function
   from ISPC. The function must be declared on the ISPC side with `extern "SYCL"
   __regcall` qualifiers.
3. Support for `extern "C"` function definitions.

New hardware support.

1. Targets for 4th generation Intel® Xeon® Scalable (codename Sapphire Rapids)
   CPUs were introduced: `avx512spr-x4`, `avx512spr-x8`, `avx512spr-x16`,
   `avx512spr-x32`, `avx512spr-x64`. The key difference from other AVX512 targets
   is native support for FP16.
2. New `xehpc-x16`/`xehpc-x32` targets were added for Intel® Data Center GPU Max
   (codename Ponte Vecchio). A new `pvc` device name was introduced.
3. New device names `acm-g10`, `acm-g11`, and `acm-g12` were added for Intel®
   Arc™ Graphics. The `dg2` device name has been removed.
4. Support for Aarch64 targets was enabled on Windows.

ISPC Runtime.

1. A chunking allocator was introduced that can be enabled with `ISPCRT_MEM_POOL`.
2. An API was added to link input modules through `ispcrtStaticLinkModules`
   (using linking on vISA level under the hood) and `ispcrtDynamicLinkModules`
   (using binary linking under the hood).
3. Support for creating multiple devices within a single context was added, and
   an API was added to get a function pointer from a module. It's also possible to
   construct ISPC RT objects from native handlers now.
4. ISPC RT verbose mode was added that can be enabled through `ISPCRT_VERBOSE`.

Performance.

There's a significant performance boost on Xe targets caused by updates in the
ISPC optimization pipeline and the usage of the new spill-cost IGC finalizer
function, which dramatically reduces spill size.

Utilities.

1. ISPC `link` mode has been introduced, which allows linking several LLVM bitcode
   or SPIR-V files and outputting the result as LLVM bitcode or SPIR-V. For example:
     ispc link test_a.bc test_b.bc --emit-spirv -o test.spv

2. CMake utilities were improved, and support was added for building an ISPC GPU
   target from multiple ISPC files, linking them with `ispc --link`. An
   application's ISPC CMakeLists would look like this:
     add_ispc_library(my_ispc_lib filea.ispc fileb.ispc)
     ispc_target_include_directories(my_ispc_lib <some directory path>)
     ispc_target_compile_definitions(my_ispc_lib -DMY_DEFINE=1)

     add_ispc_library(my_ispc_kernel filec.ispc)
     ispc_target_link_libraries(my_ispc_kernel my_ispc_lib)

Runtime Dependencies when targeting GPU.

Linux:

- Intel(R) Graphics Compute Runtime
  https://github.com/intel/compute-runtime/releases/tag/22.49.25018.24
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.9.4
- OpenMP Runtime. Consult your Linux distribution documentation for OpenMP
  runtime installation instructions. No specific version is required.

Windows:

- Intel(R) Graphics Windows(R) DCH Drivers 30.0.101.4091
  https://www.intel.com/content/www/us/en/download/726609/intel-arc-iris-xe-graphics-whql-windows.html
- Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v1.9.4
- OpenCL™ Offline Compiler (OCLOC)
  https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html
  (this is needed for AoT compilation on Windows only)

Components revisions used in GPU-enabled build:

- https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/c469fa8
- https://github.com/intel/vc-intrinsics/commit/3ac855c
- https://github.com/oneapi-src/level-zero/commit/4ed13f3 (v1.9.4)
- https://github.com/llvm/llvm-project/commit/f28c006 (llvmorg-14.0.6) +
  patches from llvm_patches folder

=== v1.18.0 === (5 May 2022)

An ISPC release with a bunch of stability and performance fixes, improvements
for ISPC Runtime, and complete stdlib support for `float16` type. This release
is based on patched LLVM 13.0.1.

`-E` switch was introduced to run the preprocessor only. An old bug that
prevented the compiler from failing on a preprocessor error was fixed, and the
compiler now properly stops in that case. As some users considered the old
behavior convenient in some cases, `--ignore-preprocessor-errors` switch was
introduced to maintain the old behavior.

Target naming was changed for the targets with native masking support to drop
"base type" from the naming scheme; the old naming is still accepted for
compatibility. This affected AVX512 target names; the new names are
`avx512skx-x4`, `avx512skx-x8`, `avx512skx-x16`, `avx512skx-x32`,
`avx512skx-x64`, and `avx512knl-x16`.

For debugging, and for those who are interested in understanding compiler
internals, `--ast-dump` switch was introduced. The produced dump of the AST
(Abstract Syntax Tree) is intentionally made to look like the clang AST dump for
convenience.

Standard library gained full support for `float16` type. Note that it is fully
supported only on the targets with native hardware support. On the other
targets emulation is still not guaranteed but may work in some cases.

Among other fixes, it is worth mentioning the following:
- fixed bug #1308 affecting multi-target compilation
- a bunch of fixes to make it easier to build ISPC on FreeBSD, even though
  FreeBSD is not officially supported

Improvements for the ISPC Runtime in this release:
- flexible task system selection during build
- support of ISPCRT build separate from ISPC
- support of ISPCRT build for CPU only
- version check in CMake
- new API to get the type of allocated memory (`ispcrtGetMemoryViewAllocType`
  and `ispcrtGetMemoryAllocType`)
- new API for memory copy on device (`ispcrtCopyMemoryView`)
- support of device-only memory without corresponding application memory.

Performance on Xe targets was significantly improved in this release due to
optimizations in ISPC and Vector Backend.

Runtime Dependencies when targeting GPU:

Linux:

- Intel(R) Graphics Compute Runtime
  https://github.com/intel/compute-runtime/releases/tag/22.17.23034
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.7.15
- OpenMP Runtime. Consult your Linux distribution documentation for OpenMP
  runtime installation instructions. No specific version is required.

Windows:

- Intel(R) Graphics Windows(R) DCH Drivers 30.0.101.1660
  https://www.intel.com/content/www/us/en/download/19344/intel-graphics-windows-dch-drivers.html
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.7.15
- OpenCL™ Offline Compiler (OCLOC)
  https://registrationcenter-download.intel.com/akdlm/irc_nas/18653/ocloc_win_101.1660.zip
  (this is needed for AoT compilation on Windows only)

Components revisions used in GPU-enabled build:

- https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/d7a0304
- https://github.com/intel/vc-intrinsics/commit/1e2562d
- https://github.com/oneapi-src/level-zero/commit/bb7fff0 (v1.7.15)
- https://github.com/llvm/llvm-project/commit/75e33f7 (llvmorg-13.0.1) +
  patches from llvm_patches folder

=== v1.17.0 === (14 January 2022)

An ISPC release with massive update of Xe targets, including support for
forthcoming XeHPG GPUs, improvements for `double` type on AVX512 targets, and
multiple standard library improvements. Windows and Linux binaries in this
release support both CPU and GPU targets, while macOS binary supports only CPU.
This release is based on patched LLVM 12.0.1.

Improvements for CPU targets:
- Performance improvements for `double` type on AVX512 targets - better use of
  gather/scatter instructions, 2-5x improvements for `rsqrt()` and `rcp()`
  standard library functions.
- New `avx512skx-i32x4` target.
- `aos_to_soa` and `soa_to_aos` performance improvements for `-x8` and `-x16`
  targets on CPU.
- `--math-lib=svml` mode was fixed and extended - it requires Intel® C++
  Compiler (`icc` or `icx`) to link the binary.
- `zen1`, `zen2`, and `zen3` CPU definitions were added.
- Added experimental support for PS5 platform.
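
For readers unfamiliar with the layout change performed by `aos_to_soa` and `soa_to_aos`, the following scalar C
sketch shows the data movement for three-component records. It is an illustration only: the function names
`aos_to_soa3_scalar` and `soa_to_aos3_scalar` are hypothetical, and the ISPC standard library functions operate on
vectors rather than in a scalar loop.

```c
#include <assert.h>
#include <stddef.h>

/* Splits n interleaved (x, y, z) records into three separate arrays
 * (array-of-structures to structure-of-arrays). */
static void aos_to_soa3_scalar(const float *aos, size_t n,
                               float *x, float *y, float *z) {
    assert(n == 0 || (aos && x && y && z));
    for (size_t i = 0; i < n; ++i) {
        x[i] = aos[3 * i + 0];
        y[i] = aos[3 * i + 1];
        z[i] = aos[3 * i + 2];
    }
}

/* Inverse operation: interleaves three arrays back into (x, y, z) records. */
static void soa_to_aos3_scalar(const float *x, const float *y, const float *z,
                               size_t n, float *aos) {
    assert(n == 0 || (aos && x && y && z));
    for (size_t i = 0; i < n; ++i) {
        aos[3 * i + 0] = x[i];
        aos[3 * i + 1] = y[i];
        aos[3 * i + 2] = z[i];
    }
}
```

Applying one function and then the other returns the original array, which is a convenient sanity check when
comparing against vectorized code.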

ISPC language got experimental support for IEEE 754 half-precision data type -
`float16`. Not all library functions are supported yet with this type. The key
focus in this release was on hardware natively supporting this type.
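
As background on the data type itself, the sketch below decodes an IEEE 754 half-precision (binary16) bit pattern in
plain C: 1 sign bit, 5 exponent bits with bias 15, and 10 fraction bits. The helper names are hypothetical and the
code is not taken from ISPC; it is only meant to show the value range and precision that `float16` offers.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Multiplies x by 2^e using exact power-of-two steps. */
static float scale_by_pow2(float x, int e) {
    while (e > 0) { x *= 2.0f; --e; }
    while (e < 0) { x *= 0.5f; ++e; }
    return x;
}

/* Decodes an IEEE 754 binary16 bit pattern into a float. */
static float half_bits_to_float(uint16_t h) {
    int sign = (h >> 15) & 0x1;
    int exponent = (h >> 10) & 0x1F;
    int fraction = h & 0x3FF;
    float value;

    if (exponent == 0) {
        /* Zero or subnormal: fraction * 2^-24, no implicit leading 1. */
        value = scale_by_pow2((float)fraction, -24);
    } else if (exponent == 31) {
        /* All-ones exponent encodes infinity (fraction 0) or NaN. */
        value = (fraction == 0) ? INFINITY : NAN;
    } else {
        /* Normal number: (1024 + fraction) * 2^(exponent - 15 - 10). */
        value = scale_by_pow2((float)(fraction + 1024), exponent - 25);
    }
    return sign ? -value : value;
}

/* Usage example: 0x3C00 is 1.0 and 0x7BFF is the largest finite half. */
void half_decode_example(void) {
    assert(half_bits_to_float(0x3C00) == 1.0f);
    assert(half_bits_to_float(0x7BFF) == 65504.0f);
}
```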

This update includes breaking changes in compiler switches for Xe targets:
- Graphics targets `genx-x8` and `genx-x16` were renamed to `gen9-x8` and
  `gen9-x16`.
- Compiler architectures for graphics target were renamed from `genx32` and
  `genx64` to `xe32` and `xe64`.
- Xe targets were renamed from uppercase to lowercase (so instead of SKL/TGLLP
  it is now skl/tgllp).
- A new `--device` switch (which is an alias for the existing `--cpu` switch)
  was introduced. Now the recommended way to specify the required platform for
  CPU and GPU is: `--device=<platform>`

This release also changes the definition of `export` and `task` functions on
GPU. Now only an ISPC `task` function is a GPU kernel; `export` functions cannot
be invoked from host (i.e. called from ISPC Runtime/L0 Runtime) anymore. `export`
functions are ready to be linked with and called from other GPU modules.
Currently, ISPC experimentally supports such interoperability with Explicit
SIMD SYCL* Extension (ESIMD).

New Xe targets were added:
- `xelp-x8` and `xelp-x16`. XeLP refers to XeLP generation of hardware
  (TigerLake chips and alike).
- `xehpg-x8` and `xehpg-x16`. XeHPG is the architecture name for the
  forthcoming Intel® Arc™ GPUs, codename Alchemist.

GPU part has a bunch of stability, performance, and usability improvements
including but not limited to `alloca()` with constant parameter support,
`assume()` support, improved performance for double math functions and integer
division.

`ISPC Runtime` performance was improved several times by fixing the setting of
local group size for kernels, using events as a synchronization mechanism, and
utilizing HW compute and copy engines. There is also a new structure
`ISPCRTModuleOptions` to pass additional options to VC backend if needed.
Currently, `ISPCRTModuleOptions` allows setting of stack size for VC backend
which is used to compile SPIR-V.

Runtime Dependencies when targeting GPU:

Linux:

- Intel(R) Graphics Compute Runtime
  https://github.com/intel/compute-runtime/releases/tag/22.02.22151
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.7.4
- OpenMP Runtime. Consult your Linux distribution documentation for OpenMP
  runtime installation instructions. No specific version is required.

Windows:

- Intel(R) Graphics Windows(R) DCH Drivers 30.0.101.1191
  https://www.intel.com/content/www/us/en/download/19344/intel-graphics-windows-dch-drivers.html
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.7.4
- OpenCL™ Offline Compiler (OCLOC)
  https://software.intel.com/sites/downloads/ocloc/ocloc_win_101.1191.zip (this
  is needed for AoT compilation on Windows only)

Components revisions used in GPU-enabled build:

KhronosGroup/SPIRV-LLVM-Translator@ed25f1b
intel/vc-intrinsics@3a5f4b4
oneapi-src/level-zero@2824c1f (v1.7.4)
llvm/llvm-project@fed4134 (llvmorg-12.0.1) + patches from llvm_patches folder

=== v1.16.1 === (15 July 2021)

A minor ISPC update, which has a bug fix for [issue
#2111](https://github.com/ispc/ispc/issues/2111) and is based on a patched
version of LLVM 12.0.1.

The bug fix affects x86 targets only and shows up as incorrect code generation
for the sequence of `shuffle()` and `reduce_add()` stdlib functions.

If you are building `ispc` from the sources, note that the fix is implemented
as a patch for the LLVM backend, and LLVM must be built with this patch applied
in order for this fix to take effect. A stock build of LLVM 12.0.1 will not
contain this bug fix.

=== v1.16.0 === (11 June 2021)

An ISPC release with language extensions for performance fine tuning, CPU
definitions for `AlderLake` and `SapphireRapids` targets, support for macOS ARM
targets, and massive update of Intel GPUs support. Windows and Linux binaries
in this release support both CPU and GPU targets, while macOS binary supports
only CPU. This release is based on patched LLVM 12.0.0.

The language changes include the following:
- The ability to directly call LLVM intrinsics from ISPC source. This should be
  handy for performance fine tuning and reaching the hardware instructions not
  yet covered by the standard library. Note that it is an experimental feature
  and is enabled only with `--enable-llvm-intrinsics` switch. Please refer to
  `LLVM Intrinsic Functions` section of the user manual for more details.
- `assume()` optimization hint, which can be used for communicating assumptions
  to the optimizer. Unlike `assert()` calls, it does not lead to a runtime check.
  This is intended for optimizations like removing null pointer checks, removing
  loop remainders, communicating alignment information to the optimizer, etc.
  Please refer to `Compiler Optimization Hints` section of the user manual for
  more details.
- Support for stack memory allocations through `alloca()` calls.
- `trunc()` standard library functions.

Changes for CPU targets:
- CPU definitions for `AlderLake` and `SapphireRapids` were added: `alderlake`
  and `sapphirerapids` respectively.
- CPU definitions for Apple ARM chips were added: `apple-a7`, `apple-a10`,
  `apple-a11`, `apple-a12`, `apple-a13`, `apple-a14`.
- Support for macOS ARM targets was added.

Using GPU-enabled binaries you can build ISPC programs and run them on
Intel(R) Core(tm) Processors with Gen9 graphics (formerly `Skylake`,
`Kaby Lake`, `Coffee Lake`) and Gen12 graphics (TigerLake mobile CPU) using
`--target` options (`genx-x8` and `genx-x16`) and `--cpu` option for
specifying particular platform (e.g. `--cpu=TGLLP`).

The main GPU feature of the current release is Windows support.
There are also a bunch of stability and performance improvements.
Here are some of them:
- ISPC Runtime got support for unified shared memory and multi-GPU. Also, there is
  a new `TaskQueue::submit()` method, which starts execution without waiting
  for completion.
- Thread private memory was mapped to SVM in VC backend. It greatly improves
  stability of the current release. It may affect performance on Gen9 graphics
  but we do not expect any significant changes on Gen12.
- L0 binary generation was reworked through libocloc. Supported on Linux only.

More details about the current state of GPU support are available here:
https://ispc.github.io/ispc_for_xe.html

For build instructions check our docker recipe:
https://github.com/ispc/ispc/blob/main/docker/ubuntu/xpu_ispc_build/Dockerfile

GPU support is still in Beta stage so you may experience some issues but we
strongly encourage you to try it out and give us feedback! You can reach us
through GitHub discussions and issues, or on Twitter (@ispc_updates).

Runtime Dependencies when targeting GPU:

Linux:
- Intel(R) Graphics Compute Runtime https://github.com/intel/compute-runtime/releases/tag/21.21.19914
- Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v1.2.3
- OpenMP Runtime. Consult your Linux distribution documentation for OpenMP
  runtime installation instructions. No specific version is required.

Windows:
- Intel(R) Graphics - BETA Windows(R) 10 DCH Drivers 30.0.100.9667
  https://downloadcenter.intel.com/download/30522/Intel-Graphics-BETA-Windows-10-DCH-Drivers
- Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v1.2.3

Components revisions used in GPU-enabled build:

KhronosGroup/SPIRV-LLVM-Translator@0592c4f
intel/vc-intrinsics@2d0795c
oneapi-src/level-zero@0d30b1f (v1.2.3)
llvm/llvm-project@d28af7c (llvmorg-12.0.0) + patches from llvm_patches folder

=== v1.15.0 === (18 December 2020)

An ISPC release with several improvements for CPU and Beta support of Intel
graphics hardware architectures. The binaries in this release include CPU versions
for Windows, Linux, and macOS, and a GPU-enabled Linux binary, which supports
both CPU and GPU.
CPU binaries are based on patched LLVM 11.0.0, GPU binary is based on patched
LLVM 10.0.1.

CPU changes include:

- New loop unroll pragmas: '#pragma unroll' and '#pragma nounroll' directives
  provide loop unrolling optimization hints to the compiler. These pragmas may be used
  immediately before a loop statement. Currently, this functionality is limited to
  uniform 'for' and 'do-while' loops.
- More efficient 'packed_[load|store]_active()' stdlib functions implementation
  (up to 2.5x faster), which now supports 64-bit types.
- New cpus: 'icelake-server', 'tigerlake', 'alderlake', 'sapphirerapids'.
- Several stability fixes related to SOA types, bool varying type initialization,
  broken alignment information, type scoping.
- Compile time improvements.
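
As a reminder of what the 'packed_store_active()' function mentioned in the list above computes, here is a scalar C
sketch under stated assumptions: the execution mask is modeled as an explicit `bool` array, and the name
`packed_store_active_scalar` is hypothetical. The real stdlib function works on a varying value and the current
execution mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Stores the values of active lanes contiguously into dst and returns how
 * many values were written. Inactive lanes are skipped, so dst must have
 * room for up to `width` elements. */
static int packed_store_active_scalar(int *dst, const int *vals,
                                      const bool *mask, int width) {
    assert(width >= 0);
    int count = 0;
    for (int lane = 0; lane < width; ++lane) {
        if (mask[lane]) {
            dst[count++] = vals[lane];
        }
    }
    return count;
}
```

The returned count is typically used to advance the output pointer before the next store.
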

ISPC support was added to CMake 3.19, so you can now use the standard CMake approach
to find ISPC on the system and use it in your build.
https://cmake.org/cmake/help/latest/release/3.19.html#languages

Using GPU-enabled Linux binary you can build ISPC programs and run them on Intel(R)
Core(tm) Processors with Gen9 graphics (formerly Skylake, Kaby Lake, Coffee Lake) and
Gen12 graphics (TigerLake mobile CPU) using '--target' options ('genx-x8' and
'genx-x16') and '--cpu' option for specifying particular platform (e.g. '--cpu=TGLLP').

Stability and performance were significantly improved in this release. Here is the list
of new features:

- Initial support of ahead of time compilation to oneAPI Level Zero binary format using
  '--emit-zebin' switch. You can use this binary from ISPC Runtime by setting
  ISPCRT_USE_ZEBIN env variable to 1. Please note that SPIR-V format is still a
  recommended and default way.
- Initial function pointers implementation.
- Global atomics support.
- Double math functions support.
- Memory functions support.
- Reworked masking approach. The genx hardware mask is now disabled by default
  and a software mask is used instead.
- Improved address spaces differentiation.
- Initial debug support.
- TGLLP (TigerLake mobile CPU) support ('--cpu=TGLLP').

We also added examples to demonstrate interoperability with oneAPI DPC++ Compiler.
More details about current state of GPU support are available here:
https://ispc.github.io/ispc_for_xe.html

For build instructions check our docker recipe:
https://github.com/ispc/ispc/blob/main/docker/ubuntu/gen/Dockerfile

GPU support is in Beta stage so you may experience some issues but we
strongly encourage you to try it out and give us feedback! You can reach us through
GitHub discussions and issues, ISPC mailing list (ispc-users@googlegroups.com),
or on Twitter (@ispc_updates).

Runtime Dependencies:

- Intel(R) Graphics Compute Runtime
  https://github.com/intel/compute-runtime/releases/tag/20.50.18716
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.0.22
- OpenMP Runtime. Consult your Linux distribution documentation for OpenMP
  runtime installation instructions. No specific version is required.

Components revisions used in GPU-enabled build:

KhronosGroup/SPIRV-LLVM-Translator@ab5e12a
intel/vc-intrinsics@2de2dd4
oneapi-src/level-zero@c6fa2cd (v1.0.22)
llvm/llvm-project@ef32c61 (llvmorg-10.0.1) + patches from llvm_patches folder

=== v1.14.1 === (28 August 2020)

A minor ISPC update with a bug fix for AVX512 detection problem on macOS
(for more details see issue #1854) and update of GPU version to use Level0 v1.0.
CPU binaries are based on patched LLVM 10.0.1.

Runtime Dependencies for GPU-enabled build:
- Intel(R) Graphics Compute Runtime
  https://github.com/intel/compute-runtime/releases/tag/20.33.17675
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.0
- OpenMP Runtime
  Consult your Linux distribution documentation for OpenMP runtime installation
  instructions.

Components revisions used in GPU-enabled build:
KhronosGroup/SPIRV-LLVM-Translator@1a5c52f
intel/vc-intrinsics@f39ff1e
oneapi-src/level-zero@fcc7b7a (v1.0)
llvm/llvm-project@ef32c61 (llvmorg-10.0.1) + patches from llvm_patches folder

=== v1.14.0 === (30 July 2020)

An ISPC release with several improvements for CPU and initial support of Intel
graphics hardware architectures. The binaries in this release include CPU versions
for Windows, Linux, and macOS, as previous releases, plus a GPU-enabled Linux binary,
which supports both CPU and GPU. CPU binaries are based on patched LLVM 10.0.1.

CPU changes include:
- new avx2-i8x32, avx2-i16x16, avx512skx-i8x64, avx512skx-i16x32 targets.
- "generic" targets were removed.
- several stability fixes, including bugs discovered during fuzzing ISPC by YARPGen.
- integer division performance improvements.
- support for __vectorcall calling convention on Windows x64 (enabled by
  '--vectorcall')

Using GPU-enabled Linux binary you can build ISPC programs and run them on Intel(R)
Core(tm) Processors with Gen9 graphics (formerly Skylake, Kaby Lake, Coffee Lake)
using new '--target' options: 'genx-x8' and 'genx-x16'. For code generation ISPC
uses the Vector Compute backend, which is part of 'Intel(R) Graphics Compute Runtime',
through the SPIR-V interface. This release also includes ISPC Runtime based on
'oneAPI Level Zero' for GPU and 'OpenMP Runtime' for CPU, which creates a unified
abstraction for executing ISPC code on CPU and GPU.

More details are available here: https://ispc.github.io/ispc_for_xe.html

For build instructions check our docker recipe:
https://github.com/ispc/ispc/blob/main/docker/ubuntu/gen/Dockerfile

The stability and performance of the GPU part of this release are not mature yet but we
strongly encourage you to try it out and give us feedback! You can reach us through
GitHub issues, ISPC mailing list (ispc-users@googlegroups.com), or on Twitter
(@ispc_updates).

Runtime Dependencies
- Intel(R) Graphics Compute Runtime
  https://github.com/intel/compute-runtime/releases/tag/20.29.17408
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v0.91.21
- OpenMP Runtime
  Consult your Linux distribution documentation for OpenMP runtime installation
  instructions.

Components revisions used in this build:
KhronosGroup/SPIRV-LLVM-Translator@1e661b2
intel/vc-intrinsics.git@a0b66f2
oneapi-src/level-zero@v0.91.21
llvm/llvm-project@llvmorg-10.0.0

=== v1.13.0 === (23 April 2020)

An ISPC update, which graduates cross-compilation support to production and
has multiple code generation improvements and bug fixes. AVX512 targets may
get the biggest performance boost due to the changed internal representation of
masks (we observed up to 5% speedups) and the new switch '--opt=disable-zmm',
which disables using zmm registers in favour of ymm for the avx512skx-i32x16 target.
All targets will definitely benefit from the LLVM 10.0 backend used in this release.

Here is the list of other changes:

- new switch '--support-matrix' was added to display information about supported
  cross-compilation targets, which are managed by '--target-os=<os>',
  '--target=<ispc-target>', and '--arch=<arch>' switches.
- representation of 'bool' type in storage was changed to match C/C++ (i.e. one
  'bool' occupies one byte) for better interoperability.
- type aliases for unsigned types were added: 'uint8', 'uint16', 'uint32',
  'uint64', and 'uint'. To detect if these types are supported you can check if
  ISPC_UINT_IS_DEFINED macro is defined.
- 'extract()'/'insert()' for boolean arguments, and 'abs()' for all integer and
  FP types were added to standard library.
- FreeBSD was added to the list of supported target OSes, but it's not well
  tested.
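
The 'bool' storage change listed above means that a flag array filled on the C side has the same one-byte-per-element
layout that ISPC now uses. The sketch below shows only the C side, with a hypothetical helper `mark_positive`; it
assumes a C compiler where `sizeof(bool)` is 1, which the static assertion checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The interoperability described above relies on one byte per bool. */
_Static_assert(sizeof(bool) == 1, "bool is expected to occupy one byte");

/* Sets flags[i] to true where values[i] is positive and returns the number
 * of flags set. An array filled this way could be handed to ISPC code that
 * expects a bool array, since both sides use one byte per element. */
static size_t mark_positive(const float *values, size_t n, bool *flags) {
    assert(n == 0 || (values != NULL && flags != NULL));
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        flags[i] = values[i] > 0.0f;
        if (flags[i]) {
            ++count;
        }
    }
    return count;
}
```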

Supported platforms in this release are below. Rows are hosts, columns are
targets. x86 and arm are both 32 and 64 bits, where appropriate.

Host    | Windows | Linux    | macOS | Android  | iOS | PS4 | FreeBSD
Windows | x86     | x86, arm | x86   | x86, arm |     | x86 | x86, arm
Linux   |         | x86, arm | x86   | x86, arm |     |     | x86, arm
macOS   |         | x86, arm | x86   | x86, arm | arm |     | x86, arm

=== v1.12.0 === (15 August 2019)

This ISPC update includes experimental cross OS compilation support, ARM and AARCH64
support and a bunch of language features and stability fixes.

Here are the details:

- ISPC is now a cross-OS compiler: you can build ISPC programs for Windows, Linux,
  macOS, iOS, Android and PS4 targets from Windows, Linux and macOS hosts.
- ARM and AARCH64 support has been enabled for ISPC. ARM support currently exists for
  neon-i32x4, neon-i8x16 and neon-i16x8 targets. AARCH64 is supported for neon-i32x4
  as well as for a new "double-pumped" 8-wide target: neon-i32x8.
- A new 128-bit AVX2 target (avx2-i32x4) was added.
- Added a CPU definition for Ice Lake client CPUs (--cpu=icl). Note that there is no
  special target for new instructions in the Ice Lake flavor of AVX512 yet. For now, you
  can use SKX targets (avx512skx-i32x8 and avx512skx-i32x16) with --cpu=icl.
- Removed the generic targets for KNC and KNL, so ISPC does not have KNC support anymore.
  KNL is still supported through native target (avx512knl-i32x16).
- Removed AVX1.1 (IvyBridge) targets (use AVX1 targets instead).
- Introduced new language features:
    - 'noinline' function qualifier.
    - 'rsqrt_fast()' and 'rcp_fast()' functions.
    - Static initialization for varying.
- A new command line option '--emit-llvm-text' was added to dump LLVM IR in text format.

An ISPC top-of-trunk build is now available in the Compiler Explorer (https://godbolt.org)

The release is based on a patched LLVM 8.0.0 backend.

=== v1.11.0 === (19 April 2019)

An ISPC update with a bunch of new features and stability bug fixes based on a
patched LLVM 8.0.0 backend.

Notable new features are:
- A new 256-bit AVX512 target (avx512skx-i32x8).
- Modified -O1 switch to optimize for size.
- "#pragma once" in auto-generated headers.
- Better debugging support with -O0.

Also we resumed support for PS4 build.

To efficiently write ISPC programs you can now use the ISPC plug-in for VSCode:
https://marketplace.visualstudio.com/items?itemName=intel-corporation.ispc

=== v1.10.0 === (18 January 2019)

An ISPC update, which brings several new features, has a bunch of stability and
performance bug fixes, and infrastructure improvements for those who are
interested in participating in hacking on the ISPC trunk. We are also
deprecating KNC support and the KNL-generic target (in favor of the native KNL
target, i.e. avx512knl-i32x16).

We've added:
- a streaming store and load implementation (see "Streaming Load and Store
  Operations" section in documentation)
- support for 64 bit wide types in aos_to_soa/soa_to_aos intrinsics
- an option to specify assembler style (see the --x86-asm-syntax switch
  documentation in the help message)
- a pragma to disable warnings locally (search for "#pragma ignore" in
  documentation)

Our examples include a new SGEMM example which demonstrates different versions
of matrix multiply with various levels of optimality. It is useful for learning
how to start from a naive implementation and then add various optimizations
afterwards. Also, our build system is now based on CMake, as are the examples,
so you can use them as a reference for integrating ISPC into your CMake-based
project.

For those who are interested in hacking ISPC or trying a bleeding edge
development version, we have CI on Linux (Travis-CI) and Windows (Appveyor),
including automatic package builds on Windows. We also have Dockerfiles, which
demonstrate bringing up your environment for ISPC development.

The release is based on a patched LLVM 5.0.2 backend.

=== v1.9.2 === (10 November 2017)

An ISPC update, which brings out-of-the-box debug support on Windows,
better performance of most of the targets and a bunch of stability
and performance bug fixes.

The release is based on a patched LLVM 5.0 backend.

The Windows build now supports only VS2015 and newer. If you are using earlier
versions, the only known problem that you may encounter is with the
"print" ISPC library function.

AVX512 targets are the main beneficiaries of a newer LLVM backend and
demonstrate the biggest performance improvements. SVML support is also
now available on these targets (requires linking with the ICC compiler).

=== v1.9.1 === (8 July 2016)

An ISPC update with a new native AVX512 target for future Xeon CPUs and
improvements for debugging, including a new switch, --dwarf-version, to support
debugging on old systems.

The release is based on patched LLVM 3.8.

=== v1.9.0 === (12 Feb 2016)

An ISPC release with AVX512 (KNL flavor) support and a number of bug fixes,
based on fresh LLVM 3.8 backend.

For AVX512 two modes are supported - generic and native. For instructions on how
to use them, please refer to the wiki. Going forward we assume that native mode
is the primary way to get AVX512 support and that generic mode will be deprecated.
If you observe significantly better performance in generic mode, please report
it via github issues.

Starting with this release we are shipping two versions on Windows:
(1) for VS2013 and earlier releases
(2) for VS2015 and newer releases
The reason for doing this is the redesigned C run-time library in VS.
The implementation of the "print" ISPC standard library function relies on the
C run-time library, which has changed. If you are not using the "print" function
in your code, you are safe to use either version.

A new option was introduced to improve debugging: --no-omit-frame-pointer.

=== v1.8.2 === (29 May 2015)

An ISPC update with several important stability fixes and experimental
AVX512 support.

The current level of AVX512 support targets the new generation of Xeon Phi,
codename Knights Landing. It is implemented in two different ways: as a generic
target and as a native target. The generic target is similar to KNC support,
requires the Intel C/C++ Compiler (15.0 or newer), and is available in the
regular ISPC build, which is based on LLVM 3.6.1. For the native AVX512 target,
we have a separate ISPC build, which is based on LLVM trunk (3.7). This build is
less stable and has several known issues. Nevertheless, if you are interested in
AVX512 support for your code, we encourage you to try it and report bugs. We are
actively working with LLVM maintainers to fix all AVX512 bugs, so your feedback
is important to us and will help ensure that bugs affecting your code are fixed
by the LLVM 3.7 release.

Other notable changes and fixes include:

* Broadwell support via --cpu=broadwell.

* Changed cpu naming to accept cpu codenames. Check help for more details.

* --cpu switch disallowed in multi-target mode.

* Alignment of structure fields (in generated header files) is changed to be
  more consistent regardless of the C/C++ compiler used.

* --dllexport switch is added on Windows to mark non-static functions as DLL
  exports.

* --print-target switch is added to dump details of LLVM target machine.
  This may help you to debug issues with code generation for incorrect target
  (or more likely to ensure that code generation is done right).

* A bug was fixed, which triggered uniform statements to be executed with
  all-off mask under some circumstances.

* The restriction for using some uniform types as return type in multi-target
  mode with targets of different width was relaxed.

Also, if you are using ISPC for code generation for current generation of
Xeon Phi (Knights Corner), the following changes are for you:

* A bunch of stability fixes for KNC.

* A bug that affected projects with multiple ISPC source files compiled with the generic
  target is fixed. As a side effect, you may see multiple warnings about unused static
  functions; add the "-wd177" switch when compiling generic output files with ICC.

The release includes LLVM 3.6.1 binaries for Linux, MacOS, Windows and a Windows based
cross-compiler for Sony PlayStation4. It also includes an experimental LLVM 3.5 based
Linux binary with NVPTX support (now also supporting K80).

Native AVX512 support is available in the set of less stable LLVM 3.7 based binaries
for Linux, MacOS and Windows.

=== v1.8.1 === (31 December 2014)

A minor update of ``ispc`` with several important stability fixes, namely:

* Auto-dispatch mechanism is fixed in pre-built Linux binaries (it used to
  select too conservative target).

* Compile crash with "-O2 -g" is fixed.

Also KNC (Xeon Phi) support is further improved.

The release includes an experimental build for the Sony PlayStation4 target (Windows
cross compiler), as well as experimental NVPTX support (64 bit Linux binaries
only). Note that NVPTX compilation may fail with CUDA 7.0.

Similar to 1.8.0 all binaries are based on LLVM 3.5. MacOS binaries are built
for MacOS 10.9 Mavericks. Linux binaries are compatible with kernel 2.6.32
(ok for RHEL6) and later.

=== v1.8.0 === (16 October 2014)

A major new version of ISPC, which introduces experimental support for NVPTX
target, brings numerous improvements to our KNC (Xeon Phi) support, introduces
debugging support on Windows and fixes several bugs. We also ship experimental
build for Sony PlayStation4 target in this release. Binaries for all platforms
are based on LLVM 3.5.

Note that MacOS binaries are built for MacOS 10.9 Mavericks. Linux binaries are
compatible with kernel 2.6.32 (ok for RHEL6) and later.

More details:

* Experimental NVPTX support is available for users of our binary distribution
  on Linux only at the moment. MacOS and Windows users willing to experiment
  with this target are welcome to build it from source. Note that the GPU imposes
  some limitations on the ISPC language, which are discussed in the corresponding
  section of the ISPC User's Guide. Implementation of NVPTX support was done by
  our contributor Evghenii Gaburov.

* KNC support was greatly extended in knc.h header file. Beyond new features
  there are stability fixes and changes for icc 15.0 compatibility. Stdlib
  prefetch functions were improved to map to KNC vector prefetches.

* The experimental PS4 build is a Windows to PS4 cross compiler, which disables arch
  and cpu selection (these are preset to the PS4 hardware).

* Debug info support on Windows (compatible with VS2010, VS2012 and VS2013).

* Fixed a critical bug that caused code to be generated for an incorrect target,
  despite explicit target switches, under some conditions.

* Fixed a stability bug that caused the print() function to execute under an
  all-off mask under some conditions.

=== v1.7.0 === (18 April 2014)

A major new version of ISPC with several language and library extensions and
fixes in debug info support. Binaries for all platforms are based on a patched
version of LLVM 3.4. There are also performance improvements beyond those from
the switchover to LLVM 3.4.

The list of language and library changes:

* Support for varying types in exported functions was added. See documentation
  for more details.

* get_programCount() function was moved from stdlib.ispc to
  examples/util/util.isph, which needs to be included somewhere in your
  project, if you want to use it.

* Library functions for saturated arithmetic were added. add/sub/mul/div
  operations are supported for signed and unsigned 8/16/32/64 integer types
  (both uniform and varying).

* The algorithm for selecting overloaded function was extended to cover more
  types of overloading. Handling of reference types in overloaded functions was
  fixed. The rules for selecting the best match were changed to match C++,
  which requires the function to be the best match for all parameters. In
  ambiguous cases, a warning is issued, but it will be converted to an error
  in the next release.

* Explicit typecasts between any two reference types were allowed.

* Implicit cast of pointer to const type to void* was disallowed.
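
To illustrate the saturated arithmetic item above, here is a minimal C sketch
of what "saturated" means for 8-bit addition: results clamp to the limits of the
type instead of wrapping around. The names sat_add_i8 and sat_add_u8 are
hypothetical helpers for this sketch, not the ISPC library function names; see
the ISPC documentation for the actual library interface.

```c
#include <stdint.h>

/* Hypothetical helper: signed 8-bit saturating add.
   The sum is computed in a wider type and clamped to [INT8_MIN, INT8_MAX]. */
static int8_t sat_add_i8(int8_t a, int8_t b) {
    int sum = (int)a + (int)b;
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* Hypothetical helper: unsigned 8-bit saturating add, clamped to UINT8_MAX. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + (unsigned)b;
    return sum > UINT8_MAX ? (uint8_t)UINT8_MAX : (uint8_t)sum;
}
```

With ordinary wrapping arithmetic, 200 + 100 in an unsigned 8-bit type would
give 44; the saturating version gives 255.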

The list of other notable changes is:

* A number of fixes for better debug info support.

* A memory corruption bug was fixed, which caused rare, non-reproducible
  compile-time failures.

* Alias analysis was enabled (more aggressive optimizations are expected).

* A bug involving inaccurate handling of "const" qualifier was fixed. As a
  result, more "const" qualifiers may appear in .h files, which may cause
  compilation errors.

=== v1.6.0 === (19 December 2013)

A major new version of ISPC with major improvements in performance and
stability. Linux and MacOS binaries are based on patched version of LLVM 3.3,
while Windows version is based on LLVM 3.4rc3. LLVM 3.4 significantly improves
stability on Win32 platform, so we've decided not to wait for official LLVM 3.4
release.

The list of the most significant changes is:

* New avx1-i32x4 target was added. It may play well for you, if you are focused
  on integer computations or FP unit in your hardware is 128 bit wide.

* Support for calculations in double precision was extended with two new
  targets avx1.1-i64x4 and avx2-i64x4.

* Language support for overloaded operators was added.

* New library shift() function was added, which is similar to rotate(), but is
  non-circular.

* The language was extended to accept 3 dimensional tasking - a syntactic sugar,
  which may facilitate programming of some tasks.

* A regression that broke --opt=force-aligned-memory was fixed.
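
The difference between rotate() and the new shift() from the list above can be
modeled in scalar C. This is only a sketch under stated assumptions: a gang
width of 4, a lane reading the value of the lane "offset" steps away, and
zeros filled in for out-of-range reads. The names rotate_model and shift_model
are hypothetical; consult the ISPC documentation for the exact semantics.

```c
#define GANG 4  /* assumed gang width for this sketch */

/* Circular: lane i reads the value of lane (i + offset), wrapping around. */
static void rotate_model(const int in[GANG], int out[GANG], int offset) {
    for (int i = 0; i < GANG; ++i) {
        int src = ((i + offset) % GANG + GANG) % GANG;
        out[i] = in[src];
    }
}

/* Non-circular: same neighbor lookup, but out-of-range reads yield 0
   (an assumption of this sketch). */
static void shift_model(const int in[GANG], int out[GANG], int offset) {
    for (int i = 0; i < GANG; ++i) {
        int src = i + offset;
        out[i] = (src >= 0 && src < GANG) ? in[src] : 0;
    }
}
```

For the input (1, 2, 3, 4) and offset 1, the rotate model yields (2, 3, 4, 1)
while the shift model yields (2, 3, 4, 0).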

If you are not using pre-built binaries, you may notice the following changes:

* VS2012/VS2013 are supported.

* alloy.py (with -b switch) can build LLVM for you on any platform now
  (except MacOS 10.9, but we know about the problem and are working on it).
  This is a preferred way to build LLVM for ISPC, as all required patches for
  better performance and stability will automatically apply.

* LLVM 3.5 (current trunk) is supported.

There are also multiple fixes for better performance and stability, most
notable are:

* Fixed performance problem for x2 targets.

* Fixed a problem with incorrect vzeroupper insertion on AVX target on Win32.

=== v1.5.0 === (27 September 2013)

A major new version of ISPC with several new targets and important bug fixes.
Here's a list of the most important changes, if you are using pre-built
binaries (which are based on patched version of LLVM 3.3):

* The naming of targets was changed to explicitly include data type width and
  a number of threads in the gang. For example, avx2-i32x8 is avx2 target,
  which uses 32 bit types as a base and has 8 threads in a gang. Old naming
  scheme is still supported, but deprecated.

* New SSE4 targets for calculations based on 8 bit and 16 bit data types:
  sse4-i8x16 and sse4-i16x8.

* New AVX1 target for calculations based on 64 bit data types: avx1-i64x4.

* SVML support was extended and improved.

* Behavior of -g switch was changed to not affect optimization level.

* ISPC debug infrastructure was redesigned. See --help-dev for more info and
  enjoy capabilities of new --debug-phase=<value> and --off-phase=<value>
  switches.

* Fixed an auto-dispatch bug, which caused AVX code execution when OS doesn't
  support AVX (but hardware does).

* Fixed a bug, which discarded uniform/varying keyword in typedefs.

* Several performance regressions were fixed.

If you are building ISPC yourself, then the following changes are also available
to you:

* --cpu=slm for targeting Intel Atom codename Silvermont (if LLVM 3.4 is used).

* ARM NEON targets are available (if enabled in build system).

* --debug-ir=<value> is available to generate debug information based on LLVM
  IR (if LLVM 3.4 is used). In debugger you'll see LLVM IR instead of source
  code.

* A redesigned and improved test and configuration management system is
  available to facilitate the process of building LLVM and testing ISPC
  compiler.

Standard library changes/fixes:

* __pause() function was removed from standard library.

* Fixed reduce_[min|max]_[float|double] intrinsics, which were producing
  incorrect code under some conditions.

Language changes:

* By default a floating point constant without a suffix is a single precision
  constant (32 bit). A new suffix "d" was introduced to allow double precision
  constant (64 bit). Please refer to tests/double-consts.ispc for syntax
  examples.

=== v1.4.4 === (19 July 2013)

A minor version update with several stability fixes requested by the customers.

=== v1.4.3 === (25 June 2013)

A minor version update with several stability improvements:

* Two bugs were fixed (including a bug in LLVM) to improve stability on 32 bit
  platforms.

* A bug affecting several examples was fixed.

* --instrument switch is fixed.

All tests and examples now properly compile and execute on native targets on
Unix platforms (Linux and MacOS).

=== v1.4.2 === (11 June 2013)

A minor version update with a few important changes:

* Stability fix for the AVX2 target (Haswell): the fix for a problem with gather
  instructions was released in LLVM 3.4. If you build with LLVM 3.2 or 3.3, the
  fix is available in our repository (llvm_patches/r183327-AVX2-GATHER.patch)
  and needs to be applied manually.

* Stability fix for widespread issue on Win32 platform (#503).

* Performance improvements for Xeon Phi related to mask representation.

Also LLVM 3.3 has been released and now it's the recommended version for building ISPC.
Precompiled binaries are also built with LLVM 3.3.

=== v1.4.1 === (28 May 2013)

A major new version of ispc has been released with stability and performance
improvements on all supported platforms (Windows, Linux and MacOS).
This version supports LLVM 3.1, 3.2, 3.3 and 3.4. The released binaries are built with 3.2.

New compiler features:

* ISPC memory allocation returns aligned memory with platform natural alignment
  of vector registers by default. Alignment can also be managed via
  --force-alignment=<value>.

Important bug fixes/changes:

* ISPC was fixed to be fully functional when built by GCC 4.7.

* Major cleanup of build and test scripts on Windows.

* Gather/scatter performance improvements on Xeon Phi.

* FMA instructions are enabled for AVX2 instruction set.

* Support of RDRAND instruction when available via library function rdrand (Ivy Bridge).

Release also contains numerous bug fixes and minor improvements.

=== v1.3.0 === (29 June 2012)

This is a major new release of ispc, with support for more compilation
targets and a number of additions to the language.  As usual, the quality
of generated code has also been improved in a number of cases and a number
of small bugs have been fixed.

New targets:

* This release provides "beta" support for compiling to Intel® Xeon
  Phi™ processor, code named Knights Corner, the first processor in
  the Intel® Many Integrated Core Architecture.  See
  http://ispc.github.io/ispc.html#compiling-for-the-intel-xeon-phi-architecture
  for more details on this support.

* This release also has an "avx1.1" target, which provides support for the
  new instructions in the Intel Ivy Bridge microarchitecture.

New language features:

* The foreach_active statement allows iteration over the active program
  instances in a gang.  (See
  http://ispc.github.io/ispc.html#iteration-over-active-program-instances-foreach-active)

* foreach_unique allows iterating over subsets of program instances in a
  gang that share the same value of a variable.  (See
  http://ispc.github.io/ispc.html#iteration-over-unique-elements-foreach-unique)

* An "unmasked" function qualifier and statement in the language allow
  re-activating execution of all program instances in a gang.  (See
  http://ispc.github.io/ispc.html#re-establishing-the-execution-mask)

Standard library updates:

* The seed_rng() function has been modified to take a "varying" seed value
  when a varying RNGState is being initialized.

* An isnan() function has been added, to check for floating-point "not a
  number" values.

* The float_to_srgb8() routine does high performance conversion of
  floating-point color values to SRGB8 format.

Other changes:

* A number of bugfixes have been made for compiler crashes with malformed
  programs.

* Floating-point comparisons are now "unordered", so that any comparison
  where one of the operands is a "not a number" value returns false.  (This
  matches standard IEEE floating-point behavior.)

* The code generated for 'break' statements in "varying" loops has been
  improved for some common cases.

* Compile time and compiler memory use have both been improved,
  particularly for large input programs.

* A number of bugs have been fixed in the debugging information generated
  by the compiler when the "-g" command-line flag is used.

=== v1.2.2 === (20 April 2012)

This release includes a number of small additions to functionality and a
number of bugfixes.  New functionality includes:

* It's now possible to forward declare structures as in C/C++: "struct
  Foo;".  After such a declaration, structs with pointers to "Foo" and
  functions that take pointers or references to Foo structs can be declared
  without the entire definition of Foo being available.

* New built-in types size_t, ptrdiff_t, and [u]intptr_t are now available,
  corresponding to the equivalent types in C.

* The standard library now provides atomic_swap*() and
  atomic_compare_exchange*() functions for void * types.

* The C++ backend has seen a number of improvements to the quality and
  readability of generated code.
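
The forward-declaration item above follows the C model, so a plain C sketch
shows the pattern. The names Node and has_payload are hypothetical and exist
only for this illustration.

```c
struct Foo;  /* forward declaration: no definition of Foo is visible here */

/* A struct may hold a pointer to Foo without the full definition. */
struct Node {
    struct Foo *payload;
    struct Node *next;
};

/* A function may take or inspect Foo pointers, as long as it does not
   dereference them before Foo is defined. */
static int has_payload(const struct Node *node) {
    return node->payload != 0;
}
```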

A number of bugs have been fixed in this release as well.  The most
significant are:

* Fixed a bug where nested loops could cause a compiler crash in some
  circumstances (issues #240 and #229).

* Gathers could access invalid memory (and cause the program to crash) in
  some circumstances (#235).

* References to temporary values are now handled properly when passed to a
  function that takes a reference typed parameter.

* A case where incorrect code could be generated for compile-time-constant
  initializers has been fixed (#234).

=== v1.2.1 === (6 April 2012)

This release contains only minor new functionality and is mostly for many
small bugfixes and improvements to error handling and error reporting.
The new functionality that is present is:

* Significantly more efficient versions of the float / half conversion
  routines are now available in the standard library, thanks to Fabian
  Giesen.

* The last member of a struct can now be a zero-length array; this allows
  the trick of dynamically allocating enough storage for the struct and
  some number of array elements at the end of it.
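
The allocation trick mentioned in the last item above looks roughly like this
in C, which spells the same idea as a C99 flexible array member. This is a
sketch only: FloatBuffer and float_buffer_new are hypothetical names, and the
ispc spelling of the array declaration may differ.

```c
#include <stdlib.h>

/* Hypothetical struct whose last member is a flexible array (C99). */
typedef struct {
    int count;
    float values[];  /* storage follows the struct in the same allocation */
} FloatBuffer;

/* Allocate the header plus room for n trailing elements in one block. */
static FloatBuffer *float_buffer_new(int n) {
    FloatBuffer *buf = malloc(sizeof(FloatBuffer) + (size_t)n * sizeof(float));
    if (buf == NULL) return NULL;
    buf->count = n;
    for (int i = 0; i < n; ++i) buf->values[i] = 0.0f;
    return buf;
}
```

A single free() of the returned pointer releases both the header and the
trailing elements.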

Significant bugs fixed include:

* Issue #205: When a target ISA isn't specified, use the host system's
  capabilities to choose a target for which it will be able to run the
  generated code.

* Issues #215 and #217: Don't allocate storage for global variables that
  are declared "extern".

* Issue #197: Allow NULL as a default argument value in a function
  declaration.

* Issue #223: Fix bugs where taking the address of a function wouldn't work
  as expected.

* Issue #224: When there are overloaded variants of a function that take
  both reference and const reference parameters, give the non-const
  reference preference when matching values of that underlying type.

* Issue #225: An error is issued when a varying lvalue is assigned to a
  reference type (rather than crashing).

* Issue #193: Permit conversions from array types to void *, not just the
  pointer type of the underlying array element.

* Issue #199: Still evaluate expressions that are cast to (void).

The documentation has also been improved, with FAQs added to clarify some
aspects of the ispc pointer model.

=== v1.2.0 === (20 March 2012)

This is a major new release of ispc, with a number of significant
improvements to functionality, performance, and compiler robustness.  It
does, however, include three small changes to language syntax and semantics
that may require changes to existing programs:

* Syntax for the "launch" keyword has been cleaned up; it's now no longer
  necessary to bracket the launched function call with angle brackets.
  (In other words, now use "launch foo();", rather than "launch < foo() >;".)

* When using pointers, the pointed-to data type is now "uniform" by
  default.  Use the varying keyword to specify varying pointed-to types when
  needed.  (i.e. "float *ptr" is a varying pointer to uniform float data,
  whereas previously it was a varying pointer to varying float values.)
  Use "varying float *" to specify a varying pointer to varying float data,
  and so forth.

* The details of "uniform" and "varying" and how they interact with struct
  types have been cleaned up.  Now, when a struct type is declared, if the
  struct elements don't have explicit "uniform" or "varying" qualifiers,
  they are said to have "unbound" variability.  When a struct type is
  instantiated, any unbound variability elements inherit the variability of
  the parent struct type. See http://ispc.github.io/ispc.html#struct-types
  for more details.

ispc has a new language feature that makes it much easier to use the
efficient "(array of) structure of arrays" (AoSoA, or SoA) memory layout of
data.  A new "soa<n>" qualifier can be applied to structure types to
specify an n-wide SoA version of the corresponding type.  Array indexing
and pointer operations with arrays of SoA types automatically handle the
two-stage indexing calculation to access the data.  See
http://ispc.github.io/ispc.html#structure-of-array-types for more details.
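
As a rough illustration of the two-stage indexing that the compiler performs
for SoA types, the following C sketch writes the layout and the index
calculation by hand. It assumes n = 4 and a struct with float x, y, z members;
PointSoA4 and point_x are hypothetical names, and the layout ispc actually
generates may differ in detail.

```c
#define SOA_WIDTH 4  /* assumed n for this sketch */

/* Hand-written equivalent of a 4-wide SoA version of { float x, y, z; }. */
typedef struct {
    float x[SOA_WIDTH];
    float y[SOA_WIDTH];
    float z[SOA_WIDTH];
} PointSoA4;

/* Two-stage indexing: first pick the block, then the slot inside the block. */
static float *point_x(PointSoA4 *blocks, int i) {
    return &blocks[i / SOA_WIDTH].x[i % SOA_WIDTH];
}
```

Logical element 5 therefore lives in block 1, slot 1. With the soa qualifier
this calculation does not need to be written out in the source program.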

For more efficient access of data that is still in "array of structures"
(AoS) format, ispc has a new "memory coalescing" optimization that
automatically detects series of strided loads and/or gathers that can be
transformed into a more efficient set of vector loads and shuffles.  A
diagnostic is emitted when this optimization is successfully applied.

Smaller changes in this release:

* The standard library now provides memcpy(), memmove() and memset()
  functions, as well as single-precision asin() and acos() functions.

* -I can now be specified on the command-line to specify a search path for
  #include files.

* A number of improvements have been made to error reporting from the
  parser, and a number of cases where malformed programs could cause the
  compiler to crash have been fixed.

* A number of small improvements to the quality and performance of generated
  code have been made, including finding more cases where 32-bit addressing
  calculations can be safely done on 64-bit systems and generating better
  code for initializer expressions.

=== v1.1.4 === (4 February 2012)

There are two major bugfixes for Windows in this release.  First, a number
of failures in AVX code generation on Windows have been fixed; AVX on
Windows now has no known issues.  Second, a longstanding bug in parsing 64-bit
integer constants on Windows has been fixed.

This release features a new experimental scalar target, contributed by Gabe
Weisz <gweisz@cs.cmu.edu>.  This target ("--target=generic-1") compiles
gangs of single program instances (i.e. programCount == 1); it can be
useful for debugging ispc programs.

The compiler now supports dynamic memory allocation in ispc programs (with
"new" and "delete" operators based on C++).  See
http://ispc.github.io/ispc.html#dynamic-memory-allocation in the
documentation for more information.

ispc now performs "short circuit" evaluation of the || and && logical
operators and the ? : selection operator.  (This represents the correction
of a major incompatibility with C.)  Code like "(index < arraySize &&
array[index] == 1)" thus now executes as in C, where "array[index]" won't
be evaluated unless "index" is less than "arraySize".
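
The C behavior that ispc now matches can be sketched as follows. The helper
name is_one_at is hypothetical, and the extra "index >= 0" test is an addition
for this sketch so that negative indices are also rejected.

```c
/* Returns 1 if index is in range and array[index] == 1, otherwise 0.
   The right operand of && is evaluated only when the left operand is true,
   so array[index] is never read for an out-of-range index. */
static int is_one_at(const int *array, int arraySize, int index) {
    return (index >= 0 && index < arraySize && array[index] == 1);
}
```

Without short-circuit evaluation, a call with an index past the end of the
array would read invalid memory before the range test could reject it.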

The standard library now provides "local" atomic operations, which are
atomic across the gang of program instances (but not across other gangs or
other hardware threads).  See the updated documentation on atomics for more
information:
http://ispc.github.io/ispc.html#atomic-operations-and-memory-fences.

The standard library now offers a clock() function, which returns a uniform
int64 value that counts processor cycles; it can be used for
fine-resolution timing measurements.

Finally (of limited interest now): ispc now supports the forthcoming AVX2
instruction set, due with Haswell-generation CPUs.  All tests and examples
compile and execute correctly with AVX2.  (Thanks specifically to Craig
Topper and Nadav Rotem for work on AVX2 support in LLVM, which made this
possible.)

=== v1.1.3 === (20 January 2012)

With this release, the language now supports "switch" statements, with the
same semantics and syntax as in C.

This release includes fixes for two important performance related issues:
the quality of code generated for "foreach" statements has been
substantially improved (https://github.com/ispc/ispc/issues/151), and a
performance regression with code for "gathers" that was introduced in
v1.1.2 has been fixed in this release.

A number of other small bugs were fixed in this release as well, including
one where invalid memory would sometimes be incorrectly accessed
(https://github.com/ispc/ispc/issues/160).

Thanks to Jean-Luc Duprat for a number of patches that improve support for
building on various platforms, and to Pierre-Antoine Lacaze for patches so
that ispc builds under MinGW.

=== v1.1.2 === (9 January 2012)

The major new feature in this release is support for "generic" C++
vectorized output; in other words, ispc can emit C++ code that corresponds
to the vectorized computation that the ispc program represents.  See the
examples/intrinsics directory in the ispc distribution for two example
implementations of the set of functions that must be provided to map the
vector calls generated by ispc to target specific functions.

ispc now has partial support for 'goto' statements; specifically, goto is
allowed if any enclosing control flow statements (if/for/while/do) have
'uniform' test expressions, but not if they have 'varying' tests.

A number of improvements have been made to the code generated for gathers
and scatters--one of them (better matching x86's "free" scale by 2/4/8 for
addressing calculations) improved the performance of the noise example by
14%.

Many small bugs have been fixed in this release as well, including issue
numbers 138, 129, 135, 127, 149, and 142.

=== v1.1.1 === (15 December 2011)

This release doesn't include any significant new functionality, but does
include small improvements in generated code and a number of bug fixes.

The one user-visible language change is that integer constants may be
specified with 'u' and 'l' suffixes, like in C.  For example, "1024llu"
defines the constant with unsigned 64-bit type.
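
Since the suffixes follow C, a short C sketch shows their effect. The constant
names below are hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* "llu" makes the constant an unsigned long long (at least 64 bits wide). */
static const uint64_t kBufferBytes = 1024llu;

/* The suffix on the first operand forces the whole product to be computed
   in 64 bits; as a product of plain 32-bit ints it would overflow. */
static const uint64_t kFourGiB = 4ull * 1024 * 1024 * 1024;
```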

More informative and useful error messages are printed when function
overload resolution fails.

Masking is avoided in additional cases when the mask can be
statically-determined to be all on.

A number of small bugs have been fixed:
- Under some circumstances, incorrect masks were used when assigning a
  value to a reference and when doing gathers/scatters.
- Incorrect code could be generated in some cases when some instances
  returned part way through a function but others continued executing.
- Type checking wasn't being performed for calls through function pointers;
  now an error is issued if the arguments don't match up, etc.
- Incorrect code was being generated for gather/scatter to structs that had
  elements with varying short-vector types.
- Typechecking wasn't being performed for "foreach" statements; this led to
  problems like function overload resolution not being performed if an
  overloaded function call was used to determine the iteration range.
- A number of symbols would be multiply-defined when compiling to multiple
  targets and using the sse2-x2 target as one of them (issue #131).

=== v1.1.0 === (5 December 2011)

This is a major new release of the compiler, with significant additions to
language functionality and capabilities.  It includes a number of small
language syntax changes that will require modification of existing
programs.  These changes should generally be straightforward and all are
steps toward eliminating parts of ispc syntax that are incompatible with
C/C++.  See
http://ispc.github.io/ispc.html#updating-ispc-programs-for-changes-in-ispc-1-1
for more information about these changes.

ispc now fully supports pointers, including pointer arithmetic, implicit
conversions of arrays to pointers, and all of the other capabilities of
pointers in C.  See http://ispc.github.io/ispc.html#pointer-types for more
information about pointers in ispc and
http://ispc.github.io/ispc.html#function-pointer-types for information
about function pointers in ispc.

Reference types are now declared with C++ syntax (e.g. "const float &foo").

ispc now supports 64-bit addressing.  For performance reasons, this
capability is disabled by default (even on 64-bit targets), but can be
enabled with a command-line flag:
http://ispc.github.io/ispc.html#selecting-32-or-64-bit-addressing.

This release features new parallel "foreach" statements, which make it
easier in many instances to map program instances to data for data-parallel
computation than the programIndex/programCount mechanism:
http://ispc.github.io/ispc.html#parallel-iteration-statements-foreach-and-foreach-tiled.

Finally, all of the system's documentation has been significantly revised.
The documentation of ispc's parallel execution model has been rewritten:
http://ispc.github.io/ispc.html#the-ispc-parallel-execution-model, and
there is now a more specific discussion of similarities and differences
between ispc and C/C++:
http://ispc.github.io/ispc.html#relationship-to-the-c-programming-language.
There is now a separate FAQ (http://ispc.github.io/faq.html), and a
Performance Guide (http://ispc.github.io/perfguide.html).

=== v1.0.12 === (20 October 2011)

This release includes a new "double-pumped" 8-wide target for SSE2,
"sse2-x2".  Like the sse4-x2 and avx-x2 targets, this target may deliver
higher performance for some workloads than the regular sse2 target.  (For
other workloads, it may be slower.)

The ispc language now includes an "assert()" statement.  See
http://ispc.github.io/ispc.html#assertions for more information.

The compiler now sets a preprocessor #define based on the target ISA; for
example, ISPC_TARGET_SSE4 is defined for the sse4 targets, and so forth.

The standard library now provides high-performance routines for converting
between some "array of structures" and "structure of arrays" formats.
See
http://ispc.github.io/ispc.html#converting-between-array-of-structures-and-structure-of-arrays-layout
for more information.
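
What such a layout conversion does can be modeled in scalar C for interleaved
xyz triples. This is a sketch only: aos_to_soa3_model and soa_to_aos3_model are
hypothetical names chosen for this illustration, and the standard library
routines operate on gang-wide vectors rather than on one element at a time.

```c
/* Split n interleaved xyz triples (AoS) into three separate arrays (SoA). */
static void aos_to_soa3_model(const float *aos, int n,
                              float *x, float *y, float *z) {
    for (int i = 0; i < n; ++i) {
        x[i] = aos[3 * i + 0];
        y[i] = aos[3 * i + 1];
        z[i] = aos[3 * i + 2];
    }
}

/* Inverse: interleave three arrays back into xyz triples. */
static void soa_to_aos3_model(const float *x, const float *y, const float *z,
                              int n, float *aos) {
    for (int i = 0; i < n; ++i) {
        aos[3 * i + 0] = x[i];
        aos[3 * i + 1] = y[i];
        aos[3 * i + 2] = z[i];
    }
}
```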

Inline functions now have static linkage.

A number of improvements have been made to the optimization passes that
detect when gathers and scatters can be transformed into vector loads and
stores, respectively.  In particular, these passes now handle variables that
are used as loop induction variables much better.

=== v1.0.11 === (6 October 2011)

The main new feature in this release is support for generating code for
multiple targets (e.g., SSE2, SSE4, and AVX) and having the compiled code
select the best variant at execution time.  For more information, see
http://ispc.github.io/ispc.html#compiling-with-support-for-multiple-instruction-sets.

All of the examples now take advantage of the support for multiple
compilation targets; thus, if one has an AVX system, it's not necessary to
recompile the examples to use the AVX target.

Performance of the built-in task system that is used in the examples has
been improved.

Finally, the print() statement now works on OSX; it had been broken for the
last few releases.

=== v1.0.10 === (30 September 2011)

This release features an extensive new example showing the application of
ispc to a deferred shading algorithm for scenes with thousands of lights
(examples/deferred).  This is an implementation of the algorithm that Johan
Andersson described at SIGGRAPH 2009; it was implemented by Andrew
Lauritzen and Jefferson Montgomery.  The basic idea is that a pre-rendered
G-buffer is partitioned into tiles, and in each tile, the set of lights
that contribute to the tile is computed.  The pixels in the tile are
then shaded using those light sources.  (See slides 19-29 of
http://s09.idav.ucdavis.edu/talks/04-JAndersson-ParallelFrostbite-Siggraph09.pdf
for more details on the algorithm.)

The mechanism for launching tasks from ispc code has been generalized to
allow multiple tasks to be launched with a single launch call (see
http://ispc.github.io/ispc.html#task-parallelism-language-syntax for more
information.)

A few new functions have been added to the standard library: num_cores()
returns the number of cores in the system's CPU, and variants of all of the
atomic operators that take 'uniform' values as parameters have been added.

=== v1.0.9 === (26 September 2011)

The binary release of v1.0.9 is the first that supports AVX code
generation.  Two targets are provided: "avx", which runs with a
programCount of 8, and "avx-x2" which runs 16 program instances
simultaneously.  (This binary is also built using the in-progress LLVM 3.0
development libraries, while previous ones have been built with the
released 2.9 version of LLVM.)

This release has no other significant changes beyond a number of small
bugfixes (https://github.com/ispc/ispc/issues/100,
https://github.com/ispc/ispc/issues/101, https://github.com/ispc/ispc/issues/103.)

=== v1.0.8 === (19 September 2011)

A number of improvements have been made to handling of 'if' statements in
the language:
  - A bug was fixed where invalid memory could be incorrectly accessed even
    if none of the running program instances wanted to execute the
    corresponding instructions (https://github.com/ispc/ispc/issues/74).
  - The code generated for 'if' statements is a bit simpler and thus more
    efficient.

There is now a '--pic' command-line argument that causes
position-independent code to be generated (Linux and OSX only).

A number of additional performance improvements:
  - Loops are now unrolled by default; the --opt=disable-loop-unroll
    command-line argument can be used to disable this behavior.
    (https://github.com/ispc/ispc/issues/78)
  - A few more cases where gathers/scatters could be determined at compile
    time to actually access contiguous locations have been added.
    (https://github.com/ispc/ispc/issues/79)

Finally, warnings are now issued (if possible) when it can be determined
at compile-time that an out-of-bounds array index is being used.
(https://github.com/ispc/ispc/issues/98).


=== v1.0.7 === (3 September 2011)

The various atomic_*_global() standard library functions are generally
substantially more efficient.  They all previously issued one hardware
atomic instruction for each running program instance but now locally
compute a reduction over the operands and issue a single hardware atomic,
giving the same effect and results in the end (issue #57).
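
The difference between the two strategies can be sketched in C11 with
<stdatomic.h>.  This is a scalar model under stated assumptions, not the
library's implementation: LANES, the explicit "active" mask, and both
function names are hypothetical, and only the final value in memory is
modeled, not the per-instance return values.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define LANES 8 /* assumption: stands in for programCount */

/* Old strategy: one hardware atomic per running program instance.
 * Returns the number of atomic operations issued. */
int atomic_add_per_lane(atomic_int *target, const int values[LANES],
                        const bool active[LANES]) {
    int issued = 0;
    for (int lane = 0; lane < LANES; ++lane) {
        if (active[lane]) {
            atomic_fetch_add(target, values[lane]);
            ++issued;
        }
    }
    return issued;
}

/* New strategy: reduce the operands locally, then issue one atomic.
 * Returns the number of atomic operations issued (0 or 1). */
int atomic_add_reduced(atomic_int *target, const int values[LANES],
                       const bool active[LANES]) {
    int partial = 0;
    bool any_active = false;
    for (int lane = 0; lane < LANES; ++lane) {
        if (active[lane]) {
            partial += values[lane];
            any_active = true;
        }
    }
    if (!any_active) {
        return 0; /* nothing to add, so no atomic is needed */
    }
    atomic_fetch_add(target, partial);
    return 1;
}
```

Both functions leave the same value in the target; the second touches
shared memory at most once no matter how many lanes are running.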

CPU/ISA target handling has been substantially improved.  If no CPU is
specified, the host CPU type is used, not just a default of "nehalem".  A
number of bugs were fixed to ensure that LLVM doesn't generate instructions
beyond SSE2 when using the SSE2 target (fixes issue #82).

Right shifts of unsigned integer types now use a logical shift right
instruction, not an arithmetic shift right (fixes issue #88).
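
The two kinds of shift differ only when the top bit is set.  The C sketch
below spells out both on 32-bit patterns; the function names are
hypothetical, shift counts are assumed to be in the range 0 to 31, and the
arithmetic variant is written out bit by bit so that it does not depend on
how a particular C compiler shifts negative signed values.

```c
#include <stdint.h>

/* Logical shift right: vacated high bits are filled with zeros. */
uint32_t logical_shift_right(uint32_t x, unsigned n) {
    return x >> n;
}

/* Arithmetic shift right on the same bit pattern: vacated high bits
 * are filled with copies of the original top (sign) bit. */
uint32_t arithmetic_shift_right_bits(uint32_t x, unsigned n) {
    uint32_t shifted = x >> n;
    if (x & 0x80000000u) {
        shifted |= ~(0xFFFFFFFFu >> n); /* set the n vacated high bits */
    }
    return shifted;
}
```

For the pattern 0x80000000 shifted by 4, the logical result is 0x08000000,
which is what an unsigned value needs, while the arithmetic result is
0xF8000000.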

When emitting header files, 'extern' declarations of globals used in ispc
code are now outside of the ispc namespace.  Fixes issue #64.

The stencil example has been modified to do runs with and without
parallelism.

Many other small bugfixes and improvements.

=== v1.0.6 === (17 August 2011)

Some additional cross-program instance operations have been added to the
standard library.  reduce_equal() checks to see if the given value is the
same across all running program instances, and exclusive_scan_{add,and,or}()
computes a scan over the given value in the running program instances.
See the documentation of these new routines for more information:
http://ispc.github.io/ispc.html#cross-program-instance-operations.
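
As a sketch of what these operations compute, the scalar C model below
treats a gang as a small array plus a mask of running lanes.  The names
ending in _model, LANES, and the explicit mask are all hypothetical; the
treatment of inactive lanes (skipped, with their outputs left untouched)
and of an empty mask are assumptions of this sketch, so the linked
documentation remains the reference for the actual behavior.

```c
#include <stdbool.h>

#define LANES 4 /* assumption: stands in for programCount */

/* True if every running lane holds the same value.  Returning true
 * when no lane is running is an assumption of this model. */
bool reduce_equal_model(const int v[LANES], const bool active[LANES]) {
    bool seen = false;
    int first = 0;
    for (int lane = 0; lane < LANES; ++lane) {
        if (!active[lane]) continue;
        if (!seen) {
            first = v[lane];
            seen = true;
        } else if (v[lane] != first) {
            return false;
        }
    }
    return true;
}

/* Exclusive scan with addition: each running lane receives the sum of
 * the values held by the running lanes before it. */
void exclusive_scan_add_model(const int v[LANES], const bool active[LANES],
                              int out[LANES]) {
    int running_sum = 0;
    for (int lane = 0; lane < LANES; ++lane) {
        if (!active[lane]) continue;
        out[lane] = running_sum; /* excludes this lane's own value */
        running_sum += v[lane];
    }
}
```

With all four lanes running and values 1, 2, 3, 4, the scan yields
0, 1, 3, 6.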

The simple task system implementations used in the examples have been
improved.  The Windows version no longer has a hard limit on the number of
tasks that can be launched, and all versions have less dynamic memory
allocation and less locking.  More of the examples now have paths that also
measure performance using tasks along with SPMD vectorization.

Two new examples have been added: one that shows the implementation of a
ray-marching volume rendering algorithm, and one that shows a 3D stencil
computation, as might be done for PDE solutions.

Standard library routines to issue prefetches have been added.  See the
documentation for more details: http://ispc.github.io/ispc.html#prefetches.

Fast versions of the float to half-precision float conversion routines have
been added.  For more details, see:
http://ispc.github.io/ispc.html#conversions-to-and-from-half-precision-floats.

There is the usual set of small bug fixes.  Notably, a number of details
related to handling 32 versus 64 bit targets have been fixed, which in turn
has fixed a bug related to tasks having incorrect values for pointers
passed to them.

=== v1.0.5 === (1 August 2011)

Multi-element vector swizzles are supported; for example, given a 3-wide
vector "foo", then expressions like "foo.zyx" and "foo.yz" can be used to
construct other short vectors.  See
http://ispc.github.io/ispc.html#short-vector-types
for more details.  (Thanks to Pete Couperus for implementing this code!).

int8 and int16 datatypes are now supported.  It is still generally more
efficient to use int32 for intermediate computations, even if the in-memory
format is int8 or int16.

There are now standard library routines to convert to and from 'half'-format
floating-point values (half_to_float() and float_to_half()).

There is a new example with an implementation of Perlin's Noise function
(examples/noise).  It shows a speedup of approximately 4.2x versus a C
implementation on OSX and a 2.9x speedup versus C on Windows.

=== v1.0.4 === (18 July 2011)

enums are now supported in ispc; see the section on enumeration types in
the documentation (http://ispc.github.io/ispc.html#enumeration-types) for
more information.

bools are converted to integers with zero extension, not sign extension as
before (i.e. a 'true' bool converts to the value one, not 'all bits on'.)
For cases where sign extension is still desired, there is a
sign_extend(bool) function in the standard library.
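
The two conversions can be written out in C as follows.  The function
names are hypothetical and the code only models the resulting integer
values; it is not the standard library's implementation of sign_extend().

```c
#include <stdbool.h>
#include <stdint.h>

/* Current behavior: zero extension, so 'true' becomes 1. */
int32_t bool_to_int_zero_extend(bool b) {
    return b ? 1 : 0;
}

/* Previous behavior: sign extension, so 'true' becomes "all bits on",
 * which is -1 when read as a two's-complement int32. */
int32_t bool_to_int_sign_extend(bool b) {
    return b ? -1 : 0;
}
```

The sign-extended form is convenient as a bit mask (for example, to AND
with another value), which is one reason to keep it available.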

Support for 64-bit types in the standard library is much more complete than
before.

64-bit integer constants are now supported by the parser.

Storage for parameters to tasks is now allocated dynamically on Windows,
rather than on the stack; with this fix, all tests now run correctly on
Windows.

There is now support for atomic swap and compare/exchange with float and
double types.

A number of additional small bugs have been fixed and a number of cases
where the compiler would crash given a malformed program have been fixed.

=== v1.0.3 === (4 July 2011)

ispc now has a built-in pre-processor (from LLVM's clang compiler).
(Thanks to Pete Couperus for this patch!)  It is therefore no longer
necessary to use cl.exe for preprocessing on Windows; the MSVC project
files for the examples have been updated accordingly.

There is another variant of the shuffle() function in the standard
library: "<type> shuffle(<type> v0, <type> v1, int permute)", where the
permutation vector indexes over the concatenation of the two vectors
(e.g. the value 0 corresponds to the first element of v0, the value
2*programCount-1 corresponds to the last element of v1, etc.)
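
The indexing rule can be modeled in scalar C as shown below.  This is a
sketch of the semantics described above, not the library's implementation;
LANES and shuffle2_model() are hypothetical names, and every permutation
index is assumed to lie between 0 and 2*LANES-1.

```c
#define LANES 4 /* assumption: stands in for programCount */

/* Each lane picks one element from the concatenation of v0 and v1:
 * indices 0..LANES-1 select from v0, LANES..2*LANES-1 from v1. */
void shuffle2_model(const int v0[LANES], const int v1[LANES],
                    const int permute[LANES], int out[LANES]) {
    for (int lane = 0; lane < LANES; ++lane) {
        int idx = permute[lane];
        out[lane] = (idx < LANES) ? v0[idx] : v1[idx - LANES];
    }
}
```

For example, with four lanes the permutation 0, 7, 4, 3 selects the first
element of v0, the last element of v1, the first element of v1, and the
last element of v0.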

ispc now supports the usual range of atomic operations (add, subtract, min,
max, and, or, and xor) as well as atomic swap and atomic compare and
exchange.  There is also a facility for inserting memory fences.  See the
"Atomic Operations and Memory Fences" section of the user's guide
(http://ispc.github.io/ispc.html#atomic-operations-and-memory-fences) for
more information.

There are now both 'signed' and 'unsigned' variants of the standard library
functions like packed_load_active() that take references to arrays of
signed int32s and unsigned int32s respectively.  (The
{load_from,store_to}_{int8,int16}() functions have similarly been augmented
to have both 'signed' and 'unsigned' variants.)

In initializer expressions with variable declarations, it is no longer
legal to initialize arrays and structs with single scalar values that then
initialize their members; they now must be initialized with initializer
lists in braces (or initialized after the declaration with a loop over
array elements, etc.)

=== v1.0.2 === (1 July 2011)

Floating-point hexadecimal constants are now parsed correctly on Windows
(fixes issue #16).

SSE2 is now the default target if --cpu=atom is given in the command line
arguments and another target isn't explicitly specified.

The standard library now provides broadcast(), rotate(), and shuffle()
routines for efficient communication between program instances.

The MSVC solution files to build the examples on Windows now use
/fp:fast when building.

=== v1.0.1 === (24 June 2011)

ispc no longer requires that pointers to memory that are passed in to ispc
have alignment equal to the target's vector width; now alignment just has to
be the regular element alignment (e.g. 4 bytes for floats, etc.)  This
change also fixed a number of cases where it previously incorrectly
generated aligned load/store instructions in cases where the address wasn't
actually aligned (even if the base address passed into ispc code was).

=== v1.0 === (21 June 2011)

Initial Release