File: CHANGELOG

package info
blis 2.0-1
  • area: main
  • in suites: sid
  • size: 41,904 kB
  • sloc: ansic: 351,996; fortran: 21,831; cpp: 10,947; sh: 9,392; makefile: 1,921; asm: 1,516; python: 695
file content (30999 lines) | stat: -rw-r--r-- 1,319,569 bytes
6947
6948
6949
6950
6951
6952
6953
6954
6955
6956
6957
6958
6959
6960
6961
6962
6963
6964
6965
6966
6967
6968
6969
6970
6971
6972
6973
6974
6975
6976
6977
6978
6979
6980
6981
6982
6983
6984
6985
6986
6987
6988
6989
6990
6991
6992
6993
6994
6995
6996
6997
6998
6999
7000
7001
7002
7003
7004
7005
7006
7007
7008
7009
7010
7011
7012
7013
7014
7015
7016
7017
7018
7019
7020
7021
7022
7023
7024
7025
7026
7027
7028
7029
7030
7031
7032
7033
7034
7035
7036
7037
7038
7039
7040
7041
7042
7043
7044
7045
7046
7047
7048
7049
7050
7051
7052
7053
7054
7055
7056
7057
7058
7059
7060
7061
7062
7063
7064
7065
7066
7067
7068
7069
7070
7071
7072
7073
7074
7075
7076
7077
7078
7079
7080
7081
7082
7083
7084
7085
7086
7087
7088
7089
7090
7091
7092
7093
7094
7095
7096
7097
7098
7099
7100
7101
7102
7103
7104
7105
7106
7107
7108
7109
7110
7111
7112
7113
7114
7115
7116
7117
7118
7119
7120
7121
7122
7123
7124
7125
7126
7127
7128
7129
7130
7131
7132
7133
7134
7135
7136
7137
7138
7139
7140
7141
7142
7143
7144
7145
7146
7147
7148
7149
7150
7151
7152
7153
7154
7155
7156
7157
7158
7159
7160
7161
7162
7163
7164
7165
7166
7167
7168
7169
7170
7171
7172
7173
7174
7175
7176
7177
7178
7179
7180
7181
7182
7183
7184
7185
7186
7187
7188
7189
7190
7191
7192
7193
7194
7195
7196
7197
7198
7199
7200
7201
7202
7203
7204
7205
7206
7207
7208
7209
7210
7211
7212
7213
7214
7215
7216
7217
7218
7219
7220
7221
7222
7223
7224
7225
7226
7227
7228
7229
7230
7231
7232
7233
7234
7235
7236
7237
7238
7239
7240
7241
7242
7243
7244
7245
7246
7247
7248
7249
7250
7251
7252
7253
7254
7255
7256
7257
7258
7259
7260
7261
7262
7263
7264
7265
7266
7267
7268
7269
7270
7271
7272
7273
7274
7275
7276
7277
7278
7279
7280
7281
7282
7283
7284
7285
7286
7287
7288
7289
7290
7291
7292
7293
7294
7295
7296
7297
7298
7299
7300
7301
7302
7303
7304
7305
7306
7307
7308
7309
7310
7311
7312
7313
7314
7315
7316
7317
7318
7319
7320
7321
7322
7323
7324
7325
7326
7327
7328
7329
7330
7331
7332
7333
7334
7335
7336
7337
7338
7339
7340
7341
7342
7343
7344
7345
7346
7347
7348
7349
7350
7351
7352
7353
7354
7355
7356
7357
7358
7359
7360
7361
7362
7363
7364
7365
7366
7367
7368
7369
7370
7371
7372
7373
7374
7375
7376
7377
7378
7379
7380
7381
7382
7383
7384
7385
7386
7387
7388
7389
7390
7391
7392
7393
7394
7395
7396
7397
7398
7399
7400
7401
7402
7403
7404
7405
7406
7407
7408
7409
7410
7411
7412
7413
7414
7415
7416
7417
7418
7419
7420
7421
7422
7423
7424
7425
7426
7427
7428
7429
7430
7431
7432
7433
7434
7435
7436
7437
7438
7439
7440
7441
7442
7443
7444
7445
7446
7447
7448
7449
7450
7451
7452
7453
7454
7455
7456
7457
7458
7459
7460
7461
7462
7463
7464
7465
7466
7467
7468
7469
7470
7471
7472
7473
7474
7475
7476
7477
7478
7479
7480
7481
7482
7483
7484
7485
7486
7487
7488
7489
7490
7491
7492
7493
7494
7495
7496
7497
7498
7499
7500
7501
7502
7503
7504
7505
7506
7507
7508
7509
7510
7511
7512
7513
7514
7515
7516
7517
7518
7519
7520
7521
7522
7523
7524
7525
7526
7527
7528
7529
7530
7531
7532
7533
7534
7535
7536
7537
7538
7539
7540
7541
7542
7543
7544
7545
7546
7547
7548
7549
7550
7551
7552
7553
7554
7555
7556
7557
7558
7559
7560
7561
7562
7563
7564
7565
7566
7567
7568
7569
7570
7571
7572
7573
7574
7575
7576
7577
7578
7579
7580
7581
7582
7583
7584
7585
7586
7587
7588
7589
7590
7591
7592
7593
7594
7595
7596
7597
7598
7599
7600
7601
7602
7603
7604
7605
7606
7607
7608
7609
7610
7611
7612
7613
7614
7615
7616
7617
7618
7619
7620
7621
7622
7623
7624
7625
7626
7627
7628
7629
7630
7631
7632
7633
7634
7635
7636
7637
7638
7639
7640
7641
7642
7643
7644
7645
7646
7647
7648
7649
7650
7651
7652
7653
7654
7655
7656
7657
7658
7659
7660
7661
7662
7663
7664
7665
7666
7667
7668
7669
7670
7671
7672
7673
7674
7675
7676
7677
7678
7679
7680
7681
7682
7683
7684
7685
7686
7687
7688
7689
7690
7691
7692
7693
7694
7695
7696
7697
7698
7699
7700
7701
7702
7703
7704
7705
7706
7707
7708
7709
7710
7711
7712
7713
7714
7715
7716
7717
7718
7719
7720
7721
7722
7723
7724
7725
7726
7727
7728
7729
7730
7731
7732
7733
7734
7735
7736
7737
7738
7739
7740
7741
7742
7743
7744
7745
7746
7747
7748
7749
7750
7751
7752
7753
7754
7755
7756
7757
7758
7759
7760
7761
7762
7763
7764
7765
7766
7767
7768
7769
7770
7771
7772
7773
7774
7775
7776
7777
7778
7779
7780
7781
7782
7783
7784
7785
7786
7787
7788
7789
7790
7791
7792
7793
7794
7795
7796
7797
7798
7799
7800
7801
7802
7803
7804
7805
7806
7807
7808
7809
7810
7811
7812
7813
7814
7815
7816
7817
7818
7819
7820
7821
7822
7823
7824
7825
7826
7827
7828
7829
7830
7831
7832
7833
7834
7835
7836
7837
7838
7839
7840
7841
7842
7843
7844
7845
7846
7847
7848
7849
7850
7851
7852
7853
7854
7855
7856
7857
7858
7859
7860
7861
7862
7863
7864
7865
7866
7867
7868
7869
7870
7871
7872
7873
7874
7875
7876
7877
7878
7879
7880
7881
7882
7883
7884
7885
7886
7887
7888
7889
7890
7891
7892
7893
7894
7895
7896
7897
7898
7899
7900
7901
7902
7903
7904
7905
7906
7907
7908
7909
7910
7911
7912
7913
7914
7915
7916
7917
7918
7919
7920
7921
7922
7923
7924
7925
7926
7927
7928
7929
7930
7931
7932
7933
7934
7935
7936
7937
7938
7939
7940
7941
7942
7943
7944
7945
7946
7947
7948
7949
7950
7951
7952
7953
7954
7955
7956
7957
7958
7959
7960
7961
7962
7963
7964
7965
7966
7967
7968
7969
7970
7971
7972
7973
7974
7975
7976
7977
7978
7979
7980
7981
7982
7983
7984
7985
7986
7987
7988
7989
7990
7991
7992
7993
7994
7995
7996
7997
7998
7999
8000
8001
8002
8003
8004
8005
8006
8007
8008
8009
8010
8011
8012
8013
8014
8015
8016
8017
8018
8019
8020
8021
8022
8023
8024
8025
8026
8027
8028
8029
8030
8031
8032
8033
8034
8035
8036
8037
8038
8039
8040
8041
8042
8043
8044
8045
8046
8047
8048
8049
8050
8051
8052
8053
8054
8055
8056
8057
8058
8059
8060
8061
8062
8063
8064
8065
8066
8067
8068
8069
8070
8071
8072
8073
8074
8075
8076
8077
8078
8079
8080
8081
8082
8083
8084
8085
8086
8087
8088
8089
8090
8091
8092
8093
8094
8095
8096
8097
8098
8099
8100
8101
8102
8103
8104
8105
8106
8107
8108
8109
8110
8111
8112
8113
8114
8115
8116
8117
8118
8119
8120
8121
8122
8123
8124
8125
8126
8127
8128
8129
8130
8131
8132
8133
8134
8135
8136
8137
8138
8139
8140
8141
8142
8143
8144
8145
8146
8147
8148
8149
8150
8151
8152
8153
8154
8155
8156
8157
8158
8159
8160
8161
8162
8163
8164
8165
8166
8167
8168
8169
8170
8171
8172
8173
8174
8175
8176
8177
8178
8179
8180
8181
8182
8183
8184
8185
8186
8187
8188
8189
8190
8191
8192
8193
8194
8195
8196
8197
8198
8199
8200
8201
8202
8203
8204
8205
8206
8207
8208
8209
8210
8211
8212
8213
8214
8215
8216
8217
8218
8219
8220
8221
8222
8223
8224
8225
8226
8227
8228
8229
8230
8231
8232
8233
8234
8235
8236
8237
8238
8239
8240
8241
8242
8243
8244
8245
8246
8247
8248
8249
8250
8251
8252
8253
8254
8255
8256
8257
8258
8259
8260
8261
8262
8263
8264
8265
8266
8267
8268
8269
8270
8271
8272
8273
8274
8275
8276
8277
8278
8279
8280
8281
8282
8283
8284
8285
8286
8287
8288
8289
8290
8291
8292
8293
8294
8295
8296
8297
8298
8299
8300
8301
8302
8303
8304
8305
8306
8307
8308
8309
8310
8311
8312
8313
8314
8315
8316
8317
8318
8319
8320
8321
8322
8323
8324
8325
8326
8327
8328
8329
8330
8331
8332
8333
8334
8335
8336
8337
8338
8339
8340
8341
8342
8343
8344
8345
8346
8347
8348
8349
8350
8351
8352
8353
8354
8355
8356
8357
8358
8359
8360
8361
8362
8363
8364
8365
8366
8367
8368
8369
8370
8371
8372
8373
8374
8375
8376
8377
8378
8379
8380
8381
8382
8383
8384
8385
8386
8387
8388
8389
8390
8391
8392
8393
8394
8395
8396
8397
8398
8399
8400
8401
8402
8403
8404
8405
8406
8407
8408
8409
8410
8411
8412
8413
8414
8415
8416
8417
8418
8419
8420
8421
8422
8423
8424
8425
8426
8427
8428
8429
8430
8431
8432
8433
8434
8435
8436
8437
8438
8439
8440
8441
8442
8443
8444
8445
8446
8447
8448
8449
8450
8451
8452
8453
8454
8455
8456
8457
8458
8459
8460
8461
8462
8463
8464
8465
8466
8467
8468
8469
8470
8471
8472
8473
8474
8475
8476
8477
8478
8479
8480
8481
8482
8483
8484
8485
8486
8487
8488
8489
8490
8491
8492
8493
8494
8495
8496
8497
8498
8499
8500
8501
8502
8503
8504
8505
8506
8507
8508
8509
8510
8511
8512
8513
8514
8515
8516
8517
8518
8519
8520
8521
8522
8523
8524
8525
8526
8527
8528
8529
8530
8531
8532
8533
8534
8535
8536
8537
8538
8539
8540
8541
8542
8543
8544
8545
8546
8547
8548
8549
8550
8551
8552
8553
8554
8555
8556
8557
8558
8559
8560
8561
8562
8563
8564
8565
8566
8567
8568
8569
8570
8571
8572
8573
8574
8575
8576
8577
8578
8579
8580
8581
8582
8583
8584
8585
8586
8587
8588
8589
8590
8591
8592
8593
8594
8595
8596
8597
8598
8599
8600
8601
8602
8603
8604
8605
8606
8607
8608
8609
8610
8611
8612
8613
8614
8615
8616
8617
8618
8619
8620
8621
8622
8623
8624
8625
8626
8627
8628
8629
8630
8631
8632
8633
8634
8635
8636
8637
8638
8639
8640
8641
8642
8643
8644
8645
8646
8647
8648
8649
8650
8651
8652
8653
8654
8655
8656
8657
8658
8659
8660
8661
8662
8663
8664
8665
8666
8667
8668
8669
8670
8671
8672
8673
8674
8675
8676
8677
8678
8679
8680
8681
8682
8683
8684
8685
8686
8687
8688
8689
8690
8691
8692
8693
8694
8695
8696
8697
8698
8699
8700
8701
8702
8703
8704
8705
8706
8707
8708
8709
8710
8711
8712
8713
8714
8715
8716
8717
8718
8719
8720
8721
8722
8723
8724
8725
8726
8727
8728
8729
8730
8731
8732
8733
8734
8735
8736
8737
8738
8739
8740
8741
8742
8743
8744
8745
8746
8747
8748
8749
8750
8751
8752
8753
8754
8755
8756
8757
8758
8759
8760
8761
8762
8763
8764
8765
8766
8767
8768
8769
8770
8771
8772
8773
8774
8775
8776
8777
8778
8779
8780
8781
8782
8783
8784
8785
8786
8787
8788
8789
8790
8791
8792
8793
8794
8795
8796
8797
8798
8799
8800
8801
8802
8803
8804
8805
8806
8807
8808
8809
8810
8811
8812
8813
8814
8815
8816
8817
8818
8819
8820
8821
8822
8823
8824
8825
8826
8827
8828
8829
8830
8831
8832
8833
8834
8835
8836
8837
8838
8839
8840
8841
8842
8843
8844
8845
8846
8847
8848
8849
8850
8851
8852
8853
8854
8855
8856
8857
8858
8859
8860
8861
8862
8863
8864
8865
8866
8867
8868
8869
8870
8871
8872
8873
8874
8875
8876
8877
8878
8879
8880
8881
8882
8883
8884
8885
8886
8887
8888
8889
8890
8891
8892
8893
8894
8895
8896
8897
8898
8899
8900
8901
8902
8903
8904
8905
8906
8907
8908
8909
8910
8911
8912
8913
8914
8915
8916
8917
8918
8919
8920
8921
8922
8923
8924
8925
8926
8927
8928
8929
8930
8931
8932
8933
8934
8935
8936
8937
8938
8939
8940
8941
8942
8943
8944
8945
8946
8947
8948
8949
8950
8951
8952
8953
8954
8955
8956
8957
8958
8959
8960
8961
8962
8963
8964
8965
8966
8967
8968
8969
8970
8971
8972
8973
8974
8975
8976
8977
8978
8979
8980
8981
8982
8983
8984
8985
8986
8987
8988
8989
8990
8991
8992
8993
8994
8995
8996
8997
8998
8999
9000
9001
9002
9003
9004
9005
9006
9007
9008
9009
9010
9011
9012
9013
9014
9015
9016
9017
9018
9019
9020
9021
9022
9023
9024
9025
9026
9027
9028
9029
9030
9031
9032
9033
9034
9035
9036
9037
9038
9039
9040
9041
9042
9043
9044
9045
9046
9047
9048
9049
9050
9051
9052
9053
9054
9055
9056
9057
9058
9059
9060
9061
9062
9063
9064
9065
9066
9067
9068
9069
9070
9071
9072
9073
9074
9075
9076
9077
9078
9079
9080
9081
9082
9083
9084
9085
9086
9087
9088
9089
9090
9091
9092
9093
9094
9095
9096
9097
9098
9099
9100
9101
9102
9103
9104
9105
9106
9107
9108
9109
9110
9111
9112
9113
9114
9115
9116
9117
9118
9119
9120
9121
9122
9123
9124
9125
9126
9127
9128
9129
9130
9131
9132
9133
9134
9135
9136
9137
9138
9139
9140
9141
9142
9143
9144
9145
9146
9147
9148
9149
9150
9151
9152
9153
9154
9155
9156
9157
9158
9159
9160
9161
9162
9163
9164
9165
9166
9167
9168
9169
9170
9171
9172
9173
9174
9175
9176
9177
9178
9179
9180
9181
9182
9183
9184
9185
9186
9187
9188
9189
9190
9191
9192
9193
9194
9195
9196
9197
9198
9199
9200
9201
9202
9203
9204
9205
9206
9207
9208
9209
9210
9211
9212
9213
9214
9215
9216
9217
9218
9219
9220
9221
9222
9223
9224
9225
9226
9227
9228
9229
9230
9231
9232
9233
9234
9235
9236
9237
9238
9239
9240
9241
9242
9243
9244
9245
9246
9247
9248
9249
9250
9251
9252
9253
9254
9255
9256
9257
9258
9259
9260
9261
9262
9263
9264
9265
9266
9267
9268
9269
9270
9271
9272
9273
9274
9275
9276
9277
9278
9279
9280
9281
9282
9283
9284
9285
9286
9287
9288
9289
9290
9291
9292
9293
9294
9295
9296
9297
9298
9299
9300
9301
9302
9303
9304
9305
9306
9307
9308
9309
9310
9311
9312
9313
9314
9315
9316
9317
9318
9319
9320
9321
9322
9323
9324
9325
9326
9327
9328
9329
9330
9331
9332
9333
9334
9335
9336
9337
9338
9339
9340
9341
9342
9343
9344
9345
9346
9347
9348
9349
9350
9351
9352
9353
9354
9355
9356
9357
9358
9359
9360
9361
9362
9363
9364
9365
9366
9367
9368
9369
9370
9371
9372
9373
9374
9375
9376
9377
9378
9379
9380
9381
9382
9383
9384
9385
9386
9387
9388
9389
9390
9391
9392
9393
9394
9395
9396
9397
9398
9399
9400
9401
9402
9403
9404
9405
9406
9407
9408
9409
9410
9411
9412
9413
9414
9415
9416
9417
9418
9419
9420
9421
9422
9423
9424
9425
9426
9427
9428
9429
9430
9431
9432
9433
9434
9435
9436
9437
9438
9439
9440
9441
9442
9443
9444
9445
9446
9447
9448
9449
9450
9451
9452
9453
9454
9455
9456
9457
9458
9459
9460
9461
9462
9463
9464
9465
9466
9467
9468
9469
9470
9471
9472
9473
9474
9475
9476
9477
9478
9479
9480
9481
9482
9483
9484
9485
9486
9487
9488
9489
9490
9491
9492
9493
9494
9495
9496
9497
9498
9499
9500
9501
9502
9503
9504
9505
9506
9507
9508
9509
9510
9511
9512
9513
9514
9515
9516
9517
9518
9519
9520
9521
9522
9523
9524
9525
9526
9527
9528
9529
9530
9531
9532
9533
9534
9535
9536
9537
9538
9539
9540
9541
9542
9543
9544
9545
9546
9547
9548
9549
9550
9551
9552
9553
9554
9555
9556
9557
9558
9559
9560
9561
9562
9563
9564
9565
9566
9567
9568
9569
9570
9571
9572
9573
9574
9575
9576
9577
9578
9579
9580
9581
9582
9583
9584
9585
9586
9587
9588
9589
9590
9591
9592
9593
9594
9595
9596
9597
9598
9599
9600
9601
9602
9603
9604
9605
9606
9607
9608
9609
9610
9611
9612
9613
9614
9615
9616
9617
9618
9619
9620
9621
9622
9623
9624
9625
9626
9627
9628
9629
9630
9631
9632
9633
9634
9635
9636
9637
9638
9639
9640
9641
9642
9643
9644
9645
9646
9647
9648
9649
9650
9651
9652
9653
9654
9655
9656
9657
9658
9659
9660
9661
9662
9663
9664
9665
9666
9667
9668
9669
9670
9671
9672
9673
9674
9675
9676
9677
9678
9679
9680
9681
9682
9683
9684
9685
9686
9687
9688
9689
9690
9691
9692
9693
9694
9695
9696
9697
9698
9699
9700
9701
9702
9703
9704
9705
9706
9707
9708
9709
9710
9711
9712
9713
9714
9715
9716
9717
9718
9719
9720
9721
9722
9723
9724
9725
9726
9727
9728
9729
9730
9731
9732
9733
9734
9735
9736
9737
9738
9739
9740
9741
9742
9743
9744
9745
9746
9747
9748
9749
9750
9751
9752
9753
9754
9755
9756
9757
9758
9759
9760
9761
9762
9763
9764
9765
9766
9767
9768
9769
9770
9771
9772
9773
9774
9775
9776
9777
9778
9779
9780
9781
9782
9783
9784
9785
9786
9787
9788
9789
9790
9791
9792
9793
9794
9795
9796
9797
9798
9799
9800
9801
9802
9803
9804
9805
9806
9807
9808
9809
9810
9811
9812
9813
9814
9815
9816
9817
9818
9819
9820
9821
9822
9823
9824
9825
9826
9827
9828
9829
9830
9831
9832
9833
9834
9835
9836
9837
9838
9839
9840
9841
9842
9843
9844
9845
9846
9847
9848
9849
9850
9851
9852
9853
9854
9855
9856
9857
9858
9859
9860
9861
9862
9863
9864
9865
9866
9867
9868
9869
9870
9871
9872
9873
9874
9875
9876
9877
9878
9879
9880
9881
9882
9883
9884
9885
9886
9887
9888
9889
9890
9891
9892
9893
9894
9895
9896
9897
9898
9899
9900
9901
9902
9903
9904
9905
9906
9907
9908
9909
9910
9911
9912
9913
9914
9915
9916
9917
9918
9919
9920
9921
9922
9923
9924
9925
9926
9927
9928
9929
9930
9931
9932
9933
9934
9935
9936
9937
9938
9939
9940
9941
9942
9943
9944
9945
9946
9947
9948
9949
9950
9951
9952
9953
9954
9955
9956
9957
9958
9959
9960
9961
9962
9963
9964
9965
9966
9967
9968
9969
9970
9971
9972
9973
9974
9975
9976
9977
9978
9979
9980
9981
9982
9983
9984
9985
9986
9987
9988
9989
9990
9991
9992
9993
9994
9995
9996
9997
9998
9999
10000
10001
10002
10003
10004
10005
10006
10007
10008
10009
10010
10011
10012
10013
10014
10015
10016
10017
10018
10019
10020
10021
10022
10023
10024
10025
10026
10027
10028
10029
10030
10031
10032
10033
10034
10035
10036
10037
10038
10039
10040
10041
10042
10043
10044
10045
10046
10047
10048
10049
10050
10051
10052
10053
10054
10055
10056
10057
10058
10059
10060
10061
10062
10063
10064
10065
10066
10067
10068
10069
10070
10071
10072
10073
10074
10075
10076
10077
10078
10079
10080
10081
10082
10083
10084
10085
10086
10087
10088
10089
10090
10091
10092
10093
10094
10095
10096
10097
10098
10099
10100
10101
10102
10103
10104
10105
10106
10107
10108
10109
10110
10111
10112
10113
10114
10115
10116
10117
10118
10119
10120
10121
10122
10123
10124
10125
10126
10127
10128
10129
10130
10131
10132
10133
10134
10135
10136
10137
10138
10139
10140
10141
10142
10143
10144
10145
10146
10147
10148
10149
10150
10151
10152
10153
10154
10155
10156
10157
10158
10159
10160
10161
10162
10163
10164
10165
10166
10167
10168
10169
10170
10171
10172
10173
10174
10175
10176
10177
10178
10179
10180
10181
10182
10183
10184
10185
10186
10187
10188
10189
10190
10191
10192
10193
10194
10195
10196
10197
10198
10199
10200
10201
10202
10203
10204
10205
10206
10207
10208
10209
10210
10211
10212
10213
10214
10215
10216
10217
10218
10219
10220
10221
10222
10223
10224
10225
10226
10227
10228
10229
10230
10231
10232
10233
10234
10235
10236
10237
10238
10239
10240
10241
10242
10243
10244
10245
10246
10247
10248
10249
10250
10251
10252
10253
10254
10255
10256
10257
10258
10259
10260
10261
10262
10263
10264
10265
10266
10267
10268
10269
10270
10271
10272
10273
10274
10275
10276
10277
10278
10279
10280
10281
10282
10283
10284
10285
10286
10287
10288
10289
10290
10291
10292
10293
10294
10295
10296
10297
10298
10299
10300
10301
10302
10303
10304
10305
10306
10307
10308
10309
10310
10311
10312
10313
10314
10315
10316
10317
10318
10319
10320
10321
10322
10323
10324
10325
10326
10327
10328
10329
10330
10331
10332
10333
10334
10335
10336
10337
10338
10339
10340
10341
10342
10343
10344
10345
10346
10347
10348
10349
10350
10351
10352
10353
10354
10355
10356
10357
10358
10359
10360
10361
10362
10363
10364
10365
10366
10367
10368
10369
10370
10371
10372
10373
10374
10375
10376
10377
10378
10379
10380
10381
10382
10383
10384
10385
10386
10387
10388
10389
10390
10391
10392
10393
10394
10395
10396
10397
10398
10399
10400
10401
10402
10403
10404
10405
10406
10407
10408
10409
10410
10411
10412
10413
10414
10415
10416
10417
10418
10419
10420
10421
10422
10423
10424
10425
10426
10427
10428
10429
10430
10431
10432
10433
10434
10435
10436
10437
10438
10439
10440
10441
10442
10443
10444
10445
10446
10447
10448
10449
10450
10451
10452
10453
10454
10455
10456
10457
10458
10459
10460
10461
10462
10463
10464
10465
10466
10467
10468
10469
10470
10471
10472
10473
10474
10475
10476
10477
10478
10479
10480
10481
10482
10483
10484
10485
10486
10487
10488
10489
10490
10491
10492
10493
10494
10495
10496
10497
10498
10499
10500
10501
10502
10503
10504
10505
10506
10507
10508
10509
10510
10511
10512
10513
10514
10515
10516
10517
10518
10519
10520
10521
10522
10523
10524
10525
10526
10527
10528
10529
10530
10531
10532
10533
10534
10535
10536
10537
10538
10539
10540
10541
10542
10543
10544
10545
10546
10547
10548
10549
10550
10551
10552
10553
10554
10555
10556
10557
10558
10559
10560
10561
10562
10563
10564
10565
10566
10567
10568
10569
10570
10571
10572
10573
10574
10575
10576
10577
10578
10579
10580
10581
10582
10583
10584
10585
10586
10587
10588
10589
10590
10591
10592
10593
10594
10595
10596
10597
10598
10599
10600
10601
10602
10603
10604
10605
10606
10607
10608
10609
10610
10611
10612
10613
10614
10615
10616
10617
10618
10619
10620
10621
10622
10623
10624
10625
10626
10627
10628
10629
10630
10631
10632
10633
10634
10635
10636
10637
10638
10639
10640
10641
10642
10643
10644
10645
10646
10647
10648
10649
10650
10651
10652
10653
10654
10655
10656
10657
10658
10659
10660
10661
10662
10663
10664
10665
10666
10667
10668
10669
10670
10671
10672
10673
10674
10675
10676
10677
10678
10679
10680
10681
10682
10683
10684
10685
10686
10687
10688
10689
10690
10691
10692
10693
10694
10695
10696
10697
10698
10699
10700
10701
10702
10703
10704
10705
10706
10707
10708
10709
10710
10711
10712
10713
10714
10715
10716
10717
10718
10719
10720
10721
10722
10723
10724
10725
10726
10727
10728
10729
10730
10731
10732
10733
10734
10735
10736
10737
10738
10739
10740
10741
10742
10743
10744
10745
10746
10747
10748
10749
10750
10751
10752
10753
10754
10755
10756
10757
10758
10759
10760
10761
10762
10763
10764
10765
10766
10767
10768
10769
10770
10771
10772
10773
10774
10775
10776
10777
10778
10779
10780
10781
10782
10783
10784
10785
10786
10787
10788
10789
10790
10791
10792
10793
10794
10795
10796
10797
10798
10799
10800
10801
10802
10803
10804
10805
10806
10807
10808
10809
10810
10811
10812
10813
10814
10815
10816
10817
10818
10819
10820
10821
10822
10823
10824
10825
10826
10827
10828
10829
10830
10831
10832
10833
10834
10835
10836
10837
10838
10839
10840
10841
10842
10843
10844
10845
10846
10847
10848
10849
10850
10851
10852
10853
10854
10855
10856
10857
10858
10859
10860
10861
10862
10863
10864
10865
10866
10867
10868
10869
10870
10871
10872
10873
10874
10875
10876
10877
10878
10879
10880
10881
10882
10883
10884
10885
10886
10887
10888
10889
10890
10891
10892
10893
10894
10895
10896
10897
10898
10899
10900
10901
10902
10903
10904
10905
10906
10907
10908
10909
10910
10911
10912
10913
10914
10915
10916
10917
10918
10919
10920
10921
10922
10923
10924
10925
10926
10927
10928
10929
10930
10931
10932
10933
10934
10935
10936
10937
10938
10939
10940
10941
10942
10943
10944
10945
10946
10947
10948
10949
10950
10951
10952
10953
10954
10955
10956
10957
10958
10959
10960
10961
10962
10963
10964
10965
10966
10967
10968
10969
10970
10971
10972
10973
10974
10975
10976
10977
10978
10979
10980
10981
10982
10983
10984
10985
10986
10987
10988
10989
10990
10991
10992
10993
10994
10995
10996
10997
10998
10999
11000
11001
11002
11003
11004
11005
11006
11007
11008
11009
11010
11011
11012
11013
11014
11015
11016
11017
11018
11019
11020
11021
11022
11023
11024
11025
11026
11027
11028
11029
11030
11031
11032
11033
11034
11035
11036
11037
11038
11039
11040
11041
11042
11043
11044
11045
11046
11047
11048
11049
11050
11051
11052
11053
11054
11055
11056
11057
11058
11059
11060
11061
11062
11063
11064
11065
11066
11067
11068
11069
11070
11071
11072
11073
11074
11075
11076
11077
11078
11079
11080
11081
11082
11083
11084
11085
11086
11087
11088
11089
11090
11091
11092
11093
11094
11095
11096
11097
11098
11099
11100
11101
11102
11103
11104
11105
11106
11107
11108
11109
11110
11111
11112
11113
11114
11115
11116
11117
11118
11119
11120
11121
11122
11123
11124
11125
11126
11127
11128
11129
11130
11131
11132
11133
11134
11135
11136
11137
11138
11139
11140
11141
11142
11143
11144
11145
11146
11147
11148
11149
11150
11151
11152
11153
11154
11155
11156
11157
11158
11159
11160
11161
11162
11163
11164
11165
11166
11167
11168
11169
11170
11171
11172
11173
11174
11175
11176
11177
11178
11179
11180
11181
11182
11183
11184
11185
11186
11187
11188
11189
11190
11191
11192
11193
11194
11195
11196
11197
11198
11199
11200
11201
11202
11203
11204
11205
11206
11207
11208
11209
11210
11211
11212
11213
11214
11215
11216
11217
11218
11219
11220
11221
11222
11223
11224
11225
11226
11227
11228
11229
11230
11231
11232
11233
11234
11235
11236
11237
11238
11239
11240
11241
11242
11243
11244
11245
11246
11247
11248
11249
11250
11251
11252
11253
11254
11255
11256
11257
11258
11259
11260
11261
11262
11263
11264
11265
11266
11267
11268
11269
11270
11271
11272
11273
11274
11275
11276
11277
11278
11279
11280
11281
11282
11283
11284
11285
11286
11287
11288
11289
11290
11291
11292
11293
11294
11295
11296
11297
11298
11299
11300
11301
11302
11303
11304
11305
11306
11307
11308
11309
11310
11311
11312
11313
11314
11315
11316
11317
11318
11319
11320
11321
11322
11323
11324
11325
11326
11327
11328
11329
11330
11331
11332
11333
11334
11335
11336
11337
11338
11339
11340
11341
11342
11343
11344
11345
11346
11347
11348
11349
11350
11351
11352
11353
11354
11355
11356
11357
11358
11359
11360
11361
11362
11363
11364
11365
11366
11367
11368
11369
11370
11371
11372
11373
11374
11375
11376
11377
11378
11379
11380
11381
11382
11383
11384
11385
11386
11387
11388
11389
11390
11391
11392
11393
11394
11395
11396
11397
11398
11399
11400
11401
11402
11403
11404
11405
11406
11407
11408
11409
11410
11411
11412
11413
11414
11415
11416
11417
11418
11419
11420
11421
11422
11423
11424
11425
11426
11427
11428
11429
11430
11431
11432
11433
11434
11435
11436
11437
11438
11439
11440
11441
11442
11443
11444
11445
11446
11447
11448
11449
11450
11451
11452
11453
11454
11455
11456
11457
11458
11459
11460
11461
11462
11463
11464
11465
11466
11467
11468
11469
11470
11471
11472
11473
11474
11475
11476
11477
11478
11479
11480
11481
11482
11483
11484
11485
11486
11487
11488
11489
11490
11491
11492
11493
11494
11495
11496
11497
11498
11499
11500
11501
11502
11503
11504
11505
11506
11507
11508
11509
11510
11511
11512
11513
11514
11515
11516
11517
11518
11519
11520
11521
11522
11523
11524
11525
11526
11527
11528
11529
11530
11531
11532
11533
11534
11535
11536
11537
11538
11539
11540
11541
11542
11543
11544
11545
11546
11547
11548
11549
11550
11551
11552
11553
11554
11555
11556
11557
11558
11559
11560
11561
11562
11563
11564
11565
11566
11567
11568
11569
11570
11571
11572
11573
11574
11575
11576
11577
11578
11579
11580
11581
11582
11583
11584
11585
11586
11587
11588
11589
11590
11591
11592
11593
11594
11595
11596
11597
11598
11599
11600
11601
11602
11603
11604
11605
11606
11607
11608
11609
11610
11611
11612
11613
11614
11615
11616
11617
11618
11619
11620
11621
11622
11623
11624
11625
11626
11627
11628
11629
11630
11631
11632
11633
11634
11635
11636
11637
11638
11639
11640
11641
11642
11643
11644
11645
11646
11647
11648
11649
11650
11651
11652
11653
11654
11655
11656
11657
11658
11659
11660
11661
11662
11663
11664
11665
11666
11667
11668
11669
11670
11671
11672
11673
11674
11675
11676
11677
11678
11679
11680
11681
11682
11683
11684
11685
11686
11687
11688
11689
11690
11691
11692
11693
11694
11695
11696
11697
11698
11699
11700
11701
11702
11703
11704
11705
11706
11707
11708
11709
11710
11711
11712
11713
11714
11715
11716
11717
11718
11719
11720
11721
11722
11723
11724
11725
11726
11727
11728
11729
11730
11731
11732
11733
11734
11735
11736
11737
11738
11739
11740
11741
11742
11743
11744
11745
11746
11747
11748
11749
11750
11751
11752
11753
11754
11755
11756
11757
11758
11759
11760
11761
11762
11763
11764
11765
11766
11767
11768
11769
11770
11771
11772
11773
11774
11775
11776
11777
11778
11779
11780
11781
11782
11783
11784
11785
11786
11787
11788
11789
11790
11791
11792
11793
11794
11795
11796
11797
11798
11799
11800
11801
11802
11803
11804
11805
11806
11807
11808
11809
11810
11811
11812
11813
11814
11815
11816
11817
11818
11819
11820
11821
11822
11823
11824
11825
11826
11827
11828
11829
11830
11831
11832
11833
11834
11835
11836
11837
11838
11839
11840
11841
11842
11843
11844
11845
11846
11847
11848
11849
11850
11851
11852
11853
11854
11855
11856
11857
11858
11859
11860
11861
11862
11863
11864
11865
11866
11867
11868
11869
11870
11871
11872
11873
11874
11875
11876
11877
11878
11879
11880
11881
11882
11883
11884
11885
11886
11887
11888
11889
11890
11891
11892
11893
11894
11895
11896
11897
11898
11899
11900
11901
11902
11903
11904
11905
11906
11907
11908
11909
11910
11911
11912
11913
11914
11915
11916
11917
11918
11919
11920
11921
11922
11923
11924
11925
11926
11927
11928
11929
11930
11931
11932
11933
11934
11935
11936
11937
11938
11939
11940
11941
11942
11943
11944
11945
11946
11947
11948
11949
11950
11951
11952
11953
11954
11955
11956
11957
11958
11959
11960
11961
11962
11963
11964
11965
11966
11967
11968
11969
11970
11971
11972
11973
11974
11975
11976
11977
11978
11979
11980
11981
11982
11983
11984
11985
11986
11987
11988
11989
11990
11991
11992
11993
11994
11995
11996
11997
11998
11999
12000
12001
12002
12003
12004
12005
12006
12007
12008
12009
12010
12011
12012
12013
12014
12015
12016
12017
12018
12019
12020
12021
12022
12023
12024
12025
12026
12027
12028
12029
12030
12031
12032
12033
12034
12035
12036
12037
12038
12039
12040
12041
12042
12043
12044
12045
12046
12047
12048
12049
12050
12051
12052
12053
12054
12055
12056
12057
12058
12059
12060
12061
12062
12063
12064
12065
12066
12067
12068
12069
12070
12071
12072
12073
12074
12075
12076
12077
12078
12079
12080
12081
12082
12083
12084
12085
12086
12087
12088
12089
12090
12091
12092
12093
12094
12095
12096
12097
12098
12099
12100
12101
12102
12103
12104
12105
12106
12107
12108
12109
12110
12111
12112
12113
12114
12115
12116
12117
12118
12119
12120
12121
12122
12123
12124
12125
12126
12127
12128
12129
12130
18133
18134
18135
18136
18137
18138
18139
18140
18141
18142
18143
18144
18145
18146
18147
18148
18149
18150
18151
18152
18153
18154
18155
18156
18157
18158
18159
18160
18161
18162
18163
18164
18165
18166
18167
18168
18169
18170
18171
18172
18173
18174
18175
18176
18177
18178
18179
18180
18181
18182
18183
18184
18185
18186
18187
18188
18189
18190
18191
18192
18193
18194
18195
18196
18197
18198
18199
18200
18201
18202
18203
18204
18205
18206
18207
18208
18209
18210
18211
18212
18213
18214
18215
18216
18217
18218
18219
18220
18221
18222
18223
18224
18225
18226
18227
18228
18229
18230
18231
18232
18233
18234
18235
18236
18237
18238
18239
18240
18241
18242
18243
18244
18245
18246
18247
18248
18249
18250
18251
18252
18253
18254
18255
18256
18257
18258
18259
18260
18261
18262
18263
18264
18265
18266
18267
18268
18269
18270
18271
18272
18273
18274
18275
18276
18277
18278
18279
18280
18281
18282
18283
18284
18285
18286
18287
18288
18289
18290
18291
18292
18293
18294
18295
18296
18297
18298
18299
18300
18301
18302
18303
18304
18305
18306
18307
18308
18309
18310
18311
18312
18313
18314
18315
18316
18317
18318
18319
18320
18321
18322
18323
18324
18325
18326
18327
18328
18329
18330
18331
18332
18333
18334
18335
18336
18337
18338
18339
18340
18341
18342
18343
18344
18345
18346
18347
18348
18349
18350
18351
18352
18353
18354
18355
18356
18357
18358
18359
18360
18361
18362
18363
18364
18365
18366
18367
18368
18369
18370
18371
18372
18373
18374
18375
18376
18377
18378
18379
18380
18381
18382
18383
18384
18385
18386
18387
18388
18389
18390
18391
18392
18393
18394
18395
18396
18397
18398
18399
18400
18401
18402
18403
18404
18405
18406
18407
18408
18409
18410
18411
18412
18413
18414
18415
18416
18417
18418
18419
18420
18421
18422
18423
18424
18425
18426
18427
18428
18429
18430
18431
18432
18433
18434
18435
18436
18437
18438
18439
18440
18441
18442
18443
18444
18445
18446
18447
18448
18449
18450
18451
18452
18453
18454
18455
18456
18457
18458
18459
18460
18461
18462
18463
18464
18465
18466
18467
18468
18469
18470
18471
18472
18473
18474
18475
18476
18477
18478
18479
18480
18481
18482
18483
18484
18485
18486
18487
18488
18489
18490
18491
18492
18493
18494
18495
18496
18497
18498
18499
18500
18501
18502
18503
18504
18505
18506
18507
18508
18509
18510
18511
18512
18513
18514
18515
18516
18517
18518
18519
18520
18521
18522
18523
18524
18525
18526
18527
18528
18529
18530
18531
18532
18533
18534
18535
18536
18537
18538
18539
18540
18541
18542
18543
18544
18545
18546
18547
18548
18549
18550
18551
18552
18553
18554
18555
18556
18557
18558
18559
18560
18561
18562
18563
18564
18565
18566
18567
18568
18569
18570
18571
18572
18573
18574
18575
18576
18577
18578
18579
18580
18581
18582
18583
18584
18585
18586
18587
18588
18589
18590
18591
18592
18593
18594
18595
18596
18597
18598
18599
18600
18601
18602
18603
18604
18605
18606
18607
18608
18609
18610
18611
18612
18613
18614
18615
18616
18617
18618
18619
18620
18621
18622
18623
18624
18625
18626
18627
18628
18629
18630
18631
18632
18633
18634
18635
18636
18637
18638
18639
18640
18641
18642
18643
18644
18645
18646
18647
18648
18649
18650
18651
18652
18653
18654
18655
18656
18657
18658
18659
18660
18661
18662
18663
18664
18665
18666
18667
18668
18669
18670
18671
18672
18673
18674
18675
18676
18677
18678
18679
18680
18681
18682
18683
18684
18685
18686
18687
18688
18689
18690
18691
18692
18693
18694
18695
18696
18697
18698
18699
18700
18701
18702
18703
18704
18705
18706
18707
18708
18709
18710
18711
18712
18713
18714
18715
18716
18717
18718
18719
18720
18721
18722
18723
18724
18725
18726
18727
18728
18729
18730
18731
18732
18733
18734
18735
18736
18737
18738
18739
18740
18741
18742
18743
18744
18745
18746
18747
18748
18749
18750
18751
18752
18753
18754
18755
18756
18757
18758
18759
18760
18761
18762
18763
18764
18765
18766
18767
18768
18769
18770
18771
18772
18773
18774
18775
18776
18777
18778
18779
18780
18781
18782
18783
18784
18785
18786
18787
18788
18789
18790
18791
18792
18793
18794
18795
18796
18797
18798
18799
18800
18801
18802
18803
18804
18805
18806
18807
18808
18809
18810
18811
18812
18813
18814
18815
18816
18817
18818
18819
18820
18821
18822
18823
18824
18825
18826
18827
18828
18829
18830
18831
18832
18833
18834
18835
18836
18837
18838
18839
18840
18841
18842
18843
18844
18845
18846
18847
18848
18849
18850
18851
18852
18853
18854
18855
18856
18857
18858
18859
18860
18861
18862
18863
18864
18865
18866
18867
18868
18869
18870
18871
18872
18873
18874
18875
18876
18877
18878
18879
18880
18881
18882
18883
18884
18885
18886
18887
18888
18889
18890
18891
18892
18893
18894
18895
18896
18897
18898
18899
18900
18901
18902
18903
18904
18905
18906
18907
18908
18909
18910
18911
18912
18913
18914
18915
18916
18917
18918
18919
18920
18921
18922
18923
18924
18925
18926
18927
18928
18929
18930
18931
18932
18933
18934
18935
18936
18937
18938
18939
18940
18941
18942
18943
18944
18945
18946
18947
18948
18949
18950
18951
18952
18953
18954
18955
18956
18957
18958
18959
18960
18961
18962
18963
18964
18965
18966
18967
18968
18969
18970
18971
18972
18973
18974
18975
18976
18977
18978
18979
18980
18981
18982
18983
18984
18985
18986
18987
18988
18989
18990
18991
18992
18993
18994
18995
18996
18997
18998
18999
19000
19001
19002
19003
19004
19005
19006
19007
19008
19009
19010
19011
19012
19013
19014
19015
19016
19017
19018
19019
19020
19021
19022
19023
19024
19025
19026
19027
19028
19029
19030
19031
19032
19033
19034
19035
19036
19037
19038
19039
19040
19041
19042
19043
19044
19045
19046
19047
19048
19049
19050
19051
19052
19053
19054
19055
19056
19057
19058
19059
19060
19061
19062
19063
19064
19065
19066
19067
19068
19069
19070
19071
19072
19073
19074
19075
19076
19077
19078
19079
19080
19081
19082
19083
19084
19085
19086
19087
19088
19089
19090
19091
19092
19093
19094
19095
19096
19097
19098
19099
19100
19101
19102
19103
19104
19105
19106
19107
19108
19109
19110
19111
19112
19113
19114
19115
19116
19117
19118
19119
19120
19121
19122
19123
19124
19125
19126
19127
19128
19129
19130
19131
19132
19133
19134
19135
19136
19137
19138
19139
19140
19141
19142
19143
19144
19145
19146
19147
19148
19149
19150
19151
19152
19153
19154
19155
19156
19157
19158
19159
19160
19161
19162
19163
19164
19165
19166
19167
19168
19169
19170
19171
19172
19173
19174
19175
19176
19177
19178
19179
19180
19181
19182
19183
19184
19185
19186
19187
19188
19189
19190
19191
19192
19193
19194
19195
19196
19197
19198
19199
19200
19201
19202
19203
19204
19205
19206
19207
19208
19209
19210
19211
19212
19213
19214
19215
19216
19217
19218
19219
19220
19221
19222
19223
19224
19225
19226
19227
19228
19229
19230
19231
19232
19233
19234
19235
19236
19237
19238
19239
19240
19241
19242
19243
19244
19245
19246
19247
19248
19249
19250
19251
19252
19253
19254
19255
19256
19257
19258
19259
19260
19261
19262
19263
19264
19265
19266
19267
19268
19269
19270
19271
19272
19273
19274
19275
19276
19277
19278
19279
19280
19281
19282
19283
19284
19285
19286
19287
19288
19289
19290
19291
19292
19293
19294
19295
19296
19297
19298
19299
19300
19301
19302
19303
19304
19305
19306
19307
19308
19309
19310
19311
19312
19313
19314
19315
19316
19317
19318
19319
19320
19321
19322
19323
19324
19325
19326
19327
19328
19329
19330
19331
19332
19333
19334
19335
19336
19337
19338
19339
19340
19341
19342
19343
19344
19345
19346
19347
19348
19349
19350
19351
19352
19353
19354
19355
19356
19357
19358
19359
19360
19361
19362
19363
19364
19365
19366
19367
19368
19369
19370
19371
19372
19373
19374
19375
19376
19377
19378
19379
19380
19381
19382
19383
19384
19385
19386
19387
19388
19389
19390
19391
19392
19393
19394
19395
19396
19397
19398
19399
19400
19401
19402
19403
19404
19405
19406
19407
19408
19409
19410
19411
19412
19413
19414
19415
19416
19417
19418
19419
19420
19421
19422
19423
19424
19425
19426
19427
19428
19429
19430
19431
19432
19433
19434
19435
19436
19437
19438
19439
19440
19441
19442
19443
19444
19445
19446
19447
19448
19449
19450
19451
19452
19453
19454
19455
19456
19457
19458
19459
19460
19461
19462
19463
19464
19465
19466
19467
19468
19469
19470
19471
19472
19473
19474
19475
19476
19477
19478
19479
19480
19481
19482
19483
19484
19485
19486
19487
19488
19489
19490
19491
19492
19493
19494
19495
19496
19497
19498
19499
19500
19501
19502
19503
19504
19505
19506
19507
19508
19509
19510
19511
19512
19513
19514
19515
19516
19517
19518
19519
19520
19521
19522
19523
19524
19525
19526
19527
19528
19529
19530
19531
19532
19533
19534
19535
19536
19537
19538
19539
19540
19541
19542
19543
19544
19545
19546
19547
19548
19549
19550
19551
19552
19553
19554
19555
19556
19557
19558
19559
19560
19561
19562
19563
19564
19565
19566
19567
19568
19569
19570
19571
19572
19573
19574
19575
19576
19577
19578
19579
19580
19581
19582
19583
19584
19585
19586
19587
19588
19589
19590
19591
19592
19593
19594
19595
19596
19597
19598
19599
19600
19601
19602
19603
19604
19605
19606
19607
19608
19609
19610
19611
19612
19613
19614
19615
19616
19617
19618
19619
19620
19621
19622
19623
19624
19625
19626
19627
19628
19629
19630
19631
19632
19633
19634
19635
19636
19637
19638
19639
19640
19641
19642
19643
19644
19645
19646
19647
19648
19649
19650
19651
19652
19653
19654
19655
19656
19657
19658
19659
19660
19661
19662
19663
19664
19665
19666
19667
19668
19669
19670
19671
19672
19673
19674
19675
19676
19677
19678
19679
19680
19681
19682
19683
19684
19685
19686
19687
19688
19689
19690
19691
19692
19693
19694
19695
19696
19697
19698
19699
19700
19701
19702
19703
19704
19705
19706
19707
19708
19709
19710
19711
19712
19713
19714
19715
19716
19717
19718
19719
19720
19721
19722
19723
19724
19725
19726
19727
19728
19729
19730
19731
19732
19733
19734
19735
19736
19737
19738
19739
19740
19741
19742
19743
19744
19745
19746
19747
19748
19749
19750
19751
19752
19753
19754
19755
19756
19757
19758
19759
19760
19761
19762
19763
19764
19765
19766
19767
19768
19769
19770
19771
19772
19773
19774
19775
19776
19777
19778
19779
19780
19781
19782
19783
19784
19785
19786
19787
19788
19789
19790
19791
19792
19793
19794
19795
19796
19797
19798
19799
19800
19801
19802
19803
19804
19805
19806
19807
19808
19809
19810
19811
19812
19813
19814
19815
19816
19817
19818
19819
19820
19821
19822
19823
19824
19825
19826
19827
19828
19829
19830
19831
19832
19833
19834
19835
19836
19837
19838
19839
19840
19841
19842
19843
19844
19845
19846
19847
19848
19849
19850
19851
19852
19853
19854
19855
19856
19857
19858
19859
19860
19861
19862
19863
19864
19865
19866
19867
19868
19869
19870
19871
19872
19873
19874
19875
19876
19877
19878
19879
19880
19881
19882
19883
19884
19885
19886
19887
19888
19889
19890
19891
19892
19893
19894
19895
19896
19897
19898
19899
19900
19901
19902
19903
19904
19905
19906
19907
19908
19909
19910
19911
19912
19913
19914
19915
19916
19917
19918
19919
19920
19921
19922
19923
19924
19925
19926
19927
19928
19929
19930
19931
19932
19933
19934
19935
19936
19937
19938
19939
19940
19941
19942
19943
19944
19945
19946
19947
19948
19949
19950
19951
19952
19953
19954
19955
19956
19957
19958
19959
19960
19961
19962
19963
19964
19965
19966
19967
19968
19969
19970
19971
19972
19973
19974
19975
19976
19977
19978
19979
19980
19981
19982
19983
19984
19985
19986
19987
19988
19989
19990
19991
19992
19993
19994
19995
19996
19997
19998
19999
20000
20001
20002
20003
20004
20005
20006
20007
20008
20009
20010
20011
20012
20013
20014
20015
20016
20017
20018
20019
20020
20021
20022
20023
20024
20025
20026
20027
20028
20029
20030
20031
20032
20033
20034
20035
20036
20037
20038
20039
20040
20041
20042
20043
20044
20045
20046
20047
20048
20049
20050
20051
20052
20053
20054
20055
20056
20057
20058
20059
20060
20061
20062
20063
20064
20065
20066
20067
20068
20069
20070
20071
20072
20073
20074
20075
20076
20077
20078
20079
20080
20081
20082
20083
20084
20085
20086
20087
20088
20089
20090
20091
20092
20093
20094
20095
20096
20097
20098
20099
20100
20101
20102
20103
20104
20105
20106
20107
20108
20109
20110
20111
20112
20113
20114
20115
20116
20117
20118
20119
20120
20121
20122
20123
20124
20125
20126
20127
20128
20129
20130
20131
20132
20133
20134
20135
20136
20137
20138
20139
20140
20141
20142
20143
20144
20145
20146
20147
20148
20149
20150
20151
20152
20153
20154
20155
20156
20157
20158
20159
20160
20161
20162
20163
20164
20165
20166
20167
20168
20169
20170
20171
20172
20173
20174
20175
20176
20177
20178
20179
20180
20181
20182
20183
20184
20185
20186
20187
20188
20189
20190
20191
20192
20193
20194
20195
20196
20197
20198
20199
20200
20201
20202
20203
20204
20205
20206
20207
20208
20209
20210
20211
20212
20213
20214
20215
20216
20217
20218
20219
20220
20221
20222
20223
20224
20225
20226
20227
20228
20229
20230
20231
20232
20233
20234
20235
20236
20237
20238
20239
20240
20241
20242
20243
20244
20245
20246
20247
20248
20249
20250
20251
20252
20253
20254
20255
20256
20257
20258
20259
20260
20261
20262
20263
20264
20265
20266
20267
20268
20269
20270
20271
20272
20273
20274
20275
20276
20277
20278
20279
20280
20281
20282
20283
20284
20285
20286
20287
20288
20289
20290
20291
20292
20293
20294
20295
20296
20297
20298
20299
20300
20301
20302
20303
20304
20305
20306
20307
20308
20309
20310
20311
20312
20313
20314
20315
20316
20317
20318
20319
20320
20321
20322
20323
20324
20325
20326
20327
20328
20329
20330
20331
20332
20333
20334
20335
20336
20337
20338
20339
20340
20341
20342
20343
20344
20345
20346
20347
20348
20349
20350
20351
20352
20353
20354
20355
20356
20357
20358
20359
20360
20361
20362
20363
20364
20365
20366
20367
20368
20369
20370
20371
20372
20373
20374
20375
20376
20377
20378
20379
20380
20381
20382
20383
20384
20385
20386
20387
20388
20389
20390
20391
20392
20393
20394
20395
20396
20397
20398
20399
20400
20401
20402
20403
20404
20405
20406
20407
20408
20409
20410
20411
20412
20413
20414
20415
20416
20417
20418
20419
20420
20421
20422
20423
20424
20425
20426
20427
20428
20429
20430
20431
20432
20433
20434
20435
20436
20437
20438
20439
20440
20441
20442
20443
20444
20445
20446
20447
20448
20449
20450
20451
20452
20453
20454
20455
20456
20457
20458
20459
20460
20461
20462
20463
20464
20465
20466
20467
20468
20469
20470
20471
20472
20473
20474
20475
20476
20477
20478
20479
20480
20481
20482
20483
20484
20485
20486
20487
20488
20489
20490
20491
20492
20493
20494
20495
20496
20497
20498
20499
20500
20501
20502
20503
20504
20505
20506
20507
20508
20509
20510
20511
20512
20513
20514
20515
20516
20517
20518
20519
20520
20521
20522
20523
20524
20525
20526
20527
20528
20529
20530
20531
20532
20533
20534
20535
20536
20537
20538
20539
20540
20541
20542
20543
20544
20545
20546
20547
20548
20549
20550
20551
20552
20553
20554
20555
20556
20557
20558
20559
20560
20561
20562
20563
20564
20565
20566
20567
20568
20569
20570
20571
20572
20573
20574
20575
20576
20577
20578
20579
20580
20581
20582
20583
20584
20585
20586
20587
20588
20589
20590
20591
20592
20593
20594
20595
20596
20597
20598
20599
20600
20601
20602
20603
20604
20605
20606
20607
20608
20609
20610
20611
20612
20613
20614
20615
20616
20617
20618
20619
20620
20621
20622
20623
20624
20625
20626
20627
20628
20629
20630
20631
20632
20633
20634
20635
20636
20637
20638
20639
20640
20641
20642
20643
20644
20645
20646
20647
20648
20649
20650
20651
20652
20653
20654
20655
20656
20657
20658
20659
20660
20661
20662
20663
20664
20665
20666
20667
20668
20669
20670
20671
20672
20673
20674
20675
20676
20677
20678
20679
20680
20681
20682
20683
20684
20685
20686
20687
20688
20689
20690
20691
20692
20693
20694
20695
20696
20697
20698
20699
20700
20701
20702
20703
20704
20705
20706
20707
20708
20709
20710
20711
20712
20713
20714
20715
20716
20717
20718
20719
20720
20721
20722
20723
20724
20725
20726
20727
20728
20729
20730
20731
20732
20733
20734
20735
20736
20737
20738
20739
20740
20741
20742
20743
20744
20745
20746
20747
20748
20749
20750
20751
20752
20753
20754
20755
20756
20757
20758
20759
20760
20761
20762
20763
20764
20765
20766
20767
20768
20769
20770
20771
20772
20773
20774
20775
20776
20777
20778
20779
20780
20781
20782
20783
20784
20785
20786
20787
20788
20789
20790
20791
20792
20793
20794
20795
20796
20797
20798
20799
20800
20801
20802
20803
20804
20805
20806
20807
20808
20809
20810
20811
20812
20813
20814
20815
20816
20817
20818
20819
20820
20821
20822
20823
20824
20825
20826
20827
20828
20829
20830
20831
20832
20833
20834
20835
20836
20837
20838
20839
20840
20841
20842
20843
20844
20845
20846
20847
20848
20849
20850
20851
20852
20853
20854
20855
20856
20857
20858
20859
20860
20861
20862
20863
20864
20865
20866
20867
20868
20869
20870
20871
20872
20873
20874
20875
20876
20877
20878
20879
20880
20881
20882
20883
20884
20885
20886
20887
20888
20889
20890
20891
20892
20893
20894
20895
20896
20897
20898
20899
20900
20901
20902
20903
20904
20905
20906
20907
20908
20909
20910
20911
20912
20913
20914
20915
20916
20917
20918
20919
20920
20921
20922
20923
20924
20925
20926
20927
20928
20929
20930
20931
20932
20933
20934
20935
20936
20937
20938
20939
20940
20941
20942
20943
20944
20945
20946
20947
20948
20949
20950
20951
20952
20953
20954
20955
20956
20957
20958
20959
20960
20961
20962
20963
20964
20965
20966
20967
20968
20969
20970
20971
20972
20973
20974
20975
20976
20977
20978
20979
20980
20981
20982
20983
20984
20985
20986
20987
20988
20989
20990
20991
20992
20993
20994
20995
20996
20997
20998
20999
21000
21001
21002
21003
21004
21005
21006
21007
21008
21009
21010
21011
21012
21013
21014
21015
21016
21017
21018
21019
21020
21021
21022
21023
21024
21025
21026
21027
21028
21029
21030
21031
21032
21033
21034
21035
21036
21037
21038
21039
21040
21041
21042
21043
21044
21045
21046
21047
21048
21049
21050
21051
21052
21053
21054
21055
21056
21057
21058
21059
21060
21061
21062
21063
21064
21065
21066
21067
21068
21069
21070
21071
21072
21073
21074
21075
21076
21077
21078
21079
21080
21081
21082
21083
21084
21085
21086
21087
21088
21089
21090
21091
21092
21093
21094
21095
21096
21097
21098
21099
21100
21101
21102
21103
21104
21105
21106
21107
21108
21109
21110
21111
21112
21113
21114
21115
21116
21117
21118
21119
21120
21121
21122
21123
21124
21125
21126
21127
21128
21129
21130
21131
21132
21133
21134
21135
21136
21137
21138
21139
21140
21141
21142
21143
21144
21145
21146
21147
21148
21149
21150
21151
21152
21153
21154
21155
21156
21157
21158
21159
21160
21161
21162
21163
21164
21165
21166
21167
21168
21169
21170
21171
21172
21173
21174
21175
21176
21177
21178
21179
21180
21181
21182
21183
21184
21185
21186
21187
21188
21189
21190
21191
21192
21193
21194
21195
21196
21197
21198
21199
21200
21201
21202
21203
21204
21205
21206
21207
21208
21209
21210
21211
21212
21213
21214
21215
21216
21217
21218
21219
21220
21221
21222
21223
21224
21225
21226
21227
21228
21229
21230
21231
21232
21233
21234
21235
21236
21237
21238
21239
21240
21241
21242
21243
21244
21245
21246
21247
21248
21249
21250
21251
21252
21253
21254
21255
21256
21257
21258
21259
21260
21261
21262
21263
21264
21265
21266
21267
21268
21269
21270
21271
21272
21273
21274
21275
21276
21277
21278
21279
21280
21281
21282
21283
21284
21285
21286
21287
21288
21289
21290
21291
21292
21293
21294
21295
21296
21297
21298
21299
21300
21301
21302
21303
21304
21305
21306
21307
21308
21309
21310
21311
21312
21313
21314
21315
21316
21317
21318
21319
21320
21321
21322
21323
21324
21325
21326
21327
21328
21329
21330
21331
21332
21333
21334
21335
21336
21337
21338
21339
21340
21341
21342
21343
21344
21345
21346
21347
21348
21349
21350
21351
21352
21353
21354
21355
21356
21357
21358
21359
21360
21361
21362
21363
21364
21365
21366
21367
21368
21369
21370
21371
21372
21373
21374
21375
21376
21377
21378
21379
21380
21381
21382
21383
21384
21385
21386
21387
21388
21389
21390
21391
21392
21393
21394
21395
21396
21397
21398
21399
21400
21401
21402
21403
21404
21405
21406
21407
21408
21409
21410
21411
21412
21413
21414
21415
21416
21417
21418
21419
21420
21421
21422
21423
21424
21425
21426
21427
21428
21429
21430
21431
21432
21433
21434
21435
21436
21437
21438
21439
21440
21441
21442
21443
21444
21445
21446
21447
21448
21449
21450
21451
21452
21453
21454
21455
21456
21457
21458
21459
21460
21461
21462
21463
21464
21465
21466
21467
21468
21469
21470
21471
21472
21473
21474
21475
21476
21477
21478
21479
21480
21481
21482
21483
21484
21485
21486
21487
21488
21489
21490
21491
21492
21493
21494
21495
21496
21497
21498
21499
21500
21501
21502
21503
21504
21505
21506
21507
21508
21509
21510
21511
21512
21513
21514
21515
21516
21517
21518
21519
21520
21521
21522
21523
21524
21525
21526
21527
21528
21529
21530
21531
21532
21533
21534
21535
21536
21537
21538
21539
21540
21541
21542
21543
21544
21545
21546
21547
21548
21549
21550
21551
21552
21553
21554
21555
21556
21557
21558
21559
21560
21561
21562
21563
21564
21565
21566
21567
21568
21569
21570
21571
21572
21573
21574
21575
21576
21577
21578
21579
21580
21581
21582
21583
21584
21585
21586
21587
21588
21589
21590
21591
21592
21593
21594
21595
21596
21597
21598
21599
21600
21601
21602
21603
21604
21605
21606
21607
21608
21609
21610
21611
21612
21613
21614
21615
21616
21617
21618
21619
21620
21621
21622
21623
21624
21625
21626
21627
21628
21629
21630
21631
21632
21633
21634
21635
21636
21637
21638
21639
21640
21641
21642
21643
21644
21645
21646
21647
21648
21649
21650
21651
21652
21653
21654
21655
21656
21657
21658
21659
21660
21661
21662
21663
21664
21665
21666
21667
21668
21669
21670
21671
21672
21673
21674
21675
21676
21677
21678
21679
21680
21681
21682
21683
21684
21685
21686
21687
21688
21689
21690
21691
21692
21693
21694
21695
21696
21697
21698
21699
21700
21701
21702
21703
21704
21705
21706
21707
21708
21709
21710
21711
21712
21713
21714
21715
21716
21717
21718
21719
21720
21721
21722
21723
21724
21725
21726
21727
21728
21729
21730
21731
21732
21733
21734
21735
21736
21737
21738
21739
21740
21741
21742
21743
21744
21745
21746
21747
21748
21749
21750
21751
21752
21753
21754
21755
21756
21757
21758
21759
21760
21761
21762
21763
21764
21765
21766
21767
21768
21769
21770
21771
21772
21773
21774
21775
21776
21777
21778
21779
21780
21781
21782
21783
21784
21785
21786
21787
21788
21789
21790
21791
21792
21793
21794
21795
21796
21797
21798
21799
21800
21801
21802
21803
21804
21805
21806
21807
21808
21809
21810
21811
21812
21813
21814
21815
21816
21817
21818
21819
21820
21821
21822
21823
21824
21825
21826
21827
21828
21829
21830
21831
21832
21833
21834
21835
21836
21837
21838
21839
21840
21841
21842
21843
21844
21845
21846
21847
21848
21849
21850
21851
21852
21853
21854
21855
21856
21857
21858
21859
21860
21861
21862
21863
21864
21865
21866
21867
21868
21869
21870
21871
21872
21873
21874
21875
21876
21877
21878
21879
21880
21881
21882
21883
21884
21885
21886
21887
21888
21889
21890
21891
21892
21893
21894
21895
21896
21897
21898
21899
21900
21901
21902
21903
21904
21905
21906
21907
21908
21909
21910
21911
21912
21913
21914
21915
21916
21917
21918
21919
21920
21921
21922
21923
21924
21925
21926
21927
21928
21929
21930
21931
21932
21933
21934
21935
21936
21937
21938
21939
21940
21941
21942
21943
21944
21945
21946
21947
21948
21949
21950
21951
21952
21953
21954
21955
21956
21957
21958
21959
21960
21961
21962
21963
21964
21965
21966
21967
21968
21969
21970
21971
21972
21973
21974
21975
21976
21977
21978
21979
21980
21981
21982
21983
21984
21985
21986
21987
21988
21989
21990
21991
21992
21993
21994
21995
21996
21997
21998
21999
22000
22001
22002
22003
22004
22005
22006
22007
22008
22009
22010
22011
22012
22013
22014
22015
22016
22017
22018
22019
22020
22021
22022
22023
22024
22025
22026
22027
22028
22029
22030
22031
22032
22033
22034
22035
22036
22037
22038
22039
22040
22041
22042
22043
22044
22045
22046
22047
22048
22049
22050
22051
22052
22053
22054
22055
22056
22057
22058
22059
22060
22061
22062
22063
22064
22065
22066
22067
22068
22069
22070
22071
22072
22073
22074
22075
22076
22077
22078
22079
22080
22081
22082
22083
22084
22085
22086
22087
22088
22089
22090
22091
22092
22093
22094
22095
22096
22097
22098
22099
22100
22101
22102
22103
22104
22105
22106
22107
22108
22109
22110
22111
22112
22113
22114
22115
22116
22117
22118
22119
22120
22121
22122
22123
22124
22125
22126
22127
22128
22129
22130
22131
22132
22133
22134
22135
22136
22137
22138
22139
22140
22141
22142
22143
22144
22145
22146
22147
22148
22149
22150
22151
22152
22153
22154
22155
22156
22157
22158
22159
22160
22161
22162
22163
22164
22165
22166
22167
22168
22169
22170
22171
22172
22173
22174
22175
22176
22177
22178
22179
22180
22181
22182
22183
22184
22185
22186
22187
22188
22189
22190
22191
22192
22193
22194
22195
22196
22197
22198
22199
22200
22201
22202
22203
22204
22205
22206
22207
22208
22209
22210
22211
22212
22213
22214
22215
22216
22217
22218
22219
22220
22221
22222
22223
22224
22225
22226
22227
22228
22229
22230
22231
22232
22233
22234
22235
22236
22237
22238
22239
22240
22241
22242
22243
22244
22245
22246
22247
22248
22249
22250
22251
22252
22253
22254
22255
22256
22257
22258
22259
22260
22261
22262
22263
22264
22265
22266
22267
22268
22269
22270
22271
22272
22273
22274
22275
22276
22277
22278
22279
22280
22281
22282
22283
22284
22285
22286
22287
22288
22289
22290
22291
22292
22293
22294
22295
22296
22297
22298
22299
22300
22301
22302
22303
22304
22305
22306
22307
22308
22309
22310
22311
22312
22313
22314
22315
22316
22317
22318
22319
22320
22321
22322
22323
22324
22325
22326
22327
22328
22329
22330
22331
22332
22333
22334
22335
22336
22337
22338
22339
22340
22341
22342
22343
22344
22345
22346
22347
22348
22349
22350
22351
22352
22353
22354
22355
22356
22357
22358
22359
22360
22361
22362
22363
22364
22365
22366
22367
22368
22369
22370
22371
22372
22373
22374
22375
22376
22377
22378
22379
22380
22381
22382
22383
22384
22385
22386
22387
22388
22389
22390
22391
22392
22393
22394
22395
22396
22397
22398
22399
22400
22401
22402
22403
22404
22405
22406
22407
22408
22409
22410
22411
22412
22413
22414
22415
22416
22417
22418
22419
22420
22421
22422
22423
22424
22425
22426
22427
22428
22429
22430
22431
22432
22433
22434
22435
22436
22437
22438
22439
22440
22441
22442
22443
22444
22445
22446
22447
22448
22449
22450
22451
22452
22453
22454
22455
22456
22457
22458
22459
22460
22461
22462
22463
22464
22465
22466
22467
22468
22469
22470
22471
22472
22473
22474
22475
22476
22477
22478
22479
22480
22481
22482
22483
22484
22485
22486
22487
22488
22489
22490
22491
22492
22493
22494
22495
22496
22497
22498
22499
22500
22501
22502
22503
22504
22505
22506
22507
22508
22509
22510
22511
22512
22513
22514
22515
22516
22517
22518
22519
22520
22521
22522
22523
22524
22525
22526
22527
22528
22529
22530
22531
22532
22533
22534
22535
22536
22537
22538
22539
22540
22541
22542
22543
22544
22545
22546
22547
22548
22549
22550
22551
22552
22553
22554
22555
22556
22557
22558
22559
22560
22561
22562
22563
22564
22565
22566
22567
22568
22569
22570
22571
22572
22573
22574
22575
22576
22577
22578
22579
22580
22581
22582
22583
22584
22585
22586
22587
22588
22589
22590
22591
22592
22593
22594
22595
22596
22597
22598
22599
22600
22601
22602
22603
22604
22605
22606
22607
22608
22609
22610
22611
22612
22613
22614
22615
22616
22617
22618
22619
22620
22621
22622
22623
22624
22625
22626
22627
22628
22629
22630
22631
22632
22633
22634
22635
22636
22637
22638
22639
22640
22641
22642
22643
22644
22645
22646
22647
22648
22649
22650
22651
22652
22653
22654
22655
22656
22657
22658
22659
22660
22661
22662
22663
22664
22665
22666
22667
22668
22669
22670
22671
22672
22673
22674
22675
22676
22677
22678
22679
22680
22681
22682
22683
22684
22685
22686
22687
22688
22689
22690
22691
22692
22693
22694
22695
22696
22697
22698
22699
22700
22701
22702
22703
22704
22705
22706
22707
22708
22709
22710
22711
22712
22713
22714
22715
22716
22717
22718
22719
22720
22721
22722
22723
22724
22725
22726
22727
22728
22729
22730
22731
22732
22733
22734
22735
22736
22737
22738
22739
22740
22741
22742
22743
22744
22745
22746
22747
22748
22749
22750
22751
22752
22753
22754
22755
22756
22757
22758
22759
22760
22761
22762
22763
22764
22765
22766
22767
22768
22769
22770
22771
22772
22773
22774
22775
22776
22777
22778
22779
22780
22781
22782
22783
22784
22785
22786
22787
22788
22789
22790
22791
22792
22793
22794
22795
22796
22797
22798
28801
28802
28803
28804
28805
28806
28807
28808
28809
28810
28811
28812
28813
28814
28815
28816
28817
28818
28819
28820
28821
28822
28823
28824
28825
28826
28827
28828
28829
28830
28831
28832
28833
28834
28835
28836
28837
28838
28839
28840
28841
28842
28843
28844
28845
28846
28847
28848
28849
28850
28851
28852
28853
28854
28855
28856
28857
28858
28859
28860
28861
28862
28863
28864
28865
28866
28867
28868
28869
28870
28871
28872
28873
28874
28875
28876
28877
28878
28879
28880
28881
28882
28883
28884
28885
28886
28887
28888
28889
28890
28891
28892
28893
28894
28895
28896
28897
28898
28899
28900
28901
28902
28903
28904
28905
28906
28907
28908
28909
28910
28911
28912
28913
28914
28915
28916
28917
28918
28919
28920
28921
28922
28923
28924
28925
28926
28927
28928
28929
28930
28931
28932
28933
28934
28935
28936
28937
28938
28939
28940
28941
28942
28943
28944
28945
28946
28947
28948
28949
28950
28951
28952
28953
28954
28955
28956
28957
28958
28959
28960
28961
28962
28963
28964
28965
28966
28967
28968
28969
28970
28971
28972
28973
28974
28975
28976
28977
28978
28979
28980
28981
28982
28983
28984
28985
28986
28987
28988
28989
28990
28991
28992
28993
28994
28995
28996
28997
28998
28999
29000
29001
29002
29003
29004
29005
29006
29007
29008
29009
29010
29011
29012
29013
29014
29015
29016
29017
29018
29019
29020
29021
29022
29023
29024
29025
29026
29027
29028
29029
29030
29031
29032
29033
29034
29035
29036
29037
29038
29039
29040
29041
29042
29043
29044
29045
29046
29047
29048
29049
29050
29051
29052
29053
29054
29055
29056
29057
29058
29059
29060
29061
29062
29063
29064
29065
29066
29067
29068
29069
29070
29071
29072
29073
29074
29075
29076
29077
29078
29079
29080
29081
29082
29083
29084
29085
29086
29087
29088
29089
29090
29091
29092
29093
29094
29095
29096
29097
29098
29099
29100
29101
29102
29103
29104
29105
29106
29107
29108
29109
29110
29111
29112
29113
29114
29115
29116
29117
29118
29119
29120
29121
29122
29123
29124
29125
29126
29127
29128
29129
29130
29131
29132
29133
29134
29135
29136
29137
29138
29139
29140
29141
29142
29143
29144
29145
29146
29147
29148
29149
29150
29151
29152
29153
29154
29155
29156
29157
29158
29159
29160
29161
29162
29163
29164
29165
29166
29167
29168
29169
29170
29171
29172
29173
29174
29175
29176
29177
29178
29179
29180
29181
29182
29183
29184
29185
29186
29187
29188
29189
29190
29191
29192
29193
29194
29195
29196
29197
29198
29199
29200
29201
29202
29203
29204
29205
29206
29207
29208
29209
29210
29211
29212
29213
29214
29215
29216
29217
29218
29219
29220
29221
29222
29223
29224
29225
29226
29227
29228
29229
29230
29231
29232
29233
29234
29235
29236
29237
29238
29239
29240
29241
29242
29243
29244
29245
29246
29247
29248
29249
29250
29251
29252
29253
29254
29255
29256
29257
29258
29259
29260
29261
29262
29263
29264
29265
29266
29267
29268
29269
29270
29271
29272
29273
29274
29275
29276
29277
29278
29279
29280
29281
29282
29283
29284
29285
29286
29287
29288
29289
29290
29291
29292
29293
29294
29295
29296
29297
29298
29299
29300
29301
29302
29303
29304
29305
29306
29307
29308
29309
29310
29311
29312
29313
29314
29315
29316
29317
29318
29319
29320
29321
29322
29323
29324
29325
29326
29327
29328
29329
29330
29331
29332
29333
29334
29335
29336
29337
29338
29339
29340
29341
29342
29343
29344
29345
29346
29347
29348
29349
29350
29351
29352
29353
29354
29355
29356
29357
29358
29359
29360
29361
29362
29363
29364
29365
29366
29367
29368
29369
29370
29371
29372
29373
29374
29375
29376
29377
29378
29379
29380
29381
29382
29383
29384
29385
29386
29387
29388
29389
29390
29391
29392
29393
29394
29395
29396
29397
29398
29399
29400
29401
29402
29403
29404
29405
29406
29407
29408
29409
29410
29411
29412
29413
29414
29415
29416
29417
29418
29419
29420
29421
29422
29423
29424
29425
29426
29427
29428
29429
29430
29431
29432
29433
29434
29435
29436
29437
29438
29439
29440
29441
29442
29443
29444
29445
29446
29447
29448
29449
29450
29451
29452
29453
29454
29455
29456
29457
29458
29459
29460
29461
29462
29463
29464
29465
29466
29467
29468
29469
29470
29471
29472
29473
29474
29475
29476
29477
29478
29479
29480
29481
29482
29483
29484
29485
29486
29487
29488
29489
29490
29491
29492
29493
29494
29495
29496
29497
29498
29499
29500
29501
29502
29503
29504
29505
29506
29507
29508
29509
29510
29511
29512
29513
29514
29515
29516
29517
29518
29519
29520
29521
29522
29523
29524
29525
29526
29527
29528
29529
29530
29531
29532
29533
29534
29535
29536
29537
29538
29539
29540
29541
29542
29543
29544
29545
29546
29547
29548
29549
29550
29551
29552
29553
29554
29555
29556
29557
29558
29559
29560
29561
29562
29563
29564
29565
29566
29567
29568
29569
29570
29571
29572
29573
29574
29575
29576
29577
29578
29579
29580
29581
29582
29583
29584
29585
29586
29587
29588
29589
29590
29591
29592
29593
29594
29595
29596
29597
29598
29599
29600
29601
29602
29603
29604
29605
29606
29607
29608
29609
29610
29611
29612
29613
29614
29615
29616
29617
29618
29619
29620
29621
29622
29623
29624
29625
29626
29627
29628
29629
29630
29631
29632
29633
29634
29635
29636
29637
29638
29639
29640
29641
29642
29643
29644
29645
29646
29647
29648
29649
29650
29651
29652
29653
29654
29655
29656
29657
29658
29659
29660
29661
29662
29663
29664
29665
29666
29667
29668
29669
29670
29671
29672
29673
29674
29675
29676
29677
29678
29679
29680
29681
29682
29683
29684
29685
29686
29687
29688
29689
29690
29691
29692
29693
29694
29695
29696
29697
29698
29699
29700
29701
29702
29703
29704
29705
29706
29707
29708
29709
29710
29711
29712
29713
29714
29715
29716
29717
29718
29719
29720
29721
29722
29723
29724
29725
29726
29727
29728
29729
29730
29731
29732
29733
29734
29735
29736
29737
29738
29739
29740
29741
29742
29743
29744
29745
29746
29747
29748
29749
29750
29751
29752
29753
29754
29755
29756
29757
29758
29759
29760
29761
29762
29763
29764
29765
29766
29767
29768
29769
29770
29771
29772
29773
29774
29775
29776
29777
29778
29779
29780
29781
29782
29783
29784
29785
29786
29787
29788
29789
29790
29791
29792
29793
29794
29795
29796
29797
29798
29799
29800
29801
29802
29803
29804
29805
29806
29807
29808
29809
29810
29811
29812
29813
29814
29815
29816
29817
29818
29819
29820
29821
29822
29823
29824
29825
29826
29827
29828
29829
29830
29831
29832
29833
29834
29835
29836
29837
29838
29839
29840
29841
29842
29843
29844
29845
29846
29847
29848
29849
29850
29851
29852
29853
29854
29855
29856
29857
29858
29859
29860
29861
29862
29863
29864
29865
29866
29867
29868
29869
29870
29871
29872
29873
29874
29875
29876
29877
29878
29879
29880
29881
29882
29883
29884
29885
29886
29887
29888
29889
29890
29891
29892
29893
29894
29895
29896
29897
29898
29899
29900
29901
29902
29903
29904
29905
29906
29907
29908
29909
29910
29911
29912
29913
29914
29915
29916
29917
29918
29919
29920
29921
29922
29923
29924
29925
29926
29927
29928
29929
29930
29931
29932
29933
29934
29935
29936
29937
29938
29939
29940
29941
29942
29943
29944
29945
29946
29947
29948
29949
29950
29951
29952
29953
29954
29955
29956
29957
29958
29959
29960
29961
29962
29963
29964
29965
29966
29967
29968
29969
29970
29971
29972
29973
29974
29975
29976
29977
29978
29979
29980
29981
29982
29983
29984
29985
29986
29987
29988
29989
29990
29991
29992
29993
29994
29995
29996
29997
29998
29999
30000
30001
30002
30003
30004
30005
30006
30007
30008
30009
30010
30011
30012
30013
30014
30015
30016
30017
30018
30019
30020
30021
30022
30023
30024
30025
30026
30027
30028
30029
30030
30031
30032
30033
30034
30035
30036
30037
30038
30039
30040
30041
30042
30043
30044
30045
30046
30047
30048
30049
30050
30051
30052
30053
30054
30055
30056
30057
30058
30059
30060
30061
30062
30063
30064
30065
30066
30067
30068
30069
30070
30071
30072
30073
30074
30075
30076
30077
30078
30079
30080
30081
30082
30083
30084
30085
30086
30087
30088
30089
30090
30091
30092
30093
30094
30095
30096
30097
30098
30099
30100
30101
30102
30103
30104
30105
30106
30107
30108
30109
30110
30111
30112
30113
30114
30115
30116
30117
30118
30119
30120
30121
30122
30123
30124
30125
30126
30127
30128
30129
30130
30131
30132
30133
30134
30135
30136
30137
30138
30139
30140
30141
30142
30143
30144
30145
30146
30147
30148
30149
30150
30151
30152
30153
30154
30155
30156
30157
30158
30159
30160
30161
30162
30163
30164
30165
30166
30167
30168
30169
30170
30171
30172
30173
30174
30175
30176
30177
30178
30179
30180
30181
30182
30183
30184
30185
30186
30187
30188
30189
30190
30191
30192
30193
30194
30195
30196
30197
30198
30199
30200
30201
30202
30203
30204
30205
30206
30207
30208
30209
30210
30211
30212
30213
30214
30215
30216
30217
30218
30219
30220
30221
30222
30223
30224
30225
30226
30227
30228
30229
30230
30231
30232
30233
30234
30235
30236
30237
30238
30239
30240
30241
30242
30243
30244
30245
30246
30247
30248
30249
30250
30251
30252
30253
30254
30255
30256
30257
30258
30259
30260
30261
30262
30263
30264
30265
30266
30267
30268
30269
30270
30271
30272
30273
30274
30275
30276
30277
30278
30279
30280
30281
30282
30283
30284
30285
30286
30287
30288
30289
30290
30291
30292
30293
30294
30295
30296
30297
30298
30299
30300
30301
30302
30303
30304
30305
30306
30307
30308
30309
30310
30311
30312
30313
30314
30315
30316
30317
30318
30319
30320
30321
30322
30323
30324
30325
30326
30327
30328
30329
30330
30331
30332
30333
30334
30335
30336
30337
30338
30339
30340
30341
30342
30343
30344
30345
30346
30347
30348
30349
30350
30351
30352
30353
30354
30355
30356
30357
30358
30359
30360
30361
30362
30363
30364
30365
30366
30367
30368
30369
30370
30371
30372
30373
30374
30375
30376
30377
30378
30379
30380
30381
30382
30383
30384
30385
30386
30387
30388
30389
30390
30391
30392
30393
30394
30395
30396
30397
30398
30399
30400
30401
30402
30403
30404
30405
30406
30407
30408
30409
30410
30411
30412
30413
30414
30415
30416
30417
30418
30419
30420
30421
30422
30423
30424
30425
30426
30427
30428
30429
30430
30431
30432
30433
30434
30435
30436
30437
30438
30439
30440
30441
30442
30443
30444
30445
30446
30447
30448
30449
30450
30451
30452
30453
30454
30455
30456
30457
30458
30459
30460
30461
30462
30463
30464
30465
30466
30467
30468
30469
30470
30471
30472
30473
30474
30475
30476
30477
30478
30479
30480
30481
30482
30483
30484
30485
30486
30487
30488
30489
30490
30491
30492
30493
30494
30495
30496
30497
30498
30499
30500
30501
30502
30503
30504
30505
30506
30507
30508
30509
30510
30511
30512
30513
30514
30515
30516
30517
30518
30519
30520
30521
30522
30523
30524
30525
30526
30527
30528
30529
30530
30531
30532
30533
30534
30535
30536
30537
30538
30539
30540
30541
30542
30543
30544
30545
30546
30547
30548
30549
30550
30551
30552
30553
30554
30555
30556
30557
30558
30559
30560
30561
30562
30563
30564
30565
30566
30567
30568
30569
30570
30571
30572
30573
30574
30575
30576
30577
30578
30579
30580
30581
30582
30583
30584
30585
30586
30587
30588
30589
30590
30591
30592
30593
30594
30595
30596
30597
30598
30599
30600
30601
30602
30603
30604
30605
30606
30607
30608
30609
30610
30611
30612
30613
30614
30615
30616
30617
30618
30619
30620
30621
30622
30623
30624
30625
30626
30627
30628
30629
30630
30631
30632
30633
30634
30635
30636
30637
30638
30639
30640
30641
30642
30643
30644
30645
30646
30647
30648
30649
30650
30651
30652
30653
30654
30655
30656
30657
30658
30659
30660
30661
30662
30663
30664
30665
30666
30667
30668
30669
30670
30671
30672
30673
30674
30675
30676
30677
30678
30679
30680
30681
30682
30683
30684
30685
30686
30687
30688
30689
30690
30691
30692
30693
30694
30695
30696
30697
30698
30699
30700
30701
30702
30703
30704
30705
30706
30707
30708
30709
30710
30711
30712
30713
30714
30715
30716
30717
30718
30719
30720
30721
30722
30723
30724
30725
30726
30727
30728
30729
30730
30731
30732
30733
30734
30735
30736
30737
30738
30739
30740
30741
30742
30743
30744
30745
30746
30747
30748
30749
30750
30751
30752
30753
30754
30755
30756
30757
30758
30759
30760
30761
30762
30763
30764
30765
30766
30767
30768
30769
30770
30771
30772
30773
30774
30775
30776
30777
30778
30779
30780
30781
30782
30783
30784
30785
30786
30787
30788
30789
30790
30791
30792
30793
30794
30795
30796
30797
30798
30799
30800
30801
30802
30803
30804
30805
30806
30807
30808
30809
30810
30811
30812
30813
30814
30815
30816
30817
30818
30819
30820
30821
30822
30823
30824
30825
30826
30827
30828
30829
30830
30831
30832
30833
30834
30835
30836
30837
30838
30839
30840
30841
30842
30843
30844
30845
30846
30847
30848
30849
30850
30851
30852
30853
30854
30855
30856
30857
30858
30859
30860
30861
30862
30863
30864
30865
30866
30867
30868
30869
30870
30871
30872
30873
30874
30875
30876
30877
30878
30879
30880
30881
30882
30883
30884
30885
30886
30887
30888
30889
30890
30891
30892
30893
30894
30895
30896
30897
30898
30899
30900
30901
30902
30903
30904
30905
30906
30907
30908
30909
30910
30911
30912
30913
30914
30915
30916
30917
30918
30919
30920
30921
30922
30923
30924
30925
30926
30927
30928
30929
30930
30931
30932
30933
30934
30935
30936
30937
30938
30939
30940
30941
30942
30943
30944
30945
30946
30947
30948
30949
30950
30951
30952
30953
30954
30955
30956
30957
30958
30959
30960
30961
30962
30963
30964
30965
30966
30967
30968
30969
30970
30971
30972
30973
30974
30975
30976
30977
30978
30979
30980
30981
30982
30983
30984
30985
30986
30987
30988
30989
30990
30991
30992
30993
30994
30995
30996
30997
30998
30999
commit 894b0955a250682da0d2d3f074f910aaaf88c168
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jun 24 23:39:05 2025 -0500

    Update ReleaseNotes.md.

commit e10e526b3ed0271cf4c42cd36815e182aacabe4b
Author: Jeff Hammond <jeff.science@gmail.com>
Date:   Fri Nov 29 02:16:48 2024 +0200

    Add complex return detection for nvfortran (#765)
    
    Details:
    - Search for Intel ifx and NVIDIA/PGI Fortran compilers.
    - Correctly determine the Fortran compiler vendor for Intel ifx and NVIDIA/PGI compilers.
    - Determine the compiler version and correct Fortran complex return type for NVIDIA/PGI.
    
    (cherry picked from commit 12f2efa7dfe11a684d62af02592499d91b7e344b)

commit 9a5c0290b8996bffe1a48511255c7055b777713c
Author: Dave Love <dave.love@manchester.ac.uk>
Date:   Fri Jan 24 21:44:32 2025 +0000

    Blacklist KNL with GCC 15+ (#844)
    
    Details:
    - GCC 15 drops support for Xeon Phi architectures such as KNL.
    - This PR blacklists the `knl` configuration for GCC 15+.
    
    (cherry picked from commit 7e8a5891902312a281bce37037eaa06d7d501639)

commit 154f9fcd9cf24b85451685c7e91e34a836375aed
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jun 24 17:14:58 2025 -0500

    CHANGELOG update (2.0)

commit 200c795373e8ddeffee8d957dcadd05f7e10ab7f
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jun 24 17:13:42 2025 -0500

    Update ReleaseNotes.md.

commit 11b276fb3b3848cc40cc5ceb87317f885fa90547
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jun 24 17:08:09 2025 -0500

    Add refined version macros.
    
    Details:
    - The `BLIS_VERSION` macro currently provides a string literal version of the BLIS version.
    - Add `BLIS_VERSION_{MAJOR,MINOR,REVISION}` macros which provide integer literals useful for programmatically comparing versions from `blis.h` alone.
    
    (cherry picked from commit 290af2ea8f06a84bc4792a3e64b99f539bb347a7)
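
As a rough illustration of the kind of check the integer macros make possible, the sketch below gates code on a minimum version at compile time. The fallback values, the `MY_BLIS_AT_LEAST` macro, and the two helper functions are assumptions made for this example and are not part of BLIS; in real code the three `BLIS_VERSION_*` macros would come from `blis.h`.

```c
/* In real code these would come from #include "blis.h"; the fallback
   values below are placeholders so the sketch compiles on its own. */
#ifndef BLIS_VERSION_MAJOR
#define BLIS_VERSION_MAJOR    2
#define BLIS_VERSION_MINOR    0
#define BLIS_VERSION_REVISION 0
#endif

/* Hypothetical helper (not part of BLIS): nonzero if the library version
   is at least maj.min.rev. */
#define MY_BLIS_AT_LEAST( maj, min, rev ) \
    ( BLIS_VERSION_MAJOR >  (maj) || \
    ( BLIS_VERSION_MAJOR == (maj) && \
      ( BLIS_VERSION_MINOR >  (min) || \
      ( BLIS_VERSION_MINOR == (min) && BLIS_VERSION_REVISION >= (rev) ) ) ) )

/* Compile-time gate: usable in #if because every operand is an integer. */
#if MY_BLIS_AT_LEAST( 2, 0, 0 )
#define HAVE_BLIS_2 1
#else
#define HAVE_BLIS_2 0
#endif

int have_blis_2( void ) { return HAVE_BLIS_2; }

/* The same comparison evaluated with runtime arguments. */
int blis_at_least( int maj, int min, int rev )
{
    return MY_BLIS_AT_LEAST( maj, min, rev );
}
```

A string-valued `BLIS_VERSION` cannot be used in an `#if` expression this way, which is presumably why integer literals are useful here.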

commit cfbb94d7b8165f0a94c671320bace59f82626cd0
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jun 24 16:46:23 2025 -0500

    Remove unnecessary OpenMP include. (#875)
    
    Details:
    - Previously, `<omp.h>` was included in `bli_thrcomm_openmp.h` so that the framework
      could access the necessary OpenMP functions.
    - As @melven reported (#873), this causes issues when `blis.h` is included in C++ code since
      the `<omp.h>` include happens with `extern "C"`.
    - Move the include from the header to the necessary .c files so that it does not "pollute" `blis.h`.
    
    (cherry picked from commit 843a5e8d394d126ed370da523d2c09d7e12b582d)

commit 05094eaba978fe14c573ccd37dac3379251e6c59
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jun 24 15:38:15 2025 -0500

    Apply temporary fix for gcc 15. (#874)
    
    Details:
    - As reported in #845, gcc 15 fails to build the haswell
      gemmsup kernels due to the use of rbp.
    - As a temporary fix, disable slp-tree-vectorization in just
      the affected files.
    - Thanks @loveshack for reporting and @chillenb for the suggested
      fix.
    - Eventually, the kernels should be rewritten to avoid using rbp.
    
    (cherry picked from commit 36effd70b6a323856d98b17dda9cc3afd181b658)

commit 1a6809e4e5320e5f52dda768cc6c6e2faa44dbf1
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jun 24 14:55:38 2025 -0500

    Update ReleaseNotes.md.

commit 650b450b988fb0a467ebacdf8dede2b6cc6eb3ab
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Jun 23 17:41:01 2025 -0500

    CHANGELOG update (2.0)

commit b6a0372e016d69294493b381921357efbdf89674
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Jun 23 17:41:00 2025 -0500

    Version file update (2.0)

commit abc38a9ca341069bfe224bd190fa9ea712c32e54
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Jun 23 17:40:12 2025 -0500

    Update ReleaseNotes.md.

commit 17ffb7212bebae92991e090f40ddb7e3c8cc1354
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Jun 21 20:00:21 2025 -0500

    Update license and related info.
    
    (cherry picked from commit cb0da3e6e851c0c6b1896812a6e986f4a6a11f4a)

commit bf7a93fa3389372fb0a1611b573a68ffb96a5786
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Jun 20 18:04:46 2025 -0500

    Fix an issue with 1m and negative strides. (#871)
    
    Details:
    - In the reference `gemm1m` kernel, the code checks if the matrix
      is "preferentially stored", meaning columns (rows) are contiguous
      if the real-domain microkernel is column- (row-)preferential.
    - However, `bli_is_preferentially_stored` checks for contiguity
      based on the *absolute value* of the row and column strides,
      such that a row or column stride of -1 is indicated as
      preferentially stored.
    - Passing the stride of -1 to the real-domain kernel then essentially
      causes elements along each row or column to be written in opposite
      order. This causes problems for 1m because a) imaginary elements
      are written before real elements, and b) the provided pointer
      points to the last *complex* element, which is one real element
      too low and can then cause an out-of-bounds error when the last
      real-domain element is written to an address preceding the row
      or column storage.
    - This commit adds a check for positivity of `rs_c` and `cs_c`
      in `bli_gemm1m_ref` and `bli_gemm_ccr_ref` in order to pass
      through directly to the real-domain microkernel. Technically,
      only a -1 stride along the preferential storage direction will
      lead to the errors noted above, but who knows what other bad
      things might happen for other negative strides (and god forbid
      you put in a stride of 0...).
    
    (cherry picked from commit 028c5172be2994cf2dc9daf31b20c46455d4c36e)
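
The difference between the two checks can be sketched as follows, assuming a column-preferential real-domain microkernel. The function names are hypothetical stand-ins, not the BLIS routines, and the strides are plain `long` values for simplicity.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a contiguity test based on the absolute value
   of the stride: a column-preferential kernel wants unit row stride. */
bool is_col_contiguous_abs( long rs_c, long cs_c )
{
    (void)cs_c;
    return labs( rs_c ) == 1;
}

/* The same test with the additional positivity requirement on both
   strides before passing C straight through to the real-domain kernel. */
bool can_pass_through( long rs_c, long cs_c )
{
    return is_col_contiguous_abs( rs_c, cs_c ) && rs_c > 0 && cs_c > 0;
}
```

With `rs_c = -1` the first test still reports contiguous storage, while the second rejects it, so the caller would take a fallback path instead (for example, a temporary microtile).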

commit 965c667b2b7c7b229c5591a3e60b7ef51b21e6d1
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Jun 11 12:36:49 2025 -0500

    Allow runtime configuration selection by name. (#870)
    
    Details:
    - The `BLIS_ARCH_TYPE` environment variable currently only allows
      numerical values, which requires reading the source to select the
      appropriate value (and these values can change over time).
    - Implement selection by name (case insensitive), based on the names
      returned by `bli_arch_string` (typically the same as the folder name
      in the `config` directory).
    
    (cherry picked from commit 5718de15ead2c3fdc5df63fb3159c0c6bb63b3eb)
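
A minimal sketch of name-or-number parsing is shown below. The name table, its ordering, and the function names are assumptions for illustration only; the real ids and names are defined inside BLIS and may differ.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical name table; the real ids and names are defined by BLIS. */
static const char* const arch_names[] = { "skx", "haswell", "zen3", "generic" };
enum { N_ARCH = sizeof( arch_names ) / sizeof( arch_names[0] ) };

/* Case-insensitive string equality. */
static int eq_nocase( const char* a, const char* b )
{
    while ( *a && *b )
    {
        if ( tolower( (unsigned char)*a ) != tolower( (unsigned char)*b ) )
            return 0;
        a++; b++;
    }
    return *a == *b;
}

/* Return the table index for a configuration name or a numeric string,
   or -1 if the value is missing or not recognized. */
int parse_arch( const char* s )
{
    if ( s == NULL || *s == '\0' ) return -1;

    /* Try a case-insensitive match against the known names first. */
    for ( int i = 0; i < N_ARCH; i++ )
        if ( eq_nocase( s, arch_names[i] ) ) return i;

    /* Otherwise accept a purely numeric value within range. */
    char* end;
    long  v = strtol( s, &end, 10 );
    if ( *end == '\0' && v >= 0 && v < N_ARCH ) return (int)v;

    return -1;
}
```

A caller might then use `parse_arch( getenv( "BLIS_ARCH_TYPE" ) )` and fall back to automatic detection when the result is -1.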

commit b33cf051f5c65738f898bc68b5ce007d038c823d
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Jun 9 14:26:34 2025 -0500

    Fix the thread info node used in packing.
    
    Details:
    - Previously, the packing kernel used the wrong thread info
      node for packing, specifically, it used the node intended for
      the GEMM kernel. Normally this is OK since there is no additional
      thread partitioning between packing and the kernel. However,
      for some external applications, additional data needed to be
      allocated on the GEMM thread info node which conflicted with
      the packing buffer.
    - This commit uses the correct (parent) thread info node
      during packing.
    
    (cherry picked from commit 3e3355a4cffeccc17c5fbedb1e2144d6ad22e24d)

commit 9c941be53db3ae5e6cc3257e43f4e939220086f2
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Jun 7 14:13:18 2025 -0500

    CHANGELOG update (2.0-rc1)

commit 43c8b0085459c86b9bf7614dae399429e1865475
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Jun 7 14:13:18 2025 -0500

    Version file update (2.0-rc1)

commit 4e2f8f9071681a51abe7438ef06c884afeca59ec
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Jan 17 13:54:40 2025 -0600

    Update release instructions. (#837)
    
    Details:
    - Rename `RELEASING` to `RELEASING.md`.
    - Add additional structure and Markdown notation to `RELEASING.md`.
    - Add a section on the overall release and branching strategy.
    - Clarify and tweak instructions for making release candidates and releases.
    - Add instructions for making point releases and back-porting bug fixes.
    - Rename `build/start-new-rc.sh` to `build/do-release.sh`.
    - Tweak `do-release.sh` to do only common tasks for rcs, major releases, and point releases.
    - Add `-b` option to `do-release.sh` which does a "bare" release without a new branch or tag (for "dev releases" on master).
    - Update the version file on `master` to `3.0-dev` to reflect the new guidelines.
    
    (cherry picked from commit fb7ba1da524efa47011d95cfd8a9fee86018fcf0)

commit ac3e46f54c5c4a784644e12481b0a77894415005
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Jun 7 14:09:52 2025 -0500

    Update CREDITS and ReleaseNotes.md.

commit 60d8a4a499be9f9a7cacec375a28bf5a8bec7a30
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Jun 7 13:17:15 2025 -0500

    Fix arch definitions in CI.

commit af5de5bfd199c0ea8f369d81401ee823733c708c
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Jun 7 12:33:41 2025 -0500

    Turn on CI testing for release tracking branches.
    
    (cherry picked from commit 09e77a43651ba2673c3abac23093a595f1b3a920)

commit 1a281e0f8791a0e5c86bfae6afccebe6bd378d3b
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu May 1 10:11:59 2025 -0500

    Update CREDITS
    
    [ci skip]
    
    (cherry picked from commit 5097c599b58aecb7f990cc7bd7a5dad688a48df8)

commit a8f3d7efad027d06471a639db6ad85cba47f473c
Author: Atsushi Tatsuma <yoshoku@outlook.com>
Date:   Fri May 2 00:09:31 2025 +0900

    Fix to prevent is_win flag setting with clang on macOS (#867)
    
    Details:
    - In some cases, macOS was improperly detected as Windows due to a builtin preprocessor definition `#define TARGET_OS_WINDOWS 0`.
    - Update the detection to specifically look for `#define _WIN32` which more robustly detects Windows.
    
    (cherry picked from commit ec5b57289feaea755ff2eb4ab39511f3dd5879d6)

commit 97a441ada10e2dc9044616cccbc365cac0524062
Author: Minh Quan Ho <1337056+hominhquan@users.noreply.github.com>
Date:   Mon Apr 7 21:21:45 2025 +0200

    Examples: replace all 4.1f printm format by 4.3f (#865)
    
    Details:
    - This avoids possible misinterpretation of computation results printed on stdout (thanks Mason McBride for reporting it in #864).
    - Also force space for positive numbers to help with alignment.
    
    (cherry picked from commit 5d9e110a2aa58b6e5d131db9131bae0143f22f9f)
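
The effect of the format change can be seen with plain `snprintf`. The two wrapper functions are hypothetical and only isolate the old and new element formats; the exact strings passed to the print routine in the examples are assumed to be `%4.1f` and `% 4.3f`.

```c
#include <stdio.h>

/* Format one matrix element with the old format (one decimal place). */
void format_old( char* buf, size_t n, double x )
{
    snprintf( buf, n, "%4.1f", x );
}

/* Format one matrix element with the new format: three decimal places,
   and a leading space for non-negative values so columns line up with
   negative entries. */
void format_new( char* buf, size_t n, double x )
{
    snprintf( buf, n, "% 4.3f", x );
}
```

For example, a small nonzero value such as 0.04 appears as ` 0.0` with the old format and as ` 0.040` with the new one.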

commit c3497ed3645284fcd61fda719a47167173767fbd
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Apr 2 12:03:43 2025 -0500

    Fix for plugins without explicit optimized kernels.
    
    (cherry picked from commit 53d21cb478801d8e978082da2889e5e67d4221c9)

commit c90ecfb26c6cae4e3f288d74d4ac8d99283f081c
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sun Mar 2 09:08:35 2025 -0600

    Adjust CI testing (#860)
    
    Details:
    - Add tests for the `generic` config, including forcing broadcast-A,B which uses a different reference kernel. This uncovered a number of bugs, especially in `trsm`/`gemmtrsm` reference kernels, as well as diagonal packing.
    - Move threaded builds into main build and run `make check` once for each enabled backend.
    - Fix unused variable warnings in level-0 macros.
    - Fix `bli_tbastbbs_mxn` and add `bli_tcompressbbs_mxn`. The latter was missing from the reference `gemmtrsm` microkernel; it is needed because the B11 block is accumulated into, yet for complex datatypes the effective imaginary stride is non-unit when B is broadcast-packed.
    - Run all BLAS tests single-threaded.
    
    (cherry picked from commit 50054a6a7c0561d22720254ab6a9be1199ac10ab)

commit 92bbb13b934558b9cace3a9734a028b117983f63
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sun Mar 2 08:56:54 2025 -0600

    Fix check for SVE instructions which caused problems on Windows. (#859)
    
    * Fix check for SVE instructions which caused problems on Windows.
    
    Details:
    - The context initialization for `armsve` was using the HWCAP functionality of Linux to check if SVE instructions are actually available, since these are used to determine the register blocksizes. Naturally, this causes problems on Windows.
    - Instead, use functions from `bli_cpuid.c` to check for SVE. On Windows, no check is actually done and SVE is never detected.
    - In the case that the user specifically requests the `armsve` config on Windows, only enable this check for the whole `arm64` family and just assume SVE is available otherwise.
    
    * Blacklist armsve on Windows.
    
    (cherry picked from commit 37e52a613a6fec3fe1cde0ca018498a16b28a5dc)

commit 4715e59ebdc4fc710b5f53df4c1bc66374654046
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sun Mar 2 08:50:37 2025 -0600

    Fix problem in `bli_obj_imag_part`. (#861)
    
    Details:
    - When adjusting the buffer to point to the first imaginary element, the function `bli_obj_buffer_at_off` was used, which includes any currently set offsets, but then `bli_obj_set_buffer` was used, which sets the buffer *before* offsets are applied.
    - Now a matching `bli_obj_buffer` call is used to avoid any offsets.
    
    (cherry picked from commit 97084c75acd0ed104efc5da4dac0fb38a4a044f1)
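
The getter/setter mismatch can be illustrated with a toy object that stores a base buffer and an element offset. The struct and the `my_` functions are hypothetical simplifications and do not reproduce the BLIS object API.

```c
#include <stddef.h>

/* Hypothetical object: a base buffer plus an element offset that is
   applied whenever the data is accessed. */
typedef struct { double* buffer; size_t off; } my_obj_t;

double* my_obj_buffer( const my_obj_t* o )        { return o->buffer; }
double* my_obj_buffer_at_off( const my_obj_t* o ) { return o->buffer + o->off; }
void    my_obj_set_buffer( my_obj_t* o, double* p ) { o->buffer = p; }

/* Advance to the imaginary part (one real element later). Reading with
   the plain getter matches the plain setter, so the stored offset is
   applied only once, at access time. */
void my_obj_to_imag( my_obj_t* o )
{
    my_obj_set_buffer( o, my_obj_buffer( o ) + 1 );
}
```

Had the update read the pointer with `my_obj_buffer_at_off` instead, the offset would be baked into the base buffer and then applied a second time on the next access.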

commit d3fa776b2d374a072403e8796d5e4570461e683b
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Feb 27 13:48:17 2025 -0600

    Add new level-0 macro layer. (#830)
    
    Details:
    - Developed by @fgvanzee and @devinamatthews.
    - Level-0 scalar macros have moved from a name-based system (e.g. `bli_dcopys( ... )`) to a macro argument-based system (`bli_tcopys( d,d, ... )`).
    - All macros are explicitly mixed-type.
    - All input and output operands can have a distinct type (precision and/or domain). Unnecessary computations and spurious NaN/Inf propagation are avoided in mixed-domain cases.
    - All macros which do math (i.e. not copy/set/etc.) take an additional computational precision.
    - Tile-level macros, 1m, broadcast-B, and other extensions are also included.
    - All macros should correctly handle aliasing of input and output operands (this needs to be rigorously checked).
    - The macros work generically over the defined types -- new types only need limited support (primarily conversion to other types and basic math).
    - For code outside of core BLIS (optimized kernels, sandboxes, etc.), a selection of legacy macros have been added which translate to the new level-0 macros. Behavior is unchanged.
    - A standalone, templated C++ testsuite for the level-0 macros has been added. It is currently included as part of the CircleCI tests.
    - Const-correctness of level-0 macros is also checked.
    
    (cherry picked from commit a014a08189d05f45752f7ac23d8d42a24536fb93)
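
The general idea of selecting types through macro arguments can be sketched with token pasting. Everything below (the `my_` names, the type mapping, and the restriction to real types) is an assumption for illustration and is much simpler than the actual BLIS macros.

```c
/* Hypothetical mapping from a type character to a C type. */
#define my_ctype_s float
#define my_ctype_d double
#define my_ctype( ch ) my_ctype_##ch

/* Mixed-type copy in the macro-argument style: y := x, where chx names the
   type of x and chy names the type of y. Only real types are handled in
   this sketch, so chx is accepted but not needed for the conversion. */
#define my_tcopys( chx, chy, x, y ) \
    do { (y) = ( my_ctype( chy ) )(x); } while ( 0 )

/* Copy a double into a float using the macro. */
float narrow( double x )
{
    float y;
    my_tcopys( d, s, x, y );
    return y;
}
```

One macro name then covers every type combination, instead of one separately named macro per combination.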

commit 60119e7d9a85b8f14c45d049e5abd7a7c143e85c
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Feb 19 13:53:39 2025 -0600

    Update README.md
    
    Details:
    - Add status badge for CircleCI.
    - [ci skip]
    
    (cherry picked from commit 3c71737e426f8d567f1324b82609b4a61db670f8)

commit 584a936d6bb4d148304de805e34d9e8f1e50854d
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Feb 19 13:31:19 2025 -0600

    Do not use symbol aliases on macOS. (#856)
    
    Details:
    - The BLAS/CBLAS function `?gemmtr` is currently implemented as a symbol alias of the already-existing `?gemmt`. This does not work on macOS/Darwin.
    - Instead, use a minimal wrapper function which calls the appropriate existing BLAS/CBLAS function.
    - Also clean up the CBLAS prototypes a bit.
    
    (cherry picked from commit 14047f62d1fc746cbabe112197cfc1afe526a82a)
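
The wrapper approach amounts to defining a second function that forwards its arguments. The sketch below uses hypothetical one-argument functions in place of the real BLAS signatures.

```c
/* Hypothetical existing routine standing in for ?gemmt. */
int my_gemmt( int n ) { return 2 * n; }

/* Instead of declaring my_gemmtr as a symbol alias of my_gemmt (for
   example with a compiler alias attribute), define a thin wrapper that
   forwards to the existing routine. */
int my_gemmtr( int n ) { return my_gemmt( n ); }
```

The wrapper costs one extra call but relies only on standard C, so it does not depend on linker or object-format support for aliases.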

commit e5400a74c3e681357031d27eb375b1eef7cae2f6
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Feb 8 13:47:08 2025 -0600

    Add CircleCI (#855)
    
    Details:
    - This PR adds CircleCI testing in addition to TravisCI and Appveyor.
    - All of the same tests as on Travis are run, except that different hardware typically ends up being used (usually Zen on Travis, Xeon Platinum on Circle). This has actually exposed a couple of bugs (see #850 and #852).
    - The `travis` directory has been renamed to `ci` as it is now shared.
    - Running SDE on CircleCI is a bit problematic because glibc changed how CPUID detection is done. This requires running some architectures with different hardware definition files and forcing a config via `BLIS_ARCH_TYPE`.
    
    (cherry picked from commit 40a52dc0289f27f74a43e886ae14bf11738db169)

commit dffdaae41531ea2160cb21ae172d117dd896f3c4
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Feb 6 23:22:24 2025 -0600

    Fix problem with clang-14.0.0 and reference `gemm` ukr. (#854)
    
    Details:
    - clang 14.0.0 apparently makes some invalid assumptions about whether
      or not the AB microtile is initialized in the `gemm` reference
      microkernel. This leads to the "scale by alpha" part doing something
      strange (all sorts of random and even NaN values pop up). I do not
      know why this only manifested for `ztrsm` on `skx` (in
      `zgemm_skx_ref` via `zgemmtrsm_skx_ref`). See #852.
    - Aliasing the AB microtile (in the proper datatype) as a pointer to
      a raw character array, and then initializing the character array
      with `= { 0 }` convinces the compiler to do the right thing.
    - The problem did not occur in 14.0.6 or 15.0.7. It may only be a narrow
      band of versions which are problematic.
    - This commit adds the char array workaround and fixes #852.
    
    (cherry picked from commit 028be422e306986674f7b1d96b99153bf2a6477e)
    (cherry picked from commit a0d7f26ba37689d351963c276711e4f51bf99e3e)

commit c091c25d116179619ffd61d9824cba55372b6e52
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Feb 5 16:10:37 2025 -0600

    Increase the max size for stack buffers. (#851)
    
    Details:
    - See #850 for details on the problem.
    - This is a temporary fix which should work for sdcz data types.
    - Altra architectures may still not fully work for MP/MD as the stack buffer size is hard-coded.
    
    (cherry picked from commit 5ad37a860b191f905a7ed895280a8057573ae909)

commit 84e2ed773972276426ecf25b2037ec642d1ac15f
Author: M. Zhou <cdluminate@gmail.com>
Date:   Wed Feb 5 14:07:01 2025 -0800

    Alias *gemmt_ as *gemmtr_ to fix lapack 3.12.1 compatibility. (#849)
    
    Details:
    - Alias `?gemmt_` as `?gemmtr_` to fix lapack 3.12.1 compatibility. (Fixes #848)
    - Add the `?gemmtr_` and `cblas_?gemmtr` aliases to the symbol list.
    - Also alias `cblas_?gemmt` as `cblas_?gemmtr` for lapack 3.12.1 compatibility.
    
    (cherry picked from commit a6f2ce9dd53fbe099650d322fa69b21a3be10fb0)

commit 9e96bcda6a126ff6bcba381bc2add2849cb792aa
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Jan 16 17:07:44 2025 -0600

    Create a new type to represent IDs for all kernels, blocksizes, etc.
    
    Details:
    - Currently, all enums used to represent built-in kernel IDs, blocksizes, preferences, and operation IDs have a special member equal to `BLIS_VA_END`, which in turn is `(siz_t)-1`. In principle, this would force the underlying type used to represent the enum values to be as wide as `siz_t`, particularly when passed to the variadic function `bli_cntx_set_ukrs` and friends. User-registered kernel IDs and such are of type `siz_t` explicitly. However, gcc (12 and older), clang, and icx pass literal enum constants (e.g. `BLIS_MR`) that are small enough as `int` when 32-bit mode is used (`-m32`). This causes a misalignment of the parameters on the stack and ultimately a segfault. The problem also exists in 64-bit mode with clang and icx and on aarch64 with clang, as parameters far enough down the list to go on the stack do not get the upper 4 bytes initialized.
    - This commit introduces a new type `kerid_t` which is always `uint32_t`. This type is used for all kernel, blocksize, preference, and operation IDs (including user-registered ones). It is also used for `BLIS_VA_END`.
    - Now all enum values are always passed as 32-bit ints on all architectures.
    - Fixes #839.
    - [cherry-picked from 32cc0ae3]
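    
    A minimal sketch of the idea, assuming a platform where `int` is 32 bits wide so that `uint32_t` is unchanged by the default argument promotions. The names `demo_kerid_t`, `DEMO_VA_END`, and `demo_sum_ids` are hypothetical; only the use of one fixed-width type for every variadic argument, terminator included, is taken from the description above.
    
    ```c
    #include <stdarg.h>
    #include <stdint.h>
    
    /* Hypothetical fixed-width ID type and list terminator. */
    typedef uint32_t demo_kerid_t;
    #define DEMO_VA_END ( ( demo_kerid_t )-1 )
    
    /* Sum the IDs in a DEMO_VA_END-terminated argument list. Every argument
       is read back as the same 32-bit type it was passed as. */
    uint32_t demo_sum_ids( demo_kerid_t first, ... )
    {
        uint32_t sum = 0;
        va_list  args;
    
        va_start( args, first );
        for ( demo_kerid_t id = first; id != DEMO_VA_END;
              id = va_arg( args, demo_kerid_t ) )
            sum += id;
        va_end( args );
    
        return sum;
    }
    ```
    
    Callers should pass each ID as `demo_kerid_t` (for example via a cast) so that the width written matches the width read.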

commit 19573faaba3baca0a58663eec4ad8716846bc811
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jan 14 18:18:59 2025 -0600

    CHANGELOG update (2.0-rc0)

commit 215b76cfd82ce78fdbbd6786a3743ef6ae444f9a
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jan 14 18:18:58 2025 -0600

    Version file update (2.0-rc0)

commit 790b995f8f351d0147d3c9d781c3dab549317af1
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jan 14 18:15:40 2025 -0600

    ReleaseNotes.md update.

commit 0ef4580a5d459270af6e9ee2971c14bb91315fb8
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jan 14 17:17:36 2025 -0600

    Add documentation for plugins (#820)
    
    Add documentation for the plugin system and for modifying the control tree to make custom operations.
    
    Details:
    - `docs/PluginHowTo.md` describes in a "tutorial style" how to implement a custom BLAS-like operation by creating a plugin and then modifying the `gemm` control tree to achieve the desired effect.
    - Briefly, plugins allow users to add new kernels and associated block sizes/preferences to BLIS without modifying the BLIS source code. User-provided kernels are compiled using the BLIS build system for configured architectures and selected at runtime based on the actual hardware.
    - To implement custom operations, users can combine their own kernels (and/or existing BLIS kernels) with a customized control tree, which represents the specific algorithmic steps. Users can customize the kernels to be used for packing and for computation, extra information passed to kernels (e.g. additional parameters or data), block sizes, etc. An API is provided for modifying the default `gemm` control tree (also used for other level-3 operations, except `trsm`).
    - [cherry-picked from 5cb70d8e]

commit 1426d6fe5ffb5a90514cf3cf0b248322e1d172bd
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jan 14 17:05:06 2025 -0600

    CREDITS file update.

commit b36bc95693091d1777b74eeb14d29ac8e76760a3
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Oct 10 14:48:45 2024 -0500

    Fix some aspects of the control tree/plugin infrastructure (#827)
    
    Details:
    - Use configure-time variable substitution rather than the PNAME macro
      to generate symbol names in plugins. This makes it much easier for
      users to see what names their symbols will have (and to change them
      if desired).
    - Use 'siz_t' rather than 'ukr_t' for anything dealing with kernel IDs
      (and similar for blocksizes and kernel preferences). Because users
      can now register new kernels, the values of the IDs for their custom
      kernels are no longer enumerated in 'ukr_t', which causes type
      conversion problems. This requires also being careful about the type
      of BLIS_VA_END and forcing existing enumerations like 'ukr_t' to be
      represented using integers of the same width as 'siz_t'.
    - Modify the gemm control tree initialization function to indicate
      whether or not the operation as a whole was transposed. This is
      needed if users have to treat the initial A and B differently in the
      control tree, for example in a tensor times matrix operation (if
      transposed to matrix times tensor, we need to know which "matrix"
      object is now the tensor).

commit 8d9be878b1a59aba401fd0d7b1b24c34526f0e81
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Thu Aug 8 14:41:30 2024 -0500

    Flatten cblas.h immediately after blis.h. (#819)
    
    Details:
    - Previously, if the user enabled CBLAS via 'configure --enable-cblas'
      and then ran 'make', the flattened blis.h header file would be created
      immediately, but the flattened cblas.h header file would not be
      created until 'make install' was run. This was happening because
      nothing in the BLIS build process (except installation) depended on
      the flattened cblas.h (whereas *everything* depends on the flattened
      blis.h, and therefore it was being created first). This behavior can
      be confusing to application developers who could reasonably expect
      that the flattened cblas.h header would be available (to inspect or
      use) prior to running 'make install'.
    - This commit fixes the aforementioned issue by (1) adding cblas.h (if
      CBLAS is enabled) as a dependency to all of the build rules for core
      framework object files, and (2) making the flattened blis.h a
      prerequisite for flattening cblas.h. The upshot is that (1) ensures
      that the flattened cblas.h is created around the same time that
      the flattened blis.h is created, and (2) ensures that the two headers
      are flattened sequentially (first blis.h and then cblas.h) even when
      using 'make -j[n]', which ensures that the outputs of the two processes
      do not commingle.
    - Thanks to Jeff Diamond for reporting this issue.

commit a822cb2e22b7ac0c6aec4d477f93301ccf65a296
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Thu Aug 8 13:34:37 2024 -0500

    Fixed out-of-bounds read bug in sup haswell ukr. (#824)
    
    Details:
    - Fixed a bug in the bli_sgemmsup_rd_haswell_asm_1x16n() millikernel.
      The kernel was erroneously performing an out-of-bounds read whenever
      the singleton edge case loop executed (that is, whenever the k
      dimension of the millikernel problem was not a multiple of 8). This
      OOB error was the result of a copy-paste bug; when developing the
      s1x16n function, I started from a copy of the s2x16n function, but
      then failed to delete the instruction that reads the second element
      of A in the code that handles the PR loop's edge case. Thanks to
      @j-bm for reporting this bug in Issue #821 and helping narrow down
      the cause to the rax register.
    - CREDITS file update.
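    
    The loop structure at issue can be sketched in C as follows. This is only an illustration under stated assumptions: the real kernel is hand-written assembly, and `demo_dot_1row` is a hypothetical name for a dot product of one row of A with one column of B.
    
    ```c
    #include <stddef.h>
    
    /* Dot product of a single row of A with a single column of B. */
    float demo_dot_1row( size_t k, const float* a, const float* b )
    {
        float  acc    = 0.0f;
        size_t k_main = k - ( k % 8 );
        size_t p;
    
        /* Main loop: eight elements of the k dimension per iteration. */
        for ( p = 0; p < k_main; p += 8 )
            for ( size_t q = 0; q < 8; ++q )
                acc += a[ p + q ] * b[ p + q ];
    
        /* Edge loop: one element per iteration, executed only when k is not
           a multiple of 8. With a single row of A, only a[ p ] may be read
           here; loading an element from a second row as well (as a two-row
           kernel would) reads out of bounds. */
        for ( ; p < k; ++p )
            acc += a[ p ] * b[ p ];
    
        return acc;
    }
    ```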

commit 8820f8f91efd32e38e2995e73323656ef767bbd8
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Tue Jun 25 22:56:23 2024 -0500

    Fixed typo in 4158930; variable renames. (#815)
    
    Details:
    - Fixed a typo in the "./configure --help" output for the ScaLAPACK
      compatibility option implemented in 4158930.
    - Trivial variable renames.

commit 31ecf820b9eb3368ad907ae6b192bf7397ebc92c
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Jun 20 18:23:23 2024 -0500

    Fix a bug in the piledriver microkernels. (#814)
    
    Details:
    - At some point, the piledriver (and bulldozer and excavator)
      microkernel tests via SDE had been removed from Travis CI testing.
      This PR re-enables them.
    - A bug in the piledriver complex gemm microkernels has also been
      fixed. The `beta*C` product was not being correctly added to the `A*B`
      product before writing back out to memory.
    - Fixes #811.

commit 415893066e966159799d96166cadcf9bb5535b1c
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jun 18 22:03:32 2024 -0500

    Add ScaLAPACK compatibility mode. (#813)
    
    Details:
    - Add configure options `--enable-scalapack-compat` and `--disable-scalapack-compat`
      (default disabled).
    - Add a macro `BLIS_{ENABLE,DISABLE}_SCALAPACK_COMPAT` to bli_config.h.
    - This option and macro control any changes to the API necessary to maintain
      compatibility with ScaLAPACK. Currently, this only means disabling the complex
      versions of `syr`, `syr2`, and `symv`. In the future, other changes could be
      controlled by the same flag.
    - Complex `syr2` wasn't enabled at the same time that complex `syr` and `symv` were.
      This is now corrected.
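    
    A minimal sketch of how such a macro might gate part of an API. Only the macro name `BLIS_ENABLE_SCALAPACK_COMPAT` comes from the description above; `DEMO_HAS_COMPLEX_SYR` and `demo_has_complex_syr` are hypothetical and not part of BLIS.
    
    ```c
    /* In BLIS, BLIS_ENABLE_SCALAPACK_COMPAT or BLIS_DISABLE_SCALAPACK_COMPAT
       would come from bli_config.h. This sketch only checks whether the
       former is defined. */
    #ifdef BLIS_ENABLE_SCALAPACK_COMPAT
      #define DEMO_HAS_COMPLEX_SYR 0
    #else
      #define DEMO_HAS_COMPLEX_SYR 1
    #endif
    
    /* Hypothetical helper: reports whether the complex variants are exposed. */
    int demo_has_complex_syr( void )
    {
        return DEMO_HAS_COMPLEX_SYR;
    }
    ```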

commit 5cbec6503de335b3b63fa5d4f388fddd3aff2b61
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jun 4 11:30:22 2024 -0500

    Update CREDITS

commit 729c57c15aa50030145ff702626c31839ded3502
Author: AngryLoki <AngryLoki@users.noreply.github.com>
Date:   Wed Jun 5 00:28:41 2024 +0800

    Fix SyntaxWarning messages from python 3.12 (#809)
    
    Details:
    - When using regexes in Python, certain characters need backslash escaping, e.g.:
      ```python
      regex = re.compile( '^[\s]*#include (["<])([\w\.\-/]*)([">])' )
      ```
      However, technically escape sequences like `\s` are not valid and should actually be double-escaped: `\\s`.
      Python 3.12 now warns about such escape sequences, and in a later version these warnings will be promoted
      to errors. See also: https://docs.python.org/dev/whatsnew/3.12.html#other-language-changes. The fix here
      is to use Python's "raw strings" to avoid double-escaping. This issue can be checked for all files in the current
      directory with the command `python -m compileall -d . -f -q .`
    - Thanks to @AngryLoki for the fix.

commit 6d0ab74f6975fdf4d19cee06d946b09b6ca89656
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Mon May 6 16:02:03 2024 -0500

    Updates to README.md section on downloading.
    
    Details:
    - Updated the text in README.md in the "How to Download BLIS" section.
      The new text no longer recommends that the reader use the 'master'
      branch over official releases, as the previous text did. The text was
      tweaked since (a) the 'master' branch is now akin to a development
      branch, and (b) the reader will no longer forgo bugfixes by sticking
      to official releases since we will (going forward) publish bugfix
      releases for the most recent version.

commit 01e151a9658cbe07ee0cac8b03fa13fef26df19e
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Mon May 6 15:37:27 2024 -0500

    Updated RELEASING file; fixes to ReleaseNotes.md.
    
    Details:
    - Updated RELEASING file to reflect new release protocols, given the
      more sophisticated policy of maintaining release candidate branches
      separate from 'master' (which is now more akin to a development
      branch). Further refinements to this file will likely follow.
    - Fixed typos in ReleaseNotes.md. Thanks to Robert van de Geijn for
      reporting these.

commit 06dddf1e51ccff70d77ee8cb731c3217e70eb730
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Mon May 6 13:47:42 2024 -0500

    ReleaseNotes.md update.

commit a876918c8c79a1c3d3d95de1f283350b7249b8ae
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Mon May 6 13:37:48 2024 -0500

    CHANGELOG update (1.0)

commit c2af113c7ba6d0dcc128ba36ec6e140d89180cf3
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Mon May 6 13:37:47 2024 -0500

    Version file update (1.0)

commit 5ab286f61525f8ead35ecc258305a5ccd4ee096b
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Mon May 6 13:14:52 2024 -0500

    Added a script to help create new rc branches.
    
    Details:
    - Added a new script, build/start-new-rc.sh, which:
      1. Updates the version file with a new version string.
      2. Commits (locally) the version string update.
      3. Updates the CHANGELOG file with the output of 'git log'.
      4. Commits (locally) the CHANGELOG file update.
      5. Creates a new branch whose name is equal to "<vers>-rc0" where
         <vers> is the new version string.
      6. Reminds the user to execute some final steps if everything looks
         good.
      This new script will help in the future when it's time to start a new
      release candidate branch/lineage off of 'master'. Note that this
      script is based on build/bump-version.sh (which itself may change in
      the future due to changes in the way versions/releases will be handled
      going forward).

commit cad51491e8a0b306015a5a02881dc2a9b60dd8d9
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Tue Apr 30 16:46:54 2024 -0500

    Use "-i auto" by default in test/3 drivers.
    
    Details:
    - Request default induced method behavior of BLIS via "-i auto" when
      running the standalone performance drivers in test/3 via the runme.sh
      script present in that directory. (Previously, the runme.sh script
      would use "-i native" by default.) This change was originally intended
      for fd1a7e3.

commit fd1a7e3ca9547718aa61c806848099705216182b
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Thu Apr 25 15:00:59 2024 -0500

    Allow test/3 drivers to use default ind_t method. (#804)
    
    Details:
    - Previously, the standalone performance drivers in test/3 were written
      under the assumption that the user would want to explicitly test
      either native execution *or* 1m. But because the accompanying runme.sh
      script defaults to passing "native" in for the -i command line option
      (which explicitly sets the induced method type), running the script
      without modification causes the test drivers to use slow reference
      microkernels on systems where native complex-domain microkernels are
      not registered -- which will yield poor performance for complex-domain
      level-3 operations. Furthermore, even if a user was aware of this, the
      test drivers did not support any single value for the -i option that
      would test BLIS using the library's default behavior -- that is, using
      1m on systems where it is needed and native execution on systems that
      have native microkernels implemented and registered.
    - This commit addresses the aforementioned issue by supporting a new
      value for the -i option: "auto". The "auto" value causes the driver
      to avoid explicitly setting the induced method altogether, leaving
      BLIS's default behavior in place. This "auto" option is also now the
      default setting within the runme.sh script. Thanks to Leick Robinson
      for finding and reporting this issue.
    - Also added support for "nat" as a shorthand for "native", which
      the help text already (erroneously) claimed was supported.

commit a49238e6141c96a41aa3c2a4adb0b0663d0b4968
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Apr 24 15:07:18 2024 -0500

    Refactor the control tree and other infrastructure (#710)
    
    Details:
    1. A "plugin" architecture.
    - Users are now able to register new kernels, kernel preferences, and
      blocksizes at runtime, directly from user applications.
    - Plugins can be created, configured, and built using only an installed
      version of BLIS -- no source or source code changes required.
    - Plugins support both reference and optimized kernels, as well as
      custom configuration-to-kernel-set mappings.
    - Building plugins (including reference and relevant optimized kernels)
      for enabled architectures or architecture families is automated, as is
      linking into the final library.
    - The configure script is now installed as 'configure-plugin'. In this
      mode, it can be used to initialize a plugin from a template including
      optional example code, and prepare a build system for compiling the
      plugin into a shared or static library.
    - Additional configuration files, templates, and build system components
      are also installed to '%prefix%/share/blis'.
    - The cntx_t struct now has extensible data structures for holding
      kernels, preferences, and blocksizes. These are based on a "stack"
      structure which contains a list of fixed-size data blocks. Adding a
      new entry (which may require allocating a new block or reallocating
      the block pointer array) requires locking, but looking up entries is
      lock-free and takes O(1) time.
    - Kernels can depend on either 1 or 2 type parameters (e.g.
      mixed-precision packing requires 2). The func2_t struct supports
      the latter, but can be implicitly cast to func_t if only "diagonal"
      entries are needed. The number of type parameters can be inferred from
      the kernel ID for type safety.
    - Functions have been added to register new kernels, preferences, and
      blocksizes with the global kernel structure (gks). This creates
      corresponding entries in each allocated context and returns the next
      available ID. Plugins use this API to register user kernels, although
      the user is responsible for tracking the returned IDs for later
      lookup. Setting newly-registered reference kernels, as well as
      overriding these with optimized kernels is done in exactly the same
      manner as in bli_cntx_init_ref() and bli_cntx_init_<subconfig>().
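    
    The "stack" of fixed-size blocks mentioned above can be sketched as follows. All names (`demo_stack_t`, `demo_stack_push`, and so on) and the block length are hypothetical, entries are plain `int` values rather than kernel pointers, and the locking a real implementation needs on insertion is omitted.
    
    ```c
    #include <stdlib.h>
    
    enum { DEMO_BLOCK_LEN = 4 };  /* entries per block; arbitrary here */
    
    typedef struct
    {
        int**  blocks;      /* array of pointers to fixed-size blocks */
        size_t num_blocks;
        size_t size;        /* number of entries in use */
    } demo_stack_t;
    
    /* Append a value and return its ID, or (size_t)-1 if allocation fails.
       A real implementation would hold a lock for the duration. */
    size_t demo_stack_push( demo_stack_t* s, int value )
    {
        if ( s->size == s->num_blocks * DEMO_BLOCK_LEN )
        {
            /* Grow the block pointer array, then allocate one new block. */
            int** blocks = realloc( s->blocks,
                                    ( s->num_blocks + 1 ) * sizeof( *blocks ) );
            if ( blocks == NULL ) return ( size_t )-1;
            s->blocks = blocks;
    
            int* block = malloc( DEMO_BLOCK_LEN * sizeof( *block ) );
            if ( block == NULL ) return ( size_t )-1;
            s->blocks[ s->num_blocks++ ] = block;
        }
    
        s->blocks[ s->size / DEMO_BLOCK_LEN ][ s->size % DEMO_BLOCK_LEN ] = value;
        return s->size++;
    }
    
    /* O(1) lookup: one division and one remainder select block and slot. */
    int demo_stack_get( const demo_stack_t* s, size_t id )
    {
        return s->blocks[ id / DEMO_BLOCK_LEN ][ id % DEMO_BLOCK_LEN ];
    }
    
    /* Release all blocks and reset the stack to empty. */
    void demo_stack_free( demo_stack_t* s )
    {
        for ( size_t i = 0; i < s->num_blocks; ++i ) free( s->blocks[ i ] );
        free( s->blocks );
        s->blocks     = NULL;
        s->num_blocks = 0;
        s->size       = 0;
    }
    ```
    
    Because existing blocks are never moved or resized, an entry's address stays valid as the structure grows.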
    
    2. Restructuring of the control and thread control trees.
    - The control tree has been substantially restructured to support more
      flexibility.
    - The "default" control trees for gemm (also used for
      hemm/symm/herk/her2k/syrk/syr2k/trmm/trmm3) and trsm are now
      represented as a single structure containing all necessary control
      tree nodes and parameters.
    - An API has been added to modify the default gemm/trsm control trees.
    - This same API is used by the framework and packm/gemm/trsm variants
      to access specific control tree nodes.
    - Users can alternatively create a custom control tree from scratch.
    - The blocksizes are now encoded directly in the control tree, rather
      than via loop IDs. The logic for adjusting blocksizes for certain
      operations has been moved to the control tree initialization.
    - Type information is encoded in the control tree to drive proper
      selection of packing and computational kernels provided by the user.
    - The packing microkernel now receives an opaque "params" struct which
      is user-definable and can be used to pass additional information
      through the call stack.
    - The auxinfo_t struct has been updated with a .params field for
      opaque user data as well as the global offsets of the current
      microtile.
    - The packm and gemm variants can be overridden by the user, and also
      receive an opaque params struct via the associated control tree
      node.
    - The structure-aware packing kernel bli_packm_struc_cxk() is no longer
      hard-coded to be called from the default packm variant, but can be
      overridden by the user. It also supports mixed-precision/mixed-domain
      natively now.
    - The thread control tree (thrinfo_t) is now created entirely up-front
      by inspecting the control tree. The required number of threads at each
      level is encoded in the control tree via loop IDs (actually a bitfield
      of loop IDs), although the ordering and number of such IDs is
      arbitrary. The logic for adjusting the number of threads at each level
      based on operation type (e.g. trmm) is now in the control tree
      initialization and expressed by combining loop IDs from multiple
      levels into a single level.
    - The mem_t object containing the pack buffer pointer has been moved
      from the control tree to the thread control tree. NOTE: **The control
      tree is now strictly const throughout the operation, and only a
      single copy is shared by all threads.**
    - The thread control tree node for packing has been changed so that
      there is no longer a "fake" node indicating a team of single threads.
      Instead, the number of threads and thread IDs in the "normal" thread
      control tree node are used. This change has also been made to the
      gemmsup thread control tree and packing variants, as well as to the
      gemmlike sandbox.
    - Parameters controlling packing (e.g. inversion of the diagonal,
      direction, schema) are not stored directly in the control tree but in
      the opaque params struct. The packing control tree node and its
      default params struct are stored together in the "combined"
      gemm/trsm control tree structure and initialized as a unit. Users can
      update these parameters individually or substitute a custom packm
      variant and params struct.
    - The "target" and "execution" datatypes have been removed from the obj_t
      struct and replaced by type information in the control tree.
    - The "sub-node" and "sub-prenode" of a control tree node have been
      replaced by an arbitrary number of sub-nodes accessed by index. There
      is a hard cap on the number of sub-nodes (currently 2). Sub-nodes are
      added during control tree initialization, *after*
      creation/initialization of the parent node through an updated API.
    - The level-3 thread decorator has been significantly simplified and
      directly calls bli_l3_int(). The control tree is created externally,
      and it is no longer necessary to alias matrices or set object pack
      schemas. Also, the rntm_t passed in may be NULL. Finally, family
      and scalar information is no longer needed here.
    - bli_l3_int() is now a simple inline function which extracts the next
      control tree node and variant and calls it.
    - bli_*_front() have been removed and inlined into the expert object
      API with significant simplification.
    - 1m (or other induced method) no longer uses an alternative cntx_t.
    - The .pack_fn/.ker_fn pointers and associated params fields on the
      obj_t were removed in favor of the present solution.
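    
    One way to picture the bitfield of loop IDs is sketched below. It assumes that loops combined into a single level multiply their ways of parallelism; that assumption, the bit layout, and the names `DEMO_NUM_LOOPS` and `demo_level_ways` are illustrative and not taken from the BLIS sources.
    
    ```c
    enum { DEMO_NUM_LOOPS = 5 };  /* hypothetical number of loops */
    
    /* Return the number of threads needed at a level whose loop IDs are
       given as a bitfield: bit i set means loop i contributes ways[ i ]. */
    unsigned demo_level_ways( unsigned loop_bits,
                              const unsigned ways[ DEMO_NUM_LOOPS ] )
    {
        unsigned n = 1;
    
        for ( unsigned i = 0; i < DEMO_NUM_LOOPS; ++i )
            if ( loop_bits & ( 1u << i ) )
                n *= ways[ i ];
    
        return n;
    }
    ```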
    
    3. Overhaul of variable substitution in configure script.
    - The configure script has been somewhat re-written to use a
      centralized mechanism for substituting variables into build system and
      other configuration files.
    - All substitution variables go through the same pathway now, which
      necessitated some variable naming changes for variables which were
      named the same in e.g. Makefile and bli_config.h but with
      different definitions.
    - CC and CXX variables can now contain spaces, e.g. 'g++ -std=c++17'.
      This provides better support for integration with build tooling such
      as autotools.
    
    4. Overhaul of packing kernels.
    - Previously there were two packing kernels referenced in the cntx_t
      structure for MRxk and NRxk shaped micropanels, respectively. These
      have now been merged into one kernel which is responsible for packing
      any dense rectangular portion of either A or B.
    - The packing kernel now receives information about the register
      blocksize (cdim_max) and duplication factor (the "broadcast-B"
      format, although this can also apply to the A matrix).
    - The structure-aware packing kernel (bli_packm_struc_cxk(), which is
      now user-overridable) also receives global offsets of the current
      micropanel within A or B.
    - Explicit kernels for packing the diagonal blocks of
      triangular/symmetric/Hermitian matrices have been added to the
      cntx_t. This means that the bli_packm_struc_cxk() "kernel" no longer
      needs to directly touch data (except to zero out some regions).
    - bli_packm_struc_cxk() has also been updated to work only in terms of
      fundamental elements (i.e., real datatypes) when computing offsets and
      when zeroing data, which greatly simplifies mixed-domain/1m packing.
    - bli_packm_scalar() has been updated to better support complex scalars
      in mixed-domain operations.
    - Pack schemas for PACKED_ROW_PANELS* and PACKED_COL_PANELS* have
      been merged into simply PACKED_PANELS*. This reflects the merging of
      the packing kernels into a single generic kernel. There were only a
      very few places which needed the row/column information and this is
      now supplied by alternative means.
    - Packing variants always behave "as if" the A matrix were being packed
      (i.e. the code assumes packing column-stored row panels). Packing of B
      is handled by applying an implicit or explicit transpose before
      packing. This change also applies to gemmsup.
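    
    A single generic packing routine of this kind can be sketched as follows. The name `demo_pack_cxk` and its signature are hypothetical, and the duplication factor, scaling, and structured cases are left out.
    
    ```c
    #include <stddef.h>
    
    /* Pack a cdim x k block of a (row stride rs, column stride cs) into p,
       stored column by column with leading dimension cdim_max. Rows cdim
       through cdim_max - 1 of each column are filled with zeros. */
    void demo_pack_cxk( size_t cdim, size_t cdim_max, size_t k,
                        const double* a, size_t rs, size_t cs, double* p )
    {
        for ( size_t j = 0; j < k; ++j )
        {
            for ( size_t i = 0; i < cdim; ++i )
                p[ i + j * cdim_max ] = a[ i * rs + j * cs ];
    
            for ( size_t i = cdim; i < cdim_max; ++i )
                p[ i + j * cdim_max ] = 0.0;
        }
    }
    ```
    
    In this sketch, packing a panel of B amounts to calling the same routine with the row and column strides swapped, which is one way to apply the implicit transpose mentioned above.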
    
    5. Improved MD/MP support.
    - All level-3 operations (except trsm) now support full
      mixed-domain/mixed-precision operation.
    - Explicit 1m packing kernels have been added in the cntx_t.
    - An explicit 1m microkernel wrapper has been added to the cntx_t.
    - An extra packing kernel for the "ro" format has been added, along with
      the pack_t enumeration value. This supports the packing for
      real*complex -> real, including potential scaling by a complex alpha,
      support for structured matrices, etc.
    - Extra microkernel wrappers for mixed-domain operations have been added
      to support the 'ccr' (and by extension, 'crc'), 'rcc', and 'crr'
      cases. Notably this includes full support for general stride storage
      and complex alpha/beta.
    - Packing kernels and gemm microkernels are now "templated" based on two
      type parameters rather than one. For packing this allows direct
      optimization of mixed-precision kernels, and for gemm microkernels
      this allows direct optimization of mixed-precision without writing to
      a temporary buffer. Reference packing kernels are directly
      instantiated for all mixes of precisions, while by default
      mixed-precision gemm microkernels are supported via a microkernel
      wrapper. The "old" way of specifying optimized kernels using a single
      type parameter works unchanged.
    - alpha and beta are typecast appropriately to the computational or
      output datatype, respectively, and **always** to the complex domain.
      Scalar typecasting has also been added to gemmsup for safety.
    - The gemm macrokernel doesn't have to do any typecasting anymore, as a
      microkernel wrapper or optimized mixed-precision/mixed-domain kernel
      now handles this.
    - 1m and mixed-domain operations now always use a microkernel wrapper,
      rather than adjusting parameters in the gemm macrokernel.
    - The gemmt macrokernel **does** still have to handle explicit
      write-back of microtiles which intersect the diagonal, although
      typecasting has already been performed.
    - The gemmt_x_ker_var2(), trmm_xx_ker_var2(), and trsm_xx_ker_var2()
      functions have been removed. The appropriate macrokernel pointer is
      selected during control tree initialization.
    - Real domain MR/NR are checked for even-ness based on the gemm
      microkernel's row preference in order to guarantee proper 1m and
      mixed-domain operation.
    - Full range of mixed-domain/mixed-precision functionality tested in the
      testsuite ('input.*.mixed').
    
    6. Other changes:
    - The build system has been updated to support C++ source files
      throughout the framework. While the intent is not to add such files to
      BLIS itself, this supports plugins written in C++.
    - Many instances of configuration-specific code have been simplified by
      introducing an INSERT_GENTCONF macro which instantiates a block of
      code for each enabled sub-configuration. The ConfigurationHowTo.md
      document has been updated accordingly.
    - PASTEMAC?/PASTECH?/PASTEF77? have been removed in favor of
      variadic macros which accept any number of arguments (up to a
      reasonable limit).
    - The INSERT_GENTFUNC* macros have been updated to clean up
      mixed-precision and mixed-domain instantiations.
    - bli_align_dim_to_mult() has been updated to support rounding either up
      or down based on a flag.
    - Checking for empty matrices and other early exits (level-3 only) has
      been consolidated into a single utility function.
    - The auxinfo_t struct is always passed as const.
    - The new function bli_obj_alias_submatrix() aliases a matrix while also
      resetting the root to NULL, offsets to zero (while adjusting the
      buffer), and applying any implicit transpose.
    - Level-3 pruning functions now only check matrix structure to see what
      to do, not the operation family.
    - gemmsup packing has been updated to use the "normal" pack buffer
      allocation routines.
    - Remove duplicate checks for early return from gemmsup handler.
    - bli_determine_blocksize() has been significantly simplified.
    - Partitioning packed panels is no longer allowed.
    - Added bli_xxsame macros.
    - Automated the calculation of info bit shifts and masks based on
      predefined bit sizes for various flags. This greatly simplifies
      reordering, adding, or removing flags from the info/info2 bitfields.
    - Moved more BLIS_NUM_* macros into the corresponding enums as the
      last entry so that the value is automatically computed.
    - Better const-correctness in some level0 scalar macros.
    - Better mixed-precision support in some level0 scalar macros.
    - Added a bli_axpbys_mxn() macro.
    - bli_thread_range_sub() takes explicit thread ID and number of threads
      rather than a thrinfo_t node.
    - "De-templated" BLIS gemmlike sandbox (specifically, bls_gemm_bp_var1()
      and bls_packm_var1()).
    - Combined bls_l3_packm_[ab]() into one function with thin wrappers.
    - Deleted bls_packm_var[23]().
    - Add a "termination tag" to the testsuite output so that
      'make check-blis' can accurately check for successful completion.
    - Add a new function to centrally compute FLOPs for level-3 operations
      in the testsuite.
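
    Illustration of the bli_align_dim_to_mult() item above: a minimal
    sketch of rounding a dimension to a multiple in either direction based
    on a flag. The name demo_align_dim_to_mult() and its signature are
    hypothetical and are not taken from the BLIS source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not the BLIS signature): round `dim` to a multiple
   of `mult`, upward when `round_up` is true and downward otherwise. */
long demo_align_dim_to_mult(long dim, long mult, bool round_up)
{
    assert(dim >= 0 && mult > 0);

    long rem = dim % mult;

    if (rem == 0) return dim;              /* already a multiple */

    return round_up ? dim + (mult - rem)   /* next multiple      */
                    : dim - rem;           /* previous multiple  */
}
```

    With dim = 10 and mult = 4, the flag selects between 12 and 8.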

commit a316d2c6c33fc1f8f7c58c4210ab203f48349041
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Mar 28 12:52:00 2024 -0500

    Fix incorrect commenting of `BLIS_RNTM_INITIALIZER` and `BLIS_OBJECT_INITIALIZER`.

commit 664cc6bc3ea610b4ecea63d78c6024c48f045635
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Mar 26 16:25:17 2024 -0500

    Update BLIS_*_INITIALIZER macros for C++ compatibility. (#802)
    
    Details:
    - Remove designated initializer syntax. This isn't officially supported
      until C++20.
    - Arrange initializers in the order in which they are defined in the
      struct. Even with standard or extension support for designated
      initializers, initializing non-static members out-of-order is an
      error in C++.
    - Remove the conditional code which uses '-1' as the default value of
      the 'pack_buf' member of 'mem_t' in C, but 'BLIS_BUFFER_FOR_GEN_USE'
      in C++. Simply use the latter as a common-sense default.
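
    Illustration: a minimal sketch of a positional (non-designated)
    initializer macro whose values follow the member declaration order, so
    the same header can be consumed by C and by C++ compilers older than
    C++20. The demo_* types and DEMO_* names are hypothetical stand-ins
    and not the real BLIS definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration only. */
typedef enum
{
    DEMO_BUFFER_FOR_A_BLOCK,
    DEMO_BUFFER_FOR_B_PANEL,
    DEMO_BUFFER_FOR_GEN_USE
} demo_packbuf_t;

typedef struct
{
    void*          buf;       /* 1st member */
    demo_packbuf_t buf_type;  /* 2nd member */
    size_t         size;      /* 3rd member */
} demo_mem_t;

/* Values appear in declaration order and without designators. The same
   default is used for buf_type in both C and C++. */
#define DEMO_MEM_INITIALIZER { NULL, DEMO_BUFFER_FOR_GEN_USE, 0 }

demo_mem_t demo_make_mem(void)
{
    demo_mem_t mem = DEMO_MEM_INITIALIZER;
    return mem;
}
```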

commit 1a8c8180b32cf5988bf9eb5d2f0f8111a729993a
Author: John <50754967+j-bm@users.noreply.github.com>
Date:   Thu Feb 15 12:35:10 2024 -0400

    Add cpu part codes for various manufacturers and use in the code (#794)
    
    * Add cpu_id symbols for arm v8.
    
    * Add symbols for arm v7.
    
    * Always assume firestorm on Apple aarch64.
    
    * Fixes incorrect usage of model vs. part in some places.
    
    * Fixes #793
    
    ---------
    
    Co-authored-by: J <jal@o75snap.localdomain>

commit c382d8bdccc07e22a341fe04960f0cbf4eec083b
Author: Igor Zhuravlov <zhuravlov.ip@ya.ru>
Date:   Sun Jan 14 04:03:31 2024 +0000

    Fix errors and typos in docs/BLIS*API.md (#791)
    
    Details:
    - Fixed errors and unified formatting in docs/BLIS*API.md docs.

commit a72e4569f2a03cc3578c019bf7ce25491a44137d
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Wed Dec 6 18:21:47 2023 -0600

    Include bli_config.h before bli_system.h in cblas.h. (#789)
    
    Details:
    - Previously, in cblas.h, bli_config.h was being #included *after*
      bli_system.h, which meant that the BLIS_ENABLE_SYSTEM macro was
      never defined in time for proper OS detection. This bug only
      affected cblas.h -- blis.h had been correctly #including
      bli_config.h before bli_system.h since fb93d24. Thanks to
      Edward Smyth for reporting this bug and suggesting the fix.

commit 1236ddab455ef3a6293ab394ff06b3a19c2913d9
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Sun Dec 3 16:42:34 2023 -0600

    Fixed random segfault in test/3 drivers. (#788)
    
    Details:
    - Fixed a segfault in the non-gemm test drivers in test/3 that was the
      result of sometimes leaving either .n_str or .k_str fields of the
      params_t struct uninitialized, depending on the operation in question.
      For example, in test_hemm.c, init_def_params() would only initialize
      the .m_str and .n_str fields, but not the .k_str field. Even though
      hemm doesn't use a 'k' dimension, the proc_params() function (called
      via parse_cl_params()) universally attempts to convert all three into
      integers via sscanf(), which was understandably failing when one of
      those strings was a NULL pointer. I'm not sure how this code ever
      worked to begin with. Special thanks to Leick Robinson for finding and
      reporting this bug.
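
    Illustration: a minimal sketch of the failure mode and one way to
    guard against it. The demo_params_t struct and both functions are
    hypothetical and only loosely modeled on the description above; the
    actual fix in test/3 may differ.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical parameter struct holding the dimension strings. */
typedef struct
{
    const char* m_str;
    const char* n_str;
    const char* k_str;
} demo_params_t;

/* Give every dimension string a default so that none is left
   uninitialized, even for operations that ignore one of them. */
void demo_init_def_params(demo_params_t* p)
{
    p->m_str = "1000";
    p->n_str = "1000";
    p->k_str = "1000";
}

/* Convert one string to an integer. Returns 1 on success and 0 when the
   string is NULL or malformed; passing NULL to sscanf() is undefined. */
int demo_parse_dim(const char* str, long* out)
{
    if (str == NULL) return 0;
    return sscanf(str, "%ld", out) == 1;
}
```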

commit 141a6c9a8e7557d9c7d28aecedec9dc5377dba13
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Tue Nov 21 12:26:43 2023 -0600

    Install helper headers to INCDIR prefix. (#787)
    
    Details:
    - Install one-line headers to INCDIR whose entire purpose is to
      #include the actual headers within the local 'blis' header directory
      so that applications can #include "blis.h" instead of #include
      <blis/blis.h> (and/or "cblas.h" instead of <blis/cblas.h> if CBLAS is
      enabled) when headers are installed to global paths. (Note that
      INCDIR is the installation prefix for headers as specified by
      '--includedir=INCDIR', which defaults to 'PREFIX/include' if not
      specified.) Not sure how this problem went unreported for so long,
      since presumably any user trying to #include "blis.h" from a global
      installation would have encountered a compiler error.
    - The one-line blis.h and cblas.h headers now reside in the 'build'
      directory, ready to install as is.
    - Thanks to Jed Brown for reporting this via Issue #786, and to
      Devin Matthews and Mo Zhou for their engagement.
    - Harmonized the rule in the top-level Makefile for installing blis.pc
      into SHAREDIR/pkgconfig with conventions for others vis-a-vis
      verbosity/non-verbosity.

commit 2d9439298b336aa6d0ee000a5285a3adb4e6d462
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Nov 21 12:18:07 2023 -0600

    Allow users to define [sd]complex using std::complex (#784)
    
    Details:
    - In C++ applications, it makes a lot of sense to interface to BLIS
      using C++'s standard complex number library, which uses a template
      class std::complex. Obviously BLIS doesn't know anything about this
      and defaults to a custom struct to represent complex numbers. This PR
      updates the bli_[cz]{real,imag}() functions to accept std::complex
      numbers when a C++ compiler is being used. Note that this has no
      effect on the compilation of the BLIS library (or testsuite), and only
      comes into play when including blis.h into a C++ project and forcing
      the use of std::complex for scomplex and dcomplex.
    - The application can explicitly request std::complex-based types via:
    
        #define BLIS_ENABLE_STD_COMPLEX
        #include <blis.h>
        // Call BLIS functions using std::complex<double> here.
    
    - Fixed a bug in the definition of some scalar level-0 macros, since
      bli_creal()/bli_cimag() and bli_zreal()/bli_zimag() are no longer
      interchangeable.

commit f7ce54a252028483e4c6af619015eb22063d5541
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Fri Nov 3 15:52:57 2023 -0500

    CREDITS file update.

commit 05388ddb66f8bf2d62009b162d64bf2d99226b83
Author: Aaron Hutchinson <113382047+Aaron-Hutchinson@users.noreply.github.com>
Date:   Fri Nov 3 13:30:31 2023 -0700

    Added 'sifive_x280' subconfig, kernel set. (#737)
    
    Details:
    - Added a new 'sifive_x280' subconfiguration for SiFive's x280 RISC-V
      instruction set architecture. The subconfig registers kernels from a
      correspondingly new kernel set, also named 'sifive_x280'.
    - Added the aforementioned kernel set, which includes intrinsics- and
      assembly-based implementations of most level-1v kernels along with
      level-1f kernels axpy2v and dotaxpyv, packm kernels, and level-3 gemm,
      gemmtrsm_l, and gemmtrsm_u microkernels (plus supporting files).
    - Registered the 'sifive_x280' subconfig as belonging to a singleton
      family by the same name.
    - Added an entry to '.travis.yml' to test the new subconfig via qemu.
    - Updates to 'travis/do_riscv.sh' script to support the 'sifive_x280'
      subconfig and to reflect updated tarball names.
    - Special thanks to Lee Killough, Devin Matthews, and Angelika Schwarz
      for their engagement on this commit.

commit 7a87e57b69d697a9b06231a5c0423c00fa375dc1
Author: Srinivas Yadav <43375352+srinivasyadav18@users.noreply.github.com>
Date:   Sat Oct 14 02:05:41 2023 -0500

    Fixed HPX barrier synchronization (#783)
    
    Details:
    - Fixed HPX barrier synchronization. HPX was hanging at larger core
      counts because BLIS was using non-HPX synchronization primitives,
      but only HPX synchronization primitives should be used under the HPX
      runtime. Hence, a C-style wrapper, hpx_barrier_t, is introduced to
      perform HPX barrier operations.
    - Replaced hpx::for_loop with hpx::futures. Using hpx::for_loop with
      hpx::barrier when n_threads exceeds the actual hardware thread count
      causes synchronization issues that make HPX hang. This can be avoided
      by using hpx::futures, which are lightweight, robust, and scalable.

commit 8fff1e31da1c87e46cacec112b0ac280ab47cd8b
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Thu Oct 12 15:51:41 2023 -0500

    Fixed bug in sup threshold registration. (#782)
    
    Details:
    - Fixed a bug that resulted in BLIS non-deterministically calling the
      gemmsup handler, irrespective of the thresholds that are registered
      via bli_cntx_set_blkszs().
    - Deep dive: In bli_cntx_init_ref.c, the default values for the gemmsup
      thresholds (BLIS_[MNK]T blocksizes) were being set to zero so that no
      operation ever matched the criteria for gemmsup (unless specific sup
      thresholds are registered). HOWEVER, these thresholds are set via
      bli_cntx_set_blkszs() which calls bli_blksz_copy_if_pos(), which was
      only copying the thresholds into the gks' cntx_t if the values were
      strictly positive. Thus, the zero values passed into
      bli_cntx_set_blkszs() were being ignored and those threshold slots
      within the gks were left uninitialized. The upshot of this is that the
      reference gemmsup handler was being called for gemm problems
      essentially at random (and as it turns out, very rarely the reference
      gemmsup implementation would encounter a divide-by-zero error).
    - The problem was fixed by changing bli_blksz_copy_if_pos() so that it
      copies values that are non-negative (values >= 0 instead of > 0). The
      function was also renamed to bli_blksz_copy_if_nonneg().
    - Also needed to standardize use of -1 as the sole value to embed into
      blksz_t structs as a signal to bli_cntx_set_blkszs() to *not* register
      a value for that slot (and instead let whatever existing values
      remain). This required updates to the bli_cntx_init_*() functions for
      bgq, cortexa9, knc, penryn, power7, and template subconfigs, as some
      of these codes were using 0 instead of -1.
    - Fixes #781. Thanks to Devin Matthews for identifying, diagnosing, and
      proposing a fix for this issue.
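
    Illustration: a minimal sketch of the changed predicate, assuming a
    plain integer slot rather than a blksz_t. The demo_copy_if_nonneg()
    function is hypothetical; it shows why zero must be copied while -1
    is reserved as the "leave this slot alone" signal.

```c
#include <assert.h>

/* Hypothetical sketch: overwrite the destination slot only when the
   source value is non-negative. A value of -1 means "do not register". */
void demo_copy_if_nonneg(long src, long* dst)
{
    assert(dst != 0);

    /* The earlier test, src > 0, silently dropped an explicit zero and
       left the destination holding whatever it contained before. */
    if (src >= 0) *dst = src;
}
```

    Starting from a slot holding an arbitrary value, copying 0 now stores
    0, and copying -1 leaves the slot unchanged.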

commit 1e264a42474b535431768ef925bbd518412d392e
Author: Abhishek Bagusetty <59661409+abagusetty@users.noreply.github.com>
Date:   Mon Oct 2 18:29:46 2023 -0500

    Update zen3 subconfig to support NVHPC compilers. (#779)
    
    Details:
    - Parse $(CC_VENDOR) values of "nvc" in 'zen3' make_defs.mk file.
    - Minor refactor to accommodate above edit.
    - CREDITS file update.

commit c2099ed2519dcac8ee421faf999b36e1c2260be7
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Mon Oct 2 14:56:48 2023 -0500

    Fixed brokenness when sba is disabled. (#777)
    
    Details:
    - Previously, disabling the sba via --disable-sba-pools resulted in a
      segfault due to a sanity-check-triggering abort(). The problem was
      that the sba, as currently used in the l3 thread decorators, did not
      yet (fully) support pools being disabled. The solution entailed
      creating a wrapper function, bli_sba_array_elem(), which either calls
      bli_apool_array_elem() (when sba pools are enabled at configure time)
      or returns a NULL sba_pool pointer (when sba pools are disabled), and
      calling bli_sba_array_elem() in place of bli_apool_array_elem(). Note
      that the NULL pointer returned by bli_sba_array_elem() when the sba
      pools are disabled does no harm since in that situation the pointer
      goes unreferenced when acquiring and releasing small blocks. Thanks to
      John Mather for reporting this bug.
    - Guarded the bodies of bli_sba_init() and bli_sba_finalize() with
      #ifdef BLIS_ENABLE_SBA_POOLS. I don't think this was actually necessary
      to fix the aforementioned bug, but it seems like good practice.
    - Moved the code in bli_l3_thrinfo_create() that checked that the array*
      pointer is non-NULL before calling bli_sba_array_elem() (previously
      bli_apool_array_elem()) into the definition of bli_sba_array_elem().
    - Renamed various instances of 'pool' variables and function parameters
      to 'sba_pool' to emphasize what kind of pool it represents.
    - Whitespace changes.
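
    Illustration: a minimal sketch of a wrapper that either forwards to an
    array lookup or returns NULL, selected at compile time. All demo_* and
    DEMO_* names are hypothetical; the real bli_sba_array_elem() and
    bli_apool_array_elem() signatures are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical types used only for illustration. */
typedef struct { int n_blocks; } demo_pool_t;
typedef struct { demo_pool_t* elems; size_t n_elems; } demo_array_t;

/* A configure-time option would define this macro when pools are enabled.
   It is left undefined here to show the disabled path. */
/* #define DEMO_ENABLE_SBA_POOLS */

demo_pool_t* demo_sba_array_elem(size_t index, demo_array_t* array)
{
#ifdef DEMO_ENABLE_SBA_POOLS
    /* Pools enabled: the wrapper also absorbs the NULL check that the
       caller previously performed. */
    if (array == NULL || index >= array->n_elems) return NULL;
    return &array->elems[index];
#else
    /* Pools disabled: the caller never dereferences the pointer, so
       returning NULL is harmless. */
    (void)index;
    (void)array;
    return NULL;
#endif
}
```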

commit 37ca4fd168525a71937d16aaf6a13c0de5b4daef
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date:   Thu Sep 28 16:37:57 2023 -0500

    Implemented [cz]symv_(), [cz]syr_(), [cz]rot_(). (#778)
    
    Details:
    - Expanded existing BLAS compatibility APIs to provide interfaces to
      [cz]symv_(), [cz]syr_(). This was easy since those operations were
      already implemented natively in BLIS; the APIs were previously
      omitted only because they were not formally part of the BLAS.
    - Implemented [cz]rot_() by feeding code from LAPACK 3.11 through
      f2c.
    - Thanks to James Foster for pointing out that LAPACK contains these
      additional symbols, which prompted these additions, as well as for
      testing the [cz]rot_() functions from Julia's test infrastructure.
    - CREDITS file update.

commit 6f412204004666abac266409a203cb635efbabf3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 26 18:00:54 2023 -0500

    Added 'altra', 'altramax' subconfigs. (#775)
    
    Details:
    - Forward-ported 'altra' and 'altramax' subconfigurations from the
      older 'stable' branch lineage [1]. These subconfigs primarily target
      the Ampere Altra and AltraMax (ARM) processors. They also contain
      "QuickStart" directories with information and scripts to help
      use BLIS on these microarchitectures. Thanks to Jeff Diamond and
      Leick Robinson for developing these subconfigs and resources.
    - Updated kernels/armv8a/3/bli_gemm_armv8a_asm_d6x8.c according to
      changes in the 'stable' lineage, mostly related to re-enabling of
      assembly code branches that target general stride IO.
    
    [1] Note that the 'stable' branch is being used to make sure that more
        recent commits do not introduce unreasonable performance
        regressions. As such, the name should be interpreted as shorthand
        for "performance stable," not "API stable."

commit a4a63295b96ed5b32f4df6477d24db07bf431202
Author: Srinivas Yadav <43375352+srinivasyadav18@users.noreply.github.com>
Date:   Tue Sep 26 17:58:38 2023 -0500

    Fixes to HPX runtime code path. (#773)
    
    Details:
    - Fixed hpx::for_each invocation and replaced it with hpx::for_loop.
      The HPX runtime was initialized using hpx::start, but the
      hpx::for_each function was being called on a non-HPX runtime (i.e.,
      the standard BLIS runtime with a single main thread). To run
      hpx::for_each on the HPX runtime correctly, the code now uses
      hpx::run_as_hpx_thread(func, args...).
    - Replaced hpx::for_each with hpx::for_loop, which eliminates use of
      hpx::util::counting_iterator.
    - Employ hpx::execution::chunk_size(1) to make sure that a thread
      resides on a particular core.
    - Replaced hpx::apply() with updated version hpx::post().
    - Initialize tdata->id to 0 in libblis.c, as it is the main thread
      and is needed for writing results to the output file.
    - By default, if not specified, the HPX runtime uses all N threads/cores
      available in the system. To use only n_threads of those N threads, we
      use hpx::execution::experimental::num_cores(n_threads).

commit c6546c1131b1ddd45ef13f9f2b620ce2e955dbf8
Author: John Mather <54645798+jmather-sesi@users.noreply.github.com>
Date:   Wed Sep 20 13:41:07 2023 -0400

    Fixed broken link in Multithreading.md. (#774)
    
    Details:
    - Replaced 404'd link in docs/Multithreading.md with an archive from
      The Wayback Machine.
    - CREDITS file update.

commit 6dcf7666eff14348e82fbc2750be4b199321e1b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Aug 27 14:18:57 2023 -0500

    Revamped bli_init() to use TLS where feasible. (#767)
    
    Details:
    - Revamped bli_init_apis() and bli_finalize_apis() to use separate
      bli_pthread_switch_t objects for each of the five sub-API init
      functions, with the objects for the 'ind' and 'rntm' sub-APIs being
      declared with BLIS_THREAD_LOCAL. This allows some APIs to be treated
      as thread-local and the rest as thread-shared. Thanks to Edward Smyth
      for requesting application thread-specific rntm_t structs, which
      inspired these changes.
    - Combined bli_thread_init_from_env() and bli_pack_init_from_env() into
      a new function, bli_rntm_init_rntm_from_env(), and placed the combined
      code in bli_rntm.c inside of a new bli_rntm_init() function. Then
      removed the (now empty) bli_pack_init() and _finalize() function defs.
    - Deprecated bli_rntm_init() for the purposes of initializing a rntm_t
      (temporarily preserving it as bli_rntm_clear() in a cpp-undefined code
      block) so that the function name could be used for the aforementioned
      bli_rntm_init() function.
    - Updated libblis_test_pobj_create() in test_libblis.c to use a static
      rntm_t initializer instead of the deprecated bli_rntm_init()
      function-based option.
    - Minor updates to docs/Multithreading.md, including removal of
      bli_rntm_init() in the example of how to initialize rntm_t structs.
    - Changed the return value of bli_gks_init(), bli_ind_init(),
      bli_memsys_init(), bli_thread_init(), and bli_rntm_init() (and their
      finalize() counterparts) from 'void' to 'int' so that those functions
      match the function type expected by bli_pthread_switch_on()/_off().
      Those init/finalize functions now return 0 to indicate success, which
      is needed so that the switch actually changes state from off to on
      and vice versa.
    - Defined bli_thread_reset(), which copies the contents of the
      global_rntm_at_init struct into the global_rntm struct (for the
      current application thread).
    - Guard calls to bli_pthread_mutex_lock()/_unlock() in
      - bli_pack_set_pack_a() and _pack_b()
      - bli_rntm_init_from_global()
      - bli_thread_set_ways()
      - bli_thread_set_num_threads()
      - bli_thread_set_thread_impl()
      - bli_thread_reset()
      - bli_l3_ind_oper_set_enable()
      with #ifdef BLIS_DISABLE_TLS (since TLS precludes the possibility of
      race conditions).
    - In frame/base/bli_rntm.c, declare global_rntm, global_rntm_at_init,
      and global_rntm_mutex as BLIS_THREAD_LOCAL so that separate
      application threads can change the number of ways of BLIS parallelism
      independently from one another.
    - Access global_rntm only via a new private (not exported) function,
      bli_global_rntm(). Defined a similar function for a rntm_t new to
      this commit, global_rntm_at_init, which preserves the state of the
      global rntm at initialization-time.
    - In frame/3/bli_l3_ind.c, added a guard to the declaration of the
      static variable oper_st_mutex with #ifdef BLIS_DISABLE_TLS so that the
      mutex is omitted altogether when TLS is enabled (which prevents the
      compiler from warning about an unused variable).
    - Removed redundant code from bli_thread.c:
        #ifdef BLIS_ENABLE_HPX
        #include "bli_thread_hpx.h"
        #endif
      since this code is already present in bli_thread.h.
    - Thanks to Minh Quan Ho for his review of and feedback on this commit.
    - Comment updates.
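
    Illustration: a minimal sketch of why the init functions return 'int'.
    The switch turns on only when the init function reports success by
    returning 0, and thread-local storage gives each application thread
    its own switch. The demo_* names are hypothetical and the sketch omits
    the mutex that a thread-shared switch would need; C11 _Thread_local is
    assumed in place of BLIS_THREAD_LOCAL.

```c
#include <stddef.h>

/* Hypothetical switch type: 0 means off, 1 means on. */
typedef struct { int status; } demo_switch_t;

/* Run init() only when the switch is off. The switch changes state only
   if init() returns 0. Returns init()'s value, or 0 if already on. */
int demo_switch_on(demo_switch_t* sw, int (*init)(void))
{
    int r_val = 0;

    if (sw->status == 0)
    {
        r_val = init();
        if (r_val == 0) sw->status = 1;
    }

    return r_val;
}

/* Per-thread state: each application thread gets its own copy. */
static _Thread_local demo_switch_t demo_rntm_switch = { 0 };
static _Thread_local int           demo_init_calls  = 0;

/* Hypothetical sub-API init function; returns 0 to indicate success. */
int demo_rntm_init(void)
{
    demo_init_calls++;
    return 0;
}

/* Returns how many times this thread has actually run the init code. */
int demo_init_once(void)
{
    demo_switch_on(&demo_rntm_switch, demo_rntm_init);
    return demo_init_calls;
}
```

    Calling demo_init_once() repeatedly from one thread runs the init code
    a single time for that thread.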

commit fa6a9b24ae2ddbd5f30f657d46004843581c768c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Aug 19 12:44:34 2023 -0500

    Fixed error when using common.mk from testsuite. (#768)
    
    Details:
    - Commit 2db31e0 (#755) inserted logic into common.mk that attempts to
      preprocess build/detect/android/bionic.h to determine whether the
      __BIONIC__ macro is defined (in which case -lrt should not be included
      in LDFLAGS). However, the path to bionic.h was encoded without regard
      to DIST_PATH, and so utilizing common.mk anywhere that isn't the top-
      level directory (such as in the testsuite directory) resulted in a
      compiler error:
    
        gcc: error: build/detect/android/bionic.h: No such file or directory
        gcc: fatal error: no input files
        compilation terminated.
    
      This commit adds a $(DIST_PATH) prefix to the path to bionic.h so that
      it can be located from other applications' Makefiles that use BLIS's
      makefile fragments.

commit 634e532c8dcce7383d96ba33276df65c656b2198
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 9 21:54:49 2023 -0500

    Set thrcomm timpl_t id inside init functions. (#766)
    
    Details:
    - Previously, the timpl_t id being used when a thrcomm_t is being
      initialized was set within the bli_thrcomm_init() dispatch function
      after the timpl_t-specific bli_thrcomm_init_*() function returned. But
      it just occurred to me that each bli_thrcomm_init_*() function already
      intrinsically knows its own timpl_t value. This commit shifts the
      setting of the thrcomm_t.ti field into the corresponding
      bli_thrcomm_init_*() function for each timpl_t type (e.g. single,
      openmp, pthreads, hpx).
    - Removed long-deprecated code dating back nearly 10 years.
    - Whitespace changes.
    - Comment updates.

commit 3cf17b4a91232709bc6a205b0e4d7ecc96579aa9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Aug 7 13:46:20 2023 -0500

    Small fixes/improvements to docs/Multithreading.md. (#764)
    
    Details:
    - Added reminders that #include "blis.h" must be added to source files
      in order to access BLIS API function prototypes. Thanks to Barry Smith
      for suggesting this improvement.
    - Fixed pre-existing typos.
    - CREDITS file update.

commit dbc79812c390f812c7bf030bfcf87e947a1443c4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 28 18:16:38 2023 -0500

    CREDITS file update.
    
    Details:
    - Thanks to Igor Zhuravlov for PR #753 (commit 915daaa).

commit 915daaa43cd189c86d93d72cd249714f126e9425
Author: Igor Zhuravlov <zhuravlov.ip@ya.ru>
Date:   Thu Jul 27 20:33:59 2023 +0000

    Fix typos in docs + example code comments. (#753)
    
    Details:
    - Fixed various typos in API documentation in docs/BLIS*API.md and
      comments in the source code examples within examples/?api/*.c.

commit 2db31e057e7e9c97fc60021b5ae72a01a48d7588
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date:   Thu Jul 27 15:27:21 2023 -0500

    Exclude -lrt on Android with Bionic libraries. (#755)
    
    Details:
    - Added build/detect/android/bionic.h header to test whether the
      __BIONIC__ cpp macro is defined.
    - In common.mk, only add -lrt to LDFLAGS when Bionic is not present.
    - CREDITS file update.

commit 22ad8c1b752364784f320168b31995945ad84a59
Author: ct-clmsn <ct.clmsn@gmail.com>
Date:   Thu Jul 27 16:23:29 2023 -0400

    Small fixes to support hpx in the testsuite (#759)
    
    Details:
    - Minor changes to test_libblis.c to support hpx.

commit c91b41d022e33da82b3b06c82be047a29873d9b6
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date:   Wed Jul 26 14:37:08 2023 -0500

    Auto-detect the RISC-V ABI of the compiler and use -mabi= during RISC-V Builds (#750)
    
    Details:
    - Generate a build error if there is a 32/64-bit mismatch between the
      RISC-V ABI or architecture and the BLIS configuration selected.
    - Handle Q, Zicsr, ZiFencei, Zba, Zbb, Zbc, Zbs and Zfh extensions in
      the RISC-V architecture auto-detection. ZiFencei and Zicsr are not
      detectable with built-in RISC-V macros right now.
    - ZiFencei is not important for BLIS because it doesn't have
      Just-In-Time compilation or self-modifying code, and Zicsr is implied
      by the floating-point extensions, which are required for good
      performance in BLIS.
    - Move RISC-V autodetect header files to build/detect/riscv/.
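
    Illustration: a minimal sketch of deriving an ABI name from the
    compiler's predefined RISC-V macros (__riscv, __riscv_xlen,
    __riscv_float_abi_*, __riscv_abi_rve). The function name is
    hypothetical, and this is only an assumption about how such detection
    could be structured; it is not the content of the BLIS autodetect
    headers.

```c
/* Returns the ABI name implied by the predefined macros, or "not-riscv"
   when the compilation target is not RISC-V. */
const char* demo_riscv_abi(void)
{
#if !defined(__riscv)
    return "not-riscv";
#elif __riscv_xlen == 64
  #if defined(__riscv_float_abi_double)
    return "lp64d";
  #elif defined(__riscv_float_abi_single)
    return "lp64f";
  #else
    return "lp64";
  #endif
#elif __riscv_xlen == 32
  #if defined(__riscv_abi_rve)
    return "ilp32e";
  #elif defined(__riscv_float_abi_double)
    return "ilp32d";
  #elif defined(__riscv_float_abi_single)
    return "ilp32f";
  #else
    return "ilp32";
  #endif
#else
    return "unknown";
#endif
}
```

    A build system could pass the returned name to the compiler as
    -mabi=<name> and compare __riscv_xlen against the selected
    configuration to report a 32/64-bit mismatch.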

commit a0b04e3c007f1207e5678bf20c07752906742fb7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 26 17:59:21 2023 -0500

    Rewrote regen-symbols.sh (gen-libblis-symbols.sh). (#751)
    
    Details:
    - Wrote an alternative to regen-symbols.sh, gen-libblis-symbols.sh,
      that generates a list of exported symbols from the monolithic blis.h
      file rather than peeking inside of the shared object via nm. (This new
      script lives in the 'build' directory and the older script has been
      retired to build/old.) Special thanks to Devin Matthews for authoring
      gen-libblis-symbols.sh.
    - Added a 'symbols' target to the top-level Makefile which will refresh
      build/libblis-symbols.def, with supporting changes to common.mk.
    - Updates to build/libblis-symbols.def using the new symbol-generating
      script.

commit 6b894c30b9bb2c2518848d74e4c8d96844f77f24
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 12 17:22:44 2023 -0500

    Rewrote/fixed broken tree barrier implementation.
    
    Details:
    - Rewrote the definition of bli_thrcomm_tree_barrier() so that it (a)
      actually worked again, and (b) used atomics instead of a basic C99
      spin loop. (Note that the conventional barrier implementation is
      still enabled by default; the tree barrier must be toggled on
      manually within the configuration.)
    - Added an early return to the definition of bli_thrcomm_barrier() in
      the cases where comm == NULL or comm->n_threads == 1.
    - Reordered thread-related and thread-dependent header #include
      directives in blis.h so that the BLIS_TREE_BARRIER and
      BLIS_TREE_BARRIER_ARITY macros, which would be defined in the target
      configuration's bli_family_*.h file, would be #included prior
      to the inclusion of the thrcomm_t header that uses them.
    - Changed the type of barrier_t.count from 'int' to 'dim_t'.
    - Changed the type of barrier_t.signal from 'volatile int' to 'gint_t'.
    - Special thanks to Leick Robinson for contributing these changes.
    - Whitespace changes.
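
    Illustration: a minimal sketch of a barrier built on C11 atomics with
    the early return described above. Note that this is a flat
    sense-reversing barrier, not a tree barrier, and that demo_comm_t and
    both functions are hypothetical rather than the BLIS definitions.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct
{
    int        n_threads;
    atomic_int arrived;  /* threads that have reached the barrier */
    atomic_int sense;    /* flips each time the barrier opens     */
} demo_comm_t;

void demo_comm_init(demo_comm_t* comm, int n_threads)
{
    comm->n_threads = n_threads;
    atomic_init(&comm->arrived, 0);
    atomic_init(&comm->sense, 0);
}

void demo_barrier(demo_comm_t* comm)
{
    /* Early return: nothing to synchronize. */
    if (comm == NULL || comm->n_threads == 1) return;

    /* Record the current sense before announcing arrival. */
    int my_sense = atomic_load(&comm->sense);

    if (atomic_fetch_add(&comm->arrived, 1) == comm->n_threads - 1)
    {
        /* Last arrival: reset the counter, then release the waiters. */
        atomic_store(&comm->arrived, 0);
        atomic_store(&comm->sense, !my_sense);
    }
    else
    {
        /* Wait until the last arrival flips the sense. */
        while (atomic_load(&comm->sense) == my_sense)
            ; /* spin */
    }
}
```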

commit d639554894b6252a86bd3164921bce6fbb9e3b5e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 7 16:11:14 2023 -0500

    Pad thrcomm_t fields to avoid false sharing.
    
    Details:
    - Inserted a cache line of padding between various fields of the
      thrcomm_t and, in the case of the (presently defunct) tree barrier,
      fields of the barrier_t. This additional padding ensures that these
      fields, which both serve different purposes when performing a thread
      barrier, are only accessed when needed (and not just due to their
      spatial locality with their cache line neighbors).
    - Added a new cpp macro constant, BLIS_CACHE_LINE_SIZE, to
      bli_config_macro_defs. This new constant defines the size of a cache
      line (in bytes) and defaults to 64.
    - Special thanks to Leick Robinson for discovering this false sharing
      issue and developing/submitting the patch.
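
    Illustration: a minimal sketch of separating two frequently written
    fields by a cache line of padding. The struct layout and names are
    hypothetical; only the 64-byte default comes from the text above.

```c
#include <stddef.h>

#define DEMO_CACHE_LINE_SIZE 64  /* bytes; the default stated above */

/* Hypothetical communicator with padding between its fields. */
typedef struct
{
    int          n_threads;
    char         pad0[DEMO_CACHE_LINE_SIZE];
    volatile int barrier_sense;
    char         pad1[DEMO_CACHE_LINE_SIZE];
    volatile int barrier_threads_arrived;
} demo_thrcomm_t;

/* Distance in bytes between the two barrier fields. */
size_t demo_field_gap(void)
{
    return offsetof(demo_thrcomm_t, barrier_threads_arrived)
         - offsetof(demo_thrcomm_t, barrier_sense);
}
```

    Because the gap is at least one cache line, the two fields cannot land
    in the same 64-byte line regardless of the struct's base alignment.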

commit 89b7863fc9a88903917deedc6a5ad9fd17f83713
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon May 8 16:51:18 2023 -0500

    Fix 1m enablement for herk/her2k/syrk/syr2k. (#743)
    
    Details:
    - Ever since 28b0982, herk, her2k, syrk, and syr2k have been implemented
      in terms of the gemmt expert API. And since the decision of which
      induced method to use (1m or native) is made *below* the level of the
      expert API, executing any of {herk,her2k,syrk,syr2k} results in BLIS
      checking the enablement status for gemmt.
    - This commit applies a band-aid of sorts to this issue by modifying
      bli_l3_ind_oper_get_enable() and bli_l3_ind_oper_set_enable() so that
      any attempts to query or modify the internal enablement status for
      herk, her2k, syrk, or syr2k instead does so for gemmt.
    - This solution isn't perfect since, in theory, the user could enable 1m
      for, say, herk but then disable it for syrk, and then be confused when
      herk runs via native execution. But we don't anticipate that users
      will modify 1m enablement at the operation level, and so in practice this
      solution is likely fine for now.
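
    Illustration: a minimal sketch of redirecting enablement queries and
    updates for herk, her2k, syrk, and syr2k to a single gemmt slot. The
    enum, the table, and the function names are hypothetical and simplify
    away the per-datatype state that a real implementation would track.

```c
#include <stdbool.h>

typedef enum
{
    DEMO_GEMM,
    DEMO_GEMMT,
    DEMO_HERK,
    DEMO_HER2K,
    DEMO_SYRK,
    DEMO_SYR2K,
    DEMO_NUM_OPERS  /* last entry, so the count is computed automatically */
} demo_opid_t;

/* Enablement status per operation; all false at program start. */
static bool demo_enabled[DEMO_NUM_OPERS];

/* Map the gemmt-based operations onto gemmt's slot. */
static demo_opid_t demo_redirect(demo_opid_t op)
{
    switch (op)
    {
        case DEMO_HERK:
        case DEMO_HER2K:
        case DEMO_SYRK:
        case DEMO_SYR2K:
            return DEMO_GEMMT;
        default:
            return op;
    }
}

void demo_oper_set_enable(demo_opid_t op, bool status)
{
    demo_enabled[demo_redirect(op)] = status;
}

bool demo_oper_get_enable(demo_opid_t op)
{
    return demo_enabled[demo_redirect(op)];
}
```

    The caveat described above is visible here: enabling herk and then
    disabling syrk leaves herk disabled as well, since both share one slot.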

commit 138de3b3e88c5bf7d8718c45c88811771cf42db8
Author: Ajay Panyala <ajay.panyala@gmail.com>
Date:   Sun May 7 13:01:38 2023 -0700

    add nvhpc compiler support (#719)
    
    Add detection of the NVIDIA nvhpc compiler (`nvc`) in `configure`, and adjust some warning options in `config.mk`. Currently, no specific options for `nvc` have been added in the relevant configurations so it may not be usable without further tweaks.

commit 0873c0f6ed03fea321d1631b3d1a385a306aa797
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sun May 7 14:03:19 2023 -0500

    Consolidate INSERT_ macro sets via variadic macros. (#744)
    
    Details:
    - Consolidated INSERT_GENTFUNC_* (and corresponding GENTPROT) macro sets
      using variadic macros (__VA_ARGS__), which means we no longer need a
      different INSERT_ macro for each possible number of arguments the
      macro might take. This change seems reasonable given that variadic
      macros are a standard C99 feature and widely supported. I took care
      not to use variadic macros where 0 variadic arguments are expected
      since that is a non-standard extension.
    - Added pre-typecast parentheses to arithmetic expressions in printf()
      statements in bli_thread_range_tlb.c.
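
    Illustration: a minimal sketch of one INSERT-style macro that forwards
    any non-zero number of extra arguments via __VA_ARGS__. The DEMO_*
    macros are hypothetical and far simpler than the BLIS INSERT_GENTFUNC
    family; they instantiate a function for float and double only.

```c
/* One macro serves every argument count: the extra arguments are passed
   through unchanged to the template macro. At least one extra argument is
   always supplied, since zero variadic arguments is non-standard in C99. */
#define DEMO_INSERT_GENTFUNC(tmacro, ...) \
    tmacro(float,  s, __VA_ARGS__)        \
    tmacro(double, d, __VA_ARGS__)

/* Template: defines <ch><opname>(), which multiplies x by `factor`. */
#define DEMO_GENTFUNC(ctype, ch, opname, factor) \
    ctype ch##opname(ctype x) { return (ctype)(factor) * x; }

/* Each line below defines two functions, e.g. sdouble_it(), ddouble_it(). */
DEMO_INSERT_GENTFUNC(DEMO_GENTFUNC, double_it, 2)
DEMO_INSERT_GENTFUNC(DEMO_GENTFUNC, triple_it, 3)
```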

commit ef9d3e6675320a53e7cb477c16b01388e708b1da
Author: h-vetinari <h.vetinari@gmx.com>
Date:   Sun May 7 04:59:35 2023 +1100

    Added missing #include <io.h> for Windows. (#747)
    
    Details:
    - This commit fixes issue #746, in which the _access() function (called
      from within blastest/f2c/open.c) is undeclared when compiling on
      Windows with clang 16.

commit 6fd9aabb03d172a792a7eeb106c7d965cf038421
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri May 5 14:22:52 2023 -0500

    Fix bug in detecting Fortran compiler vendor (#745)
    
    `FC` was used instead of `found_fc`.

commit 8215b02f99aa77ecc7d813508c247565115319d7
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date:   Wed Apr 12 12:59:27 2023 -0500

    Apply #738 to make_defs.mk of RISC-V subconfigs. (#740)
    
    Details:
    - PR #738 -- which moved -fPIC flag insertion responsibilities from
      common.mk to the subconfigs' individual make_defs.mk files -- was
      merged shortly before the introduction of new RISC-V subconfigs in
      #693. This commit brings those RISC-V subconfigs up to date with the
      new -fPIC conventions.

commit 6b38c5ac07a2a27738674784e58aa699bf895447
Author: angsch <17718454+angsch@users.noreply.github.com>
Date:   Tue Apr 11 19:27:43 2023 +0200

    Add RISC-V target (#693)
    
    Details:
    - There are four RISC-V base configurations: 'rv32i', 'rv32iv', 'rv64i',
      and 'rv64iv', namely the 32-bit and 64-bit implementations with and
      without the 'V' vector extension. Additional extensions such as 'M'
      (multiplication), 'A' (atomics), 'F' ('float' hardware support), 'D'
      ('double' hardware support), and 'C' (compressed-length instructions),
      are automatically used when available. If they are not available, then
      software equivalents (e.g., softfloat and -latomic) are used.
    - './configure auto' can be invoked on a RISC-V build platform, and will
      automatically detect RISC-V CPU extensions through the RISC-V C API:
      https://github.com/riscv-non-isa/riscv-c-api-doc/blob/master/riscv-c-api.md
    - The assembly kernels assume the presence of the vector extension
      RVV 1.0.
    - It is possible to build 'rv[32,64]iv' for any value of VLEN.
      However, if VLEN < 128, the targets will fall back to the generic
      kernels and blocksizes.
    - The vector microkernels are vector-length agnostic and work with
      every VLEN >= 128, but are expected to work best with smaller vector
      lengths, i.e., VLEN <= 512.
    - The assembly kernels cover column major storage (rs_c == 1).
    - The blocksizes aim at being a good generic choice for out-of-order
      cores. They are not tuned to a specific RISC-V HPC core.
    - The vector kernels have been tested using vlen={128,256,512}.
    - The single- and double-precision assembly code routines for 'sgemm'
      and 'dgemm', or for 'cgemm' and 'zgemm', are combined in their RISC-V
      vector assembly source code, and are differentiated only with macros.
    - The XLEN=32 and XLEN=64 versions of the RISC-V assembly code are
      identical, except that callee-saved registers are saved and restored
      differently. There are RISC-V assembly code #include files for
      handling the saving and restoring of callee-saved registers, and they
      are future-proof if ever XLEN=128.
    - Multiplications, such as computing array strides and offsets, are
      performed in C, and later passed to the RISC-V assembly kernels. This
      is so that the compiler can determine whether the 'M' (multiply)
      extension is available and use multiplication instructions, or call
      library helper functions instead.
    - A new macro called bli_static_assert() has been added to perform
      static assertions at compile-time, regardless of the C/C++ dialect of
      the compiler. The original motivation of this was to ensure that
      calling RISC-V assembly kernels would not silently truncate arguments
      of type 'dim_t' or 'inc_t' (so-called "narrowing conversions").
    - RISC-V CI tests have been added to Travis CI, using the
      riscv-gnu-toolchain cross-compiler, and qemu simulator.
    - Thanks to Lee Killough for collaborating on this commit.

commit 593d01761910af6a9a16ee0ac097142732f73c29
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 8 16:44:16 2023 -0500

    CREDITS file update.

commit 259f68479671bbaf9c5986759aaa0004f9b05a24
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 7 16:11:34 2023 -0500

    CREDITS file update.
    
    Details:
    - Added attributions associated with commits:
      - 98d4678 9b1beec: @bartoldeman
      - 2b05948 059f151: @ct-clmsn
    - Reordered attribution for @decandia50.

commit aea8e1d9243631635ca788d5e14f0f29328e637d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 3 12:17:51 2023 -0500

    Optionally disable thread-local storage. (#735)
    
    Details:
    - Implemented a new configure option, --disable-tls, which allows the
      user to optionally disable the use of thread-local storage qualifiers
      on static variables in BLIS. This option will rarely be needed, but
      in some situations may allow BLIS to compile when TLS is unavailable.
      Thanks to Nick Knight for suggesting this option.
    - Unlike the --disable-system option, --disable-tls does not forcibly
      disable threading. Instead, warnings of the possible consequences of
      using threading with TLS disabled are added to:
      - the output of './configure --help';
      - the output of 'configure' when the --disable-tls option is parsed;
      - the informational header output by the testsuite.
      Thanks to Minh Quan Ho for suggesting these warnings.
    - Modified frame/include/bli_lang_defs.h so that BLIS_THREAD_LOCAL is
      defined to nothing when BLIS_ENABLE_TLS is not defined.
    - Defined bli_info_get_enable_tls(), which returns whether the cpp macro
      BLIS_ENABLE_TLS was defined.
    - Edited --disable-system configure status output for clarity.
    - Whitespace updates.

commit 3f1432abe75cc306ef90a04381d7e0d8739fded8
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date:   Mon Apr 3 12:10:59 2023 -0500

    Add output.testsuite to .gitignore (#736)
    
    Details:
    - Added `output.testsuite` to .gitignore since it was previously not
      being matched by `output.testsuite.*`.

commit 38fc5237520a2f20914a9de8bb14d5999009b3fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 30 17:30:07 2023 -0500

    Added mm_algorithm pdf files (bp and pb).
    
    Details:
    - Added PDF versions of the PowerPoint files added in 17cd260.

commit 17cd260cb504b2f3997c32daec77f4c828fbb32b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 29 21:47:12 2023 -0500

    Added mm_algorithm pptx files (bp and pb).
    
    Details:
    - Added two PowerPoint files that contain slides depicting the classic
      Goto algorithm for matrix multiplication as well as its sister
      "panel-block" algorithm. These files reside in docs/diagrams.

commit 9d778e0f7c94d8752dd578101e4fc6893a1f54ef
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 29 17:36:49 2023 -0500

    Move -fPIC insertion to subconfigs' make_defs.mk. (#738)
    
    * Move -fPIC insertion to subconfigs' make_defs.mk.
    
    Details:
    - Previously, common.mk was appending -fPIC to the CPICFLAGS variables
      set within the various subconfigurations' make_defs.mk files. This
      seemed somewhat unintuitive, and so now the -fPIC flag is assigned to
      the various subconfigs' CPICFLAGS variables in the respective
      make_defs.mk files.
    - This commit also changes the logic in common.mk so that instead of
      appending, the variable is overwritten, but now *only* in the case
      of Windows (since apparently -fPIC needs to be omitted there). Thanks
      to Nick Knight for catching and reporting this weirdness.

commit 04090df01175477394d1e73af2e5769751d47cd6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 27 14:13:10 2023 -0500

    Fixed compile errors with `BLIS_DISABLE_BLAS_DEFS`. (#730)
    
    * Fixed compile errors with BLIS_DISABLE_BLAS_DEFS.
    
    Details:
    - This commit fixes a compile-time error related to the type definition
      (prototype) of dsdot_() when BLIS_DISABLE_BLAS_DEFS is defined by the
      application (or the configuration), which is actually a symptom of a
      larger design issue when disabling BLAS prototypes. The macro was
      intended to allow applications to bring their own BLAS prototypes and
      suppress the inclusion of duplicate (or possibly conflicting)
      prototypes within blis.h. However, prototypes are still needed during
      compilation even if they are ultimately omitted from blis.h. The
      problem is that almost every source file in BLIS--including the BLAS
      compatibility layer--only includes one header (blis.h), and if we
      were to #include a new header in the BLAS source files (to isolate
      only the BLAS prototypes), we would also have to make the build system
      aware of the location of those headers. Thanks to Edward Smyth of AMD
      for reporting this issue.
    - The solution I settled upon was to remove all cpp guards from all BLAS
      headers (by changing them to #if 1, for easy search-and-replace
      anchoring in the future if we ever need to re-insert guards) and
      modifying bli_blas.h so that the BLAS prototypes are #included if
      either (a) BLIS_ENABLE_BLAS_DEFS is defined, or (b)
      BLIS_ENABLE_BLAS_DEFS is *not* defined but BLIS_IS_BUILDING_LIBRARY
      *is* defined. (Thanks to Devin Matthews for steering me away from an
      inferior solution.)
    - This commit also spins off the actual BLAS prototypes/definitions to
      a separate file, bli_blas_defs.h.
    - CREDITS file update.
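    
    Illustration: the inclusion rule for the BLAS prototypes described
    above can be modeled as a small truth function. This is only a sketch
    of the stated logic in Python, not the preprocessor code in
    bli_blas.h; the function and parameter names are hypothetical.
    
```python
def blas_prototypes_included(enable_blas_defs: bool, building_library: bool) -> bool:
    """Model of the rule: include the prototypes if case (a) or case (b) holds."""
    # (a) BLIS_ENABLE_BLAS_DEFS is defined.
    case_a = enable_blas_defs
    # (b) BLIS_ENABLE_BLAS_DEFS is not defined, but the library itself is being built.
    case_b = (not enable_blas_defs) and building_library
    return case_a or case_b


if __name__ == "__main__":
    # Print the full truth table of the modeled rule.
    for enable in (True, False):
        for building in (True, False):
            print(enable, building, blas_prototypes_included(enable, building))
```
    
    Under this model the prototypes are hidden in only one case: the
    definitions are disabled and the library is not being built.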

commit 5f841307f668f65b7ed5a479bd8374d2581208cf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 24 20:05:13 2023 -0500

    Omit -fPIC if shared library build is disabled. (#732)
    
    Details:
    - Updated common.mk so that when --disable-shared option is given to
      configure:
      1. The -fPIC compiler flag is omitted from the individual
         configuration family members' CPICFLAGS variables (which are
         initialized in each subconfig's make_defs.mk file); and
      2. The BUILD_SYMFLAGS variable, which contains compiler flags needed
         to control the symbol export behavior, is left blank.
    - The net result of these changes is that flags specific to shared
      library builds are only used when a shared library is actually
      scheduled to be built. Thanks to Nick Knight for reporting this issue.
    - CREDITS file update.

commit 72c37eb80f964b7840377076e5009aec5b29d320
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date:   Thu Mar 23 16:01:55 2023 -0500

    Updated configure to pass all shellcheck checks. (#729)
    
    Details:
    - Modified configure so that it passes all 'shellcheck' checks,
      disabling ones which we violate but which are just stylistic, or are
      special cases in our code.
    - Miscellaneous other minor changes, such as rearranged redirections in
      long sed/perl pipes to look more natural.
    - Whitespace tweaks.

commit 60f36347c16e6336215cd52b4e5f3c0f96e7c253
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 22 20:37:30 2023 -0600

    Fixed bugs in scal2v ref kernel when alpha == 1. (#728)
    
    Details:
    - Fixed a typo bug in ref_kernels/1/bli_scal2v_ref.c where the
      conditional that was supposed to be checking for cases when alpha is
      equal to 1.0 (so that copyv could be used instead of scal2v) was
      instead erroneously comparing alpha against 0.0.
    - Fixed another bug in the same function whereby BLIS_NO_CONJUGATE was
      erroneously being passed into copyv instead of the kernel's conjx
      parameter. This second bug was inert, however, due to the first bug
      since the "alpha == 0.0" case was already being handled, resulting in
      the code block never executing.
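    
    Illustration: a minimal Python model of the corrected control flow,
    computing y := alpha * conjx(x). It is not the code in
    ref_kernels/1/bli_scal2v_ref.c; the function names and signatures are
    simplified, and zero-filling y when alpha == 0 is an assumption about
    how that case is handled.
    
```python
def copyv(conjx: bool, x: list, y: list) -> None:
    """y := conjx(x), element by element."""
    for i in range(len(x)):
        y[i] = x[i].conjugate() if conjx else x[i]


def scal2v(conjx: bool, alpha, x: list, y: list) -> None:
    """y := alpha * conjx(x), with shortcuts for alpha == 0 and alpha == 1."""
    if alpha == 0:
        # Assumption: the zero case simply zero-fills y.
        for i in range(len(x)):
            y[i] = 0
        return
    if alpha == 1:
        # Corrected shortcut: compare against 1 (not 0), and forward the
        # caller's conjx instead of a hard-coded "no conjugate" value.
        copyv(conjx, x, y)
        return
    for i in range(len(x)):
        xi = x[i].conjugate() if conjx else x[i]
        y[i] = alpha * xi


if __name__ == "__main__":
    x = [1 + 2j, 3 - 1j]
    y = [0, 0]
    scal2v(True, 1, x, y)
    print(y)
```
    
    With the second bug still present, the alpha == 1 shortcut would have
    ignored conjx and copied x unconjugated.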

commit fab18dca46618799bb0b4f652820b33d36a5d4d4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 22 16:50:00 2023 -0600

    Use 'void*' datatypes in kernel APIs. (#727)
    
    Details:
    - Migrated all kernel APIs to use void* pointers instead of float*,
      double*, scomplex*, and dcomplex* pointers. This allows us to define
      many fewer kernel function pointer types, which also makes it much
      easier to know which function pointer type to use at any given time.
      (For example, whereas before there was ?axpyv_ker_ft, ?axpyv_ker_vft,
      and axpyv_ker_vft, now there is just axpyv_ker_ft, which is equivalent
      to what axpyv_ker_vft used to be.)
    - Refactored how kernel function prototypes and kernel function types
      are defined so as to reduce redundant code. Specifically, the
      function signatures (excluding cntx_t* and, in the case of level-3
      microkernels, auxinfo_t*) are defined in new headers named, for
      example, bli_l1v_ker_params.h. Those signatures are reused via macro
      instantiation when defining both kernel prototypes and kernel function
      types. This will hopefully make it a little easier to update, add, and
      manage kernel APIs going forward.
    - Updated all reference kernels according to the aforementioned switch
      to void* pointers.
    - Updated all optimized kernels according to the aforementioned switch
      to void* pointers. This sometimes required renaming variables,
      inserting typecasting so that pointer arithmetic could continue to
      function as intended, and related tweaks.
    - Updated sandbox/gemmlike according to the aforementioned switch to
      void* pointers.
    - Renamed:
      - frame/1/bli_l1v_ft_ker.h    -> frame/1/bli_l1v_ker_ft.h
      - frame/1f/bli_l1f_ft_ker.h   -> frame/1f/bli_l1f_ker_ft.h
      - frame/1m/bli_l1m_ft_ker.h   -> frame/1m/bli_l1m_ker_ft.h
      - frame/3/bli_l1m_ft_ukr.h    -> frame/3/bli_l1m_ukr_ft.h
      - frame/3/bli_l3_sup_ft_ker.h -> frame/3/bli_l3_sup_ker_ft.h
      to better align with naming of neighboring files.
    - Added the missing "void* params" argument to bli_?packm_struc_cxk() in
      frame/1m/packm/bli_packm_struc_cxk.c. This argument is being passed
      into the function from bli_packm_blk_var1(), but wasn't being "caught"
      by the function definition itself. The function prototype for
      bli_?packm_struc_cxk() also needed updating.
    - Reordered the last two parameters in bli_?packm_struc_cxk().
      (Previously, the "void* params" was passed in after the
      "const cntx_t* cntx", although because of the above bug the params
      argument wasn't actually present in the function definition.)

commit 93c63d1f469c4650df082d0fa2f29c46db0e25f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 20 11:14:23 2023 -0600

    Use 'const' pointers in kernel APIs. (#722)
    
    Details:
    - Qualified all input-only data pointers in the various kernel APIs with
      the 'const' keyword while also removing 'restrict' from those kernel
      APIs. (Use of 'restrict' was maintained in kernel implementations,
      where appropriate.) This affected the function pointer types defined
      for all of the kernels, their prototypes, and the reference and
      optimized kernel definitions' signatures.
    - Templatized the definitions of copys_mxn and xpbys_mxn static inline
      functions.
    - Minor whitespace and style changes (e.g. combining local variable
      declaration and initialization into a single statement).
    - Removed some unused kernel code left in 'old' directories.
    - Thanks to Nisanth M P for helping to validate changes to the power10
      microkernels.

commit 4e18cd34f909c5045597f411340ede3a5e0bc5e1
Author: RuQing Xu <ruqing.xu@phys.s.u-tokyo.ac.jp>
Date:   Sun Feb 19 04:18:41 2023 +0900

    Restored ArmSVE general storage case. (#708)
    
    Details:
    - Restored general storage case in armsve kernels.
    - Reason for doing this: Though real `g`-storage is difficult to
      speed up, the `g`-codepath here can provide good support for
      transposed storage, i.e., it is at least good for
      `GEMM_UKR_SETUP_CT_AMBI`.
    - In our experience, this solution is only *a little* slower than
      in-reg transpose. Also, in-reg transpose is only possible for a fixed
      VL in our case.

commit 0ba6e9eafb1e667373d9dbc2aa045557921f33e2
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date:   Sat Feb 18 13:15:42 2023 -0600

    Refined emacs handling of indentation. (#717)
    
    Details:
    - This refines the emacs autoformatting to be better in line with
      contribution guidelines.
    - Removed a stray shebang in a .mk file which confuses emacs about the
      file mode, which should be makefile-mode. (emacs also removes stray
      whitespace at the ends of lines.)

commit 059f15105b1643fe56084f883c22b3cadf368b39
Author: ct-clmsn <ct.clmsn@gmail.com>
Date:   Sat Feb 18 14:13:23 2023 -0500

    Updated hpx namespace for make_count_shape. (#725)
    
    Details:
    - The hpx namespace for *counting_shape changed. This PR updates the use
      of counting_shape in blis to comply with the change in hpx.
    - Co-authored-by: ctaylor <ctaylor@tactcomplabs.com>

commit 0b421eff130b5c896edcc09e7358d18564d177e9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Feb 18 13:11:41 2023 -0600

    Added an 'arm64' entry to `.travis.yml`. (#726)
    
    Details:
    - Added a new 'arm64' entry to the .travis.yml file in an attempt to get
      Travis CI to compile both NEON and SVE kernels, even if only NEON
      kernels are exercised in the testing. With this new 'arm64' entry, the
      'cortexa57' entry becomes redundant and may be removed. Thanks to
      RuQing Xu for this suggestion.
    - Previously, the macro BLIS_SIMD_MAX_SIZE was *not* being set in
      bli_kernels_arm64.h, which meant that the default value of 64 was
      being used. This caused a runtime consistency check to fail in
      bli_gks.c (in Travis CI), one which requires that
    
        mr * nr * dt_size <= BLIS_STACK_BUF_MAX_SIZE
    
      for all datatype sizes dt_size, where BLIS_STACK_BUF_MAX_SIZE is
      defined as
    
        BLIS_SIMD_MAX_NUM_REGISTERS * BLIS_SIMD_MAX_SIZE * 2
    
      This commit increases BLIS_SIMD_MAX_SIZE to 128 for the 'arm64'
      configuration, thus overriding the default and (hopefully) avoiding
      the aforementioned consistency check failures.
    - Appended '|| cat ./output.testsuite' to all 'make' commands in
      travis/do_testsuite.sh. Thanks to RuQing Xu for this suggestion.
    - Whitespace changes.
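    
    Illustration: the arithmetic behind the stack buffer check, sketched
    in Python under the assumption that the check passes when one
    mr x nr microtile of a given datatype fits in the buffer. The register
    count of 32 and the microtile shape are assumed values chosen for the
    example, not figures taken from the arm64 configuration.
    
```python
def stack_buf_max_size(simd_num_registers: int, simd_max_size: int) -> int:
    """BLIS_SIMD_MAX_NUM_REGISTERS * BLIS_SIMD_MAX_SIZE * 2, in bytes."""
    return simd_num_registers * simd_max_size * 2


def microtile_fits(mr: int, nr: int, dt_size: int, buf_size: int) -> bool:
    """True when an mr x nr microtile of dt_size-byte elements fits in the buffer."""
    return mr * nr * dt_size <= buf_size


NUM_REGISTERS = 32           # assumption: value of BLIS_SIMD_MAX_NUM_REGISTERS
MR, NR, DT_SIZE = 64, 10, 8  # hypothetical double-precision microtile

if __name__ == "__main__":
    for simd_max_size in (64, 128):
        buf = stack_buf_max_size(NUM_REGISTERS, simd_max_size)
        print(simd_max_size, buf, microtile_fits(MR, NR, DT_SIZE, buf))
```
    
    For these assumed values the microtile needs 5120 bytes, which exceeds
    the 4096-byte buffer obtained with a SIMD size of 64 but fits in the
    8192-byte buffer obtained with 128.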

commit b1d3fc7e5b0927086e336a23f16ea59aa3611ccb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 10 15:34:47 2023 -0600

    Redirect grep stderr to /dev/null. (#723)
    
    Details:
    - In common.mk, added a redirection of stderr to /dev/null for the grep
      command being used to gather a list of header files #included from
      bli_cntx_ref.c. The redirection is desirable because as of grep 3.8,
      regular expressions with "stray" backslashes trigger warnings [1].
      But removing the backslash seems to break the BLIS build system when
      using pre-3.8 versions of grep, so this seems to be the easiest way to
      satisfy the BLIS build system for both pre- and post-3.8 grep
      environments.
    
      [1] https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00001.html

commit e3d352f1fcc93e6a46fde1aa4a7f0a18fb27bd42
Author: Nisanth M P <nisanthmp.01@gmail.com>
Date:   Wed Feb 8 06:11:41 2023 +0530

    Added runtime selection of 'power' config family. (#718)
    
    Details:
    - Created a 'power' umbrella configuration family, which, when targeted
      at configure-time, will build both 'power9' and 'power10' subconfigs.
      (With this feature, a BLIS shared library could be compiled on a
      power9 system and run on power10 and vice-versa. Unoptimised code
      will execute if it is linked and run on any other generic system.)
    - This new configuration family will only work with gcc, since that is
      the only compiler supported by both power9 and power10 subconfigs in
      BLIS.
    - Documented power9 and power10 as supported microarchitectures in the
      docs/HardwareSupport.md document.

commit e730c685d09336b3bd09e86c94330c4eba967f3e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 6 15:31:54 2023 -0600

    Define `BLIS_VERSION_STRING` in `blis.h`. (#720)
    
    Details:
    - Previously, the version string was communicated from configure to
      config.mk (via the config.mk.in template), where it was included via
      the top-level Makefile, where it was then used to define the
      preprocessor macro BLIS_VERSION_STRING via a command line argument to
      the compiler (via -D). This macro is then used within bli_info.c to
      initialize a static string which can then be queried via the
      bli_info_get_version_str() function. However, there are some
      applications that may find utility in being able to access the version
      string by inspecting the monolithic (flattened) blis.h header file
      that is created at compile time and installed alongside the library.
      This commit moves the definition of BLIS_VERSION_STRING into
      bli_config.h (via the bli_config.h.in template) so that it is
      embedded in blis.h. The version string is now available in three
      places:
      - the static/shared library, which is installed in the 'lib'
        subdirectory of the install prefix (query-able via the
        bli_info_get_version_str() function);
      - the config.mk makefile fragment, which is installed in the 'share'
        subdirectory of the install prefix (in the VERSION variable);
      - the blis.h header file, which is installed in the 'include'
        subdirectory of the install prefix (via the BLIS_VERSION_STRING
        macro constant).
      Thanks to Mohsen Aznaveh and Tim Davis for providing the idea for this
      change.
    - CREDITS file update.
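    
    Illustration: one way an application or build script might read the
    version string by inspecting an installed blis.h. This is a sketch in
    Python; it assumes the macro appears as a one-line #define with a
    double-quoted value, and the sample header text and version value are
    made up for the example.
    
```python
import re

# Assumption: the macro is emitted as  #define BLIS_VERSION_STRING "..."
_VERSION_RE = re.compile(
    r'^\s*#\s*define\s+BLIS_VERSION_STRING\s+"([^"]*)"', re.MULTILINE
)


def blis_version_from_header(header_text: str):
    """Return the version string found in the header text, or None."""
    match = _VERSION_RE.search(header_text)
    return match.group(1) if match else None


# Hypothetical stand-in for the contents of an installed blis.h.
SAMPLE_HEADER = '''
#ifndef BLIS_H
#define BLIS_H
#define BLIS_VERSION_STRING "0.0.0-example"
#endif
'''

if __name__ == "__main__":
    print(blis_version_from_header(SAMPLE_HEADER))
```
    
    A program that already links against the library can call
    bli_info_get_version_str() instead of parsing the header.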

commit dc5d00a6ce0350cd82859d8c24f23d98f205d8db
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date:   Fri Jan 27 17:36:47 2023 -0600

    Typecast printf() args to avoid compiler warnings. (#716)
    
    Details:
    - In bli_thread_range_tlb.c, typecast integer arguments passed to
      printf() -- which are typically disabled unless debugging -- to type
      "long" to guarantee a match to the "%ld" format specifiers used in
      those calls. This avoids spurious warnings with certain compilers in
      certain toolchain environments, such as 32-bit RISC-V (rv32iv).

commit ecbcf4008815035c695822fcaf106477debff89a
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date:   Wed Jan 18 20:35:50 2023 -0600

    Use here-document for 'configure --help' output. (#714)
    
    Details:
    - Changed the configure script function that outputs "--help" text to do
      so via so-called "here-document" syntax for improved readability and
      maintainability. The change eliminates hundreds of echo statements and
      makes it easier to change existing configure options' help text, along
      with other benefits such as eliminating the need to escape double-
      quote characters (").

commit c334ec278f5e2a101625629b2e13bbf1b38dede5
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Jan 18 13:10:19 2023 -0600

    Merge tlb- and slab/rr-specific gemm macrokernels. (#711)
    
    Details:
    - Merged the tlb-specific gemm macrokernel (_var2b) with the slab/rr-
      specific one (var2) so that a single function can be compiled with
      either tlb or slab/rr support, depending on which of the
      BLIS_ENABLE_JRIR_TLB, _SLAB, and _RR macros is defined. This is done
      by incorporating
      information from both approaches: the start/end/inc for the JR and IR
      loops from slab or rr partitioning; and the number of assigned
      microtiles, plus the starting IR dimension offset for all iterations
      after the first (ir_next). With these changes, slab, rr, and tlb can
      all be parameterized by initializing a similar set of variables prior
      to the jr loop.
    - Removed the wrap-around logic that sets the "b_next" field of the
      auxinfo_t struct, which executes during the last IR iteration of the
      last JR iteration. The potential benefit of this code is so minor
      (and hinges on the microkernel making use of the b_next field) that
      it's arguably not worth including. The code also does the wrong
      thing for some threads whenever JR_NT > 1, since only thread 0 (in the
      JR group) would even compute with the first micropanel of B.
    - Re-expressed the definition of bli_is_last_iter_slrr so that slab and
      tlb use the same code rather than rr and tlb.
    - Adjusted the initialization of the gemm control tree accordingly.
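    
    Illustration: how the three partitioning schemes can be expressed
    through a similar set of loop variables. This Python sketch is not the
    macrokernel code; the function names are hypothetical, and the even
    split of leftover iterations, the IR-fastest ordering of microtiles,
    and ir_next = 0 are assumptions made for a dense (gemm-like) case.
    
```python
def slab_range(n_iter: int, nt: int, tid: int):
    """Contiguous chunk of iterations for thread tid, as (start, end, inc)."""
    base, rem = divmod(n_iter, nt)
    start = tid * base + min(tid, rem)
    end = start + base + (1 if tid < rem else 0)
    return start, end, 1


def rr_range(n_iter: int, nt: int, tid: int):
    """Round-robin iterations for thread tid, as (start, end, inc)."""
    return tid, n_iter, nt


def tlb_assignment(m_iter: int, n_iter: int, nt: int, tid: int):
    """Return (n_tiles, jr_start, ir_start) for thread tid."""
    base, rem = divmod(m_iter * n_iter, nt)
    first = tid * base + min(tid, rem)
    n_tiles = base + (1 if tid < rem else 0)
    jr_start, ir_start = divmod(first, m_iter)
    return n_tiles, jr_start, ir_start


def tlb_tiles(m_iter: int, n_iter: int, nt: int, tid: int):
    """List the (jr, ir) microtile indices visited by thread tid."""
    n_tiles, j, i = tlb_assignment(m_iter, n_iter, nt, tid)
    ir_next = 0  # assumption: later JR iterations restart the IR loop at 0
    tiles = []
    while n_tiles > 0:
        tiles.append((j, i))
        n_tiles -= 1
        i += 1
        if i == m_iter:
            i = ir_next
            j += 1
    return tiles


if __name__ == "__main__":
    print(slab_range(10, 4, 1), rr_range(10, 4, 1))
    for tid in range(5):
        print(tid, tlb_tiles(3, 4, 5, tid))
```
    
    In the sketch, slab and rr hand each thread whole JR iterations, while
    tlb hands out individual microtiles, so a thread's share may begin or
    end partway through a JR iteration.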

commit 5793a77937aee9847a5692c8e44b36a6380800a1
Author: HarshDave12 <122850830+HarshDave12@users.noreply.github.com>
Date:   Tue Jan 17 21:55:02 2023 +0530

    Fixed mis-mapped instruction for VEXTRACTF64X2. (#713)
    
    Details:
    - This commit fixes a typo in the macro definition for the extended
      inline assembly macro VEXTRACTF64X2 in bli_x86_asm_macros.h. The macro
      was previously defined (incorrectly) in terms of the vextractf64x4
      instruction rather than vextractf64x2.
    - CREDITS file update.

commit 16d2e9ea9ca0853197b416eba701b840a8587bca
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jan 13 20:03:01 2023 -0600

    Defined lt, lte, gt, gte + misc. other updates. (#712)
    
    Details:
    - Changed invertsc operation to be a non-destructive operation; that is,
      it now takes separate input and output operands. This change applies
      to both the object and typed APIs.
    - Defined an alternative square root operation, sqrtrsc, which, when
      operating on complex scalars, assumes the imaginary part of the input
      to be zero.
    - Changed the semantics of addm, subm, copym, axpym, scal2m, and xpbym
      so that when the source matrix has an implicit unit diagonal, the
      operation leaves the diagonal of the destination matrix untouched.
      Previously, the operations would interpret an implicit unit diagonal
      on the source matrix as a request to manifest the unit diagonal
      *explicitly* on output (either as something to copy in the case of
      copym, or something to compute with in the cases of addm, subm, axpym,
      scal2m, and xpbym). It turns out that this behavior was too cute by
      half and could cause unintended headaches for practical use cases.
      (This change in behavior also required small modifications to the trmv
      and trsv testsuite modules so that they would properly test matrices
      with unit diagonals.)
    - Added missing dependencies for copym to gemv, ger, hemv, trmv, and
      trsv testsuite modules.
    - Implemented level-0-like ltsc, ltesc, gtsc, gtesc operations in
      frame/util, which use lt, lte, gt, and gte level-0 scalar macros.
    - Trivial variable rename in bli_part.c to harmonize with other
      variable naming conventions.
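    
    Illustration: the new implicit-unit-diagonal semantics, modeled in
    Python for a lower-triangular copym on nested lists. The function name
    and the list-of-lists representation are hypothetical and only mirror
    the behavior described above.
    
```python
def copym_lower_unit_diag(a: list, b: list) -> None:
    """Copy the strictly lower triangle of a into b.

    The source is treated as having an implicit unit diagonal, so the
    diagonal of b is left untouched. (Under the previous semantics the
    diagonal of b would have been overwritten with ones.)
    """
    n = len(a)
    for i in range(n):
        for j in range(i):
            b[i][j] = a[i][j]


if __name__ == "__main__":
    a = [[9, 0, 0],
         [2, 9, 0],
         [3, 4, 9]]
    b = [[7, 7, 7],
         [7, 7, 7],
         [7, 7, 7]]
    copym_lower_unit_diag(a, b)
    for row in b:
        print(row)
```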

commit 9a366b14fe52c469f4664ef5dd93d85be8d97baa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jan 12 13:07:22 2023 -0600

    Implement cntx_t pointer caching in gks. (#709)
    
    Details:
    - Refactored the gks cntx_t query functions so that: (1) there is a
      clearer pattern of similarity between functions that query a native
      context and those that query its induced (1m) counterpart; and (2)
      queried cntx_t pointers (for both native and induced cntx_t pointers)
      are cached (by default), or deep-queried upon each invocation,
      depending on whether cpp macro BLIS_ENABLE_GKS_CACHING is defined.
    - Refactored query-related functions in bli_arch.c to cache the queried
      arch_t value (by default), or deep-query the arch_t value upon each
      invocation, depending on whether cpp macro BLIS_ENABLE_GKS_CACHING is
      defined.
    - Tweaked the behavior of bli_gks_query_ind_cntx_impl() (formerly named
      bli_gks_query_ind_cntx()) so that the induced method cntx_t struct is
      repopulated each time the function is called. (It is still only
      allocated once on first call.) This was mostly done in preparation for
      some future in which the arch_t value might change at runtime. In such
      a scenario, the induced method context would need to be recalculated
      any time the native context changes.
    - Added preprocessor logic to bli_config_macro_defs.h to handle enabling
      or disabling of cntx_t pointer caching (via BLIS_ENABLE_GKS_CACHING).
    - For now, cntx_t pointer caching is enabled by default and does not
      correspond to any official configure option. Disabling can be done
      by inserting a #define for BLIS_DISABLE_GKS_CACHING into the
      appropriate bli_family_*.h header file within the configuration of
      interest.
    - Thanks to Harihara Sudhan S (AMD) for suggesting that cntx_t pointers
      (and not just arch_t values) be cached.
    - Comment updates.
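    
    Illustration: the difference between cached and deep queries, as a
    minimal Python sketch. The class and its members are hypothetical; the
    caching flag merely stands in for BLIS_ENABLE_GKS_CACHING, and the
    deep query is a placeholder callable.
    
```python
class ContextQuery:
    """Query wrapper that either caches the result or deep-queries each time."""

    def __init__(self, deep_query, caching: bool = True):
        self._deep_query = deep_query
        self._caching = caching
        self._cached = None
        self.deep_calls = 0  # number of times the deep query actually ran

    def query(self):
        if self._caching and self._cached is not None:
            return self._cached
        self.deep_calls += 1
        result = self._deep_query()
        if self._caching:
            self._cached = result
        return result


if __name__ == "__main__":
    cached = ContextQuery(lambda: {"arch": "generic"})
    uncached = ContextQuery(lambda: {"arch": "generic"}, caching=False)
    for _ in range(3):
        cached.query()
        uncached.query()
    print(cached.deep_calls, uncached.deep_calls)
```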

commit b895ec9f1f66fb93972589c06bff171337153a31
Author: Nisanth M P <nisanthmp.01@gmail.com>
Date:   Wed Jan 11 09:02:32 2023 +0530

    Fixing type-mismatch errors in power10 sandbox (#701)
    
    Details:
    - This commit fixes a mismatch between the function type signature of
      bli_gemm_ex() required by BLIS and the version of the function defined
      within the power10 sandbox. It also performs typecasting upon calling
      bli_gemm_front() to attain type consistency with the type signature
      defined by BLIS for bli_gemm_front().

commit 38d88d5c131253066cad4f98eea06fa9299cae3b
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jan 10 21:24:58 2023 -0600

    Define new global scalar (obj_t) constants. (#703)
    
    Details:
    - This commit defines the following new global scalar constants:
      - BLIS_ONE_I: This constant encodes the imaginary unit.
      - BLIS_MINUS_ONE_I: This constant encodes the negative imaginary unit.
      - BLIS_NAN: This constant encodes a not-a-number value. Both real and
        imaginary parts are set to NaN for complex datatypes.
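    
    Illustration: the values these constants encode, written with Python
    complex numbers. The Python names are stand-ins for the example; the
    actual constants are obj_t scalars.
    
```python
import math

ONE_I = complex(0.0, 1.0)                  # the imaginary unit
MINUS_ONE_I = complex(0.0, -1.0)           # the negative imaginary unit
NAN_COMPLEX = complex(math.nan, math.nan)  # NaN in both real and imaginary parts

if __name__ == "__main__":
    print(ONE_I * ONE_I, ONE_I + MINUS_ONE_I)
    print(math.isnan(NAN_COMPLEX.real), math.isnan(NAN_COMPLEX.imag))
```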

commit cdb22b8ffa5b31a0c16ac1a7bcecefeb5216f669
Author: Nisanth M P <nisanthmp.01@gmail.com>
Date:   Wed Jan 11 08:50:57 2023 +0530

    Disable power10 kernels other than sgemm, dgemm. (#705)
    
    Details:
    - There is a power10 sandbox which uses microkernels for datatypes other
      than float and double (or scomplex/dcomplex). In a regular power10-
      configured build (that is, with the sandbox disabled), there were
      compile errors for some of these other non-sgemm/non-dgemm
      microkernels. This commit protects those kernels with a new cpp macro
      guard (which is defined in sandbox/power10/bli_sandbox.h) that
      prevents that kernel code from being compiled for normal, non-sandbox
      power10 builds.

commit d220f9c436c0dae409974724d42ab6c52f12a726
Author: Nisanth M P <nisanthmp.01@gmail.com>
Date:   Wed Jan 11 08:43:03 2023 +0530

    Fix k = 0 edge case in power10 microkernels (#706)
    
    Details:
    - When power10 sgemm and dgemm microkernels are called with k = 0, they
      become caught in infinite loops and segfault. This is fixed now via an
      early exit in the case of k = 0.

commit 2e1ba9d13c23a06a7b6f8bd326af428f7ea68c31
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jan 10 21:05:54 2023 -0600

    Tile-level partitioning in jr/ir loops (ex-trsm). (#695)
    
    Details:
    - Reimplemented parallelization of the JR loop in gemmt (which is
      recycled for herk, her2k, syrk, and syr2k). Previously, the
      rectangular region of the current MC x NC panel of C would be
      parallelized separately from the diagonal region of that same
      submatrix, with the rectangular portion being assigned to threads via
      slab or round-robin (rr) partitioning (as determined at configure-
      time) and the diagonal region being assigned via round-robin. This
      approach did not work well when extracting lots of parallelism from
      the JR loop and was often suboptimal even for smaller degrees of
      parallelism. This commit implements tile-level load balancing (tlb) in
      which the IR loop is effectively subjugated in service of more
      equitably dividing work in the JR loop. This approach is especially
      potent for certain situations where the diagonal region of the MC x NR
      panel of C is significant relative to the entire region. However, it
      also seems to benefit many problem sizes of other level-3 operations
      (excluding trsm, which has an inherent algorithmic dependency in the
      IR loop that prevents the application of tlb). For now, tlb is
      implemented as _var2b.c macrokernels for gemm (which forms the basis
      for gemm, hemm, and symm), gemmt (which forms the basis of herk,
      her2k, syrk, and syr2k), and trmm (which forms the basis of trmm and
      trmm3). Which function pointers (_var2() or _var2b()) are embedded in
      the control tree will depend on whether the BLIS_ENABLE_JRIR_TLB cpp
      macro is defined, which is controlled by the value passed to the
      existing --thread-part-jrir=METHOD (or -r METHOD) configure option.
      This script adds 'tlb' as a valid option alongside the previously
      supported values of 'slab' and 'rr'. ('slab' is still the default.)
      Thanks to Leick Robinson for abstractly inspiring this work, and to
      Minh Quan Ho for inquiring (in PR #562, and before that in Issue #437)
      about the possibility of improved load balance in macrokernel loops,
      and even prototyping what it might look like, long before I fully
      understood the problem.
    - In bli_thread_range_weighted_sub(), tweaked the way we compute the
      area of the current MC x NC trapezoidal panel of C by better taking
      into account the microtile structure along the diagonal. Previously,
      it was an underestimate, as it assumed MR = NR = 1 (that is, it
      assumed that the microtile column of C that overlapped with microtiles
      exactly coincided with the diagonal). Now, we only assume MR = NR.
      This is still a slight underestimate when MR != NR, so the additional
      area is scaled by 1.5 in a hackish attempt to compensate for this, as
      well as other additional effects that are difficult to model (such as
      the increased cost of writing to temporary tiles before finally
      updating C). The net effect of this better estimation of the
      trapezoidal area should be (on average) slightly larger regions
      assigned to threads that have little or no overlap with the diagonal
      region (and correspondingly slightly smaller regions in the diagonal
      region), which we expect will lead to slightly better load balancing
      in most situations.
    - Spun off the contents of bli_thread.[ch] that relate to computing
      thread ranges into one of three source/header file pairs:
      - bli_thread_range.[ch], which define functions that are not specific
        to the jr/ir loops;
      - bli_thread_range_slab_rr.[ch], which define functions that implement
        slab or round-robin partitioning for the jr/ir loops;
      - bli_thread_range_tlb.[ch], which define functions that implement
        tlb for the jr/ir loops.
    - Fixed the computation of a_next in the last iteration of the IR loop
      in bli_gemmt_l_ker_var2(). Previously, it always "wrapped" back around
      to the first micropanel of the current MC x KC packed block of A.
      However, this is almost never actually the micropanel that is used
      next. A new macro, bli_gemmt_l_wrap_a_upanel(), computes a_next
      correctly, with a similarly named bli_gemmt_u_wrap_a_upanel() for use
      in the upper-stored case (which *does* actually always choose the
      first micropanel of A as its a_next at the end of the IR loop).
    - Removed adjustments for a_next/b_next (a2/b2) for the diagonal-
      intersecting case of gemmt_l_ker_var2() and the above-diagonal case
      of gemmt_u_ker_var2() since these cases will only coincide with the
      last iteration of the IR loop in very small problems.
    - Defined bli_is_last_iter_l() and bli_is_last_iter_u(), the latter of
      which explicitly considers whether the current microtile is the last
      tile that intersects the diagonal. (The former does the same, but the
      computation coincides with the original bli_is_last_iter().) These
      functions are now used in gemmt to test when a_next (or a2) should
      "wrap" (as discussed above). Also defined bli_is_last_iter_tlb_l()
      and bli_is_last_iter_tlb_u(), which are similar to the aforementioned
      functions but are used when employing tlb in gemmt.
    - Redefined macros in bli_packm_thrinfo.h, which test whether an
      iteration of work is assigned to a thread, as static inline functions
      in bli_param_macro_defs.h (and then deleted bli_packm_thrinfo.h).
      In the process of redefining these macros, I also renamed them from
      bli_packm_my_iter_rr/sl() to bli_is_my_iter_rr/sl().
    - Renamed
        bli_thread_range_jrir_rr() -> bli_thread_range_rr()
        bli_thread_range_jrir_sl() -> bli_thread_range_sl()
        bli_thread_range_jrir()    -> bli_thread_range_slrr()
    - Renamed
        bli_is_last_iter() -> bli_is_last_iter_slrr()
    - Defined
        bli_info_get_thread_jrir_tlb()
      and renamed:
      - bli_info_get_thread_part_jrir_slab() ->
        bli_info_get_thread_jrir_slab()
      - bli_info_get_thread_part_jrir_rr() ->
        bli_info_get_thread_jrir_rr()
    - Modified bli_rntm_set_ways_for_op() to redirect IR loop parallelism
      into the JR loop when tlb is enabled for non-trsm level-3 operations.
    - Added a sanity check to prevent bli_prune_unref_mparts() from being
      used on packed objects. This prohibition is necessary because the
      current implementation does not take into account the atomicity of
      packed micropanel widths relative to the diagonal of structured
      matrices. That is, the function prunes greedily without regard to
      whether doing so would prune off part of a micropanel *which has
      already been packed* and assigned to a thread for inclusion in the
      computation.
    - Further restricted early returns in bli_prune_unref_mparts() to
      situations where the primary matrix is not only of general structure
      but also dense (in terms of its uplo_t value). The addition of the
      matrix's dense-ness to the conditional is required because gemmt is
      somewhat unusual in that its C matrix has general structure but is
      marked as lower- or upper-stored via its uplo_t. By only checking
      for general structure, attempts to prune gemmt C matrices would
      incorrectly result in early returns, even though that operation
      effectively treats the matrix as symmetric (and stored in only one
      triangle).
    - Fixed a latent bug in bli_thread_range_rr() wherein incorrect ranges
      were computed when 1 < bf. Thankfully, this bug was not yet
      manifesting since all current invocations used bf == 1.
    - Fixed a latent bug in some unexercised code in bli_?gemmt_l_ker_var2()
      that would perform incorrect pruning of unreferenced regions above
      where the diagonal of a lower-stored matrix intersects the right edge.
      Thankfully, the bug was not harming anything since those unreferenced
      regions were being pruned prior to the macrokernel.
    - Rewrote slab/rr-based gemmt macrokernels so that they no longer carved
      C into rectangular and diagonal regions prior to parallelizing each
      separately. The new macrokernels use a unified loop structure where
      quadratic (slab) partitioning is used.
    - Updated all level-3 macrokernels to have a more uniform coding style,
      such as combining variable declarations with initializations as
      well as the use of const.
    - Updated bls_l3_packm_var[123].c to use bli_thrinfo_n_way() and
      bli_thrinfo_work_id() instead of bli_thrinfo_num_threads() and
      bli_thrinfo_thread_id(), respectively. This change probably should
      have been included in aeb5f0c.
    - Removed old prototypes in bli_gemmt_var.h and bli_trmm_var.h that
      corresponded to functions that were removed in aeb5f0c.
    - Other very minor cleanups.
    - Comment updates.
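    
    As a rough illustration of the tile-level load balancing idea described
    above (and not the actual code in bli_thread_range_tlb.[ch]), the sketch
    below counts the microtiles of a lower-stored panel and hands each
    thread a near-equal contiguous share of them. The names
    tlb_count_tiles_l() and tlb_range() are hypothetical, and the sketch
    assumes a diagonal offset of zero with dimensions given in microtiles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of microtiles at or below the diagonal in an
   m_iter x n_iter grid of microtiles (lower-stored, diagonal offset 0).
   Column j holds stored tiles in rows j .. m_iter-1. */
static size_t tlb_count_tiles_l(size_t m_iter, size_t n_iter)
{
    size_t total = 0;
    for (size_t j = 0; j < n_iter; ++j)
        if (j < m_iter) total += m_iter - j;
    return total;
}

/* Hypothetical helper: split n_tiles into nt contiguous, near-equal ranges.
   Thread tid receives tiles [*start, *end). The first (n_tiles % nt)
   threads receive one extra tile each. */
static void tlb_range(size_t n_tiles, size_t nt, size_t tid,
                      size_t *start, size_t *end)
{
    assert(nt > 0 && tid < nt);
    size_t base = n_tiles / nt;
    size_t rem  = n_tiles % nt;
    *start = tid * base + (tid < rem ? tid : rem);
    *end   = *start + base + (tid < rem ? 1 : 0);
}
```

    For a 4 x 4 grid of microtiles and four threads, assigning one column
    per thread would give the threads 4, 3, 2, and 1 tiles, whereas the
    tile-level split above gives 3, 3, 2, and 2.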

commit b6735ca26b9d459d9253795dc5841ae8de9e84c9
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Jan 6 14:10:01 2023 -0600

    Refactor structure awareness in packm_blk_var1.c. (#707)
    
    Details:
    - Factored some of the structure awareness out of the loop in
      bli_packm_blk_var1(). So instead of having a single loop with
      conditionals in the body to handle various kinds of structure (and
      stored/unstored submatrix placement), we now have a conditional branch
      to handle various structure/storage scenarios with a loop in each
      section. This change was originally motivated to choose slab or round-
      robin partitioning (in the context of triangular matrices) based on
      the structure of the entire block (or panel) being packed rather than
      each micropanel individually. Previously, the code would attempt to
      limit rr to the portion of the block that intersects the diagonal and
      use slab for the remainder. However, that approach was not well
      thought out, and in many situations it led to inferior load balancing
      when compared to using round-robin for the entire block (or panel).
      This commit has the added benefit of incurring less overhead during
      the packing process now that each of the new loops is simpler.

commit f956b79922da412791e4c8b8b846b3aafc0a5ee0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Dec 31 20:18:08 2022 -0600

    Switch to l3 sup decorator in gemmlike sandbox. (#704)
    
    Details:
    - Modified the gemmlike sandbox to call bli_l3_sup_thread_decorator()
      rather than a local analogue of that code. This reduces redundant
      logic and makes it easier for the sandbox to inherit future
      improvements to the framework's threading code.
    - Moved addon/gemmd to addon/old/gemmd. This code has fallen out of date
      and is taking too much effort to maintain. We will very likely
      reimplement it completely once future changes are made to the
      framework proper.

commit 538150c5845ad903773ca797c740048174116aa4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Dec 25 22:28:09 2022 -0600

    Applied race condition fix to sup thread decorator.
    
    Details:
    - Applied the race condition bugfix in commit 7d23dc2 to the
      corresponding sup code in bli_l3_sup_decor.c. Note that in the case
      of sup, the race condition would have only manifested when optional
      packing was enabled at runtime (typically via setting BLIS_PACK_A
      and/or BLIS_PACK_B environment variables).
    - Both the fix in this commit and the fix in 7d23dc2 address bugs
      that were introduced when the thrinfo_t trees/communicators were
      restructured in the October omnibus commit (aeb5f0c).

commit 7d23dc2a064a371dc9883e2c2c7236a70912428c
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sun Dec 25 19:09:14 2022 -0600

    Fix a race condition which manifested as incorrect results (rarely). (#702)
    
    The problem occurs when there are at least two teams of threads packing different parts of a matrix, and where each team has at least two threads; call them team A and team B. The problematic sequence is:
    
    1. The chief of team A checks out a block B and broadcasts the pointer to its teammates.
    2. Team A completely packs their data and performs a barrier amongst themselves.
    3. Team A commences computing with the packed data.
    4. The chief of team A finishes computing before its teammates, then calls bli_thrinfo_free on its thrinfo_t struct (which contains the mem_t object referencing the buffer B). This causes buffer B to be checked back in to the pba.
    5. The chief of team B checks out the *same* block B that was just checked back in and broadcasts the pointer to its teammates.
    6. DATA RACE: now the remaining threads of team A are reading *while* team B are writing to the same buffer B. If team B writes new data before team A is done computing then an incorrect result is generated.
    
    The solution is to place a global barrier before the call to bli_thrinfo_free at the end of the computation.
    
    Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>

commit 3accacf57d11e9b109339754f91bf22329b6cb6a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Dec 16 10:26:33 2022 -0600

    Skip 1m optimization when forcing hemm_l/symm_l. (#697)
    
    Details:
    - Fixed a bug in right-sided hemm when:
      - using the 1m method,
      - #defining BLIS_DISABLE_HEMM_RIGHT in the active subconfiguration,
        and
      - the storage of C matches the gemm microkernel IO preference PRIOR to
        the right-sidedness being detected and recast in terms of the left-
        side code path.
      It turns out that bli_gemm_ind_recast_1m_params() was applying its
      optimization (recasting a complex-domain macrokernel calling a 1m
      virtual microkernel to a real-domain macrokernel calling the real-
      domain microkernel) in situations in which it should not have. The
      optimization was silently assuming that the storage of C always
      matched that of the microkernel preference, since the front-end (in
      this case, bli_hemm_front()) would have already had a chance to
      transpose the operation to bring the two into agreement. However, by
      disabling right-sided hemm, we deprive BLIS of that flexibility (as a
      transposed left-sided case would necessarily have to become a right-
      sided case), and thus the assumption was no longer holding in all
      cases. Thanks to Nisanth M P for reporting this bug in Issue #621.
    - The aforementioned bug, and its bugfix, also apply to symm when
      BLIS_DISABLE_SYMM_RIGHT is defined.
    - Comment updates.
    - CREDITS file update.

commit 4833ba224eba54df3f349bcb7e188bcc53442449
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 12 20:26:02 2022 -0600

    Fixed perf of mt sup with packing, and mt gemmlike. (#696)
    
    Details:
    - Brought the gemmsup code path up to date relative to the latest
      thrinfo_t semantics introduced in the October Omnibus commit
      (aeb5f0c). This was done by passing the prenode (instead of the
      current node) into the packm variant within bli_l3_sup_packm.c as well
      as creating the prenodes and attaching them to the thrinfo_t tree in
      bli_l3_sup_thrinfo_create(). These changes erase the performance
      degradation introduced in the omnibus when running multithreaded sup
      with optional packing enabled. Special thanks to Devin Matthews for
      sussing out this fix in short order.
    - Fixed the gemmlike sandbox in a manner similar to that of sup with
      packing, described above. This also involved passing the prenode into
      the local gemmlike packm variant. (Recall that gemmlike recycles the
      use of bli_l3_sup_thrinfo_create(), so it automatically inherits that
      part of the sup fix described above.)
    - Updated bls_l3_packm_var[123].c to use bli_thrinfo_n_way() and
      bli_thrinfo_work_id() instead of bli_thrinfo_num_threads() and
      bli_thrinfo_thread_id(), respectively.

commit db10dd8e11a12d85017f84455558a82c0093b1da
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 29 19:10:31 2022 -0600

    Fixed _gemm_small() prototype; disabled gemm_small.
    
    Details:
    - Fixed a mismatch between the prototype for bli_gemm_small() in
      bli_gemm_front.h and the actual definition of bli_gemm_small() in
      kernels/zen/3/bli_gemm_small.c. The former was erroneously declaring
      the cntl_t* argument as 'const'. Thanks to Jeff Diamond for reporting
      this issue.
    - Commented out BLIS_ENABLE_SMALL_MATRIX, BLIS_ENABLE_SMALL_MATRIX_TRSM
      macro definitions in config/zen3/bli_family_zen3.h. AMD's small matrix
      implementation should probably remain disabled in vanilla BLIS, at
      least for now.

commit f0337b784d164ae505ca0e11277a1155680500d1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Nov 13 21:36:47 2022 -0600

    Trivial whitespace/comment tweaks.
    
    Details:
    - Trivial whitespace and comment changes, most of which ideally would
      have been part of the previous commit pertaining to HPX (2b05948).

commit 2b05948ad2c9785bc53f376d53a7141cbc917447
Author: ct-clmsn <ct.clmsn@gmail.com>
Date:   Sun Nov 13 17:40:22 2022 -0500

    blis support for hpx (#682)
    
    Implement threading backend via HPX.
    
    HPX is an asynchronous many task runtime system used in high performance computing applications. The runtime implements the ISO C++ parallelism specification and provides a user-space thread implementation.
    
    This PR provides BLIS a thread backend implementation using HPX and resolves feature request #681. The configuration script, makefiles, and testsuite have been updated to support an HPX build option. The addition of HPX support provides other developers an exemplar for integrating other C++ threading backends into BLIS.
    
    Co-authored-by: ctaylor <ctaylor@pennywise.cm.cluster>
    Co-authored-by: Devin Matthews <damatthews@smu.edu>

commit e1ea25da43508925e33d4e57e420cfc0a9de793f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 11 12:07:51 2022 -0600

    Fixed subtle barrier_fpa bug in bli_thrcomm.c. (#690)
    
    Details:
    - In bli_thrcomm.c, correctly initialize the BLIS_OPENMP element of the
      barrier function pointer array (barrier_fpa) to NULL when
      BLIS_ENABLE_OPENMP is *not* defined. Similarly, initialize the
      BLIS_POSIX element of barrier_fpa to NULL when BLIS_ENABLE_PTHREADS is
      not defined. This bug was introduced in a1a5a9b and was likely the
      result of an incomplete edit. The effects of the bug would have
      likely manifested when querying a thrcomm_t that was initialized with
      a timpl_t value corresponding to a threading implementation that was
      omitted from the -t option at configure-time.
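    
    The pattern being fixed can be sketched as follows. This is only an
    illustration: the enum values, the ENABLE_OPENMP/ENABLE_PTHREADS macros,
    and barrier_dispatch() are hypothetical stand-ins, not the identifiers
    used in bli_thrcomm.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the threading implementation ids. */
enum { IMPL_SINGLE = 0, IMPL_OPENMP = 1, IMPL_POSIX = 2, IMPL_COUNT = 3 };

typedef void (*barrier_ft)(void);

static void barrier_single(void) { /* nothing to synchronize */ }
#ifdef ENABLE_OPENMP
static void barrier_openmp(void) { /* would wait on an OpenMP barrier */ }
#endif
#ifdef ENABLE_PTHREADS
static void barrier_pthreads(void) { /* would wait on a pthread barrier */ }
#endif

/* Each slot is set explicitly, so a backend that was not compiled in maps
   to NULL. */
static barrier_ft barrier_fpa[IMPL_COUNT] =
{
    [IMPL_SINGLE] = barrier_single,
#ifdef ENABLE_OPENMP
    [IMPL_OPENMP] = barrier_openmp,
#else
    [IMPL_OPENMP] = NULL,
#endif
#ifdef ENABLE_PTHREADS
    [IMPL_POSIX]  = barrier_pthreads,
#else
    [IMPL_POSIX]  = NULL,
#endif
};

/* Returns 0 on success, or -1 if the requested backend was not compiled
   in (its slot is NULL). */
static int barrier_dispatch(int impl)
{
    assert(impl >= 0 && impl < IMPL_COUNT);
    if (barrier_fpa[impl] == NULL) return -1;
    barrier_fpa[impl]();
    return 0;
}
```

    With neither macro defined, only the single-threaded slot is callable;
    a caller can test for NULL instead of jumping through a bad pointer.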

commit dc6e5f3f5770074ba38554541b8b64711a68c084
Author: leekillough <15950023+leekillough@users.noreply.github.com>
Date:   Thu Nov 3 18:33:08 2022 -0500

    Enhance emacs formatting of C files to remove trailing whitespace and ensure a newline at the end of file

commit 713d078075a4a563a43d83fd0880ab5091c2e4a4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 3 20:00:11 2022 -0500

    Delete mpi_test garbage. (#689)
    
    Details:
    - tlrmchlsmth: "What even is this? No comments, no commit message, not
      used by anything. Trash."

commit 8d813f7f12732d52c95570ae884d5defbfd19234
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 3 19:10:47 2022 -0500

    Some decluttering of the top-level directory.
    
    Details:
    - Relocated 'mpi_test' directory to test/mpi_test.
    - Relocated 'so_version' and 'version' files from top-level directory to
      'build' directory.
    - Updated build/bump-version.sh script to accommodate relocation of
      'version' file to 'build' directory.
    - Updated configure script to accommodate relocation of 'so_version'
      file to 'build' directory.
    - Updated INSTALL file to replace pointers to blis-devel mailing list
      with a pointer to docs/Discord.md.
    - Updated RELEASING file to contain a reminder to consider whether the
      so_version file should be updated prior to the release.

commit 6774bf08c92fc6983706a91bbb93b960e8eef285
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date:   Thu Nov 3 15:20:47 2022 -0500

    Fix typo in configure --help text. (#686)
    
    Details:
    - Fixed a misspelling in the --help description for the --int-size (-i)
      configure option.

commit 872898d817f35702e7678ff7f3eeff0f12e641f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 2 21:53:22 2022 -0500

    Fixed trmm[3]/trsm performance bug in cf7d616. (#685)
    
    Details:
    - Fixed a performance bug in the packing of micropanels that intersect
      the diagonal of triangular matrices (i.e., those found in trmm, trmm3,
      and trsm). This bug was introduced in cf7d616 and stemmed from an
      ill-formed boolean conditional expression in bli_packm_blk_var1().
      This conditional would choose when to use round-robin parallel work
      allocation, but checked for the triangularity of the submatrix being
      packed while failing also to check for whether the current micropanel
      actually intersected the diagonal. The net result of this bug was that
      *all* micropanels of a triangular matrix, no matter where the upanels
      resided within the matrix, were assigned to threads via a round-robin
      policy. This affected some microarchitectures and threading
      configurations much worse than others, but it seems that overall the
      effect was universally negative, likely because of the reduced spatial
      locality during the packing with round-robin. Thanks to Leick Robinson
      for his tireless efforts in helping track down this issue.
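    
    For readers unfamiliar with the two policies, the following is a minimal
    sketch of how slab and round-robin assignment differ. The function names
    are hypothetical and the code is not taken from bli_packm_blk_var1().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Slab: thread tid owns one contiguous chunk of the n_iter iterations. */
static bool my_iter_slab(size_t i, size_t n_iter, size_t nt, size_t tid)
{
    assert(nt > 0 && tid < nt);
    size_t chunk = (n_iter + nt - 1) / nt;   /* ceil(n_iter / nt) */
    return i >= tid * chunk && i < (tid + 1) * chunk && i < n_iter;
}

/* Round-robin: thread tid owns every nt-th iteration, starting at tid. */
static bool my_iter_rr(size_t i, size_t nt, size_t tid)
{
    assert(nt > 0 && tid < nt);
    return i % nt == tid;
}
```

    With eight micropanels and two threads, slab gives thread 0 micropanels
    0 through 3 (adjacent in memory), while round-robin gives it 0, 2, 4,
    and 6. The latter balances work near a diagonal but touches memory in a
    strided pattern, which is consistent with the locality concern above.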

commit edcc2f9940449f7d9cefcfc02159d27b013e7995
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 2 19:04:49 2022 -0500

    Support --nosup, --sup configure options. (#684)
    
    Details:
    - Added --nosup and --sup as alternative ways of requesting that sup be
      disabled or enabled. These are analogous to --disable-sup-handling and
      --enable-sup-handling, respectively. (I got tired of typing out
      --disable-sup-handling and needed a shorthand notation.)
    - Tweaked message output by configure when sup is enabled/disabled for
      clarity and specificity.
    - Whitespace changes.

commit 5eea6ad9eb25f37685d1ae4ae08c73cd1daca297
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 2 17:07:54 2022 -0500

    Add mention of Wilkinson Prize to README.md. (#683)
    
    Details:
    - Added blurbs and links to Wilkinson Prize to README.md.
    - Added mention of both Best Paper and Wilkinson Prizes to the top of
      README.md.
    - Other minor tweaks.

commit 29f79f030e939969d4f3876c4fdaac7b0c5daa63
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Oct 31 18:57:45 2022 -0500

    Fixed performance bug caused by redundant packing. (#680)
    
    Details:
    - Fixed a performance bug whereby multiple threads were redundantly
      packing the same (rather than separate) micropanels. This bug was
      caused by different parts of the code using the num_threads/thread_id
      field of the thrinfo_t vs. the n_way/work_id fields. The fix was to
      standardize on the latter and provide a "fake" thrinfo_t sub-prenode
      in the thrinfo tree which consists of single-member thread teams. The
      single team with multiple threads node is still required since it and
      only it can be used to perform barriers and broadcasts (e.g. of the
      packed buffer pointer).

commit aeb5f0cc19665456e990a7ffccdb09da2e3f504b
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Oct 27 12:39:11 2022 -0500

    Omnibus PR - Oct 2022 (#678)
    
    Details:
    - This is an "omnibus" commit, consisting of multiple medium-sized
      commits that affect non-trivial aspects of BLIS. The major highlights:
      - Relocated the pba, sba pool (from the rntm_t), and mem_t (from the
        cntl_t) to the thrinfo_t object. This allows the rntm_t to be
        effectively const (although it is sometimes copied internally and
        modified to reflect different ways of parallelism). Moving the mem_t
        sets the stage for sharing a global control tree amongst all
        threads.
      - De-templatized the macrokernels for gemmt, trmm, and trsm to match
        the macrokernel for gemm, which has been de-templatized since
        54fa28b.
      - Reimplemented bli_l3_determine_kc() by separating out the logic for
        adjusting KC based on MR/NR for triangular A and/or B into a new
        function, bli_l3_adjust_kc(). For now, this function is still called
        from bli_l3_determine_kc(), but in the future we plan to have it
        called once when constructing the control tree.
      - Refactored the level-3 thread decorator into two parts:
        - One part deals only with launching threads, each one calling a
          generic thread entry function. This code resides in frame/thread
          and constitutes the definition of bli_thread_launch(). Note that
          it is specific to the threading implementation (OpenMP, pthreads,
          single, etc.)
        - The other part deals with passing the matrix operands and related
          information into bli_thread_launch(). This is the "l3 decorator"
          and now resides in frame/3. It is agnostic to the threading
          implementation.
      - Modified the "level" of the thread control tree passed in at each
        operation. Previously, each operation (e.g. bli_gemm_blk_var1()) was
        passed in a communicator representing the active thread teams which
        would share the available work. Now, the *parent* thread comm is
        passed in. The operation then grabs the child comm and uses it to
        partition the work. The difference is in bli_trsm_blk_var1(), where
        there are now two child nodes for this single operation (i.e. the
        thread control tree is split one level above where the control tree
        is). The sub-prenode is used for the trsm subproblem while the
        normal sub-node is used for the gemm part. Importantly, the parent
        comm is used for the barrier between them.
    - Removed cntl_t* arguments from bli_*_front() functions. These will be
      added back in the future when the control tree's creation is moved so
      that it happens much sooner (provided that bli_*_front() have not been
      absorbed into their respective bli_*_ex() functions).
    - Renamed various bli_thread_*() query functions to bli_thrinfo_*(),
      for consistency. This includes _num_threads(), _thread_id(), _n_way(),
      _work_id(), _sba_pool(), _pba(), _mem(), _barrier(), _broadcast(), and
      _am_chief().
    - Removed extraneous barrier from _blk_var3() of gemm and trsm.
    - Fixed a typo in bli_type_defs.h where BLIS_BLAS_INT_TYPE_SIZE was
      misspelled.

commit c803b03e52a7a6997a8d304a8cfa9acf7c1c555b
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Oct 26 18:20:00 2022 -0500

    Add check to disable armsve on Apple M1.

commit 2dd692b710b6a9889f7ebdd7934a2108be5c5530
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Oct 26 18:10:26 2022 -0500

    Fix auto-detection of firestorm (Apple M1).

commit 88105dbecf0f9dfbfa30215743346e8bd6afb971
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 21 15:16:12 2022 -0500

    Added Discord documentation (#677)
    
    Details:
    - Added a docs/Discord.md markdown document that walks the reader
      through creating a Discord account, obtaining the invite link, and
      using the link to join the BLIS Discord server.
    - Updated README.md to reference the new Discord.md document in multiple
      places, including via the official Discord logo (used with explicit
      permission from representatives at Discord Inc.).

commit 23f5b8df3e802a27bacd92571184ec57bbdfa646
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 17 20:21:21 2022 -0500

    Shuffled checked properties in bli_l3_check.c. (#676)
    
    Details:
    - Added certain checks for matrix structure to the level-3 operations'
      _check() functions, and slightly reorganized existing checks.

commit 9453e0f163503f64a290256b4be53d8882224863
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 3 19:46:20 2022 -0500

    CREDITS file update.
    
    Details:
    - This attribution was intended to go in PR #647.

commit 76a23bd8c33e161221891935a489df9a9fb9c8c0
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Oct 3 15:55:07 2022 -0500

    Reinstate sanity check in bli_pool_finalize. (#671)
    
    Details:
    - Added a reinit argument to bli_pool_finalize(). This bool will signal
      whether or not the function is being called from bli_pool_reinit(). If
      it is not being called from _reinit(), we can safely check to confirm
      that .top_index == 0 (i.e., all blocks have been checked in). But if
      it *is* being called from _reinit(), then that check will be skipped
      since one of the predicted use cases for bli_pool_reinit() anticipates
      that some blocks are (probably) checked out when the pool_t is
      reinitialized.
    - Updated existing invocations of bli_pool_finalize() to pass in either
      FALSE (from bli_apool_free_block() or bli_pba_finalize_pools()) or
      TRUE (from bli_pool_reinit()) for the new reinit argument.
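    
    The conditional check can be sketched with a stripped-down pool. The
    toy_pool_t type and toy_pool_finalize() are hypothetical; in particular,
    returning false on a failed check is an assumption made so the sketch
    is easy to exercise, and the real function may respond differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down pool: top_index counts the blocks that are
   currently checked out, so top_index == 0 means all blocks are in. */
typedef struct
{
    size_t num_blocks;
    size_t top_index;
} toy_pool_t;

/* Returns false if the sanity check fails (blocks still checked out during
   a non-reinit finalize). Otherwise resets the pool and returns true. */
static bool toy_pool_finalize(toy_pool_t *pool, bool reinit)
{
    assert(pool != NULL);
    if (!reinit && pool->top_index != 0)
        return false;
    pool->num_blocks = 0;
    pool->top_index  = 0;
    return true;
}
```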

commit 63470b49e3b9b15e00a8f666e86ccd70c6005fe9
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Sep 29 18:52:08 2022 -0500

    Fix some bugs in bli_pool.c (#670)
    
    Details:
    - Add a check for premature pool exhaustion when checking in blocks via
      bli_pool_checkin_block(). This detects "double-free" and other bad
      conditions that don't necessarily result in a segfault.
    - Make sure to copy all block pointers when growing the pool size.
      Previously, checked-out block pointers (which are guaranteed to be set
      to NULL) were not being copied, leading to the presence of
      uninitialized data.
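    
    Both fixes can be illustrated with a toy pool of block pointers. All
    names below are hypothetical, and the layout (checked-out slots at the
    front of the array, set to NULL) is an assumption based on the
    description above rather than a copy of bli_pool.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    void  **blocks;     /* blocks[top_index .. num_blocks-1] are available */
    size_t  capacity;   /* length of the blocks array */
    size_t  num_blocks; /* number of blocks owned by the pool */
    size_t  top_index;  /* number of blocks currently checked out */
} toy_pool_t;

static void *toy_checkout(toy_pool_t *p)
{
    if (p->top_index == p->num_blocks) return NULL;  /* pool exhausted */
    void *block = p->blocks[p->top_index];
    p->blocks[p->top_index++] = NULL;   /* checked-out slots hold NULL */
    return block;
}

/* Rejects a check-in when nothing is checked out, which catches a double
   free (or a foreign block) before it corrupts the array. */
static bool toy_checkin(toy_pool_t *p, void *block)
{
    if (p->top_index == 0) return false;
    p->blocks[--p->top_index] = block;
    return true;
}

/* Grows the pointer array, copying ALL existing entries starting at index
   0 (including the NULL slots of checked-out blocks), not just the
   entries from top_index onward. */
static bool toy_grow(toy_pool_t *p, size_t new_capacity)
{
    assert(new_capacity > p->capacity);
    void **bigger = malloc(new_capacity * sizeof *bigger);
    if (bigger == NULL) return false;
    if (p->num_blocks > 0)
        memcpy(bigger, p->blocks, p->num_blocks * sizeof *bigger);
    for (size_t i = p->num_blocks; i < new_capacity; ++i) bigger[i] = NULL;
    free(p->blocks);
    p->blocks   = bigger;
    p->capacity = new_capacity;
    return true;
}
```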

commit 42d0e66318b186d25eeb215b40ce26115401ed8b
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Sep 29 17:38:02 2022 -0500

    Add AddressSanitizer (-fsanitize=address) option. (#669)
    
    Details:
    - Added support for AddressSanitizer (ASan), a compiler-integrated
      memory error detector. The option (disabled by default) enables
      compiling and linking with the -fsanitize=address flag supported by
      clang, gcc, and probably others. This flag is employed during
      compilation of all BLIS source files *except* for optimized kernels,
      which are exempted because ASan usually requires an extra register,
      which violates the constraints for many gemm microkernels.
    - Minor whitespace, comment, ordering, and configure help text updates.

commit b861c71b50c6d48cb07282f44aa9dddffc1f1b3f
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Sep 23 13:22:27 2022 -0500

    Add consistent NaN/Inf handling in sumsqv. (#668)
    
    Details:
    - Changed sumsqv implementation as follows:
      - If there is a NaN (either real or imaginary), then return a sum of
        NaN and unit scale.
      - Else, if there is an Inf (either real or imaginary), then return a
        sum of +Inf and unit scale.
      - Otherwise behave as normal.
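    
    A real-domain sketch of the rules above follows. It is an illustration
    only: toy_sumsqv() is a hypothetical name, it treats scale and sumsq as
    pure outputs (starting from scale = 0, sumsq = 1), and the "normal" path
    uses a scaled accumulation in the style of LAPACK's xLASSQ, which is
    assumed here rather than taken from the BLIS kernel.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* On return, the sum of squares of x equals (*scale)^2 * (*sumsq), except
   in the NaN/Inf cases, where *scale is 1 and *sumsq is NaN or +Inf. */
static void toy_sumsqv(size_t n, const double *x,
                       double *scale, double *sumsq)
{
    assert(x != NULL || n == 0);
    assert(scale != NULL && sumsq != NULL);

    /* First pass: look for NaN (takes priority) and Inf. */
    int saw_nan = 0, saw_inf = 0;
    for (size_t i = 0; i < n; ++i)
    {
        if (isnan(x[i]))      saw_nan = 1;
        else if (isinf(x[i])) saw_inf = 1;
    }
    if (saw_nan) { *scale = 1.0; *sumsq = NAN;      return; }
    if (saw_inf) { *scale = 1.0; *sumsq = INFINITY; return; }

    /* Normal path: keep the largest magnitude seen so far in sc and
       accumulate the squares relative to it to avoid overflow. */
    double sc = 0.0, ss = 1.0;
    for (size_t i = 0; i < n; ++i)
    {
        double a = x[i] < 0.0 ? -x[i] : x[i];
        if (a == 0.0) continue;
        if (sc < a) { ss = 1.0 + ss * (sc / a) * (sc / a); sc = a; }
        else        { ss += (a / sc) * (a / sc); }
    }
    *scale = sc;
    *sumsq = ss;
}
```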

commit ee81efc7887374c974a78bfb3e0865776b2f97a8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 22 19:15:07 2022 -0500

    Parameterized test/3 drivers via command line args. (#667)
    
    Details:
    - Rewrote the drivers in test/3, the Makefile, and the runme.sh script
      so that most of the important parameters, including parameter combo,
      datatype, storage combo, induced method, problem size range, dimension
      bindings, number of repeats, and alpha/beta values can be passed in
      via command line arguments. (Previously, most of these parameters were
      hard-coded into the driver source, except a few that were hard-coded
      into the Makefile.) If no argument is given for any particular option,
      it will be assigned a sane default. Either way, the values employed at
      runtime will be printed to stdout before the performance data in a
      section that is commented out with '%' characters (which is used by
      matlab and octave for comments), unless the -q option is given, in
      which case the driver will proceed quietly and output only performance
      data. Each driver also provides extensive help via the -h option, with
      the help text tailored for the operation in question (e.g. gemm, hemm,
      herk, etc.). In this help text, the driver reminds the user which
      implementation it was linked to (e.g. blis, openblas, vendor, eigen).
      Thanks to Jeff Diamond for suggesting this CLI-based reimagining of
      the test/3 drivers.
    - In the test/3 drivers: converted cpp macro string constants, as well
      as two string literals (for the opname and pc_str) used in each test
      driver, to global (or static) const char* strings, and replaced the
      use of strncpy() for storing the results of the command line argument
      parsing with pointer copies from the corresponding strings in argv.
      This works because the argv array is guaranteed by the C99 standard
      to persist throughout the life of the program. This new approach uses
      less storage and executes faster. Thanks to Minh Quan Ho for
      recommending this change.
    - Renamed the IMP_STR cpp macro that gets defined on the command line,
      via the test/3/Makefile, to IMPL_STR.
    - Updated runme.sh to set the problem size ranges for single-threaded
      and multithreaded execution independently from one another, as well as
      on a per-system basis.
    - Added a 'quiet' variable to runme.sh that can easily toggle quiet mode
      for the test drivers' output.
    - Very minor typecast fix in call to bli_getopt() in bli_utils.c.
    - In bli_getopt(), changed the nextchar variable from being a local
      static variable to a field of the getopt_t state struct. (Not sure why
      it was ever declared static to begin with.)
    - Other minor changes to bli_getopt() to accommodate the rewritten test
      drivers' command line parsing needs.
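    
    The strncpy()-to-pointer-copy change described above can be sketched as
    follows. The struct, the default strings, and the -d/-s option letters
    are hypothetical and chosen only for illustration; they are not the
    options of the test/3 drivers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parsed parameters hold pointers into argv (or to static defaults)
   instead of private character buffers filled via strncpy(). */
typedef struct
{
    const char *dt_str;  /* datatype string */
    const char *sc_str;  /* storage combo string */
} toy_params_t;

static const char *DEFAULT_DT = "d";
static const char *DEFAULT_SC = "ccc";

/* Parses "-d <datatype>" and "-s <storage>" pairs. The argv strings
   persist for the life of the program, so storing the pointers is safe. */
static void toy_parse(int argc, char **argv, toy_params_t *p)
{
    assert(p != NULL);
    p->dt_str = DEFAULT_DT;
    p->sc_str = DEFAULT_SC;
    for (int i = 1; i + 1 < argc; i += 2)
    {
        if      (strcmp(argv[i], "-d") == 0) p->dt_str = argv[i + 1];
        else if (strcmp(argv[i], "-s") == 0) p->sc_str = argv[i + 1];
    }
}
```

    No buffer sizes need to be chosen and no bytes are copied; options that
    are not given simply keep pointing at the static defaults.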

commit 036a4f9d822df25a76a653e70be76fb02284d3d3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 22 18:36:50 2022 -0500

    Refactored some rntm_t management code. (#666)
    
    Details:
    - Separated the "sanitizing" code from the auto-factorization code
      in bli_rntm_set_ways_from_rntm() and _rntm_set_ways_from_rntm_sup().
      The sanitizing code now resides in bli_rntm_sanitize() while the
      factorization code resides in bli_rntm_factorize() and
      bli_rntm_factorize_sup(). (There are two different functions because
      the conventional and sup factorization codes are currently somewhat
      different.) Also note that the factorization code now relies on the
      .auto_factor field to have already been set, either during
      rntm_t initialization or when the rntm_t was previously updated and
      sanitized. So rather than locally determining whether to
      auto-factorize, those functions just read the .auto_factor field and
      proceed accordingly.
    - Refactored and removed most code from bli_thread_init_rntm_from_env().
      This function now reads the environment variables needed to set nt,
      jc, pc, ic, jr, and ir; sets them into the global rntm_t; and then
      calls bli_rntm_sanitize() in order to make sure that the contents are
      in a "good" state. Thanks to Devin Matthews for suggesting this
      refactoring.
    - Redefined bli_rntm_set_num_threads() and bli_rntm_set_ways() such that
      if multithreading is disabled at compile time (that is, if the cpp
      macro BLIS_ENABLE_MULTITHREADING is undefined), they ignore the
      caller's request and instead clear the nt and ways fields.
    - Redefined bli_thread_set_num_threads() and bli_thread_set_ways() such
      that if multithreading is disabled at compile time (that is, if the
      cpp macro BLIS_ENABLE_MULTITHREADING is undefined), they ignore the
      caller's request and do nothing.
    - Redefined bli_rntm_set_num_threads() and bli_rntm_set_ways() as true
      functions rather than static inline functions.
    - In bli_rntm.c, statically initialize the global_rntm global variable
      via the BLIS_RNTM_INITIALIZER macro.
    - In bli_rntm.h, defined bli_rntm_clear_auto_factor(), which sets the
      .auto_factor field of the rntm_t to FALSE.
    - Reorganized order of some inline function definitions in bli_rntm.h.
    - Changed the default value given to the .auto_factor field by the
      BLIS_RNTM_INITIALIZER macro from TRUE to FALSE.
    - Call bli_rntm_clear_auto_factor() instead of
      bli_rntm_set_auto_factor_only() in bli_rntm_init().
    - Comment/whitespace updates.

commit a1a5a9b4cbef9208da494c45a2f933a8e82559ac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Sep 21 18:31:01 2022 -0500

    Implemented support for fat multithreading. (#665)
    
    Details:
    - Allow the user to configure BLIS in such a way that multiple threading
      implementations get compiled into the library, with one of those
      implementations chosen at runtime. For now, there are only three
      implementations available: OpenMP, pthreads, and single. (Here,
      'single' merely refers to single-threaded mode.) The configure script
      now allows the user to give the -t option with a comma-separated list
      of values, such as '-t openmp,pthreads'. The first value in the list
      will always be the default at library initialization time, and
      'single' is always silently appended to the end of the list. The user
      can specify which implementation should execute in one of three ways:
      by setting the BLIS_THREAD_IMPL environment variable prior to launch;
      by calling the bli_thread_set_thread_impl() global runtime API; or by
      encoding their choice into a rntm_t that is passed into one of the
      expert interfaces. Any of these three choices overrides the
      initialization-time default (i.e., the first value listed to the -t
      configure option). Requesting an implementation that was not compiled
      into the library will result in an error message followed by
      bli_abort().
    - Relocated the 'auto' logic for the -t option from the top-level
      Makefile to the configure script. (Currently, this logic is pretty
      dumb, choosing 'openmp' for gcc and icc, and 'pthreads' for clang.)
    - Defined a new 'timpl_t' enum in bli_type_defs.h, with three valid
      values: BLIS_SINGLE, BLIS_OPENMP, BLIS_POSIX.
    - Reorganized the thrcomm_t struct into a single definition with two
      preprocessor blocks, one each for additional fields needed by OpenMP
      and pthreads.
    - Added timpl_t argument to bli_thrcomm_bcast(), bli_thrcomm_barrier(),
      bli_thrcomm_init(), and bli_thrcomm_cleanup(), which these functions
      need since they are now wrappers that choose the implementation-
      specific function corresponding to the currently enabled threading
      implementation.
    - Added rntm_t* to bli_thread_broadcast(), bli_thread_barrier() so that
      those functions can pass the timpl_t value into bli_thrcomm_bcast()
      and bli_thrcomm_barrier(), respectively.
    - Defined bli_env_get_str() in bli_env.c to allow the querying of
      BLIS_THREAD_IMPL (which, unlike BLIS_NUM_THREADS and friends, is
      expected to be a string).
    - Defined bli_thread_get_thread_impl(), bli_thread_set_thread_impl() to
      get and set the current threading implementation at runtime.
    - Defined bli_rntm_thread_impl() and bli_rntm_set_thread_impl() to query
      and set the threading implementation within a rntm_t. Also choose
      BLIS_SINGLE as the default value when initializing rntm_t structs.
    - Added bli_info_get_*() functions to query whether OpenMP or pthreads
      would be chosen as the default at init-time. Note that this only
      tests whether OpenMP or pthreads is the first implementation in the
      list passed to the threading configure option (-t) and is *not* the
      same as querying which implementation is currently selected, since
      that can be influenced by BLIS_THREAD_IMPL and/or
      bli_thread_set_thread_impl().
    - Changed l3int_t to l3int_ft.
    - Updated docs/Multithreading.md to document the new behavior.
    - Updated sandbox/gemmlike and addon/gemmd to work with the new fat
      threading feature. This included a few bugfixes to bring the codes up
      to date, as necessary.
    - Comment, whitespace updates.
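    
    Illustrative sketch (not the BLIS source): one way the runtime
    selection described above could be structured, using a table of
    function pointers indexed by an implementation id. All hyp_* names are
    hypothetical, and the accepted strings are assumed to mirror the
    values given to the -t configure option.
    
    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>
    
    /* Hypothetical stand-in for the timpl_t enum. */
    typedef enum
    {
        HYP_SINGLE = 0,
        HYP_OPENMP,
        HYP_POSIX,
        HYP_NUM_IMPLS
    } hyp_timpl_t;
    
    /* One barrier routine per implementation. */
    typedef int ( *hyp_barrier_ft )( void );
    
    static int hyp_barrier_single( void ) { return HYP_SINGLE; }
    static int hyp_barrier_posix ( void ) { return HYP_POSIX;  }
    
    /* NULL marks an implementation that was not compiled in. Here we
       pretend the library was configured with '-t pthreads'. */
    static const hyp_barrier_ft hyp_barriers[ HYP_NUM_IMPLS ] =
    {
        [ HYP_SINGLE ] = hyp_barrier_single,
        [ HYP_OPENMP ] = NULL,
        [ HYP_POSIX  ] = hyp_barrier_posix,
    };
    
    /* Map a string (e.g. from an environment variable) to an id.
       Returns -1 if the string is not recognized. */
    static int hyp_timpl_from_str( const char* s )
    {
        if ( s == NULL ) return -1;
        if ( strcmp( s, "single"   ) == 0 ) return HYP_SINGLE;
        if ( strcmp( s, "openmp"   ) == 0 ) return HYP_OPENMP;
        if ( strcmp( s, "pthreads" ) == 0 ) return HYP_POSIX;
        return -1;
    }
    
    /* Wrapper that picks the implementation-specific function at runtime
       and aborts if that implementation is unavailable. */
    static int hyp_barrier( hyp_timpl_t ti )
    {
        assert( ( int )ti >= 0 && ( int )ti < HYP_NUM_IMPLS );
        if ( hyp_barriers[ ti ] == NULL ) abort();
        return hyp_barriers[ ti ]();
    }
    ```
    
    In this sketch the caller passes the id explicitly, which parallels
    the timpl_t argument that was added to the bli_thrcomm_*() wrappers.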

commit 89df7b8fa3a3e47ab2fc10ac4d65d0b9fde16942
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sun Sep 18 18:46:57 2022 -0500

    De-templatized _sup_var1n2m.c; unified _sup_packm_a/b(). (#659)
    
    Details:
    - Re-expressed the two variants in frame/3/bli_l3_sup_var1n2m.c as a
      single function each that performs char* pointer arithmetic rather
      than four datatype-specific functions. Did the same for the functions
      in bli_l3_sup_packm_a.c and _sup_packm_b.c, and then unified the two
      into a single set of functions for packing either A or B, which now
      resides in bli_l3_sup_packm.c.
    - Pre-grow the cntl_t tree in both bli_l3_sup_var1n2m.c variants rather
      than grow them incrementally.
    - Relocated empty-matrix and scale-by-beta early return handling from
      bli_gemm_front() and bli_gemmt_front() to their _ex() counterparts.
    - Comment, whitespace updates.

commit fb91337eff1ee2098f315a83888f6667b3a56f86
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 15 19:08:10 2022 -0500

    Fixed a harmless pc_nt bug in 05a811e.
    
    Details:
    - Added missing curly braces around some statements in bli_rntm.c, one
      of which needed them in order for the relevant code to be executed in
      the intended way. The consequence of 05a811e omitting those braces was
      that a statement (pc_nt = 1;) was executed more often than it needed
      to be.
    - Also adjusted the analogous code in bli_thread.c to match that of
      bli_rntm.c.

commit e86076bf4461d1a78186fb21ba8320cfb430f62c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 15 14:22:59 2022 -0500

    Test the 'gemmlike' sandbox via AppVeyor. (#664)
    
    Details:
    - Added a fifth test to our .appveyor.yml that enables the 'gemmlike'
      sandbox with OpenMP enabled (via clang, the 'auto' configuration
      target, and building to a static library). Thanks to Jeff Diamond
      for pointing out that this test would be useful.

commit 63177dca48cb7d066576d884da4a7a599ececebf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 15 11:21:26 2022 -0500

    Fixed gemmlike sandbox bug introduced in 7c07b47.
    
    Details:
    - Fixed a bug in the 'gemmlike' sandbox that was introduced in 7c07b47.
      This bug was the result of the fact that the gemmlike implementation
      uses bli_thrinfo_sup_grow() to grow its thrinfo_t tree, but the
      aforementioned commit added an optimization that kicks in when the
      rntm_t .pack_a and .pack_b fields are both FALSE. Those fields were
      originally added only for sup execution; for the large code path, they
      are intended to be ignored. But the default initial state of a rntm_t
      has those fields set to FALSE, which was inadvertently activating the
      optimization (which targeted single-threaded cases only) and would
      cause multithreaded use cases of 'gemmlike' to segfault. The fix took
      the form of setting the .pack_a and .pack_b fields to TRUE in
      bls_gemm_ex().
    - Added minimal 'const' and 'const'-casting to 'gemmlike' so that gcc
      stays quiet.

commit 05a811e898b371a76581abd4afa416980cce7db9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 13 19:24:05 2022 -0500

    Initialize rntm_t nt/ways fields with 1 (not -1). (#663)
    
    Details:
    - Changed the way that rntm_t structs are initialized, mainly so that
      the global rntm_t that is set via environment variables at runtime
      may be queried by the application prior to any computation taking
      place. (Strictly speaking, the application may already query these
      fields, but they do not always contain valid values and often contain
      -1 when they are unset.) These changes also served to clarify how
      these parameters are treated, and homogenized the implementations of
      bli_rntm_set_ways_from_rntm(), bli_rntm_set_ways_from_rntm_sup(), and
      bli_thread_init_rntm_from_env(). Special thanks to Jeff Diamond,
      Leick Robinson, and Devin Matthews for pointing out that the previous
      behavior was needlessly confusing and could be improved.
    - The aforementioned modifications also included subtle changes as to
      what counts as "setting" a loop's ways of parallelism for the purposes
      of deciding whether to use the ways or the total number of threads.
      Previously, setting any loop's ways, even to 1, counted in favor of
      using the ways. Now, only values greater than 1 will count as
      "setting", and all other values will silently be mapped to 1, with
      those parameters treated as if they were untouched all along.
    - Updated bli_rntm.h and bli_thread.c so that any attempt to set the
      PC_NT variable (or pc_nt field of a rntm_t) will either ignore the
      request or reassert the value as 1.
    - Updated bli_rntm_set_ways() so that rather than clear the
      num_threads field, it is set to the product of all of the per-loop
      ways of parallelism.
    - Removed code from test_libblis.c that handled the possibility of unset
      environment variables when printing out their values.
    - Removed bli_rntm_equals() inline function from bli_rntm.h, which has
      long been disabled.
    - Updates to docs/Multithreading.md related to the aforementioned
      changes.
    - Comment updates.
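    
    Illustrative sketch (not the BLIS source): the rules described above
    for the per-loop ways, expressed as a minimal C example. The
    hyp_rntm_t struct and hyp_* functions are hypothetical; only the
    behavior (values not greater than 1 map to 1, pc is held at 1, and the
    thread count becomes the product of the ways) comes from the notes
    above.
    
    ```c
    #include <assert.h>
    #include <stddef.h>
    
    /* Hypothetical stand-in for the relevant rntm_t fields. */
    typedef struct
    {
        int nt;
        int jc, pc, ic, jr, ir;
    } hyp_rntm_t;
    
    /* Only values greater than 1 count as "set"; all others become 1. */
    static int hyp_clamp_way( int w ) { return w > 1 ? w : 1; }
    
    static void hyp_set_ways( hyp_rntm_t* r,
                              int jc, int pc, int ic, int jr, int ir )
    {
        assert( r != NULL );
    
        ( void )pc; /* any request for pc is ignored in this sketch */
    
        r->jc = hyp_clamp_way( jc );
        r->pc = 1;
        r->ic = hyp_clamp_way( ic );
        r->jr = hyp_clamp_way( jr );
        r->ir = hyp_clamp_way( ir );
    
        /* Rather than clearing nt, set it to the product of the ways. */
        r->nt = r->jc * r->pc * r->ic * r->jr * r->ir;
    }
    ```
    
    With this approach the struct always holds values that an application
    can query directly, with no -1 placeholders.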

commit fd885cf98f4fe1d3bc46468e567776c37c670fcc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 13 11:50:23 2022 -0500

    Use kernel CFLAGS for 'kernels' subdirs in addons. (#658)
    
    Details:
    - Updated Makefile and common.mk so that the targeted configuration's
      kernel CFLAGS are applied to source files that are found in a
      'kernels' subdirectory within an enabled addon. For now, this
      behavior only applies when the 'kernels' directory is at the top
      level of the addon directory structure. For example, if there is an
      addon named 'foobar', the source code must be located in
      addon/foobar/kernels/ in order for it to be compiled with the target
      configuration's kernel CFLAGS. Any other source code within
      addon/foobar/ will be compiled with general-purpose CFLAGS (the same
      ones that were used on all addon code prior to this commit). Thanks
      to AMD (esp. Mithun Mohan) for suggesting this change and catching an
      intermediate bug in the PR.
    - Comment/whitespace updates.

commit cb74202db39dc8cb81fdd06f8a445f8837e27853
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 13 11:46:24 2022 -0500

    Fixed incorrect sizeof(type) in edge case macros. (#662)
    
    Details:
    - In bli_edge_case_macro_defs.h, the GEMM_UKR_SETUP_CT_PRE() and
      GEMMTRSM_UKR_SETUP_CT_PRE() macros previously declared their temporary
      ct microtiles as:
    
        PASTEMAC(ch,ctype)
              _ct[ BLIS_STACK_BUF_MAX_SIZE / sizeof( PASTEMAC(ch,type) ) ] \
                   __attribute__((aligned(alignment))); \
    
      The problem here is that sizeof( PASTEMAC(ch,type) ) evaluates to
      things like sizeof( BLIS_DOUBLE ), not sizeof( double ), and since
      BLIS_DOUBLE is an enum, it is typically an int, which means the
      sizeof() expression is evaluating to the wrong value. This was likely
      a benign bug, though, since BLIS does not support any computational
      datatypes that are smaller than sizeof( int ), which means the ct
      array would be *over*-allocated rather than underallocated. Thanks
      to @moon-chilled for identifying and reporting this bug in #624.
    - CREDITS file update.
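    
    Illustrative sketch (not the BLIS macros): why taking sizeof() of an
    enumeration constant gives the size of an int rather than the size of
    the datatype it names. The hyp_* names and the 2048-byte buffer size
    are assumptions made for this example.
    
    ```c
    #include <assert.h>
    #include <stddef.h>
    
    /* Hypothetical stand-ins for the stack buffer size and num_t enum. */
    #define HYP_STACK_BUF_MAX_SIZE 2048
    
    typedef enum { HYP_FLOAT, HYP_DOUBLE } hyp_num_t;
    
    /* In C, an enumeration constant has type int. */
    static_assert( sizeof( HYP_DOUBLE ) == sizeof( int ),
                   "enumeration constants have type int" );
    
    /* Element count derived from the enum constant (the bug). */
    static size_t hyp_elems_from_enum( void )
    {
        return HYP_STACK_BUF_MAX_SIZE / sizeof( HYP_DOUBLE );
    }
    
    /* Element count derived from the actual C type (the fix). */
    static size_t hyp_elems_from_type( void )
    {
        return HYP_STACK_BUF_MAX_SIZE / sizeof( double );
    }
    ```
    
    On platforms where int is no larger than double, the first count is
    at least as large as the second, which is why the array was
    over-allocated rather than under-allocated.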

commit 6e5431e8494b06bd80efcab3abf0a6456d6c0381
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Sep 10 15:16:58 2022 -0500

    Fix line number issue in flattened blis.h. (#660)
    
    Details:
    - Updated the top-level Makefile so that it invokes flatten-headers.py
      without the -c option, which was requesting that comments be stripped
      (since comment stripping is disabled by default).
    - Updated flatten-headers.py to accept a new option (-l) to enable
      insertion of #line directives into the output file. This new option
      is enabled by default.
    - Also added logic to flatten-headers.py that outputs a warning if both
      comment stripping and line numbers are requested since the comment
      stripping will cause the line numbers to become inaccurate.

commit 4afe0cfdab0e069e027f97920ea604249e34df47
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 8 18:33:20 2022 -0500

    Defined invscalv, invscalm, invscald operations. (#661)
    
    Details:
    - Defined invert-scale (invscal) operation on vectors (level-1v),
      matrices (level-1m), and diagonals (level-1d).
    - Added test modules for invscalv and invscalm to the testsuite.
    - Updated BLISObjectAPI.md and BLISTypedAPI.md API documentation to
      reflect the new operations. Also updated KernelsHowTo.md accordingly.
    - Renamed 'beta' to 'alpha' in scalv and scalm testsuite modules (and
      input.operations files) so that the parameter name matches the
      parameter used in the documentation.

commit a87eae2b11408b556e562f1b04e673c6cd1612bc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 6 18:04:09 2022 -0500

    Added '-q' quiet mode option to testsuite. (#657)
    
    Details:
    - Added support for a '-q' command line option to the testsuite. This
      option suppresses most informational output that would normally
      clutter up the screen. By default, verbose mode (the previous
      status quo) will be operative, and so quiet mode must be requested.

commit dfa54139664a42d29774e140ec9e5597af869a76
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Tue Aug 30 08:07:50 2022 +0800

    Arm64 dgemmsup with extended MR&NR (#655)
    
    Details:
    - Since the number of registers in NEON is large but their lengths are
      short, I'm here extending both MR and NR.
    - The approach is to represent the C microtile in registers optionally
      in columns, so for sizes like 6x7m, the 'crr' kernel is the default
      with 'rrr' supported through an in-register transpose.
    - A few asm kernels are crafted for 'rv' to complete this extended size
      support.
    - For 'rd' I'm still relying heavily on C99 intrinsic kernels with
      branching so the performance might not be optimal. (Sorry for that.)
    - So far, these changes only affect the 'firestorm' subconfig.
    - This commit also contains row-preferential s12x8 and d6x8 gemm
      ukernels. These microkernels are templatized versions of the existing
      s8x12 and d6x8 ukernels defined in bli_gemm_armv8a_asm_d6x8.c.

commit 9e5594ad5fc41df8ef2825a025d7844ac2275c27
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 11 14:36:38 2022 -0500

    Temporarily disabled #line directives from 6826c1c.
    
    Details:
    - Commented out the inclusion of #line preprocessor directives in the
      flattened header output provided by build/flatten-headers.py. This
      output was added recently in 6826c1c, but was later found to have
      thrown off the line numbering referenced by compiler warnings and
      errors (possibly due to license comment blocks, which are stripped
      from source headers as they are inlined into the monolithic header).

commit 775148bcdbb1014b4881a76306f35f5d0fedecbe
Author: jdiamondGitHub <jeff_diamond@fastmail.com>
Date:   Fri Aug 5 12:01:24 2022 -0500

    Updated ARMv8a kernels to fix 2 prefetching issues. (#649)
    
    Details:
    - The ARMv8a dgemm/sgemm microkernels had 2 prefetching issues that
      impacted performance on modern ARM platforms. The most significant
      issue was that only a single prefetch per C tile column was issued.
      When a column of C was not cache aligned, the second cache line would
      not be prefetched at all, forcing the kernel to wait for an entire
      load to update elements of C. This happened with roughly 50% of the
      C prefetches. The fix was to have two prefetches per column, spaced
      64 bytes (1 cache line) apart.
    - A secondary performance issue was that all the C prefetch instructions
      were issued sequentially at the beginning of the kernel call. This
      caused a noticeable performance slowdown. Interleaving the prefetch
      calls every 2-3 instructions in the prologue code solved the issue.

commit bbaf29abd942de47a3a99a80a67d12bab41b27db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 4 17:51:37 2022 -0500

    Very minor variable updates to common.mk.
    
    Details:
    - Fixed a harmless bug that would have allowed C++ headers into the list
      of header suffixes specifically reserved for C99 headers. In practice,
      this would have had no substantive effect on anything since the core
      BLIS framework does not use C++ headers.

commit a48e29d799091a833213efeafaf2d342ebdafde9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 28 10:11:07 2022 -0500

    CREDITS file update.
    
    Details:
    - Thanks to Kihiro Bando for assisting with issue #644.

commit 5b298935de7f20462bfad1893ed34ecd691cec5a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 27 19:14:15 2022 -0500

    Removed buggy cruft from power10 subconfig.
    
    Details:
    - Removed #defines for BLIS_BBN_s and BLIS_BBN_d from
      bli_kernel_defs_power10.h. These were inadvertently set in ae10d949
      because the power10 subconfig was registering bb packm ukernels, but
      only for 6xk (power10 uses s8x16 and d8x8 ukernels) and only because
      the original author (probably) copy-pasted from power9 when getting
      started. That 6xk packm registration was effectively "dead code"
      prior to ae10d949, but was then mistaken as not-dead code during the
      ae10d949 refactor. These improper bb factors may have been causing
      bugs in power10 builds. Thanks to Nicholai Tukanov for helping remind
      me what the power10 subconfig was supposed to look like.
    - Removed extraneous microkernel preference registrations from power10
      subconfig. Preferences for single and double complex gemm were being
      registered despite there being no complex gemm ukernels registered to
      go with them. Similarly, there were trsm preferences registered
      without any trsm ukernels registered (and BLIS doesn't actually use a
      preference for the trsm ukernel anyway). These extraneous
      registrations were almost surely not hurting anything, even if they
      were quite misleading.

commit 56de31b00fa0f1ba866321817cd1e5d83000ff11
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Jul 27 13:54:17 2022 -0500

    Disable modification of KC in the gemmsup kernels. (#648)
    
    This led to a ~50% performance reduction for certain gemm operations (but not others?). See #644 for example.

commit 4dde947e2ec9e139c162801320c94e6a01a39708
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 26 17:29:32 2022 -0500

    Fixed out-of-bounds bug in sup s6x16m haswell kernel.
    
    Details:
    - Fixed another out-of-bounds read access bug in the haswell sup
      assembly kernels. This bug is similar to the one fixed in 17b0caa
      and affects bli_sgemmsup_rv_haswell_asm_6x2m(). Thanks to Madeesh
      Kannan for reporting this bug (and a suitable fix) in #635.
    - CREDITS file update.

commit 6826c1cdfba855513786d9e3d606681316453398
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Jul 25 18:21:05 2022 -0500

    Add `#line` directives to flattened `blis.h`. (#643)
    
    Details:
    - Modified flatten-headers.py so that #line directives are inserted into
      the flattened blis.h file. This facilitates easier debugging when
      something is amiss in the flattened blis.h because the compiler will
      be able to refer to the line number within the original constituent
      header file (which is where the fix would go) rather than the line
      number within the flattened header (which is not as helpful).
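    
    Illustrative sketch (not output of flatten-headers.py): how a #line
    directive changes the file name and line number that the compiler
    associates with the code that follows it. The file name and the hyp_*
    functions are hypothetical.
    
    ```c
    #include <assert.h>
    #include <string.h>
    
    /* Pretend the following lines were inlined from line 40 of a
       constituent header. */
    #line 40 "hyp_constituent.h"
    static int         hyp_line( void ) { return __LINE__; }
    static const char* hyp_file( void ) { return __FILE__; }
    
    /* Diagnostics and __LINE__/__FILE__ now refer to the original header
       rather than to the position within the flattened file. */
    static void hyp_self_check( void )
    {
        assert( hyp_line() == 40 );
        assert( strcmp( hyp_file(), "hyp_constituent.h" ) == 0 );
    }
    ```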

commit af3a41e02534befdae026377592ce437bab83023
Author: Alexander Grund <Flamefire@users.noreply.github.com>
Date:   Thu Jul 21 18:05:48 2022 +0200

    Add autodetection for POWER7, POWER9 & POWER10 (#647)
    
    Read from `/proc/cpuinfo` as done for ARM.
    Fixes #501

commit 17b0caa2b2bff439feb6d2b39cfa16e7591882b0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 14 17:55:34 2022 -0500

    Fixed out-of-bounds read in haswell gemmsup kernels.
    
    Details:
    - Fixed memory access bugs in the bli_sgemmsup_rv_haswell_asm_Mx2()
      kernels, where M = {1,2,3,4,5,6}. The bugs were caused by loading four
      single-precision elements of C, via instructions such as:
    
            vfmadd231ps(mem(rcx, 0*32), xmm3, xmm4)
    
      in situations where only two elements are guaranteed to exist. (These
      bugs may not have manifested in earlier tests due to the leading
      dimension alignment that BLIS employs by default.) The issue was fixed
      by replacing lines like the one above with:
    
            vmovsd(mem(rcx), xmm0)
            vfmadd231ps(xmm0, xmm3, xmm4)
    
      Thus, we use vmovsd to explicitly load only two elements of C into
      registers, and then operate on those values using register addressing.
      Thanks to Daniël de Kok for reporting these bugs in #635, and to
      Bhaskar Nallani for proposing the fix.
    - CREDITS file update.

commit cc260fd7068f0fe449d818435aa11adb14c17fed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 13 16:16:01 2022 -0500

    Allow uniform max problem sizes in test/3/runme.sh.
    
    Details:
    - Tweaked test/3/runme.sh so that the test driver binaries for single-
      threaded (st), single-socket (1s), and dual-socket (2s) execution can
      be built using identical problem size ranges. Previously, this was not
      possible because runme.sh used the maximum problem size, which was
      embedded into the binary filename, to tell the three classes of
      binaries apart from one another. Now, runme.sh uses the binary suffix
      ("st", "1s", or "2s") to tell them apart. This required only a few
      changes to the logic, but it also required a change in format to the
      threading config strings themselves (replacing the max problem size
      with "st", "1s", or "2s"). Thanks to Jeff Diamond for inspiring this
      improvement.
    - Comment updates.

commit 9b1beec60be31c6ea20b85806d61551497b699e4
Author: bartoldeman <bartoldeman@users.noreply.github.com>
Date:   Mon Jul 11 20:15:12 2022 -0400

    Use BLIS_ENABLE_COMPLEX_RETURN_INTEL in blastest files (#636)
    
    Details:
    - Fixed a crash that occurs when either cblat1 or zblat1 is linked
      with a build of BLIS that was compiled with '--complex-return=intel'.
      This fix involved inserting preprocessor macro guards based on
      BLIS_ENABLE_COMPLEX_RETURN_INTEL into blastest/src/cblat1.c and
      blastest/src/zblat1.c to correctly handle situations where BLIS is
      compiled with Intel/f2c-style calling conventions for complex numbers.
    - Updated blastest/src/fortran/run-f2c.sh so that future executions
      will insert the aforementioned cpp macro conditional where
      appropriate.

commit 98d467891b74021ace7f248cb0856bec734e39b6
Author: bartoldeman <bartoldeman@users.noreply.github.com>
Date:   Mon Jul 11 19:40:53 2022 -0400

    Change complex_return='intel' for ifx. (#637)
    
    Details:
    - When checking the version string of the Fortran compiler for the
      purposes of determining a default return convention for complex
      domain values, grep for "IFORT" instead of "ifort" since that string
      is common to both the 'ifx' and 'ifort' binaries provided by Intel:
    
        $ ifx --version
        ifx (IFORT) 2022.1.0 20220316
        Copyright (C) 1985-2022 Intel Corporation. All rights reserved.
    
        $ ifort --version
        ifort (IFORT) 2021.6.0 20220226
        Copyright (C) 1985-2022 Intel Corporation. All rights reserved.

commit ffde54cc5c334aca8eff4d6072ba49496bf3104c
Author: jdiamondGitHub <jeff_diamond@fastmail.com>
Date:   Mon Jul 11 16:47:30 2022 -0500

    Minor changes to .gitignore and LICENSE files. (#642)
    
    Details:
    - Macs create .DS_Store files in every directory visited. Updated
      .gitignore file so these files won't be reported as untracked by
      'git status'.
    - Added Oracle Corporation to the LICENSE file.
    - Updated UT copyright on behalf of SHPC.

commit 7cba7ce3dd1533fcc4ca96ac902bdf218686139a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 8 11:15:18 2022 -0500

    Minor cleanups, comment updates to bli_gks.c.
    
    Details:
    - Removed a redundant registration of 'a64fx' subconfig in
      bli_gks_init().
    - Reordered registration of 'armsve', 'a64fx', and 'firestorm'
      subconfigs. Thanks to Jeff Diamond for his input on this reordering.
    - Comment updates to bli_gks.c and arch_t enum in bli_type_defs.h.

commit 667f201b7871da68622027d02bd6b7da3262f8e8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 7 16:44:21 2022 -0500

    Fixed type bug in bli_cntx_set_ukr_prefs().
    
    Details:
    - Fixed a bug in bli_cntx_set_ukr_prefs() which erroneously typecast the
      num_t value read from va_args() down to a bool before being stored
      within the cntx_t. This bug was introduced on April 6th 2022, in
      ae10d94. This caused the ukernel preferences for double real and
      double complex to go unchanged while the preferences for single real
      and single complex were corrupted by the former datatypes'
      preference values. The bug manifested as degraded performance for
      subconfigurations that registered column-preferential ukernels. The
      reason is that the erroneous preferences trigger unnecessary
      transpositions in the operation, which forces the gemm ukernel to
      compute on matrices that are not stored according to its preference.
      Thanks to Devin Matthews, Jeff Diamond, and Leick Robinson for their
      extensive efforts and assistance in tracking down this issue.
    - Augmented the informational header that is output by the testsuite to
      include ukernel preferences for gemm, gemmtrsm_[lu], and trsm_[lu].
    - CREDITS file update.
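    
    Illustrative sketch (not the BLIS source): how casting a datatype id
    read from a variadic argument list down to bool can redirect a store
    to the wrong slot. The hyp_* names and the enum values are assumptions
    made for this example, so which slots are affected here need not match
    the behavior reported above; only the mechanism is the point.
    
    ```c
    #include <assert.h>
    #include <stdarg.h>
    #include <stdbool.h>
    
    /* Hypothetical datatype ids; the values are assumed. */
    typedef enum
    {
        HYP_FLOAT    = 0,
        HYP_SCOMPLEX = 1,
        HYP_DOUBLE   = 2,
        HYP_DCOMPLEX = 3
    } hyp_num_t;
    
    /* Read n_pairs (datatype id, preference) pairs and record each
       preference. When 'buggy' is true, the id is first narrowed to
       bool, which maps every nonzero id to 1. */
    static void hyp_set_prefs( bool prefs[ 4 ], bool buggy, int n_pairs, ... )
    {
        va_list ap;
        va_start( ap, n_pairs );
    
        for ( int i = 0; i < n_pairs; ++i )
        {
            int dt_raw = va_arg( ap, int );
            int pref   = va_arg( ap, int );
            int dt     = buggy ? ( int )( bool )dt_raw : dt_raw;
    
            assert( 0 <= dt && dt < 4 );
            prefs[ dt ] = ( pref != 0 );
        }
    
        va_end( ap );
    }
    ```
    
    In this sketch, a preference intended for HYP_DOUBLE lands in the
    HYP_SCOMPLEX slot when the narrowing cast is applied, leaving the
    HYP_DOUBLE slot unchanged.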

commit d429b6bfced21a63bf711224ac402f93f0080b52
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Tue Jun 28 15:34:10 2022 -0500

    Support clang targeting MinGW (#639)
    
    * Support clang targeting MinGW
    
    * Fix pthread linking

commit d93df023348144e091f7b3e3053995648f348aa7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 15 14:09:49 2022 -0500

    Removed unused dt arg in bli_gks_query_ind_cntx().
    
    Details:
    - Removed the num_t datatype argument from bli_gks_query_ind_cntx().
      This argument stopped being needed by the function in commit e9da642.
      Its only use in bli_gks_query_ind_cntx() was to be passed through to
      the context initialization function for the chosen induced method,
      but even then, commit log notes from e9da642 indicate that I could not
      recall why the datatype argument was ever needed by the context init
      function to begin with.
    - Updated all invocations of bli_gks_query_ind_cntx() to omit the dt
      argument. Most of these invocations resided in various standalone test
      drivers (and the testsuite).

commit 56772892450cc92b3fbd6a9d0460153a43fc47ab
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 1 10:49:33 2022 -0500

    Added SMU citation to README.md intro.
    
    Details:
    - Added a citation to SMU and the Matthews Research Group to the general
      attribution of maintainership and development in the Introduction of
      the README.md file. Thanks to Robert van de Geijn and Devin Matthews
      for suggesting this change.

commit 4603324eb090dfceaad3693a70b2d60544036aa8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 19 14:07:03 2022 -0500

    Init/finalize via bli_pthread_switch_t API (#634).
    
    Details:
    - Defined and implemented a new pthread-like abstract datatype and API
      in bli_pthread.c. The new type, bli_pthread_switch_t, is similar to
      bli_pthread_once_t in some respects. The idea is that like a switch in
      your home that controls a light or ceiling fan, it can either be on or
      off. The switch starts in the off state. Moving from one state to the
      other (on to off; off to on) causes some action (i.e., a startup or
      shutdown function) to be executed. Trying to move from one state to
      the same state (on to on; off to off) is safe in that it results in
      no action. Unlike bli_pthread_once(), the API for bli_pthread_switch_t
      contains both _on() and _off() interfaces. Also, unlike the _once()
      function, the _on() and _off() functions return error codes so that
      the 'int' error code returned from the startup or shutdown functions
      may be passed back to the caller. Thanks to Devin Matthews for his
      input and feedback on this feature.
    - Replaced the previous implementation of bli_init_once() and
      bli_finalize_once() -- both of which used bli_pthread_once() -- with
      ones that rely upon bli_pthread_switch_on() and _switch_off(),
      respectively. This also required updating the return types of
      _init_apis() and _finalize_apis() to match the function pointer type
      required by bli_pthread_switch_on()/_switch_off().
    - Comment updates.
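    
    Illustrative sketch (not the code in bli_pthread.c): a minimal on/off
    switch with the semantics described above, built on a pthread mutex.
    The hyp_* names are hypothetical, and the choice to change state only
    when the callback returns 0 is an assumption of this sketch.
    
    ```c
    #include <assert.h>
    #include <pthread.h>
    #include <stdbool.h>
    #include <stddef.h>
    
    /* Hypothetical switch type; it starts in the off state. */
    typedef struct
    {
        pthread_mutex_t mutex;
        bool            is_on;
    } hyp_switch_t;
    
    #define HYP_SWITCH_INIT { PTHREAD_MUTEX_INITIALIZER, false }
    
    /* off -> on runs init(); on -> on does nothing and returns 0. */
    static int hyp_switch_on( hyp_switch_t* sw, int ( *init )( void ) )
    {
        int r_val = 0;
        assert( sw != NULL && init != NULL );
    
        pthread_mutex_lock( &sw->mutex );
        if ( !sw->is_on )
        {
            r_val = init();
            if ( r_val == 0 ) sw->is_on = true;
        }
        pthread_mutex_unlock( &sw->mutex );
    
        return r_val; /* the callback's error code reaches the caller */
    }
    
    /* on -> off runs fini(); off -> off does nothing and returns 0. */
    static int hyp_switch_off( hyp_switch_t* sw, int ( *fini )( void ) )
    {
        int r_val = 0;
        assert( sw != NULL && fini != NULL );
    
        pthread_mutex_lock( &sw->mutex );
        if ( sw->is_on )
        {
            r_val = fini();
            if ( r_val == 0 ) sw->is_on = false;
        }
        pthread_mutex_unlock( &sw->mutex );
    
        return r_val;
    }
    
    /* Example startup/shutdown callbacks that count active startups. */
    static int hyp_n_startups = 0;
    static int hyp_startup ( void ) { ++hyp_n_startups; return 0; }
    static int hyp_shutdown( void ) { --hyp_n_startups; return 0; }
    ```
    
    Calling hyp_switch_on() twice in a row runs hyp_startup() only once,
    and a later hyp_switch_off() makes a subsequent hyp_switch_on() run it
    again, which is the property that distinguishes a switch from a
    once-only primitive.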

commit 64a9b061f6032e2b59613aecdbe7bb52161605c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 10 14:54:22 2022 -0500

    Fixed misspelling of 'xpbys' in gemm macrokernel.
    
    Details:
    - Fixed a functionally harmless typo in bli_gemm_ker_var2.c where a few
      instances of the substring "xpbys" were misspelled as "xbpys". The
      misspellings were harmless because they were consistent, and because
      they referenced only local symbols.

commit 1c733402a95ab08b20f3332c2397fd52a2627cf6
Author: Jed Brown <jed@jedbrown.org>
Date:   Thu Apr 28 11:58:44 2022 -0600

    Fix version check for znver3, which needs gcc >= 10.3 (#628)
    
    Apple's clang-12 lacks znver3 support, unlike upstream clang-12.

commit 6431c9e13b86e4442b6aacba18a0ace12288c955
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 14 13:01:24 2022 -0500

    Added missing 'const' to zen bli_gemm_small.c.
    
    Details:
    - Added missing 'const' qualifiers to signatures of functions defined in
      kernels/zen/3/bli_gemm_small.c. This fixes compile-time errors when
      targeting 'zen3' subconfig (which apparently is enabling AMD's
      gemm_small code path by default). Thanks to Devin Matthews for
      reporting this error.

commit 9fea633748ed27ef3853bba7cd955690c61092b4
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Apr 13 15:59:06 2022 -0500

    Partial addition of 'const' to all interfaces above the (micro)kernels. (#625)
    
    Details:
    - Added 'const' qualifier to applicable function arguments wherever the
      pointed-to object is not internally modified. This change affects
      all interfaces that reside above the level of the (micro)kernels.
    - Typecast certain function return values to discard 'const' qualifier.
    - Removed 'restrict' from various arguments, including cntx_t*,
      auxinfo_t*, rntm_t*, thrinfo_t*, mem_t*, and others.
    - Removed parts of some APIs, such as bli_cntx_*(), due to limited use.
    - Merged some variable declarations with their corresponding
      initialization statements.
    - Whitespace changes.

commit ae10d9495486f589ed0320f0151b2d195574f1cf
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Apr 6 20:31:11 2022 -0500

    Simplify and rewrite reference packm kernels. (#610)
    
    Details:
    - Reorganized the way kernels are stored within the cntx_t structure so
      that rather than having a function pointer for every supported size of
      unrolled packm kernel (2xk, 3xk, 4xk, etc.), we store only two packm
      kernels per datatype: one to pack MRxk micropanels and one to pack
      NRxk micropanels.
      - NOTE: The "bb" (broadcast B) reference kernels have been merged into
        the "standard" kernels (packm [including 1er and unpackm], gemm,
        trsm, gemmtrsm). The broadcast replication factor is controlled by
        BLIS_BB[MN]_[sdcz] etc. Power9/10 needs testing since only a
        replication factor of 1 has been tested. armsve also needs testing
        since the MR value isn't available as a macro.
    - Simplified the bli_cntx_*() APIs to conform to the new unified kernel
      array within the cntx_t. Updated existing bli_cntx_init_<subconfig>()
      function definitions for all subconfigurations.
    - Consolidated all kernel id types (e.g. l1vkr_t, l1mkr_t, l3ukr_t,
      etc.) into one kernel id type: ukr_t.
    - Various edits, updates, and rewrites of reference kernels pursuant to
      the aforementioned changes.
    - Define compile-time macro constants (BLIS_MR_[sdcz], BLIS_NR_[sdcz],
      and friends) in bli_kernel_macro_defs.h, but only when the macro
      BLIS_IN_REF_KERNEL is defined by the build system.
    - Loose ends:
      - Still need to update documentation, including:
        - docs/ConfigurationHowTo.md
        - docs/KernelsHowTo.md
        to reflect changes made in this commit.
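
    A rough sketch of why one packm kernel per panel type can replace the
    per-size unrolled kernels (2xk, 3xk, 4xk, etc.): the register blocksize
    becomes a runtime argument. The function name toy_packm_mrxk(), its
    argument list, and the zero-padding of edge rows are assumptions made
    for illustration, not the reference kernel's actual interface.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Pack an m x k block of A (row stride rs_a, column stride cs_a, both
       in elements) into a column-stored micropanel p whose columns each
       hold mr elements. Rows m..mr-1 are filled with zeros. */
    void toy_packm_mrxk( size_t mr, size_t m, size_t k,
                         const double* a, size_t rs_a, size_t cs_a,
                         double* p )
    {
        assert( m <= mr );
        for ( size_t j = 0; j < k; ++j )
        {
            for ( size_t i = 0; i < m; ++i )
                p[ j * mr + i ] = a[ i * rs_a + j * cs_a ];

            /* Zero-pad the edge so the microkernel can always read a
               full mr-element column. */
            for ( size_t i = m; i < mr; ++i )
                p[ j * mr + i ] = 0.0;
        }
    }
    ```

    The same function serves as the "NRxk" kernel when it is called with
    the NR blocksize in place of mr.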

commit b3e674db3c05ca586b159a71deb1b61d701ae5c9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 4 17:31:02 2022 -0500

    README.md update to link to releases page.

commit 69fa915464c52f09a5971a60f521900d31a34e69
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 1 08:47:46 2022 -0500

    Fixed broken "tagged releases" link in README.md.

commit 88cab8383ca90ddbb4cf13e69b7d44a1663a4425
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 1 08:12:06 2022 -0500

    CHANGELOG update (0.9.0)

commit 14c86f66b20901b60ee276da355c1b62642c18d2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 1 08:12:06 2022 -0500

    Version file update (0.9.0)

commit 99bb9002f1aff598d347eae2821a3f7bdd1f48e8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 1 08:10:59 2022 -0500

    ReleaseNotes.md update in advance of next version.

commit bee7678b2558a691ac850819dbe33fefe4fdbee3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 31 14:09:39 2022 -0500

    CREDITS file update.

commit cf06364327bd2d21d606392371ff3c5962bee5ba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 29 16:18:25 2022 -0500

    Fixed typo in BLAS gemm3m call to _check().
    
    Details:
    - Fixed an unresolved symbol issue leftover from #590 whereby ?gemm3m_()
      as defined in bla_gemm3m.c was referencing bla_gemm3m_check(), which
      does not exist. It should have simply called the _check() function for
      gemm.

commit 1ec020b33ece1681c0041e2549eed2bd4c6cf356
Author: Dipal M Zambare <71366780+dzambare@users.noreply.github.com>
Date:   Wed Mar 30 02:45:36 2022 +0530

    AMD kernel updates; frame-specific AMD updates. (#597)
    
    Details:
    - Allow building BLIS with certain framework files (each with the '_amd'
      suffix) that have been customized by AMD for Zen-based hardware. These
      customized files were derived from portable versions of the same files
      (i.e., those without the '_amd' suffix). Whether the portable or AMD-
      specific files are compiled is now controlled by a new configure
      option, --[en|dis]able-amd-frame-tweaks. This option is disabled by
      default in vanilla BLIS, though AMD may choose to enable it by default
      in their fork. For now, the added AMD-specific files are:
      - bli_gemv_unf_var2_amd.c
      - bla_copy_amd.c
      - bla_gemv_amd.c
      These files reside in 'amd' subdirectories found within the directory
      housing their generic counterparts.
    - Register optimized real-domain copyv, setv, and swapv kernels in
      bli_cntx_init_zen.c.
    - Various minor updates to level-1v kernels in 'zen' kernel set.
    - Added caxpyf kernel as well as saxpyf and multiple daxpyf kernels to
      the 'zen' kernel set.
    - If the problem passed to ?gemm_() in bla_gemm.c has a unit m or n dim,
      call gemv instead and return early.
    - Combined variable declarations with their initialization in various
      level-2 and level-3 BLAS compatibility files, and also inserted
      'const' qualifier in those same declaration statements.
    - Moved frame/compat/bla_gemmt.c and .h to frame/compat/extra/ .
    - Added copyv and swapv test drivers to 'test' directory.
    - Whitespace, comment changes.
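
    The early return from gemm to gemv for a unit dimension, mentioned in
    the list above, can be sketched as follows. This toy version handles
    only the n == 1 case for column-major operands without transposition;
    toy_gemv() and toy_gemm() are hypothetical names and do not mirror the
    code in bla_gemm.c.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* y := alpha * A * x + beta * y, where A is m x k and column-major. */
    void toy_gemv( size_t m, size_t k, double alpha,
                   const double* a, size_t lda,
                   const double* x, double beta, double* y )
    {
        for ( size_t i = 0; i < m; ++i )
        {
            double dot = 0.0;
            for ( size_t p = 0; p < k; ++p ) dot += a[ i + p * lda ] * x[ p ];
            y[ i ] = alpha * dot + beta * y[ i ];
        }
    }

    /* C := alpha * A * B + beta * C (column-major, no transposition).
       Returns 1 if the gemv shortcut was taken and 0 otherwise. */
    int toy_gemm( size_t m, size_t n, size_t k, double alpha,
                  const double* a, size_t lda,
                  const double* b, size_t ldb,
                  double beta, double* c, size_t ldc )
    {
        assert( lda >= m && ldb >= k && ldc >= m );

        /* With a single column, B and C are vectors: use gemv and
           return early. */
        if ( n == 1 )
        {
            toy_gemv( m, k, alpha, a, lda, b, beta, c );
            return 1;
        }

        /* Naive fallback for the general case. */
        for ( size_t j = 0; j < n; ++j )
            for ( size_t i = 0; i < m; ++i )
            {
                double dot = 0.0;
                for ( size_t p = 0; p < k; ++p )
                    dot += a[ i + p * lda ] * b[ p + j * ldb ];
                c[ i + j * ldc ] = alpha * dot + beta * c[ i + j * ldc ];
            }
        return 0;
    }
    ```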

commit 0db2bd5341c5c3ed5f1cc2bffa90952735efa45f
Author: Bhaskar Nallani <Nallani.Bhaskar@amd.com>
Date:   Fri Mar 25 05:11:55 2022 +0530

    Added BLAS/CBLAS APIs for gemm3m. (#590)
    
    Details:
    - Created ?gemm3m_() and cblas_?gemm3m() APIs that (for now) simply
      invoke the 1m implementation unconditionally. (Note that these APIs
      bypass sup handling.)
    - Added BLAS prototypes for gemm3m in frame/compat/bla_gemm3m.h.
    - Added CBLAS prototypes for gemm3m in frame/compat/cblas/src/cblas.h.
    - Relocated:
        frame/compat/cblas/src/cblas_?gemmt.c
      files into
        frame/compat/cblas/src/extra/
    - Relocated frame/compat/bla_gemmt.? into frame/compat/extra/ .
    - Minor reorganization of prototypes and cpp macro directives in
      bli_blas.h, cblas.h, and cblas_f77.h.
    - Trivial whitespace change to cblas_zgemm.c.

commit d6810000e961fe807dc5a7db81180a8355f3eac0
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Mar 14 10:29:54 2022 -0500

    Update Multithreading.md
    
    Add notes about `BLIS_IR_NT` (should typically be 1) and `BLIS_JR_NT` (should typically be small, e.g. <= 4). [ci skip]

commit f1dbb0e514f53a3240d3a6cbdc3306b01a2206f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 11 13:38:28 2022 -0600

    Trivial whitespace change; commit log addendum.
    
    Details:
    - A co-attribution to Mithun Mohan was inadvertently omitted from the
      commit log for the headline change in the previous commit, 7c07b47.

commit 7c07b477e432adbbce5812ed9341ba3092b03976
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 11 13:28:50 2022 -0600

    Avoid gemmsup barriers when not packing A or B. (#622)
    
    Details:
    - Implemented a multithreaded optimization for the special (and common)
      case of employing the gemmsup code path when the user requests
      (implicitly or explicitly) that neither A nor B be packed during
      computation. This optimization takes the form of a greatly reduced
      code branch in bli_thrinfo_sup_create_for_cntl(), which avoids a
      broadcast and two barriers, and results in higher performance when
      obtaining two-way or higher parallelism within BLIS. Thanks to
      Bhaskar Nallani of AMD for proposing this change via issue #605.
    - Added an early return branch to bli_thrinfo_create_for_cntl() that
      detects and quickly handles cases where no parallelism is being
      obtained within BLIS (i.e., single-threaded execution). Note that
      this special case handling was/is already present in
      bli_thrinfo_sup_create_for_cntl().
    - CREDITS file update.

commit cad10410b2305bc0e328c5f2517ab02593b53428
Author: Ivan Korostelev <ivan23kor@gmail.com>
Date:   Thu Mar 10 09:58:14 2022 -0600

    POWER10: edge cases in microkernel (#620)
    
    Use new API for POWER10 gemm microkernel

commit 71851a0549276b17db18a0a0c8ab4f54493bf033
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 8 17:38:09 2022 -0600

    Fixed level-3 performance bug in haswell ukernels.
    
    Details:
    - Fixed a performance regression affecting nearly all level-3 operations
      that use the 'haswell' sgemm and dgemm microkernels. This regression
      was introduced in 54fa28b, caused by an ill-formed conditional
      expression in the assembly code that controls whether cache lines of C
      should be prefetched as rows or as columns. Essentially, the two
      branches were reversed, causing incomplete prefetching to occur for
      both row- and column-stored instances of matrix C. Thanks to Devin
      Matthews for his help finding and fixing this bug.
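
    The row-versus-column choice described above can be illustrated in C,
    although the real fix lives in the haswell assembly. The sketch below
    assumes a GCC/Clang-style __builtin_prefetch and uses a hypothetical
    function name; testing cs_c == 1 to detect row storage is likewise an
    assumption of the sketch.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Issue one prefetch per row or per column of the mr x nr microtile
       of C, depending on how C is stored. Returns the number of
       prefetches issued. */
    size_t toy_prefetch_c( const double* c, size_t rs_c, size_t cs_c,
                           size_t mr, size_t nr )
    {
        size_t count = 0;
        assert( c != NULL );
        if ( cs_c == 1 )
        {
            /* Row-stored C: each row of the microtile is contiguous. */
            for ( size_t i = 0; i < mr; ++i, ++count )
                __builtin_prefetch( c + i * rs_c, 1, 3 );
        }
        else
        {
            /* Column-stored (or general-stride) C: walk the columns. */
            for ( size_t j = 0; j < nr; ++j, ++count )
                __builtin_prefetch( c + j * cs_c, 1, 3 );
        }
        return count;
    }
    ```

    Swapping the two branches would still compile and run, but each
    prefetch would then touch the wrong addresses for the given storage,
    which is how this kind of bug shows up only as lost performance.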

commit 84732bf95634ac606c5f2661d9474318e366c386
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 28 12:19:31 2022 -0600

    Revamp how tools are handled/checked by configure.
    
    Details:
    - Consolidate handling of tools that are specifiable via CC, CXX, FC,
      PYTHON, AR, and RANLIB into one bash function, select_tool_w_env().
      - If the user specifies a tool via an environment variable (e.g.
        CC=gcc) and that tool does not seem valid, print an error message
        and abort configure, unless the tool is optional (e.g. CXX or FC),
        in which case a warning message is printed instead.
      - The definition of "seems valid" above amounts to:
        - responding to at least one of a basic set of command line options
          (e.g. --version, -V, -h) if the os_name is Linux (since GNU tools
          tend to respond to flags such as --version) or if the tool in
          question is CC, CXX, FC, or PYTHON (which tend to respond to the
          expected flags regardless of OS)
        - the binary merely existing for AR and RANLIB on Darwin/OSX/BSD.
          (These OSes tend to have non-GNU versions of ar and ranlib, which
          typically do not respond to --version and friends.)
    - This PR addresses #584. Thanks to Devin Matthews for suggesting some
      of the changes in this commit.

commit d5146582b1f1bcdccefe23925d3b114d40cd7e31
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Wed Feb 23 03:35:46 2022 +0900

    ArmSVE Ensure Non-zero Block Size (#615)
    
    Fixes #613. There are several macros/environment variables which need to be tuned to get good cache block sizes. It would be nice to have a way of getting values automatically.

commit 4d8352309784403ed6719528968531ffb4483947
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Wed Feb 23 01:03:47 2022 +0900

    Add armsve to arm64 Metaconfig (#614)
    
    Availability of the `armsve` subconfig is controlled by the compiler version (gcc/clang). Tested for SVE and non-SVE. Fixes #612.

commit c9700f369aa84fc00f36c4b817ffb7dab72b865d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 15 15:36:52 2022 -0600

    Renamed SIMD-related macro constants for clarity.
    
    Details:
    - Renamed the following macros defined in bli_kernel_macro_defs.h:
    
        BLIS_SIMD_NUM_REGISTERS -> BLIS_SIMD_MAX_NUM_REGISTERS
        BLIS_SIMD_SIZE          -> BLIS_SIMD_MAX_SIZE
    
      Also updated all instances of these macros elsewhere, including
      subconfigurations, source code, and documentation. Thanks to Devin
      Matthews for suggesting this change.

commit ee9ff988c49f16696679d4c6cd3dcfcac7295be7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 15 15:01:51 2022 -0600

    Move edge cases to gemmtrsm ukrs; doc updates.
    
    Details:
    - Moved edge-case handling into the gemmtrsm microkernel. This required
      changing the microkernel API to take m and n dimension parameters as
      well as updating all existing gemmtrsm microkernel function pointer
      types, function signatures, and related definitions to take m and n
      dimensions. Also updated all existing gemmtrsm kernels in the
      'kernels' directory (which for now is limited to haswell and penryn
      kernel sets, plus native and 1m-based reference kernels in
      'ref_kernels') to take m and n dimensions, and implemented edge-case
      handling within those microkernels via a collection of new C
      preprocessor macros defined within bli_edge_case_macro_defs.h. Note
      that the edge-case handling for gemm-like operations had already
      been relocated into the gemm microkernel in 54fa28b.
    - Added descriptive comments to GEMM_UKR_SETUP_CT() and related macros in
      bli_edge_case_macro_defs.h to allow for easier reading.
    - Updated docs/KernelsHowTo.md to reflect above changes. Also cleaned up
      the bullet under "Implementation Notes for gemm" that covers alignment
      issues. (Thanks to Ivan Korostelev for pointing out the confusing and
      outdated language in issue #591.)
    - Other minor tweaks to KernelsHowTo.md.

commit 25061593460767221e1066f9d720fa6676bbed8f
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sun Feb 13 20:11:55 2022 -0600

    Don't use `-Wl,-flat-namespace`.
    
    Flat namespaces can cause problems due to conflicting system libraries,
    etc., so just mark `xerbla_` as a weak symbol on macOS instead.
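
    A small sketch of the weak-symbol approach, assuming the GCC/Clang
    `__attribute__((weak))` extension. The function toy_xerbla() and its
    signature are hypothetical stand-ins and not the real `xerbla_`
    interface.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Default error handler. Because the definition is weak, a program
       that links its own (strong) toy_xerbla() overrides this one
       without a duplicate-symbol error and without a flat namespace. */
    __attribute__((weak))
    int toy_xerbla( const char* routine, int info )
    {
        assert( routine != NULL );
        fprintf( stderr, "parameter %d of routine %s is invalid\n",
                 info, routine );
        return info;
    }
    ```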

commit 5a4d3f5208d3d8cc1827f8cc90414c764b7ebab3
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sun Feb 13 17:28:30 2022 -0600

    Use -flat_namespace option to link on macOS
    
    Fixes #611.

commit 26742910a087947780a089360e2baf82ea109e01
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sun Feb 13 16:53:45 2022 -0600

    Update CC_VENDOR logic
    
    Look for `GCC` in addition to `gcc` to handle weird conda version strings. [ci skip]

commit 2f3872e01d51545c687ae2c8b2650e00552111a7
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Mon Feb 7 17:14:49 2022 +0900

    ArmSVE Adopts Label Wrapper
    
    For clang (& armclang?) compilation.
    
    Hopefully solves #609 .

commit 72089bb2917b78d99cf4f27c69125bf213ee54e6
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sat Feb 5 16:56:04 2022 +0900

    ArmSVE Use Predicate in M-Direction
    
    No need to query MR during kernel runtime.

commit 9cc897f37455d52fbba752e3801f1a9d4a5bfdc1
Author: Ruqing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Feb 3 16:40:02 2022 +0000

    Fix SVE Compil.

commit b5df1811f1bc8212b2cda6bb97b79819afe236a8
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Feb 3 02:31:29 2022 +0900

    Armv8a, ArmSVE: Simplify Gen-C

commit 35195bb5cea5d99eb3eaf41e3815137d14ceb52d
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Jan 31 10:29:50 2022 -0600

    Add armclang detection to configure.
    
    armclang is treated as regular clang. Fixes #606. [ci skip]

commit 0be9282cdccf73342d8571d3f7971a9b0af72363
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jan 26 17:46:24 2022 -0600

    Updated zen3 macro constant names.
    
    Details:
    - In config/zen3/bli_family_zen3.h, renamed:
        BLIS_SMALL_MATRIX_A_THRES_M_GEMMT -> _M_SYRK
        BLIS_SMALL_MATRIX_A_THRES_N_GEMMT -> _N_SYRK
      Thanks to Jeff Diamond for helping spot the stale _GEMMT naming.

commit 0ab20c0e72402ba0b17fe2c3ed3e16bf2ace0fd3
Author: Jeff Hammond <jehammond@nvidia.com>
Date:   Thu Jan 13 07:29:56 2022 -0800

    the Apple local label thing is required by Clang in general
    
    @egaudry and I both saw this issue on Linux with Clang 10.
    
    ```
    Compiling obj/thunderx2/kernels/armv8a/3/sup/bli_gemmsup_rv_armv8a_asm_d4x8m.o ('thunderx2' CFLAGS for kernels)
    kernels/armv8a/3/bli_gemm_armv8a_asm_d6x8.c:171:49: fatal error: invalid symbol redefinition
            "                                            \n\t"
                                                           ^
    <inline asm>:90:5: note: instantiated into assembly here
               .SLOOPKITER:
               ^
    1 error generated.
    ```
    
    Signed-off-by: Jeff Hammond <jehammond@nvidia.com>

commit 81f93be0561c705ae6823d19e40849facc40bef7
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Jan 10 10:19:47 2022 -0600

    Fix row-/column-major pref. in 16x8 haswell sgemm ukr (unused)

commit 268ce1f29a717d18304713ecc25a2eafe41838c7
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Jan 10 10:17:17 2022 -0600

    Relax alignment constraints
    
    Remove alignment of temporary AB buffer in edge case handling macros unless alignment is specifically requested (e.g. Core2, SDB/IVB). Fixes #595.

commit 3f2440b0226d5e23a43d12105d74aa917cd6c610
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jan 6 14:57:36 2022 -0600

    Added m, n dims to gemmd/gemmlike ukernel calls.
    
    Details:
    - Updated the gemmd addon and the gemmlike sandbox code to use the new
      microkernel calling sequence, which now includes m and n dimensions so
      that the microkernel has all the information necessary to handle edge
      cases. Thanks to Jeff Diamond for catching this, which ideally would
      have been included in commit 54fa28b.
    - Retired var2 of both gemmd and gemmlike to 'attic' directories and
      removed their corresponding prototypes. In both cases, var2 was a
      variant of the block-panel algorithm where edge-case handling was
      abstracted away to a microkernel wrapper. (Since this is now the
      official behavior of BLIS microkernels, I saw no need to have it
      included as a separate code path.)
    - Comment updates.

commit 864bfab4486ac910ef9a366e9ade4b45a39747fc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jan 4 15:10:34 2022 -0600

    CREDITS file update.

commit 466b68a3ad118342dc49a8130b7b02f5e7748521
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sun Jan 2 14:59:41 2022 -0600

    Add unique tag to branch labels for Apple ARM64.
    
    Add `%=` tag to branch labels, which expands to a unique identifier for each inline assembly block. This prevents duplicate symbol errors on Apple Silicon (#594). Fixes #594. [ci skip] since we can't test Apple Silicon anyways...
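
    A minimal, architecture-neutral illustration of the `%=` tag, assuming
    GCC/Clang extended inline assembly. The macro and label names are
    hypothetical, and the asm body defines only a label (no instructions),
    so it does not resemble the real microkernels.

    ```c
    /* Each expansion is a separate extended asm statement. `%=` expands
       to a number unique to that statement, so the two labels below get
       different names. With a fixed label name, the second expansion
       would redefine the first one's symbol. */
    #define TOY_ASM_LABEL() \
        __asm__ volatile ( ".Ltoy_label_%=:\n\t" : : : "memory" )

    int toy_two_asm_blocks( void )
    {
        TOY_ASM_LABEL();
        TOY_ASM_LABEL();
        return 2; /* number of asm blocks emitted above */
    }
    ```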

commit 08174a2f6ebbd8ed5aa2bc4edc45da80962f06bb
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sat Jan 1 21:35:19 2022 +0900

    Evict <arm_sve.h> Requirement for SVE GEMM
    
    For 8 <= GCC < 10 compatibility.

commit 54fa28bd847b389215cffb57a83dc9b3dce79c86
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Dec 24 08:00:33 2021 -0600

    Move edge cases to gemm ukr; more user-custom mods. (#583)
    
    Details:
    - Moved edge-case handling into the gemm microkernel. This required
      changing the microkernel API to take m and n dimension parameters.
      This required updating all existing gemm microkernel function pointer
      types, function signatures, and related definitions to take m and n
      dimensions. We also updated all existing kernels in the 'kernels'
      directory to take m and n dimensions, and implemented edge-case
      handling within those microkernels via a collection of new C
      preprocessor macros defined within bli_edge_case_macro_defs.h. Also
      removed the assembly code that formerly would handle general stride
      IO on the microtile, since this can now be handled by the same code
      that does edge cases.
    - Pass the obj_t.ker_fn (of matrix C) into bli_gemm_cntl_create() and
      bli_trsm_cntl_create(), where this function pointer is used in lieu of
      the default macrokernel when it is non-NULL, and ignored when it is
      NULL.
    - Re-implemented macrokernel in bli_gemm_ker_var2.c to be a single
      function using byte pointers rather than one function for each
      floating-point datatype. Also, obtain the microkernel function pointer
      from the .ukr field of the params struct embedded within the obj_t
      for matrix C (assuming params is non-NULL and contains a non-NULL
      value in the .ukr field). Communicate both the gemm microkernel
      pointer to use as well as the params struct to the microkernel via
      the auxinfo_t struct.
    - Defined gemm_ker_params_t type (for the aforementioned obj_t.params
      struct) in bli_gemm_var.h.
    - Retired the separate _md macrokernel for mixed datatype computation.
      We now use the reimplemented bli_gemm_ker_var2() instead.
    - Updated gemmt macrokernels to pass m and n dimensions into microkernel
      calls.
    - Removed edge-case handling from trmm and trsm macrokernels.
    - Moved most of bli_packm_alloc() code into a new helper function,
      bli_packm_alloc_ex().
    - Fixed a typo bug in bli_gemmtrsm_u_template_noopt_mxn.c.
    - Added test/syrk_diagonal and test/tensor_contraction directories with
      associated code to test those operations.
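
    The edge-case strategy from the first item above (compute a full
    microtile, then store only the m x n part that exists) can be sketched
    in plain C. This is not the macro-based code in
    bli_edge_case_macro_defs.h; the names, the fixed 4x4 tile, and the
    packing layout are all assumptions of the sketch.

    ```c
    #include <assert.h>
    #include <stddef.h>

    enum { TOY_MR = 4, TOY_NR = 4 };

    /* Full-tile product ab := A * B. A is a packed TOY_MR x k micropanel
       (column-stored), B is a packed k x TOY_NR micropanel (row-stored),
       and ab is stored column-major. */
    static void toy_full_tile( size_t k, const double* a, const double* b,
                               double* ab )
    {
        for ( size_t j = 0; j < TOY_NR; ++j )
            for ( size_t i = 0; i < TOY_MR; ++i )
            {
                double sum = 0.0;
                for ( size_t p = 0; p < k; ++p )
                    sum += a[ i + p * TOY_MR ] * b[ j + p * TOY_NR ];
                ab[ i + j * TOY_MR ] = sum;
            }
    }

    /* C(0:m,0:n) := beta * C + alpha * A * B with m <= TOY_MR and
       n <= TOY_NR. C may use arbitrary (general) strides. */
    void toy_gemm_ukr( size_t m, size_t n, size_t k, double alpha,
                       const double* a, const double* b, double beta,
                       double* c, size_t rs_c, size_t cs_c )
    {
        double ab[ TOY_MR * TOY_NR ]; /* temporary full microtile */

        assert( m <= TOY_MR && n <= TOY_NR );
        toy_full_tile( k, a, b, ab );

        /* Only the m x n elements that exist in C are touched, which
           covers edge cases and general stride with the same loop. */
        for ( size_t j = 0; j < n; ++j )
            for ( size_t i = 0; i < m; ++i )
            {
                double* cij = c + i * rs_c + j * cs_c;
                double  old = ( beta == 0.0 ) ? 0.0 : beta * *cij;
                *cij = old + alpha * ab[ i + j * TOY_MR ];
            }
    }
    ```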

commit 961d9d509dd94f3a66f7095057e3dc8eb6d89839
Author: Kiran <kiran.varaganti@amd.com>
Date:   Wed Dec 8 03:00:38 2021 +0530

    Re-add BLIS_ENABLE_ZEN_BLOCK_SIZES macro for 'zen'.
    
    Details:
    - Added previously-deleted cpp macro block to bli_cntx_init_zen.c
      targeting the Naples microarchitecture that enabled different cache
      blocksizes when the number of threads exceeds 16. This commit
      represents PR #573.

commit cf7d616a2fd58e293b496770654040818bf5609c
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Dec 2 17:10:03 2021 -0600

    Enable user-customized packm ukernel/variant. (#549)
    
    Details:
    - Added four new fields to obj_t: .pack_fn, .pack_params, .ker_fn, and
      .ker_params. These fields store pointers to functions and data that
      will allow the user to more flexibly create custom operations while
      recycling BLIS's existing partitioning infrastructure.
    - Updated typed API to packm variant and structure-aware kernels to
      replace the diagonal offset with panel offsets, and changed strides
      of both C and P to inc/ldim semantics. Updated object API to the packm
      variant to include rntm_t*.
    - Removed the packm variant function pointer from the packm cntl_t node
      definition since it has been replaced by the .pack_fn pointer in the
      obj_t.
    - Updated bli_packm_int() to read the new packm variant function pointer
      from the obj_t and call it instead of from the cntl_t node.
    - Moved some of the logic of bli_l3_packm.c to a new file,
      bli_packm_alloc.c.
    - Rewrote bli_packm_blk_var1.c so that it uses byte (char*) pointers
      instead of typed pointers, allowing a single function to be used
      regardless of datatype. This obviated having a separate implementation
      in bli_packm_blk_var1_md.c. Also relegated handling of scalars to a
      new function, bli_packm_scalar().
    - Employed a new standard whereby right-hand matrix operands ("B") are
      always packed as column-stored row panels -- that is, identically to
      that of left-hand matrix operands ("A"). This means that while we pack
      matrix A normally, we actually pack B in a transposed state. This
      allowed us to simplify a lot of code throughout the framework, and
      also affected some of the logic in bli_l3_packa() and _packb().
    - Simplified bli_packm_init.c in light of the new B^T convention
      described above. bli_packm_init()--which is now called from within
      bli_packm_blk_var1()--also now calls bli_packm_alloc() and returns
      a bool that indicates whether packing should be performed (or
      skipped).
    - Consolidated bli_gemm_int() and bli_trsm_int() into a bli_l3_int(),
      which, among other things, defaults the new .pack_fn field of the
      obj_t to bli_packm_blk_var1() if the field is NULL.
    - Defined a new function, bli_obj_reset_origin(), which permanently
      refocuses the view of an object so that it "forgets" any offsets from
      its original pointer. This function also sets the object's root field
      to itself. Calls to bli_obj_reset_origin() for each matrix operand
      appear in the _front() functions, after the obj_t's are aliased. This
      resetting of the underlying matrices' origins is needed in preparation
      for more advanced features from within custom packm kernels.
    - Redefined bli_pba_rntm_set_pba() from a regular function to a static
      inline function.
    - Updated gemm_ukr, gemmtrsm_ukr, and trsm_ukr testsuite modules to use
      libblis_test_pobj_create() to create local packed objects. Previously,
      these packed objects were created by calling lower-level functions.
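
    The byte-pointer technique mentioned above for bli_packm_blk_var1.c can
    be sketched as a datatype-agnostic copy. The function name and its
    arguments are hypothetical; the only point is that one function,
    parameterized by element size, can stand in for one function per
    datatype.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Copy an m x n matrix of elements that are elem_size bytes each.
       Strides are given in elements; addresses are computed in bytes. */
    void toy_copym_bytes( size_t elem_size, size_t m, size_t n,
                          const void* a, size_t rs_a, size_t cs_a,
                          void*       p, size_t rs_p, size_t cs_p )
    {
        const char* a_bytes = a;
        char*       p_bytes = p;

        assert( elem_size > 0 );
        for ( size_t j = 0; j < n; ++j )
            for ( size_t i = 0; i < m; ++i )
                memcpy( p_bytes + ( i * rs_p + j * cs_p ) * elem_size,
                        a_bytes + ( i * rs_a + j * cs_a ) * elem_size,
                        elem_size );
    }
    ```

    Passing sizeof(float) or sizeof(double) as elem_size selects the
    datatype at runtime, so no typed variants of the loop are needed.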

commit e229e049ca08dfbd45794669df08a71dba892925
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 1 17:36:22 2021 -0600

    Added recu-sed.sh script to 'build' directory.
    
    Details:
    - Added a recursive sed script to the 'build' directory.

commit 12c66a4acc77bf4927b01e2358e2ac10b61e0a53
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 19 14:43:53 2021 -0600

    Minor updates to README.md, docs/Addons.md.
    
    Details:
    - Add additional mentions of addons to README.md, including in the
      "What's New" section.
    - Removed mention of sandboxes from the long list of advantages
      provided by BLIS.
    - Very minor description update to opening line of Addons.md.

commit a4bc03b990fe0572001eb6409efd12cd70677dcf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 19 13:29:00 2021 -0600

    Brief mention/link to Addons.md in README.md.
    
    Details:
    - Add a blurb about the new addons feature to the "Documentation for
      BLIS developers" section of the README.md, which also links to the
      Addons.md document.

commit b727645eb7a8df39dee74068f734da66322fe0b3
Merge: 9be97c15 7bde468c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 19 13:22:09 2021 -0600

    Merge branch 'dev'

commit 9be97c150e19fa58bca30cb993a6509ae21e2025
Author: Madan mohan Manokar <86282872+madanm3@users.noreply.github.com>
Date:   Thu Nov 18 00:46:46 2021 +0530

    Support all four dts in test/test_her[2][k].c (#578)
    
    Details:
    - Replaced the hard-coded calls to double-precision real syr, syr2,
      syrk, and syr2k in the corresponding standalone test drivers in the
      'test' directory with conditional branches that will call the
      appropriate BLAS interface depending on which datatype is enabled.
      Thanks to Madan mohan Manokar for this improvement.
    - CREDITS file update.

commit 26e4b6b29312b472c3cadf95ccdf5240764777f4
Author: Dipal M Zambare <71366780+dzambare@users.noreply.github.com>
Date:   Thu Nov 18 00:32:00 2021 +0530

    Added support for AMD's Zen3 microarchitecture.
    
    Details:
    - Added a new 'zen3' subconfiguration targeting support for the AMD Zen3
      microarchitecture (#561). Thanks to AMD for this contribution.
    - Restructured clang and AOCC support for zen, zen2, and zen3
      make_defs.mk files. The clang and AOCC version detection now happens
      in configure, not in the subconfigurations' makefile fragments. That
      is, we've added logic to configure that detects the version of
      clang/AOCC, outputs an appropriate variable to config.mk
      (i.e., CLANG_OT_*, AOCC_OT_*), and then checks for it within the
      makefile fragment (as is currently done for the GCC_OT_* variables).
    - Added configure support for a GCC_OT_10_1_0 variable (and associated
      substitution anchor) to communicate whether the gcc version is older
      than 10.1.0, and use this variable to check for recent enough versions
      of gcc to use -march=znver3 in the zen3 subconfig.
    - Inlined the contents of config/zen/amd_config.mk into the zen and zen2
      make_defs.mk so that the files are self-contained, harmonizing the
      format of all three Zen-based subconfigurations' make_defs.mk files.
    - Added indenting (with spaces) of GNU make conditionals for easier
      reading in zen, zen2, and zen3 make_defs.mk files.
    - Adjusted the range of models checked by bli_cpuid_is_zen() (which was
      previously 0x00 ~ 0xff and is now 0x00 ~ 0x2f) so that it is
      completely disjoint from the models checked by bli_cpuid_is_zen2()
      (0x30 ~ 0xff). This is normally necessary because Zen and Zen2
      microarchitectures share the same family (23, or 0x17), and so the
      model code is the only way to differentiate the two. But in our case,
      fixing the model range for zen *wasn't* actually necessary since we
      checked for zen2 first, and therefore the wide zen range acted like
      the 'else' of an 'if-else' statement. That said, the change helps
      improve clarity for the reader by encoding useful knowledge, which
      was obtained from https://en.wikichip.org/wiki/amd/cpuid .
    - Added zen2.def and zen3.def files to the collection in travis/cpuid.
      Note that support for zen, zen2, and zen3 is now present, and while
      all three microarchitectures have identical instruction sets from
      the perspective of BLIS microkernels, they each correspond to
      different subconfigurations and therefore merit separate testing.
      Thanks to Devin Matthews for his guidance in hacking these files as
      slight modifications of zen.def.
    - Enabled testing of zen2 and zen3 via the SDE in travis/do_sde.sh.
      Now, zen, zen2, and zen3 are tested through the SDE via Travis CI
      builds.
    - Updated travis/do_sde.sh to grab the SDE tarball from a new ci-utils
      repository on GitHub rather than on Intel's website. This change was
      made in an attempt to circumvent recent troubles with Travis CI not
      being able to download the SDE directly from Intel's website via curl.
      Thanks to Devin Matthews for suggesting the idea.
    - Updated travis/do_sde.sh to grab the latest version (8.69.1) of the
      Intel SDE from the flame/ci-utils repository.
    - Updated .travis.yml to use gcc 9. The file was previously using gcc 8,
      which did not support -march=znver2.
    - Created amd64_legacy umbrella family in config_registry for targeting
      older (bulldozer, piledriver, steamroller, and excavator)
      microarchitectures and moved those same subconfigs out of the amd64
      umbrella family. However, x86_64 retains amd64_legacy as a constituent
      member.
    - Fixed a bug in configure related to the building of the so-called
      config list. When processing the contents of config_registry,
      configure creates a series of structures and lists that allow for
      various mappings related to configuration families, subconfigs, and
      kernel sets. Two of those lists are built via substitution of
      umbrella families with their subconfig members, and one of those
      lists was improperly performing the substitution in a way that would
      erroneously match on partial umbrella family names. That code was
      changed to match the code that was already doing the substitution
      properly, via substitute_words(). Also added comments noting the
      importance of using substitute_words() in both instances.
    - Comment updates.

commit 74c0c622216aba0c24aa2c3a923811366a160cf5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 16 16:06:33 2021 -0600

    Reverted cbc88fe.
    
    Details:
    - Reverted the annotation of some markdown code blocks with 'bash'
      after realizing that the in-browser syntax highlighting was not
      worthwhile.

commit cbc88feb51b949ce562d044cf9f99c4e46bb8a39
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 16 16:02:39 2021 -0600

    Marked some markdown shell code blocks as 'bash'.
    
    Details:
    - Annotated the code blocks that represent shell commands and output as
      'bash' in README.md and BuildSystem.md.

commit 78cd1b045155ddf0b9ec6e2ab815f2b216ad9a9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 16 15:53:40 2021 -0600

    Added 'Example Code' section to README.md.
    
    Details:
    - Inserted a new 'Example Code' section into the README.md immediately
      after the 'Getting Started' section. Thanks to Devin Matthews for
      recommending this addition.
    - Moved the 'Performance' section of the README down slightly so that it
      appears after the 'Documentation' section.

commit 7bde468c6f7ecc4b5322d2ade1ae9c0b88e6b9f3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Nov 13 16:39:37 2021 -0600

    Added support for addons.
    
    Details:
    - Implemented a new feature called addons, which are similar to
      sandboxes except that there is no requirement to define gemm or any
      other particular operation.
    - Updated configure to accept --enable-addon=<name> or -a <name> syntax
      for requesting an addon be included within a BLIS build. configure now
      outputs the list of enabled addons into config.mk. It also outputs the
      corresponding #include directives for the addons' headers to a new
      companion to the bli_config.h header file named bli_addon.h. Because
      addons may wish to make use of existing BLIS types within their own
      definitions, the addons' headers must be included sometime after that
      of bli_config.h (which currently is #included before bli_type_defs.h).
      This is why the #include directives needed to go into a new top-level
      header file rather than the existing bli_config.h file.
    - Added a markdown document, docs/Addons.md, to explain addons, how to
      build with them, and what assumptions their authors should keep in
      mind as they create them.
    - Added a gemmlike-like implementation of sandwich gemm called 'gemmd'
      as an addon in addon/gemmd. The code uses a 'bao_' prefix for local
      functions, including the user-level object and typed APIs.
    - Updated .gitignore so that git ignores bli_addon.h files.

commit 7bc8ab485e89cfc6032932e57929e208a28f4be5
Author: Meghana-vankadari <74656386+Meghana-vankadari@users.noreply.github.com>
Date:   Fri Nov 12 04:16:14 2021 +0530

    Added BLAS/CBLAS APIs for axpby, gemm_batch. (#566)
    
    Details:
    - Expanded the BLAS compatibility layer to include support for
      ?axpby_() and ?gemm_batch_(). The former is a straightforward
      BLAS-like interface into the axpbyv operation while the latter
      implements a batched gemm via loops over bli_?gemm(). Also
      expanded the CBLAS compatibility layer to include support for
      cblas_?axpby() and cblas_?gemm_batch(), which serve as wrappers to
      the corresponding (new) BLAS-like APIs. Thanks to Meghana Vankadari
      for submitting these new APIs via #566.
    - Fixed a long-standing bug in common.mk that for some reason never
      manifested until now. Previously, CBLAS source files were compiled
      *without* the location of cblas.h being specified via a -I flag.
      I'm not sure why this worked, but it may be due to the fact that
      the cblas.h file resided in the same directory as all of the CBLAS
      source, and perhaps compilers implicitly add a -I flag for the
      directory that corresponds to the location of the source file being
      compiled. This bug only showed up because some CBLAS-like source code
      was moved into an 'extra' subdirectory of that frame/compat/cblas/src
      directory. After moving the code, compilation for those files failed
      (because the cblas.h header file, presumably, could not be found in
      the same location). This bug was fixed within common.mk by explicitly
      adding the cblas.h directory to the list of -I flags passed to the
      compiler.
    - Added test_axpbyv.c and test_gemm_batch.c files to 'test' directory,
      and updated test/Makefile to build those drivers.
    - Fixed typo in error message string in cblas_sgemm.c.
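    
    For readers unfamiliar with axpby, the operation described above
    computes y := beta * y + alpha * x. The following is only a minimal
    reference sketch for real doubles. The name ref_daxpby and its
    argument order are hypothetical and are not the signatures added by
    this commit; non-negative strides are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop, not the BLIS implementation:
   y := beta * y + alpha * x over n elements with the given strides. */
static void ref_daxpby(int n, double alpha, const double *x, int incx,
                       double beta, double *y, int incy)
{
    /* Assumption of this sketch: valid pointers, non-negative strides. */
    assert(n <= 0 || (x != NULL && y != NULL));
    assert(incx >= 0 && incy >= 0);

    for (int i = 0; i < n; ++i)
        y[i * incy] = beta * y[i * incy] + alpha * x[i * incx];
}
```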

commit 28b0982ea70c21841fb23802d38f6b424f8200e1
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Nov 10 12:34:50 2021 -0600

    Refactored her[2]k/syr[2]k in terms of gemmt. (#531)
    
    Details:
    - Renamed herk macrokernels and supporting files and functions to gemmt,
      which is possible since at the macrokernel level they are identical.
      Then recast herk/her2k/syrk/syr2k in terms of gemmt within the expert
      level-3 oapi (bli_l3_oapi_ex.c) while also redefining them as literal
      functions rather than cpp macros that instantiate multiple functions.
      Thanks to Devin Matthews for his efforts on this issue (#531).
    - Check that the maximum stack buffer size is sufficiently large
      relative to the register blocksizes for each datatype, and do so when
      the context is initialized rather than when an operation is called.
      Note that with this change, users who pass in their own contexts into
      the expert interfaces currently will *not* have any checks performed.
      Thanks to Devin Matthews for suggesting this change.

commit cfa3db3f3465dc58dbbd842f4462e4b49e7768b4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 3 18:13:56 2021 -0500

    Fixed bug in mixed-dt gemm introduced in e9da642.
    
    Details:
    - Fixed a bug that broke certain mixed-datatype gemm behavior. This
      bug was introduced recently in e9da642 when the code that performs
      the operation transposition (for microkernel IO preference purposes)
      was moved up so that it occurred sooner. However, when I moved that
      code, I failed to notice that there was a cpp-protected "if"
      conditional that applied to the entire code block that was moved. Once
      the code block was relocated, the orphaned if-statement was now
      (erroneously) glomming on to the next thing that happened to be in the
      function, which happened to be the call to bli_rntm_set_ways_for_op(),
      causing a rather odd memory exhaustion error in the sba due to the
      num_threads field of the rntm_t still being -1 (because the rntm_t
      fields were never processed as they should have been). Thanks to
      @ArcadioN09 (Snehith) for reporting this error and helpfully including
      relevant memory trace output.
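    
    The failure mode described above can be reproduced in miniature.
    This sketch is not the BLIS source: is_mixed_dt, set_ways_for_op(),
    and front_after_move() are hypothetical stand-ins. It only shows how
    an "if" whose block has been moved away silently binds to whatever
    statement follows it.

```c
#include <assert.h>

static int ways_set = 0;

/* Stand-in for the call that the stray "if" ended up guarding. */
static void set_ways_for_op(void) { ways_set = 1; }

/* Returns 1 if the threading setup ran, 0 if it was skipped. */
static int front_after_move(int is_mixed_dt)
{
    assert(is_mixed_dt == 0 || is_mixed_dt == 1);
    ways_set = 0;

    /* The block this conditional used to guard was moved earlier in
       the function, but the conditional itself was left behind ... */
    if (is_mixed_dt)
    /* ... so it now guards the next statement by accident. */
    set_ways_for_op();

    return ways_set;
}
```

    With the condition false, the setup call is skipped entirely, which
    mirrors how the rntm_t was left unprocessed in the real bug.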

commit f065a8070f187739ec2b34417b8ab864a7de5d7e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 28 16:05:43 2021 -0500

    Removed support for 3m, 4m induced methods.
    
    Details:
    - Removed support for all induced methods except for 1m. This included
      removing code related to 3mh, 3m1, 4mh, 4m1a, and 4m1b as well as any
      code that existed only to support those implementations. These
      implementations were rarely used and posed code maintenance challenges
      for BLIS's maintainers going forward.
    - Removed reference kernels for packm that pack 3m and 4m micropanels,
      and removed 3m/4m-related code from bli_cntx_ref.c.
    - Removed support for 3m/4m from the code in frame/ind, then reorganized
      and streamlined the remaining code in that directory. The *ind(),
      *nat(), and *1m() APIs were all removed. (These additional API layers
      no longer made as much sense with only one induced method (1m) being
      supported.) The bli_ind.c file (and header) were moved to frame/base
      and bli_l3_ind.c (and header) and bli_l3_ind_tapi.h were moved to
      frame/3.
    - Removed 3m/4m support from the code in frame/1m/packm.
    - Removed 3m/4m support from trmm/trsm macrokernels and simplified some
      pointer arithmetic that was previously expressed in terms of the
      bli_ptr_inc_by_frac() static inline function (whose definition was
      also removed).
    - Removed the following subdirectories of level-0 macro headers from
      frame/include/level0: ri3, rih, ri, ro, rpi. The level-0 scalar macros
      defined in these directories were used exclusively for 3m and 4m
      method codes.
    - Simplified bli_cntx_set_blkszs() and bli_cntx_set_ind_blkszs() in
      light of 1m being the only induced method left within BLIS.
    - Removed dt_on_output field within auxinfo_t and its associated
      accessor functions.
    - Re-indexed the 1e/1r pack schemas after removing those associated with
      variants of the 3m and 4m methods. This leaves two bits unused within
      the pack format portion of the schema bitfield. (See bli_type_defs.h
      for more info.)
    - Spun off the basic and expert interfaces to the object and typed APIs
      into separate files: bli_l3_oapi.c and bli_l3_oapi_ex.c; bli_l3_tapi.c
      and bli_l3_tapi_ex.c.
    - Moved the level-3 operation-specific _check function calls from the
      operations' _front() functions to the corresponding _ex() function of
      the object API. (This change roughly maintains where the _check()
      functions are called in the call stack but lays the groundwork for
      future changes that may come to the level-3 object APIs.) Minor
      modifications to bli_l3_check.c to allow the check() functions to be
      called from the expert interface APIs.
    - Removed support within the testsuite for testing the aforementioned
      induced methods, and updated the standalone test drivers in the 'test'
      directory to reflect the retirement of those induced methods.
    - Modified the sandbox contract so that the user is obliged to define
      bli_gemm_ex() instead of bli_gemmnat(). (This change was made in light
      of the *nat() functions no longer existing.) Also updated the existing
      'power10' and 'gemmlike' sandboxes to come into compliance with the
      new sandbox rules.
    - Updated BLISObjectAPI.md, BLISTypedAPI.md, Testsuite.md documentation
      to reflect the retirement of 3m/4m, and also modified Sandboxes.md to
      bring the document into alignment with new conventions.
    - Updated various comments; removed segments of commented-out code.

commit e8caf200a908859fa5f5ea2049911a9bdaa3d270
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 18 13:04:15 2021 -0500

    Updated do_sde.sh to get SDE from GitHub.
    
    Details:
    - Updated travis/do_sde.sh so that the script downloads the SDE tarball
      from a new ci-utils repository on GitHub rather than from Intel's
      website. This change is being made in an attempt to circumvent Travis
      CI's recent troubles with downloading the SDE from Intel's website via
      curl. Thanks to Devin Matthews for suggesting the idea.

commit 290ff4b1c26737b074d5abbf76966bc22af8c562
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 14 16:09:43 2021 -0500

    Disable SDE testing of old AMD microarchitectures.
    
    Details:
    - Skip testing on piledriver, steamroller, and excavator platforms
      in travis/do_sde.sh.

commit 514fd101742dee557e5eb43d0023a221ae8a7172
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 14 13:50:28 2021 -0500

    Fixed substitution bug in configure.
    
    Details:
    - Fixed a bug in configure related to the building of the so-called
      config list. When processing the contents of config_registry,
      configure creates a series of structures and lists that allow for
      various mappings related to configuration families, subconfigs,
      and kernel sets. Two of those lists are built via substitution
      of umbrella families with their subconfig members, and one of
      those lists was improperly performing the substitution in a way
      that would erroneously match on partial umbrella family names.
      That code was changed to match the code that was already doing
      the substitution properly, via substitute_words().
    - Added comments noting the importance of using substitute_words()
      in both instances.

commit e9da6425e27a9d63c9fef92afc2dd750c601ccd7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 13 14:15:38 2021 -0500

    Allow use of 1m with mixing of row/col-pref ukrs.
    
    Details:
    - Fixed a bug that broke the use of 1m for dcomplex when the single-
      precision real and double-precision real ukernels had opposing I/O
      preferences (row-preferential sgemm ukernel + column-preferential
      dgemm ukernel, or vice versa). The fix involved adjusting the API
      to bli_cntx_set_ind_blkszs() so that the induced method context init
      function (e.g., bli_cntx_init_<subconfig>_ind()) could call that
      function for only one datatype at a time. This allowed the blocksize
      scaling (which varies depending on whether we're doing 1m_r or 1m_c)
      to happen on a per-datatype basis. This fixes issue #557. Thanks to
      Devin Matthews and RuQing Xu for helping discover and report this bug.
    - The aforementioned 1m fix required moving the 1m_r/1m_c logic from
      bli_cntx_ref.c into a new function, bli_l3_set_schemas(), which is
      called from each level-3 _front() function. The pack_t schemas in the
      cntx_t were also removed entirely, along with the associated accessor
      functions. This in turn required updating the trsm1m-related virtual
      ukernels to read the pack schema for B from the auxinfo_t struct
      rather than the context. This also required slight tweaks to
      bli_gemm_md.c.
    - Repositioned the logic for transposing the operation to accommodate
      the microkernel IO preference. This mostly only affects gemm. Thanks
      to Devin Matthews for his help with this.
    - Updated dpackm pack ukernels in the 'armsve' kernel set to avoid
      querying pack_t schemas from the context.
    - Removed the num_t dt argument from the ind_cntx_init_ft type defined
      in bli_gks.c. The context initialization functions for induced methods
      were previously passed a dt argument, but I can no longer figure out
      *why* they were passed this value. To reduce confusion, I've removed
      the dt argument (including also from the function definition +
      prototype).
    - Commented out setting of cntx_t schemas in bli_cntx_ind_stage.c. This
      breaks high-level implementations of 3m and 4m, but this is okay since
      those implementations will be removed very soon.
    - Removed some older blocks of preprocessor-disabled code.
    - Comment update to test_libblis.c.

commit 81e103463214d589071ccbe2d90b8d7c19a186e4
Author: Minh Quan Ho <1337056+hominhquan@users.noreply.github.com>
Date:   Wed Oct 13 20:28:02 2021 +0200

    Alloc at least 1 elem in pool_t block_ptrs. (#560)
    
    Details:
    - Previously, the block_ptrs field of the pool_t was allowed to be
      initialized as any unsigned integer, including 0. However, a length of
      0 could be problematic given that the behavior of malloc(0) is
      implementation-defined and therefore varies across implementations. As
      a safety measure, we check for block_ptrs array lengths of 0 and, in
      that case, increase them to 1.
    - Co-authored-by: Minh Quan Ho <minh-quan.ho@kalray.eu>
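    
    A minimal sketch of the guard described above, assuming a plain
    malloc()-based allocation. The helper name alloc_block_ptrs() is
    hypothetical and is not a function in bli_pool.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an array of block pointers while never
   asking malloc() for zero bytes. On return, *len holds the length that
   was actually used. */
static void **alloc_block_ptrs(size_t *len)
{
    assert(len != NULL);

    /* malloc(0) may return NULL or a unique pointer depending on the
       implementation, so bump a requested length of 0 up to 1. */
    if (*len == 0)
        *len = 1;

    return malloc(*len * sizeof(void *));
}
```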

commit 327481a4b0acf485d0cbdd8635dd9b886ba3f2a7
Author: Minh Quan Ho <1337056+hominhquan@users.noreply.github.com>
Date:   Tue Oct 12 19:53:04 2021 +0200

    Fix insufficient pool-growing logic in bli_pool.c. (#559)
    
    Details:
    - The current mechanism for growing a pool_t doubles the length of the
      block_ptrs array every time the array length needs to be increased
      due to new blocks being added. However, that logic did not take into
      account the new total number of blocks, and the fact that the caller
      may be requesting more blocks than would fit even after doubling the
      current length of block_ptrs. The code comments now contain two
      illustrating examples that show why, even after doubling, we must
      always have at least enough room to fit all of the old blocks plus
      the newly requested blocks.
    - This commit also happens to fix a memory corruption issue that stems
      from growing any pool_t that is initialized with a block_ptrs length
      of 0. (Previously, the memory pool for packed buffers of C was
      initialized with a block_ptrs length of 0, but because it is unused
      this bug did not manifest by default.)
    - Co-authored-by: Minh Quan Ho <minh-quan.ho@kalray.eu>
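    
    The sizing rule described above can be sketched as follows. This is
    an illustration under stated assumptions rather than the code in
    bli_pool.c: grown_len() and its parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizing rule: double the current array length, but never
   return less than the room needed for the old blocks plus the newly
   requested ones. */
static size_t grown_len(size_t cur_len, size_t num_blocks, size_t num_add)
{
    /* The array cannot already hold more blocks than its length. */
    assert(num_blocks <= cur_len);

    size_t needed  = num_blocks + num_add;
    size_t new_len = 2 * cur_len;

    if (new_len < needed)
        new_len = needed;

    return new_len;
}
```

    With cur_len = 4, num_blocks = 4, and num_add = 10, doubling alone
    gives 8, which cannot hold the 14 blocks required. A cur_len of 0
    shows the second problem: doubling zero never grows the array.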

commit 32a6d93ef6e2af5e486dfd5e46f8272153d3d53d
Merge: 408906fd 2604f407
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Oct 9 15:53:54 2021 -0500

    Merge pull request #543 from xrq-phys/armsve-packm-fix
    
    ARMSVE Block SVE-Intrinsic Kernels for GCC 8-9

commit 408906fdd8892032aa11bd061b7971128f453bef
Merge: 4277fec0 ccf16289
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Oct 9 15:50:25 2021 -0500

    Merge pull request #542 from xrq-phys/armsve-zgemm
    
    Arm SVE CGEMM / ZGEMM Natural Kernels

commit ccf16289d2e71fd9511ccf2d13dcebbfa29deabc
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Fri Oct 8 12:34:14 2021 +0900

    Arm SVE C/ZGEMM Fix FMOV 0 Mistake
    
    FMOV [hsd]M, #imm does not allow zero immediate.
    Use wzr, xzr instead.

commit 82b61283b2005f900101056e6df2a108258db602
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Fri Oct 8 12:17:29 2021 +0900

    SH Kernel Unused Either

commit 1749dfa493054abd2e4ddba7cb21278d337e4f74
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Fri Oct 8 12:11:53 2021 +0900

    Arm SVE C/ZGEMM Support *beta==0

commit 4b648e47daad256ab8ab698173a97f71ab9f75eb
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Wed Sep 22 16:42:09 2021 +0900

    Arm SVE Config armsve Use ZGEMM/CGEMM

commit f76ea905e216cf640975e6319c6d2f54aeafed2e
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Tue Sep 21 20:38:44 2021 +0900

    Arm SVE: Update Perf. Graph
    
    Pic. size seems a bit different from upstream.
    Generated w/ MATLAB. Open to any change.

commit 66a018e6ad00d9e8967b67e1aa3e23b20a7efdfe
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Mon Sep 20 00:16:11 2021 +0900

    Arm SVE CGEMM 2Vx10 Unindex Process Alpha=1.0

commit 9e1e781cb59f8fadb2a10a02376d3feac17ce38d
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sun Sep 19 23:30:42 2021 +0900

    Arm SVE ZGEMM 2Vx10 Unindex Process Alpha=1.0

commit f7c6c2b119423e7ba7a24ae2156790e076071cba
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Sep 16 01:47:42 2021 +0900

    A64FX Config Use ZGEMM/CGEMM

commit e4cabb977d038688688aca39b366f98f9c36b7eb
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Sep 16 01:34:26 2021 +0900

    Arm SVE Typo Fix ZGEMM/CGEMM C Prefetch Reg

commit b677e0d61b23f26d9536e5c363fd6bbab6ee1540
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Sep 16 01:18:54 2021 +0900

    Arm SVE Add SGEMM 2Vx10 Unindexed

commit 3f68e8309f2c5b31e25c0964395a180a80014d36
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Sep 16 01:00:54 2021 +0900

    Arm SVE ZGEMM Support Gather Load / Scatt. St.

commit c19db2ff826e2ea6ac54569e8aa37e91bdf7cabe
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Wed Sep 15 23:39:53 2021 +0900

    Arm SVE Add ZGEMM 2Vx10 Unindexed

commit e13abde30b9e0e381c730c496e74bc7ae062a674
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Wed Sep 15 04:19:45 2021 +0900

    Arm SVE Add ZGEMM 2Vx7 Unindexed

commit 49b9d7998eb86f340ae7b26af3e5a135d6a8feee
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Tue Sep 14 04:02:47 2021 +0900

    Arm SVE Add ZGEMM 2Vx8 Unindexed

commit 4277fec0d0293400497ae8bcfc32be5e62319ae9
Merge: 2329d990 f44149f7
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Oct 7 13:47:22 2021 -0500

    Merge pull request #533 from xrq-phys/arm64-hi-bw
    
    ARMv8 PACKM and GEMMSUP Kernels + Apple Firestorm Subconfig

commit 2329d99016fe1aeb86da4552295f497543cea311
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Oct 7 12:37:58 2021 -0500

    Update Travis CI badge
    
    [ci skip]

commit f44149f787ae3d4b53d9c4d8e6f23b2818b7770d
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Fri Oct 8 02:35:58 2021 +0900

    Armv8 Trash New Bulk Kernels
    
    - They didn't yield much improvement.
    - Can't register row-preferential and column-preferential ukrs at the
      same time. Will break 1m.

commit 70b52cadc5ef4c16431e1876b407019e6286614e
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Oct 7 12:34:35 2021 -0500

    Enable testing 1m in `make check`.

commit 2604f4071300d109f28c8438be845aeaf3ec44e4
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Oct 7 02:39:00 2021 +0900

    Config ArmSVE Unregister 12xk. Move 12xk to Old

commit 1e3200326be9109eb0f8c7b9e4f952e45700cbba
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Oct 7 02:37:14 2021 +0900

    Revert __has_include(). Distinguish w/ BLIS_FAMILY_**

commit a4066f278a5c06f73b16ded25f115ca4b7728ecb
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Oct 7 02:26:05 2021 +0900

    Register firestorm into arm64 Metaconfig

commit d7a3372247c37568d142110a1537632b34b8f2ff
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Oct 7 02:25:14 2021 +0900

    Armv8 DGEMMSUP Fix Edge 6x4 Switch Case Typo

commit 2920dde5ac52e09f84aa42990aab8340421522ce
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Oct 7 02:01:45 2021 +0900

    Armv8 DGEMMSUP Fix 8x4m Store Inst. Typo

commit 14b13583f1802c002e195b3b48874b3ebadbeb20
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Oct 6 10:22:34 2021 -0500

    Add test for Apple M1 (firestorm)
    
    This test will run on Linux, but all the kernels should run just fine. This does not test autodetection but then none of the other ARM tests do either.

commit a024715065532400da6257b8b3124ca5aecda405
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Oct 7 00:15:54 2021 +0900

    Firestorm CPUID Dispatcher
    
    Commenting out <sys/sysctl.h> due to a possible Xcode bug.

commit b9da6d55fec447d05c8b67f34ce83617123d8357
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Wed Oct 6 12:25:54 2021 +0900

    Armv8 GEMMSUP Edge Cases Require Signed Ints
    
    Fix a bug in bli_gemmsup_rd_armv8a_asm_d6x8m.c.
    For safety upon similar strategies in the future,
     change all [mn]_[iter/left] into signed ints.
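    
    The commit does not spell out the exact failure, so the following
    only shows one common way that unsigned edge-case counters go wrong:
    a subtraction that should become negative wraps around instead. The
    function names are hypothetical and the code is not taken from
    bli_gemmsup_rd_armv8a_asm_d6x8m.c.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned: when step > m_left, the difference wraps to a huge positive
   value, so the "work remains" test wrongly succeeds. */
static int work_left_unsigned(uint64_t m_left, uint64_t step)
{
    return (m_left - step) > 0;
}

/* Signed: the difference is negative, so the test correctly fails. */
static int work_left_signed(int64_t m_left, int64_t step)
{
    assert(step > 0);
    return (m_left - step) > 0;
}
```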

commit 34919de3df5dda7a06fc09dcec12ca46dc8b26f4
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Oct 2 18:48:50 2021 -0500

    Make error checking level a thread-local variable.
    
    Previously, this was a global variable. Setting the value was synchronized via a mutex but reading the value was not. Of course, these accesses are almost certainly atomic, but there is still the possibility of one thread attempting to set the value and then reading the value set by another thread. For correct operation under user threading (e.g. pthreads), this should probably be thread-local with no mutex.
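    
    A minimal sketch of the thread-local approach, assuming a C11
    compiler that supports _Thread_local. The type and function names
    are hypothetical and do not reproduce the BLIS API.

```c
#include <assert.h>

/* Hypothetical error-checking levels. */
typedef enum { CHECK_NONE = 0, CHECK_FULL = 1 } check_level_t;

/* Each thread gets its own copy, so no mutex is needed and one thread
   can never observe a value set by another. */
static _Thread_local check_level_t check_level = CHECK_FULL;

static check_level_t check_level_get(void)
{
    return check_level;
}

static void check_level_set(check_level_t lvl)
{
    assert(lvl == CHECK_NONE || lvl == CHECK_FULL);
    check_level = lvl;
}
```

    One consequence of this design is that a level set in one thread is
    not inherited by threads created later; each starts from the
    initializer.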

commit c3024993c3d50236fad112822215f066496c5831
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Oct 5 15:20:27 2021 -0500

    Fix data race in testsuite.

commit 353a0d82572f26e78102cee25693130ce6e0ea5b
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Oct 5 14:24:17 2021 -0500

    Update .appveyor.yml
    
    [ci skip]

commit 4bfadf9b561d4ebe0bbaf8b6d332f07ff531d618
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Wed Oct 6 01:51:26 2021 +0900

    Firestorm Block Size Fixes

commit 40baf83f0ea2749199b93b5a8ac45c01794b008c
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Wed Oct 6 01:00:52 2021 +0900

    Armv8 Handle *beta == 0 for GEMMSUP ??r Case.

commit 079fbd42ce8cf7ea67a939b0f80f488de5821319
Merge: f5c03e9f 9905f443
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Oct 4 17:21:48 2021 -0500

    Merge branch 'master' into arm64-hi-bw

commit 9905f44347eea4c57ef4927b81f1c63e76a92739
Merge: 6d3036e3 64a421f6
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Oct 4 15:58:59 2021 -0500

    Merge pull request #553 from flame/rpath-fix
    
    Add an option to use an @rpath-dependent install_name on macOS

commit 6d3036e31d8a2c1acbc1260489eeb8f535a8f97a
Merge: 53377fcc eaa554aa
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Oct 4 15:58:43 2021 -0500

    Merge pull request #545 from hominhquan/clean_error
    
    bli_error: more cleanup on the error strings array

commit 53377fcca91e595787b38e2a47780ac0c35a7e7c
Merge: d0a0b4b8 80c5366e
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Oct 4 15:45:53 2021 -0500

    Merge pull request #554 from flame/armsve-cleanup
    
    Move unused ARM SVE kernels to "old" directory.

commit 80c5366e4a9b8b72d97fba1eab89bab8989c44f4
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Oct 4 15:40:28 2021 -0500

    Move unused ARM SVE kernels to "old" directory.

commit 64a421f6983ab5bc0b55df30a2ddcfff5bfd73be
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Oct 4 13:40:43 2021 -0500

    Add an option to control whether or not to use @rpath.
    
    Adds `--enable-rpath/--disable-rpath` (default disabled) to use an install_name starting with @rpath/. Otherwise, set the install_name to the absolute path of the install library, which was the previous behavior.

commit c4a31683dd6f4da3065d86c11dd998da5192740a
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Oct 4 13:27:10 2021 -0500

    Fix $ORIGIN usage on linux.

commit d0a0b4b841fce56b7b2d3c03c5d93ad173ce2b97
Author: Dave Love <dave.love@manchester.ac.uk>
Date:   Mon Oct 4 18:03:04 2021 +0000

    Arm micro-architecture dispatch (#344)
    
    Details:
    - Reworked support for ARM hardware detection in bli_cpuid.c to parse
      the result of a CPUID-like instruction.
    - Added a64fx support to bli_gks.c.
    - #include arm64 and arm32 family headers from bli_arch_config.h.
    - Fix the ordering of the "armsve" and "a64fx" strings in the
      config_name string array in bli_arch.c. The ordering did not match
      the ordering of the corresponding arch_t values in bli_type_defs.h,
      as it should have all along.
    - Added clang support to make_defs.mk in arm64, cortexa53, cortexa57
      subconfigs.
    - Updated arm64 and arm32 families in config_registry.
    - Updated docs/HardwareSupport.md to reflect added ARM support.
    - Thanks to Dave Love, RuQing Xu, and Devin Matthews for their
      contributions in this PR (#344).

commit 91408d161a2b80871463ffb6f34c455bdfb72492
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Oct 4 11:37:48 2021 -0500

    Use @rpath-based install name on macOS and use relocatable RPATH entries for testsuite binaries.
    
    - RPATH entries (and DYLD_LIBRARY_PATH) do nothing on macOS unless the install_name of the library starts with @rpath/. While the install_name can be set to the absolute install path, this makes the installation non-relocatable. When using @rpath in the install_name, install paths within the normal DYLD_LIBRARY_PATH work with no changes on the user side, but for install paths off the beaten track, users must specify an RPATH entry when linking (or modify DYLD_LIBRARY_PATH at runtime). Perhaps this could be made into a configure-time option.
    - Having relocatable testsuite binaries is not necessarily a priority but it is easy to do with @executable_path (macOS) or $ORIGIN (linux/BSD).

commit f5c03e9fe808f9bd8a3e0c62786334e13c46b0fc
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sun Oct 3 16:51:51 2021 +0900

    Armv8 Handle *beta == 0 for GEMMSUP ?rc Case.

commit abc648352c591e26ceee436bd3a45400115b70c5
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sun Oct 3 13:14:19 2021 +0900

    Armv8 Fix 6x8 Row-Maj Ukr
    
    - Fixed for 6x8 only, 4x4 & 4x8 pending;
    - Installed to config firestorm as benchmark seems to show better perf:
       Old:
    blis_dgemm_ukr_c                     6     8   320    36.87   2.43e-17   PASS
    blis_dgemm_ukr_c                     6     8   352    40.55   1.04e-17   PASS
    blis_dgemm_ukr_c                     6     8   384    44.24   5.68e-17   PASS
    blis_dgemm_ukr_c                     6     8   416    41.67   3.51e-17   PASS
    blis_dgemm_ukr_c                     6     8   448    34.41   2.94e-17   PASS
    blis_dgemm_ukr_c                     6     8   480    42.53   2.35e-17   PASS
    
       New:
    blis_dgemm_ukr_r                     6     8   352    50.69   1.59e-17   PASS
    blis_dgemm_ukr_r                     6     8   384    49.15   5.55e-17   PASS
    blis_dgemm_ukr_r                     6     8   416    50.44   2.86e-17   PASS
    blis_dgemm_ukr_r                     6     8   448    46.92   3.12e-17   PASS
    blis_dgemm_ukr_r                     6     8   480    48.08   4.08e-17   PASS

commit 0a45bc0fbc7aee3876c315ed567fc37f19cdc57f
Merge: 5013a6cb 13dbd5b5
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Oct 2 18:59:43 2021 -0500

    Merge pull request #552 from flame/armsve_beta_0
    
    Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs.

commit 13dbd5b5d3dbf27e33ecf0e98d43c97019a6339d
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Oct 2 20:40:25 2021 +0000

    Apply patch from @xrq-phys.

commit ae0eeeaf77c77892db17027cef10b95ec97c904f
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Sep 29 16:42:33 2021 -0500

    Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs.
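    
    The commit does not state its motivation. By BLAS convention, when
    beta is zero the old contents of C must not be read, because
    0 * NaN and 0 * Inf would otherwise leak NaNs into the result. The
    following sketch shows the branch in scalar C, whereas the actual
    microkernels are written in assembly or intrinsics. The function
    update_c() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar version of the microkernel's final update:
   c := beta * c + alpha * ab, with beta == 0 handled as an overwrite. */
static void update_c(size_t n, double alpha, const double *ab,
                     double beta, double *c)
{
    assert(n == 0 || (ab != NULL && c != NULL));

    if (beta == 0.0) {
        /* Overwrite: the old values of c are never loaded. */
        for (size_t i = 0; i < n; ++i)
            c[i] = alpha * ab[i];
    } else {
        for (size_t i = 0; i < n; ++i)
            c[i] = beta * c[i] + alpha * ab[i];
    }
}
```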

commit 5013a6cb7110746c417da96e4a1308ef681b0b88
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Sep 29 10:38:50 2021 -0500

    More edits and fixes to docs/FAQ.md.

commit b36fb0fbc5fda13d9a52cc64953341d3d53067ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 28 18:47:45 2021 -0500

    Fixed newly broken link to CREDITS in FAQ.md.

commit 3442d4002b3bfffd8848f72103b30691df2b19b1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 28 18:43:23 2021 -0500

    More minor fixes to FAQ.md and Sandboxes.md.

commit 89aaf00650d6cc19b83af2aea6c8d04ddd3769cb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 28 18:34:33 2021 -0500

    Updates to FAQ.md, Sandboxes.md, and README.md.
    
    Details:
    - Updated FAQ.md to include two new questions, reordered an existing
      question, and also removed an outdated and redundant question about
      BLIS vs. AMD BLIS.
    - Updated Sandboxes.md to use 'gemmlike' as its main example, along with
      other smaller details.
    - Added ARM as a funder to README.md.

commit c52c43115ec2264fda9380c48d9e6bb1e1ea2ead
Merge: 1fc23d21 1f527a93
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Sep 26 15:56:54 2021 -0500

    Merge branch 'dev'

commit 1fc23d2141189c7b583a5bff2cffd87fd5261444
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 21 14:54:20 2021 -0500

    Safelist 'master', 'dev', 'amd' branches.
    
    Details:
    - Modified .travis.yml so that only commits to 'master', 'dev', and
      'amd' branches get built by Travis CI. Thanks to Devin Matthews for
      helping to track down the syntax for this change.

commit 1f527a93b996093e06ef7a8e94fb47ee7e690ce0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 20 17:56:36 2021 -0500

    Re-enable and fix fb93d24.
    
    Details:
    - Re-enabled the changes made in fb93d24.
    - Defined BLIS_ENABLE_SYSTEM in bli_arch.c, bli_cpuid.c, and bli_env.c,
      all of which needed the definition (in addition to config_detect.c) in
      order for the configure-time hardware detection binary to be compiled
      properly. Thanks to Minh Quan Ho for helping identify these additional
      files as needing to be updated.
    - Added additional comments to all four source files, most notably to
      prompt the reader to remember to update all of the files when updating
      any of the files. Also made the cpp code in each of the files as
      consistent/similar as possible.
    - Refer to issue #532 and PR #546 for more history.

commit 7b39c1492067de941f81b49a3b6c1583290336fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 20 16:13:50 2021 -0500

    Reverted fb93d24.
    
    Details:
    - The latest changes in fb93d24 are still causing problems. Reverting
      and preparing to move them to a branch.

commit fb93d242a4fef4694ce2680436da23087bbdd5fe
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 20 15:42:08 2021 -0500

    Re-enable and fix 8e0c425 (BLIS_ENABLE_SYSTEM).
    
    Details:
    - Re-enable the changes originally made in 8e0c425 but quickly reverted
      in 2be78fc.
    - Moved the #include of bli_config.h so that it occurs before the
      #include of bli_system.h. This allows the #define BLIS_ENABLE_SYSTEM
      or #define BLIS_DISABLE_SYSTEM in bli_config.h to be processed by the
      time it is needed in bli_system.h. This change should have been
      in the original 8e0c425, but was accidentally omitted. Thanks to Minh
      Quan Ho for catching this.
    - Add #define BLIS_ENABLE_SYSTEM to config_detect.c so that the proper
      cpp conditional branch executes in bli_system.h when compiling the
      hardware detection binary. The changes made in 8e0c425 were an attempt
      to support the definition of BLIS_OS_NONE when configuring with
      --disable-system (in issue #532).  That commit failed because, aside
      from the required but omitted header reordering (second bullet above),
      AppVeyor was unable to compile the hardware detection binary as a
      result of missing Windows headers. This commit, which builds on PR
      #546, should help fix that issue. Thanks to Minh Quan Ho for his
      assistance and patience on this matter.
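
The include-order requirement described above can be sketched in a single translation unit. The two commented regions stand in for bli_config.h and bli_system.h; the three-way conditional, the `TOY_OS_NAME` macro, and `toy_os_name()` are assumptions made for illustration and are not copied from the BLIS headers.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for bli_config.h: configure would emit one of the two
   macros. Here we assume --disable-system was requested. */
#define BLIS_DISABLE_SYSTEM

/* Stand-in for bli_system.h: it must be processed after the
   definition above, otherwise neither macro is visible yet and the
   wrong branch (here, the #error) is taken. */
#if defined(BLIS_ENABLE_SYSTEM)
  #define TOY_OS_NAME "detected"
#elif defined(BLIS_DISABLE_SYSTEM)
  #define BLIS_OS_NONE
  #define TOY_OS_NAME "none"
#else
  #error "Neither BLIS_ENABLE_SYSTEM nor BLIS_DISABLE_SYSTEM is defined."
#endif

/* Report which branch the preprocessor selected. */
const char *toy_os_name(void) { return TOY_OS_NAME; }

void toy_os_name_selfcheck(void)
{
    assert(strcmp(toy_os_name(), "none") == 0);
}
```

A standalone tool such as the hardware detection binary never sees the generated configuration header, so it would need to define one of the two macros itself before reaching this point.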

commit eaa554aa52b879d181fdc87ba0bfad3ab6131517
Author: Minh Quan HO <minh-quan.ho@kalray.eu>
Date:   Wed Sep 15 15:39:36 2021 +0200

    bli_error: more cleanup on the error strings array
    
    - There was redundancy between the macro BLIS_MAX_NUM_ERR_MSGS (=200) and
      the enum BLIS_ERROR_CODE_MAX (-170), while they both mean the same thing:
      the maximal number of error codes/messages.
    - The previous initialization of error messages at compile time ignored that
      the 'bli_error_string' array still occupies useless memory due to 2D char[][]
      declaration. Instead, it should be just an array of pointers, pointing at
      strings in .rodata section.
    - This commit does the two modifications:
       * retired macros BLIS_MAX_NUM_ERR_MSGS and BLIS_MAX_ERR_MSG_LENGTH everywhere
       * switch bli_error_string from char[][] to char *[] to reduce its footprint
         from 40KB (200*200) to 1.3KB (170*sizeof(char*)).
         (No problem to use the enum BLIS_ERROR_CODE_MAX at compile-time,
         since compiler is smart enough to determine its value is 170.)
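
The footprint difference described in this entry can be checked with `sizeof`. This is a minimal sketch: the `TOY_` names and the two placeholder messages are hypothetical, and only the sizes (200 x 200 bytes versus 170 pointers) are taken from the entry.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUM_MSGS  200   /* role of the retired BLIS_MAX_NUM_ERR_MSGS   */
#define TOY_MSG_LEN   200   /* role of the retired BLIS_MAX_ERR_MSG_LENGTH */
#define TOY_NUM_CODES 170   /* number of error codes/messages              */

/* Old layout: every slot reserves TOY_MSG_LEN bytes, used or not. */
char toy_strings_2d[TOY_NUM_MSGS][TOY_MSG_LEN];

/* New layout: one pointer per code; the text itself is stored as
   string literals, which typically live in read-only data. */
const char *toy_strings_ptr[TOY_NUM_CODES] = {
    "Undefined error code",   /* placeholder text, not a real BLIS message */
    "Generic failure",        /* placeholder text, not a real BLIS message */
};

size_t toy_footprint_2d(void)  { return sizeof(toy_strings_2d); }
size_t toy_footprint_ptr(void) { return sizeof(toy_strings_ptr); }

void toy_footprint_selfcheck(void)
{
    assert(toy_footprint_2d() == 40000);
    assert(toy_footprint_ptr() == TOY_NUM_CODES * sizeof(const char *));
}
```

With 8-byte pointers the second table occupies 1360 bytes, which matches the roughly 1.3KB quoted above.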

commit 52f29f739dbbb878c4cde36dbe26b82847acd4e9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Sep 17 08:38:29 2021 -0500

    Removed last vestige of #define BLIS_NUM_ARCHS.
    
    Details:
    - Removed the commented-out #define BLIS_NUM_ARCHS in bli_type_defs.h
      and its associated (now outdated) comments. BLIS_NUM_ARCHS has been
      part of the arch_t enum for some time now, and so this change is
      mostly about removing any opportunity for confusion for people who
      may be reading the code. Thanks to Minh Quan Ho for leading me to
      this cleanup.

commit 849aae09f4fbf8d7abf11f4df1471f1d057e874b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 16 14:47:45 2021 -0500

    Added new packm var3 to 'gemmlike'.
    
    Details:
    - Defined a new packm variant for the 'gemmlike' sandbox. This new
      variant (bls_l3_packm_var3.c) parallelizes the packing operation over
      the k dimension rather than the m or n dimensions. Note that the
      gemmlike implementation still uses var1 by default, and use of the new
      code would require changing bls_l3_packm_a.c and/or bls_l3_packm_b.c
      so that var3 is called instead. Thanks to Jeff Diamond for proposing
      this (perhaps NUMA-friendly) solution.
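
Parallelizing over the k dimension amounts to giving each thread a contiguous slice of [0, k). A minimal sketch of such a partition follows; `toy_k_range` is a hypothetical helper and is not the code in bls_l3_packm_var3.c.

```c
#include <assert.h>

/* Hypothetical helper: split the half-open range [0, k) into nt nearly
   equal contiguous chunks and return the chunk owned by thread tid. */
void toy_k_range(int k, int nt, int tid, int *start, int *end)
{
    int base, rem;

    assert(nt > 0 && tid >= 0 && tid < nt && k >= 0);

    base = k / nt;   /* minimum chunk size                      */
    rem  = k % nt;   /* the first rem threads get one extra item */

    *start = tid * base + (tid < rem ? tid : rem);
    *end   = *start + base + (tid < rem ? 1 : 0);
}
```

For k = 10 and four threads this yields [0,3), [3,6), [6,8), and [8,10), so each thread would pack only its own columns of the micropanel.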

commit b6f71fd378b7cd0cdc5c780e0b8c975a7abde998
Merge: 9293a68e e3dc1954
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Sep 16 12:24:33 2021 -0500

    Merge pull request #544 from flame/haswell-gemmsup-fpe
    
    Fix more copy-paste errors in the haswell gemmsup code.

commit e3dc1954ffb5eee2a8b41fce85ba589f75770eea
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Sep 16 10:59:37 2021 -0500

    Fix problem where uninitialized registers are included in vhaddpd in the Mx1 gemmsup kernels for haswell.
    
    The fix is to use the same (valid) source register twice in the horizontal addition.
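
A scalar model of the 256-bit vhaddpd shows why passing the same valid register twice is safe: every output element is then a sum of initialized inputs. The type `toy_ymm_t` and the function `toy_vhaddpd` are hypothetical stand-ins for a ymm register and the instruction; this is not the kernel code.

```c
#include <assert.h>

typedef struct { double v[4]; } toy_ymm_t;

/* Scalar model of 256-bit vhaddpd: pairwise sums within each 128-bit
   lane, with results from a and b interleaved. */
toy_ymm_t toy_vhaddpd(toy_ymm_t a, toy_ymm_t b)
{
    toy_ymm_t r;
    r.v[0] = a.v[0] + a.v[1];
    r.v[1] = b.v[0] + b.v[1];
    r.v[2] = a.v[2] + a.v[3];
    r.v[3] = b.v[2] + b.v[3];
    return r;
}

void toy_vhaddpd_selfcheck(void)
{
    toy_ymm_t x = { { 1.0, 2.0, 3.0, 4.0 } };
    toy_ymm_t r = toy_vhaddpd(x, x);   /* same valid source twice */

    assert(r.v[0] == 3.0 && r.v[1] == 3.0);
    assert(r.v[2] == 7.0 && r.v[3] == 7.0);
}
```

Elements 1 and 3 depend only on the second operand, so an uninitialized second register would put arbitrary values through the adder even when the kernel only consumes elements 0 and 2.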

commit 5191c43faccf45975f577c60b9089abee25722c9
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Sep 16 10:16:17 2021 -0500

    Fix more copy-paste errors in the haswell gemmsup code.
    
    Fixes #486.

commit 30c29b256ef13f0141ca9e9169cbdc7a45ce3a61
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Sep 16 05:01:03 2021 +0900

    Arm SVE Exclude SVE-Intrinsic Kernels for GCC 8-9
    
    Affected configs: a64fx.

commit bffa85be59dece8e756b9444e762f18892c06ee1
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Sep 16 04:31:45 2021 +0900

    Arm SVE: Correct PACKM Ker Name: Intrinsic Kers
    
    SVE-Intrinsic-based kernels ought not to use asm in their names.

commit 9293a68eb6557a9ea43a846435908c3d52d4218b
Merge: ade10f42 98ce6e8b
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Sep 10 14:13:29 2021 -0500

    Merge pull request #534 from flame/cxx_test
    
    Add test to Travis using C++ compiler to make sure blis.h is C++-compatible

commit 98ce6e8bc916e952510872caa60d818d62a31e69
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Sep 10 14:12:13 2021 -0500

    Do a fast test on OSX. [ci skip]

commit c76fcad0c2836e7140b6bef3942e0a632a5f2cda
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Sep 10 13:57:02 2021 -0500

    Fix AArch64 tests and consolidate some other tests.

commit e486d666ffefee790d5e39895222b575886ac1ea
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Sep 10 13:50:16 2021 -0500

    Use C++ cross-compiler for ARM tests.

commit fbb3560cb8e2aeab205c47c2b096d4fa306d93db
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Sep 10 13:38:27 2021 -0500

    Attempt to fix cxx-test for OOT builds.

commit 9c0064f3f67d59263c62d57ae19605562bb87cc2
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Sep 10 10:39:04 2021 -0500

    Fix config_name in bli_arch.c

commit ade10f427835d5274411cafc9618ac12966eb1e7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 27 12:47:12 2021 -0500

    Updated travis-ci.org link in README.md to .com.

commit 2be78fc97777148c83d20b8509e38aa1fc1b4540
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 27 12:17:26 2021 -0500

    Disabled (at least temporarily) commit 8e0c425.
    
    Details:
    - Reverted changes in 8e0c425 due to AppVeyor build failures that we do
      not yet understand.

commit 820f11a4694aee5f234e24277aecca40885ae9d4
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Fri Aug 27 13:40:26 2021 +0900

    Arm Whole GEMMSUP Call Route is Asm/Int Optimized
    
    - `ref2` call in `bli_gemmsup_rv_armv8a_asm_d6x8m.c` is commented out.
    - `bli_gemmsup_rv_armv8a_asm_d4x8m.c` contains a tail `ref2` call but
      it's not called by any upper routine.

commit 8e0c4255de52a0a5cffecbebf6314aa52120ebe4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 26 15:29:18 2021 -0500

    Define BLIS_OS_NONE when using --disable-system.
    
    Details:
    - Modified bli_system.h so that the cpp macro BLIS_OS_NONE is defined
      when BLIS_DISABLE_SYSTEM is defined. Otherwise, the previous OS-
      detecting macro conditionals are considered. This change is to
      accommodate a solution to a cross-compilation issue described in
      #532.

commit d6eb70fbc382ad7732dedb4afa01cf9f53e3e027
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 26 13:12:39 2021 -0500

    Updated stale calls to malloc_intl() in gemmlike.
    
    Details:
    - Updated two out-of-date calls to bli_malloc_intl() within the gemmlike
      sandbox. These calls to malloc_intl(), which resided in
      bls_l3_decor_pthreads.c, were missing the err_t argument that the
      function uses to report errors. Thanks to Jeff Diamond for helping
      isolate this issue.

commit 2f7325b2b770a15ff8aaaecc087b22238f0c67b7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Aug 23 15:04:05 2021 -0500

    Blacklist clang10/gcc9 and older for 'armsve'.
    
    Details:
    - Prohibit use of clang 10.x and older or gcc 9.x and older for the
      'armsve' subconfiguration. Addresses issue #535.

commit 7e2951e61fda1c325d6a76ca9956253482d84924
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Mon Aug 23 17:06:44 2021 +0900

    Arm: DGEMMSUP `Macro' Edge Cases Stop Calling Ref
    
    Ref cannot handle panel strides (packed cases) thus cannot be called
    from the beginning of `gemmsup` (i.e. cannot be dispatch target of
    gemmsup to other sizes.)

commit 4fd82b0e9348553d83e258bd4969e49a81f8fcf0
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Mon Aug 23 05:18:32 2021 +0900

    Header Typo

commit 35409ebe67557c0e7cf5ced138c8166c9c1c909f
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Mon Aug 23 04:51:47 2021 +0900

    Arm: DGEMMSUP ??r(rv) Invoke Edge Size
    
    Plus some fix at edges.
    
    TODO: Should ensure that no ref kernel appears at the beginning of gemmsup
    kernels, as ref does not recognise panel stride.

commit a361492c24fdd919ee037763fc6523e8d7d2967a
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Mon Aug 23 01:13:39 2021 +0900

    Arm: DGEMMSUP ?rc(rd) Invoke Edge Size

commit eaea67401c2ab31f2e51eede59725f64c1a21785
Merge: 5fc65cdd e320ec6d
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Aug 21 16:09:31 2021 -0500

    Merge branch 'master' into cxx_test

commit 5fc65cdd9e4134c5dcb16d21cd4a79ff426ca9f3
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Aug 21 15:59:27 2021 -0500

    Add test to Travis using C++ compiler to make sure blis.h is C++-compatible.

commit e320ec6d5cd44e03cb2e2faa1d7625e84f76d668
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 20 17:15:20 2021 -0500

    Moved lang defs from _macro_def.h to _lang_defs.h.
    
    Details:
    - Moved miscellaneous language-related definitions, including defs
      related to the handling of the 'restrict' keyword, from the top half
      of bli_macro_defs.h into a new file, bli_lang_defs.h, which is now
      #included immediately after "bli_system.h" in blis.h. This change is
      an attempt to fix a report of recent breakage of C++ compilers due
      to the recent introduction of 'restrict' in bli_type_defs.h (which
      previously was being included *before* bli_macro_defs.h and its
      restrict handling therein). Thanks to Ivan Korostelev for reporting
      this issue in #527.
    - CREDITS file update.

commit e6799b26a6ecf1e80661a77d857d1c9e9adf50dc
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sat Aug 21 02:39:38 2021 +0900

    Arm: Implement GEMMSUP Fallback Method
    
    bli_dgemmsup_rv_armv8a_int_6x4mn

commit 7d5903d8d7570090eb37c592094424d1c64805d1
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sat Aug 21 01:55:50 2021 +0900

    Arm64 Fix: Support Alpha/Beta in GEMMSUP Intrin
    
    Forgot to support `alpha`/`beta` in gemmsup_armv8a_int.

commit 3b275f810b2479eb5d6cf2296e97a658cf1bb769
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 19 16:06:46 2021 -0500

    Minor tweaks to gemmlike sandbox.
    
    Details:
    - In the gemmlike sandbox, changed the loop index variable of inner
      loop of packm_cxk() from 'd' to 'i' (and likewise for the
      corresponding inlined code within packm_var2()).
    - Pack matrices A and B using packm_var1() instead of packm_var2().

commit 3eccfd456e7e84052c9a429dcde1183a7ecfaa48
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 19 13:22:10 2021 -0500

    Added local _check() code to gemmlike sandbox.
    
    Details:
    - Added code to the gemmlike sandbox that handles parameter checking.
      Previously, the gemmlike implementation called bli_gemm_check(), which
      resides within the BLIS framework proper. Certain modifications that a
      user may wish to perform on the sandbox, such as adding a new matrix
      or vector operand, would have required additional checks, and so these
      changes make it easier for such a person to implement those checks for
      their custom gemm-like operation.

commit 7144230cdb0653b70035ddd91f7f41e06ad8d011
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 18 13:25:39 2021 -0500

    README.md citation updates (e.g. BLIS7 bibtex).

commit 4a955e939044cfd2048cf9f3e33024e3ad1fbe00
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Aug 16 13:49:27 2021 -0500

    Tweaks to gemmlike to facilitate 3rd party mods.
    
    Details:
    - Changed the implementation in the 'gemmlike' sandbox to more easily
      allow others to provide custom implementations of packm. These changes
      include:
      - Calling a local version of packm_cxk() that can be modified. This
        version of packm_cxk() uses inlined loops in packm_cxk() rather
        than querying the context for packm kernels (or even using scal2m).
      - Providing two variants of packm, one of which calls the
        aforementioned packm_cxk(), the other of which inlines the contents
        of packm_cxk() into the variant itself, making it self-contained.
        To switch from one to the other, simply change which function gets
        called within bls_packm_a() and bls_packm_b().
      - Simplified and cleaned up some variant names in both variants of
        packm, relative to their parent code.
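
A packing kernel built from plain inlined loops, as described above, might look roughly like the following. This is a simplified sketch under stated assumptions: `toy_packm_cxk` and its parameter list are hypothetical, it handles only real double precision, and it omits scaling, conjugation, and edge-case zero padding.

```c
#include <assert.h>

/* Hypothetical, simplified packing kernel: copy a (c x k) slice of a
   matrix with row stride inca and column stride lda into a contiguous
   micropanel p whose columns are ldp elements apart. */
void toy_packm_cxk(int c, int k,
                   const double *a, int inca, int lda,
                   double *p, int ldp)
{
    int l, i;

    assert(ldp >= c);

    for (l = 0; l < k; ++l)        /* walk along the k dimension     */
        for (i = 0; i < c; ++i)    /* copy one column of the slice   */
            p[l * ldp + i] = a[i * inca + l * lda];
}
```

Because the loops are local rather than fetched from the context, a third party can change the copy (for example, to apply a transformation while packing) without touching the framework.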

commit 2c0b4150e40c83ea814f69ca766da74c19ed0a58
Merge: c99fae50 4b8ed99d
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Aug 14 18:41:35 2021 -0500

    Merge pull request #527 from flame/obj_t_makeover
    
    Implement proposed new function pointer fields for obj_t.

commit 4b8ed99d926876fbf54c15468feae4637268eb6b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 13 15:31:10 2021 -0500

    Whitespace tweaks.

commit c99fae50ac3de0b5380a085aeebebfe67a645407
Merge: e6d68bc4 4f70eb79
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Aug 13 14:48:00 2021 -0500

    Merge pull request #530 from flame/fix_clang_warnings
    
    Clean up some warnings that show up on clang/OSX.

commit e6d68bc4fd0981bea90d7f045779cacfe53f6ae8
Merge: 20a1c401 ec06b6a5
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Aug 13 14:47:46 2021 -0500

    Merge pull request #529 from flame/fix_make_check_dependencies
    
    Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects.

commit 1772db029e10e0075b5a59d3fb098487b1ad542a
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Aug 13 14:46:35 2021 -0500

    Add row- and column-strides for A/B in obj_ukr_fn_t.

commit 4f70eb7913ad3ded193870361b6da62b20ec3823
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Aug 13 11:12:43 2021 -0500

    Clean up some warnings that show up on clang/OSX.

commit 3cddce1e2a021be6064b90af30022b99cbfea986
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Aug 12 22:32:34 2021 -0500

    Remove schema field on obj_t (redundant) and add new API functions.

commit ec06b6a503a203fa0cdb23273af3c0e3afeae7fa
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Aug 12 19:27:31 2021 -0500

    Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects.
    
    This fixes a bug where "make -j<N> check" may fail after a change to one or more header files, or where testsuite code doesn't get properly recompiled after internal changes.

commit 20a1c4014c999063e6bc1cfa605b152454c5cbf4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 12 14:44:04 2021 -0500

    Disabled sanity check in bli_pool_finalize().
    
    Details:
    - Disabled a sanity check in bli_pool_finalize() that was meant to alert
      the user if a pool_t was being finalized while some blocks were still
      checked out. However, this is exactly the situation that might happen
      when a pool_t is re-initialized for a larger blocksize, and currently
      bli_pool_reinit() is implemented as _finalize() followed by _init().
      So, this sanity check is not universally appropriate. Thanks to
      AMD-India for reporting this issue.

commit e366665cd2b5ae8d7683f5ba2de345df0a41096f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 12 14:06:53 2021 -0500

    Fixed stale API calls to membrk API in gemmlike.
    
    Details:
    - Updated stale calls to the bli_membrk API within the 'gemmlike'
      sandbox. This API is now called bli_pba (packed block allocator).
      Ideally, this forgotten update would have been included as part of
      21911d6, which is when the branch where the membrk->pba changes were
      introduced was merged into 'master'.
    - Comment updates.

commit e38ca28689f31c5e5bd2347704dc33042e5ea176
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Fri Aug 13 03:21:19 2021 +0900

    Added Apple Firestorm (A14/M1) Subconfig
    
    - Use the same bulk kernel as Cortex-A53 / ThunderX2;
    - Larger block size;
    - Use gemmsup kernels for double precision.

commit 3df0e9b653fbb1293cad93010273eea579e753d9
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sat Jul 17 04:21:53 2021 +0900

    Arm64 8x4 Kernel Use Less Regs

commit 4e7e225057a05b9722ce65ddf75a9c31af9fbf36
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Wed Jun 9 15:46:36 2021 +0900

    Armv8-A Supplementary GEMMSUP Sizes for RD

commit c792d506ba09530395c439051727631fd164f59a
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sat Jun 5 04:20:24 2021 +0900

    Armv8-A Fix GEMMSUP-RD Kernels on GNU Asm
    
    Suffixed NEON opcode is not supported by GNU assembler

commit ce4473520975c2c8790c82c65a69d75f8ad758ea
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sat Jun 5 04:08:14 2021 +0900

    Armv8-A Adjust Types for PACKM Kernels
    
    GCC does not have full NEON intrinsics support.

commit 8a32d19af85b61af92fcab1c316fb3be1a8d42ce
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sat Jun 5 03:31:30 2021 +0900

    Armv8-A GEMMSUP-RD 6x8m
    
    Armv8-A now has a complete set of GEMMSUP kernels.

commit afd0fa6ad1889ed073f781c8aa8635f99e76b601
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sat Jun 5 01:19:01 2021 +0900

    Armv8-A GEMMSUP-RD 6x8n

commit 3c5f7405148ab142dee565d00da331d95a7a07b9
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Fri Jun 4 21:50:51 2021 +0900

    Armv8-A s/d Packing Kernels Fix Typo
    
    For GCC.

commit 49b05df7929ec3abc0d27b475d2d406116fe2682
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Fri Jun 4 18:04:59 2021 +0900

    Armv8-A Introduced s/d Packing Kernels
    
    Sizes according to the 2014 kernels.

commit c3faf93168c3371ff48a2d40d597bdb27021cad4
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Jun 3 23:09:05 2021 +0900

    Armv8-A DGEMMSUP 6x8m Kernel
    
    Recommended kernels set:
      ...
      BLIS_RRR, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8m, TRUE,
      BLIS_RCR, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8m, TRUE,
      BLIS_RCC, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8n, TRUE,
      BLIS_CRR, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8m, TRUE,
      BLIS_CCR, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8n, TRUE,
      BLIS_CCC, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8n, TRUE,
      ...
      bli_blksz_init     ( &blkszs[ BLIS_MR ],    -1,     6,    -1,    -1,
                                                  -1,     8,    -1,    -1 );
      bli_blksz_init_easy( &blkszs[ BLIS_NR ],    -1,     8,    -1,    -1 );
      ...

commit 3efe707b5500954941061d4c2363d6ed41d17233
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Jun 3 17:20:57 2021 +0900

    Armv8-A DGEMMSUP Adjustments

commit 8ed8f5e625de9b77a0f14883283effe79af01771
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Jun 3 16:37:37 2021 +0900

    Armv8-A Add More DGEMMSUP
    
    - Add 6x8 GEMMSUP.
    - Adjust prefetching.
    - Workaround for Clang's inability to handle reg clobbering.
    - Subproduct 6x8 row-major GEMM <- incomplete.

commit a9ba79ea14de3b5a271e5970cb473d3c52e2fa5f
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Wed Jun 2 15:04:29 2021 +0900

    Armv8-A Add GEMMSUP 4x8n Kernel
    
    - Compile w/ both GCC & Clang.
    - Edge cases use ref-kernels.
    - Can give performance boost in some contexts.

commit df40efe8fbfd399d76c6000ec03791a9b76ffbdf
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Wed Jun 2 00:04:20 2021 +0900

    Armv8-A Add Part of GEMMSUP 8x4m Kernel
    
    - Compile w/ both GCC & Clang
    - Only the block part is implemented. Edge cases WIP
    - Not Optimal kernel scheme. Should do 4x8 instead

commit 66399992881316514f64d68ec9eb60a87d53f674
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sat May 29 05:52:05 2021 +0900

    Armv8A DGEMM 4x4 Kernel WIP. Slow
    
    Quite slow.

commit a29c16394ccef02d29141c79b71fb408e20073e6
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sat May 29 04:58:45 2021 +0900

    Armv8-A Add 8x4 Kernel WIP
    
    Test result: a bit lower GFLOPS than 6x8.

commit 64a1f786d58001284aa4f7faf9fae17f0be7a018
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Aug 11 17:53:12 2021 -0500

    Implement proposed new function pointer fields for obj_t.
    
    The added fields:
    1. `pack_t schema`: storing the pack schema on the object allows the macrokernel to act accordingly without side-channel information from the rntm_t and cntx_t. The pack schema and "pack_[ab]" fields could be removed from those structs.
    2. `void* user_data`: this field can be used to store any sort of additional information provided by the user. The pointer is propagated to submatrix objects and copies, but is otherwise ignored by the framework and the default implementations of the following three fields. User-specified pack, kernel, or ukr functions can do whatever they want with the data, and the user is 100% responsible for allocating, assigning, and freeing this buffer.
    3. `obj_pack_fn_t pack`: the function called when a matrix is packed. This function receives the expected arguments, as well as a mdim_t and mem_t* as memory must be allocated inside this function, and behavior may differ based on which matrix is being packed (i.e. transposition for B). This could also be achieved by passing a desired pack schema, but this would require additional information to travel down the control tree.
    4. `obj_ker_fn_t ker`: the function called when we get to the "second loop", or the macro-kernel. Behavior may depend on the pack schemas of the input matrices. The default implementation would perform the inner two loops around the ukr, and then call either the default ukr or a user-supplied one (next field).
    5. `obj_ukr_fn_t ukr`: the function called by the default macrokernel. This would replace the various current "virtual" microkernels, and could also be used to supply user-defined behavior. Users could supply both a custom kernel (above) and microkernel, although the user-specified kernel does **not** necessarily have to call the ukr function specified on the obj_t.
    
    Note that no macros or functions for accessing these new fields have been defined yet. That is next once these are finalized. Addresses https://github.com/flame/blis/projects/1#card-62357687.
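
The interplay between `user_data` and a per-object microkernel pointer can be sketched with a toy object. Everything here is hypothetical: `toy_obj_t`, the simplified `toy_ukr_fn_t` signature, and the helper functions are invented for illustration and do not reflect the real obj_t layout or the real obj_ukr_fn_t arguments.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_obj_s toy_obj_t;

/* Hypothetical microkernel signature; the real obj_ukr_fn_t differs. */
typedef void (*toy_ukr_fn_t)(const toy_obj_t *c, double *tile);

struct toy_obj_s {
    void        *user_data;  /* allocated, set, and freed by the user */
    toy_ukr_fn_t ukr;        /* called by the default macrokernel     */
};

/* Default behaviour: ignores user_data entirely. */
void toy_default_ukr(const toy_obj_t *c, double *tile)
{
    (void)c;
    *tile += 1.0;
}

/* User-supplied behaviour: scales by a factor stored in user_data. */
void toy_scaling_ukr(const toy_obj_t *c, double *tile)
{
    const double *factor = (const double *)c->user_data;
    *tile *= *factor;
}

/* Taking a submatrix copies the pointer, not the buffer it points to. */
toy_obj_t toy_submatrix(const toy_obj_t *parent)
{
    return *parent;
}

/* Stand-in for the macrokernel: dispatch through the object. */
void toy_macrokernel(const toy_obj_t *c, double *tile)
{
    assert(c->ukr != NULL);
    c->ukr(c, tile);
}
```

A caller would set `user_data` and `ukr` on the parent object once; any submatrix view then carries both along, which is the propagation behaviour described in item 2.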

commit a32257eeab2e9946e71546a05a1847a39341ec6b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 5 16:23:02 2021 -0500

    Fixed bli_init.c compile-time error on OSX clang.
    
    Details:
    - Fixed a compile-time error in bli_init.c when compiling with OSX's
      clang. This error was introduced in 868b901, which introduced a
      post-declaration struct assignment where the RHS was a struct
      initialization expression (i.e. { ... }). This use of struct
      initializer expressions apparently works with gcc despite it not
      being strict C99. The fix included in this commit declares a temporary
      variable for the purposes of being initialized to the desired value,
      via the struct initializer, and then copies the temporary struct (via
      '=' struct assignment) to the persistent struct. Thanks to Devin
      Matthews for his help with this.
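
The fix described above follows from a C rule: a brace-enclosed initializer list is only valid in a declaration, not on the right-hand side of an assignment. A minimal sketch with a hypothetical struct (`toy_ctl_t` and the `toy_ctl_*` functions are not BLIS names):

```c
#include <assert.h>

typedef struct { int state; int count; } toy_ctl_t;

#define TOY_CTL_INIT { 0, 0 }   /* a brace initializer, not an expression */

static toy_ctl_t toy_ctl = TOY_CTL_INIT;   /* fine: this is a declaration */

void toy_ctl_touch(void)
{
    toy_ctl.state = 1;
    toy_ctl.count++;
}

int toy_ctl_count(void) { return toy_ctl.count; }

void toy_ctl_reset(void)
{
    /* toy_ctl = TOY_CTL_INIT;  <- rejected: braces cannot be assigned */
    const toy_ctl_t tmp = TOY_CTL_INIT;   /* initialize a temporary ...    */
    toy_ctl = tmp;                        /* ... then copy it by assignment */
}
```

C99 compound literals, written as `(toy_ctl_t){ 0, 0 }`, are another way to express the same reset, but they require the type name and so do not work with an initializer macro that expands to bare braces.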

commit c8728cfbd19ecde9d43af05829e00bcfe7d86eed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 5 15:17:09 2021 -0500

    Fixed configure breakage on OSX clang.
    
    Details:
    - Accept either 'clang' or 'LLVM' in vendor string when grepping for
      the version number (after determining that we're working with clang).
      Thanks to Devin Matthews for this fix.

commit 868b90138e64c873c780d9df14150d2a370a7a42
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 4 18:31:01 2021 -0500

    Fixed one-time use property of bli_init() (#525).
    
    Details:
    - Fixes a rather obvious bug that resulted in segmentation fault
      whenever the calling application tried to re-initialize BLIS after
      its first init/finalize cycle. The bug resulted from the fact that
      the bli_init.c APIs made no effort to allow bli_init() to be called
      subsequent times at all due to it, and bli_finalize(), being
      implemented in terms of pthread_once(). This has been fixed by
      resetting the pthread_once_t control variable for initialization
      at the end of bli_finalize_apis(), and by resetting the control
      variable for finalization at the end of bli_init_apis(). Thanks to
      @lschork2 for reporting this issue (#525), and to Minh Quan Ho and
      Devin Matthews for suggesting the chosen solution.
    - CREDITS file update.
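
The re-arming scheme described above can be sketched with two `pthread_once_t` controls that reset each other. All `toy_` names are hypothetical. The sketch assumes that assigning a fresh `PTHREAD_ONCE_INIT` value re-arms a control, which works on common implementations but is not something POSIX guarantees, and it makes no attempt to be safe if init and finalize race on different threads.

```c
#include <assert.h>
#include <pthread.h>

static pthread_once_t toy_once_init     = PTHREAD_ONCE_INIT;
static pthread_once_t toy_once_finalize = PTHREAD_ONCE_INIT;

/* PTHREAD_ONCE_INIT may expand to a brace initializer, so keep a
   pristine copy that can be assigned later. */
static const pthread_once_t toy_once_fresh = PTHREAD_ONCE_INIT;

static int toy_init_count = 0;
static int toy_is_live    = 0;

static void toy_init_apis(void)
{
    toy_init_count++;
    toy_is_live = 1;
    /* Re-arm finalization so the next toy_finalize() does real work. */
    toy_once_finalize = toy_once_fresh;
}

static void toy_finalize_apis(void)
{
    toy_is_live = 0;
    /* Re-arm initialization so the library can be brought up again. */
    toy_once_init = toy_once_fresh;
}

void toy_init(void)     { pthread_once(&toy_once_init, toy_init_apis); }
void toy_finalize(void) { pthread_once(&toy_once_finalize, toy_finalize_apis); }

int toy_init_calls(void) { return toy_init_count; }
int toy_live(void)       { return toy_is_live; }
```

Calling `toy_init()` twice in a row still runs the setup once, while an init/finalize/init sequence runs it twice. Without the two resets the second cycle would silently skip setup, which is the failure mode reported in #525.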

commit 8dba1e752c6846a85dea50907135bbc5cbc54ee5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 27 12:38:24 2021 -0500

    CREDITS file update.

commit cc9206df667b7c710b57b190b8ad351176de53b8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 16 15:48:37 2021 -0500

    Added Graviton2 Neoverse N1 performance results.
    
    Details:
    - Added single-threaded and multithreaded performance results to
      docs/Performance.md. These results were gathered on a Graviton2
      Neoverse N1 server. Special thanks to Nicholai Tukanov for
      collecting these results via the Arm-HPC/AWS hackathon.
    - Corrected what was supposed to be a temporary tweak to the legend
      labels in test/3/octave/plot_l3_perf.m.

commit fab5c86d68137b59800715efb69214c0a7e458a7
Merge: 84f9dcd4 d073fc9a
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jul 13 16:46:21 2021 -0500

    Merge pull request #516 from nicholaiTukanov/p10-sandbox-rework
    
    P10 sandbox rework

commit 84f9dcd449fa7a4cf4087fca8ec4ca0d10e9b801
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jul 13 16:45:44 2021 -0500

    Remove unnecessary windows/zen2 directory.

commit 21911d6ed3438ca4ba942d05851ba5d7e9835586
Merge: 17729cf4 689fa0f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 9 18:10:46 2021 -0500

    Merge branch 'dev'

commit 17729cf449919d1db9777cea5b65d2efc77e2692
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Jul 9 14:59:48 2021 -0500

    Add vzeroupper to Haswell microkernels. (#524)
    
    Details:
    - Added vzeroupper instruction to the end of all 'gemm' and 'gemmtrsm'
      microkernels so as to avoid a performance penalty when mixing AVX
      and SSE instructions. These vzeroupper instructions were once part
      of the haswell kernels, but were inadvertently removed during a source
      code shuffle some time ago when we were managing duplicate 'haswell'
      and 'zen' kernel sets. Thanks to Devin Matthews for tracking this down
      and re-inserting the missing instructions.

commit c9a7f59aa84daa54d8f8c771f1f1ef2bd8730da2
Merge: 75f03907 9a8e649c
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Jul 8 14:00:38 2021 -0500

    Merge pull request #522 from flame/windows-avx512
    
    Fix Win64 AVX512 bug.

commit 9a8e649c5ac89eba951bbee7136ca28aeb24d731
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Jul 7 15:23:57 2021 -0500

    Fix Win64 AVX512 bug.
    
    Use `-march=haswell` for kernels. Fixes #514.

commit 75f03907c58385b656c8bd35d111db245814a9f3
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Jul 7 15:44:11 2021 -0500

    Add comment about make checkblas on Windows
    
    [ci skip]

commit 4651583b1204a965e4aa672c7ad6de60f3ab1600
Merge: 69205ac2 174f7fc9
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Jul 7 01:11:20 2021 -0500

    Merge pull request #520 from flame/travis-ci-install
    
    Test installation in Travis CI

commit 69205ac266947723ad4d7bb028b7521fe5c76991
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 6 20:39:22 2021 -0500

    CREDITS file update.
    
    Details:
    - Thanks to Chengguo Sun for submitting #515 (5ef7f68).
    - Thanks to Andrew Wildman for submitting #519 (551c6b4).
    - Whitespace update to configure (spaces to tabs).

commit 174f7fc9a11712c7bd1a61510bdc5c262b3e8e1f
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jul 6 19:35:55 2021 -0500

    Test installation in Travis CI

commit 551c6b4ee8cd9dd2e1d1b46c8dde09eb50b91b2c
Merge: 78eac6a0 f648df4e
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jul 6 19:32:53 2021 -0500

    Merge pull request #519 from awild82/oot_build_bugfix
    
    Fix installation from out-of-tree builds

commit f648df4e5588f069b2db96f8be320ead0c1967ef
Author: Andrew Wildman <apw4@uw.edu>
Date:   Tue Jul 6 16:35:12 2021 -0700

    Add symlink to blis.pc.in for out-of-tree builds

commit 78eac6a0ab78c995c3f4e46a9e87388b5c3e1af6
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jul 6 11:05:43 2021 -0500

    Revert "Always run `make check`."
    
    This reverts commit a201a53440c51244739aaee20e3309b50121cc68.

commit a201a53440c51244739aaee20e3309b50121cc68
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Jul 5 21:39:18 2021 -0500

    Always run `make check`.
    
    I'm concerned that problems may lurk for `x86_64` builds on Windows which may be uncovered by a fuller `make check`.

commit 5ef7f684dc75fc707c82f919e0836615f90a2627
Merge: aaa10c87 ad6231cc
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Jul 5 21:35:07 2021 -0500

    Merge pull request #515 from chengguosun/bug-fix
    
    Fixed configure script bug.

commit ad6231cca3fc1e477752ecd31b1ee2323398a642
Author: sunchengguo <sunchengguo@higon.com>
Date:   Tue Jul 6 07:30:00 2021 -0400

    Fixed configure script bug.
    Details:
    - Fixed a kernel list string substitution error by adding the function
      substitute_words to the configure script. Previously, if the string
      contained both zen and zen2 and zen needed to be replaced with
      another string, then zen2 was also incorrectly replaced.
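    
    Illustrative sketch (not the configure code): the fix itself lives in
    the configure shell script. The C function below, whose name
    substitute_whole_words is hypothetical, only shows the idea of
    comparing whole space-separated words so that replacing zen leaves
    zen2 untouched.
    
    ```c
    #include <assert.h>
    #include <string.h>
    
    /* Hypothetical helper: copy the space-separated `list` into `out`,
       replacing every word that equals `from` exactly with `to`. */
    static void substitute_whole_words(const char *list, const char *from,
                                       const char *to, char *out, size_t outsz)
    {
        assert(list && from && to && out && outsz > 0);
        size_t used = 0;
        const char *p = list;
        out[0] = '\0';
        while (*p != '\0') {
            /* Skip separators between words. */
            while (*p == ' ') p++;
            if (*p == '\0') break;
            /* Measure the current word. */
            size_t len = strcspn(p, " ");
            /* Match only if the whole word equals `from`, not a prefix. */
            int match = (len == strlen(from) && strncmp(p, from, len) == 0);
            const char *word = match ? to : p;
            size_t wlen = match ? strlen(to) : len;
            /* Stop rather than overflow the output buffer. */
            if (used + wlen + 2 > outsz) break;
            if (used > 0) out[used++] = ' ';
            memcpy(out + used, word, wlen);
            used += wlen;
            out[used] = '\0';
            p += len;
        }
    }
    ```
    
    A plain substring replacement would turn "zen2" into "zen32" when
    zen is replaced by zen3; the whole-word comparison avoids that.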

commit d073fc9acac9d702556cab9fbbb3a253eeb1f998
Author: nicholaiTukanov <nicholaitukanov@gmail.com>
Date:   Fri Jul 2 19:54:33 2021 -0500

    Update POWER10.md

commit 907226c0af4afb6323b4e02be4f73f5fb89cddaf
Author: nicholaiTukanov <nicholaitukanov@gmail.com>
Date:   Fri Jul 2 19:47:18 2021 -0500

    Rework POWER10 sandbox
    
    - Add a testsuite for gathering performance (in GFLOPs) and measuring correctness for the POWER10 GEMM reduced precision/integer kernels.
    - Reworked GENERIC_GEMM template to hardcode the cache parameters.
    - Removed the kernel wrapper that checked that the matrices were neither transposed nor conjugated. The kernels still assume the matrices are not transposed; the wrapper was removed for performance reasons.
    - Renamed and restructured files and functions for clarity.
    - Edited the POWER10 document to reflect new changes.

commit aaa10c87e19449674a4ca30fa3b6392bb22c3a66
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 21 17:53:52 2021 -0500

    Skip clearing temp microtile in gemmlike sandbox.
    
    Details:
    - Removed code from gemmlike sandbox files bls_gemm_bp_var1.c and
      bls_gemm_bp_var2.c that initializes the elements of the temporary
      microtile to zero. This code, introduced recently in 7f7d726, did
      not actually fix any bug (despite that commit's log entry). The
      microtile does not need to be initialized because it is completely
      overwritten by a "beta = 0" invocation of gemm prior to it being
      read. Any NaNs or Infs present at the outset would have no impact
      on the output matrix C. Thanks to Devin Matthews for reminding me
      of this.
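    
    Illustrative sketch (assumptions labeled): the reference microkernel
    below, with the hypothetical name ref_ukr, is not the BLIS kernel. It
    only shows why a "beta = 0" call makes prior contents of the
    temporary microtile irrelevant: C is written without being read.
    
    ```c
    #include <assert.h>
    #include <math.h>
    #include <stddef.h>
    
    /* Hypothetical reference microkernel: C := beta*C + alpha*A*B for an
       m x n microtile stored column-major, with A m x k and B k x n. */
    static void ref_ukr(size_t m, size_t n, size_t k, double alpha,
                        const double *a, const double *b,
                        double beta, double *c)
    {
        assert(a != NULL && b != NULL && c != NULL);
        for (size_t j = 0; j < n; j++) {
            for (size_t i = 0; i < m; i++) {
                double ab = 0.0;
                for (size_t p = 0; p < k; p++)
                    ab += a[i + p * m] * b[p + j * k];
                /* With beta equal to zero, C is overwritten without being
                   read, so any NaN or Inf it held before is discarded. */
                if (beta == 0.0)
                    c[i + j * m] = alpha * ab;
                else
                    c[i + j * m] = beta * c[i + j * m] + alpha * ab;
            }
        }
    }
    ```
    
    Filling the microtile with NaN and then calling ref_ukr with beta
    set to 0.0 yields the plain product, with no NaN in the result.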

commit bc10a3f2ff518360c32bea825b3eb62a9e4c8a77
Merge: bf727636 6548ceba
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Jun 18 19:01:08 2021 -0500

    Merge pull request #492 from flame/thunderx2-clang
    
    Allow clang for ThunderX2 config

commit bf727636632a368f3247dc8ab1d4b6119e9c511a
Merge: e28f2a2d 5fc93e28
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Jun 18 18:59:43 2021 -0500

    Merge pull request #506 from xrq-phys/arm64-mac
    
    BLIS on Darwin_Aarch64

commit e28f2a2dfcff14e7094fce0b279b3a917b3ab98c
Merge: d10e05bb 56ffca6a
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Jun 15 19:35:07 2021 -0500

    Merge pull request #513 from nicholaiTukanov/asm_warning_p9_fix
    
    Fix assembler warning in POWER9 DGEMM

commit 56ffca6a9bc67432a7894298739895f406e5f467
Author: nicholai <nicholai@ibm.com>
Date:   Tue Jun 15 18:17:39 2021 -0500

    Fix asm warning

commit 689fa0f40399bde1acc5367d6dd4e8fc4eb6f3ea
Merge: b683d01b d10e05bb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Jun 13 19:44:14 2021 -0500

    Merge branch 'master' into dev

commit d10e05bbd1ce45ce2c0dfe5c64daae2633357b3f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Jun 13 19:36:16 2021 -0500

    Sandbox header edits trigger full library rebuild.
    
    Details:
    - Adjusted the top-level Makefile so that any change to a sandbox header
      file will result in blis.h being regenerated along with a full
      recompilation of the library. Previously, sandbox files were omitted
      from the list of header files that, when touched, could trigger a full
      rebuild. Why was it like that previously? Because originally we only
      envisioned using sandboxes to *replace* gemm, not augment the library
      with new functionality. When replacing gemm, blis.h does not need to
      contain any local sandbox definitions in order for the user to be able
      to (indirectly) use that sandbox. But if you are adding functions to
      the library, those functions need to be prototyped so the compiler
      can perform type checking against the user's invocation of those new
      functions. Thanks to Jeff Diamond for helping us discover this
      deficiency in the build system.

commit 7c3eb44efaa762088c190bb820ef6a3c87db8f65
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Jun 2 11:28:22 2021 -0500

    Add vhsubpd/vhsubpd.
    
    Horizontal subtraction instructions added to bli_x86_asm_macros.h, currently unused [ci skip].

commit 7f7d72610c25f511ba8cd2a53be7b59bdb80f3f3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon May 31 16:50:18 2021 -0500

    Fixed bugs in cpackm kernels, gemmlike code.
    
    Details:
    - Fixed intermittent bugs in bli_packm_haswell_asm_c3xk.c and
      bli_packm_haswell_asm_c8xk.c whereby the imaginary component of the
      kappa scalar was incorrectly loaded at an offset of 8 bytes (instead
      of 4 bytes) from the real component. This was almost certainly a copy-
      paste bug carried over from the corresponding zpackm kernels. Thanks to
      Devin Matthews for bringing this to my attention.
    - Added missing code to gemmlike sandbox files bls_gemm_bp_var1.c and
      bls_gemm_bp_var2.c that initializes the elements of the temporary
      microtile to zero. (This bug was never observed in output but rather
      noticed analytically. It probably would have also manifested as
      intermittent failures, this time involving edge cases.)
    - Minor commented-out/disabled changes to testsuite/src/test_gemm.c
      relating to debugging.

commit 5fc93e280614b4a21a9cff36cf873b4b9407285b
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sat May 29 18:44:47 2021 +0900

    Armv8A Rename Regs for Safe Darwin Compile
    
    Avoid x18 use in FP32 kernel:
    - C address lines x[18-26] renamed to x[19-27] (reg index +1)
    - Original role of x27 fulfilled by x5 which is free after k-loop part.
    
    FP64 does not require changing since x18 is not used there.

commit 9f4a4a3cfb2244e4024445e127dafd2a11f39fc5
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sat May 29 17:21:28 2021 +0900

    Armv8A Rename Regs for Clang Compile: FP32 Part
    
    Roughly the same as 916e1fa , additionally with x15 clobbering removed.
    - x15: Not used at all.
    
    Compilation w/ Clang shows a warning about x18 reservation, but
    compilation itself is OK and all tests passed.

commit 916e1fa8be3cea0e3e2a4a7e8b00027ac2ee7780
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sat May 29 16:46:52 2021 +0900

    Armv8A Rename Regs for Clang Compile: FP64 Part
    
    - x7, x8: Used to store address for Alpha and Beta.
      As Alpha & Beta are not used in k-loops, use x0, x1 to load
      Alpha & Beta's addresses after k-loops are completed, since A & B's
      addresses are no longer needed there.
      This "ldr [addr]; -> ldr val, [addr]" would not cause much performance
      drawback since it is done outside k-loops and there are plenty of
      instructions between Alpha & Beta's loading and usage.
    - x9: Used to store cs_c. x9 is multiplied by 8 into x10 and not used
      any longer. Directly loading cs_c into x10 and scaling by 8 frees
      x9 straightforwardly.
    - x11, x12: Not used at all. Simply remove from clobber list.
    - x13: Like x9, loaded and scaled by 8 into x14, except that x13 is
      also used in a conditional branch so that "cmp x13, #1" needs to be
      modified into "cmp x14, #8" to completely free x13.
    - x3, x4: Used to store next_a & next_b. Untouched in k-loops. Load
      these addresses into x0 and x1 after Alpha & Beta are both loaded,
      since then neither address of A/B nor address of Alpha/Beta is needed.

commit 7fabd896af773623ed01820a71bbff432e8a7d25
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sat May 29 16:28:03 2021 +0900

    Asm Flag Mingling for Darwin_Aarch64
    
    Apple+Arm64 requires additional "tagging" of local symbols.

commit 213dce32d2eed8b7a38c6a3f6112072b0a89ecd0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 28 14:49:57 2021 -0500

    Added a new 'gemmlike' sandbox.
    
    Details:
    - Added a new sandbox called 'gemmlike', which implements sequential and
      multithreaded gemm in the style of gemmsup but also unconditionally
      employs packing. The purpose of this sandbox is to
      (1) avoid select abstractions, such as objects and control trees, in
          order to allow readers to better understand how a real-world
          implementation of high-performance gemm can be constructed;
      (2) provide a starting point for expert users who wish to build
          something that is gemm-like without "reinventing the wheel."
      Thanks to Jeff Diamond, Tze Meng Low, Nicholai Tukanov, and Devangi
      Parikh for requesting and inspiring this work.
    - The functions defined in this sandbox currently use the "bls_" prefix
      instead of "bli_" in order to avoid any symbol collisions in the main
      library.
    - The sandbox contains two variants, each of which implements gemm via a
      block-panel algorithm. The only difference between the two is that
      variant 1 calls the microkernel directly while variant 2 calls the
      microkernel indirectly, via a function wrapper, which allows the edge
      case handling to be abstracted away from the classic five loops.
    - This sandbox implementation utilizes the conventional gemm microkernel
      (not the skinny/unpacked gemmsup kernels).
    - Updated some typos in the comments of a few files in the main
      framework.

commit 82af05f54c34526a60fd2ec46656f13e1ac8f719
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 25 15:25:08 2021 -0500

    Updated Fugaku (a64fx) performance results.
    
    Details:
    - Updated the performance graphs (pdfs and pngs) for the Fugaku/a64fx
      entry within Performance.md, and also updated the experiment details
      accordingly. Thanks to RuQing Xu for re-running the BLIS and SSL2
      experiments reflected in this commit.
    - In Performance.md, added an English translation of the project name
      under which the Fugaku results were gathered, courtesy of RuQing Xu.

commit e5c85da3763f73854ecd739ba3008bb467ed77c3
Merge: cbd8d393 5feb04e2
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon May 24 16:56:22 2021 -0500

    Merge pull request #503 from flame/windows-compiler-check
    
    Add explicit compiler check for Windows.

commit cbd8d3932599485727204479fded66ac19186db4
Merge: 6d4ab022 932dfe6a
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon May 24 16:32:42 2021 -0500

    Merge pull request #500 from xrq-phys/armsve+travis
    
    Upgrade Travis CI for Arm SVE

commit 5feb04e233e1e6f81c727578ad9eae1367a2562f
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sun May 23 18:46:56 2021 -0500

    Add explicit compiler check for Windows.
    
    Check the C compiler for a predefined macro `_WIN32` to indicate (cross-)compilation for Windows. Fixes #463.

commit 6d4ab0223d9014ac2a66d66759536aa305be5867
Merge: 61584ded 859fb77a
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sun May 23 18:39:53 2021 -0500

    Merge pull request #502 from flame/rm-rm-dupls
    
    Remove `rm-dupls` function in common.mk.

commit 859fb77a320a3ace71d25a8885c23639b097a1b6
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sun May 23 18:15:23 2021 -0500

    Remove `rm-dupls` function in common.mk.
    
    AMD requested removal due to unclear licensing terms; original code was from stackoverflow. The function is unused but could easily be replaced by new implementation.

commit 932dfe6abb9617223bd26a249e53447169033f8c
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu May 20 02:07:31 2021 +0900

    Travis CI Revert Unnecessary Extras from 91d3636
    
    - Removed `V=1` in make line
    - Removed `CFLAGS` in configure line
    - Restored `pwd` surrounding OOT line

commit bd156a210d347a073a6939cc4adab3d9256c2e2b
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sun May 16 02:56:14 2021 +0900

    Adjust TravisCI
    
    - ArmSVE: do not test gemmt (this seems to be a Qemu-only problem).
    - Clang: use the TravisCI-provided version instead of pinning clang-8,
      because clang-8 seems to conflict with TravisCI's clang-7.

commit 91d3636031021af3712d14c9fcb1eb34b6fe2a31
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Sat May 15 17:05:16 2021 +0900

    Travis Support Arm SVE
    
    - Updated distro to 20.04 focal aarch64-gcc-10.
      This is the minimal version required by aarch64-gcc-10.
      SVE intrinsics would not compile without GCC >=10.
    - x86 toolchains use official repo instead of ubuntu-toolchain-r/test.
      20.04 focal is not supported by that PPA at the moment.
    - Add extra configuration-time options to .travis.yml.
    - Add Arm SVE entry to .travis.yml.

commit 61584deddf9b3af6d11a811e6e04328d22390202
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Wed May 19 23:52:29 2021 +0900

    Added 512b SVE-based a64fx subconfig + SVE kernels.
    
    Details:
    - Added 512-bit specific 'a64fx' subconfiguration that uses empirically
      tuned block size by Stepan Nassyr. This subconfig also sets the sector
      cache size and enables memory-tagging code in SVE gemm kernels. This
      subconfig utilizes (16, k) and (10, k) DPACKM kernels.
    - Added a vector-length agnostic 'armsve' subconfiguration that computes
      blocksizes according to the analytical model. This part is ported from
      Stepan Nassyr's repository.
    - Implemented vector-length-agnostic [d/s/sh] gemm kernels for Arm SVE
      at size (2*VL, 10). These kernels use unindexed FMLA instructions
      because indexed FMLA takes 2 FMA units in many implementations.
      PS: There are indexed-FMLA kernels in Stepan Nassyr's repository.
    - Implemented 512-bit SVE dpackm kernels with in-register transpose
      support for sizes (16, k) and (10, k).
    - Extended 256-bit SVE dpackm kernels by Linaro Ltd. to 512-bit for
      size (12, k). This dpackm kernel is not currently used by any
      subconfiguration.
    - Implemented several experimental dgemmsup kernels which would
      improve performance in a few cases. However, those dgemmsup kernels
      generally underperform hence they are not currently used in any
      subconfig.
    - Note: This commit squashes several commits submitted by RuQing Xu via
      PR #424.

commit b683d01b9c4ea5f64c8031bda816beccfbf806a0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 13 15:23:22 2021 -0500

    Use extra #undef when including ba/ex API headers.
    
    Details:
    - Inserted a "#include bli_xapi_undef.h" after each usage of the basic
      and expert API macro setup headers: bli_oapi_ba.h, bli_oapi_ex.h,
      bli_tapi_ba.h, and bli_tapi_ex.h. This is functionally equivalent to
      the previous status quo, in which each header made minimal #undef
      prior to its own definitions and then a single instance of
      "#include bli_xapi_undef.h" cleaned up any remaining macro defs after
      all other headers were used. This commit will guarantee that macro
      defs from the setup of one header (say, bli_oapi_ex.h) don't "infect"
      the definitions made in a subsequent header. As with the previous
      commit, this change does not fix any issue but rather attempts to
      avoid creating orphaned macro definitions that are only needed within
      a very limited scope.
    - Removed minimal #undef from bli_?api_[ba|ex].h.
    - Removed old commented-out lines from bli_?api_[ba|ex].h.

commit d4427a5b2f5cab5d2a64c58d87416628867c2b4a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 13 13:55:11 2021 -0500

    Minor preprocessor/header cleanup.
    
    Details:
    - Added frame/include/bli_xapi_undef.h, which explicitly undefines all
      macros defined in bli_oapi_ba.h, bli_oapi_ex.h, bli_tapi_ba.h, and
      bli_tapi_ex.h. (This is for safety and good cpp coding practice, not
      because it fixes anything.)
    - Added #include "bli_xapi_undef.h" to bli_l1v.h, bli_l1d.h, bli_l1f.h,
      bli_l1m.h, bli_l2.h, bli_l3.h, and bli_util.h.
    - Comment updates to bli_oapi_ba.h, bli_oapi_ex.h, bli_tapi_ba.h, and
      bli_tapi_ex.h.
    - Moved frame/3/bli_l3_ft_ex.h to local 'old' directory after realizing
      that nothing in BLIS used those function pointer types. Also commented
      out the "#include bli_l3_ft_ex.h" directive in frame/3/bli_l3.h.

commit 5aa63cd927b22a04e581b07d0b68ef391f4f9b1f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 12 19:53:35 2021 -0500

    Fixed typo in cpp guard in bli_util_ft.h.
    
    Details:
    - Changed #ifdef BLIS_OAPI_BASIC to #ifdef BLIS_TAPI_BASIC in
      bli_util_ft.h. This typo was causing some types to be redefined when
      they weren't supposed to be.

commit f0e8634775094584e89f1b03811ee192f2aaf67f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 12 18:45:32 2021 -0500

    Defined eqsc, eqv, eqm to test object equality.
    
    Details:
    - Defined eqsc, eqv, and eqm operations, which set a bool depending on
      whether the two scalars, two vectors, or two matrix operands are equal
      (element-wise). eqsc and eqv support implicit conjugation and eqm
      supports diagonal offset, diag, uplo, and trans parameters (in a
      manner consistent with other level-1m operations). These operations
      are currently housed under frame/util, at least for now, because they
      are not computational in nature.
    - Redefined bli_obj_equals() in terms of eqsc, eqv, and eqm.
    - Documented eqsc, eqv, and eqm in BLISObjectAPI.md and BLISTypedAPI.md.
      Also:
      - Documented getsc and setsc in both docs.
      - Reordered entry for setijv in BLISTypedAPI.md, and added separator
        bars to both docs.
      - Added missing "Observed object properties" clauses to various
        level-1v entries in BLISObjectAPI.md.
    - Defined bli_apply_trans() in bli_param_macro_defs.h.
    - Defined supporting _check() function, bli_l0_xxbsc_check(), in
      bli_l0_check.c for eqsc.
    - Programming style and whitespace updates to bli_l1m_unb_var1.c.
    - Whitespace updates to bli_l0_oapi.c, bli_l1m_oapi.c
    - Consolidated redundant macro redefinition for copym function pointer
      type in bli_l1m_ft.h.
    - Added macros to bli_oapi_ba.h, _ex.h, and bli_tapi_ba.h, _ex.h that
      allow oapi and tapi source files to forego defining certain expert
      functions. (Certain operations such as printv and printm do not need
      to have both basic and expert interfaces. This also includes eqsc, eqv,
      and eqm.)
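    
    Illustrative sketch (assumptions labeled): the function below is a
    hypothetical real-valued analogue of an eqv-style check and is not
    the BLIS API. The name sk_deqv and its signature are invented for
    illustration; it sets a bool according to whether two strided
    vectors are equal element-wise.
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    
    /* Hypothetical helper: compare n elements of x and y, honoring the
       strides incx and incy, and store the outcome in *is_eq. */
    static void sk_deqv(size_t n, const double *x, ptrdiff_t incx,
                        const double *y, ptrdiff_t incy, bool *is_eq)
    {
        assert(is_eq != NULL);
        *is_eq = true;
        for (size_t i = 0; i < n; i++) {
            /* Stop at the first mismatch. */
            if (x[(ptrdiff_t)i * incx] != y[(ptrdiff_t)i * incy]) {
                *is_eq = false;
                return;
            }
        }
    }
    ```
    
    The result is passed back through a pointer rather than returned,
    mirroring the "set a bool" wording above.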

commit 5d46dbee4a06ba5a422e19817836976f8574cb4f
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed May 12 18:42:09 2021 -0500

    Replace bli_dlamch with something less archaic (#498)
    
    Details:
    - Added new implementations of bli_slamch() and bli_dlamch() that use
      constants from the standard C library in lieu of dynamically-computed
      values (via code inherited from netlib). The previous implementation
      is still available when the cpp macro BLIS_ENABLE_LEGACY_LAMCH is
      defined by the subconfiguration at compile-time. Thanks to Devin
      Matthews for providing this patch, and to Stefano Zampini for
      reporting the issue (#497) that prompted Devin to propose the patch.
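    
    Illustrative sketch (assumptions labeled): the function below, with
    the hypothetical name sketch_dlamch, is not the BLIS implementation.
    It shows how machine parameters can be taken from <float.h> instead
    of being computed at run time. The selector letters follow the
    LAPACK dlamch convention; the exact mapping, including the rounding
    assumption behind the factor 0.5, is an assumption.
    
    ```c
    #include <assert.h>
    #include <float.h>
    
    /* Hypothetical dlamch-style query backed by <float.h> constants. */
    static double sketch_dlamch(char cmach)
    {
        switch (cmach) {
        case 'E': case 'e': return DBL_EPSILON * 0.5;   /* relative eps, rounding assumed */
        case 'S': case 's': return DBL_MIN;             /* safe minimum, assumes 1/DBL_MAX < DBL_MIN */
        case 'B': case 'b': return (double)FLT_RADIX;   /* base of the machine */
        case 'P': case 'p': return DBL_EPSILON * 0.5 * FLT_RADIX; /* eps * base */
        case 'N': case 'n': return (double)DBL_MANT_DIG; /* digits in the mantissa */
        case 'M': case 'm': return (double)DBL_MIN_EXP;  /* minimum exponent */
        case 'U': case 'u': return DBL_MIN;              /* underflow threshold */
        case 'L': case 'l': return (double)DBL_MAX_EXP;  /* maximum exponent */
        case 'O': case 'o': return DBL_MAX;              /* overflow threshold */
        default:
            assert(0 && "unknown selector");
            return 0.0;
        }
    }
    ```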

commit 6a89c7d8f9ac3f51b5b4d8ccb2630d908d951e6f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat May 1 18:54:48 2021 -0500

    Defined setijv, getijv to set/get vector elements.
    
    Details:
    - Defined getijv, setijv operations to get and set elements of a vector,
      in bli_setgetijv.c and .h.
    - Renamed bli_setgetij.c and .h to bli_setgetijm.c and .h, respectively.
    - Added additional bounds checking to getijm and setijm to prevent
      actions with negative indices.
    - Added documentation to BLISObjectAPI.md and BLISTypedAPI.md for getijv
      and setijv.
    - Added documentation to BLISTypedAPI.md for getijm and setijm, which
      were inadvertently missing.
    - Added a new entry to the FAQ titled "Why does BLIS have vector
      (level-1v) and matrix (level-1m) variations of most level-1
      operations?"
    - Comment updates.

commit 4534daffd13ed7a8983c681d3f5e9de17c9f0b96
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 27 18:16:44 2021 -0500

    Minor API breakage in bli_pack API.
    
    Details:
    - Changed bli_pack_get_pack_a() and bli_pack_get_pack_b() so that
      instead of returning a bool, they set a bool that is passed in by
      address. This does break the public exported API, but I expect very
      few users actually use this function. (This change is being made in
      preparation for a much more extensive commit relating to error
      checking.)
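    
    Illustrative sketch (assumptions labeled): the names below are
    hypothetical and only contrast the two calling styles. Passing the
    bool by address presumably frees the return value for later use,
    such as an error code.
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    
    /* Hypothetical global standing in for the library's runtime state. */
    static bool g_pack_a = false;
    
    /* Old style: the queried value is the return value. */
    static bool get_pack_a_old(void)
    {
        return g_pack_a;
    }
    
    /* New style: the queried value is written through a pointer. */
    static void get_pack_a_new(bool *pack_a)
    {
        assert(pack_a != NULL);
        *pack_a = g_pack_a;
    }
    ```
    
    Callers written against the old style need a local bool and an
    address-of at the call site, which is the API breakage noted above.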

commit 6a4aa986ffc060d3e64ed230afe318b82630f8b2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 23 13:10:01 2021 -0500

    Fixed typo in Table of Contents.

commit f6424b5b82160d346a09a0fbb526981ecf66cdb3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 23 13:08:06 2021 -0500

    Added dedicated Performance section to README.md.
    
    Details:
    - Spun off the Performance.md and PerformanceSmall.md links in the
      Documentation section into a new Performance section dedicated to
      those two links. (The previous entries remain redundantly listed
      within Documentation section.) Thanks to Robert van de Geijn for
      suggesting this change.

commit 40ce5fd241b9ad140bf57278d440f0598d7f15d8
Merge: 6280757b 1f3461a5
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Apr 21 09:54:25 2021 -0500

    Merge pull request #493 from cassiersg/patch-1
    
    Fix typo in FAQ.md

commit 1f3461a5a5a88510f913451a93e3190ec1556f39
Author: Gaëtan Cassiers <cassiersg@users.noreply.github.com>
Date:   Wed Apr 21 16:49:05 2021 +0200

    Fix typo in FAQ.md

commit 6548cebaf55a1f9bdb8417cc89dd0444d8f9c2e4
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Apr 14 13:00:42 2021 -0500

    Allow clang for ThunderX2 config
    
    Needed for compiling on e.g. Mac M1. AFAIK clang supports the same -mcpu flag for ThunderX2 as gcc.

commit 6280757be32f90fd77d8dd9357b07d9306e6f80d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 7 13:03:56 2021 -0500

    Minor updates to a64fx section of Performance.md.

commit 1e6ed823c6cd11f9b671779f3c8bdbd2bbb40f34
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Thu Apr 8 02:59:26 2021 +0900

    Additional A64fx Comments (#490)
    
    * Performance.md Update A64fx Comments
    
    - Reason for ARMPL's missing data;
    - Additional envs / flags for kernel selection;
    - Update BLIS SRC commit.
    
    * Include Another Fix in armsve-cfg-vendor
    
    A prototype was missing, so the void* pointer was not fully returned.

commit 2688f21a5b073950f6f187c95917fdbb5aac234a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 6 19:02:37 2021 -0500

    Added Fujitsu A64fx (512-bit SVE) perf results.
    
    Details:
    - Added single-threaded and multithreaded performance results to
      docs/Performance.md. These results were gathered on the "Fugaku"
      Fujitsu A64fx supercomputer at the RIKEN Center for Computational
      Science in Kobe, Japan. Special thanks to RuQing Xu and Stepan
      Nassyr for their work in developing and optimizing A64fx support in
      BLIS and RuQing for gathering the performance data that is reflected
      in these new graphs.

commit ba3ba8da83d48397162139e11337c036a631ba79
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 6 18:39:58 2021 -0500

    Minor updates and fixes to test/3/octave scripts.
    
    Details:
    - Fixed an issue where the wrong string was being passed in for the
      vendor legend string.
    - Changed the graph in which the legends appear.
    - Updates to runthese.m.

commit 09bd4f4f12311131938baa9f75d27e92b664d681
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 31 17:09:36 2021 -0500

    Add err_t* "return" parameter to malloc functions.
    
    Details:
    - Added an err_t* parameter to memory allocation functions including
      bli_malloc_intl(), bli_calloc_intl(), bli_malloc_user(),
      bli_fmalloc_align(), and bli_fmalloc_noalign(). Since these functions
      already use the return value to return the allocated memory address,
      they can't communicate errors to the caller through the return value.
      This commit does not employ any error checking within these functions
      or their callers, but this sets up BLIS for a more comprehensive
      commit that moves in that direction.
    - Moved the typedefs for malloc_ft and free_ft from bli_malloc.h to
      bli_type_defs.h. This was done so that what remains of bli_malloc.h
      can be included after the definition of the err_t enum. (This ordering
      was needed because bli_malloc.h now contains function prototypes that
      use err_t.)
    - Defined bli_is_success() and bli_is_failure() static functions in
      bli_param_macro_defs.h. These functions provide easy checks for error
      codes and will be used more heavily in future commits.
    - Unfortunately, the additional err_t* argument discussed above breaks
      the API for bli_malloc_user(), which is an exported symbol in the
      shared library. However, it's quite possible that the only application
      that calls bli_malloc_user()--indeed, the reason it was marked for
      symbol exporting to begin with--is the BLIS testsuite. And if that's
      the case, this breakage won't affect anyone. Nonetheless, the "major"
      part of the so_version file has been updated accordingly to 4.0.0.
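    
    Illustrative sketch (assumptions labeled): all names below
    (sk_err_t, sk_malloc, sk_is_success, sk_is_failure) are hypothetical
    stand-ins. Unlike this commit, which adds the parameter without
    performing checks, the sketch sets the error code to show how such
    an out-parameter could eventually be used.
    
    ```c
    #include <assert.h>
    #include <stdlib.h>
    
    /* Hypothetical error type; a real error enum would have more values. */
    typedef enum { SK_SUCCESS = 0, SK_MALLOC_FAILED = -1 } sk_err_t;
    
    static int sk_is_success(sk_err_t e) { return e == SK_SUCCESS; }
    static int sk_is_failure(sk_err_t e) { return e != SK_SUCCESS; }
    
    /* The return value carries the address, so the status is reported
       through the extra pointer argument. */
    static void *sk_malloc(size_t size, sk_err_t *r_val)
    {
        assert(r_val != NULL);
        void *p = malloc(size);
        *r_val = (p != NULL) ? SK_SUCCESS : SK_MALLOC_FAILED;
        return p;
    }
    ```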

commit f9ad55ce7e12f59930605753959fcfd41a218d8d
Merge: 04502492 90508192
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 31 14:20:19 2021 -0500

    Merge branch 'master' into dev

commit 90508192f2d6ae95adc2a3ba9f4e5bad2c8d6fd2
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Mar 30 21:16:44 2021 -0500

    Update do_sde.sh (#489)
    
    Update to a newer version of SDE, and do a direct download as it seems you don't have to click-through the license anymore.

commit 22c6b5dc4c9cc21942f8ccc30891f9b4385a9504
Author: Nicholai Tukanov <nicholaitukanov@gmail.com>
Date:   Tue Mar 30 19:07:42 2021 -0500

    Fixed bug in power10 microkernel I/O. (#488)
    
    Details:
    - Fixed a bug in the POWER10 DGEMM kernel whereby the microkernel did
      not store the microtile result correctly due to incorrect indices
      calculations. (The error was introduced when I reorganized the
      'kernels/power10/3' directory.)

commit 04502492671456b94bcdee60b9de347b6763a32d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Mar 28 19:11:43 2021 -0500

    Always stay initialized after BLAS compat calls.
    
    Details:
    - Removed the option to finalize BLIS after every BLAS call, which also
      means that BLIS would initialize at the beginning of every BLAS call.
      This option never really made sense and wasn't even implemented
      properly to begin with. (Because bli_init_auto() and _finalize_auto()
      were implemented in terms of bli_init_once() and _finalize_once(),
      respectively, the application would have only been able to call one
      BLAS routine before BLIS would find itself in an unusable, permanently
      uninitialized state.) Because this option was never meant for regular
      use, it never made it into configure as an actual configure-time
      option, and therefore this commit only removes parts of the code
      affected by the cpp macro guard BLIS_ENABLE_STAY_AUTO_INITIALIZED.

commit 3a6f41afb8197e831b6ce2f1ae7f63735685fa0a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Mar 27 17:22:14 2021 -0500

    Renamed membrk files/vars/functions to pba.
    
    Details:
    - Renamed the files, variables, and functions relating to the packing
      block allocator from its legacy name (membrk) to its current name
      (pba). This more clearly contrasts the packing block allocator with
      the small block allocator (sba).
    - Fixed a typo in bli_pack_set_pack_b(), defined in bli_pack.c, that
      caused the function to erroneously change the value of the pack_a
      field of the global rntm_t instead of the pack_b field. (Apparently
      nobody has used this API yet.)
    - Comment updates.

commit 36cb4116d15cfef2d42ec4a834efd4a958f261b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Mar 27 15:15:09 2021 -0500

    Switch allocator mutexes to static initialization.
    
    Details:
    - Switched the small block allocator (sba), as defined in bli_sba.c and
      bli_apool.c, to static initialization of its internal mutex. Did a
      similar thing for the packing block allocator (pba), which appears as
      global_membrk in bli_membrk.c.
    - Commented out bli_membrk_init_mutex() and bli_membrk_finalize_mutex()
      to ensure they won't be used in the future.
    - In bli_thrcomm_pthreads.c and .h, removed old, commented-out cpp
      blocks guarded by BLIS_USE_PTHREAD_MUTEX.

commit 159ca6f01a5f91b93513134c9470b69ff78f5354
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 24 15:57:32 2021 -0500

    Made test/3/octave scripts robust to missing data.
    
    Details:
    - Modified the octave scripts in test/3 so that the script does not
      choke when one or more of the expected OpenBLAS, Eigen, or vendor data
      files is missing. (The BLIS data set, however, must be complete.) When
      a file is missing, that data series is simply not included on that
      particular graph. Also factored out a lot of the redundant logic from
      plot_panel_4x5.m into a separate function in read_data.m.

commit 545e6c2f6d09d023b353002a9a43b11aa0c1d701
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 22 17:42:33 2021 -0500

    CHANGELOG update (0.8.1)

commit 8535b3e11d2297854991c4272932ce4974dda629
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 22 17:42:33 2021 -0500

    Version file update (0.8.1)

commit e56d9f2d94ed247696dda2cbf94d2ca05c7fc089
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 22 17:40:50 2021 -0500

    ReleaseNotes.md update in advance of next version.

commit ca83f955d45814b7d84f53933cdb73323c0dea2c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 22 17:21:21 2021 -0500

    CREDITS file update.

commit 57ef61f6cdb86957f67212aa59407f2f8e7f3d1a
Merge: bf1b578e e7a4a8ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 19 13:05:43 2021 -0500

    Merge branch 'master' of github.com:flame/blis

commit bf1b578ea32ea1c9dbf7cb3586969e8ae89aa5ef
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 19 13:03:17 2021 -0500

    Reduced KC on skx from 384 to 256.
    
    Details:
    - Reduced the KC cache blocksize for double real on the skx subconfig
      from 384 to 256. The maximum (extended) KC was also reduced
      accordingly from 480 to 320. Thanks to Tze Meng Low for suggesting
      this change.

commit e7a4a8edc940942357e8e4c4594383a29a962f93
Author: Nicholai Tukanov <nicholaitukanov@gmail.com>
Date:   Wed Mar 17 19:43:31 2021 -0500

    Fix calculation of new pb size (#487)
    
    Details:
    - Added missing parentheses to the i8 and i4 instantiations of the
      GENERIC_GEMM macro in sandbox/power10/generic_gemm.c.

commit 4493cf516e01aba82642a43abe350943ba458fe2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 15 13:12:49 2021 -0500

    Redefined BLIS_NUM_ARCHS to update automatically.
    
    Details:
    - Changed BLIS_NUM_ARCHS from a cpp macro definition to the last enum
      value in the arch_t enum. This means that it no longer needs to get
      updated manually whenever new subconfigurations are added to BLIS.
      Also removed the explicit initial index assignment of 0 from the
      first enum value, which was unnecessary due to how the C language
      standard mandates indexing of enum values. Thanks to Devin Matthews
      for originally submitting this as a PR in #446.
    - Updated docs/ConfigurationHowTo.md to reflect the aforementioned
      change.
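
    Example (illustrative sketch, not code from this commit):
    The enum-count idiom described above can be sketched as follows. The
    type and member names (demo_arch_t, DEMO_ARCH_*, DEMO_NUM_ARCHS) are
    hypothetical and are not the actual members of arch_t.

```c
#include <stddef.h>

/* Hypothetical stand-in for arch_t. */
typedef enum
{
    DEMO_ARCH_SKX,      /* implicitly 0: the first enumerator defaults to 0 */
    DEMO_ARCH_HASWELL,  /* 1 */
    DEMO_ARCH_GENERIC,  /* 2 */

    /* Must remain last. Its value equals the number of entries above,
       so it never needs to be updated by hand. */
    DEMO_NUM_ARCHS
} demo_arch_t;

/* A table sized by the count grows automatically with the enum. */
static const char* demo_arch_names[ DEMO_NUM_ARCHS ] =
{
    "skx", "haswell", "generic"
};

/* Returns the name for a valid id, or NULL if id is out of range. */
const char* demo_arch_name( int id )
{
    if ( id < 0 || id >= DEMO_NUM_ARCHS ) return NULL;
    return demo_arch_names[ id ];
}
```

    Adding a new enumerator before DEMO_NUM_ARCHS increases the count
    without any other edit to the enum.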

commit a4b73de84cdffcbe5cf71969a0f7f0f8202b3510
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 12 17:12:27 2021 -0600

    Disabled _self() and _equal() in bli_pthread API.
    
    Details:
    - Disabled the _self() and _equal() extensions to the bli_pthread API
      introduced in d479654. These functions were disabled after I realized
      that they aren't actually needed yet. Thanks to Devin Matthews for
      helping me reason through the appropriate consumer code that will
      appear in BLIS (eventually) in a future commit. (Also, I could never
      get the Windows branch to link properly in clang builds in AppVeyor.
      See the comment I left in the code, and #485, for more info.)

commit f9d604679d8715bc3e79a8630268446889b51388
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 11 16:57:55 2021 -0600

    Added _self() and _equal() to bli_pthread API.
    
    Details:
    - Expanded the bli_pthread API to include equivalents to pthread_self()
      and pthread_equal(). Implemented these two functions for all three cpp
      branches present within bli_pthread.c: systemless, Windows, and
      Linux/BSD.

commit fa9b3c8f6b3d5717f19832362104413e1a86dfb0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 11 15:13:51 2021 -0600

    Shuffled code in Windows branch of bli_pthread.c.
    
    Details:
    - Reordered the definitions in the cpp branch in bli_pthread.c that
      defines the bli_pthread API in terms of Windows API calls. Also added
      missing comments that mark sections of the API, which brings the code
      into harmony with other cpp branches (as well as bli_pthread.h).

commit 95d4f3934d806b3563f6648d57a4e381d747caf5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 11 13:50:40 2021 -0600

    Moved cpp macro redef of strerror_r to bli_env.c.
    
    Details:
    - Relocated the _MSC_VER-guarded cpp macro re-definition of strerror_r
      (in terms of strerror_s) from bli_thread.h to bli_env.c. It was
      likely left behind in bli_thread.h in a previous commit, when code
      that now resides in bli_env.c was moved from bli_thread.c. (I couldn't
      find any other instance of strerror_r being used in BLIS, so I moved
      the #define directly to bli_env.c rather than place it in bli_env.h.)
      The code that uses strerror_r is currently disabled, though, so this
      commit should have no effect on BLIS.

commit 8a3066c315358d45d4f5b710c54594455f9e8fc6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 9 17:52:59 2021 -0600

    Relocated gemmsup_ref general stride handling.
    
    Details:
    - Moved the logic that checks for general stridedness in any of the
      matrix operands in a gemmsup problem. The logic previously resided
      near the top of bli_gemmsup_int(), which is the thread entry point
      for the parallel region of the current gemmsup implementation. The
      problem with this setup was that the code would attempt to reject
      problems with any general-strided operands by returning BLIS_FAILURE,
      and that return value was then being ignored by the l3_sup thread
      decorator, which unconditionally returns BLIS_SUCCESS. To solve this
      issue, rather than try to manage n return values, one from each of n
      threads, I simply moved the logic into bli_gemmsup_ref(). I didn't
      move it any higher (e.g. bli_gemmsup()) because I still want the
      logic to be part of the current gemmsup handler implementation. That
      is, perhaps someone else will create a different handler, and that
      author wants to handle general stride differently. (We don't want to
      force them into a particular way of handling general stride.)
    - Removed the general stride handling from bli_gemmtsup_int(), even
      though this function is inoperative for now.
    - This commit addresses issue #484. Thanks to RuQing Xu for reporting
      this issue.

commit 670bc7b60f6065893e8ec1bebd2fc9e5ba710dff
Author: Nicholai Tukanov <nicholaitukanov@gmail.com>
Date:   Fri Mar 5 13:53:43 2021 -0600

    Add low-precision POWER10 gemm kernels (#467)
    
    Details:
    - This commit adds a new BLIS sandbox that (1) provides implementations
      based on low-precision gemm kernels, and (2) extends the BLIS typed
      API for those new implementations. Currently, these new kernels can
      only be used for the POWER10 microarchitecture; however, they may
      provide a template for developing similar kernels for other
      microarchitectures (even those beyond POWER), as changes would likely
      be limited to select places in the microkernel and possibly the
      packing routines. The new low-precision operations that are now
      supported include: shgemm, sbgemm, i16gemm, i8gemm, i4gemm. For more
      information, refer to the POWER10.md document that is included in
      'sandbox/power10'.

commit b8dcc5bc75a746807d6f8fa22dc2123c98396bf5
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date:   Tue Mar 2 06:58:24 2021 +0800

    Fixed typed API definition for gemmt (#476)
    
    Details:
    - Fixed incorrect definition and prototype of bli_?gemmt() in
      frame/3/bli_l3_tapi.c and .h, respectively. gemmt was previously
      defined identically to gemm, which was wrong because it did not
      take into account the uplo property of C.
    - Fixed incorrect API documentation for her2k/syr2k in BLISTypedAPI.md.
      Specifically, the document erroneously listed only a single transab
      parameter instead of transa and transb.

commit a0e4fe2340a93521e1b1a835a96d0f26dec8406a
Author: Ilknur <ilknuri607@gmail.com>
Date:   Tue Mar 2 02:06:56 2021 +0400

    Fixed double free() in level1v example (#482)
    
    Details:
    - In examples/tapi/00level1v.c, pointer 'z' was being freed twice and
      pointer 'a' was not being freed at all. This commit correctly frees
      each pointer exactly once.

commit f5871c7e06a75799251d6b55a8a5fbfa1a92cf95
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Feb 28 17:03:57 2021 -0600

    Added complex asm packm kernels for 'haswell' set.
    
    Details:
    - Implemented assembly-based packm kernels for single- and double-
      precision complex domain (c and z) and housed them in the 'haswell'
      kernel set. This means c3xk, c8xk, z3xk, and z4xk are now all
      optimized.
    - Registered the aforementioned packm kernels in the haswell, zen,
      and zen2 subconfigs.
    - Minor modifications to the corresponding s and d packm kernels that
      were introduced in 426ad67.
    - Thanks to AMD, who originally contributed the double-precision real
      packm kernels (d6xk and d8xk), upon which these complex kernels are
      partially based.

commit 426ad679f55264e381eb57a372632b774320fb85
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Feb 27 18:39:56 2021 -0600

    Added assembly packm kernels for 'haswell' set.
    
    Details:
    - Implemented assembly-based packm kernels for single- and double-
      precision real domain (s and d) and housed them in the 'haswell'
      kernel set. This means s6xk, s16xk, d6xk, and d8xk are now all
      optimized.
    - Registered the aforementioned packm kernels in the haswell, zen,
      and zen2 subconfigs.
    - Thanks to AMD, who originally contributed the double-precision real
      packm kernels (d6xk and d8xk), which I have now tweaked and used to
      create comparable single-precision real kernels (s6xk and s16xk).

commit f50c1b7e5886d29efe134e1994d05af9949cd4b6
Merge: 8f39aea1 b3953b93
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Feb 1 11:55:51 2021 -0600

    Merge pull request #473 from ajaypanyala/pkgconfig
    
    build: generate pkgconfig file

commit 8f39aea11f80a805b66cff4b4dc5e72727ea461d
Merge: f8db9fb3 2a815d5b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jan 30 17:59:56 2021 -0600

    Merge branch 'dev'

commit f8db9fb33b48844d6b47fdef699625bd9197745a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jan 28 08:04:52 2021 -0600

    Fixed missing parentheses in README.md Citations.

commit b3953b938eee59f79b4a4162ba583a5cb59fa34e
Author: Ajay Panyala <ajay.panyala@gmail.com>
Date:   Tue Jan 12 17:07:04 2021 -0800

    drop CFLAGS in the generated pkgconfig file

commit b02d9376bac31c1a1c7916f44c4946277a1425e2
Author: Ajay Panyala <ajay.panyala@gmail.com>
Date:   Mon Jan 11 20:50:01 2021 -0800

    add datadir

commit d8d8deeb6d8b84adb7ae5fdb88c6dd4f06624a76
Author: Ajay Panyala <ajay.panyala@gmail.com>
Date:   Mon Jan 11 17:47:50 2021 -0800

    generate pkgconfig file

commit 8c65411c7c8737248a6f054ffa0ce008c95cb515
Merge: 328b4f88 874c3f04
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Jan 11 16:01:45 2021 -0600

    Merge pull request #471 from flame/fix-470
    
    Fix kernel-to-config mapping for intel64

commit 874c3f04ece9af4d8fdf0e2713e21a259c117656
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Jan 8 13:56:30 2021 -0600

    Update configure
    
    Choose last sub-config in the kernel-to-config map if the config list doesn't contain the name of the kernel set. E.g. for "zen: skx knl haswell" pick "haswell" instead of "skx" which was chosen previously. Fixes #470.

commit 2a815d5b365d934cb351b2f2a8cd1366e997b2e1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jan 4 18:03:39 2021 -0600

    Support trsm pre-inversion in 1m, bb, ref kernels.
    
    Details:
    - Expanded support for disabling trsm diagonal pre-inversion to other
      microkernel types, including the reference microkernel as well as the
      kernel implementations for 1m and the pre-broadcast B (bb) format used
      by the power9 subconfig. This builds on the 'haswell' and 'penryn'
      kernel support added in 7038bba. Thanks to Bhaskar Nallani for
      reminding me, in #461 (post-closure), that 1m support was missing from
      that commit.
    - Removed cpp branch of ref_kernels/3/bli_trsm_ref.c that contained the
      omp simd implementation after making a stripped-down copy in 'old'.
      This code has been disabled for some time and it seemed better suited
      to rot away out of sight rather than clutter up a file that is already
      cluttered by the presence of lower and upper versions.
    - Minor comment update to bli_ind_init().

commit c3ed2cbb9f60100fc9beb2a9d75476de9f711dc5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jan 4 16:16:32 2021 -0600

    Enable 1m only if real domain ukr is not reference.
    
    Details:
    - Previously, BLIS would automatically enable use of the 1m method
      for a given precision if the complex domain microkernel was a
      reference kernel. This commit adds an additional constraint so that
      1m is only enabled if the corresponding real domain microkernel is
      NOT reference. That is, BLIS now forgoes use of 1m if both the real and
      complex domain kernels are reference implementations. Note that this
      does not prevent 1m from being enabled manually under those
      conditions; it only means that 1m will not be enabled automatically
      at initialization-time.
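
    Example (illustrative sketch, not code from this commit):
    The decision rule above reduces to a two-input predicate. The function
    name and boolean parameters are hypothetical; the actual check in BLIS
    queries the registered microkernels.

```c
#include <stdbool.h>

/* Returns true if 1m would be enabled automatically for a precision:
   the complex domain microkernel is a reference kernel AND the
   corresponding real domain microkernel is NOT a reference kernel. */
bool demo_auto_enable_1m( bool complex_ukr_is_ref, bool real_ukr_is_ref )
{
    return complex_ukr_is_ref && !real_ukr_is_ref;
}
```

    Only one of the four input combinations enables 1m: a reference
    complex kernel paired with an optimized real kernel.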

commit ed50c947385ba3b0b5d550015f38f7f0a31755c0
Merge: 0cef09aa 328b4f88
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jan 4 14:31:44 2021 -0600

    Merge branch 'master' into dev

commit 328b4f8872b4bca9a53d2de8c6e285f3eb13d196
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Dec 30 17:54:18 2020 -0600

    Shared object (dylib) was not built correctly for partial build.
    
    The SO build rule used $? instead of $^. Observed on macOS, not sure if it affected Linux or not.

commit ae6ef66ef824da9bc6348bf9d1b588cd4f2ded9b
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Dec 30 17:34:55 2020 -0600

    bli_diag_offset_with_trans had wrong return type. Fixes #468.

commit ebcf197fb86fdd0a864ea928140752bc2462e8c6
Merge: 472f138c 21aa67e1
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Dec 5 22:26:27 2020 -0600

    Merge pull request #466 from isuruf/patch-3
    
    fix cc_vendor for crosstool-ng toolchains

commit 21aa67e11cebbc5a6dd7c6353154256294df3c33
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Sat Dec 5 21:59:13 2020 -0600

    fix cc_vendor for crosstool-ng toolchains

commit 472f138cb927b7259126ebb9c68919cfcc7a4ea3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Dec 5 14:13:52 2020 -0600

    Fixed typo in README.md to CodingConventions.md.

commit 0cef09aa92208441a656bf097f197ea8e22b533b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Dec 4 16:40:59 2020 -0600

    Consolidated code in level-3 _front() functions.
    
    Details:
    - Reduced a code segment that appears in all of the bli_*_front()
      functions except for bli_gemm_front(). Previously, the code looked
      like this (taken from bli_herk_front()):
    
        if ( bli_cntx_method( cntx ) == BLIS_NAT )
        {
            bli_obj_set_pack_schema( BLIS_PACKED_ROW_PANELS, &a_local );
            bli_obj_set_pack_schema( BLIS_PACKED_COL_PANELS, &ah_local );
        }
        else // if ( bli_cntx_method( cntx ) != BLIS_NAT )
        {
            pack_t schema_a = bli_cntx_schema_a_block( cntx );
            pack_t schema_b = bli_cntx_schema_b_panel( cntx );
    
            bli_obj_set_pack_schema( schema_a, &a_local );
            bli_obj_set_pack_schema( schema_b, &ah_local );
        }
    
      This code segment is part of a sort-of-hack that allows us to
      communicate the pack schemas into the level-3 thread decorator, which
      needs them so that they can be passed into bli_l3_cntl_create_if(),
      where the control tree is created. However, the first conditional case
      above is unnecessary because the second case is fully generalized.
      That is, even in the native case, the context contains correct,
      queryable schemas. Thus, these code segments were reduced to something
      like:
    
        pack_t schema_a = bli_cntx_schema_a_block( cntx );
        pack_t schema_b = bli_cntx_schema_b_panel( cntx );
    
        bli_obj_set_pack_schema( schema_a, &a_local );
        bli_obj_set_pack_schema( schema_b, &ah_local );
    
      There's always a small chance that the seemingly unnecessary code
      in the first branch case has some special use that is not apparent to
      me, but the testsuite's default input parameters seem to think this
      commit will be fine.

commit 7038bbaa05484141195822291cf3ba88cbce4980
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Dec 4 16:08:15 2020 -0600

    Optionally disable trsm diagonal pre-inversion.
    
    Details:
    - Implemented a configure-time option, --disable-trsm-preinversion, that
      optionally disables the pre-inversion of diagonal elements of the
      triangular matrix in the trsm operation and instead uses division
      instructions within the gemmtrsm microkernels. Pre-inversion is
      enabled by default. When it is disabled, performance may suffer
      slightly, but numerical robustness should improve for certain
      pathological cases involving denormal (subnormal) numbers that would
      otherwise result in overflow in the pre-inverted value. Thanks to
      Bhaskar Nallani for reporting this issue via #461.
    - Added preprocessor macro guards to bli_trsm_cntl.c as well as the
      gemmtrsm microkernels for 'haswell' and 'penryn' kernel sets pursuant
      to the aforementioned feature.
    - Added macros to frame/include/bli_x86_asm_macros.h related to division
      instructions.

commit 78aee79452cce2691c40f05b3632bdfc122300af
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 2 13:02:36 2020 -0600

    Allow amaxv testsuite module to run with dim = 0.
    
    Details:
    - Exit early from libblis_test_amaxv_check() when the vector dimension
      (length) of x is 0. This allows the module to run when the testsuite
      driver passes in a problem size of 0. Thanks to Meghana Vankadari for
      alerting us to this issue via #459.
    - Note: All other testsuite modules appear to work with problem sizes
      of 0, except for the microkernel modules. I chose not to "fix" those
      modules because a failure (or segmentation fault, as happens in this
      case) is actually meaningful in that it alerts the developer that some
      microkernels cannot be used with k = 0. Specifically, the 'haswell'
      kernel set contains microkernels that preload elements of B. Those
      microkernels would need to be restructured to avoid preloading in
      order to support usage when k = 0.

commit 92d2b12a44ee0990c22735472aeaf1c17deb2d9b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 2 13:02:00 2020 -0600

    Fixed obscure testsuite gemmt dependency bug.
    
    Details:
    - Fixed a bug in the gemmt testsuite module that only manifested when
      testing of gemmt is enabled but testing of gemv is disabled. The bug
      was due to a copy-paste error dating back to the introduction of gemmt
      in 88ad841.

commit b43dae9a5d2f078c9bbe07079031d6c00a68b7de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 1 16:44:38 2020 -0600

    Fixed copy-paste bugs in edge-case sup kernels.
    
    Details:
    - Fixed bugs in two sup kernels, bli_dgemmsup_rv_haswell_asm_1x6() and
      bli_dgemmsup_rd_haswell_asm_1x4(), which involved extraneous assembly
      instructions that were left over from when the kernels were first
      written. These instructions would cause segmentation faults in some
      situations where extra memory was not allocated beyond the end of
      the matrix buffers. Thanks to Kiran Varaganti for reporting these
      bugs and to Bhaskar Nallani for identifying the cause and solution.

commit 11dfc176a3c422729f453f6c23204cf023e9954d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 1 19:51:27 2020 +0000

    Reorganized thread auto-factorization logic.
    
    Details:
    - Reorganized logic of bli_thread_partition_2x2() so that the primary
      guts were factored out into "fast" and "slow" variants. Then added
      logic to the "fast" variant that allows for more optimal thread
      factorizations in some situations where there is at least one factor
      of 2.
    - Changed BLIS_THREAD_RATIO_M from 2 to 1 in bli_kernel_macro_defs.h and
      added comments to that file describing BLIS_THREAD_RATIO_? and
      BLIS_THREAD_MAX_?R.
    - In bli_family_zen.h and bli_family_zen2.h, preprocessed out several
      macros not used in vanilla BLIS and removed the unused macro
      BLIS_ENABLE_ZEN_BLOCK_SIZES from the former file.
    - Disabled AMD's small matrix handling entry points in bli_syrk_front.c
      and bli_trsm_front.c. (These branches of small matrix handling have
      not been reviewed by vanilla BLIS developers.)
    - Added commented-out calls to printf() in bli_rntm.c.
    - Whitespace changes to bli_thread.c.
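
    Example (illustrative sketch, not code from this commit):
    The following is not the algorithm in bli_thread_partition_2x2(), and it
    has no "fast" or "slow" variant. It is a simplified exhaustive search,
    under a hypothetical name, showing what a two-way thread factorization
    aims for: a factor pair pm x pn of nt whose ratio tracks the ratio of
    the work dimensions m and n. It assumes nt >= 1.

```c
#include <stdlib.h>

/* Splits nt threads into pm x pn (pm * pn == nt) so that pm/pn is as
   close as possible to m/n. Closeness is measured by |pm*n - pn*m|,
   which is zero exactly when pm/pn == m/n. */
void demo_partition_2x2( long nt, long m, long n, long* pm, long* pn )
{
    long best_pm  = 1;
    long best_pn  = nt;
    long best_err = labs( 1 * n - nt * m );

    for ( long f = 2; f <= nt; ++f )
    {
        if ( nt % f != 0 ) continue;   /* consider exact factor pairs only */

        long err = labs( f * n - ( nt / f ) * m );

        if ( err < best_err )
        {
            best_err = err;
            best_pm  = f;
            best_pn  = nt / f;
        }
    }

    *pm = best_pm;
    *pn = best_pn;
}
```

    For a square problem, 12 threads split as 3 x 4, whereas 7 threads can
    only split as 1 x 7.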

commit 6d3bafacd7aa7ad198762b39490876c172bfbbcb
Author: Devin Matthews <damatthews@smu.edu>
Date:   Sat Nov 28 17:17:56 2020 -0600

    Update BuildSystem.md
    
    Add git version >= 1.8.5 requirement (see #462).

commit 64856ea5a61b01d585750815788b6a775f729647
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 23 16:54:51 2020 -0600

    Auto-reduce (by default) prime numbers of threads.
    
    Details:
    - When requesting multithreaded parallelism by specifying the total
      number of threads (whether it be via environment variable, globally at
      runtime, or locally at runtime), reduce the number of threads actually
      used by one if the original value (a) is prime and (b) exceeds a
      minimum threshold defined by the macro BLIS_NT_MAX_PRIME, which is set
      to 11 by default. If, when specifying the total number of threads (and
      not the individual ways of parallelism for each loop), prime numbers
      of threads are desired, this feature may be overridden by defining the
      BLIS_ENABLE_AUTO_PRIME_NUM_THREADS macro in the bli_family_*.h that
      corresponds to the configuration family targeted at configure-time.
      (For now, there are no configure options to control this feature.)
      Thanks to Jeff Diamond for suggesting this change.
    - Defined a new function in bli_thread.c, bli_is_prime(), that returns a
      bool indicating whether an integer is prime. This function is
      implemented in terms of existing functions in bli_thread.c.
    - Updated docs/Multithreading.md to document the above feature, along
      with unrelated minor edits.
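
    Example (illustrative sketch, not code from this commit):
    The reduction rule can be sketched as below. The names are hypothetical,
    and the primality test uses plain trial division rather than the
    existing bli_thread.c helpers that bli_is_prime() is built on. The
    threshold of 11 mirrors the default stated above.

```c
#include <stdbool.h>

/* Mirrors the default value of BLIS_NT_MAX_PRIME described above. */
#define DEMO_NT_MAX_PRIME 11

/* Trial-division primality test; adequate for thread counts. */
bool demo_is_prime( long n )
{
    if ( n < 2 ) return false;

    for ( long d = 2; d * d <= n; ++d )
        if ( n % d == 0 ) return false;

    return true;
}

/* Returns the number of threads actually used: one fewer than requested
   if the request is prime and exceeds the threshold, else unchanged. */
long demo_adjust_num_threads( long nt )
{
    if ( nt > DEMO_NT_MAX_PRIME && demo_is_prime( nt ) ) return nt - 1;

    return nt;
}
```

    Under this rule a request for 13 threads becomes 12, while requests for
    11 or 7 threads are left alone because they do not exceed the threshold.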

commit 55933b6ff6b9b8a12041715f42bba06273d84b74
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 20 10:39:32 2020 -0600

    Added missing attribution to docs/ReleaseNotes.md.

commit e310f57b4b29fbfee479e0f9fe2040851efdec4f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 19 13:33:37 2020 -0600

    CHANGELOG update (0.8.0)

commit 9b387f6d5a010969727ec583c0cdd067a5274ed8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 19 13:33:37 2020 -0600

    Version file update (0.8.0)

commit 2928ec750d3a3e1e5d55de5b57ddc04e9d0bd796
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 18 18:31:35 2020 -0600

    ReleaseNotes.md update in advance of next version.
    
    Details:
    - Updated docs/ReleaseNotes.md in preparation for next version.

commit b9899bedff6854639468daa7a973bb14ca131a74
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 18 16:52:41 2020 -0600

    CREDITS file update.

commit 9bb23e6c2a44b77292a72093938ab1ee6e6cc26a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 16 15:55:45 2020 -0600

    Added support for systemless build (no pthreads).
    
    Details:
    - Added a configure option, --[enable|disable]-system, which determines
      whether the modest operating system dependencies in BLIS are included.
      The most notable example of this on Linux and BSD/OSX is the use of
      POSIX threads to ensure thread safety for when application-level
      threads call BLIS. When --disable-system is given, the bli_pthreads
      implementation is dummied out entirely, allowing the calling code
      within BLIS to remain unchanged. Why would anyone want to build BLIS
      like this? The motivating example was submitted via #454 in which a
      user wanted to build BLIS for a simulator such as gem5 where thread
      safety may not be a concern (and where the operating system is largely
      absent anyway). Thanks to Stepan Nassyr for suggesting this feature.
    - Another, more minor side effect of the --disable-system option is that
      the implementation of bli_clock() unconditionally returns 0.0 instead
      of the time elapsed since some fixed point in the past. The reasoning
      for this is that if the operating system is truly minimal, the system
      function call upon which bli_clock() would normally be implemented
      (e.g. clock_gettime()) may not be available.
    - Refactored preprocessor-guarded code in bli_pthread.c and bli_pthread.h
      to remove redundancies.
    - Removed old comments and commented #include of "bli_pthread_wrap.h"
      from bli_system.h.
    - Documented bli_clock() and bli_clock_min_diff() in BLISObjectAPI.md
      and BLISTypedAPI.md, with a note that both are non-functional when
      BLIS is configured with --disable-system.
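
    Example (illustrative sketch, not code from this commit):
    The "dummied out" approach can be sketched as follows. The macro
    DEMO_DISABLE_SYSTEM and the demo_* names are hypothetical and do not
    correspond to the actual bli_pthread or bli_clock() definitions. The
    macro is defined in the sketch itself so that the systemless branch is
    the one that compiles.

```c
/* Hypothetical switch standing in for a --disable-system build. */
#define DEMO_DISABLE_SYSTEM

#ifdef DEMO_DISABLE_SYSTEM

/* Systemless branch: same API, no operating system dependencies. */
typedef int demo_mutex_t;

int demo_mutex_lock( demo_mutex_t* m )   { ( void )m; return 0; }
int demo_mutex_unlock( demo_mutex_t* m ) { ( void )m; return 0; }

/* No system clock is assumed to exist, so always report 0.0. */
double demo_clock( void ) { return 0.0; }

#else

/* Hosted branch: forward to POSIX threads and clock_gettime(). */
#include <pthread.h>
#include <time.h>

typedef pthread_mutex_t demo_mutex_t;

int demo_mutex_lock( demo_mutex_t* m )   { return pthread_mutex_lock( m ); }
int demo_mutex_unlock( demo_mutex_t* m ) { return pthread_mutex_unlock( m ); }

double demo_clock( void )
{
    struct timespec ts;
    clock_gettime( CLOCK_MONOTONIC, &ts );
    return ( double )ts.tv_sec + ( double )ts.tv_nsec * 1.0e-9;
}

#endif
```

    Calling code uses demo_mutex_lock() and demo_clock() identically in
    both branches, which is why it can remain unchanged.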

commit 88ad84143414644df4c56733b1cf91a36bfacaf8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Nov 14 09:39:48 2020 -0600

    Squash-merge 'pr' into 'squash'. (#457)
    
    Merged contributions from AMD's AOCL BLIS (#448).
    
    Details:
    - Added support for level-3 operation gemmt, which performs a gemm on
      only the lower or upper triangle of a square matrix C. For now, only
      the conventional/large code path will be supported (in vanilla BLIS).
      This was accomplished by leveraging the existing variant logic for
      herk. However, some of the infrastructure to support a gemmtsup is
      included in this commit, including
      - A bli_gemmtsup() front-end, similar to bli_gemmsup().
      - A bli_gemmtsup_ref() reference handler function.
      - A bli_gemmtsup_int() variant chooser function (with variant calls
        commented out).
    - Added support for inducing complex domain gemmt via the 1m method.
    - Added gemmt APIs to the BLAS and CBLAS compatibility layers.
    - Added gemmt test module to testsuite.
    - Added standalone gemmt test driver to 'test' directory.
    - Documented gemmt APIs in BLISObjectAPI.md and BLISTypedAPI.md.
    - Added a C++ template header (blis.hh) containing a BLAS-inspired
      wrapper to a set of polymorphic CBLAS-like function wrappers defined
      in another header (cblas.hh). These two headers are installed if
      running the 'install' target with INSTALL_HH set to 'yes'. (Also
      added a set of unit tests that exercise blis.hh, although they are
      disabled for now because they aren't compatible with out-of-tree
      builds.) These files now live in the 'vendor' top-level directory.
    - Various updates to 'zen' and 'zen2' subconfigurations, particularly
      within the context initialization functions.
    - Added s and d copyv, setv, and swapv kernels to kernels/zen/1, and
      various minor updates to dotv and scalv kernels. Also added various
      sup kernels contributed by AMD to kernels/zen/3. However, these
      kernels are (for now) not yet used, in part because they caused
      AppVeyor clang failures, and also because I have not found time to
      review and vet them.
    - Output the python found during configure into the definition of PYTHON
      in build/config.mk (via build/config.mk.in).
    - Added early-return checks (A, B, or C with zero dimension; alpha = 0)
      to bli_gemm_front.c.
    - Implemented explicit beta = 0 handling for the sgemm ukernel in
      bli_gemm_armv7a_int_d4x4.c, which was previously missing. This latent
      bug surfaced because the gemmt module verifies its computation using
      gemm with its beta parameter set to zero, which, on a cortexa15 system,
      caused the gemm kernel code to unconditionally multiply the
      uninitialized C data by beta. The C matrix likely contained
      non-numeric values such as NaN, which then would have resulted in a
      false failure.
    - Fixed a bug whereby the implementation for bli_herk_determine_kc(),
      in bli_l3_blocksize.c, was inadvertently being defined in terms of
      helper functions meant for trmm. This bug was probably harmless since
      the trmm code should have also done the right thing for herk.
    - Used cpp macros to neutralize the various AOCL_DTL_TRACE_ macros in
      kernels/zen/3/bli_gemm_small.c since those macros are not used in
      vanilla BLIS.
    - Added cpp guard to definition of bli_mem_clear() in bli_mem.h to
      accommodate C++'s stricter type checking.
    - Added cpp guard to test/*.c drivers that facilitates compilation on
      Windows systems.
    - Various whitespace changes.

commit 234b8b0cf48f1ee965bd7999b291fc7add3b9a54
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 12 19:11:16 2020 -0600

    Increased dotxaxpyf testsuite thresholds.
    
    Details:
    - Increased the test thresholds used by the dotxaxpyf testsuite module
      by a factor of five in order to avoid residuals that unnecessarily
      fall in the MARGINAL range. This commit should fix #455. Thanks to
      @nagsingh for reporting this issue.

commit ed612dd82c50063cfd23576a6b2465213d31b14b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Nov 7 13:09:42 2020 -0600

    Updated README.md with sgemmsup blurb.
    
    Details:
    - Added an entry to the "What's New" section of the README.md to
      announce the availability of sgemmsup.

commit e14424f55b15d67e8d18384aea45a11b9b772e02
Merge: 0cfe1aac eccdd75a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Nov 7 13:02:50 2020 -0600

    Merge branch 'dev'

commit 0cfe1aac222008a78dff3ee03ef5183413936706
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 30 17:10:36 2020 -0500

    Relocated operation index to ToC in API docs.
    
    Details:
    - Moved the "Operation index" section of both the BLISObjectAPI.md and
      BLISTypedAPI.md docs to appear immediately after the table of contents
      of each document. This allows the reader to quickly jump to the
      documentation for any operation without having to scroll through much
      of the document (when rendered via a web browser).
    - Fixed a mistake in the BLISObjectAPI.md for the setd operation, which
      does *not* observe the diag property of its matrix argument. Thanks to
      Jeff Diamond for reporting this.

commit 2a0682f8e5998be536da313525292f0da6193147
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Oct 18 18:04:03 2020 -0500

    Implemented runtime subconfig selection (#451).
    
    Details:
    - Implemented support for the user manually overriding the automatic
      subconfiguration selection that happens at runtime. This override
      can be requested by setting the BLIS_ARCH_TYPE environment variable.
      The variable must be set to the arch_t id (as enumerated in
      bli_type_defs.h) corresponding to the desired subconfiguration. If a
      value outside this enumerated range is given, BLIS will abort with an
      error message. If the value is in the valid range but corresponds to a
      subconfiguration that was not activated at configure-time/compile-time,
      BLIS will abort with a (different) error message. Thanks to decandia50
      for suggesting this feature via issue #451.
    - Defined a new function bli_gks_lookup_id to return the address of an
      internal data structure within the gks. If this address is NULL, then
      it indicates that the subconfig corresponding to the arch_t id passed
      into the function was not compiled into BLIS. This function is used
      in the second of the two abort scenarios described above.
    - Defined the enumerated error code BLIS_UNINITIALIZED_GKS_CNTX, which
      is returned for the latter of the two abort scenarios mentioned above,
      along with a corresponding error message and a function to perform
      the error check.
    - Added cpp macro branching to bli_env.c to support compilation of the
      auto-detect.x executable during configure-time. This cpp branch is
      similar to the cpp code already found in bli_arch.c and bli_cpuid.c.
    - Cleaned up the auto_detect() function to facilitate easier maintenance
      going forward. Also added a convenient debug switch that outputs the
      compilation command for the auto-detect.x executable and exits.
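
    Example (illustrative sketch, not code from this commit):
    The range check on BLIS_ARCH_TYPE can be sketched as below. Only the
    environment variable name comes from this commit; the function names
    and return codes are hypothetical. The sketch reports errors through
    return codes, whereas BLIS aborts with an error message, and it does
    not model the second check for subconfigurations that were not
    compiled in.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result codes for the sketch. */
#define DEMO_ARCH_UNSET   ( -1 )  /* variable not set: use auto-detection */
#define DEMO_ARCH_INVALID ( -2 )  /* not an integer, or out of range */

/* Parses str as an arch id in [0, num_archs). */
int demo_parse_arch_id( const char* str, int num_archs )
{
    if ( str == NULL ) return DEMO_ARCH_UNSET;

    char* end = NULL;
    errno = 0;
    long val = strtol( str, &end, 10 );

    /* Reject overflow, empty strings, and trailing non-digit characters. */
    if ( errno != 0 || end == str || *end != '\0' ) return DEMO_ARCH_INVALID;

    /* Reject ids outside the enumerated range. */
    if ( val < 0 || val >= num_archs ) return DEMO_ARCH_INVALID;

    return ( int )val;
}

/* Reads the override from the environment. */
int demo_arch_from_env( int num_archs )
{
    return demo_parse_arch_id( getenv( "BLIS_ARCH_TYPE" ), num_archs );
}
```

    Separating the parsing from the getenv() call keeps the range check
    easy to exercise without modifying the process environment.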

commit eccdd75a2d8a0c46e91e94036179c49aa5fa601c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 9 15:44:16 2020 -0500

    Whitespace tweak in docs/PerformanceSmall.md.

commit 7677e9ba60ac27496e3421c2acc7c239e3f860e9
Merge: addcd46b a0849d39
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 9 15:41:25 2020 -0500

    Merge branch 'dev' of github.com:flame/blis into dev

commit addcd46b0559d401aa7d33d4c7e6f63f5313a8e0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 9 15:41:09 2020 -0500

    Added Epyc 7742 Zen2 ("Rome") sup perf results.
    
    Details:
    - Added single-threaded and multithreaded sup performance results to
      docs/PerformanceSmall.md for both sgemm and dgemm. These results were
      gathered on an Epyc 7742 "Rome" server featuring AMD's Zen2
      microarchitecture. Special thanks to Jeff Diamond for facilitating
      access to the system via the Oracle Cloud.
    - Updates to octave scripts in test/sup/octave for use with Octave 5.2
      and for use with subplot_tight().
    - Minor updates to octave scripts in test/3/octave.
    - Renamed files containing the previous Zen performance results for
      consistency with the new results.
    - Decreased line thickness slightly in large/conventional Zen2 graphs.
      I'm done tweaking those this time. Really.
    - Added missing line regarding eigen header installation for each
      microarchitecture section.

commit a0849d390d04067b82af937cda8191b049b98915
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 9 20:22:17 2020 +0000

    Register l3 sup kernels in zen2 subconfig.
    
    Details:
    - Registered full suite of sgemm and dgemm sup millikernels, blocksizes,
      and crossover thresholds in bli_cntx_init_zen2.c.
    - Minor updates to test/sup/runme.sh for running on Zen2 Epyc 7742
      system.

commit d98368c32d5fbfaab8966ee331d9bcb5c4fe7a59
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 8 19:05:51 2020 -0500

    Another tweak to line thickness of Zen2 graphs.

commit 1855dfbdaafa37892b36c97fd317fd5d8da76676
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 8 19:01:00 2020 -0500

    Tweaked line thickness in Zen2 graphs once more.
    
    Details:
    - Decreased (relative to previous commit) line thickness in recent Zen2
      graphs.

commit 0991611e7ed82889c53a5c3f1ef1d49552c50d61
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 8 18:54:49 2020 -0500

    Increased line thickness in recent Zen2 graphs.
    
    Details:
    - Increased the width of the lines in the graphs introduced in 74ec6b8.

commit 8273cbacd7799e9af59e5320d66055f2f5d9cb31
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 7 14:51:33 2020 -0500

    README.md, docs/FAQ.md updates.
    
    Details:
    - Added a frequently asked question to docs/FAQ.md regarding the
      difference between upstream (vanilla) BLIS and AMD BLIS.
    - Updated the name of ICES in the README.md to reflect the Oden
      rebranding.

commit a178a822ad3d5021489a0e61f909d8550ae12a8f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Sep 30 16:00:52 2020 -0500

    Added Zen2 links to docs/Performance.md Contents.

commit 74ec6b8f457cabe37d2382aaab35ba04fc737948
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Sep 30 15:54:18 2020 -0500

    Added Epyc 7742 Zen2 ("Rome") performance results.
    
    Details:
    - Added single-threaded and multithreaded performance results to
      docs/Performance.md. These results were gathered on an Epyc 7742
      "Rome" server with AMD's Zen2 microarchitecture. Special thanks
      to Jeff Diamond for facilitating access to the system via the
      Oracle Cloud.
    - Renamed files containing the previous Zen performance results for
      consistency with the new results.

commit bc4a213a2c3dcf8bbfcbb3a1ef3e9fc9e3226c34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Sep 30 15:28:20 2020 -0500

    Updated matlab (now octave) plot code in test/3.
    
    Details:
    - Renamed test/3/matlab to test/3/octave.
    - Within test/3, updated and tuned plot_l3_perf.m and plot_panel_4x5.m
      files for use with octave (which is free and doesn't crash on me
      mid-way through my use of subplot).
    - Updated runthese.m scratchpad for zen2 invocations.
    - Added Nikolay S.'s subplot_tight() function, along with its license.

commit c77ddc418187e1884fa6bcfe570eee295b9cb8bc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Sep 30 20:15:43 2020 +0000

    Added optional numactl usage to test/3/runme.sh.

commit 2d8ec164e7ae4f0c461c27309dc1f5d1966eb003
Author: Nicholai Tukanov <nicholai@utexas.edu>
Date:   Tue Sep 29 16:52:18 2020 -0500

    Add POWER10 support to BLIS (#450)

commit 4fd8d9fec2052257bf2a5c6e0d48ae619ff6c3e4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 28 23:39:05 2020 +0000

    Tweaked zen2 subconfig's MC cache blocksizes.
    
    Details:
    - Updated the MC cache blocksizes registered by the 'zen2' subconfig.
    - Minor updates to test/3/Makefile and test/3/runme.sh.

commit 5efcdeffd58af621476d179afc0c19c0f912baa8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Sep 25 14:25:24 2020 -0500

    More minor README.md updates.

commit 9e940f8aad6f065ea1689e791b9a4e1fb7900c40
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Sep 25 13:53:35 2020 -0500

    Added 1m SISC bibtex to README.md.
    
    Details:
    - Added final citation info to 1m bibtex in README.md file.
    - Updated draft 1m paper link.
    - Changed some http to https.

commit e293cae2d1b9067261f613f25eaa0e871356b317
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 15 16:09:11 2020 -0500

    Implemented sgemmsup assembly kernels.
    
    Details:
    - Created a set of single-precision real millikernels and microkernels
      comparable to the dgemmsup kernels that already exist within BLIS.
    - Added prototypes for all kernels within bli_kernels_haswell.h.
    - Registered entry-point millikernels in bli_cntx_init_haswell.c and
      bli_cntx_init_zen.c.
    - Added sgemmsup support to the Makefile, runme.sh script, and source
      file in test/sup. This included edits that allow for separate "small"
      dimensions for single- and double-precision as well as for single-
      vs. multithreaded execution.

commit 2765c6f37c11cb7f71cd4b81c64cea6130636c68
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Sep 12 17:48:15 2020 -0500

    Type saga continues; fixed sgemm ukernel signature.
    
    Details:
    - Changed double* pointers in sgemm function signature to float*. At
      this point I've lost track of whether this was my fault or another
      dormant bug like the one described in ece9f6a, but at this point I
      no longer care. It's one of those days (aka I didn't ask for this).

commit 0779559509e0a1af077530d09ed151dac54f32ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Sep 12 17:37:21 2020 -0500

    Fixed missing restrict in knl sgemm prototype.
    
    Details:
    - Added a missing 'restrict' qualifier in the sgemm ukernel prototype
      for knl. (Not sure how that code was ever compiling before now.)

commit ece9f6a3ef1b26b53ecf968cd069df7a85b139fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Sep 12 17:22:42 2020 -0500

    Fixed dormant type bugs in bli_kernels_knl.h.
    
    Details:
    - Fixed dormant type mismatches in the use of the prototype-generating
      macros in bli_kernels_knl.h. Specifically, some float prototypes
      were incorrectly using double as their ctype. This didn't actually
      matter until the type changes in 645d771, as previously those types
      were not used since packm was prototyped with void* pointers.

commit 8ebb3b60e1c4c045ddb48e02de6e246cecde24a4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Sep 12 17:00:47 2020 -0500

    Fixed accidental breakage in 645d771.
    
    Details:
    - In trying to clean up kappa_cast variables in the reference packm
      kernels, which I initially believed to be redundant given the other
      void* -> ctype* changes in 645d771, I accidentally ended up violating
      restrict semantics for 1e/1r packing and possibly other packm kernels.
      (Normally, my pre-commit testsuite run would have caught this, but I
      was unknowingly using an edited input.operations file in which I'd
      disabled most tests as part of unrelated work.) This commit reverts
      the kappa_cast changes in 645d771.

commit 645d771a14ae89aa7131d6f8f4f4a8090329d05e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Sep 12 15:31:56 2020 -0500

    Minor packm kernel type cleanup (void* -> ctype*).
    
    Details:
    - Changed all void* function arguments in reference packm kernels to
      those of the native type (ctype*). These pointers no longer need to
      be void* and are better represented by their native types anyway.
      (See below for details.) Updated knl packm kernels accordingly.
    - In the definition of the PACKM_KER_PROT prototype macro template in
      frame/1m/bli_l1m_ker_prot.h, changed the pointer types for kappa, a,
      and p from void* to ctype*. They were originally void* because these
      function signatures had to share the same type so they could all be
      stored in a single array of that shared type, from which they were
      queried and called by packm_cxk(). This is no longer how the function
      pointers are stored, and so it no longer makes sense to force the
      caller of packm kernels to use void*, only so that the implementor
      of the packm kernels can typecast back to the native datatype within
      the kernel definition. This change has no effect internally within
      BLIS because currently all packm kernels are called after querying
      the function addresses from the context and then typecasting to the
      appropriate function pointer type, which is based upon type-specific
      function pointers like float* and double*.
    - Removed a comment in frame/1m/bli_l1m_ft_ker.h that was outdated and
      misleading due to changes to the handling of packm kernels since
      moving them into the context.
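
    To illustrate the void* -> ctype* change, here is a simplified sketch
    of the two prototype styles for a double-precision kernel. The function
    names and the reduced argument list are assumptions made for brevity;
    real packm kernels take more arguments (conjugation, strides, etc.).

```c
#include <assert.h>
#include <stddef.h>

/* Older style: every pointer is void*, so the kernel must cast back to
   the native type before doing any arithmetic. */
static void pack_scale_void(size_t n, const void* kappa, const void* a, void* p)
{
    const double* kappa_cast = kappa;
    const double* a_cast     = a;
    double*       p_cast     = p;

    assert(kappa_cast != NULL);
    for (size_t i = 0; i < n; ++i)
        p_cast[i] = (*kappa_cast) * a_cast[i];
}

/* Newer style: the prototype carries the native type, so no casts are
   needed in the kernel and the compiler can check the caller's types. */
static void pack_scale_d(size_t n, const double* kappa, const double* a, double* p)
{
    assert(kappa != NULL);
    for (size_t i = 0; i < n; ++i)
        p[i] = (*kappa) * a[i];
}
```

    Both versions compute the same result; the typed version simply moves
    the type information from the kernel body into the signature.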

commit 54bf6c35542a297e25bc8efec6067a6df80536f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 10 15:42:01 2020 -0500

    Minor README.md update.
    
    Details:
    - Added a new entry to the "What people are saying about BLIS" section.

commit e50b4d40462714ae33df284655a2faf7fa35f37c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Sep 9 14:12:53 2020 -0500

    Minor update to README.md (SIAM Best Paper Prize).

commit a8efb72074691e2610372108becd88b4b392299e
Merge: b0c4da17 97e87f2c
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Sep 7 16:18:19 2020 -0500

    Merge pull request #434 from flame/intel-zdot
    
    Add an option to change the complex return type.

commit 97e87f2c9f3878a05e1b7c6ec237ee88d9a72a42
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 7 15:56:42 2020 -0500

    Whitespace/comment updates to #434 PR.

commit b0c4da1732b6c6a9ff66f70c36e4722e0f9645ae
Merge: 810e90ee b1b5870d
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Sep 7 15:47:54 2020 -0500

    Merge pull request #436 from flame/s390x
    
    Add checks so that s390x is detected as 64-bit.

commit 810e90ee806510c57504f0cf8eeaf608d38bd9dd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 1 16:11:40 2020 -0500

    Minor README.md update.
    
    Details:
    - Added HPE to list of funders.
    - Changed http to https in funders' website links.

commit 7d411282196e036991c26e52cb5e5f85769c8059
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Aug 13 17:50:58 2020 -0500

    Use -O2 for all framework code. (#435)
    
    It seems that -O3 might be causing intermittent problems with the f2c'ed packed and banded code. -O3 is retained for kernel code. Fixes #341 and fixes #342.

commit 9c5b485d356367b0a1288761cd623f52036e7344
Author: Dave Love <dave.love@manchester.ac.uk>
Date:   Fri Aug 7 20:11:18 2020 +0000

    Don't override -mcpu with -march on ARM (#353)
    
    * Use -mcpu for ARM
    See the GCC doc about -march, -mtune, and -mcpu and maybe
    https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu
    
    * Fix typo in flags
    
    * Fix typo in cortexa9 flags
    
    * Modify cortexa53 compilation flags to fix failing BLAS check (#341)

commit c253d14a72a746b670b3ffbb6e81bcafc73d1133
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Aug 7 09:39:04 2020 -0500

    Also handle Intel-style complex return in CBLAS interface.

commit 5d653a11a0cc71305d0995507b1733995856f475
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Aug 6 17:58:26 2020 -0500

    Update Multithreading.md
    
    Addresses the issue raised in #426.

commit b1b5870dd3f9b1c78cf5f58a53514d73f001fc4c
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Aug 6 17:34:20 2020 -0500

    Add checks so that s390x is detected as 64-bit.

commit 882dcb11bfc9ea50aa2f9044621833efd90d42be
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 6 17:28:14 2020 -0500

    Mention example code at top of documentation docs.
    
    Details:
    - Steer the reader towards the example code section of each
      documentation doc (object and typed).
    - Trivial update to examples/oapi/README, examples/tapi/README.

commit f4894512e5bf56ff83701c07dd02972e300741a5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 6 17:20:00 2020 -0500

    Very minor updates to previous commit.

commit adedb893ae8dfacd1dc54035979e15c44d589dbb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 6 17:14:01 2020 -0500

    Documented mutator functions in BLISObjectAPI.md.
    
    Details:
    - Added documentation for commonly-used object mutator functions in
      BLISObjectAPI.md. Previously, only accessor functions were documented.
      Thanks to Jeff Diamond for pointing out this omission.
    - Explicitly set the 'diag' property of objects in oapi example modules
      (08level2.c and 09level3.c).

commit 5b5278ff494888509543a79c09ea82089f6c95d9
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Aug 6 14:19:37 2020 -0500

    Use #ifdef instead of #if as macro may be undefined.

commit 7fdc0fc893d0c6727b725ea842053b65be2c20ba
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Aug 6 14:03:55 2020 -0500

    Add an option to change the complex return type.
    
    ifort apparently does not return complex numbers in registers as in C/C++ (or gfortran), but instead creates a "hidden" first parameter for the return value. The option --complex-return=gnu|intel has been added, as well as a guess based on a provided FC if not specified (otherwise default to gnu). This option affects the signatures of cdotc, cdotu, zdotc, and zdotu, and a single library cannot be used with both GNU and Intel Fortran compilers. Fixes #433.
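
    The two conventions can be sketched from the C side as follows. This is
    an illustration only: the type and function names are hypothetical, and
    the argument lists are simplified relative to the real BLAS interfaces
    (which pass n and the increments by pointer).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical double-precision complex type. */
typedef struct { double real; double imag; } zcomplex_t;

/* Shared computation: rho = sum_i conj(x[i]) * y[i]. */
static zcomplex_t dotc_kernel(size_t n, const zcomplex_t* x, const zcomplex_t* y)
{
    zcomplex_t rho = { 0.0, 0.0 };

    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i) {
        rho.real += x[i].real * y[i].real + x[i].imag * y[i].imag;
        rho.imag += x[i].real * y[i].imag - x[i].imag * y[i].real;
    }
    return rho;
}

/* "gnu" style: the complex result is the function's return value. */
static zcomplex_t zdotc_gnu_style(size_t n, const zcomplex_t* x, const zcomplex_t* y)
{
    return dotc_kernel(n, x, y);
}

/* "intel" style: the result is written through an extra first
   parameter and the function itself returns nothing. */
static void zdotc_intel_style(zcomplex_t* rho, size_t n,
                              const zcomplex_t* x, const zcomplex_t* y)
{
    assert(rho != NULL);
    *rho = dotc_kernel(n, x, y);
}
```

    Because the two signatures differ in both argument count and return
    type, a caller compiled for one convention cannot safely call a library
    built for the other, which is why the choice is made at configure-time.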

commit 6e522e5823b762d4be09b6acdca30faafba56758
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 30 19:31:37 2020 -0500

    Mention disabling of sup in docs/Sandboxes.md.
    
    Details:
    - Added language to remind the reader to disable sup if the intended
      behavior is for the sandbox implementation to handle all problem
      sizes, even the smaller ones that would normally be handled by the
      sup code path.

commit 00e14cb6d849e963a2e1ac35e7dbbe186af00a58
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 29 14:24:34 2020 -0500

    Replaced use of bool_t type with C99 bool.
    
    Details:
    - Textually replaced nearly all non-comment instances of bool_t with the
      C99 bool type. A few remaining instances, such as those in the files
      bli_herk_x_ker_var2.c, bli_trmm_xx_ker_var2.c, and
      bli_trsm_xx_ker_var2.c, were promoted to dim_t since they were being
      used not for boolean purposes but to index into an array.
    - This commit constitutes the third phase of a transition toward using
      C99's bool instead of bool_t, which was raised in issue #420. The first
      phase, which cleaned up various typecasts in preparation for using
      bool as the basis for bool_t (instead of gint_t), was implemented by
      commit a69a4d7. The second phase, which redefined the bool_t typedef
      in terms of bool (from gint_t), was implemented by commit 2c554c2.

commit 2c554c2fce885f965a425e727a0314d3ba66c06d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 24 15:57:19 2020 -0500

    Redefined bool_t typedef in terms of C99 bool.
    
    Details:
    - Changed the typedef that defines bool_t from:
    
        typedef gint_t bool_t;
    
      where gint_t is a signed integer that forms the basis of most other
      integers in BLIS, to:
    
        typedef bool bool_t;
    
    - Changed BLIS's TRUE and FALSE macro definitions from being in terms of
      integer literals:
    
        #define TRUE  1
        #define FALSE 0
    
      to being in terms of C99 boolean constants:
    
        #define TRUE  true
        #define FALSE false
    
      which are provided by stdbool.h.
    - This commit constitutes the second phase of a transition toward using
      C99's bool instead of bool_t, which will address issue #420. The first
      phase, which cleaned up various typecasts in preparation for using
      bool as the basis for bool_t (instead of gint_t), was implemented by
      commit a69a4d7.
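
    One observable consequence of basing a boolean type on C99 bool rather
    than on an integer is that conversion to bool normalizes any non-zero
    value to 1. A small sketch, using hypothetical typedef names (and long
    as a stand-in for gint_t):

```c
#include <assert.h>
#include <stdbool.h>

typedef long int_based_bool_t; /* stand-in for an integer-based bool_t */
typedef bool c99_bool_t;       /* stand-in for a bool-based bool_t     */

/* Stores v in an integer-based boolean and reads it back unchanged. */
static long store_as_int_bool(long v)
{
    int_based_bool_t b = (int_based_bool_t)v;
    return (long)b;
}

/* Stores v in a C99 bool; any non-zero v becomes 1. */
static long store_as_c99_bool(long v)
{
    c99_bool_t b = (c99_bool_t)v;
    return (long)b;
}

/* Self-check of the behavior described above. */
static void bool_semantics_demo(void)
{
    assert(store_as_int_bool(4) == 4);
    assert(store_as_c99_bool(4) == 1);
    assert(store_as_c99_bool(0) == 0);
}
```

    Code that relied on a "boolean" holding values other than 0 or 1 would
    therefore behave differently after the typedef change.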

commit e01dd125581cec87f61e15590922de0dc938ec42
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 24 15:41:46 2020 -0500

    Fail-safe updates to Makefiles in 'test' dir.
    
    Details:
    - Updated Makefiles in test, test/3, and test/sup so that running any of
      the usual targets without having first built BLIS results in a helpful
      error message. For example, if BLIS is not yet configured, make will
      output:
    
        Makefile:327: *** Cannot proceed: config.mk not detected! Run
        configure first.  Stop.
    
      Similarly, if BLIS is configured but not yet built, make will output:
    
        Makefile:340: *** Cannot proceed: BLIS library not yet built! Run
        make first.  Stop.
    
      In previous commits, these actions would result in a rather cryptic
      make error such as:
    
        make: *** No rule to make target 'test_sgemm_2400_asm_blis_st.x',
        needed by 'blis-nat-st'.  Stop.

commit b4f47f7540062da3463e2cb91083c12fdda0d30a
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Jul 24 13:56:13 2020 -0500

    Add BLIS_EXPORT_BLIS to bli_abort. (#429)
    
    Fixes #428.

commit a69a4d7e2f4607c919db30b14535234ce169c789
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 22 16:13:09 2020 -0500

    Cleaned up bool_t usage and various typecasts.
    
    Details:
    - Fixed various typecasts in
    
        frame/base/bli_cntx.h
        frame/base/bli_mbool.h
        frame/base/bli_rntm.h
        frame/include/bli_misc_macro_defs.h
        frame/include/bli_obj_macro_defs.h
        frame/include/bli_param_macro_defs.h
    
      that were missing or being done improperly/incompletely. For example,
      many return values were being typecast as
        (bool_t)x && y
      rather than
        (bool_t)(x && y)
      Thankfully, none of these deficiencies had manifested as actual bugs
      at the time of this commit.
    - Changed the return type of bli_env_get_var() from dim_t to gint_t.
      This reflects the fact that bli_env_get_var() needs to be able to
      return a signed integer, and even though dim_t is currently defined
      as a signed integer, it does not intuitively appear to necessarily be
      signed by inspection (i.e., an integer named "dim_t" for matrix
      "dimension"). Also, updated use of bli_env_get_var() within
      bli_pack.c to reflect the changed return type.
    - Redefined type of thrcomm_t.barrier_sense field from bool_t to gint_t
      and added comments to the bli_thrcomm_*.h files that will explain a
      planned replacement of bool_t with C99's bool type.
    - Note: These changes are being made to facilitate the substitution of
      'bool' for 'bool_t', which will eliminate the namespace conflict with
      arm_sve.h as reported in issue #420. This commit implements the first
      phase of that transition. Thanks to RuQing Xu for reporting this
      issue.
    - CREDITS file update.
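
    The typecast issue above comes down to operator precedence: a cast
    binds more tightly than any binary operator, so it applies to the first
    operand only. The sketch below uses == rather than && (an assumption
    made purely so that the two forms give visibly different results) and
    uses C99 bool as the cast target; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* The cast applies to x alone: x is first converted to 0 or 1, and
   that value is then compared against y. */
static bool cast_operand_only(int x, int y)
{
    return (bool)x == y;
}

/* The cast applies to the result of the whole comparison. */
static bool cast_whole_expr(int x, int y)
{
    return (bool)(x == y);
}

/* Self-check: with x == y == 2 the two forms disagree. */
static void cast_precedence_demo(void)
{
    assert(cast_operand_only(2, 2) == false); /* 1 == 2 */
    assert(cast_whole_expr(2, 2) == true);    /* 2 == 2 */
}
```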

commit a6437a5c11d364c6c88af527294d29734d7cc7d6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 20 19:21:07 2020 -0500

    Replaced broken ref99 sandbox w/ simpler version.
    
    Details:
    - The 'ref99' sandbox was broken by multiple refactorings and internal
      API changes over the last two years. Rather than try to fix it, I've
      replaced it with a much simpler version based on var2 of gemmsup.
      Why not fix the previous implementation? It occurred to me that the
      old implementation was trying to be a lightly simplified duplication
      of what exists in the framework. Duplication aside, this sandbox
      would have worked fine if it had been completely independent of the
      framework code. The problem was that it was only partially
      independent, with many function calls calling a function in BLIS
      rather than a duplicated/simplified version within the sandbox. (And
      the reason I didn't make it fully independent to begin with was that
      it seemed unnecessarily duplicative at the time.) Maintaining two
      versions of the same implementation is problematic for obvious
      reasons, especially when it wasn't even done properly to begin with.
      This explains the reimplementation in this commit. The only catch is
      that the newer implementation is single-threaded only and does not
      perform any packing on either input matrix (A or B). Basically, it's
      only meant to be a simple placeholder that shows how you could plug
      in your own implementation. Thanks to Francisco Igual for reporting
      this brokenness.
    - Updated the three reference gemmsup kernels (defined in
      ref_kernels/3/bli_gemmsup_ref.c) so that they properly handle
      conjugation of conja and/or conjb. The general storage kernel, which
      is currently identical to the column-storage kernel, is used in the
      new ref99 sandbox to provide basic support for all datatypes
      (including scomplex and dcomplex).
    - Minor updates to docs/Sandboxes.md, including adding the threading
      and packing limitations to the Caveats section.
    - Fixed a comment typo in bli_l3_sup_var1n2m.c (upon which the new
      sandbox implementation is based).

commit bca040be9da542dd9c75d91890fa7731841d733d
Merge: 2605eb4d 171ecc1d
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Jul 20 09:27:30 2020 -0500

    Merge pull request #425 from gmargari/patch-1
    
    Update Multithreading.md

commit 171ecc1dc6f055ea39da30e508f711b49a734359
Author: Giorgos Margaritis <gmargari@protonmail.com>
Date:   Mon Jul 20 12:24:06 2020 +0300

    Update Multithreading.md

commit 2605eb4d99d3813c37a624c011aa2459324a6d89
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 15 15:25:19 2020 -0500

    Added missing rv_d?x6 edge cases to sup kernel.
    
    Details:
    - Added support to bli_gemmsup_rv_haswell_asm_d6x8n.c for handling
      various n = 6 edge cases with a single sup kernel call. Previously,
      only n = {4,2,1} were handled explicitly as single kernel calls;
      that is, cases where n = 6 were previously being executed via two
      kernel calls (n = 4 and n = 2).
    - Added commented debug line to testsuite's test_libblis.c.

commit 72f6ed0637dfcb021de04ac7d214d5c87e55d799
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 3 17:55:54 2020 -0500

    Declare/define static functions via BLIS_INLINE.
    
    Details:
    - Updated all static function definitions to use the cpp macro
      BLIS_INLINE instead of the static keyword. This allows blis.h to
      use a different keyword (inline) to define these functions when
      compiling with C++, which might otherwise trigger "defined but
      not used" warning messages. Thanks to Giorgos Margaritis for
      reporting this issue and Devin Matthews for suggesting the fix.
    - Updated the following files, which are used by configure's
      hardware auto-detection facility, to unconditionally #define
      BLIS_INLINE to the static keyword (since we know BLIS will be
      compiled with C, not C++):
        build/detect/config/config_detect.c
        frame/base/bli_arch.c
        frame/base/bli_cpuid.c
    - CREDITS file update.
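
    A minimal sketch of the macro pattern described above, assuming the
    selection is keyed on __cplusplus. The actual definition in blis.h may
    differ, and min_dim() is a hypothetical helper used only to show the
    macro in use.

```c
#include <assert.h>

/* Choose the keyword for header-defined helper functions based on the
   language of the translation unit that includes the header. */
#ifdef __cplusplus
  #define BLIS_INLINE inline
#else
  #define BLIS_INLINE static
#endif

/* A helper defined via the macro rather than with a hard-coded
   'static' keyword. */
BLIS_INLINE int min_dim(int a, int b)
{
    assert(a >= 0 && b >= 0);
    return (a < b) ? a : b;
}
```

    Files that are known to be compiled only as C, such as those used by
    the auto-detection facility, can skip the branch and define the macro
    to static unconditionally.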

commit 5fc701ac5f94c6300febbb2f24e731aa34f0f34a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 1 15:48:58 2020 -0500

    Added -fomit-frame-pointer option to CKOPTFLAGS.
    
    Details:
    - Added the -fomit-frame-pointer compiler option to the CKOPTFLAGS
      variable in the following make_defs.mk files:
        config/haswell/make_defs.mk
        config/skx/make_defs.mk
      as well as comments that mention why the compiler option is needed.
      This option is needed to prevent the compiler from using the rbp
      frame register (in the very early portion of kernel code, typically
      where k_iter and k_left are defined and computed), which, as of
      1c719c9, is used explicitly by the gemmsup millikernels. Thanks to
      Devin Matthews for identifying this missing option and to Jeff
      Diamond for reporting the original bug in #417.
    - The file
        config/zen/amd_config.mk
      which feeds into the make_defs.mk for both zen and zen2 subconfigs,
      was also touched, but only to add a commented-out compiler option
      (and the aforementioned explanatory comment) since that file already
      uses -fomit-frame-pointer in COPTFLAGS, which forms the basis of
      CKOPTFLAGS.

commit 6af59b705782dada47e45df6634b479fe781d4fe
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 1 14:54:23 2020 -0500

    Fixed disabled edge case optimization in gemmsup.
    
    Details:
    - Fixed an inadvertently disabled edge case optimization in the two
      gemmsup variants in bli_l3_sup_var1n2m.c. Background: These edge case
      optimizations allow the last millikernel operation in the jr loop to
      be executed with an inflated register blocksize if it is the last
      (or only) iteration. For example, suppose mr=6 and nr=8 and the gemmsup
      problem is m=8, n=100, k=100. (In this case, the panel-block variant
      (var1n) is executed, which places the jr loop in the m dimension.)
      In principle, this problem could be executed as two millikernels: one
      with dimensions 6x100x100, and one as 2x100x100. However, with the
      support for inflated blocksizes in the kernel, the entire 8x100x100
      problem can be passed to the millikernel function, which will then
      execute it more favorably as two 4x100x100 millikernel sub-calls.
      Now, this optimization is disabled under certain circumstances, such
      as when multithreading. Previously, the is_mt predicate was being set
      incorrectly such that it was non-zero even when running
      single-threaded.
    - Upon fixing the is_mt issue above, another bit of code needed to be
      moved so that the result of the optimization could have an impact on
      the assignment of loop bounds ranges to threads.
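
    The blocksize inflation can be sketched as a partitioning routine. This
    is not the code in bli_l3_sup_var1n2m.c; partition_dim() and its
    parameters (including mr_extra, the largest remainder the kernel is
    assumed to absorb) are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Splits a dimension of size m into chunks of size mr, writing the
   chunk sizes to out[] (capacity cap) and returning the chunk count.
   If allow_inflate is non-zero and the remainder is at most mr_extra,
   the remainder is folded into the final full chunk instead of being
   emitted as its own (small) chunk. */
static int partition_dim(int m, int mr, int mr_extra, int allow_inflate,
                         int* out, int cap)
{
    assert(m >= 0 && mr > 0 && mr_extra >= 0 && out != 0);

    int n_full = m / mr;
    int left   = m % mr;
    int count  = 0;
    int fold   = allow_inflate && n_full > 0 && left > 0 && left <= mr_extra;

    for (int i = 0; i < n_full; ++i) {
        int size = mr;
        if (fold && i == n_full - 1)
            size += left; /* inflated final chunk */
        assert(count < cap);
        out[count++] = size;
    }
    if (!fold && left > 0) {
        assert(count < cap);
        out[count++] = left; /* separate edge-case chunk */
    }
    return count;
}
```

    With m=8, mr=6, mr_extra=2, inflation yields a single chunk of 8 (which
    the kernel may then split internally, e.g. as 4+4), whereas without it
    the result is chunks of 6 and 2. Passing allow_inflate=0 models the
    case where the optimization is disabled, such as when multithreading.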

commit b37634540fab0f9b8d4751b8356ee2e17c9e3b00
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 25 16:05:12 2020 -0500

    Support ldims, packing in sup/test drivers.
    
    Details:
    - Updated the test/sup source file (test_gemm.c) and Makefile to support
      building matrices with small or large leading dimensions, and updated
      runme.sh to support executing both kinds of test drivers.
    - Updated runme.sh to allow for executing sup drivers with unpacked (the
      default) or packed matrices (via setting BLIS_PACK_A, BLIS_PACK_B
      environment variables), and for capturing output to files that encode
      both the leading dimension (small or large) and packing status into
      the filenames.
    - Consolidated octave scripts in test/sup/octave_st, test/sup/octave_mt
      into test/sup/octave and updated the octave code in that consolidated
      directory to read the new output filename format (encoding ldim and
      packing). Also added comments and streamlined code, particularly in
      plot_panel_trxsh.m. Tested the octave scripts with octave 5.2.0.
    - Moved old octave_st, octave_mt directories to test/sup/old.

commit ceb9b95a96cc3844ecb43d9af48ab289584e76b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 18 17:15:25 2020 -0500

    Fixed incorrect link to shiftd in BLISTypedAPI.md.
    
    Details:
    - Previously, the entry for shiftd in the Operation index section of
      BLISTypedAPI.md was incorrectly linking to the shiftd operation entry
      in BLISObjectAPI.md. This has been fixed. Thanks to Jeff Diamond for
      helping find this incorrect link.

commit b3c42016818797f79e55b32c8b7d090f9d0aa0ea
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 18 14:00:56 2020 -0500

    CREDITS file update.

commit 31af73c11abae03248d959da0f81eacea015b57a
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Thu Jun 18 13:35:54 2020 -0500

    Expand windows instructions (#414)
    
    * Expand windows instructions
    
    * Windows: both static and shared don't work at the same time

commit b5b604e106076028279e6d94dc0e51b8ad48e802
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 17 16:42:24 2020 -0500

    Ensure random objects' 1-norms are non-zero.
    
    Details:
    - Fixed an innocuous bug that manifested when running the testsuite on
      extremely small matrices with randomization via the "powers of 2 in
      narrow precision range" option enabled. When the randomization
      function emits a perfect 0.0 to fill a 1x1 matrix, the testsuite will
      then compute 0.0/0.0 during the normalization process, which leads to
      NaN residuals. The solution entails smarter implementations of randv,
      randnv, randm, and randnm, each of which will compute the 1-norm of
      the vector or matrix in question. If the object has a 1-norm of 0.0,
      the object is re-randomized until the 1-norm is not 0.0. Thanks to
      Kiran Varaganti for reporting this issue (#413).
    - Updated the implementation of randm_unb_var1() so that it loops over
      a call to the randv_unb_var1() implementation directly rather than
      calling it indirectly via randv(). This was done to avoid the overhead
      of multiple calls to norm1v() when randomizing the rows/columns of a
      matrix.
    - Updated comments.
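
    The re-randomize-until-non-zero approach can be sketched as follows.
    The generator below is a hypothetical stand-in (it is not the BLIS
    "powers of 2 in narrow precision range" implementation); it is written
    only so that exact zeros occur often enough to exercise the retry loop.

```c
#include <assert.h>
#include <stdlib.h>

/* 1-norm of a vector: the sum of absolute values. */
static double norm1(size_t n, const double* x)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += (x[i] < 0.0) ? -x[i] : x[i];
    return sum;
}

/* Hypothetical generator: each element is either exactly 0.0 or a
   signed power of two between 1/64 and 1. */
static void rand_fill(size_t n, double* x)
{
    for (size_t i = 0; i < n; ++i) {
        int    r = rand() % 8;
        double v = (r == 0) ? 0.0 : 1.0 / (double)(1 << (r - 1));
        x[i] = (rand() % 2) ? v : -v;
    }
}

/* Repeats the randomization until the 1-norm is non-zero, so that a
   later normalization never divides 0.0 by 0.0. */
static void rand_fill_nonzero(size_t n, double* x)
{
    assert(n > 0 && x != NULL);
    do {
        rand_fill(n, x);
    } while (norm1(n, x) == 0.0);
}
```

    The retry matters most for very small objects: a 1x1 matrix has a
    non-trivial chance of being all zeros, while a large one almost never
    is.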

commit 35e38fb693e7cbf2f3d7e0505a63b2c05d3f158d
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Tue Jun 16 10:59:41 2020 -0500

    Fix typo in FAQ

commit 1c719c91a3ef0be29a918097652beef35647d4b2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 4 17:21:08 2020 -0500

    Bugfixes, cleanup of sup dgemm ukernels.
    
    Details:
    - Fixed a few not-really-bugs:
      - Previously, the d6x8m kernels were still prefetching the next upanel
        of A using MR*rs_a instead of ps_a (same for prefetching of next
        upanel of B in d6x8n kernels using NR*cs_b instead of ps_b). Given
        that the upanels might be packed, using ps_a or ps_b is the correct
        way to compute the prefetch address.
      - Fixed an obscure bug in the rd_d6x8m kernel that, by dumb luck,
        executed as intended even though it was based on faulty pointer
        management. Basically, in the rd_d6x8m kernel, the pointer for B
        (stored in rdx) was loaded only once, outside of the jj loop, and in
        the second iteration its new position was calculated by incrementing
        rdx by the *absolute* offset (four columns), which happened to be the
        same as the relative offset (also four columns) that was needed. It
        worked only because that loop only executed twice. A similar issue
        was fixed in the rd_d6x8n kernels.
    - Various cleanups and additions, including:
      - Factored out the loading of rs_c into rdi in rd_d6x8[mn] kernels so
        that it is loaded only once outside of the loops rather than
        multiple times inside the loops.
      - Changed outer loop in rd kernels so that the jump/comparison and
        loop bounds more closely mimic what you'd see in higher-level source
        code. That is, something like:
          for( i = 0; i < 6; i+=3 )
        rather than something like:
          for( i = 0; i <= 3; i+=3 )
      - Switched row-based IO to use byte offsets instead of byte column
        strides (e.g. via rsi register), which were known to be 8 anyway
        since otherwise that conditional branch wouldn't have executed.
      - Cleaned up and homogenized prefetching a bit.
      - Updated the comments that show the before and after of the
        in-register transpositions.
      - Added comments to column-based IO cases to indicate which columns
        are being accessed/updated.
      - Added rbp register to clobber lists.
      - Removed some dead (commented out) code.
      - Fixed some copy-paste typos in comments in the rv_6x8n kernels.
      - Cleaned up whitespace (including leading ws -> tabs).
      - Moved edge case (non-milli) kernels to their own directory, d6x8,
        and split them into separate files based on the "NR" value of the
        kernels (Mx8, Mx4, Mx2, etc.).
      - Moved config-specific reference Mx1 kernels into their own file
        (e.g. bli_gemmsup_r_haswell_ref_dMx1.c) inside the d6x8 directory.
      - Added rd_dMx1 assembly kernels, which seem marginally faster than
        the corresponding reference kernels.
      - Updated comments in ref_kernels/bli_cntx_ref.c and changed to using
        the row-oriented reference kernels for all storage combos.
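    
    The rd_d6x8m pointer issue described above can be modeled in plain C.
    This is only an illustrative sketch, not the assembly from the kernels:
    the function names, the use of offsets instead of real pointers, and
    the "reload the base each iteration" remedy are all assumptions made
    for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the faulty pattern: the offset into B is set once,
   before the loop, and is then advanced by the ABSOLUTE column index jj
   on every iteration. */
size_t final_offset_absolute( size_t n_iter, size_t step )
{
    assert( step > 0 );
    size_t off = 0;                 /* "loaded" once, outside the loop */
    for ( size_t it = 0; it < n_iter; ++it )
    {
        size_t jj = it * step;      /* absolute column index */
        off += jj;                  /* correct only while jj == step */
    }
    return off;
}

/* Hypothetical remedy: recompute the offset from the base (0 here) on
   every iteration, so the absolute index is always the right addend. */
size_t final_offset_reloaded( size_t n_iter, size_t step )
{
    assert( step > 0 );
    size_t off = 0;
    for ( size_t it = 0; it < n_iter; ++it )
    {
        size_t jj = it * step;
        off = jj;                   /* base + absolute offset */
    }
    return off;
}
```

    With two iterations and a step of four columns, both functions end at
    the same offset, which is why the original code happened to work; a
    third iteration would make them diverge.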

commit 943a21def0bedc1732c0a2453afe7c90d7f62e95
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Thu May 21 14:09:21 2020 -0500

    Add build instructions for Windows (#404)

commit fbef422f0d968df10e598668b427af230cfe07e8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 21 10:30:41 2020 -0500

    Separate OS X and Windows into separate FAQs.
    
    Details:
    - Separated the unified Mac OS X / Windows frequently asked question
      into two separate questions, one for each OS.

commit 28be1a4265ea67e3f177c391aba3dbbcf840bd52
Author: Guodong Xu <guodong.xu@linaro.org>
Date:   Thu May 21 02:22:22 2020 +0800

    avoid loading twice in armv8a gemm kernel (#403)
    
    This bug happens at a corner case, when k_iter == 0 and we jump to
    CONSIDERKLEFT.
    
    In the current design, the first row/col. of a and b are loaded twice.
    
    The fix is to rearrange the instructions that load a and b (first row/col.).
    
    Signed-off-by: Guodong Xu <guodong.xu@linaro.org>

commit d51245e58b0beff2717156b980007c90337150d8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 8 18:00:54 2020 -0500

    Add support for Intel oneAPI in configure.
    
    Details:
    - Properly select cc_vendor based on the output of invoking CC with the
      --version option, including cases where CC is the variant of clang
      that is included with Intel oneAPI. (However, we continue to treat
      the compiler as clang for other purposes, not icc.) Thanks to Ajay
      Panyala and Devin Matthews for reporting on this issue via #402.

commit 787adad73bd5eb65c12c39d732723a1ac0448748
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 8 16:18:20 2020 -0500

    Defined netlib equivalent of xerbla_array().
    
    Details:
    - Added a function definition for xerbla_array_(), which largely mirrors
      its netlib implementation. Thanks to Isuru Fernando for suggesting the
      addition of this function.

commit c53b5153bee585685bf95ce22e058a7af72ecef0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 5 12:39:12 2020 -0500

    Documented Perl prerequisite for build system.
    
    Details:
    - Added Perl to list of prerequisites for building BLIS. This is in part
      (and perhaps completely?) due to some substitution commands used at
      the end of configure that include '\n' characters that are not
      properly interpreted by the version of sed included on some versions
      of OS X. This new documentation addresses issue #398.

commit f032d5d4a6ed34c8c3e5ba1ed0b14d1956d0097c
Author: Guodong Xu <guodong.xu@linaro.org>
Date:   Thu Apr 30 01:08:46 2020 +0800

    New kernel set for Arm SVE using assembly (#396)
    
    This commit adds two kernels for the Arm SVE vector extension:
    1. a gemm  kernel for double at sizes 8x8.
    2. a packm kernel for double at dimension 8xk.
    
    To achieve the best performance, variable length agnostic programming
    is not used. A vector length (VL) of 256 bits is mandated in both kernels.
    Kernels to support other VLs can be added later.
    
    "SVE is a vector extension for AArch64 execution mode for the A64
    instruction set of the Armv8 architecture. Unlike other SIMD architectures,
    SVE does not define the size of the vector registers, but constrains into
    a range of possible values, from a minimum of 128 bits up to a maximum of
    2048 in 128-bit wide units. Therefore, any CPU vendor can implement the
    extension by choosing the vector register size that better suits the
    workloads the CPU is targeting. Instructions are provided specifically
    to query an implementation for its register size, to guarantee that
    the applications can run on different implementations of the ISA without
    the need to recompile the code."  [1]
    
    [1] https://developer.arm.com/solutions/hpc/resources/hpc-white-papers/arm-scalable-vector-extensions-and-application-to-machine-learning
    
    Signed-off-by: Guodong Xu <guodong.xu@linaro.org>

commit 4d87eb24e8e1f5a21e04586f6df4f427bae0091b
Author: Yingbo Ma <mayingbo5@gmail.com>
Date:   Mon Apr 27 17:02:47 2020 -0400

    Update KernelsHowTo.md (#395)

commit 477ce91c5281df2bbfaddc4d86312fb8c8f879e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 22 14:26:49 2020 -0500

    Moved #include "cpuid.h" to bli_cpuid.c.
    
    Details:
    - Relocated the #include "cpuid.h" directive from bli_cpuid.h to
      bli_cpuid.c. This was done because cpuid.h (which is pulled into
      the post-build blis.h developer header) doesn't protect its
      definitions with a preprocessor guard of the form:
    
        #ifndef FOOBAR_H
        #define FOOBAR_H
        // header contents.
        #endif
    
      and as a result, applications (previously) could not #include both
      blis.h and cpuid.h (since the former was already including the
      latter). Thanks to Bhaskar Nallani for raising this issue via #393
      and to Devin Matthews for suggesting this fix.
    - CREDITS file update.

commit 8bde63ffd7474a97c3a3b0b0dc1eae45be0ab889
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 18 12:50:12 2020 -0500

    Adding missing conjy to her2/syr2 in typed API doc.
    
    Details:
    - Fixed a missing argument (conjy) in the function signatures of
      bli_?her2() and bli_?syr2() in docs/BLISTypedAPI.md. Thanks to Robert
      van de Geijn for reporting this omission.

commit 976902406b610afdbacb2d80a7a2b4b43ff30321
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 17 15:11:10 2020 -0500

    Disable packing by default in expert rntm_t init.
    
    Details:
    - Changed the behavior of bli_rntm_init() as well as the static
      initializer, BLIS_RNTM_INITIALIZER, so that user-initialized rntm_t
      objects by default specify the disabling of packing for A and B.
      Packing of A/B was already disabled by default when calling non-expert
      APIs (and enabled only when the user set environment variables
      BLIS_PACK_A or BLIS_PACK_B). With this commit, the default behavior of
      using user-initialized rntm_t objects with expert APIs comes into line
      with the default behavior of non-expert APIs--that is, they now both
      lead to the avoidance of packing in the sup code path. (Note: The
      conventional code path is unaffected by the environment variables
      BLIS_PACK_A/BLIS_PACK_B and/or the disabling of packing in a rntm_t
      object when calling an expert API.) This addresses issue #392. Thanks
      to Kiran Varaganti for bringing this inconsistency to our attention.
    - The above change was accomplished by changing the definitions of
      static functions bli_rntm_clear_pack_a() and bli_rntm_clear_pack_b()
      in bli_rntm.h, which are both for internal use only.

commit 5f2aee7c5fa5d562acaf8fbde3df0e2a04e1dd1b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 7 14:55:15 2020 -0500

    README.md update to promote supmt dgemm.
    
    Details:
    - Updated the sup entry in the "What's New" section of the README.md
      file to promote the multithreaded dgemm sup feature introduced in
      c0558fd.

commit f5923cd9ff5fbd91190277dea8e52027174a1d57
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 7 14:41:45 2020 -0500

    CHANGELOG update (0.7.0)

commit 68b88aca6692c75a9f686187e6c4a4e196ae60a9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 7 14:41:44 2020 -0500

    Version file update (0.7.0)

commit b04de636c1702e4cb8e7ad82bab3cf43d2dbdfc6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 7 14:37:43 2020 -0500

    ReleaseNotes.md update in advance of next version.
    
    Details:
    - Updated docs/ReleaseNotes.md in preparation for next version.

commit 2cb604ba472049ad498df72d4a2dc47a161d4c3c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 6 16:42:14 2020 -0500

    Rename more bli_thread_obarrier(), _obroadcast().
    
    Details:
    - Renamed instances of bli_thread_obarrier() and bli_thread_obroadcast()
      that were made in the supmt-specific code committed to the 'amd'
      branch, which has now been merged with 'master'. Prior to the merge,
      'master' received commit c01d249, which applied these renamings to
      the existing, non-sup codebase.

commit efb12bc895de451067649d5dceb059b7827a025f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 6 15:01:53 2020 -0500

    Minor updates/elaborations to RELEASING file.

commit 2e3b3782cfb7a2fd0d1a325844983639756def7d
Merge: 9f3a8d4d da0c086f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 6 14:55:35 2020 -0500

    Merge branch 'master' into amd

commit da0c086f4643772e111318f95a712831b0f981a8
Author: Satish Balay <balay@mcs.anl.gov>
Date:   Tue Mar 31 17:09:41 2020 -0500

    OSX: specify the full path to the location of libblis.dylib (#390)
    
    * OSX: specify the full path to the location of libblis.dylib so that it can be found at runtime
    
    Before this change:
    
    Application gives runtime error [when linked with blis]
    dyld: Library not loaded: libblis.3.dylib
    
    balay@kpro lib % otool -L libblis.dylib
    libblis.dylib:
            libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
            /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
    
    After this change:
    balay@kpro lib % otool -L libblis.dylib
    libblis.dylib:
            /Users/balay/petsc/arch-darwin-c-debug/lib/libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
            /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
    
    * INSTALL_LIBDIR -> libdir as INSTALL_LIBDIR has DESTDIR
    
    Co-Authored-By: Jed Brown <jed@jedbrown.org>
    
    * CREDITS file update.
    
    Co-authored-by: Jed Brown <jed@jedbrown.org>
    Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>

commit 2bca03ea9d87c0da829031a5332545d05e352211
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Mar 28 22:10:00 2020 +0000

    Updates, tweaks to runme.sh in test/1m4m.
    
    Details:
    - Made several updates to test/1m4m/runme.sh, including:
      - Added missing handling for 1m and 4m1a implementations when setting
        the BLIS_??_NT environment variables.
      - Added support for using numactl to run the test executables.
      - Several other cleanups.

commit c40a33190b94af5d5c201be63366594859b1233f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 26 16:55:00 2020 -0500

    Warn user when auto-detection returns 'generic'.
    
    Details:
    - Added logic to configure that causes the script to output a warning
      to the user if/when "./configure auto" is run and the underlying
      hardware feature detection code is unable to identify the hardware.
      In these cases, the auto-detect code will return 'generic', which
      is likely not what the user expected, and a flag will be set so that
      a message is printed at the end of the configure output. (Thankfully,
      we don't expect this scenario to play out very often.) Thanks to
      Devin Matthews for suggesting this fix (#384).

commit 492a736fab5b9c882996ca024b64646877f22a89
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Mar 24 17:28:47 2020 -0500

    Fix vectorized version of bli_amaxv (#382)
    
    * Fix vectorized version of bli_amaxv
    
    To match Netlib, i?amax should return:
    - the lowest index among equal values
    - the first NaN if one is encountered
    
    * Fix typos.
    
    * And another one...
    
    * Update ref. amaxv kernel too.
    
    * Re-enabled optimized amaxv kernels.
    
    Details:
    - Re-enabled the optimized, intrinsics-based amaxv kernels in the 'zen'
      kernel set for use in haswell, zen, zen2, knl, and skx subconfigs.
      These two kernels (for s and d datatypes) were temporarily disabled in
      e186d71 as part of issue #380. However, the key missing semantic
      properties that prompted the disabling of these kernels--returning the
      index of the *first* rather than of the last element with largest
      absolute value, and returning the index of the first NaN if one is
      encountered--were added as part of #382 thanks to Devin Matthews.
      Thus, now that the kernels are working as expected once more, this
      commit causes these kernels to once again be registered for the
      affected subconfigs, which effectively reverts all code changes
      included in e186d71.
    - Whitespace/formatting updates to new macros in bli_amaxv_zen_int.c.
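    
    The two semantic properties listed above (lowest index among equal
    values, first NaN if one is encountered) can be sketched as a scalar
    reference loop. This is a hypothetical illustration, not the BLIS
    reference or zen kernel; the name amax_first_sketch and the zero-based
    return value are assumptions of the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar sketch: returns the zero-based index of the first
   NaN in x if there is one, otherwise the lowest index whose element has
   the largest absolute value. */
size_t amax_first_sketch( const double* x, size_t n )
{
    assert( x != NULL && n > 0 );

    size_t best     = 0;
    double best_abs = 0.0;

    for ( size_t i = 0; i < n; ++i )
    {
        double v = x[i];

        /* A NaN compares unequal to itself; the first one wins outright. */
        if ( v != v ) return i;

        double a = ( v < 0.0 ? -v : v );

        /* Strict '>' keeps the earliest index when values tie. */
        if ( i == 0 || a > best_abs ) { best_abs = a; best = i; }
    }

    return best;
}
```

    The strict comparison is what distinguishes "first" from "last" among
    equal magnitudes; a '>=' in its place would reproduce the behavior
    that led to the kernels being disabled in e186d71.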
    
    Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>

commit e186d7141a51f2d7196c580e24e7b7db8f209db9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Mar 21 18:40:36 2020 -0500

    Disabled optimized amaxv kernels.
    
    Details:
    - Disabled use of optimized amaxv kernels, which use vector intrinsics
      for both 's' and 'd' datatypes. We disable these kernels because the
      current implementations fail to observe a semantic property of the
      BLAS i?amax_() subroutine, which is to return the index of the
      *first* element containing the maximum absolute value (that is, the
      first element if there exist two or more elements that contain the
      same value). With the optimized kernels disabled, the affected
      subconfigurations (haswell, zen, zen2, knl, and skx) will use the
      default reference implementations. Thanks to Mat Cross for reporting
      this issue via #380.
    - CREDITS file update.

commit 9f3a8d4d851725436b617297231a417aa9ce8c6a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Mar 14 17:48:43 2020 -0500

    Added missing return to bli_thread_partition_2x2().
    
    Details:
    - Added a missing return statement to the body of an early case handling
      branch in bli_thread_partition_2x2(). This bug only affected cases
      where n_threads < 4, and even then, the code meant to handle cases
      where n_threads >= 4 executes and does the right thing, albeit using
      more CPU cycles than needed. Nonetheless, thanks to Kiran Varaganti
      for reporting this bug via issue #377.
    - Whitespace changes to bli_thread.c (spaces -> tabs).

commit 8c3d9b9eeb6f816ec8c32a944f632a5ad3637593
Merge: 71249fe8 0f9e0399
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 10 14:03:33 2020 -0500

    Merge branch 'amd' of github.com:flame/blis into amd

commit 71249fe8ddaa772616698f1e3814d40e012909ea
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 10 13:55:29 2020 -0500

    Merged test/sup, test/supmt into test/sup.
    
    Details:
    - Updated the Makefile, test_gemm.c, and runme.sh in test/sup to be able
      to compile and run both single-threaded and multithreaded experiments.
      This should help with maintenance going forward.
    - Created a test/sup/octave_st directory of scripts (based on the
      previous test/sup/octave scripts) as well as a test/sup/octave_mt
      directory (based on the previous test/supmt/octave scripts). The
      octave scripts are slightly different and not easily mergeable, and
      thus for now I'll maintain them separately.
    - Preserved the previous test/sup directory as test/sup/old/supst and
      the previous test/supmt directory as test/sup/old/supmt.

commit 0f9e0399e16e96da2620faf2c0c3c21274bb2ebd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 5 17:03:21 2020 -0600

    Updated sup performance graphs; added mt results.
    
    Details:
    - Reran all existing single-threaded performance experiments comparing
      BLIS sup to other implementations (including the conventional code
      path within BLIS), using the latest versions (where appropriate).
    - Added multithreaded results for the three existing hardware types
      showcased in docs/PerformanceSmall.md: Kaby Lake, Haswell, and Epyc
      (Zen1).
    - Various minor updates to the text in docs/PerformanceSmall.md.
    - Updates to the octave scripts in test/sup/octave, test/supmt/octave.

commit 90db88e5729732628c1f3acc96eeefab49f2da41
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 2 15:06:48 2020 -0600

    Updated sup[mt] Makefiles for variable dim ranges.
    
    Details:
    - Updated test/sup/Makefile and test/supmt/Makefile to allow specifying
      different problem size ranges for the drivers where one, two, or three
      matrix dimensions are large. This will facilitate the generation of
      more meaningful graphs, particularly when two dimensions are tiny.

commit 31f11a06ea9501724feec0d2fc5e4644d7dd34fc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 27 14:33:20 2020 -0600

    Updates to octave scripts in test/sup[mt]/octave.
    
    Details:
    - Optimized scripts in test/sup/octave and test/supmt/octave for use
      with octave 5.2.0 on Ubuntu 18.04.
    - Fixed stray 'end' keywords in gen_opsupnames.m and plot_l3sup_perf.m,
      which were not only unnecessary but also causing issues with versions
      5.x.

commit c01d249d7c546fe2e3cee3fe071cd4c4c88b9115
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 25 14:50:53 2020 -0600

    Renamed bli_thread_obarrier(), _obroadcast().
    
    Details:
    - Renamed two bli_thread_*() APIs:
        bli_thread_obarrier()   -> bli_thread_barrier()
        bli_thread_obroadcast() -> bli_thread_broadcast()
      The 'o' was a leftover from when thrcomm_t objects tracked both
      "inner" and "outer" communicators. They have long since been
      simplified to only support the latter, and thus the 'o' is
      superfluous.

commit f6e6bf73e695226c8b23fe7900da0e0ef37030c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 24 17:52:23 2020 -0600

    List Gentoo under supported external packages.
    
    Details:
    - Add mention of Gentoo Linux under the list of external packages in
      the README.md file. Thanks to M. Zhou for maintaining this package.

commit 9e5f7296ccf9b3f7b7041fe1df20b927cd0e914b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 18 15:16:03 2020 -0600

    Skip building thrinfo_t tree when mt is disabled.
    
    Details:
    - Return early from bli_thrinfo_sup_grow() if the thrinfo_t object
      address is equal to either &BLIS_GEMM_SINGLE_THREADED or
      &BLIS_PACKM_SINGLE_THREADED.
    - Added preprocessor logic to bli_l3_sup_thread_decorator() in
      bli_l3_sup_decor_single.c that (by default) disables code that
      creates and frees the thrinfo_t tree and instead passes
      &BLIS_GEMM_SINGLE_THREADED as the thrinfo_t pointer into the
      sup implementation.
    - The net effect of the above changes is that a small amount of
      thrinfo_t overhead is avoided when running small/skinny dgemm
      problems when BLIS is compiled with multithreading disabled.

commit 90081e6a64b5ccea9211bdef193c2d332c68492f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 17 14:57:25 2020 -0600

    Fixed bug(s) in mt sup when single-threaded.
    
    Details:
    - Fixed a syntax bug in bli_l3_sup_decor_single.c as a result of
      changing function interface for the thread entry point function
      (of type l3supint_t).
    - Unfortunately, fixing the interface was not enough, as it caused
      a memory leak in the sba at bli_finalize() time. It turns out that,
      due to the new multithreading-capable variant code using thrinfo_t
      objects--specifically, their calling of bli_thrinfo_grow()--we
      have to pass in a real thrinfo_t object rather than the global
      objects &BLIS_PACKM_SINGLE_THREADED or &BLIS_GEMM_SINGLE_THREADED.
      Thus, I inserted the appropriate logic from the OpenMP and pthreads
      versions so that single-threaded execution would work as intended
      with the newly upgraded variants.

commit c0558fde4511557c8f08867b035ee57dd2669dc6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 17 14:08:08 2020 -0600

    Support multithreading within the sup framework.
    
    Details:
    - Added multithreading support to the sup framework (via either OpenMP
      or pthreads). Both variants 1n and 2m now have the appropriate
      threading infrastructure, including data partitioning logic, to
      parallelize computation. This support handles all four combinations
      of packing on matrices A and B (neither, A only, B only, or both).
      This implementation tries to be a little smarter when automatic
      threading is requested (e.g. via BLIS_NUM_THREADS) in that it will
      recalculate the factorization in units of micropanels (rather than
      using the raw dimensions) in bli_l3_sup_int.c, when the final
      problem shape is known and after threads have already been spawned.
    - Implemented bli_?packm_sup_var2(), which packs to conventional row-
      or column-stored matrices. (This is used for the rrc and crc storage
      cases.) Previously, copym was used, but that would no longer suffice
      because it could not be parallelized.
    - Minor reorganization of packing-related sup functions. Specifically,
      bli_packm_sup_init_mem_[ab]() are called from within packm_sup_[ab]()
      instead of from the variant functions. This has the effect of making
      the variant functions more readable.
    - Added additional bli_thrinfo_set_*() static functions to bli_thrinfo.h
      and inserted usage of these functions within bli_thrinfo_init(), which
      previously was accessing thrinfo_t fields via the -> operator.
    - Renamed bli_partition_2x2() to bli_thread_partition_2x2().
    - Added an auto_factor field to the rntm_t struct in order to track
      whether automatic thread factorization was originally requested.
    - Added new test drivers in test/supmt that perform multithreaded sup
      tests, as well as appropriate octave/matlab scripts to plot the
      resulting output files.
    - Added additional language to docs/Multithreading.md to make it clear
      that specifying any BLIS_*_NT variable, even if it is set to 1, will
      be considered manual specification for the purposes of determining
      whether to auto-factorize via BLIS_NUM_THREADS.
    - Minor comment updates.

commit d7a7679182d72a7eaecef4cd9b9a103ee0a7b42b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 7 17:37:03 2020 -0600

    Fixed int-to-packbuf_t conversion error (C++ only).
    
    Details:
    - Fixed an error that manifests only when using C++ (specifically,
      modern versions of g++) to compile drivers in 'test' (and likely most
      other application code that #includes blis.h). Thanks to Ajay Panyala
      for reporting this issue (#374).

commit d626112b8d5302f9585fb37a8e37849747a2a317
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jan 15 13:27:02 2020 -0600

    Removed sorting on LDFLAGS in common.mk (#373).
    
    Details:
    - Removed a line of code in common.mk that passed LDFLAGS through the
      sort function. The purpose was not to sort the contents, but rather
      to remove duplicates. However, there is valid syntax in a string of
      linker flags that, when sorted, yields different/broken behavior.
      So I've removed the line in common.mk that sorts LDFLAGS. Also, for
      future use, I've added a new function, rm-dupls, that removes
      duplicates without sorting. (This function was based on code from a
      stackoverflow thread that is linked to in the comments for that
      code.) Thanks to Isuru Fernando for reporting this issue (#373).
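    
    The idea of removing duplicates without sorting can be sketched
    outside of make. The C function below is a hypothetical analogue of
    rm-dupls (the real function is a make macro in common.mk); the name
    remove_dupls_sketch and its in-place array interface are assumptions
    of the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: compacts words[] in place so that only the first
   occurrence of each string remains, preserving the original order.
   Returns the number of strings kept. */
size_t remove_dupls_sketch( const char** words, size_t n )
{
    assert( words != NULL || n == 0 );

    size_t kept = 0;

    for ( size_t i = 0; i < n; ++i )
    {
        int seen = 0;

        /* Look for words[i] among the strings already kept. */
        for ( size_t j = 0; j < kept; ++j )
        {
            if ( strcmp( words[j], words[i] ) == 0 ) { seen = 1; break; }
        }

        if ( !seen ) words[kept++] = words[i];
    }

    return kept;
}
```

    Unlike a sort, this keeps the relative order of the surviving flags.
    Flags whose repetition is meaningful would still be affected by any
    duplicate removal.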

commit e67deb22aaeab5ed6794364520190936748ef272
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jan 14 16:01:34 2020 -0600

    CHANGELOG update (0.6.1)

commit 10949f528c5ffc5c3a2cad47fe16a802afb021be
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jan 14 16:01:33 2020 -0600

    Version file update (0.6.1)

commit 5db8e710a2baff121cba9c63b61ca254a2ec097a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jan 14 15:59:59 2020 -0600

    ReleaseNotes.md update in advance of next version.
    
    Details:
    - Updated ReleaseNotes.md in preparation for next version.

commit cde4d9d7a26eb51dcc5a59943361dfb8fda45dea
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jan 14 15:19:25 2020 -0600

    Removed 'attic/windows' (to prevent confusion).
    
    Details:
    - Finally removed 'attic/windows' and its contents. This directory once
      contained "proto" Windows support for BLIS, but we've since moved on
      to (thanks to Isuru Fernando) providing Windows DLL support via
      AppVeyor's build artifacts. Furthermore, since 'windows' was the only
      subdirectory within 'attic', the directory path would show up in
      GitHub's listing at https://github.com/flame/blis, which probably led
      to someone being confused about how BLIS provides Windows support. I
      assume (but don't know for sure) that nobody is using these files, so
      this is admittedly a case of shoot first and ask questions later.

commit 7d3407d4681c6449f4bbb8ec681983700ab968f3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jan 14 15:17:53 2020 -0600

    CREDITS file update.

commit f391b3e2e7d11a37300d4c8d3f6a584022a599f5
Author: Dave Love <dave.love@manchester.ac.uk>
Date:   Mon Jan 6 20:15:48 2020 +0000

    Fix parsing in vpu_count on workstation SKX (#351)
    
    * Fix parsing in vpu_count on workstation SKX
    
    * Document Skylake-X as Haswell for single FMA
    
    * Update vpu_count for Skylake and Cascade Lake models
    
    * Support printing the configuration selected, controlled by the environment
    
    Intended particularly for diagnosing mis-selection of SKX through
    unknown, or incorrect, number of VPUs.
    
    * Move bli_log outside the cpp condition, and use it where intended
    
    * Add Fixme comment (Skylake D)
    
    * Mostly superficial edits to commits towards #351.
    
    Details:
    - Moved architecture/sub-config logging-related code from bli_cpuid.c
      to bli_arch.c, tweaked names, and added more set/get layering.
    - Tweaked log messages output from bli_cpuid_is_skx() in bli_cpuid.c.
    - Content, whitespace changes to new bullet in HardwareSupport.md that
      relates to single-VPU Skylake-Xs.
    
    * Fix comment typos
    
    Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>

commit 5ca1a3cfc1c1cc4dd9da6a67aa072ed90f07e867
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jan 6 12:29:12 2020 -0600

    Fixed 'configure' breakage introduced in 6433831.
    
    Details:
    - Added a missing 'fi' (endif) keyword to a conditional block added in
      the configure script in commit 6433831.

commit e7431b4a834ef4f165c143f288585ce8e2272a23
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jan 6 12:01:41 2020 -0600

    Updated 1m draft article link in README.md.

commit 6433831cc3988ad205637ebdebcd6d8f7cfcf148
Author: Jeff Hammond <jeff.r.hammond@intel.com>
Date:   Fri Jan 3 17:52:49 2020 -0800

    blacklist ICC 18 for knl/skx due to test failures
    
    Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>

commit af3589f1f98781e3a94a8f9cea8d5ea6f155f7d2
Author: Jeff Hammond <jeff.science@gmail.com>
Date:   Fri Jan 3 13:23:24 2020 -0800

    blacklist Intel 19+
    
    Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>

commit 60de939debafb233e57fd4e804ef21b6de198caf
Author: Jeff Hammond <jeff.science@gmail.com>
Date:   Wed Jan 1 21:30:38 2020 -0800

    fix link to docs
    
    the comment contains an incorrect link, which is trivially fixed here.
    
    @fgvanzee I hope you don't mind that I committed directly to master but this cannot break anything.

commit 52711073789b6b84eb99bb0d6883f457ed3fcf80
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 16 16:30:26 2019 -0600

    Fixed bugs in cblas_sdsdot(), sdsdot_().
    
    Details:
    - Fixed a bug in sdsdot_sub() that redundantly added the "alpha" scalar,
      named 'sb'. This value was already being added by the underlying
      sdsdot_() function. Thus, we no longer add 'sb' within sdsdot_sub().
      Thanks to Simon Lukas Märtens for reporting this bug via #367.
    - Fixed a second bug in order of typecasting intermediate products in
      sdsdot_(). Previously, the "alpha" scalar was being added after the
      "outer" typecast to float. However, the operation is supposed to first
      add the dot product to the (promoted) scalar and THEN downcast the sum
      to float. Thanks to Devin Matthews for catching this bug.
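    
    The ordering issue in the second fix can be illustrated with two small
    functions. This is a sketch under stated assumptions, not the BLIS
    source: both function names are hypothetical, unit strides are
    assumed, and the dot product is accumulated in double as the text
    above implies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the intended ordering: add the scalar to the
   double-precision dot product, THEN downcast the sum to float. */
float sdsdot_late_cast( size_t n, float sb, const float* x, const float* y )
{
    assert( n == 0 || ( x != NULL && y != NULL ) );

    double acc = ( double )sb;
    for ( size_t i = 0; i < n; ++i )
        acc += ( double )x[i] * ( double )y[i];

    return ( float )acc;
}

/* Hypothetical sketch of the previous ordering: downcast the dot product
   first, then add the scalar in single precision. */
float sdsdot_early_cast( size_t n, float sb, const float* x, const float* y )
{
    assert( n == 0 || ( x != NULL && y != NULL ) );

    double acc = 0.0;
    for ( size_t i = 0; i < n; ++i )
        acc += ( double )x[i] * ( double )y[i];

    float dot = ( float )acc;   /* rounding happens before sb is added */
    return dot + sb;
}
```

    The two orderings can disagree when the dot product is not exactly
    representable in float. For example, with sb = -16777216, x = { 16777216, 1 }
    and y = { 1, 1 }, the late cast yields 1 while the early cast yields 0
    under IEEE 754 round-to-nearest-even arithmetic.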

commit fe2560a4b1d8ef8d0a446df6002b1e7decc826e9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Dec 6 17:12:44 2019 -0600

    Annotated missing thread-related symbols for export.
    
    Details:
    - Added BLIS_EXPORT_BLIS annotation to function prototypes for
    
        bli_thrcomm_bcast()
        bli_thrcomm_barrier()
        bli_thread_range_sub()
    
      so that these functions are exported to shared libraries by default.
      This (hopefully) fixes issue #366. Thanks to Kyungmin Lee for
      reporting this bug.
    - CREDITS file update.

commit 2853825234001af8f175ad47cef5d6ff9b7a5982
Merge: efa61a6c 61b1f0b0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Dec 6 16:06:46 2019 -0600

    Merge branch 'master' into amd

commit 61b1f0b0602faa978d9912fe58c6c952a33af0ac
Author: Nicholai Tukanov <nicholai@utexas.edu>
Date:   Wed Dec 4 14:18:47 2019 -0600

    Add prototypes for POWER9 reference kernels (#365)
    
    Updates and fixes to power9 subconfig.
    
    Details:
    - Register s,c,z reference gemm and trsm ukernels that assume elements
      of B have been broadcast.
    - Added prototypes for level-3 ukernels that assume elements of B have
      been broadcast. Also added prototype for an spackm function that
      employs a duplication/broadcast factor of 4.
    - Register virtual gemmtrsm ukernels that work with broadcasting of B.
    - Disable right-side hemm, symm, trmm, and trmm3 in bli_family_power9.h.
    - Thanks to Nicholai Tukanov for providing these updates.

commit efa61a6c8b1cfa48781fc2e4799ff32e1b7f8f77
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 29 16:17:04 2019 -0600

    Added missing bli_l3_sup_thread_decorator() symbol.
    
    Details:
    - Defined dummy versions of bli_l3_sup_thread_decorator() for OpenMP
      and pthreads so that those builds don't fail when performing shared
      library linking (especially for Windows DLLs via AppVeyor). For now,
      these dummy implementations of bli_l3_sup_thread_decorator() are
      merely carbon-copies of the implementation provided for single-
      threaded execution (ie: the one found in bli_l3_sup_decor_single.c).
      Thus, an OpenMP or pthreads build will be able to use the gemmsup
      code (including the new selective packing functionality), as it did
      before 39fa7136, even though it will not actually employ any
      multithreaded parallelism.

commit 39fa7136f4a4e55ccd9796fb79ad5f121b872ad9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 29 15:27:07 2019 -0600

    Added support for selective packing to gemmsup.
    
    Details:
    - Implemented optional packing for A or B (or both) within the sup
      framework (which currently only supports gemm). The request for
      packing either matrix A or matrix B can be made via setting
      environment variables BLIS_PACK_A or BLIS_PACK_B (to any
      non-zero value; if set, zero means "disable packing"). It can also
      be made globally at runtime via bli_pack_set_pack_a() and
      bli_pack_set_pack_b() or with individual rntm_t objects via
      bli_rntm_set_pack_a() and bli_rntm_set_pack_b() if using the expert
      interface of either the BLIS typed or object APIs. (If using the
      BLAS API, environment variables are the only way to communicate the
      packing request.)
    - One caveat (for now) with the current implementation of selective
      packing is that any blocksize extension registered in the _cntx_init
      function (such as is currently used by haswell and zen subconfigs)
      will be ignored if the affected matrix is packed. The reason is
      simply that I didn't get around to implementing the necessary logic
      to pack a larger edge-case micropanel, though this is entirely
      possible and should be done in the future.
    - Spun off the variant-choosing portion of bli_gemmsup_ref() into
      bli_gemmsup_int(), in bli_l3_sup_int.c.
    - Added new files, bli_l3_sup_packm_a.c, bli_l3_sup_packm_b.c, along
      with corresponding headers, in which higher-level packm-related
      functions are defined for use within the sup framework. The actual
      packm variant code resides in bli_l3_sup_packm_var.c.
    - Pass the following new parameters into var1n and var2m: packa, packb
      bool_t's, pointer to a rntm_t, pointer to a cntl_t (which is for now
      always NULL), and pointer to a thrinfo_t* (which for now is the address
      of the global single-threaded packm thread control node).
    - Added panel strides ps_a and ps_b to the auxinfo_t structure so that
      the millikernel can query the panel stride of the packed matrix and
      step through it accordingly. If the matrix isn't packed, the panel
      stride of interest for the given millikernel will be set to the
      appropriate value so that the mkernel may step through the unpacked
      matrix as it normally would.
    - Modified the rv_6x8m and rv_6x8n millikernels to read the appropriate
      panel strides (ps_a and ps_b, respectively) instead of computing them
      on the fly.
    - Spun off the environment variable getting and setting functions into
      a new file, bli_env.c (with a corresponding prototype header). These
      functions are now used by the threading infrastructure (e.g.
      BLIS_NUM_THREADS, BLIS_JC_NT, etc.) as well as the selective packing
      infrastructure (e.g. BLIS_PACK_A, BLIS_PACK_B).
    - Added a static initializer for mem_t objects, BLIS_MEM_INITIALIZER.
    - Added a static initializer for pblk_t objects, BLIS_PBLK_INITIALIZER,
      for use within the definition of BLIS_MEM_INITIALIZER.
    - Moved the global_rntm object to bli_rntm.c and extern it where needed.
      This means that the function bli_thread_init_rntm() was renamed to
      bli_rntm_init_from_global() and relocated accordingly.
    - Added a new file, bli_pack.c, which serves as the home for
      functions that manage the pack_a and pack_b fields of the global
      rntm_t, including from environment variables, just as we have
      functions to manage the threading fields of the global rntm_t in
      bli_thread.c.
    - Reorganized naming for files in frame/thread, which mostly involved
      spinning off the bli_l3_thread_decorator() functions into their own
      files. This change makes more sense when considering the further
      addition of bli_l3_sup_thread_decorator() functions (for now limited
      only to the single-threaded form found in the _single.c file).
    - Explicitly initialize the reference sup handlers in both
      bli_cntx_init_haswell.c and bli_cntx_init_zen.c so that it's more
      obvious how to customize to a different handler, if desired.
    - Removed various snippets of disabled code.
    - Various comment updates.
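    
    The "any non-zero value enables packing, zero disables it" rule for
    BLIS_PACK_A and BLIS_PACK_B described above can be sketched as
    follows. The helpers parse_flag() and env_get_flag() are hypothetical
    and are not the functions defined in bli_env.c; how BLIS treats
    non-numeric values is not stated here, so falling back to a default
    in that case is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: interpret a string as an on/off flag.
   Returns fallback if str is NULL or does not start with an integer,
   0 if the integer is zero, and 1 for any non-zero integer. */
int parse_flag( const char* str, int fallback )
{
    if ( str == NULL ) return fallback;

    char* end = NULL;
    long  val = strtol( str, &end, 10 );

    /* No digits were consumed: treat the value as unusable. */
    if ( end == str ) return fallback;

    return val != 0;
}

/* Hypothetical helper: read the flag from the environment. */
int env_get_flag( const char* name, int fallback )
{
    return parse_flag( getenv( name ), fallback );
}
```

    With helpers like these, env_get_flag( "BLIS_PACK_A", 0 ) would yield
    1 when the variable is set to 1 (or any other non-zero integer) and 0
    when it is unset or set to 0.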

commit bbb21fd0a9be8c5644bec37c75f9396eeeb69e48
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 21 18:15:16 2019 -0600

    Tweaked SIAM/SC Best Prize language in README.md.

commit 043366f92d5f5f651d5e3371ac3adb36baf4adce
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 21 18:13:51 2019 -0600

    Fixed typo in previous commit (SIAM/SC prize).

commit 05a4d583e65a46ff2a1100ab4433975d905d91f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 21 18:12:24 2019 -0600

    Added SIAM/SC prize to "What's New" in README.md.

commit 881b05ecd40c7bc0422d3479a02a28b1cb48383f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 21 16:34:27 2019 -0600

    Fixed blastest failure for 'generic' subconfig.
    
    Details:
    - Fixed a subtle and complicated bug that only manifested via the BLAS
      test drivers in the generic subconfiguration, and possibly any other
      subconfiguration that did not register complex-domain gemm ukernels,
      or registered ONLY real-domain ukernels as row-preferential. This is
      a long story, but it boils down to an exception to the "transpose the
      operation to bring storage of C into agreement with ukernel pref"
      optimization in bli_hemm_front.c and bli_symm_front.c sabotaging the
      proper functioning of the 1m method, but only when the imaginary
      component of beta is zero. See the comments in issue #342 for more
      details. Thanks to Dave Love for identifying the commit in which this
      bug was introduced, and other feedback related to this bug.

commit 0c7165fb01cdebbc31ec00124d446161b289942f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 14 16:48:14 2019 -0600

    Fixed obscure bug in bli_acquire_mpart_[mn]dim().
    
    Details:
    - Fixed a bug in bli_acquire_mpart_mdim(), bli_acquire_mpart_ndim(),
      and bli_acquire_mpart_mndim() that allowed the use of a blocksize b
      that is too large given the current row/column index (i.e., the i/j
      argument) and the size of the dimension being partitioned (i.e., the
      m/n argument). This bug only affected backwards partitioning/motion
      through the dimension and was the result of a misplaced conditional
      check-and-redirect to the backwards code path. It should be noted
      that this bug was not discovered through a failure within BLIS
      itself, since the callers in BLIS always make sure to pass in the
      "correct" blocksize b. However, it could have manifested if the
      functions were used by 3rd party callers. Thanks to Minh Quan Ho for
      reporting the bug via issue #363.
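    
    As a rough illustration of the property at stake (a blocksize must
    never exceed what remains of the dimension, in either direction of
    motion), consider the sketch below. The helpers clamp_blocksize() and
    part_offset() are hypothetical and do not reproduce the code in
    bli_acquire_mpart_mdim() or its siblings.

```c
#include <assert.h>

/* Hypothetical helper: limit the requested blocksize b to the number
   of elements left after index i in a dimension of size dim. */
long clamp_blocksize( long i, long b, long dim )
{
    long left = dim - i;
    if ( left < 0 ) left = 0;
    return b < left ? b : left;
}

/* Hypothetical helper: starting offset of the partition.
   Forward motion:  the block occupies [ i, i + b ).
   Backward motion: i is counted from the far end, so the block
                    occupies [ dim - i - b, dim - i ).
   The clamp is applied before the direction is considered, so both
   paths see the same, valid blocksize. */
long part_offset( int backward, long i, long b, long dim )
{
    long b_use = clamp_blocksize( i, b, dim );
    return backward ? dim - i - b_use : i;
}
```

    For dim = 10, i = 8, and b = 4, only two elements remain, so the
    clamped blocksize is 2 and the backward partition starts at offset 0
    rather than at a negative offset.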

commit fb8bef9982171ee0f60bc39e41a33c4d31fd59a9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 14 13:05:28 2019 -0600

    Fixed copy-paste bug in bli_spackm_6xk_bb4_ref().
    
    Details:
    - Fixed a copy-paste bug in the new bli_spackm_6xk_bb4_ref() that
      manifested as failures in single-precision real level-3 operations.
      Also replaced the duplication factor constants with a const-qualified
      variable, dfac, so that this won't happen again.
    - Changed NC for single-precision real from 4080 to 8160 so that the
      packed matrix B will have the same byte footprint in both single
      and double real.

commit 8f399c89403d5824ba767df1426706cf2d19d0a7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 12 15:32:57 2019 -0600

    Tweaked/added notes to docs/Multithreading.md.
    
    Details:
    - Added language to docs/Multithreading.md cautioning the reader about
      the nuances of setting multithreading parameters via the manual and
      automatic ways simultaneously, and also about how these parameters
      behave when multithreading is disabled at configure-time. These
      changes are an attempt to address the issues that arose in issue #362.
      Thanks to Jérémie du Boisberranger for his feedback on this topic.
    - CREDITS file update.

commit bdc7ee3394500d8e5b626af6ff37c048398bb27e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 11 15:47:17 2019 -0600

    Various fixes to support packing duplication in B.
    
    Details:
    - Added cpp macros to trmm and trmm3 front-ends to optionally force
      those operations to be cast so the structured matrix is on the left.
      symm and hemm already had such macros, but these too were renamed so
      that the macros were individual to the operation. We now have four
      such macros:
        #define BLIS_DISABLE_HEMM_RIGHT
        #define BLIS_DISABLE_SYMM_RIGHT
        #define BLIS_DISABLE_TRMM_RIGHT
        #define BLIS_DISABLE_TRMM3_RIGHT
      Also, updated the comments in the symm and hemm front-ends related to
      the first two macro guards, and added corresponding comments to the
      trmm and trmm3 front-ends for the latter two guards. (They all
      functionally do the same thing, just for their specific operations.)
      Thanks to Jeff Hammond for reporting the bugs that led me to this
      change (via #359).
    - Updated config/old/haswellbb subconfiguration (used to debug issues
      related to duplicating B during packing) to register: a packing
      kernel for single-precision real; gemmbb ukernels for s, c, and z;
      trsmbb ukernels for s, c, and z; gemmtrsmbb virtual ukernels for s, c
      and z; and to use non-default cache and register blocksizes for s, c,
      and z datatypes. Also declared prototypes for all of the gemmbb,
      trsmbb, and gemmtrsmbb ukernel functions within the
      bli_cntx_init_haswellbb() function. This should, once applied to the
      power9 configuration, fix the remaining issues in #359.
    - Defined bli_spackm_6xk_bb4_ref(), which packs single reals with a
      duplication factor of 4. This function is defined in the same file as
      bli_dpackm_6xk_bb2_ref() (bli_packm_cxk_bb_ref.c).
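    
    To make "packing with a duplication factor" concrete, here is a
    minimal sketch in which every logical element is written dfac times
    in a row. The function pack_b_dup() and its row-major input layout
    are assumptions made for illustration; this is not the code of
    bli_spackm_6xk_bb4_ref().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: pack a k x n row-major panel b (leading
   dimension ldb) into p, storing each element dfac times
   consecutively. p must have room for k * n * dfac floats. */
void pack_b_dup( size_t k, size_t n, size_t dfac,
                 const float* b, size_t ldb, float* p )
{
    for ( size_t l = 0; l < k; ++l )
        for ( size_t j = 0; j < n; ++j )
            for ( size_t d = 0; d < dfac; ++d )
                p[ ( l * n + j ) * dfac + d ] = b[ l * ldb + j ];
}
```

    Passing the factor as a single parameter, rather than repeating a
    literal 4 in several places, keeps the loop bound and the index
    arithmetic in agreement.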

commit 0eb79ca8503bd7b237994335b9687457227d3290
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 8 14:48:48 2019 -0600

    Avoid unused variable warning in lread.c (#356).
    
    Details:
    - Replaced the line
    
        f = f;
    
      with
    
        ( void )f;
    
      for the unused variable 'f' in blastest/f2c/lread.c. (Hopefully)
      addresses issue #356, but since we don't use xlc who knows. Thanks
      to Jeff Hammond for reporting this.

commit f377bb448512f0b578263387eed7eaf8f2b72bb7
Author: Jérôme Duval <jerome.duval@gmail.com>
Date:   Thu Nov 7 23:39:29 2019 +0100

    Add Haiku to the known OS list (#361)

commit e29b1f9706b6d9ed798b7f6325f275df4e6be973
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 5 17:15:19 2019 -0600

    Fixed failing testsuite gemmtrsm_ukr for power9.
    
    Details:
    - Added code that fixes false failures in the gemmtrsm_ukr module of the
      testsuite. The tests were failing because the computation (bli_gemv())
      that performs the numerical check was not able to properly traverse
      the matrix operands bx1 and b11 that are views into the micropanel of
      B, which has duplicated/broadcast elements under the power9 subconfig.
      (For example, a micropanel of B with duplication factor of 2 needs to
      use a column stride of 2; previously, the column stride was being
      interpreted as 1.)
    - Defined separate bli_obj_set_row_stride() and bli_obj_set_col_stride()
      static functions in bli_obj_macro_defs.h. (Previously, only the
      function bli_obj_set_strides() was defined. Amazing to think that we
      got this far without these former functions.)
    - Updated/expounded upon comments.
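    
    The stride issue in the first item can be illustrated with a small
    sketch. The accessor view_get() is hypothetical; it only shows why a
    view into a micropanel whose elements are each stored twice must use
    a column stride of 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accessor: element (i, j) of a view with row stride rs
   and column stride cs, both measured in elements. */
double view_get( const double* buf, size_t rs, size_t cs,
                 size_t i, size_t j )
{
    return buf[ i * rs + j * cs ];
}
```

    For a 2 x 3 panel stored with duplication factor 2 as
    { 1,1, 2,2, 3,3, 4,4, 5,5, 6,6 }, the row stride is 6. Reading
    logical element (0, 1) with cs = 2 yields 2, whereas cs = 1 lands on
    the duplicate of the first element and yields 1.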

commit 49177a6b9afcccca5b39a21c6fd8e243525e1505
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 4 18:09:37 2019 -0600

    Fixed latent testsuite ukr module bugs for power9.
    
    Details:
    - Fixed a latent bug in the testsuite ukernel modules (gemm, trsm, and
      gemmtrsm) that only manifested once we began running with parameters
      that mimic those of power9. The problem was rooted in the way those
      modules were creating objects (and thus allocating memory) for the
      micropanel operands to the microkernel being tested. Since power9
      duplicates/broadcasts elements of B in memory, we needed an easy way
      of asking for more than one storage element per logical element in
      the matrix. I incorrectly expressed this as:
    
        bli_obj_create( datatype, k, n, ldbp, 1, &bp );
    
      The problem here is that bli_obj_create() is exceedingly efficient
      at calculating the size it passes to malloc() and doesn't allocate a
      full leading dimension's worth of elements for the last column (or
      row, in this example). This would normally not bother anyone since
      you're not supposed to access that memory anyway. But here, my
      attempted "hack" for getting extra elements was insufficient, and
      needed to be changed to:
    
        bli_obj_create( datatype, k, ldbp, ldbp, 1, &bp );
    
      That is, the extra elements needed to be baked into the dimensions of
      the matrix object in order to have the intended effect on the number
      of elements actually allocated. Thanks to Jeff Hammond for reporting
      this bug.
    - Fixed a typically harmless memory leak in the aforementioned test
      modules (the objects for the packed micropanels were not being freed).
    - Updated/expanded a common comment across all three ukr test modules.
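    
    The allocation behavior described in the first item can be sketched
    with a count of the elements a strided matrix actually touches. The
    helper min_elems() is hypothetical, and whether bli_obj_create() uses
    exactly this formula is an assumption; the sketch only shows why a
    large leading dimension alone does not guarantee a full final row.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest number of elements that covers an
   m x n matrix with row stride rs and column stride cs. The last
   row (or column) is not padded out to a full stride. */
size_t min_elems( size_t m, size_t n, size_t rs, size_t cs )
{
    if ( m == 0 || n == 0 ) return 0;
    return ( m - 1 ) * rs + ( n - 1 ) * cs + 1;
}
```

    With illustrative values k = 4, n = 6, and ldbp = 12, a k x n request
    covers 3*12 + 5 + 1 = 42 elements, while a k x ldbp request covers
    3*12 + 11 + 1 = 48, which is the full k * ldbp.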

commit c84391314d4f1b3f73d868f72105324e649f2a72
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 4 13:57:12 2019 -0600

    Reverted minor temp/wspace changes from b426f9e.
    
    Details:
    - Added missing license header to bli_pwr9_asm_macros_12x6.h.
    - Reverted temporary changes to various files in 'test' and 'testsuite'
      directories.
    - Moved testsuite/jobscripts into testsuite/old.
    - Minor whitespace/comment changes across various files.

commit 4870260f6b8c06d2cc01b7147d7433ddee213f7f
Author: Jeff Hammond <jeff.r.hammond@intel.com>
Date:   Mon Nov 4 11:55:47 2019 -0800

    blacklist GCC 5 and older for POWER9 (#360)

commit b426f9e04e5499c6f9c752e49c33800bfaadda4c
Author: Nicholai Tukanov <nicholai@utexas.edu>
Date:   Fri Nov 1 17:57:03 2019 -0500

    POWER9 DGEMM  (#355)
    
    Implemented and registered power9 dgemm ukernel.
    
    Details:
    - Implemented 12x6 dgemm microkernel for power9. This microkernel
      assumes that elements of B have been duplicated/broadcast during the
      packing step. The microkernel uses a column orientation for its
      microtile vector registers and thus implements column storage and
      general stride IO cases. (A row storage IO case via in-register
      transposition may be added at a future date.) It should be noted that
      we recommend using this microkernel with gcc and *not* xlc, as issues
      with the latter cropped up during development, including but not
      limited to slightly incompatible vector register mnemonics in the GNU
      extended inline assembly clobber list.

commit 58102aeaa282dc79554ed045e1b17a6eda292e15
Merge: 52059506 b9bc222b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 28 17:58:31 2019 -0500

    Merge branch 'amd'

commit 52059506b2d5fd4c3738165195abeb356a134bd4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 23 15:26:42 2019 -0500

    Added "How to Download BLIS" section to README.md.
    
    Details:
    - Added a new section to the README.md, just prior to the "Getting
      Started" section, titled "How to Download BLIS". This section details
      the user's options for obtaining BLIS and lays out four common ways
      of downloading the library. Thanks to Jeff Diamond for his feedback
      on this topic.

commit e6f0a96cc59aef728470f6850947ba856148c38a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 14 17:05:39 2019 -0500

    Updated README.md to ack Facebook as funder.

commit b9bc222bfc3db4f9ae5d7b3321346eed70c2c3fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 14 16:38:15 2019 -0500

    Call bli_syrk_small() before error checking.
    
    Details:
    - In bli_syrk_front(), moved the conditional call to bli_syrk_check()
      (if error checking is enabled) and the conditional scaling of C by
      beta (if alpha is zero) so that they occur after, instead of before,
      the call to bli_syrk_small(). This sequencing now matches that of
      bli_gemm_small() in bli_gemm_front() and bli_trsm_small() in
      bli_trsm_front().

commit f0959a81dbcf30d8a1076d0a6348a9835079d31a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 14 15:46:28 2019 -0500

    When manual config is blacklisted, output error.
    
    Details:
    - Fixed and adjusted the logic in configure so that a more informative
      error message is output when a user runs './configure ... <conf>' and
      <conf> is present in the configuration blacklist. Previously, this
      particular set of conditions would result in the message:
    
        'user-specified configuration '' is NOT registered!
    
      That is, the error message mis-identified the targeted configuration
      as the empty string, and (more importantly) mis-identified the
      problem. Thanks to Tze Meng Low for reporting this issue.
    - Fixed a nearby error message somewhat unrelated to the issue above.
      Specifically, the wrong string was being printed when the error
      message was identifying an auto-detected configuration that did not
      appear to be registered.

commit 6218ac95a525eefa8921baf8d0d7057dfacebe9c
Merge: 0016d541 a617301f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 11 11:53:51 2019 -0500

    Merge branch 'master' into amd

commit 0016d541e6b0da617b1fae6612d2b314901b7a75
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 11 11:09:44 2019 -0500

    Changed -march=znver2 to =znver1 for clang on zen2.
    
    Details:
    - In config/zen2/make_defs.mk, changed the -march= flag so that
      -march=znver1 is used instead of -march=znver2 when CC_VENDOR is
      clang. (The gcc branch attempts to differentiate between various
      versions, but the equivalent version cutoffs for clang are not
      yet known by us, so we have to use a single flag for all versions
      of clang. Hopefully -march=znver1 is new enough. If not, we'll
      fall back to -march=bdver4 -mno-fma4 -mno-tbm -mno-xop -mno-lwp.)
      This issue was discovered thanks to AppVeyor.

commit e94a0530e5ac4c78a18f09105f40003be2b517f7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 11 10:48:27 2019 -0500

    Corrected zen NC that was non-multiple of NR.
    
    Details:
    - Updated an incorrectly set cache blocksize NC for single real within
      config/zen/bli_cntx_init_zen.c that was not a multiple of the
      corresponding value of NR. This issue, which was caught by Travis CI,
      was introduced in 29b0e1e.
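    
    A check of the kind that catches this class of mistake might look
    like the sketch below. The helper names are hypothetical, and the
    numbers in the note that follows are illustrative only; they are not
    the values used in bli_cntx_init_zen.c.

```c
#include <assert.h>

/* Hypothetical helper: 1 if nc is a positive multiple of nr. */
int is_multiple_of( long nc, long nr )
{
    return nr > 0 && nc > 0 && nc % nr == 0;
}

/* Hypothetical helper: largest multiple of nr not exceeding nc
   (for non-negative nc and positive nr). */
long round_down_to_multiple( long nc, long nr )
{
    return nr > 0 ? ( nc / nr ) * nr : 0;
}
```

    For example, with nr = 16 a cache blocksize of 4086 would fail the
    check, and rounding down would give 4080.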

commit a2ffac752076bf55eb8c1fe2c5da8d9104f1f85b
Merge: 1cfe8e25 29b0e1ef
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 11 10:31:18 2019 -0500

    Merge branch 'amd-master' into amd

commit 29b0e1ef4e8b84ce76888d73c090009b361f1306
Merge: 1cfe8e25 fdce1a56
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 11 10:24:24 2019 -0500

    Code review + tweaks to AMD's AOCL 2.0 PR (#349).
    
    Details:
    - NOTE: This is a merge commit of 'master' of git://github.com/amd/blis
      into 'amd-master' of flame/blis.
    - Fixed a bug in the downstream value of BLIS_NUM_ARCHS, which was
      inadvertently not incremented when the Zen2 subconfiguration was
      added.
    - In bli_gemm_front(), added a missing conditional constraint around the
      call to bli_gemm_small() that ensures that the computation precision
      of C matches the storage precision of C.
    - In bli_syrk_front(), reorganized and relocated the notrans/trans logic
      that existed around the call to bli_syrk_small() into bli_syrk_small()
      to minimize the calling code footprint and also to bring that code
      into stylistic harmony with similar code in bli_gemm_front() and
      bli_trsm_front(). Also, replaced direct accessing of obj_t fields with
      proper accessor static functions (e.g. 'a->dim[0]' becomes
      'bli_obj_length( a )').
    - Added #ifdef BLIS_ENABLE_SMALL_MATRIX guard around prototypes for
      bli_gemm_small(), bli_syrk_small(), and bli_trsm_small(). This is
      strictly speaking unnecessary, but it serves as a useful visual cue to
      those who may be reading the files.
    - Removed cpp macro-protected small matrix debugging code from
      bli_trsm_front.c.
    - Added a GCC_OT_9_1_0 variable to build/config.mk.in to facilitate gcc
      version check for availability of -march=znver2, and added appropriate
      support to configure script.
    - Cleanups to compiler flags common to recent AMD microarchitectures in
      config/zen/amd_config.mk, including: removal of -march=znver1 et al.
      from CKVECFLAGS (since the -march flag is added within make_defs.mk);
      setting CRVECFLAGS similarly to CKVECFLAGS.
    - Cleanups to config/zen/bli_cntx_init_zen.c.
    - Cleanups, added comments to config/zen/make_defs.mk.
    - Cleanups to config/zen2/make_defs.mk, including making use of newly-
      added GCC_OT_9_1_0 and existing GCC_OT_6_1_0 to choose the correct
      set of compiler flags based on the version of gcc being used.
    - Reverted downstream changes to test/test_gemm.c.
    - Various whitespace/comment changes.

commit a617301f9365ac720ff286514105d1b78951368b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Oct 8 17:14:05 2019 -0500

    Updates to docs/CodingConventions.md.

commit 171f10069199f0cd280f18aac184546bd877c4fe
Merge: 702486b1 05d58edf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 4 11:18:23 2019 -0500

    Merge remote-tracking branch 'loveshack/emacs'

commit 702486b12560b5c696ba06de9a73fc0d5107ca44
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 2 16:35:41 2019 -0500

    Removed stray FAQ section introduced in 1907000.

commit 1907000ad6ea396970c010f07ae42980b7b14fa0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 2 16:31:54 2019 -0500

    Updates to FAQ (AMD-related questions).
    
    Details:
    - Added a couple of potential frequently-asked questions/answers related
      to AMD's fork of BLIS.
    - Updated existing answers to other questions.

commit 834f30a0dad808931c9d80bd5831b636ed0e1098
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 2 12:45:56 2019 -0500

    Mention mixeddt paper in docs/MixedDatatypes.md.

commit 05d58edfe0ea9279971d74f17a5f7a69c4672ed5
Author: Dave Love <dave.love@manchester.ac.uk>
Date:   Wed Oct 2 10:33:44 2019 +0100

    Note .dir-locals.el in docs

commit 531110c339f199a4d165d707c988d89ab4f5bfe8
Author: Dave Love <dave.love@manchester.ac.uk>
Date:   Wed Oct 2 10:16:22 2019 +0100

    Modify Emacs config
    Confine it to cc-mode and add comment-start/end.

commit 4bab365cab98202259c70feba6ec87408cba28d8
Author: Dave Love <dave.love@manchester.ac.uk>
Date:   Tue Oct 1 19:22:47 2019 +0000

    Add .dir-locals.el for Emacs (#348)
    
    A minimal version that could probably do with extending, but at least
    gets the indentation roughly right.

commit 4ec8dad66b3d37b0a2b47d19b7144bb62d332622
Author: Dave Love <dave.love@manchester.ac.uk>
Date:   Thu Sep 26 16:27:53 2019 +0100

    Add .dir-locals.el for Emacs
    
    A minimal version that could probably do with extending, but at least
    gets the indentation roughly right.

commit bc16ec7d1e2a30ce4a751255b70c9cbe87409e4f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 23 15:37:33 2019 -0500

    Set execute bits of shared library at install-time.
    
    Details:
    - Modified the 0644 octal code used during installation of shared
      libraries to 0755 (for Linux/OSX only). Thanks to Adam J. Stewart
      for reporting this issue via #343.
    - CREDITS file update.

commit c60db26aee9e7b4e5d0b031b0881e58d23666b53
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 17 18:04:17 2019 -0500

    Fixed bad loop counter in bli_[cz]scal2bbs_mxn().
    
    Details:
    - Fixed a typo in the loop counter for the 'd' (duplication) dimension
      in the complex macros of frame/include/level0/bb/bli_scal2bbs_mxn.h.
      They shouldn't be used by anyone yet, but thankfully clang via
      AppVeyor spit out warnings that alerted me to the issue.

commit c766c81d628f0451d8255bf5e4b8be0a4ef91978
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 17 18:00:29 2019 -0500

    Added missing schema arg to knl packm kernels.
    
    Details:
    - Added the pack_t schema argument to the knl packm kernel functions.
      This change was intended for inclusion in 31c8657. (Thank you SDE +
      Travis CI.)

commit 31c8657f1d6d8f6efd8a73fd1995e995fc56748b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 17 17:42:10 2019 -0500

    Added support for pre-broadcast when packing B.
    
    Details:
    - Added support for being able to duplicate (broadcast) elements in
      memory when packing matrix B (i.e., the right-hand operand) in level-3
      operations. This turns out to be advantageous for some architectures that
      can afford the cost of the extra bandwidth and somehow benefit from
      the pre-broadcast elements (and thus being able to avoid using
      broadcast-style load instructions on micro-rows of B in the gemm
      microkernel).
    - Support optionally disabling right-side hemm and symm. If this occurs,
      hemm_r is implemented in terms of hemm_l (and symm_r in terms of
      symm_l). This is needed when broadcasting during packing because the
      alternative--supporting the broadcast of B while also allowing matrix
      B to be Hermitian/symmetric--would be an absolute mess.
    - Support alignment factors for packed blocks of A, B, and C separately
      (as well as for general-purpose buffers). In addition, we support
      byte offsets from those alignment values (which is different from
      aligning by align+offset bytes to begin with). The default alignment
      values are BLIS_PAGE_SIZE in all four cases, with the offset values
      defaulting to zero.
    - Pass pack_t schema into bli_?packm_cxk() so that it can be then passed
      into the packm kernel, where it will be needed by packm kernels that
      perform broadcasts of B, since the idea is that we *only* want to
      broadcast when packing micropanels of B and not A.
    - Added definition for variadic bli_cntx_set_l3_vir_ukrs(), which can be
      used to set custom virtual level-3 microkernels in the cntx_t, which
      would typically be done in the bli_cntx_init_*() function defined in
      the subconfiguration of interest.
    - Added a "broadcast B" kernel function for use with NP/NR = 12/6,
      defined in ref_kernels/1m/bli_packm_cxk_bb_ref.c.
    - Added gemm, gemmtrsm, and trsm "broadcast B" reference kernels
      defined in ref_kernels/3/bb. (These kernels have been tested with
      double real with NP/NR = 12/6.)
    - Added #ifndef ... #endif guards around several macro constants defined
      in frame/include/bli_kernel_macro_defs.h.
    - Defined a few "broadcast B" static functions in
      frame/include/level0/bb for use by "broadcast B"-style packm reference
      kernels. For now, only the real domain kernels are tested and fully
      defined.
    - Output the alignment and offset values for packed blocks of A and B
      in the testsuite's "BLIS configuration info" section.
    - Comment updates to various files.
    - Bumped so_version to 3.0.0.
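    
    The distinction drawn above between "an offset from an aligned
    address" and "aligning by align+offset bytes" can be sketched as
    follows. The helper names are hypothetical and the arithmetic is a
    plain illustration, not the allocator code used by BLIS.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align > 0). */
uintptr_t align_up( uintptr_t addr, uintptr_t align )
{
    return ( ( addr + align - 1 ) / align ) * align;
}

/* Hypothetical helper: align first, then step offset bytes past the
   aligned address. */
uintptr_t align_then_offset( uintptr_t addr, uintptr_t align,
                             uintptr_t offset )
{
    return align_up( addr, align ) + offset;
}
```

    Starting from address 4100 with an alignment of 4096 and an offset of
    64, aligning and then offsetting gives 8192 + 64 = 8256, whereas
    aligning to 4096 + 64 = 4160 bytes gives 4160. The two results differ,
    which is the point made in the alignment item above.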

commit fd9bf497cd4ff73ccdfc030ba037b3cb2f1c2fad
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 17 15:45:24 2019 -0500

    CREDITS file update.

commit 6c8f2d1486ce31ad3c2083e5c2035acfd4409a43
Author: ShmuelLevine <shmuel.levine@gmail.com>
Date:   Tue Sep 17 16:43:46 2019 -0400

    Fix description for function bli_*pxby2v (#340)
    
    Fix typo in BLISTypedAPI.md for bli_?axpy2v() description.

commit b5679c1520f8ae7637b3cc2313133461f62398dc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 17 14:00:37 2019 -0500

    Inserted Multithreading links into BuildSystem.md.
    
    Details:
    - Inserted brief disclaimers about default disabled multithreading
      and default single-threadedness to BuildSystem.md along with links to
      the Multithreading.md document. Thanks to Jeff Diamond for suggesting
      these additions.
    - Trivial reword of sentence regarding automatically-detected
      architectures.

commit f4f5170f8482c94132832eb3033bc8796da5420b
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Sep 11 07:34:48 2019 -0500

    Update README.md (#338)

commit 1cfe8e2562e5e50769468382626ce36b734741c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 5 16:08:30 2019 -0500

    Reimplemented bli_cpuid_query() for ARM.
    
    Details:
    - Rewrote bli_cpuid_query() for ARM architectures to use stdio-based
      functions such as fopen() and fgets() instead of popen(). The new code
      does more or less the same thing as before--searches /proc/cpuinfo for
      various strings, which are then parsed in order to determine the
      model, part number, and features. Thanks to Dave Love for suggesting
      this change in issue #335.
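    
    A minimal sketch of the fopen()/fgets() approach described above.
    The helpers parse_cpuinfo_line() and find_cpuinfo_field() are
    hypothetical and are not the code of bli_cpuid_query(); they assume
    lines of the form "key<whitespace>: value".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if line begins with key, copy the text after
   the ':' (leading blanks and the trailing newline removed) into out
   and return 1; otherwise return 0. */
int parse_cpuinfo_line( const char* line, const char* key,
                        char* out, size_t out_len )
{
    size_t key_len = strlen( key );

    if ( out_len == 0 ) return 0;
    if ( strncmp( line, key, key_len ) != 0 ) return 0;

    const char* val = strchr( line + key_len, ':' );
    if ( val == NULL ) return 0;

    ++val;
    while ( *val == ' ' || *val == '\t' ) ++val;

    size_t len = strcspn( val, "\r\n" );
    if ( len >= out_len ) len = out_len - 1;
    memcpy( out, val, len );
    out[ len ] = '\0';
    return 1;
}

/* Hypothetical helper: scan a file such as /proc/cpuinfo line by
   line with fgets() until a line matching key is found. */
int find_cpuinfo_field( const char* path, const char* key,
                        char* out, size_t out_len )
{
    FILE* fp = fopen( path, "r" );
    if ( fp == NULL ) return 0;

    char line[ 256 ];
    int  found = 0;
    while ( !found && fgets( line, sizeof( line ), fp ) != NULL )
        found = parse_cpuinfo_line( line, key, out, out_len );

    fclose( fp );
    return found;
}
```

    Reading the file directly avoids spawning a shell process, which is
    what a popen()-based query would do.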

commit 7c7819145740e96929466a248d6375d40e397e19
Author: Devin Matthews <damatthews@smu.edu>
Date:   Fri Aug 30 16:52:09 2019 -0500

    Always use sqsumv to compute normfv. (#334)
    
    * Always use sqsumv to compute normfv on MacOS.
    
    * Unconditionally disable the "dot trick" in normfv.
    
    * Added explanatory comment to normfv definition.
    
    Details:
    - Added a comment above the unconditional disabling of the dotv-based
      implementation to normfv. Thanks to Roman Yurchak, Devin Matthews,
      and Isuru Fernando for helping with this improvement.
    - CREDITS file update.
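    
    For readers unfamiliar with the two approaches: the "dot trick" is
    understood here to mean taking the square root of the dot product of
    a vector with itself, while a sqsumv-style computation accumulates a
    scaled sum of squares. The sketch below shows the latter under that
    assumption. The function sqsum_scaled() is hypothetical and is not
    the BLIS implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of a scaled sum of squares. On return,
   (*scale)^2 * (*sumsq) equals the sum of the squares of x, with
   *scale holding the largest magnitude seen so far. Keeping the
   ratios at or below 1 avoids squaring very large or very small
   values directly. */
void sqsum_scaled( size_t n, const double* x,
                   double* scale, double* sumsq )
{
    *scale = 0.0;
    *sumsq = 1.0;

    for ( size_t i = 0; i < n; ++i )
    {
        double a = x[ i ] < 0.0 ? -x[ i ] : x[ i ];
        if ( a == 0.0 ) continue;

        if ( *scale < a )
        {
            /* New largest magnitude: rescale the running sum. */
            double r = *scale / a;
            *sumsq = 1.0 + *sumsq * r * r;
            *scale = a;
        }
        else
        {
            double r = a / *scale;
            *sumsq += r * r;
        }
    }
}
```

    The norm is then scale * sqrt( sumsq ). For x = { 3, 4 } the sketch
    produces scale = 4 and sumsq = 1.5625, giving a norm of 5.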

commit 80e6c10b72d50863b4b64d79f784df7befedfcd1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 29 12:12:08 2019 -0500

    Added reproduction section to Performance docs.
    
    Details:
    - Added section titled "Reproduction" to both Performance.md and
      PerformanceSmall.md that briefly nudges the motivated reader in the
      right direction if he/she wishes to run the same performance
      benchmarks used to produce the graphs shown in those documents.
      Thanks to Dave Love for making this suggestion.

commit 14cb426414856024b9ae0f84ac21efcc1d329467
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 28 17:04:33 2019 -0500

    Updated OpenBLAS, Eigen sup results.
    
    Details:
    - Updated the results shown in docs/PerformanceSmall.md for OpenBLAS and
      Eigen.

commit b02e0aae8ce2705e91023b98ed416cd05430a78e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 27 14:37:46 2019 -0500

    Updated test drivers to iterate backwards.
    
    Details:
    - Updated test driver source in test, test/3, test/1m4m, and
      test/mixeddt to iterate through the problem space backwards. This
      can help avoid certain situations where the CPU frequency does not
      immediately throttle up to its maximum. Thanks to Robert van de
      Geijn for recommending this fix (originally made to test/sup drivers
      in 57e422a).
    - Applied off-by-one matlab output bugfix from b6017e5 to test drivers
      in test, test/3, test/1m4m, and test/mixeddt directories.
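
    Illustration (not part of the original commit): a minimal sketch of
    iterating a problem-size range from largest to smallest. The helper
    sizes_descending() and its parameter names are hypothetical and do
    not come from the test drivers.

```c
#include <stddef.h>

/* Hypothetical helper: writes the problem sizes p_max, p_max - p_inc, ...
   (never going below p_min) into sizes[] and returns how many were
   written, up to cap entries. */
static size_t sizes_descending(long p_min, long p_max, long p_inc,
                               long *sizes, size_t cap)
{
    size_t count = 0;
    for (long p = p_max; p >= p_min && count < cap; p -= p_inc)
        sizes[count++] = p;
    return count;
}
```

    Starting at the largest size means the first timed problems are also
    the longest-running ones, which gives the CPU time to reach its
    maximum frequency before the small problems are measured.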

commit b6017e53f4b26c99b14cdaa408351f11322b1e80
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 27 14:18:14 2019 -0500

    Bugfix of output text + tweaks to test/sup driver.
    
    Details:
    - Fixed an off-by-one bug in the output of matlab row indices in
      test/sup/test_gemm.c that only manifested when the problem size
      increment was equal to 1.
    - Disabled the building of rrc, rcr, rcc, crr, crc, and ccr storage
      combinations for blissup drivers in test/sup. This helps make the
      building of drivers complete sooner.
    - Trivial changes to test/sup/runme.sh.

commit 138d403b6bb15e687a3fe26d3d967b8ccd1ed97b
Author: Devin Matthews <damatthews@smu.edu>
Date:   Mon Aug 26 18:11:27 2019 -0500

    Use -funsafe-math-optimizations and -ffp-contract=fast for all reference kernels when using gcc or clang. (#331)

commit d5a05a15a7fcc38fb2519031dcc62de8ea4a530c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Aug 26 16:54:31 2019 -0500

    Cropped whitespace from new sup graphs.
    
    Details:
    - Previously forgot to crop whitespace from the new .png graphs
      added/updated in docs/graphs/sup.

commit a6c80171a353db709e43f9e6e7a3da87ce4d17ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Aug 26 16:51:31 2019 -0500

    Fixed contents links in docs/PerformanceSmall.md.
    
    Details:
    - Corrected links in contents section of docs/PerformanceSmall.md,
      which were erroneously directing readers to the corresponding
      sections of docs/Performance.md.

commit 40781774df56a912144ef19cc191ed626a89f0de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Aug 26 16:47:37 2019 -0500

    Updated sup performance graphs with libxsmm.
    
    Details:
    - Added libxsmm to column-stored sup graphs presented in
      docs/PerformanceSmall.md.
    - Updated sup results for BLASFEO.
    - Added sup results for Lonestar5 (Haswell).
    - Addresses issue #326.

commit bfddf671328e7e372ac7228f72ff2d9d8e03ae18
Author: figual <figual@ucm.es>
Date:   Mon Aug 26 12:01:33 2019 +0200

    Fixed context registration for Cortex A53 (#329).

commit 4a0a6e89c568246d14de4cc30e3ff35aac23d774
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Aug 24 15:25:16 2019 -0500

    Changed test/sup alpha to 1; test libxsmm+netlib.
    
    Details:
    - Changed the value of alpha to 1.0 in test/sup/test_gemm.c. This is
      needed because libxsmm currently only optimizes gemm operations where
      alpha is unit (and beta is unit or zero).
    - Adjusted the test/sup/Makefile to test libxsmm with netlib BLAS as its
      fallback library. This is the library that will be called when the
      problem dimensions are deemed too large, or any other criteria for
      optimization are not met. (This was done not because it is realistic,
      but rather so that it would be very clear when libxsmm ceased handling
      gemm calls internally when the data are graphed.)

commit 7aa52b57832176c5c13a48e30a282e09ecdabf73
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 23 16:12:50 2019 -0500

    Use libxsmm API in test/sup; add missing -ldl.
    
    Details:
    - Switch the driver source in test/sup so that libxsmm_?gemm() is called
      instead of ?gemm_() when compiling for / linking against libxsmm.
      libxsmm's documentation isn't clear on whether it is even *trying* to
      provide BLAS API compatibility, and I got tired of trying to figure it
      out.
    - Added missing -ldl in LDFLAGS when linking against libxsmm.

commit 57e422aa168bee7416965265c93fcd4934cd7041
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 23 14:17:52 2019 -0500

    Added libxsmm support to test/sup drivers.
    
    Details:
    - Modified test/sup/Makefile to build drivers that test the performance
      of skinny/small problems via libxsmm.
    - Modified test/sup/runme.sh to run aforementioned drivers.
    - Modified test/sup/test_gemm.c so that problem sizes are tested in
      reverse order (from largest to smallest). This can help avoid certain
      situations where the CPU frequency does not immediately throttle up
      to its maximum. Thanks to Robert van de Geijn for recommending this
      fix.

commit 661681fe33978acce370255815c76348f83632bc
Merge: 2f387e32 ef0a1a0f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 22 14:29:50 2019 -0500

    Merge branch 'master' of github.com:flame/blis

commit 2f387e32ef5f9a17bafb5076dc9f66c38b52b32d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 22 14:27:30 2019 -0500

    Added Eigen -march=native hack to perf docs.
    
    Details:
    - Spell out the hack given to me by Sameer Agarwal in order to get Eigen
      to build with -march=native (which is critically important for Eigen)
      in docs/Performance.md and docs/PerformanceSmall.md.

commit ef0a1a0faf683fe205f85308a54a77ffd68a9a6c
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Aug 21 17:40:24 2019 -0500

    Update do_sde.sh (#330)
    
    * Update do_sde.sh
    
    Automatically accept SDE license and download directly from Intel
    
    * Update .travis.yml
    
    [ci skip]
    
    * Update .travis.yml
    
    Enable SDE testing for PRs.

commit 0cd383d53a8c4a6871892a0395591ef5630d4ac0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 21 13:39:05 2019 -0500

    Corrected variable type and comment update.
    
    Details:
    - Forgot to save all changes from bli_gemmtrsm4m1_ref.c before commit
      in 8122f59. Fixed type mismatch and referenced github issue in
      comment.

commit 8122f59745db780987da6aa1e851e9e76aa985e0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 21 13:22:12 2019 -0500

    Pacify 'restrict' warning in gemmtrsm4m1 ref ukr.
    
    Details:
    - Previously, some versions of gcc would complain that the same
      pointer, one_r, is being passed in for both alpha and beta in the
      fourth call to the real gemm ukernel in bli_gemmtrsm4m1_ref.c. This
      is understandable since the compiler knows that the real gemm ukernel
      qualifies all of its floating-point arguments (including alpha and
      beta) with restrict. A small hack has been inserted into the file
      that defines a new variable to store the value 1.0, which is now used
      in lieu of one_r for beta in the fourth call to the real gemm ukernel,
      which should pacify the compiler now. Thanks to Dave Love for
      reporting this issue (#328) and to Devin Matthews for offering his
      'restrict' expertise.
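
    Illustration (not part of the original commit): a minimal sketch of
    the workaround described above, assuming a stand-in kernel whose
    scalar arguments are restrict-qualified. scale_add() and
    update_with_unit_scalars() are hypothetical names, not the BLIS
    microkernel or its caller.

```c
/* Stand-in for a kernel that restrict-qualifies its scalar arguments.
   Computes c := beta * c + alpha * a, elementwise. */
static void scale_add(const double *restrict alpha,
                      const double *restrict beta,
                      const double *restrict a,
                      double *restrict c, int n)
{
    for (int i = 0; i < n; ++i)
        c[i] = (*beta) * c[i] + (*alpha) * a[i];
}

static void update_with_unit_scalars(const double *a, double *c, int n)
{
    const double one = 1.0;
    /* A second object holding the same value. Passing &one for both
       alpha and beta is what some compilers warn about. */
    const double one_beta = 1.0;

    scale_add(&one, &one_beta, a, c, n);
}
```

    The two scalars hold identical values, so the result is unchanged;
    only the pointers passed to the restrict-qualified parameters differ.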

commit e8c6281f139bdfc9bd68c3b36e5e89059b0ead2e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 21 12:38:53 2019 -0500

    Add -march support for specific gcc version ranges.
    
    Details:
    - Added logic to configure that checks the version of the compiler
      against known version ranges that could cause problems later in the
      build process. For example, versions of gcc older than 4.9.0 use
      different -march labels than version 4.9.0 or later
      ('-march=corei7-avx' vs '-march=sandybridge', respectively).
      Similarly, before 6.1, compilation on Zen was possible, but you
      need to start with -march=bdver4 and then disable instruction sets
      that were discarded during the transition from Excavator to Zen. So
      now, configure substitutes 'yes'/'no' values into anchors in
      config.mk.in, which sets various make variables (e.g. GCC_OT_4_9_0),
      which can be accessed and branched upon by the various
      configurations' make_defs.mk files when setting their compiler flags.
    - Updated config/haswell/make_defs.mk to branch on GCC_OT_4_9_0.
    - Updated config/sandybridge/make_defs.mk to branch on GCC_OT_4_9_0.
    - Updated config/zen/make_defs.mk to branch on GCC_OT_6_1_0.

commit e6ac4ebcb6e6a372820e7f509c0af3342966b84a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 20 13:49:47 2019 -0500

    Added page size, source location to perf docs.
    
    Details:
    - Added the page size, as returned via 'getconf -a | grep PAGE_SIZE',
      and the location of the performance drivers to docs/Performance.md
      (test/3) and docs/PerformanceSmall.md (test/sup). Thanks to Dave
      Love for suggesting these additions in #325.

commit fdce1a5648d69034fab39943100289323011c36f
Author: Meghana <Meghana.Vankadari@amd.com>
Date:   Wed Jul 24 15:04:41 2019 +0530

    changed gcc version check condition from 'ifeq' to 'if greater or equal'
    
    Change-Id: Ie4c461867829bcc113210791bbefb9517e52c226

commit c9486e0c4f82cd9f58f5ceb71c0df039e9970a20
Author: Meghana <Meghana.Vankadari@amd.com>
Date:   Wed Jul 24 09:45:17 2019 +0530

    code to detect version of gcc and set flags accordingly for zen2
    
    Change-Id: I29b0311d0000dee1a2533ee29941acf53f9e9f34

commit 54afe3dfe6828a1aff65baabbf14c98d92e50692
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 23 16:54:28 2019 -0500

    Added "Education and Learning" ToC entry to README.

commit 9f53b1ce7ac702e84e71801fe96986f6aa16040e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 23 16:50:35 2019 -0500

    Added "Education and Learning" section to README.
    
    Details:
    - Added a short section after the Intro of the README.md file titled
      "Education and Learning" that directs interested readers to the
      "LAFF-On Programming for High-Performance" massive open online course
      (MOOC) hosted via edX.

commit deda4ca8a094ee18d7c7c45e040e8ef180f33a48
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 22 13:59:05 2019 -0500

    Added test/1m4m driver directory.
    
    Details:
    - Added a new standalone test driver directory named '1m4m' that can
      build and run performance experiments for BLIS 1m, 4m1a, assembly,
      OpenBLAS, and the vendor library (MKL). This new driver directory
      was used to regenerate performance results for the 1m paper.
    - Added alternate (commented-out) cache blocksizes to
      config/haswell/bli_cntx_init_haswell.c. These blocksizes tend to
      work well on a 12-core Intel Xeon E5-2650 v3.

commit dcc0ce12fde4c6dca2b4764a1922a2ab19725867
Author: Meghana <Meghana.Vankadari@amd.com>
Date:   Mon Jul 22 17:12:01 2019 +0530

    Added a global Makefile for AMD architectures in config/zen folder
    This Makefile(amd_config.mk) has all the flags that are common to EPYC series
    
    Change-Id: Ic02c60a8293ccdd37f0f292e631acd198e6895de

commit af17bca26a8bd3dcbee8ca81c18d7b25de09c483
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 19 14:46:23 2019 -0500

    Updated haswell MC cache blocksizes.
    
    Details:
    - Updated the default MC cache blocksizes used by the haswell subconfig
      for both row-preferential (the default) and column-preferential
      microkernels.

commit b5e9bce4dde5bf014dd9771ae741048e1f6c7748
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 19 14:42:37 2019 -0500

    Updated -march flags for sandybridge, haswell.
    
    Details:
    - Updated the '-march=corei7-avx' flag in the sandybridge subconfig
      to '-march=sandybridge' and the '-march=core-avx2' flag in the
      haswell subconfig to '-march=haswell'. The older flags were used
      by older versions of gcc and should have been updated to the newer
      forms a long time ago. (The older flags were clearly working, even
      though they are no longer documented in the gcc man page.)

commit c22b9dba5859a9fc94c8431eccc9e4eb9be02be1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 16 13:14:47 2019 -0500

    More updates to comments in testsuite modules.
    
    Details:
    - Updated most comments in testsuite modules that describe how the
      correctness test is performed so that it is clear whether the vector
      (normfv) or matrix (normfm) form of Frobenius norm is used.

commit c4cc6fa702f444a05963db01db51bc7d6669e979
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 16 13:00:35 2019 -0500

    New cntx_t blksz "set" functions + misc tweaks.
    
    Details:
    - Defined two new static functions in bli_cntx.h:
        bli_cntx_set_blksz_def_dt()
        bli_cntx_set_blksz_max_dt()
      which developers may find convenient when experimenting with different
      values of cache blocksizes.
    - Updated one- and two-socket multithreaded problem size range and
      increment values in test/3/Makefile.
    - Changed default to column storage in test/3/test_gemm.c.
    - Fixed typo in comment in testsuite/src/test_subm.c.

commit b84cee29f42855dc1f263e42b83b1a46ac8def87
Merge: 1f80858a c7dd6e6c
Author: Meghana Vankadari <Meghana.Vankadari@amd.com>
Date:   Mon Jul 8 02:03:07 2019 -0400

    Merge "Added compiler flags for vanilla clang" into amd-staging-rome2.0

commit 1f80858abf5ca220b2998fbe6f9b06c32d3864c3
Author: kdevraje <kiran.Devrajegowda@amd.com>
Date:   Fri Jul 5 16:05:11 2019 +0530

    Fixed the dgemm performance issue (jira ticket CPUPL 458): an #else was missed during integration, so the else path was always followed to get the block sizes
    
    Change-Id: I0084b5856c2513ab1066c08c15b5086db6532717

commit c7dd6e6cd2f910cbefcdc1e04a5adeb919a23de0
Author: Meghana <meghana.vankadari@amd.com>
Date:   Thu Jul 4 09:32:51 2019 +0530

    Added compiler flags for vanilla clang
    
    Change-Id: I13c00b4c0d65bbda4c929848fd48b0ab611952ab

commit 2acd49b76457635625a01e31c2abc8902b23cf51
Author: Meghana <meghana.vankadari@amd.com>
Date:   Mon Jul 1 15:42:38 2019 +0530

    fix for test failures using AOCC 2.0
    
    Change-Id: If44eaccc64bbe96bbbe1d32279b1b5773aba08d1

commit ceee2f973ebe115beca55ca77f9e3ce36b14c28a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 24 17:47:40 2019 -0500

    Fixed thrinfo_t printing bug for small problems.
    
    Details:
    - Fixed a bug in bli_l3_thrinfo_print_gemm_paths() and
      bli_l3_thrinfo_print_trsm_paths(), defined in bli_l3_thrinfo.c,
      whereby subnodes of the thrinfo_t tree are "dereferenced" near the
      beginning of the functions, which may lead to segfaults in certain
      situations where the thread tree was not fully formed because the
      matrix problem was too small for the level of parallelism specified.
      (That is, too small because some problems were assigned no work due
      to the smallest units in the m and n dimensions being defined by the
      register blocksizes mr and nr.) The fix requires several nested levels
      of if statements, and this is one of those few instances where use of
      goto statements results in (mostly) prettier code, especially in the
      case of _gemm_paths(). And while it wasn't necessary, I ported this
      goto usage to the loop body that prints the thrinfo_t work_id and
      comm_id values for each thread. Thanks to Nicholai Tukanov for helping
      to find this bug.
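
    Illustration (not part of the original commit): a minimal sketch of
    walking a possibly incomplete chain of subnodes, checking each level
    before dereferencing it and using goto for the early exit. node_t and
    collect_ids() are hypothetical and much simpler than thrinfo_t and the
    printing functions named above.

```c
#include <stddef.h>

/* Hypothetical tree node with a single subnode pointer. */
typedef struct node {
    int          id;
    struct node *sub;
} node_t;

/* Records the ids of up to three levels, stopping at the first missing
   subnode instead of dereferencing it. Returns the number of levels
   found. */
static int collect_ids(const node_t *root, int ids[3])
{
    int depth = 0;
    const node_t *n1, *n2;

    if (root == NULL) goto done;
    ids[depth++] = root->id;

    n1 = root->sub;
    if (n1 == NULL) goto done;
    ids[depth++] = n1->id;

    n2 = n1->sub;
    if (n2 == NULL) goto done;
    ids[depth++] = n2->id;

done:
    return depth;
}
```

    The equivalent structured version needs one nested if per level, which
    is the readability trade-off the commit message refers to.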

commit cac127182dd88ed0394ad81e6b91b897198e168a
Merge: 565fa385 3a45ecb1
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date:   Mon Jun 24 13:01:27 2019 +0530

    Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis
    with public repo commit id 565fa3853b381051ac92cff764625909d105644d.
    
    Change-Id: I68b9824b110cf14df248217a24a6191b3df79d42

commit c152109e9a3b1cd74760e8a3215a676d25c18d2e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 19 13:23:24 2019 -0500

    Updated BLASFEO results in PerformanceSmall.md.
    
    Details:
    - Updated the BLASFEO performance graphs shown in PerformanceSmall.md
      using a new commit of BLASFEO (2c9f312); updated PerformanceSmall.md
      accordingly.
    - Updated test/sup/octave/plot_l3sup_perf.m so that the .m files
      containing the mpnpkp results do not need to be preprocessed in order
      to plot half the problem size range (ie: up to 400 instead of the
      800 range of the other shape cases).
    - Trivial updates to runme.m.

commit 4d19c98110691d33ecef09d7e1b97bd1ccf4c420
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jun 8 11:02:03 2019 -0500

    Trivial change to MixedDatatypes.md link text.

commit 24965beabe83e19acf62008366097a7f198d4841
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jun 8 11:00:22 2019 -0500

    Fixed typo in README.md's MixedDatatypes.md link.

commit 50dc5d95760f41c5117c46f754245edc642b2179
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jun 7 13:10:16 2019 -0500

    Adjust -fopenmp-simd for icc's preferred syntax.
    
    Details:
    - Use -qopenmp-simd instead of -fopenmp-simd when compiling with Intel
      icc. Recall that this option is used for SIMD auto-vectorization in
      reference kernels only. Support for the -f option has been completely
      deprecated and removed in newer versions of icc in favor of -q. Thanks
      to Victor Eijkhout for reporting this issue and suggesting the fix.

commit ad937db9507786874c801b41a4992aef42d924a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jun 7 11:34:08 2019 -0500

    Added missing #include "bli_family_thunderx2.h".
    
    Details:
    - Added a cpp-conditional directive block to bli_arch_config.h that
      #includes "bli_family_thunderx2.h". The code has been missing since
      adf5c17f. However, this never manifested as an error because the file
      is virtually empty and not needed for thunderx2 (or most subconfigs).
      Thanks to Jeff Diamond for helping to spot this.

commit ce671917b2bc24895289247feef46f6fdd5020e7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 6 14:17:21 2019 -0500

    Fixed formatting/typo in docs/PerformanceSmall.md.

commit 86c33a4eb284e2cf3282a1809be377785cdb3703
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 5 11:43:55 2019 -0500

    Tweaked language in README.md related to sup/AMD.

commit cbaa22e1ca368d36a8510f2b4ecd6f1523d1e1f3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jun 4 16:06:58 2019 -0500

    Added BLASFEO results to docs/PerformanceSmall.md.
    
    Details:
    - Updated the graphs linked in PerformanceSmall.md with BLASFEO results,
      and added documenting language accordingly.
    - Updated scripts in test/sup/octave to plot BLASFEO data.
    - Minor tweak to language re: how OpenBLAS was configured for
      docs/Performance.md.

commit 763fa39c3088c0e2c0155675a3ca868a58bffb30
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jun 4 14:46:45 2019 -0500

    Minor tweaks to test/sup.
    
    Details:
    - Changed starting problem and increment from 16 to 4.
    - Added 'lll' (square problems) to list of problem size shapes to
      compile and run with.
    - Define BLASFEO location and added BLASFEO-related definitions.

commit 5e1e696003c9151b1879b910a1957b7bdd7b0deb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 3 18:37:20 2019 -0500

    CHANGELOG update (0.6.0)

commit 18c876b989fd0dcaa27becd14e4f16bdac7e89b3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 3 18:37:19 2019 -0500

    Version file update (0.6.0)

commit 0f1b3bf49eb593ca7bb08b68a7209f7cd550f912
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 3 18:35:19 2019 -0500

    ReleaseNotes.md update in advance of next version.
    
    Details:
    - Updated ReleaseNotes.md in preparation for next version.
    - CREDITS file update.

commit 27da2e8400d900855da0d834b5417d7e83f21de1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 3 17:14:56 2019 -0500

    Minor edits to docs/PerformanceSmall.md.
    
    Details:
    - Added performance analysis to "Comments" section of both Kaby Lake and
      Epyc sections.
    - Added emphasis to certain passages.

commit 09ba05c6f87efbaadf085497dc137845f16ee9c5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 3 16:53:19 2019 -0500

    Added sup performance graphs/document to 'docs'.
    
    Details:
    - Added a new markdown document, docs/PerformanceSmall.md, which
      publishes new performance graphs for Kaby Lake and Epyc showcasing
      the new BLIS sup (small/skinny/unpacked) framework logic and kernels.
      For now, only single-threaded dgemm performance is shown.
    - Reorganized graphs in docs/graphs into docs/graphs/large, with new
      graphs being placed in docs/graphs/sup.
    - Updates to scripts in test/sup/octave, mostly to allow decent output
      in both GNU octave and Matlab.
    - Updated README.md to mention and refer to the new PerformanceSmall.md
      document.

commit 6bf449cc6941734748034de0e9af22b75f1d6ba1
Merge: abd8a9fa a4e8801d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 31 17:42:40 2019 -0500

    Merge branch 'amd'

commit a4e8801d08d81fa42ebea6a05a990de8dcedc803
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 31 17:30:51 2019 -0500

    Increased MT sup threshold for double to 201.
    
    Details:
    - Fine-tuned the double-precision real MT threshold (which controls
      whether the sup implementation kicks in for smaller m dimension values)
      from 180 to 201 for haswell and 180 to 256 for zen.
    - Updated octave scripts in test/sup/octave to include a seventh column
      to display performance for m = n = k.

commit 3a45ecb15456249c30ccccd60e42152f355615c1
Merge: 3f867c96 b69fb0b7
Author: Kiran Devrajegowda <Kiran.Devrajegowda@amd.com>
Date:   Fri May 31 06:47:02 2019 -0400

    Merge "Added back BLIS_ENABLE_ZEN_BLOCK_SIZES macro to zen configuration, this is same as release 1.3. This was added before to improve DGEMM Multithreaded scalability on Naples for when number of threads is greater than 16. By mistake this got deleted in many changes done for 2.0 release, now we are adding this change back., in bli_gemm_front.c - code cleanup" into amd-staging-rome2.0

commit b69fb0b74a4756168de270fc9b18f7cf7aa57f17
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Fri May 31 15:14:22 2019 +0530

    Added back BLIS_ENABLE_ZEN_BLOCK_SIZES macro to zen configuration, this is same as release 1.3. This was added before to improve DGEMM Multithreaded scalability on Naples for when number of threads is greater than 16. By mistake this got deleted in many changes done for 2.0 release, now we are adding this change back., in bli_gemm_front.c - code cleanup
    
    Change-Id: I9f5d8225254676a99c6f2b09a0825e545206d0fc

commit 3f867c96caea3bbbbeeff1995d90f6cf8c9895fb
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date:   Fri May 31 12:22:44 2019 +0530

    When running HPL with pure MPI without DGEMM threading (single-threaded BLIS), setting this macro to 1 gives the best performance.
    
    Change-Id: I24fd0bf99216f315e49f1c74c44c3feaffd7078d

commit abd8a9fa7df4569aa2711964c19888b8e248901f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 28 12:49:44 2019 -0500

    Inadvertently hidden xerbla_() in blastest (#313).
    
    Details:
    - Attempted a fix to issue #313, which reports that when building only
      a shared library (ie: static library build is disabled), running the
      BLAS test drivers can fail because those drivers provide their own
      local version of xerbla_() as a clever (albeit still rather hackish)
      way of checking the error codes that result from the individual tests.
      This local xerbla_() function is never found at link-time because the
      BLAS test drivers' Makefile imports BLIS compilation flags via the
      get-user-cflags-for() function, which currently conveys the
      -fvisibility=hidden flag, which hides symbols unless they are
      explicitly annotated for export. The -fvisibility=hidden flag was
      only ever intended for use when building BLIS (not for applications),
      and so the attempted solution here is to omit the symbol export
      flag(s) from get-user-cflags-for() by storing the symbol export
      flag(s) to a new BUILD_SYMFLAGS variable instead of appending it
      to the subconfigurations' CMISCFLAGS variable (which is returned by
      every get-*-cflags-for() function). Thanks to M. Zhou for reporting
      this issue and also to Isuru Fernando for suggesting the fix.
    - Renamed BUILD_FLAGS to BUILD_CPPFLAGS to harmonize with the newly
      created BUILD_SYMFLAGS.
    - Fixed typo in entry for --export-shared flag in 'configure --help'
      text.

commit 13806ba3b01ca0dd341f4720fb930f97e46710b0
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date:   Mon May 27 16:24:43 2019 +0530

     This check in has changes w.r.t Copyright information, which is changed to (start year) - 2019
    
    Change-Id: Ide3c8f7172210b8d3538d3c36e88634ab1ba9041

commit ee123f535872510f77100d3d55a43d4ca56047d5
Author: Meghana <meghana.vankadari@amd.com>
Date:   Mon May 27 15:36:44 2019 +0530

    Defined small matrix thresholds for TRSM for various cases for NAPLES and ROME
    Updated copyright information for kernels/zen/bli_trsm_small.c file
    Removed separate kernels for zen2 architecture
    Instead added threshold conditions in zen kernels both for ROME and NAPLES
    
    Change-Id: Ifd715731741d649b6ad16b123a86dbd6665d97e5

commit 9d93a4caa21402d3a90aac45d7a1603736c9fd63
Author: prangana <pradeep.rao@amd.com>
Date:   Fri May 24 17:59:13 2019 +0530

    update version 2.0

commit 755730608d923538273a90c48bfdf77571f86519
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 23 17:34:36 2019 -0500

    Minor rewording of language around mt env. vars.

commit ba31abe73c97c16c78fffc59a215761b8d9fd1f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 23 14:59:53 2019 -0500

    Added BLIS threading info to Performance.md.
    
    Details:
    - Documented the BLIS environment variables that were set
      (e.g. BLIS_JC_NT, BLIS_IC_NT, BLIS_JR_NT) for each machine and
      threading configuration in order to achieve the parallelism reported
      on in docs/Performance.md.

commit cb788ffc89cac03b44803620412a5e83450ca949
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 23 13:00:53 2019 -0500

    Increased MT sup threshold for double to 180.
    
    Details:
    - Increased the double-precision real MT threshold (which controls
      whether the sup implementation kicks in for smaller m dimension values)
      from 80 to 180, and this change was made for both haswell and zen
      subconfigurations. This is less about the m dimension in particular
      and more about facilitating a smoother performance transition when
      m = n = k.

commit 057f5f3d211e7513f457ee6ca6c9555d00ad1e57
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 23 12:51:17 2019 -0500

    Minor build system housekeeping.
    
    Details:
    - Commented out redundant setting of LIBBLIS_LINK within all driver-
      level Makefiles. This variable is already set within common.mk, and
      so the only time it should be overridden is if the user wants to link
      to a different copy of libblis.
    - Very minor changes to build/gen-make-frags/gen-make-frag.sh.
    - Whitespace and inconsequential quoting change to configure.
    - Moved top-level 'windows' directory into a new 'attic' directory.

commit e05171118c377f356f89c4daf8a0d5ddc5a4e4f7
Author: Meghana <meghana.vankadari@amd.com>
Date:   Thu May 23 16:15:27 2019 +0530

    Implemented TRSM for small matrices for cases where A is on the right
    
    Added separate kernels for zen and zen2
    
    Change-Id: I6318ddc250cf82516c1aa4732718a35eae0c9134

commit 02920f5c480c42706b487e37b5ecc96c3555b851
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date:   Thu May 23 15:29:59 2019 +0530

    make checkblis fails for matrix dimension check at the beginning, hence reverting it
    
    Change-Id: Ibd2ee8c2d4914598b72003fbfc5845be9c9c1e87

commit 84215022f29fb3bfedd254d041635308d177e6c0
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date:   Thu May 23 11:08:41 2019 +0530

     Adding threshold condition to dgemm small matrix kernels, defining the constants in zen2 configuration
    
    Change-Id: I53a58b5d734925a6fcb8d8bea5a02ddb8971fcd5

commit a3554eb1dcc1b5b94d81c60761b2f01c3d827ffa
Merge: ea082f83 17b878b6
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date:   Thu May 23 11:51:07 2019 +0530

    Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis to configure zen2
    
    Change-Id: I97e17bca9716b80b862925f97bb513c07b4b0cae

commit ea082f839071dd9ec555062dc3851c31d12f00e4
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date:   Thu May 23 10:38:29 2019 +0530

    adding empty zen2 directory with .gitignore file
    
    Change-Id: Ifa37cf54b2578aa19ad335372b44bca17043fe4b

commit b80bd5bcb2be8551a9a21fafc8e6c8b6336c99b5
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Tue May 21 15:11:47 2019 +0530

    config/zen/bli_cntx_init_zen.c: removed BLIS_ENABLE_ZEN_BLOCK_SIZES macro. We have different configurations for both zen and zen2
    config/zen/bli_family_zen.h: deleted macro BLIS_ENABLE_ZEN_BLOCK_SIZES
    config/zen/make_defs.mk: removed compiler flag -mno-avx256-split-unaligned-store
    frame/base/bli_cpuid.c: ROME family is 17H but model # is from 0x30H.
    test/test_gemm.c - commented out #define FILE_IN_OUT (some compilation error when BLIS is configured as amd64)
    Now we can use a single configuration, ./configure amd64, which works for both ROME and Naples
    
    Change-Id: I91b4fc35380f8a35b4f4c345da040c6b5910b4a2

commit a042db011df9a1c3e7c7ac546541f4746b176ea5
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Mon May 20 14:17:32 2019 +0530

    Modified make_defs.mk for zen2 to get compiled by gcc version less than gcc9.0
    
    Change-Id: I8fcac30538ee39534c296932639053b47b9a2d43

commit a23f92594cf3d530e5794307fe97afc877d853b7
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Mon May 20 10:48:06 2019 +0530

    config_registry: New AMD zen2 architecture configuration added.
      frame/base/bli_arch.c: #ifdef BLIS_FAMILY_ZEN2 id = BLIS_ARCH_ZEN2; #endif added. zen2 is added in config_name[BLIS_NUM_ARCHS]
      frame/base/bli_cpuid.c : #ifdef BLIS_CONFIG_ZEN2 if ( bli_cpuid_is_zen2( family, model, features ) ) return BLIS_ARCH_ZEN2; #endif, defined new function bool bli_cpuid_is_zen2(...).
      frame/base/bli_cpuid.h : declared bli_cpuid_is_zen2(..).
      frame/base/bli_gks.c : #ifdef BLIS_CONFIG_ZEN2 bli_gks_register_cntx(BLIS_ARCH_ZEN2, bli_cntx_init_zen2, bli_cntx_init_zen2_ref, bli_cntx_init_zen2_ind); #endif
      frame/include/bli_arch_config.h : #ifdef BLIS_CONFIG_ZEN2 CNTX_INIT_PROTS(zen2) #endif #ifdef BLIS_FAMILY_ZEN2 #include "bli_family_zen2.h" #endif
      frame/include/bli_type_defs.h : added BLIS_ARCH_ZEN2 in arch_t enum. BLIS_NUM_ARCHS 20
    
    Change-Id: I2a2d9b7266673e78a4f8543b1bfb5425b0aa7866

commit 17b878b66d917d50b6fe23721d8579e826cb3e8c
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date:   Wed May 22 14:02:53 2019 +0530

    adding license same as in ut-austin-amd-branch
    
    Change-Id: I6790768d2bf5d42369d304ef93e34701f95fbaff

commit df755848b8a271323e007c7a628c64af63deab00
Merge: ca4b33c0 c72ae27a
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date:   Wed May 22 13:30:07 2019 +0530

    Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis into rome2.0
    
    Change-Id: Ie8aad1ab810f0f3c0b90ec67f9dd3dfb8dcc74cc

commit c72ae27adee4726679ee004d02c972582b5285b4
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date:   Mon Mar 19 12:49:26 2018 +0530

    Re-enabling the small matrix gemm optimization for target zen
    
    Change-Id: I13872784586984634d728cd99a00f71c3f904395

commit ab0818af80f7f683080873f3fa24734b65267df2
Author: sraut <Biplab.Raut@amd.com>
Date:   Wed Oct 3 15:30:33 2018 +0530

    Review comments incorporated for small TRSM.
    
    Change-Id: Ia64b7b2c0375cc501c2cb0be8a1af93111808cd9

commit 32392cfc72af7f42da817a129748349fb1951346
Author: Jeff Hammond <jeff.r.hammond@intel.com>
Date:   Tue May 14 15:52:30 2019 -0400

    add info about CXX in configure (#311)

commit fa7e6b182b8365465ade178b0e4cd344ff6f6460
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 1 19:13:00 2019 -0500

    Define _POSIX_C_SOURCE in bli_system.h.
    
    Details:
    - Added
        #ifndef _POSIX_C_SOURCE
        #define _POSIX_C_SOURCE 200809L
        #endif
      to bli_system.h so that an application that uses BLIS (specifically,
      an application that #includes blis.h) does not need to remember to
      #define the macro itself (either on the command line or in the code
      that includes blis.h) in order to activate things like pthreads.
      Thanks to Christos Psarras for reporting this issue and suggesting
      this fix.
    - Commented out #include <sys/time.h> in bli_system.h, since I don't
      think this header is used/needed anymore.
    - Comment update to function macro for bli_?normiv_unb_var1() in
      frame/util/bli_util_unb_var1.c.
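    
    A minimal sketch of the guard described above, placed ahead of the first
    system header. The counter, the lock, and increment_counter() are
    hypothetical names used only for illustration and are not part of BLIS;
    the sketch merely shows pthreads declarations being usable without the
    application defining the macro itself.

```c
/* Define the feature-test macro only if the application has not already
   done so. It must appear before the first system header is included. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <pthread.h>

/* Hypothetical shared state protected by a pthreads mutex. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int             counter      = 0;

/* Hypothetical helper: increments the counter under the lock and returns
   the new value. */
int increment_counter( void )
{
    int value;

    pthread_mutex_lock( &counter_lock );
    value = ++counter;
    pthread_mutex_unlock( &counter_lock );

    return value;
}
```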

commit 3df84f1b5d5e1146bb01bfc466ac20c60a9cc859
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 27 21:27:32 2019 -0500

    Minor bugfixes in sup dgemm implementation.
    
    Details:
    - Fixed an obscure bug in the bli_dgemmsup_rv_haswell_asm_5x8n() kernel
      that only affected the beta == 0, column-storage output case. Thanks
      to the BLAS test drivers for catching this bug.
    - Previously, bli_gemmsup_ref_var1n() and _var2m() were returning if
      k = 0, when the correct action would be to scale by beta (and then
      return). Thanks to the BLAS test drivers for catching this bug.
    - Changed the sup threshold behavior such that the sup implementation
      only kicks in if a matrix dimension is strictly less than (rather than
      less than or equal to) the threshold in question.
    - Initialize all thresholds to zero (instead of 10) by default in
      ref_kernels/bli_cntx_ref.c. This, combined with the above change to
      threshold testing means that calls to BLIS or BLAS with one or more
      matrix dimensions of zero will no longer trigger the sup
      implementation.
    - Added disabled debugging output to frame/3/bli_l3_sup.c (for future
      use, perhaps).
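    
    The threshold rule above can be sketched as follows. The type and
    function names (sup_thresh_t, sup_is_eligible) are hypothetical and do
    not correspond to BLIS symbols; the sketch only illustrates the
    strict less-than comparison and why zero thresholds keep zero-sized
    problems out of the sup path.

```c
#include <stdbool.h>

/* Hypothetical container for the three dimension thresholds. */
typedef struct
{
    long m;
    long n;
    long k;
} sup_thresh_t;

/* Returns true if at least one dimension is strictly less than its
   threshold. With all thresholds set to zero, no non-negative dimension
   can satisfy the test, so the function always returns false. */
bool sup_is_eligible( long m, long n, long k, sup_thresh_t t )
{
    return ( m < t.m ) || ( n < t.n ) || ( k < t.k );
}
```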

commit ecbdd1c42dcebfecd729fe351e6bb0076aba7d81
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 27 19:38:11 2019 -0500

    Ceased use of BLIS_ENABLE_SUP_MR/NR_EXT macros.
    
    Details:
    - Removed already limited use of the BLIS_ENABLE_SUP_MR_EXT and
      BLIS_ENABLE_SUP_NR_EXT macros in bli_gemmsup_ref_var1n() and
      bli_gemmsup_ref_var2m(). Their purpose was merely to avoid a long
      conditional that would determine whether to allow the last iteration
      to be merged with the second-to-last iteration. Functionally, the
      macros were not needed, and they ended up causing problems when
      building configuration families such as intel64 and x86_64.

commit aa8a6bec3036a41e1bff2034f8ef6766a704ec49
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 27 18:53:33 2019 -0500

    Fixed typo in --disable-sup-handling macro guard.
    
    Details:
    - Fixed an incorrectly-named macro guard that is intended to allow
      disabling of the sup framework via the configure option
      --disable-sup-handling. In this case, the preprocessor macro,
      BLIS_DISABLE_SUP_HANDLING, was still named by its name from an older
      uncommitted version of the code (BLIS_DISABLE_SM_HANDLING).

commit b9c9f03502c78a63cfcc21654b06e9089e2a3822
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 27 18:44:50 2019 -0500

    Implemented gemm on skinny/unpacked matrices.
    
    Details:
    - Implemented a new sub-framework within BLIS to support the management
      of code and kernels that specifically target matrix problems for which
      at least one dimension is deemed to be small, which can result in long
      and skinny matrix operands that are ill-suited for the conventional
      level-3 implementations in BLIS. The new framework tackles the problem
      in two ways. First the stripped-down algorithmic loops forgo the
      packing that is famously performed in the classic code path. That is,
      the computation is performed by a new family of kernels tailored
      specifically for operating on the source matrices as-is (unpacked).
      Second, these new kernels will typically (and in the case of haswell
      and zen, do in fact) include separate assembly sub-kernels for
      handling of edge cases, which helps smooth performance when performing
      problems whose m and n dimensions are not naturally multiples of the
      register blocksizes. In a reference to the sub-framework's purpose of
      supporting skinny/unpacked level-3 operations, the "sup" operation
      suffix (e.g. gemmsup) is typically used to denote a separate namespace
      for related code and kernels. NOTE: Since the sup framework does not
      perform any packing, it targets row- and column-stored matrices A, B,
      and C. For now, if any matrix has non-unit strides in both dimensions,
      the problem is computed by the conventional implementation.
    - Implemented the default sup handler as a front-end to two variants.
      bli_gemmsup_ref_var2() provides a block-panel variant (in which the
      2nd loop around the microkernel iterates over n and the 1st loop
      iterates over m), while bli_gemmsup_ref_var1() provides a panel-block
      variant (2nd loop over m and 1st loop over n). However, these variants
      are not used by default and are provided for reference only. Instead, the
      default sup handler calls _var2m() and _var1n(), which are similar
      to _var2() and _var1(), respectively, except that they defer to the
      sup kernel itself to iterate over the m and n dimensions, respectively.
      In other words, these variants rely not on microkernels, but on
      so-called "millikernels" that iterate along m and k, or n and k.
      The benefit of using millikernels is a reduction of function call
      and related (local integer typecast) overhead as well as the ability
      for the kernel to know which micropanel (A or B) will change during
      the next iteration of the 1st loop, which allows it to focus its
      prefetching on that micropanel. (In _var2m()'s millikernel, the upanel
      of A changes while the same upanel of B is reused. In _var1n()'s, the
      upanel of B changes while the upanel of A is reused.)
    - Added a new configure option, --[en|dis]able-sup-handling, which is
      enabled by default. However, the default thresholds at which the
      default sup handler is activated are set to zero for each of the m, n,
      and k dimensions, which effectively disables the implementation. (The
      default sup handler only accepts the problem if at least one dimension
      is smaller than or equal to its corresponding threshold. If all
      dimensions are larger than their thresholds, the problem is rejected
      by the sup front-end and control is passed back to the conventional
      implementation, which proceeds normally.)
    - Added support to the cntx_t structure to track new fields related to
      the sup framework, most notably:
      - sup thresholds: the thresholds at which the sup handler is called.
      - sup handlers: the address of the function to call to implement
        the level-3 skinny/unpacked matrix implementation.
      - sup blocksizes: the register and cache blocksizes used by the sup
        implementation (which may be the same or different from those used
        by the conventional packm-based approach).
      - sup kernels: the kernels that the handler will use in implementing
        the sup functionality.
      - sup kernel prefs: the IO preference of the sup kernels, which may
        differ from the preferences of the conventional gemm microkernels'
        IO preferences.
    - Added a bool_t to the rntm_t structure that indicates whether sup
      handling should be enabled/disabled. This allows per-call control
      of whether the sup implementation is used, which is useful for test
      drivers that wish to switch between the conventional and sup codes
      without having to link to different copies of BLIS. The corresponding
      accessor functions for this new bool_t are defined in bli_rntm.h.
    - Implemented several row-preferential gemmsup kernels in a new
      directory, kernels/haswell/3/sup. These kernels include two general
      implementation types--'rd' and 'rv'--for the 6x8 base shape, with
      two specialized millikernels that embed the 1st loop within the kernel
      itself.
    - Added ref_kernels/3/bli_gemmsup_ref.c, which provides reference
      gemmsup microkernels. NOTE: These microkernels, unlike the current
      crop of conventional (pack-based) microkernels, do not use constant
      loop bounds. Additionally, their inner loop iterates over the k
      dimension.
    - Defined new typedef enums:
      - stor3_t: captures the effective storage combination of the level-3
        problem. Valid values are BLIS_RRR, BLIS_RRC, BLIS_RCR, etc. A
        special value of BLIS_XXX is used to denote an arbitrary combination
        which, in practice, means that at least one of the operands is
        stored according to general stride.
      - threshid_t: captures each of the three dimension thresholds.
    - Changed bli_adjust_strides() in bli_obj.c so that bli_obj_create()
      can be passed "-1, -1" as a lazy request for row storage. (Note that
      "0, 0" is still accepted as a lazy request for column storage.)
    - Added support for various instructions to bli_x86_asm_macros.h,
      including imul, vhaddps/pd, and other instructions related to integer
      vectors.
    - Disabled the older small matrix handling code inserted by AMD in
      bli_gemm_front.c, since the sup framework introduced in this commit
      is intended to provide a more generalized solution.
    - Added test/sup directory, which contains standalone performance test
      drivers, a Makefile, a runme.sh script, and an 'octave' directory
      containing scripts compatible with GNU Octave. (They also may work
      with matlab, but if not, they are probably close to working.)
    - Reinterpret the storage combination string (sc_str) in the various
      level-3 testsuite modules (e.g. src/test_gemm.c) so that the order
      of each matrix storage char is "cab" rather than "abc".
    - Comment updates in level-3 BLAS API wrappers in frame/compat.

commit 0d549ceda822833bec192bbf80633599620c15d9
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Sat Apr 27 22:56:02 2019 +0000

    make unix friendly archives on appveyor (#310)

commit ca4b33c001f9e959c43b95a9a23f9df5adec7adf
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Wed Apr 24 15:02:39 2019 +0530

    Added the compiler option -mno-avx256-split-unaligned-store to config/zen/make_defs.mk to improve the performance of intrinsics code. This flag ensures that the compiler generates 256-bit stores for the equivalent intrinsics code.
    
    Change-Id: I8f8cd81a3604869df18d38bc42097a04f178d324

commit 945928c650051c04d6900c7f4e9e29cd0e5b299f
Merge: 663f6629 74e513eb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 17 15:58:56 2019 -0500

    Merge branch 'amd' of github.com:flame/blis into amd

commit 74e513eb6a6787a925d43cd1500277d54d86ab8f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 17 13:34:44 2019 -0500

    Support row storage in Eigen gemm test/3 driver.
    
    Details:
    - Added preprocessor branches to test/3/test_gemm.c to explicitly
      support row-stored matrices. Column-stored matrices are also still
      supported (and are the default for now). (This is mainly residual work
      leftover from initial integration of Eigen into the test drivers, so
      if we ever want to test Eigen with row-stored matrices, the code will
      be ready to use, even if it is not yet integrated into the Makefile
      in test/3.)

commit b5d457fae9bd75c4ca67f7bc7214e527aa248127
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 16 12:50:01 2019 -0500

    Applied forgotten variable rename from 89a70cc.
    
    Details:
    - Somehow the variable name change (root_file_name -> root_inputname)
      in flatten-headers.py mentioned in the commit log entry for 89a70cc
      didn't make it into the actual commit. This commit applies that
      change.

commit 89a70cccf869333147eb2559cdfa5a23dc915824
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 11 18:33:08 2019 -0500

    GNU-like handling of installation prefix et al.
    
    Details:
    - Changed the default installation prefix from $HOME/lib to /usr/local.
    - Modified the way configure internally handles the prefix, libdir,
      includedir, and sharedir (and also added an --exec-prefix option).
      The defaults for these variables are set as follows:
        prefix:      /usr/local
        exec_prefix: ${prefix}
        libdir:      ${exec_prefix}/lib
        includedir:  ${prefix}/include
        sharedir:    ${prefix}/share
      The key change, aside from the addition of exec_prefix and its use to
      define the default to libdir, is that the variables are substituted
      into config.mk with quoting that delays evaluation, meaning the
      substituted values may contain unevaluated references to other
      variables (namely, ${prefix} and ${exec_prefix}). This more closely
      follows GNU conventions, including those used by GNU autoconf, and
      also allows make to override any one of the variables *after*
      configure has already been run (e.g. during 'make install').
    - Updates to build/config.mk.in pursuant to above changes.
    - Updates to output of 'configure --help' pursuant to above changes.
    - Updated docs/BuildSystem.md to reflect the new default installation
      prefix, as well as mention EXECPREFIX and SHAREDIR.
    - Changed the definitions of the UNINSTALL_OLD_* variables in the
      top-level Makefile to use $(wildcard ...) instead of 'find'. This
      was motivated by the new way of handling prefix and friends, which
      leads to the 'find' command being run on /usr/local (by default),
      which can take a while and almost never yields any benefit (since the
      user will very rarely use the uninstall-old targets).
    - Removed periods from the end of descriptive output statements (i.e.,
      non-verbose output) since those statements often end with file or
      directory paths, which get confusing to read when punctuated by a
      period.
    - Trivial change to 'make showconfig' output.
    - Removed my name from 'configure --help'. (Many have contributed to it
      over the years.)
    - In configure script, changed the default state of threading_model
      variable from 'no' to 'off' to match that of debug_type, where there
      are similarly more than two valid states. ('no' is still accepted
      if given via the --enable-debug= option, though it will be
      standardized to 'off' prior to config.mk being written out.)
    - Minor variable name change in flatten-headers.py that was intended for
      32812ff.
    - CREDITS file update.

commit 9d76688ad90014a11ddc0c2f27253d62806216b1
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date:   Thu Apr 11 10:22:48 2019 +0530

    Fix for single rank crash with HPL application. When computing the offset into the C buffer, integer variables are used for the row and column indices, so the intermediate result overflows and a negative value gets added to the buffer pointer. When the negative value is large enough, it indexes the buffer out of range, resulting in a segmentation fault. Although the crash originated in the dgemm kernel, the same fix was also applied to the sgemm kernel.
    
    Change-Id: I171119b0ec0dfbd8e63f1fcd6609a94384aabd27
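    
    A minimal sketch of the kind of fix described above, assuming a
    column-major C buffer with leading dimension ldc. The helper name
    c_offset() and its signature are hypothetical and do not come from the
    kernel source; the point is that each operand is widened to 64 bits
    before the multiplication, so the intermediate product cannot wrap.

```c
#include <stdint.h>

/* Hypothetical helper: element offset of C(row, col) in a column-major
   buffer with leading dimension ldc. Widening to int64_t before the
   multiplication avoids the 32-bit overflow that a plain int expression
   such as row + col * ldc would suffer for large problem sizes. */
int64_t c_offset( int32_t row, int32_t col, int32_t ldc )
{
    return ( int64_t )row + ( int64_t )col * ( int64_t )ldc;
}
```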

commit 32812ff5aba05d34c421fe1024a61f3e2d5e7052
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 9 12:20:19 2019 -0500

    Minor bugfix to flatten-headers.py.
    
    Details:
    - Fixed a minor bug in flatten-headers.py whereby the script, upon
      encountering a #include directive for the root header file, would
      erroneously recurse and inline the contents of that root header.
      The script has been modified to avoid recursion into any headers
      that share the same name as the root-level header that was passed
      into the script. (Note: this bug didn't actually manifest in BLIS,
      so it's merely a precaution for usage of flatten-headers.py in other
      contexts.)

commit bec90e0b6aeb3c9b19589c2b700fda2d66f6ccdf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 2 17:45:13 2019 -0500

    Minor update to docs/HardwareSupport.md document.
    
    Details:
    - Added more details and clarifying language to implications of 1m and
      the recycling of microkernels between microarchitectures.

commit 89cd650e7be01b59aefaa85885a3ea78970351e4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 2 17:23:55 2019 -0500

    Use void_fp for function pointers instead of void*.
    
    Change void*-typed function pointers to void_fp.
    - Updated all instances of void* variables that store function pointers
      to variables of a new type, void_fp. Originally, I wanted to define
      the type of void_fp as "void (*void_fp)( void )"--that is, a pointer
      to a function with no return value and no arguments. However, once
      I did this, I realized that gcc complains with incompatible pointer
      type (-Wincompatible-pointer-types) warnings every time any such
      pointer is assigned to its final, type-accurate function
      pointer type. That is, gcc will silently typecast a void* to
      another defined function pointer type (e.g. dscalv_ker_ft) during
      an assignment from the former to the latter, but the same statement
      will trigger a warning when typecasting from a void_fp type. I suspect
      an explicit typecast is needed in order to avoid the warning, which
      I'm not willing to insert at this time.
    - Added a typedef to bli_type_defs.h defining void_fp as void*, along
      with a commented-out version of the aborted definition described
      above. (Note that POSIX requires that void* and function pointers
      be interchangeable; it is the C standard that does not provide this
      guarantee.)
    - Comment updates to various _oapi.c files.
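    
    For reference, the aborted definition and the explicit typecasts it
    would require can be sketched as follows. This is an illustration only
    and is not what the commit adopts (it defines void_fp as void*). The
    names scale_ft, scale(), query_kernel(), and call_scale() are
    hypothetical stand-ins for a type-accurate kernel type such as
    dscalv_ker_ft and the code that stores and retrieves it.

```c
/* The generic function pointer type considered in the commit message. */
typedef void (*void_fp)( void );

/* Hypothetical type-accurate kernel signature. */
typedef double (*scale_ft)( double alpha, double x );

/* Hypothetical kernel. */
static double scale( double alpha, double x )
{
    return alpha * x;
}

/* Store the kernel generically. The explicit cast is what avoids the
   incompatible pointer type warning. */
void_fp query_kernel( void )
{
    return ( void_fp )scale;
}

/* Convert back to the type-accurate pointer before calling it. Calling
   through the generic type directly would be undefined behavior. */
double call_scale( double alpha, double x )
{
    scale_ft f = ( scale_ft )query_kernel();

    return f( alpha, x );
}
```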

commit ffce3d632b284eb52474036096815ec38ca8dd5f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 2 14:40:50 2019 -0500

    Renamed armv8a gemm kernel filename.
    
    Details:
    - Renamed
        kernels/armv8a/3/bli_gemm_armv8a_opt_4x4.c
      to
        kernels/armv8a/3/bli_gemm_armv8a_asm_d6x8.c.
      This follows the naming convention used by other kernel sets, most
      notably haswell.

commit 77867478af02144544b4e7b6df5d54d874f3f93b
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Tue Apr 2 13:33:11 2019 -0500

    Use pthreads on MinGW and Cygwin (#307)

commit 7bc75882f02ce3470a357950878492e87e688cec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 28 17:40:50 2019 -0500

    Updated Eigen results in docs/graphs with 3.3.90.
    
    Details:
    - Updated the level-3 performance graphs in docs/graphs with new Eigen
      results, this time using a development version cloned from their git
      mirror on March 27, 2019 (version 3.3.90). Performance is improved
      over 3.3.7, though still noticeably short of BLIS/MKL in most cases.
    - Very minor updates to docs/Performance.md and matlab scripts in
      test/3/matlab.

commit 20ea7a1217d3833db89a96158c42da2d6e968ed8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 27 18:09:17 2019 -0500

    Minor text updates (Eigen) to docs/Performance.md.
    
    Details:
    - Added/updated a few more details, mostly regarding Eigen.

commit bfb7e1bc6af468e4ff22f7e27151ea400dcd318a
Merge: 044df950 2c85e1dd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 27 17:58:19 2019 -0500

    Merge branch 'dev'

commit 2c85e1dd9d5d84da7228ea4ae6deec56a89b3a8f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 27 16:29:51 2019 -0500

    Added Eigen results to performance graphs.
    
    Details:
    - Updated the Haswell, SkylakeX, and Epyc performance graphs in
      docs/graphs to report on Eigen implementations, where applicable.
      Specifically, Eigen implements all level-3 operations sequentially;
      however, of those operations, it only provides multithreaded gemm.
      Thus, mt results for symm/hemm, syrk/herk, trmm, and trsm are
      omitted. Thanks to Sameer Agarwal for his help configuring and
      using Eigen.
    - Updated docs/Performance.md to note the new implementation tested.
    - CREDITS file update.

commit bfac7e385f8061f2e6591de208b0acf852f04580
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 27 16:04:48 2019 -0500

    Added ability to plot with Eigen in test/3/matlab.
    
    Details:
    - Updated matlab scripts in test/3/matlab to optionally plot/display
      Eigen performance curves. Whether Eigen is plotted is determined by
      a new boolean function parameter, with_eigen.
    - Updated runme.m scratchpad to reflect the latest invocations of the
      plot_panel_4x5() function (with Eigen plotting enabled).

commit 67535317b9411c90de7fa4cb5b0fdb8f61fdcd79
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 27 13:32:18 2019 -0500

    Fixed mislabeled eigen output from test/3 drivers.
    
    Details:
    - Fixed the Makefile in test/3 so that it no longer incorrectly labels
      the matlab output variables from Eigen-linked hemm, herk, trmm, and
      trsm driver output as "vendor". (The gemm drivers were already
      correctly outputting matlab variables containing the "eigen" label.)

commit 044df9506f823643c0cdd53e81ad3c27a9f9d4ff
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Mar 27 12:39:31 2019 -0500

    Test with shared on windows (#306)
    
    Export macros can't support both shared and static at the same time.
    When blis is built with both shared and static, headers assume that
    shared is used at link time and dllimports the symbols with __imp_
    prefix.
    
    To use the headers with static libraries a user can give
    -DBLIS_EXPORT= to import the symbol without the __imp_ prefix

commit 5e6b160c8a85e5e23bab0f64958a8acf4918a4ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 26 19:10:59 2019 -0500

    Link to Eigen BLAS for non-gemm drivers in test/3.
    
    Details:
    - Adjusted test/3/Makefile so that the test drivers are linked against
      Eigen's BLAS library for hemm, herk, trmm, and trsm. We have to do
      this since Eigen's headers don't define implementations to the
      standard BLAS APIs.
    - Simplified #included headers in hemm, herk, trmm, and trsm source
      driver files, since nothing specific to Eigen is needed at
      compile-time for those operations.

commit e593221383aae19dfdc3f30539de80ed05cfec7f
Merge: 92fb9c87 c208b9dc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 26 15:51:45 2019 -0500

    Merge branch 'master' into dev

commit 92fb9c87bf88b9f9c401eeecd9aa9c3521bc2adb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 26 15:43:23 2019 -0500

    Add more support for Eigen to drivers in test/3.
    
    Details:
    - Use compile-time implementations of Eigen in test_gemm.c via new
      EIGEN cpp macro, defined on command line. (Linking to Eigen's BLAS
      library is not necessary.) However, as of Eigen 3.3.7, Eigen only
      parallelizes the gemm operation and not hemm, herk, trmm, trsm, or
      any other level-3 operation.
    - Fixed a bug in trmm and trsm drivers whereby the wrong function
      (bli_does_trans()) was being called to determine whether the object
      for matrix A should be created for a left- or right-side case. This
      was corrected by changing the function to bli_is_left(), as is done
      in the hemm driver.
    - Added support for running Eigen test drivers from runme.sh.

commit c208b9dc46852c877197d53b6dd913a046b6ebb6
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Mon Mar 25 13:03:44 2019 -0500

    Fix clang version detection (#305)
    
    clang -dumpversion gives 4.2.1 for all clang versions as clang was
    originally compatible with gcc 4.2.1
    
    Apple clang version and clang version are two different things
    and the real clang version cannot be deduced from apple clang version
    programmatically. Rely on Wikipedia to map apple clang to clang version
    
    Also fixes assembly detection with clang
    
    clang 3.8 can't build knl as it doesn't recognize zmm0

commit 53842c7e7d530cb2d5609d6d124ae350fc345c32
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Fri Mar 22 13:57:14 2019 +0530

    Removed printing alpha and beta values
    
    Change-Id: I49102db510311a30f6a936f9d843f35838f50d23

commit 6805db45e343d83d1adaf9157cf0b841653e9ede
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Fri Mar 22 12:55:35 2019 +0530

    Corrected the setting of the alpha and beta values (alpha = -1, beta = 1): bli_setc(-1.0, 0, &alpha) should be used rather than bli_setc(0.0, -1.0, &alpha). This is now corrected.
    
    Change-Id: Ic1102dfd6b50ccf212386a1211c6f31e8d987ef9

commit feefcab4427a75b0b55af215486b85abcda314f7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 21 18:11:20 2019 -0500

    Allow disabling of BLAS prototypes at compile-time.
    
    Details:
    - Modified bli_blas.h so that:
      - By default, if the BLAS layer is enabled at configure-time, BLAS
        prototypes are also enabled within blis.h;
      - But if the user #defines BLIS_DISABLE_BLAS_DEFS prior to including
        blis.h, BLAS prototypes are skipped over entirely so that, for
        example, the application or some other header pulled in by the
        application may prototype the BLAS functions without causing any
        duplication.
    - Updated docs/BuildSystem.md to document the feature above, and
      related text.

commit 20153cd4b594bc34f860c381ec18de3a6cc743c7
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Thu Mar 21 16:23:53 2019 +0530

    Modified test_gemm.c file in test folder
    A macro 'FILE_IN_OUT' is defined to read input parameters from a csv file.
    Format for input file:
    Each line defines a gemm problem with following parameters: m k n cs_a cs_b cs_c
    The operation always implemented is C = C - A*B and column-major format.
    When macro is disabled - it reverts back to original implementation.
    Usage: ./test_gemm_<mkl/blis/openblas>.x input.csv output.csv
    GEMM is called through BLAS interface
    For BLIS - the test application also prints either 'S' indicating small gemm routine or 'N' - conventional BLIS gemm
    for MKL/OpenBLAS - ignore this character
    
    Change-Id: I0924ef2c1f7bdea48d4cdb230b888e2af2c86a36
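    
    A minimal sketch of reading one problem line in the format given above.
    It assumes the six fields are separated by whitespace, as the format
    line suggests; the struct and function names (gemm_problem_t,
    parse_gemm_line) are hypothetical and are not taken from test_gemm.c.

```c
#include <stdio.h>

/* Hypothetical record for one line of the input file, with the fields in
   the order listed above: m k n cs_a cs_b cs_c. */
typedef struct
{
    long m, k, n;
    long cs_a, cs_b, cs_c;
} gemm_problem_t;

/* Parses one input line. Returns 1 if all six integers were read and 0
   otherwise, in which case the contents of *p should not be used. */
int parse_gemm_line( const char* line, gemm_problem_t* p )
{
    int n_read = sscanf( line, "%ld %ld %ld %ld %ld %ld",
                         &p->m, &p->k, &p->n,
                         &p->cs_a, &p->cs_b, &p->cs_c );

    return n_read == 6;
}
```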

commit 288843b06d91e1b4fade337959aef773090bd1c9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 20 17:52:23 2019 -0500

    Added Eigen support to test/3 Makefile, runme.sh.
    
    Details:
    - Added targets to test/3/Makefile that link against a BLAS library
      built by Eigen. It appears, however, that Eigen's BLAS library does
      not support multithreading. (It may be that multithreading is only
      available when using the native C++ APIs.)
    - Updated runme.sh with a few Eigen-related tweaks.
    - Minor tweaks to docs/Performance.md.

commit 153e0be21d9ff413e370511b68d553dd02abada9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 19 17:53:18 2019 -0500

    More minor tweaks to docs/Performance.md.
    
    Details:
    - Defined GFLOPS as billions of floating-point operations per second,
      and reworded the sentence after about normalization.

commit 05c4e42642cc0c8dbfa94a6c21e975ac30c0517a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 19 17:07:20 2019 -0500

    CHANGELOG update (0.5.2)

commit 9204cd0cb0cc27790b8b5a2deb0233acd9edeb9b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 19 17:07:18 2019 -0500

    Version file update (0.5.2)

commit 64560cd9248ebf4c02c4a1eeef958e1ca434e510
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 19 17:04:20 2019 -0500

    ReleaseNotes.md update in advance of next version.
    
    Details:
    - Updated ReleaseNotes.md in preparation for next version.

commit ab5ad557ea69479d487c9a3cb516f43fa1089863
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 19 16:50:41 2019 -0500

    Very minor tweaks to Performance.md.

commit 03c4a25e1aa8a6c21abbb789baa599ac419c3641
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 19 16:47:15 2019 -0500

    Minor fixes to docs/Performance.md.
    
    Details:
    - Fixed some incorrect labels associated with the pdf/png graphs,
      apparently the result of copy-pasting.

commit fe6dd8b132f39ecb8893d54cd8e75d4bbf6dab83
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 19 16:30:23 2019 -0500

    Fixed broken section links in docs/Performance.md.
    
    Details:
    - Fixed a few broken section links in the Contents section.

commit 913cf97653f5f9a40aa89a5b79e2b0a8882dd509
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 19 16:15:24 2019 -0500

    Added docs/Performance.md and docs/graphs subdir.
    
    Details:
    - Added a new markdown document, docs/Performance.md, which reports
      performance of a representative set of level-3 operations across a
      variety of hardware architectures, comparing BLIS to OpenBLAS and a
      vendor library (MKL on Intel/AMD, ARMPL on ARM). Performance graphs,
      in pdf and png formats, reside in docs/graphs.
    - Updated README.md to link to new Performance.md document.
    - Minor updates to CREDITS, docs/Multithreading.md.
    - Minor updates to matlab scripts in test/3/matlab.

commit 9945ef24fd758396b698b19bb4e23e53b9d95725
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 19 15:28:44 2019 -0500

    Adjusted cache blocksizes for zen subconfig.
    
    Details:
    - Adjusted the zen sub-configuration's cache blocksizes for float,
      scomplex, and dcomplex based on the existing values for double.
      (The previous values were taken directly from the haswell subconfig,
      which targets Intel Haswell/Broadwell/Skylake systems.)

commit d202d008d51251609d08d3c278bb6f4ca9caf8e4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 18 18:18:25 2019 -0500

    Renamed --enable-export-all to --export-shared=[].
    
    Details:
    - Replaced the existing --enable-export-all / --disable-export-all
      configure option with --export-shared=[public|all], with the 'public'
      instance of the latter corresponding to --disable-export-all and the
      'all' instance corresponding to --enable-export-all. Nothing else
      semantically about the option, or its default, has changed.

commit ff78089870f714663026a7136e696603b5259560
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 18 13:22:55 2019 -0500

    Updates to docs/Multithreading.md.
    
    Details:
    - Made extra explicit the fact that: (a) multithreading in BLIS is
      disabled by default; and (b) even with multithreading enabled, the
      user must specify multithreading at runtime in order to observe
      parallelism. Thanks to M. Zhou for suggesting these clarifications
      in #292.
    - Also made explicit that only the environment variable and global
      runtime API methods are available when using the BLAS API. If the
      user wishes to use the local runtime API (specify multithreading on
      a per-call basis), one of the native BLIS APIs must be used.

commit 3a929a3d0ba0353159a6d4cd188f01b7a390ccfc
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Mon Mar 18 10:51:41 2019 +0530

    Fixed code merging: bli_gemm_small.c - missed conditional checks for L!=0 && K!=0. Now they are added. This fix is done to pass blastest
    
    Change-Id: Idc9c9a04d2015a68a19553c437ecaf8f1584026c

commit 663f662932c3f182fefc3c77daa1bf8c3394bb8b
Merge: 938c05ef 6bfe3812
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Mar 16 16:17:12 2019 -0500

    Merge branch 'amd' of github.com:flame/blis into amd

commit 938c05ef8654e2fc013d39a57f51d91d40cc40fb
Merge: 4ed39c09 5a5f494e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Mar 16 16:01:43 2019 -0500

    Merge branch 'amd' of github.com:flame/blis into amd

commit 6bfe3812e29b86c95b828822e4e5473b48891167
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 15 13:57:49 2019 -0500

    Use -fvisibility=[...] with clang on Linux/BSD/OSX.
    
    Details:
    - Modified common.mk to use the -fvisibility=[hidden|default] option
      when compiling with clang on non-Windows platforms (Linux, BSD, OS X,
      etc.). Thanks to Isuru Fernando for pointing out this option works
      with clang on these OSes.

commit 809395649c5bbf48778ede4c03c1df705dd49566
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 13 18:21:35 2019 -0500

    Annotated additional symbols for export.
    
    Details:
    - Added export annotations to additional function prototypes in order to
      accommodate the testsuite.
    - Disabled calling bli_amaxv_check() from within the testsuite's
      test_amaxv.c.

commit e095926c643fd9c9c2220ebecd749caae0f71d42
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 13 17:35:18 2019 -0500

    Support shared lib export of only public symbols.
    
    Details:
    - Introduced a new configure option, --enable-export-all, which will
      cause all shared library symbols to be exported by default, or,
      alternatively, --disable-export-all, which will cause all symbols to
      be hidden by default, with only those symbols that are annotated for
      visibility, via BLIS_EXPORT_BLIS (and BLIS_EXPORT_BLAS for BLAS
      symbols), to be exported. The default for this configure option is
      --disable-export-all. Thanks to Isuru Fernando for consulting on
      this commit.
    - Removed BLIS_EXPORT_BLIS annotations from frame/1m/bli_l1m_unb_var1.h,
      which was intended for 5a5f494.
    - Relocated BLIS_EXPORT-related cpp logic from bli_config.h.in to
      frame/include/bli_config_macro_defs.h.
    - Provided appropriate logic within common.mk to implement variable
      symbol visibility for gcc, clang, and icc (to the extent that each of
      these compilers allows).
    - Relocated --help text associated with debug option (-d) to configure
      slightly further down in the list.

commit 5a5f494e428372c7c27ed1f14802e15a83221e87
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 12 18:45:09 2019 -0500

    Removed export macros from all internal prototypes.
    
    Details:
    - After merging PR #303, at Isuru's request, I removed the use of
      BLIS_EXPORT_BLIS from all function prototypes *except* those that we
      potentially wish to be exported in shared/dynamic libraries. In other
      words, I removed the use of BLIS_EXPORT_BLIS from all prototypes of
      functions that can be considered private or for internal use only.
      This is likely the last big modification along the path towards
      implementing the functionality spelled out in issue #248. Thanks
      again to Isuru Fernando for his initial efforts of sprinkling the
      export macros throughout BLIS, which made removing them where
      necessary relatively painless. Also, I'd like to thank Tony Kelman,
      Nathaniel Smith, Ian Henriksen, Marat Dukhan, and Matthew Brett for
      participating in the initial discussion in issue #37 that was later
      summarized and restated in issue #248.
    - CREDITS file update.

commit 3dc18920b6226026406f1d2a8b2c2b405a2649d5
Merge: b938c16b 766769ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 12 11:20:25 2019 -0500

    Merge branch 'master' into dev

commit 766769eeb944bd28641a6f72c49a734da20da755
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Mon Mar 11 19:05:32 2019 -0500

    Export functions without def file (#303)
    
    * Revert "restore bli_extern_defs exporting for now"
    
    This reverts commit 09fb07c350b2acee17645e8e9e1b8d829c73dca8.
    
    * Remove symbols not intended to be public
    
    * No need of def file anymore
    
    * Fix whitespace
    
    * No need of configure option
    
    * Remove export macro from definitions
    
    * Remove blas export macro from definitions

commit 4ed39c0971c7917e2675cf5449f563b1f4751ccc
Merge: 540ec1b4 b938c16b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 8 11:56:58 2019 -0600

    Merge branch 'amd' of github.com:flame/blis into amd

commit b938c16b0c9e839335ac2c14944b82890143d02f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 7 16:40:39 2019 -0600

    Renamed test/3m4m to test/3.
    
    Details:
    - Renamed '3m4m' directory to '3', which captures the directory nicely
      since it builds test drivers to test level-3 operations.
    - These test drivers ceased to be used to test the 3m and 4m (or even
      1m) induced methods long ago, hence the name change.

commit ab89a40582ec7acf802e59b0763bed099a02edd8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 7 16:26:12 2019 -0600

    More minor updates and edits to test/3m4m.
    
    Details:
    - Further updates to matlab scripts, mostly for compatibility with
      GNU Octave.
    - More tweaks to runme.sh.
    - Updates to runme.m that allow copy-paste into matlab interactive
      session to generate graphs.

commit f0e70dfbf3fee4c4e382c2c4e87c25454cbc79a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 7 01:04:05 2019 +0000

    Very minor updates to test/3m4m for ul252.
    
    Details:
    - Very minor updates to the newly revamped test/3m4m drivers when used
      on a Xeon Platinum (SkylakeX).

commit 7fe44748383071f1cbbc77d904f4ae5538e13065
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Wed Mar 6 16:23:31 2019 +0530

    Disabled BLIS_ENABLE_ZEN_BLOCK_SIZES in bli_family_zen.h for ROME tuning
    
    Change-Id: Iec47fcf51f4d4396afef1ce3958e58cf02c59a57

commit 9f1dbe572b1fd5e7dd30d5649bdf59259ad770d5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 5 17:47:55 2019 -0600

    Overhauled test/3m4m Makefile and scripts.
    
    Details:
    - Rewrote much of Makefile to generate executables for single- and dual-
      socket multithreading as well as single-threaded. Each of the three
      can also use a different problem size range/increment, as is often
      appropriate when doubling/halving the number of threads.
    - Rewrote runme.sh script to flexibly execute as many threading
      parameter scenarios as is given in the input parameter string
      (currently set within the script itself). The string also encodes
      the maximum problem size for each threading scenario, which is used
      to identify the executable to run. Also improved the "progress" output
      of the script to reduce redundant info and improve readability in
      terminals that are not especially wide.
    - Minor updates to test_*.c source files.
    - Updated matlab scripts according to changes made to the Makefile,
      test drivers, and runme.sh script, and renamed 'plot_all.m' to
      'runme.m'.

commit f5ed95ecd7d5eb4a63e1333ad5cc6765fc8df9fe
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Tue Mar 5 15:01:57 2019 +0530

    Merged BLIS Release 1.3
    Modified config/zen/make_defs.mk, now CKVECFLAGS     := -mavx2 -mfpmath=sse -mfma -march=znver1
    
    Change-Id: Ia0942d285a21447cd0c470de1bc021fe63e80d81

commit 3bdab823fa93342895bf45d812439324a37db77c
Merge: 70f12f20 e2a02ebd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 28 14:07:24 2019 -0600

    Merge branch 'master' into dev

commit e2a02ebd005503c63138d48a2b7d18978ee29205
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 28 13:58:59 2019 -0600

    Updates (from ls5) to test/3m4m/runme.sh.
    
    Details:
    - Lonestar5-specific updates to runme.sh.

commit f0dcc8944fa379d53770f5cae5d670140918f00c
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Feb 27 17:27:23 2019 -0600

    Add symbol export macro for all functions (#302)
    
    * initial export of blis functions
    
    * Regenerate def file for master
    
    * restore bli_extern_defs exporting for now

commit 540ec1b479712d5e1da637a718927249c15d867f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Feb 24 19:09:10 2019 -0600

    Updated level-3 BLAS to call object API directly.
    
    Details:
    - Updated the BLAS compatibility layer for level-3 operations so that
      the corresponding BLIS object API is called directly rather than first
      calling the typed BLIS API. The previous code based on the typed BLIS
      API calls is still available in a deactivated cpp macro branch, which
      may be re-activated by #defining BLIS_BLAS3_CALLS_TAPI. (This does not
      yet correspond to a configure option. If it seems like people might
      want to toggle this behavior more regularly, a configure option can be
      added in the future.)
    - Updated the BLIS typed API to statically "pre-initialize" objects via
      new initializor macros. Initialization is then finished via calls to
      static functions bli_obj_init_finish_1x1() and bli_obj_init_finish(),
      which are similar to the previously-called functions,
      bli_obj_create_1x1_with_attached_buffer() and
      bli_obj_create_with_attached_buffer(), respectively. (The BLAS
      compatibility layer updates mentioned above employ this new technique
      as well.)
    - Transformed certain routines in bli_param_map.c--specifically, the
      ones that convert netlib-style parameters to BLIS equivalents--into
      static functions, now in bli_param_map.h. (The remaining three classes
      of conversion routines were left unchanged.)
    - Added the aforementioned pre-initializor macros to bli_type_defs.h.
    - Relocated bli_obj_init_const() and bli_obj_init_constdata() from
      bli_obj_macro_defs.h to bli_type_defs.h.
    - Added a few macros to bli_param_macro_defs.h for testing domains for
      real/complexness and precisions for single/double-ness.

commit 8e023bc914e9b4ac1f13614feb360b105fbe44d2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 22 16:55:30 2019 -0600

    Updates to 3m4m/matlab scripts.
    
    Details:
    - Minor updates to matlab graph-generating scripts.
    - Added a plot_all.m script that is more of a scratchpad for copying and
      pasting function invocations into matlab to generate plots that are
      presently of interest to us.

commit b06244d98cc468346eb1a8eb931bc05f35ff280c
Merge: e938ff08 4c7e6680
Author: praveeng <praveen.g@amd.com>
Date:   Thu Feb 21 12:56:15 2019 +0530

    Merge branch 'ut-austin-amd' of ssh://git.amd.com:29418/cpulibraries/er/blis into ut-austin-amd

commit e938ff08cea3d108c84524eb129d9e89d701ea90
Author: praveeng <praveen.g@amd.com>
Date:   Thu Feb 21 12:44:38 2019 +0530

    deleted test.txt
    
    Change-Id: I3871f5fe76e548bc29ec2733745b29964e829dd3

commit ed13ad465dcba350ad3d5e16c9cc7542e33f3760
Author: mkv <Mallikarjuna-Reddy.K-V@amd.com>
Date:   Thu Feb 21 01:04:16 2019 -0500

    added test file for initial commit

commit 4c7e6680832b497468cf50c2399e3ac4de0e3450
Author: praveeng <praveen.g@amd.com>
Date:   Thu Feb 21 12:44:38 2019 +0530

    deleted test.txt
    
    Change-Id: I3871f5fe76e548bc29ec2733745b29964e829dd3

commit 95e070581c54ed2edc211874faec56055ea298c8
Author: mkv <Mallikarjuna-Reddy.K-V@amd.com>
Date:   Thu Feb 21 01:04:16 2019 -0500

    added test file for initial commit

commit 70f12f209bc1901b5205902503707134cf2991a0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 20 16:10:10 2019 -0600

    Changed unsafe-loop to unsafe-math optimizations.
    
    Details:
    - Changed -funsafe-loop-optimizations (re-)introduced in 7690855 for
      make_defs.mk files' CRVECFLAGS to -funsafe-math-optimizations (to
      account for a miscommunication in issue #300). Thanks to Dave Love
      for this suggestion and Jeff Hammond for his feedback on the topic.

commit 7690855c5106a56e5b341a350f8db1c78caacd89
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 18 19:16:01 2019 -0600

    Restored -funsafe-loop-optimizations to subconfigs.
    
    Details:
    - Restored use of -funsafe-loop-optimizations in the definitions of
      CRVECFLAGS (when using gcc), but only for sub-configurations (and
      not configuration families such as amd64, intel64, and x86_64).
      This more or less reverts 5190d05 and 6cf1550.

commit 44994d1490897b08cde52a615a2e37ddae8b2061
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 18 18:35:30 2019 -0600

    Disable TBM, XOP, LWP instructions in AMD configs.
    
    Details:
    - Added -mno-tbm -mno-xop -mno-lwp to CKVECFLAGS in bulldozer,
      piledriver, steamroller, and excavator configurations to explicitly
      disable AMD's bulldozer-era TBM, XOP, and LWP instruction sets in an
      attempt to fix the invalid instruction error that has plagued Travis
      CI builds since 6a014a3. Thanks to Devin Matthews for pointing out
      that the offending instruction was part of TBM (issue #300).
    - Restored -O3 to piledriver configuration's COPTFLAGS.

commit 1e5b530744c1906140d47f43c5cad235eaa619cf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 18 18:04:38 2019 -0600

    Reverted piledriver COPTFLAGS from -O3 to -O2.
    
    Details:
    - Debugging continues; changing COPTFLAGS for piledriver subconfig from
      -O3 to -O2, its original value prior to 6a014a3.

commit 6cf155049168652c512aefdd16d74e7ff39b98df
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 18 17:29:51 2019 -0600

    Removed -funsafe-loop-optimizations from all configs.
    
    Details:
    - Error persists. Removed -funsafe-loop-optimizations from all remaining
      sub-configurations.

commit 5190d05a27c5fa4c7942e20094f76eb9a9785c3e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 18 17:07:35 2019 -0600

    Removed -funsafe-loop-optimizations from piledriver.
    
    Details:
    - Error persists; continuing debugging from bf0fb78c by removing
      -funsafe-loop-optimizations from piledriver configuration.

commit bf0fb78c5e575372060d22f5ceeb5b332e8978ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 18 16:51:38 2019 -0600

    Removed -funsafe-loop-optimizations from families.
    
    Details:
    - Removed -funsafe-loop-optimizations from the configuration families
      affected by 6a014a3, specifically: intel64, amd64, and x86_64.
      This is part of an attempt to debug why the sde, as executed by
      Travis CI, is crashing via the following error:
    
        TID 0 SDE-ERROR: Executed instruction not valid for specified chip
        (ICELAKE): 0x9172a5: bextr_xop rax, rcx, 0x103

commit 6a014a3377a2e829dbc294b814ca257a2bfcb763
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 18 14:52:29 2019 -0600

    Standardized optimization flags in make_defs.mk.
    
    Details:
    - Per Dave Love's recommendation in issue #300, this commit defines
        COPTFLAGS := -O3
      and
        CRVECFLAGS := $(CKVECFLAGS) -funsafe-loop-optimizations
      in the make_defs.mk for all Intel- and AMD-based configurations.

commit 565fa3853b381051ac92cff764625909d105644d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 18 11:43:58 2019 -0600

    Redirect trsm pc, ir parallelism to ic, jr loops.
    
    Details:
    - trsm parallelization was temporarily simplified in 075143d to entirely
      ignore any parallelism specified via the pc or ir loops. Now, any
      parallelism specified to the pc loop will be redirected to the ic
      loop, and any parallelism specified to the ir loop will be redirected
      to the jr loop. (Note that because of inter-iteration dependencies,
      trsm cannot parallelize the ir loop. Parallelism via the pc loop is
      at least somewhat feasible in theory, but it would require tracking
      dependencies between blocks--something for which BLIS currently lacks
      the necessary supporting infrastructure.)

commit a023c643f25222593f4c98c2166212561d030621
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 14 20:18:55 2019 -0600

    Regenerated symbols in build/libblis-symbols.def.
    
    Details:
    - Reran ./build/regen-symbols.sh after running
      'configure --enable-cblas auto'

commit 075143dfd92194647da9022c1a58511b20fc11f3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 14 18:52:45 2019 -0600

    Added support for IC loop parallelism to trsm.
    
    Details:
    - Parallelism within the IC loop (3rd loop around the microkernel) is
      now supported within the trsm operation. This is done via a new branch
      on each of the control and thread trees, which guide execution of a
      new trsm-only subproblem from within bli_trsm_blk_var1(). This trsm
      subproblem corresponds to the macrokernel computation on only the
      block of A that contains the diagonal (labeled as A11 in algorithms
      with FLAME-like partitioning), and the corresponding row panel of C.
      During the trsm subproblem, all threads within the JC communicator
      participate and parallelize along the JR loop, including any
      parallelism that was specified for the IC loop. (IR loop parallelism
      is not supported for trsm due to inter-iteration dependencies.) After
      this trsm subproblem is complete, a barrier synchronizes all
      participating threads and then they proceed to apply the prescribed
      BLIS_IC_NT (or equivalent) ways of parallelism (and any BLIS_JR_NT
      parallelism specified within) to the remaining gemm subproblem (the
      rank-k update that is performed using the newly updated row-panel of
      B). Thus, trsm now supports JC, IC, and JR loop parallelism.
    - Modified bli_trsm_l_cntl_create() to create the new "prenode" branch
      of the trsm_l cntl_t tree. The trsm_r tree was left unchanged, for
      now, since it is not currently used. (All trsm problems are cast in
      terms of left-side trsm.)
    - Updated bli_cntl_free_w_thrinfo() to be able to free the newly shaped
      trsm cntl_t trees. Fixed a potentially latent bug whereby a cntl_t
      subnode is only recursed upon if there existed a corresponding
      thrinfo_t node, which may not always exist (for problems too small
      to employ full parallelization due to the minimum granularity imposed
      by micropanels).
    - Updated other functions in frame/base/bli_cntl.c, such as
      bli_cntl_copy() and bli_cntl_mark_family(), to recurse on sub-prenodes
      if they exist.
    - Updated bli_thrinfo_free() to recurse into sub-nodes and prenodes
      when they exist, and added support for growing a prenode branch to
      bli_thrinfo_grow() via a corresponding set of helper functions named
      with the _prenode() suffix.
    - Added a bszid_t field to thrinfo_t nodes. This field comes in handy when
      debugging the allocation/release of thrinfo_t nodes, as it helps trace
      the "identity" of each node as it is created/destroyed.
    - Renamed
        bli_l3_thrinfo_print_paths() -> bli_l3_thrinfo_print_gemm_paths()
      and created a separate bli_l3_thrinfo_print_trsm_paths() function to
      print out the newly reconfigured thrinfo_t trees for the trsm
      operation.
    - Trivial changes to bli_gemm_blk_var?.c and bli_trsm_blk_var?.c
      regarding variable declarations.
    - Removed subpart_t enum values BLIS_SUBPART1T, BLIS_SUBPART1B,
      BLIS_SUBPART1L, BLIS_SUBPART1R. Then added support for two new labels
      (semantically speaking): BLIS_SUBPART1A and BLIS_SUBPART1B, which
      represent the subpartition ahead of and behind, respectively,
      BLIS_SUBPART1. Updated check functions in bli_check.c accordingly.
    - Shuffled layering/APIs for bli_acquire_mpart_[mn]dim() and
      bli_acquire_mpart_t2b/b2t(), _l2r/r2l().
    - Deprecated old functions in frame/3/bli_l3_thrinfo.c.

commit 78bc0bc8b6b528c79b11f81ea19250a1db7450ed
Author: Nicholai Tukanov <nicholai@utexas.edu>
Date:   Thu Feb 14 13:29:02 2019 -0600

    Power9 sub-configuration  (#298)
    
    Formally registered power9 sub-configuration.
    
    Details:
    - Added and registered power9 sub-configuration into the build system.
      Thanks to Nicholai Tukanov and Devangi Parikh for these contributions.
    - Note: The sub-configuration does not yet have a corresponding
      architecture-specific kernel set registered, and so for now the
      sub-config is using the generic kernel set.

commit 6b832731261f9e7ad003a9ea4682e9ca973ef844
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 12 16:01:28 2019 -0600

    Generalized ref kernels' pragma omp simd usage.
    
    Details:
    - Replaced direct usage of _Pragma( "omp simd" ) in reference kernels
      with PRAGMA_SIMD, which is defined as a function of the compiler being
      used in a new bli_pragma_macro_defs.h file. That definition is cleared
      when BLIS detects that the -fopenmp-simd command line option is
      unsupported. Thanks to Devin Matthews and Jeff Hammond for suggestions
      that guided this commit.
    - Updated configure and bli_config.h.in so that the appropriate anchor
      is substituted in (when the corresponding pragma omp simd support is
      present).

commit b1f5ce8622b682b79f956fed83f04a60daa8e0fc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 5 17:38:50 2019 -0600

    Minor updates to scripts in test/mixeddt/matlab.

commit 38203ecd15b1fa50897d733daeac6850d254e581
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date:   Mon Feb 4 15:28:28 2019 -0500

    Added thunderx2 system in the mixeddt test scripts
    
    Details:
     - Added thunderx2 (tx2) as a system in the runme.sh in test/mixeddt

commit dfc91843ea52297bf636147793029a0c1345be04
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date:   Mon Feb 4 15:23:40 2019 -0500

    Fixed gcc flags for thunderx2 subconfiguration
    
    Details:
    - Fixed -march flag. Thunderx2 is an armv8.1a architecture not armv8a.

commit c665eb9b888ec7e41bd0a28c4c8ac4094d0a01b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jan 28 16:22:23 2019 -0600

    Minor updates to docs, Makefiles.
    
    Details:
    - Changed all occurrences of
        micro-kernel -> microkernel
        macro-kernel -> macrokernel
        micro-panel  -> micropanel
      in all markdown documents in 'docs' directory. This change is being
      made since we've reached the point in adoption and acceptance of
      BLIS's insights where words such as "microkernel" are no longer new,
      and therefore now merit being unhyphenated.
    - Updated "Implementation Notes" sections of KernelsHowTo.md, which
      still contained references to nonexistent cpp macros such as
      BLIS_DEFAULT_MR_? and BLIS_PACKDIM_MR_?.
    - Added 'run-fast' and 'check-fast' targets to testsuite/Makefile.
    - Minor updates to Testsuite.md, including suggesting use of
      'make check' and 'make check-fast' when running from the local
      testsuite directory.
    - Added a comment to top-level Makefile explaining the purpose behind
      the TESTSUITE_WRAPPER variable, which at first glance appears to serve
      no purpose.

commit 1aa280d0520ed5eaea3b119b4e92b789ecad78a4
Author: M. Zhou <5723047+cdluminate@users.noreply.github.com>
Date:   Sun Jan 27 21:40:48 2019 +0000

    Amend OS detection for kFreeBSD. (#295)

commit fffc23bb35d117a433886eb52ee684ff5cf6997f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jan 25 13:35:31 2019 -0600

    CREDITS file update.

commit 26c5cf495ce22521af5a36a1012491213d5a4551
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jan 24 18:49:31 2019 -0600

    Fixed bug in skx subconfig related to bdd46f9.
    
    Details:
    - Fixed code in the skx subconfiguration that became a bug after
      committing bdd46f9. Specifically, the bli_cntx_init_skx() function
      was overwriting default blocksizes for the scomplex and dcomplex
      microkernels despite the fact that only single and double real
      microkernels were being registered. This was not a problem prior to
      bdd46f9 since all microkernels used dynamically-queried (at runtime)
      register blocksizes for loop bounds. However, post-bdd46f9, this
      became a bug because the reference ukernels for scomplex and dcomplex
      were written with their register blocksizes hard-coded as constant
      loop bounds, which conflicted with the erroneous scomplex and dcomplex
      values that bli_cntx_init_skx() was setting in the context. The
      lesson here is that going forward, all subconfigurations must not set
      any blocksizes for datatypes corresponding to default/reference
      microkernels. (Note that a blocksize is left unchanged by the
      bli_cntx_set_blkszs() function if it was set to -1.)

commit 180f8e42e167b83a757340ad4bd4a5c7a1d6437b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jan 24 18:01:15 2019 -0600

    Fixed undefined behavior trsm ukr bug in bdd46f9.
    
    Details:
    - Fixed a bug that manifested anytime a configuration was used in which
      optimized microkernels were registered and the trsm operation (or
      kernel) was invoked. The bug resulted from the optimized microkernels'
      register blocksizes conflicting with the hard-coded values--expressed
      in the form of constant loop bounds--used in the new reference trsm
      ukernels that were introduced in bdd46f9. The fix was easy: reverting
      back to the implementation that uses variable-bound loops, which
      amounted to changing an #if 0 to #if 1 (since I preserved the older
      implementation in the file alongside the new code based on constant-
      bound loops). It should be noted that this fix must be permanent,
      since the trsm kernel code with constant-bound loops can never work
      with gemm ukernels that use different register blocksizes.

commit bdd46f9ee88057d52610161966a11c224e5a026c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jan 24 17:23:18 2019 -0600

    Rewrote reference kernels to use #pragma omp simd.
    
    Details:
    - Rewrote level-1v, -1f, and -3 reference kernels in terms of simplified
      indexing annotated by the #pragma omp simd directive, which a compiler
      can use to vectorize certain constant-bounded loops. (The new kernels
      actually use _Pragma("omp simd") since the kernels are defined via
      templatizing macros.) Modest speedup was observed in most cases using
      gcc 5.4.0, which may improve with newer versions. Thanks to Devin
      Matthews for suggesting this via issue #286 and #259.
    - Updated default blocksizes defined in ref_kernels/bli_cntx_ref.c to
      be 4x16, 4x8, 4x8, and 4x4 for single, double, scomplex and dcomplex,
      respectively, with a default row preference for the gemm ukernel. Also
      updated axpyf, dotxf, and dotxaxpyf fusing factors to 8, 6, and 4,
      respectively, for all datatypes.
    - Modified configure to verify that -fopenmp-simd is a valid compiler
      option (via a new detect/omp_simd/omp_simd_detect.c file).
    - Added a new header in which prefetch macros are defined according to
      which compiler is detected (via macros such as __GNUC__). These
      prefetch macros are not yet employed anywhere, though.
    - Updated the year in copyrights of template license headers in
      build/templates and removed AMD as a default copyright holder.

commit 63de2b0090829677755eb5cdb27e73bc738da32d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jan 23 12:16:27 2019 -0600

    Prevent redef of ftnlen in blastest f2c_types.h.
    
    Details:
    - Guard typedef of ftnlen in f2c_types.h with a #ifndef HAVE_BLIS_H
      directive to prevent the redefinition of that type. Thanks to Jeff
      Diamond for reporting this compiler warning (and apologies for the
      delay in committing a fix).

commit eec2e183a7b7d67702dbd1f39c153f38148b2446
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jan 21 12:12:18 2019 -0600

    Added escaping to '/' in os_name in configure.
    
    Details:
    - Add os_name to the list of variables in which the '/' character is
      escaped. This is meant to address (or at least make progress toward
      addressing) #293. Thanks to Isuru Fernando for spotting this as the
      potential fix, and also thanks to M. Zhou for the original report.

commit adf5c17f0839fdbc1f4a1780f637928b1e78e389
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jan 18 15:14:45 2019 -0600

    Formally registered thunderx2 subconfiguration.
    
    Details:
    - Added a separate subconfiguration for thunderx2, which now uses
      different optimization flags than cortexa57/cortexa53.

commit 094cfdf7df6c2764c25fcbfce686ba29b933942c
Author: M. Zhou <5723047+cdluminate@users.noreply.github.com>
Date:   Fri Jan 18 18:46:13 2019 +0000

    Port BLIS to GNU Hurd OS. (#294)
    
    Prevent blis.h from misidentifying Hurd as OSX.

commit 5d7d616e8e591c2f3c7c2d73220eb27ea484f9c9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jan 15 20:52:51 2019 -0600

    README.md update re: mixeddt TOMS paper.

commit 58c7fb4788177487f73a3964b7a910fe4dc75941
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jan 8 17:00:27 2019 -0600

    Added more matlab scripts for mixeddt paper.
    
    Details:
    - Added a variant set of matlab scripts geared to producing plots that
      reflect performance data gathered with and without extra memory
      optimizations enabled. These scripts reside (for now) in
      test/mixeddt/matlab/wawoxmem.

commit 34286eb914b48b56cdda4dfce192608b9f86d053
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jan 8 11:41:20 2019 -0600

    Minor update to docs/HardwareSupport.md.

commit 108b04dc5b1b1288db95f24088d1e40407d7bc88
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jan 7 20:16:31 2019 -0600

    Regenerated symbols in build/libblis-symbols.def.
    
    Details:
    - Reran ./build/regen-symbols.sh after running
      'configure --enable-cblas auto' to reflect removal of
      bli_malloc_pool() and bli_free_pool().

commit 706cbd9d5622f4690e6332a89cf41ab5c8771899
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jan 7 18:28:19 2019 -0600

    Minor tweaks/cleanups to bli_malloc.c, _apool.c.
    
    Details:
    - Removed malloc_ft and free_ft function pointer arguments from the
      interface to bli_apool_init() after deciding that there is no need to
      specify the malloc()/free() for blocks within the apool. (The apool
      blocks are actually just array_t structs.) Instead, we simply call
      bli_malloc_intl()/_free_intl() directly. This has the added benefit
      of allowing additional output when memory tracing is enabled via
      --enable-mem-tracing. Also made corresponding changes elsewhere in
      the apool API.
    - Changed the inner pools (elements of the array_t within the apool_t)
      to use BLIS_MALLOC_POOL and BLIS_FREE_POOL instead of BLIS_MALLOC_INTL
      and BLIS_FREE_INTL.
    - Disabled definitions of bli_malloc_pool() and bli_free_pool() since
      there are no longer any consumers of these functions.
    - Very minor comment / printf() updates.

commit 579145039d945adbcad1177b1d53fb2d3f2e6573
Author: Minh Quan Ho <1337056+hominhquan@users.noreply.github.com>
Date:   Mon Jan 7 23:00:15 2019 +0100

    Initialize error messages at compile time (#289)
    
    * Initialize error messages at compile time
    
    - Assigning strings directly to the bli_error_string array, instead of
    snprintf() at execution-time.
    
    * Retired bli_error_init(), _finalize().
    
    Details:
    - Removed functions obviated by changes in 80e8dc6: bli_error_init(),
      bli_error_finalize(), and bli_error_init_msgs(), as well as calls to
      the former two in bli_init.c.
    
    * Regenerated symbols in build/libblis-symbols.def.
    
    Details:
    - Reran ./build/regen-symbols.sh after running
      'configure --enable-cblas auto'.
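    
    A minimal sketch of the compile-time approach described above, assuming
    hypothetical error codes and messages (the ex_ names below are
    illustrative and are not the BLIS identifiers). The table is filled in
    by the compiler, so no init or finalize step and no snprintf() call is
    needed at execution time:

```c
/* Hypothetical error codes; the real BLIS codes and messages differ. */
enum
{
    EX_SUCCESS = 0,
    EX_FAILURE,
    EX_MALLOC_RETURNED_NULL,
    EX_NUM_ERRORS
};

/* Initialized at compile time via C99 designated initializers. */
static const char* const ex_error_string[ EX_NUM_ERRORS ] =
{
    [ EX_SUCCESS ]              = "success",
    [ EX_FAILURE ]              = "generic failure",
    [ EX_MALLOC_RETURNED_NULL ] = "malloc() returned NULL",
};

/* Looks up the message for a code, guarding against out-of-range values. */
const char* ex_error_string_for_code( int code )
{
    if ( code < 0 || code >= EX_NUM_ERRORS ) return "unknown error code";
    return ex_error_string[ code ];
}
```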

commit aafbca086e36b6727d7be67e21fef5bd9ff7bfd9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jan 7 12:38:21 2019 -0600

    Updated external package language in README.md.
    
    Details:
    - Updated/added comments about Fedora, OpenSUSE, and GNU Guix under the
      newly-renamed "External GNU/Linux packages" section. Thanks to Dave
      Love for providing these revisions.

commit daacfe68404c9cc8078e5e7ba49a8c7d93e8cda3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jan 7 12:12:47 2019 -0600

    Allow running configure with python 3.4.
    
    Details:
    - Relax version blacklisting of python3 to allow 3.4 or later instead
      of 3.5 or later. Thanks to Dave Love for pointing out that 3.4 was
      sufficient for the purpose of BLIS's build system. (It should be
      noted that we're not sure which, if any, python3 versions prior to
      3.4 are insufficient, and that the only thing stopping us from
      determining this is the fact that these earlier versions of python3
      are not readily available for us to test with.)
    - Updated docs/BuildSystem.md to be explicit about current python2 vs
      python3 version requirements.

commit cdbf16aa93234e0d6a80f0d0e385ec81e7b75465
Author: prangana <pradeep.rao@amd.com>
Date:   Fri Jan 4 15:59:21 2019 +0530

    Update version 1.3
    
    Change-Id: I32a7d24af860e87a60396614075236afb65a28a9

commit cf9c1150515b8e9cc4f12e0d4787b3471b12ba4a
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date:   Thu Jan 3 09:51:46 2019 +0530

    Added a macro to be enabled when BLIS runs in single-instance mode.
    
    Change-Id: I7f3fd654b78e64c4e6e24e9f0e245b1a30c492b0

commit ad8d9adb09a7dd267bbdeb2bd1fbbf9daf64ee76
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jan 3 16:08:24 2019 -0600

    README.md, CREDITS update.
    
    Details:
    - Added "What's New" and "What People Are Saying About BLIS" sections to
      README.md.
    - Added missing github handles to various individuals' entries in the
      CREDITS file.

commit 7052fca5aef430241278b67d24cef6fe33106904
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jan 2 13:48:40 2019 -0600

    Apply f272c289 to bli_fmalloc_noalign().
    
    Details:
    - Perform the same check for NULL return values and error message output
      in bli_fmalloc_noalign() as is performed by bli_fmalloc_align(). (This
      change was intended for f272c289.)

commit 528e3ad16a42311a852a8376101959b4ccd801a5
Merge: 3126c52e f272c289
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jan 2 13:39:19 2019 -0600

    Merge branch 'amd'

commit 3126c52ea795ffb7d30b16b7f7ccc2a288a6158d
Merge: 61441b24 8091998b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jan 2 13:37:37 2019 -0600

    Merge branch 'amd'

commit f272c2899a6764eedbe05cea874ee3bd258dbff3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jan 2 12:34:15 2019 -0600

    Add error message to malloc() check for NULL.
    
    Details:
    - Output an error message if and when the malloc()-equivalent called by
      bli_fmalloc_align() ever returns NULL. Everything was already in place
      for this to happen, including the error return code, the error string
      sprintf(), the error checking function bli_check_valid_malloc_buf()
      definition, and its prototype. Thanks to Minh Quan Ho for pointing out
      the missing error message.
    - Increased the default block_ptrs_len for each inner pool stored in the
      small block allocator from 10 to 25. Under normal execution, each
      thread uses only 21 blocks, so this change will prevent the sba from
      needing to resize the block_ptrs array of any given inner pool as
      threads initially populate the pool with small blocks upon first
      execution of a level-3 operation.
    - Nix stray newline echo in configure.
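    
    A rough sketch of the NULL check described in the first item, assuming
    hypothetical ex_ names and a simplified policy that only prints a
    message (what the library does after reporting is not shown here). The
    allocator is passed in as a function pointer, as with the malloc_ft
    arguments mentioned elsewhere in this log:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a malloc()-style function pointer type. */
typedef void* (*ex_malloc_ft)( size_t size );

/* Returns 1 if the buffer is usable; otherwise reports the failure. */
static int ex_check_valid_malloc_buf( const void* p )
{
    if ( p == NULL )
    {
        fprintf( stderr, "ex: malloc() returned NULL.\n" );
        return 0;
    }
    return 1;
}

/* Calls the supplied allocator and checks its return value. */
void* ex_fmalloc_noalign( ex_malloc_ft f, size_t size )
{
    void* p = f( size );
    ex_check_valid_malloc_buf( p );
    return p;
}

/* An allocator that always fails, used to exercise the error path. */
void* ex_failing_malloc( size_t size )
{
    ( void )size;
    return NULL;
}
```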

commit eb97f778a1e13ee8d3b3aade05e479c4dfcfa7c0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 25 20:17:09 2018 -0600

    Added missing AMD copyrights to previous commit.
    
    Details:
    - Forgot to add AMD copyrights to several touched files that did not
      already have them in 2f31743.

commit 2f3174330fb29164097d664b7c84e05c7ced7d95
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 25 19:35:01 2018 -0600

    Implemented a pool-based small block allocator.
    
    Details:
    - Implemented a sophisticated data structure and set of APIs that track
      the small blocks of memory (around 80-100 bytes each) used when
      creating nodes for control and thread trees (cntl_t and thrinfo_t) as
      well as thread communicators (thrcomm_t). The purpose of the small
      block allocator, or sba, is to allow the library to transition into a
      runtime state in which it does not perform any calls to malloc() or
      free() during normal execution of level-3 operations, regardless of
      the threading environment (potentially multiple application threads
      as well as multiple BLIS threads). The functionality relies on a new
      data structure, apool_t, which is (roughly speaking) a pool of
      arrays, where each array element is a pool of small blocks. The outer
      pool, which is protected by a mutex, provides separate arrays for each
      application thread while the arrays each handle multiple BLIS threads
      for any given application thread. The design minimizes the potential
      for lock contention, as only concurrent application threads would
      need to fight for the apool_t lock, and only if they happen to begin
      their level-3 operations at precisely the same time. Thanks to Kiran
      Varaganti and AMD for requesting this feature.
    - Added a configure option to disable the sba pools, which are enabled
      by default; renamed the --[dis|en]able-packbuf-pools option to
      --[dis|en]able-pba-pools; and rewrote the --help text associated with
      this new option and consolidated it with the --help text for the
      option associated with the sba (--[dis|en]able-sba-pools).
    - Moved the membrk field from the cntx_t to the rntm_t. We now pass in
      a rntm_t* to the bli_membrk_acquire() and _release() APIs, just as we
      do for bli_sba_acquire() and _release().
    - Replaced all calls to bli_malloc_intl() and bli_free_intl() that are
      used for small blocks with calls to bli_sba_acquire(), which takes a
      rntm (in addition to the bytes requested), and bli_sba_release().
      These latter two functions reduce to the former two when the sba pools
      are disabled at configure-time.
    - Added rntm_t* arguments to various cntl_t and thrinfo_t functions, as
      required by the new usage of bli_sba_acquire() and _release().
    - Moved the freeing of "old" blocks (those allocated prior to a change
      in the block_size) from bli_membrk_acquire_m() to the implementation
      of the pool_t checkout function.
    - Miscellaneous improvements to the pool_t API.
    - Added a block_size field to the pblk_t.
    - Harmonized the way that the trsm_ukr testsuite module performs packing
      relative to that of gemmtrsm_ukr, in part to avoid the need to create
      a packm control tree node, which now requires a rntm_t that has been
      initialized with an sba and membrk.
    - Re-enabled the explicit call to bli_finalize() in the testsuite so that
      users who run the testsuite with memory tracing enabled can check for
      memory leaks.
    - Manually imported the compact/minor changes from 61441b24 that cause
      the rntm to be copied locally when it is passed in via one of the
      expert APIs.
    - Reordered parameters to various bli_thrcomm_*() functions so that the
      thrcomm_t* to the comm being modified is last, not first.
    - Added more descriptive tracing for allocating/freeing small blocks and
      formalized via a new configure option: --[dis|en]able-mem-tracing.
    - Moved some unused scalm code and headers into frame/1m/other.
    - Whitespace changes to bli_pthread.c.
    - Regenerated build/libblis-symbols.def.
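    
    A much-simplified sketch of one inner pool of small blocks, assuming
    hypothetical ex_ names. It omits the mutex-protected outer apool_t and
    the per-thread arrays described above and shows only the core idea:
    once the pool has been populated, acquiring and releasing blocks no
    longer calls malloc() or free().

```c
#include <stdlib.h>

typedef struct
{
    void** block_ptrs;     /* stack of currently free blocks         */
    size_t block_ptrs_len; /* capacity of block_ptrs                 */
    size_t num_free;       /* number of free blocks on the stack     */
    size_t block_size;     /* size of each block, in bytes           */
    size_t num_mallocs;    /* how many acquires had to call malloc() */
} ex_pool_t;

/* Returns 1 on success, 0 if the pointer array could not be allocated. */
int ex_pool_init( ex_pool_t* pool, size_t block_size, size_t block_ptrs_len )
{
    if ( block_ptrs_len == 0 ) block_ptrs_len = 1;
    pool->block_ptrs = malloc( block_ptrs_len * sizeof( void* ) );
    if ( pool->block_ptrs == NULL ) return 0;
    pool->block_ptrs_len = block_ptrs_len;
    pool->num_free       = 0;
    pool->block_size     = block_size;
    pool->num_mallocs    = 0;
    return 1;
}

/* Reuses a free block if one exists; otherwise falls back to malloc(). */
void* ex_pool_acquire( ex_pool_t* pool )
{
    if ( pool->num_free > 0 ) return pool->block_ptrs[ --pool->num_free ];
    pool->num_mallocs++;
    return malloc( pool->block_size );
}

/* Returns a block to the pool, growing the pointer array only if full. */
void ex_pool_release( ex_pool_t* pool, void* block )
{
    if ( pool->num_free == pool->block_ptrs_len )
    {
        size_t new_len = 2 * pool->block_ptrs_len;
        void** p = realloc( pool->block_ptrs, new_len * sizeof( void* ) );
        if ( p == NULL ) { free( block ); return; }
        pool->block_ptrs     = p;
        pool->block_ptrs_len = new_len;
    }
    pool->block_ptrs[ pool->num_free++ ] = block;
}

/* Frees all blocks held by the pool, then the pointer array itself. */
void ex_pool_finalize( ex_pool_t* pool )
{
    while ( pool->num_free > 0 ) free( pool->block_ptrs[ --pool->num_free ] );
    free( pool->block_ptrs );
    pool->block_ptrs     = NULL;
    pool->block_ptrs_len = 0;
}
```

    Choosing an initial block_ptrs_len at least as large as the number of
    blocks in flight means the realloc() branch in ex_pool_release() is
    never taken.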

commit 61441b24f3244a4b202c29611a4899dd5c51d3a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 20 19:38:11 2018 -0600

    Make local copy of user's rntm_t in level-3 ops.
    
    Details:
    - In the case that the caller passes in a non-NULL rntm_t pointer into
      one of the expert APIs for a level-3 operation (e.g. bli_gemm_ex()),
      make a local copy of the rntm_t and use the address of that local copy
      in all subsequent execution (which may change the contents of the
      rntm_t). This prevents a potentially confusing situation whereby a
      user-initialized rntm_t is used once (in, say, gemm), and then found
      by the user to be in a different state before it is used a second
      time.
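    
    A small sketch of the local-copy pattern, assuming a hypothetical
    ex_rntm_t with made-up fields (the real rntm_t differs). The expert
    entry point copies the caller's struct and lets internal code modify
    only the copy:

```c
#include <stddef.h>

typedef struct
{
    int num_threads; /* requested total number of threads */
    int jc_ways;     /* -1 means "not yet decided"        */
    int ic_ways;     /* -1 means "not yet decided"        */
} ex_rntm_t;

/* Internal step that is free to modify the runtime object it is given. */
static void ex_rntm_set_ways( ex_rntm_t* rntm )
{
    if ( rntm->jc_ways < 1 ) rntm->jc_ways = rntm->num_threads;
    if ( rntm->ic_ways < 1 ) rntm->ic_ways = 1;
}

/* Expert-style entry point: returns the total parallelism it would use. */
int ex_gemm_ex( const ex_rntm_t* rntm )
{
    ex_rntm_t rntm_l;

    /* Copy the caller's rntm (or use defaults) so it is never modified. */
    if ( rntm != NULL ) rntm_l = *rntm;
    else { rntm_l.num_threads = 1; rntm_l.jc_ways = -1; rntm_l.ic_ways = -1; }

    ex_rntm_set_ways( &rntm_l );
    return rntm_l.jc_ways * rntm_l.ic_ways;
}
```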

commit e809b5d2f1023b4249969e2f516291c9a3a00b80
Merge: 76016691 0476f706
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 20 16:27:26 2018 -0600

    Merge branch 'master' into amd

commit 1f4eeee5175a8fc9ac312847c796ce6db5fe75b9
Author: sraut <Biplab.Raut@amd.com>
Date:   Wed Dec 19 21:21:10 2018 +0530

    Fixed BLAS test failures of small matrix SYRK for single and double precision.
    
    Details:
    - SYRK for small matrices was implemented by reusing the small GEMM routine.
      As a result, output was written to the full C matrix, and because C is
      symmetric, its lower and upper triangles contained the same results. The
      BLAS SYRK API spec requires that only the lower or the upper triangle of
      C be written. This caused BLAS test failures, even though the BLIS
      testsuite was passing the small SYRK operation.
    - To fix the BLAS test failures, separate small SYRK kernel routines are
      implemented for both single and double precision. The newly added
      routines are in kernels/zen/3/bli_syrk_small.c. The intermediate results
      for matrix C are now written to a scratch buffer, and the final results
      are copied from the scratch buffer to either the lower or the upper
      triangle of matrix C using SIMD copies.
    - Source and header files frame/3/syrk/bli_syrk_front.c and
      frame/3/syrk/bli_syrk_front.h are changed to invoke the new small SYRK
      routines.
    
    Change-Id: I9cfb1116c93d150aefac673fca033952ecac97cb
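    
    A scalar sketch of the final copy step described above, assuming
    column-major storage and a hypothetical function name. The commit
    describes a SIMD copy; plain loops are used here for clarity. Only
    the requested triangle of C, including the diagonal, is written:

```c
#include <stddef.h>

/* Copies one triangle of an n-by-n column-major scratch matrix into c.
   If lower is nonzero the lower triangle is written, otherwise the upper.
   Elements of c outside that triangle are left untouched. */
void ex_copy_triangle
     (
       int           lower,
       size_t        n,
       const double* scratch, size_t lds,
       double*       c,       size_t ldc
     )
{
    for ( size_t j = 0; j < n; ++j )
    {
        size_t i_begin = lower ? j : 0;
        size_t i_end   = lower ? n : j + 1;

        for ( size_t i = i_begin; i < i_end; ++i )
            c[ i + j*ldc ] = scratch[ i + j*lds ];
    }
}
```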

commit 6d267375c3a0543f20604d74cc678ad91db3b6f1
Author: sraut <Biplab.Raut@amd.com>
Date:   Wed Dec 19 14:22:21 2018 +0530

    Improved the performance of multi-instance DGEMM when the threads are bound to a CCX.
    Multi-instance: each thread runs a sequential DGEMM.
    Change-Id: I306920c8061b6dad61efac1dae68727f4ac27df6

commit 0476f706b93e83f6b74a3d7b7e6e9cc9a1a52c3b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 18 14:56:20 2018 -0600

    CHANGELOG update (0.5.1)

commit e0408c3ca3d53bc8e6fedac46ea42c86e06c922d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 18 14:56:16 2018 -0600

    Version file update (0.5.1)

commit 3ab231afc9f69d14493908c53c85a84c5fba58aa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 18 14:53:37 2018 -0600

    ReleaseNotes.md update in advance of next version.
    
    Details:
    - Updated ReleaseNotes.md in preparation for next version.

commit d1aa87164e1e82347d62aa98793963c5265ef7e7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 18 14:52:40 2018 -0600

    README.md update (External packages section).
    
    Details:
    - Updated External packages section in anticipation of introducing BLIS
      into Debian package universe. Thanks to M. Zhou for sponsoring BLIS in
      Debian.

commit 7bf901e9265a1acd78e44c06f7178c8152c7e267
Author: sraut <Biplab.Raut@amd.com>
Date:   Tue Dec 18 14:39:16 2018 +0530

    Fixed a multi-instance performance issue on EPYC machines.
    Issue: With the default values of mc, kc, and nc in multi-instance mode, performance across the cores dips drastically.
    Fix: Experimentation found a different set of values (mc, kc, and nc) that fits in the cache, so that performance remains the same across all cores.
    
    Change-Id: I98265e3b7e61cd7602a0cc5596240e86c08c03fe

commit d2b2a0819a2fccad9165bc48c0e172d79a87542c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 17 19:26:35 2018 -0600

    Removed stray sections from Multithreading.md.
    
    Details:
    - Removed unintended section headers from before table of contents.

commit 93d56319f2953cf0e9df1ff2cda90b8e41351b2c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 17 19:17:30 2018 -0600

    Added missing bli_init_once() in bli_thread API.
    
    Details:
    - Fixed an issue with specifying threading globally at runtime via
      bli_thread_set_num_threads() (the automatic way) or via
      bli_thread_set_ways() (the manual way), with bli_thread_init_rntm()
      also affected. These functions were not calling bli_init_once() prior
      to acting, and therefore their effects on the global rntm_t structure
      were being wiped out by the eventual call to bli_init_once(), by some
      other BLIS function. Thanks to Ali Emre Gülcü for reporting the
      behavior associated with this bug.
    - Added additional content to docs/Multithreading.md covering topics of
      choosing between OpenMP and pthreads, and specifying affinity via
      OpenMP.
    - CREDITS file update.
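    
    A sketch of the ordering problem fixed in the first item, assuming
    hypothetical ex_ names and a plain flag in place of a thread-safe
    once-mechanism such as pthread_once(). If the setter did not trigger
    initialization first, a later initialization would reset the global
    value and silently discard the caller's setting:

```c
static int ex_is_initialized      = 0;
static int ex_global_num_threads  = 0;

/* One-time initialization that installs the default global settings. */
static void ex_init( void )
{
    ex_global_num_threads = 1;
}

/* Runs ex_init() on first use only. Not thread-safe; a real library
   would use a once-primitive here. */
void ex_init_once( void )
{
    if ( !ex_is_initialized )
    {
        ex_init();
        ex_is_initialized = 1;
    }
}

void ex_thread_set_num_threads( int n )
{
    /* Initialize first so that a later ex_init() cannot overwrite n. */
    ex_init_once();
    ex_global_num_threads = n;
}

int ex_thread_get_num_threads( void )
{
    ex_init_once();
    return ex_global_num_threads;
}
```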

commit 76016691e2c514fcb59f940c092475eda968daa2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 13 17:23:09 2018 -0600

    Improvements to bli_pool; malloc()/free() tracing.
    
    Details:
    - Added malloc_ft and free_ft fields to pool_t, which are provided when
      the pool is initialized, to allow bli_pool_alloc_block() and
      bli_pool_free_block() to call bli_fmalloc_align()/bli_ffree_align()
      with arbitrary align_size values (according to how the pool_t was
      initialized).
    - Added a block_ptrs_len argument to bli_pool_init(), which allows the
      caller to specify an initial length for the block_ptrs array, which
      previously suffered the cost of being reallocated, copied, and freed
      each time a new block was added to the pool.
    - Consolidated the "buf_sys" and "buf_align" pointer fields in pblk_t
      into a single "buf" field. Consolidated the bli_pblk API accordingly
      and also updated the bli_mem API implementation. This was done
      because I'd previously already implemented opaque alignment via
      bli_malloc_align(), which allocates extra space and stores the
      original pointer returned by malloc() one element before the element
      whose address is aligned.
    - Tweaked bli_membrk_acquire_m() and bli_membrk_release() to call
      bli_fmalloc_align() and bli_ffree_align(), which required adding an
      align_size field to the membrk_t struct.
    - Pass the pack schemas directly into bli_l3_cntl_create_if() rather
      than transmit them via objects for A and B.
    - Simplified bli_l3_cntl_free_if() and renamed to bli_l3_cntl_free().
      The function had not been conditionally freeing control trees for
      quite some time. Also, removed obj_t* parameters since they aren't
      needed anymore (or never were).
    - Spun-off OpenMP nesting code in bli_l3_thread_decorator() to a
      separate function, bli_l3_thread_decorator_thread_check().
    - Renamed:
        bli_malloc_align()   -> bli_fmalloc_align()
        bli_free_align()     -> bli_ffree_align()
        bli_malloc_noalign() -> bli_fmalloc_noalign()
        bli_free_noalign()   -> bli_ffree_noalign()
      The 'f' is for "function" since they each take a malloc_ft or free_ft
      function pointer argument.
    - Inserted various printf() calls for the purposes of tracing memory
      allocation and freeing, guarded by cpp macro ENABLE_MEM_DEBUG, which,
      for now, is intended to be a "hidden" feature rather than one hooked
      up to a configure-time option.
    - Defined bli_rntm_equals(), which compares two rntm_t for equality.
      (There are no use cases for this function yet, but there may be soon.)
    - Whitespace changes to function parameter lists in bli_pool.c, .h.
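    
    A sketch of the opaque alignment technique mentioned in the third
    item: over-allocate, round the address up, and store the original
    pointer one element before the aligned address so the matching free
    can recover it. The ex_ names are hypothetical, and align_size is
    assumed to be a power of two no smaller than sizeof( void* ):

```c
#include <stdint.h>
#include <stdlib.h>

void* ex_malloc_align( size_t size, size_t align_size )
{
    /* Extra room for the alignment slack plus one stashed pointer. */
    void* p_orig = malloc( size + align_size + sizeof( void* ) );
    if ( p_orig == NULL ) return NULL;

    /* Skip past the stash slot, then round up to the alignment. */
    uintptr_t addr = ( uintptr_t )p_orig + sizeof( void* );
    addr = ( addr + align_size - 1 ) & ~( ( uintptr_t )align_size - 1 );

    /* Store the original pointer just before the aligned address. */
    ( ( void** )addr )[ -1 ] = p_orig;

    return ( void* )addr;
}

void ex_free_align( void* p )
{
    if ( p == NULL ) return;

    /* Recover the pointer that malloc() originally returned. */
    free( ( ( void** )p )[ -1 ] );
}
```

    Because the original pointer travels with the buffer, the caller only
    needs to track a single address, which is what allows separate
    "system" and "aligned" pointer fields to be merged into one.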

commit f808d829c58dc4194cc3ebc3825fbdde12cd3f93
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 12 15:22:59 2018 -0600

    Handle edge cases, zero-filling in packm kernels.
    
    Details:
    - Updated the API and semantics of packm kernels such that they must now
      handle edge cases, meaning that a c-by-k packm kernel must be able to
      pack edge cases that are fewer than c rows/columns and be able to
      zero-fill the remaining elements. They must also be able to zero-fill
      the equivalent region when copying fewer than k columns/rows (which is
      needed by trsm). The new packm kernel API is generally:
    
        void packm_kernel
             (
               conj_t           conja,
               dim_t            cdim,
               dim_t            n,
               dim_t            n_max,
               ctype*  restrict kappa,
               ctype*  restrict a, inc_t inca, inc_t lda,
               ctype*  restrict p,             inc_t ldp,
               cntx_t* restrict cntx
             );
    
      where cdim and n are the dimensions (short and long, respectively) of
      the submatrix being copied from the source matrix A, and n_max is the
      "full" long dimension (corresponding to the k dimension in gemm) of
      the micropanel. The "full" short dimension (corresponding to the
      register blocksize MR or NR) is not part of the API because it is
      known intrinsically by the packm kernel implementation. Thanks to
      Devin Matthews for prompting us to make this change (#282).
    - Updated all reference packm kernels in ref_kernels/1m according to
      above changes, as well as all optimized packm kernels (which only
      consisted of those for knl).
    - Bumped the major soname version number in 'so_version' to 2. At first
      I was considering leaving it unchanged, but I couldn't escape the
      reality that the packm kernel API is much closer to an expert API
      than it is some obscure helper function interface within the framework
      that nobody would ever notice.
    - Removed reference packm kernels for mr/nr = 30. The only sub-config
      that would have been using those kernels is knc, which is likely no
      longer being used by very many people (if any). (This also mostly
      offset the larger object code footprint incurred by moving the edge-
      case handling into the individual packm kernels.)
    - Fixed an obscure race condition for 3mh and 4mh induced methods in
      which those implementations were modifying the contexts stored in the
      gks rather than a local copy.
    - Fixed a minor bug in the testsuite that prevented non-1m-based induced
      method implementations of trsm from executing.
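    
    A simplified sketch of the edge-case semantics described in the first
    item, for real double precision with an assumed register blocksize of
    4. The function name and EX_MR are hypothetical, and the conja and
    cntx arguments of the real API are omitted. Rows at or beyond cdim and
    columns at or beyond n are zero-filled out to the full EX_MR-by-n_max
    micropanel:

```c
#include <stddef.h>

enum { EX_MR = 4 }; /* assumed "full" short dimension of the micropanel */

void ex_dpackm_4xk
     (
       size_t        cdim,
       size_t        n,
       size_t        n_max,
       const double* kappa,
       const double* a, size_t inca, size_t lda,
       double*       p,              size_t ldp
     )
{
    for ( size_t j = 0; j < n_max; ++j )
    {
        for ( size_t i = 0; i < EX_MR; ++i )
        {
            if ( i < cdim && j < n )
                p[ i + j*ldp ] = ( *kappa ) * a[ i*inca + j*lda ]; /* copy */
            else
                p[ i + j*ldp ] = 0.0;                        /* zero-fill */
        }
    }
}
```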

commit 02ec0be3ba0b0d6b4186386ae140906a96de919b
Merge: e275def3 c534da62
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 5 19:33:53 2018 -0600

    Merge branch 'master' into amd

commit c534da62c0015f91391983da5376c9e091378010
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 5 15:51:05 2018 -0600

    Disabled ARM configuration families in registry.
    
    Details:
    - Disabled (commented out) the arm32 and arm64 configuration families
      in the config_registry file. Having a configuration family registered
      only makes sense if BLIS is currently outfitted with runtime hardware
      detection logic to choose the appropriate sub-configuration. That
      logic is currently missing for ARM architectures, and thus having the
      ARM configuration families in the configuration registry only serves
      to confuse people. Thanks to Devangi Parikh for suggesting this
      change.

commit 6885051a164628904fad0d8a3b39c82f9a7b193c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 5 14:45:39 2018 -0600

    Generalizations/cleanup to mixeddt matlab scripts.
    
    Details:
    - Parameterized, reorganized, and added comments to matlab scripts in
      test/mixeddt/matlab.
    - Reordered some lines of code and added comments to plot_l3_perf.m in
      test/3m4m/matlab.

commit cbdb0566bf3201a495bbdcb8cb50342fa0098649
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 5 20:06:32 2018 +0000

    Updates to 3m4m, mixeddt test driver files.
    
    Details:
    - Updated 3m4m and mixeddt Makefiles and runme.sh scripts, mostly to
      port recent changes to the former to the latter.
    - Disabled (for now) code in 3m4m/test_*.c files that disables all
      induced methods except for the one that is requested from the
      Makefile via the IND macro. This is done because usually, we want to
      test whatever method is enabled automatically for complex datatypes.
      (That is, when native complex microkernels are missing, we usually
      want to test performance of 1m.)

commit 0645f239fbdf37ee9d2096ee3bb0e76b3302cfff
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 4 14:31:06 2018 -0600

    Remove UT-Austin from copyright headers' clause 3.
    
    Details:
    - Removed explicit reference to The University of Texas at Austin in the
      third clause of the license comment blocks of all relevant files and
      replaced it with a more all-encompassing "copyright holder(s)".
    - Removed duplicate words ("derived") from a few kernels' license
      comment blocks.
    - Homogenized license comment block in kernels/zen/3/bli_gemm_small.c
      with format of all other comment blocks.

commit 9b688a2d69dd420f4d2582827c5ac87e422cd3bc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 4 13:30:25 2018 -0600

    Refer to color mm algorithm in Multithreading.md.

commit 22384fd2b749aa8cfdfad1084ce5e7dbd4ad2d64
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 4 13:09:04 2018 -0600

    Minor updates to test_gemm.c in test/mixeddt.

commit 2ba3b1780cbca58e43a3948d67bd07e637036125
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 3 19:40:39 2018 -0600

    Removed symbols from libblis-symbols.def.
    
    Details:
    - Removed bli_gemm_md_front() and bli_gemm_md_zgemm() symbols from
      build/libblis-symbols.def, which will hopefully appease AppVeyor.

commit dcb38c4e59c3395c258799e69bfe2104c578c528
Merge: dc184095 375eb30b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 3 18:06:19 2018 -0600

    Merge branch 'dev'

commit 375eb30b0a63ac06a363a5f75f283584258db48b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 3 17:49:52 2018 -0600

    Added mixed-precision support to 1m method.
    
    Details:
    - Lifted the constraint that 1m only be used when all operands' storage
      datatypes (along with the computation datatype) are equal. Now, 1m may
      be used as long as all operands are stored in the complex domain. This
      change largely consisted of adding the ability to pack to 1e and 1r
      formats from one precision to another. It also required adding logic
      for handling complex values of alpha to bli_packm_blk_var1_md()
      (similar to the logic in bli_packm_blk_var1()).
    - Fixed a bug in several virtual microkernels (bli_gemm_md_c2r_ref.c,
      bli_gemm1m_ref.c, and bli_gemmtrsm1m_ref.c) that resulted in the wrong
      ukernel output preference field being read. Previously, the preference
      for the native complex ukernel was being read instead of the pref for
      the native real domain ukernel. This bug would not manifest if the
      preference for the native complex ukernel happened to be equal to that
      of the native real ukernel.
    - Added support for testing mixed-precision 1m execution via the gemm
      module of the testsuite.
    - Tweaked/simplified bli_gemm_front() and bli_gemm_md.c so that pack
      schemas are always read from the context, rather than trying to
      sometimes embed them directly to the A and B objects. (They are still
      embedded, but now uniformly only after reading the schemas from the
      context.)
    - Redefined cpp macro bli_l3_ind_recast_1m_params() as a static function
      and renamed to bli_gemm_ind_recast_1m_params() (since gemm is the only
      consumer).
    - Added 1m optimization logic (via bli_gemm_ind_recast_1m_params()) to
      bli_gemm_ker_var2_md().
    - Added explicit handling for beta == 1 and beta == 0 in the reference
      gemm1m virtual microkernel in ref_kernels/ind/bli_gemm1m_ref.c.
    - Rewrote various level-0 macro defs, including axpyris, axpbyris,
      scal2ris, and xpbyris (and their conjugating counterparts) to
      explicitly support three operand types and updated invocations to
      xpbyris in bli_gemmtrsm1m_ref.c.
    - Query and use the storage datatype of the packed object instead of the
      storage datatype of the source object in bli_packm_blk_var1().
    - Relocated and renamed frame/ind/misc/bli_l3_ind_opt.h to
      frame/3/gemm/ind/bli_gemm_ind_opt.h.
    - Various whitespace/comment updates.

commit e275def30ac41cadce296560fa67282704f20a02
Merge: 8091998b dc184095
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 30 15:39:50 2018 -0600

    Merge branch 'master' into amd

commit dc18409551f341125169fe8d4d43ac45e81bdf28
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 28 11:58:40 2018 -0600

    CREDITS file update.

commit ee4d2712963816f84d7e3fdd39d93424e1aaf63d
Merge: e81c4b56 3d7e8bc3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 28 11:52:57 2018 -0600

    Merge pull request #287 from SuperFluffy/fix_configuration_links
    
    Fix configuration links

commit 3d7e8bc3b8e77693152138e75676f71573e5e6cd
Author: Richard Janis Goldschmidt <janis.beckert@gmail.com>
Date:   Wed Nov 28 15:56:37 2018 +0100

    Fix configuration links

commit 6a4885f8be9ecd81423ebf2eb6da75d7981c979b
Merge: 1d8aae22 e81c4b56
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 27 13:22:59 2018 -0600

    Merge branch 'master' into dev

commit e81c4b56660b25a39f8fdc09fbe07459c5bd8e8e
Merge: 757043ea cfbdb58d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 21 17:00:49 2018 -0600

    Merge pull request #285 from isuruf/pthread
    
    Move LDFLAGS to the end

commit cfbdb58de2e44f2e3a3d8b14fceece7aef4b3006
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 14:23:39 2018 -0600

    Move LDFLAGS to the end
    
    Otherwise the linker will drop flags like -lpthread

commit 757043eae8630c0a76e9bb04f2cb0bd72439a86a
Merge: e769bf46 7af8fa01
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 21 13:07:26 2018 -0600

    Merge pull request #283 from isuruf/patch-3
    
    Fix MinGW and Cygwin build failures

commit 7af8fa01373b7bb30fa3b1fd110fd201c87ea225
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 02:10:05 2018 -0600

    Fix blis dll path

commit 2acd8dcd23805203a6821358c5e3e09d521fecdf
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 02:02:18 2018 -0600

    Fix install path of dll.a

commit b7b0ad22b151e89e2a6c7782cf4d8d47b4e60734
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 01:54:44 2018 -0600

    Test mingw

commit bafe521ed0012b7b8814404b78a6c576d8386370
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 01:54:36 2018 -0600

    Fixes for mingw

commit be831879bd03edcddff8a345161f749ad92215af
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 01:39:32 2018 -0600

    test gcc shared

commit f6b924648c79c4b1c3d3c7fbf85372680aff8362
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 01:39:19 2018 -0600

    Don't use .def for gcc

commit ce6e4eae6d5e977e6f699acc9cf239be8ac53771
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 01:34:56 2018 -0600

    test no threading

commit c9169b4685bfe81bc562cf9128b35a6a9884799b
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 01:17:36 2018 -0600

    Add mingw64 path

commit 0f753090eaf4264b743a49ce15de97514bcbe112
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 01:14:52 2018 -0600

    Fix PATH

commit d424470b1f2fa8717fa54c0245b21341504665f6
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 01:04:26 2018 -0600

    Check openmp and pthreads threading

commit c73e7601e58239e2dedec6c9f1b752e949254a42
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 00:50:33 2018 -0600

    Revert "enable rdp"
    
    This reverts commit 368274bcbd0c9232521d14fa28304f35ced0e6d7.

commit 6209b2e6060b89e65f3405c31333af8952dd63c0
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 00:50:22 2018 -0600

    Remove conda

commit 0b1b344447b8a2fcd635a48f0ce7ce89b2107dc4
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 00:42:39 2018 -0600

    Fix make name

commit 7a9838983ba8dd32ac9f87712255721542ff561f
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 00:35:27 2018 -0600

    Use m2w64-make

commit 4c1dedd6a90087807f16353a5d0bcaaade35a7a5
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 00:28:20 2018 -0600

    No activate on gcc

commit 368274bcbd0c9232521d14fa28304f35ced0e6d7
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Tue Nov 20 23:40:26 2018 -0600

    enable rdp

commit 707a5e7f9b07f554e1e9289dd0ce3b7dc4fded6e
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Tue Nov 20 23:39:31 2018 -0600

    No conda for mingw build

commit 65b0565c0ad9162d4474bd84eabde491fa971538
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Tue Nov 20 23:19:38 2018 -0600

    Check MinGW-w64

commit 9ddffba5847080e0d77d9e6059d05dc4b1d89ba5
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Nov 21 00:23:34 2018 -0600

    Fix MinGW build failure
    
    Fixes https://github.com/flame/blis/issues/278

commit 1d8aae220bc52ce8e3a8afaa64b57e5d83480bdc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 20 18:42:07 2018 -0600

    Track internal scalar datatypes.
    
    Details:
    - Added a num_t datatype bitfield to the obj_t in the form of a new
      info2 field in the obj_t. This change was made primarily so that in
      the case of mixed-datatype gemm, the alpha scalar would not need to
      be cast to the storage datatype of B (or A) before then being cast to
      the computation datatype just before the macrokernel is called. This
      double-casting regime could result in loss of precision if the storage
      precision of B (or A) is lower than the computation precision. In
      practice, it was likely not going to be a big deal since most usage of
      alpha is for -1.0, 0.0, and 1.0 (or integer multiples thereof), which
      can all be represented exactly in single or double precision.
    - The type of objbits_t was changed to uint32_t, so the new format
      potentially takes up the same space as the previous obj_t definition,
      assuming no padding inserted by the compiler. Shrinking info to 32
      bits and spilling over into a second field was chosen over using the
      high 32 bits of a single 64-bit objbits_t info field because many of
      the bitwise operations are performed with enums such as num_t, dom_t,
      and prec_t, which may take on the type of 32-bit ints. It's easier to
      just keep all of those bitwise operations in 32 bits than perform a
      million typecasts throughout bli_type_defs.h and bli_obj_macro_defs.h
      to ensure that the integers are treated as 64-bit for the purposes of
      the ANDs, ORs, and bitshifts.
    - Many comment updates.
    - Thanks to Devin Matthews and Devangi Parikh for their feedback and
      involvement during this commit cycle.
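
    The precision loss described in the first item can be reproduced with
    a minimal sketch. The helper names below are hypothetical and are not
    BLIS functions; the sketch assumes a single-precision storage type and
    a double-precision computation type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: cast alpha to a single-precision "storage" type
   and then back to the double-precision "computation" type, mimicking
   the double cast that the commit above avoids. */
double ex_cast_via_float(double alpha)
{
    float stored = (float)alpha;  /* first cast: storage precision */
    return (double)stored;        /* second cast: computation precision */
}

/* Returns true if alpha survives the round trip unchanged. */
bool ex_survives_double_cast(double alpha)
{
    return ex_cast_via_float(alpha) == alpha;
}

/* Self-check: small integers are exact in both precisions, 0.1 is not. */
void ex_demo_double_cast(void)
{
    assert(ex_survives_double_cast(-1.0));
    assert(ex_survives_double_cast(0.0));
    assert(ex_survives_double_cast(1.0));
    assert(!ex_survives_double_cast(0.1));
}
```

    Values such as -1.0, 0.0, and 1.0 pass through unchanged, which is why
    the commit message expects the issue to matter rarely in practice.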

commit e769bf46b0931d68031af212110484ec98e16908
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 20 16:16:53 2018 -0600

    Tweak testsuite to issue FAIL for NaN, Inf (#279).
    
    Details:
    - Adjusted the definition for libblis_test_get_string_for_result() in
      testsuite/src/test_libblis.c so that the "FAIL" string is returned if
      the computed residual contains either NaN or Inf. Previously, a
      residual containing NaN would result in the selection of the "PASS"
      string. Thanks to Devin Matthews for reporting this issue (#279).
    - Expounded on comment for the macro definitions of bli_isnan() and
      bli_isinf() in bli_misc_macro_defs.h to make it more obvious why they
      must remain macros.
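
    One plausible way a NaN residual can end up labeled "PASS" is sketched
    below. The function names and the single-threshold logic are
    assumptions made for illustration; the actual code in
    libblis_test_get_string_for_result() may differ.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* Hypothetical classifier with one threshold: residuals above
   thresh_pass are failures. */
const char *ex_result_naive(double resid, double thresh_pass)
{
    /* Every ordered comparison involving NaN is false, so a NaN residual
       never satisfies (resid > thresh_pass) and falls through to PASS. */
    if (resid > thresh_pass) return "FAIL";
    return "PASS";
}

/* Same classifier, but non-finite residuals are rejected first. */
const char *ex_result_checked(double resid, double thresh_pass)
{
    if (isnan(resid) || isinf(resid)) return "FAIL";
    if (resid > thresh_pass) return "FAIL";
    return "PASS";
}

/* Self-check of the two behaviors. */
void ex_demo_result_strings(void)
{
    assert(strcmp(ex_result_naive(NAN, 1e-10), "PASS") == 0);
    assert(strcmp(ex_result_checked(NAN, 1e-10), "FAIL") == 0);
}
```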

commit 279deae18fb8b8106161863b46fcb38232314de4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 16 11:34:19 2018 -0600

    Added 4x5 matlab plotting scripts to test/3m4m.
    
    Details:
    - Added a new directory, test/3m4m/matlab, containing matlab scripts for
      plotting 4x5 panels of performance graphs (using the subplot()
      function) for gemm, hemm, herk, trmm, and trsm across all four
      floating-point datatypes. I expect to further refine these scripts as
      time goes on, but their current state constitutes a good start.

commit 7b02c726650336c12286c8ba166d1d0fdf7601a8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 14 13:49:55 2018 -0600

    CREDITS file update.

commit 84dd298a27033945fa2d3b6e5dce1fe625cd2a0a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 14 13:47:45 2018 -0600

    Patch to fix msys2/Windows build failure (#277).
    
    Details:
    - Expanded cpp guard in frame/include/bli_x86_asm_macros.h to also check
      __MINGW32__ in addition to _WIN32, __clang__, and __MIC__. Thanks to
      Isuru Fernando for suggesting this fix, and also to Costas Yamin for
      originally reporting the issue (#277).

commit 8091998b6500e343c2024561c2b1aa73c3bafb0b
Merge: 333d8562 7b5ba731
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 14 12:36:35 2018 -0600

    Merge branch 'master' into amd

commit 7b5ba7319b3901ad0e6c6b4fa3c1d96b579efbe9
Merge: ce719f81 52392932
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 14 12:32:01 2018 -0600

    Merge branch 'dev' of github.com:flame/blis into dev

commit 52392932dc1ea3c16220cc4e6978efcb2f5f0616
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 13 22:23:38 2018 +0000

    Minor fixes to test/3m4m drivers.
    
    Details:
    - Cleanups to Makefile to allow all test drivers to be built for
      OpenBLAS and MKL in addition to BLIS.
    - Fixed copy-paste typos in test_hemm in calls to ssymm_() and dsymm_().
    - Fixed incorrect types for betap in BLAS cpp macro branch of
      test_herk.c.

commit 4f12e36a0d0e6df146314b4e50e36c5e7a1af3d3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 13 14:23:12 2018 -0600

    Fixed number of columns in first output line.
    
    Details:
    - In previous commit, forgot to remove output column corresponding to
      the k dimension.

commit a2e0cdd7debf8109198536d55af05d5631072fb2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 13 14:15:11 2018 -0600

    Added hemm test driver to test/3m4m.
    
    Details:
    - Added a new test_hemm.c test driver to test/3m4m, which was modeled
      after the similarly named driver in test. Also updated Makefile
      so that blis-nat-[sm]t would trigger builds for the new driver.

commit 0f9b53e84b48d8d73a56cc9889eae3595ca58a78
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 13 13:03:15 2018 -0600

    Fixed a bug in high-level mixeddt conditional.
    
    Details:
    - Fixed a bug in frame/3/bli_l3_oapi.c in the conditional that divides
      use of induced method (1m) execution from native execution. The former
      was intended to only be used in cases where all storage datatypes are
      complex and the datatype of C is equal to the computation datatype.
      (If mixed datatypes are detected, native execution would be used.)
      However, the code in bli_gemm() was erroneously checking the execution
      datatype instead of the computation datatype, which at that point is
      guaranteed to be equal to the storage datatype even if the computation
      datatype contains a different value. Thanks to Devangi Parikh for
      helping in isolating this bug.

commit 333d8562f04eea0676139a10cb80a97f107b45b0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Nov 11 14:28:53 2018 -0600

    Added debug output to bli_malloc.c.
    
    Details:
    - Added debug output to bli_malloc.c in order to debug certain kinds of
      memory behavior in BLIS. The printf() statements are disabled and must
      be enabled manually.
    - Whitespace/comment updates in bli_membrk.c.

commit ce719f816d1237f5277527d7f61123e77180be54
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Nov 10 14:48:43 2018 -0600

    More edits to mixeddt matlab scripts.
    
    Details:
    - Renamed scripts in test/mixeddt/matlab:
        plot_case_all.m -> plot_dom_all.m
        plot_case_md.m  -> plot_dom_case.m
        plot_all_md.m   -> plot_dt_all.m
    - Added plot_dt_select.m in order to plot select graphs for the main
      body of the mixeddt paper, and added additional related legend
      handling in plot_gemm_perf.m.
    - Added test/mixeddt/matlab/output and a .gitkeep file within in order
      to force git to recognize the directory.

commit bf99e7c14baf45725b698d06ad043b531e3a2763
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 8 18:47:17 2018 -0600

    Minor updates to test/mixeddt driver.
    
    Details:
    - Cleaned up test/mixeddt Makefile in preparation for gathering new
      data for mixeddt paper, including renaming implementations to
      "internal" and "ad-hoc" to match the terminology to be used in the
      paper.
    - Added new matlab scripts for generating 8 figures, each covering all
      mixed-precision cases for each mixed-domain case.
    - Updated the runme.sh script according to changes to Makefile.
    - Fixed a minor bug in test_gemm.c that may have given incorrect
      performance in complex, homogeneous storage datatype cases where
      the computation precision was equal to the storage precisions.
      (Examples: zzzd, cccs.)

commit 4bbb454bf3c361af9e97bfa394a73d610cd9002a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Nov 3 19:11:01 2018 -0500

    Testsuite docs update for mixed-datatype gemm.
    
    Details:
    - Updated docs/Testsuite.md to include mention of the new mixed-domain
      and mixed-precision settings, including descriptions.
    - Updated docs/MixedDatatypes.md to include a brief section on running
      the testsuite to exercise mixed-datatype functionality, which mostly
      amounts to a link to the Testsuite.md document.
    - Minor verbiage change to testsuite output to correct a misleading
      label associated with the value returned by the query function
      bli_info_get_simd_num_registers(). (The function does not return the
      number of SIMD registers present in the hardware, but rather a maximum
      assumed value for the purposes of allocating temporary microtile
      workspace on the function stack.)

commit 16401ae922b1285437cf5f6867b2764650a95fb0
Merge: f19c33af 2d403a15
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Nov 3 19:09:43 2018 -0500

    Merge branch 'dev'

commit 2d403a1535380a2ebe2ae2c0f5ac54ba7564fbeb
Merge: e90e7f30 4a12979f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 1 20:18:53 2018 -0500

    Merge pull request #275 from RhysU/patch-1
    
    Spelling in FAQ

commit 4a12979f65697ed79ba290efd59f4b994ac9429b
Author: Rhys Ulerich <rhys.ulerich@gmail.com>
Date:   Thu Nov 1 20:20:59 2018 -0400

    Spelling in FAQ

commit f19c33af4cbe6f5705b96fbf2b8799c3c2bd75c3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 26 17:07:15 2018 -0500

    Disallow 64b BLAS integers + 32b BLIS integers.
    
    Details:
    - Print an error message from configure if the user attempts to
      explicitly configure BLIS for simultaneous use of 64-bit integers in
      the BLAS API with 32-bit integers in the BLIS API.
    - Added cpp macro conditional to bli_type_defs.h to mandate that BLIS
      integers be 64 bits if the BLAS integers are 64 bits. This and the
      above item take care of issue #274. Thanks to Devin Matthews and
      Jeff Hammond for suggesting these safeguards.
    - Slight reorganization and relabeling (for clarity) of BLAS/CBLAS
      sections and BLIS integer size line of the testsuite configuration
      output.
    - Very minor edits to docs/MixedDatatypes.md.
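
    A rule of this kind can be expressed as a preprocessor guard. The
    sketch below is one possible form, not the code added to
    bli_type_defs.h; the EX_ macro names are hypothetical placeholders for
    whatever configure defines.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical macros; a real build would receive these from configure. */
#ifndef EX_BLAS_INT_BITS
#define EX_BLAS_INT_BITS 64
#endif
#ifndef EX_BLIS_INT_BITS
#define EX_BLIS_INT_BITS 64
#endif

/* Compile-time guard: reject 64-bit BLAS integers paired with narrower
   BLIS integers. */
#if EX_BLAS_INT_BITS == 64 && EX_BLIS_INT_BITS != 64
#error "64-bit BLAS integers require 64-bit BLIS integers."
#endif

/* The same rule as a runtime predicate, for example for a
   configure-style check. Sizes are given in bits. */
bool ex_int_sizes_ok(int blas_bits, int blis_bits)
{
    return !(blas_bits == 64 && blis_bits != 64);
}

/* Self-check: only the 64-bit BLAS / 32-bit BLIS pairing is rejected. */
void ex_demo_int_sizes(void)
{
    assert(ex_int_sizes_ok(32, 32));
    assert(!ex_int_sizes_ok(64, 32));
}
```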

commit e90e7f309b3f2760a01e8e09a29bf702754fa2b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 25 14:09:43 2018 -0500

    CHANGELOG update (0.5.0)

commit be7c57819cfd48adb175d9a480cc9f37928645c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 25 14:09:40 2018 -0500

    Version file update (0.5.0)

commit 75da7f2a208ad7d26ed9c6d3e10d08b2a1caf9d6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 25 14:02:41 2018 -0500

    ReleaseNotes.md update in advance of next version.
    
    Details:
    - Updated ReleaseNotes.md in preparation for next version.
    - Updated docs/FAQ.md to reflect recent developments, and other edits.
    - Minor updates to RELEASING.

commit 6fbc456fb3f4401ec951a618990f15a84fdfa236
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 25 13:20:25 2018 -0500

    Added SALT testing to Travis CI.
    
    Details:
    - Modified .travis.yml to automatically employ the simulation of
      application-level threading within the testsuite, with supporting
      changes to common.mk, the top-level Makefile, and
      travis/do_testsuite.sh.
    - Added a new pair of input files to testsuite directory with the
      '.salt' suffix (similar to those with the '.fast' suffix) for
      testing application-level threading.
    - Updated docs/BuildSystem.md to document the new make targets
      'testblis-salt' and 'checkblis-salt'.

commit 0e27963a6770e6b64f3299ad0613d5df45d8b6ae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 24 12:16:19 2018 -0500

    Add bli_pthread_mutex_trylock().
    
    Details:
    - Added the missing bli_pthread_mutex_trylock() function and prototype
      to the non-Windows sections of bli_pthread.c and .h. This function
      isn't needed by BLIS, but I figured why not make the Windows and
      non-Windows sections consistent with one another.

commit 4b683740c12f83804a51ec610b16ce28607d5c85
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 24 11:56:16 2018 -0500

    Defined bli_pthread_cond_*() and related defs.
    
    Details:
    - Added function definitions for bli_pthread_cond_*() as well as related
      types and constants to bli_pthread.c, and corresponding prototypes to
      bli_pthread.h.

commit 4b4f8072b9bb495b3e01d45698b0bad3dac31ba8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 24 11:31:46 2018 -0500

    Define bli_pthreads barrier types on OS X.
    
    Details:
    - Fully define bli_pthreads barrier-related types on OS X. Only typedef
      those types in terms of pthreads types on non-Windows, non-Apple OSes
      (i.e. Linux).

commit ad98790dcef6bd9aab7f13d615b987b5daa58757
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Oct 23 20:35:05 2018 -0500

    Fix names of Windows pthread initializer macros.
    
    Details:
    - Renamed the PTHREAD_ initializer macros in the Windows cpp case to use
      BLIS_ prefixes to match their non-Windows counterparts.

commit 06c23954e6b17219a50c3d37821544a46defaf89
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Oct 23 19:16:54 2018 -0500

    Defined unified bli_pthreads_*() API for all OSes.
    
    Details:
    - Expanded the bli_pthread_*() -> pthread_*() wrappers in
      frame/thread/bli_pthread.c to include cases for Windows taken from
      frame/base/bli_pthread_wrap.c. Now, bli_pthread_*() is always defined
      and always used by BLIS and the BLIS testsuite (in lieu of calling
      pthreads directly, as before). The implementation used in this new
      API depends on whether we are building for Windows, and to a lesser
      extent, whether we are building on OS X. For the core API, Windows
      uses Windows threads, non-Windows (Linux, OS X) uses pthreads.
      OS X and Windows get barriers implemented in terms of other
      bli_pthread_*() functions, and Linux gets barriers implemented in
      terms of pthread_barrier*(). This commit addresses issue #273.
    - Fixed a bug in the Linux definition of bli_pthread_mutex_unlock(),
      which was erroneously calling pthread_mutex_lock().
    - Minor changes to configure so that the auto-detection executable
      can be built given the above changes (most notably, turning on
      POSIX extensions via -D_GNU_SOURCE).
    - Removed temporary play-test code for shiftd that accidentally got
      committed into test/3m4m/test_gemm.c.

commit 0ae9585da1e3db1cf8034d4b16305a5883beb0d3
Author: pradeeptrgit <pradeep.rao@amd.com>
Date:   Tue Oct 23 09:36:23 2018 +0530

    Update version number to 1.2
    
    Change-Id: Ibb31f6683cdecca6b218bc2f0c14701d7e92ebf3

commit eac7d267a017d646a2c5b4fa565f4637ebfd9da7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 22 18:10:59 2018 -0500

    Unconditionally define bli_l3_thread_entry().
    
    Details:
    - Define a dummy bli_l3_thread_entry() function when multithreading is
      disabled altogether, or enabled via OpenMP. Originally, this function
      was needed only when multithreading was enabled via pthreads.
      By defining the function no matter the threading options given, it is
      less likely that an AppVeyor Windows build will complain due to a
      missing symbol in the DLL. (To be clear: AppVeyor was working fine
      before, but a problem may have arisen if it were switched to an
      OpenMP build.)
    - Removed the prototype for bli_l3_thread_entry() from
      bli_thrcomm_pthreads.c and placed it in bli_thrcomm.h.
    - Regenerated the symbols list file build/libblis-symbols.def.

commit 4ee986f0a74207f4ca29df077929134725d62b80
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 22 14:09:44 2018 -0500

    Added mixed-datatype testing to Travis CI (#271).
    
    Details:
    - Modified .travis.yml to automatically test the mixed-datatype support
      of the gemm operation, with supporting changes to common.mk, the
      top-level Makefile, and travis/do_testsuite.sh.
    - Added a new pair of input files to testsuite directory with the
      '.mixed' suffix (similar to those with the '.fast' suffix) for testing
      mixed-datatype gemm.
    - Updated docs/BuildSystem.md to document the new make targets
      'testblis-md' and 'checkblis-md'.

commit c3c6ebc9c6244053d654a9b0c955acb2fef42ee8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Oct 21 18:48:54 2018 -0500

    Fixed thrinfo_t printing for small problems.
    
    Details:
    - Fixed a bug in the code that prints out the communicator and work ids
      from the various threads' thrinfo_t nodes. This bug manifested when
      the dimension being parallelized was not large enough such that every
      thread was assigned actual work (since the minimum amount of work is
      determined by the register blocksize in the dimension being
      parallelized). In those cases, the threads that receive no work in
      that dimension do not finish building their thrinfo_t tree, leaving
      lower-level nodes non-existent. (The bug itself was usually observed as
      a segfault when the printing code attempted to dereference all the way
      down the thrinfo_t tree.) The solution involves explicitly checking
      each node as it is dereferenced, and if at any time NULL is found, all
      subsequent communicator and work ids are set to -1.
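
    The NULL-checked traversal described above can be sketched as follows.
    The ex_node type is a hypothetical stand-in for a thrinfo_t node with
    only the fields needed here; it is not the BLIS definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a thrinfo_t node. */
typedef struct ex_node
{
    int             comm_id;
    int             work_id;
    struct ex_node *sub_node;  /* NULL if the subtree was never built */
} ex_node;

/* Walks n_levels levels down from root and records the ids found at each
   level. Once a NULL node is reached, that level and all deeper levels
   are reported as -1 instead of being dereferenced. */
void ex_collect_ids(const ex_node *root, int n_levels,
                    int *comm_ids, int *work_ids)
{
    assert(n_levels >= 0);

    const ex_node *cur = root;
    for (int i = 0; i < n_levels; ++i)
    {
        if (cur != NULL)
        {
            comm_ids[i] = cur->comm_id;
            work_ids[i] = cur->work_id;
            cur = cur->sub_node;
        }
        else
        {
            comm_ids[i] = -1;
            work_ids[i] = -1;
        }
    }
}
```

    A thread that received no work would present a truncated tree, and the
    remaining levels would simply print as -1.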

commit 73a222c0d99dcc221be7dea10eaebf844f31f72e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Oct 20 14:13:04 2018 -0500

    Minor edits to 'configure --help' text.

commit 14f3d5e6df183819a0c393b2661ad15df0786544
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 19 20:39:35 2018 -0500

    Refresh libblis-symbols.def post-merge 090e4f0.

commit 090e4f08fc2f429a1b2db77b0a6f8276f892a7ac
Merge: c9be5889 0854e880
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 19 18:41:10 2018 -0500

    Merge branch 'master' into dev

commit 0854e880b0848e0c2e3d0644c93c80b0fd13c0dc
Merge: 4e38a8d4 343a2715
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 19 18:05:00 2018 -0500

    Merge pull request #261 from flame/win-pthreads
    
    Implement missing pthreads function on Windows

commit c9be5889fbe947c64ef75740662e4d63032f4c35
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 19 17:42:40 2018 -0500

    Added "Known issues" section to Multithreading.md.
    
    Details:
    - Added known issues section to Multithreading.md.
    - Trivial changes to MixedDatatypes.md, Sandboxes.md.

commit 343a2715ebee28d250ee41b914abdcd1dc77c344
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 19 16:59:19 2018 -0500

    Whitespace changes to configure, bli_pthread_wrap.
    
    Details:
    - Mostly whitespace changes (spaces to tabs) to configure and
      bli_pthread_wrap.c and .h.

commit 3678a1cd518df9447b4b1ea86885eb2ba8abcf6e
Merge: 85397cd4 4e38a8d4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 19 16:11:31 2018 -0500

    Merge branch 'master' into win-pthreads

commit 4e38a8d4eebb18ead74e644fac76a4fde8e7f6c6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 19 15:54:15 2018 -0500

    Implemented python version checking in configure.
    
    Details:
    - Added python version checking to configure script. (Recall that python
      is needed to execute the flatten-headers.py script.) Minimum versions
      of python needed are currently as follows:
        python2: 2.7 or later
        python3: 3.5 or later
      The standard search order for python interpreters is:
        python python3 python2
      The PYTHON environment variable is also supported and will be checked
      before the standard search order list.
    - Updated BuildSystem.md to include: a minimum make version; mention
      that the C compiler must actually be a C99 compiler; and the caveat
      that Windows builds do not require pthreads since BLIS can provide
      an implementation of pthreads internally.

commit 85397cd4fa52f6c4c33f4fb715478c55533c680e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 19 13:12:43 2018 -0500

    Added explanatory comment to bli_pthread.c.
    
    Details:
    - Added a verbose comment to bli_pthread.c that explains why a bli_
      wrapper to pthreads APIs is useful.

commit 53c07035ef61cc9b8469636d4d8fa5085f37652d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 19 12:53:03 2018 -0500

    Refresh libblis-symbols.def from bb6df28.
    
    Details:
    - Forgot to regenerate the symbols file after the previous commit
      (bb6df281) in which shiftd operation was introduced.

commit 473ce54f5fbea4860ac0514e7e8b022c1ea03e63
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 18 19:03:56 2018 -0500

    Added bli_pthread_*() API.
    
    Details:
    - Defined a bli_pthread_*() API so that the testsuite, when being linked
      against a Windows DLL, will be able to access pthreads functionality
      without those pthreads functions being explicitly exported by the DLL.
      Instead, we export the bli_pthread_*() layer, which uses types and
      functions that are identical to pthreads, but adds a 'bli_' prefix.
      Only a few basic functions are present in the bli_pthreads_*() API
      for now. Thanks to Devin Matthews and Isuru Fernando for their help
      on a related PR (#261) that this commit will hopefully facilitate.
    - Updated testsuite so that it calls bli_pthread_*() layer instead of
      pthread_*() functions directly.
    - Regenerated build/libblis-symbols.def.
    - Comment update to build/regen-symbols.sh.

commit bb6df2814fcaa2fa62a549379f61be2f8667a598
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 18 17:11:39 2018 -0500

    Defined a new level-1d operation: shiftd.
    
    Details:
    - Defined a new level-1d operation called 'shiftd', including object and
      typed APIs. This operation adds a scalar value to every element along
      an arbitrary diagonal of a matrix. Currently, shiftd is implemented in
      terms of the addv kernel. (The scalar is passed in as the x vector
      with an increment of zero.)
    - Replaced ad-hoc usage of setd and addd (after creating a temporary
      matrix object) with use of shiftd, which is much more concise, in
      various test driver files in the testsuite. Similar changes were made
      to the standalone test drivers and the example code.
    - Added documentation entries in BLISObjectAPI.md and BLISTypedAPI.md
      for bli_shiftd() and bli_?shiftd(), respectively.
    - Added observed object properties to level-1d documentation in
      BLISObjectAPI.md.
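
    The idea of implementing shiftd through an addv-style kernel with a
    zero increment on x can be sketched as below. The ex_ functions are
    hypothetical, operate on a column-major double matrix, and assume the
    convention diagoff = j - i (0 is the main diagonal, positive values
    lie above it); they are not the BLIS typed API.

```c
#include <assert.h>
#include <stddef.h>

/* y := y + x over n elements with the given strides. */
void ex_addv(ptrdiff_t n, const double *x, ptrdiff_t incx,
             double *y, ptrdiff_t incy)
{
    for (ptrdiff_t i = 0; i < n; ++i)
        y[i * incy] += x[i * incx];
}

/* Adds alpha to every element on diagonal diagoff of an m-by-n
   column-major matrix a with leading dimension lda. */
void ex_shiftd(ptrdiff_t diagoff, ptrdiff_t m, ptrdiff_t n,
               double alpha, double *a, ptrdiff_t lda)
{
    assert(m >= 0 && n >= 0 && lda >= m);

    /* Row and column of the first element on the requested diagonal. */
    ptrdiff_t i0 = diagoff < 0 ? -diagoff : 0;
    ptrdiff_t j0 = diagoff > 0 ?  diagoff : 0;

    /* Nothing to do if the diagonal lies outside the matrix. */
    if (i0 >= m || j0 >= n) return;

    ptrdiff_t len = (m - i0 < n - j0) ? (m - i0) : (n - j0);

    /* incx = 0 reuses the single scalar for every element; moving one
       row down and one column right is a stride of lda + 1. */
    ex_addv(len, &alpha, 0, a + i0 + j0 * lda, lda + 1);
}
```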

commit 53e0a0c9b38e8525c7224e280342ef56328af567
Merge: 1c7247b6 ec676799
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 18 14:54:59 2018 -0500

    Merge branch 'master' into win-pthreads

commit ec67679990660a60362a49406595383672812287
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 18 14:27:02 2018 -0500

    Refreshed Windows symbol list; added regen script.
    
    Details:
    - Moved windows/build/libblis-symbols.def to build/libblis-symbols.def.
      Updated link commands in common.mk accordingly.
    - Added a new script build/regen-symbols.sh that will regenerate the
      libblis-symbols.def file in its new location after building a
      haswell-targeted shared library. Thanks to Isuru Fernando for
      providing the symbol generation command.
    - Ran the new script to refresh the symbols file.

commit fdad54ab8eee4a7efd04ec4afb3e6902eb22e60a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 18 12:43:22 2018 -0500

    Removed old symbol from libblis-symbols.def.
    
    Details:
    - Removed bli_gemm_ker_var1() from windows/build/libblis-symbols.def
      since this function is no longer compiled.

commit 49d3f9fcbb4a75553439f97c099ea48d85763eea
Merge: 779d64dc 3c527256
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 17 18:00:40 2018 -0500

    Merge branch 'master' into dev

commit 3c52725693d0d7726e1c8fb224f9b1ef786db8b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 17 14:56:22 2018 -0500

    Renamed/moved l3 zen ukernels to haswell kernel set.
    
    Details:
    - Renamed the microkernels in kernels/zen/3 to kernels/haswell/3 and
      then updated the file contents to use the 'haswell' infix.
    - Updated bli_cntx_init_zen.c and bli_cntx_init_haswell.c according to
      above function renames.
    - Moved/updated the corresponding prototypes in bli_kernels_zen.h to
      bli_kernels_haswell.h.
    - Updated config_registry according to above changes.
    - NOTE: This rename reflects the fact that haswell microkernels are
      specifically written to overcome the floating-point latency for FMA
      instructions on Intel Haswell-like architectures, which can issue two
      FMA instructions per cycle. These ukernels happen to work fine on AMD
      Zen-based architectures. However, Zen only issues one FMA per cycle,
      which, while halving its floating-point throughput, gives it extra
      flexibility in the design of its microkernels--namely, mr and nr can
      be smaller and still overcome the floating-point latency for those
      single-issue cores. A smaller value of mr and nr allows for a larger
      value of kc, which may be useful in some situations. In the future,
      we may write such Zen-specific microkernels to take advantage of this
      additional flexibility.
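
    The latency argument in the note can be made concrete with a small
    calculation. The sketch below assumes that hiding FMA latency requires
    at least (latency x issue rate) independent accumulator registers; the
    helper names and the figures used in the self-check (a 5-cycle latency
    and 4 doubles per vector register) are illustrative assumptions, not
    values taken from the commit.

```c
#include <assert.h>

/* Minimum number of independent vector accumulators needed so that a
   new FMA can issue every cycle on every FMA pipe. */
int ex_min_accumulators(int fma_latency_cycles, int fmas_per_cycle)
{
    assert(fma_latency_cycles > 0 && fmas_per_cycle > 0);
    return fma_latency_cycles * fmas_per_cycle;
}

/* Minimum mr*nr, in scalar elements, implied by that accumulator count. */
int ex_min_microtile_elems(int fma_latency_cycles, int fmas_per_cycle,
                           int elems_per_vector)
{
    assert(elems_per_vector > 0);
    return ex_min_accumulators(fma_latency_cycles, fmas_per_cycle)
           * elems_per_vector;
}

/* Self-check with the illustrative figures: halving the issue rate
   halves the required microtile size. */
void ex_demo_microtile(void)
{
    assert(ex_min_microtile_elems(5, 2, 4) == 40);  /* dual-issue   */
    assert(ex_min_microtile_elems(5, 1, 4) == 20);  /* single-issue */
}
```

    Under these assumptions a single-issue core needs half as many
    accumulators, which is the flexibility in mr and nr that the note
    refers to.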

commit 71c5832d5f5596f25204980803423d08143a4010
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 17 14:11:01 2018 -0500

    Consolidated slab/rr-explicit level-3 macrokernels.
    
    Details:
    - Consolidated the *sl.c and *rr.c level-3 macrokernels into a single
      file per sl/rr pair, with those files named as they were before
      c92762e. The consolidation does not take away the *option* of using
      slab or round-robin assignment of micropanels to threads; it merely
      *hides* the choice within the definitions of functions such as
      bli_thread_range_jrir(), bli_packm_my_iter(), and bli_is_last_iter()
      rather than expose that choice explicitly in the code. The choice of
      slab or rr is not always hidden, however; there are some cases
      involving herk and trmm, for example, that require some part of the
      computation to use rr unconditionally. (The --thread-part-jrir option
      controls the partitioning in all other cases.)
    - Note: Originally, the sl and rr macrokernels were separated out for
      clarity. However, aside from the additional binary code bloat, I later
      deemed that clarity not worth the price of maintaining the additional
      (mostly similar) codes.
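
    The way a range function can hide the slab/round-robin choice from
    the loop that consumes it is sketched below. The ex_ functions are
    hypothetical simplifications; bli_thread_range_jrir() and related
    functions may partition differently.

```c
#include <assert.h>

/* Slab: thread tid of nt receives one contiguous block of iterations. */
void ex_range_slab(int tid, int nt, int n_iter,
                   int *start, int *end, int *inc)
{
    assert(nt > 0 && tid >= 0 && tid < nt && n_iter >= 0);

    int chunk = n_iter / nt;
    int rem   = n_iter % nt;

    /* The first rem threads each take one extra iteration. */
    *start = tid * chunk + (tid < rem ? tid : rem);
    *end   = *start + chunk + (tid < rem ? 1 : 0);
    *inc   = 1;
}

/* Round-robin: thread tid takes iterations tid, tid + nt, tid + 2*nt... */
void ex_range_rr(int tid, int nt, int n_iter,
                 int *start, int *end, int *inc)
{
    assert(nt > 0 && tid >= 0 && tid < nt && n_iter >= 0);

    *start = tid;
    *end   = n_iter;
    *inc   = nt;
}

/* The consuming loop is identical for both schemes. Here it only records
   which iterations the thread would execute and returns their count. */
int ex_collect_iters(int start, int end, int inc, int *out)
{
    int count = 0;
    for (int i = start; i < end; i += inc)
        out[count++] = i;
    return count;
}
```

    Because the loop only sees (start, end, inc), a single macrokernel
    source file can serve both assignment schemes.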

commit 57eab3a4f0e43099fc2ff189df9fcc0d7801c2cd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 17 11:29:20 2018 -0500

    CREDITS file update.

commit 6722ec21817cbab9d86ee63f00984eb407b5e627
Author: Ye Luo <xw111luoye@gmail.com>
Date:   Wed Oct 17 11:26:00 2018 -0500

    Fix bgclang compilation on BGQ (#270)
    
    * Fix bgq kernels
    
    * Support bgq with bgclang

commit 1c7247b6d146fc728d7c4240e4e069e33f8f8868
Merge: c1bc5530 6c5a1aaf
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Oct 16 14:44:32 2018 -0500

    Merge branch 'win-pthreads' of github.com:flame/blis into win-pthreads

commit c1bc5530d51bf55b4aa3c35165f6d4452a0fd779
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Oct 16 14:44:10 2018 -0500

    Don't call pthread_once in auto-detect.

commit b9c61d03f542a2e92551ff0595415bec3076ab25
Merge: 5a1e461f 3612ecac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Oct 16 14:39:57 2018 -0500

    Merge branch 'nested-omp-patch'

commit 5a1e461ffe09ed200ee2fc7aafccf6dd7e8c0080
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Oct 16 14:21:45 2018 -0500

    Execute flatten-headers.py via $(PYTHON).
    
    Details:
    - Execute build/flatten-headers.py python script via $(PYTHON) in
      common.mk. This allows distributions that define the current/preferred
      python interpreter in the PYTHON environment variable to use that
      interpreter when executing flatten-headers.py. Thanks to Isuru
      Fernando for this suggestion, and to Dave Love for submitting the
      initial issue/request.

commit 6c5a1aaff540b19672e91501e894ed695aee322b
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Oct 16 10:15:59 2018 -0500

    Fix type in bli_pthread_wrap.c

commit 29e6245816760b1bd4ac738d7d3e11a9d9d13473
Merge: 0b73209f ed657714
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Oct 16 10:12:25 2018 -0500

    Merge branch 'master' into win-pthreads

commit 0b73209f6b22cc024169146d343627f6999b63d8
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Oct 16 10:02:06 2018 -0500

    Add missing argument to WaitForSingleObject and use $is_win in configure
    to turn off pthreads.

commit ed65771482a705f7ed028d822489766327b44e76
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 15 17:54:45 2018 -0500

    Fixed merge fail on testsuite threading macros.
    
    Details:
    - Applied the following C preprocessor macro renames
    
        BLIS_DEFAULT_MR_THREAD_MAX  -> BLIS_THREAD_MAX_IR
        BLIS_DEFAULT_NR_THREAD_MAX  -> BLIS_THREAD_MAX_JR
        BLIS_DEFAULT_M_THREAD_RATIO -> BLIS_THREAD_RATIO_M
        BLIS_DEFAULT_N_THREAD_RATIO -> BLIS_THREAD_RATIO_N
    
      in src/test_libblis.c. This is apparently the result of a failure by
      git to properly merge the 'master' and 'amd' branches in the previous
      commit. (The 'master' branch contained a commit, 53a9ab1, in which
      these same cpp macros were renamed throughout the source distribution.)

commit dc5fd898af8c74c2e2a75fc647157da0d04dd922
Merge: 667d3929 637c2ce7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 15 17:41:35 2018 -0500

    Merge branch 'amd'

commit 779d64dc3091dea6b7530283304e52878151d218
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 15 17:13:18 2018 -0500

    Added entry for xpbym to input.operations.fast.
    
    Details:
    - Forgot to add an entry for the new xpbym operation to
      input.operations.fast in previous commit.

commit 5fec95b99f61761963834f62a9867f797687813c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 15 16:37:39 2018 -0500

    Implemented mixed-datatype support for gemm.
    
    Details:
    - Implemented support for gemm where A, B, and C may have different
      storage datatypes, as well as a computational precision (and implied
      computation domain) that may be different from the storage precision
      of either A or B. This results in 128 different combinations, all of
      which are implemented within this commit. (For now, the mixed-datatype
      functionality is only supported via the object API.) If desired, the
      mixed-datatype support may be disabled at configure-time.
    - Added a memory-intensive optimization to certain mixed-datatype cases
      that requires a single m-by-n matrix be allocated (temporarily) per
      call to gemm. This optimization aims to avoid the overhead involved in
      repeatedly updating C with general stride, or updating C after a
      typecast from the computation precision. This memory optimization may
      be disabled at configure-time (provided that the mixed-datatype
      support is enabled in the first place).
    - Added support for testing mixed-datatype combinations to testsuite.
      The user may test gemm with mixed domains, precisions, both, or
      neither.
    - Added a standalone test driver directory for building and running
      mixed-datatype performance experiments.
    - Defined a new variation of castm, castnzm, which operates like castm
      except that imaginary values are not touched when casting a real
      operand to a complex operand. (By contrast, in these situations castm
      sets the imaginary components of the destination matrix to zero.)
    - Defined bli_obj_imag_is_zero() and substituted calls in lieu of all
      usages of bli_obj_imag_equals() that tested against BLIS_ZERO, and
      also simplified the implementation of bli_obj_imag_equals().
    - Fixed bad behavior from bli_obj_is_real() and bli_obj_is_complex()
      when given BLIS_CONSTANT objects.
    - Disabled dt_on_output field in auxinfo_t structure as well as all
      accessor functions. Also commented out all usage of accessor
      functions within macrokernels. (Typecasting in the microkernel is
      still feasible, though probably unrealistic for now given the
      additional complexity required.)
    - Use void function pointer type (instead of void*) for storing function
      pointers in bli_l0_fpa.c.
    - Added documentation for using gemm with mixed datatypes in
      docs/MixedDatatypes.md and example code in examples/oapi/11gemm_md.c.
    - Defined level-1d operation xpbyd and level-1m operation xpbym.
    - Added xpbym test module to testsuite.
    - Updated frame/include/bli_x86_asm_macros.h with additional macros
      (courtesy of Devin Matthews).
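
    Illustration (not part of the commit): the castm/castnzm distinction
    described above can be sketched on plain arrays. The type cplx_t and the
    functions cast_real_to_complex() and castnz_real_to_complex() are
    hypothetical stand-ins for this sketch, not BLIS APIs.

```c
#include <stddef.h>

/* Hypothetical complex type for illustration; not a BLIS type. */
typedef struct { double real; double imag; } cplx_t;

/* castm-like behavior: copy the real parts and zero the imaginary parts. */
void cast_real_to_complex( size_t n, const float* src, cplx_t* dst )
{
    for ( size_t i = 0; i < n; ++i )
    {
        dst[i].real = ( double )src[i];
        dst[i].imag = 0.0;
    }
}

/* castnzm-like behavior: copy the real parts and leave the imaginary
   parts of the destination untouched. */
void castnz_real_to_complex( size_t n, const float* src, cplx_t* dst )
{
    for ( size_t i = 0; i < n; ++i )
        dst[i].real = ( double )src[i];
}
```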

commit 3612ecac98a9d36c3fcd64154121d420bb69febd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 11 15:16:41 2018 -0500

    Added comments to nested OpenMP handling code.
    
    Details:
    - Added comments to bli_thrcomm_openmp.c relating to changes made in
      6ac0c80 and 1064d79.

commit 667d3929ee20e94849b4e25b693b4037b7e3f350
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 11 11:47:57 2018 -0500

    Added Fortran APIs for some thread functions.
    
    Details:
    - Defined Fortran-77 compatible APIs for bli_thread_set_num_threads()
      and bli_thread_set_ways(). These wrappers are defined in
      frame/compat/blis/thread/b77_thread.c. Thanks to Kay Dewhurst for
      suggesting these new interfaces.
    - Added missing prototype for bli_thread_set_ways() in bli_thread.h and
      removed prototypes for non-existent functions bli_thread_set_*_nt().
    - CREDITS file update.

commit 1064d79711f03a0541b92d8b8b9b7e25e04097a5
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Oct 11 11:14:25 2018 -0500

    Adjust rntm_t struct as well.

commit 6ac0c805609b85616ddb32e50101c4f9feb25a35
Author: Devin Matthews <damatthews@smu.edu>
Date:   Thu Oct 11 10:45:07 2018 -0500

    Fix OMP nesting problem.
    
    Detect when OpenMP uses fewer threads than requested and correct accordingly, so that we don't wait forever for nonexistent threads. Fixes #267.

commit 78a6935483409ae277c766406e175772e820b1de
Author: sraut <Biplab.Raut@amd.com>
Date:   Thu Oct 11 10:49:40 2018 +0530

    Added comments for the syrk small matrix change.
    
    Change-Id: I958939e9953323730da49ef07d1b10e578837d82

commit 53a9ab1c85be14dcfd2560f5b16e898e3e258797
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 10 15:11:09 2018 -0500

    Renamed thread auto-factorization macro constants.
    
    Details:
    - Renamed the following C preprocessor macros whose fallback/default
      values are specified within frame/include/bli_kernel_macro_defs.h:
    
        BLIS_DEFAULT_MR_THREAD_MAX  -> BLIS_THREAD_MAX_IR
        BLIS_DEFAULT_NR_THREAD_MAX  -> BLIS_THREAD_MAX_JR
        BLIS_DEFAULT_M_THREAD_RATIO -> BLIS_THREAD_RATIO_M
        BLIS_DEFAULT_N_THREAD_RATIO -> BLIS_THREAD_RATIO_N
    
    - Renamed the above cpp macro overrides within the knl, skx, and zen
      sub-configurations, as well as invocations of those macros in
      bli_rntm.c.
    - Moved config/zen/bli_kernel.h to an 'old' directory as it is no longer
      used by any code within BLIS.

commit 637c2ce794b0414ba8b25e9a452f7d64f825d63a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Oct 9 17:18:04 2018 -0500

    Updated column index range for irun.py -q.
    
    Details:
    - Forgot to apply the column index range fix in 10f179f to situations
      when "quiet" mode (-q) is requested. This commit applies the new
      column index range modifications to the quiet case.

commit e2a59400bdda7ed7ee0ff00edea70c00ed593b6c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Oct 9 15:29:48 2018 -0500

    Allow trsm_l parallelism in the jc loop.
    
    Details:
    - Previously, trsm was consolidating all ways of parallelism into the jr
      loop. This was unnecessary and to some degree detrimental on some
      types of hardware. Now, any parallelism bound for the jc loop will be
      applied to the jc loop, while all other loops' parallelism is funneled
      to the jr loop. Thanks to Devangi Parikh for helping investigate this
      issue and suggesting the fix.
    - NOTE: This change affects only left-side trsm. However, right-side
      trsm is currently implemented in terms of the left-side case, and thus
      the change effectively applies to both left and right cases.
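
    Illustration (not part of the commit): a minimal sketch of the
    redistribution described above, assuming the requested ways of
    parallelism are held in a small struct. The type ways_t and the function
    trsm_adjust_ways() are hypothetical names; only the rule itself (jc is
    kept, everything else is funneled to jr) is taken from the text.

```c
/* Hypothetical container for the ways of parallelism of the five loops. */
typedef struct { int jc, pc, ic, jr, ir; } ways_t;

/* Keep the jc ways as requested and funnel all other ways into jr, so
   that the total number of ways is preserved. */
ways_t trsm_adjust_ways( ways_t req )
{
    ways_t out;

    out.jc = req.jc;
    out.pc = 1;
    out.ic = 1;
    out.jr = req.pc * req.ic * req.jr * req.ir;
    out.ir = 1;

    return out;
}
```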

commit f1dba506c970f14e612580d3c171e7c5ffd0a5fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 8 17:59:41 2018 -0500

    Output threading status/params from testsuite.
    
    Details:
    - Updated testsuite to output various parameters related to parallelism
      in BLIS. These parameters include:
      - threading status: disabled, openmp, or pthreads;
      - thread partitioning for jr/ir loops: slab or rr (round-robin);
      - ways of parallelism from environment variables, and also actual
        values used by gemm, herk, trmm_l, trmm_r, trsm_l, and trsm_r for
        square problems (assuming all dimensions are set to 1000);
      - automatic thread factorization parameters.
    - Also output the status of two relatively new configure-time options:
      libmemkind and the sandbox.

commit 10f179fb13fc1179921a4ef8efdd2174f01e07da
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 8 14:36:38 2018 -0500

    Updated irun.py to use updated column index range.
    
    Details:
    - Updated the irun.py script so that it updates the matlab column index
      range (if found) to reflect the additional columns of data that are
      substituted in. Thanks to Devangi Parikh for recognizing and reporting
      this issue.

commit c244a716c97849dee41f52b5f424116aae1b710b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Oct 7 20:59:40 2018 -0500

    Added missing -r option to configure --help output.
    
    Details:
    - Added inadvertently-omitted mention of -r option-equivalent to
      --thread-part-jrir to the output for 'configure --help'. Also made
      minor edits to the same text.

commit c92762ecdca1eb0b08c8acd583b4739a1e3fbd39
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Oct 7 20:30:32 2018 -0500

    Added option of slab or rr partitioning in jr/ir.
    
    Details:
    - Updated existing macrokernel function names and definitions to
      explicitly use slab assignment of micropanels to threads, then created
      duplicate versions of macrokernels that explicitly use round-robin
      assignment instead of slab. NOTE: As in ac18949, trsm_r macrokernels
      were not substantially updated in this commit because they are
      currently disabled in bli_trsm_front.c.
    - Updated existing packing function (in bli_packm_blk_var1.c) to
      explicitly use slab partitioning, and then duplicated for round-robin.
    - Updated control tree initialization to use the appropriate macrokernel
      and packm function pointers depending on which method (slab or rr) was
      enabled at configure-time.
    - Updated configure script to accept new --thread-part-jrir=[slab|rr]
      option (-r [slab|rr] for short), which allows the user to explicitly
      request either slab or round-robin assignment (partitioning) of
      micropanels to threads.
    - Updated sandbox/ref99 according to above changes.
    - Minor updates to build/add-copyright.py.
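
    Illustration (not part of the commit): the two assignment schemes named
    above can be contrasted by asking which thread owns iteration i. The
    functions rr_owner() and slab_owner() are hypothetical, and the way
    slab_owner() spreads leftover iterations (one extra to each of the first
    few threads) is an assumption of this sketch, not necessarily what the
    macrokernels do.

```c
/* Round-robin: iterations are dealt out to threads in an interleaved
   fashion (0, 1, ..., n_threads-1, 0, 1, ...). */
int rr_owner( int i, int n_threads )
{
    return i % n_threads;
}

/* Slab: each thread owns one contiguous chunk of iterations. The first
   (n_iter % n_threads) threads receive one extra iteration. */
int slab_owner( int i, int n_iter, int n_threads )
{
    int base   = n_iter / n_threads;
    int extra  = n_iter % n_threads;
    int cutoff = extra * ( base + 1 );

    /* Iterations below the cutoff belong to the larger chunks. When base
       is zero, every iteration falls into this branch. */
    if ( i < cutoff ) return i / ( base + 1 );

    return extra + ( i - cutoff ) / base;
}
```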

commit 98e01ea04bfe1032e5bd4781043afd84f864a19e
Merge: ac18949a 541b8a3b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 4 20:44:12 2018 -0500

    Merge branch 'master' into amd

commit 541b8a3b3e9af4078f5e6fb2f9608d681839952a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 4 20:39:06 2018 -0500

    Removed 1h short-circuit from bli_clock_min_diff().
    
    Details:
    - Removed a guard from bli_clock_min_diff() that would return 0 if the
      time delta was greater than 60 minutes. This was originally intended
      to disregard extremely large values under the assumption that the
      user probably didn't intend to run a test that long. However, since
      it is in bli_clock_min_diff(), it doesn't actually help short-circuit
      an implementation that is hanging or looping infinitely, since such
      an implementation would first have to finish before the
      bli_clock_min_diff() is called. Thanks to Kiran Varaganti for
      reporting this issue.
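
    Illustration (not part of the commit): a sketch of the guard that was
    removed. The function min_diff() is a hypothetical stand-in that takes
    the current time as an argument (so it can be exercised without a real
    clock) and a flag selecting the old behavior; it is not the actual
    bli_clock_min_diff() implementation.

```c
/* Returns the smaller of time_min and the newly measured delta. */
double min_diff( double time_min, double time_start, double time_now,
                 int one_hour_guard )
{
    double time_diff = time_now - time_start;

    /* Removed behavior: treat deltas above one hour as bogus and return 0,
       which a caller might later divide by. */
    if ( one_hour_guard && time_diff > 3600.0 ) return 0.0;

    /* Retained idea: discard readings that are non-positive or below a
       nanosecond, since the clock samples were taken too close together. */
    if ( time_diff <= 0.0 || time_diff < 1.0e-9 ) return time_min;

    return ( time_diff < time_min ) ? time_diff : time_min;
}
```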

commit f0c3ef359f7c6c1687fb2671cb35deb346e00597
Author: Kiran V <Kiran.Varaganti@amd.com>
Date:   Thu Oct 4 16:32:21 2018 +0530

    This is a fix to floating-point exception error for BLIS SGEMM with larger matrix sizes.
    BUG No: CPUPL-197 fixed by Thangaraj Santanu
    The bli_clock_min_diff() function in BLIS assumed that if the time taken is greater than 1 hour then the reading must be wrong. However, this is not the case in general, while the other checks, such as the time taken being close to zero or under a nanosecond, are of course valid.
    gerrit review: http://git.amd.com:8080/#/c/118694/1/frame/base/bli_clock.c
    
    Change-Id: I9dc313d7c5fdc20684f67a516bf3237de3e0694a

commit 8bf30eb4735872388b5317883d99b775a344ce25
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date:   Wed Oct 3 22:22:29 2018 -0400

    Fixed runme.sh in test/studies/thunderx2
    
    Details:
    - Fixed the setting of threads for a single core run.

commit f6f2456ba2afa8f85f43c7c2c90acc439d61d94f
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date:   Wed Oct 3 21:43:46 2018 -0400

    Fixed the Makefile in test/studies/thunderx2
    
    Details:
    - Fixed target for make-all-st and make-all-mt so that the armpl
      targets are built

commit 743a1a6dec1bd3908f0f15513b501c9bd59715b3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 3 14:40:10 2018 -0500

    Fixed misleading version query from gcc 7+.
    
    Details:
    - gcc 7 introduced new behavior to the -dumpversion option whereby only
      the major version component is output. However, as part of this
      change, gcc 7 also introduced a new option, -dumpfullversion, which is
      guaranteed to always output the major, minor, and revision numbers. If
      we are using gcc 7 or later, we re-query the version string with this
      new option and then re-parse the result so as to avoid misleading
      output from configure (e.g. gcc 7.3.0 being reported as 7.7.7).

commit de07840ba5672b9d7b2ed2b918974e98c3f249fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 3 13:57:25 2018 -0500

    Whitespace, https updates to README.md.
    
    Details:
    - Reformatted to fit all lines within 80 columns, unless a link is too
      long to fit on a single line.
    - Changed some links from http to https.

commit 80a8b3dd8034ec8bc03d31be3f9c837c3f6fc94b
Author: sraut <Biplab.Raut@amd.com>
Date:   Wed Oct 3 15:30:33 2018 +0530

    Review comments incorporated for small TRSM.
    
    Change-Id: Ia64b7b2c0375cc501c2cb0be8a1af93111808cd9

commit b8dfd82e0d1afda4ee5436662d63515a59b2dee3
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Oct 2 15:37:12 2018 -0500

    Get pthreads via blis.h in the test driver.

commit d0c0c20b7bd3ecf914b5910a50f618fb7d7aa355
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Oct 2 15:16:00 2018 -0500

    There seems to be a problem with _POSIX_BARRIERS on Travis.

commit 0904d9e4df0c8a256ac35c491f14a587ebe9fca2
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Oct 2 15:04:36 2018 -0500

    *Always* use Windows primitives instead of pthreads.

commit 998317d309934cd7129f8c818ea6e5f07534ebc8
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Oct 2 14:43:24 2018 -0500

    Remove pthreads from appveyor build.

commit 627d0c5bfd4b7b149803587391c93b164c11ced5
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Oct 2 14:40:55 2018 -0500

    Combine the alternative barrier implementation for macOS with the pthread wrapper for Windows. Also implement pthread_{create,join} for Windows.

commit 81d2c064a209df7eca7d6103696ca3a137a7f82e
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Oct 2 11:46:36 2018 -0500

    Add wrapper for basic pthreads functionality (mutex, once) with MSVC.

commit d33f130ea621fca1dccb30631f454d237918eb04
Author: Devin Matthews <damatthews@smu.edu>
Date:   Tue Oct 2 11:45:43 2018 -0500

    Some configure changes:
    
    1) Allow environment variables to be set anywhere in the argument list.
    2) Allow any environment variable to be set.
    3) Allow LIBPTHREAD to be set to null without getting defaulted to -lpthread.

commit 9d5f1c4f3bf70c2c0ea84bfa326a0113ae2d176c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 1 17:39:26 2018 -0500

    Patch to avoid gcc warning in blastest/f2c/open.c.
    
    Details:
    - Use the modulo operator to limit the size of an integer that is given
      to sprintf(). This avoids a warning in some versions of gcc about the
      integer potentially overflowing the available space in the string into
      which the integer is being printed.
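
    Illustration (not part of the commit): a sketch of the technique,
    assuming a fixed-size name buffer. The helper make_unit_name(), the
    "fort.%d" format, and the bound of 100000 are hypothetical choices for
    this example, not the actual code in blastest/f2c/open.c.

```c
#include <stdio.h>

/* Formats "fort.N" into a 16-byte buffer. Taking the value modulo 100000
   bounds it to -99999..99999, so at most 6 characters are printed for the
   integer and the compiler can prove that the buffer is large enough. */
void make_unit_name( char buf[16], int unit )
{
    sprintf( buf, "fort.%d", unit % 100000 );
}
```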

commit 0c3cd00ba76de607e807f8deb04b1a2ce18ea7a8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 1 16:18:25 2018 -0500

    More README.md updates.
    
    Details:
    - Replaced much of "Getting Started" section with a shortened version of
      the bullet list of documentation currently shown in the github wiki
      page. Thanks to Devangi Parikh for her feedback in this change.

commit 8eaf34bd23b30a1857a50d7142ee9811895f24bf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 1 14:29:07 2018 -0500

    Very minor README.md update.

commit 599090e0eb41b2706fa1231fa7b90096f3281678
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 1 14:04:30 2018 -0500

    README.md update.
    
    Details:
    - Added language mentioning SHPC group to Introduction.

commit ee46fa3efb6e920fa6c3d0b0601007f5de31deb5
Author: sraut <Biplab.Raut@amd.com>
Date:   Mon Oct 1 16:30:30 2018 +0530

    Small TRSM optimization changes: 1) single precision small trsm kernels for the XAt=B case are further optimized for performance. 2) double precision small trsm kernels for the AX=B and XAt=B cases are implemented. 3) single precision small trsm kernels for AutX=B are implemented in intrinsics to improve the current performance.
    
    Change-Id: Ic9d67ae6d8522615257dde018903f049dcffa2cf

commit 08045a6c52b6e025652c5b18eb120c0f4e61cf6f
Author: sraut <Biplab.Raut@amd.com>
Date:   Mon Oct 1 15:38:23 2018 +0530

    Corrected the fix made for the blastest level-3 failure to check the m,n,k non-zero condition in bli_gemm_small.c
    
    Change-Id: Idaf9f2327c3127b04a2738ae8a058b83d6c57934

commit ac18949a4b9613741b9ea8e5026d8083acef6fe4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Sep 30 18:54:56 2018 -0500

    Multithreading optimizations for l3 macrokernels.
    
    Details:
    - Adjusted the method by which micropanels are assigned to threads in
      the 2nd (jr) and 1st (ir) loops around the microkernel to (mostly)
      employ contiguous "slab" partitioning rather than interleaved (round
      robin) partitioning. The new partitioning schemes and related details
      for specific families of operations are listed below:
      - gemm: slab partitioning.
      - herk: slab partitioning for region corresponding to non-triangular
              region of C; round robin partitioning for triangular region.
      - trmm: slab partitioning for region corresponding to non-triangular
              region of B; round robin partitioning for triangular region.
              (NOTE: This affects both left- and right-side macrokernels:
              trmm_ll, trmm_lu, trmm_rl, trmm_ru.)
      - trsm: slab partitioning.
              (NOTE: This affects only left-side macrokernels trsm_ll,
              trsm_lu; right-side macrokernels were not touched.)
      Also note that the previous macrokernels were preserved inside of
      the 'other' directory of each operation family directory (e.g.
      frame/3/gemm/other, frame/3/herk/other, etc).
    - Updated gemm macrokernel in sandbox/ref99 in light of above changes
      and fixed a stale function pointer type in blx_gemm_int.c
      (gemm_voft -> gemm_var_oft).
    - Added standalone test drivers in test/3m4m for herk, trmm, and trsm
      and minor changes to test/3m4m/Makefile.
    - Updated the arguments and definitions of bli_*_get_next_[ab]_upanel()
      and bli_trmm_?_?r_my_iter() macros defined in bli_l3_thrinfo.h.
    - Renamed bli_thread_get_range*() APIs to bli_thread_range*().

commit b952ca8feb6f17f71a4512649c2aa72bdee9c8f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Sep 28 16:12:32 2018 -0500

    CREDITS file update.

commit 7d96fc437ebaa9dd2d7071865b5df16402fadd64
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Sep 28 15:40:45 2018 -0500

    Allow slashes ('/') in version tags.
    
    Details:
    - Updated the configure script to allow slashes in version string. This
      is needed so that downstream maintainers (such as those for Debian)
      can create local tags such as "upstream/0.4.1". Thanks to M. Zhou for
      reporting this issue via PR #256 and providing me the information
      needed to debug the problem.

commit 5fdddf6f37c64da093c7f59e3a85214e819ae652
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Sep 28 11:25:54 2018 -0500

    Removed 'debian' directory.
    
    Details:
    - Removed the top-level 'debian' directory. This directory is apparently
      no longer needed (issue #257). Thanks to M. Zhou and Nico Schlömer for
      their contributions.

commit 9814cfdf3157ef4726ee604fc895d56e8063d765
Author: Meghana <meghana.vankadari@amd.com>
Date:   Fri Sep 28 11:02:39 2018 +0530

    fixed blastest level-3 failure by adding ((M&N&K) != 0) to check condition in bli_gemm_small.c
    
    Change-Id: I85e4a32996ebb880f3c00bd293edc38f74700fe6
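
    Illustration (not part of the commit): a bitwise AND of the three
    dimensions is not equivalent to testing each one against zero, which may
    be why a later commit (08045a6) corrects this check. The two helpers
    below are hypothetical and only contrast the two forms; the corrected
    expression used in bli_gemm_small.c is not reproduced here.

```c
/* Bitwise form from the commit subject: non-zero only if m, n, and k have
   at least one set bit in common. */
int all_nonzero_bitwise( int m, int n, int k )
{
    return ( m & n & k ) != 0;
}

/* Logical form: true exactly when every dimension is non-zero. */
int all_nonzero_logical( int m, int n, int k )
{
    return ( m != 0 ) && ( n != 0 ) && ( k != 0 );
}
```

    For example, m=1, n=2, k=4 are all non-zero, yet they share no set bit,
    so the bitwise form rejects them while the logical form accepts them.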

commit 86330953b14c180862deef3ccdcc6431259be27b
Merge: 7af5283d 807a6548
Author: praveeng <praveen.g@amd.com>
Date:   Fri Sep 28 10:08:06 2018 +0530

    Resolved conflicts and modified bli_trsm_small.c
    
    Change-Id: I578d419cff658003e0fdd4c4cdc93145d951ce31

commit 60b2650d7406d266feffe232c2d5692a9e3886d0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 24 15:04:45 2018 -0500

    Added statistics-collecting irun.py script.
    
    Details:
    - Added irun.py script to 'build' directory. This irun.py script is a
      python script for repeatedly invoking a test driver executable, such
      as those found in test/3m4m, and replacing the performance output column
      with four columns that aggregate statistics. Specifically, the script
      reports the minimum, average, maximum, and standard deviation for each
      problem size. This script is useful especially (though not
      exclusively) when trying to determine the impact of relatively minor
      changes to the code, or other small optimizations that may be
      difficult to distinguish from "noise." One way this "noise" manifests
      is that a test executable may run slightly slower or faster for all
      problem sizes (and all implementations) tested by the executable over
      the life of a single execution. The cause of these minor
      across-the-board perturbations in the overall performance signatures is
      unknown, though we hypothesize that it may relate to any number of
      issues such as operating system scheduling, where in memory the
      program is loaded, or how the CPU clock frequency is throttled at the
      time of execution. Regardless of the source of these subtle
      performance anomalies, the statistical properties reported by the
      irun.py script help the user to more precisely characterize the
      underlying performance exhibited by any given test driver, which
      allows him or her to make better judgments about the true difference
      in performance between two implementations, or minor changes within a
      single implementation.

commit 807a654888117fb3a27ea36384f1c1c11b882cd5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 20 15:41:05 2018 -0500

    Fixed confusing configure message for libmemkind.
    
    Details:
    - Corrected feedback echoed to user by configure when libmemkind is
      found but not explicitly requested. In these cases, configure would
      echo a message that it had received an explicit request to enable
      libmemkind, which was not accurate, even if the end result was the
      same--that libmemkind is enabled by default when it is found. Thanks
      to Devangi Parikh for reporting this issue.

commit 02adab427c779b0aaf38a5877a5f0246b1909e8f
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date:   Thu Sep 20 14:38:50 2018 -0400

    Created a 'thunderx2' subdirectory within test/studies
    
    Details:
    - Created a 'thunderx2' subdirectory within test/studies to house
      various level-3 test drivers used to measure performance on
      ThunderX2.

commit d7537fb51dac0636591fc7c68261a2322642ab3c
Merge: dad07245 c03728f1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Sep 12 15:24:20 2018 -0500

    Merge branch 'dev'

commit dad07245dbcfaf35232ec379ba756eb133c361c1
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date:   Wed Sep 12 04:16:58 2018 -0500

    Fixed yet another bug in runme script in test/studies
    
    Details:
    - Fixed another copy-paste bug

commit e669057fe35f2037d8111af687d84a0ecf6d7a2a
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date:   Tue Sep 11 22:29:42 2018 -0500

    Fixed bug in runme script in test/studies
    
    Details:
    - Fixed bug in runme script for skx studies that set the number of
      threads incorrectly

commit 232fdc3df3e01ae3f86d53767bd14eb93b511e6e
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date:   Mon Sep 10 18:45:50 2018 -0500

    Updated runme script in test/studies.
    
    Details:
    - Updated runme script for skx studies to run multithreading tests
      on 1 and 2 sockets.

commit c03728f1f45edb5e434db90ab8a77ba0184a682b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 10 17:54:27 2018 -0500

    Various minor cleanups.
    
    Details:
    - Rewrote bli_winsys.c to define bli_setenv() and bli_sleep()
      unconditionally, but differently for Windows and non-Windows, but
      then disabled the definition of bli_setenv() entirely since BLIS
      no longer needs to set environment variables. Updated bli_winsys.h
      accordingly, and call bli_sleep() from within testsuite instead of
      sleep() directly.
    - Use
        #if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS != 200809L)
      instead of
        #if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS < 0)
      when guarding against local definition of pthread barrier in
      testsuite. (The description for unistd.h implies that _POSIX_BARRIERS
      should always be set to 200809L when barriers are supported, though I
      won't be surprised if we encounter a case in the future where it is
      set to something else such as 1 while still supported.)
    - Removed old _VERS_CONF_INST definitions and installation rules in
      top-level Makefile. These are no longer needed because we no longer
      output libraries with the version and configuration name as
      substrings.
    - Comment/whitespace updates in Makefile, config.mk.in, common.mk,
      configure, bli_extern_defs.h, and test_libblis.h.
    - Added mention of 1m to README.md and other trivial tweaks.
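
    Illustration (not part of the commit): the _POSIX_BARRIERS guard quoted
    above can be exercised in isolation. The function needs_local_barrier()
    is hypothetical and merely reports which branch of the guard was taken;
    the sketch assumes a platform that provides <unistd.h>.

```c
#include <unistd.h>

/* Returns 1 when the guard would enable a local pthread barrier
   definition, and 0 when the system's barriers would be used. */
int needs_local_barrier( void )
{
#if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS != 200809L)
    return 1;
#else
    return 0;
#endif
}
```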

commit e249a00a82908054ecd307cf602c8801275903e8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 10 16:48:35 2018 -0500

    Imported skx dgemm ukernel from skx-redux branch.
    
    Details:
    - Added the new bli_dgemm_skx_asm_16x14.c microkernel from the skx-redux
      branch, along with appropriate blocksizes in bli_cntx_init_skx.c and
      a prototype in bli_kernels_skx.h. (Devin has not yet written the
      sgemm analogue, so for now we will continue using the older sgemm
      ukernel.)
    - Updated frame/include/bli_x86_asm_macros.h with a minor change that
      was present within the skx-redux branch.

commit e93b01ff60bf9742baa5eefd93e208d1219e7a43
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Sun Sep 9 15:57:43 2018 -0500

    Windows DLL support (#246)
    
    * Enable shared
    
    * Enable rdp
    
    * Add support for dll
    
    * Use libblis-symbols.def
    
    * Fix building dlls
    
    * Fix libblis-symbols.def
    
    * Fix soname
    
    * Fix Makefile error
    
    * Fix install target
    
    * Fix missing symbols
    
    * Add BLIS_MINUS_TWO
    
    * Add path to dll
    
    * Fix OSX soname
    
    * Add declspec for dll
    
    * Add -DBLIS_BUILD_DLL
    
    * Replace @enable_shared@ in config
    
    * switch to auto for now
    
    * blis_ -> bli_
    
    * Remove BLIS_BUILD_DLL in make check
    
    * change auto->haswell
    
    * enable_shared_01
    
    * Add wno-macro-redefined
    
    * print out.cblat3
    
    * BLIS_BUILD_DLL -> BLIS_IS_BUILDING_LIBRARY
    
    * Use V=1
    
    * Remove fpic for windows
    
    * Remember LIBPTHREAD
    
    * Remove libm for windows
    
    * Remember AR
    
    * Fix remembering libpthread
    
    * Add Wno-maybe-uninitialized in only gcc
    
    * Don't do blastest for shared for now
    
    * Fix install target
    
    And remove unnecessary change
    
    * test auto and x86_64
    
    * Fix install target again
    
    * Use IS_WIN variable
    
    * Remove leading dot from LIBBLIS_SO_MAJ_EXT
    
    * Make is_win yes/no
    
    * Add comments for windows builds
    
    * Change if else blocks location

commit 1330d5c4bc3b644ec0af54c3939a5b9f00eacd9c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Sep 7 19:37:59 2018 -0500

    Employ "user" cflags for tl Makefile test targets.
    
    Details:
    - Use get-user-cflags-for() to generate cflags when compiling BLAS test
      drivers and BLIS testsuite from top-level Makefile. Meant to include
      these changes in previous commit (4b5437e). Thanks to Isuru Fernando
      for pointing out this oversight.

commit 4b5437ec7afb2befffffbb83f7872bcb4fc61e51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Sep 7 17:24:32 2018 -0500

    Define a cpp macro specific to BLIS compilation.
    
    Details:
    - Tweaked the cflags functions in common.mk so that a new preprocessor
      macro, BLIS_IS_BUILDING_LIBRARY, is defined, but only when BLIS
      itself is being built. This macro will not be defined when, for
      example, the testsuite or example code compiles code local to those
      applications. This was done in part by defining a new cflags function
      get-user-cflags-for(), which is now the designated function for
      application Makefiles if they wish to inherit a basic set of CFLAGS
      from BLIS. (The compiler flags returned are identical to that of
      get-frame-cflags-for() except that -DBLIS_IS_BUILDING_LIBRARY is
      omitted.)
    - Updated all test driver-like makefiles to call get-user-cflags-for()
      instead of get-frame-cflags-for().
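
    Illustration (not part of the commit): one way a header might branch on
    BLIS_IS_BUILDING_LIBRARY. Only that macro name comes from this
    changelog; MYLIB_EXPORT, MYLIB_EXPORT_MODE, and export_mode() are
    hypothetical, and tying the macro to __declspec is an assumption
    suggested by the Windows DLL commit (e93b01f), not a description of the
    actual BLIS headers.

```c
#if defined(_WIN32) && defined(BLIS_IS_BUILDING_LIBRARY)
  /* Compiling the library itself: export its symbols. */
  #define MYLIB_EXPORT __declspec(dllexport)
  #define MYLIB_EXPORT_MODE 1
#elif defined(_WIN32)
  /* Compiling a client such as a test driver: import the symbols. */
  #define MYLIB_EXPORT __declspec(dllimport)
  #define MYLIB_EXPORT_MODE 2
#else
  /* Elsewhere, no decoration is needed in this sketch. */
  #define MYLIB_EXPORT
  #define MYLIB_EXPORT_MODE 0
#endif

/* Reports which branch was taken: 0 = plain, 1 = export, 2 = import. */
int export_mode( void )
{
    return MYLIB_EXPORT_MODE;
}
```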

commit cc2cca4f56eb30212a0dce3e5c121e64d9e59560
Merge: e19e7212 fb81c7fc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 6 17:12:13 2018 -0500

    Merge branch 'dev'

commit e19e7212872da3d464734199193436faa51f0da0
Merge: 97965b09 b3d0702c
Author: Jeff Hammond <jeff.science@gmail.com>
Date:   Thu Sep 6 14:58:49 2018 -0700

    Merge pull request #244 from kali/pthread-barrier-osx
    
    add an adhoc impl for pthread_barrier

commit b3d0702cf2ef6dda19a23dd8a677be1b6f73c322
Merge: 4e7d0670 97965b09
Author: Jeff Hammond <jeff.science@gmail.com>
Date:   Thu Sep 6 14:58:23 2018 -0700

    Merge branch 'master' into pthread-barrier-osx

commit 4e7d06700f176a62952d7d51e41fdcbc6b7a9d5f
Author: Mathieu Poumeyrol <kali@zoy.org>
Date:   Thu Sep 6 23:48:31 2018 +0200

    second __APPLE__

commit fb81c7fc665d68e6a2add163feb29acc0bce8936
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 6 16:29:39 2018 -0500

    Defined cortexa53 sub-configuration.
    
    Details:
    - Added a new sub-configuration 'cortexa53', which is a mirror image
      of cortexa57 except that it will use slightly different compiler
      flags. Thanks to Mathieu Poumeyrol for making this suggestion after
      discovering that the compiler flags being used by cortexa57 were
      not working properly in certain OS X environments (the fix to which
      is currently pending in pull request #245).

commit 24ecc0d94aaa9ab4df1ae6d199c4ec6d7783169f
Author: Mathieu Poumeyrol <kali@zoy.org>
Date:   Thu Sep 6 22:10:16 2018 +0200

    use _POSIX_BARRIERS instead of __APPLE__

commit 97965b09059a610db06fb7a22bdfa79c0d37d673
Author: Mathieu Poumeyrol <kali@users.noreply.github.com>
Date:   Thu Sep 6 21:10:29 2018 +0200

    cortexa9 and cortexa53 travis build + qemu test (#245)

commit a6802eab7d94b5a9de633c53beca8245b74f5dc6
Author: Mathieu Poumeyrol <kali@zoy.org>
Date:   Thu Sep 6 17:16:35 2018 +0200

    reinstantiate test on macos

commit d688a2b7e5a19cba44ea398a99e325e19b8fce50
Author: Mathieu Poumeyrol <kali@zoy.org>
Date:   Thu Sep 6 15:25:16 2018 +0200

    add an adhoc impl for pthread_barrier

commit ab9f9e684dc3ffbb70cc45b21c67af5d916919e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 30 15:14:02 2018 -0500

    CHANGELOG update (0.4.1)

commit 10fd614031307c46db3d893528d4e5fc31f490b3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 30 15:13:59 2018 -0500

    Version file update (0.4.1)

commit 08dd67c4b21244851f8416bd59159bea7a9c5b3d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 30 15:12:13 2018 -0500

    ReleaseNotes.md update in advance of next version.

commit 4fa4cb0734e7de6505b5d6f1aeef3a5d5c89dcbb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 29 18:06:41 2018 -0500

    Trivial comment header updates.
    
    Details:
    - Removed four trailing spaces after "BLIS" that occur in most files'
      commented-out license headers.
    - Added UT copyright lines to some files. (These files previously had
      only AMD copyright lines but were contributed to by both UT and AMD.)
    - In some files' copyright lines, expanded 'The University of Texas' to
      'The University of Texas at Austin'.
    - Fixed various typos/misspellings in some license headers.

commit b051ffb815baf6c3ece2b5118b679fd9219d5780
Merge: 6f33d9de aaa549f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 29 17:06:48 2018 -0500

    Merge branch 'dev'

commit 6f33d9de21fbc2f579846b9104fb9d513753f79c
Author: Mathieu Poumeyrol <kali@users.noreply.github.com>
Date:   Wed Aug 29 23:48:22 2018 +0200

    fix compilation of armv7a kernels (#242)

commit 8199e339aefdd27019c7f3d8c99818d375d5400b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Aug 27 07:00:12 2018 -0500

    Added testsuite threading to input.general.fast.
    
    Details:
    - Added lines associated with the testsuite's new threading option to
      input.general.fast. This change was intended for the previous commit
      (10d0735).

commit 10d07357afbb2d468837aa97369ef9a6d0610817
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Aug 26 20:34:30 2018 -0500

    Better thread safety; added threading to testsuite.
    
    Details:
    - Replaced critical sections that were conditional upon multithreading
      being enabled (via pthreads or OpenMP) with unconditional use of
      pthreads mutexes. (Why pthreads? Because BLIS already requires it
      for its initialization mechanism: pthread_once().) This was done in
      bli_error.c, bli_gks.c, bli_l3_ind.c. Also, replaced usage of BLIS's
      mtx_t object and bli_mutex_*() API with pthread mutexes in
      bli_thread.c. The previous status quo could result in a race condition
      if the application called BLIS from more than one thread. The new
      pthread-based code should be completely agnostic to the application's
      threading configuration. Thanks to AMD for bringing to our attention
      the need for a thread-safety review.
    - Added an option to the testsuite to simulate application-level
      multithreading. Specifically, each thread maintains a counter that is
      incremented after each experiment. The thread only executes the
      experiment if: counter % n_threads == thread_id. In other words, the
      threads simply take turns executing each problem experiment. Also,
      POSIX guarantees that fprintf() will not intermingle output, so
      output was switched to fprintf() instead of libblis_test_fprintf().
    - Changed membrk_t objects to use pthread_mutex_t instead of mtx_t and
      replaced use of bli_mutex_init()/_finalize() in bli_membrk.c with
      wrappers to pthread_mutex_init()/_destroy().
    - Changed the implementation of bli_l3_ind_oper_enable_only() to fix
      a race condition; specifically, two threads calling the function with
      the same parameters could lead to a non-deterministic outcome.
    - Added #include <pthread.h> to bli_cpuid.c and moved the same in
      bli_arch.c.
    - Added 'const' to declaration of OPT_MARKER in bli_getopt.c.
    - Added #include <pthread.h> to bli_system.h.
    - Added add-copyright.py script to automate adding new copyright lines
      to (and updating existing lines of) source files.
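
    Illustrative note: the turn-taking scheme described above (each thread
    executes an experiment only when counter % n_threads == thread_id) can
    be sketched as follows. This is a minimal sketch, not the testsuite's
    code; N_THREADS, N_EXPERIMENTS, worker(), run_round_robin(), and
    executed_by are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define N_THREADS     4   /* hypothetical thread count */
#define N_EXPERIMENTS 10  /* hypothetical number of experiments */

/* Records which thread executed each experiment (-1 = nobody yet). */
static int executed_by[N_EXPERIMENTS];

/* An ordinary pthread mutex, used unconditionally. It is not strictly
   required here, since each slot is written by exactly one thread; it is
   included only to mirror the unconditional locking style. */
static pthread_mutex_t record_mutex = PTHREAD_MUTEX_INITIALIZER;

typedef struct { int thread_id; } worker_arg_t;

static void* worker(void* p)
{
    const worker_arg_t* arg = p;
    int counter = 0; /* per-thread counter, incremented after each experiment */

    for (int e = 0; e < N_EXPERIMENTS; ++e)
    {
        /* Only the thread whose id matches takes this turn. */
        if (counter % N_THREADS == arg->thread_id)
        {
            pthread_mutex_lock(&record_mutex);
            executed_by[e] = arg->thread_id;
            pthread_mutex_unlock(&record_mutex);
        }
        ++counter;
    }
    return NULL;
}

/* Runs all experiments across N_THREADS threads; returns 0 on success. */
int run_round_robin(void)
{
    pthread_t    threads[N_THREADS];
    worker_arg_t args[N_THREADS];

    for (int e = 0; e < N_EXPERIMENTS; ++e) executed_by[e] = -1;

    for (int t = 0; t < N_THREADS; ++t)
    {
        args[t].thread_id = t;
        if (pthread_create(&threads[t], NULL, worker, &args[t]) != 0) return -1;
    }
    for (int t = 0; t < N_THREADS; ++t)
    {
        if (pthread_join(threads[t], NULL) != 0) return -1;
    }
    return 0;
}
```

    In this sketch every experiment is executed exactly once, by thread
    (experiment index % N_THREADS).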

commit aaa549f4d1e63929fe2bea023ce849253cfbbb42
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Aug 26 20:13:51 2018 -0500

    Minor update to configure --help (--sharedir option).
    
    Details:
    - Fixed/tweaked description for --sharedir=SHAREDIR option.

commit 573b8ac373f821a65cc8afd51cdbe03b8ec01081
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Aug 26 13:51:32 2018 -0500

    Fixed copy-paste typo in previous commit.
    
    Details:
    - Fixed a typo in travis/do_testsuite.sh introduced in 62ea1d3.

commit 62ea1d33d3bc1e890420a1e828b9d0e87e87533b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Aug 26 13:35:53 2018 -0500

    Fixed broken out-of-tree builds.
    
    Details:
    - Fixed stale filepaths to check-blastest.sh and check-blistest.sh in
      travis/do_testsuite.sh and travis/do_sde.sh.
    - Create a symbolic link to the 'config' directory so that the top-level
      Makefile can find the configs' make_defs.mk files during out-of-tree
      builds.
    - Added additional case handling to out-of-tree scenario to handle
      situations where files 'Makefile', 'common.mk', or 'config' exist but
      are not symbolic links. In such cases, configure warns the user and
      exits.
    - Homogenized various error messages throughout configure.
    - Belated thanks to Victor Eijkhout for requesting the feature added
      in 0f491e9 whereby lesser Makefiles can compile and link against
      an existing installation of BLIS.

commit 0f491e994a7e14d4dfce26e6a51dba2bccad29a3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Aug 25 20:12:36 2018 -0500

    Allow lesser Makefiles to reference installed BLIS.
    
    Details:
    - Updated the build system so that "lesser" Makefiles, such as those
      belonging to example code or the testsuite, may be run even if the
      directory is orphaned from the original build tree. This allows a
      user to configure, compile, and install BLIS, delete the build tree
      (that is, the source distribution, or the build directory for out-
      of-tree builds) and then compile example or testsuite code and link
      against the installed copy of BLIS (provided the example or testsuite
      directory was preserved or obtained from another source). The only
      requirement is that make be invoked while setting the
      BLIS_INSTALL_PATH variable to the same installation prefix used when
      BLIS was configured. The easiest syntax is:
    
        make BLIS_INSTALL_PATH=/install/prefix
    
      though it's also permissible to set BLIS_INSTALL_PATH as an
      environment variable prior to running 'make'.
    - Updated all lesser Makefiles to implement the new aforementioned build
      behavior.
    - Relocated check-blastest.sh and check-blistest.sh from build to
      blastest and testsuite, respectively, so that if those directories are
      copied elsewhere the user can still run 'make check' locally.
    - Updated docs/Testsuite.md with language that mentions this new option
      of building/linking against an installed copy of BLIS.

commit 36ff92ce0d3b428b15b6cddc6f5944afe22e43ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 24 18:26:09 2018 -0500

    Missing C++ compiler no longer fatal to configure.
    
    Details:
    - Changed configure so that the absence of any C++ compiler from the
      pre-defined search list does not result in an exit. Instead, in this
      situation, the found_cxx variable is assigned 'c++notfound' and the
      error message is changed to remind the user that C++ will not be
      available in the sandbox. Thanks to Devangi Parikh for reporting this
      issue.
    - Also tweaked the message when a C++ compiler *is* found to remind any
      would-be confused user that BLIS will only use C++ if it is needed by
      code in the sandbox.

commit 658f0a129bdc565b072696b6ebddce501132091c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 24 17:49:37 2018 -0500

    Fixed obscure integer size bug in va_arg() usage.
    
    Details:
    - Fixed a bug in the way that the variadic bli_cntx_set_l3_nat_ukrs()
      function was defined. This function is meant to take a microkernel id,
      microkernel datatype, microkernel address, and microkernel preference
      as arguments, and is typically called within the bli_cntx_init_*()
      function defined within a sub-configuration for initializing an
      appropriate context. The problem is with the final argument: the
      microkernel preference. These preferences are actually boolean values,
      0 or 1 (encoded as FALSE or TRUE). Since the variadic function does
      not give the compiler any type information for any variadic arguments,
      they are "promoted" in the course of internal (macroized) processing
      according to default argument promotion rules. Thus, integer literals
      such as 0 and 1 become int and floating-point literals (such as 0.0 or
      1.0) become double. Previous to this commit, we indicated to va_arg()
      that the ukernel preference was a 'bool_t', which is a typedef of
      int64_t on 64-bit systems. On systems where int is defined as 64 bits,
      no problems manifest since int is the same size as the type we passed
      in to va_arg(), but on systems where int is 32 bits, the ukernel
      preference could be misinterpreted as a garbage value. (This was
      observed on a modern armv8 system.) The fix was to interpret the
      bool_t value as int and then immediately typecast it to and store it
      as a bool_t. Special thanks to Devangi Parikh for helping track down
      this issue, including deciphering the use of va_arg() and its
      byzantine treatment of types.
    - Added explicit typecasts for all invocations of va_arg() in
      bli_cntx.c.
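
    Illustrative note: the fix described above (read the variadic argument
    as int, then cast to the wider boolean type) can be sketched as follows.
    This is a minimal sketch under stated assumptions; my_bool_t and
    count_true_prefs() are hypothetical stand-ins and are not part of BLIS.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>

/* Hypothetical stand-in for a boolean typedef that is 64 bits wide. */
typedef int64_t my_bool_t;

/* Counts how many of the n variadic preference flags are nonzero.
   Callers pass plain integer literals (0 or 1), which have type int. */
int count_true_prefs(int n, ...)
{
    va_list ap;
    int     n_true = 0;

    assert(n >= 0);

    va_start(ap, n);
    for (int i = 0; i < n; ++i)
    {
        /* Fetch the argument using the type it actually has after default
           argument promotion (int), then cast to the wider type. Asking
           va_arg() for my_bool_t directly would be undefined behavior
           wherever int is narrower than 64 bits. */
        my_bool_t pref = (my_bool_t)va_arg(ap, int);
        if (pref) ++n_true;
    }
    va_end(ap);

    return n_true;
}
```

    An alternative would be to have callers cast each argument to the wide
    type at the call site, but that places the burden on every caller.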

commit e71dc389120b032e42091e4d1a928515ed6f7275
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 24 15:56:04 2018 -0500

    Fixed a very minor memory leak in gks.
    
    Details:
    - Fixed a memory leak in the global kernel structure that resulted in 56
      bytes per configured architecture (of which only 18 are presently
      supported by BLIS). The leak would only manifest if BLIS was
      initialized and then finalized before the application terminated.
      Thanks to Devangi Parikh for helping track down this leak.

commit a7e3a5f9753468c8e665e6c5c3b38d22b7c92500
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 24 14:51:11 2018 -0500

    Fixed uncallable bli_finalize().
    
    Details:
    - Previously, bli_finalize_once()--which, like bli_init_once(), was
      implemented in terms of pthread_once()--was using the same
      pthread_once_t control object being used by bli_init(), thus
      guaranteeing that it would never be called as long as BLIS had already
      been initialized. This could manifest as a rather large memory leak to
      any application that attempted to finalize BLIS midway through its
      execution (since BLIS reserves several megabytes of storage for
      packing buffers per thread used). The fix entailed giving each
      function its own pthread_once_t object. Thanks to Devangi Parikh for
      helping track down this very quiet bug.
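
    Illustrative note: the effect of giving each function its own
    pthread_once_t object can be sketched as follows. This is a minimal
    sketch, not the BLIS implementation; lib_init_once(),
    lib_finalize_once(), and the call counters are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* One control object per one-time action. Sharing a single control object
   between the two would mean that whichever action ran first "uses up"
   the control, and the other would never run. */
static pthread_once_t init_once_ctl     = PTHREAD_ONCE_INIT;
static pthread_once_t finalize_once_ctl = PTHREAD_ONCE_INIT;

static int init_calls     = 0;
static int finalize_calls = 0;

static void do_init(void)     { ++init_calls; }
static void do_finalize(void) { ++finalize_calls; }

/* Runs do_init() at most once, no matter how many times it is called. */
void lib_init_once(void)
{
    int r = pthread_once(&init_once_ctl, do_init);
    assert(r == 0);
    (void)r;
}

/* Runs do_finalize() at most once, independently of lib_init_once(). */
void lib_finalize_once(void)
{
    int r = pthread_once(&finalize_once_ctl, do_finalize);
    assert(r == 0);
    (void)r;
}
```

    Note that this sketch does not attempt to support re-initialization
    after finalization; it only shows why the two control objects must be
    distinct.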

commit a79c21c7c17fb4854fd24c73b81ec5543f74082d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 23 14:40:46 2018 -0500

    Fixed cleanmk target post-1b0f8d6.
    
    Details:
    - Changed the cleanmk target to delete makefile fragments from their new
      home in obj/$(CONFIG_NAME). The old definition worked only because of
      a typo (REFERKN_PATH instead of REFKERN_PATH), and only in the
      non-verbose (V != 1) case.

commit ffb57242f3eb1175c991fe1b492595fdaa175c27
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 22 18:22:41 2018 -0500

    Cosmetic output changes to configure.
    
    Details:
    - Disable sandbox-related obj directory creation, directory mirroring,
      and makefile fragment generation when a sandbox is not enabled.
    - Prevent various duplicate actions by configure (such as those
      mentioned above for sandboxes).

commit ac17454aae9ad430f05aa7c156919c6c695c300c
Merge: a77bec76 7afd095a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 22 15:34:53 2018 -0500

    Merge branch 'master' into dev

commit a77bec766a01e42f13f8cacbec8c4cbde8ecefef
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 22 15:31:29 2018 -0500

    Whitespace changes, minor renames in build system.
    
    Details:
    - Minor whitespace cleanup, mostly in the form of spaces -> tabs.
    - Shortened certain variables' _FRAGMENT_ infixes to _FRAG_ in
      common.mk.

commit 1b0f8d60d1132b56485cc202ebf1246898d3a2a4
Author: Devin Matthews <damatthews@smu.edu>
Date:   Wed Aug 22 13:19:29 2018 -0700

    Generate makefile fragments in build tree (#240)
    
    * Make src dir read-only in out-of-tree build test.
    
    * Generate makefile fragments in the build tree.

commit 7afd095af33690e0175903852b354c9fe46993f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 22 14:58:24 2018 -0500

    Removed skx from code snippet in previous commit.
    
    Details:
    - The docs/ConfigurationHowTo.md document was written with examples that
      did not yet contain the skx sub-configuration, but the previous commit
      included bli_arch.c code copied and pasted from a recent commit that
      does support skx. To keep things consistent, I've removed skx from the
      recently-added ConfigurationHowTo.md code snippet.

commit 48211a980d78673133076e8eced1007b1980f5e6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 22 14:55:02 2018 -0500

    Update to docs/ConfigurationHowTo.md.
    
    Details:
    - Added missing language directing the reader to modify the config_name
      string array in bli_arch.c when adding a new sub-configuration. Thanks
      to Devangi Parikh for reporting this missing section.

commit 65c9096c6e21f3dc2947fa12be9ea3034f8662dc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 17 11:44:12 2018 -0500

    Fixed broken -p option to configure.
    
    Details:
    - Fixed some stale code that was preventing the -p option to configure
      from working as expected (though the --prefix option was unaffected).
      This bug was most likely introduced in 7e5648c (May 7 2018).
      Thanks to Dave Love for reporting this issue.

commit e358d5e497c77b305af462f44266370a596445e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 16 12:18:45 2018 -0500

    README.md update (Funding section).

commit a61dd5e7bcf23f7237d407a5e06dd44e1bec9ad0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 14 17:08:03 2018 -0500

    Changed 'test' target to be more like 'check'.
    
    Details:
    - Redefined the 'test' make target in the top-level Makefile so that the
      final result ("everything passed" or "at least one failure") is echoed
      to stdout. Note that 'check' is unchanged, and thus is now effectively
      a fast version of 'test'.
    - Updated docs/BuildSystem.md to reflect the above change.

commit ce5c3a198a7ae1ca676c27da4541d51ed19d16e1
Merge: 4f6745d6 0bbe69d5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 14 16:52:19 2018 -0500

    Merge branch 'master' of github.com:flame/blis

commit 4f6745d68a2c66511695eff0beb00a82ffc6bbbe
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 14 16:50:47 2018 -0500

    Fixed link error when building only shared library.
    
    Details:
    - Fixed a linker error that occurred when attempting to compile and link
      the testsuite and/or BLAS test drivers after having configured BLIS to
      only generate a shared library (no static library). The chosen
      solution involved
      (1) adding the local library path, $(BASE_LIB_PATH), to the search
          paths for the shared library via the link option
          -Wl,-rpath,$(BASE_LIB_PATH).
      (2) adding a local symlink to $(BASE_LIB_PATH) that uses the .so major
          version number so that ld would find the shared library at
          execution time.
      Thanks to Sajid Ali for reporting this issue, to Devin Matthews for
      pointing out the need for the -rpath option, and to Devangi Parikh for
      helping Sajid isolate the problem.
    - Added #include <ctype.h> to bli_system.h to avoid a compiler warning
      resulting from using toupper() from bli_string.c without a prototype.
      Thanks again to Sajid Ali, whose build log revealed this compiler
      warning.
    - Added '*.so.*' to .gitignore.
    - CREDITS file update.

commit 0bbe69d5ed260849297d8f2d35b7668d167482ed
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date:   Tue Aug 14 14:49:58 2018 -0500

    Updated plotting scripts in test/studies.
    
    Details:
    - Fixed indexing on plots to correspond to the removal of dtime in
      the test drivers.

commit e93e0e149e087e08eca2885f1a748a4e88ffe55d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 7 15:54:30 2018 -0500

    Removed redefinition of axpyv, scal2v func types.
    
    Details:
    - Removed a stray/accidental redefinition of axpyv and scal2v function
      types in frame/1d/bli_l1d_ft.h (probably a copy/paste leftover during
      development).

commit 1deb33bd16349aaa643694d1bd685ff8a9a5f476
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 7 15:02:50 2018 -0500

    Updated penryn kernels to use new _ker_ft type names.
    
    Details:
    - Updated older _ft kernel type suffixes used within penryn level-1v
      and -1f kernels to use the newer _ker_ft suffix that was introduced
      in 0175483. (Thank you Travis CI.)

commit 9cb0b023ca91abdc056d726cdc070062e4954611
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 7 14:21:07 2018 -0500

    INSTALL file update.

commit 017548314f3f78f66fbe3264509ac5302bd8d62b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 7 14:13:25 2018 -0500

    Replaced function chooser macros w/ func ptr arrays.
    
    Details:
    - Previously, most object API functions (_oapi.c) used a function
      chooser macro that would expand out to an if-elseif-elseif-else
      conditional that used a num_t datatype to call the appropriate
      type-specific API (_tapi.c). This always felt a little hackish, and
      would get in the way somewhat of adding support for new num_t datatypes
      in the future. So, I've replaced that functionality with code that
      queries a function pointer that is then typecast appropriately. This
      model of function calling was already pervasive for kernels queried
      from the cntx_t structure. It was also already in use in various other
      functions, such as macrokernels, and this commit simply extends that
      pattern.
    - The above change required many new files, mostly header files, that
      define the function types (mostly _ft.h) for the queriable functions
      as well as some source files to define the function pointer arrays and
      their corresponding query functions (_fpa.c). Various other function
      types, mostly for kernel function types, were renamed to reduce the
      potential for confusion with the function types for expert and basic
      (non-expert) typed API functions.
    - Removed definitions for all of the "bli_call_ft_*()" function chooser
      macros from bli_misc_macro_defs.h.
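
    Illustrative note: the general pattern of replacing an if-else chain
    over a datatype id with a function pointer array plus a query function
    can be sketched as follows. This is a minimal sketch, not BLIS code;
    dt_t, scal_ft, scal_fpa, scal_query(), and scal() are hypothetical
    names, and only two datatypes are shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical datatype ids; the ids double as array indices. */
typedef enum { DT_FLOAT = 0, DT_DOUBLE = 1, DT_COUNT } dt_t;

/* Function type shared by all type-specific implementations. */
typedef void (*scal_ft)(size_t n, const void* alpha, void* x);

static void scal_float(size_t n, const void* alpha, void* x)
{
    const float a  = *(const float*)alpha;
    float*      xf = x;
    for (size_t i = 0; i < n; ++i) xf[i] *= a;
}

static void scal_double(size_t n, const void* alpha, void* x)
{
    const double a  = *(const double*)alpha;
    double*      xd = x;
    for (size_t i = 0; i < n; ++i) xd[i] *= a;
}

/* The function pointer array, indexed by datatype id. */
static const scal_ft scal_fpa[DT_COUNT] =
{
    [DT_FLOAT]  = scal_float,
    [DT_DOUBLE] = scal_double,
};

/* Query function: returns the implementation for the given datatype. */
scal_ft scal_query(dt_t dt)
{
    assert((int)dt >= 0 && (int)dt < DT_COUNT);
    return scal_fpa[dt];
}

/* Datatype-agnostic front end: a lookup instead of an if-else chain. */
void scal(dt_t dt, size_t n, const void* alpha, void* x)
{
    scal_query(dt)(n, alpha, x);
}
```

    With this layout, supporting an additional datatype means adding one
    enum value and one array entry rather than editing every chooser.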

commit addce089664561f9f63efa6f107e58fc48d29871
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Aug 6 13:18:20 2018 -0500

    Format spec and other updates in test, test/3m4m.
    
    Details:
    - Removed the dtime (delta time, or wallclock time) column from the
      matlab output of all test drivers in test, test/3m4m, test/studies.
      This value was rarely (if ever) really needed and usually only served
      to take up screen space.
    - Updated format specifier in test/studies/skx to use %7.2f instead of
      %6.3f.
    - For the test drivers in 'test' directory, added an initial line of
      output that sets last entry of matlab matrix to zero in order to
      induce a pre-allocation of the entire array of performance results.

commit 94d5ef42c833a4d43e50a80d46dddbd7a56d2db6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Aug 4 15:57:17 2018 -0500

    Adjusted gflops format spec in testsuite, test/3m4m.
    
    Details:
    - Changed the format specifier for the gflops column in the testsuite
      output from %7.3f to %7.2f. This was done mainly to keep the output
      aligned properly when the expected performance exceeded 1000 gflops.
      Also, two decimal places still conveys plenty of precision for all
      practical applications, including just eyeballing performance deltas
      between two executions (let alone two implementations).
    - Changed the format specifier for gflops in the test/3m4m drivers
      from %6.3f to %7.2f (for the same reasons listed above).
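
    Illustrative note: the alignment issue can be checked by measuring how
    many characters each format specifier produces. The helper below,
    formatted_width(), is a hypothetical name used only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns the number of characters printf-style formatting produces for
   a single double value under the given format specifier. */
size_t formatted_width(const char* fmt, double gflops)
{
    char buf[64];
    int  len = snprintf(buf, sizeof buf, fmt, gflops);

    /* snprintf() returns the length that would have been written. */
    assert(len >= 0 && (size_t)len < sizeof buf);

    return (size_t)len;
}
```

    With %7.3f, a value such as 999.123 fits in exactly 7 characters, but
    1234.567 needs 8 and pushes the column out of alignment. With %7.2f,
    1234.567 prints as 1234.57 and still fits in 7 characters; values up
    to 9999.99 stay aligned.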

commit c7ff06bae92b9b6c6656f2030d13486b95417821
Merge: 6074082c ebe998d0
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date:   Wed Aug 1 14:20:41 2018 -0500

    Merge branch 'master' of https://github.com/flame/blis

commit 6074082cd359dd775ef72478f8f3a281c5a6a6f9
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date:   Wed Aug 1 13:30:51 2018 -0500

    Fixed bug in bli_cntx_set_packm_ker_dt() implementation.
    
    Details:
    - Fixed bug in static functions bli_cntx_set_[packm/unpackm]_ker_dt(),
      which were incorrectly calling bli_cntx_get_[packm/unpackm]_ker_dt() to
      get the corresponding func_t.

commit ebe998d06cc56a9a9d66990b6ebf683d6fd0efdf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 1 13:24:00 2018 -0500

    Fixed typos in BuildSystem.md from previous commit.

commit e72a344e94c5ae253f69b60f41d92ca89a5d1d1c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 1 13:00:38 2018 -0500

    Added table of 'make' targets to BuildSystem.md.
    
    Details:
    - Added a new section to BuildSystem.md that describes the most useful
      make targets defined in the top-level Makefile.

commit 4f60d0288e00586dc921ff57db851f1266ff8e70
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 30 19:22:57 2018 -0500

    README.md, comment updates.
    
    Details:
    - Added links, and sandbox language to README.md.
    - Adjusted some comments in high-level level-3 object functions to make
      clear what bli_thread_init_rntm() does.

commit 455d3f49e5c8362395be14c79e6adb5123e29623
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Jul 29 18:31:29 2018 -0500

    Edits to object/typed API, multithreading docs.

commit 922a1c05e06f52c97fb369870dce07233e61c4c9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 28 20:15:55 2018 -0500

    More tweaks to README.md.

commit a7a0cf2b5d9f1dea5061c0f20eeaf371dfd4ea12
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 28 16:59:31 2018 -0500

    More edits to docs/Multithreading.md.

commit be21d0cf68c330fd0d2048465a43ddc59d0b9d6c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 28 16:46:51 2018 -0500

    Fixed typos in docs/Multithreading.md.

commit eac07c7b4f7a41c68d63f1e67141b2b58009609e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 28 16:45:28 2018 -0500

    Edits to docs/Multithreading.md.

commit 5438375a032273b46ae626fee909ffc05f48ab72
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 28 16:34:21 2018 -0500

    Fixed link in README.md.

commit 1f1a237d3f0b24d71ce2d7ee52d8a84f8e6a29ad
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 28 16:33:28 2018 -0500

    Fixed links in BLISTypedAPI.md.

commit 89c8806e3aa49310f36c0314c5f6956c83a627a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 28 16:30:56 2018 -0500

    Minor doc fixes to previous commit.

commit b8c7574f84873b9c408f70c29c41ce464df57c2d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 28 16:27:09 2018 -0500

    README.md, typed/object API updates.
    
    Details:
    - Updated the typed and object APIs to include language on the rntm_t
      parameters in the expert interfaces.
    - Updated README to include link to object API.

commit 29c34c4adb02d91fb34d1ccc0e821d6cfb7ce5c5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 27 16:26:19 2018 -0500

    CREDITS file update.

commit 55a04edf52ac4f16c51b738bc884684adc1f1777
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 27 16:10:46 2018 -0500

    CHANGELOG update (0.4.0)

commit 4ad61ce905d250dd3ef197f0d06a69ce6d99d309
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 27 16:10:43 2018 -0500

    Version file update (0.4.0)

commit b86cf13793b07f35c027a56c9faec8f4b6279d3e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 27 16:08:21 2018 -0500

    Release Notes update in advance of next version.

commit a8b4084a0e04e47ac02ceae93a2018f5363e1205
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 27 16:07:26 2018 -0500

    CREDITS file update.

commit 8e10cac5f388ac961c3d77b0a465214e7c9dc91a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 27 14:45:35 2018 -0500

    Updates to CREDITS, RELEASING, config/README.md.
    
    Details:
    - Added individuals' github handles to CREDITS file.
    - Updated RELEASING, config/README.md files.

commit 401b69c8f26a86726ac5e1fb4f9fc2d2098ef204
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 25 17:55:13 2018 -0500

    More indentation in docs/ConfigurationHowTo.md.

commit 1c6a1b921ef96999bb449d657cca6d9a556f7245
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 25 17:14:58 2018 -0500

    Trying new indentation in ConfigurationHowTo.md.
    
    Details:
    - Modified a few sections to take advantage of a feature of markdown
      that allows a bullet or enumeration to have multiple paragraphs. This
      is a trial run to make sure the indentation looks good when rendered
      in a web browser.

commit 71f978719527fcf17617cb234e48bf349a76c12d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 25 15:55:36 2018 -0500

    Whitespace changes to macrokernels' func ptr defs.

commit 87d57c31c2bfcf4609dfe31ce915e9345150e613
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 25 14:20:18 2018 -0500

    Various minor updates to typed, object API docs.

commit fb6e16268aaafbab2fd78d47cbf821e2152261fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 25 14:17:28 2018 -0500

    Consolidated prototypes in bli_l1v_tapi.h.
    
    Details:
    - Consolidated typed API function prototypes in bli_l1v_tapi.h by
      leveraging identical function signatures between operations.
    - Removed 'restrict' keyword since it is not actually present in the
      function definitions.

commit af60d738f21340ccb0903e6c87dbf6af4fc44fc0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 24 15:35:52 2018 -0500

    Finished object creation part of BLISObjectAPI.md.
    
    Details:
    - Filled in remaining section on object creation function reference
      of BLISObjectAPI.md. All object management functions demonstrated as
      part of the example code in examples/oapi are now documented, as well
      as some other functions that are not shown in the example code.
    - Updated various links (mostly in function index) to correctly point to
      the object API reference instead of the typed API reference.
    - Added documentation to getijm, setijm.

commit 8217a6a3b68382c62f016c658d337e6086112fef
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 24 13:13:10 2018 -0500

    Moved sandbox README.md to docs/Sandboxes.md.
    
    Details:
    - Relocated sandbox/ref99/README.md to docs/Sandboxes.md and made minor
      edits to the document.

commit b7db29332394324ffd1a73c3847a75e9a5b38c8d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 19 11:14:30 2018 -0500

    Explicitly typecast return vals in static funcs.
    
    Details:
    - Added explicit typecasting to various functions (mostly static
      functions), primarily those in bli_param_macro_defs.h,
      bli_obj_macro_defs.h, bli_cntx.h, bli_cntl.h, and a few other header
      files.
    - This change was prompted by feedback from Jacob Gorm Hansen, who
      reported that #including "blis.h" from his application caused
      gcc to output error messages (relating to types being returned
      mismatching the declared return types) when used via the C++ compiler
      front-end. This is the first pass of fixes, and we may need to
      iterate with additional follow-up commits (#233).

commit fa08e5ead95f9d757af6ab5b095a8bf131e3874d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 17 19:02:15 2018 -0500

    Fixed minor issues in ecbebe7 with mt disabled.
    
    Details:
    - Fixed an unused variable warning in frame/base/bli_rntm.c when
      multithreading is disabled.
    - Fixed a missing variable declaration in bli_thread_init_rntm_from_env()
      when multithreading is disabled.

commit ecbebe7c2e43950dfa369f71c2b83cabe348a046
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 17 18:37:32 2018 -0500

    Defined rntm_t to relocate cntx_t.thrloop (#235).
    
    Details:
    - Defined a new struct datatype, rntm_t (runtime), to house the thrloop
      field of the cntx_t (context). The thrloop array holds the number of
      ways of parallelism (thread "splits") to extract per level-3
      algorithmic loop until those values can be used to create a
      corresponding node in the thread control tree (thrinfo_t structure),
      which (for any given level-3 invocation) usually happens by the time
      the macrokernel is called for the first time.
    - Relocating the thrloop from the cntx_t remedies a thread-safety issue
      when invoking level-3 operations from two or more application threads.
      The race condition existed because the cntx_t, a pointer to which is
      usually queried from the global kernel structure (gks), is supposed to
      be read-only. However, the previous code would write to the cntx_t's
      thrloop field *after* it had been queried, thus violating its read-only
      status. In practice, this would not cause a problem when a sequential
      application made a multithreaded call to BLIS, nor when two or more
      application threads used the same parallelization scheme when calling
      BLIS, because in either case all application threads would be using
      the same ways of parallelism for each loop. The true effects of the
      race condition were limited to situations where two or more application
      threads used *different* parallelization schemes for any given level-3
      call.
    - In remedying the above race condition, the application or calling
      library can now specify the parallelization scheme on a per-call basis.
      All that is required is that the thread encode its request for
      parallelism into the rntm_t struct prior to passing the address of the
      rntm_t to one of the expert interfaces of either the typed or object
      APIs. This allows, for example, one application thread to extract 4-way
      parallelism from a call to gemm while another application thread
      requests 2-way parallelism. Or, two threads could each request 4-way
      parallelism, but from different loops.
    - A rntm_t* parameter has been added to the function signatures of most
      of the level-3 implementation stack (with the most notable exception
      being packm) as well as all level-1v, -1d, -1f, -1m, and -2 expert
      APIs. (A few internal functions gained the rntm_t* parameter even
      though they currently have no use for it, such as bli_l3_packm().)
      This required some internal calls to some of those functions to
      be updated since BLIS was already using those operations internally
      via the expert interfaces. For situations where a rntm_t object is
      not available, such as within packm/unpackm implementations, NULL is
      passed in to the relevant expert interfaces. This is acceptable for
      now since parallelism is not obtained for non-level-3 operations.
    - Revamped how global parallelism is encoded. First, the conventional
      environment variables such as BLIS_NUM_THREADS and BLIS_*_NT are only
      read once, at library initialization. (Thanks to Nathaniel Smith for
      suggesting this to avoid repeated calls to getenv(), which can be slow.)
      Those values are recorded to a global rntm_t object. Public APIs, in
      bli_thread.c, are still available to get/set these values from the
      global rntm_t, though now the "set" functions have additional logic
      to ensure that the values are set in a synchronous manner via a mutex.
      If/when NULL is passed into an expert API (meaning the user opted to
      not provide a custom rntm_t), the values from the global rntm_t are
      copied to a local rntm_t, which is then passed down the function stack.
      Calling a basic API is equivalent to calling the expert APIs with NULL
      for the cntx and rntm parameters, which means the semantic behavior of
      these basic APIs (vis-a-vis multithreading) is unchanged from before.
    - Renamed bli_cntx_set_thrloop_from_env() to bli_rntm_set_ways_for_op()
      and reimplemented, with the function now being able to treat the
      incoming rntm_t in a manner agnostic to its origin--whether it came
      from the application or is an internal copy of the global rntm_t.
    - Removed various global runtime APIs for setting the number of ways of
      parallelism for individual loops (e.g. bli_thread_set_*_nt()) as well
      as the corresponding "get" functions. The new model simplifies these
      interfaces so that one must either set the total number of threads, OR
      set all of the ways of parallelism for each loop simultaneously (in a
      single function call).
    - Updated sandbox/ref99 according to above changes.
    - Rewrote/augmented docs/Multithreading.md to document the three methods
      (and two specific ways within each method) of requesting parallelism
      in BLIS.
    - Removed old, disabled code from bli_l3_thrinfo.c.
    - Whitespace changes to code (e.g. bli_obj.c) and docs/BuildSystem.md.
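
    Illustrative note: the "NULL means copy the global settings into a
    local object" behavior described above can be sketched as follows. This
    is a minimal sketch and does not use the BLIS API; runtime_t, N_LOOPS,
    global_rt_set_ways(), resolve_runtime(), and total_ways() are
    hypothetical names, and the number of loops is an assumption. Taking
    the mutex on the read path as well is a choice made for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define N_LOOPS 5 /* hypothetical number of parallelizable loops */

/* Hypothetical per-call runtime settings: ways of parallelism per loop. */
typedef struct { int ways[N_LOOPS]; } runtime_t;

/* Global defaults, guarded by a mutex so that updates are synchronous. */
static runtime_t       global_rt       = { { 1, 1, 1, 1, 1 } };
static pthread_mutex_t global_rt_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Sets all ways of parallelism at once, in a single call. */
void global_rt_set_ways(const int ways[N_LOOPS])
{
    pthread_mutex_lock(&global_rt_mutex);
    for (int i = 0; i < N_LOOPS; ++i) global_rt.ways[i] = ways[i];
    pthread_mutex_unlock(&global_rt_mutex);
}

/* Returns a private copy of the settings to pass down the call stack:
   the caller's own settings if provided, else a snapshot of the global
   ones. Nothing shared is written to after this point. */
runtime_t resolve_runtime(const runtime_t* user_rt)
{
    runtime_t local;

    if (user_rt != NULL)
    {
        local = *user_rt;
    }
    else
    {
        pthread_mutex_lock(&global_rt_mutex);
        local = global_rt;
        pthread_mutex_unlock(&global_rt_mutex);
    }
    return local;
}

/* Total parallelism is the product of the per-loop ways. */
int total_ways(const runtime_t* rt)
{
    int total = 1;
    for (int i = 0; i < N_LOOPS; ++i) total *= rt->ways[i];
    return total;
}
```

    Because each call works on its own copy, two application threads can
    request different parallelization schemes without interfering with
    one another.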

commit 323eaaab99752858b12e81e2eb8e416f009a3028
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date:   Fri Jul 13 11:40:06 2018 -0500

    Removed left over code from plotting scripts.

commit 60c197736495b47ce974ffb9b43874d1ebcfe78c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 12 19:22:14 2018 -0500

    Documented accessor functions in BLISObjectAPI.md.
    
    Details:
    - Added documentation to docs/BLISObjectAPI.md for a handful of
      commonly-used obj_t accessor functions.
    - Minor updates to docs/BLISTypedAPI.md.

commit 77327ad796e11ef67df0cc91d45ed663598ba4df
Merge: 73b0b2a3 9fef8575
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date:   Thu Jul 12 17:09:33 2018 -0500

    Merge branch 'master' of https://github.com/flame/blis

commit 73b0b2a3ac1be6dfbe85c116886b4e29d98ac945
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date:   Thu Jul 12 16:53:10 2018 -0500

    Created hardware-specific test driver directory.
    
    Details:
    - Created a 'studies' subdirectory within 'test' to be used to house
      test drivers, makefiles, run scripts, matlab plot code, and related
      files that have been customized for collecting performance data on
      specific host machines or product lines. This new setup will help us
      catalog, track, and share test driver materials over time, and in a
      way that facilitates reproducibility.
    - Created an 'skx' subdirectory within 'test/studies' to house various
      level-3 test driver files used to measure performance on SkylakeX
      nodes (specifically, those nodes used by TACC's stampede2 system).

commit 9fef85756d15ee0f977fff6e57acd01c20cba184
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 11 18:40:30 2018 -0500

    Cleaned up loose ends in BLISObjectAPI.md.
    
    Details:
    - Deleted some lines from the API function signatures that did not
      belong (and were only left over from the copy-paste of the typed API).
    - Fixed some paragraph-in-bullet indentation.

commit 80ddeae4629022b69fdf1f1b053a1fcba643c40c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 11 18:31:57 2018 -0500

    Added BLISObjectAPI.md to docs.
    
    Details:
    - Added first draft of BLISObjectAPI.md. (Object management section is
      still missing.)
    - Small fixes to BLISTypedAPI.md found while writing BLISObjectAPI.md.
    - In various .md files, changed ``` verbatim blocks to language
      attributes (e.g. ```c for C code).

commit 038442add39ce629fee0d960b212ce0c95138d46
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 11 12:24:18 2018 -0500

    Added -lpthread to makefile example in BuildSystem.md.
    
    Details:
    - Added missing pthreads library linking to example makefile in
      docs/BuildSystem.md, as well as similar language to build requirements
      at the beginning of the document. Thanks to Stefanos Mavros for
      bringing this to our attention.
    - Updated CREDITS file.

commit bf10d8624e7b5902c9d9189c7c93f318b8e1b9a5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 9 18:40:13 2018 -0500

    Small updates to KernelsHowTo.md, BLISTypedAPI.md.
    
    Details:
    - Minor updates to BLISTypedAPI.md, mostly to bring terminology
      up-to-date with the new "typed API" classification.
    - Added contents section to KernelsHowTo.md.

commit 1fd3bce59e43b422e62f9684bca9d1296a29edc3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 9 18:20:11 2018 -0500

    Further updates to KernelsHowTo.md, BLISTypedAPI.md.
    
    Details:
    - Added missing level-1v operations to BLISTypedAPI (e.g. axpbyv,
      xpbyv).
    - Updated broken links in KernelsHowTo.md based on misnamed anchors.
    - Other minor changes.

commit c40d30a6c920bd2e5a8353a3cd07a7e2b2265758
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 9 17:55:54 2018 -0500

    Updated KernelsHowTo.md, BLISTypedAPI.md.
    
    Details:
    - Added missing (basic) information in KernelsHowTo.md for level-1f and
      level-1v kernels.
    - Updated section regarding contexts.

commit f8913c2bf91c0e0fb4e68aedf64a242a19db92a0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 7 20:35:13 2018 -0500

    Fixed outdated scalv() calls in penryn l1f kernels.
    
    Details:
    - Fixed stale calls to dscalv() from the dotxf and dotxaxpyf penryn
      kernels that were not updated during the basic/expert API separation
      in e88aeda.

commit e78e71d549ac17ecd52c7b33008df1cd78f1b59e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 7 20:18:09 2018 -0500

    Added README.md mention/link to examples/tapi.
    
    Details:
    - Added language to README.md to bring the reader's attention to the
      example code for the typed API (in addition to those for the object
      API).

commit 419ffb158573a26bfec47bac73e4394e7926a7b8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 7 20:14:23 2018 -0500

    Updates to README.md.
    
    Details:
    - Updated wiki links according to renamed/relocated files in 'docs'.
    - Converted links to relative paths.
    - Added link to docs/Multithreading.md.

commit 7d3e8a7e5f1ec299d009fb6c9071f0c1b089b460
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 7 20:01:29 2018 -0500

    Reverted docs/*.md links to relative paths.
    
    Details:
    - Within the documents in docs/*.md, reverted links to other local
      documents to relative paths.
    - Fixed some links/documents that did not yet have the '.md' suffix.
    - Testing whether we can use relative links ('docs/BLISTypedAPI.md')
      from within README.md.

commit d97c862c2b9170d774f414e63ae365488fffb4f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 7 19:40:41 2018 -0500

    Updated links (URLs) in docs/*.md.
    
    Details:
    - Updated most markdown links in the documents/wikis to use absolute
      paths instead of the relative paths that were in use previously.
      A few links were not updated, except for adding a ".md" to reflect
      the documents' new names, in order to test whether relative
      linking still works.

commit 3a0c12135875e0fb04de9798664e4fae632d994e
Merge: 2c7960c8 bcacddfa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 7 16:51:38 2018 -0500

    Merge branch 'dev'

commit bcacddfad75b20969660606751eea6ead6c42ca9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 7 16:45:29 2018 -0500

    Added 'docs' directory with wiki markdown files.
    
    Details:
    - Exported all github wikis to a new 'docs' directory.
    - Renamed 'BLISAPIQuickReference' wiki to 'BLISTypedAPI' and removed
      all cntx_t* arguments from the (now non-expert) APIs (with the
      exception of the kernel APIs).
    - Added section to BuildSystem documenting new ARG_MAX hack.

commit 3ee2bc0f7aa3b08da92331d64271bee99eaf8c1d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 7 16:02:16 2018 -0500

    Renamed files that distinguish basic/expert APIs.
    
    Details:
    - Renamed various files that were previously named according to a
      "with context" or "without context" convention. For example, the
      following files in frame/3 were renamed:
    
        frame/3/bli_l3_oapi_woc.c -> frame/3/bli_l3_oapi_ba.c
        frame/3/bli_l3_oapi_wc.c  -> frame/3/bli_l3_oapi_ex.c
        frame/3/bli_l3_tapi_woc.c -> frame/3/bli_l3_tapi_ba.c
        frame/3/bli_l3_tapi_wc.c  -> frame/3/bli_l3_tapi_ex.c
    
      Here, the "ba" is for "basic" and "ex" is for "expert". This new
      naming scheme will make more sense especially if/when additional
      expert parameters are added to the expert APIs (typed and object).

commit e88aedae735dfeb6fa5ac28d4527eb3ca58c6510
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 6 19:14:02 2018 -0500

    Separated expert, non-expert typed APIs.
    
    Details:
    - Split existing typed APIs into two subsets of interfaces: one for use
      with expert parameters, such as the cntx_t*, and one without. This
      separation was already in place for the object APIs, and after this
      commit the typed and object APIs will have similar expert and non-
      expert APIs. The expert functions will be suffixed with "_ex" just as
      is the case for expert interfaces in the object APIs.
    - Updated internal invocations of typed APIs (functions such as
      bli_?setm() and bli_?scalv()) throughout BLIS to reflect use of the
      new explicitly expert APIs.
    - Updated example code in examples/tapi to reflect the existence (and
      usage) of non-expert APIs.
    - Bumped the major soname version number in 'so_version'. While code
      compiled against a previous version/commit will likely still work
      (since the old typed function symbol names still exist in the new API,
      just with one less function argument) the semantics of the function
      have changed if the cntx_t* parameter the application passes in is
      non-NULL. For example, calling bli_daxpyv() with a non-NULL context
      does not behave the same way now as it did before; before, the
      context would be used in the computation, and now the context would
      be ignored since the interface for that function no longer expects a
      context argument.
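    
    Illustration (not part of the commit): a minimal sketch of the
    basic/expert split, assuming the basic function simply forwards to
    its "_ex" counterpart with NULL for the expert parameter. The names
    my_cntx_t, my_daxpyv() and my_daxpyv_ex() are hypothetical, and the
    'scale' field is a toy member that exists only so the effect of a
    context is observable.

```c
#include <stddef.h>

/* Hypothetical context type standing in for cntx_t. */
typedef struct { double scale; } my_cntx_t;

/* Default context, used when the caller passes NULL. */
static const my_cntx_t my_default_cntx = { 1.0 };

/* Expert interface: "_ex" suffix, takes the context explicitly. */
void my_daxpyv_ex( size_t n, double alpha, const double* x, double* y,
                   const my_cntx_t* cntx )
{
    if ( cntx == NULL ) cntx = &my_default_cntx;

    for ( size_t i = 0; i < n; ++i )
        y[ i ] += cntx->scale * alpha * x[ i ];
}

/* Basic interface: same name without the suffix and without a context. */
void my_daxpyv( size_t n, double alpha, const double* x, double* y )
{
    my_daxpyv_ex( n, alpha, x, y, NULL );
}
```

    Because the basic function no longer has a context parameter, a
    caller that needs a non-default context has to use the "_ex" variant.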

commit 331694e52414c0cd50048daf880a9ace9e29b94a
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Fri Jul 6 09:07:38 2018 -0600

    Fix windows build and enable x86_64 on appveyor (#230)
    
    * Upload artifacts built on appveyor (#228)
    
    * Upload artifacts
    
    * Fix install in appveyor
    
    * Remove windows.h in bli_winsys.c (#229)
    
    Looks like it is unneeded.
    
    * Implemented ARG_MAX hack in configure, Makefile.
    
    Details:
    - Added support for --enable-arg-max-hack to configure, which will
      change the behavior of make when building BLIS so that rather than
      invoke the archiver/linker with all of the object files as command
      line arguments, those object files are echoed to a temporary file
      and then the archiver/linker is fed that temporary file via the @
      notation. An example of this can be found in the GNU make docs at
      https://www.gnu.org/software/make/manual/make.html#File-Function
    - Thanks to Isuru Fernando for prompting this feature.
    
    * Enable x86_64 and arg-max-hack on appveyor
    
    * Use gas style assembly for clang on windows

commit a64a780d28c99d35f237f59212772e9beff35b3e
Merge: 89e178ce 3cb396d1
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jul 6 09:38:42 2018 -0500

    Merge pull request #231 from flame/travis-pr
    
    Disable SDE for PRs

commit 3cb396d1ae4ee569f862db201c6a976712fd128e
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jul 6 09:19:44 2018 -0500

    Disable SDE for PRs
    
    Pull requests cannot use Travis secret variables, so SDE needs to be disabled. This PR should suffice as a test.

commit 2c7960c8416ee9b67364be5f2b210fd7a0aec4b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 5 14:38:33 2018 -0500

    Implemented ARG_MAX hack in configure, Makefile.
    
    Details:
    - Added support for --enable-arg-max-hack to configure, which will
      change the behavior of make when building BLIS so that rather than
      invoke the archiver/linker with all of the object files as command
      line arguments, those object files are echoed to a temporary file
      and then the archiver/linker is fed that temporary file via the @
      notation. An example of this can be found in the GNU make docs at
      https://www.gnu.org/software/make/manual/make.html#File-Function
    - Thanks to Isuru Fernando for prompting this feature.

commit c422a5cd191d47e6aeb9cea6de0e348f46e3e318
Merge: b6470262 89e178ce
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 5 12:33:35 2018 -0500

    Merge branch 'dev'

commit b6470262ea66c0f48a5b4d85ca4bf85c1fb2b3af
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Jul 4 19:14:29 2018 -0600

    Remove windows.h in bli_winsys.c (#229)
    
    Looks like it is unneeded.

commit eac4bdf98691c5ec784af0dc11d1ad2269840661
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Jul 4 18:31:01 2018 -0600

    Upload artifacts built on appveyor (#228)
    
    * Upload artifacts
    
    * Fix install in appveyor

commit 89e178ce380439dea951925e33703dc4b979e914
Merge: d868eb3e e32b2ef9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 4 17:51:16 2018 -0500

    Merge branch 'master' into dev

commit e32b2ef983ea1c3521dd3821116c0078690f125e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 4 17:49:39 2018 -0500

    Update to CREDITS file.

commit 14648e137696484e0ff04f89b16c6b4183ea42b8
Author: Isuru Fernando <isuruf@gmail.com>
Date:   Wed Jul 4 16:48:42 2018 -0600

    Native windows support using clang (#227)
    
    * Add appveyor file
    
    * Build script
    
    * Remove fPIC for now
    
    * copy as
    
    * set CC and CXX
    
    * Change the order of immintrin.h
    
    * Fix testsuite header
    
    * Move testsuite defs to .c
    
    * Fix appveyor file
    
    * Remove fPIC again and fix strerror_r missing bug
    
    * Remove appveyor script
    
    * cd to blis directory
    
    * Fix sleep implementation
    
    * Add f2c_types_win.h
    
    * Fix f2c compilation
    
    * Remove rdp and rename appveyor.yml
    
    * Remove setenv declaration in test header
    
    * set CPICFLAGS to empty
    
    * Fix another immintrin.h issue
    
    * Escape CFLAGS and LDFLAGS
    
    * Fix more ?mmintrin.h issues
    
    * Build x86_64 in appveyor
    
    * override LIBM LIBPTHREAD AR AS
    
    * override pthreads in configure
    
    * Move windows definitions to bli_winsys.h
    
    * Fix LIBPTHREAD default value
    
    * Build intel64 in appveyor for now

commit b45ea92fc6f77f2313b50dbe95922f838cbead07
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 3 18:27:29 2018 -0500

    Added typed (BLAS-like) API code examples.
    
    Details:
    - Added new example code to examples/tapi demonstrating how to use the
      BLIS typed API. These code examples directly mirror the corresponding
      example code files in examples/oapi. This setup provides a convenient
      opportunity for newcomers to BLIS to compare and contrast the typed
      and object APIs when they are used to perform the same tasks.
    - Minor cleanups to examples/oapi.

commit d868eb3e200f657a1284c4cc933e7a4d25260dce
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jun 29 12:36:04 2018 -0500

    Implemented bli_obj_scalar_cast_to().
    
    Details:
    - Implemented bli_obj_scalar_cast_to(), which will typecast the value in
      the internal scalar of an obj_t to a specified datatype.
    - Changed bli_obj_scalar_attach() so that the scalar value being attached
      is first typecast to the storage datatype of the destination object
      rather than the target datatype.
    - Reformatted function type signatures in bli_obj_scalar.c as well as
      prototypes in its corresponding header file.

commit 52d80b5f09517d80ac8a7c96983a576c1ec2080b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jun 29 12:30:44 2018 -0500

    Fixed static funcs related to target and exec dts.
    
    Details:
    - Fixed incorrect bit shifts in the following static functions:
        bli_obj_set_target_domain()
        bli_obj_set_target_prec()
        bli_obj_set_exec_domain()
        bli_obj_set_exec_prec()
    - Fixed incorrect bitmask in bli_dt_proj_to_single_prec().
    - Updated bli_obj_real_part() and bli_obj_imag_part() so that they update
      the target and exec datatypes (in addition to the storage datatypes).
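    
    Illustration (not part of the commit): a minimal sketch of the
    mask-and-shift logic that setters of this kind rely on. The bit
    layout (a target datatype field starting at bit 10) and all names
    are hypothetical and are not taken from the BLIS headers.

```c
#include <stdint.h>

/* Hypothetical layout: the "target datatype" field starts at bit 10.
   Within the field, bit 0 is the domain and bit 1 is the precision. */
#define MY_TARGET_DT_SHIFT    10
#define MY_DOMAIN_BIT         0x1u
#define MY_PRECISION_BIT      0x2u
#define MY_TARGET_DOMAIN_BIT  ( MY_DOMAIN_BIT    << MY_TARGET_DT_SHIFT )
#define MY_TARGET_PREC_BIT    ( MY_PRECISION_BIT << MY_TARGET_DT_SHIFT )

/* Clear the target domain bit, then OR in the new value shifted into place.
   Shifting by the wrong amount would silently corrupt a neighboring field. */
static inline uint32_t my_set_target_domain( uint32_t info, uint32_t dom )
{
    return ( info & ~MY_TARGET_DOMAIN_BIT ) |
           ( ( dom & MY_DOMAIN_BIT ) << MY_TARGET_DT_SHIFT );
}

/* Same pattern for the precision bit of the target datatype field. */
static inline uint32_t my_set_target_prec( uint32_t info, uint32_t prec )
{
    return ( info & ~MY_TARGET_PREC_BIT ) |
           ( ( prec & MY_PRECISION_BIT ) << MY_TARGET_DT_SHIFT );
}
```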

commit e006f2d0eeb229c1cd05a424496a774c29bdc5d7
Merge: bd8c55fe dafca7a0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 27 15:54:38 2018 -0500

    Merge branch 'dev' of github.com:flame/blis into dev

commit bd8c55fe268e8e352508341ebd739ef4fc68eb92
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 27 15:52:37 2018 -0500

    Added dt_on_output field to auxinfo_t.
    
    Details:
    - Added a new field to the auxinfo_t struct that can be used, in theory,
      to request type conversion before the microkernel stores/accumulates
      its microtile back to memory.
    - Added the appropriate get/set static functions to bli_type_defs.h.

commit dafca7a0c2c72aaf15cb588b2bef6f246abb1905
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Jun 25 16:20:10 2018 -0500

    Fix botched memory addressing in Penryn kernel (no effect for GAS output).

commit de493b0f349efebab98ab17f063d4d3d932c24c3
Merge: 195480be a7166feb
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Jun 25 14:26:06 2018 -0500

    Merge pull request #226 from devinamatthews/dev
    
    Finish macroization of assembly ukernels.

commit 195480beb589db7d582646f556e855c611d4c3a9
Merge: 07c3d0a9 3f387ca3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 25 13:24:21 2018 -0500

    Merge branch 'master' into dev

commit 3f387ca35e42519f0d6a154814e4c8800fa2acb8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 25 12:32:03 2018 -0500

    Fixed bugs in configure's select_cc() function.
    
    Details:
    - This commit fixes several bugs in configure relating to selecting a C
      compiler. By dumb luck, two of the bugs sort of cancelled each
      other out in most use cases, which manifested as the expected behavior.
      Thanks to Mathieu Poumeyrol for bringing this issue to our attention,
      and to Devin Matthews for suggesting the more portable way of
      capturing both stdout and stderr and suggesting a return code check
      instead of testing stdout/stderr.
    - The first bug: As the values of the compiler search list are iterated
      over, only stderr is captured when querying a compiler with --version
      rather than both stdout and stderr.
    - The second bug: After each query, a conditional attempted to test
      whether the query resulted in anything being output. That conditional
      erroneously was using "-z" instead of "-n" for non-emptiness. Thus,
      most of the time, stderr was empty (because the --version info was
      being output on stdout), and since it was empty, the -z conditional
      (intended to execute only when a compiler was found to be responsive)
      executed.
    - A third bug was also fixed in the way that the merged stdout/stderr
      output was tested for non-emptiness (moving the 'cat' invocation to
      another line and testing the contents of a variable instead).
    - The three bugs above have been fixed as part of a partial rewrite of
      the select_cc() function in terms of a return code check, which
      obviated the need to save the output of stdout and stderr.
    - The fourth bug involved a misnamed variable in the right-hand side
      of a statement intended to prepend CC to search_list when CC was
      non-empty. This typically did not manifest as a bug since usually CC
      (if it was set) was set to a value that was known to work.

commit a7166feb1053814b7dd27f3879ae38acfc9637fc
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Jun 25 12:09:18 2018 -0500

    Finish macroization of assembly ukernels.

commit f986396c2af5de06283b9834112782afd0a8907e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jun 22 18:12:40 2018 -0500

    Added 'configure --help' text for CFLAGS, LDFLAGS.
    
    Details:
    - Added mention of the new support for preset CFLAGS, LDFLAGS to the
      bottom of the text output by './configure --help'.
    - Updated usage example to use 'haswell' instead of 'sandybridge'.

commit 884175d9ffb62e49535e6c1f7d58fb3b83e7e78f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jun 22 18:08:43 2018 -0500

    Added configure support for preset CFLAGS, LDFLAGS.
    
    Details:
    - Any preexisting values set to the CFLAGS environment variable (or the
      CFLAGS variable if given on the command line) are saved by configure
      for later inclusion (prepending, to be precise) along with the
      compiler flags automatically determined by the BLIS build system.
      (LDFLAGS is treated in a similar manner.) Thanks to Dave Love for
      requesting this feature in issue #223 and Mathieu Poumeyrol for his
      support on this and a previous related issue.
    - Comment updates to build/config.mk.in.
    - Strip whitespace from return value of various cflags functions in
      common.mk.

commit 07c3d0a95190bd23f0cd2ef220deb3384d8378d1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 21 12:35:07 2018 -0500

    Update to CREDITS file.

commit a1ebbbf158c7b34c9032ef45431bc610b6f14858
Merge: 17928b1c c81c6f23
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Jun 20 15:37:53 2018 -0500

    Merge pull request #224 from devinamatthews/asm-macros
    
    Asm macros

commit c81c6f23b9547b5d55ae68fd5a3bbd8a78290b6b
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Jun 20 15:20:44 2018 -0500

    Fix problem with inc and dec macros.

commit 5a63971c822fd452f97ba869625c8e87f6cbeebc
Merge: b4d94e54 17928b1c
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Jun 20 14:07:49 2018 -0500

    Merge remote-tracking branch 'upstream/dev' into asm-macros

commit b4d94e54d44cf30e4bb452ca5263be3473c0582d
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Jun 20 14:07:24 2018 -0500

    Convert x86 microkernels to assembly macros.

commit 17928b1c9941aa58aef1f122c793e2b14e705267
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jun 19 17:59:03 2018 -0500

    Added static funcs bli_dt_domain(), bli_dt_prec().
    
    Details:
    - Added definitions of static functions bli_dt_domain()/bli_dt_prec(),
      which extract a dom_t domain or prec_t precision value, respectively,
      from a num_t datatype.
    - Changed the return types of bli_obj_domain() and bli_obj_prec() from
      objbits_t to dom_t and prec_t. (Not sure why they were ever set to
      return objbits_t.)
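    
    Illustration (not part of the commit): a minimal sketch of
    extracting a domain and a precision from a datatype value, assuming
    the datatype is encoded with one domain bit and one precision bit.
    The enum names and values below are hypothetical.

```c
/* Hypothetical encoding: bit 0 is the domain, bit 1 is the precision. */
typedef enum
{
    MY_FLOAT    = 0,
    MY_SCOMPLEX = 1,
    MY_DOUBLE   = 2,
    MY_DCOMPLEX = 3
} my_num_t;

typedef enum { MY_REAL = 0, MY_COMPLEX = 1 }         my_dom_t;
typedef enum { MY_SINGLE_PREC = 0, MY_DOUBLE_PREC = 2 } my_prec_t;

/* Return a dedicated domain type rather than a raw bitfield type. */
static inline my_dom_t my_dt_domain( my_num_t dt )
{
    return ( my_dom_t )( dt & 0x1 );
}

/* Return a dedicated precision type rather than a raw bitfield type. */
static inline my_prec_t my_dt_prec( my_num_t dt )
{
    return ( my_prec_t )( dt & 0x2 );
}
```

    Returning dedicated enum types, as the second bullet describes for
    the obj_t accessors, lets the compiler and the reader tell domain
    values apart from precision values.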

commit 5f7fbb7115b1bf532c169dfd9adef84c41a95031
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jun 19 15:38:55 2018 -0500

    Static funcs for projecting dt to single/double.
    
    Details:
    - Added static functions for projecting a datatype to single precision
      or double precision, both for obj_t's storage datatypes and standalone
      datatypes.
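    
    Illustration (not part of the commit): a minimal sketch of
    projecting a standalone datatype to single or double precision,
    assuming the precision is encoded in a single bit. The names and
    enum values are hypothetical.

```c
/* Hypothetical encoding: bit 0 is the domain, bit 1 is the precision. */
typedef enum
{
    MY_FLOAT    = 0,
    MY_SCOMPLEX = 1,
    MY_DOUBLE   = 2,
    MY_DCOMPLEX = 3
} my_num_t;

#define MY_PRECISION_BIT 0x2

/* Clear the precision bit; the domain (real/complex) is preserved. */
static inline my_num_t my_dt_proj_to_single_prec( my_num_t dt )
{
    return ( my_num_t )( dt & ~MY_PRECISION_BIT );
}

/* Set the precision bit; the domain (real/complex) is preserved. */
static inline my_num_t my_dt_proj_to_double_prec( my_num_t dt )
{
    return ( my_num_t )( dt | MY_PRECISION_BIT );
}
```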

commit d4a22702c7a90273dc14f271db465c2e11e5b87e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jun 19 14:54:57 2018 -0500

    Set up haswell config for optional col-pref ukrs.
    
    Details:
    - Added two presently-disabled cpp blocks in bli_cntx_init_haswell.c to
      easily allow one to switch to a set of column-preferential gemm
      microkernels (in the haswell subconfiguration). The second column-
      preferring block sets the register blocksizes to their appropriate
      values. However, cache blocksizes are left unchanged, and therefore are
      likely suboptimal. This should be addressed later.

commit f317c2e31bfc329cb6bb4e06005e45b9c8a9d6a7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jun 19 12:21:23 2018 -0500

    Added get/set static funcs for exec dt/dom/prec.
    
    Details:
    - Added functions to bli_obj_macro_defs.h to get and set the target
      domain and target precision bits in the obj_t, and also added the
      appropriate support in bli_type_defs.h.

commit e88a5b8da8c26caebd2b0fb73b30836fb5417c9c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 18 15:56:26 2018 -0500

    Implemented castm, castv operations.
    
    Details:
    - Implemented castm and castv operations, which behave like copym and
      copyv except where the obj_t operands can be of different datatypes.
      These new operations, however, unlike copym/copyv, do not build upon
      existing level-1v kernels.
    - Reorganized projm, projv into a 'proj' subdirectory of frame/base (to
      match the newly added frame/base/cast directory).
    - Added new macros to bli_gentfunc_macro_defs.h, _gentprot_macro_defs.h
      that insert GENTFUNC2/GENTPROT2 macros for all non-homogeneous datatype
      combinations. Previously, one had to invoke two additional macros--one
      which mixed domains only and another that included all remaining
      cases--in order to get full type combination coverage.
    - Defined a new static function, bli_set_dims_incs_2m(), to aid in the
      setting of various variables in the implementations of bli_??castm().
      This static function joins others like it in bli_param_macro_defs.h.
    - Comment update to bli_copysc.h.
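    
    Illustration (not part of the commit): a minimal sketch of a
    cast-while-copying matrix operation for one datatype pair (float to
    double), written as plain loops rather than on top of a vector
    kernel. The function name my_sdcastm() and its signature are
    hypothetical.

```c
#include <stddef.h>

/* Copy an m x n float matrix A into a double matrix B, converting each
   element. rs_* and cs_* are the row and column strides, in elements. */
void my_sdcastm( size_t m, size_t n,
                 const float* a, size_t rs_a, size_t cs_a,
                 double*      b, size_t rs_b, size_t cs_b )
{
    for ( size_t j = 0; j < n; ++j )
        for ( size_t i = 0; i < m; ++i )
            b[ i*rs_b + j*cs_b ] = ( double )a[ i*rs_a + j*cs_a ];
}
```

    Because both operands carry their own strides, the same loop also
    handles a change of storage format (for example, column-stored input
    and row-stored output).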

commit 2000cdff59272974438e88e0e82d8e1a32710325
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 18 14:17:28 2018 -0500

    Update to CREDITS file.

commit ed2c8aed848ba2dede18df090cf2e0b6e4cc059f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 18 11:49:34 2018 -0500

    Temporarily disabled small matrix handling on zen.
    
    Details:
    - Disabled small matrix handling in config/zen/bli_family_zen.h due to
      what appears to be a bug that manifests as failures in the single and
      double precision real level-3 BLAS test drivers (visible via
      out.sblat3 and out.dblat3). Thanks to Robin Christ for reporting this
      issue.

commit ed20392c500940bfc0947795c1ff7c8c24f8e26f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jun 15 16:31:22 2018 -0500

    Added get/set static funcs for exec dt/dom/prec.
    
    Details:
    - Added functions to bli_obj_macro_defs.h to get and set the execution
      domain and execution precision bits in the obj_t.
    - Added/rearranged a few functions in bli_obj_macro_defs.h.
    - Renamed some macros in bli_type_defs.h: EXECUTION -> EXEC.

commit 22594e8e9ab55f5bc0e69d96a23e128502849999
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 14 17:35:23 2018 -0500

    Updated sandbox/ref99 according to f97a86f.
    
    Details:
    - Applied changes to ref99 sandbox analogous to those applied to
      framework code in f97a86f. This involves setting the pack schemas of
      A and B objects temporarily to communicate those desired schemas to
      the control tree creation function in blx_gemm_cntl.c. This allows us
      to (henceforth) query the schemas from the control tree rather than
      the context.

commit 1b5d0424d2c7e5eac33e02359c12917ef280949f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 13 18:41:32 2018 -0500

    Prototype column-preferential zen gemm ukernels.
    
    Details:
    - Added prototypes to bli_kernels_zen.h for each of the four gemm
      microkernels that prefer outputting to column storage.

commit f88c2e7a539e383297e846e6d4647058dd3db128
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 13 18:27:46 2018 -0500

    Defined static function bli_blksz_scale_def_max().
    
    Details:
    - Added a new static function to bli_blksz.h that scales both the default
      (regular) blocksize as well as the maximum blocksize in the blksz_t
      object. Reminder: maximum blocksizes have different meanings in
      different contexts. For register blocksizes, they refer to the packing
      register blocksizes (PACKMR or PACKNR) while for cache blocksizes, they
      refer to the maximum blocksize to use during the final iteration of a
      loop.
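    
    Illustration (not part of the commit): a minimal sketch of scaling
    both the default and the maximum blocksize by a rational factor. The
    struct my_blksz_t holds values for a single datatype only, and the
    signature of my_blksz_scale_def_max() is an assumption.

```c
#include <assert.h>

/* Hypothetical blocksize object holding values for one datatype only. */
typedef struct
{
    int def; /* default (regular) blocksize */
    int max; /* maximum blocksize           */
} my_blksz_t;

/* Scale both values by num/den using integer arithmetic. */
void my_blksz_scale_def_max( int num, int den, my_blksz_t* b )
{
    assert( den != 0 );

    b->def = ( b->def * num ) / den;
    b->max = ( b->max * num ) / den;
}
```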

commit 87db5c048e0c7f37351fda486abaf7d19fc5821c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jun 12 19:38:37 2018 -0500

    Changed usage of virtual microkernel slots in cntx.
    
    Details:
    - Changed the way virtual microkernels are handled in the context.
      Previously, there were query routines such as bli_cntx_get_l3_ukr_dt()
      which returned the native ukernel for a datatype if the method was
      equal to BLIS_NAT, or the virtual ukernel for that datatype if the
      method was some other value. Going forward, the context native and
      virtual ukernel slots will both be initialized to native ukernel
      function pointers for native execution, and for non-native execution
      the virtual ukernel pointer will be something else. This allows us
      to always query the virtual ukernel slot (from within, say, the
      macrokernel) without needing any logic in the query routine to decide
      which function pointer (native or virtual) to return. (Essentially,
      the logic has been shifted to init-time instead of compute-time.)
      This scheme will also allow generalized virtual ukernels as a way
      to insert extra logic in between the macrokernel and the native
      microkernel.
    - Initialize native contexts (in bli_cntx_ref.c) with native ukernel
      function addresses stored to the virtual ukernel slots pursuant to
      the above policy change.
    - Renamed all static functions that were native/virtual-ambiguous, such
      as bli_cntx_get_l3_ukr_dt() or bli_cntx_l3_ukr_prefers_cols_dt()
      pursuant to the above policy change. Those routines now use the
      substring "get_l3_vir_ukr" in their name instead of "get_l3_ukr". All
      of these functions were static functions defined in bli_cntx.h, and
      most uses were in level-3 front-ends and macrokernels.
    - Deprecated anti_pref bool_t in context, along with related functions
      such as bli_cntx_l3_ukr_eff_dislikes_storage_of(), now that 1m's
      panel-block execution is disabled.
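    
    Illustration (not part of the commit): a minimal sketch of moving
    the native-versus-virtual decision to init-time, so that the query
    always returns the virtual slot. All types and functions here are
    hypothetical, and the toy microkernels operate on a single double.

```c
/* Hypothetical microkernel function pointer type. */
typedef void ( *my_ukr_ft )( double* c );

/* Hypothetical context with one native and one virtual slot. */
typedef struct
{
    my_ukr_ft nat_ukr;
    my_ukr_ft vir_ukr;
} my_cntx_t;

/* Toy native microkernel. */
static void my_native_ukr( double* c ) { *c += 1.0; }

/* Toy virtual microkernel: extra logic, then a call to the native one. */
static void my_virtual_ukr( double* c ) { *c *= 2.0; my_native_ukr( c ); }

/* Init-time: for native execution both slots hold the native pointer. */
void my_cntx_init( my_cntx_t* cntx, int is_native )
{
    cntx->nat_ukr = my_native_ukr;
    cntx->vir_ukr = ( is_native ? my_native_ukr : my_virtual_ukr );
}

/* Compute-time query: no branching on the execution method is needed. */
my_ukr_ft my_cntx_get_vir_ukr( const my_cntx_t* cntx )
{
    return cntx->vir_ukr;
}
```

    A caller such as a macrokernel would then invoke whatever
    my_cntx_get_vir_ukr() returns, without knowing which execution
    method is in effect.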

commit dbaf440540837b03643190cd685ed889fa7fd212
Merge: 22aa44eb 2610fff0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 11 12:37:04 2018 -0500

    Merge branch 'master' into dev

commit 2610fff0b07bdb345cb2e334ef6bea0c63c8cead
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 11 12:32:54 2018 -0500

    Renamed 1m packm kernels from _1e to _1er.
    
    Details:
    - Renamed the reference packm kernels used by 1m. Previously, they used
      a _1e suffix, which was confusing since they packed to both 1e and 1r
      schemas. This was likely an artifact of the time when there were
      separate kernels for each schema before I decided to combine them into
      a single function (per datatype and panel dimension), and the 1e
      functions were the ones to inherit the 1r functionality. The kernels
      have now been renamed to use a _1er suffix.

commit 7af5283dcc3dded114852d6013d33134021b81aa
Author: sraut <Biplab.Raut@amd.com>
Date:   Mon Jun 11 15:00:22 2018 +0530

    added check condition on n-dimension for XA'=B intrinsic code to process till 128 size
    
    Change-Id: I95d020a5ca3ea21d446b8c2e379d56e1eea18530

commit 712de9b371a8727682352a2f52cd4880de905f0b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jun 9 14:36:30 2018 -0500

    Added missing semicolon in 03obj_view.c
    
    Details:
    - Thanks to Tony Skjellum for pointing out this typo due to a
      last-minute change to the source prior to committing.

commit 043d0cd37ef4a27b1901eeb89d40083cfb2a57ba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jun 9 13:46:49 2018 -0500

    Implemented bli_acquire_mpart(), added example code.
    
    Details:
    - Implemented bli_acquire_mpart(), a general-purpose submatrix view
      function that will alias an obj_t to be a submatrix "view" of an
      existing obj_t.
    - Renumbered examples in examples/oapi and inserted a new example file,
      03obj_view.c, which shows how to use bli_acquire_mpart() to obtain
      submatrix views of existing objects, which can then be used to
      indirectly modify the parent object.
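    
    Illustration (not part of the commit): a minimal sketch of a
    submatrix "view" that aliases the parent's buffer instead of copying
    it. The struct my_obj_t and the function my_acquire_mpart() are
    simplified, hypothetical stand-ins for obj_t and bli_acquire_mpart().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical matrix object: buffer, dimensions, and strides. */
typedef struct
{
    double* buf;
    size_t  m, n;   /* number of rows and columns       */
    size_t  rs, cs; /* row and column strides (elements) */
} my_obj_t;

/* Make 'child' an m x n view of 'parent' whose top-left element is (i,j). */
void my_acquire_mpart( size_t i, size_t j, size_t m, size_t n,
                       const my_obj_t* parent, my_obj_t* child )
{
    assert( i + m <= parent->m && j + n <= parent->n );

    *child     = *parent; /* inherit the strides */
    child->buf = parent->buf + i*parent->rs + j*parent->cs;
    child->m   = m;
    child->n   = n;
}
```

    Since the view shares storage with its parent, writing through the
    child modifies the corresponding elements of the parent.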

commit f1908d39767baef56077def69126d96f805ee27e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jun 8 14:22:22 2018 -0500

    Fixed broken input.operations.fast.
    
    Details:
    - Removed three input lines from input.operations.fast (labeled
      "test sequential micro-kernel") that I intended to remove in bd02c4e.
      These lines prevented 'make check' (and 'make checkblis-fast') from
      completing correctly. Note: This bug was fixed in 3df39b3, but that
      commit has not yet been merged into master, hence this redundant
      commit. Thanks to Robert van de Geijn for reporting this issue.

commit 262a62e3482c5caa947a89cabb562b5887555bd6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jun 8 12:10:54 2018 -0500

    Fixed undefined ref in steamroller/excavator configs.
    
    Details:
    - Fixed erroneous calls to bli_cntx_init_piledriver_ref() in
      bli_cntx_init_steamroller() and bli_cntx_init_excavator(), which
      should have been to their respectively-named bli_cntx_init_*()
      functions instead. Thanks to qnerd for bringing these bugs to our
      attention.

commit 22aa44ebec2c7884bdc944775a1aa7534ab53f0d
Merge: 65fae950 b65d0b84
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 7 17:42:59 2018 -0500

    Merge branch 'dev' of github.com:flame/blis into dev

commit 65fae95074d239354737355bbe6f202d4f8b2871
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 7 17:41:09 2018 -0500

    Implemented bli_setrm, _setim, _setrv, _setiv.
    
    Details:
    - Defined new wrappers to setm/setv operations in frame/base/bli_setri.c
      that will target only the real or only the imaginary parts of a
      matrix/vector object.
    - Updated bli_obj_real_part() so that the complex-specific portions of
      the function are not executed if the object is real.
    - Defined bli_obj_imag_part().
      - Caveat: If bli_obj_imag_part() is called on a real object, it does
        nothing, leaving the destination object untouched. The caller must
        take care to only call the function on complex objects.
    - Reordered some of the static functions in bli_obj_macro_defs.h related
      to aliasing.
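
    A rough sketch of how a "real part only" or "imaginary part only"
    set operation can be expressed as a strided real-vector operation.
    This is an illustration, not the code in frame/base/bli_setri.c:
    setv_real(), set_real_parts(), and set_imag_parts() are hypothetical
    names, and C99 double complex is used in place of BLIS's own complex
    types.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Set every element of a strided real vector to alpha. */
static void setv_real( size_t n, double alpha, double* x, size_t incx )
{
    assert( incx > 0 );
    for ( size_t i = 0; i < n; ++i )
        x[ i * incx ] = alpha;
}

/* C99 guarantees that a double complex is laid out as two consecutive
   doubles (real, then imaginary), so the real parts of a contiguous
   complex vector form a real vector with stride 2. */
static void set_real_parts( size_t n, double alpha, double complex* x )
{
    setv_real( n, alpha, ( double* )x, 2 );
}

/* The imaginary parts form the same kind of vector, offset by one. */
static void set_imag_parts( size_t n, double alpha, double complex* x )
{
    setv_real( n, alpha, ( double* )x + 1, 2 );
}
```

    Each wrapper touches only its half of every element, leaving the
    other half unchanged.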

commit b65d0b841b7e4357bc2cf743bbb03384a3ab0bfa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 7 14:38:41 2018 -0500

    Fixed bug in bli_dt_proj_to_complex().
    
    Details:
    - Fixed a bug identical to the one fixed in 0a4a27e, except this time in
      the bli_obj_param_defs.h header file. It looks like the only consumers
      of this static function were in bli_l0_oapi.c, and so this may not have
      been manifesting (yet).

commit 55b6abdf7458e31df3ad01796d67c2332c776948
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 7 14:08:12 2018 -0500

    Enforce consistent datatypes in most object APIs.
    
    Details:
    - Added logic to level-1v, -1d, -1f, -1m, -2, and -3 operations' _check()
      functions to ensure that all operands are of the same datatype. There
      are some exceptions that were left out, such as the _check() function
      for the various norm operations since they have a different idea of
      datatype consistency (i.e., the norm object must be the real projection
      of the primary input vector/matrix object).

commit 513138b1a1ecebd015580423c779810cae5c67f2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 7 12:24:47 2018 -0500

    Defined/implemented bli_projv().
    
    Details:
    - Added an implementation for bli_projv() to go along with the
      implementation of bli_projm() added in 0a4a27e. The only difference
      between the two is that bli_projv() may only be used on vectors,
      whereas bli_projm() is general-purpose.
    - Added a _check() function corresponding to bli_projv().

commit 5f71c1e719eb482b2a4e40daa280c4f7d05b6963
Merge: b5a641e9 3df39b37
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 6 19:06:14 2018 -0500

    Merge branch 'dev' of github.com:flame/blis into dev

commit b5a641e968469805906eb2c971384d12ad1beac5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 6 19:05:37 2018 -0500

    Added char-to-dt and dt-to-char mapping functions.
    
    Details:
    - Defined additional functions in bli_param_map.c:
        bli_param_map_char_to_blis_dt()
        bli_param_map_blis_to_char_dt()
      which will map a char to its corresponding num_t, or vice versa.
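
    A small sketch of such a two-way mapping, assuming the conventional
    BLAS-style datatype characters 's', 'd', 'c', and 'z'. The enum dt_t
    and the functions map_char_to_dt() and map_dt_to_char() are
    hypothetical stand-ins, not the definitions in bli_param_map.c, and
    the boolean failure return is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for num_t. */
typedef enum { DT_FLOAT, DT_SCOMPLEX, DT_DOUBLE, DT_DCOMPLEX } dt_t;

/* Map a datatype character to its enum value. Returns false, leaving
   *dt untouched, if the character is not recognized. */
static bool map_char_to_dt( char c, dt_t* dt )
{
    assert( dt != NULL );

    switch ( c )
    {
        case 's': *dt = DT_FLOAT;    return true;
        case 'c': *dt = DT_SCOMPLEX; return true;
        case 'd': *dt = DT_DOUBLE;   return true;
        case 'z': *dt = DT_DCOMPLEX; return true;
        default:                     return false;
    }
}

/* Map an enum value back to its datatype character. The table is
   indexed in the same order as the enum above. */
static char map_dt_to_char( dt_t dt )
{
    static const char chars[] = { 's', 'c', 'd', 'z' };

    assert( ( int )dt >= 0 && ( int )dt < 4 );
    return chars[ dt ];
}
```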

commit 0a4a27e1a4487480410bc0b1bb034bcf97583214
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 6 19:02:29 2018 -0500

    Defined/implemented bli_projm().
    
    Details:
    - Defined a new operation in frame/base/bli_proj.c, bli_projm(), which
      behaves like bli_copym(), except that operands a and b are allowed to
      contain data of differing domains (e.g. a is real while b is complex,
      or vice versa). The file is named bli_proj.c, rather than bli_projm.c,
      with the intention that a 'v' vector version of the function may be
      added to the same file (at some point in the future).
    - Added supporting bli_check_*() functions in bli_check.c to confirm
      consistent precisions between two datatypes/objects, as well as the
      appropriate error message in bli_error.c and a new error code in
      bli_type_defs.h.
    - Wrote a bli_projm_check() function to go along with bli_projm().
    - Defined static function bli_obj_real_part() in bli_obj_macro_defs.h,
      which will initialize an obj_t alias to the real part of the source
      object.
    - Fixed a bug in the static function bli_dt_proj_to_complex(), found
      in bli_param_macro_defs.h. Thankfully, there were no calls to the
      function to produce buggy behavior.
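
    A minimal sketch of a copy between domains, for vectors of the same
    precision. The names proj_d_to_z() and proj_z_to_d() are
    hypothetical, and the semantics shown (zero imaginary part when
    going from real to complex, imaginary part dropped when going from
    complex to real) are an assumption about what "projection" means
    here, not a transcription of bli_proj.c.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Real -> complex: each element keeps its value and gains a zero
   imaginary part. */
static void proj_d_to_z( size_t n, const double* a, double complex* b )
{
    assert( a != NULL && b != NULL );
    for ( size_t i = 0; i < n; ++i )
        b[ i ] = a[ i ];
}

/* Complex -> real: only the real part of each element is copied. */
static void proj_z_to_d( size_t n, const double complex* a, double* b )
{
    assert( a != NULL && b != NULL );
    for ( size_t i = 0; i < n; ++i )
        b[ i ] = creal( a[ i ] );
}
```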

commit 3df39b37a0134befa34b6b6259db98467c7bc965
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 6 15:35:05 2018 -0500

    Fixed recently broken input.operations.fast.
    
    Details:
    - Removed "test sequential front-end" lines from microkernel test
      entries of input.operations.fast. This change was meant for inclusion
      in bd02c4e but was missed due to slightly different wording of the
      comment (I used "sed //d" to remove the lines). This fixes the broken
      'make checkblis-fast' (and 'make check') targets.

commit 695cd520e2f5eab938f66afe9fe36201ab2700c5
Author: sraut <Biplab.Raut@amd.com>
Date:   Wed Jun 6 11:48:56 2018 +0530

    AMD Copyright information changed to 2018
    
    Change-Id: Idfd11afd5d252f8063d0158680d24bf7e2854469

commit df1dd24fd896821de60917b429f303bab7fd0d4b
Author: sraut <Biplab.Raut@amd.com>
Date:   Wed Jun 6 11:24:33 2018 +0530

    small matrix trsm intrinsics optimization code for AX=B and XA'=B
    
    Change-Id: I90123c4d9adbd314c867995cd19dc975150b448c

commit 3f48c38164b4135515b5c752c506fdccc4480be2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jun 5 16:52:35 2018 -0500

    Cosmetic fix to configure output in config.mk.
    
    Details:
    - Fixed configure so that MK_ENABLE_MEMKIND is assigned "no" when the
      option is disabled due to libmemkind not being present. This wasn't
      affecting anything since the one use of the variable (in common.mk)
      was formulated as "ifeq ($(MK_ENABLE_MEMKIND),yes)". That is, the
      variable being empty was effectively equivalent to it being set to
      "no".
    - Comment updates to build/config.mk.in, common.mk.

commit 5df201260f64aa98a365931f6d2da70144d69932
Merge: 1b9af85e 96d2774b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jun 5 16:14:19 2018 -0500

    Merge branch 'master' into dev

commit 1b9af85ec98d91bb2b27aadaa3df344d18faff35
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jun 5 16:07:13 2018 -0500

    Updated ref99 call to _cntx_set_thrloop_from_env().
    
    Details:
    - Reordered the arguments in the ref99 sandbox's call to
      bli_cntx_set_thrloop_from_env() to be consistent with the updated
      function signature from f97a86f. Thanks to Devangi Parikh for
      reporting this issue.

commit 96d2774b4cb44ff1e8b5798d7cfc83154a607624
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date:   Tue Jun 5 14:17:39 2018 +0200

    Make bli_auxinfo_next_b() return b_next, not a_next (#216)

commit d4c24ea5f644eb635046e7fe249d3e8e58b4c98a
Author: sraut <biplab.raut@amd.com>
Date:   Tue Jun 5 15:42:59 2018 +0530

    copyright message changed to 2018
    
    Change-Id: I33c1ebda41bc7f1973ff19e3b1947bdad62b4d44

commit 3f1ba4e646776699ebfaa042fe24691d9e2f55d0
Author: sraut <biplab.raut@amd.com>
Date:   Tue Jun 5 14:21:13 2018 +0530

    copyright changed to 2018
    
    Change-Id: Ie916c7cd6f95aedc3cab6eec3a703c9ddb333bc3

commit bd02c4e9f7fe07487276e61507335d48c8e05f35
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 4 13:42:17 2018 -0500

    Cleanups to testsuite, input.operations format.
    
    Details:
    - Removed the line in each operation entry in input.operations titled
      "test sequential front-end" and the corresponding support for the lines
      in the testsuite input parsing code. This line was included in some
      of the earliest versions of the testsuite, back when I intended to
      eventually have separate multithreaded APIs. Specifically, I envisioned
      that multithreaded and sequential testing could be enabled or disabled
      on an operation level. However, BLIS evolved in a different direction
      and still does not have multithreaded-specific APIs (even if it may
      someday). But even if it did have such APIs, I doubt I would
      allow the user to enable/disable them on an operation level. Thus, this
      was a zombie future parameter that was never used and never made sense
      to begin with. The one remaining instance of the front_seq variable,
      used in the various libblis_test_<operation>() functions to guard the
      call to the operation test driver, was commented out instead of
      deleted so that someday it could be easily changed via sed, if desired.
    - Various minor cleanups to the testsuite code, including consolidating
      use of DISABLE and DISABLE_ALL and reexpressing certain conditional
      expressions in the libblis_test_<operation>() functions in terms of
      boolean functions.

commit 2c6d99b99e50d70f904da298a0c59be16cc5c180
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Jun 3 18:13:36 2018 -0500

    Fixed names out of alphabetical order in CREDITS.

commit 7a207e8f2c5046f8b295a78e029ff2de765c7409
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Jun 3 18:04:27 2018 -0500

    Disabled indirect blacklisting (issue #214).
    
    Details:
    - Return early from function, pass_config_kernel_registries(), that
      implements indirect blacklisting of subconfigurations (during pass 0).
      In short, I realized that indirect blacklisting is not needed in the
      situations I envisioned, and can actually cause problems under certain
      circumstances. Thanks to Tony Skjellum for reporting the issue (#214)
      that led to this commit, and to Devin Matthews for prompting me to
      realize that indirect blacklisting was unnecessary, at least as
      originally envisioned.

commit d7fb32682057c7458c8891c0eedafc374fd9beef
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Jun 3 13:20:37 2018 -0500

    Fixed syntax artifacts from 4b36e85 in examples.
    
    Details:
    - Fixed artifacts of malformed recursive sed expressions used when
      preparing 4b36e85, in which most function-like macros were converted
      to static functions. The syntactically defective code was contained
      entirely in examples/oapi. Thanks to Tony Skjellum for reporting this
      issue.
    - Update to CREDITS file.

commit ed7dedfd4a07eefeb5a038f9899afb8053b45383
Merge: f97a86f3 469727d4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jun 2 20:29:53 2018 -0500

    Merge branch 'master' into dev

commit f97a86f322a6e3e31f33c89befc66189b0b8c64f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jun 2 20:28:20 2018 -0500

    Updated setting/querying pack schema (cntx->cntl).
    
    - Query pack schemas in level-3 bli_*_front() functions and store those
      values in the schema bitfields of the corresponding obj_t's when the
      cntx's method is not BLIS_NAT. (When method is BLIS_NAT, the default
      native schemas are stored to the obj_t's.)
    - In bli_l3_cntl_create_if(), query the schemas stored to the obj_t's in
      bli_*_front(), clear the schema bitfields, and pass the queried values
      into bli_gemm_cntl_create() and bli_trsm_cntl_create().
    - Updated APIs for bli_gemm_cntl_create() and bli_trsm_cntl_create() to
      take schemas for A and B, and use these values to initialize the
      appropriate control tree nodes. (Also cpp-disabled the panel-block cntl
      tree creation variant, bli_gemmpb_cntl_create(), as it has not been
      employed by BLIS in quite some time.)
    - Simplified querying of schema in bli_packm_init() thanks to above
      changes.
    - Updated openmp and pthreads definitions of bli_l3_thread_decorator()
      so that thread-local aliases of matrix operands are guaranteed, even
      if aliasing is disabled within the internal back-end functions (e.g.
      bli_gemm_int.c). Also added a comment to bli_thrcomm_single.c
      explaining why the extra aliasing is not needed there.
    - Change bli_gemm() and level-3 friends so that the operation's ind()
      function is called only if all matrix operands have the same datatype,
      and only if that datatype is complex. The former condition is needed
      in preparation for work related to mixed domain operands, while the
      latter helps with readability, especially for those who don't want to
      venture into frame/ind.
    - Reshuffled arguments in bli_cntx_set_thrloop_from_env() to be
      consistent with BLIS calling conventions (modified argument(s) are
      last), and updated all invocations in the level-3 _front() functions.
    - Comment updates to bli_cntx_set_thrloop_from_env().

commit 965db85d29977d228ea744581edf2b682eb8e8a8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jun 1 12:32:15 2018 -0500

    Updated macro invocations in bli_gemm_ker_var2.c.
    
    Details:
    - Updated "get next a/b micropanel" macro invocations in
      bli_gemm_ker_var2.c according to changes in 9588625.
    - Comment update in bli_cntx.c.

commit 8749fa0b48a7710f4115023e2c46bc80167bc8f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 31 12:34:01 2018 -0500

    Cleanups to ref99/README.md, test/3m4m/Makefile.
    
    Details:
    - Minor edits to sandbox/ref99/README.md.
    - Removed cpp guards in sandbox/ref99/thread/blx_gemm_thread.h to be
      consistent with other headers in sandbox/ref99.
    - Additional targets and related cleanups in test/3m4m/Makefile.

commit 9588625c43c86ef1bde8140f620a30f52420e6a6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 30 15:19:53 2018 -0500

    Renamed "next micropanel" macros in _l3_thrinfo.h.
    
    Details:
    - Renamed several macros defined in bli_l3_thrinfo.h designed to compute
      the values of a_next and b_next to insert into an auxinfo_t struct in
      level-3 macrokernels. (Previously, the macros did not use a bli_
      prefix.)
    - Updated instances of above macro usage within various macrokernels.

commit e4420591225fca2f63ca74ef6a23b962fcd4bec0
Merge: 34f974d1 850a8a46
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 29 17:12:22 2018 -0500

    Merge branch 'dev' of github.com:flame/blis into dev

commit 34f974d1a83a7d29ba09f67e392d361231fdf99c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 29 17:11:52 2018 -0500

    More tweaks/updates to sandbox/ref99/README.md.

commit 850a8a46c0a569a2652d8c200e5c53b61bcf988d
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue May 29 13:51:21 2018 -0500

    Test all x86_64 configurations*... (#212)
    
    * Add custom SDE cpuid files.
    
    * Set up testing of all x86_64 architectures (except bulldozer) using SDE.
    
    * Update .travis.yml
    
    [ci skip]
    
    * Update do_testsuite.sh
    
    [ci skip]
    
    * Updated .travis.yml with my secret token.
    
    Details:
    - Replaced Devin's temporary secret token with my own, which is used by
      Travis when accessing the Intel SDE via Dropbox.
    
    * Work around CPUID dispatch in glibc/libm by patching ld.so.
    
    * Detect path of loader at runtime.
    
    * Attempt to make SDE run on Travis
    
    * Allow unpatched ld.so if we don't know how to patch it.
    
    I *think* this only happens for older glibc without the multi-arch stuff (e.g. Ubuntu 14.04 on Travis), but who knows?
    
    * Upgrade Travis to gcc-6 and binutils-2.26.
    
    * Try to get Travis to use the right assembler.
    
    * Apparently you need ld-2.26 too.
    
    * Try to also patch ld.so from Ubuntu 14.04.
    
    * Take the nuclear option.
    
    * Account for non-absolute dependencies in ldd output.
    
    * String manipulation fail.
    
    * Update patch-ld-so.py
    
    * Add Zen to SDE testing.
    
    * Removed dead variable from travis/do_testsuite.sh.
    
    Details:
    - Removed 'BLIS_ENABLE_TEST_OUTPUT=yes' from make invocations in
      travis/do_testsuite.sh. This variable is no longer present in the
      BLIS build system (if it ever was?), and therefore has no effect.

commit 42ea02a34e5c144893fe239ae55daef895d92677
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 29 12:48:14 2018 -0500

    Renamed c99 sandbox to ref99.
    
    Details:
    - Renamed sandbox/c99 to sandbox/ref99. I wanted to name the sandbox so
      that it would be thought of as a "reference" sandbox. I kept the "99"
      to differentiate it from future reference sandboxes that may be
      written in another language (such as C++).
    - Updates to sandbox/ref99/README.md.

commit 0e7205ccef50dccd4306cf427a63633396472813
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 29 12:36:13 2018 -0500

    Remove sandbox/.gitkeep now that dir is non-empty.

commit 3a4603858e3819cbd6ed7dd67d0fc0b3f89ed254
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat May 26 15:51:08 2018 -0500

    More README.md updates to sandbox/c99.
    
    Details:
    - Added a section that walks the reader through how to configure BLIS to
      use a gemm sandbox.

commit 2bad97f6bdf4642884d60fc03970549902a54d74
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat May 26 15:31:16 2018 -0500

    Updates to CREDITS, sandbox/c99/README.md.

commit 2b4a447526effa3e847a7e5c15c3758573f12318
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 25 18:51:23 2018 -0500

    Initial implementation of c99 "reference" sandbox.
    
    Details:
    - Added a c99 sandbox (in sandbox/c99) to serve as a starting point for
      others looking to experiment with alternative implementations of gemm
      in BLIS. Note that this sandbox implementation is a first draft and
      will be refined over time.
    - Minor updates to Makefile and common.mk to restrict what source files
      get recompiled when sandbox files are touched.
    - Added an initial draft of a README.md in sandbox/c99.

commit 469727d4f8a976d8713afb4d0b6235c322498db0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 25 16:17:13 2018 -0500

    Very minor comment updates.

commit 66dbe69a0f9359bf1e39b5672ee365213de2e3ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 25 15:45:53 2018 -0500

    Converted macros to static funcs in _packm_cntl.h.
    
    Details:
    - Converted various macros in frame/1m/packm/bli_packm_cntl.h (designed
      to access fields of a packm_params_t struct) to static functions.

commit 22deef2f5463a47e3b3c37fc313d17550f10ee06
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 24 14:28:55 2018 -0500

    Support alternative gemm implementation sandboxes.
    
    Details:
    - configure:
      - add support for --enable-sandbox=NAME to configure script, where NAME
        is a subdirectory of a new 'sandbox' directory that contains an
        alternative implementation of gemm. (For now, only implementations of
        gemm may be provided via a sandbox.);
      - add support for C++ compiler. C++ compilers are handled in a manner
        similar to that of C compilers, in that a default search order is
        used, and that CXX is searched for first, if the variable is set. In
        practice, the C++ compiler that is selected should correspond to the
        selected C compiler. (Example: If gcc is selected for C, g++ should
        be selected for C++.) The result of the search is output to config.mk
        via build/config.mk.in. NOTE: The use of C++ in BLIS is still
        hypothetical, but may eventually move to being experimental. This
        support was intended only for use of C++ within a gemm sandbox.
    - build/config.mk.in:
      - define SANDBOX variable containing sandbox subdirectory name.
    - build/bli_config.in:
      - define either of the BLIS_ENABLE_SANDBOX or BLIS_DISABLE_SANDBOX
        macros in bli_config.h.
    - common.mk:
      - include makefile fragments that were propagated into the specified
        sandbox subdirectory;
      - generate different CFLAGS for sandboxes, as well as a separate
        CXXFLAGS variable for sandboxes when C++ source files are compiled;
      - isolate into a single location lists of file suffixes for various
        purposes;
      - reorganize/clean up code related to identifying header files and
        paths.
    - Makefile:
      - generate object filepaths for and compile source code files found in
        sandbox sub-directory;
      - remove makefile fragments placed in sandbox sub-directory (cleanmk);
      - various other cleanups.
    - Added .cc, .cpp, and .cxx to list of suffixes of files to recognize in
      makefile fragments (via build/gen-make-frags/suffix_list).
    - Updated blis.h to conditionally #include bli_sandbox.h (via a new file,
      bli_sbox.h), which each sandbox is assumed to use for any type
      definitions and function prototypes it wishes to export out to blis.h.
    - Conditionally disable bli_gemmnat() implementation in frame/3 when
      BLIS_ENABLE_SANDBOX is defined.

commit 25e3501ed57a0db7f860c88b7199b36049aec12a
Merge: 216a4cb9 5140ee34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 24 13:57:16 2018 -0500

    Merge branch 'master' into dev

commit 5140ee3424c744981a3fed3b5a748ebbfc111388
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 23 16:56:14 2018 -0500

    Updated types of bli_is_[un]aligned_to() functions.
    
    Details:
    - Changed the void* arguments of the following static functions:
        bli_is_aligned_to()
        bli_is_unaligned_to()
        bli_offset_past_alignment()
      to siz_t, and the return type of bli_offset_past_alignment() from
      guint_t to siz_t. This allows for more versatile usage of these
      functions (e.g. when aligning both pointers and leading dimension).
    - Updated all invocations of these functions, mostly in kernels/penryn
      but also in kernels/bgq, to include explicit typecasts to siz_t when
      pointer arguments are passed in.
    - Thanks to Devin Matthews for pointing out this potential bug (via issue
      #211).
    - Deleted a few trailing spaces in various penryn kernels.
    - Removed duplicate instances of the words "derived" and "THEORY" from
      various kernel license headers, likely from a malformed recursive sed
      performed long ago.
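
    A sketch of why integer-typed arguments make these helpers more
    versatile: the same function can then test a pointer (after an
    explicit cast) or a byte count such as a leading dimension.
    is_aligned_to() and offset_past_alignment() below are hypothetical
    stand-ins that use size_t in place of siz_t. They are not copies of
    the BLIS definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if p is a multiple of size. */
static bool is_aligned_to( size_t p, size_t size )
{
    assert( size > 0 );
    return p % size == 0;
}

/* Number of bytes by which p exceeds the previous multiple of size. */
static size_t offset_past_alignment( size_t p, size_t size )
{
    assert( size > 0 );
    return p % size;
}

/* Convenience wrapper showing the explicit cast needed for pointers. */
static bool ptr_is_aligned_to( const void* ptr, size_t size )
{
    return is_aligned_to( ( size_t )( uintptr_t )ptr, size );
}
```

    With this shape, a leading dimension of ld doubles can be checked
    via is_aligned_to( ld * sizeof( double ), 32 ), with no pointer
    involved.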

commit 216a4cb9cb87fa4c93f6ceb6ae90602e5018b305
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 18 18:47:03 2018 -0500

    Minor update to flatten-headers.[py|sh] help text.
    
    Details:
    - Fixed a typo and removed some outdated language from the help text of
      flatten-headers.py and flatten-headers.sh.

commit 962a706a6f56ea070ac4683f0af69c7e59af8ecb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 18 18:19:40 2018 -0500

    Updated LICENSE file to mention HP Enterprise.
    
    Details:
    - Added HP Enterprise to the LICENSE file. Previously, only the source
      files touched by HPE contained the corresponding copyright notices.
      (This oversight was unintentional.)
    - Updated file-level copyright notices to include a comma, to match
      the formatting used for UT and AMD copyrights.

commit efa43e13effe901ad31e734ac90f027e89473bd9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 18 12:20:40 2018 -0500

    More updates to CREDITS and RELEASING files.

commit f94ab97af8e86baf9ee9a9cbaef8bb3712df2e11
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 17 17:45:31 2018 -0500

    Update to CREDITS file.

commit 4919b10c005e006a6d818eb8f865f9dbd8aa16df
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 17 16:38:49 2018 -0500

    Minor changes to README.md and CONTRIBUTING.md.

commit b89451187e8321b673a1cf7603c8d48028d9d4c8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 17 16:23:06 2018 -0500

    README.md update.
    
    Details:
    - Added "Contributing" section with relevant links.

commit af244194e7d76276a1b90fe59f9307dde0429e1d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 17 15:38:02 2018 -0500

    Removed explicit critical sec. from bli_memsys.c.
    
    Details:
    - Removed critical sections protecting the initialization/finalization of
      bli_memsys.c. These synchronization mechanisms are no longer needed now
      that BLIS initializes all APIs via pthread_once().

commit 10c9e8f95254d8c6436c4d3cb093fa5544b45c90
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 17 15:22:51 2018 -0500

    Cache hardware's arch_t id after querying once.
    
    Details:
    - Added logic to bli_arch.c that will call what was previously the body
      of bli_arch_query_id() only once and then cache the value in a static
      variable local to the file. (Previously, the arch_t associated with
      the hardware/configuration was queried every time bli_arch_query_id()
      was called, which was at least once per level-3 function call.) Thanks
      to Devin Matthews for suggesting this feature via issue #175.
    - Added -lpthread to the compile/link command line of the compiler
      invocation that compiles build/detect/config/config_detect.c, which
      prints the string identifying the detected configuration, since it
      is now needed due to new pthread_once() logic in bli_arch.c.
    - Implementation note: I chose to implement this arch_t caching feature
      via pthread_once(), using a separate pthread_once_t variable local to
      the file, rather than calling bli_init_once(). The reason is that I
      did not want to require bli_init() as a prerequisite to this function.
      bli_init() already calls several sub-components, some of which make use
      of bli_arch_query_id(), and therefore it would be easy to fall into a
      circular self-init situation (which usually causes pthreads to hang
      indefinitely).
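
    A minimal sketch of the caching pattern described above, using
    pthread_once() with a file-local pthread_once_t. The names
    arch_id_t, detect_arch_once(), and arch_query_id() are hypothetical,
    the value 42 is a placeholder for a real hardware query, and the
    n_detections counter exists only to make the single call observable.

```c
#include <assert.h>
#include <pthread.h>

typedef int arch_id_t; /* hypothetical stand-in for arch_t */

/* File-local state: the cached id and the once-control that guards it. */
static arch_id_t      cached_id    = -1;
static pthread_once_t once_id      = PTHREAD_ONCE_INIT;
static int            n_detections = 0;

/* Runs at most once per process, no matter how many threads call
   arch_query_id() concurrently. */
static void detect_arch_once( void )
{
    n_detections += 1;
    cached_id = 42; /* placeholder for the expensive hardware query */
}

/* Return the cached id, performing the query on the first call only.
   No library-wide initialization is required beforehand. */
static arch_id_t arch_query_id( void )
{
    int r = pthread_once( &once_id, detect_arch_once );
    assert( r == 0 );
    ( void )r;
    return cached_id;
}
```

    Because the once-control is private to this file, the query does not
    depend on any other initialization routine, which avoids the
    circular self-init hazard mentioned above.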

commit f28a15293890ac6fbceac229fd204dbc9fec6e27
Author: Francisco Igual <figual@ucm.es>
Date:   Thu May 17 09:26:14 2018 +0000

    Fixed clobber list bug in ARMv8 ukernel

commit 2e31dd7852b4d6a9355899cf9659d4b8130461cb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 16 17:28:33 2018 -0500

    Inserted missing integer typecasting into ukernels.
    
    Details:
    - Inserted missing safeguards into most microkernels to ensure that the
      integers read by the microkernel's assembly instructions are of the
      appropriate size. In many cases, this bug was going undetected likely
      because the compiler was inserting zero padding before the integers
      in the calling function, allowing the assembly code to read 64 bits
      in a way that did not corrupt the "lower" 32 integer bits with garbage
      in the higher bits. Thanks to Francisco Igual and Devangi Parikh for
      finding this issue.

commit 12dfa9516428b4092554f0ce70b07571d35de222
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 16 12:46:57 2018 -0500

    Fixed a bug in determining default integer size.
    
    Details:
    - Fixed a bug that would cause configurations to inadvertently define
      their integers to be 32 bits when those environments actually call for
      64-bit integers. While either BLIS_ARCH_64 or BLIS_ARCH_32 is defined
      in bli_system.h (based on whether preprocessor macros such as __x86_64
      or __aarch64__ are defined by the environment), bli_system.h was being
      #included *after* bli_config_macro_defs.h, in which the BLIS_ARCH_64
      macro was used to choose an integer type size in the event that
      BLIS_INT_TYPE_SIZE was not already defined by configure via
      bli_config.h. And due to the structure of the cpp code in that file,
      the 32-bit integer case was being chosen. Thanks to Francisco Igual
      and Devangi Parikh for their help in isolating this bug.
    - Moved the #include of hbwmalloc.h and related preprocessor code to
      bli_kernel_macro_defs.h to facilitate the reshuffling of the #include
      for bli_system.h in blis.h.

commit f930cec0f35824c0f9ebbd218614209217d491cb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 15 17:47:08 2018 -0500

    More tweaks to CONTRIBUTING.md.

commit 173e30ff7d293ba31f3fab8ab0c0a695eda3d4fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 15 14:48:34 2018 -0500

    Added initial draft of CONTRIBUTING.md file.
    
    Details:
    - Thanks to the Ruby on Rails project for providing a good template off
      of which to build.

commit 6e25e758b444bf725046674e1e64c6a52421749d
Author: Nico Schlömer <nico.schloemer@gmail.com>
Date:   Tue May 15 14:03:20 2018 +0200

    Debian config (#206)
    
    * add debian config
    
    * correct wording in the README

commit fcf6c6a3c87da08a7cdb92b102489b991ef7a644
Author: Alex Arslan <ararslan@comcast.net>
Date:   Mon May 14 18:41:03 2018 -0700

    Fix shared library builds on platforms other than Linux and macOS (#209)
    
    * Fix detection of systems other than Linux and macOS
    
    The way the logic is currently laid out, any platform that isn't Linux
    gets assigned the .dylib shared library extension and the macOS-specific
    compiler flags. This reverses the logic to check for macOS first, and
    have the fallback use the Linux definitions, which apply to most other
    systems as well.
    
    * Use SHLIB_EXT instead of SO_SUF
    
    The former is more standard, as jakirkham pointed out in a comment.

commit 6f7f51048c48f31d691c06451d0fd2cbc453ad03
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon May 14 18:41:56 2018 -0500

    Echo cc_vendor when printing compiler version.
    
    Details:
    - Echo the ${cc_vendor} when informing the user of the compiler's version.
      Previously, the actual ${cc} (which could be a path to the executable)
      was being printed, which has already been printed by that point in the
      configure script.

commit ad67dc4e348b0a381efc057573a6b03cc7e26db0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon May 14 18:35:28 2018 -0500

    Communicate cc, cc_vendor to make via config.mk.
    
    Details:
    - Historically, the compiler selection has happened statically in the
      various make_defs.mk and would only be overridden by setting CC (either
      prior to running configure or as a configure argument). However, in
      the last couple months, configure has evolved to contain rather
      sophisticated compiler detection logic for the purposes of blacklisting
      sub-configurations. It only makes sense that configure now fully take
      over the responsibility of selecting a compiler from the GNU make side
      of the build system. Thanks to Alex Arslan for his help exposing this
      issue.
    - Substitute found_cc into CC in config.mk via configure.
    - Set a new variable, CC_VENDOR, in config.mk via substitution from
      configure, and disable the corresponding CC_VENDOR code in common.mk.
    - Disabled default compiler selection (usually gcc) in the sub-configs'
      various make_defs.mk files.

commit 20af119fc97ec6120017a7a5ba5f9aaa920c7640
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon May 14 17:44:58 2018 -0500

    Added README.md to 'config' directory.
    
    Details:
    - Added a brief README.md file to the config directory to redirect those
      who may be exploring the source tree to the ConfigurationHowTo wiki.
      (Included is a very brief explanation of configurations for those who
      don't have time to read the wiki.) Thanks to Nico Schlömer for this
      suggestion.

commit 9dbce16269c3e1f27c7a0d64372cc76aed30dfc1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon May 14 17:04:54 2018 -0500

    Search for 'cc clang gcc' on OpenBSD, FreeBSD.
    
    Details:
    - Swapped gcc and clang in the compiler search list for OpenBSD.
    - Use the same search list for FreeBSD as above.

commit 55ebf24d63128b5fd15b10160485667415a02a55
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon May 14 16:19:08 2018 -0500

    Change compiler search order on OpenBSD.
    
    Details:
    - Set a compiler search list (and order) as a function of the OS detected
      via 'uname -s'. By default, this list and order is 'gcc clang cc' for
      Linux, Darwin (OS X), and any other OS except OpenBSD. On OpenBSD,
      we use 'cc gcc clang' because OpenBSD's default installation of gcc
      (4.2.1) is too old for BLIS. Thanks to Alex Arslan for reporting this
      issue and suggesting a fix.
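    
    A minimal shell sketch of the OS-dependent search list described above.
    The helper names compiler_search_list and find_first_compiler are
    hypothetical and are not taken from configure; only the two search
    orders come from this commit message.

```shell
#!/bin/sh
# Hypothetical helper: map an OS name, as printed by 'uname -s', to a
# compiler search list.
compiler_search_list() {
    case "$1" in
        OpenBSD) echo "cc gcc clang" ;;   # default gcc (4.2.1) is too old
        *)       echo "gcc clang cc" ;;   # Linux, Darwin, and any other OS
    esac
}

# Hypothetical helper: print the first candidate that exists on PATH.
find_first_compiler() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

os_name=$(uname -s)
search_list=$(compiler_search_list "$os_name")
echo "search order on ${os_name}: ${search_list}"
# Word splitting of the list is intentional here.
find_first_compiler $search_list || echo "no compiler found"
```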

commit 4fb353bd90e6642c8aeffd1b1e6329f54eee4bb4
Merge: 4b36e85b 8a2857b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun May 13 17:50:51 2018 -0500

    Merge branch 'master' into dev

commit 8a2857b5e3c633b18c24f2275110437a702a71d0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 11 18:42:05 2018 -0500

    Fixed README.md typo; mention 'make check'.

commit 543935c02f9335142d2e485a15f37dbaebe012ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 11 18:35:32 2018 -0500

    Updated README.md with Ubuntu packages link.
    
    Details:
    - Created a separate section of README.md for external packages, with
      one bullet each for Dave Love's rpms and Nico Schlömer's Ubuntu apt
      packages. Thanks to Dave and Nico for their contributions.

commit af1d8470b56d3b2a1c8513d366d788dddcb84baa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 11 17:49:58 2018 -0500

    Better handling of shared libraries on OS X.
    
    Details:
    - Use the .dylib shared library suffix on OS X (instead of .so in Linux).
    - Link with the -dynamiclib and -install_name options on OS X (instead of
      -shared and -soname in Linux).
    - Determine operating system (e.g. Linux, Darwin) during configure and
      substitute into config.mk.in rather than run 'uname -s' during make.
    - Echo operating system during configure.
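    
    The OS-dependent choices above could be sketched in shell as follows.
    The helper names are hypothetical, and passing the soname as
    '-Wl,-soname,<name>' assumes the flag is forwarded to the linker through
    a gcc-style compiler driver; the actual makefile logic may differ.

```shell
#!/bin/sh
# Hypothetical helper: shared library suffix for a given OS name.
shlib_suffix() {
    case "$1" in
        Darwin) echo "dylib" ;;
        *)      echo "so" ;;
    esac
}

# Hypothetical helper: link flags for a shared library.
# $1: OS name, $2: name to record in the library (soname or install name).
shlib_link_flags() {
    case "$1" in
        Darwin) echo "-dynamiclib -install_name $2" ;;
        *)      echo "-shared -Wl,-soname,$2" ;;
    esac
}

for os_name in Linux Darwin; do
    lib="libblis.$(shlib_suffix "$os_name")"
    echo "${os_name}: ${lib}: $(shlib_link_flags "$os_name" "$lib")"
done
```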

commit 4b72a462d7467cf815422aafac7b05037d2e3b13
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 10 18:35:38 2018 -0500

    Enable building shared library by default.
    
    Details:
    - Tweaked configure so that the shared library is generated by default.
    - Updated --help text and configure's feedback messages reporting the
      status of the static/shared builds.
    - Changed the order of build product installation so that headers are
      installed last, after libraries and symlinks.

commit b699bb1ff03c6e9baaa054805b4939983ae7145b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu May 10 15:54:17 2018 -0500

    Adopt Linux-like .so versioning at install-time.
    
    Details:
    - Changed the naming conventions used for installed libraries and
      symlinks to more closely mirror patterns used by typical GNU/Linux
      libraries. Whereas previously static and shared libraries were
      installed and symlinked as follows:
    
        (library) libblis-0.3.2-15-haswell.a
        (library) libblis-0.3.2-15-haswell.so
        (symlink) libblis.a -> libblis-0.3.2-15-haswell.a
        (symlink) libblis.so -> libblis-0.3.2-15-haswell.so
    
      we now use the following naming conventions:
    
        (library) libblis.a
        (symlink) libblis.so -> libblis.so.0.1.2
        (symlink) libblis.so.0 -> libblis.so.0.1.2
        (library) libblis.so.0.1.2
    
      where 0.1.2 indicates shared library major, minor, and build versions
      of 0, 1, and 2, respectively. The conventional version string can
      still be queried by linking to the library in question and then calling
      bli_info_get_version_str(). (The testsuite binary does this
      automatically at startup.)
    - Added logic to common.mk to set the soname field in the shared library
      via the -soname linker flag.
    - Added a 'so_version' file to the top-level directory containing two
      lines. The first line specifies the .so major version number, and the
      second line specifies the minor and build version numbers joined with
      a '.'. This file is read by configure and those values substituted
      into build/config.mk.in to define SO_MAJOR, SO_MINORB, and SO_MMB
      variables.
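    
    As an illustration of the two-line 'so_version' format, the sketch below
    parses such a file and prints the resulting names. The helper name
    read_so_version and the temporary file are hypothetical, and the sketch
    assumes SO_MMB is the major, minor, and build numbers joined with '.';
    it is not the code used by configure.

```shell
#!/bin/sh
# Hypothetical helper: parse a two-line so_version file.
read_so_version() {
    so_major=$(sed -n '1p' "$1")        # line 1: major version
    so_minorb=$(sed -n '2p' "$1")       # line 2: minor.build
    so_mmb="${so_major}.${so_minorb}"   # assumed: major.minor.build
}

# Write a sample file matching the 0.1.2 example in the commit message.
tmpfile=$(mktemp)
printf '0\n1.2\n' > "$tmpfile"
read_so_version "$tmpfile"
rm -f "$tmpfile"

echo "SO_MAJOR  = ${so_major}"
echo "SO_MINORB = ${so_minorb}"
echo "SO_MMB    = ${so_mmb}"
echo "(library) libblis.so.${so_mmb}"
echo "(symlink) libblis.so.${so_major} -> libblis.so.${so_mmb}"
echo "(symlink) libblis.so -> libblis.so.${so_mmb}"
```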

commit fc2d9ec6bf46f6e5b19d196208415ce433e95b10
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 9 15:19:28 2018 -0500

    Tweaks to top-level clean and distclean targets.
    
    Details:
    - Moved the removal of bli_config.h from cleanh to distclean.
    - Removed cleantest as a dependency of clean.

commit bf0350305971e3991861b5117a13fda31ff97b6d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 8 16:49:22 2018 -0500

    Renamed (shortened) a few build system variables.
    
    Details:
    - Renamed the following variables in config.mk (via build/config.mk.in):
        BLIS_ENABLE_VERBOSE_MAKE_OUTPUT -> ENABLE_VERBOSE
        BLIS_ENABLE_STATIC_BUILD        -> MK_ENABLE_STATIC
        BLIS_ENABLE_SHARED_BUILD        -> MK_ENABLE_SHARED
        BLIS_ENABLE_BLAS2BLIS           -> MK_ENABLE_BLAS
        BLIS_ENABLE_CBLAS               -> MK_ENABLE_CBLAS
        BLIS_ENABLE_MEMKIND             -> MK_ENABLE_MEMKIND
      and also renamed all uses of these variables in makefiles and makefile
      fragments. Notice that we use the "MK_" prefix so that those variables
      can be easily differentiated (such as via grep) from their "BLIS_" C
      preprocessor macro counterparts.
    - Other whitespace changes to build/config.mk.in.
    - Renamed the following C preprocessor macros in bli_config.h (via
      build/bli_config.h.in):
        BLIS_ENABLE_BLAS2BLIS        -> BLIS_ENABLE_BLAS
        BLIS_DISABLE_BLAS2BLIS       -> BLIS_DISABLE_BLAS
        BLIS_BLAS2BLIS_INT_TYPE_SIZE -> BLIS_BLAS_INT_TYPE_SIZE
      and also renamed all relevant uses of these macros in BLIS source
      files.
    - Renamed "blas2blis" variable occurrences in configure to "blas", as
      was done in build/config.mk.in and build/bli_config.h.in.
    - Renamed the following functions in frame/base/bli_info.c:
        bli_info_get_enable_blas2blis() -> bli_info_get_enable_blas()
        bli_info_get_blas2blis_int_type_size()
                                        -> bli_info_get_blas_int_type_size()
    - Remove bli_config.h during 'make cleanh' target of top-level Makefile.

commit 4b36e85be9b516b4089b24768f881dd976668997
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 8 14:26:30 2018 -0500

    Converted function-like macros to static functions.
    
    Details:
    - Converted most C preprocessor macros in bli_param_macro_defs.h and
      bli_obj_macro_defs.h to static functions.
    - Reshuffled some functions/macros to bli_misc_macro_defs.h and also
      between bli_param_macro_defs.h and bli_obj_macro_defs.h.
    - Changed obj_t-initializing macros in bli_type_defs.h to static
      functions.
    - Removed some old references to BLIS_TWO and BLIS_MINUS_TWO from
      bli_constants.h.
    - Whitespace changes in select files (four spaces to single tab).

commit 7e5648ca150757b874f6823da832f3798c40b9f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon May 7 18:59:19 2018 -0500

    Add configure support for --libdir, --includedir.
    
    Details:
    - Added support for two new configure options: --libdir and --includedir.
      They specify the precise install directories for libraries and header
      files, respectively, and override any location implied by the --prefix
      option (including the default install prefix, if --prefix was not
      given). Thanks to Nico Schlömer for suggesting this via issue #195.
    - Removed the INSTALL_PREFIX definition/anchor from build/config.mk.in
      and replaced it with corresponding definitions/anchors for libdir and
      includedir.
    - Updated top-level Makefile to use the new variables, INSTALL_LIBDIR
      and INSTALL_INCDIR, instead of INSTALL_PREFIX (which is now no longer
      needed by make).
    - Set default sane values for INSTALL_LIBDIR and INSTALL_INCDIR in
      common.mk when configure has not been run, as is already done for
      DIST_PATH. This is to safeguard against statements in the top-level
      Makefile that use 'find' to locate old libraries and headers for the
      uninstall targets, which run regardless of make target. Without setting
      INSTALL_LIBDIR and INSTALL_INCDIR, those variables are empty and the
      'find' ends up looking at '/', which is obviously not what we want.
      (Also enclosed those definitions in an IS_CONFIGURED guard so that they
      won't get evaluated unless configure has been run.)
    - Rearranged "ifeq ($(IS_CONFIGURED),yes)" conditionals in Makefile to
      reduce occurrences and separated "local" and top-level components of
      cleanblastest and cleanblistest targets to improve readability.
    - Adjusted out-of-tree builds so that they are no longer oblivious to
      the .git directories, if present, and thus now properly augment version
      strings with the appropriate patch number.
    - Include missing version string in 'configure --help' output.
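    
    A rough shell sketch of how --libdir and --includedir might override the
    locations implied by --prefix, plus a guard against searching with an
    unset directory. The function names, and the assumption that the implied
    locations are '<prefix>/lib' and '<prefix>/include', are illustrative
    and are not taken from configure or the Makefile.

```shell
#!/bin/sh
# Hypothetical helper. An empty argument means the option was not given.
resolve_install_dirs() {
    prefix=$1                         # --prefix value (or the default)
    libdir=${2:-${prefix}/lib}        # --libdir overrides <prefix>/lib
    incdir=${3:-${prefix}/include}    # --includedir overrides <prefix>/include
}

# Hypothetical guard: refuse to search when the directory is empty, so
# that a search for old libraries can never start at '/'.
find_old_libs() {
    [ -n "$1" ] || return 1
    find "$1" -name 'libblis*' 2>/dev/null
}

resolve_install_dirs /usr/local "" /opt/blis/include
echo "libdir = ${libdir}"
echo "incdir = ${incdir}"
find_old_libs "" || echo "refusing to search: directory not set"
```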

commit b09e4e8852a6c42895910e3bcb9041124dc8bf9f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon May 7 14:37:50 2018 -0500

    Allow 'make clean' and friends without configuring.
    
    Details:
    - Modified top-level Makefile so that a user can run 'make distclean',
      'make clean', or any of the other clean-related targets prior to
      running configure (or after a previous 'make distclean'). Thanks to
      Nico Schlömer for suggesting this via issue #197.
    - Made the cleanblastest and cleanblistest targets more comprehensive in
      that they now clean out build products that would have resulted from
      local compilation (i.e., builds performed within the 'blastest' or
      'testsuite' directories).
    - Added "cc" to list of expected compiler "vendors" since the CC variable
      seems to automatically be set to "cc" on Ubuntu 16.04 (which is just an
      alias to gcc).
    - Comment update to build/config.mk.in.

commit 35c5a1449c3efe0b2ec43cdefcfdf00e71828149
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon May 7 12:04:57 2018 -0500

    No longer update version file during configure.
    
    Details:
    - Recycled the core functionality of build/update-version-file.sh into a
      function in configure, disabling the updating of the 'version' file in
      the process. Instead of writing the patched version string back to the
      version file and then reading it again from within configure, the
      patched version string is now saved directly to a variable in the main()
      function in configure. This will prevent developers from accidentally
      committing configure-induced changes to the version file in between
      releases.

commit 8adb2f919b62da4a2885ae04a10925e0e6a2e304
Author: Mathieu Poumeyrol <kali@users.noreply.github.com>
Date:   Sun May 6 19:58:16 2018 +0200

    Some cross compilations fixes (#198)
    
    * cross-compilation fixes
    * add doc ranlib variable
    * icc support -dumpversion, posix compatible test, plus one stupid mistake
    * retab
    * revert version as requested

commit 89acd9ebe516eeb97006dba344354bfc98826645
Merge: 4cff432d 0557eba7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 2 12:53:35 2018 -0500

    Merge branch 'amd'

commit 4cff432d707891ada705b039a7e043558bbf3c51
Author: Nisanth M P <31736542+nisanthmpamd@users.noreply.github.com>
Date:   Wed May 2 23:20:42 2018 +0530

    AMD specific optimizations for target 'zen' (#194)
    
    Re-enabled AMD-specific optimizations for zen.
    
    Details:
    - Re-enabled Zen-specific cache blocksizes for 'zen' sub-configuration.
    - Re-enabled small matrix gemm optimization for 'zen'.
    - These were both temporarily disabled during a previous merge simply due
      to lack of Zen hardware for testing.

commit 8eda5fe7f678b413cb274bd84716995a7d0b87a9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 2 12:20:37 2018 -0500

    Typo fix in README.md.

commit 0557eba78f5fcf28f0f039f28da79498ffde848c
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date:   Mon Mar 19 12:49:26 2018 +0530

    Re-enabling the small matrix gemm optimization for target zen
    
    Change-Id: I13872784586984634d728cd99a00f71c3f904395

commit df78ceb3d6f33a27fe69017854405edaea7c40e5
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date:   Mon Mar 19 11:34:32 2018 +0530

    Re-enabling Zen optimized cache block sizes for config target zen
    
    Change-Id: I8191421b876755b31590323c66156d4a814575f1

commit 5e515f9a76f4aaf43dc21315a34d797726ca8069
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 1 13:44:10 2018 -0500

    Tweaked new language in README.md.

commit 1ddd9e316ad5024af8b606dfcebd1e7d587a130f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 1 13:36:28 2018 -0500

    Added link to Dave Love's Fedora Copr page.
    
    Details:
    - Added a blurb to README.md advertising Dave Love's Copr homepage,
      which contains rpm packages for RHEL/Fedora-like distributions.

commit 078a852f738c66c6468bd5e64b06467edc9057fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 30 16:15:26 2018 -0500

    Minor tweaks to top-level 'make clean' target.
    
    Details:
    - Execute 'cleanh' target as part of 'clean'
    - Remove cblas.h file from 'include/<configname>/' as part of 'cleanh'
      target.
    - Updated the echoed (non-verbose) text for uniformity.

commit 75d0d1057dda69c655bd1cd8f791cb39b54d99b8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 30 14:57:33 2018 -0500

    Renamed various datatype-related macros/functions.
    
    Details:
    - Renamed the following macros in bli_obj_macro_defs.h and
      bli_param_macro_defs.h:
      - bli_obj_datatype()                 -> bli_obj_dt()
      - bli_obj_target_datatype()          -> bli_obj_target_dt()
      - bli_obj_execution_datatype()       -> bli_obj_exec_dt()
      - bli_obj_set_datatype()             -> bli_obj_set_dt()
      - bli_obj_set_target_datatype()      -> bli_obj_set_target_dt()
      - bli_obj_set_execution_datatype()   -> bli_obj_set_exec_dt()
      - bli_obj_datatype_proj_to_real()    -> bli_obj_dt_proj_to_real()
      - bli_obj_datatype_proj_to_complex() -> bli_obj_dt_proj_to_complex()
      - bli_datatype_proj_to_real()        -> bli_dt_proj_to_real()
      - bli_datatype_proj_to_complex()     -> bli_dt_proj_to_complex()
    - Renamed the following functions in bli_obj.c:
      - bli_datatype_size()                -> bli_dt_size()
      - bli_datatype_string()              -> bli_dt_string()
      - bli_datatype_union()               -> bli_dt_union()
    - Removed a pair of old level-1f penryn intrinsics kernels that were no
      longer in use.

commit 01c4173238baf08e7f6700a3f91a2ea58cca50c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 28 14:07:34 2018 -0500

    CHANGELOG update (0.3.2)

commit 2fb440876690bdcec0c11a30e2b33ad100bab529
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 28 14:07:31 2018 -0500

    Version file update (0.3.2)

commit cdf041ddadd8725e578e2f59f37ae341f26655af
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 28 14:05:00 2018 -0500

    Use config.mk instead of common.mk in bump-version.sh.
    
    Details:
    - Fixed inadvertent targeting of common.mk when testing whether configure
      had already been run, rather than config.mk.

commit 6ded8f9f0364b3c07255e2532ada3eeb2ed2a715
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 28 14:01:29 2018 -0500

    Account for recent 'make distclean' in bump-version.sh.
    
    Details:
    - Added logic to build/bump-version.sh that will run './configure auto'
      if 'common.mk' is not present (usually because 'make distclean' was run
      recently).

commit 7c16fdce433f5dea0e83d5047553c955d8e46fd2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 28 13:50:55 2018 -0500

    Fixed typo in RELEASING file.

commit 5e5ca4984fcf6d72d3036c338bb9cdc64520a325
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 28 13:48:01 2018 -0500

    README updates.
    
    Details:
    - Updates to the README files in the top-level directory as well as the
      'examples/oapi' directory.

commit 627b045e301defea6770dc5b64e1110cbec25153
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 27 18:11:19 2018 -0500

    Added an example of using transposition with gemm.
    
    Details:
    - Added an example to examples/oapi/8level3.c to show how to indicate
      transposition when performing a gemm operation.

commit 13a0eadc69d72933e322901f5b44944834e3c787
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 27 18:00:07 2018 -0500

    Added more transposition/conjugation examples.
    
    Details:
    - Added code to examples/oapi/5level1m.c that demonstrates transposing
      (and conjugate-transposing) unstructured matrices.
    - Comment updates to 6level1m_diag.c to maintain consistency with new
      examples in 5level1m.c.

commit 5606cd8881e75264a96af45dc8ea1905bab054f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 27 17:13:10 2018 -0500

    Added utility module to examples/oapi.
    
    Details:
    - Added a new code example file to examples/oapi demonstrating how to use
      various utility operations.
    - Comment updates to other example files.
    - README updates.

commit ff26c94c6486374c709f93c6965ea18903bd6a18
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 27 12:31:34 2018 -0500

    Added missing gcc version constraint for knl.
    
    Details:
    - Previously forgot to add explicit enforcement of a minimum gcc version
      in configure script when 'knl' sub-configuration is requested.
    - Comment updates to configure.

commit 4d97574e477b3e55ddbb6044b0542a92cd9bab30
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 24 18:48:09 2018 -0500

    Added object API example code.
    
    Details:
    - Added an 'examples' directory at the top level.
    - Added an 'oapi' subdirectory in 'examples' that contains a tutorial-like
      sequence of example code demonstrating the core functionality of BLIS's
      object-based API, along with a Makefile and README. Thanks to Victor
      Eijkhout for being the first to suggest including such code in BLIS.

commit d6ab25a3232aa52b9b855088fb4b0b46ff2c00c8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 24 18:43:03 2018 -0500

    Add setijm, getijm operations.
    
    Details:
    - Added bli_setgetijm.c, which defines bli_setijm(), bli_getijm(), and
      related functions that can be used to read and write individual
      elements of an obj_t.
    - Defined a new function, bli_obj_create_conf_to(), in bli_obj.c that will
      create a new object with dimensions conformal to an existing object.
      Transposition and conjugation states on the existing object are ignored,
      as are structure and uplo fields.
    - Defined a new function, bli_datatype_string(), in bli_obj.c that returns
      a char* to a string representation of the name of each num_t datatype.
      For example, BLIS_DOUBLE is "double" and BLIS_DCOMPLEX is "dcomplex".
      BLIS_INT is included (as "int"), but BLIS_CONSTANT is not, and thus is
      not a valid input argument to bli_datatype_string().
    - Added calls to bli_init_once() to various functions in bli_obj.c, the
      most important of which was bli_obj_create_without_buffer().
    - Removed unintended/extra newline from the end of printv output.
    - Whitespace changes to
      - frame/base/bli_machval.c
      - frame/base/bli_machval.h
      - frame/0/copysc/bli_copysc.c
    - Trivial changes to README.md and common.mk.

commit a731a428f7fc02fd6ab4f953ead828c1d06fb5a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 17 16:44:55 2018 -0500

    Another README.md update.

commit c734ee928a824b27d280a9a67b1b4bc8423d5795
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 17 16:40:05 2018 -0500

    README.md update.

commit 03ecad372d8eb603ee905a7b944d0544a813460a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 17 14:16:59 2018 -0500

    Added RELEASING file.
    
    Details:
    - Added a file named 'RELEASING' that contains basic notes on how to
      create a new version/release of BLIS. This is mostly just a reminder
      to myself, but also may become useful if/when others take over
      development and administration of the project.

commit 24b3c3149ce66546b9a1afc2cc794a637a86aa60
Merge: 60366a3f 817b67c0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 16 18:49:38 2018 -0500

    Merge branch 'dev' of github.com:flame/blis into dev

commit 60366a3faba4e60cee85c3b87a3f69625f4b9026
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 16 18:46:21 2018 -0500

    Updates to knl kernels and related code.
    
    Details:
    - Imported the 24x16 knl sgemm microkernel (and its corresponding spackm
      kernel) from TBLIS and enabled its use in the knl sub-config. Also
      added sgemm microkernel prototype to bli_kernels_knl.h.
    - Updated dgemm and dpackm microkernels from TBLIS, which included an
      important change regarding the offsets array (changed from extern
      declaration to static declaration/definition).
    - Activated use of level-1v and -1f zen kernels in skx and knl
      sub-configs.
    - Removed some old macros no longer needed in bli_family_skx.h now that
      libmemkind support exists in configure.
    - Moved bli_avx512_macros.h to frame/include and adjusted #includes in
      skx and knl kernels accordingly.
    - Moved unused kernels in kernels/knl/3 to kernels/knl/3/other
      directory.
    - Fixed a minor bug in the 'make' output per compile when verboseness
      is not turned on. The rule-generating function 'make-kernel-rule' was
      previously passing in the name of the config, rather than the name of
      the kernel set returned by get-config-for-kset, which could give
      misleading information to the user when the kconfig_map mapped a
      kernel set to a sub-configuration that did not share the same name.
      (This didn't affect the CFLAGS that were actually used.)
    - Updated test/3m4m/Makefile, removing acml targets and renaming the
      remaining targets.

commit 817b67c01752e0ca8fe230bb8ad23afc7bd0f64e
Merge: 67c9c2f8 2b7108a8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 16 14:06:26 2018 -0500

    Merge branch 'dev' of github.com:flame/blis into dev

commit 67c9c2f86d5ef2accc439b21581d73d82754a2e3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 16 14:03:12 2018 -0500

    Retired haswell gemm microkernels.
    
    Details:
    - Moved microkernels in kernels/haswell/3 to kernels/haswell/3/old. These
      microkernels were no longer being used and only sowed confusion to
      anyone inspecting the repository without being fully cognizant of the
      build system and how it works (and sometimes even to those who wrote
      the build system). Note that the haswell configuration currently
      employs the zen microkernels.

commit 2b7108a8ef8ce958b3acad028ff07c85ff97fd63
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 16 12:35:53 2018 -0500

    Minor updates to test driver makefiles.
    
    Details:
    - Cleaned up and homogenized the various test driver Makefiles in
      testsuite and test directories.
    - Very minor updates to test driver code.

commit 9f56df95570a24587b910b169f342bd356ccbfb6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 11 14:51:36 2018 -0500

    Trivial tweaks to configure blacklisting output.
    
    Details:
    - Updated output of information vis-a-vis configuration blacklisting.

commit f56481efebd9a7785c0618f3a12c0bec36f46333
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 10 19:02:21 2018 -0500

    Cleaned up assembler version query on OS X.
    
    Details:
    - Switched from querying version of 'objdump' to 'as' (i.e. the
      assembler).
    - Fixed the outputting of the version of 'as' on OS X, which required
      this beauty:
        ...=$(as -v /dev/null -o /dev/null 2>&1)
    - Only add sub-configs to blacklist if the sub-config hasn't already
      been added.
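    
    The "only add if not already added" rule for the blacklist can be
    sketched in shell as below. The variable name blacklist and the helper
    blacklist_add are hypothetical, and the list is assumed to be a
    space-separated string; configure's actual data structure may differ.

```shell
#!/bin/sh
blacklist=""

# Hypothetical helper: append a sub-config name unless already present.
blacklist_add() {
    case " ${blacklist} " in
        *" $1 "*) ;;   # already listed; nothing to do
        *) blacklist="${blacklist:+${blacklist} }$1" ;;
    esac
}

blacklist_add knl
blacklist_add skx
blacklist_add knl   # duplicate request is ignored
echo "blacklist: ${blacklist}"
```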

commit 088c474e629535affbe111f141f895af50d109be
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 10 18:09:56 2018 -0500

    Added support for blacklisting via the assembler.
    
    Details:
    - Added logic to configure that attempts to assemble various small files
      containing select instructions designed to reveal whether binutils
      (specifically, the assembler) supports emitting those instruction sets.
      This information provides additional opportunities to blacklist sub-
      configurations that are unsupported by the environment. Thanks to Devin
      Matthews for pointing me towards a similar solution in TBLIS as an
      example.
    - Various other cleanups in configure.
    - Reorganized the detection code in the 'build' directory, bringing the
      "auto-detect" configuration detection, libmemkind detection, and new
      instruction set detection codes into a single new subdirectory named
      'detect'.
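    
    One way such an assembler probe might look in shell is sketched below.
    The helper name assembler_supports, the use of a temporary file, and the
    sample instruction are assumptions for illustration; the real detection
    code lives under build/detect and may work differently.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given assembler command accepts a
# source file containing the given instruction.
# $1: assembler command, $2: one line of assembly.
assembler_supports() {
    as_cmd=$1
    instruction=$2
    src=$(mktemp)
    printf '%s\n' "$instruction" > "$src"
    if "$as_cmd" -o /dev/null "$src" >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi
    rm -f "$src"
    return $status
}

# Probe for an FMA instruction; a failure would be grounds for
# blacklisting sub-configurations that need it.
if assembler_supports as "vfmadd231pd %ymm2, %ymm1, %ymm0"; then
    echo "assembler accepts the instruction"
else
    echo "assembler rejects the instruction (or 'as' is unavailable)"
fi
```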

commit 78a24e7dada52a3582f8488795bd1a44993989d9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 9 17:02:13 2018 -0500

    Updated bli_avx512_macros.h in knl and skx configs.
    
    Details:
    - Downloaded updated version of bli_avx512_macros.h from TBLIS [1] in
      attempt to address issue #192.
      [1] https://github.com/devinamatthews/tblis/

commit 388f64d6ade14caa4a6c286845ad2d565378b2bb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 9 15:33:10 2018 -0500

    Fixed failure to honor CC= argument to configure.
    
    Details:
    - Fixed a failure to observe the value of CC when selecting the compiler
      in configure. Thanks to Devangi Parikh for reporting this bug.
    - The semantics now also work for the CC environment variable. That is,
      if CC is set prior to running configure, that value is used, but will
      be overridden by specifying the CC= argument to configure. If the CC
      environment variable is not set, the CC= value is used. If neither the
      environment variable nor CC= are specified, then the choice is made
      internally to configure: first attempting to find gcc, then clang, and
      then cc.
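    
    The precedence described above (CC= argument, then the CC environment
    variable, then an internal search) could be sketched as follows. The
    function select_compiler and its two-argument interface are
    hypothetical and do not reflect how configure is actually structured.

```shell
#!/bin/sh
# Hypothetical helper.
# $1: value given as CC=... on the configure command line (may be empty)
# $2: value of CC in the environment (may be empty)
select_compiler() {
    if [ -n "$1" ]; then
        echo "$1"                     # the CC= argument wins
    elif [ -n "$2" ]; then
        echo "$2"                     # otherwise use the environment
    else
        # Neither was given: search for gcc, then clang, then cc.
        for candidate in gcc clang cc; do
            if command -v "$candidate" >/dev/null 2>&1; then
                echo "$candidate"
                return 0
            fi
        done
        return 1
    fi
}

select_compiler "icc" "clang"   # argument overrides the environment
select_compiler ""    "clang"   # environment value is used
```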

commit 45fbe66b3e2ab92f0b4fdf437d57c5d06603803d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 9 14:01:08 2018 -0500

    Fixed libmemkind dependency for x86_64.
    
    Details:
    - Removed some old conditional code in config/knl/make_defs.mk that
      added -lmemkind to LDFLAGS if DEBUG_TYPE was not 'sde' and inserted
      code into common.mk that affirmatively filters out -lmemkind from
      LDFLAGS if DEBUG_TYPE is 'sde'. (Thanks to Dave Love for reporting
      this issue.) Other minor cleanups to neighboring code in common.mk.
    - Updated CRVECFLAGS in knl/make_defs.mk to be based on -march=knl,
      and then AVX-512 functionality is manually removed via various
      -mno-avx512* flags. Also, make the setting of CRVECFLAGS conditional
      on CC_VENDOR. Similar change to skx/make_defs.mk.
    - Comment/whitespace updates.

commit ca982148b3b419db063cad2fa74376ec383a5c80
Author: dnp <devangiparikh@gmail.com>
Date:   Sun Apr 8 21:27:10 2018 -0500

    Fixed bug in SKX sgemm microkernel. Modified SKX dgemm microkernel to be consistent with the sgemm microkernel

commit bd0276752ccdd56ff897b1a5ae022f2ffe6e0b38
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 6 18:51:43 2018 -0500

    Track separate ref kernel flags for each sub-config.
    
    Details:
    - Renamed CVECFLAGS variables in sub-configurations' make_defs.mk files
      to CKVECFLAGS.
    - Added default definitions of two new make variables to most sub-
      configurations' make_defs.mk files--CROPTFLAGS and CRVECFLAGS--
      which correspond to reference kernel analogues of the CKOPTFLAGS
      and CKVECFLAGS, which track optimization and vectorization flags for
      optimized kernels. Currently, two sub-configurations (knl and skx)
      explicitly set CRVECFLAGS to non-default values (using AVX2 instead of
      AVX-512 for reference kernels). Thanks to Jeff Hammond, whose feedback
      prompted me to make this change (issue #187).
    - Changed common.mk so that the get-refkern-cflags-for function returns
      the flags associated with the given sub-configuration's CROPTFLAGS
      and CRVECFLAGS (instead of CKOPTFLAGS and CKVECFLAGS).

commit b9aebce19480448817373e2df2b36bd090eae41a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 6 18:37:33 2018 -0500

    De-verbosify makefile fragment generation.
    
    Details:
    - Changed from -v1 to -v0 when calling gen-make-frag.sh from configure.
      The directory-by-directory recursive output didn't add much value to
      the user, so now we just echo a line for each top-level directory into
      which we will recurse (e.g. 'config', 'ref_kernels', 'frame', etc.).
      This also helps keep more interesting information (from earlier in the
      execution of configure) from scrolling out of the terminal window.

commit b549b91f26948991e13364f1f26a878da0f43aa0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 6 16:31:33 2018 -0500

    Added 64-bit integer support to BLAS test drivers.
    
    Details:
    - Updated the build system and BLAS test drivers to use 64-bit integers
      when BLIS is configured for 64-bit integers in the BLAS layer. Also
      updated blastest/Makefile accordingly. Thanks to Dave Love for
      reporting the need for this feature.
    - Added a 'check' target to blastest/Makefile so that the user can see
      a summary of the tests.
    - Commented out the initial definition of INCLUDE_PATHS in common.mk,
      which was used pre-monolithic header, back when BLIS needed paths to
      *all* headers, rather than just a select few. This line is no longer
      needed since the value of INCLUDE_PATHS is overwritten by a later
      definition limited to only the header paths that are needed now.

commit d39fa1c04265869bdf8b6f453076359eec2f3c59
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 5 19:38:35 2018 -0500

    Adjusted CFLAGS used to compile bli_cntx_ref.c.
    
    Details:
    - Removed CKOPTFLAGS and CVECFLAGS from the set of CFLAGS used to
      compile bli_cntx_ref.c for each configuration. This is necessary
      because the file defines functions like bli_cntx_init_skx_ref(),
      which are called during BLIS's initialization of the global kernel
      structure, potentially being executed by an architecture that lacks
      the instruction set used to compile the kernels for, in this example,
      skx, which would lead to an illegal instruction error. Thanks to
      Dave Love for reporting this issue.
    - Further adjusted CFLAGS used when compiling code in the 'config'
      directory (e.g. bli_cntx_init_skx.c) as well as code in 'frame' so
      as to avoid the aforementioned issue.

commit 08b123084d35680beab379012f8f5a5a8b44a443
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 5 14:25:39 2018 -0500

    Added color-coding to 'make check' output.
    
    Details:
    - Added color coding to output of check-blistest.sh, check-blastest.sh
      scripts. Success messages are coded green and failures are coded red.
      This helps draw the eye toward those messages as the 'make checkblis',
      'make checkblis-fast', and 'make checkblas' targets are executed.
    - Changed top-level Makefile so that execution will not halt if
      'checkblis', 'checkblis-fast', or 'checkblas' targets fail, which
      means that the second of the two tests (BLIS and BLAS) run by
      'make check' will run even if the first test fails.

commit c9e4d7db7410b03c1ffe8c9727e9f1b2ba7fecfe
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 4 17:13:15 2018 -0500

    CHANGELOG update (0.3.1)

commit 1f28d7c86e17730f05bd239c8e8d67e3e7510a4f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 4 17:13:15 2018 -0500

    Version file update (0.3.1)

commit e6cc9ee26bcf0450f1120d5d12985b04d9fb8516
Merge: 786d15c5 3c91c7ae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 4 16:08:18 2018 -0500

    Merge branch 'dev' of github.com:flame/blis into dev

commit 786d15c5ef09f1f647b126b63d57e76d5810c58e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 4 16:06:47 2018 -0500

    Added skx, knl to x86_64 configuration family.
    
    Details:
    - Added 'skx' and 'knl' sub-configurations to the 'x86_64' configuration
      family in the config_registry file.
    - Added logic to configure that avoids committing certain sub-configs to
      the configuration/kernel registries if those sub-configs cannot be
      handled properly by the chosen compiler. (This was modeled after
      similar logic in TBLIS's configure; thanks to Devin Matthews for
      pointing this out.) First, the compiler and its version are inspected
      and, based on the results, certain configurations are added to a
      "blacklist". Then, as the configuration registries are being created,
      configurations and/or kernels that match items in the blacklist are
      skipped over and not committed to the registries. Under certain
      circumstances, omitting a blacklisted configuration will indirectly
      invalidate other configurations due to the loss of availability of
      the original blacklisted configuration's kernel set. This additional
      indirect blacklist is also accounted for.
    - Added output to the beginning of configure that echoes information
      about the chosen compiler as well as the configurations that are
      blacklisted and must be stripped from the registries.
    - Various other cleanups in configure, especially with respect to
      explicitly declaring local variables in functions.
    - Comment updates to config/zen/make_defs.mk regarding choice of -march
      flags based on compiler version.

commit 3c91c7aebafb446a2582267beb3b22c8bb475b3b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 2 12:40:25 2018 -0500

    Fixed 64b type mismatch warning in cblas_xerbla.c.
    
    Details:
    - Fixed a compiler warning concerning a type mismatch between the
      format specifier of the printf() call in cblas_xerbla.c and its
      corresponding (info) argument. The warning manifested when the CBLAS
      layer was enabled and the BLAS/CBLAS integer type size was set to 64
      (the default is 32). The warning was fixed by changing the specifier
      from %d to %jd and typecasting the argument to intmax_t. Thanks to
      Dave Love for reporting this issue and submitting the patch.
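
    A minimal sketch of the %jd pattern described above. The typedef and
    function names are hypothetical stand-ins, not the identifiers used in
    cblas_xerbla.c; only the specifier/cast pairing is the point.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the BLAS/CBLAS integer type when it is
   configured to be 64 bits wide. */
typedef int64_t f77_int_sketch;

/* Formats an 'info' value portably: %jd expects an intmax_t, so the
   argument is cast explicitly, whatever the configured integer width. */
static int format_info( char* buf, size_t len, f77_int_sketch info )
{
    assert( buf != NULL && len > 0 );
    return snprintf( buf, len, "Parameter %jd was incorrect",
                     ( intmax_t )info );
}
```

    With %d, a 64-bit argument would mismatch the specifier; the cast to
    intmax_t keeps the call correct for both 32- and 64-bit integers.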

commit 71eaf449a812fe2bd640d21513ec83974b2edb45
Merge: 6a628184 ae9a5be5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 27 17:21:43 2018 -0500

    Merge branch 'dev'

commit ae9a5be56d6f9b87278d6032154d2dcf3fb7d54f
Author: dnp <devangiparikh@gmail.com>
Date:   Tue Mar 27 17:01:23 2018 -0500

    Fixed bug in skx sgemm microkernel

commit 3f02af0905b1e2e2e065862f8afe5e9a52f282b2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 26 17:40:04 2018 -0500

    Row storage optimizations to zen dotxf kernels.
    
    Details:
    - Split the main loop bodies of zen's [sd]dotxf kernels into two cases:
      one to handle a column-stored matrix A and one to handle a row-stored
      matrix A. This allows vector instructions to be employed even if A is
      stored by rows (and A^T appears stored as columns). Both storage cases
      use a common edge case loop. Thanks to Devin Matthews for this idea
      and for prototyping the change needed for the sdotxf kernel.

commit 679dcc331dd870ec680e135a3fb65ffa6e3a91c2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 26 15:35:17 2018 -0500

    Make k_iter/k_left uint64_t in bulldozer fma ukrs.
    
    Details:
    - Changed the declaration of k_iter and k_left for d, c, z microkernels
      from dim_t to uint64_t. This is needed to ensure compatibility with
      the movq instruction used to load the value into registers. This
      change should have been made a long time ago, but for some reason
      only recently began showing up via Travis CI.

commit 6a628184f6938673440e4cdd4fed0208c51fd1f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 26 14:48:16 2018 -0500

    Fixed a memkind-related compile-time bug on knl.
    
    Details:
    - Fixed a compile-time error that occurred due to the fact that
      BLIS_ENABLE_MEMKIND, defined in bli_config.h, was not being defined
      soon enough to be used in bli_system.h where it is needed to determine
      whether hbwmalloc.h should be #included. bli_system.h is now included
      after bli_config.h (and bli_config_macro_defs.h). Thanks to Dave Love
      for reporting this issue.
    - Tweaked the language used by configure to echo the status of the
      --with[out]-memkind option.

commit e2192a8fd58ec3657434ddd407033e097edad8f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 23 12:53:48 2018 -0500

    Removed vzeroupper intrinsics from zen kernels.
    
    Details:
    - Fixed a bug in the zen (also used by haswell) dotxf kernels whereby a
      vzeroupper instruction destroyed part of the intermediate result
      stored by the vdpps instructions that came right before. (The
      vzeroupper intrinsic was removed.)
    - Removed remaining vzeroupper intrinsics from other zen kernels.
      Previously, the vzeroupper instructions were included because BLIS is
      typically compiled with -mfpmath=sse. But it was brought to my
      attention that inserting these vzeroupper instructions is unnecessary
      for our purposes, since (a) -mfpmath=sse results in VEX-encoded scalar
      code rather than literal SSE instructions, and (b) compilers already
      (likely) insert vzeroupper instructions where necessary. Thanks to
      Devin Matthews for zeroing in on the dotxf bug.
    - Removed -malign-double from bulldozer make_defs.mk. This alignment
      was already happening by default since bulldozer is an x86_64 system.

commit 22289ad23cd10b81451ce82f60d84b5f97e7fd85
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 22 18:21:30 2018 -0500

    Added build system support for libmemkind.
    
    Details:
    - Added support for libmemkind to configure. configure attempts to
      detect the presence of libmemkind by compiling a small program
      containing #include <hbwmalloc.h> and a call to hbw_malloc(). If
      successful, it is assumed that libmemkind is present and available.
      If present, use of libmemkind is enabled by default, and otherwise
      use is disabled by default. If libmemkind is present, the user may
      explicitly disable use of the library by running configure with the
      --without-memkind option. Furthermore, a configuration may disable
      libmemkind, perhaps conditional on some aspect of the build system,
      by including -DBLIS_DISABLE_MEMKIND in the configuration's CPPROCFLAGS
      make variable and setting the BLIS_ENABLE_MEMKIND makefile variable,
      set in config.mk, to 'no'. (The knl configuration makes use of this
      latter feature; see below.)
    - If enabled at configure-time, bli_system.h will #include <hbwmalloc.h>
      and bli_kernel_macro_defs.h will define BLIS_MALLOC_POOL and
      BLIS_FREE_POOL to use hbw_malloc() and hbw_free(), respectively.
    - Deprecated explicit use of BLIS_NO_HBWMALLOC in
      config/knl/bli_family.knl.h and replaced use of -DBLIS_NO_HBWMALLOC in
      config/knl/make_defs.mk with -DBLIS_DISABLE_MEMKIND, which overrides
      (#undefs) the definition of BLIS_ENABLE_MEMKIND in bli_system.h, if it
      would otherwise be defined. Also, set the BLIS_ENABLE_MEMKIND makefile
      variable to 'no'.
    - common.mk now adds libmemkind to LDFLAGS if libmemkind is enabled.
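
    The enable/disable interplay above can be sketched with the
    preprocessor. This is an illustration only: the macro names come from
    this entry, but the layout and the fallback to malloc()/free() when
    memkind is not enabled are assumptions, and pool_uses_memkind() and
    pool_roundtrip() are hypothetical helpers.

```c
#include <assert.h>
#include <stdlib.h>

/* A configuration can veto memkind by defining BLIS_DISABLE_MEMKIND,
   which cancels any earlier definition of BLIS_ENABLE_MEMKIND. */
#ifdef BLIS_DISABLE_MEMKIND
  #undef BLIS_ENABLE_MEMKIND
#endif

#ifdef BLIS_ENABLE_MEMKIND
  #include <hbwmalloc.h>
  #define BLIS_MALLOC_POOL hbw_malloc
  #define BLIS_FREE_POOL   hbw_free
  #define POOL_USES_MEMKIND 1
#else
  /* Assumed fallback when memkind is unavailable or disabled. */
  #define BLIS_MALLOC_POOL malloc
  #define BLIS_FREE_POOL   free
  #define POOL_USES_MEMKIND 0
#endif

/* Reports which allocator the pool macros resolved to. */
static int pool_uses_memkind( void ) { return POOL_USES_MEMKIND; }

/* Allocates and releases a block through the pool macros. */
static int pool_roundtrip( size_t nbytes )
{
    assert( nbytes > 0 );
    void* p = BLIS_MALLOC_POOL( nbytes );
    if ( p == NULL ) return 0;
    BLIS_FREE_POOL( p );
    return 1;
}
```

    Compiling with -DBLIS_ENABLE_MEMKIND -DBLIS_DISABLE_MEMKIND selects
    the fallback branch, which mirrors the override used by the knl
    configuration.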

commit 7dc40eafdd9af3e8c4519a8d1b04d25830b4ca7a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 21 18:39:16 2018 -0500

    Updates to top-level and test driver Makefiles.
    
    Details:
    - Added logic to common.mk that will choose a BLIS library against which
      to link (LIBBLIS_LINK). The default choice is the static (.a) library;
      the shared (.so) library is chosen only if the shared library build was
      enabled and the static one was disabled.
    - Updated the various test driver Makefiles to reference this common,
      pre-chosen library against which to link. (Previously, these drivers
      unconditionally linked against the static library and would have
      failed if the static library build was disabled at configure-time.)
    - Renamed many of the variables in common.mk and the top-level Makefile
      so that variables relating to the libblis.[a|so] files, including
      paths to those files, begin with "LIBBLIS".
    - Shuffled around some of the library definitions from the top-level
      Makefile to common.mk.
    - Renamed BLIS_ENABLE_DYNAMIC_BUILD to BLIS_ENABLE_SHARED_BUILD, and
      the @enable_dynamic@ anchor to @enable_shared@ in build/config.mk.in
      and in configure.
    - A few other cleanups in the top-level Makefile.
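
    The LIBBLIS_LINK selection rule from the first item above, written
    out as a small C function for clarity. The actual logic lives in
    common.mk; the enum and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { LINK_STATIC_SKETCH, LINK_SHARED_SKETCH } link_choice_t;

/* Static is the default; shared is chosen only when the shared build is
   enabled and the static build is disabled. */
static link_choice_t choose_libblis_link( bool static_enabled,
                                          bool shared_enabled )
{
    /* At least one library flavor is assumed to have been built. */
    assert( static_enabled || shared_enabled );
    if ( shared_enabled && !static_enabled ) return LINK_SHARED_SKETCH;
    return LINK_STATIC_SKETCH;
}
```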

commit 97e1eeade3c51df1bae574a9bc1da34b05bf2bd3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 21 15:47:11 2018 -0500

    Added input.operations.fast file for 'make check'.
    
    Details:
    - Added an 'input.operations.fast' file to testsuite directory to go
      along with the 'input.general.fast' file used by the 'make check'
      target in the top-level Makefile. This will allow the "fast" check
      to prune operations and/or parameter combinations from the test
      space in order to save time.
    - Currently, input.operations.fast prunes trmm3 and all transposition
      and conjugation parameters from the level-3 test space.
    - Reduced problem size tested in input.general.fast to 100 and disabled
      testing of 1m method.

commit c441caa95aabe69f54e2160eb67bf4ca76a66c34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 20 17:56:02 2018 -0500

    README update.
    
    Details:
    - Minor updates to README.md.
    - Minor change to blastest/Makefile.

commit 6fe018eb4ac8c16f2edc916c24f5994848017b7f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 20 15:35:45 2018 -0500

    Added .gitkeep file to blastest/obj.
    
    Details:
    - Added an empty file named '.gitkeep' to blastest/obj/ so that git will
      track the otherwise empty directory. (This is already done for the BLIS
      testsuite in testsuite/obj.)

commit 0e6d000db9291342913dc5f8590a28c67bbcbc95
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 20 15:08:43 2018 -0500

    Updated .gitignore to ignore BLAS test out.* files.

commit 40c040a31d96fbadff11f761d0cad1ef03ef2cc5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 20 14:33:50 2018 -0500

    Fixes to .travis.yml.
    
    Details:
    - Invoke the full BLIS testsuite via 'make testblis' instead of the fast
      version via 'blistest-fast' (which was wrong anyway, since the correct
      fast target is 'testblis-fast').
    - Invoke the BLAS tests via 'make testblas' instead of 'blastest'.

commit 664ec4813d8b53121cce7a68bef47da656ece9cb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 20 13:54:58 2018 -0500

    Integrated f2c'ed netlib BLAS test suite.
    
    Details:
    - Created a new test suite that exercises only the BLAS compatibility
      found in BLIS. The test suite is a straightforward port of code
      obtained from netlib LAPACK, run through f2c and linked to a stripped-
      down version of libf2c that is compiled along with the test drivers
      (to prevent any obvious ABI issues). The new BLAS test suite can be
      run from within its new local directory, 'blastest' (through its local
      'make ; make run' targets) or from the top-level Makefile (via the
      'make testblas' target). Output files are created in whatever directory
      the test drivers are run, whether it be the 'blastest' directory, the
      top-level source distribution directory, or the out-of-tree directory
      in which 'configure' was run. Also, the results of the BLAS test suite
      can be checked via 'make checkblas', which summarizes the presence or
      absence of test failures in a single line printed to stdout.
    - Updated the 'test' target to run both 'testblis' and 'testblas'.
    - Added a new 'testblis-fast' target that runs the BLIS testsuite with
      smaller problem sizes, allowing it to finish more quickly.
    - Added a 'make check' target, which runs 'checkblis-fast' and
      'checkblas'.
    - Changed .travis.yml so that Travis CI runs 'testblis-fast' instead of
      'testblis' before calling the check-blistest.sh script to check the
      result manually.
    - Renamed some targets in the top-level Makefile to be consistent between
      BLAS and BLIS.

commit fc53ad6c5b2e39238b1bbbf625cc0c638b9da4e1
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date:   Mon Mar 19 12:49:26 2018 +0530

    Re-enabling the small matrix gemm optimization for target zen
    
    Change-Id: I13872784586984634d728cd99a00f71c3f904395

commit d12d34e167d7dc32732c0ed135f8065a55088106
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date:   Mon Mar 19 11:34:32 2018 +0530

    Re-enabling Zen optimized cache block sizes for config target zen
    
    Change-Id: I8191421b876755b31590323c66156d4a814575f1

commit 40fa10396c0a3f9601cf49f6b6cd9922185c932e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 19 18:19:43 2018 -0500

    Fixed a few obscure bugs in the BLAS API.
    
    Details:
    - Fixed a missing parameter in the definition of sdsdot_(). The 'sb'
      argument was missing. Strangely, the argument is omitted from dsdot_()
      in the BLAS API.
    - Fixed the missing 'c' or 'u' in the "?gerc" or "?geru" operation string
      passed to xerbla_() by the bla_ger_check() macro.
    - For bla_syrk_check() and bla_syr2k_check() macros, only allow
      conjugate-transpose (trans='c') as a valid argument for the real
      domain functions [sd]syrk_() and [sd]syr2k_(). (Previously, the
      argument was allowed even for the complex domain equivalents, which
      was inconsistent with the BLAS API.)
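
    To make the role of 'sb' concrete, here is a reference-style sketch
    of the two routines' semantics. It is not the BLIS implementation:
    the names are hypothetical, and only positive increments are handled.

```c
#include <assert.h>
#include <stddef.h>

/* sdsdot: 'sb' plus the dot product of x and y, accumulated in double
   precision and rounded to float on return. */
static float sdsdot_sketch( int n, float sb,
                            const float* x, int incx,
                            const float* y, int incy )
{
    double acc = ( double )sb;
    assert( incx > 0 && incy > 0 );
    for ( int i = 0; i < n; ++i )
        acc += ( double )x[ ( size_t )i * ( size_t )incx ]
             * ( double )y[ ( size_t )i * ( size_t )incy ];
    return ( float )acc;
}

/* dsdot: same accumulation, but no 'sb' argument and a double result. */
static double dsdot_sketch( int n,
                            const float* x, int incx,
                            const float* y, int incy )
{
    double acc = 0.0;
    assert( incx > 0 && incy > 0 );
    for ( int i = 0; i < n; ++i )
        acc += ( double )x[ ( size_t )i * ( size_t )incx ]
             * ( double )y[ ( size_t )i * ( size_t )incy ];
    return acc;
}
```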

commit fe7d7f1e43e4c26249eed83d4188beee1ba96202
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Mar 18 19:43:06 2018 -0500

    Fixed cpp macro parameter "ch" typo in bla_ger.c.
    
    Details:
    - Previously, the BLAS routine-generating macro in bla_ger.c was
      incorrectly passing MKSTR(ch) into the _check() macro when it
      should have been passing in the char that was available, chxy.
      I've instead changed the name of the macro parameter from chxy
      to ch. A similar change was made to bla_ger.h for consistency.
      Thanks to Dave Love for helping track this down. (NOTE: This is
      actually the root cause of the bug that was first patched by
      increasing the length of the operation name strings passed into
      xerbla_(), as defined by the constant BLIS_MAX_BLAS_FUNC_STR_LENGTH,
      in 3d1a5a7. In theory, that change could be backed out now.)
    - Applied aforementioned chxy->ch change to bla_dot.[ch], as well as
      frame/compat/cblas/f77_sub/f77_dot_sub.[ch] (not because it needed
      to happen, but for naming consistency).
    - Reformatted function signatures/prototypes of CBLAS functions and
      function calls to BLAS in frame/compat/cblas/f77_sub/*.c.

commit cb7ed90752d1ddbac11368c4510641ca4f3a02eb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 16 13:05:56 2018 -0500

    Convert op names to uppercase before calling xerbla_().
    
    Details:
    - Defined a new function, bli_string_mkupper(), that calls toupper() on
      every non-NULL character in a string.
    - Call bli_string_mkupper() prior to calling xerbla_() in the level-2/-3
      BLAS _check() macros. This prevents the BLAS testsuite from complaining
      that the operation name (e.g. "dgemm") does not match the expected
      value (e.g. "DGEMM"). Thanks to Dave Love for reporting this issue.
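
    A helper of the kind described above might look like the following.
    This is a sketch under a hypothetical name, not the body of
    bli_string_mkupper().

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Upper-cases a NUL-terminated string in place. The cast to unsigned
   char avoids undefined behavior in toupper() for negative char values. */
static void string_mkupper_sketch( char* s )
{
    assert( s != NULL );
    for ( ; *s != '\0'; ++s )
        *s = ( char )toupper( ( unsigned char )*s );
}
```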

commit 3d1a5a7c08fed3ba29f060fe1db2b0dc42dde223
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 16 12:24:07 2018 -0500

    Fixed printf() format overflow.
    
    Details:
    - Increased the length of operation name strings passed to xerbla_() in
      the level-2 and level-3 operation _check() functions, found in
      frame/compat/check. This avoids a format specifier overflow warning by
      gcc 7. Thanks to Dave Love for reporting this issue and suggesting the
      fix.

commit c73055f028684d998e03b2392093c393782bbfe7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 15 16:08:21 2018 -0500

    Return after non-zero info in BLAS checks.
    
    Details:
    - Previously, when calling the BLAS compatibility layer, discovering a
      parameter check failure would result in the proper setting of the
      info parameter (printed by xerbla_()), but would also come with an
      immediate abort() rather than a return. This was incorrect behavior
      for two overlapping reasons.
      (1) BLAS should return gracefully to the caller in the event of a
          bad set of parameters, not abort().
      (2) When BLIS was being tested via the BLAS testsuite, BLIS's
          xerbla_() would correctly get preempted/overridden by the
          xerbla_() in the BLAS testsuite, but execution would then
          erroneously continue on to the BLIS implementation with bad
          parameter values.
    - The previous issue was addressed by disabling the abort() in BLIS's
      xerbla_(), changing all of the BLAS _check() functions to cpp macros,
      and adding a return statement to the end of each _check() macro's
      "if ( info != 0 )" conditional.
      Thanks to Dave Love for reporting this issue.
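
    Why a macro rather than a function: a 'return' inside a macro leaves
    the calling BLAS wrapper itself, which a called function cannot do.
    The sketch below uses a toy operation and hypothetical names; the
    parameter positions and the message text are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

static int xerbla_calls     = 0;  /* how often the error handler ran   */
static int computations_run = 0;  /* how often the "real work" ran     */

/* Stand-in for xerbla_(): reports the bad parameter and returns
   normally instead of calling abort(). */
static void xerbla_sketch( const char* name, int info )
{
    ++xerbla_calls;
    fprintf( stderr, "%s: parameter number %d had an illegal value\n",
             name, info );
}

/* Because this expands inside the caller, the 'return' exits the
   wrapper before any computation is attempted. */
#define TOY_CHECK_SKETCH( name, m, n ) \
    do { \
        int info = 0; \
        if      ( (m) < 0 ) info = 1; \
        else if ( (n) < 0 ) info = 2; \
        if ( info != 0 ) { xerbla_sketch( name, info ); return; } \
    } while ( 0 )

/* Toy BLAS-like wrapper: check parameters first, then compute. */
static void toy_op_sketch( int m, int n )
{
    TOY_CHECK_SKETCH( "TOYOP", m, n );
    ++computations_run;  /* the actual implementation would run here */
}
```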

commit c4f1d18b97a6a8c3ea0366aa759db597a664062a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 14 19:10:09 2018 -0500

    Minor typo fix to printing arch in testsuite.
    
    Details:
    - Mistakenly was calling bli_cpuid_query_id() instead of
      bli_arch_query_id() in the recent addition to the testsuite output
      that prints the active sub-configuration. The former function is
      only used for multi-architecture builds, whereas the latter is the
      more general option that also works for single configuration
      (including 'configure auto') builds.

commit 8f2fabec800a720b3e94b33c0048cc8c4ead436d
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Mar 14 17:43:42 2018 -0500

    Make arm32 and arm64 families work. (#176)

commit fc6a1842518a0820c6708c285611346d5a1419da
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 14 15:31:17 2018 -0500

    Print sub-configuration name in testsuite output.
    
    Details:
    - Added a line to the testsuite output that prints the name of the
      current/active sub-configuration. This is useful when linking the
      testsuite against multi-configuration builds because it confirms
      the sub-configuration that is actually being employed at runtime.
      Thanks to Devin Matthews for suggesting this feature.

commit 9943a899d64bf7ec4a24106f6f4c70629bbe1f6e
Merge: 290dd4a9 b1a15ae6
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Mar 14 13:27:44 2018 -0500

    Merge pull request #173 from devinamatthews/dev
    
    Fix Cortex-A9 and Cortex-A15 configs.

commit b1a15ae6ee0f46c9a95cf59f9555925e0e8e21ff
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Mar 14 13:26:44 2018 -0500

    Use BLIS_H_FLAT

commit 290dd4a9feee447e69b40ad108954af78e196f7e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 14 13:15:37 2018 -0500

    Allow arbitrarily deep configuration families.
    
    Details:
    - Updated configure so that configuration families specified in the
      config_registry are no longer constrained as being only one level
      deep. For example, previously the x86_64 family could not be defined
      concisely in terms of, say, intel64 and amd64 families, and instead
      had to be defined as containing "haswell, sandybridge, penryn, zen,
      etc." In other words, families were constrained to only having
      singleton configurations as their members. That constraint is now
      lifted.
    - Redefined x86_64 family in config_registry in terms of intel64 and
      amd64.
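
    The nesting described above amounts to a recursive expansion of
    family names into singleton configurations. The real logic is shell
    code in configure; the C sketch below uses a toy registry whose
    member lists are illustrative, not the contents of config_registry.

```c
#include <assert.h>
#include <string.h>

typedef struct { const char* name; const char* members[ 8 ]; } family_t;

/* Toy registry: x86_64 is defined in terms of two other families. */
static const family_t registry[] = {
    { "x86_64",  { "intel64", "amd64", NULL } },
    { "intel64", { "haswell", "sandybridge", "penryn", NULL } },
    { "amd64",   { "zen", "bulldozer", NULL } },
};

static const family_t* find_family( const char* name )
{
    for ( size_t i = 0; i < sizeof registry / sizeof registry[ 0 ]; ++i )
        if ( strcmp( registry[ i ].name, name ) == 0 )
            return &registry[ i ];
    return NULL;
}

/* Appends every singleton reachable from 'name' to 'out', separated by
   spaces. 'out' must start as an empty string. A name with no registry
   entry is treated as a singleton. No cycle detection is attempted. */
static void expand_config( const char* name, char* out, size_t len )
{
    assert( name != NULL && out != NULL && len > strlen( out ) );
    const family_t* fam = find_family( name );
    if ( fam == NULL )
    {
        if ( out[ 0 ] != '\0' )
            strncat( out, " ", len - strlen( out ) - 1 );
        strncat( out, name, len - strlen( out ) - 1 );
        return;
    }
    for ( size_t i = 0; fam->members[ i ] != NULL; ++i )
        expand_config( fam->members[ i ], out, len );
}
```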

commit 9cee78e006d56543ac02fc9c488905c0434e60ae
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Mar 14 13:09:48 2018 -0500

    Fix Cortex-A9 and Cortex-A15 configs.
    
    Tested with QEMU.

commit 1a3031740f7fcbbcc2c99d5c4cb50d0413407455
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 13 16:04:40 2018 -0500

    Updates to ARM hardware detection support.
    
    Details:
    - Updated/clarified the ARM preprocessor macro branch of bli_cpuid.c.
      Going forward, cortexa57 (64-bit), cortexa15, and cortexa9 (32-bit)
      sub-configurations are supported. However, the functions that detect
      features specific to a15 and a9 are identical, and since a15 is tested
      first, it will always be chosen for arm32 hardware (even if both
      sub-configurations were enabled at configure-time and the library is
      linked and run on an a9). Thus, more work needs to be done to
      distinguish these two.
    - Added cpp guard around x86_64 portions of bli_cpuid.c. Now, either
      the x86_64 or ARM code will be compiled (or neither, if neither
      environment is detected).
    - In bli_arch_query_id(), call bli_cpuid_query_id() when the
      BLIS_FAMILY_ARM64 or BLIS_FAMILY_ARM32 macros are defined.
    - Added arm64 and arm32 configuration families to config_registry.
    - Added a note to the arch_t typedef enum in bli_type_defs.h reminding
      the developer to update the string array in bli_arch.c whenever new
      enum values are added or existing values are reordered.

commit 1442d06886ebdc34d8f1cb620229ddc6062c2ce8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Mar 11 16:59:50 2018 -0500

    Fixed misnamed kernels in _cntx_init_cortexa57.c.
    
    Details:
    - Changed incorrect kernel function names in bli_cntx_init_cortexa57.c:
        bli_sgemm_cortexa57_asm_8x12 -> bli_sgemm_armv8a_asm_8x12
        bli_dgemm_cortexa57_asm_6x8  -> bli_dgemm_armv8a_asm_6x8
      Thanks to Jacob Gorm Hansen for reporting this issue.

commit 28bcea37dfcf0eb99a99da6f46de2a2830393d1d
Merge: b1ea3092 8b0475a8
Author: praveeng <praveen.g@amd.com>
Date:   Fri Mar 9 19:13:08 2018 +0530

    Merge master code till 06_mar_2018 to amd-staging
    
    Change-Id: I12267e5999c92417e3715fef4f36ac2131d00f1a

commit 48da9f5805f0a49f6ad181ae2bf57b4fde8e1b0a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 7 12:54:06 2018 -0600

    Tweaked common.mk, Makefile, skx/knl make_defs.mk.
    
    Details:
    - Reorganized linker-related section of common.mk so that LDFLAGS set
      in a sub-configuration's make_defs.mk file will not be immediately
      (and erroneously) overridden by the default values.
    - Re-enabled redirected (to file) output of the testsuite when run from
      the top-level Makefile via 'make test'. (For some reason, it was
      commented-out for the non-verbose case.)
    - Removed old/unnecessary code from the make_defs.mk files of skx and
      knl sub-configurations.

commit 8b0475a87daa177916e2caac0e530c6a57fa07cf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 6 06:39:44 2018 -0600

    Fixed typo in attempted fix in 1a8350f7.
    
    Details:
    - Mistakenly entered 148 as knl mc blocksize for double real when the
      value should have been 144. Thanks to Dave Love for reporting this.

commit 8912e6886b97eabb4ce0c35a3609a0fd994d347b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 5 18:00:45 2018 -0600

    Fixed missing flags during shared object build.
    
    Details:
    - Fixed a bug in common.mk that caused warning, position-independent
      code, miscellaneous, and general preprocessor flags to be omitted
      from the configuration family-specific variables that hold those
      values, as registered by the family's make_defs.mk file. This would
      most obviously manifest when targeting a configuration family such as
      'intel64' while simultaneously configuring for a shared object build,
      as the key '-fPIC' flag would be omitted at compile-time and prevent
      successful linking. Thanks to Dave Love for reporting this bug.
    - Other cleanups to common.mk for readability and clarity.

commit 1a8350f70557fc53ca0c2eadf2076710dd0d9bc9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 5 13:32:00 2018 -0600

    Fixed cache blocksize bug in knl configuration.
    
    Details:
    - Changed the mc blocksize for double real execution in the knl sub-
      configuration from 160 to 148. The old value was not a multiple of
      mr (which is 24), and thus the safeguards in bli_gks_register_cntx()
      were tripping. Thanks to Dave Love for reporting this issue.
    - Switch knl sub-configuration to use default blocksizes for datatypes
      not supported by native kernels.
    - Fixed typos in bli_error.c that prevented certain error strings
      (which report maximum cache blocksizes not being multiples of their
      corresponding register blocksize) from properly initializing.
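
    The constraint is simply that mc must be an integer multiple of mr.
    With mr = 24, 160 leaves a remainder of 16 and 148 leaves a remainder
    of 4, while 144 divides evenly, which is why the follow-up commit
    8b0475a8 changes the value to 144. The function below is a
    hypothetical sketch, not the check performed in
    bli_gks_register_cntx().

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the cache blocksize is a whole multiple of the
   register blocksize. */
static bool blocksize_is_multiple( int mc, int mr )
{
    assert( mc > 0 && mr > 0 );
    return mc % mr == 0;
}
```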

commit c09fffa827fe6241dc20193a1c404496664220de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Mar 3 13:13:39 2018 -0600

    Added missing cntx_t* arg in knl packm kernels.
    
    Details:
    - Added the missing cntx_t* argument to the function signature of packm
      kernels in kernels/knl/1m/. Thanks to Dave Love for reporting this
      issue.

commit b1ea30925dff751eced23dfa94ff578a20ea0b94
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 23 17:42:48 2018 -0600

    CHANGELOG update (0.3.0)
    
    Change-Id: Id038b00a62de51c9818ad249651ec5dc662f4415

commit 1ef9360b1fd0209fbeb5766f7a35402fbd080fcb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 1 14:36:39 2018 -0600

    Enable non-unit vector stride tests by default.
    
    Details:
    - Change "vector storage schemes to test" parameter in testsuite's
      input.general file to "cj". This means that both unit stride column
      vectors and non-unit stride column vectors will be tested in
      operations with vector operands (e.g. level-1v, level-1f, level-2).
    - Very minor comment (typo) changes to input.operations.

commit 8c4e55a1a1ead9a5e970200fee027ffd2c7e8454
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 28 17:01:47 2018 -0600

    Added individual operation overrides in testsuite.
    
    Details:
    - Updated the testsuite driver so that setting one or more individual
      operation test switches to "2" in input.operations will enable ONLY
      those operations and disable all others, regardless of the values of
      the section overrides and other operation switches. This makes it
      very easy to quickly test only one or two operations, and equally
      easy to revert back to the previous combination of operation tests.
    - Added more comments to input.operations describing the use of
      individual "enable only" overrides.
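
    A simplified sketch of the "enable only" rule. It assumes switch
    values of 0 (off), 1 (on), and 2 (enable only), and it leaves out the
    section overrides entirely; the function name is hypothetical and
    this is not the testsuite driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* If any switch is 2, only the operations marked 2 are enabled.
   Otherwise an operation is enabled when its switch is 1. */
static void resolve_op_switches( const int* sw, bool* enabled, size_t n )
{
    assert( sw != NULL && enabled != NULL );
    bool any_only = false;
    for ( size_t i = 0; i < n; ++i )
        if ( sw[ i ] == 2 ) any_only = true;
    for ( size_t i = 0; i < n; ++i )
        enabled[ i ] = any_only ? ( sw[ i ] == 2 ) : ( sw[ i ] == 1 );
}
```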

commit 34862aed89e5d5a8f35aeecd49f3052ada1f337b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 28 15:30:14 2018 -0600

    Use zen kernels in haswell sub-configuration.
    
    Details:
    - Register use of level-1v zen intrinsic kernels for amaxv, axpyv, dotv,
      dotxv, and scalv, as well as level-1f zen intrinsic kernels for axpyf
      and dotxf. This works because these kernels simply target AVX/AVX2,
      and therefore work without modification on haswell hardware.
    - Switch to use of zen microkernels in bli_cntx_init_haswell.c. The zen
      kernels are essentially identical to those used by haswell, except that
      now zen kernels are a bit more up-to-date. In the future, I may
      continue to maintain duplicates, or I may keep the kernels named after
      one architecture (zen or haswell) but used by both sub-configurations.
    - In config_registry, enable use of both haswell and zen kernels for the
      haswell sub-configuration. This is necessary in order to make zen
      kernels visible when registering kernels in bli_cntx_init_haswell.c.
    - Enable use of assembly-based complex gemm microkernels for zen,
      bli_cgemm_zen_asm_3x8() and bli_zgemm_zen_asm_3x4(), in
      bli_cntx_init_zen.c. This was actually intended for 1681333.

commit 709f8361ebc90b96b02ebe5c5ffb6fc3b1b25e58
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 23 17:42:48 2018 -0600

    Version file update (0.3.0)

commit d9079655c9cbb903c6761d79194a21b7c0a322bc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 23 17:42:48 2018 -0600

    CHANGELOG update (0.3.0)

commit 3defc7265c12cf85e9de2d7a1f243c5e090a6f9d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 23 17:38:19 2018 -0600

    Applied 34b72a3 to non-active/unused microkernels.
    
    Details:
    - Applied the read-beyond-bounds bugfix in 34b72a3 to other haswell and
      zen kernels (ie: other microtile shapes) which are not used by default.
      This was done mostly in case someone decided to pick up these kernels
      and start using them, not because it affects BLIS's behavior
      out-of-the-box.

commit 34b72a351745aa0d47bb0b74ebcd0f0a616d613d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 23 16:33:32 2018 -0600

    Fixed obscure read-beyond-bounds bug in sgemm ukrs.
    
    Details:
    - Fixed an obscure bug in the bli_sgemm_haswell_asm_6x16 and
      bli_sgemm_zen_asm_6x16 microkernels when the input/output matrix C
      is stored with general stride (ie: both rs and cs are non-unit). The
      bug was rooted in the way those microkernels read from matrix C--
      namely, they used vmovlps/vmovhps instead of movss. By loading two
      floats at a time, even if one of them was treated as junk, the
      assembly code could be written in a more concise manner. However,
      under certain conditions--if m % mr == 0 and n % nr == 0 and the
      underlying matrix is not an internal "view" into a larger matrix--
      this could result in the very last vmovhps of the last (bottom-right)
      microkernel invocation reading beyond valid memory. Specifically, the
      low 32 bits read would always be valid, but the high 32 bits could
      reside beyond the bounds of the array in which the output C matrix is
      contained. To remedy this situation, we now selectively use movss to
      load any element that could be the last element in the matrix.
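
    The overrun can be seen with plain index arithmetic. The sketch below
    assumes the matrix occupies the smallest buffer that holds it (that
    is, it is not a view into a larger matrix) and that rs and cs are
    positive; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of floats in the smallest buffer that holds an m x n matrix
   with row stride rs and column stride cs. */
static size_t buffer_len( size_t m, size_t n, size_t rs, size_t cs )
{
    assert( m > 0 && n > 0 && rs > 0 && cs > 0 );
    return ( m - 1 ) * rs + ( n - 1 ) * cs + 1;
}

/* A load of 'floats_per_load' consecutive floats starting at element
   (i,j) overruns the buffer if it touches an offset past the end.
   A vmovlps/vmovhps-style load reads 2 floats; movss reads 1. */
static bool load_overruns( size_t m, size_t n, size_t rs, size_t cs,
                           size_t i, size_t j, size_t floats_per_load )
{
    assert( i < m && j < n );
    size_t off = i * rs + j * cs;
    return off + floats_per_load > buffer_len( m, n, rs, cs );
}
```

    For a 6x16 microtile with rs = 3 and cs = 20, only the two-float load
    at the bottom-right element (5,15) overruns; a one-float load at the
    same element stays in bounds.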

commit 5112e1859e7f8888f5555eb7bc02bd9fab9b4442
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 23 14:31:26 2018 -0600

    Added missing 'restrict' to some kernels' cntx_t*.
    
    Details:
    - Added missing 'restrict' keyword to cntx_t* argument of function
      signatures corresponding to level-1v, level-1f, and level-1m kernels.
      This affected bli_l1v_ker_prot.h, bli_l1f_ker_prot.h, and
      bli_l1m_ker_prot.h. (The 'restrict' was already being used to
      qualify cntx_t* arguments for kernels defined in bli_l3_ker_prot.h.)
    - Added comments to bli_l1v_ker.h, bli_l1f_ker.h, bli_l1m_ker.h, and
      bli_l3_ukr.h that help explain how those headers function to produce
      kernel prototypes using the prototype macros defined in the files
      mentioned above.
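
    A rough sketch of how a prototype macro can stamp out kernel signatures
    with a restrict-qualified context pointer. The macro name, the kernel
    name, and the cntx_t stand-in below are hypothetical and are not the
    definitions found in the BLIS headers.

```c
#include <stddef.h>

/* Hypothetical stand-in for the BLIS context type. */
typedef struct { int unused; } cntx_t;

/* Hypothetical prototype macro: one invocation per datatype produces one
   kernel prototype, with 'restrict' on every pointer argument. */
#define ADDV_KER_PROT(ctype, ch, kername) \
    void ch##kername(size_t n, \
                     const ctype *restrict x, \
                     ctype *restrict y, \
                     const cntx_t *restrict cntx);

ADDV_KER_PROT(float,  s, addv_demo)
ADDV_KER_PROT(double, d, addv_demo)

/* Definitions matching the generated prototypes: y := y + x. */
void saddv_demo(size_t n, const float *restrict x, float *restrict y,
                const cntx_t *restrict cntx)
{
    (void)cntx; /* unused in this sketch */
    for (size_t i = 0; i < n; ++i) y[i] += x[i];
}

void daddv_demo(size_t n, const double *restrict x, double *restrict y,
                const cntx_t *restrict cntx)
{
    (void)cntx; /* unused in this sketch */
    for (size_t i = 0; i < n; ++i) y[i] += x[i];
}
```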

commit 1fa8af95d807168e0849adb668492601e7009be0
Merge: c084b03b 16813335
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 21 17:54:02 2018 -0600

    Merge branch 'rt'

commit c084b03b31d84427a120e391963db5419f1911ee
Merge: 5d03b6e6 fa74af4e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 21 17:52:17 2018 -0600

    Merge branch 'rt'

commit 16813335bdb5978bc9a26cd00a32bd5a130130c4
Merge: fa74af4e 5a7005dd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 21 17:43:32 2018 -0600

    Merge branch 'amd' into rt
    
    Details:
    - Merged contributions made by AMD via 'amd' branch (see summary below).
      Special thanks to AMD for their contributions to-date, especially with
      regard to intrinsic- and assembly-based kernels.
    - Added column storage output cases to microkernels in
      bli_gemm_zen_asm_d6x8.c and bli_gemmtrsm_l_zen_asm_d6x8.c. Even with
      the extra cost of transposing the microtile in registers, this is
      much faster than using the general storage case when the underlying
      matrix is column-stored.
    - Added s and d assembly-based zen gemmtrsm_u microkernel (including
      column storage optimization mentioned above).
    - Updated zen sub-configuration to reflect presence of new native
      kernels.
    - Temporarily reverted zen sub-configuration's level-3 cache blocksizes
      to smaller haswell values.
    - Temporarily disabled small matrix handling for zen configuration
      family in config/zen/bli_family_zen.h.
    - Updated zen CFLAGS according to changes in 1e4365b.
    - Updated haswell microkernels such that:
      - only one vzeroupper instruction is called prior to returning
      - movapd/movupd are used in lieu of movaps/movups for double-real
        microkernels. (Note that single-real microkernels still use
        movaps/movups.)
    - Added kernel prototypes to kernels/zen/bli_kernels_zen.h, which is
      now included via frame/include/bli_arch_config.h.
    - Minor updates to bli_amaxv_ref.c (and to inlined "test" implementation
      in testsuite/src/test_amaxv.c).
    - Added early return for alpha == 0 in bli_dotxv_ref.c.
    - Integrated changes from f07b176, including a fix for undefined
      behavior when executing the 1m method under certain conditions.
    - Updated config_registry; no longer need haswell kernels for zen
      sub-configuration.
    - Tweaked marginal and pass thresholds for dotxf.
    - Reformatted level-1v, -1f, and -3 amd kernels and inserted additional
      comments.
    - Updated LICENSE file to explicitly mention that parts are copyright
      UT-Austin and AMD.
    - Added AMD copyright to header templates in build/templates.
    
    Summary of previous changes from 'amd' branch.
    - Added s and d assembly-based zen gemm microkernels (d6x8 and d8x6) and
      s and d assembly-based zen gemmtrsm_l microkernels (d6x8).
    - Added s and d intrinsics-based zen kernels for amaxv, axpyv, dotv, dotxv,
      and scalv, with extra-unrolling variants for axpyv and scalv.
    - Added a small matrix handler to bli_gemm_front(), with the handler
      implemented in kernels/zen/3/bli_gemm_small_matrix.c.
    - Added additional logic to sumsqv that first attempts to compute the
      sum of the squares via dotv(). If there is a floating-point exception
      (FE_OVERFLOW), then the previous (numerically conservative) code is
      used; otherwise, the result of dotv() is square-rooted and stored as
      the result. This new implementation is only enabled when FE_OVERFLOW
      is #defined. If the macro is not #defined, then the previous
      implementation is used.
    - Added axpyv and dotv standalone test drivers to test directory.
    - Added zen support to old cpuid_x86.c driver in build/auto-detect/old.
    - Added thread-local and __attribute__-related macros to bli_macro_defs.h.

commit 5d03b6e6e19d5a07f0cccf1a158f02fbd62dfd99
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Feb 19 11:31:30 2018 -0600

    Fix asm macro include line for KNL. Fixes #167.

commit f07b176c84dc9ca38fb0d68805c28b69287c938a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 15 18:36:54 2018 -0600

    Fixed an obscure bug in the 1m implementation.
    
    Details:
    - Fixed a bug in the way the bli_gemm1m_cntx_ref() function (defined in
      ref_kernels/bli_cntx_ref.c) initializes its context for 1m execution.
      Previously, the function probed the context that was in the process of
      being updated for use with 1m--this context being previously
      initialized/copied from a native context--for its storage preference
      to determine which "variant" (row- or column-oriented) of 1m would be
      needed. However, the _cntx_ref() function was not updating the method
      field of the context until AFTER this query, and the conditional which
      depended on it, had taken place, meaning the storage preference query
      function would mistakenly think the context was for native execution,
      since the context's method field would still be set to BLIS_NAT. This
      would lead it to incorrectly grab the storage preference of the complex
      domain microkernel rather than the corresponding real domain
      microkernel, which could cause the storage preference predicate to
      evaluate to the wrong value, which would lead to the _cntx_ref()
      function choosing the wrong variant. This could lead to undefined
      behavior at runtime. The method is now explicitly set within the
      context prior to calling the storage preference query function.
    - Updated comments in frame/ind/oapi/bli_l3_3m4m1m_oapi.c.
    - Fixed a typo in the commented-out CFLAGS in config/zen/make_defs.mk,
      which are appropriate for gcc 6.x and newer. (Mistakenly used
      -march=bdver4 instead of -march=znver1.)
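
    The ordering problem in the first item can be sketched as follows. All
    types and field names here are hypothetical simplifications; the only
    point is that the preference query depends on the method field, so the
    field must be set before the query is made.

```c
#include <stdbool.h>

typedef enum { METHOD_NAT_DEMO, METHOD_1M_DEMO } method_demo_t;
typedef enum { VARIANT_ROW_DEMO, VARIANT_COL_DEMO } variant_demo_t;

typedef struct {
    method_demo_t method;
    bool real_ukr_prefers_rows;    /* preference of real-domain microkernel */
    bool complex_ukr_prefers_rows; /* preference of complex-domain microkernel */
} cntx_demo_t;

/* Native execution consults the complex-domain microkernel; 1m execution
   consults the corresponding real-domain microkernel. */
static bool prefers_rows_demo(const cntx_demo_t *c)
{
    return c->method == METHOD_1M_DEMO ? c->real_ukr_prefers_rows
                                       : c->complex_ukr_prefers_rows;
}

/* Buggy order: the query runs while the method is still "native". */
variant_demo_t init_1m_buggy_demo(cntx_demo_t *c)
{
    variant_demo_t v = prefers_rows_demo(c) ? VARIANT_ROW_DEMO
                                            : VARIANT_COL_DEMO;
    c->method = METHOD_1M_DEMO;
    return v;
}

/* Fixed order: set the method first, then query the preference. */
variant_demo_t init_1m_fixed_demo(cntx_demo_t *c)
{
    c->method = METHOD_1M_DEMO;
    return prefers_rows_demo(c) ? VARIANT_ROW_DEMO : VARIANT_COL_DEMO;
}
```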

commit 1f94bb7b96eb2b67257e6c4df89e29c73e9ab386
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jan 19 12:46:53 2018 -0600

    Document how to enable zen-specific instructions.
    
    Details:
    - Added as a comment in config/zen/make_defs.mk the list of compiler flags
      that could be added to manually enable the instructions provided by the
      Zen microarchitecture that are not already implied by -march=bdver4.
      This information, along with the previous commit's flags to selectively
      disable Bulldozer instructions no longer present in Zen, was gathered
      from [1]. I hesitate to enable use of these instructions since I don't
      have any Zen hardware to test on yet.
      [1] https://wiki.gentoo.org/wiki/Ryzen

commit 1e4365b21bafa02bd108c5ac4705a25671fb9441
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jan 18 12:03:51 2018 -0600

    Augment zen CFLAGS to prevent illegal instruction.
    
    Details:
    - Added various compiler flags (-mno-fma4 -mno-tbm -mno-xop -mno-lwp) so
      that compiling with -march=bdver4 on zen-based architectures does not
      result in an illegal instruction error at runtime. Note: This fix is
      only needed for gcc 5.4; gcc 6.3 or later supports the use of
      -march=znver1, which can be used in lieu of the augmented set of flags
      based on bdver4. Thanks to Nisanth Padinharepatt for reporting this
      error.

commit fa74af4e1fa7385ac3f3089fe1ea7bb88c906029
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jan 9 13:43:15 2018 -0600

    Minor labeling update for './configure -c' output.
    
    Details:
    - Print the name of the configuration in the output of the
      kernel-to-config map (and chosen pairs list) as a subtle way to remind
      the user that these only apply to the targeted configuration (whereas
      the config list and kernel list are printed without regard to which
      configuration was actually targeted).

commit 5cdea756c7391e2c6cbfb38436ef9a205f860237
Merge: 9d8858b5 1e7a4896
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Jan 7 19:45:20 2018 -0600

    Merge branch 'rt'

commit 9d8858b5cff4a4b078b87872847a5710073fff0a
Merge: 0b3ca3cf f7df64da
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Sun Jan 7 10:03:25 2018 -0600

    Merge pull request #164 from devinamatthews/master
    
    Don't use memkind for skx configuration.

commit f7df64daf6bbe6431effada6e13d8d1fab5aa221
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Sun Jan 7 09:37:25 2018 -0600

    Don't use memkind for skx configuration. Fixes #163.

commit 1e7a4896e0cbe73c4685fa956278e3f28273cdf9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jan 5 12:33:48 2018 -0600

    Minor error handling in update-version-file.sh.
    
    Details:
    - Added explicit handling of situations when 'git describe --tags'
      returns an error. This command is used by update-version-file.sh
      when deciding whether or not to update the version file prior to
      configuration.
    - Removed bli_packm.c and bli_unpackm.c, as they contained no source
      code.

commit 0b3ca3cfb682715a3686fd93ebb10d4a695d1162
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jan 4 20:51:35 2018 -0600

    Intelligently select compiler for auto-detection.
    
    Details:
    - Rewrote code that selects the compiler for the purposes of compiling
      the auto-detection executable. CC (if specified) is tried first. Then
      gcc. Then clang. The absolute fallback is cc. The previous code was
      sort of broken, and seemed to unintentionally always use gcc.
    - Moved various configuration-agnostic flags from config/*/make_defs.mk
      files to common.mk. The new mechanism appends the configuration-
      agnostic flags to the various compiler flag variables initialized in
      make_defs.mk. Flags specific to the sub-configuration are still set
      in make_defs.mk.
    - Added -Wno-tautological-compare to CMISCFLAGS when clang is in use.
      Also added the flag to the compiler instantiation during configure-
      time hardware detection (when clang is selected).
    - Added some missing (but mostly-optional) quotes to configure script.

commit 5a7005dd44ed3174abbe360981e367fd41c99b4b
Merge: 7be88705 3bc99a96
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date:   Wed Jan 3 12:05:12 2018 +0530

    Merge changes in AMD beta release 0.95 into amd branch

commit 0b9c5127e91508c115228ca604ee2dac8de8f477
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Dec 23 15:53:44 2017 -0600

    Enabled C99, added stdint.h to auto-detect build.
    
    Details:
    - Added "-std=c99" to compiler arguments when building auto-detection
      driver in configure script.
    - Added #include <stdint.h> to all three source files needed by auto-
      detection program.

commit 0ce5e19c318e04909d3e664d69accb3a0fc6b988
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Dec 23 15:32:03 2017 -0600

    Reimplemented configure-time hardware detection.
    
    Details:
    - Reimplemented the hardware detection functionality invoked when running
      "./configure auto". Previously, a standalone script in build/auto-detect
      that used CPUID was used. However, the script attempted to enumerate all
      models for each microarchitecture supported. The new approach recycles
      the same code used for runtime hardware detection introduced in 2c51356.
      This has two immediate benefits. First, it reduces and consolidates the
      code required to detect microarchitectures via the CPUID instruction.
      Second, it provides an indirect way of testing at configure-time the
      code that is used to detect hardware at runtime. This code is (a) only
      activated when targeting a configuration family (such as intel64 or
      amd64) at configure-time and (b) somewhat difficult to test in
      practice, since it relies on having access to older microarchitectures.
    - The above change required placing conditional cpp macro blocks in
      bli_arch.c and bli_cpuid.c which either #include "blis.h" or #include
      a bare-bones set of headers that does not rely on the presence of a
      bli_config.h header. This is needed because bli_config.h has not been
      created yet when configure-time auto-detection takes place.
    - Defined a new function in bli_arch.c, bli_arch_string(), which takes
      an arch_t id and returns a pointer to a string that contains the
      lowercase name of the corresponding microarchitecture. This function
      is used by the auto-detection script to printf() the name of the
      sub-configuration corresponding to the detected hardware.
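
    A function in the spirit of bli_arch_string() could be sketched as a
    simple table lookup. The enum below is a hypothetical three-entry
    subset, not the real arch_t, and the "unknown" fallback is an
    assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical subset of architecture ids. */
typedef enum {
    ARCH_DEMO_HASWELL,
    ARCH_DEMO_ZEN,
    ARCH_DEMO_GENERIC,
    ARCH_DEMO_COUNT
} arch_demo_t;

/* Lowercase names, indexed by id. */
static const char *const arch_demo_names[ARCH_DEMO_COUNT] = {
    "haswell",
    "zen",
    "generic"
};

/* Returns the lowercase name for id, or "unknown" if id is out of range. */
const char *arch_demo_string(arch_demo_t id)
{
    if ((unsigned)id >= (unsigned)ARCH_DEMO_COUNT) return "unknown";
    return arch_demo_names[id];
}
```

    An auto-detection driver would then only need to printf() the returned
    string for the detected id.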

commit 9804adfd405056ec332bb8e13d68c7b52bd3a6c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 21 19:22:57 2017 -0600

    Added option to disable pack buffer memory pools.
    
    Details:
    - Added a new configure option, --[en|dis]able-packbuf-pools, which will
      enable or disable the use of internal memory pools for managing buffers
      used for packing. When disabled, the function specified by the cpp
      macro BLIS_MALLOC_POOL is called whenever a packing buffer is needed
      (and BLIS_FREE_POOL is called when the buffer is ready to be released,
      usually at the end of a loop). When enabled, which was the status quo
      prior to this commit, a memory pool data structure is created and
      managed to provide threads with packing buffers. The memory pool
      minimizes calls to bli_malloc_pool() (i.e., the wrapper that calls
      BLIS_MALLOC_POOL), but does so through a somewhat more complex
      mechanism that may incur additional overhead in some (but not all)
      situations. The new option defaults to --enable-packbuf-pools.
    - Removed the reinitialization of the memory pools from the level-3
      front-ends and replaced it with automatic reinitialization within the
      pool API's implementation. This required an extra argument to
      bli_pool_checkout_block() in the form of a requested size, but hides
      the complexity entirely from BLIS. And since bli_pool_checkout_block()
      is only ever called within a critical section, this change fixes a
      potential race condition in which threads using contexts with different
      cache blocksizes--most likely a heterogeneous environment--can check
      out pool blocks that are too small for the submatrices it wishes to
      pack. Thanks to Nisanth Padinharepatt for reporting this potential
      issue.
    - Removed several functions in light of the relocation of pool reinit,
      including bli_membrk_reinit_pools(), bli_memsys_reinit(),
      bli_pool_reinit_if(), and bli_check_requested_block_size_for_pool().
    - Updated the testsuite to print whether the memory pools are enabled or
      disabled.
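
    The automatic reinitialization inside the checkout call can be sketched
    as below. This is a simplified, hypothetical pool, not the BLIS pool
    API: it assumes the caller already holds the lock guarding the pool,
    and it only grows the pool when no blocks are currently checked out.

```c
#include <stdlib.h>

typedef struct {
    size_t block_size; /* size in bytes of every block in the pool */
    size_t num_blocks; /* number of blocks owned by the pool */
    size_t top;        /* index of the next available block */
    void **blocks;
} pool_demo_t;

void pool_demo_finalize(pool_demo_t *p)
{
    for (size_t i = 0; i < p->num_blocks; ++i) free(p->blocks[i]);
    free(p->blocks);
    p->blocks = NULL;
    p->num_blocks = p->top = p->block_size = 0;
}

/* Assumes num_blocks > 0. Returns 0 on success, -1 on allocation failure. */
int pool_demo_init(pool_demo_t *p, size_t num_blocks, size_t block_size)
{
    p->blocks = calloc(num_blocks, sizeof(void *));
    if (p->blocks == NULL) return -1;
    p->num_blocks = num_blocks;
    p->block_size = block_size;
    p->top = 0;
    for (size_t i = 0; i < num_blocks; ++i) {
        p->blocks[i] = malloc(block_size);
        if (p->blocks[i] == NULL) { pool_demo_finalize(p); return -1; }
    }
    return 0;
}

/* Checks out one block of at least req_size bytes. If the pool's blocks
   are too small, the pool is rebuilt with the larger size first. */
void *pool_demo_checkout(pool_demo_t *p, size_t req_size)
{
    if (req_size > p->block_size) {
        if (p->top != 0) return NULL; /* sketch: blocks still in use */
        size_t n = p->num_blocks;
        pool_demo_finalize(p);
        if (pool_demo_init(p, n, req_size) != 0) return NULL;
    }
    if (p->top == p->num_blocks) return NULL; /* pool exhausted */
    return p->blocks[p->top++];
}
```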

commit 107801aaae180c00022f1b990bc59038c14949d2
Merge: d9c05745 0084531d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 18 16:29:28 2017 -0600

    Merge branch 'master' into selfinit

commit 0084531d3eea730a319ecd7018428148c81bbba7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Dec 17 18:58:25 2017 -0600

    Updated flatten-headers.py for python3.
    
    Details:
    - Modified flatten-headers.py to work with python 3.x. This mostly
      amounted to removing print statements (which I replaced with calls
      to my_print(), a wrapper to sys.stdout.write()). Thanks to Stefan
      Husmann for pointing out the script's incompatibility with python 3.
    - Other minor changes/cleanups.

commit 90b11b79c302f208791bdfb1ed754873103c7ce5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Dec 17 17:34:32 2017 -0600

    Modest performance boost to flatten-headers.py.
    
    Details:
    - Updated flatten-headers.py to pre-compile the main regular expression
      used to isolate #include directives and the header filenames they
      reference. The compiled regex object is then used over and over on
      each header file in the tree of referenced headers. This appears to
      have provided a 1.7-2x performance increase in the best case.
    - Other minor tweaks, such as renaming the main recursive function from
      replace_pass() to flatten_header().
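
    The compile-once, match-many idea is not specific to python. As an
    illustration only, here is the same pattern in C using POSIX regex; the
    script itself is python, and the pattern and function names below are
    assumptions rather than what flatten-headers.py uses. The sketch is not
    thread-safe.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

static regex_t include_re_demo;
static int include_re_ready_demo = 0;

/* Compiles the pattern on first use only; later calls reuse the result. */
static int include_re_get_demo(void)
{
    if (!include_re_ready_demo) {
        const char *pat =
            "^[[:space:]]*#[[:space:]]*include[[:space:]]*\"([^\"]+)\"";
        if (regcomp(&include_re_demo, pat, REG_EXTENDED) != 0) return -1;
        include_re_ready_demo = 1;
    }
    return 0;
}

/* If line is an #include "file" directive, copies the filename into out
   (truncating if needed). Returns 1 on match, 0 on no match, -1 on error. */
int match_include_demo(const char *line, char *out, size_t out_size)
{
    regmatch_t m[2];
    if (out_size == 0 || include_re_get_demo() != 0) return -1;
    if (regexec(&include_re_demo, line, 2, m, 0) != 0) return 0;
    size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (len > out_size - 1) len = out_size - 1;
    memcpy(out, line + m[1].rm_so, len);
    out[len] = '\0';
    return 1;
}
```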

commit 99dee87f30b4d437fa6b5e4ba862526d07b9f08b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Dec 17 16:47:27 2017 -0600

    Reimplemented flatten-headers.sh in python.
    
    Details:
    - Added flatten-headers.py, a python implementation of the bash script
      flatten-headers.sh. The new script appears to be 25-100x faster,
      depending on the operating system, filesystem, etc. The python script
      abides by the same command line interface as its predecessor and
      targets python 2.7 or later. (Thanks to Devin Matthews for suggesting
      that I look into a python replacement for higher performance.)
    - Activated use of flatten-headers.py in common.mk via the FLATTEN_H
      variable.
    - Made minor tweaks to flatten-headers.sh such as spelling corrections
      in comments.

commit d9c0574599c3f97c0f9b6c334a077bab9452e1f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 14 17:13:42 2017 -0600

    Allow travis failures of OS X builds that run testsuite.
    
    Details:
    - Added an allowance for OS X builds that run the testsuite to fail.
      There seems to be an issue with 1m when running in Travis CI under
      OS X and clang, but only in double-precision. Haven't been able to
      reproduce the error on my own, and thus, I can't debug it. (Hopefully
      it is simply a version-specific compiler bug.)

commit 86cd23b7379b00a42b4ecc04fa668f1e3f9b54ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 14 15:47:41 2017 -0600

    Fixed testsuite Makefile brokenness from 9091a207.
    
    Details:
    - Fixed a makefile error encountered when building the testsuite directly
      in its directory (as opposed to indirectly via 'make test'). The fix
      involves introducing a new variable, BUILD_PATH, alongside the existing
      DIST_PATH variable. By default, BUILD_PATH is set to the current
      directory, and is overridden by other Makefiles used by, for example,
      the testsuite and standalone test drivers in testsuite or test,
      respectively.
    - Some files/directories in common.mk were redefined in terms of
      BUILD_DIR, such as the locations of config.mk file and the intermediate
      include directory.

commit 6a3a8924c04d25507fc4aa593df30c56c7dc12f7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 14 13:20:02 2017 -0600

    Temporarily show Makefile's testsuite output.
    
    Details:
    - Disabled redirection of testsuite output for 'test' target. This is
      part of an attempt to debug a segmentation fault on OS X via Travis.

commit 9a01080dd426915bed18229f70401bfa639dc283
Merge: 83316485 a32e8a47
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 14 11:27:19 2017 -0600

    Merge branch 'master' into selfinit

commit a32e8a47c022b6071302b2956af5728976c83ca9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 13 16:31:36 2017 -0600

    Added an exclusion to .travis.yml.
    
    Details:
    - Added exclusion for out-of-tree builds on OS X (clang).

commit b9f7d987df548965c86e16e0ba94d5cad0d9b399
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 13 16:22:09 2017 -0600

    Cleaned up after previous travis oot debugging.
    
    Details:
    - Removed debugging output from common.mk related to Travis CI
      out-of-tree builds.
    - Other minor cleanups to common.mk.

commit 9091a207aa8c49e279676ea02be533480b3b0d5a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 13 16:12:34 2017 -0600

    Attempted fix to travis oot build failure.
    
    Details:
    - Found the likely cause of the Travis CI out-of-tree build failures:
      config.mk was being read from DIST_PATH, rather than the current
      directory.

commit c01c71c33e236e6c91f5ddd3ec1e3faec89368c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 13 15:58:50 2017 -0600

    Added debugging output to Makefile.
    
    Details:
    - Added $(info ...) statements in key locations in an attempt to reveal
      why Travis CI doesn't like building BLIS out-of-tree.

commit 784289d69dd6b3692444d3b3e290f6a014465b72
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 13 15:31:27 2017 -0600

    Updated SHELL in common.mk from /bin/bash to bash.

commit d9bb1d1d4ebc89ea75d9d927d09882162a914f77
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 13 15:27:54 2017 -0600

    Defined SHELL in common.mk so "echo -n" works.
    
    Details:
    - Defined the SHELL variable in common.mk as "/bin/bash" so that the
      -n option can be used with echo in the Makefile rule for flattening
      blis.h. Thanks to Devin Matthews for suggesting this fix.

commit 9289a08667df2044f3a37af54d893efe2b56d555
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 13 15:14:27 2017 -0600

    Attempt 3 on .travis.yml.

commit 720bfcf0ef54fdc41df0dcaa94503edb0d5c8972
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 13 14:52:28 2017 -0600

    More fixes to .travis.yml.
    
    Details:
    - Fixed a mistake (hopefully) in d0c4dd0 that resulted in many more
      osx/clang sub-tests than intended.
    - Shortened the variable names in an effort to make them more readable
      via the Travis CI web interface.

commit 8717c9c97fe9b1ecd3b3192049a73976f8390ca7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 13 14:36:37 2017 -0600

    Added 'pwd' commands to .travis.yml for debugging.
    
    Details:
    - Added 'pwd' commands to the script portion of the .travis.yml file in
      an attempt to uncover the problem with the recent out-of-tree build
      testing changes made in d0c4dd0.

commit 83316485ce10f6fcafe92a1c146282de0dd8068a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 13 14:14:50 2017 -0600

    Simplified/fixed self-initialization.
    
    Details:
    - Fixed a race condition in self-initialization whereby the bli_is_init
      static variable could be erroneously read as TRUE by thread 1 while
      thread 0 is still executing bli_init_apis(), thus allowing thread 1 to
      use the library before it is actually ready. Thanks to Minh Quan Ho
      and Devin Matthews for pointing out this issue.
    - Part of the solution to the aforementioned race condition involved
      replacing the runtime initialization of the global scalar constants
      (e.g., BLIS_ONE, BLIS_ZERO, etc.) in bli_const.c with a static
      initialization of those same constants. This eliminates the need for
      bli_const_init() altogether. (The static initialization is made concise
      via preprocessor macros.)
    - Defined bli_gks_query_cntx_noinit(), which behaves just like
      bli_gks_query_cntx(), except that it does not call bli_init_once(). This
      function is called in lieu of bli_gks_query_cntx() in bli_ind_init() and
      bli_memsys_init() so as to not result in any recursion into
      bli_init_once().
    - Removed BLIS_ONE_HALF, BLIS_MINUS_ONE_HALF global scalar constants.
      They have no use in BLIS or its test products, and we have little reason
      to believe they are used by others.
    - Removed testsuite/out file, which was accidentally committed as part
      of 70640a3.
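
    Static initialization of scalar constants via a macro might be sketched
    like this. The struct layout, macro, and constant names are
    hypothetical; the point is that the values exist at load time, so no
    runtime init function (and no ordering between threads) is involved.

```c
/* Hypothetical constant holding the same value in two precisions. */
typedef struct {
    float  s;
    double d;
} const_demo_t;

/* Expands to a static initializer for both precisions. */
#define CONST_DEMO_INIT(val) { (float)(val), (double)(val) }

/* Initialized at compile time; nothing to call before first use. */
const const_demo_t DEMO_ZERO      = CONST_DEMO_INIT(0.0);
const const_demo_t DEMO_ONE       = CONST_DEMO_INIT(1.0);
const const_demo_t DEMO_MINUS_ONE = CONST_DEMO_INIT(-1.0);
```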

commit 6526d1d4ae6dbfa854ca8d1e5f224cd6ab3fa958
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 12 13:50:43 2017 -0600

    Added temp_dir argument to flatten-headers.sh.
    
    Details:
    - Added "temp_dir" argument to flatten-headers.sh so that the caller can
      specify where intermediate files should be created as the script runs.
    - Updated flatten-headers.sh to create intermediate files in temp_dir
      instead of alongside the corresponding source files. This should now
      (once again) allow out-of-tree builds where the BLIS distribution is
      read-only, or where the out-of-tree build is running concurrently with
      another out-of-tree build. (Thanks to Devin Matthews for pointing out
      the possibility of simultaneous out-of-tree builds.)

commit 94755017c967630daf2e31c1f63ed5e88ab0d6ab
Merge: d0c4dd00 5cf7b0c4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 12 12:50:41 2017 -0600

    Merge branch 'master' of github.com:flame/blis

commit d0c4dd000ff38acc249e8acf7e0655a523991695
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 12 12:47:53 2017 -0600

    Added out-of-tree build test to .travis.yml file.
    
    Details:
    - Modified .travis.yml file to include an out-of-tree build test (using
      the "auto" configure target). Thanks to Devin Matthews for this
      suggestion.

commit 5cf7b0c4e52922069183a87dc2aa177419644e04
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Dec 12 12:38:48 2017 -0600

    Ignore blis.h.interm [ci skip]

commit 8d8ff74d15b4a584929cec36034ba6d3c53f7d27
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 12 12:32:50 2017 -0600

    Further attempt to fix out-of-tree builds.
    
    Details:
    - Fix applied in 87978f6 was necessary but not sufficient to fix
      out-of-tree builds. It turns out that using a source tree that had
      already built the target erroneously gave the impression that
      out-of-tree builds were working again, when in fact they were still
      broken. The additional changes in this commit should complete the
      fix that was started in the aforementioned commit. Thanks to Devin
      Matthews and Shaden Smith for their help in isolating this issue.

commit 70640a37109290b57c344083c00624e13c496e30
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 11 17:18:43 2017 -0600

    Implemented library self-initialization.
    
    Details:
    - Defined two new functions in bli_init.c: bli_init_once() and
      bli_finalize_once(). Each is implemented with pthread_once(), which
      guarantees that, among the threads that pass in the same pthread_once_t
      data structure, exactly one thread will execute a user-defined function.
      (Thus, there is now a runtime dependency against libpthread even when
      multithreading is not enabled at configure-time.)
    - Added calls to bli_init_once() to top-level user APIs for all
      computational operations as well as many other functions in BLIS to
      all but guarantee that BLIS will self-initialize through the normal
      use of its functions.
    - Rewrote and simplified bli_init() and bli_finalize() and related
      functions.
    - Added -lpthread to LDFLAGS in common.mk.
    - Modified the bli_init_auto()/_finalize_auto() functions used by the
      BLAS compatibility layer to take and return no arguments. (The
      previous API that tracked whether BLIS was initialized, and then
      only finalized if it was initialized in the same function, was too
      cute by half and borderline useless because by default BLIS stays
      initialized when auto-initialized via the compatibility layer.)
    - Removed static variables that track initialization of the sub-APIs in
      bli_const.c, bli_error.c, bli_init.c, bli_memsys.c, bli_thread, and
      bli_ind.c. We don't need to track initialization at the sub-API level,
      especially now that BLIS can self-initialize.
    - Added a critical section around the changing of the error checking
      level in bli_error.c.
    - Deprecated bli_ind_oper_has_avail() as well as all functions
      bli_<opname>_ind_get_avail(), where <opname> is a level-3 operation
      name. These functions had no use cases within BLIS and likely none
      outside of BLIS.
    - Commented out calls to bli_init() and bli_finalize() in testsuite's
      main() function, and likewise for standalone test drivers in 'test'
      directory, so that self-initialization is exercised by default.
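
    A minimal sketch of self-initialization with pthread_once(). The
    function names are hypothetical stand-ins for a library's init routine
    and one of its user-facing APIs; the counter exists only so the
    behavior can be observed.

```c
#include <pthread.h>

static pthread_once_t init_once_ctl_demo = PTHREAD_ONCE_INIT;
static int init_calls_demo = 0;

/* Runs exactly once, no matter how many threads reach the call below.
   Other callers block in pthread_once() until it has returned. */
static void lib_demo_init_apis(void)
{
    ++init_calls_demo;
}

void lib_demo_init_once(void)
{
    pthread_once(&init_once_ctl_demo, lib_demo_init_apis);
}

/* A user-facing operation: initializes the library on first use. */
int lib_demo_scale(int x)
{
    lib_demo_init_once();
    return 2 * x;
}

int lib_demo_init_count(void)
{
    return init_calls_demo;
}
```

    Depending on the platform, this may need to be linked with -lpthread,
    which matches the LDFLAGS change noted above.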

commit 70a64432ee5a7adbee10fb7ff6d7b608c1940a7a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 11 13:14:20 2017 -0600

    Fixed off-by-one indexing in bli_cpuid.c.
    
    Details:
    - In bli_cpuid.c, fixed an off-by-one indexing statement in vpu_count()
      whereby a string-terminating NUL character, '\0', is written beyond
      the bounds of the model_num string.
    - Minor whitespace and formatting edits to bli_cpuid.c.
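
    The class of bug is easy to illustrate generically. The sketch below
    is not the code from vpu_count(); it is a hypothetical bounded copy
    showing that the terminator must land at an index no greater than
    size - 1.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most dst_size - 1 characters of src into dst and always
   terminates within bounds. Returns the number of characters copied. */
size_t copy_model_demo(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0) return 0;
    size_t len = strlen(src);
    if (len > dst_size - 1) len = dst_size - 1;
    memcpy(dst, src, len);
    dst[len] = '\0'; /* len <= dst_size - 1, so this never writes dst[dst_size] */
    return len;
}
```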

commit 87978f6261a080d261d01f9acf4e9cc18855c833
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 11 12:49:03 2017 -0600

    Fixed broken out-of-tree builds since 52f9e6f.
    
    Details:
    - Added missing $(DIST_PATH)/ prefix to relative path to flatten-headers.sh
      script in common.mk so that the script could be found during out-of-tree
      builds. Thanks to Devin Matthews for reporting this bug.

commit 513ef4d040f89a18dda5154e8c4cf1aaf7463999
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 11 12:35:59 2017 -0600

    Various typecasting fixes, mis-typed enums, etc.
    
    Details:
    - Fixed implicit typecasting of conj_t to trans_t in bli_[un]packm_cxk.c.
    - Properly typecast integer arguments to match format specifier in various
      calls to printf() in bli_l3_thrinfo.c, bli_cntx.c, bli_pool.c, and
      bli_util_oapi.c.
    - Fixed "unsigned less-than-comparison with zero" checks in bli_check.c,
      bli_cntx.h.
    - Fixed mis-typed enums in bli_cntx.c (e.g., l1mkr_t that should have been
      l1fkr_t or l1vkr_t).
    - Fixed instances of opid_t value BLIS_GEMM that should have been l3ukr_t
      value BLIS_GEMM_UKR in bli_cntx_ref.c.
    - NOTE: These issues were identified via compiler warnings when building
      BLIS with clang on a rather old installation of OS X:
        $ clang --version
        Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
        Target: x86_64-apple-darwin15.2.0
        Thread model: posix
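
    Two of the warning classes above can be sketched in a few lines. The
    dim_demo_t type and function names are hypothetical; the pattern is to
    cast to the exact type the format specifier expects, and to avoid
    comparing an unsigned value against zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical unsigned dimension type. */
typedef unsigned long dim_demo_t;

/* The cast documents (and guarantees) that the argument matches %lu even
   if dim_demo_t is later redefined as a different integer type. */
int format_dim_demo(char *buf, size_t n, dim_demo_t m)
{
    return snprintf(buf, n, "m = %lu", (unsigned long)m);
}

/* For an unsigned index, a check such as 'i < 0' is always false; only
   the upper bound needs testing. */
bool index_in_range_demo(dim_demo_t i, dim_demo_t n)
{
    return i < n;
}
```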

commit 3bc99a96a3648f51b9acdc8a8c7e1cf4eb815459
Merge: 3a441183 78199c53
Author: prangana <pradeep.rao@amd.com>
Date:   Mon Dec 11 12:53:03 2017 +0530

    Fix merge conflicts after rebase with release branch
    
    Change-Id: I581b26c6d515f717ff0dce91c7c0c92553aa2630

commit 3a44118398955d6f872e01f73ae5bb4a4f8500f7
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date:   Wed Nov 15 11:11:17 2017 +0530

    Added AMD copyright line to the changed files in last 3 commits
    
    Change-Id: I37d5dbbbe1b199e07529610a5e9cc9e49d067c66

commit 268a56c06e94d1c388766dbfe81d54efbe432809
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 1 11:51:41 2017 -0500

    Revert to default SIMD alignment for bulldozer.
    
    Details:
    - Removed the default-overriding #define of BLIS_SIMD_ALIGN_SIZE set in
      config/bulldozer/bli_kernel.h. Not sure where this value came from, but
      it would seem to allow for insufficient starting address alignment for
      any matrices created via bli_malloc_user(), such as via
      bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
      led us to this bug.
    - This commit is a manual patch of the same fix made to the 'rt' branch
      in 8f150f2.

commit 510a6863e28277f9446abfb77f1aea9f01d37e7a
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Oct 30 10:04:42 2017 -0500

    Fix CVECFLAGS for bulldozer config.

commit c669716790bdda5d2b11ea0a026cbc121b228842
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date:   Tue Oct 24 16:36:36 2017 +0530

    Adding __attribute__((constructor/destructor)) for CLANG case.
    
    CLANG supports __attribute__, but its documentation doesn't
    mention support for constructor/destructor. Compiling with
    clang and testing shows that it does support this.
    
    Change-Id: Ie115b20634c26bda475cc09c20960d687fb7050b

commit 24e64a9d0877d788357fc63d4b947e977f8697f7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 18 13:41:25 2017 -0500

    Removed a duplicate bli_avx512_macros.h header.
    
    Details:
    - Removed a duplicate header file that was causing problems during
      installation for the 'knl' configuration. Thanks to Victor Eijkhout
      for reporting this issue.

commit 9c0a3c4c0260cbfefb9f11532f46508b4fd19ec2
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date:   Mon Oct 16 22:06:57 2017 +0530

    Thread Safety: Move bli_init() before and bli_finalize() after main()
    
    BLIS provides APIs to initialize and finalize its global context.
    One application thread can finalize BLIS, while other threads
    in the application are still using BLIS.
    
    This issue can be solved by removing bli_finalize() from API.
    One way to do this is by getting bli_finalize() to execute by default
    after application exits from main().
    
    GCC supports this behaviour with the help of __attribute__((destructor))
    added to the function that needs to be executed after main exits.
    
    Similarly bli_init() can be made to run before application enters main()
    so that application need not call it.
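
    A minimal sketch of the mechanism described above, assuming a GCC- or
    Clang-compatible compiler. The names lib_auto_init(),
    lib_auto_finalize(), and lib_is_initialized() are hypothetical
    stand-ins, not BLIS symbols:

```c
/* Hypothetical library state; not a BLIS symbol. */
static int lib_initialized = 0;

/* Runs before main() is entered (GCC/Clang extension). */
__attribute__((constructor))
static void lib_auto_init(void)
{
    lib_initialized = 1;
}

/* Runs after main() returns or exit() is called. */
__attribute__((destructor))
static void lib_auto_finalize(void)
{
    lib_initialized = 0;
}

/* Lets application code observe the state set by the constructor. */
int lib_is_initialized(void)
{
    return lib_initialized;
}
```

    With this arrangement, code in main() can rely on the state already
    being set, and no application thread has to call a finalize routine.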
    
    Change-Id: I7ce6cfa28b384e92c0bdf772f3baea373fd9feac

commit 83f31253eb21c5ecd8a5907835e57720daae0b8b
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date:   Mon Oct 16 21:07:50 2017 +0530

    Thread safety: Make the global induced method status array local to thread
    
    BLIS retains a global status array for induced methods, and provides
    APIs to modify this state during runtime. So, one application thread
    can modify the state, before another starts the corresponding
    BLIS operation.
    
    This patch solves this issue by making the induced method status array
    local to threads.
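
    One way to sketch a per-thread status table, assuming C11
    _Thread_local storage. The table dimensions and the ind_set()/ind_get()
    names are hypothetical and do not reflect the actual BLIS layout:

```c
#include <stdbool.h>

/* Hypothetical table dimensions. */
enum { NUM_METHODS = 3, NUM_OPS = 4 };

/* Each thread gets its own zero-initialized copy of the table
   (C11 _Thread_local; GCC and Clang also accept __thread). */
static _Thread_local bool ind_enabled[NUM_METHODS][NUM_OPS];

/* Changes the status only for the calling thread. */
void ind_set(int method, int op, bool status)
{
    ind_enabled[method][op] = status;
}

/* Reads the calling thread's copy. */
bool ind_get(int method, int op)
{
    return ind_enabled[method][op];
}
```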
    
    Change-Id: Iff59b6f473771344054c010b4eda51b7aa4317fe

commit e923402e68029be379a4297de3ac6fb155ffd928
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Thu Sep 28 12:15:36 2017 +0530

    Inner loop parallelization is turned off by default; the JR and IR loop parameters are set to 1 by default
    
    Change-Id: I8c3c2ecbbd636259f6ffb92768ec04148205c3e5

commit a64c15de19327c7595376d699be676c7003e850e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 26 19:02:53 2017 -0500

    Fixed a pthread typo in previous commit.
    
    Details:
    - Misnamed 'pthread_mutex_t' type in bli_memsys.c as 'thread_mutex_t'.

commit 42dcd589c37e1a2473ab2e1539207da97aebc07f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 26 17:00:04 2017 -0500

    Fixed bugs in gemm/gemmtrsm ukr tests in testsuite.
    
    Details:
    - Fixed a bug in gemmtrsm test module that was due to improper partitioning
      into a k x k triangular matrix for the purposes of obtaining an mr x k
      micropanel of A with which to test.
    - Fixed a bug in gemm and gemmtrsm test modules that would only manifest for
      very large k (depending on the product of mr x kc on that architecture).
      The bug arose from the fact that the test module was triggering the
      allocation of blocks from the internal memory pools, which are limited in
      size. This allocation imposes an implicit assumption that the micro-
      panel being tested with will fit inside, and this assumption is violated
      for large values of k. Arbitrarily large k may now be tested for both
      operation tests.
    - Added OpenMP/pthread critical sections around the setting or getting of
      statuses from the induced method operation lookup table in bli_l3_ind.c.
    - Added the 'static' keyword to all pthread_mutex_t global variables in BLIS.
    - Thanks to Nisanth Padinharepatt of AMD for reporting the first and third
      issues.

commit 206beb68ff73b75f5c382413967aacbb8a0aac3a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Sep 9 14:10:15 2017 -0500

    Updated bibtex info for BLIS5 (3m4m) article.

commit 0c8c0363aeb1f4aa88f7ec2d02403dab05a6e014
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Mon Aug 28 16:44:42 2017 +0530

    Bug fix for the testsuite build failing
    
    Change-Id: I7cd8c9d187387c48b2564e45cbfb8df985e93d77

commit 63d1c84465b50f64787808dd3e8494e683c16821
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Wed Aug 23 13:01:14 2017 +0530

    Adding auto hardware detection for Zen
    
    Change-Id: I40ce6705dd66b35000c4ccddffad1c5b65998caf

commit 537fb2a895b09be94b11947696fd2da629be24dd
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Aug 15 10:02:25 2017 -0500

    Add vzeroupper to Intel AVX kernels.

commit 7628de3f76f78a44788807605a4601ddda445854
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 10 16:24:28 2017 -0500

    Removed trailing enum commas from bli_type_defs.h.
    
    Details:
    - Removed trailing commas from enums in bli_type_defs.h. Thanks to
      Erling Andersen for pointing out this inconsistency and suggesting
      the change.

commit a666fd4e267ffae3d4b21f38d569c61ff56adc9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Aug 5 13:04:31 2017 -0500

    Added edge handling to _determine_blocksize_b().
    
    Details:
    - Added explicit handling of situations where i == dim to
      bli_determine_blocksize_b_sub(). This isn't actually needed by any
      current use case within BLIS, but handling the situation is nonetheless
      prudent. Thanks to Minh Quan for reporting this issue and requesting
      the fix.
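
    A simplified sketch of backward blocksize selection with an explicit
    i == dim case. blocksize_b() is a hypothetical helper, not the BLIS
    routine, and returning 0 for the edge case is an assumption made for
    illustration:

```c
#include <stddef.h>

/* Hypothetical helper: size of the next block when a dimension of length
   dim is partitioned in steps of b, with the partial (edge) block taken
   first. i is the amount already processed; b must be nonzero. */
size_t blocksize_b(size_t i, size_t dim, size_t b)
{
    /* Explicit edge handling: nothing is left to partition. */
    if (i >= dim) return 0;

    /* The first block absorbs the remainder, if there is one. */
    size_t edge = dim % b;
    if (i == 0 && edge != 0) return edge;

    return b;
}
```

    For dim = 10 and b = 4, successive calls at i = 0, 2, 6 yield blocks of
    2, 4, 4, and a call at i = 10 yields 0.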

commit 0c8afa546d7f33760415519ba328d7c49eb7aa06
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 4 14:17:44 2017 -0500

    Fixed a minor bug in level-3 packm management.
    
    Details:
    - Fixed a bug in bli_l3_packm() that caused cntl_t-cached packed mem_t
      entries to be released and then re-acquired unnecessarily. (In essence,
      the "<" operands in the conditional that guards the
      release-and-reacquire code block simply needed to be swapped.) The bug
      should have only affected performance (rather than the computed result).
      Thanks to Minh Quan for identifying and reporting the bug.
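
    The kind of guard involved can be sketched as follows. cached_mem_t
    and ensure_capacity() are hypothetical, and the plain malloc()/free()
    calls stand in for the pool operations:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cached buffer descriptor. */
typedef struct {
    void*  buf;
    size_t size;
} cached_mem_t;

/* Returns 1 if the buffer was released and re-acquired, 0 if reused. */
int ensure_capacity(cached_mem_t* m, size_t size_needed)
{
    /* Re-acquire only when the cached block is too small. With the
       operands swapped (size_needed < m->size), a block that is already
       large enough would be released and re-acquired needlessly. */
    if (m->size < size_needed) {
        free(m->buf);
        m->buf  = malloc(size_needed);
        m->size = m->buf ? size_needed : 0;
        return 1;
    }
    return 0;
}
```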

commit 6cf68a185d83fa46d438fcef65258ace78e24b13
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Jul 31 15:19:51 2017 -0500

    Change lsame_ signature to match lapacke.

commit 6a9bd97295cc4fb1cbcd28f69824a43c073c9a76
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 29 20:17:05 2017 -0500

    Fixed pthreads compile bug with previous commit.
    
    Details:
    - Erroneously passed family parameter into l3int_t function despite
      that function not taking the parameter. Oops.

commit 95adc43d800431dc0a02ca83a51426dbef641ad6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 29 14:53:39 2017 -0500

    Moved 'family' field from cntx_t to cntl_t.
    
    Details:
    - Removed the family field inside the cntx_t struct and re-added it to the
      cntl_t struct. Updated all accessor functions/macros accordingly, as well
      as all consumers and intermediaries of the family parameter (such as
      bli_l3_thread_decorator(), bli_l3_direct(), and bli_l3_prune_*()). This
      change was motivated by the desire to keep the context limited, as much
      as possible, to information about the computing environment. (The family
      field, by contrast, is a descriptor about the operation being executed.)
    - Added additional functions to bli_blksz_*() API.
    - Added additional functions to bli_cntx_*() API.
    - Minor updates to bli_func.c, bli_mbool.c.
    - Removed 'obj' from bli_blksz_*() API names.
    - Removed 'obj' from bli_cntx_*() API names.
    - Removed 'obj' from bli_cntl_*(), bli_*_cntl_*() API names. Renamed routines
      that operate only on a single struct to contain the "_node" suffix to
      differentiate them from those routines that operate on the entire tree.
    - Added enums for packm and unpackm kernels to bli_type_defs.h.
    - Removed BLIS_1F and BLIS_VF from bszid_t definition in bli_type_defs.h.
      They weren't being used and probably never will be.

commit a98e4aa547f61ab09dd91d11478c2a2ef9882e11
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu Jul 20 14:50:13 2017 -0500

    Clang can't make up its mind what to support.

commit 32eb36c3e8c2add2528514272044de16faed0c8f
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu Jul 20 12:54:58 2017 -0500

    Add default #define for __has_extension.

commit 2a9aa134f7c29d3d4fdc160022ff257e61885a95
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu Jul 20 10:04:34 2017 -0500

    Add fallbacks to __sync_* or __c11_atomic_* builtins when __atomic_* is not supported. Fixes #143.
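
    A two-tier sketch of such a fallback. The detection test (checking
    for the __ATOMIC_SEQ_CST predefined macro) and the ctr_* macro names
    are assumptions, and the __c11_atomic_* tier is omitted for brevity:

```c
/* Prefer the __atomic_* builtins when the compiler advertises them;
   otherwise fall back to the older __sync_* builtins, which always
   imply a full barrier. */
#if defined(__ATOMIC_SEQ_CST)
  #define ctr_fetch_add(p, v) __atomic_fetch_add((p), (v), __ATOMIC_SEQ_CST)
  #define ctr_load(p)         __atomic_load_n((p), __ATOMIC_SEQ_CST)
#else
  #define ctr_fetch_add(p, v) __sync_fetch_and_add((p), (v))
  #define ctr_load(p)         __sync_fetch_and_add((p), 0)
#endif

/* Atomically increments *counter and returns the value it held before. */
int counter_bump(int* counter)
{
    return ctr_fetch_add(counter, 1);
}
```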

commit 6f07a034d575e1e9e30bb6417b8fcb77cf301297
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 19 15:40:48 2017 -0500

    Updated ar option list used by all configurations.
    
    Details:
    - Dropped 'u' from the list of modifiers passed into the library archiver
      ar. Previously, "cru" was used, while now we employ only "cr". This
      change was prompted by a warning observed on Ubuntu 16.04:
    
        ar: `u' modifier ignored since `D' is the default (see `U')
    
      This caused me to realize that the default mode causes timestamps to be
      zero, and thus the 'u' option, which causes only changed object files to
      be inserted, is not applicable.

commit 32bc03f9eed8795cfd2f2615d1c9f8673e039c57
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 19 13:51:53 2017 -0500

    Added --force-version=STRING option to configure.
    
    Details:
    - Added an option to configure that allows the user to force an arbitrary
      version string at configure-time. The help text also now describes the
      usage information.
    - Changed the way the version string is communicated to the Makefile.
      Previously, it was read into the VERSION variable from the 'version' file
      via $(shell cat ...). Now, the VERSION variable is instead set in
      config.mk (via a configure-substituted anchor from config.mk.in).

commit befaee6dd8b2a72de9e0461fe2ec1f36e9f88f3c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 18 17:56:00 2017 -0500

    Updated openmp/pthread barriers with GNU atomics.
    
    Details:
    - Updated the non-tree openmp and pthreads barriers defined in
      bli_thrcomm_openmp.c and bli_thrcomm_pthreads.c to instead call a common
      implementation in bli_thrcomm.c, bli_thrcomm_barrier_atomic(). This new
      implementation goes through the same motions as the previous codes, but
      protects its loads and increments with GNU atomic built-ins. These atomic
      statements take memory ordering parameters that allow us to specify just
      enough constraints for the barrier to work as intended on weakly-ordered
      hardware. The prior implementation was only guaranteed to work on systems
      with strongly-ordered memory. (Thanks to Devin Matthews for suggesting
      this change and his crash-course in atomics and memory ordering.)
    - Removed 'volatile' from structs' barrier field declarations in
      bli_thrcomm_*.h.
    - Updated bli_thrcomm_pthread.? files to use renamed struct barrier fields
      consistent with that of the _openmp.? files.
    - Updated other bli_thrcomm_* files to rename "communicator" variables to
      simply "comm".
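
    A sketch of a sense-reversing barrier built on the GNU atomic
    built-ins, in the spirit of the change described above. The barrier_t
    type, its field names, and the particular memory orderings chosen are
    illustrative assumptions, not the BLIS implementation:

```c
/* Hypothetical barrier type; field names are illustrative. */
typedef struct {
    int n_threads;  /* number of participating threads */
    int arrived;    /* threads that have reached the barrier */
    int sense;      /* flips each time the barrier opens */
} barrier_t;

void barrier_init(barrier_t* b, int n_threads)
{
    b->n_threads = n_threads;
    b->arrived   = 0;
    b->sense     = 0;
}

void barrier_wait(barrier_t* b)
{
    /* Remember the current phase before announcing arrival. */
    int my_sense  = __atomic_load_n(&b->sense, __ATOMIC_RELAXED);
    int n_arrived = __atomic_add_fetch(&b->arrived, 1, __ATOMIC_ACQ_REL);

    if (n_arrived == b->n_threads) {
        /* Last arrival: reset the counter, then open the barrier. The
           release ordering publishes the reset to the waiting threads. */
        __atomic_store_n(&b->arrived, 0, __ATOMIC_RELAXED);
        __atomic_fetch_xor(&b->sense, 1, __ATOMIC_RELEASE);
    } else {
        /* Spin until the last arrival flips the phase. */
        while (__atomic_load_n(&b->sense, __ATOMIC_ACQUIRE) == my_sense)
            ;
    }
}
```

    The acquire/release pairing on the sense field is what lets the
    barrier work without relying on strongly-ordered hardware.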

commit 8f739cc847fcff2ddeeb336f8b2b9d080eb16f6c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 17 19:03:22 2017 -0500

    Added API to set mt environment variables.
    
    Details:
    - Renamed bli_env_get_nway() -> bli_thread_get_env().
    - Added bli_thread_set_env() to allow setting environment variables
      pertaining to multithreading, such as BLIS_JC_NT or BLIS_NUM_THREADS.
    - Added the following convenience wrapper routines:
        bli_thread_get_jc_nt()
        bli_thread_get_ic_nt()
        bli_thread_get_jr_nt()
        bli_thread_get_ir_nt()
        bli_thread_get_num_threads()
        bli_thread_set_jc_nt()
        bli_thread_set_ic_nt()
        bli_thread_set_jr_nt()
        bli_thread_set_ir_nt()
        bli_thread_set_num_threads()
    - Added #include "errno.h" to bli_system.h.
    - This commit addresses issue #140.
    - Thanks to Chris Goodyer for inspiring these updates.
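
    A sketch of how integer-valued environment variables can be set and
    read back with error checking via errno, assuming a POSIX system.
    env_set_int() and env_get_int() are hypothetical helpers, not the
    BLIS routines listed above:

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Stores value as a decimal string in the named variable.
   Returns 0 on success, -1 on failure (as setenv() does). */
int env_set_int(const char* name, long value)
{
    char buf[32];
    snprintf(buf, sizeof buf, "%ld", value);
    return setenv(name, buf, 1);
}

/* Reads the named variable as an integer; returns fallback if the
   variable is unset or does not parse cleanly. */
long env_get_int(const char* name, long fallback)
{
    const char* s = getenv(name);
    if (s == NULL) return fallback;

    char* end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0') return fallback;

    return v;
}
```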

commit 10163833075fd42be5b5b503acc855f91a484cfd
Author: Marat Dukhan <marat@fb.com>
Date:   Thu Jul 13 21:39:24 2017 -0700

    Fix Emscripten builds

commit c09b30d115eade72f44f37bf90aa848c9c0e79af
Author: Minh Quan HO <mqho@kalray.eu>
Date:   Fri Jul 7 10:52:05 2017 +0200

    set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers
    
    The membrk's free_fp is called when releasing GEN_USE buffers, but this free_fp is
    not set in bli_membrk_init
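
    The failure mode can be sketched with a hypothetical broker_t struct
    that stores allocation callbacks. The struct and function names are
    illustrative stand-ins for the membrk code:

```c
#include <stdlib.h>

/* Hypothetical stand-in for a memory broker with allocator callbacks. */
typedef struct {
    void* (*malloc_fp)(size_t);
    void  (*free_fp)(void*);
} broker_t;

void broker_init(broker_t* b)
{
    b->malloc_fp = malloc;
    /* If this assignment is missing, broker_release() calls through an
       uninitialized function pointer. */
    b->free_fp = free;
}

void* broker_acquire(broker_t* b, size_t n_bytes)
{
    return b->malloc_fp(n_bytes);
}

void broker_release(broker_t* b, void* p)
{
    b->free_fp(p);
}
```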

commit 997628ed9793c72e9ef576dd8d715cfec27c4862
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Fri Jun 30 12:23:19 2017 +0530

    Reducing the framework overhead of GEMV routines
    
    Change-Id: I83607ad767bff74e305e915b54b0ea34ec3e5684

commit ee869066168239b710ad9938bb0e1ae454883f3a
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Tue Jul 4 12:57:32 2017 +0530

    Improved efficiency of dGEMM for large matrices by reducing TLB load misses and, especially, L3 cache misses. This is achieved by changing the packed block sizes of matrices A and B. The optimum values are now MC_D = 510 and KC_D = 1024.
    
    Change-Id: I2d8bdd5f62f2d1f8782ae2997f3d7a26587d1ca4

commit 7b933b90b1859c96de49a402d48de82909bc73e5
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Jun 6 20:23:17 2017 -0500

    Add new SSI acknowledgment

commit 3485abba4b426fbf42b146a9611a0841f6d236c6
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Wed May 24 11:48:16 2017 +0530

    Checked in the small matrix code to compute GEMM called with A transpose case
    
    Change-Id: I29f40046d43d7a4b037c1cb322503ee26495f462

commit de16beb83b29b4b9748f70db985b0fe04db85f7d
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri May 26 14:49:31 2017 -0400

    PACKDIM_MR=8 didn't work out, but messing with the prefetching helps 2%.

commit 25d0e618544b6eea7d3f13c7aec513ac0139801d
Author: Devin Matthews <dmatthews@gator3.ufhpc>
Date:   Fri May 26 14:47:36 2017 -0400

    Revert "Change PACKDIM_MR (double) for haswell to 8."
    
    This reverts commit 681eec913d7c2ebcff637cec5c1627ced9a92b99.

commit c5bdd84b35bc2a8ebf55b7763fb56c0c945be0cb
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri May 26 12:28:09 2017 -0500

    Change PACKDIM_MR (double) for haswell to 8.

commit 172789d562001293b973bbdd8015bd27d37292e8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 17 13:03:52 2017 -0500

    Restored deleted lines from makefile fragments.

commit 3ea9bd2c8e90dbd35655fa6a5b953dfea1f308fe
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed May 17 12:29:44 2017 -0500

    Change to /bin/sh.
    
    All scripts checked with Debian's checkbashisms. Also check for clang first in auto-detect.sh.

commit 49438409eedb98d3f0ebf00b8d1eee0ae45f4f8c
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed May 17 12:27:14 2017 -0500

    Remove shebangs from makefiles.

commit 497e2640474c016d576dce3530fa6a66891642a0
Author: J M Dieterich <dieterich@ogolem.org>
Date:   Tue May 16 23:11:22 2017 -0400

    Fix if/else structure. Thanks to TravisCI.

commit 835035c56a8de36ad25bb8d1375db170d489ef57
Author: J M Dieterich <dieterich@ogolem.org>
Date:   Tue May 16 22:23:27 2017 -0400

    Mark piledriver compilable w/ clang.

commit 6cdb533472ee61af297c1f948307abbf45828887
Author: J M Dieterich <dieterich@ogolem.org>
Date:   Tue May 16 22:12:12 2017 -0400

    Mark bulldozer compilable w/ clang.

commit a85697d62272da06d28cd1c947f6cf1098df6467
Author: J M Dieterich <dieterich@ogolem.org>
Date:   Tue May 16 22:06:59 2017 -0400

    Correct error message.

commit e0c64cad271058688a2b999caf8c2767dc3aef7e
Author: J M Dieterich <dieterich@ogolem.org>
Date:   Tue May 16 22:03:23 2017 -0400

    Indeed one can compile for carrizo also using clang.

commit 4aafe0505d3f0954d095ded5459a76976e5093b4
Author: J M Dieterich <dieterich@ogolem.org>
Date:   Tue May 16 21:50:49 2017 -0400

    A bunch of shebang fixes from unportable /bin/bash to portable /usr/bin/env bash

commit abaeaa68ea11e84be1810f564d6f38d506cbeb6a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 5 15:06:56 2017 -0500

    Fixed a bug in norm1v, norm1m.
    
    Details:
    - Fixed a bug that manifested as improperly-computed 1-norm for vectors
      and matrices. This is one of the few operations in BLIS that does not
      have its own test module within the testsuite, hence why it went
      undetected for so long. The bad 1-norms were being used to normalize
      matrices in the testsuite after initialization, which led to some
      matrices containing a combination of "large" and "small" values. This
      tended to push the residuals computed after each test away from zero.
      In some cases, they were off *just* enough for the testsuite to label
      it a "failure". Many thanks to Jeff Hammond for reporting this bug.
      (Wonky details: the bug was due to improperly-defined level-0 scalar
      macros for abval2, an operation that computes the absolute square,
      or complex magnitude/modulus. Certain complex domain instances of
      abval2 were being incorrectly defined in terms of real-only solutions,
      leading to bad results. This level-0 operation forms the basis of
      norm1v/norm1m. absq2 was also affected, but almost nothing uses
      this operation.)

commit cc3107ae1c2074f72b724aa748d2e5b4cb290ed5
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu May 4 10:35:22 2017 -0500

    Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS. Missing BLIS_NT_XX's are defaulted to 1. Fixes #123.
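
    The precedence rule can be sketched as below. thread_cfg_t and
    resolve_threads() are hypothetical, and a value <= 0 is assumed to
    mean "variable not set":

```c
/* Hypothetical resolved threading configuration. */
typedef struct {
    int jc, ic, jr, ir;  /* per-loop ways of parallelism */
    int total;           /* automatic thread count; 0 if per-loop values apply */
} thread_cfg_t;

/* Arguments <= 0 are treated as "not set". */
thread_cfg_t resolve_threads(int num_threads, int jc, int ic, int jr, int ir)
{
    thread_cfg_t cfg = { 1, 1, 1, 1, 0 };

    if (jc > 0 || ic > 0 || jr > 0 || ir > 0) {
        /* Any per-loop setting wins; loops left unset default to 1. */
        if (jc > 0) cfg.jc = jc;
        if (ic > 0) cfg.ic = ic;
        if (jr > 0) cfg.jr = jr;
        if (ir > 0) cfg.ir = ir;
    } else if (num_threads > 0) {
        /* Only the total thread count was given. */
        cfg.total = num_threads;
    }
    return cfg;
}
```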

commit c8ab91f70d399ee14edd30a3a5c46b24c5d2f910
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 3 15:04:51 2017 -0500

    Disable complex 3m/4m in testsuite by default.
    
    Details:
    - Disabled testsuite tests of all level-3 implementations based on 3m
      and 4m. This will improve testing runtime on Travis CI as well as for
      anyone manually running the testsuite using default test parameters.
      Thanks to Devin Matthews for suggesting this change.

commit 9700f0e5785007ddafb72a5ca83800dee61fd35c
Author: Jeff Hammond <jeff.science@gmail.com>
Date:   Tue May 2 19:25:21 2017 -0700

    allow KNL build without hbwmalloc.h (i.e. emulated)
    
    we want to be able to run BLIS KNL binaries on non-KNL machines via SDE.
    although it is possible to install hbwmalloc implementation on such
    systems, it is easier not to, since obviously the performance of SDE
    execution is not representative so there is no reason to emulate HBW
    allocation.

commit 17dcd5a33ff91967f67e7c0ba09b4f18754609a4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 2 16:48:43 2017 -0500

    Fixed stray parentheses in README citations.

commit 2910d44ff9e1d951d3249313f4ab39d18ea1b48d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 2 16:38:43 2017 -0500

    CHANGELOG update (0.2.2)

commit 5ca3863220e07972fcefc6682ddd3f6e54fe4a94
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 2 15:48:30 2017 -0500

    Fixed a trsm1m bug that affected right-side cases.
    
    Details:
    - Fixed a bug introduced in 1c732d3 that affected trsm1m_r. The result
      was nondeterministic behavior (usually segmentation faults) for certain
      problem sizes beyond the 1m instance of kc (e.g. 128 on haswell). The
      cause of the bug was my commenting out lines in bli_gemm1m_ukr_ref.c
      which explicitly directed the virtual gemm micro-kernel to use temporary
      space if the storage preference of the [real domain] gemm ukernel did
      not match the storage of the output matrix C. In the context of gemm,
      this handling is not needed because agreement between the storage pref
      and the matrix is guaranteed by a high-level optimization in BLIS.
      However, this optimization is not applied to trsm because the storage
      of C is not necessarily the same as the storage of the micro-panels of
      B--both of which are updated by the micro-kernel during a trsm
      operation. Thus, the guarantee of storage/preference agreement is not
      in place for trsm, which means we must handle that case within the
      virtual gemm micro-kernel.
    - Comment updates and a minor macro change to bli_trsm*_cntx_init() for
      3m1, 4m1a, and 1m.

commit 1af0b09f5c275ee7bac896cc6f36f42af721d9b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 2 12:09:39 2017 -0500

    README.md update.
    
    Details:
    - Updated bibtex entries for 4th BLIS paper, and added entries for 5th
      and 6th BLIS papers.

commit db4a0bb8ba7cd697d68be8e5632371ee3e59fd63
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 17 12:07:27 2017 -0500

    Whitespace reformatting to armv8a kernels file.
    
    Details:
    - Updated formatting of function signature/header in
      kernels/armv8a/3/bli_gemm_opt_4x4.c.

commit e3eb01f6b990e205b15edcbaffd3d54b3ddd1ca4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 21 15:33:39 2017 -0600

    Disabled experiment-related 1m code.
    
    Details:
    - Commented out code in frame/ind/oapi/bli_l3_3m4m1m_oapi.c that was
      specifically inserted to facilitate the benchmarking of 1m block-panel
      and panel-block algorithms.
    - Updates to test/3m4m/Makefile, runme.sh script, and test_gemm.c to
      reflect changes used/needed during benchmarking.

commit 4f61528d56eed6a139eeac9db0c44e56f2d2d136
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jan 25 16:25:46 2017 -0600

    Added 1m-specific APIs for bp, pb gemm algorithms.
    
    Details:
    - Defined bli_gemmbp_cntl_create(), bli_gemmpb_cntl_create(), with the
      body of bli_gemm_cntl_create() replaced with a call to the former.
    - Defined bli_cntl_free_w_thrinfo(), bli_cntl_free_wo_thrinfo(). Now,
      bli_cntl_free() can check if the thread parameter is NULL, and if so,
      call the latter, and otherwise call the former.
    - Defined bli_gemm1mbp_cntx_init(), bli_gemm1mpb_cntx_init(), both in
      terms of bli_gemm1mxx_cntx_init(), which behaves the same as
      bli_gemm1m_cntx_init() did before, except that an extra bool parameter
      (is_pb) is used to support both bp and pb algorithms (including to
      support the anti-preference field described below).
    - Added support for "anti-preference" in context. The anti_pref field,
      when true, will toggle the boolean return value of routines such as
      bli_cntx_l3_ukr_eff_prefers_storage_of(), which has the net effect of
      causing BLIS to transpose the operation to achieve disagreement (rather
      than agreement) between the storage of C and the micro-kernel output
      preference. This disagreement is needed for panel-block implementations,
      since they induce a transposition of the suboperation immediately before
      the macro-kernel is called, which changes the apparent storage of C. For
      now, anti-preference is used only with the pb algorithm for 1m (and not
      with any other non-1m implementation).
    - Defined new functions,
        bli_cntx_l3_ukr_eff_prefers_storage_of()
        bli_cntx_l3_ukr_eff_dislikes_storage_of()
        bli_cntx_l3_nat_ukr_eff_prefers_storage_of()
        bli_cntx_l3_nat_ukr_eff_dislikes_storage_of()
      which are identical to their non-"eff" (effectively) counterparts except
      that they take the anti-preference field of the context into account.
    - Explicitly initialize the anti-pref field to FALSE in
      bli_gks_cntx_set_l3_nat_ukr_prefs().
    - Added bli_gemm_ker_var1.c, which implements a panel-block macro-kernel
      in terms of the existing block-panel macro-kernel _ker_var2(). This
      technique requires inducing transposes on all operands and swapping
      the A and B.
    - Changed bli_obj_induce_trans() macro so that pack-related fields are
      also changed to reflect the induced transposition.
    - Added a temporary hack to bli_l3_3m4m1m_oapi.c that allows us to easily
      specify the 1m algorithm (block-panel or panel-block).
    - Renamed the following cntx_t-related macros:
        bli_cntx_get_pack_schema_a() -> bli_cntx_get_pack_schema_a_block()
        bli_cntx_get_pack_schema_b() -> bli_cntx_get_pack_schema_b_panel()
        bli_cntx_get_pack_schema_c() -> bli_cntx_get_pack_schema_c_panel()
      and updated all instantiations. Also updated the field names in the
      cntx_t struct.
    - Comment updates.
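
    The effect of the anti-preference field on a storage query can be
    sketched as a simple toggle. eff_prefers_storage() and its boolean
    parameters are hypothetical simplifications of the cntx_t queries
    named above:

```c
#include <stdbool.h>

/* Returns whether the micro-kernel "effectively" prefers the storage of C.
   ukr_prefers_rows: the micro-kernel's native output preference.
   c_is_row_stored:  how the output matrix C is stored.
   anti_pref:        when true, the plain answer is inverted. */
bool eff_prefers_storage(bool ukr_prefers_rows, bool c_is_row_stored,
                         bool anti_pref)
{
    bool prefers = (ukr_prefers_rows == c_is_row_stored);
    return anti_pref ? !prefers : prefers;
}
```

    With anti_pref set, a caller that transposes the operation whenever
    this query returns false ends up seeking disagreement rather than
    agreement between the storage of C and the micro-kernel preference.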

commit 1d728ccb2394e77365e7c42683db6579c5fba014
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 25 18:29:49 2016 -0600

    Implemented the 1m method.
    
    Details:
    - Implemented the 1m method for inducing complex domain matrix
      multiplication. 1m support has been added to all level-3 operations,
      including trsm, and is now the default induced method when native
      complex domain gemm microkernels are omitted from the configuration.
    - Updated _cntx_init() operations to take a datatype parameter. This was
      needed for the corresponding function for 1m (because 1m requires us
      to choose between column-oriented or row-oriented execution, which
      requires us to query the context for the storage preference of the
      gemm microkernel, which requires knowing the datatype) but I decided
      that it made sense for consistency to add the parameter to all other
      cntx initialization functions as well, even though those functions
      don't use the parameter.
    - Updated bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs() to take
      a second scalar for each blocksize entry. The semantic meaning of the
      two scalars now is that the first will scale the default blocksize
      while the second will scale the maximum blocksize. This allows scaling
      the two independently, and was needed to support 1m, which requires
      scaling for a register blocksize but not the register storage
      blocksize (ie: "packdim") analogue.
    - Deprecated bli_blksz_reduce_dt_to() and defined two new functions,
      bli_blksz_reduce_def_to() and bli_blksz_reduce_max_to(), for reducing
      default and maximum blocksizes to some desired blocksize multiple.
      These functions are needed in the updated definitions of
      bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs().
    - Added support for the 1e and 1r packing schemas to packm, including
      1e/1r packing kernels.
    - Added a minor optimization to bli_gemm_ker_var2() that allows, under
      certain circumstances (specifically, real domain beta and row- or
      column-stored matrix C), the real domain macrokernel and microkernel
      to be called directly, rather than using the virtual microkernel
      via the complex domain macrokernel, which carries a slight additional
      amount of overhead.
    - Added 1m support to the testsuite.
    - Added 1m support to Makefile and runme.sh in test/3m4m. Also simplified
      some code in test_gemm.c driver.

commit 0d1b90286e29aa8b768e280b5286d92c02ad87a1
Author: Jeff Hammond <jeff.science@gmail.com>
Date:   Tue Oct 25 21:15:26 2016 -0700

    never use libm with Intel compilers
    
    Intel compilers include a highly optimized math library (libimf) that
    should be used instead of GNU libm.
    
    yes, this change is for ALL targets, including those that are not
    supported by the Intel compiler.  there is no harm in doing this, and it
    is future-proof in the event that the Intel compilers support other
    architectures.

commit b150870397e7aee558e61d1bd72a0c0d1d99bee8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Dec 8 16:08:41 2017 -0600

    Removed most "old" directories.
    
    Details:
    - Removed the vast majority of directories named "old", which contained
      deprecated code that I wasn't quite ready to jettison from the source
      tree.

commit 270c65985df849297ba1951aa3b56c03948d7775
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Dec 8 15:21:18 2017 -0600

    Modified bli_getopt() for thread-safety.
    
    Details:
    - Changed the interface of bli_getopt() to take a new argument, a getopt_t
      struct, that stores the values of optarg, optind, opterr, and optopt,
      and updated the implementation accordingly. (Previously, these
      variables were assumed to be global.)
    - Added a function for initializing a getopt_t struct.
    - Changed test_libblis.c--currently the only consumer of bli_getopt()--to
      utilize the new getopt_t state object.
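
    A simplified sketch of a getopt-style parser whose state lives in a
    caller-supplied struct rather than in globals. my_getopt_t,
    my_getopt_init(), and my_getopt() are hypothetical; the sketch
    handles only single-letter options and does not support grouped
    flags such as "-ab":

```c
#include <stddef.h>
#include <string.h>

/* Per-caller parser state, replacing the traditional global variables. */
typedef struct {
    const char* optarg;  /* argument of the last option, or NULL */
    int         optind;  /* index of the next argv element to examine */
    int         opterr;  /* kept for parity with getopt; unused here */
    int         optopt;  /* offending option character on error */
} my_getopt_t;

void my_getopt_init(my_getopt_t* st)
{
    st->optarg = NULL;
    st->optind = 1;
    st->opterr = 0;
    st->optopt = 0;
}

/* Returns the option character, '?' on error, or -1 when done. */
int my_getopt(int argc, char* const argv[], const char* optstring,
              my_getopt_t* st)
{
    st->optarg = NULL;
    if (st->optind >= argc) return -1;

    const char* arg = argv[st->optind];
    if (arg[0] != '-' || arg[1] == '\0') return -1;  /* not an option */

    char        c    = arg[1];
    const char* spec = strchr(optstring, c);
    st->optind++;

    if (c == ':' || spec == NULL) { st->optopt = c; return '?'; }

    if (spec[1] == ':') {
        /* Option takes an argument: "-ovalue" or "-o value". */
        if (arg[2] != '\0')         st->optarg = &arg[2];
        else if (st->optind < argc) st->optarg = argv[st->optind++];
        else { st->optopt = c; return '?'; }
    }
    return c;
}
```

    Because every call reads and writes only the struct it is handed,
    two threads can parse separate argument vectors concurrently.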

commit ce4d8fabc2e39371f89c12192fb707be82ae021a
Merge: 39be59f2 e05a8dfa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 7 17:36:44 2017 -0600

    Merge branch 'master' of github.com:flame/blis

commit 39be59f2a8470f40475907d9dd52639b8a911a92
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 7 17:35:20 2017 -0600

    Replaced several macros with static function APIs.
    
    Details:
    - Reimplemented several sets of get/set-style preprocessor macros with
      static functions, including those in the following frame/base headers:
      auxinfo, cntl, mbool, mem, membrk, opid, and pool. A few headers in
      frame/thread were touched as well: mutex_*, thrcomm, and thrinfo.
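
    The general pattern, shown on a hypothetical demo_mem_t struct (the
    names are illustrative and the "inline" qualifier is an assumption):

```c
#include <stddef.h>

/* Hypothetical struct standing in for one of the framework objects. */
typedef struct {
    void*  buf;
    size_t size;
} demo_mem_t;

/* Macro style being replaced:
     #define demo_mem_size( m )         ( (m)->size )
     #define demo_mem_set_size( s, m )  { (m)->size = (s); }
   Static functions give the same accessors type-checked arguments and
   a real return type. */
static inline size_t demo_mem_size(const demo_mem_t* m)
{
    return m->size;
}

static inline void demo_mem_set_size(size_t size, demo_mem_t* m)
{
    m->size = size;
}
```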

commit e05a8dfa7cc7df41e966c1ad04e51c482b308b23
Merge: 79507337 4423e33d
Author: dnp <devangiparikh@gmail.com>
Date:   Wed Dec 6 16:45:24 2017 -0600

    Merge branch 'rt'

commit 4423e33dc593115cda92c5763d756d7ad1298aa9
Author: dnp <devangiparikh@gmail.com>
Date:   Wed Dec 6 16:35:03 2017 -0600

    Adding SKX kernels and configuration.

commit 79507337e140daec7639f6eb3ed9cfe6e123d342
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Dec 6 16:21:35 2017 -0600

    Various checks to ensure that arch_t id is in range.
    
    Details:
    - Expanded checking of the arch_t id in bli_gks.c--either passed in from
      the caller or as returned from bli_arch_query_id()--against the expected
      range of id values. Thanks to Devangi Parikh for suggesting these
      additional sanity checks.

commit fde7c1126c58373ecde83471890b257399144876
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 4 16:11:01 2017 -0600

    Added 'uninstall-old-headers' target to Makefile.
    
    Details:
    - Defined a new 'uninstall-old-headers' target that allows users of BLIS to
      uninstall no-longer-needed headers left over from previous installations.
    - Fixed the 'uninstall-old' target so that it will uninstall both .a and .so
      libraries.
    - Renamed 'uninstall-old' to 'uninstall-old-libs'.
    - Added 'uninstall-old' target (different from previous 'uninstall-old'
      target) that combines 'uninstall-old-libs' and 'uninstall-old-headers'.

commit d4ee770bde213a87aa6049245145318324dc6b51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 4 14:53:43 2017 -0600

    Create/install monolithic cblas.h.
    
    Details:
    - When CBLAS is enabled at configure-time, BLIS now creates a monolithic
      cblas.h using the same flatten-header.sh script that was recently
      introduced for creating monolithic blis.h header files. The top-level
      Makefile will also install this cblas.h file into the install prefix
      alongside blis.h when the 'install' target is invoked. The two header
      files are compatible with one another. Regardless of whether the user's
      source #includes cblas.h, both blis.h and cblas.h, or just blis.h,
      the user will get the CBLAS function prototypes and enums, as expected.

commit 52f9e6f1b6468785af8947317656445d4729fc8b
Merge: ab57b979 21360dd8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Dec 1 12:28:09 2017 -0600

    Merge branch 'rt'

commit 21360dd8e2c7287100645e109acaabcc6ba1140c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 29 14:11:34 2017 -0600

    Fixed cntx_t packm query when ker_id > _NUM_PACKM_KERS.
    
    Details:
    - Fixed a subtle bug in bli_cntx_get_[un]packm_ker_dt() in which the
      function fails to return NULL when passed a kernel id argument that is
      equal to or beyond BLIS_NUM_[UN]PACKM_KERS. Instead, the function was
      attempting to index into the cntx_t's packm kernel array, which resulted
      in undefined behavior. Thanks to Devangi Parikh for finding this bug.
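
    A minimal sketch of the guard described above. All names here
    (demo_cntx_t, demo_cntx_get_packm_ker, DEMO_NUM_PACKM_KERS) are
    hypothetical stand-ins, not the actual BLIS definitions; the point is
    only that an id equal to or beyond the slot count yields NULL before
    the array is ever indexed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real BLIS types or constants. */
enum { DEMO_NUM_PACKM_KERS = 4 };

typedef void (*demo_ker_fp)(void);

typedef struct
{
    demo_ker_fp packm_kers[DEMO_NUM_PACKM_KERS];
} demo_cntx_t;

/* A do-nothing kernel used only to populate slot 0 in examples. */
void demo_packm_ker0(void) {}

/* Return the kernel stored for ker_id, or NULL when ker_id is equal to or
   beyond the number of slots, so the array is never indexed out of range. */
demo_ker_fp demo_cntx_get_packm_ker(const demo_cntx_t* cntx, unsigned ker_id)
{
    if (ker_id >= (unsigned)DEMO_NUM_PACKM_KERS) return NULL;

    return cntx->packm_kers[ker_id];
}
```

    A caller would test the returned pointer against NULL before invoking
    it, since an in-range slot may also be unset.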

commit 244a6f4e66e8ff091e995f8090ce779c1928aa8b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 28 17:48:48 2017 -0600

    Fixed POSIX sed non-compliance in flatten-header.sh.
    
    Details:
    - Changed GNU usage of 'i' and 'a' sed commands used in flatten-header.sh
      to POSIX-compliant usage that will work on OS X's sed.

commit 45078621676833e53a2878af8f89479c4f93b8ab
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 28 15:16:22 2017 -0600

    Generate/compile with/install monolithic blis.h.
    
    Details:
    - Rewrote monolithify-header.sh (and renamed to flatten-header.sh) so that
      headers are inserted recursively. This improves performance by a factor
      of 3-4x.
    - Modified configure to create an 'include/<configname>' directory in which
      make can create a monolithic header.
    - Modified the top-level Makefile so that a monolithic header is generated
      unconditionally prior to compilation (stored in include/<configname>) and
      so that the single header is installed instead of the 450 or so header
      files that reside throughout the framework source tree.
    - Added "include/*/*.h" to .gitignore file.
    - Removed some pnacl/emscripten leftovers that I intended to include in
      a1caeba (mostly in testsuite/Makefile).
    - Trivial comment changes to frame/include/bli_f2c.h.

commit 1f30b1301bf6d6047ec29e57a5fde8eb1072a0ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Nov 25 16:54:26 2017 -0600

    Added missing framework support for x86_64 family.
    
    Details:
    - Added support for the x86_64 configuration family to bli_arch.c and
      bli_arch_config.h. Thanks to Johannes Dieterich for reporting this
      issue.
    - Bumped the default value for BLIS_SIMD_NUM_REGISTERS from 16 to 32 and
      the default value for BLIS_SIMD_SIZE from 32 to 64. This will support
      configuration families that include Skylake and newer processors without
      any support needed in the bli_family_*.h file. The semantics of these
      values have always been "maximum" and not exact values; comments in
      bli_kernel_macro_defs.h and the github wiki have been adjusted
      accordingly.

commit 9f39806c4ed484c9ed13edf96005838d977722a9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 21 16:03:56 2017 -0600

    Fixed a bug in e31f0b3/b131b9a.
    
    Details:
    - Erroneously placed the "don't overwrite existing blocksize" logic in
      bli_blksz_init*() rather than in bli_cntx_set_blkszs(). It belongs in
      the latter because that function copies blocksizes as-is from the
      blksz_t function argument to the appropriate field in the cntx_t. If
      the blksz_t was previously initialized selectively, based on the sign
      of the blocksize value passed into bli_blksz_init*(), that just leaves
      some fields possibly uninitialized (with garbage values), which
      definitely will not work.
    - The aforementioned logic has been moved to bli_cntx_set_blkszs() via
      a new function bli_blksz_copy_if_pos(), which selectively copies only
      the blocksizes that are greater than zero.
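
    The selective copy can be sketched roughly as follows. The struct
    layout and the names demo_blksz_t and demo_blksz_copy_if_pos are
    assumptions made for illustration and do not reproduce the real
    blksz_t or bli_blksz_copy_if_pos(); only the "copy a value only if it
    is greater than zero" rule comes from the description above.

```c
#include <assert.h>

/* Hypothetical: one blocksize per datatype (e.g. s, d, c, z). */
enum { DEMO_NUM_DATATYPES = 4 };

typedef struct
{
    long v[DEMO_NUM_DATATYPES];
} demo_blksz_t;

/* Copy only the strictly positive entries of src into dst. Entries of src
   that are zero or negative leave the corresponding dst entry untouched,
   which lets a caller pass 0 to mean "keep the existing default". */
void demo_blksz_copy_if_pos(const demo_blksz_t* src, demo_blksz_t* dst)
{
    for (int i = 0; i < DEMO_NUM_DATATYPES; ++i)
    {
        if (src->v[i] > 0) dst->v[i] = src->v[i];
    }
}
```

    Because the filter is applied at copy time, the source struct is
    always fully initialized, which avoids the garbage-value problem
    noted above.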

commit b131b9a025c15f548d4c2952a9ec85eee3d139b1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 21 14:30:26 2017 -0600

    Updated configs to omit setting some blocksizes.
    
    Details:
    - Employ the new semantics of bli_blksz_init*() in e31f0b3 in various
      sub-configurations' bli_cntx_init_*() functions by passing in 0 for
      register and cache blocksizes that correspond to gemm microkernel
      datatypes that were not registered, allowing the default values
      set by the bli_cntx_init_*_ref() function call to remain.

commit 499a4c002f895744ecaf81ef7f62d2d6d0d7d594
Merge: e31f0b3e 6c3ba502
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 21 14:25:08 2017 -0600

    Merge branch 'rt' of github.com:flame/blis into rt

commit e31f0b3e2dba19ca8a2946bc21beb136a42d0f57
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 21 14:21:25 2017 -0600

    Subtle update to bli_blksz_init*() API.
    
    Details:
    - Updated the semantics of bli_blksz_init() and bli_blksz_init_ed() so
      that non-positive blocksize values are ignored entirely. This provides
      an easy way to indicate that certain existing values should not be
      touched by the update. Thanks to Devangi Parikh for feedback that led
      to these changes.

commit 6c3ba502a11f87bc67555d26154cfd39d0af1bac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 21 13:50:53 2017 -0600

    Added 'x86_64' sub-config directory.
    
    Details:
    - Added missing x86_64 configuration directory, which was intended to be
      part of b7ca580.
    - Added -Wfatal-errors compiler warning flag to all configurations so that
      compilation stops after the first error.
    - Changed the vectorization flags for intel64 configuration to be compatible
      with 'penryn', the oldest sub-config included in that family.
    - Changed the vectorization flags for penryn to target the 'core2'
      microarchitecture and ssse3.

commit 25eee3cc49b0631812485d4d5ceef0c23ed1b6dd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 21 12:34:20 2017 -0600

    Added a dummy file to kernels/generic.
    
    Details:
    - Added a dummy file to kernels/generic, which was previously empty, so
      that git would begin tracking the otherwise-empty directory. This
      directory's existence is necessary for proper execution of configure
      for any configuration family that contains the 'generic'
      sub-configuration. Thanks to Johannes Dieterich for reporting the
      issue that led to this fix.

commit ef024ce4cafa217669eaabb31ff8ab6df93cca05
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 20 18:08:29 2017 -0600

    More tweaks to monolithify-header.sh
    
    Details:
    - Further fixes to the monolithify-header.sh script.
    - Removed unnecessary #include "blis.h" from frame/3/bli_l3_packm.h.

commit 5028e7dec269b62895511453272585da36e591b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 20 17:00:37 2017 -0600

    Second attempt to implement travis_wait.
    
    Details:
    - Corrected accidental misplacement of the travis_wait prefix (on the
      wrong line of the .travis.yml file) in commit 13e5d91.

commit 13e5d9107b3763cba46fb1bae87476852601b47c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 20 15:57:06 2017 -0600

    Added travis_wait prefix to testsuite via Travis.
    
    Details:
    - It appears that Travis CI has implemented a new policy that results in
      a test failing if it does not produce any output for more than 10
      minutes. (Two test instances are now failing in Travis despite the most
      recent commit not affecting the library or testsuite.) This issue can
      be worked around by executing the test run via travis_wait, which takes
      an optional time parameter. This commit attempts to use 'travis_wait 30'
      in the .travis.yml file to prevent the early failure at 10 minutes.

commit a1caeba0ea79c8fecb1abadca1f91c6367ab3afb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 20 13:31:20 2017 -0600

    Removed pnacl, emscripten support from Makefile.

commit 78199c539beaa50f37893add220261ce0dcb921a
Merge: b3d8ab2e ab57b979
Author: praveeng <praveen.g@amd.com>
Date:   Mon Nov 20 15:51:20 2017 +0530

    Merge master code till 01-Nov-2017 to amd-staging
    
    Change-Id: I40b53f876db84c8b947b3f2385c9b882245c6603

commit 9df6dda9ec51a0d40166169d2d8a2f84b42266e6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Nov 18 19:03:26 2017 -0600

    Improvements, bugfixes to monolithify-header.sh.

commit 21d26201f90b884eb8d5de279ed74bbd244ffcb5
Merge: 43baa3b3 b7ca5806
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Nov 18 14:16:53 2017 -0600

    Merge branch 'rt' of github.com:flame/blis into rt

commit 43baa3b327d5ae1e2ba619432687b4dd849b05e3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Nov 18 14:14:44 2017 -0600

    Removed unnecessary flags for generic config.
    
    Details:
    - Removed -D_POSIX_C_SOURCE=200112L and -m64 flags from make_defs.mk file
      of generic sub-configuration. These flags are generally not necessary,
      and particularly not desirable for the generic configuration since they
      unnecessarily restrict the environments in which the configuration can
      be built.

commit b7ca580618f9382b7982168fd035ed058f83e4c2
Author: iotamudelta <dieterich@ogolem.org>
Date:   Sat Nov 18 14:56:05 2017 -0500

    [WIP] Add x86 and x86_64 processor families. (#154)
    
    * Add x86 and x86_64 processor families.
    * Use generic config as fallback for more families.
    
    After discussion with fgvanzee: a) it's "generic" and b) use it for all the families as a fallback. The goal is that if a specific CPU is not yet supported by a family (say a new Intel microarchitecture on x86_64), it'll fall through and still work with the slower "generic" kernels.

commit 870597d1663aaba1b74d7654b1d4946280aa0d3f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 17 17:06:42 2017 -0600

    Added bash script for creating monolithic headers.
    
    Details:
    - Added a new script, monolithify-header.sh, to the 'build' directory.
      This script recursively replaces all #include directives in a selected
      file with the contents of the header files referenced by each directive.
      The idea is to "flatten" a tree of .h files into a single file, with
      the script acting as a C preprocessor that only processes #include
      directives.

commit c76f77f4cc1e71988251c5e63cf6ef137477bf9c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 17 15:10:52 2017 -0600

    Removed unnecessary #include "blis.h" from header.
    
    Details:
    - Removed an errant #include "blis.h" directive from bli_cntx_ind_stage.h.
      The general policy is that no header file in BLIS should include
      blis.h. This will be important in the near future when using a tool to
      recursively create a monolithic blis.h file from its constituent
      headers.

commit 2bb9bc6e9536fa239fbc19a7efaaf151116e15b4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 17 13:50:14 2017 -0600

    Miscellaneous tweaks to gks, rt functionality.
    
    Details:
    - Updated bli_cpuid_query_id() so that BLIS_ARCH_GENERIC is always returned
      if the hardware fails to test positive for any supported sub-configuration.
    - Defined bli_gks_init_ref_cntx(), which will call the context initialization
      function bli_cntx_init_configname() for the sub-configuration 'configname'
      associated with the arch_t id returned by bli_arch_query_id(). This makes
      initializing a reference context easy for experts who wish to construct
      those contexts.

commit b3d8ab2ea02c127ab241532abc214624f35bfaab
Merge: 189ffbb0 fe71c06e
Author: Santanu Thangaraj <Santanu.Thangaraj@amd.com>
Date:   Wed Nov 15 01:33:12 2017 -0500

    Merge "Added AMD copyright line to the changed files in last 3 commits" into amd-staging

commit fe71c06e42b072407c83112779055b0afb67173d
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date:   Wed Nov 15 11:11:17 2017 +0530

    Added AMD copyright line to the changed files in last 3 commits
    
    Change-Id: I37d5dbbbe1b199e07529610a5e9cc9e49d067c66

commit d5bf79e50bf97072bbe7117c86b7c45e6e707ea0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 13 14:24:29 2017 -0600

    Miscellaneous tweaks and fixes.
    
    Details:
    - Fixed incorrect calling sequence in bli_cntx_init_knl.c--an instance of
      bli_blksz_init_easy() that should have been bli_blksz_init().
    - Fixed a bug in code that is supposed to output the list of sub-directories
      in the 'config' directory when configure script is run with no arguments.
    - Expanded the output of "make showconfig" to include more info from config.mk.
    - Minor changes to build/auto-detect/cpuid_x86.c, mostly in preparation for
      someone to add excavator and zen support.
    - Added a link to the ConfigurationHowTo wiki to config_registry.
    - Other minor tweaks to configure.

commit 673e5184030532c4ebd9fdeecbaa6442bb3ad54f
Merge: 2c51356a 8f150f28
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 1 17:37:42 2017 -0500

    Merge branch 'rt' of github.com:flame/blis into rt

commit 2c51356a8b2699c99f9507c80d69c08a35d45fe3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 1 17:37:02 2017 -0500

    Implemented runtime hardware detection via cpuid.
    
    Details:
    - Added runtime support for selecting an appropriate arch_t value based
      on the results of the cpuid instruction (for x86_64). This allows
      deferral of choosing a context (kernels, blocksizes, etc.) until
      runtime, which allows BLIS to be built with support for multiple
      microarchitectures. Currently, only amd64 and intel64 configurations
      are registered in the config_registry; however, one could create
      custom configuration families to support arbitrary sets of x86_64
      microarchitectures.
    - Current Intel microarchitectures supported via cpuid are knl, haswell,
      sandybridge, and penryn.
    - Current AMD microarchitectures supported via cpuid are: zen, excavator,
      steamroller, piledriver, and bulldozer.

commit ab57b979046479bcda7f83165838a80117c2ad95
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 1 11:51:41 2017 -0500

    Revert to default SIMD alignment for bulldozer.
    
    Details:
    - Removed the default-overriding #define of BLIS_SIMD_ALIGN_SIZE set in
      config/bulldozer/bli_kernel.h. Not sure where this value came from, but
      it would seem to allow for insufficient starting address alignment for
      any matrices created via bli_malloc_user(), such as via
      bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
      led us to this bug.
    - This commit is a manual patch of the same fix made to the 'rt' branch
      in 8f150f2.

commit 8f150f28a678c4a0c1591400177ad7cca81fcaec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 1 11:41:45 2017 -0500

    Revert to default SIMD alignment for bulldozer.
    
    Details:
    - Removed the default-overriding #define of BLIS_SIMD_ALIGN_SIZE set in
      bli_family_bulldozer.h. Not sure where this value came from, but it
      would seem to allow for insufficient starting address alignment for
      any matrices created via bli_malloc_user(), such as via
      bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
      led us to this bug.

commit e3f10557caf114441fbfff990e3ce3576c177bdc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 30 13:37:54 2017 -0500

    Use perl for some substitution for OS X compatibility.
    
    Details:
    - Discovered that sed commands where the replacement string contains '\n'
      are problematic with the version of sed present in OS X. For these
      cases in the configure script, we instead use 'perl -pe' for
      search-and-replace functionality.
    - Various other minor comment/whitespace tweaks to configure.
    - Removed remaining lines of code related to setting/checking variables to
      track "unregistered" configurations.

commit dd45cfdfc3d8f9acf4cf7f69138d9b83dafc8842
Merge: 3e4f42a4 f60c827b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 30 12:23:05 2017 -0500

    Merge branch 'master' into rt

commit f60c827ba95f452c8454fb914f5564f4895bf644
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Oct 30 10:04:42 2017 -0500

    Fix CVECFLAGS for bulldozer config.

commit 3e4f42a4d2ebb37b95988933d92e561c5b2cc201
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 27 11:41:37 2017 -0500

    Typecast l1mkr_t enum value prior to comparison.
    
    Details:
    - Typecast l1mkr_t enum value in bli_cntx.h to guint_t before testing for
      out-of-range value. This is an attempt to pacify a strange warning from
      clang on OS X that is seemingly the result of the following compiler
      warning flag:
        -Wtautological-constant-out-of-range-compare
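
    A rough sketch of the cast-then-compare pattern described above. The
    enum and function names are hypothetical, and 'unsigned long' stands
    in for guint_t; whether this form silences the warning on a given
    clang version is not verified here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical kernel-id enum; not the real l1mkr_t. */
typedef enum { DEMO_KER_A = 0, DEMO_KER_B, DEMO_KER_C } demo_ker_t;

enum { DEMO_NUM_KERS = 3 };

/* Convert the enum value to an unsigned integer type before the range
   test. The comparison is then made between plain integers rather than
   against the enum type, and a negative id converts to a large unsigned
   value, so one comparison rejects both kinds of bad id. */
bool demo_ker_id_in_range(demo_ker_t id)
{
    return (unsigned long)id < (unsigned long)DEMO_NUM_KERS;
}
```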

commit aec6e038d942d35b81bbd723a640cce2c054fb8e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 26 16:12:36 2017 -0500

    Removed associative arrays from configure.
    
    Details:
    - Implemented a replacement for associative arrays in the configure script
      that does not utilize arrays, and therefore works in pre-4.0 versions of
      bash. (It appears that Mac OS X will be stuck with version 3.2 indefinitely
      due to bash switching to the GPL 3.0 license starting with version 4.0.)

commit 189ffbb0d37262b21acddc0d35b4a22f2cbbca94
Merge: 06e0e635 3eb44f67
Author: Santanu Thangaraj <Santanu.Thangaraj@amd.com>
Date:   Wed Oct 25 02:00:30 2017 -0400

    Merge changes Ie115b206,I7ce6cfa2,Iff59b6f4 into amd-staging
    
    * changes:
      Adding __attribute__((constructor/destructor)) for CLANG case.
      Thread Safety: Move bli_init() before and bli_finalize() after main()
      Thread safety: Make the global induced method status array local to thread

commit 3eb44f67618b91ae5f5f0aaaba67e38f16042ee4
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date:   Tue Oct 24 16:36:36 2017 +0530

    Adding __attribute__((constructor/destructor)) for CLANG case.
    
    CLANG supports __attribute__, but its documentation doesn't
    mention support for constructor/destructor. Compiling with
    clang and testing shows that it does support this.
    
    Change-Id: Ie115b20634c26bda475cc09c20960d687fb7050b

commit 07c352188bf5265af242255f8e6fcb97050d973d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 23 16:59:22 2017 -0500

    Added "generic" configuration.
    
    Details:
    - Added a "generic" configuration that leaves the default blocksizes and
      kernels unchanged. This replaces the older "reference" configuration.
      Updated auto-detect script and code accordingly.
    - Added support for generic configuration to arch_t (bli_type_defs.h),
      bli_gks_init() (bli_gks.c), and bli_arch_config.h
    - Moved bli_arch_query_id() to bli_arch.c (and prototype to bli_arch.h).
    - Whitespace changes to configurations' make_defs.mk files.

commit c1a98d6f70608b02a1e6bcad6ba020a60773dace
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 23 14:24:41 2017 -0500

    Minor update to .travis.yml file.

commit 75b9383f01caa8b83f8be0117e15085b0d807ba6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 20 16:41:22 2017 -0500

    Minor header renaming ahead of bli_arch.c.
    
    Details:
    - Renamed the various configurations' "bli_arch_<configname>.h" header files
      (replacing "arch" with "family") to free up the 'bli_arch' namespace for a
      different purpose (hardware detection).
    - Renamed "bli_arch.h" and "bli_arch_pre_macro_defs.h" in frame/include to
      "bli_arch_config.h" and "bli_arch_config_pre.h", respectively.

commit 482af51add26d5ed103c3e3f167657f273b32c7a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 20 15:44:26 2017 -0500

    Fixed 'make test' target from top-level Makefile.
    
    Details:
    - Updated the top-level Makefile's build rule for testsuite object files to
      properly obtain CFLAGS via get-frame-cflags-for() function instead of
      simply using the $(CFLAGS) variable (which is empty). This means that
      'make test' should now work as expected.

commit 3c269f700d207efe6c04193f09d519c88c1d4045
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 20 13:57:21 2017 -0500

    Makefile updates for test drivers, testsuite.
    
    Details:
    - Fixed semi-broken testsuite Makefile and very-broken test driver Makefiles,
      as well as those for test/3m4m, test/thread_ranges, and test/exec_sizes
      sub-directories.
    - Factored out much of the top-level Makefile into common.mk. A Makefile
      needs only set DIST_PATH to the relative path to the top level of the
      BLIS source distribution before including common.mk in order to acquire
      all of the definitions typically needed in a Makefile that tests BLIS.

commit 0557189d463446b4c32077cdcf0467fa71ca68dc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 18 15:05:27 2017 -0500

    Minor updates to .travis.yml, configure script.

commit 2553734d1d62043793f4e783a027349ef6d4d563
Merge: 453deb29 37534279
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 18 13:46:50 2017 -0500

    Merge branch 'master' into rt

commit 375342799cbae981c28d831793af588d7951f3f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 18 13:41:25 2017 -0500

    Removed a duplicate bli_avx512_macros.h header.
    
    Details:
    - Removed a duplicate header file that was causing problems during
      installation for the 'knl' configuration. Thanks to Victor Eijkhout
      for reporting this issue.

commit 453deb29068889698e274f269c9aa90eea99b527
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 18 13:29:32 2017 -0500

    Implemented runtime kernel management.
    
    Details:
    - Reworked the build system around a configuration registry file, named
      'config_registry', that identifies valid configuration targets, their
      constituent sub-configurations, and the kernel sets that are needed by
      those sub-configurations. The build system now facilitates the building
      of a single library that can contain kernels and cache/register
      blocksizes for multiple configurations (microarchitectures). Reference
      kernels are also built on a per-configuration basis.
    - Updated the Makefile to use new variables set by configure via the
      config.mk.in template, such as CONFIG_LIST, KERNEL_LIST, and KCONFIG_MAP,
      in determining which sub-configurations (CONFIG_LIST) and kernel sets
      (KERNEL_LIST) are included in the library, and which make_defs.mk files'
      CFLAGS (KCONFIG_MAP) are used when compiling kernels.
    - Reorganized 'kernels' directory into a "flat" structure. Renamed kernel
      functions into a standard format that includes the kernel set name
      (e.g. 'haswell'). Created a "bli_kernels_<kernelset>.h" file in each
      kernels sub-directory. These files exist to provide prototypes for the
      kernels present in those directories.
    - Reorganized reference kernels into a top-level 'ref_kernels' directory.
      This directory includes a new source file, bli_cntx_ref.c (compiled on
      a per-configuration basis), that defines the code needed to initialize
      a reference context and a context for induced methods for the
      microarchitecture in question.
    - Rewrote make_defs.mk files in each configuration so that the compiler
      variables (e.g. CFLAGS) are "stored" (renamed) on a per-configuration
      basis.
    - Modified bli_config.h.in template so that bli_config.h is generated with
      #defines for the config (family) name, the sub-configurations that are
      associated with the family, and the kernel sets needed by those
      sub-configurations.
    - Deprecated all kernel-related information in bli_kernel.h and transferred
      what remains to new header files named "bli_arch_<configname>.h", which
      are conditionally #included from a new header bli_arch.h. These files
      are still needed to set library-wide parameters such as custom
      malloc()/free() functions or SIMD alignment values.
    - Added bli_cntx_init_<configname>.c files to each configuration directory.
      The files contain a function, named the same as the file, that initializes
      a "native" context for a particular configuration (microarchitecture). The
      idea is that optimized kernels, if available, will be initialized into
      these contexts. Other fields will retain pointers to reference functions,
      which will be compiled on a per-configuration basis. These bli_cntx_init_*()
      functions will be called during the initialization of the global kernel
      structure. They are thought of as initializing for "native" execution, but
      they also form the basis for contexts that use induced methods. These
      functions are prototyped, along with their _ref() and _ind() brethren, by
      prototype-generating macros in bli_arch.h.
    - Added a new typedef enum in bli_type_defs.h to define an arch_t, which
      identifies the various sub-configurations.
    - Redesigned the global kernel structure (gks) around a 2D array of cntx_t
      structures (pointers to cntx_t, actually). The first dimension is indexed
      over arch_t and the inner dimension is the ind_t (induced method) for
      each microarchitecture. When a microarchitecture (configuration) is
      "registered" at init-time, the inner array for that configuration in the
      2D array is initialized (and allocated, if it hasn't been already). The
      cntx_t slot for BLIS_NAT is initialized immediately and those for other
      induced method types are initialized and cached on-demand, as needed. At
      cntx_t registration, we also store function pointers to cntx_init functions
      that will initialize (a) "reference" contexts and (b) contexts for use with
      induced methods. We don't cache the full contexts for reference contexts
      since they are rarely needed. The functions that initialize these two kinds
      of contexts are generated automatically for each targeted sub-configuration
      from cpp-templatized code at compile-time. Induced method contexts that
      need "stage" adjustments can still obtain them via functions in
      bli_cntx_ind_stage.c.
    - Added new functions and functionality to bli_cntx.c, such as for setting
      the level-1f, level-1v, and packm kernels, and for converting a native
      context into one for executing an induced method.
    - Moved the checking of register/cache blocksize consistency from being cpp
      macros in bli_kernel_macro_defs.h to being runtime checks defined in
      bli_check.c and called from bli_gks_register_cntx() at the time that the
      global kernel structure's internal context is initialized for a given
      microarchitecture/configuration.
    - Deprecated all of the old per-operation bli_*_cntx.c files and removed
      the previous operation-level cntx_t_init()/_finalize() invocations.
      Instead, we now query the gks for a suitable context, usually via
      bli_gks_query_cntx().
    - Deprecated support for the 3m2 and 3m3 induced methods. (They required
      hackery that I was no longer willing to support.)
    - Consolidated the 1e and 1r packm kernels for any given register blocksize
      into a single kernel that will branch on the schema and support packing
      to both formats.
    - Added the cntx_t* argument to all packm kernel signatures.
    - Deprecated the local function pointer array in all bli_packm_cxk*.c files
      and instead obtain the packm kernel from the cntx_t.
    - Added bli_calloc_intl(), which serves as the calloc-equivalent to
      bli_malloc_intl(). Useful when we wish to allocate and initialize to
      zero/NULL.
    - Converted existing cpp macro functions defined in bli_blksz.h, bli_func.h,
      bli_cntx.h into static functions.
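
    The 2D layout of the global kernel structure described above can be
    sketched roughly as follows. Every name here (demo_gks, demo_arch_t,
    demo_ind_t, demo_gks_register, demo_gks_query) is a hypothetical
    stand-in, the context is reduced to two fields, and the sketch is not
    thread-safe; it only illustrates an outer array indexed by
    architecture whose inner array is allocated at registration, with the
    native slot filled immediately and the other slots filled on demand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical ids; the real arch_t and ind_t have other members. */
typedef enum { DEMO_ARCH_HASWELL = 0, DEMO_ARCH_GENERIC, DEMO_NUM_ARCHS } demo_arch_t;
typedef enum { DEMO_NAT = 0, DEMO_IND_1M, DEMO_NUM_INDS } demo_ind_t;

/* A toy context that records only which slot it belongs to. */
typedef struct { demo_arch_t arch; demo_ind_t ind; } demo_cntx_t;

/* demo_gks[arch] is NULL until that architecture is registered;
   demo_gks[arch][ind] is NULL until that context is first needed. */
static demo_cntx_t** demo_gks[DEMO_NUM_ARCHS];

/* Register an architecture: allocate its inner array and initialize the
   native context right away. Returns 0 on success, -1 on failure. */
int demo_gks_register(demo_arch_t id)
{
    if ((unsigned)id >= (unsigned)DEMO_NUM_ARCHS) return -1;
    if (demo_gks[id] != NULL) return 0; /* already registered */

    demo_cntx_t** row = calloc(DEMO_NUM_INDS, sizeof(*row));
    demo_cntx_t*  nat = malloc(sizeof(*nat));
    if (row == NULL || nat == NULL) { free(row); free(nat); return -1; }

    nat->arch = id;
    nat->ind  = DEMO_NAT;
    row[DEMO_NAT] = nat;
    demo_gks[id]  = row;
    return 0;
}

/* Query a context. Induced-method contexts are derived from the native
   one on first use and cached for later queries. */
demo_cntx_t* demo_gks_query(demo_arch_t id, demo_ind_t ind)
{
    if ((unsigned)id  >= (unsigned)DEMO_NUM_ARCHS) return NULL;
    if ((unsigned)ind >= (unsigned)DEMO_NUM_INDS)  return NULL;

    demo_cntx_t** row = demo_gks[id];
    if (row == NULL) return NULL; /* architecture was never registered */

    if (row[ind] == NULL)
    {
        demo_cntx_t* c = malloc(sizeof(*c));
        if (c == NULL) return NULL;
        *c = *row[DEMO_NAT]; /* start from the native context */
        c->ind = ind;
        row[ind] = c;
    }
    return row[ind];
}
```

    Querying the same (arch, ind) pair twice returns the same cached
    pointer, while querying an unregistered architecture returns NULL.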

commit 4607aac297e55ad540cbe5fffbe02e6b1889c181
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date:   Mon Oct 16 22:06:57 2017 +0530

    Thread Safety: Move bli_init() before and bli_finalize() after main()
    
    BLIS provides APIs to initialize and finalize its global context.
    One application thread can finalize BLIS, while other threads
    in the application are still using BLIS.
    
    This issue can be solved by removing bli_finalize() from API.
    One way to do this is by getting bli_finalize() to execute by default
    after application exits from main().
    
    GCC supports this behaviour with the help of __attribute__((destructor))
    added to the function that needs to be executed after main exits.
    
    Similarly bli_init() can be made to run before application enters main()
    so that application need not call it.
    
    Change-Id: I7ce6cfa28b384e92c0bdf772f3baea373fd9feac
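
    A minimal sketch of the mechanism, assuming a GCC-compatible compiler.
    The demo_lib_* names are hypothetical and stand in for bli_init() and
    bli_finalize(); the bodies only toggle a flag so that the ordering
    relative to main() can be observed.

```c
#include <assert.h>

/* Set while the hypothetical library is initialized. */
static int demo_lib_ready = 0;

/* Runs before main() is entered, so the application need not call it. */
__attribute__((constructor))
static void demo_lib_init(void)
{
    demo_lib_ready = 1;
}

/* Runs after main() returns (or exit() is called), so no application
   thread can finalize the library while others are still using it. */
__attribute__((destructor))
static void demo_lib_finalize(void)
{
    demo_lib_ready = 0;
}

/* Lets callers check the state without touching the flag directly. */
int demo_lib_is_ready(void)
{
    return demo_lib_ready;
}
```

    Inside main(), demo_lib_is_ready() would return 1 without any explicit
    initialization call.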

commit 0f5ce26fc597cda6e8ae93a7526f52eb8cba01e9
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date:   Mon Oct 16 21:07:50 2017 +0530

    Thread safety: Make the global induced method status array local to thread
    
    BLIS retains a global status array for induced methods, and provides
    APIs to modify this state during runtime. So, one application thread
    can modify the state, before another starts the corresponding
    BLIS operation.
    
    This patch solves this issue by making the induced method status array
    local to threads.
    
    Change-Id: Iff59b6f473771344054c010b4eda51b7aa4317fe

commit b882648af87deb1b365fc6b3e94151e69c5ccfa4
Merge: 8b379069 e02d3cb8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 11 16:32:21 2017 -0500

    Merge branch 'master' into rt

commit 06e0e6351acb9481225975ad9a4e0b8925336621
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Thu Sep 28 12:15:36 2017 +0530

    The inner loop parallelization is turned off by default, the JR and IR loop parameters are set to 1 by default
    
    Change-Id: I8c3c2ecbbd636259f6ffb92768ec04148205c3e5

commit e02d3cb84190a345ebe9b32f53db03a1838976b1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 26 19:02:53 2017 -0500

    Fixed a pthread typo in previous commit.
    
    Details:
    - Misnamed 'pthread_mutex_t' type in bli_memsys.c as 'thread_mutex_t'.

commit f5962a1aae0fb3c9be104d0035c0d73210e7f670
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 26 17:00:04 2017 -0500

    Fixed bugs in gemm/gemmtrsm ukr tests in testsuite.
    
    Details:
    - Fixed a bug in gemmtrsm test module that was due to improper partitioning
      into a k x k triangular matrix for the purposes of obtaining an mr x k
      micropanel of A with which to test.
    - Fixed a bug in gemm and gemmtrsm test modules that would only manifest for
      very large k (depending on the product of mr x kc on that architecture).
      The bug arose from the fact that the test module was triggering the
      allocation of blocks from the internal memory pools, which are limited in
      size. This allocation imposes an implicit assumption that the micro-
      panel being tested with will fit inside, and this assumption is violated
      for large values of k. Arbitrarily large k may now be tested for both
      operation tests.
    - Added OpenMP/pthread critical sections around the setting or getting of
      statuses from the induced method operation lookup table in bli_l3_ind.c.
    - Added the 'static' keyword to all pthread_mutex_t global variables in BLIS.
    - Thanks to Nisanth Padinharepatt of AMD for reporting the first and third
      issues.
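
    Sketch of the locking pattern described above, assuming a small status
    table guarded by a file-local pthread mutex. The names sketch_table,
    sketch_set_status() and sketch_get_status() and the table dimensions are
    hypothetical and are not taken from bli_l3_ind.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical table dimensions; the real lookup table is not shown here. */
enum { SKETCH_NUM_METHODS = 4, SKETCH_NUM_OPS = 8 };

/* 'static' keeps the mutex and the table private to this translation unit. */
static pthread_mutex_t sketch_table_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool            sketch_table[SKETCH_NUM_METHODS][SKETCH_NUM_OPS];

/* Set one status entry inside a critical section. */
void sketch_set_status(int method, int op, bool status)
{
    pthread_mutex_lock(&sketch_table_mutex);
    sketch_table[method][op] = status;
    pthread_mutex_unlock(&sketch_table_mutex);
}

/* Read one status entry inside a critical section. */
bool sketch_get_status(int method, int op)
{
    pthread_mutex_lock(&sketch_table_mutex);
    bool status = sketch_table[method][op];
    pthread_mutex_unlock(&sketch_table_mutex);
    return status;
}
```

    Both the getter and the setter take the lock, so a reader never observes
    a partially updated entry.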

commit 8e917b256ca2d4bcdc059fe98d86be8775c69561
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Sep 9 14:10:15 2017 -0500

    Updated bibtex info for BLIS5 (3m4m) article.

commit 7be887057358df4978a4833eeae0c17e15acd9d1
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date:   Mon Aug 28 17:38:22 2017 +0530

    Merging "Adding auto hardware detection for Zen"
    
    Change-Id: Id450fb0c4f91a5cd5cbdc06970f4f9ed28dd8520

commit e056d810d16621891ead032603de0c2105cfc0f7
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Mon Aug 28 16:44:42 2017 +0530

    Bug fix for the testsuite build failing
    
    Change-Id: I7cd8c9d187387c48b2564e45cbfb8df985e93d77

commit 83796b7caf745fafc263e9e5e1bfcf5eff00c025
Merge: 8176f4e4 d1ee7762
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Mon Aug 28 05:23:28 2017 -0400

    Merge "Adding auto hardware detection for Zen" into amd-staging

commit d1ee776202b26874333af7a91b6d2686342c4c81
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Wed Aug 23 13:01:14 2017 +0530

    Adding auto hardware detection for Zen
    
    Change-Id: I40ce6705dd66b35000c4ccddffad1c5b65998caf

commit 8176f4e43872714b997f1a5f83056daadb0ff1a5
Merge: 12413018 adafe974
Author: praveeng <praveen.g@amd.com>
Date:   Mon Aug 28 12:21:16 2017 +0530

    resolving conflicts bli_gemm_front.c and LICENCE
    
    Change-Id: Id24ce53896d4c1c7ceccc3e004014a0ecceb5474

commit 57e1e5cd51e7ffe8612c96a20b6a041b55426ddb
Merge: f86ce54d d6ef56c6
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date:   Tue Aug 22 17:07:44 2017 +0530

    Merge AMD authored changes

commit adafe974b4bc3fc0663bc2f6f4ce2fde71a97988
Merge: f86ce54d 7dc78b49
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Aug 15 15:17:21 2017 -0500

    Merge pull request #150 from devinamatthews/vzeroupper
    
    Add vzeroupper to Intel AVX kernels.

commit 7dc78b49f97e6b3cd6d72fcdc588ace534d0e700
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Aug 15 10:02:25 2017 -0500

    Add vzeroupper to Intel AVX kernels.

commit f86ce54d6f315006984534fe29e47a2deaacc9f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 10 16:24:28 2017 -0500

    Removed trailing enum commas from bli_type_defs.h.
    
    Details:
    - Removed trailing commas from enums in bli_type_defs.h. Thanks to
      Erling Andersen for pointing out this inconsistency and suggesting
      the change.

commit 60a1eeb2317939d732b9eb6ff1e0d6d668c9a1e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Aug 5 13:04:31 2017 -0500

    Added edge handling to _determine_blocksize_b().
    
    Details:
    - Added explicit handling of situations where i == dim to
      bli_determine_blocksize_b_sub(). This isn't actually needed by any
      current use case within BLIS, but handling the situation is nonetheless
      prudent. Thanks to Minh Quan for reporting this issue and requesting
      the fix.
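
    Sketch of why an explicit i == dim check can matter when blocksizes are
    chosen for a backward-moving loop. This is an illustrative function with
    hypothetical names and simplified logic, not the code of
    bli_determine_blocksize_b_sub().

```c
#include <assert.h>

/* i:     number of elements already consumed (0 <= i <= dim)
   dim:   total dimension being partitioned
   b_alg: nominal (algorithmic) blocksize
   b_max: maximum blocksize allowed for a single step */
long sketch_blocksize_b(long i, long dim, long b_alg, long b_max)
{
    long left = dim - i;

    /* Explicit edge case: nothing is left, so the block is empty. */
    if (left == 0) return 0;

    long edge = dim % b_alg;

    /* If dim is a multiple of b_alg, every step uses b_alg. Without the
       check above, this branch would also return b_alg when i == dim. */
    if (edge == 0) return b_alg;

    /* Otherwise the first step absorbs the edge, so that all later steps
       are full blocks of size b_alg. */
    if (i == 0) return (dim <= b_max) ? dim : edge;

    return b_alg;
}
```

    With dim = 10 and b_alg = b_max = 4, successive calls at i = 0, 2, 6 give
    2, 4, 4, and a call at i = 10 gives 0.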

commit b01c80829907d50ec79977fba8e7b53cfe7db80a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 4 14:17:44 2017 -0500

    Fixed a minor bug in level-3 packm management.
    
    Details:
    - Fixed a bug in bli_l3_packm() that caused cntl_t-cached packed mem_t
      entries to be released and then re-acquired unnecessarily. (In essence,
      the "<" operands in the conditional that guards the
      release-and-reacquire code block simply needed to be swapped.) The bug
      should have only affected performance (rather than the computed result).
      Thanks to Minh Quan for identifying and reporting the bug.

commit 8b379069fcd4811669855b1248ece831f190dff6
Merge: 1f3a5819 05925dd5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 1 15:30:40 2017 -0500

    Merge branch 'master' into rt

commit 05925dd5d30e8f403bb671ce33029170d65ce7c0
Merge: 803bbef0 cecdc05d
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Aug 1 09:31:02 2017 -0500

    Merge pull request #146 from devinamatthews/master
    
    Change lsame_ signature to match lapacke.

commit cecdc05d2834786a84ff85775d3f99a958c0765a
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Jul 31 15:19:51 2017 -0500

    Change lsame_ signature to match lapacke.

commit 803bbef0a386dd0571ad389f69d55154dbfe3c50
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 29 20:17:05 2017 -0500

    Fixed pthreads compile bug with previous commit.
    
    Details:
    - Erroneously passed family parameter into l3int_t function despite
      that function not taking the parameter. Oops.

commit c63980f4ca750618f359031d0691289b1abf5146
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 29 14:53:39 2017 -0500

    Moved 'family' field from cntx_t to cntl_t.
    
    Details:
    - Removed the family field inside the cntx_t struct and re-added it to the
      cntl_t struct. Updated all accessor functions/macros accordingly, as well
      as all consumers and intermediaries of the family parameter (such as
      bli_l3_thread_decorator(), bli_l3_direct(), and bli_l3_prune_*()). This
      change was motivated by the desire to keep the context limited, as much
      as possible, to information about the computing environment. (The family
      field, by contrast, is a descriptor about the operation being executed.)
    - Added additional functions to bli_blksz_*() API.
    - Added additional functions to bli_cntx_*() API.
    - Minor updates to bli_func.c, bli_mbool.c.
    - Removed 'obj' from bli_blksz_*() API names.
    - Removed 'obj' from bli_cntx_*() API names.
    - Removed 'obj' from bli_cntl_*(), bli_*_cntl_*() API names. Renamed routines
      that operate only on a single struct to contain the "_node" suffix to
      differentiate with those routines that operate on the entire tree.
    - Added enums for packm and unpackm kernels to bli_type_defs.h.
    - Removed BLIS_1F and BLIS_VF from bszid_t definition in bli_type_defs.h.
      They weren't being used and probably never will be.

commit 07837395560d413a1ba828163b41186e21a7bcfe
Merge: ca1d1d85 ad8610b4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 21 16:49:48 2017 -0500

    Merge pull request #139 from Maratyszcza/emscripten
    
    Fix Emscripten builds

commit ad8610b4415cc7982804d74f9aba29875e9e2b6c
Merge: 8772a0b3 ca1d1d85
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 21 15:18:33 2017 -0500

    Merge branch 'master' into emscripten

commit ca1d1d8560c9ab1a7e3b0ac43ac70d08075bf904
Merge: b537b5bb 733faf84
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jul 21 09:49:50 2017 -0500

    Merge pull request #144 from devinamatthews/fix_atomics_on_bgq
    
    Add fallbacks to __sync_* or __c11_atomic_* builtins...

commit 733faf848dcc54834fcdfbb0185dc644978d8864
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu Jul 20 14:50:13 2017 -0500

    Clang can't make up its mind what to support.

commit 7425d0744d9e9cd29a887120e57c2b43ba287040
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu Jul 20 12:54:58 2017 -0500

    Add default #define for __has_extension.

commit b537b5bbe8cbee459a85bac11458498ae2bce4de
Merge: 1f1ec0db 7f41bb0a
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu Jul 20 10:58:39 2017 -0500

    Merge pull request #133 from devinamatthews/haswell-packdim
    
    Fix prefetching in haswell ukernel

commit 8823f91a14638ce6f4e45e67df03212bb61609d6
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu Jul 20 10:04:34 2017 -0500

    Add fallbacks to __sync_* or __c11_atomic_* builtins when __atomic_* is not supported. Fixes #143.
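
    Sketch of a compile-time fallback from the __atomic_* builtins to the
    older __sync_* builtins. The SKETCH_* macro names are hypothetical, the
    detection via the predefined __ATOMIC_SEQ_CST macro is an assumption
    about how one might test for support, and the __c11_atomic_* path
    mentioned in the commit is omitted.

```c
#include <assert.h>

#if defined(__ATOMIC_SEQ_CST)
  /* Compilers providing the __atomic_* builtins predefine this macro. */
  #define SKETCH_ADD_FETCH(ptr, val) \
      __atomic_add_fetch((ptr), (val), __ATOMIC_SEQ_CST)
  #define SKETCH_LOAD(ptr) \
      __atomic_load_n((ptr), __ATOMIC_SEQ_CST)
#else
  /* Legacy builtins: always full barriers, no memory-order argument. */
  #define SKETCH_ADD_FETCH(ptr, val) __sync_add_and_fetch((ptr), (val))
  /* Adding zero is a common way to express an atomic load with __sync_*. */
  #define SKETCH_LOAD(ptr)           __sync_add_and_fetch((ptr), 0)
#endif

/* Atomically increment a shared counter and return the new value. */
int sketch_take_ticket(int* counter)
{
    return SKETCH_ADD_FETCH(counter, 1);
}
```

    Callers use only the SKETCH_* macros, so the choice of builtin family
    stays in one place.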

commit 1f1ec0db9380b87679d5c771c4594daa1cfc5f0d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 19 15:40:48 2017 -0500

    Updated ar option list used by all configurations.
    
    Details:
    - Dropped 'u' from the list of modifiers passed into the library archiver
      ar. Previously, "cru" was used, while now we employ only "cr". This
      change was prompted by a warning observed on Ubuntu 16.04:
    
        ar: `u' modifier ignored since `D' is the default (see `U')
    
      This caused me to realize that the default mode causes timestamps to be
      zero, and thus the 'u' option, which causes only changed object files to
      be inserted, is not applicable.

commit 5caaba2d61cbbc36d63102a0786ece28ff797f72
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 19 13:51:53 2017 -0500

    Added --force-version=STRING option to configure.
    
    Details:
    - Added an option to configure that allows the user to force an arbitrary
      version string at configure-time. The help text also now describes the
      usage information.
    - Changed the way the version string is communicated to the Makefile.
      Previously, it was read into the VERSION variable from the 'version' file
      via $(shell cat ...). Now, the VERSION variable is instead set in
      config.mk (via a configure-substituted anchor from config.mk.in).

commit 13175c5fb70fb6a378d5fff6ecede62e5ea6a1f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 18 17:56:00 2017 -0500

    Updated openmp/pthread barriers with GNU atomics.
    
    Details:
    - Updated the non-tree openmp and pthreads barriers defined in
      bli_thrcomm_openmp.c and bli_thrcomm_pthreads.c to instead call a common
      implementation in bli_thrcomm.c, bli_thrcomm_barrier_atomic(). This new
      implementation goes through the same motions as the previous codes, but
      protects its loads and increments with GNU atomic built-ins. These atomic
      statements take memory ordering parameters that allow us to specify just
      enough constraints for the barrier to work as intended on weakly-ordered
      hardware. The prior implementation was only guaranteed to work on systems
      with strongly-ordered memory. (Thanks to Devin Matthews for suggesting
      this change and his crash-course in atomics and memory ordering.)
    - Removed 'volatile' from structs' barrier field declarations in
      bli_thrcomm_*.h.
    - Updated bli_thrcomm_pthread.? files to use renamed struct barrier fields
      consistent with that of the _openmp.? files.
    - Updated other bli_thrcomm_* files to rename "communicator" variables to
      simply "comm".
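
    Sketch of a sense-reversing barrier whose loads and increments go through
    GNU atomic built-ins. The type and function names are hypothetical, and
    the particular memory orderings are one plausible choice, not necessarily
    those used by bli_thrcomm_barrier_atomic().

```c
#include <assert.h>

typedef struct
{
    int n_threads; /* number of threads that must arrive */
    int sense;     /* flips each time the barrier opens  */
    int arrived;   /* threads that have arrived so far   */
} sketch_barrier_t;

void sketch_barrier_init(sketch_barrier_t* b, int n_threads)
{
    b->n_threads = n_threads;
    b->sense     = 0;
    b->arrived   = 0;
}

void sketch_barrier_wait(sketch_barrier_t* b)
{
    /* Remember the sense on entry; the barrier opens when it changes. */
    int my_sense   = __atomic_load_n(&b->sense, __ATOMIC_RELAXED);
    int my_arrived = __atomic_add_fetch(&b->arrived, 1, __ATOMIC_ACQ_REL);

    if (my_arrived == b->n_threads)
    {
        /* Last arrival: reset the count, then publish the flipped sense.
           The release ordering makes the reset visible to any thread that
           later observes the new sense with an acquire load. */
        __atomic_store_n(&b->arrived, 0, __ATOMIC_RELAXED);
        __atomic_fetch_xor(&b->sense, 1, __ATOMIC_RELEASE);
    }
    else
    {
        /* Spin until the last arrival flips the sense. */
        while (__atomic_load_n(&b->sense, __ATOMIC_ACQUIRE) == my_sense) { }
    }
}
```

    Because the waiters only require acquire/release pairing on the sense
    flag, the barrier does not rely on strongly-ordered hardware.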

commit 0e58ba1b3aa84700ca51a96f1c0eed6067562fba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 17 19:03:22 2017 -0500

    Added API to set mt environment variables.
    
    Details:
    - Renamed bli_env_get_nway() -> bli_thread_get_env().
    - Added bli_thread_set_env() to allow setting environment variables
      pertaining to multithreading, such as BLIS_JC_NT or BLIS_NUM_THREADS.
    - Added the following convenience wrapper routines:
        bli_thread_get_jc_nt()
        bli_thread_get_ic_nt()
        bli_thread_get_jr_nt()
        bli_thread_get_ir_nt()
        bli_thread_get_num_threads()
        bli_thread_set_jc_nt()
        bli_thread_set_ic_nt()
        bli_thread_set_jr_nt()
        bli_thread_set_ir_nt()
        bli_thread_set_num_threads()
    - Added #include "errno.h" to bli_system.h.
    - This commit addresses issue #140.
    - Thanks to Chris Goodyer for inspiring these updates.
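
    Sketch of get/set helpers for integer-valued environment variables such
    as BLIS_JC_NT. The sketch_* functions are hypothetical stand-ins built on
    POSIX setenv() and standard getenv(); how bli_thread_set_env() and
    bli_thread_get_env() are actually implemented is not stated here.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Return the value of 'name' as a long, or 'fallback' if the variable is
   unset or does not parse as a complete decimal integer. */
long sketch_get_env(const char* name, long fallback)
{
    const char* str = getenv(name);
    if (str == NULL) return fallback;

    errno = 0;
    char* end   = NULL;
    long  value = strtol(str, &end, 10);
    if (errno != 0 || end == str || *end != '\0') return fallback;

    return value;
}

/* Store 'value' in 'name', overwriting any previous setting.
   Returns 0 on success, as setenv() does. */
int sketch_set_env(const char* name, long value)
{
    char buf[32];
    snprintf(buf, sizeof buf, "%ld", value);
    return setenv(name, buf, 1);
}

/* Convenience wrappers in the spirit of the jc_nt routines listed above. */
long sketch_get_jc_nt(void)   { return sketch_get_env("BLIS_JC_NT", -1); }
int  sketch_set_jc_nt(long n) { return sketch_set_env("BLIS_JC_NT", n); }
```

    A caller could invoke sketch_set_jc_nt(4) before the first library call
    instead of exporting the variable in the shell.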

commit 8772a0b33a90154c80d88b381dcdd66f824e041f
Author: Marat Dukhan <marat@fb.com>
Date:   Thu Jul 13 21:39:24 2017 -0700

    Fix Emscripten builds

commit 72c8b49bb8d3b9370b2cc37718da22f065de9c57
Merge: 70cc825b ba7cada5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 12 14:58:12 2017 -0500

    Merge pull request #138 from hominhquan/membrk_set_free_fp
    
    Set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers

commit ba7cada51a238d320528e3504ed0f0a17a6b022a
Author: Minh Quan HO <mqho@kalray.eu>
Date:   Fri Jul 7 10:52:05 2017 +0200

    set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers
    
    The membrk's free_fp is called when releasing GEN_USE buffers, but this free_fp is
    not set in bli_membrk_init
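
    Sketch of the failure mode described above: an allocator object that
    stores both a malloc-like and a free-like function pointer must set both
    at initialization, or the release path calls through an unset pointer.
    The sketch_* names are hypothetical and the struct is far simpler than
    the real membrk object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void* (*sketch_malloc_ft)(size_t);
typedef void  (*sketch_free_ft)(void*);

typedef struct
{
    sketch_malloc_ft malloc_fp; /* used when acquiring a buffer */
    sketch_free_ft   free_fp;   /* used when releasing a buffer */
} sketch_membrk_t;

void sketch_membrk_init(sketch_membrk_t* m)
{
    m->malloc_fp = malloc;
    m->free_fp   = free; /* the kind of assignment the fix adds */
}

void* sketch_acquire(sketch_membrk_t* m, size_t size)
{
    return m->malloc_fp(size);
}

void sketch_release(sketch_membrk_t* m, void* buf)
{
    /* Calls through free_fp; undefined behavior if it was never set. */
    m->free_fp(buf);
}
```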

commit 1241301869957c96f16a2c6567e3ad70afa547de
Merge: 969b67e8 25ead66f
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Wed Jul 5 02:24:00 2017 -0400

    Merge "Reducing the framework overhead of GEMV routines" into amd-staging

commit 25ead66fb78557f73af48bac305724d5d8aa3309
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Fri Jun 30 12:23:19 2017 +0530

    Reducing the framework overhead of GEMV routines
    
    Change-Id: I83607ad767bff74e305e915b54b0ea34ec3e5684

commit 969b67e8800fbd5d14a086606f3b5afbf66ed093
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Tue Jul 4 12:57:32 2017 +0530

    Improved efficiency of dGEMM for large matrices by reducing TLB load misses and, chiefly, L3 cache misses. This is achieved by changing the packed block sizes of matrices A and B. The optimum values are now MC_D = 510 and KC_D = 1024.
    
    Change-Id: I2d8bdd5f62f2d1f8782ae2997f3d7a26587d1ca4

commit 70cc825b552dec05165b9d70f9e6eb33d8abb118
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Jun 6 21:58:21 2017 -0500

    Update LICENSE
    
    Remove totally unnecessary first 9 lines and hopefully get Github to recognize it as 3BSD [ci skip].

commit cf54c77bc79a0f33a514be72c80a654c4e6e6f63
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Jun 6 20:23:17 2017 -0500

    Add new SSI acknowledgment

commit d6ef56c6dbaf6df8ee1af1ca6a0f0792a811396a
Author: prangana <pradeep.rao@amd.com>
Date:   Thu Jun 1 16:11:09 2017 +0530

    Update version number
    
    Change-Id: Ib6e52d1d34c0791367ab9152dfab31f94deedeb4

commit 897bfa0e92082c30bbb74229562d7d7327cbbac8
Author: prangana <pradeep.rao@amd.com>
Date:   Thu Jun 1 16:11:09 2017 +0530

    Update version number
    
    Change-Id: Ib6e52d1d34c0791367ab9152dfab31f94deedeb4

commit 99d0ba5606d4b63e6a9c639aa78d4defc2455f79
Merge: be2c7eb8 6d17e012
Author: Santanu Thangaraj <Santanu.Thangaraj@amd.com>
Date:   Thu Jun 1 02:19:02 2017 -0400

    Merge "Checked in the small matrix code to compute GEMM called with A transpose case" into amd-staging

commit 6d17e0120fe5c127b941136ad2c0c08e91439535
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Wed May 24 11:48:16 2017 +0530

    Checked in the small matrix code to compute GEMM called with A transpose case
    
    Change-Id: I29f40046d43d7a4b037c1cb322503ee26495f462

commit 9d93f8481a1404695f7b78a3ced8ca47e890b649
Author: prangana <pradeep.rao@amd.com>
Date:   Tue May 30 09:58:10 2017 +0530

    Update Licence File
    
    Change-Id: I4c5cf1690d0cef92a68400f9a89e454ab6856ad2

commit be2c7eb85168937bd4318f4d05ded37620119310
Author: prangana <pradeep.rao@amd.com>
Date:   Tue May 30 09:58:10 2017 +0530

    Update Licence File
    
    Change-Id: I4c5cf1690d0cef92a68400f9a89e454ab6856ad2

commit 7f41bb0a0becde6a7de7df0f99668d7b4686c3b0
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri May 26 14:49:31 2017 -0400

    PACKDIM_MR=8 didn't work out, but messing with the prefetching helps 2%.

commit d87614af3f3d9187be94d6e77984b282bf890928
Author: Devin Matthews <dmatthews@gator3.ufhpc>
Date:   Fri May 26 14:47:36 2017 -0400

    Revert "Change PACKDIM_MR (double) for haswell to 8."
    
    This reverts commit 681eec913d7c2ebcff637cec5c1627ced9a92b99.

commit 681eec913d7c2ebcff637cec5c1627ced9a92b99
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri May 26 12:28:09 2017 -0500

    Change PACKDIM_MR (double) for haswell to 8.

commit 0a3ae0ecaa0ddcb5887005d7051fa234499f1120
Merge: 0f4e6652 6e04f9df
Author: praveeng <praveen.g@amd.com>
Date:   Sat May 20 16:53:50 2017 +0530

    frame/3/gemm/bli_gemm_front.c
    
    Change-Id: I52a0fbc1d33bb948d430942323bbc5fe44e3ca13

commit 6e04f9df01d79c1b0e673943ca0d5d0a6095eb2e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 17 13:03:52 2017 -0500

    Restored deleted lines from makefile fragments.

commit ec5c0c0448275280dca0991f6f33afeb73650450
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed May 17 12:29:44 2017 -0500

    Change to /bin/sh.
    
    All scripts checked with Debian's checkbashisms. Also check for clang first in auto-detect.sh.

commit 555ddc30d4c7e44f3f335e436c98606f56e1598b
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed May 17 12:27:14 2017 -0500

    Remove shebangs from makefiles.

commit f26bd7f42e0c2a47fe321b2c452644990b689654
Merge: cbf8710a 169fb05f
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed May 17 11:58:41 2017 -0500

    Merge pull request #128 from iotamudelta/master
    
    Portability and clang

commit 169fb05f225c2f060265bcaa872f7f80dc638b70
Author: J M Dieterich <dieterich@ogolem.org>
Date:   Tue May 16 23:11:22 2017 -0400

    Fix if/else structure. Thanks to TravisCI.

commit 0579dfea0bcfbb90ebc073fcf78b92a5cf7238e1
Author: J M Dieterich <dieterich@ogolem.org>
Date:   Tue May 16 22:58:07 2017 -0400

    Restore version.

commit a75b05c23dc786a1fdc45dc1627a5ce2299f1a7b
Author: J M Dieterich <dieterich@ogolem.org>
Date:   Tue May 16 22:23:27 2017 -0400

    Mark piledriver compilable w/ clang.

commit 7541d46e2ba8659bb2e36b444edef112fefa1345
Author: J M Dieterich <dieterich@ogolem.org>
Date:   Tue May 16 22:12:12 2017 -0400

    Mark bulldozer compilable w/ clang.

commit 91f897073ec0df3330ede449c4d6af8158266ae3
Author: J M Dieterich <dieterich@ogolem.org>
Date:   Tue May 16 22:06:59 2017 -0400

    Correct error message.

commit f5131e1e49167f948bddd714bb1af1761829c212
Author: J M Dieterich <dieterich@ogolem.org>
Date:   Tue May 16 22:03:23 2017 -0400

    Indeed one can compile for carrizo also using clang.

commit 5fa4e9439c04f35f89dd7d26ff742cb2dadc3180
Author: J M Dieterich <dieterich@ogolem.org>
Date:   Tue May 16 21:50:49 2017 -0400

    A bunch of shebang fixes from unportable /bin/bash to portable /usr/bin/env bash

commit 1f3a58197e5d5f9ac862bda91e7527cbfbab5d76
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon May 8 16:10:03 2017 -0500

    Housekeeping, induced method file/function renames.
    
    Details:
    - Renamed all level-3 induced method files to use the "_vir.c" suffix
      instead of "_ref.c". Also renamed functions within these files
      accordingly.
    - Renamed cpp macro definitions in frame/ind/include according to the
      above changes.
    - Removed frame/3/old.

commit cbf8710a1ba63e25aadaa6fc5da51ea81b3d596d
Merge: cf39d3ef fdc66f12
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date:   Mon May 8 11:21:20 2017 -0500

    Merge pull request #127 from devinamatthews/fix_blis_nt_xx
    
    Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS

commit cf39d3ef3b29b8058c39fb4638c1a734fe64aaed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 5 15:06:56 2017 -0500

    Fixed a bug in norm1v, norm1m.
    
    Details:
    - Fixed a bug that manifested as improperly-computed 1-norm for vectors
      and matrices. This is one of the few operations in BLIS that does not
      have its own test module within the testsuite, hence why it went
      undetected for so long. The bad 1-norms were being used to normalize
      matrices in the testsuite after initialization, which led to some
      matrices containing a combination of "large" and "small" values. This
      tended to push the residuals computed after each test away from zero.
      In some cases, they were off *just* enough for the testsuite to label
      it a "failure". Many thanks to Jeff Hammond for reporting this bug.
      (Wonky details: the bug was due to improperly-defined level-0 scalar
      macros for abval2, an operation that computes the absolute square,
      or complex magnitude/modulus. Certain complex domain instances of
      abval2 were being incorrectly defined in terms of real-only solutions,
      leading to bad results. This level-0 operation forms the basis of
      norm1v/norm1m. absq2 was also affected, but almost nothing uses
      this operation.)

commit 799485124f4d823e908d2e5d38b0c3a1e6172ade
Merge: 773a24ef 0df3541f
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu May 4 10:52:09 2017 -0500

    Merge pull request #121 from jeffhammond/not-real-knl
    
    allow KNL build without hbwmalloc (i.e. emulated)

commit fdc66f12d40754ff46179804bff592fddafbca02
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu May 4 10:35:22 2017 -0500

    Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS. Missing BLIS_NT_XX's are defaulted to 1. Fixes #123.
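
    Sketch of the precedence rule described above, with the settings passed
    in as plain integers (-1 meaning "unset") rather than read from the
    environment. The struct, the function name, and the treatment of the
    total-only case are assumptions made for illustration.

```c
#include <assert.h>

typedef struct
{
    long jc, ic, jr, ir; /* ways of parallelism per loop            */
    long total;          /* total number of threads that will be used */
} sketch_ways_t;

/* Any argument <= 0 is treated as "unset". */
sketch_ways_t sketch_resolve(long num_threads,
                             long jc, long ic, long jr, long ir)
{
    sketch_ways_t w;

    if (jc > 0 || ic > 0 || jr > 0 || ir > 0)
    {
        /* At least one per-loop value is set: num_threads is ignored and
           every missing per-loop value defaults to 1. */
        w.jc = (jc > 0) ? jc : 1;
        w.ic = (ic > 0) ? ic : 1;
        w.jr = (jr > 0) ? jr : 1;
        w.ir = (ir > 0) ? ir : 1;
        w.total = w.jc * w.ic * w.jr * w.ir;
    }
    else
    {
        /* Only the total (if any) is known. How it is split across the
           loops is outside this sketch, so the per-loop fields stay 1. */
        w.jc = w.ic = w.jr = w.ir = 1;
        w.total = (num_threads > 0) ? num_threads : 1;
    }
    return w;
}
```

    For example, a total of 16 combined with ic = 4 resolves to 4 threads
    overall, since the per-loop setting takes precedence.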

commit 773a24efb2fa1c3a220bf0ce1dd621a3176196da
Merge: dd58c954 b8854259
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 3 15:07:59 2017 -0500

    Merge branch 'master' of github.com:flame/blis

commit dd58c9545c877c3f7553eaebca7b5e9720a66f5d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 3 15:04:51 2017 -0500

    Disable complex 3m/4m in testsuite by default.
    
    Details:
    - Disabled testsuite tests of all level-3 implementations based on 3m
      and 4m. This will improve testing runtime on Travis CI as well as for
      anyone manually running the testsuite using default test parameters.
      Thanks to Devin Matthews for suggesting this change.

commit 0df3541f54b7fe0c604ab2ec47ba814f12391798
Author: Jeff Hammond <jeff.science@gmail.com>
Date:   Tue May 2 19:25:21 2017 -0700

    allow KNL build without hbwmalloc.h (i.e. emulated)
    
    we want to be able to run BLIS KNL binaries on non-KNL machines via SDE.
    although it is possible to install hbwmalloc implementation on such
    systems, it is easier not to, since obviously the performance of SDE
    execution is not representative so there is no reason to emulate HBW
    allocation.

commit b88542591d4dd0cde366e5ae35afd3205cb81bdc
Merge: 43007f7b c2c91e09
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 2 19:22:41 2017 -0500

    Merge pull request #107 from jeffhammond/intel-compilers-no-use-libm
    
    never use libm with Intel compilers

commit 43007f7b65ec7926cbbfc39965ff733fa251c15f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 2 16:48:43 2017 -0500

    Fixed stray parentheses in README citations.

commit a4f1d0b8801c114e9ef8be39df01e1b8d27ebcb3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 2 16:38:43 2017 -0500

    CHANGELOG update (0.2.2)

commit 940a707ac78de975110e17c95765e65b89aa5e10
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 2 16:38:42 2017 -0500

    Version file update (0.2.2)

commit d5a5e003ea9b24bb6abf12e88862e8eb61ffb03d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 2 15:48:30 2017 -0500

    Fixed a trsm1m bug that affected right-side cases.
    
    Details:
    - Fixed a bug introduced in 1c732d3 that affected trsm1m_r. The result
      was nondeterministic behavior (usually segmentation faults) for certain
      problem sizes beyond the 1m instance of kc (e.g. 128 on haswell). The
      cause of the bug was my commenting out lines in bli_gemm1m_ukr_ref.c
      which explicitly directed the virtual gemm micro-kernel to use temporary
      space if the storage preference of the [real domain] gemm ukernel did
      not match the storage of the output matrix C. In the context of gemm,
      this handling is not needed because agreement between the storage pref
      and the matrix is guaranteed by a high-level optimization in BLIS.
      However, this optimization is not applied to trsm because the storage
      of C is not necessarily the same as the storage of the micro-panels of
      B--both of which are updated by the micro-kernel during a trsm
      operation. Thus, the guarantee of storage/preference agreement is not
      in place for trsm, which means we must handle that case within the
      virtual gemm micro-kernel.
    - Comment updates and a minor macro change to bli_trsm*_cntx_init() for
      3m1, 4m1a, and 1m.

commit e80993e71f4d571e9650a8e90ed386e32059eae5
Merge: a509fbd5 ca3a7924
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 2 12:30:28 2017 -0500

    Merge branch 'master' into 1m

commit ca3a7924770d6cf203cce4ca9f5482e1d0d4e961
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 2 12:09:39 2017 -0500

    README.md update.
    
    Details:
    - Updated bibtex entries for 4th BLIS paper, and added entries for 5th
      and 6th BLIS papers.

commit 0f4e6652dfe9b30105d3bab328ac26d9d5c11182
Merge: 42e7f6fb 6e7de6ef
Author: praveeng <praveen.g@amd.com>
Date:   Wed Apr 19 17:54:10 2017 +0530

    Merge master code till 2017_04_19 to amd-staging
    
    Change-Id: Ibebe83c8ea2e7eb15798c2bcf214b7228a1c9518

commit 42e7f6fb2a531429ee600b2fe0293b67371c7ccb
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Tue Mar 28 18:10:03 2017 +0530

    fixed license attribute issues in AMD added files
    
    Change-Id: I303f870a777c7cd1c1af29ea0b93f3e0a27948e4

commit 5600001e973c6cea048bd3fdb28117f1d7c98b9d
Merge: 0b190293 b3ed4933
Author: prangana <pradeep.rao@amd.com>
Date:   Mon Mar 20 13:56:33 2017 +0530

    Fix merge conflicts after sync with release branch
    
    Change-Id: Icf14a09f728befb69a73fff9fa79c4128e728310

commit 6e7de6ef84babb273dc5528a9b9d01f0febe394b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 17 12:10:24 2017 -0500

    Minor updates to test/3m4m.
    
    Details:
    - Updated initial problem size and increment in Makefile.
    - Updated code in test_gemm.c to correctly query kc from context.

commit f484c6cd4389dc7ae5b972849e12e98ad5bbf9a4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 17 12:07:27 2017 -0500

    Whitespace reformatting to armv8a kernels file.
    
    Details:
    - Updated formatting of function signature/header in
      kernels/armv8a/3/bli_gemm_opt_4x4.c.

commit 0b19029342ffc530fa22ef20398a26221cb8f6ec
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Tue Mar 14 14:51:31 2017 +0530

    Code cleanup, removed warnings from trsm, removed unused routines in axpyv & scalv
    
    Change-Id: I02867f394c5f416194c4b1769a6c75f39243ec81

commit 825363bd2a5a60a923d4a6d9691dc143845a9cab
Merge: 093bdb80 513944e4
Author: praveeng <praveen.g@amd.com>
Date:   Wed Mar 8 15:42:49 2017 +0530

    Merge code from master to amd-staging as on 2017_03_08 by praveeng
    
    Change-Id: I80740081b2cb54c9b77a3e78b9fe540e170be23d

commit 093bdb80c86b06367e595aa17487139ae983822f
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Tue Mar 7 13:35:50 2017 +0530

    Checked in Unpacked DGEMM code
    
    Change-Id: I39dcc7b238b328f73ee2675d21a5e521d0488723

commit 33923da9a108854590d386e74b6ee66b971e7796
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Mon Mar 6 14:31:31 2017 +0530

    Added variant 10 for double precision axpyv microkernel
    
    Change-Id: I7a20cc113a422603250bc450825c965136354974

commit bc828f7f8e3ddb9f58af07edc0b935b21759fb0f
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Fri Mar 3 14:45:35 2017 +0530

    Added new axpyv (single precision) microkernel that performs 10 FMAs per loop. This gives better performance than all other implementations of axpyv
    
    Change-Id: Ic4f0e4c67e367d67d0b24febcf34f81a70a39972

commit c9949f4603419267c10973adf1d63ec38497475d
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Fri Feb 17 14:16:33 2017 +0530

    Checked in DGEMMTRSM and edge case handling routine in DDOTXF
    
    Change-Id: I65f00661af6c09b2507294fd43e0a10641c0597e

commit a509fbd5ac04fafd4e51b43d2f59ca56432dc212
Merge: 69b4846a 513944e4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 21 17:06:16 2017 -0600

    Merge branch 'master' into 1m

commit 69b4846ae9adb157c4171b52e159684db2867853
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 21 15:33:39 2017 -0600

    Disabled experiment-related 1m code.
    
    Details:
    - Commented out code in frame/ind/oapi/bli_l3_3m4m1m_oapi.c that was
      specifically inserted to facilitate the benchmarking of 1m block-panel
      and panel-block algorithms.
    - Updates to test/3m4m/Makefile, runme.sh script, and test_gemm.c to
      reflect changes used/needed during benchmarking.

commit 513944e4a951d8823b4de161b86ad7a965b4d99b
Merge: 8b462a0e 0e18f68c
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Feb 20 10:04:33 2017 -0500

    Merge pull request #118 from devinamatthews/master
    
    Handle k=0 correctly in KNL dgemm ukernel.

commit 0e18f68cf12eb9189ba901a20040b1cdae417670
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Feb 20 09:03:21 2017 -0600

    Handle k=0 correctly in KNL dgemm ukernel.
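
    Sketch of what correct k = 0 handling means for a gemm micro-kernel
    computing C := beta*C + alpha*A*B: the product term is empty, but C must
    still be scaled by beta. This is a plain reference loop with hypothetical
    names and an assumed packing layout, not the KNL assembly kernel.

```c
#include <assert.h>
#include <stddef.h>

/* a: packed m x k micro-panel, element (i,p) at a[i + p*m]
   b: packed k x n micro-panel, element (p,j) at b[p*n + j]
   c: m x n tile with row stride rs_c and column stride cs_c */
void sketch_dgemm_ukr(int m, int n, int k,
                      double alpha, const double* a, const double* b,
                      double beta, double* c, int rs_c, int cs_c)
{
    for (int j = 0; j < n; ++j)
    {
        for (int i = 0; i < m; ++i)
        {
            /* With k == 0 this loop does not run and ab stays 0. */
            double ab = 0.0;
            for (int p = 0; p < k; ++p)
                ab += a[i + p * m] * b[p * n + j];

            double* cij = &c[i * rs_c + j * cs_c];

            /* beta == 0 overwrites C rather than multiplying it, following
               the usual BLAS convention. */
            double scaled = (beta == 0.0) ? 0.0 : beta * (*cij);
            *cij = scaled + alpha * ab;
        }
    }
}
```

    Since a and b are never dereferenced when k is 0, the update to C in
    that case depends only on beta.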

commit 8b462a0e8c3e9252f0401940849e53cc772256fa
Merge: c362afc5 7d42fc07
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Sun Feb 19 23:03:03 2017 -0500

    Merge pull request #117 from devinamatthews/master
    
    Cast dim_t and inc_t parameters to 64-bit in KNL microkernels.

commit 7d42fc0796ef0c010375fd8e59b1240ba41ce4d2
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Sun Feb 19 21:10:55 2017 -0500

    Cast dim_t and inc_t parameters to 64-bit in KNL microkernels.

commit 04245c9ff7f8b3c70d61003029c964bb9a4320ee
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Fri Feb 10 14:24:30 2017 +0530

    Reoptimized scalv routines - two vector multiplies are done per iteration, and these routines are enabled in bli_kernel.h
    
    Change-Id: Ic5654508573d1f6bde2edef06aefe117e581feb5

commit c362afc525bab4050581d1b0fcea2fe4d582c608
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 9 11:54:59 2017 -0600

    Added missing "level-0" BLAS [sd]cabs1_().
    
    Details:
    - Fixed issue #115 by adding implementations for scabs1_() and dcabs1_()
      to the BLAS compatibility layer. Thanks to heroxbd for pointing out
      their absence.

commit 018180c938c32efbeaaf626ba71ec5b780664db1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 8 11:20:52 2017 -0600

    Fixed a minor bug in configure (issue #114).
    
    Details:
    - Fixed a bug in the configure script whereby a non-preferred value for
      --enable-threading would cause problems in common.mk vis-a-vis detecting
      which threading model was chosen. Thanks to heroxbd for reporting this
      issue.

commit 58b5b77e5fdb179ea465e398e416e6a00d917e05
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Wed Feb 8 21:43:34 2017 +0530

    Fixed a bug in axpyv: the arguments passed to the intrinsic fmad instruction are corrected
    
    Change-Id: If12f24c6bc74b22ac9e4acd6b9378e06d79f2f5e

commit 85de4ebf74d0a5587d5a12724eb5489d51674db3
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Wed Feb 8 14:41:04 2017 +0530

    variant 4 axpyv single precision modified: explicitly used FMA intrinsics, replaced vector multiply and add operations
    
    Change-Id: I975feef56696d479d2b9e9441b0660021cf4f6ff

commit 3fa53e8af31d634779f40258c51483ae8af494fa
Merge: b5291a44 95be7b04
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Wed Feb 8 11:46:34 2017 +0530

    Merged axpyv and gemm small in bli_kernel.h
    Merge branch 'amd-staging' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-staging
    
            modified:   config/zen/bli_kernel.h
            modified:   frame/3/gemm/bli_gemm_front.c
            modified:   kernels/x86_64/zen/3/bli_gemm_small_matrix.c
    
    Change-Id: If181cf9345178c448b3530beb8bef453917fe295

commit 95be7b04709e688a4cb01fba680081e30f4258ef
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Tue Feb 7 14:01:27 2017 +0530

    Added logic for packing matrix A and prefetching matrix C in Unpacked SGEMM code
    
    Change-Id: I99efeca9eb5b4449286ec0ec133fd554ef1bb4f0

commit b5291a445b1313e01f1e0e8102c5f3660ab07f69
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Tue Feb 7 12:39:31 2017 +0530

    Added optimization variant 4 for axpyv single precision. This performs 5 FMAs per loop, keeping the IPC always full
    
    Change-Id: Ie77ed22584271136a257e673bcd3b1ba71136bc9

commit f4bfc1662af82aa4b98185334c44835e51f1cbec
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Mon Feb 6 15:04:27 2017 +0530

    New routines implemented for axpyv to improve performance for small vector sizes. Vectorization is done for vectors as small as 8 elements (single precision) or 4 elements (double precision). Because this operation has a low compute-to-memory ratio, memory operations dominate at larger sizes, so there is not much gain there. This still needs some work. Added saxpyv and daxpyv var 3 routines in the file bli_axpyv_opt_var1.c
    
    Change-Id: Ic1b33bd5516e10113b00e44ab41b97eb19d46072

commit ddf45e71770c55ea4a58ca24ea4913fe5d8beb9b
Merge: a6ab91bc 78e1b16e
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jan 27 14:25:40 2017 -0600

    Merge pull request #113 from devinamatthews/knl_thread_params
    
    Change default threading parameters for KNL.

commit 78e1b16e16d589ed31b2e712115ee282097f114d
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jan 27 14:22:20 2017 -0600

    Change default threading parameters for KNL.

commit 574472ba5a89924eca7dbd10055d0e1dcd7f4c71
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Tue Jan 10 14:51:46 2017 +0530

    checked in unpacked SGEMM optimization
    
    Change-Id: I8e4ea374415c0c402c660b656fb076af15354181

commit 1c732d3ddc4ac0861d3b0e0dd15eb7e071615502
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jan 25 16:25:46 2017 -0600

    Added 1m-specific APIs for bp, pb gemm algorithms.
    
    Details:
    - Defined bli_gemmbp_cntl_create(), bli_gemmpb_cntl_create(), with the
      body of bli_gemm_cntl_create() replaced with a call to the former.
    - Defined bli_cntl_free_w_thrinfo(), bli_cntl_free_wo_thrinfo(). Now,
      bli_cntl_free() can check if the thread parameter is NULL, and if so,
      call the latter, and otherwise call the former.
    - Defined bli_gemm1mbp_cntx_init(), bli_gemm1mpb_cntx_init(), both in
      terms of bli_gemm1mxx_cntx_init(), which behaves the same as
      bli_gemm1m_cntx_init() did before, except that an extra bool parameter
      (is_pb) is used to support both bp and pb algorithms (including to
      support the anti-preference field described below).
    - Added support for "anti-preference" in context. The anti_pref field,
      when true, will toggle the boolean return value of routines such as
      bli_cntx_l3_ukr_eff_prefers_storage_of(), which has the net effect of
      causing BLIS to transpose the operation to achieve disagreement (rather
      than agreement) between the storage of C and the micro-kernel output
      preference. This disagreement is needed for panel-block implementations,
      since they induce a transposition of the suboperation immediately before
      the macro-kernel is called, which changes the apparent storage of C. For
      now, anti-preference is used only with the pb algorithm for 1m (and not
      with any other non-1m implementation).
    - Defined new functions,
        bli_cntx_l3_ukr_eff_prefers_storage_of()
        bli_cntx_l3_ukr_eff_dislikes_storage_of()
        bli_cntx_l3_nat_ukr_eff_prefers_storage_of()
        bli_cntx_l3_nat_ukr_eff_dislikes_storage_of()
      which are identical to their non-"eff" (effectively) counterparts except
      that they take the anti-preference field of the context into account.
    - Explicitly initialize the anti-pref field to FALSE in
      bli_gks_cntx_set_l3_nat_ukr_prefs().
    - Added bli_gemm_ker_var1.c, which implements a panel-block macro-kernel
      in terms of the existing block-panel macro-kernel _ker_var2(). This
      technique requires inducing transposes on all operands and swapping
      the A and B.
    - Changed bli_obj_induce_trans() macro so that pack-related fields are
      also changed to reflect the induced transposition.
    - Added a temporary hack to bli_l3_3m4m1m_oapi.c that allows us to easily
      specify the 1m algorithm (block-panel or panel-block).
    - Renamed the following cntx_t-related macros:
        bli_cntx_get_pack_schema_a() -> bli_cntx_get_pack_schema_a_block()
        bli_cntx_get_pack_schema_b() -> bli_cntx_get_pack_schema_b_panel()
        bli_cntx_get_pack_schema_c() -> bli_cntx_get_pack_schema_c_panel()
      and updated all instantiations. Also updated the field names in the
      cntx_t struct.
    - Comment updates.
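    
    As a rough illustration of the anti-preference toggle described above
    (a sketch under assumptions, not the framework code): the "eff" query
    returns the plain preference answer, inverted when anti_pref is set.
    The struct and function names below are hypothetical stand-ins and do
    not reproduce the real cntx_t layout or API.
    
    ```c
    #include <stdbool.h>
    
    /* Hypothetical stand-in for the relevant context fields. */
    typedef struct {
        bool ukr_prefers_rows; /* output storage preference of the micro-kernel */
        bool anti_pref;        /* when true, invert the answer of "eff" queries */
    } cntx_sketch_t;
    
    /* Plain query: does the micro-kernel preference agree with C's storage? */
    bool prefers_storage_of(const cntx_sketch_t *cntx, bool c_is_row_stored)
    {
        return cntx->ukr_prefers_rows == c_is_row_stored;
    }
    
    /* "Effective" query: the plain answer, toggled if anti_pref is set. */
    bool eff_prefers_storage_of(const cntx_sketch_t *cntx, bool c_is_row_stored)
    {
        bool prefers = prefers_storage_of(cntx, c_is_row_stored);
        return cntx->anti_pref ? !prefers : prefers;
    }
    ```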

commit 41595e98eedaf3f1f93802c14dcae490402f933f
Merge: d625c49e a6ab91bc
Author: praveeng <praveen.g@amd.com>
Date:   Wed Dec 7 15:13:21 2016 +0530

    Merge master code as on 2016_12_07 to amd-staging
    
    Change-Id: I5d9ecef9bff960aeb9b51ca4e4b21714e789e44f

commit d625c49e20bd3c50d6d44e330e34076cced114a3
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Tue Nov 29 15:05:19 2016 +0530

    checked-in SGEMMTRSM microkernel for Zen
    
    Change-Id: Ib61936418dea911b2154aa99f703b66e9669f94f

commit a6ab91bc61432490fadf18d596de4589645f37dd
Merge: 145a551d 7f31a630
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 30 09:26:58 2016 -0600

    Merge pull request #111 from figual/master
    
    Fixed missing cntx argument in ARMv8 microkernels.

commit 7f31a6307b7bd35f913c895947552c3a176f789b
Author: Francisco Igual <figual@ucm.es>
Date:   Sun Nov 27 14:40:47 2016 +0100

    Fixed missing cntx argument in ARMv8 microkernels.

commit 126482a3b609b9ad7026ba348f6c4bf6a29be8a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 25 18:29:49 2016 -0600

    Implemented the 1m method.
    
    Details:
    - Implemented the 1m method for inducing complex domain matrix
      multiplication. 1m support has been added to all level-3 operations,
      including trsm, and is now the default induced method when native
      complex domain gemm microkernels are omitted from the configuration.
    - Updated _cntx_init() operations to take a datatype parameter. This was
      needed for the corresponding function for 1m (because 1m requires us
      to choose between column-oriented or row-oriented execution, which
      requires us to query the context for the storage preference of the
      gemm microkernel, which requires knowing the datatype) but I decided
      that it made sense for consistency to add the parameter to all other
      cntx initialization functions as well, even though those functions
      don't use the parameter.
    - Updated bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs() to take
      a second scalar for each blocksize entry. The semantic meaning of the
      two scalars now is that the first will scale the default blocksize
      while the second will scale the maximum blocksize. This allows scaling
      the two independently, and was needed to support 1m, which requires
      scaling for a register blocksize but not the register storage
      blocksize (ie: "packdim") analogue.
    - Deprecated bli_blksz_reduce_dt_to() and defined two new functions,
      bli_blksz_reduce_def_to() and bli_blksz_reduce_max_to(), for reducing
      default and maximum blocksizes to some desired blocksize multiple.
      These functions are needed in the updated definitions of
      bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs().
    - Added support for the 1e and 1r packing schemas to packm, including
      1e/1r packing kernels.
    - Added a minor optimization to bli_gemm_ker_var2() that allows, under
      certain circumstances (specifically, real domain beta and row- or
      column-stored matrix C), the real domain macrokernel and microkernel
      to be called directly, rather than using the virtual microkernel
      via the complex domain macrokernel, which carries a slight additional
      amount of overhead.
    - Added 1m support to the testsuite.
    - Added 1m support to Makefile and runme.sh in test/3m4m. Also simplified
      some code in test_gemm.c driver.
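    
    For readers unfamiliar with induced methods, the sketch below shows
    only the basic identity that lets real-domain arithmetic produce a
    complex product: a complex scalar is expanded into a 2x2 real matrix
    and applied to the (real, imaginary) pair of the other operand. How
    BLIS lays this out in its 1e/1r packing schemas is not shown here, and
    the function name is hypothetical.
    
    ```c
    /* Hypothetical helper: computes c = a * b for complex scalars using
       only a real 2x2 matrix-vector product. */
    void cmul_via_real_2x2(double ar, double ai, double br, double bi,
                           double *cr, double *ci)
    {
        /* Real 2x2 expansion of a = ar + i*ai. */
        const double a_exp[2][2] = { { ar, -ai },
                                     { ai,  ar } };
        /* b stored as a real vector of (real part, imaginary part). */
        const double b_vec[2] = { br, bi };
    
        *cr = a_exp[0][0] * b_vec[0] + a_exp[0][1] * b_vec[1];
        *ci = a_exp[1][0] * b_vec[0] + a_exp[1][1] * b_vec[1];
    }
    ```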

commit d8f13beeea90338e0ecb0a3aeaa2d59d8ebd6c36
Merge: c25a9205 145a551d
Author: praveeng <praveen.g@amd.com>
Date:   Fri Nov 25 17:31:08 2016 +0530

    Merge master code till 2016_11_25 to amd-staging

commit c25a9205fd8c8d8de7fd81b1e5621e7ac79f4e87
Merge: 65298762 bdc0a264
Author: praveeng <praveen.g@amd.com>
Date:   Fri Nov 25 17:06:36 2016 +0530

    Merge master code till Switched to simpler trsm_r 2016_11_25 to amd-staging
    
    Change-Id: Ibf71d224d8fb6cf0bc497f84d50c27d276512cc1

commit 145a551d524ae5492667a05fc248923d922df850
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 23 17:59:06 2016 -0600

    Switched to simpler trsm_r implementation.
    
    Details:
    - Disabled the implementation of trsm_r that allows the right-hand matrix
      B to be triangular, and switched to the implementation that simply
      transposes the operation (and thus the storage of C) in order to recast
      the operation as trsm_l. This avoids the need to use trsm_rl and trsm_ru
      macrokernels, which require an awkward swapping of MR and NR. For now,
      the support for trsm_r macrokernels, via separate control trees, remains.
    - Modified bli_config_macro_defs.h so that BLIS_RELAX_MCNR_NCMR_CONSTRAINTS
      is defined by default. This is mostly a safety precaution in case someone
      tries to switch back to the previous trsm_r implementation, but also
      serves as a convenience on some systems where one does not naturally
      choose blocksizes in a way that satisfies MC % NR = 0 and NC % MR = 0.

commit b3e58ee30307cf1e11529f2113acb9abbeda25af
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 23 17:58:26 2016 -0600

    Reimplemented 4x12 haswell ukernels (real only).
    
    Details:
    - Replaced permutation-based implementations in bli_gemm_asm_d4x12.c, which
      defines 4x24 single real and 4x12 double real gemm microkernels, with
      broadcast-based implementations. (The previous microkernel file has been
      moved to an 'old' subdirectory.)

commit 65298762ff15c45e8588e0c279a9feaa98c927a0
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Tue Nov 22 12:15:33 2016 +0530

    removed a redundant copy operation in DNRM2
    
    Change-Id: I673b08efde4480e871779716f7715566740ad9ce

commit d6863e851adeef037e4d1476fe63bb293fb9d987
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Mon Nov 21 11:30:30 2016 +0530

    checked-in DNRM2 optimizations
    
    Change-Id: I3b31d768bd7f4fbf43042aa5a0762995c73c4522

commit bdc0a264d2fb5940bfd09298b1de823674a39053
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 16 14:13:08 2016 -0600

    Adjusted stride selection of ct in macrokernels.
    
    Details:
    - Updated the changes introduced in 618f433 so that the strides of the
      temporary microtile ct used in the macrokernels are determined based
      on the storage preference of the microkernel (via the new functions
      below), rather than the strides of c. In almost all cases, presently,
      this change results in no net effect, as a high-level optimization
      in the _front() functions aligns the storage of c to that of the
      microkernel's preference. However, I encountered some cases where
      this is not always the case in some development code that has yet
      to be committed, and therefore I'm generalizing the framework code
      in advance.
    - Defined two new functions in bli_cntx.c:
        bli_cntx_l3_ukr_prefers_rows_dt()
        bli_cntx_l3_ukr_prefers_cols_dt()
      which return bool_t's based on the current micro-kernel's storage
      preferences. For induced methods, the preference of the underlying
      real domain microkernel is returned.
    - Updated definition of bli_cntx_l3_ukr_dislikes_storage_of(), and
      by proxy bli_cntx_l3_ukr_prefers_storage_of(), to be in terms of
      the above functions, rather than querying the preferences of the
      native microkernel directly (which did the wrong thing for induced
      methods).

commit 031978d2647cf08316858baf29c84ebba9c3133e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 16 14:04:33 2016 -0600

    Fixed inactive trsm_r blocksize constraint code.
    
    Details:
    - Changed a cpp macro that was meant to prevent using certain trsm_r code
      if BLIS_RELAX_MCNR_NCMR_CONSTRAINTS was defined. It was actually coded
      incorrectly at first. I've now fixed its location and changed its
      consequence to a compile-time #error message.

commit 9772218cae57d55c252595b01e3669d8bed84944
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Wed Nov 16 15:19:19 2016 +0530

    Added optimized DAMAX routines for Zen
    
    Change-Id: I499c0c8f0f4ce6c19235c47b86d5608db6ba50f8

commit 9c448e30174e5eb76a94b43b30819704a5dfcb3f
Merge: 998d8240 e35d3c23
Author: Santanu Thangaraj <Santanu.Thangaraj@amd.com>
Date:   Wed Nov 16 04:18:57 2016 -0500

    Merge "Added new optimized micro-kernel for dotxv routine" into amd-staging

commit 998d824044adac0d54c921dcd44fb58f3d54aad2
Merge: 0d13e9a4 6b5a4032
Author: praveeng <praveen.g@amd.com>
Date:   Wed Nov 16 14:22:42 2016 +0530

    Merge master code till devinamatthews/omp_num_thrds 2016_11_16 to amd-staging
    
    Change-Id: I601ff1d3ec8a680e1be039ffc7b299744e8a27c5

commit 6b5a4032d2e3ed29a272c7f738b7e3ed6657e556
Merge: 3b524a08 a8220e3a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 10 15:28:24 2016 -0600

    Merge pull request #109 from devinamatthews/omp_num_threads
    
    Add automatic loop thread assignment.

commit a8220e3a86433b5d76789e32ea7ca014a11b6d17
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu Nov 10 14:19:34 2016 -0600

    - Fix typo in bli_cntx.c
    - Bump BLIS_DEFAULT_NR_THREAD_MAX to 4

commit e35d3c23f28784e50ee13d2e77a69d60e0c24c1f
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Thu Nov 10 14:30:53 2016 +0530

    Added new optimized micro-kernel for dotxv routine
    
    Change-Id: I2c544e9b25a454d971ad690353502a55cd668391

commit 0d13e9a4f6f2fcda08f205215240cdf86442d6c6
Merge: e044fa62 3b524a08
Author: praveeng <praveen.g@amd.com>
Date:   Mon Nov 7 14:40:41 2016 +0530

    bli_kernel.h
    
    Change-Id: I425d089f79497a0de7d1622e829c3ca9edf7f091

commit c05b3862f6241486442b313eff0c8bee7b5e1274
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Nov 4 15:48:02 2016 -0500

    Add automatic loop thread assignment.
    
    - Number of threads is determined by BLIS_NUM_THREADS or OMP_NUM_THREADS, but can be overridden by BLIS_XX_NT as before.
    - Threads are assigned to loops (ic, jc, ir, and jr) automatically by weighted partitioning and heuristics, both of which are tunable via bli_kernel.h.
    - All level-3 BLAS covered.
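    
    A minimal sketch of the environment lookup only, assuming that
    BLIS_NUM_THREADS takes precedence over OMP_NUM_THREADS and that one
    thread is used when neither is set (the commit message does not state
    either point). The function names are hypothetical, and the weighted
    partitioning of threads across loops is not shown.
    
    ```c
    #include <limits.h>
    #include <stdlib.h>
    
    /* Parse a positive thread count; return 0 if missing or invalid. */
    static int parse_nt(const char *s)
    {
        char *end;
        long v;
    
        if (s == NULL || *s == '\0') return 0;
        v = strtol(s, &end, 10);
        return (*end == '\0' && v > 0 && v <= INT_MAX) ? (int)v : 0;
    }
    
    /* Assumed precedence: BLIS value first, then OpenMP value, then 1. */
    int pick_num_threads(const char *blis_val, const char *omp_val)
    {
        int nt = parse_nt(blis_val);
        if (nt == 0) nt = parse_nt(omp_val);
        return nt > 0 ? nt : 1;
    }
    
    /* Convenience wrapper that reads the two environment variables. */
    int query_num_threads(void)
    {
        return pick_num_threads(getenv("BLIS_NUM_THREADS"),
                                getenv("OMP_NUM_THREADS"));
    }
    ```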

commit 3b524a08e3fb8380e7b8b2ba835312c51a331570
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 2 17:45:18 2016 -0500

    Consolidated 3m1/4m1 gemmtrsm, trsm ukernel code.
    
    Details:
    - Consolidated the macros that define the lower and upper versions of the
      gemmtrsm microkernels into a single macro that is instantiated twice.
      Did this for both 3m1 and 4m1 microkernels.
    - Consolidated lower and upper versions of the trsm microkernels for 3m1
      and 4m1 into single files (each).

commit ead231aca635deb3db270f118454e4222c627f31
Merge: d25e6f8b 62987f60
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 2 13:03:50 2016 -0500

    Merge pull request #108 from devinamatthews/patch-2
    
    Update .travis.yml with additional tests

commit 62987f60a6a6ff0a75b31d0404f493593ce35ccc
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Nov 2 11:20:37 2016 -0500

    Allow KNL to fail

commit 8f9010542c751ae3cbfe6121cb011d8985c1e00d
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Nov 2 11:18:32 2016 -0500

    Fix some problems with OSX builds:
    
    - Update CPU detection for Intel archs (esp. Skylake)
    - Allow clang for the reference config

commit d25e6f8b63c57f30b8a67dffbf4995977cf9f235
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 1 14:35:15 2016 -0500

    Can disable trsm_r-specific blocksize constraints.
    
    Details:
    - Added cpp guards around the constraints in bli_kernel_macro_defs.h
      that enforce MC % NR = 0 and NC % MR = 0. These constraints are ONLY
      needed when handling right-side trsm by allowing the matrix on the
      right (matrix B) to be triangular, because it involves swapping
      register, but not cache, blocksizes (packing A by NR and B by MR)
      and then swapping the operands to gemmtrsm just before that kernel
      is called. It may be useful to disable these constraints if, for
      example, the developer wishes to test the configuration with
      a different set of cache blocksizes where only MC % MR = 0 and
      NC % NR = 0 are enforced.
    - In summary, #defining BLIS_RELAX_MCNR_NCMR_CONSTRAINTS will bypass
      the enforcement of MC % NR = 0 and NC % MR = 0.
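    
    The constraints above can be sketched as a runtime check (the actual
    guards are cpp conditionals in bli_kernel_macro_defs.h; this function
    and its name are hypothetical and exist only to show which divisibility
    rules apply with and without the relaxation).
    
    ```c
    #include <stdbool.h>
    
    /* Hypothetical helper mirroring the blocksize constraints. */
    bool blocksizes_ok_sketch(int mc, int nc, int mr, int nr,
                              bool relax_mcnr_ncmr)
    {
        /* Enforced in both modes: MC % MR = 0 and NC % NR = 0. */
        if (mc % mr != 0 || nc % nr != 0) return false;
    
        /* With the relaxation, the trsm_r-specific constraints are skipped. */
        if (relax_mcnr_ncmr) return true;
    
        /* Otherwise also require MC % NR = 0 and NC % MR = 0. */
        return mc % nr == 0 && nc % mr == 0;
    }
    ```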

commit 1a67e3688edb073a9d44c160e7b0798e08796b8a
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Nov 1 13:53:18 2016 -0500

    Bogus commit
    
    Need to trigger another Travis build.

commit 2cd82d67b372cad1bed50cfd99e524f1f40b4e24
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Nov 1 13:25:50 2016 -0500

    Some fixes for .travis.yml
    
    - Switch to gcc-5 to support knl
    - Don't run tests in parallel -- it is super slow.
    - Use clang on OSX since gcc is only a zombie husk.

commit a3db4e6bdfe745083acf704ab0f51f74ea869538
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Nov 1 10:33:18 2016 -0500

    Update .travis.yml with additional tests
    
    - Test knl configuration (without running of course).
    - Test openmp and pthreads threading for auto configuration with 4 threads.
    - Test auto configuration with and without pthreads on OSX.
    - Also, run make in parallel.
    
    I don't know how the `addons:` section works on OSX; hopefully it is just ignored.

commit 8a11a2174a1a5b9426f13bbc5338dc86ab138cdd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 31 19:07:55 2016 -0500

    Updates to non-default haswell microkernels.
    
    Details:
    - Updated s and d microkernels in bli_gemm_asm_d8x6.c to relax alignment
      constraints.
    - Added missing c and z microkernels, which are based on the corresponding
      kernels in the d6x8 set.
    - This completes the d8x6 set (which may be used for situations when it
      is desirable to have a microkernel with a column preference).

commit 618f4331eba209803ecab99747872eceb1b5f091
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 31 14:40:51 2016 -0500

    Align strides of ct in macrokernels to that of c.
    
    Details:
    - Previously, rs_ct and cs_ct, the strides of the temporary microtile used
      primarily in the macrokernels' edge case handling, were unconditionally
      set to 1 and MR, respectively. However, Devin Matthews noted that this
      ought to be changed so that the strides of ct were in agreement with the
      strides of C. (That is, if C was row-stored, then ct should be accessed
      by rows as well.) The implicit assumption is that the strides of C
      have already been adjusted, via induced transposition, if the storage
      preference of the microkernel is at odds with the storage of C. So, if
      the microkernel prefers row storage, the macrokernel's interior cases
      would present row-stored (ideal) microkernel subproblems to the
      microkernel, but for edge cases, it would still see column-stored
      subproblems (not ideal). This commit fixes this issue. Thanks to Devin
      for his suggestion.
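    
    A sketch of the stride selection described above, assuming an MR x NR
    temporary microtile: column storage uses (rs, cs) = (1, MR) as before,
    and row storage is assumed here to use (rs, cs) = (NR, 1). The type
    and function names are hypothetical.
    
    ```c
    /* Hypothetical pair of row and column strides for the microtile ct. */
    typedef struct {
        int rs; /* row stride */
        int cs; /* column stride */
    } strides_sketch_t;
    
    /* Choose strides for an mr x nr temporary tile to match C's storage. */
    strides_sketch_t ct_strides_sketch(int mr, int nr, int c_is_row_stored)
    {
        strides_sketch_t s;
    
        if (c_is_row_stored) {
            s.rs = nr; /* consecutive rows are nr elements apart */
            s.cs = 1;
        } else {
            s.rs = 1;
            s.cs = mr; /* consecutive columns are mr elements apart */
        }
        return s;
    }
    ```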

commit c2c91e09b4893cb81314774557f728a95080f81e
Author: Jeff Hammond <jeff.science@gmail.com>
Date:   Tue Oct 25 21:15:26 2016 -0700

    never use libm with Intel compilers
    
    Intel compilers include a highly optimized math library (libimf) that
    should be used instead of GNU libm.
    
    yes, this change is for ALL targets, including those that are not
    supported by the Intel compiler.  there is no harm in doing this, and it
    is future-proof in the event that the Intel compilers support other
    architectures.

commit 630391002325a589063aec2ab0a7d89ef2e178c0
Merge: 956b3edf 216206c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Oct 25 19:34:51 2016 -0500

    Merge pull request #105 from devinamatthews/knl
    
    Support for Intel Knights Landing.

commit 216206c1d328a865c2192e35a4df6e9aff79a85b
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Oct 25 13:56:18 2016 -0500

    Fix up for merge to master.

commit 11eb7957abbcdf02d5e312898e094260eadb1209
Merge: cd5b6681 956b3edf
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Oct 25 13:51:07 2016 -0500

    Merge branch 'master' into knl
    
    # Conflicts:
    #       frame/thread/bli_thread.h

commit cd5b6681838899283cd94e5427dfda206e7fbabe
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Oct 25 13:49:27 2016 -0500

    Don't use %rbp in KNL packing kernels.

commit 956b3edf8eb09480f31f2e861c1b10f9ecbb2e52
Merge: b7e41d71 0662a3c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Oct 25 13:02:57 2016 -0500

    Merge pull request #104 from devinamatthews/misspellings
    
    Add flexible options for thread model (pthread/posix for pthreads etc.).

commit 0662a3c1b1f4644a86bf8e5073d1391808c91b4a
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Oct 25 12:42:44 2016 -0500

    Add flexible options for thread model (pthread/posix for pthreads etc.).

commit e044fa624008c161de32a39d734cddf1dd22dd41
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Tue Oct 25 13:03:05 2016 +0530

    Changed double precision trsm kernel macro definition to bli_dtrsm_l_int_6x8 from 6x16 : it fixes the seg fault
    
    Change-Id: Ia8c1de5fe13a370d691570a50136d55ffb18908a

commit b3ed4933aa0da72ad771fb0fdf1727e5ba9ad7b4
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Tue Oct 25 13:03:05 2016 +0530

    Changed double precision trsm kernel macro definition to bli_dtrsm_l_int_6x8 from 6x16 : it fixes the seg fault
    
    Change-Id: Ia8c1de5fe13a370d691570a50136d55ffb18908a

commit b7e41d71b07d2af6d22d632c70e0c5f7ce46852c
Merge: 4bd905bd 5117d444
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 24 16:47:46 2016 -0500

    Merge pull request #103 from devinamatthews/patch-1
    
    Change .align to .p2align in Bulldozer ukernels.

commit 5117d444f7f3a2bc327f067926eaf2398212edda
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Oct 24 16:20:47 2016 -0500

    Change .align to .p2align in Bulldozer ukernels
    
    Apparently OSX doesn't allow .align directives for >16B, so I've changed these to their .p2align counterparts.

commit 4bd905bd4597e0ad7bedf31e25e779d3e2dfda29
Merge: 936d5fdc 7f32dd57
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 21 14:48:44 2016 -0500

    Merge pull request #93 from ShadenSmith/config_check
    
    Adds sanity check to configuration choice.

commit 936d5fdc26c6c4dab199a8d11fde948975cfa1d6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 21 14:34:27 2016 -0500

    Fixed multithreading compilation bug in 970745a.
    
    Details:
    - Moved the definition of the cpp macro BLIS_ENABLE_MULTITHREADING
      from bli_thread.h to bli_config_macro_defs.h. Also moved the
      sanity check that OpenMP and POSIX threads are not both enabled.
    - Thanks to Krzysztof Drewniak for reporting this bug.

commit d250e6a3af3af8beedcda28f508ac03e94efb3c8
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Thu Oct 20 14:34:39 2016 +0530

    Merged TRSM and scalv routines into zen folder
    
    Change-Id: Ice897bc83e8fb70b90f23cc3ce892c39883aceb9

commit 8feb0f85a674e84bec2417486e3bcea584b14c04
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 19 16:05:41 2016 -0500

    Removed auto-prototyping of malloc()/free() substitutes.
    
    Details:
    - Removed the header file, bli_malloc_prototypes.h, which automatically
      generated prototypes for the functions specified by the following
      cpp macros:
        BLIS_MALLOC_INTL
        BLIS_FREE_INTL
        BLIS_MALLOC_POOL
        BLIS_FREE_POOL
        BLIS_MALLOC_USER
        BLIS_FREE_USER
      These prototypes were originally provided primarily as a convenience
      to those developers who specified their own malloc()/free() substitutes
      for one or more of the macros above. However, we generated these prototypes
      regardless, even when the default values (malloc and free) of the
      macros above were used. A problem arose under certain circumstances
      (e.g., gcc in C++ mode on Linux with glibc) when including blis.h that
      stemmed from the "throw" specification which was added to the glibc's
      malloc() prototype, resulting in a prototype mismatch. Therefore, going
      forward, developers who specify their own custom malloc()/free()
      substitutes must also prototype those substitutes via bli_kernel.h.
      Thanks to Krzysztof Drewniak for reporting this bug, and Devin Matthews
      for researching the nature and potential solutions.

commit 970745a5fc7c29de3e202988e5eb104fabca4fdc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 19 15:58:03 2016 -0500

    Reorganized typedefs to avoid compiler warnings.
    
    Details:
    - Relocated membrk_t definition from bli_membrk.h to bli_type_defs.h.
    - Moved #include of bli_malloc.h from blis.h to bli_type_defs.h.
    - Removed standalone mtx_t and mutex_t typedefs in bli_type_defs.h.
    - Moved #include of bli_mutex.h from bli_thread.h to bli_type_defs.h.
    - The redundant typedefs of membrk_t and mtx_t caused a warning on some C
      compilers. Thanks to Tyler Smith for reporting this issue.

commit 1c2f7b57d557c05f5ef6148cccafaf0f70d910da
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Tue Oct 18 15:06:35 2016 +0530

    Removed symlinks to zen kernels from haswell kernel folder and also modified the bli_kernel.h file accordingly
    
    Change-Id: Ib3736af48e851c8243bbe10d937fb942c49ad048

commit d864ea9f4f039fe2b2dc395d0015bd9e8902bc8e
Merge: 7045fcbf 28b2af8a
Author: praveeng <praveen.g@amd.com>
Date:   Fri Oct 14 17:00:57 2016 +0530

    Merge master code 2016_10_14 till Added disabled code thrinfo_t structures
    
    Change-Id: If7db98d286c1471fcd30f00757abee9b253ef987

commit 28b2af8a71133ce68774e153b6e05afb05affba8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 13 14:50:08 2016 -0500

    Added disabled code to print thrinfo_t structures.
    
    Details:
    - Added cpp-guarded code to bli_thrcomm_openmp.c that allows a curious
      developer to print the contents of the thrinfo_t structures of each
      thread, for verification purposes or just to study the way thread
      information and communicators are used in BLIS.
    - Enabled some previously-disabled code in bli_l3_thrinfo.c for freeing
      an array of thrinfo_t* values that is used in the new, cpp-guarded code
      mentioned above.
    - Removed some old commented lines from bli_gemm_front.c.

commit 11eed3f683d09e65f721567b346b0f733bff9a64
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 13 14:23:23 2016 -0500

    Fixed a configure -t omp/openmp bug from fd04869.
    
    Details:
    - Forgot to update certain occurrences of "omp" in common.mk during
      commit fd04869, which changed the preferred configure option string
      for enabling OpenMP from "omp" to "openmp".

commit 7045fcbf0bd349ebe6cb9ac4508c6a387bb05966
Merge: 7e044900 9cda6057
Author: praveeng <praveen.g@amd.com>
Date:   Thu Oct 13 12:02:28 2016 +0530

    Merge master code 2016_10_13 Removed previously renamed/old files
    
    Change-Id: I8106d371afaa0af474a8967388d44481b05de923

commit 7e04490002206d3557fcfb7dd893838a7f36916f
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Wed Oct 12 16:43:02 2016 +0530

    Checked in the SAMAX optimizations
    
    Change-Id: I7faf8c3adf52ff01432188ad3b9866ee4b9a9dfd

commit 9cda6057eaa16a24ac8785a9fa167df6c9edba44
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Oct 11 13:21:26 2016 -0500

    Removed previously renamed/old files.
    
    Details:
    - Removed frame/base/bli_mem.c and frame/include/bli_auxinfo_macro_defs.h,
      both of which were renamed/removed in 701b9aa. For some reason, these
      files survived when the compose branch was merged back into master.
      (Clearly, git's merging algorithm is not perfect.)
    - Removed frame/base/bli_mem.c.prev (an artifact of the long-ago changed
      memory allocator that I was keeping around for no particular reason).

commit 22377abd84b9e560ffe1c4e4d284eb443ddb7133
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 10 13:43:56 2016 -0500

    Fixed bli_gemm() segfault on empty C matrices.
    
    Details:
    - Fixed a bug that would manifest in the form of a segmentation fault
      in bli_cntl_free() when calling any level-3 operation on an empty
      output matrix (ie: m = n = 0). Specifically, the code previously
      assumed that the entire control tree was built prior to it being
      freed. However, if the level-3 operation performs an early exit, the
      control tree will be incomplete, and this scenario is now handled.
      Thanks to Elmar Peise for reporting this bug.

commit 0b571cd94d9b175331c9453258a6b1389a718ae8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 6 14:48:15 2016 -0500

    Fixed segfault in bli_free_align() for NULL ptrs.
    
    Details:
    - Fixed a bug in bli_free_align() caused by failing to handle NULL pointers
      up-front, which led to performing pointer arithmetic on NULL pointers in
      order to free the address immediately before the pointer. Thanks to Devin
      Matthews for reporting this bug.
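    
    To illustrate the class of bug, here is a generic sketch of an aligned
    allocator that saves the original malloc() address immediately before
    the aligned pointer. It is not the BLIS implementation; the function
    names are hypothetical and align is assumed to be a power of two.
    Without the NULL check, the free routine would read from the address
    just before NULL.
    
    ```c
    #include <stdint.h>
    #include <stdlib.h>
    
    /* Hypothetical aligned allocator; align must be a power of two. */
    void *malloc_align_sketch(size_t size, size_t align)
    {
        /* Room for the payload, worst-case padding, and the saved pointer. */
        void *raw = malloc(size + align + sizeof(void *));
        uintptr_t start, aligned;
    
        if (raw == NULL) return NULL;
    
        /* Leave space for the saved pointer, then round up to the alignment. */
        start   = (uintptr_t)raw + sizeof(void *);
        aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
    
        /* Save the original address immediately before the aligned block. */
        ((void **)aligned)[-1] = raw;
        return (void *)aligned;
    }
    
    /* Hypothetical matching free routine. */
    void free_align_sketch(void *p)
    {
        /* Handle NULL up-front, before any pointer arithmetic. */
        if (p == NULL) return;
    
        free(((void **)p)[-1]);
    }
    ```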

commit cd84fb95182514601d72c78ee0e36a394d0284d7
Author: praveeng <praveen.g@amd.com>
Date:   Thu Oct 6 15:08:21 2016 +0530

    syntax errors in configure file
    
    Change-Id: Ibe8a6071aad97df550df64c009fec33a9d8f43a1

commit f2e7ea113aa93b74f1d42408d5db2c5a7b00a653
Merge: 133983c3 86969873
Author: praveeng <praveen.g@amd.com>
Date:   Thu Oct 6 12:35:30 2016 +0530

    conflicts merge for bli_kernel.h
    
    Change-Id: I15d846bd34e11f86ebfd7ed091ff671a1f3366a0

commit 133983c36fa01c7acb6d666b3744f77f216314a5
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Thu Oct 6 11:26:22 2016 +0530

    code clean up in bli_kernel.h
    
    Change-Id: I11d9cdf2af8e8199209eb084f6c3a7c910b83d5d

commit 4fb9b4ef2e4cf2626a6e000a41628fb823f16da8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 5 14:41:35 2016 -0500

    CHANGELOG update (0.2.1)

commit 866b2dde3f41760121115fb25f096d4344e8b4f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 5 14:41:34 2016 -0500

    Version file update (0.2.1)

commit 87fddeab3c8a5ccb1bbf02e5f89db1464e459ba9
Merge: 86969873 6f71cd34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 5 13:35:01 2016 -0500

    Merge branch 'compose'

commit 6f71cd344951854e4cff9ea21bbdfe536e72611d
Merge: c0630c40 8d55033c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Oct 4 15:53:46 2016 -0500

    Merge pull request #94 from flame/distcomm
    
    Implemented distributed thrinfo_t management.

commit 86969873b5b861966d717d8f9f370af39e3d9de6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Oct 4 14:24:59 2016 -0500

    Reclassified amaxv operation as a level-1v kernel.
    
    Details:
    - Moved amaxv from being a utility operation to being a level-1v operation.
      This includes the establishment of a new amaxv kernel to live beside all
      of the other level-1v kernels.
    - Added two new functions to bli_part.c:
        bli_acquire_mij()
        bli_acquire_vi()
      The first acquires a scalar object for the (i,j) element of a matrix,
      and the second acquires a scalar object for the ith element of a vector.
    - Added integer support to bli_getsc level-0 operation. This involved
      adding integer support to the bli_*gets level-0 scalar macros.
    - Added a new test module to test amaxv as a level-1v operation. The test
      module works by comparing the value identified by bli_amaxv() to the
      value found from a reference-like code local to the test module
      source file. In other words, it (intentionally) does not guarantee the
      same index is found; only the same value. This allows for different
      implementations in the case where a vector contains two or more elements
      containing exactly the same floating point value (or values, in the case
      of the complex domain).
    - Removed the directory frame/include/old/.
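    
    The value-based comparison described for the amaxv test module can be
    sketched roughly as follows. This is a minimal illustration for the real
    domain only; abs_val(), ref_amax_value(), and amax_index_ok() are
    hypothetical names and are not taken from the BLIS test suite.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value helper, so the sketch needs no math library. */
static double abs_val(double v) { return v < 0.0 ? -v : v; }

/* Reference-like search: the largest absolute value in x[0..n-1]. */
static double ref_amax_value(const double *x, size_t n)
{
    assert(x != NULL && n > 0);
    double best = abs_val(x[0]);
    for (size_t i = 1; i < n; ++i)
        if (abs_val(x[i]) > best)
            best = abs_val(x[i]);
    return best;
}

/* Accept any index whose element attains the reference maximum, so that
   ties may be resolved differently by different implementations. */
static int amax_index_ok(const double *x, size_t n, size_t idx)
{
    return idx < n && abs_val(x[idx]) == ref_amax_value(x, n);
}
```

    With x = { 1.0, -3.0, 3.0, 2.0 }, both index 1 and index 2 pass the
    check, which is the tolerance for ties that the test module aims for.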

commit 8d55033c966feed99fcca2a58017c3ab5b1646dc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 27 15:20:58 2016 -0500

    Implemented distributed thrinfo_t management.
    
    Details:
    - Implemented Ricardo Magana's distributed thread info/communicator
      management. Rather than fully constructing the thrinfo_t structures, from
      root to leaf, prior to spawning threads, the threads individually
      construct their thrinfo_t trees (or, chains), and do so incrementally,
      as needed, reusing the same structure nodes during subsequent blocked
      variant iterations. This required moving the initial creation of the
      thrinfo_t structure (now, the root nodes) from the _front() functions
      to the bli_l3_thread_decorator(). The incremental "growing" of the tree
      is performed in the internal back-end (ie: _int()) function, and so
      mostly invisible. Also, the incremental growth of the thrinfo_t tree is
      done as a function of the current and parent control tree nodes (as well
      as the parent thrinfo_t node), further reinforcing the parallel
      relationship between the two data structures.
    - Removed the "inner" communicator from thrinfo_t structure definition,
      as well as its id. Changed all APIs accordingly. Renamed
      bli_thrinfo_needs_free_comms() to bli_thrinfo_needs_free_comm().
    - Defined bli_l3_thrinfo_print_paths(), which prints the information
      in an array of thrinfo_t* structure pointers. (Used only as a
      debugging/verification tool.)
    - Deprecated the following thrinfo_t creation functions:
        bli_packm_thrinfo_create()
        bli_l3_thrinfo_create()
      because they are no longer used. bli_thrinfo_create() is now called
      directly when creating thrinfo_t nodes.
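    
    The incremental, per-thread growth of the thrinfo_t chain described above
    can be pictured with a small sketch. The node_t type and the node_*()
    helpers are hypothetical stand-ins, not the BLIS thrinfo_t API; the only
    point is that a child node is created on first use and reused afterwards.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-thread info node. */
typedef struct node_s {
    int            depth;
    struct node_s *sub;   /* child node, created lazily */
} node_t;

static node_t *node_create(int depth)
{
    node_t *n = malloc(sizeof *n);
    assert(n != NULL);
    n->depth = depth;
    n->sub   = NULL;
    return n;
}

/* Grow the chain by one level on first use; later calls (for example in
   subsequent loop iterations) return the same child. */
static node_t *node_sub_grow(node_t *parent)
{
    if (parent->sub == NULL)
        parent->sub = node_create(parent->depth + 1);
    return parent->sub;
}

/* Free the whole chain, root to leaf. */
static void node_free(node_t *n)
{
    while (n != NULL) {
        node_t *next = n->sub;
        free(n);
        n = next;
    }
}
```

    Each thread would own its own root and call node_sub_grow() as it
    descends, so no fully built tree is needed before threads are spawned.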

commit fd04869ae4d4a3b0ebb9052557c296456bce7c0d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 27 14:14:11 2016 -0500

    Changed configure's 'omp' threading to 'openmp'.
    
    Details:
    - Changed the configure script so that the expected string argument to the
      -t (or --enable-threading=) option that enables OpenMP multithreading is
      'openmp'. The previous expected string, 'omp', is still supported but
      should be considered deprecated.

commit 9424af87209e4e435e2e742430945152690170b0
Merge: efa7341d c0630c40
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 27 12:51:08 2016 -0500

    Merge branch 'compose'

commit 7f32dd57c6bd41c0704341752842277dd6a4c8eb
Author: Shaden Smith <shaden@cs.umn.edu>
Date:   Sat Sep 17 11:33:57 2016 -0500

    Adds sanity check to configuration choice.

commit efa7341df0b0115926aa8a6e8a4ebfb24fdbf11e
Merge: 121c39d4 e1453f68
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Sep 16 11:01:57 2016 -0500

    Merge pull request #92 from ShadenSmith/readme_fix
    
    Fixes broken URL in README.md

commit e1453f68f6afd90ae9a29b7a5faa46aa79bbf741
Author: Shaden Smith <ShadenTSmith@gmail.com>
Date:   Fri Sep 16 09:29:28 2016 -0500

    Fixes broken URL in README.md

commit b922d7563422e14c49a4677bc6ae088a408861ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 23 13:38:36 2016 -0500

    Avoid compiling BLAS/CBLAS files when disabled.
    
    Details:
    - Updated the top-level Makefile, build/config.mk.in template, and
      configure script so that object files corresponding to source files
      belonging to the BLAS compatibility layer are not compiled (or archived)
      when the compatibility layer is disabled. (Same for CBLAS.) Thanks
      to Devin Matthews for suggesting this optimization.
    - Slight change to the way configure handles internal variables. Instead
      of converting (overwriting) some, such as enable_blas2blis and
      enable_cblas, from a "yes" or "no" to a "1" or "0" value, the latter are
      now stored in new variables that live alongside the originals (with the
      suffix "_01").  This is convenient since some values need to be
      sed-substituted into the config.mk.in template, which requires "yes" or
      "no", while some need to be written to the bli_config.h.in template,
      which requires "0" or "1".
    
    Updated BLIS4 TOMS citation in README.md.
    
    Added complex gemm micro-kernels for haswell.
    
    Details:
    - Defined cgemm (3x8) and zgemm (3x4) micro-kernels for haswell-based
      architectures. As with their real domain brethren, these kernels prefer
      row storage (though this doesn't affect most users due to high-level
      optimizations in most level-3 operations that induce a transpose to
      whatever storage preference the kernel may have).
    
    Change-Id: I512ab90784ecbb7cdaee24928d2ccebb544ba5c1

commit 69826110bab2a064ec76457c24843d28f2581281
Merge: 64598ee4 a58dd35e
Author: Pradeep Rao <Pradeep.Rao@amd.com>
Date:   Wed Sep 14 03:26:25 2016 -0400

    Merge "Implemented trsm single precision for lower triangular matrices. Files added: bli_trsm_l_int_6x16.c. Files modified: bli_kernel.h (to enable the optimized trsm microkernel) and test_trsm.c (to test trsm single precision)" into amd-staging

commit c0630c4024b08750043a2942a3e8a037aa6b6259
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 12 13:59:02 2016 -0500

    Added debugging printf()'s to bli_l3_thrinfo.c.
    
    Details:
    - Added optional printf() statements to print out thread communicator
      info as the thrinfo_t structure is built in bli_l3_thrinfo.c.
    - Minor changes to frame/thread/bli_thrinfo.h.

commit 7b3bf1ffcd7160ccbf6c2518af6d88f6742e4977
Merge: 35509818 121c39d4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 6 15:47:13 2016 -0500

    Merge branch 'master' into compose

commit 121c39d455f2db6f7ce6802ba7f73ad5e088c68c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 5 13:11:42 2016 -0500

    Added complex gemm micro-kernels for haswell.
    
    Details:
    - Defined cgemm (3x8) and zgemm (3x4) micro-kernels for haswell-based
      architectures. As with their real domain brethren, these kernels prefer
      row storage (though this doesn't affect most users due to high-level
      optimizations in most level-3 operations that induce a transpose to
      whatever storage preference the kernel may have).

commit 35509818cbea1598b123421f81c42120889a03c3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 31 17:34:15 2016 -0500

    Added, moved some thread barriers.
    
    Details:
    - Removed thread barriers from the end of the loop bodies of
      bli_gemm_blk_var1(), bli_gemm_blk_var2(), bli_trsm_blk_var1(),
      and bli_trsm_blk_var2().
    - Moved the thread barrier at the end of bli_packm_int() to the
      end of bli_l3_packm(), and added missing barriers to that function.
    - Removed the no longer necessary (and now incorrect) ochief guard
      in bli_gemm3m3_packa() on the bli_obj_scalar_reset() on C.
    - Thanks to Tyler Smith for help with these changes.

commit 64598ee4cfb86f64abbd4bcef5a82ba0d5565b67
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Wed Aug 31 12:54:50 2016 +0530

    fixed the symlink issue
    
    Change-Id: I2186d529f295c576597c189e1ae219bc1a83f955

commit abd61f9fa75d77a96d1491b3e035451ee73238fe
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 30 12:34:19 2016 -0500

    Updated BLIS4 TOMS citation in README.md.

commit 8a2373f26ba8fcd5b2d7b2cc72cb8b2e1f841a03
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Mon Aug 29 14:10:45 2016 +0530

    Norm 2 optimization
    
    Change-Id: Ide9decaccd20bf0ccc32c9abb6556e038dceed2b

commit fdc663902347aa252ea88cf09ce24ab748958dff
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Mon Aug 29 10:43:38 2016 +0530

    Placed 1 and 1f AMD optimized AVX routines under zen folder
    
    Change-Id: I26795211ef11d232ed794ce36dd0a9c1f8706328

commit 701b9aa3ff028decbf90efac0dca5bd64fe26269
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 26 19:04:45 2016 -0500

    Redesigned control tree infrastructure.
    
    Details:
    - Altered control tree node struct definitions so that all nodes have the
      same struct definition, whose primary fields consist of a blocksize id,
      a variant function pointer, a pointer to an optional parameter struct,
      and a pointer to a (single) sub-node. This unified control tree type is
      now named cntl_t.
    - Changed the way control tree nodes are connected, and what computation
      they represent, such that, for example, packing operations are now
      associated with nodes that are "inline" in the tree, rather than
      off-shoot branches. The original tree for the classic Goto gemm
      algorithm was expressed (roughly) as:
    
        blk_var2 -> blk_var3 -> blk_var1 -> ker_var2
                             |           |
                             -> packb    -> packa
    
      and now, the same tree would look like:
    
        blk_var2 -> blk_var3 -> packb -> blk_var1 -> packa -> ker_var2
    
      Specifically, the packb and packa nodes perform their respective packing
      operations and then recurse (without any loop) to a subproblem. This means
      there are now two kinds of level-3 control tree nodes: partitioning and
      non-partitioning. The blocked variants are members of the former, because
      they iteratively partition off submatrices and perform suboperations on
      those partitions, while the packing variants belong to the latter group.
      (This change has the effect of allowing greatly simplified initialization
      of the nodes, which previously involved setting many unused node fields to
      NULL.)
    - Changed the way thrinfo_t tree nodes are arranged to mirror the new
      connective structure of control trees. That is, packm nodes are no longer
      off-shoot branches of the main algorithmic nodes, but rather connected
      "inline".
    - Simplified control tree creation functions. Partitioning nodes are created
      concisely with just a few fields needing initialization. By contrast, the
      packing nodes require additional parameters, which are stored in a
      packm-specific struct that is tracked via the optional parameters pointer
      within the control tree struct. (This parameter struct must always begin
      with a uint64_t that contains the byte size of the struct. This allows
      us to use a generic function to recursively copy control trees.) gemm,
      herk, and trmm control tree creation continues to be consolidated into
      a single function, with the operation family being used to select
      among the parameter-agnostic macro-kernel wrappers. A single routine,
      bli_cntl_free(), is provided to free control trees recursively, whereby
      the chief thread within a group releases the blocks associated with
      mem_t entries back to the memory broker from which they were acquired.
    - Updated internal back-ends, e.g. bli_gemm_int(), to query and call the
      function pointer stored in the current control tree node (rather than
      index into a local function pointer array). Before being invoked, these
      function pointers are first cast to a gemm_voft (for gemm, herk, or trmm
      families) or trsm_voft (for trsm family) type, which is defined in
      frame/3/bli_l3_var_oft.h.
    - Retired herk and trmm internal back-ends, since all execution now flows
      through gemm or trsm blocked variants.
    - Merged forwards- and backwards-moving variants by querying the direction
      from routines as a function of the variant's matrix operands. gemm and
      herk always move forward, while trmm and trsm move in a direction that
      is dependent on which operand (a or b) is triangular.
    - Added functions bli_thread_get_range_mdim(), bli_thread_get_range_ndim(),
      each of which takes additional arguments and hides complexity in managing
      the difference between the way ranges are computed for the four families
      of operations.
    - Simplified level-3 blocked variants according to the above changes, so that
      the only steps taken are:
      1. Query partitioning direction (forwards or backwards).
      2. Prune unreferenced regions, if they exist.
      3. Determine the thread partitioning sub-ranges.
      <begin loop>
        4. Determine the partitioning blocksize (passing in the partitioning
           direction)
        5. Acquire the current iteration's partitions for the matrices affected
           by the current variant's partitioning dimension (m, k, n).
        6. Call the subproblem.
      <end loop>
    - Instantiate control trees once per thread, per operation invocation.
      (This is a change from the previous regime in which control trees were
      treated as stateless objects, initialized with the library, and shared
      as read-only objects between threads.) This once-per-thread allocation
      is done primarily to allow threads to use the control tree as a place
      to cache certain data for use in subsequent loop iterations. Presently,
      the only application of this caching is a mem_t entry for the packing
      blocks checked out from the memory broker (allocator). If a non-NULL
      control tree is passed in by the (expert) user, then the tree is copied
      by each thread. This is done in bli_l3_thread_decorator(), in
      bli_thrcomm_*.c.
    - Added a new field to the context, an opid_t, which tracks the "family"
      of the operation being executed. For example, gemm, hemm, and symm are
      all part of the gemm family, while herk, syrk, her2k, and syr2k are
      all part of the herk family. Knowing the operation's family is necessary
      when conditionally executing the internal (beta) scalar reset on
      C in blocked variant 3, which is needed for gemm and herk families,
      but must not be performed for the trmm family (because beta has only
      been applied to the current row-panel of C after the first rank-kc
      iteration).
    - Reexpressed 3m3 induced method blocked variant in frame/3/gemm/ind
      to conform with the new control tree design, and renamed the
      macro-kernel codes corresponding to 3m2 and 4m1b.
    - Renamed bli_mem.c (and its APIs) to bli_memsys.c, and renamed/relocated
      bli_mem_macro_defs.h from frame/include to frame/base/bli_mem.h.
    - Renamed/relocated bli_auxinfo_macro_defs.h from frame/include to
      frame/base/bli_auxinfo.h.
    - Fixed a minor bug whereby the storage-to-ukr-preference matching
      optimization in the various level-3 front-ends was not being applied
      properly when the context indicated that execution would be via an
      induced method. (Before, we always checked the native micro-kernel
      corresponding to the datatype being executed, whereas now we check
      the native micro-kernel corresponding to the datatype's real projection,
      since that is the micro-kernel that is actually used by induced methods.)
    - Added an option to the testsuite to skip the testing of native level-3
      complex implementations. Previously, it was always tested, provided that
      the c/z datatypes were enabled. However, some configurations use
      reference micro-kernels for complex datatypes, and testing these
      implementations can slow down the testsuite considerably.
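    
    The unified control tree node, and the convention that every parameter
    struct begins with its own byte size, can be sketched as follows. All
    names here (cntl_sketch_t, packm_params_sketch_t, the cntl_sketch_*()
    helpers, and the schema field) are hypothetical and only loosely modeled
    on the fields listed above; this is not the BLIS cntl_t definition.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct cntl_sketch_s cntl_sketch_t;
typedef void (*variant_fp)(cntl_sketch_t *node);  /* placeholder signature */

/* One struct for every node: blocksize id, variant function pointer,
   optional parameter struct, and a single sub-node. */
struct cntl_sketch_s {
    int            bszid;
    variant_fp     var_func;
    void          *params;    /* NULL for partitioning nodes */
    cntl_sketch_t *sub_node;  /* NULL at the leaf */
};

/* Parameter structs start with their byte size so that a generic routine
   can copy them without knowing the concrete type. */
typedef struct {
    uint64_t size;
    int      schema;          /* illustrative packing-specific field */
} packm_params_sketch_t;

static cntl_sketch_t *cntl_sketch_create(int bszid, variant_fp fp,
                                         void *params, cntl_sketch_t *sub)
{
    cntl_sketch_t *c = malloc(sizeof *c);
    assert(c != NULL);
    c->bszid    = bszid;
    c->var_func = fp;
    c->params   = params;
    c->sub_node = sub;
    return c;
}

/* Recursively copy a chain of nodes, duplicating any parameter struct by
   reading the size stored in its first eight bytes. */
static cntl_sketch_t *cntl_sketch_copy(const cntl_sketch_t *src)
{
    if (src == NULL) return NULL;
    cntl_sketch_t *dst = cntl_sketch_create(src->bszid, src->var_func,
                                            NULL, NULL);
    if (src->params != NULL) {
        uint64_t size;
        memcpy(&size, src->params, sizeof size);
        dst->params = malloc((size_t)size);
        assert(dst->params != NULL);
        memcpy(dst->params, src->params, (size_t)size);
    }
    dst->sub_node = cntl_sketch_copy(src->sub_node);
    return dst;
}

/* Free a chain of nodes together with their heap-allocated parameters. */
static void cntl_sketch_free(cntl_sketch_t *c)
{
    while (c != NULL) {
        cntl_sketch_t *next = c->sub_node;
        free(c->params);
        free(c);
        c = next;
    }
}
```

    Because each node has exactly one sub-node, a per-thread copy of the
    tree is a simple walk down the chain, which fits the once-per-thread
    instantiation described in the commit message.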

commit a58dd35ed7b5b77a6b272655d2edd7a822b8fa87
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date:   Fri Aug 26 14:55:12 2016 +0530

    Implemented trsm single precision for lower triangular matrices. Files added: bli_trsm_l_int_6x16.c. Files modified: bli_kernel.h (to enable the optimized trsm microkernel) and test_trsm.c (to test trsm single precision)
    
    Change-Id: Ibddf989f4aad577e89558673e1038cf6ece654d9

commit 73517f522b69de429dd7f3df60a70c068149ab28
Merge: c6f5c215 50293da3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 23 13:46:59 2016 -0500

    Merge branch 'master' into compose

commit 50293da38d5f2b7be9bbc94b9e85aacb6a10f672
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 23 13:38:36 2016 -0500

    Avoid compiling BLAS/CBLAS files when disabled.
    
    Details:
    - Updated the top-level Makefile, build/config.mk.in template, and
      configure script so that object files corresponding to source files
      belonging to the BLAS compatibility layer are not compiled (or archived)
      when the compatibility layer is disabled. (Same for CBLAS.) Thanks
      to Devin Matthews for suggesting this optimization.
    - Slight change to the way configure handles internal variables. Instead
      of converting (overwriting) some, such as enable_blas2blis and
      enable_cblas, from a "yes" or "no" to a "1" or "0" value, the latter are
      now stored in new variables that live alongside the originals (with the
      suffix "_01").  This is convenient since some values need to be
      sed-substituted into the config.mk.in template, which requires "yes" or
      "no", while some need to be written to the bli_config.h.in template,
      which requires "0" or "1".

commit 22dd6a353ddb56614309c01533b1a94c9fd32bca
Merge: cdfb3c3f f20ed388
Author: praveeng <praveen.g@amd.com>
Date:   Tue Aug 23 15:15:35 2016 +0530

    Merge master code as on 2016_08_23 to amd-staging branch by praveeng
    
     Changes to be committed:
            modified:   frame/thread/bli_mutex_openmp.h
            modified:   frame/thread/bli_mutex_pthreads.h
    
    Change-Id: Ica522edbb1d0173f53f38d5057b1f7aef73666be

commit c6f5c215ee793d03ea834469fc2adc53feaffc42
Merge: d52cb767 16a4c7a8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Aug 22 17:33:02 2016 -0500

    Merge branch 'master' into compose

commit f20ed3885d628992fab88690f629a5a2bab3eb88
Merge: 02ac597e 4bc842ca
Author: praveeng <praveen.g@amd.com>
Date:   Mon Aug 22 15:27:33 2016 +0530

    Merge branch 'master' of https://github.com/clMathLibraries/blis-amd for "Fixed bugs in bli_mutex_init() and friends."

commit 02ac597e4b9be2670d9fff65d28552f8e1ec81b3
Author: praveeng <praveen.g@amd.com>
Date:   Thu Jul 28 15:11:08 2016 +0530

    Revert commits 357c990bdd7bd5667aac5adf1bab3712973e7414
    
    Change-Id: I12a34456d7eed93fda4369e76bcddb42ba7ccb99

commit 84e41cc73c9c87ce64582acd4264b8e1b5316482
Author: praveeng <praveen.g@amd.com>
Date:   Thu Jul 28 15:01:36 2016 +0530

    Revert commits 8aee306
    
    Change-Id: I3dd999c77c6779332a40dbb84371ca487216f189

commit 30ccfcee82db93d0109d1571242e2db925e95d0a
Author: praveeng <praveen.g@amd.com>
Date:   Mon Jul 25 14:14:00 2016 +0530

    removed changes from readme file which are giving conflicts
    
    Change-Id: Ic71ad1313e1404fed444e899466043704d875af6

commit aeca25cd63fc8971f8fe7809599c57853f976548
Author: praveeng <praveen.g@amd.com>
Date:   Tue Jul 5 16:51:23 2016 +0530

    first commit
    
    Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2

commit 6b2274864b36fd1019d97bcc4ca6dd7a57ef16d9
Author: praveeng <praveen.g@amd.com>
Date:   Tue Jul 5 15:00:31 2016 +0530

    small modification to readme for  git push test
    
    Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a

commit daa7a9ecb25982f2551adbd95e65f8ba97cfe944
Author: praveeng <praveen.g@amd.com>
Date:   Tue Jul 5 16:51:23 2016 +0530

    first commit
    
    Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2

commit 5f66a4aa05aeffcb6eb587851d78d9527319466c
Author: praveeng <praveen.g@amd.com>
Date:   Tue Jul 5 15:00:31 2016 +0530

    small modification to readme for  git push test
    
    Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a

commit c6cbd78d2388c08824822b91a1c36ac4349bb67f
Author: praveeng <praveen.g@amd.com>
Date:   Thu Jul 28 15:11:08 2016 +0530

    Revert commits 357c990bdd7bd5667aac5adf1bab3712973e7414
    
    Change-Id: I12a34456d7eed93fda4369e76bcddb42ba7ccb99

commit 9219a9060762525f87ebbf556d78fe8621858513
Author: praveeng <praveen.g@amd.com>
Date:   Thu Jul 28 15:01:36 2016 +0530

    Revert commits 8aee306
    
    Change-Id: I3dd999c77c6779332a40dbb84371ca487216f189

commit 728573296efa7cf14d2381570e116509dfe2a240
Author: praveeng <praveen.g@amd.com>
Date:   Mon Jul 25 14:14:00 2016 +0530

    removed changes from readme file which are giving conflicts
    
    Change-Id: Ic71ad1313e1404fed444e899466043704d875af6

commit ad7862e291c240505c733a41d231b1a126ade73c
Author: praveeng <praveen.g@amd.com>
Date:   Tue Jul 5 16:51:23 2016 +0530

    first commit
    
    Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2

commit ad4b471a25ce77867295e5529dfc787e7c18b03f
Author: praveeng <praveen.g@amd.com>
Date:   Tue Jul 5 15:00:31 2016 +0530

    small modification to readme for  git push test
    
    Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a

commit 55d641363fcd8bdfdabbd7c22822fa2d0b7f3fa6
Author: praveeng <praveen.g@amd.com>
Date:   Tue Jul 5 16:51:23 2016 +0530

    first commit
    
    Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2

commit f3b6b15f6d591d323802bd6c81c522a02056506d
Author: praveeng <praveen.g@amd.com>
Date:   Tue Jul 5 15:00:31 2016 +0530

    small modification to readme for  git push test
    
    Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a

commit 16a4c7a823d60707ed9272f5d36e5c5d54c0ba4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 19 11:38:36 2016 -0500

    Fixed bugs in bli_mutex_init() and friends.
    
    Details:
    - Fixed a couple of bugs that affected OpenMP and POSIX threads
      configurations that resulted in compiler errors and warnings due
      to type mismatch, and in the case of pthreads, a missing function
      argument. The bugs are fairly recent, introduced in a017062.

commit c8e4ef93953ba2b79fb7e0973c08469c0e28a2cd
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Aug 3 16:13:03 2016 -0500

    Add prefetchw to 30x8 kernel.

commit 4b5a2f3d6e7ffeb5cc2be8448554f5c2083ad68f
Merge: 380736bf 9f52a587
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Aug 3 16:09:51 2016 -0500

    Merge remote-tracking branch 'origin/knl' into knl
    
    # Conflicts:
    #       kernels/x86_64/knl/3/bli_dgemm_opt_24x8.c

commit 380736bfe955efbdd7274c90b6fd635688e83bc4
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Aug 3 16:08:28 2016 -0500

    Add (new) 30x8 KNL kernel and fix non-scatter prefetch bug.

commit 9f52a587dee855daa73c194e41b6951416544e9a
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Aug 3 16:03:53 2016 -0500

    Try prefetchw[t1] instead of regular prefetch for C.

commit 8945a1512d366bc6a8a85718d12cbf5de6f2898b
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Aug 3 11:28:24 2016 -0500

    This version gets ~1550 GFLOPs on KNL with 16x4.

commit cdfb3c3f29d321033fca106aa58ab67ead90a95d
Merge: 50a2f2ef 4bc842ca
Author: praveeng <praveen.g@amd.com>
Date:   Fri Jul 29 12:45:04 2016 +0530

    Merge master code as on 2016_07_29 to amd-staging branch by praveeng
    
    Change-Id: Ic78b84d8b8d10158fb2a612f9a64bbc7b1f9b486

commit 4bc842ca3a64e658c0808bfe4c5693a5ace97923
Merge: 117f8838 b0d510bf
Author: praveeng <praveen.g@amd.com>
Date:   Thu Jul 28 17:32:12 2016 +0530

    Merge branch 'master' of  publicrepo

commit 117f8838511a478aa16137e770d27dd21f4227c5
Author: praveeng <praveen.g@amd.com>
Date:   Thu Jul 28 15:11:08 2016 +0530

    Revert commits 357c990bdd7bd5667aac5adf1bab3712973e7414
    
    Change-Id: I12a34456d7eed93fda4369e76bcddb42ba7ccb99

commit 2fcdc28f1055d385b2e662aa920fb97c472394d7
Author: praveeng <praveen.g@amd.com>
Date:   Thu Jul 28 15:01:36 2016 +0530

    Revert commits 8aee306
    
    Change-Id: I3dd999c77c6779332a40dbb84371ca487216f189

commit 1b5d104afe0628b8b6c0650f1e58cfb08be67004
Author: praveeng <praveen.g@amd.com>
Date:   Mon Jul 25 14:14:00 2016 +0530

    removed changes from readme file which are giving conflicts
    
    Change-Id: Ic71ad1313e1404fed444e899466043704d875af6

commit d81273047bff56501e9413a90991d3d1f8b56a06
Author: praveeng <praveen.g@amd.com>
Date:   Tue Jul 5 16:51:23 2016 +0530

    first commit
    
    Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2

commit 65905c3011a11cda95761681d4ae84337e46bdb5
Author: praveeng <praveen.g@amd.com>
Date:   Tue Jul 5 15:00:31 2016 +0530

    small modification to readme for  git push test
    
    Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a

commit 23cca231be10fe1797aed451bcbc69d38c78bc0c
Author: praveeng <praveen.g@amd.com>
Date:   Tue Jul 5 16:51:23 2016 +0530

    first commit
    
    Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2

commit 922e3091702f25e3287b417719a33adbd5bbf138
Author: praveeng <praveen.g@amd.com>
Date:   Tue Jul 5 15:00:31 2016 +0530

    small modification to readme for  git push test
    
    Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a

commit b0d510bf0e4dfd177f9e4ae0069f41921e2ecdc1
Author: praveeng <praveen.g@amd.com>
Date:   Thu Jul 28 15:11:08 2016 +0530

    Revert commits 357c990bdd7bd5667aac5adf1bab3712973e7414
    
    Change-Id: I12a34456d7eed93fda4369e76bcddb42ba7ccb99

commit 5ebeece5b4a8df81d59ca7558b278a4263d15128
Author: praveeng <praveen.g@amd.com>
Date:   Thu Jul 28 15:01:36 2016 +0530

    Revert commits 8aee306
    
    Change-Id: I3dd999c77c6779332a40dbb84371ca487216f189

commit 6ce4c022ebdea00c2b951090e3c2e9e88735b9ce
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Jul 27 16:26:36 2016 -0500

    Switch back to 24x8. I could only squeeze 24.5GFLOP out of 8x24, and scalability is not improved.

commit d52cb7671509592a8078729477b40b60380518a2
Merge: 95abea46 c31b1e7b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 27 16:04:55 2016 -0500

    Merge branch 'master' into compose

commit c31b1e7b9d659b96433a87e5aecb90e457a104cc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 27 15:58:07 2016 -0500

    Relax alignment restrictions for sandybridge ukrs.
    
    Details:
    - Relaxed the base pointer and leading dimension alignment restrictions
      in the sandybridge gemm microkernels, allowing the use of vmovups/vmovupd
      instead of vmovaps/vmovapd. These changes mimic those made to the haswell
      microkernels in e0d2fa0 and ee2c139.
    - Updated testsuite modules as well as standalone test drivers in 'test'
      directory to use DBL_MAX as the initial time candidate. Thanks to Devin
      Matthews for suggesting this change.
    - Inserted #include "float.h" into bli_system.h (to gain access to DBL_MAX).
    - Minor update (vis-a-vis contexts) to driver code in test/3m4m.
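    
    Using DBL_MAX as the initial time candidate means that the first measured
    time always replaces the candidate, whatever its magnitude. A minimal
    sketch of that best-of-N pattern follows; best_time() is a hypothetical
    helper, not a function from the BLIS test drivers.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Return the smallest of n timing samples (in seconds). With no samples,
   the initial candidate DBL_MAX is returned unchanged. */
static double best_time(const double *samples, size_t n)
{
    assert(samples != NULL || n == 0);
    double best = DBL_MAX;
    for (size_t i = 0; i < n; ++i)
        if (samples[i] < best)
            best = samples[i];
    return best;
}
```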

commit b8f2b55532849d45d379afbdd05a52ff6100800d
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Jul 27 15:22:55 2016 -0500

    Try an 8x24 kernel for the hell of it.

commit 7ede5863ae3567f7c0852efc2d5cd649ca19e0f3
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Jul 27 13:41:27 2016 -0600

    Allocate pack buffer on MCDRAM for KNL.

commit ad89ed2e829c7b261d8ba0998a3cb83ad576ee04
Merge: 2c9de740 81e2b05f
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Jul 27 11:45:40 2016 -0500

    Merge branch 'knl' of github.com:devinamatthews/blis into knl

commit 2c9de740edb66c4692c200731763bbd1d3171ccb
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Jul 27 11:44:54 2016 -0500

     This version gets ~26GF on one core.

commit 81e2b05f31bca4e1e1676e7b533d1868d9f9be33
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Jul 27 11:39:05 2016 -0500

    Add optimized packing kernels for KNL.

commit a7d8ca97b8d835c32d90ff20a565c82733f014a8
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Jul 25 15:15:13 2016 -0500

    All fixed.

commit 963d0393b023f4134bb0c682923faf9964c0e645
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Jul 25 14:40:53 2016 -0500

    Add 24xk pack kernel.

commit 117b76739afba481768897d2580f8365d3345417
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Jul 25 13:53:07 2016 -0500

    In the midst of debugging.

commit 8c0a4fd1d3535d608a9a309a61ffee0a73c3646f
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Jul 25 13:09:24 2016 -0500

    Fix some row/column confusion.

commit c44f9f96930312125b15e64c326ab5ab5cc02633
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Jul 25 12:02:24 2016 -0500

    Simplify displacements -- clang assembler was badly botching EVEX compressed displacements giving false alarms for instruction length.

commit e0cce177cc1b47ec9f11ac0556241feaa3564df1
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Jul 25 10:02:25 2016 -0500

    Minor fixes for 8x24 KNL kernel.

commit 50a2f2efcbeb46537f1deaa8e44dc579a4e49eb8
Merge: 1aa77dfc cfd46c88
Author: praveeng <praveen.g@amd.com>
Date:   Mon Jul 25 17:01:20 2016 +0530

    Merge master code as on 2016_07_25 to amd-staging branch by praveeng
    
    Change-Id: I84886ae241db2aac0bef6b7ef399f04aa8bca16d

commit cfd46c88d59c8f61d5e7cf768d606e4c44623584
Merge: f493bf4d a017062f
Author: praveeng <praveen.g@amd.com>
Date:   Mon Jul 25 15:38:13 2016 +0530

    Merge remote-tracking branch 'publicrepo/master'

commit f493bf4d704fe0e967783cd6e6877d3302c056a1
Author: praveeng <praveen.g@amd.com>
Date:   Mon Jul 25 14:14:00 2016 +0530

    removed changes from readme file which are giving conflicts
    
    Change-Id: Ic71ad1313e1404fed444e899466043704d875af6

commit 65735bbedf75784c48bd11e05b3fdc98fc66b4bc
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Sun Jul 24 21:50:32 2016 -0500

    Switch to 24x8 kernel, unrolled by 16.

commit 45d5dc97177117220bd9dd0abf85aafc185acad1
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Sun Jul 24 14:25:26 2016 -0500

    Add 24x8 "KNC-style" kernel for KNL.

commit 95abea46f86816fddfc9ff0abfa52880801461be
Merge: d0dfe5b5 a017062f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 23 15:38:33 2016 -0500

    Merge branch 'master' into compose

commit a017062fdf763037da9d971a028bb07d47aa1c8a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 22 17:02:59 2016 -0500

    Integrated "memory broker" (membrk_t) abstraction.
    
    Details:
    - Integrated a patch originally authored and submitted by Ricardo Magana
      of HP Enterprise. The changeset inserts use of a new object type, membrk_t,
      (memory broker) that allows multiple sets of memory pools on, for example,
      separate NUMA nodes, each of which has a separate memory space.
    - Added membrk field to cntx_t and defined corresponding accessor macros.
    - Added membrk field to mem_t object and defined corresponding accessor macros.
    - Created new bli_membrk.c file, which contains the new memory broker API,
      including:
        bli_membrk_init(), bli_membrk_finalize()
        bli_membrk_acquire_[mv](), bli_membrk_release(),
        bli_membrk_init_pools(), bli_membrk_reinit_pools(),
        bli_membrk_finalize_pools(),
        bli_membrk_pool_size()
    - In bli_mem.c, changed function calls to
        bli_mem_init_pools()     -> bli_membrk_init()
        bli_mem_reinit_pools()   -> bli_membrk_reinit()
        bli_mem_finalize_pools() -> bli_membrk_finalize()
    - In bli_packv_init.c, bli_packm_init.c, changed function calls to:
        bli_mem_acquire_[mv]() -> bli_membrk_acquire_[mv]()
        bli_mem_release()      -> bli_membrk_release()
    - Added bli_mutex.c and related files to frame/thread. These files define
      abstract mutexes (locks) and corresponding APIs for pthreads, openmp, or
      single-threaded execution. This new API is employed within functions
      such as bli_membrk_acquire_[mv]() and bli_membrk_release().
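
    The mutex abstraction described above can be sketched roughly as follows.
    This is a minimal illustration, not the code in bli_mutex.c: every sketch_*
    name, the SKETCH_USE_PTHREADS / SKETCH_USE_OPENMP macros, and the toy pool
    that only counts free blocks are assumptions made for the example. Only
    the pthreads and OpenMP lock calls themselves are real APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Select a lock implementation at compile time (hypothetical macros). */
#if defined(SKETCH_USE_PTHREADS)
#include <pthread.h>
typedef pthread_mutex_t sketch_mutex_t;
static void sketch_mutex_init(sketch_mutex_t* m)     { pthread_mutex_init(m, NULL); }
static void sketch_mutex_finalize(sketch_mutex_t* m) { pthread_mutex_destroy(m); }
static void sketch_mutex_lock(sketch_mutex_t* m)     { pthread_mutex_lock(m); }
static void sketch_mutex_unlock(sketch_mutex_t* m)   { pthread_mutex_unlock(m); }
#elif defined(SKETCH_USE_OPENMP)
#include <omp.h>
typedef omp_lock_t sketch_mutex_t;
static void sketch_mutex_init(sketch_mutex_t* m)     { omp_init_lock(m); }
static void sketch_mutex_finalize(sketch_mutex_t* m) { omp_destroy_lock(m); }
static void sketch_mutex_lock(sketch_mutex_t* m)     { omp_set_lock(m); }
static void sketch_mutex_unlock(sketch_mutex_t* m)   { omp_unset_lock(m); }
#else
/* Single-threaded execution: locking is a no-op. */
typedef int sketch_mutex_t;
static void sketch_mutex_init(sketch_mutex_t* m)     { *m = 0; }
static void sketch_mutex_finalize(sketch_mutex_t* m) { (void)m; }
static void sketch_mutex_lock(sketch_mutex_t* m)     { (void)m; }
static void sketch_mutex_unlock(sketch_mutex_t* m)   { (void)m; }
#endif

/* Toy stand-in for a memory broker: it only tracks how many blocks are free. */
typedef struct
{
    sketch_mutex_t mutex;
    int            blocks_free;
} sketch_pool_t;

void sketch_pool_init(sketch_pool_t* p, int n_blocks)
{
    assert(p != NULL && n_blocks >= 0);
    sketch_mutex_init(&p->mutex);
    p->blocks_free = n_blocks;
}

void sketch_pool_finalize(sketch_pool_t* p)
{
    sketch_mutex_finalize(&p->mutex);
}

/* Returns 1 if a block was checked out, 0 if the pool is exhausted. */
int sketch_pool_acquire(sketch_pool_t* p)
{
    int ok;
    sketch_mutex_lock(&p->mutex);
    ok = (p->blocks_free > 0);
    if (ok) p->blocks_free--;
    sketch_mutex_unlock(&p->mutex);
    return ok;
}

void sketch_pool_release(sketch_pool_t* p)
{
    sketch_mutex_lock(&p->mutex);
    p->blocks_free++;
    sketch_mutex_unlock(&p->mutex);
}
```

    With neither macro defined, the sketch builds in its single-threaded form;
    the acquire and release functions are identical in all three cases, which
    is the point of hiding the lock type behind one small API.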

commit 8ff2e069c48c12fd06b9c48c6b3aeb4ea9b0e6e1
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jul 22 16:22:26 2016 -0500

    Add 4x unrolled variant for KNL microkernel.

commit 9cb2ed9b0c25f31a22c1c9719b062fa665ad7adf
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jul 22 16:10:30 2016 -0500

    Get rid of one RBX update.

commit 451bde076f0320d60cd2475cfb048ac4a2b798bb
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jul 22 15:43:00 2016 -0500

    Add some more knobs to twiddle for KNL microkernel.

commit 8c6e621c099521e7a4d87e007bb8224faa5f33a3
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jul 22 15:05:15 2016 -0500

    Make knl conform to new kernel dir structure.

commit ce7214c6618d6f22f4ce2ee452336236916d1f30
Merge: 119d0399 ce59f811
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jul 22 14:59:53 2016 -0500

    Merge remote-tracking branch 'origin/master' into knl

commit ce59f81108ec9aea918a7e77030da8acfdd397ce
Merge: ff41153f 707a2b7f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 22 14:48:14 2016 -0500

    Merge pull request #88 from devinamatthews/32bit-dim_t
    
    Handle 32-bit dim_t in 64-bit microkernels.

commit 707a2b7faca137cca7cab7b11a12c44ddaf7ad53
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jul 22 13:49:44 2016 -0500

    Somehow forgot the most important microkernel.

commit 47ec045056351ac4f0791c071fa0daaa81699c8c
Merge: 08f1d6b6 ff41153f
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jul 22 13:45:23 2016 -0500

    Merge remote-tracking branch 'upstream/master' into 32bit-dim_t

commit 08f1d6b6fa344275de0f675f69737145ccf6646a
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jul 22 13:44:37 2016 -0500

    Use 64-bit intermediate variable for k for architectures that do 64-bit loads in case dim_t is 32-bit.

commit ff41153f4eb7f38ed94bdd9a3fd81fb979f3f401
Merge: f9214ced e0d2fa0d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 22 13:21:03 2016 -0500

    Merge pull request #86 from devinamatthews/haswell-vmovups
    
    Remove alignment restrictions on C in haswell kernel.

commit e0d2fa0d835ab49366aeb790363bb2b571d36ed8
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jul 22 12:56:51 2016 -0500

    Relax alignment restrictions for haswell sgemm.

commit f9214ced97392861f5a0ea72abfcf6f41faf674c
Merge: 413d62ac 08666eaa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 22 12:16:39 2016 -0500

    Merge pull request #85 from devinamatthews/qopenmp
    
    Change -openmp to -fopenmp for icc.

commit ee2c139df6ad53c6aec8a67ab23b3b1912e8d259
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jul 22 12:06:03 2016 -0500

    Remove alignment restrictions on C in haswell kernel.

commit 08666eaa20d8a31f2f92f944e5bfa7c1558c53e4
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jul 22 11:07:34 2016 -0500

    Change -openmp to -fopenmp for icc.

commit 119d0399428905053265f3aca1cc8cc1fde3b363
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Jul 22 10:23:31 2016 -0500

    Add 8x24 KNL kernel.

commit 1aa77dfc1dc183d16e0b6a1196d9c263f021e83d
Merge: 9101a9c8 ec9f5983
Author: praveeng <praveen.g@amd.com>
Date:   Thu Jul 21 14:22:40 2016 +0530

    Merge master code as on 2016_07_21 to amd-staging branch by praveeng
    
    Change-Id: Ic7d0a21101358f08147736e7f1884e7409937344

commit b58cda9eba0c1e175460aae109baf792d29ba5bf
Merge: 318f063d 413d62ac
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Jul 19 14:09:09 2016 -0500

    Merge remote-tracking branch 'origin/master' into knl
    
    # Conflicts:
    #       frame/base/bli_threading.h
    #       frame/include/blis.h
    #       frame/thread/bli_thread.c

commit ec9f59836b32260c29ff1cd24e629c7d8de14992
Merge: 197e182f 763babe4
Author: praveeng <praveen.g@amd.com>
Date:   Mon Jul 18 12:56:25 2016 +0530

    Merge branch 'master' of https://github.com/clMathLibraries/blis-amd

commit 197e182fcbf1340fd4a202fac58bea6cfcfa9e2f
Author: praveeng <praveen.g@amd.com>
Date:   Tue Jul 5 16:51:23 2016 +0530

    first commit
    
    Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2

commit 41fb32711031e7ec86b062aa7f53255d1f5905e2
Author: praveeng <praveen.g@amd.com>
Date:   Tue Jul 5 15:00:31 2016 +0530

    small modification to readme for  git push test
    
    Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a

commit d0dfe5b5372cc7558ee9c4104b29f82eecc7ed61
Merge: 31def12e 413d62ac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 14 11:01:06 2016 -0500

    Merge branch 'master' into compose

commit 9101a9c880e3934f8a63ffc7fe15f5fc1077a73d
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Wed Jul 13 16:51:14 2016 +0530

    Checked in optimized 1V kernels along with benchmark codes. Also incorporated review comments for 1F kernels
    
    Change-Id: I035c0d39e6b0bed28e6e2041242186c49f6ed55b

commit 763babe488880b42c86c7fc207aa7665bd0ff9f7
Merge: 357c990b 413d62ac
Author: praveeng <praveen.g@amd.com>
Date:   Wed Jul 13 11:57:19 2016 +0530

    Merge remote-tracking branch 'publirepo/master'

commit 413d62aca28edabba56605a9f87d5b715831e1db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 12 15:02:52 2016 -0500

    README update (use official ACM TOMS links).

commit dfa431f696db2df4065ea454df268a2e0bc02eac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 12 14:21:19 2016 -0500

    README update (BLIS2 TOMS article now in-print).

commit 357c990bdd7bd5667aac5adf1bab3712973e7414
Author: praveeng <praveen.g@amd.com>
Date:   Tue Jul 5 16:51:23 2016 +0530

    first commit
    
    Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2

commit 8aee306300adb099b66036f2c2f7f3996433cf49
Author: praveeng <praveen.g@amd.com>
Date:   Tue Jul 5 15:00:31 2016 +0530

    small modification to readme for  git push test
    
    Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a

commit 31def12e2629f187e40f93f6bae9e26a6c2660e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 30 15:19:20 2016 -0500

    First phase of control tree redesign.
    
    Details:
    - These changes constitute the first set of changes in preparation for
      revamping the structure and use of control trees in BLIS. Modifications
      in this commit don't affect the control tree code yet, but rather lay
      the groundwork.
    - Defined wrappers for the following functions, where the wrappers
      each take a direction parameter of a new enumerated type, dir_t
      (BLIS_BWD or BLIS_FWD), and execute the correct underlying function.
      - bli_acquire_mpart_*() and _vpart_*()
      - bli_*_determine_kc_[fb]()
      - bli_thread_get_range_*() and bli_thread_get_range_weighted_*()
    - Consolidated all 'f' (forwards-moving) and 'b' (backwards-moving)
      blocked variants for trmm and trsm, and renamed gemm and herk variants
      accordingly. The direction is now queried via routines such as
      bli_trmm_direct(), which determines the direction from the implied side
      and uplo parameters. For gemm and herk, it is unconditionally BLIS_FWD.
    - Defined wrappers to parameter-specific macrokernels for herk, trmm, and
      trsm, e.g. bli_trmm_xx_ker_var2(), that execute the correct underlying
      macrokernel based on the implied parameters. The same logic used to
      choose the dir_t in _direct() functions is used here.
    - Simplified the function pointer arrays in _int() functions given the
      consolidation and dir_t querying mentioned above.
    - Function signature (whitespace) reformatting for various functions.
    - Removed old code in various 'old' directories.
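
    The direction-parameter wrappers described above follow a simple dispatch
    pattern, sketched below. This is an illustration only: the sketch_* names,
    the range struct, and the particular blocking rule are assumptions, and
    BLIS's actual partitioning functions may treat edge cases differently.

```c
#include <assert.h>

/* Hypothetical analogue of dir_t. */
typedef enum { SKETCH_FWD, SKETCH_BWD } sketch_dir_t;

typedef struct { int start; int size; } sketch_range_t;

/* Forward: the next block begins right after the elements already done. */
static sketch_range_t sketch_next_block_f(int done, int b, int n)
{
    sketch_range_t r;
    r.size  = (n - done < b) ? (n - done) : b;
    r.start = done;
    return r;
}

/* Backward: the next block ends right before the elements already done. */
static sketch_range_t sketch_next_block_b(int done, int b, int n)
{
    sketch_range_t r;
    r.size  = (n - done < b) ? (n - done) : b;
    r.start = n - done - r.size;
    return r;
}

/* Wrapper: one entry point that selects the underlying function by direction. */
sketch_range_t sketch_next_block(sketch_dir_t dir, int done, int b, int n)
{
    assert(b > 0 && 0 <= done && done <= n);
    return (dir == SKETCH_FWD) ? sketch_next_block_f(done, b, n)
                               : sketch_next_block_b(done, b, n);
}
```

    A blocked variant written against the wrapper needs only one loop body;
    the direction becomes a runtime value rather than a separate 'f' or 'b'
    copy of the code.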

commit 405c9d46344d93c3eab5572b233900b50ca50d68
Author: sthangar <Santanu.Thangaraj@amd.com>
Date:   Wed Jun 22 12:18:54 2016 +0530

    Check-in the fused kernels optimized for Zen
    
    Change-Id: I7b2f467b960e7b9a285f06e47be87de122e5fa24

commit 232754feecf29452987666b9f5ebba2619bfd0b0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jun 21 14:25:39 2016 -0500

    Fixed compiler warning in rand[vm], randn[vm].
    
    Details:
    - Fixed compiler warnings about unused variables related to the disabling
      of normalization in the structured cases of the rand[vm] and randn[vm]
      operations.

commit a89555d1605574f3685813dcc972b636dd61264d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jun 17 14:08:35 2016 -0500

    Added randn[vm] operations, support in testsuite.
    
    Details:
    - Defined a new randomization operation, randn, on vectors and matrices.
      The randnv and randnm operations randomize each element of the target
      object with values from a narrow range of values. Presently, those
      values are all integer powers of two, but they do not need to be powers
      of two in order to achieve the primary goal, which is to initialize
      objects that can be operated on with plenty of precision "slack"
      available to allow computations that avoid roundoff. Using this method
      of randomization makes it much more likely that testsuite residuals of
      properly-functioning operations are close to zero, if not exactly zero.
    - Updated existing randomization operations randv and randm to skip
      special diagonal handling and normalization for matrices with structure.
      This is now handled by the testsuite modules by explicitly calling a
      testsuite function that loads the diagonal (and scales off-diagonal
      elements).
    - Added support for randnv and randnm in the testsuite with a new switch
      in input.general that universally toggles between use of the classic
      randv/randm, which use real values on the interval [-1,1], and
      randnv/randnm, which use only values from a narrow range. Currently,
      the narrow range is: +/-{2^0, 2^-1, 2^-2, 2^-3, 2^-4, 2^-5, 2^-6}, as
      well as 0.0.
    - Updated testsuite modules so that a testsuite wrapper function is called
      instead of directly calling the randomization operations (such as
      bli_randv() and bli_randm()). This wrapper also takes a bool_t that
      indicates whether the object's elements should be normalized. (NOTE: As
      alluded to above, in the test modules of triangular solve operations such
      as trsv and trsm, we perform the extra step of loading the diagonal.)
    - Defined a new level-0 operation, invertsc, which inverts a scalar.
    - Updated the abval2ris and sqrt2ris level-0 macros to avoid an unlikely
      but possible divide-by-zero.
    - Updated function signature and prototype formatting in testsuite.
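
    The narrow-range randomization described above can be illustrated with a
    short sketch. The set of values, +/-{2^0, ..., 2^-6} plus 0.0, comes from
    the commit message; the function name, the use of rand(), and the uniform
    choice among the 15 values are assumptions, not the BLIS implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns 0.0 or +/-2^-e for e in 0..6. */
double sketch_randn_value(void)
{
    int    r = rand() % 15;   /* 0..14; uniform choice is an assumption */
    int    e;
    double mag;

    if (r == 14) return 0.0;

    e = r / 2;                /* exponent magnitude, 0..6 */
    assert(e >= 0 && e <= 6);

    /* 1 / 2^e is exactly representable in binary floating point. */
    mag = 1.0 / (double)(1 << e);

    return (r % 2 == 0) ? mag : -mag;
}
```

    Because every value is zero or a signed power of two, products of two
    such values are also exact, and sums of a moderate number of them fit in
    a double's significand without rounding. That is the precision "slack"
    the commit message refers to.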

commit 318f063dcbd8b594969e401bc99146d24b01066a
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Jun 8 17:46:50 2016 -0500

    Add new KNL microkernel derived from Haswell.

commit 096895c5d538a7f8817603d7cf28c52e99340def
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 6 13:32:04 2016 -0500

    Reorganized code, APIs related to multithreading.
    
    Details:
    - Reorganized code and renamed files defining APIs related to multithreading.
      All code that is not specific to a particular operation is now located in a
      new directory: frame/thread. Code is now organized, roughly, by the
      namespace to which it belongs (see below).
    - Consolidated all operation-specific *_thrinfo_t object types into a single
      thrinfo_t object type. Operation-specific level-3 *_thrinfo_t APIs were
      also consolidated, leaving bli_l3_thrinfo_*() and bli_packm_thrinfo_*()
      functions (aside from a few general purpose bli_thrinfo_*() functions).
    - Renamed thread_comm_t object type to thrcomm_t.
    - Renamed many of the routines and functions (and macros) for multithreading.
      We now have the following API namespaces:
      - bli_thrinfo_*(): functions related to thrinfo_t objects
      - bli_thrcomm_*(): functions related to thrcomm_t objects.
      - bli_thread_*(): general-purpose functions, such as initialization,
        finalization, and computing ranges. (For now, some macros, such as
        bli_thread_[io]broadcast() and bli_thread_[io]barrier() use the
        bli_thread_ namespace prefix, even though bli_thrinfo_ may be more
        appropriate.)
    - Renamed thread-related macros so that they use a bli_ prefix.
    - Renamed control tree-related macros so that they use a bli_ prefix (to be
      consistent with the thread-related macros that were also renamed).
    - Removed #undef BLIS_SIMD_ALIGN_SIZE from dunnington's bli_kernel.h. This
      #undef was a temporary fix to some macro defaults which were being applied
      in the wrong order, which was recently fixed.

commit 232530e88ff99f37abcae5b6fb5319a9a375a45f
Merge: 4bcabd1b eef37f8b
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date:   Wed Jun 1 15:14:10 2016 -0500

    Merge commit 'refs/pull/81/head' of https://github.com/flame/blis
    
    Conflicts:
            frame/base/bli_threading_pthreads.c
            frame/base/bli_threading_pthreads.h

commit 4bcabd1bf60688c38cf562459fc5e8be8b831756
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date:   Wed Jun 1 13:27:28 2016 -0500

    Use spin locks instead of pthread barriers

commit eef37f8b4d81845a6ba4bf25586d32b50c3e8a68
Author: Jeff Hammond <jeff.science@gmail.com>
Date:   Sun May 29 22:28:13 2016 -0700

    use GCC intrinsic instead of pthread_mutex for atomic increment and fetch
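
    A minimal sketch of an atomic increment-and-fetch using a GCC builtin is
    shown below. The commit message does not say which intrinsic was used;
    __sync_add_and_fetch is assumed here, and the counter type and function
    name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared counter, e.g. the arrival count of a barrier. */
typedef struct { volatile long count; } sketch_counter_t;

/* Atomically adds 1 and returns the new value, with no mutex involved.
   __sync_add_and_fetch is a GCC/Clang builtin. */
long sketch_increment_and_fetch(sketch_counter_t* c)
{
    assert(c != NULL);
    return __sync_add_and_fetch(&c->count, 1);
}
```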

commit 9dcd6f05c4c3ff2ce7cd87a9951a96ebef22681e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 24 13:15:32 2016 -0500

    Implemented developer-configurable malloc()/free().
    
    Details:
    - Replaced all instances of bli_malloc() and bli_free() with one of:
      - bli_malloc_pool()/bli_free_pool()
      - bli_malloc_user()/bli_free_user()
      - bli_malloc_intl()/bli_free_intl()
      each of which can be configured to call malloc()/free() substitutes,
      so long as the substitute functions have the same function type
      signatures as malloc() and free() defined by C's stdlib.h. The _pool()
      function is called when allocating blocks for the memory pools (used
      for packing buffers, primarily), the _user() function is called when
      obj_t's are created (via bli_obj_create() and friends), and the _intl()
      function is called for internal use by BLIS, such as when creating
      control tree nodes or temporary buffers for manipulating internal data
      structures. Substitutes for any of the three types of bli_malloc() may
      be specified by #defining the following pairs of cpp macros in
      bli_kernel.h:
      - BLIS_MALLOC_POOL/BLIS_FREE_POOL
      - BLIS_MALLOC_USER/BLIS_FREE_USER
      - BLIS_MALLOC_INTL/BLIS_FREE_INTL
      to be the name of the substitute functions. (Obviously, the object
      code that contains these functions must be provided at link-time.)
      These macros default to malloc() and free(). Substitute functions are
      also automatically prototyped by BLIS (in bli_malloc_prototypes.h).
    - Removed definitions for bli_malloc() and bli_free().
    - Note that bli_malloc_pool() and bli_malloc_user() are now defined in
      terms of a new function, bli_malloc_align(), which aligns memory to an
      arbitrary (power of two) alignment boundary, but does so manually,
      whereas before alignment was performed behind the scenes by
      posix_memalign(). Currently, bli_malloc_intl() is defined in terms
      of bli_malloc_noalign(), which serves as a simple wrapper to the
      designated function that is passed in (e.g. BLIS_MALLOC_INTL).
      Similarly, there are bli_free_align() and bli_free_noalign(), which
      are used in concert with their bli_malloc_*() counterparts.
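
    The combination of a macro-selected allocator and manual alignment
    described above can be sketched as follows. This shows one common
    technique (over-allocate, round up, stash the original pointer just
    before the aligned address); whether bli_malloc_align() works exactly
    this way is an assumption. The SKETCH_MALLOC / SKETCH_FREE macros and
    the sketch_* function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Substitute allocators may be chosen at compile time; default to libc. */
#ifndef SKETCH_MALLOC
#define SKETCH_MALLOC malloc
#endif
#ifndef SKETCH_FREE
#define SKETCH_FREE free
#endif

void* sketch_malloc_align(size_t size, size_t align)
{
    void*     raw;
    uintptr_t start, aligned;

    /* The alignment must be a nonzero power of two. */
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Over-allocate: room for the stashed pointer plus worst-case padding. */
    raw = SKETCH_MALLOC(size + align + sizeof(void*));
    if (raw == NULL) return NULL;

    /* Skip the slot reserved for the stashed pointer, then round up. */
    start   = (uintptr_t)raw + sizeof(void*);
    aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    /* Remember the address the underlying allocator actually returned. */
    ((void**)aligned)[-1] = raw;

    return (void*)aligned;
}

void sketch_free_align(void* p)
{
    if (p == NULL) return;

    /* Recover the original address and hand it back to the allocator. */
    SKETCH_FREE(((void**)p)[-1]);
}
```

    Because the original pointer travels with the block, any function with
    malloc()'s signature can be substituted without requiring it to provide
    aligned memory itself.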

commit 9dd440109a9d964f5cd286e9f83c487ad703e1e4
Author: Jeff Hammond <jeff.science@gmail.com>
Date:   Sat May 21 15:21:58 2016 -0700

    fix 404 link to BuildSystem
    
    Google Code is dead.  Long live GitHub!

commit d309f20b7376a68efa3b864ad790c2021c071655
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 18 15:13:53 2016 -0500

    Added alignment switch to testsuite.
    
    Details:
    - Added a new input parameter to input.general that globally toggles
      whether testsuite tests are performed on objects whose buffers and
      leading dimensions have been aligned, and changed the implementation
      of libblis_test_mobj_create() to employ alignment (or not) regardless
      of whether row, column, or general storage is being tested.
    - Updated configure script's "--help" text to indicate default behavior
      for internal integer type size and BLAS/CBLAS integer type size
      options.

commit 32db0adc218ea4ae370164dbe8d23b41cd3526d3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 17 15:20:16 2016 -0500

    Generate prototypes for user-defined packm kernels.
    
    Details:
    - Created template prototypes for packm kernels (in bli_l1m_ker.h), and
      then redefined reference packm kernels' prototyping headers in terms of
      this template, as is already done for level-1v, -1f, and -3 kernels.
    - Automatically generate prototypes for user-defined packm kernels in
      bli_kernel_prototypes.h (using the new template prototypes in
      bli_l1m_ker.h).
    - Defined packm kernel function types in bli_l1m_ft.h, including for
      packm kernels specific to induced methods, which are now used in
      bli_packm_cxk.c and friends rather than using a locally-defined
      function type.
    - In bli_packm_cxk.c, extended the array of packm kernel function pointers
      out to index 31 (from the previous maximum of 17). This allows us to
      store the unrolled 30xk kernel in the array for use (on knc, for
      example). Note: This should have been done a long time ago.

commit e3bd5ca64ae7c190ba689396c0de687b829a11fe
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu May 12 20:54:13 2016 -0500

    Fix SIMD definitions in KNL config, and a couple of fixes to C update.

commit 4fe02e3d497995d94d34d3fcf5af895084cfc8b9
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu May 12 20:53:58 2016 -0500

    Move bli_kernel.h before bli_threading.h in order of inclusion in blis.h.

commit 4bcf1b35abea3f3dfc8f2fe462dcf155cf199e55
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 11 16:09:49 2016 -0500

    Fixed bli_get_range_*() bugs in trsm variants.
    
    Details:
    - Fixed incorrect calls to bli_get_range_*() from within trsm blocked
      variants 1f, 2b, and 2f. The bug somehow went undetected since the
      big commit (537a1f4), and, strangely, did not manifest via the BLIS
      testsuite. The bug finally came to our attention when running the
      libflame test suite while linking to BLIS. Thanks to Kiran Varaganti
      for submitting the initial report that led to this bug being found.

commit 9cfa33023f123a6c17e987f72fba174ce073f0b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 11 16:02:30 2016 -0500

    Minor updates to bli_f2c.h.
    
    Details:
    - Added #undef guards to certain #define statements in bli_f2c.h,
      and renamed the file guard to BLIS_F2C_H. This helps when
      #including "blis.h" from an application or library that already
      #includes an "f2c.h" header.

commit a09a2e23eacf5328858c8318bb637c5ff3b71d08
Merge: 4dcd37eb 7c604e1c
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date:   Wed May 11 10:47:11 2016 -0500

    Merge pull request #76 from devinamatthews/move_simd_defs
    
    Move default SIMD-related definitions to bli_kernel_macro_defs.h

commit 4dcd37eb1b12a6e08cc13df7b61391ef8363f5d8
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Tue May 10 16:28:59 2016 -0500

    fixing knc simd align size

commit 619dee0daec3474b4e5a55df90a61aabcae194f2
Merge: b790b3d9 7c604e1c
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue May 10 12:13:24 2016 -0500

    Merge branch 'move_simd_defs' into knl

commit 7c604e1cbc1609b6e12d3ee973c08b7af5035be4
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue May 10 12:11:55 2016 -0500

    Move default SIMD-related definitions to bli_kernel_macro_defs.h. Otherwise, configurations which customize these fail as these are now defined in bli_kernel.h.

commit b790b3d9e1820f3b691676de48c291cae083452d
Merge: 4f8c05c9 a7be2d28
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue May 10 11:49:47 2016 -0500

    Merge branch 'master' into knl

commit a7be2d28e8930b154d0da1d6929b54a96e210af6
Merge: 97b512ef 4b1e55ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 10 11:48:51 2016 -0500

    Merge pull request #74 from devinamatthews/fix_common_symbols
    
    Default-initialize all extern global variables to avoid generating common symbols.

commit 4b1e55edbfe0e1cb2e7b9428424903497cb7a841
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue May 10 10:08:47 2016 -0500

    Default-initialize all extern global variables to avoid generating common symbols. Fixes #73.

commit 97b512ef62c7e25c97ed5e9eca81cd7015b2ac91
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 6 10:24:30 2016 -0500

    Include headers from cblas.h to pull in f77_int.
    
    Details:
    - Added #include statements for certain key BLIS headers so that the
      definition of f77_int is pulled in when a user compiles application
      code with only #include "cblas.h" (and no other BLIS header). This
      is necessary since f77_int is now used within the cblas API.

commit c3a4d39d03665135f1616588b5ef7c3e9ef5688d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 4 17:22:56 2016 -0500

    Updates to haswell gemm micro-kernels.
    
    Details:
    - Added two new sets of [sd]gemm micro-kernels for haswell architectures,
      one that is 4x24/4x12 (s and d) and one that is 6x16/6x8.
    - Changed the haswell configuration to use the 6x16/6x8 micro-kernels
      by default.
    - Updated various Makefiles, in test, test/3m4m, and testsuite.

commit 0b01d355ae861754ae2da6c9a545474af010f02e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 27 15:21:10 2016 -0500

    Miscellaneous cleanups, fixes to recent commits.
    
    Details:
    - Fixed a typo in bli_l1f_ref.h, introduced into bbb8569, that only
      manifested when non-reference level-1f kernels were used.
    - Added an #undef BLIS_SIMD_ALIGN_SIZE to bli_kernel.h of dunnington
      configuration to prevent a compile-time warning until I can figure out
      the proper permanent fix.
    - Moved frame/1f/kernels/bli_dotxaxpyf_ref_var1.c out of the compilation
      path (into 'other' directory). _ref_var2 is used by default, which is
      the variant that is built on axpyf and dotxf instead of dotaxpyv.
    - Removed section of frame/include/bli_config_macro_defs.h pertaining to
      mixed datatype support.

commit ed7326c836f427e2f8420b015220ce293207b10c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 27 14:57:40 2016 -0500

    Added 'restrict' to l1v/l1f code in 'kernels' dir.
    
    Details:
    - Added 'restrict' keyword to existing kernel definitions in 'kernels'
      directory. These changes were meant for inclusion in bbb8569.

commit bbb8569b2a08c3bcd631d5a05eb389d01d94ac07
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 27 14:13:46 2016 -0500

    Use 'restrict' in all kernel APIs; wspace changes.
    
    Details:
    - Updated level-1v, level-1f kernel function types (bli_l1?_ft.h) and
      generic kernel prototypes (bli_l1?_ker.h) to use 'restrict' for all
      numerical operand pointers (i.e., all pointers except the cntx_t).
    - Updated level-1f reference kernel definitions to use 'restrict' for
      all numerical operand pointers. (Level-1v reference kernel definitions
      were already updated in bdbda6e.)
    - Rewrote the level-1v and level-1f reference kernel prototypes in
      bli_l1v_ref.h and bli_l1f_ref.h, respectively, to simply #include
      bli_l1v_ker.h and bli_l1f_ker.h with redefined function base names
      (as was already being done for the level-3 micro-kernel prototypes
      in bli_l3_ref.h), rather than duplicate the signatures from the
      _ker.h files.
    - Added definitions to frame/include/bli_kernel_prototypes.h for axpbyv
      and xpbyv, which were probably meant for inclusion in bdbda6e.
    - Converted a number of instances of four spaces, as introduced in
      bdbda6e, to tabs.

commit 4ea419c72c789825e1f93a1eee88219bbf873930
Merge: f1e9be2a bdbda6e6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 26 12:50:45 2016 -0500

    Merge pull request #70 from devinamatthews/daxpby
    
    Give the level1v operations some love

commit bdbda6e6acc682ab1b6ca680edebd09ae12a832c
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Apr 25 11:05:57 2016 -0500

    Give the level1v operations some love:
    
    - Add missing axpby and xpby operations (plus test cases).
    - Add special case for scal2v with alpha=1.
    - Add restrict qualifiers.
    - Add special-case algorithms for incx=incy=1.

commit f1e9be2aba1a057eedb947bbae96848597777408
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 22 15:34:02 2016 -0500

    Minor tweak to test/Makefile.
    
    Details:
    - Just committing a minor change to test/Makefile that has been lingering
      in my local working copy for longer than I can remember.

commit aa0bceec277938328dabeb744680623f24fb0b61
Merge: 4136553f e2784b4c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 22 12:01:31 2016 -0500

    Merge branch 'master' of github.com:flame/blis

commit 4136553f0d0661a668dfdb9edcd7ce1c5773dde7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 22 11:53:53 2016 -0500

    Clear level-3 cntx_t's via memset() before use.
    
    Details:
    - In all level-3 operations' _cntx_init() functions, replaced calls to
      bli_cntx_obj_init() with calls to bli_cntx_obj_clear(), and in all
      level-3 operations' _cntx_finalize() functions, removed calls to
      bli_cntx_obj_finalize(), leaving those function definitions empty.
    - Changed the definition of bli_cntx_obj_clear() so that the clearing
      occurs via a single call to memset().
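
    Clearing a structure with a single memset(), as described above, looks
    roughly like this. The struct layout and names are hypothetical and much
    smaller than the real cntx_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a context object. */
typedef struct
{
    int   blksz[4];
    void* kers[4];
    int   method;
} sketch_cntx_t;

/* Zero every field in one call instead of assigning them one by one. */
void sketch_cntx_clear(sketch_cntx_t* cntx)
{
    assert(cntx != NULL);
    memset(cntx, 0, sizeof(*cntx));
}
```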

commit 4f8c05c9e2ef4cbb82b35a3ebf1f0a0ac665830e
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu Apr 21 10:00:59 2016 -0500

    Rearrange KNL dgemm kernel again to streamline usage of ymm register. sgemm and dgemm now both working with Intel SDE.

commit e2784b4c921f706e756df3e146e20a4cb63f53e3
Merge: dd0ab1d9 a9b6c3ab
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 20 18:34:09 2016 -0500

    Merge pull request #67 from devinamatthews/cblas-f77-int
    
    Change CBLAS integer type to f77_int

commit a9b6c3abda6222a8b240361643932e83cf726c4f
Merge: e4c54c81 dd0ab1d9
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Apr 20 16:00:10 2016 -0500

    Merge remote-tracking branch 'origin/master' into cblas-f77-int
    
    # Conflicts:
    #       config/haswell/bli_config.h

commit e4c54c81463c2a19c9bb6b1f0f1be3fa9d018a45
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Apr 20 15:56:46 2016 -0500

    Change integer type in CBLAS function signatures to f77_int, and add proper const-correctness to BLAS layer.

commit dd0ab1d93f33abca6af9edd7b8e52da62dcfa5b1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 20 14:38:23 2016 -0500

    Converted some bli_cntx query functions to macros.
    
    Details:
    - Commented out several datatype-aware query functions (those ending in
      _dt) from bli_cntx.c, as well as their prototypes in bli_cntx.h, and
      added equivalent cpp query macros to bli_cntx.h.
    - Added 'bli_config.h' to .gitignore.

commit 7193230f7d35edbd1d2f77842a613971f1603463
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Apr 20 09:37:30 2016 -0500

    Work around missing VPMULLQ on KNL.

commit a30ccbc4c6a6e6460e78af6b5c530ee0d06f98fb
Merge: eb2f18e4 0e1a9821
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 19 15:04:33 2016 -0500

    Merge pull request #66 from devinamatthews/blas-configure
    
    Add configure options and generate bli_config.h automatically.

commit bd44cf13e886069bc66c10ac0db178be96629a0d
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Apr 19 13:43:04 2016 -0500

    Fix copy-paste errors in KNL kernels.

commit eb2f18e4844d985715df20798f50f9cc12e3b5ad
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 19 12:50:32 2016 -0500

    More compile-time fixes to bgq gemm ukernel code.

commit 0e1a9821d860f6c1d818baf4c48d21a23726c132
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Apr 19 11:44:37 2016 -0500

    Add configure options and generate bli_config.h automatically.
    
    Options to configure have been added for:
    - Setting the internal BLIS and BLAS/CBLAS integer sizes.
    - Enabling and disabling the BLAS and CBLAS layers.
    
    Additionally, configure options which require defining macros (the above plus the threading model), write their macros to the automatically-generated bli_config.h file in the top-level build directory. The old bli_config.h files in the config dirs were removed, and any kernel-related macros (SIMD size and alignment etc.) were moved to bli_kernel.h. The Makefiles were also modified to find the new bli_config.h file.
    
    Lastly, support for OMP in clang has been added (closes #56).

commit a11eec05928ddc5c43fa5dbcd35f2edd24ff35a1
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Apr 18 13:13:36 2016 -0500

    Add sgemm ukernels for KNL. vpmullq is not implemented on KNL -- needs workaround.

commit ff84469a4575f1ef8a0010046fde52240a312cae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 18 12:29:09 2016 -0500

    Applied various compilation fixes to bgq kernels.

commit c38e0dab05b2dc36672eab96e1248fb7fb2d785b
Merge: bd5e2296 cbcd0b73
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Apr 18 10:21:35 2016 -0500

    Merge remote-tracking branch 'origin/master' into knl

commit bd5e2296e98e042c31f1e8ece2c1ca8e4bdc2d4c
Merge: 4745def0 49f85177
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Apr 18 10:15:22 2016 -0500

    Merge remote-tracking branch 'origin/knl' into knl

commit 4745def0c87377ae83ad73ac514d7de08a96b2ac
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Apr 18 10:15:05 2016 -0500

    Add 64-bit offset vector so we can use vgatherqpd.

commit 49f85177f886f38889b60503a4e12fa7f04be1fd
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Apr 18 10:14:11 2016 -0500

    KNL ukernel compiles with gcc.

commit cbcd0b739dc54bd14fbb46aeda267c26725cd70f
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date:   Mon Apr 18 03:12:57 2016 -0500

    Changing ifdef for OSX pthread barriers

commit 58b2c3cf040134d1be913c585a3c6905629116c0
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Sat Apr 16 16:12:24 2016 -0500

    Rewrite of KNL kernel in GNU extended asm syntax.

commit dd62080cea78f3a23616200d6640e52c102b2bb9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 15 11:15:41 2016 -0500

    Compile-time fix to bgq l1f kernels.
    
    Details:
    - Fixed an old reference to bli_daxpyf_fusefac, which no longer exists,
      by replacing it with the axpyf fusing factor (8), and cleaned up the
      relevant section of config/bgq/bli_kernel.h.
    - Removed most of the details of the level-3 kernels from the template
      kernel code in config/template/kernels/3 and replaced it with a
      reference to the relevant kernel wiki maintained on the BLIS github
      website.

commit d5a915dd8d7a6ead42a68772e4420eb3647e6f1a
Merge: 4320b725 41694675
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 14 12:56:36 2016 -0500

    Merge branch 'master' of github.com:flame/blis

commit 4320b725a1f8fd34101470b6cf52ad504a79c517
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 14 12:51:29 2016 -0500

    Use kernel CFLAGS on "ukernels" directories.
    
    Details:
    - Updated the top-level Makefile so that the CFLAGS variable designated
      for kernel source code is applied not only to source code in
      directories named "kernels" but source code in any directory that
      contains the substring "kernels", such as "ukernels".
    - Formally disabled some code in gen-make-frag.sh script that was already
      effectively disabled. The code was related to handling "noopt" and
      "kernel" directories, which is now handled independently within the
      top-level Makefile without needing to place these source files into
      a separate makefile variable.

commit 41694675e4cb56e2e0323c7a7db48e0819606a31
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Apr 13 15:51:08 2016 -0500

    pthreads bugfixes
    
    Getting pthreads to work on my Mac
    Implemented a pthread barrier when _POSIX_BARRIER isn't defined
    Now spawn n-1 threads instead of n threads so that master thread isn't just spinning the whole time
    Add -lpthread instead of -pthread to LDFLAGS (for clang)
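
    A minimal sketch of the kind of fallback barrier described above, for
    platforms where POSIX barriers are unavailable. It is built from a mutex
    and a condition variable. The sketch_barrier_* names and the struct
    layout are hypothetical and are not taken from the BLIS source.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Hypothetical fallback barrier; names are illustrative, not BLIS's. */
typedef struct
{
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    unsigned        n_threads;  /* participants per barrier episode    */
    unsigned        n_waiting;  /* arrivals so far in this episode     */
    unsigned        generation; /* bumped each time the barrier opens  */
} sketch_barrier_t;

static int sketch_barrier_init( sketch_barrier_t* b, unsigned n_threads )
{
    assert( b != NULL && n_threads > 0 );

    b->n_threads  = n_threads;
    b->n_waiting  = 0;
    b->generation = 0;

    if ( pthread_mutex_init( &b->mutex, NULL ) != 0 ) return -1;
    if ( pthread_cond_init( &b->cond, NULL ) != 0 )
    {
        pthread_mutex_destroy( &b->mutex );
        return -1;
    }
    return 0;
}

/* Blocks until n_threads callers have arrived. Returns 1 in the last
   thread to arrive and 0 in all the others. */
static int sketch_barrier_wait( sketch_barrier_t* b )
{
    int      is_last = 0;
    unsigned my_generation;

    pthread_mutex_lock( &b->mutex );
    my_generation = b->generation;

    if ( ++b->n_waiting == b->n_threads )
    {
        /* Last arrival: reset the count and release everyone. */
        b->n_waiting = 0;
        b->generation++;
        pthread_cond_broadcast( &b->cond );
        is_last = 1;
    }
    else
    {
        /* The generation check guards against spurious wakeups. */
        while ( my_generation == b->generation )
            pthread_cond_wait( &b->cond, &b->mutex );
    }

    pthread_mutex_unlock( &b->mutex );
    return is_last;
}

static void sketch_barrier_destroy( sketch_barrier_t* b )
{
    pthread_cond_destroy( &b->cond );
    pthread_mutex_destroy( &b->mutex );
}
```

    With a team of n, the barrier would be initialized for n participants,
    with the calling (master) thread taking part alongside the n-1 threads
    it spawns.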

commit f756dbfa0d542cbc497724981520c83abf049c4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 13 11:25:33 2016 -0500

    Removed stale #include from bgq configuration.
    
    Details:
    - Removed an old #include statement ("bli_gemm_8x8.h") from the
      bli_kernel.h file in the bgq configuration. It turns out this
      file was no longer needed even prior to 537a1f4.

commit 0bd4169ea75f690714e7d2912229932a75d8a7e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 11 18:08:32 2016 -0500

    Fixed context-broken dunnington/penryn kernels.
    
    Details:
    - Added missing context parameters to several instances where simpler
      kernels, or reference kernels, are called instead of executing the
      main body code contained in the kernel function in question.
    - Renamed axpyv and dotv kernel files to use "opt" instead of "int"
      substring, for consistency with level-1f kernels.

commit 7912af5db45b7372d19a9a3dfeb82df302a05628
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 11 17:32:13 2016 -0500

    CHANGELOG update (0.2.0)

commit 898614a555ea0aa7de4ca07bb3cb8f5708b6a002
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 11 17:32:09 2016 -0500

    Version file update (0.2.0)

commit 537a1f4f85ce1aa008901857cb3182e6b4546d7f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 11 17:21:28 2016 -0500

    Implemented runtime contexts and reorganized code.
    
    Details:
    - Retrofitted a new data structure, known as a context, into virtually
      all internal APIs for computational operations in BLIS. The structure
      is now present within the type-aware APIs, as well as many supporting
      utility functions that require information stored in the context. User-
      level object APIs were unaffected and continue to be "context-free,"
      however, these APIs were duplicated/mirrored so that "context-aware"
      APIs now also exist, differentiated with an "_ex" suffix (for "expert").
      These new context-aware object APIs (along with the lower-level, type-
      aware, BLAS-like APIs) contain the address of a context as a last
      parameter, after all other operands. Contexts, or specifically, cntx_t
      object pointers, are passed all the way down the function stack into
      the kernels and allow the code at any level to query information about
      the runtime, such as kernel addresses and blocksizes, in a thread-
      friendly manner--that is, one that allows thread-safety, even if the
      original source of the information stored in the context changes at
      run-time (see next bullet for more on this "original source" of info).
      (Special thanks go to Lee Killough for suggesting the use of this kind
      of data structure in discussions that transpired during the early
      planning stages of BLIS, and also for suggesting such a perfectly
      appropriate name.)
    - Added a new API, in frame/base/bli_gks.c, to define a "global kernel
      structure" (gks). This data structure and API will allow the caller to
      initialize a context with the kernel addresses, blocksizes, and other
      information associated with the currently active kernel configuration.
      The currently active kernel configuration within the gks cannot be
      changed (for now), and is initialized with the traditional cpp macros
      that define kernel function names, blocksizes, and the like. However,
      in the future, the gks API will be expanded to allow runtime management
      of kernels and runtime parameters. The most obvious application of this
      new infrastructure is the runtime detection of hardware (and the
      implied selection of appropriate kernels). With contexts in place,
      kernels may even be "hot swapped" at runtime within the gks. Once
      execution enters a level-3 _front() function, the memory allocator will
      be reinitialized on-the-fly, if necessary, to accommodate the new
      kernels' blocksizes. If another application thread is executing with
      another (previously loaded) kernel, it will finish in a deterministic
      fashion because its kernel information was loaded into its context
      before computation began, and also because the blocks it checked out
      from the internal memory pools will be unaffected by the newer threads'
      reinitialization of the allocator.
    - Reorganized and streamlined the 'ind' directory, which contains much of
      the code enabling use of induced methods for complex domain matrix
      multiplication; deprecated bli_bsv_query.c and bli_ukr_query.c, as
      those APIs' functionality is now mostly subsumed within the global
      kernel structure.
    - Updated bli_pool.c to define a new function, bli_pool_reinit_if(),
      that will reinitialize a memory pool if the necessary pool block size
      has increased.
    - Updated bli_mem.c to use bli_pool_reinit_if() instead of
      bli_pool_reinit() in the definition of bli_mem_pool_init(), and placed
      usage of contexts where appropriate to communicate cache and register
      blocksizes to bli_mem_compute_pool_block_sizes().
    - Simplified control trees now that much of the information resides in
      the context and/or the global kernel structure:
      - Removed blocksize object pointer (blksz_t*) fields from all control
        tree node definitions and replaced them with blocksize id (bszid_t)
        values instead, which may be passed into a context query routine in
        order to extract the corresponding blocksize from the given context.
      - Removed micro-kernel function pointer (func_t*) fields from all
        control tree node definitions. Now, any code that needs these function
        pointers can query them from the local context, as identified by a
        level-3 micro-kernel id (l3ukr_t), level-1f kernel id (l1fkr_t), or
        level-1v kernel id (l1vkr_t).
      - Removed blksz_t object creation and initialization, as well as kernel
        function object creation and initialization, from all operation-
        specific control tree initialization files (bli_*_cntl.c), since this
        information will now live in the gks and, secondarily, in the context.
    - Removed blocksize multiples from blksz_t objects. Now, we track
      blocksize multiples for each blocksize id (bszid_t) in the context
      object.
    - Removed the bool_t's that were required when a func_t was initialized.
      These bools are meant to allow one to track the micro-kernel's storage
      preferences (by rows or columns). This preference is now tracked
      separately within the gks and contexts.
    - Merged and reorganized many separate-but-related functions into single
      files. This reorganization affects frame/0, 1, 1d, 1m, 1f, 2, 3, and
      util directories, but has the most obvious effect of allowing BLIS
      to compile noticeably faster.
    - Reorganized execution paths for level-1v, -1d, -1m, and -2 operations
      in an attempt to reduce overhead for memory-bound operations. This
      includes removal of default use of object-based variants for level-2
      operations. Now, by default, level-2 operations will directly call a
      low-level (non-object based) loop over a level-1v or -1f kernel.
    - Converted many common query functions in bli_blksz.c (renamed from
      bli_blocksize.c) and bli_func.c into cpp macros, now defined in their
      respective header files.
    - Defined bli_mbool.c API to create and query "multi-bools", or
      heterogeneous bool_t's (one for each floating-point datatype), in the
      same spirit as blksz_t and func_t.
    - Introduced two key parameters of the hardware: BLIS_SIMD_NUM_REGISTERS
      and BLIS_SIMD_SIZE. These values are needed in order to compute a third
      new parameter, which may be set indirectly via the aforementioned
      macros or directly: BLIS_STACK_BUF_MAX_SIZE. This value is used to
      statically allocate memory in macro-kernels and the induced methods'
      virtual kernels to be used as temporary space to hold a single
      micro-tile. These values are now output by the testsuite. The default
      value of BLIS_STACK_BUF_MAX_SIZE is computed as
      "2 * BLIS_SIMD_NUM_REGISTERS * BLIS_SIMD_SIZE".
    - Cleaned up top-level 'kernels' directory (for example, renaming the
      embarrassingly misleading "avx" and "avx2" directories to "sandybridge"
      and "haswell," respectively) and gave more consistent and meaningful
      names to many kernel files (as well as updating their interfaces to
      conform to the new context-aware kernel APIs).
    - Updated the testsuite to query blocksizes from a locally-initialized
      context for test modules that need those values: axpyf, dotxf,
      dotxaxpyf, gemm_ukr, gemmtrsm_ukr, and trsm_ukr.
    - Reformatted many function signatures into a standard format that will
      more easily facilitate future API-wide changes.
    - Updated many "mxn" level-0 macros (i.e., those used to inline double loops
      for level-1m-like operations on small matrices) in frame/include/level0
      to use more obscure local variable names in an effort to avoid variable
      shadowing. (Thanks to Devin Matthews for pointing out these gcc warnings,
      which are only output using -Wshadow.)
    - Added a conj argument to setm, so that its interface now mirrors that
      of scalm. The semantic meaning of the conj argument is to optionally
      allow implicit conjugation of the scalar prior to being populated into
      the object.
    - Deprecated all type-aware mixed domain and mixed precision APIs. Note
      that this does not preclude supporting mixed types via the object APIs,
      where it produces absolutely zero API code bloat.
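
    The snapshot behavior described in the first two bullets can be sketched
    as follows. Every sk_* name is a hypothetical stand-in, and the real
    bszid_t, blksz_t, cntx_t, and gks definitions differ. The only point
    illustrated is that a context is filled by value from the global kernel
    structure, so a later change to the global state does not affect a
    context that was initialized earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for blocksize ids and datatypes. */
typedef enum { SK_MR, SK_NR, SK_MC, SK_KC, SK_NC, SK_NUM_BSZIDS } sk_bszid_t;
typedef enum { SK_FLOAT, SK_DOUBLE, SK_NUM_DTS } sk_dt_t;

/* One blocksize value per datatype, in the spirit of blksz_t. */
typedef struct { size_t v[ SK_NUM_DTS ]; } sk_blksz_t;

/* A context holds its own copy of every blocksize. */
typedef struct { sk_blksz_t blkszs[ SK_NUM_BSZIDS ]; } sk_cntx_t;

/* The "original source" of the information. This sketch does no locking;
   concurrent updates to it would need synchronization. */
static sk_cntx_t sk_gks;

static void sk_gks_set_blksz( sk_bszid_t id, sk_dt_t dt, size_t value )
{
    sk_gks.blkszs[ id ].v[ dt ] = value;
}

/* Snapshot the global state into a caller-owned context (copy by value). */
static void sk_gks_cntx_init( sk_cntx_t* cntx )
{
    assert( cntx != NULL );
    *cntx = sk_gks;
}

/* Query a blocksize by id; the context is the last parameter. */
static size_t sk_cntx_get_blksz( sk_bszid_t id, sk_dt_t dt,
                                 const sk_cntx_t* cntx )
{
    assert( cntx != NULL );
    return cntx->blkszs[ id ].v[ dt ];
}
```

    A thread that initialized its context before the global values changed
    keeps reading the old values, which is the property that lets an
    in-flight computation finish deterministically.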

commit dd856c2cb75a2221a503a73dde27790c34b91570
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Apr 11 10:39:18 2016 -0500

    Translated MIC kernel to KNL and cleaned up a bit. Only real change is lack of swizzle modifiers for FMA instructions (used bcast from memory instead).

commit 7f27431d3fffdda99c282ec412731d0a90cb32a7
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Apr 8 10:04:39 2016 -0500

    Copy mic kernel to knl for transliteration.

commit f8f02f0334ac020021e15a415bcd33aeea01deb4
Merge: 32c92d94 d1f8e5d9
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Apr 6 11:37:05 2016 -0500

    Merge branch 'master' into const_correctness

commit 32c92d945c55708da0eb63be1771f8c5430e3910
Merge: 62914ccb 20af937b
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Apr 6 11:36:02 2016 -0500

    Merge branch 'master' into const_correctness

commit d1f8e5d9b2ecd054ed103f4d642d748db2d4f173
Merge: 20af937b c11d28ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 5 12:21:27 2016 -0500

    Merge pull request #60 from esauvage/master
    
    sgemm µkernel for bulldozer : bug correction for k%4 != 0

commit c11d28eed89d65494bc4019f04d046520866c0ff
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
Date:   Sat Apr 2 21:15:48 2016 +0200

    cgemm µkernel for bulldozer : bug correction for k%4 != 0

commit 20af937b57f82bb3acb09418d5c0206e1b24f2c7
Merge: 36c3abb0 fc61a114
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 31 14:37:30 2016 -0500

    Merge pull request #59 from devinamatthews/fix_testsuite_makefile
    
    Fix testsuite makefile

commit fc61a1143edeba4946d4b9915f1775bb08e643fc
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu Mar 31 10:53:01 2016 -0500

    Fix formatting in configure.

commit 26379b14de630e3a6c6eef5dfe87ff001558a8a6
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu Mar 31 10:45:48 2016 -0500

    Adjust paths in common.mk to support building from testsuite dir.

commit 36c3abb05fecb02d4a9ab13b2b69d133adf34583
Merge: 64b41fa5 917ce754
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 31 10:26:17 2016 -0500

    Merge pull request #58 from esauvage/master
    
    cgemm & zgemm micro-kernels for FMA4 instruction set (bulldozer confi…

commit 356d854fc9e34642cc46e0e02a8ceb56114878af
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Mar 30 16:33:15 2016 -0500

    Make symlink to common.mk in build directory.

commit edbb8470044f82ef959583ee09613a5a985292b5
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Wed Mar 30 16:27:11 2016 -0500

    Refactor out some definitions which moved from make_defs.mk to Makefile for use in testsuite Makefile.

commit 917ce75482a543fef46553efff6c246939761e59
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
Date:   Wed Mar 30 22:03:09 2016 +0200

    cgemm & zgemm micro-kernels for FMA4 instruction set (bulldozer configuration), based on x86_64/avx micro-kernel

commit 62914ccbcdb3c594f065dcfa65bd7e7b95c79283
Merge: bbf704bf 64b41fa5
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Tue Mar 29 15:24:25 2016 -0500

    Merge branch 'master' into const_correctness

commit 64b41fa554dff44b2f9ad48901b67c63836407a8
Merge: 1b09e343 0171ad58
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 29 15:19:41 2016 -0500

    Merge pull request #54 from devinamatthews/more_config_opts
    
    More config opts

commit 1b09e343dfe5b48b4842e2cb96f41c8cc249bad0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 29 12:55:28 2016 -0500

    Updated gcc version from 4.8 to 4.9 in .travis.yml.

commit 0171ad58997b3a5a9b76301511dbe0751fffc940
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Mar 28 13:55:06 2016 -0500

    Add icc and clang support for Intel architectures, fixes #47. 2bd036f fixes #49 BTW.

commit 3090fff64cc87ff2519a09f38e6b8699cf3cba11
Merge: 8624e365 4ca5d5b1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 28 12:36:25 2016 -0500

    Merge pull request #44 from esauvage/master
    
    sgemm micro-kernel for FMA4 instruction set

commit e6e566426ac3ded7ef87cd8ff9be98accfdc4acc
Merge: 469429ec 8624e365
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Sat Mar 26 14:10:15 2016 -0500

    Merge branch 'master' into more_config_opts

commit 8624e36543160739d954c4dbcc5a5594458f3a12
Merge: a315833f 2bd036f1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Mar 26 13:56:28 2016 -0500

    Merge pull request #50 from devinamatthews/fix_noopt_avx
    
    Fix configuration issue where instruction set flags are not specified for debug builds.

commit 469429ec34e5b1a172ce35596f9c7afdaacac131
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Mar 25 20:45:41 2016 -0500

     Fix LD_FLAGS -> LDFLAGS.

commit 8442d65c9ead0376fc5f2dfad62fd4862ab9b2b3
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Mar 25 20:06:48 2016 -0500

    Replace -march=native with specific architecture flags to support cross-compiling, and add icc support for Intel architectures.

commit 76099f20be1b49ac960f7e3c5a8296bbf4e1782d
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Mar 25 17:22:58 2016 -0500

    Add threading option to configure.

commit ad43eab4c7899d56d8d7caa6e2d92bc0581ea5a5
Merge: 9452bdb3 2bd036f1
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Mar 25 15:00:02 2016 -0500

    Merge branch 'fix_noopt_avx' into more_config_opts

commit 9452bdb3afbf2d7f898134a091d7790817e7be9c
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Mar 25 14:59:50 2016 -0500

    Add options for verbose make output and static/shared linking to configure.

commit 2bd036f1f9ce1ee0864365557f66d9415dd42de3
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Mar 25 12:16:49 2016 -0500

    Fix configuration issue where instruction set flags are not specified for debug builds.

commit bbf704bf7501411964a63a68f1af541f612cf92d
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Mar 25 09:55:35 2016 -0500

     Add missing const to bli_read_nway_from_env.

commit a315833f067944fb0bc14cf60f0c7dcb5dc897b6
Merge: 1d1a426d af92773f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 24 12:30:21 2016 -0500

    Merge pull request #48 from figual/master
    
    Updated and improved ARMv8 micro-kernels.

commit af92773f4f85a2441fe0c6e3a52c31b07253d08e
Author: figual <figual@ucm.es>
Date:   Wed Mar 23 22:07:02 2016 +0100

    Updated and improved ARMv8 micro-kernels.

commit a4d7729776d17d9bdf2341eacd70b9770b9ba8d2
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Mon Mar 21 09:55:21 2016 -0500

    Set default value for debug_type variable.

commit 0e2447fa55d8c5fa2b1fc4150073512495c5f9eb
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu Mar 17 16:32:05 2016 -0500

    Add const correctness to auxinfo_t struct (microkernels need update theoretically).

commit 1d1a426d18ec03754021456862a1f4d1dfec1fbf
Merge: 5a978fff d226dfa0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 7 15:17:53 2016 -0600

    Merge pull request #46 from devinamatthews/new-config-opts
    
    Add several changes to the build system.

commit d226dfa05190eb477b33563b1edccf8603973336
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Sat Mar 5 16:18:14 2016 -0600

    Add several changes to the build system.
    
    1) Add -- options.
    2) Add -d/--enable-debug option to enable debugging symbols with and without optimization.
    3) Allow user to specify CC at configure time, and determine vendor (gcc/icc/etc.). For now configurations enforce a particular vendor.
    4) Add make V=[0,1] option to control build verbosity.

commit 5a978fffdb8f09a81c89541d541d4a6830cd70a4
Merge: adb2b4e0 63e26423
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 4 17:26:58 2016 -0600

    Merge pull request #45 from devinamatthews/high_prec_timers
    
    Use clock_gettime(CLOCK_MONOTONIC) and mach_absolute_time instead of gettimeofday

commit 63e264239053b913164a849dd8a45829087eaddc
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Mar 4 13:17:50 2016 -0600

    Make sure that -lrt is linked on Linux.

commit 44fddd48dc1708a956803d1948f04429ec0d8700
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Fri Mar 4 12:36:38 2016 -0600

    Add missing \.

commit 7cabd2131f953de23e7015d760b0ddfda51b1251
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu Mar 3 11:43:07 2016 -0600

    Use clock_gettime(CLOCK_MONOTONIC) and mach_absolute_time instead of gettimeofday.
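
    A minimal sketch of a wall-clock timer along these lines. The function
    name sketch_clock() is hypothetical and this is not the BLIS
    implementation. It assumes a POSIX system providing CLOCK_MONOTONIC, or
    macOS with mach_absolute_time().

```c
#if !defined(__APPLE__) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L /* expose clock_gettime() in strict modes */
#endif

#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__APPLE__)
#include <mach/mach_time.h>
#endif

/* Returns elapsed seconds from an arbitrary, fixed starting point.
   Only differences between two return values are meaningful. */
static double sketch_clock( void )
{
#if defined(__APPLE__)
    /* Note: this lazy initialization is not thread-safe. */
    static mach_timebase_info_data_t timebase;
    uint64_t ticks;

    if ( timebase.denom == 0 ) mach_timebase_info( &timebase );

    ticks = mach_absolute_time();
    return ( double )ticks * ( double )timebase.numer
                           / ( double )timebase.denom * 1.0e-9;
#else
    struct timespec ts;
    int rc = clock_gettime( CLOCK_MONOTONIC, &ts );

    assert( rc == 0 );
    ( void )rc; /* silence unused warning when NDEBUG is defined */

    return ( double )ts.tv_sec + ( double )ts.tv_nsec * 1.0e-9;
#endif
}
```

    Unlike gettimeofday(), a monotonic clock is not affected by adjustments
    to the system time, so intervals measured with it cannot come out
    negative. On older glibc versions clock_gettime() lives in librt.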

commit adb2b4e096c78e8b2f85fd372cf0d5eb04af5be8
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Mar 2 14:48:12 2016 -0600

    Fixing guard for non implemented partitioning through packed matrices

commit 4ca5d5b1fd6f2e4a8b2e139c5405475239581e51
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
Date:   Tue Mar 1 21:33:01 2016 +0100

    sgemm micro-kernel for FMA4 instruction set (bulldozer configuration), based on x86_64/avx micro-kernel

commit 627d59b5ba06866b26f46e4434a0435b600925e3
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
Date:   Mon Feb 29 21:53:12 2016 +0100

    symbolic link for bulldozer configuration to kernels

commit 2dc5c0ae038ed175fab85751803ada05734d1ba1
Merge: f2809fc5 3d0fae81
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 29 12:22:51 2016 -0600

    Merge pull request #40 from tkelman/bulldozer-symlink
    
    Add symlink from config/bulldozer/kernels to kernels/x86_64/bulldozer

commit f2809fc5f74466c755da6a5b4632853e634060b5
Merge: f86b94f2 8624a33c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Feb 27 13:06:03 2016 -0600

    Merge pull request #39 from devinamatthews/fix_f2c_conflicts
    
    Devin's f2c type namespace update.
    
    Details:
    - Added "bla_" prefix to f2c type names to prevent conflicts with external user code.
    - Removed most of the body of bli_f2c.h, which was unused.

commit 3d0fae810d942085d8f2d389820b4e0027577db8
Author: Tony Kelman <tony@kelman.net>
Date:   Thu Feb 25 23:24:03 2016 -0800

    Add symlink from config/bulldozer/kernels to kernels/x86_64/bulldozer
    
    to fix linking issue mentioned in #37 and https://groups.google.com/forum/#!topic/blis-devel/iypwljcaeEI

commit 8624a33ccc12dff6f6c4f92992ca5636af1576a6
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu Feb 25 13:51:26 2016 -0600

    Fix remaining f2c conflicts.

commit 372eef0b6c0a535bf88d4b46b72f61266e8491ba
Author: Devin Matthews <dmatthews@utexas.edu>
Date:   Thu Feb 25 12:01:58 2016 -0600

     Fixed most conflicts after hack-n-slash of bli_f2c.h, cleanup in
    progress.

commit f86b94f206e2e09fa3221cc55c3dc5b05ca4775a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 23 18:12:34 2016 -0600

    Included missing blas2blis integer def to CBLAS.
    
    Details:
    - Added #include "bli_config_macro_defs" to all cblas_*.c files in
      compat/cblas/src. This has the effect of defining
      BLIS_BLAS2BLIS_INT_TYPE_SIZE to the default value if bli_config.h does
      not define it. Thanks to Tony Kelman for reporting this bug.
    - In cblas_i?amax.c, changed the type of the variable 'iamax' from 'int'
      to 'f77_int'. This eliminates a compiler warning and a potential
      runtime bug and/or crash when the size of an int differs from the size
      of f77_int (as determined by BLIS_BLAS2BLIS_INT_TYPE_SIZE).
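
    The hazard in the second bullet can be sketched as follows, assuming the
    result is written through a pointer by a Fortran-style routine (an
    assumption about the call structure, which is not spelled out above).
    All sk_* names are hypothetical, and sk_f77_int is fixed at 64 bits here
    purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for f77_int, assumed here to be 64 bits wide. */
typedef int64_t sk_f77_int;

/* Fortran-style routine: stores the 1-based index of the element with
   the largest absolute value through 'result' (0 if n is 0). */
static void sk_idamax_sub( sk_f77_int n, const double* x, sk_f77_int* result )
{
    sk_f77_int i;
    sk_f77_int best     = 0;
    double     best_abs = -1.0;

    for ( i = 0; i < n; ++i )
    {
        double a = ( x[ i ] < 0.0 ) ? -x[ i ] : x[ i ];
        if ( a > best_abs ) { best_abs = a; best = i + 1; }
    }

    *result = best; /* writes sizeof( sk_f77_int ) bytes */
}

/* C-style wrapper returning a 0-based index. */
static sk_f77_int sk_cblas_idamax( sk_f77_int n, const double* x )
{
    /* Declared as sk_f77_int, not int: the callee writes a full
       sk_f77_int, which would overrun a 4-byte int variable. */
    sk_f77_int iamax = 0;

    assert( n >= 0 );
    sk_idamax_sub( n, x, &iamax );

    return ( iamax > 0 ) ? iamax - 1 : 0;
}
```

    Passing the address of an 'int' to such a routine compiles only with a
    warning (or a cast), yet corrupts adjacent stack memory whenever the two
    types differ in size.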

commit 0b126de1342c11c65623bcb38e258e21e9244e3d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 13 16:29:12 2015 -0600

    Consolidated packm_blk_var1 and packm_blk_var2.
    
    Details:
    - Consolidated the two blocked variants for packm into a single
      implementation (packm_blk_var1) and removed the other variant.
    - Updated all induced method _cntl_init() functions in frame/cntl/ind/
      to use the new blocked variant 1.
    - Defined two new macros, bli_is_ind_packed() and bli_is_nat_packed(),
      to detect pack_t schemas for induced methods and native execution,
      respectively.

commit 30e5eb29e060b97752f702d2ea5d101d950f53b2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 13 12:14:19 2015 -0600

    Minor changes to treatment of rs, cs in bli_obj.c.
    
    Details:
    - Applied a patch submitted by Devin Matthews that:
      - implements subtle changes to handling of somewhat unusual cases of
        row and column strides to accommodate certain tensor cases, which
        includes adding dimension parameters to _is_col_tilted() and
        _is_row_tilted() macros,
      - simplifies how buffers are sized when requesting BLIS-allocated
        objects,
      - re-consolidates bli_adjust_strides_*() into one function, and
      - defines 'restrict' keyword as a "nothing" macro for C++ and pre-C99
        environments.
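
    The last item can be sketched as below. The preprocessor condition is an
    assumption about how such a guard might be written, not the actual text
    of the patch, and sk_daxpy() is a hypothetical function used only to
    show a declaration that compiles either way.

```c
#include <assert.h>
#include <stddef.h>

/* 'restrict' is a C99 keyword. In C++ and pre-C99 C it is unknown, so
   define it away there and let the same declarations compile everywhere. */
#if defined(__cplusplus) || !defined(__STDC_VERSION__) || \
    ( __STDC_VERSION__ < 199901L )
#define restrict
#endif

/* y := y + alpha * x, where x and y are promised not to overlap. */
static void sk_daxpy( size_t n, double alpha,
                      const double* restrict x,
                      double*       restrict y )
{
    size_t i;

    assert( n == 0 || ( x != NULL && y != NULL ) );

    for ( i = 0; i < n; ++i )
        y[ i ] += alpha * x[ i ];
}
```

    Where the macro expands to nothing the code remains correct; the
    compiler simply loses the no-aliasing hint.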

commit f0a4f41b5acf55b41707ec821c4c5f9076dfbc24
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 12 15:22:50 2015 -0600

    Fixed unimplemented case in core2 sgemm ukernel.
    
    Details:
    - Implemented the "beta == 0" case for general stride output for the
      dunnington sgemm micro-kernel. This case had been, up until now,
      identical to the "beta != 0" case, which does not work when the
      output matrix has nan's and inf's. It had manifested as nan residuals
      in the test suite for right-side tests of ctrsm4m1a. Thanks to Devin
      Matthews for reporting this bug.
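
    Why the two cases must differ can be sketched in plain C (the actual
    micro-kernel is written in assembly; sk_update_c() and its parameters
    are hypothetical). In IEEE arithmetic 0 * nan and 0 * inf both yield
    nan, so when beta is zero the output must be overwritten rather than
    scaled.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Computes c := beta * c + ab for an m x n tile. 'ab' is a contiguous
   column-major temporary; 'c' is accessed with general strides. */
static void sk_update_c( size_t m, size_t n, float beta,
                         const float* ab,
                         float* c, size_t rs_c, size_t cs_c )
{
    size_t i, j;

    assert( ab != NULL && c != NULL );

    for ( j = 0; j < n; ++j )
    {
        for ( i = 0; i < m; ++i )
        {
            float* cij = c + i * rs_c + j * cs_c;

            if ( beta == 0.0f )
                *cij = ab[ i + j * m ];                   /* never read c */
            else
                *cij = beta * ( *cij ) + ab[ i + j * m ]; /* scale, add   */
        }
    }
}
```

    With the overwrite branch in place, uninitialized or non-finite values
    in the output matrix no longer leak into the result when beta is zero.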

commit 42810bbfa0b8f006ecc5128d903909ec13ea63f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 12 12:07:46 2015 -0600

    Fixed minor bugs for uncommon obj_create cases.
    
    Details:
    - Separated bli_adjust_strides() into _alloc() and _attach() flavors so
      that the latter can avoid a test performed by the former, in which the
      rs and cs are overridden and set to zero if either matrix dimension is
      zero. Actually, we also disable this overriding behavior, even for the
      _alloc() case, since keeping the original strides (probably) does not
      hurt anything. The original code has been kept commented-out, though,
      in case an unintended consequence is later discovered.
    - Fixed a typo in an error check for general stride cases where rs == cs.

commit 3e6dd11467643fbc2cb45c13cec8dd6024232833
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 3 10:30:08 2015 -0600

    Minor re-expression in quadratic partitioning code.
    
    Details:
    - Minor change to quadratic equation solution code that avoids
      recomputation of the sqrt() parameter when the compiler is not
      smart enough to perform this optimization automatically.

commit 0694b722f7e4df00efb32639095a2aca80e67f52
Merge: 3e116f0a 33557ecc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 2 17:24:25 2015 -0600

    Merge branch 'master' of github.com:flame/blis

commit 3e116f0a2953f50b3c068759a775ad7ffae04e49
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 2 17:18:23 2015 -0600

    Fixed imaginary bug in quadratic partitioning code.
    
    Details:
    - Fixed a bug in the relatively new quadratic partitioning code that,
      under the right conditions, would perform sqrt() on a negative value.
      If the solution is imaginary, we discard it and use an alternate
      partition width that assumes no diagonal intersection. That alternate
      width is actually already computed, so, the fix was quite simple.
      Thanks to Devangi Parikh for reporting this bug.

commit 33557ecccaf49b2569b7f3d7bcea52c2aab94c68
Author: Jeff Hammond <jeff.science@gmail.com>
Date:   Mon Nov 2 12:18:43 2015 -0800

    add Travis CI build status icon to the README

commit 4a502fbe77bd0f701108baaa559d9cfb483f88de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 2 13:28:34 2015 -0600

    Laid groundwork for runtime memory pool resizing.
    
    Details:
    - Changed bli_pool_finalize() so that the freeing begins with the block
      at top_index instead of block 0. This allows us to use the function
      for terminal finalization as well as temporary cleanup prior to
      reinitialization. Also, clear the pool_t struct upon _pool_finalize()
      in case it is called in the terminal case with some blocks still
      checked out to threads (in which case the threads will see the new
      block size as 0 and thus release the block as intended).
    - Added bli_pool_reinit(), which calls _pool_finalize() followed by
      _pool_init() with new parameters.
    - Added bli_mem_reinit(), which is based on bli_pool_reinit().
    - Added new wrapper, _mem_compute_pool_block_sizes(), which calls
      _mem_compute_pool_block_sizes_dt().
    - Updated bli_mem_release() so that the pblk_t is freed, via
      _pool_free_block(), if the block size recorded in the mem_t at the
      time the pblk_t was acquired is now different from the value in the
      pool_t.

commit 37e55ca39bdbddaec03ad30d43e8ad2b3e549c96
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 30 18:25:04 2015 -0500

    Fixed obscure 3m1/4m1a bugs in trmm[3] and trsm.
    
    Details:
    - Fixed a family of bugs in the triangular level-3 operations for
      certain complex implementations (3m1 and 4m1a) that only manifest if
      one of the register blocksizes (PACKMR/PACKNR, actually) is odd:
      - Fixed incorrect imaginary stride computation in bli_packm_blk_var2()
        for the triangular case.
      - Fixed the incorrect computation of imaginary stride, as stored in
        the auxinfo_t struct in trmm and trsm macro-kernels.
      - Fixed incorrect pointer arithmetic in the trsm macro-kernels in the
        cases where the register blocksize for the triangular matrix is
        odd. Introduced a new byte-granular pointer arithmetic macro,
        bli_ptr_add(), that computes the correct value.
    - Added cpp macro to bli_macro_defs.h for typeof() operator, defined in
      terms of __typeof__, which is used by bli_ptr_add() macro.
    - Disabled the row- vs. column-storage optimization in bli_trmm_front()
      for singleton problems because the inherent ambiguity of whether a
      scalar is row-stored or column-stored causes the wrong parameter
      combination code to be executed (by dumb luck of our checking for
      row storage first).
    - Added commented-out debugging lines to 3m1/4m1a and reference
      micro-kernels, and trsm_ll macro-kernel.
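
    A byte-granular pointer-advance macro of the kind mentioned above might
    look like the following. This is a sketch, not the actual definition of
    bli_ptr_add(); the sk_* names are hypothetical, and __typeof__ is a
    compiler extension (gcc, clang, icc). One reading of the odd-blocksize
    bug is that an offset of an odd number of real elements is not a whole
    number of complex elements, so it cannot be expressed by ordinary
    arithmetic on a complex-typed pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double real; double imag; } sk_dcomplex;

/* Advance 'ptr' by 'n_bytes' bytes while preserving its pointer type. */
#define sk_ptr_add( ptr, n_bytes ) \
    ( ( __typeof__( ptr ) )( ( char* )( ptr ) + ( n_bytes ) ) )

/* Advance a complex-typed pointer by a count of real elements, which
   may land halfway through a complex element when n_reals is odd. */
static sk_dcomplex* sk_advance_by_reals( sk_dcomplex* p, size_t n_reals )
{
    assert( p != NULL );
    return sk_ptr_add( p, n_reals * sizeof( double ) );
}
```

    By contrast, "p + 3 / 2" on an sk_dcomplex pointer truncates to p + 1
    and lands 16 bytes in, not the intended 24.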

commit 46294d80e5a79c598e200e1c8ec2a642ff839971
Merge: d3159c57 a0a7b85a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Oct 27 12:41:23 2015 -0500

    Merge pull request #35 from figual/master
    
    Fixed incomplete code in the double precision ARMv8 microkernel.

commit a0a7b85ac3e157af53cff8db0e008f4a3f90372c
Author: Francisco Igual <figual@ucm.es>
Date:   Tue Oct 27 08:59:15 2015 +0000

    Fixed incomplete code in the double precision ARMv8 microkernel.

commit d3159c5740c9ee7f8c0b661003aab6f00646ad6f
Merge: b489152e 7e03e45b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 21 14:54:00 2015 -0500

    Merge branch 'master' of github.com:flame/blis

commit b489152e112644ec3b6d19e687231a9607f7694f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 21 14:53:17 2015 -0500

    Use vzeroall in haswell micro-kernels.

commit 7e03e45bfe6c27c4fdbf06b1caa7f49e9a5fef49
Merge: 77ddb0b1 4f88c29f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 14 13:26:07 2015 -0500

    Merge pull request #33 from xianyi/master
    
    Enable Travis CI

commit 4f88c29f9e634cbb6fb22d8c88931f0ec78ad7db
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date:   Wed Oct 14 12:57:50 2015 -0500

    Detect Intel Broadwell (using Haswell config).

commit 4b0ac1a9984a93f7ad4369b10fca63991107d9f5
Merge: fe3e355c 77ddb0b1
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date:   Wed Oct 14 12:51:05 2015 -0500

    Merge branch 'upstream_master'

commit 77ddb0b1d31ada111dadf392766ba6d9210ed9fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Oct 13 12:53:06 2015 -0500

    Removed flop-counting mechanism.
    
    Details:
    - Removed the optional flop-counting feature introduced in commit
      7574c994.

commit 276da366187460a4c8e6e0910e79cb39ce780bfe
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 12 11:43:03 2015 -0500

    Minor formatting change to README.md.

commit d17057446f5404824478e8a6cd08f242ab75544a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 12 11:39:49 2015 -0500

    Added "Getting Started" section to README.md.
    
    Details:
    - Added section to README.md file containing links to wikis with brief
      descriptions.

commit e7e1f2f7b601b21b50e3cdad8972cb3fe11018d3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 2 16:51:52 2015 -0500

    Minor updates to CREDITS, README files.

commit 55329906ecd7ce1ab910e4d30a29354a9172e7ea
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Sep 26 20:47:19 2015 -0500

    Minor edits to README.md, testsuite.
    
    Details:
    - Fixed typos in README.md.
    - Fixed column heading alignment for testsuite when matlab output is
      enabled.
    - Minor updates to test/3m4m/runme.sh and test/3m4m/Makefile.

commit bbebdb5793a8fd6aaf257012ab0272beaa04a0de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Sep 25 14:47:27 2015 -0500

    Replaced README with README.md.
    
    Details:
    - Replaced the old (and short) README file with a much more comprehensive
      version written in github-flavored markdown. The new file is based on
      content taken from the old Google Code homepage.

commit e2e9d64a63485461192d9c2a6dd0183a8b71013c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 24 12:14:03 2015 -0500

    Load balance thread ranges for arbitrary diagonals.
    
    Details:
    - Expanded/updated interface for bli_get_range_weighted() and
      bli_get_range() so that the direction of movement is specified in the
      function name (e.g. bli_get_range_l2r(), bli_get_range_weighted_t2b())
      and also so that the object being partitioned is passed instead of an
      uplo parameter. Updated invocations in level-3 blocked variants, as
      appropriate.
    - (Re)implemented bli_get_range_*() and bli_get_range_weighted_*() to
      carefully take into account the location of the diagonal when computing
      ranges so that the area of each subpartition (which, in all present
      level-3 operations, is proportional to the amount of computation
      engendered) is as equal as possible.
    - Added calls to a new class of routines to all non-gemm level-3 blocked
      variants:
        bli_<oper>_prune_unref_mparts_[mnk]()
      where <oper> is herk, trmm, or trsm and [mnk] is chosen based on which
      dimension is being partitioned. These routines call a more basic
      routine, bli_prune_unref_mparts(), to prune unreferenced/unstored
      regions from matrices and simultaneously adjust other matrices which
      share the same dimension accordingly.
    - Simplified herk_blk_var2f, trmm_blk_var1f/b as a result of the new
      pruning routines.
    - Fixed incorrect blocking factors passed into bli_get_range_*() in
      bli_trsm_blk_var[12][fb].c
    - Added a new test driver in test/thread_ranges that can exercise the new
      bli_get_range_*() and bli_get_range_weighted_*() under a range of
      conditions.
    - Reimplemented m and n fields of obj_t as elements in a "dim"
      array field so that dimensions could be queried via index constant
      (e.g. BLIS_M, BLIS_N). Adjusted/added query and modification
      macros accordingly.
    - Defined mdim_t type to enumerate BLIS_M and BLIS_N indexing values.
    - Added bli_round() macro, which calls C math library function round(),
      and bli_round_to_mult(), which rounds a value to the nearest multiple
      of some other value.
    - Added miscellaneous pruning- and mdim_t-related macros.
    - Renamed bli_obj_row_offset(), bli_obj_col_offset() macros to
      bli_obj_row_off(), bli_obj_col_off().
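    
    Example (illustrative sketch only, not the BLIS macro): the commit says
    bli_round_to_mult() rounds a value to the nearest multiple of another
    value, via round(). The helper below is a hypothetical integer-only
    variant for non-negative inputs; the name round_to_mult_sketch and its
    signature are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical helper: round a non-negative val to the nearest multiple
   of mult. Exact ties (possible only when mult is even) round up. */
static long round_to_mult_sketch(long val, long mult)
{
    assert(val >= 0 && mult > 0);
    return ((val + mult / 2) / mult) * mult;
}
```

    With mult = 4, the values 9, 10, and 11 would round to 8, 12, and 12,
    respectively.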

commit fe3e355c9c5a6f65b8736b009e2d501b62a83ea1
Merge: efa641e3 4dd9dd3e
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date:   Fri Aug 21 14:38:36 2015 -0500

    Merge branch 'upstream_master'

commit efa641e36b73abee34166a252e90e28a6281d92d
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date:   Sat Aug 22 03:15:50 2015 +0800

    Try to fix the compiling bug on travis.

commit 4dd9dd3e1de626b51bfe85d9ee65f193d60e8d38
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 21 11:52:37 2015 -0500

    Fixed minor alignment ambiguity bug in bli_pool.c.
    
    Details:
    - Fixed a typecasting ambiguity in bli_pool_alloc_block() in which
      pointer arithmetic was performed on a void* as if it were a byte
      pointer (such as char*). Some compilers may have already been
      interpreting this situation as intended, despite the sloppiness.
      Thanks to Aleksei Rechinskii for reporting this issue.
    - Redefined pointer alignment macros to typecast to uintptr_t instead of
      siz_t.
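    
    Example (a minimal sketch, not the code in bli_pool.c): arithmetic on a
    void* is not defined by standard C, so the byte offset is applied
    through a char*, and the address is inspected as a uintptr_t. The
    function name align_ptr_up and the power-of-two restriction are
    assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the first address >= p that is a multiple
   of align, where align must be a power of two. */
static void *align_ptr_up(void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t addr    = (uintptr_t)p;
    uintptr_t aligned = (addr + (align - 1)) & ~(uintptr_t)(align - 1);

    /* Apply the byte offset to a char*, never to the void* itself. */
    return (char *)p + (aligned - addr);
}
```

    The caller is responsible for making sure the underlying buffer has at
    least align - 1 spare bytes past p.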

commit 12ffd568b04feda57147c13b67717416a01c82f8
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date:   Sat Aug 22 00:24:28 2015 +0800

    Add Travis CI.

commit ecc3ebb749e0861c27deda52b5f87236ede4901b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 29 13:31:12 2015 -0500

    CHANGELOG update (0.1.8)

commit 47caa33485b91ea6f2a5e386e61210c90c5f489f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 29 13:31:09 2015 -0500

    Version file update (0.1.8)

commit ef0fbbbdb6148b96938733fce72cb4ed7dad685e
Merge: fdfe14f1 d4b89136
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 9 13:54:54 2015 -0500

    Merge branch 'master' of github.com:flame/blis

commit fdfe14f1e17ba5a2f8dfa0bdb799c6b0e730211b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 9 13:52:39 2015 -0500

    Added support for Intel Haswell/Broadwell.
    
    Details:
    - Added sgemm and dgemm micro-kernels, which employ 256-bit AVX vectors
      and FMA instructions. (Complex support is currently provided by default
      induced method, 4m1a.)
    - Added a 'haswell' configuration, which uses the aforementioned kernels.
    - Inserted auto-detection support for haswell configuration in
      build/auto-detect/cpuid_x86.c.
    - Modified configure script to explicitly echo when automatic or manual
      configuration is in progress.
    - Changed beta scalar in test_gemm.c module of test suite from -1.0 to 0.9.

commit d4b891369c1eb0879ade662ff896a5b9a7fca207
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 7 10:06:53 2015 -0500

    Added 'carrizo' configuration.
    
    Details:
    - Added a new configuration for AMD Excavator-based hardware also known
      as Carrizo when referring to the entire APU. This configuration uses
      the same micro-kernels as the piledriver, but with different
      cache blocksizes.

commit 0b7255a642d56723f02d7ca1f8f21809967b8515
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jun 19 12:01:50 2015 -0500

    CHANGELOG update (0.1.7)

commit 267253de8a7be546ce87626443ee38701c1d411f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jun 19 12:01:49 2015 -0500

    Version file update (0.1.7)

commit 7cd01b71b5e757a6774625b3c9f427f5e7664a76
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jun 19 11:31:53 2015 -0500

    Implemented dynamic allocation for packing buffers.
    
    Details:
    - Replaced the old memory allocator, which was based on statically-
      allocated arrays, with one based on a new internal pool_t type, which,
      combined with a new bli_pool_*() API, provides a new abstract data
      type that implements the same memory pool functionality but with blocks
      from the heap (i.e., malloc() or equivalent). Hiding the details of the
      pool in a separate API also allows for a much simpler bli_mem.c family
      of functions.
    - Added a new internal header, bli_config_macro_defs.h, which enables
      sane defaults for the values previously found in bli_config. Those
      values can be overridden by #defining them in bli_config.h the same
      way kernel defaults can be overridden in bli_kernel.h. This file most
      resembles what was previously a typical configuration's bli_config.h.
    - Added a new configuration macro, BLIS_POOL_ADDR_ALIGN_SIZE, which
      defaults to BLIS_PAGE_SIZE, to specify the alignment of individual
      blocks in the memory pool. Also added a corresponding query routine to
      the bli_info API.
    - Deprecated (once again) the micro-panel alignment feature. Upon further
      reflection, it seems that the goal of more predictable L1 cache
      replacement behavior is outweighed by the harm caused by non-contiguous
      micro-panels when k % kc != 0. I honestly don't think anyone will even
      miss this feature.
    - Changed bli_ukr_get_funcs() and bli_ukr_get_ref_funcs() to call
      bli_cntl_init() instead of bli_init().
    - Removed query functions from bli_info.c that are no longer applicable
      given the dynamic memory allocator.
    - Removed unnecessary definitions from configurations' bli_config.h files,
      which are now pleasantly sparse.
    - Fixed incorrect flop counts in addv, subv, scal2v, scal2m testsuite
      modules. Thanks to Devangi Parikh for pointing out these
      miscalculations.
    - Comment, whitespace changes.

commit 9848f255a3bab17d1139c391cca13ff3f1ffe6ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 11 19:14:22 2015 -0500

    Added early return to API-level _init() routines.
    
    Details:
    - Added conditional code that returns early from the API-level _init()
      routines if the API is already initialized. Actually meant for this to
      be included in 5f93cbe8.
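    
    Example (a sketch under stated assumptions, not BLIS source): the
    early-return guard can be pictured as a per-API flag that _init() sets
    and _finalize() clears. The "foo" API below, including foo_init_count,
    is entirely hypothetical and exists only to show the pattern. It is
    not thread-safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical "foo" API; none of these names are BLIS symbols. */
static bool foo_initialized = false;
static int  foo_init_count  = 0;  /* counts real initializations (demo only) */

static bool foo_is_initialized(void)
{
    return foo_initialized;
}

static void foo_init(void)
{
    /* Early return: a second call must not redo the setup work. */
    if (foo_is_initialized()) return;

    foo_init_count++;             /* stands in for the actual setup */
    foo_initialized = true;
}

static void foo_finalize(void)
{
    foo_initialized = false;
}
```

    Calling foo_init() twice in a row leaves foo_init_count at 1, and a
    query routine can consult foo_is_initialized() instead of triggering
    initialization again.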

commit 5f93cbe870f3478870e15581e7fd450dad5bba1e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 11 18:52:12 2015 -0500

    Introduced API-level initialization.
    
    Details:
    - Added API-level initialization state to _const, _error, _mem, _thread,
      _ind, and _cntl APIs. While this functionality will mostly go unused,
      adding minuscule overhead at init-time, there will be at least one
      instance in the near future where, in order to avoid an infinite loop,
      a certain portion of the initialization will call a query function that
      itself attempts to call bli_init(). API-level initialization will allow
      this later stage to verify that an earlier stage of initialization has
      completed, even if the overall call to bli_init() has not yet returned.
    - Added _is_initialized() functions for each API, setting the underlying
      bool_t during _init() and unsetting it during _finalize().
    - Comment, whitespace changes.

commit ee129c6b028bc5ac88da7c74fde72c49803742ff
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 10 12:53:28 2015 -0500

    Fixed bugs in _get_range(), _get_range_weighted().
    
    Details:
    - Fixed some bugs that only manifested in multithreaded instances of
      some (non-gemm) level-3 operations. The bugs were related to invalid
      allocation of "edge" cases to thread subpartitions. (Here, we define
      an "edge" case to be one where the dimension being partitioned for
      parallelism is not a whole multiple of whatever register blocksize
      is needed in that dimension.) In BLIS, we always require edge cases
      to be part of the bottom, right, or bottom-right subpartitions.
      (This is so that zero-padding only has to happen at the bottom, right,
      or bottom-right edges of micro-panels.) The previous implementations
      of bli_get_range() and _get_range_weighted() did not adhere to this
      implicit policy and thus produced bad ranges for some combinations of
      operation, parameter cases, problem sizes, and n-way parallelism.
    - As part of the above fix, the functions bli_get_range() and
      _get_range_weighted() have been renamed to use _l2r, _r2l, _t2b,
      and _b2t suffixes, similar to the partitioning functions. This is
      an easy way to make sure that the variants are calling the right
      version of each function. The function signatures have also been
      changed slightly.
    - Comment/whitespace updates.
    - Removed unnecessary '/' from macros in bli_obj_macro_defs.h.
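    
    Example (simplified, unweighted sketch; not the BLIS implementation):
    the policy that edge cases belong to the right-most (or bottom-most)
    subpartition can be illustrated for a left-to-right split. Whole
    blocks of size bf are divided among the threads, and any leftover
    elements are appended to the last range. The function name
    range_l2r_sketch and its signature are assumptions.

```c
#include <assert.h>

/* Hypothetical helper: compute [*start, *end) for thread id when a
   dimension of n elements is split n_way ways in units of bf. */
static void range_l2r_sketch(long n, long bf, long n_way, long id,
                             long *start, long *end)
{
    assert(n >= 0 && bf > 0 && n_way > 0 && id >= 0 && id < n_way);

    long n_whole = n / bf;           /* whole blocks of size bf */
    long n_left  = n % bf;           /* edge elements (may be zero) */
    long lo      = n_whole / n_way;  /* minimum whole blocks per thread */
    long n_hi    = n_whole % n_way;  /* threads that take one extra block */

    long first_blk = (id < n_hi) ? id * (lo + 1)
                                 : n_hi * (lo + 1) + (id - n_hi) * lo;
    long my_blks   = (id < n_hi) ? lo + 1 : lo;

    *start = first_blk * bf;
    *end   = (first_blk + my_blks) * bf;

    /* The edge always goes to the last (right-most) range. */
    if (id == n_way - 1) *end += n_left;
}
```

    For n = 22, bf = 4, and three threads, this yields the ranges [0,8),
    [8,16), and [16,22): only the last range has a length that is not a
    multiple of bf.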

commit 9135dfd69d39f3bbd75034f479f27a78dbfebcce
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jun 5 13:37:44 2015 -0500

    Minor updates to test/3m4m files.

commit d62ceece943b20537ec4dd99f25136b9ba2ae340
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 3 12:56:45 2015 -0500

    Minor update to test/3m4m/runme.sh.
    
    Details:
    - Removed some stale script code that should have been removed
      during 590bb3b8c.

commit b6ee82a3d421c9c4f1eb6848c7c6e37aa46de799
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 3 12:14:23 2015 -0500

    Minor cleanup to bli_init() and friends.
    
    Details:
    - Spun-off initialization of global scalar constants to bli_const_init()
      and of threading stuff to bli_thread_init().
    - Added some missing _finalize() functions, even when there is nothing
      to do.

commit 1213f5cebabc1637ce9dd45c4bfa87bb93677c29
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jun 2 13:27:47 2015 -0500

    POSIX thread bugfixes/edits to bli_init.c, _mem.c.
    
    Details:
    - Fixed a sort-of bug in bli_init.c whereby the wrong pthread mutex
      was used to lock access to initialization/finalization actions.
      But everything worked out okay as long as bli_init() was called by
      single-threaded code.
    - Changed to static initialization for memory allocator mutex in
      bli_mem.c, and moved mutex to that file (from bli_init.c).
    - Fixed some type mismatches in bli_threading_pthreads.c that resulted
      in compiler warnings.
    - Fixed a small memory leak with allocated-but-never-freed (and unused)
      pthread_attr_t objects.
    - Whitespace changes to bli_init.c and bli_mem.c.

commit 590bb3b8c5c0389159c5a9451b6c156c5f237e8a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun May 24 16:02:53 2015 -0500

    Backed-out adjusted dim changes to test/3m4m.
    
    Details:
    - Reverted most changes applied during commit ec25807b.

commit ec25807b26da943868f0d0517c3720e50181b8f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 10 13:23:50 2015 -0500

    Tweaks to test/3m4m to test with adjusted dims.
    
    Details:
    - Updated test/3m4m driver files to build test drivers that allow
      comparison of real "asm_blis" results to complex "asm_blis" results,
      except with the latter's problem sizes adjusted so that problems are
      generated with equal flop counts.

commit 426b6488580a92bf071a62dc319a9c837ce39821
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 8 15:12:21 2015 -0500

    Fixed a packing bug that manifested in trsm_r.
    
    Details:
    - Fixed a bug that caused a memory leak in the contiguous memory
      allocator. Because packm_init() was using simple aliasing when
      a subpartition object was marked as zeros by bli_acquire_mpart_*(),
      the "destination" pack object's mem_t entry was being overwritten
      by the corresponding field of the "source" object (which was likely
      NULL). This prevented the block from being released back to the
      memory allocator. But this bug only manifested when changing the
      location of packing B from outside the var1 loop to inside the
      var3 loop, and only for trsm with triangular B (side = right). The
      bug was fixed by changing the type of alias used in packm_init()
      when handling zero partition cases. Specifically, we now use
      bli_obj_alias_for_packing(), which does not clobber the destination
      (pack) object's mem_t field. Thanks to Devangi Parikh for this bug
      report.

commit c84286d5cef48f16d83831baac1f46b9856b9a36
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 4 15:39:14 2015 -0500

    More minor tweaks to test/3m4m.
    
    Details:
    - Added a line of output that forces matlab to allocate the entire array
      up-front.
    - Re-enabled real domain benchmarks in runme.sh, which were temporarily
      disabled.

commit 309717c8ebf4ef1369f15cf41340e13c25b41573
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 3 19:28:49 2015 -0500

    More tweaks to test/3m4m, configurations.
    
    Details:
    - Fixed incorrect number of mc_x_kc memory blocks in
      sandybridge/bli_config.h.
    - Enabled OpenMP multithreading in piledriver/bli_config.h.
    - More updates to test/3m4m driver files.

commit 4baf3b9c69b2f648be9e46e07ccc9859dd675828
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 3 16:44:32 2015 -0500

    Tweaked test/3m4m driver, including acml support.
    
    Details:
    - Added ACML support to test/3m4m driver Makefile and runme.sh script.

commit a32f7c49ca4ea869d2a6c66818780f4321743d67
Merge: 349e075a 4bfd1ce8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 3 08:28:11 2015 -0500

    Merge pull request #23 from xianyi/master
    
    Add auto-detecting CPU on configure stage.

commit 349e075ad6a8e2a1211d94f36d24828c9d44b052
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 2 18:12:28 2015 -0500

    Tweaks to sandybridge config, test/3m4m driver.
    
    Details:
    - Enable OpenMP support by default in sandybridge's bli_config.h.
    - Reorganized sandybridge's bli_kernel.h.
    - Updated 3m4m Makefile, runme.sh to also test MKL implementation.

commit 4bfd1ce8ca93f93d170dd2715f0a32027b417b46
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date:   Thu Apr 2 16:40:21 2015 -0500

    Detect NEON for cortex-a9 and cortex-a15.

commit aa6eec4f43137057276fe6119bdbfb5c52682527
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date:   Thu Apr 2 16:03:44 2015 -0500

    Detect the CPU architecture. Support ARM cores.
    
    Detect the CPU architecture by compiler's predefined macros.
    Then, detect the CPU cores.
    
    Support detecting x86 and ARM architectures.

commit 2947cfb749c937b0f62fac36cc92f123bd45b53c
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date:   Wed Apr 1 12:24:00 2015 -0500

    Add auto-detecting CPU on configure stage.
    e.g.  /Path_to_BLIS/configure auto
    
    Now, it only supports detecting x86 CPUs.

commit 26a4b8f6f985597f80e0174990bf541f1d9bafac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 1 10:44:54 2015 -0500

    Implemented 3m2, 3m3 induced algorithms (gemm only).
    
    Details:
    - Defined a new "3ms" (separated 3m) pack schema and added appropriate
      support in packm_init(), packm_blk_var2().
    - Generalized packm_struc_cxk_3mi to take the imaginary stride (is_p)
      as an argument instead of computing it locally. Exception: for trmm,
      is_p must be computed locally, since it changes for triangular
      packed matrices. Also exposed is_p in interface to dt-specific
      packm_blk_var2 (and _var1, even though it does not use imaginary
      stride).
    - Renamed many functions/variables from _3mi to _3mis to indicate that
      they work for either interleaved or separated 3m pack schemas.
    - Generalized gemm and herk macro-kernels to pass in imaginary stride
      rather than compute them locally.
    - Added support for 3m2 and 3m3 algorithms to frame/ind, including 3m2-
      and 3m3-specific virtual micro-kernels.
    - Added special gemm macro-kernels to support 3m2 and 3m3.
    - Added support for 3m2 and 3m3 to testsuite.
    - Corrected the type of the panel dimension (pd_) in various macro-
      kernels from inc_t to dim_t.
    - Renamed many functions defined in bli_blocksize.c.
    - Moved most induced-related macro defs from frame/include to
      frame/ind/include.
    - Updated the _ukernel.c files so that the micro-kernel function pointers
      are obtained from the func_t objects rather than the cpp macros that
      define the function names.
    - Updated test/3m4m driver, Makefile, and run script.

commit ddf62ba7d2da08225b201585b85e06c967767dea
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Mar 27 14:27:51 2015 -0500

    Refuse to free the packm thread info if it uses the single threaded version

commit 016fc587584d958a0e430a56a5e2c05022ac2f17
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Mar 27 14:23:02 2015 -0500

    Don't free packm thread info if it is null

commit 00a443c529a60862a57b93e303a0b3212c9b1df4
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Mar 27 14:11:07 2015 -0500

    Use bli_malloc instead of malloc for the thread info paths

commit f1a6b7d02861ccebdc500ea98778cc0f6cddad17
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 18 15:37:10 2015 -0500

    Reorganized code for induced complex methods.
    
    Details:
    - Consolidated most of the code relating to induced complex methods
      (e.g. 4mh, 4m1, 3mh, 3m1, etc.) into frame/ind. Induced methods
      are now enabled on a per-operation basis. The current "available"
      (enabled and implemented) implementation can then be queried on
      an operation basis. Micro-kernel func_t objects as well as blksz_t
      objects can also be queried in a similar manner.
    - Redefined several micro-kernel and operation-related functions in
      bli_info_*() API, in accordance with above changes.
    - Added mr and nr fields to blksz_t object, which point to the mr
      and nr blksz_t objects for each cache blocksize (and are NULL for
      register blocksizes). Renamed the sub-blocksize field "sub" to
      "mult" since it is really expressing a blocksize multiple.
    - Updated bli_*_determine_kc_[fb]() for gemm/hemm/symm, trmm, and
      trsm to correctly query mr and nr (for purposes of nudging kc).
    - Introduced an enumerated opid_t in bli_type_defs.h that uniquely
      identifies an operation. For now, only level-3 id values are defined,
      along with a generic, catch-all BLIS_NOID value.
    - Reworked testsuite so that all induced methods that are enabled
      are tested (one at a time) rather than only testing the first
      available method.
    - Reformatted summary at the beginning of testsuite output so that
      blocksize and micro-kernel info is shown for each induced method
      that was requested (as well as native execution).
    - Reduced the number of columns needed to display non-matlab
      testsuite output (from approx. 90 to 80).

commit 8d5169ccda954e5f72944308a036dcb7ebfc9097
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 18 11:38:08 2015 -0500

    Fixed bug in release of mem_t buffer.
    
    Details:
    - Fixed a bug that affects all level-2 and level-3 blocked variants. The
      bug only manifested, however, if the packing of operands (A and B in
      gemm, for example) spanned multiple nodes in the control tree. Until
      recently, the main consumers of packm were level-3 operations, all of
      which packed both input operands from blocked variant 1 (B outside of
      the loop, and A within the loop). This particular usage masked a flaw
      in the code whereby bli_obj_release_pack() would always release the
      underlying mem_t buffer (provided it was allocated), even if the buffer
      was not allocated in the current variant. This has been fixed by
      replacing all calls to bli_obj_release_pack() with calls to a new
      function, bli_packm_release(), which takes the same control tree node
      argument passed into the object's corresponding call to packm_init()
      or packv_init(). bli_packm_release() then proceeds to invoke
      bli_obj_release_pack() only if the control tree node indicates that
      packing was requested. Thanks to Devangi Parikh for identifying this
      bug.

commit c0acca0f5182ba96fd39c9d10b34a896a6e74206
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 3 10:56:22 2015 -0600

    Clarified comments in testsuite input.operations.

commit 03ba9a6b17861d9e1adc0cf924439c4d7e860d19
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 24 10:33:28 2015 -0600

    Removed some 'old' directories.

commit a86db60ee270cdeb745ae7cf68f9e0becc9f522d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 23 18:42:39 2015 -0600

    Extensive renaming of 3m/4m-related files, symbols.
    
    Details:
    - Renamed all remaining 3m/4m packing files and symbols to 3mi/4mi
      ('i' for "interleaved"). Similar changes to 3M/4M macros.
    - Renamed all 3m/4m files and functions to 3m1/4m1.
    - Whitespace changes.

commit 8cf8da291a0fb2f491f410969a76ec0fbda47faf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 20 15:24:27 2015 -0600

    Minor updates to induced complex mode management.
    
    Details:
    - Relocated bli_4mh.c, bli_4mb.c, bli_4m.c, bli_3mh.c, bli_3m.c (and
      associated headers) from frame/base to frame/base/induced.
    - Added bli_xm.? to frame/base/induced, which implements
      bli_xm_is_enabled(), which detects whether ANY induced complex method
      is currently enabled.
    - The new function bli_xm_is_enabled() is now used in bli_info.c to
      detect when an induced complex method is used, so we know when to
      return blocksizes from one of the induced methods' blocksize objects.

commit 411e637ee7d1083a84f58f08938d51e63d7c3c9a
Merge: c2569b88 fc0b7712
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date:   Fri Feb 20 20:39:25 2015 -0600

    Merge branch 'master' of http://github.com/flame/blis

commit c2569b8803d4ccc1d7b6f391713461b51443601d
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date:   Fri Feb 20 20:38:19 2015 -0600

    Fixed a memory leak in freeing the thread infos

commit fc0b771227abf86d81f505b324f69f6e83db1d8f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 20 11:47:44 2015 -0600

    Added max(mr,nr) to kc in static mem pools.
    
    Details:
    - Changed the static memory definitions to compute the maximum register
      blocksize for each datatype and add it to kc when computing the size
      of blocks of A and B. This formally accounts for the nudging of kc
      up to a multiple of mr or nr at runtime for triangular operations
      (e.g. trmm).
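    
    Example (a sketch of the sizing arithmetic only; the names and the
    exact formula are assumptions, not the macros in the BLIS headers):
    rounding kc up to a multiple of mr or nr adds at most max(mr,nr) - 1
    elements, so padding kc by max(mr,nr) when sizing a block is always
    enough.

```c
#include <assert.h>
#include <stddef.h>

static size_t max_sz(size_t a, size_t b)
{
    return a > b ? a : b;
}

/* Hypothetical helper: round kc up to the next multiple of m. */
static size_t nudge_up(size_t kc, size_t m)
{
    assert(m > 0);
    return ((kc + m - 1) / m) * m;
}

/* Hypothetical helper: elements reserved for one packed mc-by-kc block,
   with kc padded by the larger register blocksize. */
static size_t block_elems_sketch(size_t mc, size_t kc, size_t mr, size_t nr)
{
    assert(mc > 0 && kc > 0 && mr > 0 && nr > 0);
    return mc * (kc + max_sz(mr, nr));
}
```

    For instance, with kc = 250 and mr = 8, nudging gives 256, which is
    within the padded bound of 250 + 8.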

commit af32e3a608631953ef770341df10a14a991bf290
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date:   Thu Feb 19 22:51:11 2015 -0600

    Fixed a bug where get_range_weighted would return end = 0 for small problem sizes

commit 441d47542a64e131578d00da7404c1ed387a721c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 19 17:06:10 2015 -0600

    Renamed 3m and 4m symbols/macros to 3mi and 4mi.
    
    Details:
    - Renamed several variables and macros from 3m/4m to 3mi/4mi. This is
      because those packing schemas were always implicitly "interleaved".
      This new naming scheme will make way for new schemas that separate
      instead of interleave the real and imaginary (and summed) parts.
    - Expanded the pack format sub-field of the pack schema field of the
      info_t to 4 bits (from 3). This will allow for more schema types
      going forward.
    - Removed old _cntl.c files for herk3m, herk4m, trmm3m, trmm4m.

commit 518a1756ccf02122b96fc437b538604a597df42a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 19 14:27:09 2015 -0600

    Fixed indexing bug for trmm3 via 3mh, 4mh.
    
    Details:
    - Fixed a bug that only affected trmm3 when performed via 3mh or 4mh,
      whereby micro-panels of the triangular matrix were packed with "dead
      space" between them due to failing to adjust for the fact that pointer
      arithmetic was occurring in units of complex elements while the data
      being packed consisted of real elements. It turns out that the macro-
      kernel suffered from the same bug, meaning the panels were actually
      being packed and read consistently. The only way I was able to
      discover the bug in the first place was because the packed block of A
      was overflowing into the beginning of the packed row panel of B using
      the sandybridge configuration.

commit 493087d730f01d5169434f461644e5633f48a42f
Merge: 650d2a6f 25021299
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 18 09:45:51 2015 -0600

    Merge branch 'master' of github.com:flame/blis

commit 25021299b670775df8ca9c87910c63d7e74ed946
Merge: fe2b8d39 f05a5763
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 11 20:03:21 2015 -0600

    Merge branch 'master' of github.com:flame/blis

commit fe2b8d39a445ac848686e78c7540fd046cb95492
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 11 19:33:10 2015 -0600

    Fixed an obscure bug in 3mh/3m/4mh/4m packing.
    
    Details:
    - Modified bli_packm_blk_var1.c and _var2.c to increase the triangular
      case's panel increment by 1 if it would otherwise be odd. This is
      particularly necessary in _var2.c when handling the interleaved 3m
      or ro/io/rpi pack schemas, since division of an odd number by 2 can
      happen if both the panel length and the panel packing dimension
      (register packing blocksize) are odd, thus making their product odd.
    - Modified bli_packm_init.c so that panel strides are increased by 1
      if they would otherwise be odd, even for non-3m related packing.
    - Modified the trmm and trsm macro-kernels so that triangular packed
      micro-panels are traversed with this new "increment by 1 if odd"
      policy.
    - Added sanity checks in trmm and trsm macro-kernels that would result
      in an abort() if the conditions that would lead to a "divide odd
      integer by 2" scenario ever manifest.
    - Defined bli_is_odd(), _is_even() macros in bli_scalar_macro_defs.h.
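    
    Example (illustrative only; these are plain functions with assumed
    names, not the BLIS macros): the "increment by 1 if odd" policy keeps
    a later division by 2 exact. If the panel length and the register
    packing blocksize are both odd, their product is odd and cannot be
    halved without a remainder.

```c
#include <assert.h>

/* Hypothetical stand-in for an is-odd test. */
static int is_odd_sketch(long a)
{
    return a % 2 != 0;
}

/* Hypothetical helper: bump a non-negative increment up to the next
   even value so that it can be divided by 2 exactly. */
static long make_even_sketch(long inc)
{
    assert(inc >= 0);
    return is_odd_sketch(inc) ? inc + 1 : inc;
}
```

    A panel length of 3 and a packing dimension of 5 give a product of
    15; the adjusted increment is 16, which halves cleanly to 8.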

commit 650d2a6ff2e593151a296ca86b5214afcc747afc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 9 14:59:20 2015 -0600

    Added initial support for imaginary stride.
    
    Details:
    - Added an imaginary stride field ("is") to obj_t.
    - Renamed bli_obj_set_incs() macro to bli_obj_set_strides().
    - Defined bli_obj_imag_stride() and bli_obj_set_imag_stride() and
      added invocations in key locations.
    - Added some basic error-checking related to imaginary stride.
    - For now, imaginary stride will not be exposed into the most-used
      BLIS APIs such as bli_obj_create(), and certainly not the
      computational APIs such as bli_dgemm().

commit f05a57634a7c8e3864b25b3335d1194c1ea1aeb9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Feb 8 19:40:34 2015 -0600

    Defined gemm cntl function to query ukrs func_t.
    
    Details:
    - Added a new function, bli_gemm_cntl_ukrs(), that returns the func_t*
      for the gemm micro-kernels from the leaf node of the control tree.
      This allows all the func_t* fields from higher-level nodes in the tree
      to be NULL, which makes the function that builds the control trees
      slightly easier to read.
    - Call bli_gemm_cntl_ukrs() instead of the cntl_gemm_ukrs() macro in
      all bli_*_front() functions (which is needed to apply the row/column
      preference optimization).
    - In all level-3 bli_*_cntl_init() functions, changed the _obj_create()
      function arguments corresponding to the gemm_ukrs fields in higher-
      level cntl tree nodes to NULL.
    - Removed some old her2k macro-kernels.

commit cefd3d5d2001264de17cf63dae541f890cb9daaf
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Feb 5 11:09:12 2015 -0600

    A couple of functions were incorrectly ifdeffed away on Xeon Phi. Fixed this

commit 7574c9947d57a19f613880e3b9f62f8c8f6df4ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 4 12:11:55 2015 -0600

    Added basic flop-counting mechanism (level-3 only).
    
    Details:
    - Added optional flop counting to all level-3 front-ends, which is
      enabled via BLIS_ENABLE_FLOP_COUNT. The flop count can be
      reset at any time via bli_flop_count_reset() and queried via
      bli_flop_count(). Caveats:
      - flop counts are approximate for her[2]k, syr[2]k, trmm, and
        trsm operations;
      - flop counts ignore extra flops due to non-unit alpha;
      - flop counts do not account for situations where beta is zero.

commit ceda4f27d1f1bcf19320e09848e0f2e3b9941e6c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jan 29 13:22:54 2015 -0600

    Implemented bli_obj_imag_equals().
    
    Details:
    - Implemented a new function, bli_obj_imag_equals(), which compares the
      imaginary part of the first argument to the second argument, which may
      be a BLIS_CONSTANT or of a regular real datatype.

commit 81114824a05a9053229efd577a8a94a856deda93
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jan 6 12:15:21 2015 -0600

    Minor 4m/3m consolidation to mem_pool_macro_defs.h.
    
    Details:
    - Merged the 4m and 3m definitions in bli_mem_pool_macro_defs.h to
      reduce code and improve readability.

commit 36a9b7b7436d9423ba4de2a9f85cfcd43577b783
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date:   Wed Dec 17 21:53:50 2014 +0000

    reduced the default number of MC by KC blocks for bgq

commit c60619c7c3568f044a849abbab60209aa7455423
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 16 17:08:22 2014 -0600

    Minor tweaks for 3m4m test drivers.
    
    Details:
    - Changed gemm_kc blocksizes to be reduced by two-thirds instead of
      half.
    - Changed 3m4m/test_gemm.c driver to divide by 3 instead of 2 when
      computing the fixed k dimension.
    - Fixed runme.sh so that it would use multiple threads for s/dgemm
      cases.

commit c6929ba6a5e6f633a7295e979a2b8df8c7ecdb1b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 16 11:27:50 2014 -0600

    Added 4m_1b to test/3m4m test driver and script.

commit 785d480805fc0d6f4251b5499933515740b6b2a7
Merge: 9456f330 4156c088
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Dec 12 14:34:19 2014 -0600

    Merge branch 'master' of github.com:flame/blis

commit 9456f330af4617f9ee32972d51f974aa2d84f97b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Dec 12 14:31:57 2014 -0600

    Added 4m_1b implementation for gemm.
    
    Details:
    - Added yet another 4m-based implementation for complex domain level-3
      operations. This method, which the 3m/4m paper identifies as Algorithm
      "4m_1b", fissures the first loop around the micro-kernel so that the
      real sub-panel of the current micro-panel of B is multiplied against
      (both sub-panels of) all micro-panels of A, before doing the same for
      the imaginary sub-panel of the micro-panel of B. For now, only gemm is
      supported, and 4m_1b (labeled "4mb" within the framework) is not yet
      integrated into the test suite.

commit 4156c0880d9aea4ff04a9c4fa139ba8c437d8bfb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 9 16:03:14 2014 -0600

    Fixed obscure level-2 packing / general stride bug.
    
    Details:
    - Fixed a bug in certain structured level-2 operations that manifested
      only when the structured matrix was provided to BLIS as a matrix stored
      with general stride. The bug was introduced in c472993b when the
      densify field was removed from the packm control tree node and
      associated APIs. Since then, the packed object was unconditionally
      marked with an uplo field of BLIS_DENSE. This is fine for level-3
      operations where micro-panels are always densified, but in level-2
      contexts, the underlying unblocked variant (fused or unfused) of
      structured operations (e.g. trmv) still needs to know whether to
      execute its "lower" or "upper" branches of code. Since this field
      was unconditionally being set to BLIS_DENSE, the unblocked variants
      always executed the "else" branch, which happened to be the
      "lower" case code. Thus, running an upper case produced the wrong
      answer. This most obviously manifested in the form of failures for
      trmm, trmm3, and trsm in the test suite.
      The bug was fixed by setting the packed object's uplo field to
      BLIS_DENSE only if the schema indicated that micro-panels were to be
      packed. Otherwise, we can assume we are packing to regular row or
      column storage, as is the case with level-2 packing. Thanks to
      Francisco Igual for reporting the testsuite failures and ultimately
      leading us to this bug.

commit 689f60a578b461119e9ea90c74f642b9eb79addb
Merge: bef24e67 483e4d6a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Dec 7 14:03:30 2014 -0600

    Merge pull request #21 from figual/master
    
    Adding armv8a configuration and micro-kernels.

commit 483e4d6a3fdbef9d9ab47fb674c9476c70ca9f0f
Author: Francisco D. Igual <figual@ucm.es>
Date:   Sun Dec 7 20:27:49 2014 +0100

    Adding armv8a configuration and micro-kernels.
    
    Only sgemm micro-kernel is fully functional at this point.

commit bef24e67e0f93579c2a80315348dc2e227f72a72
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Nov 26 18:00:56 2014 -0600

    Fixed a type of race condition exposed by pthreads implementation.
    The lead thread of the inner thread communicator could exit the subproblem,
    move on to the next iteration of the loop, and modify a1_pack, b1_pack, or
    c1_pack while other threads were still using them.
    
    Barriers were inserted to fix this.

commit 76bde44411f0e34266bab9d666a54ef22be97320
Merge: e56e6143 f3d729e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 26 17:25:24 2014 -0600

    Merge branch 'master' of github.com:flame/blis

commit f3d729e504ec012e7dc7e02b2ecd42e004c6894d
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date:   Wed Nov 26 22:25:24 2014 -0600

    Added static mutex to bli_init and bli_finalize

commit d71cc797866ff502ad1127527016f463267eef80
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date:   Wed Nov 26 21:35:39 2014 -0600

    Refactored bli_threading files and added support for pthreads

commit e56e61438ff7fcf25a48c0b7603f18df782b50b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 26 17:20:35 2014 -0600

    Minor cleanups to bli_threading.h and friends.
    
    Details:
    - No longer need to define BLIS_ENABLE_MULTITHREADING manually in
      bli_config.h; it now gets defined when BLIS_ENABLE_OPENMP or
      BLIS_ENABLE_PTHREADS is defined.
    - Added sanity check to prevent both BLIS_ENABLE_OPENMP and
      BLIS_ENABLE_PTHREADS from being enabled simultaneously.
    - Reorganization of bli_threading*.h header files, which led to
      simplification of threading-related part of blis.h.
    - Added "-fopenmp -lpthread" to LDFLAGS of sandybridge make_defs.mk
      file.
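
    The macro logic described above can be sketched as follows. This is
    an illustration only: the #error text and the multithreading_enabled()
    helper are assumptions, and the first #define stands in for whatever
    the configuration would normally provide.

```c
/* Stand-in for a configuration header; only pthreads is requested here. */
#define BLIS_ENABLE_PTHREADS

/* Reject configurations that request both threading back-ends at once. */
#if defined(BLIS_ENABLE_OPENMP) && defined(BLIS_ENABLE_PTHREADS)
#error "BLIS_ENABLE_OPENMP and BLIS_ENABLE_PTHREADS may not both be defined."
#endif

/* Derive the umbrella macro rather than requiring it in bli_config.h. */
#if defined(BLIS_ENABLE_OPENMP) || defined(BLIS_ENABLE_PTHREADS)
#define BLIS_ENABLE_MULTITHREADING
#endif

/* Hypothetical helper so the derived setting can be inspected at run time. */
static int multithreading_enabled(void)
{
#ifdef BLIS_ENABLE_MULTITHREADING
    return 1;
#else
    return 0;
#endif
}
```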

commit 3be2744cbe2c56d38c23fd818aa5c1f10cc7ea51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 21 12:28:08 2014 -0600

    Update to template gemm ukernel comments.
    
    Details:
    - Updated comments on alignment of a1 and b1 to match wiki.

commit 994429c6881b2ade92d9d7949bcaebfbf2cc65eb
Merge: 58796abd 694029d9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 20 13:55:35 2014 -0600

    Merge pull request #20 from TimmyLiu/master
    
    #define PASTEF773 required by cblas compatibility layer

commit 694029d9d7db857d642ab536955c0621791108c8
Author: Timmy <timmy.liu@amd.com>
Date:   Wed Nov 19 15:25:14 2014 -0600

    #define PASTEF773 required by cblas compatibility layer

commit 58796abda66b133346f8d523b39178afc336351f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 6 14:31:52 2014 -0600

    Removed KC constraint comments from _kernel.h files.
    
    Details:
    - Since 4674ca8c, the constraint that KC be a multiple of both MR and
      NR has been relaxed, and thus it was time to remove the comments
      from the top of the bli_kernel.h files of all configurations.

commit 7bbc95a54f706d43c7f7951f0e5995f86130cd52
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 29 10:52:23 2014 -0500

    Added new piledriver micro-kernels.
    
    Details:
    - Added new micro-kernels for the AMD piledriver architecture (one
      for each datatype).
    - Updates and tweaks to piledriver configuration.
    - Added 3xk packm micro-kernel support.
    - Explicitly unrolled some of the smaller packm micro-kernels.
    - Added notes to avx/sandybridge and piledriver micro-kernel files
      acknowledging the influence of the corresponding kernel code in
      OpenBLAS.

commit 59613f1d5500f6279963327db2fbc84bc9135183
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 23 17:21:37 2014 -0500

    Added separate micro-panel alignment for A and B.
    
    Details:
    - Changed the recently-added micro-panel alignment macros so that we now
      have two sets--one for micro-panels of matrix A and one for micro-
      panels of matrix B: BLIS_UPANEL_[AB]_ALIGN_SIZE_?.
    - Store each set of alignment values into a separate blksz_t object in
      bli_gemm_cntl_init().
    - Adjusted packm_init() to use the separate alignment values.
    - Added query routines for the new alignment values to bli_info.c.
    - Modified test suite output accordingly.

commit a8e12884ee1fddd3fd77ca5a68aa0cb857f3af57
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 23 11:35:48 2014 -0500

    CHANGELOG update (0.1.6)

commit 38ea5022e4ed846112198c4e1672fcdaeb90dc71
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 23 11:35:45 2014 -0500

    Version file update (0.1.6)

commit a3e6341bdb0e28411f935d6b4708a6389663e004
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 23 11:13:28 2014 -0500

    Factored common code from blocksize functions.
    
    Details:
    - Split bli_determine_blocksize_[fb]() into two functions each, the
      newer ones ending with the _sub suffix. These new sub-functions are
      now called from bli_[gemm|trmm|trsm]_determine_kc_[fb](), which
      eliminates redundant code and will allow any future tweaks to the
      core sub-functions to automatically be inherited by the operation-
      specific versions.

commit 4674ca8cffb58331ff7edf23bbe0e3f6a7558489
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 23 10:50:59 2014 -0500

    Extended newly relaxed KC to hemm, symm.
    
    Details:
    - These changes were intended for the previous commit.
    - Defined bli_gemm_determine_kc_[fb](), which determine blocksizes for
      gemm-based operations, taking special care to "nudge" the kc dimension
      up to a multiple of MR or NR for hemm and symm operations, as needed.
    - Changed bli_gemm_blk_var3f.c to call bli_gemm_determine_kc_f()
      instead of bli_determine_blocksize_f().
    - Comment updates to bli_trmm_blocksize.c, bli_trsm_blocksize.c.

commit ab954ba6f874eaca7b001804491f866ef6b9b327
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 22 17:21:58 2014 -0500

    Relaxed constraint that KC be multiple of MR, NR.
    
    Details:
    - Relaxed a long-held requirement in register blocksizes that required
      the kernel programmer to choose a KC that was divisible by both MR
      and NR. This was very constraining on some architectures that did not
      use register blocksizes that were powers of two. The constraint is
      now enforced only for trmm and trsm, where it is needed, and it is
      now handled by "nudging" kc upward at runtime, if necessary, to be a
      multiple of MR or NR, as needed.
    - Defined bli_trmm_determine_kc_[fb]() and bli_trsm_determine_kc_[fb](),
      which determine blocksizes for trmm and trsm, taking special care to
      "nudge" the kc dimension up to a multiple of MR or NR, as needed.
    - Changed bli_trmm_blk_var3[fb].c to call bli_trmm_determine_kc_[fb]()
      instead of bli_determine_blocksize_[fb]().
    - Added safeguard to bli_align_dim_to_mult() that returns the dimension
      unmodified if the dimension multiple is zero (to avoid division by
      zero).
    - Removed cpp guard/check for KC % MR == 0 and KC % NR == 0 from
      bli_kernel_macro_defs.h.
    - Whitespace, variable name changes to bli_blocksize.c.
    - Removed old commented code from bli_gemm_cntl.c.
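
    A minimal sketch of the "nudging" described above, assuming unsigned
    dimensions. The function bodies are illustrative rather than the
    library's own code, and trxm_nudge_kc() is a hypothetical helper that
    picks MR when A is triangular and NR when B is triangular.

```c
#include <stddef.h>

/* Round dim up to the next multiple of mult. A zero multiple returns dim
   unchanged, which avoids the division by zero mentioned above. */
static size_t align_dim_to_mult(size_t dim, size_t mult)
{
    if (mult == 0) return dim;
    return ((dim + mult - 1) / mult) * mult;
}

/* Hypothetical helper: nudge kc at run time for trmm/trsm. */
static size_t trxm_nudge_kc(size_t kc, size_t mr, size_t nr,
                            int a_is_triangular)
{
    return align_dim_to_mult(kc, a_is_triangular ? mr : nr);
}
```

    With MR = 6 and NR = 8, for instance, a configured KC of 256 would be
    used as-is when B is triangular and raised to 258 when A is
    triangular.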

commit 95cdae65d6b88e043ee14bcd53cd2e800d7aecb4
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Oct 22 16:30:16 2014 -0500

    Fixed bug in KNC microkernel where k=0 and beta != 1

commit e64dba5633fc49b768b5edc7762f2b5d8a4d0588
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 20 19:23:06 2014 -0500

    Re-implemented micro-panel alignment.
    
    Details:
    - This commit re-implements a feature that was removed in commit
      c2b2ab62. It was removed because, at the time, I wasn't sure how the
      micro-panel alignment feature would interact with the 4m method (when
      applied at the micro-kernel level), and so it seemed safer to disable
      the feature entirely rather than allow possible breakage. This commit
      revisits the issue and safely re-implements the feature in a way that
      is compatible with 4m, 3m, 4mh, and 3mh (and native execution).
    - Modified the static memory pool to account for micro-panel alignment
      space.
    - Modified packm_init and blocked variants to align whole micro-panels
      by a datatype-specific alignment value that may be set by the
      configuration. (If it is not set by the configuration, it will default
      to BLIS_SIZEOF_?.)
    - Modified macro-kernels so that:
      - storage stride is handled properly given the new micro-panel
        alignment behavior;
      - indexing through 3m/4m/rih-type sub-panels, as is done by trmm and
        trsm, is more robust (e.g. will work if the applicable packing
        register blocksize is odd);
      - imaginary strides are computed and stored within auxinfo_t structs,
        which allows the virtual micro-kernels to more easily determine how
        to index into the micro-panel operands.
    - Modified virtual 3m and 4m micro-kernels to use the imaginary strides
      within the auxinfo_t structs instead of panel strides.
    - Deprecated the panel stride fields from the auxinfo_t structs.
    - Updated test suite to print out the micro-panel alignment values.

commit add16b0e5402924301e7078e4ca5e3ef725bff0b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 17 11:49:24 2014 -0500

    Added 3m4m test driver subdir of 'test'.
    
    Details:
    - Added a modified test driver for [cz]gemm that will test all 3m/4m
      as well as assembly-based and OpenBLAS implementations of gemm
      in single and multithreaded modes.

commit e171504a72406c61a173241d8bccf0a5ceb10582
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 17 11:25:59 2014 -0500

    Use correct definition of bli_is_last_iter().
    
    Details:
    - As intended for previous commit, the new definition of
      bli_is_last_iter() is now disabled in favor of the old
      definition.

commit 0d954087b2b55d2f5f3c5e57d702b318ca2300f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 17 11:19:34 2014 -0500

    Minor changes and fixes.
    
    Details:
    - Redefined bli_is_last_iter() to take thread_id and num_thread
      arguments, which allows the macro to correctly compute whether a
      given iteration is the last that the thread will compute in that
      particular loop. The new definition, however, remains disabled
      (commented out) until someone can look at this more closely, as
      the new definition seems to actually hurt performance slightly.
    - Whitespace and related updates to level-3 macro-kernels.
    - Updated test suite so that performance results in the hundreds of
      gigaflops do not disrupt the column alignment of the output.
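
    One way a thread-aware "last iteration" test could look, assuming
    iterations are dealt out round-robin (thread t handles iterations t,
    t + num_threads, and so on). Both that partitioning and the name
    is_last_iter_rr() are assumptions for illustration; the macro's real
    definition is not shown in this log.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if iteration i is the final iteration that thread_id will
   execute in a loop of n_iter iterations shared by num_threads threads. */
static bool is_last_iter_rr(size_t i, size_t n_iter,
                            size_t thread_id, size_t num_threads)
{
    /* A thread with no iterations can never be on its last one. */
    if (num_threads == 0 || thread_id >= n_iter) return false;

    /* Step back from the final iteration to the nearest one owned by
       this thread. */
    size_t last = n_iter - 1 - ((n_iter - 1 - thread_id) % num_threads);
    return i == last;
}
```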

commit d1e86e1876e433f54b501ec5a005b4ba7c5ce4e6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Oct 12 13:43:47 2014 -0500

    More minor tweaks to sandybridge/avx micro-kernel.
    
    Details:
    - Re-enabled use of b_next for dgemm and cgemm micro-kernels.

commit 7b6fe4cae57cb22c09c1a97595e1a201a02cbcd2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Oct 12 12:01:51 2014 -0500

    Minor tweaks to sandybridge/avx micro-kernels.
    
    Details:
    - Changed the MC blocksize for zgemm micro-kernel from 128 to 64.
    - Removed usage of b_next in all x86_64/avx gemm micro-kernels.

commit a6a156e9feec47154e7a0fd43bcc006b1fc04aba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 10 14:26:41 2014 -0500

    Added cgemm ukernel for avx/sandybridge.
    
    Details:
    - Implemented AVX-based cgemm micro-kernel (via GNU extended inline
      assembly syntax).
    - Updated sandybridge configuration accordingly.

commit 6f8575ab2580e167a022293b76ddf0514f71b613
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 10 10:01:45 2014 -0500

    Added zgemm ukernel for avx/sandybridge.
    
    Details:
    - Implemented AVX-based zgemm micro-kernel (via GNU extended inline
      assembly syntax).
    - Updated sandybridge configuration accordingly.

commit 23ce7ee542a12ca40b4b6090ad2558d180e16d37
Merge: 99fd9a39 7a8ad47f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 9 16:41:22 2014 -0500

    Merge branch 'master' of github.com:flame/blis

commit 99fd9a39718cb7281f6fb23f9fef7cca4fe514f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 9 16:38:04 2014 -0500

    Fixed two minor bugs.
    
    Details:
    - Fixed a bug in the test suite for the trsm_ukr and gemmtrsm_ukr test
      modules whereby the uplo bits of some packed matrix objects were not
      being set properly, resulting in false FAILURE results for those
      tests. Thanks to Tyler Smith for bringing this issue to my attention.
    - Fixed a bug in bli_obj_alloc_buffer() that caused an unnecessary
      "not yet implemented" abort() when creating a 1x1 object with non-unit
      strides.

commit 7a8ad47fb2d100a9da93aa8cab774fcceeaab733
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Oct 8 15:52:13 2014 -0500

    Minor changes to knc configuration, including a preference for row-major storage.
    Also fixed a bug in the knc micro-kernel where it would fail if k == 0

commit 76b7c34af0c09f47d9615b18857a356acddc788a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 2 14:15:38 2014 -0500

    Fixed a bug in the pack schema-related bit macros.
    
    Details:
    - Expanded the BLIS_PACK_SCHEMA_BITS value in bli_type_defs.h to
      include all six bits presently used in the pack schema bitfield of
      the info field of obj_t structs. Prior to this commit, the macro
      constant only included the lowest five bits, which excluded the
      "is or is not packed" bit. This manifested as a strange bug in
      probably many level-2 codes that invoked packing, though we only
      observed it in ger before fixing. Thanks to Devin Matthews for
      finding and reporting this bug.
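
    The effect of a too-narrow mask can be sketched as below. The bit
    position (shift of 16) and all names are hypothetical; only the
    five-versus-six-bit distinction, with the "is packed" flag as the
    highest of the six bits, comes from the description above.

```c
#include <stdint.h>

/* Hypothetical layout of the pack schema field within an info word. */
#define PACK_SCHEMA_SHIFT   16u
#define PACK_SCHEMA_BITS_5  (UINT32_C(0x1F) << PACK_SCHEMA_SHIFT) /* buggy */
#define PACK_SCHEMA_BITS_6  (UINT32_C(0x3F) << PACK_SCHEMA_SHIFT) /* fixed */
#define PACK_IS_PACKED_BIT  (UINT32_C(0x20) << PACK_SCHEMA_SHIFT)

/* Overwrite the schema field of info, using the given mask. */
static uint32_t set_schema(uint32_t info, uint32_t schema, uint32_t mask)
{
    return (info & ~mask) | (schema & mask);
}

/* Read the schema field of info back out, using the given mask. */
static uint32_t get_schema(uint32_t info, uint32_t mask)
{
    return info & mask;
}
```

    With the five-bit mask, storing a schema that has the "is packed" bit
    set silently drops that bit, so a later query reports an unpacked
    object.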

commit a5763e332226598d70c47dfa9cad4578e15ef5f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 2 13:28:17 2014 -0500

    Added extra output to bli_obj_print().
    
    Details:
    - Print extra values from info field of obj_t struct within
      bli_obj_print().

commit 9bba209fc44fbfce943ba6a51cd8278a0cb6b159
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Sep 29 14:56:36 2014 -0500

    Fixed bug when packing anywhere besides in blk_var_1 for gemm.

commit 614a4afc9272adb47e5a8b83b39d56c2804d95d6
Merge: b541b667 4a7df04e
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Sep 26 10:49:57 2014 -0500

    Merge branch 'master' of http://github.com/flame/blis

commit 4a7df04e8a4ffdb9561d26426afd35e4fe15b013
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 22 16:06:15 2014 -0500

    Added 30xk support for packm ukernels.
    
    Details:
    - Updated bli_kernel_*_macro_defs.h headers to include default
      definitions for 30xk packm kernels.
    - Extended function pointer arrays in bli_packm_cxk_*() out to 31 and
      included 30xk kernels.
    - Added 30xk kernels to frame/1m/packm/ukernels/bli_packm_ref_cxk_*.c.

commit b6d4bd792e0d44ce4b28afef343f5ff3ba89c285
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 22 16:02:37 2014 -0500

    Fixed missing tabs from Makefile patch.

commit 32630f9b6f0d5ba28d5b56dae4c7288a37158743
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Sep 19 17:18:20 2014 -0500

    Comment update to virtual micro-kernels.

commit 13447cffead7c6d137a7a3ccbf9e552ed0477467
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Sep 19 13:00:48 2014 -0500

    Minor bugfix to top-level Makefile.
    
    Details:
    - Applied a patch that allows the top-level Makefile to work on certain
      systems. The patch simply separates out the source-to-object code
      generation rules for .c and .S files into two separate rules. Thanks
      to Devin Matthews for submitting this patch.

commit e80a4537846416719c067ae08a53aeda978c572d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 18 10:24:20 2014 -0500

    Fixed bug introduced by bugfix in 25b258d.
    
    Details:
    - We actually need to check alignment of lda*sizeof(double) and NOT
      a+lda because in the latter case, alignment could cancel out and
      still allow the optimized code to run when it shouldn't. Thanks
      to Devin for pointing this out.
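
    A small illustration of why testing a+lda can be misleading,
    assuming a 32-byte alignment requirement and addresses passed as
    integers. The helper names are hypothetical. If a is off by 8 bytes
    and lda*sizeof(double) is off by 24 bytes, their sum is a multiple
    of 32 even though neither operand is aligned.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool is_aligned(uintptr_t value, size_t align)
{
    return value % align == 0;
}

/* Check as described for 25b258d: alignment of a + lda, i.e. the address
   of the second column of a column-stored matrix of doubles. */
static bool check_sum(uintptr_t a_addr, size_t lda, size_t align)
{
    return is_aligned(a_addr + lda * sizeof(double), align);
}

/* Check as described for this commit: alignment of the column stride in
   bytes, independent of where the matrix starts. */
static bool check_stride(size_t lda, size_t align)
{
    return is_aligned(lda * sizeof(double), align);
}
```

    For a_addr = 0x1008 and lda = 3, check_sum() passes while
    check_stride() correctly rejects the case.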

commit 25b258d61f9c8cee64e922f4131784b6edb196dd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 18 10:10:49 2014 -0500

    Fixed a non-fatal problem with bugfix in a68b316c.
    
    Details:
    - The bugfix in a68b316c was inadvertently checking the alignment of the
      leading dimension itself, rather than the byte size of the leading
      dimension. Now, we simply check alignment of a+lda.

commit 96302d4fc81363410e41c3a3c43a65df44d97ad9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 18 09:43:40 2014 -0500

    Renamed bli_info_get_*_ukr_type() functions.
    
    Details:
    - Added _string() suffix to bli_info_get_*_ukr_type() function names.
      This makes them consistent with the bli_info_get_*_impl_string()
      functions.

commit a68b316ca4852509f84ed50e01afac486bf70f58
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Sep 17 11:10:07 2014 -0500

    Fixed alignment bugs in level-1f kernels.
    
    Details:
    - Fixed bugs whereby the level-1f dotxf, axpyxf, and dotxaxpyf kernels
      were attempting to compute problems with unaligned leading dimensions
      with optimized code, rather than (correctly) using the reference
      implementations. Thanks to Devin Matthews for reporting this bug.

commit 870761eb902e4866090d1d3446a345df3d6d4599
Merge: e9899be0 a2b59a37
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 16 18:20:49 2014 -0500

    Merge branch 'master' of github.com:flame/blis

commit e9899be09044829e23386bd73e394f1dd7778210
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 16 18:19:32 2014 -0500

    Added high-level implementations of 4m, 3m.
    
    Details:
    - Added "4mh" and "3mh" APIs, which implement the 4m and 3m methods at
      high levels, respectively. APIs for trmm and trsm were NOT added due
      to the fact that these approaches are inherently incompatible with
      implementing 4m or 3m at high levels (because the input right-hand
      side matrix is overwritten).
    - Added 4mh, 3mh virtual micro-kernels, and updated the existing 4m and
      3m so that all are stylistically consistent.
    - Added new "rih" packing kernels (both low-level and structure-aware)
      to support both 4mh and 3mh.
    - Defined new pack_t schemas to support real-only, imaginary-only, and
      real+imaginary packing formats.
    - Added various level0 scalar macros to support the rih packm kernels.
    - Minor tweaks to trmm macro-kernels to facilitate 4mh and 3mh.
    - Added the ability to enable/disable 4mh, 3m, and 3mh, and adjusted
      level-3 front-ends to check enabledness of 3mh, 3m, 4mh, and 4m (in
      that order) and execute the first one that is enabled, or the native
      implementation if none are enabled.
    - Added implementation query functions for each level-3 operation so
      that the user can query a string that describes the implementation
      that is currently enabled.
    - Updated test suite to output implementation types for each level-3
      operation, as well as micro-kernel types for each of the five micro-
      kernels.
    - Renamed BLIS_ENABLE_?COMPLEX_VIA_4M macros to _ENABLE_VIRTUAL_?COMPLEX.
    - Fixed an obscure bug when packing Hermitian matrices (regular packing
      type) whereby the diagonal elements of the packed micro-panels could
      get tainted if the source matrix's imaginary diagonal part contained
      garbage.
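
    For readers unfamiliar with the terminology, the arithmetic behind
    the 4m and 3m methods can be shown at the scalar level. This is only
    a sketch: BLIS applies the idea to real matrix products, not to
    individual scalars, and the cplx type and function names are
    hypothetical.

```c
typedef struct { double re, im; } cplx;

/* 4m: four real multiplications per complex product. */
static cplx cmul_4m(cplx a, cplx b)
{
    cplx c;
    c.re = a.re * b.re - a.im * b.im;
    c.im = a.re * b.im + a.im * b.re;
    return c;
}

/* 3m: three real multiplications, traded for extra additions. */
static cplx cmul_3m(cplx a, cplx b)
{
    double p1 = a.re * b.re;
    double p2 = a.im * b.im;
    double p3 = (a.re + a.im) * (b.re + b.im);
    cplx c;
    c.re = p1 - p2;
    c.im = p3 - p1 - p2;
    return c;
}
```

    Both functions compute the same product in exact arithmetic; in
    floating point the 3m form can round differently because of its
    extra additions and subtractions.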

commit a2b59a37f166f70a6dd5793db2530823ef590c2b
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Sep 15 10:44:44 2014 -0500

    Fixed make defs so that they actually compile for bulldozer

commit 86fc7e40764f78ec217f50216ef4fa5b57dbfbc7
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Sep 15 10:35:46 2014 -0500

    Added bulldozer configuration and updated piledriver micro-kernel

commit 0644e61a79a57f136be5f4c47b9099cff2af06e0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 11 12:55:34 2014 -0500

    Minor updates to bli_packm_init.c.

commit 9dc9b44a057a08e20ad4d423344f0ecad54c1eb2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 11 12:03:28 2014 -0500

    Renamed bli_obj_pack_status() to _pack_schema().
    
    Details:
    - Renamed the bli_obj_pack_status() macro to bli_obj_pack_schema() in
      order to help avoid confusion as to what the macro returns.

commit cf5efdde0588a0d5b6ea57fe7d7be5000be06f8e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Sep 11 11:47:56 2014 -0500

    Pass pack_t schemas into ukernels via auxinfo_t.
    
    Details:
    - Modified macro-kernels to pass the pack_t schema values for matrices
      A and B into the datatype-specific functions, where they are now
      inserted into a newly-expanded auxinfo_t struct. This gives the
      micro-kernels access to the pack_t schema values embedded in the
      control trees, which determine the precise format into which the
      matrix elements are packed.
    - Updated a call to bli_packm_init_pack() in src/test_libblis.c to
      remove densify argument. Meant to include this in commit c472993b.

commit cc8d2b82775cca3c2d51bf427f4e77c8024a6d15
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 9 13:48:22 2014 -0500

    Updated old test drivers in 'test'.

commit c472993bbccb69e9ffc409c79b742426c8ad2ad4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 9 13:42:04 2014 -0500

    Removed densify argument to packm_cntl_obj_create().
    
    Details:
    - Removed the "densify" bool_t argument to bli_packm_cntl_obj_create().
      This argument was inserted very early in BLIS's development, when it
      was anticipated that the developer may sometimes wish to pack a
      Hermitian, symmetric, or triangular matrix without making it dense.
      But as it turns out, if we are packing a matrix, we always want to
      make it dense in some way or another due to the fact that the micro-
      kernel only multiplies dense micro-panels. Thus, unless/until there
      is a real need for the feature, it seems reasonable to remove it from
      the packm_cntl API.

commit 5c43ee387146cd76dc59b730dac6683a8446b834
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 8 15:19:29 2014 -0500

    Moved trmm4m/3m_cntl files to 'old' directory.
    
    Details:
    - Meant to include this in previous commit.

commit 7b2f469d5465ed73b1ca88124bc9a1987388aa27
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 8 14:49:50 2014 -0500

    Retired trmm_t control tree definitions, usage.
    
    Details:
    - Replaced all trmm_t control tree instances and usage with that of
      gemm_t. This change is similar to the recent retirement of the herk_t
      control tree.
    - Tweaked packm blocked variants so that the triangular code does NOT
      assume that k is a multiple of MR (when A is triangular) or NR (when
      B is triangular). This means that bottom-right micro-panels packed for
      trmm will have different zero-padding when k is not already a multiple
      of the relevant register blocksize. While this creates a seemingly
      arbitrary and unnecessary distinction between trmm and trsm packing,
      it actually allows trmm to be handled with one control tree, instead
      of one for left and one for right side cases. Furthermore, since only
      one tree is required, it can now be handled by the gemm tree, and thus
      the trmm control tree definitions can be disposed of entirely.
    - Tweaked trmm macro-kernels so that they do NOT inflate k up to a
      multiple of MR (when A is triangular) or NR (when B is triangular).
    - Misc. tweaks and cleanups to bli_packm_struc_cxk_4m.c and _3m.c, some
      of which are to facilitate above-mentioned changes whereby k is no
      longer required to be a multiple of register blocksize when packing
      triangular micro-panels.
    - Adjusted trmm3 according to above changes.
    - Retired trmm_t control tree creation/initialization functions.

commit 576e9e9255a79dba9cd3c804267f51e0b4aa6e8a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Sep 7 16:12:52 2014 -0500

    Retired herk_t control tree definitions, usage.
    
    Details:
    - Replaced all herk_t control tree instances and usage with that of
      gemm_t, since the two types presently have the same fields. This means
      that herk, her2k, syrk, and syr2k can simply use the gemm control tree
      as-is, just as hemm and symm have been doing for some time now.
    - Retired herk_t control tree creation/initialization functions.
    - Retired many _target.c and .h files into 'old' directories.

commit b2fed052c9a23d858ef0afbe220b342bce9aa7f7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Sep 3 17:07:25 2014 -0500

    Minor code cleanup to bli_packm_struc_cxk*.c
    
    Details:
    - Realized that we don't need to track rs_p11 and cs_p11 for
      Hermitian/symmetric case of bli_packm_struc_cxk*(). They are always
      equal to rs_p and cs_p.

commit 023ce770966b3b5a98bba729c5af1f45e15ebb97
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Sep 3 10:47:53 2014 -0500

    Minor update to packm_cxk kernels.
    
    Details:
    - Changed m and n dimension parameter names to panel_dim and panel_len,
      respectively, in packm_cxk, packm_cxk_3m, packm_cxk_4m kernel wrapper
      functions. This makes the code a little easier to read since "m" and
      "n" have connotations that are not applicable here.
    - Comment updates.

commit 189def3667d9218adbeec45e2801fd074341a679
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 1 16:23:17 2014 -0500

    Retired portions of bli_kernel_3m/4m_macro_defs.h.
    
    Details:
    - Removed sections of bli_kernel_[4m|3m]_macro_defs.h that defined
      4m/3m-specific blocksizes after realizing that this can be done in
      bli_gemm[4m|3m]_cntl.c, since that is (mostly) the only place they
      are used.
    - The maximum cache values for 4m/3m are still needed when computing mem
      pool dimensions in bli_mem_pool_macro_defs.h. As a workaround, "local"
      definitions in terms of the regular cache blocksizes are now in place.
    - Similarly, the register blocksizes for 4m/3m are still needed in
      bli_kernel_post_macro_defs.h. As a workaround, "local" definitions in
      terms of the regular register blocksizes are now in place.

commit af521ee6f2a77d61c98b833e85c09969987bc00d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 1 14:06:46 2014 -0500

    Changed semantics of blocksize extensions.
    
    Details:
    - Changed semantics of cache and register blocksize extensions so that
      the extended values are tracked, rather than just the marginal
      extensions.
    - BLIS_EXTEND_[MKN]C_? has been renamed BLIS_MAXIMUM_[MKN]C_?.
    - BLIS_EXTEND_[MKN]R_? has been renamed BLIS_PACKDIM_[MKN]R_?.
    - bli_blksz_ext_*() APIs have been renamed to bli_blksz_max_*(). Note
      that these "max" query routines grab the maximum value for cache
      blocksizes and the packdim value for register blocksizes.
    - bli_info_*() API has been updated accordingly.
    - All configurations have been updated accordingly.

commit 07f23aefd52f5ba4960dbd46e59b180a2136b8e9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Aug 31 11:58:50 2014 -0500

    Pass pack schema into packm_struc_cxk*().
    
    Details:
    - Changed the interface to the packm_struc_cxk*() kernels to include
      the pack_t schema. This allows the implementation to more easily
      determine how the micro-panel is stored (row-stored column panel
      or column-stored row panel).
    - Updated packm blocked variants to pass in the schema.
    - Updated packm_ker_t function pointer definition accordingly.

commit f032ba9b1186cb02184574d339565f53d733aa42
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Aug 30 16:21:20 2014 -0500

    Reorganized packm implementation.
    
    Details:
    - Reorganized packm variants and structure-aware kernels so that all
      routines for a given pack format (4m, 3m, regular) reside in a single
      file.
    - Renamed _blk_var4 to _blk_var2 and generalized so that it will work
      for both 4m and 3m, and adjusted 4m/3m _cntl_init() functions
      accordingly.
    - Added a new packm_ker_t function pointer type to
      bli_kernel_type_defs.h to facilitate function pointer typecasting in
      the datatype-specific packm_blk_var2() functions.
    - Deprecated _blk_var3.
    - Fixed a bug in the triangular micro-panel packing facility that
      affected trmm and trmm3 with unit diagonals.

commit c6793cecb70788bdf2c76ab8102504ea97be9d2a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 28 17:14:48 2014 -0500

    Reorganized #includes for scalar macro headers.
    
    Details:
    - Reordered the #include statements in bli_scalar_macro_defs.h so that
      conventional, ri-, and ri3-based macros are grouped together.
    - Renamed bli_eqri.h (and macros within) to end with 'ris' suffix.

commit b4da8907284345be4374f87a88679c4886ab866e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 28 14:10:32 2014 -0500

    Whitespace, comments updates on packm_blk_var?.c.

commit 46e46a1d83da586c3dd9fd7a01eb16067abbaee1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 28 12:05:45 2014 -0500

    Minor updates to packm blocked, cxk_3m/4m code.
    
    Details:
    - Added 'const' qualifier to inlined packing code that handles
      micro-panel packing that is too large for an existing packm ukernel.
    - Comment updates.

commit 908dc688b5979995eaacb3aa937f241551a8df00
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 28 11:55:12 2014 -0500

    Pass pack schema into blocked packm routines.
    
    Details:
    - Rather than passing the packm blocked routines a boolean value that
      represents whether the matrix is being packed to row or column storage,
      we now pass in the pack schema itself.

commit a0ff6066e06075ab5f92b19247b39b92ed15f1bf
Merge: c4c99c48 d40b32bc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Aug 24 15:56:21 2014 -0500

    Merge branch 'master' of github.com:flame/blis

commit c4c99c4813bf9817592a7899c5d33412fe22313f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Aug 24 15:52:22 2014 -0500

    Renamed packm scalar from beta to kappa.
    
    Details:
    - The packm implementation (i.e. source files in frame/1m/packm and
      frame/1m/packm/ukernels) interchangeably used the names "beta" and
      "kappa" to refer to the optional scalar to be applied during packing.
      This commit renames all uses of "beta" to be "kappa", since "beta"
      sometimes evokes the scalar specifically on the output matrix of a
      level-2 or level-3 operation.

commit d40b32bc24ffbae24123e054307b3138969bb095
Merge: 9331f794 6c25c379
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Aug 24 13:46:36 2014 -0500

    Merge branch 'master' of github.com:flame/blis

commit 6c25c379fadb50834146e1614f7b80c093c2aad0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Aug 24 13:44:10 2014 -0500

    Consolidated unpackm ukernels into single file.
    
    Details:
    - Reorganized unpackm ukernels into a single file,
      bli_unpackm_ref_cxk.c, in a manner similar to what was done for packm
      ukernels in commit 4cc2b46.

commit 9331f79443223fe267676ee54c439e1ed320380c
Merge: 7fc48a7d 670b6392
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Aug 24 10:54:21 2014 -0500

    Merge branch 'master' of github.com:flame/blis

commit 670b63926a7f4fc694abc5b1582ef8a4f367f5a8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Aug 24 10:46:27 2014 -0500

    Added whitespace to bli_obj_scalar_ routine calls.
    
    Details:
    - Added extra spaces to align arguments of
      bli_obj_scalar_init_detached_copy_of(). This misalignment was due to
      the fact that the function was previously named
      bli_obj_init_scalar_copy_of() and the name change, performed in
      b444489f, was done via recursive sed commands which left subsequent
      lines untouched.

commit 7fc48a7d920e07fd8e9528ab2565123f8f4e67f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Aug 23 16:50:58 2014 -0500

    Combined 4m/3m bits into an expanded bitfield.
    
    Details:
    - Combined the 4m/3m bits into an expanded bitfield, which will encode
      the packing "format" of the micro-panels. This will allow for more
      easily and compactly encoding additional formats.
    - Other minor comment/whitespace updates to bli_type_defs.h.
    - Updated bli_obj_macro_defs.h and bli_param_macro_defs.h to use the new
      format bitfield.
    - Comment update to bli_kernel_post_macro_defs.h.
    - Whitespace changes to bli_kernel_3m_macro_defs.h, _4m_macro_defs.h.

commit ef0143cc1417e4815e4cafd5a464cc83fe7a1e86
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Aug 23 14:02:27 2014 -0500

    Renamed _ri, _ri3 packm ukernels to _4m, _3m.
    
    Details:
    - Renamed packm ukernels, _cxk dispatcher, and structure-aware _cxk
      helper functions to use _4m and _3m instead of _ri and _ri3 suffixes.
    - Updated names of cpp macros that correspond to packm ukernels.

commit b0ccac116158b5ed3316d34798748ba0c6d78672
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 21 19:21:52 2014 -0500

    Cleaned up front-end layering for 4m/3m.
    
    Details:
    - Added an extra layer to level-3 front-ends (examples: bli_gemm_entry()
      and bli_gemm4m_entry()) to hide the control trees from the code that
      decides whether to execute native or 4m-based implementations. The
      layering was also applied to 3m.
    - Branch to 4m code based on the return value of bli_4m_is_enabled(),
      rather than the cpp macros BLIS_ENABLE_?COMPLEX_VIA_4M. This lays
      the groundwork for users to be able to change at runtime which
      implementation is called by the main front-ends (e.g. bli_gemm()).
    - Retired some experimental gemm code that hadn't been touched in
      months.

commit bedec95451cabfa7a8906b51018a5e0572998a5e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 21 18:25:48 2014 -0500

    Added bli_4m API for querying 4m enabled state.
    
    Details:
    - Added bli_4m.c (and header), which defines a simple API that can be
      used to query, enable, and disable 4m-based complex support in BLIS.
      The macros BLIS_ENABLE_?COMPLEX_VIA_4M are now used to initialize
      the variable that determines the state (enabled or disabled).
    - Changed bli_info*() API so that all cache and register blocksize-
      related query routines return the blksz_t objects' values as they
      exist at runtime, rather than return the values as determined by the
      configuration system (e.g. bli_kernel.h, or defaults for those values
      not specified). This sets the foundation for being able to change
      those blocksizes at runtime.
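    
    The query/enable/disable idea from the first item above might look
    roughly like the following. This is a sketch under assumptions: the
    sketch_ names and the SKETCH_ macro are hypothetical stand-ins, and
    the real bli_4m.c may be organized differently.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a configuration macro such as
   BLIS_ENABLE_?COMPLEX_VIA_4M; defined here only for the sketch. */
#define SKETCH_ENABLE_COMPLEX_VIA_4M 1

/* The macro only seeds the initial state. After that, the state is an
   ordinary runtime variable. */
static bool sketch_4m_state = ( SKETCH_ENABLE_COMPLEX_VIA_4M != 0 );

static bool sketch_4m_is_enabled( void ) { return sketch_4m_state; }
static void sketch_4m_enable( void )     { sketch_4m_state = true; }
static void sketch_4m_disable( void )    { sketch_4m_state = false; }
```

    A front-end can then branch on sketch_4m_is_enabled() instead of
    testing the macro with #ifdef.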

commit b541b667cabfa6d41b50ad1e49209651ee6812cc
Merge: 699a8151 dd61307f
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Aug 20 14:44:51 2014 -0500

    Merge branch 'master' of http://github.com/flame/blis
    
    Conflicts:
            frame/3/trsm/bli_trsm_blk_var2b.c
            frame/3/trsm/bli_trsm_blk_var2f.c

commit 699a8151ca3d5021e834a1784ef45dcc3a3d17cd
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Aug 20 14:43:17 2014 -0500

    Some improvements to trsm parallelism

commit dd61307f55bb6bc762fe0ef0446479d6c0536723
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 20 09:52:16 2014 -0500

    Minor update to sandybridge MC_S, KC_S.
    
    Details:
    - Changed sandybridge MC and KC for single-precision real to 128 and 384,
      respectively.
    - Updated comments in template configuration's gemm micro-kernel file
      to document the new "contiguous row preference" macro.

commit d0eec4bddd740ce360d0f655362c551287cf925b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 19 15:49:19 2014 -0500

    Added optional row preference to ukernel config.
    
    Details:
    - Added the ability for the kernel developer to indicate the gemm micro-
      kernel as having a preference for accessing the micro-tile of C via
      contiguous rows (as opposed to contiguous columns). This property may
      be encoded in bli_kernel.h as BLIS_?GEMM_UKERNEL_PREFERS_CONTIG_ROWS,
      which may be defined or left undefined. Leaving it undefined leads to
      the default assumption of column preference.
    - Changed conditionals in frame/3/*/*_front.c that induce transposition
      of the operation so that the transposition is induced only if there
      is disagreement between the storage of C and the preference of the
      micro-kernel. Previously, the only conditional that needed to be met
      was that C was row-stored, which is to say that we assumed the micro-
      kernel preferred column-contiguous access on C.
    - Added a "prefers_contig_rows" property to func_t objects, and updated
      calls to bli_func_obj_create() in _cntl.c files in order to support
      the above changes.
    - Removed the row-storage optimization from bli_trsm_front.c because
      it is actually ineffective. This is because the right-side case of
      trsm flips the A and B micro-panel operands (since BLIS only requires
      left-side gemmtrsm/trsm kernels), meaning any transposition done
      at the high level is then undone at the low level.
    - Tweaked trmm, trmm3 _front.c files to eliminate a possible redundant
      invocation of the bli_obj_swap() macro.
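    
    The revised transposition rule in the second item above reduces to a
    disagreement test. A minimal sketch, assuming a hypothetical helper
    name and ignoring general-stride storage of C:

```c
#include <stdbool.h>

/* Hypothetical helper; not a BLIS function. Returns true when the
   operation should be transposed so that the storage of C agrees with
   the micro-kernel's preferred access pattern. */
static bool sketch_should_induce_transpose( bool c_is_row_stored,
                                            bool ukr_prefers_contig_rows )
{
    /* Old rule (column preference assumed): return c_is_row_stored; */
    return c_is_row_stored != ukr_prefers_contig_rows;
}
```

    With a column-preferring micro-kernel the new rule gives the same
    answer as the old one; it differs only when the micro-kernel prefers
    contiguous rows.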

commit 4cc2b464f29cafbfef9295b073b857fe0752f710
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Aug 15 11:49:15 2014 -0500

    Reorganized packm ukernels.
    
    Details:
    - Previously, packm micro-kernels were organized by the implied register
      blocksize (panel dimension) assumed by the kernel, meaning conventional,
      ri, and ri3 variations of some micro-kernel size were housed in the same
      file. This commit reorganizes the micro-kernels so that all sizes reside
      in the same file for each format type (conventional, ri, and ri3).

commit fcc10054a11b6fc3976986f57feccf741596cbf6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 13 12:32:06 2014 -0500

    Tweaks to gemm4m, gemm3m virtual ukernels.
    
    Details:
    - Fixed a potential, but as-yet unobserved bug in gemm3m that would
      allow undesirable inf/NaN propagation, since C was being scaled by
      beta even if it was equal to zero.
    - In gemm3m micro-kernel, we now avoid copying C to the temporary
      micro-tile if beta is zero.
    - Rearranged computation in gemm4m so that the temporary C micro-tile
      is accessed less, and C is accessed only after the micro-kernel
      calls. This improves performance marginally in most situations.
    - Comment updates to both gemm4m and gemm3m micro-kernels.

commit cdcbacc2fa871317c8e7ef961ecc6d70ab22dc34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 12 12:45:38 2014 -0500

    Removed redundant redef of packm ukr prototypes.
    
    Details:
    - Removed redundant macro code that redefined packm ukernel prototypes
      when the previous macro was already sufficient. This helps de-clutter
      the packm ukernel prototyping headers a little bit.

commit 82dac98d9032ccb598068a55ddf23d7898491e9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 12 12:36:25 2014 -0500

    Relocated packm ukernel #includes.
    
    Details:
    - Consolidated the #include statements for packm ukernel headers from
      bli_packm_cxk.h, bli_packm_cxk_ri.h, and bli_packm_cxk_ri3.h to
      bli_packm.h.
    - Comment/whitespace updates to bli_packm_blk_var3.c, _var4.c.

commit 7f77856e25aad5fc6f172ed3e57b6351804e31a4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 12 12:20:15 2014 -0500

    Removed unused 4m/3m-related packm macro defs.
    
    Details:
    - Removed unused and unneeded s- and d-flavored macro definitions for
      packm ukernels related to the complex 4m and 3m methods, as
      implemented in BLIS.

commit bc1d86b2d4d436b1dfba2d0098501aaca9cbb8b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 7 19:01:20 2014 -0500

    Sandy Bridge configuration, micro-kernel update.
    
    Details:
    - Minor updates to bli_config and bli_kernel.h for sandybridge
      configuration.
    - Renamed existing AVX intrinsic-based micro-kernel file to
      bli_gemm_int_d8x4.c.
    - Added new file, bli_gemm_asm_d8x4.c, which provides assembly-based
      gemm micro-kernels for single- and double-precision real.

commit 98ec95877a95242e159b2bf0c879115a59e4c6e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 7 18:28:32 2014 -0500

    Corrected comment for _obj_is_[row|col]_stored().
    
    Details:
    - Fixed a mistake in the comments introduced in the previous commit for
      bli_obj_is_row_stored() and bli_obj_is_col_stored().

commit 43d5e419e1b424d2143817103dbee8ead797e8aa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 7 18:20:40 2014 -0500

    Reverted _obj_is_[row|col]_stored() macros.
    
    Details:
    - Rolled back recent changes to bli_obj_is_row_stored() and
      bli_obj_is_col_stored() so that those macros now only inspect the
      strides (row or column). It turns out that the more sophisticated
      definitions introduced in a51e32e are not necessary, because these
      "obj" macros are virtually never used on packed matrices, and when
      they are, they can use bli_obj_is_[row|col]_packed() macros, which
      inspect the info bitfield.

commit 45692e3ad4b7e1d05ac4302398df4efce04b4284
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 7 13:21:15 2014 -0500

    Reverted some accidental changes.
    
    Details:
    - Reverted some changes that were unintentionally included in the
      previous commit (9526ce98). Thanks to Tony Kelman for pointing
      this out. (Note: a few select changes were not reverted.)

commit 9526ce98812be908bc4915f2849b657fb6ce1b49
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 6 14:13:46 2014 -0500

    Updated copyright headers of emscripten configuration files.

commit 30833ed71d56f231ddba21e632bcbbc90b12a97c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 6 12:12:03 2014 -0500

    Minor edits to configurations' make_defs.mk files.
    
    Details:
    - Redefined CFLAGS, CFLAGS_NOOPT, and CFLAGS_KERNELS so that CFLAGS_NOOPT
      is defined first and then the other two are defined in terms of
      CFLAGS_NOOPT. This textually cleans up the definitions and makes them a
      little easier to read.

commit 9d61afeae2ba70fe1df07e7546f6954ea83aed12
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Aug 4 16:01:59 2014 -0500

    CHANGELOG update (0.1.5)

commit bde56d0ecfd0ec20330fac290b91a6dca0cf94e9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Aug 4 16:01:58 2014 -0500

    Version file update (0.1.5)

commit 4c6ceea4be35d089630986eb5b959b9e97214077
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Aug 4 15:49:59 2014 -0500

    Added CBLAS compatibility layer.
    
    Details:
    - Added a new section in bli_config.h files of all configurations for
      enabling CBLAS support. (Currently, the default is for the CBLAS layer
      to be disabled.)
    - Added a directory, frame/compat/cblas, to house CBLAS source code. A
      subdirectory 'f77_sub' holds subroutine wrappers corresponding to
      subroutines found in CBLAS that allow calling some BLAS routines with
      the return value passed as the last argument rather than as an actual
      (function) return value. This was probably intended to allow CBLAS to
      avoid the whole f2c debacle altogether. However, since BLIS does not
      assume the presence of a Fortran compiler, we had to provide similar
      routines in C.
    - A script, integrate-cblas-tarball.sh, is included to streamline the
      integration of future revisions of the CBLAS source code.
    - The current tarball, cblas.tgz, that was used with the above script to
      generate the present set of CBLAS source code is also included.
    - Updated blis.h to include necessary CBLAS-related headers.

commit caab62dac0fb0bd0d674118f409c81680db94d29
Merge: 383631b5 db97ce97
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Aug 3 14:36:18 2014 -0500

    Merge pull request #19 from kevinoid/fix-install-perms-error
    
    Fix permissions error installing to non-owned directory

commit db97ce979b88c051922c2f946ce52d523c7a12c6
Author: Kevin Locke <kevin@kevinlocke.name>
Date:   Sun Aug 3 12:48:04 2014 -0600

    Fix permissions error installing to non-owned directory
    
    When installing to a directory which is not owned by the installing
    user, even when the user has write permission for the directory, the
    installation can fail with an error similar to the following:
    
    Installing libblis-0.1.4-7-sandybridge.a into /usr/local/lib/
    install: cannot change permissions of ‘/usr/local/lib’: Operation not permitted
    Makefile:658: recipe for target '/usr/local/lib/libblis-0.1.4-7-sandybridge.a' failed
    make: *** [/usr/local/lib/libblis-0.1.4-7-sandybridge.a] Error 1
    
    In the example case, the error occurred because the user attempted to
    install to /usr/local and /usr/local/lib is owned by root with mode 2755
    which the Makefile unsuccessfully attempted to change to 0755.
    
    Given that installing to /usr/local is likely to be quite common and the
    ownership/permissions are the default for Debian and Debian-derived
    Linux distributions (perhaps others as well), this commit attempts to
    support that use case by using mkdir rather than install to create the
    directory (which is the same approach as Automake).
    
    Signed-off-by: Kevin Locke <kevin@kevinlocke.name>

commit 383631b514c3d42b724640f57644eea276cc418c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 31 14:51:48 2014 -0500

    Redefined bit field macros with bitshift operator.
    
    Details:
    - Redefined many of the macros that define bit fields and bit values in
      the obj_t info field using the bitshift operator (<<). This makes it
      easier to reorder bit fields, or expand existing bit fields, or add
      new fields. The bitshifting should be evaluated by the compiler at
      compile-time.
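    
    A sketch of the shift-based style, with a hypothetical field layout.
    The SKETCH_ names, field widths, and positions are assumptions for
    illustration and do not reproduce the actual obj_t info field.

```c
#include <stdint.h>

/* Hypothetical layout: each field's shift is derived from the previous
   field, so widening or reordering a field needs only local edits. */
#define SKETCH_DT_SHIFT     0
#define SKETCH_DT_NBITS     3
#define SKETCH_TRANS_SHIFT  ( SKETCH_DT_SHIFT + SKETCH_DT_NBITS )
#define SKETCH_TRANS_NBITS  1

/* Build a mask of 'nbits' ones starting at bit 'shift'. These are
   constant expressions, so the compiler folds them at compile time. */
#define SKETCH_MASK( shift, nbits ) \
    ( ( ( UINT32_C(1) << (nbits) ) - 1 ) << (shift) )

#define SKETCH_DT_BITS    SKETCH_MASK( SKETCH_DT_SHIFT,    SKETCH_DT_NBITS )
#define SKETCH_TRANS_BIT  SKETCH_MASK( SKETCH_TRANS_SHIFT, SKETCH_TRANS_NBITS )

/* Extract a field's value from an info word. */
static uint32_t sketch_get_field( uint32_t info, uint32_t mask,
                                  unsigned shift )
{
    return ( info & mask ) >> shift;
}
```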

commit 137143345dc93cc9a83da5ba88b25bac7502de86
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 31 12:12:45 2014 -0500

    Reimplemented unit blocksize fix in prev commit.
    
    Details:
    - Instead of inferring the storage format of the micro-panels from within
      the packm variants, we now pass in a bool_t value that denotes whether
      the packed matrix contains row-stored column panels or column-stored
      row panels. This value can then be tested more easily inside the main
      packm variant loop.
    - Renumbered pack_t schema values in bli_type_defs.h so that there are
      now five bits, each with different meaning:
      - 4: packed or not packed?
      - 3: packed for 3m?
      - 2: packed for 4m?
      - 1: packed to panels?
      - 0: stored by rows or columns?
    - Added new macros that test for status of above bits in schema bit
      subfield, and renamed some existing macros related to 4m/3m.
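    
    The five schema bits listed above could be tested as follows. The bit
    positions follow the list; the macro and function names are
    hypothetical, and the polarity of bit 0 (which value means rows) is
    not stated in this message, so the sketch leaves it uninterpreted.

```c
#include <stdbool.h>

/* Bit positions follow the list in this commit message. Names are
   assumptions made for this sketch, not the bli_type_defs.h names. */
#define SKETCH_PACK_ROWCOL_BIT  ( 1u << 0 ) /* stored by rows or columns? */
#define SKETCH_PACK_PANEL_BIT   ( 1u << 1 ) /* packed to panels? */
#define SKETCH_PACK_4M_BIT      ( 1u << 2 ) /* packed for 4m? */
#define SKETCH_PACK_3M_BIT      ( 1u << 3 ) /* packed for 3m? */
#define SKETCH_PACK_BIT         ( 1u << 4 ) /* packed or not packed? */

static bool sketch_is_packed( unsigned schema )
{
    return ( schema & SKETCH_PACK_BIT ) != 0;
}

static bool sketch_is_panel_packed( unsigned schema )
{
    return ( schema & SKETCH_PACK_PANEL_BIT ) != 0;
}

static bool sketch_is_4m_packed( unsigned schema )
{
    return ( schema & SKETCH_PACK_4M_BIT ) != 0;
}

static bool sketch_is_3m_packed( unsigned schema )
{
    return ( schema & SKETCH_PACK_3M_BIT ) != 0;
}
```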

commit a51e32ec061941cd10119ea80115c82a40b1673f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 30 10:41:48 2014 -0500

    Fixed unit register blocksize brokenness.
    
    Details:
    - Fixed a breakdown in BLIS's ability to differentiate between row-stored
      and column-stored micro-panels when MR or NR is unit. When either
      register blocksize (or both) is equal to one, inspecting the strides of
      the affected packed micro-panel is no longer sufficient to determine
      whether the micro-panel is a row-stored column panel or a column-stored
      row panel (because both strides are unit). At that point, dimension
      information is necessary when invoking the bli_is_row_stored_f() and
      bli_is_col_stored_f() macros (and their "obj" counterparts). Thanks to
      Ilya Polkovnichenko for reporting this bug.
    - Added panel dimensions (m and n) to obj_t, which are set in
      packm_init() and then passed into the blocked variants to support the
      aforementioned update.
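    
    The ambiguity described in the first item can be reproduced with a
    small sketch. It assumes that a column-stored row panel has strides
    (rs, cs) = (1, MR) and a row-stored column panel has (NR, 1); the
    type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { ptrdiff_t rs; ptrdiff_t cs; } sketch_strides_t;

/* Column-stored row panel with panel dimension mr: rs = 1, cs = mr. */
static sketch_strides_t sketch_row_panel_strides( ptrdiff_t mr )
{
    sketch_strides_t s = { 1, mr };
    return s;
}

/* Row-stored column panel with panel dimension nr: rs = nr, cs = 1. */
static sketch_strides_t sketch_col_panel_strides( ptrdiff_t nr )
{
    sketch_strides_t s = { nr, 1 };
    return s;
}

/* Strides alone identify the storage only if they are not both unit. */
static bool sketch_strides_identify_storage( sketch_strides_t s )
{
    return !( s.rs == 1 && s.cs == 1 );
}
```

    When MR or NR equals one, both panel kinds yield (1, 1), which is why
    extra information beyond the strides is required.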

commit c2732272f0ac680a0ad19fa9db5d587398a1479a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 29 16:37:18 2014 -0500

    Removed old/unused packm variants.

commit b97fa9a5a70fe0123e5eebd999b947461d38445f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Jul 27 18:54:09 2014 -0500

    Minor usage update to build/bump-version.sh.

commit b18ba5f62d98629cdd519ff4c96fc67ec1a62fb9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Jul 27 18:52:05 2014 -0500

    Added missing 'bla_' prefix to r_imag(), d_imag().
    
    Details:
    - Added "bla_" to f2c functions r_imag() and d_imag(). Thanks to Murtaza
      Ali for pointing the mis-named functions.

commit af7a8e6c042cade452130a6729377f1a3ef4e19e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Jul 27 18:20:13 2014 -0500

    CHANGELOG update (0.1.4)

commit a7537071b152ecff671f8716595d37dc09e4fd51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Jul 27 18:20:12 2014 -0500

    Version file update (0.1.4)

commit acff74041bf02c7b9fdfa24b507bca782a4c5fce
Merge: cdb9413e 47b243ef
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Jul 23 15:07:30 2014 -0500

    Merge branch 'master' of https://github.com/flame/blis

commit cdb9413e140f8a198666250ec88fa34b5425a9c3
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Jul 23 15:05:15 2014 -0500

    Enabled threading for a couple more loops in TRSM
    
    JC loop is now enabled for the left-sided case
    IC loop is now enabled for the right-sided case

commit 47b243ef08f4101de3d936f2373343e67eaa4dd5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 23 13:41:13 2014 -0500

    Call setid for early return from herk/her2k.
    
    Details:
    - Added setid call (to zero imaginary parts of diagonal elements) to
      early return branches of herk_front() and her2k_front() for cases
      where alpha is zero. Thanks to Murtaza Ali for suggesting this fix.
    - Comment update.

commit 3e7b0db5b0e24f5fd66c60bacabc019885ddbec5
Merge: 2f8a357d ed3e33d5
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Jul 23 13:40:44 2014 -0500

    Merge branch 'master' of https://github.com/flame/blis

commit 2f8a357de5fb55163a969d888cf059f24b78125c
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Jul 23 13:40:12 2014 -0500

    Some TRSM threading fixes/additions

commit ed3e33d548047be3283ff41268fdf716563bc542
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 22 14:40:43 2014 -0500

    Tweaked behavior of herk, her2k for BLAS compat.
    
    Details:
    - Updated herk_front() and her2k_front() to explicitly set the imaginary
      components of the diagonal entries of C to zero after the computation
      is complete. This is needed in case downstream applications read the
      full diagonal entries (i.e., including imaginary part), which could, in
      the absence of this modification, accumulate numerical error from
      subsequent rank-k/rank-2k updates.
    - Updated BLAS compatibility wrappers for herk and her2k to return early
      if:
        n == 0 || ( ( alpha == 0 || k == 0 ) && beta == 1 )
      This also results in the imaginary components of diagonal entries NOT
      being set to zero (see above), which is consistent with BLAS.
    - Updated mkherm to use setid instead of an inlined loop over the
      diagonal.

commit ea59a5c93cde1467a3715abc53dda4aecf961873
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 22 14:36:02 2014 -0500

    Added new level-1d operation: setid.
    
    Details:
    - Defined a new level-1d operation, setid, which sets the imaginary
      elements of an object's diagonal to a single scalar. This can be
      useful, for example, when trying to make the diagonal of a Hermitian
      matrix real-valued.
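    
    A minimal sketch of what such an operation does, assuming a plain
    column-major array of a hypothetical complex struct. The function
    name and signature are illustrative and are not the BLIS interface,
    which operates on objects.

```c
#include <stddef.h>

/* Hypothetical complex type for this sketch. */
typedef struct { double real; double imag; } sketch_dcomplex_t;

/* Set the imaginary part of every diagonal element of an m x n
   column-major matrix (leading dimension lda) to alpha. Real parts and
   off-diagonal elements are left untouched. */
static void sketch_setid( size_t m, size_t n, double alpha,
                          sketch_dcomplex_t* a, size_t lda )
{
    size_t min_mn = ( m < n ? m : n );

    for ( size_t i = 0; i < min_mn; ++i )
        a[ i + i * lda ].imag = alpha;
}
```

    Calling it with alpha equal to zero makes the diagonal real-valued,
    as one would want for a Hermitian matrix.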

commit 8965a965931318619ceaebd7c32edccf3022d0c7
Merge: 1785efb5 5b73e80b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 22 14:34:32 2014 -0500

    Merge branch 'master' of github.com:flame/blis

commit 1785efb5420bc7b9c850a068cb5d99837071e877
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 22 14:33:01 2014 -0500

    Minor improvements to invertd and setd.
    
    Details:
    - Added missing call to invertd_check() from front-end.
    - Changed setd front-end call of scald_check() to setd_check().

commit 5b73e80b71c054c1945a06aff044ef629bc1a9a0
Merge: a41e68e0 20690fe3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 18 12:21:20 2014 -0500

    Merge pull request #16 from Maratyszcza/emscripten
    
    Emscripten port

commit a41e68e09e73b999fab0bb430a43dccfc63aab45
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 17 13:25:56 2014 -0500

    Reimplemented BLIS initialization/finalization.
    
    Details:
    - Rewrote bli_init() and bli_finalize() with OpenMP critical sections
      for thread-safety. Also added lots of explanatory comments.
    - Renamed bli_init_safe() and bli_finalize_safe() with the _auto()
      suffix, and reimplemented for simplicity. Updated all invocations
      in BLAS compatibility layer to use _auto() suffix.

commit 36358948ea75074bda32a9f8c008f835b87d21db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 17 10:58:10 2014 -0500

    Retired frame/3/gemm/other directory.
    
    Details:
    - Removed frame/3/gemm/other directory, which contained some outdated
      and/or experimental variants.

commit c73261f17edf589e76bdbe297702a1fbbd69275f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 14 16:23:51 2014 -0500

    More minor cleanups post-copyright update.

commit 2a09d24463d358be6243b24f112fad057c2aefe0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 14 16:17:09 2014 -0500

    Reverted power7 symlinks destroyed by sed script.
    
    Details:
    - Reverted two symlinks, in kernels/power7/3/test, back to being symlinks
      after recursive-sed.sh mistakenly replaced them with copies of the
      actual files to which they referred. Meant to include this in previous
      commit.

commit 7ed415824d3b2e78541b6f64e404ca5347c06d3d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 14 16:14:33 2014 -0500

    Updated copyright headers (continued).
    
    Details:
    - Inserted "at Austin" into third clause of license declarations.
      Meant to include this change in previous commit.

commit 5c2c6c85616834ff2716ece083118201d9df6dde
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 14 16:05:03 2014 -0500

    Updated copyright headers to contain "at Austin".
    
    Details:
    - Updated copyright headers to include "at Austin" in the name of the
      University of Texas.
    - Updated the copyright years of a few headers to 2014 (from 2011 and
      2012).

commit fcec68cda3f6e90ae055e7304e6674c1c5c8d010
Merge: 94c0df79 4a20ed1a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 14 11:35:34 2014 -0500

    Merge branch 'master' of github.com:flame/blis

commit 94c0df797eda377931f29a41ba6a89c0ed58daca
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 14 11:24:36 2014 -0500

    Changed order of zero dim / error checking.
    
    Details:
    - Updated level-2 and level-3 internal back-ends so that the operation's
      _check() function is called BEFORE any attempt to return early due to
      the presence of zero dimensions. This ordering makes more sense because
      (for example) object dimensions should match even if one of them is
      zero. Previously, a dimension mismatch could result in an early return
      with no error message.
    - Updated bli_check_object_buffer() so that NULL buffers result in an
      error only if the object is dimensionally non-empty (i.e., only if both
      of the object's dimensions are non-zero). This allows BLIS operations
      to be performed on dimensionally empty objects (i.e., where at least one
      dimension is zero).
    - Updated the error message associated with bli_check_object_buffer()
      to mention the newly relaxed constraint mentioned above, vis-a-vis
      non-zero dimensions.
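    
    The ordering and the relaxed NULL-buffer rule can be sketched with a
    toy vector copy. This is an illustration under assumptions: the
    sketch_ names and error codes are hypothetical, and a vector copy
    stands in for the level-2 and level-3 back-ends.

```c
#include <stddef.h>

typedef enum
{
    SKETCH_SUCCESS = 0,
    SKETCH_DIM_MISMATCH,
    SKETCH_NULL_BUFFER
} sketch_err_t;

/* A NULL buffer is an error only if both dimensions are non-zero. */
static sketch_err_t sketch_check_buffer( const void* buf,
                                         size_t m, size_t n )
{
    if ( buf == NULL && m > 0 && n > 0 ) return SKETCH_NULL_BUFFER;
    return SKETCH_SUCCESS;
}

/* Toy operation y := x with the checks placed before the early return. */
static sketch_err_t sketch_copyv( size_t n_x, const double* x,
                                  size_t n_y, double* y )
{
    /* 1. Checks first: a mismatch is reported even when a dimension
          is zero. */
    if ( n_x != n_y ) return SKETCH_DIM_MISMATCH;

    sketch_err_t e = sketch_check_buffer( x, n_x, 1 );
    if ( e != SKETCH_SUCCESS ) return e;

    e = sketch_check_buffer( y, n_y, 1 );
    if ( e != SKETCH_SUCCESS ) return e;

    /* 2. Only then return early on dimensionally empty operands. */
    if ( n_x == 0 ) return SKETCH_SUCCESS;

    for ( size_t i = 0; i < n_x; ++i ) y[ i ] = x[ i ];

    return SKETCH_SUCCESS;
}
```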

commit 20690fe3018ce17c8df61ce0bffecaa7911dc3a5
Author: Marat Dukhan <maratek@gmail.com>
Date:   Sun Jul 13 22:50:56 2014 -0700

    Emscripten port

commit 4a20ed1a3f5e9e5232df30aa0e568e6c00c56ce1
Merge: 6a515e98 8ccdfaef
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Jul 13 17:45:01 2014 -0500

    Merge pull request #14 from Maratyszcza/master
    
    Support "make test" for PNaCl configuration

commit 6a515e988f2ae1628258a6dec2c0e9cf2d04790f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Jul 13 17:38:33 2014 -0500

    Implemented dsdot() and sdsdot() in compat layer.
    
    Details:
    - Replaced "not yet implemented" error messages in dsdot() and sdsdot()
      with actual implementations. (These routines are so rarely used that
      this log message will probably lead to some people learning of their
      existence for the first time.)
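    
    For readers meeting these routines for the first time: both compute a
    dot product of single-precision vectors while accumulating in double
    precision. The sketch below follows the reference BLAS semantics as
    generally documented; the sketch_ names and C signatures are
    hypothetical and are not the compatibility-layer code.

```c
/* Dot product of two float vectors, accumulated and returned in
   double precision. Arrays are 0-based; increments are BLAS-style. */
static double sketch_dsdot( int n, const float* x, int incx,
                            const float* y, int incy )
{
    double acc = 0.0;

    if ( n <= 0 ) return acc;

    /* A negative increment walks the vector backwards, starting from
       its last logical element. */
    int ix = ( incx < 0 ? ( 1 - n ) * incx : 0 );
    int iy = ( incy < 0 ? ( 1 - n ) * incy : 0 );

    for ( int i = 0; i < n; ++i, ix += incx, iy += incy )
        acc += ( double )x[ ix ] * ( double )y[ iy ];

    return acc;
}

/* Same accumulation, plus a scalar sb, with the result rounded to
   single precision only at the end. */
static float sketch_sdsdot( int n, float sb, const float* x, int incx,
                            const float* y, int incy )
{
    return ( float )( ( double )sb + sketch_dsdot( n, x, incx, y, incy ) );
}
```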

commit 255668ddd1004552c6cc65035ec6486671ce99bb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Jul 13 17:30:44 2014 -0500

    Inserted gemv beta-scaling bug into compat layer.
    
    Details:
    - BLAS has a peculiar bug (or feature) whereby calling gemv on a vector
      y of non-zero length and a vector x of zero length results in no action.
      Given that the operation is y := beta*y + A*x, many (most?) individuals
      would expect vector y to still be scaled by beta. BLIS, when called
      natively, handles these cases intuitively (with beta scaling).
      Unfortunately, many BLAS test suites actually check for the way this
      situation is handled. Therefore, we have decided to implement this "bug"
      in the compatibility layer so as to provide "bug-for-bug" compatibility
      with BLAS.
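    
    The difference in behavior can be sketched as follows, using the
    operation exactly as written above (y := beta*y + A*x) on a
    column-major, non-transposed A. Function names are hypothetical, and
    special handling of beta equal to zero is omitted for brevity.

```c
/* Native-style behavior: y is always scaled by beta, even when x has
   zero length (n == 0) and A*x contributes nothing. */
static void sketch_gemv_native( int m, int n, const double* a, int lda,
                                const double* x, double beta, double* y )
{
    for ( int i = 0; i < m; ++i )
    {
        double ax = 0.0;

        for ( int j = 0; j < n; ++j )
            ax += a[ i + j * lda ] * x[ j ];

        y[ i ] = beta * y[ i ] + ax;
    }
}

/* Compatibility-style behavior: return before doing anything when
   either dimension is zero, so y is NOT scaled by beta. */
static void sketch_gemv_compat( int m, int n, const double* a, int lda,
                                const double* x, double beta, double* y )
{
    if ( m == 0 || n == 0 ) return;

    sketch_gemv_native( m, n, a, lda, x, beta, y );
}
```

    With m = 2, n = 0, and beta = 3, the native-style routine triples y
    while the compatibility-style routine leaves it unchanged.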

commit 570a154581bdb353fa13a219c7cb3c81d3dceffd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jul 12 17:51:05 2014 -0500

    Comment/formatting updates to build scripts.
    
    Details:
    - Minor updates to comments and formatting in bump-version.sh and
      update-version-file.sh scripts.

commit 26cd81990631ff799791629206e068126ff9e3a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 10 13:16:07 2014 -0500

    Added bli_info_*() query functions.
    
    Details:
    - Added a new API family, bli_info_*(), which can be used to query
      information about how BLIS was configured. Most of these values are
      returned as gint_t, with the exception of the version string which
      is char*.
    - Changed how the testsuite driver queries information about how BLIS
      was configured (from using macro constants directly to using the
      new bli_info API).
    - Removed bli_version.c and its header file.
    - Added STRINGIFY_INT() macro to bli_macro_defs.h
    - Renamed info_t type in bli_type_defs.h to objbits_t (not because of
      an actual naming conflict, but because the name 'info_t' would now be
      somewhat misleading in the presence of the new bli_info API, as the
      two are unrelated).
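
    The STRINGIFY_INT() macro mentioned above is presumably the usual
    two-level preprocessor idiom for turning an integer macro constant into
    a string literal. A sketch under that assumption; MKSTR,
    EXAMPLE_POOL_SIZE and example_pool_size_str are hypothetical names used
    only for illustration.

```c
/* Two-level expansion: the outer macro expands its argument first, so the
   inner '#' operator stringifies the value rather than the macro name. */
#define MKSTR(s)          #s
#define STRINGIFY_INT(s)  MKSTR(s)

#define EXAMPLE_POOL_SIZE 4096   /* hypothetical configuration constant */

static const char *example_pool_size_str(void)
{
    /* Expands to the string literal "4096". */
    return STRINGIFY_INT(EXAMPLE_POOL_SIZE);
}
```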

commit 970b43141697d8c31a033f59513bb59d7cc78ab0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 10 09:30:00 2014 -0500

    Minor bugfixes to BLAS compatibility layer.
    
    Details:
    - Changed bla_amax.c so that i?amax() routines now correctly return 0
      if ( n < 1 || incx <= 0 ).
    - Changed bla_rotg.c and bla_rotmg.c to use bli_fabs() macro instead of
      f2c's abs() macro for float and double cases.
    - Thanks to Murtaza Ali for suggesting the two fixes above.
    - Updated label of fnormv to normfv in testsuite/input.operations.
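
    The i?amax() return convention in the first item can be sketched as
    follows. This is a hypothetical single-precision version, not the
    bla_amax.c source; it assumes the reference BLAS convention of a 1-based
    result that picks the first maximum.

```c
/* Hypothetical sketch: returns the 1-based index of the first element
   with the largest absolute value, or 0 when n < 1 or incx <= 0. */
static int isamax_sketch(int n, const float *x, int incx)
{
    if (n < 1 || incx <= 0)
        return 0;

    int   best = 1;
    float vmax = x[0] < 0.0f ? -x[0] : x[0];
    for (int i = 1; i < n; ++i) {
        float v = x[i * incx];
        if (v < 0.0f)
            v = -v;
        if (v > vmax) {          /* strict '>' keeps the first maximum */
            vmax = v;
            best = i + 1;
        }
    }
    return best;
}
```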

commit 8ccdfaef4c42ad8957af8607a1a9ee29b9277d4b
Author: Marat Dukhan <maratek@gmail.com>
Date:   Tue Jul 8 23:14:36 2014 -0700

    Replicated logic from testsuite/Makefile in top-level Makefile to support make test

commit caa6507ff3724c80d60987f309b8bbc5b50a9841
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 8 10:25:27 2014 -0500

    Minor cleanup to standalone test drivers.
    
    Details:
    - Very minor code changes to standalone test drivers in 'test' directory.
    - Added *.so files to '.gitignore'.

commit 6c65e9a58fe55990ebb99ec3986443e18af35338
Merge: cb12e456 daca500d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 8 10:13:49 2014 -0500

    Merge branch 'master' of github.com:flame/blis

commit cb12e456f94c196c093e52f02a7cbca0032fc86e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 8 10:07:46 2014 -0500

    Fixed possible level-3 inf/NaN issue when beta=0.
    
    Details:
    - Redefined xpbys_mxn and xpbys_mxn_u/_l macros to employ a copy
      (instead of scaling by beta) when beta is zero. This will stamp out
      any possible infs or NaNs in the output matrix, if it happens to be
      uninitialized. Thanks to Tony Kelman for isolating this bug.
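
    Why the copy matters: under IEEE 754 arithmetic, 0.0 * NaN and
    0.0 * inf both yield NaN, so scaling an uninitialized output by a zero
    beta does not reliably clear it. A sketch of the idea, using a
    hypothetical function rather than the actual xpbys_mxn macros:

```c
#include <stddef.h>

/* Hypothetical sketch: y := x + beta * y over an m x n column-major
   block. When beta is zero, y is overwritten instead of scaled, so any
   inf or NaN already present in y cannot propagate. */
static void xpbys_mxn_sketch(size_t m, size_t n,
                             const double *x, size_t ldx,
                             double beta, double *y, size_t ldy)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            if (beta == 0.0)
                y[i + j * ldy] = x[i + j * ldx];
            else
                y[i + j * ldy] = x[i + j * ldx] + beta * y[i + j * ldy];
        }
    }
}
```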

commit daca500db5e2448ba0da8047b75eb0f88d9f40e3
Merge: ab3bc915 47023502
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Jul 3 12:52:52 2014 -0500

    Merge branch 'master' of http://github.com/flame/blis

commit 4702350278af31f662b458127777dd4d85a3192f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 3 11:48:23 2014 -0500

    Defined _ukernel_void() wrappers to micro-kernels.
    
    Details:
    - Added wrappers for micro-kernels so that users may invoke the
      micro-kernels without knowing what the function names actually are.
      This is useful when an application wishes to call the micro-kernel
      from a shared library instance of BLIS, where the application may not
      necessarily have the luxury of grabbing the micro-kernel name(s) from
      C preprocessor macros at compile-time. Also, since the wrappers use
      void* pointers, one's environment does not need to be aware of some
      BLIS types such as scomplex and dcomplex. These wrappers now join the
      level-1 and level-1f kernel wrappers, which pre-dated this commit.
    - Removed the wrapper definitions and prototypes from the micro-kernel
      test suite modules, and replaced calls to them with calls to the new
      wrappers mentioned above.

commit ab3bc9153b914fbaf259e15b66c91d628e7c8661
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Jul 3 11:19:43 2014 -0500

    Fixed a bug for TRSM when BLIS_ENABLE_MULTITHREADING is not set but the multithreading environment variables are turned on

commit b8134b720b985783ee6a582a3eb5d6c51f00d051
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Jul 2 16:02:39 2014 -0500

    Quick and dirty multithreading for TRSM
    
    Should work fine for small number of threads (up to 8 or maybe even 16).
    However, performance is yet untested.
    This parallelizes the "JR" loop for the left sided cases
    and the "IR" loop for the right sided cases.
    
    Future work is to parallelize the outer loops as well.

commit e8ef69692831db07ddbe9485a5e504ac3f03e496
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 2 14:59:27 2014 -0500

    Added shared library support to build system.
    
    Details:
    - Modified top-level Makefile to support building shared (dynamic)
      libraries.
    - Updated most configurations' make_defs.mk files to include necessary
      compiler/linker flags needed by top-level Makefile.
    - Note that by default, all configurations presently do NOT build
      shared libraries. To enable, one must change the value of
      BLIS_ENABLE_DYNAMIC_BUILD to 'yes'.

commit b80df0f2cffb015da02e70a82b8512da9891ab67
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 23 13:52:39 2014 -0500

    Added bump-version.sh script to 'build' directory.
    
    Details:
    - Added a bash script, bump-version.sh, to aid in incrementing the BLIS
      version string.

commit 9ef1f1e21d083697fc730e48d7d9169c201f3da2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 23 13:48:17 2014 -0500

    CHANGELOG update (0.1.3)

commit 036cc634918463b1caa0fd89c9a211f2f5639af7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 23 13:48:17 2014 -0500

    Version file update (0.1.3)

commit 09d9a3bf6763932d9f571085b2cfd1b8631eccba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 23 13:43:26 2014 -0500

    Reverting version file to test new version script.
    
    Details:
    - Changed version file contents to 0.1.2 so that I can test out a new
      version file bumping script.

commit ebb33965981dcb2b0bdee5fc7fdf6c959420f311
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 23 11:22:50 2014 -0500

    Added 'version' file.

commit 2cb9a5501a3cbeb6692cf68e896087ba73b6af69
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 23 10:42:29 2014 -0500

    Removed 'version' from .gitignore file.

commit b40dcefc5ee31f67aa3990e2e9d2ef8ed1386a25
Merge: 7101a8ee b693b0cd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 23 10:39:05 2014 -0500

    Merge pull request #11 from Maratyszcza/stable
    
    [sc]axpy kernels for PNaCl

commit b693b0cddcfb41450e3c09a3ab97acb44c1ccdec
Author: Marat Dukhan <maratek@gmail.com>
Date:   Sun Jun 22 13:44:25 2014 -0700

    [SC]AXPY kernels for PNaCl

commit 7101a8eec0327d6c3a7eb36eb4b0fd45c1c6d162
Merge: ad48dca2 020a831b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 19 21:46:50 2014 -0500

    Merge pull request #10 from Maratyszcza/stable
    
    Portable Native Client port

commit 020a831bc5f61744cb8354886aa679b99b1285f6
Author: Marat Dukhan <maratek@gmail.com>
Date:   Thu Jun 19 00:58:26 2014 -0700

    Code clean-up in PNaCl port

commit 491be4f91ed725522f5cc7184053857c6c376ada
Author: Marat Dukhan <maratek@gmail.com>
Date:   Thu Jun 19 00:45:44 2014 -0700

    Optimized dot product kernels for PNaCl

commit 4b8e71aab80182873a2e138eb07902b8d8fd5480
Author: Marat Dukhan <maratek@gmail.com>
Date:   Thu Jun 19 00:43:25 2014 -0700

    Use AR rcs flags for PNaCl target to avoid warning

commit 031deb2a5c718d569bde842590a791b812f4cf1d
Author: Marat Dukhan <maratek@gmail.com>
Date:   Wed Jun 18 03:11:34 2014 -0700

    PNaCl configuration: use pnacl-ar instead of ar (fixes build issue on Mac)

commit 68a02976e3c3638f0a9821342e269a1743e3ace3
Author: Marat Dukhan <maratek@gmail.com>
Date:   Wed Jun 18 03:10:25 2014 -0700

    Compile pnacl configuration in GNU11 mode to avoid warning about non-standard features

commit 6f8462eb0ec278b89731e73ef583386a3371d095
Author: Marat Dukhan <maratek@gmail.com>
Date:   Wed Jun 18 03:08:46 2014 -0700

    Fix inconsistent VERBOSE macro in Makefile

commit b2ffb4de8b6872cb23537ad282e557d11dcd9c8b
Author: Marat Dukhan <maratek@gmail.com>
Date:   Sun Jun 15 18:41:30 2014 -0400

    Reformatted PNaCl GEMM kernels

commit 6de2d472d98baa215264a776f3d5291780a6a085
Author: Marat Dukhan <maratek@gmail.com>
Date:   Sun Jun 15 08:44:31 2014 -0400

    CGEMM and ZGEMM kernels for PNaCl

commit f064711a5e6fb3852c17c7520909b09dc27665f2
Author: Marat Dukhan <maratek@gmail.com>
Date:   Sun Jun 15 06:27:37 2014 -0400

    SGEMM and DGEMM kernels for PNaCl

commit ad48dca22913a363899f0bef45553898718eebb1
Merge: ee2b6792 7118f87e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jun 14 15:10:13 2014 -0500

    Merge pull request #9 from tkelman/memalign_windows
    
    Use _aligned_malloc instead of posix_memalign on Windows

commit 7118f87e18b4941423472afc00215c1d1f2a1fcd
Author: Tony Kelman <tony@kelman.net>
Date:   Sat Jun 14 06:53:20 2014 -0700

    Use _aligned_malloc instead of posix_memalign on Windows

commit ee2b679281ca45fb40b2198e293bc3bc3d446632
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Jun 6 12:41:55 2014 -0500

    Only include omp.h if BLIS_ENABLE_OPENMP is set

commit 19c05dfaac43c627f86e897c8c00f1f9440754aa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 5 10:54:16 2014 -0500

    CHANGELOG update (for 0.1.2).

commit 00f232f8ed1f7c41619b12ebf779ebe2c3b2d3cd
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Jun 2 13:40:57 2014 -0500

    Added single-precision micro-kernel for Knights Corner aka MIC aka Xeon Phi

commit 3fc60e491426f6248c0feae88d971e4d1f88fb95
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 21 11:34:42 2014 -0500

    Fixed ldim alignment bug in core2 gemm ukernel.
    
    Details:
    - Fixed a bug in the dunnington/core2 gemm micro-kernels that resulted in
      a segmentation fault if a column-stored matrix's starting address was
      aligned, but its leading dimension was such that its second column was
      unaligned. Basically, the micro-kernel was assuming that aligned load
      instructions were safe when they actually were not. An extra condition
      that checks the alignment of cs_c (ie: the leading dimension in the
      column storage case) has now been added. Thanks to Michael Lehn for
      reporting this bug.
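
    The extra condition can be sketched as below. This is a hypothetical
    helper, not the micro-kernel code, and it assumes 16-byte aligned loads
    on double-precision data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: aligned 16-byte loads on every column of a
   column-stored matrix of doubles are safe only if both the base address
   and the column stride in bytes are multiples of 16. */
static int can_use_aligned_loads(const double *c, size_t cs_c)
{
    const uintptr_t align = 16;
    int base_ok   = ((uintptr_t)c % align) == 0;
    int stride_ok = ((cs_c * sizeof(double)) % align) == 0;
    return base_ok && stride_ok;
}
```

    With an aligned base and cs_c = 3, the second column starts 24 bytes in,
    which is the failing case the commit describes.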

commit 77a2d8dac8b242d7a202c9aabda3927ab68cf987
Merge: 8c5d6071 21fb0893
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 20 09:53:19 2014 -0500

    Merge pull request #8 from tlrmchlsmth/master
    
    Added multithreading to most level-3 operations.

commit 21fb089387ee7c87f6dc53b0f60f68b48d3ff3e8
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon May 19 20:38:55 2014 -0700

    Reverting changes dunnington and reference configs
    
    Now they are unchanged from the main branch of BLIS

commit 8a0ef0e0db5880730425926f8ba56b457a2ba764
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri May 16 13:44:14 2014 -0500

    Fixed rounding error in bli_get_range_weighted

commit 0b4b1680334528b1b60bc696537600f763198e92
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri May 16 12:23:37 2014 -0500

    Fixed bug with disabling JC loop threading for right sided trmm

commit 5c048a90d8dfa1dbde4e45fbc10ffcbdfe59d960
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed May 14 16:20:06 2014 -0500

    Disabled parallelism for right-sided TRMM JC loop
    
    The loop has dependent iterations.

commit 13a4c717ed0e273359dbaf5554cc4fa70b087d71
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed May 14 14:59:04 2014 -0500

    Fixed bug with bli_get_range_weighted

commit 45957cc7745e9bb1698408d72f53ef192e960820
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Tue May 13 17:14:46 2014 -0500

    Allowed threading to be turned off
    
    No longer requires OpenMP to compile
    Define the following in bli_config.h in order to enable multithreading:
    BLIS_ENABLE_MULTITHREADING
    BLIS_ENABLE_OPENMP
    
    Also fixes a bug with bli_get_range_weighted

commit bd1dc98ce599d74513a553fe3b37a2ebca1c3812
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon May 12 17:26:19 2014 -0500

    Disabled multithreading of the kc loop

commit 456df0372170bd7ca2c7e2d85365a69f1f04de88
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Apr 30 12:28:00 2014 -0500

    Replaced register blocksize hack with querying the register blocksize for determining parallelism granularity

commit f4fdfe8fc573553eb36795b79cdf681270dab71b
Merge: 31bb065b 8c5d6071
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Apr 30 11:46:35 2014 -0500

    Merge http://github.com/flame/blis

commit 8c5d6071e24ba10a53669390a47287e86ff354ce
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 29 12:26:12 2014 -0500

    Added _check() routines for fprint[mv], rand[mv].
    
    Details:
    - Added _check() routines for fprintm, fprintv, randm, and randv.
    - Added invocations to the above routines from their respective
      front-ends.

commit 262cdabcc885bcf6636f4d8bb7d320f95e81d820
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 28 16:48:25 2014 -0500

    Changed treatment of NULL object buffers.
    
    Details:
    - Relaxed the constraint in bli_obj_attach_buffer_check(), which required
      the buffer address being attached to be non-NULL. This is acceptable
      because the user was already able to create and use objects with NULL
      buffers (via bli_obj_create_without_buffer(), which initializes the
      buffer to NULL).
    - Inserted calls to newly defined function, bli_check_object_buffer(),
      into nearly all operations' _check() or _int_check() functions. This
      allows BLIS to abort peacefully if a computational routine is called
      with an object containing a NULL buffer. By contrast, under such
      conditions, BLAS would typically fail with a segmentation fault.
    - Within operation front-ends, moved the calls to _check()/_int_check()
      so that zero dimensions are checked first (and if found, execution
      returns with trivial or no computation). This resolves issue #7. Thanks
      to Jack Poulson for reporting this bug.
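
    The ordering in the last item can be illustrated with a hypothetical
    front-end for x := alpha * x. The function and error codes below are
    invented for this sketch, and returning a code is a simplification: the
    commit describes BLIS aborting with an error rather than returning one.

```c
#include <stddef.h>

enum { SKETCH_SUCCESS = 0, SKETCH_NULL_BUFFER = -1 };

/* Hypothetical front-end: the zero-dimension test comes first, so an
   empty operand with a NULL buffer is accepted as a no-op. */
static int scalv_front_end_sketch(size_t n, double alpha, double *x)
{
    if (n == 0)
        return SKETCH_SUCCESS;        /* nothing to compute */
    if (x == NULL)
        return SKETCH_NULL_BUFFER;    /* report instead of dereferencing */

    for (size_t i = 0; i < n; ++i)
        x[i] *= alpha;
    return SKETCH_SUCCESS;
}
```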

commit 31bb065ba40ae0c5a614e743b8025abca012b99e
Merge: 20e24430 7c619599
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Apr 23 12:30:19 2014 -0500

    Merge http://github.com/flame/blis

commit 7c61959955c8ba78160d0ed4d1979022029d963b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 10 17:18:36 2014 -0500

    Can now query register blocksizes from blk algs.
    
    Details:
    - Added a new field to blksz_t objects that allows one to attach a
      sub-object. Doing this allows us to associate a register blocksize with
      any given cache blocksize. That way, the register blocksize can be
      queried wherever the cache blocksize would normally be accessible
      (e.g. a blocked algorithm).
    - Modified bli_gemm_cntl.c (and 4m/3m variants) so that the register
      blocksizes are attached to the cache blocksizes after they are created.

commit 58671597d3d450817b2eda576c05ed6dadd8af6d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 10 15:35:30 2014 -0500

    Minor cleanups to level-2 _cntl.c files.
    
    Details:
    - Changed level-2 _cntl.c files so that the blocksizes for gemv are
      imported and used, rather than blocksizes being declared locally.
    - Whitespace changes to gemv_cntl.c and gemm_cntl.c files (as well as
      4m/3m variants).
    - Removed test/old/test_blis2.c.

commit 20e24430a772bc0fbaf24dec2f8c544096fd3f4e
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date:   Tue Apr 8 17:50:44 2014 +0000

    Some fixes for the bgq kernels

commit bde697f75ec1e7f2decebee0c9bd620b4c134cd5
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 16:43:44 2014 -0500

    Add -openmp to ldflags as well

commit c332be8cd471eeace7b4fa4ae7443088b6a68ec3
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 16:37:50 2014 -0500

    Added -openmp flag to Xeon Phi build for convenience

commit e7ca9e4b4a24d585c9aec8293fc7bb79e4171ad0
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 16:31:15 2014 -0500

    Used BLIS_DEFAULT_*_MR for rounding partitioning instead of BLIS_DEFAULT_*_MC

commit 7b9b228c6fa4cfb70b1ebb855b009a036e85fac3
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 16:29:10 2014 -0500

    Fix for tree barrier freeing bug

commit 5ec93bd9a76096312d51c326ccde1e9bd0a436ab
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 15:09:10 2014 -0500

    Bunch of minor fixes
    
    Removed barrier after unpackm in all level3 blocked variants
    Now there is an implicit barrier inside unpackm that only occurs if C is packed (which is usually not the case)
    
    Moved the enabling of the tree barriers into bli_config.h
    Fed the default MR and NR for double precision into bli_get_range instead of the number 8

commit 575fb9b0b08f3bdb56ccde056da619d1585617c1
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 12:13:29 2014 -0500

    Changed default blocking factor to default double precision MR and NR

commit ab9c7880335c281432d5809fe0dec46753d22569
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 11:38:11 2014 -0500

    Added faster tree barriers necessary for performance for Xeon Phi
    
    Fixed up some stuff in the thread info free functions
    Disabled threading for TRSM so that it actually works when threading environment variables are set

commit ec58a7923cccac08632670caadf3cf6ff5dce766
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 10:22:48 2014 -0500

    Freeing thread info paths.
    
    Also made herk IC and JC loops do weighted partitioning

commit 2b6848b2397d6d84ca4e5f792fc51ad05e351a36
Merge: 4e3eb39a 21a0efb3
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 09:54:54 2014 -0500

    Merge http://github.com/flame/blis
    
    Conflicts:
            kernels/bgq/1/bli_axpyv_opt_var1.c
            kernels/bgq/1/bli_dotv_opt_var1.c

commit 4e3eb39aca4df0b9fdc003d468f368a2f2ba597d
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date:   Fri Apr 4 14:50:03 2014 +0000

    Some fixes to the bgq config
    MR and NR for double complex were wrong
    Default fusing factor for double precision was wrong as well

commit 21a0efb33d7435139e9c43c1a4787a6bff533e26
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 3 16:38:44 2014 -0500

    Fixed follow-up to issue #6.

commit c318157a9bee8ea6e59be16f99f65d9271fe0d27
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 3 16:24:34 2014 -0500

    Fixed issue #6 (incorrect 'restrict' usage).
    
    Details:
    - Fixed improper usage of restrict keyword in axpyv and dotv bgq kernels.
      (However, there may be other instances of similar misuse elsewhere in
      BLIS.) Thanks to Jeff Hammond for reporting this issue.

commit b5150a1bf3bd89598e2b3aeac110eb5b44ac6c12
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 3 12:25:45 2014 -0500

    Added #include "arm_neon.h" to ARM gemm ukernel.
    
    Details:
    - Inserted #include "arm_neon.h" into gemm ukernel source file for
      arm/neon. Thanks to Jean-Michel Hautbois for suggesting this fix.

commit 2041c264517b6c590fd4f7e8253e6911b622d1c3
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Apr 3 10:30:03 2014 -0500

    Added barriers needed prior to doing scalar reset for rank-k updates.

commit 47a90e69dfde3f4f8fdf90654248a6b499fbadbc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 1 14:34:31 2014 -0500

    Attempted to fix uninitialized variable warnings.
    
    Details:
    - Added initialization statements to various macros used in level 1m and
      1m-like operations. I wasn't able to reproduce the reported behavior,
      so hopefully this takes care of it. Thanks to Jeff Hammond for the
      report.

commit d27b4f690c14b1f836f8c7a3c0e91e09d852f02e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 1 12:57:24 2014 -0500

    Use generic paths for toolchain in POWER7.
    
    Details:
    - Fixed issue #4. Thanks to Jeff Hammond for contributing changes.

commit 1584ae1c83c3a8c1af76acb46404747507650f19
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Mar 28 15:15:48 2014 -0500

    Fixed race condition involving scalar reset

commit 459dde4acc09e49380da58fb7b246db488884ad9
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Mar 27 17:06:45 2014 -0500

    Made barrier after packing implicit.
    
    This also fixed a bug where barriers in the blocked variants were inserted after the inner packing routines,
    but not the outer packing routines.
    This allowed, for instance, computation to begin before the block of B had finished being packed.

commit 9f78ec6e7e95fcad89a167b27cad7e2d74b6d122
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Mar 27 14:18:46 2014 -0500

    Some fixes for the internal functions,
    was inappropriately only having thread chief do some things.

commit a6fd48345424e097f71652be013aa897e098b41e
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date:   Wed Mar 26 17:19:46 2014 +0000

    Added test drivers for level 3 BLAS that run tests in parallel using MPI

commit 73b3db594864be0f9be9a0eb29bf961fa9c95f29
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date:   Wed Mar 26 15:39:05 2014 +0000

    Some fixes for the bgq configuration

commit f0824a04fc75e231c3a3d7757fa4e7294173282f
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Mar 24 15:21:42 2014 -0500

    Initial commit to enable threading in TRSM,
    
    Also enabled weighted partitioning for herk, trmm
    Fixed bug where multiple threads would try to modify the same state in the internal level 3 functions
    Correctly computed a_next and b_next for gemm, herk macrokernels
    a_next and b_next point to the current micropanels in trmm

commit 23d9eab354fbc88165889832955e126772bf8488
Merge: 5d5dc2ee fd3e32a5
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Mar 20 16:54:35 2014 -0500

    Merge https://github.com/flame/blis

commit 5d5dc2eedef2f7c90d61371a1b457be5c06cf583
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Mar 20 16:43:36 2014 -0500

    Parallelized trmm and trmm3
    
    Also fixed bugs in packm

commit fd3e32a5f419fa412f46afe4dd1c3a26e15f3eb4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 20 13:59:48 2014 -0500

    Refined INSERT_GENTFUNC macro usage.
    
    Details:
    - Defined new INSERT_GENTFUNC macros so that the macro always takes
      exactly the number of arguments needed for the particular operation or
      variant being defined. Many operations were using INSERT_GENTFUNC
      macros that expected one auxiliary argument even though none were
      needed. Those instances have now been updated. Most of these instances
      were in the level-0 and -1v operations, as well as some operations
      defined in frame/util.

commit 9b0e715f29338a1a1d6445907d2445c35f011121
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 19 15:47:54 2014 -0500

    Minor simplifications to trmm, trsm macro-kernels.
    
    Details:
    - Simplified some code that would have allowed the diagonal of a trmm
      or trsm triangular matrix to intersect the short end of a micro-panel.
      This is disallowed via higher-level constraints on cache blocksizes, so
      this code was never needed and only served to obfuscate.
    - Updated some comments in trmm, trsm macro-kernels.

commit a3902750b9ab4923433f7e353f3669c3c419f8e4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 19 12:35:17 2014 -0500

    Reorganized norm operations.
    
    Details:
    - Completely reorganized norm operations:
      - Renames:
        - fnormsc, fnormv, fnormm -> normfsc, normfv, normfm (2-norm)
        - absumv -> norm1v (vector 1-norm)
      - New operations:
        - norm1m (matrix 1-norm)
        - normiv, normim (infinity-norm)
        - amaxv (BLAS-like absolute maximum value index)
        - asumv (BLAS-like absolute sum)
    - Deprecated absumm, as it did not correspond to any actual norm.
      (However, an inlined version now exists in the testsuite module for
      randm.)

commit c0140cb752f27e99742f85d23be2181c00a1335e
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Mar 19 11:21:16 2014 -0500

    Fixed packm variants 3 and 4 where every thread was trying to manipulate the same state
    
    Now just performed by the master thread.

commit fb42983bd9943711baa7d1c6496de1215bb816ef
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Tue Mar 18 16:37:28 2014 -0500

    Fixed a barrier bug and a thread decorator bug

commit aa2405f8b23d0f8d2ec04790882f2176ef2e8fd8
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Tue Mar 18 15:23:09 2014 -0500

    Fixing function pointer issues with thread decorator

commit ec8b88f93533942d3711191873310e7ff281bda6
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Tue Mar 18 14:35:37 2014 -0500

    Enabled threading for packm blocked variants 3 and 4

commit 0ac534cdf657bbf04601abfe719ba2887aab5da7
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Tue Mar 18 13:26:27 2014 -0500

    Added decorator for calling parallelized internal functions
    
    Will allow for easy support for different threading models

commit 5296f58975f7d351f88909cc80b6d0cffd73def7
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Mar 17 17:15:35 2014 -0500

    Fixing some bugs with herk parallelization

commit c51d0110831eb89361b4720bf7ed75edbd26ebce
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Mar 17 15:00:47 2014 -0500

    Initial multithreading support for HERK

commit c720b141568d1f289146bf34ded08001f2c0dfbb
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Mar 17 11:39:32 2014 -0500

    Switched to using environment variables to control threading.
    
    The environment variables all follow the format BLIS_X_NT,
    where X is the index of the loop as described in our paper
    Anatomy of High Performance Many-Threaded Matrix Multiplication.
    These indices are IR, JR, IC, KC, and JC.
    
    Also enabled parallelism for hemm and symm, but these are currently untested.
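
    A sketch of how variables of the form BLIS_X_NT might be read. The
    helper names, the struct, and the fallback of one thread for unset or
    invalid values are assumptions made for this example, not the BLIS
    implementation.

```c
#include <stdlib.h>

/* Hypothetical helper: reads one BLIS_<loop>_NT variable and falls back
   to 1 when it is unset or not a positive integer. */
static long read_loop_threads(const char *name)
{
    const char *s = getenv(name);
    if (s == NULL)
        return 1;

    char *end;
    long  v = strtol(s, &end, 10);
    return (end != s && *end == '\0' && v > 0) ? v : 1;
}

/* Hypothetical container for the five per-loop thread counts. */
typedef struct { long jc, kc, ic, jr, ir; } loop_threads_t;

static loop_threads_t read_all_loop_threads(void)
{
    loop_threads_t t;
    t.jc = read_loop_threads("BLIS_JC_NT");
    t.kc = read_loop_threads("BLIS_KC_NT");
    t.ic = read_loop_threads("BLIS_IC_NT");
    t.jr = read_loop_threads("BLIS_JR_NT");
    t.ir = read_loop_threads("BLIS_IR_NT");
    return t;
}
```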

commit 92233cf64274b27b2217c5cfffe75443ff6137a4
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Tue Mar 11 14:16:08 2014 -0500

    Some fixes to gemm thread info tree creation,
    Changed microkernel tests to use the new BLIS_PACKM_SINGLE_THREADED
    instead of BLIS_SINGLE_THREADED

commit 020f80c30289d8bcaa688bf600b01fae9b23b54f
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Tue Mar 11 12:08:17 2014 -0500

    Added files specific to threading for gemm and packm operations

commit 8d8f4352a41926bc923e47be836365b6b726aff2
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Mar 10 15:47:28 2014 -0500

    Added single threaded thread info data structures specifically for gemm and packm

commit 0e8677761175189583ca7d855e24b2bbdd2dada8
Merge: 2e727a02 b3bff631
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Mar 10 15:16:21 2014 -0500

    Merge branch 'master' of https://github.com/tlrmchlsmth/blis

commit 2e727a025a8f796d2b6bd14f489d0ee72e7d1fc7
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Mar 10 15:14:33 2014 -0500

    Modifying the thread info data structures
    
    This change makes each operation have its own thread info type,
    allowing more fine control of threading in operations that have different types of suboperations

commit a770590cf21a459f04bf941c58ee2afd272cc441
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 3 14:31:44 2014 -0600

    Minor fixes to sumsqv, abmaxv.
    
    Details:
    - Minor update to bli_sumsqv_unb_var1() to bring it up-to-date with
      LAPACK 3.5.0's zlassq.f, which, starting with 3.4.2, returns NaN when
      the vector (or matrix) contains a NaN.
    - Minor change to bli_abmaxv_unb_var1() to more closely mimic the
      behavior of netlib BLAS's izamax(). There, a "less than or equal to"
      operator is used in the search instead of "less than", which would
      change the element index returned if there were multiple maximum values.
    - Added macro function definitions for bli_isinf() and bli_isnan(), which
      are currently implemented in terms of isinf() and isnan() from math.h.

commit b3bff631eadf98b15cb422fb4a8e2f855c23e8a7
Merge: 2c158fb8 e8757b03
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Feb 27 16:53:24 2014 -0600

    Merge https://github.com/flame/blis

commit 2c158fb885c27f7b599dc1e85b57edd684f19223
Merge: e4738c48 c2b2ab62
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Feb 27 16:46:23 2014 -0600

    Merge https://github.com/flame/blis
    
    Conflicts:
            frame/1m/packm/bli_packm_blk_var1.c

commit e8757b03a74f9891632242e9a90efb32150826f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 27 16:40:07 2014 -0600

    Use "%ld" as int format specifier in fprintm.
    
    Details:
    - Changed "%d" to "%ld" when printing integers via bli_fprintm().
    - Meant to include this in previous commit.

commit c663ce3b5170fee7dfb5b528b650d70c8e932cac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 27 16:32:57 2014 -0600

    Fixed various bugs when C99 complex is enabled.
    
    Details:
    - Fixed various bugs in packm_*_cxk(), the 4m/3m micro-kernels, and
      elsewhere in the framework that were not yet set up to work properly
      when BLIS_ENABLE_C99_COMPLEX is defined in bli_config.h
    - Extensive changes to f2c-derived files in frame/compat/f2c to allow
      C99 complex storage. Most of these changes center around accessing
      real and imaginary components via bli_?real()/bli_?imag() accessor
      macros, and setting of values via bli_?sets() assignment macros.
      (Thanks to Vladimir Sukarev for pointing out that _ENABLE_C99_COMPLEX
      was broken.)

commit e4738c48e00b89391d9baa1fd0aa62d1ea2f95e6
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Feb 27 16:29:46 2014 -0600

    Added support for parallelism in gemm micro-kernel

commit bfe214b633765ed40b57b330fbb84c332663aa40
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Feb 27 15:53:10 2014 -0600

    Fixed bug with parallel packing, and bug with allocating an array of thread infos
    
    In packm variant 1, the variable p_begin was incremented each iteration, causing a dependency.
    This dependency was removed, allowing each iteration to be executed in parallel.
    
    Somewhere in bli_threading.c, I was allocating an array of pointers instead of an array of structs.
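
    One common form of the allocation mistake described above, with a way
    to avoid it. The struct and function are hypothetical and are not taken
    from bli_threading.c.

```c
#include <stdlib.h>

/* Hypothetical thread info record, larger than a pointer. */
typedef struct {
    int thread_id;
    int n_threads;
    int work_begin;
    int work_end;
    int level;
} thrinfo_sketch_t;

/* The buggy form sizes each element as a pointer:
       malloc(n * sizeof(thrinfo_sketch_t *))
   which under-allocates when the struct is larger than a pointer.
   Sizing by the pointed-to object avoids the mistake. */
static thrinfo_sketch_t *alloc_thread_infos(size_t n)
{
    thrinfo_sketch_t *infos = malloc(n * sizeof *infos);
    if (infos == NULL)
        return NULL;

    for (size_t i = 0; i < n; ++i) {
        infos[i].thread_id  = (int)i;
        infos[i].n_threads  = (int)n;
        infos[i].work_begin = 0;
        infos[i].work_end   = 0;
        infos[i].level      = 0;
    }
    return infos;
}
```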

commit 6193d9ceea552e67170dba45abde04c64271c705
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Feb 27 14:09:19 2014 -0600

    Fixed bug in thread trees

commit ac5a2de1d17ffd460b00fee9757898525a09abae
Merge: 01b125e8 bd3c7ecf
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Feb 27 11:59:33 2014 -0600

    Merge branch 'master' of https://github.com/tlrmchlsmth/blis

commit 01b125e815f19410e8e0611d088b84570e499e93
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Feb 27 11:55:45 2014 -0600

    First pass at adding parallelism to BLIS.
    
    Added a multithreading infrastructure that should be independent of multithreading implementation in the future.
    Currently, gemm blocked variants 1f and 2f and packm blocked variant 1 are parallelized.

commit c2b2ab62707e4174892aff3ce65f36f54878fae5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 26 12:46:45 2014 -0600

    Deprecated panel stride alignment in bli_config.h.
    
    Details:
    - Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE from bli_config.h of all
      configurations. It was already going unused in packm_init() since the
      recent 4m/3m commit. This setting was rarely, if ever, useful, and its
      existence only posed a potential risk for 4m/3m-based implementations.
    - Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE usage from mem_pool_macro_defs.h.
    - Updated comments regarding CONTIG_STRIDE_ALIGN_SIZE in template
      micro-kernels.

commit f18aee83a5ac1b14808686fc3c5a3c846a1d99b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 25 17:58:42 2014 -0600

    CHANGELOG update (for 0.1.1).

commit fde5f1fdece19881f50b142e8611b772a647e6d2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 25 13:34:56 2014 -0600

    Added extensive support for configuration defaults.
    
    Details:
    - Standard names for reference kernels (levels-1v, -1f and 3) are now
      macro constants. Examples:
        BLIS_SAXPYV_KERNEL_REF
        BLIS_DDOTXF_KERNEL_REF
        BLIS_ZGEMM_UKERNEL_REF
    - Developers no longer have to name all datatype instances of a kernel
      with a common base name; [sdcz] datatype flavors of each kernel or
      micro-kernel (level-1v, -1f, or 3) may now be named independently.
      This means you can now, if you wish, encode the datatype-specific
      register blocksizes in the name of the micro-kernel functions.
    - Any datatype instance of any kernel (1v, 1f, or 3) that is left
      undefined in bli_kernel.h will default to the corresponding reference
      implementation. For example, if BLIS_DGEMM_UKERNEL is left undefined,
      it will be defined to be BLIS_DGEMM_UKERNEL_REF.
    - Developers no longer need to name level-1v/-1f kernels with multiple
      datatype chars to match the number of types the kernel WOULD take in
      a mixed type environment, as in bli_dddaxpyv_opt(). Now, one char is
      sufficient, as in bli_daxpyv_opt().
    - There is no longer a need to define an obj_t wrapper to go along with
      your level-1v/-1f kernels. The framework now provides a _kernel()
      function which serves as the obj_t wrapper for whatever kernels are
      specified (or defaulted to) via bli_kernel.h.
    - Developers no longer need to prototype their kernels, and thus no
      longer need to include any prototyping headers from within
      bli_kernel.h. The framework now generates kernel prototypes, with the
      proper type signature, based on the kernel names defined (or defaulted
      to) via bli_kernel.h.
    - If the complex datatype x (of [cz]) implementation of the gemm micro-
      kernel is left undefined by bli_kernel.h, but its same-precision real
      domain equivalent IS defined, BLIS will use a 4m-based implementation
      for the datatype x implementations of all level-3 operations, using
      only the real gemm micro-kernel.
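
As a rough illustration of the defaulting rule described in the commit above, the fallback can be expressed with an #ifndef guard. BLIS_DGEMM_UKERNEL and BLIS_DGEMM_UKERNEL_REF are the macro names given in the commit message; the function demo_dgemm_ukernel_ref(), its integer return value, and the wrapper below are hypothetical stand-ins, and the actual framework headers may be organized differently.

```c
/* Hypothetical stand-in for the reference micro-kernel. A real kernel
   would perform a small matrix multiplication; this one only returns a
   tag so that the selection can be observed. */
static int demo_dgemm_ukernel_ref(void) { return 1; }

/* Name of the reference implementation (assumed value, for illustration). */
#define BLIS_DGEMM_UKERNEL_REF demo_dgemm_ukernel_ref

/* A configuration's bli_kernel.h would be included at this point. In this
   sketch it leaves BLIS_DGEMM_UKERNEL undefined, so the default applies. */
#ifndef BLIS_DGEMM_UKERNEL
#define BLIS_DGEMM_UKERNEL BLIS_DGEMM_UKERNEL_REF
#endif

/* Calls whichever kernel the macro resolved to. */
int demo_call_selected_dgemm_ukernel(void)
{
    return BLIS_DGEMM_UKERNEL();
}
```

If a configuration defined BLIS_DGEMM_UKERNEL before the guard, the guard would be skipped and the configuration's own kernel would be called instead.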

commit 15b51e990f1d21333b5f7af97c211756247336e5
Merge: 6363a9f6 fc04b5eb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 21 09:04:32 2014 -0600

    Merge branch 'master' of github.com:fgvanzee/blis

commit fc04b5eb69868c341ce03f5ef1f02de4b8c121b0
Merge: b29e1c2b d1813c9d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 21 09:04:13 2014 -0600

    Merge pull request #3 from figual/master
    
    New ARM armv7a kernels and Assembly file consideration in Makefile

commit d1813c9dee34410833db5061e6588ec1a6c9ecd4
Author: Francisco Igual <figual@pandaboard.(none)>
Date:   Fri Feb 21 15:14:31 2014 +0100

    Added new armv7a micro-kernels and configuration files from Werner Saar.

commit 0cd098c03a000ed9426a7e9135190696da8cadbc
Author: Francisco Igual <figual@pandaboard.(none)>
Date:   Fri Feb 21 15:12:30 2014 +0100

     o Modified Makefile to consider .S assembly microkernels.

commit 6363a9f658257fe3d814a3dce5308f807adb54a2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 19 17:00:52 2014 -0600

    Added level-3 support for complex via 4m-/3m.
    
    Details:
    - Added the ability to induce complex domain level-3 operations via new
      virtual complex micro-kernels which are implemented via only real
      domain micro-kernels. Two new implementations are provided: 4m and 3m.
      4m implements complex matrix multiplication in terms of four real
      matrix multiplications, whereas 3m uses only three and thus is
      capable of even higher (than peak) performance. However, the 3m method
      has somewhat weaker numerical properties, making it less desirable
      in general.
    - Further refined packing routines, which were recently revamped, and
      added packing functionality for 4m and 3m.
    - Some modifications to trmm and trsm macro-kernels to facilitate indexing
      into micro-panels which were packed for 4m/3m virtual kernels.
    - Added 4m and 3m interfaces for each level-3 operation.
    - Various other minor changes to facilitate 4m/3m methods.
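
The arithmetic behind the two methods can be sketched at the scalar level. This is only an illustration: BLIS applies the idea to whole matrices through real micro-kernels, the function names here are hypothetical, and the three-multiplication form shown is the standard identity, assumed to correspond to the 3m method named in the commit.

```c
/* 4m idea: a complex product from four real multiplications. */
void demo_cmul_4m(double ar, double ai, double br, double bi,
                  double *cr, double *ci)
{
    *cr = ar * br - ai * bi;
    *ci = ar * bi + ai * br;
}

/* 3m idea: the same product from three real multiplications,
   at the cost of extra additions and subtractions. */
void demo_cmul_3m(double ar, double ai, double br, double bi,
                  double *cr, double *ci)
{
    double t1 = ar * br;
    double t2 = ai * bi;
    double t3 = (ar + ai) * (br + bi);

    *cr = t1 - t2;
    *ci = t3 - t1 - t2;  /* equals ar*bi + ai*br in exact arithmetic */
}
```

The imaginary part in the 3m form is recovered by subtracting two products from a third, which can lose accuracy through cancellation; this is one way to see why the method may have weaker numerical properties.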

commit b29e1c2b278c177e104c84ba462820ee8296df6c
Merge: ee60377e bd3c7ecf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 14 14:11:54 2014 -0600

    Merge pull request #2 from tlrmchlsmth/master
    
    Fixes and improvements to xeon phi implementation.

commit bd3c7ecfb54a9b9851c7d364f41c21e4cff52f6f
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Feb 14 14:05:57 2014 -0600

    Removing changes to input.general and input.operations

commit ce066863683cb4e910270cf8ab8e138b01ff3358
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Feb 14 13:40:24 2014 -0600

    Fixed more Xeon Phi bugs, especially with scattered update

commit 31134b5c7076423aee1b4f494e925f27171d97e6
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Feb 14 11:19:44 2014 -0600

    Some fixes, changes, and improvements to the microkernel for the Xeon Phi

commit ee60377e467862b9d8a7205c45dce5cf66c78c46
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 13 14:03:31 2014 -0600

    Shifted some fields in info_t.
    
    Details:
    - Shifted the pack order, pack buffer type, and structure type fields
      to make room for an extra bit in the pack type/status field.

commit bd3ab1ad4cf42f8bc30ab262acf8eccb49bb1a08
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 13 09:29:55 2014 -0600

    Minor fixes to trsm consistent with prev on trmm.
    
    Details:
    - Removed use of bli_min() and bli_max() that were only being used to
      try to support situations where the diagonal would intersect the
      short end of some micro-panels, which is a situation that is disallowed
      at a higher level by various constraints on the register and cache
      blocksizes. This only affected trsm_ll and trsm_lu.
    - Use panel stride as passed into the macro-kernel rather than compute
      it via k and PACKMR/PACKNR. This affects all macro-kernels of trsm.

commit 6260b0b5f8bd248f3f66e5a1c6854bdbd9d02ad0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 13 09:19:56 2014 -0600

    Fixed obscure bug in trmm_ll, trmm_lu.
    
    Details:
    - Fixed an obscure bug in left-hand trmm that would only manifest when
      non-zero register blocksize extensions (PACKMR > MR or PACKNR > NR)
      are used.
    - Removed use of bli_min() and bli_max() that were only being used to
      try to support situations where the diagonal would intersect the
      short end of some micro-panels, which is a situation that is disallowed
      at a higher level by various constraints on the register and cache
      blocksizes. This only affected trmm_ll and trmm_lu.
    - Use panel stride as passed into the macro-kernel rather than compute
      it via k and PACKMR/PACKNR. This affects all macro-kernels of trmm.

commit 16915c1c1e55c660bf82141cdadf7c0860d5b464
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 11 10:54:19 2014 -0600

    Fixed an obscure bug in packm_cxk().
    
    Details:
    - Fixed a bug in packm_cxk() whereby the packm ukernel was being chosen
      from ldp, which is always equal to PACKMR or PACKNR. The problem with
      this is that the pack ukernels were implicitly assuming that the
      panel dimension of the panel being packed was equal to ldp, which
      is not the case when the register blocksize extensions are non-zero
      (i.e., when PACKMR > MR or PACKNR > NR, whichever is applicable). This
      problem has been fixed by passing ldp into the pack ukernels, which
      now walk through the packed micro-panel region by incrementing by this
      value, rather than incrementing by the inherent panel dimension value
      assumed by each packm ukernel (e.g. 4 in the case of packm_ref_4xk).
    - Also fixed a very minor edge case inefficiency whereby pack ukernels
      smaller than the default were not being used in edge cases, and instead
      those situations were being handled by scal2m. This is related to the
      issue above, because the pack ukernel itself was being chosen based on
      ldp instead of the panel dimension.
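
The distinction between the panel dimension and ldp can be sketched as follows. This is a simplified illustration, not the BLIS packm code: the function name, the argument list, the column-major source layout, and the choice to leave the padding rows untouched are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Copies a panel_dim-by-k panel from a (column-major, leading dimension
   lda) into the packed buffer p. Each packed column starts ldp elements
   after the previous one, where ldp >= panel_dim. Rows panel_dim..ldp-1
   of each packed column are left untouched in this sketch. */
void demo_packm_cxk(size_t panel_dim, size_t k,
                    const double *a, size_t lda,
                    double *p, size_t ldp)
{
    assert(panel_dim <= ldp);

    for (size_t j = 0; j < k; ++j)
    {
        for (size_t i = 0; i < panel_dim; ++i)
        {
            /* Advance through p by ldp, not by panel_dim. */
            p[j * ldp + i] = a[j * lda + i];
        }
    }
}
```

With panel_dim = 2 and ldp = 4, the second packed column begins at p[4] rather than p[2]; a kernel that advanced by its built-in panel dimension would write that column to the wrong place.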

commit b7da57b282c5a5e2208946e60309d2352f55351d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 11 10:28:23 2014 -0600

    Updated calls to packm_blk_var2() in testsuite.
    
    Details:
    - In ukernel testsuite modules, replaced calls to packm_blk_var2() with
      _var1(). Meant to include this in previous commit.

commit c255a293e25b2223c88e8800267cd06ad2a90041
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 10 14:31:24 2014 -0600

    Consolidated packm_blk_var2 and var3.
    
    Details:
    - Consolidated the functionality previously supported by packm_blk_var2()
      and packm_blk_var3() into a new variant, packm_blk_var1().
    - Updates to packm_gen_cxk(), packm_herm_cxk(), and packm_tri_cxk()
      to accommodate above changes.
    - Removed packm_blk_var3() and retired packm_blk_var2() to
      frame/1m/packm/old.
    - Updated all level-3 _cntl_init() functions so that the new, more
      versatile packm_blk_var1 is used for all level-3 matrix packing.

commit 32d8f264ae7b28155f5d7b21dcc5ecb78da2e0ab
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Feb 9 10:07:37 2014 -0600

    Refactored packm variants.
    
    Details:
    - Revised packm_blk_var2() and _var3() by encapsulating the general,
      hermitian/symmetric, and triangular panel-packing subproblems into
      separate functions: packm_gen_cxk(), packm_herm_cxk(), and
      packm_tri_cxk(), respectively. Also, homogenized the packm code as
      well as the new specialized packm_*_cxk() code to further improve
      readability.

commit 6c8067028707947fcdf4f856a272e15bb9ed91e3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 7 11:27:15 2014 -0600

    Renamed enumerated type in testsuite and modules.
    
    Details:
    - Renamed the test suite's "mt_impl_t" enumerated type to "iface_t", and
      renamed all corresponding "impl" variables to "iface".

commit 6c12598b1bc567f0b08f58aebdc753a1c1390378
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 6 18:26:35 2014 -0600

    Employ simpler INSERT_ macro for ref ukernels.
    
    Details:
    - Defined a new macro, INSERT_GENTFUNC_BASIC0, which takes only one
      argument--the base name of the function--and employed this macro
      in the reference micro-kernel files instead of the _BASIC macro,
      which takes one auxiliary argument. That argument was not being
      used and probably just acted to unnecessarily obfuscate.

commit 32cae66326b68706d0e695cfd60c9ca5bc32c534
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 6 18:06:42 2014 -0600

    Fixed some instances of sloppy 'restrict' usage.
    
    Details:
    - Fixed technically incorrect usage of the 'restrict' keyword in the
      reference trsm micro-kernels.
    - Tweak to testsuite/Makefile that causes rebuild if libblis was
      touched.

commit 7aceef7683e2a2aff3c7ec2a73508036af2e19e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 6 17:31:19 2014 -0600

    Updated comments in macro-kernels.
    
    Details:
    - Updated (and fixed some errors in) the "Assumptions/assertions" comment
      section of macro-kernels.
    - Changed register blocksizes of reference configuration to MR = 8 and
      NR = 4. It's always good for MR != NR in the reference configuration
      since it may help uncover bugs related to non-square micro-kernels.

commit 8fd292aa78950bcdf556605718f09d13f9575abc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 6 14:32:21 2014 -0600

    Pass panel dimensions into macro-kernels.
    
    Details:
    - Modified the interfaces to the datatype-specific macro-kernels so that:
      - pd_a and pd_b are passed in (which contain the panel dimensions of
        packed panels of a and b).
      - rs_a and cs_b are no longer passed in (they were guaranteed to be 1).
    - Modified implementations of datatype-specific macro-kernels so pd_a,
      pd_b, cs_a, and rs_b are used instead of cpp macros for MR, NR, PACKMR,
      and PACKNR, respectively.
    - Declare temporary c matrices (ct) as being maxmr-by-maxnr, which for now
      is equivalent to being mr-by-nr. maxmr and maxnr are declared in a new
      header file bli_kernel_post_macro_defs.h.

commit 3404e6657eabb017cd1580a2f1dd8e6fb13df923
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 5 11:19:10 2014 -0600

    Deprecated incremental blocksize macro const defs.
    
    Details:
    - Removed macro constant definitions related to incremental blocksizes
      from all configurations' bli_kernel.h files. This change is minor and
      is mostly a cleanup related to a previous commit.

commit 1e9afd39a63e0a58167d4439c1a0a880a4a35657
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 4 20:15:19 2014 -0600

    Comment updates (removed vestiges of "bd").

commit 5cf58f7c2d5bc0d2d94d9576f7158d8f133b7aac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 4 09:15:19 2014 -0600

    Added early returns for "object is zeros" case.
    
    Details:
    - Added some logic to packm_init(), pack_int() and gemm_int() so that
      (a) objects marked as BLIS_ZEROS are not packed, and (b) those
      objects are not computed with. This functionality is not currently
      needed by any existing implementations, but may be used in the
      future.

commit 6bbd4be769a9b344a55abe5ddaca1a99fd29f7b4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 3 13:15:25 2014 -0600

    Added 'f' on some gemm and trmm blocked variants.
    
    Details:
    - Added 'f' to some block variant files/functions to be consistent with
      other file/functions' naming convention. Here, the f indicates
      partitioning in the "forward" direction.

commit eb13cb2c6b182df5e2a9b88c76f50e2cee25b9e0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 3 11:07:01 2014 -0600

    Removed redundant non-gemm blksz_t creation.
    
    Details:
    - Removed code that creates duplicate blksz_t objects for herk, trmm,
      and trsm. Instead, the gemm blksz_t objects are accessed via extern
      and used directly. This reduces the amount of code associated with
      each of the three _cntl_init() and _cntl_finalize() functions.

commit 0a023a7d9e58e53b8c204a5f49aa8ca9afeba938
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jan 29 14:02:08 2014 -0600

    Introduced new level-3 front-end layer.
    
    Details:
    - Added new _front() functions for each level-3 operation. This is done
      so that the choosing of the control tree (and *only* the choosing of
      the control tree) happens in what was previously the "front end"
      (e.g. bli_gemm()). That control tree is then passed into the _front()
      function, which then performs up-front tasks such as parameter
      checking.

commit 251c5d112196d37b183e554bc9d406104aed65fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jan 28 19:40:29 2014 -0600

    Removed redundant hemm, her2k control trees.
    
    Details:
    - Removed code that generated a control tree specifically for hemm and
      symm. Instead, the gemm control tree is now configured so that it
      works for gemm, hemm, or symm.
    - Retired most her2k code, as it was not being used. (Currently, her2k is
      implemented as two invocations of herk.) I couldn't think of many
      situations where her2k variants were needed.
    - Removed some older her2k code.

commit 5a36e5bf2f59d1e85d6dbce32a07d604c5e82d11
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jan 27 11:13:00 2014 -0600

    Embed func_t microkernel objects in control trees.
    
    Details:
    - Modified all control tree node definitions to include a new field of
      type func_t*, which is similar to a blksz_t except that it contains
      one function pointer (each typed simply as void*) for each datatype.
      We use the func_t* to embed pointers to the micro-kernels to use for
      the leaf-level nodes of each control tree. This change is a natural
      extension of control trees and will allow more flexibility in the
      future.
    - Modified all macro-kernel wrappers to obtain the micro-kernel pointers
      from the incoming (previously ignored) control tree node and then pass
      the queried pointer into the datatype-specific macro-kernel code, which
      then casts the pointer to the appropriate type (new typedefs residing
      in bli_kernel_type_defs.h) and then uses the pointer to call the micro-
      kernel. Thus, the micro-kernel function is no longer "hard-coded" (that
      is, determined when the datatype-specific macro-kernel functions are
      instantiated by the C preprocessor).
    - Added macros to bli_kernel_macro_defs.h that build datatype-specific
      base names if they do not exist already, and then use those to build
      datatype-specific micro-kernel function names. This will allow
      developers extra flexibility if they wanted to, for example, name each
      of their datatype-specific micro-kernels differently (e.g. double
      real might be named bli_dgemm_opt_4x4() while double complex might be
      named bli_zgemm_opt_2x2()).
    - Inserted appropriate code into _cntl_init() functions that allocates
      and initializes a func_t object for the corresponding micro-kernels.
      The gemm ukernel func_t object is created once, in bli_gemm_cntl_init(),
      and then reused via extern wherever possible.

commit 6cbd6f1c7f1915180aa28939833afde48665c5ae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jan 24 10:38:29 2014 -0600

    Removed commented mixed domain macro-kernel code.
    
    Details:
    - Removed commented-out code from macro-kernels that was supposed to
      facilitate implementing mixed domain (complex times real) matrix
      multiplication. This functionality is still (probably) possible,
      but I'm getting tired of looking at the code every time I edit
      a macro-kernel. Plus, there are probably ways of doing it at a
      higher level, via control trees.

commit 29778be1119f1a884330d7f8dc424a2df4101d58
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jan 22 16:03:11 2014 -0600

    Removed b_aux field from cntl nodes.
    
    Details:
    - Removed b_aux field from all control tree node definitions. This field
      was being used in certain optimizations (incremental blocking) that were
      not actually being employed within BLIS, and are probably not employed
      by others.
    - Updated all _cntl_obj_create() function definitions and invocations
      according to above change.
    - Retired bli_gemm_blk_var4.c, which was one such function that employed
      incremental blocking, but which was never called by BLIS itself.

commit 06ac727a42ec9e832c7832745036702014638f99
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jan 15 16:44:52 2014 -0600

    Updated some comments in level-3 front ends.

commit d628bf1da1560f1f5126a1ddfed8714f0a4b8da3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jan 15 11:40:12 2014 -0600

    Consolidated pack_t enums; retired VECTOR value.
    
    Details:
    - Changed the pack_t enumerations so that BLIS_PACKED_VECTOR no longer has
      its own value, and instead simply aliases to BLIS_PACKED_UNSPEC. This
      makes room in the three pack_t bits of the info field of obj_t so that
      two values are now unused, and may be used for other future purposes.
    - Updated sloppy terminology usage in comments in level-2 front-ends.
      (Replaced "is contiguous" with more accurate "has unit stride".)

commit ddc8c1c379b4787be5954802906593d7ea144452
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jan 13 14:55:43 2014 -0600

    Suppress warning in Makefile (UNINSTALL_LIBS).
    
    Details:
    - Redirect errors to /dev/null when using 'find' to locate libraries that
      would be uninstalled upon executing "make uninstall-old". Before, if the
      Makefile was read before $(INSTALL_PREFIX)/lib existed, a "No such file
      or directory" message was emitted. This message was harmless, but is now
      suppressed in this situation.

commit f8f67d7251bffc05020e20527c100c8115fd5e55
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jan 10 09:06:11 2014 -0600

    Typecast bli_getopt() return value in testsuite.
    
    Details:
    - In the test suite driver, inserted an explicit typecast of the return
      value of bli_getopt() prior to parsing. The lack of typecast caused a
      problem on at least one system whereby a return value of -1 was
      interpreted as a garbage character. Thanks to Francisco Igual for finding
      and submitting this fix.
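
One plausible mechanism for this kind of problem, offered as an assumption since the commit does not spell it out, is that plain char is unsigned on some platforms, so a getopt-style return value of -1 stored in a char no longer compares equal to -1. The sketch below simulates that with an explicit unsigned char; the function names are hypothetical and the code does not call bli_getopt().

```c
/* Simulates storing a getopt-style return value in a char on a platform
   where plain char is unsigned: -1 wraps to a positive value (255 when
   char is 8 bits), so the end-of-options test fails. */
int demo_end_detected_with_unsigned_char(int getopt_ret)
{
    unsigned char c = (unsigned char)getopt_ret;
    int promoted = c;          /* always non-negative */
    return promoted == -1;
}

/* Keeping the value in an int preserves -1, so the test works. */
int demo_end_detected_with_int(int getopt_ret)
{
    int c = getopt_ret;
    return c == -1;
}
```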

commit e7f154fe2ed3e10e2323cefe5d25c2c23ac902c4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jan 10 08:48:07 2014 -0600

    Applied edge case fix to arm/neon microkernel.
    
    Details:
    - Applied an edge case bugfix, courtesy of Francisco Igual, to the current
      double precision real gemm microkernel in kernels/arm/neon/3.

commit 89c76a8a51d070d263c13bfa5ace65769509f2b4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jan 9 12:08:37 2014 -0600

    Allow building outside source distribution.
    
    Details:
    - Modified build system (mostly configure and top-level Makefile) so that
      a user can build a BLIS library outside of the top-level directory of
      the source distribution.
    - Added "test" target to Makefile so that the user can run "make test",
      which will compile, link, and run the testsuite binary. This works even
      if the build directory is externally located, thanks to the test suite
      binary's new -g and -o command-line options. Also, when creating the
      test suite via the top-level Makefile, the linking is against the
      local archive, in lib/<configname>, rather than at <install_prefix>/lib.
    - Modified testsuite/Makefile so that it links against the library built
      locally, in ../lib/<configname>.
    - Added "-lm" to LDFLAGS of most configurations' make_defs.mk.
    - Various other cleanups to build system.

commit 12fa82ec12cc340ab28552997d9d50f7c98691f8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jan 8 16:09:26 2014 -0600

    Implemented bli_getopt().
    
    Details:
    - Added bli_getopt.c and .h files to frame/base. These files implement
      a custom version of getopt(), which may be used to parse command line
      options passed into a program via argc/argv. I am implementing this
      function myself, as opposed to using the version available via unistd.h,
      for portability reasons, as the only requirement is string.h (which
      is available via the standard C library).
    - Modified test suite to allow the user to specify the file name (and/or
      path) to the parameters and operations input files: -g may be used to
      specify the general input file and -o to specify the operations input
      file. If -g or -o or both are not given, default filenames are assumed
      (as well as their existence in the current directory).

commit cafb58e86ea5cfb21b9eedc57ca8ebbf24252098
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jan 6 13:28:36 2014 -0600

    Updated template micro-kernels to use auxinfo_t.
    
    Details:
    - Updated template micro-kernel implementations (located in
      config/template/kernels), to adhere to the new auxinfo_t interface.
      Meant to include this change in a0331fb1.
    - Changed template configuration to use 64-bit integers (for both BLIS
      and the BLAS compatibility layer).

commit 9ab126b499c3805045020cb89a8a5848e28d3bf5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jan 6 12:13:26 2014 -0600

    Removed error checks in netlib->BLIS param mapping
    
    Details:
    - Disabled error checking in netlib-to-BLIS parameter mapping functions.
      If the char value input to these functions was not one of the defined
      values, bli_check_error_code() with the appropriate error code value
      would be called, resulting in an abort(). This was unnecessary and
      redundant since these routines are currently only used within the
      BLAS compatibility layer, and they are only called AFTER parameter
      checking has already been performed on the original BLAS char values.
      If the application tried to override xerbla() to prevent an abort()
      from being called, this error checking would still get in the way.
      Thus, instead of reporting the error situation to the framework (i.e.,
      calling abort()), an arbitrary BLIS parameter value is now chosen and
      the function returns normally. Thanks to Jeff Hammond for finding and
      reporting this issue.

commit 2cb13600f9f9601c60e7f96f4ca159d169ade9cb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jan 3 12:29:13 2014 -0600

    Updated year in copyright headers to 2014.

commit 290fa54e0083c9c837188b8321b13b1b282e7b0c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Dec 20 14:10:26 2013 -0600

    Store variable panel strides in trmm/trsm auxinfo.
    
    Details:
    - Changed the value being stored into the auxinfo_t structure in trmm
      and trsm macro-kernels. Whereas before we stored whatever value was
      provided to the macro-kernel implementation via ps_a/ps_b, now we
      store the stride that will advance to the next variable-length
      micro-panel of the triangular matrix A (left) or B (right).
    - Whitespace changes to the files affected above.

commit e3a6c7e77667fd749248df3f75f880266c3136ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 19 16:29:31 2013 -0600

    Macroized conditionals for a2/b2 in macro-kernels.
    
    Details:
    - Replaced conditional expressions in macro-kernels related to computing
      the addresses a2 and b2 (a_next and b_next) with a preprocessor macro
      invocation, bli_is_last_iter(), that tests the same condition.
    - Updated gemm_ukr module to use auxinfo_t argument.
    - Whitespace changes in test suite ukr modules.

commit a0331fb10a50393e31d16339053b75b944132da1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 19 14:50:11 2013 -0600

    Introduced auxinfo_t argument to micro-kernels.
    
    Details:
    - Removed a_next and b_next arguments to micro-kernels and replaced them
      with a pointer to a new datatype, auxinfo_t, which is simply a struct
      that holds a_next and b_next. The struct may hold other auxiliary
      information that may be useful to a micro-kernel, such as micro-panel
      stride. Micro-kernels may access struct fields via accessor macros
      defined in bli_auxinfo_macro_defs.h.
    - Updated all instances of micro-kernel definitions, micro-kernel calls,
      as well as macro-kernels (for declaring and initializing the structs)
      according to above change.
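
The shape of this change can be sketched as a small struct plus accessor macros. All names below (demo_auxinfo_t and the demo_auxinfo_* macros) are hypothetical; the real definitions live in the BLIS headers, including bli_auxinfo_macro_defs.h, and may differ in field types and naming.

```c
#include <stddef.h>

/* Holds the addresses of the next micro-panels of A and B. Further
   fields (for example, a micro-panel stride) could be added later
   without changing the micro-kernel signature. */
typedef struct
{
    const void *a_next;
    const void *b_next;
} demo_auxinfo_t;

/* Accessor macros, so kernels never touch the fields directly. */
#define demo_auxinfo_set_next_a(info, p) ((info)->a_next = (p))
#define demo_auxinfo_set_next_b(info, p) ((info)->b_next = (p))
#define demo_auxinfo_next_a(info)        ((info)->a_next)
#define demo_auxinfo_next_b(info)        ((info)->b_next)

/* Stand-in for a micro-kernel that receives one pointer instead of
   separate a_next and b_next arguments. Returns 1 if both addresses
   have been set. */
int demo_ukernel_has_next_panels(const demo_auxinfo_t *info)
{
    return demo_auxinfo_next_a(info) != NULL &&
           demo_auxinfo_next_b(info) != NULL;
}
```

A macro-kernel would declare one such struct, fill it in before each micro-kernel call, and pass its address along.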

commit 392428dea4001fe4384efe29f6cde32f8abeeb35
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 12 19:01:47 2013 -0600

    Added "ri" scalar macros.
    
    Details:
    - Added set of basic scalar macros that take arguments' real and
      imaginary components separately, named like the previous set except
      with the "ris" (instead of "s") suffix.
    - Redefined the previous set of scalar macros (those that take arguments
      "whole") in terms of the new "ri" set.
    - Renamed setris and getris macros to sets and gets.
    - Renamed setimag0 macros to seti0s.
    - Use bli_?1 macro instead of a local constant in bla_trmv.c, bla_trsv.c.

commit f60c8adc2f61eaba06b892f4e73000159de93056
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 10 14:39:56 2013 -0600

    Minor updates to dunnington configuration.
    
    Details:
    - Added commented alternatives to dunnington configuration's bli_kernel.h.
    - Minor reformatting of optimization flag variables in make_defs.mk.

commit 4ef20150492db254b5baf2368add62e19b0ac11b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 9 18:53:03 2013 -0600

    Tweaks to dunnington configuration (x86_64/core2).
    
    Details:
    - Updated BLIS_DEFAULT_KC_D from 256 to 384.
    - Enabled cache blocksize extension of up to 25% for MC and KC (for
      double-precision real).

commit 5ad2ce7bf5ba3ea955e6d517bfd270e02820263b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 9 18:30:49 2013 -0600

    Minor x86_64 (core2) kernel fixes.
    
    Details:
    - Fixed copy-and-paste bug whereby [scz]gemmtrsm_u_opt_d4x4 kernels
      for x86_64/core2 were calling the wrong reference code (l instead
      of u).
    - Fixed some unused variables in x86_64/core2 dotaxpyv and dotxaxpyf
      kernels.
    - Minor typecasting fix in testsuite/src/test_libblis.c.
    - Makefile updates.

commit d289f5d3a9c0e1a68a17c1c32b736e282a289c4c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 5 10:56:13 2013 -0600

    Whitespace changes to level-2 blocked variants.
    
    Details:
    - Joined some lines in level-2 blocked variants to match formatting used
      in level-3 blocked variants.
    - Streamlined implementation of bli_obj_equals() in bli_query.c.

commit b444489f100d218bc8ef29b01ff8489c358559f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 3 16:08:30 2013 -0600

    Added new "attached" scalar representation.
    
    Details:
    - Added infrastructure to support a new scalar representation, whereby
      every object contains an internal scalar that defaults to 1.0. This
      facilitates passing scalars around without having to house them in
      separate objects. These "attached" scalars are stored in the internal
      atom_t field of the obj_t struct, and are always stored to be the same
      datatype as the object to which they are attached. Level-3 variants no
      longer take scalar arguments; however, level-3 internal back-ends still
      do; this is so that the calling function can perform subproblems such
      as C := C - alpha * A * B on-the-fly without needing to change either
      of the scalars attached to A or B.
    - Removed scalar argument from packm_int().
    - Observe and apply attached scalars in scalm_int(), and removed scalar
      from interface of scalm_unb_var1().
    - Renamed the following functions (and corresponding invocations):
    
       bli_obj_init_scalar_copy_of()
                               -> bli_obj_scalar_init_detached_copy_of()
       bli_obj_init_scalar()   -> bli_obj_scalar_init_detached()
       bli_obj_create_scalar_with_attached_buffer()
                               -> bli_obj_create_1x1_with_attached_buffer()
       bli_obj_scalar_equals() -> bli_obj_equals()
    
    - Defined new functions:
    
       bli_obj_scalar_detach()
       bli_obj_scalar_attach()
       bli_obj_scalar_apply_scalar()
       bli_obj_scalar_reset()
       bli_obj_scalar_has_nonzero_imag()
       bli_obj_scalar_equals()
    
    - Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c.
    - Renamed the following macros:
    
       bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1()
       bli_obj_is_scalar()     -> bli_obj_is_1x1()
    
    - Defined new macros to set and copy internal scalars between objects:
    
       bli_obj_set_internal_scalar()
       bli_obj_copy_internal_scalar()
    
    - In level-3 internal back-ends, added conditional blocks where alpha and
      beta are checked for non-unit-ness. Those values for alpha and beta are
      applied to the scalars attached to aliases of A/B/C, as appropriate,
      before being passed into the variant specified by the control tree.
    - In level-3 blocked variants, pass BLIS_ONE into subproblems instead of
      alpha and/or beta.
    - In level-3 macro-kernels, changed how scalars are obtained. Now, scalars
      attached to A and B are multiplied together to obtain alpha, while beta
      is obtained directly from C.
    - In level-3 front-ends, removed old function calls meant to provide
      future support for mixed domain/precision. These can be added back later
      once that functionality is given proper treatment. Also, removed the
      creating of copy-casts of alpha and beta since typecasting of scalars
      is now implicitly handled in the internal back-ends when alpha and
      beta are applied to the attached scalars.

commit 992de486d6f23e69a623abd15ae77d7881d13871
Merge: 9552e6ee fd4ac636
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 2 13:58:46 2013 -0600

    Unimplemented kernels now call reference.
    
    Details:
    - Updated arm, bgq, loongson3a, and x86_64 kernels so that unimplemented
      datatypes call the corresponding reference kernel. Previously, these
      kernel functions called abort() with a "not yet implemented" error
      message.

commit fd4ac636d9a55cec1476a444bd4e70def219dc8f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 2 13:50:36 2013 -0600

    Unimplemented kernels now call reference.
    
    Details:
    - Updated micro-kernels for arm, bgq, loongson3a, and x86_64 so that
      unimplemented kernel functions simply call the corresponding reference
      implementation. (Previously, these unimplemented functions would
      abort() with a "not yet implemented" message.)

commit 9552e6ee824d4345d5e908e869e071d19829819a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Nov 24 11:40:31 2013 -0600

    Removed optional scaling from packm control tree.
    
    Details:
    - Removed does_scale field from packm control tree node and
      bli_packm_cntl_obj_create() interface. Adjusted all invocations of
      _cntl_obj_create() accordingly.
    - Redefined/renamed macros that are used in aliasing so that now,
      bli_obj_alias_to() does a full alias (shallow copy) while
      bli_obj_alias_for_packing() does a partial alias that preserves the
      pack_mem-related fields of the aliasing (destination) object.
    - Removed bli_trmm3_cntl.c, .h after realizing that the trmm control tree
      will work just fine for bli_trmm3().
    - Removed some commented vestiges of the typecasting functionality needed
      to support heterogeneous datatypes.

commit e65c476284db9ef64b23191a21c2584b1083342f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Nov 19 10:05:35 2013 -0600

    Minor updates to packm_blk_var2.c and _blk_var3.c.
    
    Details:
    - Comment updates to packm_blk_var2.c and packm_blk_var3.c.
    - In packm_blk_var2(), call setm_unb_var1(), scal2m_unb_var1() directly
      instead of setm(), scal2m().

commit 9e1d0d4bca48eda54301d8976f203e2544c9df3a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 18 18:11:07 2013 -0600

    Added trsm_l, trsm_u ukernels for x86_64/core2.
    
    Details:
    - Added standalone trsm_l/trsm_u micro-kernels for x86_64 (core2).
      These kernels are based on the gemmtrsm_l/gemmtrsm_u micro-kernels
      that already existed in kernels/x86_64/core2-sse3/3.

commit 85e7e02ea3a9190b6fcff5d46b00d41c79cb1242
Merge: 67761e22 70720054
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 18 12:02:00 2013 -0600

    Merge branch 'master'. Forgot to git-pull.

commit 67761e224c92500eecf9c1540cc72bdd2fb27679
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 18 11:57:40 2013 -0600

    Attempting to fix errors in bgq build.
    
    Details:
    - Removed restrict declaration from b_cast and c_cast from
      bli_trsm_lu_ker_var2.c and bli_trsm_rl_ker_var2.c. Curiously, they
      are causing problems for xlc only in those two files and no other
      macro-kernels.
    - Fixed (hopefully) kernel function parameter type declarations in
      kernels/bgq/1f/bli_axpyf_opt_var1.c and kernels/bgq/3/bli_gemm_8x8.c.

commit 707200541d344f98cf34c9801954dbb36fbe0447
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 18 11:17:31 2013 -0600

    Syntax error fix in x86_64/core2 gemmtrsm_u ukr.

commit bbe2b84a49e7785d4d0c514cda34adfbe66478b0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 18 11:11:06 2013 -0600

    Updated Makefile in test, testsuite.
    
    Details:
    - Updated Makefiles in test and testsuite directories to use the new
      BLIS header installation directory scheme, which is to compile with
      -I<PREFIX>/include/blis instead of -I<PREFIX>/include.

commit 9bd7fcfd436625ca2108128086671319362f4d92
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 18 10:58:09 2013 -0600

    Outer-to-inner 'restrict' fix in macro-kernels.
    
    Details:
    - Fixed sloppy placement of 'restrict' pointer declarations in level-3
      macro-kernels. Previously, all restricted pointers were being declared
      at the outer-most function scope level. While this violates the C99
      standard, very few of the compilers used with BLIS so far have seemed
      to care. The lone exception has been IBM's xlc. Thanks to Tyler Smith
      for identifying this bug (and suggesting the fix).

commit 50549a6a31dd26cf63a013e0ede16b2c7ce835b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Nov 17 18:31:27 2013 -0600

    Changed header install directory to include/blis.
    
    Details:
    - Changed top-level Makefile so that headers are installed to
      $(INSTALL_PREFIX)/include/blis/. (Header directories are no longer
      named by version/configuration and then symlinked.)
    - Added uninstall targets, including uninstall-old to clean out old
      library archives.
    - Added GREP makefile definitions to all configurations' make_defs.mk.

commit d70733abddfb9a95661897e1e4f3c1f3cfa7cbaa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Nov 16 17:34:25 2013 -0600

    Added ARM kernels, configurations.
    
    Details:
    - Added kernels for ARM, and configurations for Cortex-A9 and Cortex-A15.
      Thanks to Francisco Igual for contributing these kernels and
      configurations.

commit d37c2cff62089c86983c2f79762f4b5329037373
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 13 10:47:11 2013 -0600

    Minor comment and Makefile changes.
    
    Details:
    - Added missing 'check-config' and 'check-make-defs' targets to
      testsuite/Makefile.
    - Removed unused 'test' target from top-level Makefile.
    - Comment changes to testsuite input files.

commit 19885f893a17b91ee79bead0620d0f913392d4c5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 11 12:09:21 2013 -0600

    Updated some kernel comment headers.
    
    Details:
    - Updated bgq and piledriver comment headers to use BLIS copyright header
      instead of libflame.

commit 1a4d698f42981d74fe5f29b980031e1ee7dc42d5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 11 10:15:40 2013 -0600

    CHANGELOG update (for 0.1.0).

commit 089048d5895a30221b6b1976c9be93ad6443420d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Nov 9 17:18:00 2013 -0600

    Added object wrappers to 1f test suite modules.
    
    Details:
    - Added missing object wrappers to level-1f test suite modules. This was
      only apparent if you were configuring with something other than the
      reference configuration.
    - Commented out object-wrappers in level-1f front-ends. These were not
      working as intended unless the reference configuration was selected, because
      most kernel sets, such as those in the template set, do not have object
      wrappers.
    - Whitespace changes to template micro-kernels.
    - Comment changes to template level-1f kernel headers.

commit 9ef3752079de10124bed906b5d28479d04aa8187
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 8 17:20:47 2013 -0600

    Updated template kernels wrt KernelsHowTo wiki.
    
    Details:
    - Merged latest state of KernelsHowTo wiki into template micro-kernels
      located in config/template/kernels/3.

commit 376bbb59c8944e29c5c1ff6637920d8451370afa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 8 11:17:34 2013 -0600

    Removed support for duplication.
    
    Details:
    - Removed support for duplication from the gemmtrsm/trsm micro-kernels
      and all framework code.
    - Updated test suite modules according to above changes.

commit 68a5910974b62b4df853fae2a68cb04df9d5a19c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Nov 7 11:36:11 2013 -0600

    Added comments to testsuite/input.operations.
    
    Details:
    - Added extensive comments to the top of testsuite/input.operations,
      which describe how to edit the file.
    - Removed input.operations.0 and input.operations.1.
    - Changed input.general to test all datatypes ("sdcz") by default.

commit a98f78b715fb256a519870071bb5266130d70b21
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 6 15:32:47 2013 -0600

    Changed dim_t and inc_t to be signed integers.
    
    Details:
    - Redefined dim_t and inc_t in terms of gint_t (instead of guint_t).
      This will facilitate interoperability with Fortran in the future.
      (Fortran does not support unsigned integers.)
    - Redefined many instances of stride-related macros so that they return
      or use the absolute value of the strides, rather than the raw strides
      which may now be signed. Added new macros bli_is_row_stored_f() and
      bli_is_col_stored_f(), which assume positive (forward-oriented) strides,
      and changed the packm_blk_var[23] variants to use these macros instead
      of the existing bli_is_row_stored(), bli_is_col_stored().
    - Added/adjusted typecasting to various functions/macros, including
      bli_obj_alloc_buffer(), bli_obj_buffer_at_off(), and various pointer-
      related macros in bli_param_macro_defs.h.
    - Redefined bli_convert_blas_incv() macro so that the BLAS compatibility
      layer properly handles situations where vector increments are negative.
      Thanks to Vladimir Sukharev for pointing out this issue.
    - Changed type of increment parameters in bli_adjust_strides() from dim_t
      to inc_t. Likewise in bli_check_matrix_strides().
    - Defined bli_check_matrix_object(), which checks for negative strides.
    - Redefined bli_check_scalar_object() and bli_check_vector_object() so
      that they also check for negative stride.
    - Added instances of bli_check_matrix_object() to various operations'
      _check routines.
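
    The negative-increment handling mentioned above can be illustrated with a
    small sketch. It assumes the reference BLAS convention, in which a vector
    with a negative increment is traversed starting from its far end. The
    names sk_vector_start() and sk_elem() are hypothetical and are not the
    bli_convert_blas_incv() macro itself. Note that the pointer arithmetic
    only works because the increment type is signed.

```c
#include <assert.h>

/* Return a pointer to the logical first element of an n-element vector
   stored at x with increment incx (assumed reference BLAS convention). */
static const double *sk_vector_start(long n, const double *x, long incx)
{
    assert(n > 0);
    if (incx < 0)
        return x + (n - 1) * (-incx);  /* start at the far end */
    return x;
}

/* Return the i-th logical element (0-based) of the strided vector. */
static double sk_elem(long n, const double *x, long incx, long i)
{
    const double *x0 = sk_vector_start(n, x, incx);
    return x0[i * incx];               /* incx may be negative */
}
```

    With x = {1,2,3,4,5,6}, n = 3, and incx = -2, the logical elements are
    5, 3, 1, whereas incx = 2 yields 1, 3, 5.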

commit 1f8afc3e08a4312cfe810be86aedeacbc57275c5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Nov 6 10:09:10 2013 -0600

    Minor comment update to BLAS compat files.

commit 1abbf768afafc158d44e4d5c4a135cfd9e277f13
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 4 15:50:00 2013 -0600

    Fixed bugs in scalv and setv.
    
    Details:
    - Fixed bugs similar to those addressed in cca1e1f51dc6, whereby
      a segmentation fault may occur if beta is not the same type as
      the vector operand for scalv and setv.
    - Changed axpyv and scal2v front-ends in a similar fashion.

commit f5953259a1842ee48e5833c22ac86e68a337bfe1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Nov 4 14:43:55 2013 -0600

    Fixed a bug related to Hermitian matrix diagonals.
    
    Details:
    - Fixed a bug whereby BLIS assumed that the imaginary components of the
      diagonal elements of Hermitian matrices were already zero. This property
      is now enforced when the matrix is packed (bli_packm_blk_var2). Thanks
      to Vladimir Sukharev for reporting this bug.
    - Minor comment updates to template kernels.

commit d70f2b089dac8b9e4c19295dfa6014c36afee2ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Nov 2 17:19:40 2013 -0500

    Added scaling to abval2s, sqrt2s macros.
    
    Details:
    - Re-defined abval2s and sqrt2s macros to use scaling to avoid underflow
      and overflow from squaring the real and imaginary components. (This is
      the same technique used to fix recent bugs in invscals/invscaljs and
      inverts.)

commit c5b1ed9409ae2f71d04041eef5da9a0080b5784a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 1 10:28:04 2013 -0500

    Added new dotxaxpyf variant 2.
    
    Details:
    - Added a new variant for dotxaxpyf that is based on dotxf and axpyf
      kernels. By default, this variant is not used by any other operation.

commit 97f89fbcf202d72fc440b614708e352ea31633e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Nov 1 10:16:39 2013 -0500

    Fixed bug in complex invscals.
    
    Details:
    - Fixed complex inversion in invscals and invscaljs whereby the
      imaginary component was being computed incorrectly.
    - Use bli_fmaxabs() instead of bli_fabs() when choosing the scalar
      in inverts, invscals, and invscaljs.
    - Changed bli_abs() and bli_fabs() macro definitions to use "<="
      operator instead of "<".

commit eda42a21d17a2742eab69ab801ed530b82488c8a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 31 18:00:44 2013 -0500

    Defined missing symbols in bla_rotg.c
    
    Details:
    - Defined local equivalents of libf2c's r_sign(), d_sign(), c_abs(), and
      z_abs(), which are needed by bla_rotg.c. Also defined r_abs() and
      d_abs() for completeness. Thanks to Vladimir Sukharev for reporting
      these bugs.

commit cca1e1f51dc67a2c3725d5c1837256831aaf70f8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 30 14:39:01 2013 -0500

    Fixed bugs in scalm and setm.
    
    Details:
    - Fixed bugs in scalm and setm that resulted in segmentation faults when
      beta is not the same type as the matrix operand. Thanks to Vladimir
      Sukharev for reporting this bug.
    - Changed axpym and scal2m front-ends in fashion similar to that of scalm
      and setm; namely, the alpha scalar is copy-cast to the type of the first
      matrix operand.
    - Changed the template and reference configurations' bli_config.h files
      so that the number of memory allocator blocks of A and B are set based
      on BLIS_MAX_NUM_THREADS.
    - Comment updates to bli_obj.c and variable rename in bla_nrm2.c.

commit 2807013a4761c2b84b3944de64d23483ad7ef2fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 24 14:32:20 2013 -0500

    Fixed over/under-flow in complex inversion.
    
    Details:
    - Fixed the complex bli_?inverts() macros, which were inverting elements
      in an "unsafe" manner, such that very large and very small values were
      unnecessarily over/under-flowing. Thanks to Vladimir Sukharev for
      reporting this bug.
    - Comment update to bli_sumsqv_unb_var1.c.
    - Removed redundant bli_min() macro in bli_scalar_macro_defs.h.
    - Changed 1.0F to 1.0 for bli_drands() macro.

commit 45a80c625f84edb2ade6ac25efe2b9c589d7e0df
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Oct 23 12:15:25 2013 -0500

    Fixed parameter checking issue in BLAS syr[2]k.
    
    Details:
    - Fixed a minor parameter checking bug in the BLAS compatibility layer
      for [sd]syrk and [sd]syr2k. Specifically, if 'C' is passed in for the
      trans parameter of either operation, it is (a) allowed, and (b) treated
      as 'T' (whereas previously it was disallowed). Thanks to Vladimir
      Sukharev for finding and reporting this bug.

commit a091a219bda55e56817acd4930c2aa4472e53ba5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Oct 14 10:11:29 2013 -0500

    Minor fixes to piledriver configuration, ukernel.
    
    Details:
    - Applied a patch from Tyler that fixes minor staleness in the piledriver
      configuration and gemm micro-kernel.
    - Very minor changes to test suite input files.

commit dacdde27aee4fb90b14880136d7f20c6b234e2c6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 11 11:37:19 2013 -0500

    Added Fran's Sandy Bridge kernels/configuration.
    
    Details:
    - Added a kernel directory for kernels developed by Francisco Igual for
      the Sandy Bridge architecture, including a dgemm ukernel coded with
      AVX intrinsics.
    - Added a configuration for Sandy Bridge using values supplied by Fran.

commit 03106d650e4030d4c9831683448376f92fc52d41
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Oct 11 10:40:38 2013 -0500

    Fixed minor perf bug in gemm_ker_var2.
    
    Details:
    - Fixed a minor performance bug in bli_gemm_ker_var2.c (and the experimental
      bli_gemm_ker_var5.c) whereby the addresses for a_next and b_next are not
      computed correctly (i.e., do not wrap around) at the edge cases. Thanks to
      Tze Meng for helping me identify this bug.

commit b053337387dbdef9035be03538222670a21707ca
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 10 18:26:55 2013 -0500

    Added fusing factors, MR/NR to test suite output.
    
    Details:
    - Updated the test suite driver (and modules where appropriate) so that
      the level-1f fusing factors are output along with the variable dimension.
      While this is not strictly necessary, since the fusing factors are output
      in the initial parameter summary, it allows extra reassurance to the user
      since the fusing factors appear alongside the variable dimension, which
      together give a complete picture of the problem size. Similar changes were
      made for outputting the register blocksizes when reporting results for the
      micro-kernel test modules.

commit be4833bd91c5a58d0bfc52daaadf7ba543a77acf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 10 14:20:06 2013 -0500

    Added test suite modules for level-1f, 3 kernels.
    
    Details:
    - Added test modules in test suite for level-1f kernels and level-3
      micro-kernels. (Duplication in the micro-kernels, for now, is NOT
      supported by these test modules.)
    - Added section override switches to test suite's input.operations file.
    - Added obj_t APIs for level-1f front-ends and their unblocked variants to
      facilitate the level-1f test modules. Also added front-end for dupl
      operation.
    - Added obj_t-based check routines for level-1f operations, which are
      called from the new front-ends mentioned above.
    - Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing
      factors as a function of datatype, which is needed by their respective
      test modules.
    - Whitespace changes to bli_kernel.h of all existing configurations.

commit 680188d46bb15b9a1a2867638104939dc77ca2a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 10 13:23:37 2013 -0500

    Cleaned up old test drivers.
    
    Details:
    - Minor updates to old test drivers in preparation for our participation
      in ACM TOMS's replicated results initiative.

commit 3690bdd4f95769c935c410414112102cc3e108b1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 10 11:45:33 2013 -0500

    More updates to level-1f kernels for core2-sse3.
    
    Details:
    - Changed types in function signatures to match new prototypes. Meant to
      include this in previous commit.

commit 661d5120cd7071f9b0c5cefc95f99f1361370ade
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Oct 10 11:27:27 2013 -0500

    Fixed outdated fusing factor macros in 1f kernels.
    
    Details:
    - Updated level-1f kernels for x86_64 and bgq to use renamed fusing factor
      macros. Meant to include this in 5e54f46c. Thanks to Fran for pointing
      this out.

commit 73aa1e9f31d1b2a319c7e711ced6db3f9835c832
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Oct 1 17:01:18 2013 -0500

    Added section overrides to test suite.
    
    Details:
    - Added new lines of input to the test suite's input.operations file, which
      allows the user to disable entire sections (levels) of tests. Before this
      change, the user had to manually disable each operation test's "master
      switch". (This is why input.operations.0 existed: to allow a more
      convenient starting point for someone who only wanted to test one or a
      few operations.)

commit 5e54f46ccb76beab892d530b693e07c6bf6db7cf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 30 12:58:18 2013 -0500

    Added template implementations and other tweaks.
    
    Details:
    - Added a 'template' configuration, which contains stub implementations of the
      level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
      lots of in-file comments and documentation.
    - Modified some variable/parameter names for some 1/1f operations. (e.g.
      renaming vector length parameter from m to n.)
    - Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
      to bli_kernel.h.
    - Modified test suite to print out fusing factors for axpyf, dotxf, and
      dotxaxpyf, as well as the default fusing factor (which are all equal
      in the reference and template implementations).
    - Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
      reference variants were implemented in terms of front-end routines rather
      than directly in terms of the kernels. (For example, axpy2v was implemented
      as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
    - Changed the interface to dotxf so that it matches that of axpyf, in that
      A is assumed to be m x b_n in both cases, and for dotxf A is actually used
      as A^T.
    - Minor variable naming and comment changes to reference micro-kernels in
      frame/3/gemm/ukernels and frame/3/trsm/ukernels.

commit 97aaf220a847363b4da35935eca17790c0ef71f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 17 10:51:36 2013 -0500

    Added new kernels, configurations.
    
    Details:
    - Added various micro-kernels for the following architectures:
        Intel MIC
        IBM BG/Q
        IBM Power7
        AMD Piledriver
        Loongson 3A
      and reorganized kernels directory. Thanks to Tyler Smith, Mike Kistler,
      and Xianyi Zhang for contributing these kernels.
    - Added configurations corresponding to above architectures, and renamed
      "clarksville" configuration to "dunnington".

commit fe979c5a114c877506a5697cdab1fc8cf2bcd303
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Sep 13 14:31:53 2013 -0500

    Removed default configuration behavior.
    
    Details:
    - Changed the configure script so that it no longer defaults to the
      reference configuration. This change is being made so that the
      developer has a firm awareness of which configuration is being used
      to configure BLIS. Thanks to Mike Kistler and Bryan Marker for this
      suggested change.

commit da77e9614f54f92f703f01e3b9bd67a83280150c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Sep 13 12:00:37 2013 -0500

    Minor improvements to static memory allocator.
    
    Details:
    - Expanded on cpp macro definitions from bli_mem.c and relocated them to
      a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded
      functionality includes computing the pool size for each datatype (using
      that datatype's cache blocksizes) and using the maximum to size the
      actual pool array. This addresses the somewhat common pitfall whereby a
      developer updates cache blocksizes in bli_kernel.h for only one datatype
      (say, single-precision real), while the memory pools are sized using the
      double-precision real values. Then, when the developer attempts to link
      to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with
      a message saying the static memory pool was exhausted. Clearly, this
      message is misleading when the pool was not sized properly to begin with.
    - Removed previously disabled code in bli_kernel_macro_defs.h that was
      meant to check for size consistency among the various cache blocksizes.
      (Obviously the memory pool size-based solution mentioned above is better.)
    - Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a
      reasonable place to put these constants, rather than further crowd up
      bli_config.h.
    - Updated testsuite driver to output memory pool sizes for A, B, and C.
    - Minor comment updates to bli_config.h.
    - Removed 'flame' configuration. It was beginning to get out-of-date, and
      I hadn't used it in months. We can always re-create it later.
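
    The pool-sizing approach from the first item above can be sketched with
    a few cpp macros. All names (SK_*) and blocksize values here are
    hypothetical and chosen only for illustration; they are not the contents
    of bli_mem_pool_macro_defs.h. In this example only the single-precision
    blocksizes were enlarged, and taking the maximum over all datatypes
    still yields a pool large enough for every type.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical cache blocksizes (MC x KC) for each datatype. */
#define SK_MC_S 1024
#define SK_KC_S 512
#define SK_MC_D 256
#define SK_KC_D 256
#define SK_MC_C 256
#define SK_KC_C 256
#define SK_MC_Z 128
#define SK_KC_Z 256

/* Bytes needed for one packed block, per datatype. */
#define SK_POOL_S (SK_MC_S * SK_KC_S * sizeof(float))
#define SK_POOL_D (SK_MC_D * SK_KC_D * sizeof(double))
#define SK_POOL_C (SK_MC_C * SK_KC_C * 2 * sizeof(float))
#define SK_POOL_Z (SK_MC_Z * SK_KC_Z * 2 * sizeof(double))

/* Size the actual pool using the largest requirement. */
#define SK_POOL_SIZE \
    SK_MAX(SK_MAX(SK_POOL_S, SK_POOL_D), SK_MAX(SK_POOL_C, SK_POOL_Z))

static char sk_pool[SK_POOL_SIZE];

/* Report whether a request of the given size fits in the pool. */
static int sk_pool_can_hold(size_t bytes)
{
    assert(bytes > 0);
    return bytes <= sizeof(sk_pool);
}
```

    Had the pool been sized from SK_POOL_D alone, a request of SK_POOL_S
    bytes would not fit, which corresponds to the misleading "pool
    exhausted" failure described above.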

commit 631f347b7a99cb02757c534fd3ec5f723a2fdb0e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 10 17:17:28 2013 -0500

    Added ESSL and Accelerate targets to test drivers.
    
    Details:
    - Added ESSL and Accelerate (OS X) targets to standalone test drivers'
      Makefile in "test" directory. Thanks to Jeff Hammond for suggesting
      / providing this patch.

commit 7ae4d7a41d13ef5f1ceee217c000a5cf77a11128
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 10 16:35:12 2013 -0500

    Various changes to treatment of integers.
    
    Details:
    - Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be
      assigned values of 32, 64, or some other value. The former two result in
      defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter
      causes integers to be defined in terms of a default type (e.g. long int).
    - Updated bli_config.h in reference and clarksville configurations according
      to above changes.
    - Updated test drivers in test and testsuite to avoid type warnings associated
      with format specifiers not matching the types of their arguments to printf()
      and scanf().
    - Inserted missing #include "bli_system.h" into blis.h (which was slated for
      inclusion in d141f9eeb6d1).
    - Added explicit typecasting of dim_t and inc_t to macros in
      bli_blas_macro_defs.h (which are used in BLAS compatibility layer).
    - Slight changes to CREDITS and INSTALL files.
    - Slight tweaks to Windows build system, mostly in the form of switching to
      Windows-style CRLF newlines for certain files.
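
    The integer-size selection from the first item above might look roughly
    like the following. This is a sketch only: the SK_INT_TYPE_SIZE macro
    and the sk_gint_t/sk_guint_t typedefs are hypothetical stand-ins, and
    the actual definitions in BLIS may be arranged differently.

```c
#include <assert.h>
#include <stdint.h>

/* Would normally be set in a configuration header. */
#ifndef SK_INT_TYPE_SIZE
#define SK_INT_TYPE_SIZE 64
#endif

#if SK_INT_TYPE_SIZE == 32
typedef int32_t  sk_gint_t;
typedef uint32_t sk_guint_t;
#elif SK_INT_TYPE_SIZE == 64
typedef int64_t  sk_gint_t;
typedef uint64_t sk_guint_t;
#else
/* Any other value falls back to a default C type. */
typedef long int          sk_gint_t;
typedef unsigned long int sk_guint_t;
#endif

/* Number of bits in the selected signed integer type. */
static unsigned int sk_gint_bits(void)
{
    assert(sizeof(sk_gint_t) == sizeof(sk_guint_t));
    return (unsigned int)(sizeof(sk_gint_t) * 8);
}
```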

commit 068437736b41d51a1f5ec47839f059bf58a20413
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 9 14:07:58 2013 -0500

    Fixed set-but-not-used compiler (gcc) warnings.
    
    Details:
    - Used void-casts of certain variables to appease gcc (and perhaps other
      compilers) when such variables are only used in the complex instances of
      the functions. Special thanks to Karl Rupp for suggesting a portable fix
      for these warnings.
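
    A minimal sketch of the void-cast technique, using a hypothetical
    real-domain function sk_sscal(): the variable use_conj is only
    meaningful in a complex instance of the same code, so the real instance
    sets it without otherwise using it.

```c
#include <assert.h>

/* Real-domain instance of a routine whose complex counterpart would
   conjugate alpha when conjalpha is nonzero. */
static float sk_sscal(int conjalpha, float alpha, float x)
{
    assert(conjalpha == 0 || conjalpha == 1);

    int use_conj = (conjalpha != 0);  /* set, but irrelevant for reals */
    (void)use_conj;                   /* counts as a use; no warning */

    return alpha * x;
}
```

    The cast has no runtime effect and relies on no compiler-specific
    attribute, which is what makes it portable.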

commit 6dc85f63dcd5282340c9e00d585e97d70a21edc3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 9 13:48:52 2013 -0500

    Small fix to Windows defs.mk makefile fragment.
    
    Details:
    - Commented out a !include statement that was attempting to include a
      version file that does not yet exist. For now, the version string is
      hard-coded into defs.mk.

commit d141f9eeb6d1de7044b7429adf52d11c6fca620c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 9 13:09:16 2013 -0500

    Added Windows build system.
    
    Details:
    - Added a 'windows' directory, which contains a Windows build system
      similar to that of libflame's. Thanks to Martin for getting this up
      and running.
    - Spun off system header #includes into bli_system.h, which is included
      in blis.h.
    - Added a Windows section to bli_clock.c (similar to libflame's).

commit 9b320e7406fb69e8b61a0085abe2ed89a96bdb68
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Sep 9 11:04:46 2013 -0500

    Edited bli_?lamch.c to avoid Windows keyword.
    
    Details:
    - Renamed "small" variable to "smnum" to avoid collision with Windows type
      by the same name. This change is needed in advance of the upcoming Windows
      build system.

commit 9013ad6ff2e9ace35e0cf44c32795c2f3d5be628
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Sep 4 13:36:07 2013 -0500

    Switched integer typedefs (again) to C types.
    
    Details:
    - Redefined gint_t and guint_t in terms of the standard C types long int
      and unsigned long int, respectively.
    - Changed testsuite default max problem size to 500.
    - Changed testsuite input.operations to use square problems for level-3
      operation tests.

commit 981a60cfa07abac2e93697dfe12b0f076ab00a38
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Sep 4 12:09:11 2013 -0500

    Falling back to 32-bit integers for dim_t, etc.
    
    Details:
    - In light of recent segfaulting issues when compiling on 32-bit systems,
      I've changed the default typedef for gint_t and guint_t from int64_t and
      uint64_t to int32_t and uint32_t, respectively.
    - Disabled 64-bit integers in the blas2blis layer for the reference
      configuration.
    - Added type sizes of gint_t, guint_t, and the four floating-point datatypes
      to introductory output of the testsuite.

commit b776ddcd4338b34f172ef78da0ac1d771a771ab4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 3 21:58:07 2013 -0500

    Applied temp fix to typecasting bug in testsuite.
    
    Details:
    - Applied a temporary fix to the typecasting bug in the testsuite driver.
      The fix involves casting both numerator and denominator to unsigned long.
      This fix is more voodoo than science, as I can't be sure why it even
      works.

commit 9ee6e125373869c4213c017ce772c38ecefba103
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Sep 3 21:53:27 2013 -0500

    Changed dimension spec for gemm in testsuite.
    
    Details:
    - Encountered a bizarre typecasting bug whereby the test suite was not
      computing the proper dimension from the problem size and dimension
      specification when the latter was set to -3. Will investigate.
      Thanks to Fran for finding this "bug".

commit e8be081e68c385ab44d0fea8dade21d40c200b79
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 28 15:52:34 2013 -0500

    Generalized matlab and file output in testsuite.
    
    Details:
    - Added a new option in input.general that allows outputting in
      matlab/octave format so that one can output in matlab format
      independently from outputting to files.
    - Adjusted input.operations according to above.
    - Added input.operations.0 and input.operations.1 with all options
      disabled and enabled, respectively.

commit d352c746e5683037d41b5061dfb5ce08e1d0843b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 27 13:41:46 2013 -0500

    Added single/real gemm micro-kernel for x86_64.
    
    Details:
    - Added a single-precision real gemm micro-kernel in
      kernels/x86_64/3/bli_gemm_opt_d4x4.c.
    - Adjusted the single-precision real register blocksizes in
      config/clarksville/bli_kernel.h to be 8x4.
    - Added a missing comment to bli_packm_blk_var2.c that was present in
      bli_packm_blk_var3.c

commit dedda523dc5dc779ecc34e6a03dc74cb8eb220de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Aug 19 12:07:41 2013 -0500

    Fixed bug in bli_acquire_mpart_t2b(), _l2r().
    
    Details:
    - Fixed a bug in bli_acquire_mpart_t2b() and bli_acquire_mpart_l2r()
      that caused incorrect partitioning when SUBPART0 was requested. This
      bug was introduced in 46d3d09d49ad. Thanks to Bryan for isolating
      this bug.
    - Removed dupl kernels from kernels/x86_64/3 directory.
    - Uncommented beta == 0 optimization code in
      kernels/x86_64/3/bli_gemm_opt_d4x4.c.

commit 12dbd2f33455e9384fe2070cbdd660fd4a7fceb5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 8 14:39:35 2013 -0500

    Moved init_safe(), finalize_safe() to BLAS compat.
    
    Details:
    - Moved the bli_init_safe() and bli_finalize_safe() function calls from the
      BLAS-like BLIS layer to the BLAS compatibility layer. Having these auto-
      initializers in the BLIS layer wasn't buying us anything because the user
      could still call the library with uninitialized global scalar constants,
      for example. Thus, we will just have to live with the constraint that
      bli_init() MUST be called before calling ANY routine with a bli_ prefix.
    - Added the missing _init_safe() and _finalize_safe() calls to the level-1
      BLAS compatibility wrappers.

commit 8abfe55f2ae5d89df18e1b26a5a28d94b0936683
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 8 13:30:19 2013 -0500

    Miscellaneous updates.
    
    Details:
    - Changed the BLIS_HEAP_STRIDE_ALIGN_SIZE in the configurations from 16 to
      BLIS_CACHE_LINE_SIZE (typically 64).
    - Changed the use of nr in sizing of bd buffer to packnr in level-3 macro-
      kernels.
    - Reformulated gemm_ker_var2 to look more like the other level-3 macro-
      kernels, in that the interior and edge-case handling is expressed once
      inside the loops in the n and m dimensions, rather than the edge-case
      handling being "unrolled" and expressed as distinct code regions. The
      previous macro-kernel now lives in retired form in the subdirectory
      other/bli_gemm_ker_var2.c.old.
    - Updated experimental gemm_ker_var5 according to above change.
    - Fixed bug in bli_her2k.c whereby incorrect transformations were being
      applied to optimize the macro-kernel's access pattern on C when C is
      row-stored.
    - Various updates inside of test/exec_sizes.

commit 1aa05736ff49e7cc5f121acf615460fe9a87852c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Aug 7 12:27:04 2013 -0500

    Fixed bug in interface of bla_ger_check().
    
    Details:
    - Fixed the misplaced lda parameter in the function signature of
      bla_ger_check(). Thanks to Tyler for finding this bug.

commit 685aad25353fb200de4ca97a8bc0feeebde51d0f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Aug 6 12:25:51 2013 -0500

    Fixed cpp guard typos in frame/compat/check files.
    
    Details:
    - Fixed instances of BLIS_ENABLE_BLIS2BLAS that should have been
      BLIS_ENABLE_BLAS2BLIS. Thanks to Tyler for catching this.
    - Fixed various syntax errors in the code that had yet to be compiled
      due to the aforementioned bug.

commit f4ec28e723d28d998f1038f82da6986e44320ef6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Aug 1 11:24:23 2013 -0500

    Added basic OpenMP-based gemm and packm files.
    
    Details:
    - Integrated Tyler's parallelized packm_blk_var2 and gemm_ker_var2
      into the following auxiliary files
    
        frame/1m/packm/other/bli_packm_blk_var2.c
        frame/3/gemm/other/bli_gemm_ker_var2.c
    
      The routine in the first file uses a basic OpenMP parallel region to
      parallelize the packing of blocks of A and panels of B, while the
      second uses a similar parallel region to parallelize along the n
      dimension of the gemm macro-kernel.
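
    The n-dimension parallelization described above can be illustrated with a
    minimal sketch. This is not the code from the commit: toy_gemm_par_n() and
    TOY_NR are hypothetical names, and the micro-kernel is replaced by a plain
    triple loop over column-major matrices.

```c
#include <stddef.h>

/* Hypothetical register blocksize in the n dimension (not a BLIS value). */
#define TOY_NR 4

/* C := C + A * B for column-major A (m x k), B (k x n), C (m x n).
   The loop over column panels of C (the n dimension) is the one that is
   parallelized. Each iteration owns a disjoint set of columns of C, so
   no synchronization is needed inside the loop. */
void toy_gemm_par_n( size_t m, size_t n, size_t k,
                     const double* a, const double* b, double* c )
{
    long n_panels = ( long )( ( n + TOY_NR - 1 ) / TOY_NR );
    long jp;

    /* This pragma is ignored when OpenMP is not enabled at compile time. */
    #pragma omp parallel for
    for ( jp = 0; jp < n_panels; ++jp )
    {
        /* Column range of this panel; the last panel may be narrower. */
        size_t j0 = ( size_t )jp * TOY_NR;
        size_t j1 = ( j0 + TOY_NR < n ) ? j0 + TOY_NR : n;
        size_t i, j, p;

        for ( j = j0; j < j1; ++j )
            for ( p = 0; p < k; ++p )
                for ( i = 0; i < m; ++i )
                    c[ i + j*m ] += a[ i + p*m ] * b[ p + j*k ];
    }
}
```

    Variables declared inside the loop body are private to each thread, which
    is why the sketch declares the inner indices there.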

commit f8980edf9c318453bb1962ac4939c06bf11e6d5e
Merge: 67a8b949 6e7e4523
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 26 11:14:27 2013 -0500

    Merge branch 'master' of https://code.google.com/p/blis

commit 67a8b9498d13b038deb316ac163e62c5b17da2ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 26 11:12:37 2013 -0500

    Added missing cpp kernel blocksize constraints.
    
    Details:
    - Added missing C preprocessor guards in bli_kernel_macro_defs.h that enforce
      constraints on the register blocksizes relative to the cache blocksizes.
      Thanks to Tyler for helping me stumble across this issue.
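
    A guard of this kind might look like the sketch below. The TOY_* macro
    names and values are hypothetical, and the specific constraint shown
    (cache blocksizes must be whole multiples of register blocksizes) is an
    assumption taken from the bli_kernel.h comment described in commit
    02002ef6f3d2; the actual guards in bli_kernel_macro_defs.h may differ.

```c
/* Hypothetical blocksize values, for illustration only. */
#define TOY_MC 128
#define TOY_NC 4096
#define TOY_MR 8
#define TOY_NR 4

/* Reject, at compile time, configurations in which a cache blocksize is
   not a whole multiple of the corresponding register blocksize. */
#if ( TOY_MC % TOY_MR ) != 0
  #error "MC must be a whole multiple of MR."
#endif
#if ( TOY_NC % TOY_NR ) != 0
  #error "NC must be a whole multiple of NR."
#endif

/* Number of micro-panels per cache block; exact thanks to the guards. */
enum { TOY_MC_PANELS = TOY_MC / TOY_MR, TOY_NC_PANELS = TOY_NC / TOY_NR };
```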

commit 6e7e452343014e8f86640874dc1dbadca4a642a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 22 14:50:57 2013 -0500

    Fixed minor warnings and misc issues.
    
    Details:
    - Fixed various warnings output by gcc 4.6.3-1, including removing some
      set-but-not-used variables and addressing some instances of typecasting
      of pointer types to integer types of different sizes.

commit 03f6c3599743bc837a7d40eb5b415b1bf4f2a4e9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 22 12:54:32 2013 -0500

    Tightened some macros that detect datatypes.
    
    Details:
    - Modified the definitions of some macros, such as bli_is_real(), to take
      the "special" bit into account so that BLIS_INT is differentiated from
      BLIS_FLOAT.
    - Whitespace changes to bli_obj_macro_defs.h.
    - Removed BLIS_SPECIAL_BIT definition from bli_type_defs.h, since it wasn't
      being used.

commit b33e2f4443b9043b554963320280ff7783773652
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jul 19 17:15:03 2013 -0500

    CHANGELOG update (for 0.0.9).

commit 0680916fdd532f7a4716b11a2515243b2c08d00f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 18 18:04:34 2013 -0500

    Added BLAS error checking to compatibility layer.
    
    Details:
    - Added frame/compat/check directory, which now houses companion _check()
      routines for each of the BLAS wrappers in frame/compat. These _check()
      routines are called from the compatibility wrappers and mimic the
      error-checking present in the netlib BLAS.
    - Edited bla_xerbla.c so that xerbla() translates the operation string to
      uppercase before printing.
    - Redefined util routines in frame/compat/f2c/util in terms of level0
      macros.
    - Added prototypes for util routines, f2c routines, lsame(), and xerbla().
    - Commented out prototypes in test/test_*.c since Fortran integers are now
      int64_t by default (and the prototypes that were present in the files
      used int).
    - Removed redundant #include "bli_f2c.h" in bli_?lamch.c and bli_lsame.c,
      since blis.h was already being included.
    - Other minor changes to code in frame/compat/f2c.

commit 4e80ad28c97273db3366428ec44020da7944964d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jul 18 17:53:31 2013 -0500

    Added support for C99 complex types/arithmetic.
    
    Details:
    - Added support for C99 complex types to bli_type_defs.h and overloaded
      complex arithmetic to the scalar-level macros in include/level0. This
      includes a somewhat substantial reorganization and re-layering of much
      of the existing machinery present in the level0 macros.
    - Added new #define for BLIS_ENABLE_C99_COMPLEX to bli_config.h files,
      commented-out by default, which optionally enables the use of built-in
      C99 complex types and arithmetic.
    - Minor changes to clarksville and reference configs' make_defs.mk files.
    - Removed a macro definition from bli_param_macro_defs.h that was not being
      used (bli_proj_dt_to_real_if_imag_eq0).

commit 6072d7c848e837ba20d607f7b727438ada31bdcf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 17 12:27:45 2013 -0500

    Fixed bugs in trsm, trmm macro-kernels.
    
    Details:
    - Fixed a bug in trsm_rl_ker_var2() caused by incorrect edge case handling.
    - Fixed a bug in trsm_rl_ker_var2() and trsm_ru_ker_var2() whereby k was
      incorrectly being adjusted upward by MR, instead of NR. The rl and ru
      trmm macro-kernels were updated in a similar fashion.
    - Fixed a bug in trsm_ru_ker_var2() that was due to a missing negation on
      diagoffb when recomputing k to skip a zero region below where the
      diagonal intersects the right side of the block. The corresponding
      trmm macro-kernel was also updated.
    - Fixed a bug in trsm_ru_ker_var2() where the adjustment of k (by NR)
      needed to be placed AFTER the block that recomputes k to skip the zero
      region (if present). The other three trsm macro-kernels, as well as the
      trmm macro-kernels, were updated in the same manner, for consistency.
    - Fixed a bug in trmm_lu_ker_var2() in which the wrong dimension (n) was
      being updated to skip a zero region to the left of where the diagonal
      of A intersects the top edge of the block.
    - Comment updates to all trsm and trmm macro-kernels.
    - Comment updates to bli_packm_init.c.

commit 47410a48f9b91e94ce4c67633686ffd1f2ad0275
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 10 14:53:59 2013 -0500

    Added f2c'ed Givens rotation wrappers.
    
    Details:
    - Retired (for now) existing ?rot*() BLAS compatibility wrappers to 'attic'
      along with other wrappers for which no BLIS implementation exists.
    - Added f2c-generated codes for applicable datatype flavors of rot, rotg,
      rotm, and rotmg operations.

commit e5f90f3a8dbe671104bcb9d8b4e3409de01805da
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 10 13:40:12 2013 -0500

    Removed copynz defs from bli_kernel.h files.
    
    Details:
    - Removed COPYNZ_KERNEL definition from the bli_kernel.h files in each
      configuration. (Meant to include this in previous commit.)

commit aec12d90f596e8c04b1ad178258a1cd38108f59d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jul 10 13:33:30 2013 -0500

    Removed copynzv, copynzm and related codes.
    
    Details:
    - Removed copynzv and copynzm operation directories. These operations
      implemented a variation of copyv/m that, in the case of real source
      and complex destination operands, leaves the imaginary component
      untouched (rather than setting it to zero). I realize now that the
      special case(s) (e.g. gemm with real A and B but complex C) that I
      thought required this operation actually can be handled more simply.
    - Removed level0 scalar macros implementing copynzs, copynzjs.

commit b0a0a0f274a761788531b5d281cc3b411b7124ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jul 9 17:15:38 2013 -0500

    Added handling of restrict, stdint.h for non-C99.
    
    Details:
    - Removed the #include <stdint.h> from blis.h and inserted a cpp macro block
      in bli_type_defs.h that #includes <stdint.h> for C++ and C99, and otherwise
      manually typedefs the types we need (which, for now, are unconditionally
      int64_t and uint64_t).
    - Moved basic typedefs to top of bli_type_defs.h, and comment changes.
    - Added cpp macro block to bli_macro_defs.h that #defines restrict as
      nothing for C++ and non-C99.
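
    The two cpp blocks described above could be sketched as follows. This is
    an illustration under stated assumptions, not the contents of
    bli_type_defs.h or bli_macro_defs.h; toy_sum() is a hypothetical function
    added only to show that the result compiles either way.

```c
/* Use <stdint.h> when the language standard provides it. */
#if defined( __cplusplus ) || \
    ( defined( __STDC_VERSION__ ) && __STDC_VERSION__ >= 199901L )
  #include <stdint.h>
#else
  /* Assumption: long long is 64 bits wide on the target. This holds for
     common compilers but is not guaranteed by C89. */
  typedef long long          int64_t;
  typedef unsigned long long uint64_t;
#endif

/* 'restrict' is a keyword only in C99 and later; define it away elsewhere. */
#if defined( __cplusplus ) || \
    !defined( __STDC_VERSION__ ) || __STDC_VERSION__ < 199901L
  #define restrict
#endif

/* Example use: compiles with or without native restrict support. */
int64_t toy_sum( const int64_t* restrict x, int n )
{
    int64_t s = 0;
    int     i;
    for ( i = 0; i < n; ++i ) s += x[ i ];
    return s;
}
```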

commit 4b7e7970f1af4a1ab121e07657e2b78b9fcd7671
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 8 15:20:34 2013 -0500

    Migrated integer usage to stdint.h types.
    
    Details:
    - Changed the way bli_type_defs.h defines integer types so that dim_t,
      inc_t, doff_t, etc. are all defined in terms of gint_t (general signed
      integer) or guint_t (general unsigned integer).
    - Renamed Fortran types fchar and fint to f77_char and f77_int.
    - Define f77_int as int64_t if a new configuration variable,
      BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise.
      These types are defined in stdint.h, which is now included in blis.h.
    - Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed
      in terms of scomplex.
    - Renamed "char" type in f2c files to "character" and typedef'ed in terms
      of char.
    - Updated bla_amax() wrappers so that the return type is defined directly
      as f77_int, rather than letting the prototype-generating macro decide
      the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros,
      so I removed them. Also, changed the body of the wrapper so that a
      gint_t is passed into abmaxv, which is THEN typecast to an f77_int
      before returning the value.
    - Updated f2c code that accessed .r and .i fields of complex and
      doublecomplex types so that they use .real and .imag instead (now that
      we are using scomplex and dcomplex).
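
    The f77_int selection and the amax return-type handling described above
    can be sketched roughly as follows. The names gint_t, f77_int, and
    BLIS_ENABLE_BLIS2BLAS_INT64 come from this entry; toy_abmaxv() and
    toy_idamax() are hypothetical stand-ins, and defining gint_t as int64_t
    is an assumption.

```c
#include <stdint.h>

typedef int64_t gint_t;  /* assumed width of the general signed integer */

#ifdef BLIS_ENABLE_BLIS2BLAS_INT64
  typedef int64_t f77_int;
#else
  typedef int32_t f77_int;
#endif

/* Hypothetical stand-in for abmaxv: returns the zero-based index of the
   element of x with the largest absolute value (first one on ties). */
static gint_t toy_abmaxv( gint_t n, const double* x )
{
    gint_t i, imax = 0;
    for ( i = 1; i < n; ++i )
    {
        double xi = ( x[ i ]    < 0.0 ) ? -x[ i ]    : x[ i ];
        double xm = ( x[ imax ] < 0.0 ) ? -x[ imax ] : x[ imax ];
        if ( xi > xm ) imax = i;
    }
    return imax;
}

/* Wrapper whose return type is declared directly as f77_int. The index is
   computed as a gint_t and only then typecast, after converting to the
   one-based convention used by the BLAS. */
f77_int toy_idamax( f77_int n, const double* x )
{
    gint_t index;
    if ( n < 1 ) return 0;
    index = toy_abmaxv( ( gint_t )n, x );
    return ( f77_int )( index + 1 );
}
```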

commit 372501398564fdba3d5a3db86c30bc1039b185ff
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jul 8 11:24:18 2013 -0500

    Added experimental bli_gemm_ker_var5().
    
    Details:
    - Added support for an experimental gemm macro-kernel that incrementally
      packs one micro-panel of B at a time. This is useful for certain
      special cases of gemm where m is small.
    - Minor changes to default values of clarksville configuration.
    - Defined BLIS_PACKED_BLOCKS as part of pack_t type, even though we
      do not yet have any use (or implementation support) for block storage.
    - Comment update to bli_packm_init.c.

commit 9915d667a79f23e3a2a2516247c560e9063a1646
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Jul 7 13:28:39 2013 -0500

    Defined "total" blocksize query functions.
    
    Details:
    - Defined bli_blksz_total_for_type() and bli_blksz_total_for_obj() to query
      the default blocksize plus blocksize extension (using the type or the type
      of an object).
    - Comment update in bli_packm_cxk.c.

commit 46d3d09d49aded1d9f1b468c83fce75e07d631dc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 27 13:19:56 2013 -0500

    Consolidated lower/upper her[2]k blocked variants.
    
    Details:
    - Consolidated lower and upper blocked variants for herk and her2k, and
      renamed the resulting variants, according to the same changes recently
      made to trmm and trsm.
    - Implemented support for four new subpartitions types:
        BLIS_SUBPART1T
        BLIS_SUBPART1B
        BLIS_SUBPART1L
        BLIS_SUBPART1R
      which correspond to "merged" partitions that include the middle "1"
      partition as well as either the neighboring "0" or "2" partition. This is
      used to clean up code in herk/her2k var2 that attempts to partition away
      the strictly zero region above or below the diagonal of a matrix operand
      that is being marched through diagonally.
    - Added safeguards to herk macro-kernels that skip any leading or trailing
      zero region in the panel of C that is passed in. This is now needed given
      that herk/her2k var1 no longer partitions off this zero region before
      calling the macro-kernel (via bli_her[2]k_int()).
    - Updated comments and other whitespace changes to trmm/trsm macro-kernels.

commit 02002ef6f3d2746665982793db36714bd69bccc9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 24 17:08:14 2013 -0500

    Added row-storage optimizations for trmm, trsm.
    
    Details:
    - Implemented algorithmic optimizations for trmm and trsm whereby the right
      side case is now handled explicitly, rather than induced indirectly by
      transposing and swapping strides on operands. This allows us to walk through
      the output matrix with favorable access patterns no matter how it is stored,
      for all parameter combinations.
    - Renamed trmm and trsm blocked variants so that there is no longer a
      lower/upper distinction. Instead, we simply label the variants by which
      dimension is partitioned and whether the variant marches forwards or
      backwards through the corresponding partitioned operands.
    - Added support for row-stored packing of lower and upper triangular matrices
      (as provided by bli_packm_blk_var3.c).
    - Fixed a performance bug in bli_determine_blocksize_b() whereby the cache
      blocksize extensions (if non-zero) were not being used to appropriately size
      the first iteration (i.e., the bottom/right edge case).
    - Updated comments in bli_kernel.h to indicate that both MC and NC must be
      whole multiples of MR AND NR. This is needed for the case of trsm_r where,
      in order to reuse existing left-side gemmtrsm fused micro-kernels, the
      packing of A (left-hand operand) and B (right-hand operand) is done with
      NR and MR, respectively (instead of MR and NR).

commit d1e81ddc848ee47bc188735883d14582bdd0cabc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 13 11:14:21 2013 -0500

    Minor generalizing tweaks to trmm blk var1, var2.

commit 0efb7974f104206ba3985276f2180a9b14fe9f9b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 12 16:40:04 2013 -0500

    CHANGELOG update.

commit 5b641c3bab31eac6a1795b9f6e3f86c59651ca50
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Jun 12 16:02:12 2013 -0500

    Use separate CFLAGS for "kernels" directories.
    
    Details:
    - Added a new "special" directory type: any source code within directories
      named "kernels" will be compiled with a separate CFLAGS_KERNELS set of
      compiler flags. This allows the developer to specify a separate set of
      flags (e.g. optimization flags) for compiling kernels while maintaining a
      standard set for regular framework code.
    - Fixed a bug in the top-level Makefile that was causing "noopt" code
      to be compiled with the standard set of compilation flags.
    - Updated make_defs.mk in reference, flame, and clarksville configurations
      according to above changes.

commit 08475e7c7653ba598665071a617d10f0d8f763c2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jun 11 12:18:39 2013 -0500

    Various level-3 optimizations for row storage.
    
    Details:
    - Implemented remaining two cases within bli_packm_blk_var2(), which allow
      packing from a lower or upper-stored symmetric/Hermitian matrix to column
      panels (which are row-stored). Previously one could only pack to row panels
      (which are column-stored).
    - Implemented various optimizations in the level-3 front-ends that allow more
      favorable access through row-stored matrices for gemm, hemm, herk, her2k,
      symm, syrk, and syr2k.
    - Cleaned up code in level-3 front-ends that has to do with setting target and
      execution datatypes.

commit 05a657a6b92e8d34efa5c57ae6a18a4f35ec0841
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jun 7 11:04:10 2013 -0500

    Added beta == 0 optimization to x86_64 ukernel.
    
    Details:
    - Modified x86_64 gemm microkernel so that when beta is zero, C is not read
      from memory (nor scaled by beta).
    - Fixed minor bug in test suite driver when "Test all combinations of storage
      schemes?" switch is disabled, which would result in redundant tests being
      executed for matrix-only (e.g. level-1m, level-3) operations if multiple
      vector storage schemes were specified.
    - Restored debug flags as default in clarksville configuration.
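
    The beta == 0 optimization can be sketched in portable C as shown below.
    This is not the x86_64 micro-kernel itself; toy_update_c() is a
    hypothetical helper that shows only the final update of C, with the
    product of the A and B micro-panels assumed to be already accumulated
    in ab.

```c
#include <stddef.h>

/* Update an m x n column-major tile: C := beta * C + alpha * AB, where ab
   is stored contiguously with leading dimension m. When beta is zero, C
   is overwritten without being read, so stale contents of C (even NaN or
   Inf) cannot propagate into the result. */
void toy_update_c( size_t m, size_t n, double alpha, const double* ab,
                   double beta, double* c, size_t ldc )
{
    size_t i, j;

    if ( beta == 0.0 )
    {
        /* C is never loaded on this path. */
        for ( j = 0; j < n; ++j )
            for ( i = 0; i < m; ++i )
                c[ i + j*ldc ] = alpha * ab[ i + j*m ];
    }
    else
    {
        for ( j = 0; j < n; ++j )
            for ( i = 0; i < m; ++i )
                c[ i + j*ldc ] = beta * c[ i + j*ldc ] + alpha * ab[ i + j*m ];
    }
}
```

    Besides saving memory traffic, skipping the read means that
    uninitialized values in C are harmless when beta is zero.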

commit f1aa6b81cc421516dd77dd0f18f7c432724e6ef2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 6 13:36:06 2013 -0500

    Whitespace changes to old test drivers.
    
    Details:
    - Replaced tabs with four spaces in places where indentation was already
      in place.

commit 9feb4c23d2e36f3d8b5417a3802c69f94b29f749
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Jun 4 14:57:46 2013 -0500

    Fixed unaligned handling in axpyf, dotxaxpyf.
    
    Details:
    - Fixed over-cautious handling of unaligned operands in vector intrinsic
      implementation of axpyf kernel.
    - Fixed over- and under-cautious handling of unaligned operands in vector
      intrinsic implementation of dotxaxpyf kernel.

commit 22b06cfcd2e3205c8325a246c2279e4b1047c066
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 3 16:54:52 2013 -0500

    Updated level-1/-1f [vector intrinsic] kernels.
    
    Details:
    - Updated level-1/-1f kernels so that non-unit-stride and unaligned cases are
      handled by reference implementation (rather than aborted).
    - Added -fomit-frame-pointer to default make_defs.mk for clarksville
      configuration.
    - Defined bli_offset_from_alignment() macro.
    - Minor edits to old test drivers.

commit 0288c827d3659bb225ac9c10f168b623ed0106a2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jun 1 08:02:23 2013 -0500

    Updated ukernels for x86_64.
    
    Details:
    - Tweaked micro-kernels and configuration for clarksville.
    - Updated/cleaned up old test drivers in test directory.
    - Fixed syntax bug in trsv_unb_var1 and trsv_unf_var1 (introduced
      recently).

commit 85a6d1c9a52c2b27c71a3a3e341c51d7ba263749
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon May 6 11:05:08 2013 -0500

    Replaced axpys usage with subs in trsv.
    
    Details:
    - Replaced instances of axpys with alpha equal to -1 with subs.
    - Use BLIS_MAX_TYPE_SIZE to define BLIS_CONSTANT_SLOT_SIZE instead of
      sizeof(dcomplex).

commit 2d9c667f3c48a12cab64e5ad09d5fcb9f4c19d78
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 24 16:28:10 2013 -0500

    Fixed x86_64 kernel bugs and other minor issues.
    
    Details:
    - Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in
      unaligned subpartitions. We were already going out of our way a bit to
      handle edge cases in the first iteration for blocked variants, and this
      was simply the unblocked-fused extension of that idea.
    - Fixed control tree handling in her/her2/syr/syr2 that was not taking
      into account how the choice of variant needed to be altered for
      upper-stored matrices (given that only lower-stored algorithms are
      explicitly implemented).
    - Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b()
      macros to provide inlined versions of bli_determine_blocksize_[fb]() for
      use by unblocked-fused variants.
    - Integrated new blocksize_dim macros into gemv/hemv unf variants for
      consistency with that of the bugfix for trmv/trsv (both of which now
      use the same macros).
    - Modified bli_obj_vector_inc() so that 1 is returned if the object is a
      vector of length 1 (i.e., 1 x 1). This fixes a bug whereby under certain
      conditions (e.g. dotv_opt_var1), an invalid increment was returned, which
      was invalid only because the code was expecting 1 (for purposes of
      performing contiguous vector loads) but got a value greater than 1 because
      the column stride of the object (e.g. rho) was inflated for alignment
      purposes (albeit unnecessarily since there is only one element in the
      object).
    - Replaced some old invocations of set0 with set0s.
    - Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly.
    - Fixed increment bug in cleanup loop of gemm ukernel for x86_64.
    - Added safeguard to test modules so that testing a problem with a zero
      dimension does not result in a failure.
    - Tweaked handling of zero dimensions in level-2 and level-3 operations'
      internal back-ends to correctly handle cases where output operand still
      needs to be scaled (e.g. by beta, in the case of gemm with k = 0).
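
    The vector-increment fix described above can be sketched as follows.
    toy_obj_t and toy_vector_inc() are hypothetical simplifications, not the
    BLIS obj_t or bli_obj_vector_inc(). Only the 1 x 1 rule is taken from
    this entry; choosing the column stride for row vectors and the row
    stride for column vectors is an assumption.

```c
/* Hypothetical, simplified object: dimensions and strides only. */
typedef struct { long m, n, rs, cs; } toy_obj_t;

/* Return the increment between consecutive elements of a vector object. */
long toy_vector_inc( const toy_obj_t* x )
{
    /* A 1 x 1 object has no second element, so its strides are
       irrelevant; report unit stride so that callers expecting
       contiguous data are not misled by a stride inflated for
       alignment. */
    if ( x->m == 1 && x->n == 1 ) return 1;

    /* Row vectors advance by the column stride, column vectors by the
       row stride. */
    return ( x->m == 1 ) ? x->cs : x->rs;
}
```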

commit d57ec42b34f8447c88adeffa95cf22f8c115ad51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 3 17:35:32 2013 -0500

    Renamed _trans_status() macro.
    
    Details:
    - Mistakenly forgot to rename the _trans_status() macro and instances in
      previous commit.

commit 9e2b227866af429a4a6fb7dbb8c457bbdda2f136
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri May 3 17:24:58 2013 -0500

    Renamed _set_trans(), _trans_status() macros.
    
    Details:
    - Renamed the following macros:
        bli_obj_set_trans()    -> bli_obj_set_onlytrans()
        bli_obj_trans_status() -> bli_obj_onlytrans_status()
      to remove ambiguity as to which bits are read/updated.

commit 2f8174509ea9f844db11ebd9389de5168e85b132
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 1 15:06:30 2013 -0500

    Unconditionally check memory pool(s) for errors.
    
    Details:
    - Changed bli_mem_acquire_m() in bli_mem.c so that we still check if the
      memory pool is exhausted before checking out and returning a block, even
      if BLIS error checking has been disabled. These errors are useful because
      they likely indicate that BLIS was improperly configured for the code
      being run.

commit 75405a2b83679b6aff38d7e7425199d623a7b0a9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 1 15:00:30 2013 -0500

    CHANGELOG update.

commit 6bfa96f84887dec0b4cf8be5d38dd634c2f8951d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 30 19:35:54 2013 -0500

    Absorbed blocksize extensions into main objects.
    
    Details:
    - Revamped some parts of commit b6ef84fad1c9 by adding blocksize extension
      fields to the blksz_t object rather than have them as separate structs.
    - Updated all packm interfaces/invocations according to above change.
    - Generalized bli_determine_blocksize_?() so that edge case optimization
      happens if and only if cache blocksizes are created with non-zero
      extensions.
    - Updated comments in bli_kernel.h files to indicate that the edge case
      blocksize extension mechanism is now available for use.

commit bc7c8005cedbe50961ac2a99aeeabf4e9f9a8e9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 25 17:16:59 2013 -0500

    Added option to disable err checking in testsuite.
    
    Details:
    - Added a new line to input.general that allows one to specify the error-
      checking level to use for each BLIS experiment. The only two levels
      supported for now are "no error checking" and "full error checking".

commit 096b366ddcfe386f44419ef84d8df8be13825f86
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 25 16:43:43 2013 -0500

    Use cntl trees that block in n dimension.
    
    Details:
    - Updated _cntl.c files for each level-3 operation to induce blocked
      algorithms that first partition in the n dimension with a blocksize
      of NC. Typically this is not an issue since only very large problems
      exceed NC. But developers often run very large problems, and
      so this extra blocking should be the default.
    - Removed some recently introduced but now unused macros from
      bli_param_macro_defs.h.

commit b6e24b23cb4dfc488c1c9c70d596539c2287f72e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 25 12:06:12 2013 -0500

    Use PASTEMAC in macro-kernels (over MAC2 or MAC3).
    
    Details:
    - Replaced multi-type invocations of copys_mxn, xpbys_mxn, etc. (PASTEMAC2
      and PASTEMAC3) with those that only use a single type (PASTEMAC).
    - Added extra macros to bli_adds_mxn_uplo.h and bli_xpbys_mxn_uplo.h to
      accommodate above change.
    - Fixed comment typo in bli_config.h files.
    - Added .nfs* pattern to .gitignore.

commit df80acf517dde180ddcc5835c6136b2fa7556d4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 23 19:43:23 2013 -0500

    Fixed computation of b_next in L3 macro-kernels.
    
    Details:
    - Restructured herk_l and herk_u macro-kernels in the image of trmm
      and trsm, in that the edge cases are captured by the main loop, rather
      than trying to have "cleanup" sections that result in four distinct
      parts (interior, bottom edge, right edge, bottom-right edge) of the
      code.
    - Fixed the way b_next was being computed in the non-gemm level-3
      macro-kernels (herk, trmm, trsm). The way they are computed now matches
      that of gemm.

commit 3671528cf8efe4b445d196665143a5c50c2c6048
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 23 19:12:14 2013 -0500

    Fixed minor bug in computing b_next in gemm.

commit db072a5b4a039a9a668ef951333ecfb5bd3a74b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 23 17:49:10 2013 -0500

    Fixed rare edge case bug in herk_l macro-kernel.
    
    Details:
    - Fixed a potential bug in herk_l at the m_left edge case. If MR was
      chosen to be much larger than NR, then one could encounter edge cases
      in the MC dimension that fall entirely below the diagonal, which
      the previous implementation of the herk_l macro-kernel was not allowing
      for.

commit 1dab11e37d1cb403cbe75b73a644c00de534f104
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 23 17:17:11 2013 -0500

    Updated x86 gemmtrsm ukernels to use alpha.

commit 9d10d7dd9bc92a993fea7162bfa5983f75506f49
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 23 16:00:18 2013 -0500

    Added a_next, b_next arguments to micro-kernels.
    
    Details:
    - Added two more arguments to the gemm and gemmtrsm microkernels: the
      addresses of the next micro-panels of A and B. By passing these
      pointers into the micro-kernel, we allow the micro-kernel author to
      prefetch micro-panels of A and B as necessary (though this is
      completely optional; these addresses may also be safely ignored).
    - Updated all seven macro-kernels so that they compute and pass in
      a_next and b_next. Note that ONLY the gemm macro-kernel computes
      a_next and b_next with the precise semantics we want. I will go back
      and fix the other macro-kernels in the near future.
    - Added 'restrict' to various micro-kernels from which it was missing.
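
    A micro-kernel that accepts the two extra arguments might be sketched as
    below. This is a simplified illustration, not the BLIS micro-kernel
    interface: toy_gemm_ukr(), TOY_MR, TOY_NR, and TOY_PREFETCH are
    hypothetical, and alpha and beta are omitted for brevity. The prefetch
    relies on the GCC/Clang builtin __builtin_prefetch() and compiles to
    nothing elsewhere.

```c
#include <stddef.h>

#define TOY_MR 4
#define TOY_NR 4

#if defined( __GNUC__ )
  #define TOY_PREFETCH( p ) __builtin_prefetch( ( p ) )
#else
  #define TOY_PREFETCH( p ) ( ( void )( p ) )
#endif

/* C (MR x NR, column-major with leading dimension ldc) += A * B, where a
   holds k columns of MR elements and b holds k rows of NR elements.
   a_next and b_next are only hints; ignoring them is always safe. */
void toy_gemm_ukr( size_t k, const double* a, const double* b,
                   double* c, size_t ldc,
                   const double* a_next, const double* b_next )
{
    double ab[ TOY_MR * TOY_NR ] = { 0.0 };
    size_t i, j, p;

    /* Start fetching the next micro-panels while this product runs. */
    TOY_PREFETCH( a_next );
    TOY_PREFETCH( b_next );

    /* Accumulate the rank-k product in a local tile. */
    for ( p = 0; p < k; ++p )
        for ( j = 0; j < TOY_NR; ++j )
            for ( i = 0; i < TOY_MR; ++i )
                ab[ i + j*TOY_MR ] += a[ i + p*TOY_MR ] * b[ j + p*TOY_NR ];

    /* Add the local tile to C. */
    for ( j = 0; j < TOY_NR; ++j )
        for ( i = 0; i < TOY_MR; ++i )
            c[ i + j*ldc ] += ab[ i + j*TOY_MR ];
}
```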

commit f3815dc84d385c514a5acaf1e925424a57be2f51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 23 11:12:33 2013 -0500

    Added code for backward edge-case blocking.
    
    Details:
    - Edited bli_determine_blocksize_b() to include experimental (and
      currently disabled) code that computes extended blocks.
    - Updated comments related to above changes.
    - Enabled use of x86 gemmtrsm ukernel in config/flame/bli_kernel.h.

commit 4fe1435f20e8fc7dd72f795ac58c8e236e6c631b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 22 19:00:43 2013 -0500

    Updated dupl implementation to use PACKNR and NR.
    
    Details:
    - Updated frame/util/dupl/bli_dupl_unb_var1.c to utilize PACKNR and NR
      explicitly to navigate b1 so that situations where PACKNR > NR are
      supported.
    - Moved the 4x2 and 4x4 reference micro-kernels in frame/3/gemm/ukernels and
      frame/3/trsm/ukernels to kernels/c99/.
    - Updated clarksville and flame configurations.

commit 2d6f9e83799a46d52d7901e275f8fd67f0a0edc6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Apr 21 15:10:34 2013 -0500

    Disabled blocksize checks for memory pools.
    
    Details:
    - Temporarily disabled checks that ensure that enough memory will be allocated
      by the contiguous memory allocator for all types, given that the values for
      double precision real are the ones used to allocate the space. These checks
      can easily go awry in certain situations, especially if you are developing for
      only one datatype. So for now, they are probably more trouble than they are
      worth.

commit b6ef84fad1c9884c84b7f1350a0bcdfe1737e8f2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Apr 21 15:00:24 2013 -0500

    Allow ldim of packed micro-panels != MR, NR.
    
    Details:
    - Made substantial changes throughout the framework to decouple the leading
      dimension (row or column stride) used within each packed micro-panel from
      the corresponding register blocksize. It appears advantageous on some
      systems to use, for example, packed micro-panels of A where the column
      stride is greater than MR (whereas previously it was always equal to MR).
    - Changes include:
      - Added BLIS_EXTEND_[MNK]R_? macros, which specify how much extra padding
        to use when packing micro-panels of A and B.
      - Adjusted all packing routines and macro-kernels to use PACKMR and PACKNR
        where appropriate, instead of MR and NR.
      - Added pd field (panel dimension) to obj_t.
      - New interface to bli_packm_cntl_obj_create().
      - Renamed bli_obj_packed_length()/_width() macros to
        bli_obj_padded_length()/_width().
      - Removed local #defines for cache/register blocksizes in level-3 *_cntl.c.
      - Print out new cache and register blocksize extensions in test suite.
    - Also added new BLIS_EXTEND_[MNK]C_? macros for future use in using a larger
      blocksize for edge cases, which can improve performance at the margins.
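
    The decoupling of the packed leading dimension from the register
    blocksize can be illustrated with a small packing sketch. toy_pack_a(),
    TOY_MR, and TOY_PACKMR are hypothetical names and values, and
    zero-filling the padding rows is an assumption made here for
    definiteness; the BLIS packing routines may treat the padding
    differently.

```c
#include <stddef.h>

#define TOY_MR     4   /* hypothetical register blocksize       */
#define TOY_PACKMR 6   /* hypothetical padded leading dimension */

/* Pack an MR x k slice of column-major A (leading dimension lda) into a
   micro-panel p whose columns are PACKMR elements apart. The padding
   rows MR..PACKMR-1 of each packed column are zero-filled. */
void toy_pack_a( size_t k, const double* a, size_t lda, double* p )
{
    size_t i, j;

    for ( j = 0; j < k; ++j )
    {
        for ( i = 0; i < TOY_MR; ++i )
            p[ i + j*TOY_PACKMR ] = a[ i + j*lda ];

        for ( i = TOY_MR; i < TOY_PACKMR; ++i )
            p[ i + j*TOY_PACKMR ] = 0.0;
    }
}
```

    A micro-kernel reading such a panel would step from one column to the
    next by PACKMR rather than MR, which is the substitution this entry
    describes for the packing routines and macro-kernels.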

commit 59fca58dbe678d79c1df0916b022afbeac7c48fa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 19 15:26:29 2013 -0500

    Fixed bug in compatibility layer (her2k/syr2k).
    
    Details:
    - Fixed a bug in the BLAS compatibility layer, specifically in bla_her2k.c
      and bla_syr2k.c, that caused incorrect computation to occur when the BLAS
      interface caller requests the [conjugate-]transpose case. Thanks to Bryan
      Marker for reporting the behavior that led to this bug.

commit 09eacbd1ab1380a95a0e9625726b45e43ed102d6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 18 19:39:13 2013 -0500

    Changed old level3 test drivers to call front-ends.
    
    Details:
    - Changed old level-3 test drivers, in 'test' directory, to always call the
      front-end object API instead of the internal back-end with the locally
      defined control tree.

commit 83e45de23e565138b8fde06fb11cfedc973b7246
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 18 18:33:03 2013 -0500

    Allow packm_init() to reacquire a too-small mem_t.
    
    Details:
    - Changed bli_packm_init() to react differently to a situation where a pack
      obj_t has an already-allocated mem_t entry that has a buffer that is smaller
      than what will be needed to hold the block/panel that now needs to be
      packed. Previously, this situation was treated with an abort() since I
      assumed something was horribly wrong. I have changed the code so that it now
      reacts by releasing the previous mem_t and re-acquiring a new mem_t with the
      new information. (This change was done at the request of Bryan Marker to
      facilitate code generation via DxT.)
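
    A minimal sketch of the "release and re-acquire" behavior, under stated
    assumptions: mem_sketch_t and mem_ensure_sketch() are hypothetical names,
    and malloc()/free() stand in for the BLIS memory allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a mem_t entry: a buffer and its size. */
typedef struct
{
    void*  buf;
    size_t size;
} mem_sketch_t;

/* Ensure that mem holds at least 'needed' bytes. Returns 1 on success
   and 0 if the allocation failed. */
int mem_ensure_sketch( mem_sketch_t* mem, size_t needed )
{
    /* An existing buffer that is large enough is simply reused. */
    if ( mem->buf != NULL && mem->size >= needed )
        return 1;

    /* Too small (or never allocated): release it and acquire a new
       one rather than aborting. */
    free( mem->buf );
    mem->buf  = malloc( needed );
    mem->size = ( mem->buf != NULL ) ? needed : 0;

    return mem->buf != NULL;
}
```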

commit a6990434173b0cf651f8521194f3aef738deb7d2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 18 13:52:47 2013 -0500

    Fixed bug in packing block of A for hemm/symm.
    
    Details:
    - Fixed a bug in bli_packm_blk_var2() that affected the packing functionality
      of hemm and symm. The bug occurs whenever attempting to pack a Hermitian or
      symmetric matrix where the block of A being packed intersects the diagonal,
      but some of its micro-panels do not intersect the diagonal and lie completely
      in the unstored region. Thanks to Francisco Igual for reporting this bug.
    - Comment updates to both _blk_var2.c and _blk_var3.c.

commit c92e7590e1934f830814ab614c794215ebe0c415
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 17 20:53:29 2013 -0500

    Activated bli_packm_acquire_mpart_t2b().
    
    Details:
    - Removed the overly-paranoid bli_abort() from the end of
      bli_packm_acquire_mpart_t2b(), to allow others to experiment with
      partitioning through packed blocks of A. Also, and more importantly,
      changed an earlier check that was causing an erroneous (but
      coincidentally redundant) abort(). Also, updated some of the comments
      in bli_packm_part.c.

commit bea579e9f009a44e08008eb14d09f38748ab2b53
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 16 19:43:14 2013 -0500

    Allow creation of "empty" objects.
    
    Details:
    - Modified bli_obj_alloc_buffer() to allow allocating an empty buffer, and
      modified bli_adjust_strides() to explicitly handle m = n = 0.
    - Updated bli_check_matrix_strides() to allow cases where m = n = 0.

commit 7904e20f2e6908571ee5008da2a08084198eefae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 16 17:37:16 2013 -0500

    Fixed "root" object bug in bli_her[2]k/syr[2]k.
    
    Details:
    - Fixed an obscure bug in the front-ends for herk, her2k, syrk, and syr2k,
      that manifested as the incorrect triangle being updated. It occurred when
      the user would pass in a matrix object that was correctly marked as
      symmetric/Hermitian and lower-stored, but whose root object was never marked
      as lower (or upper). We now alias and re-assign root status for matrix C
      within the front-ends. Note that trmm and trsm were already doing this,
      albeit for a slightly different reason (to allow the internal back-end to
      choose which algorithm to run--lower or upper--based on the uplo of the root
      object for both left and right side cases). Thanks to Bryan Marker for
      leading me to this bug.

commit 19155a768dd97b57cfb59c32fa8e54a344ec66e1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 16 11:24:03 2013 -0500

    Fixed overzealous type-checking in bli_getsc().
    
    Details:
    - Relaxed type checking in getsc so that the input object could be a constant
      and not just a proper floating-point type. (If it is a constant, default to
      extracting the dcomplex values.) Thanks to Bryan Marker for reporting this
      bug.
    - Added definition for bli_is_constant() in bli_param_macro_defs.h
    - Comment updates to various level-0 scalar routines.

commit 2ee6bbca2953d04c967685da9735b3eaf8a4b813
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 15 19:27:57 2013 -0500

    Fixed bug in bli_obj_is_packed() and renamed.
    
    Details:
    - This macro is used to determine whether the partitioning routines should
      call a corresponding packm_part routine instead. However, it was
      unintentionally catching matrices that were marked as "packed" by virtue
      of them simply being marked as BLIS_PACKED_UNSPEC in, say, bli_gemv().
      The macro has now been renamed to bli_obj_is_panel_packed(), and now only
      checks for row or column panel packing. (Note that I first attempted to
      fix this bug in a571af816d72.) Thanks to Bryan Marker for reporting the
      erroneous behavior that led me to this bug.
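
    The difference between the old and new checks can be illustrated as
    below. The bit layout, constant names, and obj_sketch_t type are all
    hypothetical; the actual positions and values in BLIS may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pack schema bits stored in an object's info field. */
#define SK_PACK_SCHEMA_SHIFT  16
#define SK_PACK_SCHEMA_MASK   ( 0x7u << SK_PACK_SCHEMA_SHIFT )
#define SK_NOT_PACKED         ( 0x0u << SK_PACK_SCHEMA_SHIFT )
#define SK_PACKED_UNSPEC      ( 0x1u << SK_PACK_SCHEMA_SHIFT )
#define SK_PACKED_ROW_PANELS  ( 0x2u << SK_PACK_SCHEMA_SHIFT )
#define SK_PACKED_COL_PANELS  ( 0x3u << SK_PACK_SCHEMA_SHIFT )

typedef struct { uint32_t info; } obj_sketch_t;

static uint32_t pack_schema( const obj_sketch_t* obj )
{
    return obj->info & SK_PACK_SCHEMA_MASK;
}

/* Overly broad check: true for anything that is not "not packed",
   including objects merely marked as packed-unspecified. */
bool obj_is_packed_sketch( const obj_sketch_t* obj )
{
    return pack_schema( obj ) != SK_NOT_PACKED;
}

/* Narrower check: true only for row-panel or column-panel packing. */
bool obj_is_panel_packed_sketch( const obj_sketch_t* obj )
{
    uint32_t s = pack_schema( obj );

    return s == SK_PACKED_ROW_PANELS || s == SK_PACKED_COL_PANELS;
}
```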

commit 99b99eebe70336b5f28039a4a084aa7f5fa7059d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 15 17:54:43 2013 -0500

    Removed local reference ukernel blocksize macros.
    
    Details:
    - Removed locally defined gemm microkernel blocksize macros from _mxn
      reference microkernel definition and header. Meant to include this in
      a recent/previous commit (0020ef7c8271).

commit 6a538fa7b164655f41cea5b9c8d3902438bda66b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 15 14:40:31 2013 -0500

    Formatting change to mods in previous commit.

commit ea079d35591e808971d2d98a1a7d9f89bc1f7c2f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 15 14:31:40 2013 -0500

    Set structure of objects in level-2 BLIS APIs.
    
    Details:
    - Added missing statement to set structure field of local objects in
      top-level BLIS (BLAS-like) API wrappers. Thanks to Bryan Marker for
      reporting this bug.

commit d9948c541c0446e20e249a1ccc83709ce51b7aa8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 15 10:21:26 2013 -0500

    Tweak to test suite function string construction.
    
    Details:
    - Fixed a minor bug in the way that the test suite would construct function
      name strings when the user anchored all parameters in input.operations.
      In this case, the test driver would mistake this situation for one where
      the operation simply had no parameters to begin with, and thus would not
      include the parameter string in the function string that is output for
      every result.

commit ca9e435c57c5c7a000d2a32681dd8070ba850abd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 15 09:59:46 2013 -0500

    Fixed a bug in reference implementation of dupl.
    
    Details:
    - Fixed a bug in reference implementation of dupl (bli_dupl_unb_var1.c),
      which resulted in incorrect duplication.
    - Updated old test drivers according to recently updated packm control tree
      creation interface.
    - Added 'restrict' to x86 gemm microkernel interface.

commit 26cbd52e364bbe439e3744101cd5a6cbcb82dffd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Apr 14 19:05:33 2013 -0500

    Modified bli_kernel.h include order in blis.h.
    
    Details:
    - Delayed #include of bli_kernel.h in blis.h to prevent a situation where
      _kernel.h includes an optimized microkernel header, which uses BLIS types
      such as dim_t and inc_t, which would precede the definition of those types
      in bli_type_defs.h.
    - Moved the #include of bli_kernel_macro_defs.h in bli_macro_defs.h to blis.h
      (immediately after that of bli_kernel.h).

commit 3414a23c38b0de45a8034b3dda2fc4b5a755e4e1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 13 16:53:16 2013 -0500

    CHANGELOG update.

commit ec16c52f2ecf419c749175ce0a297441c10f1c68
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 13 16:41:16 2013 -0500

    Updated INSTALL file (now redirects to website).

commit 0020ef7c82711a7ebf08e5174f939bee2563184c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 13 15:26:35 2013 -0500

    Removed gemmtrsm-, trsm-specific blocksize macros.
    
    Details:
    - Modified gemmtrsm micro-kernel wrappers to use new aliased blocksize macros
      instead of operation-specific ones.
    - Removed local, gemmtrsm-specific blocksize macro definitions found in
      micro-kernel header files.
      (Meant to include above changes in 31b100e7bf4a.)
    - Added comments to reference gemmtrsm micro-kernel wrapper implementation.

commit 1a9f427b85bb95aaa9e54c8ff8ecad8734b361ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 12 15:25:54 2013 -0500

    Added/renamed alignment constants to _config.h.
    
    Details:
    - Added new memory alignment constants:
        BLIS_HEAP_STRIDE_ALIGN_SIZE   (previously assumed to be same as SYSTEM_MEM)
        BLIS_CONTIG_ADDR_ALIGN_SIZE   (previously assumed to be same as PAGE_SIZE)
        BLIS_STACK_BUF_ALIGN_SIZE     (previously not enforced)
      and renamed existing ones
        BLIS_SYSTEM_MEM_ALIGN_SIZE -> BLIS_HEAP_ADDR_ALIGN_SIZE
        BLIS_CONTIG_MEM_ALIGN_SIZE -> BLIS_CONTIG_STRIDE_ALIGN_SIZE
      to better convey what the alignment factor is used for (and what it is
      not used for).
    - Removed BLIS_ENABLE_SYSTEM_MEM_ALIGN. Dynamic memory alignment is now
      disabled by setting BLIS_HEAP_STRIDE_ALIGN_SIZE to 1.
    - Inserted instances of __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE)))
      into macro-kernels to specify stack alignment of temporary buffers.
    - Modified test suite driver to output new constants.
    - Removed bli_align_dim_to_sys() and bli_align_dim_to_cmem(). Instead, we now
      use bli_align_dim_to_size(), which takes a third argument (the desired
      alignment).
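
    One plausible shape for a three-argument alignment helper is sketched
    below. It is not the BLIS implementation of bli_align_dim_to_size(); the
    function name, argument order, and rounding rule are assumptions made
    for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: return the smallest dimension >= dim whose size
   in bytes is a multiple of align_size. Assumes align_size is either 1
   or a multiple of elem_size. */
size_t align_dim_to_size_sketch( size_t dim, size_t elem_size,
                                 size_t align_size )
{
    assert( elem_size > 0 && align_size > 0 );

    /* Round the byte count up to the next multiple of align_size, then
       convert back to a count of elements. */
    size_t bytes = ( ( dim * elem_size + align_size - 1 ) / align_size )
                   * align_size;

    return bytes / elem_size;
}
```

    Under this sketch, passing an alignment of 1 leaves the dimension
    unchanged, which matches the idea of disabling alignment by setting the
    constant to 1.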

commit a77d10e87e3c0ab55ec14d74c285bc95c06285c3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 12 11:40:55 2013 -0500

    Fixed a bug in axpyv/axpym when alpha is unit.
    
    Details:
    - Fixed bug whereby axpyv and axpym were incorrectly simplifying to a copy,
      rather than an add, when alpha = 1. Thanks to Bryan Marker for identifying
      this bug.
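
    For reference, the intended semantics are y := y + alpha * x, so the
    alpha = 1 special case must add x to y rather than overwrite y. A minimal
    sketch (axpyv_sketch() is a hypothetical name, unit strides assumed):

```c
#include <stddef.h>

/* Hypothetical sketch of y := y + alpha * x for unit-stride vectors. */
void axpyv_sketch( size_t n, double alpha, const double* x, double* y )
{
    /* alpha == 0: y is left unchanged. */
    if ( alpha == 0.0 ) return;

    /* alpha == 1: simplifies to an add (NOT a copy). */
    if ( alpha == 1.0 )
    {
        for ( size_t i = 0; i < n; ++i ) y[ i ] += x[ i ];
        return;
    }

    /* General case. */
    for ( size_t i = 0; i < n; ++i ) y[ i ] += alpha * x[ i ];
}
```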

commit 0495bd1d6de5995fe2fb79b321eec79e961eb7a5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 11 16:39:25 2013 -0500

    Moved _POSIX_C_SOURCE def to compiler cmd line.
    
    Details:
    - Removed the #define of _POSIX_C_SOURCE in bli_config.h (for both reference
      and clarksville configurations) and added "-D_POSIX_C_SOURCE=200112L" to
      the compiler command line arguments in make_defs.mk (for both configs).
      Thanks to Devin Matthews for suggesting this change.

commit d43d1a0a2ef6de4bc57627566aef8e3fdb458b8c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 11 16:28:17 2013 -0500

    Prepended 'f2c_' to abs, min, max macros in f2c.h.
    
    Details:
    - Renamed abs, min, max, dmin, and dmax macros in bli_f2c.h so that they
      would not conflict with anything defined by the user (or the language).
      Thanks to Devin Matthews for suggesting this fix.
    - Updated all instances of the above macros accordingly.
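
    The kind of conflict being avoided can be sketched as follows. The macro
    bodies below follow the usual f2c.h definitions, but the exact text in
    bli_f2c.h is not shown in this log and may differ; clamp_sketch() is a
    hypothetical example function.

```c
#include <stdlib.h>  /* declares abs(), which an unprefixed macro would shadow */

/* Prefixed macros no longer collide with abs() or user-defined min/max. */
#define f2c_abs(x)    ( (x) >= 0 ? (x) : -(x) )
#define f2c_min(a,b)  ( (a) <= (b) ? (a) : (b) )
#define f2c_max(a,b)  ( (a) >= (b) ? (a) : (b) )

/* Hypothetical user code: clamp v to the range [lo, hi]. */
int clamp_sketch( int v, int lo, int hi )
{
    return f2c_max( lo, f2c_min( v, hi ) );
}
```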

commit 31b100e7bf4aeaa4ceafefd2b6c3102d5fbc4cbb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 11 11:11:52 2013 -0500

    Added new kernel blocksize macro aliases.
    
    Details:
    - Added new macros that alias level-3 cache and register blocksize macros
      to names that can be constructed via the PASTEMAC macro. These aliased
      macro definitions live inside bli_kernel_macro_defs.h, which is now
      #included after bli_kernel.h.
    - Modified macro-kernels to use new aliased blocksize macros instead of
      operation-specific ones.
    - Removed local, operation-specific kernel blocksize macro definitions
      (found in macro-kernel header files).
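
    The aliasing idea can be sketched with token pasting as below. All names
    and values here (BLIS_DEFAULT_MR_S, bli_smr, GEN_MR_FUNC, and so on) are
    assumptions for illustration and are not copied from
    bli_kernel_macro_defs.h.

```c
/* Hypothetical register blocksizes, for illustration only. */
#define BLIS_DEFAULT_MR_S 8
#define BLIS_DEFAULT_MR_D 4

/* Aliases whose names can be constructed from a type character. */
#define bli_smr BLIS_DEFAULT_MR_S
#define bli_dmr BLIS_DEFAULT_MR_D

/* Two-level paste so that macro arguments are expanded before pasting. */
#define PASTEMAC_(ch,op) bli_ ## ch ## op
#define PASTEMAC(ch,op)  PASTEMAC_(ch,op)

/* A type-templatized function can now look up its own MR by pasting
   the type character onto a common suffix. */
#define GEN_MR_FUNC( ch ) \
int ch ## _register_mr( void ) { return PASTEMAC(ch,mr); }

GEN_MR_FUNC( s )
GEN_MR_FUNC( d )
```

    Here s_register_mr() and d_register_mr() expand to functions returning
    the single- and double-precision values, respectively, without any
    operation-specific macro names appearing in the template.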

commit bd2b24ba65b36d7c07c5918a3838ce2ff57c4b48
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 11 10:35:39 2013 -0500

    Updated CREDITS file.

commit 79328c15410215737f3f14cd069328cf52aa11fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 11 10:32:14 2013 -0500

    Reverted testsuite object files' home to 'obj'.
    
    Details:
    - Removed 'obj' and 'lib' from .gitignore.
    - Added testsuite/obj/.gitkeep (which is an empty file).
    - Updated testsuite/Makefile accordingly.
    - Thanks to Vernon Austel for pointing out the .gitkeep trick for tracking
      empty directories in git.

commit 4afe3bfd82c03e1e97b58b7d250588a0d28541e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 9 17:45:39 2013 -0500

    Renamed/moved object scalar constant macros.
    
    Details:
    - Replaced scalar constant macro definitions in bli_const_defs.h with a single,
      simpler macro in bli_obj_macro_defs.h.
    - Updated invocations of old macros accordingly.
    - Removed bli_const_defs.h.

commit 357893f5be5c56ab7b062874005e77e614b23f06
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 9 14:48:15 2013 -0500

    Applied fix from prev commit to gemmtrsm_?_ref_4x4
    
    Details:
    - Fixed hard-coded kernels in bli_gemmtrsm_l_ref_4x4.c and
      bli_gemmtrsm_u_ref_4x4.c.

commit 54988e8dca44475610bcaee5a7bc1c40e8921402
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 8 19:08:43 2013 -0500

    Fixed a performance bug in trsm.
    
    Details:
    - Fixed a bug in the reference implementations of the gemmtrsm wrappers
      (bli_gemmtrsm_l_ref_mxn.c and bli_gemmtrsm_u_ref_mxn.c) whereby the
      reference gemm microkernel was hard-coded, and thus always called, even
      when GEMM_UKERNEL was defined to point to an optimized microkernel. This
      manifested as artificially low trsm performance for all problem sizes, but
      especially for small problem sizes as it only affected blocks of A that
      intersected the diagonal. Thanks to Mike Kistler of IBM for helping me
      find this bug.

commit a7252e40b5c351eef9a1df531ea0ef25cb5fb705
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 8 16:08:22 2013 -0500

    Generate testsuite objects in 'src'.
    
    Details:
    - Tweaked the testsuite makefile so that object files are stored in 'src'
      rather than 'obj', since (a) the top-level .gitignore dictates that
      obj directories are to be ignored, and (b) git has problems
      tracking empty directories. Now, users do not need to create their own
      obj directories within their own local clones of BLIS.

commit 803871c55b60d3c225ad9a0607fa507a9c16aab7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 8 15:18:42 2013 -0500

    Minor formatting changes.

commit a571af816d72727e16cad37007e7043b9d6fa362
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 8 15:00:13 2013 -0500

    Fixed definition of bli_is_packed_object() macro.
    
    Details:
    - Changed the definition of bli_is_packed_object() so that it keys off of the
      value of the pack schema bits in the info field of obj_t, rather than
      comparing the obj_t buffer with that of the mem_t entry. This was the cause
      of a very low probability bug whereby uninitialized memory caused the macro
      to evaluate to TRUE even though the object in question was not packed.
      Thanks to Vernon Austel of IBM for helping discover this bug.
    - Changed an abort() in bli_packm_part() to a not-yet-implemented.

commit 3be14c32f735ecc6169d3ab6370cf8b69162acec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 6 12:54:45 2013 -0500

    Updated information in testsuite output header.
    
    Details:
    - Added to the information that is echoed at the beginning of the test suite's
      output, and also re-labeled some existing information.

commit 874707c1b183a4dd9a91dbfd4ea1522384c190df
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 5 17:19:43 2013 -0500

    Fixed edge case handling bug in herk macrokernels.
    
    Details:
    - Fixed a bug present in bli_herk_l_ker_var2() and bli_herk_u_ker_var2() that
      only manifests when BLIS is configured such that MR != NR. The bug involves
      incorrectly detecting edge cases, which resulted in some parts of matrix C
      potentially being skipped and not updated, depending on the problem size.
    - Updated the default values of MR and NR in config/reference/bli_kernel.h to
      8 and 4, respectively, so that I can better stress the framework on a
      day-to-day basis. (The fact that they were both equal to 4 for so long is
      why I did not stumble upon this bug much sooner.)

commit 7cbda15291d3e01300e71c286b9657b7ef0708bf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 4 15:25:43 2013 -0500

    Added reference microkernels for arbitrary MR, NR.
    
    Details:
    - Added a new set of reference gemm, gemmtrsm, and trsm micro-kernels that
      contain explicit loops over MR and NR, thus allowing them to be used
      unmodified by developers who want to build a reference library with
      custom register blocksizes.
    - Changed config/reference/bli_kernel.h to use above ukernels by default.
    - Changed interfaces of new and existing gemm, gemmtrsm, and trsm micro-kernels
      to use 'restrict' keyword.
    - Added -funroll-loops option to config/reference/make_defs.mk.
    - Updated comments in bli_kernel.h describing constraints on register and
      cache blocksizes.
    - Updated _adds_mxn.h, _copys_mxn.h, and _xpbys_mxn.h macro files so that
      single-char macros are also defined.
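
    A reference micro-kernel with explicit loops over MR and NR might look
    roughly like the sketch below. The function name, argument list, and
    packed layouts are assumptions; only the default values MR = 8 and NR = 4
    are taken from this log.

```c
#include <stddef.h>

#ifndef MR
#define MR 8
#endif
#ifndef NR
#define NR 4
#endif

/* Hypothetical sketch: C := beta * C + alpha * A * B, where
   - a holds an MR-by-k micro-panel stored by columns (column stride MR),
   - b holds a k-by-NR micro-panel stored by rows (row stride NR),
   - c is MR-by-NR with row stride rs_c and column stride cs_c. */
void gemm_ref_mxn_sketch( size_t k, double alpha,
                          const double* restrict a,
                          const double* restrict b,
                          double beta,
                          double* restrict c, size_t rs_c, size_t cs_c )
{
    double ab[ MR * NR ] = { 0.0 };

    /* Accumulate the rank-k product into a local MR-by-NR buffer. */
    for ( size_t l = 0; l < k; ++l )
        for ( size_t j = 0; j < NR; ++j )
            for ( size_t i = 0; i < MR; ++i )
                ab[ i + j*MR ] += a[ l*MR + i ] * b[ l*NR + j ];

    /* Scale and add the result into C. */
    for ( size_t j = 0; j < NR; ++j )
        for ( size_t i = 0; i < MR; ++i )
            c[ i*rs_c + j*cs_c ] = beta * c[ i*rs_c + j*cs_c ]
                                 + alpha * ab[ i + j*MR ];
}
```

    Because the loop bounds are compile-time constants, an option such as
    -funroll-loops gives the compiler a chance to unroll them for whatever
    register blocksizes are chosen.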

commit 6684b73d5501f91d24a79e26655a42819c9b3114
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 2 13:06:20 2013 -0500

    Implemented amax operation and related changes.
    
    Details:
    - Implemented amax operation in BLIS.
    - Activated BLAS2BLIS routine mapping for new amax BLIS implementation.
    - Added integer support to [f]printv, [f]printm.
    - Added integer support to level-0 copys macros.
    - Updated printing of configuration information in test suite driver.
    - Comment changes to _config.h files.
    - Added comments to bla_dot.c to remind the reader what sdsdot()/dsdot()
      are used for.
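
    For readers unfamiliar with amax, the operation finds the index of the
    element with the largest absolute value. A minimal real-valued sketch
    follows; amaxv_sketch() is a hypothetical name, and the zero-based
    result and first-occurrence tie-breaking are assumptions of this sketch
    (the BLAS i?amax routines return a one-based index).

```c
#include <stddef.h>

/* Hypothetical sketch: return the zero-based index of the first element
   of x (stride incx) with the largest absolute value. Returns 0 if n is
   zero. */
size_t amaxv_sketch( size_t n, const double* x, size_t incx )
{
    size_t i_max   = 0;
    double abs_max = -1.0;

    for ( size_t i = 0; i < n; ++i )
    {
        double v     = x[ i*incx ];
        double abs_v = ( v < 0.0 ) ? -v : v;

        /* Strict comparison keeps the first occurrence on ties. */
        if ( abs_v > abs_max )
        {
            abs_max = abs_v;
            i_max   = i;
        }
    }

    return i_max;
}
```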

commit fb68087f8727cd5fd656a742a110e54fb1c91db9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 26 15:10:16 2013 -0500

    More memory alignment-related tweaks.
    
    Details:
    - Renamed BLIS_MEMORY_ALIGNMENT_SIZE to BLIS_CONTIG_MEM_ALIGN_SIZE.
    - Renamed BLIS_ENABLE_MEMORY_ALIGNMENT to BLIS_ENABLE_SYSTEM_MEM_ALIGN.
    - Added BLIS_SYSTEM_MEM_ALIGN_SIZE, which controls only the alignment
      passed into posix_memalign() or equivalent.
    - Defined new function, bli_align_dim_to_cmem(), which applies the
      contiguous memory alignment (rather than the system/malloc alignment).

commit 9682ef61dbf9a8846c8b0826d4de24bc216cd641
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 26 14:14:53 2013 -0500

    Always define memory alignment size cpp constant.
    
    Details:
    - Removed guard around #define for memory alignment size constant.
      Memory alignment should always be enabled, and so this value should
      always be defined.

commit 3a787cccaae16531474f34398e3c0cf4f49b8cd8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 26 13:59:19 2013 -0500

    Renamed memory alignment macro constant.
    
    Details:
    - Renamed all occurrences of BLIS_MEMORY_ALIGNMENT_BOUNDARY to
      BLIS_MEMORY_ALIGNMENT_SIZE.

commit 37308f9a502b56d94fa52a7df71c676a46c3be3d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 26 12:43:14 2013 -0500

    Align packed panel strides with system alignment.
    
    Details:
    - Pass panel strides through bli_align_dim_to_sys() to ensure that each
      subsequent packed panel of A and B begins at an aligned address. (The
      first panel is presumably aligned to system alignment because it is
      aligned to a page boundary, which is typically much larger.)
    - Rearranged code in packm_init_pack() to prevent additional conditional
      blocks as a result of the aforementioned change.
    - Adjusted contiguous memory allocator so that the system memory alignment
      is used to allocate enough space for each block no matter what kind of
      register blocking is used (even if register blocksize is unit and every
      row/column needs maximal padding).
    - Adjusted default blocksizes in reference configuration so that MC*KC
      and KC*NC result in identical footprints for all datatypes.

commit 40a0654ada5f256beb3da80ebba015a3c71fb61f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Mar 24 20:18:12 2013 -0500

    CHANGELOG update.

commit b65cdc57d9e51fa00e3c03539cfb7e045707d0f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Mar 24 20:01:49 2013 -0500

    Migrated 'bl2' prefix to 'bli'.
    
    Details:
    - Changed all filename and function prefixes from 'bl2' to 'bli'.
    - Changed the "blis2.h" header filename to "blis.h" and changed all
      corresponding #include statements accordingly.
    - Fixed incorrect association for Fran in CREDITS file.

commit 132bffcef7441f32d02cc7485aef6a0648e0ef1e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Mar 24 18:49:36 2013 -0500

    Removed several 'old' directories and files.
    
    Details:
    - Removed most of the 'old' directories scattered throughout the framework,
      which includes alternate/half-baked/broken implementations.

commit 551ea4767a3ea6c263f12aaca94bc2642cee4cfa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sun Mar 24 18:00:10 2013 -0500

    Removed #include "blis2.h" from low-level headers.
    
    Details:
    - Removed #include of "blis2.h" from various lower-level, operation-specific
      header files throughout the framework. Given that these low-level headers
      are included within blis2.h in a very specific order, #include'ing blis2.h
      within them directly is unnecessary.

commit bc7b318ed0960edeb4537797dd8c91de0d942ca9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 22 17:18:58 2013 -0500

    Added cpp guards to conflicting libflame typedefs.
    
    Details:
    - Added cpp guards around the definitions of dim_t, scomplex, and dcomplex.
      This is a temporary hack to allow interoperability with libflame. (Similarly
      temporary changes are being made to libflame's type definitions file.)

commit f469907503fcdc24dff0174c569170e6e756e045
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 22 15:20:15 2013 -0500

    Renamed MAX_PREFETCH_BYTE_OFFSET to MAX_PRELOAD_.
    
    Details:
    - Renamed BLIS_MAX_PREFETCH_BYTE_OFFSET to
      BLIS_MAX_PRELOAD_BYTE_OFFSET since "prefetch" is kind of a loaded word
      (e.g. "prefetch" instructions, which are different than the particular
      kind of prefetching/preloading referred to by this constant).

commit d1023bfbc6668a58a01ee4f82ded2319911e7b19
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 22 15:09:59 2013 -0500

    Removed build/old directory.

commit 718888849c48d99f83eea6b8f83bc1998cffef7e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 22 15:07:01 2013 -0500

    Deprecated 'flame' configuration.
    
    Details:
    - Removed 'flame' configuration, as it was horribly out-of-date.
    - Comment changes to bl2_blocksize.c and bl2_mem.c.

commit bba38cf4e9d28058c14483f44fa074a6d2852ad9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 19 18:07:40 2013 -0500

    Added missing conjbeta argument to scald.

commit 1f82b51d06d0279dded3f2b87ba59403f3ed0af6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 18 15:37:20 2013 -0500

    Relocated packed mem_t dimension fields to obj_t.
    
    Details:
    - Removed the m and n (and elem_size) fields from the mem_t object, and added
      m_packed and n_packed fields to obj_t. These new fields track the same
      values as the old ones. From an abstraction standpoint, it seemed awkward
      to store those dimensions inside the mem_t.
    - Updated interfaces to bl2_mem_acquire_*() so that only a byte size argument
      is passed in, instead of m, n, and elem_size.
    - Updated bl2_packm_init_pack() and bl2_packv_init_pack() to inline the
      functionality of bl2_mem_alloc_update_m() and bl2_mem_alloc_update_v(),
      respectively.
    - Updated packm variants to access the packed length and width fields from
      their new locations.

commit 36c782857bf9b8ac1b1dac47a70f689a4407e2cc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 18 10:37:03 2013 -0500

    CHANGELOG update.

commit e7d41229d3b1674e74f47d7f29fae004a745201a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 15 17:12:36 2013 -0500

    Re-implemented contiguous memory allocator.
    
    Details:
    - Completely re-wrote the contiguous memory allocator (bl2_mem.c). The new
      allocator instantiates and initializes three separate memory pool objects,
      each one associated with a separate array of contiguous memory blocks, each
      block of fixed and uniform size. (The three pools are for allocating mc-by-kc
      blocks of A, kc-by-nc panels of B, and mc-by-nc panels of C.) The pool
      objects use a stack structure internally to track which blocks in the region
      have been "checked out" to a thread and which are still available. Critical
      regions are now clearly marked and adaptable to parallel environments (e.g.
      OpenMP). Memory pools are set up when bl2_init() is called.
    - Added a new field to the packm control tree node, which indicates what kind
      of packed buffer is being allocated. The enumerated type for this argument
      is defined as packbuf_t in bl2_type_defs.h.
    - Updated level-3 _cntl.c files to pass in the appropriate value for a new
      packbuf_t argument to bl2_packm_cntl_obj_create().
    - Moved some macros called by packm_init_pack() from bl2_obj_macro_defs.h to
      bl2_mem_macro_defs.h.
    - Added BLIS_MAX_NUM_THREADS to bl2_config.h, which we use as the default
      number of blocks of A reserved for the memory allocator.
    - Deprecated bl2_align_dim(). Replaced usage with that of
      bl2_align_dim_to_mult(). Turns out that typically we don't need to align
      a dimension to the system alignment, since that value has to do with
      starting addresses, whereas the values we are dealing with are unitless
      dimensions.
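
    The stack-based pool described above can be sketched as follows. This is
    a single-threaded illustration with hypothetical names (pool_sketch_t,
    pool_checkout(), and so on), not the code in bl2_mem.c; in a threaded
    build the checkout and release steps would be critical regions.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_MAX_BLOCKS 8

/* Hypothetical pool: a stack of pointers to fixed-size blocks. */
typedef struct
{
    void*  block_ptrs[ POOL_MAX_BLOCKS ]; /* stack of available blocks */
    size_t top;                           /* number of available blocks */
} pool_sketch_t;

/* Carve 'region' into num_blocks blocks of block_size bytes each. */
void pool_init( pool_sketch_t* pool, char* region,
                size_t block_size, size_t num_blocks )
{
    assert( num_blocks <= POOL_MAX_BLOCKS );

    for ( size_t i = 0; i < num_blocks; ++i )
        pool->block_ptrs[ i ] = region + i * block_size;

    pool->top = num_blocks;
}

/* Check out a block; returns NULL if the pool is exhausted. */
void* pool_checkout( pool_sketch_t* pool )
{
    if ( pool->top == 0 ) return NULL;

    return pool->block_ptrs[ --pool->top ];
}

/* Return a previously checked-out block to the pool. */
void pool_release( pool_sketch_t* pool, void* block )
{
    assert( pool->top < POOL_MAX_BLOCKS );

    pool->block_ptrs[ pool->top++ ] = block;
}
```

    One such pool would exist per buffer kind (blocks of A, panels of B,
    panels of C), each backed by its own contiguous region.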

commit 1e76cae00cb0a04544aaae1ade878686b238d283
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 15 12:21:42 2013 -0500

    Perform her2k var1 loops in sequence.
    
    Details:
    - Changed variant 1 of her2k so that the two rank-k products are computed
      and accumulated in sequence rather than fused into one loop. This is
      necessary if BLIS is to be configured to provide only enough contiguous
      memory for one panel of B.

commit c95c270eba91ae4efc26603beddfd0292caa919b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 7 14:42:15 2013 -0600

    Enhanced tracking of dimensions for mem_t objects.
    
    Details:
    - Added new fields to mem_t struct definition to track the allocated (as
      opposed to the currently used) dimensions of the memory region. This
      allows packm_init() to be more robust in situations where memory is
      already allocated but is more than needed for the current packing job.
    - Updated logic in bl2_obj_set_buffer_with_cached_packm_mem() macro, used
      in packm_init(), to update the "currently used" dimensions of the mem_t
      object if the requested dimensions are smaller than the allocated
      dimensions.

commit e99281a0f41d482fddeffa239bfc8e13e6d13d4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 7 14:00:10 2013 -0600

    Fixed test suite flop formulas for ops with side.
    
    Details:
    - Fixed incorrect flop counts in test suite modules for hemm, symm, trmm,
      trmm3, and trsm.
    - Comment updates in herk macro-kernels.
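
    For operations with a side parameter, the flop count depends on which
    dimension the structured matrix takes. The sketch below uses the
    conventional real-domain counts; the function and type names are
    hypothetical, and the test suite's own formulas may differ in detail
    (for example, in how complex datatypes are scaled).

```c
typedef enum { SIDE_LEFT_SK, SIDE_RIGHT_SK } side_sketch_t;

/* hemm/symm: C is m-by-n; A is m-by-m (left) or n-by-n (right). */
double flops_hemm_sketch( side_sketch_t side, double m, double n )
{
    return ( side == SIDE_LEFT_SK ) ? 2.0 * m * m * n
                                    : 2.0 * m * n * n;
}

/* trmm/trsm: B is m-by-n; triangular A is m-by-m (left) or n-by-n
   (right), so roughly half the work of the dense case. */
double flops_trmm_sketch( side_sketch_t side, double m, double n )
{
    return ( side == SIDE_LEFT_SK ) ? m * m * n
                                    : m * n * n;
}
```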

commit ef8cbfc44dd620fdcbdb51cdb173217194bebe31
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Mar 2 12:47:06 2013 -0600

    Added "version" to .gitignore.
    
    Details:
    - Added "version" to .gitignore file so that the file does not show up when
      running 'git status', or accidentally get pulled into the index when
      running 'git add' or 'git add --all'.

commit e9e0747c2f6c178f53ac46ab794acbb7b8c4fea8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Mar 2 12:43:54 2013 -0600

    Removed version file from version control.
    
    Details:
    - Removed version file from version control to prevent git errors that occur
      when trying to pull new commits.

commit bb612f864e9c17dd9805e9446840f02259619469
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 1 12:55:42 2013 -0600

    Updated behavior of bl2_obj_induce_trans() macro.
    
    Details:
    - Changed bl2_obj_induce_trans() so that the transposition bit is no longer
      updated as part of the macro. All current uses of the macro have been
      coupled with instances of bl2_obj_set_trans() to clear the bit.
    - Added Jed to CREDITS file.

commit f24e29b789e7314764a818ceb3063126936c986f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 22 18:15:41 2013 -0600

    Replaced banded/packed BLAS2 stubs with f2c code.
    
    Details:
    - Retired the blas2blis wrappers that simply called abort with a "not yet
      implemented" message. This includes all of the level-2 banded and packed
      routines.
    - Replaced the aforementioned with the corresponding netlib implementations
      having been run through f2c (with some customization).
    - Added directories named 'attic' to build/gen-make-frags/ignore_list.

commit 1454c1a14207766dfed372b8e38b47fa384f5198
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 22 12:38:45 2013 -0600

    Moved Fortran name-mangling macro to bl2_config.h.
    
    Details:
    - Moved the Fortran-77 name-mangling macros from bl2_blas_macro_defs.h to the
      configuration directory (bl2_config.h, specifically) given that it can be
      expected to be tweaked by some developers.
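
    A name-mangling macro of this kind might look like the sketch below,
    which assumes the common "lowercase name plus trailing underscore"
    Fortran-77 convention. PASTEF77_SKETCH and the routine name scalx are
    hypothetical and chosen so as not to collide with a real BLAS symbol.

```c
/* Hypothetical mangling macro: type character + name + underscore. */
#define PASTEF77_SKETCH(ch,name) ch ## name ## _

/* Expands to a function named dscalx_(), callable from Fortran-77 as
   DSCALX under the assumed convention. Arguments are passed by address,
   as Fortran expects. */
void PASTEF77_SKETCH(d,scalx)( const int* n, const double* alpha,
                               double* x, const int* incx )
{
    for ( int i = 0; i < *n; ++i )
        x[ i * (*incx) ] *= *alpha;
}
```

    A developer targeting a compiler with a different convention (no
    underscore, or uppercase names) would only need to change the macro.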

commit ede75693e5a36c6006087c4a7df834175b604504
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 22 12:11:24 2013 -0600

    Implemented blas2blis compatibility layer.
    
    Details:
    - Added the blas2blis compatibility layer, located in frame/compat. This
      includes virtually all of the BLAS, including banded and packed level-2
      operations.
    
    - Defined bl2_init_safe(), bl2_finalize_safe(). The former allows a conditional
      initialization, which stores the "exit status" in an err_t, which is then
      read by the latter function to determine whether finalization should actually
      take place.
    - Added calls to bl2_init_safe(), bl2_finalize_safe() to all level-2 and
      level-3 BLAS-like wrappers.
    - Added configuration option to instruct BLIS to remain initialized whenever
      it automatically initializes itself (via bl2_init_safe()), until/unless the
      application code explicitly calls bl2_finalize().
    
    - Added INSERT_GENTFUNC* and INSERT_GENTPROT* macros to facilitate type
      templatization of blas2blis wrappers.
    - Defined level-0 scalar macro bl2_??swaps().
    - Defined level-1v operation bl2_swapv().
    - Defined some "Fortran" types to bl2_type_defs.h for use with BLAS
      wrappers.
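
    Illustration (not part of the original commit): the conditional
    initialization described above can be sketched as follows. Every name
    here (sketch_err_t, sketch_init_safe(), and so on) is a hypothetical
    stand-in, and the logic is an assumption about the general pattern
    rather than the actual bl2_init_safe() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes standing in for an err_t value. */
typedef enum { SKETCH_SUCCESS = 0, SKETCH_ALREADY_INIT = 1 } sketch_err_t;

static bool sketch_is_init   = false;
static bool sketch_stay_init = false; /* models the "remain initialized" option */

void sketch_init(void)             { sketch_is_init = true; }
void sketch_finalize(void)         { sketch_is_init = false; }
bool sketch_initialized(void)      { return sketch_is_init; }
void sketch_set_stay_init(bool on) { sketch_stay_init = on; }

/* Initialize only if needed, and record whether this call did the work. */
void sketch_init_safe(sketch_err_t *status)
{
    assert(status != NULL);
    if (sketch_is_init) {
        *status = SKETCH_ALREADY_INIT;
    } else {
        sketch_init();
        *status = SKETCH_SUCCESS;
    }
}

/* Finalize only if the matching sketch_init_safe() call initialized the
   library and the "remain initialized" option is off. */
void sketch_finalize_safe(sketch_err_t status)
{
    if (status == SKETCH_SUCCESS && !sketch_stay_init) {
        sketch_finalize();
    }
}
```

    A wrapper would call sketch_init_safe() on entry and pass the recorded
    status to sketch_finalize_safe() on exit, so an application that has
    already initialized the library is left undisturbed.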

commit 995edf43e21c1868732dbdd7fee14b08730218bd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 21 14:30:50 2013 -0600

    Updated version file. (Forgot to in prev commit).

commit e823b08aaf7b65ecc6ddc30570709ea8a4b52aa7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 21 12:00:17 2013 -0600

    Fixed some scalar types in BLAS-like Herm APIs.
    
    Details:
    - Some of the scalars of Hermitian operations, such as alpha in her,
      alpha and beta in herk, and beta in her2k, need to be real. These
      arguments were typed incorrectly as the complex types. This has been
      fixed. Note the issue was only present in the BLAS-like APIs for
      these operations (not the native object-based interfaces).

commit 5ece050a669e74ba4a711d1d4669239d22d45642
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 20 15:50:54 2013 -0600

    Updated version file. (Forgot to in prev commit).

commit f243034b8b430d4684680ea8eddfd246e73fefc0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 20 14:11:36 2013 -0600

    Changed API of packm_init_pack() to use blksz_t.
    
    Details:
    - Changed the interface of packm_init_pack() so that mult_m and mult_n
      are passed in as type blksz_t* instead of dim_t.
    - Made a similar change to packv_init_pack().

commit da0c22f24107be9f33e0ea2dae52e5534b1fd0e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Feb 15 09:59:48 2013 -0600

    Minor changes to lower levels of scalm and setm.
    
    Details:
    - Removed diagx parameter from lower-level interfaces of scalm.
    - Modified scalm_basic_check() to expect an object with a nonunit diagonal.
    - Changed setm_unb_var1() so that having an implicit unit diagonal results
      in only the strictly lower or upper triangle of the matrix being modified.

commit 2c836adadcd2a7d7f217033ac4d7fcad03d5bd55
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 14 10:42:56 2013 -0600

    Updated beta == zero semantics of mulsc.
    
    Details:
    - Updated beta == zero semantics of mulsc. Hopefully this is the last
      operation that needed updating.
    - Added Devin to CREDITS file.

commit 722b66c7dcaaaa1b109e7c8b1d53fd71a9af8240
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 14 10:18:00 2013 -0600

    Removed some calls to setv() in test modules.
    
    Details:
    - Removed calls to setv() in test modules whose sole purpose was to
      initialize vectors to zero to ensure that nan's and inf's would not
      taint the computation. Now that beta == zero semantics have been
      updated to clear the output operand (when beta is zero), rather than
      multiply against it, these setv() calls are no longer needed.

commit e6ac623a902f776c42f85eadbf76996d9770a0db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 13 18:44:59 2013 -0600

    Properly implemented beta == 0 semantics.
    
    Details:
    - Changed name of set0 and set0_mxn macros to set0s and set0s_mxn,
      respectively.
    - Added code to the following operations that sets the output operand to
      zero if the corresponding scalar is zero (rather than performing the
      floating-point multiply, or in the case of setv, copying the value).
      This will prevent nan's and inf's from creeping into results from
      uninitialized memory.
      - axpy
      - dotxv
      - scalv
      - scal2v
      - setv
      - gemv
      - ger
      - hemv
      - her
      - her2
      - gemm reference ukernels
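
    Illustration (not part of the original commit): the reason for the
    special case is that 0.0 * NaN is NaN and 0.0 * Inf is NaN under IEEE
    arithmetic, so multiplying by a zero scalar does not clear uninitialized
    memory. The helper scalv_sketch() below is hypothetical and only models
    the idea for a contiguous real vector.

```c
#include <assert.h>
#include <math.h>   /* NAN and INFINITY, used when exercising the helper */
#include <stddef.h>

/* Hypothetical helper: y := beta * y for a contiguous vector of length n. */
void scalv_sketch(size_t n, double beta, double *y)
{
    assert(n == 0 || y != NULL);

    if (beta == 0.0) {
        /* Overwrite rather than multiply, so that NaN or Inf values
           already present in y cannot survive. */
        for (size_t i = 0; i < n; ++i) {
            y[i] = 0.0;
        }
    } else {
        for (size_t i = 0; i < n; ++i) {
            y[i] *= beta;
        }
    }
}
```

    Calling scalv_sketch() with beta equal to zero on a vector that holds
    NAN or INFINITY leaves every element exactly zero, whereas the plain
    multiply would have produced NaN in those positions.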

commit aedccbc85d491e41711a0c6eb0d246d8700a199a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 13 18:29:53 2013 -0600

    Fixed stale interface to packm_unb_var1().
    
    Details:
    - Removed the control tree from the interface to packm_unb_var1(), which
      I meant to do when it was un-deprecated.

commit c23135669f7a8a545e2e11ef559bf284be8bc65c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 13 13:21:00 2013 -0600

    Un-deprecated packm_unb_var1.c (needed by l2 ops).
    
    Details:
    - Added bl2_packm_unb_var1() back into the mix once I realized that level-2
      operations still need this routine for packing matrices. Now, whether
      level-2 operations should be packing matrices to begin with is another
      matter. But this fixes the segmentation fault one would have gotten when
      running bl2_gemv() on a general stride matrix.

commit cf49e35f9819f9d93ebdca4703ade5abab28f6f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 12 18:39:35 2013 -0600

    Removed cntl tree usage from packm implementation.
    
    Details:
    - Added new fields to obj_t info field:
      - invert_diag
      - pack_order_if_upper
      - pack_order_if_lower
      These fields allow packm_init() to embed information that begins
      in the control tree into the object so that the packm implementation
      does not need to use control trees at all. This is being done to aid
      Bryan's DxT code generation.
    - Added macros that operate on above fields.
    - Changed packm_init(), packm_blk_var2(), and packm_blk_var3() according
      to above changes.
    - Made similar (but much simpler) changes to packv.
    - Deprecated packm_blk_var1(), packm_unb_var1(), and packm_densify().
      These were part of prototype implementations and are no longer needed.

commit eb139ae256651af7820b93ef982626180195b87f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 12 12:39:30 2013 -0600

    Replaced bl2_abs() with _fabs() where appropriate.

commit 474bac30c99928f9e87315972bcb45c632c0b7ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 12 12:23:48 2013 -0600

    Removed level-0 macros projrs, grabis.
    
    Details:
    - Replaced instances of projrs and grabis macros with newer,
      more general-purpose getris.

commit 03a260a457c8964e4603a655cee0d40ac17affba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 12 11:45:34 2013 -0600

    Restored executable permissions to scripts.
    
    Details:
    - Restored executable (0755) permissions to scripts that were touched by
      the recursive sed script that updated the copyright headers in the
      previous commit.

commit 1274e1243775e5e705114257a43176f63635227f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 11 14:37:47 2013 -0600

    Updated copyright headers from 2012 to 2013.

commit 3b620cc8e90c53c79129bd9dd89ae6b77c2446f1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 11 13:38:07 2013 -0600

    CHANGELOG update.

commit 768fcebaa8be0eb936a6e7a02cd8a19438c79d99
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Feb 11 13:20:44 2013 -0600

    Added unified test suite, and many fixes.
    
    Details:
    - Added a highly configurable, unified test suite.
    
    - Removed DUPB configuration constant from bl2_kernel.h and macro-kernel
      header files. Now, instead, DUPB is computed as (NDUP != 1) within each
      macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into
      incorrectly when DUPB was set to FALSE but the NDUP was still non-unit.
      By encoding both pieces of information into one constant in _kernel.h,
      it seems somewhat less likely others will encounter this bug in the
      future.
    - Added level-2 cache blocksizes to _kernel.h for reference configuration,
      and defined blocksizes in _cntl.c files to these default values.
    
    - Changed semantics of her2k and syr2k such that these operations no longer
      expect the B matrix to already be conjugate-transposed (or just transposed
      for syr2k). However, these semantics are preserved for the internal
      mechanics of the implementations, including the internal back-end and all
      blocked variants.
    - Inserted checks for real-valued alpha and beta for herk/her2k and herk,
      respectively.
    
    - Relaxed general object structure constraints in _basic_check() for gemv, ger.
    - Changed her front-end to NOT copy-cast to real projection; instead, this is
      replaced by selecting either the real part or both parts within the unblocked
      algorithm implementation, depending on the value of conjh.
    - Added conjh to all _check routines for her so that the code knows when to
      verify that alpha has an imaginary component equal to zero (for her, but
      not syr).
    - Changed control tree for her to forgo packing.
    
    - Added unit diagonal support to fnormm.
    - Redefined real versions of abval2s macros in terms of fabs(), fabsf().
    - Redefined complex versions of sqrt2s macros using the actual "complex square
      root" formula.
    - Created new level-0 object-based routines, suffixed with "sc" (for "scalar").
    - Defined new level-1v, -1d, and -1m versions of add and sub operations
      (two-operand add and subtract).
    - Added new scalar macros:
      - getris: acquire real and imaginary components.
      - setris: set real and imaginary components.
      - addjs: addition with conjugated x.
      - subjs: subtraction with conjugated x.
    - Defined new utility operations:
      - absumv: element-wise sum of absolute values for vector elements.
      - absumm: element-wise sum of absolute values for matrix elements.
      - mkherm: convert existing matrix to Hermitian.
      - mksymm: convert existing matrix to symmetric.
      - mktrim: convert existing matrix to triangular.
    
    - Added various error checking routines.
    - Added bl2_clock_min_diff(), which is used to more cleanly measure the
      wall clock time of a code block.
    - Added general stride support to bl2_obj_alloc_buffer().
    - Added bl2_obj_init_scalar().
    - Updated parameter mapping in bl2_param_map.c.
    - Added support for queryable version string.
    
    - Fixed a bug in the her2k macro-kernels (which currently are simply
      implemented in terms of two invocations of herk) whereby beta was being
      applied to both the first and second rank-k updates, rather than only
      the first.
    - Fixed a bug in trmm/trsm whereby transpose and right side cases were not
      properly implemented due to erroneous assumptions regarding aliasing and
      root objects.
    - Fixed a bug in the upper triangular trsm macro-kernel in which the wrong
      MR x NR block of B was being updated.
    - Fixed a bug in the inverts macro in the double real case whereby the
      value was typecast to float before inversion. This affected non-unit cases
      of dtrsm.
    - Fixed a bug in the reference kernels for gemmtrsm whereby the minus one
      constant was being applied incorrectly.
    - Fixed a bug in the overall treatment of non-unit alpha for trsm. The code
      now mimics the rank-k strategy of gemm, whereby alpha is applied during
      the first iteration of variant 3, with BLIS_ONE passed in instead for
      subsequent iterations. This also required passing alpha into the macro-
      kernels as well as the fused gemmtrsm micro-kernels.
    - Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being
      called for blocks strictly above the diagonal. While this sounds good in
      theory, this cannot be done because gemm_ker_var2 expects row panels of
      A to be packed from top to bottom, while for trsm_u, A is actually packed
      from bottom to top due to the reverse (BR->TL) nature of the algorithm.
    - Fixed a bug in packm_cxk() whereby panel packings with unit panel
      dimensions were mishandled due to incorrect arguments to the copyv kernel.
      Also changed the copyv kernel invocation to scal2v so that these edge
      cases are properly handled when scaling is requested.
    - Fixed a bug in packv_int() whereby an uninitialized object is passed in
      instead of the source object.
    - Fixed a bug whereby level-2 code could allocate memory dynamically via
      bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed
      a potential future bug whereby a mem_t object that is actually no longer
      "allocated" from the static pool is mistaken for being allocated due to
      failure to NULLify the buffer when the block was most recently released.
    - Fixed a bug in bl2_acquire_mpart_*() whereby the uplo field was mistakenly
      toggled when the requested subpartition needed to be "reflected" due to it
      residing in an unstored region.

commit be94fb84c0351602d7585269f29998e3bf83f899
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jan 4 10:55:21 2013 -0600

    Added missing 'd' to fused gemmtrsm function name.

commit 879a179e1dee36f0c56765f2ab91a26861019b34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Jan 4 10:37:27 2013 -0600

    Added debug statements to bl2_mm_acquire_m().
    
    Details:
    - Added printf() statements to bl2_mm_acquire_m() to help debug issues
      with prematurely exhausted memory pool.
    - Removed 'd' from kernel names of reference kernels in clarksville
      configuration's bl2_kernel.h

commit 806e74beb4eafeef620a555ffbb3f6779e29c7b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 20 17:07:50 2012 -0600

    Defined Frobenius norm operations.
    
    Details:
    - Added level-0 grabis macro operation to grab imaginary component of one
      variable and copy it to the real component of another variable.
    - Defined sumsqv operation, which computes the sum of the absolute squares
      of the elements of a vector. This implementation is modeled after ?lassq
      in netlib LAPACK.
    - Defined fnormv and fnormm operations, which compute the Frobenius norm on
      vectors and matrices, respectively. These operations are treated as one-
      operand operations where the output norm value is the real projection of
      the datatype of the input operand. Both operations are implemented in terms
      of sumsqv.
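
    Illustration (not part of the original commit): the ?lassq approach
    keeps a running pair (scale, sumsq) such that scale^2 * sumsq equals the
    sum of squares seen so far, which avoids overflow and underflow when
    squaring. The helper sumsqv_sketch() below is a hypothetical real-valued
    sketch of that recurrence, not the BLIS sumsqv code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: updates (*scale, *sumsq) so that, on return,
   (*scale)^2 * (*sumsq) equals its value on entry plus the sum of x[i]^2.
   Start from *scale = 0.0 and *sumsq = 1.0 for a fresh sum. */
void sumsqv_sketch(size_t n, const double *x, double *scale, double *sumsq)
{
    assert(scale != NULL && sumsq != NULL);
    assert(n == 0 || x != NULL);

    for (size_t i = 0; i < n; ++i) {
        double absx = x[i] < 0.0 ? -x[i] : x[i];

        if (absx == 0.0) {
            continue; /* zeros contribute nothing */
        }
        if (*scale < absx) {
            /* New largest magnitude: rescale the accumulated sum to it. */
            double r = *scale / absx;
            *sumsq = 1.0 + *sumsq * r * r;
            *scale = absx;
        } else {
            double r = absx / *scale;
            *sumsq += r * r;
        }
    }
}
```

    The Frobenius norm then follows as scale * sqrt(sumsq). For the vector
    (3, 4) the recurrence ends with scale = 4 and sumsq = 1.5625, giving a
    norm of 5.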

commit 66e80ce1aec099b2b2b0c4f295e38add2c921383
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 20 17:02:55 2012 -0600

    Added GENT*R macros; tweaked bl2_machval defs.
    
    Details:
    - Added function and prototype macro-generating macros for GENTFUNCR and
      GENTPROTR, which are one-operand macros with auxiliary real projection
      types.
    - Tweaked bl2_machval files to use new macros.

commit 2fecc88ca22142020573f168da715e8e9f3dd7de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 20 11:35:14 2012 -0600

    Fixed harmless macro bug in level-1m operations.
    
    Details:
    - Fixed some inconsistent usage of n_iter_max and n_iter in the two
      bl2_set_dims_incs_uplo_[12]m macros. The right thing ended up happening
      despite the bug, which is why I had not discovered it until now.

commit 8945db6ec9f82168cf72411ad408b4fdb44ae0d1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 18 15:07:36 2012 -0600

    Renamed x86,x86_64 kernels to indicate 'd' fusing.
    
    Details:
    - Renamed x86 and x86_64 kernels to contain a 'd' before the fusing shape
      to emphasize that the fusing shape is not for all datatype instances, but
      rather just for one (that of double-precision real). Other fusing shapes
      would be proportional to their precision and domain "byte footprints".
    - Corresponding changes to config/clarksville/bl2_kernel.h.

commit 6fbbdd4e194d06096ad08c5db61127be338067db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 18 14:34:02 2012 -0600

    More tweaks to _config.h, _kernel.h; smem tweaks.
    
    Details:
    - Moved kernel-related definitions from bl2_config.h to bl2_kernel.h.
    - Replaced #define of _GNU_SOURCE with #define of _POSIX_C_SOURCE. This
      accomplishes the same thing (enabling posix_memalign()) without enabling
      all of the GNU extensions we don't need.
    - Defined the size of the static memory pool in terms of MC, KC, and NC,
      as well as two new constants that determine how many MCxKC blocks and
      how many KCxNC blocks should be allocated (defined in bl2_config.h).
    - In the case of static memory pool exhaustion, replaced the generic
      bl2_abort() with a specific error code call.
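
    Illustration (not part of the original commit): a feature-test macro
    such as _POSIX_C_SOURCE has to be defined before the first system header
    is included in order to expose posix_memalign(). The value 200112L and
    the helper names below are assumptions made for this sketch; the commit
    does not state which value BLIS uses.

```c
/* Define the feature-test macro before any #include. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns size bytes aligned to align, or NULL.
   posix_memalign() requires align to be a power of two and a multiple of
   sizeof(void *); it returns 0 on success. */
void *aligned_alloc_sketch(size_t align, size_t size)
{
    void *p = NULL;

    if (posix_memalign(&p, align, size) != 0) {
        return NULL;
    }
    return p;
}

/* Returns nonzero if p is a multiple of align. */
int is_aligned_sketch(const void *p, size_t align)
{
    assert(align != 0);
    return ((uintptr_t)p % align) == 0;
}
```

    Memory obtained this way is released with the ordinary free().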

commit 5d8bdb21c48e8fb11bef6128a242122cc1470a99
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 17 16:07:36 2012 -0600

    Minor reordering of bl2_config.h definitions.

commit 4a83f67490136a898f558e273b76a687aed8b893
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 17 12:35:54 2012 -0600

    Consolidated configuration headers.
    
    Details:
    - Merged contents of bl2_arch.h into bl2_config.h for reference and
      clarksville configurations.
    - Updated CREDITS, INSTALL, LICENSE, README files.

commit 0670c33cc14612f636ef09ede4133404ae0af6ba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Dec 14 12:45:26 2012 -0600

    Fixed bug in reference gemm ukernels.
    
    Details:
    - Fixed a bug whereby, for the reference gemm ukernels, the matrix product
      was not correctly accumulated and scaled (by alpha) into the output matrix
      C. (Thanks to Fran for finding this bug.)
    - Whitespace changes to reference trsm kernels.

commit e2e7cb2fbe615be4d375bc2dce88d03d98fadc9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 13 18:17:54 2012 -0600

    Expanded reference packm/unpackm kernel set to 16.
    
    Details:
    - Added 10xk, 12xk, 14xk, and 16xk reference kernels for packm and
      unpackm.
    - Updated bl2_[un]packm_cxk() to silently use scal2m if "out of range"
      kernel size is requested. (Thanks to Tyler for finding this bug.)
    - Updated bl2_kernel.h to contain new _KERNEL definitions, according
      to above changes, for 'reference' and 'clarksville' configurations.
    - Updated CHANGELOG.
    - Removed "output*.m" from .gitignore.

commit 17455a8bce038dd570356ab0c5c11d9a89f20248
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 10 17:23:32 2012 -0600

    Minor updates towards 0.0.1.

commit 7ad4ebef38b8e6eea9b6091844ba7294ec870271
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 10 16:18:40 2012 -0600

    Tweaks to get BLIS compiling again on clarksville.
    
    Details:
    - Updated header files and make_defs.mk in config/clarksville.
    - Fixes to bl2_mem.c (now that SMEM_M, SMEM_N are gone).
    - Moved definition of blksz_t from bl2_cntl.h to bl2_type_defs.h.
    - Shuffled include statements in blis2.h.

commit cc58ea86010b1f046134d13b546c878389df9af5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 10 14:55:12 2012 -0600

    Added template fragment.mk; updated .gitignore.

commit 714c527b0eb153b7e2040b79349edc8372f743fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Dec 7 19:54:04 2012 -0600

    Added 'changelog' make target; other tweaks.
    
    Details:
    - Updated CHANGELOG.
    - Added 'changelog' target to Makefile that runs 'git log --decorate' and
      overwrites CHANGELOG with the output.
    - Other trivial changes.

commit e4e5404d26aded4873278e85faf6f14ac32115b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Dec 7 17:34:53 2012 -0600

    Define static memory pool size in bl2_config.h.

commit 19bb507d0de6a2bd3ce37cf616bdcd6b419ed641
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Dec 7 17:18:00 2012 -0600

    Refined INSTALL text; added 'showconfig' target.
    
    Details:
    - Added 'showconfig' target to Makefile.
    - Added header files and ./config/<configname>/make_defs.mk as prerequisites
      to object file rules.
    - Added config.mk as prerequisite to library install rules.
    - Edited and added to INSTALL file.

commit 26cb659dd79636489db5a051aa60fff80273a7b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 6 15:34:53 2012 -0600

    Added auto-detection of version string (via git).
    
    Details:
    - Added build/update-version-file.sh script for auto-detecting "version"
      string and updating 'version' file accordingly. (If .git directory is
      not present, then it is assumed this copy of BLIS is a downloaded
      release, in which case 'version' file is left unchanged.)
    - Added invocation of update-version-file.sh to configure script.

commit b0ecd0ff52fa6ffc9e1d9eb44c365f7f009a6204
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 6 14:27:11 2012 -0600

    Wrote first draft of INSTALL file.

commit bcbe81235a35ccfdbcc2f2319a0ca6e04f75a785
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Dec 6 12:42:35 2012 -0600

    Updated standalone test Makefile and other fixes.
    
    Details:
    - Major edits to test/Makefile to bring it up to date with the new build
      system; it should no longer be broken.
    - Minor edits to top-level Makefile.
    - Fixed copy-and-paste bugs in
      - frame/1m/packm/ukernels/bl2_packm_ref_?xk.c
      - frame/1m/unpackm/ukernels/bl2_unpackm_ref_?xk.c

commit 2f272b40f43307909736327f49d17737c7a05d37
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Dec 4 19:22:14 2012 -0600

    Added build system and continued reorganization.
    
    Details:
    - Added/renamed packm, unpackm kernels.
    - Added machine value routines.
    - Added param_map facility.
    - Renamed AUTHORS to CREDITS.
    - Added Makefile; continued to expand upon existing configure script.
    - #define fuse_fac macros in operation headers if not defined already
      (by the user in bl2_kernels.h).
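
    Illustration (not part of the original commit): a fallback definition of
    this kind is usually written with an #ifndef guard. The macro name
    SKETCH_AXPYF_FUSE_FAC and the default value 1 are hypothetical.

```c
#include <assert.h>

/* A user's kernel header could define the constant first, for example:
       #define SKETCH_AXPYF_FUSE_FAC 4
   In this sketch it is left undefined so that the fallback applies. */

/* The operation header supplies a default only when none was given. */
#ifndef SKETCH_AXPYF_FUSE_FAC
#define SKETCH_AXPYF_FUSE_FAC 1
#endif

/* Returns the fusing factor that is in effect after the guard. */
int fuse_fac_sketch(void)
{
    assert(SKETCH_AXPYF_FUSE_FAC > 0);
    return SKETCH_AXPYF_FUSE_FAC;
}
```

    Because the guard tests only whether the macro exists, a user-supplied
    value always takes precedence over the default.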

commit 00f3498a8943be1b387f0d5c029c8c7891687ad5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Dec 3 12:36:11 2012 -0600

    Initial commit.