7022 7023 7024 7025 7026 7027 7028 7029 7030 7031 7032 7033 7034 7035 7036 7037 7038 7039 7040 7041 7042 7043 7044 7045 7046 7047 7048 7049 7050 7051 7052 7053 7054 7055 7056 7057 7058 7059 7060 7061 7062 7063 7064 7065 7066 7067 7068 7069 7070 7071 7072 7073 7074 7075 7076 7077 7078 7079 7080 7081 7082 7083 7084 7085 7086 7087 7088 7089 7090 7091 7092 7093 7094 7095 7096 7097 7098 7099 7100 7101 7102 7103 7104 7105 7106 7107 7108 7109 7110 7111 7112 7113 7114 7115 7116 7117 7118 7119 7120 7121 7122 7123 7124 7125 7126 7127 7128 7129 7130 7131 7132 7133 7134 7135 7136 7137 7138 7139 7140 7141 7142 7143 7144 7145 7146 7147 7148 7149 7150 7151 7152 7153 7154 7155 7156 7157 7158 7159 7160 7161 7162 7163 7164 7165 7166 7167 7168 7169 7170 7171 7172 7173 7174 7175 7176 7177 7178 7179 7180 7181 7182 7183 7184 7185 7186 7187 7188 7189 7190 7191 7192 7193 7194 7195 7196 7197 7198 7199 7200 7201 7202 7203 7204 7205 7206 7207 7208 7209 7210 7211 7212 7213 7214 7215 7216 7217 7218 7219 7220 7221 7222 7223 7224 7225 7226 7227 7228 7229 7230 7231 7232 7233 7234 7235 7236 7237 7238 7239 7240 7241 7242 7243 7244 7245 7246 7247 7248 7249 7250 7251 7252 7253 7254 7255 7256 7257 7258 7259 7260 7261 7262 7263 7264 7265 7266 7267 7268 7269 7270 7271 7272 7273 7274 7275 7276 7277 7278 7279 7280 7281 7282 7283 7284 7285 7286 7287 7288 7289 7290 7291 7292 7293 7294 7295 7296 7297 7298 7299 7300 7301 7302 7303 7304 7305 7306 7307 7308 7309 7310 7311 7312 7313 7314 7315 7316 7317 7318 7319 7320 7321 7322 7323 7324 7325 7326 7327 7328 7329 7330 7331 7332 7333 7334 7335 7336 7337 7338 7339 7340 7341 7342 7343 7344 7345 7346 7347 7348 7349 7350 7351 7352 7353 7354 7355 7356 7357 7358 7359 7360 7361 7362 7363 7364 7365 7366 7367 7368 7369 7370 7371 7372 7373 7374 7375 7376 7377 7378 7379 7380 7381 7382 7383 7384 7385 7386 7387 7388 7389 7390 7391 7392 7393 7394 7395 7396 7397 7398 7399 7400 7401 7402 7403 7404 7405 7406 7407 7408 7409 7410 7411 7412 7413 7414 7415 7416 7417 7418 7419 7420 7421 
7422 7423 7424 7425 7426 7427 7428 7429 7430 7431 7432 7433 7434 7435 7436 7437 7438 7439 7440 7441 7442 7443 7444 7445 7446 7447 7448 7449 7450 7451 7452 7453 7454 7455 7456 7457 7458 7459 7460 7461 7462 7463 7464 7465 7466 7467 7468 7469 7470 7471 7472 7473 7474 7475 7476 7477 7478 7479 7480 7481 7482 7483 7484 7485 7486 7487 7488 7489 7490 7491 7492 7493 7494 7495 7496 7497 7498 7499 7500 7501 7502 7503 7504 7505 7506 7507 7508 7509 7510 7511 7512 7513 7514 7515 7516 7517 7518 7519 7520 7521 7522 7523 7524 7525 7526 7527 7528 7529 7530 7531 7532 7533 7534 7535 7536 7537 7538 7539 7540 7541 7542 7543 7544 7545 7546 7547 7548 7549 7550 7551 7552 7553 7554 7555 7556 7557 7558 7559 7560 7561 7562 7563 7564 7565 7566 7567 7568 7569 7570 7571 7572 7573 7574 7575 7576 7577 7578 7579 7580 7581 7582 7583 7584 7585 7586 7587 7588 7589 7590 7591 7592 7593 7594 7595 7596 7597 7598 7599 7600 7601 7602 7603 7604 7605 7606 7607 7608 7609 7610 7611 7612 7613 7614 7615 7616 7617 7618 7619 7620 7621 7622 7623 7624 7625 7626 7627 7628 7629 7630 7631 7632 7633 7634 7635 7636 7637 7638 7639 7640 7641 7642 7643 7644 7645 7646 7647 7648 7649 7650 7651 7652 7653 7654 7655 7656 7657 7658 7659 7660 7661 7662 7663 7664 7665 7666 7667 7668 7669 7670 7671 7672 7673 7674 7675 7676 7677 7678 7679 7680 7681 7682 7683 7684 7685 7686 7687 7688 7689 7690 7691 7692 7693 7694 7695 7696 7697 7698 7699 7700 7701 7702 7703 7704 7705 7706 7707 7708 7709 7710 7711 7712 7713 7714 7715 7716 7717 7718 7719 7720 7721 7722 7723 7724 7725 7726 7727 7728 7729 7730 7731 7732 7733 7734 7735 7736 7737 7738 7739 7740 7741 7742 7743 7744 7745 7746 7747 7748 7749 7750 7751 7752 7753 7754 7755 7756 7757 7758 7759 7760 7761 7762 7763 7764 7765 7766 7767 7768 7769 7770 7771 7772 7773 7774 7775 7776 7777 7778 7779 7780 7781 7782 7783 7784 7785 7786 7787 7788 7789 7790 7791 7792 7793 7794 7795 7796 7797 7798 7799 7800 7801 7802 7803 7804 7805 7806 7807 7808 7809 7810 7811 7812 7813 7814 7815 7816 7817 7818 7819 7820 7821 
7822 7823 7824 7825 7826 7827 7828 7829 7830 7831 7832 7833 7834 7835 7836 7837 7838 7839 7840 7841 7842 7843 7844 7845 7846 7847 7848 7849 7850 7851 7852 7853 7854 7855 7856 7857 7858 7859 7860 7861 7862 7863 7864 7865 7866 7867 7868 7869 7870 7871 7872 7873 7874 7875 7876 7877 7878 7879 7880 7881 7882 7883 7884 7885 7886 7887 7888 7889 7890 7891 7892 7893 7894 7895 7896 7897 7898 7899 7900 7901 7902 7903 7904 7905 7906 7907 7908 7909 7910 7911 7912 7913 7914 7915 7916 7917 7918 7919 7920 7921 7922 7923 7924 7925 7926 7927 7928 7929 7930 7931 7932 7933 7934 7935 7936 7937 7938 7939 7940 7941 7942 7943 7944 7945 7946 7947 7948 7949 7950 7951 7952 7953 7954 7955 7956 7957 7958 7959 7960 7961 7962 7963 7964 7965 7966 7967 7968 7969 7970 7971 7972 7973 7974 7975 7976 7977 7978 7979 7980 7981 7982 7983 7984 7985 7986 7987 7988 7989 7990 7991 7992 7993 7994 7995 7996 7997 7998 7999 8000 8001 8002 8003 8004 8005 8006 8007 8008 8009 8010 8011 8012 8013 8014 8015 8016 8017 8018 8019 8020 8021 8022 8023 8024 8025 8026 8027 8028 8029 8030 8031 8032 8033 8034 8035 8036 8037 8038 8039 8040 8041 8042 8043 8044 8045 8046 8047 8048 8049 8050 8051 8052 8053 8054 8055 8056 8057 8058 8059 8060 8061 8062 8063 8064 8065 8066 8067 8068 8069 8070 8071 8072 8073 8074 8075 8076 8077 8078 8079 8080 8081 8082 8083 8084 8085 8086 8087 8088 8089 8090 8091 8092 8093 8094 8095 8096 8097 8098 8099 8100 8101 8102 8103 8104 8105 8106 8107 8108 8109 8110 8111 8112 8113 8114 8115 8116 8117 8118 8119 8120 8121 8122 8123 8124 8125 8126 8127 8128 8129 8130 8131 8132 8133 8134 8135 8136 8137 8138 8139 8140 8141 8142 8143 8144 8145 8146 8147 8148 8149 8150 8151 8152 8153 8154 8155 8156 8157 8158 8159 8160 8161 8162 8163 8164 8165 8166 8167 8168 8169 8170 8171 8172 8173 8174 8175 8176 8177 8178 8179 8180 8181 8182 8183 8184 8185 8186 8187 8188 8189 8190 8191 8192 8193 8194 8195 8196 8197 8198 8199 8200 8201 8202 8203 8204 8205 8206 8207 8208 8209 8210 8211 8212 8213 8214 8215 8216 8217 8218 8219 8220 8221 
8222 8223 8224 8225 8226 8227 8228 8229 8230 8231 8232 8233 8234 8235 8236 8237 8238 8239 8240 8241 8242 8243 8244 8245 8246 8247 8248 8249 8250 8251 8252 8253 8254 8255 8256 8257 8258 8259 8260 8261 8262 8263 8264 8265 8266 8267 8268 8269 8270 8271 8272 8273 8274 8275 8276 8277 8278 8279 8280 8281 8282 8283 8284 8285 8286 8287 8288 8289 8290 8291 8292 8293 8294 8295 8296 8297 8298 8299 8300 8301 8302 8303 8304 8305 8306 8307 8308 8309 8310 8311 8312 8313 8314 8315 8316 8317 8318 8319 8320 8321 8322 8323 8324 8325 8326 8327 8328 8329 8330 8331 8332 8333 8334 8335 8336 8337 8338 8339 8340 8341 8342 8343 8344 8345 8346 8347 8348 8349 8350 8351 8352 8353 8354 8355 8356 8357 8358 8359 8360 8361 8362 8363 8364 8365 8366 8367 8368 8369 8370 8371 8372 8373 8374 8375 8376 8377 8378 8379 8380 8381 8382 8383 8384 8385 8386 8387 8388 8389 8390 8391 8392 8393 8394 8395 8396 8397 8398 8399 8400 8401 8402 8403 8404 8405 8406 8407 8408 8409 8410 8411 8412 8413 8414 8415 8416 8417 8418 8419 8420 8421 8422 8423 8424 8425 8426 8427 8428 8429 8430 8431 8432 8433 8434 8435 8436 8437 8438 8439 8440 8441 8442 8443 8444 8445 8446 8447 8448 8449 8450 8451 8452 8453 8454 8455 8456 8457 8458 8459 8460 8461 8462 8463 8464 8465 8466 8467 8468 8469 8470 8471 8472 8473 8474 8475 8476 8477 8478 8479 8480 8481 8482 8483 8484 8485 8486 8487 8488 8489 8490 8491 8492 8493 8494 8495 8496 8497 8498 8499 8500 8501 8502 8503 8504 8505 8506 8507 8508 8509 8510 8511 8512 8513 8514 8515 8516 8517 8518 8519 8520 8521 8522 8523 8524 8525 8526 8527 8528 8529 8530 8531 8532 8533 8534 8535 8536 8537 8538 8539 8540 8541 8542 8543 8544 8545 8546 8547 8548 8549 8550 8551 8552 8553 8554 8555 8556 8557 8558 8559 8560 8561 8562 8563 8564 8565 8566 8567 8568 8569 8570 8571 8572 8573 8574 8575 8576 8577 8578 8579 8580 8581 8582 8583 8584 8585 8586 8587 8588 8589 8590 8591 8592 8593 8594 8595 8596 8597 8598 8599 8600 8601 8602 8603 8604 8605 8606 8607 8608 8609 8610 8611 8612 8613 8614 8615 8616 8617 8618 8619 8620 8621 
8622 8623 8624 8625 8626 8627 8628 8629 8630 8631 8632 8633 8634 8635 8636 8637 8638 8639 8640 8641 8642 8643 8644 8645 8646 8647 8648 8649 8650 8651 8652 8653 8654 8655 8656 8657 8658 8659 8660 8661 8662 8663 8664 8665 8666 8667 8668 8669 8670 8671 8672 8673 8674 8675 8676 8677 8678 8679 8680 8681 8682 8683 8684 8685 8686 8687 8688 8689 8690 8691 8692 8693 8694 8695 8696 8697 8698 8699 8700 8701 8702 8703 8704 8705 8706 8707 8708 8709 8710 8711 8712 8713 8714 8715 8716 8717 8718 8719 8720 8721 8722 8723 8724 8725 8726 8727 8728 8729 8730 8731 8732 8733 8734 8735 8736 8737 8738 8739 8740 8741 8742 8743 8744 8745 8746 8747 8748 8749 8750 8751 8752 8753 8754 8755 8756 8757 8758 8759 8760 8761 8762 8763 8764 8765 8766 8767 8768 8769 8770 8771 8772 8773 8774 8775 8776 8777 8778 8779 8780 8781 8782 8783 8784 8785 8786 8787 8788 8789 8790 8791 8792 8793 8794 8795 8796 8797 8798 8799 8800 8801 8802 8803 8804 8805 8806 8807 8808 8809 8810 8811 8812 8813 8814 8815 8816 8817 8818 8819 8820 8821 8822 8823 8824 8825 8826 8827 8828 8829 8830 8831 8832 8833 8834 8835 8836 8837 8838 8839 8840 8841 8842 8843 8844 8845 8846 8847 8848 8849 8850 8851 8852 8853 8854 8855 8856 8857 8858 8859 8860 8861 8862 8863 8864 8865 8866 8867 8868 8869 8870 8871 8872 8873 8874 8875 8876 8877 8878 8879 8880 8881 8882 8883 8884 8885 8886 8887 8888 8889 8890 8891 8892 8893 8894 8895 8896 8897 8898 8899 8900 8901 8902 8903 8904 8905 8906 8907 8908 8909 8910 8911 8912 8913 8914 8915 8916 8917 8918 8919 8920 8921 8922 8923 8924 8925 8926 8927 8928 8929 8930 8931 8932 8933 8934 8935 8936 8937 8938 8939 8940 8941 8942 8943 8944 8945 8946 8947 8948 8949 8950 8951 8952 8953 8954 8955 8956 8957 8958 8959 8960 8961 8962 8963 8964 8965 8966 8967 8968 8969 8970 8971 8972 8973 8974 8975 8976 8977 8978 8979 8980 8981 8982 8983 8984 8985 8986 8987 8988 8989 8990 8991 8992 8993 8994 8995 8996 8997 8998 8999 9000 9001 9002 9003 9004 9005 9006 9007 9008 9009 9010 9011 9012 9013 9014 9015 9016 9017 9018 9019 9020 9021 
9022 9023 9024 9025 9026 9027 9028 9029 9030 9031 9032 9033 9034 9035 9036 9037 9038 9039 9040 9041 9042 9043 9044 9045 9046 9047 9048 9049 9050 9051 9052 9053 9054 9055 9056 9057 9058 9059 9060 9061 9062 9063 9064 9065 9066 9067 9068 9069 9070 9071 9072 9073 9074 9075 9076 9077 9078 9079 9080 9081 9082 9083 9084 9085 9086 9087 9088 9089 9090 9091 9092 9093 9094 9095 9096 9097 9098 9099 9100 9101 9102 9103 9104 9105 9106 9107 9108 9109 9110 9111 9112 9113 9114 9115 9116 9117 9118 9119 9120 9121 9122 9123 9124 9125 9126 9127 9128 9129 9130 9131 9132 9133 9134 9135 9136 9137 9138 9139 9140 9141 9142 9143 9144 9145 9146 9147 9148 9149 9150 9151 9152 9153 9154 9155 9156 9157 9158 9159 9160 9161 9162 9163 9164 9165 9166 9167 9168 9169 9170 9171 9172 9173 9174 9175 9176 9177 9178 9179 9180 9181 9182 9183 9184 9185 9186 9187 9188 9189 9190 9191 9192 9193 9194 9195 9196 9197 9198 9199 9200 9201 9202 9203 9204 9205 9206 9207 9208 9209 9210 9211 9212 9213 9214 9215 9216 9217 9218 9219 9220 9221 9222 9223 9224 9225 9226 9227 9228 9229 9230 9231 9232 9233 9234 9235 9236 9237 9238 9239 9240 9241 9242 9243 9244 9245 9246 9247 9248 9249 9250 9251 9252 9253 9254 9255 9256 9257 9258 9259 9260 9261 9262 9263 9264 9265 9266 9267 9268 9269 9270 9271 9272 9273 9274 9275 9276 9277 9278 9279 9280 9281 9282 9283 9284 9285 9286 9287 9288 9289 9290 9291 9292 9293 9294 9295 9296 9297 9298 9299 9300 9301 9302 9303 9304 9305 9306 9307 9308 9309 9310 9311 9312 9313 9314 9315 9316 9317 9318 9319 9320 9321 9322 9323 9324 9325 9326 9327 9328 9329 9330 9331 9332 9333 9334 9335 9336 9337 9338 9339 9340 9341 9342 9343 9344 9345 9346 9347 9348 9349 9350 9351 9352 9353 9354 9355 9356 9357 9358 9359 9360 9361 9362 9363 9364 9365 9366 9367 9368 9369 9370 9371 9372 9373 9374 9375 9376 9377 9378 9379 9380 9381 9382 9383 9384 9385 9386 9387 9388 9389 9390 9391 9392 9393 9394 9395 9396 9397 9398 9399 9400 9401 9402 9403 9404 9405 9406 9407 9408 9409 9410 9411 9412 9413 9414 9415 9416 9417 9418 9419 9420 9421 
9422 9423 9424 9425 9426 9427 9428 9429 9430 9431 9432 9433 9434 9435 9436 9437 9438 9439 9440 9441 9442 9443 9444 9445 9446 9447 9448 9449 9450 9451 9452 9453 9454 9455 9456 9457 9458 9459 9460 9461 9462 9463 9464 9465 9466 9467 9468 9469 9470 9471 9472 9473 9474 9475 9476 9477 9478 9479 9480 9481 9482 9483 9484 9485 9486 9487 9488 9489 9490 9491 9492 9493 9494 9495 9496 9497 9498 9499 9500 9501 9502 9503 9504 9505 9506 9507 9508 9509 9510 9511 9512 9513 9514 9515 9516 9517 9518 9519 9520 9521 9522 9523 9524 9525 9526 9527 9528 9529 9530 9531 9532 9533 9534 9535 9536 9537 9538 9539 9540 9541 9542 9543 9544 9545 9546 9547 9548 9549 9550 9551 9552 9553 9554 9555 9556 9557 9558 9559 9560 9561 9562 9563 9564 9565 9566 9567 9568 9569 9570 9571 9572 9573 9574 9575 9576 9577 9578 9579 9580 9581 9582 9583 9584 9585 9586 9587 9588 9589 9590 9591 9592 9593 9594 9595 9596 9597 9598 9599 9600 9601 9602 9603 9604 9605 9606 9607 9608 9609 9610 9611 9612 9613 9614 9615 9616 9617 9618 9619 9620 9621 9622 9623 9624 9625 9626 9627 9628 9629 9630 9631 9632 9633 9634 9635 9636 9637 9638 9639 9640 9641 9642 9643 9644 9645 9646 9647 9648 9649 9650 9651 9652 9653 9654 9655 9656 9657 9658 9659 9660 9661 9662 9663 9664 9665 9666 9667 9668 9669 9670 9671 9672 9673 9674 9675 9676 9677 9678 9679 9680 9681 9682 9683 9684 9685 9686 9687 9688 9689 9690 9691 9692 9693 9694 9695 9696 9697 9698 9699 9700 9701 9702 9703 9704 9705 9706 9707 9708 9709 9710 9711 9712 9713 9714 9715 9716 9717 9718 9719 9720 9721 9722 9723 9724 9725 9726 9727 9728 9729 9730 9731 9732 9733 9734 9735 9736 9737 9738 9739 9740 9741 9742 9743 9744 9745 9746 9747 9748 9749 9750 9751 9752 9753 9754 9755 9756 9757 9758 9759 9760 9761 9762 9763 9764 9765 9766 9767 9768 9769 9770 9771 9772 9773 9774 9775 9776 9777 9778 9779 9780 9781 9782 9783 9784 9785 9786 9787 9788 9789 9790 9791 9792 9793 9794 9795 9796 9797 9798 9799 9800 9801 9802 9803 9804 9805 9806 9807 9808 9809 9810 9811 9812 9813 9814 9815 9816 9817 9818 9819 9820 9821 
9822 9823 9824 9825 9826 9827 9828 9829 9830 9831 9832 9833 9834 9835 9836 9837 9838 9839 9840 9841 9842 9843 9844 9845 9846 9847 9848 9849 9850 9851 9852 9853 9854 9855 9856 9857 9858 9859 9860 9861 9862 9863 9864 9865 9866 9867 9868 9869 9870 9871 9872 9873 9874 9875 9876 9877 9878 9879 9880 9881 9882 9883 9884 9885 9886 9887 9888 9889 9890 9891 9892 9893 9894 9895 9896 9897 9898 9899 9900 9901 9902 9903 9904 9905 9906 9907 9908 9909 9910 9911 9912 9913 9914 9915 9916 9917 9918 9919 9920 9921 9922 9923 9924 9925 9926 9927 9928 9929 9930 9931 9932 9933 9934 9935 9936 9937 9938 9939 9940 9941 9942 9943 9944 9945 9946 9947 9948 9949 9950 9951 9952 9953 9954 9955 9956 9957 9958 9959 9960 9961 9962 9963 9964 9965 9966 9967 9968 9969 9970 9971 9972 9973 9974 9975 9976 9977 9978 9979 9980 9981 9982 9983 9984 9985 9986 9987 9988 9989 9990 9991 9992 9993 9994 9995 9996 9997 9998 9999 10000 10001 10002 10003 10004 10005 10006 10007 10008 10009 10010 10011 10012 10013 10014 10015 10016 10017 10018 10019 10020 10021 10022 10023 10024 10025 10026 10027 10028 10029 10030 10031 10032 10033 10034 10035 10036 10037 10038 10039 10040 10041 10042 10043 10044 10045 10046 10047 10048 10049 10050 10051 10052 10053 10054 10055 10056 10057 10058 10059 10060 10061 10062 10063 10064 10065 10066 10067 10068 10069 10070 10071 10072 10073 10074 10075 10076 10077 10078 10079 10080 10081 10082 10083 10084 10085 10086 10087 10088 10089 10090 10091 10092 10093 10094 10095 10096 10097 10098 10099 10100 10101 10102 10103 10104 10105 10106 10107 10108 10109 10110 10111 10112 10113 10114 10115 10116 10117 10118 10119 10120 10121 10122 10123 10124 10125 10126 10127 10128 10129 10130 10131 10132 10133 10134 10135 10136 10137 10138 10139 10140 10141 10142 10143 10144 10145 10146 10147 10148 10149 10150 10151 10152 10153 10154 10155 10156 10157 10158 10159 10160 10161 10162 10163 10164 10165 10166 10167 10168 10169 10170 10171 10172 10173 10174 10175 10176 10177 10178 10179 10180 10181 10182 10183 10184 
10185 10186 10187 10188 10189 10190 10191 10192 10193 10194 10195 10196 10197 10198 10199 10200 10201 10202 10203 10204 10205 10206 10207 10208 10209 10210 10211 10212 10213 10214 10215 10216 10217 10218 10219 10220 10221 10222 10223 10224 10225 10226 10227 10228 10229 10230 10231 10232 10233 10234 10235 10236 10237 10238 10239 10240 10241 10242 10243 10244 10245 10246 10247 10248 10249 10250 10251 10252 10253 10254 10255 10256 10257 10258 10259 10260 10261 10262 10263 10264 10265 10266 10267 10268 10269 10270 10271 10272 10273 10274 10275 10276 10277 10278 10279 10280 10281 10282 10283 10284 10285 10286 10287 10288 10289 10290 10291 10292 10293 10294 10295 10296 10297 10298 10299 10300 10301 10302 10303 10304 10305 10306 10307 10308 10309 10310 10311 10312 10313 10314 10315 10316 10317 10318 10319 10320 10321 10322 10323 10324 10325 10326 10327 10328 10329 10330 10331 10332 10333 10334 10335 10336 10337 10338 10339 10340 10341 10342 10343 10344 10345 10346 10347 10348 10349 10350 10351 10352 10353 10354 10355 10356 10357 10358 10359 10360 10361 10362 10363 10364 10365 10366 10367 10368 10369 10370 10371 10372 10373 10374 10375 10376 10377 10378 10379 10380 10381 10382 10383 10384 10385 10386 10387 10388 10389 10390 10391 10392 10393 10394 10395 10396 10397 10398 10399 10400 10401 10402 10403 10404 10405 10406 10407 10408 10409 10410 10411 10412 10413 10414 10415 10416 10417 10418 10419 10420 10421 10422 10423 10424 10425 10426 10427 10428 10429 10430 10431 10432 10433 10434 10435 10436 10437 10438 10439 10440 10441 10442 10443 10444 10445 10446 10447 10448 10449 10450 10451 10452 10453 10454 10455 10456 10457 10458 10459 10460 10461 10462 10463 10464 10465 10466 10467 10468 10469 10470 10471 10472 10473 10474 10475 10476 10477 10478 10479 10480 10481 10482 10483 10484 10485 10486 10487 10488 10489 10490 10491 10492 10493 10494 10495 10496 10497 10498 10499 10500 10501 10502 10503 10504 10505 10506 10507 10508 10509 10510 10511 10512 10513 10514 10515 10516 10517 
10518 10519 10520 10521 10522 10523 10524 10525 10526 10527 10528 10529 10530 10531 10532 10533 10534 10535 10536 10537 10538 10539 10540 10541 10542 10543 10544 10545 10546 10547 10548 10549 10550 10551 10552 10553 10554 10555 10556 10557 10558 10559 10560 10561 10562 10563 10564 10565 10566 10567 10568 10569 10570 10571 10572 10573 10574 10575 10576 10577 10578 10579 10580 10581 10582 10583 10584 10585 10586 10587 10588 10589 10590 10591 10592 10593 10594 10595 10596 10597 10598 10599 10600 10601 10602 10603 10604 10605 10606 10607 10608 10609 10610 10611 10612 10613 10614 10615 10616 10617 10618 10619 10620 10621 10622 10623 10624 10625 10626 10627 10628 10629 10630 10631 10632 10633 10634 10635 10636 10637 10638 10639 10640 10641 10642 10643 10644 10645 10646 10647 10648 10649 10650 10651 10652 10653 10654 10655 10656 10657 10658 10659 10660 10661 10662 10663 10664 10665 10666 10667 10668 10669 10670 10671 10672 10673 10674 10675 10676 10677 10678 10679 10680 10681 10682 10683 10684 10685 10686 10687 10688 10689 10690 10691 10692 10693 10694 10695 10696 10697 10698 10699 10700 10701 10702 10703 10704 10705 10706 10707 10708 10709 10710 10711 10712 10713 10714 10715 10716 10717 10718 10719 10720 10721 10722 10723 10724 10725 10726 10727 10728 10729 10730 10731 10732 10733 10734 10735 10736 10737 10738 10739 10740 10741 10742 10743 10744 10745 10746 10747 10748 10749 10750 10751 10752 10753 10754 10755 10756 10757 10758 10759 10760 10761 10762 10763 10764 10765 10766 10767 10768 10769 10770 10771 10772 10773 10774 10775 10776 10777 10778 10779 10780 10781 10782 10783 10784 10785 10786 10787 10788 10789 10790 10791 10792 10793 10794 10795 10796 10797 10798 10799 10800 10801 10802 10803 10804 10805 10806 10807 10808 10809 10810 10811 10812 10813 10814 10815 10816 10817 10818 10819 10820 10821 10822 10823 10824 10825 10826 10827 10828 10829 10830 10831 10832 10833 10834 10835 10836 10837 10838 10839 10840 10841 10842 10843 10844 10845 10846 10847 10848 10849 10850 
10851 10852 10853 10854 10855 10856 10857 10858 10859 10860 10861 10862 10863 10864 10865 10866 10867 10868 10869 10870 10871 10872 10873 10874 10875 10876 10877 10878 10879 10880 10881 10882 10883 10884 10885 10886 10887 10888 10889 10890 10891 10892 10893 10894 10895 10896 10897 10898 10899 10900 10901 10902 10903 10904 10905 10906 10907 10908 10909 10910 10911 10912 10913 10914 10915 10916 10917 10918 10919 10920 10921 10922 10923 10924 10925 10926 10927 10928 10929 10930 10931 10932 10933 10934 10935 10936 10937 10938 10939 10940 10941 10942 10943 10944 10945 10946 10947 10948 10949 10950 10951 10952 10953 10954 10955 10956 10957 10958 10959 10960 10961 10962 10963 10964 10965 10966 10967 10968 10969 10970 10971 10972 10973 10974 10975 10976 10977 10978 10979 10980 10981 10982 10983 10984 10985 10986 10987 10988 10989 10990 10991 10992 10993 10994 10995 10996 10997 10998 10999 11000 11001 11002 11003 11004 11005 11006 11007 11008 11009 11010 11011 11012 11013 11014 11015 11016 11017 11018 11019 11020 11021 11022 11023 11024 11025 11026 11027 11028 11029 11030 11031 11032 11033 11034 11035 11036 11037 11038 11039 11040 11041 11042 11043 11044 11045 11046 11047 11048 11049 11050 11051 11052 11053 11054 11055 11056 11057 11058 11059 11060 11061 11062 11063 11064 11065 11066 11067 11068 11069 11070 11071 11072 11073 11074 11075 11076 11077 11078 11079 11080 11081 11082 11083 11084 11085 11086 11087 11088 11089 11090 11091 11092 11093 11094 11095 11096 11097 11098 11099 11100 11101 11102 11103 11104 11105 11106 11107 11108 11109 11110 11111 11112 11113 11114 11115 11116 11117 11118 11119 11120 11121 11122 11123 11124 11125 11126 11127 11128 11129 11130 11131 11132 11133 11134 11135 11136 11137 11138 11139 11140 11141 11142 11143 11144 11145 11146 11147 11148 11149 11150 11151 11152 11153 11154 11155 11156 11157 11158 11159 11160 11161 11162 11163 11164 11165 11166 11167 11168 11169 11170 11171 11172 11173 11174 11175 11176 11177 11178 11179 11180 11181 11182 11183 
11184 11185 11186 11187 11188 11189 11190 11191 11192 11193 11194 11195 11196 11197 11198 11199 11200 11201 11202 11203 11204 11205 11206 11207 11208 11209 11210 11211 11212 11213 11214 11215 11216 11217 11218 11219 11220 11221 11222 11223 11224 11225 11226 11227 11228 11229 11230 11231 11232 11233 11234 11235 11236 11237 11238 11239 11240 11241 11242 11243 11244 11245 11246 11247 11248 11249 11250 11251 11252 11253 11254 11255 11256 11257 11258 11259 11260 11261 11262 11263 11264 11265 11266 11267 11268 11269 11270 11271 11272 11273 11274 11275 11276 11277 11278 11279 11280 11281 11282 11283 11284 11285 11286 11287 11288 11289 11290 11291 11292 11293 11294 11295 11296 11297 11298 11299 11300 11301 11302 11303 11304 11305 11306 11307 11308 11309 11310 11311 11312 11313 11314 11315 11316 11317 11318 11319 11320 11321 11322 11323 11324 11325 11326 11327 11328 11329 11330 11331 11332 11333 11334 11335 11336 11337 11338 11339 11340 11341 11342 11343 11344 11345 11346 11347 11348 11349 11350 11351 11352 11353 11354 11355 11356 11357 11358 11359 11360 11361 11362 11363 11364 11365 11366 11367 11368 11369 11370 11371 11372 11373 11374 11375 11376 11377 11378 11379 11380 11381 11382 11383 11384 11385 11386 11387 11388 11389 11390 11391 11392 11393 11394 11395 11396 11397 11398 11399 11400 11401 11402 11403 11404 11405 11406 11407 11408 11409 11410 11411 11412 11413 11414 11415 11416 11417 11418 11419 11420 11421 11422 11423 11424 11425 11426 11427 11428 11429 11430 11431 11432 11433 11434 11435 11436 11437 11438 11439 11440 11441 11442 11443 11444 11445 11446 11447 11448 11449 11450 11451 11452 11453 11454 11455 11456 11457 11458 11459 11460 11461 11462 11463 11464 11465 11466 11467 11468 11469 11470 11471 11472 11473 11474 11475 11476 11477 11478 11479 11480 11481 11482 11483 11484 11485 11486 11487 11488 11489 11490 11491 11492 11493 11494 11495 11496 11497 11498 11499 11500 11501 11502 11503 11504 11505 11506 11507 11508 11509 11510 11511 11512 11513 11514 11515 11516 
11517 11518 11519 11520 11521 11522 11523 11524 11525 11526 11527 11528 11529 11530 11531 11532 11533 11534 11535 11536 11537 11538 11539 11540 11541 11542 11543 11544 11545 11546 11547 11548 11549 11550 11551 11552 11553 11554 11555 11556 11557 11558 11559 11560 11561 11562 11563 11564 11565 11566 11567 11568 11569 11570 11571 11572 11573 11574 11575 11576 11577 11578 11579 11580 11581 11582 11583 11584 11585 11586 11587 11588 11589 11590 11591 11592 11593 11594 11595 11596 11597 11598 11599 11600 11601 11602 11603 11604 11605 11606 11607 11608 11609 11610 11611 11612 11613 11614 11615 11616 11617 11618 11619 11620 11621 11622 11623 11624 11625 11626 11627 11628 11629 11630 11631 11632 11633 11634 11635 11636 11637 11638 11639 11640 11641 11642 11643 11644 11645 11646 11647 11648 11649 11650 11651 11652 11653 11654 11655 11656 11657 11658 11659 11660 11661 11662 11663 11664 11665 11666 11667 11668 11669 11670 11671 11672 11673 11674 11675 11676 11677 11678 11679 11680 11681 11682 11683 11684 11685 11686 11687 11688 11689 11690 11691 11692 11693 11694 11695 11696 11697 11698 11699 11700 11701 11702 11703 11704 11705 11706 11707 11708 11709 11710 11711 11712 11713 11714 11715 11716 11717 11718 11719 11720 11721 11722 11723 11724 11725 11726 11727 11728 11729 11730 11731 11732 11733 11734 11735 11736 11737 11738 11739 11740 11741 11742 11743 11744 11745 11746 11747 11748 11749 11750 11751 11752 11753 11754 11755 11756 11757 11758 11759 11760 11761 11762 11763 11764 11765 11766 11767 11768 11769 11770 11771 11772 11773 11774 11775 11776 11777 11778 11779 11780 11781 11782 11783 11784 11785 11786 11787 11788 11789 11790 11791 11792 11793 11794 11795 11796 11797 11798 11799 11800 11801 11802 11803 11804 11805 11806 11807 11808 11809 11810 11811 11812 11813 11814 11815 11816 11817 11818 11819 11820 11821 11822 11823 11824 11825 11826 11827 11828 11829 11830 11831 11832 11833 11834 11835 11836 11837 11838 11839 11840 11841 11842 11843 11844 11845 11846 11847 11848 11849 
11850 11851 11852 11853 11854 11855 11856 11857 11858 11859 11860 11861 11862 11863 11864 11865 11866 11867 11868 11869 11870 11871 11872 11873 11874 11875 11876 11877 11878 11879 11880 11881 11882 11883 11884 11885 11886 11887 11888 11889 11890 11891 11892 11893 11894 11895 11896 11897 11898 11899 11900 11901 11902 11903 11904 11905 11906 11907 11908 11909 11910 11911 11912 11913 11914 11915 11916 11917 11918 11919 11920 11921 11922 11923 11924 11925 11926 11927 11928 11929 11930 11931 11932 11933 11934 11935 11936 11937 11938 11939 11940 11941 11942 11943 11944 11945 11946 11947 11948 11949 11950 11951 11952 11953 11954 11955 11956 11957 11958 11959 11960 11961 11962 11963 11964 11965 11966 11967 11968 11969 11970 11971 11972 11973 11974 11975 11976 11977 11978 11979 11980 11981 11982 11983 11984 11985 11986 11987 11988 11989 11990 11991 11992 11993 11994 11995 11996 11997 11998 11999 12000 12001 12002 12003 12004 12005 12006 12007 12008 12009 12010 12011 12012 12013 12014 12015 12016 12017 12018 12019 12020 12021 12022 12023 12024 12025 12026 12027 12028 12029 12030 12031 12032 12033 12034 12035 12036 12037 12038 12039 12040 12041 12042 12043 12044 12045 12046 12047 12048 12049 12050 12051 12052 12053 12054 12055 12056 12057 12058 12059 12060 12061 12062 12063 12064 12065 12066 12067 12068 12069 12070 12071 12072 12073 12074 12075 12076 12077 12078 12079 12080 12081 12082 12083 12084 12085 12086 12087 12088 12089 12090 12091 12092 12093 12094 12095 12096 12097 12098 12099 12100 12101 12102 12103 12104 12105 12106 12107 12108 12109 12110 12111 12112 12113 12114 12115 12116 12117 12118 12119 12120 12121 12122 12123 12124 12125 12126 12127 12128 12129 12130 12131 12132 12133 12134 12135 12136 12137 12138 12139 12140 12141 12142 12143 12144 12145 12146 12147 12148 12149 12150 12151 12152 12153 12154 12155 12156 12157 12158 12159 12160 12161 12162 12163 12164 12165 12166 12167 12168 12169 12170 12171 12172 12173 12174 12175 12176 12177 12178 12179 12180 12181 12182 
12183 12184 12185 12186 12187 12188 12189 12190 12191 12192 12193 12194 12195 12196 12197 12198 12199 12200 12201 12202 12203 12204 12205 12206 12207 12208 12209 12210 12211 12212 12213 12214 12215 12216 12217 12218 12219 12220 12221 12222 12223 12224 12225 12226 12227 12228 12229 12230 12231 12232 12233 12234 12235 12236 12237 12238 12239 12240 12241 12242 12243 12244 12245 12246 12247 12248 12249 12250 12251 12252 12253 12254 12255 12256 12257 12258 12259 12260 12261 12262 12263 12264 12265 12266 12267 12268 12269 12270 12271 12272 12273 12274 12275 12276 12277 12278 12279 12280 12281 12282 12283 12284 12285 12286 12287 12288 12289 12290 12291 12292 12293 12294 12295 12296 12297 12298 12299 12300 12301 12302 12303 12304 12305 12306 12307 12308 12309 12310 12311 12312 12313 12314 12315 12316 12317 12318 12319 12320 12321 12322 12323 12324 12325 12326 12327 12328 12329 12330 12331 12332 12333 12334 12335 12336 12337 12338 12339 12340 12341 12342 12343 12344 12345 12346 12347 12348 12349 12350 12351 12352 12353 12354 12355 12356 12357 12358 12359 12360 12361 12362 12363 12364 12365 12366 12367 12368 12369 12370 12371 12372 12373 12374 12375 12376 12377 12378 12379 12380 12381 12382 12383 12384 12385 12386 12387 12388 12389 12390 12391 12392 12393 12394 12395 12396 12397 12398 12399 12400 12401 12402 12403 12404 12405 12406 12407 12408 12409 12410 12411 12412 12413 12414 12415 12416 12417 12418 12419 12420 12421 12422 12423 12424 12425 12426 12427 12428 12429 12430 12431 12432 12433 12434 12435 12436 12437 12438 12439 12440 12441 12442 12443 12444 12445 12446 12447 12448 12449 12450 12451 12452 12453 12454 12455 12456 12457 12458 12459 12460 12461 12462 12463 12464 12465 12466 12467 12468 12469 12470 12471 12472 12473 12474 12475 12476 12477 12478 12479 12480 12481 12482 12483 12484 12485 12486 12487 12488 12489 12490 12491 12492 12493 12494 12495 12496 12497 12498 12499 12500 12501 12502 12503 12504 12505 12506 12507 12508 12509 12510 12511 12512 12513 12514 12515 
12516 12517 12518 12519 12520 12521 12522 12523 12524 12525 12526 12527 12528 12529 12530 12531 12532 12533 12534 12535 12536 12537 12538 12539 12540 12541 12542 12543 12544 12545 12546 12547 12548 12549 12550 12551 12552 12553 12554 12555 12556 12557 12558 12559 12560 12561 12562 12563 12564 12565 12566 12567 12568 12569 12570 12571 12572 12573 12574 12575 12576 12577 12578 12579 12580 12581 12582 12583 12584 12585 12586 12587 12588 12589 12590 12591 12592 12593 12594 12595 12596 12597 12598 12599 12600 12601 12602 12603 12604 12605 12606 12607 12608 12609 12610 12611 12612 12613 12614 12615 12616 12617 12618 12619 12620 12621 12622 12623 12624 12625 12626 12627 12628 12629 12630 12631 12632 12633 12634 12635 12636 12637 12638 12639 12640 12641 12642 12643 12644 12645 12646 12647 12648 12649 12650 12651 12652 12653 12654 12655 12656 12657 12658 12659 12660 12661 12662 12663 12664 12665 12666 12667 12668 12669 12670 12671 12672 12673 12674 12675 12676 12677 12678 12679 12680 12681 12682 12683 12684 12685 12686 12687 12688 12689 12690 12691 12692 12693 12694 12695 12696 12697 12698 12699 12700 12701 12702 12703 12704 12705 12706 12707 12708 12709 12710 12711 12712 12713 12714 12715 12716 12717 12718 12719 12720 12721 12722 12723 12724 12725 12726 12727 12728 12729 12730 12731 12732 12733 12734 12735 12736 12737 12738 12739 12740 12741 12742 12743 12744 12745 12746 12747 12748 12749 12750 12751 12752 12753 12754 12755 12756 12757 12758 12759 12760 12761 12762 12763 12764 12765 12766 12767 12768 12769 12770 12771 12772 12773 12774 12775 12776 12777 12778 12779 12780 12781 12782 12783 12784 12785 12786 12787 12788 12789 12790 12791 12792 12793 12794 12795 12796 12797 12798 12799 12800 12801 12802 12803 12804 12805 12806 12807 12808 12809 12810 12811 12812 12813 12814 12815 12816 12817 12818 12819 12820 12821 12822 12823 12824 12825 12826 12827 12828 12829 12830 12831 12832 12833 12834 12835 12836 12837 12838 12839 12840 12841 12842 12843 12844 12845 12846 12847 12848 
12849 12850 12851 12852 12853 12854 12855 12856 12857 12858 12859 12860 12861 12862 12863 12864 12865 12866 12867 12868 12869 12870 12871 12872 12873 12874 12875 12876 12877 12878 12879 12880 12881 12882 12883 12884 12885 12886 12887 12888 12889 12890 12891 12892 12893 12894 12895 12896 12897 12898 12899 12900 12901 12902 12903 12904 12905 12906 12907 12908 12909 12910 12911 12912 12913 12914 12915 12916 12917 12918 12919 12920 12921 12922 12923 12924 12925 12926 12927 12928 12929 12930 12931 12932 12933 12934 12935 12936 12937 12938 12939 12940 12941 12942 12943 12944 12945 12946 12947 12948 12949 12950 12951 12952 12953 12954 12955 12956 12957 12958 12959 12960 12961 12962 12963 12964 12965 12966 12967 12968 12969 12970 12971 12972 12973 12974 12975 12976 12977 12978 12979 12980 12981 12982 12983 12984 12985 12986 12987 12988 12989 12990 12991 12992 12993 12994 12995 12996 12997 12998 12999 13000 13001 13002 13003 13004 13005 13006 13007 13008 13009 13010 13011 13012 13013 13014 13015 13016 13017 13018 13019 13020 13021 13022 13023 13024 13025 13026 13027 13028 13029 13030 13031 13032 13033 13034 13035 13036 13037 13038 13039 13040 13041 13042 13043 13044 13045 13046 13047 13048 13049 13050 13051 13052 13053 13054 13055 13056 13057 13058 13059 13060 13061 13062 13063 13064 13065 13066 13067 13068 13069 13070 13071 13072 13073 13074 13075 13076 13077 13078 13079 13080 13081 13082 13083 13084 13085 13086 13087 13088 13089 13090 13091 13092 13093 13094 13095 13096 13097 13098 13099 13100 13101 13102 13103 13104 13105 13106 13107 13108 13109 13110 13111 13112 13113 13114 13115 13116 13117 13118 13119 13120 13121 13122 13123 13124 13125 13126 13127 13128 13129 13130 13131 13132 13133 13134 13135 13136 13137 13138 13139 13140 13141 13142 13143 13144 13145 13146 13147 13148 13149 13150 13151 13152 13153 13154 13155 13156 13157 13158 13159 13160 13161 13162 13163 13164 13165 13166 13167 13168 13169 13170 13171 13172 13173 13174 13175 13176 13177 13178 13179 13180 13181 
13182 13183 13184 13185 13186 13187 13188 13189 13190 13191 13192 13193 13194 13195 13196 13197 13198 13199 13200 13201 13202 13203 13204 13205 13206 13207 13208 13209 13210 13211 13212 13213 13214 13215 13216 13217 13218 13219 13220 13221 13222 13223 13224 13225 13226 13227 13228 13229 13230 13231 13232 13233 13234 13235 13236 13237 13238 13239 13240 13241 13242 13243 13244 13245 13246 13247 13248 13249 13250 13251 13252 13253 13254 13255 13256 13257 13258 13259 13260 13261 13262 13263 13264 13265 13266 13267 13268 13269 13270 13271 13272 13273 13274 13275 13276 13277 13278 13279 13280 13281 13282 13283 13284 13285 13286 13287 13288 13289 13290 13291 13292 13293 13294 13295 13296 13297 13298 13299 13300 13301 13302 13303 13304 13305 13306 13307 13308 13309 13310 13311 13312 13313 13314 13315 13316 13317 13318 13319 13320 13321 13322 13323 13324 13325 13326 13327 13328 13329 13330 13331 13332 13333 13334 13335 13336 13337 13338 13339 13340 13341 13342 13343 13344 13345 13346 13347 13348 13349 13350 13351 13352 13353 13354 13355 13356 13357 13358 13359 13360 13361 13362 13363 13364 13365 13366 13367 13368 13369 13370 13371 13372 13373 13374 13375 13376 13377 13378 13379 13380 13381 13382 13383 13384 13385 13386 13387 13388 13389 13390 13391 13392 13393 13394 13395 13396 13397 13398 13399 13400 13401 13402 13403 13404 13405 13406 13407 13408 13409 13410 13411 13412 13413 13414 13415 13416 13417 13418 13419 13420 13421 13422 13423 13424 13425 13426 13427 13428 13429 13430 13431 13432 13433 13434 13435 13436 13437 13438 13439 13440 13441 13442 13443 13444 13445 13446 13447 13448 13449 13450 13451 13452 13453 13454 13455 13456 13457 13458 13459 13460 13461 13462 13463 13464 13465 13466 13467 13468 13469 13470 13471 13472 13473 13474 13475 13476 13477 13478 13479 13480 13481 13482 13483 13484 13485 13486 13487 13488 13489 13490 13491 13492 13493 13494 13495 13496 13497 13498 13499 13500 13501 13502 13503 13504 13505 13506 13507 13508 13509 13510 13511 13512 13513 13514 
13515 13516 13517 13518 13519 13520 13521 13522 13523 13524 13525 13526 13527 13528 13529 13530 13531 13532 13533 13534 13535 13536 13537 13538 13539 13540 13541 13542 13543 13544 13545 13546 13547 13548 13549 13550 13551 13552 13553 13554 13555 13556 13557 13558 13559 13560 13561 13562 13563 13564 13565 13566 13567 13568 13569 13570 13571 13572 13573 13574 13575 13576 13577 13578 13579 13580 13581 13582 13583 13584 13585 13586 13587 13588 13589 13590 13591 13592 13593 13594 13595 13596 13597 13598 13599 13600 13601 13602 13603 13604 13605 13606 13607 13608 13609 13610 13611 13612 13613 13614 13615 13616 13617 13618 13619 13620 13621 13622 13623 13624 13625 13626 13627 13628 13629 13630 13631 13632 13633 13634 13635 13636 13637 13638 13639 13640 13641 13642 13643 13644 13645 13646 13647 13648 13649 13650 13651 13652 13653 13654 13655 13656 13657 13658 13659 13660 13661 13662 13663 13664 13665 13666 13667 13668 13669 13670 13671 13672 13673 13674 13675 13676 13677 13678 13679 13680 13681 13682 13683 13684 13685 13686 13687 13688 13689 13690 13691 13692 13693 13694 13695 13696 13697 13698 13699 13700 13701 13702 13703 13704 13705 13706 13707 13708 13709 13710 13711 13712 13713 13714 13715 13716 13717 13718 13719 13720 13721 13722 13723 13724 13725 13726 13727 13728 13729 13730 13731 13732 13733 13734 13735 13736 13737 13738 13739 13740 13741 13742 13743 13744 13745 13746 13747 13748 13749 13750 13751 13752 13753 13754 13755 13756 13757 13758 13759 13760 13761 13762 13763 13764 13765 13766 13767 13768 13769 13770 13771 13772 13773 13774 13775 13776 13777 13778 13779 13780 13781 13782 13783 13784 13785 13786 13787 13788 13789 13790 13791 13792 13793 13794 13795 13796 13797 13798 13799 13800 13801 13802 13803 13804 13805 13806 13807 13808 13809 13810 13811 13812 13813 13814 13815 13816 13817 13818 13819 13820 13821 13822 13823 13824 13825 13826 13827 13828 13829 13830 13831 13832 13833 13834 13835 13836 13837 13838 13839 13840 13841 13842 13843 13844 13845 13846 13847 
13848 13849 13850 13851 13852 13853 13854 13855 13856 13857 13858 13859 13860 13861 13862 13863 13864 13865 13866 13867 13868 13869 13870 13871 13872 13873 13874 13875 13876 13877 13878 13879 13880 13881 13882 13883 13884 13885 13886 13887 13888 13889 13890 13891 13892 13893 13894 13895 13896 13897 13898 13899 13900 13901 13902 13903 13904 13905 13906 13907 13908 13909 13910 13911 13912 13913 13914 13915 13916 13917 13918 13919 13920 13921 13922 13923 13924 13925 13926 13927 13928 13929 13930 13931 13932 13933 13934 13935 13936 13937 13938 13939 13940 13941 13942 13943 13944 13945 13946 13947 13948 13949 13950 13951 13952 13953 13954 13955 13956 13957 13958 13959 13960 13961 13962 13963 13964 13965 13966 13967 13968 13969 13970 13971 13972 13973 13974 13975 13976 13977 13978 13979 13980 13981 13982 13983 13984 13985 13986 13987 13988 13989 13990 13991 13992 13993 13994 13995 13996 13997 13998 13999 14000 14001 14002 14003 14004 14005 14006 14007 14008 14009 14010 14011 14012 14013 14014 14015 14016 14017 14018 14019 14020 14021 14022 14023 14024 14025 14026 14027 14028 14029 14030 14031 14032 14033 14034 14035 14036 14037 14038 14039 14040 14041 14042 14043 14044 14045 14046 14047 14048 14049 14050 14051 14052 14053 14054 14055 14056 14057 14058 14059 14060 14061 14062 14063 14064 14065 14066 14067 14068 14069 14070 14071 14072 14073 14074 14075 14076 14077 14078 14079 14080 14081 14082 14083 14084 14085 14086 14087 14088 14089 14090 14091 14092 14093 14094 14095 14096 14097 14098 14099 14100 14101 14102 14103 14104 14105 14106 14107 14108 14109 14110 14111 14112 14113 14114 14115 14116 14117 14118 14119 14120 14121 14122 14123 14124 14125 14126 14127 14128 14129 14130 14131 14132 14133 14134 14135 14136 14137 14138 14139 14140 14141 14142 14143 14144 14145 14146 14147 14148 14149 14150 14151 14152 14153 14154 14155 14156 14157 14158 14159 14160 14161 14162 14163 14164 14165 14166 14167 14168 14169 14170 14171 14172 14173 14174 14175 14176 14177 14178 14179 14180 
14181 14182 14183 14184 14185 14186 14187 14188 14189 14190 14191 14192 14193 14194 14195 14196 14197 14198 14199 14200 14201 14202 14203 14204 14205 14206 14207 14208 14209 14210 14211 14212 14213 14214 14215 14216 14217 14218 14219 14220 14221 14222 14223 14224 14225 14226 14227 14228 14229 14230 14231 14232 14233 14234 14235 14236 14237 14238 14239 14240 14241 14242 14243 14244 14245 14246 14247 14248 14249 14250 14251 14252 14253 14254 14255 14256 14257 14258 14259 14260 14261 14262 14263 14264 14265 14266 14267 14268 14269 14270 14271 14272 14273 14274 14275 14276 14277 14278 14279 14280 14281 14282 14283 14284 14285 14286 14287 14288 14289 14290 14291 14292 14293 14294 14295 14296 14297 14298 14299 14300 14301 14302 14303 14304 14305 14306 14307 14308 14309 14310 14311 14312 14313 14314 14315 14316 14317 14318 14319 14320 14321 14322 14323 14324 14325 14326 14327 14328 14329 14330 14331 14332 14333 14334 14335 14336 14337 14338 14339 14340 14341 14342 14343 14344 14345 14346 14347 14348 14349 14350 14351 14352 14353 14354 14355 14356 14357 14358 14359 14360 14361 14362 14363 14364 14365 14366 14367 14368 14369 14370 14371 14372 14373 14374 14375 14376 14377 14378 14379 14380 14381 14382 14383 14384 14385 14386 14387 14388 14389 14390 14391 14392 14393 14394 14395 14396 14397 14398 14399 14400 14401 14402 14403 14404 14405 14406 14407 14408 14409 14410 14411 14412 14413 14414 14415 14416 14417 14418 14419 14420 14421 14422 14423 14424 14425 14426 14427 14428 14429 14430 14431 14432 14433 14434 14435 14436 14437 14438 14439 14440 14441 14442 14443 14444 14445 14446 14447 14448 14449 14450 14451 14452 14453 14454 14455 14456 14457 14458 14459 14460 14461 14462 14463 14464 14465 14466 14467 14468 14469 14470 14471 14472 14473 14474 14475 14476 14477 14478 14479 14480 14481 14482 14483 14484 14485 14486 14487 14488 14489 14490 14491 14492 14493 14494 14495 14496 14497 14498 14499 14500 14501 14502 14503 14504 14505 14506 14507 14508 14509 14510 14511 14512 14513 
14514 14515 14516 14517 14518 14519 14520 14521 14522 14523 14524 14525 14526 14527 14528 14529 14530 14531 14532 14533 14534 14535 14536 14537 14538 14539 14540 14541 14542 14543 14544 14545 14546 14547 14548 14549 14550 14551 14552 14553 14554 14555 14556 14557 14558 14559 14560 14561 14562 14563 14564 14565 14566 14567 14568 14569 14570 14571 14572 14573 14574 14575 14576 14577 14578 14579 14580 14581 14582 14583 14584 14585 14586 14587 14588 14589 14590 14591 14592 14593 14594 14595 14596 14597 14598 14599 14600 14601 14602 14603 14604 14605 14606 14607 14608 14609 14610 14611 14612 14613 14614 14615 14616 14617 14618 14619 14620 14621 14622 14623 14624 14625 14626 14627 14628 14629 14630 14631 14632 14633 14634 14635 14636 14637 14638 14639 14640 14641 14642 14643 14644 14645 14646 14647 14648 14649 14650 14651 14652 14653 14654 14655 14656 14657 14658 14659 14660 14661 14662 14663 14664 14665 14666 14667 14668 14669 14670 14671 14672 14673 14674 14675 14676 14677 14678 14679 14680 14681 14682 14683 14684 14685 14686 14687 14688 14689 14690 14691 14692 14693 14694 14695 14696 14697 14698 14699 14700 14701 14702 14703 14704 14705 14706 14707 14708 14709 14710 14711 14712 14713 14714 14715 14716 14717 14718 14719 14720 14721 14722 14723 14724 14725 14726 14727 14728 14729 14730 14731 14732 14733 14734 14735 14736 14737 14738 14739 14740 14741 14742 14743 14744 14745 14746 14747 14748 14749 14750 14751 14752 14753 14754 14755 14756 14757 14758 14759 14760 14761 14762 14763 14764 14765 14766 14767 14768 14769 14770 14771 14772 14773 14774 14775 14776 14777 14778 14779 14780 14781 14782 14783 14784 14785 14786 14787 14788 14789 14790 14791 14792 14793 14794 14795 14796 14797 14798 14799 14800 14801 14802 14803 14804 14805 14806 14807 14808 14809 14810 14811 14812 14813 14814 14815 14816 14817 14818 14819 14820 14821 14822 14823 14824 14825 14826 14827 14828 14829 14830 14831 14832 14833 14834 14835 14836 14837 14838 14839 14840 14841 14842 14843 14844 14845 14846 
14847 14848 14849 14850 14851 14852 14853 14854 14855 14856 14857 14858 14859 14860 14861 14862 14863 14864 14865 14866 14867 14868 14869 14870 14871 14872 14873 14874 14875 14876 14877 14878 14879 14880 14881 14882 14883 14884 14885 14886 14887 14888 14889 14890 14891 14892 14893 14894 14895 14896 14897 14898 14899 14900 14901 14902 14903 14904 14905 14906 14907 14908 14909 14910 14911 14912 14913 14914 14915 14916 14917 14918 14919 14920 14921 14922 14923 14924 14925 14926 14927 14928 14929 14930 14931 14932 14933 14934 14935 14936 14937 14938 14939 14940 14941 14942 14943 14944 14945 14946 14947 14948 14949 14950 14951 14952 14953 14954 14955 14956 14957 14958 14959 14960 14961 14962 14963 14964 14965 14966 14967 14968 14969 14970 14971 14972 14973 14974 14975 14976 14977 14978 14979 14980 14981 14982 14983 14984 14985 14986 14987 14988 14989 14990 14991 14992 14993 14994 14995 14996 14997 14998 14999 15000 15001 15002 15003 15004 15005 15006 15007 15008 15009 15010 15011 15012 15013 15014 15015 15016 15017 15018 15019 15020 15021 15022 15023 15024 15025 15026 15027 15028 15029 15030 15031 15032 15033 15034 15035 15036 15037 15038 15039 15040 15041 15042 15043 15044 15045 15046 15047 15048 15049 15050 15051 15052 15053 15054 15055 15056 15057 15058 15059 15060 15061 15062 15063 15064 15065 15066 15067 15068 15069 15070 15071 15072 15073 15074 15075 15076 15077 15078 15079 15080 15081 15082 15083 15084 15085 15086 15087 15088 15089 15090 15091 15092 15093 15094 15095 15096 15097 15098 15099 15100 15101 15102 15103 15104 15105 15106 15107 15108 15109 15110 15111 15112 15113 15114 15115 15116 15117 15118 15119 15120 15121 15122 15123 15124 15125 15126 15127 15128 15129 15130 15131 15132 15133 15134 15135 15136 15137 15138 15139 15140 15141 15142 15143 15144 15145 15146 15147 15148 15149 15150 15151 15152 15153 15154 15155 15156 15157 15158 15159 15160 15161 15162 15163 15164 15165 15166 15167 15168 15169 15170 15171 15172 15173 15174 15175 15176 15177 15178 15179 
15180 15181 15182 15183 15184 15185 15186 15187 15188 15189 15190 15191 15192 15193 15194 15195 15196 15197 15198 15199 15200 15201 15202 15203 15204 15205 15206 15207 15208 15209 15210 15211 15212 15213 15214 15215 15216 15217 15218 15219 15220 15221 15222 15223 15224 15225 15226 15227 15228 15229 15230 15231 15232 15233 15234 15235 15236 15237 15238 15239 15240 15241 15242 15243 15244 15245 15246 15247 15248 15249 15250 15251 15252 15253 15254 15255 15256 15257 15258 15259 15260 15261 15262 15263 15264 15265 15266 15267 15268 15269 15270 15271 15272 15273 15274 15275 15276 15277 15278 15279 15280 15281 15282 15283 15284 15285 15286 15287 15288 15289 15290 15291 15292 15293 15294 15295 15296 15297 15298 15299 15300 15301 15302 15303 15304 15305 15306 15307 15308 15309 15310 15311 15312 15313 15314 15315 15316 15317 15318 15319 15320 15321 15322 15323 15324 15325 15326 15327 15328 15329 15330 15331 15332 15333 15334 15335 15336 15337 15338 15339 15340 15341 15342 15343 15344 15345 15346 15347 15348 15349 15350 15351 15352 15353 15354 15355 15356 15357 15358 15359 15360 15361 15362 15363 15364 15365 15366 15367 15368 15369 15370 15371 15372 15373 15374 15375 15376 15377 15378 15379 15380 15381 15382 15383 15384 15385 15386 15387 15388 15389 15390 15391 15392 15393 15394 15395 15396 15397 15398 15399 15400 15401 15402 15403 15404 15405 15406 15407 15408 15409 15410 15411 15412 15413 15414 15415 15416 15417 15418 15419 15420 15421 15422 15423 15424 15425 15426 15427 15428 15429 15430 15431 15432 15433 15434 15435 15436 15437 15438 15439 15440 15441 15442 15443 15444 15445 15446 15447 15448 15449 15450 15451 15452 15453 15454 15455 15456 15457 15458 15459 15460 15461 15462 15463 15464 15465 15466 15467 15468 15469 15470 15471 15472 15473 15474 15475 15476 15477 15478 15479 15480 15481 15482 15483 15484 15485 15486 15487 15488 15489 15490 15491 15492 15493 15494 15495 15496 15497 15498 15499 15500 15501 15502 15503 15504 15505 15506 15507 15508 15509 15510 15511 15512 
15513 15514 15515 15516 15517 15518 15519 15520 15521 15522 15523 15524 15525 15526 15527 15528 15529 15530 15531 15532 15533 15534 15535 15536 15537 15538 15539 15540 15541 15542 15543 15544 15545 15546 15547 15548 15549 15550 15551 15552 15553 15554 15555 15556 15557 15558 15559 15560 15561 15562 15563 15564 15565 15566 15567 15568 15569 15570 15571 15572 15573 15574 15575 15576 15577 15578 15579 15580 15581 15582 15583 15584 15585 15586 15587 15588 15589 15590 15591 15592 15593 15594 15595 15596 15597 15598 15599 15600 15601 15602 15603 15604 15605 15606 15607 15608 15609 15610 15611 15612 15613 15614 15615 15616 15617 15618 15619 15620 15621 15622 15623 15624 15625 15626 15627 15628 15629 15630 15631 15632 15633 15634 15635 15636 15637 15638 15639 15640 15641 15642 15643 15644 15645 15646 15647 15648 15649 15650 15651 15652 15653 15654 15655 15656 15657 15658 15659 15660 15661 15662 15663 15664 15665 15666 15667 15668 15669 15670 15671 15672 15673 15674 15675 15676 15677 15678 15679 15680 15681 15682 15683 15684 15685 15686 15687 15688 15689 15690 15691 15692 15693 15694 15695 15696 15697 15698 15699 15700 15701 15702 15703 15704 15705 15706 15707 15708 15709 15710 15711 15712 15713 15714 15715 15716 15717 15718 15719 15720 15721 15722 15723 15724 15725 15726 15727 15728 15729 15730 15731 15732 15733 15734 15735 15736 15737 15738 15739 15740 15741 15742 15743 15744 15745 15746 15747 15748 15749 15750 15751 15752 15753 15754 15755 15756 15757 15758 15759 15760 15761 15762 15763 15764 15765 15766 15767 15768 15769 15770 15771 15772 15773 15774 15775 15776 15777 15778 15779 15780 15781 15782 15783 15784 15785 15786 15787 15788 15789 15790 15791 15792 15793 15794 15795 15796 15797 15798 15799 15800 15801 15802 15803 15804 15805 15806 15807 15808 15809 15810 15811 15812 15813 15814 15815 15816 15817 15818 15819 15820 15821 15822 15823 15824 15825 15826 15827 15828 15829 15830 15831 15832 15833 15834 15835 15836 15837 15838 15839 15840 15841 15842 15843 15844 15845 
15846 15847 15848 15849 15850 15851 15852 15853 15854 15855 15856 15857 15858 15859 15860 15861 15862 15863 15864 15865 15866 15867 15868 15869 15870 15871 15872 15873 15874 15875 15876 15877 15878 15879 15880 15881 15882 15883 15884 15885 15886 15887 15888 15889 15890 15891 15892 15893 15894 15895 15896 15897 15898 15899 15900 15901 15902 15903 15904 15905 15906 15907 15908 15909 15910 15911 15912 15913 15914 15915 15916 15917 15918 15919 15920 15921 15922 15923 15924 15925 15926 15927 15928 15929 15930 15931 15932 15933 15934 15935 15936 15937 15938 15939 15940 15941 15942 15943 15944 15945 15946 15947 15948 15949 15950 15951 15952 15953 15954 15955 15956 15957 15958 15959 15960 15961 15962 15963 15964 15965 15966 15967 15968 15969 15970 15971 15972 15973 15974 15975 15976 15977 15978 15979 15980 15981 15982 15983 15984 15985 15986 15987 15988 15989 15990 15991 15992 15993 15994 15995 15996 15997 15998 15999 16000 16001 16002 16003 16004 16005 16006 16007 16008 16009 16010 16011 16012 16013 16014 16015 16016 16017 16018 16019 16020 16021 16022 16023 16024 16025 16026 16027 16028 16029 16030 16031 16032 16033 16034 16035 16036 16037 16038 16039 16040 16041 16042 16043 16044 16045 16046 16047 16048 16049 16050 16051 16052 16053 16054 16055 16056 16057 16058 16059 16060 16061 16062 16063 16064 16065 16066 16067 16068 16069 16070 16071 16072 16073 16074 16075 16076 16077 16078 16079 16080 16081 16082 16083 16084 16085 16086 16087 16088 16089 16090 16091 16092 16093 16094 16095 16096 16097 16098 16099 16100 16101 16102 16103 16104 16105 16106 16107 16108 16109 16110 16111 16112 16113 16114 16115 16116 16117 16118 16119 16120 16121 16122 16123 16124 16125 16126 16127 16128 16129 16130 16131 16132 16133 16134 16135 16136 16137 16138 16139 16140 16141 16142 16143 16144 16145 16146 16147 16148 16149 16150 16151 16152 16153 16154 16155 16156 16157 16158 16159 16160 16161 16162 16163 16164 16165 16166 16167 16168 16169 16170 16171 16172 16173 16174 16175 16176 16177 16178 
16179 16180 16181 16182 16183 16184 16185 16186 16187 16188 16189 16190 16191 16192 16193 16194 16195 16196 16197 16198 16199 16200 16201 16202 16203 16204 16205 16206 16207 16208 16209 16210 16211 16212 16213 16214 16215 16216 16217 16218 16219 16220 16221 16222 16223 16224 16225 16226 16227 16228 16229 16230 16231 16232 16233 16234 16235 16236 16237 16238 16239 16240 16241 16242 16243 16244 16245 16246 16247 16248 16249 16250 16251 16252 16253 16254 16255 16256 16257 16258 16259 16260 16261 16262 16263 16264 16265 16266 16267 16268 16269 16270 16271 16272 16273 16274 16275 16276 16277 16278 16279 16280 16281 16282 16283 16284 16285 16286 16287 16288 16289 16290 16291 16292 16293 16294 16295 16296 16297 16298 16299 16300 16301 16302 16303 16304 16305 16306 16307 16308 16309 16310 16311 16312 16313 16314 16315 16316 16317 16318 16319 16320 16321 16322 16323 16324 16325 16326 16327 16328 16329 16330 16331 16332 16333 16334 16335 16336 16337 16338 16339 16340 16341 16342 16343 16344 16345 16346 16347 16348 16349 16350 16351 16352 16353 16354 16355 16356 16357 16358 16359 16360 16361 16362 16363 16364 16365 16366 16367 16368 16369 16370 16371 16372 16373 16374 16375 16376 16377 16378 16379 16380 16381 16382 16383 16384 16385 16386 16387 16388 16389 16390 16391 16392 16393 16394 16395 16396 16397 16398 16399 16400 16401 16402 16403 16404 16405 16406 16407 16408 16409 16410 16411 16412 16413 16414 16415 16416 16417 16418 16419 16420 16421 16422 16423 16424 16425 16426 16427 16428 16429 16430 16431 16432 16433 16434 16435 16436 16437 16438 16439 16440 16441 16442 16443 16444 16445 16446 16447 16448 16449 16450 16451 16452 16453 16454 16455 16456 16457 16458 16459 16460 16461 16462 16463 16464 16465 16466 16467 16468 16469 16470 16471 16472 16473 16474 16475 16476 16477 16478 16479 16480 16481 16482 16483 16484 16485 16486 16487 16488 16489 16490 16491 16492 16493 16494 16495 16496 16497 16498 16499 16500 16501 16502 16503 16504 16505 16506 16507 16508 16509 16510 16511 
16512 16513 16514 16515 16516 16517 16518 16519 16520 16521 16522 16523 16524 16525 16526 16527 16528 16529 16530 16531 16532 16533 16534 16535 16536 16537 16538 16539 16540 16541 16542 16543 16544 16545 16546 16547 16548 16549 16550 16551 16552 16553 16554 16555 16556 16557 16558 16559 16560 16561 16562 16563 16564 16565 16566 16567 16568 16569 16570 16571 16572 16573 16574 16575 16576 16577 16578 16579 16580 16581 16582 16583 16584 16585 16586 16587 16588 16589 16590 16591 16592 16593 16594 16595 16596 16597 16598 16599 16600 16601 16602 16603 16604 16605 16606 16607 16608 16609 16610 16611 16612 16613 16614 16615 16616 16617 16618 16619 16620 16621 16622 16623 16624 16625 16626 16627 16628 16629 16630 16631 16632 16633 16634 16635 16636 16637 16638 16639 16640 16641 16642 16643 16644 16645 16646 16647 16648 16649 16650 16651 16652 16653 16654 16655 16656 16657 16658 16659 16660 16661 16662 16663 16664 16665 16666 16667 16668 16669 16670 16671 16672 16673 16674 16675 16676 16677 16678 16679 16680 16681 16682 16683 16684 16685 16686 16687 16688 16689 16690 16691 16692 16693 16694 16695 16696 16697 16698 16699 16700 16701 16702 16703 16704 16705 16706 16707 16708 16709 16710 16711 16712 16713 16714 16715 16716 16717 16718 16719 16720 16721 16722 16723 16724 16725 16726 16727 16728 16729 16730 16731 16732 16733 16734 16735 16736 16737 16738 16739 16740 16741 16742 16743 16744 16745 16746 16747 16748 16749 16750 16751 16752 16753 16754 16755 16756 16757 16758 16759 16760 16761 16762 16763 16764 16765 16766 16767 16768 16769 16770 16771 16772 16773 16774 16775 16776 16777 16778 16779 16780 16781 16782 16783 16784 16785 16786 16787 16788 16789 16790 16791 16792 16793 16794 16795 16796 16797 16798 16799 16800 16801 16802 16803 16804 16805 16806 16807 16808 16809 16810 16811 16812 16813 16814 16815 16816 16817 16818 16819 16820 16821 16822 16823 16824 16825 16826 16827 16828 16829 16830 16831 16832 16833 16834 16835 16836 16837 16838 16839 16840 16841 16842 16843 16844 
16845 16846 16847 16848 16849 16850 16851 16852 16853 16854 16855 16856 16857 16858 16859 16860 16861 16862 16863 16864 16865 16866 16867 16868 16869 16870 16871 16872 16873 16874 16875 16876 16877 16878 16879 16880 16881 16882 16883 16884 16885 16886 16887 16888 16889 16890 16891 16892 16893 16894 16895 16896 16897 16898 16899 16900 16901 16902 16903 16904 16905 16906 16907 16908 16909 16910 16911 16912 16913 16914 16915 16916 16917 16918 16919 16920 16921 16922 16923 16924 16925 16926 16927 16928 16929 16930 16931 16932 16933 16934 16935 16936 16937 16938 16939 16940 16941 16942 16943 16944 16945 16946 16947 16948 16949 16950 16951 16952 16953 16954 16955 16956 16957 16958 16959 16960 16961 16962 16963 16964 16965 16966 16967 16968 16969 16970 16971 16972 16973 16974 16975 16976 16977 16978 16979 16980 16981 16982 16983 16984 16985 16986 16987 16988 16989 16990 16991 16992 16993 16994 16995 16996 16997 16998 16999 17000 17001 17002 17003 17004 17005 17006 17007 17008 17009 17010 17011 17012 17013 17014 17015 17016 17017 17018 17019 17020 17021 17022 17023 17024 17025 17026 17027 17028 17029 17030 17031 17032 17033 17034 17035 17036 17037 17038 17039 17040 17041 17042 17043 17044 17045 17046 17047 17048 17049 17050 17051 17052 17053 17054 17055 17056 17057 17058 17059 17060 17061 17062 17063 17064 17065 17066 17067 17068 17069 17070 17071 17072 17073 17074 17075 17076 17077 17078 17079 17080 17081 17082 17083 17084 17085 17086 17087 17088 17089 17090 17091 17092 17093 17094 17095 17096 17097 17098 17099 17100 17101 17102 17103 17104 17105 17106 17107 17108 17109 17110 17111 17112 17113 17114 17115 17116 17117 17118 17119 17120 17121 17122 17123 17124 17125 17126 17127 17128 17129 17130 17131 17132 17133 17134 17135 17136 17137 17138 17139 17140 17141 17142 17143 17144 17145 17146 17147 17148 17149 17150 17151 17152 17153 17154 17155 17156 17157 17158 17159 17160 17161 17162 17163 17164 17165 17166 17167 17168 17169 17170 17171 17172 17173 17174 17175 17176 17177 
17178 17179 17180 17181 17182 17183 17184 17185 17186 17187 17188 17189 17190 17191 17192 17193 17194 17195 17196 17197 17198 17199 17200 17201 17202 17203 17204 17205 17206 17207 17208 17209 17210 17211 17212 17213 17214 17215 17216 17217 17218 17219 17220 17221 17222 17223 17224 17225 17226 17227 17228 17229 17230 17231 17232 17233 17234 17235 17236 17237 17238 17239 17240 17241 17242 17243 17244 17245 17246 17247 17248 17249 17250 17251 17252 17253 17254 17255 17256 17257 17258 17259 17260 17261 17262 17263 17264 17265 17266 17267 17268 17269 17270 17271 17272 17273 17274 17275 17276 17277 17278 17279 17280 17281 17282 17283 17284 17285 17286 17287 17288 17289 17290 17291 17292 17293 17294 17295 17296 17297 17298 17299 17300 17301 17302 17303 17304 17305 17306 17307 17308 17309 17310 17311 17312 17313 17314 17315 17316 17317 17318 17319 17320 17321 17322 17323 17324 17325 17326 17327 17328 17329 17330 17331 17332 17333 17334 17335 17336 17337 17338 17339 17340 17341 17342 17343 17344 17345 17346 17347 17348 17349 17350 17351 17352 17353 17354 17355 17356 17357 17358 17359 17360 17361 17362 17363 17364 17365 17366 17367 17368 17369 17370 17371 17372 17373 17374 17375 17376 17377 17378 17379 17380 17381 17382 17383 17384 17385 17386 17387 17388 17389 17390 17391 17392 17393 17394 17395 17396 17397 17398 17399 17400 17401 17402 17403 17404 17405 17406 17407 17408 17409 17410 17411 17412 17413 17414 17415 17416 17417 17418 17419 17420 17421 17422 17423 17424 17425 17426 17427 17428 17429 17430 17431 17432 17433 17434 17435 17436 17437 17438 17439 17440 17441 17442 17443 17444 17445 17446 17447 17448 17449 17450 17451 17452 17453 17454 17455 17456 17457 17458 17459 17460 17461 17462 17463 17464 17465 17466 17467 17468 17469 17470 17471 17472 17473 17474 17475 17476 17477 17478 17479 17480 17481 17482 17483 17484 17485 17486 17487 17488 17489 17490 17491 17492 17493 17494 17495 17496 17497 17498 17499 17500 17501 17502 17503 17504 17505 17506 17507 17508 17509 17510 
17511 17512 17513 17514 17515 17516 17517 17518 17519 17520 17521 17522 17523 17524 17525 17526 17527 17528 17529 17530 17531 17532 17533 17534 17535 17536 17537 17538 17539 17540 17541 17542 17543 17544 17545 17546 17547 17548 17549 17550 17551 17552 17553 17554 17555 17556 17557 17558 17559 17560 17561 17562 17563 17564 17565 17566 17567 17568 17569 17570 17571 17572 17573 17574 17575 17576 17577 17578 17579 17580 17581 17582 17583 17584 17585 17586 17587 17588 17589 17590 17591 17592 17593 17594 17595 17596 17597 17598 17599 17600 17601 17602 17603 17604 17605 17606 17607 17608 17609 17610 17611 17612 17613 17614 17615 17616 17617 17618 17619 17620 17621 17622 17623 17624 17625 17626 17627 17628 17629 17630 17631 17632 17633 17634 17635 17636 17637 17638 17639 17640 17641 17642 17643 17644 17645 17646 17647 17648 17649 17650 17651 17652 17653 17654 17655 17656 17657 17658 17659 17660 17661 17662 17663 17664 17665 17666 17667 17668 17669 17670 17671 17672 17673 17674 17675 17676 17677 17678 17679 17680 17681 17682 17683 17684 17685 17686 17687 17688 17689 17690 17691 17692 17693 17694 17695 17696 17697 17698 17699 17700 17701 17702 17703 17704 17705 17706 17707 17708 17709 17710 17711 17712 17713 17714 17715 17716 17717 17718 17719 17720 17721 17722 17723 17724 17725 17726 17727 17728 17729 17730 17731 17732 17733 17734 17735 17736 17737 17738 17739 17740 17741 17742 17743 17744 17745 17746 17747 17748 17749 17750 17751 17752 17753 17754 17755 17756 17757 17758 17759 17760 17761 17762 17763 17764 17765 17766 17767 17768 17769 17770 17771 17772 17773 17774 17775 17776 17777 17778 17779 17780 17781 17782 17783 17784 17785 17786 17787 17788 17789 17790 17791 17792 17793 17794 17795 17796 17797 17798 17799 17800 17801 17802 17803 17804 17805 17806 17807 17808 17809 17810 17811 17812 17813 17814 17815 17816 17817 17818 17819 17820 17821 17822 17823 17824 17825 17826 17827 17828 17829 17830 17831 17832 17833 17834 17835 17836 17837 17838 17839 17840 17841 17842 17843 
17844 17845 17846 17847 17848 17849 17850 17851 17852 17853 17854 17855 17856 17857 17858 17859 17860 17861 17862 17863 17864 17865 17866 17867 17868 17869 17870 17871 17872 17873 17874 17875 17876 17877 17878 17879 17880 17881 17882 17883 17884 17885 17886 17887 17888 17889 17890 17891 17892 17893 17894 17895 17896 17897 17898 17899 17900 17901 17902 17903 17904 17905 17906 17907 17908 17909 17910 17911 17912 17913 17914 17915 17916 17917 17918 17919 17920 17921 17922 17923 17924 17925 17926 17927 17928 17929 17930 17931 17932 17933 17934 17935 17936 17937 17938 17939 17940 17941 17942 17943 17944 17945 17946 17947 17948 17949 17950 17951 17952 17953 17954 17955 17956 17957 17958 17959 17960 17961 17962 17963 17964 17965 17966 17967 17968 17969 17970 17971 17972 17973 17974 17975 17976 17977 17978 17979 17980 17981 17982 17983 17984 17985 17986 17987 17988 17989 17990 17991 17992 17993 17994 17995 17996 17997 17998 17999 18000 18001 18002 18003 18004 18005 18006 18007 18008 18009 18010 18011 18012 18013 18014 18015 18016 18017 18018 18019 18020 18021 18022 18023 18024 18025 18026 18027 18028 18029 18030 18031 18032 18033 18034 18035 18036 18037 18038 18039 18040 18041 18042 18043 18044 18045 18046 18047 18048 18049 18050 18051 18052 18053 18054 18055 18056 18057 18058 18059 18060 18061 18062 18063 18064 18065 18066 18067 18068 18069 18070 18071 18072 18073 18074 18075 18076 18077 18078 18079 18080 18081 18082 18083 18084 18085 18086 18087 18088 18089 18090 18091 18092 18093 18094 18095 18096 18097 18098 18099 18100 18101 18102 18103 18104 18105 18106 18107 18108 18109 18110 18111 18112 18113 18114 18115 18116 18117 18118 18119 18120 18121 18122 18123 18124 18125 18126 18127 18128 18129 18130 18131 18132 18133 18134 18135 18136 18137 18138 18139 18140 18141 18142 18143 18144 18145 18146 18147 18148 18149 18150 18151 18152 18153 18154 18155 18156 18157 18158 18159 18160 18161 18162 18163 18164 18165 18166 18167 18168 18169 18170 18171 18172 18173 18174 18175 18176 
18177 18178 18179 18180 18181 18182 18183 18184 18185 18186 18187 18188 18189 18190 18191 18192 18193 18194 18195 18196 18197 18198 18199 18200 18201 18202 18203 18204 18205 18206 18207 18208 18209 18210 18211 18212 18213 18214 18215 18216 18217 18218 18219 18220 18221 18222 18223 18224 18225 18226 18227 18228 18229 18230 18231 18232 18233 18234 18235 18236 18237 18238 18239 18240 18241 18242 18243 18244 18245 18246 18247 18248 18249 18250 18251 18252 18253 18254 18255 18256 18257 18258 18259 18260 18261 18262 18263 18264 18265 18266 18267 18268 18269 18270 18271 18272 18273 18274 18275 18276 18277 18278 18279 18280 18281 18282 18283 18284 18285 18286 18287 18288 18289 18290 18291 18292 18293 18294 18295 18296 18297 18298 18299 18300 18301 18302 18303 18304 18305 18306 18307 18308 18309 18310 18311 18312 18313 18314 18315 18316 18317 18318 18319 18320 18321 18322 18323 18324 18325 18326 18327 18328 18329 18330 18331 18332 18333 18334 18335 18336 18337 18338 18339 18340 18341 18342 18343 18344 18345 18346 18347 18348 18349 18350 18351 18352 18353 18354 18355 18356 18357 18358 18359 18360 18361 18362 18363 18364 18365 18366 18367 18368 18369 18370 18371 18372 18373 18374 18375 18376 18377 18378 18379 18380 18381 18382 18383 18384 18385 18386 18387 18388 18389 18390 18391 18392 18393 18394 18395 18396 18397 18398 18399 18400 18401 18402 18403 18404 18405 18406 18407 18408 18409 18410 18411 18412 18413 18414 18415 18416 18417 18418 18419 18420 18421 18422 18423 18424 18425 18426 18427 18428 18429 18430 18431 18432 18433 18434 18435 18436 18437 18438 18439 18440 18441 18442 18443 18444 18445 18446 18447 18448 18449 18450 18451 18452 18453 18454 18455 18456 18457 18458 18459 18460 18461 18462 18463 18464 18465 18466 18467 18468 18469 18470 18471 18472 18473 18474 18475 18476 18477 18478 18479 18480 18481 18482 18483 18484 18485 18486 18487 18488 18489 18490 18491 18492 18493 18494 18495 18496 18497 18498 18499 18500 18501 18502 18503 18504 18505 18506 18507 18508 18509 
18510 18511 18512 18513 18514 18515 18516 18517 18518 18519 18520 18521 18522 18523 18524 18525 18526 18527 18528 18529 18530 18531 18532 18533 18534 18535 18536 18537 18538 18539 18540 18541 18542 18543 18544 18545 18546 18547 18548 18549 18550 18551 18552 18553 18554 18555 18556 18557 18558 18559 18560 18561 18562 18563 18564 18565 18566 18567 18568 18569 18570 18571 18572 18573 18574 18575 18576 18577 18578 18579 18580 18581 18582 18583 18584 18585 18586 18587 18588 18589 18590 18591 18592 18593 18594 18595 18596 18597 18598 18599 18600 18601 18602 18603 18604 18605 18606 18607 18608 18609 18610 18611 18612 18613 18614 18615 18616 18617 18618 18619 18620 18621 18622 18623 18624 18625 18626 18627 18628 18629 18630 18631 18632 18633 18634 18635 18636 18637 18638 18639 18640 18641 18642 18643 18644 18645 18646 18647 18648 18649 18650 18651 18652 18653 18654 18655 18656 18657 18658 18659 18660 18661 18662 18663 18664 18665 18666 18667 18668 18669 18670 18671 18672 18673 18674 18675 18676 18677 18678 18679 18680 18681 18682 18683 18684 18685 18686 18687 18688 18689 18690 18691 18692 18693 18694 18695 18696 18697 18698 18699 18700 18701 18702 18703 18704 18705 18706 18707 18708 18709 18710 18711 18712 18713 18714 18715 18716 18717 18718 18719 18720 18721 18722 18723 18724 18725 18726 18727 18728 18729 18730 18731 18732 18733 18734 18735 18736 18737 18738 18739 18740 18741 18742 18743 18744 18745 18746 18747 18748 18749 18750 18751 18752 18753 18754 18755 18756 18757 18758 18759 18760 18761 18762 18763 18764 18765 18766 18767 18768 18769 18770 18771 18772 18773 18774 18775 18776 18777 18778 18779 18780 18781 18782 18783 18784 18785 18786 18787 18788 18789 18790 18791 18792 18793 18794 18795 18796 18797 18798 18799 18800 18801 18802 18803 18804 18805 18806 18807 18808 18809 18810 18811 18812 18813 18814 18815 18816 18817 18818 18819 18820 18821 18822 18823 18824 18825 18826 18827 18828 18829 18830 18831 18832 18833 18834 18835 18836 18837 18838 18839 18840 18841 18842 
18843 18844 18845 18846 18847 18848 18849 18850 18851 18852 18853 18854 18855 18856 18857 18858 18859 18860 18861 18862 18863 18864 18865 18866 18867 18868 18869 18870 18871 18872 18873 18874 18875 18876 18877 18878 18879 18880 18881 18882 18883 18884 18885 18886 18887 18888 18889 18890 18891 18892 18893 18894 18895 18896 18897 18898 18899 18900 18901 18902 18903 18904 18905 18906 18907 18908 18909 18910 18911 18912 18913 18914 18915 18916 18917 18918 18919 18920 18921 18922 18923 18924 18925 18926 18927 18928 18929 18930 18931 18932 18933 18934 18935 18936 18937 18938 18939 18940 18941 18942 18943 18944 18945 18946 18947 18948 18949 18950 18951 18952 18953 18954 18955 18956 18957 18958 18959 18960 18961 18962 18963 18964 18965 18966 18967 18968 18969 18970 18971 18972 18973 18974 18975 18976 18977 18978 18979 18980 18981 18982 18983 18984 18985 18986 18987 18988 18989 18990 18991 18992 18993 18994 18995 18996 18997 18998 18999 19000 19001 19002 19003 19004 19005 19006 19007 19008 19009 19010 19011 19012 19013 19014 19015 19016 19017 19018 19019 19020 19021 19022 19023 19024 19025 19026 19027 19028 19029 19030 19031 19032 19033 19034 19035 19036 19037 19038 19039 19040 19041 19042 19043 19044 19045 19046 19047 19048 19049 19050 19051 19052 19053 19054 19055 19056 19057 19058 19059 19060 19061 19062 19063 19064 19065 19066 19067 19068 19069 19070 19071 19072 19073 19074 19075 19076 19077 19078 19079 19080 19081 19082 19083 19084 19085 19086 19087 19088 19089 19090 19091 19092 19093 19094 19095 19096 19097 19098 19099 19100 19101 19102 19103 19104 19105 19106 19107 19108 19109 19110 19111 19112 19113 19114 19115 19116 19117 19118 19119 19120 19121 19122 19123 19124 19125 19126 19127 19128 19129 19130 19131 19132 19133 19134 19135 19136 19137 19138 19139 19140 19141 19142 19143 19144 19145 19146 19147 19148 19149 19150 19151 19152 19153 19154 19155 19156 19157 19158 19159 19160 19161 19162 19163 19164 19165 19166 19167 19168 19169 19170 19171 19172 19173 19174 19175 
19176 19177 19178 19179 19180 19181 19182 19183 19184 19185 19186 19187 19188 19189 19190 19191 19192 19193 19194 19195 19196 19197 19198 19199 19200 19201 19202 19203 19204 19205 19206 19207 19208 19209 19210 19211 19212 19213 19214 19215 19216 19217 19218 19219 19220 19221 19222 19223 19224 19225 19226 19227 19228 19229 19230 19231 19232 19233 19234 19235 19236 19237 19238 19239 19240 19241 19242 19243 19244 19245 19246 19247 19248 19249 19250 19251 19252 19253 19254 19255 19256 19257 19258 19259 19260 19261 19262 19263 19264 19265 19266 19267 19268 19269 19270 19271 19272 19273 19274 19275 19276 19277 19278 19279 19280 19281 19282 19283 19284 19285 19286 19287 19288 19289 19290 19291 19292 19293 19294 19295 19296 19297 19298 19299 19300 19301 19302 19303 19304 19305 19306 19307 19308 19309 19310 19311 19312 19313 19314 19315 19316 19317 19318 19319 19320 19321 19322 19323 19324 19325 19326 19327 19328 19329 19330 19331 19332 19333 19334 19335 19336 19337 19338 19339 19340 19341 19342 19343 19344 19345 19346 19347 19348 19349 19350 19351 19352 19353 19354 19355 19356 19357 19358 19359 19360 19361 19362 19363 19364 19365 19366 19367 19368 19369 19370 19371 19372 19373 19374 19375 19376 19377 19378 19379 19380 19381 19382 19383 19384 19385 19386 19387 19388 19389 19390 19391 19392 19393 19394 19395 19396 19397 19398 19399 19400 19401 19402 19403 19404 19405 19406 19407 19408 19409 19410 19411 19412 19413 19414 19415 19416 19417 19418 19419 19420 19421 19422 19423 19424 19425 19426 19427 19428 19429 19430 19431 19432 19433 19434 19435 19436 19437 19438 19439 19440 19441 19442 19443 19444 19445 19446 19447 19448 19449 19450 19451 19452 19453 19454 19455 19456 19457 19458 19459 19460 19461 19462 19463 19464 19465 19466 19467 19468 19469 19470 19471 19472 19473 19474 19475 19476 19477 19478 19479 19480 19481 19482 19483 19484 19485 19486 19487 19488 19489 19490 19491 19492 19493 19494 19495 19496 19497 19498 19499 19500 19501 19502 19503 19504 19505 19506 19507 19508 
19509 19510 19511 19512 19513 19514 19515 19516 19517 19518 19519 19520 19521 19522 19523 19524 19525 19526 19527 19528 19529 19530 19531 19532 19533 19534 19535 19536 19537 19538 19539 19540 19541 19542 19543 19544 19545 19546 19547 19548 19549 19550 19551 19552 19553 19554 19555 19556 19557 19558 19559 19560 19561 19562 19563 19564 19565 19566 19567 19568 19569 19570 19571 19572 19573 19574 19575 19576 19577 19578 19579 19580 19581 19582 19583 19584 19585 19586 19587 19588 19589 19590 19591 19592 19593 19594 19595 19596 19597 19598 19599 19600 19601 19602 19603 19604 19605 19606 19607 19608 19609 19610 19611 19612 19613 19614 19615 19616 19617 19618 19619 19620 19621 19622 19623 19624 19625 19626 19627 19628 19629 19630 19631 19632 19633 19634 19635 19636 19637 19638 19639 19640 19641 19642 19643 19644 19645 19646 19647 19648 19649 19650 19651 19652 19653 19654 19655 19656 19657 19658 19659 19660 19661 19662 19663 19664 19665 19666 19667 19668 19669 19670 19671 19672 19673 19674 19675 19676 19677 19678 19679 19680 19681 19682 19683 19684 19685 19686 19687 19688 19689 19690 19691 19692 19693 19694 19695 19696 19697 19698 19699 19700 19701 19702 19703 19704 19705 19706 19707 19708 19709 19710 19711 19712 19713 19714 19715 19716 19717 19718 19719 19720 19721 19722 19723 19724 19725 19726 19727 19728 19729 19730 19731 19732 19733 19734 19735 19736 19737 19738 19739 19740 19741 19742 19743 19744 19745 19746 19747 19748 19749 19750 19751 19752 19753 19754 19755 19756 19757 19758 19759 19760 19761 19762 19763 19764 19765 19766 19767 19768 19769 19770 19771 19772 19773 19774 19775 19776 19777 19778 19779 19780 19781 19782 19783 19784 19785 19786 19787 19788 19789 19790 19791 19792 19793 19794 19795 19796 19797 19798 19799 19800 19801 19802 19803 19804 19805 19806 19807 19808 19809 19810 19811 19812 19813 19814 19815 19816 19817 19818 19819 19820 19821 19822 19823 19824 19825 19826 19827 19828 19829 19830 19831 19832 19833 19834 19835 19836 19837 19838 19839 19840 19841 
19842 19843 19844 19845 19846 19847 19848 19849 19850 19851 19852 19853 19854 19855 19856 19857 19858 19859 19860 19861 19862 19863 19864 19865 19866 19867 19868 19869 19870 19871 19872 19873 19874 19875 19876 19877 19878 19879 19880 19881 19882 19883 19884 19885 19886 19887 19888 19889 19890 19891 19892 19893 19894 19895 19896 19897 19898 19899 19900 19901 19902 19903 19904 19905 19906 19907 19908 19909 19910 19911 19912 19913 19914 19915 19916 19917 19918 19919 19920 19921 19922 19923 19924 19925 19926 19927 19928 19929 19930 19931 19932 19933 19934 19935 19936 19937 19938 19939 19940 19941 19942 19943 19944 19945 19946 19947 19948 19949 19950 19951 19952 19953 19954 19955 19956 19957 19958 19959 19960 19961 19962 19963 19964 19965 19966 19967 19968 19969 19970 19971 19972 19973 19974 19975 19976 19977 19978 19979 19980 19981 19982 19983 19984 19985 19986 19987 19988 19989 19990 19991 19992 19993 19994 19995 19996 19997 19998 19999 20000 20001 20002 20003 20004 20005 20006 20007 20008 20009 20010 20011 20012 20013 20014 20015 20016 20017 20018 20019 20020 20021 20022 20023 20024 20025 20026 20027 20028 20029 20030 20031 20032 20033 20034 20035 20036 20037 20038 20039 20040 20041 20042 20043 20044 20045 20046 20047 20048 20049 20050 20051 20052 20053 20054 20055 20056 20057 20058 20059 20060 20061 20062 20063 20064 20065 20066 20067 20068 20069 20070 20071 20072 20073 20074 20075 20076 20077 20078 20079 20080 20081 20082 20083 20084 20085 20086 20087 20088 20089 20090 20091 20092 20093 20094 20095 20096 20097 20098 20099 20100 20101 20102 20103 20104 20105 20106 20107 20108 20109 20110 20111 20112 20113 20114 20115 20116 20117 20118 20119 20120 20121 20122 20123 20124 20125 20126 20127 20128 20129 20130 20131 20132 20133 20134 20135 20136 20137 20138 20139 20140 20141 20142 20143 20144 20145 20146 20147 20148 20149 20150 20151 20152 20153 20154 20155 20156 20157 20158 20159 20160 20161 20162 20163 20164 20165 20166 20167 20168 20169 20170 20171 20172 20173 20174 
20175 20176 20177 20178 20179 20180 20181 20182 20183 20184 20185 20186 20187 20188 20189 20190 20191 20192 20193 20194 20195 20196 20197 20198 20199 20200 20201 20202 20203 20204 20205 20206 20207 20208 20209 20210 20211 20212 20213 20214 20215 20216 20217 20218 20219 20220 20221 20222 20223 20224 20225 20226 20227 20228 20229 20230 20231 20232 20233 20234 20235 20236 20237 20238 20239 20240 20241 20242 20243 20244 20245 20246 20247 20248 20249 20250 20251 20252 20253 20254 20255 20256 20257 20258 20259 20260 20261 20262 20263 20264 20265 20266 20267 20268 20269 20270 20271 20272 20273 20274 20275 20276 20277 20278 20279 20280 20281 20282 20283 20284 20285 20286 20287 20288 20289 20290 20291 20292 20293 20294 20295 20296 20297 20298 20299 20300 20301 20302 20303 20304 20305 20306 20307 20308 20309 20310 20311 20312 20313 20314 20315 20316 20317 20318 20319 20320 20321 20322 20323 20324 20325 20326 20327 20328 20329 20330 20331 20332 20333 20334 20335 20336 20337 20338 20339 20340 20341 20342 20343 20344 20345 20346 20347 20348 20349 20350 20351 20352 20353 20354 20355 20356 20357 20358 20359 20360 20361 20362 20363 20364 20365 20366 20367 20368 20369 20370 20371 20372 20373 20374 20375 20376 20377 20378 20379 20380 20381 20382 20383 20384 20385 20386 20387 20388 20389 20390 20391 20392 20393 20394 20395 20396 20397 20398 20399 20400 20401 20402 20403 20404 20405 20406 20407 20408 20409 20410 20411 20412 20413 20414 20415 20416 20417 20418 20419 20420 20421 20422 20423 20424 20425 20426 20427 20428 20429 20430 20431 20432 20433 20434 20435 20436 20437 20438 20439 20440 20441 20442 20443 20444 20445 20446 20447 20448 20449 20450 20451 20452 20453 20454 20455 20456 20457 20458 20459 20460 20461 20462 20463 20464 20465 20466 20467 20468 20469 20470 20471 20472 20473 20474 20475 20476 20477 20478 20479 20480 20481 20482 20483 20484 20485 20486 20487 20488 20489 20490 20491 20492 20493 20494 20495 20496 20497 20498 20499 20500 20501 20502 20503 20504 20505 20506 20507 
20508 20509 20510 20511 20512 20513 20514 20515 20516 20517 20518 20519 20520 20521 20522 20523 20524 20525 20526 20527 20528 20529 20530 20531 20532 20533 20534 20535 20536 20537 20538 20539 20540 20541 20542 20543 20544 20545 20546 20547 20548 20549 20550 20551 20552 20553 20554 20555 20556 20557 20558 20559 20560 20561 20562 20563 20564 20565 20566 20567 20568 20569 20570 20571 20572 20573 20574 20575 20576 20577 20578 20579 20580 20581 20582 20583 20584 20585 20586 20587 20588 20589 20590 20591 20592 20593 20594 20595 20596 20597 20598 20599 20600 20601 20602 20603 20604 20605 20606 20607 20608 20609 20610 20611 20612 20613 20614 20615 20616 20617 20618 20619 20620 20621 20622 20623 20624 20625 20626 20627 20628 20629 20630 20631 20632 20633 20634 20635 20636 20637 20638 20639 20640 20641 20642 20643 20644 20645 20646 20647 20648 20649 20650 20651 20652 20653 20654 20655 20656 20657 20658 20659 20660 20661 20662 20663 20664 20665 20666 20667 20668 20669 20670 20671 20672 20673 20674 20675 20676 20677 20678 20679 20680 20681 20682 20683 20684 20685 20686 20687 20688 20689 20690 20691 20692 20693 20694 20695 20696 20697 20698 20699 20700 20701 20702 20703 20704 20705 20706 20707 20708 20709 20710 20711 20712 20713 20714 20715 20716 20717 20718 20719 20720 20721 20722 20723 20724 20725 20726 20727 20728 20729 20730 20731 20732 20733 20734 20735 20736 20737 20738 20739 20740 20741 20742 20743 20744 20745 20746 20747 20748 20749 20750 20751 20752 20753 20754 20755 20756 20757 20758 20759 20760 20761 20762 20763 20764 20765 20766 20767 20768 20769 20770 20771 20772 20773 20774 20775 20776 20777 20778 20779 20780 20781 20782 20783 20784 20785 20786 20787 20788 20789 20790 20791 20792 20793 20794 20795 20796 20797 20798 20799 20800 20801 20802 20803 20804 20805 20806 20807 20808 20809 20810 20811 20812 20813 20814 20815 20816 20817 20818 20819 20820 20821 20822 20823 20824 20825 20826 20827 20828 20829 20830 20831 20832 20833 20834 20835 20836 20837 20838 20839 20840 
20841 20842 20843 20844 20845 20846 20847 20848 20849 20850 20851 20852 20853 20854 20855 20856 20857 20858 20859 20860 20861 20862 20863 20864 20865 20866 20867 20868 20869 20870 20871 20872 20873 20874 20875 20876 20877 20878 20879 20880 20881 20882 20883 20884 20885 20886 20887 20888 20889 20890 20891 20892 20893 20894 20895 20896 20897 20898 20899 20900 20901 20902 20903 20904 20905 20906 20907 20908 20909 20910 20911 20912 20913 20914 20915 20916 20917 20918 20919 20920 20921 20922 20923 20924 20925 20926 20927 20928 20929 20930 20931 20932 20933 20934 20935 20936 20937 20938 20939 20940 20941 20942 20943 20944 20945 20946 20947 20948 20949 20950 20951 20952 20953 20954 20955 20956 20957 20958 20959 20960 20961 20962 20963 20964 20965 20966 20967 20968 20969 20970 20971 20972 20973 20974 20975 20976 20977 20978 20979 20980 20981 20982 20983 20984 20985 20986 20987 20988 20989 20990 20991 20992 20993 20994 20995 20996 20997 20998 20999 21000 21001 21002 21003 21004 21005 21006 21007 21008 21009 21010 21011 21012 21013 21014 21015 21016 21017 21018 21019 21020 21021 21022 21023 21024 21025 21026 21027 21028 21029 21030 21031 21032 21033 21034 21035 21036 21037 21038 21039 21040 21041 21042 21043 21044 21045 21046 21047 21048 21049 21050 21051 21052 21053 21054 21055 21056 21057 21058 21059 21060 21061 21062 21063 21064 21065 21066 21067 21068 21069 21070 21071 21072 21073 21074 21075 21076 21077 21078 21079 21080 21081 21082 21083 21084 21085 21086 21087 21088 21089 21090 21091 21092 21093 21094 21095 21096 21097 21098 21099 21100 21101 21102 21103 21104 21105 21106 21107 21108 21109 21110 21111 21112 21113 21114 21115 21116 21117 21118 21119 21120 21121 21122 21123 21124 21125 21126 21127 21128 21129 21130 21131 21132 21133 21134 21135 21136 21137 21138 21139 21140 21141 21142 21143 21144 21145 21146 21147 21148 21149 21150 21151 21152 21153 21154 21155 21156 21157 21158 21159 21160 21161 21162 21163 21164 21165 21166 21167 21168 21169 21170 21171 21172 21173 
21174 21175 21176 21177 21178 21179 21180 21181 21182 21183 21184 21185 21186 21187 21188 21189 21190 21191 21192 21193 21194 21195 21196 21197 21198 21199 21200 21201 21202 21203 21204 21205 21206 21207 21208 21209 21210 21211 21212 21213 21214 21215 21216 21217 21218 21219 21220 21221 21222 21223 21224 21225 21226 21227 21228 21229 21230 21231 21232 21233 21234 21235 21236 21237 21238 21239 21240 21241 21242 21243 21244 21245 21246 21247 21248 21249 21250 21251 21252 21253 21254 21255 21256 21257 21258 21259 21260 21261 21262 21263 21264 21265 21266 21267 21268 21269 21270 21271 21272 21273 21274 21275 21276 21277 21278 21279 21280 21281 21282 21283 21284 21285 21286 21287 21288 21289 21290 21291 21292 21293 21294 21295 21296 21297 21298 21299 21300 21301 21302 21303 21304 21305 21306 21307 21308 21309 21310 21311 21312 21313 21314 21315 21316 21317 21318 21319 21320 21321 21322 21323 21324 21325 21326 21327 21328 21329 21330 21331 21332 21333 21334 21335 21336 21337 21338 21339 21340 21341 21342 21343 21344 21345 21346 21347 21348 21349 21350 21351 21352 21353 21354 21355 21356 21357 21358 21359 21360 21361 21362 21363 21364 21365 21366 21367 21368 21369 21370 21371 21372 21373 21374 21375 21376 21377 21378 21379 21380 21381 21382 21383 21384 21385 21386 21387 21388 21389 21390 21391 21392 21393 21394 21395 21396 21397 21398 21399 21400 21401 21402 21403 21404 21405 21406 21407 21408 21409 21410 21411 21412 21413 21414 21415 21416 21417 21418 21419 21420 21421 21422 21423 21424 21425 21426 21427 21428 21429 21430 21431 21432 21433 21434 21435 21436 21437 21438 21439 21440 21441 21442 21443 21444 21445 21446 21447 21448 21449 21450 21451 21452 21453 21454 21455 21456 21457 21458 21459 21460 21461 21462 21463 21464 21465 21466 21467 21468 21469 21470 21471 21472 21473 21474 21475 21476 21477 21478 21479 21480 21481 21482 21483 21484 21485 21486 21487 21488 21489 21490 21491 21492 21493 21494 21495 21496 21497 21498 21499 21500 21501 21502 21503 21504 21505 21506 
21507 21508 21509 21510 21511 21512 21513 21514 21515 21516 21517 21518 21519 21520 21521 21522 21523 21524 21525 21526 21527 21528 21529 21530 21531 21532 21533 21534 21535 21536 21537 21538 21539 21540 21541 21542 21543 21544 21545 21546 21547 21548 21549 21550 21551 21552 21553 21554 21555 21556 21557 21558 21559 21560 21561 21562 21563 21564 21565 21566 21567 21568 21569 21570 21571 21572 21573 21574 21575 21576 21577 21578 21579 21580 21581 21582 21583 21584 21585 21586 21587 21588 21589 21590 21591 21592 21593 21594 21595 21596 21597 21598 21599 21600 21601 21602 21603 21604 21605 21606 21607 21608 21609 21610 21611 21612 21613 21614 21615 21616 21617 21618 21619 21620 21621 21622 21623 21624 21625 21626 21627 21628 21629 21630 21631 21632 21633 21634 21635 21636 21637 21638 21639 21640 21641 21642 21643 21644 21645 21646 21647 21648 21649 21650 21651 21652 21653 21654 21655 21656 21657 21658 21659 21660 21661 21662 21663 21664 21665 21666 21667 21668 21669 21670 21671 21672 21673 21674 21675 21676 21677 21678 21679 21680 21681 21682 21683 21684 21685 21686 21687 21688 21689 21690 21691 21692 21693 21694 21695 21696 21697 21698 21699 21700 21701 21702 21703 21704 21705 21706 21707 21708 21709 21710 21711 21712 21713 21714 21715 21716 21717 21718 21719 21720 21721 21722 21723 21724 21725 21726 21727 21728 21729 21730 21731 21732 21733 21734 21735 21736 21737 21738 21739 21740 21741 21742 21743 21744 21745 21746 21747 21748 21749 21750 21751 21752 21753 21754 21755 21756 21757 21758 21759 21760 21761 21762 21763 21764 21765 21766 21767 21768 21769 21770 21771 21772 21773 21774 21775 21776 21777 21778 21779 21780 21781 21782 21783 21784 21785 21786 21787 21788 21789 21790 21791 21792 21793 21794 21795 21796 21797 21798 21799 21800 21801 21802 21803 21804 21805 21806 21807 21808 21809 21810 21811 21812 21813 21814 21815 21816 21817 21818 21819 21820 21821 21822 21823 21824 21825 21826 21827 21828 21829 21830 21831 21832 21833 21834 21835 21836 21837 21838 21839 
21840 21841 21842 21843 21844 21845 21846 21847 21848 21849 21850 21851 21852 21853 21854 21855 21856 21857 21858 21859 21860 21861 21862 21863 21864 21865 21866 21867 21868 21869 21870 21871 21872 21873 21874 21875 21876 21877 21878 21879 21880 21881 21882 21883 21884 21885 21886 21887 21888 21889 21890 21891 21892 21893 21894 21895 21896 21897 21898 21899 21900 21901 21902 21903 21904 21905 21906 21907 21908 21909 21910 21911 21912 21913 21914 21915 21916 21917 21918 21919 21920 21921 21922 21923 21924 21925 21926 21927 21928 21929 21930 21931 21932 21933 21934 21935 21936 21937 21938 21939 21940 21941 21942 21943 21944 21945 21946 21947 21948 21949 21950 21951 21952 21953 21954 21955 21956 21957 21958 21959 21960 21961 21962 21963 21964 21965 21966 21967 21968 21969 21970 21971 21972 21973 21974 21975 21976 21977 21978 21979 21980 21981 21982 21983 21984 21985 21986 21987 21988 21989 21990 21991 21992 21993 21994 21995 21996 21997 21998 21999 22000 22001 22002 22003 22004 22005 22006 22007 22008 22009 22010 22011 22012 22013 22014 22015 22016 22017 22018 22019 22020 22021 22022 22023 22024 22025 22026 22027 22028 22029 22030 22031 22032 22033 22034 22035 22036 22037 22038 22039 22040 22041 22042 22043 22044 22045 22046 22047 22048 22049 22050 22051 22052 22053 22054 22055 22056 22057 22058 22059 22060 22061 22062 22063 22064 22065 22066 22067 22068 22069 22070 22071 22072 22073 22074 22075 22076 22077 22078 22079 22080 22081 22082 22083 22084 22085 22086 22087 22088 22089 22090 22091 22092 22093 22094 22095 22096 22097 22098 22099 22100 22101 22102 22103 22104 22105 22106 22107 22108 22109 22110 22111 22112 22113 22114 22115 22116 22117 22118 22119 22120 22121 22122 22123 22124 22125 22126 22127 22128 22129 22130 22131 22132 22133 22134 22135 22136 22137 22138 22139 22140 22141 22142 22143 22144 22145 22146 22147 22148 22149 22150 22151 22152 22153 22154 22155 22156 22157 22158 22159 22160 22161 22162 22163 22164 22165 22166 22167 22168 22169 22170 22171 22172 
22173 22174 22175 22176 22177 22178 22179 22180 22181 22182 22183 22184 22185 22186 22187 22188 22189 22190 22191 22192 22193 22194 22195 22196 22197 22198 22199 22200 22201 22202 22203 22204 22205 22206 22207 22208 22209 22210 22211 22212 22213 22214 22215 22216 22217 22218 22219 22220 22221 22222 22223 22224 22225 22226 22227 22228 22229 22230 22231 22232 22233 22234 22235 22236 22237 22238 22239 22240 22241 22242 22243 22244 22245 22246 22247 22248 22249 22250 22251 22252 22253 22254 22255 22256 22257 22258 22259 22260 22261 22262 22263 22264 22265 22266 22267 22268 22269 22270 22271 22272 22273 22274 22275 22276 22277 22278 22279 22280 22281 22282 22283 22284 22285 22286 22287 22288 22289 22290 22291 22292 22293 22294 22295 22296 22297 22298 22299 22300 22301 22302 22303 22304 22305 22306 22307 22308 22309 22310 22311 22312 22313 22314 22315 22316 22317 22318 22319 22320 22321 22322 22323 22324 22325 22326 22327 22328 22329 22330 22331 22332 22333 22334 22335 22336 22337 22338 22339 22340 22341 22342 22343 22344 22345 22346 22347 22348 22349 22350 22351 22352 22353 22354 22355 22356 22357 22358 22359 22360 22361 22362 22363 22364 22365 22366 22367 22368 22369 22370 22371 22372 22373 22374 22375 22376 22377 22378 22379 22380 22381 22382 22383 22384 22385 22386 22387 22388 22389 22390 22391 22392 22393 22394 22395 22396 22397 22398 22399 22400 22401 22402 22403 22404 22405 22406 22407 22408 22409 22410 22411 22412 22413 22414 22415 22416 22417 22418 22419 22420 22421 22422 22423 22424 22425 22426 22427 22428 22429 22430 22431 22432 22433 22434 22435 22436 22437 22438 22439 22440 22441 22442 22443 22444 22445 22446 22447 22448 22449 22450 22451 22452 22453 22454 22455 22456 22457 22458 22459 22460 22461 22462 22463 22464 22465 22466 22467 22468 22469 22470 22471 22472 22473 22474 22475 22476 22477 22478 22479 22480 22481 22482 22483 22484 22485 22486 22487 22488 22489 22490 22491 22492 22493 22494 22495 22496 22497 22498 22499 22500 22501 22502 22503 22504 22505 
22506 22507 22508 22509 22510 22511 22512 22513 22514 22515 22516 22517 22518 22519 22520 22521 22522 22523 22524 22525 22526 22527 22528 22529 22530 22531 22532 22533 22534 22535 22536 22537 22538 22539 22540 22541 22542 22543 22544 22545 22546 22547 22548 22549 22550 22551 22552 22553 22554 22555 22556 22557 22558 22559 22560 22561 22562 22563 22564 22565 22566 22567 22568 22569 22570 22571 22572 22573 22574 22575 22576 22577 22578 22579 22580 22581 22582 22583 22584 22585 22586 22587 22588 22589 22590 22591 22592 22593 22594 22595 22596 22597 22598 22599 22600 22601 22602 22603 22604 22605 22606 22607 22608 22609 22610 22611 22612 22613 22614 22615 22616 22617 22618 22619 22620 22621 22622 22623 22624 22625 22626 22627 22628 22629 22630 22631 22632 22633 22634 22635 22636 22637 22638 22639 22640 22641 22642 22643 22644 22645 22646 22647 22648 22649 22650 22651 22652 22653 22654 22655 22656 22657 22658 22659 22660 22661 22662 22663 22664 22665 22666 22667 22668 22669 22670 22671 22672 22673 22674 22675 22676 22677 22678 22679 22680 22681 22682 22683 22684 22685 22686 22687 22688 22689 22690 22691 22692 22693 22694 22695 22696 22697 22698 22699 22700 22701 22702 22703 22704 22705 22706 22707 22708 22709 22710 22711 22712 22713 22714 22715 22716 22717 22718 22719 22720 22721 22722 22723 22724 22725 22726 22727 22728 22729 22730 22731 22732 22733 22734 22735 22736 22737 22738 22739 22740 22741 22742 22743 22744 22745 22746 22747 22748 22749 22750 22751 22752 22753 22754 22755 22756 22757 22758 22759 22760 22761 22762 22763 22764 22765 22766 22767 22768 22769 22770 22771 22772 22773 22774 22775 22776 22777 22778 22779 22780 22781 22782 22783 22784 22785 22786 22787 22788 22789 22790 22791 22792 22793 22794 22795 22796 22797 22798 22799 22800 22801 22802 22803 22804 22805 22806 22807 22808 22809 22810 22811 22812 22813 22814 22815 22816 22817 22818 22819 22820 22821 22822 22823 22824 22825 22826 22827 22828 22829 22830 22831 22832 22833 22834 22835 22836 22837 22838 
22839 22840 22841 22842 22843 22844 22845 22846 22847 22848 22849 22850 22851 22852 22853 22854 22855 22856 22857 22858 22859 22860 22861 22862 22863 22864 22865 22866 22867 22868 22869 22870 22871 22872 22873 22874 22875 22876 22877 22878 22879 22880 22881 22882 22883 22884 22885 22886 22887 22888 22889 22890 22891 22892 22893 22894 22895 22896 22897 22898 22899 22900 22901 22902 22903 22904 22905 22906 22907 22908 22909 22910 22911 22912 22913 22914 22915 22916 22917 22918 22919 22920 22921 22922 22923 22924 22925 22926 22927 22928 22929 22930 22931 22932 22933 22934 22935 22936 22937 22938 22939 22940 22941 22942 22943 22944 22945 22946 22947 22948 22949 22950 22951 22952 22953 22954 22955 22956 22957 22958 22959 22960 22961 22962 22963 22964 22965 22966 22967 22968 22969 22970 22971 22972 22973 22974 22975 22976 22977 22978 22979 22980 22981 22982 22983 22984 22985 22986 22987 22988 22989 22990 22991 22992 22993 22994 22995 22996 22997 22998 22999 23000 23001 23002 23003 23004 23005 23006 23007 23008 23009 23010 23011 23012 23013 23014 23015 23016 23017 23018 23019 23020 23021 23022 23023 23024 23025 23026 23027 23028 23029 23030 23031 23032 23033 23034 23035 23036 23037 23038 23039 23040 23041 23042 23043 23044 23045 23046 23047 23048 23049 23050 23051 23052 23053 23054 23055 23056 23057 23058 23059 23060 23061 23062 23063 23064 23065 23066 23067 23068 23069 23070 23071 23072 23073 23074 23075 23076 23077 23078 23079 23080 23081 23082 23083 23084 23085 23086 23087 23088 23089 23090 23091 23092 23093 23094 23095 23096 23097 23098 23099 23100 23101 23102 23103 23104 23105 23106 23107 23108 23109 23110 23111 23112 23113 23114 23115 23116 23117 23118 23119 23120 23121 23122 23123 23124 23125 23126 23127 23128 23129 23130 23131 23132 23133 23134 23135 23136 23137 23138 23139 23140 23141 23142 23143 23144 23145 23146 23147 23148 23149 23150 23151 23152 23153 23154 23155 23156 23157 23158 23159 23160 23161 23162 23163 23164 23165 23166 23167 23168 23169 23170 23171 
23172 23173 23174 23175 23176 23177 23178 23179 23180 23181 23182 23183 23184 23185 23186 23187 23188 23189 23190 23191 23192 23193 23194 23195 23196 23197 23198 23199 23200 23201 23202 23203 23204 23205 23206 23207 23208 23209 23210 23211 23212 23213 23214 23215 23216 23217 23218 23219 23220 23221 23222 23223 23224 23225 23226 23227 23228 23229 23230 23231 23232 23233 23234 23235 23236 23237 23238 23239 23240 23241 23242 23243 23244 23245 23246 23247 23248 23249 23250 23251 23252 23253 23254 23255 23256 23257 23258 23259 23260 23261 23262 23263 23264 23265 23266 23267 23268 23269 23270 23271 23272 23273 23274 23275 23276 23277 23278 23279 23280 23281 23282 23283 23284 23285 23286 23287 23288 23289 23290 23291 23292 23293 23294 23295 23296 23297 23298 23299 23300 23301 23302 23303 23304 23305 23306 23307 23308 23309 23310 23311 23312 23313 23314 23315 23316 23317 23318 23319 23320 23321 23322 23323 23324 23325 23326 23327 23328 23329 23330 23331 23332 23333 23334 23335 23336 23337 23338 23339 23340 23341 23342 23343 23344 23345 23346 23347 23348 23349 23350 23351 23352 23353 23354 23355 23356 23357 23358 23359 23360 23361 23362 23363 23364 23365 23366 23367 23368 23369 23370 23371 23372 23373 23374 23375 23376 23377 23378 23379 23380 23381 23382 23383 23384 23385 23386 23387 23388 23389 23390 23391 23392 23393 23394 23395 23396 23397 23398 23399 23400 23401 23402 23403 23404 23405 23406 23407 23408 23409 23410 23411 23412 23413 23414 23415 23416 23417 23418 23419 23420 23421 23422 23423 23424 23425 23426 23427 23428 23429 23430 23431 23432 23433 23434 23435 23436 23437 23438 23439 23440 23441 23442 23443 23444 23445 23446 23447 23448 23449 23450 23451 23452 23453 23454 23455 23456 23457 23458 23459 23460 23461 23462 23463 23464 23465 23466 23467 23468 23469 23470 23471 23472 23473 23474 23475 23476 23477 23478 23479 23480 23481 23482 23483 23484 23485 23486 23487 23488 23489 23490 23491 23492 23493 23494 23495 23496 23497 23498 23499 23500 23501 23502 23503 23504 
23505 23506 23507 23508 23509 23510 23511 23512 23513 23514 23515 23516 23517 23518 23519 23520 23521 23522 23523 23524 23525 23526 23527 23528 23529 23530 23531 23532 23533 23534 23535 23536 23537 23538 23539 23540 23541 23542 23543 23544 23545 23546 23547 23548 23549 23550 23551 23552 23553 23554 23555 23556 23557 23558 23559 23560 23561 23562 23563 23564 23565 23566 23567 23568 23569 23570 23571 23572 23573 23574 23575 23576 23577 23578 23579 23580 23581 23582 23583 23584 23585 23586 23587 23588 23589 23590 23591 23592 23593 23594 23595 23596 23597 23598 23599 23600 23601 23602 23603 23604 23605 23606 23607 23608 23609 23610 23611 23612 23613 23614 23615 23616 23617 23618 23619 23620 23621 23622 23623 23624 23625 23626 23627 23628 23629 23630 23631 23632 23633 23634 23635 23636 23637 23638 23639 23640 23641 23642 23643 23644 23645 23646 23647 23648 23649 23650 23651 23652 23653 23654 23655 23656 23657 23658 23659 23660 23661 23662 23663 23664 23665 23666 23667 23668 23669 23670 23671 23672 23673 23674 23675 23676 23677 23678 23679 23680 23681 23682 23683 23684 23685 23686 23687 23688 23689 23690 23691 23692 23693 23694 23695 23696 23697 23698 23699 23700 23701 23702 23703 23704 23705 23706 23707 23708 23709 23710 23711 23712 23713 23714 23715 23716 23717 23718 23719 23720 23721 23722 23723 23724 23725 23726 23727 23728 23729 23730 23731 23732 23733 23734 23735 23736 23737 23738 23739 23740 23741 23742 23743 23744 23745 23746 23747 23748 23749 23750 23751 23752 23753 23754 23755 23756 23757 23758 23759 23760 23761 23762 23763 23764 23765 23766 23767 23768 23769 23770 23771 23772 23773 23774 23775 23776 23777 23778 23779 23780 23781 23782 23783 23784 23785 23786 23787 23788 23789 23790 23791 23792 23793 23794 23795 23796 23797 23798 23799 23800 23801 23802 23803 23804 23805 23806 23807 23808 23809 23810 23811 23812 23813 23814 23815 23816 23817 23818 23819 23820 23821 23822 23823 23824 23825 23826 23827 23828 23829 23830 23831 23832 23833 23834 23835 23836 23837 
23838 23839 23840 23841 23842 23843 23844 23845 23846 23847 23848 23849 23850 23851 23852 23853 23854 23855 23856 23857 23858 23859 23860 23861 23862 23863 23864 23865 23866 23867 23868 23869 23870 23871 23872 23873 23874 23875 23876 23877 23878 23879 23880 23881 23882 23883 23884 23885 23886 23887 23888 23889 23890 23891 23892 23893 23894 23895 23896 23897 23898 23899 23900 23901 23902 23903 23904 23905 23906 23907 23908 23909 23910 23911 23912 23913 23914 23915 23916 23917 23918 23919 23920 23921 23922 23923 23924 23925 23926 23927 23928 23929 23930 23931 23932 23933 23934 23935 23936 23937 23938 23939 23940 23941 23942 23943 23944 23945 23946 23947 23948 23949 23950 23951 23952 23953 23954 23955 23956 23957 23958 23959 23960 23961 23962 23963 23964 23965 23966 23967 23968 23969 23970 23971 23972 23973 23974 23975 23976 23977 23978 23979 23980 23981 23982 23983 23984 23985 23986 23987 23988 23989 23990 23991 23992 23993 23994 23995 23996 23997 23998 23999 24000 24001 24002 24003 24004 24005 24006 24007 24008 24009 24010 24011 24012 24013 24014 24015 24016 24017 24018 24019 24020 24021 24022 24023 24024 24025 24026 24027 24028 24029 24030 24031 24032 24033 24034 24035 24036 24037 24038 24039 24040 24041 24042 24043 24044 24045 24046 24047 24048 24049 24050 24051 24052 24053 24054 24055 24056 24057 24058 24059 24060 24061 24062 24063 24064 24065 24066 24067 24068 24069 24070 24071 24072 24073 24074 24075 24076 24077 24078 24079 24080 24081 24082 24083 24084 24085 24086 24087 24088 24089 24090 24091 24092 24093 24094 24095 24096 24097 24098 24099 24100 24101 24102 24103 24104 24105 24106 24107 24108 24109 24110 24111 24112 24113 24114 24115 24116 24117 24118 24119 24120 24121 24122 24123 24124 24125 24126 24127 24128 24129 24130 24131 24132 24133 24134 24135 24136 24137 24138 24139 24140 24141 24142 24143 24144 24145 24146 24147 24148 24149 24150 24151 24152 24153 24154 24155 24156 24157 24158 24159 24160 24161 24162 24163 24164 24165 24166 24167 24168 24169 24170 
24171 24172 24173 24174 24175 24176 24177 24178 24179 24180 24181 24182 24183 24184 24185 24186 24187 24188 24189 24190 24191 24192 24193 24194 24195 24196 24197 24198 24199 24200 24201 24202 24203 24204 24205 24206 24207 24208 24209 24210 24211 24212 24213 24214 24215 24216 24217 24218 24219 24220 24221 24222 24223 24224 24225 24226 24227 24228 24229 24230 24231 24232 24233 24234 24235 24236 24237 24238 24239 24240 24241 24242 24243 24244 24245 24246 24247 24248 24249 24250 24251 24252 24253 24254 24255 24256 24257 24258 24259 24260 24261 24262 24263 24264 24265 24266 24267 24268 24269 24270 24271 24272 24273 24274 24275 24276 24277 24278 24279 24280 24281 24282 24283 24284 24285 24286 24287 24288 24289 24290 24291 24292 24293 24294 24295 24296 24297 24298 24299 24300 24301 24302 24303 24304 24305 24306 24307 24308 24309 24310 24311 24312 24313 24314 24315 24316 24317 24318 24319 24320 24321 24322 24323 24324 24325 24326 24327 24328 24329 24330 24331 24332 24333 24334 24335 24336 24337 24338 24339 24340 24341 24342 24343 24344 24345 24346 24347 24348 24349 24350 24351 24352 24353 24354 24355 24356 24357 24358 24359 24360 24361 24362 24363 24364 24365 24366 24367 24368 24369 24370 24371 24372 24373 24374 24375 24376 24377 24378 24379 24380 24381 24382 24383 24384 24385 24386 24387 24388 24389 24390 24391 24392 24393 24394 24395 24396 24397 24398 24399 24400 24401 24402 24403 24404 24405 24406 24407 24408 24409 24410 24411 24412 24413 24414 24415 24416 24417 24418 24419 24420 24421 24422 24423 24424 24425 24426 24427 24428 24429 24430 24431 24432 24433 24434 24435 24436 24437 24438 24439 24440 24441 24442 24443 24444 24445 24446 24447 24448 24449 24450 24451 24452 24453 24454 24455 24456 24457 24458 24459 24460 24461 24462 24463 24464 24465 24466 24467 24468 24469 24470 24471 24472 24473 24474 24475 24476 24477 24478 24479 24480 24481 24482 24483 24484 24485 24486 24487 24488 24489 24490 24491 24492 24493 24494 24495 24496 24497 24498 24499 24500 24501 24502 24503 
24504 24505 24506 24507 24508 24509 24510 24511 24512 24513 24514 24515 24516 24517 24518 24519 24520 24521 24522 24523 24524 24525 24526 24527 24528 24529 24530 24531 24532 24533 24534 24535 24536 24537 24538 24539 24540 24541 24542 24543 24544 24545 24546 24547 24548 24549 24550 24551 24552 24553 24554 24555 24556 24557 24558 24559 24560 24561 24562 24563 24564 24565 24566 24567 24568 24569 24570 24571 24572 24573 24574 24575 24576 24577 24578 24579 24580 24581 24582 24583 24584 24585 24586 24587 24588 24589 24590 24591 24592 24593 24594 24595 24596 24597 24598 24599 24600 24601 24602 24603 24604 24605 24606 24607 24608 24609 24610 24611 24612 24613 24614 24615 24616 24617 24618 24619 24620 24621 24622 24623 24624 24625 24626 24627 24628 24629 24630 24631 24632 24633 24634 24635 24636 24637 24638 24639 24640 24641 24642 24643 24644 24645 24646 24647 24648 24649 24650 24651 24652 24653 24654 24655 24656 24657 24658 24659 24660 24661 24662 24663 24664 24665 24666 24667 24668 24669 24670 24671 24672 24673 24674 24675 24676 24677 24678 24679 24680 24681 24682 24683 24684 24685 24686 24687 24688 24689 24690 24691 24692 24693 24694 24695 24696 24697 24698 24699 24700 24701 24702 24703 24704 24705 24706 24707 24708 24709 24710 24711 24712 24713 24714 24715 24716 24717 24718 24719 24720 24721 24722 24723 24724 24725 24726 24727 24728 24729 24730 24731 24732 24733 24734 24735 24736 24737 24738 24739 24740 24741 24742 24743 24744 24745 24746 24747 24748 24749 24750 24751 24752 24753 24754 24755 24756 24757 24758 24759 24760 24761 24762 24763 24764 24765 24766 24767 24768 24769 24770 24771 24772 24773 24774 24775 24776 24777 24778 24779 24780 24781 24782 24783 24784 24785 24786 24787 24788 24789 24790 24791 24792 24793 24794 24795 24796 24797 24798 24799 24800 24801 24802 24803 24804 24805 24806 24807 24808 24809 24810 24811 24812 24813 24814 24815 24816 24817 24818 24819 24820 24821 24822 24823 24824 24825 24826 24827 24828 24829 24830 24831 24832 24833 24834 24835 24836 
24837 24838 24839 24840 24841 24842 24843 24844 24845 24846 24847 24848 24849 24850 24851 24852 24853 24854 24855 24856 24857 24858 24859 24860 24861 24862 24863 24864 24865 24866 24867 24868 24869 24870 24871 24872 24873 24874 24875 24876 24877 24878 24879 24880 24881 24882 24883 24884 24885 24886 24887 24888 24889 24890 24891 24892 24893 24894 24895 24896 24897 24898 24899 24900 24901 24902 24903 24904 24905 24906 24907 24908 24909 24910 24911 24912 24913 24914 24915 24916 24917 24918 24919 24920 24921 24922 24923 24924 24925 24926 24927 24928 24929 24930 24931 24932 24933 24934 24935 24936 24937 24938 24939 24940 24941 24942 24943 24944 24945 24946 24947 24948 24949 24950 24951 24952 24953 24954 24955 24956 24957 24958 24959 24960 24961 24962 24963 24964 24965 24966 24967 24968 24969 24970 24971 24972 24973 24974 24975 24976 24977 24978 24979 24980 24981 24982 24983 24984 24985 24986 24987 24988 24989 24990 24991 24992 24993 24994 24995 24996 24997 24998 24999 25000 25001 25002 25003 25004 25005 25006 25007 25008 25009 25010 25011 25012 25013 25014 25015 25016 25017 25018 25019 25020 25021 25022 25023 25024 25025 25026 25027 25028 25029 25030 25031 25032 25033 25034 25035 25036 25037 25038 25039 25040 25041 25042 25043 25044 25045 25046 25047 25048 25049 25050 25051 25052 25053 25054 25055 25056 25057 25058 25059 25060 25061 25062 25063 25064 25065 25066 25067 25068 25069 25070 25071 25072 25073 25074 25075 25076 25077 25078 25079 25080 25081 25082 25083 25084 25085 25086 25087 25088 25089 25090 25091 25092 25093 25094 25095 25096 25097 25098 25099 25100 25101 25102 25103 25104 25105 25106 25107 25108 25109 25110 25111 25112 25113 25114 25115 25116 25117 25118 25119 25120 25121 25122 25123 25124 25125 25126 25127 25128 25129 25130 25131 25132 25133 25134 25135 25136 25137 25138 25139 25140 25141 25142 25143 25144 25145 25146 25147 25148 25149 25150 25151 25152 25153 25154 25155 25156 25157 25158 25159 25160 25161 25162 25163 25164 25165 25166 25167 25168 25169 
25170 25171 25172 25173 25174 25175 25176 25177 25178 25179 25180 25181 25182 25183 25184 25185 25186 25187 25188 25189 25190 25191 25192 25193 25194 25195 25196 25197 25198 25199 25200 25201 25202 25203 25204 25205 25206 25207 25208 25209 25210 25211 25212 25213 25214 25215 25216 25217 25218 25219 25220 25221 25222 25223 25224 25225 25226 25227 25228 25229 25230 25231 25232 25233 25234 25235 25236 25237 25238 25239 25240 25241 25242 25243 25244 25245 25246 25247 25248 25249 25250 25251 25252 25253 25254 25255 25256 25257 25258 25259 25260 25261 25262 25263 25264 25265 25266 25267 25268 25269 25270 25271 25272 25273 25274 25275 25276 25277 25278 25279 25280 25281 25282 25283 25284 25285 25286 25287 25288 25289 25290 25291 25292 25293 25294 25295 25296 25297 25298 25299 25300 25301 25302 25303 25304 25305 25306 25307 25308 25309 25310 25311 25312 25313 25314 25315 25316 25317 25318 25319 25320 25321 25322 25323 25324 25325 25326 25327 25328 25329 25330 25331 25332 25333 25334 25335 25336 25337 25338 25339 25340 25341 25342 25343 25344 25345 25346 25347 25348 25349 25350 25351 25352 25353 25354 25355 25356 25357 25358 25359 25360 25361 25362 25363 25364 25365 25366 25367 25368 25369 25370 25371 25372 25373 25374 25375 25376 25377 25378 25379 25380 25381 25382 25383 25384 25385 25386 25387 25388 25389 25390 25391 25392 25393 25394 25395 25396 25397 25398 25399 25400 25401 25402 25403 25404 25405 25406 25407 25408 25409 25410 25411 25412 25413 25414 25415 25416 25417 25418 25419 25420 25421 25422 25423 25424 25425 25426 25427 25428 25429 25430 25431 25432 25433 25434 25435 25436 25437 25438 25439 25440 25441 25442 25443 25444 25445 25446 25447 25448 25449 25450 25451 25452 25453 25454 25455 25456 25457 25458 25459 25460 25461 25462 25463 25464 25465 25466 25467 25468 25469 25470 25471 25472 25473 25474 25475 25476 25477 25478 25479 25480 25481 25482 25483 25484 25485 25486 25487 25488 25489 25490 25491 25492 25493 25494 25495 25496 25497 25498 25499 25500 25501 25502 
25503 25504 25505 25506 25507 25508 25509 25510 25511 25512 25513 25514 25515 25516 25517 25518 25519 25520 25521 25522 25523 25524 25525 25526 25527 25528 25529 25530 25531 25532 25533 25534 25535 25536 25537 25538 25539 25540 25541 25542 25543 25544 25545 25546 25547 25548 25549 25550 25551 25552 25553 25554 25555 25556 25557 25558 25559 25560 25561 25562 25563 25564 25565 25566 25567 25568 25569 25570 25571 25572 25573 25574 25575 25576 25577 25578 25579 25580 25581 25582 25583 25584 25585 25586 25587 25588 25589 25590 25591 25592 25593 25594 25595 25596 25597 25598 25599 25600 25601 25602 25603 25604 25605 25606 25607 25608 25609 25610 25611 25612 25613 25614 25615 25616 25617 25618 25619 25620 25621 25622 25623 25624 25625 25626 25627 25628 25629 25630 25631 25632 25633 25634 25635 25636 25637 25638 25639 25640 25641 25642 25643 25644 25645 25646 25647 25648 25649 25650 25651 25652 25653 25654 25655 25656 25657 25658 25659 25660 25661 25662 25663 25664 25665 25666 25667 25668 25669 25670 25671 25672 25673 25674 25675 25676 25677 25678 25679 25680 25681 25682 25683 25684 25685 25686 25687 25688 25689 25690 25691 25692 25693 25694 25695 25696 25697 25698 25699 25700 25701 25702 25703 25704 25705 25706 25707 25708 25709 25710 25711 25712 25713 25714 25715 25716 25717 25718 25719 25720 25721 25722 25723 25724 25725 25726 25727 25728 25729 25730 25731 25732 25733 25734 25735 25736 25737 25738 25739 25740 25741 25742 25743 25744 25745 25746 25747 25748 25749 25750 25751 25752 25753 25754 25755 25756 25757 25758 25759 25760 25761 25762 25763 25764 25765 25766 25767 25768 25769 25770 25771 25772 25773 25774 25775 25776 25777 25778 25779 25780 25781 25782 25783 25784 25785 25786 25787 25788 25789 25790 25791 25792 25793 25794 25795 25796 25797 25798 25799 25800 25801 25802 25803 25804 25805 25806 25807 25808 25809 25810 25811 25812 25813 25814 25815 25816 25817 25818 25819 25820 25821 25822 25823 25824 25825 25826 25827 25828 25829 25830 25831 25832 25833 25834 25835 
25836 25837 25838 25839 25840 25841 25842 25843 25844 25845 25846 25847 25848 25849 25850 25851 25852 25853 25854 25855 25856 25857 25858 25859 25860 25861 25862 25863 25864 25865 25866 25867 25868 25869 25870 25871 25872 25873 25874 25875 25876 25877 25878 25879 25880 25881 25882 25883 25884 25885 25886 25887 25888 25889 25890 25891 25892 25893 25894 25895 25896 25897 25898 25899 25900 25901 25902 25903 25904 25905 25906 25907 25908 25909 25910 25911 25912 25913 25914 25915 25916 25917 25918 25919 25920 25921 25922 25923 25924 25925 25926 25927 25928 25929 25930 25931 25932 25933 25934 25935 25936 25937 25938 25939 25940 25941 25942 25943 25944 25945 25946 25947 25948 25949 25950 25951 25952 25953 25954 25955 25956 25957 25958 25959 25960 25961 25962 25963 25964 25965 25966 25967 25968 25969 25970 25971 25972 25973 25974 25975 25976 25977 25978 25979 25980 25981 25982 25983 25984 25985 25986 25987 25988 25989 25990 25991 25992 25993 25994 25995 25996 25997 25998 25999 26000 26001 26002 26003 26004 26005 26006 26007 26008 26009 26010 26011 26012 26013 26014 26015 26016 26017 26018 26019 26020 26021 26022 26023 26024 26025 26026 26027 26028 26029 26030 26031 26032 26033 26034 26035 26036 26037 26038 26039 26040 26041 26042 26043 26044 26045 26046 26047 26048 26049 26050 26051 26052 26053 26054 26055 26056 26057 26058 26059 26060 26061 26062 26063 26064 26065 26066 26067 26068 26069 26070 26071 26072 26073 26074 26075 26076 26077 26078 26079 26080 26081 26082 26083 26084 26085 26086 26087 26088 26089 26090 26091 26092 26093 26094 26095 26096 26097 26098 26099 26100 26101 26102 26103 26104 26105 26106 26107 26108 26109 26110 26111 26112 26113 26114 26115 26116 26117 26118 26119 26120 26121 26122 26123 26124 26125 26126 26127 26128 26129 26130 26131 26132 26133 26134 26135 26136 26137 26138 26139 26140 26141 26142 26143 26144 26145 26146 26147 26148 26149 26150 26151 26152 26153 26154 26155 26156 26157 26158 26159 26160 26161 26162 26163 26164 26165 26166 26167 26168 
26169 26170 26171 26172 26173 26174 26175 26176 26177 26178 26179 26180 26181 26182 26183 26184 26185 26186 26187 26188 26189 26190 26191 26192 26193 26194 26195 26196 26197 26198 26199 26200 26201 26202 26203 26204 26205 26206 26207 26208 26209 26210 26211 26212 26213 26214 26215 26216 26217 26218 26219 26220 26221 26222 26223 26224 26225 26226 26227 26228 26229 26230 26231 26232 26233 26234 26235 26236 26237 26238 26239 26240 26241 26242 26243 26244 26245 26246 26247 26248 26249 26250 26251 26252 26253 26254 26255 26256 26257 26258 26259 26260 26261 26262 26263 26264 26265 26266 26267 26268 26269 26270 26271 26272 26273 26274 26275 26276 26277 26278 26279 26280 26281 26282 26283 26284 26285 26286 26287 26288 26289 26290 26291 26292 26293 26294 26295 26296 26297 26298 26299 26300 26301 26302 26303 26304 26305 26306 26307 26308 26309 26310 26311 26312 26313 26314 26315 26316 26317 26318 26319 26320 26321 26322 26323 26324 26325 26326 26327 26328 26329 26330 26331 26332 26333 26334 26335 26336 26337 26338 26339 26340 26341 26342 26343 26344 26345 26346 26347 26348 26349 26350 26351 26352 26353 26354 26355 26356 26357 26358 26359 26360 26361 26362 26363 26364 26365 26366 26367 26368 26369 26370 26371 26372 26373 26374 26375 26376 26377 26378 26379 26380 26381 26382 26383 26384 26385 26386 26387 26388 26389 26390 26391 26392 26393 26394 26395 26396 26397 26398 26399 26400 26401 26402 26403 26404 26405 26406 26407 26408 26409 26410 26411 26412 26413 26414 26415 26416 26417 26418 26419 26420 26421 26422 26423 26424 26425 26426 26427 26428 26429 26430 26431 26432 26433 26434 26435 26436 26437 26438 26439 26440 26441 26442 26443 26444 26445 26446 26447 26448 26449 26450 26451 26452 26453 26454 26455 26456 26457 26458 26459 26460 26461 26462 26463 26464 26465 26466 26467 26468 26469 26470 26471 26472 26473 26474 26475 26476 26477 26478 26479 26480 26481 26482 26483 26484 26485 26486 26487 26488 26489 26490 26491 26492 26493 26494 26495 26496 26497 26498 26499 26500 26501 
26502 26503 26504 26505 26506 26507 26508 26509 26510 26511 26512 26513 26514 26515 26516 26517 26518 26519 26520 26521 26522 26523 26524 26525 26526 26527 26528 26529 26530 26531 26532 26533 26534 26535 26536 26537 26538 26539 26540 26541 26542 26543 26544 26545 26546 26547 26548 26549 26550 26551 26552 26553 26554 26555 26556 26557 26558 26559 26560 26561 26562 26563 26564 26565 26566 26567 26568 26569 26570 26571 26572 26573 26574 26575 26576 26577 26578 26579 26580 26581 26582 26583 26584 26585 26586 26587 26588 26589 26590 26591 26592 26593 26594 26595 26596 26597 26598 26599 26600 26601 26602 26603 26604 26605 26606 26607 26608 26609 26610 26611 26612 26613 26614 26615 26616 26617 26618 26619 26620 26621 26622 26623 26624 26625 26626 26627 26628 26629 26630 26631 26632 26633 26634 26635 26636 26637 26638 26639 26640 26641 26642 26643 26644 26645 26646 26647 26648 26649 26650 26651 26652 26653 26654 26655 26656 26657 26658 26659 26660 26661 26662 26663 26664 26665 26666 26667 26668 26669 26670 26671 26672 26673 26674 26675 26676 26677 26678 26679 26680 26681 26682 26683 26684 26685 26686 26687 26688 26689 26690 26691 26692 26693 26694 26695 26696 26697 26698 26699 26700 26701 26702 26703 26704 26705 26706 26707 26708 26709 26710 26711 26712 26713 26714 26715 26716 26717 26718 26719 26720 26721 26722 26723 26724 26725 26726 26727 26728 26729 26730 26731 26732 26733 26734 26735 26736 26737 26738 26739 26740 26741 26742 26743 26744 26745 26746 26747 26748 26749 26750 26751 26752 26753 26754 26755 26756 26757 26758 26759 26760 26761 26762 26763 26764 26765 26766 26767 26768 26769 26770 26771 26772 26773 26774 26775 26776 26777 26778 26779 26780 26781 26782 26783 26784 26785 26786 26787 26788 26789 26790 26791 26792 26793 26794 26795 26796 26797 26798 26799 26800 26801 26802 26803 26804 26805 26806 26807 26808 26809 26810 26811 26812 26813 26814 26815 26816 26817 26818 26819 26820 26821 26822 26823 26824 26825 26826 26827 26828 26829 26830 26831 26832 26833 26834 
26835 26836 26837 26838 26839 26840 26841 26842 26843 26844 26845 26846 26847 26848 26849 26850 26851 26852 26853 26854 26855 26856 26857 26858 26859 26860 26861 26862 26863 26864 26865 26866 26867 26868 26869 26870 26871 26872 26873 26874 26875 26876 26877 26878 26879 26880 26881 26882 26883 26884 26885 26886 26887 26888 26889 26890 26891 26892 26893 26894 26895 26896 26897 26898 26899 26900 26901 26902 26903 26904 26905 26906 26907 26908 26909 26910 26911 26912 26913 26914 26915 26916 26917 26918 26919 26920 26921 26922 26923 26924 26925 26926 26927 26928 26929 26930 26931 26932 26933 26934 26935 26936 26937 26938 26939 26940 26941 26942 26943 26944 26945 26946 26947 26948 26949 26950 26951 26952 26953 26954 26955 26956 26957 26958 26959 26960 26961 26962 26963 26964 26965 26966 26967 26968 26969 26970 26971 26972 26973 26974 26975 26976 26977 26978 26979 26980 26981 26982 26983 26984 26985 26986 26987 26988 26989 26990 26991 26992 26993 26994 26995 26996 26997 26998 26999 27000 27001 27002 27003 27004 27005 27006 27007 27008 27009 27010 27011 27012 27013 27014 27015 27016 27017 27018 27019 27020 27021 27022 27023 27024 27025 27026 27027 27028 27029 27030 27031 27032 27033 27034 27035 27036 27037 27038 27039 27040 27041 27042 27043 27044 27045 27046 27047 27048 27049 27050 27051 27052 27053 27054 27055 27056 27057 27058 27059 27060 27061 27062 27063 27064 27065 27066 27067 27068 27069 27070 27071 27072 27073 27074 27075 27076 27077 27078 27079 27080 27081 27082 27083 27084 27085 27086 27087 27088 27089 27090 27091 27092 27093 27094 27095 27096 27097 27098 27099 27100 27101 27102 27103 27104 27105 27106 27107 27108 27109 27110 27111 27112 27113 27114 27115 27116 27117 27118 27119 27120 27121 27122 27123 27124 27125 27126 27127 27128 27129 27130 27131 27132 27133 27134 27135 27136 27137 27138 27139 27140 27141 27142 27143 27144 27145 27146 27147 27148 27149 27150 27151 27152 27153 27154 27155 27156 27157 27158 27159 27160 27161 27162 27163 27164 27165 27166 27167 
27168 27169 27170 27171 27172 27173 27174 27175 27176 27177 27178 27179 27180 27181 27182 27183 27184 27185 27186 27187 27188 27189 27190 27191 27192 27193 27194 27195 27196 27197 27198 27199 27200 27201 27202 27203 27204 27205 27206 27207 27208 27209 27210 27211 27212 27213 27214 27215 27216 27217 27218 27219 27220 27221 27222 27223 27224 27225 27226 27227 27228 27229 27230 27231 27232 27233 27234 27235 27236 27237 27238 27239 27240 27241 27242 27243 27244 27245 27246 27247 27248 27249 27250 27251 27252 27253 27254 27255 27256 27257 27258 27259 27260 27261 27262 27263 27264 27265 27266 27267 27268 27269 27270 27271 27272 27273 27274 27275 27276 27277 27278 27279 27280 27281 27282 27283 27284 27285 27286 27287 27288 27289 27290 27291 27292 27293 27294 27295 27296 27297 27298 27299 27300 27301 27302 27303 27304 27305 27306 27307 27308 27309 27310 27311 27312 27313 27314 27315 27316 27317 27318 27319 27320 27321 27322 27323 27324 27325 27326 27327 27328 27329 27330 27331 27332 27333 27334 27335 27336 27337 27338 27339 27340 27341 27342 27343 27344 27345 27346 27347 27348 27349 27350 27351 27352 27353 27354 27355 27356 27357 27358 27359 27360 27361 27362 27363 27364 27365 27366 27367 27368 27369 27370 27371 27372 27373 27374 27375 27376 27377 27378 27379 27380 27381 27382 27383 27384 27385 27386 27387 27388 27389 27390 27391 27392 27393 27394 27395 27396 27397 27398 27399 27400 27401 27402 27403 27404 27405 27406 27407 27408 27409 27410 27411 27412 27413 27414 27415 27416 27417 27418 27419 27420 27421 27422 27423 27424 27425 27426 27427 27428 27429 27430 27431 27432 27433 27434 27435 27436 27437 27438 27439 27440 27441 27442 27443 27444 27445 27446 27447 27448 27449 27450 27451 27452 27453 27454 27455 27456 27457 27458 27459 27460 27461 27462 27463 27464 27465 27466 27467 27468 27469 27470 27471 27472 27473 27474 27475 27476 27477 27478 27479 27480 27481 27482 27483 27484 27485 27486 27487 27488 27489 27490 27491 27492 27493 27494 27495 27496 27497 27498 27499 27500 
27501 27502 27503 27504 27505 27506 27507 27508 27509 27510 27511 27512 27513 27514 27515 27516 27517 27518 27519 27520 27521 27522 27523 27524 27525 27526 27527 27528 27529 27530 27531 27532 27533 27534 27535 27536 27537 27538 27539 27540 27541 27542 27543 27544 27545 27546 27547 27548 27549 27550 27551 27552 27553 27554 27555 27556 27557 27558 27559 27560 27561 27562 27563 27564 27565 27566 27567 27568 27569 27570 27571 27572 27573 27574 27575 27576 27577 27578 27579 27580 27581 27582 27583 27584 27585 27586 27587 27588 27589 27590 27591 27592 27593 27594 27595 27596 27597 27598 27599 27600 27601 27602 27603 27604 27605 27606 27607 27608 27609 27610 27611 27612 27613 27614 27615 27616 27617 27618 27619 27620 27621 27622 27623 27624 27625 27626 27627 27628 27629 27630 27631 27632 27633 27634 27635 27636 27637 27638 27639 27640 27641 27642 27643 27644 27645 27646 27647 27648 27649 27650 27651 27652 27653 27654 27655 27656 27657 27658 27659 27660 27661 27662 27663 27664 27665 27666 27667 27668 27669 27670 27671 27672 27673 27674 27675 27676 27677 27678 27679 27680 27681 27682 27683 27684 27685 27686 27687 27688 27689 27690 27691 27692 27693 27694 27695 27696 27697 27698 27699 27700 27701 27702 27703 27704 27705 27706 27707 27708 27709 27710 27711 27712 27713 27714 27715 27716 27717 27718 27719 27720 27721 27722 27723 27724 27725 27726 27727 27728 27729 27730 27731 27732 27733 27734 27735 27736 27737 27738 27739 27740 27741 27742 27743 27744 27745 27746 27747 27748 27749 27750 27751 27752 27753 27754 27755 27756 27757 27758 27759 27760 27761 27762 27763 27764 27765 27766 27767 27768 27769 27770 27771 27772 27773 27774 27775 27776 27777 27778 27779 27780 27781 27782 27783 27784 27785 27786 27787 27788 27789 27790 27791 27792 27793 27794 27795 27796 27797 27798 27799 27800 27801 27802 27803 27804 27805 27806 27807 27808 27809 27810 27811 27812 27813 27814 27815 27816 27817 27818 27819 27820 27821 27822 27823 27824 27825 27826 27827 27828 27829 27830 27831 27832 27833 
27834 27835 27836 27837 27838 27839 27840 27841 27842 27843 27844 27845 27846 27847 27848 27849 27850 27851 27852 27853 27854 27855 27856 27857 27858 27859 27860 27861 27862 27863 27864 27865 27866 27867 27868 27869 27870 27871 27872 27873 27874 27875 27876 27877 27878 27879 27880 27881 27882 27883 27884 27885 27886 27887 27888 27889 27890 27891 27892 27893 27894 27895 27896 27897 27898 27899 27900 27901 27902 27903 27904 27905 27906 27907 27908 27909 27910 27911 27912 27913 27914 27915 27916 27917 27918 27919 27920 27921 27922 27923 27924 27925 27926 27927 27928 27929 27930 27931 27932 27933 27934 27935 27936 27937 27938 27939 27940 27941 27942 27943 27944 27945 27946 27947 27948 27949 27950 27951 27952 27953 27954 27955 27956 27957 27958 27959 27960 27961 27962 27963 27964 27965 27966 27967 27968 27969 27970 27971 27972 27973 27974 27975 27976 27977 27978 27979 27980 27981 27982 27983 27984 27985 27986 27987 27988 27989 27990 27991 27992 27993 27994 27995 27996 27997 27998 27999 28000 28001 28002 28003 28004 28005 28006 28007 28008 28009 28010 28011 28012 28013 28014 28015 28016 28017 28018 28019 28020 28021 28022 28023 28024 28025 28026 28027 28028 28029 28030 28031 28032 28033 28034 28035 28036 28037 28038 28039 28040 28041 28042 28043 28044 28045 28046 28047 28048 28049 28050 28051 28052 28053 28054 28055 28056 28057 28058 28059 28060 28061 28062 28063 28064 28065 28066 28067 28068 28069 28070 28071 28072 28073 28074 28075 28076 28077 28078 28079 28080 28081 28082 28083 28084 28085 28086 28087 28088 28089 28090 28091 28092 28093 28094 28095 28096 28097 28098 28099 28100 28101 28102 28103 28104 28105 28106 28107 28108 28109 28110 28111 28112 28113 28114 28115 28116 28117 28118 28119 28120 28121 28122 28123 28124 28125 28126 28127 28128 28129 28130 28131 28132 28133 28134 28135 28136 28137 28138 28139 28140 28141 28142 28143 28144 28145 28146 28147 28148 28149 28150 28151 28152 28153 28154 28155 28156 28157 28158 28159 28160 28161 28162 28163 28164 28165 28166 
28167 28168 28169 28170 28171 28172 28173 28174 28175 28176 28177 28178 28179 28180 28181 28182 28183 28184 28185 28186 28187 28188 28189 28190 28191 28192 28193 28194 28195 28196 28197 28198 28199 28200 28201 28202 28203 28204 28205 28206 28207 28208 28209 28210 28211 28212 28213 28214 28215 28216 28217 28218 28219 28220 28221 28222 28223 28224 28225 28226 28227 28228 28229 28230 28231 28232 28233 28234 28235 28236 28237 28238 28239 28240 28241 28242 28243 28244 28245 28246 28247 28248 28249 28250 28251 28252 28253 28254 28255 28256 28257 28258 28259 28260 28261 28262 28263 28264 28265 28266 28267 28268 28269 28270 28271 28272 28273 28274 28275 28276 28277 28278 28279 28280 28281 28282 28283 28284 28285 28286 28287 28288 28289 28290 28291 28292 28293 28294 28295 28296 28297 28298 28299 28300 28301 28302 28303 28304 28305 28306 28307 28308 28309 28310 28311 28312 28313 28314 28315 28316 28317 28318 28319 28320 28321 28322 28323 28324 28325 28326 28327 28328 28329 28330 28331 28332 28333 28334 28335 28336 28337 28338 28339 28340 28341 28342 28343 28344 28345 28346 28347 28348 28349 28350 28351 28352 28353 28354 28355 28356 28357 28358 28359 28360 28361 28362 28363 28364 28365 28366 28367 28368 28369 28370 28371 28372 28373 28374 28375 28376 28377 28378 28379 28380 28381 28382 28383 28384 28385 28386 28387 28388 28389 28390 28391 28392 28393 28394 28395 28396 28397 28398 28399 28400 28401 28402 28403 28404 28405 28406 28407 28408 28409 28410 28411 28412 28413 28414 28415 28416 28417 28418 28419 28420 28421 28422 28423 28424 28425 28426 28427 28428 28429 28430 28431 28432 28433 28434 28435 28436 28437 28438 28439 28440 28441 28442 28443 28444 28445 28446 28447 28448 28449 28450 28451 28452 28453 28454 28455 28456 28457 28458 28459 28460 28461 28462 28463 28464 28465 28466 28467 28468 28469 28470 28471 28472 28473 28474 28475 28476 28477 28478 28479 28480 28481 28482 28483 28484 28485 28486 28487 28488 28489 28490 28491 28492 28493 28494 28495 28496 28497 28498 28499 
28500 28501 28502 28503 28504 28505 28506 28507 28508 28509 28510 28511 28512 28513 28514 28515 28516 28517 28518 28519 28520 28521 28522 28523 28524 28525 28526 28527 28528 28529 28530 28531 28532 28533 28534 28535 28536 28537 28538 28539 28540 28541 28542 28543 28544 28545 28546 28547 28548 28549 28550 28551 28552 28553 28554 28555 28556 28557 28558 28559 28560 28561 28562 28563 28564 28565 28566 28567 28568 28569 28570 28571 28572 28573 28574 28575 28576 28577 28578 28579 28580 28581 28582 28583 28584 28585 28586 28587 28588 28589 28590 28591 28592 28593 28594 28595 28596 28597 28598 28599 28600 28601 28602 28603 28604 28605 28606 28607 28608 28609 28610 28611 28612 28613 28614 28615 28616 28617 28618 28619 28620 28621 28622 28623 28624 28625 28626 28627 28628 28629 28630 28631 28632 28633 28634 28635 28636 28637 28638 28639 28640 28641 28642 28643 28644 28645 28646 28647 28648 28649 28650 28651 28652 28653 28654 28655 28656 28657 28658 28659 28660 28661 28662 28663 28664 28665 28666 28667 28668 28669 28670 28671 28672 28673 28674 28675 28676 28677 28678 28679 28680 28681 28682 28683 28684 28685 28686 28687 28688 28689 28690 28691 28692 28693 28694 28695 28696 28697 28698 28699 28700 28701 28702 28703 28704 28705 28706 28707 28708 28709 28710 28711 28712 28713 28714 28715 28716 28717 28718 28719 28720 28721 28722 28723 28724 28725 28726 28727 28728 28729 28730 28731 28732 28733 28734 28735 28736 28737 28738 28739 28740 28741 28742 28743 28744 28745 28746 28747 28748 28749 28750 28751 28752 28753 28754 28755 28756 28757 28758 28759 28760 28761 28762 28763 28764 28765 28766 28767 28768 28769 28770 28771 28772 28773 28774 28775 28776 28777 28778 28779 28780 28781 28782 28783 28784 28785 28786 28787 28788 28789 28790 28791 28792 28793 28794 28795 28796 28797 28798 28799 28800 28801 28802 28803 28804 28805 28806 28807 28808 28809 28810 28811 28812 28813 28814 28815 28816 28817 28818 28819 28820 28821 28822 28823 28824 28825 28826 28827 28828 28829 28830 28831 28832 
commit 894b0955a250682da0d2d3f074f910aaaf88c168
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jun 24 23:39:05 2025 -0500
Update ReleaseNotes.md.
commit e10e526b3ed0271cf4c42cd36815e182aacabe4b
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Fri Nov 29 02:16:48 2024 +0200
Add complex return detection for nvfortran (#765)
Details:
- Search for Intel ifx and NVIDIA/PGI Fortran compilers.
- Correctly determine the Fortran compiler vendor for Intel ifx and NVIDIA/PGI compilers.
- Determine the compiler version and correct Fortran complex return type for NVIDIA/PGI.
(cherry picked from commit 12f2efa7dfe11a684d62af02592499d91b7e344b)
commit 9a5c0290b8996bffe1a48511255c7055b777713c
Author: Dave Love <dave.love@manchester.ac.uk>
Date: Fri Jan 24 21:44:32 2025 +0000
Blacklist KNL with GCC 15+ (#844)
Details:
- GCC 15 drops support for Xeon Phi architectures such as KNL.
- This PR blacklists the `knl` configuration for GCC 15+.
(cherry picked from commit 7e8a5891902312a281bce37037eaa06d7d501639)
commit 154f9fcd9cf24b85451685c7e91e34a836375aed
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jun 24 17:14:58 2025 -0500
CHANGELOG update (2.0)
commit 200c795373e8ddeffee8d957dcadd05f7e10ab7f
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jun 24 17:13:42 2025 -0500
Update ReleaseNotes.md.
commit 11b276fb3b3848cc40cc5ceb87317f885fa90547
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jun 24 17:08:09 2025 -0500
Add refined version macros.
Details:
- The `BLIS_VERSION` macro currently provides a string literal version of the BLIS version.
- Add `BLIS_VERSION_{MAJOR,MINOR,REVISION}` macros which provide integer literals useful for programmatically comparing versions from `blis.h` alone.
(cherry picked from commit 290af2ea8f06a84bc4792a3e64b99f539bb347a7)
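A minimal sketch of how the integer macros might be used for a compile-time version check. Only the three `BLIS_VERSION_{MAJOR,MINOR,REVISION}` names come from the commit above; the fallback values, `TOY_BLIS_AT_LEAST`, and the `toy_*` functions are hypothetical and exist only so the snippet compiles without `blis.h`.

```c
#include <assert.h>

/* In real code these three macros come from blis.h. The fallback values
   are placeholders (an assumption) so the sketch compiles on its own. */
#ifndef BLIS_VERSION_MAJOR
#define BLIS_VERSION_MAJOR    2
#define BLIS_VERSION_MINOR    0
#define BLIS_VERSION_REVISION 0
#endif

/* Hypothetical helper: nonzero if the BLIS version is >= maj.min.rev. */
#define TOY_BLIS_AT_LEAST(maj, min, rev)                        \
    ( BLIS_VERSION_MAJOR > (maj) ||                             \
      ( BLIS_VERSION_MAJOR == (maj) &&                          \
        ( BLIS_VERSION_MINOR > (min) ||                         \
          ( BLIS_VERSION_MINOR == (min) &&                      \
            BLIS_VERSION_REVISION >= (rev) ) ) ) )

/* Choose a code path at compile time rather than parsing BLIS_VERSION. */
static int toy_uses_2x_path(void)
{
#if TOY_BLIS_AT_LEAST(2, 0, 0)
    return 1;   /* code that relies on 2.0 features would go here */
#else
    return 0;   /* fallback for older releases */
#endif
}

/* Usage example, valid for the placeholder version 2.0.0 above. */
static void toy_version_example(void)
{
    assert(toy_uses_2x_path() == 1);
    assert(TOY_BLIS_AT_LEAST(1, 9, 9));
    assert(!TOY_BLIS_AT_LEAST(2, 1, 0));
}
```

Because the comparison uses integer literals, it can also appear directly in an `#if` directive, which is not possible with the string-valued `BLIS_VERSION`.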
commit cfbb94d7b8165f0a94c671320bace59f82626cd0
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jun 24 16:46:23 2025 -0500
Remove unnecessary OpenMP include. (#875)
Details:
- Previously, `<omp.h>` was included in `bli_thrcomm_openmp.h` so that the framework
could access the necessary OpenMP functions.
- As @melven reported (#873), this causes issues when `blis.h` is included in C++ code since
the `<omp.h>` include happens with `extern "C"`.
- Move the include from the header to the necessary .c files so that it does not "pollute" `blis.h`.
(cherry picked from commit 843a5e8d394d126ed370da523d2c09d7e12b582d)
commit 05094eaba978fe14c573ccd37dac3379251e6c59
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jun 24 15:38:15 2025 -0500
Apply temporary fix for gcc 15. (#874)
Details:
- As reported in #845, gcc 15 fails to build the haswell
gemmsup kernels due to the use of rbp.
- As a temporary fix, disable slp-tree-vectorization in just
the affected files.
- Thanks @loveshack for reporting and @chillenb for the suggested
fix.
- Eventually, the kernels should be rewritten to avoid using rbp.
(cherry picked from commit 36effd70b6a323856d98b17dda9cc3afd181b658)
commit 1a6809e4e5320e5f52dda768cc6c6e2faa44dbf1
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jun 24 14:55:38 2025 -0500
Update ReleaseNotes.md.
commit 650b450b988fb0a467ebacdf8dede2b6cc6eb3ab
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Jun 23 17:41:01 2025 -0500
CHANGELOG update (2.0)
commit b6a0372e016d69294493b381921357efbdf89674
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Jun 23 17:41:00 2025 -0500
Version file update (2.0)
commit abc38a9ca341069bfe224bd190fa9ea712c32e54
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Jun 23 17:40:12 2025 -0500
Update ReleaseNotes.md.
commit 17ffb7212bebae92991e090f40ddb7e3c8cc1354
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Jun 21 20:00:21 2025 -0500
Update license and related info.
(cherry picked from commit cb0da3e6e851c0c6b1896812a6e986f4a6a11f4a)
commit bf7a93fa3389372fb0a1611b573a68ffb96a5786
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Jun 20 18:04:46 2025 -0500
Fix an issue with 1m and negative strides. (#871)
Details:
- In the reference `gemm1m` kernel, the code checks if the matrix
is "preferentially stored", meaning columns (rows) are contiguous
if the real-domain microkernel is column- (row-)preferential.
- However, `bli_is_preferentially_stored` checks for contiguity
based on the *absolute value* of the row and column strides,
such that a row or column stride of -1 is indicated as
preferentially stored.
- Passing the stride of -1 to the real-domain kernel then essentially
causes elements along each row or column to be written in opposite
order. This causes problems for 1m because a) imaginary elements
are written before real elements, and b) the provided pointer
points to the last *complex* element, which is one real element
too low and can then cause an out-of-bounds error when the last
real-domain element is written to an address preceding the row
or column storage.
- This commit adds a check for positivity of `rs_c` and `cs_c`
in `bli_gemm1m_ref` and `bli_gemm_ccr_ref` in order to pass
through directly to the real-domain microkernel. Technically,
only a -1 stride along the preferential storage direction will
lead to the errors noted above, but who knows what other bad
things might happen for other negative strides (and god forbid
you put in a stride of 0...).
(cherry picked from commit 028c5172be2994cf2dc9daf31b20c46455d4c36e)
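The difference between a contiguity test on absolute strides and the added positivity check can be sketched as follows. This is an illustration only: the `toy_*` functions are hypothetical and are not the BLIS implementation of `bli_is_preferentially_stored` or of the reference kernels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Contiguity test on absolute strides, as the commit message describes:
   a stride of -1 also counts as "preferentially stored". For a
   column-preferential kernel the columns must be contiguous, which
   means a unit row stride. */
static bool toy_is_pref_stored(long rs, long cs, bool col_pref)
{
    return col_pref ? labs(rs) == 1 : labs(cs) == 1;
}

/* Stricter test in the spirit of the fix: pass C straight through to the
   real-domain microkernel only if both strides are strictly positive. */
static bool toy_can_pass_through(long rs, long cs, bool col_pref)
{
    return rs > 0 && cs > 0 && toy_is_pref_stored(rs, cs, col_pref);
}

/* Usage example. */
static void toy_stride_example(void)
{
    assert(toy_is_pref_stored(-1, 8, true));     /* abs() accepts -1   */
    assert(!toy_can_pass_through(-1, 8, true));  /* fix rejects it     */
    assert(toy_can_pass_through(1, 8, true));    /* column-major C     */
    assert(toy_can_pass_through(8, 1, false));   /* row-major C        */
    assert(!toy_can_pass_through(0, 1, false));  /* zero stride        */
}
```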
commit 965c667b2b7c7b229c5591a3e60b7ef51b21e6d1
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Jun 11 12:36:49 2025 -0500
Allow runtime configuration selection by name. (#870)
Details:
- The `BLIS_ARCH_TYPE` environment variable currently only allows
numerical values, which requires reading the source to select the
appropriate value (and these values can change over time).
- Implement selection by name (case insensitive), based on the names
returned by `blis_arch_string` (typically the same as the folder name
in the `config` directory).
(cherry picked from commit 5718de15ead2c3fdc5df63fb3159c0c6bb63b3eb)
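One way such a lookup could work is sketched below: try a case-insensitive name match first, then fall back to a numeric value. The name table, its ordering, and every `toy_*` identifier are assumptions made for illustration; the real names and IDs are defined inside BLIS.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical name table; indices are NOT the real BLIS arch IDs. */
static const char *const toy_arch_names[] = { "generic", "haswell", "skx", "knl" };
enum { TOY_NUM_ARCHS = sizeof(toy_arch_names) / sizeof(toy_arch_names[0]) };

/* Case-insensitive string equality (portable stand-in for strcasecmp). */
static int toy_streq_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        ++a;
        ++b;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Map a BLIS_ARCH_TYPE-style value to an index: a name is tried first,
   then a decimal number. Returns -1 if neither interpretation works. */
static int toy_arch_from_string(const char *value)
{
    char *end;
    long  id;

    if (value == NULL || *value == '\0') return -1;

    for (int i = 0; i < TOY_NUM_ARCHS; ++i)
        if (toy_streq_nocase(value, toy_arch_names[i])) return i;

    id = strtol(value, &end, 10);
    if (*end != '\0' || id < 0 || id >= TOY_NUM_ARCHS) return -1;
    return (int)id;
}

/* Usage example. */
static void toy_arch_example(void)
{
    assert(toy_arch_from_string("HASWELL") == 1);
    assert(toy_arch_from_string("skx") == 2);
    assert(toy_arch_from_string("2") == 2);
    assert(toy_arch_from_string("bogus") == -1);
    assert(toy_arch_from_string("9") == -1);
}
```

A caller would typically pass the result of `getenv("BLIS_ARCH_TYPE")`, which may be `NULL` when the variable is unset; the sketch treats that case as "no selection".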
commit b33cf051f5c65738f898bc68b5ce007d038c823d
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Jun 9 14:26:34 2025 -0500
Fix the thread info node used in packing.
Details:
- Previously, the packing kernel used the wrong thread info
node for packing, specifically, it used the node intended for
the GEMM kernel. Normally this is OK since there is no additional
thread partitioning between packing and the kernel. However,
for some external applications, additional data needed to be
allocated on the GEMM thread info node which conflicted with
the packing buffer.
- This commit uses the correct (parent) thread info node
during packing.
(cherry picked from commit 3e3355a4cffeccc17c5fbedb1e2144d6ad22e24d)
commit 9c941be53db3ae5e6cc3257e43f4e939220086f2
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Jun 7 14:13:18 2025 -0500
CHANGELOG update (2.0-rc1)
commit 43c8b0085459c86b9bf7614dae399429e1865475
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Jun 7 14:13:18 2025 -0500
Version file update (2.0-rc1)
commit 4e2f8f9071681a51abe7438ef06c884afeca59ec
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Jan 17 13:54:40 2025 -0600
Update release instructions. (#837)
Details:
- Rename `RELEASING` to `RELEASING.md`.
- Add additional structure and Markdown notation to `RELEASING.md`.
- Add a section on the overall release and branching strategy.
- Clarify and tweak instructions for making release candidates and releases.
- Add instructions for making point releases and back-porting bug fixes.
- Rename `build/start-new-rc.sh` to `build/do-release.sh`.
- Tweak `do-release.sh` to do only common tasks for rcs, major releases, and point releases.
- Add `-b` option to `do-release.sh` which does a "bare" release without a new branch or tag (for "dev releases" on master).
- Update the version file on `master` to `3.0-dev` to reflect the new guidelines.
(cherry picked from commit fb7ba1da524efa47011d95cfd8a9fee86018fcf0)
commit ac3e46f54c5c4a784644e12481b0a77894415005
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Jun 7 14:09:52 2025 -0500
Update CREDITS and ReleaseNotes.md.
commit 60d8a4a499be9f9a7cacec375a28bf5a8bec7a30
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Jun 7 13:17:15 2025 -0500
Fix arch definitions in CI.
commit af5de5bfd199c0ea8f369d81401ee823733c708c
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Jun 7 12:33:41 2025 -0500
Turn on CI testing for release tracking branches.
(cherry picked from commit 09e77a43651ba2673c3abac23093a595f1b3a920)
commit 1a281e0f8791a0e5c86bfae6afccebe6bd378d3b
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu May 1 10:11:59 2025 -0500
Update CREDITS
[ci skip]
(cherry picked from commit 5097c599b58aecb7f990cc7bd7a5dad688a48df8)
commit a8f3d7efad027d06471a639db6ad85cba47f473c
Author: Atsushi Tatsuma <yoshoku@outlook.com>
Date: Fri May 2 00:09:31 2025 +0900
Fix to prevent is_win flag setting with clang on macOS (#867)
Details:
- In some cases, macOS was improperly detected as Windows due to a builtin preprocessor definition `#define TARGET_OS_WINDOWS 0`.
- Update the detection to specifically look for `#define _WIN32` which more robustly detects Windows.
(cherry picked from commit ec5b57289feaea755ff2eb4ab39511f3dd5879d6)
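The stricter match can be illustrated with a small sketch. It assumes the build inspects the compiler's predefined macros one `#define` line at a time, which is an assumption about the mechanism and not taken from the commit; `toy_line_defines_win32` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Nonzero only if the line defines the macro _WIN32 itself, not a macro
   whose name merely contains "WIN" or starts with "_WIN32". */
static int toy_line_defines_win32(const char *line)
{
    static const char key[] = "#define _WIN32";
    size_t n = sizeof(key) - 1;

    if (strncmp(line, key, n) != 0) return 0;
    return line[n] == ' ' || line[n] == '\0';
}

/* Usage example. */
static void toy_win_example(void)
{
    assert(toy_line_defines_win32("#define _WIN32 1"));
    assert(!toy_line_defines_win32("#define TARGET_OS_WINDOWS 0"));
    assert(!toy_line_defines_win32("#define _WIN32_WINNT 0x0601"));
}
```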
commit 97a441ada10e2dc9044616cccbc365cac0524062
Author: Minh Quan Ho <1337056+hominhquan@users.noreply.github.com>
Date: Mon Apr 7 21:21:45 2025 +0200
Examples: replace all 4.1f printm format by 4.3f (#865)
Details:
- This avoids possible misinterpretation of computation results printed on stdout (thanks Mason McBride for reporting it in #864).
- Also force space for positive numbers to help with alignment.
(cherry picked from commit 5d9e110a2aa58b6e5d131db9131bae0143f22f9f)
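The effect of the format change can be seen with plain `snprintf`. The sketch below assumes the two formats are `"%4.1f"` and `"% 4.3f"` (the latter including the space flag mentioned above); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Nonzero if a and b produce identical text with one decimal place. */
static int toy_same_text_old(double a, double b)
{
    char sa[32], sb[32];
    snprintf(sa, sizeof sa, "%4.1f", a);
    snprintf(sb, sizeof sb, "%4.1f", b);
    return strcmp(sa, sb) == 0;
}

/* Same comparison with three decimals and a leading space for
   non-negative values. */
static int toy_same_text_new(double a, double b)
{
    char sa[32], sb[32];
    snprintf(sa, sizeof sa, "% 4.3f", a);
    snprintf(sb, sizeof sb, "% 4.3f", b);
    return strcmp(sa, sb) == 0;
}

/* Usage example. */
static void toy_format_example(void)
{
    char buf[32];

    assert(toy_same_text_old(0.96, 1.0));    /* both print as " 1.0" */
    assert(!toy_same_text_new(0.96, 1.0));   /* " 0.960" vs " 1.000" */

    snprintf(buf, sizeof buf, "% 4.3f", -0.96);
    assert(strcmp(buf, "-0.960") == 0);      /* sign takes the space's column */
}
```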
commit c3497ed3645284fcd61fda719a47167173767fbd
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Apr 2 12:03:43 2025 -0500
Fix for plugins without explicit optimized kernels.
(cherry picked from commit 53d21cb478801d8e978082da2889e5e67d4221c9)
commit c90ecfb26c6cae4e3f288d74d4ac8d99283f081c
Author: Devin Matthews <damatthews@smu.edu>
Date: Sun Mar 2 09:08:35 2025 -0600
Adjust CI testing (#860)
Details:
- Add tests for the `generic` config, including forcing broadcast-A,B which uses a different reference kernel. This uncovered a number of bugs, especially in `trsm`/`gemmtrsm` reference kernels, as well as diagonal packing.
- Move threaded builds into main build and run `make check` once for each enabled backend.
- Fix unused variable warnings in level-0 macros.
- Fix `bli_tbastbbs_mxn` and add `bli_tcompressbbs_mxn`. The latter was missing from the reference `gemmtrsm` microkernel and is needed since the B11 block is accumulated to but, for complex datatypes, the effective imaginary stride is non-unit if B is broadcast packed.
- Run all BLAS tests single-threaded.
(cherry picked from commit 50054a6a7c0561d22720254ab6a9be1199ac10ab)
commit 92bbb13b934558b9cace3a9734a028b117983f63
Author: Devin Matthews <damatthews@smu.edu>
Date: Sun Mar 2 08:56:54 2025 -0600
Fix check for SVE instructions which caused problems on Windows. (#859)
* Fix check for SVE instructions which caused problems on Windows.
Details:
- The context initialization for `armsve` was using the HWCAP functionality of Linux to check if SVE instructions are actually available, since these are used to determine the register blocksizes. Naturally, this causes problems on Windows.
- Instead, use functions from `bli_cpuid.c` to check for SVE. On Windows, no check is actually done and SVE is never detected.
- In the case that the user specifically requests the `armsve` config on Windows, only enable this check for the whole `arm64` family and just assume SVE is available otherwise.
* Blacklist armsve on Windows.
(cherry picked from commit 37e52a613a6fec3fe1cde0ca018498a16b28a5dc)
commit 4715e59ebdc4fc710b5f53df4c1bc66374654046
Author: Devin Matthews <damatthews@smu.edu>
Date: Sun Mar 2 08:50:37 2025 -0600
Fix problem in `bli_obj_imag_part`. (#861)
Details:
- When adjusting the buffer to point to the first imaginary element, the function `bli_obj_buffer_at_off` was used, which includes any currently set offsets, but then `bli_obj_set_buffer` was used, which sets the buffer *before* offsets are applied.
- Now a matching `bli_obj_buffer` call is used to avoid any offsets.
(cherry picked from commit 97084c75acd0ed104efc5da4dac0fb38a4a044f1)
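The mismatch described above can be reproduced with a toy object that stores a base pointer and an offset. The struct layout and all `toy_*` names are hypothetical simplifications (offsets are counted in `double` elements) and do not reflect the real `obj_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object: a base pointer plus an element offset. */
typedef struct {
    double *base;   /* buffer before any offset is applied */
    size_t  off;    /* offset, in elements                 */
} toy_obj_t;

static double *toy_buffer(const toy_obj_t *o)          { return o->base; }
static double *toy_buffer_at_off(const toy_obj_t *o)   { return o->base + o->off; }
static void    toy_set_buffer(toy_obj_t *o, double *p) { o->base = p; }

/* Mismatched pattern: reads the offset pointer but stores it as the base,
   so the offset is applied a second time on the next lookup. */
static void toy_imag_part_mismatched(const toy_obj_t *c, toy_obj_t *im)
{
    *im = *c;
    toy_set_buffer(im, toy_buffer_at_off(c) + 1);
}

/* Matching pattern: read and write the base pointer. */
static void toy_imag_part_matched(const toy_obj_t *c, toy_obj_t *im)
{
    *im = *c;
    toy_set_buffer(im, toy_buffer(c) + 1);
}

/* Usage example. */
static void toy_imag_example(void)
{
    double    buf[16] = { 0 };
    toy_obj_t c = { buf, 4 };
    toy_obj_t im;

    toy_imag_part_matched(&c, &im);
    assert(toy_buffer_at_off(&im) == buf + 5);   /* offset applied once  */

    toy_imag_part_mismatched(&c, &im);
    assert(toy_buffer_at_off(&im) == buf + 9);   /* offset applied twice */
}
```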
commit d3fa776b2d374a072403e8796d5e4570461e683b
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Feb 27 13:48:17 2025 -0600
Add new level-0 macro layer. (#830)
Details:
- Developed by @fgvanzee and @devinamatthews.
- Level-0 scalar macros have moved from a name-based system (e.g. `bli_dcopys( ... )`) to a macro argument-based system (`bli_tcopys( d,d, ... )`).
- All macros are explicitly mixed-type.
- All input and output operands can have a distinct type (precision and/or domain). Unnecessary computations and spurious NaN/Inf propagation are avoided in mixed-domain cases.
- All macros which do math (i.e. not copy/set/etc.) take an additional computational precision.
- Tile-level macros, 1m, broadcast-B, and other extensions are also included.
- All macros should correctly handle aliasing of input and output operands (this needs to be rigorously checked).
- The macros work generically over the defined types -- new types only need limited support (primarily conversion to other types and basic math).
- For code outside of core BLIS (optimized kernels, sandboxes, etc.), a selection of legacy macros have been added which translate to the new level-0 macros. Behavior is unchanged.
- A standalone, templated C++ testsuite for the level-0 macros has been added. It is currently included as part of the CircleCI tests.
- Const-correctness of level-0 macros is also checked.
(cherry picked from commit a014a08189d05f45752f7ac23d8d42a24536fb93)
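The general idea of passing type characters as macro arguments can be sketched with token pasting. The macros below are toy stand-ins written for this illustration; they are not the BLIS level-0 macros, whose actual argument lists are elided in the commit message and are not reproduced here.

```c
#include <assert.h>

/* Map a type character to a C type (toy version covering s and d only). */
#define TOY_CTYPE_s float
#define TOY_CTYPE_d double
#define TOY_CTYPE(ch) TOY_CTYPE_##ch

/* Name-based style: the type is baked into the macro name. */
#define toy_dcopys(x, y) ((y) = (x))

/* Argument-based style: source and destination types are arguments, so a
   mixed-precision copy needs no additional macro name. */
#define toy_tcopys(chx, chy, x, y) \
    ((y) = (TOY_CTYPE(chy))(TOY_CTYPE(chx))(x))

/* Usage example. */
static void toy_level0_example(void)
{
    double x = 1.5, z = 0.0;
    float  y = 0.0f;

    toy_tcopys(d, s, x, y);   /* double -> float */
    assert(y == 1.5f);

    toy_tcopys(s, d, y, z);   /* float -> double */
    assert(z == 1.5);

    x = 2.0;
    toy_dcopys(x, z);         /* same-type copy  */
    assert(z == 2.0);
}
```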
commit 60119e7d9a85b8f14c45d049e5abd7a7c143e85c
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Feb 19 13:53:39 2025 -0600
Update README.md
Details:
- Add status badge for CircleCI.
- [ci skip]
(cherry picked from commit 3c71737e426f8d567f1324b82609b4a61db670f8)
commit 584a936d6bb4d148304de805e34d9e8f1e50854d
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Feb 19 13:31:19 2025 -0600
Do not use symbol aliases on macOS. (#856)
Details:
- The BLAS/CBLAS function `?gemmtr` is currently implemented as a symbol alias of the already-existing `?gemmt`. This does not work on macOS/Darwin.
- Instead, use a minimal wrapper function which calls the appropriate existing BLAS/CBLAS function.
- Also clean up the CBLAS prototypes a bit.
(cherry picked from commit 14047f62d1fc746cbabe112197cfc1afe526a82a)
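The wrapper approach can be sketched as follows. `toy_gemmt` is a deliberately trivial, hypothetical stand-in for an existing routine; only the alias-versus-wrapper pattern is the point.

```c
#include <assert.h>

/* Stand-in for an existing routine (hypothetical and greatly simplified:
   scales the n-element array x by alpha and returns n). */
static int toy_gemmt(int n, double alpha, double *x)
{
    for (int i = 0; i < n; ++i) x[i] *= alpha;
    return n;
}

/* With GCC or clang on ELF platforms, an externally visible function can
   be given a second name via __attribute__((alias("..."))). That
   attribute is not supported on macOS/Darwin, so a thin wrapper is the
   portable choice: the same arguments are forwarded unchanged. */
static int toy_gemmtr(int n, double alpha, double *x)
{
    return toy_gemmt(n, alpha, x);
}

/* Usage example. */
static void toy_wrapper_example(void)
{
    double v[2] = { 1.0, 2.0 };

    assert(toy_gemmtr(2, 3.0, v) == 2);
    assert(v[0] == 3.0 && v[1] == 6.0);
}
```

The cost of the wrapper is one extra call, which is negligible next to the work done by a level-3 operation.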
commit e5400a74c3e681357031d27eb375b1eef7cae2f6
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Feb 8 13:47:08 2025 -0600
Add CircleCI (#855)
Details:
- This PR adds CircleCI testing in addition to TravisCI and Appveyor.
- All of the same tests as on Travis are run, except that different hardware typically ends up being used (usually Zen on Travis, Xeon Platinum on Circle). This has actually exposed a couple of bugs (see #850 and #852).
- The `travis` directory has been renamed to `ci` as it is now shared.
- Running SDE on CircleCI is a bit problematic because glibc changed how CPUID detection is done. This requires running some architectures with different hardware definition files and forcing a config via `BLIS_ARCH_TYPE`.
(cherry picked from commit 40a52dc0289f27f74a43e886ae14bf11738db169)
commit dffdaae41531ea2160cb21ae172d117dd896f3c4
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Feb 6 23:22:24 2025 -0600
Fix problem with clang-14.0.0 and reference `gemm` ukr. (#854)
Details:
- clang 14.0.0 apparently makes some invalid assumptions about whether
or not the AB microtile is initialized in the `gemm` reference
microkernel. This leads to the "scale by alpha" part doing something
strange (all sorts of random and even NaN values pop up). I do not
know why this only manifested for `ztrsm` on `skx` (in
`zgemm_skx_ref` via `zgemmtrsm_skx_ref`). See #852.
- Aliasing the AB microtile (in the proper datatype) as a pointer to
a raw character array, and then initializing the character array
with `= { 0 }` convinces the compiler to do the right thing.
- The problem did not occur in 14.0.6 or 15.0.7. It may only be a narrow
band of versions which are problematic.
- This commit adds the char array workaround and fixes #852.
(cherry picked from commit 028be422e306986674f7b1d96b99153bf2a6477e)
(cherry picked from commit a0d7f26ba37689d351963c276711e4f51bf99e3e)
commit c091c25d116179619ffd61d9824cba55372b6e52
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Feb 5 16:10:37 2025 -0600
Increase the max size for stack buffers. (#851)
Details:
- See #850 for details on the problem.
- This is a temporary fix which should work for sdcz data types.
- Altra architectures may still not fully work for MP/MD as the stack buffer size is hard-coded.
(cherry picked from commit 5ad37a860b191f905a7ed895280a8057573ae909)
commit 84e2ed773972276426ecf25b2037ec642d1ac15f
Author: M. Zhou <cdluminate@gmail.com>
Date: Wed Feb 5 14:07:01 2025 -0800
Alias *gemmt_ as *gemmtr_ to fix lapack 3.12.1 compatibility. (#849)
Details:
- Alias `?gemmt_` as `?gemmtr_` to fix lapack 3.12.1 compatibility. (Fixes #848)
- Add the `?gemmtr_` and `cblas_?gemmtr` aliases to symbol list.
- Also alias `cblas_?gemmt` as `cblas_?gemmtr` for lapack 3.12.1 compatibility.
(cherry picked from commit a6f2ce9dd53fbe099650d322fa69b21a3be10fb0)
commit 9e96bcda6a126ff6bcba381bc2add2849cb792aa
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Jan 16 17:07:44 2025 -0600
Create a new type to represent IDs for all kernels, blocksizes, etc.
Details:
- Currently, all enums used to represent built-in kernel IDs, blocksizes, preferences, and operation IDs have a special member equal to `BLIS_VA_END`, which in turn is `(siz_t)-1`. In principle, this would force the underlying type used to represent the enum values to be as wide as `siz_t`, particularly when passed to the variadic function `bli_cntx_set_ukrs` and friends. User-registered kernel IDs and such are of type `siz_t` explicitly. However, gcc (12 and older), clang, and icx pass literal enum constants (e.g. `BLIS_MR`) that are small enough as `int` when 32-bit mode is used (`-m32`). This causes a misalignment of the parameters on the stack and ultimately a segfault. The problem also exists in 64-bit mode with clang and icx and on aarch64 with clang, as parameters far enough down the list to go on the stack do not get the upper 4 bytes initialized.
- This commit introduces a new type `kerid_t` which is always `uint32_t`. This type is used for all kernel, blocksize, preference, and operation IDs (including user-registered ones). It is also used for `BLIS_VA_END`.
- Now all enum values are always passed as 32-bit ints on all architectures.
- Fixes #839.
- [cherry-picked from 32cc0ae3]
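A sketch of the fixed-width ID technique for variadic functions follows. All names are toy stand-ins (`toy_kerid_t` mirrors the role of `kerid_t`, `TOY_VA_END` that of `BLIS_VA_END`), and the argument layout of `toy_set_blkszs` is an assumption that does not match the real `bli_cntx_set_ukrs` family.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>

/* Fixed-width ID type, so caller and callee agree on argument width. */
typedef uint32_t toy_kerid_t;

#define TOY_MR     ((toy_kerid_t)0)
#define TOY_NR     ((toy_kerid_t)1)
#define TOY_KR     ((toy_kerid_t)2)
#define TOY_NUM_BS 3u
#define TOY_VA_END ((toy_kerid_t)-1)   /* sentinel, also 32 bits wide */

/* Read (id, value) pairs until the sentinel and store them in table.
   Returns the number of pairs consumed. */
static int toy_set_blkszs(int *table, ...)
{
    va_list ap;
    int     count = 0;

    va_start(ap, table);
    for (;;) {
        toy_kerid_t id = va_arg(ap, toy_kerid_t);
        if (id == TOY_VA_END) break;
        assert(id < TOY_NUM_BS);
        table[id] = va_arg(ap, int);
        ++count;
    }
    va_end(ap);
    return count;
}

/* Usage example. */
static void toy_kerid_example(void)
{
    int t[3] = { 0, 0, 0 };

    assert(toy_set_blkszs(t, TOY_MR, 6, TOY_NR, 8, TOY_VA_END) == 2);
    assert(t[0] == 6 && t[1] == 8 && t[2] == 0);
}
```

Because every ID macro carries an explicit cast, the caller never passes a bare `int` where the callee reads a wider type, which is the mismatch the commit describes.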
commit 19573faaba3baca0a58663eec4ad8716846bc811
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jan 14 18:18:59 2025 -0600
CHANGELOG update (2.0-rc0)
commit 215b76cfd82ce78fdbbd6786a3743ef6ae444f9a
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jan 14 18:18:58 2025 -0600
Version file update (2.0-rc0)
commit 790b995f8f351d0147d3c9d781c3dab549317af1
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jan 14 18:15:40 2025 -0600
ReleaseNotes.md update.
commit 0ef4580a5d459270af6e9ee2971c14bb91315fb8
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jan 14 17:17:36 2025 -0600
Add documentation for plugins (#820)
Add documentation for the plugin system and for modifying the control tree to make custom operations.
Details:
- `docs/PluginHowTo.md` describes in a "tutorial style" how to implement a custom BLAS-like operation by creating a plugin and then modifying the `gemm` control tree to achieve the desired effect.
- Briefly, plugins allow users to add new kernels and associated block sizes/preferences to BLIS without modifying the BLIS source code. User-provided kernels are compiled using the BLIS build system for configured architectures and selected at runtime based on the actual hardware.
- To implement custom operations, users can combine their own kernels (and/or existing BLIS kernels) with a customized control tree, which represents the specific algorithmic steps. Users can customize the kernels to be used for packing and for computation, extra information passed to kernels (e.g. additional parameters or data), block sizes, etc. An API is provided for modifying the default `gemm` control tree (also used for other level-3 operations, except `trsm`).
- [cherry-picked from 5cb70d8e]
commit 1426d6fe5ffb5a90514cf3cf0b248322e1d172bd
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jan 14 17:05:06 2025 -0600
CREDITS file update.
commit b36bc95693091d1777b74eeb14d29ac8e76760a3
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Oct 10 14:48:45 2024 -0500
Fix some aspects of the control tree/plugin infrastructure (#827)
Details:
- Use configure-time variable substitution rather than the PNAME macro
to generate symbol names in plugins. This makes it much easier for
users to see what names their symbols will have (and to change them
if desired).
- Use 'siz_t' rather than 'ukr_t' for anything dealing with kernel IDs
(and similar for blocksizes and kernel preferences). Because users
can now register new kernels, the values of the IDs for their custom
kernels are no longer enumerated in 'ukr_t', which causes type
conversion problems. This requires also being careful about the type
of BLIS_VA_END and forcing existing enumerations like 'ukr_t' to be
represented using integers of the same width as 'siz_t'.
- Modify the gemm control tree initialization function to indicate
whether or not the operation as a whole was transposed. This is
needed if users have to treat the initial A and B differently in the
control tree, for example in a tensor times matrix operation (if
transposed to matrix times tensor, we need to know which "matrix"
object is now the tensor).
commit 8d9be878b1a59aba401fd0d7b1b24c34526f0e81
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Thu Aug 8 14:41:30 2024 -0500
Flatten cblas.h immediately after blis.h. (#819)
Details:
- Previously, if the user enabled CBLAS via 'configure --enable-cblas'
and then ran 'make', the flattened blis.h header file would be created
immediately, but the flattened cblas.h header file would not be
created until 'make install' was run. This was happening because
nothing in the BLIS build process (except installation) depended on
the flattened cblas.h (whereas *everything* depends on the flattened
blis.h, and therefore it was being created first). This behavior can
be confusing to application developers who could reasonably expect
that the flattened cblas.h header would be available (to inspect or
use) prior to running 'make install'.
- This commit fixes the aforementioned issue by (1) adding cblas.h (if
CBLAS is enabled) as a dependency to all of the build rules for core
framework object files, and (2) making the flattened blis.h a
prerequisite for flattening cblas.h. The upshot is that (1) ensures
that the flattened cblas.h is created around the same time that
the flattened blis.h is created, and (2) ensures that the two headers
are flattened sequentially (first blis.h and then cblas.h) even when
using 'make -j[n]', which ensures that the output of the two processes
does not commingle.
- Thanks to Jeff Diamond for reporting this issue.
commit a822cb2e22b7ac0c6aec4d477f93301ccf65a296
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Thu Aug 8 13:34:37 2024 -0500
Fixed out-of-bounds read bug in sup haswell ukr. (#824)
Details:
- Fixed a bug in the bli_sgemmsup_rd_haswell_asm_1x16n() millikernel.
The kernel was erroneously performing an out-of-bounds read whenever
the singleton edge case loop executed (that is, whenever the k
dimension of the millikernel problem was not a multiple of 8). This
OOB error was the result of a copy-paste bug; when developing the
s1x16n function, I started from a copy of the s2x16n function, but
then failed to delete the instruction that reads the second element
of A in the code that handles the PR loop's edge case. Thanks to
@j-bm for reporting this bug in Issue #821 and helping narrow down
the cause to the rax register.
- CREDITS file update.
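The edge-case structure described above can be sketched in plain C. This is not the assembly kernel; demo_dot_1row() is a hypothetical stand-in that assumes only what the message states: a main loop that consumes k in steps of 8, followed by a singleton loop for the remainder.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of one row of A with one column of B, both of length k. */
float demo_dot_1row( size_t k, const float* a, const float* b )
{
    float  sum    = 0.0f;
    size_t k_iter = k / 8; /* full unrolled iterations */
    size_t k_left = k % 8; /* singleton edge-case iterations */
    size_t p      = 0;

    for ( size_t i = 0; i < k_iter; ++i )
        for ( size_t u = 0; u < 8; ++u, ++p )
            sum += a[ p ] * b[ p ];

    /* Edge case: read exactly one element of A per iteration. A two-row
       kernel would also read from a second row of A here; a one-row
       kernel must not, because no second row exists. */
    for ( size_t i = 0; i < k_left; ++i, ++p )
        sum += a[ p ] * b[ p ];

    return sum;
}
```

A leftover read of a second row in the last loop would touch memory past the end of A whenever k_left is nonzero, which matches the condition under which the bug appeared.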
commit 8820f8f91efd32e38e2995e73323656ef767bbd8
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Tue Jun 25 22:56:23 2024 -0500
Fixed typo in 4158930; variable renames. (#815)
Details:
- Fixed a typo in the "./configure --help" output for the ScaLAPACK
compatibility option implemented in 4158930.
- Trivial variable renames.
commit 31ecf820b9eb3368ad907ae6b192bf7397ebc92c
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Jun 20 18:23:23 2024 -0500
Fix a bug in the piledriver microkernels. (#814)
Details:
- At some point, the piledriver (and bulldozer and excavator)
microkernel tests via SDE had been removed from Travis CI testing.
This PR re-enables them.
- A bug in the piledriver complex gemm microkernels has also been
fixed. The `beta*C` product was not being correctly added to the `A*B`
product before writing back out to memory.
- Fixes #811.
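For reference, the update that the complex microkernels are expected to perform can be sketched in portable C. The function name and the column-major layout are assumptions made for illustration, and `ab` is assumed to already hold the (scaled) `A*B` product.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Update an m x n microtile stored in column-major order:
   C := beta * C + AB. */
void demo_update_c( size_t m, size_t n,
                    const float complex* ab,
                    float complex beta, float complex* c )
{
    for ( size_t j = 0; j < n; ++j )
        for ( size_t i = 0; i < m; ++i )
        {
            size_t idx = i + j * m;
            /* Both terms are summed before the single store to C. */
            c[ idx ] = beta * c[ idx ] + ab[ idx ];
        }
}
```

With `beta = i`, `C = 1+i`, and `A*B = 2`, the result is `i*(1+i) + 2 = 1+i`; storing either term alone would give `-1+i` or `2` instead.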
commit 415893066e966159799d96166cadcf9bb5535b1c
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jun 18 22:03:32 2024 -0500
Add ScaLAPACK compatibility mode. (#813)
Details:
- Add configure options `--enable-scalapack-compat` and `--disable-scalapack-compat`
(default disabled).
- Add a macro `BLIS_{ENABLE,DISABLE}_SCALAPACK_COMPAT` to bli_config.h.
- This option and macro control any changes to the API necessary to maintain
compatibility with ScaLAPACK. Currently, this only means disabling the complex
versions of `syr`, `syr2`, and `symv`. In the future, other changes could be
controlled by the same flag.
- Complex `syr2` wasn't enabled at the same time that complex `syr` and `symv` were.
This is now corrected.
commit 5cbec6503de335b3b63fa5d4f388fddd3aff2b61
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jun 4 11:30:22 2024 -0500
Update CREDITS
commit 729c57c15aa50030145ff702626c31839ded3502
Author: AngryLoki <AngryLoki@users.noreply.github.com>
Date: Wed Jun 5 00:28:41 2024 +0800
Fix SyntaxWarning messages from python 3.12 (#809)
Details:
- When using regexes in Python, certain characters need backslash escaping, e.g.:
```python
regex = re.compile( '^[\s]*#include (["<])([\w\.\-/]*)([">])' )
```
However, technically escape sequences like `\s` are not valid and should actually be double-escaped: `\\s`.
Python 3.12 now warns about such escape sequences, and in a later version these warnings will be promoted
to errors. See also: https://docs.python.org/dev/whatsnew/3.12.html#other-language-changes. The fix here
is to use Python's "raw strings" to avoid double-escaping. This issue can be checked for all files in the current
directory with the command `python -m compileall -d . -f -q .`
- Thanks to @AngryLoki for the fix.
commit 6d0ab74f6975fdf4d19cee06d946b09b6ca89656
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Mon May 6 16:02:03 2024 -0500
Updates to README.md section on downloading.
Details:
- Updated the text in README.md in the "How to Download BLIS" section.
The new text no longer recommends that the reader use the 'master'
branch over official releases, as the previous text did. The text was
tweaked since (a) the 'master' branch is now akin to a development
branch, and (b) the reader will no longer forgo bugfixes by sticking
to official releases since we will (going forward) publish bugfix
releases for the most recent version.
commit 01e151a9658cbe07ee0cac8b03fa13fef26df19e
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Mon May 6 15:37:27 2024 -0500
Updated RELEASING file; fixes to ReleaseNotes.md.
Details:
- Updated RELEASING file to reflect new release protocols, given the
more sophisticated policy of maintaining release candidate branches
separate from 'master' (which is now more akin to a development
branch). Further refinements to this file will likely follow.
- Fixed typos in ReleaseNotes.md. Thanks to Robert van de Geijn for
reporting these.
commit 06dddf1e51ccff70d77ee8cb731c3217e70eb730
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Mon May 6 13:47:42 2024 -0500
ReleaseNotes.md update.
commit a876918c8c79a1c3d3d95de1f283350b7249b8ae
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Mon May 6 13:37:48 2024 -0500
CHANGELOG update (1.0)
commit c2af113c7ba6d0dcc128ba36ec6e140d89180cf3
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Mon May 6 13:37:47 2024 -0500
Version file update (1.0)
commit 5ab286f61525f8ead35ecc258305a5ccd4ee096b
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Mon May 6 13:14:52 2024 -0500
Added a script to help create new rc branches.
Details:
- Added a new script, build/start-new-rc.sh, which:
1. Updates the version file with a new version string.
2. Commits (locally) the version string update.
3. Updates the CHANGELOG file with the output of 'git log'.
4. Commits (locally) the CHANGELOG file update.
5. Creates a new branch whose name is equal to "<vers>-rc0" where
<vers> is the new version string.
6. Reminds the user to execute some final steps if everything looks
good.
This new script will help in the future when it's time to start a new
release candidate branch/lineage off of 'master'. Note that this
script is based on build/bump-version.sh (which itself may change in
the future due to changes in the way versions/releases will be handled
going forward).
commit cad51491e8a0b306015a5a02881dc2a9b60dd8d9
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Tue Apr 30 16:46:54 2024 -0500
Use "-i auto" by default in test/3 drivers.
Details:
- Request default induced method behavior of BLIS via "-i auto" when
running the standalone performance drivers in test/3 via the runme.sh
script present in that directory. (Previously, the runme.sh script
would use "-i native" by default.) This change was originally intended
for fd1a7e3.
commit fd1a7e3ca9547718aa61c806848099705216182b
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Thu Apr 25 15:00:59 2024 -0500
Allow test/3 drivers to use default ind_t method. (#804)
Details:
- Previously, the standalone performance drivers in test/3 were written
under the assumption that the user would want to explicitly test
either native execution *or* 1m. But because the accompanying runme.sh
script defaults to passing "native" in for the -i command line option
(which explicitly sets the induced method type), running the script
without modification causes the test drivers to use slow reference
microkernels on systems where native complex-domain microkernels are
not registered -- which will yield poor performance for complex-domain
level-3 operations. Furthermore, even if a user was aware of this, the
test drivers did not support any single value for the -i option that
would test BLIS using the library's default behavior -- that is, using
1m on systems where it is needed and native execution on systems that
have native microkernels implemented and registered.
- This commit addresses the aforementioned issue by supporting a new
value for the -i option: "auto". The "auto" value causes the driver
to avoid explicitly setting the induced method altogether, leaving
BLIS's default behavior in place. This "auto" option is also now the
default setting within the runme.sh script. Thanks to Leick Robinson
for finding and reporting this issue.
- Also added support for "nat" as a shorthand for "native", which
the help text already (erroneously) claimed was supported.
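The accepted values for the -i option can be sketched as a small parser. The enum and function names are hypothetical; the actual drivers map these strings onto BLIS's own ind_t handling, which is not reproduced here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes for the -i option. */
typedef enum
{
    DEMO_IND_NATIVE,
    DEMO_IND_1M,
    DEMO_IND_AUTO,   /* do not set the induced method; keep the default */
    DEMO_IND_INVALID
} demo_ind_t;

demo_ind_t demo_parse_ind( const char* arg )
{
    if ( arg == NULL )                  return DEMO_IND_INVALID;
    if ( strcmp( arg, "auto"   ) == 0 ) return DEMO_IND_AUTO;
    if ( strcmp( arg, "native" ) == 0 ||
         strcmp( arg, "nat"    ) == 0 ) return DEMO_IND_NATIVE;
    if ( strcmp( arg, "1m"     ) == 0 ) return DEMO_IND_1M;
    return DEMO_IND_INVALID;
}
```

In this sketch a caller would skip the call that sets the induced method whenever DEMO_IND_AUTO is returned, leaving the library's default selection in place.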
commit a49238e6141c96a41aa3c2a4adb0b0663d0b4968
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Apr 24 15:07:18 2024 -0500
Refactor the control tree and other infrastructure (#710)
Details:
1. A "plugin" architecture.
- Users are now able to register new kernels, kernel preferences, and
blocksizes at runtime, directly from user applications.
- Plugins can be created, configured, and built using only an installed
version of BLIS -- no source or source code changes required.
- Plugins support both reference and optimized kernels, as well as
custom configuration-to-kernel-set mappings.
- Building plugins (including reference and relevant optimized kernels)
for enabled architectures or architecture families is automated, as is
linking into the final library.
- The configure script is now installed as 'configure-plugin'. In this
mode, it can be used to initialize a plugin from a template including
optional example code, and prepare a build system for compiling the
plugin into a shared or static library.
- Additional configuration files, templates, and build system components
are also installed to '%prefix%/share/blis'.
- The cntx_t struct now has extensible data structures for holding
kernels, preferences, and blocksizes. These are based on a "stack"
structure which contains a list of fixed-size data blocks. Adding a
new entry (which may require allocating a new block or reallocating
the block pointer array) requires locking, but looking up entries is
lock-free and takes O(1) time.
- Kernels can depend on either 1 or 2 type parameters (e.g.
mixed-precision packing requires 2). The func2_t struct supports
the latter, but can be implicitly cast to func_t if only "diagonal"
entries are needed. The number of type parameters can be inferred from
the kernel ID for type safety.
- Functions have been added to register new kernels, preferences, and
blocksizes with the global kernel structure (gks). This creates
corresponding entries in each allocated context and returns the next
available ID. Plugins use this API to register user kernels, although
the user is responsible for tracking the returned IDs for later
lookup. Setting newly-registered reference kernels, as well as
overriding these with optimized kernels is done in exactly the same
manner as in bli_cntx_init_ref() and bli_cntx_init_<subconfig>().
2. Restructuring of the control and thread control trees.
- The control tree has been substantially restructured to support more
flexibility.
- The "default" control trees for gemm (also used for
hemm/symm/herk/her2k/syrk/syr2k/trmm/trmm3) and trsm are now
represented as a single structure containing all necessary control
tree nodes and parameters.
- An API has been added to modify the default gemm/trsm control trees.
- This same API is used by the framework and packm/gemm/trsm variants
to access specific control tree nodes.
- Users can alternatively create a custom control tree from scratch.
- The blocksizes are now encoded directly in the control tree, rather
than via loop IDs. The logic for adjusting blocksizes for certain
operations has been moved to the control tree initialization.
- Type information is encoded in the control tree to drive proper
selection of packing and computational kernels provided by the user.
- The packing microkernel now receives an opaque "params" struct which
is user-definable and can be used to pass additional information
through the call stack.
- The auxinfo_t struct has been updated with a .params field for
opaque user data as well as the global offsets of the current
microtile.
- The packm and gemm variants can be overridden by the user, and also
receive an opaque params struct via the associated control tree
node.
- The structure-aware packing kernel bli_packm_struc_cxk() is no longer
hard-coded to be called from the default packm variant, but can be
overridden by the user. It also supports mixed-precision/mixed-domain
natively now.
- The thread control tree (thrinfo_t) is now created entirely up-front
by inspecting the control tree. The required number of threads at each
level is encoded in the control tree via loop IDs (actually a bitfield
of loop IDs), although the ordering and number of such IDs is
arbitrary. The logic for adjusting the number of threads at each level
based on operation type (e.g. trmm) is now in the control tree
initialization and expressed by combining loop IDs from multiple
levels into a single level.
- The mem_t object containing the pack buffer pointer has been moved
from the control tree to the thread control tree. NOTE: **The control
tree is now strictly const throughout the operation, and only a
single copy is shared by all threads.**
- The thread control tree node for packing has been changed so that
there is no longer a "fake" node indicating a team of single threads.
Instead, the number of threads and thread IDs in the "normal" thread
control tree node are used. This change has also been made to the
gemmsup thread control tree and packing variants, as well as to the
gemmlike sandbox.
- Parameters controlling packing (e.g. inversion of the diagonal,
direction, schema) are not stored directly in the control tree but in
the opaque params struct. The packing control tree node and its
default params struct are stored together in the "combined"
gemm/trsm control tree structure and initialized as a unit. Users can
update these parameters individually or substitute a custom packm
variant and params struct.
- The "target" and "execution" datatypes have been removed from the obj_t
struct and replaced by type information in the control tree.
- The "sub-node" and "sub-prenode" of a control tree node have been
replaced by an arbitrary number of sub-nodes accessed by index. There
is a hard cap on the number of sub-nodes (currently 2). Sub-nodes are
added during control tree initialization, *after*
creation/initialization of the parent node through an updated API.
- The level-3 thread decorator has been significantly simplified and
directly calls bli_l3_int(). The control tree is created externally,
and it is no longer necessary to alias matrices or set object pack
schemas. Also, the rntm_t passed in may be NULL. Finally, family
and scalar information is no longer needed here.
- bli_l3_int() is now a simple inline function which extracts the next
control tree node and variant and calls it.
- bli_*_front() have been removed and inlined into the expert object
API with significant simplification.
- 1m (or other induced method) no longer uses an alternative cntx_t.
- The .pack_fn/.ker_fn pointers and associated params fields on the
obj_t were removed in favor of the present solution.
3. Overhaul of variable substitution in configure script.
- The configure script has been somewhat re-written to use a
centralized mechanism for substituting variables into build system and
other configuration files.
- All substitution variables go through the same pathway now, which
necessitated some variable naming changes for variables which were
named the same in e.g. Makefile and bli_config.h but with
different definitions.
- CC and CXX variables can now contain spaces, e.g. 'g++ -std=c++17'.
This provides better support for integration with build tooling such
as autotools.
4. Overhaul of packing kernels.
- Previously there were two packing kernels referenced in the cntx_t
structure for MRxk and NRxk shaped micropanels, respectively. These
have now been merged into one kernel which is responsible for packing
any dense rectangular portion of either A or B.
- The packing kernel now receives information about the register
blocksize (cdim_max) and duplication factor (the "broadcast-B"
format, although this can also apply to the A matrix).
- The structure-aware packing kernel (bli_packm_struc_cxk(), which is
now user-overridable) also receives global offsets of the current
micropanel within A or B.
- Explicit kernels for packing the diagonal blocks of
triangular/symmetric/Hermitian matrices have been added to the
cntx_t. This means that the bli_packm_struc_cxk() "kernel" no longer
needs to directly touch data (except to zero out some regions).
- bli_packm_struc_cxk() has also been updated to work only in terms of
fundamental elements (i.e., real datatypes) when computing offsets and
when zeroing data, which greatly simplifies mixed-domain/1m packing.
- bli_packm_scalar() has been updated to better support complex scalars
in mixed-domain operations.
- Pack schemas for PACKED_ROW_PANELS* and PACKED_COL_PANELS* have
been merged into simply PACKED_PANELS*. This reflects the merging of
the packing kernels into a single generic kernel. There were only a
very few places which needed the row/column information and this is
now supplied by alternative means.
- Packing variants always behave "as if" the A matrix were being packed
(i.e. the code assumes packing column-stored row panels). Packing of B
is handled by applying an implicit or explicit transpose before
packing. This change also applies to gemmsup.
5. Improved MD/MP support.
- All level-3 operations (except trsm) now support full
mixed-domain/mixed-precision operation.
- Explicit 1m packing kernels have been added in the cntx_t.
- An explicit 1m microkernel wrapper has been added to the cntx_t.
- An extra packing kernel for the "ro" format has been added, along with
the pack_t enumeration value. This supports the packing for
real*complex -> real, including potential scaling by a complex alpha,
support for structured matrices, etc.
- Extra microkernel wrappers for mixed-domain operations have been added
to support the 'ccr' (and by extension, 'crc'), 'rcc', and 'crr'
cases. Notably this includes full support for general stride storage
and complex alpha/beta.
- Packing kernels and gemm microkernels are now "templated" based on two
type parameters rather than one. For packing this allows direct
optimization of mixed-precision kernels, and for gemm microkernels
this allows direct optimization of mixed-precision without writing to
a temporary buffer. Reference packing kernels are directly
instantiated for all mixes of precisions, while by default
mixed-precision gemm microkernels are supported via a microkernel
wrapper. The "old" way of specifying optimized kernels using a single
type parameter works unchanged.
- alpha and beta are typecast appropriately to the computational or
output datatype, respectively, and **always** to the complex domain.
Scalar typecasting has also been added to gemmsup for safety.
- The gemm macrokernel doesn't have to do any typecasting anymore, as a
microkernel wrapper or optimized mixed-precision/mixed-domain kernel
now handles this.
- 1m and mixed-domain operations now always use a microkernel wrapper,
rather than adjusting parameters in the gemm macrokernel.
- The gemmt macrokernel **does** still have to handle explicit
write-back of microtiles which intersect the diagonal, although
typecasting has already been performed.
- The gemmt_x_ker_var2(), trmm_xx_ker_var2(), and trsm_xx_ker_var2()
functions have been removed. The appropriate macrokernel pointer is
selected during control tree initialization.
- Real domain MR/NR are checked for even-ness based on the gemm
microkernel's row preference in order to guarantee proper 1m and
mixed-domain operation.
- Full range of mixed-domain/mixed-precision functionality tested in the
testsuite ('input.*.mixed').
6. Other changes:
- The build system has been updated to support C++ source files
throughout the framework. While the intent is not to add such files to
BLIS itself, this supports plugins written in C++.
- Many instances of configuration-specific code have been simplified by
introducing an INSERT_GENTCONF macro which instantiates a block of
code for each enabled sub-configuration. The ConfigurationHowTo.md
document has been updated accordingly.
- PASTEMAC?/PASTECH?/PASTEF77? have been removed in favor of
variadic macros which accept any number of arguments (up to a
reasonable limit).
- The INSERT_GENTFUNC* macros have been updated to clean up
mixed-precision and mixed-domain instantiations.
- bli_align_dim_to_mult() has been updated to support rounding either up
or down based on a flag.
- Checking for empty matrices and other early exits (level-3 only) has
been consolidated into a single utility function.
- The auxinfo_t struct is always passed as const.
- The new function bli_obj_alias_submatrix() aliases a matrix while also
resetting the root to NULL, offsets to zero (while adjusting the
buffer), and applying any implicit transpose.
- Level-3 pruning functions now only check matrix structure to see what
to do, not the operation family.
- gemmsup packing has been updated to use the "normal" pack buffer
allocation routines.
- Remove duplicate checks for early return from gemmsup handler.
- bli_determine_blocksize() has been significantly simplified.
- Partitioning packed panels is no longer allowed.
- Added bli_xxsame macros.
- Automated the calculation of info bit shifts and masks based on
predefined bit sizes for various flags. This greatly simplifies
reordering, adding, or removing flags from the info/info2 bitfields.
- Moved more BLIS_NUM_* macros into the corresponding enums as the
last entry so that the value is automatically computed.
- Better const-correctness in some level0 scalar macros.
- Better mixed-precision support in some level0 scalar macros.
- Added a bli_axpbys_mxn() macro.
- bli_thread_range_sub() takes explicit thread ID and number of threads
rather than a thrinfo_t node.
- "De-templated" BLIS gemmlike sandbox (specifically, bls_gemm_bp_var1()
and bls_packm_var1()).
- Combined bls_l3_packm_[ab]() into one function with thin wrappers.
- Deleted bls_packm_var[23]().
- Add a "termination tag" to the testsuite output so that
'make check-blis' can accurately check for successful completion.
- Add a new function to centrally compute FLOPs for level-3 operations
in the testsuite.
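The "stack" structure mentioned in item 1 (a list of fixed-size data blocks with O(1) lookup) can be sketched roughly as follows. This is a single-threaded illustration under assumed names (demo_stack_t, DEMO_BLOCK_LEN); it omits the locking on insertion and the care needed to keep lookups lock-free, and it is not the cntx_t implementation.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_BLOCK_LEN 4 /* entries per fixed-size block (arbitrary here) */

typedef struct
{
    void*** blocks;     /* array of pointers to fixed-size blocks */
    size_t  num_blocks; /* blocks currently allocated */
    size_t  size;       /* entries currently in use */
} demo_stack_t;

/* Append an entry and return its index, or (size_t)-1 on allocation
   failure. A thread-safe version would hold a lock here. */
size_t demo_stack_push( demo_stack_t* s, void* entry )
{
    if ( s->size == s->num_blocks * DEMO_BLOCK_LEN )
    {
        /* Grow only the pointer array; existing blocks never move. */
        void*** nb = realloc( s->blocks,
                              ( s->num_blocks + 1 ) * sizeof( void** ) );
        if ( nb == NULL ) return ( size_t )-1;
        s->blocks = nb;

        void** blk = calloc( DEMO_BLOCK_LEN, sizeof( void* ) );
        if ( blk == NULL ) return ( size_t )-1;
        s->blocks[ s->num_blocks++ ] = blk;
    }
    s->blocks[ s->size / DEMO_BLOCK_LEN ][ s->size % DEMO_BLOCK_LEN ] = entry;
    return s->size++;
}

/* O(1) lookup: one division and one remainder locate the entry. */
void* demo_stack_get( const demo_stack_t* s, size_t i )
{
    if ( i >= s->size ) return NULL;
    return s->blocks[ i / DEMO_BLOCK_LEN ][ i % DEMO_BLOCK_LEN ];
}

/* Release all blocks and the pointer array. */
void demo_stack_free( demo_stack_t* s )
{
    for ( size_t b = 0; b < s->num_blocks; ++b ) free( s->blocks[ b ] );
    free( s->blocks );
    s->blocks = NULL; s->num_blocks = 0; s->size = 0;
}
```

Because blocks are never resized or moved, an entry's address stays stable after later insertions; only the small pointer array is reallocated when a new block is needed.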
commit a316d2c6c33fc1f8f7c58c4210ab203f48349041
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Mar 28 12:52:00 2024 -0500
Fix incorrect commenting of `BLIS_RNTM_INITIALIZER` and `BLIS_OBJECT_INITIALIZER`.
commit 664cc6bc3ea610b4ecea63d78c6024c48f045635
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Mar 26 16:25:17 2024 -0500
Update BLIS_*_INITIALIZER macros for C++ compatibility. (#802)
Details:
- Remove designated initializer syntax. This isn't officially supported
until C++20.
- Arrange initializers in the order in which they are defined in the
struct. Even with standard or extension support for designated
initializers, initializing non-static members out-of-order is an
error in C++.
- Remove the conditional code which uses '-1' as the default value of
the 'pack_buf' member of 'mem_t' in C, but 'BLIS_BUFFER_FOR_GEN_USE'
in C++. Simply use the latter as a common-sense default.
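A sketch of the initializer style described above, using a hypothetical struct and macro (demo_mem_t, DEMO_MEM_INITIALIZER) rather than the real BLIS definitions. The members are listed positionally, in declaration order, so the same macro should compile as C and as pre-C++20 C++.

```c
#include <assert.h>

/* Hypothetical struct; member names and defaults are illustrative only. */
typedef struct
{
    void* buf;
    int   buf_type;
    long  size;
} demo_mem_t;

#define DEMO_BUFFER_FOR_GEN_USE 3

/* Positional initializer in declaration order. The designated form
   { .size = 0, .buf = 0 } is valid C99 but is rejected by C++ before
   C++20, and out-of-order designators remain an error in C++20. */
#define DEMO_MEM_INITIALIZER \
{ \
    0,                       /* buf      */ \
    DEMO_BUFFER_FOR_GEN_USE, /* buf_type */ \
    0                        /* size     */ \
}
```

Usage is unchanged for callers: `demo_mem_t m = DEMO_MEM_INITIALIZER;`. The block comments inside the macro are deliberate, since a `//` comment before a line continuation would swallow the rest of the macro.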
commit 1a8c8180b32cf5988bf9eb5d2f0f8111a729993a
Author: John <50754967+j-bm@users.noreply.github.com>
Date: Thu Feb 15 12:35:10 2024 -0400
Add cpu part codes for various manufacturers and use in the code (#794)
* Add cpu_id symbols for arm v8.
* Add symbols for arm v7.
* Always assume firestorm on Apple aarch64.
* Fixes incorrect usage of model vs. part in some places.
* Fixes #793
---------
Co-authored-by: J <jal@o75snap.localdomain>
commit c382d8bdccc07e22a341fe04960f0cbf4eec083b
Author: Igor Zhuravlov <zhuravlov.ip@ya.ru>
Date: Sun Jan 14 04:03:31 2024 +0000
Fix errors and typos in docs/BLIS*API.md (#791)
Details:
- Fixed errors and unified formatting in docs/BLIS*API.md docs.
commit a72e4569f2a03cc3578c019bf7ce25491a44137d
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Wed Dec 6 18:21:47 2023 -0600
Include bli_config.h before bli_system.h in cblas.h. (#789)
Details:
- Previously, in cblas.h, bli_config.h was being #included *after*
bli_system.h, which meant that the BLIS_ENABLE_SYSTEM macro was
never defined in time for proper OS detection. This bug only
affected cblas.h -- blis.h had been correctly #including
bli_config.h before bli_system.h since fb93d24. Thanks to
Edward Smyth for reporting this bug and suggesting the fix.
commit 1236ddab455ef3a6293ab394ff06b3a19c2913d9
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Sun Dec 3 16:42:34 2023 -0600
Fixed random segfault in test/3 drivers. (#788)
Details:
- Fixed a segfault in the non-gemm test drivers in test/3 that was the
result of sometimes leaving either .n_str or .k_str fields of the
params_t struct uninitialized, depending on the operation in question.
For example, in test_hemm.c, init_def_params() would only initialize
the .m_str and .n_str fields, but not the .k_str field. Even though
hemm doesn't use a 'k' dimension, the proc_params() function (called
via parse_cl_params()) universally attempts to convert all three into
integers via sscanf(), which was understandably failing when one of
those strings was a NULL pointer. I'm not sure how this code ever
worked to begin with. Special thanks to Leick Robinson for finding and
reporting this bug.
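The failure mode can be illustrated with a small helper. The commit itself fixes the problem by initializing the fields; the NULL guard below is only a defensive sketch, and demo_parse_dim() is a hypothetical name that does not appear in the test drivers.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a dimension string to an integer. Returns 1 on success and 0
   if the string is NULL or does not begin with an integer. */
int demo_parse_dim( const char* str, long* out )
{
    /* Passing a NULL string to sscanf() is undefined behavior, so an
       uninitialized (NULL) field must be rejected before the call. */
    if ( str == NULL ) return 0;
    return sscanf( str, "%ld", out ) == 1;
}
```

With such a guard, an operation that has no 'k' dimension could leave its string unset and have the parser report failure instead of crashing.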
commit 141a6c9a8e7557d9c7d28aecedec9dc5377dba13
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Tue Nov 21 12:26:43 2023 -0600
Install helper headers to INCDIR prefix. (#787)
Details:
- Install one-line headers to INCDIR whose entire purpose is to
#include the actual headers within the local 'blis' header directory
so that applications can #include "blis.h" instead of #include
<blis/blis.h> (and/or "cblas.h" instead of <blis/cblas.h> if CBLAS is
enabled) when headers are installed to global paths. (Note that
INCDIR is the installation prefix for headers as specified by
'--includedir=INCDIR', which defaults to 'PREFIX/include' if not
specified.) Not sure how this problem went unreported for so long,
since presumably any user trying to #include "blis.h" from a global
installation would have encountered a compiler error.
- The one-line blis.h and cblas.h headers now reside in the 'build'
directory, ready to install as is.
- Thanks to Jed Brown for reporting this via Issue #786, and to
Devin Matthews and Mo Zhou for their engagement.
- Harmonized the rule in the top-level Makefile for installing blis.pc
into SHAREDIR/pkgconfig with conventions for others vis-a-vis
verbosity/non-verbosity.
commit 2d9439298b336aa6d0ee000a5285a3adb4e6d462
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Nov 21 12:18:07 2023 -0600
Allow users to define [sd]complex using std::complex (#784)
Details:
- In C++ applications, it makes a lot of sense to interface to BLIS
using C++'s standard complex number library, which uses a template
class std::complex. Obviously BLIS doesn't know anything about this
and defaults to a custom struct to represent complex numbers. This PR
updates the bli_[cz]{real,imag}() functions to accept std::complex
numbers when a C++ compiler is being used. Note that this has no
effect on the compilation of the BLIS library (or testsuite), and only
comes into play when including blis.h into a C++ project and forcing
the use of std::complex for scomplex and dcomplex.
- The application can explicitly request std::complex-based types via:
#define BLIS_ENABLE_STD_COMPLEX
#include <blis.h>
// Call BLIS functions using std::complex<double> here.
- Fixed a bug in the definition of some scalar level-0 macros, since
bli_creal()/bli_cimag() and bli_zreal()/bli_zimag() are no longer
interchangeable.
commit f7ce54a252028483e4c6af619015eb22063d5541
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Fri Nov 3 15:52:57 2023 -0500
CREDITS file update.
commit 05388ddb66f8bf2d62009b162d64bf2d99226b83
Author: Aaron Hutchinson <113382047+Aaron-Hutchinson@users.noreply.github.com>
Date: Fri Nov 3 13:30:31 2023 -0700
Added 'sifive_x280' subconfig, kernel set. (#737)
Details:
- Added a new 'sifive_x280' subconfiguration for SiFive's x280 RISC-V
instruction set architecture. The subconfig registers kernels from a
correspondingly new kernel set, also named 'sifive_x280'.
- Added the aforementioned kernel set, which includes intrinsics- and
assembly-based implementations of most level-1v kernels along with
level-1f kernels axpy2v and dotaxpyv, packm kernels, and level-3 gemm,
gemmtrsm_l, and gemmtrsm_u microkernels (plus supporting files).
- Registered the 'sifive_x280' subconfig as belonging to a singleton
family by the same name.
- Added an entry to '.travis.yml' to test the new subconfig via qemu.
- Updates to 'travis/do_riscv.sh' script to support the 'sifive_x280'
subconfig and to reflect updated tarball names.
- Special thanks to Lee Killough, Devin Matthews, and Angelika Schwarz
for their engagement on this commit.
commit 7a87e57b69d697a9b06231a5c0423c00fa375dc1
Author: Srinivas Yadav <43375352+srinivasyadav18@users.noreply.github.com>
Date: Sat Oct 14 02:05:41 2023 -0500
Fixed HPX barrier synchronization (#783)
Details:
- Fixed HPX barrier synchronization. HPX was hanging at larger core
counts because BLIS was using non-HPX synchronization primitives, but
only HPX synchronization primitives should be used under the HPX
runtime. Hence, a C-style wrapper, hpx_barrier_t, was introduced to
perform HPX barrier operations.
- Replaced hpx::for_loop with hpx::futures. Using hpx::for_loop with
hpx::barrier on n_threads greater than the actual hardware thread count
causes synchronization issues that make HPX hang. This can be avoided
by using hpx::futures, which are relatively lightweight, robust,
and scalable.
commit 8fff1e31da1c87e46cacec112b0ac280ab47cd8b
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Thu Oct 12 15:51:41 2023 -0500
Fixed bug in sup threshold registration. (#782)
Details:
- Fixed a bug that resulted in BLIS non-deterministically calling the
gemmsup handler, irrespective of the thresholds that are registered
via bli_cntx_set_blkszs().
- Deep dive: In bli_cntx_init_ref.c, the default values for the gemmsup
thresholds (BLIS_[MNK]T blocksizes) were being set to zero so that no
operation ever matched the criteria for gemmsup (unless specific sup
thresholds are registered). HOWEVER, these thresholds are set via
bli_cntx_set_blkszs() which calls bli_blksz_copy_if_pos(), which was
only copying the thresholds into the gks' cntx_t if the values were
strictly positive. Thus, the zero values passed into
bli_cntx_set_blkszs() were being ignored and those threshold slots
within the gks were left uninitialized. The upshot of this is that the
reference gemmsup handler was being called for gemm problems
essentially at random (and as it turns out, very rarely the reference
gemmsup implementation would encounter a divide-by-zero error).
- The problem was fixed by changing bli_blksz_copy_if_pos() so that it
copies values that are non-negative (values >= 0 instead of > 0). The
function was also renamed to bli_blksz_copy_if_nonneg().
- Also needed to standardize use of -1 as the sole value to embed into
blksz_t structs as a signal to bli_cntx_set_blkszs() to *not* register
a value for that slot (and instead let whatever existing values
remain). This required updates to the bli_cntx_init_*() functions for
bgq, cortexa9, knc, penryn, power7, and template subconfigs, as some
of these codes were using 0 instead of -1.
- Fixes #781. Thanks to Devin Matthews for identifying, diagnosing, and
proposing a fix for this issue.
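The changed copy condition can be sketched as follows. demo_blksz_t and demo_blksz_copy_if_nonneg() are simplified, hypothetical stand-ins for blksz_t and the renamed function; only the comparison (>= 0 rather than > 0) and the -1 convention are taken from the description above.

```c
#include <assert.h>

#define DEMO_NUM_DT 4 /* one slot per datatype: s, d, c, z */

typedef struct { long v[ DEMO_NUM_DT ]; } demo_blksz_t;

/* Copy each value from src to dst unless it is negative. Zero is a
   legitimate value (for example, a threshold that disables a code
   path), while -1 means "leave the existing value alone". */
void demo_blksz_copy_if_nonneg( const demo_blksz_t* src, demo_blksz_t* dst )
{
    for ( int dt = 0; dt < DEMO_NUM_DT; ++dt )
        if ( src->v[ dt ] >= 0 )
            dst->v[ dt ] = src->v[ dt ];
}
```

Under the old strictly-positive test, the zero in the first slot of a source such as { 0, -1, 8, -1 } would have been skipped, leaving that destination slot holding whatever it contained before.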
commit 1e264a42474b535431768ef925bbd518412d392e
Author: Abhishek Bagusetty <59661409+abagusetty@users.noreply.github.com>
Date: Mon Oct 2 18:29:46 2023 -0500
Update zen3 subconfig to support NVHPC compilers. (#779)
Details:
- Parse $(CC_VENDOR) values of "nvc" in 'zen3' make_defs.mk file.
- Minor refactor to accommodate above edit.
- CREDITS file update.
commit c2099ed2519dcac8ee421faf999b36e1c2260be7
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Mon Oct 2 14:56:48 2023 -0500
Fixed brokenness when sba is disabled. (#777)
Details:
- Previously, disabling the sba via --disable-sba-pools resulted in a
segfault due to a sanity-check-triggering abort(). The problem was
that the sba, as currently used in the l3 thread decorators, did not
yet (fully) support pools being disabled. The solution entailed
creating a wrapper function, bli_sba_array_elem(), which either calls
bli_apool_array_elem() (when sba pools are enabled at configure time)
or returns a NULL sba_pool pointer (when sba pools are disabled), and
calling bli_sba_array_elem() in place of bli_apool_array_elem(). Note
that the NULL pointer returned by bli_sba_array_elem() when the sba
pools are disabled does no harm since in that situation the pointer
goes unreferenced when acquiring and releasing small blocks. Thanks to
John Mather for reporting this bug.
- Guarded the bodies of bli_sba_init() and bli_sba_finalize() with
#ifdef BLIS_ENABLE_SBA_POOLS. I don't think this was actually necessary
to fix the aforementioned bug, but it seems like good practice.
- Moved the code in bli_l3_thrinfo_create() that checked that the array*
pointer is non-NULL before calling bli_sba_array_elem() (previously
bli_apool_array_elem()) into the definition of bli_sba_array_elem().
- Renamed various instances of 'pool' variables and function parameters
to 'sba_pool' to emphasize what kind of pool it represents.
- Whitespace changes.
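A rough sketch of the wrapper idea described above, assuming simplified
types. The names pool_t, array_t, apool_array_elem(), sba_array_elem(), and
the ENABLE_SBA_POOLS macro are hypothetical stand-ins chosen for this
example; the real BLIS definitions differ.

```c
#include <assert.h>
#include <stddef.h>

/* Models a build with pools enabled; remove this line to model a build
   configured with pools disabled. (Hypothetical macro name.) */
#define ENABLE_SBA_POOLS 1

/* Hypothetical stand-ins for the pool and the per-thread array of pools. */
typedef struct { int id; } pool_t;
typedef struct { pool_t* elems; size_t n; } array_t;

/* Stand-in for the underlying accessor: return the i-th pool. */
static pool_t* apool_array_elem( size_t i, array_t* array )
{
    assert( i < array->n );
    return &array->elems[ i ];
}

/* Wrapper: performs the NULL check itself and returns NULL when pools
   are disabled, so callers never touch the underlying accessor. */
static pool_t* sba_array_elem( size_t i, array_t* array )
{
#ifdef ENABLE_SBA_POOLS
    if ( array == NULL ) return NULL;
    return apool_array_elem( i, array );
#else
    ( void )i; ( void )array;
    return NULL;
#endif
}
```

Placing the NULL check inside the wrapper mirrors the move described in the
third bullet: call sites no longer need their own guard.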
commit 37ca4fd168525a71937d16aaf6a13c0de5b4daef
Author: Field G. Van Zee <fgvanzee@gmail.com>
Date: Thu Sep 28 16:37:57 2023 -0500
Implemented [cz]symv_(), [cz]syr_(), [cz]rot_(). (#778)
Details:
- Expanded existing BLAS compatibility APIs to provide interfaces to
[cz]symv_(), [cz]syr_(). This was easy since those operations were
already implemented natively in BLIS; the APIs were previously
omitted only because they were not formally part of the BLAS.
- Implemented [cz]rot_() by feeding code from LAPACK 3.11 through
f2c.
- Thanks to James Foster for pointing out that LAPACK contains these
additional symbols, which prompted these additions, as well as for
testing the [cz]rot_() functions from Julia's test infrastructure.
- CREDITS file update.
commit 6f412204004666abac266409a203cb635efbabf3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 26 18:00:54 2023 -0500
Added 'altra', 'altramax' subconfigs. (#775)
Details:
- Forward-ported 'altra' and 'altramax' subconfigurations from the
older 'stable' branch lineage [1]. These subconfigs primarily target
the Ampere Altra and AltraMax (ARM) processors. They also contain
"QuickStart" directories with information and scripts to help
use BLIS on these microarchitectures. Thanks to Jeff Diamond and
Leick Robinson for developing these subconfigs and resources.
- Updated kernels/armv8a/3/bli_gemm_armv8a_asm_d6x8.c according to
changes in the 'stable' lineage, mostly related to re-enabling of
assembly code branches that target general stride IO.
[1] Note that the 'stable' branch is being used to make sure that more
recent commits do not introduce unreasonable performance
regressions. As such, the name should be interpreted as shorthand
for "performance stable," not "API stable."
commit a4a63295b96ed5b32f4df6477d24db07bf431202
Author: Srinivas Yadav <43375352+srinivasyadav18@users.noreply.github.com>
Date: Tue Sep 26 17:58:38 2023 -0500
Fixes to HPX runtime code path. (#773)
Details:
- Fixed hpx::for_each invocation and replaced it with hpx::for_loop. The
HPX runtime was initialized using hpx::start, but the hpx::for_each
function was being called on a non-HPX runtime (i.e., the standard
BLIS runtime with a single main thread). To run hpx::for_each on the
HPX runtime correctly, the code now uses
hpx::run_as_hpx_thread(func, args...).
- Replaced hpx::for_each with hpx::for_loop, which eliminates use of
hpx::util::counting_iterator.
- Employ hpx::execution::chunk_size(1) to make sure that a thread
resides on a particular core.
- Replaced hpx::apply() with updated version hpx::post().
- Initialize tdata->id to 0 in libblis.c, as it is the main thread and
is needed for writing results to the output file.
- By default, if not specified, the HPX runtime uses all N threads/cores
available in the system. To use only n_threads out of the N available
threads, we use hpx::execution::experimental::num_cores(n_threads).
commit c6546c1131b1ddd45ef13f9f2b620ce2e955dbf8
Author: John Mather <54645798+jmather-sesi@users.noreply.github.com>
Date: Wed Sep 20 13:41:07 2023 -0400
Fixed broken link in Multithreading.md. (#774)
Details:
- Replaced 404'd link in docs/Multithreading.md with an archive from
The Wayback Machine.
- CREDITS file update.
commit 6dcf7666eff14348e82fbc2750be4b199321e1b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 27 14:18:57 2023 -0500
Revamped bli_init() to use TLS where feasible. (#767)
Details:
- Revamped bli_init_apis() and bli_finalize_apis() to use separate
bli_pthread_switch_t objects for each of the five sub-API init
functions, with the objects for the 'ind' and 'rntm' sub-APIs being
declared with BLIS_THREAD_LOCAL. This allows some APIs to be treated
as thread-local and the rest as thread-shared. Thanks to Edward Smyth
for requesting application thread-specific rntm_t structs, which
inspired these changes.
- Combined bli_thread_init_from_env() and bli_pack_init_from_env() into
a new function, bli_rntm_init_rntm_from_env(), and placed the combined
code in bli_rntm.c inside of a new bli_rntm_init() function. Then
removed the (now empty) bli_pack_init() and _finalize() function defs.
- Deprecated bli_rntm_init() for the purposes of initializing a rntm_t
(temporarily preserving it as bli_rntm_clear() in a cpp-undefined code
block) so that the function name could be used for the aforementioned
bli_rntm_init() function.
- Updated libblis_test_pobj_create() in test_libblis.c to use a static
rntm_t initializer instead of the deprecated bli_rntm_init()
function-based option.
- Minor updates to docs/Multithreading.md, including removal of
bli_rntm_init() in the example of how to initialize rntm_t structs.
- Changed the return value of bli_gks_init(), bli_ind_init(),
bli_memsys_init(), bli_thread_init(), and bli_rntm_init() (and their
finalize() counterparts) from 'void' to 'int' so that those functions
match the function type expected by bli_pthread_switch_on()/_off().
Those init/finalize functions now return 0 to indicate success, which
is needed so that the switch actually changes state from off to on
and vice versa.
- Defined bli_thread_reset(), which copies the contents of the
global_rntm_at_init() struct into the global_rntm struct (for the
current application thread).
- Guard calls to bli_pthread_mutex_lock()/_unlock() in
- bli_pack_set_pack_a() and _pack_b()
- bli_rntm_init_from_global()
- bli_thread_set_ways()
- bli_thread_set_num_threads()
- bli_thread_set_thread_impl()
- bli_thread_reset()
- bli_l3_ind_oper_set_enable()
with #ifdef BLIS_DISABLE_TLS (since TLS precludes the possibility of
race conditions).
- In frame/base/bli_rntm.c, declare global_rntm, global_rntm_at_init,
and global_rntm_mutex as BLIS_THREAD_LOCAL so that separate
application threads can change the number of ways of BLIS parallelism
independently from one another.
- Access global_rntm only via a new private (not exported) function,
bli_global_rntm(). Defined a similar function for a rntm_t new to
this commit, global_rntm_at_init, which preserves the state of the
global rntm at initialization-time.
- In frame/3/bli_l3_ind.c, added a guard to the declaration of the
static variable oper_st_mutex with #ifdef BLIS_DISABLE_TLS so that the
mutex is omitted altogether when TLS is enabled (which prevents the
compiler from warning about an unused variable).
- Removed redundant code from bli_thread.c:
#ifdef BLIS_ENABLE_HPX
#include "bli_thread_hpx.h"
#endif
since this code is already present in bli_thread.h.
- Thanks to Minh Quan Ho for his review of and feedback on this commit.
- Comment updates.
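The reason the init/finalize functions now return 'int' can be illustrated
with a simplified switch. This is a single-threaded sketch under stated
assumptions: switch_t, switch_on(), switch_off(), rntm_init_stub(), and
rntm_finalize_stub() are hypothetical names, and the locking that a real
bli_pthread_switch_t would need for thread-shared state is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical on/off switch guarding one sub-API's initialization. */
typedef struct { bool is_on; } switch_t;

/* Function type expected by the switch: returns 0 on success. */
typedef int ( *switch_fn )( void );

/* Run init only if the switch is off; turn it on only if init succeeds. */
static int switch_on( switch_t* sw, switch_fn init )
{
    assert( sw != NULL && init != NULL );
    if ( sw->is_on ) return 0;
    int r = init();
    if ( r == 0 ) sw->is_on = true;
    return r;
}

/* Run finalize only if the switch is on; turn it off only on success. */
static int switch_off( switch_t* sw, switch_fn finalize )
{
    assert( sw != NULL && finalize != NULL );
    if ( !sw->is_on ) return 0;
    int r = finalize();
    if ( r == 0 ) sw->is_on = false;
    return r;
}

/* A thread-local switch (C11 _Thread_local): each application thread
   gets its own copy, so each thread initializes its own state once. */
static _Thread_local switch_t rntm_switch = { false };

static int n_inits = 0;
static int rntm_init_stub( void )     { ++n_inits; return 0; }
static int rntm_finalize_stub( void ) { return 0; }
```

Calling switch_on( &rntm_switch, rntm_init_stub ) twice from the same thread
runs the stub once; the second call sees the switch already on.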
commit fa6a9b24ae2ddbd5f30f657d46004843581c768c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Aug 19 12:44:34 2023 -0500
Fixed error when using common.mk from testsuite. (#768)
Details:
- Commit 2db31e0 (#755) inserted logic into common.mk that attempts to
preprocess build/detect/android/bionic.h to determine whether the
__BIONIC__ macro is defined (in which case -lrt should not be included
in LDFLAGS). However, the path to bionic.h was encoded without regard
to DIST_PATH, and so utilizing common.mk anywhere that isn't the top-
level directory (such as in the testsuite directory) resulted in a
compiler error:
gcc: error: build/detect/android/bionic.h: No such file or directory
gcc: fatal error: no input files
compilation terminated.
This commit adds a $(DIST_PATH) prefix to the path to bionic.h so that
it can be located from other applications' Makefiles that use BLIS's
makefile fragments.
commit 634e532c8dcce7383d96ba33276df65c656b2198
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 9 21:54:49 2023 -0500
Set thrcomm timpl_t id inside init functions. (#766)
Details:
- Previously, the timpl_t id being used when a thrcomm_t is being
initialized was set within the bli_thrcomm_init() dispatch function
after the timpl_t-specific bli_thrcomm_init_*() function returned. But
it just occurred to me that each bli_thrcomm_init_*() function already
intrinsically knows its own timpl_t value. This commit shifts the
setting of the thrcomm_t.ti field into the corresponding
bli_thrcomm_init_*() function for each timpl_t type (e.g. single,
openmp, pthreads, hpx).
- Removed long-deprecated code dating back nearly 10 years.
- Whitespace changes.
- Comment updates.
commit 3cf17b4a91232709bc6a205b0e4d7ecc96579aa9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 7 13:46:20 2023 -0500
Small fixes/improvements to docs/Multithreading.md. (#764)
Details:
- Added reminders that #include "blis.h" must be added to source files
in order to access BLIS API function prototypes. Thanks to Barry Smith
for suggesting this improvement.
- Fixed pre-existing typos.
- CREDITS file update.
commit dbc79812c390f812c7bf030bfcf87e947a1443c4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 28 18:16:38 2023 -0500
CREDITS file update.
Details:
- Thanks to Igor Zhuravlov for PR #753 (commit 915daaa).
commit 915daaa43cd189c86d93d72cd249714f126e9425
Author: Igor Zhuravlov <zhuravlov.ip@ya.ru>
Date: Thu Jul 27 20:33:59 2023 +0000
Fix typos in docs + example code comments. (#753)
Details:
- Fixed various typos in API documentation in docs/BLIS*API.md and
comments in the source code examples within examples/?api/*.c.
commit 2db31e057e7e9c97fc60021b5ae72a01a48d7588
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date: Thu Jul 27 15:27:21 2023 -0500
Exclude -lrt on Android with Bionic libraries. (#755)
Details:
- Added build/detect/android/bionic.h header to test whether the
__BIONIC__ cpp macro is defined.
- In common.mk, only add -lrt to LDFLAGS when Bionic is not present.
- CREDITS file update.
commit 22ad8c1b752364784f320168b31995945ad84a59
Author: ct-clmsn <ct.clmsn@gmail.com>
Date: Thu Jul 27 16:23:29 2023 -0400
Small fixes to support hpx in the testsuite (#759)
Details:
- Minor changes to test_libblis.c to support hpx.
commit c91b41d022e33da82b3b06c82be047a29873d9b6
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date: Wed Jul 26 14:37:08 2023 -0500
Auto-detect the RISC-V ABI of the compiler and use -mabi= during RISC-V Builds (#750)
Details:
- Generate a build error if there is a 32/64-bit mismatch between the
RISC-V ABI or architecture and the BLIS configuration selected.
- Handle Q, Zicsr, ZiFencei, Zba, Zbb, Zbc, Zbs and Zfh extensions in
the RISC-V architecture auto-detection. ZiFencei and Zicsr are not
detectable with built-in RISC-V macros right now.
- ZiFencei is not important for BLIS because it doesn't have
Just-In-Time compilation or self-modifying code, and Zicsr is implied
by the floating-point extensions, which are required for good
performance in BLIS.
- Move RISC-V autodetect header files to build/detect/riscv/.
commit a0b04e3c007f1207e5678bf20c07752906742fb7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 26 17:59:21 2023 -0500
Rewrote regen-symbols.sh (gen-libblis-symbols.sh). (#751)
Details:
- Wrote an alternative to regen-symbols.sh, gen-libblis-symbols.sh,
that generates a list of exported symbols from the monolithic blis.h
file rather than peeking inside of the shared object via nm. (This new
script lives in the 'build' directory and the older script has been
retired to build/old.) Special thanks to Devin Matthews for authoring
gen-libblis-symbols.sh.
- Added a 'symbols' target to the top-level Makefile which will refresh
build/libblis-symbols.def, with supporting changes to common.mk.
- Updates to build/libblis-symbols.def using the new symbol-generating
script.
commit 6b894c30b9bb2c2518848d74e4c8d96844f77f24
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 12 17:22:44 2023 -0500
Rewrote/fixed broken tree barrier implementation.
Details:
- Rewrote the definition of bli_thrcomm_tree_barrier() so that it (a)
actually worked again, and (b) used atomics instead of a basic C99
spin loop. (Note that the conventional barrier implementation is
still enabled by default; the tree barrier must be toggled on
manually within the configuration.)
- Added an early return to the definition of bli_thrcomm_barrier() in
the cases where comm == NULL or comm->n_threads == 1.
- Reordered thread-related and thread-dependent header #include
directives in blis.h so that the BLIS_TREE_BARRIER and
BLIS_TREE_BARRIER_ARITY macros, which would be defined in the target
configuration's bli_family_*.h file, would be #included prior
to the inclusion of the thrcomm_t header that uses them.
- Changed the type of barrier_t.count from 'int' to 'dim_t'.
- Changed the type of barrier_t.signal from 'volatile int' to 'gint_t'.
- Special thanks to Leick Robinson for contributing these changes.
- Whitespace changes.
commit d639554894b6252a86bd3164921bce6fbb9e3b5e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 7 16:11:14 2023 -0500
Pad thrcomm_t fields to avoid false sharing.
Details:
- Inserted a cache line of padding between various fields of the
thrcomm_t and, in the case of the (presently defunct) tree barrier,
fields of the barrier_t. This additional padding ensures that these
fields, which both serve different purposes when performing a thread
barrier, are only accessed when needed (and not just due to their
spatial locality with their cache line neighbors).
- Added a new cpp macro constant, BLIS_CACHE_LINE_SIZE, to
bli_config_macro_defs. This new constant defines the size of a cache
line (in bytes) and defaults to 64.
- Special thanks to Leick Robinson for discovering this false sharing
issue and developing/submitting the patch.
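The padding technique can be sketched as below. The struct layout, field
names, and the CACHE_LINE_SIZE macro are assumptions made for illustration
(only the 64-byte default comes from the commit message); this is not the
actual thrcomm_t or barrier_t definition.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size in bytes (hypothetical macro name). */
#define CACHE_LINE_SIZE 64

/* Hypothetical barrier state. A full cache line of padding between the
   two fields guarantees they never share a line, whatever the struct's
   starting address, so a write to one does not invalidate the other. */
typedef struct
{
    long          count;
    char          pad0[ CACHE_LINE_SIZE ];
    volatile long sense;
    char          pad1[ CACHE_LINE_SIZE ];
} padded_barrier_t;

/* Distance in bytes between the two fields. */
static size_t field_gap( void )
{
    return offsetof( padded_barrier_t, sense )
         - offsetof( padded_barrier_t, count );
}
```

The cost is a larger struct; the benefit is that threads spinning on one
field are not disturbed by updates to the other.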
commit 89b7863fc9a88903917deedc6a5ad9fd17f83713
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon May 8 16:51:18 2023 -0500
Fix 1m enablement for herk/her2k/syrk/syr2k. (#743)
Details:
- Ever since 28b0982, herk, her2k, syrk, and syr2k have been implemented
in terms of the gemmt expert API. And since the decision of which
induced method to use (1m or native) is made *below* the level of the
expert API, executing any of {herk,her2k,syrk,syr2k} results in BLIS
checking the enablement status for gemmt.
- This commit applies a band-aid of sorts to this issue by modifying
bli_l3_ind_oper_get_enable() and bli_l3_ind_oper_set_enable() so that
any attempts to query or modify the internal enablement status for
herk, her2k, syrk, or syr2k instead does so for gemmt.
- This solution isn't perfect since, in theory, the user could enable 1m
for, say, herk but then disable it for syrk, and then be confused when
herk runs via native execution. But we don't anticipate that users
modify 1m enablement at the operation level, and so in practice this
solution is likely fine for now.
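A minimal sketch of the redirection, and of the caveat noted in the last
bullet. The enum, the enabled[] table, and the function names here are
hypothetical; they only model the idea of mapping the four operations onto
the gemmt slot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operation ids. */
typedef enum
{
    OP_GEMM, OP_GEMMT, OP_HERK, OP_HER2K, OP_SYRK, OP_SYR2K, OP_COUNT
} l3_op_t;

/* One enablement flag per operation id (all false initially). */
static bool enabled[ OP_COUNT ];

/* Map the gemmt-based operations onto the gemmt slot. */
static l3_op_t canonical_op( l3_op_t op )
{
    switch ( op )
    {
        case OP_HERK: case OP_HER2K: case OP_SYRK: case OP_SYR2K:
            return OP_GEMMT;
        default:
            return op;
    }
}

/* Set/query enablement, always through the canonical slot. */
static void oper_set_enable( l3_op_t op, bool status )
{
    enabled[ canonical_op( op ) ] = status;
}

static bool oper_get_enable( l3_op_t op )
{
    return enabled[ canonical_op( op ) ];
}
```

Because herk and syrk share one slot in this model, enabling herk and then
disabling syrk leaves herk disabled as well.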
commit 138de3b3e88c5bf7d8718c45c88811771cf42db8
Author: Ajay Panyala <ajay.panyala@gmail.com>
Date: Sun May 7 13:01:38 2023 -0700
add nvhpc compiler support (#719)
Add detection of the NVIDIA nvhpc compiler (`nvc`) in `configure`, and adjust some warning options in `config.mk`. Currently, no specific options for `nvc` have been added in the relevant configurations so it may not be usable without further tweaks.
commit 0873c0f6ed03fea321d1631b3d1a385a306aa797
Author: Devin Matthews <damatthews@smu.edu>
Date: Sun May 7 14:03:19 2023 -0500
Consolidate INSERT_ macro sets via variadic macros. (#744)
Details:
- Consolidated INSERT_GENTFUNC_* (and corresponding GENTPROT) macro sets
using variadic macros (__VA_ARGS__), which means we no longer need a
different INSERT_ macro for each possible number of arguments the
macro might take. This change seems reasonable given that variadic
macros are a standard C99 feature and widely supported. I took care
not to use variadic macros where 0 variadic arguments are expected
since that is a non-standard extension.
- Added pre-typecast parentheses to arithmetic expressions in printf()
statements in bli_thread_range_tlb.c.
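The consolidation can be illustrated with a toy generator. GENTFUNC and
INSERT_GENTFUNC_REAL below are simplified, hypothetical macros written for
this example; the real BLIS macros take different arguments and cover more
datatypes.

```c
#include <assert.h>

/* Hypothetical generator: defines <ch><opname>( x ), which returns x
   scaled by a factor supplied as an extra macro argument. */
#define GENTFUNC( ctype, ch, opname, factor ) \
static ctype ch ## opname( ctype x ) { return x * ( ctype )( factor ); }

/* A single variadic INSERT_ macro forwards however many extra arguments
   the generator takes (at least one), replacing per-arity variants. */
#define INSERT_GENTFUNC_REAL( ... ) \
GENTFUNC( float,  s, __VA_ARGS__ ) \
GENTFUNC( double, d, __VA_ARGS__ )

/* Instantiates sscale() and dscale(). */
INSERT_GENTFUNC_REAL( scale, 2 )
```

Every invocation passes at least one variadic argument, which stays within
standard C99 as the commit message notes.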
commit ef9d3e6675320a53e7cb477c16b01388e708b1da
Author: h-vetinari <h.vetinari@gmx.com>
Date: Sun May 7 04:59:35 2023 +1100
Added missing #include <io.h> for Windows. (#747)
Details:
- This commit fixes issue #746, in which the _access() function (called
from within blastest/f2c/open.c) is undeclared when compiling on
Windows with clang 16.
commit 6fd9aabb03d172a792a7eeb106c7d965cf038421
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri May 5 14:22:52 2023 -0500
Fix bug in detecting Fortran compiler vendor (#745)
`FC` was used instead of `found_fc`.
commit 8215b02f99aa77ecc7d813508c247565115319d7
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date: Wed Apr 12 12:59:27 2023 -0500
Apply #738 to make_defs.mk of RISC-V subconfigs. (#740)
Details:
- PR #738 -- which moved -fPIC flag insertion responsibilities from
common.mk to the subconfigs' individual make_defs.mk files -- was
merged shortly before the introduction of new RISC-V subconfigs in
#693. This commit brings those RISC-V subconfigs up to date with the
new -fPIC conventions.
commit 6b38c5ac07a2a27738674784e58aa699bf895447
Author: angsch <17718454+angsch@users.noreply.github.com>
Date: Tue Apr 11 19:27:43 2023 +0200
Add RISC-V target (#693)
Details:
- There are four RISC-V base configurations: 'rv32i', 'rv32iv', 'rv64i',
and 'rv64iv', namely the 32-bit and 64-bit implementations with and
without the 'V' vector extension. Additional extensions such as 'M'
(multiplication), 'A' (atomics), 'F' ('float' hardware support), 'D'
('double' hardware support), and 'C' (compressed-length instructions),
are automatically used when available. If they are not available, then
software equivalents (e.g., softfloat and -latomic) are used.
- './configure auto' can be invoked on a RISC-V build platform, and will
automatically detect RISC-V CPU extensions through the RISC-V C API:
https://github.com/riscv-non-isa/riscv-c-api-doc/blob/master/riscv-c-api.md
- The assembly kernels assume the presence of the vector extension
RVV 1.0.
- It is possible to build 'rv[32,64]iv' for any value of VLEN.
However, if VLEN < 128, the targets will fall back to the generic
kernels and blocksizes.
- The vector microkernels are vector-length agnostic and work with
every VLEN >=128, but are expected to work best with smaller vector
lengths, i.e., VLEN <= 512.
- The assembly kernels cover column major storage (rs_c == 1).
- The blocksizes aim at being a good generic choice for out-of-order
cores. They are not tuned to a specific RISC-V HPC core.
- The vector kernels have been tested using vlen={128,256,512}.
- The single- and double-precision assembly code routines for 'sgemm'
and 'dgemm', or for 'cgemm' and 'zgemm', are combined in their RISC-V
vector assembly source code, and are differentiated only with macros.
- The XLEN=32 and XLEN=64 versions of the RISC-V assembly code are
identical, except that callee-saved registers are saved and restored
differently. There are RISC-V assembly code #include files for
handling the saving and restoring of callee-saved registers, and they
are future-proof if ever XLEN=128.
- Multiplications, such as computing array strides and offsets, are
performed in C, and later passed to the RISC-V assembly kernels. This
is so that the compiler can determine whether the 'M' (multiply)
extension is available and use multiplication instructions, or call
library helper functions instead.
- A new macro called bli_static_assert() has been added to perform
static assertions at compile-time, regardless of the C/C++ dialect of
the compiler. The original motivation of this was to ensure that
calling RISC-V assembly kernels would not silently truncate arguments
of type 'dim_t' or 'inc_t' (so-called "narrowing conversions").
- RISC-V CI tests have been added to Travis CI, using the
riscv-gnu-toolchain cross-compiler, and qemu simulator.
- Thanks to Lee Killough for collaborating on this commit.
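One common way to write a dialect-independent static assertion is sketched
below. This is an assumption about the general technique, not the actual
definition of bli_static_assert(); my_static_assert, my_dim_t, and
as_kernel_arg() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Use the C11 keyword when available; otherwise fall back to a typedef
   whose array size is negative (a compile error) when cond is false. */
#if defined( __STDC_VERSION__ ) && __STDC_VERSION__ >= 201112L
  #define my_static_assert( cond ) _Static_assert( cond, #cond )
#else
  #define MY_SA_CAT_( a, b ) a ## b
  #define MY_SA_CAT( a, b )  MY_SA_CAT_( a, b )
  #define my_static_assert( cond ) \
      typedef char MY_SA_CAT( my_sa_, __LINE__ )[ ( cond ) ? 1 : -1 ]
#endif

/* Stand-in for a dimension type passed to an assembly kernel. */
typedef int64_t my_dim_t;

/* Fail the build if my_dim_t would not fit in a 64-bit argument. */
my_static_assert( sizeof( my_dim_t ) <= sizeof( int64_t ) );

/* With the check above, this conversion cannot narrow. */
static int64_t as_kernel_arg( my_dim_t n ) { return ( int64_t )n; }
```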
commit 593d01761910af6a9a16ee0ac097142732f73c29
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 8 16:44:16 2023 -0500
CREDITS file update.
commit 259f68479671bbaf9c5986759aaa0004f9b05a24
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 7 16:11:34 2023 -0500
CREDITS file update.
Details:
- Added attributions associated with commits:
- 98d4678 9b1beec: @bartoldeman
- 2b05948 059f151: @ct-clmsn
- Reordered attribution for @decandia50.
commit aea8e1d9243631635ca788d5e14f0f29328e637d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 3 12:17:51 2023 -0500
Optionally disable thread-local storage. (#735)
Details:
- Implemented a new configure option, --disable-tls, which allows the
user to optionally disable the use of thread-local storage qualifiers
on static variables in BLIS. This option will rarely be needed, but
in some situations may allow BLIS to compile when TLS is unavailable.
Thanks to Nick Knight for suggesting this option.
- Unlike the --disable-system option, --disable-tls does not forcibly
disable threading. Instead, warnings of the possible consequences of
using threading with TLS disabled are added to:
- the output of './configure --help';
- the output of 'configure' when the --disable-tls option is parsed;
- the informational header output by the testsuite.
Thanks to Minh Quan Ho for suggesting these warnings.
- Modified frame/include/bli_lang_defs.h so that BLIS_THREAD_LOCAL is
defined to nothing when BLIS_ENABLE_TLS is not defined.
- Defined bli_info_get_enable_tls(), which returns whether the cpp macro
BLIS_ENABLE_TLS was defined.
- Edited --disable-system configure status output for clarity.
- Whitespace updates.
commit 3f1432abe75cc306ef90a04381d7e0d8739fded8
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date: Mon Apr 3 12:10:59 2023 -0500
Add output.testsuite to .gitignore (#736)
Details:
- Added `output.testsuite` to .gitignore since it was previously not
being matched by `output.testsuite.*`.
commit 38fc5237520a2f20914a9de8bb14d5999009b3fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 30 17:30:07 2023 -0500
Added mm_algorithm pdf files (bp and pb).
Details:
- Added PDF versions of the PowerPoint files added in 17cd260.
commit 17cd260cb504b2f3997c32daec77f4c828fbb32b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 29 21:47:12 2023 -0500
Added mm_algorithm pptx files (bp and pb).
Details:
- Added two PowerPoint files that contain slides depicting the classic
Goto algorithm for matrix multiplication as well as its sister
"panel-block" algorithm. These files reside in docs/diagrams.
commit 9d778e0f7c94d8752dd578101e4fc6893a1f54ef
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 29 17:36:49 2023 -0500
Move -fPIC insertion to subconfigs' make_defs.mk. (#738)
* Move -fPIC insertion to subconfigs' make_defs.mk.
Details:
- Previously, common.mk was appending -fPIC to the CPICFLAGS variables
set within the various subconfigurations' make_defs.mk files. This
seemed somewhat unintuitive, and so now the -fPIC flag is assigned to
the various subconfigs' CPICFLAGS variables in the respective
make_defs.mk files.
- This commit also changes the logic in common.mk so that instead of
appending, the variable is overwritten, but now *only* in the case
of Windows (since apparently -fPIC needs to be omitted there). Thanks
to Nick Knight for catching and reporting this weirdness.
commit 04090df01175477394d1e73af2e5769751d47cd6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 27 14:13:10 2023 -0500
Fixed compile errors with `BLIS_DISABLE_BLAS_DEFS`. (#730)
* Fixed compile errors with BLIS_DISABLE_BLAS_DEFS.
Details:
- This commit fixes a compile-time error related to the type definition
(prototype) of dsdot_() when BLIS_DISABLE_BLAS_DEFS is defined by the
application (or the configuration), which is actually a symptom of a
larger design issue when disabling BLAS prototypes. The macro was
intended to allow applications to bring their own BLAS prototypes and
suppress the inclusion of duplicate (or possibly conflicting)
prototypes within blis.h. However, prototypes are still needed during
compilation even if they are ultimately omitted from blis.h. The
problem is that almost every source file in BLIS--including the BLAS
compatibility layer--only includes one header (blis.h), and if we
were to #include a new header in the BLAS source files (to isolate
only the BLAS prototypes), we would also have to make the build system
aware of the location of those headers. Thanks to Edward Smyth of AMD
for reporting this issue.
- The solution I settled upon was to remove all cpp guards from all BLAS
headers (by changing them to #if 1, for easy search-and-replace
anchoring in the future if we ever need to re-insert guards) and
modifying bli_blas.h so that the BLAS prototypes are #included if
either (a) BLIS_ENABLE_BLAS_DEFS is defined, or (b)
BLIS_ENABLE_BLAS_DEFS is *not* defined but BLIS_IS_BUILDING_LIBRARY
*is* defined. (Thanks to Devin Matthews for steering me away from an
inferior solution.)
- This commit also spins off the actual BLAS prototypes/definitions to
a separate file, bli_blas_defs.h.
- CREDITS file update.
commit 5f841307f668f65b7ed5a479bd8374d2581208cf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 24 20:05:13 2023 -0500
Omit -fPIC if shared library build is disabled. (#732)
Details:
- Updated common.mk so that when --disable-shared option is given to
configure:
1. The -fPIC compiler flag is omitted from the individual
configuration family members' CPICFLAGS variables (which are
initialized in each subconfig's make_defs.mk file); and
2. The BUILD_SYMFLAGS variable, which contains compiler flags needed
to control the symbol export behavior, is left blank.
- The net result of these changes is that flags specific to shared
library builds are only used when a shared library is actually
scheduled to be built. Thanks to Nick Knight for reporting this issue.
- CREDITS file update.
commit 72c37eb80f964b7840377076e5009aec5b29d320
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date: Thu Mar 23 16:01:55 2023 -0500
Updated configure to pass all shellcheck checks. (#729)
Details:
- Modified configure so that it passes all 'shellcheck' checks,
disabling ones which we violate but which are just stylistic, or are
special cases in our code.
- Miscellaneous other minor changes, such as rearranged redirections in
long sed/perl pipes to look more natural.
- Whitespace tweaks.
commit 60f36347c16e6336215cd52b4e5f3c0f96e7c253
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 22 20:37:30 2023 -0600
Fixed bugs in scal2v ref kernel when alpha == 1. (#728)
Details:
- Fixed a typo bug in ref_kernels/1/bli_scal2v_ref.c where the
conditional that was supposed to be checking for cases when alpha is
equal to 1.0 (so that copyv could be used instead of scal2v) was
instead erroneously comparing alpha against 0.0.
- Fixed another bug in the same function whereby BLIS_NO_CONJUGATE was
erroneously being passed into copyv instead of the kernel's conjx
parameter. This second bug was inert, however, due to the first bug
since the "alpha == 0.0" case was already being handled, resulting in
the code block never executing.
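The two fixes can be seen together in a simplified kernel. This sketch uses
a hypothetical zcomplex struct, unit strides, and a bool conjugation flag
instead of the BLIS types and conj_t values; it is not the reference kernel
itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a double-precision complex type. */
typedef struct { double re, im; } zcomplex;

/* Return a, conjugated if conjx is set. */
static zcomplex maybe_conj( bool conjx, zcomplex a )
{
    if ( conjx ) a.im = -a.im;
    return a;
}

/* y := conjx( x ) */
static void copyv( bool conjx, int n, const zcomplex* x, zcomplex* y )
{
    for ( int i = 0; i < n; ++i ) y[ i ] = maybe_conj( conjx, x[ i ] );
}

/* y := alpha * conjx( x ), with shortcuts for alpha == 0 and alpha == 1. */
static void scal2v( bool conjx, int n, zcomplex alpha,
                    const zcomplex* x, zcomplex* y )
{
    if ( alpha.re == 0.0 && alpha.im == 0.0 )
    {
        for ( int i = 0; i < n; ++i ) { y[ i ].re = 0.0; y[ i ].im = 0.0; }
        return;
    }
    /* First fix: compare against 1.0 here, not 0.0. */
    if ( alpha.re == 1.0 && alpha.im == 0.0 )
    {
        /* Second fix: forward conjx rather than "no conjugate". */
        copyv( conjx, n, x, y );
        return;
    }
    for ( int i = 0; i < n; ++i )
    {
        zcomplex xi = maybe_conj( conjx, x[ i ] );
        y[ i ].re = alpha.re * xi.re - alpha.im * xi.im;
        y[ i ].im = alpha.re * xi.im + alpha.im * xi.re;
    }
}
```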
commit fab18dca46618799bb0b4f652820b33d36a5d4d4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 22 16:50:00 2023 -0600
Use 'void*' datatypes in kernel APIs. (#727)
Details:
- Migrated all kernel APIs to use void* pointers instead of float*,
double*, scomplex*, and dcomplex* pointers. This allows us to define
many fewer kernel function pointer types, which also makes it much
easier to know which function pointer type to use at any given time.
(For example, whereas before there was ?axpyv_ker_ft, ?axpyv_ker_vft,
and axpyv_ker_vft, now there is just axpyv_ker_ft, which is equivalent
to what axpyv_ker_vft used to be.)
- Refactored how kernel function prototypes and kernel function types
are defined so as to reduce redundant code. Specifically, the
function signatures (excluding cntx_t* and, in the case of level-3
microkernels, auxinfo_t*) are defined in new headers named, for
example, bli_l1v_ker_params.h. Those signatures are reused via macro
instantiation when defining both kernel prototypes and kernel function
types. This will hopefully make it a little easier to update, add, and
manage kernel APIs going forward.
- Updated all reference kernels according to the aforementioned switch
to void* pointers.
- Updated all optimized kernels according to the aforementioned switch
to void* pointers. This sometimes required renaming variables,
inserting typecasting so that pointer arithmetic could continue to
function as intended, and related tweaks.
- Updated sandbox/gemmlike according to the aforementioned switch to
void* pointers.
- Renamed:
- frame/1/bli_l1v_ft_ker.h -> frame/1/bli_l1v_ker_ft.h
- frame/1f/bli_l1f_ft_ker.h -> frame/1f/bli_l1f_ker_ft.h
- frame/1m/bli_l1m_ft_ker.h -> frame/1m/bli_l1m_ker_ft.h
- frame/3/bli_l1m_ft_ukr.h -> frame/3/bli_l1m_ukr_ft.h
- frame/3/bli_l3_sup_ft_ker.h -> frame/3/bli_l3_sup_ker_ft.h
to better align with naming of neighboring files.
- Added the missing "void* params" argument to bli_?packm_struc_cxk() in
frame/1m/packm/bli_packm_struc_cxk.c. This argument is being passed
into the function from bli_packm_blk_var1(), but wasn't being "caught"
by the function definition itself. The function prototype for
bli_?packm_struc_cxk() also needed updating.
- Reordered the last two parameters in bli_?packm_struc_cxk().
(Previously, the "void* params" was passed in after the
"const cntx_t* cntx", although because of the above bug the params
argument wasn't actually present in the function definition.)
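The benefit of void* kernel signatures can be sketched with a toy axpyv.
The axpyv_ft type, the two kernels, and the table below are hypothetical and
much simpler than the real BLIS kernel API (no conjugation, strides, or
cntx_t argument).

```c
#include <assert.h>

/* One function-pointer type serves every datatype. */
typedef void ( *axpyv_ft )( int n, const void* alpha,
                            const void* x, void* y );

/* y := y + alpha * x (single precision); the kernel casts the void*
   arguments back to its own datatype before doing pointer arithmetic. */
static void saxpyv( int n, const void* alpha, const void* x, void* y )
{
    const float  a  = *( const float* )alpha;
    const float* xf = ( const float* )x;
    float*       yf = ( float* )y;
    for ( int i = 0; i < n; ++i ) yf[ i ] += a * xf[ i ];
}

/* y := y + alpha * x (double precision). */
static void daxpyv( int n, const void* alpha, const void* x, void* y )
{
    const double  a  = *( const double* )alpha;
    const double* xd = ( const double* )x;
    double*       yd = ( double* )y;
    for ( int i = 0; i < n; ++i ) yd[ i ] += a * xd[ i ];
}

/* A table indexed by datatype needs only the one pointer type. */
static const axpyv_ft axpyv_kers[ 2 ] = { saxpyv, daxpyv };
```

The trade-off is that the compiler no longer checks the element type at the
call site, so the caller must pair each kernel with matching data.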
commit 93c63d1f469c4650df082d0fa2f29c46db0e25f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 20 11:14:23 2023 -0600
Use 'const' pointers in kernel APIs. (#722)
Details:
- Qualified all input-only data pointers in the various kernel APIs with
the 'const' keyword while also removing 'restrict' from those kernel
APIs. (Use of 'restrict' was maintained in kernel implementations,
where appropriate.) This affected the function pointer types defined
for all of the kernels, their prototypes, and the reference and
optimized kernel definitions' signatures.
- Templatized the definitions of copys_mxn and xpbys_mxn static inline
functions.
- Minor whitespace and style changes (e.g. combining local variable
declaration and initialization into a single statement).
- Removed some unused kernel code left in 'old' directories.
- Thanks to Nisanth M P for helping to validate changes to the power10
microkernels.
commit 4e18cd34f909c5045597f411340ede3a5e0bc5e1
Author: RuQing Xu <ruqing.xu@phys.s.u-tokyo.ac.jp>
Date: Sun Feb 19 04:18:41 2023 +0900
Restored ArmSVE general storage case. (#708)
Details:
- Restored general storage case in armsve kernels.
- Reason for doing this: Though real `g`-storage is difficult to
speed up, the `g`-codepath here can provide good support for
transposed storage, i.e., it is at least good for `GEMM_UKR_SETUP_CT_AMBI`.
- By experience, this solution is only *a little* slower than in-reg
transpose. Plus in-reg transpose is only possible for a fixed VL in
our case.
commit 0ba6e9eafb1e667373d9dbc2aa045557921f33e2
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date: Sat Feb 18 13:15:42 2023 -0600
Refined emacs handling of indentation. (#717)
Details:
- This refines the emacs autoformatting to be better in line with
contribution guidelines.
- Removed a stray shebang in a .mk file which confuses emacs about the
file mode, which should be makefile-mode. (emacs also removes stray
whitespace at the ends of lines.)
commit 059f15105b1643fe56084f883c22b3cadf368b39
Author: ct-clmsn <ct.clmsn@gmail.com>
Date: Sat Feb 18 14:13:23 2023 -0500
Updated hpx namespace for make_count_shape. (#725)
Details:
- The hpx namespace for *counting_shape changed. This PR updates the use
of counting_shape in blis to comply with the change in hpx.
- Co-authored-by: ctaylor <ctaylor@tactcomplabs.com>
commit 0b421eff130b5c896edcc09e7358d18564d177e9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Feb 18 13:11:41 2023 -0600
Added an 'arm64' entry to `.travis.yml`. (#726)
Details:
- Added a new 'arm64' entry to the .travis.yml file in an attempt to get
Travis CI to compile both NEON and SVE kernels, even if only NEON
kernels are exercised in the testing. With this new 'arm64' entry, the
'cortexa57' entry becomes redundant and may be removed. Thanks to
RuQing Xu for this suggestion.
- Previously, the macro BLIS_SIMD_MAX_SIZE was *not* being set in
bli_kernels_arm64.h, which meant that the default value of 64 was
being used. This caused a runtime consistency check to fail in
bli_gks.c (in Travis CI), one which requires that
mr * nr * dt_size <= BLIS_STACK_BUF_MAX_SIZE
for all datatype sizes dt_size, where BLIS_STACK_BUF_MAX_SIZE is
defined as
BLIS_SIMD_MAX_NUM_REGISTERS * BLIS_SIMD_MAX_SIZE * 2
This commit increases BLIS_SIMD_MAX_SIZE to 128 for the 'arm64'
configuration, thus overriding the default and (hopefully) avoiding
the aforementioned consistency check failures.
- Appended '|| cat ./output.testsuite' to all 'make' commands in
travis/do_testsuite.sh. Thanks to RuQing Xu for this suggestion.
- Whitespace changes.
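
The stack buffer consistency check discussed in this commit can be sketched as follows. The check passes when the microtile fits within the stack buffer. The register count of 32 and the mr/nr values used in the note below are assumptions made for illustration only; the function names are hypothetical and this is not the code in bli_gks.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed register count; the real value comes from the configuration. */
#define SKETCH_SIMD_MAX_NUM_REGISTERS 32

/* Stack buffer size per the formula: num_registers * simd_size * 2. */
size_t sketch_stack_buf_max_size(size_t simd_max_size)
{
    return SKETCH_SIMD_MAX_NUM_REGISTERS * simd_max_size * 2;
}

/* True when an mr x nr microtile of dt_size-byte elements fits. */
bool sketch_microtile_fits(size_t mr, size_t nr, size_t dt_size,
                           size_t simd_max_size)
{
    return mr * nr * dt_size <= sketch_stack_buf_max_size(simd_max_size);
}
```

Under these assumptions, a hypothetical 32 x 10 microtile of 16-byte elements needs 5120 bytes, which exceeds the 4096 bytes available with a SIMD size of 64 but fits within the 8192 bytes available with a SIMD size of 128.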
commit b1d3fc7e5b0927086e336a23f16ea59aa3611ccb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 10 15:34:47 2023 -0600
Redirect grep stderr to /dev/null. (#723)
Details:
- In common.mk, added a redirection of stderr to /dev/null for the grep
command being used to gather a list of header files #included from
bli_cntx_ref.c. The redirection is desirable because as of grep 3.8,
regular expressions with "stray" backslashes trigger warnings [1].
But removing the backslash seems to break the BLIS build system when
using pre-3.8 versions of grep, so this seems to be the easiest way to
satisfy the BLIS build system for both pre- and post-3.8 grep
environments.
[1] https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00001.html
commit e3d352f1fcc93e6a46fde1aa4a7f0a18fb27bd42
Author: Nisanth M P <nisanthmp.01@gmail.com>
Date: Wed Feb 8 06:11:41 2023 +0530
Added runtime selection of 'power' config family. (#718)
Details:
- Created a 'power' umbrella configuration family, which, when targeted
at configure-time, will build both 'power9' and 'power10' subconfigs.
(With this feature, a BLIS shared library could be compiled on a
power9 system and run on power10 and vice-versa. Unoptimised code
will execute if it is linked and run on any other generic system.)
- This new configuration family will only work with gcc, since that is
the only compiler supported by both power9 and power10 subconfigs in
BLIS.
- Documented power9 and power10 as supported microarchitectures in the
docs/HardwareSupport.md document.
commit e730c685d09336b3bd09e86c94330c4eba967f3e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 6 15:31:54 2023 -0600
Define `BLIS_VERSION_STRING` in `blis.h`. (#720)
Details:
- Previously, the version string was communicated from configure to
config.mk (via the config.mk.in template), where it was included via
the top-level Makefile, where it was then used to define the
preprocessor macro BLIS_VERSION_STRING via a command line argument to
the compiler (via -D). This macro is then used within bli_info.c to
initialize a static string which can then be queried via the
bli_info_get_version_str() function. However, there are some
applications that may find utility in being able to access the version
string by inspecting the monolithic (flattened) blis.h header file
that is created at compile time and installed alongside the library.
This commit moves the definition of BLIS_VERSION_STRING into
bli_config.h (via the bli_config.h.in template) so that it is
embedded in blis.h. The version string is now available in three
places:
- the static/shared library, which is installed in the 'lib'
subdirectory of the install prefix (query-able via the
bli_info_get_version_str() function);
- the config.mk makefile fragment, which is installed in the 'share'
subdirectory of the install prefix (in the VERSION variable);
- the blis.h header file, which is installed in the 'include'
subdirectory of the install prefix (via the BLIS_VERSION_STRING
macro constant).
Thanks to Mohsen Aznaveh and Tim Davis for providing the idea for this
change.
- CREDITS file update.
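
A minimal sketch of how an application might read the version string from the header at compile time. In a real build the macro would come from the installed blis.h; the fallback definition and the sketch_* function names are placeholders introduced here so that the example compiles on its own.

```c
#include <stdio.h>

/* In a real build this macro would come from the installed blis.h; the
   fallback value below is a placeholder for this sketch only. */
#ifndef BLIS_VERSION_STRING
#define BLIS_VERSION_STRING "unknown (blis.h not included)"
#endif

/* Returns the version string fixed at compile time via the header. */
const char* sketch_header_version(void)
{
    return BLIS_VERSION_STRING;
}

/* Writes a one-line banner into buf; returns snprintf's result. */
int sketch_version_banner(char* buf, size_t len)
{
    return snprintf(buf, len, "built against BLIS %s",
                    sketch_header_version());
}
```

Note that the header reports the version the application was compiled against, whereas bli_info_get_version_str() reports the version of the library that is actually linked.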
commit dc5d00a6ce0350cd82859d8c24f23d98f205d8db
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date: Fri Jan 27 17:36:47 2023 -0600
Typecast printf() args to avoid compiler warnings. (#716)
Details:
- In bli_thread_range_tlb.c, typecast integer arguments passed to
printf() -- which are typically disabled unless debugging -- to type
"long" to guarantee a match to the "%ld" format specifiers used in
those calls. This avoids spurious warnings with certain compilers in
certain toolchain environments, such as 32-bit RISC-V (rv32iv).
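
The typecasting pattern can be sketched as follows. The sketch_dim_t typedef and the function name are hypothetical stand-ins (not the BLIS integer type or the code in bli_thread_range_tlb.c), and snprintf() is used in place of printf() only so the result can be inspected.

```c
#include <stdio.h>
#include <stdint.h>

/* Stand-in for an integer type whose width depends on the build
   (hypothetical; not the BLIS typedef itself). */
typedef int64_t sketch_dim_t;

/* Formats a range; the casts guarantee that the arguments are 'long',
   matching "%ld" regardless of how sketch_dim_t is defined. */
int sketch_format_range(char* buf, size_t len,
                        sketch_dim_t start, sketch_dim_t end)
{
    return snprintf(buf, len, "start: %ld end: %ld",
                    (long)start, (long)end);
}
```
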
commit ecbcf4008815035c695822fcaf106477debff89a
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date: Wed Jan 18 20:35:50 2023 -0600
Use here-document for 'configure --help' output. (#714)
Details:
- Changed the configure script function that outputs "--help" text to do
so via so-called "here-document" syntax for improved readability and
maintainability. The change eliminates hundreds of echo statements and
makes it easier to change existing configure options' help text, along
with other benefits such as eliminating the need to escape double-
quote characters (").
commit c334ec278f5e2a101625629b2e13bbf1b38dede5
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Jan 18 13:10:19 2023 -0600
Merge tlb- and slab/rr-specific gemm macrokernels. (#711)
Details:
- Merged the tlb-specific gemm macrokernel (_var2b) with the
slab/rr-specific one (_var2) so that a single function can be compiled
with either tlb or slab/rr support, depending on which of the
BLIS_ENABLE_JRIR_TLB, _SLAB, and _RR macros is defined. This is done by
incorporating
information from both approaches: the start/end/inc for the JR and IR
loops from slab or rr partitioning; and the number of assigned
microtiles, plus the starting IR dimension offset for all iterations
after the first (ir_next). With these changes, slab, rr, and tlb can
all be parameterized by initializing a similar set of variables prior
to the jr loop.
- Removed the wrap-around logic that sets the "b_next" field of the
auxinfo_t struct, which executes during the last IR iteration of the
last JR iteration. The potential benefit of this code is so minor
(and hinges on the microkernel making use of the b_next field) that
it's arguably not worth including. The code also does the wrong
thing for some threads whenever JR_NT > 1, since only thread 0 (in the
JR group) would even compute with the first micropanel of B.
- Re-expressed the definition of bli_is_last_iter_slrr so that slab and
tlb use the same code rather than rr and tlb.
- Adjusted the initialization of the gemm control tree accordingly.
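
For readers unfamiliar with the start/end/inc triples mentioned above, here is a minimal sketch of how slab and round-robin partitioning differ for a single loop. The names are hypothetical, and the way leftover iterations are distributed in the slab case is an assumption of this sketch rather than a description of the BLIS functions.

```c
#include <stddef.h>

typedef struct { size_t start, end, inc; } sketch_range_t;

/* Slab: each thread gets one contiguous chunk of the n iterations.
   The first (n % nt) threads receive one extra iteration. */
sketch_range_t sketch_range_slab(size_t nt, size_t tid, size_t n)
{
    size_t base  = n / nt;
    size_t rem   = n % nt;
    size_t start = tid * base + (tid < rem ? tid : rem);
    size_t len   = base + (tid < rem ? 1 : 0);

    sketch_range_t r = { start, start + len, 1 };
    return r;
}

/* Round-robin: thread tid takes iterations tid, tid+nt, tid+2*nt, ... */
sketch_range_t sketch_range_rr(size_t nt, size_t tid, size_t n)
{
    sketch_range_t r = { tid, n, nt };
    return r;
}
```

Either result can drive the same loop, for (i = r.start; i < r.end; i += r.inc), which is what allows one macrokernel body to serve both methods.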
commit 5793a77937aee9847a5692c8e44b36a6380800a1
Author: HarshDave12 <122850830+HarshDave12@users.noreply.github.com>
Date: Tue Jan 17 21:55:02 2023 +0530
Fixed mis-mapped instruction for VEXTRACTF64X2. (#713)
Details:
- This commit fixes a typo in the macro definition for the extended
inline assembly macro VEXTRACTF64X2 in bli_x86_asm_macros.h. The macro
was previously defined (incorrectly) in terms of the vextractf64x4
instruction rather than vextractf64x2.
- CREDITS file update.
commit 16d2e9ea9ca0853197b416eba701b840a8587bca
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 13 20:03:01 2023 -0600
Defined lt, lte, gt, gte + misc. other updates. (#712)
Details:
- Changed invertsc operation to be a non-destructive operation; that is,
it now takes separate input and output operands. This change applies
to both the object and typed APIs.
- Defined an alternative square root operation, sqrtrsc, which, when
operating on complex scalars, assumes the imaginary part of the input
to be zero.
- Changed the semantics of addm, subm, copym, axpym, scal2m, and xpbym
so that when the source matrix has an implicit unit diagonal, the
operation leaves the diagonal of the destination matrix untouched.
Previously, the operations would interpret an implicit unit diagonal
on the source matrix as a request to manifest the unit diagonal
*explicitly* on output (either as something to copy in the case of
copym, or something to compute with in the cases of addm, subm, axpym,
scal2m, and xpbym). It turns out that this behavior was too cute by
half and could cause unintended headaches for practical use cases.
(This change in behavior also required small modifications to the trmv
and trsv testsuite modules so that they would properly test matrices
with unit diagonals.)
- Added missing dependencies for copym to gemv, ger, hemv, trmv, and
trsv testsuite modules.
- Implemented level-0-like ltsc, ltesc, gtsc, gtesc operations in
frame/util, which use lt, lte, gt, and gte level-0 scalar macros.
- Trivial variable rename in bli_part.c to harmonize with other
variable naming conventions.
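
The new unit-diagonal semantics can be illustrated with a small copym-like sketch for a lower-triangular, column-major source matrix. The function name and signature are hypothetical and far simpler than the BLIS copym API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Copies the lower triangle of the n x n column-major matrix a into b.
   When unit_diag is true, a's diagonal is implicit (never read) and b's
   diagonal is left untouched, matching the new semantics. */
void sketch_copym_lower(size_t n, bool unit_diag,
                        const double* a, size_t lda,
                        double* b, size_t ldb)
{
    for (size_t j = 0; j < n; ++j)
    {
        /* Skip the diagonal element when it is implicit. */
        size_t i0 = unit_diag ? j + 1 : j;

        for (size_t i = i0; i < n; ++i)
            b[i + j * ldb] = a[i + j * lda];
    }
}
```

Under the previous semantics, the unit-diagonal case would instead have written ones onto the diagonal of b.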
commit 9a366b14fe52c469f4664ef5dd93d85be8d97baa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jan 12 13:07:22 2023 -0600
Implement cntx_t pointer caching in gks. (#709)
Details:
- Refactored the gks cntx_t query functions so that: (1) there is a
clearer pattern of similarity between functions that query a native
context and those that query its induced (1m) counterpart; and (2)
queried cntx_t pointers (for both native and induced cntx_t pointers)
are cached (by default), or deep-queried upon each invocation,
depending on whether cpp macro BLIS_ENABLE_GKS_CACHING is defined.
- Refactored query-related functions in bli_arch.c to cache the queried
arch_t value (by default), or deep-query the arch_t value upon each
invocation, depending on whether cpp macro BLIS_ENABLE_GKS_CACHING is
defined.
- Tweaked the behavior of bli_gks_query_ind_cntx_impl() (formerly named
bli_gks_query_ind_cntx()) so that the induced method cntx_t struct is
repopulated each time the function is called. (It is still only
allocated once on first call.) This was mostly done in preparation for
some future in which the arch_t value might change at runtime. In such
a scenario, the induced method context would need to be recalculated
any time the native context changes.
- Added preprocessor logic to bli_config_macro_defs.h to handle enabling
or disabling of cntx_t pointer caching (via BLIS_ENABLE_GKS_CACHING).
- For now, cntx_t pointer caching is enabled by default and does not
correspond to any official configure option. Disabling can be done
by inserting a #define for BLIS_DISABLE_GKS_CACHING into the
appropriate bli_family_*.h header file within the configuration of
interest.
- Thanks to Harihara Sudhan S (AMD) for suggesting that cntx_t pointers
(and not just arch_t values) be cached.
- Comment updates.
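
The cache-or-deep-query pattern described above can be sketched as follows. All names here (sketch_cntx_t, sketch_query_cntx, SKETCH_DISABLE_CACHING) are hypothetical stand-ins rather than the gks functions or macros, and the sketch ignores thread safety entirely.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the BLIS types or functions. */
typedef struct { int arch_id; } sketch_cntx_t;

static sketch_cntx_t sketch_table[1] = { { 7 } };
int sketch_deep_query_count = 0;

/* The "expensive" lookup that caching is meant to avoid repeating. */
static const sketch_cntx_t* sketch_deep_query(void)
{
    ++sketch_deep_query_count;
    return &sketch_table[0];
}

/* Caching is on by default; defining SKETCH_DISABLE_CACHING (a
   hypothetical macro) forces a deep query on every call. */
const sketch_cntx_t* sketch_query_cntx(void)
{
#ifndef SKETCH_DISABLE_CACHING
    static const sketch_cntx_t* cached = NULL;

    if (cached == NULL) cached = sketch_deep_query();
    return cached;
#else
    return sketch_deep_query();
#endif
}
```
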
commit b895ec9f1f66fb93972589c06bff171337153a31
Author: Nisanth M P <nisanthmp.01@gmail.com>
Date: Wed Jan 11 09:02:32 2023 +0530
Fixing type-mismatch errors in power10 sandbox (#701)
Details:
- This commit fixes a mismatch between the function type signature of
bli_gemm_ex() required by BLIS and the version of the function defined
within the power10 sandbox. It also performs typecasting upon calling
bli_gemm_front() to attain type consistency with the type signature
defined by BLIS for bli_gemm_front().
commit 38d88d5c131253066cad4f98eea06fa9299cae3b
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jan 10 21:24:58 2023 -0600
Define new global scalar (obj_t) constants. (#703)
Details:
- This commit defines the following new global scalar constants:
- BLIS_ONE_I: This constant encodes the imaginary unit.
- BLIS_MINUS_ONE_I: This constant encodes the negative imaginary unit.
- BLIS_NAN: This constant encodes a not-a-number value. Both real and
imaginary parts are set to NaN for complex datatypes.
commit cdb22b8ffa5b31a0c16ac1a7bcecefeb5216f669
Author: Nisanth M P <nisanthmp.01@gmail.com>
Date: Wed Jan 11 08:50:57 2023 +0530
Disable power10 kernels other than sgemm, dgemm. (#705)
Details:
- There is a power10 sandbox which uses microkernels for datatypes other
than float and double (or scomplex/dcomplex). In a regular power10-
configured build (that is, with the sandbox disabled), there were
compile errors for some of these other non-sgemm/non-dgemm
microkernels. This commit protects those kernels with a new cpp macro
guard (which is defined in sandbox/power10/bli_sandbox.h) that
prevents that kernel code from being compiled for normal, non-sandbox
power10 builds.
commit d220f9c436c0dae409974724d42ab6c52f12a726
Author: Nisanth M P <nisanthmp.01@gmail.com>
Date: Wed Jan 11 08:43:03 2023 +0530
Fix k = 0 edge case in power10 microkernels (#706)
Details:
- When power10 sgemm and dgemm microkernels are called with k = 0, they
become caught in infinite loops and segfault. This is fixed now via an
early exit in the case of k = 0.
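
A minimal sketch of why a k = 0 guard matters for a count-down loop. This is a scalar stand-in with a hypothetical name, not the power10 microkernel code, and the way C is treated when k = 0 (scaled by beta) is an assumption of the sketch.

```c
#include <stddef.h>

/* Hypothetical scalar "microkernel":
   c := beta * c + alpha * sum_p a[p] * b[p], for p in [0, k). */
void sketch_dot_ukr(size_t k, double alpha,
                    const double* a, const double* b,
                    double beta, double* c)
{
    double ab = 0.0;

    /* Early exit: nothing to accumulate. Without this guard, the
       do-while below would decrement k from 0 and wrap around. */
    if (k == 0)
    {
        *c *= beta;
        return;
    }

    do
    {
        ab += (*a++) * (*b++);
    } while (--k != 0);

    *c = beta * (*c) + alpha * ab;
}
```
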
commit 2e1ba9d13c23a06a7b6f8bd326af428f7ea68c31
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 10 21:05:54 2023 -0600
Tile-level partitioning in jr/ir loops (ex-trsm). (#695)
Details:
- Reimplemented parallelization of the JR loop in gemmt (which is
recycled for herk, her2k, syrk, and syr2k). Previously, the
rectangular region of the current MC x NC panel of C would be
parallelized separately from the diagonal region of that same
submatrix, with the rectangular portion being assigned to threads via
slab or round-robin (rr) partitioning (as determined at configure-
time) and the diagonal region being assigned via round-robin. This
approach did not work well when extracting lots of parallelism from
the JR loop and was often suboptimal even for smaller degrees of
parallelism. This commit implements tile-level load balancing (tlb) in
which the IR loop is effectively subjugated in service of more
equitably dividing work in the JR loop. This approach is especially
potent for certain situations where the diagonal region of the MC x NR
panel of C is significant relative to the entire region. However, it
also seems to benefit many problem sizes of other level-3 operations
(excluding trsm, which has an inherent algorithmic dependency in the
IR loop that prevents the application of tlb). For now, tlb is
implemented as _var2b.c macrokernels for gemm (which forms the basis
for gemm, hemm, and symm), gemmt (which forms the basis of herk,
her2k, syrk, and syr2k), and trmm (which forms the basis of trmm and
trmm3). Which function pointers (_var2() or _var2b()) are embedded in
the control tree will depend on whether the BLIS_ENABLE_JRIR_TLB cpp
macro is defined, which is controlled by the value passed to the
existing --thread-part-jrir=METHOD (or -r METHOD) configure option.
This commit adds 'tlb' as a valid option alongside the previously
supported values of 'slab' and 'rr'. ('slab' is still the default.)
Thanks to Leick Robinson for abstractly inspiring this work, and to
Minh Quan Ho for inquiring (in PR #562, and before that in Issue #437)
about the possibility of improved load balance in macrokernel loops,
and even prototyping what it might look like, long before I fully
understood the problem.
- In bli_thread_range_weighted_sub(), tweaked the way we compute the
area of the current MC x NC trapezoidal panel of C by better taking
into account the microtile structure along the diagonal. Previously,
it was an underestimate, as it assumed MR = NR = 1 (that is, it
assumed that the microtile column of C that overlapped with microtiles
exactly coincided with the diagonal). Now, we only assume MR = NR.
This is still a slight underestimate when MR != NR, so the additional
area is scaled by 1.5 in a hackish attempt to compensate for this, as
well as other additional effects that are difficult to model (such as
the increased cost of writing to temporary tiles before finally
updating C). The net effect of this better estimation of the
trapezoidal area should be (on average) slightly larger regions
assigned to threads that have little or no overlap with the diagonal
region (and correspondingly slightly smaller regions in the diagonal
region), which we expect will lead to slightly better load balancing
in most situations.
- Spun off the contents of bli_thread.[ch] that relate to computing
thread ranges into one of three source/header file pairs:
- bli_thread_range.[ch], which define functions that are not specific
to the jr/ir loops;
- bli_thread_range_slab_rr.[ch], which define functions that implement
slab or round-robin partitioning for the jr/ir loops;
- bli_thread_range_tlb.[ch], which define functions that implement
tlb for the jr/ir loops.
- Fixed the computation of a_next in the last iteration of the IR loop
in bli_gemmt_l_ker_var2(). Previously, it always "wrapped" back around
to the first micropanel of the current MC x KC packed block of A.
However, this is almost never actually the micropanel that is used
next. A new macro, bli_gemmt_l_wrap_a_upanel(), computes a_next
correctly, with a similarly named bli_gemmt_u_wrap_a_upanel() for use
in the upper-stored case (which *does* actually always choose the
first micropanel of A as its a_next at the end of the IR loop).
- Removed adjustments for a_next/b_next (a2/b2) for the diagonal-
intersecting case of gemmt_l_ker_var2() and the above-diagonal case
of gemmt_u_ker_var2() since these cases will only coincide with the
last iteration of the IR loop in very small problems.
- Defined bli_is_last_iter_l() and bli_is_last_iter_u(), the latter of
which explicitly considers whether the current microtile is the last
tile that intersects the diagonal. (The former does the same, but the
computation coincides with the original bli_is_last_iter().) These
functions are now used in gemmt to test when a_next (or a2) should
"wrap" (as discussed above). Also defined bli_is_last_iter_tlb_l()
and bli_is_last_iter_tlb_u(), which are similar to the aforementioned
functions but are used when employing tlb in gemmt.
- Redefined macros in bli_packm_thrinfo.h, which test whether an
iteration of work is assigned to a thread, as static inline functions
in bli_param_macro_defs.h (and then deleted bli_packm_thrinfo.h).
In the process of redefining these macros, I also renamed them from
bli_packm_my_iter_rr/sl() to bli_is_my_iter_rr/sl().
- Renamed
bli_thread_range_jrir_rr() -> bli_thread_range_rr()
bli_thread_range_jrir_sl() -> bli_thread_range_sl()
bli_thread_range_jrir() -> bli_thread_range_slrr()
- Renamed
bli_is_last_iter() -> bli_is_last_iter_slrr()
- Defined
bli_info_get_thread_jrir_tlb()
and renamed:
- bli_info_get_thread_part_jrir_slab() ->
bli_info_get_thread_jrir_slab()
- bli_info_get_thread_part_jrir_rr() ->
bli_info_get_thread_jrir_rr()
- Modified bli_rntm_set_ways_for_op() to redirect IR loop parallelism
into the JR loop when tlb is enabled for non-trsm level-3 operations.
- Added a sanity check to prevent bli_prune_unref_mparts() from being
used on packed objects. This prohibition is necessary because the
current implementation does not take into account the atomicity of
packed micropanel widths relative to the diagonal of structured
matrices. That is, the function prunes greedily without regard to
whether doing so would prune off part of a micropanel *which has
already been packed* and assigned to a thread for inclusion in the
computation.
- Further restricted early returns in bli_prune_unref_mparts() to
situations where the primary matrix is not only of general structure
but also dense (in terms of its uplo_t value). The addition of the
matrix's dense-ness to the conditional is required because gemmt is
somewhat unusual in that its C matrix has general structure but is
marked as lower- or upper-stored via its uplo_t. By only checking
for general structure, attempts to prune gemmt C matrices would
incorrectly result in early returns, even though that operation
effectively treats the matrix as symmetric (and stored in only one
triangle).
- Fixed a latent bug in bli_thread_range_rr() wherein incorrect ranges
were computed when 1 < bf. Thankfully, this bug was not yet
manifesting since all current invocations used bf == 1.
- Fixed a latent bug in some unexercised code in bli_?gemmt_l_ker_var2()
that would perform incorrect pruning of unreferenced regions above
where the diagonal of a lower-stored matrix intersects the right edge.
Thankfully, the bug was not harming anything since those unreferenced
regions were being pruned prior to the macrokernel.
- Rewrote slab/rr-based gemmt macrokernels so that they no longer carved
C into rectangular and diagonal regions prior to parallelizing each
separately. The new macrokernels use a unified loop structure where
quadratic (slab) partitioning is used.
- Updated all level-3 macrokernels to have a more uniform coding style,
such as combining variable declarations with initializations as
well as the use of const.
- Updated bls_l3_packm_var[123].c to use bli_thrinfo_n_way() and
bli_thrinfo_work_id() instead of bli_thrinfo_num_threads() and
bli_thrinfo_thread_id(), respectively. This change probably should
have been included in aeb5f0c.
- Removed old prototypes in bli_gemmt_var.h and bli_trmm_var.h that
corresponded to functions that were removed in aeb5f0c.
- Other very minor cleanups.
- Comment updates.
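
The basic idea behind tile-level load balancing can be sketched for the simple rectangular case (no diagonal). Instead of dividing only the JR iterations among threads, the full grid of microtiles is divided, and each thread is told how many tiles it owns and where its first tile sits. The names below are hypothetical, the ordering of tiles (IR fastest) is an assumption, and the gemmt/trmm cases in this commit involve additional bookkeeping that is not shown. The sketch assumes nt > 0 and m_iter > 0.

```c
#include <stddef.h>

typedef struct { size_t n_tiles, jr_start, ir_start; } sketch_tlb_t;

/* Splits the m_iter x n_iter grid of microtiles (IR index fastest, JR
   index slowest) into nt contiguous runs whose lengths differ by at
   most one tile. Assumes nt > 0 and m_iter > 0. */
sketch_tlb_t sketch_tlb_range(size_t nt, size_t tid,
                              size_t m_iter, size_t n_iter)
{
    size_t total = m_iter * n_iter;
    size_t base  = total / nt;
    size_t rem   = total % nt;
    size_t first = tid * base + (tid < rem ? tid : rem);

    sketch_tlb_t r;
    r.n_tiles  = base + (tid < rem ? 1 : 0);
    r.jr_start = first / m_iter;  /* JR iteration of the first tile */
    r.ir_start = first % m_iter;  /* IR offset within that iteration */
    return r;
}
```

For example, with 4 threads and a 3 x 2 grid of microtiles, every thread receives either one or two tiles, whereas partitioning only the two JR iterations would leave two threads idle.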
commit b6735ca26b9d459d9253795dc5841ae8de9e84c9
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Jan 6 14:10:01 2023 -0600
Refactor structure awareness in packm_blk_var1.c. (#707)
Details:
- Factored some of the structure awareness out of the loop in
bli_packm_blk_var1(). So instead of having a single loop with
conditionals in the body to handle various kinds of structure (and
stored/unstored submatrix placement), we now have a conditional branch
to handle various structure/storage scenarios with a loop in each
section. This change was originally motivated to choose slab or round-
robin partitioning (in the context of triangular matrices) based on
the structure of the entire block (or panel) being packed rather than
each micropanel individually. Previously, the code would attempt to
limit rr to the portion of the block that intersects the diagonal and
use slab for the remainder. However, that approach was not well thought
out, and in many situations it led to inferior load balancing
when compared to using round-robin for the entire block (or panel).
This commit has the added benefit of incurring less overhead during
the packing process now that each of the new loops is simpler.
commit f956b79922da412791e4c8b8b846b3aafc0a5ee0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Dec 31 20:18:08 2022 -0600
Switch to l3 sup decorator in gemmlike sandbox. (#704)
Details:
- Modified the gemmlike sandbox to call bli_l3_sup_thread_decorator()
rather than a local analogue of that code. This reduces redundant
logic and makes it easier for the sandbox to inherit future
improvements to the framework's threading code.
- Moved addon/gemmd to addon/old/gemmd. This code has fallen out of date
and is taking too much effort to maintain. We will very likely
reimplement it completely once future changes are made to the
framework proper.
commit 538150c5845ad903773ca797c740048174116aa4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Dec 25 22:28:09 2022 -0600
Applied race condition fix to sup thread decorator.
Details:
- Applied the race condition bugfix in commit 7d23dc2 to the
corresponding sup code in bli_l3_sup_decor.c. Note that in the case
of sup, the race condition would have only manifested when optional
packing was enabled at runtime (typically via setting BLIS_PACK_A
and/or BLIS_PACK_B environment variables).
- Both the fix in this commit and the fix in 7d23dc2 address bugs
that were introduced when the thrinfo_t trees/communicators were
restructured in the October omnibus commit (aeb5f0c).
commit 7d23dc2a064a371dc9883e2c2c7236a70912428c
Author: Devin Matthews <damatthews@smu.edu>
Date: Sun Dec 25 19:09:14 2022 -0600
Fix a race condition which manifested as incorrect results (rarely). (#702)
The problem occurs when there are at least two teams of threads packing different parts of a matrix, and where each team has at least two threads; call them team A and team B. The problematic sequence is:
1. The chief of team A checks out a block B and broadcasts the pointer to its teammates.
2. Team A completely packs its data and performs a barrier amongst its members.
3. Team A commences computing with the packed data.
4. The chief of team A finishes computing before its teammates, then calls bli_thrinfo_free on its thrinfo_t struct (which contains the mem_t object referencing the buffer B). This causes buffer B to be checked back in to the pba.
5. The chief of team B checks out the *same* block B that was just checked back in and broadcasts the pointer to its teammates.
6. DATA RACE: now the remaining threads of team A are reading *while* team B is writing to the same buffer B. If team B writes new data before team A is done computing, then an incorrect result is generated.
The solution is to place a global barrier before the call to bli_thrinfo_free at the end of the computation.
Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
commit 3accacf57d11e9b109339754f91bf22329b6cb6a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 16 10:26:33 2022 -0600
Skip 1m optimization when forcing hemm_l/symm_l. (#697)
Details:
- Fixed a bug in right-sided hemm when:
- using the 1m method,
- #defining BLIS_DISABLE_HEMM_RIGHT in the active subconfiguration,
and
- the storage of C matches the gemm microkernel IO preference PRIOR to
the right-sidedness being detected and recast in terms of the left-
side code path.
It turns out that bli_gemm_ind_recast_1m_params() was applying its
optimization (recasting a complex-domain macrokernel calling a 1m
virtual microkernel to a real-domain macrokernel calling the real-
domain microkernel) in situations in which it should not have. The
optimization was silently assuming that the storage of C always
matched that of the microkernel preference, since the front-end (in
this case, bli_hemm_front()) would have already had a chance to
transpose the operation to bring the two into agreement. However, by
disabling right-sided hemm, we deprive BLIS of that flexibility (as a
transposed left-sided case would necessarily have to become a right-
sided case), and thus the assumption was no longer holding in all
cases. Thanks to Nisanth M P for reporting this bug in Issue #621.
- The aforementioned bug, and its bugfix, also apply to symm when
BLIS_DISABLE_SYMM_RIGHT is defined.
- Comment updates.
- CREDITS file update.
commit 4833ba224eba54df3f349bcb7e188bcc53442449
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 12 20:26:02 2022 -0600
Fixed perf of mt sup with packing, and mt gemmlike. (#696)
Details:
- Brought the gemmsup code path up to date relative to the latest
thrinfo_t semantics introduced in the October Omnibus commit
(aeb5f0c). This was done by passing the prenode (instead of the
current node) into the packm variant within bli_l3_sup_packm.c as well
as creating the prenodes and attaching them to the thrinfo_t tree in
bli_l3_sup_thrinfo_create(). These changes erase the performance
degradation introduced in the omnibus when running multithreaded sup
with optional packing enabled. Special thanks to Devin Matthews for
sussing out this fix in short order.
- Fixed the gemmlike sandbox in a manner similar to that of sup with
packing, described above. This also involved passing the prenode into
the local gemmlike packm variant. (Recall that gemmlike recycles the
use of bli_l3_sup_thrinfo_create(), so it automatically inherits that
part of the sup fix described above.)
- Updated bls_l3_packm_var[123].c to use bli_thrinfo_n_way() and
bli_thrinfo_work_id() instead of bli_thrinfo_num_threads() and
bli_thrinfo_thread_id(), respectively.
commit db10dd8e11a12d85017f84455558a82c0093b1da
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 29 19:10:31 2022 -0600
Fixed _gemm_small() prototype; disabled gemm_small.
Details:
- Fixed a mismatch between the prototype for bli_gemm_small() in
bli_gemm_front.h and the actual definition of bli_gemm_small() in
kernels/zen/3/bli_gemm_small.c. The former was erroneously declaring
the cntl_t* argument as 'const'. Thanks to Jeff Diamond for reporting
this issue.
- Commented out BLIS_ENABLE_SMALL_MATRIX, BLIS_ENABLE_SMALL_MATRIX_TRSM
macro definitions in config/zen3/bli_family_zen3.h. AMD's small matrix
implementation should probably remain disabled in vanilla BLIS, at
least for now.
commit f0337b784d164ae505ca0e11277a1155680500d1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Nov 13 21:36:47 2022 -0600
Trivial whitespace/comment tweaks.
Details:
- Trivial whitespace and comment changes, most of which ideally would
have been part of the previous commit pertaining to HPX (2b05948).
commit 2b05948ad2c9785bc53f376d53a7141cbc917447
Author: ct-clmsn <ct.clmsn@gmail.com>
Date: Sun Nov 13 17:40:22 2022 -0500
blis support for hpx (#682)
Implement threading backend via HPX.
HPX is an asynchronous many task runtime system used in high performance computing applications. The runtime implements the ISO C++ parallelism specification and provides a user-space thread implementation.
This PR provides BLIS a thread backend implementation using HPX and resolves feature request #681. The configuration script, makefiles, and testsuite have been updated to support an HPX build option. The addition of HPX support provides other developers an exemplar for integrating other C++ threading backends into BLIS.
Co-authored-by: ctaylor <ctaylor@pennywise.cm.cluster>
Co-authored-by: Devin Matthews <damatthews@smu.edu>
commit e1ea25da43508925e33d4e57e420cfc0a9de793f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 11 12:07:51 2022 -0600
Fixed subtle barrier_fpa bug in bli_thrcomm.c. (#690)
Details:
- In bli_thrcomm.c, correctly initialize the BLIS_OPENMP element of the
barrier function pointer array (barrier_fpa) to NULL when
BLIS_ENABLE_OPENMP is *not* defined. Similarly, initialize the
BLIS_POSIX element of barrier_fpa to NULL when BLIS_ENABLE_PTHREADS is
not defined. This bug was introduced in a1a5a9b and was likely the
result of an incomplete edit. The effects of the bug would have
likely manifested when querying a thrcomm_t that was initialized with
a timpl_t value corresponding to a threading implementation that was
omitted from the -t option at configure-time.
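
The pattern behind this fix can be sketched as a function pointer array in which every slot receives an explicit initializer: a real function when the backend was enabled, and NULL otherwise. The enum, macro, and function names below are hypothetical stand-ins, not the definitions in bli_thrcomm.c.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the threading-implementation ids. */
typedef enum
{
    SKETCH_SINGLE = 0,
    SKETCH_OPENMP,
    SKETCH_POSIX,
    SKETCH_NUM_IMPLS
} sketch_timpl_t;

typedef int (*sketch_barrier_ft)(void);

int sketch_barrier_single(void) { return 0; }
#ifdef SKETCH_ENABLE_OPENMP
int sketch_barrier_openmp(void) { return 1; }
#endif
#ifdef SKETCH_ENABLE_PTHREADS
int sketch_barrier_pthreads(void) { return 2; }
#endif

/* Every slot gets an explicit initializer: a real function when the
   backend was enabled, NULL otherwise. */
static sketch_barrier_ft sketch_barrier_fpa[SKETCH_NUM_IMPLS] =
{
    [SKETCH_SINGLE] = sketch_barrier_single,
#ifdef SKETCH_ENABLE_OPENMP
    [SKETCH_OPENMP] = sketch_barrier_openmp,
#else
    [SKETCH_OPENMP] = NULL,
#endif
#ifdef SKETCH_ENABLE_PTHREADS
    [SKETCH_POSIX]  = sketch_barrier_pthreads,
#else
    [SKETCH_POSIX]  = NULL,
#endif
};

/* Returns the barrier for ti, or NULL if that backend was not built. */
sketch_barrier_ft sketch_query_barrier(sketch_timpl_t ti)
{
    return sketch_barrier_fpa[ti];
}
```

A caller can then test the returned pointer against NULL before invoking it, rather than jumping through a slot that refers to the wrong backend.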
commit dc6e5f3f5770074ba38554541b8b64711a68c084
Author: leekillough <15950023+leekillough@users.noreply.github.com>
Date: Thu Nov 3 18:33:08 2022 -0500
Enhance emacs formatting of C files to remove trailing whitespace and ensure a newline at the end of file
commit 713d078075a4a563a43d83fd0880ab5091c2e4a4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 3 20:00:11 2022 -0500
Delete mpi_test garbage. (#689)
Details:
- tlrmchlsmth: "What even is this? No comments, no commit message, not
used by anything. Trash."
commit 8d813f7f12732d52c95570ae884d5defbfd19234
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 3 19:10:47 2022 -0500
Some decluttering of the top-level directory.
Details:
- Relocated 'mpi_test' directory to test/mpi_test.
- Relocated 'so_version' and 'version' files from top-level directory to
'build' directory.
- Updated build/bump-version.sh script to accommodate relocation of
'version' file to 'build' directory.
- Updated configure script to accommodate relocation of 'so_version'
file to 'build' directory.
- Updated INSTALL file to replace pointers to blis-devel mailing list
with a pointer to docs/Discord.md.
- Updated RELEASING file to contain a reminder to consider whether the
so_version file should be updated prior to the release.
commit 6774bf08c92fc6983706a91bbb93b960e8eef285
Author: Lee Killough <15950023+leekillough@users.noreply.github.com>
Date: Thu Nov 3 15:20:47 2022 -0500
Fix typo in configure --help text. (#686)
Details:
- Fixed a misspelling in the --help description for the --int-size (-i)
configure option.
commit 872898d817f35702e7678ff7f3eeff0f12e641f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 2 21:53:22 2022 -0500
Fixed trmm[3]/trsm performance bug in cf7d616. (#685)
Details:
- Fixed a performance bug in the packing of micropanels that intersect
the diagonal of triangular matrices (i.e., those found in trmm, trmm3,
and trsm). This bug was introduced in cf7d616 and stemmed from an
ill-formed boolean conditional expression in bli_packm_blk_var1().
This conditional would choose when to use round-robin parallel work
allocation, but checked for the triangularity of the submatrix being
packed while failing also to check for whether the current micropanel
actually intersected the diagonal. The net result of this bug was that
*all* micropanels of a triangular matrix, no matter where the upanels
resided within the matrix, were assigned to threads via a round-robin
policy. This affected some microarchitectures and threading
configurations much worse than others, but it seems that overall the
effect was universally negative, likely because of the reduced spatial
locality during the packing with round-robin. Thanks to Leick Robinson
for his tireless efforts in helping track down this issue.
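The shape of the conditional can be sketched as follows. This is an illustration only, not the expression used in bli_packm_blk_var1(); all names are hypothetical, and the diagonal test assumes the convention that an m x n panel with diagonal offset diagoff contains diagonal elements exactly when -m < diagoff < n.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if an m x n micropanel whose diagonal sits at
   offset diagoff (column index minus row index) contains diagonal elements. */
static bool sk_panel_intersects_diag(long diagoff, long m, long n)
{
    assert(m > 0 && n > 0);
    return -m < diagoff && diagoff < n;
}

/* Ill-formed version: triangularity of the matrix alone selects round-robin,
   so every micropanel of a triangular matrix is affected. */
static bool sk_use_round_robin_buggy(bool is_triangular, long diagoff, long m, long n)
{
    (void)diagoff; (void)m; (void)n;
    return is_triangular;
}

/* Intended version: round-robin only for micropanels of a triangular
   matrix that actually intersect the diagonal. */
static bool sk_use_round_robin_fixed(bool is_triangular, long diagoff, long m, long n)
{
    return is_triangular && sk_panel_intersects_diag(diagoff, m, n);
}
```

For a micropanel far from the diagonal, the two versions disagree, which is the case the fix targets.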
commit edcc2f9940449f7d9cefcfc02159d27b013e7995
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 2 19:04:49 2022 -0500
Support --nosup, --sup configure options. (#684)
Details:
- Added --nosup and --sup as alternative ways of requesting that sup be
disabled or enabled. These are analogous to --disable-sup-handling and
--enable-sup-handling, respectively. (I got tired of typing out
--disable-sup-handling and needed a shorthand notation.)
- Tweaked message output by configure when sup is enabled/disabled for
clarity and specificity.
- Whitespace changes.
commit 5eea6ad9eb25f37685d1ae4ae08c73cd1daca297
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 2 17:07:54 2022 -0500
Add mention of Wilkinson Prize to README.md. (#683)
Details:
- Added blurbs and links to Wilkinson Prize to README.md.
- Added mention of both Best Paper and Wilkinson Prizes to the top of
README.md.
- Other minor tweaks.
commit 29f79f030e939969d4f3876c4fdaac7b0c5daa63
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Oct 31 18:57:45 2022 -0500
Fixed performance bug caused by redundant packing. (#680)
Details:
- Fixed a performance bug whereby multiple threads were redundantly
packing the same (rather than separate) micropanels. This bug was
caused by different parts of the code using the num_threads/thread_id
field of the thrinfo_t vs. the n_way/work_id fields. The fix was to
standardize on the latter and provide a "fake" thrinfo_t sub-prenode
in the thrinfo tree which consists of single-member thread teams. The
single team with multiple threads node is still required since it and
only it can be used to perform barriers and broadcasts (e.g. of the
packed buffer pointer).
commit aeb5f0cc19665456e990a7ffccdb09da2e3f504b
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Oct 27 12:39:11 2022 -0500
Omnibus PR - Oct 2022 (#678)
Details:
- This is an "omnibus" commit, consisting of multiple medium-sized
commits that affect non-trivial aspects of BLIS. The major highlights:
- Relocated the pba, sba pool (from the rntm_t), and mem_t (from the
cntl_t) to the thrinfo_t object. This allows the rntm_t to be
effectively const (although it is sometimes copied internally and
modified to reflect different ways of parallelism). Moving the mem_t
sets the stage for sharing a global control tree amongst all
threads.
- De-templatized the macrokernels for gemmt, trmm, and trsm to match
the macrokernel for gemm, which has been de-templatized since
54fa28b.
- Reimplemented bli_l3_determine_kc() by separating out the logic for
adjusting KC based on MR/NR for triangular A and/or B into a new
function, bli_l3_adjust_kc(). For now, this function is still called
from bli_l3_determine_kc(), but in the future we plan to have it
called once when constructing the control tree.
- Refactored the level-3 thread decorator into two parts:
- One part deals only with launching threads, each one calling a
generic thread entry function. This code resides in frame/thread
and constitutes the definition of bli_thread_launch(). Note that
it is specific to the threading implementation (OpenMP, pthreads,
single, etc.)
- The other part deals with passing the matrix operands and related
information into bli_thread_launch(). This is the "l3 decorator"
and now resides in frame/3. It is agnostic to the threading
implementation.
- Modified the "level" of the thread control tree passed in at each
operation. Previously, each operation (e.g. bli_gemm_blk_var1()) was
passed in a communicator representing the active thread teams which
would share the available work. Now, the *parent* thread comm is
passed in. The operation then grabs the child comm and uses it to
partition the work. The difference is in bli_trsm_blk_var1(), where
there are now two children nodes for this single operation (i.e. the
thread control tree is split one level above where the control tree
is). The sub-prenode is used for the trsm subproblem while the
normal sub-node is used for the gemm part. Importantly, the parent
comm is used for the barrier between them.
- Removed cntl_t* arguments from bli_*_front() functions. These will be
added back in the future when the control tree's creation is moved so
that it happens much sooner (provided that bli_*_front() have not been
absorbed into their respective bli_*_ex() functions).
- Renamed various bli_thread_*() query functions to bli_thrinfo_*(),
for consistency. This includes _num_threads(), _thread_id(), _n_way(),
_work_id(), _sba_pool(), _pba(), _mem(), _barrier(), _broadcast(), and
_am_chief().
- Removed extraneous barrier from _blk_var3() of gemm and trsm.
- Fixed a typo in bli_type_defs.h where BLIS_BLAS_INT_TYPE_SIZE was
misspelled.
commit c803b03e52a7a6997a8d304a8cfa9acf7c1c555b
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Oct 26 18:20:00 2022 -0500
Add check to disable armsve on Apple M1.
commit 2dd692b710b6a9889f7ebdd7934a2108be5c5530
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Oct 26 18:10:26 2022 -0500
Fix auto-detection of firestorm (Apple M1).
commit 88105dbecf0f9dfbfa30215743346e8bd6afb971
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 21 15:16:12 2022 -0500
Added Discord documentation (#677)
Details:
- Added a docs/Discord.md markdown document that walks the reader
through creating a Discord account, obtaining the invite link, and
using the link to join the BLIS Discord server.
- Updated README.md to reference the new Discord.md document in multiple
places, including via the official Discord logo (used with explicit
permission from representatives at Discord Inc.).
commit 23f5b8df3e802a27bacd92571184ec57bbdfa646
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 17 20:21:21 2022 -0500
Shuffled checked properties in bli_l3_check.c. (#676)
Details:
- Added certain checks for matrix structure to the level-3 operations'
_check() functions, and slightly reorganized existing checks.
commit 9453e0f163503f64a290256b4be53d8882224863
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 3 19:46:20 2022 -0500
CREDITS file update.
Details:
- This attribution was intended to go in PR #647.
commit 76a23bd8c33e161221891935a489df9a9fb9c8c0
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Oct 3 15:55:07 2022 -0500
Reinstate sanity check in bli_pool_finalize. (#671)
Details:
- Added a reinit argument to bli_pool_finalize(). This bool will signal
whether or not the function is being called from bli_pool_reinit(). If
it is not being called from _reinit(), we can safely check to confirm
that .top_index == 0 (i.e., all blocks have been checked in). But if
it *is* being called from _reinit(), then that check will be skipped
since one of the predicted use cases for bli_pool_reinit() anticipates
that some blocks are (probably) checked out when the pool_t is
reinitialized.
- Updated existing invocations of bli_pool_finalize() to pass in either
FALSE (from bli_apool_free_block() or bli_pba_finalize_pools()) or
TRUE (from bli_pool_reinit()) for the new reinit argument.
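A minimal sketch of the conditional sanity check, assuming a reduced pool type with only the fields needed here. The sk_ names are hypothetical, and the real bli_pool_finalize() also frees blocks and reports errors differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced pool: top_index counts blocks currently checked out. */
typedef struct {
    size_t top_index;
    size_t num_blocks;
} sk_pool_t;

/* Returns false if the "all blocks checked in" sanity check fails.
   The check is skipped when called on behalf of a reinitialization,
   since some blocks may legitimately still be checked out then. */
static bool sk_pool_finalize(sk_pool_t* pool, bool reinit)
{
    assert(pool != NULL);
    if (!reinit && pool->top_index != 0)
        return false;
    pool->top_index  = 0;
    pool->num_blocks = 0;
    return true;
}
```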
commit 63470b49e3b9b15e00a8f666e86ccd70c6005fe9
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Sep 29 18:52:08 2022 -0500
Fix some bugs in bli_pool.c (#670)
Details:
- Add a check for premature pool exhaustion when checking in blocks via
bli_pool_checkin_block(). This detects "double-free" and other bad
conditions that don't necessarily result in a segfault.
- Make sure to copy all block pointers when growing the pool size.
Previously, checked-out block pointers (which are guaranteed to be set
to NULL) were not being copied, leading to the presence of
uninitialized data.
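Both fixes can be illustrated with a reduced pool, sketched below under the assumption that the pool is an array of block pointers in which entries below top_index are checked out and hold NULL. The sk_ names are hypothetical and the code is not taken from bli_pool.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    void** blocks;      /* blocks[0 .. top_index) are checked out and hold NULL */
    size_t top_index;   /* number of blocks currently checked out */
    size_t num_blocks;  /* number of valid entries in blocks[] */
} sk_pool_t;

/* Hands out the next available block, or NULL if the pool is exhausted. */
static void* sk_pool_checkout(sk_pool_t* pool)
{
    if (pool->top_index == pool->num_blocks) return NULL;
    void* block = pool->blocks[pool->top_index];
    pool->blocks[pool->top_index] = NULL;
    pool->top_index += 1;
    return block;
}

/* Returns -1 when more blocks are checked in than were checked out,
   which catches a double check-in that would otherwise go unnoticed. */
static int sk_pool_checkin(sk_pool_t* pool, void* block)
{
    if (pool->top_index == 0) return -1;
    pool->top_index -= 1;
    pool->blocks[pool->top_index] = block;
    return 0;
}

/* Grows the pointer array to new_len entries. ALL existing entries are
   copied, including the NULL entries of checked-out blocks, so no slot
   below num_blocks is left uninitialized. */
static int sk_pool_grow(sk_pool_t* pool, size_t new_len)
{
    assert(new_len >= pool->num_blocks);
    void** bigger = calloc(new_len, sizeof *bigger);
    if (bigger == NULL) return -1;
    memcpy(bigger, pool->blocks, pool->num_blocks * sizeof *bigger);
    free(pool->blocks);
    pool->blocks = bigger;
    return 0;
}
```

In this sketch, growing only enlarges the pointer array; allocating the additional blocks is left to the caller.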
commit 42d0e66318b186d25eeb215b40ce26115401ed8b
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Sep 29 17:38:02 2022 -0500
Add AddressSanitizer (-fsanitize=address) option. (#669)
Details:
- Added support for AddressSanitizer (ASan), a compiler-integrated
memory error detector. The option (disabled by default) enables
compiling and linking with the -fsanitize=address flag supported by
clang, gcc, and probably others. This flag is employed during
compilation of all BLIS source files *except* for optimized kernels,
which are exempted because ASan usually requires an extra register,
which violates the constraints for many gemm microkernels.
- Minor whitespace, comment, ordering, and configure help text updates.
commit b861c71b50c6d48cb07282f44aa9dddffc1f1b3f
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Sep 23 13:22:27 2022 -0500
Add consistent NaN/Inf handling in sumsqv. (#668)
Details:
- Changed sumsqv implementation as follows:
- If there is a NaN (either real or imaginary), then return a sum of
NaN and unit scale.
- Else, if there is an Inf (either real or imaginary), then return a
sum of +Inf and unit scale.
- Otherwise behave as normal.
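The three cases can be sketched for a real double-precision vector as follows. This is a simplified illustration with a hypothetical function name: the "normal" branch here accumulates squares directly with unit scale, whereas a production sumsqv would rescale to avoid overflow and underflow, and the complex case would test real and imaginary parts separately.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* On return, (*scale)^2 * (*sumsq) represents the sum of squares of x. */
static void sk_sumsqv(size_t n, const double* x, double* scale, double* sumsq)
{
    assert(x != NULL || n == 0);
    int    saw_nan = 0, saw_inf = 0;
    double acc = 0.0;

    for (size_t i = 0; i < n; ++i) {
        if      (isnan(x[i])) saw_nan = 1;
        else if (isinf(x[i])) saw_inf = 1;
        else                  acc += x[i] * x[i];
    }

    *scale = 1.0;
    if      (saw_nan) *sumsq = NAN;       /* NaN takes precedence over Inf */
    else if (saw_inf) *sumsq = INFINITY;  /* +Inf regardless of the sign of the input */
    else              *sumsq = acc;
}
```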
commit ee81efc7887374c974a78bfb3e0865776b2f97a8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 22 19:15:07 2022 -0500
Parameterized test/3 drivers via command line args. (#667)
Details:
- Rewrote the drivers in test/3, the Makefile, and the runme.sh script
so that most of the important parameters, including parameter combo,
datatype, storage combo, induced method, problem size range, dimension
bindings, number of repeats, and alpha/beta values can be passed in
via command line arguments. (Previously, most of these parameters were
hard-coded into the driver source, except a few that were hard-coded
into the Makefile.) If no argument is given for any particular option,
it will be assigned a sane default. Either way, the values employed at
runtime will be printed to stdout before the performance data in a
section that is commented out with '%' characters (which is used by
matlab and octave for comments), unless the -q option is given, in
which case the driver will proceed quietly and output only performance
data. Each driver also provides extensive help via the -h option, with
the help text tailored for the operation in question (e.g. gemm, hemm,
herk, etc.). In this help text, the driver reminds the user which
implementation it was linked to (e.g. blis, openblas, vendor, eigen).
Thanks to Jeff Diamond for suggesting this CLI-based reimagining of
the test/3 drivers.
- In the test/3 drivers: converted cpp macro string constants, as well
as two string literals (for the opname and pc_str) used in each test
driver, to global (or static) const char* strings, and replaced the
use of strncpy() for storing the results of the command line argument
parsing with pointer copies from the corresponding strings in argv.
This works because the argv array is guaranteed by the C99 standard
to persist throughout the life of the program. This new approach uses
less storage and executes faster. Thanks to Minh Quan Ho for
recommending this change.
- Renamed the IMP_STR cpp macro that gets defined on the command line,
via the test/3/Makefile, to IMPL_STR.
- Updated runme.sh to set the problem size ranges for single-threaded
and multithreaded execution independently from one another, as well as
on a per-system basis.
- Added a 'quiet' variable to runme.sh that can easily toggle quiet mode
for the test drivers' output.
- Very minor typecast fix in call to bli_getopt() in bli_utils.c.
- In bli_getopt(), changed the nextchar variable from being a local
static variable to a field of the getopt_t state struct. (Not sure why
it was ever declared static to begin with.)
- Other minor changes to bli_getopt() to accommodate the rewritten test
drivers' command line parsing needs.
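The switch from strncpy() to pointer copies described above can be sketched as follows. The option letters, default values, and names are hypothetical and do not reflect the actual test/3 drivers; the sketch only shows that the parsed parameters point directly at the strings in argv (or at static defaults) rather than at copies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Defaults live in static storage, so pointing at them is always safe. */
static const char* const SK_DEFAULT_DT = "d";
static const char* const SK_DEFAULT_SC = "ccc";

typedef struct {
    const char* dt_str;  /* datatype string */
    const char* sc_str;  /* storage combo string */
} sk_params_t;

/* Parses "-d <datatype>" and "-s <storage>" pairs. No characters are
   copied: each field aliases an argv entry, which remains valid for the
   lifetime of the program. */
static void sk_parse_args(int argc, char** argv, sk_params_t* params)
{
    assert(argv != NULL && params != NULL);
    params->dt_str = SK_DEFAULT_DT;
    params->sc_str = SK_DEFAULT_SC;

    for (int i = 1; i + 1 < argc; i += 2) {
        if      (strcmp(argv[i], "-d") == 0) params->dt_str = argv[i + 1];
        else if (strcmp(argv[i], "-s") == 0) params->sc_str = argv[i + 1];
    }
}
```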
commit 036a4f9d822df25a76a653e70be76fb02284d3d3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 22 18:36:50 2022 -0500
Refactored some rntm_t management code. (#666)
Details:
- Separated the "sanitizing" code from the auto-factorization code
in bli_rntm_set_ways_from_rntm() and _rntm_set_ways_from_rntm_sup().
The sanitizing code now resides in bli_rntm_sanitize() while the
factorization code resides in bli_rntm_factorize() and
bli_rntm_factorize_sup(). (There are two different functions because
the conventional and sup factorization codes are currently somewhat
different.) Also note that the factorization code now relies on the
.auto_factor field to have already been set, either during
rntm_t initialization or when the rntm_t was previously updated and
sanitized. So rather than locally determining whether to
auto-factorize, those functions just read the .auto_factor field and
proceed accordingly.
- Refactored and removed most code from bli_thread_init_rntm_from_env().
This function now reads the environment variables needed to set nt,
jc, pc, ic, jr, and ir; sets them into the global rntm_t; and then
calls bli_rntm_sanitize() in order to make sure that the contents are
in a "good" state. Thanks to Devin Matthews for suggesting this
refactoring.
- Redefined bli_rntm_set_num_threads() and bli_rntm_set_ways() such that
if multithreading is disabled at compile time (that is, if the cpp
macro BLIS_ENABLE_MULTITHREADING is undefined), they ignore the
caller's request and instead clear the nt and ways fields.
- Redefined bli_thread_set_num_threads() and bli_thread_set_ways() such
that if multithreading is disabled at compile time (that is, if the
cpp macro BLIS_ENABLE_MULTITHREADING is undefined), they ignore the
caller's request and do nothing.
- Redefined bli_rntm_set_num_threads() and bli_rntm_set_ways() as true
functions rather than static inline functions.
- In bli_rntm.c, statically initialize the global_rntm global variable
via the BLIS_RNTM_INITIALIZER macro.
- In bli_rntm.h, defined bli_rntm_clear_auto_factor(), which sets the
.auto_factor field of the rntm_t to FALSE.
- Reorganized order of some inline function definitions in bli_rntm.h.
- Changed the default value given to the .auto_factor field by the
BLIS_RNTM_INITIALIZER macro from TRUE to FALSE.
- Call bli_rntm_clear_auto_factor() instead of
bli_rntm_set_auto_factor_only() in bli_rntm_init().
- Comment/whitespace updates.
commit a1a5a9b4cbef9208da494c45a2f933a8e82559ac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 21 18:31:01 2022 -0500
Implemented support for fat multithreading. (#665)
Details:
- Allow the user to configure BLIS in such a way that multiple threading
implementations get compiled into the library, with one of those
implementations chosen at runtime. For now, there are only three
implementations available: OpenMP, pthreads, and single. (Here,
'single' merely refers to single-threaded mode.) The configure script
now allows the user to give the -t option with a comma-separated list
of values, such as '-t openmp,pthreads'. The first value in the list
will always be the default at library initialization time, and
'single' is always silently appended to the end of the list. The user
can specify which implementation should execute in one of three ways:
by setting the BLIS_THREAD_IMPL environment variable prior to launch;
by calling the bli_thread_set_thread_impl() global runtime API; or by
encoding their choice into a rntm_t that is passed into one of the
expert interfaces. Any of these three choices overrides the
initialization-time default (i.e., the first value listed to the -t
configure option). Requesting an implementation that was not compiled
into the library will result in an error message followed by
bli_abort().
- Relocated the 'auto' logic for the -t option from the top-level
Makefile to the configure script. (Currently, this logic is pretty
dumb, choosing 'openmp' for gcc and icc, and 'pthreads' for clang.)
- Defined a new 'timpl_t' enum in bli_type_defs.h, with three valid
values: BLIS_SINGLE, BLIS_OPENMP, BLIS_POSIX.
- Reorganized the thrcomm_t struct into a single definition with two
preprocessor blocks, one each for additional fields needed by OpenMP
and pthreads.
- Added timpl_t argument to bli_thrcomm_bcast(), bli_thrcomm_barrier(),
bli_thrcomm_init(), and bli_thrcomm_cleanup(), which these functions
need since they are now wrappers that choose the implementation-
specific function corresponding to the currently enabled threading
implementation.
- Added rntm_t* to bli_thread_broadcast(), bli_thread_barrier() so that
those functions can pass the timpl_t value into bli_thrcomm_bcast()
and bli_thrcomm_barrier(), respectively.
- Defined bli_env_get_str() in bli_env.c to allow the querying of
BLIS_THREAD_IMPL (which, unlike BLIS_NUM_THREADS and friends, is
expected to be a string).
- Defined bli_thread_get_thread_impl(), bli_thread_set_thread_impl() to
get and set the current threading implementation at runtime.
- Defined bli_rntm_thread_impl() and bli_rntm_set_thread_impl() to query
and set the threading implementation within a rntm_t. Also choose
BLIS_SINGLE as the default value when initializing rntm_t structs.
- Added bli_info_get_*() functions to query whether OpenMP or pthreads
would be chosen as the default at init-time. Note that this only
tests whether OpenMP or pthreads is the first implementation in the
list passed to the threading configure option (-t) and is *not* the
same as querying which implementation is currently selected, since
that can be influenced by BLIS_THREAD_IMPL and/or
bli_thread_set_thread_impl().
- Changed l3int_t to l3int_ft.
- Updated docs/Multithreading.md to document the new behavior.
- Updated sandbox/gemmlike and addon/gemmd to work with the new fat
threading feature. This included a few bugfixes to bring the codes up
to date, as necessary.
- Comment, whitespace updates.
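The selection rule described in the first bullet above (configure-time default, overridden by an environment string, with an error for implementations that were not compiled in) can be sketched as below. All names are hypothetical, the accepted strings are an assumption modeled on the values given to the -t option, and the sketch returns -1 where the library would print an error and call bli_abort().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for timpl_t. */
typedef enum { SK_SINGLE = 0, SK_OPENMP = 1, SK_POSIX = 2 } sk_timpl_t;

/* Maps a string to an implementation. Returns 0 on success, -1 if unknown. */
static int sk_impl_from_string(const char* s, sk_timpl_t* out)
{
    if      (strcmp(s, "single")   == 0) *out = SK_SINGLE;
    else if (strcmp(s, "openmp")   == 0) *out = SK_OPENMP;
    else if (strcmp(s, "pthreads") == 0) *out = SK_POSIX;
    else return -1;
    return 0;
}

/* env_value:      value of the environment variable, or NULL if unset.
   config_default: first implementation listed at configure time.
   enabled_mask:   bit (1u << impl) is set for each compiled-in implementation.
   Returns 0 and writes *out on success, -1 on an unknown or absent choice. */
static int sk_choose_impl(const char* env_value, sk_timpl_t config_default,
                          unsigned enabled_mask, sk_timpl_t* out)
{
    assert(out != NULL);
    sk_timpl_t want = config_default;

    if (env_value != NULL && sk_impl_from_string(env_value, &want) != 0)
        return -1;
    if ((enabled_mask & (1u << want)) == 0)
        return -1;

    *out = want;
    return 0;
}
```

In a real program, env_value would typically come from getenv("BLIS_THREAD_IMPL").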
commit 89df7b8fa3a3e47ab2fc10ac4d65d0b9fde16942
Author: Devin Matthews <damatthews@smu.edu>
Date: Sun Sep 18 18:46:57 2022 -0500
De-templatized _sup_var1n2m.c; unified _sup_packm_a/b(). (#659)
Details:
- Re-expressed the two variants in frame/3/bli_l3_sup_var1n2m.c as a
single function each that performs char* pointer arithmetic rather
than four datatype-specific functions. Did the same for the functions
in bli_l3_sup_packm_a.c and _sup_packm_b.c, and then unified the two
into a single set of functions for packing either A or B, which now
resides in bli_l3_sup_packm.c.
- Pre-grow the cntl_t tree in both bli_l3_sup_var1n2m.c variants rather
than grow them incrementally.
- Relocated empty-matrix and scale-by-beta early return handling from
bli_gemm_front() and bli_gemmt_front() to their _ex() counterparts.
- Comment, whitespace updates.
commit fb91337eff1ee2098f315a83888f6667b3a56f86
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 15 19:08:10 2022 -0500
Fixed a harmless pc_nt bug in 05a811e.
Details:
- Added missing curly braces around some statements in bli_rntm.c, one
of which needed them in order for the relevant code to be executed in
the intended way. The consequence of 05a811e omitting those braces was
that a statement (pc_nt = 1;) was executed more often than it needed
to be.
- Also adjusted the analogous code in bli_thread.c to match that of
bli_rntm.c.
commit e86076bf4461d1a78186fb21ba8320cfb430f62c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 15 14:22:59 2022 -0500
Test the 'gemmlike' sandbox via AppVeyor. (#664)
Details:
- Added a fifth test to our .appveyor.yml that enables the 'gemmlike'
sandbox with OpenMP enabled (via clang, the 'auto' configuration
target, and building to a static library). Thanks to Jeff Diamond
for pointing out that this test would be useful.
commit 63177dca48cb7d066576d884da4a7a599ececebf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 15 11:21:26 2022 -0500
Fixed gemmlike sandbox bug introduced in 7c07b47.
Details:
- Fixed a bug in the 'gemmlike' sandbox that was introduced in 7c07b47.
This bug was the result of the fact that the gemmlike implementation
uses bli_thrinfo_sup_grow() to grow its thrinfo_t tree, but the
aforementioned commit added an optimization that kicks in when the
rntm_t .pack_a and .pack_b fields are both FALSE. Those fields were
originally added only for sup execution; for the large code path, they
are intended to be ignored. But the default initial state of a rntm_t
has those fields set to FALSE, which was inadvertently activating the
optimization (which targeted single-threaded cases only) and would
cause multithreaded use cases of 'gemmlike' to segfault. The fix took
the form of setting the .pack_a and .pack_b fields to TRUE in
bls_gemm_ex().
- Added minimal 'const' and 'const'-casting to 'gemmlike' so that gcc
stays quiet.
commit 05a811e898b371a76581abd4afa416980cce7db9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 13 19:24:05 2022 -0500
Initialize rntm_t nt/ways fields with 1 (not -1). (#663)
Details:
- Changed the way that rntm_t structs are initialized, mainly so that
the global rntm_t that is set via environment variables at runtime
may be queried by the application prior to any computation taking
place. (Strictly speaking, the application may already query these
fields, but they do not always contain valid values and often contain
-1 when they are unset.) These changes also served to clarify how
these parameters are treated, and homogenized the implementations of
bli_rntm_set_ways_from_rntm(), bli_rntm_set_ways_from_rntm_sup(), and
bli_thread_init_rntm_from_env(). Special thanks to Jeff Diamond,
Leick Robinson, and Devin Matthews for pointing out that the previous
behavior was needlessly confusing and could be improved.
- The aforementioned modifications also included subtle changes as to
what counts as "setting" a loop's ways of parallelism for the purposes
of deciding whether to use the ways or the total number of threads.
Previously, setting any loop's ways, even to 1, counted in favor of
using the ways. Now, only values greater than 1 will count as
"setting", and all other values will silently be mapped to 1, with
those parameters treated as if they were untouched all along.
- Updated bli_rntm.h and bli_thread.c so that any attempt to set the
PC_NT variable (or pc_nt field of a rntm_t) will either ignore the
request or reassert the value as 1.
- Updated bli_rntm_set_ways() so that rather than clear the
num_threads field, it is set to the product of all of the per-loop
ways of parallelism.
- Removed code from test_libblis.c that handled the possibility of unset
environment variables when printing out their values.
- Removed bli_rntm_equals() inline function from bli_rntm.h, which has
long been disabled.
- Updates to docs/Multithreading.md related to the aforementioned
changes.
- Comment updates.
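The rules listed above (values not greater than 1 are mapped to 1, the pc way is always reasserted as 1, and the thread count becomes the product of the per-loop ways) can be sketched with a reduced struct. The names are hypothetical and this is not the definition of bli_rntm_set_ways().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced version of the threading fields of a rntm_t. */
typedef struct {
    long num_threads;
    long jc, pc, ic, jr, ir;
} sk_rntm_t;

/* Only values greater than 1 count as "set"; everything else becomes 1. */
static long sk_sanitize_way(long w)
{
    return w > 1 ? w : 1;
}

static void sk_rntm_set_ways(sk_rntm_t* r, long jc, long pc, long ic, long jr, long ir)
{
    assert(r != NULL);
    (void)pc;                     /* requests for pc are ignored */
    r->jc = sk_sanitize_way(jc);
    r->pc = 1;                    /* always reasserted as 1 */
    r->ic = sk_sanitize_way(ic);
    r->jr = sk_sanitize_way(jr);
    r->ir = sk_sanitize_way(ir);

    /* The thread count is the product of the ways rather than cleared. */
    r->num_threads = r->jc * r->pc * r->ic * r->jr * r->ir;
}
```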
commit fd885cf98f4fe1d3bc46468e567776c37c670fcc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 13 11:50:23 2022 -0500
Use kernel CFLAGS for 'kernels' subdirs in addons. (#658)
Details:
- Updated Makefile and common.mk so that the targeted configuration's
kernel CFLAGS are applied to source files that are found in a
'kernels' subdirectory within an enabled addon. For now, this
behavior only applies when the 'kernels' directory is at the top
level of the addon directory structure. For example, if there is an
addon named 'foobar', the source code must be located in
addon/foobar/kernels/ in order for it to be compiled with the target
configuration's kernel CFLAGS. Any other source code within
addon/foobar/ will be compiled with general-purpose CFLAGS (the same
ones that were used on all addon code prior to this commit). Thanks
to AMD (esp. Mithun Mohan) for suggesting this change and catching an
intermediate bug in the PR.
- Comment/whitespace updates.
commit cb74202db39dc8cb81fdd06f8a445f8837e27853
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 13 11:46:24 2022 -0500
Fixed incorrect sizeof(type) in edge case macros. (#662)
Details:
- In bli_edge_case_macro_defs.h, the GEMM_UKR_SETUP_CT_PRE() and
GEMMTRSM_UKR_SETUP_CT_PRE() macros previously declared their temporary
ct microtiles as:
PASTEMAC(ch,ctype) \
_ct[ BLIS_STACK_BUF_MAX_SIZE / sizeof( PASTEMAC(ch,type) ) ] \
__attribute__((aligned(alignment))); \
The problem here is that sizeof( PASTEMAC(ch,type) ) evaluates to
things like sizeof( BLIS_DOUBLE ), not sizeof( double ), and since
BLIS_DOUBLE is an enum, it is typically an int, which means the
sizeof() expression is evaluating to the wrong value. This was likely
a benign bug, though, since BLIS does not support any computational
datatypes that are smaller than sizeof( int ), which means the ct
array would be *over*-allocated rather than underallocated. Thanks
to @moon-chilled for identifying and reporting this bug in #624.
- CREDITS file update.
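The effect can be reproduced in isolation. In the sketch below, the enum, the buffer size, and the function names are hypothetical; the only fact relied upon is that a C enumeration constant has type int, so sizeof applied to it yields sizeof(int) rather than the size of the datatype it names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the BLIS datatype enum and stack buffer size. */
typedef enum { SK_FLOAT, SK_DOUBLE } sk_num_t;
#define SK_STACK_BUF_MAX_SIZE 4096

/* Buggy element count: divides by the size of an enum constant (an int). */
static size_t sk_ct_elems_buggy(void)
{
    return SK_STACK_BUF_MAX_SIZE / sizeof(SK_DOUBLE);
}

/* Intended element count: divides by the size of the actual C type. */
static size_t sk_ct_elems_fixed(void)
{
    return SK_STACK_BUF_MAX_SIZE / sizeof(double);
}

/* The bug is benign as long as int is no larger than the element type:
   the array ends up with at least as many elements as intended. */
static void sk_check_overallocation(void)
{
    assert(sizeof(int) <= sizeof(double));
    assert(sk_ct_elems_buggy() >= sk_ct_elems_fixed());
}
```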
commit 6e5431e8494b06bd80efcab3abf0a6456d6c0381
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Sep 10 15:16:58 2022 -0500
Fix line number issue in flattened blis.h. (#660)
Details:
- Updated the top-level Makefile so that it invokes flatten-headers.py
without the -c option, which was requesting that comments be stripped
(since comment stripping is disabled by default).
- Updated flatten-headers.py to accept a new option (-l) to enable
insertion of #line directives into the output file. This new option
is enabled by default.
- Also added logic to flatten-headers.py that outputs a warning if both
comment stripping and line numbers are requested since the comment
stripping will cause the line numbers to become inaccurate.
commit 4afe0cfdab0e069e027f97920ea604249e34df47
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 8 18:33:20 2022 -0500
Defined invscalv, invscalm, invscald operations. (#661)
Details:
- Defined invert-scale (invscal) operation on vectors (level-1v),
matrices (level-1m), and diagonals (level-1d).
- Added test modules for invscalv and invscalm to the testsuite.
- Updated BLISObjectAPI.md and BLISTypedAPI.md API documentation to
reflect the new operations. Also updated KernelsHowTo.md accordingly.
- Renamed 'beta' to 'alpha' in scalv and scalm testsuite modules (and
input.operations files) so that the parameter name matches the
parameter used in the documentation.
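As a rough illustration of what an invert-scale operation on a vector computes, the sketch below divides each element of a real vector by alpha. The function name and signature are hypothetical and much simpler than the BLIS typed API, which also handles conjugation, complex types, and general strides.

```c
#include <assert.h>
#include <stddef.h>

/* x[i] := x[i] / alpha for i in [0, n), with unit or positive stride incx. */
static void sk_invscalv(size_t n, double alpha, double* x, size_t incx)
{
    assert(alpha != 0.0);
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        x[i * incx] /= alpha;
}
```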
commit a87eae2b11408b556e562f1b04e673c6cd1612bc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 6 18:04:09 2022 -0500
Added '-q' quiet mode option to testsuite. (#657)
Details:
- Added support for a '-q' command line option to the testsuite. This
option suppresses most informational output that would normally
clutter up the screen. By default, verbose mode (the previous
status quo) will be operative, and so quiet mode must be requested.
commit dfa54139664a42d29774e140ec9e5597af869a76
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Tue Aug 30 08:07:50 2022 +0800
Arm64 dgemmsup with extended MR&NR (#655)
Details:
- Since the number of registers in NEON is large but their lengths are
short, I'm here extending both MR and NR.
- The approach is to represent the C microtile in registers optionally
in columns, so for sizes like 6x7m, the 'crr' kernel is the default
with 'rrr' supported through an in-register transpose.
- A few asm kernels are crafted for 'rv' to complete this extended size
support.
- For 'rd' I'm still relying heavily on C99 intrinsic kernels with
branching so the performance might not be optimal. (Sorry for that.)
- So far, these changes only affect the 'firestorm' subconfig.
- This commit also contains row-preferential s12x8 and d8x6 gemm
ukernels. These microkernels are templatized versions of the existing
s8x12 and d6x8 ukernels defined in bli_gemm_armv8a_asm_d6x8.c.
commit 9e5594ad5fc41df8ef2825a025d7844ac2275c27
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 11 14:36:38 2022 -0500
Temporarily disabled #line directives from 6826c1c.
Details:
- Commented out the inclusion of #line preprocessor directives in the
flattened header output provided by build/flatten-headers.py. This
output was added recently in 6826c1c, but was later found to have
thrown off the line numbering referenced by compiler warnings and
errors (possibly due to license comment blocks, which are stripped
from source headers as they are inlined into the monolithic header).
commit 775148bcdbb1014b4881a76306f35f5d0fedecbe
Author: jdiamondGitHub <jeff_diamond@fastmail.com>
Date: Fri Aug 5 12:01:24 2022 -0500
Updated ARMv8a kernels to fix 2 prefetching issues. (#649)
Details:
- The ARMv8a dgemm/sgemm microkernels had 2 prefetching issues that
impacted performance on modern ARM platforms. The most significant
issue was that only a single prefetch per C tile column was issued.
When a column of C was not cache aligned, the second cache line would
not be prefetched at all, forcing the kernel to wait for an entire
load to update elements of C. This happened with roughly 50% of the
C prefetches. The fix was to have two prefetches per column, spaced
64 bytes (1 cache line) apart.
- A secondary performance issue was that all the C prefetch instructions
were issued sequentially at the beginning of the kernel call. This
caused a noticeable performance slowdown. Interleaving the prefetch
calls every 2-3 instructions in the prologue code solved the issue.
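The first issue can be illustrated in C rather than assembly. The sketch below assumes a 64-byte cache line and uses __builtin_prefetch, a GCC/Clang builtin; the function names and the 48-byte column used in the usage note are hypothetical and not taken from the kernels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_CACHE_LINE 64

/* Number of cache lines touched by `bytes` bytes starting at address addr. */
static size_t sk_lines_touched(uintptr_t addr, size_t bytes)
{
    assert(bytes > 0);
    return (size_t)((addr + bytes - 1) / SK_CACHE_LINE - addr / SK_CACHE_LINE + 1);
}

/* Issues two prefetches per column of C, one cache line apart, so that a
   column straddling a cache-line boundary is covered by the second one.
   The column is assumed to lie inside a buffer that extends at least one
   cache line past col. */
static void sk_prefetch_c_column(const double* col)
{
    __builtin_prefetch(col, 1, 3);                               /* first line  */
    __builtin_prefetch((const char*)col + SK_CACHE_LINE, 1, 3);  /* second line */
}
```

For example, a 48-byte column that starts at a cache-line boundary touches one line, while the same column starting 32 bytes into a line touches two, which is the case a single prefetch misses.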
commit bbaf29abd942de47a3a99a80a67d12bab41b27db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 4 17:51:37 2022 -0500
Very minor variable updates to common.mk.
Details:
- Fixed a harmless bug that would have allowed C++ headers into the list
of header suffixes specifically reserved for C99 headers. In practice,
this would have had no substantive effect on anything since the core
BLIS framework does not use C++ headers.
commit a48e29d799091a833213efeafaf2d342ebdafde9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 28 10:11:07 2022 -0500
CREDITS file update.
Details:
- Thanks to Kihiro Bando for assisting with issue #644.
commit 5b298935de7f20462bfad1893ed34ecd691cec5a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 27 19:14:15 2022 -0500
Removed buggy cruft from power10 subconfig.
Details:
- Removed #defines for BLIS_BBN_s and BLIS_BBN_d from
bli_kernel_defs_power10.h. These were inadvertently set in ae10d949
because the power10 subconfig was registering bb packm ukernels, but
only for 6xk (power10 uses s8x16 and d8x8 ukernels) and only because
the original author (probably) copy-pasted from power9 when getting
started. That 6xk packm registration was effectively "dead code"
prior to ae10d949, but was then mistaken as not-dead code during the
ae10d949 refactor. These improper bb factors may have been causing
bugs in power10 builds. Thanks to Nicholai Tukanov for helping remind
me what the power10 subconfig was supposed to look like.
- Removed extraneous microkernel preference registrations from power10
subconfig. Preferences for single and double complex gemm were being
registered despite there being no complex gemm ukernels registered to
go with them. Similarly, there were trsm preferences registered
without any trsm ukernels registered (and BLIS doesn't actually use a
preference for the trsm ukernel anyway). These extraneous
registrations were almost surely not hurting anything, even if they
were quite misleading.
commit 56de31b00fa0f1ba866321817cd1e5d83000ff11
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Jul 27 13:54:17 2022 -0500
Disable modification of KC in the gemmsup kernels. (#648)
This led to a ~50% performance reduction for certain gemm operations (but not others?). See #644 for example.
commit 4dde947e2ec9e139c162801320c94e6a01a39708
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 26 17:29:32 2022 -0500
Fixed out-of-bounds bug in sup s6x16m haswell kernel.
Details:
- Fixed another out-of-bounds read access bug in the haswell sup
assembly kernels. This bug is similar to the one fixed in 17b0caa
and affects bli_sgemmsup_rv_haswell_asm_6x2m(). Thanks to Madeesh
Kannan for reporting this bug (and a suitable fix) in #635.
- CREDITS file update.
commit 6826c1cdfba855513786d9e3d606681316453398
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Jul 25 18:21:05 2022 -0500
Add `#line` directives to flattened `blis.h`. (#643)
Details:
- Modified flatten-headers.py so that #line directives are inserted into
the flattened blis.h file. This facilitates easier debugging when
something is amiss in the flattened blis.h because the compiler will
be able to refer to the line number within the original constituent
header file (which is where the fix would go) rather than the line
number within the flattened header (which is not as helpful).
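As a small illustration of the mechanism, the standard C `#line` directive resets the line number and file name that the compiler reports for the lines that follow. The file name and function names below are hypothetical and are not taken from the output of flatten-headers.py.

```c
/* Pretend the following lines came from line 120 of a constituent header. */
#line 120 "bli_hypothetical_header.h"
static int reported_line(void) { return __LINE__; }
static const char *reported_file(void) { return __FILE__; }
```

Any diagnostic issued for `reported_line()` would then point at line 120 of the named file rather than at its position in the flattened header.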
commit af3a41e02534befdae026377592ce437bab83023
Author: Alexander Grund <Flamefire@users.noreply.github.com>
Date: Thu Jul 21 18:05:48 2022 +0200
Add autodetection for POWER7, POWER9 & POWER10 (#647)
Read from `/proc/cpuinfo` as done for ARM.
Fixes #501
commit 17b0caa2b2bff439feb6d2b39cfa16e7591882b0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 14 17:55:34 2022 -0500
Fixed out-of-bounds read in haswell gemmsup kernels.
Details:
- Fixed memory access bugs in the bli_sgemmsup_rv_haswell_asm_Mx2()
kernels, where M = {1,2,3,4,5,6}. The bugs were caused by loading four
single-precision elements of C, via instructions such as:
vfmadd231ps(mem(rcx, 0*32), xmm3, xmm4)
in situations where only two elements are guaranteed to exist. (These
bugs may not have manifested in earlier tests due to the leading
dimension alignment that BLIS employs by default.) The issue was fixed
by replacing lines like the one above with:
vmovsd(mem(rcx), xmm0)
vfmadd231ps(xmm0, xmm3, xmm4)
Thus, we use vmovsd to explicitly load only two elements of C into
registers, and then operate on those values using register addressing.
Thanks to Daniël de Kok for reporting these bugs in #635, and to
Bhaskar Nallani for proposing the fix.
- CREDITS file update.
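The fix above can be sketched in portable C. This is an analogy to the assembly, not the kernel code: copying exactly two floats into a zeroed four-lane buffer plays the role of vmovsd (which loads 64 bits and zeroes the upper lanes), and the loop plays the role of the register-form vfmadd231ps. The function name `fma4_with_two_valid` is an assumption.

```c
#include <string.h>

/* acc[i] += b[i] * c[i] over four lanes, reading only the two elements of
 * c that are guaranteed to exist; the upper two lanes are treated as zero. */
static void fma4_with_two_valid(const float *c, const float b[4], float acc[4])
{
    float lanes[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    memcpy(lanes, c, 2 * sizeof(float));   /* touch exactly 8 bytes of c */
    for (int i = 0; i < 4; ++i)
        acc[i] += b[i] * lanes[i];
}
```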
commit cc260fd7068f0fe449d818435aa11adb14c17fed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 13 16:16:01 2022 -0500
Allow uniform max problem sizes in test/3/runme.sh.
Details:
- Tweaked test/3/runme.sh so that the test driver binaries for single-
threaded (st), single-socket (1s), and dual-socket (2s) execution can
be built using identical problem size ranges. Previously, this was not
possible because runme.sh used the maximum problem size, which was
embedded into the binary filename, to tell the three classes of
binaries apart from one another. Now, runme.sh uses the binary suffix
("st", "1s", or "2s") to tell them apart. This required only a few
changes to the logic, but it also required a change in format to the
threading config strings themselves (replacing the max problem size
with "st", "1s", or "2s"). Thanks to Jeff Diamond for inspiring this
improvement.
- Comment updates.
commit 9b1beec60be31c6ea20b85806d61551497b699e4
Author: bartoldeman <bartoldeman@users.noreply.github.com>
Date: Mon Jul 11 20:15:12 2022 -0400
Use BLIS_ENABLE_COMPLEX_RETURN_INTEL in blastest files (#636)
Details:
- Fixed a crash that occurs when either cblat1 or zblat1 is linked
with a build of BLIS that was compiled with '--complex-return=intel'.
This fix involved inserting preprocessor macro guards based on
BLIS_ENABLE_COMPLEX_RETURN_INTEL into blastest/src/cblat1.c and
blastest/src/zblat1.c to correctly handle situations where BLIS is
compiled with Intel/f2c-style calling conventions for complex numbers.
- Updated blastest/src/fortran/run-f2c.sh so that future executions
will insert the aforementioned cpp macro conditional where
appropriate.
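A rough sketch of the two calling conventions that such a macro guard has to distinguish. Only the macro name BLIS_ENABLE_COMPLEX_RETURN_INTEL comes from the commit message; the type `my_dcomplex`, the functions `dotu_impl` and `my_zdotu`, and the `CALL_ZDOTU` wrapper are hypothetical and do not reproduce the blastest sources.

```c
typedef struct { double real, imag; } my_dcomplex;

/* Unconjugated complex dot product shared by both conventions. */
static my_dcomplex dotu_impl(int n, const my_dcomplex *x, const my_dcomplex *y)
{
    my_dcomplex r = { 0.0, 0.0 };
    for (int i = 0; i < n; ++i) {
        r.real += x[i].real * y[i].real - x[i].imag * y[i].imag;
        r.imag += x[i].real * y[i].imag + x[i].imag * y[i].real;
    }
    return r;
}

#ifdef BLIS_ENABLE_COMPLEX_RETURN_INTEL
/* Intel/f2c style: the result is written through a hidden first argument. */
static void my_zdotu(my_dcomplex *result, int n,
                     const my_dcomplex *x, const my_dcomplex *y)
{
    *result = dotu_impl(n, x, y);
}
#define CALL_ZDOTU(res, n, x, y) my_zdotu(&(res), (n), (x), (y))
#else
/* Default style: the result is an ordinary return value. */
static my_dcomplex my_zdotu(int n, const my_dcomplex *x, const my_dcomplex *y)
{
    return dotu_impl(n, x, y);
}
#define CALL_ZDOTU(res, n, x, y) ((res) = my_zdotu((n), (x), (y)))
#endif
```

A caller compiled against the wrong convention passes its arguments in the wrong positions, which is consistent with the crash described above.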
commit 98d467891b74021ace7f248cb0856bec734e39b6
Author: bartoldeman <bartoldeman@users.noreply.github.com>
Date: Mon Jul 11 19:40:53 2022 -0400
Change complex_return='intel' for ifx. (#637)
Details:
- When checking the version string of the Fortran compiler for the
purposes of determining a default return convention for complex
domain values, grep for "IFORT" instead of "ifort" since that string
is common to both the 'ifx' and 'ifort' binaries provided by Intel:
$ ifx --version
ifx (IFORT) 2022.1.0 20220316
Copyright (C) 1985-2022 Intel Corporation. All rights reserved.
$ ifort --version
ifort (IFORT) 2021.6.0 20220226
Copyright (C) 1985-2022 Intel Corporation. All rights reserved.
commit ffde54cc5c334aca8eff4d6072ba49496bf3104c
Author: jdiamondGitHub <jeff_diamond@fastmail.com>
Date: Mon Jul 11 16:47:30 2022 -0500
Minor changes to .gitignore and LICENSE files. (#642)
Details:
- Macs create .DS_Store files in every directory visited. Updated
.gitignore file so these files won't be reported as untracked by
'git status'.
- Added Oracle Corporation to the LICENSE file.
- Updated UT copyright on behalf of SHPC.
commit 7cba7ce3dd1533fcc4ca96ac902bdf218686139a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 8 11:15:18 2022 -0500
Minor cleanups, comment updates to bli_gks.c.
Details:
- Removed a redundant registration of 'a64fx' subconfig in
bli_gks_init().
- Reordered registration of 'armsve', 'a64fx', and 'firestorm'
subconfigs. Thanks to Jeff Diamond for his input on this reordering.
- Comment updates to bli_gks.c and arch_t enum in bli_type_defs.h.
commit 667f201b7871da68622027d02bd6b7da3262f8e8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 7 16:44:21 2022 -0500
Fixed type bug in bli_cntx_set_ukr_prefs().
Details:
- Fixed a bug in bli_cntx_set_ukr_prefs() which erroneously typecast the
num_t value read from va_args() down to a bool before being stored
within the cntx_t. This bug was introduced on April 6th 2022, in
ae10d94. This caused the ukernel preferences for double real and
double complex to go unchanged while the preferences for single real
and single complex were corrupted by the former datatypes'
preference values. The bug manifested as degraded performance for
subconfigurations that registered column-preferential ukernels. The
reason is that the erroneous preferences trigger unnecessary
transpositions in the operation, which forces the gemm ukernel to
compute on matrices that are not stored according to its preference.
Thanks to Devin Matthews, Jeff Diamond, and Leick Robinson for their
extensive efforts and assistance in tracking down this issue.
- Augmented the informational header that is output by the testsuite to
include ukernel preferences for gemm, gemmtrsm_[lu], and trsm_[lu].
- CREDITS file update.
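The class of bug described in the first item can be reproduced in a few lines. The enum `my_num_t` and its values are hypothetical stand-ins chosen so that the double-precision ids collapse onto a single-precision id, matching the symptom described; the functions are not the BLIS implementation.

```c
#include <stdarg.h>
#include <stdbool.h>

/* Hypothetical datatype ids (assumed values, for illustration only). */
typedef enum { MY_FLOAT = 0, MY_SCOMPLEX = 1, MY_DOUBLE = 2, MY_DCOMPLEX = 3 } my_num_t;

/* Buggy: narrowing to bool collapses every nonzero id to 1. */
static int read_dt_buggy(int count, ...)
{
    va_list ap;
    va_start(ap, count);
    bool dt = (bool)va_arg(ap, int);
    va_end(ap);
    return (int)dt;
}

/* Fixed: keep the full integer value of the id. */
static int read_dt_fixed(int count, ...)
{
    va_list ap;
    va_start(ap, count);
    my_num_t dt = (my_num_t)va_arg(ap, int);
    va_end(ap);
    return (int)dt;
}
```

With these assumed values, a preference intended for MY_DOUBLE or MY_DCOMPLEX is stored under MY_SCOMPLEX by the buggy reader, so the double-precision entries never change and a single-precision entry is overwritten.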
commit d429b6bfced21a63bf711224ac402f93f0080b52
Author: Isuru Fernando <isuruf@gmail.com>
Date: Tue Jun 28 15:34:10 2022 -0500
Support clang targeting MinGW (#639)
* Support clang targeting MinGW
* Fix pthread linking
commit d93df023348144e091f7b3e3053995648f348aa7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 15 14:09:49 2022 -0500
Removed unused dt arg in bli_gks_query_ind_cntx().
Details:
- Removed the num_t datatype argument from bli_gks_query_ind_cntx().
This argument stopped being needed by the function in commit e9da642.
Its only use in bli_gks_query_ind_cntx() was to be passed through to
the context initialization function for the chosen induced method,
but even then, commit log notes from e9da642 indicate that I could not
recall why the datatype argument was ever needed by the context init
function to begin with.
- Updated all invocations of bli_gks_query_ind_cntx() to omit the dt
argument. Most of these invocations resided in various standalone test
drivers (and the testsuite).
commit 56772892450cc92b3fbd6a9d0460153a43fc47ab
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 1 10:49:33 2022 -0500
Added SMU citation to README.md intro.
Details:
- Added a citation to SMU and the Matthews Research Group to the general
attribution of maintainership and development in the Introduction of
the README.md file. Thanks to Robert van de Geijn and Devin Matthews
for suggesting this change.
commit 4603324eb090dfceaad3693a70b2d60544036aa8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 19 14:07:03 2022 -0500
Init/finalize via bli_pthread_switch_t API (#634).
Details:
- Defined and implemented a new pthread-like abstract datatype and API
in bli_pthread.c. The new type, bli_pthread_switch_t, is similar to
bli_pthread_once_t in some respects. The idea is that like a switch in
your home that controls a light or ceiling fan, it can either be on or
off. The switch starts in the off state. Moving from one state to the
other (on to off; off to on) causes some action (i.e., a startup or
shutdown function) to be executed. Trying to move from one state to
the same state (on to on; off to off) is safe in that it results in
no action. Unlike bli_pthread_once(), the API for bli_pthread_switch_t
contains both _on() and _off() interfaces. Also, unlike the _once()
function, the _on() and _off() functions return error codes so that
the 'int' error code returned from the startup or shutdown functions
may be passed back to the caller. Thanks to Devin Matthews for his
input and feedback on this feature.
- Replaced the previous implementation of bli_init_once() and
bli_finalize_once() -- both of which used bli_pthread_once() -- with
ones that rely upon bli_pthread_switch_on() and _switch_off(),
respectively. This also required updating the return types of
_init_apis() and _finalize_apis() to match the function pointer type
required by bli_pthread_switch_on()/_switch_off().
- Comment updates.
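A minimal sketch of the on/off switch semantics described above, using a plain POSIX mutex. The names `my_switch_t`, `my_switch_on`, and `my_switch_off` are hypothetical, and the choice to change state only when the callback returns 0 is an assumption of this sketch rather than something the commit message states.

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct { pthread_mutex_t mutex; bool is_on; } my_switch_t;
#define MY_SWITCH_INIT { PTHREAD_MUTEX_INITIALIZER, false }  /* starts off */

/* off -> on runs startup(); on -> on does nothing. Returns startup()'s code. */
static int my_switch_on(my_switch_t *sw, int (*startup)(void))
{
    int r = 0;
    pthread_mutex_lock(&sw->mutex);
    if (!sw->is_on) {
        r = startup();
        if (r == 0) sw->is_on = true;   /* assumption: flip only on success */
    }
    pthread_mutex_unlock(&sw->mutex);
    return r;
}

/* on -> off runs shutdown(); off -> off does nothing. */
static int my_switch_off(my_switch_t *sw, int (*shutdown)(void))
{
    int r = 0;
    pthread_mutex_lock(&sw->mutex);
    if (sw->is_on) {
        r = shutdown();
        if (r == 0) sw->is_on = false;
    }
    pthread_mutex_unlock(&sw->mutex);
    return r;
}

/* Demo callbacks that count how often they run. */
static int demo_startups, demo_shutdowns;
static int demo_startup(void)  { ++demo_startups;  return 0; }
static int demo_shutdown(void) { ++demo_shutdowns; return 0; }
```

Unlike a once-style primitive, the switch can be turned on again after it has been turned off, at which point the startup function runs a second time.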
commit 64a9b061f6032e2b59613aecdbe7bb52161605c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 10 14:54:22 2022 -0500
Fixed misspelling of 'xpbys' in gemm macrokernel.
Details:
- Fixed a functionally harmless typo in bli_gemm_ker_var2.c where a few
instances of the substring "xpbys" were misspelled as "xbpys". The
misspellings were harmless because they were consistent, and because
they referenced only local symbols.
commit 1c733402a95ab08b20f3332c2397fd52a2627cf6
Author: Jed Brown <jed@jedbrown.org>
Date: Thu Apr 28 11:58:44 2022 -0600
Fix version check for znver3, which needs gcc >= 10.3 (#628)
Apple's clang-12 lacks znver3 support, unlike upstream clang-12.
commit 6431c9e13b86e4442b6aacba18a0ace12288c955
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 14 13:01:24 2022 -0500
Added missing 'const' to zen bli_gemm_small.c.
Details:
- Added missing 'const' qualifiers to signatures of functions defined in
kernels/zen/3/bli_gemm_small.c. This fixes compile-time errors when
targeting 'zen3' subconfig (which apparently is enabling AMD's
gemm_small code path by default). Thanks to Devin Matthews for
reporting this error.
commit 9fea633748ed27ef3853bba7cd955690c61092b4
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Apr 13 15:59:06 2022 -0500
Partial addition of 'const' to all interfaces above the (micro)kernels. (#625)
Details:
- Added 'const' qualifier to applicable function arguments wherever the
pointed-to object is not internally modified. This change affects
all interfaces that reside above the level of the (micro)kernels.
- Typecast certain function return values to discard 'const' qualifier.
- Removed 'restrict' from various arguments, including cntx_t*,
auxinfo_t*, rntm_t*, thrinfo_t*, mem_t*, and others.
- Removed parts of some APIs, such as bli_cntx_*(), due to limited use.
- Merged some variable declarations with their corresponding
initialization statements.
- Whitespace changes.
commit ae10d9495486f589ed0320f0151b2d195574f1cf
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Apr 6 20:31:11 2022 -0500
Simplify and rewrite reference packm kernels. (#610)
Details:
- Reorganized the way kernels are stored within the cntx_t structure so
that rather than having a function pointer for every supported size of
unrolled packm kernel (2xk, 3xk, 4xk, etc.), we store only two packm
kernels per datatype: one to pack MRxk micropanels and one to pack
NRxk micropanels.
- NOTE: The "bb" (broadcast B) reference kernels have been merged into
the "standard" kernels (packm [including 1er and unpackm], gemm,
trsm, gemmtrsm). This replication factor is controlled by
BLIS_BB[MN]_[sdcz] etc. Power9/10 needs testing since only a
replication factor of 1 has been tested. armsve also needs testing
since the MR value isn't available as a macro.
- Simplified the bli_cntx_*() APIs to conform to the new unified kernel
array within the cntx_t. Updated existing bli_cntx_init_<subconfig>()
function definitions for all subconfigurations.
- Consolidated all kernel id types (e.g. l1vkr_t, l1mkr_t, l3ukr_t,
etc.) into one kernel id type: ukr_t.
- Various edits, updates, and rewrites of reference kernels pursuant to
the aforementioned changes.
- Define compile-time macro constants (BLIS_MR_[sdcz], BLIS_NR_[sdcz],
and friends) in bli_kernel_macro_defs.h, but only when the macro
BLIS_IN_REF_KERNEL is defined by the build system.
- Loose ends:
- Still need to update documentation, including:
- docs/ConfigurationHowTo.md
- docs/KernelsHowTo.md
to reflect changes made in this commit.
commit b3e674db3c05ca586b159a71deb1b61d701ae5c9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 4 17:31:02 2022 -0500
README.md update to link to releases page.
commit 69fa915464c52f09a5971a60f521900d31a34e69
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 1 08:47:46 2022 -0500
Fixed broken "tagged releases" link in README.md.
commit 88cab8383ca90ddbb4cf13e69b7d44a1663a4425
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 1 08:12:06 2022 -0500
CHANGELOG update (0.9.0)
commit 14c86f66b20901b60ee276da355c1b62642c18d2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 1 08:12:06 2022 -0500
Version file update (0.9.0)
commit 99bb9002f1aff598d347eae2821a3f7bdd1f48e8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 1 08:10:59 2022 -0500
ReleaseNotes.md update in advance of next version.
commit bee7678b2558a691ac850819dbe33fefe4fdbee3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 31 14:09:39 2022 -0500
CREDITS file update.
commit cf06364327bd2d21d606392371ff3c5962bee5ba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 29 16:18:25 2022 -0500
Fixed typo in BLAS gemm3m call to _check().
Details:
- Fixed an unresolved symbol issue leftover from #590 whereby ?gemm3m_()
as defined in bla_gemm3m.c was referencing bla_gemm3m_check(), which
does not exist. It should have simply called the _check() function for
gemm.
commit 1ec020b33ece1681c0041e2549eed2bd4c6cf356
Author: Dipal M Zambare <71366780+dzambare@users.noreply.github.com>
Date: Wed Mar 30 02:45:36 2022 +0530
AMD kernel updates; frame-specific AMD updates. (#597)
Details:
- Allow building BLIS with certain framework files (each with the '_amd'
suffix) that have been customized by AMD for Zen-based hardware. These
customized files were derived from portable versions of the same files
(i.e., those without the '_amd' suffix). Whether the portable or AMD-
specific files are compiled is now controlled by a new configure
option, --[en|dis]able-amd-frame-tweaks. This option is disabled by
default in vanilla BLIS, though AMD may choose to enable it by default
in their fork. For now, the added AMD-specific files are:
- bli_gemv_unf_var2_amd.c
- bla_copy_amd.c
- bla_gemv_amd.c
These files reside in 'amd' subdirectories found within the directory
housing their generic counterparts.
- Register optimized real-domain copyv, setv, and swapv kernels in
bli_cntx_init_zen.c.
- Various minor updates to level-1v kernels in 'zen' kernel set.
- Added caxpyf kernel as well as saxpyf and multiple daxpyf kernels to
the 'zen' kernel set.
- If the problem passed to ?gemm_() in bla_gemm.c has a unit m or n dim,
call gemv instead and return early.
- Combined variable declarations with their initialization in various
level-2 and level-3 BLAS compatibility files, and also inserted
'const' qualifier in those same declaration statements.
- Moved frame/compat/bla_gemmt.c and .h to frame/compat/extra/ .
- Added copyv and swapv test drivers to 'test' directory.
- Whitespace, comment changes.
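The early return from gemm to gemv mentioned in the list above can be sketched as follows. This is a simplified, column-major, no-transpose reference with hypothetical names (`my_dgemm`, `my_dgemv`, `gemv_calls`); it handles only the n = 1 case, since the m = 1 case would need a transposed gemv that this sketch omits.

```c
static int gemv_calls;  /* counts dispatches to the gemv path */

/* y := alpha*A*x + beta*y, with A an m x k column-major matrix. */
static void my_dgemv(int m, int k, double alpha, const double *a, int lda,
                     const double *x, double beta, double *y)
{
    ++gemv_calls;
    for (int i = 0; i < m; ++i) {
        double sum = 0.0;
        for (int p = 0; p < k; ++p)
            sum += a[i + p * lda] * x[p];
        y[i] = alpha * sum + beta * y[i];
    }
}

/* C := alpha*A*B + beta*C, column-major, no transposes. */
static void my_dgemm(int m, int n, int k, double alpha,
                     const double *a, int lda, const double *b, int ldb,
                     double beta, double *c, int ldc)
{
    if (n == 1) {
        /* B and C are single columns: the product is a matrix-vector one. */
        my_dgemv(m, k, alpha, a, lda, b, beta, c);
        return;
    }
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = alpha * sum + beta * c[i + j * ldc];
        }
}
```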
commit 0db2bd5341c5c3ed5f1cc2bffa90952735efa45f
Author: Bhaskar Nallani <Nallani.Bhaskar@amd.com>
Date: Fri Mar 25 05:11:55 2022 +0530
Added BLAS/CBLAS APIs for gemm3m. (#590)
Details:
- Created ?gemm3m_() and cblas_?gemm3m() APIs that (for now) simply
invoke the 1m implementation unconditionally. (Note that these APIs
bypass sup handling.)
- Added BLAS prototypes for gemm3m in frame/compat/bla_gemm3m.h.
- Added CBLAS prototypes for gemm3m in frame/compat/cblas/src/cblas.h.
- Relocated:
frame/compat/cblas/src/cblas_?gemmt.c
files into
frame/compat/cblas/src/extra/
- Relocated frame/compat/bla_gemmt.? into frame/compat/extra/ .
- Minor reorganization of prototypes and cpp macro directives in
bli_blas.h, cblas.h, and cblas_f77.h.
- Trivial whitespace change to cblas_zgemm.c.
commit d6810000e961fe807dc5a7db81180a8355f3eac0
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Mar 14 10:29:54 2022 -0500
Update Multithreading.md
Add notes about `BLIS_IR_NT` (should typically be 1) and `BLIS_JR_NT` (should typically be small, e.g. <= 4). [ci skip]
commit f1dbb0e514f53a3240d3a6cbdc3306b01a2206f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 11 13:38:28 2022 -0600
Trivial whitespace change; commit log addendum.
Details:
- A co-attribution to Mithun Mohan was inadvertently omitted from the
commit log for the headline change in the previous commit, 7c07b47.
commit 7c07b477e432adbbce5812ed9341ba3092b03976
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 11 13:28:50 2022 -0600
Avoid gemmsup barriers when not packing A or B. (#622)
Details:
- Implemented a multithreaded optimization for the special (and common)
case of employing the gemmsup code path when the user requests
(implicitly or explicitly) that neither A nor B be packed during
computation. This optimization takes the form of a greatly reduced
code branch in bli_thrinfo_sup_create_for_cntl(), which avoids a
broadcast and two barriers, and results in higher performance when
obtaining two-way or higher parallelism within BLIS. Thanks to
Bhaskar Nallani of AMD for proposing this change via issue #605.
- Added an early return branch to bli_thrinfo_create_for_cntl() that
detects and quickly handles cases where no parallelism is being
obtained within BLIS (i.e., single-threaded execution). Note that
this special case handling was/is already present in
bli_thrinfo_sup_create_for_cntl().
- CREDITS file update.
commit cad10410b2305bc0e328c5f2517ab02593b53428
Author: Ivan Korostelev <ivan23kor@gmail.com>
Date: Thu Mar 10 09:58:14 2022 -0600
POWER10: edge cases in microkernel (#620)
Use new API for POWER10 gemm microkernel
commit 71851a0549276b17db18a0a0c8ab4f54493bf033
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 8 17:38:09 2022 -0600
Fixed level-3 performance bug in haswell ukernels.
Details:
- Fixed a performance regression affecting nearly all level-3 operations
that use the 'haswell' sgemm and dgemm microkernels. This regression
was introduced in 54fa28b, caused by an ill-formed conditional
expression in the assembly code that controls whether cache lines of C
should be prefetched as rows or as columns. Essentially, the two
branches were reversed, causing incomplete prefetching to occur for
both row- and column-stored instances of matrix C. Thanks to Devin
Matthews for his help finding and fixing this bug.
commit 84732bf95634ac606c5f2661d9474318e366c386
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 28 12:19:31 2022 -0600
Revamp how tools are handled/checked by configure.
Details:
- Consolidate handling of tools that are specifiable via CC, CXX, FC,
PYTHON, AR, and RANLIB into one bash function, select_tool_w_env().
- If the user specifies a tool via an environment variable (e.g.
CC=gcc) and that tool does not seem valid, print an error message
and abort configure, unless the tool is optional (e.g. CXX or FC),
in which case a warning message is printed instead.
- The definition of "seems valid" above amounts to:
- responding to at least one of a basic set of command line options
(e.g. --version, -V, -h) if the os_name is Linux (since GNU tools
tend to respond to flags such as --version) or if the tool in
question is CC, CXX, FC, or PYTHON (which tend to respond to the
expected flags regardless of OS)
- the binary merely existing for AR and RANLIB on Darwin/OSX/BSD.
(These OSes tend to have non-GNU versions of ar and ranlib, which
typically do not respond to --version and friends.)
- This PR addresses #584. Thanks to Devin Matthews for suggesting some
of the changes in this commit.
commit d5146582b1f1bcdccefe23925d3b114d40cd7e31
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Wed Feb 23 03:35:46 2022 +0900
ArmSVE Ensure Non-zero Block Size (#615)
Fixes #613. There are several macros/environment variables which need to be tuned to get good cache block sizes. It would be nice to have a way of getting values automatically.
commit 4d8352309784403ed6719528968531ffb4483947
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Wed Feb 23 01:03:47 2022 +0900
Add armsve to arm64 Metaconfig (#614)
Availability of the `armsve` subconfig is controlled by the compiler version (gcc/clang). Tested for SVE and non-SVE. Fixes #612.
commit c9700f369aa84fc00f36c4b817ffb7dab72b865d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 15 15:36:52 2022 -0600
Renamed SIMD-related macro constants for clarity.
Details:
- Renamed the following macros defined in bli_kernel_macro_defs.h:
BLIS_SIMD_NUM_REGISTERS -> BLIS_SIMD_MAX_NUM_REGISTERS
BLIS_SIMD_SIZE -> BLIS_SIMD_MAX_SIZE
Also updated all instances of these macros elsewhere, including
subconfigurations, source code, and documentation. Thanks to Devin
Matthews for suggesting this change.
commit ee9ff988c49f16696679d4c6cd3dcfcac7295be7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 15 15:01:51 2022 -0600
Move edge cases to gemmtrsm ukrs; doc updates.
Details:
- Moved edge-case handling into the gemmtrsm microkernel. This required
changing the microkernel API to take m and n dimension parameters as
well as updating all existing gemmtrsm microkernel function pointer
types, function signatures, and related definitions to take m and n
dimensions. Also updated all existing gemmtrsm kernels in the
'kernels' directory (which for now is limited to haswell and penryn
kernel sets, plus native and 1m-based reference kernels in
'ref_kernels') to take m and n dimensions, and implemented edge-case
handling within those microkernels via a collection of new C
preprocessor macros defined within bli_edge_case_macro_defs.h. Note
that the edge-case handling for gemm-like operations had already
been relocated into the gemm microkernel in 54fa28b.
- Added descriptive comments to GEMM_UKR_SETUP_CT() and related macros in
bli_edge_case_macro_defs.h to allow for easier reading.
- Updated docs/KernelsHowTo.md to reflect above changes. Also cleaned up
the bullet under "Implementation Notes for gemm" that covers alignment
issues. (Thanks to Ivan Korostelev for pointing out the confusing and
outdated language in issue #591.)
- Other minor tweaks to KernelsHowTo.md.
commit 25061593460767221e1066f9d720fa6676bbed8f
Author: Devin Matthews <damatthews@smu.edu>
Date: Sun Feb 13 20:11:55 2022 -0600
Don't use `-Wl,-flat-namespace`.
Flat namespaces can cause problems due to conflicting system libraries,
etc., so just mark `xerbla_` as a weak symbol on macOS instead.
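For illustration, marking a default handler as weak with the GCC/Clang attribute looks roughly like this. The function name `my_xerbla` and its simplified signature are hypothetical and do not reproduce the real `xerbla_` prototype.

```c
/* Default error handler. Because it is weak, a strong definition of the
 * same symbol supplied by the application or another library takes
 * precedence at link time, without resorting to a flat namespace. */
__attribute__((weak)) int my_xerbla(const char *routine, int info)
{
    (void)routine;   /* a real handler would report the routine name */
    return info;
}
```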
commit 5a4d3f5208d3d8cc1827f8cc90414c764b7ebab3
Author: Devin Matthews <damatthews@smu.edu>
Date: Sun Feb 13 17:28:30 2022 -0600
Use -flat_namespace option to link on macOS
Fixes #611.
commit 26742910a087947780a089360e2baf82ea109e01
Author: Devin Matthews <damatthews@smu.edu>
Date: Sun Feb 13 16:53:45 2022 -0600
Update CC_VENDOR logic
Look for `GCC` in addition to `gcc` to handle weird conda version strings. [ci skip]
commit 2f3872e01d51545c687ae2c8b2650e00552111a7
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Mon Feb 7 17:14:49 2022 +0900
ArmSVE Adopts Label Wrapper
For clang (& armclang?) compilation.
Hopefully solves #609.
commit 72089bb2917b78d99cf4f27c69125bf213ee54e6
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sat Feb 5 16:56:04 2022 +0900
ArmSVE Use Predicate in M-Direction
No need to query MR during kernel runtime.
commit 9cc897f37455d52fbba752e3801f1a9d4a5bfdc1
Author: Ruqing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Feb 3 16:40:02 2022 +0000
Fix SVE Compil.
commit b5df1811f1bc8212b2cda6bb97b79819afe236a8
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Feb 3 02:31:29 2022 +0900
Armv8a, ArmSVE: Simplify Gen-C
commit 35195bb5cea5d99eb3eaf41e3815137d14ceb52d
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Jan 31 10:29:50 2022 -0600
Add armclang detection to configure.
armclang is treated as regular clang. Fixes #606. [ci skip]
commit 0be9282cdccf73342d8571d3f7971a9b0af72363
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 26 17:46:24 2022 -0600
Updated zen3 macro constant names.
Details:
- In config/zen3/bli_family_zen3.h, renamed:
BLIS_SMALL_MATRIX_A_THRES_M_GEMMT -> _M_SYRK
BLIS_SMALL_MATRIX_A_THRES_N_GEMMT -> _N_SYRK
Thanks to Jeff Diamond for helping spot the stale _SYRK naming.
commit 0ab20c0e72402ba0b17fe2c3ed3e16bf2ace0fd3
Author: Jeff Hammond <jehammond@nvidia.com>
Date: Thu Jan 13 07:29:56 2022 -0800
the Apple local label thing is required by Clang in general
@egaudry and I both saw this issue on Linux with Clang 10.
```
Compiling obj/thunderx2/kernels/armv8a/3/sup/bli_gemmsup_rv_armv8a_asm_d4x8m.o ('thunderx2' CFLAGS for kernels)
kernels/armv8a/3/bli_gemm_armv8a_asm_d6x8.c:171:49: fatal error: invalid symbol redefinition
" \n\t"
^
<inline asm>:90:5: note: instantiated into assembly here
.SLOOPKITER:
^
1 error generated.
```
Signed-off-by: Jeff Hammond <jehammond@nvidia.com>
commit 81f93be0561c705ae6823d19e40849facc40bef7
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Jan 10 10:19:47 2022 -0600
Fix row-/column-major pref. in 16x8 haswell sgemm ukr (unused)
commit 268ce1f29a717d18304713ecc25a2eafe41838c7
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Jan 10 10:17:17 2022 -0600
Relax alignment constraints
Remove alignment of temporary AB buffer in edge case handling macros unless alignment is specifically requested (e.g. Core2, SDB/IVB). Fixes #595.
commit 3f2440b0226d5e23a43d12105d74aa917cd6c610
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jan 6 14:57:36 2022 -0600
Added m, n dims to gemmd/gemmlike ukernel calls.
Details:
- Updated the gemmd addon and the gemmlike sandbox code to use the new
microkernel calling sequence, which now includes m and n dimensions so
that the microkernel has all the information necessary to handle edge
cases. Thanks to Jeff Diamond for catching this, which ideally would
have been included in commit 54fa28b.
- Retired var2 of both gemmd and gemmlike to 'attic' directories and
removed their corresponding prototypes. In both cases, var2 was a
variant of the block-panel algorithm where edge-case handling was
abstracted away to a microkernel wrapper. (Since this is now the
official behavior of BLIS microkernels, I saw no need to have it
included as a separate code path.)
- Comment updates.
commit 864bfab4486ac910ef9a366e9ade4b45a39747fc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 4 15:10:34 2022 -0600
CREDITS file update.
commit 466b68a3ad118342dc49a8130b7b02f5e7748521
Author: Devin Matthews <damatthews@smu.edu>
Date: Sun Jan 2 14:59:41 2022 -0600
Add unique tag to branch labels for Apple ARM64.
Add `%=` tag to branch labels, which expands to a unique identifier for each inline assembly block. This prevents duplicate symbol errors on Apple Silicon (#594). Fixes #594. [ci skip] since we can't test Apple Silicon anyways...
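A small sketch of the `%=` mechanism in GCC/Clang extended inline assembly. The label and function names are hypothetical, and the asm body contains only a label so that the example does not depend on any particular instruction set.

```c
/* "%=" expands to a number that is unique to each instance of the asm
 * statement, so the label stays unique even if the compiler inlines this
 * function at several call sites within one translation unit. */
static inline int mark_and_add(int a, int b)
{
    __asm__ volatile (".Ldemo_label%=:" : : : "memory");
    return a + b;
}

/* Two call sites: with a fixed label name, inlining both could emit the
 * same symbol twice and trigger a redefinition error in the assembler. */
static int use_twice(void)
{
    return mark_and_add(1, 2) + mark_and_add(3, 4);
}
```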
commit 08174a2f6ebbd8ed5aa2bc4edc45da80962f06bb
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sat Jan 1 21:35:19 2022 +0900
Evict <arm_sve.h> Requirement for SVE GEMM
For 8 <= GCC < 10 compatibility.
commit 54fa28bd847b389215cffb57a83dc9b3dce79c86
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Dec 24 08:00:33 2021 -0600
Move edge cases to gemm ukr; more user-custom mods. (#583)
Details:
- Moved edge-case handling into the gemm microkernel. This required
changing the microkernel API to take m and n dimension parameters.
This required updating all existing gemm microkernel function pointer
types, function signatures, and related definitions to take m and n
dimensions. We also updated all existing kernels in the 'kernels'
directory to take m and n dimensions, and implemented edge-case
handling within those microkernels via a collection of new C
preprocessor macros defined within bli_edge_case_macro_defs.h. Also
removed the assembly code that formerly would handle general stride
IO on the microtile, since this can now be handled by the same code
that does edge cases.
- Pass the obj_t.ker_fn (of matrix C) into bli_gemm_cntl_create() and
bli_trsm_cntl_create(), where this function pointer is used in lieu of
the default macrokernel when it is non-NULL, and ignored when it is
NULL.
- Re-implemented macrokernel in bli_gemm_ker_var2.c to be a single
function using byte pointers rather than one function for each
floating-point datatype. Also, obtain the microkernel function pointer
from the .ukr field of the params struct embedded within the obj_t
for matrix C (assuming params is non-NULL and contains a non-NULL
value in the .ukr field). Communicate both the gemm microkernel
pointer to use as well as the params struct to the microkernel via
the auxinfo_t struct.
- Defined gemm_ker_params_t type (for the aforementioned obj_t.params
struct) in bli_gemm_var.h.
- Retired the separate _md macrokernel for mixed datatype computation.
We now use the reimplemented bli_gemm_ker_var2() instead.
- Updated gemmt macrokernels to pass m and n dimensions into microkernel
calls.
- Removed edge-case handling from trmm and trsm macrokernels.
- Moved most of bli_packm_alloc() code into a new helper function,
bli_packm_alloc_ex().
- Fixed a typo bug in bli_gemmtrsm_u_template_noopt_mxn.c.
- Added test/syrk_diagonal and test/tensor_contraction directories with
associated code to test those operations.
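The edge-case handling described in the first item of this commit can be sketched in plain C. Everything here is an assumption made for illustration: the block sizes `MR` and `NR`, the names `full_tile_kernel` and `my_gemm_ukr`, and the choice to always go through a temporary tile. The real microkernels rely on the macros in bli_edge_case_macro_defs.h, which this sketch does not reproduce.

```c
enum { MR = 4, NR = 4 };  /* hypothetical register block sizes */

/* Full-tile kernel: always computes an MR x NR product into ct
 * (column-major, leading dimension MR). a is a packed MR x k micropanel
 * stored column by column; b is a packed k x NR micropanel stored row by row. */
static void full_tile_kernel(int k, const double *a, const double *b, double *ct)
{
    for (int j = 0; j < NR; ++j)
        for (int i = 0; i < MR; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += a[i + p * MR] * b[j + p * NR];
            ct[i + j * MR] = sum;
        }
}

/* Microkernel that takes m and n: only the leading m x n part of the tile
 * is written back, C := beta*C + alpha*A*B, with arbitrary row and column
 * strides for C. */
static void my_gemm_ukr(int m, int n, int k, double alpha,
                        const double *a, const double *b,
                        double beta, double *c, int rs_c, int cs_c)
{
    double ct[MR * NR];
    full_tile_kernel(k, a, b, ct);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double *cij = &c[i * rs_c + j * cs_c];
            *cij = beta * (*cij) + alpha * ct[i + j * MR];
        }
}
```

Because the write-back loop honors both strides, the same path also covers general-stride storage of C, which is in line with the removal of the dedicated general-stride assembly mentioned above.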
commit 961d9d509dd94f3a66f7095057e3dc8eb6d89839
Author: Kiran <kiran.varaganti@amd.com>
Date: Wed Dec 8 03:00:38 2021 +0530
Re-add BLIS_ENABLE_ZEN_BLOCK_SIZES macro for 'zen'.
Details:
- Added previously-deleted cpp macro block to bli_cntx_init_zen.c
targeting the Naples microarchitecture that enabled different cache
blocksizes when the number of threads exceeds 16. This commit
represents PR #573.
commit cf7d616a2fd58e293b496770654040818bf5609c
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Dec 2 17:10:03 2021 -0600
Enable user-customized packm ukernel/variant. (#549)
Details:
- Added four new fields to obj_t: .pack_fn, .pack_params, .ker_fn, and
.ker_params. These fields store pointers to functions and data that
will allow the user to more flexibly create custom operations while
recycling BLIS's existing partitioning infrastructure.
- Updated typed API to packm variant and structure-aware kernels to
replace the diagonal offset with panel offsets, and changed strides
of both C and P to inc/ldim semantics. Updated object API to the packm
variant to include rntm_t*.
- Removed the packm variant function pointer from the packm cntl_t node
definition since it has been replaced by the .pack_fn pointer in the
obj_t.
- Updated bli_packm_int() to read the new packm variant function pointer
from the obj_t and call it instead of from the cntl_t node.
- Moved some of the logic of bli_l3_packm.c to a new file,
bli_packm_alloc.c.
- Rewrote bli_packm_blk_var1.c so that it uses byte (char*) pointers
instead of typed pointers, allowing a single function to be used
regardless of datatype. This obviated having a separate implementation
in bli_packm_blk_var1_md.c. Also relegated handling of scalars to a
new function, bli_packm_scalar().
- Employed a new standard whereby right-hand matrix operands ("B") are
always packed as column-stored row panels -- that is, identically to
that of left-hand matrix operands ("A"). This means that while we pack
matrix A normally, we actually pack B in a transposed state. This
allowed us to simplify a lot of code throughout the framework, and
also affected some of the logic in bli_l3_packa() and _packb().
- Simplified bli_packm_init.c in light of the new B^T convention
described above. bli_packm_init()--which is now called from within
bli_packm_blk_var1()--also now calls bli_packm_alloc() and returns
a bool that indicates whether packing should be performed (or
skipped).
- Consolidated bli_gemm_int() and bli_trsm_int() into a bli_l3_int(),
which, among other things, defaults the new .pack_fn field of the
obj_t to bli_packm_blk_var1() if the field is NULL.
- Defined a new function, bli_obj_reset_origin(), which permanently
refocuses the view of an object so that it "forgets" any offsets from
its original pointer. This function also sets the object's root field
to itself. Calls to bli_obj_reset_origin() for each matrix operand
appear in the _front() functions, after the obj_t's are aliased. This
resetting of the underlying matrices' origins is needed in preparation
for more advanced features from within custom packm kernels.
- Redefined bli_pba_rntm_set_pba() from a regular function to a static
inline function.
- Updated gemm_ukr, gemmtrsm_ukr, and trsm_ukr testsuite modules to use
libblis_test_pobj_create() to create local packed objects. Previously,
these packed objects were created by calling lower-level functions.
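
The byte-pointer approach described for bli_packm_blk_var1.c can be illustrated with a much smaller routine. This is only a sketch under simplifying assumptions: it copies a whole matrix into one contiguous column-stored buffer rather than into micropanels, and pack_colmajor_bytes is a hypothetical name, not a BLIS function. The element size is a runtime argument, so one function serves every datatype:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy an m x n matrix into a contiguous column-stored buffer using byte
 * (char*) pointers. elem_size is the size of one element in bytes; rs and
 * cs are the row and column strides of the source, in elements.
 */
void pack_colmajor_bytes(size_t m, size_t n, size_t elem_size,
                         const void *src, ptrdiff_t rs, ptrdiff_t cs,
                         void *dst)
{
    const char *s = (const char *)src;
    char       *d = (char *)dst;

    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i) {
            /* Byte offset of source element (i, j). */
            ptrdiff_t off = ((ptrdiff_t)i * rs + (ptrdiff_t)j * cs)
                            * (ptrdiff_t)elem_size;
            memcpy(d + (i + j * m) * elem_size, s + off, elem_size);
        }
}
```

Packing a right-hand operand in a transposed state then amounts to calling the same routine with the dimensions and the two strides swapped, which is one reason a single packing convention for A and B can simplify the surrounding code.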
commit e229e049ca08dfbd45794669df08a71dba892925
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 1 17:36:22 2021 -0600
Added recu-sed.sh script to 'build' directory.
Details:
- Added a recursive sed script to the 'build' directory.
commit 12c66a4acc77bf4927b01e2358e2ac10b61e0a53
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 19 14:43:53 2021 -0600
Minor updates to README.md, docs/Addons.md.
Details:
- Add additional mentions of addons to README.md, including in the
"What's New" section.
- Removed mention of sandboxes from the long list of advantages
provided by BLIS.
- Very minor description update to opening line of Addons.md.
commit a4bc03b990fe0572001eb6409efd12cd70677dcf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 19 13:29:00 2021 -0600
Brief mention/link to Addons.md in README.md.
Details:
- Add a blurb about the new addons feature to the "Documentation for
BLIS developers" section of the README.md, which also links to the
Addons.md document.
commit b727645eb7a8df39dee74068f734da66322fe0b3
Merge: 9be97c15 7bde468c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 19 13:22:09 2021 -0600
Merge branch 'dev'
commit 9be97c150e19fa58bca30cb993a6509ae21e2025
Author: Madan mohan Manokar <86282872+madanm3@users.noreply.github.com>
Date: Thu Nov 18 00:46:46 2021 +0530
Support all four dts in test/test_her[2][k].c (#578)
Details:
- Replaced the hard-coded calls to double-precision real syr, syr2,
syrk, and syr2k in the corresponding standalone test drivers in the
'test' directory with conditional branches that will call the
appropriate BLAS interface depending on which datatype is enabled.
Thanks to Madan mohan Manokar for this improvement.
- CREDITS file update.
commit 26e4b6b29312b472c3cadf95ccdf5240764777f4
Author: Dipal M Zambare <71366780+dzambare@users.noreply.github.com>
Date: Thu Nov 18 00:32:00 2021 +0530
Added support for AMD's Zen3 microarchitecture.
Details:
- Added a new 'zen3' subconfiguration targeting support for the AMD Zen3
microarchitecture (#561). Thanks to AMD for this contribution.
- Restructured clang and AOCC support for zen, zen2, and zen3
make_defs.mk files. The clang and AOCC version detection now happens
in configure, not in the subconfigurations' makefile fragments. That
is, we've added logic to configure that detects the version of
clang/AOCC, outputs an appropriate variable to config.mk
(ie: CLANG_OT_*, AOCC_OT_*), and then checks for it within the
makefile fragment (as is currently done for the GCC_OT_* variables).
- Added configure support for a GCC_OT_10_1_0 variable (and associated
substitution anchor) to communicate whether the gcc version is older
than 10.1.0, and use this variable to check for recent enough versions
of gcc to use -march=znver3 in the zen3 subconfig.
- Inlined the contents of config/zen/amd_config.mk into the zen and zen2
make_defs.mk so that the files are self-contained, harmonizing the
format of all three Zen-based subconfigurations' make_defs.mk files.
- Added indenting (with spaces) of GNU make conditionals for easier
reading in zen, zen2, and zen3 make_defs.mk files.
- Adjusted the range of models checked by bli_cpuid_is_zen() (which was
previously 0x00 ~ 0xff and is now 0x00 ~ 0x2f) so that it is
completely disjoint from the models checked by bli_cpuid_is_zen2()
(0x30 ~ 0xff). This is normally necessary because Zen and Zen2
microarchitectures share the same family (23, or 0x17), and so the
model code is the only way to differentiate the two. But in our case,
fixing the model range for zen *wasn't* actually necessary since we
checked for zen2 first, and therefore the wide zen range acted like
the 'else' of an 'if-else' statement. That said, the change helps
improve clarity for the reader by encoding useful knowledge, which
was obtained from https://en.wikichip.org/wiki/amd/cpuid .
- Added zen2.def and zen3.def files to the collection in travis/cpuid.
Note that support for zen, zen2, and zen3 is now present, and while
all three microarchitectures have identical instruction sets from
the perspective of BLIS microkernels, they each correspond to
different subconfigurations and therefore merit separate testing.
Thanks to Devin Matthews for his guidance in hacking these files as
slight modifications of zen.def.
- Enabled testing of zen2 and zen3 via the SDE in travis/do_sde.sh.
Now, zen, zen2, and zen3 are tested through the SDE via Travis CI
builds.
- Updated travis/do_sde.sh to grab the SDE tarball from a new ci-utils
repository on GitHub rather than on Intel's website. This change was
made in an attempt to circumvent recent troubles with Travis CI not
being able to download the SDE directly from Intel's website via curl.
Thanks to Devin Matthews for suggesting the idea.
- Updated travis/do_sde.sh to grab the latest version (8.69.1) of the
Intel SDE from the flame/ci-utils repository.
- Updated .travis.yml to use gcc 9. The file was previously using gcc 8,
which did not support -march=znver2.
- Created amd64_legacy umbrella family in config_registry for targeting
older (bulldozer, piledriver, steamroller, and excavator)
microarchitectures and moved those same subconfigs out of the amd64
umbrella family. However, x86_64 retains amd64_legacy as a constituent
member.
- Fixed a bug in configure related to the building of the so-called
config list. When processing the contents of config_registry,
configure creates a series of structures and lists that allow for
various mappings related to configuration families, subconfigs, and
kernel sets. Two of those lists are built via substitution of
umbrella families with their subconfig members, and one of those
lists was improperly performing the substitution in a way that would
erroneously match on partial umbrella family names. That code was
changed to match the code that was already doing the substitution
properly, via substitute_words(). Also added comments noting the
importance of using substitute_words() in both instances.
- Comment updates.
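
The disjoint model ranges described above for family 23 (0x17) can be written down as a small classifier. This is a sketch only: arch_sketch_t and classify_family_17h are hypothetical names, and the real bli_cpuid_is_zen() and bli_cpuid_is_zen2() functions take different arguments and also check CPU features:

```c
#include <stdint.h>

typedef enum { ARCH_OTHER, ARCH_ZEN, ARCH_ZEN2 } arch_sketch_t;

/*
 * Zen and Zen2 share family 0x17, so only the model number tells them
 * apart: 0x00 ~ 0x2f is treated as Zen and 0x30 ~ 0xff as Zen2.
 */
arch_sketch_t classify_family_17h(uint32_t family, uint32_t model)
{
    if (family != 0x17) return ARCH_OTHER;
    if (model <= 0x2f)  return ARCH_ZEN;
    if (model <= 0xff)  return ARCH_ZEN2;
    return ARCH_OTHER;
}
```

Because the two ranges do not overlap, the result no longer depends on which of the two checks runs first.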
commit 74c0c622216aba0c24aa2c3a923811366a160cf5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 16 16:06:33 2021 -0600
Reverted cbc88fe.
Details:
- Reverted the annotation of some markdown code blocks with 'bash'
after realizing that the in-browser syntax highlighting was not
worthwhile.
commit cbc88feb51b949ce562d044cf9f99c4e46bb8a39
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 16 16:02:39 2021 -0600
Marked some markdown shell code blocks as 'bash'.
Details:
- Annotated the code blocks that represent shell commands and output as
'bash' in README.md and BuildSystem.md.
commit 78cd1b045155ddf0b9ec6e2ab815f2b216ad9a9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 16 15:53:40 2021 -0600
Added 'Example Code' section to README.md.
Details:
- Inserted a new 'Example Code' section into the README.md immediately
after the 'Getting Started' section. Thanks to Devin Matthews for
recommending this addition.
- Moved the 'Performance' section of the README down slightly so that it
appears after the 'Documentation' section.
commit 7bde468c6f7ecc4b5322d2ade1ae9c0b88e6b9f3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 13 16:39:37 2021 -0600
Added support for addons.
Details:
- Implemented a new feature called addons, which are similar to
sandboxes except that there is no requirement to define gemm or any
other particular operation.
- Updated configure to accept --enable-addon=<name> or -a <name> syntax
for requesting an addon be included within a BLIS build. configure now
outputs the list of enabled addons into config.mk. It also outputs the
corresponding #include directives for the addons' headers to a new
companion to the bli_config.h header file named bli_addon.h. Because
addons may wish to make use of existing BLIS types within their own
definitions, the addons' headers must be included sometime after that
of bli_config.h (which currently is #included before bli_type_defs.h).
This is why the #include directives needed to go into a new top-level
header file rather than the existing bli_config.h file.
- Added a markdown document, docs/Addons.md, to explain addons, how to
build with them, and what assumptions their authors should keep in
mind as they create them.
- Added a gemmlike-like implementation of sandwich gemm called 'gemmd'
as an addon in addon/gemmd. The code uses a 'bao_' prefix for local
functions, including the user-level object and typed APIs.
- Updated .gitignore so that git ignores bli_addon.h files.
commit 7bc8ab485e89cfc6032932e57929e208a28f4be5
Author: Meghana-vankadari <74656386+Meghana-vankadari@users.noreply.github.com>
Date: Fri Nov 12 04:16:14 2021 +0530
Added BLAS/CBLAS APIs for axpby, gemm_batch. (#566)
Details:
- Expanded the BLAS compatibility layer to include support for
?axpby_() and ?gemm_batch_(). The former is a straightforward
BLAS-like interface into the axpbyv operation while the latter
implements a batched gemm via loops over bli_?gemm(). Also
expanded the CBLAS compatibility layer to include support for
cblas_?axpby() and cblas_?gemm_batch(), which serve as wrappers to
the corresponding (new) BLAS-like APIs. Thanks to Meghana Vankadari
for submitting these new APIs via #566.
- Fixed a long-standing bug in common.mk that for some reason never
manifested until now. Previously, CBLAS source files were compiled
*without* the location of cblas.h being specified via a -I flag.
I'm not sure why this worked, but it may be due to the fact that
the cblas.h file resided in the same directory as all of the CBLAS
source, and perhaps compilers implicitly add a -I flag for the
directory that corresponds to the location of the source file being
compiled. This bug only showed up because some CBLAS-like source code
was moved into an 'extra' subdirectory of that frame/compat/cblas/src
directory. After moving the code, compilation for those files failed
(because the cblas.h header file, presumably, could not be found in
the same location). This bug was fixed within common.mk by explicitly
adding the cblas.h directory to the list of -I flags passed to the
compiler.
- Added test_axpbyv.c and test_gemm_batch.c files to 'test' directory,
and updated test/Makefile to build those drivers.
- Fixed typo in error message string in cblas_sgemm.c.
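
For reference, the operation behind the new ?axpby_() interfaces is y := beta * y + alpha * x. A minimal double-precision sketch, assuming non-negative increments, is shown here. daxpby_sketch is a hypothetical name and does not reproduce the argument conventions of the actual BLAS-like API:

```c
#include <stddef.h>

/*
 * y := beta * y + alpha * x for vectors of length n.
 * incx and incy are the element strides of x and y (assumed >= 0 here).
 */
void daxpby_sketch(size_t n, double alpha, const double *x, ptrdiff_t incx,
                   double beta, double *y, ptrdiff_t incy)
{
    for (size_t i = 0; i < n; ++i) {
        double *yi = y + (ptrdiff_t)i * incy;
        *yi = alpha * x[(ptrdiff_t)i * incx] + beta * (*yi);
    }
}
```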
commit 28b0982ea70c21841fb23802d38f6b424f8200e1
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Nov 10 12:34:50 2021 -0600
Refactored her[2]k/syr[2]k in terms of gemmt. (#531)
Details:
- Renamed herk macrokernels and supporting files and functions to gemmt,
which is possible since at the macrokernel level they are identical.
Then recast herk/her2k/syrk/syr2k in terms of gemmt within the expert
level-3 oapi (bli_l3_oapi_ex.c) while also redefining them as literal
functions rather than cpp macros that instantiate multiple functions.
Thanks to Devin Matthews for his efforts on this issue (#531).
- Check that the maximum stack buffer size is sufficiently large
relative to the register blocksizes for each datatype, and do so when
the context is initialized rather than when an operation is called.
Note that with this change, users who pass in their own contexts into
the expert interfaces currently will *not* have any checks performed.
Thanks to Devin Matthews for suggesting this change.
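
The stack buffer check can be pictured as a single comparison per datatype. The sketch below is an assumption about the shape of such a check, not the BLIS code: stack_buf_is_sufficient is a hypothetical name, and the real check may apply additional factors to the microtile size:

```c
#include <stddef.h>

/*
 * Returns 1 if a temporary MR x NR microtile of elements of size elem_size
 * fits within a stack buffer of max_stack_buf_size bytes, 0 otherwise.
 */
int stack_buf_is_sufficient(size_t mr, size_t nr, size_t elem_size,
                            size_t max_stack_buf_size)
{
    return mr * nr * elem_size <= max_stack_buf_size;
}
```

Running this once at context initialization, rather than on every operation call, is what leaves user-supplied contexts unchecked.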
commit cfa3db3f3465dc58dbbd842f4462e4b49e7768b4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 3 18:13:56 2021 -0500
Fixed bug in mixed-dt gemm introduced in e9da642.
Details:
- Fixed a bug that broke certain mixed-datatype gemm behavior. This
bug was introduced recently in e9da642 when the code that performs
the operation transposition (for microkernel IO preference purposes)
was moved up so that it occurred sooner. However, when I moved that
code, I failed to notice that there was a cpp-protected "if"
conditional that applied to the entire code block that was moved. Once
the code block was relocated, the orphaned if-statement was now
(erroneously) glomming on to the next thing that happened to be in the
function, which happened to be the call to bli_rntm_set_ways_for_op(),
causing a rather odd memory exhaustion error in the sba due to the
num_threads field of the rntm_t still being -1 (because the rntm_t
fields were never processed as they should have been). Thanks to
@ArcadioN09 (Snehith) for reporting this error and helpfully including
relevant memory trace output.
commit f065a8070f187739ec2b34417b8ab864a7de5d7e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 28 16:05:43 2021 -0500
Removed support for 3m, 4m induced methods.
Details:
- Removed support for all induced methods except for 1m. This included
removing code related to 3mh, 3m1, 4mh, 4m1a, and 4m1b as well as any
code that existed only to support those implementations. These
implementations were rarely used and posed code maintenance challenges
for BLIS's maintainers going forward.
- Removed reference kernels for packm that pack 3m and 4m micropanels,
and removed 3m/4m-related code from bli_cntx_ref.c.
- Removed support for 3m/4m from the code in frame/ind, then reorganized
and streamlined the remaining code in that directory. The *ind(),
*nat(), and *1m() APIs were all removed. (These additional API layers
no longer made as much sense with only one induced method (1m) being
supported.) The bli_ind.c file (and header) were moved to frame/base
and bli_l3_ind.c (and header) and bli_l3_ind_tapi.h were moved to
frame/3.
- Removed 3m/4m support from the code in frame/1m/packm.
- Removed 3m/4m support from trmm/trsm macrokernels and simplified some
pointer arithmetic that was previously expressed in terms of the
bli_ptr_inc_by_frac() static inline function (whose definition was
also removed).
- Removed the following subdirectories of level-0 macro headers from
frame/include/level0: ri3, rih, ri, ro, rpi. The level-0 scalar macros
defined in these directories were used exclusively for 3m and 4m
method codes.
- Simplified bli_cntx_set_blkszs() and bli_cntx_set_ind_blkszs() in
light of 1m being the only induced method left within BLIS.
- Removed dt_on_output field within auxinfo_t and its associated
accessor functions.
- Re-indexed the 1e/1r pack schemas after removing those associated with
variants of the 3m and 4m methods. This leaves two bits unused within
the pack format portion of the schema bitfield. (See bli_type_defs.h
for more info.)
- Spun off the basic and expert interfaces to the object and typed APIs
into separate files: bli_l3_oapi.c and bli_l3_oapi_ex.c; bli_l3_tapi.c
and bli_l3_tapi_ex.c.
- Moved the level-3 operation-specific _check function calls from the
operations' _front() functions to the corresponding _ex() function of
the object API. (This change roughly maintains where the _check()
functions are called in the call stack but lays the groundwork for
future changes that may come to the level-3 object APIs.) Minor
modifications to bli_l3_check.c to allow the check() functions to be
called from the expert interface APIs.
- Removed support within the testsuite for testing the aforementioned
induced methods, and updated the standalone test drivers in the 'test'
directory to reflect the retirement of those induced methods.
- Modified the sandbox contract so that the user is obliged to define
bli_gemm_ex() instead of bli_gemmnat(). (This change was made in light
of the *nat() functions no longer existing.) Also updated the existing
'power10' and 'gemmlike' sandboxes to come into compliance with the
new sandbox rules.
- Updated BLISObjectAPI.md, BLISTypedAPI.md, Testsuite.md documentation
to reflect the retirement of 3m/4m, and also modified Sandboxes.md to
bring the document into alignment with new conventions.
- Updated various comments; removed segments of commented-out code.
commit e8caf200a908859fa5f5ea2049911a9bdaa3d270
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 18 13:04:15 2021 -0500
Updated do_sde.sh to get SDE from GitHub.
Details:
- Updated travis/do_sde.sh so that the script downloads the SDE tarball
from a new ci-utils repository on GitHub rather than from Intel's
website. This change is being made in an attempt to circumvent Travis
CI's recent troubles with downloading the SDE from Intel's website via
curl. Thanks to Devin Matthews for suggesting the idea.
commit 290ff4b1c26737b074d5abbf76966bc22af8c562
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 14 16:09:43 2021 -0500
Disable SDE testing of old AMD microarchitectures.
Details:
- Skip testing on piledriver, steamroller, and excavator platforms
in travis/do_sde.sh.
commit 514fd101742dee557e5eb43d0023a221ae8a7172
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 14 13:50:28 2021 -0500
Fixed substitution bug in configure.
Details:
- Fixed a bug in configure related to the building of the so-called
config list. When processing the contents of config_registry,
configure creates a series of structures and lists that allow for
various mappings related to configuration families, subconfigs,
and kernel sets. Two of those lists are built via substitution
of umbrella families with their subconfig members, and one of
those lists was improperly performing the substitution in a way
that would erroneously match on partial umbrella family names.
That code was changed to match the code that was already doing
the substitution properly, via substitute_words().
- Added comments noting the importance of using substitute_words()
in both instances.
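
The difference between a partial match and a whole-word match can be shown in a few lines. The configure script is a shell script, so the C below is only an illustration of the idea. list_has_word is a hypothetical helper, and the family names in the usage note are illustrative rather than the ones that triggered the bug:

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if word occurs in the space-separated list as a complete word,
 * and 0 if it is absent or occurs only as part of a longer word.
 */
int list_has_word(const char *list, const char *word)
{
    size_t wlen = strlen(word);
    if (wlen == 0) return 0;

    const char *p = list;
    while ((p = strstr(p, word)) != NULL) {
        int starts = (p == list) || (p[-1] == ' ');
        int ends   = (p[wlen] == '\0') || (p[wlen] == ' ');
        if (starts && ends) return 1;
        p += 1;  /* keep searching after a partial hit */
    }
    return 0;
}
```

A plain substring search such as strstr("amd64_legacy intel64", "amd64") reports a hit, whereas list_has_word() on the same arguments returns 0 because "amd64" appears there only as a prefix of a longer name.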
commit e9da6425e27a9d63c9fef92afc2dd750c601ccd7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 13 14:15:38 2021 -0500
Allow use of 1m with mixing of row/col-pref ukrs.
Details:
- Fixed a bug that broke the use of 1m for dcomplex when the single-
precision real and double-precision real ukernels had opposing I/O
preferences (row-preferential sgemm ukernel + column-preferential
dgemm ukernel, or vice versa). The fix involved adjusting the API
to bli_cntx_set_ind_blkszs() so that the induced method context init
function (e.g., bli_cntx_init_<subconfig>_ind()) could call that
function for only one datatype at a time. This allowed the blocksize
scaling (which varies depending on whether we're doing 1m_r or 1m_c)
to happen on a per-datatype basis. This fixes issue #557. Thanks to
Devin Matthews and RuQing Xu for helping discover and report this bug.
- The aforementioned 1m fix required moving the 1m_r/1m_c logic from
bli_cntx_ref.c into a new function, bli_l3_set_schemas(), which is
called from each level-3 _front() function. The pack_t schemas in the
cntx_t were also removed entirely, along with the associated accessor
functions. This in turn required updating the trsm1m-related virtual
ukernels to read the pack schema for B from the auxinfo_t struct
rather than the context. This also required slight tweaks to
bli_gemm_md.c.
- Repositioned the logic for transposing the operation to accommodate
the microkernel IO preference. This mostly only affects gemm. Thanks
to Devin Matthews for his help with this.
- Updated dpackm pack ukernels in the 'armsve' kernel set to avoid
querying pack_t schemas from the context.
- Removed the num_t dt argument from the ind_cntx_init_ft type defined
in bli_gks.c. The context initialization functions for induced methods
were previously passed a dt argument, but I can no longer figure out
*why* they were passed this value. To reduce confusion, I've removed
the dt argument (including also from the function definition +
prototype).
- Commented out setting of cntx_t schemas in bli_cntx_ind_stage.c. This
breaks high-level implementations of 3m and 4m, but this is okay since
those implementations will be removed very soon.
- Removed some older blocks of preprocessor-disabled code.
- Comment update to test_libblis.c.
commit 81e103463214d589071ccbe2d90b8d7c19a186e4
Author: Minh Quan Ho <1337056+hominhquan@users.noreply.github.com>
Date: Wed Oct 13 20:28:02 2021 +0200
Alloc at least 1 elem in pool_t block_ptrs. (#560)
Details:
- Previously, the block_ptrs field of the pool_t was allowed to be
initialized as any unsigned integer, including 0. However, a length of
0 could be problematic given that the result of malloc(0) is
implementation-defined and therefore varies across implementations. As a
safety measure, we check for
block_ptrs array lengths of 0 and, in that case, increase them to 1.
- Co-authored-by: Minh Quan Ho <minh-quan.ho@kalray.eu>
commit 327481a4b0acf485d0cbdd8635dd9b886ba3f2a7
Author: Minh Quan Ho <1337056+hominhquan@users.noreply.github.com>
Date: Tue Oct 12 19:53:04 2021 +0200
Fix insufficient pool-growing logic in bli_pool.c. (#559)
Details:
- The current mechanism for growing a pool_t doubles the length of the
block_ptrs array every time the array length needs to be increased
due to new blocks being added. However, that logic did not take into
account the new total number of blocks, and the fact that the caller
may be requesting more blocks than would fit even after doubling the
current length of block_ptrs. The code comments now contain two
illustrating examples that show why, even after doubling, we must
always have at least enough room to fit all of the old blocks plus
the newly requested blocks.
- This commit also happens to fix a memory corruption issue that stems
from growing any pool_t that is initialized with a block_ptrs length
of 0. (Previously, the memory pool for packed buffers of C was
initialized with a block_ptrs length of 0, but because it is unused
this bug did not manifest by default.)
- Co-authored-by: Minh Quan Ho <minh-quan.ho@kalray.eu>
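
The growth rule described above can be sketched as a pure function. grown_block_ptrs_len is a hypothetical helper written for illustration and is not the code in bli_pool.c. It returns the larger of the doubled length and the total number of blocks needed, and never returns 0:

```c
#include <stddef.h>

/*
 * cur_len    : current length of the block_ptrs array
 * num_blocks : number of blocks already stored in the pool
 * num_new    : number of blocks the caller wants to add
 */
size_t grown_block_ptrs_len(size_t cur_len, size_t num_blocks, size_t num_new)
{
    size_t needed  = num_blocks + num_new;
    size_t doubled = 2 * cur_len;
    size_t new_len = (doubled > needed) ? doubled : needed;

    /* Doubling a length of 0 yields 0, so guard against an empty array. */
    return (new_len > 0) ? new_len : 1;
}
```

For example, a full array of length 4 that must absorb 10 new blocks needs 14 slots, which doubling alone (8) would not provide.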
commit 32a6d93ef6e2af5e486dfd5e46f8272153d3d53d
Merge: 408906fd 2604f407
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Oct 9 15:53:54 2021 -0500
Merge pull request #543 from xrq-phys/armsve-packm-fix
ARMSVE Block SVE-Intrinsic Kernels for GCC 8-9
commit 408906fdd8892032aa11bd061b7971128f453bef
Merge: 4277fec0 ccf16289
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Oct 9 15:50:25 2021 -0500
Merge pull request #542 from xrq-phys/armsve-zgemm
Arm SVE CGEMM / ZGEMM Natural Kernels
commit ccf16289d2e71fd9511ccf2d13dcebbfa29deabc
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Fri Oct 8 12:34:14 2021 +0900
Arm SVE C/ZGEMM Fix FMOV 0 Mistake
FMOV [hsd]M, #imm does not allow zero immediate.
Use wzr, xzr instead.
commit 82b61283b2005f900101056e6df2a108258db602
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Fri Oct 8 12:17:29 2021 +0900
SH Kernel Unused Eigher
commit 1749dfa493054abd2e4ddba7cb21278d337e4f74
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Fri Oct 8 12:11:53 2021 +0900
Arm SVE C/ZGEMM Support *beta==0
commit 4b648e47daad256ab8ab698173a97f71ab9f75eb
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Wed Sep 22 16:42:09 2021 +0900
Arm SVE Config armsve Use ZGEMM/CGEMM
commit f76ea905e216cf640975e6319c6d2f54aeafed2e
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Tue Sep 21 20:38:44 2021 +0900
Arm SVE: Update Perf. Graph
Pic. size seems a bit different from upstream.
Generated w/ MATLAB. Open to any change.
commit 66a018e6ad00d9e8967b67e1aa3e23b20a7efdfe
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Mon Sep 20 00:16:11 2021 +0900
Arm SVE CGEMM 2Vx10 Unindex Process Alpha=1.0
commit 9e1e781cb59f8fadb2a10a02376d3feac17ce38d
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sun Sep 19 23:30:42 2021 +0900
Arm SVE ZGEMM 2Vx10 Unindex Process Alpha=1.0
commit f7c6c2b119423e7ba7a24ae2156790e076071cba
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Sep 16 01:47:42 2021 +0900
A64FX Config Use ZGEMM/CGEMM
commit e4cabb977d038688688aca39b366f98f9c36b7eb
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Sep 16 01:34:26 2021 +0900
Arm SVE Typo Fix ZGEMM/CGEMM C Prefetch Reg
commit b677e0d61b23f26d9536e5c363fd6bbab6ee1540
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Sep 16 01:18:54 2021 +0900
Arm SVE Add SGEMM 2Vx10 Unindexed
commit 3f68e8309f2c5b31e25c0964395a180a80014d36
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Sep 16 01:00:54 2021 +0900
Arm SVE ZGEMM Support Gather Load / Scatt. St.
commit c19db2ff826e2ea6ac54569e8aa37e91bdf7cabe
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Wed Sep 15 23:39:53 2021 +0900
Arm SVE Add ZGEMM 2Vx10 Unindexed
commit e13abde30b9e0e381c730c496e74bc7ae062a674
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Wed Sep 15 04:19:45 2021 +0900
Arm SVE Add ZGEMM 2Vx7 Unindexed
commit 49b9d7998eb86f340ae7b26af3e5a135d6a8feee
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Tue Sep 14 04:02:47 2021 +0900
Arm SVE Add ZGEMM 2Vx8 Unindexed
commit 4277fec0d0293400497ae8bcfc32be5e62319ae9
Merge: 2329d990 f44149f7
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Oct 7 13:47:22 2021 -0500
Merge pull request #533 from xrq-phys/arm64-hi-bw
ARMv8 PACKM and GEMMSUP Kernels + Apple Firestorm Subconfig
commit 2329d99016fe1aeb86da4552295f497543cea311
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Oct 7 12:37:58 2021 -0500
Update Travis CI badge
[ci skip]
commit f44149f787ae3d4b53d9c4d8e6f23b2818b7770d
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Fri Oct 8 02:35:58 2021 +0900
Armv8 Trash New Bulk Kernels
- They didn't bring much improvement.
- Can't register row-preferential and column-preferential ukrs at the same time.
Will break 1m.
commit 70b52cadc5ef4c16431e1876b407019e6286614e
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Oct 7 12:34:35 2021 -0500
Enable testing 1m in `make check`.
commit 2604f4071300d109f28c8438be845aeaf3ec44e4
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Oct 7 02:39:00 2021 +0900
Config ArmSVE Unregister 12xk. Move 12xk to Old
commit 1e3200326be9109eb0f8c7b9e4f952e45700cbba
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Oct 7 02:37:14 2021 +0900
Revert __has_include(). Distinguish w/ BLIS_FAMILY_**
commit a4066f278a5c06f73b16ded25f115ca4b7728ecb
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Oct 7 02:26:05 2021 +0900
Register firestorm into arm64 Metaconfig
commit d7a3372247c37568d142110a1537632b34b8f2ff
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Oct 7 02:25:14 2021 +0900
Armv8 DGEMMSUP Fix Edge 6x4 Switch Case Typo
commit 2920dde5ac52e09f84aa42990aab8340421522ce
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Oct 7 02:01:45 2021 +0900
Armv8 DGEMMSUP Fix 8x4m Store Inst. Typo
commit 14b13583f1802c002e195b3b48874b3ebadbeb20
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Oct 6 10:22:34 2021 -0500
Add test for Apple M1 (firestorm)
This test will run on Linux, but all the kernels should run just fine. This does not test autodetection but then none of the other ARM tests do either.
commit a024715065532400da6257b8b3124ca5aecda405
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Oct 7 00:15:54 2021 +0900
Firestorm CPUID Dispatcher
Commenting out <sys/sysctl.h> due to what is possibly an Xcode bug.
commit b9da6d55fec447d05c8b67f34ce83617123d8357
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Wed Oct 6 12:25:54 2021 +0900
Armv8 GEMMSUP Edge Cases Require Signed Ints
Fix a bug in bli_gemmsup_rd_armv8a_asm_d6x8m.c.
For safety upon similar strategies in the future,
change all [mn]_[iter/left] into signed ints.
commit 34919de3df5dda7a06fc09dcec12ca46dc8b26f4
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Oct 2 18:48:50 2021 -0500
Make error checking level a thread-local variable.
Previously, this was a global variable. Setting the value was synchronized via a mutex but reading the value was not. Of course, these accesses are almost certainly atomic, but there is still the possibility of one thread attempting to set the value and then reading the value set by another thread. For correct operation under user threading (e.g. pthreads), this should probably be thread-local with no mutex.
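
A minimal sketch of the thread-local pattern, assuming a C11 compiler. The type and function names are hypothetical and are not the BLIS error-checking API. Each thread gets its own copy of the variable, so no mutex is needed and one thread can never observe a level set by another:

```c
/* Hypothetical names for illustration only. */
typedef enum { CHECK_NONE = 0, CHECK_FULL = 1 } check_level_t;

/* One independent copy per thread, initialized to CHECK_FULL in each. */
static _Thread_local check_level_t tl_check_level = CHECK_FULL;

check_level_t check_level_get(void)
{
    return tl_check_level;
}

void check_level_set(check_level_t lvl)
{
    tl_check_level = lvl;
}
```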
commit c3024993c3d50236fad112822215f066496c5831
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Oct 5 15:20:27 2021 -0500
Fix data race in testsuite.
commit 353a0d82572f26e78102cee25693130ce6e0ea5b
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Oct 5 14:24:17 2021 -0500
Update .appveyor.yml
[ci skip]
commit 4bfadf9b561d4ebe0bbaf8b6d332f07ff531d618
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Wed Oct 6 01:51:26 2021 +0900
Firestorm Block Size Fixes
commit 40baf83f0ea2749199b93b5a8ac45c01794b008c
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Wed Oct 6 01:00:52 2021 +0900
Armv8 Handle *beta == 0 for GEMMSUP ??r Case.
commit 079fbd42ce8cf7ea67a939b0f80f488de5821319
Merge: f5c03e9f 9905f443
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Oct 4 17:21:48 2021 -0500
Merge branch 'master' into arm64-hi-bw
commit 9905f44347eea4c57ef4927b81f1c63e76a92739
Merge: 6d3036e3 64a421f6
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Oct 4 15:58:59 2021 -0500
Merge pull request #553 from flame/rpath-fix
Add an option to use an @rpath-dependent install_name on macOS
commit 6d3036e31d8a2c1acbc1260489eeb8f535a8f97a
Merge: 53377fcc eaa554aa
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Oct 4 15:58:43 2021 -0500
Merge pull request #545 from hominhquan/clean_error
bli_error: more cleanup on the error strings array
commit 53377fcca91e595787b38e2a47780ac0c35a7e7c
Merge: d0a0b4b8 80c5366e
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Oct 4 15:45:53 2021 -0500
Merge pull request #554 from flame/armsve-cleanup
Move unused ARM SVE kernels to "old" directory.
commit 80c5366e4a9b8b72d97fba1eab89bab8989c44f4
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Oct 4 15:40:28 2021 -0500
Move unused ARM SVE kernels to "old" directory.
commit 64a421f6983ab5bc0b55df30a2ddcfff5bfd73be
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Oct 4 13:40:43 2021 -0500
Add an option to control whether or not to use @rpath.
Adds `--enable-rpath/--disable-rpath` (default disabled) to use an install_name starting with @rpath/. Otherwise, set the install_name to the absolute path of the installed library, which was the previous behavior.
commit c4a31683dd6f4da3065d86c11dd998da5192740a
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Oct 4 13:27:10 2021 -0500
Fix $ORIGIN usage on linux.
commit d0a0b4b841fce56b7b2d3c03c5d93ad173ce2b97
Author: Dave Love <dave.love@manchester.ac.uk>
Date: Mon Oct 4 18:03:04 2021 +0000
Arm micro-architecture dispatch (#344)
Details:
- Reworked support for ARM hardware detection in bli_cpuid.c to parse
the result of a CPUID-like instruction.
- Added a64fx support to bli_gks.c.
- #include arm64 and arm32 family headers from bli_arch_config.h.
- Fix the ordering of the "armsve" and "a64fx" strings in the
config_name string array in bli_arch.c. The ordering did not match
the ordering of the corresponding arch_t values in bli_type_defs.h,
as it should have all along.
- Added clang support to make_defs.mk in arm64, cortexa53, cortexa57
subconfigs.
- Updated arm64 and arm32 families in config_registry.
- Updated docs/HardwareSupport.md to reflect added ARM support.
- Thanks to Dave Love, RuQing Xu, and Devin Matthews for their
contributions in this PR (#344).
commit 91408d161a2b80871463ffb6f34c455bdfb72492
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Oct 4 11:37:48 2021 -0500
Use @rpath-based install name on macOS and use relocatable RPATH entries for testsuite binaries.
- RPATH entries (and DYLD_LIBRARY_PATH) do nothing on macOS unless the install_name of the library starts with @rpath/. While the install_name can be set to the absolute install path, this makes the installation non-relocatable. When using @rpath in the install_name, install paths within the normal DYLD_LIBRARY_PATH work with no changes on the user side, but for install paths off the beaten track, users must specify an RPATH entry when linking (or modify DYLD_LIBRARY_PATH at runtime). Perhaps this could be made into a configure-time option.
- Having relocatable testsuite binaries is not necessarily a priority but it is easy to do with @executable_path (macOS) or $ORIGIN (linux/BSD).
commit f5c03e9fe808f9bd8a3e0c62786334e13c46b0fc
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sun Oct 3 16:51:51 2021 +0900
Armv8 Handle *beta == 0 for GEMMSUP ?rc Case.
commit abc648352c591e26ceee436bd3a45400115b70c5
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sun Oct 3 13:14:19 2021 +0900
Armv8 Fix 6x8 Row-Maj Ukr
- Fixed for 6x8 only, 4x4 & 4x8 pending;
- Installed to config firestorm as benchmark seems to show better perf:
Old:
blis_dgemm_ukr_c 6 8 320 36.87 2.43e-17 PASS
blis_dgemm_ukr_c 6 8 352 40.55 1.04e-17 PASS
blis_dgemm_ukr_c 6 8 384 44.24 5.68e-17 PASS
blis_dgemm_ukr_c 6 8 416 41.67 3.51e-17 PASS
blis_dgemm_ukr_c 6 8 448 34.41 2.94e-17 PASS
blis_dgemm_ukr_c 6 8 480 42.53 2.35e-17 PASS
New:
blis_dgemm_ukr_r 6 8 352 50.69 1.59e-17 PASS
blis_dgemm_ukr_r 6 8 384 49.15 5.55e-17 PASS
blis_dgemm_ukr_r 6 8 416 50.44 2.86e-17 PASS
blis_dgemm_ukr_r 6 8 448 46.92 3.12e-17 PASS
blis_dgemm_ukr_r 6 8 480 48.08 4.08e-17 PASS
commit 0a45bc0fbc7aee3876c315ed567fc37f19cdc57f
Merge: 5013a6cb 13dbd5b5
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Oct 2 18:59:43 2021 -0500
Merge pull request #552 from flame/armsve_beta_0
Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs.
commit 13dbd5b5d3dbf27e33ecf0e98d43c97019a6339d
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Oct 2 20:40:25 2021 +0000
Apply patch from @xrq-phys.
commit ae0eeeaf77c77892db17027cef10b95ec97c904f
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Sep 29 16:42:33 2021 -0500
Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs.
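The commit title does not state the motivation. The usual BLAS convention is that beta == 0 means C is overwritten rather than scaled, so stale contents of C (including NaN or Inf) must not leak into the result. The following scalar sketch illustrates that convention; demo_update_c and demo_update_c_naive are hypothetical helpers, not the microkernel code.

```c
#include <assert.h>

/* c := beta * c + ab over n entries, with explicit beta == 0 handling. */
void demo_update_c(double *c, const double *ab, int n, double beta)
{
    assert(n >= 0);
    if (beta == 0.0) {
        /* Overwrite: c is never read, so garbage in c cannot propagate. */
        for (int i = 0; i < n; i++) c[i] = ab[i];
    } else {
        for (int i = 0; i < n; i++) c[i] = beta * c[i] + ab[i];
    }
}

/* Naive version: always scales, so 0 * NaN = NaN survives in the output. */
void demo_update_c_naive(double *c, const double *ab, int n, double beta)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) c[i] = beta * c[i] + ab[i];
}
```

If c starts out holding a NaN, the first function returns the plain product while the naive one returns NaN in that entry.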
commit 5013a6cb7110746c417da96e4a1308ef681b0b88
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 29 10:38:50 2021 -0500
More edits and fixes to docs/FAQ.md.
commit b36fb0fbc5fda13d9a52cc64953341d3d53067ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 28 18:47:45 2021 -0500
Fixed newly broken link to CREDITS in FAQ.md.
commit 3442d4002b3bfffd8848f72103b30691df2b19b1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 28 18:43:23 2021 -0500
More minor fixes to FAQ.md and Sandboxes.md.
commit 89aaf00650d6cc19b83af2aea6c8d04ddd3769cb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 28 18:34:33 2021 -0500
Updates to FAQ.md, Sandboxes.md, and README.md.
Details:
- Updated FAQ.md to include two new questions, reordered an existing
question, and also removed an outdated and redundant question about
BLIS vs. AMD BLIS.
- Updated Sandboxes.md to use 'gemmlike' as its main example, along with
other smaller details.
- Added ARM as a funder to README.md.
commit c52c43115ec2264fda9380c48d9e6bb1e1ea2ead
Merge: 1fc23d21 1f527a93
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Sep 26 15:56:54 2021 -0500
Merge branch 'dev'
commit 1fc23d2141189c7b583a5bff2cffd87fd5261444
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 21 14:54:20 2021 -0500
Safelist 'master', 'dev', 'amd' branches.
Details:
- Modified .travis.yml so that only commits to 'master', 'dev', and
'amd' branches get built by Travis CI. Thanks to Devin Matthews for
helping to track down the syntax for this change.
commit 1f527a93b996093e06ef7a8e94fb47ee7e690ce0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 20 17:56:36 2021 -0500
Re-enable and fix fb93d24.
Details:
- Re-enabled the changes made in fb93d24.
- Defined BLIS_ENABLE_SYSTEM in bli_arch.c, bli_cpuid.c, and bli_env.c,
all of which needed the definition (in addition to config_detect.c) in
order for the configure-time hardware detection binary to be compiled
properly. Thanks to Minh Quan Ho for helping identify these additional
files as needing to be updated.
- Added additional comments to all four source files, most notably to
prompt the reader to remember to update all of the files when updating
any of the files. Also made the cpp code in each of the files as
consistent/similar as possible.
- Refer to issue #532 and PR #546 for more history.
commit 7b39c1492067de941f81b49a3b6c1583290336fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 20 16:13:50 2021 -0500
Reverted fb93d24.
Details:
- The latest changes in fb93d24 are still causing problems. Reverting
and preparing to move them to a branch.
commit fb93d242a4fef4694ce2680436da23087bbdd5fe
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 20 15:42:08 2021 -0500
Re-enable and fix 8e0c425 (BLIS_ENABLE_SYSTEM).
Details:
- Re-enable the changes originally made in 8e0c425 but quickly reverted
in 2be78fc.
- Moved the #include of bli_config.h so that it occurs before the
#include of bli_system.h. This allows the #define BLIS_ENABLE_SYSTEM
or #define BLIS_DISABLE_SYSTEM in bli_config.h to be processed by the
time it is needed in bli_system.h. This change should have been
in the original 8e0c425, but was accidentally omitted. Thanks to Minh
Quan Ho for catching this.
- Add #define BLIS_ENABLE_SYSTEM to config_detect.c so that the proper
cpp conditional branch executes in bli_system.h when compiling the
hardware detection binary. The changes made in 8e0c425 were an attempt
to support the definition of BLIS_OS_NONE when configuring with
--disable-system (in issue #532). That commit failed because, aside
from the required but omitted header reordering (second bullet above),
AppVeyor was unable to compile the hardware detection binary as a
result of missing Windows headers. This commit, which builds on PR
#546, should help fix that issue. Thanks to Minh Quan Ho for his
assistance and patience on this matter.
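The header-ordering requirement in the second bullet can be sketched in a single file. The DEMO_ macros are hypothetical stand-ins for the definitions in bli_config.h and bli_system.h; the point is only that the configuration macro must already be defined when the OS-detection conditional is evaluated.

```c
/* Stands in for bli_config.h, which configure writes. */
#define DEMO_DISABLE_SYSTEM

/* Stands in for bli_system.h. This block must come after the block
   above; if it came first, DEMO_DISABLE_SYSTEM would not yet be
   defined and one of the OS-detection branches would be taken. */
#if defined(DEMO_DISABLE_SYSTEM)
  #define DEMO_OS_NAME "none"
#elif defined(_WIN32)
  #define DEMO_OS_NAME "windows"
#else
  #define DEMO_OS_NAME "posix-like"
#endif

/* Reports which branch the preprocessor selected. */
const char *demo_os_name(void) { return DEMO_OS_NAME; }
```

Swapping the two blocks silently changes the selected branch, which is why the order of the #include directives matters.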
commit eaa554aa52b879d181fdc87ba0bfad3ab6131517
Author: Minh Quan HO <minh-quan.ho@kalray.eu>
Date: Wed Sep 15 15:39:36 2021 +0200
bli_error: more cleanup on the error strings array
- There was redundancy between the macro BLIS_MAX_NUM_ERR_MSGS (=200) and
the enum BLIS_ERROR_CODE_MAX (-170), while they both mean the same thing:
the maximal number of error codes/messages.
- The previous initialization of error messages at compile time ignored that
the 'bli_error_string' array still occupies useless memory due to 2D char[][]
declaration. Instead, it should be just an array of pointers, pointing at
strings in .rodata section.
- This commit does the two modifications:
* retired macros BLIS_MAX_NUM_ERR_MSGS and BLIS_MAX_ERR_MSG_LENGTH everywhere
* switch bli_error_string from char[][] to char *[] to reduce its footprint
from 40KB (200*200) to 1.3KB (170*sizeof(char*)).
(No problem to use the enum BLIS_ERROR_CODE_MAX at compile-time,
since compiler is smart enough to determine its value is 170.)
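The footprint difference can be checked with sizeof. This is a sketch with hypothetical names (demo_msgs_2d, demo_msgs_ptr, and the DEMO_ constants), using the sizes quoted above; it indexes by non-negative codes for simplicity, which may differ from how BLIS maps its error codes.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_MAX_MSGS = 200, DEMO_MAX_LEN = 200, DEMO_CODE_MAX = 170 };

/* Old layout: every slot reserves DEMO_MAX_LEN bytes, used or not. */
char demo_msgs_2d[DEMO_MAX_MSGS][DEMO_MAX_LEN] = { "success", "failure" };

/* New layout: one pointer per code; the text lives in read-only data. */
const char *demo_msgs_ptr[DEMO_CODE_MAX] = { "success", "failure" };

size_t demo_footprint_2d(void)  { return sizeof(demo_msgs_2d); }
size_t demo_footprint_ptr(void) { return sizeof(demo_msgs_ptr); }

/* Look up a message; slots without text fall back to "unknown". */
const char *demo_error_string(int code)
{
    assert(code >= 0 && code < DEMO_CODE_MAX);
    return demo_msgs_ptr[code] ? demo_msgs_ptr[code] : "unknown";
}
```

The 2D array occupies 200*200 bytes regardless of message length, while the pointer array occupies 170*sizeof(char*) bytes.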
commit 52f29f739dbbb878c4cde36dbe26b82847acd4e9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 17 08:38:29 2021 -0500
Removed last vestige of #define BLIS_NUM_ARCHS.
Details:
- Removed the commented-out #define BLIS_NUM_ARCHS in bli_type_defs.h
and its associated (now outdated) comments. BLIS_NUM_ARCHS has been
part of the arch_t enum for some time now, and so this change is
mostly about removing any opportunity for confusion for people who
may be reading the code. Thanks to Minh Quan Ho for leading me to
this cleanup.
commit 849aae09f4fbf8d7abf11f4df1471f1d057e874b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 16 14:47:45 2021 -0500
Added new packm var3 to 'gemmlike'.
Details:
- Defined a new packm variant for the 'gemmlike' sandbox. This new
variant (bls_l3_packm_var3.c) parallelizes the packing operation over
the k dimension rather than the m or n dimensions. Note that the
gemmlike implementation still uses var1 by default, and use of the new
code would require changing bls_l3_packm_a.c and/or bls_l3_packm_b.c
so that var3 is called instead. Thanks to Jeff Diamond for proposing
this (perhaps NUMA-friendly) solution.
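A rough sketch of what partitioning a pack operation over the k dimension can look like. It is not taken from bls_l3_packm_var3.c; all names are hypothetical, the matrix is assumed column-major, and the "threads" are simulated by a sequential loop so the example stays self-contained.

```c
#include <assert.h>

/* Split [0,k) into n_threads nearly equal contiguous ranges. */
void demo_k_range(int k, int n_threads, int tid, int *start, int *end)
{
    assert(n_threads > 0 && tid >= 0 && tid < n_threads);
    int chunk = k / n_threads;
    int rem   = k % n_threads;
    *start = tid * chunk + (tid < rem ? tid : rem);
    *end   = *start + chunk + (tid < rem ? 1 : 0);
}

/* Copy columns [start,end) of a column-major m x k matrix a (leading
   dimension lda) into a packed panel p with column stride m. */
void demo_pack_k_slice(int m, const double *a, int lda, double *p,
                       int start, int end)
{
    for (int j = start; j < end; j++)
        for (int i = 0; i < m; i++)
            p[i + j * m] = a[i + j * lda];
}

/* Sequential stand-in for the parallel region: each iteration does the
   work that thread `tid` would do on its own slice of k. */
void demo_pack_over_k(int m, int k, const double *a, int lda, double *p,
                      int n_threads)
{
    for (int tid = 0; tid < n_threads; tid++) {
        int s, e;
        demo_k_range(k, n_threads, tid, &s, &e);
        demo_pack_k_slice(m, a, lda, p, s, e);
    }
}
```

Each slice touches a disjoint set of columns in both the source and the packed buffer, so the slices can be processed independently.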
commit b6f71fd378b7cd0cdc5c780e0b8c975a7abde998
Merge: 9293a68e e3dc1954
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Sep 16 12:24:33 2021 -0500
Merge pull request #544 from flame/haswell-gemmsup-fpe
Fix more copy-paste errors in the haswell gemmsup code.
commit e3dc1954ffb5eee2a8b41fce85ba589f75770eea
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Sep 16 10:59:37 2021 -0500
Fix problem where uninitialized registers are included in vhaddpd in the Mx1 gemmsup kernels for haswell.
The fix is to use the same (valid) source register twice in the horizontal addition.
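The idea behind the fix can be pictured with a scalar model of the 256-bit vhaddpd lane layout. This is not the kernel's assembly; demo_hadd_pd and demo_reduce_single are hypothetical names. Why garbage lanes are harmful is an inference (for example, spurious floating-point exceptions, as the branch name haswell-gemmsup-fpe hints), not something the commit states.

```c
/* Scalar model of 256-bit vhaddpd:
   out = { a0+a1, b0+b1, a2+a3, b2+b3 }. */
void demo_hadd_pd(const double a[4], const double b[4], double out[4])
{
    out[0] = a[0] + a[1];
    out[1] = b[0] + b[1];
    out[2] = a[2] + a[3];
    out[3] = b[2] + b[3];
}

/* Mx1 case: only one accumulator holds real data, so it is passed as
   both sources. Every output lane is then computed from initialized
   values, and lanes 0 and 2 together hold the full sum. */
double demo_reduce_single(const double acc[4])
{
    double t[4];
    demo_hadd_pd(acc, acc, t);
    return t[0] + t[2];
}
```

Passing an unrelated, uninitialized register as the second source would leave lanes 1 and 3 holding sums of garbage even though they are never used in the result.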
commit 5191c43faccf45975f577c60b9089abee25722c9
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Sep 16 10:16:17 2021 -0500
Fix more copy-paste errors in the haswell gemmsup code.
Fixes #486.
commit 30c29b256ef13f0141ca9e9169cbdc7a45ce3a61
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Sep 16 05:01:03 2021 +0900
Arm SVE Exclude SVE-Intrinsic Kernels for GCC 8-9
Affected configs: a64fx.
commit bffa85be59dece8e756b9444e762f18892c06ee1
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Sep 16 04:31:45 2021 +0900
Arm SVE: Correct PACKM Ker Name: Intrinsic Kers
SVE-Intrinsic-based kernels ought not to use asm in their names.
commit 9293a68eb6557a9ea43a846435908c3d52d4218b
Merge: ade10f42 98ce6e8b
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Sep 10 14:13:29 2021 -0500
Merge pull request #534 from flame/cxx_test
Add test to Travis using C++ compiler to make sure blis.h is C++-compatible
commit 98ce6e8bc916e952510872caa60d818d62a31e69
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Sep 10 14:12:13 2021 -0500
Do a fast test on OSX. [ci skip]
commit c76fcad0c2836e7140b6bef3942e0a632a5f2cda
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Sep 10 13:57:02 2021 -0500
Fix AArch64 tests and consolidate some other tests.
commit e486d666ffefee790d5e39895222b575886ac1ea
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Sep 10 13:50:16 2021 -0500
Use C++ cross-compiler for ARM tests.
commit fbb3560cb8e2aeab205c47c2b096d4fa306d93db
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Sep 10 13:38:27 2021 -0500
Attempt to fix cxx-test for OOT builds.
commit 9c0064f3f67d59263c62d57ae19605562bb87cc2
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Sep 10 10:39:04 2021 -0500
Fix config_name in bli_arch.c
commit ade10f427835d5274411cafc9618ac12966eb1e7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 27 12:47:12 2021 -0500
Updated travis-ci.org link in README.md to .com.
commit 2be78fc97777148c83d20b8509e38aa1fc1b4540
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 27 12:17:26 2021 -0500
Disabled (at least temporarily) commit 8e0c425.
Details:
- Reverted changes in 8e0c425 due to AppVeyor build failures that we do
not yet understand.
commit 820f11a4694aee5f234e24277aecca40885ae9d4
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Fri Aug 27 13:40:26 2021 +0900
Arm Whole GEMMSUP Call Route is Asm/Int Optimized
- `ref2` call in `bli_gemmsup_rv_armv8a_asm_d6x8m.c` is commented out.
- `bli_gemmsup_rv_armv8a_asm_d4x8m.c` contains a tail `ref2` call but
it's not called by any upper routine.
commit 8e0c4255de52a0a5cffecbebf6314aa52120ebe4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 26 15:29:18 2021 -0500
Define BLIS_OS_NONE when using --disable-system.
Details:
- Modified bli_system.h so that the cpp macro BLIS_OS_NONE is defined
when BLIS_DISABLE_SYSTEM is defined. Otherwise, the previous OS-
detecting macro conditionals are considered. This change is to
accommodate a solution to a cross-compilation issue described in
#532.
commit d6eb70fbc382ad7732dedb4afa01cf9f53e3e027
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 26 13:12:39 2021 -0500
Updated stale calls to malloc_intl() in gemmlike.
Details:
- Updated two out-of-date calls to bli_malloc_intl() within the gemmlike
sandbox. These calls to malloc_intl(), which resided in
bls_l3_decor_pthreads.c, were missing the err_t argument that the
function uses to report errors. Thanks to Jeff Diamond for helping
isolate this issue.
commit 2f7325b2b770a15ff8aaaecc087b22238f0c67b7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 23 15:04:05 2021 -0500
Blacklist clang10/gcc9 and older for 'armsve'.
Details:
- Prohibit use of clang 10.x and older or gcc 9.x and older for the
'armsve' subconfiguration. Addresses issue #535.
commit 7e2951e61fda1c325d6a76ca9956253482d84924
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Mon Aug 23 17:06:44 2021 +0900
Arm: DGEMMSUP `Macro' Edge Cases Stop Calling Ref
Ref cannot handle panel strides (packed cases) thus cannot be called
from the beginning of `gemmsup` (i.e. cannot be dispatch target of
gemmsup to other sizes.)
commit 4fd82b0e9348553d83e258bd4969e49a81f8fcf0
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Mon Aug 23 05:18:32 2021 +0900
Header Typo
commit 35409ebe67557c0e7cf5ced138c8166c9c1c909f
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Mon Aug 23 04:51:47 2021 +0900
Arm: DGEMMSUP ??r(rv) Invoke Edge Size
Plus some fix at edges.
TODO: Should ensure that no ref kernel appears at the beginning of gemmsup
kernels, as ref does not recognise panel stride.
commit a361492c24fdd919ee037763fc6523e8d7d2967a
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Mon Aug 23 01:13:39 2021 +0900
Arm: DGEMMSUP ?rc(rd) Invoke Edge Size
commit eaea67401c2ab31f2e51eede59725f64c1a21785
Merge: 5fc65cdd e320ec6d
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Aug 21 16:09:31 2021 -0500
Merge branch 'master' into cxx_test
commit 5fc65cdd9e4134c5dcb16d21cd4a79ff426ca9f3
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Aug 21 15:59:27 2021 -0500
Add test to Travis using C++ compiler to make sure blis.h is C++-compatible.
commit e320ec6d5cd44e03cb2e2faa1d7625e84f76d668
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 20 17:15:20 2021 -0500
Moved lang defs from _macro_def.h to _lang_defs.h.
Details:
- Moved miscellaneous language-related definitions, including defs
related to the handling of the 'restrict' keyword, from the top half
of bli_macro_defs.h into a new file, bli_lang_defs.h, which is now
#included immediately after "bli_system.h" in blis.h. This change is
an attempt to fix a report of recent breakage of C++ compilers due
to the recent introduction of 'restrict' in bli_type_defs.h (which
previously was being included *before* bli_macro_defs.h and its
restrict handling therein). Thanks to Ivan Korostelev for reporting
this issue in #527.
- CREDITS file update.
commit e6799b26a6ecf1e80661a77d857d1c9e9adf50dc
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sat Aug 21 02:39:38 2021 +0900
Arm: Implement GEMMSUP Fallback Method
bli_dgemmsup_rv_armv8a_int_6x4mn
commit 7d5903d8d7570090eb37c592094424d1c64805d1
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sat Aug 21 01:55:50 2021 +0900
Arm64 Fix: Support Alpha/Beta in GEMMSUP Intrin
Forgot to support `alpha`/`beta` in gemmsup_armv8a_int.
commit 3b275f810b2479eb5d6cf2296e97a658cf1bb769
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 19 16:06:46 2021 -0500
Minor tweaks to gemmlike sandbox.
Details:
- In the gemmlike sandbox, changed the loop index variable of inner
loop of packm_cxk() from 'd' to 'i' (and likewise for the
corresponding inlined code within packm_var2()).
- Pack matrices A and B using packm_var1() instead of packm_var2().
commit 3eccfd456e7e84052c9a429dcde1183a7ecfaa48
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 19 13:22:10 2021 -0500
Added local _check() code to gemmlike sandbox.
Details:
- Added code to the gemmlike sandbox that handles parameter checking.
Previously, the gemmlike implementation called bli_gemm_check(), which
resides within the BLIS framework proper. Certain modifications that a
user may wish to perform on the sandbox, such as adding a new matrix
or vector operand, would have required additional checks, and so these
changes make it easier for such a person to implement those checks for
their custom gemm-like operation.
commit 7144230cdb0653b70035ddd91f7f41e06ad8d011
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 18 13:25:39 2021 -0500
README.md citation updates (e.g. BLIS7 bibtex).
commit 4a955e939044cfd2048cf9f3e33024e3ad1fbe00
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 16 13:49:27 2021 -0500
Tweaks to gemmlike to facilitate 3rd party mods.
Details:
- Changed the implementation in the 'gemmlike' sandbox to more easily
allow others to provide custom implementations of packm. These changes
include:
- Calling a local version of packm_cxk() that can be modified. This
version of packm_cxk() uses inlined loops in packm_cxk() rather
than querying the context for packm kernels (or even using scal2m).
- Providing two variants of packm, one of which calls the
aforementioned packm_cxk(), the other of which inlines the contents
of packm_cxk() into the variant itself, making it self-contained.
To switch from one to the other, simply change which function gets
called within bls_packm_a() and bls_packm_b().
- Simplified and cleaned up some variant names in both variants of
packm, relative to their parent code.
commit 2c0b4150e40c83ea814f69ca766da74c19ed0a58
Merge: c99fae50 4b8ed99d
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Aug 14 18:41:35 2021 -0500
Merge pull request #527 from flame/obj_t_makeover
Implement proposed new function pointer fields for obj_t.
commit 4b8ed99d926876fbf54c15468feae4637268eb6b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 13 15:31:10 2021 -0500
Whitespace tweaks.
commit c99fae50ac3de0b5380a085aeebebfe67a645407
Merge: e6d68bc4 4f70eb79
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Aug 13 14:48:00 2021 -0500
Merge pull request #530 from flame/fix_clang_warnings
Clean up some warnings that show up on clang/OSX.
commit e6d68bc4fd0981bea90d7f045779cacfe53f6ae8
Merge: 20a1c401 ec06b6a5
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Aug 13 14:47:46 2021 -0500
Merge pull request #529 from flame/fix_make_check_dependencies
Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects.
commit 1772db029e10e0075b5a59d3fb098487b1ad542a
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Aug 13 14:46:35 2021 -0500
Add row- and column-strides for A/B in obj_ukr_fn_t.
commit 4f70eb7913ad3ded193870361b6da62b20ec3823
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Aug 13 11:12:43 2021 -0500
Clean up some warnings that show up on clang/OSX.
commit 3cddce1e2a021be6064b90af30022b99cbfea986
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Aug 12 22:32:34 2021 -0500
Remove schema field on obj_t (redundant) and add new API functions.
commit ec06b6a503a203fa0cdb23273af3c0e3afeae7fa
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Aug 12 19:27:31 2021 -0500
Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects.
This fixes a bug where "make -j<N> check" may fail after a change to one or more header files, or where testsuite code doesn't get properly recompiled after internal changes.
commit 20a1c4014c999063e6bc1cfa605b152454c5cbf4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 12 14:44:04 2021 -0500
Disabled sanity check in bli_pool_finalize().
Details:
- Disabled a sanity check in bli_pool_finalize() that was meant to alert
the user if a pool_t was being finalized while some blocks were still
checked out. However, this is exactly the situation that might happen
when a pool_t is re-initialized for a larger blocksize, and currently
bli_pool_reinit() is implemented as _finalize() followed by _init().
So, this sanity check is not universally appropriate. Thanks to
AMD-India for reporting this issue.
commit e366665cd2b5ae8d7683f5ba2de345df0a41096f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 12 14:06:53 2021 -0500
Fixed stale API calls to membrk API in gemmlike.
Details:
- Updated stale calls to the bli_membrk API within the 'gemmlike'
sandbox. This API is now called bli_pba (packed block allocator).
Ideally, this forgotten update would have been included as part of
21911d6, which merged the branch that introduced the membrk->pba
changes into 'master'.
- Comment updates.
commit e38ca28689f31c5e5bd2347704dc33042e5ea176
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Fri Aug 13 03:21:19 2021 +0900
Added Apple Firestorm (A14/M1) Subconfig
- Use the same bulk kernel as Cortex-A53 / ThunderX2;
- Larger block size;
- Use gemmsup kernels for double precision.
commit 3df0e9b653fbb1293cad93010273eea579e753d9
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sat Jul 17 04:21:53 2021 +0900
Arm64 8x4 Kernel Use Less Regs
commit 4e7e225057a05b9722ce65ddf75a9c31af9fbf36
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Wed Jun 9 15:46:36 2021 +0900
Armv8-A Supplementary GEMMSUP Sizes for RD
commit c792d506ba09530395c439051727631fd164f59a
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sat Jun 5 04:20:24 2021 +0900
Armv8-A Fix GEMMSUP-RD Kernels on GNU Asm
Suffixed NEON opcode is not supported by GNU assembler
commit ce4473520975c2c8790c82c65a69d75f8ad758ea
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sat Jun 5 04:08:14 2021 +0900
Armv8-A Adjust Types for PACKM Kernels
GCC does not have full NEON intrinsics support.
commit 8a32d19af85b61af92fcab1c316fb3be1a8d42ce
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sat Jun 5 03:31:30 2021 +0900
Armv8-A GEMMSUP-RD 6x8m
Armv8-A now has a complete set of GEMMSUP kernels.
commit afd0fa6ad1889ed073f781c8aa8635f99e76b601
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sat Jun 5 01:19:01 2021 +0900
Armv8-A GEMMSUP-RD 6x8n
commit 3c5f7405148ab142dee565d00da331d95a7a07b9
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Fri Jun 4 21:50:51 2021 +0900
Armv8-A s/d Packing Kernels Fix Typo
For GCC.
commit 49b05df7929ec3abc0d27b475d2d406116fe2682
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Fri Jun 4 18:04:59 2021 +0900
Armv8-A Introduced s/d Packing Kernels
Sizes according to the 2014 kernels.
commit c3faf93168c3371ff48a2d40d597bdb27021cad4
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Jun 3 23:09:05 2021 +0900
Armv8-A DGEMMSUP 6x8m Kernel
Recommended kernels set:
...
BLIS_RRR, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8m, TRUE,
BLIS_RCR, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8m, TRUE,
BLIS_RCC, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8n, TRUE,
BLIS_CRR, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8m, TRUE,
BLIS_CCR, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8n, TRUE,
BLIS_CCC, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8n, TRUE,
...
bli_blksz_init ( &blkszs[ BLIS_MR ], -1, 6, -1, -1,
-1, 8, -1, -1 );
bli_blksz_init_easy( &blkszs[ BLIS_NR ], -1, 8, -1, -1 );
...
commit 3efe707b5500954941061d4c2363d6ed41d17233
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Jun 3 17:20:57 2021 +0900
Armv8-A DGEMMSUP Adjustments
commit 8ed8f5e625de9b77a0f14883283effe79af01771
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Jun 3 16:37:37 2021 +0900
Armv8-A Add More DGEMMSUP
- Add 6x8 GEMMSUP.
- Adjust prefetching.
- Workaround for Clang's inability to handle reg clobbering.
- Subproduct 6x8 row-major GEMM <- incomplete.
commit a9ba79ea14de3b5a271e5970cb473d3c52e2fa5f
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Wed Jun 2 15:04:29 2021 +0900
Armv8-A Add GEMMSUP 4x8n Kernel
- Compile w/ both GCC & Clang.
- Edge cases use ref-kernels.
- Can give performance boost in some contexts.
commit df40efe8fbfd399d76c6000ec03791a9b76ffbdf
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Wed Jun 2 00:04:20 2021 +0900
Armv8-A Add Part of GEMMSUP 8x4m Kernel
- Compile w/ both GCC & Clang
- Only block part is implemented. Edge cases WIP
- Not Optimal kernel scheme. Should do 4x8 instead
commit 66399992881316514f64d68ec9eb60a87d53f674
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sat May 29 05:52:05 2021 +0900
Armv8A DGEMM 4x4 Kernel WIP. Slow
Quite slow.
commit a29c16394ccef02d29141c79b71fb408e20073e6
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sat May 29 04:58:45 2021 +0900
Armv8-A Add 8x4 Kernel WIP
Test result: a bit lower GFLOPS than 6x8.
commit 64a1f786d58001284aa4f7faf9fae17f0be7a018
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Aug 11 17:53:12 2021 -0500
Implement proposed new function pointer fields for obj_t.
The added fields:
1. `pack_t schema`: storing the pack schema on the object allows the macrokernel to act accordingly without side-channel information from the rntm_t and cntx_t. The pack schema and "pack_[ab]" fields could be removed from those structs.
2. `void* user_data`: this field can be used to store any sort of additional information provided by the user. The pointer is propagated to submatrix objects and copies, but is otherwise ignored by the framework and the default implementations of the following three fields. User-specified pack, kernel, or ukr functions can do whatever they want with the data, and the user is 100% responsible for allocating, assigning, and freeing this buffer.
3. `obj_pack_fn_t pack`: the function called when a matrix is packed. This function receives the expected arguments, as well as a mdim_t and mem_t* as memory must be allocated inside this function, and behavior may differ based on which matrix is being packed (i.e. transposition for B). This could also be achieved by passing a desired pack schema, but this would require additional information to travel down the control tree.
4. `obj_ker_fn_t ker`: the function called when we get to the "second loop", or the macro-kernel. Behavior may depend on the pack schemas of the input matrices. The default implementation would perform the inner two loops around the ukr, and then call either the default ukr or a user-supplied one (next field).
5. `obj_ukr_fn_t ukr`: the function called by the default macrokernel. This would replace the various current "virtual" microkernels, and could also be used to supply user-defined behavior. Users could supply both a custom kernel (above) and microkernel, although the user-specified kernel does **not** necessarily have to call the ukr function specified on the obj_t.
Note that no macros or functions for accessing these new fields have been defined yet. That is next once these are finalized. Addresses https://github.com/flame/blis/projects/1#card-62357687.
commit a32257eeab2e9946e71546a05a1847a39341ec6b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 5 16:23:02 2021 -0500
Fixed bli_init.c compile-time error on OSX clang.
Details:
- Fixed a compile-time error in bli_init.c when compiling with OSX's
clang. This error was introduced in 868b901, which introduced a
post-declaration struct assignment where the RHS was a struct
initialization expression (i.e. { ... }). This use of struct
initializer expressions apparently works with gcc despite it not
being strict C99. The fix included in this commit declares a temporary
variable for the purposes of being initialized to the desired value,
via the struct initializer, and then copies the temporary struct (via
'=' struct assignment) to the persistent struct. Thanks to Devin
Matthews for his help with this.
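The pattern described above can be sketched as follows. demo_ctl_t and its fields are hypothetical; the real code applies the same idea to the object changed in 868b901.

```c
#include <assert.h>

/* Hypothetical persistent control struct. */
typedef struct { int state; int count; } demo_ctl_t;

static demo_ctl_t demo_persistent = { 1, 7 };

/* Reset the persistent struct to its initial value. */
void demo_reset(void)
{
    /* demo_persistent = { 0, 0 };
       A bare brace list is only valid as an initializer in a
       declaration, not as the right-hand side of an assignment. */

    /* Initialize a temporary with the brace list, then copy it over
       with ordinary struct assignment. */
    const demo_ctl_t fresh = { 0, 0 };
    demo_persistent = fresh;
    assert(demo_persistent.state == 0);
}

int demo_state(void) { return demo_persistent.state; }
int demo_count(void) { return demo_persistent.count; }
```

Struct assignment with '=' copies every member, so the persistent object ends up identical to a freshly initialized one.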
commit c8728cfbd19ecde9d43af05829e00bcfe7d86eed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 5 15:17:09 2021 -0500
Fixed configure breakage on OSX clang.
Details:
- Accept either 'clang' or 'LLVM' in vendor string when grepping for
the version number (after determining that we're working with clang).
Thanks to Devin Matthews for this fix.
commit 868b90138e64c873c780d9df14150d2a370a7a42
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 4 18:31:01 2021 -0500
Fixed one-time use property of bli_init() (#525).
Details:
- Fixes a rather obvious bug that resulted in a segmentation fault
whenever the calling application tried to re-initialize BLIS after
its first init/finalize cycle. The bug resulted from the fact that
the bli_init.c APIs made no effort to allow bli_init() to be called
subsequent times at all due to it, and bli_finalize(), being
implemented in terms of pthread_once(). This has been fixed by
resetting the pthread_once_t control variable for initialization
at the end of bli_finalize_apis(), and by resetting the control
variable for finalization at the end of bli_init_apis(). Thanks to
@lschork2 for reporting this issue (#525), and to Minh Quan Ho and
Devin Matthews for suggesting the chosen solution.
- CREDITS file update.
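A minimal sketch of the re-arming scheme described above, with hypothetical demo_ names in place of the bli_init.c functions. Assigning a fresh PTHREAD_ONCE_INIT value to an already-used control variable is not something POSIX explicitly blesses, so treat this as an illustration of the idea rather than a portable guarantee.

```c
#include <pthread.h>

static pthread_once_t demo_once_init     = PTHREAD_ONCE_INIT;
static pthread_once_t demo_once_finalize = PTHREAD_ONCE_INIT;

/* Template used to reset a control variable by struct assignment. */
static const pthread_once_t demo_once_fresh = PTHREAD_ONCE_INIT;

static int demo_live       = 0;
static int demo_init_calls = 0;

/* Runs at most once per init/finalize cycle. */
static void demo_init_apis(void)
{
    demo_live = 1;
    demo_init_calls++;
    /* Re-arm finalization so the next demo_finalize() runs again. */
    demo_once_finalize = demo_once_fresh;
}

static void demo_finalize_apis(void)
{
    demo_live = 0;
    /* Re-arm initialization so the library can be initialized again. */
    demo_once_init = demo_once_fresh;
}

void demo_init(void)     { pthread_once(&demo_once_init, demo_init_apis); }
void demo_finalize(void) { pthread_once(&demo_once_finalize, demo_finalize_apis); }
```

Repeated calls to demo_init() within one cycle still run the initializer only once, while an init/finalize/init sequence runs it twice.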
commit 8dba1e752c6846a85dea50907135bbc5cbc54ee5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 27 12:38:24 2021 -0500
CREDITS file update.
commit cc9206df667b7c710b57b190b8ad351176de53b8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 16 15:48:37 2021 -0500
Added Graviton2 Neoverse N1 performance results.
Details:
- Added single-threaded and multithreaded performance results to
docs/Performance.md. These results were gathered on a Graviton2
Neoverse N1 server. Special thanks to Nicholai Tukanov for
collecting these results via the Arm-HPC/AWS hackathon.
- Corrected what was supposed to be a temporary tweak to the legend
labels in test/3/octave/plot_l3_perf.m.
commit fab5c86d68137b59800715efb69214c0a7e458a7
Merge: 84f9dcd4 d073fc9a
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jul 13 16:46:21 2021 -0500
Merge pull request #516 from nicholaiTukanov/p10-sandbox-rework
P10 sandbox rework
commit 84f9dcd449fa7a4cf4087fca8ec4ca0d10e9b801
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jul 13 16:45:44 2021 -0500
Remove unnecessary windows/zen2 directory.
commit 21911d6ed3438ca4ba942d05851ba5d7e9835586
Merge: 17729cf4 689fa0f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 9 18:10:46 2021 -0500
Merge branch 'dev'
commit 17729cf449919d1db9777cea5b65d2efc77e2692
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Jul 9 14:59:48 2021 -0500
Add vzeroupper to Haswell microkernels. (#524)
Details:
- Added vzeroupper instruction to the end of all 'gemm' and 'gemmtrsm'
microkernels so as to avoid a performance penalty when mixing AVX
and SSE instructions. These vzeroupper instructions were once part
of the haswell kernels, but were inadvertently removed during a source
code shuffle some time ago when we were managing duplicate 'haswell'
and 'zen' kernel sets. Thanks to Devin Matthews for tracking this down
and re-inserting the missing instructions.
commit c9a7f59aa84daa54d8f8c771f1f1ef2bd8730da2
Merge: 75f03907 9a8e649c
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Jul 8 14:00:38 2021 -0500
Merge pull request #522 from flame/windows-avx512
Fix Win64 AVX512 bug.
commit 9a8e649c5ac89eba951bbee7136ca28aeb24d731
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Jul 7 15:23:57 2021 -0500
Fix Win64 AVX512 bug.
Use `-march=haswell` for kernels. Fixes #514.
commit 75f03907c58385b656c8bd35d111db245814a9f3
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Jul 7 15:44:11 2021 -0500
Add comment about make checkblas on Windows
[ci skip]
commit 4651583b1204a965e4aa672c7ad6de60f3ab1600
Merge: 69205ac2 174f7fc9
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Jul 7 01:11:20 2021 -0500
Merge pull request #520 from flame/travis-ci-install
Test installation in Travis CI
commit 69205ac266947723ad4d7bb028b7521fe5c76991
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 6 20:39:22 2021 -0500
CREDITS file update.
Details:
- Thanks to Chengguo Sun for submitting #515 (5ef7f68).
- Thanks to Andrew Wildman for submitting #519 (551c6b4).
- Whitespace update to configure (spaces to tabs).
commit 174f7fc9a11712c7bd1a61510bdc5c262b3e8e1f
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jul 6 19:35:55 2021 -0500
Test installation in Travis CI
commit 551c6b4ee8cd9dd2e1d1b46c8dde09eb50b91b2c
Merge: 78eac6a0 f648df4e
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jul 6 19:32:53 2021 -0500
Merge pull request #519 from awild82/oot_build_bugfix
Fix installation from out-of-tree builds
commit f648df4e5588f069b2db96f8be320ead0c1967ef
Author: Andrew Wildman <apw4@uw.edu>
Date: Tue Jul 6 16:35:12 2021 -0700
Add symlink to blis.pc.in for out-of-tree builds
commit 78eac6a0ab78c995c3f4e46a9e87388b5c3e1af6
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jul 6 11:05:43 2021 -0500
Revert "Always run `make check`."
This reverts commit a201a53440c51244739aaee20e3309b50121cc68.
commit a201a53440c51244739aaee20e3309b50121cc68
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Jul 5 21:39:18 2021 -0500
Always run `make check`.
I'm concerned that problems may lurk for `x86_64` builds on Windows which may be uncovered by a fuller `make check`.
commit 5ef7f684dc75fc707c82f919e0836615f90a2627
Merge: aaa10c87 ad6231cc
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Jul 5 21:35:07 2021 -0500
Merge pull request #515 from chengguosun/bug-fix
Fixed configure script bug.
commit ad6231cca3fc1e477752ecd31b1ee2323398a642
Author: sunchengguo <sunchengguo@higon.com>
Date: Tue Jul 6 07:30:00 2021 -0400
Fixed configure script bug.
Details:
- Fixed a kernel list string substitution error by adding the function substitute_words to the configure script.
Previously, if the string contained both zen and zen2, and zen needed to be replaced with another string, then zen2
would also be incorrectly replaced.
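Illustrative sketch (in C, not the shell code used by configure) of whole-word substitution in a space-separated list, which is the idea behind the fix above. The function name substitute_words_sketch() and its signature are hypothetical.

```c
#include <string.h>

/* Replace every whole word equal to `from` in the space-separated `list`
   with `to`, writing the result to `out`. Returns 0 on success, or -1 if
   `out` is too small. Words that merely start with `from` are left alone. */
int substitute_words_sketch(const char *list, const char *from,
                            const char *to, char *out, size_t out_size)
{
    size_t      used = 0;
    const char *p    = list;

    if (out_size == 0) return -1;
    out[0] = '\0';

    p += strspn(p, " ");                     /* skip leading spaces */
    while (*p != '\0') {
        size_t      len  = strcspn(p, " "); /* length of the current word */
        const char *word = p;
        size_t      wlen = len;

        /* Only an exact, full-length match counts as the same word. */
        if (len == strlen(from) && strncmp(p, from, len) == 0) {
            word = to;
            wlen = strlen(to);
        }

        /* Room for an optional separator, the word, and the terminator. */
        if (used + (used > 0 ? 1 : 0) + wlen + 1 > out_size) return -1;
        if (used > 0) out[used++] = ' ';
        memcpy(out + used, word, wlen);
        used += wlen;
        out[used] = '\0';

        p += len;
        p += strspn(p, " ");                 /* skip separators */
    }
    return 0;
}
```

A plain substring replacement of zen in the list "zen zen2 haswell" would also rewrite the prefix of zen2; comparing whole words avoids that.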
commit d073fc9acac9d702556cab9fbbb3a253eeb1f998
Author: nicholaiTukanov <nicholaitukanov@gmail.com>
Date: Fri Jul 2 19:54:33 2021 -0500
Update POWER10.md
commit 907226c0af4afb6323b4e02be4f73f5fb89cddaf
Author: nicholaiTukanov <nicholaitukanov@gmail.com>
Date: Fri Jul 2 19:47:18 2021 -0500
Rework POWER10 sandbox
- Add a testsuite for gathering performance (in GFLOPs) and measuring correctness for the POWER10 GEMM reduced precision/integer kernels.
- Reworked GENERIC_GEMM template to hardcode the cache parameters.
- Remove the kernel wrapper that only allowed matrices that weren't transposed or conjugated. The kernels still assume the matrices are not transposed; the wrapper was removed for performance reasons.
- Renamed and restructured files and functions for clarity.
- Edited the POWER10 document to reflect new changes.
commit aaa10c87e19449674a4ca30fa3b6392bb22c3a66
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 21 17:53:52 2021 -0500
Skip clearing temp microtile in gemmlike sandbox.
Details:
- Removed code from gemmlike sandbox files bls_gemm_bp_var1.c and
bls_gemm_bp_var2.c that initializes the elements of the temporary
microtile to zero. This code, introduced recently in 7f7d726, did
not actually fix any bug (despite that commit's log entry). The
microtile does not need to be initialized because it is completely
overwritten by a "beta = 0" invocation of gemm prior to it being
read. Any NaNs or Infs present at the outset would have no impact
on the output matrix C. Thanks to Devin Matthews for reminding me
of this.
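Illustrative sketch (not the sandbox code) of why a "beta = 0" update makes prior initialization unnecessary: when beta is zero, the old contents of the destination are never read, so NaNs or Infs left there cannot propagate. The function update_microtile_sketch() is hypothetical, and treating beta == 0 as an overwrite is the assumption being demonstrated.

```c
#include <stddef.h>

/* Computes c := beta * c + ab element-wise over n values. When beta is
   zero, c is overwritten rather than scaled, so its prior contents
   (including NaN or Inf) do not affect the result. */
void update_microtile_sketch(size_t n, double beta, double *c,
                             const double *ab)
{
    for (size_t i = 0; i < n; i++) {
        if (beta == 0.0)
            c[i] = ab[i];                 /* old c[i] is never read */
        else
            c[i] = beta * c[i] + ab[i];
    }
}
```

Note that a naive 0.0 * c[i] would yield NaN for a NaN or Inf input, which is why the zero case is handled as an overwrite.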
commit bc10a3f2ff518360c32bea825b3eb62a9e4c8a77
Merge: bf727636 6548ceba
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Jun 18 19:01:08 2021 -0500
Merge pull request #492 from flame/thunderx2-clang
Allow clang for ThunderX2 config
commit bf727636632a368f3247dc8ab1d4b6119e9c511a
Merge: e28f2a2d 5fc93e28
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Jun 18 18:59:43 2021 -0500
Merge pull request #506 from xrq-phys/arm64-mac
BLIS on Darwin_Aarch64
commit e28f2a2dfcff14e7094fce0b279b3a917b3ab98c
Merge: d10e05bb 56ffca6a
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Jun 15 19:35:07 2021 -0500
Merge pull request #513 from nicholaiTukanov/asm_warning_p9_fix
Fix assembler warning in POWER9 DGEMM
commit 56ffca6a9bc67432a7894298739895f406e5f467
Author: nicholai <nicholai@ibm.com>
Date: Tue Jun 15 18:17:39 2021 -0500
Fix asm warning
commit 689fa0f40399bde1acc5367d6dd4e8fc4eb6f3ea
Merge: b683d01b d10e05bb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jun 13 19:44:14 2021 -0500
Merge branch 'master' into dev
commit d10e05bbd1ce45ce2c0dfe5c64daae2633357b3f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jun 13 19:36:16 2021 -0500
Sandbox header edits trigger full library rebuild.
Details:
- Adjusted the top-level Makefile so that any change to a sandbox header
file will result in blis.h being regenerated along with a full
recompilation of the library. Previously, sandbox files were omitted
from the list of header files that, when touched, could trigger a full
rebuild. Why was it like that previously? Because originally we only
envisioned using sandboxes to *replace* gemm, not augment the library
with new functionality. When replacing gemm, blis.h does not need to
contain any local sandbox definitions in order for the user to be able
to (indirectly) use that sandbox. But if you are adding functions to
the library, those functions need to be prototyped so the compiler
can perform type checking against the user's invocation of those new
functions. Thanks to Jeff Diamond for helping us discover this
deficiency in the build system.
commit 7c3eb44efaa762088c190bb820ef6a3c87db8f65
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Jun 2 11:28:22 2021 -0500
Add vhsubpd/vhsubpd.
Horizontal subtraction instructions added to bli_x86_asm_macros.h, currently unused [ci skip].
commit 7f7d72610c25f511ba8cd2a53be7b59bdb80f3f3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon May 31 16:50:18 2021 -0500
Fixed bugs in cpackm kernels, gemmlike code.
Details:
- Fixed intermittent bugs in bli_packm_haswell_asm_c3xk.c and
bli_packm_haswell_asm_c8xk.c whereby the imaginary component of the
kappa scalar was incorrectly loaded at an offset of 8 bytes (instead
of 4 bytes) from the real component. This was almost certainly a copy-
paste bug carried over from the corresponding zpackm kernels. Thanks to
Devin Matthews for bringing this to my attention.
- Added missing code to gemmlike sandbox files bls_gemm_bp_var1.c and
bls_gemm_bp_var2.c that initializes the elements of the temporary
microtile to zero. (This bug was never observed in output but rather
noticed analytically. It probably would have also manifested as
intermittent failures, this time involving edge cases.)
- Minor commented-out/disabled changes to testsuite/src/test_gemm.c
relating to debugging.
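Illustrative sketch of the offsets involved in the cpackm fix above. The struct types are hypothetical stand-ins for single- and double-precision complex values stored as adjacent (real, imag) pairs; the 4-byte and 8-byte figures assume the usual 4-byte float and 8-byte double.

```c
#include <stddef.h>

/* Hypothetical layouts: real part first, imaginary part immediately after. */
typedef struct { float  real; float  imag; } scomplex_sketch;
typedef struct { double real; double imag; } dcomplex_sketch;

/* Byte offset of the imaginary component from the real component. */
size_t imag_offset_c(void) { return offsetof(scomplex_sketch, imag); }
size_t imag_offset_z(void) { return offsetof(dcomplex_sketch, imag); }
```

Under those assumptions, code carried over from a double-precision complex kernel would use the imag_offset_z() distance, which in single precision points past the imaginary component.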
commit 5fc93e280614b4a21a9cff36cf873b4b9407285b
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sat May 29 18:44:47 2021 +0900
Armv8A Rename Regs for Safe Darwin Compile
Avoid x18 use in FP32 kernel:
- C address lines x[18-26] renamed to x[19-27] (reg index +1)
- Original role of x27 fulfilled by x5 which is free after k-loop pert.
FP64 does not require changing since x18 is not used there.
commit 9f4a4a3cfb2244e4024445e127dafd2a11f39fc5
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sat May 29 17:21:28 2021 +0900
Armv8A Rename Regs for Clang Compile: FP32 Part
Roughly the same as 916e1fa, additionally with x15 clobbering removed.
- x15: Not used at all.
Compilation w/ Clang shows warning about x18 reservation, but
compilation itself is OK and all tests passed.
commit 916e1fa8be3cea0e3e2a4a7e8b00027ac2ee7780
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sat May 29 16:46:52 2021 +0900
Armv8A Rename Regs for Clang Compile: FP64 Part
- x7, x8: Used to store address for Alpha and Beta.
As Alpha & Beta was not used in k-loops, use x0, x1 to load
Alpha & Beta's addresses after k-loops are completed, since A & B's
addresses are no longer needed there.
This "ldr [addr]; -> ldr val, [addr]" would not cause much performance
drawback since it is done outside k-loops and there are plenty of
instructions between Alpha & Beta's loading and usage.
- x9: Used to store cs_c. x9 is multiplied by 8 into x10 and not used
any longer. Loading cs_c directly into x10 and scaling it by 8 there
frees x9.
- x11, x12: Not used at all. Simply remove from clobber list.
- x13: Like x9, loaded and scaled by 8 into x14, except that x13 is
also used in a conditional branch so that "cmp x13, #1" needs to be
modified into "cmp x14, #8" to completely free x13.
- x3, x4: Used to store next_a & next_b. Untouched in k-loops. Load
these addresses into x0 and x1 after Alpha & Beta are both loaded,
since then neither the address of A/B nor the address of Alpha/Beta is needed.
commit 7fabd896af773623ed01820a71bbff432e8a7d25
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sat May 29 16:28:03 2021 +0900
Asm Flag Mingling for Darwin_Aarch64
Apple+Arm64 requires additional "tagging" of local symbols.
commit 213dce32d2eed8b7a38c6a3f6112072b0a89ecd0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 28 14:49:57 2021 -0500
Added a new 'gemmlike' sandbox.
Details:
- Added a new sandbox called 'gemmlike', which implements sequential and
multithreaded gemm in the style of gemmsup but also unconditionally
employs packing. The purpose of this sandbox is to
(1) avoid select abstractions, such as objects and control trees, in
order to allow readers to better understand how a real-world
implementation of high-performance gemm can be constructed;
(2) provide a starting point for expert users who wish to build
something that is gemm-like without "reinventing the wheel."
Thanks to Jeff Diamond, Tze Meng Low, Nicholai Tukanov, and Devangi
Parikh for requesting and inspiring this work.
- The functions defined in this sandbox currently use the "bls_" prefix
instead of "bli_" in order to avoid any symbol collisions in the main
library.
- The sandbox contains two variants, each of which implements gemm via a
block-panel algorithm. The only difference between the two is that
variant 1 calls the microkernel directly while variant 2 calls the
microkernel indirectly, via a function wrapper, which allows the edge
case handling to be abstracted away from the classic five loops.
- This sandbox implementation utilizes the conventional gemm microkernel
(not the skinny/unpacked gemmsup kernels).
- Updated some typos in the comments of a few files in the main
framework.
commit 82af05f54c34526a60fd2ec46656f13e1ac8f719
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 25 15:25:08 2021 -0500
Updated Fugaku (a64fx) performance results.
Details:
- Updated the performance graphs (pdfs and pngs) for the Fugaku/a64fx
entry within Performance.md, and also updated the experiment details
accordingly. Thanks to RuQing Xu for re-running the BLIS and SSL2
experiments reflected in this commit.
- In Performance.md, added an English translation of the project name
under which the Fugaku results were gathered, courtesy of RuQing Xu.
commit e5c85da3763f73854ecd739ba3008bb467ed77c3
Merge: cbd8d393 5feb04e2
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon May 24 16:56:22 2021 -0500
Merge pull request #503 from flame/windows-compiler-check
Add explicit compiler check for Windows.
commit cbd8d3932599485727204479fded66ac19186db4
Merge: 6d4ab022 932dfe6a
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon May 24 16:32:42 2021 -0500
Merge pull request #500 from xrq-phys/armsve+travis
Upgrade Travis CI for Arm SVE
commit 5feb04e233e1e6f81c727578ad9eae1367a2562f
Author: Devin Matthews <damatthews@smu.edu>
Date: Sun May 23 18:46:56 2021 -0500
Add explicit compiler check for Windows.
Check the C compiler for a predefined macro `_WIN32` to indicate (cross-)compilation for Windows. Fixes #463.
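Illustrative sketch of the kind of predefined-macro test described above. How configure actually invokes the compiler is not shown in this log; the function targets_windows_sketch() is hypothetical and only demonstrates that _WIN32 is visible to the preprocessor when the compiler targets Windows.

```c
/* Returns 1 when compiled by a compiler targeting Windows (which
   predefines _WIN32), and 0 otherwise. */
int targets_windows_sketch(void)
{
#ifdef _WIN32
    return 1;
#else
    return 0;
#endif
}
```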
commit 6d4ab0223d9014ac2a66d66759536aa305be5867
Merge: 61584ded 859fb77a
Author: Devin Matthews <damatthews@smu.edu>
Date: Sun May 23 18:39:53 2021 -0500
Merge pull request #502 from flame/rm-rm-dupls
Remove `rm-dupls` function in common.mk.
commit 859fb77a320a3ace71d25a8885c23639b097a1b6
Author: Devin Matthews <damatthews@smu.edu>
Date: Sun May 23 18:15:23 2021 -0500
Remove `rm-dupls` function in common.mk.
AMD requested removal due to unclear licensing terms; original code was from stackoverflow. The function is unused but could easily be replaced by new implementation.
commit 932dfe6abb9617223bd26a249e53447169033f8c
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu May 20 02:07:31 2021 +0900
Travis CI Revert Unnecessary Extras from 91d3636
- Removed `V=1` in make line
- Removed `CFLAGS` in configure line
- Restored `pwd` surrounding OOT line
commit bd156a210d347a073a6939cc4adab3d9256c2e2b
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sun May 16 02:56:14 2021 +0900
Adjust TravisCI
- ArmSVE: don't test gemmt (seems to be a Qemu-only problem);
- Clang: use the TravisCI-provided version instead of fixing to clang-8,
because clang-8 seems to conflict with TravisCI's clang-7.
commit 91d3636031021af3712d14c9fcb1eb34b6fe2a31
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Sat May 15 17:05:16 2021 +0900
Travis Support Arm SVE
- Updated distro to 20.04 focal aarch64-gcc-10.
This is the minimal version required by aarch64-gcc-10.
SVE intrinsics would not compile without GCC >=10.
- x86 toolchains use official repo instead of ubuntu-toolchain-r/test.
20.04 focal is not supported by that PPA at the moment.
- Add extra configuration-time options to .travis.yml.
- Add Arm SVE entry to .travis.yml.
commit 61584deddf9b3af6d11a811e6e04328d22390202
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Wed May 19 23:52:29 2021 +0900
Added 512b SVE-based a64fx subconfig + SVE kernels.
Details:
- Added 512-bit specific 'a64fx' subconfiguration that uses empirically
tuned block size by Stepan Nassyr. This subconfig also sets the sector
cache size and enables memory-tagging code in SVE gemm kernels. This
subconfig utilizes (16, k) and (10, k) DPACKM kernels.
- Added a vector-length agnostic 'armsve' subconfiguration that computes
blocksizes according to the analytical model. This part is ported from
Stepan Nassyr's repository.
- Implemented vector-length-agnostic [d/s/sh] gemm kernels for Arm SVE
at size (2*VL, 10). These kernels use unindexed FMLA instructions
because indexed FMLA takes 2 FMA units in many implementations.
PS: There are indexed-FMLA kernels in Stepan Nassyr's repository.
- Implemented 512-bit SVE dpackm kernels with in-register transpose
support for sizes (16, k) and (10, k).
- Extended 256-bit SVE dpackm kernels by Linaro Ltd. to 512-bit for
size (12, k). This dpackm kernel is not currently used by any
subconfiguration.
- Implemented several experimental dgemmsup kernels which would
improve performance in a few cases. However, those dgemmsup kernels
generally underperform hence they are not currently used in any
subconfig.
- Note: This commit squashes several commits submitted by RuQing Xu via
PR #424.
commit b683d01b9c4ea5f64c8031bda816beccfbf806a0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 13 15:23:22 2021 -0500
Use extra #undef when including ba/ex API headers.
Details:
- Inserted a "#include bli_xapi_undef.h" after each usage of the basic
and expert API macro setup headers: bli_oapi_ba.h, bli_oapi_ex.h,
bli_tapi_ba.h, and bli_tapi_ex.h. This is functionally equivalent to
the previous status quo, in which each header made minimal #undef
prior to its own definitions and then a single instance of
"#include bli_xapi_undef.h" cleaned up any remaining macro defs after
all other headers were used. This commit will guarantee that macro
defs from the setup of one header (say, bli_oapi_ex.h) don't "infect"
the definitions made in a subsequent header. As with the previous
commit, this change does not fix any issue but rather attempts to
avoid creating orphaned macro definitions that are only needed within
a very limited scope.
- Removed minimal #undef from bli_?api_[ba|ex].h.
- Removed old commented-out lines from bli_?api_[ba|ex].h.
commit d4427a5b2f5cab5d2a64c58d87416628867c2b4a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 13 13:55:11 2021 -0500
Minor preprocessor/header cleanup.
Details:
- Added frame/include/bli_xapi_undef.h, which explicitly undefines all
macros defined in bli_oapi_ba.h, bli_oapi_ex.h, bli_tapi_ba.h, and
bli_tapi_ex.h. (This is for safety and good cpp coding practice, not
because it fixes anything.)
- Added #include "bli_xapi_undef.h" to bli_l1v.h, bli_l1d.h, bli_l1f.h,
bli_l1m.h, bli_l2.h, bli_l3.h, and bli_util.h.
- Comment updates to bli_oapi_ba.h, bli_oapi_ex.h, bli_tapi_ba.h, and
bli_tapi_ex.h.
- Moved frame/3/bli_l3_ft_ex.h to local 'old' directory after realizing
that nothing in BLIS used those function pointer types. Also commented
out the "#include bli_l3_ft_ex.h" directive in frame/3/bli_l3.h.
commit 5aa63cd927b22a04e581b07d0b68ef391f4f9b1f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 12 19:53:35 2021 -0500
Fixed typo in cpp guard in bli_util_ft.h.
Details:
- Changed #ifdef BLIS_OAPI_BASIC to #ifdef BLIS_TAPI_BASIC in
bli_util_ft.h. This typo was causing some types to be redefined when
they weren't supposed to be.
commit f0e8634775094584e89f1b03811ee192f2aaf67f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 12 18:45:32 2021 -0500
Defined eqsc, eqv, eqm to test object equality.
Details:
- Defined eqsc, eqv, and eqm operations, which set a bool depending on
whether the two scalars, two vectors, or two matrix operands are equal
(element-wise). eqsc and eqv support implicit conjugation and eqm
supports diagonal offset, diag, uplo, and trans parameters (in a
manner consistent with other level-1m operations). These operations
are currently housed under frame/util, at least for now, because they
are not computational in nature.
- Redefined bli_obj_equals() in terms of eqsc, eqv, and eqm.
- Documented eqsc, eqv, and eqm in BLISObjectAPI.md and BLISTypedAPI.md.
Also:
- Documented getsc and setsc in both docs.
- Reordered entry for setijv in BLISTypedAPI.md, and added separator
bars to both docs.
- Added missing "Observed object properties" clauses to various
level-1v entries in BLISObjectAPI.md.
- Defined bli_apply_trans() in bli_param_macro_defs.h.
- Defined supporting _check() function, bli_l0_xxbsc_check(), in
bli_l0_check.c for eqsc.
- Programming style and whitespace updates to bli_l1m_unb_var1.c.
- Whitespace updates to bli_l0_oapi.c, bli_l1m_oapi.c
- Consolidated redundant macro redefinition for copym function pointer
type in bli_l1m_ft.h.
- Added macros to bli_oapi_ba.h, _ex.h, and bli_tapi_ba.h, _ex.h that
allow oapi and tapi source files to forego defining certain expert
functions. (Certain operations such as printv and printm do not need
to have both basic and expert interfaces. This also includes eqsc, eqv,
and eqm.)
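Illustrative sketch of an element-wise vector equality test that sets a bool, in the spirit of the eqv operation described above. The function eqv_sketch() and its signature are hypothetical; it handles only real double-precision vectors with strides and does not model conjugation or the actual BLIS interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sets *is_eq to true if the n elements of x and y (visited with strides
   incx and incy) are pairwise equal, and to false otherwise. Two empty
   vectors compare equal. */
void eqv_sketch(size_t n, const double *x, ptrdiff_t incx,
                const double *y, ptrdiff_t incy, bool *is_eq)
{
    *is_eq = true;
    for (size_t i = 0; i < n; i++) {
        if (x[(ptrdiff_t)i * incx] != y[(ptrdiff_t)i * incy]) {
            *is_eq = false;
            return;             /* first mismatch decides the result */
        }
    }
}
```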
commit 5d46dbee4a06ba5a422e19817836976f8574cb4f
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed May 12 18:42:09 2021 -0500
Replace bli_dlamch with something less archaic (#498)
Details:
- Added new implementations of bli_slamch() and bli_dlamch() that use
constants from the standard C library in lieu of dynamically-computed
values (via code inherited from netlib). The previous implementation
is still available when the cpp macro BLIS_ENABLE_LEGACY_LAMCH is
defined by the subconfiguration at compile-time. Thanks to Devin
Matthews for providing this patch, and to Stefano Zampini for
reporting the issue (#497) that prompted Devin to propose the patch.
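Illustrative sketch of a lamch-style query answered from standard C library constants rather than computed at run time. The function dlamch_sketch() is hypothetical and covers only a few query characters; it is not the implementation added by this commit, and the epsilon queries are omitted because their exact convention is not stated in this log.

```c
#include <ctype.h>
#include <float.h>

/* Returns a double-precision machine parameter selected by cmach, using
   constants from <float.h>. Unrecognized queries return 0.0. */
double dlamch_sketch(char cmach)
{
    switch (toupper((unsigned char)cmach)) {
    case 'B': return (double)FLT_RADIX;     /* base of the machine      */
    case 'N': return (double)DBL_MANT_DIG;  /* base digits in mantissa  */
    case 'U': return DBL_MIN;               /* smallest normalized value */
    case 'O': return DBL_MAX;               /* largest finite value      */
    default:  return 0.0;
    }
}
```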
commit 6a89c7d8f9ac3f51b5b4d8ccb2630d908d951e6f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat May 1 18:54:48 2021 -0500
Defined setijv, getijv to set/get vector elements.
Details:
- Defined getijv, setijv operations to get and set elements of a vector,
in bli_setgetijv.c and .h.
- Renamed bli_setgetij.c and .h to bli_setgetijm.c and .h, respectively.
- Added additional bounds checking to getijm and setijm to prevent
actions with negative indices.
- Added documentation to BLISObjectAPI.md and BLISTypedAPI.md for getijv
and setijv.
- Added documentation to BLISTypedAPI.md for getijm and setijm, which
were inadvertently missing.
- Added a new entry to the FAQ titled "Why does BLIS have vector
(level-1v) and matrix (level-1m) variations of most level-1
operations?"
- Comment updates.
commit 4534daffd13ed7a8983c681d3f5e9de17c9f0b96
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 27 18:16:44 2021 -0500
Minor API breakage in bli_pack API.
Details:
- Changed bli_pack_get_pack_a() and bli_pack_get_pack_b() so that
instead of returning a bool, they set a bool that is passed in by
address. This does break the public exported API, but I expect very
few users actually use this function. (This change is being made in
preparation for a much more extensive commit relating to error
checking.)
commit 6a4aa986ffc060d3e64ed230afe318b82630f8b2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 23 13:10:01 2021 -0500
Fixed typo in Table of Contents.
commit f6424b5b82160d346a09a0fbb526981ecf66cdb3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 23 13:08:06 2021 -0500
Added dedicated Performance section to README.md.
Details:
- Spun off the Performance.md and PerformanceSmall.md links in the
Documentation section into a new Performance section dedicated to
those two links. (The previous entries remain redundantly listed
within Documentation section.) Thanks to Robert van de Geijn for
suggesting this change.
commit 40ce5fd241b9ad140bf57278d440f0598d7f15d8
Merge: 6280757b 1f3461a5
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Apr 21 09:54:25 2021 -0500
Merge pull request #493 from cassiersg/patch-1
Fix typo in FAQ.md
commit 1f3461a5a5a88510f913451a93e3190ec1556f39
Author: Gaëtan Cassiers <cassiersg@users.noreply.github.com>
Date: Wed Apr 21 16:49:05 2021 +0200
Fix typo in FAQ.md
commit 6548cebaf55a1f9bdb8417cc89dd0444d8f9c2e4
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Apr 14 13:00:42 2021 -0500
Allow clang for ThunderX2 config
Needed for compiling on e.g. Mac M1. AFAIK clang supports the same -mcpu flag for ThunderX2 as gcc.
commit 6280757be32f90fd77d8dd9357b07d9306e6f80d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 7 13:03:56 2021 -0500
Minor updates to a64fx section of Performance.md.
commit 1e6ed823c6cd11f9b671779f3c8bdbd2bbb40f34
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Thu Apr 8 02:59:26 2021 +0900
Additional A64fx Comments (#490)
* Performance.md Update A64fx Comments
- Reason for ARMPL's missing data;
- Additional envs / flags for kernel selection;
- Update BLIS SRC commit.
* Include Another Fix in armsve-cfg-vendor
A prototype was missing, so the void* pointer was not returned in full.
commit 2688f21a5b073950f6f187c95917fdbb5aac234a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 6 19:02:37 2021 -0500
Added Fujitsu A64fx (512-bit SVE) perf results.
Details:
- Added single-threaded and multithreaded performance results to
docs/Performance.md. These results were gathered on the "Fugaku"
Fujitsu A64fx supercomputer at the RIKEN Center for Computational
Science in Kobe, Japan. Special thanks to RuQing Xu and Stepan
Nassyr for their work in developing and optimizing A64fx support in
BLIS and RuQing for gathering the performance data that is reflected
in these new graphs.
commit ba3ba8da83d48397162139e11337c036a631ba79
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 6 18:39:58 2021 -0500
Minor updates and fixes to test/3/octave scripts.
Details:
- Fixed an issue where the wrong string was being passed in for the
vendor legend string.
- Changed the graph in which the legends appear.
- Updates to runthese.m.
commit 09bd4f4f12311131938baa9f75d27e92b664d681
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 31 17:09:36 2021 -0500
Add err_t* "return" parameter to malloc functions.
Details:
- Added an err_t* parameter to memory allocation functions including
bli_malloc_intl(), bli_calloc_intl(), bli_malloc_user(),
bli_fmalloc_align(), and bli_fmalloc_noalign(). Since these functions
already use the return value to return the allocated memory address,
they can't communicate errors to the caller through the return value.
This commit does not employ any error checking within these functions
or their callers, but this sets up BLIS for a more comprehensive
commit that moves in that direction.
- Moved the typedefs for malloc_ft and free_ft from bli_malloc.h to
bli_type_defs.h. This was done so that what remains of bli_malloc.h
can be included after the definition of the err_t enum. (This ordering
was needed because bli_malloc.h now contains function prototypes that
use err_t.)
- Defined bli_is_success() and bli_is_failure() static functions in
bli_param_macro_defs.h. These functions provide easy checks for error
codes and will be used more heavily in future commits.
- Unfortunately, the additional err_t* argument discussed above breaks
the API for bli_malloc_user(), which is an exported symbol in the
shared library. However, it's quite possible that the only application
that calls bli_malloc_user()--indeed, the reason it was marked for
symbol exporting to begin with--is the BLIS testsuite. And if that's
the case, this breakage won't affect anyone. Nonetheless, the "major"
part of the so_version file has been updated accordingly to 4.0.0.
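Illustrative sketch of the calling convention described above, in which the return value carries the allocated address and a separate out-parameter carries the error code. All names here (sketch_err_t, sketch_malloc(), sketch_is_success()) are hypothetical. Unlike the commit, which adds the parameter without yet employing error checking, this sketch sets the code so that the convention is visible.

```c
#include <stdlib.h>

/* Hypothetical error codes. */
typedef enum {
    SKETCH_SUCCESS       =  0,
    SKETCH_MALLOC_FAILED = -1
} sketch_err_t;

/* The return value is already used for the address, so the status is
   reported through r_val instead. */
void *sketch_malloc(size_t size, sketch_err_t *r_val)
{
    void *p = malloc(size);
    *r_val = (p == NULL) ? SKETCH_MALLOC_FAILED : SKETCH_SUCCESS;
    return p;
}

/* Convenience check on an error code. */
static inline int sketch_is_success(sketch_err_t e)
{
    return e == SKETCH_SUCCESS;
}
```

A caller would pass the address of a local sketch_err_t, test it with sketch_is_success(), and release the memory with free().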
commit f9ad55ce7e12f59930605753959fcfd41a218d8d
Merge: 04502492 90508192
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 31 14:20:19 2021 -0500
Merge branch 'master' into dev
commit 90508192f2d6ae95adc2a3ba9f4e5bad2c8d6fd2
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Mar 30 21:16:44 2021 -0500
Update do_sde.sh (#489)
Update to a newer version of SDE, and do a direct download as it seems you don't have to click-through the license anymore.
commit 22c6b5dc4c9cc21942f8ccc30891f9b4385a9504
Author: Nicholai Tukanov <nicholaitukanov@gmail.com>
Date: Tue Mar 30 19:07:42 2021 -0500
Fixed bug in power10 microkernel I/O. (#488)
Details:
- Fixed a bug in the POWER10 DGEMM kernel whereby the microkernel did
not store the microtile result correctly due to incorrect indices
calculations. (The error was introduced when I reorganized the
'kernels/power10/3' directory.)
commit 04502492671456b94bcdee60b9de347b6763a32d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 28 19:11:43 2021 -0500
Always stay initialized after BLAS compat calls.
Details:
- Removed the option to finalize BLIS after every BLAS call, which also
means that BLIS would initialize at the beginning of every BLAS call.
This option never really made sense and wasn't even implemented
properly to begin with. (Because bli_init_auto() and _finalize_auto()
were implemented in terms of bli_init_once() and _finalize_once(),
respectively, the application would have only been able to call one
BLAS routine before BLIS would find itself in an unusable, permanently
uninitialized state.) Because this option was never meant for regular
use, it never made it into configure as an actual configure-time
option, and therefore this commit only removes parts of the code
affected by the cpp macro guard BLIS_ENABLE_STAY_AUTO_INITIALIZED.
commit 3a6f41afb8197e831b6ce2f1ae7f63735685fa0a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 27 17:22:14 2021 -0500
Renamed membrk files/vars/functions to pba.
Details:
- Renamed the files, variables, and functions relating to the packing
block allocator from its legacy name (membrk) to its current name
(pba). This more clearly contrasts the packing block allocator with
the small block allocator (sba).
- Fixed a typo in bli_pack_set_pack_b(), defined in bli_pack.c, that
caused the function to erroneously change the value of the pack_a
field of the global rntm_t instead of the pack_b field. (Apparently
nobody has used this API yet.)
- Comment updates.
commit 36cb4116d15cfef2d42ec4a834efd4a958f261b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 27 15:15:09 2021 -0500
Switch allocator mutexes to static initialization.
Details:
- Switched the small block allocator (sba), as defined in bli_sba.c and
bli_apool.c, to static initialization of its internal mutex. Did a
similar thing for the packing block allocator (pba), which appears as
global_membrk in bli_membrk.c.
- Commented out bli_membrk_init_mutex() and bli_membrk_finalize_mutex()
to ensure they won't be used in the future.
- In bli_thrcomm_pthreads.c and .h, removed old, commented-out cpp
blocks guarded by BLIS_USE_PTHREAD_MUTEX.
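Illustrative sketch of static mutex initialization with pthreads, the general technique named in the entry above. The variable and function names are hypothetical and do not correspond to the sba or pba code.

```c
#include <pthread.h>

/* Statically initialized: usable without any run-time init call, and
   therefore with no matching init/finalize pair to get wrong. */
static pthread_mutex_t pool_mutex = PTHREAD_MUTEX_INITIALIZER;

static int blocks_checked_out = 0;

/* Increments the shared counter under the mutex and returns the new count. */
int checkout_block_sketch(void)
{
    pthread_mutex_lock(&pool_mutex);
    int n = ++blocks_checked_out;
    pthread_mutex_unlock(&pool_mutex);
    return n;
}
```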
commit 159ca6f01a5f91b93513134c9470b69ff78f5354
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 24 15:57:32 2021 -0500
Made test/3/octave scripts robust to missing data.
Details:
- Modified the octave scripts in test/3 so that the script does not
choke when one or more of the expected OpenBLAS, Eigen, or vendor data
files is missing. (The BLIS data set, however, must be complete.) When
a file is missing, that data series is simply not included on that
particular graph. Also factored out a lot of the redundant logic from
plot_panel_4x5.m into a separate function in read_data.m.
commit 545e6c2f6d09d023b353002a9a43b11aa0c1d701
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 22 17:42:33 2021 -0500
CHANGELOG update (0.8.1)
commit 8535b3e11d2297854991c4272932ce4974dda629
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 22 17:42:33 2021 -0500
Version file update (0.8.1)
commit e56d9f2d94ed247696dda2cbf94d2ca05c7fc089
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 22 17:40:50 2021 -0500
ReleaseNotes.md update in advance of next version.
commit ca83f955d45814b7d84f53933cdb73323c0dea2c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 22 17:21:21 2021 -0500
CREDITS file update.
commit 57ef61f6cdb86957f67212aa59407f2f8e7f3d1a
Merge: bf1b578e e7a4a8ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 19 13:05:43 2021 -0500
Merge branch 'master' of github.com:flame/blis
commit bf1b578ea32ea1c9dbf7cb3586969e8ae89aa5ef
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 19 13:03:17 2021 -0500
Reduced KC on skx from 384 to 256.
Details:
- Reduced the KC cache blocksize for double real on the skx subconfig
from 384 to 256. The maximum (extended) KC was also reduced
accordingly from 480 to 320. Thanks to Tze Meng Low for suggesting
this change.
commit e7a4a8edc940942357e8e4c4594383a29a962f93
Author: Nicholai Tukanov <nicholaitukanov@gmail.com>
Date: Wed Mar 17 19:43:31 2021 -0500
Fix calculation of new pb size (#487)
Details:
- Added missing parentheses to the i8 and i4 instantiations of the
GENERIC_GEMM macro in sandbox/power10/generic_gemm.c.
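The exact expression fixed in generic_gemm.c is not reproduced here. As a generic illustration of how missing parentheses can corrupt a size calculation inside a cpp macro, consider the following sketch, in which every macro and function name is hypothetical:

```c
// Hypothetical macros; both are meant to compute pb / 4.
#define SKETCH_QUARTER_BAD( pb )   ( pb / 4 )
#define SKETCH_QUARTER_GOOD( pb )  ( ( pb ) / 4 )

// With the argument k + 3, the unparenthesized macro expands to
// ( k + 3 / 4 ), which integer division reduces to k.
int sketch_pb_bad( int k )  { return SKETCH_QUARTER_BAD( k + 3 ); }

// The parenthesized macro expands to ( ( k + 3 ) / 4 ), as intended.
int sketch_pb_good( int k ) { return SKETCH_QUARTER_GOOD( k + 3 ); }
```

For k = 5, the first function returns 5 while the second returns 2.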
commit 4493cf516e01aba82642a43abe350943ba458fe2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 15 13:12:49 2021 -0500
Redefined BLIS_NUM_ARCHS to update automatically.
Details:
- Changed BLIS_NUM_ARCHS from a cpp macro definition to the last enum
value in the arch_t enum. This means that it no longer needs to get
updated manually whenever new subconfigurations are added to BLIS.
Also removed the explicit initial index assignment of 0 from the
first enum value, which was unnecessary due to how the C language
standard mandates indexing of enum values. Thanks to Devin Matthews
for originally submitting this as a PR in #446.
- Updated docs/ConfigurationHowTo.md to reflect the aforementioned
change.
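The idiom described above can be sketched as follows. The enumerator names are illustrative and abbreviated; they are not the actual arch_t list in bli_type_defs.h.

```c
// Sketch of an arch_t-like enum with a trailing "count" enumerator.
typedef enum
{
    SKETCH_ARCH_SKX,      // implicitly 0: the first enumerator starts at 0
    SKETCH_ARCH_HASWELL,  // 1
    SKETCH_ARCH_ZEN2,     // 2
    SKETCH_ARCH_GENERIC,  // 3

    // Must remain last: its value equals the number of entries above.
    SKETCH_NUM_ARCHS
} sketch_arch_t;

// Arrays indexed by arch id can be sized without a hand-maintained count.
static const char* sketch_arch_names[ SKETCH_NUM_ARCHS ] =
{
    "skx", "haswell", "zen2", "generic"
};

// Map an id to its name, guarding against out-of-range values.
const char* sketch_arch_name( sketch_arch_t id )
{
    int i = ( int )id;
    return ( i >= 0 && i < SKETCH_NUM_ARCHS ) ? sketch_arch_names[ i ]
                                              : "unknown";
}
```

Adding a new enumerator before SKETCH_NUM_ARCHS updates the count automatically; only tables such as the name array still need a new entry.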
commit a4b73de84cdffcbe5cf71969a0f7f0f8202b3510
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 12 17:12:27 2021 -0600
Disabled _self() and _equal() in bli_pthread API.
Details:
- Disabled the _self() and _equal() extensions to the bli_pthread API
introduced in d479654. These functions were disabled after I realized
that they aren't actually needed yet. Thanks to Devin Matthews for
helping me reason through the appropriate consumer code that will
appear in BLIS (eventually) in a future commit. (Also, I could never
get the Windows branch to link properly in clang builds in AppVeyor.
See the comment I left in the code, and #485, for more info.)
commit f9d604679d8715bc3e79a8630268446889b51388
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 11 16:57:55 2021 -0600
Added _self() and _equal() to bli_pthread API.
Details:
- Expanded the bli_pthread API to include equivalents to pthread_self()
and pthread_equal(). Implemented these two functions for all three cpp
branches present within bli_pthread.c: systemless, Windows, and
Linux/BSD.
commit fa9b3c8f6b3d5717f19832362104413e1a86dfb0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 11 15:13:51 2021 -0600
Shuffled code in Windows branch of bli_pthreads.c.
Details:
- Reordered the definitions in the cpp branch in bli_pthreads.c that
defines the bli_pthreads API in terms of Windows API calls. Also added
missing comments that mark sections of the API, which brings the code
into harmony with other cpp branches (as well as bli_pthread.h).
commit 95d4f3934d806b3563f6648d57a4e381d747caf5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 11 13:50:40 2021 -0600
Moved cpp macro redef of strerror_r to bli_env.c.
Details:
- Relocated the _MSC_VER-guarded cpp macro re-definition of strerror_r
(in terms of strerror_s) from bli_thread.h to bli_env.c. It was
likely left behind in bli_thread.h in a previous commit, when code
that now resides in bli_env.c was moved from bli_thread.c. (I couldn't
find any other instance of strerror_r being used in BLIS, so I moved
the #define directly to bli_env.c rather than place it in bli_env.h.)
The code that uses strerror_r is currently disabled, though, so this
commit should have no effect on BLIS.
commit 8a3066c315358d45d4f5b710c54594455f9e8fc6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 9 17:52:59 2021 -0600
Relocated gemmsup_ref general stride handling.
Details:
- Moved the logic that checks for general stridedness in any of the
matrix operands in a gemmsup problem. The logic previously resided
near the top of bli_gemmsup_int(), which is the thread entry point
for the parallel region of the current gemmsup implementation. The
problem with this setup was that the code would attempt to reject
problems with any general-strided operands by returning BLIS_FAILURE,
and that return value was then being ignored by the l3_sup thread
decorator, which unconditionally returns BLIS_SUCCESS. To solve this
issue, rather than try to manage n return values, one from each of n
threads, I simply moved the logic into bli_gemmsup_ref(). I didn't
move it any higher (e.g. bli_gemmsup()) because I still want the
logic to be part of the current gemmsup handler implementation. That
is, perhaps someone else will create a different handler, and that
author wants to handle general stride differently. (We don't want to
force them into a particular way of handling general stride.)
- Removed the general stride handling from bli_gemmtsup_int(), even
though this function is inoperative for now.
- This commit addresses issue #484. Thanks to RuQing Xu for reporting
this issue.
commit 670bc7b60f6065893e8ec1bebd2fc9e5ba710dff
Author: Nicholai Tukanov <nicholaitukanov@gmail.com>
Date: Fri Mar 5 13:53:43 2021 -0600
Add low-precision POWER10 gemm kernels (#467)
Details:
- This commit adds a new BLIS sandbox that (1) provides implementations
based on low-precision gemm kernels, and (2) extends the BLIS typed
API for those new implementations. Currently, these new kernels can
only be used for the POWER10 microarchitecture; however, they may
provide a template for developing similar kernels for other
microarchitectures (even those beyond POWER), as changes would likely
be limited to select places in the microkernel and possibly the
packing routines. The new low-precision operations that are now
supported include: shgemm, sbgemm, i16gemm, i8gemm, i4gemm. For more
information, refer to the POWER10.md document that is included in
'sandbox/power10'.
commit b8dcc5bc75a746807d6f8fa22dc2123c98396bf5
Author: RuQing Xu <r-xu@g.ecc.u-tokyo.ac.jp>
Date: Tue Mar 2 06:58:24 2021 +0800
Fixed typed API definition for gemmt (#476)
Details:
- Fixed incorrect definition and prototype of bli_?gemmt() in
frame/3/bli_l3_tapi.c and .h, respectively. gemmt was previously
defined identically to gemm, which was wrong because it did not
take into account the uplo property of C.
- Fixed incorrect API documentation for her2k/syr2k in BLISTypedAPI.md.
Specifically, the document erroneously listed only a single transab
parameter instead of transa and transb.
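To make the role of the uplo property in gemmt concrete, here is a reference-style sketch that updates only one triangle of C. It assumes column-major storage and no transposition, and its signature is a simplification for illustration, not the bli_?gemmt() prototype.

```c
#include <stddef.h>

typedef enum { SKETCH_LOWER, SKETCH_UPPER } sketch_uplo_t;

// Reference gemmt sketch:
//   C := beta * C + alpha * A * B, restricted to one triangle of the
//   m-by-m matrix C. A is m-by-k and B is k-by-m.
// Elements of C outside the requested triangle are never read or written.
void sketch_dgemmt
     (
       sketch_uplo_t uploc, size_t m, size_t k,
       double alpha, const double* a, size_t lda,
                     const double* b, size_t ldb,
       double beta,        double* c, size_t ldc
     )
{
    for ( size_t j = 0; j < m; ++j )
    {
        // Row range of column j that lies in the stored triangle.
        size_t i_begin = ( uploc == SKETCH_LOWER ) ? j : 0;
        size_t i_end   = ( uploc == SKETCH_LOWER ) ? m : j + 1;

        for ( size_t i = i_begin; i < i_end; ++i )
        {
            double ab = 0.0;
            for ( size_t p = 0; p < k; ++p )
                ab += a[ i + p*lda ] * b[ p + j*ldb ];

            double* cij = &c[ i + j*ldc ];
            *cij = beta * ( *cij ) + alpha * ab;
        }
    }
}
```

A definition identical to gemm would have no uploc argument and would update all of C, which is the discrepancy the fix above addresses.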
commit a0e4fe2340a93521e1b1a835a96d0f26dec8406a
Author: Ilknur <ilknuri607@gmail.com>
Date: Tue Mar 2 02:06:56 2021 +0400
Fixed double free() in level1v example (#482)
Details:
- In examples/tapi/00level1v.c, pointer 'z' was being freed twice and
pointer 'a' was not being freed at all. This commit correctly frees
each pointer exactly once.
commit f5871c7e06a75799251d6b55a8a5fbfa1a92cf95
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Feb 28 17:03:57 2021 -0600
Added complex asm packm kernels for 'haswell' set.
Details:
- Implemented assembly-based packm kernels for single- and double-
precision complex domain (c and z) and housed them in the 'haswell'
kernel set. This means c3xk, c8xk, z3xk, and z4xk are now all
optimized.
- Registered the aforementioned packm kernels in the haswell, zen,
and zen2 subconfigs.
- Minor modifications to the corresponding s and d packm kernels that
were introduced in 426ad67.
- Thanks to AMD, who originally contributed the double-precision real
packm kernels (d6xk and d8xk), upon which these complex kernels are
partially based.
commit 426ad679f55264e381eb57a372632b774320fb85
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Feb 27 18:39:56 2021 -0600
Added assembly packm kernels for 'haswell' set.
Details:
- Implemented assembly-based packm kernels for single- and double-
precision real domain (s and d) and housed them in the 'haswell'
kernel set. This means s6xk, s16xk, d6xk, and d8xk are now all
optimized.
- Registered the aforementioned packm kernels in the haswell, zen,
and zen2 subconfigs.
- Thanks to AMD, who originally contributed the double-precision real
packm kernels (d6xk and d8xk), which I have now tweaked and used to
create comparable single-precision real kernels (s6xk and s16xk).
commit f50c1b7e5886d29efe134e1994d05af9949cd4b6
Merge: 8f39aea1 b3953b93
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Feb 1 11:55:51 2021 -0600
Merge pull request #473 from ajaypanyala/pkgconfig
build: generate pkgconfig file
commit 8f39aea11f80a805b66cff4b4dc5e72727ea461d
Merge: f8db9fb3 2a815d5b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jan 30 17:59:56 2021 -0600
Merge branch 'dev'
commit f8db9fb33b48844d6b47fdef699625bd9197745a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jan 28 08:04:52 2021 -0600
Fixed missing parentheses in README.md Citations.
commit b3953b938eee59f79b4a4162ba583a5cb59fa34e
Author: Ajay Panyala <ajay.panyala@gmail.com>
Date: Tue Jan 12 17:07:04 2021 -0800
drop CFLAGS in the generated pkgconfig file
commit b02d9376bac31c1a1c7916f44c4946277a1425e2
Author: Ajay Panyala <ajay.panyala@gmail.com>
Date: Mon Jan 11 20:50:01 2021 -0800
add datadir
commit d8d8deeb6d8b84adb7ae5fdb88c6dd4f06624a76
Author: Ajay Panyala <ajay.panyala@gmail.com>
Date: Mon Jan 11 17:47:50 2021 -0800
generate pkgconfig file
commit 8c65411c7c8737248a6f054ffa0ce008c95cb515
Merge: 328b4f88 874c3f04
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Jan 11 16:01:45 2021 -0600
Merge pull request #471 from flame/fix-470
Fix kernel-to-config mapping for intel64
commit 874c3f04ece9af4d8fdf0e2713e21a259c117656
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Jan 8 13:56:30 2021 -0600
Update configure
Choose last sub-config in the kernel-to-config map if the config list doesn't contain the name of the kernel set. E.g. for "zen: skx knl haswell" pick "haswell" instead of "skx" which was chosen previously. Fixes #470.
commit 2a815d5b365d934cb351b2f2a8cd1366e997b2e1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 4 18:03:39 2021 -0600
Support trsm pre-inversion in 1m, bb, ref kernels.
Details:
- Expanded support for disabling trsm diagonal pre-inversion to other
microkernel types, including the reference microkernel as well as the
kernel implementations for 1m and the pre-broadcast B (bb) format used
by the power9 subconfig. This builds on the 'haswell' and 'penryn'
kernel support added in 7038bba. Thanks to Bhaskar Nallani for
reminding me, in #461 (post-closure), that 1m support was missing from
that commit.
- Removed cpp branch of ref_kernels/3/bli_trsm_ref.c that contained the
omp simd implementation after making a stripped-down copy in 'old'.
This code has been disabled for some time and it seemed better suited
to rot away out of sight rather than clutter up a file that is already
cluttered by the presence of lower and upper versions.
- Minor comment update to bli_ind_init().
commit c3ed2cbb9f60100fc9beb2a9d75476de9f711dc5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 4 16:16:32 2021 -0600
Enable 1m only if real domain ukr is not reference.
Details:
- Previously, BLIS would automatically enable use of the 1m method
for a given precision if the complex domain microkernel was a
reference kernel. This commit adds an additional constraint so that
1m is only enabled if the corresponding real domain microkernel is
NOT reference. That is, BLIS now forgoes use of 1m if both the real and
complex domain kernels are reference implementations. Note that this
does not prevent 1m from being enabled manually under those
conditions; it only means that 1m will not be enabled automatically
at initialization-time.
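The rule change can be summarized as a predicate. This is only a sketch; the function and its two flag arguments are hypothetical and stand in for whatever queries BLIS performs at initialization.

```c
#include <stdbool.h>

// Decide whether the 1m method should be enabled automatically for one
// precision. Both arguments are hypothetical flags.
bool sketch_auto_enable_1m( bool complex_ukr_is_ref, bool real_ukr_is_ref )
{
    // Old rule:  return complex_ukr_is_ref;
    // New rule: 1m induces complex gemm from the real-domain microkernel,
    // so it is only auto-enabled when that real kernel is not reference.
    return complex_ukr_is_ref && !real_ukr_is_ref;
}
```

The only case whose outcome changes is the one where both kernels are reference implementations, which now returns false.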
commit ed50c947385ba3b0b5d550015f38f7f0a31755c0
Merge: 0cef09aa 328b4f88
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 4 14:31:44 2021 -0600
Merge branch 'master' into dev
commit 328b4f8872b4bca9a53d2de8c6e285f3eb13d196
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Dec 30 17:54:18 2020 -0600
Shared object (dylib) was not built correctly for partial build.
The SO build rule used $? instead of $^. Observed on macOS, not sure if it affected Linux or not.
commit ae6ef66ef824da9bc6348bf9d1b588cd4f2ded9b
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Dec 30 17:34:55 2020 -0600
bli_diag_offset_with_trans had wrong return type. Fixes #468.
commit ebcf197fb86fdd0a864ea928140752bc2462e8c6
Merge: 472f138c 21aa67e1
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Dec 5 22:26:27 2020 -0600
Merge pull request #466 from isuruf/patch-3
fix cc_vendor for crosstool-ng toolchains
commit 21aa67e11cebbc5a6dd7c6353154256294df3c33
Author: Isuru Fernando <isuruf@gmail.com>
Date: Sat Dec 5 21:59:13 2020 -0600
fix cc_vendor for crosstool-ng toolchains
commit 472f138cb927b7259126ebb9c68919cfcc7a4ea3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Dec 5 14:13:52 2020 -0600
Fixed typo in README.md to CodingConventions.md.
commit 0cef09aa92208441a656bf097f197ea8e22b533b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 4 16:40:59 2020 -0600
Consolidated code in level-3 _front() functions.
Details:
- Reduced a code segment that appears in all of the bli_*_front()
functions except for bli_gemm_front(). Previously, the code looked
like this (taken from bli_herk_front()):
    if ( bli_cntx_method( cntx ) == BLIS_NAT )
    {
        bli_obj_set_pack_schema( BLIS_PACKED_ROW_PANELS, &a_local );
        bli_obj_set_pack_schema( BLIS_PACKED_COL_PANELS, &ah_local );
    }
    else // if ( bli_cntx_method( cntx ) != BLIS_NAT )
    {
        pack_t schema_a = bli_cntx_schema_a_block( cntx );
        pack_t schema_b = bli_cntx_schema_b_panel( cntx );
        bli_obj_set_pack_schema( schema_a, &a_local );
        bli_obj_set_pack_schema( schema_b, &ah_local );
    }
This code segment is part of a sort-of-hack that allows us to
communicate the pack schemas into the level-3 thread decorator, which
needs them so that they can be passed into bli_l3_cntl_create_if(),
where the control tree is created. However, the first conditional case
above is unnecessary because the second case is fully generalized.
That is, even in the native case, the context contains correct,
queryable schemas. Thus, these code segments were reduced to something
like:
    pack_t schema_a = bli_cntx_schema_a_block( cntx );
    pack_t schema_b = bli_cntx_schema_b_panel( cntx );
    bli_obj_set_pack_schema( schema_a, &a_local );
    bli_obj_set_pack_schema( schema_b, &ah_local );
There's always a small chance that the seemingly unnecessary code
in the first branch case has some special use that is not apparent to
me, but the testsuite's default input parameters seem to think this
commit will be fine.
commit 7038bbaa05484141195822291cf3ba88cbce4980
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 4 16:08:15 2020 -0600
Optionally disable trsm diagonal pre-inversion.
Details:
- Implemented a configure-time option, --disable-trsm-preinversion, that
optionally disables the pre-inversion of diagonal elements of the
triangular matrix in the trsm operation and instead uses division
instructions within the gemmtrsm microkernels. Pre-inversion is
enabled by default. When it is disabled, performance may suffer
slightly, but numerical robustness should improve for certain
pathological cases involving denormal (subnormal) numbers that would
otherwise result in overflow in the pre-inverted value. Thanks to
Bhaskar Nallani for reporting this issue via #461.
- Added preprocessor macro guards to bli_trsm_cntl.c as well as the
gemmtrsm microkernels for 'haswell' and 'penryn' kernel sets pursuant
to the aforementioned feature.
- Added macros to frame/include/bli_x86_asm_macros.h related to division
instructions.
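The numerical difference between the two approaches can be seen in a small sketch that solves a diagonal system both ways. The function names are hypothetical, and the code is plain C rather than the assembly used in the gemmtrsm microkernels.

```c
#include <stddef.h>

// Solve the diagonal system  diag(d) * x = b  for x, two ways.

// (1) Pre-inversion: invert each diagonal element, then multiply.
void sketch_solve_preinvert( size_t n, const double* d,
                             const double* b, double* x )
{
    for ( size_t i = 0; i < n; ++i )
    {
        // 1.0 / d[i] overflows to infinity when d[i] is a small enough
        // subnormal, even if b[i] / d[i] itself is representable.
        double inv_d = 1.0 / d[ i ];
        x[ i ] = b[ i ] * inv_d;
    }
}

// (2) Direct division: no intermediate reciprocal is formed.
void sketch_solve_divide( size_t n, const double* d,
                          const double* b, double* x )
{
    for ( size_t i = 0; i < n; ++i )
        x[ i ] = b[ i ] / d[ i ];
}
```

With d = 2^-1024 (subnormal) and b = 2^-1023, direct division yields 2.0, whereas the pre-inverted path yields infinity because 2^1024 is not representable in double precision.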
commit 78aee79452cce2691c40f05b3632bdfc122300af
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 2 13:02:36 2020 -0600
Allow amaxv testsuite module to run with dim = 0.
Details:
- Exit early from libblis_test_amaxv_check() when the vector dimension
(length) of x is 0. This allows the module to run when the testsuite
driver passes in a problem size of 0. Thanks to Meghana Vankadari for
alerting us to this issue via #459.
- Note: All other testsuite modules appear to work with problem sizes
of 0, except for the microkernel modules. I chose not to "fix" those
modules because a failure (or segmentation fault, as happens in this
case) is actually meaningful in that it alerts the developer that some
microkernels cannot be used with k = 0. Specifically, the 'haswell'
kernel set contains microkernels that preload elements of B. Those
microkernels would need to be restructured to avoid preloading in
order to support usage when k = 0.
commit 92d2b12a44ee0990c22735472aeaf1c17deb2d9b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 2 13:02:00 2020 -0600
Fixed obscure testsuite gemmt dependency bug.
Details:
- Fixed a bug in the gemmt testsuite module that only manifested when
testing of gemmt is enabled but testing of gemv is disabled. The bug
was due to a copy-paste error dating back to the introduction of gemmt
in 88ad841.
commit b43dae9a5d2f078c9bbe07079031d6c00a68b7de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 1 16:44:38 2020 -0600
Fixed copy-paste bugs in edge-case sup kernels.
Details:
- Fixed bugs in two sup kernels, bli_dgemmsup_rv_haswell_asm_1x6() and
bli_dgemmsup_rd_haswell_asm_1x4(), which involved extraneous assembly
instructions that were left over from when the kernels were first
written. These instructions would cause segmentation faults in some
situations where extra memory was not allocated beyond the end of
the matrix buffers. Thanks to Kiran Varaganti for reporting these
bugs and to Bhaskar Nallani for identifying the cause and solution.
commit 11dfc176a3c422729f453f6c23204cf023e9954d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 1 19:51:27 2020 +0000
Reorganized thread auto-factorization logic.
Details:
- Reorganized logic of bli_thread_partition_2x2() so that the primary
guts were factored out into "fast" and "slow" variants. Then added
logic to the "fast" variant that allows for more optimal thread
factorizations in some situations where there is at least one factor
of 2.
- Changed BLIS_THREAD_RATIO_M from 2 to 1 in bli_kernel_macro_defs.h and
added comments to that file describing BLIS_THREAD_RATIO_? and
BLIS_THREAD_MAX_?R.
- In bli_family_zen.h and bli_family_zen2.h, preprocessed out several
macros not used in vanilla BLIS and removed the unused macro
BLIS_ENABLE_ZEN_BLOCK_SIZES from the former file.
- Disabled AMD's small matrix handling entry points in bli_syrk_front.c
and bli_trsm_front.c. (These branches of small matrix handling have
not been reviewed by vanilla BLIS developers.)
- Added commented-out calls to printf() in bli_rntm.c.
- Whitespace changes to bli_thread.c.
commit 6d3bafacd7aa7ad198762b39490876c172bfbbcb
Author: Devin Matthews <damatthews@smu.edu>
Date: Sat Nov 28 17:17:56 2020 -0600
Update BuildSystem.md
Add git version >= 1.8.5 requirement (see #462).
commit 64856ea5a61b01d585750815788b6a775f729647
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 23 16:54:51 2020 -0600
Auto-reduce (by default) prime numbers of threads.
Details:
- When requesting multithreaded parallelism by specifying the total
number of threads (whether it be via environment variable, globally at
runtime, or locally at runtime), reduce the number of threads actually
used by one if the original value (a) is prime and (b) exceeds a
minimum threshold defined by the macro BLIS_NT_MAX_PRIME, which is set
to 11 by default. If, when specifying the total number of threads (and
not the individual ways of parallelism for each loop), prime numbers
of threads are desired, this feature may be overridden by defining the
BLIS_ENABLE_AUTO_PRIME_NUM_THREADS macro in the bli_family_*.h that
corresponds to the configuration family targeted at configure-time.
(For now, there is no configure option to control this feature.)
Thanks to Jeff Diamond for suggesting this change.
- Defined a new function in bli_thread.c, bli_is_prime(), that returns a
bool indicating whether an integer is prime. This function is
implemented in terms of existing functions in bli_thread.c.
- Updated docs/Multithreading.md to document the above feature, along
with unrelated minor edits.
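A minimal sketch of the reduction rule described above. The threshold name mirrors the macro in the commit message; the function names and the trial-division primality test are assumptions and are not the bli_thread.c implementation.

```c
#include <stdbool.h>

// Threshold named after the macro in the commit message; 11 is the
// default stated there.
#define SKETCH_NT_MAX_PRIME 11

// Simple trial-division primality test (not BLIS's implementation).
bool sketch_is_prime( long n )
{
    if ( n < 2 ) return false;
    for ( long d = 2; d * d <= n; ++d )
        if ( n % d == 0 ) return false;
    return true;
}

// Return the number of threads to actually use for a requested total nt.
long sketch_adjust_num_threads( long nt )
{
    // Reduce by one only if nt is prime AND exceeds the threshold.
    if ( nt > SKETCH_NT_MAX_PRIME && sketch_is_prime( nt ) )
        return nt - 1;
    return nt;
}
```

A prime thread count can only be factored as 1 x p across two loops, whereas p - 1 is even and therefore admits more balanced factorizations; for example, a request for 13 threads becomes 12.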
commit 55933b6ff6b9b8a12041715f42bba06273d84b74
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 20 10:39:32 2020 -0600
Added missing attribution to docs/ReleaseNotes.md.
commit e310f57b4b29fbfee479e0f9fe2040851efdec4f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 19 13:33:37 2020 -0600
CHANGELOG update (0.8.0)
commit 9b387f6d5a010969727ec583c0cdd067a5274ed8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 19 13:33:37 2020 -0600
Version file update (0.8.0)
commit 2928ec750d3a3e1e5d55de5b57ddc04e9d0bd796
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 18 18:31:35 2020 -0600
ReleaseNotes.md update in advance of next version.
Details:
- Updated docs/ReleaseNotes.md in preparation for next version.
commit b9899bedff6854639468daa7a973bb14ca131a74
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 18 16:52:41 2020 -0600
CREDITS file update.
commit 9bb23e6c2a44b77292a72093938ab1ee6e6cc26a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 16 15:55:45 2020 -0600
Added support for systemless build (no pthreads).
Details:
- Added a configure option, --[enable|disable]-system, which determines
whether the modest operating system dependencies in BLIS are included.
The most notable example of this on Linux and BSD/OSX is the use of
POSIX threads to ensure thread safety for when application-level
threads call BLIS. When --disable-system is given, the bli_pthreads
implementation is dummied out entirely, allowing the calling code
within BLIS to remain unchanged. Why would anyone want to build BLIS
like this? The motivating example was submitted via #454 in which a
user wanted to build BLIS for a simulator such as gem5 where thread
safety may not be a concern (and where the operating system is largely
absent anyway). Thanks to Stepan Nassyr for suggesting this feature.
- Another, more minor side effect of the --disable-system option is that
the implementation of bli_clock() unconditionally returns 0.0 instead
of the time elapsed since some fixed point in the past. The reasoning
for this is that if the operating system is truly minimal, the system
function call upon which bli_clock() would normally be implemented
(e.g. clock_gettime()) may not be available.
- Refactored preprocess-guarded code in bli_pthread.c and bli_pthread.h
to remove redundancies.
- Removed old comments and commented #include of "bli_pthread_wrap.h"
from bli_system.h.
- Documented bli_clock() and bli_clock_min_diff() in BLISObjectAPI.md
and BLISTypedAPI.md, with a note that both are non-functional when
BLIS is configured with --disable-system.
commit 88ad84143414644df4c56733b1cf91a36bfacaf8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 14 09:39:48 2020 -0600
Squash-merge 'pr' into 'squash'. (#457)
Merged contributions from AMD's AOCL BLIS (#448).
Details:
- Added support for level-3 operation gemmt, which performs a gemm on
only the lower or upper triangle of a square matrix C. For now, only
the conventional/large code path will be supported (in vanilla BLIS).
This was accomplished by leveraging the existing variant logic for
herk. However, some of the infrastructure to support a gemmtsup is
included in this commit, including
- A bli_gemmtsup() front-end, similar to bli_gemmsup().
- A bli_gemmtsup_ref() reference handler function.
- A bli_gemmtsup_int() variant chooser function (with variant calls
commented out).
- Added support for inducing complex domain gemmt via the 1m method.
- Added gemmt APIs to the BLAS and CBLAS compatibility layers.
- Added gemmt test module to testsuite.
- Added standalone gemmt test driver to 'test' directory.
- Documented gemmt APIs in BLISObjectAPI.md and BLISTypedAPI.md.
- Added a C++ template header (blis.hh) containing a BLAS-inspired
wrapper to a set of polymorphic CBLAS-like function wrappers defined
in another header (cblas.hh). These two headers are installed if
running the 'install' target with INSTALL_HH set to 'yes'. (Also
added a set of unit tests that exercise blis.hh, although they are
disabled for now because they aren't compatible with out-of-tree
builds.) These files now live in the 'vendor' top-level directory.
- Various updates to 'zen' and 'zen2' subconfigurations, particularly
within the context initialization functions.
- Added s and d copyv, setv, and swapv kernels to kernels/zen/1, and
various minor updates to dotv and scalv kernels. Also added various
sup kernels contributed by AMD to kernels/zen/3. However, these
kernels are (for now) not yet used, in part because they caused
AppVeyor clang failures, and also because I have not found time to
review and vet them.
- Output the python found during configure into the definition of PYTHON
in build/config.mk (via build/config.mk.in).
- Added early-return checks (A, B, or C with zero dimension; alpha = 0)
to bli_gemm_front.c.
- Implemented explicit beta = 0 handling for the sgemm ukernel in
bli_gemm_armv7a_int_d4x4.c, which was previously missing. This latent
bug surfaced because the gemmt module verifies its computation using
gemm with its beta parameter set to zero, which, on a cortexa15 system
caused the gemm kernel code to unconditionally multiply the
uninitialized C data by beta. The C matrix likely contained
non-numeric values such as NaN, which then would have resulted in a
false failure.
- Fixed a bug whereby the implementation for bli_herk_determine_kc(),
in bli_l3_blocksize.c, was inadvertently being defined in terms of
helper functions meant for trmm. This bug was probably harmless since
the trmm code should have also done the right thing for herk.
- Used cpp macros to neutralize the various AOCL_DTL_TRACE_ macros in
kernels/zen/3/bli_gemm_small.c since those macros are not used in
vanilla BLIS.
- Added cpp guard to definition of bli_mem_clear() in bli_mem.h to
accommodate C++'s stricter type checking.
- Added cpp guard to test/*.c drivers that facilitate compilation on
Windows systems.
- Various whitespace changes.
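The beta = 0 item above is easiest to see in code. The following sketch shows the output-update step of a microkernel written in plain C; the function name and argument list are hypothetical and unrelated to the armv7a kernel's actual interface.

```c
#include <stddef.h>

// Update one m-by-n microtile (column-major):  C := beta * C + alpha * AB.
void sketch_update_c( size_t m, size_t n,
                      float alpha, const float* ab, size_t ldab,
                      float beta,        float* c,  size_t ldc )
{
    if ( beta == 0.0f )
    {
        // Explicit beta = 0 case: overwrite C without reading it, so
        // uninitialized contents (possibly NaN or Inf) cannot propagate.
        for ( size_t j = 0; j < n; ++j )
            for ( size_t i = 0; i < m; ++i )
                c[ i + j*ldc ] = alpha * ab[ i + j*ldab ];
    }
    else
    {
        // General case: scale the existing C and accumulate.
        for ( size_t j = 0; j < n; ++j )
            for ( size_t i = 0; i < m; ++i )
                c[ i + j*ldc ] = beta * c[ i + j*ldc ]
                               + alpha * ab[ i + j*ldab ];
    }
}
```

Without the first branch, a NaN already present in C would survive the update, since NaN multiplied by zero is still NaN under IEEE 754 arithmetic.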
commit 234b8b0cf48f1ee965bd7999b291fc7add3b9a54
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 12 19:11:16 2020 -0600
Increased dotxaxpyf testsuite thresholds.
Details:
- Increased the test thresholds used by the dotxaxpyf testsuite module
by a factor of five in order to avoid residuals that unnecessarily
fall in the MARGINAL range. This commit should fix #455. Thanks to
@nagsingh for reporting this issue.
commit ed612dd82c50063cfd23576a6b2465213d31b14b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 7 13:09:42 2020 -0600
Updated README.md with sgemmsup blurb.
Details:
- Added an entry to the "What's New" section of the README.md to
announce the availability of sgemmsup.
commit e14424f55b15d67e8d18384aea45a11b9b772e02
Merge: 0cfe1aac eccdd75a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 7 13:02:50 2020 -0600
Merge branch 'dev'
commit 0cfe1aac222008a78dff3ee03ef5183413936706
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 30 17:10:36 2020 -0500
Relocated operation index to ToC in API docs.
Details:
- Moved the "Operation index" section of both the BLISObjectAPI.md and
BLISTypedAPI.md docs to appear immediately after the table of contents
of each document. This allows the reader to quickly jump to the
documentation for any operation without having to scroll through much
of the document (when rendered via a web browser).
- Fixed a mistake in the BLISObjectAPI.md for the setd operation, which
does *not* observe the diag property of its matrix argument. Thanks to
Jeff Diamond for reporting this.
commit 2a0682f8e5998be536da313525292f0da6193147
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Oct 18 18:04:03 2020 -0500
Implemented runtime subconfig selection (#451).
Details:
- Implemented support for the user manually overriding the automatic
subconfiguration selection that happens at runtime. This override
can be requested by setting the BLIS_ARCH_TYPE environment variable.
The variable must be set to the arch_t id (as enumerated in
bli_type_defs.h) corresponding to the desired subconfiguration. If a
value outside this enumerated range is given, BLIS will abort with an
error message. If the value is in the valid range but corresponds to a
subconfiguration that was not activated at configure-time/compile-time,
BLIS will abort with a (different) error message. Thanks to decandia50
for suggesting this feature via issue #451.
- Defined a new function bli_gks_lookup_id to return the address of an
internal data structure within the gks. If this address is NULL, then
it indicates that the subconfig corresponding to the arch_t id passed
into the function was not compiled into BLIS. This function is used
in the second of the two abort scenarios described above.
- Defined the enumerated error code BLIS_UNINITIALIZED_GKS_CNTX, which
is returned for the latter of the two abort scenarios mentioned above,
along with a corresponding error message and a function to perform
the error check.
- Added cpp macro branching to bli_env.c to support compilation of the
auto-detect.x executable during configure-time. This cpp branch is
similar to the cpp code already found in bli_arch.c and bli_cpuid.c.
- Cleaned up the auto_detect() function to facilitate easier maintenance
going forward. Also added a convenient debug switch that outputs the
compilation command for the auto-detect.x executable and exits.
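The two failure scenarios described above can be sketched as a small validation routine. Only the environment variable name BLIS_ARCH_TYPE comes from the commit; the status codes, function names, and the compiled_in table are hypothetical, and this sketch returns a status where BLIS would abort with an error message. Treating a malformed string as out of range is also an assumption.

```c
#include <errno.h>
#include <stdlib.h>

typedef enum
{
    SKETCH_ARCH_OK,
    SKETCH_ARCH_UNSET,        // variable not set: keep automatic selection
    SKETCH_ARCH_OUT_OF_RANGE, // not a valid arch id
    SKETCH_ARCH_NOT_COMPILED  // valid id, but subconfig was not built in
} sketch_arch_status_t;

// value:       string read from the environment (may be NULL).
// compiled_in: compiled_in[i] is nonzero if subconfig i was built.
sketch_arch_status_t sketch_parse_arch_type
     (
       const char* value, const int* compiled_in,
       long num_archs, long* id_out
     )
{
    if ( value == NULL ) return SKETCH_ARCH_UNSET;

    char* end = NULL;
    errno = 0;
    long id = strtol( value, &end, 10 );

    // Reject conversion errors, empty or trailing garbage, and bad ids.
    if ( errno != 0 || end == value || *end != '\0' ||
         id < 0 || id >= num_archs )
        return SKETCH_ARCH_OUT_OF_RANGE;

    if ( !compiled_in[ id ] ) return SKETCH_ARCH_NOT_COMPILED;

    *id_out = id;
    return SKETCH_ARCH_OK;
}

// Convenience wrapper that reads the variable named in the commit.
sketch_arch_status_t sketch_query_arch_type
     (
       const int* compiled_in, long num_archs, long* id_out
     )
{
    return sketch_parse_arch_type( getenv( "BLIS_ARCH_TYPE" ),
                                   compiled_in, num_archs, id_out );
}
```

Separating the parsing from the getenv() call keeps the range and availability checks testable without modifying the process environment.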
commit eccdd75a2d8a0c46e91e94036179c49aa5fa601c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 9 15:44:16 2020 -0500
Whitespace tweak in docs/PerformanceSmall.md.
commit 7677e9ba60ac27496e3421c2acc7c239e3f860e9
Merge: addcd46b a0849d39
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 9 15:41:25 2020 -0500
Merge branch 'dev' of github.com:flame/blis into dev
commit addcd46b0559d401aa7d33d4c7e6f63f5313a8e0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 9 15:41:09 2020 -0500
Added Epyc 7742 Zen2 ("Rome") sup perf results.
Details:
- Added single-threaded and multithreaded sup performance results to
docs/PerformanceSmall.md for both sgemm and dgemm. These results were
gathered on an Epyc 7742 "Rome" server featuring AMD's Zen2
microarchitecture. Special thanks to Jeff Diamond for facilitating
access to the system via the Oracle Cloud.
- Updates to octave scripts in test/sup/octave for use with Octave 5.2
and for use with subplot_tight().
- Minor updates to octave scripts in test/3/octave.
- Renamed files containing the previous Zen performance results for
consistency with the new results.
- Decreased line thickness slightly in large/conventional Zen2 graphs.
I'm done tweaking those this time. Really.
- Added missing line regarding eigen header installation for each
microarchitecture section.
commit a0849d390d04067b82af937cda8191b049b98915
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 9 20:22:17 2020 +0000
Register l3 sup kernels in zen2 subconfig.
Details:
- Registered full suite of sgemm and dgemm sup millikernels, blocksizes,
and crossover thresholds in bli_cntx_init_zen2.c.
- Minor updates to test/sup/runme.sh for running on Zen2 Epyc 7742
system.
commit d98368c32d5fbfaab8966ee331d9bcb5c4fe7a59
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 8 19:05:51 2020 -0500
Another tweak to line thickness of Zen2 graphs.
commit 1855dfbdaafa37892b36c97fd317fd5d8da76676
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 8 19:01:00 2020 -0500
Tweaked line thickness in Zen2 graphs once more.
Details:
- Decreased (relative to previous commit) line thickness in recent Zen2
graphs.
commit 0991611e7ed82889c53a5c3f1ef1d49552c50d61
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 8 18:54:49 2020 -0500
Increased line thickness in recent Zen2 graphs.
Details:
- Increased the width of the lines in the graphs introduced in 74ec6b8.
commit 8273cbacd7799e9af59e5320d66055f2f5d9cb31
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 7 14:51:33 2020 -0500
README.md, docs/FAQ.md updates.
Details:
- Added a frequently asked question to docs/FAQ.md regarding the
difference between upstream (vanilla) BLIS and AMD BLIS.
- Updated the name of ICES in the README.md to reflect the Oden
rebranding.
commit a178a822ad3d5021489a0e61f909d8550ae12a8f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 30 16:00:52 2020 -0500
Added Zen2 links to docs/Performance.md Contents.
commit 74ec6b8f457cabe37d2382aaab35ba04fc737948
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 30 15:54:18 2020 -0500
Added Epyc 7742 Zen2 ("Rome") performance results.
Details:
- Added single-threaded and multithreaded performance results to
docs/Performance.md. These results were gathered on an Epyc 7742
"Rome" server with AMD's Zen2 microarchitecture. Special thanks
to Jeff Diamond for facilitating access to the system via the
Oracle Cloud.
- Renamed files containing the previous Zen performance results for
consistency with the new results.
commit bc4a213a2c3dcf8bbfcbb3a1ef3e9fc9e3226c34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 30 15:28:20 2020 -0500
Updated matlab (now octave) plot code in test/3.
Details:
- Renamed test/3/matlab to test/3/octave.
- Within test/3, updated and tuned plot_l3_perf.m and plot_panel_4x5.m
files for use with octave (which is free and doesn't crash on me
mid-way through my use of subplot).
- Updated runthese.m scratchpad for zen2 invocations.
- Added Nikolay S.'s subplot_tight() function, along with its license.
commit c77ddc418187e1884fa6bcfe570eee295b9cb8bc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 30 20:15:43 2020 +0000
Added optional numactl usage to test/3/runme.sh.
commit 2d8ec164e7ae4f0c461c27309dc1f5d1966eb003
Author: Nicholai Tukanov <nicholai@utexas.edu>
Date: Tue Sep 29 16:52:18 2020 -0500
Add POWER10 support to BLIS (#450)
commit 4fd8d9fec2052257bf2a5c6e0d48ae619ff6c3e4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 28 23:39:05 2020 +0000
Tweaked zen2 subconfig's MC cache blocksizes.
Details:
- Updated the MC cache blocksizes registered by the 'zen2' subconfig.
- Minor updates to test/3/Makefile and test/3/runme.sh.
commit 5efcdeffd58af621476d179afc0c19c0f912baa8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 25 14:25:24 2020 -0500
More minor README.md updates.
commit 9e940f8aad6f065ea1689e791b9a4e1fb7900c40
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 25 13:53:35 2020 -0500
Added 1m SISC bibtex to README.md.
Details:
- Added final citation info to 1m bibtex in README.md file.
- Updated draft 1m paper link.
- Changed some http to https.
commit e293cae2d1b9067261f613f25eaa0e871356b317
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 15 16:09:11 2020 -0500
Implemented sgemmsup assembly kernels.
Details:
- Created a set of single-precision real millikernels and microkernels
comparable to the dgemmsup kernels that already exist within BLIS.
- Added prototypes for all kernels within bli_kernels_haswell.h.
- Registered entry-point millikernels in bli_cntx_init_haswell.c and
bli_cntx_init_zen.c.
- Added sgemmsup support to the Makefile, runme.sh script, and source
file in test/sup. This included edits that allow for separate "small"
dimensions for single- and double-precision as well as for single-
vs. multithreaded execution.
commit 2765c6f37c11cb7f71cd4b81c64cea6130636c68
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Sep 12 17:48:15 2020 -0500
Type saga continues; fixed sgemm ukernel signature.
Details:
- Changed double* pointers in sgemm function signature to float*. At
this point I've lost track of whether this was my fault or another
dormant bug like the one described in ece9f6a, but at this point I
no longer care. It's one of those days (aka I didn't ask for this).
commit 0779559509e0a1af077530d09ed151dac54f32ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Sep 12 17:37:21 2020 -0500
Fixed missing restrict in knl sgemm prototype.
Details:
- Added a missing 'restrict' qualifier in the sgemm ukernel prototype
for knl. (Not sure how that code was ever compiling before now.)
commit ece9f6a3ef1b26b53ecf968cd069df7a85b139fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Sep 12 17:22:42 2020 -0500
Fixed dormant type bugs in bli_kernels_knl.h.
Details:
- Fixed dormant type mismatches in the use of the prototype-generating
macros in bli_kernels_knl.h. Specifically, some float prototypes
were incorrectly using double as their ctype. This didn't actually
matter until the type changes in 645d771, as previously those types
were not used since packm was prototyped with void* pointers.
commit 8ebb3b60e1c4c045ddb48e02de6e246cecde24a4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Sep 12 17:00:47 2020 -0500
Fixed accidental breakage in 645d771.
Details:
- In trying to clean up kappa_cast variables in the reference packm
kernels, which I initially believed to be redundant given the other
void* -> ctype* changes in 645d771, I accidentally ended up violating
restrict semantics for 1e/1r packing and possibly other packm kernels.
(Normally, my pre-commit testsuite run would have caught this, but I
was unknowingly using an edited input.operations file in which I'd
disabled most tests as part of unrelated work.) This commit reverts
the kappa_cast changes in 645d771.
commit 645d771a14ae89aa7131d6f8f4f4a8090329d05e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Sep 12 15:31:56 2020 -0500
Minor packm kernel type cleanup (void* -> ctype*).
Details:
- Changed all void* function arguments in reference packm kernels to
those of the native type (ctype*). These pointers no longer need to
be void* and are better represented by their native types anyway.
(See below for details.) Updated knl packm kernels accordingly.
- In the definition of the PACKM_KER_PROT prototype macro template in
frame/1m/bli_l1m_ker_prot.h, changed the pointer types for kappa, a,
and p from void* to ctype*. They were originally void* because these
function signatures had to share the same type so they could all be
stored in a single array of that shared type, from which they were
queried and called by packm_cxk(). This is no longer how the function
pointers are stored, and so it no longer makes sense to force the
caller of packm kernels to use void*, only so that the implementor
of the packm kernels can typecast back to the native datatype within
the kernel definition. This change has no effect internally within
BLIS because currently all packm kernels are called after querying
the function addresses from the context and then typecasting to the
appropriate function pointer type, which is based upon type-specific
function pointers like float* and double*.
- Removed a comment in frame/1m/bli_l1m_ft_ker.h that was outdated and
misleading due to changes to the handling of packm kernels since
moving them into the context.
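A minimal sketch of the void* -> ctype* idea described in 645d771, assuming a much-simplified kernel shape. The names generic_fp, packm_s_ft, packm_s_sketch, and pack_via_query are hypothetical and are not BLIS identifiers; the real packm kernel signature takes more arguments. The point is only that a kernel address can be stored type-erased, cast back to a typed function pointer at the call site, and then called with native-typed pointers.

```c
#include <stddef.h>

/* Hypothetical type-erased function pointer, standing in for whatever
   a context-like table would store. */
typedef void (*generic_fp)(void);

/* Hypothetical typed kernel signature: kappa, a, and p use the native
   type (float here) instead of void*. */
typedef void (*packm_s_ft)(size_t n, const float *kappa,
                           const float *a, float *p);

/* Scale-and-copy stand-in for a packm kernel. Because the arguments are
   already float*, the body needs no casts from void*. */
static void packm_s_sketch(size_t n, const float *kappa,
                           const float *a, float *p)
{
    for (size_t i = 0; i < n; ++i)
        p[i] = (*kappa) * a[i];
}

/* Take the stored (type-erased) address, cast it to the typed function
   pointer, and call it with native-typed arguments. */
static void pack_via_query(generic_fp stored, size_t n, const float *kappa,
                           const float *a, float *p)
{
    packm_s_ft ker = (packm_s_ft)stored;
    ker(n, kappa, a, p);
}
```

A caller would pass (generic_fp)packm_s_sketch as the stored address; the cast back to packm_s_ft restores the original type before the call.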
commit 54bf6c35542a297e25bc8efec6067a6df80536f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 10 15:42:01 2020 -0500
Minor README.md update.
Details:
- Added a new entry to the "What people are saying about BLIS" section.
commit e50b4d40462714ae33df284655a2faf7fa35f37c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 9 14:12:53 2020 -0500
Minor update to README.md (SIAM Best Paper Prize).
commit a8efb72074691e2610372108becd88b4b392299e
Merge: b0c4da17 97e87f2c
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Sep 7 16:18:19 2020 -0500
Merge pull request #434 from flame/intel-zdot
Add an option to change the complex return type.
commit 97e87f2c9f3878a05e1b7c6ec237ee88d9a72a42
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 7 15:56:42 2020 -0500
Whitespace/comment updates to #434 PR.
commit b0c4da1732b6c6a9ff66f70c36e4722e0f9645ae
Merge: 810e90ee b1b5870d
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Sep 7 15:47:54 2020 -0500
Merge pull request #436 from flame/s390x
Add checks so that s390x is detected as 64-bit.
commit 810e90ee806510c57504f0cf8eeaf608d38bd9dd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 1 16:11:40 2020 -0500
Minor README.md update.
Details:
- Added HPE to list of funders.
- Changed http to https in funders' website links.
commit 7d411282196e036991c26e52cb5e5f85769c8059
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Aug 13 17:50:58 2020 -0500
Use -O2 for all framework code. (#435)
It seems that -O3 might be causing intermittent problems with the f2c'ed packed and banded code. -O3 is retained for kernel code. Fixes #341 and fixes #342.
commit 9c5b485d356367b0a1288761cd623f52036e7344
Author: Dave Love <dave.love@manchester.ac.uk>
Date: Fri Aug 7 20:11:18 2020 +0000
Don't override -mcpu with -march on ARM (#353)
* Use -mcpu for ARM
See the GCC doc about -march, -mtune, and -mcpu and maybe
https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu
* Fix typo in flags
* Fix typo in cortexa9 flags
* Modify cortexa53 compilation flags to fix failing BLAS check (#341)
commit c253d14a72a746b670b3ffbb6e81bcafc73d1133
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Aug 7 09:39:04 2020 -0500
Also handle Intel-style complex return in CBLAS interface.
commit 5d653a11a0cc71305d0995507b1733995856f475
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Aug 6 17:58:26 2020 -0500
Update Multithreading.md
Addresses the issue raised in #426.
commit b1b5870dd3f9b1c78cf5f58a53514d73f001fc4c
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Aug 6 17:34:20 2020 -0500
Add checks so that s390x is detected as 64-bit.
commit 882dcb11bfc9ea50aa2f9044621833efd90d42be
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 6 17:28:14 2020 -0500
Mention example code at top of documentation docs.
Details:
- Steer the reader towards the example code section of each
documentation doc (object and typed).
- Trivial update to examples/oapi/README, examples/tapi/README.
commit f4894512e5bf56ff83701c07dd02972e300741a5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 6 17:20:00 2020 -0500
Very minor updates to previous commit.
commit adedb893ae8dfacd1dc54035979e15c44d589dbb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 6 17:14:01 2020 -0500
Documented mutator functions in BLISObjectAPI.md.
Details:
- Added documentation for commonly-used object mutator functions in
BLISObjectAPI.md. Previously, only accessor functions were documented.
Thanks to Jeff Diamond for pointing out this omission.
- Explicitly set the 'diag' property of objects in oapi example modules
(08level2.c and 09level3.c).
commit 5b5278ff494888509543a79c09ea82089f6c95d9
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Aug 6 14:19:37 2020 -0500
Use #ifdef instead of #if as macro may be undefined.
commit 7fdc0fc893d0c6727b725ea842053b65be2c20ba
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Aug 6 14:03:55 2020 -0500
Add an option to change the complex return type.
ifort apparently does not return complex numbers in registers as in C/C++ (or gfortran), but instead creates a "hidden" first parameter for the return value. The option --complex-return=gnu|intel has been added, as well as a guess based on a provided FC if not specified (otherwise default to gnu). This option affects the signatures of cdotc, cdotu, zdotc, and zdotu, and a single library cannot be used with both GNU and Intel Fortran compilers. Fixes #433.
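A rough illustration of the two calling conventions mentioned above, written in C. The function names zdotu_gnu_sketch and zdotu_intel_sketch are hypothetical, and the argument lists are simplified (the real BLAS routines also take strides and pass scalars by reference). The only difference shown is where the complex result goes: the return value versus a leading pointer parameter.

```c
#include <complex.h>
#include <stddef.h>

/* "gnu"-style convention: the complex result is the return value. */
static double complex zdotu_gnu_sketch(size_t n, const double complex *x,
                                       const double complex *y)
{
    double complex rho = 0.0;
    for (size_t i = 0; i < n; ++i)
        rho += x[i] * y[i];   /* unconjugated dot product */
    return rho;
}

/* "intel"-style convention: the result is written through a leading
   pointer parameter and the function itself returns nothing. */
static void zdotu_intel_sketch(double complex *rho, size_t n,
                               const double complex *x,
                               const double complex *y)
{
    *rho = zdotu_gnu_sketch(n, x, y);
}
```

Since the two signatures are incompatible at the binary level, a caller built for one convention cannot safely call a routine built for the other, which is consistent with the note that a single library cannot serve both compilers.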
commit 6e522e5823b762d4be09b6acdca30faafba56758
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 30 19:31:37 2020 -0500
Mention disabling of sup in docs/Sandboxes.md.
Details:
- Added language to remind the reader to disable sup if the intended
behavior is for the sandbox implementation to handle all problem
sizes, even the smaller ones that would normally be handled by the
sup code path.
commit 00e14cb6d849e963a2e1ac35e7dbbe186af00a58
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 29 14:24:34 2020 -0500
Replaced use of bool_t type with C99 bool.
Details:
- Textually replaced nearly all non-comment instances of bool_t with the
C99 bool type. A few remaining instances, such as those in the files
bli_herk_x_ker_var2.c, bli_trmm_xx_ker_var2.c, and
bli_trsm_xx_ker_var2.c, were promoted to dim_t since they were being
used not for boolean purposes but to index into an array.
- This commit constitutes the third phase of a transition toward using
C99's bool instead of bool_t, which was raised in issue #420. The first
phase, which cleaned up various typecasts in preparation for using
bool as the basis for bool_t (instead of gint_t), was implemented by
commit a69a4d7. The second phase, which redefined the bool_t typedef
in terms of bool (from gint_t), was implemented by commit 2c554c2.
commit 2c554c2fce885f965a425e727a0314d3ba66c06d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 24 15:57:19 2020 -0500
Redefined bool_t typedef in terms of C99 bool.
Details:
- Changed the typedef that defines bool_t from:
typedef gint_t bool_t;
where gint_t is a signed integer that forms the basis of most other
integers in BLIS, to:
typedef bool bool_t;
- Changed BLIS's TRUE and FALSE macro definitions from being in terms of
integer literals:
#define TRUE 1
#define FALSE 0
to being in terms of C99 boolean constants:
#define TRUE true
#define FALSE false
which are provided by stdbool.h.
- This commit constitutes the second phase of a transition toward using
C99's bool instead of bool_t, which will address issue #420. The first
phase, which cleaned up various typecasts in preparation for using
bool as the basis for bool_t (instead of gint_t), was implemented by
commit a69a4d7.
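A small sketch of one observable difference between the two typedefs, assuming gint_t is a 64-bit signed integer (gint_sketch_t below is a hypothetical stand-in, as are old_bool_t and new_bool_t). Converting a non-zero value to an integer-based boolean keeps the value, whereas converting to C99 bool normalizes it to 1.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t       gint_sketch_t;  /* stand-in for gint_t (assumed 64-bit) */
typedef gint_sketch_t old_bool_t;     /* integer-based boolean, as before */
typedef bool          new_bool_t;     /* C99-based boolean, as after */

/* With an integer-based boolean, a non-zero value such as 4 stays 4. */
static int64_t as_old_bool(int64_t v) { return (old_bool_t)v; }

/* With C99 bool, any non-zero value converts to exactly 1 (true). */
static int64_t as_new_bool(int64_t v) { return (new_bool_t)v; }
```

One consequence: a comparison such as as_old_bool(flags & 0x4) == 1 can be false even when the bit is set, while the bool-based version compares equal to true.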
commit e01dd125581cec87f61e15590922de0dc938ec42
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 24 15:41:46 2020 -0500
Fail-safe updates to Makefiles in 'test' dir.
Details:
- Updated Makefiles in test, test/3, and test/sup so that running any of
the usual targets without having first built BLIS results in a helpful
error message. For example, if BLIS is not yet configured, make will
output:
Makefile:327: *** Cannot proceed: config.mk not detected! Run
configure first. Stop.
Similarly, if BLIS is configured but not yet built, make will output:
Makefile:340: *** Cannot proceed: BLIS library not yet built! Run
make first. Stop.
In previous commits, these actions would result in a rather cryptic
make error such as:
make: *** No rule to make target 'test_sgemm_2400_asm_blis_st.x',
needed by 'blis-nat-st'. Stop.
commit b4f47f7540062da3463e2cb91083c12fdda0d30a
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Jul 24 13:56:13 2020 -0500
Add BLIS_EXPORT_BLIS to bli_abort. (#429)
Fixes #428.
commit a69a4d7e2f4607c919db30b14535234ce169c789
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 22 16:13:09 2020 -0500
Cleaned up bool_t usage and various typecasts.
Details:
- Fixed various typecasts in
frame/base/bli_cntx.h
frame/base/bli_mbool.h
frame/base/bli_rntm.h
frame/include/bli_misc_macro_defs.h
frame/include/bli_obj_macro_defs.h
frame/include/bli_param_macro_defs.h
that were missing or being done improperly/incompletely. For example,
many return values were being typecast as
(bool_t)x && y
rather than
(bool_t)(x && y)
Thankfully, none of these deficiencies had manifested as actual bugs
at the time of this commit.
- Changed the return type of bli_env_get_var() from dim_t to gint_t.
This reflects the fact that bli_env_get_var() needs to be able to
return a signed integer, and even though dim_t is currently defined
as a signed integer, it does not intuitively appear to necessarily be
signed by inspection (i.e., an integer named "dim_t" for matrix
"dimension"). Also, updated use of bli_env_get_var() within
bli_pack.c to reflect the changed return type.
- Redefined type of thrcomm_t.barrier_sense field from bool_t to gint_t
and added comments to the bli_thrcomm_*.h files that will explain a
planned replacement of bool_t with C99's bool type.
- Note: These changes are being made to facilitate the substitution of
'bool' for 'bool_t', which will eliminate the namespace conflict with
arm_sve.h as reported in issue #420. This commit implements the first
phase of that transition. Thanks to RuQing Xu for reporting this
issue.
- CREDITS file update.
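To illustrate the typecast issue described in a69a4d7: a cast binds more tightly than &&, so (bool_t)x && y casts only x. The sketch below deliberately uses a narrow, hypothetical type (narrow_bool_t) so that the difference becomes visible in the value; with the wide integer type BLIS used for bool_t, integer operands that fit in that type would give the same value either way, which is consistent with the note that no actual bugs had manifested.

```c
#include <stdint.h>

/* Hypothetical narrow type, chosen only to make the precedence visible. */
typedef uint8_t narrow_bool_t;

/* The cast applies to x alone; the && is evaluated afterwards. */
static int cast_binds_to_x(int x, int y)
{
    return (narrow_bool_t)x && y;
}

/* The cast applies to the result of the whole logical expression. */
static int cast_whole_expr(int x, int y)
{
    return (narrow_bool_t)(x && y);
}
```

With x = 256 and y = 1, the first form truncates 256 to 0 before the && and yields 0, while the second form evaluates 256 && 1 first and yields 1.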
commit a6437a5c11d364c6c88af527294d29734d7cc7d6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 20 19:21:07 2020 -0500
Replaced broken ref99 sandbox w/ simpler version.
Details:
- The 'ref99' sandbox was broken by multiple refactorings and internal
API changes over the last two years. Rather than try to fix it, I've
replaced it with a much simpler version based on var2 of gemmsup.
Why not fix the previous implementation? It occurred to me that the
old implementation was trying to be a lightly simplified duplication
of what exists in the framework. Duplication aside, this sandbox
would have worked fine if it had been completely independent of the
framework code. The problem was that it was only partially
independent, with many function calls calling a function in BLIS
rather than a duplicated/simplified version within the sandbox. (And
the reason I didn't make it fully independent to begin with was that
it seemed unnecessarily duplicative at the time.) Maintaining two
versions of the same implementation is problematic for obvious
reasons, especially when it wasn't even done properly to begin with.
This explains the reimplementation in this commit. The only catch is
that the newer implementation is single-threaded only and does not
perform any packing on either input matrix (A or B). Basically, it's
only meant to be a simple placeholder that shows how you could plug
in your own implementation. Thanks to Francisco Igual for reporting
this brokenness.
- Updated the three reference gemmsup kernels (defined in
ref_kernels/3/bli_gemmsup_ref.c) so that they properly handle
conjugation of conja and/or conjb. The general storage kernel, which
is currently identical to the column-storage kernel, is used in the
new ref99 sandbox to provide basic support for all datatypes
(including scomplex and dcomplex).
- Minor updates to docs/Sandboxes.md, including adding the threading
and packing limitations to the Caveats section.
- Fixed a comment typo in bli_l3_sup_var1n2m.c (upon which the new
sandbox implementation is based).
commit bca040be9da542dd9c75d91890fa7731841d733d
Merge: 2605eb4d 171ecc1d
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Jul 20 09:27:30 2020 -0500
Merge pull request #425 from gmargari/patch-1
Update Multithreading.md
commit 171ecc1dc6f055ea39da30e508f711b49a734359
Author: Giorgos Margaritis <gmargari@protonmail.com>
Date: Mon Jul 20 12:24:06 2020 +0300
Update Multithreading.md
commit 2605eb4d99d3813c37a624c011aa2459324a6d89
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 15 15:25:19 2020 -0500
Added missing rv_d?x6 edge cases to sup kernel.
Details:
- Added support to bli_gemmsup_rv_haswell_asm_d6x8n.c for handling
various n = 6 edge cases with a single sup kernel call. Previously,
only n = {4,2,1} were handled explicitly as single kernel calls;
that is, cases where n = 6 were previously being executed via two
kernel calls (n = 4 and n = 2).
- Added commented debug line to testsuite's test_libblis.c.
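A minimal sketch of the call-count effect described in 2605eb4, assuming the edge case is covered greedily by the widest available kernel first. The function count_edge_calls and the width tables are hypothetical; the actual dispatch lives in the assembly kernel file and is not reproduced here.

```c
#include <stddef.h>

/* Greedily cover n_left columns using the given kernel widths, which are
   assumed to be non-zero, sorted in descending order, and to end with 1.
   Returns the number of kernel calls needed. */
static size_t count_edge_calls(size_t n_left, const size_t *widths,
                               size_t n_widths)
{
    size_t calls = 0;
    for (size_t i = 0; i < n_widths; ++i) {
        while (n_left >= widths[i]) {
            n_left -= widths[i];
            ++calls;
        }
    }
    return calls;
}
```

With widths {4, 2, 1}, an edge of n = 6 takes two calls (4 + 2); with widths {6, 4, 2, 1}, the same edge takes one.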
commit 72f6ed0637dfcb021de04ac7d214d5c87e55d799
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 3 17:55:54 2020 -0500
Declare/define static functions via BLIS_INLINE.
Details:
- Updated all static function definitions to use the cpp macro
BLIS_INLINE instead of the static keyword. This allows blis.h to
use a different keyword (inline) to define these functions when
compiling with C++, which might otherwise trigger "defined but
not used" warning messages. Thanks to Giorgos Margaritis for
reporting this issue and Devin Matthews for suggesting the fix.
- Updated the following files, which are used by configure's
hardware auto-detection facility, to unconditionally #define
BLIS_INLINE to the static keyword (since we know BLIS will be
compiled with C, not C++):
build/detect/config/config_detect.c
frame/base/bli_arch.c
frame/base/bli_cpuid.c
- CREDITS file update.
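A sketch of the kind of macro described in 72f6ed0, assuming a simple language check. The macro is named BLIS_INLINE_SKETCH here to make clear that this is an illustration and not a copy of the definition in blis.h; sketch_min is likewise hypothetical.

```c
/* Pick the keyword based on the language of the translation unit:
   'inline' when compiled as C++, 'static' when compiled as C. */
#ifdef __cplusplus
  #define BLIS_INLINE_SKETCH inline
#else
  #define BLIS_INLINE_SKETCH static
#endif

/* A header-style helper function defined via the macro. */
BLIS_INLINE_SKETCH int sketch_min(int a, int b)
{
    return (a < b) ? a : b;
}
```

Files that are known to be compiled only as C, such as the hardware detection sources listed above, could define the macro unconditionally to static instead.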
commit 5fc701ac5f94c6300febbb2f24e731aa34f0f34a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 1 15:48:58 2020 -0500
Added -fomit-frame-pointer option to CKOPTFLAGS.
Details:
- Added the -fomit-frame-pointer compiler option to the CKOPTFLAGS
variable in the following make_defs.mk files:
config/haswell/make_defs.mk
config/skx/make_defs.mk
as well as comments that mention why the compiler option is needed.
This option is needed to prevent the compiler from using the rbp
frame register (in the very early portion of kernel code, typically
where k_iter and k_left are defined and computed), which, as of
1c719c9, is used explicitly by the gemmsup millikernels. Thanks to
Devin Matthews for identifying this missing option and to Jeff
Diamond for reporting the original bug in #417.
- The file
config/zen/amd_config.mk
which feeds into the make_defs.mk for both zen and zen2 subconfigs,
was also touched, but only to add a commented-out compiler option
(and the aforementioned explanatory comment) since that file already
uses -fomit-frame-pointer in COPTFLAGS, which forms the basis of
CKOPTFLAGS.
commit 6af59b705782dada47e45df6634b479fe781d4fe
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 1 14:54:23 2020 -0500
Fixed disabled edge case optimization in gemmsup.
Details:
- Fixed an inadvertently disabled edge case optimization in the two
gemmsup variants in bli_l3_sup_var1n2m.c. Background: These edge case
optimizations allow the last millikernel operation in the jr loop to
be executed with an inflated register blocksize if it is the last
(or only) iteration. For example, if mr=6 and nr=8 and the gemmsup
problem is m=8, n=100, k=100. (In this case, the panel-block variant
(var1n) is executed, which places the jr loop in the m dimension.)
In principle, this problem could be executed as two millikernels: one
with dimensions 6x100x100, and one as 2x100x100. However, with the
support for inflated blocksizes in the kernel, the entire 8x100x100
problem can be passed to the millikernel function, which will then
execute it more favorably as two 4x100x100 millikernel sub-calls.
Now, this optimization is disabled under certain circumstances, such
as when multithreading. Previously, the is_mt predicate was being set
incorrectly such that it was non-zero even when running
single-threaded.
- Upon fixing the is_mt issue above, another bit of code needed to be
moved so that the result of the optimization could have an impact on
the assignment of loop bounds ranges to threads.
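A rough sketch of the edge-case merging described in 6af59b7, under stated assumptions. The names jr_plan_t and plan_jr_loop are hypothetical, as is the mr_max parameter (the largest inflated blocksize the millikernel is assumed to accept). The sketch models only the loop-level decision; how the millikernel subdivides the merged call internally (e.g. 8 rows as two 4-row sub-calls) is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a jr loop: number of millikernel calls and the
   row count passed to the final call. */
typedef struct {
    size_t n_calls;
    size_t last_m;
} jr_plan_t;

/* Plan the jr loop over m rows with register blocksize mr. When not
   multithreading, a leftover edge is folded into the last full iteration
   if the combined size does not exceed mr_max. Assumes mr > 0. */
static jr_plan_t plan_jr_loop(size_t m, size_t mr, size_t mr_max, bool is_mt)
{
    jr_plan_t plan;
    size_t n_full = m / mr;
    size_t m_left = m % mr;

    if (m_left == 0) {
        /* No edge case: every call handles exactly mr rows. */
        plan.n_calls = n_full;
        plan.last_m = (n_full > 0) ? mr : 0;
    } else if (!is_mt && n_full > 0 && mr + m_left <= mr_max) {
        /* Merge the edge into the last full iteration. */
        plan.n_calls = n_full;
        plan.last_m = mr + m_left;
    } else {
        /* Handle the edge as its own, smaller call. */
        plan.n_calls = n_full + 1;
        plan.last_m = m_left;
    }
    return plan;
}
```

For the example in the text (m = 8, mr = 6) and an assumed mr_max of 9, the single-threaded plan is one call with 8 rows, while the multithreaded plan is two calls (6 rows, then 2). If is_mt were wrongly reported as true in single-threaded runs, the merged path would never be taken, which matches the symptom described.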
commit b37634540fab0f9b8d4751b8356ee2e17c9e3b00
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 25 16:05:12 2020 -0500
Support ldims, packing in sup/test drivers.
Details:
- Updated the test/sup source file (test_gemm.c) and Makefile to support
building matrices with small or large leading dimensions, and updated
runme.sh to support executing both kinds of test drivers.
- Updated runme.sh to allow for executing sup drivers with unpacked (the
default) or packed matrices (via setting BLIS_PACK_A, BLIS_PACK_B
environment variables), and for capturing output to files that encode
both the leading dimension (small or large) and packing status into
the filenames.
- Consolidated octave scripts in test/sup/octave_st, test/sup/octave_mt
into test/sup/octave and updated the octave code in that consolidated
directory to read the new output filename format (encoding ldim and
packing). Also added comments and streamlined code, particularly in
plot_panel_trxsh.m. Tested the octave scripts with octave 5.2.0.
- Moved old octave_st, octave_mt directories to test/sup/old.
commit ceb9b95a96cc3844ecb43d9af48ab289584e76b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 18 17:15:25 2020 -0500
Fixed incorrect link to shiftd in BLISTypedAPI.md.
Details:
- Previously, the entry for shiftd in the Operation index section of
BLISTypedAPI.md was incorrectly linking to the shiftd operation entry
in BLISObjectAPI.md. This has been fixed. Thanks to Jeff Diamond for
helping find this incorrect link.
commit b3c42016818797f79e55b32c8b7d090f9d0aa0ea
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 18 14:00:56 2020 -0500
CREDITS file update.
commit 31af73c11abae03248d959da0f81eacea015b57a
Author: Isuru Fernando <isuruf@gmail.com>
Date: Thu Jun 18 13:35:54 2020 -0500
Expand windows instructions (#414)
* Expand windows instructions
* Windows: both static and shared don't work at the same time
commit b5b604e106076028279e6d94dc0e51b8ad48e802
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 17 16:42:24 2020 -0500
Ensure random objects' 1-norms are non-zero.
Details:
- Fixed an innocuous bug that manifested when running the testsuite on
extremely small matrices with randomization via the "powers of 2 in
narrow precision range" option enabled. When the randomization
function emits a perfect 0.0 to fill a 1x1 matrix, the testsuite will
then compute 0.0/0.0 during the normalization process, which leads to
NaN residuals. The solution entails smarter implementations of randv,
randnv, randm, and randnm, each of which will compute the 1-norm of
the vector or matrix in question. If the object has a 1-norm of 0.0,
the object is re-randomized until the 1-norm is not 0.0. Thanks to
Kiran Varaganti for reporting this issue (#413).
- Updated the implementation of randm_unb_var1() so that it loops over
a call to the randv_unb_var1() implementation directly rather than
calling it indirectly via randv(). This was done to avoid the overhead
of multiple calls to norm1v() when randomizing the rows/columns of a
matrix.
- Updated comments.
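A minimal sketch of the re-randomization loop described in b5b604e. All names here (gen_ft, script_t, scripted_gen, norm1v_sketch, randv_nonzero_sketch) are hypothetical and do not correspond to BLIS functions. The generator is passed in as a function pointer so that the retry path can be exercised deterministically.

```c
#include <stddef.h>

/* Hypothetical generator interface: returns the next "random" value. */
typedef double (*gen_ft)(void *state);

/* Deterministic generator for demonstration: replays a fixed script. */
typedef struct {
    const double *vals;
    size_t len;
    size_t pos;
} script_t;

static double scripted_gen(void *state)
{
    script_t *s = (script_t *)state;
    double v = s->vals[s->pos % s->len];
    s->pos++;
    return v;
}

/* 1-norm of a vector: the sum of absolute values. */
static double norm1v_sketch(size_t n, const double *x)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += (x[i] < 0.0) ? -x[i] : x[i];
    return sum;
}

/* Fill x, and refill it for as long as its 1-norm is exactly zero.
   Note: this does not terminate if the generator only ever emits 0.0. */
static void randv_nonzero_sketch(size_t n, double *x, gen_ft gen, void *state)
{
    if (n == 0)
        return;   /* an empty vector has norm 0; nothing to retry */
    do {
        for (size_t i = 0; i < n; ++i)
            x[i] = gen(state);
    } while (norm1v_sketch(n, x) == 0.0);
}
```

With a 1x1 object and a generator that emits 0.0 first, the loop runs a second time, so a later normalization no longer divides 0.0 by 0.0.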
commit 35e38fb693e7cbf2f3d7e0505a63b2c05d3f158d
Author: Isuru Fernando <isuruf@gmail.com>
Date: Tue Jun 16 10:59:41 2020 -0500
Fix typo in FAQ
commit 1c719c91a3ef0be29a918097652beef35647d4b2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 4 17:21:08 2020 -0500
Bugfixes, cleanup of sup dgemm ukernels.
Details:
- Fixed a few not-really-bugs:
- Previously, the d6x8m kernels were still prefetching the next upanel
of A using MR*rs_a instead of ps_a (same for prefetching of next
upanel of B in d6x8n kernels using NR*cs_b instead of ps_b). Given
that the upanels might be packed, using ps_a or ps_b is the correct
way to compute the prefetch address.
- Fixed an obscure bug in the rd_d6x8m kernel that, by dumb luck,
executed as intended even though it was based on faulty pointer
management. Basically, in the rd_d6x8m kernel, the pointer for B
(stored in rdx) was loaded only once, outside of the jj loop, and in
the second iteration its new position was calculated by incrementing
rdx by the *absolute* offset (four columns), which happened to be the
same as the relative offset (also four columns) that was needed. It
worked only because that loop only executed twice. A similar issue
was fixed in the rd_d6x8n kernels.
- Various cleanups and additions, including:
- Factored out the loading of rs_c into rdi in rd_d6x8[mn] kernels so
that it is loaded only once outside of the loops rather than
multiple times inside the loops.
- Changed outer loop in rd kernels so that the jump/comparison and
loop bounds more closely mimic what you'd see in higher-level source
code. That is, something like:
for( i = 0; i < 6; i+=3 )
rather than something like:
for( i = 0; i <= 3; i+=3 )
- Switched row-based IO to use byte offsets instead of byte column
strides (e.g. via rsi register), which were known to be 8 anyway
since otherwise that conditional branch wouldn't have executed.
- Cleaned up and homogenized prefetching a bit.
- Updated the comments that show the before and after of the
in-register transpositions.
- Added comments to column-based IO cases to indicate which columns
are being accessed/updated.
- Added rbp register to clobber lists.
- Removed some dead (commented out) code.
- Fixed some copy-paste typos in comments in the rv_6x8n kernels.
- Cleaned up whitespace (including leading ws -> tabs).
- Moved edge case (non-milli) kernels to their own directory, d6x8,
and split them into separate files based on the "NR" value of the
kernels (Mx8, Mx4, Mx2, etc.).
- Moved config-specific reference Mx1 kernels into their own file
(e.g. bli_gemmsup_r_haswell_ref_dMx1.c) inside the d6x8 directory.
- Added rd_dMx1 assembly kernels, which seem marginally faster than
the corresponding reference kernels.
- Updated comments in ref_kernels/bli_cntx_ref.c and changed to using
the row-oriented reference kernels for all storage combos.
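The rd_d6x8m pointer issue described in 1c719c9 can be modeled in plain C. This is only an arithmetic sketch of the pattern (the function names buggy_offset and intended_offset are hypothetical, and no assembly is reproduced): a pointer loaded once outside the loop and then advanced by the absolute offset of each iteration happens to land in the right place for the first two iterations, but not beyond.

```c
#include <stddef.h>

/* Faulty pattern: the position starts at 0 (pointer loaded once, outside
   the loop) and, on entering iteration i, is advanced by that iteration's
   absolute offset i*step. Returns the position used in iteration 'iter'. */
static size_t buggy_offset(size_t iter, size_t step)
{
    size_t pos = 0;
    for (size_t i = 1; i <= iter; ++i)
        pos += i * step;
    return pos;
}

/* Intended pattern: each iteration uses base + iter*step, i.e. the
   position advances by the relative offset 'step' every time. */
static size_t intended_offset(size_t iter, size_t step)
{
    return iter * step;
}
```

With a step of four columns, both patterns give offsets 0 and 4 for the first two iterations; a third iteration would use offset 12 under the faulty pattern instead of the intended 8, which is why the original code worked only because the loop executed twice.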
commit 943a21def0bedc1732c0a2453afe7c90d7f62e95
Author: Isuru Fernando <isuruf@gmail.com>
Date: Thu May 21 14:09:21 2020 -0500
Add build instructions for Windows (#404)
commit fbef422f0d968df10e598668b427af230cfe07e8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 21 10:30:41 2020 -0500
Separate OS X and Windows into separate FAQs.
Details:
- Separated the unified Mac OS X / Windows frequently asked question
into two separate questions, one for each OS.
commit 28be1a4265ea67e3f177c391aba3dbbcf840bd52
Author: Guodong Xu <guodong.xu@linaro.org>
Date: Thu May 21 02:22:22 2020 +0800
avoid loading twice in armv8a gemm kernel (#403)
This bug happens at a corner case, when k_iter == 0 and we jump to
CONSIDERKLEFT.
In the current design, the first row/col. of a and b are loaded twice.
The fix is to rearrange a and b (first row/col.) loading instructions.
Signed-off-by: Guodong Xu <guodong.xu@linaro.org>
commit d51245e58b0beff2717156b980007c90337150d8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 8 18:00:54 2020 -0500
Add support for Intel oneAPI in configure.
Details:
- Properly select cc_vendor based on the output of invoking CC with the
--version option, including cases where CC is the variant of clang
that is included with Intel oneAPI. (However, we continue to treat
the compiler as clang for other purposes, not icc.) Thanks to Ajay
Panyala and Devin Matthews for reporting on this issue via #402.
commit 787adad73bd5eb65c12c39d732723a1ac0448748
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 8 16:18:20 2020 -0500
Defined netlib equivalent of xerbla_array().
Details:
- Added a function definition for xerbla_array_(), which largely mirrors
its netlib implementation. Thanks to Isuru Fernando for suggesting the
addition of this function.
commit c53b5153bee585685bf95ce22e058a7af72ecef0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 5 12:39:12 2020 -0500
Documented Perl prerequisite for build system.
Details:
- Added Perl to list of prerequisites for building BLIS. This is in part
(and perhaps completely?) due to some substitution commands used at
the end of configure that include '\n' characters that are not
properly interpreted by the version of sed included on some versions
of OS X. This new documentation addresses issue #398.
commit f032d5d4a6ed34c8c3e5ba1ed0b14d1956d0097c
Author: Guodong Xu <guodong.xu@linaro.org>
Date: Thu Apr 30 01:08:46 2020 +0800
New kernel set for Arm SVE using assembly (#396)
This commit adds two kernels for the Arm SVE vector extension.
1. a gemm kernel for double at sizes 8x8.
2. a packm kernel for double at dimension 8xk.
To achieve best performance, variable length agnostic programming
is not used. Vector length (VL) of 256 bits is mandated in both kernels.
Kernels to support other VLs can be added later.
"SVE is a vector extension for AArch64 execution mode for the A64
instruction set of the Armv8 architecture. Unlike other SIMD architectures,
SVE does not define the size of the vector registers, but constrains into
a range of possible values, from a minimum of 128 bits up to a maximum of
2048 in 128-bit wide units. Therefore, any CPU vendor can implement the
extension by choosing the vector register size that better suits the
workloads the CPU is targeting. Instructions are provided specifically
to query an implementation for its register size, to guarantee that
the applications can run on different implementations of the ISA without
the need to recompile the code." [1]
[1] https://developer.arm.com/solutions/hpc/resources/hpc-white-papers/arm-scalable-vector-extensions-and-application-to-machine-learning
Signed-off-by: Guodong Xu <guodong.xu@linaro.org>
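The vector length rule quoted in the commit message above (a multiple of 128 bits, from 128 up to 2048) and the fixed 256-bit VL these kernels assume can be illustrated with a small C sketch. Both function names are hypothetical and are not part of BLIS; the sketch only does the arithmetic and uses no SVE instructions or intrinsics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of BLIS): returns true if vl_bits is a
   vector length that SVE permits, i.e. a multiple of 128 bits between
   128 and 2048 inclusive. */
static bool sve_vl_is_legal( size_t vl_bits )
{
    return vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0;
}

/* Hypothetical helper: number of double-precision (64-bit) elements
   that fit in one vector register of the given length. */
static size_t sve_doubles_per_vector( size_t vl_bits )
{
    assert( sve_vl_is_legal( vl_bits ) );
    return vl_bits / 64;
}
```

With the 256-bit VL mandated here, sve_doubles_per_vector( 256 ) is 4, so a run of 8 doubles occupies two such vectors.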
commit 4d87eb24e8e1f5a21e04586f6df4f427bae0091b
Author: Yingbo Ma <mayingbo5@gmail.com>
Date: Mon Apr 27 17:02:47 2020 -0400
Update KernelsHowTo.md (#395)
commit 477ce91c5281df2bbfaddc4d86312fb8c8f879e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 22 14:26:49 2020 -0500
Moved #include "cpuid.h" to bli_cpuid.c.
Details:
- Relocated the #include "cpuid.h" directive from bli_cpuid.h to
bli_cpuid.c. This was done because cpuid.h (which is pulled into
the post-build blis.h developer header) doesn't protect its
definitions with a preprocessor guard of the form:
#ifndef FOOBAR_H
#define FOOBAR_H
// header contents.
#endif
and as a result, applications (previously) could not #include both
blis.h and cpuid.h (since the former was already including the
latter). Thanks to Bhaskar Nallani for raising this issue via #393
and to Devin Matthews for suggesting this fix.
- CREDITS file update.
commit 8bde63ffd7474a97c3a3b0b0dc1eae45be0ab889
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 18 12:50:12 2020 -0500
Adding missing conjy to her2/syr2 in typed API doc.
Details:
- Fixed a missing argument (conjy) in the function signatures of
bli_?her2() and bli_?syr2() in docs/BLISTypedAPI.md. Thanks to Robert
van de Geijn for reporting this omission.
commit 976902406b610afdbacb2d80a7a2b4b43ff30321
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 17 15:11:10 2020 -0500
Disable packing by default in expert rntm_t init.
Details:
- Changed the behavior of bli_rntm_init() as well as the static
initializer, BLIS_RNTM_INITIALIZER, so that user-initialized rntm_t
objects by default specify the disabling of packing for A and B.
Packing of A/B was already disabled by default when calling non-expert
APIs (and enabled only when the user set environment variables
BLIS_PACK_A or BLIS_PACK_B). With this commit, the default behavior of
using user-initialized rntm_t objects with expert APIs comes into line
with the default behavior of non-expert APIs--that is, they now both
lead to the avoidance of packing in the sup code path. (Note: The
conventional code path is unaffected by the environment variables
BLIS_PACK_A/BLIS_PACK_B and/or the disabling of packing in a rntm_t
object when calling an expert API.) This addresses issue #392. Thanks
to Kiran Varaganti for bringing this inconsistency to our attention.
- The above change was accomplished by changing the definitions of
static functions bli_rntm_clear_pack_a() and bli_rntm_clear_pack_b()
in bli_rntm.h, which are both for internal use only.
commit 5f2aee7c5fa5d562acaf8fbde3df0e2a04e1dd1b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 7 14:55:15 2020 -0500
README.md update to promote supmt dgemm.
Details:
- Updated the sup entry in the "What's New" section of the README.md
file to promote the multithreaded dgemm sup feature introduced in
c0558fd.
commit f5923cd9ff5fbd91190277dea8e52027174a1d57
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 7 14:41:45 2020 -0500
CHANGELOG update (0.7.0)
commit 68b88aca6692c75a9f686187e6c4a4e196ae60a9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 7 14:41:44 2020 -0500
Version file update (0.7.0)
commit b04de636c1702e4cb8e7ad82bab3cf43d2dbdfc6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 7 14:37:43 2020 -0500
ReleaseNotes.md update in advance of next version.
Details:
- Updated docs/ReleaseNotes.md in preparation for next version.
commit 2cb604ba472049ad498df72d4a2dc47a161d4c3c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 6 16:42:14 2020 -0500
Rename more bli_thread_obarrier(), _obroadcast().
Details:
- Renamed instances of bli_thread_obarrier() and bli_thread_obroadcast()
that were made in the supmt-specific code committed to the 'amd'
branch, which has now been merged with 'master'. Prior to the merge,
'master' received commit c01d249, which applied these renamings to
the existing, non-sup codebase.
commit efb12bc895de451067649d5dceb059b7827a025f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 6 15:01:53 2020 -0500
Minor updates/elaborations to RELEASING file.
commit 2e3b3782cfb7a2fd0d1a325844983639756def7d
Merge: 9f3a8d4d da0c086f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 6 14:55:35 2020 -0500
Merge branch 'master' into amd
commit da0c086f4643772e111318f95a712831b0f981a8
Author: Satish Balay <balay@mcs.anl.gov>
Date: Tue Mar 31 17:09:41 2020 -0500
OSX: specify the full path to the location of libblis.dylib (#390)
* OSX: specify the full path to the location of libblis.dylib so that it can be found at runtime
Before this change:
Application gives runtime error [when linked with blis]
dyld: Library not loaded: libblis.3.dylib
balay@kpro lib % otool -L libblis.dylib
libblis.dylib:
libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
After this change:
balay@kpro lib % otool -L libblis.dylib
libblis.dylib:
/Users/balay/petsc/arch-darwin-c-debug/lib/libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
* INSTALL_LIBDIR -> libdir as INSTALL_LIBDIR has DESTDIR
Co-Authored-By: Jed Brown <jed@jedbrown.org>
* CREDITS file update.
Co-authored-by: Jed Brown <jed@jedbrown.org>
Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
commit 2bca03ea9d87c0da829031a5332545d05e352211
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 28 22:10:00 2020 +0000
Updates, tweaks to runme.sh in test/1m4m.
Details:
- Made several updates to test/1m4m/runme.sh, including:
- Added missing handling for 1m and 4m1a implementations when setting
the BLIS_??_NT environment variables.
- Added support for using numactl to run the test executables.
- Several other cleanups.
commit c40a33190b94af5d5c201be63366594859b1233f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 26 16:55:00 2020 -0500
Warn user when auto-detection returns 'generic'.
Details:
- Added logic to configure that causes the script to output a warning
to the user if/when "./configure auto" is run and the underlying
hardware feature detection code is unable to identify the hardware.
In these cases, the auto-detect code will return 'generic', which
is likely not what the user expected, and a flag will be set so that
a message is printed at the end of the configure output. (Thankfully,
we don't expect this scenario to play out very often.) Thanks to
Devin Matthews for suggesting this fix (#384).
commit 492a736fab5b9c882996ca024b64646877f22a89
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Mar 24 17:28:47 2020 -0500
Fix vectorized version of bli_amaxv (#382)
* Fix vectorized version of bli_amaxv
To match Netlib, i?amax should return:
- the lowest index among equal values
- the first NaN if one is encountered
* Fix typos.
* And another one...
* Update ref. amaxv kernel too.
* Re-enabled optimized amaxv kernels.
Details:
- Re-enabled the optimized, intrinsics-based amaxv kernels in the 'zen'
kernel set for use in haswell, zen, zen2, knl, and skx subconfigs.
These two kernels (for s and d datatypes) were temporarily disabled in
e186d71 as part of issue #380. However, the key missing semantic
properties that prompted the disabling of these kernels--returning the
index of the *first* rather than of the last element with largest
absolute value, and returning the index of the first NaN if one is
encountered--were added as part of #382 thanks to Devin Matthews.
Thus, now that the kernels are working as expected once more, this
commit causes these kernels to once again be registered for the
affected subconfigs, which effectively reverts all code changes
included in e186d71.
- Whitespace/formatting updates to new macros in bli_amaxv_zen_int.c.
Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
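The two semantic properties named in the commit message above (lowest index among equal values, and the first NaN if one is encountered) can be sketched as a scalar reference loop in C. The function name ref_damax is hypothetical; this is not the BLIS reference or intrinsics kernel, it assumes unit stride, and it returns a zero-based index.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical reference routine (not a BLIS kernel): returns the
   zero-based index of the element of x with the largest absolute value.
   Ties resolve to the lowest index, and the first NaN encountered is
   returned immediately. Returns 0 when n is 0. */
static size_t ref_damax( size_t n, const double* x )
{
    assert( x != NULL || n == 0 );

    size_t best_i   = 0;
    double best_abs = -1.0;   /* smaller than any absolute value */

    for ( size_t i = 0; i < n; ++i )
    {
        if ( isnan( x[i] ) ) return i;   /* first NaN wins */

        double a = fabs( x[i] );

        /* Strict '>' keeps the earliest index among equal values. */
        if ( a > best_abs ) { best_abs = a; best_i = i; }
    }

    return best_i;
}
```

Using '>=' in place of '>' would return the last of several equal maxima, which is the behavior the commit message describes as incorrect.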
commit e186d7141a51f2d7196c580e24e7b7db8f209db9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 21 18:40:36 2020 -0500
Disabled optimized amaxv kernels.
Details:
- Disabled use of optimized amaxv kernels, which use vector intrinsics
for both 's' and 'd' datatypes. We disable these kernels because the
current implementations fail to observe a semantic property of the
BLAS i?amax_() subroutine, which is to return the index of the
*first* element containing the maximum absolute value (that is, the
first element if there exist two or more elements that contain the
same value). With the optimized kernels disabled, the affected
subconfigurations (haswell, zen, zen2, knl, and skx) will use the
default reference implementations. Thanks to Mat Cross for reporting
this issue via #380.
- CREDITS file update.
commit 9f3a8d4d851725436b617297231a417aa9ce8c6a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 14 17:48:43 2020 -0500
Added missing return to bli_thread_partition_2x2().
Details:
- Added a missing return statement to the body of an early case handling
branch in bli_thread_partition_2x2(). This bug only affected cases
where n_threads < 4, and even then, the code meant to handle cases
where n_threads >= 4 executes and does the right thing, albeit using
more CPU cycles than needed. Nonetheless, thanks to Kiran Varaganti
for reporting this bug via issue #377.
- Whitespace changes to bli_thread.c (spaces -> tabs).
commit 8c3d9b9eeb6f816ec8c32a944f632a5ad3637593
Merge: 71249fe8 0f9e0399
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 10 14:03:33 2020 -0500
Merge branch 'amd' of github.com:flame/blis into amd
commit 71249fe8ddaa772616698f1e3814d40e012909ea
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 10 13:55:29 2020 -0500
Merged test/sup, test/supmt into test/sup.
Details:
- Updated the Makefile, test_gemm.c, and runme.sh in test/sup to be able
to compile and run both single-threaded and multithreaded experiments.
This should help with maintenance going forward.
- Created a test/sup/octave_st directory of scripts (based on the
previous test/sup/octave scripts) as well as a test/sup/octave_mt
directory (based on the previous test/supmt/octave scripts). The
octave scripts are slightly different and not easily mergeable, and
thus for now I'll maintain them separately.
- Preserved the previous test/sup directory as test/sup/old/supst and
the previous test/supmt directory as test/sup/old/supmt.
commit 0f9e0399e16e96da2620faf2c0c3c21274bb2ebd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 5 17:03:21 2020 -0600
Updated sup performance graphs; added mt results.
Details:
- Reran all existing single-threaded performance experiments comparing
BLIS sup to other implementations (including the conventional code
path within BLIS), using the latest versions (where appropriate).
- Added multithreaded results for the three existing hardware types
showcased in docs/PerformanceSmall.md: Kaby Lake, Haswell, and Epyc
(Zen1).
- Various minor updates to the text in docs/PerformanceSmall.md.
- Updates to the octave scripts in test/sup/octave, test/supmt/octave.
commit 90db88e5729732628c1f3acc96eeefab49f2da41
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 2 15:06:48 2020 -0600
Updated sup[mt] Makefiles for variable dim ranges.
Details:
- Updated test/sup/Makefile and test/supmt/Makefile to allow specifying
different problem size ranges for the drivers where one, two, or three
matrix dimensions are large. This will facilitate the generation of
more meaningful graphs, particularly when two dimensions are tiny.
commit 31f11a06ea9501724feec0d2fc5e4644d7dd34fc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 27 14:33:20 2020 -0600
Updates to octave scripts in test/sup[mt]/octave.
Details:
- Optimized scripts in test/sup/octave and test/supmt/octave for use
with octave 5.2.0 on Ubuntu 18.04.
- Fixed stray 'end' keywords in gen_opsupnames.m and plot_l3sup_perf.m,
which were not only unnecessary but also causing issues with versions
5.x.
commit c01d249d7c546fe2e3cee3fe071cd4c4c88b9115
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 25 14:50:53 2020 -0600
Renamed bli_thread_obarrier(), _obroadcast().
Details:
- Renamed two bli_thread_*() APIs:
bli_thread_obarrier() -> bli_thread_barrier()
bli_thread_obroadcast() -> bli_thread_broadcast()
The 'o' was a leftover from when thrcomm_t objects tracked both
"inner" and "outer" communicators. They have long since been
simplified to only support the latter, and thus the 'o' is
superfluous.
commit f6e6bf73e695226c8b23fe7900da0e0ef37030c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 24 17:52:23 2020 -0600
List Gentoo under supported external packages.
Details:
- Add mention of Gentoo Linux under the list of external packages in
the README.md file. Thanks to M. Zhou for maintaining this package.
commit 9e5f7296ccf9b3f7b7041fe1df20b927cd0e914b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 18 15:16:03 2020 -0600
Skip building thrinfo_t tree when mt is disabled.
Details:
- Return early from bli_thrinfo_sup_grow() if the thrinfo_t object
address is equal to either &BLIS_GEMM_SINGLE_THREADED or
&BLIS_PACKM_SINGLE_THREADED.
- Added preprocessor logic to bli_l3_sup_thread_decorator() in
bli_l3_sup_decor_single.c that (by default) disables code that
creates and frees the thrinfo_t tree and instead passes
&BLIS_GEMM_SINGLE_THREADED as the thrinfo_t pointer into the
sup implementation.
- The net effect of the above changes is that a small amount of
thrinfo_t overhead is avoided when running small/skinny dgemm
problems when BLIS is compiled with multithreading disabled.
commit 90081e6a64b5ccea9211bdef193c2d332c68492f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 17 14:57:25 2020 -0600
Fixed bug(s) in mt sup when single-threaded.
Details:
- Fixed a syntax bug in bli_l3_sup_decor_single.c as a result of
changing function interface for the thread entry point function
(of type l3supint_t).
- Unfortunately, fixing the interface was not enough, as it caused
a memory leak in the sba at bli_finalize() time. It turns out that,
due to the new multithreading-capable variant code using thrinfo_t
objects--specifically, their calling of bli_thrinfo_grow()--we
have to pass in a real thrinfo_t object rather than the global
objects &BLIS_PACKM_SINGLE_THREADED or &BLIS_GEMM_SINGLE_THREADED.
Thus, I inserted the appropriate logic from the OpenMP and pthreads
versions so that single-threaded execution would work as intended
with the newly upgraded variants.
commit c0558fde4511557c8f08867b035ee57dd2669dc6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 17 14:08:08 2020 -0600
Support multithreading within the sup framework.
Details:
- Added multithreading support to the sup framework (via either OpenMP
or pthreads). Both variants 1n and 2m now have the appropriate
threading infrastructure, including data partitioning logic, to
parallelize computation. This support handles all four combinations
of packing on matrices A and B (neither, A only, B only, or both).
This implementation tries to be a little smarter when automatic
threading is requested (e.g. via BLIS_NUM_THREADS) in that it will
recalculate the factorization in units of micropanels (rather than
using the raw dimensions) in bli_l3_sup_int.c, when the final
problem shape is known and after threads have already been spawned.
- Implemented bli_?packm_sup_var2(), which packs to conventional row-
or column-stored matrices. (This is used for the rrc and crc storage
cases.) Previously, copym was used, but that would no longer suffice
because it could not be parallelized.
- Minor reorganization of packing-related sup functions. Specifically,
bli_packm_sup_init_mem_[ab]() are called from within packm_sup_[ab]()
instead of from the variant functions. This has the effect of making
the variant functions more readable.
- Added additional bli_thrinfo_set_*() static functions to bli_thrinfo.h
and inserted usage of these functions within bli_thrinfo_init(), which
previously was accessing thrinfo_t fields via the -> operator.
- Renamed bli_partition_2x2() to bli_thread_partition_2x2().
- Added an auto_factor field to the rntm_t struct in order to track
whether automatic thread factorization was originally requested.
- Added new test drivers in test/supmt that perform multithreaded sup
tests, as well as appropriate octave/matlab scripts to plot the
resulting output files.
- Added additional language to docs/Multithreading.md to make it clear
that specifying any BLIS_*_NT variable, even if it is set to 1, will
be considered manual specification for the purposes of determining
whether to auto-factorize via BLIS_NUM_THREADS.
- Minor comment updates.
commit d7a7679182d72a7eaecef4cd9b9a103ee0a7b42b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 7 17:37:03 2020 -0600
Fixed int-to-packbuf_t conversion error (C++ only).
Details:
- Fixed an error that manifests only when using C++ (specifically,
modern versions of g++) to compile drivers in 'test' (and likely most
other application code that #includes blis.h). Thanks to Ajay Panyala
for reporting this issue (#374).
commit d626112b8d5302f9585fb37a8e37849747a2a317
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 15 13:27:02 2020 -0600
Removed sorting on LDFLAGS in common.mk (#373).
Details:
- Removed a line of code in common.mk that passed LDFLAGS through the
sort function. The purpose was not to sort the contents, but rather
to remove duplicates. However, there is valid syntax in a string of
linker flags that, when sorted, yields different/broken behavior.
So I've removed the line in common.mk that sorts LDFLAGS. Also, for
future use, I've added a new function, rm-dupls, that removes
duplicates without sorting. (This function was based on code from a
stackoverflow thread that is linked to in the comments for that
code.) Thanks to Isuru Fernando for reporting this issue (#373).
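The idea of removing duplicates without sorting, as described in the commit message above, can be sketched in C. This is only an illustration of the technique, not the rm-dupls make function itself. The name dedup_keep_order and its interface are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an illustration in C, not the make function
   rm-dupls): copies the whitespace-separated tokens of 'in' to 'out',
   dropping any token that already appeared earlier and keeping the
   survivors in their original order, separated by single spaces.
   'out' must have room for strlen(in) + 1 bytes. Returns 'out'. */
static char* dedup_keep_order( const char* in, char* out )
{
    assert( in != NULL && out != NULL );

    size_t      out_len = 0;
    const char* p       = in;

    out[0] = '\0';

    for ( ;; )
    {
        /* Locate the next token in the input. */
        p += strspn( p, " \t\n" );
        size_t len = strcspn( p, " \t\n" );
        if ( len == 0 ) break;

        /* Scan the tokens already emitted for an exact match. */
        bool seen = false;
        for ( const char* q = out; *q != '\0'; )
        {
            size_t qlen = strcspn( q, " " );
            if ( qlen == len && strncmp( q, p, len ) == 0 ) { seen = true; break; }
            q += qlen;
            q += strspn( q, " " );
        }

        /* Append the token only on its first occurrence. */
        if ( !seen )
        {
            if ( out_len > 0 ) out[out_len++] = ' ';
            memcpy( out + out_len, p, len );
            out_len += len;
            out[out_len] = '\0';
        }

        p += len;
    }

    return out;
}
```

Given "-lb -la -lb -lm -la", the sketch yields "-lb -la -lm", whereas a sort-based approach would also reorder the flags to "-la -lb -lm".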
commit e67deb22aaeab5ed6794364520190936748ef272
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 14 16:01:34 2020 -0600
CHANGELOG update (0.6.1)
commit 10949f528c5ffc5c3a2cad47fe16a802afb021be
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 14 16:01:33 2020 -0600
Version file update (0.6.1)
commit 5db8e710a2baff121cba9c63b61ca254a2ec097a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 14 15:59:59 2020 -0600
ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
commit cde4d9d7a26eb51dcc5a59943361dfb8fda45dea
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 14 15:19:25 2020 -0600
Removed 'attic/windows' (to prevent confusion).
Details:
- Finally removed 'attic/windows' and its contents. This directory once
contained "proto" Windows support for BLIS, but we've since moved on
to (thanks to Isuru Fernando) providing Windows DLL support via
AppVeyor's build artifacts. Furthermore, since 'windows' was the only
subdirectory within 'attic', the directory path would show up in
GitHub's listing at https://github.com/flame/blis, which probably led
to someone being confused about how BLIS provides Windows support. I
assume (but don't know for sure) that nobody is using these files, so
this is admittedly a case of shoot first and ask questions later.
commit 7d3407d4681c6449f4bbb8ec681983700ab968f3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 14 15:17:53 2020 -0600
CREDITS file update.
commit f391b3e2e7d11a37300d4c8d3f6a584022a599f5
Author: Dave Love <dave.love@manchester.ac.uk>
Date: Mon Jan 6 20:15:48 2020 +0000
Fix parsing in vpu_count on workstation SKX (#351)
* Fix parsing in vpu_count on workstation SKX
* Document Skylake-X as Haswell for single FMA
* Update vpu_count for Skylake and Cascade Lake models
* Support printing the configuration selected, controlled by the environment
Intended particularly for diagnosing mis-selection of SKX through
unknown, or incorrect, number of VPUs.
* Move bli_log outside the cpp condition, and use it where intended
* Add Fixme comment (Skylake D)
* Mostly superficial edits to commits towards #351.
Details:
- Moved architecture/sub-config logging-related code from bli_cpuid.c
to bli_arch.c, tweaked names, and added more set/get layering.
- Tweaked log messages output from bli_cpuid_is_skx() in bli_cpuid.c.
- Content, whitespace changes to new bullet in HardwareSupport.md that
relates to single-VPU Skylake-Xs.
* Fix comment typos
Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
commit 5ca1a3cfc1c1cc4dd9da6a67aa072ed90f07e867
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 6 12:29:12 2020 -0600
Fixed 'configure' breakage introduced in 6433831.
Details:
- Added a missing 'fi' (endif) keyword to a conditional block added in
the configure script in commit 6433831.
commit e7431b4a834ef4f165c143f288585ce8e2272a23
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 6 12:01:41 2020 -0600
Updated 1m draft article link in README.md.
commit 6433831cc3988ad205637ebdebcd6d8f7cfcf148
Author: Jeff Hammond <jeff.r.hammond@intel.com>
Date: Fri Jan 3 17:52:49 2020 -0800
blacklist ICC 18 for knl/skx due to test failures
Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>
commit af3589f1f98781e3a94a8f9cea8d5ea6f155f7d2
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Fri Jan 3 13:23:24 2020 -0800
blacklist Intel 19+
Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>
commit 60de939debafb233e57fd4e804ef21b6de198caf
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Wed Jan 1 21:30:38 2020 -0800
fix link to docs
the comment contains an incorrect link, which is trivially fixed here.
@fgvanzee I hope you don't mind that I committed directly to master but this cannot break anything.
commit 52711073789b6b84eb99bb0d6883f457ed3fcf80
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 16 16:30:26 2019 -0600
Fixed bugs in cblas_sdsdot(), sdsdot_().
Details:
- Fixed a bug in sdsdot_sub() that redundantly added the "alpha" scalar,
named 'sb'. This value was already being added by the underlying
sdsdot_() function. Thus, we no longer add 'sb' within sdsdot_sub().
Thanks to Simon Lukas Märtens for reporting this bug via #367.
- Fixed a second bug in order of typecasting intermediate products in
sdsdot_(). Previously, the "alpha" scalar was being added after the
"outer" typecast to float. However, the operation is supposed to first
add the dot product to the (promoted) scalar and THEN downcast the sum
to float. Thanks to Devin Matthews for catching this bug.
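The order of operations described in the second bullet above (add the dot product to the promoted scalar, then downcast the sum once) can be sketched in C. The name ref_sdsdot is hypothetical and this is not the BLIS source; unit strides are assumed for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (not the BLIS source): computes
   sb + sum_i x[i]*y[i] with the products and the running sum held in
   double precision, and rounds to float only once, at the very end.
   Unit strides are assumed. */
static float ref_sdsdot( size_t n, float sb, const float* x, const float* y )
{
    assert( ( x != NULL && y != NULL ) || n == 0 );

    double acc = ( double )sb;          /* promote the scalar first */

    for ( size_t i = 0; i < n; ++i )
        acc += ( double )x[i] * ( double )y[i];

    return ( float )acc;                /* single, outermost downcast */
}
```

The ordering matters when the dot product is not exactly representable in single precision. With sb = -16777216.0f, x = { 16777216.0f, 1.0f } and y = { 1.0f, 1.0f }, the double-precision dot product is 16777217, so the sketch returns 1.0f. Downcasting the dot product to float before adding sb would round it to 16777216 and give 0.0f instead.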
commit fe2560a4b1d8ef8d0a446df6002b1e7decc826e9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 6 17:12:44 2019 -0600
Annotated missing thread-related symbols for export.
Details:
- Added BLIS_EXPORT_BLIS annotation to function prototypes for
bli_thrcomm_bcast()
bli_thrcomm_barrier()
bli_thread_range_sub()
so that these functions are exported to shared libraries by default.
This (hopefully) fixes issue #366. Thanks to Kyungmin Lee for
reporting this bug.
- CREDITS file update.
commit 2853825234001af8f175ad47cef5d6ff9b7a5982
Merge: efa61a6c 61b1f0b0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 6 16:06:46 2019 -0600
Merge branch 'master' into amd
commit 61b1f0b0602faa978d9912fe58c6c952a33af0ac
Author: Nicholai Tukanov <nicholai@utexas.edu>
Date: Wed Dec 4 14:18:47 2019 -0600
Add prototypes for POWER9 reference kernels (#365)
Updates and fixes to power9 subconfig.
Details:
- Register s,c,z reference gemm and trsm ukernels that assume elements
of B have been broadcast.
- Added prototypes for level-3 ukernels that assume elements of B have
been broadcast. Also added prototype for an spackm function that
employs a duplication/broadcast factor of 4.
- Register virtual gemmtrsm ukernels that work with broadcasting of B.
- Disable right-side hemm, symm, trmm, and trmm3 in bli_family_power9.h.
- Thanks to Nicholai Tukanov for providing these updates.
commit efa61a6c8b1cfa48781fc2e4799ff32e1b7f8f77
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 29 16:17:04 2019 -0600
Added missing bli_l3_sup_thread_decorator() symbol.
Details:
- Defined dummy versions of bli_l3_sup_thread_decorator() for OpenMP
and pthreads so that those builds don't fail when performing shared
library linking (especially for Windows DLLs via AppVeyor). For now,
these dummy implementations of bli_l3_sup_thread_decorator() are
merely carbon-copies of the implementation provided for single-
threaded execution (i.e., the one found in bli_l3_sup_decor_single.c).
Thus, an OpenMP or pthreads build will be able to use the gemmsup
code (including the new selective packing functionality), as it did
before 39fa7136, even though it will not actually employ any
multithreaded parallelism.
commit 39fa7136f4a4e55ccd9796fb79ad5f121b872ad9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 29 15:27:07 2019 -0600
Added support for selective packing to gemmsup.
Details:
- Implemented optional packing for A or B (or both) within the sup
framework (which currently only supports gemm). The request for
packing either matrix A or matrix B can be made via setting
environment variables BLIS_PACK_A or BLIS_PACK_B (to any
non-zero value; if set, zero means "disable packing"). It can also
be made globally at runtime via bli_pack_set_pack_a() and
bli_pack_set_pack_b() or with individual rntm_t objects via
bli_rntm_set_pack_a() and bli_rntm_set_pack_b() if using the expert
interface of either the BLIS typed or object APIs. (If using the
BLAS API, environment variables are the only way to communicate the
packing request.)
- One caveat (for now) with the current implementation of selective
packing is that any blocksize extension registered in the _cntx_init
function (such as is currently used by haswell and zen subconfigs)
will be ignored if the affected matrix is packed. The reason is
simply that I didn't get around to implementing the necessary logic
to pack a larger edge-case micropanel, though this is entirely
possible and should be done in the future.
- Spun off the variant-choosing portion of bli_gemmsup_ref() into
bli_gemmsup_int(), in bli_l3_sup_int.c.
- Added new files, bli_l3_sup_packm_a.c, bli_l3_sup_packm_b.c, along
with corresponding headers, in which higher-level packm-related
functions are defined for use within the sup framework. The actual
packm variant code resides in bli_l3_sup_packm_var.c.
- Pass the following new parameters into var1n and var2m: packa, packb
bool_t's, pointer to a rntm_t, pointer to a cntl_t (which is for now
always NULL), and pointer to a thrinfo_t* (which for now is the address
of the global single-threaded packm thread control node).
- Added panel strides ps_a and ps_b to the auxinfo_t structure so that
the millikernel can query the panel stride of the packed matrix and
step through it accordingly. If the matrix isn't packed, the panel
stride of interest for the given millikernel will be set to the
appropriate value so that the mkernel may step through the unpacked
matrix as it normally would.
- Modified the rv_6x8m and rv_6x8n millikernels to read the appropriate
panel strides (ps_a and ps_b, respectively) instead of computing them
on the fly.
- Spun off the environment variable getting and setting functions into
a new file, bli_env.c (with a corresponding prototype header). These
functions are now used by the threading infrastructure (e.g.
BLIS_NUM_THREADS, BLIS_JC_NT, etc.) as well as the selective packing
infrastructure (e.g. BLIS_PACK_A, BLIS_PACK_B).
- Added a static initializer for mem_t objects, BLIS_MEM_INITIALIZER.
- Added a static initializer for pblk_t objects, BLIS_PBLK_INITIALIZER,
for use within the definition of BLIS_MEM_INITIALIZER.
- Moved the global_rntm object to bli_rntm.c and extern it where needed.
This means that the function bli_thread_init_rntm() was renamed to
bli_rntm_init_from_global() and relocated accordingly.
- Added a new file, bli_pack.c, which serves as the home for
functions that manage the pack_a and pack_b fields of the global
rntm_t, including from environment variables, just as we have
functions to manage the threading fields of the global rntm_t in
bli_thread.c.
- Reorganized naming for files in frame/thread, which mostly involved
spinning off the bli_l3_thread_decorator() functions into their own
files. This change makes more sense when considering the further
addition of bli_l3_sup_thread_decorator() functions (for now limited
only to the single-threaded form found in the _single.c file).
- Explicitly initialize the reference sup handlers in both
bli_cntx_init_haswell.c and bli_cntx_init_zen.c so that it's more
obvious how to customize to a different handler, if desired.
- Removed various snippets of disabled code.
- Various comment updates.
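The environment variable rule stated in the first bullet of the commit message above (any non-zero value requests packing; if set, zero disables it) can be sketched in C. Both function names are hypothetical and this is not the code in bli_env.c. The fallback used when the variable is unset is passed in as a parameter because it is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not the bli_env.c implementation): interprets
   the text of a variable such as BLIS_PACK_A. A NULL value (variable
   unset) yields the caller's default, zero disables packing, and any
   non-zero integer enables it. */
static bool parse_pack_request( const char* value, bool dflt )
{
    if ( value == NULL ) return dflt;
    return strtol( value, NULL, 10 ) != 0;
}

/* Hypothetical wrapper: looks up the named environment variable and
   interprets it with parse_pack_request(). */
static bool env_requests_packing( const char* name, bool dflt )
{
    assert( name != NULL );
    return parse_pack_request( getenv( name ), dflt );
}
```

A caller could evaluate env_requests_packing( "BLIS_PACK_A", false ) once at startup and cache the result, falling back to no packing when the variable is absent.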
commit bbb21fd0a9be8c5644bec37c75f9396eeeb69e48
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 21 18:15:16 2019 -0600
Tweaked SIAM/SC Best Prize language in README.md.
commit 043366f92d5f5f651d5e3371ac3adb36baf4adce
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 21 18:13:51 2019 -0600
Fixed typo in previous commit (SIAM/SC prize).
commit 05a4d583e65a46ff2a1100ab4433975d905d91f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 21 18:12:24 2019 -0600
Added SIAM/SC prize to "What's New" in README.md.
commit 881b05ecd40c7bc0422d3479a02a28b1cb48383f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 21 16:34:27 2019 -0600
Fixed blastest failure for 'generic' subconfig.
Details:
- Fixed a subtle and complicated bug that only manifested via the BLAS
test drivers in the generic subconfiguration, and possibly any other
subconfiguration that did not register complex-domain gemm ukernels,
or registered ONLY real-domain ukernels as row-preferential. This is
a long story, but it boils down to an exception to the "transpose the
operation to bring storage of C into agreement with ukernel pref"
optimization in bli_hemm_front.c and bli_symm_front.c sabotaging the
proper functioning of the 1m method, but only when the imaginary
component of beta is zero. See the comments in issue #342 for more
details. Thanks to Dave Love for identifying the commit in which this
bug was introduced, and other feedback related to this bug.
commit 0c7165fb01cdebbc31ec00124d446161b289942f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 14 16:48:14 2019 -0600
Fixed obscure bug in bli_acquire_mpart_[mn]dim().
Details:
- Fixed a bug in bli_acquire_mpart_mdim(), bli_acquire_mpart_ndim(),
and bli_acquire_mpart_mndim() that allowed the use of a blocksize b
that is too large given the current row/column index (i.e., the i/j
argument) and the size of the dimension being partitioned (i.e., the
m/n argument). This bug only affected backwards partitioning/motion
through the dimension and was the result of a misplaced conditional
check-and-redirect to the backwards code path. It should be noted
that this bug was discovered not because it manifested the way it
could (thanks to the callers in BLIS making sure to always pass in
the "correct" blocksize b), but could have manifested if the
functions were used by 3rd party callers. Thanks to Minh Quan Ho for
reporting the bug via issue #363.
commit fb8bef9982171ee0f60bc39e41a33c4d31fd59a9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 14 13:05:28 2019 -0600
Fixed copy-paste bug in bli_spackm_6xk_bb4_ref().
Details:
- Fixed a copy-paste bug in the new bli_spackm_6xk_bb4_ref() that
manifested as failures in single-precision real level-3 operations.
Also replaced the duplication factor constants with a const-qualified
variable, dfac, so that this won't happen again.
- Changed NC for single-precision real from 4080 to 8160 so that the
packed matrix B will have the same byte footprint in both single
and double real.
commit 8f399c89403d5824ba767df1426706cf2d19d0a7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 12 15:32:57 2019 -0600
Tweaked/added notes to docs/Multithreading.md.
Details:
- Added language to docs/Multithreading.md cautioning the reader about
the nuances of setting multithreading parameters via the manual and
automatic ways simultaneously, and also about how these parameters
behave when multithreading is disabled at configure-time. These
changes are an attempt to address the issues that arose in issue #362.
Thanks to Jérémie du Boisberranger for his feedback on this topic.
- CREDITS file update.
commit bdc7ee3394500d8e5b626af6ff37c048398bb27e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 11 15:47:17 2019 -0600
Various fixes to support packing duplication in B.
Details:
- Added cpp macros to trmm and trmm3 front-ends to optionally force
those operations to be cast so the structured matrix is on the left.
symm and hemm already had such macros, but these too were renamed so
that the macros were individual to the operation. We now have four
such macros:
#define BLIS_DISABLE_HEMM_RIGHT
#define BLIS_DISABLE_SYMM_RIGHT
#define BLIS_DISABLE_TRMM_RIGHT
#define BLIS_DISABLE_TRMM3_RIGHT
Also, updated the comments in the symm and hemm front-ends related to
the first two macro guards, and added corresponding comments to the
trmm and trmm3 front-ends for the latter two guards. (They all
functionally do the same thing, just for their specific operations.)
Thanks to Jeff Hammond for reporting the bugs that led me to this
change (via #359).
- Updated config/old/haswellbb subconfiguration (used to debug issues
related to duplicating B during packing) to register: a packing
kernel for single-precision real; gemmbb ukernels for s, c, and z;
trsmbb ukernels for s, c, and z; gemmtrsmbb virtual ukernels for s, c,
and z; and to use non-default cache and register blocksizes for s, c,
and z datatypes. Also declared prototypes for all of the gemmbb,
trsmbb, and gemmtrsmbb ukernel functions within the
bli_cntx_init_haswellbb() function. This should, once applied to the
power9 configuration, fix the remaining issues in #359.
- Defined bli_spackm_6xk_bb4_ref(), which packs single reals with a
duplication factor of 4. This function is defined in the same file as
bli_dpackm_6xk_bb2_ref() (bli_packm_cxk_bb_ref.c).
commit 0eb79ca8503bd7b237994335b9687457227d3290
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 8 14:48:48 2019 -0600
Avoid unused variable warning in lread.c (#356).
Details:
- Replaced the line
f = f;
with
( void )f;
for the unused variable 'f' in blastest/f2c/lread.c. (Hopefully)
addresses issue #356, but since we don't use xlc who knows. Thanks
to Jeff Hammond for reporting this.
commit f377bb448512f0b578263387eed7eaf8f2b72bb7
Author: Jérôme Duval <jerome.duval@gmail.com>
Date: Thu Nov 7 23:39:29 2019 +0100
Add Haiku to the known OS list (#361)
commit e29b1f9706b6d9ed798b7f6325f275df4e6be973
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 5 17:15:19 2019 -0600
Fixed failing testsuite gemmtrsm_ukr for power9.
Details:
- Added code that fixes false failures in the gemmtrsm_ukr module of the
testsuite. The tests were failing because the computation (bli_gemv())
that performs the numerical check was not able to properly traverse
the matrix operands bx1 and b11 that are views into the micropanel of
B, which has duplicated/broadcast elements under the power9 subconfig.
(For example, a micropanel of B with duplication factor of 2 needs to
use a column stride of 2; previously, the column stride was being
interpreted as 1.)
- Defined separate bli_obj_set_row_stride() and bli_obj_set_col_stride()
static functions in bli_obj_macro_defs.h. (Previously, only the
function bli_obj_set_strides() was defined. Amazing to think that we
got this far without these former functions.)
- Updated/expounded upon comments.
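Illustration (not the testsuite code): a small sketch of why the column stride matters when reading a view into a micropanel of B whose elements are duplicated. It assumes a row-stored panel in which each logical element occupies d consecutive storage slots; view_get() and row_sum() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Read logical element (i, j) of a view with row stride rs and column
   stride cs, both measured in storage elements. */
static double view_get(const double *buf, size_t rs, size_t cs,
                       size_t i, size_t j)
{
    assert(buf != NULL);
    return buf[i * rs + j * cs];
}

/* Sum the n logical elements of row i. With a duplication factor d, the
   caller must pass cs = d (and rs = n * d); passing cs = 1 would read the
   duplicates of the first elements instead of the next logical columns. */
static double row_sum(const double *buf, size_t rs, size_t cs,
                      size_t i, size_t n)
{
    double sum = 0.0;
    for (size_t j = 0; j < n; ++j)
        sum += view_get(buf, rs, cs, i, j);
    return sum;
}
```

For a 2x3 panel stored as 1 1 2 2 3 3 / 4 4 5 5 6 6 (d = 2), row_sum() with cs = 2 returns 6 for the first row, while cs = 1 returns 4 because it reads 1, 1, 2.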
commit 49177a6b9afcccca5b39a21c6fd8e243525e1505
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 4 18:09:37 2019 -0600
Fixed latent testsuite ukr module bugs for power9.
Details:
- Fixed a latent bug in the testsuite ukernel modules (gemm, trsm, and
gemmtrsm) that only manifested once we began running with parameters
that mimic those of power9. The problem was rooted in the way those
modules were creating objects (and thus allocating memory) for the
micropanel operands to the microkernel being tested. Since power9
duplicates/broadcasts elements of B in memory, we needed an easy way
of asking for more than one storage element per logical element in
the matrix. I incorrectly expressed this as:
bli_obj_create( datatype, k, n, ldbp, 1, &bp );
The problem here is that bli_obj_create() is exceedingly efficient
at calculating the size it passes to malloc() and doesn't allocate a
full leading dimension's worth of elements for the last column (or
row, in this example). This would normally not bother anyone since
you're not supposed to access that memory anyway. But here, my
attempted "hack" for getting extra elements was insufficient, and
needed to be changed to:
bli_obj_create( datatype, k, ldbp, ldbp, 1, &bp );
That is, the extra elements needed to be baked into the dimensions of
the matrix object in order to have the intended effect on the number
of elements actually allocated. Thanks to Jeff Hammond for reporting
this bug.
- Fixed a typically harmless memory leak in the aforementioned test
modules (the objects for the packed micropanels were not being freed).
- Updated/expanded a common comment across all three ukr test modules.
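Illustration (not the bli_obj_create() implementation): a sketch of the arithmetic behind the fix above. It models a "tight" allocation as the smallest element count that still reaches the last logical element, which is an assumption made only for this example; tight_elem_count() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest number of storage elements that covers an m x n matrix with
   row stride rs and column stride cs: the offset of the last logical
   element, plus one. */
static size_t tight_elem_count(size_t m, size_t n, size_t rs, size_t cs)
{
    assert(m > 0 && n > 0);
    return (m - 1) * rs + (n - 1) * cs + 1;
}
```

With k = 4, n = 6, and ldbp = 12, a k x n object with strides (ldbp, 1) needs only 42 elements under this model, short of the 4 * 12 = 48 that a panel with duplicated elements occupies. Declaring the object as k x ldbp gives 48, which is why the extra elements had to be baked into the dimensions.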
commit c84391314d4f1b3f73d868f72105324e649f2a72
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 4 13:57:12 2019 -0600
Reverted minor temp/wspace changes from b426f9e.
Details:
- Added missing license header to bli_pwr9_asm_macros_12x6.h.
- Reverted temporary changes to various files in 'test' and 'testsuite'
directories.
- Moved testsuite/jobscripts into testsuite/old.
- Minor whitespace/comment changes across various files.
commit 4870260f6b8c06d2cc01b7147d7433ddee213f7f
Author: Jeff Hammond <jeff.r.hammond@intel.com>
Date: Mon Nov 4 11:55:47 2019 -0800
blacklist GCC 5 and older for POWER9 (#360)
commit b426f9e04e5499c6f9c752e49c33800bfaadda4c
Author: Nicholai Tukanov <nicholai@utexas.edu>
Date: Fri Nov 1 17:57:03 2019 -0500
POWER9 DGEMM (#355)
Implemented and registered power9 dgemm ukernel.
Details:
- Implemented 12x6 dgemm microkernel for power9. This microkernel
assumes that elements of B have been duplicated/broadcast during the
packing step. The microkernel uses a column orientation for its
microtile vector registers and thus implements column storage and
general stride IO cases. (A row storage IO case via in-register
transposition may be added at a future date.) It should be noted that
we recommend using this microkernel with gcc and *not* xlc, as issues
with the latter cropped up during development, including but not
limited to slightly incompatible vector register mnemonics in the GNU
extended inline assembly clobber list.
commit 58102aeaa282dc79554ed045e1b17a6eda292e15
Merge: 52059506 b9bc222b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 28 17:58:31 2019 -0500
Merge branch 'amd'
commit 52059506b2d5fd4c3738165195abeb356a134bd4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 23 15:26:42 2019 -0500
Added "How to Download BLIS" section to README.md.
Details:
- Added a new section to the README.md, just prior to the "Getting
Started" section, titled "How to Download BLIS". This section details
the user's options for obtaining BLIS and lays out four common ways
of downloading the library. Thanks to Jeff Diamond for his feedback
on this topic.
commit e6f0a96cc59aef728470f6850947ba856148c38a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 14 17:05:39 2019 -0500
Updated README.md to ack Facebook as funder.
commit b9bc222bfc3db4f9ae5d7b3321346eed70c2c3fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 14 16:38:15 2019 -0500
Call bli_syrk_small() before error checking.
Details:
- In bli_syrk_front(), moved the conditional call to bli_syrk_check()
(if error checking is enabled) and the conditional scaling of C by
beta (if alpha is zero) so that they occur after, instead of before,
the call to bli_syrk_small(). This sequencing now matches that of
bli_gemm_small() in bli_gemm_front() and bli_trsm_small() in
bli_trsm_front().
commit f0959a81dbcf30d8a1076d0a6348a9835079d31a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 14 15:46:28 2019 -0500
When manual config is blacklisted, output error.
Details:
- Fixed and adjusted the logic in configure so that a more informative
error message is output when a user runs './configure ... <conf>' and
<conf> is present in the configuration blacklist. Previously, this
particular set of conditions would result in the message:
'user-specified configuration '' is NOT registered!
That is, the error message mis-identified the targeted configuration
as the empty string, and (more importantly) mis-identified the
problem. Thanks to Tze Meng Low for reporting this issue.
- Fixed a nearby error message somewhat unrelated to the issue above.
Specifically, the wrong string was being printed when the error
message was identifying an auto-detected configuration that did not
appear to be registered.
commit 6218ac95a525eefa8921baf8d0d7057dfacebe9c
Merge: 0016d541 a617301f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 11 11:53:51 2019 -0500
Merge branch 'master' into amd
commit 0016d541e6b0da617b1fae6612d2b314901b7a75
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 11 11:09:44 2019 -0500
Changed -march=znver2 to =znver1 for clang on zen2.
Details:
- In config/zen2/make_defs.mk, changed the -march= flag so that
-march=znver1 is used instead of -march=znver2 when CC_VENDOR is
clang. (The gcc branch attempts to differentiate between various
versions, but the equivalent version cutoffs for clang are not
yet known by us, so we have to use a single flag for all versions
of clang. Hopefully -march=znver1 is new enough. If not, we'll
fall back to -march=bdver4 -mno-fma4 -mno-tbm -mno-xop -mno-lwp.)
This issue was discovered thanks to AppVeyor.
commit e94a0530e5ac4c78a18f09105f40003be2b517f7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 11 10:48:27 2019 -0500
Corrected zen NC that was non-multiple of NR.
Details:
- Updated an incorrectly set cache blocksize NC for single real within
config/zen/bli_cntx_init_zen.c that was not a multiple of the
corresponding value of NR. This issue, which was caught by Travis CI,
was introduced in 29b0e1e.
commit a2ffac752076bf55eb8c1fe2c5da8d9104f1f85b
Merge: 1cfe8e25 29b0e1ef
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 11 10:31:18 2019 -0500
Merge branch 'amd-master' into amd
commit 29b0e1ef4e8b84ce76888d73c090009b361f1306
Merge: 1cfe8e25 fdce1a56
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 11 10:24:24 2019 -0500
Code review + tweaks to AMD's AOCL 2.0 PR (#349).
Details:
- NOTE: This is a merge commit of 'master' of git://github.com/amd/blis
into 'amd-master' of flame/blis.
- Fixed a bug in the downstream value of BLIS_NUM_ARCHS, which was
inadvertently not incremented when the Zen2 subconfiguration was
added.
- In bli_gemm_front(), added a missing conditional constraint around the
call to bli_gemm_small() that ensures that the computation precision
of C matches the storage precision of C.
- In bli_syrk_front(), reorganized and relocated the notrans/trans logic
that existed around the call to bli_syrk_small() into bli_syrk_small()
to minimize the calling code footprint and also to bring that code
into stylistic harmony with similar code in bli_gemm_front() and
bli_trsm_front(). Also, replaced direct accessing of obj_t fields with
proper accessor static functions (e.g. 'a->dim[0]' becomes
'bli_obj_length( a )').
- Added #ifdef BLIS_ENABLE_SMALL_MATRIX guard around prototypes for
bli_gemm_small(), bli_syrk_small(), and bli_trsm_small(). This is
strictly speaking unnecessary, but it serves as a useful visual cue to
those who may be reading the files.
- Removed cpp macro-protected small matrix debugging code from
bli_trsm_front.c.
- Added a GCC_OT_9_1_0 variable to build/config.mk.in to facilitate gcc
version check for availability of -march=znver2, and added appropriate
support to configure script.
- Cleanups to compiler flags common to recent AMD microarchitectures in
config/zen/amd_config.mk, including: removal of -march=znver1 et al.
from CKVECFLAGS (since the -march flag is added within make_defs.mk);
setting CRVECFLAGS similarly to CKVECFLAGS.
- Cleanups to config/zen/bli_cntx_init_zen.c.
- Cleanups, added comments to config/zen/make_defs.mk.
- Cleanups to config/zen2/make_defs.mk, including making use of newly-
added GCC_OT_9_1_0 and existing GCC_OT_6_1_0 to choose the correct
set of compiler flags based on the version of gcc being used.
- Reverted downstream changes to test/test_gemm.c.
- Various whitespace/comment changes.
commit a617301f9365ac720ff286514105d1b78951368b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 8 17:14:05 2019 -0500
Updates to docs/CodingConventions.md.
commit 171f10069199f0cd280f18aac184546bd877c4fe
Merge: 702486b1 05d58edf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 4 11:18:23 2019 -0500
Merge remote-tracking branch 'loveshack/emacs'
commit 702486b12560b5c696ba06de9a73fc0d5107ca44
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 2 16:35:41 2019 -0500
Removed stray FAQ section introduced in 1907000.
commit 1907000ad6ea396970c010f07ae42980b7b14fa0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 2 16:31:54 2019 -0500
Updated to FAQ (AMD-related questions).
Details:
- Added a couple of potential frequently-asked questions/answers related
to AMD's fork of BLIS.
- Updated existing answers to other questions.
commit 834f30a0dad808931c9d80bd5831b636ed0e1098
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 2 12:45:56 2019 -0500
Mention mixeddt paper in docs/MixedDatatypes.md.
commit 05d58edfe0ea9279971d74f17a5f7a69c4672ed5
Author: Dave Love <dave.love@manchester.ac.uk>
Date: Wed Oct 2 10:33:44 2019 +0100
Note .dir-locals.el in docs
commit 531110c339f199a4d165d707c988d89ab4f5bfe8
Author: Dave Love <dave.love@manchester.ac.uk>
Date: Wed Oct 2 10:16:22 2019 +0100
Modify Emacs config
Confine it to cc-mode and add comment-start/end.
commit 4bab365cab98202259c70feba6ec87408cba28d8
Author: Dave Love <dave.love@manchester.ac.uk>
Date: Tue Oct 1 19:22:47 2019 +0000
Add .dir-locals.el for Emacs (#348)
A minimal version that could probably do with extending, but at least
gets the indentation roughly right.
commit 4ec8dad66b3d37b0a2b47d19b7144bb62d332622
Author: Dave Love <dave.love@manchester.ac.uk>
Date: Thu Sep 26 16:27:53 2019 +0100
Add .dir-locals.el for Emacs
A minimal version that could probably do with extending, but at least
gets the indentation roughly right.
commit bc16ec7d1e2a30ce4a751255b70c9cbe87409e4f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 23 15:37:33 2019 -0500
Set execute bits of shared library at install-time.
Details:
- Modified the 0644 octal code used during installation of shared
libraries to 0755 (for Linux/OSX only). Thanks to Adam J. Stewart
for reporting this issue via #343.
- CREDITS file update.
commit c60db26aee9e7b4e5d0b031b0881e58d23666b53
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 17 18:04:17 2019 -0500
Fixed bad loop counter in bli_[cz]scal2bbs_mxn().
Details:
- Fixed a typo in the loop counter for the 'd' (duplication) dimension
in the complex macros of frame/include/level0/bb/bli_scal2bbs_mxn.h.
They shouldn't be used by anyone yet, but thankfully clang via
AppVeyor spit out warnings that alerted me to the issue.
commit c766c81d628f0451d8255bf5e4b8be0a4ef91978
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 17 18:00:29 2019 -0500
Added missing schema arg to knl packm kernels.
Details:
- Added the pack_t schema argument to the knl packm kernel functions.
This change was intended for inclusion in 31c8657. (Thank you SDE +
Travis CI.)
commit 31c8657f1d6d8f6efd8a73fd1995e995fc56748b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 17 17:42:10 2019 -0500
Added support for pre-broadcast when packing B.
Details:
- Added support for being able to duplicate (broadcast) elements in
memory when packing matrix B (i.e., the right-hand operand) in level-3
operations. This turns out to be advantageous for some architectures that
can afford the cost of the extra bandwidth and somehow benefit from
the pre-broadcast elements (and thus being able to avoid using
broadcast-style load instructions on micro-rows of B in the gemm
microkernel).
- Support optionally disabling right-side hemm and symm. If this occurs,
hemm_r is implemented in terms of hemm_l (and symm_r in terms of
symm_l). This is needed when broadcasting during packing because the
alternative--supporting the broadcast of B while also allowing matrix
B to be Hermitian/symmetric--would be an absolute mess.
- Support alignment factors for packed blocks of A, B, and C separately
(as well as for general-purpose buffers). In addition, we support
byte offsets from those alignment values (which is different from
aligning by align+offset bytes to begin with). The default alignment
values are BLIS_PAGE_SIZE in all four cases, with the offset values
defaulting to zero.
- Pass pack_t schema into bli_?packm_cxk() so that it can be then passed
into the packm kernel, where it will be needed by packm kernels that
perform broadcasts of B, since the idea is that we *only* want to
broadcast when packing micropanels of B and not A.
- Added definition for variadic bli_cntx_set_l3_vir_ukrs(), which can be
used to set custom virtual level-3 microkernels in the cntx_t, which
would typically be done in the bli_cntx_init_*() function defined in
the subconfiguration of interest.
- Added a "broadcast B" kernel function for use with NP/NR = 12/6,
defined in ref_kernels/1m/bli_packm_cxk_bb_ref.c.
- Added gemm, gemmtrsm, and trsm "broadcast B" reference kernels
defined in ref_kernels/3/bb. (These kernels have been tested with
double real with NP/NR = 12/6.)
- Added #ifndef ... #endif guards around several macro constants defined
in frame/include/bli_kernel_macro_defs.h.
- Defined a few "broadcast B" static functions in
frame/include/level0/bb for use by "broadcast B"-style packm reference
kernels. For now, only the real domain kernels are tested and fully
defined.
- Output the alignment and offset values for packed blocks of A and B
in the testsuite's "BLIS configuration info" section.
- Comment updates to various files.
- Bumped so_version to 3.0.0.
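Illustration (not the BLIS packm kernel): a minimal sketch of packing a k x nr slice of B while duplicating (broadcasting) each element d times in memory. It assumes a row-major source with leading dimension ldb; pack_b_bcast() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Pack a k x nr slice of row-major B (leading dimension ldb) into p,
   writing every element d times in a row. p must hold k * nr * d doubles.
   The packed panel then has row stride nr * d and column stride d. */
static void pack_b_bcast(const double *b, size_t ldb,
                         size_t k, size_t nr, size_t d, double *p)
{
    assert(b != NULL && p != NULL && d > 0);

    for (size_t i = 0; i < k; ++i)
        for (size_t j = 0; j < nr; ++j)
            for (size_t c = 0; c < d; ++c)
                p[(i * nr + j) * d + c] = b[i * ldb + j];
}
```

A microkernel reading such a panel can then use ordinary vector loads on each group of d copies instead of broadcast-style loads, at the cost of d times the storage and bandwidth for the packed panel.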
commit fd9bf497cd4ff73ccdfc030ba037b3cb2f1c2fad
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 17 15:45:24 2019 -0500
CREDITS file update.
commit 6c8f2d1486ce31ad3c2083e5c2035acfd4409a43
Author: ShmuelLevine <shmuel.levine@gmail.com>
Date: Tue Sep 17 16:43:46 2019 -0400
Fix description for function bli_*pxby2v (#340)
Fix typo in BLISTypedAPI.md for bli_?axpy2v() description.
commit b5679c1520f8ae7637b3cc2313133461f62398dc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 17 14:00:37 2019 -0500
Inserted Multithreading links into BuildSystem.md.
Details:
- Inserted brief disclaimers about default disabled multithreading
and default single-threadedness to BuildSystem.md along with links to
the Multithreading.md document. Thanks to Jeff Diamond for suggesting
these additions.
- Trivial reword of sentence regarding automatically-detected
architectures.
commit f4f5170f8482c94132832eb3033bc8796da5420b
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Sep 11 07:34:48 2019 -0500
Update README.md (#338)
commit 1cfe8e2562e5e50769468382626ce36b734741c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 5 16:08:30 2019 -0500
Reimplemented bli_cpuid_query() for ARM.
Details:
- Rewrote bli_cpuid_query() for ARM architectures to use stdio-based
functions such as fopen() and fgets() instead of popen(). The new code
does more or less the same thing as before--searches /proc/cpuinfo for
various strings, which are then parsed in order to determine the
model, part number, and features. Thanks to Dave Love for suggesting
this change in issue #335.
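Illustration (not the bli_cpuid_query() code): a sketch of the stdio-based approach described above, using fopen() and fgets() to scan a cpuinfo-style file for a key and return the text after the colon. The helper cpuinfo_lookup() and the choice of key are assumptions for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan the file at path for the first line that starts with key and copy
   the text following ':' (minus leading blanks and the newline) into out.
   Returns 1 if the key was found, 0 otherwise. */
static int cpuinfo_lookup(const char *path, const char *key,
                          char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);

    FILE *fp = fopen(path, "r");
    if (fp == NULL) return 0;

    char line[512];
    size_t key_len = strlen(key);
    int found = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (strncmp(line, key, key_len) != 0) continue;

        const char *colon = strchr(line, ':');
        if (colon == NULL) continue;

        /* Skip blanks after the colon, then cut at the end of the line. */
        const char *val = colon + 1;
        while (*val == ' ' || *val == '\t') ++val;
        size_t n = strcspn(val, "\r\n");
        if (n >= out_len) n = out_len - 1;

        memcpy(out, val, n);
        out[n] = '\0';
        found = 1;
        break;
    }

    fclose(fp);
    return found;
}
```

A caller might pass "/proc/cpuinfo" as the path and a key such as "CPU part", then parse the returned string further; no child process is spawned, unlike with popen().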
commit 7c7819145740e96929466a248d6375d40e397e19
Author: Devin Matthews <damatthews@smu.edu>
Date: Fri Aug 30 16:52:09 2019 -0500
Always use sqsumv to compute normfv. (#334)
* Always use sqsumv to compute normfv on MacOS.
* Unconditionally disable the "dot trick" in normfv.
* Added explanatory comment to normfv definition.
Details:
- Added a comment above the unconditional disabling of the dotv-based
implementation to normfv. Thanks to Roman Yurchak, Devin Matthews,
and Isuru Fernando for helping with this improvement.
- CREDITS file update.
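Illustration (not the BLIS sqsumv kernel): a sketch contrasting a scaled sum of squares with a plain dot-product-style accumulation. The commit does not state why the "dot trick" was disabled; the overflow behavior shown here is one well-known difference between the two approaches, offered only as background. The type sqsum_t and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Running result such that sum(x[i]^2) == scale^2 * sumsq. */
typedef struct { double scale; double sumsq; } sqsum_t;

/* Accumulate a scaled sum of squares. Every squared quantity is a ratio
   no larger than 1, so the accumulation itself cannot overflow. The
   2-norm would then be scale * sqrt(sumsq). */
static sqsum_t sumsq_scaled(const double *x, size_t n)
{
    assert(x != NULL || n == 0);

    sqsum_t r = { 0.0, 1.0 };
    for (size_t i = 0; i < n; ++i) {
        double a = (x[i] < 0.0) ? -x[i] : x[i];
        if (a == 0.0) continue;

        if (r.scale < a) {
            /* New largest magnitude: rescale what has been accumulated. */
            double q = r.scale / a;
            r.sumsq = 1.0 + r.sumsq * q * q;
            r.scale = a;
        } else {
            double q = a / r.scale;
            r.sumsq += q * q;
        }
    }
    return r;
}

/* Plain accumulation of x[i]^2, as a dot product of x with itself would do. */
static double sumsq_naive(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += x[i] * x[i];
    return s;
}
```

For a vector with two entries of 1e200, the naive sum overflows to infinity, while the scaled form yields scale = 1e200 and sumsq = 2, from which a finite norm can still be formed.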
commit 80e6c10b72d50863b4b64d79f784df7befedfcd1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 29 12:12:08 2019 -0500
Added reproduction section to Performance docs.
Details:
- Added section titled "Reproduction" to both Performance.md and
PerformanceSmall.md that briefly nudges the motivated reader in the
right direction if he/she wishes to run the same performance
benchmarks used to produce the graphs shown in those documents.
Thanks to Dave Love for making this suggestion.
commit 14cb426414856024b9ae0f84ac21efcc1d329467
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 28 17:04:33 2019 -0500
Updated OpenBLAS, Eigen sup results.
Details:
- Updated the results shown in docs/PerformanceSmall.md for OpenBLAS and
Eigen.
commit b02e0aae8ce2705e91023b98ed416cd05430a78e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 27 14:37:46 2019 -0500
Updated test drivers to iterate backwards.
Details:
- Updated test driver source in test, test/3, test/1m4m, and
test/mixeddt to iterate through the problem space backwards. This
can help avoid certain situations where the CPU frequency does not
immediately throttle up to its maximum. Thanks to Robert van de
Geijn for recommending this fix (originally made to test/sup drivers
in 57e422a).
- Applied off-by-one matlab output bugfix from b6017e5 to test drivers
in test, test/3, test/1m4m, and test/mixeddt directories.
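Illustration (not the test driver source): a sketch of walking a problem-size range from largest to smallest while still knowing each size's forward position. The names p_begin, p_end, and p_inc are hypothetical stand-ins for a driver's range parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Fill out[] with the sizes p_begin, p_begin + p_inc, ..., taken in
   reverse order (largest first), writing at most cap entries. Returns the
   total number of sizes in the range. */
static size_t problem_sizes_desc(size_t p_begin, size_t p_end, size_t p_inc,
                                 size_t *out, size_t cap)
{
    assert(p_inc > 0 && p_begin <= p_end);

    /* Count the sizes first so the loop never decrements an unsigned
       counter below zero. */
    size_t n = (p_end - p_begin) / p_inc + 1;

    for (size_t idx = 0; idx < n && idx < cap; ++idx)
        out[idx] = p_begin + (n - 1 - idx) * p_inc;

    return n;
}
```

Because the size at position idx corresponds to forward index n - 1 - idx, one-based output rows (as matlab expects) would use n - idx; deriving indices this way, rather than from the size itself, stays correct when the increment is 1.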
commit b6017e53f4b26c99b14cdaa408351f11322b1e80
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 27 14:18:14 2019 -0500
Bugfix of output text + tweaks to test/sup driver.
Details:
- Fixed an off-by-one bug in the output of matlab row indices in
test/sup/test_gemm.c that only manifested when the problem size
increment was equal to 1.
- Disabled the building of rrc, rcr, rcc, crr, crc, and ccr storage
combinations for blissup drivers in test/sup. This helps make the
building of drivers complete sooner.
- Trivial changes to test/sup/runme.sh.
commit 138d403b6bb15e687a3fe26d3d967b8ccd1ed97b
Author: Devin Matthews <damatthews@smu.edu>
Date: Mon Aug 26 18:11:27 2019 -0500
Use -funsafe-math-optimizations and -ffp-contract=fast for all reference kernels when using gcc or clang. (#331)
commit d5a05a15a7fcc38fb2519031dcc62de8ea4a530c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 26 16:54:31 2019 -0500
Cropped whitespace from new sup graphs.
Details:
- Previously forgot to crop whitespace from the new .png graphs
added/updated in docs/graphs/sup.
commit a6c80171a353db709e43f9e6e7a3da87ce4d17ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 26 16:51:31 2019 -0500
Fixed contents links in docs/PerformanceSmall.md.
Details:
- Corrected links in contents section of docs/PerformanceSmall.md,
which were erroneously directing readers to the corresponding
sections of docs/Performance.md.
commit 40781774df56a912144ef19cc191ed626a89f0de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 26 16:47:37 2019 -0500
Updated sup performance graphs with libxsmm.
Details:
- Added libxsmm to column-stored sup graphs presented in
docs/PerformanceSmall.md.
- Updated sup results for BLASFEO.
- Added sup results for Lonestar5 (Haswell).
- Addresses issue #326.
commit bfddf671328e7e372ac7228f72ff2d9d8e03ae18
Author: figual <figual@ucm.es>
Date: Mon Aug 26 12:01:33 2019 +0200
Fixed context registration for Cortex A53 (#329).
commit 4a0a6e89c568246d14de4cc30e3ff35aac23d774
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Aug 24 15:25:16 2019 -0500
Changed test/sup alpha to 1; test libxsmm+netlib.
Details:
- Changed the value of alpha to 1.0 in test/sup/test_gemm.c. This is
needed because libxsmm currently only optimizes gemm operations where
alpha is unit (and beta is unit or zero).
- Adjusted the test/sup/Makefile to test libxsmm with netlib BLAS as its
fallback library. This is the library that will be called when the
problem dimensions are deemed too large, or any other criteria for
optimization are not met. (This was done not because it is realistic,
but rather so that it would be very clear when libxsmm ceased handling
gemm calls internally when the data are graphed.)
commit 7aa52b57832176c5c13a48e30a282e09ecdabf73
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 23 16:12:50 2019 -0500
Use libxsmm API in test/sup; add missing -ldl.
Details:
- Switch the driver source in test/sup so that libxsmm_?gemm() is called
instead of ?gemm_() when compiling for / linking against libxsmm.
libxsmm's documentation isn't clear on whether it is even *trying* to
provide BLAS API compatibility, and I got tired of trying to figure it
out.
- Added missing -ldl in LDFLAGS when linking against libxsmm.
commit 57e422aa168bee7416965265c93fcd4934cd7041
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 23 14:17:52 2019 -0500
Added libxsmm support to test/sup drivers.
Details:
- Modified test/sup/Makefile to build drivers that test the performance
of skinny/small problems via libxsmm.
- Modified test/sup/runme.sh to run aforementioned drivers.
- Modified test/sup/test_gemm.c so that problem sizes are tested in
reverse order (from largest to smallest). This can help avoid certain
situations where the CPU frequency does not immediately throttle up
to its maximum. Thanks to Robert van de Geijn for recommending this
fix.
commit 661681fe33978acce370255815c76348f83632bc
Merge: 2f387e32 ef0a1a0f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 22 14:29:50 2019 -0500
Merge branch 'master' of github.com:flame/blis
commit 2f387e32ef5f9a17bafb5076dc9f66c38b52b32d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 22 14:27:30 2019 -0500
Added Eigen -march=native hack to perf docs.
Details:
- Spell out the hack given to me by Sameer Agarwal in order to get Eigen
to build with -march=native (which is critically important for Eigen)
in docs/Performance.md and docs/PerformanceSmall.md.
commit ef0a1a0faf683fe205f85308a54a77ffd68a9a6c
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Aug 21 17:40:24 2019 -0500
Update do_sde.sh (#330)
* Update do_sde.sh
Automatically accept SDE license and download directly from Intel
* Update .travis.yml
[ci skip]
* Update .travis.yml
Enable SDE testing for PRs.
commit 0cd383d53a8c4a6871892a0395591ef5630d4ac0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 21 13:39:05 2019 -0500
Corrected variable type and comment update.
Details:
- Forgot to save all changes from bli_gemmtrsm4m1_ref.c before commit
in 8122f59. Fixed type mismatch and referenced github issue in
comment.
commit 8122f59745db780987da6aa1e851e9e76aa985e0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 21 13:22:12 2019 -0500
Pacify 'restrict' warning in gemmtrsm4m1 ref ukr.
Details:
- Previously, some versions of gcc would complain that the same
pointer, one_r, is being passed in for both alpha and beta in the
fourth call to the real gemm ukernel in bli_gemmtrsm4m1_ref.c. This
is understandable since the compiler knows that the real gemm ukernel
qualifies all of its floating-point arguments (including alpha and
beta) with restrict. A small hack has been inserted into the file
that defines a new variable to store the value 1.0, which is now used
in lieu of one_r for beta in the fourth call to the real gemm ukernel,
which should pacify the compiler now. Thanks to Dave Love for
reporting this issue (#328) and to Devin Matthews for offering his
'restrict' expertise.
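Illustration (not the gemmtrsm4m1 reference kernel): a sketch of the workaround described above, using a toy routine whose scalar arguments are restrict-qualified pointers. The functions axpby() and add_into() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy kernel: y := alpha * x + beta * y. All pointer arguments are
   restrict-qualified, as the changelog says the real gemm ukernel's
   floating-point arguments are. */
static void axpby(size_t n,
                  const double *restrict alpha, const double *restrict x,
                  const double *restrict beta, double *restrict y)
{
    assert(alpha != NULL && beta != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] = (*alpha) * x[i] + (*beta) * y[i];
}

/* y := x + y. Passing the same address for alpha and beta is what some
   compilers warn about, so a second object holding 1.0 is used for beta. */
static void add_into(size_t n, const double *x, double *y)
{
    const double one = 1.0;
    const double one_for_beta = 1.0; /* distinct object, same value */

    axpby(n, &one, x, &one_for_beta, y);
}
```

The extra variable costs nothing at run time and changes no results; it only gives the compiler two distinct objects to reason about.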
commit e8c6281f139bdfc9bd68c3b36e5e89059b0ead2e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 21 12:38:53 2019 -0500
Add -march support for specific gcc version ranges.
Details:
- Added logic to configure that checks the version of the compiler
against known version ranges that could cause problems later in the
build process. For example, versions of gcc older than 4.9.0 use
different -march labels than version 4.9.0 or later
('-march=corei7-avx' vs '-march=sandybridge', respectively).
Similarly, before 6.1, compilation on Zen was possible, but you
need to start with -march=bdver4 and then disable instruction sets
that were discarded during the transition from Excavator to Zen. So
now, configure substitutes 'yes'/'no' values into anchors in
config.mk.in, which sets various make variables (e.g. GCC_OT_4_9_0),
which can be accessed and branched upon by the various
configurations' make_defs.mk files when setting their compiler flags.
- Updated config/haswell/make_defs.mk to branch on GCC_OT_4_9_0.
- Updated config/sandybridge/make_defs.mk to branch on GCC_OT_4_9_0.
- Updated config/zen/make_defs.mk to branch on GCC_OT_6_1_0.
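Illustration (not the configure logic, which is a shell script feeding make variables): a C sketch of the "older than" comparison that a flag such as GCC_OT_4_9_0 encodes, and of choosing between the two -march spellings quoted above. The functions version_older_than() and sandybridge_march() are hypothetical.

```c
#include <assert.h>

/* Return 1 if version (maj.min.pat) is strictly older than the reference
   version (rmaj.rmin.rpat), 0 otherwise. */
static int version_older_than(int maj, int min, int pat,
                              int rmaj, int rmin, int rpat)
{
    assert(maj >= 0 && min >= 0 && pat >= 0);

    if (maj != rmaj) return maj < rmaj;
    if (min != rmin) return min < rmin;
    return pat < rpat;
}

/* Pick the -march label for Sandy Bridge based on the gcc version, using
   the 4.9.0 cutoff mentioned in the entry above. */
static const char *sandybridge_march(int maj, int min, int pat)
{
    return version_older_than(maj, min, pat, 4, 9, 0)
               ? "-march=corei7-avx"
               : "-march=sandybridge";
}
```

Comparing the components in order (major, then minor, then patch) avoids the pitfall of comparing version strings lexically, where "4.10.0" would sort before "4.9.0".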
commit e6ac4ebcb6e6a372820e7f509c0af3342966b84a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 20 13:49:47 2019 -0500
Added page size, source location to perf docs.
Details:
- Added the page size, as returned via 'getconf -a | grep PAGE_SIZE',
and the location of the performance drivers to docs/Performance.md
(test/3) and docs/PerformanceSmall.md (test/sup). Thanks to Dave
Love for suggesting these additions in #325.
commit fdce1a5648d69034fab39943100289323011c36f
Author: Meghana <Meghana.Vankadari@amd.com>
Date: Wed Jul 24 15:04:41 2019 +0530
changed gcc version check condition from 'ifeq' to 'if greater or equal'
Change-Id: Ie4c461867829bcc113210791bbefb9517e52c226
commit c9486e0c4f82cd9f58f5ceb71c0df039e9970a20
Author: Meghana <Meghana.Vankadari@amd.com>
Date: Wed Jul 24 09:45:17 2019 +0530
code to detect version of gcc and set flags accordingly for zen2
Change-Id: I29b0311d0000dee1a2533ee29941acf53f9e9f34
commit 54afe3dfe6828a1aff65baabbf14c98d92e50692
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 23 16:54:28 2019 -0500
Added "Education and Learning" ToC entry to README.
commit 9f53b1ce7ac702e84e71801fe96986f6aa16040e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 23 16:50:35 2019 -0500
Added "Education and Learning" section to README.
Details:
- Added a short section after the Intro of the README.md file titled
"Education and Learning" that directs interested readers to the
"LAFF-On Programming for High-Performance" massive open online course
(MOOC) hosted via edX.
commit deda4ca8a094ee18d7c7c45e040e8ef180f33a48
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 22 13:59:05 2019 -0500
Added test/1m4m driver directory.
Details:
- Added a new standalone test driver directory named '1m4m' that can
build and run performance experiments for BLIS 1m, 4m1a, assembly,
OpenBLAS, and the vendor library (MKL). This new driver directory
was used to regenerate performance results for the 1m paper.
- Added alternate (commented-out) cache blocksizes to
config/haswell/bli_cntx_init_haswell.c. These blocksizes tend to
work well on a 12-core Intel Xeon E5-2650 v3.
commit dcc0ce12fde4c6dca2b4764a1922a2ab19725867
Author: Meghana <Meghana.Vankadari@amd.com>
Date: Mon Jul 22 17:12:01 2019 +0530
Added a global Makefile for AMD architectures in config/zen folder
This Makefile (amd_config.mk) has all the flags that are common to EPYC series
Change-Id: Ic02c60a8293ccdd37f0f292e631acd198e6895de
commit af17bca26a8bd3dcbee8ca81c18d7b25de09c483
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 19 14:46:23 2019 -0500
Updated haswell MC cache blocksizes.
Details:
- Updated the default MC cache blocksizes used by the haswell subconfig
for both row-preferential (the default) and column-preferential
microkernels.
commit b5e9bce4dde5bf014dd9771ae741048e1f6c7748
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 19 14:42:37 2019 -0500
Updated -march flags for sandybridge, haswell.
Details:
- Updated the '-march=corei7-avx' flag in the sandybridge subconfig
to '-march=sandybridge' and the '-march=core-avx2' flag in the
haswell subconfig to '-march=haswell'. The older flags were used
by older versions of gcc and should have been updated to the newer
forms a long time ago. (The older flags were clearly working, even
though they are no longer documented in the gcc man page.)
commit c22b9dba5859a9fc94c8431eccc9e4eb9be02be1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 16 13:14:47 2019 -0500
More updates to comments in testsuite modules.
Details:
- Updated most comments in testsuite modules that describe how the
correctness test is performed so that it is clear whether the vector
(normfv) or matrix (normfm) form of Frobenius norm is used.
commit c4cc6fa702f444a05963db01db51bc7d6669e979
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 16 13:00:35 2019 -0500
New cntx_t blksz "set" functions + misc tweaks.
Details:
- Defined two new static functions in bli_cntx.h:
bli_cntx_set_blksz_def_dt()
bli_cntx_set_blksz_max_dt()
which developers may find convenient when experimenting with different
values of cache blocksizes.
- Updated one- and two-socket multithreaded problem size range and
increment values in test/3/Makefile.
- Changed default to column storage in test/3/test_gemm.c.
- Fixed typo in comment in testsuite/src/test_subm.c.
commit b84cee29f42855dc1f263e42b83b1a46ac8def87
Merge: 1f80858a c7dd6e6c
Author: Meghana Vankadari <Meghana.Vankadari@amd.com>
Date: Mon Jul 8 02:03:07 2019 -0400
Merge "Added compiler flags for vanilla clang" into amd-staging-rome2.0
commit 1f80858abf5ca220b2998fbe6f9b06c32d3864c3
Author: kdevraje <kiran.Devrajegowda@amd.com>
Date: Fri Jul 5 16:05:11 2019 +0530
This check-in fixes the dgemm performance issue (Jira ticket CPUPL 458): an #else was missed during integration, so the code always followed the else path to get the block sizes
Change-Id: I0084b5856c2513ab1066c08c15b5086db6532717
commit c7dd6e6cd2f910cbefcdc1e04a5adeb919a23de0
Author: Meghana <meghana.vankadari@amd.com>
Date: Thu Jul 4 09:32:51 2019 +0530
Added compiler flags for vanilla clang
Change-Id: I13c00b4c0d65bbda4c929848fd48b0ab611952ab
commit 2acd49b76457635625a01e31c2abc8902b23cf51
Author: Meghana <meghana.vankadari@amd.com>
Date: Mon Jul 1 15:42:38 2019 +0530
fix for test failures using AOCC 2.0
Change-Id: If44eaccc64bbe96bbbe1d32279b1b5773aba08d1
commit ceee2f973ebe115beca55ca77f9e3ce36b14c28a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 24 17:47:40 2019 -0500
Fixed thrinfo_t printing bug for small problems.
Details:
- Fixed a bug in bli_l3_thrinfo_print_gemm_paths() and
bli_l3_thrinfo_print_trsm_paths(), defined in bli_l3_thrinfo.c,
whereby subnodes of the thrinfo_t tree are "dereferenced" near the
beginning of the functions, which may lead to segfaults in certain
situations where the thread tree was not fully formed because the
matrix problem was too small for the level of parallelism specified.
(That is, too small because some problems were assigned no work due
to the smallest units in the m and n dimensions being defined by the
register blocksizes mr and nr.) The fix requires several nested levels
of if statements, and this is one of those few instances where use of
goto statements results in (mostly) prettier code, especially in the
case of _gemm_paths(). And while it wasn't necessary, I ported this
goto usage to the loop body that prints the thrinfo_t work_id and
comm_id values for each thread. Thanks to Nicholai Tukanov for helping
to find this bug.
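Note: the goto-based guard described above might look roughly like the following sketch. The node type, field names, and function name are hypothetical stand-ins and are not the actual thrinfo_t definitions. The point is only that each subnode is checked before it is dereferenced, and that a partially formed tree exits through a single label.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one node of a thread-info tree. */
typedef struct toy_node_s
{
    int                work_id;
    struct toy_node_s* sub_node; /* NULL if this level was never formed */
} toy_node_t;

/* Prints the work_id of each level that exists and returns how many
   levels were found. No subnode is dereferenced before it is checked. */
int toy_print_path( const toy_node_t* jc, FILE* out )
{
    const toy_node_t* pc;
    const toy_node_t* ic;
    int               levels = 0;

    if ( jc == NULL ) goto finish;
    fprintf( out, "jc work_id: %d\n", jc->work_id );
    levels = 1;

    pc = jc->sub_node;
    if ( pc == NULL ) goto finish;
    fprintf( out, "pc work_id: %d\n", pc->work_id );
    levels = 2;

    ic = pc->sub_node;
    if ( ic == NULL ) goto finish;
    fprintf( out, "ic work_id: %d\n", ic->work_id );
    levels = 3;

finish:
    fprintf( out, "levels formed: %d\n", levels );
    return levels;
}
```

The same checks written as nested if statements would indent one level deeper per tree level, which is the readability cost the entry refers to.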
commit cac127182dd88ed0394ad81e6b91b897198e168a
Merge: 565fa385 3a45ecb1
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date: Mon Jun 24 13:01:27 2019 +0530
Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis
with public repo commit id 565fa3853b381051ac92cff764625909d105644d.
Change-Id: I68b9824b110cf14df248217a24a6191b3df79d42
commit c152109e9a3b1cd74760e8a3215a676d25c18d2e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 19 13:23:24 2019 -0500
Updated BLASFEO results in PerformanceSmall.md.
Details:
- Updated the BLASFEO performance graphs shown in PerformanceSmall.md
using a new commit of BLASFEO (2c9f312); updated PerformanceSmall.md
accordingly.
- Updated test/sup/octave/plot_l3sup_perf.m so that the .m files
containing the mpnpkp results do not need to be preprocessed in order
to plot half the problem size range (ie: up to 400 instead of the
800 range of the other shape cases).
- Trivial updates to runme.m.
commit 4d19c98110691d33ecef09d7e1b97bd1ccf4c420
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jun 8 11:02:03 2019 -0500
Trivial change to MixedDatatypes.md link text.
commit 24965beabe83e19acf62008366097a7f198d4841
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jun 8 11:00:22 2019 -0500
Fixed typo in README.md's MixedDatatypes.md link.
commit 50dc5d95760f41c5117c46f754245edc642b2179
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 7 13:10:16 2019 -0500
Adjust -fopenmp-simd for icc's preferred syntax.
Details:
- Use -qopenmp-simd instead of -fopenmp-simd when compiling with Intel
icc. Recall that this option is used for SIMD auto-vectorization in
reference kernels only. Support for the -f option has been completely
deprecated and removed in newer versions of icc in favor of -q. Thanks
to Victor Eijkhout for reporting this issue and suggesting the fix.
commit ad937db9507786874c801b41a4992aef42d924a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 7 11:34:08 2019 -0500
Added missing #include "bli_family_thunderx2.h".
Details:
- Added a cpp-conditional directive block to bli_arch_config.h that
#includes "bli_family_thunderx2.h". The code has been missing since
adf5c17f. However, this never manifested as an error because the file
is virtually empty and not needed for thunderx2 (or most subconfigs).
Thanks to Jeff Diamond for helping to spot this.
commit ce671917b2bc24895289247feef46f6fdd5020e7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 6 14:17:21 2019 -0500
Fixed formatting/typo in docs/PerformanceSmall.md.
commit 86c33a4eb284e2cf3282a1809be377785cdb3703
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 5 11:43:55 2019 -0500
Tweaked language in README.md related to sup/AMD.
commit cbaa22e1ca368d36a8510f2b4ecd6f1523d1e1f3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 4 16:06:58 2019 -0500
Added BLASFEO results to docs/PerformanceSmall.md.
Details:
- Updated the graphs linked in PerformanceSmall.md with BLASFEO results,
and added documenting language accordingly.
- Updated scripts in test/sup/octave to plot BLASFEO data.
- Minor tweak to language re: how OpenBLAS was configured for
docs/Performance.md.
commit 763fa39c3088c0e2c0155675a3ca868a58bffb30
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 4 14:46:45 2019 -0500
Minor tweaks to test/sup.
Details:
- Changed starting problem and increment from 16 to 4.
- Added 'lll' (square problems) to list of problem size shapes to
compile and run with.
- Define BLASFEO location and added BLASFEO-related definitions.
commit 5e1e696003c9151b1879b910a1957b7bdd7b0deb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 3 18:37:20 2019 -0500
CHANGELOG update (0.6.0)
commit 18c876b989fd0dcaa27becd14e4f16bdac7e89b3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 3 18:37:19 2019 -0500
Version file update (0.6.0)
commit 0f1b3bf49eb593ca7bb08b68a7209f7cd550f912
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 3 18:35:19 2019 -0500
ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
- CREDITS file update.
commit 27da2e8400d900855da0d834b5417d7e83f21de1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 3 17:14:56 2019 -0500
Minor edits to docs/PerformanceSmall.md.
Details:
- Added performance analysis to "Comments" section of both Kaby Lake and
Epyc sections.
- Added emphasis to certain passages.
commit 09ba05c6f87efbaadf085497dc137845f16ee9c5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 3 16:53:19 2019 -0500
Added sup performance graphs/document to 'docs'.
Details:
- Added a new markdown document, docs/PerformanceSmall.md, which
publishes new performance graphs for Kaby Lake and Epyc showcasing
the new BLIS sup (small/skinny/unpacked) framework logic and kernels.
For now, only single-threaded dgemm performance is shown.
- Reorganized graphs in docs/graphs into docs/graphs/large, with new
graphs being placed in docs/graphs/sup.
- Updates to scripts in test/sup/octave, mostly to allow decent output
in both GNU octave and Matlab.
- Updated README.md to mention and refer to the new PerformanceSmall.md
document.
commit 6bf449cc6941734748034de0e9af22b75f1d6ba1
Merge: abd8a9fa a4e8801d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 31 17:42:40 2019 -0500
Merge branch 'amd'
commit a4e8801d08d81fa42ebea6a05a990de8dcedc803
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 31 17:30:51 2019 -0500
Increased MT sup threshold for double to 201.
Details:
- Fine-tuned the double-precision real MT threshold (which controls
whether the sup implementation kicks in for smaller m dimension values)
from 180 to 201 for haswell and 180 to 256 for zen.
- Updated octave scripts in test/sup/octave to include a seventh column
to display performance for m = n = k.
commit 3a45ecb15456249c30ccccd60e42152f355615c1
Merge: 3f867c96 b69fb0b7
Author: Kiran Devrajegowda <Kiran.Devrajegowda@amd.com>
Date: Fri May 31 06:47:02 2019 -0400
Merge "Added back BLIS_ENABLE_ZEN_BLOCK_SIZES macro to zen configuration, this is same as release 1.3. This was added before to improve DGEMM Multithreaded scalability on Naples for when number of threads is greater than 16. By mistake this got deleted in many changes done for 2.0 release, now we are adding this change back., in bli_gemm_front.c - code cleanup" into amd-staging-rome2.0
commit b69fb0b74a4756168de270fc9b18f7cf7aa57f17
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Fri May 31 15:14:22 2019 +0530
Added back BLIS_ENABLE_ZEN_BLOCK_SIZES macro to zen configuration, this is same as release 1.3. This was added before to improve DGEMM Multithreaded scalability on Naples for when number of threads is greater than 16. By mistake this got deleted in many changes done for 2.0 release, now we are adding this change back., in bli_gemm_front.c - code cleanup
Change-Id: I9f5d8225254676a99c6f2b09a0825e545206d0fc
commit 3f867c96caea3bbbbeeff1995d90f6cf8c9895fb
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date: Fri May 31 12:22:44 2019 +0530
When running HPL with pure MPI without DGEMM Threading (Single Threaded BLIS), making this macro 1 gives best performance.
Change-Id: I24fd0bf99216f315e49f1c74c44c3feaffd7078d
commit abd8a9fa7df4569aa2711964c19888b8e248901f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 28 12:49:44 2019 -0500
Inadvertently hidden xerbla_() in blastest (#313).
Details:
- Attempted a fix to issue #313, which reports that when building only
a shared library (ie: static library build is disabled), running the
BLAS test drivers can fail because those drivers provide their own
local version of xerbla_() as a clever (albeit still rather hackish)
way of checking the error codes that result from the individual tests.
This local xerbla_() function is never found at link-time because the
BLAS test drivers' Makefile imports BLIS compilation flags via the
get-user-cflags-for() function, which currently conveys the
-fvisibility=hidden flag, which hides symbols unless they are
explicitly annotated for export. The -fvisibility=hidden flag was
only ever intended for use when building BLIS (not for applications),
and so the attempted solution here is to omit the symbol export
flag(s) from get-user-cflags-for() by storing the symbol export
flag(s) to a new BUILD_SYMFLAGS variable instead of appending it
to the subconfigurations' CMISCFLAGS variable (which is returned by
every get-*-cflags-for() function). Thanks to M. Zhou for reporting
this issue and also to Isuru Fernando for suggesting the fix.
- Renamed BUILD_FLAGS to BUILD_CPPFLAGS to harmonize with the newly
created BUILD_SYMFLAGS.
- Fixed typo in entry for --export-shared flag in 'configure --help'
text.
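Note: as a rough illustration of what "explicitly annotated for export" means when a library is compiled with -fvisibility=hidden, consider the sketch below. The macro and function names are hypothetical and are not the annotation macros BLIS itself uses. The visibility attribute is a GCC/Clang extension, so the macros expand to nothing elsewhere.

```c
/* Hypothetical export macros; BLIS's own annotation macros are not shown
   here. The attribute is a GCC/Clang extension. */
#if defined(__GNUC__)
#define TOY_EXPORT __attribute__((visibility("default")))
#define TOY_HIDDEN __attribute__((visibility("hidden")))
#else
#define TOY_EXPORT
#define TOY_HIDDEN
#endif

/* Internal helper: not visible outside the shared library. */
TOY_HIDDEN int toy_internal_scale( int x )
{
    return 2 * x;
}

/* Public entry point: stays visible even under -fvisibility=hidden. */
TOY_EXPORT int toy_public_entry( int x )
{
    return toy_internal_scale( x ) + 1;
}
```

An application compiled with the same hidden-by-default flag would need a similar annotation on any function it expects other objects to find, which is why the flag was kept out of the flags handed to applications.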
commit 13806ba3b01ca0dd341f4720fb930f97e46710b0
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date: Mon May 27 16:24:43 2019 +0530
This check-in updates the copyright information, which is changed to (start year) - 2019
Change-Id: Ide3c8f7172210b8d3538d3c36e88634ab1ba9041
commit ee123f535872510f77100d3d55a43d4ca56047d5
Author: Meghana <meghana.vankadari@amd.com>
Date: Mon May 27 15:36:44 2019 +0530
Defined small matrix thresholds for TRSM for various cases for NAPLES and ROME
Updated copyright information for kernels/zen/bli_trsm_small.c file
Removed separate kernels for zen2 architecture
Instead added threshold conditions in zen kernels both for ROME and NAPLES
Change-Id: Ifd715731741d649b6ad16b123a86dbd6665d97e5
commit 9d93a4caa21402d3a90aac45d7a1603736c9fd63
Author: prangana <pradeep.rao@amd.com>
Date: Fri May 24 17:59:13 2019 +0530
update version 2.0
commit 755730608d923538273a90c48bfdf77571f86519
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 23 17:34:36 2019 -0500
Minor rewording of language around mt env. vars.
commit ba31abe73c97c16c78fffc59a215761b8d9fd1f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 23 14:59:53 2019 -0500
Added BLIS threading info to Performance.md.
Details:
- Documented the BLIS environment variables that were set
(e.g. BLIS_JC_NT, BLIS_IC_NT, BLIS_JR_NT) for each machine and
threading configuration in order to achieve the parallelism reported
on in docs/Performance.md.
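Note: the sketch below shows one way an application could read the three variables named above and compute the resulting thread count as the product of the per-loop ways of parallelism. The helper names are hypothetical, and treating an unset or unparsable variable as 1 is an assumption of this sketch rather than a statement about how BLIS parses its environment.

```c
#include <stdlib.h>

/* Parses one "ways of parallelism" value. Assumption of this sketch:
   a missing, non-numeric, or non-positive value counts as 1. */
long toy_parse_ways( const char* value )
{
    char* end = NULL;
    long  ways;

    if ( value == NULL ) return 1;

    ways = strtol( value, &end, 10 );
    if ( end == value || ways < 1 ) return 1;

    return ways;
}

/* Multiplies the ways requested for the three loops named in the entry. */
long toy_total_threads( void )
{
    return toy_parse_ways( getenv( "BLIS_JC_NT" ) )
         * toy_parse_ways( getenv( "BLIS_IC_NT" ) )
         * toy_parse_ways( getenv( "BLIS_JR_NT" ) );
}
```

For example, BLIS_JC_NT=2, BLIS_IC_NT=4, and BLIS_JR_NT=1 would give 8 threads under this model.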
commit cb788ffc89cac03b44803620412a5e83450ca949
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 23 13:00:53 2019 -0500
Increased MT sup threshold for double to 180.
Details:
- Increased the double-precision real MT threshold (which controls
whether the sup implementation kicks in for smaller m dimension values)
from 80 to 180, and this change was made for both haswell and zen
subconfigurations. This is less about the m dimension in particular
and more about facilitating a smoother performance transition when
m = n = k.
commit 057f5f3d211e7513f457ee6ca6c9555d00ad1e57
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 23 12:51:17 2019 -0500
Minor build system housekeeping.
Details:
- Commented out redundant setting of LIBBLIS_LINK within all driver-
level Makefiles. This variable is already set within common.mk, and
so the only time it should be overridden is if the user wants to link
to a different copy of libblis.
- Very minor changes to build/gen-make-frags/gen-make-frag.sh.
- Whitespace and inconsequential quoting change to configure.
- Moved top-level 'windows' directory into a new 'attic' directory.
commit e05171118c377f356f89c4daf8a0d5ddc5a4e4f7
Author: Meghana <meghana.vankadari@amd.com>
Date: Thu May 23 16:15:27 2019 +0530
Implemented TRSM for small matrices for cases where A is on the right
Added separate kernels for zen and zen2
Change-Id: I6318ddc250cf82516c1aa4732718a35eae0c9134
commit 02920f5c480c42706b487e37b5ecc96c3555b851
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date: Thu May 23 15:29:59 2019 +0530
make checkblis fails for matrix dimension check at the beginning hence reverting it
Change-Id: Ibd2ee8c2d4914598b72003fbfc5845be9c9c1e87
commit 84215022f29fb3bfedd254d041635308d177e6c0
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date: Thu May 23 11:08:41 2019 +0530
Adding threshold condition to dgemm small matrix kernels, defining the constants in zen2 configuration
Change-Id: I53a58b5d734925a6fcb8d8bea5a02ddb8971fcd5
commit a3554eb1dcc1b5b94d81c60761b2f01c3d827ffa
Merge: ea082f83 17b878b6
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date: Thu May 23 11:51:07 2019 +0530
Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis to configure zen2
Change-Id: I97e17bca9716b80b862925f97bb513c07b4b0cae
commit ea082f839071dd9ec555062dc3851c31d12f00e4
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date: Thu May 23 10:38:29 2019 +0530
adding empty zen2 directory with .gitignore file
Change-Id: Ifa37cf54b2578aa19ad335372b44bca17043fe4b
commit b80bd5bcb2be8551a9a21fafc8e6c8b6336c99b5
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Tue May 21 15:11:47 2019 +0530
config/zen/bli_cntx_init_zen.c: removed BLIS_ENABLE_ZEN_BLOCK_SIZES macro. We have different configurations for both zen and zen2
config/zen/bli_family_zen.h: deleted macro BLIS_ENABLE_ZEN_BLOCK_SIZES
config/zen/make_defs.mk: removed compiler flag -mno-avx256-split-unaligned-store
frame/base/bli_cpuid.c: ROME family is 17H but model # is from 0x30H.
test/test_gemm.c - commented out #define FILE_IN_OUT (some compilation error when BLIS is configured as amd64)
Now we can use a single configuration, ./configure amd64, which works for both ROME and Naples
Change-Id: I91b4fc35380f8a35b4f4c345da040c6b5910b4a2
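Note: the family/model rule stated above (ROME shares family 17h but its model numbers start at 0x30) can be sketched as below. The function name and two-argument signature are hypothetical simplifications of the real detection routine, and no upper bound on the model number is applied because none is given in this entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the (family, model) pair falls in the range this
   entry attributes to ROME: family 0x17 with model 0x30 or higher. */
bool toy_cpuid_is_zen2( uint32_t family, uint32_t model )
{
    if ( family != 0x17 ) return false;

    return model >= 0x30;
}
```

Under this rule, a family 0x17 part with a model below 0x30 is not classified as zen2 and would fall through to whatever check follows it.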
commit a042db011df9a1c3e7c7ac546541f4746b176ea5
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Mon May 20 14:17:32 2019 +0530
Modified make_defs.mk for zen2 to get compiled by gcc version less than gcc9.0
Change-Id: I8fcac30538ee39534c296932639053b47b9a2d43
commit a23f92594cf3d530e5794307fe97afc877d853b7
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Mon May 20 10:48:06 2019 +0530
config_registry: New AMD zen2 architecture configuration added.
frame/base/bli_arch.c: #ifdef BLIS_FAMILY_ZEN2 id = BLIS_ARCH_ZEN2; #endif added. zen2 is added in config_name[BLIS_NUM_ARCHS]
frame/base/bli_cpuid.c : #ifdef BLIS_CONFIG_ZEN2 if ( bli_cpuid_is_zen2( family, model, features ) ) return BLIS_ARCH_ZEN2; #endif, defined new function bool bli_cpuid_is_zen2(...).
frame/base/bli_cpuid.h : declared bli_cpuid_is_zen2(..).
frame/base/bli_gks.c : #ifdef BLIS_CONFIG_ZEN2 bli_gks_register_cntx(BLIS_ARCH_ZEN2, bli_cntx_init_zen2, bli_cntx_init_zen2_ref, bli_cntx_init_zen2_ind); #endif
frame/include/bli_arch_config.h : #ifdef BLIS_CONFIG_ZEN2 CNTX_INIT_PROTS(zen2) #endif #ifdef BLIS_FAMILY_ZEN2 #include "bli_family_zen2.h" #endif
frame/include/bli_type_defs.h : added BLIS_ARCH_ZEN2 in arch_t enum. BLIS_NUM_ARCHS 20
Change-Id: I2a2d9b7266673e78a4f8543b1bfb5425b0aa7866
commit 17b878b66d917d50b6fe23721d8579e826cb3e8c
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date: Wed May 22 14:02:53 2019 +0530
adding license same as in ut-austin-amd-branch
Change-Id: I6790768d2bf5d42369d304ef93e34701f95fbaff
commit df755848b8a271323e007c7a628c64af63deab00
Merge: ca4b33c0 c72ae27a
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date: Wed May 22 13:30:07 2019 +0530
Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis into rome2.0
Change-Id: Ie8aad1ab810f0f3c0b90ec67f9dd3dfb8dcc74cc
commit c72ae27adee4726679ee004d02c972582b5285b4
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Mon Mar 19 12:49:26 2018 +0530
Re-enabling the small matrix gemm optimization for target zen
Change-Id: I13872784586984634d728cd99a00f71c3f904395
commit ab0818af80f7f683080873f3fa24734b65267df2
Author: sraut <Biplab.Raut@amd.com>
Date: Wed Oct 3 15:30:33 2018 +0530
Review comments incorporated for small TRSM.
Change-Id: Ia64b7b2c0375cc501c2cb0be8a1af93111808cd9
commit 32392cfc72af7f42da817a129748349fb1951346
Author: Jeff Hammond <jeff.r.hammond@intel.com>
Date: Tue May 14 15:52:30 2019 -0400
add info about CXX in configure (#311)
commit fa7e6b182b8365465ade178b0e4cd344ff6f6460
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 1 19:13:00 2019 -0500
Define _POSIX_C_SOURCE in bli_system.h.
Details:
- Added
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
to bli_system.h so that an application that uses BLIS (specifically,
an application that #includes blis.h) does not need to remember to
#define the macro itself (either on the command line or in the code
that includes blis.h) in order to activate things like the pthreads.
Thanks to Christos Psarras for reporting this issue and suggesting
this fix.
- Commented out #include <sys/time.h> in bli_system.h, since I don't
think this header is used/needed anymore.
- Comment update to function macro for bli_?normiv_unb_var1() in
frame/util/bli_util_unb_var1.c.
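Note: the guard quoted above only supplies a value when none has been set, so an application that defines the macro itself keeps its own value. A minimal sketch of that behavior follows. The function name is hypothetical, and the guard must precede the first system header for it to influence which declarations those headers expose.

```c
/* Same guard as in the entry above: define the macro only if the
   application (or the compiler command line) has not already done so. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

/* Reports the feature level in effect for this translation unit. */
long toy_posix_level( void )
{
    return (long)_POSIX_C_SOURCE;
}
```

A caller can compare the returned value against 200809L to confirm that the 2008 POSIX interfaces were requested.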
commit 3df84f1b5d5e1146bb01bfc466ac20c60a9cc859
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 27 21:27:32 2019 -0500
Minor bugfixes in sup dgemm implementation.
Details:
- Fixed an obscure bug in the bli_dgemmsup_rv_haswell_asm_5x8n() kernel
that only affected the beta == 0, column-storage output case. Thanks
to the BLAS test drivers for catching this bug.
- Previously, bli_gemmsup_ref_var1n() and _var2m() were returning if
k = 0, when the correct action would be to scale by beta (and then
return). Thanks to the BLAS test drivers for catching this bug.
- Changed the sup threshold behavior such that the sup implementation
only kicks in if a matrix dimension is strictly less than (rather than
less than or equal to) the threshold in question.
- Initialize all thresholds to zero (instead of 10) by default in
ref_kernels/bli_cntx_ref.c. This, combined with the above change to
threshold testing means that calls to BLIS or BLAS with one or more
matrix dimensions of zero will no longer trigger the sup
implementation.
- Added disabled debugging output to frame/3/bli_l3_sup.c (for future
use, perhaps).
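Note: the interaction between the strict comparison and the zero default thresholds can be sketched as follows. Both function names are hypothetical, and the code models only the threshold test described in this entry, not the rest of the sup front-end.

```c
#include <stdbool.h>

/* New behavior: sup is considered only if some dimension is strictly
   less than its threshold. */
bool toy_sup_applies( long m, long n, long k,
                      long m_thresh, long n_thresh, long k_thresh )
{
    return m < m_thresh || n < n_thresh || k < k_thresh;
}

/* Previous behavior, kept for comparison: less than or equal to. */
bool toy_sup_applies_old( long m, long n, long k,
                          long m_thresh, long n_thresh, long k_thresh )
{
    return m <= m_thresh || n <= n_thresh || k <= k_thresh;
}
```

With all thresholds at zero, a problem with m = 0 satisfies the old test (0 <= 0) but not the new one (0 < 0), so zero-dimension calls no longer reach the sup code.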
commit ecbdd1c42dcebfecd729fe351e6bb0076aba7d81
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 27 19:38:11 2019 -0500
Ceased use of BLIS_ENABLE_SUP_MR/NR_EXT macros.
Details:
- Removed already limited use of the BLIS_ENABLE_SUP_MR_EXT and
BLIS_ENABLE_SUP_NR_EXT macros in bli_gemmsup_ref_var1n() and
bli_gemmsup_ref_var2m(). Their purpose was merely to avoid a long
conditional that would determine whether to allow the last iteration
to be merged with the second-to-last iteration. Functionally, the
macros were not needed, and they ended up causing problems when
building configuration families such as intel64 and x86_64.
commit aa8a6bec3036a41e1bff2034f8ef6766a704ec49
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 27 18:53:33 2019 -0500
Fixed typo in --disable-sup-handling macro guard.
Details:
- Fixed an incorrectly-named macro guard that is intended to allow
disabling of the sup framework via the configure option
--disable-sup-handling. In this case, the preprocessor macro,
BLIS_DISABLE_SUP_HANDLING, was still named by its name from an older
uncommitted version of the code (BLIS_DISABLE_SM_HANDLING).
commit b9c9f03502c78a63cfcc21654b06e9089e2a3822
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 27 18:44:50 2019 -0500
Implemented gemm on skinny/unpacked matrices.
Details:
- Implemented a new sub-framework within BLIS to support the management
of code and kernels that specifically target matrix problems for which
at least one dimension is deemed to be small, which can result in long
and skinny matrix operands that are ill-suited for the conventional
level-3 implementations in BLIS. The new framework tackles the problem
in two ways. First the stripped-down algorithmic loops forgo the
packing that is famously performed in the classic code path. That is,
the computation is performed by a new family of kernels tailored
specifically for operating on the source matrices as-is (unpacked).
Second, these new kernels will typically (and in the case of haswell
and zen, do in fact) include separate assembly sub-kernels for
handling of edge cases, which helps smooth performance when performing
problems whose m and n dimensions are not naturally multiples of the
register blocksizes. In a reference to the sub-framework's purpose of
supporting skinny/unpacked level-3 operations, the "sup" operation
suffix (e.g. gemmsup) is typically used to denote a separate namespace
for related code and kernels. NOTE: Since the sup framework does not
perform any packing, it targets row- and column-stored matrices A, B,
and C. For now, if any matrix has non-unit strides in both dimensions,
the problem is computed by the conventional implementation.
- Implemented the default sup handler as a front-end to two variants.
bli_gemmsup_ref_var2() provides a block-panel variant (in which the
2nd loop around the microkernel iterates over n and the 1st loop
iterates over m), while bli_gemmsup_ref_var1() provides a panel-block
variant (2nd loop over m and 1st loop over n). However, these variants
are not used by default and provided for reference only. Instead, the
default sup handler calls _var2m() and _var1n(), which are similar
to _var2() and _var1(), respectively, except that they defer to the
sup kernel itself to iterate over the m and n dimension, respectively.
In other words, these variants rely not on microkernels, but on
so-called "millikernels" that iterate along m and k, or n and k.
The benefit of using millikernels is a reduction of function call
and related (local integer typecast) overhead as well as the ability
for the kernel to know which micropanel (A or B) will change during
the next iteration of the 1st loop, which allows it to focus its
prefetching on that micropanel. (In _var2m()'s millikernel, the upanel
of A changes while the same upanel of B is reused. In _var1n()'s, the
upanel of B changes while the upanel of A is reused.)
- Added a new configure option, --[en|dis]able-sup-handling, which is
enabled by default. However, the default thresholds at which the
default sup handler is activated are set to zero for each of the m, n,
and k dimensions, which effectively disables the implementation. (The
default sup handler only accepts the problem if at least one dimension
is smaller than or equal to its corresponding threshold. If all
dimensions are larger than their thresholds, the problem is rejected
by the sup front-end and control is passed back to the conventional
implementation, which proceeds normally.)
- Added support to the cntx_t structure to track new fields related to
the sup framework, most notably:
- sup thresholds: the thresholds at which the sup handler is called.
- sup handlers: the address of the function to call to implement
the level-3 skinny/unpacked matrix implementation.
- sup blocksizes: the register and cache blocksizes used by the sup
implementation (which may be the same or different from those used
by the conventional packm-based approach).
- sup kernels: the kernels that the handler will use in implementing
the sup functionality.
- sup kernel prefs: the IO preference of the sup kernels, which may
differ from the preferences of the conventional gemm microkernels'
IO preferences.
- Added a bool_t to the rntm_t structure that indicates whether sup
handling should be enabled/disabled. This allows per-call control
of whether the sup implementation is used, which is useful for test
drivers that wish to switch between the conventional and sup codes
without having to link to different copies of BLIS. The corresponding
accessor functions for this new bool_t are defined in bli_rntm.h.
- Implemented several row-preferential gemmsup kernels in a new
directory, kernels/haswell/3/sup. These kernels include two general
implementation types--'rd' and 'rv'--for the 6x8 base shape, with
two specialized millikernels that embed the 1st loop within the kernel
itself.
- Added ref_kernels/3/bli_gemmsup_ref.c, which provides reference
gemmsup microkernels. NOTE: These microkernels, unlike the current
crop of conventional (pack-based) microkernels, do not use constant
loop bounds. Additionally, their inner loop iterates over the k
dimension.
- Defined new typedef enums:
- stor3_t: captures the effective storage combination of the level-3
problem. Valid values are BLIS_RRR, BLIS_RRC, BLIS_RCR, etc. A
special value of BLIS_XXX is used to denote an arbitrary combination
which, in practice, means that at least one of the operands is
stored according to general stride.
- threshid_t: captures each of the three dimension thresholds.
- Changed bli_adjust_strides() in bli_obj.c so that bli_obj_create()
can be passed "-1, -1" as a lazy request for row storage. (Note that
"0, 0" is still accepted as a lazy request for column storage.)
- Added support for various instructions to bli_x86_asm_macros.h,
including imul, vhaddps/pd, and other instructions related to integer
vectors.
- Disabled the older small matrix handling code inserted by AMD in
bli_gemm_front.c, since the sup framework introduced in this commit
is intended to provide a more generalized solution.
- Added test/sup directory, which contains standalone performance test
drivers, a Makefile, a runme.sh script, and an 'octave' directory
containing scripts compatible with GNU Octave. (They also may work
with matlab, but if not, they are probably close to working.)
- Reinterpret the storage combination string (sc_str) in the various
level-3 testsuite modules (e.g. src/test_gemm.c) so that the order
of each matrix storage char is "cab" rather than "abc".
- Comment updates in level-3 BLAS API wrappers in frame/compat.
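Note: to make the "unpacked" idea concrete, the sketch below computes C := beta * C + alpha * A * B directly on the caller's matrices using their row and column strides, with runtime loop bounds and an innermost loop over k, as the reference gemmsup kernels are described above. The function name and signature are hypothetical, and this is a plain triple loop with none of the blocking, millikernel structure, or edge-case handling of the actual kernels.

```c
#include <stddef.h>

/* Unpacked reference-style gemm on strided operands.
   Element (i,j) of a matrix X lives at x[ i*rs_x + j*cs_x ]. */
void toy_dgemmsup_ref( size_t m, size_t n, size_t k,
                       double alpha,
                       const double* a, size_t rs_a, size_t cs_a,
                       const double* b, size_t rs_b, size_t cs_b,
                       double beta,
                       double* c, size_t rs_c, size_t cs_c )
{
    for ( size_t i = 0; i < m; ++i )
    {
        for ( size_t j = 0; j < n; ++j )
        {
            double  ab  = 0.0;
            double* cij = &c[ i*rs_c + j*cs_c ];

            /* Innermost loop over k; A and B are read in place. */
            for ( size_t l = 0; l < k; ++l )
                ab += a[ i*rs_a + l*cs_a ] * b[ l*rs_b + j*cs_b ];

            /* When beta is zero, overwrite C instead of scaling it. */
            if ( beta == 0.0 ) *cij = alpha * ab;
            else               *cij = beta * (*cij) + alpha * ab;
        }
    }
}
```

Passing rs = n, cs = 1 selects row storage and rs = 1, cs = m selects column storage, so the same routine covers mixed storage combinations such as a row-stored A with a column-stored B. With k = 0 the product term vanishes and the routine simply scales C by beta.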
commit 0d549ceda822833bec192bbf80633599620c15d9
Author: Isuru Fernando <isuruf@gmail.com>
Date: Sat Apr 27 22:56:02 2019 +0000
make unix friendly archives on appveyor (#310)
commit ca4b33c001f9e959c43b95a9a23f9df5adec7adf
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Wed Apr 24 15:02:39 2019 +0530
Added compiler option (-mno-avx256-split-unaligned-store) in the file config/zen/make_defs.mk to improve performance of intrinsic codes, this flag ensures compiler generates 256-bit stores for the equivalent intrinsics code.
Change-Id: I8f8cd81a3604869df18d38bc42097a04f178d324
commit 945928c650051c04d6900c7f4e9e29cd0e5b299f
Merge: 663f6629 74e513eb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 17 15:58:56 2019 -0500
Merge branch 'amd' of github.com:flame/blis into amd
commit 74e513eb6a6787a925d43cd1500277d54d86ab8f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 17 13:34:44 2019 -0500
Support row storage in Eigen gemm test/3 driver.
Details:
- Added preprocessor branches to test/3/test_gemm.c to explicitly
support row-stored matrices. Column-stored matrices are also still
supported (and are the default for now). (This is mainly residual work
leftover from initial integration of Eigen into the test drivers, so
if we ever want to test Eigen with row-stored matrices, the code will
be ready to use, even if it is not yet integrated into the Makefile
in test/3.)
commit b5d457fae9bd75c4ca67f7bc7214e527aa248127
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 12:50:01 2019 -0500
Applied forgotten variable rename from 89a70cc.
Details:
- Somehow the variable name change (root_file_name -> root_inputname)
in flatten-headers.py mentioned in the commit log entry for 89a70cc
didn't make it into the actual commit. This commit applies that
change.
commit 89a70cccf869333147eb2559cdfa5a23dc915824
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 18:33:08 2019 -0500
GNU-like handling of installation prefix et al.
Details:
- Changed the default installation prefix from $HOME/lib to /usr/local.
- Modified the way configure internally handles the prefix, libdir,
includedir, and sharedir (and also added an --exec-prefix option).
The defaults to these variables are set as follows:
prefix: /usr/local
exec_prefix: ${prefix}
libdir: ${exec_prefix}/lib
includedir: ${prefix}/include
sharedir: ${prefix}/share
The key change, aside from the addition of exec_prefix and its use to
define the default to libdir, is that the variables are substituted
into config.mk with quoting that delays evaluation, meaning the
substituted values may contain unevaluated references to other
variables (namely, ${prefix} and ${exec_prefix}). This more closely
follows GNU conventions, including those used by GNU autoconf, and
also allows make to override any one of the variables *after*
configure has already been run (e.g. during 'make install').
- Updates to build/config.mk.in pursuant to above changes.
- Updates to output of 'configure --help' pursuant to above changes.
- Updated docs/BuildSystem.md to reflect the new default installation
prefix, as well as mention EXECPREFIX and SHAREDIR.
- Changed the definitions of the UNINSTALL_OLD_* variables in the
top-level Makefile to use $(wildcard ...) instead of 'find'. This
was motivated by the new way of handling prefix and friends, which
leads to the 'find' command being run on /usr/local (by default),
which can take a while and almost never yields any benefit (since the
user will very rarely use the uninstall-old targets).
- Removed periods from the end of descriptive output statements (i.e.,
non-verbose output) since those statements often end with file or
directory paths, which get confusing to read when punctuated by a
period.
- Trivial change to 'make showconfig' output.
- Removed my name from 'configure --help'. (Many have contributed to it
over the years.)
- In configure script, changed the default state of threading_model
variable from 'no' to 'off' to match that of debug_type, where there
are similarly more than two valid states. ('no' is still accepted
if given via the --enable-debug= option, though it will be
standardized to 'off' prior to config.mk being written out.)
- Minor variable name change in flatten-headers.py that was intended for
32812ff.
- CREDITS file update.
commit 9d76688ad90014a11ddc0c2f27253d62806216b1
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date: Thu Apr 11 10:22:48 2019 +0530
Fix for a single-rank crash with the HPL application. When computing the offset into the C buffer, integer variables are used for the row and column indices, so the intermediate result overflows and a negative value gets added to the buffer pointer. When that negative value is large enough, the access falls outside the buffer, resulting in a segmentation fault. Although the crash originated in the dgemm kernel, similar code was added to the sgemm kernel as well.
Change-Id: I171119b0ec0dfbd8e63f1fcd6609a94384aabd27
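The overflow described in this entry can be illustrated with a small sketch. It assumes a column-major C matrix and uses hypothetical helper names (c_offset_narrow, c_offset_wide); it is not the code from the dgemm or sgemm kernels, only an instance of the general failure mode and of the usual remedy of widening the operands before multiplying.

```c
#include <stdint.h>

/* Hypothetical helper: offset of element (i, j) in a column-major matrix
   with leading dimension ldc, computed entirely in 32 bits. The arithmetic
   is done on unsigned values so that the wraparound itself is well defined;
   the final conversion back to int32_t yields a negative value on the usual
   two's-complement targets (the conversion is implementation-defined in
   older C standards). */
static int32_t c_offset_narrow(int32_t i, int32_t j, int32_t ldc)
{
    uint32_t wrapped = (uint32_t)i + (uint32_t)j * (uint32_t)ldc;
    return (int32_t)wrapped;
}

/* Same offset, with every operand widened to 64 bits before multiplying,
   so the intermediate product cannot wrap for 32-bit inputs. */
static int64_t c_offset_wide(int32_t i, int32_t j, int32_t ldc)
{
    return (int64_t)i + (int64_t)j * (int64_t)ldc;
}
```

For small indices the two helpers agree. With j = ldc = 50000, the wide version returns 2500000000, while the narrow version wraps to a negative offset on typical platforms, which is the kind of value that would push a pointer outside the buffer.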
commit 32812ff5aba05d34c421fe1024a61f3e2d5e7052
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 9 12:20:19 2019 -0500
Minor bugfix to flatten-headers.py.
Details:
- Fixed a minor bug in flatten-headers.py whereby the script, upon
encountering a #include directive for the root header file, would
erroneously recurse and inline the contents of that root header.
The script has been modified to avoid recursion into any headers
that share the same name as the root-level header that was passed
into the script. (Note: this bug didn't actually manifest in BLIS,
so it's merely a precaution for usage of flatten-headers.py in other
contexts.)
commit bec90e0b6aeb3c9b19589c2b700fda2d66f6ccdf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 2 17:45:13 2019 -0500
Minor update to docs/HardwareSupport.md document.
Details:
- Added more details and clarifying language to implications of 1m and
the recycling of microkernels between microarchitectures.
commit 89cd650e7be01b59aefaa85885a3ea78970351e4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 2 17:23:55 2019 -0500
Use void_fp for function pointers instead of void*.
Change void*-typed function pointers to void_fp.
- Updated all instances of void* variables that store function pointers
to variables of a new type, void_fp. Originally, I wanted to define
the type of void_fp as "void (*void_fp)( void )"--that is, a pointer
to a function with no return value and no arguments. However, once
I did this, I realized that gcc complains with incompatible pointer
type (-Wincompatible-pointer-types) warnings every time any such
pointer is assigned to its final, type-accurate function
pointer type. That is, gcc will silently typecast a void* to
another defined function pointer type (e.g. dscalv_ker_ft) during
an assignment from the former to the latter, but the same statement
will trigger a warning when typecasting from a void_fp type. I suspect
an explicit typecast is needed in order to avoid the warning, which
I'm not willing to insert at this time.
- Added a typedef to bli_type_defs.h defining void_fp as void*, along
with a commented-out version of the aborted definition described
above. (Note that POSIX requires that void* and function pointers
be interchangeable; it is the C standard that does not provide this
guarantee.)
- Comment updates to various _oapi.c files.
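The trade-off described in this entry can be sketched in a few lines. All names below (generic_fp, scal_ker_fn, demo_*) are hypothetical and are not BLIS types or functions. The sketch shows the "aborted" style, in which kernels are stored as a generic function pointer and every retrieval needs an explicit cast back to the type-accurate function pointer type. The C standard permits converting a function pointer to another function pointer type and back, so this round trip is well defined.

```c
#include <stddef.h>

/* Hypothetical names throughout; none of these are BLIS types. */
typedef void (*generic_fp)(void);                             /* type-erased function pointer */
typedef void (*scal_ker_fn)(int n, double alpha, double *x);  /* type-accurate kernel type */

/* An example kernel: x := alpha * x. */
static void demo_scal(int n, double alpha, double *x)
{
    for (int i = 0; i < n; ++i)
        x[i] *= alpha;
}

/* A one-slot "registry" that stores a kernel in type-erased form. */
static generic_fp demo_slot = NULL;

static void demo_register(generic_fp f)
{
    demo_slot = f;
}

/* Retrieval casts back to the real type. Omitting the cast is what
   draws an incompatible-pointer-types diagnostic from the compiler. */
static scal_ker_fn demo_lookup_scal(void)
{
    return (scal_ker_fn)demo_slot;
}
```

A caller would register the kernel with demo_register((generic_fp)demo_scal) and later invoke demo_lookup_scal()(n, alpha, x). The need for a cast at every such site is the cost the commit message declines to pay for now.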
commit ffce3d632b284eb52474036096815ec38ca8dd5f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 2 14:40:50 2019 -0500
Renamed armv8a gemm kernel filename.
Details:
- Renamed
kernels/armv8a/3/bli_gemm_armv8a_opt_4x4.c
to
kernels/armv8a/3/bli_gemm_armv8a_asm_d6x8.c.
This follows the naming convention used by other kernel sets, most
notably haswell.
commit 77867478af02144544b4e7b6df5d54d874f3f93b
Author: Isuru Fernando <isuruf@gmail.com>
Date: Tue Apr 2 13:33:11 2019 -0500
Use pthreads on MinGW and Cygwin (#307)
commit 7bc75882f02ce3470a357950878492e87e688cec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 28 17:40:50 2019 -0500
Updated Eigen results in docs/graphs with 3.3.90.
Details:
- Updated the level-3 performance graphs in docs/graphs with new Eigen
results, this time using a development version cloned from their git
mirror on March 27, 2019 (version 3.3.90). Performance is improved
over 3.3.7, though still noticeably short of BLIS/MKL in most cases.
- Very minor updates to docs/Performance.md and matlab scripts in
test/3/matlab.
commit 20ea7a1217d3833db89a96158c42da2d6e968ed8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 27 18:09:17 2019 -0500
Minor text updates (Eigen) to docs/Performance.md.
Details:
- Added/updated a few more details, mostly regarding Eigen.
commit bfb7e1bc6af468e4ff22f7e27151ea400dcd318a
Merge: 044df950 2c85e1dd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 27 17:58:19 2019 -0500
Merge branch 'dev'
commit 2c85e1dd9d5d84da7228ea4ae6deec56a89b3a8f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 27 16:29:51 2019 -0500
Added Eigen results to performance graphs.
Details:
- Updated the Haswell, SkylakeX, and Epyc performance graphs in
docs/graphs to report on Eigen implementations, where applicable.
Specifically, Eigen implements all level-3 operations sequentially;
however, of those operations, only gemm is multithreaded.
Thus, mt results for symm/hemm, syrk/herk, trmm, and trsm are
omitted. Thanks to Sameer Agarwal for his help configuring and
using Eigen.
- Updated docs/Performance.md to note the new implementation tested.
- CREDITS file update.
commit bfac7e385f8061f2e6591de208b0acf852f04580
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 27 16:04:48 2019 -0500
Added ability to plot with Eigen in test/3/matlab.
Details:
- Updated matlab scripts in test/3/matlab to optionally plot/display
Eigen performance curves. Whether Eigen is plotted is determined by
a new boolean function parameter, with_eigen.
- Updated runme.m scratchpad to reflect the latest invocations of the
plot_panel_4x5() function (with Eigen plotting enabled).
commit 67535317b9411c90de7fa4cb5b0fdb8f61fdcd79
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 27 13:32:18 2019 -0500
Fixed mislabeled eigen output from test/3 drivers.
Details:
- Fixed the Makefile in test/3 so that it no longer incorrectly labels
the matlab output variables from Eigen-linked hemm, herk, trmm, and
trsm driver output as "vendor". (The gemm drivers were already
correctly outputting matlab variables containing the "eigen" label.)
commit 044df9506f823643c0cdd53e81ad3c27a9f9d4ff
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Mar 27 12:39:31 2019 -0500
Test with shared on windows (#306)
Export macros can't support both shared and static at the same time.
When blis is built with both shared and static, headers assume that
shared is used at link time and dllimports the symbols with __imp_
prefix.
To use the headers with static libraries a user can give
-DBLIS_EXPORT= to import the symbol without the __imp_ prefix
commit 5e6b160c8a85e5e23bab0f64958a8acf4918a4ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 19:10:59 2019 -0500
Link to Eigen BLAS for non-gemm drivers in test/3.
Details:
- Adjusted test/3/Makefile so that the test drivers are linked against
Eigen's BLAS library for hemm, herk, trmm, and trsm. We have to do
this since Eigen's headers don't define implementations to the
standard BLAS APIs.
- Simplified #included headers in hemm, herk, trmm, and trsm source
driver files, since nothing specific to Eigen is needed at
compile-time for those operations.
commit e593221383aae19dfdc3f30539de80ed05cfec7f
Merge: 92fb9c87 c208b9dc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 15:51:45 2019 -0500
Merge branch 'master' into dev
commit 92fb9c87bf88b9f9c401eeecd9aa9c3521bc2adb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 15:43:23 2019 -0500
Add more support for Eigen to drivers in test/3.
Details:
- Use compile-time implementations of Eigen in test_gemm.c via new
EIGEN cpp macro, defined on command line. (Linking to Eigen's BLAS
library is not necessary.) However, as of Eigen 3.3.7, Eigen only
parallelizes the gemm operation and not hemm, herk, trmm, trsm, or
any other level-3 operation.
- Fixed a bug in trmm and trsm drivers whereby the wrong function
(bli_does_trans()) was being called to determine whether the object
for matrix A should be created for a left- or right-side case. This
was corrected by changing the function to bli_is_left(), as is done
in the hemm driver.
- Added support for running Eigen test drivers from runme.sh.
commit c208b9dc46852c877197d53b6dd913a046b6ebb6
Author: Isuru Fernando <isuruf@gmail.com>
Date: Mon Mar 25 13:03:44 2019 -0500
Fix clang version detection (#305)
clang -dumpversion gives 4.2.1 for all clang versions as clang was
originally compatible with gcc 4.2.1
Apple clang version and clang version are two different things
and the real clang version cannot be deduced from apple clang version
programmatically. Rely on Wikipedia to map apple clang to clang version
Also fixes assembly detection with clang
clang 3.8 can't build knl as it doesn't recognize zmm0
commit 53842c7e7d530cb2d5609d6d124ae350fc345c32
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Fri Mar 22 13:57:14 2019 +0530
Removed printing alpha and beta values
Change-Id: I49102db510311a30f6a936f9d843f35838f50d23
commit 6805db45e343d83d1adaf9157cf0b841653e9ede
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Fri Mar 22 12:55:35 2019 +0530
Corrected the alpha and beta values (alpha = -1 and beta = 1): bli_setc(-1.0, 0, &alpha) should be used rather than bli_setc(0.0, -1.0, &alpha). This is now corrected.
Change-Id: Ic1102dfd6b50ccf212386a1211c6f31e8d987ef9
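To see why the argument order matters, here is a minimal sketch using C99 complex arithmetic. It assumes, based only on the calls quoted in this entry, that the first two arguments are the real and imaginary parts in that order. The helpers demo_make_scalar and demo_update are hypothetical stand-ins and do not call BLIS.

```c
#include <complex.h>

/* Hypothetical helper mirroring a (real, imag) argument order. */
static double complex demo_make_scalar(double real, double imag)
{
    return real + imag * I;
}

/* Scalar analogue of the update c := c + alpha * a * b. */
static double complex demo_update(double complex c, double complex alpha,
                                  double complex a, double complex b)
{
    return c + alpha * a * b;
}
```

With c = 5, a = 2, and b = 3, passing demo_make_scalar(-1.0, 0.0) as alpha gives 5 - 6 = -1, which is the intended subtraction. Passing demo_make_scalar(0.0, -1.0) instead sets alpha to -i and gives 5 - 6i, a different result entirely.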
commit feefcab4427a75b0b55af215486b85abcda314f7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 21 18:11:20 2019 -0500
Allow disabling of BLAS prototypes at compile-time.
Details:
- Modified bli_blas.h so that:
- By default, if the BLAS layer is enabled at configure-time, BLAS
prototypes are also enabled within blis.h;
- But if the user #defines BLIS_DISABLE_BLAS_DEFS prior to including
blis.h, BLAS prototypes are skipped over entirely so that, for
example, the application or some other header pulled in by the
application may prototype the BLAS functions without causing any
duplication.
- Updated docs/BuildSystem.md to document the feature above, and
related text.
commit 20153cd4b594bc34f860c381ec18de3a6cc743c7
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Thu Mar 21 16:23:53 2019 +0530
Modified test_gemm.c file in test folder
A macro 'FILE_IN_OUT' is defined to read input parameters from a csv file.
Format for input file:
Each line defines a gemm problem with following parameters: m k n cs_a cs_b cs_c
The operation always implemented is C = C - A*B and column-major format.
When macro is disabled - it reverts back to original implementation.
Usage: ./test_gemm_<mkl/blis/openblas>.x input.csv output.csv
GEMM is called through BLAS interface
For BLIS - the test application also prints either 'S' indicating small gemm routine or 'N' - conventional BLIS gemm
for MKL/OpenBLAS - ignore this character
Change-Id: I0924ef2c1f7bdea48d4cdb230b888e2af2c86a36
commit 288843b06d91e1b4fade337959aef773090bd1c9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 20 17:52:23 2019 -0500
Added Eigen support to test/3 Makefile, runme.sh.
Details:
- Added targets to test/3/Makefile that link against a BLAS library
built by Eigen. It appears, however, that Eigen's BLAS library does
not support multithreading. (It may be that multithreading is only
available when using the native C++ APIs.)
- Updated runme.sh with a few Eigen-related tweaks.
- Minor tweaks to docs/Performance.md.
commit 153e0be21d9ff413e370511b68d553dd02abada9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 19 17:53:18 2019 -0500
More minor tweaks to docs/Performance.md.
Details:
- Defined GFLOPS as billions of floating-point operations per second,
and reworded the sentence after about normalization.
commit 05c4e42642cc0c8dbfa94a6c21e975ac30c0517a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 19 17:07:20 2019 -0500
CHANGELOG update (0.5.2)
commit 9204cd0cb0cc27790b8b5a2deb0233acd9edeb9b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 19 17:07:18 2019 -0500
Version file update (0.5.2)
commit 64560cd9248ebf4c02c4a1eeef958e1ca434e510
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 19 17:04:20 2019 -0500
ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
commit ab5ad557ea69479d487c9a3cb516f43fa1089863
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 19 16:50:41 2019 -0500
Very minor tweaks to Performance.md.
commit 03c4a25e1aa8a6c21abbb789baa599ac419c3641
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 19 16:47:15 2019 -0500
Minor fixes to docs/Performance.md.
Details:
- Fixed some incorrect labels associated with the pdf/png graphs,
apparently the result of copy-pasting.
commit fe6dd8b132f39ecb8893d54cd8e75d4bbf6dab83
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 19 16:30:23 2019 -0500
Fixed broken section links in docs/Performance.md.
Details:
- Fixed a few broken section links in the Contents section.
commit 913cf97653f5f9a40aa89a5b79e2b0a8882dd509
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 19 16:15:24 2019 -0500
Added docs/Performance.md and docs/graphs subdir.
Details:
- Added a new markdown document, docs/Performance.md, which reports
performance of a representative set of level-3 operations across a
variety of hardware architectures, comparing BLIS to OpenBLAS and a
vendor library (MKL on Intel/AMD, ARMPL on ARM). Performance graphs,
in pdf and png formats, reside in docs/graphs.
- Updated README.md to link to new Performance.md document.
- Minor updates to CREDITS, docs/Multithreading.md.
- Minor updates to matlab scripts in test/3/matlab.
commit 9945ef24fd758396b698b19bb4e23e53b9d95725
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 19 15:28:44 2019 -0500
Adjusted cache blocksizes for zen subconfig.
Details:
- Adjusted the zen sub-configuration's cache blocksizes for float,
scomplex, and dcomplex based on the existing values for double.
(The previous values were taken directly from the haswell subconfig,
which targets Intel Haswell/Broadwell/Skylake systems.)
commit d202d008d51251609d08d3c278bb6f4ca9caf8e4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 18 18:18:25 2019 -0500
Renamed --enable-export-all to --export-shared=[].
Details:
- Replaced the existing --enable-export-all / --disable-export-all
configure option with --export-shared=[public|all], with the 'public'
instance of the latter corresponding to --disable-export-all and the
'all' instance corresponding to --enable-export-all. Nothing else
semantically about the option, or its default, has changed.
commit ff78089870f714663026a7136e696603b5259560
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 18 13:22:55 2019 -0500
Updates to docs/Multithreading.md.
Details:
- Made extra explicit the fact that: (a) multithreading in BLIS is
disabled by default; and (b) even with multithreading enabled, the
user must specify multithreading at runtime in order to observe
parallelism. Thanks to M. Zhou for suggesting these clarifications
in #292.
- Also made explicit that only the environment variable and global
runtime API methods are available when using the BLAS API. If the
user wishes to use the local runtime API (specify multithreading on
a per-call basis), one of the native BLIS APIs must be used.
commit 3a929a3d0ba0353159a6d4cd188f01b7a390ccfc
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Mon Mar 18 10:51:41 2019 +0530
Fixed code merging: bli_gemm_small.c was missing conditional checks for L!=0 && K!=0; they have now been added. This fix is needed to pass blastest.
Change-Id: Idc9c9a04d2015a68a19553c437ecaf8f1584026c
commit 663f662932c3f182fefc3c77daa1bf8c3394bb8b
Merge: 938c05ef 6bfe3812
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 16 16:17:12 2019 -0500
Merge branch 'amd' of github.com:flame/blis into amd
commit 938c05ef8654e2fc013d39a57f51d91d40cc40fb
Merge: 4ed39c09 5a5f494e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 16 16:01:43 2019 -0500
Merge branch 'amd' of github.com:flame/blis into amd
commit 6bfe3812e29b86c95b828822e4e5473b48891167
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 15 13:57:49 2019 -0500
Use -fvisibility=[...] with clang on Linux/BSD/OSX.
Details:
- Modified common.mk to use the -fvisibility=[hidden|default] option
when compiling with clang on non-Windows platforms (Linux, BSD, OS X,
etc.). Thanks to Isuru Fernando for pointing out this option works
with clang on these OSes.
commit 809395649c5bbf48778ede4c03c1df705dd49566
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 13 18:21:35 2019 -0500
Annotated additional symbols for export.
Details:
- Added export annotations to additional function prototypes in order to
accommodate the testsuite.
- Disabled calling bli_amaxv_check() from within the testsuite's
test_amaxv.c.
commit e095926c643fd9c9c2220ebecd749caae0f71d42
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 13 17:35:18 2019 -0500
Support shared lib export of only public symbols.
Details:
- Introduced a new configure option, --enable-export-all, which will
cause all shared library symbols to be exported by default, or,
alternatively, --disable-export-all, which will cause all symbols to
be hidden by default, with only those symbols that are annotated for
visibility, via BLIS_EXPORT_BLIS (and BLIS_EXPORT_BLAS for BLAS
symbols), to be exported. The default for this configure option is
--disable-export-all. Thanks to Isuru Fernando for consulting on
this commit.
- Removed BLIS_EXPORT_BLIS annotations from frame/1m/bli_l1m_unb_var1.h,
which was intended for 5a5f494.
- Relocated BLIS_EXPORT-related cpp logic from bli_config.h.in to
frame/include/bli_config_macro_defs.h.
- Provided appropriate logic within common.mk to implement variable
symbol visibility for gcc, clang, and icc (to the extent that each of
these compilers allows).
- Relocated --help text associated with debug option (-d) to configure
slightly further down in the list.
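The general mechanism behind exporting only annotated symbols can be sketched as follows. The macro name DEMO_EXPORT and both functions are hypothetical; BLIS's own annotations are BLIS_EXPORT_BLIS and BLIS_EXPORT_BLAS, and their actual definitions are not reproduced here. The sketch assumes a gcc- or clang-style compiler on non-Windows platforms and a __declspec-capable compiler on Windows.

```c
/* Hypothetical export macro. On gcc/clang, "default" visibility overrides a
   command-line -fvisibility=hidden for the annotated symbol only. */
#if defined(_WIN32)
  #define DEMO_EXPORT __declspec(dllexport)
#elif defined(__GNUC__)
  #define DEMO_EXPORT __attribute__((visibility("default")))
#else
  #define DEMO_EXPORT
#endif

/* Not annotated: has external linkage, but is hidden from the shared
   library's dynamic symbol table when built with -fvisibility=hidden. */
int demo_internal_square(int x)
{
    return x * x;
}

/* Annotated: remains exported even when the default visibility is hidden. */
DEMO_EXPORT int demo_public_sum_of_squares(int a, int b)
{
    return demo_internal_square(a) + demo_internal_square(b);
}
```

Building this file into a shared library with -fvisibility=hidden would be expected to export only demo_public_sum_of_squares, while -fvisibility=default would export both functions. That corresponds to the two states of the configure option described in this entry.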
commit 5a5f494e428372c7c27ed1f14802e15a83221e87
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 12 18:45:09 2019 -0500
Removed export macros from all internal prototypes.
Details:
- After merging PR #303, at Isuru's request, I removed the use of
BLIS_EXPORT_BLIS from all function prototypes *except* those that we
potentially wish to be exported in shared/dynamic libraries. In other
words, I removed the use of BLIS_EXPORT_BLIS from all prototypes of
functions that can be considered private or for internal use only.
This is likely the last big modification along the path towards
implementing the functionality spelled out in issue #248. Thanks
again to Isuru Fernando for his initial efforts of sprinkling the
export macros throughout BLIS, which made removing them where
necessary relatively painless. Also, I'd like to thank Tony Kelman,
Nathaniel Smith, Ian Henriksen, Marat Dukhan, and Matthew Brett for
participating in the initial discussion in issue #37 that was later
summarized and restated in issue #248.
- CREDITS file update.
commit 3dc18920b6226026406f1d2a8b2c2b405a2649d5
Merge: b938c16b 766769ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 12 11:20:25 2019 -0500
Merge branch 'master' into dev
commit 766769eeb944bd28641a6f72c49a734da20da755
Author: Isuru Fernando <isuruf@gmail.com>
Date: Mon Mar 11 19:05:32 2019 -0500
Export functions without def file (#303)
* Revert "restore bli_extern_defs exporting for now"
This reverts commit 09fb07c350b2acee17645e8e9e1b8d829c73dca8.
* Remove symbols not intended to be public
* No need of def file anymore
* Fix whitespace
* No need of configure option
* Remove export macro from definitions
* Remove blas export macro from definitions
commit 4ed39c0971c7917e2675cf5449f563b1f4751ccc
Merge: 540ec1b4 b938c16b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 8 11:56:58 2019 -0600
Merge branch 'amd' of github.com:flame/blis into amd
commit b938c16b0c9e839335ac2c14944b82890143d02f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 7 16:40:39 2019 -0600
Renamed test/3m4m to test/3.
Details:
- Renamed '3m4m' directory to '3', which captures the directory nicely
since it builds test drivers to test level-3 operations.
- These test drivers ceased to be used to test the 3m and 4m (or even
1m) induced methods long ago, hence the name change.
commit ab89a40582ec7acf802e59b0763bed099a02edd8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 7 16:26:12 2019 -0600
More minor updates and edits to test/3m4m.
Details:
- Further updates to matlab scripts, mostly for compatibility with
GNU Octave.
- More tweaks to runme.sh.
- Updates to runme.m that allow copy-paste into matlab interactive
session to generate graphs.
commit f0e70dfbf3fee4c4e382c2c4e87c25454cbc79a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 7 01:04:05 2019 +0000
Very minor updates to test/3m4m for ul252.
Details:
- Very minor updates to the newly revamped test/3m4m drivers when used
on a Xeon Platinum (SkylakeX).
commit 7fe44748383071f1cbbc77d904f4ae5538e13065
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Wed Mar 6 16:23:31 2019 +0530
Disabled BLIS_ENABLE_ZEN_BLOCK_SIZES in bli_family_zen.h for ROME tuning
Change-Id: Iec47fcf51f4d4396afef1ce3958e58cf02c59a57
commit 9f1dbe572b1fd5e7dd30d5649bdf59259ad770d5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 5 17:47:55 2019 -0600
Overhauled test/3m4m Makefile and scripts.
Details:
- Rewrote much of Makefile to generate executables for single- and dual-
socket multithreading as well as single-threaded. Each of the three
can also use a different problem size range/increment, as is often
appropriate when doubling/halving the number of threads.
- Rewrote runme.sh script to flexibly execute as many threading
parameter scenarios as is given in the input parameter string
(currently set within the script itself). The string also encodes
the maximum problem size for each threading scenario, which is used
to identify the executable to run. Also improved the "progress" output
of the script to reduce redundant info and improve readability in
terminals that are not especially wide.
- Minor updates to test_*.c source files.
- Updated matlab scripts according to changes made to the Makefile,
test drivers, and runme.sh script, and renamed 'plot_all.m' to
'runme.m'.
commit f5ed95ecd7d5eb4a63e1333ad5cc6765fc8df9fe
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Tue Mar 5 15:01:57 2019 +0530
Merged BLIS Release 1.3
Modified config/zen/make_defs.mk, now CKVECFLAGS := -mavx2 -mfpmath=sse -mfma -march=znver1
Change-Id: Ia0942d285a21447cd0c470de1bc021fe63e80d81
commit 3bdab823fa93342895bf45d812439324a37db77c
Merge: 70f12f20 e2a02ebd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 28 14:07:24 2019 -0600
Merge branch 'master' into dev
commit e2a02ebd005503c63138d48a2b7d18978ee29205
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 28 13:58:59 2019 -0600
Updates (from ls5) to test/3m4m/runme.sh.
Details:
- Lonestar5-specific updates to runme.sh.
commit f0dcc8944fa379d53770f5cae5d670140918f00c
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Feb 27 17:27:23 2019 -0600
Add symbol export macro for all functions (#302)
* initial export of blis functions
* Regenerate def file for master
* restore bli_extern_defs exporting for now
commit 540ec1b479712d5e1da637a718927249c15d867f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Feb 24 19:09:10 2019 -0600
Updated level-3 BLAS to call object API directly.
Details:
- Updated the BLAS compatibility layer for level-3 operations so that
the corresponding BLIS object API is called directly rather than first
calling the typed BLIS API. The previous code based on the typed BLIS
API calls is still available in a deactivated cpp macro branch, which
may be re-activated by #defining BLIS_BLAS3_CALLS_TAPI. (This does not
yet correspond to a configure option. If it seems like people might
want to toggle this behavior more regularly, a configure option can be
added in the future.)
- Updated the BLIS typed API to statically "pre-initialize" objects via
new initializer macros. Initialization is then finished via calls to
static functions bli_obj_init_finish_1x1() and bli_obj_init_finish(),
which are similar to the previously-called functions,
bli_obj_create_1x1_with_attached_buffer() and
bli_obj_create_with_attached_buffer(), respectively. (The BLAS
compatibility layer updates mentioned above employ this new technique
as well.)
- Transformed certain routines in bli_param_map.c--specifically, the
ones that convert netlib-style parameters to BLIS equivalents--into
static functions, now in bli_param_map.h. (The remaining three classes
of conversion routines were left unchanged.)
- Added the aforementioned pre-initializer macros to bli_type_defs.h.
- Relocated bli_obj_init_const() and bli_obj_init_constdata() from
bli_obj_macro_defs.h to bli_type_defs.h.
- Added a few macros to bli_param_macro_defs.h for testing domains for
real/complexness and precisions for single/double-ness.
commit 8e023bc914e9b4ac1f13614feb360b105fbe44d2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 16:55:30 2019 -0600
Updates to 3m4m/matlab scripts.
Details:
- Minor updates to matlab graph-generating scripts.
- Added a plot_all.m script that is more of a scratchpad for copying and
pasting function invocations into matlab to generate plots that are
presently of interest to us.
commit b06244d98cc468346eb1a8eb931bc05f35ff280c
Merge: e938ff08 4c7e6680
Author: praveeng <praveen.g@amd.com>
Date: Thu Feb 21 12:56:15 2019 +0530
Merge branch 'ut-austin-amd' of ssh://git.amd.com:29418/cpulibraries/er/blis into ut-austin-amd
commit e938ff08cea3d108c84524eb129d9e89d701ea90
Author: praveeng <praveen.g@amd.com>
Date: Thu Feb 21 12:44:38 2019 +0530
deleted test.txt
Change-Id: I3871f5fe76e548bc29ec2733745b29964e829dd3
commit ed13ad465dcba350ad3d5e16c9cc7542e33f3760
Author: mkv <Mallikarjuna-Reddy.K-V@amd.com>
Date: Thu Feb 21 01:04:16 2019 -0500
added test file for initial commit
commit 4c7e6680832b497468cf50c2399e3ac4de0e3450
Author: praveeng <praveen.g@amd.com>
Date: Thu Feb 21 12:44:38 2019 +0530
deleted test.txt
Change-Id: I3871f5fe76e548bc29ec2733745b29964e829dd3
commit 95e070581c54ed2edc211874faec56055ea298c8
Author: mkv <Mallikarjuna-Reddy.K-V@amd.com>
Date: Thu Feb 21 01:04:16 2019 -0500
added test file for initial commit
commit 70f12f209bc1901b5205902503707134cf2991a0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 20 16:10:10 2019 -0600
Changed unsafe-loop to unsafe-math optimizations.
Details:
- Changed -funsafe-loop-optimizations (re-)introduced in 7690855 for
make_defs.mk files' CRVECFLAGS to -funsafe-math-optimizations (to
account for a miscommunication in issue #300). Thanks to Dave Love
for this suggestion and Jeff Hammond for his feedback on the topic.
commit 7690855c5106a56e5b341a350f8db1c78caacd89
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 18 19:16:01 2019 -0600
Restored -funsafe-loop-optimizations to subconfigs.
Details:
- Restored use of -funsafe-loop-optimizations in the definitions of
CRVECFLAGS (when using gcc), but only for sub-configurations (and
not configuration families such as amd64, intel64, and x86_64).
This more or less reverts 5190d05 and 6cf1550.
commit 44994d1490897b08cde52a615a2e37ddae8b2061
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 18 18:35:30 2019 -0600
Disable TBM, XOP, LWP instructions in AMD configs.
Details:
- Added -mno-tbm -mno-xop -mno-lwp to CKVECFLAGS in bulldozer,
piledriver, steamroller, and excavator configurations to explicitly
disable AMD's bulldozer-era TBM, XOP, and LWP instruction sets in an
attempt to fix the invalid instruction error that has plagued Travis
CI builds since 6a014a3. Thanks to Devin Matthews for pointing out
that the offending instruction was part of TBM (issue #300).
- Restored -O3 to piledriver configuration's COPTFLAGS.
commit 1e5b530744c1906140d47f43c5cad235eaa619cf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 18 18:04:38 2019 -0600
Reverted piledriver COPTFLAGS from -O3 to -O2.
Details:
- Debugging continues; changing COPTFLAGS for piledriver subconfig from
-O3 to -O2, its original value prior to 6a014a3.
commit 6cf155049168652c512aefdd16d74e7ff39b98df
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 18 17:29:51 2019 -0600
Removed -funsafe-loop-optimizations from all configs.
Details:
- Error persists. Removed -funsafe-loop-optimizations from all remaining
sub-configurations.
commit 5190d05a27c5fa4c7942e20094f76eb9a9785c3e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 18 17:07:35 2019 -0600
Removed -funsafe-loop-optimizations from piledriver.
Details:
- Error persists; continuing debugging from bf0fb78c by removing
-funsafe-loop-optimizations from piledriver configuration.
commit bf0fb78c5e575372060d22f5ceeb5b332e8978ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 18 16:51:38 2019 -0600
Removed -funsafe-loop-optimizations from families.
Details:
- Removed -funsafe-loop-optimizations from the configuration families
affected by 6a014a3, specifically: intel64, amd64, and x86_64.
This is part of an attempt to debug why the sde, as executed by
Travis CI, is crashing via the following error:
TID 0 SDE-ERROR: Executed instruction not valid for specified chip
(ICELAKE): 0x9172a5: bextr_xop rax, rcx, 0x103
commit 6a014a3377a2e829dbc294b814ca257a2bfcb763
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 18 14:52:29 2019 -0600
Standardized optimization flags in make_defs.mk.
Details:
- Per Dave Love's recommendation in issue #300, this commit defines
COPTFLAGS := -O3
and
CRVECFLAGS := $(CKVECFLAGS) -funsafe-loop-optimizations
in the make_defs.mk for all Intel- and AMD-based configurations.
commit 565fa3853b381051ac92cff764625909d105644d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 18 11:43:58 2019 -0600
Redirect trsm pc, ir parallelism to ic, jr loops.
Details:
- trsm parallelization was temporarily simplified in 075143d to entirely
ignore any parallelism specified via the pc or ir loops. Now, any
parallelism specified to the pc loop will be redirected to the ic
loop, and any parallelism specified to the ir loop will be redirected
to the jr loop. (Note that because of inter-iteration dependencies,
trsm cannot parallelize the ir loop. Parallelism via the pc loop is
at least somewhat feasible in theory, but it would require tracking
dependencies between blocks--something for which BLIS currently lacks
the necessary supporting infrastructure.)
commit a023c643f25222593f4c98c2166212561d030621
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 14 20:18:55 2019 -0600
Regenerated symbols in build/libblis-symbols.def.
Details:
- Reran ./build/regen-symbols.sh after running
'configure --enable-cblas auto'
commit 075143dfd92194647da9022c1a58511b20fc11f3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 14 18:52:45 2019 -0600
Added support for IC loop parallelism to trsm.
Details:
- Parallelism within the IC loop (3rd loop around the microkernel) is
now supported within the trsm operation. This is done via a new branch
on each of the control and thread trees, which guide execution of a
new trsm-only subproblem from within bli_trsm_blk_var1(). This trsm
subproblem corresponds to the macrokernel computation on only the
block of A that contains the diagonal (labeled as A11 in algorithms
with FLAME-like partitioning), and the corresponding row panel of C.
During the trsm subproblem, all threads within the JC communicator
participate and parallelize along the JR loop, including any
parallelism that was specified for the IC loop. (IR loop parallelism
is not supported for trsm due to inter-iteration dependencies.) After
this trsm subproblem is complete, a barrier synchronizes all
participating threads and then they proceed to apply the prescribed
BLIS_IC_NT (or equivalent) ways of parallelism (and any BLIS_JR_NT
parallelism specified within) to the remaining gemm subproblem (the
rank-k update that is performed using the newly updated row-panel of
B). Thus, trsm now supports JC, IC, and JR loop parallelism.
- Modified bli_trsm_l_cntl_create() to create the new "prenode" branch
of the trsm_l cntl_t tree. The trsm_r tree was left unchanged, for
now, since it is not currently used. (All trsm problems are cast in
terms of left-side trsm.)
- Updated bli_cntl_free_w_thrinfo() to be able to free the newly shaped
trsm cntl_t trees. Fixed a potentially latent bug whereby a cntl_t
subnode is only recursed upon if there existed a corresponding
thrinfo_t node, which may not always exist (for problems too small
to employ full parallelization due to the minimum granularity imposed
by micropanels).
- Updated other functions in frame/base/bli_cntl.c, such as
bli_cntl_copy() and bli_cntl_mark_family(), to recurse on sub-prenodes
if they exist.
- Updated bli_thrinfo_free() to recurse into sub-nodes and prenodes
when they exist, and added support for growing a prenode branch to
bli_thrinfo_grow() via a corresponding set of help functions named
with the _prenode() suffix.
- Added a bszid_t field to thrinfo_t nodes. This field comes in handy when
debugging the allocation/release of thrinfo_t nodes, as it helps trace
the "identity" of each node as it is created/destroyed.
- Renamed
bli_l3_thrinfo_print_paths() -> bli_l3_thrinfo_print_gemm_paths()
and created a separate bli_l3_thrinfo_print_trsm_paths() function to
print out the newly reconfigured thrinfo_t trees for the trsm
operation.
- Trivial changes to bli_gemm_blk_var?.c and bli_trsm_blk_var?.c
regarding variable declarations.
- Removed subpart_t enum values BLIS_SUBPART1T, BLIS_SUBPART1B,
BLIS_SUBPART1L, BLIS_SUBPART1R. Then added support for two new labels
(semantically speaking): BLIS_SUBPART1A and BLIS_SUBPART1B, which
represent the subpartition ahead of and behind, respectively,
BLIS_SUBPART1. Updated check functions in bli_check.c accordingly.
- Shuffled layering/APIs for bli_acquire_mpart_[mn]dim() and
bli_acquire_mpart_t2b/b2t(), _l2r/r2l().
- Deprecated old functions in frame/3/bli_l3_thrinfo.c.
commit 78bc0bc8b6b528c79b11f81ea19250a1db7450ed
Author: Nicholai Tukanov <nicholai@utexas.edu>
Date: Thu Feb 14 13:29:02 2019 -0600
Power9 sub-configuration (#298)
Formally registered power9 sub-configuration.
Details:
- Added and registered power9 sub-configuration into the build system.
Thanks to Nicholai Tukanov and Devangi Parikh for these contributions.
- Note: The sub-configuration does not yet have a corresponding
architecture-specific kernel set registered, and so for now the
sub-config is using the generic kernel set.
commit 6b832731261f9e7ad003a9ea4682e9ca973ef844
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 16:01:28 2019 -0600
Generalized ref kernels' pragma omp simd usage.
Details:
- Replaced direct usage of _Pragma( "omp simd" ) in reference kernels
with PRAGMA_SIMD, which is defined as a function of the compiler being
used in a new bli_pragma_macro_defs.h file. That definition is cleared
when BLIS detects that the -fopenmp-simd command line option is
unsupported. Thanks to Devin Matthews and Jeff Hammond for suggestions
that guided this commit.
- Updated configure and bli_config.h.in so that the appropriate anchor
is substituted in (when the corresponding pragma omp simd support is
present).
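A minimal sketch of the PRAGMA_SIMD idea described above. The guard macro SKETCH_HAVE_OMP_SIMD and the axpy_sketch() function are hypothetical names used only for illustration; the actual contents of bli_pragma_macro_defs.h may differ.

```c
#include <assert.h>
#include <stddef.h>

/* SKETCH_HAVE_OMP_SIMD is a hypothetical stand-in for whatever symbol the
   build system defines when -fopenmp-simd is supported. */
#ifdef SKETCH_HAVE_OMP_SIMD
  #define PRAGMA_SIMD _Pragma( "omp simd" )
#else
  #define PRAGMA_SIMD   /* cleared: expands to nothing */
#endif

/* Hypothetical reference-style kernel: y := y + alpha * x. */
static void axpy_sketch( size_t n, double alpha, const double* x, double* y )
{
    assert( x != NULL && y != NULL );

    PRAGMA_SIMD
    for ( size_t i = 0; i < n; ++i )
        y[i] += alpha * x[i];
}
```

Because the macro expands to nothing when the option is unsupported, the same kernel source compiles either way; only the vectorization hint is lost.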
commit b1f5ce8622b682b79f956fed83f04a60daa8e0fc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 5 17:38:50 2019 -0600
Minor updates to scripts in test/mixeddt/matlab.
commit 38203ecd15b1fa50897d733daeac6850d254e581
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date: Mon Feb 4 15:28:28 2019 -0500
Added thunderx2 system in the mixeddt test scripts
Details:
- Added thunderx2 (tx2) as a system in the runme.sh in test/mixeddt
commit dfc91843ea52297bf636147793029a0c1345be04
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date: Mon Feb 4 15:23:40 2019 -0500
Fixed gcc flags for thunderx2 subconfiguration
Details:
- Fixed -march flag. Thunderx2 is an armv8.1a architecture not armv8a.
commit c665eb9b888ec7e41bd0a28c4c8ac4094d0a01b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 28 16:22:23 2019 -0600
Minor updates to docs, Makefiles.
Details:
- Changed all occurrences of
micro-kernel -> microkernel
macro-kernel -> macrokernel
micro-panel -> micropanel
in all markdown documents in 'docs' directory. This change is being
made since we've reached the point in adoption and acceptance of
BLIS's insights where words such as "microkernel" are no longer new,
and therefore now merit being unhyphenated.
- Updated "Implementation Notes" sections of KernelsHowTo.md, which
still contained references to nonexistent cpp macros such as
BLIS_DEFAULT_MR_? and BLIS_PACKDIM_MR_?.
- Added 'run-fast' and 'check-fast' targets to testsuite/Makefile.
- Minor updates to Testsuite.md, including suggesting use of
'make check' and 'make check-fast' when running from the local
testsuite directory.
- Added a comment to top-level Makefile explaining the purpose behind
the TESTSUITE_WRAPPER variable, which at first glance appears to serve
no purpose.
commit 1aa280d0520ed5eaea3b119b4e92b789ecad78a4
Author: M. Zhou <5723047+cdluminate@users.noreply.github.com>
Date: Sun Jan 27 21:40:48 2019 +0000
Amend OS detection for kFreeBSD. (#295)
commit fffc23bb35d117a433886eb52ee684ff5cf6997f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 25 13:35:31 2019 -0600
CREDITS file update.
commit 26c5cf495ce22521af5a36a1012491213d5a4551
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jan 24 18:49:31 2019 -0600
Fixed bug in skx subconfig related to bdd46f9.
Details:
- Fixed code in the skx subconfiguration that became a bug after
committing bdd46f9. Specifically, the bli_cntx_init_skx() function
was overwriting default blocksizes for the scomplex and dcomplex
microkernels despite the fact that only single and double real
microkernels were being registered. This was not a problem prior to
bdd46f9 since all microkernels used dynamically-queried (at runtime)
register blocksizes for loop bounds. However, post-bdd46f9, this
became a bug because the reference ukernels for scomplex and dcomplex
were written with their register blocksizes hard-coded as constant
loop bounds, which conflicted with the erroneous scomplex and dcomplex
values that bli_cntx_init_skx() was setting in the context. The
lesson here is that going forward, all subconfigurations must not set
any blocksizes for datatypes corresponding to default/reference
microkernels. (Note that a blocksize is left unchanged by the
bli_cntx_set_blkszs() function if it was set to -1.)
commit 180f8e42e167b83a757340ad4bd4a5c7a1d6437b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jan 24 18:01:15 2019 -0600
Fixed undefined behavior trsm ukr bug in bdd46f9.
Details:
- Fixed a bug that manifested anytime a configuration was used in which
optimized microkernels were registered and the trsm operation (or
kernel) was invoked. The bug resulted from the optimized microkernels'
register blocksizes conflicting with the hard-coded values--expressed
in the form of constant loop bounds--used in the new reference trsm
ukernels that were introduced in bdd46f9. The fix was easy: reverting
back to the implementation that uses variable-bound loops, which
amounted to changing an #if 0 to #if 1 (since I preserved the older
implementation in the file alongside the new code based on constant-
bound loops). It should be noted that this fix must be permanent,
since the trsm kernel code with constant-bound loops can never work
with gemm ukernels that use different register blocksizes.
commit bdd46f9ee88057d52610161966a11c224e5a026c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jan 24 17:23:18 2019 -0600
Rewrote reference kernels to use #pragma omp simd.
Details:
- Rewrote level-1v, -1f, and -3 reference kernels in terms of simplified
indexing annotated by the #pragma omp simd directive, which a compiler
can use to vectorize certain constant-bounded loops. (The new kernels
actually use _Pragma("omp simd") since the kernels are defined via
templatizing macros.) Modest speedup was observed in most cases using
gcc 5.4.0, which may improve with newer versions. Thanks to Devin
Matthews for suggesting this via issues #286 and #259.
- Updated default blocksizes defined in ref_kernels/bli_cntx_ref.c to
be 4x16, 4x8, 4x8, and 4x4 for single, double, scomplex and dcomplex,
respectively, with a default row preference for the gemm ukernel. Also
updated axpyf, dotxf, and dotxaxpyf fusing factors to 8, 6, and 4,
respectively, for all datatypes.
- Modified configure to verify that -fopenmp-simd is a valid compiler
option (via a new detect/omp_simd/omp_simd_detect.c file).
- Added a new header in which prefetch macros are defined according to
which compiler is detected (via macros such as __GNUC__). These
prefetch macros are not yet employed anywhere, though.
- Updated the year in copyrights of template license headers in
build/templates and removed AMD as a default copyright holder.
commit 63de2b0090829677755eb5cdb27e73bc738da32d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 23 12:16:27 2019 -0600
Prevent redef of ftnlen in blastest f2c_types.h.
Details:
- Guard typedef of ftnlen in f2c_types.h with a #ifndef HAVE_BLIS_H
directive to prevent the redefinition of that type. Thanks to Jeff
Diamond for reporting this compiler warning (and apologies for the
delay in committing a fix).
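The guard described above might look roughly like the following sketch. The underlying type (int) and the fortran_strlen_sketch() helper are assumptions for illustration, and HAVE_BLIS_H is assumed to be defined whenever another definition of ftnlen is already visible.

```c
#include <assert.h>
#include <string.h>

/* Sketch only: skip the typedef when HAVE_BLIS_H signals that ftnlen has
   already been defined elsewhere. The type 'int' is an assumption. */
#ifndef HAVE_BLIS_H
typedef int ftnlen;
#endif

/* Hypothetical helper that uses ftnlen as a Fortran string length. */
static ftnlen fortran_strlen_sketch( const char* s )
{
    assert( s != NULL );
    return ( ftnlen )strlen( s );
}
```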
commit eec2e183a7b7d67702dbd1f39c153f38148b2446
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 21 12:12:18 2019 -0600
Added escaping to '/' in os_name in configure.
Details:
- Add os_name to the list of variables into which the '/' character is
escaped. This is meant to address (or at least make progress toward
addressing) #293. Thanks to Isuru Fernando for spotting this as the
potential fix, and also thanks to M. Zhou for the original report.
commit adf5c17f0839fdbc1f4a1780f637928b1e78e389
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 18 15:14:45 2019 -0600
Formally registered thunderx2 subconfiguration.
Details:
- Added a separate subconfiguration for thunderx2, which now uses
different optimization flags than cortexa57/cortexa53.
commit 094cfdf7df6c2764c25fcbfce686ba29b933942c
Author: M. Zhou <5723047+cdluminate@users.noreply.github.com>
Date: Fri Jan 18 18:46:13 2019 +0000
Port BLIS to GNU Hurd OS. (#294)
Prevent blis.h from misidentifying Hurd as OSX.
commit 5d7d616e8e591c2f3c7c2d73220eb27ea484f9c9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 15 20:52:51 2019 -0600
README.md update re: mixeddt TOMS paper.
commit 58c7fb4788177487f73a3964b7a910fe4dc75941
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 8 17:00:27 2019 -0600
Added more matlab scripts for mixeddt paper.
Details:
- Added a variant set of matlab scripts geared to producing plots that
reflect performance data gathered with and without extra memory
optimizations enabled. These scripts reside (for now) in
test/mixeddt/matlab/wawoxmem.
commit 34286eb914b48b56cdda4dfce192608b9f86d053
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 8 11:41:20 2019 -0600
Minor update to docs/HardwareSupport.md.
commit 108b04dc5b1b1288db95f24088d1e40407d7bc88
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 7 20:16:31 2019 -0600
Regenerated symbols in build/libblis-symbols.def.
Details:
- Reran ./build/regen-symbols.sh after running
'configure --enable-cblas auto' to reflect removal of
bli_malloc_pool() and bli_free_pool().
commit 706cbd9d5622f4690e6332a89cf41ab5c8771899
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 7 18:28:19 2019 -0600
Minor tweaks/cleanups to bli_malloc.c, _apool.c.
Details:
- Removed malloc_ft and free_ft function pointer arguments from the
interface to bli_apool_init() after deciding that there is no need to
specify the malloc()/free() for blocks within the apool. (The apool
blocks are actually just array_t structs.) Instead, we simply call
bli_malloc_intl()/_free_intl() directly. This has the added benefit
of allowing additional output when memory tracing is enabled via
--enable-mem-tracing. Also made corresponding changes elsewhere in
the apool API.
- Changed the inner pools (elements of the array_t within the apool_t)
to use BLIS_MALLOC_POOL and BLIS_FREE_POOL instead of BLIS_MALLOC_INTL
and BLIS_FREE_INTL.
- Disabled definitions of bli_malloc_pool() and bli_free_pool() since
there are no longer any consumers of these functions.
- Very minor comment / printf() updates.
commit 579145039d945adbcad1177b1d53fb2d3f2e6573
Author: Minh Quan Ho <1337056+hominhquan@users.noreply.github.com>
Date: Mon Jan 7 23:00:15 2019 +0100
Initialize error messages at compile time (#289)
* Initialize error messages at compile time
- Assigning strings directly to the bli_error_string array, instead of
snprintf() at execution-time.
* Retired bli_error_init(), _finalize().
Details:
- Removed functions obviated by changes in 80e8dc6: bli_error_init(),
bli_error_finalize(), and bli_error_init_msgs(), as well as calls to
the former two in bli_init.c.
* Regenerated symbols in build/libblis-symbols.def.
Details:
- Reran ./build/regen-symbols.sh after running
'configure --enable-cblas auto'.
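As a rough illustration of compile-time initialization of error messages, the sketch below assigns string literals directly to a static array instead of filling it with snprintf() at runtime. All identifiers, codes, and message texts here are hypothetical and do not reflect the actual bli_error_string contents.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical error codes; not the actual BLIS error codes. */
enum
{
    ERR_SUCCESS_SKETCH = 0,
    ERR_FAILURE_SKETCH,
    ERR_MALLOC_NULL_SKETCH,
    ERR_COUNT_SKETCH
};

/* The table is fully populated by the compiler, so no init/finalize
   functions are needed at runtime. */
static const char* const error_string_sketch[ ERR_COUNT_SKETCH ] =
{
    [ ERR_SUCCESS_SKETCH     ] = "No error.",
    [ ERR_FAILURE_SKETCH     ] = "Generic failure.",
    [ ERR_MALLOC_NULL_SKETCH ] = "malloc() returned NULL."
};

/* Look up a message, guarding against out-of-range codes. */
static const char* error_string_for_code_sketch( int code )
{
    if ( code < 0 || code >= ERR_COUNT_SKETCH )
        return "Unknown error code.";
    return error_string_sketch[ code ];
}
```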
commit aafbca086e36b6727d7be67e21fef5bd9ff7bfd9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 7 12:38:21 2019 -0600
Updated external package language in README.md.
Details:
- Updated/added comments about Fedora, OpenSUSE, and GNU Guix under the
newly-renamed "External GNU/Linux packages" section. Thanks to Dave
Love for providing these revisions.
commit daacfe68404c9cc8078e5e7ba49a8c7d93e8cda3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 7 12:12:47 2019 -0600
Allow running configure with python 3.4.
Details:
- Relax version blacklisting of python3 to allow 3.4 or later instead
of 3.5 or later. Thanks to Dave Love for pointing out that 3.4 was
sufficient for the purpose of BLIS's build system. (It should be
noted that we're not sure which, if any, python3 versions prior to
3.4 are insufficient, and that the only thing stopping us from
determining this is the fact that these earlier versions of python3
are not readily available for us to test with.)
- Updated docs/BuildSystem.md to be explicit about current python2 vs
python3 version requirements.
commit cdbf16aa93234e0d6a80f0d0e385ec81e7b75465
Author: prangana <pradeep.rao@amd.com>
Date: Fri Jan 4 15:59:21 2019 +0530
Update version 1.3
Change-Id: I32a7d24af860e87a60396614075236afb65a28a9
commit cf9c1150515b8e9cc4f12e0d4787b3471b12ba4a
Author: kdevraje <Kiran.Devrajegowda@amd.com>
Date: Thu Jan 3 09:51:46 2019 +0530
Added a macro to be enabled when BLIS is running in single-instance mode.
Change-Id: I7f3fd654b78e64c4e6e24e9f0e245b1a30c492b0
commit ad8d9adb09a7dd267bbdeb2bd1fbbf9daf64ee76
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jan 3 16:08:24 2019 -0600
README.md, CREDITS update.
Details:
- Added "What's New" and "What People Are Saying About BLIS" sections to
README.md.
- Added missing github handles to various individuals' entries in the
CREDITS file.
commit 7052fca5aef430241278b67d24cef6fe33106904
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 2 13:48:40 2019 -0600
Apply f272c289 to bli_fmalloc_noalign().
Details:
- Perform the same check for NULL return values and error message output
in bli_fmalloc_noalign() as is performed by bli_fmalloc_align(). (This
change was intended for f272c289.)
commit 528e3ad16a42311a852a8376101959b4ccd801a5
Merge: 3126c52e f272c289
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 2 13:39:19 2019 -0600
Merge branch 'amd'
commit 3126c52ea795ffb7d30b16b7f7ccc2a288a6158d
Merge: 61441b24 8091998b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 2 13:37:37 2019 -0600
Merge branch 'amd'
commit f272c2899a6764eedbe05cea874ee3bd258dbff3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 2 12:34:15 2019 -0600
Add error message to malloc() check for NULL.
Details:
- Output an error message if and when the malloc()-equivalent called by
bli_fmalloc_align() ever returns NULL. Everything was already in place
for this to happen, including the error return code, the error string
sprintf(), the error checking function bli_check_valid_malloc_buf()
definition, and its prototype. Thanks to Minh Quan Ho for pointing out
the missing error message.
- Increased the default block_ptrs_len for each inner pool stored in the
small block allocator from 10 to 25. Under normal execution, each
thread uses only 21 blocks, so this change will prevent the sba from
needing to resize the block_ptrs array of any given inner pool as
threads initially populate the pool with small blocks upon first
execution of a level-3 operation.
- Nix stray newline echo in configure.
commit eb97f778a1e13ee8d3b3aade05e479c4dfcfa7c0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 25 20:17:09 2018 -0600
Added missing AMD copyrights to previous commit.
Details:
- Forgot to add AMD copyrights to several touched files that did not
already have them in 2f31743.
commit 2f3174330fb29164097d664b7c84e05c7ced7d95
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 25 19:35:01 2018 -0600
Implemented a pool-based small block allocator.
Details:
- Implemented a sophisticated data structure and set of APIs that track
the small blocks of memory (around 80-100 bytes each) used when
creating nodes for control and thread trees (cntl_t and thrinfo_t) as
well as thread communicators (thrcomm_t). The purpose of the small
block allocator, or sba, is to allow the library to transition into a
runtime state in which it does not perform any calls to malloc() or
free() during normal execution of level-3 operations, regardless of
the threading environment (potentially multiple application threads
as well as multiple BLIS threads). The functionality relies on a new
data structure, apool_t, which is (roughly speaking) a pool of
arrays, where each array element is a pool of small blocks. The outer
pool, which is protected by a mutex, provides separate arrays for each
application thread while the arrays each handle multiple BLIS threads
for any given application thread. The design minimizes the potential
for lock contention, as only concurrent application threads would
need to fight for the apool_t lock, and only if they happen to begin
their level-3 operations at precisely the same time. Thanks to Kiran
Varaganti and AMD for requesting this feature.
- Added a configure option to disable the sba pools, which are enabled
by default; renamed the --[dis|en]able-packbuf-pools option to
--[dis|en]able-pba-pools; and rewrote the --help text associated with
this new option and consolidated it with the --help text for the
option associated with the sba (--[dis|en]able-sba-pools).
- Moved the membrk field from the cntx_t to the rntm_t. We now pass in
a rntm_t* to the bli_membrk_acquire() and _release() APIs, just as we
do for bli_sba_acquire() and _release().
- Replaced all calls to bli_malloc_intl() and bli_free_intl() that are
used for small blocks with calls to bli_sba_acquire(), which takes a
rntm (in addition to the bytes requested), and bli_sba_release().
These latter two functions reduce to the former two when the sba pools
are disabled at configure-time.
- Added rntm_t* arguments to various cntl_t and thrinfo_t functions, as
required by the new usage of bli_sba_acquire() and _release().
- Moved the freeing of "old" blocks (those allocated prior to a change
in the block_size) from bli_membrk_acquire_m() to the implementation
of the pool_t checkout function.
- Miscellaneous improvements to the pool_t API.
- Added a block_size field to the pblk_t.
- Harmonized the way that the trsm_ukr testsuite module performs packing
relative to that of gemmtrsm_ukr, in part to avoid the need to create
a packm control tree node, which now requires a rntm_t that has been
initialized with an sba and membrk.
- Re-enabled the explicit call to bli_finalize() in testsuite so that users who
run the testsuite with memory tracing enabled can check for memory
leaks.
- Manually imported the compact/minor changes from 61441b24 that cause
the rntm to be copied locally when it is passed in via one of the
expert APIs.
- Reordered parameters to various bli_thrcomm_*() functions so that the
thrcomm_t* to the comm being modified is last, not first.
- Added more descriptive tracing for allocating/freeing small blocks and
formalized via a new configure option: --[dis|en]able-mem-tracing.
- Moved some unused scalm code and headers into frame/1m/other.
- Whitespace changes to bli_pthread.c.
- Regenerated build/libblis-symbols.def.
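The sketch below illustrates only the innermost idea of the small block allocator: a pool that recycles fixed-size blocks so that, once populated, acquire/release no longer call malloc()/free(). It omits the outer apool_t, the per-application-thread arrays, and the mutex described above. Every name is hypothetical, and the capacity of 25 is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_CAP_SKETCH 25   /* arbitrary fixed capacity for this sketch */

/* Hypothetical single-threaded pool of equally sized small blocks. */
typedef struct
{
    size_t block_size;
    size_t num_free;
    void*  free_blocks[ POOL_CAP_SKETCH ];
} pool_sketch_t;

static void pool_init_sketch( pool_sketch_t* pool, size_t block_size )
{
    pool->block_size = block_size;
    pool->num_free   = 0;
}

/* Reuse a cached block if one exists; otherwise fall back to malloc(). */
static void* pool_acquire_sketch( pool_sketch_t* pool )
{
    if ( pool->num_free > 0 )
        return pool->free_blocks[ --pool->num_free ];
    return malloc( pool->block_size );
}

/* Return a block to the pool instead of freeing it (unless full). */
static void pool_release_sketch( pool_sketch_t* pool, void* block )
{
    if ( block == NULL ) return;
    if ( pool->num_free < POOL_CAP_SKETCH )
        pool->free_blocks[ pool->num_free++ ] = block;
    else
        free( block );
}

/* Free every cached block, e.g. at library finalization. */
static void pool_finalize_sketch( pool_sketch_t* pool )
{
    while ( pool->num_free > 0 )
        free( pool->free_blocks[ --pool->num_free ] );
}
```

After the first level-3 operation has populated such a pool, later operations would be served entirely from the cached blocks, which is the steady state the commit aims for.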
commit 61441b24f3244a4b202c29611a4899dd5c51d3a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 19:38:11 2018 -0600
Make local copy of user's rntm_t in level-3 ops.
Details:
- In the case that the caller passes in a non-NULL rntm_t pointer into
one of the expert APIs for a level-3 operation (e.g. bli_gemm_ex()),
make a local copy of the rntm_t and use the address of that local copy
in all subsequent execution (which may change the contents of the
rntm_t). This prevents a potentially confusing situation whereby a
user-initialized rntm_t is used once (in, say, gemm), and then found
by the user to be in a different state before it is used a second
time.
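A minimal sketch of the local-copy pattern, using a hypothetical two-field stand-in for rntm_t and a hypothetical gemm_ex_sketch() function; the real structure and the real expert APIs are more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for rntm_t. */
typedef struct
{
    int num_threads;
    int ways_jc;
} rntm_sketch_t;

/* Hypothetical expert-style entry point. Returns the jc ways actually used. */
static int gemm_ex_sketch( const rntm_sketch_t* rntm )
{
    rntm_sketch_t rntm_l;

    /* Copy the caller's struct (or use defaults) so that any changes made
       below never leak back to the caller. */
    if ( rntm != NULL ) rntm_l = *rntm;
    else { rntm_l.num_threads = 1; rntm_l.ways_jc = 1; }

    /* Internal processing may modify the local copy freely. */
    if ( rntm_l.ways_jc < rntm_l.num_threads )
        rntm_l.ways_jc = rntm_l.num_threads;

    return rntm_l.ways_jc;
}
```

The caller's struct is in the same state after the call as before it, so it can be reused for a second operation without surprises.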
commit e809b5d2f1023b4249969e2f516291c9a3a00b80
Merge: 76016691 0476f706
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 16:27:26 2018 -0600
Merge branch 'master' into amd
commit 1f4eeee5175a8fc9ac312847c796ce6db5fe75b9
Author: sraut <Biplab.Raut@amd.com>
Date: Wed Dec 19 21:21:10 2018 +0530
Fixed BLAS test failures of small matrix SYRK for single and double precision.
Details:
- Small-matrix SYRK was implemented by reusing the small GEMM routine, which
wrote output to the full C matrix. Because C is symmetric, the lower and
upper triangles of C contained the same results. The BLAS SYRK API
specification requires that only the lower or the upper triangle of C be
written, so this caused BLAS test failures even though the BLIS testsuite
passed for small SYRK.
- To fix the BLAS test failures, separate small SYRK kernel routines are
implemented for both single and double precision. The newly added
routines are in the file kernels/zen/3/bli_syrk_small.c. The intermediate
results for matrix C are now written to a scratch buffer, and the final
results are copied from the scratch buffer to either the lower or the
upper triangle of matrix C using SIMD copies.
- Source and header files frame/3/syrk/bli_syrk_front.c and
frame/3/syrk/bli_syrk_front.h are changed to invoke new small SYRK routines.
Change-Id: I9cfb1116c93d150aefac673fca033952ecac97cb
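The final copy step described above can be sketched as follows. This is a plain scalar loop over a column-major matrix, not the SIMD copy used in the actual kernel, and the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Copy only the lower (lower != 0) or upper (lower == 0) triangle,
   including the diagonal, of an n-by-n column-major scratch buffer into C.
   Elements of C outside the chosen triangle are left untouched. */
static void copy_triangle_sketch
     (
       size_t n, int lower,
       const double* scratch, size_t lds,
       double*       c,       size_t ldc
     )
{
    assert( scratch != NULL && c != NULL );

    for ( size_t j = 0; j < n; ++j )
    {
        size_t i_begin = lower ? j : 0;
        size_t i_end   = lower ? n : j + 1;

        for ( size_t i = i_begin; i < i_end; ++i )
            c[ i + j*ldc ] = scratch[ i + j*lds ];
    }
}
```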
commit 6d267375c3a0543f20604d74cc678ad91db3b6f1
Author: sraut <Biplab.Raut@amd.com>
Date: Wed Dec 19 14:22:21 2018 +0530
This commit improves the performance of multi-instance DGEMM when these multiple threads are bound to a CCX.
Multi-Instance: Each thread runs a sequential DGEMM.
Change-Id: I306920c8061b6dad61efac1dae68727f4ac27df6
commit 0476f706b93e83f6b74a3d7b7e6e9cc9a1a52c3b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 14:56:20 2018 -0600
CHANGELOG update (0.5.1)
commit e0408c3ca3d53bc8e6fedac46ea42c86e06c922d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 14:56:16 2018 -0600
Version file update (0.5.1)
commit 3ab231afc9f69d14493908c53c85a84c5fba58aa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 14:53:37 2018 -0600
ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
commit d1aa87164e1e82347d62aa98793963c5265ef7e7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 14:52:40 2018 -0600
README.md update (External packages section).
Details:
- Updated External packages section in anticipation of introducing BLIS
into Debian package universe. Thanks to M. Zhou for sponsoring BLIS in
Debian.
commit 7bf901e9265a1acd78e44c06f7178c8152c7e267
Author: sraut <Biplab.Raut@amd.com>
Date: Tue Dec 18 14:39:16 2018 +0530
Fix on EPYC machine for multi instance performance issue,
Issue: With the default values of mc, kc, and nc in multi-instance mode, performance across the cores dips drastically.
Fix: After experimentation, found a different set of values (mc, kc, and nc) that fits in the cache, so that performance remains the same across all the cores.
Change-Id: I98265e3b7e61cd7602a0cc5596240e86c08c03fe
commit d2b2a0819a2fccad9165bc48c0e172d79a87542c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 17 19:26:35 2018 -0600
Removed stray sections from Multithreading.md.
Details:
- Removed unintended section headers from before table of contents.
commit 93d56319f2953cf0e9df1ff2cda90b8e41351b2c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 17 19:17:30 2018 -0600
Added missing bli_init_once() in bli_thread API.
Details:
- Fixed an issue with specifying threading globally at runtime via
bli_thread_set_num_threads() (the automatic way) or via
bli_thread_set_ways() (the manual way), with bli_thread_init_rntm()
also affected. These functions were not calling bli_init_once() prior
to acting, and therefore their effects on the global rntm_t structure
were being wiped out by the eventual call to bli_init_once(), by some
other BLIS function. Thanks to Ali Emre Gülcü for reporting the
behavior associated with this bug.
- Added additional content to docs/Multithreading.md covering topics of
choosing between OpenMP and pthreads, and specifying affinity via
OpenMP.
- CREDITS file update.
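The ordering bug fixed above can be illustrated with a single-threaded sketch. All names are hypothetical, and a plain flag stands in for the real once-only mechanism, which is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical global state; a simplified stand-in for the global rntm_t. */
static int  global_num_threads_sketch = 1;
static bool initialized_sketch        = false;

/* Resets global state to defaults, but only on the first call. */
static void init_once_sketch( void )
{
    if ( initialized_sketch ) return;
    global_num_threads_sketch = 1;
    initialized_sketch        = true;
}

/* Fixed setter: initialize first, so that a later call to
   init_once_sketch() from some other function cannot wipe out the value. */
static void set_num_threads_sketch( int nt )
{
    init_once_sketch();
    global_num_threads_sketch = nt;
}

/* Any other entry point also begins by calling init_once_sketch(). */
static int query_num_threads_sketch( void )
{
    init_once_sketch();
    return global_num_threads_sketch;
}
```

Without the init_once_sketch() call in the setter, the first query would run the initialization afterwards and reset the thread count to its default.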
commit 76016691e2c514fcb59f940c092475eda968daa2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 13 17:23:09 2018 -0600
Improvements to bli_pool; malloc()/free() tracing.
Details:
- Added malloc_ft and free_ft fields to pool_t, which are provided when
the pool is initialized, to allow bli_pool_alloc_block() and
bli_pool_free_block() to call bli_fmalloc_align()/bli_ffree_align()
with arbitrary align_size values (according to how the pool_t was
initialized).
- Added a block_ptrs_len argument to bli_pool_init(), which allows the
caller to specify an initial length for the block_ptrs array, which
previously suffered the cost of being reallocated, copied, and freed
each time a new block was added to the pool.
- Consolidated the "buf_sys" and "buf_align" pointer fields in pblk_t
into a single "buf" field. Consolidated the bli_pblk API accordingly
and also updated the bli_mem API implementation. This was done
because I'd previously already implemented opaque alignment via
bli_malloc_align(), which allocates extra space and stores the
original pointer returned by malloc() one element before the element
whose address is aligned.
- Tweaked bli_membrk_acquire_m() and bli_membrk_release() to call
bli_fmalloc_align() and bli_ffree_align(), which required adding an
align_size field to the membrk_t struct.
- Pass the pack schemas directly into bli_l3_cntl_create_if() rather
than transmit them via objects for A and B.
- Simplified bli_l3_cntl_free_if() and renamed to bli_l3_cntl_free().
The function had not been conditionally freeing control trees for
quite some time. Also, removed obj_t* parameters since they aren't
needed anymore (or never were).
- Spun-off OpenMP nesting code in bli_l3_thread_decorator() to a
separate function, bli_l3_thread_decorator_thread_check().
- Renamed:
bli_malloc_align() -> bli_fmalloc_align()
bli_free_align() -> bli_ffree_align()
bli_malloc_noalign() -> bli_fmalloc_noalign()
bli_free_noalign() -> bli_ffree_noalign()
The 'f' is for "function" since they each take a malloc_ft or free_ft
function pointer argument.
- Inserted various printf() calls for the purposes of tracing memory
allocation and freeing, guarded by cpp macro ENABLE_MEM_DEBUG, which,
for now, is intended to be a "hidden" feature rather than one hooked
up to a configure-time option.
- Defined bli_rntm_equals(), which compares two rntm_t for equality.
(There are no use cases for this function yet, but there may be soon.)
- Whitespace changes to function parameter lists in bli_pool.c, .h.
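The "opaque alignment" technique mentioned above (over-allocating and storing the original malloc() pointer one element before the aligned address) can be sketched as follows. The function names are hypothetical, and the sketch omits the malloc_ft/free_ft function pointer arguments that the bli_fmalloc_align() family takes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Return a pointer aligned to 'align' bytes. 'align' is assumed to be a
   power of two no smaller than sizeof( void* ). */
static void* malloc_align_sketch( size_t size, size_t align )
{
    assert( align >= sizeof( void* ) && ( align & ( align - 1 ) ) == 0 );

    /* Extra room for worst-case padding plus the stashed pointer. */
    void* p_orig = malloc( size + align + sizeof( void* ) );
    if ( p_orig == NULL ) return NULL;

    /* Skip past one pointer-sized slot, then round up to the alignment. */
    uintptr_t addr = ( uintptr_t )p_orig + sizeof( void* );
    addr = ( addr + align - 1 ) & ~( uintptr_t )( align - 1 );

    /* Stash the original pointer just before the aligned address. */
    void** p_align = ( void** )addr;
    p_align[-1] = p_orig;

    return p_align;
}

/* Recover the original pointer and hand it back to free(). */
static void free_align_sketch( void* p )
{
    if ( p == NULL ) return;
    free( ( ( void** )p )[-1] );
}
```

Since the original pointer travels with the aligned buffer, a caller only needs to track one pointer, which is what allows "buf_sys" and "buf_align" to collapse into a single "buf" field.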
commit f808d829c58dc4194cc3ebc3825fbdde12cd3f93
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 12 15:22:59 2018 -0600
Handle edge cases, zero-filling in packm kernels.
Details:
- Updated the API and semantics of packm kernels such that they must now
handle edge cases, meaning that a c-by-k packm kernel must be able to
pack edge cases that are fewer than c rows/columns and be able to
zero-fill the remaining elements. They must also be able to zero-fill
the equivalent region when copying fewer than k columns/rows (which is
needed by trsm). The new packm kernel API is generally:
void packm_kernel
(
conj_t conja,
dim_t cdim,
dim_t n,
dim_t n_max,
ctype* restrict kappa,
ctype* restrict a, inc_t inca, inc_t lda,
ctype* restrict p, inc_t ldp,
cntx_t* restrict cntx
);
where cdim and n are the dimensions (short and long, respectively) of
the submatrix being copied from the source matrix A, and n_max is the
"full" long dimension (corresponding to the k dimension in gemm) of
the micropanel. The "full" short dimension (corresponding to the
register blocksize MR or NR) is not part of the API because it is
known intrinsically by the packm kernel implementation. Thanks to
Devin Matthews for prompting us to make this change (#282).
- Updated all reference packm kernels in ref_kernels/1m according to
above changes, as well as all optimized packm kernels (which only
consisted of those for knl).
- Bumped the major soname version number in 'so_version' to 2. At first
I was considering leaving it unchanged, but I couldn't escape the
reality that the packm kernel API is much closer to an expert API
than it is some obscure helper function interface within the framework
that nobody would ever notice.
- Removed reference packm kernels for mr/nr = 30. The only sub-config
that would have been using those kernels is knc, which is likely no
longer being used by very many people (if any). (This also mostly
offset the larger object code footprint incurred by moving the edge-
case handling into the individual packm kernels.)
- Fixed an obscure race condition for 3mh and 4mh induced methods in
which those implementations were modifying the contexts stored in the
gks rather than a local copy.
- Fixed a minor bug in the testsuite that prevented non-1m-based induced
method implementations of trsm from executing.
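The edge-case and zero-filling semantics described above can be sketched as follows. This is a minimal illustration only, not the reference kernel: it assumes a real double-precision datatype (so conja is omitted), uses size_t in place of dim_t/inc_t, drops the cntx argument, and the names example_packm_4xk and EXAMPLE_MR are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "full" short dimension (register blocksize) that this
   kernel knows intrinsically, so it is not passed through the API. */
enum { EXAMPLE_MR = 4 };

/* Pack a cdim-by-n submatrix of A into a micropanel P that has
   EXAMPLE_MR rows and n_max columns, scaling by kappa. Elements of P
   outside the cdim-by-n region are zero-filled. */
void example_packm_4xk(size_t cdim, size_t n, size_t n_max,
                       const double *kappa,
                       const double *a, size_t inca, size_t lda,
                       double *p, size_t ldp)
{
    assert(cdim <= EXAMPLE_MR && n <= n_max && ldp >= EXAMPLE_MR);

    for (size_t j = 0; j < n_max; ++j) {
        for (size_t i = 0; i < EXAMPLE_MR; ++i) {
            /* Rows >= cdim cover the short-dimension edge case;
               columns >= n cover the long-dimension case (needed by trsm). */
            int in_source = (i < cdim && j < n);
            p[i + j * ldp] = in_source ? (*kappa) * a[i * inca + j * lda]
                                       : 0.0;
        }
    }
}
```

With cdim = 3, n = 2, and n_max = 3, the fourth row and the third column of the micropanel come out as zeros, regardless of what the buffer held before.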
commit 02ec0be3ba0b0d6b4186386ae140906a96de919b
Merge: e275def3 c534da62
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 5 19:33:53 2018 -0600
Merge branch 'master' into amd
commit c534da62c0015f91391983da5376c9e091378010
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 5 15:51:05 2018 -0600
Disabled ARM configuration families in registry.
Details:
- Disabled (commented out) the arm32 and arm64 configuration families
in the config_registry file. Having a configuration family registered
only makes sense if BLIS is currently outfitted with runtime hardware
detection logic to choose the appropriate sub-configuration. That
logic is currently missing for ARM architectures, and thus having the
ARM configuration families in the configuration registry only serves
to confuse people. Thanks to Devangi Parikh for suggesting this
change.
commit 6885051a164628904fad0d8a3b39c82f9a7b193c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 5 14:45:39 2018 -0600
Generalizations/cleanup to mixeddt matlab scripts.
Details:
- Parameterized, reorganized, and added comments to matlab scripts in
test/mixeddt/matlab.
- Reordered some lines of code and added comments to plot_l3_perf.m in
test/3m4m/matlab.
commit cbdb0566bf3201a495bbdcb8cb50342fa0098649
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 5 20:06:32 2018 +0000
Updates to 3m4m, mixeddt test driver files.
Details:
- Updated 3m4m and mixeddt Makefiles and runme.sh scripts, mostly to
port recent changes to the former to the latter.
- Disabled (for now) code in 3m4m/test_*.c files that disables all
induced methods except for the one that is requested from the
Makefile via the IND macro. This is done because usually, we want to
test whatever method is enabled automatically for complex datatypes.
(That is, when native complex microkernels are missing, we usually
want to test performance of 1m.)
commit 0645f239fbdf37ee9d2096ee3bb0e76b3302cfff
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 4 14:31:06 2018 -0600
Remove UT-Austin from copyright headers' clause 3.
Details:
- Removed explicit reference to The University of Texas at Austin in the
third clause of the license comment blocks of all relevant files and
replaced it with a more all-encompassing "copyright holder(s)".
- Removed duplicate words ("derived") from a few kernels' license
comment blocks.
- Homogenized license comment block in kernels/zen/3/bli_gemm_small.c
with format of all other comment blocks.
commit 9b688a2d69dd420f4d2582827c5ac87e422cd3bc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 4 13:30:25 2018 -0600
Refer to color mm algorithm in Multithreading.md.
commit 22384fd2b749aa8cfdfad1084ce5e7dbd4ad2d64
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 4 13:09:04 2018 -0600
Minor updates to test_gemm.c in test/mixeddt.
commit 2ba3b1780cbca58e43a3948d67bd07e637036125
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 3 19:40:39 2018 -0600
Removed symbols from libblis-symbols.def.
Details:
- Removed bli_gemm_md_front() and bli_gemm_md_zgemm() symbols from
build/libblis-symbols.def, which will hopefully appease AppVeyor.
commit dcb38c4e59c3395c258799e69bfe2104c578c528
Merge: dc184095 375eb30b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 3 18:06:19 2018 -0600
Merge branch 'dev'
commit 375eb30b0a63ac06a363a5f75f283584258db48b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 3 17:49:52 2018 -0600
Added mixed-precision support to 1m method.
Details:
- Lifted the constraint that 1m only be used when all operands' storage
datatypes (along with the computation datatype) are equal. Now, 1m may
be used as long as all operands are stored in the complex domain. This
change largely consisted of adding the ability to pack to 1e and 1r
formats from one precision to another. It also required adding logic
for handling complex values of alpha to bli_packm_blk_var1_md()
(similar to the logic in bli_packm_blk_var1()).
- Fixed a bug in several virtual microkernels (bli_gemm_md_c2r_ref.c,
bli_gemm1m_ref.c, and bli_gemmtrsm1m_ref.c) that resulted in the wrong
ukernel output preference field being read. Previously, the preference
for the native complex ukernel was being read instead of the pref for
the native real domain ukernel. This bug would not manifest if the
preference for the native complex ukernel happened to be equal to that
of the native real ukernel.
- Added support for testing mixed-precision 1m execution via the gemm
module of the testsuite.
- Tweaked/simplified bli_gemm_front() and bli_gemm_md.c so that pack
schemas are always read from the context, rather than trying to
sometimes embed them directly to the A and B objects. (They are still
embedded, but now uniformly only after reading the schemas from the
context.)
- Redefined cpp macro bli_l3_ind_recast_1m_params() as a static function
and renamed to bli_gemm_ind_recast_1m_params() (since gemm is the only
consumer).
- Added 1m optimization logic (via bli_gemm_ind_recast_1m_params()) to
bli_gemm_ker_var2_md().
- Added explicit handling for beta == 1 and beta == 0 in the reference
gemm1m virtual microkernel in ref_kernels/ind/bli_gemm1m_ref.c.
- Rewrote various level-0 macro defs, including axpyris, axpbyris,
scal2ris, and xpbyris (and their conjugating counterparts) to
explicitly support three operand types and updated invocations to
xpbyris in bli_gemmtrsm1m_ref.c.
- Query and use the storage datatype of the packed object instead of the
storage datatype of the source object in bli_packm_blk_var1().
- Relocated and renamed frame/ind/misc/bli_l3_ind_opt.h to
frame/3/gemm/ind/bli_gemm_ind_opt.h.
- Various whitespace/comment updates.
commit e275def30ac41cadce296560fa67282704f20a02
Merge: 8091998b dc184095
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 30 15:39:50 2018 -0600
Merge branch 'master' into amd
commit dc18409551f341125169fe8d4d43ac45e81bdf28
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 28 11:58:40 2018 -0600
CREDITS file update.
commit ee4d2712963816f84d7e3fdd39d93424e1aaf63d
Merge: e81c4b56 3d7e8bc3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 28 11:52:57 2018 -0600
Merge pull request #287 from SuperFluffy/fix_configuration_links
Fix configuration links
commit 3d7e8bc3b8e77693152138e75676f71573e5e6cd
Author: Richard Janis Goldschmidt <janis.beckert@gmail.com>
Date: Wed Nov 28 15:56:37 2018 +0100
Fix configuration links
commit 6a4885f8be9ecd81423ebf2eb6da75d7981c979b
Merge: 1d8aae22 e81c4b56
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 27 13:22:59 2018 -0600
Merge branch 'master' into dev
commit e81c4b56660b25a39f8fdc09fbe07459c5bd8e8e
Merge: 757043ea cfbdb58d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 21 17:00:49 2018 -0600
Merge pull request #285 from isuruf/pthread
Move LDFLAGS to the end
commit cfbdb58de2e44f2e3a3d8b14fceece7aef4b3006
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 14:23:39 2018 -0600
Move LDFLAGS to the end
Otherwise the linker will drop flags like -lpthread
commit 757043eae8630c0a76e9bb04f2cb0bd72439a86a
Merge: e769bf46 7af8fa01
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 21 13:07:26 2018 -0600
Merge pull request #283 from isuruf/patch-3
Fix MinGW and Cygwin build failures
commit 7af8fa01373b7bb30fa3b1fd110fd201c87ea225
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 02:10:05 2018 -0600
Fix blis dll path
commit 2acd8dcd23805203a6821358c5e3e09d521fecdf
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 02:02:18 2018 -0600
Fix install path of dll.a
commit b7b0ad22b151e89e2a6c7782cf4d8d47b4e60734
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 01:54:44 2018 -0600
Test mingw
commit bafe521ed0012b7b8814404b78a6c576d8386370
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 01:54:36 2018 -0600
Fixes for mingw
commit be831879bd03edcddff8a345161f749ad92215af
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 01:39:32 2018 -0600
test gcc shared
commit f6b924648c79c4b1c3d3c7fbf85372680aff8362
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 01:39:19 2018 -0600
Don't use .def for gcc
commit ce6e4eae6d5e977e6f699acc9cf239be8ac53771
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 01:34:56 2018 -0600
test no threading
commit c9169b4685bfe81bc562cf9128b35a6a9884799b
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 01:17:36 2018 -0600
Add mingw64 path
commit 0f753090eaf4264b743a49ce15de97514bcbe112
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 01:14:52 2018 -0600
Fix PATH
commit d424470b1f2fa8717fa54c0245b21341504665f6
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 01:04:26 2018 -0600
Check openmp and pthreads threading
commit c73e7601e58239e2dedec6c9f1b752e949254a42
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 00:50:33 2018 -0600
Revert "enable rdp"
This reverts commit 368274bcbd0c9232521d14fa28304f35ced0e6d7.
commit 6209b2e6060b89e65f3405c31333af8952dd63c0
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 00:50:22 2018 -0600
Remove conda
commit 0b1b344447b8a2fcd635a48f0ce7ce89b2107dc4
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 00:42:39 2018 -0600
Fix make name
commit 7a9838983ba8dd32ac9f87712255721542ff561f
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 00:35:27 2018 -0600
Use m2w64-make
commit 4c1dedd6a90087807f16353a5d0bcaaade35a7a5
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 00:28:20 2018 -0600
No activate on gcc
commit 368274bcbd0c9232521d14fa28304f35ced0e6d7
Author: Isuru Fernando <isuruf@gmail.com>
Date: Tue Nov 20 23:40:26 2018 -0600
enable rdp
commit 707a5e7f9b07f554e1e9289dd0ce3b7dc4fded6e
Author: Isuru Fernando <isuruf@gmail.com>
Date: Tue Nov 20 23:39:31 2018 -0600
No conda for mingw build
commit 65b0565c0ad9162d4474bd84eabde491fa971538
Author: Isuru Fernando <isuruf@gmail.com>
Date: Tue Nov 20 23:19:38 2018 -0600
Check MinGW-w64
commit 9ddffba5847080e0d77d9e6059d05dc4b1d89ba5
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Nov 21 00:23:34 2018 -0600
Fix MinGW build failure
Fixes https://github.com/flame/blis/issues/278
commit 1d8aae220bc52ce8e3a8afaa64b57e5d83480bdc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 20 18:42:07 2018 -0600
Track internal scalar datatypes.
Details:
- Added a num_t datatype bitfield to the obj_t in the form of a new
info2 field in the obj_t. This change was made primarily so that in
the case of mixed-datatype gemm, the alpha scalar would not need to
be cast to the storage datatype of B (or A) before then being cast to
the computation datatype just before the macrokernel is called. This
double-casting regime could result in loss of precision if the storage
precision of B (or A) is lower than the computation precision. In
practice, it was likely not going to be a big deal since most usage of
alpha is for -1.0, 0.0, and 1.0 (or integer multiples thereof), which
can all be represented exactly in single or double precision.
- The type of objbits_t was changed to uint32_t, so the new format
potentially takes up the same space as the previous obj_t definition,
assuming no padding inserted by the compiler. Shrinking info to 32
bits and spilling over into a second field was chosen over using the
high 32 bits of a single 64-bit objbits_t info field because many of
the bitwise operations are performed with enums such as num_t, dom_t,
and prec_t, which may take on the type of 32-bit ints. It's easier to
just keep all of those bitwise operations in 32 bits than perform a
million typecasts throughout bli_type_defs.h and bli_obj_macro_defs.h
to ensure that the integers are treated as 64-bit for the purposes of
the ANDs, ORs, and bitshifts.
- Many comment updates.
- Thanks to Devin Matthews and Devangi Parikh for their feedback and
involvement during this commit cycle.
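The precision loss from the double-casting regime can be illustrated in isolation. The sketch below assumes a single-precision storage datatype and a double-precision computation datatype; the function names are hypothetical and none of this is BLIS code.

```c
#include <assert.h>

/* Old regime: alpha is first cast to the storage precision of B
   (float here), then to the computation precision (double). */
double example_cast_via_storage(double alpha)
{
    float alpha_stored = (float)alpha;   /* rounds to single precision */
    return (double)alpha_stored;         /* widening cannot restore bits */
}

/* Returns 1 if alpha survives the round trip through float unchanged. */
int example_survives_storage_cast(double alpha)
{
    assert(alpha == alpha);              /* reject NaN inputs */
    return example_cast_via_storage(alpha) == alpha;
}
```

Values such as -1.0, 0.0, and 1.0 survive the round trip, which matches the observation above that the issue rarely matters in practice; a value such as 0.1 does not.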
commit e769bf46b0931d68031af212110484ec98e16908
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 20 16:16:53 2018 -0600
Tweak testsuite to issue FAIL for Nan, Inf (#279).
Details:
- Adjusted the definition for libblis_test_get_string_for_result() in
testsuite/src/test_libblis.c so that the "FAIL" string is returned if
the computed residual contains either NaN or Inf. Previously, a
residual containing NaN would result in the selection of the "PASS"
string. Thanks to Devin Matthews for reporting this issue (#279).
- Expounded on comment for the macro definitions of bli_isnan() and
bli_isinf() in bli_misc_macro_defs.h to make it more obvious why they
must remain macros.
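The reason a NaN residual could previously select "PASS" is that every ordered comparison involving NaN is false. A minimal sketch of the corrected selection logic, assuming two hypothetical thresholds and a hypothetical function name (the real testsuite function has a different signature), might look like:

```c
#include <assert.h>
#include <math.h>

/* Select a result string for a residual. The NaN/Inf check must come
   first: (NaN > threshold) is always false, so without it a NaN
   residual would fall through to "PASS". */
const char *example_result_string(double resid,
                                  double thresh_fail,
                                  double thresh_warn)
{
    assert(thresh_fail >= thresh_warn);

    if (isnan(resid) || isinf(resid)) return "FAIL";
    if (resid > thresh_fail)          return "FAIL";
    if (resid > thresh_warn)          return "MARGINAL";
    return "PASS";
}
```

This sketch uses the standard isnan() and isinf() macros from math.h rather than bli_isnan() and bli_isinf().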
commit 279deae18fb8b8106161863b46fcb38232314de4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 16 11:34:19 2018 -0600
Added 4x5 matlab plotting scripts to test/3m4m.
Details:
- Added a new directory, test/3m4m/matlab, containing matlab scripts for
plotting 4x5 panels of performance graphs (using the subplot()
function) for gemm, hemm, herk, trmm, and trsm across all four
floating-point datatypes. I expect to further refine these scripts as
time goes on, but their current state constitutes a good start.
commit 7b02c726650336c12286c8ba166d1d0fdf7601a8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 14 13:49:55 2018 -0600
CREDITS file update.
commit 84dd298a27033945fa2d3b6e5dce1fe625cd2a0a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 14 13:47:45 2018 -0600
Patch to fix msys2/Windows build failure (#277).
Details:
- Expanded cpp guard in frame/include/bli_x86_asm_macros.h to also check
__MINGW32__ in addition to _WIN32, __clang__, and __MIC__. Thanks to
Isuru Fernando for suggesting this fix, and also to Costas Yamin for
originally reporting the issue (#277).
commit 8091998b6500e343c2024561c2b1aa73c3bafb0b
Merge: 333d8562 7b5ba731
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 14 12:36:35 2018 -0600
Merge branch 'master' into amd
commit 7b5ba7319b3901ad0e6c6b4fa3c1d96b579efbe9
Merge: ce719f81 52392932
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 14 12:32:01 2018 -0600
Merge branch 'dev' of github.com:flame/blis into dev
commit 52392932dc1ea3c16220cc4e6978efcb2f5f0616
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 13 22:23:38 2018 +0000
Minor fixes to test/3m4m drivers.
Details:
- Cleanups to Makefile to allow all test drivers to be built for
OpenBLAS and MKL in addition to BLIS.
- Fixed copy-paste typos in test_hemm in calls to ssymm_() and dsymm_().
- Fixed incorrect types for betap in BLAS cpp macro branch of
test_herk.c.
commit 4f12e36a0d0e6df146314b4e50e36c5e7a1af3d3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 13 14:23:12 2018 -0600
Fixed number of columns in first output line.
Details:
- In previous commit, forgot to remove output column corresponding to
the k dimension.
commit a2e0cdd7debf8109198536d55af05d5631072fb2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 13 14:15:11 2018 -0600
Added hemm test driver to test/3m4m.
Details:
- Added a new test_hemm.c test driver to test/3m4m, which was modeled
after the driver by the similar name in test. Also updated Makefile
so that blis-nat-[sm]t would trigger builds for the new driver.
commit 0f9b53e84b48d8d73a56cc9889eae3595ca58a78
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 13 13:03:15 2018 -0600
Fixed a bug in high-level mixeddt conditional.
Details:
- Fixed a bug in frame/3/bli_l3_oapi.c in the conditional that divides
use of induced method (1m) execution from native execution. The former
was intended to only be used in cases where all storage datatypes are
complex and the datatype of C is equal to the computation datatype.
(If mixed datatypes are detected, native execution would be used.)
However, the code in bli_gemm() was erroneously checking the execution
datatype instead of the computation datatype, which at that point is
guaranteed to be equal to the storage datatype even if the computation
datatype contains a different value. Thanks to Devangi Parikh for
helping in isolating this bug.
commit 333d8562f04eea0676139a10cb80a97f107b45b0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Nov 11 14:28:53 2018 -0600
Added debug output to bli_malloc.c.
Details:
- Added debug output to bli_malloc.c in order to debug certain kinds of
memory behavior in BLIS. The printf() statements are disabled and must
be enabled manually.
- Whitespace/comment updates in bli_membrk.c.
commit ce719f816d1237f5277527d7f61123e77180be54
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 10 14:48:43 2018 -0600
More edits to mixeddt matlab scripts.
Details:
- Renamed scripts in test/mixeddt/matlab:
plot_case_all.m -> plot_dom_all.m
plot_case_md.m -> plot_dom_case.m
plot_all_md.m -> plot_dt_all.m
- Added plot_dt_select.m in order to plot select graphs for the main
body of the mixeddt paper, and added additional related legend
handling in plot_gemm_perf.m.
- Added test/mixeddt/matlab/output and a .gitkeep file within in order
to force git to recognize the directory.
commit bf99e7c14baf45725b698d06ad043b531e3a2763
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 8 18:47:17 2018 -0600
Minor updates to test/mixeddt driver.
Details:
- Cleaned up test/mixeddt Makefile in preparation for gathering new
data for mixeddt paper, including renaming implementations to
"internal" and "ad-hoc" to match the terminology to be used in the
paper.
- Added new matlab scripts for generating 8 figures, each covering all
mixed-precision cases for each mixed-domain case.
- Updated the runme.sh script according to changes to Makefile.
- Fixed a minor bug in test_gemm.c that may have given incorrect
performance in complex, homogeneous storage datatype cases where
the computation precision was equal to the storage precisions.
(Examples: zzzd, cccs.)
commit 4bbb454bf3c361af9e97bfa394a73d610cd9002a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 3 19:11:01 2018 -0500
Testsuite docs update for mixed-datatype gemm.
Details:
- Updated docs/Testsuite.md to include mention of the new mixed-domain
and mixed-precision settings, including descriptions.
- Updated docs/MixedDatatypes.md to include a brief section on running
the testsuite to exercise mixed-datatype functionality, which mostly
amounts to a link to the Testsuite.md document.
- Minor verbiage change to testsuite output to correct a misleading
label associated with the value returned by the query function
bli_info_get_simd_num_registers(). (The function does not return the
number of SIMD registers present in the hardware, but rather a maximum
assumed value for the purposes of allocating temporary microtile
workspace on the function stack.)
commit 16401ae922b1285437cf5f6867b2764650a95fb0
Merge: f19c33af 2d403a15
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 3 19:09:43 2018 -0500
Merge branch 'dev'
commit 2d403a1535380a2ebe2ae2c0f5ac54ba7564fbeb
Merge: e90e7f30 4a12979f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 1 20:18:53 2018 -0500
Merge pull request #275 from RhysU/patch-1
Spelling in FAQ
commit 4a12979f65697ed79ba290efd59f4b994ac9429b
Author: Rhys Ulerich <rhys.ulerich@gmail.com>
Date: Thu Nov 1 20:20:59 2018 -0400
Spelling in FAQ
commit f19c33af4cbe6f5705b96fbf2b8799c3c2bd75c3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 26 17:07:15 2018 -0500
Disallow 64b BLAS integers + 32b BLIS integers.
Details:
- Print an error message from configure if the user attempts to
explicitly configure BLIS for simultaneous use of 64-bit integers in
the BLAS API with 32-bit integers in the BLIS API.
- Added cpp macro conditional to bli_type_defs.h to mandate that BLIS
integers be 64 bits if the BLAS integers are 64 bits. This and the
above item take care of issue #274. Thanks to Devin Matthews and
Jeff Hammond for suggesting these safeguards.
- Slight reorganization and relabeling (for clarity) of BLAS/CBLAS
sections and BLIS integer size line of the testsuite configuration
output.
- Very minor edits to docs/MixedDatatypes.md.
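One way to express a safeguard of this kind with the C preprocessor is sketched below. The macro and type names are hypothetical stand-ins, and the sketch rejects the bad combination with #error, whereas the actual conditional in bli_type_defs.h may instead override the BLIS integer size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical configuration macros; the real BLIS names may differ. */
#ifndef EXAMPLE_BLAS_INT_SIZE
#define EXAMPLE_BLAS_INT_SIZE 64
#endif
#ifndef EXAMPLE_BLIS_INT_SIZE
#define EXAMPLE_BLIS_INT_SIZE 64
#endif

/* Refuse to compile if the BLAS layer is wider than the BLIS layer. */
#if EXAMPLE_BLAS_INT_SIZE == 64 && EXAMPLE_BLIS_INT_SIZE != 64
#error "64-bit BLAS integers require 64-bit BLIS integers."
#endif

#if EXAMPLE_BLIS_INT_SIZE == 64
typedef int64_t example_blis_int;
#else
typedef int32_t example_blis_int;
#endif

#if EXAMPLE_BLAS_INT_SIZE == 64
typedef int64_t example_blas_int;
#else
typedef int32_t example_blas_int;
#endif

/* With the guard in place, a BLAS-layer dimension can always be handed
   to the BLIS layer without truncation. */
example_blis_int example_to_blis_int(example_blas_int n)
{
    assert(sizeof(example_blis_int) >= sizeof(example_blas_int));
    return (example_blis_int)n;
}
```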
commit e90e7f309b3f2760a01e8e09a29bf702754fa2b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 25 14:09:43 2018 -0500
CHANGELOG update (0.5.0)
commit be7c57819cfd48adb175d9a480cc9f37928645c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 25 14:09:40 2018 -0500
Version file update (0.5.0)
commit 75da7f2a208ad7d26ed9c6d3e10d08b2a1caf9d6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 25 14:02:41 2018 -0500
ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
- Updated docs/FAQ.md to reflect recent developments, and other edits.
- Minor updates to RELEASING.
commit 6fbc456fb3f4401ec951a618990f15a84fdfa236
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 25 13:20:25 2018 -0500
Added SALT testing to Travis CI.
Details:
- Modified .travis.yml to automatically employ the simulation of
application-level threading within the testsuite, with supporting
changes to common.mk, the top-level Makefile, and
travis/do_testsuite.sh.
- Added a new pair of input files to testsuite directory with the
'.salt' suffix (similar to those with the '.fast' suffix) for
testing application-level threading.
- Updated docs/BuildSystem.md to document the new make targets
'testblis-salt' and 'checkblis-salt'.
commit 0e27963a6770e6b64f3299ad0613d5df45d8b6ae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 24 12:16:19 2018 -0500
Add bli_pthread_mutex_trylock().
Details:
- Added the missing bli_pthread_mutex_trylock() function and prototype
to the non-Windows sections of bli_pthread.c and .h. This function
isn't needed by BLIS, but I figured why not make the Windows and
non-Windows sections consistent with one another.
commit 4b683740c12f83804a51ec610b16ce28607d5c85
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 24 11:56:16 2018 -0500
Defined bli_pthread_cond_*() and related defs.
Details:
- Added function definitions for bli_pthread_cond_*() as well as related
types and constants to bli_pthread.c, and corresponding prototypes to
bli_pthread.h.
commit 4b4f8072b9bb495b3e01d45698b0bad3dac31ba8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 24 11:31:46 2018 -0500
Define bli_pthreads barrier types on OS X.
Details:
- Fully define bli_pthreads barrier-related types on OS X. Only typedef
those types in terms of pthreads types on non-Windows, non-Apple OSes
(i.e. Linux).
commit ad98790dcef6bd9aab7f13d615b987b5daa58757
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 23 20:35:05 2018 -0500
Fix names of Windows pthread initializer macros.
Details:
- Renamed the PTHREAD_ initializer macros in the Windows cpp case to use
BLIS_ prefixes to match their non-Windows counterparts.
commit 06c23954e6b17219a50c3d37821544a46defaf89
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 23 19:16:54 2018 -0500
Defined unified bli_pthreads_*() API for all OSes.
Details:
- Expanded the bli_pthread_*() -> pthread_*() wrappers in
frame/thread/bli_pthread.c to include cases for Windows taken from
frame/base/bli_pthread_wrap.c. Now, bli_pthread_*() is always defined
and always used by BLIS and the BLIS testsuite (in lieu of calling
pthreads directly, as before). The implementation used in this new
API depends on whether we are building for Windows, and to a lesser
extent, whether we are building on OS X. For the core API, Windows
uses Windows threads, non-Windows (Linux, OS X) uses pthreads.
OS X and Windows get barriers implemented in terms of other
bli_pthread_*() functions, and Linux gets barriers implemented in
terms of pthread_barrier*(). This commit addresses issue #273.
- Fixed a bug in the Linux definition of bli_pthread_mutex_unlock(),
which was erroneously calling pthread_mutex_lock().
- Minor changes to configure so that the auto-detection executable
can be built given the above changes (most notably, turning on
POSIX extensions via -D_GNU_SOURCE).
- Removed temporary play-test code for shiftd that accidentally got
committed into test/3m4m/test_gemm.c.
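For platforms without pthread_barrier*(), a barrier can be built from a mutex and a condition variable. The following is a generic sketch of that technique written against plain pthreads, not the bli_pthread.c implementation; the example_barrier_* names and the struct layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    unsigned        n_threads;   /* arrivals required to open the barrier */
    unsigned        n_waiting;   /* arrivals so far in this episode */
    unsigned        generation;  /* incremented each time the barrier opens */
} example_barrier_t;

int example_barrier_init(example_barrier_t *b, unsigned n_threads)
{
    if (n_threads == 0) return -1;
    b->n_threads  = n_threads;
    b->n_waiting  = 0;
    b->generation = 0;
    if (pthread_mutex_init(&b->mutex, NULL) != 0) return -1;
    if (pthread_cond_init(&b->cond, NULL) != 0) {
        pthread_mutex_destroy(&b->mutex);
        return -1;
    }
    return 0;
}

/* Returns 1 in the last thread to arrive and 0 in all other threads. */
int example_barrier_wait(example_barrier_t *b)
{
    pthread_mutex_lock(&b->mutex);
    unsigned my_generation = b->generation;

    if (++b->n_waiting == b->n_threads) {
        /* Last arrival: reset the count and release everyone. */
        b->n_waiting = 0;
        b->generation++;
        pthread_cond_broadcast(&b->cond);
        pthread_mutex_unlock(&b->mutex);   /* unlock, not lock */
        return 1;
    }

    /* Loop on the generation to tolerate spurious wakeups. */
    while (my_generation == b->generation)
        pthread_cond_wait(&b->cond, &b->mutex);

    pthread_mutex_unlock(&b->mutex);
    return 0;
}

int example_barrier_destroy(example_barrier_t *b)
{
    int r1 = pthread_cond_destroy(&b->cond);
    int r2 = pthread_mutex_destroy(&b->mutex);
    return (r1 == 0 && r2 == 0) ? 0 : -1;
}
```

The generation counter lets the same barrier object be reused across consecutive synchronization points without a thread from the next episode being mistaken for one from the current episode.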
commit 0ae9585da1e3db1cf8034d4b16305a5883beb0d3
Author: pradeeptrgit <pradeep.rao@amd.com>
Date: Tue Oct 23 09:36:23 2018 +0530
Update version number to 1.2
Change-Id: Ibb31f6683cdecca6b218bc2f0c14701d7e92ebf3
commit eac7d267a017d646a2c5b4fa565f4637ebfd9da7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 22 18:10:59 2018 -0500
Unconditionally define bli_l3_thread_entry().
Details:
- Define a dummy bli_l3_thread_entry() function when multithreading is
disabled altogether, or enabled via OpenMP. This function was
originally necessary only when multithreading was enabled via pthreads.
By defining the function no matter the threading options given, it is
less likely that an AppVeyor Windows build will complain due to a
missing symbol in the DLL. (To be clear: AppVeyor was working fine
before, but a problem may have arisen if it were switched to an
OpenMP build.)
- Removed the prototype for bli_l3_thread_entry() from
bli_thrcomm_pthreads.c and placed it in bli_thrcomm.h.
- Regenerated the symbols list file build/libblis-symbols.def.
commit 4ee986f0a74207f4ca29df077929134725d62b80
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 22 14:09:44 2018 -0500
Added mixed-datatype testing to Travis CI (#271).
Details:
- Modified .travis.yml to automatically test the mixed-datatype support
of the gemm operation, with supporting changes to common.mk, the
top-level Makefile, and travis/do_testsuite.sh.
- Added a new pair of input files to testsuite directory with the
'.mixed' suffix (similar to those with the '.fast' suffix) for testing
mixed-datatype gemm.
- Updated docs/BuildSystem.md to document the new make targets
'testblis-md' and 'checkblis-md'.
commit c3c6ebc9c6244053d654a9b0c955acb2fef42ee8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Oct 21 18:48:54 2018 -0500
Fixed thrinfo_t printing for small problems.
Details:
- Fixed a bug in the code that prints out the communicator and work ids
from the various threads' thrinfo_t nodes. This bug manifested when
the dimension being parallelized was not large enough such that every
thread was assigned actual work (since the minimum amount of work is
determined by the register blocksize in the dimension being
parallelized). In those cases, the threads that receive no work in
that dimension do not finish building their thrinfo_t tree, leaving
lower-level nodes non-existent. (The bug itself was usually observed as
a segfault when the printing code attempted to dereference all the way
down the thrinfo_t tree.) The solution involves explicitly checking
each node as it is dereferenced, and if at any time NULL is found, all
subsequent communicator and work ids are set to -1.
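The NULL-checking traversal described above can be sketched with a simplified node type. The struct below is a hypothetical stand-in for thrinfo_t with only the fields needed for illustration, and example_collect_ids() is not a BLIS function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a thrinfo_t node. */
typedef struct example_thrinfo {
    int comm_id;
    int work_id;
    struct example_thrinfo *sub_node;   /* NULL if the tree was not finished */
} example_thrinfo_t;

/* Walk n_levels down the tree, recording ids at each level. Once a NULL
   node is found, that level and all deeper levels are reported as -1
   instead of being dereferenced. */
void example_collect_ids(const example_thrinfo_t *root, size_t n_levels,
                         int *comm_ids, int *work_ids)
{
    assert(comm_ids != NULL && work_ids != NULL);

    const example_thrinfo_t *node = root;
    for (size_t level = 0; level < n_levels; ++level) {
        if (node != NULL) {
            comm_ids[level] = node->comm_id;
            work_ids[level] = node->work_id;
            node = node->sub_node;
        } else {
            comm_ids[level] = -1;
            work_ids[level] = -1;
        }
    }
}
```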
commit 73a222c0d99dcc221be7dea10eaebf844f31f72e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Oct 20 14:13:04 2018 -0500
Minor edits to 'configure --help' text.
commit 14f3d5e6df183819a0c393b2661ad15df0786544
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 19 20:39:35 2018 -0500
Refresh libblis-symbols.def post-merge 090e4f0.
commit 090e4f08fc2f429a1b2db77b0a6f8276f892a7ac
Merge: c9be5889 0854e880
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 19 18:41:10 2018 -0500
Merge branch 'master' into dev
commit 0854e880b0848e0c2e3d0644c93c80b0fd13c0dc
Merge: 4e38a8d4 343a2715
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 19 18:05:00 2018 -0500
Merge pull request #261 from flame/win-pthreads
Implement missing pthreads function on Windows
commit c9be5889fbe947c64ef75740662e4d63032f4c35
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 19 17:42:40 2018 -0500
Added "Known issues" section to Multithreading.md.
Details:
- Added known issues section to Multithreading.md.
- Trivial changes to MixedDatatypes.md, Sandboxes.md.
commit 343a2715ebee28d250ee41b914abdcd1dc77c344
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 19 16:59:19 2018 -0500
Whitespace changes to configure, bli_pthread_wrap.
Details:
- Mostly whitespace changes (spaces to tabs) to configure and
bli_pthread_wrap.c and .h.
commit 3678a1cd518df9447b4b1ea86885eb2ba8abcf6e
Merge: 85397cd4 4e38a8d4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 19 16:11:31 2018 -0500
Merge branch 'master' into win-pthreads
commit 4e38a8d4eebb18ead74e644fac76a4fde8e7f6c6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 19 15:54:15 2018 -0500
Implemented python version checking in configure.
Details:
- Added python version checking to configure script. (Recall that python
is needed to execute the flatten-headers.py script.) Minimum versions
of python needed are currently as follows:
python2: 2.7 or later
python3: 3.5 or later
The standard search order for python interpreters is:
python python3 python2
The PYTHON environment variable is also supported and will be checked
before the standard search order list.
- Updated BuildSystem.md to include: a minimum make version; mention
that the C compiler must actually be a C99 compiler; and the caveat
that Windows builds do not require pthreads since BLIS can provide
an implementation of pthreads internally.
commit 85397cd4fa52f6c4c33f4fb715478c55533c680e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 19 13:12:43 2018 -0500
Added explanatory comment to bli_pthread.c.
Details:
- Added a verbose comment to bli_pthread.c that explains why a bli_
wrapper to pthreads APIs is useful.
commit 53c07035ef61cc9b8469636d4d8fa5085f37652d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 19 12:53:03 2018 -0500
Refresh libblis-symbols.def from bb6df28.
Details:
- Forgot to regenerate the symbols file after the previous commit
(bb6df281) in which shiftd operation was introduced.
commit 473ce54f5fbea4860ac0514e7e8b022c1ea03e63
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 18 19:03:56 2018 -0500
Added bli_pthread_*() API.
Details:
- Defined a bli_pthread_*() API so that the testsuite, when being linked
against a Windows DLL, will be able to access pthreads functionality
without those pthreads functions being explicitly exported by the DLL.
Instead, we export the bli_pthread_*() layer, which uses types and
functions that are identical to pthreads, but adds a 'bli_' prefix.
Only a few basic functions are present in the bli_pthread_*() API
for now. Thanks to Devin Matthews and Isuru Fernando for their help
on a related PR (#261) that this commit will hopefully facilitate.
- Updated testsuite so that it calls bli_pthread_*() layer instead of
pthread_*() functions directly.
- Regenerated build/libblis-symbols.def.
- Comment update to build/regen-symbols.sh.
commit bb6df2814fcaa2fa62a549379f61be2f8667a598
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 18 17:11:39 2018 -0500
Defined a new level-1d operation: shiftd.
Details:
- Defined a new level-1d operation called 'shiftd', including object and
typed APIs. This operation adds a scalar value to every element along
an arbitrary diagonal of a matrix. Currently, shiftd is implemented in
terms of the addv kernel. (The scalar is passed in as the x vector
with an increment of zero.)
- Replaced ad-hoc usage of setd and addd (after creating a temporary
matrix object) with use of shiftd, which is much more concise, in
various test driver files in the testsuite. Similar changes were made
to the standalone test drivers and the example code.
- Added documentation entries in BLISObjectAPI.md and BLISTypedAPI.md
for bli_shiftd() and bli_?shiftd(), respectively.
- Added observed object properties to level-1d documentation in
BLISObjectAPI.md.
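The approach described above (a vector add in which the scalar is read with an increment of zero) can be sketched in plain C. This is an illustration under stated assumptions, not the BLIS implementation: the names addv and shift_diag, the column-major layout, and the convention that a positive offset selects a superdiagonal are all chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Generic strided vector add: y[i*incy] += x[i*incx] for i in [0, n).
   With incx == 0, every iteration re-reads the same element of x. */
static void addv(size_t n, const double *x, size_t incx,
                 double *y, size_t incy)
{
    for (size_t i = 0; i < n; ++i)
        y[i * incy] += x[i * incx];
}

/* Hypothetical helper: add alpha to every element on the diagonal with
   offset diagoff of an m-by-n column-major matrix a (leading dimension
   lda). diagoff > 0 is above the main diagonal, diagoff < 0 below it. */
static void shift_diag(double alpha, double *a, size_t m, size_t n,
                       size_t lda, long diagoff)
{
    assert(a != NULL && lda >= m);

    /* Row and column of the first element on the chosen diagonal. */
    size_t i0 = diagoff < 0 ? (size_t)(-diagoff) : 0;
    size_t j0 = diagoff > 0 ? (size_t)diagoff : 0;
    if (i0 >= m || j0 >= n)
        return; /* the diagonal lies outside the matrix */

    /* Number of elements on the diagonal that fall inside the matrix. */
    size_t len = (m - i0 < n - j0) ? (m - i0) : (n - j0);

    /* Consecutive diagonal elements are lda + 1 apart in memory; the
       scalar is passed as a "vector" with increment zero. */
    addv(len, &alpha, 0, a + i0 + j0 * lda, lda + 1);
}
```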
commit 53e0a0c9b38e8525c7224e280342ef56328af567
Merge: 1c7247b6 ec676799
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 18 14:54:59 2018 -0500
Merge branch 'master' into win-pthreads
commit ec67679990660a60362a49406595383672812287
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 18 14:27:02 2018 -0500
Refreshed Windows symbol list; added regen script.
Details:
- Moved windows/build/libblis-symbols.def to build/libblis-symbols.def.
Updated link commands in common.mk accordingly.
- Added a new script build/regen-symbols.sh that will regenerate the
libblis-symbols.def file in its new location after building a
haswell-targeted shared library. Thanks to Isuru Fernando for
providing the symbol generation command.
- Ran the new script to refresh the symbols file.
commit fdad54ab8eee4a7efd04ec4afb3e6902eb22e60a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 18 12:43:22 2018 -0500
Removed old symbol from libblis-symbols.def.
Details:
- Removed bli_gemm_ker_var1() from windows/build/libblis-symbols.def
since this function is no longer compiled.
commit 49d3f9fcbb4a75553439f97c099ea48d85763eea
Merge: 779d64dc 3c527256
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 17 18:00:40 2018 -0500
Merge branch 'master' into dev
commit 3c52725693d0d7726e1c8fb224f9b1ef786db8b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 17 14:56:22 2018 -0500
Renamed/moved l3 zen ukernels to haswell kernel set.
Details:
- Renamed the microkernels in kernels/zen/3 to kernels/haswell/3 and
then updated the file contents to use the 'haswell' infix.
- Updated bli_cntx_init_zen.c and bli_cntx_init_haswell.c according to
above function renames.
- Moved/updated the corresponding prototypes in bli_kernels_zen.h to
bli_kernels_haswell.h.
- Updated config_registry according to above changes.
- NOTE: This rename reflects the fact that haswell microkernels are
specifically written to overcome the floating-point latency for FMA
instructions on Intel Haswell-like architectures, which can issue two
FMA instructions per cycle. These ukernels happen to work fine on AMD
Zen-based architectures. However, Zen only issues one FMA per cycle,
which, while halving its floating-point throughput, gives it extra
flexibility in the design of its microkernels--namely, mr and nr can
be smaller and still overcome the floating-point latency for those
single-issue cores. A smaller value of mr and nr allows for a larger
value of kc, which may be useful in some situations. In the future,
we may write such Zen-specific microkernels to take advantage of this
additional flexibility.
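The reasoning in the NOTE above can be made concrete with a rule of thumb: to keep the FMA units busy, a microkernel needs roughly (FMA latency in cycles) x (FMAs issued per cycle) independent accumulator registers, and the mr-by-nr microtile is what supplies them. The sketch below is a back-of-the-envelope aid only; the helper names are hypothetical, and any latency plugged in is an assumption to be checked against the vendor's documentation.

```c
#include <assert.h>

/* Minimum number of independent vector accumulators needed so that an
   FMA can be issued on every slot without waiting for an earlier result.
   Both arguments are caller-supplied assumptions about the hardware. */
static int min_accumulators(int fma_latency_cycles, int fma_per_cycle)
{
    assert(fma_latency_cycles > 0 && fma_per_cycle > 0);
    return fma_latency_cycles * fma_per_cycle;
}

/* Number of vector registers occupied by an mr-by-nr microtile when each
   register holds vec_len elements. For simplicity, this sketch requires
   that the tile fill whole registers. */
static int tile_registers(int mr, int nr, int vec_len)
{
    assert(vec_len > 0 && (mr * nr) % vec_len == 0);
    return (mr * nr) / vec_len;
}
```

For example, with an assumed latency of 5 cycles, min_accumulators(5, 2) gives 10 for a dual-issue core and min_accumulators(5, 1) gives 5 for a single-issue core. An illustrative 6x8 double-precision tile in 4-element registers occupies tile_registers(6, 8, 4) = 12 accumulators, which clears the dual-issue bound, while a single-issue core could satisfy its bound with a smaller tile.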
commit 71c5832d5f5596f25204980803423d08143a4010
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 17 14:11:01 2018 -0500
Consolidated slab/rr-explicit level-3 macrokernels.
Details:
- Consolidated the *sl.c and *rr.c level-3 macrokernels into a single
file per sl/rr pair, with those files named as they were before
c92762e. The consolidation does not take away the *option* of using
slab or round-robin assignment of micropanels to threads; it merely
*hides* the choice within the definitions of functions such as
bli_thread_range_jrir(), bli_packm_my_iter(), and bli_is_last_iter()
rather than expose that choice explicitly in the code. The choice of
slab or rr is not always hidden, however; there are some cases
involving herk and trmm, for example, that require some part of the
computation to use rr unconditionally. (The --thread-part-jrir option
controls the partitioning in all other cases.)
- Note: Originally, the sl and rr macrokernels were separated out for
clarity. However, aside from the additional binary code bloat, I later
deemed that clarity not worth the price of maintaining the additional
(mostly similar) codes.
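The idea of hiding the slab/rr choice inside a range helper can be sketched as follows. This is a simplified illustration, not the definition of bli_thread_range_jrir(): the USE_SLAB macro, the iter_range type, and the even-chunk slab formula are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compile-time switch standing in for a configure-time
   option; the macro name is illustrative only. */
#ifndef USE_SLAB
#define USE_SLAB 1
#endif

typedef struct { size_t start, end, inc; } iter_range;

/* Slab: thread tid gets one contiguous chunk of the n iterations. */
static iter_range range_slab(size_t n, size_t nt, size_t tid)
{
    assert(nt > 0 && tid < nt);
    size_t chunk = (n + nt - 1) / nt;   /* ceil(n / nt) */
    size_t start = tid * chunk;
    size_t end   = start + chunk;
    if (start > n) start = n;
    if (end   > n) end   = n;
    return (iter_range){ start, end, 1 };
}

/* Round-robin: thread tid gets iterations tid, tid+nt, tid+2*nt, ... */
static iter_range range_rr(size_t n, size_t nt, size_t tid)
{
    assert(nt > 0 && tid < nt);
    return (iter_range){ tid, n, nt };
}

/* Callers use only this helper, so the loop body never names slab or rr. */
static iter_range range_jrir(size_t n, size_t nt, size_t tid)
{
#if USE_SLAB
    return range_slab(n, nt, tid);
#else
    return range_rr(n, nt, tid);
#endif
}
```

A macrokernel-style loop would then be written once, as for (i = r.start; i < r.end; i += r.inc), regardless of which partitioning was selected. Code that must use round-robin unconditionally could call range_rr() directly.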
commit 57eab3a4f0e43099fc2ff189df9fcc0d7801c2cd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 17 11:29:20 2018 -0500
CREDITS file update.
commit 6722ec21817cbab9d86ee63f00984eb407b5e627
Author: Ye Luo <xw111luoye@gmail.com>
Date: Wed Oct 17 11:26:00 2018 -0500
Fix bgclang compilation on BGQ (#270)
* Fix bgq kernels
* Support bgq with bgclang
commit 1c7247b6d146fc728d7c4240e4e069e33f8f8868
Merge: c1bc5530 6c5a1aaf
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Oct 16 14:44:32 2018 -0500
Merge branch 'win-pthreads' of github.com:flame/blis into win-pthreads
commit c1bc5530d51bf55b4aa3c35165f6d4452a0fd779
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Oct 16 14:44:10 2018 -0500
Don't call pthread_once in auto-detect.
commit b9c61d03f542a2e92551ff0595415bec3076ab25
Merge: 5a1e461f 3612ecac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 16 14:39:57 2018 -0500
Merge branch 'nested-omp-patch'
commit 5a1e461ffe09ed200ee2fc7aafccf6dd7e8c0080
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 16 14:21:45 2018 -0500
Execute flatten-headers.py via $(PYTHON).
Details:
- Execute build/flatten-headers.py python script via $(PYTHON) in
common.mk. This allows distributions that define the current/preferred
python interpreter in the PYTHON environment variable to use that
interpreter when executing flatten-headers.py. Thanks to Isuru
Fernando for this suggestion, and to Dave Love for submitting the
initial issue/request.
commit 6c5a1aaff540b19672e91501e894ed695aee322b
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Oct 16 10:15:59 2018 -0500
Fix type in bli_pthread_wrap.c
commit 29e6245816760b1bd4ac738d7d3e11a9d9d13473
Merge: 0b73209f ed657714
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Oct 16 10:12:25 2018 -0500
Merge branch 'master' into win-pthreads
commit 0b73209f6b22cc024169146d343627f6999b63d8
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Oct 16 10:02:06 2018 -0500
Add missing argument to WaitForSingleObject and use $is_win in configure
to turn off pthreads.
commit ed65771482a705f7ed028d822489766327b44e76
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 15 17:54:45 2018 -0500
Fixed merge fail on testsuite threading macros.
Details:
- Applied the following C preprocessor macro renames
BLIS_DEFAULT_MR_THREAD_MAX -> BLIS_THREAD_MAX_IR
BLIS_DEFAULT_NR_THREAD_MAX -> BLIS_THREAD_MAX_JR
BLIS_DEFAULT_M_THREAD_RATIO -> BLIS_THREAD_RATIO_M
BLIS_DEFAULT_N_THREAD_RATIO -> BLIS_THREAD_RATIO_N
in src/test_libblis.c. This is apparently the result of a failure by
git to properly merge the 'master' and 'amd' branches in the previous
commit. (The 'master' branch contained a commit, 53a9ab1, in which
these same cpp macros were renamed throughout the source distribution.)
commit dc5fd898af8c74c2e2a75fc647157da0d04dd922
Merge: 667d3929 637c2ce7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 15 17:41:35 2018 -0500
Merge branch 'amd'
commit 779d64dc3091dea6b7530283304e52878151d218
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 15 17:13:18 2018 -0500
Added entry for xpbym to input.operations.fast.
Details:
- Forgot to add an entry for the new xpbym operation to
input.operations.fast in previous commit.
commit 5fec95b99f61761963834f62a9867f797687813c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 15 16:37:39 2018 -0500
Implemented mixed-datatype support for gemm.
Details:
- Implemented support for gemm where A, B, and C may have different
storage datatypes, as well as a computational precision (and implied
computation domain) that may be different from the storage precision
of either A or B. This results in 128 different combinations, all of
which are implemented within this commit. (For now, the mixed-datatype
functionality is only supported via the object API.) If desired, the
mixed-datatype support may be disabled at configure-time.
- Added a memory-intensive optimization to certain mixed-datatype cases
that requires a single m-by-n matrix be allocated (temporarily) per
call to gemm. This optimization aims to avoid the overhead involved in
repeatedly updating C with general stride, or updating C after a
typecast from the computation precision. This memory optimization may
be disabled at configure-time (provided that the mixed-datatype
support is enabled in the first place).
- Added support for testing mixed-datatype combinations to testsuite.
The user may test gemm with mixed domains, precisions, both, or
neither.
- Added a standalone test driver directory for building and running
mixed-datatype performance experiments.
- Defined a new variation of castm, castnzm, which operates like castm
except that imaginary values are not touched when casting a real
operand to a complex operand. (By contrast, in these situations castm
sets the imaginary components of the destination matrix to zero.)
- Defined bli_obj_imag_is_zero() and substituted calls in lieu of all
usages of bli_obj_imag_equals() that tested against BLIS_ZERO, and
also simplified the implementation of bli_obj_imag_equals().
- Fixed bad behavior from bli_obj_is_real() and bli_obj_is_complex()
when given BLIS_CONSTANT objects.
- Disabled dt_on_output field in auxinfo_t structure as well as all
accessor functions. Also commented out all usage of accessor
functions within macrokernels. (Typecasting in the microkernel is
still feasible, though probably unrealistic for now given the
additional complexity required.)
- Use void function pointer type (instead of void*) for storing function
pointers in bli_l0_fpa.c.
- Added documentation for using gemm with mixed datatypes in
docs/MixedDatatypes.md and example code in examples/oapi/11gemm_md.c.
- Defined level-1d operation xpbyd and level-1m operation xpbym.
- Added xpbym test module to testsuite.
- Updated frame/include/bli_x86_asm_macros.h with additional macros
(courtesy of Devin Matthews).
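The difference between castm and castnzm described above, for the case of casting a real operand into a complex one, can be sketched on plain arrays. The zpair type and both function names are hypothetical stand-ins chosen for this example; they are not the BLIS types or APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex element type used only for this sketch. */
typedef struct { double real, imag; } zpair;

/* castm-like behavior: the imaginary parts of the destination are set
   to zero when casting real values into a complex destination. */
static void cast_real_to_complex(size_t n, const double *x, zpair *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        y[i].real = x[i];
        y[i].imag = 0.0;
    }
}

/* castnzm-like behavior: only the real parts are overwritten; whatever
   was stored in the imaginary parts is left untouched. */
static void cast_real_to_complex_nz(size_t n, const double *x, zpair *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i].real = x[i];
}
```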
commit 3612ecac98a9d36c3fcd64154121d420bb69febd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 11 15:16:41 2018 -0500
Added comments to nested OpenMP handling code.
Details:
- Added comments to bli_thrcomm_openmp.c relating to changes made in
6ac0c80 and 1064d79.
commit 667d3929ee20e94849b4e25b693b4037b7e3f350
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 11 11:47:57 2018 -0500
Added Fortran APIs for some thread functions.
Details:
- Defined Fortran-77 compatible APIs for bli_thread_set_num_threads()
and bli_thread_set_ways(). These wrappers are defined in
frame/compat/blis/thread/b77_thread.c. Thanks to Kay Dewhurst for
suggesting these new interfaces.
- Added missing prototype for bli_thread_set_ways() in bli_thread.h and
removed prototypes for non-existent functions bli_thread_set_*_nt().
- CREDITS file update.
commit 1064d79711f03a0541b92d8b8b9b7e25e04097a5
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Oct 11 11:14:25 2018 -0500
Adjust rntm_t struct as well.
commit 6ac0c805609b85616ddb32e50101c4f9feb25a35
Author: Devin Matthews <damatthews@smu.edu>
Date: Thu Oct 11 10:45:07 2018 -0500
Fix OMP nesting problem.
Detect when OpenMP uses fewer threads than requested and correct accordingly, so that we don't wait forever for nonexistent threads. Fixes #267.
commit 78a6935483409ae277c766406e175772e820b1de
Author: sraut <Biplab.Raut@amd.com>
Date: Thu Oct 11 10:49:40 2018 +0530
Added comments for the syrk small matrix change.
Change-Id: I958939e9953323730da49ef07d1b10e578837d82
commit 53a9ab1c85be14dcfd2560f5b16e898e3e258797
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 10 15:11:09 2018 -0500
Renamed thread auto-factorization macro constants.
Details:
- Renamed the following C preprocessor macros whose fallback/default
values are specified within frame/include/bli_kernel_macro_defs.h:
BLIS_DEFAULT_MR_THREAD_MAX -> BLIS_THREAD_MAX_IR
BLIS_DEFAULT_NR_THREAD_MAX -> BLIS_THREAD_MAX_JR
BLIS_DEFAULT_M_THREAD_RATIO -> BLIS_THREAD_RATIO_M
BLIS_DEFAULT_N_THREAD_RATIO -> BLIS_THREAD_RATIO_N
- Renamed the above cpp macro overrides within the knl, skx, and zen
sub-configurations, as well as invocations of those macros in
bli_rntm.c.
- Moved config/zen/bli_kernel.h to an 'old' directory as it is no longer
used by any code within BLIS.
commit 637c2ce794b0414ba8b25e9a452f7d64f825d63a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 9 17:18:04 2018 -0500
Updated column index range for irun.py -q.
Details:
- Forgot to apply the column index range fix in 10f179f to situations
when "quiet" mode (-q) is requested. This commit applies the new
column index range modifications to the quiet case.
commit e2a59400bdda7ed7ee0ff00edea70c00ed593b6c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 9 15:29:48 2018 -0500
Allow trsm_l parallelism in the jc loop.
Details:
- Previously, trsm was consolidating all ways of parallelism into the jr
loop. This was unnecessary and to some degree detrimental on some
types of hardware. Now, any parallelism bound for the jc loop will be
applied to the jc loop, while all other loops' parallelism is funneled
to the jr loop. Thanks to Devangi Parikh for helping investigate this
issue and suggesting the fix.
- NOTE: This change affects only left-side trsm. However, right-side
trsm is currently implemented in terms of the left-side
case, and thus the change effectively applies to both left and right
cases.
commit f1dba506c970f14e612580d3c171e7c5ffd0a5fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 8 17:59:41 2018 -0500
Output threading status/params from testsuite.
Details:
- Updated testsuite to output various parameters related to parallelism
in BLIS. These parameters include:
- threading status: disabled, openmp, or pthreads;
- thread partitioning for jr/ir loops: slab or rr (round-robin);
- ways of parallelism from environment variables, and also actual
values used by gemm, herk, trmm_l, trmm_r, trsm_l, and trsm_r for
square problems (assuming all dimensions are set to 1000);
- automatic thread factorization parameters.
- Also output the status of two relatively new configure-time options:
libmemkind and the sandbox.
commit 10f179fb13fc1179921a4ef8efdd2174f01e07da
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 8 14:36:38 2018 -0500
Updated irun.py to use updated column index range.
Details:
- Updated the irun.py script so that it updates the matlab column index
range (if found) to reflect the additional columns of data that are
substituted in. Thanks to Devangi Parikh for recognizing and reporting
this issue.
commit c244a716c97849dee41f52b5f424116aae1b710b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Oct 7 20:59:40 2018 -0500
Added missing -r option to configure --help output.
Details:
- Added inadvertently-omitted mention of -r option-equivalent to
--thread-part-jrir to the output for 'configure --help'. Also made
minor edits to the same text.
commit c92762ecdca1eb0b08c8acd583b4739a1e3fbd39
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Oct 7 20:30:32 2018 -0500
Added option of slab or rr partitioning in jr/ir.
Details:
- Updated existing macrokernel function names and definitions to
explicitly use slab assignment of micropanels to threads, then created
duplicate versions of macrokernels that explicitly use round-robin
assignment instead of slab. NOTE: As in ac18949, trsm_r macrokernels
were not substantially updated in this commit because they are
currently disabled in bli_trsm_front.c.
- Updated existing packing function (in bli_packm_blk_var1.c) to
explicitly use slab partitioning, and then duplicated for round-robin.
- Updated control tree initialization to use the appropriate macrokernel
and packm function pointers depending on which method (slab or rr) was
enabled at configure-time.
- Updated configure script to accept new --thread-part-jrir=[slab|rr]
option (-r [slab|rr] for short), which allows the user to explicitly
request either slab or round-robin assignment (partitioning) of
micropanels to threads.
- Updated sandbox/ref99 according to above changes.
- Minor updates to build/add-copyright.py.
commit 98e01ea04bfe1032e5bd4781043afd84f864a19e
Merge: ac18949a 541b8a3b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 4 20:44:12 2018 -0500
Merge branch 'master' into amd
commit 541b8a3b3e9af4078f5e6fb2f9608d681839952a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 4 20:39:06 2018 -0500
Removed 1h short-circuit from bli_clock_min_diff().
Details:
- Removed a guard from bli_clock_min_diff() that would return 0 if the
time delta was greater than 60 minutes. This was originally intended
to disregard extremely large values under the assumption that the
user probably didn't intend to run a test that long. However, since
it is in bli_clock_min_diff(), it doesn't actually help short-circuit
an implementation that is hanging or looping infinitely, since such
an implementation would first have to finish before the
bli_clock_min_diff() is called. Thanks to Kiran Varaganti for
reporting this issue.
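A minimal sketch of the behavior after this change, assuming a hypothetical helper that receives the elapsed time directly rather than reading a clock. It is not the body of bli_clock_min_diff(); the name min_elapsed and the exact thresholds are illustrative. Samples that are non-positive or below a nanosecond are still treated as clock noise, but there is no longer any upper bound.

```c
#include <assert.h>

/* Hypothetical stand-in for a "running minimum of elapsed time" helper.
   prev_min is the smallest valid time seen so far; elapsed is the newly
   measured time in seconds. */
static double min_elapsed(double prev_min, double elapsed)
{
    assert(prev_min > 0.0);

    /* Discard readings that are zero, negative, or under a nanosecond;
       these suggest the two clock samples were taken too close together. */
    if (elapsed < 1.0e-9)
        return prev_min;

    /* No upper bound: a run longer than an hour is a valid measurement. */
    return elapsed < prev_min ? elapsed : prev_min;
}
```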
commit f0c3ef359f7c6c1687fb2671cb35deb346e00597
Author: Kiran V <Kiran.Varaganti@amd.com>
Date: Thu Oct 4 16:32:21 2018 +0530
This is a fix to floating-point exception error for BLIS SGEMM with larger matrix sizes.
BUG No: CPUPL-197 fixed by Thangaraj Santanu
The bli_clock_min_diff() function in BLIS assumed that if the time taken is greater than 1 hour then the reading must be wrong. However, this is not the case in general, while the other checks, such as a time taken close to zero or under a nanosecond, are of course valid.
gerrit review: http://git.amd.com:8080/#/c/118694/1/frame/base/bli_clock.c
Change-Id: I9dc313d7c5fdc20684f67a516bf3237de3e0694a
commit 8bf30eb4735872388b5317883d99b775a344ce25
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date: Wed Oct 3 22:22:29 2018 -0400
Fixed runme.sh in test/studies/thunderx2
Details:
- Fixed the setting of threads for a single core run.
commit f6f2456ba2afa8f85f43c7c2c90acc439d61d94f
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date: Wed Oct 3 21:43:46 2018 -0400
Fixed the Makefile in test/studies/thunderx2
Details:
- Fixed target for make-all-st and make-all-mt so that the armpl
targets are built
commit 743a1a6dec1bd3908f0f15513b501c9bd59715b3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 3 14:40:10 2018 -0500
Fixed misleading version query from gcc 7+.
Details:
- gcc 7 introduced new behavior to the -dumpversion option whereby only
the major version component is output. However, as part of this
change, gcc 7 also introduced a new option, -dumpfullversion, which is
guaranteed to always output the major, minor, and revision numbers. If
we are using gcc 7 or later, we re-query the version string with this
new option and then re-parse the result so as to avoid misleading
output from configure (e.g. using gcc 7.3.0 is reported as 7.7.7).
commit de07840ba5672b9d7b2ed2b918974e98c3f249fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 3 13:57:25 2018 -0500
Whitespace, https updates to README.md.
Details:
- Reformatted to fit all lines within 80 columns, unless a link is too
long to fit on a single line.
- Changed some links from http to https.
commit 80a8b3dd8034ec8bc03d31be3f9c837c3f6fc94b
Author: sraut <Biplab.Raut@amd.com>
Date: Wed Oct 3 15:30:33 2018 +0530
Review comments incorporated for small TRSM.
Change-Id: Ia64b7b2c0375cc501c2cb0be8a1af93111808cd9
commit b8dfd82e0d1afda4ee5436662d63515a59b2dee3
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Oct 2 15:37:12 2018 -0500
Get pthreads via blis.h in the test driver.
commit d0c0c20b7bd3ecf914b5910a50f618fb7d7aa355
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Oct 2 15:16:00 2018 -0500
There seems to be a problem with _POSIX_BARRIERS on Travis.
commit 0904d9e4df0c8a256ac35c491f14a587ebe9fca2
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Oct 2 15:04:36 2018 -0500
*Always* use Windows primitives instead of pthreads.
commit 998317d309934cd7129f8c818ea6e5f07534ebc8
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Oct 2 14:43:24 2018 -0500
Remove pthreads from appveyor build.
commit 627d0c5bfd4b7b149803587391c93b164c11ced5
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Oct 2 14:40:55 2018 -0500
Combine the alternative barrier implementation for macOS with the pthread wrapper for Windows. Also implement pthread_{create,join} for Windows.
commit 81d2c064a209df7eca7d6103696ca3a137a7f82e
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Oct 2 11:46:36 2018 -0500
Add wrapper for basic pthreads functionality (mutex, once) with MSVC.
commit d33f130ea621fca1dccb30631f454d237918eb04
Author: Devin Matthews <damatthews@smu.edu>
Date: Tue Oct 2 11:45:43 2018 -0500
Some configure changes:
1) Allow environment variables to be set anywhere in the argument list.
2) Allow any environment variable to be set.
3) Allow LIBPTHREAD to be set to null without getting defaulted to -lpthread.
commit 9d5f1c4f3bf70c2c0ea84bfa326a0113ae2d176c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 1 17:39:26 2018 -0500
Patch to avoid gcc warning in blastest/f2c/open.c.
Details:
- Use the modulo operator to limit the size of an integer that is given
to sprintf(). This avoids a warning in some versions of gcc about the
integer potentially overflowing the available space in the string into
which the integer is being printed.
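The technique can be sketched as follows. The buffer size, the "fort.%ld" format, and the function name are assumptions made for this illustration rather than a copy of the code in open.c. Because the value passed to sprintf() is reduced modulo 1000, the compiler can see that at most three digits are printed, so the output provably fits.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a default file name for a unit number into
   a 10-byte buffer and return the number of characters written. */
static int unit_name(char buf[10], long unit)
{
    assert(buf != NULL && unit >= 0);

    /* unit % 1000 lies in [0, 999], so the result needs at most
       5 ("fort.") + 3 (digits) + 1 (terminating NUL) = 9 bytes. */
    return sprintf(buf, "fort.%ld", unit % 1000);
}
```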
commit 0c3cd00ba76de607e807f8deb04b1a2ce18ea7a8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 1 16:18:25 2018 -0500
More README.md updates.
Details:
- Replaced much of "Getting Started" section with a shortened version of
the bullet list of documentation currently shown in the github wiki
page. Thanks to Devangi Parikh for her feedback in this change.
commit 8eaf34bd23b30a1857a50d7142ee9811895f24bf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 1 14:29:07 2018 -0500
Very minor README.md update.
commit 599090e0eb41b2706fa1231fa7b90096f3281678
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 1 14:04:30 2018 -0500
README.md update.
Details:
- Added language mentioning SHPC group to Introduction.
commit ee46fa3efb6e920fa6c3d0b0601007f5de31deb5
Author: sraut <Biplab.Raut@amd.com>
Date: Mon Oct 1 16:30:30 2018 +0530
Small TRSM optimization changes:
1) Single precision small trsm kernels for the XAt=B case are further optimized for performance.
2) Double precision small trsm kernels for the AX=B and XAt=B cases are implemented.
3) Single precision small trsm kernels for AutX=B are implemented in intrinsics to improve the current performance.
Change-Id: Ic9d67ae6d8522615257dde018903f049dcffa2cf
commit 08045a6c52b6e025652c5b18eb120c0f4e61cf6f
Author: sraut <Biplab.Raut@amd.com>
Date: Mon Oct 1 15:38:23 2018 +0530
Corrected the fix made for blastest level-3 failure to check m,n,k non-zero condition in bli_gemm_small.c
Change-Id: Idaf9f2327c3127b04a2738ae8a058b83d6c57934
commit ac18949a4b9613741b9ea8e5026d8083acef6fe4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Sep 30 18:54:56 2018 -0500
Multithreading optimizations for l3 macrokernels.
Details:
- Adjusted the method by which micropanels are assigned to threads in
the 2nd (jr) and 1st (ir) loops around the microkernel to (mostly)
employ contiguous "slab" partitioning rather than interleaved (round
robin) partitioning. The new partitioning schemes and related details
for specific families of operations are listed below:
- gemm: slab partitioning.
- herk: slab partitioning for region corresponding to non-triangular
region of C; round robin partitioning for triangular region.
- trmm: slab partitioning for region corresponding to non-triangular
region of B; round robin partitioning for triangular region.
(NOTE: This affects both left- and right-side macrokernels:
trmm_ll, trmm_lu, trmm_rl, trmm_ru.)
- trsm: slab partitioning.
(NOTE: This affects only left-side macrokernels trsm_ll,
trsm_lu; right-side macrokernels were not touched.)
Also note that the previous macrokernels were preserved inside of
the 'other' directory of each operation family directory (e.g.
frame/3/gemm/other, frame/3/herk/other, etc).
- Updated gemm macrokernel in sandbox/ref99 in light of above changes
and fixed a stale function pointer type in blx_gemm_int.c
(gemm_voft -> gemm_var_oft).
- Added standalone test drivers in test/3m4m for herk, trmm, and trsm
and minor changes to test/3m4m/Makefile.
- Updated the arguments and definitions of bli_*_get_next_[ab]_upanel()
and bli_trmm_?_?r_my_iter() macros defined in bli_l3_thrinfo.h.
- Renamed bli_thread_get_range*() APIs to bli_thread_range*().
commit b952ca8feb6f17f71a4512649c2aa72bdee9c8f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 28 16:12:32 2018 -0500
CREDITS file update.
commit 7d96fc437ebaa9dd2d7071865b5df16402fadd64
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 28 15:40:45 2018 -0500
Allow slashes ('/') in version tags.
Details:
- Updated the configure script to allow slashes in version string. This
is needed so that downstream maintainers (such as those for Debian)
can create local tags such as "upstream/0.4.1". Thanks to M. Zhou for
reporting this issue via PR #256 and providing me the information
needed to debug the problem.
commit 5fdddf6f37c64da093c7f59e3a85214e819ae652
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 28 11:25:54 2018 -0500
Removed 'debian' directory.
Details:
- Removed the top-level 'debian' directory. This directory is apparently
no longer needed (issue #257). Thanks to M. Zhou and Nico Schlömer for
their contributions.
commit 9814cfdf3157ef4726ee604fc895d56e8063d765
Author: Meghana <meghana.vankadari@amd.com>
Date: Fri Sep 28 11:02:39 2018 +0530
fixed blastest level-3 failure by adding ((M&N&K) != 0) to check condition in bli_gemm_small.c
Change-Id: I85e4a32996ebb880f3c00bd293edc38f74700fe6
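A note on the expression quoted above: a bitwise AND of three dimensions is nonzero only when all three share at least one set bit, which is not the same as all three being nonzero. The sketch below contrasts the two tests; the function names are hypothetical, and it makes no claim about the exact form of the later correction recorded in 08045a6c.

```c
#include <assert.h>
#include <stdbool.h>

/* The test as written in the commit title: true only if m, n, and k
   have at least one bit position in common. */
static bool dims_share_a_bit(long m, long n, long k)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    return (m & n & k) != 0;
}

/* A test for "all dimensions are nonzero". */
static bool dims_all_nonzero(long m, long n, long k)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    return m != 0 && n != 0 && k != 0;
}
```

For m = 1, n = 2, k = 4, every dimension is nonzero, yet 1 & 2 & 4 is 0, so the bitwise form reports false while the logical form reports true.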
commit 86330953b14c180862deef3ccdcc6431259be27b
Merge: 7af5283d 807a6548
Author: praveeng <praveen.g@amd.com>
Date: Fri Sep 28 10:08:06 2018 +0530
Resolved conflicts and modified bli_trsm_small.c
Change-Id: I578d419cff658003e0fdd4c4cdc93145d951ce31
commit 60b2650d7406d266feffe232c2d5692a9e3886d0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 24 15:04:45 2018 -0500
Added statistics-collecting irun.py script.
Details:
- Added irun.py script to 'build' directory. This irun.py script is a
python script for repeatedly invoking a test driver executable, such
as those found in test/3m4m, and replacing the performance output column
with four columns that aggregate statistics. Specifically, the script
reports the minimum, average, maximum, and standard deviation for each
problem size. This script is useful especially (though not
exclusively) when trying to determine the impact of relatively minor
changes to the code, or other small optimizations that may be
difficult to distinguish from "noise." One way this "noise" manifests
is that a test executable may run slightly slower or faster for all
problem sizes (and all implementations) tested by the executable over
the life of a single execution. The cause of these minor
across-the-board perturbations in the overall performance signatures is
unknown, though we hypothesize that it may relate to any number of
issues such as operating system scheduling, where in memory the
program is loaded, or how the CPU clock frequency is throttled at the
time of execution. Regardless of the source of these subtle
performance anomalies, the statistical properties reported by the
irun.py script help the user to more precisely characterize the
underlying performance exhibited by any given test driver, which
allows him or her to make better judgments about the true difference
in performance between two implementations, or minor changes within a
single implementation.
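The aggregation described above can be sketched in C for a single problem size. This is an illustration only, not a translation of irun.py: the run_stats type and summarize() are hypothetical names, and the population form of the variance (dividing by n) is an assumption. The standard deviation is the square root of the var field, for example via sqrt() from <math.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of repeated performance measurements for one problem size. */
typedef struct { double min, max, mean, var; } run_stats;

/* Compute the minimum, maximum, mean, and population variance of n
   measurements, one per invocation of the test driver. */
static run_stats summarize(const double *perf, size_t n)
{
    assert(perf != NULL && n > 0);

    run_stats s = { perf[0], perf[0], 0.0, 0.0 };
    double sum = 0.0;

    for (size_t i = 0; i < n; ++i) {
        if (perf[i] < s.min) s.min = perf[i];
        if (perf[i] > s.max) s.max = perf[i];
        sum += perf[i];
    }
    s.mean = sum / (double)n;

    /* Second pass: accumulate squared deviations from the mean. */
    double sq = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = perf[i] - s.mean;
        sq += d * d;
    }
    s.var = sq / (double)n;

    return s;
}
```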
commit 807a654888117fb3a27ea36384f1c1c11b882cd5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 20 15:41:05 2018 -0500
Fixed confusing configure message for libmemkind.
Details:
- Corrected feedback echoed to user by configure when libmemkind is
found but not explicitly requested. In these cases, configure would
echo a message that it had received an explicit request to enable
libmemkind, which was not accurate, even if the end result was the
same--that libmemkind is enabled by default when it is found. Thanks
to Devangi Parikh for reporting this issue.
commit 02adab427c779b0aaf38a5877a5f0246b1909e8f
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date: Thu Sep 20 14:38:50 2018 -0400
Created a 'thunderx2' subdirectory within test/studies
Details:
- Created a 'thunderx2' subdirectory within test/studies to house
various level-3 test drivers used to measure performance on
ThunderX2.
commit d7537fb51dac0636591fc7c68261a2322642ab3c
Merge: dad07245 c03728f1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 12 15:24:20 2018 -0500
Merge branch 'dev'
commit dad07245dbcfaf35232ec379ba756eb133c361c1
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date: Wed Sep 12 04:16:58 2018 -0500
Fixed yet another bug in runme script in test/studies
Details:
- Fixed another copy-paste bug
commit e669057fe35f2037d8111af687d84a0ecf6d7a2a
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date: Tue Sep 11 22:29:42 2018 -0500
Fixed bug in runme script in test/studies
Details:
- Fixed bug in runme script for skx studies that set the number of
threads incorrectly
commit 232fdc3df3e01ae3f86d53767bd14eb93b511e6e
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date: Mon Sep 10 18:45:50 2018 -0500
Updated runme script in test/studies.
Details:
- Updated runme script for skx studies to run multithreading tests
on 1 and 2 sockets.
commit c03728f1f45edb5e434db90ab8a77ba0184a682b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 10 17:54:27 2018 -0500
Various minor cleanups.
Details:
- Rewrote bli_winsys.c to define bli_setenv() and bli_sleep()
unconditionally, but differently for Windows and non-Windows, but
then disabled the definition of bli_setenv() entirely since BLIS
no longer needs to set environment variables. Updated bli_winsys.h
accordingly, and call bli_sleep() from within testsuite instead of
sleep() directly.
- Use
#if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS != 200809L)
instead of
#if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS < 0)
when guarding against local definition of pthread barrier in
testsuite. (The description for unistd.h implies that _POSIX_BARRIERS
should always be set to 200809L when barriers are supported, though I
won't be surprised if we encounter a case in the future where it is
set to something else such as 1 while still supported.)
- Removed old _VERS_CONF_INST definitions and installation rules in
top-level Makefile. These are no longer needed because we no longer
output libraries with the version and configuration name as
substrings.
- Comment/whitespace updates in Makefile, config.mk.in, common.mk,
configure, bli_extern_defs.h, and test_libblis.h.
- Added mention of 1m to README.md and other trivial tweaks.
commit e249a00a82908054ecd307cf602c8801275903e8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 10 16:48:35 2018 -0500
Imported skx dgemm ukernel from skx-redux branch.
Details:
- Added the new bli_dgemm_skx_asm_16x14.c microkernel from the skx-redux
branch, along with appropriate blocksizes in bli_cntx_init_skx.c and
a prototype in bli_kernels_skx.h. (Devin has not yet written the
sgemm analogue, so for now we will continue using the older sgemm
ukernel.)
- Updated frame/include/bli_x86_asm_macros.h with a minor change that
was present within the skx-redux branch.
commit e93b01ff60bf9742baa5eefd93e208d1219e7a43
Author: Isuru Fernando <isuruf@gmail.com>
Date: Sun Sep 9 15:57:43 2018 -0500
Windows DLL support (#246)
* Enable shared
* Enable rdp
* Add support for dll
* Use libblis-symbols.def
* Fix building dlls
* Fix libblis-symbols.def
* Fix soname
* Fix Makefile error
* Fix install target
* Fix missing symbols
* Add BLIS_MINUS_TWO
* Add path to dll
* Fix OSX soname
* Add declspec for dll
* Add -DBLIS_BUILD_DLL
* Replace @enable_shared@ in config
* switch to auto for now
* blis_ -> bli_
* Remove BLIS_BUILD_DLL in make check
* change auto->haswell
* enable_shared_01
* Add wno-macro-redefined
* print out.cblat3
* BLIS_BUILD_DLL -> BLIS_IS_BUILDING_LIBRARY
* Use V=1
* Remove fpic for windows
* Remember LIBPTHREAD
* Remove libm for windows
* Remember AR
* Fix remembering libpthread
* Add Wno-maybe-uninitialized in only gcc
* Don't do blastest for shared for now
* Fix install target
And remove unnecessary change
* test auto and x86_64
* Fix install target again
* Use IS_WIN variable
* Remove leading dot from LIBBLIS_SO_MAJ_EXT
* Make is_win yes/no
* Add comments for windows builds
* Change if else blocks location
commit 1330d5c4bc3b644ec0af54c3939a5b9f00eacd9c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 7 19:37:59 2018 -0500
Employ "user" cflags for tl Makefile test targets.
Details:
- Use get-user-cflags-for() to generate cflags when compiling BLAS test
drivers and BLIS testsuite from top-level Makefile. Meant to include
these changes in previous commit (4b5437e). Thanks to Isuru Fernando
for pointing out this oversight.
commit 4b5437ec7afb2befffffbb83f7872bcb4fc61e51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 7 17:24:32 2018 -0500
Define a cpp macro specific to BLIS compilation.
Details:
- Tweaked the cflags functions in common.mk so that a new preprocessor
macro, BLIS_IS_BUILDING_LIBRARY, is defined, but only when BLIS
itself is being built. This macro will not be defined when, for
example, the testsuite or example code compiles code local to those
applications. This was done in part by defining a new cflags function
get-user-cflags-for(), which is now the designated function for
application Makefiles if they wish to inherit a basic set of CFLAGS
from BLIS. (The compiler flags returned are identical to those of
get-frame-cflags-for() except that -DBLIS_IS_BUILDING_LIBRARY is
omitted.)
- Updated all test driver-like makefiles to call get-user-cflags-for()
instead of get-frame-cflags-for().
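A minimal sketch of what the macro makes observable from C code. The function name compiled_as_library() is hypothetical; only the macro name BLIS_IS_BUILDING_LIBRARY comes from the commit above.

```c
/* Returns 1 when this translation unit was compiled with
   -DBLIS_IS_BUILDING_LIBRARY (as get-frame-cflags-for() would arrange),
   and 0 otherwise (as with get-user-cflags-for()). */
int compiled_as_library(void)
{
#ifdef BLIS_IS_BUILDING_LIBRARY
    return 1;
#else
    return 0;
#endif
}
```

Application code such as the testsuite, built with the "user" flags, would therefore take the second branch.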
commit cc2cca4f56eb30212a0dce3e5c121e64d9e59560
Merge: e19e7212 fb81c7fc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 6 17:12:13 2018 -0500
Merge branch 'dev'
commit e19e7212872da3d464734199193436faa51f0da0
Merge: 97965b09 b3d0702c
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Thu Sep 6 14:58:49 2018 -0700
Merge pull request #244 from kali/pthread-barrier-osx
add an adhoc impl for pthread_barrier
commit b3d0702cf2ef6dda19a23dd8a677be1b6f73c322
Merge: 4e7d0670 97965b09
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Thu Sep 6 14:58:23 2018 -0700
Merge branch 'master' into pthread-barrier-osx
commit 4e7d06700f176a62952d7d51e41fdcbc6b7a9d5f
Author: Mathieu Poumeyrol <kali@zoy.org>
Date: Thu Sep 6 23:48:31 2018 +0200
second __APPLE__
commit fb81c7fc665d68e6a2add163feb29acc0bce8936
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 6 16:29:39 2018 -0500
Defined cortexa53 sub-configuration.
Details:
- Added a new sub-configuration 'cortexa53', which is a mirror image
of cortexa57 except that it will use slightly different compiler
flags. Thanks to Mathieu Poumeyrol for making this suggestion after
discovering that the compiler flags being used by cortexa57 were
not working properly in certain OS X environments (the fix to which
is currently pending in pull request #245).
commit 24ecc0d94aaa9ab4df1ae6d199c4ec6d7783169f
Author: Mathieu Poumeyrol <kali@zoy.org>
Date: Thu Sep 6 22:10:16 2018 +0200
use _POSIX_BARRIERS instead of __APPLE__
commit 97965b09059a610db06fb7a22bdfa79c0d37d673
Author: Mathieu Poumeyrol <kali@users.noreply.github.com>
Date: Thu Sep 6 21:10:29 2018 +0200
cortexa9 and cortexa53 travis build + qemu test (#245)
commit a6802eab7d94b5a9de633c53beca8245b74f5dc6
Author: Mathieu Poumeyrol <kali@zoy.org>
Date: Thu Sep 6 17:16:35 2018 +0200
reinstantiate test on macos
commit d688a2b7e5a19cba44ea398a99e325e19b8fce50
Author: Mathieu Poumeyrol <kali@zoy.org>
Date: Thu Sep 6 15:25:16 2018 +0200
add an adhoc impl for pthread_barrier
commit ab9f9e684dc3ffbb70cc45b21c67af5d916919e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 30 15:14:02 2018 -0500
CHANGELOG update (0.4.1)
commit 10fd614031307c46db3d893528d4e5fc31f490b3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 30 15:13:59 2018 -0500
Version file update (0.4.1)
commit 08dd67c4b21244851f8416bd59159bea7a9c5b3d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 30 15:12:13 2018 -0500
ReleaseNotes.md update in advance of next version.
commit 4fa4cb0734e7de6505b5d6f1aeef3a5d5c89dcbb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 29 18:06:41 2018 -0500
Trivial comment header updates.
Details:
- Removed four trailing spaces after "BLIS" that occur in most files'
commented-out license headers.
- Added UT copyright lines to some files. (These files previously had
only AMD copyright lines but were contributed to by both UT and AMD.)
- In some files' copyright lines, expanded 'The University of Texas' to
'The University of Texas at Austin'.
- Fixed various typos/misspellings in some license headers.
commit b051ffb815baf6c3ece2b5118b679fd9219d5780
Merge: 6f33d9de aaa549f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 29 17:06:48 2018 -0500
Merge branch 'dev'
commit 6f33d9de21fbc2f579846b9104fb9d513753f79c
Author: Mathieu Poumeyrol <kali@users.noreply.github.com>
Date: Wed Aug 29 23:48:22 2018 +0200
fix compilation of armv7a kernels (#242)
commit 8199e339aefdd27019c7f3d8c99818d375d5400b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 27 07:00:12 2018 -0500
Added testsuite threading to input.general.fast.
Details:
- Added lines associated with the testsuite's new threading option to
input.general.fast. This change was intended for the previous commit
(10d0735).
commit 10d07357afbb2d468837aa97369ef9a6d0610817
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 26 20:34:30 2018 -0500
Better thread safety; added threading to testsuite.
Details:
- Replaced critical sections that were conditional upon multithreading
being enabled (via pthreads or OpenMP) with unconditional use of
pthreads mutexes. (Why pthreads? Because BLIS already requires it
for its initialization mechanism: pthread_once().) This was done in
bli_error.c, bli_gks.c, bli_l3_ind.c. Also, replaced usage of BLIS's
mtx_t object and bli_mutex_*() API with pthread mutexes in
bli_thread.c. The previous status quo could result in a race condition
if the application called BLIS from more than one thread. The new
pthread-based code should be completely agnostic to the application's
threading configuration. Thanks to AMD for bringing to our attention
the need for a thread-safety review.
- Added an option to the testsuite to simulate application-level
multithreading. Specifically, each thread maintains a counter that is
incremented after each experiment. The thread only executes the
experiment if: counter % n_threads == thread_id. In other words, the
threads simply take turns executing each problem experiment. Also,
POSIX guarantees that fprintf() will not intermingle output, so
output was switched to fprintf() instead of libblis_test_fprintf().
- Changed membrk_t objects to use pthread_mutex_t instead of mtx_t and
replaced use of bli_mutex_init()/_finalize() in bli_membrk.c with
wrappers to pthread_mutex_init()/_destroy().
- Changed the implementation of bli_l3_ind_oper_enable_only() to fix
a race condition; specifically, two threads calling the function with
the same parameters could lead to a non-deterministic outcome.
- Added #include <pthread.h> to bli_cpuid.c and moved the same in
bli_arch.c.
- Added 'const' to declaration of OPT_MARKER in bli_getopt.c.
- Added #include <pthread.h> to bli_system.h.
- Added add-copyright.py script to automate adding new copyright lines
to (and updating existing lines of) source files.
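The turn-taking rule used by the testsuite threading option (counter % n_threads == thread_id) can be illustrated with a small sketch. The function names is_my_turn() and count_turns() are hypothetical and are not taken from the testsuite source.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if the thread with the given id should
   execute the experiment numbered by counter, and 0 otherwise. */
int is_my_turn(unsigned counter, unsigned n_threads, unsigned thread_id)
{
    assert(n_threads > 0 && thread_id < n_threads);
    return counter % n_threads == thread_id;
}

/* Counts how many of the first n_experiments one thread would execute. */
unsigned count_turns(unsigned n_experiments, unsigned n_threads,
                     unsigned thread_id)
{
    unsigned counter, turns = 0;

    for (counter = 0; counter < n_experiments; ++counter)
        if (is_my_turn(counter, n_threads, thread_id))
            ++turns;

    return turns;
}
```

With 4 threads and 10 experiments, thread 0 would run experiments 0, 4, and 8, while thread 3 would run experiments 3 and 7.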
commit aaa549f4d1e63929fe2bea023ce849253cfbbb42
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 26 20:13:51 2018 -0500
Minor update to configure --help (--sharedir option).
Details:
- Fixed/tweaked description for --sharedir=SHAREDIR option.
commit 573b8ac373f821a65cc8afd51cdbe03b8ec01081
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 26 13:51:32 2018 -0500
Fixed copy-paste typo in previous commit.
Details:
- Fixed a typo in travis/do_testsuite.sh introduced in 62ea1d3.
commit 62ea1d33d3bc1e890420a1e828b9d0e87e87533b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 26 13:35:53 2018 -0500
Fixed broken out-of-tree builds.
Details:
- Fixed stale filepaths to check-blastest.sh and check-blistest.sh in
travis/do_testsuite.sh and travis/do_sde.sh.
- Create a symbolic link to the 'config' directory so that the top-level
Makefile can find the configs' make_defs.mk files during out-of-tree
builds.
- Added additional case handling to out-of-tree scenario to handle
situations where files 'Makefile', 'common.mk', or 'config' exist but
are not symbolic links. In such cases, configure warns the user and
exits.
- Homogenized various error messages throughout configure.
- Belated thanks to Victor Eijkhout for requesting the feature added
in 0f491e9 whereby lesser Makefiles can compile and link against
an existing installation of BLIS.
commit 0f491e994a7e14d4dfce26e6a51dba2bccad29a3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Aug 25 20:12:36 2018 -0500
Allow lesser Makefiles to reference installed BLIS.
Details:
- Updated the build system so that "lesser" Makefiles, such as those
belonging to example code or the testsuite, may be run even if the
directory is orphaned from the original build tree. This allows a
user to configure, compile, and install BLIS, delete the build tree
(that is, the source distribution, or the build directory for out-
of-tree builds) and then compile example or testsuite code and link
against the installed copy of BLIS (provided the example or testsuite
directory was preserved or obtained from another source). The only
requirement is that make be invoked while setting the
BLIS_INSTALL_PATH variable to the same installation prefix used when
BLIS was configured. The easiest syntax is:
make BLIS_INSTALL_PATH=/install/prefix
though it's also permissible to set BLIS_INSTALL_PATH as an
environment variable prior to running 'make'.
- Updated all lesser Makefiles to implement the new aforementioned build
behavior.
- Relocated check-blastest.sh and check-blistest.sh from build to
blastest and testsuite, respectively, so that if those directories are
copied elsewhere the user can still run 'make check' locally.
- Updated docs/Testsuite.md with language that mentions this new option
of building/linking against an installed copy of BLIS.
commit 36ff92ce0d3b428b15b6cddc6f5944afe22e43ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 24 18:26:09 2018 -0500
Missing C++ compiler no longer fatal to configure.
Details:
- Changed configure so that the absence of any C++ compiler from the
pre-defined search list does not result in an exit. Instead, in this
situation, the found_cxx variable is assigned 'c++notfound' and the
error message is changed to remind the user that C++ will not be
available in the sandbox. Thanks to Devangi Parikh for reporting this
issue.
- Also tweaked the message when a C++ compiler *is* found to remind any
would-be confused user that BLIS will only use C++ if it is needed by
code in the sandbox.
commit 658f0a129bdc565b072696b6ebddce501132091c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 24 17:49:37 2018 -0500
Fixed obscure integer size bug in va_arg() usage.
Details:
- Fixed a bug in the way that the variadic bli_cntx_set_l3_nat_ukrs()
function was defined. This function is meant to take a microkernel id,
microkernel datatype, microkernel address, and microkernel preference
as arguments, and is typically called within the bli_cntx_init_*()
function defined within a sub-configuration for initializing an
appropriate context. The problem is with the final argument: the
microkernel preference. These preferences are actually boolean values,
0 or 1 (encoded as FALSE or TRUE). Since the variadic function does
not give the compiler any type information for any variadic arguments,
they are "promoted" in the course of internal (macroized) processing
according to default argument promotion rules. Thus, integer literals
such as 0 and 1 become int and floating-point literals (such as 0.0 or
1.0) become double. Previous to this commit, we indicated to va_arg()
that the ukernel preference was a 'bool_t', which is a typedef of
int64_t on 64-bit systems. On systems where int is defined as 64 bits,
no problems manifest since int is the same size as the type we passed
in to va_arg(), but on systems where int is 32 bits, the ukernel
preference could be misinterpreted as a garbage value. (This was
observed on a modern armv8 system.) The fix was to interpret the
bool_t value as int and then immediately typecast it to and store it
as a bool_t. Special thanks to Devangi Parikh for helping track down
this issue, including deciphering the use of va_arg() and its
byzantine treatment of types.
- Added explicit typecasts for all invocations of va_arg() in
bli_cntx.c.
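The va_arg() fix described above can be sketched as follows. This is an illustration under stated assumptions, not the code in bli_cntx.c: my_bool_t stands in for a 64-bit boolean typedef, and count_true_prefs() is a hypothetical function.

```c
#include <stdarg.h>
#include <stdint.h>

typedef int64_t my_bool_t;  /* stand-in for a 64-bit boolean typedef */

/* Reads n variadic preference flags and returns how many are nonzero.
   Callers are expected to pass plain int values such as 0 or 1. */
int count_true_prefs(int n, ...)
{
    va_list args;
    int     i, count = 0;

    va_start(args, n);

    for (i = 0; i < n; ++i)
    {
        /* Read the argument using the type it was actually passed as
           (int), then cast to the wider type. Requesting my_bool_t from
           va_arg() here would be undefined behavior wherever int is
           narrower than 64 bits. */
        my_bool_t pref = (my_bool_t)va_arg(args, int);

        if (pref) ++count;
    }

    va_end(args);

    return count;
}
```

A call such as count_true_prefs(3, 1, 0, 1) passes three int arguments, which is why the read side must also use int.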
commit e71dc389120b032e42091e4d1a928515ed6f7275
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 24 15:56:04 2018 -0500
Fixed a very minor memory leak in gks.
Details:
- Fixed a memory leak in the global kernel structure that resulted in 56
bytes per configured architecture (of which only 18 are presently
supported by BLIS). The leak would only manifest if BLIS was
initialized and then finalized before the application terminated.
Thanks to Devangi Parikh for helping track down this leak.
commit a7e3a5f9753468c8e665e6c5c3b38d22b7c92500
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 24 14:51:11 2018 -0500
Fixed uncallable bli_finalize().
Details:
- Previously, bli_finalize_once()--which, like bli_init_once(), was
implemented in terms of pthread_once()--was using the same
pthread_once_t control object being used by bli_init(), thus
guaranteeing that it would never be called as long as BLIS had already
been initialized. This could manifest as a rather large memory leak to
any application that attempted to finalize BLIS midway through its
execution (since BLIS reserves several megabytes of storage for
packing buffers per thread used). The fix entailed giving each
function its own pthread_once_t object. Thanks to Devangi Parikh for
helping track down this very quiet bug.
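A minimal sketch of the corrected pattern, in which each once-function has its own pthread_once_t control object. All names below (lib_init_once(), lib_finalize_once(), and the counters) are hypothetical and only illustrate the idea.

```c
#include <pthread.h>

/* One control object per function. Sharing a single object would mean
   that whichever function ran first would prevent the other from ever
   running. */
static pthread_once_t init_once_ctl     = PTHREAD_ONCE_INIT;
static pthread_once_t finalize_once_ctl = PTHREAD_ONCE_INIT;

static int n_init_calls     = 0;
static int n_finalize_calls = 0;

static void do_init(void)     { ++n_init_calls; }
static void do_finalize(void) { ++n_finalize_calls; }

void lib_init_once(void)     { pthread_once(&init_once_ctl, do_init); }
void lib_finalize_once(void) { pthread_once(&finalize_once_ctl, do_finalize); }

/* Accessors so that a caller can observe how often each routine ran. */
int lib_init_calls(void)     { return n_init_calls; }
int lib_finalize_calls(void) { return n_finalize_calls; }
```

If both wrappers passed &init_once_ctl, the call to lib_finalize_once() after initialization would return without running do_finalize(), which is the behavior the commit describes.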
commit a79c21c7c17fb4854fd24c73b81ec5543f74082d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 23 14:40:46 2018 -0500
Fixed cleanmk target post-1b0f8d6.
Details:
- Changed the cleanmk target to delete makefile fragments from their new
home in obj/$(CONFIG_NAME). The old definition worked only because of
a typo (REFERKN_PATH instead of REFKERN_PATH), and only in the
non-verbose (V != 1) case.
commit ffb57242f3eb1175c991fe1b492595fdaa175c27
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 22 18:22:41 2018 -0500
Cosmetic output changes to configure.
Details:
- Disable sandbox-related obj directory creation, directory mirroring,
and makefile fragment generation when a sandbox is not enabled.
- Prevent various duplicate actions by configure (such as those
mentioned above for sandboxes).
commit ac17454aae9ad430f05aa7c156919c6c695c300c
Merge: a77bec76 7afd095a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 22 15:34:53 2018 -0500
Merge branch 'master' into dev
commit a77bec766a01e42f13f8cacbec8c4cbde8ecefef
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 22 15:31:29 2018 -0500
Whitespace changes, minor renames in build system.
Details:
- Minor whitespace cleanup, mostly in the form of spaces -> tabs.
- Shortened certain variables' _FRAGMENT_ infixes to _FRAG_ in
common.mk.
commit 1b0f8d60d1132b56485cc202ebf1246898d3a2a4
Author: Devin Matthews <damatthews@smu.edu>
Date: Wed Aug 22 13:19:29 2018 -0700
Generate makefile fragments in build tree (#240)
* Make src dir read-only in out-of-tree build test.
* Generate makefile fragments in the build tree.
commit 7afd095af33690e0175903852b354c9fe46993f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 22 14:58:24 2018 -0500
Removed skx from code snippet in previous commit.
Details:
- The docs/ConfigurationHowTo.md document was written with examples that
did not yet contain the skx sub-configuration, but the previous commit
included bli_arch.c code copied and pasted from a recent commit that
does support skx. To keep things consistent, I've removed skx from the
recently-added ConfigurationHowTo.md code snippet.
commit 48211a980d78673133076e8eced1007b1980f5e6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 22 14:55:02 2018 -0500
Update to docs/ConfigurationHowTo.md.
Details:
- Added missing language directing the reader to modify the config_name
string array in bli_arch.c when adding a new sub-configuration. Thanks
to Devangi Parikh for reporting this missing section.
commit 65c9096c6e21f3dc2947fa12be9ea3034f8662dc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 17 11:44:12 2018 -0500
Fixed broken -p option to configure.
Details:
- Fixed some stale code that was preventing the -p option to configure
from working as expected (though the --prefix option was unaffected).
This bug was most likely introduced in 7e5648c (May 7 2018).
Thanks to Dave Love for reporting this issue.
commit e358d5e497c77b305af462f44266370a596445e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 16 12:18:45 2018 -0500
README.md update (Funding section).
commit a61dd5e7bcf23f7237d407a5e06dd44e1bec9ad0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 14 17:08:03 2018 -0500
Changed 'test' target to be more like 'check'.
Details:
- Redefined the 'test' make target in the top-level Makefile so that the
final result ("everything passed" or "at least one failure") is echoed
to stdout. Note that 'check' is unchanged, and thus is now effectively
a fast version of 'test'.
- Updated docs/BuildSystem.md to reflect the above change.
commit ce5c3a198a7ae1ca676c27da4541d51ed19d16e1
Merge: 4f6745d6 0bbe69d5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 14 16:52:19 2018 -0500
Merge branch 'master' of github.com:flame/blis
commit 4f6745d68a2c66511695eff0beb00a82ffc6bbbe
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 14 16:50:47 2018 -0500
Fixed link error when building only shared library.
Details:
- Fixed a linker error that occurred when attempting to compile and link
the testsuite and/or BLAS test drivers after having configured BLIS to
only generate a shared library (no static library). The chosen
solution involved
(1) adding the local library path, $(BASE_LIB_PATH), to the search
paths for the shared library via the link option
-Wl,-rpath,$(BASE_LIB_PATH).
(2) adding a local symlink to $(BASE_LIB_PATH) that uses the .so major
version number so that ld would find the shared library at
execution time.
Thanks to Sajid Ali for reporting this issue, to Devin Matthews for
pointing out the need for the -rpath option, and to Devangi Parikh for
helping Sajid isolate the problem.
- Added #include <ctype.h> to bli_system.h to avoid a compiler warning
resulting from using toupper() from bli_string.c without a prototype.
Thanks again to Sajid Ali, whose build log revealed this compiler
warning.
- Added '*.so.*' to .gitignore.
- CREDITS file update.
commit 0bbe69d5ed260849297d8f2d35b7668d167482ed
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date: Tue Aug 14 14:49:58 2018 -0500
Updated plotting scripts in test/studies.
Details:
- Fixed indexing on plots to correspond to the removal of dtime in
the test drivers.
commit e93e0e149e087e08eca2885f1a748a4e88ffe55d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 7 15:54:30 2018 -0500
Removed redefinition of axpyv, scal2v func types.
Details:
- Removed a stray/accidental redefinition of axpyv and scal2v function
types in frame/1d/bli_l1d_ft.h (probably a copy/paste leftover during
development).
commit 1deb33bd16349aaa643694d1bd685ff8a9a5f476
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 7 15:02:50 2018 -0500
Updated penryn kernels to use new _ker_ft type names.
Details:
- Updated older _ft kernel type suffixes used within penryn level-1v
and -1f kernels to use the newer _ker_ft suffix that was introduced
in 0175483. (Thank you Travis CI.)
commit 9cb0b023ca91abdc056d726cdc070062e4954611
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 7 14:21:07 2018 -0500
INSTALL file update.
commit 017548314f3f78f66fbe3264509ac5302bd8d62b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 7 14:13:25 2018 -0500
Replaced function chooser macros w/ func ptr arrays.
Details:
- Previously, most object API functions (_oapi.c) used a function
chooser macro that would expand out to an if-elseif-elseif-else
conditional that used a num_t datatype to call the appropriate
type-specific API (_tapi.c). This always felt a little hackish, and
would get in the way somewhat of adding support for new num_t datatypes
in the future. So, I've replaced that functionality with code that
queries a function pointer that is then typecast appropriately. This
model of function calling was already pervasive for kernels queried
from the cntx_t structure. It was also already in use in various other
functions, such as macrokernels, and this commit simply extends that
pattern.
- The above change required many new files, mostly header files, that
define the function types (mostly _ft.h) for the queriable functions
as well as some source files to define the function pointer arrays and
their corresponding query functions (_fpa.c). Various other function
types, mostly for kernel function types, were renamed to reduce the
potential for confusion with the function types for expert and basic
(non-expert) typed API functions.
- Removed definitions for all of the "bli_call_ft_*()" function chooser
macros from bli_misc_macro_defs.h.
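The shift from a chooser macro to a queried function pointer can be sketched as below. This is a simplified illustration under assumed names (dt_t, scalv_ft, scalv_fpa, scalv_query(), and the typed functions are all hypothetical), not the actual _ft.h/_fpa.c code.

```c
typedef enum { DT_FLOAT = 0, DT_DOUBLE = 1, DT_COUNT } dt_t;

/* Function type shared by the typed implementations. */
typedef void (*scalv_ft)(int n, const void* alpha, void* x);

/* Generic function pointer type stored in the array. */
typedef void (*generic_fp)(void);

static void scalv_float(int n, const void* alpha, void* x)
{
    const float a  = *(const float*)alpha;
    float*      xf = (float*)x;
    int         i;

    for (i = 0; i < n; ++i) xf[i] *= a;
}

static void scalv_double(int n, const void* alpha, void* x)
{
    const double a  = *(const double*)alpha;
    double*      xd = (double*)x;
    int          i;

    for (i = 0; i < n; ++i) xd[i] *= a;
}

/* Function pointer array indexed by datatype. */
static generic_fp scalv_fpa[DT_COUNT] =
{
    (generic_fp)scalv_float,
    (generic_fp)scalv_double
};

/* Query function: returns the generic pointer for a datatype. */
generic_fp scalv_query(dt_t dt) { return scalv_fpa[dt]; }

/* Datatype-agnostic entry point: query the pointer, typecast it back to
   its real function type, and call it. No if-else chain is needed. */
void scalv(dt_t dt, int n, const void* alpha, void* x)
{
    scalv_ft f = (scalv_ft)scalv_query(dt);

    f(n, alpha, x);
}
```

Under this scheme, supporting a new datatype means adding an enum value and one array entry rather than extending every conditional.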
commit addce089664561f9f63efa6f107e58fc48d29871
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 6 13:18:20 2018 -0500
Format spec and other updates in test, test/3m4m.
Details:
- Removed the dtime (delta time, or wallclock time) column from the
matlab output of all test drivers in test, test/3m4m, test/studies.
This value was rarely (if ever) really needed and usually only served
to take up screen space.
- Updated format specifier in test/studies/skx to use %7.2f instead of
%6.3f.
- For the test drivers in 'test' directory, added an initial line of
output that sets last entry of matlab matrix to zero in order to
induce a pre-allocation of the entire array of performance results.
commit 94d5ef42c833a4d43e50a80d46dddbd7a56d2db6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Aug 4 15:57:17 2018 -0500
Adjusted gflops format spec in testsuite, test/3m4m.
Details:
- Changed the format specifier for the gflops column in the testsuite
output from %7.3f to %7.2f. This was done mainly to keep the output
aligned properly when the expected performance exceeded 1000 gflops.
Also, two decimal places still conveys plenty of precision for all
practical applications, including just eyeballing performance deltas
between two executions (let alone two implementations).
- Changed the format specifier for gflops in the test/3m4m drivers
from %6.3f to %7.2f (for the same reasons listed above).
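The alignment effect of the two format specifiers can be checked with a short sketch. The helper formatted_width() is a hypothetical name introduced only for this illustration.

```c
#include <stdio.h>

/* Returns the number of characters produced when value is formatted with
   the given printf-style format string. */
int formatted_width(const char* fmt, double value)
{
    char buf[64];

    return snprintf(buf, sizeof buf, fmt, value);
}
```

A value of 1234.567 occupies eight characters under %7.3f, which overflows the seven-character column, but exactly seven characters under %7.2f.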
commit c7ff06bae92b9b6c6656f2030d13486b95417821
Merge: 6074082c ebe998d0
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date: Wed Aug 1 14:20:41 2018 -0500
Merge branch 'master' of https://github.com/flame/blis
commit 6074082cd359dd775ef72478f8f3a281c5a6a6f9
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date: Wed Aug 1 13:30:51 2018 -0500
Fixed bug in bli_cntx_set_packm_ker_dt() implementation.
Details:
- Fixed bug in static functions bli_cntx_set_[packm/unpackm]_ker_dt(), which
were incorrectly calling bli_cntx_get_[packm/unpackm]_ker_dt() to get the
corresponding func_t.
commit ebe998d06cc56a9a9d66990b6ebf683d6fd0efdf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 1 13:24:00 2018 -0500
Fixed typos in BuildSystem.md from previous commit.
commit e72a344e94c5ae253f69b60f41d92ca89a5d1d1c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 1 13:00:38 2018 -0500
Added table of 'make' targets to BuildSystem.md.
Details:
- Added a new section to BuildSystem.md that describes the most useful
make targets defined in the top-level Makefile.
commit 4f60d0288e00586dc921ff57db851f1266ff8e70
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 30 19:22:57 2018 -0500
README.md, comment updates.
Details:
- Added links and sandbox language to README.md.
- Adjusted some comments in high-level level-3 object functions to make
clear what bli_thread_init_rntm() does.
commit 455d3f49e5c8362395be14c79e6adb5123e29623
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 29 18:31:29 2018 -0500
Edits to object/typed API, multithreading docs.
commit 922a1c05e06f52c97fb369870dce07233e61c4c9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 28 20:15:55 2018 -0500
More tweaks to README.md.
commit a7a0cf2b5d9f1dea5061c0f20eeaf371dfd4ea12
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 28 16:59:31 2018 -0500
More edits to docs/Multithreading.md.
commit be21d0cf68c330fd0d2048465a43ddc59d0b9d6c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 28 16:46:51 2018 -0500
Fixed typos in docs/Multithreading.md.
commit eac07c7b4f7a41c68d63f1e67141b2b58009609e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 28 16:45:28 2018 -0500
Edits to docs/Multithreading.md.
commit 5438375a032273b46ae626fee909ffc05f48ab72
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 28 16:34:21 2018 -0500
Fixed link in README.md.
commit 1f1a237d3f0b24d71ce2d7ee52d8a84f8e6a29ad
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 28 16:33:28 2018 -0500
Fixed links in BLISTypedAPI.md.
commit 89c8806e3aa49310f36c0314c5f6956c83a627a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 28 16:30:56 2018 -0500
Minor doc fixes to previous commit.
commit b8c7574f84873b9c408f70c29c41ce464df57c2d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 28 16:27:09 2018 -0500
README.md, typed/object API updates.
Details:
- Updated the typed and object APIs to include language on the rntm_t
parameters in the expert interfaces.
- Updated README to include link to object API.
commit 29c34c4adb02d91fb34d1ccc0e821d6cfb7ce5c5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 27 16:26:19 2018 -0500
CREDITS file update.
commit 55a04edf52ac4f16c51b738bc884684adc1f1777
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 27 16:10:46 2018 -0500
CHANGELOG update (0.4.0)
commit 4ad61ce905d250dd3ef197f0d06a69ce6d99d309
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 27 16:10:43 2018 -0500
Version file update (0.4.0)
commit b86cf13793b07f35c027a56c9faec8f4b6279d3e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 27 16:08:21 2018 -0500
Release Notes update in advance of next version.
commit a8b4084a0e04e47ac02ceae93a2018f5363e1205
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 27 16:07:26 2018 -0500
CREDITS file update.
commit 8e10cac5f388ac961c3d77b0a465214e7c9dc91a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 27 14:45:35 2018 -0500
Updates to CREDITS, RELEASING, config/README.md.
Details:
- Added individuals' github handles to CREDITS file.
- Updated RELEASING, config/README.md files.
commit 401b69c8f26a86726ac5e1fb4f9fc2d2098ef204
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 25 17:55:13 2018 -0500
More indentation in docs/ConfigurationHowTo.md.
commit 1c6a1b921ef96999bb449d657cca6d9a556f7245
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 25 17:14:58 2018 -0500
Trying new indentation in ConfigurationHowTo.md.
Details:
- Modified a few sections to take advantage of a feature of markdown
that allows a bullet or enumeration to have multiple paragraphs. This
is a trial run to make sure the indentation looks good when rendered
in a web browser.
commit 71f978719527fcf17617cb234e48bf349a76c12d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 25 15:55:36 2018 -0500
Whitespace changes to macrokernels' func ptr defs.
commit 87d57c31c2bfcf4609dfe31ce915e9345150e613
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 25 14:20:18 2018 -0500
Various minor updates to typed, object API docs.
commit fb6e16268aaafbab2fd78d47cbf821e2152261fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 25 14:17:28 2018 -0500
Consolidated prototypes in bli_l1v_tapi.h.
Details:
- Consolidated typed API function prototypes in bli_l1v_tapi.h by
leveraging identical function signatures between operations.
- Removed 'restrict' keyword since it is not actually present in the
function definitions.
commit af60d738f21340ccb0903e6c87dbf6af4fc44fc0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 24 15:35:52 2018 -0500
Finished object creation part of BLISObjectAPI.md.
Details:
- Filled in remaining section on object creation function reference
of BLISObjectAPI.md. All object management functions demonstrated as
part of the example code in examples/oapi are now documented, as well
as some other functions that are not shown in the example code.
- Updated various links (mostly in function index) to correctly point to
the object API reference instead of the typed API reference.
- Added documentation to getijm, setijm.
commit 8217a6a3b68382c62f016c658d337e6086112fef
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 24 13:13:10 2018 -0500
Moved sandbox README.md to docs/Sandboxes.md.
Details:
- Relocated sandbox/ref99/README.md to docs/Sandboxes.md and made minor
edits to the document.
commit b7db29332394324ffd1a73c3847a75e9a5b38c8d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 19 11:14:30 2018 -0500
Explicitly typecast return vals in static funcs.
Details:
- Added explicit typecasting to various functions (mostly static
functions), primarily those in bli_param_macro_defs.h,
bli_obj_macro_defs.h, bli_cntx.h, bli_cntl.h, and a few other header
files.
- This change was prompted by feedback from Jacob Gorm Hansen, who
reported that #including "blis.h" from his application caused
gcc to output error messages (relating to types being returned
mismatching the declared return types) when used via the C++ compiler
front-end. This is the first pass of fixes, and we may need to
iterate with additional follow-up commits (#233).
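A sketch of the kind of explicit typecast involved, assuming a function that extracts an enum value from a bitfield. The names my_num_t, MY_DT_MASK, and my_info_dt() are hypothetical and are not the identifiers used in the BLIS headers.

```c
typedef enum { MY_FLOAT = 0, MY_DOUBLE = 2 } my_num_t;  /* hypothetical */

#define MY_DT_MASK 0x7u

/* The masked expression has an integer type. C converts it to the enum
   return type implicitly, but a C++ compiler rejects that conversion, so
   the cast is spelled out to keep the header usable from both. */
static my_num_t my_info_dt(unsigned info)
{
    return (my_num_t)(info & MY_DT_MASK);
}
```

Because the function lives in a header, every C++ translation unit that includes it would otherwise hit the error.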
commit fa08e5ead95f9d757af6ab5b095a8bf131e3874d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 17 19:02:15 2018 -0500
Fixed minor issues in ecbebe7 with mt disabled.
Details:
- Fixed an unused variable warning in frame/base/bli_rntm.c when
multithreading is disabled.
- Fixed a missing variable declaration in bli_thread_init_rntm_from_env()
when multithreading is disabled.
commit ecbebe7c2e43950dfa369f71c2b83cabe348a046
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 17 18:37:32 2018 -0500
Defined rntm_t to relocate cntx_t.thrloop (#235).
Details:
- Defined a new struct datatype, rntm_t (runtime), to house the thrloop
field of the cntx_t (context). The thrloop array holds the number of
ways of parallelism (thread "splits") to extract per level-3
algorithmic loop until those values can be used to create a
corresponding node in the thread control tree (thrinfo_t structure),
which (for any given level-3 invocation) usually happens by the time
the macrokernel is called for the first time.
- Relocating the thrloop from the cntx_t remedies a thread-safety issue
when invoking level-3 operations from two or more application threads.
The race condition existed because the cntx_t, a pointer to which is
usually queried from the global kernel structure (gks), is supposed to
be read-only. However, the previous code would write to the cntx_t's
thrloop field *after* it had been queried, thus violating its read-only
status. In practice, this would not cause a problem when a sequential
application made a multithreaded call to BLIS, nor when two or more
application threads used the same parallelization scheme when calling
BLIS, because in either case all application threads would be using
the same ways of parallelism for each loop. The true effects of the
race condition were limited to situations where two or more application
threads used *different* parallelization schemes for any given level-3
call.
- In remedying the above race condition, the application or calling
library can now specify the parallelization scheme on a per-call basis.
All that is required is that the thread encode its request for
parallelism into the rntm_t struct prior to passing the address of the
rntm_t to one of the expert interfaces of either the typed or object
APIs. This allows, for example, one application thread to extract 4-way
parallelism from a call to gemm while another application thread
requests 2-way parallelism. Or, two threads could each request 4-way
parallelism, but from different loops.
- A rntm_t* parameter has been added to the function signatures of most
of the level-3 implementation stack (with the most notable exception
being packm) as well as all level-1v, -1d, -1f, -1m, and -2 expert
APIs. (A few internal functions gained the rntm_t* parameter even
though they currently have no use for it, such as bli_l3_packm().)
This required some internal calls to some of those functions to
be updated since BLIS was already using those operations internally
via the expert interfaces. For situations where a rntm_t object is
not available, such as within packm/unpackm implementations, NULL is
passed in to the relevant expert interfaces. This is acceptable for
now since parallelism is not obtained for non-level-3 operations.
- Revamped how global parallelism is encoded. First, the conventional
environment variables such as BLIS_NUM_THREADS and BLIS_*_NT are only
read once, at library initialization. (Thanks to Nathaniel Smith for
suggesting this to avoid repeated calls to getenv(), which can be slow.)
Those values are recorded to a global rntm_t object. Public APIs, in
bli_thread.c, are still available to get/set these values from the
global rntm_t, though now the "set" functions have additional logic
to ensure that the values are set in a synchronous manner via a mutex.
If/when NULL is passed into an expert API (meaning the user opted to
not provide a custom rntm_t), the values from the global rntm_t are
copied to a local rntm_t, which is then passed down the function stack.
Calling a basic API is equivalent to calling the expert APIs with NULL
for the cntx and rntm parameters, which means the semantic behavior of
these basic APIs (vis-a-vis multithreading) is unchanged from before.
- Renamed bli_cntx_set_thrloop_from_env() to bli_rntm_set_ways_for_op()
and reimplemented, with the function now being able to treat the
incoming rntm_t in a manner agnostic to its origin--whether it came
from the application or is an internal copy of the global rntm_t.
- Removed various global runtime APIs for setting the number of ways of
parallelism for individual loops (e.g. bli_thread_set_*_nt()) as well
as the corresponding "get" functions. The new model simplifies these
interfaces so that one must either set the total number of threads, OR
set all of the ways of parallelism for each loop simultaneously (in a
single function call).
- Updated sandbox/ref99 according to above changes.
- Rewrote/augmented docs/Multithreading.md to document the three methods
(and two specific ways within each method) of requesting parallelism
in BLIS.
- Removed old, disabled code from bli_l3_thrinfo.c.
- Whitespace changes to code (e.g. bli_obj.c) and docs/BuildSystem.md.
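Note: the global/local runtime-object scheme described in this commit can be sketched as follows. This is a simplified model under stated assumptions, not the BLIS implementation: my_rntm_t, my_thread_set_num_threads(), and my_gemm_ex() are hypothetical names, the struct holds only a thread count, and taking the mutex while copying the global object is a choice made for this sketch.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a runtime object. */
typedef struct { int num_threads; } my_rntm_t;

/* Global defaults, e.g. as read once from the environment at init time. */
static my_rntm_t       global_rntm       = { 1 };
static pthread_mutex_t global_rntm_mutex = PTHREAD_MUTEX_INITIALIZER;

/* "Set" function guarded by a mutex so that concurrent callers update
   the global object one at a time. */
void my_thread_set_num_threads(int nt)
{
    pthread_mutex_lock(&global_rntm_mutex);
    global_rntm.num_threads = nt;
    pthread_mutex_unlock(&global_rntm_mutex);
}

/* Expert-style entry point. Returns the thread count the call would use.
   NULL means "no custom runtime object": the global values are copied
   into a local object. Either way the call works on a private copy, so
   no shared state is written while the operation runs. */
int my_gemm_ex(const my_rntm_t* rntm)
{
    my_rntm_t local;

    if (rntm == NULL) {
        pthread_mutex_lock(&global_rntm_mutex);
        local = global_rntm;
        pthread_mutex_unlock(&global_rntm_mutex);
    } else {
        local = *rntm;
    }
    return local.num_threads;
}
```

With this shape, one application thread can pass a my_rntm_t requesting 4 threads while another passes one requesting 2, and neither request is visible to the other.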
commit 323eaaab99752858b12e81e2eb8e416f009a3028
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date: Fri Jul 13 11:40:06 2018 -0500
Removed left over code from plotting scripts.
commit 60c197736495b47ce974ffb9b43874d1ebcfe78c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 12 19:22:14 2018 -0500
Documented accessor functions in BLISObjectAPI.md.
Details:
- Added documentation to docs/BLISObjectAPI.md for a handful of
commonly-used obj_t accessor functions.
- Minor updates to docs/BLISTypedAPI.md.
commit 77327ad796e11ef67df0cc91d45ed663598ba4df
Merge: 73b0b2a3 9fef8575
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date: Thu Jul 12 17:09:33 2018 -0500
Merge branch 'master' of https://github.com/flame/blis
commit 73b0b2a3ac1be6dfbe85c116886b4e29d98ac945
Author: Devangi N. Parikh <dnp@cs.utexas.edu>
Date: Thu Jul 12 16:53:10 2018 -0500
Created hardware-specific test driver directory.
Details:
- Created a 'studies' subdirectory within 'test' to be used to house
test drivers, makefiles, run scripts, matlab plot code, and related
files that have been customized for collecting performance data on
specific host machines or product lines. This new setup will help us
catalog, track, and share test driver materials over time, and in a
way that facilitates reproducibility.
- Created an 'skx' subdirectory within 'test/studies' to house various
level-3 test driver files used to measure performance on SkylakeX
nodes (specifically, those nodes used by TACC's stampede2 system).
commit 9fef85756d15ee0f977fff6e57acd01c20cba184
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 11 18:40:30 2018 -0500
Cleaned up loose ends in BLISObjectAPI.md.
Details:
- Deleted some lines from the API function signatures that did not
belong (and were only left over from the copy-paste of the typed API).
- Fixed some paragraph-in-bullet indentation.
commit 80ddeae4629022b69fdf1f1b053a1fcba643c40c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 11 18:31:57 2018 -0500
Added BLISObjectAPI.md to docs.
Details:
- Added first draft of BLISObjectAPI.md. (Object management section is
still missing.)
- Small fixes to BLISTypedAPI.md found while writing BLISObjectAPI.md.
- In various .md files, changed ``` verbatim blocks to language
attributes (e.g. ```c for C code).
commit 038442add39ce629fee0d960b212ce0c95138d46
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 11 12:24:18 2018 -0500
Added -lpthread to makefile example in BuildSystem.md.
Details:
- Added missing pthreads library linking to example makefile in
docs/BuildSystem.md, as well as similar language to build requirements
at the beginning of the document. Thanks to Stefanos Mavros for
bringing this to our attention.
- Updated CREDITS file.
commit bf10d8624e7b5902c9d9189c7c93f318b8e1b9a5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 9 18:40:13 2018 -0500
Small updates to KernelsHowTo.md, BLISTypedAPI.md.
Details:
- Minor updates to BLISTypedAPI.md, mostly to bring terminology
up-to-date with the new "typed API" classification.
- Added contents section to KernelsHowTo.md.
commit 1fd3bce59e43b422e62f9684bca9d1296a29edc3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 9 18:20:11 2018 -0500
Further updates to KernelsHowTo.md, BLISTypedAPI.md.
Details:
- Added missing level-1v operations to BLISTypedAPI (e.g. axpbyv,
xpbyv).
- Updated broken links in KernelsHowTo.md based on misnamed anchors.
- Other minor changes.
commit c40d30a6c920bd2e5a8353a3cd07a7e2b2265758
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 9 17:55:54 2018 -0500
Updated KernelsHowTo.md, BLISTypedAPI.md.
Details:
- Added missing (basic) information in KernelsHowTo.md for level-1f and
level-1v kernels.
- Updated section regarding contexts.
commit f8913c2bf91c0e0fb4e68aedf64a242a19db92a0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 7 20:35:13 2018 -0500
Fixed outdated scalv() calls in penryn l1f kernels.
Details:
- Fixed stale calls to dscalv() from the dotxf and dotxaxpyf penryn
kernels that were not updated during the basic/expert API separation
in e88aeda.
commit e78e71d549ac17ecd52c7b33008df1cd78f1b59e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 7 20:18:09 2018 -0500
Added README.md mention/link to examples/tapi.
Details:
- Added language to README.md to bring the reader's attention to the
example code for the typed API (in addition to those for the object
API).
commit 419ffb158573a26bfec47bac73e4394e7926a7b8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 7 20:14:23 2018 -0500
Updates to README.md.
Details:
- Updated wiki links according to renamed/relocated files in 'docs'.
- Converted links to relative paths.
- Added link to docs/Multithreading.md.
commit 7d3e8a7e5f1ec299d009fb6c9071f0c1b089b460
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 7 20:01:29 2018 -0500
Reverted docs/*.md links to relative paths.
Details:
- Within the documents in docs/*.md, reverted links to other local
documents to relative paths.
- Fixed some links/documents that did not yet have the '.md' suffix.
- Testing whether we can use relative links ('docs/BLISTypedAPI.md')
from within README.md.
commit d97c862c2b9170d774f414e63ae365488fffb4f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 7 19:40:41 2018 -0500
Updated links (URLs) in docs/*.md.
Details:
- Updated most markdown links in the documents/wikis to use absolute
paths instead of the relative paths that were in use previously.
A few links were not updated, except for adding a ".md" to reflect
the documents' new names, in order to test whether relative
linking still works.
commit 3a0c12135875e0fb04de9798664e4fae632d994e
Merge: 2c7960c8 bcacddfa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 7 16:51:38 2018 -0500
Merge branch 'dev'
commit bcacddfad75b20969660606751eea6ead6c42ca9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 7 16:45:29 2018 -0500
Added 'docs' directory with wiki markdown files.
Details:
- Exported all github wikis to a new 'docs' directory.
- Renamed 'BLISAPIQuickReference' wiki to 'BLISTypedAPI' and removed
all cntx_t* arguments from the (now non-expert) APIs (with the
exception of the kernel APIs).
- Added section to BuildSystem documenting new ARG_MAX hack.
commit 3ee2bc0f7aa3b08da92331d64271bee99eaf8c1d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 7 16:02:16 2018 -0500
Renamed files that distinguish basic/expert APIs.
Details:
- Renamed various files that were previously named according to a
"with context" or "without context" convention. For example, the
following files in frame/3 were renamed:
frame/3/bli_l3_oapi_woc.c -> frame/3/bli_l3_oapi_ba.c
frame/3/bli_l3_oapi_wc.c -> frame/3/bli_l3_oapi_ex.c
frame/3/bli_l3_tapi_woc.c -> frame/3/bli_l3_tapi_ba.c
frame/3/bli_l3_tapi_wc.c -> frame/3/bli_l3_tapi_ex.c
Here, the "ba" is for "basic" and "ex" is for "expert". This new
naming scheme will make more sense especially if/when additional
expert parameters are added to the expert APIs (typed and object).
commit e88aedae735dfeb6fa5ac28d4527eb3ca58c6510
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 6 19:14:02 2018 -0500
Separated expert, non-expert typed APIs.
Details:
- Split existing typed APIs into two subsets of interfaces: one for use
with expert parameters, such as the cntx_t*, and one without. This
separation was already in place for the object APIs, and after this
commit the typed and object APIs will have similar expert and non-
expert APIs. The expert functions will be suffixed with "_ex" just as
is the case for expert interfaces in the object APIs.
- Updated internal invocations of typed APIs (functions such as
bli_?setm() and bli_?scalv()) throughout BLIS to reflect use of the
new explicitly expert APIs.
- Updated example code in examples/tapi to reflect the existence (and
usage) of non-expert APIs.
- Bumped the major soname version number in 'so_version'. While code
compiled against a previous version/commit will likely still work
(since the old typed function symbol names still exist in the new API,
just with one less function argument) the semantics of the function
have changed if the cntx_t* parameter the application passes in is
non-NULL. For example, calling bli_daxpyv() with a non-NULL context
does not behave the same way now as it did before; before, the
context would be used in the computation, and now the context would
be ignored since the interface for that function no longer expects a
context argument.
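Note: the basic/expert split described in this commit can be sketched as a pair of functions. All names here (my_cntx_t, my_daxpyv(), my_daxpyv_ex()) are hypothetical, and the context is reduced to a single kernel pointer for illustration; the actual BLIS context holds considerably more.

```c
#include <stddef.h>

/* Hypothetical kernel signature and context type. */
typedef void (*my_axpyv_ker_t)(size_t n, double alpha,
                               const double* x, double* y);
typedef struct { my_axpyv_ker_t axpyv_ker; } my_cntx_t;

/* Default kernel: y := y + alpha * x. */
static void my_default_axpyv_ker(size_t n, double alpha,
                                 const double* x, double* y)
{
    for (size_t i = 0; i < n; ++i) y[i] += alpha * x[i];
}

/* Expert interface ("_ex" suffix): accepts a context; NULL selects the
   default kernel. */
void my_daxpyv_ex(size_t n, double alpha, const double* x, double* y,
                  const my_cntx_t* cntx)
{
    my_axpyv_ker_t ker = (cntx != NULL) ? cntx->axpyv_ker
                                        : my_default_axpyv_ker;
    ker(n, alpha, x, y);
}

/* Basic interface: no context argument; forwards to the expert
   interface with NULL. */
void my_daxpyv(size_t n, double alpha, const double* x, double* y)
{
    my_daxpyv_ex(n, alpha, x, y, NULL);
}
```

The sketch also shows why the semantics changed for old callers: once the basic function has no context parameter, a context that an application used to pass can no longer influence the computation.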
commit 331694e52414c0cd50048daf880a9ace9e29b94a
Author: Isuru Fernando <isuruf@gmail.com>
Date: Fri Jul 6 09:07:38 2018 -0600
Fix windows build and enable x86_64 on appveyor (#230)
* Upload artifacts built on appveyor (#228)
* Upload artifacts
* Fix install in appveyor
* Remove windows.h in bli_winsys.c (#229)
Looks like it is unneeded.
* Implemented ARG_MAX hack in configure, Makefile.
Details:
- Added support for --enable-arg-max-hack to configure, which will
change the behavior of make when building BLIS so that rather than
invoke the archiver/linker with all of the object files as command
line arguments, those object files are echoed to a temporary file
and then the archiver/linker is fed that temporary file via the @
notation. An example of this can be found in the GNU make docs at
https://www.gnu.org/software/make/manual/make.html#File-Function
- Thanks to Isuru Fernando for prompting this feature.
* Enable x86_64 and arg-max-hack on appveyor
* Use gas style assembly for clang on windows
commit a64a780d28c99d35f237f59212772e9beff35b3e
Merge: 89e178ce 3cb396d1
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jul 6 09:38:42 2018 -0500
Merge pull request #231 from flame/travis-pr
Disable SDE for PRs
commit 3cb396d1ae4ee569f862db201c6a976712fd128e
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jul 6 09:19:44 2018 -0500
Disable SDE for PRs
Pull requests cannot use Travis secret variables, so SDE needs to be disabled. This PR should suffice as a test.
commit 2c7960c8416ee9b67364be5f2b210fd7a0aec4b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 5 14:38:33 2018 -0500
Implemented ARG_MAX hack in configure, Makefile.
Details:
- Added support for --enable-arg-max-hack to configure, which will
change the behavior of make when building BLIS so that rather than
invoke the archiver/linker with all of the object files as command
line arguments, those object files are echoed to a temporary file
and then the archiver/linker is fed that temporary file via the @
notation. An example of this can be found in the GNU make docs at
https://www.gnu.org/software/make/manual/make.html#File-Function
- Thanks to Isuru Fernando for prompting this feature.
commit c422a5cd191d47e6aeb9cea6de0e348f46e3e318
Merge: b6470262 89e178ce
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 5 12:33:35 2018 -0500
Merge branch 'dev'
commit b6470262ea66c0f48a5b4d85ca4bf85c1fb2b3af
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Jul 4 19:14:29 2018 -0600
Remove windows.h in bli_winsys.c (#229)
Looks like it is unneeded.
commit eac4bdf98691c5ec784af0dc11d1ad2269840661
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Jul 4 18:31:01 2018 -0600
Upload artifacts built on appveyor (#228)
* Upload artifacts
* Fix install in appveyor
commit 89e178ce380439dea951925e33703dc4b979e914
Merge: d868eb3e e32b2ef9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 4 17:51:16 2018 -0500
Merge branch 'master' into dev
commit e32b2ef983ea1c3521dd3821116c0078690f125e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 4 17:49:39 2018 -0500
Update to CREDITS file.
commit 14648e137696484e0ff04f89b16c6b4183ea42b8
Author: Isuru Fernando <isuruf@gmail.com>
Date: Wed Jul 4 16:48:42 2018 -0600
Native windows support using clang (#227)
* Add appveyor file
* Build script
* Remove fPIC for now
* copy as
* set CC and CXX
* Change the order of immintrin.h
* Fix testsuite header
* Move testsuite defs to .c
* Fix appveyor file
* Remove fPIC again and fix strerror_r missing bug
* Remove appveyor script
* cd to blis directory
* Fix sleep implementation
* Add f2c_types_win.h
* Fix f2c compilation
* Remove rdp and rename appveyor.yml
* Remove setenv declaration in test header
* set CPICFLAGS to empty
* Fix another immintrin.h issue
* Escape CFLAGS and LDFLAGS
* Fix more ?mmintrin.h issues
* Build x86_64 in appveyor
* override LIBM LIBPTHREAD AR AS
* override pthreads in configure
* Move windows definitions to bli_winsys.h
* Fix LIBPTHREAD default value
* Build intel64 in appveyor for now
commit b45ea92fc6f77f2313b50dbe95922f838cbead07
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 3 18:27:29 2018 -0500
Added typed (BLAS-like) API code examples.
Details:
- Added new example code to examples/tapi demonstrating how to use the
BLIS typed API. These code examples directly mirror the corresponding
example code files in examples/oapi. This setup provides a convenient
opportunity for newcomers to BLIS to compare and contrast the typed
and object APIs when they are used to perform the same tasks.
- Minor cleanups to examples/oapi.
commit d868eb3e200f657a1284c4cc933e7a4d25260dce
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 29 12:36:04 2018 -0500
Implemented bli_obj_scalar_cast_to().
Details:
- Implemented bli_obj_scalar_cast_to(), which will typecast the value in
the internal scalar of an obj_t to a specified datatype.
- Changed bli_obj_scalar_attach() so that the scalar value being attached
is first typecast to the storage datatype of the destination object
rather than the target datatype.
- Reformatted function type signatures in bli_obj_scalar.c as well as
prototypes in its corresponding header file.
commit 52d80b5f09517d80ac8a7c96983a576c1ec2080b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 29 12:30:44 2018 -0500
Fixed static funcs related to target and exec dts.
Details:
- Fixed incorrect bit shifts in the following static functions:
bli_obj_set_target_domain()
bli_obj_set_target_prec()
bli_obj_set_exec_domain()
bli_obj_set_exec_prec()
- Fixed incorrect bitmask in bli_dt_proj_to_single_prec().
- Updated bli_obj_real_part() and bli_obj_imag_part() so that they update
the target and exec datatypes (in addition to the storage datatypes).
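Note: a minimal sketch of the kind of shifted bit-field setter this commit refers to. The field offset and the function names are assumptions made for illustration and are not copied from bli_type_defs.h. The point is that both the mask and the new value must be shifted by the field's offset; using the wrong shift would modify a neighboring field instead.

```c
#include <stdint.h>

#define MY_DOMAIN_BIT       0x1u  /* domain bit within a datatype field  */
#define MY_TARGET_DT_SHIFT  10u   /* assumed offset of the target field  */

/* Set the target domain (0 = real, 1 = complex) while leaving all other
   bits of the info word untouched. */
static inline uint32_t my_set_target_domain(uint32_t info, uint32_t dom)
{
    const uint32_t mask = MY_DOMAIN_BIT << MY_TARGET_DT_SHIFT;
    return (info & ~mask) | ((dom << MY_TARGET_DT_SHIFT) & mask);
}

/* Read the target domain back out of the info word. */
static inline uint32_t my_get_target_domain(uint32_t info)
{
    return (info >> MY_TARGET_DT_SHIFT) & MY_DOMAIN_BIT;
}
```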
commit e006f2d0eeb229c1cd05a424496a774c29bdc5d7
Merge: bd8c55fe dafca7a0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 27 15:54:38 2018 -0500
Merge branch 'dev' of github.com:flame/blis into dev
commit bd8c55fe268e8e352508341ebd739ef4fc68eb92
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 27 15:52:37 2018 -0500
Added dt_on_output field to auxinfo_t.
Details:
- Added a new field to the auxinfo_t struct that can be used, in theory,
to request type conversion before the microkernel stores/accumulates
its microtile back to memory.
- Added the appropriate get/set static functions to bli_type_defs.h.
commit dafca7a0c2c72aaf15cb588b2bef6f246abb1905
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Jun 25 16:20:10 2018 -0500
Fix botched memory addressing in Penryn kernel (no effect for GAS output).
commit de493b0f349efebab98ab17f063d4d3d932c24c3
Merge: 195480be a7166feb
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Jun 25 14:26:06 2018 -0500
Merge pull request #226 from devinamatthews/dev
Finish macroization of assembly ukernels.
commit 195480beb589db7d582646f556e855c611d4c3a9
Merge: 07c3d0a9 3f387ca3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 25 13:24:21 2018 -0500
Merge branch 'master' into dev
commit 3f387ca35e42519f0d6a154814e4c8800fa2acb8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 25 12:32:03 2018 -0500
Fixed bugs in configure's select_cc() function.
Details:
- This commit fixes several bugs in configure relating to selecting a C
compiler. By dumb luck, two of the bugs sort of cancelled each
other out in most use cases, which manifested as the expected behavior.
Thanks to Mathieu Poumeyrol for bringing this issue to our attention,
and to Devin Matthews for suggesting the more portable way of
capturing both stdout and stderr and suggesting a return code check
instead of testing stdout/stderr.
- The first bug: As the values of the compiler search list are iterated
over, only stderr is captured when querying a compiler with --version
rather than both stdout and stderr.
- The second bug: After each query, a conditional attempted to test
whether the query resulted in anything being output. That conditional
erroneously was using "-z" instead of "-n" for non-emptiness. Thus,
most of the time, stderr was empty (because the --version info was
being output on stdout), and since it was empty, the -z conditional
(intended to execute only when a compiler was found to be responsive)
executed.
- A third bug was also fixed in the way that the merged stdout/stderr
output was tested for non-emptiness (moving the 'cat' invocation to
another line and testing the contents of a variable instead).
- The three bugs above have been fixed as part of a partial rewrite of
the select_cc() function in terms of a return code check, which
obviated the need to save the output of stdout and stderr.
- The fourth bug involved a misnamed variable in the right-hand side
of a statement intended to prepend CC to search_list when CC was
non-empty. This typically did not manifest as a bug since usually CC
(if it was set) was set to a value that was known to work.
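Note: the fix above was made in the configure shell script. The underlying idea, deciding by exit status rather than by whether any output was captured, is sketched here in C for illustration only. The function name is hypothetical, and the candidate names are assumed to come from a trusted list since they are passed to the shell.

```c
#include <stdio.h>
#include <stdlib.h>

/* Return the index of the first candidate for which "<name> --version"
   exits with status 0, or -1 if none does. All output is discarded, so
   it does not matter whether a tool prints to stdout or to stderr. */
int my_first_responsive(const char* const* candidates, int n)
{
    char cmd[256];

    for (int i = 0; i < n; ++i) {
        int len = snprintf(cmd, sizeof cmd, "%s --version >/dev/null 2>&1",
                           candidates[i]);
        if (len < 0 || (size_t)len >= sizeof cmd)
            continue;               /* name too long for the buffer */
        if (system(cmd) == 0)
            return i;               /* command ran and exited with 0 */
    }
    return -1;
}
```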
commit a7166feb1053814b7dd27f3879ae38acfc9637fc
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Jun 25 12:09:18 2018 -0500
Finish macroization of assembly ukernels.
commit f986396c2af5de06283b9834112782afd0a8907e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 22 18:12:40 2018 -0500
Added 'configure --help' text for CFLAGS, LDFLAGS.
Details:
- Added mention of the new support for preset CFLAGS, LDFLAGS to the
bottom of the text output by './configure --help'.
- Updated usage example to use 'haswell' instead of 'sandybridge'.
commit 884175d9ffb62e49535e6c1f7d58fb3b83e7e78f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 22 18:08:43 2018 -0500
Added configure support for preset CFLAGS, LDFLAGS.
Details:
- Any preexisting values set to the CFLAGS environment variable (or the
CFLAGS variable if given on the command line) are saved by configure
for later inclusion (prepending, to be precise) along with the
compiler flags automatically determined by the BLIS build system.
(LDFLAGS is treated in a similar manner.) Thanks to Dave Love for
requesting this feature in issue #223 and Mathieu Poumeyrol for his
support on this and a previous related issue.
- Comment updates to build/config.mk.in.
- Strip whitespace from return value of various cflags functions in
common.mk.
commit 07c3d0a95190bd23f0cd2ef220deb3384d8378d1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 21 12:35:07 2018 -0500
Update to CREDITS file.
commit a1ebbbf158c7b34c9032ef45431bc610b6f14858
Merge: 17928b1c c81c6f23
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Jun 20 15:37:53 2018 -0500
Merge pull request #224 from devinamatthews/asm-macros
Asm macros
commit c81c6f23b9547b5d55ae68fd5a3bbd8a78290b6b
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Jun 20 15:20:44 2018 -0500
Fix problem with inc and dec macros.
commit 5a63971c822fd452f97ba869625c8e87f6cbeebc
Merge: b4d94e54 17928b1c
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Jun 20 14:07:49 2018 -0500
Merge remote-tracking branch 'upstream/dev' into asm-macros
commit b4d94e54d44cf30e4bb452ca5263be3473c0582d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Jun 20 14:07:24 2018 -0500
Convert x86 microkernels to assembly macros.
commit 17928b1c9941aa58aef1f122c793e2b14e705267
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 19 17:59:03 2018 -0500
Added static funcs bli_dt_domain(), bli_dt_prec().
Details:
- Added definitions of static functions bli_dt_domain()/bli_dt_prec(),
which extract a dom_t domain or prec_t precision value, respectively,
from a num_t datatype.
- Changed the return types of bli_obj_domain() and bli_obj_prec() from
objbits_t to dom_t and prec_t. (Not sure why they were ever set to
return objbits_t.)
commit 5f7fbb7115b1bf532c169dfd9adef84c41a95031
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 19 15:38:55 2018 -0500
Static funcs for projecting dt to single/double.
Details:
- Added static functions for projecting a datatype to single precision
or double precision, both for obj_t's storage datatypes and standalone
datatypes.
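Note: a minimal sketch of datatype projection, together with the domain and precision queries it relies on. The encoding is an assumption made for this sketch (bit 0 selects the domain, bit 1 the precision), and every name is hypothetical rather than taken from the BLIS headers.

```c
/* Assumed encoding: bit 0 = domain, bit 1 = precision. */
typedef enum { MY_FLOAT = 0, MY_SCOMPLEX = 1,
               MY_DOUBLE = 2, MY_DCOMPLEX = 3 } my_num_t;
typedef enum { MY_REAL = 0, MY_COMPLEX = 1 } my_dom_t;
typedef enum { MY_SINGLE_PREC = 0, MY_DOUBLE_PREC = 2 } my_prec_t;

/* Extract the domain or the precision from a datatype value. */
static inline my_dom_t my_dt_domain(my_num_t dt)
{
    return (my_dom_t)((unsigned)dt & 0x1u);
}
static inline my_prec_t my_dt_prec(my_num_t dt)
{
    return (my_prec_t)((unsigned)dt & 0x2u);
}

/* Projection keeps the domain bit and forces the precision bit. */
static inline my_num_t my_dt_proj_to_single_prec(my_num_t dt)
{
    return (my_num_t)((unsigned)dt & ~0x2u);
}
static inline my_num_t my_dt_proj_to_double_prec(my_num_t dt)
{
    return (my_num_t)((unsigned)dt | 0x2u);
}
```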
commit d4a22702c7a90273dc14f271db465c2e11e5b87e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 19 14:54:57 2018 -0500
Set up haswell config for optional col-pref ukrs.
Details:
- Added two presently-disabled cpp blocks in bli_cntx_init_haswell.c to
easily allow one to switch to a set of column-preferential gemm
microkernels (in the haswell subconfiguration). The second column-
preferring block sets the register blocksizes to their appropriate
values. However, cache blocksizes are left unchanged, and therefore are
likely suboptimal. This should be addressed later.
commit f317c2e31bfc329cb6bb4e06005e45b9c8a9d6a7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 19 12:21:23 2018 -0500
Added get/set static funcs for exec dt/dom/prec.
Details:
- Added functions to bli_obj_macro_defs.h to get and set the target
domain and target precision bits in the obj_t, and also added the
appropriate support in bli_type_defs.h.
commit e88a5b8da8c26caebd2b0fb73b30836fb5417c9c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 18 15:56:26 2018 -0500
Implemented castm, castv operations.
Details:
- Implemented castm and castv operations, which behave like copym and
copyv except where the obj_t operands can be of different datatypes.
These new operations, however, unlike copym/copyv, do not build upon
existing level-1v kernels.
- Reorganized projm, projv into a 'proj' subdirectory of frame/base (to
match the newly added frame/base/cast directory).
- Added new macros to bli_gentfunc_macro_defs.h, _gentprot_macro_defs.h
that insert GENTFUNC2/GENTPROT2 macros for all non-homogeneous datatype
combinations. Previously, one had to invoke two additional macros--one
which mixed domains only and another that included all remaining
cases--in order to get full type combination coverage.
- Defined a new static function, bli_set_dims_incs_2m(), to aid in the
setting of various variables in the implementations of bli_??castm().
This static function joins others like it in bli_param_macro_defs.h.
- Comment update to bli_copysc.h.
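Note: a minimal sketch of what a cast operation on vectors does, assuming a copy from single-precision real to double-precision real. The function name and signature are hypothetical and only illustrate "copyv with differing datatypes"; they are not the BLIS interface.

```c
#include <stddef.h>

/* Copy n elements from a float vector to a double vector, converting each
   element on the way. incx and incy are element strides, as in BLAS-style
   interfaces. */
void my_sdcastv(size_t n, const float* x, ptrdiff_t incx,
                double* y, ptrdiff_t incy)
{
    for (size_t i = 0; i < n; ++i)
        y[(ptrdiff_t)i * incy] = (double)x[(ptrdiff_t)i * incx];
}
```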
commit 2000cdff59272974438e88e0e82d8e1a32710325
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 18 14:17:28 2018 -0500
Update to CREDITS file.
commit ed2c8aed848ba2dede18df090cf2e0b6e4cc059f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 18 11:49:34 2018 -0500
Temporarily disabled small matrix handling on zen.
Details:
- Disabled small matrix handling in config/zen/bli_family_zen.h due to
what appears to be a bug that manifests as failures in the single and
double precision real level-3 BLAS test drivers (visible via
out.sblat3 and out.dblat3). Thanks to Robin Christ for reporting this
issue.
commit ed20392c500940bfc0947795c1ff7c8c24f8e26f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 15 16:31:22 2018 -0500
Added get/set static funcs for exec dt/dom/prec.
Details:
- Added functions to bli_obj_macro_defs.h to get and set the execution
domain and execution precision bits in the obj_t.
- Added/rearranged a few functions in bli_obj_macro_defs.h.
- Renamed some macros in bli_type_defs.h: EXECUTION -> EXEC.
commit 22594e8e9ab55f5bc0e69d96a23e128502849999
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 14 17:35:23 2018 -0500
Updated sandbox/ref99 according to f97a86f.
Details:
- Applied changes to ref99 sandbox analogous to those applied to
framework code in f97a86f. This involves setting the pack schemas of
A and B objects temporarily to communicate those desired schemas to
the control tree creation function in blx_gemm_cntl.c. This allows us
to (henceforth) query the schemas from the control tree rather than
the context.
commit 1b5d0424d2c7e5eac33e02359c12917ef280949f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 13 18:41:32 2018 -0500
Prototype column-preferential zen gemm ukernels.
Details:
- Added prototypes to bli_kernels_zen.h for each of the four gemm
microkernels that prefer outputting to column storage.
commit f88c2e7a539e383297e846e6d4647058dd3db128
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 13 18:27:46 2018 -0500
Defined static function bli_blksz_scale_def_max().
Details:
- Added a new static function to bli_blksz.h that scales both the default
(regular) blocksize as well as the maximum blocksize in the blksz_t
object. Reminder: maximum blocksizes have different meanings in
different contexts. For register blocksizes, they refer to the packing
register blocksizes (PACKMR or PACKNR) while for cache blocksizes, they
refer to the maximum blocksize to use during the final iteration of a
loop.
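Note: a minimal sketch of scaling both blocksize values together. The struct is reduced to a single datatype, and the num/den parameterization is an assumption made for this sketch; the names are hypothetical.

```c
#include <assert.h>

/* Hypothetical blocksize object holding a default and a maximum value. */
typedef struct { long def; long max; } my_blksz_t;

/* Scale both the default and the maximum blocksize by num/den, using
   integer arithmetic. */
static inline void my_blksz_scale_def_max(long num, long den, my_blksz_t* b)
{
    assert(den != 0);
    b->def = (b->def * num) / den;
    b->max = (b->max * num) / den;
}
```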
commit 87db5c048e0c7f37351fda486abaf7d19fc5821c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 12 19:38:37 2018 -0500
Changed usage of virtual microkernel slots in cntx.
Details:
- Changed the way virtual microkernels are handled in the context.
Previously, there were query routines such as bli_cntx_get_l3_ukr_dt()
which returned the native ukernel for a datatype if the method was
equal to BLIS_NAT, or the virtual ukernel for that datatype if the
method was some other value. Going forward, the context native and
virtual ukernel slots will both be initialized to native ukernel
function pointers for native execution, and for non-native execution
the virtual ukernel pointer will be something else. This allows us
to always query the virtual ukernel slot (from within, say, the
macrokernel) without needing any logic in the query routine to decide
which function pointer (native or virtual) to return. (Essentially,
the logic has been shifted to init-time instead of compute-time.)
This scheme will also allow generalized virtual ukernels as a way
to insert extra logic in between the macrokernel and the native
microkernel.
- Initialize native contexts (in bli_cntx_ref.c) with native ukernel
function addresses stored to the virtual ukernel slots pursuant to
the above policy change.
- Renamed all static functions that were native/virtual-ambiguous, such
as bli_cntx_get_l3_ukr_dt() or bli_cntx_l3_ukr_prefers_cols_dt()
pursuant to the above policy change. Those routines now use the
substring "get_l3_vir_ukr" in their name instead of "get_l3_ukr". All
of these functions were static functions defined in bli_cntx.h, and
most uses were in level-3 front-ends and macrokernels.
- Deprecated anti_pref bool_t in context, along with related functions
such as bli_cntx_l3_ukr_eff_dislikes_storage_of(), now that 1m's
panel-block execution is disabled.
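Note: the shift from compute-time to init-time logic described in this commit can be sketched as follows. The types, the placeholder kernel signature, and all function names are hypothetical; the sketch only shows both slots being filled at initialization so that the consumer always reads the virtual slot.

```c
#include <stddef.h>

typedef int (*my_ukr_t)(int);   /* placeholder microkernel signature */

typedef struct {
    my_ukr_t nat_ukr;           /* native microkernel slot  */
    my_ukr_t vir_ukr;           /* virtual microkernel slot */
} my_cntx_t;

static int my_native_ukr(int x) { return x + 1; }

/* A virtual microkernel inserts extra logic around the native one. */
static int my_wrapper_ukr(int x) { return 2 * my_native_ukr(x); }

/* Init-time decision: for native execution both slots hold the native
   kernel; for non-native execution the virtual slot holds a wrapper. */
void my_cntx_init(my_cntx_t* c, int is_native)
{
    c->nat_ukr = my_native_ukr;
    c->vir_ukr = is_native ? my_native_ukr : my_wrapper_ukr;
}

/* Compute-time code always queries the virtual slot and never branches
   on the execution method. */
int my_macrokernel(const my_cntx_t* c, int x)
{
    return c->vir_ukr(x);
}
```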
commit dbaf440540837b03643190cd685ed889fa7fd212
Merge: 22aa44eb 2610fff0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 11 12:37:04 2018 -0500
Merge branch 'master' into dev
commit 2610fff0b07bdb345cb2e334ef6bea0c63c8cead
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 11 12:32:54 2018 -0500
Renamed 1m packm kernels from _1e to _1er.
Details:
- Renamed the reference packm kernels used by 1m. Previously, they used
a _1e suffix, which was confusing since they packed to both 1e and 1r
schemas. This was likely an artifact of the time when there were
separate kernels for each schema before I decided to combine them into
a single function (per datatype and panel dimension), and the 1e
functions were the ones to inherit the 1r functionality. The kernels
have now been renamed to use a _1er suffix.
commit 7af5283dcc3dded114852d6013d33134021b81aa
Author: sraut <Biplab.Raut@amd.com>
Date: Mon Jun 11 15:00:22 2018 +0530
added check condition on n-dimension for XA'=B intrinsic code to process till 128 size
Change-Id: I95d020a5ca3ea21d446b8c2e379d56e1eea18530
commit 712de9b371a8727682352a2f52cd4880de905f0b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jun 9 14:36:30 2018 -0500
Added missing semicolon in 03obj_view.c
Details:
- Thanks to Tony Skjellum for pointing out this typo due to a
last-minute change to the source prior to committing.
commit 043d0cd37ef4a27b1901eeb89d40083cfb2a57ba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jun 9 13:46:49 2018 -0500
Implemented bli_acquire_mpart(), added example code.
Details:
- Implemented bli_acquire_mpart(), a general-purpose submatrix view
function that will alias an obj_t to be a submatrix "view" of an
existing obj_t.
- Renumbered examples in examples/oapi and inserted a new example file,
03obj_view.c, which shows how to use bli_acquire_mpart() to obtain
submatrix views of existing objects, which can then be used to
indirectly modify the parent object.
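Note (illustrative sketch, not BLIS source): the idea of a submatrix "view" can be shown with a simplified descriptor. The type view_t and the functions acquire_view() and view_set() below are hypothetical stand-ins, not the obj_t API; the sketch only assumes that a view shares the parent's buffer and strides and differs in its starting address and dimensions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified matrix descriptor; this is NOT the BLIS obj_t. */
typedef struct {
    double *buf;    /* address of element (0,0) of this view */
    size_t  m, n;   /* dimensions of the view */
    size_t  rs, cs; /* row and column strides, in elements */
} view_t;

/* Alias `child` to the m x n submatrix of `parent` whose top-left
   element is (i, j). No matrix data is copied. */
static void acquire_view(size_t i, size_t j, size_t m, size_t n,
                         const view_t *parent, view_t *child)
{
    assert(i + m <= parent->m && j + n <= parent->n);
    child->buf = parent->buf + i * parent->rs + j * parent->cs;
    child->m   = m;
    child->n   = n;
    child->rs  = parent->rs;
    child->cs  = parent->cs;
}

/* Write through a view; the parent's storage is what gets modified. */
static void view_set(view_t *v, size_t i, size_t j, double x)
{
    assert(i < v->m && j < v->n);
    v->buf[i * v->rs + j * v->cs] = x;
}
```

Because the child only carries an offset pointer, writing to element (0,0) of a view taken at (1,2) of a column-stored 3x4 parent lands on the parent's element (1,2).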
commit f1908d39767baef56077def69126d96f805ee27e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 8 14:22:22 2018 -0500
Fixed broken input.operations.fast.
Details:
- Removed three input lines from input.operations.fast (labeled
"test sequential micro-kernel") that I intended to remove in bd02c4e.
These lines prevented 'make check' (and 'make checkblis-fast') from
completing correctly. Note: This bug was fixed in 3df39b3, but that
commit has not yet been merged into master, hence this redundant
commit. Thanks to Robert van de Geijn for reporting this issue.
commit 262a62e3482c5caa947a89cabb562b5887555bd6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 8 12:10:54 2018 -0500
Fixed undefined ref in steamroller/excavator configs.
Details:
- Fixed erroneous calls to bli_cntx_init_piledriver_ref() in
bli_cntx_init_steamroller() and bli_cntx_init_excavator(), which
should have been to their respectively-named bli_cntx_init_*()
functions instead. Thanks to qnerd for bringing these bugs to our
attention.
commit 22aa44ebec2c7884bdc944775a1aa7534ab53f0d
Merge: 65fae950 b65d0b84
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 7 17:42:59 2018 -0500
Merge branch 'dev' of github.com:flame/blis into dev
commit 65fae95074d239354737355bbe6f202d4f8b2871
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 7 17:41:09 2018 -0500
Implemented bli_setrm, _setim, _setrv, _setiv.
Details:
- Defined new wrappers to setm/setv operations in frame/base/bli_setri.c
that will target only the real or only the imaginary parts of a
matrix/vector object.
- Updated bli_obj_real_part() so that the complex-specific portions of
the function are not executed if the object is real.
- Defined bli_obj_imag_part().
- Caveat: If bli_obj_imag_part() is called on a real object, it does
nothing, leaving the destination object untouched. The caller must
take care to only call the function on complex objects.
- Reordered some of the static functions in bli_obj_macro_defs.h related
to aliasing.
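Note (illustrative sketch, not BLIS source): setting only the real or only the imaginary parts of a complex vector amounts to a strided write into one field of each element. The type cplx_t and the functions set_real_v() and set_imag_v() are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical double-precision complex element. */
typedef struct { double real, imag; } cplx_t;

/* Set only the real parts of n elements of x (stride incx) to alpha. */
static void set_real_v(size_t n, double alpha, cplx_t *x, size_t incx)
{
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        x[i * incx].real = alpha;
}

/* Set only the imaginary parts of n elements of x (stride incx) to alpha. */
static void set_imag_v(size_t n, double alpha, cplx_t *x, size_t incx)
{
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        x[i * incx].imag = alpha;
}
```

A real-valued vector has no imaginary field to write, which is consistent with the caveat above that the imaginary-part routine is only meaningful for complex objects.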
commit b65d0b841b7e4357bc2cf743bbb03384a3ab0bfa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 7 14:38:41 2018 -0500
Fixed bug in bli_dt_proj_to_complex().
Details:
- Fixed a bug identical to the one fixed in 0a4a27e, except this time in
the bli_obj_param_defs.h header file. It looks like the only consumers
of this static function were in bli_l0_oapi.c, and so this may not have
been manifesting (yet).
commit 55b6abdf7458e31df3ad01796d67c2332c776948
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 7 14:08:12 2018 -0500
Enforce consistent datatypes in most object APIs.
Details:
- Added logic to level-1v, -1d, -1f, -1m, -2, and -3 operations' _check()
functions to ensure that all operands are of the same datatype. There
are some exceptions that were left out, such as the _check() function
for the various norm operations since they have a different idea of
datatype consistency (ie: the norm object must be the real projection
of the primary input vector/matrix object).
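Note (illustrative sketch, not BLIS source): the two notions of datatype consistency described above can be expressed as small predicates. The enum dtype_t and the functions dtype_proj_to_real(), dtypes_consistent(), and norm_dtype_ok() are hypothetical and only mirror the wording of this entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical datatype identifiers. */
typedef enum { DT_FLOAT, DT_DOUBLE, DT_SCOMPLEX, DT_DCOMPLEX } dtype_t;

/* Map a complex datatype to the real datatype of the same precision. */
static dtype_t dtype_proj_to_real(dtype_t dt)
{
    switch (dt) {
    case DT_SCOMPLEX: return DT_FLOAT;
    case DT_DCOMPLEX: return DT_DOUBLE;
    default:          return dt;
    }
}

/* Common case: every operand must have the same datatype. */
static bool dtypes_consistent(const dtype_t *dts, size_t n)
{
    assert(dts != NULL || n == 0);
    for (size_t i = 1; i < n; ++i)
        if (dts[i] != dts[0]) return false;
    return true;
}

/* Norm case: the norm object must be the real projection of the input. */
static bool norm_dtype_ok(dtype_t x, dtype_t norm)
{
    return norm == dtype_proj_to_real(x);
}
```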
commit 513138b1a1ecebd015580423c779810cae5c67f2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 7 12:24:47 2018 -0500
Defined/implemented bli_projv().
Details:
- Added an implementation for bli_projv() to go along with the
implementation of bli_projm() added in 0a4a27e. The only difference
between the two is that bli_projv() may only be used on vectors,
whereas bli_projm() is general-purpose.
- Added a _check() function corresponding to bli_projv().
commit 5f71c1e719eb482b2a4e40daa280c4f7d05b6963
Merge: b5a641e9 3df39b37
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 6 19:06:14 2018 -0500
Merge branch 'dev' of github.com:flame/blis into dev
commit b5a641e968469805906eb2c971384d12ad1beac5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 6 19:05:37 2018 -0500
Added char-to-dt and dt-to-char mapping functions.
Details:
- Defined additional functions in bli_param_map.c:
bli_param_map_char_to_blis_dt()
bli_param_map_blis_to_char_dt()
which will map a char to its corresponding num_t, or vice versa.
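Note (illustrative sketch, not BLIS source): a char-to-datatype mapping and its inverse can be written as a pair of switch statements. The enum dt_id_t and the functions char_to_dt() and dt_to_char() are hypothetical; the sketch assumes the conventional BLAS-style characters 's', 'd', 'c', and 'z'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical datatype identifiers standing in for num_t values. */
typedef enum { ID_FLOAT, ID_DOUBLE, ID_SCOMPLEX, ID_DCOMPLEX } dt_id_t;

/* Map a type character to a datatype id; returns false if unrecognized. */
static bool char_to_dt(char c, dt_id_t *dt)
{
    assert(dt != NULL);
    switch (c) {
    case 's': *dt = ID_FLOAT;    return true;
    case 'd': *dt = ID_DOUBLE;   return true;
    case 'c': *dt = ID_SCOMPLEX; return true;
    case 'z': *dt = ID_DCOMPLEX; return true;
    default:  return false;
    }
}

/* Map a datatype id back to its type character. */
static bool dt_to_char(dt_id_t dt, char *c)
{
    assert(c != NULL);
    switch (dt) {
    case ID_FLOAT:    *c = 's'; return true;
    case ID_DOUBLE:   *c = 'd'; return true;
    case ID_SCOMPLEX: *c = 'c'; return true;
    case ID_DCOMPLEX: *c = 'z'; return true;
    default:          return false;
    }
}
```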
commit 0a4a27e1a4487480410bc0b1bb034bcf97583214
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 6 19:02:29 2018 -0500
Defined/implemented bli_projm().
Details:
- Defined a new operation in frame/base/bli_proj.c, bli_projm(), which
behaves like bli_copym(), except that operands a and b are allowed to
contain data of differing domains (e.g. a is real while b is complex,
or vice versa). The file is named bli_proj.c, rather than bli_projm.c,
with the intention that a 'v' vector version of the function may be
added to the same file (at some point in the future).
- Added supporting bli_check_*() functions in bli_check.c to confirm
consistent precisions between two datatypes/objects, as well as the
appropriate error message in bli_error.c and a new error code in
bli_type_defs.h.
- Wrote a bli_projm_check() function to go along with bli_projm().
- Defined static function bli_obj_real_part() in bli_obj_macro_defs.h,
which will initialize an obj_t alias to the real part of the source
object.
- Fixed a bug in the static function bli_dt_proj_to_complex(), found
in bli_param_macro_defs.h. Thankfully, there were no calls to the
function to produce buggy behavior.
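Note (illustrative sketch, not BLIS source): a copy between operands of differing domains can be pictured as follows. The sketch assumes that projecting real data into a complex destination zeroes the imaginary parts, and that projecting complex data into a real destination keeps only the real parts. The names zval_t, proj_r2c(), and proj_c2r() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical double-precision complex element. */
typedef struct { double real, imag; } zval_t;

/* Real -> complex: copy into the real part, zero the imaginary part. */
static void proj_r2c(size_t n, const double *a, zval_t *b)
{
    assert((a != NULL && b != NULL) || n == 0);
    for (size_t i = 0; i < n; ++i) {
        b[i].real = a[i];
        b[i].imag = 0.0;
    }
}

/* Complex -> real: keep the real part, discard the imaginary part. */
static void proj_c2r(size_t n, const zval_t *a, double *b)
{
    assert((a != NULL && b != NULL) || n == 0);
    for (size_t i = 0; i < n; ++i)
        b[i] = a[i].real;
}
```

Both directions keep the precision fixed (double here), matching the precision-consistency checks mentioned in this entry.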
commit 3df39b37a0134befa34b6b6259db98467c7bc965
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 6 15:35:05 2018 -0500
Fixed recently broken input.operations.fast.
Details:
- Removed "test sequential front-end" lines from microkernel test
entries of input.operations.fast. This change was meant for inclusion
in bd02c4e but was missed due to slightly different wording of the
comment (I used "sed //d" to remove the lines). This fixes the broken
'make checkblis-fast' (and 'make check') targets.
commit 695cd520e2f5eab938f66afe9fe36201ab2700c5
Author: sraut <Biplab.Raut@amd.com>
Date: Wed Jun 6 11:48:56 2018 +0530
AMD Copyright information changed to 2018
Change-Id: Idfd11afd5d252f8063d0158680d24bf7e2854469
commit df1dd24fd896821de60917b429f303bab7fd0d4b
Author: sraut <Biplab.Raut@amd.com>
Date: Wed Jun 6 11:24:33 2018 +0530
small matrix trsm intrinsics optimization code for AX=B and XA'=B
Change-Id: I90123c4d9adbd314c867995cd19dc975150b448c
commit 3f48c38164b4135515b5c752c506fdccc4480be2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 5 16:52:35 2018 -0500
Cosmetic fix to configure output in config.mk.
Details:
- Fixed configure so that MK_ENABLE_MEMKIND is assigned "no" when the
option is disabled due to libmemkind not being present. This wasn't
affecting anything since the one use of the variable (in common.mk)
was formulated as "ifeq ($(MK_ENABLE_MEMKIND),yes)". That is, the
variable being empty was effectively equivalent to it being set to
"no".
- Comment updates to build/config.mk.in, common.mk.
commit 5df201260f64aa98a365931f6d2da70144d69932
Merge: 1b9af85e 96d2774b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 5 16:14:19 2018 -0500
Merge branch 'master' into dev
commit 1b9af85ec98d91bb2b27aadaa3df344d18faff35
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 5 16:07:13 2018 -0500
Updated ref99 call to _cntx_set_thrloop_from_env().
Details:
- Reordered the arguments in the ref99 sandbox's call to
bli_cntx_set_thrloop_from_env() to be consistent with the updated
function signature from f97a86f. Thanks to Devangi Parikh for
reporting this issue.
commit 96d2774b4cb44ff1e8b5798d7cfc83154a607624
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Tue Jun 5 14:17:39 2018 +0200
Make bli_auxinfo_next_b() return b_next, not a_next (#216)
commit d4c24ea5f644eb635046e7fe249d3e8e58b4c98a
Author: sraut <biplab.raut@amd.com>
Date: Tue Jun 5 15:42:59 2018 +0530
copyright message changed to 2018
Change-Id: I33c1ebda41bc7f1973ff19e3b1947bdad62b4d44
commit 3f1ba4e646776699ebfaa042fe24691d9e2f55d0
Author: sraut <biplab.raut@amd.com>
Date: Tue Jun 5 14:21:13 2018 +0530
copyright changed to 2018
Change-Id: Ie916c7cd6f95aedc3cab6eec3a703c9ddb333bc3
commit bd02c4e9f7fe07487276e61507335d48c8e05f35
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 4 13:42:17 2018 -0500
Cleanups to testsuite, input.operations format.
Details:
- Removed the line in each operation entry in input.operations titled
"test sequential front-end" and the corresponding support for the lines
in the testsuite input parsing code. This line was included in some
of the earliest versions of the testsuite, back when I intended to
eventually have separate multithreaded APIs. Specifically, I envisioned
that multithreaded and sequential testing could be enabled or disabled
on an operation level. However, BLIS evolved in a different direction
and still does not have multithreaded-specific APIs (even if it will
eventually someday). But even if it did have such APIs, I doubt I would
allow the user to enable/disable them on an operation level. Thus, this
was a zombie future parameter that was never used and never made sense
to begin with. The one instance of the front_seq variable, used in the
various libblis_test_<operation>() functions to guard the call to the
operation test driver, that remains was commented out instead of
deleted so that someday it could be easily changed via sed, if desired.
- Various minor cleanups to the testsuite code, including consolidating
use of DISABLE and DISABLE_ALL and reexpressing certain conditional
expressions in the libblis_test_<operation>() functions in terms of
boolean functions.
commit 2c6d99b99e50d70f904da298a0c59be16cc5c180
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jun 3 18:13:36 2018 -0500
Fixed names out of alphabetical order in CREDITS.
commit 7a207e8f2c5046f8b295a78e029ff2de765c7409
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jun 3 18:04:27 2018 -0500
Disabled indirect blacklisting (issue #214).
Details:
- Return early from function, pass_config_kernel_registries(), that
implements indirect blacklisting of subconfigurations (during pass 0).
In short, I realized that indirect blacklisting is not needed in the
situations I envisioned, and can actually cause problems under certain
circumstances. Thanks to Tony Skjellum for reporting the issue (#214)
that led to this commit, and to Devin Matthews for prompting me to
realize that indirect blacklisting was unnecessary, at least as
originally envisioned.
commit d7fb32682057c7458c8891c0eedafc374fd9beef
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jun 3 13:20:37 2018 -0500
Fixed syntax artifacts from 4b36e85 in examples.
Details:
- Fixed artifacts of malformed recursive sed expressions used when
preparing 4b36e85, in which most function-like macros were converted
to static functions. The syntactically defective code was contained
entirely in examples/oapi. Thanks to Tony Skjellum for reporting this
issue.
- Update to CREDITS file.
commit ed7dedfd4a07eefeb5a038f9899afb8053b45383
Merge: f97a86f3 469727d4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jun 2 20:29:53 2018 -0500
Merge branch 'master' into dev
commit f97a86f322a6e3e31f33c89befc66189b0b8c64f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jun 2 20:28:20 2018 -0500
Updated setting/querying pack schema (cntx->cntl).
- Query pack schemas in level-3 bli_*_front() functions and store those
values in the schema bitfields of the corresponding obj_t's when the
cntx's method is not BLIS_NAT. (When method is BLIS_NAT, the default
native schemas are stored to the obj_t's.)
- In bli_l3_cntl_create_if(), query the schemas stored to the obj_t's in
bli_*_front(), clear the schema bitfields, and pass the queried values
into bli_gemm_cntl_create() and bli_trsm_cntl_create().
- Updated APIs for bli_gemm_cntl_create() and bli_trsm_cntl_create() to
take schemas for A and B, and use these values to initialize the
appropriate control tree nodes. (Also cpp-disabled the panel-block cntl
tree creation variant, bli_gemmpb_cntl_create(), as it has not been
employed by BLIS in quite some time.)
- Simplified querying of schema in bli_packm_init() thanks to above
changes.
- Updated openmp and pthreads definitions of bli_l3_thread_decorator()
so that thread-local aliases of matrix operands are guaranteed, even
if aliasing is disabled within the internal back-end functions (e.g.
bli_gemm_int.c). Also added a comment to bli_thrcomm_single.c
explaining why the extra aliasing is not needed there.
- Change bli_gemm() and level-3 friends so that the operation's ind()
function is called only if all matrix operands have the same datatype,
and only if that datatype is complex. The former condition is needed
in preparation for work related to mixed domain operands, while the
latter helps with readability, especially for those who don't want to
venture into frame/ind.
- Reshuffled arguments in bli_cntx_set_thrloop_from_env() to be
consistent with BLIS calling conventions (modified argument(s) are
last), and updated all invocations in the level-3 _front() functions.
- Comment updates to bli_cntx_set_thrloop_from_env().
commit 965db85d29977d228ea744581edf2b682eb8e8a8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 1 12:32:15 2018 -0500
Updated macro invocations in bli_gemm_ker_var2.c.
Details:
- Updated "get next a/b micropanel" macro invocations in
bli_gemm_ker_var2.c according to changes in 9588625.
- Comment update in bli_cntx.c.
commit 8749fa0b48a7710f4115023e2c46bc80167bc8f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 31 12:34:01 2018 -0500
Cleanups to ref99/README.md, test/3m4m/Makefile.
Details:
- Minor edits to sandbox/ref99/README.md.
- Removed cpp guards in sandbox/ref99/thread/blx_gemm_thread.h to be
consistent with other headers in sandbox/ref99.
- Additional targets and related cleanups in test/3m4m/Makefile.
commit 9588625c43c86ef1bde8140f620a30f52420e6a6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 30 15:19:53 2018 -0500
Renamed "next micropanel" macros in _l3_thrinfo.h.
Details:
- Renamed several macros defined in bli_l3_thrinfo.h designed to compute
the values of a_next and b_next to insert into an auxinfo_t struct in
level-3 macrokernels. (Previously, the macros did not use a bli_
prefix.)
- Updated instances of above macro usage within various macrokernels.
commit e4420591225fca2f63ca74ef6a23b962fcd4bec0
Merge: 34f974d1 850a8a46
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 29 17:12:22 2018 -0500
Merge branch 'dev' of github.com:flame/blis into dev
commit 34f974d1a83a7d29ba09f67e392d361231fdf99c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 29 17:11:52 2018 -0500
More tweaks/updates to sandbox/ref99/README.md.
commit 850a8a46c0a569a2652d8c200e5c53b61bcf988d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue May 29 13:51:21 2018 -0500
Test all x86_64 configurations*... (#212)
* Add custom SDE cpuid files.
* Set up testing of all x86_64 architectures (except bulldozer) using SDE.
* Update .travis.yml
[ci skip]
* Update do_testsuite.sh
[ci skip]
* Updated .travis.yml with my secret token.
Details:
- Replaced Devin's temporary secret token with my own, which is used by
Travis when accessing the Intel SDE via Dropbox.
* Work around CPUID dispatch in glibc/libm by patching ld.so.
* Detect path of loader at runtime.
* Attempt to make SDE run on Travis
* Allow unpatched ld.so if we don't know how to patch it.
I *think* this only happens for older glibc without the multi-arch stuff (e.g. Ubuntu 14.04 on Travis), but who knows?
* Upgrade Travis to gcc-6 and binutils-2.26.
* Try to get Travis to use the right assembler.
* Apparently you need ld-2.26 too.
* Try to also patch ld.so from Ubuntu 14.04.
* Take the nuclear option.
* Account for non-absolute dependencies in ldd output.
* String manipulation fail.
* Update patch-ld-so.py
* Add Zen to SDE testing.
* Removed dead variable from travis/do_testsuite.sh.
Details:
- Removed 'BLIS_ENABLE_TEST_OUTPUT=yes' from make invocations in
travis/do_testsuite.sh. This variable is no longer present in the
BLIS build system (if it ever was?), and therefore has no effect.
commit 42ea02a34e5c144893fe239ae55daef895d92677
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 29 12:48:14 2018 -0500
Renamed c99 sandbox to ref99.
Details:
- Renamed sandbox/c99 to sandbox/ref99. I wanted to name the sandbox so
that it would be thought of as a "reference" sandbox. I kept the "99"
to differentiate it from future reference sandboxes that may be
written in another language (such as C++).
- Updates to sandbox/ref99/README.md.
commit 0e7205ccef50dccd4306cf427a63633396472813
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 29 12:36:13 2018 -0500
Remove sandbox/.gitkeep now that dir is non-empty.
commit 3a4603858e3819cbd6ed7dd67d0fc0b3f89ed254
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat May 26 15:51:08 2018 -0500
More README.md updates to sandbox/c99.
Details:
- Added a section that walks the reader through how to configure BLIS to
use a gemm sandbox.
commit 2bad97f6bdf4642884d60fc03970549902a54d74
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat May 26 15:31:16 2018 -0500
Updates to CREDITS, sandbox/c99/README.md.
commit 2b4a447526effa3e847a7e5c15c3758573f12318
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 25 18:51:23 2018 -0500
Initial implementation of c99 "reference" sandbox.
Details:
- Added a c99 sandbox (in sandbox/c99) to serve as a starting point for
others looking to experiment with alternative implementations of gemm
in BLIS. Note that this sandbox implementation is a first draft and
will be refined over time.
- Minor updates to Makefile and common.mk to restrict what source files
get recompiled when sandbox files are touched.
- Added an initial draft of a README.md in sandbox/c99.
commit 469727d4f8a976d8713afb4d0b6235c322498db0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 25 16:17:13 2018 -0500
Very minor comment updates.
commit 66dbe69a0f9359bf1e39b5672ee365213de2e3ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 25 15:45:53 2018 -0500
Converted macros to static funcs in _packm_cntl.h.
Details:
- Converted various macros in frame/1m/packm/bli_packm_cntl.h (designed
to access fields of a packm_params_t struct) to static functions.
commit 22deef2f5463a47e3b3c37fc313d17550f10ee06
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 24 14:28:55 2018 -0500
Support alternative gemm implementation sandboxes.
Details:
- configure:
- add support for --enable-sandbox=NAME to configure script, where NAME
is a subdirectory of a new 'sandbox' directory that contains an
alternative implementation of gemm. (For now, only implementations of
gemm may be provided via a sandbox.);
- add support for C++ compiler. C++ compilers are handled in a manner
similar to that of C compilers, in that a default search order is
used, and that CXX is searched for first, if the variable is set. In
practice, the C++ compiler that is selected should correspond to the
selected C compiler. (Example: If gcc is selected for C, g++ should
be selected for C++.) The result of the search is output to config.mk
via build/config.mk.in. NOTE: The use of C++ in BLIS is still
hypothetical, but may eventually move to being experimental. This
support was intended only for use of C++ within a gemm sandbox.
- build/config.mk.in:
- define SANDBOX variable containing sandbox subdirectory name.
- build/bli_config.in:
- define either of the BLIS_ENABLE_SANDBOX or BLIS_DISABLE_SANDBOX
macros in bli_config.h.
- common.mk:
- include makefile fragments that were propagated into the specified
sandbox subdirectory;
- generate different CFLAGS for sandboxes, as well as a separate
CXXFLAGS variable for sandboxes when C++ source files are compiled;
- isolate into a single location lists of file suffixes for various
purposes.
- reorganized/clean up code related to identifying header files and
paths.
- Makefile:
- generate object filepaths for and compile source code files found in
sandbox sub-directory;
- remove makefile fragments placed in sandbox sub-directory (cleanmk);
- various other cleanups.
- Added .cc, .cpp, and .cxx to list of suffixes of files to recognize in
makefile fragments (via build/gen-make-frags/suffix_list).
- Updated blis.h to conditionally #include bli_sandbox.h (via a new file,
bli_sbox.h), which each sandbox is assumed to use for any type
definitions and function prototypes it wishes to export out to blis.h.
- Conditionally disable bli_gemmnat() implementation in frame/3 when
BLIS_ENABLE_SANDBOX is defined.
commit 25e3501ed57a0db7f860c88b7199b36049aec12a
Merge: 216a4cb9 5140ee34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 24 13:57:16 2018 -0500
Merge branch 'master' into dev
commit 5140ee3424c744981a3fed3b5a748ebbfc111388
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 23 16:56:14 2018 -0500
Updated types of bli_is_[un]aligned_to() functions.
Details:
- Changed the void* arguments of the following static functions:
bli_is_aligned_to()
bli_is_unaligned_to()
bli_offset_past_alignment()
to siz_t, and the return type of bli_offset_past_alignment() from
guint_t to siz_t. This allows for more versatile usage of these
functions (e.g. when aligning both pointers and leading dimension).
- Updated all invocations of these functions, mostly in kernels/penryn
but also in kernels/bgq, to include explicit typecasts to siz_t when
pointer arguments are passed in.
- Thanks to Devin Matthews for pointing out this potential bug (via issue
#211).
- Deleted a few trailing spaces in various penryn kernels.
- Removed duplicate instances of the words "derived" and "THEORY" from
various kernel license headers, likely from a malformed recursive sed
performed long ago.
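Note (illustrative sketch, not BLIS source): taking an integer rather than a void* lets one alignment test serve both addresses and byte counts such as a leading dimension. The functions below are hypothetical analogues written with size_t; pointer arguments need an explicit cast at the call site, as the entry above describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the value p is a multiple of size. */
static bool is_aligned_to(size_t p, size_t size)
{
    assert(size != 0);
    return p % size == 0;
}

/* True if the value p is NOT a multiple of size. */
static bool is_unaligned_to(size_t p, size_t size)
{
    return !is_aligned_to(p, size);
}

/* Number of bytes by which p exceeds the previous multiple of size. */
static size_t offset_past_alignment(size_t p, size_t size)
{
    assert(size != 0);
    return p % size;
}
```

Usage with a pointer and with a leading dimension, assuming a buffer `a` of doubles and a leading dimension `lda` counted in elements: `is_aligned_to((size_t)(uintptr_t)a, 32)` and `is_aligned_to(lda * sizeof(double), 32)`.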
commit 216a4cb9cb87fa4c93f6ceb6ae90602e5018b305
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 18 18:47:03 2018 -0500
Minor update to flatten-headers.[py|sh] help text.
Details:
- Fixed a typo and removed some outdated language from the help text of
flatten-headers.py and flatten-headers.sh.
commit 962a706a6f56ea070ac4683f0af69c7e59af8ecb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 18 18:19:40 2018 -0500
Updated LICENSE file to mention HP Enterprise.
Details:
- Added HP Enterprise to the LICENSE file. Previously, only the source
files touched by HPE contained the corresponding copyright notices.
(This oversight was unintentional.)
- Updated file-level copyright notices to include a comma, to match
the formatting used for UT and AMD copyrights.
commit efa43e13effe901ad31e734ac90f027e89473bd9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 18 12:20:40 2018 -0500
More updates to CREDITS and RELEASING files.
commit f94ab97af8e86baf9ee9a9cbaef8bb3712df2e11
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 17 17:45:31 2018 -0500
Update to CREDITS file.
commit 4919b10c005e006a6d818eb8f865f9dbd8aa16df
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 17 16:38:49 2018 -0500
Minor changes to README.md and CONTRIBUTING.md.
commit b89451187e8321b673a1cf7603c8d48028d9d4c8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 17 16:23:06 2018 -0500
README.md update.
Details:
- Added "Contributing" section with relevant links.
commit af244194e7d76276a1b90fe59f9307dde0429e1d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 17 15:38:02 2018 -0500
Removed explicit critical sec. from bli_memsys.c.
Details:
- Removed critical sections protecting the initialization/finalization of
bli_memsys.c. These synchronization mechanisms are no longer needed now
that BLIS initializes all APIs via pthread_once().
commit 10c9e8f95254d8c6436c4d3cb093fa5544b45c90
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 17 15:22:51 2018 -0500
Cache hardware's arch_t id after querying once.
Details:
- Added logic to bli_arch.c that will call what was previously the body
of bli_arch_query_id() only once and then cache the value in a static
variable local to the file. (Previously, the arch_t associated with
the hardware/configuration was queried every time bli_arch_query_id()
was called, which was at least once per level-3 function call.) Thanks
to Devin Matthews for suggesting this feature via issue #175.
- Added -lpthread to the compile/link command line of the compiler
invocation that compiles build/detect/config/config_detect.c, which
prints the string identifying the detected configuration, since it
is now needed due to new pthread_once() logic in bli_arch.c.
- Implementation note: I chose to implement this arch_t caching feature
via pthread_once(), using a separate pthread_once_t variable local to
the file, rather than calling bli_init_once(). The reason is that I
did not want to require bli_init() as a prerequisite to this function.
bli_init() already calls several sub-components, some of which make use
of bli_arch_query_id(), and therefore it would be easy to fall into a
circular self-init situation (which usually causes pthreads to hang
indefinitely).
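Note (illustrative sketch, not BLIS source): the query-once-then-cache pattern with a file-local pthread_once_t might look like the following. The names arch_id_t, expensive_arch_query(), arch_init(), and arch_query_id() are hypothetical, and the returned value 7 is a placeholder for whatever the hardware query would produce. On some toolchains this needs -lpthread at link time, as noted above.

```c
#include <assert.h>
#include <pthread.h>

typedef int arch_id_t; /* hypothetical stand-in for arch_t */

/* File-local state: the once-control, the cached id, and a call counter
   kept only so the sketch can show that the query runs a single time. */
static pthread_once_t arch_once   = PTHREAD_ONCE_INIT;
static arch_id_t      cached_id   = -1;
static int            query_count = 0;

/* Placeholder for the real hardware/configuration query. */
static arch_id_t expensive_arch_query(void)
{
    ++query_count;
    return 7;
}

/* Runs exactly once, no matter how many threads call arch_query_id(). */
static void arch_init(void)
{
    cached_id = expensive_arch_query();
}

/* Public query: initializes on first use, then returns the cached id. */
static arch_id_t arch_query_id(void)
{
    int rc = pthread_once(&arch_once, arch_init);
    assert(rc == 0);
    (void)rc;
    return cached_id;
}
```

Keeping the once-control private to this file means the query does not depend on any library-wide initializer, which avoids the circular self-initialization described in the implementation note.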
commit f28a15293890ac6fbceac229fd204dbc9fec6e27
Author: Francisco Igual <figual@ucm.es>
Date: Thu May 17 09:26:14 2018 +0000
Fixed clobber list bug in ARMv8 ukernel
commit 2e31dd7852b4d6a9355899cf9659d4b8130461cb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 16 17:28:33 2018 -0500
Inserted missing integer typecasting into ukernels.
Details:
- Inserted missing safeguards into most microkernels to ensure that the
integers read by the microkernel's assembly instructions are of the
appropriate size. In many cases, this bug was going undetected likely
because the compiler was inserting zero padding before the integers
in the calling function, allowing the assembly code to read 64-bits
in a way that did not corrupt the "lower" 32 integer bits with garbage
in the higher bits. Thanks to Francisco Igual and Devangi Parikh for
finding this issue.
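Note (illustrative sketch, not BLIS source): the hazard can be simulated in portable C. A 64-bit load aimed at a 32-bit integer also picks up whatever bytes sit next to it, whereas first widening the value into a 64-bit local makes the load well defined. The struct frame_t is a hypothetical stand-in for a caller's stack layout, and load64() stands in for the assembly instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a 32-bit integer followed by unrelated data. */
typedef struct { int32_t k; int32_t neighbor; } frame_t;

static_assert(sizeof(frame_t) == sizeof(uint64_t),
              "sketch assumes the two fields occupy exactly 8 bytes");

/* Stand-in for a 64-bit load performed by assembly code. */
static uint64_t load64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Unsafe: reads 8 bytes starting at k, so neighbor leaks into the result. */
static uint64_t unsafe_read_k(const frame_t *f)
{
    return load64(f);
}

/* Safe: widen k into a 64-bit local first, then let the load read that. */
static uint64_t safe_read_k(const frame_t *f)
{
    uint64_t k64 = (uint64_t)f->k;
    return load64(&k64);
}
```

When the neighboring bytes happen to be zero, the unsafe read returns the right value on a little-endian machine, which is one way a bug of this kind can go undetected.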
commit 12dfa9516428b4092554f0ce70b07571d35de222
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 16 12:46:57 2018 -0500
Fixed a bug in determining default integer size.
Details:
- Fixed a bug that would cause configurations to inadvertently define
their integers to be 32 bits when those environments actually call for
64-bit integers. While either BLIS_ARCH_64 or BLIS_ARCH_32 is defined
in bli_system.h (based on whether preprocessor macros such as __x86_64
or __aarch64__ are defined by the environment), bli_system.h was being
#included *after* bli_config_macro_defs.h, in which the BLIS_ARCH_64
macro was used to choose an integer type size in the event that
BLIS_INT_TYPE_SIZE was not already defined by configure via
bli_config.h. And due to the structure of the cpp code in that file,
the 32-bit integer case was being chosen. Thanks to Francisco Igual
and Devangi Parikh for their help in isolating this bug.
- Moved the #include of hbwmalloc.h and related preprocessor code to
bli_kernel_macro_defs.h to facilitate the reshuffling of the #include
for bli_system.h in blis.h.
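Note (illustrative sketch, not BLIS source): the ordering dependency can be reproduced with a few preprocessor lines. The architecture macro must be defined before the block that consults it; if the two steps are swapped, the #else branch is taken and 32 is chosen even on a 64-bit target. All DEMO_ names and demo_int_t are hypothetical, and the list of predefined macros checked here is an assumption, not the exact list BLIS uses.

```c
#include <assert.h>
#include <stdint.h>

/* Step 1 (the role bli_system.h plays): detect a 64-bit target from
   macros predefined by the compiler. */
#if defined(__x86_64__) || defined(__x86_64) || defined(__aarch64__) || defined(_M_X64)
#define DEMO_ARCH_64
#else
#define DEMO_ARCH_32
#endif

/* Step 2 (the role bli_config_macro_defs.h plays): pick a default integer
   size only if the build system did not already set one. This must come
   AFTER step 1, otherwise DEMO_ARCH_64 is not yet visible here. */
#ifndef DEMO_INT_TYPE_SIZE
#ifdef DEMO_ARCH_64
#define DEMO_INT_TYPE_SIZE 64
#else
#define DEMO_INT_TYPE_SIZE 32
#endif
#endif

#if DEMO_INT_TYPE_SIZE == 64
typedef int64_t demo_int_t;
#else
typedef int32_t demo_int_t;
#endif

static_assert(sizeof(demo_int_t) * 8 == DEMO_INT_TYPE_SIZE,
              "integer typedef must match the selected size");
```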
commit f930cec0f35824c0f9ebbd218614209217d491cb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 15 17:47:08 2018 -0500
More tweaks to CONTRIBUTING.md.
commit 173e30ff7d293ba31f3fab8ab0c0a695eda3d4fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 15 14:48:34 2018 -0500
Added initial draft of CONTRIBUTING.md file.
Details:
- Thanks to the Ruby on Rails project for providing a good template off
of which to build.
commit 6e25e758b444bf725046674e1e64c6a52421749d
Author: Nico Schlömer <nico.schloemer@gmail.com>
Date: Tue May 15 14:03:20 2018 +0200
Debian config (#206)
* add debian config
* correct wording in the README
commit fcf6c6a3c87da08a7cdb92b102489b991ef7a644
Author: Alex Arslan <ararslan@comcast.net>
Date: Mon May 14 18:41:03 2018 -0700
Fix shared library builds on platforms other than Linux and macOS (#209)
* Fix detection of systems other than Linux and macOS
The way the logic is currently laid out, any platform that isn't Linux
gets assigned the .dylib shared library extension and the macOS-specific
compiler flags. This reverses the logic to check for macOS first, and
have the fallback use the Linux definitions, which apply to most other
systems as well.
* Use SHLIB_EXT instead of SO_SUF
The former is more standard, as jakirkham pointed out in a comment.
commit 6f7f51048c48f31d691c06451d0fd2cbc453ad03
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon May 14 18:41:56 2018 -0500
Echo cc_vendor when printing compiler version.
Details:
- Echo the ${cc_vendor} when informing the user of the compiler's version.
Previously, the actual ${cc} (which could be a path to the executable)
was being printed, which has already been printed by that point in the
configure script.
commit ad67dc4e348b0a381efc057573a6b03cc7e26db0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon May 14 18:35:28 2018 -0500
Communicate cc, cc_vendor to make via config.mk.
Details:
- Historically, the compiler selection has happened statically in the
various make_defs.mk and would only be overridden by setting CC (either
prior to running configure or as a configure argument). However, in
the last couple months, configure has evolved to contain rather
sophisticated compiler detection logic for the purposes of blacklisting
sub-configurations. It only makes sense that configure now fully take
over the responsibility of selecting a compiler from the GNU make side
of the build system. Thanks to Alex Arslan for his help exposing this
issue.
- Substitute found_cc into CC in config.mk via configure.
- Set a new variable, CC_VENDOR, in config.mk via substitution from
configure, and disable the corresponding CC_VENDOR code in common.mk.
- Disabled default compiler selection (usually gcc) in the sub-configs'
various make_def.mk files.
commit 20af119fc97ec6120017a7a5ba5f9aaa920c7640
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon May 14 17:44:58 2018 -0500
Added README.md to 'config' directory.
Details:
- Added a brief README.md file to the config directory to redirect those
who may be exploring the source tree to the ConfigurationHowTo wiki.
(Included is a very brief explanation of configurations for those who
don't have time to read the wiki.) Thanks to Nico Schlömer for this
suggestion.
commit 9dbce16269c3e1f27c7a0d64372cc76aed30dfc1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon May 14 17:04:54 2018 -0500
Search for 'cc clang gcc' on OpenBSD, FreeBSD.
Details:
- Swapped gcc and clang in the compiler search list for OpenBSD.
- Use the same search list for FreeBSD as above.
commit 55ebf24d63128b5fd15b10160485667415a02a55
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon May 14 16:19:08 2018 -0500
Change compiler search order on OpenBSD.
Details:
- Set a compiler search list (and order) as a function of the OS detected
via 'uname -s'. By default, this list and order is 'gcc clang cc' for
Linux, Darwin (OS X), and any other OS except OpenBSD. On OpenBSD,
we use 'cc gcc clang' because OpenBSD's default installation of gcc
(4.2.1) is too old for BLIS. Thanks to Alex Arslan for reporting this
issue and suggesting a fix.
commit 4fb353bd90e6642c8aeffd1b1e6329f54eee4bb4
Merge: 4b36e85b 8a2857b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun May 13 17:50:51 2018 -0500
Merge branch 'master' into dev
commit 8a2857b5e3c633b18c24f2275110437a702a71d0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 11 18:42:05 2018 -0500
Fixed README.md typo; mention 'make check'.
commit 543935c02f9335142d2e485a15f37dbaebe012ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 11 18:35:32 2018 -0500
Updated README.md with Ubuntu packages link.
Details:
- Created a separate section of README.md for external packages, with
one bullet each for Dave Love's rpms and Nico Schlömer's Ubuntu apt
packages. Thanks to Dave and Nico for their contributions.
commit af1d8470b56d3b2a1c8513d366d788dddcb84baa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 11 17:49:58 2018 -0500
Better handling of shared libraries on OS X.
Details:
- Use the .dylib shared library suffix on OS X (instead of .so in Linux).
- Link with the -dynamiclib and -install_name options on OS X (instead of
-shared and -soname in Linux).
- Determine operating system (e.g. Linux, Darwin) during configure and
substitute into config.mk.in rather than run 'uname -s' during make.
- Echo operating system during configure.
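A rough sketch of how the suffix and link options might be chosen per OS. The function name and the library name passed to it are hypothetical, and passing the name through the compiler driver with -Wl is an assumption rather than something stated in the commit:

```shell
# Hypothetical helper: print the shared library suffix followed by the
# link options for the given OS name and library name.
shared_lib_settings() {
    case "$1" in
        Darwin) echo "dylib -dynamiclib -Wl,-install_name,$2" ;;
        *)      echo "so -shared -Wl,-soname,$2" ;;
    esac
}

# The OS name would come from configure rather than from make.
shared_lib_settings "$(uname -s)" "libblis.so.0"
```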
commit 4b72a462d7467cf815422aafac7b05037d2e3b13
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 10 18:35:38 2018 -0500
Enable building shared library by default.
Details:
- Tweaked configure so that the shared library is generated by default.
- Updated --help text and configure's feedback messages reporting the
status of the static/shared builds.
- Changed the order of build product installation so that headers are
installed last, after libraries and symlinks.
commit b699bb1ff03c6e9baaa054805b4939983ae7145b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu May 10 15:54:17 2018 -0500
Adopt Linux-like .so versioning at install-time.
Details:
- Changed the naming conventions used for installed libraries and
symlinks to more closely mirror patterns used by typical GNU/Linux
libraries. Whereas previously static and shared libraries were
installed and symlinked as follows:
(library) libblis-0.3.2-15-haswell.a
(library) libblis-0.3.2-15-haswell.so
(symlink) libblis.a -> libblis-0.3.2-15-haswell.a
(symlink) libblis.so -> libblis-0.3.2-15-haswell.so
we now use the following naming conventions:
(library) libblis.a
(symlink) libblis.so -> libblis.so.0.1.2
(symlink) libblis.so.0 -> libblis.so.0.1.2
(library) libblis.so.0.1.2
where 0.1.2 indicates shared library major, minor, and build versions
of 0, 1, and 2, respectively. The conventional version string can
still be queried by linking to the library in question and then calling
bli_info_get_version_str(). (The testsuite binary does this
automatically at startup.)
- Added logic to common.mk to set the soname field in the shared library
via the -soname linker flag.
- Added a 'so_version' file to the top-level directory containing two
lines. The first line specifies the .so major version number, and the
second line specifies the minor and build version numbers joined with
a '.'. This file is read by configure and those values substituted
into build/config.mk.in to define SO_MAJOR, SO_MINORB, and SO_MMB
variables.
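A minimal sketch of reading a two-line so_version file and deriving the three values. The file contents are illustrative, the lowercase variable names are stand-ins, and treating SO_MMB as the major and minor.build strings joined with a '.' is an assumption based on the naming above:

```shell
# Write an illustrative so_version file into a scratch directory.
tmpdir=$(mktemp -d)
printf '0\n1.2\n' > "${tmpdir}/so_version"

# Line 1: .so major version. Line 2: minor and build versions joined by '.'.
so_major=$(sed -n '1p' "${tmpdir}/so_version")
so_minorb=$(sed -n '2p' "${tmpdir}/so_version")
so_mmb="${so_major}.${so_minorb}"

# Show the symlink layout these values would imply.
echo "libblis.so -> libblis.so.${so_mmb}"
echo "libblis.so.${so_major} -> libblis.so.${so_mmb}"

rm -rf "${tmpdir}"
```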
commit fc2d9ec6bf46f6e5b19d196208415ce433e95b10
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 9 15:19:28 2018 -0500
Tweaks to top-level clean and distclean targets.
Details:
- Moved the removal of bli_config.h from cleanh to distclean.
- Removed cleantest as a dependency of clean.
commit bf0350305971e3991861b5117a13fda31ff97b6d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 8 16:49:22 2018 -0500
Renamed (shortened) a few build system variables.
Details:
- Renamed the following variables in config.mk (via build/config.mk.in):
BLIS_ENABLE_VERBOSE_MAKE_OUTPUT -> ENABLE_VERBOSE
BLIS_ENABLE_STATIC_BUILD -> MK_ENABLE_STATIC
BLIS_ENABLE_SHARED_BUILD -> MK_ENABLE_SHARED
BLIS_ENABLE_BLAS2BLIS -> MK_ENABLE_BLAS
BLIS_ENABLE_CBLAS -> MK_ENABLE_CBLAS
BLIS_ENABLE_MEMKIND -> MK_ENABLE_MEMKIND
and also renamed all uses of these variables in makefiles and makefile
fragments. Notice that we use the "MK_" prefix so that those variables
can be easily differentiated (such as via grep) from their "BLIS_" C
preprocessor macro counterparts.
- Other whitespace changes to build/config.mk.in.
- Renamed the following C preprocessor macros in bli_config.h (via
build/bli_config.h.in):
BLIS_ENABLE_BLAS2BLIS -> BLIS_ENABLE_BLAS
BLIS_DISABLE_BLAS2BLIS -> BLIS_DISABLE_BLAS
BLIS_BLAS2BLIS_INT_TYPE_SIZE -> BLIS_BLAS_INT_TYPE_SIZE
and also renamed all relevant uses of these macros in BLIS source
files.
- Renamed "blas2blis" variable occurrences in configure to "blas", as
was done in build/config.mk.in and build/bli_config.h.in.
- Renamed the following functions in frame/base/bli_info.c:
bli_info_get_enable_blas2blis() -> bli_info_get_enable_blas()
bli_info_get_blas2blis_int_type_size()
-> bli_info_get_blas_int_type_size()
- Remove bli_config.h during 'make cleanh' target of top-level Makefile.
commit 4b36e85be9b516b4089b24768f881dd976668997
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 8 14:26:30 2018 -0500
Converted function-like macros to static functions.
Details:
- Converted most C preprocessor macros in bli_param_macro_defs.h and
bli_obj_macro_defs.h to static functions.
- Reshuffled some functions/macros to bli_misc_macro_defs.h and also
between bli_param_macro_defs.h and bli_obj_macro_defs.h.
- Changed obj_t-initializing macros in bli_type_defs.h to static
functions.
- Removed some old references to BLIS_TWO and BLIS_MINUS_TWO from
bli_constants.h.
- Whitespace changes in select files (four spaces to single tab).
commit 7e5648ca150757b874f6823da832f3798c40b9f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon May 7 18:59:19 2018 -0500
Add configure support for --libdir, --includedir.
Details:
- Added support for two new configure options: --libdir and --includedir.
They specify the precise install directories for libraries and header
files, respectively, and override any location implied by the --prefix
option (including the default install prefix, if --prefix was not
given). Thanks to Nico Schlömer for suggesting this via issue #195.
- Removed the INSTALL_PREFIX definition/anchor from build/config.mk.in
and replaced it with corresponding definitions/anchors for libdir and
includedir.
- Updated top-level Makefile to use the new variables, INSTALL_LIBDIR
and INSTALL_INCDIR, instead of INSTALL_PREFIX (which is now no longer
needed by make).
- Set default sane values for INSTALL_LIBDIR and INSTALL_INCDIR in
common.mk when configure has not been run, as is already done for
DIST_PATH. This is to safeguard against statements in the top-level
Makefile that use 'find' to locate old libraries and headers for the
uninstall targets, which run regardless of make target. Without setting
INSTALL_LIBDIR and INSTALL_INCDIR, those variables are empty and the
'find' ends up looking at '/', which is obviously not what we want.
(Also enclosed those definitions in an IS_CONFIGURED guard so that they
won't get evaluated unless configure has been run.)
- Rearranged "ifeq ($(IS_CONFIGURED),yes)" conditionals in Makefile to
reduce occurrences and separated "local" and top-level components of
cleanblastest and cleanblistest targets to improve readability.
- Adjusted out-of-tree builds so that they are no longer oblivious to
the .git directories, if present, and thus now properly augment version
strings with the appropriate patch number.
- Include missing version string in 'configure --help' output.
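A minimal sketch of the override semantics for --libdir and --includedir relative to --prefix. The function name is hypothetical, and the fallback prefix of /usr/local as well as the lib and include subdirectory names are placeholder assumptions, not the defaults used by configure:

```shell
# Hypothetical helper: given prefix, libdir, and includedir values (any
# of which may be empty), print the resolved library and header dirs.
resolve_install_dirs() {
    prefix="$1"
    libdir="$2"
    incdir="$3"
    [ -n "${prefix}" ] || prefix="/usr/local"      # placeholder default
    [ -n "${libdir}" ] || libdir="${prefix}/lib"
    [ -n "${incdir}" ] || incdir="${prefix}/include"
    echo "${libdir} ${incdir}"
}

# Only --prefix given: both directories follow the prefix.
resolve_install_dirs "/opt/blis" "" ""
# --libdir given as well: it overrides the prefix-derived location.
resolve_install_dirs "/opt/blis" "/usr/lib64" ""
```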
commit b09e4e8852a6c42895910e3bcb9041124dc8bf9f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon May 7 14:37:50 2018 -0500
Allow 'make clean' and friends without configuring.
Details:
- Modified top-level Makefile so that a user can run 'make distclean',
'make clean', or any of the other clean-related targets prior to
running configure (or after a previous 'make distclean'). Thanks to
Nico Schlömer for suggesting this via issue #197.
- Made the cleanblastest and cleanblistest targets more comprehensive in
that they now clean out build products that would have resulted from
local compilation (i.e., builds performed within the 'blastest' or
'testsuite' directories).
- Added "cc" to list of expected compiler "vendors" since the CC variable
seems to automatically be set to "cc" on Ubuntu 16.04 (which is just an
alias to gcc).
- Comment update to build/config.mk.in.
commit 35c5a1449c3efe0b2ec43cdefcfdf00e71828149
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon May 7 12:04:57 2018 -0500
No longer update version file during configure.
Details:
- Recycled the core functionality of build/update-version-file.sh into a
function in configure, disabling the updating of the 'version' file in
the process. Instead of writing the patched version string back to the
version file and then reading it again from within configure, the
patched version string is now saved directly to a variable in the main()
function in configure. This will prevent developers from accidentally
committing configure-induced changes to the version file in between
releases.
commit 8adb2f919b62da4a2885ae04a10925e0e6a2e304
Author: Mathieu Poumeyrol <kali@users.noreply.github.com>
Date: Sun May 6 19:58:16 2018 +0200
Some cross compilations fixes (#198)
* cross-compilation fixes
* add doc ranlib variable
* icc support -dumpversion, posix compatible test, plus one stupid mistake
* retab
* revert version as requested
commit 89acd9ebe516eeb97006dba344354bfc98826645
Merge: 4cff432d 0557eba7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 2 12:53:35 2018 -0500
Merge branch 'amd'
commit 4cff432d707891ada705b039a7e043558bbf3c51
Author: Nisanth M P <31736542+nisanthmpamd@users.noreply.github.com>
Date: Wed May 2 23:20:42 2018 +0530
AMD specific optimizations for target 'zen' (#194)
Re-enabled AMD-specific optimizations for zen.
Details:
- Re-enabled Zen-specific cache blocksizes for 'zen' sub-configuration.
- Re-enabled small matrix gemm optimization for 'zen'.
- These were both temporarily disabled during a previous merge simply due to lack of Zen hardware for testing.
commit 8eda5fe7f678b413cb274bd84716995a7d0b87a9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 2 12:20:37 2018 -0500
Typo fix in README.md.
commit 0557eba78f5fcf28f0f039f28da79498ffde848c
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Mon Mar 19 12:49:26 2018 +0530
Re-enabling the small matrix gemm optimization for target zen
Change-Id: I13872784586984634d728cd99a00f71c3f904395
commit df78ceb3d6f33a27fe69017854405edaea7c40e5
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Mon Mar 19 11:34:32 2018 +0530
Re-enabling Zen optimized cache block sizes for config target zen
Change-Id: I8191421b876755b31590323c66156d4a814575f1
commit 5e515f9a76f4aaf43dc21315a34d797726ca8069
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 1 13:44:10 2018 -0500
Tweaked new language in README.md.
commit 1ddd9e316ad5024af8b606dfcebd1e7d587a130f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 1 13:36:28 2018 -0500
Added link to Dave Love's Fedora Copr page.
Details:
- Added a blurb to README.md advertising Dave Love's Copr homepage,
which contains rpm packages for RHEL/Fedora-like distributions.
commit 078a852f738c66c6468bd5e64b06467edc9057fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 30 16:15:26 2018 -0500
Minor tweaks to top-level 'make clean' target.
Details:
- Execute 'cleanh' target as part of 'clean'
- Remove cblas.h file from 'include/<configname>/' as part of 'cleanh'
target.
- Updated the echoed (non-verbose) text for uniformity.
commit 75d0d1057dda69c655bd1cd8f791cb39b54d99b8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 30 14:57:33 2018 -0500
Renamed various datatype-related macros/functions.
Details:
- Renamed the following macros in bli_obj_macro_defs.h and
bli_param_macro_defs.h:
- bli_obj_datatype() -> bli_obj_dt()
- bli_obj_target_datatype() -> bli_obj_target_dt()
- bli_obj_execution_datatype() -> bli_obj_exec_dt()
- bli_obj_set_datatype() -> bli_obj_set_dt()
- bli_obj_set_target_datatype() -> bli_obj_set_target_dt()
- bli_obj_set_execution_datatype() -> bli_obj_set_exec_dt()
- bli_obj_datatype_proj_to_real() -> bli_obj_dt_proj_to_real()
- bli_obj_datatype_proj_to_complex() -> bli_obj_dt_proj_to_complex()
- bli_datatype_proj_to_real() -> bli_dt_proj_to_real()
- bli_datatype_proj_to_complex() -> bli_dt_proj_to_complex()
- Renamed the following functions in bli_obj.c:
- bli_datatype_size() -> bli_dt_size()
- bli_datatype_string() -> bli_dt_string()
- bli_datatype_union() -> bli_dt_union()
- Removed a pair of old level-1f penryn intrinsics kernels that were no
longer in use.
commit 01c4173238baf08e7f6700a3f91a2ea58cca50c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 28 14:07:34 2018 -0500
CHANGELOG update (0.3.2)
commit 2fb440876690bdcec0c11a30e2b33ad100bab529
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 28 14:07:31 2018 -0500
Version file update (0.3.2)
commit cdf041ddadd8725e578e2f59f37ae341f26655af
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 28 14:05:00 2018 -0500
Use config.mk instead of common.mk in bump-version.sh.
Details:
- Fixed inadvertent targeting of common.mk when testing whether configure
had already been run, rather than config.mk.
commit 6ded8f9f0364b3c07255e2532ada3eeb2ed2a715
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 28 14:01:29 2018 -0500
Account for recent 'make distclean' in bump-version.sh.
Details:
- Added logic to build/bump-version.sh that will run './configure auto'
if 'common.mk' is not present (usually because 'make distclean' was run
recently).
commit 7c16fdce433f5dea0e83d5047553c955d8e46fd2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 28 13:50:55 2018 -0500
Fixed typo in RELEASING file.
commit 5e5ca4984fcf6d72d3036c338bb9cdc64520a325
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 28 13:48:01 2018 -0500
README updates.
Details:
- Updates to the top-level README files in the top-level directory as
well as the 'examples/oapi' directory.
commit 627b045e301defea6770dc5b64e1110cbec25153
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 27 18:11:19 2018 -0500
Added an example of using transposition with gemm.
Details:
- Added an example to examples/oapi/8level3.c to show how to indicate
transposition when performing a gemm operation.
commit 13a0eadc69d72933e322901f5b44944834e3c787
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 27 18:00:07 2018 -0500
Added more transposition/conjugation examples.
Details:
- Added code to examples/oapi/5level1m.c that demonstrates transposing
(and conjugate-transposing) unstructured matrices.
- Comment updates to 6level1m_diag.c to maintain consistency with new
examples in 5level1m.c.
commit 5606cd8881e75264a96af45dc8ea1905bab054f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 27 17:13:10 2018 -0500
Added utility module to examples/oapi.
Details:
- Added a new code example file to examples/oapi demonstrating how to use
various utility operations.
- Comment updates to other example files.
- README updates.
commit ff26c94c6486374c709f93c6965ea18903bd6a18
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 27 12:31:34 2018 -0500
Added missing gcc version constraint for knl.
Details:
- Previously forgot to add explicit enforcement of a minimum gcc version
in configure script when 'knl' sub-configuration is requested.
- Comment updates to configure.
commit 4d97574e477b3e55ddbb6044b0542a92cd9bab30
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 24 18:48:09 2018 -0500
Added object API example code.
Details:
- Added an 'examples' directory at the top level.
- Added an 'oapi' subdirectory in 'examples' that contains a tutorial-like
sequence of example code demonstrating the core functionality of BLIS's
object-based API, along with a Makefile and README. Thanks to Victor
Eijkhout for being the first to suggest including such code in BLIS.
commit d6ab25a3232aa52b9b855088fb4b0b46ff2c00c8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 24 18:43:03 2018 -0500
Add setijm, getijm operations.
Details:
- Added bli_setgetijm.c, which defines bli_setijm(), bli_getijm(), and
related functions that can be used to read and write individual
elements of an obj_t.
- Defined a new function, bli_obj_create_conf_to(), in bli_obj.c that will
create a new object with dimensions conformal to an existing object.
Transposition and conjugation states on the existing object are ignored,
as are structure and uplo fields.
- Defined a new function, bli_datatype_string(), in bli_obj.c that returns
a char* to a string representation of the name of each num_t datatype.
For example, BLIS_DOUBLE is "double" and BLIS_DCOMPLEX is "dcomplex".
BLIS_INT is included (as "int"), but BLIS_CONSTANT is not, and thus is
not a valid input argument to bli_datatype_string().
- Added calls to bli_init_once() to various functions in bli_obj.c, the
most important of which was bli_obj_create_without_buffer().
- Removed unintended/extra newline from the end of printv output.
- Whitespace changes to
- frame/base/bli_machval.c
- frame/base/bli_machval.h
- frame/0/copysc/bli_copysc.c
- Trivial changes to README.md and common.mk.
commit a731a428f7fc02fd6ab4f953ead828c1d06fb5a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 17 16:44:55 2018 -0500
Another README.md update.
commit c734ee928a824b27d280a9a67b1b4bc8423d5795
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 17 16:40:05 2018 -0500
README.md update.
commit 03ecad372d8eb603ee905a7b944d0544a813460a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 17 14:16:59 2018 -0500
Added RELEASING file.
Details:
- Added a file named 'RELEASING' that contains basic notes on how to
create a new version/release of BLIS. This is mostly just a reminder
to myself, but also may become useful if/when others take over
development and administration of the project.
commit 24b3c3149ce66546b9a1afc2cc794a637a86aa60
Merge: 60366a3f 817b67c0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 16 18:49:38 2018 -0500
Merge branch 'dev' of github.com:flame/blis into dev
commit 60366a3faba4e60cee85c3b87a3f69625f4b9026
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 16 18:46:21 2018 -0500
Updates to knl kernels and related code.
Details:
- Imported the 24x16 knl sgemm microkernel (and its corresponding spackm
kernel) from TBLIS and enabled its use in the knl sub-config. Also
added sgemm microkernel prototype to bli_kernels_knl.h.
- Updated dgemm and dpackm microkernels from TBLIS, which included an
important change regarding the offsets array (changed from extern
declaration to static declaration/definition).
- Activated use of level-1v and -1f zen kernels in skx and knl
sub-configs.
- Removed some old macros no longer needed in bli_family_skx.h now that
libmemkind support exists in configure.
- Moved bli_avx512_macros.h to frame/include and adjusted #includes in
skx and knl kernels accordingly.
- Moved unused kernels in kernels/knl/3 to kernels/knl/3/other
directory.
- Fixed a minor bug in the 'make' output per compile when verboseness
is not turned on. The rule-generating function 'make-kernel-rule' was
previously passing in the name of the config, rather than the name of
the kernel set returned by get-config-for-kset, which could give
misleading information to the user when the kconfig_map mapped a
kernel set to a sub-configuration that did not share the same name.
(This didn't affect the CFLAGS that were actually used.)
- Updated test/3m4m/Makefile, removing acml targets and renaming the
remaining targets.
commit 817b67c01752e0ca8fe230bb8ad23afc7bd0f64e
Merge: 67c9c2f8 2b7108a8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 16 14:06:26 2018 -0500
Merge branch 'dev' of github.com:flame/blis into dev
commit 67c9c2f86d5ef2accc439b21581d73d82754a2e3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 16 14:03:12 2018 -0500
Retired haswell gemm microkernels.
Details:
- Moved microkernels in kernels/haswell/3 to kernels/haswell/3/old. These
microkernels were no longer being used and only sowed confusion to
anyone inspecting the repository without being fully cognizant of the
build system and how it works (and sometimes even to those who wrote
the build system). Note that the haswell configuration currently
employs the zen microkernels.
commit 2b7108a8ef8ce958b3acad028ff07c85ff97fd63
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 16 12:35:53 2018 -0500
Minor updates to test driver makefiles.
Details:
- Cleaned up and homogenized the various test driver Makefiles in
testsuite and test directories.
- Very minor updates to test driver code.
commit 9f56df95570a24587b910b169f342bd356ccbfb6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 11 14:51:36 2018 -0500
Trivial tweaks to configure blacklisting output.
Details:
- Updated output of information vis-a-vis configuration blacklisting.
commit f56481efebd9a7785c0618f3a12c0bec36f46333
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 10 19:02:21 2018 -0500
Cleaned up assembler version query on OS X.
Details:
- Switched from querying version of 'objdump' to 'as' (i.e. the
assembler).
- Fixed the outputting of the version of 'as' on OS X, which required
this beauty:
...=$(as -v /dev/null -o /dev/null 2>&1)
- Only add sub-configs to blacklist if the sub-config hasn't already
been added.
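A minimal sketch of the "add only if not already present" behavior for the blacklist. The variable and function names are hypothetical, and configure may implement this differently:

```shell
# Space-separated list of blacklisted sub-configs (hypothetical variable).
blacklist=""

# Append a sub-config to the blacklist unless it is already listed.
blacklist_add() {
    case " ${blacklist} " in
        *" $1 "*) ;;                                   # already present
        *) blacklist="${blacklist:+${blacklist} }$1" ;;
    esac
}

blacklist_add knl
blacklist_add skx
blacklist_add knl    # duplicate; ignored
echo "${blacklist}"
```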
commit 088c474e629535affbe111f141f895af50d109be
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 10 18:09:56 2018 -0500
Added support for blacklisting via the assembler.
Details:
- Added logic to configure that attempts to assemble various small files
containing select instructions designed to reveal whether binutils
(specifically, the assembler) supports emitting those instruction sets.
This information provides additional opportunities to blacklist sub-
configurations that are unsupported by the environment. Thanks to Devin
Matthews for pointing me towards a similar solution in TBLIS as an
example.
- Various other cleanups in configure.
- Reorganized the detection code in the 'build' directory, bringing the
"auto-detect" configuration detection, libmemkind detection, and new
instruction set detection codes into a single new subdirectory named
'detect'.
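A rough sketch of probing the assembler with a one-instruction file. The function name, the choice of probe instruction, and the messages are assumptions for illustration; the actual probe files under build/detect are not reproduced here:

```shell
# Hypothetical helper: return 0 if the given assembler command accepts
# a file containing the given instruction, 1 if it rejects it.
assembler_supports() {
    as_cmd="$1"
    instr="$2"
    probe_dir=$(mktemp -d) || return 2
    printf '%s\n' "${instr}" > "${probe_dir}/probe.s"
    # as_cmd is left unquoted so that a command with arguments still works.
    if ${as_cmd} "${probe_dir}/probe.s" -o /dev/null > /dev/null 2>&1; then
        rc=0
    else
        rc=1
    fi
    rm -rf "${probe_dir}"
    return "${rc}"
}

# Example probe using an AVX-512 instruction in AT&T syntax (assumed).
if assembler_supports as "vaddpd %zmm0, %zmm1, %zmm2"; then
    echo "probe assembled"
else
    echo "probe failed; sub-config is a candidate for blacklisting"
fi
```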
commit 78a24e7dada52a3582f8488795bd1a44993989d9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 9 17:02:13 2018 -0500
Updated bli_avx512_macros.h in knl and skx configs.
Details:
- Downloaded updated version of bli_avx512_macros.h from TBLIS [1] in
attempt to address issue #192.
[1] https://github.com/devinamatthews/tblis/
commit 388f64d6ade14caa4a6c286845ad2d565378b2bb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 9 15:33:10 2018 -0500
Fixed failure to honor CC= argument to configure.
Details:
- Fixed a failure to observe the value of CC when selecting the compiler
in configure. Thanks to Devangi Parikh for reporting this bug.
- The semantics now also work for the CC environment variable. That is,
if CC is set prior to running configure, that value is used, but will
be overridden by specifying the CC= argument to configure. If the CC
environment variable is not set, the CC= value is used. If neither the
environment variable nor CC= are specified, then the choice is made
internally to configure: first attempting to find gcc, then clang, and
then cc.
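A minimal sketch of the precedence described above: the CC= argument first, then the CC environment variable, then a search for gcc, clang, and cc. The function name select_cc is hypothetical and the two values are passed explicitly to keep the sketch self-contained:

```shell
# Hypothetical helper: $1 is the value of the CC= argument (may be
# empty), $2 is the value of the CC environment variable (may be empty).
select_cc() {
    cc_arg="$1"
    cc_env="$2"
    if [ -n "${cc_arg}" ]; then echo "${cc_arg}"; return 0; fi
    if [ -n "${cc_env}" ]; then echo "${cc_env}"; return 0; fi
    # Neither was given: fall back to the internal search order.
    for c in gcc clang cc; do
        if command -v "${c}" > /dev/null 2>&1; then
            echo "${c}"
            return 0
        fi
    done
    return 1
}

# The CC= argument wins over the environment variable.
select_cc "clang" "gcc"
```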
commit 45fbe66b3e2ab92f0b4fdf437d57c5d06603803d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 9 14:01:08 2018 -0500
Fixed libmemkind dependency for x86_64.
Details:
- Removed some old conditional code in config/knl/make_defs.mk that
added -lmemkind to LDFLAGS if DEBUG_TYPE was not 'sde' and inserted
code into common.mk that affirmatively filters out -lmemkind from
LDFLAGS if DEBUG_TYPE is 'sde'. (Thanks to Dave Love for reporting
this issue.) Other minor cleanups to neighboring code in common.mk.
- Updated CRVECFLAGS in knl/make_defs.mk to be based on -march=knl,
and then AVX-512 functionality is manually removed via various
-mno-avx512* flags. Also, make the setting of CRVECFLAGS conditional
on CC_VENDOR. Similar change to skx/make_defs.mk.
- Comment/whitespace updates.
commit ca982148b3b419db063cad2fa74376ec383a5c80
Author: dnp <devangiparikh@gmail.com>
Date: Sun Apr 8 21:27:10 2018 -0500
Fixed bug in SKX sgemm microkernel. Modified SKX dgemm microkernel to be consistent with the sgemm microkernel
commit bd0276752ccdd56ff897b1a5ae022f2ffe6e0b38
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 6 18:51:43 2018 -0500
Track separate ref kernel flags for each sub-config.
Details:
- Renamed CVECFLAGS variables in sub-configurations' make_defs.mk files
to CKVECFLAGS.
- Added default definitions of two new make variables to most
sub-configurations' make_defs.mk files--CROPTFLAGS and CRVECFLAGS--
which correspond to reference kernel analogues of CKOPTFLAGS
and CKVECFLAGS, which track optimization and vectorization flags for
optimized kernels. Currently, two sub-configurations (knl and skx)
explicitly set CRVECFLAGS to non-default values (using AVX2 instead of
AVX-512 for reference kernels). Thanks to Jeff Hammond, whose feedback
prompted me to make this change (issue #187).
- Changed common.mk so that the get-refkern-cflags-for function returns
the flags associated with the given sub-configuration's CROPTFLAGS
and CRVECFLAGS (instead of CKOPTFLAGS and CKVECFLAGS).
commit b9aebce19480448817373e2df2b36bd090eae41a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 6 18:37:33 2018 -0500
De-verbosify makefile fragment generation.
Details:
- Changed from -v1 to -v0 when calling gen-make-frag.sh from configure.
The directory-by-directory recursive output didn't add much value to
the user, so now we just echo a line for each top-level directory into
which we will recurse (e.g. 'config', 'ref_kernels', 'frame', etc.).
This also helps keep more interesting information (from earlier in the
execution of configure) from scrolling out of the terminal window.
commit b549b91f26948991e13364f1f26a878da0f43aa0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 6 16:31:33 2018 -0500
Added 64-bit integer support to BLAS test drivers.
Details:
- Updated the build system and BLAS test drivers to use 64-bit integers
when BLIS is configured for 64-bit integers in the BLAS layer. Also
updated blastest/Makefile accordingly. Thanks to Dave Love for
reporting the need for this feature.
- Added a 'check' target to blastest/Makefile so that the user can see
a summary of the tests.
- Commented out the initial definition of INCLUDE_PATHS in common.mk,
which was used pre-monolithic header, back when BLIS needed paths to
*all* headers, rather than just a select few. This line is no longer
needed since the value of INCLUDE_PATHS is overwritten by a later
definition limited to only the header paths that are needed now.
commit d39fa1c04265869bdf8b6f453076359eec2f3c59
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 5 19:38:35 2018 -0500
Adjusted CFLAGS used to compile bli_cntx_ref.c.
Details:
- Removed CKOPTFLAGS and CVECFLAGS from the set of CFLAGS used to
compile bli_cntx_ref.c for each configuration. This is necessary
because the file defines functions like bli_cntx_init_skx_ref(),
which are called during BLIS's initialization of the global kernel
structure, potentially being executed by an architecture that lacks
the instruction set used to compile the kernels for, in this example,
skx, which would lead to an illegal instruction error. Thanks to
Dave Love for reporting this issue.
- Further adjusted CFLAGS used when compiling code in the 'config'
directory (e.g. bli_cntx_init_skx.c) as well as code in 'frame' so
as to avoid the aforementioned issue.
commit 08b123084d35680beab379012f8f5a5a8b44a443
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 5 14:25:39 2018 -0500
Added color-coding to 'make check' output.
Details:
- Added color coding to output of check-blistest.sh, check-blastest.sh
scripts. Success messages are coded green and failure are coded red.
This helps draw the eye toward those messages as the 'make checkblis',
'make checkblis-fast', and 'make checkblas' targets are executed.
- Changed top-level Makefile so that execution will not halt if
'checkblis', 'checkblis-fast', or 'checkblas' targets fail, which
means that the second of the two tests (BLIS and BLAS) run by
'make check' will run even if the first test fails.
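A minimal sketch of green/red status output using ANSI escape sequences. The function name and message text are hypothetical and not taken from check-blistest.sh or check-blastest.sh:

```shell
# Hypothetical helper: print a message in green when the status is 0
# and in red otherwise, resetting the color afterwards.
print_status() {
    if [ "$1" -eq 0 ]; then
        printf '\033[0;32m%s\033[0m\n' "$2"
    else
        printf '\033[0;31m%s\033[0m\n' "$2"
    fi
}

print_status 0 "All tests passed."
print_status 1 "At least one test failed."
```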
commit c9e4d7db7410b03c1ffe8c9727e9f1b2ba7fecfe
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 4 17:13:15 2018 -0500
CHANGELOG update (0.3.1)
commit 1f28d7c86e17730f05bd239c8e8d67e3e7510a4f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 4 17:13:15 2018 -0500
Version file update (0.3.1)
commit e6cc9ee26bcf0450f1120d5d12985b04d9fb8516
Merge: 786d15c5 3c91c7ae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 4 16:08:18 2018 -0500
Merge branch 'dev' of github.com:flame/blis into dev
commit 786d15c5ef09f1f647b126b63d57e76d5810c58e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 4 16:06:47 2018 -0500
Added skx, knl to x86_64 configuration family.
Details:
- Added 'skx' and 'knl' sub-configurations to the 'x86_64' configuration
family in the config_registry file.
- Added logic to configure that avoids committing certain sub-configs to
the configuration/kernel registries if those sub-configs cannot be
handled properly by the chosen compiler. (This was modeled after
similar logic in TBLIS's configure; thanks to Devin Matthews for
pointing this out.) First, the compiler and its version are inspected
and, based on the results, certain configurations are added to a
"blacklist". Then, as the configuration registries are being created,
configurations and/or kernels that match items in the blacklist are
skipped over and not committed to the registries. Under certain
circumstances, omitting a blacklisted configuration will indirectly
invalidate other configurations due to the loss of availability of
the original blacklisted configuration's kernel set. This additional
indirect blacklist is also accounted for.
- Added output to the beginning of configure that echoes information
about the chosen compiler as well as the configurations that are
blacklisted and must be stripped from the registries.
- Various other cleanups in configure, especially with respect to
explicitly declaring local variables in functions.
- Comment updates to config/zen/make_defs.mk regarding choice of -march
flags based on compiler version.
commit 3c91c7aebafb446a2582267beb3b22c8bb475b3b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 2 12:40:25 2018 -0500
Fixed 64b type mismatch warning in cblas_xerbla.c.
Details:
- Fixed a compiler warning concerning a type mismatch between the
format specifier of the printf() call in cblas_xerbla.c and its
corresponding (info) argument. The warning manifested when the CBLAS
layer was enabled and the BLAS/CBLAS integer type size was set to 64
(the default is 32). The warning was fixed by changing the specifier
from %d to %jd and typecasting the argument to intmax_t. Thanks to
Dave Love for reporting this issue and submitting the patch.
commit 71eaf449a812fe2bd640d21513ec83974b2edb45
Merge: 6a628184 ae9a5be5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 27 17:21:43 2018 -0500
Merge branch 'dev'
commit ae9a5be56d6f9b87278d6032154d2dcf3fb7d54f
Author: dnp <devangiparikh@gmail.com>
Date: Tue Mar 27 17:01:23 2018 -0500
Fixed bug in skx sgemm microkernel
commit 3f02af0905b1e2e2e065862f8afe5e9a52f282b2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 26 17:40:04 2018 -0500
Row storage optimizations to zen dotxf kernels.
Details:
- Split the main loop bodies of zen's [sd]dotxf kernels into two cases:
one to handle a column-stored matrix A and one to handle a row-stored
matrix A. This allows vector instructions to be employed even if A is
stored by rows (and A^T appears stored as columns). Both storage cases
use a common edge case loop. Thanks to Devin Matthews for this idea
and for prototyping the change needed for sdotxf kernel.
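As a rough illustration of the two storage cases, the scalar sketch below shows the loop orderings only; it is not the zen intrinsic code. The function name sketch_dotxf() is hypothetical, and the operation is simplified to y := y + alpha * A^T x (no beta scaling or conjugation, which are assumptions of this sketch).

```c
#include <stddef.h>

/* Simplified dotxf: y := y + alpha * A^T x, where A is m x b with row
   stride rs and column stride cs. Hypothetical sketch, scalar code only. */
void sketch_dotxf( size_t m, size_t b, double alpha,
                   const double* a, size_t rs, size_t cs,
                   const double* x, double* y )
{
    if ( rs == 1 )
    {
        /* Column-stored A: each column is contiguous, so every dot
           product streams through unit-stride memory. */
        for ( size_t j = 0; j < b; ++j )
        {
            double rho = 0.0;
            for ( size_t i = 0; i < m; ++i )
                rho += a[ i + j*cs ] * x[ i ];
            y[ j ] += alpha * rho;
        }
    }
    else
    {
        /* Row-stored (or general) A: walk one row at a time and update
           all b partial results from it. When cs == 1 the inner loop is
           unit-stride, which is what makes vectorization possible. */
        for ( size_t i = 0; i < m; ++i )
        {
            const double alpha_chi = alpha * x[ i ];
            for ( size_t j = 0; j < b; ++j )
                y[ j ] += alpha_chi * a[ i*rs + j*cs ];
        }
    }
}
```

In both branches the innermost loop touches contiguous elements of A for the matching storage format, which is the property a vectorized kernel would exploit.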
commit 679dcc331dd870ec680e135a3fb65ffa6e3a91c2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 26 15:35:17 2018 -0500
Make k_iter/k_left uint64_t in bulldozer fma ukrs.
Details:
- Changed the declaration of k_iter and k_left for d, c, z microkernels
from dim_t to uint64_t. This is needed to ensure compatibility with
the movq instruction used to load the value into registers. This
change should have been made a long time ago, but for some reason
only recently began showing up via Travis CI.
commit 6a628184f6938673440e4cdd4fed0208c51fd1f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 26 14:48:16 2018 -0500
Fixed a memkind-related compile-time bug on knl.
Details:
- Fixed a compile-time error that occurred due to the fact that
BLIS_ENABLE_MEMKIND, defined in bli_config.h, was not being defined
soon enough to be used in bli_system.h where it is needed to determine
whether hbwmalloc.h should be #included. bli_system.h is now included
after bli_config.h (and bli_config_macro_defs.h). Thanks to Dave Love
for reporting this issue.
- Tweaked the language used by configure to echo the status of the
--with[out]-memkind option.
commit e2192a8fd58ec3657434ddd407033e097edad8f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 23 12:53:48 2018 -0500
Removed vzeroupper intrinsics from zen kernels.
Details:
- Fixed a bug in the zen (also used by haswell) dotxf kernels whereby a
vzeroupper instruction destroyed part of the intermediate result
stored by the vdpps instructions that came right before. (The
vzeroupper intrinsic was removed.)
- Removed remaining vzeroupper intrinsics from other zen kernels.
Previously, the vzeroupper instructions were included because BLIS is
typically compiled with -mfpmath=sse. But it was brought to my
attention that inserting these vzeroupper instructions is unnecessary
for our purposes, since (a) -mfpmath=sse results in VEX-encoded scalar
code rather than literal SSE instructions, and (b) compilers already
(likely) insert vzeroupper instructions where necessary. Thanks to
Devin Matthews for zeroing in on the dotxf bug.
- Removed -malign-double from bulldozer make_defs.mk. This alignment
was already happening by default since bulldozer is an x86_64 system.
commit 22289ad23cd10b81451ce82f60d84b5f97e7fd85
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 22 18:21:30 2018 -0500
Added build system support for libmemkind.
Details:
- Added support for libmemkind to configure. configure attempts to
detect the presence of libmemkind by compiling a small program
containing #include <hbwmalloc.h> and a call to hbw_malloc(). If
successful, it is assumed that libmemkind is present and available.
If present, use of libmemkind is enabled by default, and otherwise
use is disabled by default. If libmemkind is present, the user may
explicitly disable use of the library by running configure with the
--without-memkind option. Furthermore, a configuration may disable
libmemkind, perhaps conditional on some aspect of the build system,
by including -DBLIS_DISABLE_MEMKIND in the configuration's CPPROCFLAGS
make variable and setting the BLIS_ENABLE_MEMKIND makefile variable,
set in config.mk, to 'no'. (The knl configuration makes use of this
latter feature; see below.)
- If enabled at configure-time, bli_system.h will #include <hbwmalloc.h>
and bli_kernel_macro_defs.h will define BLIS_MALLOC_POOL and
BLIS_FREE_POOL to use hbw_malloc() and hbw_free(), respectively.
- Deprecated explicit use of BLIS_NO_HBWMALLOC in
config/knl/bli_family.knl.h and replaced use of -DBLIS_NO_HBWMALLOC in
config/knl/make_defs.mk with -DBLIS_DISABLE_MEMKIND, which overrides
(#undefs) the definition of BLIS_ENABLE_MEMKIND in bli_system.h, if it
would otherwise be defined. Also, set the BLIS_ENABLE_MEMKIND makefile
variable to 'no'.
- common.mk now adds libmemkind to LDFLAGS if libmemkind is enabled.
commit 7dc40eafdd9af3e8c4519a8d1b04d25830b4ca7a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 21 18:39:16 2018 -0500
Updates to top-level and test driver Makefiles.
Details:
- Added logic to common.mk that will choose a BLIS library against which
to link (LIBBLIS_LINK). The default choice is the static (.a) library;
the shared (.so) library is chosen only if the shared library build was
enabled and the static one was disabled.
- Updated the various test driver Makefiles to reference this common,
pre-chosen library against which to link. (Previously, these drivers
unconditionally linked against the static library and would have
failed if the static library build was disabled at configure-time.)
- Renamed many of the variables in common.mk and the top-level Makefile
so that variables relating to the libblis.[a|so] files, including
paths to those files, begin with "LIBBLIS".
- Shuffled around some of the library definitions from the top-level
Makefile to common.mk.
- Renamed BLIS_ENABLE_DYNAMIC_BUILD to BLIS_ENABLE_SHARED_BUILD, and
the @enable_dynamic@ anchor to @enable_shared@ in build/config.mk.in
and in configure.
- A few other cleanups in the top-level Makefile.
commit 97e1eeade3c51df1bae574a9bc1da34b05bf2bd3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 21 15:47:11 2018 -0500
Added input.operations.fast file for 'make check'.
Details:
- Added an 'input.operations.fast' file to testsuite directory to go
along with the 'input.general.fast' file used by the 'make check'
target in the top-level Makefile. This will allow the "fast" check
to prune operations and/or parameter combinations from the test
space in order to save time.
- Currently, input.operations.fast prunes trmm3 and all transposition
and conjugation parameters from the level-3 test space.
- Reduced problem size tested in input.general.fast to 100 and disabled
testing of 1m method.
commit c441caa95aabe69f54e2160eb67bf4ca76a66c34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 20 17:56:02 2018 -0500
README update.
Details:
- Minor updates to README.md.
- Minor change to blastest/Makefile.
commit 6fe018eb4ac8c16f2edc916c24f5994848017b7f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 20 15:35:45 2018 -0500
Added .gitkeep file to blastest/obj.
Details:
- Added an empty file named '.gitkeep' to blastest/obj/ so that git will
track the otherwise empty directory. (This is already done for the BLIS
testsuite in testsuite/obj.)
commit 0e6d000db9291342913dc5f8590a28c67bbcbc95
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 20 15:08:43 2018 -0500
Updated .gitignore to ignore BLAS test out.* files.
commit 40c040a31d96fbadff11f761d0cad1ef03ef2cc5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 20 14:33:50 2018 -0500
Fixes to .travis.yml.
Details:
- Invoke the full BLIS testsuite via 'make testblis' instead of the fast
version via 'blistest-fast' (which was wrong anyway, since the correct
fast target is 'testblis-fast').
- Invoke the BLAS tests via 'make testblas' instead of 'blastest'.
commit 664ec4813d8b53121cce7a68bef47da656ece9cb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 20 13:54:58 2018 -0500
Integrated f2c'ed netlib BLAS test suite.
Details:
- Created a new test suite that exercises only the BLAS compatibility
found in BLIS. The test suite is a straightforward port of code
obtained from netlib LAPACK, run through f2c and linked to a stripped-
down version of libf2c that is compiled along with the test drivers
(to prevent any obvious ABI issues). The new BLAS test suite can be
run from within its new local directory, 'blastest' (through its local
'make ; make run' targets) or from the top-level Makefile (via the
'make testblas' target). Output files are created in whatever directory
the test drivers are run, whether it be the 'blastest' directory, the
top-level source distribution directory, or the out-of-tree directory
in which 'configure' was run. Also, the results of the BLAS test suite
can be checked via 'make checkblas', which summarizes the presence or
absence of test failures in a single line printed to stdout.
- Updated the 'test' target to run both 'testblis' and 'testblas'.
- Added a new 'testblis-fast' target that runs the BLIS testsuite with
smaller problem sizes, allowing it to finish more quickly.
- Added a 'make check' target, which runs 'checkblis-fast' and
'checkblas'.
- Changed .travis.yml so that Travis CI runs 'testblis-fast' instead of
'testblis' (before calling the check-blistest.sh script to check the
result manually).
- Renamed some targets in the top-level Makefile to be consistent between
BLAS and BLIS.
commit fc53ad6c5b2e39238b1bbbf625cc0c638b9da4e1
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Mon Mar 19 12:49:26 2018 +0530
Re-enabling the small matrix gemm optimization for target zen
Change-Id: I13872784586984634d728cd99a00f71c3f904395
commit d12d34e167d7dc32732c0ed135f8065a55088106
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Mon Mar 19 11:34:32 2018 +0530
Re-enabling Zen optimized cache block sizes for config target zen
Change-Id: I8191421b876755b31590323c66156d4a814575f1
commit 40fa10396c0a3f9601cf49f6b6cd9922185c932e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 19 18:19:43 2018 -0500
Fixed a few obscure bugs in the BLAS API.
Details:
- Fixed a missing parameter in the definition of sdsdot_(). The 'sb'
argument was missing. Strangely, the argument is omitted from dsdot_()
in the BLAS API.
- Fixed the missing 'c' or 'u' in the "?gerc" or "?geru" operation string
passed to xerbla_() by the bla_ger_check() macro.
- For bla_syrk_check() and bla_syr2k_check() macros, only allow
conjugate-transpose (trans='c') as a valid argument for the real
domain functions [sd]syrk_() and [sd]syr2k_(). (Previously, the
argument was allowed even for the complex domain equivalents, which
was inconsistent with the BLAS API.)
commit fe7d7f1e43e4c26249eed83d4188beee1ba96202
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 18 19:43:06 2018 -0500
Fixed cpp macro parameter "ch" typo in bla_ger.c.
Details:
- Previously, the BLAS routine-generating macro in bla_ger.c was
incorrectly passing MKSTR(ch) into the _check() macro when it
should have been passing in the char that was available, chxy.
I've instead changed the name of the macro parameter from chxy
to ch. A similar change was made to bla_ger.h for consistency.
Thanks to Dave Love for helping track this down. (NOTE: This is
actually the root cause of the bug that was first patched by
increasing the length of the operation name strings passed into
xerbla_(), as defined by the constant BLIS_MAX_BLAS_FUNC_STR_LENGTH,
in 3d1a5a7. In theory, that change could be backed out now.)
- Applied aforementioned chxy->ch change to bla_dot.[ch], as well as
frame/compat/cblas/f77_sub/f77_dot_sub.[ch] (not because it needed
to happen, but for naming consistency).
- Reformatted function signatures/prototypes of CBLAS functions and
function calls to BLAS in frame/compat/cblas/f77_sub/*.c.
commit cb7ed90752d1ddbac11368c4510641ca4f3a02eb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 16 13:05:56 2018 -0500
Convert op names to uppercase before calling xerbla_().
Details:
- Defined a new function, bli_string_mkupper(), that calls toupper() on
every non-NUL character in a string.
- Call bli_string_mkupper() prior to calling xerbla_() in the level-2/-3
BLAS _check() macros. This prevents the BLAS testsuite from complaining
that the operation name (e.g. "dgemm") does not match the expected
value (e.g. "DGEMM"). Thanks to Dave Love for reporting this issue.
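A function of the kind described above might look like the following sketch. The name sketch_string_mkupper() is hypothetical, and the actual bli_string_mkupper() may differ in signature and details.

```c
#include <ctype.h>

/* Convert a NUL-terminated string to uppercase in place.
   Hypothetical sketch, not the BLIS implementation. */
void sketch_string_mkupper( char* s )
{
    /* Walk the string until the terminating NUL character. */
    for ( ; *s != '\0'; ++s )
    {
        /* Cast to unsigned char first: passing a negative char value
           to toupper() is undefined behavior. */
        *s = ( char )toupper( ( unsigned char )*s );
    }
}
```

Applied to a buffer holding "dgemm", this leaves "DGEMM" in the buffer, which is the form the BLAS test suite expects.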
commit 3d1a5a7c08fed3ba29f060fe1db2b0dc42dde223
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 16 12:24:07 2018 -0500
Fixed printf() format overflow.
Details:
- Increased the length of operation name strings passed to xerbla_() in
the level-2 and level-3 operation _check() functions, found in
frame/compat/check. This avoids a format specifier overflow warning by
gcc 7. Thanks to Dave Love for reporting this issue and suggesting the
fix.
commit c73055f028684d998e03b2392093c393782bbfe7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 15 16:08:21 2018 -0500
Return after non-zero info in BLAS checks.
Details:
- Previously, when calling the BLAS compatibility layer, discovering a
parameter check failure would result in the proper setting of the
info parameter (printed by xerbla_()), but would also come with an
immediate abort() rather than a return. This was incorrect behavior
for two overlapping reasons.
(1) BLAS should return gracefully to the caller in the event of a
bad set of parameters, not abort().
(2) When BLIS was being tested via the BLAS testsuite, BLIS's
xerbla_() would correctly get preempted/overridden by the
xerbla_() in the BLAS testsuite, but execution would then
erroneously continue on to the BLIS implementation with bad
parameter values.
- The previous issue was addressed by disabling the abort() in BLIS's
xerbla_(), changing all of the BLAS _check() functions to cpp macros,
and adding a return statement to the end of each _check() macro's
"if ( info != 0 )" conditional.
Thanks to Dave Love for reporting this issue.
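The reason the _check() functions had to become macros is that a return statement inside a macro exits the routine that expands it. The sketch below illustrates this under stated assumptions: all names (sketch_xerbla, SKETCH_GEMV_CHECK, sketch_gemv) and the specific checks are hypothetical and are not copied from frame/compat/check.

```c
#include <stdio.h>

/* Records the most recent info value so that callers can inspect it. */
int sketch_last_info = 0;

/* Hypothetical stand-in for xerbla_(): reports the position of the bad
   argument but does not abort. */
static void sketch_xerbla( const char* name, int info )
{
    sketch_last_info = info;
    fprintf( stderr, "%s: parameter %d had an illegal value\n", name, info );
}

/* Because this is a macro rather than a function, the 'return' below
   leaves the BLAS-style routine that expands it, so execution never
   reaches the computation with bad parameter values. */
#define SKETCH_GEMV_CHECK( name, m, n ) \
{ \
    int info = 0; \
    if      ( (m) < 0 ) info = 2; \
    else if ( (n) < 0 ) info = 3; \
    if ( info != 0 ) \
    { \
        sketch_xerbla( name, info ); \
        return; \
    } \
}

/* Hypothetical routine: *computed is set only if the checks pass. */
void sketch_gemv( int m, int n, int* computed )
{
    SKETCH_GEMV_CHECK( "SGEMV", m, n )
    *computed = 1; /* Stands in for the actual computation. */
}
```

Had SKETCH_GEMV_CHECK been an ordinary function, its return would only have returned to sketch_gemv(), which would then have continued on to the computation.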
commit c4f1d18b97a6a8c3ea0366aa759db597a664062a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 14 19:10:09 2018 -0500
Minor typo fix to printing arch in testsuite.
Details:
- Mistakenly was calling bli_cpuid_query_id() instead of
bli_arch_query_id() in the recent addition to the testsuite output
that prints the active sub-configuration. The former function is
only used for multi-architecture builds, whereas the latter is the
more general option that also works for single configuration
(including 'configure auto') builds.
commit 8f2fabec800a720b3e94b33c0048cc8c4ead436d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 14 17:43:42 2018 -0500
Make arm32 and arm64 families work. (#176)
commit fc6a1842518a0820c6708c285611346d5a1419da
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 14 15:31:17 2018 -0500
Print sub-configuration name in testsuite output.
Details:
- Added a line to the testsuite output that prints the name of the
current/active sub-configuration. This is useful when linking the
testsuite against multi-configuration builds because it confirms
the sub-configuration that is actually being employed at runtime.
Thanks to Devin Matthews for suggesting this feature.
commit 9943a899d64bf7ec4a24106f6f4c70629bbe1f6e
Merge: 290dd4a9 b1a15ae6
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 14 13:27:44 2018 -0500
Merge pull request #173 from devinamatthews/dev
Fix Cortex-A9 and Cortex-A15 configs.
commit b1a15ae6ee0f46c9a95cf59f9555925e0e8e21ff
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 14 13:26:44 2018 -0500
Use BLIS_H_FLAT
commit 290dd4a9feee447e69b40ad108954af78e196f7e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 14 13:15:37 2018 -0500
Allow arbitrarily deep configuration families.
Details:
- Updated configure so that configuration families specified in the
config_registry are no longer constrained as being only one level
deep. For example, previously the x86_64 family could not be defined
concisely in terms of, say, intel64 and amd64 families, and instead
had to be defined as containing "haswell, sandybridge, penryn, zen,
etc." In other words, families were constrained to only having
singleton configurations as their members. That constraint is now
lifted.
- Redefined x86_64 family in config_registry in terms of intel64 and
amd64.
commit 9cee78e006d56543ac02fc9c488905c0434e60ae
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 14 13:09:48 2018 -0500
Fix Cortex-A9 and Cortex-A15 configs.
Tested with QEMU.
commit 1a3031740f7fcbbcc2c99d5c4cb50d0413407455
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 13 16:04:40 2018 -0500
Updates to ARM hardware detection support.
Details:
- Updated/clarified the ARM preprocessor macro branch of bli_cpuid.c.
Going forward, cortexa57 (64-bit), cortexa15, and cortexa9 (32-bit)
sub-configurations are supported. However, the functions that detect
features specific to a15 and a9 are identical, and since a15 is tested
first, it will always be chosen for arm32 hardware (even if both
sub-configurations were enabled at configure-time and the library is
linked and run on an a9). Thus, more work needs to be done to
distinguish these two.
- Added cpp guard around x86_64 portions of bli_cpuid.c. Now, either
the x86_64 or ARM code will be compiled (or neither, if neither
environment is detected).
- In bli_arch_query_id(), call bli_cpuid_query_id() when the
BLIS_FAMILY_ARM64 or BLIS_FAMILY_ARM32 macros are defined.
- Added arm64 and arm32 configuration families to config_registry.
- Added a note to the arch_t typedef enum in bli_type_defs.h reminding
the developer to update the string array in bli_arch.c whenever new
enum values are added or existing values are reordered.
commit 1442d06886ebdc34d8f1cb620229ddc6062c2ce8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 11 16:59:50 2018 -0500
Fixed misnamed kernels in _cntx_init_cortexa57.c.
Details:
- Changed incorrect kernel function names in bli_cntx_init_cortexa57.c:
bli_sgemm_cortexa57_asm_8x12 -> bli_sgemm_armv8a_asm_8x12
bli_dgemm_cortexa57_asm_6x8 -> bli_dgemm_armv8a_asm_6x8
Thanks to Jacob Gorm Hansen for reporting this issue.
commit 28bcea37dfcf0eb99a99da6f46de2a2830393d1d
Merge: b1ea3092 8b0475a8
Author: praveeng <praveen.g@amd.com>
Date: Fri Mar 9 19:13:08 2018 +0530
Merge master code till 06_mar_2018 to amd-staging
Change-Id: I12267e5999c92417e3715fef4f36ac2131d00f1a
commit 48da9f5805f0a49f6ad181ae2bf57b4fde8e1b0a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 7 12:54:06 2018 -0600
Tweaked common.mk, Makefile, skx/knl make_defs.mk.
Details:
- Reorganized linker-related section of common.mk so that LDFLAGS set
in a sub-configuration's make_defs.mk file will not be immediately
(and erroneously) overridden by the default values.
- Re-enabled redirected (to file) output of the testsuite when run from
the top-level Makefile via 'make test'. (For some reason, it was
commented-out for the non-verbose case.)
- Removed old/unnecessary code from the make_defs.mk files of skx and
knl sub-configurations.
commit 8b0475a87daa177916e2caac0e530c6a57fa07cf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 6 06:39:44 2018 -0600
Fixed typo in attempted fix in 1a8350f7.
Details:
- Mistakenly entered 148 as knl mc blocksize for double real when the
value should have been 144. Thanks to Dave Love for reporting this.
commit 8912e6886b97eabb4ce0c35a3609a0fd994d347b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 5 18:00:45 2018 -0600
Fixed missing flags during shared object build.
Details:
- Fixed a bug in common.mk that caused warning, position-independent
code, miscellaneous, and general preprocessor flags to be omitted
from the configuration family-specific variables that hold those
values, as registered by the family's make_defs.mk file. This would
most obviously manifest when targeting a configuration family such as
'intel64' while simultaneously configuring for a shared object build,
as the key '-fPIC' flag would be omitted at compile-time and prevent
successful linking. Thanks to Dave Love for reporting this bug.
- Other cleanups to common.mk for readability and clarity.
commit 1a8350f70557fc53ca0c2eadf2076710dd0d9bc9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 5 13:32:00 2018 -0600
Fixed cache blocksize bug in knl configuration.
Details:
- Changed the mc blocksize for double real execution in the knl sub-
configuration from 160 to 148. The old value was not a multiple of
mr (which is 24), and thus the safeguards in bli_gks_register_cntx()
were tripping. Thanks to Dave Love for reporting this issue.
- Switch knl sub-configuration to use default blocksizes for datatypes
not supported by native kernels.
- Fixed typos in bli_error.c that prevented certain error strings
(which report maximum cache blocksizes not being multiples of their
corresponding register blocksize) from properly initializing.
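The constraint at issue is simple modular arithmetic, sketched below with a hypothetical helper, sketch_is_multiple(); the actual safeguard in bli_gks_register_cntx() is not reproduced here. With mr = 24, the old value 160 leaves a remainder of 16. The value 148 leaves a remainder of 4, which is why the 8b0475a8 entry in this log corrects it to 144.

```c
/* Returns 1 if the cache blocksize is a whole multiple of the register
   blocksize, and 0 otherwise. Hypothetical helper for illustration. */
int sketch_is_multiple( long cache_bs, long reg_bs )
{
    return reg_bs > 0 && cache_bs % reg_bs == 0;
}
```

Under this check, only 144 among the three values (160, 148, 144) is accepted for mr = 24.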
commit c09fffa827fe6241dc20193a1c404496664220de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 3 13:13:39 2018 -0600
Added missing cntx_t* arg in knl packm kernels.
Details:
- Added the missing cntx_t* argument to the function signature of packm
kernels in kernels/knl/1m/. Thanks to Dave Love for reporting this
issue.
commit b1ea30925dff751eced23dfa94ff578a20ea0b94
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 23 17:42:48 2018 -0600
CHANGELOG update (0.3.0)
Change-Id: Id038b00a62de51c9818ad249651ec5dc662f4415
commit 1ef9360b1fd0209fbeb5766f7a35402fbd080fcb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 1 14:36:39 2018 -0600
Enable non-unit vector stride tests by default.
Details:
- Change "vector storage schemes to test" parameter in testsuite's
input.general file to "cj". This means that both unit stride column
vectors and non-unit stride column vectors will be tested in
operations with vector operands (e.g. level-1v, level-1f, level-2).
- Very minor comment (typo) changes to input.operations.
commit 8c4e55a1a1ead9a5e970200fee027ffd2c7e8454
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 28 17:01:47 2018 -0600
Added individual operation overrides in testsuite.
Details:
- Updated the testsuite driver so that setting one or more individual
operation test switches to "2" in input.operations will enable ONLY
those operations and disable all others, regardless of the values of
the section overrides and other operation switches. This makes it
very easy to quickly test only one or two operations, and equally
easy to revert back to the previous combination of operation tests.
- Added more comments to input.operations describing the use of
individual "enable only" overrides.
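The "enable only" behavior can be sketched as a two-pass resolution over the switch values. This is an illustration only: the names below are hypothetical, the section overrides are left out for brevity, and the testsuite driver's actual logic may be organized differently.

```c
#include <stddef.h>

/* Hypothetical switch values mirroring the description above:
   0 = disabled, 1 = enabled, 2 = "enable only". */
enum { SKETCH_OFF = 0, SKETCH_ON = 1, SKETCH_ONLY = 2 };

/* Resolve raw switches into final 0/1 run flags. If any switch is set
   to SKETCH_ONLY, only those operations run; otherwise each operation
   keeps its ordinary on/off setting. */
void sketch_resolve_switches( const int* sw, int* run, size_t n )
{
    int any_only = 0;

    /* First pass: detect whether any "enable only" override is present. */
    for ( size_t i = 0; i < n; ++i )
        if ( sw[ i ] == SKETCH_ONLY ) any_only = 1;

    /* Second pass: compute the final run flag for each operation. */
    for ( size_t i = 0; i < n; ++i )
    {
        if ( any_only ) run[ i ] = ( sw[ i ] == SKETCH_ONLY );
        else            run[ i ] = ( sw[ i ] == SKETCH_ON );
    }
}
```

Reverting to the previous combination of tests only requires changing the "2" values back, since the other switches are left untouched.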
commit 34862aed89e5d5a8f35aeecd49f3052ada1f337b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 28 15:30:14 2018 -0600
Use zen kernels in haswell sub-configuration.
Details:
- Register use of level-1v zen intrinsic kernels for amaxv, axpyv, dotv,
dotxv, and scalv, as well as level-1f zen intrinsic kernels for axpyf
and dotxf. This works because these kernels simply target AVX/AVX2,
and therefore work without modification on haswell hardware.
- Switch to use of zen microkernels in bli_cntx_init_haswell.c. The zen
kernels are essentially identical to those used by haswell, except that
now zen kernels are a bit more up-to-date. In the future, I may
continue to maintain duplicates, or I may keep the kernels named after
one architecture (zen or haswell) but used by both sub-configurations.
- In config_registry, enable use of both haswell and zen kernels for the
haswell sub-configuration. This is necessary in order to make zen
kernels visible when registering kernels in bli_cntx_init_haswell.c.
- Enable use of assembly-based complex gemm microkernels for zen,
bli_cgemm_zen_asm_3x8() and bli_zgemm_zen_asm_3x4(), in
bli_cntx_init_zen.c. This was actually intended for 1681333.
commit 709f8361ebc90b96b02ebe5c5ffb6fc3b1b25e58
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 23 17:42:48 2018 -0600
Version file update (0.3.0)
commit d9079655c9cbb903c6761d79194a21b7c0a322bc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 23 17:42:48 2018 -0600
CHANGELOG update (0.3.0)
commit 3defc7265c12cf85e9de2d7a1f243c5e090a6f9d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 23 17:38:19 2018 -0600
Applied 34b72a3 to non-active/unused microkernels.
Details:
- Applied the read-beyond-bounds bugfix in 34b72a3 to other haswell and
zen kernels (ie: other microtile shapes) which are not used by default.
This was done mostly in case someone decided to pick up these kernels
and start using them, not because it affects BLIS's behavior
out-of-the-box.
commit 34b72a351745aa0d47bb0b74ebcd0f0a616d613d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 23 16:33:32 2018 -0600
Fixed obscure read-beyond-bounds bug in sgemm ukrs.
Details:
- Fixed an obscure bug in the bli_sgemm_haswell_asm_6x16 and
bli_sgemm_zen_asm_6x16 microkernels when the input/output matrix C
is stored with general stride (ie: both rs and cs are non-unit). The
bug was rooted in the way those microkernels read from matrix C--
namely, they used vmovlps/vmovhps instead of movss. By loading two
floats at a time, even if one of them was treated as junk, the
assembly code could be written in a more concise manner. However,
under certain conditions--if m % mr == 0 and n % nr == 0 and the
underlying matrix is not an internal "view" into a larger matrix--
this could result in the very last vmovhps of the last (bottom-right)
microkernel invocation reading beyond valid memory. Specifically, the
low 32 bits read would always be valid, but the high 32 bits could
reside beyond the bounds of the array in which the output C matrix is
contained. To remedy this situation, we now selectively use movss to
load any element that could be the last element in the matrix.
commit 5112e1859e7f8888f5555eb7bc02bd9fab9b4442
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 23 14:31:26 2018 -0600
Added missing 'restrict' to some kernels' cntx_t*.
Details:
- Added missing 'restrict' keyword to cntx_t* argument of function
signatures corresponding to level-1v, level-1f, and level-1m kernels.
This affected bli_l1v_ker_prot.h, bli_l1f_ker_prot.h, and
bli_l1m_ker_prot.h. (The 'restrict' was already being used to
qualify cntx_t* arguments for kernels defined in bli_l3_ker_prot.h.)
- Added comments to bli_l1v_ker.h, bli_l1f_ker.h, bli_l1m_ker.h, and
bli_l3_ukr.h that help explain how those headers function to produce
kernel prototypes using the prototype macros defined in the files
mentioned above.
commit 1fa8af95d807168e0849adb668492601e7009be0
Merge: c084b03b 16813335
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 21 17:54:02 2018 -0600
Merge branch 'rt'
commit c084b03b31d84427a120e391963db5419f1911ee
Merge: 5d03b6e6 fa74af4e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 21 17:52:17 2018 -0600
Merge branch 'rt'
commit 16813335bdb5978bc9a26cd00a32bd5a130130c4
Merge: fa74af4e 5a7005dd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 21 17:43:32 2018 -0600
Merge branch 'amd' into rt
Details:
- Merged contributions made by AMD via 'amd' branch (see summary below).
Special thanks to AMD for their contributions to-date, especially with
regard to intrinsic- and assembly-based kernels.
- Added column storage output cases to microkernels in
bli_gemm_zen_asm_d6x8.c and bli_gemmtrsm_l_zen_asm_d6x8.c. Even with
the extra cost of transposing the microtile in registers, this is
much faster than using the general storage case when the underlying
matrix is column-stored.
- Added s and d assembly-based zen gemmtrsm_u microkernel (including
column storage optimization mentioned above).
- Updated zen sub-configuration to reflect presence of new native
kernels.
- Temporarily reverted zen sub-configuration's level-3 cache blocksizes
to smaller haswell values.
- Temporarily disabled small matrix handling for zen configuration
family in config/zen/bli_family_zen.h.
- Updated zen CFLAGS according to changes in 1e4365b.
- Updated haswell microkernels such that:
- only one vzeroupper instruction is called prior to returning
- movapd/movupd are used in lieu of movaps/movups for double-real
microkernels. (Note that single-real microkernels still use
movaps/movups.)
- Added kernel prototypes to kernels/zen/bli_kernels_zen.h, which is
now included via frame/include/bli_arch_config.h.
- Minor updates to bli_amaxv_ref.c (and to inlined "test" implementation
in testsuite/src/test_amaxv.c).
- Added early return for alpha == 0 in bli_dotxv_ref.c.
- Integrated changes from f07b176, including a fix for undefined
behavior when executing the 1m method under certain conditions.
- Updated config_registry; no longer need haswell kernels for zen
sub-configuration.
- Tweaked marginal and pass thresholds for dotxf.
- Reformatted level-1v, -1f, and -3 amd kernels and inserted additional
comments.
- Updated LICENSE file to explicitly mention that parts are copyright
UT-Austin and AMD.
- Added AMD copyright to header templates in build/templates.
Summary of previous changes from 'amd' branch.
- Added s and d assembly-based zen gemm microkernels (d6x8 and d8x6) and
s and d assembly-based zen gemmtrsm_l microkernels (d6x8).
- Added s and d intrinsics-based zen kernels for amaxv, axpyv, dotv, dotxv,
and scalv, with extra-unrolling variants for axpyv and scalv.
- Added a small matrix handler to bli_gemm_front(), with the handler
implemented in kernels/zen/3/bli_gemm_small_matrix.c.
- Added additional logic to sumsqv that first attempts to compute the
sum of the squares via dotv(). If there is a floating-point exception
(FE_OVERFLOW), then the previous (numerically conservative) code is
used; otherwise, the result of dotv() is square-rooted and stored as
the result. This new implementation is only enabled when FE_OVERFLOW
is #defined. If the macro is not #defined, then the previous
implementation is used.
- Added axpyv and dotv standalone test drivers to test directory.
- Added zen support to old cpuid_x86.c driver in build/auto-detect/old.
- Added thread-local and __attribute__-related macros to bli_macro_defs.h.
commit 5d03b6e6e19d5a07f0cccf1a158f02fbd62dfd99
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Feb 19 11:31:30 2018 -0600
Fix asm macro include line for KNL. Fixes #167.
commit f07b176c84dc9ca38fb0d68805c28b69287c938a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 15 18:36:54 2018 -0600
Fixed an obscure bug in the 1m implementation.
Details:
- Fixed a bug in the way the bli_gemm1m_cntx_ref() function (defined in
ref_kernels/bli_cntx_ref.c) initializes its context for 1m execution.
Previously, the function probed the context that was in the process of
being updated for use with 1m--this context being previously
initialized/copied from a native context--for its storage preference
to determine which "variant" (row- or column-oriented) of 1m would be
needed. However, the _cntx_ref() function was not updating the method
field of the context until AFTER this query, and the conditional which
depended on it, had taken place, meaning the storage preference query
function would mistakenly think the context was for native execution,
since the context's method field would still be set to BLIS_NAT. This
would lead it to incorrectly grab the storage preference of the complex
domain microkernel rather than the corresponding real domain
microkernel, which could cause the storage preference predicate to
evaluate to the wrong value, which would lead to the _cntx_ref()
function choosing the wrong variant. This could lead to undefined
behavior at runtime. The method is now explicitly set within the
context prior to calling the storage preference query function.
- Updated comments in frame/ind/oapi/bli_l3_3m4m1m_oapi.c.
- Fixed a typo in the commented-out CFLAGS in config/zen/make_defs.mk,
which are appropriate for gcc 6.x and newer. (Mistakenly used
-march=bdver4 instead of -march=znver1.)
commit 1f94bb7b96eb2b67257e6c4df89e29c73e9ab386
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 19 12:46:53 2018 -0600
Document how to enable zen-specific instructions.
Details:
- Added as a comment in config/zen/make_defs.mk the list of compiler flags
that could be added to manually enable the instructions provided by the
Zen microarchitecture that are not already implied by -march=bdver4.
This information, along with the previous commit's flags to selectively
disable Bulldozer instructions no longer present in Zen, was gathered
from [1]. I hesitate to enable use of these instructions since I don't
have any Zen hardware to test on yet.
[1] https://wiki.gentoo.org/wiki/Ryzen
commit 1e4365b21bafa02bd108c5ac4705a25671fb9441
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jan 18 12:03:51 2018 -0600
Augment zen CFLAGS to prevent illegal instruction.
Details:
- Added various compiler flags (-mno-fma4 -mno-tbm -mno-xop -mno-lwp) so
that compiling with -march=bdver4 on zen-based architectures does not
result in an illegal instruction error at runtime. Note: This fix is
only needed for gcc 5.4; gcc 6.3 or later supports the use of
-march=znver1, which can be used in lieu of the augmented set of flags
based on bdver4. Thanks to Nisanth Padinharepatt for reporting this
error.
commit fa74af4e1fa7385ac3f3089fe1ea7bb88c906029
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 9 13:43:15 2018 -0600
Minor labeling update for './configure -c' output.
Details:
- Print the name of the configuration in the output of the
kernel-to-config map (and chosen pairs list) as a subtle way to remind
the user that these only apply to the targeted configuration (whereas
the config list and kernel list are printed without regard to which
configuration was actually targeted).
commit 5cdea756c7391e2c6cbfb38436ef9a205f860237
Merge: 9d8858b5 1e7a4896
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jan 7 19:45:20 2018 -0600
Merge branch 'rt'
commit 9d8858b5cff4a4b078b87872847a5710073fff0a
Merge: 0b3ca3cf f7df64da
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sun Jan 7 10:03:25 2018 -0600
Merge pull request #164 from devinamatthews/master
Don't use memkind for skx configuration.
commit f7df64daf6bbe6431effada6e13d8d1fab5aa221
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sun Jan 7 09:37:25 2018 -0600
Don't use memkind for skx configuration. Fixes #163.
commit 1e7a4896e0cbe73c4685fa956278e3f28273cdf9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 5 12:33:48 2018 -0600
Minor error handling in update-version-file.sh.
Details:
- Added explicit handling of situations when 'git describe --tags'
returns an error. This command is used by update-version-file.sh
when deciding whether or not to update the version file prior to
configuration.
- Removed bli_packm.c and bli_unpackm.c, as they contained no source
code.
commit 0b3ca3cfb682715a3686fd93ebb10d4a695d1162
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jan 4 20:51:35 2018 -0600
Intelligently select compiler for auto-detection.
Details:
- Rewrote code that selects the compiler for the purposes of compiling
the auto-detection executable. CC (if specified) is tried first. Then
gcc. Then clang. The absolute fallback is cc. The previous code was
sort of broken, and seemed to unintentionally always use gcc.
- Moved various configuration-agnostic flags from config/*/make_defs.mk
files to common.mk. The new mechanism appends the configuration-
agnostic flags to the various compiler flag variables initialized in
make_defs.mk. Flags specific to the sub-configuration are still set
in make_defs.mk.
- Added -Wno-tautological-compare to CMISCFLAGS when clang is in use.
Also added the flag to the compiler instantiation during configure-
time hardware detection (when clang is selected).
- Added some missing (but mostly-optional) quotes to configure script.
commit 5a7005dd44ed3174abbe360981e367fd41c99b4b
Merge: 7be88705 3bc99a96
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Wed Jan 3 12:05:12 2018 +0530
Merge changes in AMD beta release 0.95 into amd branch
commit 0b9c5127e91508c115228ca604ee2dac8de8f477
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Dec 23 15:53:44 2017 -0600
Enabled C99, added stdint.h to auto-detect build.
Details:
- Added "-std=c99" to compiler arguments when building auto-detection
driver in configure script.
- Added #include <stdint.h> to all three source files needed by auto-
detection program.
commit 0ce5e19c318e04909d3e664d69accb3a0fc6b988
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Dec 23 15:32:03 2017 -0600
Reimplemented configure-time hardware detection.
Details:
- Reimplemented the hardware detection functionality invoked when running
"./configure auto". Previously, a standalone script in build/auto-detect
that used CPUID was used. However, the script attempted to enumerate all
models for each microarchitecture supported. The new approach recycles
the same code used for runtime hardware detection introduced in 2c51356.
This has two immediate benefits. First, it reduces and consolidates the
code required to detect microarchitectures via the CPUID instruction.
Second, it provides an indirect way of testing at configure-time the
code that is used to detect hardware at runtime. This code is (a) only
activated when targeting a configuration family (such as intel64 or
amd64) at configure-time and (b) somewhat difficult to test in
practice, since it relies on having access to older microarchitectures.
- The above change required placing conditional cpp macro blocks in
bli_arch.c and bli_cpuid.c which either #include "blis.h" or #include
a bare-bones set of headers that does not rely on the presence of a
bli_config.h header. This is needed because bli_config.h has not been
created yet when configure-time auto-detection takes place.
- Defined a new function in bli_arch.c, bli_arch_string(), which takes
an arch_t id and returns a pointer to a string that contains the
lowercase name of the corresponding microarchitecture. This function
is used by the auto-detection script to printf() the name of the
sub-configuration corresponding to the detected hardware.
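As a rough illustration of the id-to-name lookup described for bli_arch_string(), the following sketch maps an enum value to a lowercase string. The demo_arch_t type, its members, and the "unknown" fallback are hypothetical stand-ins, not the actual arch_t definition or the library's behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of architecture ids; the real arch_t differs. */
typedef enum
{
    DEMO_ARCH_HASWELL = 0,
    DEMO_ARCH_ZEN,
    DEMO_ARCH_GENERIC,
    DEMO_NUM_ARCHS
} demo_arch_t;

/* Lowercase names, indexed by demo_arch_t value. */
static const char* demo_arch_names[DEMO_NUM_ARCHS] =
{
    "haswell",
    "zen",
    "generic"
};

/* Returns a pointer to a static string naming the given id. */
const char* demo_arch_string(demo_arch_t id)
{
    /* Assumed fallback for ids outside the table. */
    if ((int)id < 0 || id >= DEMO_NUM_ARCHS) return "unknown";

    assert(demo_arch_names[id] != NULL);
    return demo_arch_names[id];
}
```

A detection driver could then pass the returned string to printf() so that a calling script can capture the sub-configuration name.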
commit 9804adfd405056ec332bb8e13d68c7b52bd3a6c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 21 19:22:57 2017 -0600
Added option to disable pack buffer memory pools.
Details:
- Added a new configure option, --[en|dis]able-packbuf-pools, which will
enable or disable the use of internal memory pools for managing buffers
used for packing. When disabled, the function specified by the cpp
macro BLIS_MALLOC_POOL is called whenever a packing buffer is needed
(and BLIS_FREE_POOL is called when the buffer is ready to be released,
usually at the end of a loop). When enabled, which was the status quo
prior to this commit, a memory pool data structure is created and
managed to provide threads with packing buffers. The memory pool
minimizes calls to bli_malloc_pool() (i.e., the wrapper that calls
BLIS_MALLOC_POOL), but does so through a somewhat more complex
mechanism that may incur additional overhead in some (but not all)
situations. The new option defaults to --enable-packbuf-pools.
- Removed the reinitialization of the memory pools from the level-3
front-ends and replaced it with automatic reinitialization within the
pool API's implementation. This required an extra argument to
bli_pool_checkout_block() in the form of a requested size, but hides
the complexity entirely from BLIS. And since bli_pool_checkout_block()
is only ever called within a critical section, this change fixes a
potential race condition in which threads using contexts with different
cache blocksizes--most likely a heterogeneous environment--can check
out pool blocks that are too small for the submatrices they wish to
pack. Thanks to Nisanth Padinharepatt for reporting this potential
issue.
- Removed several functions in light of the relocation of pool reinit,
including bli_membrk_reinit_pools(), bli_memsys_reinit(),
bli_pool_reinit_if(), and bli_check_requested_block_size_for_pool().
- Updated the testsuite to print whether the memory pools are enabled or
disabled.
commit 107801aaae180c00022f1b990bc59038c14949d2
Merge: d9c05745 0084531d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 18 16:29:28 2017 -0600
Merge branch 'master' into selfinit
commit 0084531d3eea730a319ecd7018428148c81bbba7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Dec 17 18:58:25 2017 -0600
Updated flatten-headers.py for python3.
Details:
- Modified flatten-headers.py to work with python 3.x. This mostly
amounted to removing print statements (which I replaced with calls
to my_print(), a wrapper to sys.stdout.write()). Thanks to Stefan
Husmann for pointing out the script's incompatibility with python 3.
- Other minor changes/cleanups.
commit 90b11b79c302f208791bdfb1ed754873103c7ce5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Dec 17 17:34:32 2017 -0600
Modest performance boost to flatten-headers.py.
Details:
- Updated flatten-headers.py to pre-compile the main regular expression
used to isolate #include directives and the header filenames they
reference. The compiled regex object is then used over and over on
each header file in the tree of referenced headers. This appears to
have provided a 1.7-2x performance increase in the best case.
- Other minor tweaks, such as renaming the main recursive function from
replace_pass() to flatten_header().
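The compile-once, match-many idea behind this speedup can be sketched in C with POSIX regcomp()/regexec(). This is only an analogue of the technique, not the Python script itself; the pattern, the function names, and the choice to match only quoted #include directives are assumptions made for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

static regex_t demo_include_re;   /* compiled pattern, reused for every line */
static int     demo_re_ready = 0;

/* Compile the pattern once; later calls are no-ops. Returns 0 on success. */
int demo_regex_setup(void)
{
    if (demo_re_ready) return 0;

    int rc = regcomp(&demo_include_re,
                     "^[[:space:]]*#[[:space:]]*include[[:space:]]*\"([^\"]+)\"",
                     REG_EXTENDED);
    if (rc == 0) demo_re_ready = 1;
    return rc;
}

/* If line is an #include "file" directive, copy the filename into out
   (truncating if needed) and return 1; otherwise return 0. */
int demo_match_include(const char* line, char* out, size_t out_len)
{
    regmatch_t m[2];

    assert(line != NULL && out != NULL);
    if (!demo_re_ready || out_len == 0) return 0;
    if (regexec(&demo_include_re, line, 2, m, 0) != 0) return 0;

    size_t n = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (n >= out_len) n = out_len - 1;
    memcpy(out, line + m[1].rm_so, n);
    out[n] = '\0';
    return 1;
}
```

Calling demo_regex_setup() once before walking the header tree avoids recompiling the pattern for every file, which is the same trade the commit describes.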
commit 99dee87f30b4d437fa6b5e4ba862526d07b9f08b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Dec 17 16:47:27 2017 -0600
Reimplemented flatten-headers.sh in python.
Details:
- Added flatten-headers.py, a python implementation of the bash script
flatten-headers.sh. The new script appears to be 25-100x faster,
depending on the operating system, filesystem, etc. The python script
abides by the same command line interface as its predecessor and
targets python 2.7 or later. (Thanks to Devin Matthews for suggesting
that I look into a python replacement for higher performance.)
- Activated use of flatten-headers.py in common.mk via the FLATTEN_H
variable.
- Made minor tweaks to flatten-headers.sh such as spelling corrections
in comments.
commit d9c0574599c3f97c0f9b6c334a077bab9452e1f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 14 17:13:42 2017 -0600
Allow travis failures of OS X builds that run testsuite.
Details:
- Added an allowance for OS X builds that run the testsuite to fail.
There seems to be an issue with 1m when running in Travis CI under
OS X and clang, but only in double-precision. Haven't been able to
reproduce the error on my own, and thus, I can't debug it. (Hopefully
it is simply a version-specific compiler bug.)
commit 86cd23b7379b00a42b4ecc04fa668f1e3f9b54ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 14 15:47:41 2017 -0600
Fixed testsuite Makefile brokenness from 9091a207.
Details:
- Fixed a makefile error encountered when building the testsuite directly
in its directory (as opposed to indirectly via 'make test'). The fix
involves introducing a new variable, BUILD_PATH, alongside the existing
DIST_PATH variable. By default, BUILD_PATH is set to the current
directory, and is overridden by other Makefiles used by, for example,
the testsuite and standalone test drivers in testsuite or test,
respectively.
- Some files/directories in common.mk were redefined in terms of
BUILD_DIR, such as the locations of config.mk file and the intermediate
include directory.
commit 6a3a8924c04d25507fc4aa593df30c56c7dc12f7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 14 13:20:02 2017 -0600
Temporarily show Makefile's testsuite output.
Details:
- Disabled redirection of testsuite output for 'test' target. This is
part of an attempt to debug a segmentation fault on OS X via Travis.
commit 9a01080dd426915bed18229f70401bfa639dc283
Merge: 83316485 a32e8a47
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 14 11:27:19 2017 -0600
Merge branch 'master' into selfinit
commit a32e8a47c022b6071302b2956af5728976c83ca9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 13 16:31:36 2017 -0600
Added an exclusion to .travis.yml.
Details:
- Added exclusion for out-of-tree builds on OS X (clang).
commit b9f7d987df548965c86e16e0ba94d5cad0d9b399
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 13 16:22:09 2017 -0600
Cleaned up after previous travis oot debugging.
Details:
- Removed debugging output from common.mk related to Travis CI
out-of-tree builds.
- Other minor cleanups to common.mk.
commit 9091a207aa8c49e279676ea02be533480b3b0d5a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 13 16:12:34 2017 -0600
Attempted fix to travis oot build failure.
Details:
- Found the likely cause of the Travis CI out-of-tree build failures:
config.mk was being read from DIST_PATH, rather than the current
directory.
commit c01c71c33e236e6c91f5ddd3ec1e3faec89368c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 13 15:58:50 2017 -0600
Added debugging output to Makefile.
Details:
- Added $(info ...) statements in key locations in an attempt to reveal
why Travis CI doesn't like building BLIS out-of-tree.
commit 784289d69dd6b3692444d3b3e290f6a014465b72
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 13 15:31:27 2017 -0600
Updated SHELL in common.mk from /bin/bash to bash.
commit d9bb1d1d4ebc89ea75d9d927d09882162a914f77
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 13 15:27:54 2017 -0600
Defined SHELL in common.mk so "echo -n" works.
Details:
- Defined the SHELL variable in common.mk as "/bin/bash" so that the
-n option can be used with echo in the Makefile rule for flattening
blis.h. Thanks to Devin Matthews for suggesting this fix.
commit 9289a08667df2044f3a37af54d893efe2b56d555
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 13 15:14:27 2017 -0600
Attempt 3 on .travis.yml.
commit 720bfcf0ef54fdc41df0dcaa94503edb0d5c8972
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 13 14:52:28 2017 -0600
More fixes to .travis.yml.
Details:
- Fixed a mistake (hopefully) in d0c4dd0 that resulted in many more
osx/clang sub-tests than intended.
- Shortened the variable names in an effort to make them more readable
via the Travis CI web interface.
commit 8717c9c97fe9b1ecd3b3192049a73976f8390ca7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 13 14:36:37 2017 -0600
Added 'pwd' commands to .travis.yml for debugging.
Details:
- Added 'pwd' commands to the script portion of the .travis.yml file in
an attempt to uncover the problem with the recent out-of-tree build
testing changes made in d0c4dd0.
commit 83316485ce10f6fcafe92a1c146282de0dd8068a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 13 14:14:50 2017 -0600
Simplified/fixed self-initialization.
Details:
- Fixed a race condition in self-initialization whereby the bli_is_init
static variable could be erroneously read as TRUE by thread 1 while
thread 0 is still executing bli_init_apis(), thus allowing thread 1 to
use the library before it is actually ready. Thanks to Minh Quan Ho
and Devin Matthews for pointing out this issue.
- Part of the solution to the aforementioned race condition involved
replacing the runtime initialization of the global scalar constants
(e.g., BLIS_ONE, BLIS_ZERO) in bli_const.c with a static
initialization of those same constants. This eliminates the need for
bli_const_init() altogether. (The static initialization is made concise
via preprocessor macros.)
- Defined bli_gks_query_cntx_noinit(), which behaves just like
bli_gks_query_cntx(), except that it does not call bli_init_once(). This
function is called in lieu of bli_gks_query_cntx() in bli_ind_init() and
bli_memsys_init() so as to not result in any recursion into
bli_init_once().
- Removed BLIS_ONE_HALF, BLIS_MINUS_ONE_HALF global scalar constants.
They have no use in BLIS or its test products, and we have little reason
to believe they are used by others.
- Removed testsuite/out file, which was accidentally committed as part
of 70640a3.
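To illustrate why static initialization sidesteps this kind of race, here is a minimal sketch in which constants are fully formed at load time via a preprocessor macro, so no thread can observe them half-initialized. The demo_const_t layout, the DEMO_CONST_INIT macro, and the accessor are hypothetical and do not reflect the actual contents of bli_const.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant record holding one value in several types. */
typedef struct
{
    float  s;
    double d;
    int    i;
} demo_const_t;

/* Expands to a compile-time initializer, so no runtime init function
   is required and there is nothing for threads to race on. */
#define DEMO_CONST_INIT(val) { (float)(val), (double)(val), (int)(val) }

static const demo_const_t DEMO_ONE       = DEMO_CONST_INIT( 1);
static const demo_const_t DEMO_ZERO      = DEMO_CONST_INIT( 0);
static const demo_const_t DEMO_MINUS_ONE = DEMO_CONST_INIT(-1);

/* Returns the constant record for which, where which is -1, 0, or 1. */
const demo_const_t* demo_const(int which)
{
    assert(which >= -1 && which <= 1);
    switch (which)
    {
        case  1: return &DEMO_ONE;
        case  0: return &DEMO_ZERO;
        case -1: return &DEMO_MINUS_ONE;
        default: return NULL;
    }
}
```

Because the records live in static storage with constant initializers, they are valid before main() starts and before any library initialization runs.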
commit 6526d1d4ae6dbfa854ca8d1e5f224cd6ab3fa958
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 12 13:50:43 2017 -0600
Added temp_dir argument to flatten-headers.sh.
Details:
- Added "temp_dir" argument to flatten-headers.sh so that the caller can
specify where intermediate files should be created as the script runs.
- Updated flatten-headers.sh to create intermediate files in temp_dir
instead of alongside the corresponding source files. This should now
(once again) allow out-of-tree builds where the BLIS distribution is
read-only, or where the out-of-tree build is running concurrently with
another out-of-tree build. (Thanks to Devin Matthews for pointing out
the possibility of simultaneous out-of-tree builds.)
commit 94755017c967630daf2e31c1f63ed5e88ab0d6ab
Merge: d0c4dd00 5cf7b0c4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 12 12:50:41 2017 -0600
Merge branch 'master' of github.com:flame/blis
commit d0c4dd000ff38acc249e8acf7e0655a523991695
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 12 12:47:53 2017 -0600
Added out-of-tree build test to .travis.yml file.
Details:
- Modified .travis.yml file to include an out-of-tree build test (using
the "auto" configure target). Thanks to Devin Matthews for this
suggestion.
commit 5cf7b0c4e52922069183a87dc2aa177419644e04
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Dec 12 12:38:48 2017 -0600
Ignore blis.h.interm [ci skip]
commit 8d8ff74d15b4a584929cec36034ba6d3c53f7d27
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 12 12:32:50 2017 -0600
Further attempt to fix out-of-tree builds.
Details:
- Fix applied in 87978f6 was necessary but not sufficient to fix
out-of-tree builds. It turns out that using a source tree that had
already built the target erroneously gave the impression that
out-of-tree builds were working again, when in fact they were still
broken. The additional changes in this commit should complete the
fix that was started in the aforementioned commit. Thanks to Devin
Matthews and Shaden Smith for their help in isolating this issue.
commit 70640a37109290b57c344083c00624e13c496e30
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 11 17:18:43 2017 -0600
Implemented library self-initialization.
Details:
- Defined two new functions in bli_init.c: bli_init_once() and
bli_finalize_once(). Each is implemented with pthread_once(), which
guarantees that, among the threads that pass in the same pthread_once_t
data structure, exactly one thread will execute a user-defined function.
(Thus, there is now a runtime dependency against libpthread even when
multithreading is not enabled at configure-time.)
- Added calls to bli_init_once() to top-level user APIs for all
computational operations as well as many other functions in BLIS to
all but guarantee that BLIS will self-initialize through the normal
use of its functions.
- Rewrote and simplified bli_init() and bli_finalize() and related
functions.
- Added -lpthread to LDFLAGS in common.mk.
- Modified the bli_init_auto()/_finalize_auto() functions used by the
BLAS compatibility layer to take and return no arguments. (The
previous API that tracked whether BLIS was initialized, and then
only finalized if it was initialized in the same function, was too
cute by half and borderline useless because by default BLIS stays
initialized when auto-initialized via the compatibility layer.)
- Removed static variables that track initialization of the sub-APIs in
bli_const.c, bli_error.c, bli_init.c, bli_memsys.c, bli_thread, and
bli_ind.c. We don't need to track initialization at the sub-API level,
especially now that BLIS can self-initialize.
- Added a critical section around the changing of the error checking
level in bli_error.c.
- Deprecated bli_ind_oper_has_avail() as well as all functions
bli_<opname>_ind_get_avail(), where <opname> is a level-3 operation
name. These functions had no use cases within BLIS and likely none
outside of BLIS.
- Commented out calls to bli_init() and bli_finalize() in testsuite's
main() function, and likewise for standalone test drivers in 'test'
directory, so that self-initialization is exercised by default.
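The pthread_once() pattern described above can be sketched as follows. The demo_* names are hypothetical stand-ins for the library's own functions, and the initialization body is a placeholder counter rather than real setup work.

```c
#include <assert.h>
#include <pthread.h>

/* One control object shared by every caller. */
static pthread_once_t demo_once_ctl   = PTHREAD_ONCE_INIT;
static int            demo_init_calls = 0;

/* Placeholder for the real setup work; executed by exactly one thread. */
static void demo_init_apis(void)
{
    demo_init_calls += 1;
}

/* Safe to call from any thread, any number of times. */
void demo_init_once(void)
{
    int rc = pthread_once(&demo_once_ctl, demo_init_apis);
    assert(rc == 0);
    (void)rc;
}

/* A hypothetical user-facing operation that self-initializes. */
int demo_compute(int x)
{
    demo_init_once();
    return 2 * x;
}

/* Reports how many times the setup routine has run. */
int demo_init_count(void)
{
    return demo_init_calls;
}
```

Callers never need to invoke an explicit init function in this scheme; the first call to demo_compute() triggers setup, and pthread_once() ensures other callers do not proceed until it has completed. On older toolchains the program may need to be linked with -lpthread, consistent with the LDFLAGS change noted above.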
commit 70a64432ee5a7adbee10fb7ff6d7b608c1940a7a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 11 13:14:20 2017 -0600
Fixed off-by-one indexing in bli_cpuid.c.
Details:
- In bli_cpuid.c, fixed an off-by-one indexing statement in vpu_count()
whereby a string-terminating NUL character, '\0', is written beyond
the bounds of the model_num string.
- Minor whitespace and formatting edits to bli_cpuid.c.
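The class of bug fixed here can be illustrated with a small sketch. The buffer size and the demo_copy_model() helper are hypothetical and are not taken from vpu_count(); the point is only that the terminator must land at an index no greater than the buffer length minus one.

```c
#include <assert.h>
#include <string.h>

#define DEMO_BUF_LEN 8

/* Copies at most DEMO_BUF_LEN-1 characters of src into buf and always
   terminates inside the buffer. Writing buf[DEMO_BUF_LEN] = '\0' would
   be the off-by-one: valid indices are 0 through DEMO_BUF_LEN-1.
   Returns the number of characters copied. */
size_t demo_copy_model(char buf[DEMO_BUF_LEN], const char* src)
{
    assert(buf != NULL && src != NULL);

    size_t n = strlen(src);
    if (n > DEMO_BUF_LEN - 1) n = DEMO_BUF_LEN - 1;

    memcpy(buf, src, n);
    buf[n] = '\0';
    return n;
}
```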
commit 87978f6261a080d261d01f9acf4e9cc18855c833
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 11 12:49:03 2017 -0600
Fixed broken out-of-tree builds since 52f9e6f.
Details:
- Added missing $(DIST_PATH)/ prefix to relative path to flatten-headers.sh
script in common.mk so that the script could be found during out-of-tree
builds. Thanks to Devin Matthews for reporting this bug.
commit 513ef4d040f89a18dda5154e8c4cf1aaf7463999
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 11 12:35:59 2017 -0600
Various typecasting fixes, mis-typed enums, etc.
Details:
- Fixed implicit typecasting of conj_t to trans_t in bli_[un]packm_cxk.c.
- Properly typecast integer arguments to match format specifier in various
calls to printf() in bli_l3_thrinfo.c, bli_cntx.c, bli_pool.c, and
bli_util_oapi.c.
- Fixed "unsigned less-than-comparison with zero" checks in bli_check.c,
bli_cntx.h.
- Fixed mis-typed enums in bli_cntx.c (e.g., l1mkr_t that should have been
l1fkr_t or l1vkr_t).
- Fixed instances of opid_t value BLIS_GEMM that should have been l3ukr_t
value BLIS_GEMM_UKR in bli_cntx_ref.c.
- NOTE: These issues were identified via compiler warnings when building
BLIS with clang on a rather old installation of OS X:
$ clang --version
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin15.2.0
Thread model: posix
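For readers unfamiliar with the "unsigned less-than-comparison with zero" warning mentioned above, the sketch below shows the general shape of the issue. The helper names are hypothetical and unrelated to the actual checks in bli_check.c or bli_cntx.h.

```c
#include <assert.h>
#include <stddef.h>

/* With an unsigned index, a test such as "idx < 0" can never be true,
   which is what the compiler warns about. Only the upper bound is
   meaningful here. */
int demo_index_is_valid(size_t idx, size_t len)
{
    return idx < len;
}

/* With a signed index, the lower bound must be checked explicitly. */
int demo_index_is_valid_signed(long idx, long len)
{
    assert(len >= 0);
    return 0 <= idx && idx < len;
}
```

Note that a negative value converted to size_t wraps to a very large number, so the unsigned version still rejects it through the upper-bound test.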
commit 3bc99a96a3648f51b9acdc8a8c7e1cf4eb815459
Merge: 3a441183 78199c53
Author: prangana <pradeep.rao@amd.com>
Date: Mon Dec 11 12:53:03 2017 +0530
Fix merge conflicts after rebase with release branch
Change-Id: I581b26c6d515f717ff0dce91c7c0c92553aa2630
commit 3a44118398955d6f872e01f73ae5bb4a4f8500f7
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Wed Nov 15 11:11:17 2017 +0530
Added AMD copyright line to the changed files in last 3 commits
Change-Id: I37d5dbbbe1b199e07529610a5e9cc9e49d067c66
commit 268a56c06e94d1c388766dbfe81d54efbe432809
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 1 11:51:41 2017 -0500
Revert to default SIMD alignment for bulldozer.
Details:
- Removed the default-overriding #define of BLIS_SIMD_ALIGN_SIZE set in
config/bulldozer/bli_kernel.h. Not sure where this value came from, but
it would seem to allow for insufficient starting address alignment for
any matrices created via bli_malloc_user(), such as via
bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
led us to this bug.
- This commit is a manual patch of the same fix made to the 'rt' branch
in 8f150f2.
commit 510a6863e28277f9446abfb77f1aea9f01d37e7a
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Oct 30 10:04:42 2017 -0500
Fix CVECFLAGS for bulldozer config.
commit c669716790bdda5d2b11ea0a026cbc121b228842
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Tue Oct 24 16:36:36 2017 +0530
Adding __attribute__((constructor/destructor)) for CLANG case.
CLANG supports __attribute__, but its documentation doesn't
mention support for constructor/destructor. Compiling with
clang and testing shows that it does support this.
Change-Id: Ie115b20634c26bda475cc09c20960d687fb7050b
commit 24e64a9d0877d788357fc63d4b947e977f8697f7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 18 13:41:25 2017 -0500
Removed a duplicate bli_avx512_macros.h header.
Details:
- Removed a duplicate header file that was causing problems during
installation for the 'knl' configuration. Thanks to Victor Eijkhout
for reporting this issue.
commit 9c0a3c4c0260cbfefb9f11532f46508b4fd19ec2
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Mon Oct 16 22:06:57 2017 +0530
Thread Safety: Move bli_init() before and bli_finalize() after main()
BLIS provides APIs to initialize and finalize its global context.
One application thread can finalize BLIS, while other threads
in the application are still using BLIS.
This issue can be solved by removing bli_finalize() from API.
One way to do this is by getting bli_finalize() to execute by default
after application exits from main().
GCC supports this behaviour with the help of __attribute__((destructor))
added to the function that needs to be executed after main exits.
Similarly bli_init() can be made to run before application enters main()
so that application need not call it.
Change-Id: I7ce6cfa28b384e92c0bdf772f3baea373fd9feac
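A minimal sketch of the constructor/destructor mechanism described in this commit, assuming GCC or Clang. The demo_* functions and the demo_ready flag are hypothetical placeholders for the library's real initialization and finalization routines.

```c
#include <assert.h>

static int demo_ready = 0;

/* Runs automatically before main() is entered. */
__attribute__((constructor))
static void demo_auto_init(void)
{
    demo_ready = 1;
}

/* Runs automatically after main() returns or exit() is called. */
__attribute__((destructor))
static void demo_auto_finalize(void)
{
    demo_ready = 0;
}

/* Reports whether the automatic initialization has taken place. */
int demo_is_ready(void)
{
    return demo_ready;
}

/* A hypothetical operation that relies on initialization having run. */
int demo_use_library(int x)
{
    assert(demo_ready);
    return x + 1;
}
```

With this arrangement the application never calls an init or finalize function itself, so no thread can finalize the library while another is still using it.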
commit 83f31253eb21c5ecd8a5907835e57720daae0b8b
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Mon Oct 16 21:07:50 2017 +0530
Thread safety: Make the global induced method status array local to thread
BLIS retains a global status array for induced methods, and provides
APIs to modify this state during runtime. So, one application thread
can modify the state, before another starts the corresponding
BLIS operation.
This patch solves this issue by making the induced method status array
local to threads.
Change-Id: Iff59b6f473771344054c010b4eda51b7aa4317fe
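One way to make such a status array local to each thread is thread-local storage. The sketch below uses the __thread storage class (a GCC/Clang extension); whether the patch itself uses this mechanism is an assumption, and the array size, default values, and demo_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define DEMO_NUM_METHODS 3

/* Each thread gets its own copy, initialized to "enabled" (1). */
static __thread int demo_status[DEMO_NUM_METHODS] = { 1, 1, 1 };

void demo_set_status(int method, int enabled)
{
    assert(method >= 0 && method < DEMO_NUM_METHODS);
    demo_status[method] = enabled;
}

int demo_get_status(int method)
{
    assert(method >= 0 && method < DEMO_NUM_METHODS);
    return demo_status[method];
}

/* Thread body: report what this thread sees for method 0. */
static void* demo_probe(void* out)
{
    *(int*)out = demo_get_status(0);
    return NULL;
}

/* Returns the status of method 0 as seen by a freshly created thread,
   or -1 if the thread could not be created. */
int demo_status_seen_by_new_thread(void)
{
    pthread_t t;
    int       seen = -1;

    if (pthread_create(&t, NULL, demo_probe, &seen) != 0) return -1;
    pthread_join(t, NULL);
    return seen;
}
```

A change made by one thread through demo_set_status() is invisible to the others, so one thread can no longer alter the method selection of an operation that another thread is about to start.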
commit e923402e68029be379a4297de3ac6fb155ffd928
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Thu Sep 28 12:15:36 2017 +0530
The inner loop parallelization is turned off by default; the JR and IR loop parameters are set to 1 by default
Change-Id: I8c3c2ecbbd636259f6ffb92768ec04148205c3e5
commit a64c15de19327c7595376d699be676c7003e850e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 26 19:02:53 2017 -0500
Fixed a pthread typo in previous commit.
Details:
- Misnamed 'pthread_mutex_t' type in bli_memsys.c as 'thread_mutex_t'.
commit 42dcd589c37e1a2473ab2e1539207da97aebc07f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 26 17:00:04 2017 -0500
Fixed bugs in gemm/gemmtrsm ukr tests in testsuite.
Details:
- Fixed a bug in gemmtrsm test module that was due to improper partitioning
into a k x k triangular matrix for the purposes of obtaining an mr x k
micropanel of A with which to test.
- Fixed a bug in gemm and gemmtrsm test modules that would only manifest for
very large k (depending on the product of mr x kc on that architecture).
The bug arose from the fact that the test module was triggering the
allocation of blocks from the internal memory pools, which are limited in
size. This allocation imposes an implicit assumption that the micro-
panel being tested with will fit inside, and this assumption is violated
for large values of k. Arbitrarily large k may now be tested for both
operation tests.
- Added OpenMP/pthread critical sections around the setting or getting of
statuses from the induced method operation lookup table in bli_l3_ind.c.
- Added the 'static' keyword to all pthread_mutex_t global variables in BLIS.
- Thanks to Nisanth Padinharepatt of AMD for reporting the first and third
issues.
commit 206beb68ff73b75f5c382413967aacbb8a0aac3a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Sep 9 14:10:15 2017 -0500
Updated bibtex info for BLIS5 (3m4m) article.
commit 0c8c0363aeb1f4aa88f7ec2d02403dab05a6e014
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Mon Aug 28 16:44:42 2017 +0530
Bug fix for the testsuite build failing
Change-Id: I7cd8c9d187387c48b2564e45cbfb8df985e93d77
commit 63d1c84465b50f64787808dd3e8494e683c16821
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Wed Aug 23 13:01:14 2017 +0530
Adding auto hardware detection for Zen
Change-Id: I40ce6705dd66b35000c4ccddffad1c5b65998caf
commit 537fb2a895b09be94b11947696fd2da629be24dd
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Aug 15 10:02:25 2017 -0500
Add vzeroupper to Intel AVX kernels.
commit 7628de3f76f78a44788807605a4601ddda445854
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 10 16:24:28 2017 -0500
Removed trailing enum commas from bli_type_defs.h.
Details:
- Removed trailing commas from enums in bli_type_defs.h. Thanks to
Erling Andersen for pointing out this inconsistency and suggesting
the change.
commit a666fd4e267ffae3d4b21f38d569c61ff56adc9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Aug 5 13:04:31 2017 -0500
Added edge handling to _determine_blocksize_b().
Details:
- Added explicit handling of situations where i == dim to
bli_determine_blocksize_b_sub(). This isn't actually needed by any
current use case within BLIS, but handling the situation is nonetheless
prudent. Thanks to Minh Quan for reporting this issue and requesting
the fix.
commit 0c8afa546d7f33760415519ba328d7c49eb7aa06
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 4 14:17:44 2017 -0500
Fixed a minor bug in level-3 packm management.
Details:
- Fixed a bug in bli_l3_packm() that caused cntl_t-cached packed mem_t
entries to be released and then re-acquired unnecessarily. (In essence,
the "<" operands in the conditional that guards the
release-and-reacquire code block simply needed to be swapped.) The bug
should have only affected performance (rather than the computed result).
Thanks to Minh Quan for identifying and reporting the bug.
commit 6cf68a185d83fa46d438fcef65258ace78e24b13
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Jul 31 15:19:51 2017 -0500
Change lsame_ signature to match lapacke.
commit 6a9bd97295cc4fb1cbcd28f69824a43c073c9a76
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 29 20:17:05 2017 -0500
Fixed pthreads compile bug with previous commit.
Details:
- Erroneously passed family parameter into l3int_t function despite
that function not taking the parameter. Oops.
commit 95adc43d800431dc0a02ca83a51426dbef641ad6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 29 14:53:39 2017 -0500
Moved 'family' field from cntx_t to cntl_t.
Details:
- Removed the family field inside the cntx_t struct and re-added it to the
cntl_t struct. Updated all accessor functions/macros accordingly, as well
as all consumers and intermediaries of the family parameter (such as
bli_l3_thread_decorator(), bli_l3_direct(), and bli_l3_prune_*()). This
change was motivated by the desire to keep the context limited, as much
as possible, to information about the computing environment. (The family
field, by contrast, is a descriptor about the operation being executed.)
- Added additional functions to bli_blksz_*() API.
- Added additional functions to bli_cntx_*() API.
- Minor updates to bli_func.c, bli_mbool.c.
- Removed 'obj' from bli_blksz_*() API names.
- Removed 'obj' from bli_cntx_*() API names.
- Removed 'obj' from bli_cntl_*(), bli_*_cntl_*() API names. Renamed routines
that operate only on a single struct to contain the "_node" suffix to
differentiate with those routines that operate on the entire tree.
- Added enums for packm and unpackm kernels to bli_type_defs.h.
- Removed BLIS_1F and BLIS_VF from bszid_t definition in bli_type_defs.h.
They weren't being used and probably never will be.
commit a98e4aa547f61ab09dd91d11478c2a2ef9882e11
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Jul 20 14:50:13 2017 -0500
Clang can't make up its mind what to support.
commit 32eb36c3e8c2add2528514272044de16faed0c8f
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Jul 20 12:54:58 2017 -0500
Add default #define for __has_extension.
commit 2a9aa134f7c29d3d4fdc160022ff257e61885a95
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Jul 20 10:04:34 2017 -0500
Add fallbacks to __sync_* or __c11_atomic_* builtins when __atomic_* is not supported. Fixes #143.
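The fallback idea can be sketched with a feature test around a single atomic increment. Detecting __atomic_* support through the predefined __ATOMIC_SEQ_CST macro is an assumption made for this illustration and is not necessarily the test the library uses; DEMO_ADD_FETCH and demo_increment() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Prefer the newer __atomic_* builtins when the compiler predefines
   their memory-order macros; otherwise fall back to the legacy
   __sync_* builtins, which imply a full barrier. */
#if defined(__ATOMIC_SEQ_CST)
  #define DEMO_ADD_FETCH(p, v) __atomic_add_fetch((p), (v), __ATOMIC_SEQ_CST)
#else
  #define DEMO_ADD_FETCH(p, v) __sync_add_and_fetch((p), (v))
#endif

/* Atomically increments *counter and returns the new value. */
int demo_increment(int* counter)
{
    assert(counter != NULL);
    return DEMO_ADD_FETCH(counter, 1);
}
```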
commit 6f07a034d575e1e9e30bb6417b8fcb77cf301297
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 19 15:40:48 2017 -0500
Updated ar option list used by all configurations.
Details:
- Dropped 'u' from the list of modifiers passed into the library archiver
ar. Previously, "cru" was used, while now we employ only "cr". This
change was prompted by a warning observed on Ubuntu 16.04:
ar: `u' modifier ignored since `D' is the default (see `U')
This caused me to realize that the default mode causes timestamps to be
zero, and thus the 'u' option, which causes only changed object files to
be inserted, is not applicable.
commit 32bc03f9eed8795cfd2f2615d1c9f8673e039c57
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 19 13:51:53 2017 -0500
Added --force-version=STRING option to configure.
Details:
- Added an option to configure that allows the user to force an arbitrary
version string at configure-time. The help text also now describes the
usage information.
- Changed the way the version string is communicated to the Makefile.
Previously, it was read into the VERSION variable from the 'version' file
via $(shell cat ...). Now, the VERSION variable is instead set in
config.mk (via a configure-substituted anchor from config.mk.in).
commit befaee6dd8b2a72de9e0461fe2ec1f36e9f88f3c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 18 17:56:00 2017 -0500
Updated openmp/pthread barriers with GNU atomics.
Details:
- Updated the non-tree openmp and pthreads barriers defined in
bli_thrcomm_openmp.c and bli_thrcomm_pthreads.c to instead call a common
implementation in bli_thrcomm.c, bli_thrcomm_barrier_atomic(). This new
implementation goes through the same motions as the previous codes, but
protects its loads and increments with GNU atomic built-ins. These atomic
statements take memory ordering parameters that allow us to specify just
enough constraints for the barrier to work as intended on weakly-ordered
hardware. The prior implementation was only guaranteed to work on systems
with strongly-ordered memory. (Thanks to Devin Matthews for suggesting
this change and his crash-course in atomics and memory ordering.)
- Removed 'volatile' from structs' barrier field declarations in
bli_thrcomm_*.h.
- Updated bli_thrcomm_pthread.? files to use renamed struct barrier fields
consistent with that of the _openmp.? files.
- Updated other bli_thrcomm_* files to rename "communicator" variables to
simply "comm".
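A minimal sketch of a sense-reversing barrier built on GNU atomic built-ins, in the spirit of the description above. It is not the body of bli_thrcomm_barrier_atomic(); the type and function names (barrier_sketch_t, barrier_sketch_init, barrier_sketch_wait) and the specific memory orders chosen are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical barrier state; the field names are illustrative. */
typedef struct {
    int n_threads;        /* number of threads that must arrive */
    int threads_arrived;  /* arrivals counted in the current round */
    int sense;            /* flips each time the barrier opens */
} barrier_sketch_t;

static void barrier_sketch_init(barrier_sketch_t* b, int n_threads)
{
    assert(b != NULL && n_threads > 0);
    b->n_threads       = n_threads;
    b->threads_arrived = 0;
    b->sense           = 0;
}

static void barrier_sketch_wait(barrier_sketch_t* b)
{
    /* Remember the sense of the round this thread is entering. */
    int orig_sense = __atomic_load_n(&b->sense, __ATOMIC_RELAXED);

    /* Announce arrival. Acquire-release ordering keeps this thread's
       earlier writes ordered before the updated count. */
    int arrived = __atomic_add_fetch(&b->threads_arrived, 1,
                                     __ATOMIC_ACQ_REL);

    if (arrived == b->n_threads) {
        /* Last arrival: reset the count, then publish the flipped sense
           with release ordering so waiters also observe the reset. */
        __atomic_store_n(&b->threads_arrived, 0, __ATOMIC_RELAXED);
        __atomic_fetch_xor(&b->sense, 1, __ATOMIC_RELEASE);
    } else {
        /* Spin until the last arrival flips the sense. */
        while (__atomic_load_n(&b->sense, __ATOMIC_ACQUIRE) == orig_sense) { }
    }
}
```

Because the sense flips once per round, the same barrier object can be reused immediately for the next round without reinitialization.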
commit 8f739cc847fcff2ddeeb336f8b2b9d080eb16f6c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 17 19:03:22 2017 -0500
Added API to set mt environment variables.
Details:
- Renamed bli_env_get_nway() -> bli_thread_get_env().
- Added bli_thread_set_env() to allow setting environment variables
pertaining to multithreading, such as BLIS_JC_NT or BLIS_NUM_THREADS.
- Added the following convenience wrapper routines:
bli_thread_get_jc_nt()
bli_thread_get_ic_nt()
bli_thread_get_jr_nt()
bli_thread_get_ir_nt()
bli_thread_get_num_threads()
bli_thread_set_jc_nt()
bli_thread_set_ic_nt()
bli_thread_set_jr_nt()
bli_thread_set_ir_nt()
bli_thread_set_num_threads()
- Added #include "errno.h" to bli_system.h.
- This commit addresses issue #140.
- Thanks to Chris Goodyer for inspiring these updates.
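A minimal sketch of what getting and setting such a variable can look like underneath, assuming a POSIX environment. The helpers env_get_int() and env_set_int() are hypothetical and are not the signatures of bli_thread_get_env() or bli_thread_set_env().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for setenv() under strict -std modes */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read an integer environment variable. Returns
   'fallback' when the variable is unset or does not start with a number. */
static long env_get_int(const char* name, long fallback)
{
    assert(name != NULL);
    const char* str = getenv(name);
    if (str == NULL) return fallback;

    char* end   = NULL;
    long  value = strtol(str, &end, 10);
    return (end == str) ? fallback : value;
}

/* Hypothetical helper: set (or overwrite) an integer environment
   variable. Returns 0 on success, as setenv() does. */
static int env_set_int(const char* name, long value)
{
    assert(name != NULL);
    char buf[32];
    snprintf(buf, sizeof buf, "%ld", value);
    return setenv(name, buf, 1);
}
```

With helpers like these, a call such as env_set_int("BLIS_NUM_THREADS", 4) followed by env_get_int("BLIS_NUM_THREADS", -1) yields 4.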
commit 10163833075fd42be5b5b503acc855f91a484cfd
Author: Marat Dukhan <marat@fb.com>
Date: Thu Jul 13 21:39:24 2017 -0700
Fix Emscripten builds
commit c09b30d115eade72f44f37bf90aa848c9c0e79af
Author: Minh Quan HO <mqho@kalray.eu>
Date: Fri Jul 7 10:52:05 2017 +0200
set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers
The membrk's free_fp is called when releasing GEN_USE buffers, but this free_fp is
not set in bli_membrk_init
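A minimal sketch of the pattern behind this fix: a broker object that stores both an allocation and a deallocation function pointer, and must have both set at initialization time. All names (broker_sketch_t and its functions) are hypothetical and do not reflect the actual membrk_t layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void* (*malloc_fn_t)(size_t);
typedef void  (*free_fn_t)(void*);

/* Hypothetical broker holding the functions used for general-use buffers. */
typedef struct {
    malloc_fn_t malloc_fp;
    free_fn_t   free_fp;
} broker_sketch_t;

static void broker_sketch_init(broker_sketch_t* b, malloc_fn_t m, free_fn_t f)
{
    assert(b != NULL && m != NULL && f != NULL);
    b->malloc_fp = m;
    b->free_fp   = f;  /* omitting this leaves release() calling garbage */
}

static void* broker_sketch_acquire(broker_sketch_t* b, size_t n_bytes)
{
    return b->malloc_fp(n_bytes);
}

static void broker_sketch_release(broker_sketch_t* b, void* buf)
{
    assert(b->free_fp != NULL);
    b->free_fp(buf);
}
```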
commit 997628ed9793c72e9ef576dd8d715cfec27c4862
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Fri Jun 30 12:23:19 2017 +0530
Reducing the framework overhead of GEMV routines
Change-Id: I83607ad767bff74e305e915b54b0ea34ec3e5684
commit ee869066168239b710ad9938bb0e1ae454883f3a
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Tue Jul 4 12:57:32 2017 +0530
Improved efficiency of dGEMM for large matrices by reducing TLB load misses and, chiefly, L3 cache misses. This is achieved by changing the packed block sizes of matrix A & B. Now the optimum values are MC_D = 510 and KC_D = 1024.
Change-Id: I2d8bdd5f62f2d1f8782ae2997f3d7a26587d1ca4
commit 7b933b90b1859c96de49a402d48de82909bc73e5
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Jun 6 20:23:17 2017 -0500
Add new SSI acknowledgment
commit 3485abba4b426fbf42b146a9611a0841f6d236c6
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Wed May 24 11:48:16 2017 +0530
Checked in the small matrix code to compute GEMM called with A transpose case
Change-Id: I29f40046d43d7a4b037c1cb322503ee26495f462
commit de16beb83b29b4b9748f70db985b0fe04db85f7d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri May 26 14:49:31 2017 -0400
PACKDIM_MR=8 didn't work out, but messing with the prefetching helps 2%.
commit 25d0e618544b6eea7d3f13c7aec513ac0139801d
Author: Devin Matthews <dmatthews@gator3.ufhpc>
Date: Fri May 26 14:47:36 2017 -0400
Revert "Change PACKDIM_MR (double) for haswell to 8."
This reverts commit 681eec913d7c2ebcff637cec5c1627ced9a92b99.
commit c5bdd84b35bc2a8ebf55b7763fb56c0c945be0cb
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri May 26 12:28:09 2017 -0500
Change PACKDIM_MR (double) for haswell to 8.
commit 172789d562001293b973bbdd8015bd27d37292e8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 17 13:03:52 2017 -0500
Restored deleted lines from makefile fragments.
commit 3ea9bd2c8e90dbd35655fa6a5b953dfea1f308fe
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed May 17 12:29:44 2017 -0500
Change to /bin/sh.
All scripts checked with Debian's checkbashisms. Also check for clang first in auto-detect.sh.
commit 49438409eedb98d3f0ebf00b8d1eee0ae45f4f8c
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed May 17 12:27:14 2017 -0500
Remove shebangs from makefiles.
commit 497e2640474c016d576dce3530fa6a66891642a0
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 23:11:22 2017 -0400
Fix if/else structure. Thanks to TravisCI.
commit 835035c56a8de36ad25bb8d1375db170d489ef57
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:23:27 2017 -0400
Mark piledriver compilable w/ clang.
commit 6cdb533472ee61af297c1f948307abbf45828887
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:12:12 2017 -0400
Mark bulldozer compilable w/ clang.
commit a85697d62272da06d28cd1c947f6cf1098df6467
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:06:59 2017 -0400
Correct error message.
commit e0c64cad271058688a2b999caf8c2767dc3aef7e
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:03:23 2017 -0400
Indeed one can compile for carrizo also using clang.
commit 4aafe0505d3f0954d095ded5459a76976e5093b4
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 21:50:49 2017 -0400
A bunch of shebang fixes from unportable /bin/bash to portable /usr/bin/env bash
commit abaeaa68ea11e84be1810f564d6f38d506cbeb6a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 5 15:06:56 2017 -0500
Fixed a bug in norm1v, norm1m.
Details:
- Fixed a bug that manifested as improperly-computed 1-norm for vectors
and matrices. This is one of the few operations in BLIS that does not
have its own test module within the testsuite, hence why it went
undetected for so long. The bad 1-norms were being used to normalize
matrices in the testsuite after initialization, which led to some
matrices containing a combination of "large" and "small" values. This
tended to push the residuals computed after each test away from zero.
In some cases, they were off *just* enough for the testsuite to label
it a "failure". Many thanks to Jeff Hammond for reporting this bug.
(Wonky details: the bug was due to improperly-defined level-0 scalar
macros for abval2, an operation that computes the absolute square,
or complex magnitude/modulus. Certain complex domain instances of
abval2 were being incorrectly defined in terms of real-only solutions,
leading to bad results. This level-0 operation forms the basis of
norm1v/norm1m. absq2 was also affected, but almost nothing uses
this operation.)
commit cc3107ae1c2074f72b724aa748d2e5b4cb290ed5
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu May 4 10:35:22 2017 -0500
Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS. Missing BLIS_NT_XX's are defaulted to 1. Fixes #123.
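A minimal sketch of the override rule described above, written as a pure function so it can be checked without touching the environment. The names (ways_sketch_t, resolve_ways, WAYS_UNSET) are hypothetical; a value of WAYS_UNSET stands in for an environment variable that is not set.

```c
#include <assert.h>
#include <stddef.h>

#define WAYS_UNSET (-1)  /* stands in for "environment variable not set" */

/* Hypothetical per-loop ways of parallelism. */
typedef struct { int jc, ic, jr, ir; } ways_sketch_t;

/* Returns 1 and fills *w if at least one per-loop value is set; unset
   values default to 1. Returns 0 if none are set, in which case the
   caller would fall back to the total thread count. */
static int resolve_ways(ways_sketch_t* w, int jc, int ic, int jr, int ir)
{
    assert(w != NULL);

    int any_set = (jc != WAYS_UNSET) || (ic != WAYS_UNSET) ||
                  (jr != WAYS_UNSET) || (ir != WAYS_UNSET);
    if (!any_set) return 0;

    w->jc = (jc != WAYS_UNSET) ? jc : 1;
    w->ic = (ic != WAYS_UNSET) ? ic : 1;
    w->jr = (jr != WAYS_UNSET) ? jr : 1;
    w->ir = (ir != WAYS_UNSET) ? ir : 1;
    return 1;
}
```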
commit c8ab91f70d399ee14edd30a3a5c46b24c5d2f910
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 3 15:04:51 2017 -0500
Disable complex 3m/4m in testsuite by default.
Details:
- Disabled testsuite tests of all level-3 implementations based on 3m
and 4m. This will improve testing runtime on Travis CI as well as for
anyone manually running the testsuite using default test parameters.
Thanks to Devin Matthews for suggesting this change.
commit 9700f0e5785007ddafb72a5ca83800dee61fd35c
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Tue May 2 19:25:21 2017 -0700
allow KNL build without hbwmalloc.h (i.e. emulated)
we want to be able to run BLIS KNL binaries on non-KNL machines via SDE.
although it is possible to install hbwmalloc implementation on such
systems, it is easier not to, since obviously the performance of SDE
execution is not representative so there is no reason to emulate HBW
allocation.
commit 17dcd5a33ff91967f67e7c0ba09b4f18754609a4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 16:48:43 2017 -0500
Fixed stray parentheses in README citations.
commit 2910d44ff9e1d951d3249313f4ab39d18ea1b48d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 16:38:43 2017 -0500
CHANGELOG update (0.2.2)
commit 5ca3863220e07972fcefc6682ddd3f6e54fe4a94
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 15:48:30 2017 -0500
Fixed a trsm1m bug that affected right-side cases.
Details:
- Fixed a bug introduced in 1c732d3 that affected trsm1m_r. The result
was nondeterministic behavior (usually segmentation faults) for certain
problem sizes beyond the 1m instance of kc (e.g. 128 on haswell). The
cause of the bug was my commenting out lines in bli_gemm1m_ukr_ref.c
which explicitly directed the virtual gemm micro-kernel to use temporary
space if the storage preference of the [real domain] gemm ukernel did
not match the storage of the output matrix C. In the context of gemm,
this handling is not needed because agreement between the storage pref
and the matrix is guaranteed by a high-level optimization in BLIS.
However, this optimization is not applied to trsm because the storage
of C is not necessarily the same as the storage of the micro-panels of
B--both of which are updated by the micro-kernel during a trsm
operation. Thus, the guarantee of storage/preference agreement is not
in place for trsm, which means we must handle that case within the
virtual gemm micro-kernel.
- Comment updates and a minor macro change to bli_trsm*_cntx_init() for
3m1, 4m1a, and 1m.
commit 1af0b09f5c275ee7bac896cc6f36f42af721d9b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 12:09:39 2017 -0500
README.md update.
Details:
- Updated bibtex entries for 4th BLIS paper, and added entries for 5th
and 6th BLIS papers.
commit db4a0bb8ba7cd697d68be8e5632371ee3e59fd63
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 17 12:07:27 2017 -0500
Whitespace reformatting to armv8a kernels file.
Details:
- Updated formatting of function signature/header in
kernels/armv8a/3/bli_gemm_opt_4x4.c.
commit e3eb01f6b990e205b15edcbaffd3d54b3ddd1ca4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 21 15:33:39 2017 -0600
Disabled experiment-related 1m code.
Details:
- Commented out code in frame/ind/oapi/bli_l3_3m4m1m_oapi.c that was
specifically inserted to facilitate the benchmarking of 1m block-panel
and panel-block algorithms.
- Updates to test/3m4m/Makefile, runme.sh script, and test_gemm.c to
reflect changes used/needed during benchmarking.
commit 4f61528d56eed6a139eeac9db0c44e56f2d2d136
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 25 16:25:46 2017 -0600
Added 1m-specific APIs for bp, pb gemm algorithms.
Details:
- Defined bli_gemmbp_cntl_create(), bli_gemmpb_cntl_create(), with the
body of bli_gemm_cntl_create() replaced with a call to the former.
- Defined bli_cntl_free_w_thrinfo(), bli_cntl_free_wo_thrinfo(). Now,
bli_cntl_free() can check if the thread parameter is NULL, and if so,
call the latter, and otherwise call the former.
- Defined bli_gemm1mbp_cntx_init(), bli_gemm1mpb_cntx_init(), both in
terms of bli_gemm1mxx_cntx_init(), which behaves the same as
bli_gemm1m_cntx_init() did before, except that an extra bool parameter
(is_pb) is used to support both bp and pb algorithms (including to
support the anti-preference field described below).
- Added support for "anti-preference" in context. The anti_pref field,
when true, will toggle the boolean return value of routines such as
bli_cntx_l3_ukr_eff_prefers_storage_of(), which has the net effect of
causing BLIS to transpose the operation to achieve disagreement (rather
than agreement) between the storage of C and the micro-kernel output
preference. This disagreement is needed for panel-block implementations,
since they induce a transposition of the suboperation immediately before
the macro-kernel is called, which changes the apparent storage of C. For
now, anti-preference is used only with the pb algorithm for 1m (and not
with any other non-1m implementation).
- Defined new functions,
bli_cntx_l3_ukr_eff_prefers_storage_of()
bli_cntx_l3_ukr_eff_dislikes_storage_of()
bli_cntx_l3_nat_ukr_eff_prefers_storage_of()
bli_cntx_l3_nat_ukr_eff_dislikes_storage_of()
which are identical to their non-"eff" (effectively) counterparts except
that they take the anti-preference field of the context into account.
- Explicitly initialize the anti-pref field to FALSE in
bli_gks_cntx_set_l3_nat_ukr_prefs().
- Added bli_gemm_ker_var1.c, which implements a panel-block macro-kernel
in terms of the existing block-panel macro-kernel _ker_var2(). This
technique requires inducing transposes on all operands and swapping
the A and B.
- Changed bli_obj_induce_trans() macro so that pack-related fields are
also changed to reflect the induced transposition.
- Added a temporary hack to bli_l3_3m4m1m_oapi.c that allows us to easily
specify the 1m algorithm (block-panel or panel-block).
- Renamed the following cntx_t-related macros:
bli_cntx_get_pack_schema_a() -> bli_cntx_get_pack_schema_a_block()
bli_cntx_get_pack_schema_b() -> bli_cntx_get_pack_schema_b_panel()
bli_cntx_get_pack_schema_c() -> bli_cntx_get_pack_schema_c_panel()
and updated all instantiations. Also updated the field names in the
cntx_t struct.
- Comment updates.
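A minimal sketch of the "anti-preference" toggle described in this entry: the effective query is the plain preference query XORed with the anti_pref flag. The struct and function names below are hypothetical and only loosely modeled on the bli_cntx_l3_ukr_eff_prefers_storage_of() family; storage is reduced to a single row/column boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slice of a context: the microkernel's output preference
   and the anti-preference flag. */
typedef struct {
    bool ukr_prefers_rows;  /* true: row storage preferred; false: column */
    bool anti_pref;         /* true: invert the effective preference */
} pref_cntx_sketch_t;

/* Plain query: does the microkernel prefer the storage of C? */
static bool ukr_prefers_storage_of(const pref_cntx_sketch_t* cntx,
                                   bool c_is_row_stored)
{
    assert(cntx != NULL);
    return cntx->ukr_prefers_rows == c_is_row_stored;
}

/* "Effective" query: same answer, toggled when anti_pref is set. */
static bool ukr_eff_prefers_storage_of(const pref_cntx_sketch_t* cntx,
                                       bool c_is_row_stored)
{
    return ukr_prefers_storage_of(cntx, c_is_row_stored) != cntx->anti_pref;
}
```

With anti_pref set, code that transposes the operation whenever the effective query returns false ends up steering toward disagreement between C and the microkernel preference, which is the behavior the entry says panel-block implementations need.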
commit 1d728ccb2394e77365e7c42683db6579c5fba014
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 25 18:29:49 2016 -0600
Implemented the 1m method.
Details:
- Implemented the 1m method for inducing complex domain matrix
multiplication. 1m support has been added to all level-3 operations,
including trsm, and is now the default induced method when native
complex domain gemm microkernels are omitted from the configuration.
- Updated _cntx_init() operations to take a datatype parameter. This was
needed for the corresponding function for 1m (because 1m requires us
to choose between column-oriented or row-oriented execution, which
requires us to query the context for the storage preference of the
gemm microkernel, which requires knowing the datatype) but I decided
that it made sense for consistency to add the parameter to all other
cntx initialization functions as well, even though those functions
don't use the parameter.
- Updated bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs() to take
a second scalar for each blocksize entry. The semantic meaning of the
two scalars now is that the first will scale the default blocksize
while the second will scale the maximum blocksize. This allows scaling
the two independently, and was needed to support 1m, which requires
scaling for a register blocksize but not the register storage
blocksize (i.e., "packdim") analogue.
- Deprecated bli_blksz_reduce_dt_to() and defined two new functions,
bli_blksz_reduce_def_to() and bli_blksz_reduce_max_to(), for reducing
default and maximum blocksizes to some desired blocksize multiple.
These functions are needed in the updated definitions of
bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs().
- Added support for the 1e and 1r packing schemas to packm, including
1e/1r packing kernels.
- Added a minor optimization to bli_gemm_ker_var2() that allows, under
certain circumstances (specifically, real domain beta and row- or
column-stored matrix C), the real domain macrokernel and microkernel
to be called directly, rather than using the virtual microkernel
via the complex domain macrokernel, which carries a slight additional
amount of overhead.
- Added 1m support to the testsuite.
- Added 1m support to Makefile and runme.sh in test/3m4m. Also simplified
some code in test_gemm.c driver.
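A minimal scalar sketch of the arithmetic fact that makes an induced method like 1m possible: a complex product can be computed as a purely real 2x2 matrix times a real 2-vector, so real-domain kernels can do the work. Treating the expanded 2x2 form and the split real/imaginary vector as loose analogues of the 1e and 1r packing schemas is an assumption here, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double re, im; } cplx_sketch_t;

/* Computes a*b using only real multiplies and adds, arranged as
       [ a.re  -a.im ] [ b.re ]
       [ a.im   a.re ] [ b.im ]                                      */
static cplx_sketch_t cplx_mul_via_real(cplx_sketch_t a, cplx_sketch_t b)
{
    /* "Expanded" real representation of a. */
    double a_exp[2][2] = { { a.re, -a.im },
                           { a.im,  a.re } };
    /* Real and imaginary parts of b stored as a plain real vector. */
    double b_vec[2] = { b.re, b.im };

    cplx_sketch_t c;
    c.re = a_exp[0][0] * b_vec[0] + a_exp[0][1] * b_vec[1];
    c.im = a_exp[1][0] * b_vec[0] + a_exp[1][1] * b_vec[1];
    return c;
}
```

For example, (1 + 2i)(3 + 4i) = -5 + 10i falls out of the two real dot products above.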
commit 0d1b90286e29aa8b768e280b5286d92c02ad87a1
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Tue Oct 25 21:15:26 2016 -0700
never use libm with Intel compilers
Intel compilers include a highly optimized math library (libimf) that
should be used instead of GNU libm.
yes, this change is for ALL targets, including those that are not
supported by the Intel compiler. there is no harm in doing this, and it
is future-proof in the event that the Intel compilers support other
architectures.
commit b150870397e7aee558e61d1bd72a0c0d1d99bee8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 8 16:08:41 2017 -0600
Removed most "old" directories.
Details:
- Removed the vast majority of directories named "old", which contained
deprecated code that I wasn't quite ready to jettison from the source
tree.
commit 270c65985df849297ba1951aa3b56c03948d7775
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 8 15:21:18 2017 -0600
Modified bli_getopt() for thread-safety.
Details:
- Changed the interface of bli_getopt() to take a new argument, a getopt_t
struct, that stores the values of optarg, optind, opterr, and optopt,
and updated the implementation accordingly. (Previously, these
variables were assumed to be global.)
- Added a function for initializing a getopt_t struct.
- Changed test_libblis.c--currently the only consumer of bli_getopt()--to
utilize the new getopt_t state object.
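A minimal sketch of the idea: a getopt-like parser whose state lives in a caller-supplied struct instead of globals, so independent callers cannot interfere with one another. This is not bli_getopt(); the names opt_state_t, opt_state_init, and opt_next are hypothetical, and the parser handles only single-character options (no grouped flags, no "--").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-caller parser state replacing the usual globals. */
typedef struct {
    const char* optarg;  /* argument of the most recent option, or NULL */
    int         optind;  /* index of the next argv element to examine */
    int         optopt;  /* option character that caused the last error */
} opt_state_t;

static void opt_state_init(opt_state_t* st)
{
    assert(st != NULL);
    st->optarg = NULL;
    st->optind = 1;
    st->optopt = 0;
}

/* Returns the option character, '?' on an unknown option or a missing
   argument, or -1 when no options remain. */
static int opt_next(int argc, char* const argv[], const char* optstring,
                    opt_state_t* st)
{
    assert(st != NULL && optstring != NULL);
    st->optarg = NULL;

    if (st->optind >= argc) return -1;
    const char* arg = argv[st->optind];
    if (arg[0] != '-' || arg[1] == '\0') return -1;  /* not an option */

    char        c    = arg[1];
    const char* spec = (c == ':') ? NULL : strchr(optstring, c);
    st->optind++;
    if (spec == NULL) { st->optopt = c; return '?'; }

    if (spec[1] == ':') {                    /* option takes an argument */
        if (arg[2] != '\0')          st->optarg = arg + 2;            /* -gVALUE  */
        else if (st->optind < argc)  st->optarg = argv[st->optind++]; /* -g VALUE */
        else { st->optopt = c; return '?'; }
    }
    return c;
}
```

Two opt_state_t objects can walk the same argv independently, which is the property that global optind/optarg cannot offer.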
commit ce4d8fabc2e39371f89c12192fb707be82ae021a
Merge: 39be59f2 e05a8dfa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 7 17:36:44 2017 -0600
Merge branch 'master' of github.com:flame/blis
commit 39be59f2a8470f40475907d9dd52639b8a911a92
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 7 17:35:20 2017 -0600
Replaced several macros with static function APIs.
Details:
- Reimplemented several sets of get/set-style preprocessor macros with
static functions, including those in the following frame/base headers:
auxinfo, cntl, mbool, mem, membrk, opid, and pool. A few headers in
frame/thread were touched as well: mutex_*, thrcomm, and thrinfo.
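A minimal sketch of the kind of conversion described, using a made-up struct. The names (pool_sketch_t and its accessors) are hypothetical; the point is only that a static function gives the accessor a typed parameter and a typed return value, which a macro does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct standing in for one of the framework objects. */
typedef struct {
    size_t block_size;
    int    top_index;
} pool_sketch_t;

/* Macro-style accessor: the argument is pasted textually and never
   type-checked against pool_sketch_t. */
#define pool_sketch_block_size_macro(p) ((p)->block_size)

/* Static-function accessors: arguments and results are type-checked,
   and the compiler remains free to inline them. */
static inline size_t pool_sketch_block_size(const pool_sketch_t* pool)
{
    assert(pool != NULL);
    return pool->block_size;
}

static inline void pool_sketch_set_block_size(pool_sketch_t* pool, size_t n)
{
    assert(pool != NULL);
    pool->block_size = n;
}
```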
commit e05a8dfa7cc7df41e966c1ad04e51c482b308b23
Merge: 79507337 4423e33d
Author: dnp <devangiparikh@gmail.com>
Date: Wed Dec 6 16:45:24 2017 -0600
Merge branch 'rt'
commit 4423e33dc593115cda92c5763d756d7ad1298aa9
Author: dnp <devangiparikh@gmail.com>
Date: Wed Dec 6 16:35:03 2017 -0600
Adding SKX kernels and configuration.
commit 79507337e140daec7639f6eb3ed9cfe6e123d342
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 6 16:21:35 2017 -0600
Various checks to ensure that arch_t id is in range.
Details:
- Expanded checking of the arch_t id in bli_gks.c--either passed in from
the caller or as returned from bli_arch_query_id()--against the expected
range of id values. Thanks to Devangi Parikh for suggesting these
additional sanity checks.
commit fde7c1126c58373ecde83471890b257399144876
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 4 16:11:01 2017 -0600
Added 'uninstall-old-headers' target to Makefile.
Details:
- Defined a new 'uninstall-old-headers' target that allows users of BLIS to
uninstall no-longer-needed headers left over from previous installations.
- Fixed the 'uninstall-old' target so that it will uninstall both .a and .so
libraries.
- Renamed 'uninstall-old' to 'uninstall-old-libs'.
- Added 'uninstall-old' target (different from previous 'uninstall-old'
target) that combines 'uninstall-old-libs' and 'uninstall-old-headers'.
commit d4ee770bde213a87aa6049245145318324dc6b51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 4 14:53:43 2017 -0600
Create/install monolithic cblas.h.
Details:
- When CBLAS is enabled at configure-time, BLIS now creates a monolithic
cblas.h using the same flatten-header.sh script that was recently
introduced for creating monolithic blis.h header files. The top-level
Makefile will also install this cblas.h file into the install prefix
alongside blis.h when the 'install' target is invoked. The two header
files are compatible with one another. Regardless whether the user's
source #includes cblas.h, both blis.h and cblas.h, or just blis.h,
the user will get the CBLAS function prototypes and enums, as expected.
commit 52f9e6f1b6468785af8947317656445d4729fc8b
Merge: ab57b979 21360dd8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 1 12:28:09 2017 -0600
Merge branch 'rt'
commit 21360dd8e2c7287100645e109acaabcc6ba1140c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 29 14:11:34 2017 -0600
Fixed cntx_t packm query when ker_id > _NUM_PACKM_KERS.
Details:
- Fixed a subtle bug in bli_cntx_get_[un]packm_ker_dt() in which the
function fails to return NULL when passed a kernel id argument that is
equal to or beyond BLIS_NUM_[UN]PACKM_KERS. Instead, the function was
attempting to index into the cntx_t's packm kernel array, which resulted
in undefined behavior. Thanks to Devangi Parikh for finding this bug.
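A minimal sketch of the bounds check at issue: a kernel-table query that returns NULL for an out-of-range id instead of indexing past the end of the array. The table type, its size, and the function names are hypothetical and are not the cntx_t layout.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*kernel_fn_t)(void);

enum { SKETCH_NUM_KERS = 4 };  /* hypothetical table size */

typedef struct {
    kernel_fn_t kers[SKETCH_NUM_KERS];
} kertab_sketch_t;

/* Returns the kernel registered under ker_id, or NULL if ker_id is
   equal to or beyond the number of table entries. */
static kernel_fn_t kertab_sketch_get(const kertab_sketch_t* tab,
                                     unsigned int ker_id)
{
    assert(tab != NULL);
    if (ker_id >= (unsigned int)SKETCH_NUM_KERS) return NULL;
    return tab->kers[ker_id];
}

/* Placeholder kernel used only to populate the table in examples. */
static void sketch_dummy_kernel(void) { }
```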
commit 244a6f4e66e8ff091e995f8090ce779c1928aa8b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 28 17:48:48 2017 -0600
Fixed POSIX sed non-compliance in flatten-header.sh.
Details:
- Changed GNU usage of 'i' and 'a' sed commands used in flatten-header.sh
to POSIX-compliant usage that will work on OS X's sed.
commit 45078621676833e53a2878af8f89479c4f93b8ab
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 28 15:16:22 2017 -0600
Generate/compile with/install monolithic blis.h.
Details:
- Rewrote monolithify-header.sh (and renamed to flatten-header.sh) so that
headers are inserted recursively. This improves performance by a factor
of 3-4x.
- Modified configure to create an 'include/<configname>' directory in which
make can create a monolithic header.
- Modified the top-level Makefile so that a monolithic header is generated
unconditionally prior to compilation (stored in include/<configname>) and
so that the single header is installed instead of the 450 or so header
files that reside throughout the framework source tree.
- Added "include/*/*.h" to .gitignore file.
- Removed some pnacl/emscripten leftovers that I intended to include in
a1caeba (mostly in testsuite/Makefile).
- Trivial comment changes to frame/include/bli_f2c.h.
commit 1f30b1301bf6d6047ec29e57a5fde8eb1072a0ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 25 16:54:26 2017 -0600
Added missing framework support for x86_64 family.
Details:
- Added support for the x86_64 configuration family to bli_arch.c and
bli_arch_config.h. Thanks to Johannes Dieterich for reporting this
issue.
- Bumped the default value for BLIS_SIMD_NUM_REGISTERS from 16 to 32 and
the default value for BLIS_SIMD_SIZE from 32 to 64. This will support
configuration families that include Skylake and newer processors without
any support needed in the bli_family_*.h file. The semantics of these
values have always been "maximum" and not exact values; comments in
bli_kernel_macro_defs.h and the github wiki have been adjusted
accordingly.
commit 9f39806c4ed484c9ed13edf96005838d977722a9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 21 16:03:56 2017 -0600
Fixed a bug in e31f0b3/b131b9a.
Details:
- Erroneously placed the "don't overwrite existing blocksize" logic in
bli_blksz_init*() rather than in bli_cntx_set_blkszs(). It belongs in
the latter because that function copies blocksizes as-is from the
blksz_t function argument to the appropriate field in the cntx_t. If
the blksz_t was previously initialized selectively, based on the sign
of the blocksize value passed into bli_blksz_init*(), that just leaves
some fields possibly uninitialized (with garbage values), which
definitely will not work.
- The aforementioned logic has been moved to bli_cntx_set_blkszs() via
a new function bli_blksz_copy_if_pos(), which selectively copies only
the blocksizes that are greater than zero.
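A minimal sketch of the "copy only if positive" behavior described for bli_blksz_copy_if_pos(). The struct below (one value per datatype) and the function name are hypothetical simplifications, not the real blksz_t or the real signature.

```c
#include <assert.h>
#include <stddef.h>

enum { SKETCH_NUM_DT = 4 };  /* e.g. s, d, c, z */

/* Hypothetical blocksize object: one value per datatype. */
typedef struct {
    long v[SKETCH_NUM_DT];
} blksz_sketch_t;

/* Copies only the strictly positive entries of src into dst, leaving
   the remaining entries of dst (e.g. reference defaults) untouched. */
static void blksz_sketch_copy_if_pos(const blksz_sketch_t* src,
                                     blksz_sketch_t*       dst)
{
    assert(src != NULL && dst != NULL);
    for (int dt = 0; dt < SKETCH_NUM_DT; ++dt) {
        if (src->v[dt] > 0) dst->v[dt] = src->v[dt];
    }
}
```

Doing the filtering at copy time means the source object is always fully initialized, with zero or negative values acting purely as "leave the destination alone" markers.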
commit b131b9a025c15f548d4c2952a9ec85eee3d139b1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 21 14:30:26 2017 -0600
Updated configs to omit setting some blocksizes.
Details:
- Employ the new semantics of bli_blksz_init*() in e31f0b3 in various
sub-configurations' bli_cntx_init_*() functions by passing in 0 for
register and cache blocksizes that correspond to gemm microkernel
datatypes that were not registered, allowing the default values
set by the bli_cntx_init_*_ref() function call to remain.
commit 499a4c002f895744ecaf81ef7f62d2d6d0d7d594
Merge: e31f0b3e 6c3ba502
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 21 14:25:08 2017 -0600
Merge branch 'rt' of github.com:flame/blis into rt
commit e31f0b3e2dba19ca8a2946bc21beb136a42d0f57
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 21 14:21:25 2017 -0600
Subtle update to bli_blksz_init*() API.
Details:
- Updated the semantics of bli_blksz_init() and bli_blksz_init_ed() so
that non-positive blocksize values are ignored entirely. This provides
an easy way to indicate that certain existing values should not be
touched by the update. Thanks to Devangi Parikh for feedback that led
to these changes.
commit 6c3ba502a11f87bc67555d26154cfd39d0af1bac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 21 13:50:53 2017 -0600
Added 'x86_64' sub-config directory.
Details:
- Added missing x86_64 configuration directory, which was intended to be
part of b7ca580.
- Added -Wfatal-errors compiler warning flag to all configurations so that
compilation stops after the first error.
- Changed the vectorization flags for intel64 configuration to be compatible
with 'penryn', the oldest sub-config included in that family.
- Changed the vectorization flags for penryn to target the 'core2'
microarchitecture and ssse3.
commit 25eee3cc49b0631812485d4d5ceef0c23ed1b6dd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 21 12:34:20 2017 -0600
Added a dummy file to kernels/generic.
Details:
- Added a dummy file to kernels/generic, which was previously empty, so
that git would begin tracking the otherwise-empty directory. This
directory's existence is necessary for proper execution of configure
for any configuration family that contains the 'generic'
sub-configuration. Thanks to Johannes Dieterich for reporting the
issue that led to this fix.
commit ef024ce4cafa217669eaabb31ff8ab6df93cca05
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 20 18:08:29 2017 -0600
More tweaks to monolithify-header.sh
Details:
- Further fixes to the monolithify-header.sh script.
- Removed unnecessary #include "blis.h" from frame/3/bli_l3_packm.h.
commit 5028e7dec269b62895511453272585da36e591b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 20 17:00:37 2017 -0600
Second attempt to implement travis_wait.
Details:
- Corrected accidental misplacement of the travis_wait prefix (on the
wrong line of the .travis.yml file) in commit 13e5d91.
commit 13e5d9107b3763cba46fb1bae87476852601b47c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 20 15:57:06 2017 -0600
Added travis_wait prefix to testsuite via Travis.
Details:
- It appears that Travis CI has implemented a new policy that results in
a test failing if it does not produce any output for more than 10
minutes. (Two test instances are now failing in Travis despite the most
recent commit not affecting the library or testsuite.) This issue can
be worked around by executing the test run via travis_wait, which takes
an optional time parameter. This commit attempts to use 'travis_wait 30'
in the .travis.yml file to prevent the early failure at 10 minutes.
commit a1caeba0ea79c8fecb1abadca1f91c6367ab3afb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 20 13:31:20 2017 -0600
Removed pnacl, emscripten support from Makefile.
commit 78199c539beaa50f37893add220261ce0dcb921a
Merge: b3d8ab2e ab57b979
Author: praveeng <praveen.g@amd.com>
Date: Mon Nov 20 15:51:20 2017 +0530
Merge master code till 01-Nov-2017 to amd-staging
Change-Id: I40b53f876db84c8b947b3f2385c9b882245c6603
commit 9df6dda9ec51a0d40166169d2d8a2f84b42266e6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 18 19:03:26 2017 -0600
Improvements, bugfixes to monolithify-header.sh.
commit 21d26201f90b884eb8d5de279ed74bbd244ffcb5
Merge: 43baa3b3 b7ca5806
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 18 14:16:53 2017 -0600
Merge branch 'rt' of github.com:flame/blis into rt
commit 43baa3b327d5ae1e2ba619432687b4dd849b05e3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 18 14:14:44 2017 -0600
Removed unnecessary flags for generic config.
Details:
- Removed -D_POSIX_C_SOURCE=200112L and -m64 flags from make_defs.mk file
of generic sub-configuration. These flags are generally not necessary,
and particularly not desirable for the generic configuration since they
unnecessarily restrict the environments in which the configuration can
be built.
commit b7ca580618f9382b7982168fd035ed058f83e4c2
Author: iotamudelta <dieterich@ogolem.org>
Date: Sat Nov 18 14:56:05 2017 -0500
[WIP] Add x86 and x86_64 processor families. (#154)
* Add x86 and x86_64 processor families.
* Use generic config as fallback for more families.
After discussion with fgvanzee, a) it's "generic" and b) use it for all the families as a fallback. The goal is that if a specific CPU is not yet supported by a family (say a new Intel microarchitecture on x86_64), it'll fall through and still work with the slower "generic" kernels.
commit 870597d1663aaba1b74d7654b1d4946280aa0d3f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 17 17:06:42 2017 -0600
Added bash script for creating monolithic headers.
Details:
- Added a new script, monolithify-header.sh, to the 'build' directory.
This script recursively replaces all #include directives in a selected
file with the contents of the header files referenced by each directive.
The idea is to "flatten" a tree of .h files into a single file, with
the script acting as a C preprocessor that only processes #include
directives.
commit c76f77f4cc1e71988251c5e63cf6ef137477bf9c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 17 15:10:52 2017 -0600
Removed unnecessary #include "blis.h" from header.
Details:
- Removed an errant #include "blis.h" directive from bli_cntx_ind_stage.h.
The general policy is that no header file in BLIS should include
blis.h. This will be important in the near future when using a tool to
recursively create a monolithic blis.h file from its constituent
headers.
commit 2bb9bc6e9536fa239fbc19a7efaaf151116e15b4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 17 13:50:14 2017 -0600
Miscellaneous tweaks to gks, rt functionality.
Details:
- Updated bli_cpuid_query_id() so that BLIS_ARCH_GENERIC is always returned
if the hardware fails to test positive for any supported sub-configuration.
- Defined bli_gks_init_ref_cntx(), which will call the context initialization
function bli_cntx_init_configname() for the sub-configuration 'configname'
associated with the arch_t id returned by bli_arch_query_id(). This makes
initializing a reference context easy for experts who wish to construct
those contexts.
commit b3d8ab2ea02c127ab241532abc214624f35bfaab
Merge: 189ffbb0 fe71c06e
Author: Santanu Thangaraj <Santanu.Thangaraj@amd.com>
Date: Wed Nov 15 01:33:12 2017 -0500
Merge "Added AMD copyright line to the changed files in last 3 commits" into amd-staging
commit fe71c06e42b072407c83112779055b0afb67173d
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Wed Nov 15 11:11:17 2017 +0530
Added AMD copyright line to the changed files in last 3 commits
Change-Id: I37d5dbbbe1b199e07529610a5e9cc9e49d067c66
commit d5bf79e50bf97072bbe7117c86b7c45e6e707ea0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 13 14:24:29 2017 -0600
Miscellaneous tweaks and fixes.
Details:
- Fixed incorrect calling sequence in bli_cntx_init_knl.c--an instance of
bli_blksz_init_easy() that should have been bli_blksz_init().
- Fixed a bug in the code that is supposed to output the list of sub-directories
in the 'config' directory when the configure script is run with no arguments.
- Expanded the output of "make showconfig" to include more info from config.mk.
- Minor changes to build/auto-detect/cpuid_x86.c, mostly in preparation for
someone to add excavator and zen support.
- Added a link to the ConfigurationHowTo wiki to config_registry.
- Other minor tweaks to configure.
commit 673e5184030532c4ebd9fdeecbaa6442bb3ad54f
Merge: 2c51356a 8f150f28
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 1 17:37:42 2017 -0500
Merge branch 'rt' of github.com:flame/blis into rt
commit 2c51356a8b2699c99f9507c80d69c08a35d45fe3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 1 17:37:02 2017 -0500
Implemented runtime hardware detection via cpuid.
Details:
- Added runtime support for selecting an appropriate arch_t value based
on the results of the cpuid instruction (for x86_64). This allows
deferral of choosing a context (kernels, blocksizes, etc.) until
runtime, which allows BLIS to be built with support for multiple
microarchitectures. Currently, only amd64 and intel64 configurations
are registered in the config_registry; however, one could create
custom configuration families to support arbitrary sets of x86_64
microarchitectures.
- Current Intel microarchitectures supported via cpuid are knl, haswell,
sandybridge, and penryn.
- Current AMD microarchitectures supported via cpuid are: zen, excavator,
steamroller, piledriver, and bulldozer.
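The selection step described in this commit can be sketched as follows. This is only an illustration, not the BLIS implementation: the feature bits, the enum names, the function select_intel_arch(), and the exact feature set required for each microarchitecture are all assumptions, and the code that would gather the bits from the cpuid instruction is omitted. The fallback to a generic id mirrors the behavior later described for bli_cpuid_query_id().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits; a real implementation would fill these in
   from the registers returned by the cpuid instruction. */
enum {
    FEAT_SSE3     = 1u << 0,
    FEAT_SSSE3    = 1u << 1,
    FEAT_AVX      = 1u << 2,
    FEAT_FMA3     = 1u << 3,
    FEAT_AVX2     = 1u << 4,
    FEAT_AVX512F  = 1u << 5,
    FEAT_AVX512PF = 1u << 6
};

/* Hypothetical stand-in for an arch_t-like id. */
typedef enum {
    ARCH_KNL,
    ARCH_HASWELL,
    ARCH_SANDYBRIDGE,
    ARCH_PENRYN,
    ARCH_GENERIC
} arch_id;

/* True only if every required feature bit is present. */
static int has_all(uint32_t features, uint32_t required)
{
    return (features & required) == required;
}

/* Test from the most capable microarchitecture down to the least, and
   fall back to the generic id if nothing matches. */
arch_id select_intel_arch(uint32_t f)
{
    if (has_all(f, FEAT_AVX | FEAT_FMA3 | FEAT_AVX2 |
                   FEAT_AVX512F | FEAT_AVX512PF))
        return ARCH_KNL;
    if (has_all(f, FEAT_AVX | FEAT_FMA3 | FEAT_AVX2))
        return ARCH_HASWELL;
    if (has_all(f, FEAT_AVX))
        return ARCH_SANDYBRIDGE;
    if (has_all(f, FEAT_SSE3 | FEAT_SSSE3))
        return ARCH_PENRYN;
    return ARCH_GENERIC;
}
```

Because the id is chosen at runtime, a single library built with several sub-configurations can defer the choice of kernels and blocksizes until it knows which hardware it is running on.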
commit ab57b979046479bcda7f83165838a80117c2ad95
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 1 11:51:41 2017 -0500
Revert to default SIMD alignment for bulldozer.
Details:
- Removed the default-overriding #define of BLIS_SIMD_ALIGN_SIZE set in
config/bulldozer/bli_kernel.h. Not sure where this value came from, but
it would seem to allow for insufficient starting address alignment for
any matrices created via bli_malloc_user(), such as via
bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
led us to this bug.
- This commit is a manual patch of the same fix made to the 'rt' branch
in 8f150f2.
commit 8f150f28a678c4a0c1591400177ad7cca81fcaec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 1 11:41:45 2017 -0500
Revert to default SIMD alignment for bulldozer.
Details:
- Removed the default-overriding #define of BLIS_SIMD_ALIGN_SIZE set in
bli_family_bulldozer.h. Not sure where this value came from, but it
would seem to allow for insufficient starting address alignment for
any matrices created via bli_malloc_user(), such as via
bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
led us to this bug.
commit e3f10557caf114441fbfff990e3ce3576c177bdc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 30 13:37:54 2017 -0500
Use perl for some substitution for OS X compatibility.
Details:
- Discovered that sed commands where the replacement string contains '\n'
are problematic with the version of sed present in OS X. For these cases
in the configure script, we instead use 'perl -pe' for
search-and-replace functionality.
- Various other minor comment/whitespace tweaks to configure.
- Removed remaining lines of code related to setting/checking variables to
track "unregistered" configurations.
commit dd45cfdfc3d8f9acf4cf7f69138d9b83dafc8842
Merge: 3e4f42a4 f60c827b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 30 12:23:05 2017 -0500
Merge branch 'master' into rt
commit f60c827ba95f452c8454fb914f5564f4895bf644
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Oct 30 10:04:42 2017 -0500
Fix CVECFLAGS for bulldozer config.
commit 3e4f42a4d2ebb37b95988933d92e561c5b2cc201
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 27 11:41:37 2017 -0500
Typecast l1mkr_t enum value prior to comparison.
Details:
- Typecast l1mkr_t enum value in bli_cntx.h to guint_t before testing for
out-of-range value. This is an attempt to pacify a strange warning from
clang on OS X that is seemingly the result of the following compiler
warning flag:
-Wtautological-constant-out-of-range-compare
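A minimal sketch of the kind of cast described above, assuming a hypothetical enum and bound (ker_id_t and KER_ID_MAX are illustrative names, not BLIS identifiers). The value is converted to an unsigned integer before the range test, so the comparison is made on a plain integer rather than on the enum type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an enum such as l1mkr_t. */
typedef enum { KER_A = 0, KER_B, KER_C } ker_id_t;

/* One past the largest valid id (hypothetical). */
#define KER_ID_MAX 3u

/* Cast first, then compare: the range check becomes an ordinary
   unsigned comparison instead of a comparison on the enum type. */
bool ker_id_is_valid(ker_id_t id)
{
    unsigned int u = (unsigned int)id;
    return u < KER_ID_MAX;
}
```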
commit aec6e038d942d35b81bbd723a640cce2c054fb8e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 26 16:12:36 2017 -0500
Removed associative arrays from configure.
Details:
- Implemented a replacement for associative arrays in the configure script
that does not utilize arrays, and therefore works in pre-4.0 versions of
bash. (It appears that Mac OS X will be stuck with version 3.2 indefinitely
due to bash switching to the GPL 3.0 license starting with version 4.0.)
commit 189ffbb0d37262b21acddc0d35b4a22f2cbbca94
Merge: 06e0e635 3eb44f67
Author: Santanu Thangaraj <Santanu.Thangaraj@amd.com>
Date: Wed Oct 25 02:00:30 2017 -0400
Merge changes Ie115b206,I7ce6cfa2,Iff59b6f4 into amd-staging
* changes:
Adding __attribute__((constructor/destructor)) for CLANG case.
Thread Safety: Move bli_init() before and bli_finalize() after main()
Thread safety: Make the global induced method status array local to thread
commit 3eb44f67618b91ae5f5f0aaaba67e38f16042ee4
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Tue Oct 24 16:36:36 2017 +0530
Adding __attribute__((constructor/destructor)) for CLANG case.
CLANG supports __attribute__, but its documentation doesn't
mention support for constructor/destructor. Compiling with
clang and testing shows that it does support this.
Change-Id: Ie115b20634c26bda475cc09c20960d687fb7050b
commit 07c352188bf5265af242255f8e6fcb97050d973d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 23 16:59:22 2017 -0500
Added "generic" configuration.
Details:
- Added a "generic" configuration that leaves the default blocksizes and
kernels unchanged. This replaces the older "reference" configuration.
Updated auto-detect script and code accordingly.
- Added support for generic configuration to arch_t (bli_type_defs.h),
bli_gks_init() (bli_gks.c), and bli_arch_config.h
- Moved bli_arch_query_id() to bli_arch.c (and prototype to bli_arch.h).
- Whitespace changes to configurations' make_defs.mk files.
commit c1a98d6f70608b02a1e6bcad6ba020a60773dace
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 23 14:24:41 2017 -0500
Minor update to .travis.yml file.
commit 75b9383f01caa8b83f8be0117e15085b0d807ba6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 20 16:41:22 2017 -0500
Minor header renaming ahead of bli_arch.c.
Details:
- Renamed the various configurations' "bli_arch_<configname>.h" header files
(replacing "arch" with "family") to free up the 'bli_arch' namespace for a
different purpose (hardware detection).
- Renamed "bli_arch.h" and "bli_arch_pre_macro_defs.h" in frame/include to
"bli_arch_config.h" and "bli_arch_config_pre.h", respectively.
commit 482af51add26d5ed103c3e3f167657f273b32c7a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 20 15:44:26 2017 -0500
Fixed 'make test' target from top-level Makefile.
Details:
- Updated the top-level Makefile's build rule for testsuite object files to
properly obtain CFLAGS via get-frame-cflags-for() function instead of
simply using the $(CFLAGS) variable (which is empty). This means that
'make test' should now work as expected.
commit 3c269f700d207efe6c04193f09d519c88c1d4045
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 20 13:57:21 2017 -0500
Makefile updates for test drivers, testsuite.
Details:
- Fixed semi-broken testsuite Makefile and very-broken test driver Makefiles,
as well as those for test/3m4m, test/thread_ranges, and test/exec_sizes
sub-directories.
- Factored out much of the top-level Makefile into common.mk. A Makefile
needs only set DIST_PATH to the relative path to the top level of the
BLIS source distribution before including common.mk in order to acquire
all of the definitions typically needed in a Makefile that tests BLIS.
commit 0557189d463446b4c32077cdcf0467fa71ca68dc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 18 15:05:27 2017 -0500
Minor updates to .travis.yml, configure script.
commit 2553734d1d62043793f4e783a027349ef6d4d563
Merge: 453deb29 37534279
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 18 13:46:50 2017 -0500
Merge branch 'master' into rt
commit 375342799cbae981c28d831793af588d7951f3f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 18 13:41:25 2017 -0500
Removed a duplicate bli_avx512_macros.h header.
Details:
- Removed a duplicate header file that was causing problems during
installation for the 'knl' configuration. Thanks to Victor Eijkhout
for reporting this issue.
commit 453deb29068889698e274f269c9aa90eea99b527
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 18 13:29:32 2017 -0500
Implemented runtime kernel management.
Details:
- Reworked the build system around a configuration registry file, named
'config_registry', that identifies valid configuration targets, their
constituent sub-configurations, and the kernel sets that are needed by
those sub-configurations. The build system now facilitates the building
of a single library that can contain kernels and cache/register
blocksizes for multiple configurations (microarchitectures). Reference
kernels are also built on a per-configuration basis.
- Updated the Makefile to use new variables set by configure via the
config.mk.in template, such as CONFIG_LIST, KERNEL_LIST, and KCONFIG_MAP,
in determining which sub-configurations (CONFIG_LIST) and kernel sets
(KERNEL_LIST) are included in the library, and which make_defs.mk files'
CFLAGS (KCONFIG_MAP) are used when compiling kernels.
- Reorganized 'kernels' directory into a "flat" structure. Renamed kernel
functions into a standard format that includes the kernel set name
(e.g. 'haswell'). Created a "bli_kernels_<kernelset>.h" file in each
kernels sub-directory. These files exist to provide prototypes for the
kernels present in those directories.
- Reorganized reference kernels into a top-level 'ref_kernels' directory.
This directory includes a new source file, bli_cntx_ref.c (compiled on
a per-configuration basis), that defines the code needed to initialize
a reference context and a context for induced methods for the
microarchitecture in question.
- Rewrote make_defs.mk files in each configuration so that the compiler
variables (e.g. CFLAGS) are "stored" (renamed) on a per-configuration
basis.
- Modified bli_config.h.in template so that bli_config.h is generated with
#defines for the config (family) name, the sub-configurations that are
associated with the family, and the kernel sets needed by those
sub-configurations.
- Deprecated all kernel-related information in bli_kernel.h and transferred
what remains to new header files named "bli_arch_<configname>.h", which
are conditionally #included from a new header bli_arch.h. These files
are still needed to set library-wide parameters such as custom
malloc()/free() functions or SIMD alignment values.
- Added bli_cntx_init_<configname>.c files to each configuration directory.
The files contain a function, named the same as the file, that initializes
a "native" context for a particular configuration (microarchitecture). The
idea is that optimized kernels, if available, will be initialized into
these contexts. Other fields will retain pointers to reference functions,
which will be compiled on a per-configuration basis. These bli_cntx_init_*()
functions will be called during the initialization of the global kernel
structure. They are thought of as initializing for "native" execution, but
they also form the basis for contexts that use induced methods. These
functions are prototyped, along with their _ref() and _ind() brethren, by
prototype-generating macros in bli_arch.h.
- Added a new typedef enum in bli_type_defs.h to define an arch_t, which
identifies the various sub-configurations.
- Redesigned the global kernel structure (gks) around a 2D array of cntx_t
structures (pointers to cntx_t, actually). The first dimension is indexed
over arch_t and the inner dimension is the ind_t (induced method) for
each microarchitecture. When a microarchitecture (configuration) is
"registered" at init-time, the inner array for that configuration in the
2D array is initialized (and allocated, if it hasn't been already). The
cntx_t slot for BLIS_NAT is initialized immediately and those for other
induced method types are initialized and cached on-demand, as needed. At
cntx_t registration, we also store function pointers to cntx_init functions
that will initialize (a) "reference" contexts and (b) contexts for use with
induced methods. We don't cache the full contexts for reference contexts
since they are rarely needed. The functions that initialize these two kinds
of contexts are generated automatically for each targeted sub-configuration
from cpp-templatized code at compile-time. Induced method contexts that
need "stage" adjustments can still obtain them via functions in
bli_cntx_ind_stage.c.
- Added new functions and functionality to bli_cntx.c, such as for setting
the level-1f, level-1v, and packm kernels, and for converting a native
context into one for executing an induced method.
- Moved the checking of register/cache blocksize consistency from being cpp
macros in bli_kernel_macro_defs.h to being runtime checks defined in
bli_check.c and called from bli_gks_register_cntx() at the time that the
global kernel structure's internal context is initialized for a given
microarchitecture/configuration.
- Deprecated all of the old per-operation bli_*_cntx.c files and removed
the previous operation-level cntx_t_init()/_finalize() invocations.
Instead, we now query the gks for a suitable context, usually via
bli_gks_query_cntx().
- Deprecated support for the 3m2 and 3m3 induced methods. (They required
hackery that I was no longer willing to support.)
- Consolidated the 1e and 1r packm kernels for any given register blocksize
into a single kernel that will branch on the schema and support packing
to both formats.
- Added the cntx_t* argument to all packm kernel signatures.
- Deprecated the local function pointer array in all bli_packm_cxk*.c files
and instead obtain the packm kernel from the cntx_t.
- Added bli_calloc_intl(), which serves as the calloc-equivalent to
bli_malloc_intl(). Useful when we wish to allocate and initialize to
zero/NULL.
- Converted existing cpp macro functions defined in bli_blksz.h, bli_func.h,
bli_cntx.h into static functions.
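The redesigned global kernel structure described in this commit (a 2D array of context pointers, indexed by architecture and then by induced method, with the native slot filled at registration and the others filled on demand) can be sketched roughly as below. Every name here (arch_id, ind_id, ctx_t, gks_register(), gks_query()) is hypothetical, the context is reduced to two fields, and the sketch ignores thread safety and cleanup.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical ids standing in for arch_t and ind_t. */
typedef enum { ARCH_HASWELL, ARCH_ZEN, ARCH_GENERIC, NUM_ARCHS } arch_id;
typedef enum { IND_NATIVE = 0, IND_A, IND_B, NUM_INDS } ind_id;

/* Hypothetical, heavily simplified context. */
typedef struct { arch_id arch; ind_id method; } ctx_t;

/* table[arch] points to an inner array of NUM_INDS context pointers,
   or is NULL if that architecture has not been registered. */
static ctx_t **table[NUM_ARCHS];

/* Register an architecture: allocate the inner array if needed and
   initialize the native slot immediately. */
void gks_register(arch_id id)
{
    if (table[id] == NULL) {
        table[id] = calloc(NUM_INDS, sizeof(ctx_t *));
        assert(table[id] != NULL);
    }
    if (table[id][IND_NATIVE] == NULL) {
        ctx_t *c = malloc(sizeof *c);
        assert(c != NULL);
        c->arch = id;
        c->method = IND_NATIVE;
        table[id][IND_NATIVE] = c;
    }
}

/* Query a context: induced-method slots are derived from the native
   context on first use and cached for later queries. */
ctx_t *gks_query(arch_id id, ind_id method)
{
    assert(table[id] != NULL); /* the architecture must be registered */
    if (table[id][method] == NULL) {
        ctx_t *c = malloc(sizeof *c);
        assert(c != NULL);
        *c = *table[id][IND_NATIVE];
        c->method = method;
        table[id][method] = c;
    }
    return table[id][method];
}
```

Caching on demand keeps registration cheap, since contexts for induced methods that are never requested are never built.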
commit 4607aac297e55ad540cbe5fffbe02e6b1889c181
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Mon Oct 16 22:06:57 2017 +0530
Thread Safety: Move bli_init() before and bli_finalize() after main()
BLIS provides APIs to initialize and finalize its global context.
One application thread can finalize BLIS, while other threads
in the application are still using BLIS.
This issue can be solved by removing bli_finalize() from the API.
One way to do this is by getting bli_finalize() to execute by default
after application exits from main().
GCC supports this behaviour with the help of __attribute__((destructor))
added to the function that needs to be executed after main exits.
Similarly bli_init() can be made to run before application enters main()
so that application need not call it.
Change-Id: I7ce6cfa28b384e92c0bdf772f3baea373fd9feac
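A minimal sketch of the mechanism this commit relies on, using the constructor and destructor function attributes supported by GCC and clang. The names lib_auto_init(), lib_auto_finalize(), and lib_do_work() are hypothetical and stand in for the library's own initialization, finalization, and compute routines.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical library state; not the actual BLIS globals. */
static bool lib_is_initialized = false;

/* Runs automatically before main() is entered. */
__attribute__((constructor))
static void lib_auto_init(void)
{
    lib_is_initialized = true;
}

/* Runs automatically after main() returns or exit() is called. */
__attribute__((destructor))
static void lib_auto_finalize(void)
{
    lib_is_initialized = false;
}

/* A library entry point can then assume initialization has happened,
   without the application ever calling an init function. */
int lib_do_work(int x)
{
    assert(lib_is_initialized);
    return 2 * x;
}
```

With this arrangement no application thread can finalize the library while others are still using it, because finalization only runs once the process is exiting.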
commit 0f5ce26fc597cda6e8ae93a7526f52eb8cba01e9
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Mon Oct 16 21:07:50 2017 +0530
Thread safety: Make the global induced method status array local to thread
BLIS retains a global status array for induced methods, and provides
APIs to modify this state during runtime. So, one application thread
can modify the state, before another starts the corresponding
BLIS operation.
This patch solves this issue by making the induced method status array
local to threads.
Change-Id: Iff59b6f473771344054c010b4eda51b7aa4317fe
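One way to make such a status array local to each thread is thread-local storage, sketched below with the C11 _Thread_local storage class. The patch may use a different mechanism; the array dimensions and the accessor names ind_set() and ind_get() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical table dimensions. */
enum { N_METHODS = 4, N_OPS = 3 };

/* Each thread gets its own zero-initialized copy of this array, so a
   change made by one thread is never seen by another. */
static _Thread_local bool ind_enabled[N_METHODS][N_OPS];

/* Enable or disable an induced method for an operation, for the
   calling thread only. */
void ind_set(int method, int op, bool status)
{
    assert(method >= 0 && method < N_METHODS);
    assert(op >= 0 && op < N_OPS);
    ind_enabled[method][op] = status;
}

/* Query the calling thread's status for a method/operation pair. */
bool ind_get(int method, int op)
{
    assert(method >= 0 && method < N_METHODS);
    assert(op >= 0 && op < N_OPS);
    return ind_enabled[method][op];
}
```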
commit b882648af87deb1b365fc6b3e94151e69c5ccfa4
Merge: 8b379069 e02d3cb8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 11 16:32:21 2017 -0500
Merge branch 'master' into rt
commit 06e0e6351acb9481225975ad9a4e0b8925336621
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Thu Sep 28 12:15:36 2017 +0530
The inner loop parallelization is turned off by default; the JR and IR loop parameters are set to 1 by default
Change-Id: I8c3c2ecbbd636259f6ffb92768ec04148205c3e5
commit e02d3cb84190a345ebe9b32f53db03a1838976b1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 26 19:02:53 2017 -0500
Fixed a pthread typo in previous commit.
Details:
- Misnamed 'pthread_mutex_t' type in bli_memsys.c as 'thread_mutex_t'.
commit f5962a1aae0fb3c9be104d0035c0d73210e7f670
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 26 17:00:04 2017 -0500
Fixed bugs in gemm/gemmtrsm ukr tests in testsuite.
Details:
- Fixed a bug in gemmtrsm test module that was due to improper partitioning
into a k x k triangular matrix for the purposes of obtaining an mr x k
micropanel of A with which to test.
- Fixed a bug in gemm and gemmtrsm test modules that would only manifest for
very large k (depending on the product of mr x kc on that architecture).
The bug arose from the fact that the test module was triggering the
allocation of blocks from the internal memory pools, which are limited in
size. This allocation imposes an implicit assumption that the micropanel
being tested with will fit inside, and this assumption is violated
for large values of k. Arbitrarily large k may now be tested for both
operation tests.
- Added OpenMP/pthread critical sections around the setting or getting of
statuses from the induced method operation lookup table in bli_l3_ind.c.
- Added the 'static' keyword to all pthread_mutex_t global variables in BLIS.
- Thanks to Nisanth Padinharepatt of AMD for reporting the first and third
issues.
commit 8e917b256ca2d4bcdc059fe98d86be8775c69561
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Sep 9 14:10:15 2017 -0500
Updated bibtex info for BLIS5 (3m4m) article.
commit 7be887057358df4978a4833eeae0c17e15acd9d1
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Mon Aug 28 17:38:22 2017 +0530
Merging "Adding auto hardware detection for Zen"
Change-Id: Id450fb0c4f91a5cd5cbdc06970f4f9ed28dd8520
commit e056d810d16621891ead032603de0c2105cfc0f7
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Mon Aug 28 16:44:42 2017 +0530
Bug fix for the testsuite build failing
Change-Id: I7cd8c9d187387c48b2564e45cbfb8df985e93d77
commit 83796b7caf745fafc263e9e5e1bfcf5eff00c025
Merge: 8176f4e4 d1ee7762
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Mon Aug 28 05:23:28 2017 -0400
Merge "Adding auto hardware detection for Zen" into amd-staging
commit d1ee776202b26874333af7a91b6d2686342c4c81
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Wed Aug 23 13:01:14 2017 +0530
Adding auto hardware detection for Zen
Change-Id: I40ce6705dd66b35000c4ccddffad1c5b65998caf
commit 8176f4e43872714b997f1a5f83056daadb0ff1a5
Merge: 12413018 adafe974
Author: praveeng <praveen.g@amd.com>
Date: Mon Aug 28 12:21:16 2017 +0530
resolving conflicts bli_gemm_front.c and LICENCE
Change-Id: Id24ce53896d4c1c7ceccc3e004014a0ecceb5474
commit 57e1e5cd51e7ffe8612c96a20b6a041b55426ddb
Merge: f86ce54d d6ef56c6
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Tue Aug 22 17:07:44 2017 +0530
Merge AMD authored changes
commit adafe974b4bc3fc0663bc2f6f4ce2fde71a97988
Merge: f86ce54d 7dc78b49
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Aug 15 15:17:21 2017 -0500
Merge pull request #150 from devinamatthews/vzeroupper
Add vzeroupper to Intel AVX kernels.
commit 7dc78b49f97e6b3cd6d72fcdc588ace534d0e700
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Aug 15 10:02:25 2017 -0500
Add vzeroupper to Intel AVX kernels.
commit f86ce54d6f315006984534fe29e47a2deaacc9f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 10 16:24:28 2017 -0500
Removed trailing enum commas from bli_type_defs.h.
Details:
- Removed trailing commas from enums in bli_type_defs.h. Thanks to
Erling Andersen for pointing out this inconsistency and suggesting
the change.
commit 60a1eeb2317939d732b9eb6ff1e0d6d668c9a1e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Aug 5 13:04:31 2017 -0500
Added edge handling to _determine_blocksize_b().
Details:
- Added explicit handling of situations where i == dim to
bli_determine_blocksize_b_sub(). This isn't actually needed by any
current use case within BLIS, but handling the situation is nonetheless
prudent. Thanks to Minh Quan for reporting this issue and requesting
the fix.
commit b01c80829907d50ec79977fba8e7b53cfe7db80a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 4 14:17:44 2017 -0500
Fixed a minor bug in level-3 packm management.
Details:
- Fixed a bug in bli_l3_packm() that caused cntl_t-cached packed mem_t
entries to be released and then re-acquired unnecessarily. (In essence,
the "<" operands in the conditional that guards the
release-and-reacquire code block simply needed to be swapped.) The bug
should have only affected performance (rather than the computed result).
Thanks to Minh Quan for identifying and reporting the bug.
commit 8b379069fcd4811669855b1248ece831f190dff6
Merge: 1f3a5819 05925dd5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 1 15:30:40 2017 -0500
Merge branch 'master' into rt
commit 05925dd5d30e8f403bb671ce33029170d65ce7c0
Merge: 803bbef0 cecdc05d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Aug 1 09:31:02 2017 -0500
Merge pull request #146 from devinamatthews/master
Change lsame_ signature to match lapacke.
commit cecdc05d2834786a84ff85775d3f99a958c0765a
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Jul 31 15:19:51 2017 -0500
Change lsame_ signature to match lapacke.
commit 803bbef0a386dd0571ad389f69d55154dbfe3c50
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 29 20:17:05 2017 -0500
Fixed pthreads compile bug with previous commit.
Details:
- Erroneously passed family parameter into l3int_t function despite
that function not taking the parameter. Oops.
commit c63980f4ca750618f359031d0691289b1abf5146
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 29 14:53:39 2017 -0500
Moved 'family' field from cntx_t to cntl_t.
Details:
- Removed the family field inside the cntx_t struct and re-added it to the
cntl_t struct. Updated all accessor functions/macros accordingly, as well
as all consumers and intermediaries of the family parameter (such as
bli_l3_thread_decorator(), bli_l3_direct(), and bli_l3_prune_*()). This
change was motivated by the desire to keep the context limited, as much
as possible, to information about the computing environment. (The family
field, by contrast, is a descriptor about the operation being executed.)
- Added additional functions to bli_blksz_*() API.
- Added additional functions to bli_cntx_*() API.
- Minor updates to bli_func.c, bli_mbool.c.
- Removed 'obj' from bli_blksz_*() API names.
- Removed 'obj' from bli_cntx_*() API names.
- Removed 'obj' from bli_cntl_*(), bli_*_cntl_*() API names. Renamed routines
that operate only on a single struct to contain the "_node" suffix to
differentiate with those routines that operate on the entire tree.
- Added enums for packm and unpackm kernels to bli_type_defs.h.
- Removed BLIS_1F and BLIS_VF from bszid_t definition in bli_type_defs.h.
They weren't being used and probably never will be.
commit 07837395560d413a1ba828163b41186e21a7bcfe
Merge: ca1d1d85 ad8610b4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 21 16:49:48 2017 -0500
Merge pull request #139 from Maratyszcza/emscripten
Fix Emscripten builds
commit ad8610b4415cc7982804d74f9aba29875e9e2b6c
Merge: 8772a0b3 ca1d1d85
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 21 15:18:33 2017 -0500
Merge branch 'master' into emscripten
commit ca1d1d8560c9ab1a7e3b0ac43ac70d08075bf904
Merge: b537b5bb 733faf84
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jul 21 09:49:50 2017 -0500
Merge pull request #144 from devinamatthews/fix_atomics_on_bgq
Add fallbacks to __sync_* or __c11_atomic_* builtins...
commit 733faf848dcc54834fcdfbb0185dc644978d8864
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Jul 20 14:50:13 2017 -0500
Clang can't make up its mind what to support.
commit 7425d0744d9e9cd29a887120e57c2b43ba287040
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Jul 20 12:54:58 2017 -0500
Add default #define for __has_extension.
commit b537b5bbe8cbee459a85bac11458498ae2bce4de
Merge: 1f1ec0db 7f41bb0a
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Jul 20 10:58:39 2017 -0500
Merge pull request #133 from devinamatthews/haswell-packdim
Fix prefetching in haswell ukernel
commit 8823f91a14638ce6f4e45e67df03212bb61609d6
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Jul 20 10:04:34 2017 -0500
Add fallbacks to __sync_* or __c11_atomic_* builtins when __atomic_* is not supported. Fixes #143.
commit 1f1ec0db9380b87679d5c771c4594daa1cfc5f0d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 19 15:40:48 2017 -0500
Updated ar option list used by all configurations.
Details:
- Dropped 'u' from the list of modifiers passed into the library archiver
ar. Previously, "cru" was used, while now we employ only "cr". This
change was prompted by a warning observed on Ubuntu 16.04:
ar: `u' modifier ignored since `D' is the default (see `U')
This caused me to realize that the default mode causes timestamps to be
zero, and thus the 'u' option, which causes only changed object files to
be inserted, is not applicable.
commit 5caaba2d61cbbc36d63102a0786ece28ff797f72
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 19 13:51:53 2017 -0500
Added --force-version=STRING option to configure.
Details:
- Added an option to configure that allows the user to force an arbitrary
version string at configure-time. The help text also now describes the
usage information.
- Changed the way the version string is communicated to the Makefile.
Previously, it was read into the VERSION variable from the 'version' file
via $(shell cat ...). Now, the VERSION variable is instead set in
config.mk (via a configure-substituted anchor from config.mk.in).
commit 13175c5fb70fb6a378d5fff6ecede62e5ea6a1f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 18 17:56:00 2017 -0500
Updated openmp/pthread barriers with GNU atomics.
Details:
- Updated the non-tree openmp and pthreads barriers defined in
bli_thrcomm_openmp.c and bli_thrcomm_pthreads.c to instead call a common
implementation in bli_thrcomm.c, bli_thrcomm_barrier_atomic(). This new
implementation goes through the same motions as the previous codes, but
protects its loads and increments with GNU atomic built-ins. These atomic
statements take memory ordering parameters that allow us to specify just
enough constraints for the barrier to work as intended on weakly-ordered
hardware. The prior implementation was only guaranteed to work on systems
with strongly-ordered memory. (Thanks to Devin Matthews for suggesting
this change and his crash-course in atomics and memory ordering.)
- Removed 'volatile' from structs' barrier field declarations in
bli_thrcomm_*.h.
- Updated bli_thrcomm_pthread.? files to use renamed struct barrier fields
consistent with that of the _openmp.? files.
- Updated other bli_thrcomm_* files to rename "communicator" variables to
simply "comm".
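The barrier described in this commit can be sketched as a sense-reversing barrier built on the GNU __atomic built-ins. This is a simplified illustration rather than the code in bli_thrcomm.c: the struct, its field names, and the functions barrier_init() and barrier_wait() are hypothetical, and the memory orderings shown are one plausible choice.

```c
#include <assert.h>

/* Hypothetical barrier state shared by all participating threads. */
typedef struct {
    int n_threads; /* number of threads that must arrive */
    int arrived;   /* threads that have arrived so far */
    int sense;     /* flips each time the barrier opens */
} barrier_t;

void barrier_init(barrier_t *b, int n_threads)
{
    assert(n_threads > 0);
    b->n_threads = n_threads;
    b->arrived = 0;
    b->sense = 0;
}

void barrier_wait(barrier_t *b)
{
    /* Remember the sense in effect when this thread arrives. */
    int orig_sense = __atomic_load_n(&b->sense, __ATOMIC_RELAXED);

    /* Atomically count this thread's arrival. */
    int arrived = __atomic_add_fetch(&b->arrived, 1, __ATOMIC_ACQ_REL);

    if (arrived == b->n_threads) {
        /* Last thread to arrive: reset the counter, then flip the sense
           with release ordering so waiters also observe the reset. */
        __atomic_store_n(&b->arrived, 0, __ATOMIC_RELAXED);
        __atomic_fetch_xor(&b->sense, 1, __ATOMIC_RELEASE);
    } else {
        /* Spin until the last thread flips the sense. */
        while (__atomic_load_n(&b->sense, __ATOMIC_ACQUIRE) == orig_sense)
            ;
    }
}
```

The acquire load in the spin loop pairs with the release flip, which is what lets the barrier work on weakly-ordered hardware without declaring the fields volatile.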
commit 0e58ba1b3aa84700ca51a96f1c0eed6067562fba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 17 19:03:22 2017 -0500
Added API to set mt environment variables.
Details:
- Renamed bli_env_get_nway() -> bli_thread_get_env().
- Added bli_thread_set_env() to allow setting environment variables
pertaining to multithreading, such as BLIS_JC_NT or BLIS_NUM_THREADS.
- Added the following convenience wrapper routines:
bli_thread_get_jc_nt()
bli_thread_get_ic_nt()
bli_thread_get_jr_nt()
bli_thread_get_ir_nt()
bli_thread_get_num_threads()
bli_thread_set_jc_nt()
bli_thread_set_ic_nt()
bli_thread_set_jr_nt()
bli_thread_set_ir_nt()
bli_thread_set_num_threads()
- Added #include "errno.h" to bli_system.h.
- This commit addresses issue #140.
- Thanks to Chris Goodyer for inspiring these updates.
commit 8772a0b33a90154c80d88b381dcdd66f824e041f
Author: Marat Dukhan <marat@fb.com>
Date: Thu Jul 13 21:39:24 2017 -0700
Fix Emscripten builds
commit 72c8b49bb8d3b9370b2cc37718da22f065de9c57
Merge: 70cc825b ba7cada5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 12 14:58:12 2017 -0500
Merge pull request #138 from hominhquan/membrk_set_free_fp
Set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers
commit ba7cada51a238d320528e3504ed0f0a17a6b022a
Author: Minh Quan HO <mqho@kalray.eu>
Date: Fri Jul 7 10:52:05 2017 +0200
set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers
The membrk's free_fp is called when releasing GEN_USE buffers, but this free_fp is
not set in bli_membrk_init
commit 1241301869957c96f16a2c6567e3ad70afa547de
Merge: 969b67e8 25ead66f
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Wed Jul 5 02:24:00 2017 -0400
Merge "Reducing the framework overhead of GEMV routines" into amd-staging
commit 25ead66fb78557f73af48bac305724d5d8aa3309
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Fri Jun 30 12:23:19 2017 +0530
Reducing the framework overhead of GEMV routines
Change-Id: I83607ad767bff74e305e915b54b0ea34ec3e5684
commit 969b67e8800fbd5d14a086606f3b5afbf66ed093
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Tue Jul 4 12:57:32 2017 +0530
Improved efficiency of dGEMM for large matrices by reducing TLB load misses and, above all, L3 cache misses. This is achieved by changing the packed block sizes of matrices A and B. The optimum values are now MC_D = 510 and KC_D = 1024.
Change-Id: I2d8bdd5f62f2d1f8782ae2997f3d7a26587d1ca4
commit 70cc825b552dec05165b9d70f9e6eb33d8abb118
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Jun 6 21:58:21 2017 -0500
Update LICENSE
Remove totally unnecessary first 9 lines and hopefully get Github to recognize it as 3BSD [ci skip].
commit cf54c77bc79a0f33a514be72c80a654c4e6e6f63
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Jun 6 20:23:17 2017 -0500
Add new SSI acknowledgment
commit d6ef56c6dbaf6df8ee1af1ca6a0f0792a811396a
Author: prangana <pradeep.rao@amd.com>
Date: Thu Jun 1 16:11:09 2017 +0530
Update version number
Change-Id: Ib6e52d1d34c0791367ab9152dfab31f94deedeb4
commit 897bfa0e92082c30bbb74229562d7d7327cbbac8
Author: prangana <pradeep.rao@amd.com>
Date: Thu Jun 1 16:11:09 2017 +0530
Update version number
Change-Id: Ib6e52d1d34c0791367ab9152dfab31f94deedeb4
commit 99d0ba5606d4b63e6a9c639aa78d4defc2455f79
Merge: be2c7eb8 6d17e012
Author: Santanu Thangaraj <Santanu.Thangaraj@amd.com>
Date: Thu Jun 1 02:19:02 2017 -0400
Merge "Checked in the small matrix code to compute GEMM called with A transpose case" into amd-staging
commit 6d17e0120fe5c127b941136ad2c0c08e91439535
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Wed May 24 11:48:16 2017 +0530
Checked in the small matrix code to compute GEMM called with A transpose case
Change-Id: I29f40046d43d7a4b037c1cb322503ee26495f462
commit 9d93f8481a1404695f7b78a3ced8ca47e890b649
Author: prangana <pradeep.rao@amd.com>
Date: Tue May 30 09:58:10 2017 +0530
Update Licence File
Change-Id: I4c5cf1690d0cef92a68400f9a89e454ab6856ad2
commit be2c7eb85168937bd4318f4d05ded37620119310
Author: prangana <pradeep.rao@amd.com>
Date: Tue May 30 09:58:10 2017 +0530
Update Licence File
Change-Id: I4c5cf1690d0cef92a68400f9a89e454ab6856ad2
commit 7f41bb0a0becde6a7de7df0f99668d7b4686c3b0
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri May 26 14:49:31 2017 -0400
PACKDIM_MR=8 didn't work out, but messing with the prefetching helps 2%.
commit d87614af3f3d9187be94d6e77984b282bf890928
Author: Devin Matthews <dmatthews@gator3.ufhpc>
Date: Fri May 26 14:47:36 2017 -0400
Revert "Change PACKDIM_MR (double) for haswell to 8."
This reverts commit 681eec913d7c2ebcff637cec5c1627ced9a92b99.
commit 681eec913d7c2ebcff637cec5c1627ced9a92b99
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri May 26 12:28:09 2017 -0500
Change PACKDIM_MR (double) for haswell to 8.
commit 0a3ae0ecaa0ddcb5887005d7051fa234499f1120
Merge: 0f4e6652 6e04f9df
Author: praveeng <praveen.g@amd.com>
Date: Sat May 20 16:53:50 2017 +0530
frame/3/gemm/bli_gemm_front.c
Change-Id: I52a0fbc1d33bb948d430942323bbc5fe44e3ca13
commit 6e04f9df01d79c1b0e673943ca0d5d0a6095eb2e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 17 13:03:52 2017 -0500
Restored deleted lines from makefile fragments.
commit ec5c0c0448275280dca0991f6f33afeb73650450
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed May 17 12:29:44 2017 -0500
Change to /bin/sh.
All scripts checked with Debian's checkbashisms. Also check for clang first in auto-detect.sh.
commit 555ddc30d4c7e44f3f335e436c98606f56e1598b
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed May 17 12:27:14 2017 -0500
Remove shebangs from makefiles.
commit f26bd7f42e0c2a47fe321b2c452644990b689654
Merge: cbf8710a 169fb05f
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed May 17 11:58:41 2017 -0500
Merge pull request #128 from iotamudelta/master
Portability and clang
commit 169fb05f225c2f060265bcaa872f7f80dc638b70
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 23:11:22 2017 -0400
Fix if/else structure. Thanks to TravisCI.
commit 0579dfea0bcfbb90ebc073fcf78b92a5cf7238e1
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:58:07 2017 -0400
Restore version.
commit a75b05c23dc786a1fdc45dc1627a5ce2299f1a7b
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:23:27 2017 -0400
Mark piledriver compilable w/ clang.
commit 7541d46e2ba8659bb2e36b444edef112fefa1345
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:12:12 2017 -0400
Mark bulldozer compilable w/ clang.
commit 91f897073ec0df3330ede449c4d6af8158266ae3
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:06:59 2017 -0400
Correct error message.
commit f5131e1e49167f948bddd714bb1af1761829c212
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:03:23 2017 -0400
Indeed, one can also compile for carrizo using clang.
commit 5fa4e9439c04f35f89dd7d26ff742cb2dadc3180
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 21:50:49 2017 -0400
A bunch of shebang fixes from unportable /bin/bash to portable /usr/bin/env bash
commit 1f3a58197e5d5f9ac862bda91e7527cbfbab5d76
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon May 8 16:10:03 2017 -0500
Housekeeping, induced method file/function renames.
Details:
- Renamed all level-3 induced method files to use the "_vir.c" suffix
instead of "_ref.c". Also renamed functions within these files
accordingly.
- Renamed cpp macro definitions in frame/ind/include according to the
above changes.
- Removed frame/3/old.
commit cbf8710a1ba63e25aadaa6fc5da51ea81b3d596d
Merge: cf39d3ef fdc66f12
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Mon May 8 11:21:20 2017 -0500
Merge pull request #127 from devinamatthews/fix_blis_nt_xx
Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS
commit cf39d3ef3b29b8058c39fb4638c1a734fe64aaed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 5 15:06:56 2017 -0500
Fixed a bug in norm1v, norm1m.
Details:
- Fixed a bug that manifested as improperly-computed 1-norm for vectors
and matrices. This is one of the few operations in BLIS that does not
have its own test module within the testsuite, hence why it went
undetected for so long. The bad 1-norms were being used to normalize
matrices in the testsuite after initialization, which led to some
matrices containing a combination of "large" and "small" values. This
tended to push the residuals computed after each test away from zero.
In some cases, they were off *just* enough for the testsuite to label
it a "failure". Many thanks to Jeff Hammond for reporting this bug.
(Wonky details: the bug was due to improperly-defined level-0 scalar
macros for abval2, an operation that computes the absolute square,
or complex magnitude/modulus. Certain complex domain instances of
abval2 were being incorrectly defined in terms of real-only solutions,
leading to bad results. This level-0 operation forms the basis of
norm1v/norm1m. absq2 was also affected, but almost nothing uses
this operation.)
commit 799485124f4d823e908d2e5d38b0c3a1e6172ade
Merge: 773a24ef 0df3541f
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu May 4 10:52:09 2017 -0500
Merge pull request #121 from jeffhammond/not-real-knl
allow KNL build without hbwmalloc (i.e. emulated)
commit fdc66f12d40754ff46179804bff592fddafbca02
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu May 4 10:35:22 2017 -0500
Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS. Missing BLIS_NT_XX's are defaulted to 1. Fixes #123.
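A minimal sketch of the precedence rule described in this commit, assuming the four per-loop values and the global thread count have already been read from the environment. The type and function names are hypothetical and are not the BLIS API; a value of zero or less stands for "variable not set".

```c
#include <stdbool.h>

/* Hypothetical result type for illustration only. */
typedef struct {
    int  jc, ic, jr, ir;  /* per-loop thread counts */
    bool automatic;       /* true when the total is left to automatic partitioning */
    int  total;           /* total number of threads requested */
} thread_plan;

/* Arguments <= 0 mean "environment variable not set". */
thread_plan resolve_threads(int num_threads, int jc, int ic, int jr, int ir)
{
    thread_plan p;
    bool any_explicit = (jc > 0) || (ic > 0) || (jr > 0) || (ir > 0);

    if (any_explicit) {
        /* Any explicit per-loop value overrides the global count;
           per-loop values that are missing default to 1. */
        p.jc = jc > 0 ? jc : 1;
        p.ic = ic > 0 ? ic : 1;
        p.jr = jr > 0 ? jr : 1;
        p.ir = ir > 0 ? ir : 1;
        p.automatic = false;
        p.total = p.jc * p.ic * p.jr * p.ir;
    } else {
        /* No per-loop values: fall back to the global count. */
        p.jc = p.ic = p.jr = p.ir = 1;
        p.total = num_threads > 0 ? num_threads : 1;
        p.automatic = p.total > 1;
    }
    return p;
}
```

With a global count of 8 and only the ic value set to 4, this sketch yields 1 x 4 x 1 x 1 = 4 threads; the global count is ignored.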
commit 773a24efb2fa1c3a220bf0ce1dd621a3176196da
Merge: dd58c954 b8854259
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 3 15:07:59 2017 -0500
Merge branch 'master' of github.com:flame/blis
commit dd58c9545c877c3f7553eaebca7b5e9720a66f5d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 3 15:04:51 2017 -0500
Disable complex 3m/4m in testsuite by default.
Details:
- Disabled testsuite tests of all level-3 implementations based on 3m
and 4m. This will improve testing runtime on Travis CI as well as for
anyone manually running the testsuite using default test parameters.
Thanks to Devin Matthews for suggesting this change.
commit 0df3541f54b7fe0c604ab2ec47ba814f12391798
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Tue May 2 19:25:21 2017 -0700
allow KNL build without hbwmalloc.h (i.e. emulated)
we want to be able to run BLIS KNL binaries on non-KNL machines via SDE.
although it is possible to install hbwmalloc implementation on such
systems, it is easier not to, since obviously the performance of SDE
execution is not representative so there is no reason to emulate HBW
allocation.
commit b88542591d4dd0cde366e5ae35afd3205cb81bdc
Merge: 43007f7b c2c91e09
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 19:22:41 2017 -0500
Merge pull request #107 from jeffhammond/intel-compilers-no-use-libm
never use libm with Intel compilers
commit 43007f7b65ec7926cbbfc39965ff733fa251c15f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 16:48:43 2017 -0500
Fixed stray parentheses in README citations.
commit a4f1d0b8801c114e9ef8be39df01e1b8d27ebcb3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 16:38:43 2017 -0500
CHANGELOG update (0.2.2)
commit 940a707ac78de975110e17c95765e65b89aa5e10
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 16:38:42 2017 -0500
Version file update (0.2.2)
commit d5a5e003ea9b24bb6abf12e88862e8eb61ffb03d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 15:48:30 2017 -0500
Fixed a trsm1m bug that affected right-side cases.
Details:
- Fixed a bug introduced in 1c732d3 that affected trsm1m_r. The result
was nondeterministic behavior (usually segmentation faults) for certain
problem sizes beyond the 1m instance of kc (e.g. 128 on haswell). The
cause of the bug was my commenting out lines in bli_gemm1m_ukr_ref.c
which explicitly directed the virtual gemm micro-kernel to use temporary
space if the storage preference of the [real domain] gemm ukernel did
not match the storage of the output matrix C. In the context of gemm,
this handling is not needed because agreement between the storage pref
and the matrix is guaranteed by a high-level optimization in BLIS.
However, this optimization is not applied to trsm because the storage
of C is not necessarily the same as the storage of the micro-panels of
B--both of which are updated by the micro-kernel during a trsm
operation. Thus, the guarantee of storage/preference agreement is not
in place for trsm, which means we must handle that case within the
virtual gemm micro-kernel.
- Comment updates and a minor macro change to bli_trsm*_cntx_init() for
3m1, 4m1a, and 1m.
commit e80993e71f4d571e9650a8e90ed386e32059eae5
Merge: a509fbd5 ca3a7924
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 12:30:28 2017 -0500
Merge branch 'master' into 1m
commit ca3a7924770d6cf203cce4ca9f5482e1d0d4e961
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 12:09:39 2017 -0500
README.md update.
Details:
- Updated bibtex entries for 4th BLIS paper, and added entries for 5th
and 6th BLIS papers.
commit 0f4e6652dfe9b30105d3bab328ac26d9d5c11182
Merge: 42e7f6fb 6e7de6ef
Author: praveeng <praveen.g@amd.com>
Date: Wed Apr 19 17:54:10 2017 +0530
Merge master code till 2017_04_19 to amd-staging
Change-Id: Ibebe83c8ea2e7eb15798c2bcf214b7228a1c9518
commit 42e7f6fb2a531429ee600b2fe0293b67371c7ccb
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Tue Mar 28 18:10:03 2017 +0530
fixed license attribute issues in AMD added files
Change-Id: I303f870a777c7cd1c1af29ea0b93f3e0a27948e4
commit 5600001e973c6cea048bd3fdb28117f1d7c98b9d
Merge: 0b190293 b3ed4933
Author: prangana <pradeep.rao@amd.com>
Date: Mon Mar 20 13:56:33 2017 +0530
Fix merge conflicts after sync with release branch
Change-Id: Icf14a09f728befb69a73fff9fa79c4128e728310
commit 6e7de6ef84babb273dc5528a9b9d01f0febe394b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 17 12:10:24 2017 -0500
Minor updates to test/3m4m.
Details:
- Updated initial problem size and increment in Makefile.
- Updated code in test_gemm.c to correctly query kc from context.
commit f484c6cd4389dc7ae5b972849e12e98ad5bbf9a4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 17 12:07:27 2017 -0500
Whitespace reformatting to armv8a kernels file.
Details:
- Updated formatting of function signature/header in
kernels/armv8a/3/bli_gemm_opt_4x4.c.
commit 0b19029342ffc530fa22ef20398a26221cb8f6ec
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Tue Mar 14 14:51:31 2017 +0530
Code cleanup, removed warnings from trsm, removed unused routines in axpyv & scalv
Change-Id: I02867f394c5f416194c4b1769a6c75f39243ec81
commit 825363bd2a5a60a923d4a6d9691dc143845a9cab
Merge: 093bdb80 513944e4
Author: praveeng <praveen.g@amd.com>
Date: Wed Mar 8 15:42:49 2017 +0530
Merge code from master to amd-staging as on 2017_03_08 by praveeng
Change-Id: I80740081b2cb54c9b77a3e78b9fe540e170be23d
commit 093bdb80c86b06367e595aa17487139ae983822f
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Tue Mar 7 13:35:50 2017 +0530
Checked in Unpacked DGEMM code
Change-Id: I39dcc7b238b328f73ee2675d21a5e521d0488723
commit 33923da9a108854590d386e74b6ee66b971e7796
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Mon Mar 6 14:31:31 2017 +0530
Added variant 10 for double precision axpyv microkernel
Change-Id: I7a20cc113a422603250bc450825c965136354974
commit bc828f7f8e3ddb9f58af07edc0b935b21759fb0f
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Fri Mar 3 14:45:35 2017 +0530
Added new axpyv (single precision) microkernel that performs 10 FMAs per loop. This gives better performance than all other implementations of axpyv.
Change-Id: Ic4f0e4c67e367d67d0b24febcf34f81a70a39972
commit c9949f4603419267c10973adf1d63ec38497475d
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Fri Feb 17 14:16:33 2017 +0530
Checked in DGEMMTRSM and edge case handling routine in DDOTXF
Change-Id: I65f00661af6c09b2507294fd43e0a10641c0597e
commit a509fbd5ac04fafd4e51b43d2f59ca56432dc212
Merge: 69b4846a 513944e4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 21 17:06:16 2017 -0600
Merge branch 'master' into 1m
commit 69b4846ae9adb157c4171b52e159684db2867853
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 21 15:33:39 2017 -0600
Disabled experiment-related 1m code.
Details:
- Commented out code in frame/ind/oapi/bli_l3_3m4m1m_oapi.c that was
specifically inserted to facilitate the benchmarking of 1m block-panel
and panel-block algorithms.
- Updates to test/3m4m/Makefile, runme.sh script, and test_gemm.c to
reflect changes used/needed during benchmarking.
commit 513944e4a951d8823b4de161b86ad7a965b4d99b
Merge: 8b462a0e 0e18f68c
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Feb 20 10:04:33 2017 -0500
Merge pull request #118 from devinamatthews/master
Handle k=0 correctly in KNL dgemm ukernel.
commit 0e18f68cf12eb9189ba901a20040b1cdae417670
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Feb 20 09:03:21 2017 -0600
Handle k=0 correctly in KNL dgemm ukernel.
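For context, the expected result when k=0 follows from the gemm definition C := beta*C + alpha*A*B: the product term is empty, so C is simply scaled by beta. The sketch below is a plain reference loop, not the KNL assembly kernel, and the commit does not say what specifically went wrong there. The function name and packing layout (A as k columns of mr elements, B as k rows of nr elements) are assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical reference microkernel: C := beta*C + alpha*A*B on an
   mr x nr tile with general strides rs_c and cs_c. */
void ref_dgemm_ukr(size_t mr, size_t nr, size_t k,
                   double alpha, const double *a, const double *b,
                   double beta, double *c, size_t rs_c, size_t cs_c)
{
    for (size_t i = 0; i < mr; ++i) {
        for (size_t j = 0; j < nr; ++j) {
            double ab = 0.0;
            /* When k == 0 this loop does not execute, a and b are never
               read, and ab stays 0. */
            for (size_t p = 0; p < k; ++p)
                ab += a[p * mr + i] * b[p * nr + j];

            double *cij = &c[i * rs_c + j * cs_c];
            /* beta == 0 overwrites C rather than multiplying it, so stale
               NaN or Inf values in C do not propagate. */
            *cij = (beta == 0.0) ? alpha * ab : beta * (*cij) + alpha * ab;
        }
    }
}
```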
commit 8b462a0e8c3e9252f0401940849e53cc772256fa
Merge: c362afc5 7d42fc07
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sun Feb 19 23:03:03 2017 -0500
Merge pull request #117 from devinamatthews/master
Cast dim_t and inc_t parameters to 64-bit in KNL microkernels.
commit 7d42fc0796ef0c010375fd8e59b1240ba41ce4d2
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sun Feb 19 21:10:55 2017 -0500
Cast dim_t and inc_t parameters to 64-bit in KNL microkernels.
commit 04245c9ff7f8b3c70d61003029c964bb9a4320ee
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Fri Feb 10 14:24:30 2017 +0530
Reoptimized scalv routines - two vector multiplies are done per iteration, and these routines are enabled in bli_kernel.h
Change-Id: Ic5654508573d1f6bde2edef06aefe117e581feb5
commit c362afc525bab4050581d1b0fcea2fe4d582c608
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 9 11:54:59 2017 -0600
Added missing "level-0" BLAS [sd]cabs1_().
Details:
- Fixed issue #115 by adding implementations for scabs1_() and dcabs1_()
to the BLAS compatibility layer. Thanks to heroxbd for pointing out
their absence.
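For reference, the BLAS routines scabs1 and dcabs1 return the sum of the absolute values of the real and imaginary parts of a complex number (not its modulus). A minimal C sketch of that definition follows; the function names are hypothetical, arguments are passed by value for simplicity, and this is not the code added to the BLIS compatibility layer.

```c
#include <complex.h>
#include <math.h>

/* Hypothetical single-precision reference: |Re(z)| + |Im(z)|. */
float ref_scabs1(float complex z)
{
    return fabsf(crealf(z)) + fabsf(cimagf(z));
}

/* Hypothetical double-precision reference: |Re(z)| + |Im(z)|. */
double ref_dcabs1(double complex z)
{
    return fabs(creal(z)) + fabs(cimag(z));
}
```

For z = 3 - 4i this gives 7, whereas the modulus would be 5.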
commit 018180c938c32efbeaaf626ba71ec5b780664db1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 8 11:20:52 2017 -0600
Fixed a minor bug in configure (issue #114).
Details:
- Fixed a bug in the configure script whereby a non-preferred value for
--enable-threading would cause problems in common.mk vis-a-vis detecting
which threading model was chosen. Thanks to heroxbd for reporting this
issue.
commit 58b5b77e5fdb179ea465e398e416e6a00d917e05
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Wed Feb 8 21:43:34 2017 +0530
Fixed a bug in axpyv, the arguments passed to intrinsic fmad instruction are corrected
Change-Id: If12f24c6bc74b22ac9e4acd6b9378e06d79f2f5e
commit 85de4ebf74d0a5587d5a12724eb5489d51674db3
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Wed Feb 8 14:41:04 2017 +0530
variant 4 axpyv single precision modified: explicitly used FMA intrinsics, replaced vector multiply and add operations
Change-Id: I975feef56696d479d2b9e9441b0660021cf4f6ff
commit 3fa53e8af31d634779f40258c51483ae8af494fa
Merge: b5291a44 95be7b04
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Wed Feb 8 11:46:34 2017 +0530
Merged axpyv and gemm small in bli_kernel.h
Merge branch 'amd-staging' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-staging
modified: config/zen/bli_kernel.h
modified: frame/3/gemm/bli_gemm_front.c
modified: kernels/x86_64/zen/3/bli_gemm_small_matrix.c
Change-Id: If181cf9345178c448b3530beb8bef453917fe295
commit 95be7b04709e688a4cb01fba680081e30f4258ef
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Tue Feb 7 14:01:27 2017 +0530
Added logic for packing matrix A and prefetching matrix C in Unpacked SGEMM code
Change-Id: I99efeca9eb5b4449286ec0ec133fd554ef1bb4f0
commit b5291a445b1313e01f1e0e8102c5f3660ab07f69
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Tue Feb 7 12:39:31 2017 +0530
Added optimization variant 4 for axpyv single precision - this performs 5 FMA per loop, keeping the IPC always full
Change-Id: Ie77ed22584271136a257e673bcd3b1ba71136bc9
commit f4bfc1662af82aa4b98185334c44835e51f1cbec
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Mon Feb 6 15:04:27 2017 +0530
Implemented new axpyv routines to improve performance for small vector sizes. Vectorization is applied to vectors as small as 8 elements (single precision) or 4 elements (double precision). Because this operation has a low compute-to-memory ratio, memory operations dominate at larger sizes, so there is not much gain there. This still needs some work. Added saxpyv and daxpyv var 3 routines in the file bli_axpyv_opt_var1.c.
Change-Id: Ic1b33bd5516e10113b00e44ab41b97eb19d46072
commit ddf45e71770c55ea4a58ca24ea4913fe5d8beb9b
Merge: a6ab91bc 78e1b16e
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jan 27 14:25:40 2017 -0600
Merge pull request #113 from devinamatthews/knl_thread_params
Change default threading parameters for KNL.
commit 78e1b16e16d589ed31b2e712115ee282097f114d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jan 27 14:22:20 2017 -0600
Change default threading parameters for KNL.
commit 574472ba5a89924eca7dbd10055d0e1dcd7f4c71
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Tue Jan 10 14:51:46 2017 +0530
checked in unpacked SGEMM optimization
Change-Id: I8e4ea374415c0c402c660b656fb076af15354181
commit 1c732d3ddc4ac0861d3b0e0dd15eb7e071615502
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 25 16:25:46 2017 -0600
Added 1m-specific APIs for bp, pb gemm algorithms.
Details:
- Defined bli_gemmbp_cntl_create(), bli_gemmpb_cntl_create(), with the
body of bli_gemm_cntl_create() replaced with a call to the former.
- Defined bli_cntl_free_w_thrinfo(), bli_cntl_free_wo_thrinfo(). Now,
bli_cntl_free() can check if the thread parameter is NULL, and if so,
call the latter, and otherwise call the former.
- Defined bli_gemm1mbp_cntx_init(), bli_gemm1mpb_cntx_init(), both in
terms of bli_gemm1mxx_cntx_init(), which behaves the same as
bli_gemm1m_cntx_init() did before, except that an extra bool parameter
(is_pb) is used to support both bp and pb algorithms (including to
support the anti-preference field described below).
- Added support for "anti-preference" in context. The anti_pref field,
when true, will toggle the boolean return value of routines such as
bli_cntx_l3_ukr_eff_prefers_storage_of(), which has the net effect of
causing BLIS to transpose the operation to achieve disagreement (rather
than agreement) between the storage of C and the micro-kernel output
preference. This disagreement is needed for panel-block implementations,
since they induce a transposition of the suboperation immediately before
the macro-kernel is called, which changes the apparent storage of C. For
now, anti-preference is used only with the pb algorithm for 1m (and not
with any other non-1m implementation).
- Defined new functions,
bli_cntx_l3_ukr_eff_prefers_storage_of()
bli_cntx_l3_ukr_eff_dislikes_storage_of()
bli_cntx_l3_nat_ukr_eff_prefers_storage_of()
bli_cntx_l3_nat_ukr_eff_dislikes_storage_of()
which are identical to their non-"eff" (effectively) counterparts except
that they take the anti-preference field of the context into account.
- Explicitly initialize the anti-pref field to FALSE in
bli_gks_cntx_set_l3_nat_ukr_prefs().
- Added bli_gemm_ker_var1.c, which implements a panel-block macro-kernel
in terms of the existing block-panel macro-kernel _ker_var2(). This
technique requires inducing transposes on all operands and swapping
A and B.
- Changed bli_obj_induce_trans() macro so that pack-related fields are
also changed to reflect the induced transposition.
- Added a temporary hack to bli_l3_3m4m1m_oapi.c that allows us to easily
specify the 1m algorithm (block-panel or panel-block).
- Renamed the following cntx_t-related macros:
bli_cntx_get_pack_schema_a() -> bli_cntx_get_pack_schema_a_block()
bli_cntx_get_pack_schema_b() -> bli_cntx_get_pack_schema_b_panel()
bli_cntx_get_pack_schema_c() -> bli_cntx_get_pack_schema_c_panel()
and updated all instantiations. Also updated the field names in the
cntx_t struct.
- Comment updates.
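The anti-preference toggle described above can be sketched as follows. This is a toy model with hypothetical type and function names, not the BLIS context API; it only shows how a boolean anti_pref field flips the answer of a "prefers storage of" query, so that the framework aims for disagreement rather than agreement between the storage of C and the micro-kernel preference.

```c
#include <stdbool.h>

typedef enum { STORED_BY_ROWS, STORED_BY_COLS } storage_t;

/* Toy stand-in for a context; both fields are assumptions for illustration. */
typedef struct {
    storage_t ukr_pref;   /* output storage preferred by the micro-kernel */
    bool      anti_pref;  /* when true, invert the effective preference */
} toy_cntx;

/* Plain query: does the micro-kernel prefer the storage of C? */
bool toy_ukr_prefers_storage_of(const toy_cntx *cntx, storage_t c_storage)
{
    return cntx->ukr_pref == c_storage;
}

/* "Effective" query: same as above, but toggled by anti_pref. */
bool toy_ukr_eff_prefers_storage_of(const toy_cntx *cntx, storage_t c_storage)
{
    bool r = toy_ukr_prefers_storage_of(cntx, c_storage);
    return cntx->anti_pref ? !r : r;
}

/* The operation would be transposed when the effective query reports
   a dislike of the storage of C. */
bool toy_ukr_eff_dislikes_storage_of(const toy_cntx *cntx, storage_t c_storage)
{
    return !toy_ukr_eff_prefers_storage_of(cntx, c_storage);
}
```

With a row-preferring micro-kernel and a row-stored C, the plain query reports agreement; with anti_pref set, the effective query reports a dislike, which would trigger the induced transposition.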
commit 41595e98eedaf3f1f93802c14dcae490402f933f
Merge: d625c49e a6ab91bc
Author: praveeng <praveen.g@amd.com>
Date: Wed Dec 7 15:13:21 2016 +0530
Merge master code as on 2016_12_07 to amd-staging
Change-Id: I5d9ecef9bff960aeb9b51ca4e4b21714e789e44f
commit d625c49e20bd3c50d6d44e330e34076cced114a3
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Tue Nov 29 15:05:19 2016 +0530
checked-in SGEMMTRSM microkernel for Zen
Change-Id: Ib61936418dea911b2154aa99f703b66e9669f94f
commit a6ab91bc61432490fadf18d596de4589645f37dd
Merge: 145a551d 7f31a630
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 30 09:26:58 2016 -0600
Merge pull request #111 from figual/master
Fixed missing cntx argument in ARMv8 microkernels.
commit 7f31a6307b7bd35f913c895947552c3a176f789b
Author: Francisco Igual <figual@ucm.es>
Date: Sun Nov 27 14:40:47 2016 +0100
Fixed missing cntx argument in ARMv8 microkernels.
commit 126482a3b609b9ad7026ba348f6c4bf6a29be8a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 25 18:29:49 2016 -0600
Implemented the 1m method.
Details:
- Implemented the 1m method for inducing complex domain matrix
multiplication. 1m support has been added to all level-3 operations,
including trsm, and is now the default induced method when native
complex domain gemm microkernels are omitted from the configuration.
- Updated _cntx_init() operations to take a datatype parameter. This was
needed for the corresponding function for 1m (because 1m requires us
to choose between column-oriented or row-oriented execution, which
requires us to query the context for the storage preference of the
gemm microkernel, which requires knowing the datatype) but I decided
that it made sense for consistency to add the parameter to all other
cntx initialization functions as well, even though those functions
don't use the parameter.
- Updated bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs() to take
a second scalar for each blocksize entry. The semantic meaning of the
two scalars now is that the first will scale the default blocksize
while the second will scale the maximum blocksize. This allows scaling
the two independently, and was needed to support 1m, which requires
scaling for a register blocksize but not the register storage
blocksize (i.e., "packdim") analogue.
- Deprecated bli_blksz_reduce_dt_to() and defined two new functions,
bli_blksz_reduce_def_to() and bli_blksz_reduce_max_to(), for reducing
default and maximum blocksizes to some desired blocksize multiple.
These functions are needed in the updated definitions of
bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs().
- Added support for the 1e and 1r packing schemas to packm, including
1e/1r packing kernels.
- Added a minor optimization to bli_gemm_ker_var2() that allows, under
certain circumstances (specifically, real domain beta and row- or
column-stored matrix C), the real domain macrokernel and microkernel
to be called directly, rather than using the virtual microkernel
via the complex domain macrokernel, which carries a slight additional
amount of overhead.
- Added 1m support to the testsuite.
- Added 1m support to Makefile and runme.sh in test/3m4m. Also simplified
some code in test_gemm.c driver.
commit d8f13beeea90338e0ecb0a3aeaa2d59d8ebd6c36
Merge: c25a9205 145a551d
Author: praveeng <praveen.g@amd.com>
Date: Fri Nov 25 17:31:08 2016 +0530
Merge master code till 2016_11_25 to amd-staging
commit c25a9205fd8c8d8de7fd81b1e5621e7ac79f4e87
Merge: 65298762 bdc0a264
Author: praveeng <praveen.g@amd.com>
Date: Fri Nov 25 17:06:36 2016 +0530
Merge master code till Switched to simpler trsm_r 2016_11_25 to amd-staging
Change-Id: Ibf71d224d8fb6cf0bc497f84d50c27d276512cc1
commit 145a551d524ae5492667a05fc248923d922df850
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 23 17:59:06 2016 -0600
Switched to simpler trsm_r implementation.
Details:
- Disabled the implementation of trsm_r that allows the right-hand matrix
B to be triangular, and switched to the implementation that simply
transposes the operation (and thus the storage of C) in order to recast
the operation as trsm_l. This avoids the need to use trsm_rl and trsm_ru
macrokernels, which require an awkward swapping of MR and NR. For now,
the support for trsm_r macrokernels, via separate control trees, remains.
- Modified bli_config_macro_defs.h so that BLIS_RELAX_MCNR_NCMR_CONSTRAINTS
is defined by default. This is mostly a safety precaution in case someone
tries to switch back to the previous trsm_r implementation, but also
serves as a convenience on some systems where one does not naturally
choose blocksizes in a way that satisfies MC % NR = 0 and NC % MR = 0.
commit b3e58ee30307cf1e11529f2113acb9abbeda25af
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 23 17:58:26 2016 -0600
Reimplemented 4x12 haswell ukernels (real only).
Details:
- Replaced permutation-based implementations in bli_gemm_asm_d4x12.c, which
defines 4x24 single real and 4x12 double real gemm microkernels, with
broadcast-based implementations. (The previous microkernel file has been
moved to an 'old' subdirectory.)
commit 65298762ff15c45e8588e0c279a9feaa98c927a0
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Tue Nov 22 12:15:33 2016 +0530
removed a redundant copy operation in DNRM2
Change-Id: I673b08efde4480e871779716f7715566740ad9ce
commit d6863e851adeef037e4d1476fe63bb293fb9d987
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Mon Nov 21 11:30:30 2016 +0530
checked-in DNRM2 optimizations
Change-Id: I3b31d768bd7f4fbf43042aa5a0762995c73c4522
commit bdc0a264d2fb5940bfd09298b1de823674a39053
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 16 14:13:08 2016 -0600
Adjusted stride selection of ct in macrokernels.
Details:
- Updated the changes introduced in 618f433 so that the strides of the
temporary microtile ct used in the macrokernels are determined based
on the storage preference of the microkernel (via the new functions
below), rather than the strides of c. In almost all cases, presently,
this change results in no net effect, as a high-level optimization
in the _front() functions aligns the storage of c to that of the
microkernel's preference. However, I encountered some cases where
this is not always the case in some development code that has yet
to be committed, and therefore I'm generalizing the framework code
in advance.
- Defined two new functions in bli_cntx.c:
bli_cntx_l3_ukr_prefers_rows_dt()
bli_cntx_l3_ukr_prefers_cols_dt()
which return bool_t's based on the current micro-kernel's storage
preferences. For induced methods, the preference of the underlying
real domain microkernel is returned.
- Updated definition of bli_cntx_l3_ukr_dislikes_storage_of(), and
by proxy bli_cntx_l3_ukr_prefers_storage_of(), to be in terms of
the above functions, rather than querying the preferences of the
native microkernel directly (which did the wrong thing for induced
methods).
commit 031978d2647cf08316858baf29c84ebba9c3133e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 16 14:04:33 2016 -0600
Fixed inactive trsm_r blocksize constraint code.
Details:
- Changed a cpp macro that was meant to prevent using certain trsm_r code
if BLIS_RELAX_MCNR_NCMR_CONSTRAINTS was defined. It was actually coded
incorrectly at first. I've now fixed its location and changed its
consequence to a compile-time #error message.
commit 9772218cae57d55c252595b01e3669d8bed84944
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Wed Nov 16 15:19:19 2016 +0530
Added optimized DAMAX routines for Zen
Change-Id: I499c0c8f0f4ce6c19235c47b86d5608db6ba50f8
commit 9c448e30174e5eb76a94b43b30819704a5dfcb3f
Merge: 998d8240 e35d3c23
Author: Santanu Thangaraj <Santanu.Thangaraj@amd.com>
Date: Wed Nov 16 04:18:57 2016 -0500
Merge "Added new optimized micro-kernel for dotxv routine" into amd-staging
commit 998d824044adac0d54c921dcd44fb58f3d54aad2
Merge: 0d13e9a4 6b5a4032
Author: praveeng <praveen.g@amd.com>
Date: Wed Nov 16 14:22:42 2016 +0530
Merge master code till devinamatthews/omp_num_thrds 2016_11_16 to amd-staging
Change-Id: I601ff1d3ec8a680e1be039ffc7b299744e8a27c5
commit 6b5a4032d2e3ed29a272c7f738b7e3ed6657e556
Merge: 3b524a08 a8220e3a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 10 15:28:24 2016 -0600
Merge pull request #109 from devinamatthews/omp_num_threads
Add automatic loop thread assignment.
commit a8220e3a86433b5d76789e32ea7ca014a11b6d17
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Nov 10 14:19:34 2016 -0600
- Fix typo in bli_cntx.c
- Bump BLIS_DEFAULT_NR_THREAD_MAX to 4
commit e35d3c23f28784e50ee13d2e77a69d60e0c24c1f
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Thu Nov 10 14:30:53 2016 +0530
Added new optimized micro-kernel for dotxv routine
Change-Id: I2c544e9b25a454d971ad690353502a55cd668391
commit 0d13e9a4f6f2fcda08f205215240cdf86442d6c6
Merge: e044fa62 3b524a08
Author: praveeng <praveen.g@amd.com>
Date: Mon Nov 7 14:40:41 2016 +0530
bli_kernel.h
Change-Id: I425d089f79497a0de7d1622e829c3ca9edf7f091
commit c05b3862f6241486442b313eff0c8bee7b5e1274
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Nov 4 15:48:02 2016 -0500
Add automatic loop thread assignment.
- Number of threads is determined by BLIS_NUM_THREADS or OMP_NUM_THREADS, but can be overridden by BLIS_XX_NT as before.
- Threads are assigned to loops (ic, jc, ir, and jr) automatically by weighted partitioning and heuristics, both of which are tunable via bli_kernel.h.
- All level-3 BLAS covered.
commit 3b524a08e3fb8380e7b8b2ba835312c51a331570
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 2 17:45:18 2016 -0500
Consolidated 3m1/4m1 gemmtrsm, trsm ukernel code.
Details:
- Consolidated the macros that define the lower and upper versions of the
gemmtrsm microkernels into a single macro that is instantiated twice.
Did this for both 3m1 and 4m1 microkernels.
- Consolidated lower and upper versions of the trsm microkernels for 3m1
and 4m1 into single files (each).
commit ead231aca635deb3db270f118454e4222c627f31
Merge: d25e6f8b 62987f60
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 2 13:03:50 2016 -0500
Merge pull request #108 from devinamatthews/patch-2
Update .travis.yml with additional tests
commit 62987f60a6a6ff0a75b31d0404f493593ce35ccc
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Nov 2 11:20:37 2016 -0500
Allow KNL to fail
commit 8f9010542c751ae3cbfe6121cb011d8985c1e00d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Nov 2 11:18:32 2016 -0500
Fix some problems with OSX builds:
- Update CPU detection for Intel archs (esp. Skylake)
- Allow clang for the reference config
commit d25e6f8b63c57f30b8a67dffbf4995977cf9f235
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 1 14:35:15 2016 -0500
Can disable trsm_r-specific blocksize constraints.
Details:
- Added cpp guards around the constraints in bli_kernel_macro_defs.h
that enforce MC % NR = 0 and NC % MR = 0. These constraints are ONLY
needed when handling right-side trsm by allowing the matrix on the
right (matrix B) to be triangular, because it involves swapping
register, but not cache, blocksizes (packing A by NR and B by MR)
and then swapping the operands to gemmtrsm just before that kernel
is called. It may be useful to disable these constraints if, for
example, the developer wishes to test the configuration with
a different set of cache blocksizes where only MC % MR = 0 and
NC % NR = 0 are enforced.
- In summary, #defining BLIS_RELAX_MCNR_NCMR_CONSTRAINTS will bypass
the enforcement of MC % NR = 0 and NC % MR = 0.
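The two sets of constraints can be sketched as a runtime check. In BLIS these are compile-time cpp checks; the function below is a hypothetical illustration only, with the relax flag standing in for BLIS_RELAX_MCNR_NCMR_CONSTRAINTS being defined.

```c
#include <stdbool.h>

/* Hypothetical runtime analogue of the blocksize constraints.
   MC % MR == 0 and NC % NR == 0 are always required; MC % NR == 0 and
   NC % MR == 0 are required only when the relax flag is false. */
bool blocksizes_ok(int mc, int nc, int mr, int nr, bool relax_mcnr_ncmr)
{
    if (mc % mr != 0 || nc % nr != 0)
        return false;
    if (!relax_mcnr_ncmr && (mc % nr != 0 || nc % mr != 0))
        return false;
    return true;
}
```

For example, with the made-up values MR=6, NR=8, MC=78, NC=4096, the basic constraints hold (78 % 6 = 0, 4096 % 8 = 0) but the trsm_r-specific ones do not (78 % 8 = 6, 4096 % 6 = 4), so the check passes only in relaxed mode.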
commit 1a67e3688edb073a9d44c160e7b0798e08796b8a
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Nov 1 13:53:18 2016 -0500
Bogus commit
Need to trigger another Travis build.
commit 2cd82d67b372cad1bed50cfd99e524f1f40b4e24
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Nov 1 13:25:50 2016 -0500
Some fixes for .travis.yml
- Switch to gcc-5 to support knl
- Don't run tests in parallel -- it is super slow.
- Use clang on OSX since gcc is only a zombie husk.
commit a3db4e6bdfe745083acf704ab0f51f74ea869538
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Nov 1 10:33:18 2016 -0500
Update .travis.yml with additional tests
- Test knl configuration (without running of course).
- Test openmp and pthreads threading for auto configuration with 4 threads.
- Test auto configuration with and without pthreads on OSX.
- Also, run make in parallel.
I don't know how the `addons:` section works on OSX; hopefully it is just ignored.
commit 8a11a2174a1a5b9426f13bbc5338dc86ab138cdd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 31 19:07:55 2016 -0500
Updates to non-default haswell microkernels.
Details:
- Updated s and d microkernels in bli_gemm_asm_d8x6.c to relax alignment
constraints.
- Added missing c and z microkernels, which are based on the corresponding
kernels in the d6x8 set.
- This completes the d8x6 set (which may be used for situations when it
is desirable to have a microkernel with a column preference).
commit 618f4331eba209803ecab99747872eceb1b5f091
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 31 14:40:51 2016 -0500
Align strides of ct in macrokernels to that of c.
Details:
- Previously, rs_ct and cs_ct, the strides of the temporary microtile used
primarily in the macrokernels' edge case handling, were unconditionally
set to 1 and MR, respectively. However, Devin Matthews noted that this
ought to be changed so that the strides of ct were in agreement with the
strides of C. (That is, if C was row-stored, then ct should be accessed
by rows as well.) The implicit assumption is that the strides of C
have already been adjusted, via induced transposition, if the storage
preference of the microkernel is at odds with the storage of C. So, if
the microkernel prefers row storage, the macrokernel's interior cases
would present row-stored (ideal) microkernel subproblems to the
microkernel, but for edge cases, it would still see column-stored
subproblems (not ideal). This commit fixes this issue. Thanks to Devin
for his suggestion.
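The stride selection described above might look roughly like the following. This is a sketch under stated assumptions: the type ct_strides_t and the function choose_ct_strides() are hypothetical, and the test used here for "row-stored" (unit column stride, non-unit row stride) is a simplification rather than the macrokernels' actual logic. Only the names rs_ct, cs_ct and MR come from the commit message. The row stride of NR for a row-stored MR x NR microtile is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Strides of the temporary microtile ct. */
typedef struct { ptrdiff_t rs_ct, cs_ct; } ct_strides_t;

/* Hypothetical helper: pick ct strides that agree with the storage of C. */
ct_strides_t choose_ct_strides(ptrdiff_t rs_c, ptrdiff_t cs_c,
                               ptrdiff_t mr, ptrdiff_t nr)
{
    ct_strides_t s;

    assert(mr > 0 && nr > 0);

    if (cs_c == 1 && rs_c != 1) {
        /* C is row-stored: access ct by rows as well. */
        s.rs_ct = nr;
        s.cs_ct = 1;
    } else {
        /* Otherwise keep the previous behavior (column storage). */
        s.rs_ct = 1;
        s.cs_ct = mr;
    }
    return s;
}
```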
commit c2c91e09b4893cb81314774557f728a95080f81e
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Tue Oct 25 21:15:26 2016 -0700
never use libm with Intel compilers
Intel compilers include a highly optimized math library (libimf) that
should be used instead of GNU libm.
yes, this change is for ALL targets, including those that are not
supported by the Intel compiler. there is no harm in doing this, and it
is future-proof in the event that the Intel compilers support other
architectures.
commit 630391002325a589063aec2ab0a7d89ef2e178c0
Merge: 956b3edf 216206c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 25 19:34:51 2016 -0500
Merge pull request #105 from devinamatthews/knl
Support for Intel Knight's Landing.
commit 216206c1d328a865c2192e35a4df6e9aff79a85b
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Oct 25 13:56:18 2016 -0500
Fix up for merge to master.
commit 11eb7957abbcdf02d5e312898e094260eadb1209
Merge: cd5b6681 956b3edf
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Oct 25 13:51:07 2016 -0500
Merge branch 'master' into knl
# Conflicts:
# frame/thread/bli_thread.h
commit cd5b6681838899283cd94e5427dfda206e7fbabe
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Oct 25 13:49:27 2016 -0500
Don't use %rbp in KNL packing kernels.
commit 956b3edf8eb09480f31f2e861c1b10f9ecbb2e52
Merge: b7e41d71 0662a3c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 25 13:02:57 2016 -0500
Merge pull request #104 from devinamatthews/misspellings
Add flexible options for thread model (pthread/posix for pthreads etc.).
commit 0662a3c1b1f4644a86bf8e5073d1391808c91b4a
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Oct 25 12:42:44 2016 -0500
Add flexible options for thread model (pthread/posix for pthreads etc.).
commit e044fa624008c161de32a39d734cddf1dd22dd41
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Tue Oct 25 13:03:05 2016 +0530
Changed double precision trsm kernel macro definition to bli_dtrsm_l_int_6x8 from 6x16 : it fixes the seg fault
Change-Id: Ia8c1de5fe13a370d691570a50136d55ffb18908a
commit b3ed4933aa0da72ad771fb0fdf1727e5ba9ad7b4
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Tue Oct 25 13:03:05 2016 +0530
Changed double precision trsm kernel macro definition to bli_dtrsm_l_int_6x8 from 6x16 : it fixes the seg fault
Change-Id: Ia8c1de5fe13a370d691570a50136d55ffb18908a
commit b7e41d71b07d2af6d22d632c70e0c5f7ce46852c
Merge: 4bd905bd 5117d444
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 24 16:47:46 2016 -0500
Merge pull request #103 from devinamatthews/patch-1
Change .align to .p2align in Bulldozer ukernels.
commit 5117d444f7f3a2bc327f067926eaf2398212edda
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Oct 24 16:20:47 2016 -0500
Change .align to .p2align in Bulldozer ukernels
Apparently OSX doesn't allow .align directives for >16B, so I've changed these to their .p2align counterparts.
commit 4bd905bd4597e0ad7bedf31e25e779d3e2dfda29
Merge: 936d5fdc 7f32dd57
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 21 14:48:44 2016 -0500
Merge pull request #93 from ShadenSmith/config_check
Adds sanity check to configuration choice.
commit 936d5fdc26c6c4dab199a8d11fde948975cfa1d6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 21 14:34:27 2016 -0500
Fixed multithreading compilation bug in 970745a.
Details:
- Moved the definition of the cpp macro BLIS_ENABLE_MULTITHREADING
from bli_thread.h to bli_config_macro_defs.h. Also moved the
sanity check that OpenMP and POSIX threads are not both enabled.
- Thanks to Krzysztof Drewniak for reporting this bug.
commit d250e6a3af3af8beedcda28f508ac03e94efb3c8
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Thu Oct 20 14:34:39 2016 +0530
Merged TRSM and scalv routines into zen folder
Change-Id: Ice897bc83e8fb70b90f23cc3ce892c39883aceb9
commit 8feb0f85a674e84bec2417486e3bcea584b14c04
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 19 16:05:41 2016 -0500
Removed auto-prototyping of malloc()/free() substitutes.
Details:
- Removed the header file, bli_malloc_prototypes.h, which automatically
generated prototypes for the functions specified by the following
cpp macros:
BLIS_MALLOC_INTL
BLIS_FREE_INTL
BLIS_MALLOC_POOL
BLIS_FREE_POOL
BLIS_MALLOC_USER
BLIS_FREE_USER
These prototypes were originally provided primarily as a convenience
to those developers who specified their own malloc()/free() substitutes
for one or more of the macros above. However, we generated these prototypes
regardless, even when the default values (malloc and free) of the
macros above were used. A problem arose under certain circumstances
(e.g., gcc in C++ mode on Linux with glibc) when including blis.h that
stemmed from the "throw" specification which was added to the glibc's
malloc() prototype, resulting in a prototype mismatch. Therefore, going
forward, developers who specify their own custom malloc()/free()
substitutes must also prototype those substitutes via bli_kernel.h.
Thanks to Krzysztof Drewniak for reporting this bug, and Devin Matthews
for researching the nature and potential solutions.
commit 970745a5fc7c29de3e202988e5eb104fabca4fdc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 19 15:58:03 2016 -0500
Reorganized typedefs to avoid compiler warnings.
Details:
- Relocated membrk_t definition from bli_membrk.h to bli_type_defs.h.
- Moved #include of bli_malloc.h from blis.h to bli_type_defs.h.
- Removed standalone mtx_t and mutex_t typedefs in bli_type_defs.h.
- Moved #include of bli_mutex.h from bli_thread.h to bli_type_defs.h.
- The redundant typedefs of membrk_t and mtx_t caused a warning on some C
compilers. Thanks to Tyler Smith for reporting this issue.
commit 1c2f7b57d557c05f5ef6148cccafaf0f70d910da
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Tue Oct 18 15:06:35 2016 +0530
Removed symlinks to zen kernels from haswell kernel folder and also modified the bli_kernel.h file accordingly
Change-Id: Ib3736af48e851c8243bbe10d937fb942c49ad048
commit d864ea9f4f039fe2b2dc395d0015bd9e8902bc8e
Merge: 7045fcbf 28b2af8a
Author: praveeng <praveen.g@amd.com>
Date: Fri Oct 14 17:00:57 2016 +0530
Merge master code 2016_10_14 till Added disabled code thrinfo_t structures
Change-Id: If7db98d286c1471fcd30f00757abee9b253ef987
commit 28b2af8a71133ce68774e153b6e05afb05affba8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 13 14:50:08 2016 -0500
Added disabled code to print thrinfo_t structures.
Details:
- Added cpp-guarded code to bli_thrcomm_openmp.c that allows a curious
developer to print the contents of the thrinfo_t structures of each
thread, for verification purposes or just to study the way thread
information and communicators are used in BLIS.
- Enabled some previously-disabled code in bli_l3_thrinfo.c for freeing
an array of thrinfo_t* values that is used in the new, cpp-guarded code
mentioned above.
- Removed some old commented lines from bli_gemm_front.c.
commit 11eed3f683d09e65f721567b346b0f733bff9a64
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 13 14:23:23 2016 -0500
Fixed a configure -t omp/openmp bug from fd04869.
Details:
- Forgot to update certain occurrences of "omp" in common.mk during
commit fd04869, which changed the preferred configure option string
for enabling OpenMP from "omp" to "openmp".
commit 7045fcbf0bd349ebe6cb9ac4508c6a387bb05966
Merge: 7e044900 9cda6057
Author: praveeng <praveen.g@amd.com>
Date: Thu Oct 13 12:02:28 2016 +0530
Merge master code 2016_10_13 Removed previously renamed/old files
Change-Id: I8106d371afaa0af474a8967388d44481b05de923
commit 7e04490002206d3557fcfb7dd893838a7f36916f
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Wed Oct 12 16:43:02 2016 +0530
Checked in the SAMAX optimizations
Change-Id: I7faf8c3adf52ff01432188ad3b9866ee4b9a9dfd
commit 9cda6057eaa16a24ac8785a9fa167df6c9edba44
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 11 13:21:26 2016 -0500
Removed previously renamed/old files.
Details:
- Removed frame/base/bli_mem.c and frame/include/bli_auxinfo_macro_defs.h,
both of which were renamed/removed in 701b9aa. For some reason, these
files survived when the compose branch was merged back into master.
(Clearly, git's merging algorithm is not perfect.)
- Removed frame/base/bli_mem.c.prev (an artifact of the long-ago changed
memory allocator that I was keeping around for no particular reason).
commit 22377abd84b9e560ffe1c4e4d284eb443ddb7133
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 10 13:43:56 2016 -0500
Fixed bli_gemm() segfault on empty C matrices.
Details:
- Fixed a bug that would manifest in the form of a segmentation fault
in bli_cntl_free() when calling any level-3 operation on an empty
output matrix (ie: m = n = 0). Specifically, the code previously
assumed that the entire control tree was built prior to it being
freed. However, if the level-3 operation performs an early exit, the
control tree will be incomplete, and this scenario is now handled.
Thanks to Elmar Peise for reporting this bug.
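The failure mode can be illustrated with a small stand-alone sketch. The type node_t and the functions chain_build() and chain_free() are hypothetical stand-ins, not the BLIS control tree API. The point is only that the free routine has to tolerate a tree that is empty or partially built, as happens after an early exit.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a control tree node with a single sub-node. */
typedef struct node_s { struct node_s* sub_node; } node_t;

/* Builds a chain of up to n_nodes nodes. n_nodes == 0 models an early
   exit, in which case no tree exists and NULL is returned. */
node_t* chain_build(int n_nodes)
{
    node_t* head = NULL;

    assert(n_nodes >= 0);

    for (int i = 0; i < n_nodes; ++i) {
        node_t* node = malloc(sizeof(*node));
        if (node == NULL) break;     /* leaves a shorter, still valid chain */
        node->sub_node = head;
        head = node;
    }
    return head;
}

/* Frees whatever part of the chain exists and returns the number of
   nodes freed. A NULL (empty) chain is handled without dereferencing. */
int chain_free(node_t* node)
{
    int n_freed = 0;

    while (node != NULL) {
        node_t* next = node->sub_node;   /* read before freeing */
        free(node);
        node = next;
        ++n_freed;
    }
    return n_freed;
}
```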
commit 0b571cd94d9b175331c9453258a6b1389a718ae8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 6 14:48:15 2016 -0500
Fixed segfault in bli_free_align() for NULL ptrs.
Details:
- Fixed a bug in bli_free_align() caused by failing to handle NULL pointers
up-front, which led to performing pointer arithmetic on NULL pointers in
order to free the address immediately before the pointer. Thanks to Devin
Matthews for reporting this bug.
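A minimal sketch of the pattern involved, assuming an allocator that saves the original malloc() pointer in the slot immediately before the aligned address. The names my_malloc_align() and my_free_align() are hypothetical, and this is not the BLIS implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns memory aligned to 'align' bytes (a power of two). The pointer
   returned by malloc() is saved just before the aligned address. */
void* my_malloc_align(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Reserve room for the alignment slack plus the saved pointer. */
    void* orig = malloc(size + align + sizeof(void*));
    if (orig == NULL) return NULL;

    /* Skip past the slot, then round up to the next aligned address. */
    uintptr_t addr = (uintptr_t)orig + sizeof(void*);
    addr = (addr + align - 1) & ~(uintptr_t)(align - 1);

    ((void**)addr)[-1] = orig;
    return (void*)addr;
}

void my_free_align(void* p)
{
    /* Handle NULL up front. Without this check, the lookup below would
       perform pointer arithmetic on NULL and read an invalid address. */
    if (p == NULL) return;

    free(((void**)p)[-1]);
}
```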
commit cd84fb95182514601d72c78ee0e36a394d0284d7
Author: praveeng <praveen.g@amd.com>
Date: Thu Oct 6 15:08:21 2016 +0530
syntax errors in configure file
Change-Id: Ibe8a6071aad97df550df64c009fec33a9d8f43a1
commit f2e7ea113aa93b74f1d42408d5db2c5a7b00a653
Merge: 133983c3 86969873
Author: praveeng <praveen.g@amd.com>
Date: Thu Oct 6 12:35:30 2016 +0530
conflicts merge for bli_kernel.h
Change-Id: I15d846bd34e11f86ebfd7ed091ff671a1f3366a0
commit 133983c36fa01c7acb6d666b3744f77f216314a5
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Thu Oct 6 11:26:22 2016 +0530
code clean up in bli_kernel.h
Change-Id: I11d9cdf2af8e8199209eb084f6c3a7c910b83d5d
commit 4fb9b4ef2e4cf2626a6e000a41628fb823f16da8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 5 14:41:35 2016 -0500
CHANGELOG update (0.2.1)
commit 866b2dde3f41760121115fb25f096d4344e8b4f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 5 14:41:34 2016 -0500
Version file update (0.2.1)
commit 87fddeab3c8a5ccb1bbf02e5f89db1464e459ba9
Merge: 86969873 6f71cd34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 5 13:35:01 2016 -0500
Merge branch 'compose'
commit 6f71cd344951854e4cff9ea21bbdfe536e72611d
Merge: c0630c40 8d55033c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 4 15:53:46 2016 -0500
Merge pull request #94 from flame/distcomm
Implemented distributed thrinfo_t management.
commit 86969873b5b861966d717d8f9f370af39e3d9de6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 4 14:24:59 2016 -0500
Reclassified amaxv operation as a level-1v kernel.
Details:
- Moved amaxv from being a utility operation to being a level-1v operation.
This includes the establishment of a new amaxv kernel to live beside all
of the other level-1v kernels.
- Added two new functions to bli_part.c:
bli_acquire_mij()
bli_acquire_vi()
The first acquires a scalar object for the (i,j) element of a matrix,
and the second acquires a scalar object for the ith element of a vector.
- Added integer support to bli_getsc level-0 operation. This involved
adding integer support to the bli_*gets level-0 scalar macros.
- Added a new test module to test amaxv as a level-1v operation. The test
module works by comparing the value identified by bli_amaxv() to the
value found from a reference-like code local to the test module
source file. In other words, it (intentionally) does not guarantee the
same index is found; only the same value. This allows for different
implementations in the case where a vector contains two or more elements
containing exactly the same floating point value (or values, in the case
of the complex domain).
- Removed the directory frame/include/old/.
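The value-based comparison used by the amaxv test module can be sketched as follows for the real domain. The functions ref_amaxv() and amaxv_result_ok() are hypothetical and are not the test module's code. Comparing absolute values rather than signed values is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Reference-like search: index of the first element with the largest
   absolute value. */
size_t ref_amaxv(const double* x, size_t n)
{
    size_t i_max = 0;

    assert(n > 0);

    for (size_t i = 1; i < n; ++i)
        if (abs_d(x[i]) > abs_d(x[i_max])) i_max = i;
    return i_max;
}

/* Passes if the reported index holds the same absolute value as the
   reference index, even when the two indices differ. */
int amaxv_result_ok(const double* x, size_t n, size_t i_reported)
{
    if (i_reported >= n) return 0;
    return abs_d(x[i_reported]) == abs_d(x[ref_amaxv(x, n)]);
}
```

For the vector { 1.0, -3.0, 2.0, 3.0 }, an implementation that reports index 3 passes this check even though the reference search returns index 1.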
commit 8d55033c966feed99fcca2a58017c3ab5b1646dc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 27 15:20:58 2016 -0500
Implemented distributed thrinfo_t management.
Details:
- Implemented Ricardo Magana's distributed thread info/communicator
management. Rather than fully construct the thrinfo_t structures, from
root to leaf, prior to spawning threads, the threads individually
construct their thrinfo_t trees (or, chains), and do so incrementally,
as needed, reusing the same structure nodes during subsequent blocked
variant iterations. This required moving the initial creation of the
thrinfo_t structure (now, the root nodes) from the _front() functions
to the bli_l3_thread_decorator(). The incremental "growing" of the tree
is performed in the internal back-end (ie: _int()) function, and so
mostly invisible. Also, the incremental growth of the thrinfo_t tree is
done as a function of the current and parent control tree nodes (as well
as the parent thrinfo_t node), further reinforcing the parallel
relationship between the two data structures.
- Removed the "inner" communicator from thrinfo_t structure definition,
as well as its id. Changed all APIs accordingly. Renamed
bli_thrinfo_needs_free_comms() to bli_thrinfo_needs_free_comm().
- Defined bli_l3_thrinfo_print_paths(), which prints the information
in an array of thrinfo_t* structure pointers. (Used only as a
debugging/verification tool.)
- Deprecated the following thrinfo_t creation functions:
bli_packm_thrinfo_create()
bli_l3_thrinfo_create()
because they are no longer used. bli_thrinfo_create() is now called
directly when creating thrinfo_t nodes.
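The incremental "growing" described above can be pictured with a small sketch. The type thr_sketch_t and the function thr_sketch_grow() are hypothetical and far simpler than thrinfo_t, which also carries communicators and ids. The sketch only shows a child node being created on first use and reused on later iterations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a thrinfo_t node. */
typedef struct thr_sketch_s {
    int                  depth;     /* distance from the root node */
    struct thr_sketch_s* sub_node;  /* NULL until first needed */
} thr_sketch_t;

/* Returns the child of 'parent', creating it on first use and reusing
   it afterwards. Returns NULL only if allocation fails. */
thr_sketch_t* thr_sketch_grow(thr_sketch_t* parent)
{
    assert(parent != NULL);

    if (parent->sub_node == NULL) {
        thr_sketch_t* child = malloc(sizeof(*child));
        if (child == NULL) return NULL;
        child->depth    = parent->depth + 1;
        child->sub_node = NULL;
        parent->sub_node = child;
    }
    return parent->sub_node;
}
```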
commit fd04869ae4d4a3b0ebb9052557c296456bce7c0d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 27 14:14:11 2016 -0500
Changed configure's 'omp' threading to 'openmp'.
Details:
- Changed the configure script so that the expected string argument to the
-t (or --enable-threading=) option that enables OpenMP multithreading is
'openmp'. The previous expected string, 'omp', is still supported but
should be considered deprecated.
commit 9424af87209e4e435e2e742430945152690170b0
Merge: efa7341d c0630c40
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 27 12:51:08 2016 -0500
Merge branch 'compose'
commit 7f32dd57c6bd41c0704341752842277dd6a4c8eb
Author: Shaden Smith <shaden@cs.umn.edu>
Date: Sat Sep 17 11:33:57 2016 -0500
Adds sanity check to configuration choice.
commit efa7341df0b0115926aa8a6e8a4ebfb24fdbf11e
Merge: 121c39d4 e1453f68
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 16 11:01:57 2016 -0500
Merge pull request #92 from ShadenSmith/readme_fix
Fixes broken URL in README.md
commit e1453f68f6afd90ae9a29b7a5faa46aa79bbf741
Author: Shaden Smith <ShadenTSmith@gmail.com>
Date: Fri Sep 16 09:29:28 2016 -0500
Fixes broken URL in README.md
commit b922d7563422e14c49a4677bc6ae088a408861ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 23 13:38:36 2016 -0500
Avoid compiling BLAS/CBLAS files when disabled.
Details:
- Updated the top-level Makefile, build/config.mk.in template, and
configure script so that object files corresponding to source files
belonging to the BLAS compatibility layer are not compiled (or archived)
when the compatibility layer is disabled. (Same for CBLAS.) Thanks
to Devin Matthews for suggesting this optimization.
- Slight change to the way configure handles internal variables. Instead
of converting (overwriting) some, such as enable_blas2blis and
enable_cblas, from a "yes" or "no" to a "1" or "0" value, the latter are
now stored in new variables that live alongside the originals (with the
suffix "_01"). This is convenient since some values need to be
sed-substituted into the config.mk.in template, which requires "yes" or
"no", while some need to be written to the bli_config.h.in template,
which requires "0" or "1".
Updated BLIS4 TOMS citation in README.md.
Added complex gemm micro-kernels for haswell.
Details:
- Defined cgemm (3x8) and zgemm (3x4) micro-kernels for haswell-based
architectures. As with their real domain brethren, these kernels prefer
row storage (though this doesn't affect most users due to high-level
optimizations in most level-3 operations that induce a transpose to
whatever storage preference the kernel may have).
Change-Id: I512ab90784ecbb7cdaee24928d2ccebb544ba5c1
commit 69826110bab2a064ec76457c24843d28f2581281
Merge: 64598ee4 a58dd35e
Author: Pradeep Rao <Pradeep.Rao@amd.com>
Date: Wed Sep 14 03:26:25 2016 -0400
Merge "Implemented trsm single precision for lower triangular matrices, files added bli_trsm_l_int_6x16.c, files modified bli_kernel.h to enable optimized trsm microkernel and test_trsm.c is modified to test trsm single precision" into amd-staging
commit c0630c4024b08750043a2942a3e8a037aa6b6259
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 12 13:59:02 2016 -0500
Added debugging printf()'s to bli_l3_thrinfo.c.
Details:
- Added optional printf() statements to print out thread communicator
info as the thrinfo_t structure is built in bli_l3_thrinfo.c.
- Minor changes to frame/thread/bli_thrinfo.h.
commit 7b3bf1ffcd7160ccbf6c2518af6d88f6742e4977
Merge: 35509818 121c39d4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 6 15:47:13 2016 -0500
Merge branch 'master' into compose
commit 121c39d455f2db6f7ce6802ba7f73ad5e088c68c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 5 13:11:42 2016 -0500
Added complex gemm micro-kernels for haswell.
Details:
- Defined cgemm (3x8) and zgemm (3x4) micro-kernels for haswell-based
architectures. As with their real domain brethren, these kernels prefer
row storage (though this doesn't affect most users due to high-level
optimizations in most level-3 operations that induce a transpose to
whatever storage preference the kernel may have).
commit 35509818cbea1598b123421f81c42120889a03c3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 31 17:34:15 2016 -0500
Added, moved some thread barriers.
Details:
- Removed thread barriers from the end of the loop bodies of
bli_gemm_blk_var1(), bli_gemm_blk_var2(), bli_trsm_blk_var1(),
and bli_trsm_blk_var2().
- Moved the thread barrier at the end of bli_packm_int() to the
end of bli_l3_packm(), and added missing barriers to that function.
- Removed the no longer necessary (and now incorrect) ochief guard
in bli_gemm3m3_packa() on the bli_obj_scalar_reset() on C.
- Thanks to Tyler Smith for help with these changes.
commit 64598ee4cfb86f64abbd4bcef5a82ba0d5565b67
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Wed Aug 31 12:54:50 2016 +0530
fixed the symlink issue
Change-Id: I2186d529f295c576597c189e1ae219bc1a83f955
commit abd61f9fa75d77a96d1491b3e035451ee73238fe
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 30 12:34:19 2016 -0500
Updated BLIS4 TOMS citation in README.md.
commit 8a2373f26ba8fcd5b2d7b2cc72cb8b2e1f841a03
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Mon Aug 29 14:10:45 2016 +0530
Norm 2 optimization
Change-Id: Ide9decaccd20bf0ccc32c9abb6556e038dceed2b
commit fdc663902347aa252ea88cf09ce24ab748958dff
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Mon Aug 29 10:43:38 2016 +0530
Placed 1 and 1f AMD optimized AVX routines under zen folder
Change-Id: I26795211ef11d232ed794ce36dd0a9c1f8706328
commit 701b9aa3ff028decbf90efac0dca5bd64fe26269
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 26 19:04:45 2016 -0500
Redesigned control tree infrastructure.
Details:
- Altered control tree node struct definitions so that all nodes have the
same struct definition, whose primary fields consist of a blocksize id,
a variant function pointer, a pointer to an optional parameter struct,
and a pointer to a (single) sub-node. This unified control tree type is
now named cntl_t.
- Changed the way control tree nodes are connected, and what computation
they represent, such that, for example, packing operations are now
associated with nodes that are "inline" in the tree, rather than
off-shoot branches. The original tree for the classic Goto gemm algorithm was
expressed (roughly) as:
blk_var2 -> blk_var3 -> blk_var1 -> ker_var2
                     |           |
                     -> packb    -> packa
and now, the same tree would look like:
blk_var2 -> blk_var3 -> packb -> blk_var1 -> packa -> ker_var2
Specifically, the packb and packa nodes perform their respective packing
operations and then recurse (without any loop) to a subproblem. This means
there are now two kinds of level-3 control tree nodes: partitioning and
non-partitioning. The blocked variants are members of the former, because
they iteratively partition off submatrices and perform suboperations on
those partitions, while the packing variants belong to the latter group.
(This change has the effect of allowing greatly simplified initialization
of the nodes, which previously involved setting many unused node fields to
NULL.)
- Changed the way thrinfo_t tree nodes are arranged to mirror the new
connective structure of control trees. That is, packm nodes are no longer
off-shoot branches of the main algorithmic nodes, but rather connected
"inline".
- Simplified control tree creation functions. Partitioning nodes are created
concisely with just a few fields needing initialization. By contrast, the
packing nodes require additional parameters, which are stored in a
packm-specific struct that is tracked via the optional parameters pointer
within the control tree struct. (This parameter struct must always begin
with a uint64_t that contains the byte size of the struct. This allows
us to use a generic function to recursively copy control trees.) gemm,
herk, and trmm control tree creation continues to be consolidated into
a single function, with the operation family being used to select
among the parameter-agnostic macro-kernel wrappers. A single routine,
bli_cntl_free(), is provided to free control trees recursively, whereby
the chief thread within a group releases the blocks associated with
mem_t entries back to the memory broker from which they were acquired.
- Updated internal back-ends, e.g. bli_gemm_int(), to query and call the
function pointer stored in the current control tree node (rather than
index into a local function pointer array). Before being invoked, these
function pointers are first cast to a gemm_voft (for gemm, herk, or trmm
families) or trsm_voft (for trsm family) type, which is defined in
frame/3/bli_l3_var_oft.h.
- Retired herk and trmm internal back-ends, since all execution now flows
through gemm or trsm blocked variants.
- Merged forwards- and backwards-moving variants by querying the direction
from routines as a function of the variant's matrix operands. gemm and
herk always move forward, while trmm and trsm move in a direction that
is dependent on which operand (a or b) is triangular.
- Added functions bli_thread_get_range_mdim(), bli_thread_get_range_ndim(),
each of which takes additional arguments and hides complexity in managing
the difference between the way ranges are computed for the four families
of operations.
- Simplified level-3 blocked variants according to the above changes, so that
the only steps taken are:
1. Query partitioning direction (forwards or backwards).
2. Prune unreferenced regions, if they exist.
3. Determine the thread partitioning sub-ranges.
<begin loop>
4. Determine the partitioning blocksize (passing in the partitioning
direction)
5. Acquire the current iteration's partitions for the matrices affected
by the current variant's partitioning dimension (m, k, n).
6. Call the subproblem.
<end loop>
- Instantiate control trees once per thread, per operation invocation.
(This is a change from the previous regime in which control trees were
treated as stateless objects, initialized with the library, and shared
as read-only objects between threads.) This once-per-thread allocation
is done primarily to allow threads to use the control tree as a place
to cache certain data for use in subsequent loop iterations. Presently,
the only application of this caching is a mem_t entry for the packing
blocks checked out from the memory broker (allocator). If a non-NULL
control tree is passed in by the (expert) user, then the tree is copied
by each thread. This is done in bli_l3_thread_decorator(), in
bli_thrcomm_*.c.
- Added a new field to the context, an opid_t which tracks the "family"
of the operation being executed. For example, gemm, hemm, and symm are
all part of the gemm family, while herk, syrk, her2k, and syr2k are
all part of the herk family. Knowing the operation's family is necessary
when conditionally executing the internal (beta) scalar reset on
C in blocked variant 3, which is needed for gemm and herk families,
but must not be performed for the trmm family (because beta has only
been applied to the current row-panel of C after the first rank-kc
iteration).
- Reexpressed 3m3 induced method blocked variant in frame/3/gemm/ind
to conform with the new control tree design, and renamed the
macro-kernel codes corresponding to 3m2 and 4m1b.
- Renamed bli_mem.c (and its APIs) to bli_memsys.c, and renamed/relocated
bli_mem_macro_defs.h from frame/include to frame/base/bli_mem.h.
- Renamed/relocated bli_auxinfo_macro_defs.h from frame/include to
frame/base/bli_auxinfo.h.
- Fixed a minor bug whereby the storage-to-ukr-preference matching
optimization in the various level-3 front-ends was not being applied
properly when the context indicated that execution would be via an
induced method. (Before, we always checked the native micro-kernel
corresponding to the datatype being executed, whereas now we check
the native micro-kernel corresponding to the datatype's real projection,
since that is the micro-kernel that is actually used by induced methods.)
- Added an option to the testsuite to skip the testing of native level-3
complex implementations. Previously, it was always tested, provided that
the c/z datatypes were enabled. However, some configurations use
reference micro-kernels for complex datatypes, and testing these
implementations can slow down the testsuite considerably.
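The unified node layout and the size-prefixed parameter struct described in this commit can be sketched as follows. This is a sketch under stated assumptions and not the BLIS definition of cntl_t: every name below (cntl_sketch_t, packm_params_sketch_t, pack_schema, cntl_sketch_copy(), cntl_sketch_free()) is hypothetical. It shows why a leading uint64_t byte size lets one generic routine copy any node's parameters without knowing their type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef void (*var_fp_t)(void);    /* placeholder variant function type */

typedef struct cntl_sketch_s {
    int                   bszid;     /* blocksize id */
    var_fp_t              var_func;  /* variant function pointer */
    void*                 params;    /* optional; begins with a uint64_t size */
    struct cntl_sketch_s* sub_node;  /* single sub-node, NULL at the leaf */
} cntl_sketch_t;

/* Example parameter struct for a packing node. */
typedef struct {
    uint64_t size;         /* byte size of this struct; must come first */
    int      pack_schema;  /* illustrative packing parameter */
} packm_params_sketch_t;

/* Frees a (possibly empty) chain of nodes and their parameter structs. */
void cntl_sketch_free(cntl_sketch_t* node)
{
    while (node != NULL) {
        cntl_sketch_t* next = node->sub_node;
        free(node->params);
        free(node);
        node = next;
    }
}

/* Recursively copies a chain. Returns NULL if src is NULL or if any
   allocation fails, in which case nothing is leaked. */
cntl_sketch_t* cntl_sketch_copy(const cntl_sketch_t* src)
{
    if (src == NULL) return NULL;

    cntl_sketch_t* dst = malloc(sizeof(*dst));
    if (dst == NULL) return NULL;

    *dst = *src;
    dst->params   = NULL;
    dst->sub_node = NULL;

    if (src->params != NULL) {
        uint64_t size;
        memcpy(&size, src->params, sizeof(size));  /* read the size prefix */
        assert(size >= sizeof(uint64_t));

        dst->params = malloc((size_t)size);
        if (dst->params == NULL) { free(dst); return NULL; }
        memcpy(dst->params, src->params, (size_t)size);
    }

    if (src->sub_node != NULL) {
        dst->sub_node = cntl_sketch_copy(src->sub_node);
        if (dst->sub_node == NULL) { cntl_sketch_free(dst); return NULL; }
    }
    return dst;
}
```

Because each thread copies the tree in this fashion, a thread is free to cache data in its own nodes without affecting the other threads.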
commit a58dd35ed7b5b77a6b272655d2edd7a822b8fa87
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Fri Aug 26 14:55:12 2016 +0530
Implemented trsm single precision for lower triangular matrices. Files added: bli_trsm_l_int_6x16.c. Files modified: bli_kernel.h (to enable the optimized trsm microkernel) and test_trsm.c (to test trsm single precision).
Change-Id: Ibddf989f4aad577e89558673e1038cf6ece654d9
commit 73517f522b69de429dd7f3df60a70c068149ab28
Merge: c6f5c215 50293da3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 23 13:46:59 2016 -0500
Merge branch 'master' into compose
commit 50293da38d5f2b7be9bbc94b9e85aacb6a10f672
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 23 13:38:36 2016 -0500
Avoid compiling BLAS/CBLAS files when disabled.
Details:
- Updated the top-level Makefile, build/config.mk.in template, and
configure script so that object files corresponding to source files
belonging to the BLAS compatibility layer are not compiled (or archived)
when the compatibility layer is disabled. (Same for CBLAS.) Thanks
to Devin Matthews for suggesting this optimization.
- Slight change to the way configure handles internal variables. Instead
of converting (overwriting) some, such as enable_blas2blis and
enable_cblas, from a "yes" or "no" to a "1" or "0" value, the latter are
now stored in new variables that live alongside the originals (with the
suffix "_01"). This is convenient since some values need to be
sed-substituted into the config.mk.in template, which requires "yes" or
"no", while some need to be written to the bli_config.h.in template,
which requires "0" or "1".
commit 22dd6a353ddb56614309c01533b1a94c9fd32bca
Merge: cdfb3c3f f20ed388
Author: praveeng <praveen.g@amd.com>
Date: Tue Aug 23 15:15:35 2016 +0530
Merge master code as on 2016_08_23 to amd-staging branch by praveeng
Changes to be committed:
modified: frame/thread/bli_mutex_openmp.h
modified: frame/thread/bli_mutex_pthreads.h
Change-Id: Ica522edbb1d0173f53f38d5057b1f7aef73666be
commit c6f5c215ee793d03ea834469fc2adc53feaffc42
Merge: d52cb767 16a4c7a8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 22 17:33:02 2016 -0500
Merge branch 'master' into compose
commit f20ed3885d628992fab88690f629a5a2bab3eb88
Merge: 02ac597e 4bc842ca
Author: praveeng <praveen.g@amd.com>
Date: Mon Aug 22 15:27:33 2016 +0530
Merge branch 'master' of https://github.com/clMathLibraries/blis-amd for "Fixed bugs in bli_mutex_init() and friends."
commit 02ac597e4b9be2670d9fff65d28552f8e1ec81b3
Author: praveeng <praveen.g@amd.com>
Date: Thu Jul 28 15:11:08 2016 +0530
Revert commits 357c990bdd7bd5667aac5adf1bab3712973e7414
Change-Id: I12a34456d7eed93fda4369e76bcddb42ba7ccb99
commit 84e41cc73c9c87ce64582acd4264b8e1b5316482
Author: praveeng <praveen.g@amd.com>
Date: Thu Jul 28 15:01:36 2016 +0530
Revert commits 8aee306
Change-Id: I3dd999c77c6779332a40dbb84371ca487216f189
commit 30ccfcee82db93d0109d1571242e2db925e95d0a
Author: praveeng <praveen.g@amd.com>
Date: Mon Jul 25 14:14:00 2016 +0530
removed changes from readme file which are giving conflicts
Change-Id: Ic71ad1313e1404fed444e899466043704d875af6
commit aeca25cd63fc8971f8fe7809599c57853f976548
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 16:51:23 2016 +0530
first commit
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
commit 6b2274864b36fd1019d97bcc4ca6dd7a57ef16d9
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 15:00:31 2016 +0530
small modification to readme for git push test
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
commit daa7a9ecb25982f2551adbd95e65f8ba97cfe944
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 16:51:23 2016 +0530
first commit
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
commit 5f66a4aa05aeffcb6eb587851d78d9527319466c
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 15:00:31 2016 +0530
small modification to readme for git push test
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
commit c6cbd78d2388c08824822b91a1c36ac4349bb67f
Author: praveeng <praveen.g@amd.com>
Date: Thu Jul 28 15:11:08 2016 +0530
Revert commits 357c990bdd7bd5667aac5adf1bab3712973e7414
Change-Id: I12a34456d7eed93fda4369e76bcddb42ba7ccb99
commit 9219a9060762525f87ebbf556d78fe8621858513
Author: praveeng <praveen.g@amd.com>
Date: Thu Jul 28 15:01:36 2016 +0530
Revert commits 8aee306
Change-Id: I3dd999c77c6779332a40dbb84371ca487216f189
commit 728573296efa7cf14d2381570e116509dfe2a240
Author: praveeng <praveen.g@amd.com>
Date: Mon Jul 25 14:14:00 2016 +0530
removed changes from readme file which are giving conflicts
Change-Id: Ic71ad1313e1404fed444e899466043704d875af6
commit ad7862e291c240505c733a41d231b1a126ade73c
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 16:51:23 2016 +0530
first commit
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
commit ad4b471a25ce77867295e5529dfc787e7c18b03f
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 15:00:31 2016 +0530
small modification to readme for git push test
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
commit 55d641363fcd8bdfdabbd7c22822fa2d0b7f3fa6
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 16:51:23 2016 +0530
first commit
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
commit f3b6b15f6d591d323802bd6c81c522a02056506d
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 15:00:31 2016 +0530
small modification to readme for git push test
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
commit 16a4c7a823d60707ed9272f5d36e5c5d54c0ba4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 19 11:38:36 2016 -0500
Fixed bugs in bli_mutex_init() and friends.
Details:
- Fixed a couple of bugs that affected OpenMP and POSIX threads
configurations that resulted in compiler errors and warnings due
to type mismatch, and in the case of pthreads, a missing function
argument. The bugs are fairly recent, introduced in a017062.
commit c8e4ef93953ba2b79fb7e0973c08469c0e28a2cd
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Aug 3 16:13:03 2016 -0500
Add prefetchw to 30x8 kernel.
commit 4b5a2f3d6e7ffeb5cc2be8448554f5c2083ad68f
Merge: 380736bf 9f52a587
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Aug 3 16:09:51 2016 -0500
Merge remote-tracking branch 'origin/knl' into knl
# Conflicts:
# kernels/x86_64/knl/3/bli_dgemm_opt_24x8.c
commit 380736bfe955efbdd7274c90b6fd635688e83bc4
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Aug 3 16:08:28 2016 -0500
Add (new) 30x8 KNL kernel and fix non-scatter prefetch bug.
commit 9f52a587dee855daa73c194e41b6951416544e9a
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Aug 3 16:03:53 2016 -0500
Try prefetchw[t1] instead of regular prefetch for C.
commit 8945a1512d366bc6a8a85718d12cbf5de6f2898b
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Aug 3 11:28:24 2016 -0500
This version gets ~1550 GFLOPs on KNL with 16x4.
commit cdfb3c3f29d321033fca106aa58ab67ead90a95d
Merge: 50a2f2ef 4bc842ca
Author: praveeng <praveen.g@amd.com>
Date: Fri Jul 29 12:45:04 2016 +0530
Merge master code as on 2016_07_29 to amd-staging branch by praveeng
Change-Id: Ic78b84d8b8d10158fb2a612f9a64bbc7b1f9b486
commit 4bc842ca3a64e658c0808bfe4c5693a5ace97923
Merge: 117f8838 b0d510bf
Author: praveeng <praveen.g@amd.com>
Date: Thu Jul 28 17:32:12 2016 +0530
Merge branch 'master' of publicrepo
commit 117f8838511a478aa16137e770d27dd21f4227c5
Author: praveeng <praveen.g@amd.com>
Date: Thu Jul 28 15:11:08 2016 +0530
Revert commits 357c990bdd7bd5667aac5adf1bab3712973e7414
Change-Id: I12a34456d7eed93fda4369e76bcddb42ba7ccb99
commit 2fcdc28f1055d385b2e662aa920fb97c472394d7
Author: praveeng <praveen.g@amd.com>
Date: Thu Jul 28 15:01:36 2016 +0530
Revert commits 8aee306
Change-Id: I3dd999c77c6779332a40dbb84371ca487216f189
commit 1b5d104afe0628b8b6c0650f1e58cfb08be67004
Author: praveeng <praveen.g@amd.com>
Date: Mon Jul 25 14:14:00 2016 +0530
removed changes from readme file which are giving conflicts
Change-Id: Ic71ad1313e1404fed444e899466043704d875af6
commit d81273047bff56501e9413a90991d3d1f8b56a06
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 16:51:23 2016 +0530
first commit
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
commit 65905c3011a11cda95761681d4ae84337e46bdb5
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 15:00:31 2016 +0530
small modification to readme for git push test
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
commit 23cca231be10fe1797aed451bcbc69d38c78bc0c
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 16:51:23 2016 +0530
first commit
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
commit 922e3091702f25e3287b417719a33adbd5bbf138
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 15:00:31 2016 +0530
small modification to readme for git push test
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
commit b0d510bf0e4dfd177f9e4ae0069f41921e2ecdc1
Author: praveeng <praveen.g@amd.com>
Date: Thu Jul 28 15:11:08 2016 +0530
Revert commits 357c990bdd7bd5667aac5adf1bab3712973e7414
Change-Id: I12a34456d7eed93fda4369e76bcddb42ba7ccb99
commit 5ebeece5b4a8df81d59ca7558b278a4263d15128
Author: praveeng <praveen.g@amd.com>
Date: Thu Jul 28 15:01:36 2016 +0530
Revert commits 8aee306
Change-Id: I3dd999c77c6779332a40dbb84371ca487216f189
commit 6ce4c022ebdea00c2b951090e3c2e9e88735b9ce
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Jul 27 16:26:36 2016 -0500
Switch back to 24x8. I could only squeeze 24.5GFLOP out of 8x24, and scalability is not improved.
commit d52cb7671509592a8078729477b40b60380518a2
Merge: 95abea46 c31b1e7b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 27 16:04:55 2016 -0500
Merge branch 'master' into compose
commit c31b1e7b9d659b96433a87e5aecb90e457a104cc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 27 15:58:07 2016 -0500
Relax alignment restrictions for sandybridge ukrs.
Details:
- Relaxed the base pointer and leading dimension alignment restrictions
in the sandybridge gemm microkernels, allowing the use of vmovups/vmovupd
instead of vmovaps/vmovapd. These changes mimic those made to the haswell
microkernels in e0d2fa0 and ee2c139.
- Updated testsuite modules as well as standalone test drivers in 'test'
directory to use DBL_MAX as the initial time candidate. Thanks to Devin
Matthews for suggesting this change.
- Inserted #include "float.h" into bli_system.h (to gain access to DBL_MAX).
- Minor update (vis-a-vis contexts) to driver code in test/3m4m.
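
A minimal sketch of why DBL_MAX works as the initial time candidate in a best-of-N timing loop: any measured time compares smaller, so the first sample always replaces it. The function name sketch_min_time is hypothetical and is not taken from the testsuite or test drivers.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical helper: return the smallest of n timing samples.
   Starting from DBL_MAX means no real measurement can be larger
   than the initial candidate. */
double sketch_min_time(const double* samples, size_t n)
{
    double best = DBL_MAX;
    for (size_t i = 0; i < n; ++i)
    {
        if (samples[i] < best) best = samples[i];
    }
    return best; /* still DBL_MAX if n == 0 */
}
```

If no samples are supplied the function returns DBL_MAX, which makes an empty run easy to detect.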
commit b8f2b55532849d45d379afbdd05a52ff6100800d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Jul 27 15:22:55 2016 -0500
Try an 8x24 kernel for the hell of it.
commit 7ede5863ae3567f7c0852efc2d5cd649ca19e0f3
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Jul 27 13:41:27 2016 -0600
Allocate pack buffer on MCDRAM for KNL.
commit ad89ed2e829c7b261d8ba0998a3cb83ad576ee04
Merge: 2c9de740 81e2b05f
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Jul 27 11:45:40 2016 -0500
Merge branch 'knl' of github.com:devinamatthews/blis into knl
commit 2c9de740edb66c4692c200731763bbd1d3171ccb
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Jul 27 11:44:54 2016 -0500
This version gets ~26GF on one core.
commit 81e2b05f31bca4e1e1676e7b533d1868d9f9be33
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Jul 27 11:39:05 2016 -0500
Add optimized packing kernels for KNL.
commit a7d8ca97b8d835c32d90ff20a565c82733f014a8
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Jul 25 15:15:13 2016 -0500
All fixed.
commit 963d0393b023f4134bb0c682923faf9964c0e645
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Jul 25 14:40:53 2016 -0500
Add 24xk pack kernel.
commit 117b76739afba481768897d2580f8365d3345417
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Jul 25 13:53:07 2016 -0500
In the midst of debugging.
commit 8c0a4fd1d3535d608a9a309a61ffee0a73c3646f
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Jul 25 13:09:24 2016 -0500
Fix some row/column confusion.
commit c44f9f96930312125b15e64c326ab5ab5cc02633
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Jul 25 12:02:24 2016 -0500
Simplify displacements -- clang assembler was badly botching EVEX compressed displacements giving false alarms for instruction length.
commit e0cce177cc1b47ec9f11ac0556241feaa3564df1
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Jul 25 10:02:25 2016 -0500
Minor fixes for 8x24 KNL kernel.
commit 50a2f2efcbeb46537f1deaa8e44dc579a4e49eb8
Merge: 1aa77dfc cfd46c88
Author: praveeng <praveen.g@amd.com>
Date: Mon Jul 25 17:01:20 2016 +0530
Merge master code as on 2016_07_25 to amd-staging branch by praveeng
Change-Id: I84886ae241db2aac0bef6b7ef399f04aa8bca16d
commit cfd46c88d59c8f61d5e7cf768d606e4c44623584
Merge: f493bf4d a017062f
Author: praveeng <praveen.g@amd.com>
Date: Mon Jul 25 15:38:13 2016 +0530
Merge remote-tracking branch 'publicrepo/master'
commit f493bf4d704fe0e967783cd6e6877d3302c056a1
Author: praveeng <praveen.g@amd.com>
Date: Mon Jul 25 14:14:00 2016 +0530
removed changes from readme file which are giving conflicts
Change-Id: Ic71ad1313e1404fed444e899466043704d875af6
commit 65735bbedf75784c48bd11e05b3fdc98fc66b4bc
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sun Jul 24 21:50:32 2016 -0500
Switch to 24x8 kernel, unrolled by 16.
commit 45d5dc97177117220bd9dd0abf85aafc185acad1
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sun Jul 24 14:25:26 2016 -0500
Add 24x8 "KNC-style" kernel for KNL.
commit 95abea46f86816fddfc9ff0abfa52880801461be
Merge: d0dfe5b5 a017062f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 23 15:38:33 2016 -0500
Merge branch 'master' into compose
commit a017062fdf763037da9d971a028bb07d47aa1c8a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 22 17:02:59 2016 -0500
Integrated "memory broker" (membrk_t) abstraction.
Details:
- Integrated a patch originally authored and submitted by Ricardo Magana
of HP Enterprise. The changeset inserts use of a new object type, membrk_t,
(memory broker) that allows multiple sets of memory pools on, for example,
separate NUMA nodes, each of which has a separate memory space.
- Added membrk field to cntx_t and defined corresponding accessor macros.
- Added membrk field to mem_t object and defined corresponding accessor macros.
- Created new bli_membrk.c file, which contains the new memory broker API,
including:
bli_membrk_init(), bli_membrk_finalize()
bli_membrk_acquire_[mv](), bli_membrk_release(),
bli_membrk_init_pools(), bli_membrk_reinit_pools(),
bli_membrk_finalize_pools(),
bli_membrk_pool_size()
- In bli_mem.c, changed function calls to
bli_mem_init_pools() -> bli_membrk_init()
bli_mem_reinit_pools() -> bli_membrk_reinit()
bli_mem_finalize_pools() -> bli_membrk_finalize()
- In bli_packv_init.c, bli_packm_init.c, changed function calls to:
bli_mem_acquire_[mv]() -> bli_membrk_acquire_[mv]()
bli_mem_release() -> bli_membrk_release()
- Added bli_mutex.c and related files to frame/thread. These files define
abstract mutexes (locks) and corresponding APIs for pthreads, openmp, or
single-threaded execution. This new API is employed within functions
such as bli_membrk_acquire_[mv]() and bli_membrk_release().
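
A minimal sketch of an abstract mutex with pthreads, OpenMP, and single-threaded backends selected at compile time. All sketch_* names and the SKETCH_ENABLE_* macros are hypothetical; this is not the contents of bli_mutex.c. Note that pthread_mutex_init() takes two arguments, the mutex and an attribute pointer.

```c
#if defined(SKETCH_ENABLE_PTHREADS)
#include <pthread.h>
typedef struct { pthread_mutex_t m; } sketch_mutex_t;
void sketch_mutex_init(sketch_mutex_t* x)     { pthread_mutex_init(&x->m, NULL); }
void sketch_mutex_finalize(sketch_mutex_t* x) { pthread_mutex_destroy(&x->m); }
void sketch_mutex_lock(sketch_mutex_t* x)     { pthread_mutex_lock(&x->m); }
void sketch_mutex_unlock(sketch_mutex_t* x)   { pthread_mutex_unlock(&x->m); }
#elif defined(SKETCH_ENABLE_OPENMP)
#include <omp.h>
typedef struct { omp_lock_t m; } sketch_mutex_t;
void sketch_mutex_init(sketch_mutex_t* x)     { omp_init_lock(&x->m); }
void sketch_mutex_finalize(sketch_mutex_t* x) { omp_destroy_lock(&x->m); }
void sketch_mutex_lock(sketch_mutex_t* x)     { omp_set_lock(&x->m); }
void sketch_mutex_unlock(sketch_mutex_t* x)   { omp_unset_lock(&x->m); }
#else
/* Single-threaded build: the lock operations are no-ops. */
typedef struct { int unused; } sketch_mutex_t;
void sketch_mutex_init(sketch_mutex_t* x)     { (void)x; }
void sketch_mutex_finalize(sketch_mutex_t* x) { (void)x; }
void sketch_mutex_lock(sketch_mutex_t* x)     { (void)x; }
void sketch_mutex_unlock(sketch_mutex_t* x)   { (void)x; }
#endif

/* Hypothetical pool that hands out block indices under the lock.
   The caller never needs to know which backend is active. */
typedef struct { sketch_mutex_t lock; int next_block; } sketch_pool_t;

void sketch_pool_init(sketch_pool_t* p)
{
    sketch_mutex_init(&p->lock);
    p->next_block = 0;
}

int sketch_pool_acquire(sketch_pool_t* p)
{
    sketch_mutex_lock(&p->lock);
    int id = p->next_block++;
    sketch_mutex_unlock(&p->lock);
    return id;
}
```

With neither macro defined the code builds as the single-threaded variant, so the same caller compiles in all three configurations.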
commit 8ff2e069c48c12fd06b9c48c6b3aeb4ea9b0e6e1
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jul 22 16:22:26 2016 -0500
Add 4x unrolled variant for KNL microkernel.
commit 9cb2ed9b0c25f31a22c1c9719b062fa665ad7adf
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jul 22 16:10:30 2016 -0500
Get rid of one RBX update.
commit 451bde076f0320d60cd2475cfb048ac4a2b798bb
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jul 22 15:43:00 2016 -0500
Add some more knobs to twiddle for KNL microkernel.
commit 8c6e621c099521e7a4d87e007bb8224faa5f33a3
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jul 22 15:05:15 2016 -0500
Make knl conform to new kernel dir structure.
commit ce7214c6618d6f22f4ce2ee452336236916d1f30
Merge: 119d0399 ce59f811
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jul 22 14:59:53 2016 -0500
Merge remote-tracking branch 'origin/master' into knl
commit ce59f81108ec9aea918a7e77030da8acfdd397ce
Merge: ff41153f 707a2b7f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 22 14:48:14 2016 -0500
Merge pull request #88 from devinamatthews/32bit-dim_t
Handle 32-bit dim_t in 64-bit microkernels.
commit 707a2b7faca137cca7cab7b11a12c44ddaf7ad53
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jul 22 13:49:44 2016 -0500
Somehow forgot the most important microkernel.
commit 47ec045056351ac4f0791c071fa0daaa81699c8c
Merge: 08f1d6b6 ff41153f
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jul 22 13:45:23 2016 -0500
Merge remote-tracking branch 'upstream/master' into 32bit-dim_t
commit 08f1d6b6fa344275de0f675f69737145ccf6646a
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jul 22 13:44:37 2016 -0500
Use 64-bit intermediate variable for k for architectures that do 64-bit loads in case dim_t is 32-bit.
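
A minimal sketch of the idea, assuming a 32-bit dim_t (modeled here by the hypothetical sketch_dim_t): the value is first widened into a 64-bit local, so that code performing an 8-byte load of the loop count reads a fully defined value rather than 4 valid bytes plus whatever follows them in memory. The split into k/4 and k%4 is illustrative and k is assumed non-negative.

```c
#include <stdint.h>

typedef int32_t sketch_dim_t; /* assumption: dim_t configured as 32-bit */

/* Widen k before deriving the loop counts that a kernel would load
   with 64-bit instructions. */
void sketch_split_k(sketch_dim_t k, uint64_t* k_iter, uint64_t* k_left)
{
    uint64_t k64 = (uint64_t)k; /* 64-bit intermediate */
    *k_iter = k64 / 4;          /* iterations of a 4x-unrolled loop */
    *k_left = k64 % 4;          /* remainder iterations */
}
```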
commit ff41153f4eb7f38ed94bdd9a3fd81fb979f3f401
Merge: f9214ced e0d2fa0d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 22 13:21:03 2016 -0500
Merge pull request #86 from devinamatthews/haswell-vmovups
Remove alignment restrictions on C in haswell kernel.
commit e0d2fa0d835ab49366aeb790363bb2b571d36ed8
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jul 22 12:56:51 2016 -0500
Relax alignment restrictions for haswell sgemm.
commit f9214ced97392861f5a0ea72abfcf6f41faf674c
Merge: 413d62ac 08666eaa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 22 12:16:39 2016 -0500
Merge pull request #85 from devinamatthews/qopenmp
Change -openmp to -fopenmp for icc.
commit ee2c139df6ad53c6aec8a67ab23b3b1912e8d259
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jul 22 12:06:03 2016 -0500
Remove alignment restrictions on C in haswell kernel.
commit 08666eaa20d8a31f2f92f944e5bfa7c1558c53e4
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jul 22 11:07:34 2016 -0500
Change -openmp to -fopenmp for icc.
commit 119d0399428905053265f3aca1cc8cc1fde3b363
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jul 22 10:23:31 2016 -0500
Add 8x24 KNL kernel.
commit 1aa77dfc1dc183d16e0b6a1196d9c263f021e83d
Merge: 9101a9c8 ec9f5983
Author: praveeng <praveen.g@amd.com>
Date: Thu Jul 21 14:22:40 2016 +0530
Merge master code as on 2016_07_21 to amd-staging branch by praveeng
Change-Id: Ic7d0a21101358f08147736e7f1884e7409937344
commit b58cda9eba0c1e175460aae109baf792d29ba5bf
Merge: 318f063d 413d62ac
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Jul 19 14:09:09 2016 -0500
Merge remote-tracking branch 'origin/master' into knl
# Conflicts:
# frame/base/bli_threading.h
# frame/include/blis.h
# frame/thread/bli_thread.c
commit ec9f59836b32260c29ff1cd24e629c7d8de14992
Merge: 197e182f 763babe4
Author: praveeng <praveen.g@amd.com>
Date: Mon Jul 18 12:56:25 2016 +0530
Merge branch 'master' of https://github.com/clMathLibraries/blis-amd
commit 197e182fcbf1340fd4a202fac58bea6cfcfa9e2f
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 16:51:23 2016 +0530
first commit
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
commit 41fb32711031e7ec86b062aa7f53255d1f5905e2
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 15:00:31 2016 +0530
small modification to readme for git push test
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
commit d0dfe5b5372cc7558ee9c4104b29f82eecc7ed61
Merge: 31def12e 413d62ac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 14 11:01:06 2016 -0500
Merge branch 'master' into compose
commit 9101a9c880e3934f8a63ffc7fe15f5fc1077a73d
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Wed Jul 13 16:51:14 2016 +0530
Checked in optimized 1V kernels along with benchmark codes. Also incorporated review comments for 1F kernels
Change-Id: I035c0d39e6b0bed28e6e2041242186c49f6ed55b
commit 763babe488880b42c86c7fc207aa7665bd0ff9f7
Merge: 357c990b 413d62ac
Author: praveeng <praveen.g@amd.com>
Date: Wed Jul 13 11:57:19 2016 +0530
Merge remote-tracking branch 'publirepo/master'
commit 413d62aca28edabba56605a9f87d5b715831e1db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 12 15:02:52 2016 -0500
README update (use official ACM TOMS links).
commit dfa431f696db2df4065ea454df268a2e0bc02eac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 12 14:21:19 2016 -0500
README update (BLIS2 TOMS article now in-print).
commit 357c990bdd7bd5667aac5adf1bab3712973e7414
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 16:51:23 2016 +0530
first commit
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
commit 8aee306300adb099b66036f2c2f7f3996433cf49
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 15:00:31 2016 +0530
small modification to readme for git push test
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
commit 31def12e2629f187e40f93f6bae9e26a6c2660e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 30 15:19:20 2016 -0500
First phase of control tree redesign.
Details:
- These changes constitute the first set of changes in preparation to
revamping the structure and use of control trees in BLIS. Modifications
in this commit don't affect the control tree code yet, but rather lay
the groundwork.
- Defined wrappers for the following functions, where the wrappers
each take a direction parameter of a new enumerated type, dir_t
(BLIS_BWD or BLIS_FWD), and execute the correct underlying function.
- bli_acquire_mpart_*() and _vpart_*()
- bli_*_determine_kc_[fb]()
- bli_thread_get_range_*() and bli_thread_get_range_weighted_*()
- Consolidated all 'f' (forwards-moving) and 'b' (backwards-moving)
blocked variants for trmm and trsm, and renamed gemm and herk variants
accordingly. The direction is now queried via routines such as
bli_trmm_direct(), which determines the direction from the implied side
and uplo parameters. For gemm and herk, it is unconditionally BLIS_FWD.
- Defined wrappers to parameter-specific macrokernels for herk, trmm, and
trsm, e.g. bli_trmm_xx_ker_var2(), that execute the correct underlying
macrokernel based on the implied parameters. The same logic used to
choose the dir_t in _direct() functions is used here.
- Simplified the function pointer arrays in _int() functions given the
consolidation and dir_t querying mentioned above.
- Function signature (whitespace) reformatting for various functions.
- Removed old code in various 'old' directories.
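
A minimal sketch of the wrapper pattern described above: a single entry point takes a direction value and forwards to the forwards-moving or backwards-moving routine. The enum, function names, and the block-start arithmetic are all hypothetical and only illustrate the dispatch; they are not the BLIS partitioning functions.

```c
/* Hypothetical stand-in for dir_t. */
typedef enum { SKETCH_FWD, SKETCH_BWD } sketch_dir_t;

/* Start index of the i-th block of size b, moving forwards through
   a dimension of length n. */
int sketch_block_start_f(int i, int b, int n)
{
    (void)n;
    return i * b;
}

/* Same, but moving backwards from the end of the dimension. The
   result is clamped to 0 for the final (possibly partial) block. */
int sketch_block_start_b(int i, int b, int n)
{
    int s = n - (i + 1) * b;
    return s < 0 ? 0 : s;
}

/* Wrapper: callers pass the direction instead of choosing a variant. */
int sketch_block_start(sketch_dir_t dir, int i, int b, int n)
{
    return dir == SKETCH_FWD ? sketch_block_start_f(i, b, n)
                             : sketch_block_start_b(i, b, n);
}
```

Folding the direction into a parameter is what lets one blocked variant replace a pair of 'f' and 'b' variants.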
commit 405c9d46344d93c3eab5572b233900b50ca50d68
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Wed Jun 22 12:18:54 2016 +0530
Check-in the fused kernels optimized for Zen
Change-Id: I7b2f467b960e7b9a285f06e47be87de122e5fa24
commit 232754feecf29452987666b9f5ebba2619bfd0b0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 21 14:25:39 2016 -0500
Fixed compiler warning in rand[vm], randn[vm].
Details:
- Fixed compiler warnings about unused variables related to the disabling
of normalization in the structured cases of the rand[vm] and randn[vm]
operations.
commit a89555d1605574f3685813dcc972b636dd61264d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 17 14:08:35 2016 -0500
Added randn[vm] operations, support in testsuite.
Details:
- Defined a new randomization operation, randn, on vectors and matrices.
The randnv and randnm operations randomize each element of the target
object with values from a narrow range of values. Presently, those
values are all integer powers of two, but they do not need to be powers
of two in order to achieve the primary goal, which is to initialize
objects that can be operated on with plenty of precision "slack"
available to allow computations that avoid roundoff. Using this method
of randomization makes it much more likely that testsuite residuals of
properly-functioning operations are close to zero, if not exactly zero.
- Updated existing randomization operations randv and randm to skip
special diagonal handling and normalization for matrices with structure.
This is now handled by the testsuite modules by explicitly calling a
testsuite function that loads the diagonal (and scales off-diagonal
elements).
- Added support for randnv and randnm in the testsuite with a new switch
in input.general that universally toggles between use of the classic
randv/randm, which use real values on the interval [-1,1], and
randnv/randnm, which use only values from a narrow range. Currently,
the narrow range is: +/-{2^0, 2^-1, 2^-2, 2^-3, 2^-4, 2^-5, 2^-6}, as
well as 0.0.
- Updated testsuite modules so that a testsuite wrapper function is called
instead of directly calling the randomization operations (such as
bli_randv() and bli_randm()). This wrapper also takes a bool_t that
indicates whether the object's elements should be normalized. (NOTE: As
alluded to above, in the test modules of triangular solve operations such
as trsv and trsm, we perform the extra step of loading the diagonal.)
- Defined a new level-0 operation, invertsc, which inverts a scalar.
- Updated the abval2ris and sqrt2ris level-0 macros to avoid an unlikely
but possible divide-by-zero.
- Updated function signature and prototype formatting in testsuite.
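
A minimal sketch of drawing one value from the narrow range listed above, +/-{2^0, ..., 2^-6} plus 0.0. The function name and the uniform choice among the 15 values are assumptions; the commit does not state the distribution used by randnv/randnm.

```c
#include <stdlib.h>

/* Hypothetical generator: returns 0.0 or +/- 2^-k for k in 0..6.
   Every result is exactly representable, so sums and products of
   modest numbers of such values incur no roundoff. */
double sketch_randn_value(void)
{
    int r = rand() % 15;              /* 15 distinct outcomes */
    if (r == 0) return 0.0;
    int    k   = (r - 1) / 2;         /* exponent 0..6 */
    double mag = 1.0 / (double)(1 << k);
    return ((r - 1) % 2 == 0) ? mag : -mag;
}
```

Because each magnitude is a power of two with a small exponent, multiplying by 64 always yields an integer, which is a quick way to check that a value came from this range.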
commit 318f063dcbd8b594969e401bc99146d24b01066a
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Jun 8 17:46:50 2016 -0500
Add new KNL microkernel derived from Haswell.
commit 096895c5d538a7f8817603d7cf28c52e99340def
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 6 13:32:04 2016 -0500
Reorganized code, APIs related to multithreading.
Details:
- Reorganized code and renamed files defining APIs related to multithreading.
All code that is not specific to a particular operation is now located in a
new directory: frame/thread. Code is now organized, roughly, by the
namespace to which it belongs (see below).
- Consolidated all operation-specific *_thrinfo_t object types into a single
thrinfo_t object type. Operation-specific level-3 *_thrinfo_t APIs were
also consolidated, leaving bli_l3_thrinfo_*() and bli_packm_thrinfo_*()
functions (aside from a few general purpose bli_thrinfo_*() functions).
- Renamed thread_comm_t object type to thrcomm_t.
- Renamed many of the routines and functions (and macros) for multithreading.
We now have the following API namespaces:
- bli_thrinfo_*(): functions related to thrinfo_t objects
- bli_thrcomm_*(): functions related to thrcomm_t objects.
- bli_thread_*(): general-purpose functions, such as initialization,
finalization, and computing ranges. (For now, some macros, such as
bli_thread_[io]broadcast() and bli_thread_[io]barrier() use the
bli_thread_ namespace prefix, even though bli_thrinfo_ may be more
appropriate.)
- Renamed thread-related macros so that they use a bli_ prefix.
- Renamed control tree-related macros so that they use a bli_ prefix (to be
consistent with the thread-related macros that were also renamed).
- Removed #undef BLIS_SIMD_ALIGN_SIZE from dunnington's bli_kernel.h. This
#undef was a temporary fix to some macro defaults which were being applied
in the wrong order, which was recently fixed.
commit 232530e88ff99f37abcae5b6fb5319a9a375a45f
Merge: 4bcabd1b eef37f8b
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Wed Jun 1 15:14:10 2016 -0500
Merge commit 'refs/pull/81/head' of https://github.com/flame/blis
Conflicts:
frame/base/bli_threading_pthreads.c
frame/base/bli_threading_pthreads.h
commit 4bcabd1bf60688c38cf562459fc5e8be8b831756
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Wed Jun 1 13:27:28 2016 -0500
Use spin locks instead of pthread barriers
commit eef37f8b4d81845a6ba4bf25586d32b50c3e8a68
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Sun May 29 22:28:13 2016 -0700
use GCC intrinsic instead of pthread_mutex for atomic increment and fetch
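
A minimal sketch of an atomic increment-and-fetch without a mutex. The commit text does not name the builtin it uses; __sync_add_and_fetch is one GCC builtin (also accepted by Clang) with these semantics and is shown here as an assumption. The wrapper name is hypothetical.

```c
/* Atomically add 1 to *counter and return the new value.
   Relies on a GCC/Clang builtin, so it is not portable ISO C. */
int sketch_atomic_inc_and_fetch(volatile int* counter)
{
    return __sync_add_and_fetch(counter, 1);
}
```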
commit 9dcd6f05c4c3ff2ce7cd87a9951a96ebef22681e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 24 13:15:32 2016 -0500
Implemented developer-configurable malloc()/free().
Details:
- Replaced all instances of bli_malloc() and bli_free() with one of:
- bli_malloc_pool()/bli_free_pool()
- bli_malloc_user()/bli_free_user()
- bli_malloc_intl()/bli_free_intl()
each of which can be configured to call malloc()/free() substitutes,
so long as the substitute functions have the same function type
signatures as malloc() and free() defined by C's stdlib.h. The _pool()
function is called when allocating blocks for the memory pools (used
for packing buffers, primarily), the _user() function is called when
obj_t's are created (via bli_obj_create() and friends), and the _intl()
function is called for internal use by BLIS, such as when creating
control tree nodes or temporary buffers for manipulating internal data
structures. Substitutes for any of the three types of bli_malloc() may
be specified by #defining the following pairs of cpp macros in
bli_kernel.h:
- BLIS_MALLOC_POOL/BLIS_FREE_POOL
- BLIS_MALLOC_USER/BLIS_FREE_USER
- BLIS_MALLOC_INTL/BLIS_FREE_INTL
to be the name of the substitute functions. (Obviously, the object
code that contains these functions must be provided at link-time.)
These macros default to malloc() and free(). Substitute functions are
also automatically prototyped by BLIS (in bli_malloc_prototypes.h).
- Removed definitions for bli_malloc() and bli_free().
- Note that bli_malloc_pool() and bli_malloc_user() are now defined in
terms of a new function, bli_malloc_align(), which aligns memory to an
arbitrary (power of two) alignment boundary, but does so manually,
whereas before alignment was performed behind the scenes by
posix_memalign(). Currently, bli_malloc_intl() is defined in terms
of bli_malloc_noalign(), which serves as a simple wrapper to the
designated function that is passed in (e.g. BLIS_MALLOC_INTL).
Similarly, there are bli_free_align() and bli_free_noalign(), which
are used in concert with their bli_malloc_*() counterparts.
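
A minimal sketch of manual alignment on top of a caller-supplied malloc()-compatible function, in the spirit of the description above. The names and the bookkeeping scheme (stashing the original pointer just below the aligned address) are assumptions, not the actual bli_malloc_align() implementation. Overflow of the size computation is not checked.

```c
#include <stdint.h>
#include <stdlib.h>

/* Function types matching malloc() and free() from stdlib.h. */
typedef void* (*sketch_malloc_ft)(size_t);
typedef void  (*sketch_free_ft)(void*);

/* Allocate size bytes aligned to align (a power of two) using f. */
void* sketch_malloc_align(sketch_malloc_ft f, size_t size, size_t align)
{
    if (align == 0 || (align & (align - 1)) != 0) return NULL;

    /* Over-allocate: padding for alignment plus room for one pointer. */
    void* raw = f(size + align + sizeof(void*));
    if (raw == NULL) return NULL;

    uintptr_t start   = (uintptr_t)raw + sizeof(void*);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    /* Remember the original pointer so it can be freed later. */
    ((void**)aligned)[-1] = raw;
    return (void*)aligned;
}

/* Release memory obtained from sketch_malloc_align() using f. */
void sketch_free_align(sketch_free_ft f, void* p)
{
    if (p != NULL) f(((void**)p)[-1]);
}
```

Passing malloc and free reproduces the default behavior; a substitute pair with the same signatures can be passed instead.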
commit 9dd440109a9d964f5cd286e9f83c487ad703e1e4
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Sat May 21 15:21:58 2016 -0700
fix 404 link to BuildSystem
Google Code is dead. Long live GitHub!
commit d309f20b7376a68efa3b864ad790c2021c071655
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 18 15:13:53 2016 -0500
Added alignment switch to testsuite.
Details:
- Added a new input parameter to input.general that globally toggles
whether testsuite tests are performed on objects whose buffers and
leading dimensions have been aligned, and changed the implementation
of libblis_test_mobj_create() to employ alignment (or not) regardless
of whether row, column, or general storage is being tested.
- Updated configure script's "--help" text to indicate default behavior
for internal integer type size and BLAS/CBLAS integer type size
options.
commit 32db0adc218ea4ae370164dbe8d23b41cd3526d3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 17 15:20:16 2016 -0500
Generate prototypes for user-defined packm kernels.
Details:
- Created template prototypes for packm kernels (in bli_l1m_ker.h), and
then redefined reference packm kernels' prototyping headers in terms of
this template, as is already done for level-1v, -1f, and -3 kernels.
- Automatically generate prototypes for user-defined packm kernels in
bli_kernel_prototypes.h (using the new template prototypes in
bli_l1m_ker.h).
- Defined packm kernel function types in bli_l1m_ft.h, including for
packm kernels specific to induced methods, which are now used in
bli_packm_cxk.c and friends rather than using a locally-defined
function type.
- In bli_packm_cxk.c, extended the array of packm kernel function pointers
out to index 31 (from the previous maximum of 17). This allows us to
store the unrolled 30xk kernel in the array for use (on knc, for
example). Note: This should have been done a long time ago.
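
A minimal sketch of a kernel lookup table of this shape, assuming the array is indexed by panel dimension and that an empty slot means no optimized kernel is available. All names, the kernel signature, and the registration function are hypothetical.

```c
#include <stddef.h>

#define SKETCH_MAX_PANEL_DIM 31

/* Hypothetical packm kernel type: pack k columns of a panel. */
typedef void (*sketch_packm_ft)(int k, const double* a, int lda, double* p);

/* One slot per panel dimension 0..31; file-scope storage is
   zero-initialized, so every slot starts out NULL. */
static sketch_packm_ft sketch_packm_kers[SKETCH_MAX_PANEL_DIM + 1];

/* Example kernel: copy a 30 x k panel into contiguous storage. */
void sketch_pack_30xk(int k, const double* a, int lda, double* p)
{
    for (int j = 0; j < k; ++j)
        for (int i = 0; i < 30; ++i)
            p[j * 30 + i] = a[j * lda + i];
}

void sketch_packm_register(int panel_dim, sketch_packm_ft f)
{
    if (panel_dim >= 0 && panel_dim <= SKETCH_MAX_PANEL_DIM)
        sketch_packm_kers[panel_dim] = f;
}

/* Returns NULL when the dimension is out of range or has no kernel. */
sketch_packm_ft sketch_packm_query(int panel_dim)
{
    if (panel_dim < 0 || panel_dim > SKETCH_MAX_PANEL_DIM) return NULL;
    return sketch_packm_kers[panel_dim];
}
```

With a maximum index of 17, a 30xk kernel would have had no slot; extending the table to 31 makes room for it.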
commit e3bd5ca64ae7c190ba689396c0de687b829a11fe
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu May 12 20:54:13 2016 -0500
Fix SIMD definitions in KNL config, and a couple of fixes to C update.
commit 4fe02e3d497995d94d34d3fcf5af895084cfc8b9
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu May 12 20:53:58 2016 -0500
Move bli_kernel.h before bli_threading.h in order of inclusion in blis.h.
commit 4bcf1b35abea3f3dfc8f2fe462dcf155cf199e55
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 11 16:09:49 2016 -0500
Fixed bli_get_range_*() bugs in trsm variants.
Details:
- Fixed incorrect calls to bli_get_range_*() from within trsm blocked
variants 1f, 2b, and 2f. The bug somehow went undetected since the
big commit (537a1f4), and, strangely, did not manifest via the BLIS
testsuite. The bug finally came to our attention when running the
libflame test suite while linking to BLIS. Thanks to Kiran Varaganti
for submitting the initial report that led to this bug.
commit 9cfa33023f123a6c17e987f72fba174ce073f0b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 11 16:02:30 2016 -0500
Minor updates to bli_f2c.h.
Details:
- Added #undef guards to certain #define statements in bli_f2c.h,
and renamed the file guard to BLIS_F2C_H. This helps when
#including "blis.h" from an application or library that already
#includes an "f2c.h" header.
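
A minimal sketch of an #undef guard. The macro name SKETCH_MAX is hypothetical and stands in for any name that both bli_f2c.h and another "f2c.h"-style header might define; the first #define below simulates the earlier header.

```c
/* Simulates a definition pulled in earlier from some other header. */
#define SKETCH_MAX(a,b) ((a) > (b) ? (a) : (b))

/* Guarded redefinition: drop any existing definition first, so a
   differing earlier definition does not trigger a redefinition
   diagnostic. */
#ifdef SKETCH_MAX
#undef SKETCH_MAX
#endif
#define SKETCH_MAX(a,b) ((a) >= (b) ? (a) : (b))

int sketch_max3(int a, int b, int c)
{
    return SKETCH_MAX(SKETCH_MAX(a, b), c);
}
```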
commit a09a2e23eacf5328858c8318bb637c5ff3b71d08
Merge: 4dcd37eb 7c604e1c
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Wed May 11 10:47:11 2016 -0500
Merge pull request #76 from devinamatthews/move_simd_defs
Move default SIMD-related definitions to bli_kernel_macro_defs.h
commit 4dcd37eb1b12a6e08cc13df7b61391ef8363f5d8
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue May 10 16:28:59 2016 -0500
fixing knc simd align size
commit 619dee0daec3474b4e5a55df90a61aabcae194f2
Merge: b790b3d9 7c604e1c
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue May 10 12:13:24 2016 -0500
Merge branch 'move_simd_defs' into knl
commit 7c604e1cbc1609b6e12d3ee973c08b7af5035be4
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue May 10 12:11:55 2016 -0500
Move default SIMD-related definitions to bli_kernel_macro_defs.h. Otherwise, configurations which customize these fail as these are now defined in bli_kernel.h.
commit b790b3d9e1820f3b691676de48c291cae083452d
Merge: 4f8c05c9 a7be2d28
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue May 10 11:49:47 2016 -0500
Merge branch 'master' into knl
commit a7be2d28e8930b154d0da1d6929b54a96e210af6
Merge: 97b512ef 4b1e55ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 10 11:48:51 2016 -0500
Merge pull request #74 from devinamatthews/fix_common_symbols
Default-initialize all extern global variables to avoid generating common symbols.
commit 4b1e55edbfe0e1cb2e7b9428424903497cb7a841
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue May 10 10:08:47 2016 -0500
Default-initialize all extern global variables to avoid generating common symbols. Fixes #73.
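
A minimal sketch of the pattern, with a hypothetical variable name. A file-scope definition without an initializer is a tentative definition, which some toolchains emit as a common symbol (GCC did so by default before version 10, under -fcommon). Adding an explicit initializer makes it an ordinary definition.

```c
/* Header side: declaration only. */
extern int sketch_global_flag;

/* Exactly one source file: the explicit "= 0" makes this a full
   definition rather than a tentative one. */
int sketch_global_flag = 0;

int sketch_get_flag(void)
{
    return sketch_global_flag;
}
```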
commit 97b512ef62c7e25c97ed5e9eca81cd7015b2ac91
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 6 10:24:30 2016 -0500
Include headers from cblas.h to pull in f77_int.
Details:
- Added #include statements for certain key BLIS headers so that the
definition of f77_int is pulled in when a user compiles application
code with only #include "cblas.h" (and no other BLIS header). This
is necessary since f77_int is now used within the cblas API.
commit c3a4d39d03665135f1616588b5ef7c3e9ef5688d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 4 17:22:56 2016 -0500
Updates to haswell gemm micro-kernels.
Details:
- Added two new sets of [sd]gemm micro-kernels for haswell architectures,
one that is 4x24/4x12 (s and d) and one that is 6x16/6x8.
- Changed the haswell configuration to use the 6x16/6x8 micro-kernels
by default.
- Updated various Makefiles, in test, test/3m4m, and testsuite.
commit 0b01d355ae861754ae2da6c9a545474af010f02e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 27 15:21:10 2016 -0500
Miscellaneous cleanups, fixes to recent commits.
Details:
- Fixed a typo in bli_l1f_ref.h, introduced in bbb8569, that only
manifested when non-reference level-1f kernels were used.
- Added an #undef BLIS_SIMD_ALIGN_SIZE to bli_kernel.h of dunnington
configuration to prevent a compile-time warning until I can figure out
the proper permanent fix.
- Moved frame/1f/kernels/bli_dotxaxpyf_ref_var1.c out of the compilation
path (into 'other' directory). _ref_var2 is used by default, which is
the variant that is built on axpyf and dotxf instead of dotaxpyv.
- Removed section of frame/include/bli_config_macro_defs.h pertaining to
mixed datatype support.
commit ed7326c836f427e2f8420b015220ce293207b10c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 27 14:57:40 2016 -0500
Added 'restrict' to l1v/l1f code in 'kernels' dir.
Details:
- Added 'restrict' keyword to existing kernel definitions in 'kernels'
directory. These changes were meant for inclusion in bbb8569.
commit bbb8569b2a08c3bcd631d5a05eb389d01d94ac07
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 27 14:13:46 2016 -0500
Use 'restrict' in all kernel APIs; wspace changes.
Details:
- Updated level-1v, level-1f kernel function types (bli_l1?_ft.h) and
generic kernel prototypes (bli_l1?_ker.h) to use 'restrict' for all
numerical operand pointers (i.e., all pointers except the cntx_t).
- Updated level-1f reference kernel definitions to use 'restrict' for
all numerical operand pointers. (Level-1v reference kernel definitions
were already updated in bdbda6e.)
- Rewrote the level-1v and level-1f reference kernel prototypes in
bli_l1v_ref.h and bli_l1f_ref.h, respectively, to simply #include
bli_l1v_ker.h and bli_l1f_ker.h with redefined function base names
(as was already being done for the level-3 micro-kernel prototypes
in bli_l3_ref.h), rather than duplicate the signatures from the
_ker.h files.
- Added definitions to frame/include/bli_kernel_prototypes.h for axpbyv
and xpbyv, which were probably meant for inclusion in bdbda6e.
- Converted a number of instances of four spaces, as introduced in
bdbda6e, to tabs.
commit 4ea419c72c789825e1f93a1eee88219bbf873930
Merge: f1e9be2a bdbda6e6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 26 12:50:45 2016 -0500
Merge pull request #70 from devinamatthews/daxpby
Give the level1v operations some love
commit bdbda6e6acc682ab1b6ca680edebd09ae12a832c
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Apr 25 11:05:57 2016 -0500
Give the level1v operations some love:
- Add missing axpby and xpby operations (plus test cases).
- Add special case for scal2v with alpha=1.
- Add restrict qualifiers.
- Add special-case algorithms for incx=incy=1.
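Note: the following is only an illustrative sketch of the ideas listed above (an axpby-style operation, 'restrict' qualifiers, and a dedicated loop for incx=incy=1). The function name axpby_sketch and its signature are hypothetical and are not the BLIS kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: y := alpha * x + beta * y for n elements.
   'restrict' promises the compiler that x and y do not alias. */
static void axpby_sketch(size_t n, double alpha,
                         const double *restrict x, ptrdiff_t incx,
                         double beta,
                         double *restrict y, ptrdiff_t incy)
{
    assert(x != NULL && y != NULL);

    if (incx == 1 && incy == 1) {
        /* Unit-stride special case: a plain loop that is easy to vectorize. */
        for (size_t i = 0; i < n; ++i)
            y[i] = alpha * x[i] + beta * y[i];
    } else {
        /* General-stride fallback. */
        for (size_t i = 0; i < n; ++i) {
            const double *xi = x + (ptrdiff_t)i * incx;
            double       *yi = y + (ptrdiff_t)i * incy;
            *yi = alpha * (*xi) + beta * (*yi);
        }
    }
}
```

Splitting out the unit-stride loop keeps address arithmetic out of the hot path, which is presumably the motivation for the special case.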
commit f1e9be2aba1a057eedb947bbae96848597777408
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 22 15:34:02 2016 -0500
Minor tweak to test/Makefile.
Details:
- Just committing a minor change to test/Makefile that has been lingering
in my local working copy for longer than I can remember.
commit aa0bceec277938328dabeb744680623f24fb0b61
Merge: 4136553f e2784b4c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 22 12:01:31 2016 -0500
Merge branch 'master' of github.com:flame/blis
commit 4136553f0d0661a668dfdb9edcd7ce1c5773dde7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 22 11:53:53 2016 -0500
Clear level-3 cntx_t's via memset() before use.
Details:
- In all level-3 operations' _cntx_init() functions, replaced calls to
bli_cntx_obj_init() with calls to bli_cntx_obj_clear(), and in all
level-3 operations' _cntx_finalize() functions, removed calls to
bli_cntx_obj_finalize(), leaving those function definitions empty.
- Changed the definition of bli_cntx_obj_clear() so that the clearing
occurs via a single call to memset().
commit 4f8c05c9e2ef4cbb82b35a3ebf1f0a0ac665830e
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Apr 21 10:00:59 2016 -0500
Rearrange KNL dgemm kernel again to streamline usage of ymm register. sgemm and dgemm now both working with Intel SDE.
commit e2784b4c921f706e756df3e146e20a4cb63f53e3
Merge: dd0ab1d9 a9b6c3ab
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 20 18:34:09 2016 -0500
Merge pull request #67 from devinamatthews/cblas-f77-int
Change CBLAS integer type to f77_int
commit a9b6c3abda6222a8b240361643932e83cf726c4f
Merge: e4c54c81 dd0ab1d9
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Apr 20 16:00:10 2016 -0500
Merge remote-tracking branch 'origin/master' into cblas-f77-int
# Conflicts:
# config/haswell/bli_config.h
commit e4c54c81463c2a19c9bb6b1f0f1be3fa9d018a45
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Apr 20 15:56:46 2016 -0500
Change integer type in CBLAS function signatures to f77_int, and add proper const-correctness to BLAS layer.
commit dd0ab1d93f33abca6af9edd7b8e52da62dcfa5b1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 20 14:38:23 2016 -0500
Converted some bli_cntx query functions to macros.
Details:
- Commented out several datatype-aware query functions (those ending in
_dt) from bli_cntx.c, as well as their prototypes in bli_cntx.h, and
added equivalent cpp query macros to bli_cntx.h.
- Added 'bli_config.h' to .gitignore.
commit 7193230f7d35edbd1d2f77842a613971f1603463
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Apr 20 09:37:30 2016 -0500
Work around missing VPMULLQ on KNL.
commit a30ccbc4c6a6e6460e78af6b5c530ee0d06f98fb
Merge: eb2f18e4 0e1a9821
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 19 15:04:33 2016 -0500
Merge pull request #66 from devinamatthews/blas-configure
Add configure options and generate bli_config.h automatically.
commit bd44cf13e886069bc66c10ac0db178be96629a0d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Apr 19 13:43:04 2016 -0500
Fix copy-paste errors in KNL kernels.
commit eb2f18e4844d985715df20798f50f9cc12e3b5ad
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 19 12:50:32 2016 -0500
More compile-time fixes to bgq gemm ukernel code.
commit 0e1a9821d860f6c1d818baf4c48d21a23726c132
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Apr 19 11:44:37 2016 -0500
Add configure options and generate bli_config.h automatically.
Options to configure have been added for:
- Setting the internal BLIS and BLAS/CBLAS integer sizes.
- Enabling and disabling the BLAS and CBLAS layers.
Additionally, configure options which require defining macros (the above plus the threading model), write their macros to the automatically-generated bli_config.h file in the top-level build directory. The old bli_config.h files in the config dirs were removed, and any kernel-related macros (SIMD size and alignment etc.) were moved to bli_kernel.h. The Makefiles were also modified to find the new bli_config.h file.
Lastly, support for OMP in clang has been added (closes #56).
commit a11eec05928ddc5c43fa5dbcd35f2edd24ff35a1
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Apr 18 13:13:36 2016 -0500
Add sgemm ukernels for KNL. vpmullq is not implemented on KNL -- needs workaround.
commit ff84469a4575f1ef8a0010046fde52240a312cae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 18 12:29:09 2016 -0500
Applied various compilation fixes to bgq kernels.
commit c38e0dab05b2dc36672eab96e1248fb7fb2d785b
Merge: bd5e2296 cbcd0b73
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Apr 18 10:21:35 2016 -0500
Merge remote-tracking branch 'origin/master' into knl
commit bd5e2296e98e042c31f1e8ece2c1ca8e4bdc2d4c
Merge: 4745def0 49f85177
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Apr 18 10:15:22 2016 -0500
Merge remote-tracking branch 'origin/knl' into knl
commit 4745def0c87377ae83ad73ac514d7de08a96b2ac
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Apr 18 10:15:05 2016 -0500
Add 64-bit offset vector so we can use vgatherqpd.
commit 49f85177f886f38889b60503a4e12fa7f04be1fd
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Apr 18 10:14:11 2016 -0500
KNL ukernel compiles with gcc.
commit cbcd0b739dc54bd14fbb46aeda267c26725cd70f
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Mon Apr 18 03:12:57 2016 -0500
Changing ifdef for OSX pthread barriers
commit 58b2c3cf040134d1be913c585a3c6905629116c0
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sat Apr 16 16:12:24 2016 -0500
Rewrite of KNL kernel in GNU extended asm syntax.
commit dd62080cea78f3a23616200d6640e52c102b2bb9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 15 11:15:41 2016 -0500
Compile-time fix to bgq l1f kernels.
Details:
- Fixed an old reference to bli_daxpyf_fusefac, which no longer exists,
by replacing it with the axpyf fusing factor (8), and cleaned up the
relevant section of config/bgq/bli_kernel.h.
- Removed most of the details of the level-3 kernels from the template
kernel code in config/template/kernels/3 and replaced it with a
reference to the relevant kernel wiki maintained on the BLIS github
website.
commit d5a915dd8d7a6ead42a68772e4420eb3647e6f1a
Merge: 4320b725 41694675
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 14 12:56:36 2016 -0500
Merge branch 'master' of github.com:flame/blis
commit 4320b725a1f8fd34101470b6cf52ad504a79c517
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 14 12:51:29 2016 -0500
Use kernel CFLAGS on "ukernels" directories.
Details:
- Updated the top-level Makefile so that the CFLAGS variable designated
for kernel source code is applied not only to source code in
directories named "kernels" but source code in any directory that
contains the substring "kernels", such as "ukernels".
- Formally disabled some code in gen-make-frag.sh script that was already
effectively disabled. The code was related to handling "noopt" and
"kernel" directories, which is now handled independently within the
top-level Makefile without needing to place these source files into
a separate makefile variable.
commit 41694675e4cb56e2e0323c7a7db48e0819606a31
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Apr 13 15:51:08 2016 -0500
pthreads bugfixes
Getting pthreads to work on my Mac
Implemented a pthread barrier when _POSIX_BARRIER isn't defined
Now spawn n-1 threads instead of n threads so that master thread isn't just spinning the whole time
Add -lpthread instead of -pthread to LDFLAGS (for clang)
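Note: a fallback barrier of the kind mentioned above can be built from a mutex and a condition variable. The sketch below is an assumption about one common way to do this, not the code from this commit; all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical fallback barrier for platforms without pthread_barrier_t. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    unsigned        n_threads;   /* threads expected at the barrier */
    unsigned        arrived;     /* threads that have arrived this round */
    unsigned        generation;  /* incremented each time the barrier opens */
} sketch_barrier_t;

static int sketch_barrier_init(sketch_barrier_t *b, unsigned n_threads)
{
    assert(b != NULL && n_threads > 0);
    b->n_threads  = n_threads;
    b->arrived    = 0;
    b->generation = 0;
    if (pthread_mutex_init(&b->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&b->cond, NULL) != 0) {
        pthread_mutex_destroy(&b->lock);
        return -1;
    }
    return 0;
}

static void sketch_barrier_wait(sketch_barrier_t *b)
{
    pthread_mutex_lock(&b->lock);
    unsigned gen = b->generation;
    if (++b->arrived == b->n_threads) {
        /* Last arrival: reset the count and release everyone. */
        b->arrived = 0;
        b->generation++;
        pthread_cond_broadcast(&b->cond);
    } else {
        /* Wait until the generation changes; the loop guards against
           spurious wakeups. */
        while (gen == b->generation)
            pthread_cond_wait(&b->cond, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

static void sketch_barrier_destroy(sketch_barrier_t *b)
{
    pthread_cond_destroy(&b->cond);
    pthread_mutex_destroy(&b->lock);
}
```

With n participants, the calling thread can act as one of them, so only n-1 additional threads need to be created, as the message above describes.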
commit f756dbfa0d542cbc497724981520c83abf049c4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 13 11:25:33 2016 -0500
Removed stale #include from bgq configuration.
Details:
- Removed an old #include statement ("bli_gemm_8x8.h") from the
bli_kernel.h file in the bgq configuration. It turns out this
file was no longer needed even prior to 537a1f4.
commit 0bd4169ea75f690714e7d2912229932a75d8a7e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 11 18:08:32 2016 -0500
Fixed context-broken dunnington/penryn kernels.
Details:
- Added missing context parameters to several instances where simpler
kernels, or reference kernels, are called instead of executing the
main body code contained in the kernel function in question.
- Renamed axpyv and dotv kernel files to use "opt" instead of "int"
substring, for consistency with level-1f kernels.
commit 7912af5db45b7372d19a9a3dfeb82df302a05628
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 11 17:32:13 2016 -0500
CHANGELOG update (0.2.0)
commit 898614a555ea0aa7de4ca07bb3cb8f5708b6a002
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 11 17:32:09 2016 -0500
Version file update (0.2.0)
commit 537a1f4f85ce1aa008901857cb3182e6b4546d7f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 11 17:21:28 2016 -0500
Implemented runtime contexts and reorganized code.
Details:
- Retrofitted a new data structure, known as a context, into virtually
all internal APIs for computational operations in BLIS. The structure
is now present within the type-aware APIs, as well as many supporting
utility functions that require information stored in the context.
User-level object APIs were unaffected and continue to be "context-free";
however, these APIs were duplicated/mirrored so that "context-aware"
APIs now also exist, differentiated with an "_ex" suffix (for "expert").
These new context-aware object APIs (along with the lower-level,
type-aware, BLAS-like APIs) contain the address of a context as a last
parameter, after all other operands. Contexts, or specifically, cntx_t
object pointers, are passed all the way down the function stack into
the kernels and allow the code at any level to query information about
the runtime, such as kernel addresses and blocksizes, in a
thread-friendly manner, that is, one that allows thread-safety even if
the original source of the information stored in the context changes at
run-time. (See the next bullet for more on this "original source" of
info.)
(Special thanks go to Lee Killough for suggesting the use of this kind
of data structure in discussions that transpired during the early
planning stages of BLIS, and also for suggesting such a perfectly
appropriate name.)
- Added a new API, in frame/base/bli_gks.c, to define a "global kernel
structure" (gks). This data structure and API will allow the caller to
initialize a context with the kernel addresses, blocksizes, and other
information associated with the currently active kernel configuration.
The currently active kernel configuration within the gks cannot be
changed (for now), and is initialized with the traditional cpp macros
that define kernel function names, blocksizes, and the like. However,
in the future, the gks API will be expanded to allow runtime management
of kernels and runtime parameters. The most obvious application of this
new infrastructure is the runtime detection of hardware (and the
implied selection of appropriate kernels). With contexts in place,
kernels may even be "hot swapped" at runtime within the gks. Once
execution enters a level-3 _front() function, the memory allocator will
be reinitialized on-the-fly, if necessary, to accommodate the new
kernels' blocksizes. If another application thread is executing with
another (previously loaded) kernel, it will finish in a deterministic
fashion because its kernel information was loaded into its context
before computation began, and also because the blocks it checked out
from the internal memory pools will be unaffected by the newer threads'
reinitialization of the allocator.
- Reorganized and streamlined the 'ind' directory, which contains much of
the code enabling use of induced methods for complex domain matrix
multiplication; deprecated bli_bsv_query.c and bli_ukr_query.c, as
those APIs' functionality is now mostly subsumed within the global
kernel structure.
- Updated bli_pool.c to define a new function, bli_pool_reinit_if(),
that will reinitialize a memory pool if the necessary pool block size
has increased.
- Updated bli_mem.c to use bli_pool_reinit_if() instead of
bli_pool_reinit() in the definition of bli_mem_pool_init(), and placed
usage of contexts where appropriate to communicate cache and register
blocksizes to bli_mem_compute_pool_block_sizes().
- Simplified control trees now that much of the information resides in
the context and/or the global kernel structure:
- Removed blocksize object pointers (blksz_t*) fields from all control
tree node definitions and replaced them with blocksize id (bszid_t)
values instead, which may be passed into a context query routine in
order to extract the corresponding blocksize from the given context.
- Removed micro-kernel function pointers (func_t*) fields from all
control tree node definitions. Now, any code that needs these function
pointers can query them from the local context, as identified by a
level-3 micro-kernel id (l3ukr_t), level-1f kernel id, (l1fkr_t), or
level-1v kernel id (l1vkr_t).
- Removed blksz_t object creation and initialization, as well as kernel
function object creation and initialization, from all operation-
specific control tree initialization files (bli_*_cntl.c), since this
information will now live in the gks and, secondarily, in the context.
- Removed blocksize multiples from blksz_t objects. Now, we track
blocksize multiples for each blocksize id (bszid_t) in the context
object.
- Removed the bool_t's that were required when a func_t was initialized.
These bools are meant to allow one to track the micro-kernel's storage
preferences (by rows or columns). This preference is now tracked
separately within the gks and contexts.
- Merged and reorganized many separate-but-related functions into single
files. This reorganization affects frame/0, 1, 1d, 1m, 1f, 2, 3, and
util directories, but has the most obvious effect of allowing BLIS
to compile noticeably faster.
- Reorganized execution paths for level-1v, -1d, -1m, and -2 operations
in an attempt to reduce overhead for memory-bound operations. This
includes removal of default use of object-based variants for level-2
operations. Now, by default, level-2 operations will directly call a
low-level (non-object based) loop over a level-1v or -1f kernel.
- Converted many common query functions in bli_blksz.c (renamed from
bli_blocksize.c) and bli_func.c into cpp macros, now defined in their
respective header files.
- Defined bli_mbool.c API to create and query "multi-bools", or
heterogeneous bool_t's (one for each floating-point datatype), in the
same spirit as blksz_t and func_t.
- Introduced two key parameters of the hardware: BLIS_SIMD_NUM_REGISTERS
and BLIS_SIMD_SIZE. These values are needed in order to compute a third
new parameter, which may be set indirectly via the aforementioned
macros or directly: BLIS_STACK_BUF_MAX_SIZE. This value is used to
statically allocate memory in macro-kernels and the induced methods'
virtual kernels to be used as temporary space to hold a single
micro-tile. These values are now output by the testsuite. The default
value of BLIS_STACK_BUF_MAX_SIZE is computed as
"2 * BLIS_SIMD_NUM_REGISTERS * BLIS_SIMD_SIZE".
- Cleaned up top-level 'kernels' directory (for example, renaming the
embarrassingly misleading "avx" and "avx2" directories to "sandybridge"
and "haswell," respectively) and gave more consistent and meaningful
names to many kernel files (as well as updating their interfaces to
conform to the new context-aware kernel APIs).
- Updated the testsuite to query blocksizes from a locally-initialized
context for test modules that need those values: axpyf, dotxf,
dotxaxpyf, gemm_ukr, gemmtrsm_ukr, and trsm_ukr.
- Reformatted many function signatures into a standard format that will
more easily facilitate future API-wide changes.
- Updated many "mxn" level-0 macros (i.e., those used to inline double loops
for level-1m-like operations on small matrices) in frame/include/level0
to use more obscure local variable names in an effort to avoid variable
shadowing. (Thanks to Devin Matthews for pointing out these gcc warnings,
which are only output using -Wshadow.)
- Added a conj argument to setm, so that its interface now mirrors that
of scalm. The semantic meaning of the conj argument is to optionally
allow implicit conjugation of the scalar prior to being populated into
the object.
- Deprecated all type-aware mixed domain and mixed precision APIs. Note
that this does not preclude supporting mixed types via the object APIs,
where it produces absolutely zero API code bloat.
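Note: the default stack buffer size quoted above, "2 * BLIS_SIMD_NUM_REGISTERS * BLIS_SIMD_SIZE", can be worked through with concrete numbers. The sketch below uses hypothetical SKETCH_* macros and assumed example values (16 registers of 32 bytes each); the real values depend on the configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed example values, not taken from any BLIS configuration. */
#define SKETCH_SIMD_NUM_REGISTERS 16
#define SKETCH_SIMD_SIZE          32   /* bytes per vector register */

/* Default computed from the two parameters unless set directly. */
#ifndef SKETCH_STACK_BUF_MAX_SIZE
#define SKETCH_STACK_BUF_MAX_SIZE \
    (2 * SKETCH_SIMD_NUM_REGISTERS * SKETCH_SIMD_SIZE)
#endif

/* How many elements of a given size fit in the temporary buffer. */
static size_t sketch_max_microtile_elems(size_t elem_size)
{
    assert(elem_size > 0);
    return (size_t)SKETCH_STACK_BUF_MAX_SIZE / elem_size;
}

/* Whether an mr x nr micro-tile of the given element size fits. */
static int sketch_microtile_fits(size_t mr, size_t nr, size_t elem_size)
{
    assert(elem_size > 0);
    return mr * nr * elem_size <= (size_t)SKETCH_STACK_BUF_MAX_SIZE;
}
```

Under these assumed values the buffer is 1024 bytes, enough for 128 eight-byte elements, so a 6x8 double-precision micro-tile (384 bytes) fits comfortably.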
commit dd856c2cb75a2221a503a73dde27790c34b91570
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Apr 11 10:39:18 2016 -0500
Translated MIC kernel to KNL and cleaned up a bit. Only real change is lack of swizzle modifiers for FMA instructions (used bcast from memory instead).
commit 7f27431d3fffdda99c282ec412731d0a90cb32a7
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Apr 8 10:04:39 2016 -0500
Copy mic kernel to knl for transliteration.
commit f8f02f0334ac020021e15a415bcd33aeea01deb4
Merge: 32c92d94 d1f8e5d9
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Apr 6 11:37:05 2016 -0500
Merge branch 'master' into const_correctness
commit 32c92d945c55708da0eb63be1771f8c5430e3910
Merge: 62914ccb 20af937b
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Apr 6 11:36:02 2016 -0500
Merge branch 'master' into const_correctness
commit d1f8e5d9b2ecd054ed103f4d642d748db2d4f173
Merge: 20af937b c11d28ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 5 12:21:27 2016 -0500
Merge pull request #60 from esauvage/master
sgemm µkernel for bulldozer : bug correction for k%4 != 0
commit c11d28eed89d65494bc4019f04d046520866c0ff
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
Date: Sat Apr 2 21:15:48 2016 +0200
cgemm µkernel for bulldozer : bug correction for k%4 != 0
commit 20af937b57f82bb3acb09418d5c0206e1b24f2c7
Merge: 36c3abb0 fc61a114
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 31 14:37:30 2016 -0500
Merge pull request #59 from devinamatthews/fix_testsuite_makefile
Fix testsuite makefile
commit fc61a1143edeba4946d4b9915f1775bb08e643fc
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Mar 31 10:53:01 2016 -0500
Fix formatting in configure.
commit 26379b14de630e3a6c6eef5dfe87ff001558a8a6
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Mar 31 10:45:48 2016 -0500
Adjust paths in common.mk to support building from testsuite dir.
commit 36c3abb05fecb02d4a9ab13b2b69d133adf34583
Merge: 64b41fa5 917ce754
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 31 10:26:17 2016 -0500
Merge pull request #58 from esauvage/master
cgemm & zgemm micro-kernels for FMA4 instruction set (bulldozer confi…
commit 356d854fc9e34642cc46e0e02a8ceb56114878af
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 30 16:33:15 2016 -0500
Make symlink to common.mk in build directory.
commit edbb8470044f82ef959583ee09613a5a985292b5
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 30 16:27:11 2016 -0500
Refactor out some definitions which moved from make_defs.mk to Makefile for use in testsuite Makefile.
commit 917ce75482a543fef46553efff6c246939761e59
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
Date: Wed Mar 30 22:03:09 2016 +0200
cgemm & zgemm micro-kernels for FMA4 instruction set (bulldozer configuration), based on x86_64/avx micro-kernel
commit 62914ccbcdb3c594f065dcfa65bd7e7b95c79283
Merge: bbf704bf 64b41fa5
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Mar 29 15:24:25 2016 -0500
Merge branch 'master' into const_correctness
commit 64b41fa554dff44b2f9ad48901b67c63836407a8
Merge: 1b09e343 0171ad58
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 29 15:19:41 2016 -0500
Merge pull request #54 from devinamatthews/more_config_opts
More config opts
commit 1b09e343dfe5b48b4842e2cb96f41c8cc249bad0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 29 12:55:28 2016 -0500
Updated gcc version from 4.8 to 4.9 in .travis.yml.
commit 0171ad58997b3a5a9b76301511dbe0751fffc940
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Mar 28 13:55:06 2016 -0500
Add icc and clang support for Intel architectures, fixes #47. 2bd036f fixes #49 BTW.
commit 3090fff64cc87ff2519a09f38e6b8699cf3cba11
Merge: 8624e365 4ca5d5b1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 28 12:36:25 2016 -0500
Merge pull request #44 from esauvage/master
sgemm micro-kernel for FMA4 instruction set
commit e6e566426ac3ded7ef87cd8ff9be98accfdc4acc
Merge: 469429ec 8624e365
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sat Mar 26 14:10:15 2016 -0500
Merge branch 'master' into more_config_opts
commit 8624e36543160739d954c4dbcc5a5594458f3a12
Merge: a315833f 2bd036f1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 26 13:56:28 2016 -0500
Merge pull request #50 from devinamatthews/fix_noopt_avx
Fix configuration issue where instruction set flags are not specified for debug builds.
commit 469429ec34e5b1a172ce35596f9c7afdaacac131
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 20:45:41 2016 -0500
Fix LD_FLAGS -> LDFLAGS.
commit 8442d65c9ead0376fc5f2dfad62fd4862ab9b2b3
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 20:06:48 2016 -0500
Replace -march=native with specific architecture flags to support cross-compiling, and add icc support for Intel architectures.
commit 76099f20be1b49ac960f7e3c5a8296bbf4e1782d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 17:22:58 2016 -0500
Add threading option to configure.
commit ad43eab4c7899d56d8d7caa6e2d92bc0581ea5a5
Merge: 9452bdb3 2bd036f1
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 15:00:02 2016 -0500
Merge branch 'fix_noopt_avx' into more_config_opts
commit 9452bdb3afbf2d7f898134a091d7790817e7be9c
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 14:59:50 2016 -0500
Add options for verbose make output and static/shared linking to configure.
commit 2bd036f1f9ce1ee0864365557f66d9415dd42de3
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 12:16:49 2016 -0500
Fix configuration issue where instruction set flags are not specified for debug builds.
commit bbf704bf7501411964a63a68f1af541f612cf92d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 09:55:35 2016 -0500
Add missing const to bli_read_nway_from_env.
commit a315833f067944fb0bc14cf60f0c7dcb5dc897b6
Merge: 1d1a426d af92773f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 24 12:30:21 2016 -0500
Merge pull request #48 from figual/master
Updated and improved ARMv8 micro-kernels.
commit af92773f4f85a2441fe0c6e3a52c31b07253d08e
Author: figual <figual@ucm.es>
Date: Wed Mar 23 22:07:02 2016 +0100
Updated and improved ARMv8 micro-kernels.
commit a4d7729776d17d9bdf2341eacd70b9770b9ba8d2
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Mar 21 09:55:21 2016 -0500
Set default value for debug_type variable.
commit 0e2447fa55d8c5fa2b1fc4150073512495c5f9eb
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Mar 17 16:32:05 2016 -0500
Add const correctness to auxinfo_t struct (microkernels need update theoretically).
commit 1d1a426d18ec03754021456862a1f4d1dfec1fbf
Merge: 5a978fff d226dfa0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 7 15:17:53 2016 -0600
Merge pull request #46 from devinamatthews/new-config-opts
Add several changes to the build system.
commit d226dfa05190eb477b33563b1edccf8603973336
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sat Mar 5 16:18:14 2016 -0600
Add several changes to the build system.
1) Add -- options.
2) Add -d/--enable-debug option to enable debugging symbols with and without optimization.
3) Allow user to specify CC at configure time, and determine vendor (gcc/icc/etc.). For now configurations enforce a particular vendor.
4) Add make V=[0,1] option to control build verbosity.
commit 5a978fffdb8f09a81c89541d541d4a6830cd70a4
Merge: adb2b4e0 63e26423
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 4 17:26:58 2016 -0600
Merge pull request #45 from devinamatthews/high_prec_timers
Use clock_gettime(CLOCK_MONOTONIC) and mach_absolute_time instead of gettimeofday
commit 63e264239053b913164a849dd8a45829087eaddc
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 4 13:17:50 2016 -0600
Make sure that -lrt is linked on Linux.
commit 44fddd48dc1708a956803d1948f04429ec0d8700
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 4 12:36:38 2016 -0600
Add missing \.
commit 7cabd2131f953de23e7015d760b0ddfda51b1251
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Mar 3 11:43:07 2016 -0600
Use clock_gettime(CLOCK_MONOTONIC) and mach_absolute_time instead of gettimeofday.
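Note: a minimal sketch of a timer built on clock_gettime(CLOCK_MONOTONIC) follows. The function name sketch_clock is hypothetical, and the mach_absolute_time path used on OS X is not shown. On older glibc versions, clock_gettime may require linking with -lrt.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime in strict modes */
#endif
#include <assert.h>
#include <time.h>

/* Hypothetical sketch: returns elapsed seconds from an arbitrary fixed
   starting point. Unlike gettimeofday, the monotonic clock is not
   affected by wall-clock adjustments. */
static double sketch_clock(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;  /* silence unused warning when NDEBUG is defined */
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1.0e-9;
}
```

Only differences between two readings are meaningful, since the starting point of the monotonic clock is unspecified.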
commit adb2b4e096c78e8b2f85fd372cf0d5eb04af5be8
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Mar 2 14:48:12 2016 -0600
Fixing guard for non implemented partitioning through packed matrices
commit 4ca5d5b1fd6f2e4a8b2e139c5405475239581e51
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
Date: Tue Mar 1 21:33:01 2016 +0100
sgemm micro-kernel for FMA4 instruction set (bulldozer configuration), based on x86_64/avx micro-kernel
commit 627d59b5ba06866b26f46e4434a0435b600925e3
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
Date: Mon Feb 29 21:53:12 2016 +0100
symbolic link for bulldozer configuration to kernels
commit 2dc5c0ae038ed175fab85751803ada05734d1ba1
Merge: f2809fc5 3d0fae81
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 29 12:22:51 2016 -0600
Merge pull request #40 from tkelman/bulldozer-symlink
Add symlink from config/bulldozer/kernels to kernels/x86_64/bulldozer
commit f2809fc5f74466c755da6a5b4632853e634060b5
Merge: f86b94f2 8624a33c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Feb 27 13:06:03 2016 -0600
Merge pull request #39 from devinamatthews/fix_f2c_conflicts
Devin's f2c type namespace update.
Details:
- Added "bla_" prefix to f2c type names to prevent conflicts with external user code.
- Removed most of the body of bli_f2c.h, which was unused.
commit 3d0fae810d942085d8f2d389820b4e0027577db8
Author: Tony Kelman <tony@kelman.net>
Date: Thu Feb 25 23:24:03 2016 -0800
Add symlink from config/bulldozer/kernels to kernels/x86_64/bulldozer
to fix linking issue mentioned in #37 and https://groups.google.com/forum/#!topic/blis-devel/iypwljcaeEI
commit 8624a33ccc12dff6f6c4f92992ca5636af1576a6
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Feb 25 13:51:26 2016 -0600
Fix remaining f2c conflicts.
commit 372eef0b6c0a535bf88d4b46b72f61266e8491ba
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Feb 25 12:01:58 2016 -0600
Fixed most conflicts after hack-n-slash of bli_f2c.h, cleanup in
progress.
commit f86b94f206e2e09fa3221cc55c3dc5b05ca4775a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 23 18:12:34 2016 -0600
Included missing blas2blis integer def to CBLAS.
Details:
- Added #include "bli_config_macro_defs" to all cblas_*.c files in
compat/cblas/src. This has the effect of defining
BLIS_BLAS2BLIS_INT_TYPE_SIZE to the default value if bli_config.h does
not define it. Thanks to Tony Kelman for reporting this bug.
- In cblas_i?amax.c, changed the type of the variable 'iamax' from 'int'
to 'f77_int'. This eliminates a compiler warning and a potential
runtime bug and/or crash when the size of an int differs from the size
of f77_int (as determined by BLIS_BLAS2BLIS_INT_TYPE_SIZE).
commit 0b126de1342c11c65623bcb38e258e21e9244e3d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 13 16:29:12 2015 -0600
Consolidated packm_blk_var1 and packm_blk_var2.
Details:
- Consolidated the two blocked variants for packm into a single
implementation (packm_blk_var1) and removed the other variant.
- Updated all induced method _cntl_init() functions in frame/cntl/ind/
to use the new blocked variant 1.
- Defined two new macros, bli_is_ind_packed() and bli_is_nat_packed(),
to detect pack_t schemas for induced methods and native execution,
respectively.
commit 30e5eb29e060b97752f702d2ea5d101d950f53b2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 13 12:14:19 2015 -0600
Minor changes to treatment of rs, cs in bli_obj.c.
Details:
- Applied a patch submitted by Devin Matthews that:
- implements subtle changes to handling of somewhat unusual cases of
row and column strides to accommodate certain tensor cases, which
includes adding dimension parameters to _is_col_tilted() and
_is_row_tilted() macros,
- simplifies how buffers are sized when requesting BLIS-allocated
objects,
- re-consolidates bli_adjust_strides_*() into one function, and
- defines 'restrict' keyword as a "nothing" macro for C++ and pre-C99
environments.
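Note: one way to define 'restrict' as a "nothing" macro is sketched below. The exact preprocessor condition used in the patch is not given here, so this guard is an assumption; sketch_copy is a hypothetical function that only shows the keyword in use.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed guard: C++ and pre-C99 C compilers do not recognize the
   'restrict' keyword, so it is defined away to nothing for them. */
#if defined(__cplusplus) || !defined(__STDC_VERSION__) || \
    (__STDC_VERSION__ < 199901L)
#define restrict
#endif

/* Compiles in C99 and later (keyword honored) and in C++ or C89
   (keyword erased by the macro). */
static void sketch_copy(size_t n, const double *restrict x,
                        double *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] = x[i];
}
```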
commit f0a4f41b5acf55b41707ec821c4c5f9076dfbc24
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 12 15:22:50 2015 -0600
Fixed unimplemented case in core2 sgemm ukernel.
Details:
- Implemented the "beta == 0" case for general stride output for the
dunnington sgemm micro-kernel. This case had been, up until now,
identical to the "beta != 0" case, which does not work when the
output matrix has nan's and inf's. It had manifested as nan residuals
in the test suite for right-side tests of ctrsm4m1a. Thanks to Devin
Matthews for reporting this bug.
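Note: the reason "beta == 0" needs its own case is that 0 * nan is nan (and 0 * inf is nan) in IEEE arithmetic, so scaling the old output by zero does not clear it. The sketch below illustrates this for a general-stride update; the function sketch_update_c and its signature are hypothetical and are not the micro-kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: C := beta * C + alpha * AB for an m x n tile.
   ab holds the product in contiguous column-major order; c uses
   general row and column strides (rs_c, cs_c). */
static void sketch_update_c(size_t m, size_t n, double alpha,
                            const double *restrict ab, double beta,
                            double *restrict c,
                            ptrdiff_t rs_c, ptrdiff_t cs_c)
{
    assert(ab != NULL && c != NULL);
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double *cij = c + (ptrdiff_t)i * rs_c + (ptrdiff_t)j * cs_c;
            double  t   = alpha * ab[i + j * m];
            if (beta == 0.0)
                *cij = t;                 /* overwrite; old C is never read */
            else
                *cij = beta * (*cij) + t; /* scale and accumulate */
        }
    }
}
```

With the separate branch, an output matrix that initially contains nan or inf values is simply overwritten when beta is zero.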
commit 42810bbfa0b8f006ecc5128d903909ec13ea63f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 12 12:07:46 2015 -0600
Fixed minor bugs for uncommon obj_create cases.
Details:
- Separated bli_adjust_strides() into _alloc() and _attach() flavors so
that the latter can avoid a test performed by the former, in which the
rs and cs are overridden and set to zero if either matrix dimension is
zero. Actually, we also disable this overriding behavior, even for the
_alloc() case, since keeping the original strides (probably) does not
hurt anything. The original code has been kept commented-out, though,
in case an unintended consequence is later discovered.
- Fixed a typo in an error check for general stride cases where rs == cs.
commit 3e6dd11467643fbc2cb45c13cec8dd6024232833
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 3 10:30:08 2015 -0600
Minor re-expression in quadratic partitioning code.
Details:
- Minor change to quadratic equation solution code that avoids
recomputation of the sqrt() parameter when the compiler is not
smart enough to perform this optimization automatically.
commit 0694b722f7e4df00efb32639095a2aca80e67f52
Merge: 3e116f0a 33557ecc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 2 17:24:25 2015 -0600
Merge branch 'master' of github.com:flame/blis
commit 3e116f0a2953f50b3c068759a775ad7ffae04e49
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 2 17:18:23 2015 -0600
Fixed imaginary bug in quadratic partitioning code.
Details:
- Fixed a bug in the relatively new quadratic partitioning code that,
under the right conditions, would perform sqrt() on a negative value.
If the solution is imaginary, we discard it and use an alternate
partition width that assumes no diagonal intersection. That alternate
width is actually already computed, so, the fix was quite simple.
Thanks to Devangi Parikh for reporting this bug.
commit 33557ecccaf49b2569b7f3d7bcea52c2aab94c68
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Mon Nov 2 12:18:43 2015 -0800
add Travis CI build status icon to the README
commit 4a502fbe77bd0f701108baaa559d9cfb483f88de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 2 13:28:34 2015 -0600
Laid groundwork for runtime memory pool resizing.
Details:
- Changed bli_pool_finalize() so that the freeing begins with the block
at top_index instead of block 0. This allows us to use the function
for terminal finalization as well as temporary cleanup prior to
reinitialization. Also, clear the pool_t struct upon _pool_finalize()
in case it is called in the terminal case with some blocks still
checked out to threads (in which case the threads will see the new
block size as 0 and thus release the block as intended).
- Added bli_pool_reinit(), which calls _pool_finalize() followed by
_pool_init() with new parameters.
- Added bli_mem_reinit(), which is based on bli_pool_reinit().
- Added new wrapper, _mem_compute_pool_block_sizes(), which calls
_mem_compute_pool_block_sizes_dt().
- Updated bli_mem_release() so that the pblk_t is freed, via
_pool_free_block(), if the block size recorded in the mem_t at the
time the pblk_t was acquired is now different from the value in the
pool_t.
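The interplay of finalize, reinit, and release described above can be sketched as follows. This is a simplified model under stated assumptions, not the BLIS code: every pool_sketch_* name and the struct layout are hypothetical, and thread safety is ignored.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    void  **blocks;      /* blocks[top_index .. num_blocks-1] are available */
    size_t  num_blocks;
    size_t  top_index;   /* index of the next block to hand out */
    size_t  block_size;
} pool_sketch_t;

static int pool_sketch_init(pool_sketch_t *pool, size_t num_blocks, size_t block_size)
{
    size_t i;
    pool->blocks = (void **)calloc(num_blocks, sizeof(void *));
    if (pool->blocks == NULL) return -1;
    for (i = 0; i < num_blocks; ++i) {
        pool->blocks[i] = malloc(block_size);
        if (pool->blocks[i] == NULL) {          /* undo the partial allocation */
            while (i > 0) free(pool->blocks[--i]);
            free(pool->blocks);
            memset(pool, 0, sizeof(*pool));
            return -1;
        }
    }
    pool->num_blocks = num_blocks;
    pool->top_index  = 0;
    pool->block_size = block_size;
    return 0;
}

/* Free only the blocks the pool still holds (from top_index onward), then
   clear the struct so that holders of checked-out blocks see size 0. */
static void pool_sketch_finalize(pool_sketch_t *pool)
{
    size_t i;
    for (i = pool->top_index; i < pool->num_blocks; ++i)
        free(pool->blocks[i]);
    free(pool->blocks);
    memset(pool, 0, sizeof(*pool));
}

static int pool_sketch_reinit(pool_sketch_t *pool, size_t num_blocks, size_t block_size)
{
    pool_sketch_finalize(pool);
    return pool_sketch_init(pool, num_blocks, block_size);
}

/* Check out a block; *size_out records the pool's block size at that time. */
static void *pool_sketch_checkout(pool_sketch_t *pool, size_t *size_out)
{
    if (pool->top_index == pool->num_blocks) return NULL;
    *size_out = pool->block_size;
    return pool->blocks[pool->top_index++];
}

/* Return a block. If the recorded size no longer matches the pool (or the
   pool has no checked-out slot left), the block is stale and is freed.
   Returns 1 if the block was freed, 0 if it went back into the pool. */
static int pool_sketch_checkin(pool_sketch_t *pool, void *block, size_t size_at_checkout)
{
    if (size_at_checkout != pool->block_size || pool->top_index == 0) {
        free(block);
        return 1;
    }
    pool->blocks[--pool->top_index] = block;
    return 0;
}
```

Because finalize starts at top_index, a block that is still checked out is never freed twice: the pool skips it, and the later check-in frees it once the size mismatch is detected.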
commit 37e55ca39bdbddaec03ad30d43e8ad2b3e549c96
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 30 18:25:04 2015 -0500
Fixed obscure 3m1/4m1a bugs in trmm[3] and trsm.
Details:
- Fixed a family of bugs in the triangular level-3 operations for
certain complex implementations (3m1 and 4m1a) that only manifest if
one of the register blocksizes (PACKMR/PACKNR, actually) is odd:
- Fixed incorrect imaginary stride computation in bli_packm_blk_var2()
for the triangular case.
- Fixed the incorrect computation of imaginary stride, as stored in
the auxinfo_t struct in trmm and trsm macro-kernels.
- Fixed incorrect pointer arithmetic in the trsm macro-kernels in the
cases where the register blocksize for the triangular matrix is
odd. Introduced a new byte-granular pointer arithmetic macro,
bli_ptr_add(), that computes the correct value.
- Added cpp macro to bli_macro_defs.h for typeof() operator, defined in
terms of __typeof__, which is used by bli_ptr_add() macro.
- Disabled the row- vs. column-storage optimization in bli_trmm_front()
for singleton problems because the inherent ambiguity of whether a
scalar is row-stored or column-stored causes the wrong parameter
combination code to be executed (by dumb luck of our checking for
row storage first).
- Added commented-out debugging lines to 3m1/4m1a and reference
micro-kernels, and trsm_ll macro-kernel.
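The idea behind a byte-granular pointer add can be illustrated with a small sketch. It is written as a function rather than a typeof()-based macro, and ptr_add_bytes() and dcomplex_sketch are hypothetical names. When a buffer of real values is addressed through a complex-typed pointer, an offset of an odd number of reals is a fractional number of complex elements, so it can only be expressed in bytes.

```c
#include <stddef.h>

/* Hypothetical complex type: two doubles per element. */
typedef struct { double real; double imag; } dcomplex_sketch;

/* Hypothetical stand-in for a byte-granular pointer add: advance p by
   nbytes bytes, regardless of the type p originally pointed to. */
static void *ptr_add_bytes(void *p, size_t nbytes)
{
    return (void *)((char *)p + nbytes);
}

/* Advance a complex-typed pointer by a count of REAL elements. For an odd
   n_reals this lands halfway into a complex element, which ordinary
   arithmetic on a dcomplex_sketch pointer cannot express. */
static dcomplex_sketch *advance_by_reals(dcomplex_sketch *p, size_t n_reals)
{
    return (dcomplex_sketch *)ptr_add_bytes(p, n_reals * sizeof(double));
}
```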
commit 46294d80e5a79c598e200e1c8ec2a642ff839971
Merge: d3159c57 a0a7b85a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 27 12:41:23 2015 -0500
Merge pull request #35 from figual/master
Fixed incomplete code in the double precision ARMv8 microkernel.
commit a0a7b85ac3e157af53cff8db0e008f4a3f90372c
Author: Francisco Igual <figual@ucm.es>
Date: Tue Oct 27 08:59:15 2015 +0000
Fixed incomplete code in the double precision ARMv8 microkernel.
commit d3159c5740c9ee7f8c0b661003aab6f00646ad6f
Merge: b489152e 7e03e45b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 21 14:54:00 2015 -0500
Merge branch 'master' of github.com:flame/blis
commit b489152e112644ec3b6d19e687231a9607f7694f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 21 14:53:17 2015 -0500
Use vzeroall in haswell micro-kernels.
commit 7e03e45bfe6c27c4fdbf06b1caa7f49e9a5fef49
Merge: 77ddb0b1 4f88c29f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 14 13:26:07 2015 -0500
Merge pull request #33 from xianyi/master
Enable Travis CI
commit 4f88c29f9e634cbb6fb22d8c88931f0ec78ad7db
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Wed Oct 14 12:57:50 2015 -0500
Detect Intel Broadwell (using Haswell config).
commit 4b0ac1a9984a93f7ad4369b10fca63991107d9f5
Merge: fe3e355c 77ddb0b1
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Wed Oct 14 12:51:05 2015 -0500
Merge branch 'upstream_master'
commit 77ddb0b1d31ada111dadf392766ba6d9210ed9fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 13 12:53:06 2015 -0500
Removed flop-counting mechanism.
Details:
- Removed the optional flop-counting feature introduced in commit
7574c994.
commit 276da366187460a4c8e6e0910e79cb39ce780bfe
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 12 11:43:03 2015 -0500
Minor formatting change to README.md.
commit d17057446f5404824478e8a6cd08f242ab75544a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 12 11:39:49 2015 -0500
Added "Getting Started" section to README.md.
Details:
- Added section to README.md file containing links to wikis with brief
descriptions.
commit e7e1f2f7b601b21b50e3cdad8972cb3fe11018d3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 2 16:51:52 2015 -0500
Minor updates to CREDITS, README files.
commit 55329906ecd7ce1ab910e4d30a29354a9172e7ea
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Sep 26 20:47:19 2015 -0500
Minor edits to README.md, testsuite.
Details:
- Fixed typos in README.md.
- Fixed column heading alignment for testsuite when matlab output is
enabled.
- Minor updates to test/3m4m/runme.sh and test/3m4m/Makefile.
commit bbebdb5793a8fd6aaf257012ab0272beaa04a0de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 25 14:47:27 2015 -0500
Replaced README with README.md.
Details:
- Replaced the old (and short) README file with a much more comprehensive
version written in github-flavored markdown. The new file is based on
content taken from the old Google Code homepage.
commit e2e9d64a63485461192d9c2a6dd0183a8b71013c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 24 12:14:03 2015 -0500
Load balance thread ranges for arbitrary diagonals.
Details:
- Expanded/updated interface for bli_get_range_weighted() and
bli_get_range() so that the direction of movement is specified in the
function name (e.g. bli_get_range_l2r(), bli_get_range_weighted_t2b())
and also so that the object being partitioned is passed instead of an
uplo parameter. Updated invocations in level-3 blocked variants, as
appropriate.
- (Re)implemented bli_get_range_*() and bli_get_range_weighted_*() to
carefully take into account the location of the diagonal when computing
ranges so that the area of each subpartition (which, in all present
level-3 operations, is proportional to the amount of computation
engendered) is as equal as possible.
- Added calls to a new class of routines to all non-gemm level-3 blocked
variants:
bli_<oper>_prune_unref_mparts_[mnk]()
where <oper> is herk, trmm, or trsm and [mnk] is chosen based on which
dimension is being partitioned. These routines call a more basic
routine, bli_prune_unref_mparts(), to prune unreferenced/unstored
regions from matrices and simultaneously adjust other matrices which
share the same dimension accordingly.
- Simplified herk_blk_var2f, trmm_blk_var1f/b as a result of the
new pruning routines.
- Fixed incorrect blocking factors passed into bli_get_range_*() in
bli_trsm_blk_var[12][fb].c
- Added a new test driver in test/thread_ranges that can exercise the new
bli_get_range_*() and bli_get_range_weighted_*() under a range of
conditions.
- Reimplemented m and n fields of obj_t as elements in a "dim"
array field so that dimensions could be queried via index constant
(e.g. BLIS_M, BLIS_N). Adjusted/added query and modification
macros accordingly.
- Defined mdim_t type to enumerate BLIS_M and BLIS_N indexing values.
- Added bli_round() macro, which calls C math library function round(),
and bli_round_to_mult(), which rounds a value to the nearest multiple
of some other value.
- Added miscellaneous pruning- and mdim_t-related macros.
- Renamed bli_obj_row_offset(), bli_obj_col_offset() macros to
bli_obj_row_off(), bli_obj_col_off().
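The benefit of storing m and n in an indexable "dim" array can be sketched as follows. The types and functions are hypothetical simplifications, not the obj_t definition: with an index constant selecting the dimension, one routine can prune along either m or n without duplicating code.

```c
#include <stddef.h>

/* Hypothetical counterpart of an mdim_t-style enumeration. */
typedef enum { SKETCH_M = 0, SKETCH_N = 1 } mdim_sketch_t;

/* Hypothetical, heavily simplified matrix object. */
typedef struct {
    size_t dim[2];   /* dim[SKETCH_M] rows, dim[SKETCH_N] columns */
    size_t off[2];   /* row and column offsets into the parent matrix */
} obj_sketch_t;

static size_t obj_sketch_dim(const obj_sketch_t *o, mdim_sketch_t d)
{
    return o->dim[d];
}

/* Prune 'amount' rows or columns from the leading edge of dimension d.
   The same code path serves both dimensions. */
static void obj_sketch_prune_leading(obj_sketch_t *o, mdim_sketch_t d, size_t amount)
{
    if (amount > o->dim[d]) amount = o->dim[d];
    o->dim[d] -= amount;
    o->off[d] += amount;
}
```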
commit fe3e355c9c5a6f65b8736b009e2d501b62a83ea1
Merge: efa641e3 4dd9dd3e
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Fri Aug 21 14:38:36 2015 -0500
Merge branch 'upstream_master'
commit efa641e36b73abee34166a252e90e28a6281d92d
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Sat Aug 22 03:15:50 2015 +0800
Try to fix the compiling bug on travis.
commit 4dd9dd3e1de626b51bfe85d9ee65f193d60e8d38
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 21 11:52:37 2015 -0500
Fixed minor alignment ambiguity bug in bli_pool.c.
Details:
- Fixed a typecasting ambiguity in bli_pool_alloc_block() in which
pointer arithmetic was performed on a void* as if it were a byte
pointer (such as char*). Some compilers may have already been
interpreting this situation as intended, despite the sloppiness.
Thanks to Aleksei Rechinskii for reporting this issue.
- Redefined pointer alignment macros to typecast to uintptr_t instead of
siz_t.
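Both points can be seen in a short sketch. align_up_sketch() is a hypothetical helper, and it assumes the alignment is a power of two. The address is converted to uintptr_t for the integer arithmetic, and no arithmetic is ever performed on a void* directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round p up to the next multiple of align.
   Assumes align is a power of two. */
static void *align_up_sketch(void *p, size_t align)
{
    uintptr_t addr = (uintptr_t)p;           /* integer wide enough for a pointer */
    uintptr_t mask = (uintptr_t)align - 1;
    return (void *)((addr + mask) & ~mask);
}
```

When a byte offset has to be applied to a void*, casting to char* first keeps the arithmetic well defined in standard C.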
commit 12ffd568b04feda57147c13b67717416a01c82f8
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Sat Aug 22 00:24:28 2015 +0800
Add Travis CI.
commit ecc3ebb749e0861c27deda52b5f87236ede4901b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 29 13:31:12 2015 -0500
CHANGELOG update (0.1.8)
commit 47caa33485b91ea6f2a5e386e61210c90c5f489f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 29 13:31:09 2015 -0500
Version file update (0.1.8)
commit ef0fbbbdb6148b96938733fce72cb4ed7dad685e
Merge: fdfe14f1 d4b89136
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 9 13:54:54 2015 -0500
Merge branch 'master' of github.com:flame/blis
commit fdfe14f1e17ba5a2f8dfa0bdb799c6b0e730211b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 9 13:52:39 2015 -0500
Added support for Intel Haswell/Broadwell.
Details:
- Added sgemm and dgemm micro-kernels, which employ 256-bit AVX vectors
and FMA instructions. (Complex support is currently provided by default
induced method, 4m1a.)
- Added a 'haswell' configuration, which uses the aforementioned kernels.
- Inserted auto-detection support for haswell configuration in
build/auto-detect/cpuid_x86.c.
- Modified configure script to explicitly echo when automatic or manual
configuration is in progress.
- Changed beta scalar in test_gemm.c module of test suite from -1.0 to 0.9.
commit d4b891369c1eb0879ade662ff896a5b9a7fca207
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 7 10:06:53 2015 -0500
Added 'carrizo' configuration.
Details:
- Added a new configuration for AMD Excavator-based hardware also known
as Carrizo when referring to the entire APU. This configuration uses
the same micro-kernels as the piledriver, but with different
cache blocksizes.
commit 0b7255a642d56723f02d7ca1f8f21809967b8515
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 19 12:01:50 2015 -0500
CHANGELOG update (0.1.7)
commit 267253de8a7be546ce87626443ee38701c1d411f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 19 12:01:49 2015 -0500
Version file update (0.1.7)
commit 7cd01b71b5e757a6774625b3c9f427f5e7664a76
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 19 11:31:53 2015 -0500
Implemented dynamic allocation for packing buffers.
Details:
- Replaced the old memory allocator, which was based on statically-
allocated arrays, with one based on a new internal pool_t type, which,
combined with a new bli_pool_*() API, provides a new abstract data
type that implements the same memory pool functionality but with blocks
from the heap (ie: malloc() or equivalent). Hiding the details of the
pool in a separate API also allows for a much simpler bli_mem.c family
of functions.
- Added a new internal header, bli_config_macro_defs.h, which enables
sane defaults for the values previously found in bli_config. Those
values can be overridden by #defining them in bli_config.h the same
way kernel defaults can be overridden in bli_kernel.h. This file most
resembles what was previously a typical configuration's bli_config.h.
- Added a new configuration macro, BLIS_POOL_ADDR_ALIGN_SIZE, which
defaults to BLIS_PAGE_SIZE, to specify the alignment of individual
blocks in the memory pool. Also added a corresponding query routine to
the bli_info API.
- Deprecated (once again) the micro-panel alignment feature. Upon further
reflection, it seems that the goal of more predictable L1 cache
replacement behavior is outweighed by the harm caused by non-contiguous
micro-panels when k % kc != 0. I honestly don't think anyone will even
miss this feature.
- Changed bli_ukr_get_funcs() and bli_ukr_get_ref_funcs() to call
bli_cntl_init() instead of bli_init().
- Removed query functions from bli_info.c that are no longer applicable
given the dynamic memory allocator.
- Removed unnecessary definitions from configurations' bli_config.h files,
which are now pleasantly sparse.
- Fixed incorrect flop counts in addv, subv, scal2v, scal2m testsuite
modules. Thanks to Devangi Parikh for pointing out these
miscalculations.
- Comment, whitespace changes.
commit 9848f255a3bab17d1139c391cca13ff3f1ffe6ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 11 19:14:22 2015 -0500
Added early return to API-level _init() routines.
Details:
- Added conditional code that returns early from the API-level _init()
routines if the API is already initialized. Actually meant for this to
be included in 5f93cbe8.
commit 5f93cbe870f3478870e15581e7fd450dad5bba1e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 11 18:52:12 2015 -0500
Introduced API-level initialization.
Details:
- Added API-level initialization state to _const, _error, _mem, _thread,
_ind, and _cntl APIs. While this functionality will mostly go unused,
adding minuscule overhead at init-time, there will be at least one
instance in the near future where, in order to avoid an infinite loop,
a certain portion of the initialization will call a query function that
itself attempts to call bli_init(). API-level initialization will allow
this later stage to verify that an earlier stage of initialization has
completed, even if the overall call to bli_init() has not yet returned.
- Added _is_initialized() functions for each API, setting the underlying
bool_t during _init() and unsetting it during _finalize().
- Comment, whitespace changes.
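A per-API initialization flag of the kind described in this commit might look like the sketch below. All api_sketch_* names are hypothetical, and the counter merely stands in for real setup work. The early return is the behavior the follow-up commit (9848f255) says it added.

```c
#include <stdbool.h>

/* Hypothetical per-API state. */
static bool api_sketch_initialized = false;
static int  api_sketch_setup_count = 0;   /* counts how often setup really ran */

static bool api_sketch_is_initialized(void)
{
    return api_sketch_initialized;
}

static void api_sketch_init(void)
{
    /* Early return: a second call (for example from a query function
       invoked during a later stage of initialization) is a no-op. */
    if (api_sketch_is_initialized()) return;

    ++api_sketch_setup_count;             /* stands in for the real setup */
    api_sketch_initialized = true;
}

static void api_sketch_finalize(void)
{
    api_sketch_initialized = false;
}
```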
commit ee129c6b028bc5ac88da7c74fde72c49803742ff
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 10 12:53:28 2015 -0500
Fixed bugs in _get_range(), _get_range_weighted().
Details:
- Fixed some bugs that only manifested in multithreaded instances of
some (non-gemm) level-3 operations. The bugs were related to invalid
allocation of "edge" cases to thread subpartitions. (Here, we define
an "edge" case to be one where the dimension being partitioned for
parallelism is not a whole multiple of whatever register blocksize
is needed in that dimension.) In BLIS, we always require edge cases
to be part of the bottom, right, or bottom-right subpartitions.
(This is so that zero-padding only has to happen at the bottom, right,
or bottom-right edges of micro-panels.) The previous implementations
of bli_get_range() and _get_range_weighted() did not adhere to this
implicit policy and thus produced bad ranges for some combinations of
operation, parameter cases, problem sizes, and n-way parallelism.
- As part of the above fix, the functions bli_get_range() and
_get_range_weighted() have been renamed to use _l2r, _r2l, _t2b,
and _b2t suffixes, similar to the partitioning functions. This is
an easy way to make sure that the variants are calling the right
version of each function. The function signatures have also been
changed slightly.
- Comment/whitespace updates.
- Removed unnecessary '/' from macros in bli_obj_macro_defs.h.
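The edge-case policy described above can be sketched for the left-to-right direction. This is a hypothetical, unweighted illustration, not the bli_get_range_l2r() implementation: whole multiples of the blocking factor are spread over the threads, and whatever remains is assigned to the last (rightmost) range.

```c
#include <stddef.h>

/* Hypothetical sketch: compute the half-open range [*start, *end) owned
   by thread 'id' out of 'n_way' threads when a dimension of size n is
   partitioned in units of the blocking factor bf. */
static void range_l2r_sketch(size_t id, size_t n_way, size_t n, size_t bf,
                             size_t *start, size_t *end)
{
    size_t n_whole = n / bf;            /* number of full bf-sized units */
    size_t per_lo  = n_whole / n_way;   /* units every thread receives */
    size_t n_extra = n_whole % n_way;   /* first n_extra threads get one more */

    size_t units_before = id * per_lo + (id < n_extra ? id : n_extra);
    size_t units_mine   = per_lo + (id < n_extra ? 1 : 0);

    *start = units_before * bf;
    *end   = *start + units_mine * bf;

    /* The edge (n % bf leftover elements) always goes to the last range. */
    if (id == n_way - 1) *end = n;
}
```

For n = 13, bf = 4, and two threads, this yields [0, 8) and [8, 13), so only the rightmost range contains a partial block.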
commit 9135dfd69d39f3bbd75034f479f27a78dbfebcce
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 5 13:37:44 2015 -0500
Minor updates to test/3m4m files.
commit d62ceece943b20537ec4dd99f25136b9ba2ae340
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 3 12:56:45 2015 -0500
Minor update to test/3m4m/runme.sh.
Details:
- Removed some stale script code that should have been removed
during 590bb3b8c.
commit b6ee82a3d421c9c4f1eb6848c7c6e37aa46de799
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 3 12:14:23 2015 -0500
Minor cleanup to bli_init() and friends.
Details:
- Spun-off initialization of global scalar constants to bli_const_init()
and of threading stuff to bli_thread_init().
- Added some missing _finalize() functions, even when there is nothing
to do.
commit 1213f5cebabc1637ce9dd45c4bfa87bb93677c29
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 2 13:27:47 2015 -0500
POSIX thread bugfixes/edits to bli_init.c, _mem.c.
Details:
- Fixed a sort-of bug in bli_init.c whereby the wrong pthread mutex
was used to lock access to initialization/finalization actions.
But everything worked out okay as long as bli_init() was called by
single-threaded code.
- Changed to static initialization for memory allocator mutex in
bli_mem.c, and moved mutex to that file (from bli_init.c).
- Fixed some type mismatches in bli_threading_pthreads.c that resulted
in compiler warnings.
- Fixed a small memory leak with allocated-but-never-freed (and unused)
pthread_attr_t objects.
- Whitespace changes to bli_init.c and bli_mem.c.
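Static initialization of a pthread mutex, as mentioned above, can be sketched like this. The variable and function names are hypothetical. Because the mutex is initialized at load time, it is usable before any init routine has run, which is what makes it suitable for guarding initialization itself.

```c
#include <pthread.h>

/* Statically initialized mutex: no pthread_mutex_init() call is needed. */
static pthread_mutex_t sketch_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical shared state guarded by the mutex. */
static int sketch_is_set_up = 0;

/* Returns 1 if this call performed the setup, 0 if it was already done. */
static int sketch_guarded_init(void)
{
    int did_work = 0;

    pthread_mutex_lock(&sketch_mutex);
    if (!sketch_is_set_up) {
        sketch_is_set_up = 1;   /* stands in for the real setup work */
        did_work = 1;
    }
    pthread_mutex_unlock(&sketch_mutex);

    return did_work;
}
```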
commit 590bb3b8c5c0389159c5a9451b6c156c5f237e8a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun May 24 16:02:53 2015 -0500
Backed-out adjusted dim changes to test/3m4m.
Details:
- Reverted most changes applied during commit ec25807b.
commit ec25807b26da943868f0d0517c3720e50181b8f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 10 13:23:50 2015 -0500
Tweaks to test/3m4m to test with adjusted dims.
Details:
- Updated test/3m4m driver files to build test drivers that allow
comparison of real "asm_blis" results to complex "asm_blis" results,
except with the latter's problem sizes adjusted so that problems are
generated with equal flop counts.
commit 426b6488580a92bf071a62dc319a9c837ce39821
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 8 15:12:21 2015 -0500
Fixed a packing bug that manifested in trsm_r.
Details:
- Fixed a bug that caused a memory leak in the contiguous memory
allocator. Because packm_init() was using simple aliasing when
a subpartition object was marked as zeros by bli_acquire_mpart_*(),
the "destination" pack object's mem_t entry was being overwritten
by the corresponding field of the "source" object (which was likely
NULL). This prevented the block from being released back to the
memory allocator. But this bug only manifested when changing the
location of packing B from outside the var1 loop to inside the
var3 loop, and only for trsm with triangular B (side = right). The
bug was fixed by changing the type of alias used in packm_init()
when handling zero partition cases. Specifically, we now use
bli_obj_alias_for_packing(), which does not clobber the destination
(pack) object's mem_t field. Thanks to Devangi Parikh for this bug
report.
commit c84286d5cef48f16d83831baac1f46b9856b9a36
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 4 15:39:14 2015 -0500
More minor tweaks to test/3m4m.
Details:
- Added a line of output that forces matlab to allocate the entire array
up-front.
- Re-enabled real domain benchmarks in runme.sh, which were temporarily
disabled.
commit 309717c8ebf4ef1369f15cf41340e13c25b41573
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 3 19:28:49 2015 -0500
More tweaks to test/3m4m, configurations.
Details:
- Fixed incorrect number of mc_x_kc memory blocks in
sandybridge/bli_config.h.
- Enabled OpenMP multithreading in piledriver/bli_config.h.
- More updates to test/3m4m driver files.
commit 4baf3b9c69b2f648be9e46e07ccc9859dd675828
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 3 16:44:32 2015 -0500
Tweaked test/3m4m driver, including acml support.
Details:
- Added ACML support to test/3m4m driver Makefile and runme.sh script.
commit a32f7c49ca4ea869d2a6c66818780f4321743d67
Merge: 349e075a 4bfd1ce8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 3 08:28:11 2015 -0500
Merge pull request #23 from xianyi/master
Add auto-detecting CPU on configure stage.
commit 349e075ad6a8e2a1211d94f36d24828c9d44b052
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 2 18:12:28 2015 -0500
Tweaks to sandybridge config, test/3m4m driver.
Details:
- Enable OpenMP support by default in sandybridge's bli_config.h.
- Reorganized sandybridge's bli_kernel.h.
- Updated 3m4m Makefile, runme.sh to also test MKL implementation.
commit 4bfd1ce8ca93f93d170dd2715f0a32027b417b46
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Thu Apr 2 16:40:21 2015 -0500
Detect NEON for cortex-a9 and cortex-a15.
commit aa6eec4f43137057276fe6119bdbfb5c52682527
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Thu Apr 2 16:03:44 2015 -0500
Detect the CPU architecture. Support ARM cores.
Detect the CPU architecture by compiler's predefined macros.
Then, detect the CPU cores.
Support detecting x86 and ARM architectures.
commit 2947cfb749c937b0f62fac36cc92f123bd45b53c
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Wed Apr 1 12:24:00 2015 -0500
Add auto-detecting CPU on configure stage.
e.g. /Path_to_BLIS/configure auto
Now, it only supports detecting x86 CPUs.
commit 26a4b8f6f985597f80e0174990bf541f1d9bafac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 1 10:44:54 2015 -0500
Implemented 3m2, 3m3 induced algorithms (gemm only).
Details:
- Defined a new "3ms" (separated 3m) pack schema and added appropriate
support in packm_init(), packm_blk_var2().
- Generalized packm_struc_cxk_3mi to take the imaginary stride (is_p)
as an argument instead of computing it locally. Exception: for trmm,
is_p must be computed locally, since it changes for triangular
packed matrices. Also exposed is_p in interface to dt-specific
packm_blk_var2 (and _var1, even though it does not use imaginary
stride).
- Renamed many functions/variables from _3mi to _3mis to indicate that
they work for either interleaved or separated 3m pack schemas.
- Generalized gemm and herk macro-kernels to pass in imaginary stride
rather than compute them locally.
- Added support for 3m2 and 3m3 algorithms to frame/ind, including 3m2-
and 3m3-specific virtual micro-kernels.
- Added special gemm macro-kernels to support 3m2 and 3m3.
- Added support for 3m2 and 3m3 to testsuite.
- Corrected the type of the panel dimension (pd_) in various macro-
kernels from inc_t to dim_t.
- Renamed many functions defined in bli_blocksize.c.
- Moved most induced-related macro defs from frame/include to
frame/ind/include.
- Updated the _ukernel.c files so that the micro-kernel function pointers
are obtained from the func_t objects rather than the cpp macros that
define the function names.
- Updated test/3m4m driver, Makefile, and run script.
commit ddf62ba7d2da08225b201585b85e06c967767dea
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Mar 27 14:27:51 2015 -0500
Refuse to free the packm thread info if it uses the single threaded version
commit 016fc587584d958a0e430a56a5e2c05022ac2f17
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Mar 27 14:23:02 2015 -0500
Don't free packm thread info if it is null
commit 00a443c529a60862a57b93e303a0b3212c9b1df4
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Mar 27 14:11:07 2015 -0500
Use bli_malloc instead of malloc for the thread info paths
commit f1a6b7d02861ccebdc500ea98778cc0f6cddad17
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 18 15:37:10 2015 -0500
Reorganized code for induced complex methods.
Details:
- Consolidated most of the code relating to induced complex methods
(e.g. 4mh, 4m1, 3mh, 3m1, etc.) into frame/ind. Induced methods
are now enabled on a per-operation basis. The current "available"
(enabled and implemented) implementation can then be queried on
an operation basis. Micro-kernel func_t objects as well as blksz_t
objects can also be queried in a similar manner.
- Redefined several micro-kernel and operation-related functions in
bli_info_*() API, in accordance with above changes.
- Added mr and nr fields to blksz_t object, which point to the mr
and nr blksz_t objects for each cache blocksize (and are NULL for
register blocksizes). Renamed the sub-blocksize field "sub" to
"mult" since it is really expressing a blocksize multiple.
- Updated bli_*_determine_kc_[fb]() for gemm/hemm/symm, trmm, and
trsm to correctly query mr and nr (for purposes of nudging kc).
- Introduced an enumerated opid_t in bli_type_defs.h that uniquely
identifies an operation. For now, only level-3 id values are defined,
along with a generic, catch-all BLIS_NOID value.
- Reworked testsuite so that all induced methods that are enabled
are tested (one at a time) rather than only testing the first
available method.
- Reformatted summary at the beginning of testsuite output so that
blocksize and micro-kernel info is shown for each induced method
that was requested (as well as native execution).
- Reduced the number of columns needed to display non-matlab
testsuite output (from approx. 90 to 80).
commit 8d5169ccda954e5f72944308a036dcb7ebfc9097
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 18 11:38:08 2015 -0500
Fixed bug in release of mem_t buffer.
Details:
- Fixed a bug that affects all level-2 and level-3 blocked variants. The
bug only manifested, however, if the packing of operands (A and B in
gemm, for example) spanned multiple nodes in the control tree. Until
recently, the main consumers of packm were level-3 operations, all of
which packed both input operands from blocked variant 1 (B outside of
the loop, and A within the loop). This particular usage masked a flaw
in the code whereby bli_obj_release_pack() would always release the
underlying mem_t buffer (provided it was allocated), even if the buffer
was not allocated in the current variant. This has been fixed by
replacing all calls to bli_obj_release_pack() with calls to a new
function, bli_packm_release(), which takes the same control tree node
argument passed into the object's corresponding call to packm_init()
or packv_init(). bli_packm_release() then proceeds to invoke
bli_obj_release_pack() only if the control tree node indicates that
packing was requested. Thanks to Devangi Parikh for identifying this
bug.
commit c0acca0f5182ba96fd39c9d10b34a896a6e74206
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 3 10:56:22 2015 -0600
Clarified comments in testsuite input.operations.
commit 03ba9a6b17861d9e1adc0cf924439c4d7e860d19
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 24 10:33:28 2015 -0600
Removed some 'old' directories.
commit a86db60ee270cdeb745ae7cf68f9e0becc9f522d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 23 18:42:39 2015 -0600
Extensive renaming of 3m/4m-related files, symbols.
Details:
- Renamed all remaining 3m/4m packing files and symbols to 3mi/4mi
('i' for "interleaved"). Similar changes to 3M/4M macros.
- Renamed all 3m/4m files and functions to 3m1/4m1.
- Whitespace changes.
commit 8cf8da291a0fb2f491f410969a76ec0fbda47faf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 20 15:24:27 2015 -0600
Minor updates to induced complex mode management.
Details:
- Relocated bli_4mh.c, bli_4mb.c, bli_4m.c, bli_3mh.c, bli_3m.c (and
associated headers) from frame/base to frame/base/induced.
- Added bli_xm.? to frame/base/induced, which implements
bli_xm_is_enabled(), which detects whether ANY induced complex method
is currently enabled.
- The new function bli_xm_is_enabled() is now used in bli_info.c to
detect when an induced complex method is used, so we know when to
return blocksizes from one of the induced methods' blocksize objects.
commit 411e637ee7d1083a84f58f08938d51e63d7c3c9a
Merge: c2569b88 fc0b7712
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Fri Feb 20 20:39:25 2015 -0600
Merge branch 'master' of http://github.com/flame/blis
commit c2569b8803d4ccc1d7b6f391713461b51443601d
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Fri Feb 20 20:38:19 2015 -0600
Fixed a memory leak in freeing the thread infos
commit fc0b771227abf86d81f505b324f69f6e83db1d8f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 20 11:47:44 2015 -0600
Added max(mr,nr) to kc in static mem pools.
Details:
- Changed the static memory definitions to compute the maximum register
blocksize for each datatype and add it to kc when computing the size
of blocks of A and B. This formally accounts for the nudging of kc
up to a multiple of mr or nr at runtime for triangular operations
(e.g. trmm).
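The sizing rule can be checked with a little arithmetic. The helpers below are hypothetical and the blocksizes in the usage note are made-up values: nudging kc up to a multiple of mr or nr adds fewer than max(mr, nr) elements, so a block sized for kc + max(mr, nr) always has room.

```c
#include <stddef.h>

static size_t max_sketch(size_t a, size_t b)
{
    return a > b ? a : b;
}

/* Round kc up to the next multiple of r (r > 0). */
static size_t nudge_up_sketch(size_t kc, size_t r)
{
    return ((kc + r - 1) / r) * r;
}

/* Hypothetical size, in bytes, of a packed mc x kc block of A once
   max(mr, nr) extra columns are reserved for the runtime nudging of kc. */
static size_t block_a_bytes_sketch(size_t mc, size_t kc,
                                   size_t mr, size_t nr, size_t elem_size)
{
    return mc * (kc + max_sketch(mr, nr)) * elem_size;
}
```

For example, with kc = 254, mr = 8, and nr = 6, the nudged values are 256 and 258, both within the reserved kc + 8 = 262.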
commit af32e3a608631953ef770341df10a14a991bf290
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Thu Feb 19 22:51:11 2015 -0600
Fixed a bug where get_range_weighted would return end = 0 for small problem sizes
commit 441d47542a64e131578d00da7404c1ed387a721c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 19 17:06:10 2015 -0600
Renamed 3m and 4m symbols/macros to 3mi and 4mi.
Details:
- Renamed several variables and macros from 3m/4m to 3mi/4mi. This is
because those packing schemas were always implicitly "interleaved".
This new naming scheme will make way for new schemas that separate
instead of interleave the real and imaginary (and summed) parts.
- Expanded the pack format sub-field of the pack schema field of the
info_t to 4 bits (from 3). This will allow for more schema types
going forward.
- Removed old _cntl.c files for herk3m, herk4m, trmm3m, trmm4m.
commit 518a1756ccf02122b96fc437b538604a597df42a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 19 14:27:09 2015 -0600
Fixed indexing bug for trmm3 via 3mh, 4mh.
Details:
- Fixed a bug that only affected trmm3 when performed via 3mh or 4mh,
whereby micro-panels of the triangular matrix were packed with "dead
space" between them due to failing to adjust for the fact that pointer
arithmetic was occurring in units of complex elements while the data
being packed consisted of real elements. It turns out that the macro-
kernel suffered from the same bug, meaning the panels were actually
being packed and read consistently. The only way I was able to
discover the bug in the first place was because the packed block of A
was overflowing into the beginning of the packed row panel of B using
the sandybridge configuration.
commit 493087d730f01d5169434f461644e5633f48a42f
Merge: 650d2a6f 25021299
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 18 09:45:51 2015 -0600
Merge branch 'master' of github.com:flame/blis
commit 25021299b670775df8ca9c87910c63d7e74ed946
Merge: fe2b8d39 f05a5763
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 11 20:03:21 2015 -0600
Merge branch 'master' of github.com:flame/blis
commit fe2b8d39a445ac848686e78c7540fd046cb95492
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 11 19:33:10 2015 -0600
Fixed an obscure bug in 3mh/3m/4mh/4m packing.
Details:
- Modified bli_packm_blk_var1.c and _var2.c to increase the triangular
case's panel increment by 1 if it would otherwise be odd. This is
particularly necessary in _var2.c when handling the interleaved 3m
or ro/io/rpi pack schemas, since division of an odd number by 2 can
happen if both the panel length and the panel packing dimension
(register packing blocksize) are odd, thus making their product odd.
- Modified bli_packm_init.c so that panel strides are increased by 1
if they would otherwise be odd, even for non-3m related packing.
- Modified the trmm and trsm macro-kernels so that triangular packed
micro-panels are traversed with this new "increment by 1 if odd"
policy.
- Added sanity checks in trmm and trsm macro-kernels that would result
in an abort() if the conditions that would lead to a "divide odd
integer by 2" scenario ever manifest.
- Defined bli_is_odd(), _is_even() macros in bli_scalar_macro_defs.h.
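The "increment by 1 if odd" policy can be sketched as follows. The helper names are hypothetical and are written as functions rather than macros; the actual bli_is_odd() and bli_is_even() definitions may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if a is odd. */
bool sketch_is_odd(size_t a)
{
    return (a % 2) == 1;
}

/* Hypothetical helper: round a panel increment up to the next even value,
   so that a later division by 2 is exact. */
size_t sketch_make_even(size_t panel_inc)
{
    return sketch_is_odd(panel_inc) ? panel_inc + 1 : panel_inc;
}
```

For example, a panel length of 5 and a register packing blocksize of 3 give a product of 15, which would be bumped to 16 before being halved.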
commit 650d2a6ff2e593151a296ca86b5214afcc747afc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 9 14:59:20 2015 -0600
Added initial support for imaginary stride.
Details:
- Added an imaginary stride field ("is") to obj_t.
- Renamed bli_obj_set_incs() macro to bli_obj_set_strides().
- Defined bli_obj_imag_stride() and bli_obj_set_imag_stride() and
added invocations in key locations.
- Added some basic error-checking related to imaginary stride.
- For now, imaginary stride will not be exposed into the most-used
BLIS APIs such as bli_obj_create(), and certainly not the
computational APIs such as bli_dgemm().
commit f05a57634a7c8e3864b25b3335d1194c1ea1aeb9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Feb 8 19:40:34 2015 -0600
Defined gemm cntl function to query ukrs func_t.
Details:
- Added a new function, bli_gemm_cntl_ukrs(), that returns the func_t*
for the gemm micro-kernels from the leaf node of the control tree.
This allows all the func_t* fields from higher-level nodes in the tree
to be NULL, which makes the function that builds the control trees
slightly easier to read.
- Call bli_gemm_cntl_ukrs() instead of the cntl_gemm_ukrs() macro in
all bli_*_front() functions (which is needed to apply the row/column
preference optimization).
- In all level-3 bli_*_cntl_init() functions, changed the _obj_create()
function arguments corresponding to the gemm_ukrs fields in higher-
level cntl tree nodes to NULL.
- Removed some old her2k macro-kernels.
commit cefd3d5d2001264de17cf63dae541f890cb9daaf
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 5 11:09:12 2015 -0600
A couple of functions were incorrectly ifdeffed away on Xeon Phi. Fixed this
commit 7574c9947d57a19f613880e3b9f62f8c8f6df4ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 4 12:11:55 2015 -0600
Added basic flop-counting mechanism (level-3 only).
Details:
- Added optional flop counting to all level-3 front-ends, which is
enabled via BLIS_ENABLE_FLOP_COUNT. The flop count can be
reset at any time via bli_flop_count_reset() and queried via
bli_flop_count(). Caveats:
- flop counts are approximate for her[2]k, syr[2]k, trmm, and
trsm operations;
- flop counts ignore extra flops due to non-unit alpha;
- flop counts do not account for situations where beta is zero.
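A minimal sketch of such a counter is shown below, assuming the standard real-domain estimate of 2*m*n*k flops for gemm. The sketch_ names are hypothetical and do not reproduce the actual bli_flop_count() implementation; as with the caveats above, alpha and beta are ignored.

```c
#include <stddef.h>

/* Hypothetical running total of counted flops. */
static double sketch_flop_total = 0.0;

/* Reset the counter (analogous in spirit to bli_flop_count_reset()). */
void sketch_flop_count_reset(void)
{
    sketch_flop_total = 0.0;
}

/* Query the counter (analogous in spirit to bli_flop_count()). */
double sketch_flop_count(void)
{
    return sketch_flop_total;
}

/* Record one real-domain gemm of dimensions m x n x k: one multiply and
   one add per inner-product term, regardless of alpha and beta. */
void sketch_count_gemm(size_t m, size_t n, size_t k)
{
    sketch_flop_total += 2.0 * (double)m * (double)n * (double)k;
}
```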
commit ceda4f27d1f1bcf19320e09848e0f2e3b9941e6c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jan 29 13:22:54 2015 -0600
Implemented bli_obj_imag_equals().
Details:
- Implemented a new function, bli_obj_imag_equals(), which compares the
imaginary part of the first argument to the second argument, which may
be a BLIS_CONSTANT or of a regular real datatype.
commit 81114824a05a9053229efd577a8a94a856deda93
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 6 12:15:21 2015 -0600
Minor 4m/3m consolidation to mem_pool_macro_defs.h.
Details:
- Merged the 4m and 3m definitions in bli_mem_pool_macro_defs.h to
reduce code and improve readability.
commit 36a9b7b7436d9423ba4de2a9f85cfcd43577b783
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Wed Dec 17 21:53:50 2014 +0000
reduced the default number of MC by KC blocks for bgq
commit c60619c7c3568f044a849abbab60209aa7455423
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 16 17:08:22 2014 -0600
Minor tweaks for 3m4m test drivers.
Details:
- Changed gemm_kc blocksizes to be reduced by two-thirds instead of
half.
- Changed 3m4m/test_gemm.c driver to divide by 3 instead of 2 when
computing the fixed k dimension.
- Fixed runme.sh so that it would use multiple threads for s/dgemm
cases.
commit c6929ba6a5e6f633a7295e979a2b8df8c7ecdb1b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 16 11:27:50 2014 -0600
Added 4m_1b to test/3m4m test driver and script.
commit 785d480805fc0d6f4251b5499933515740b6b2a7
Merge: 9456f330 4156c088
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 12 14:34:19 2014 -0600
Merge branch 'master' of github.com:flame/blis
commit 9456f330af4617f9ee32972d51f974aa2d84f97b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 12 14:31:57 2014 -0600
Added 4m_1b implementation for gemm.
Details:
- Added yet another 4m-based implementation for complex domain level-3
operations. This method, which the 3m/4m paper identifies as Algorithm
"4m_1b", fissures the first loop around the micro-kernel so that the
real sub-panel of the current micro-panel of B is multiplied against
(both sub-panels of) all micro-panels of A, before doing the same for
the imaginary sub-panel of the micro-panel of B. For now, only gemm is
supported, and 4m_1b (labeled "4mb" within the framework) is not yet
integrated into the test suite.
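The loop fission described above can be sketched with a scalar analogue, in which each array entry stands in for one micro-panel of A and a single complex value stands in for the current micro-panel of B. The function name and signature are hypothetical; this is an illustration of the ordering of the four real products, not the framework's implementation.

```c
#include <stddef.h>

/* Scalar analogue of 4m_1b: accumulate c += a * b for n complex values of a
   against one complex value of b, with real and imaginary parts stored
   separately. The loop over a is split in two: the first pass uses only the
   real part of b, the second pass only the imaginary part of b. */
void sketch_4m1b(size_t n,
                 const double *a_r, const double *a_i,
                 double b_r, double b_i,
                 double *c_r, double *c_i)
{
    size_t i;

    /* Pass 1: real part of b against both parts of every a. */
    for (i = 0; i < n; ++i) {
        c_r[i] += a_r[i] * b_r;
        c_i[i] += a_i[i] * b_r;
    }

    /* Pass 2: imaginary part of b against both parts of every a. */
    for (i = 0; i < n; ++i) {
        c_r[i] -= a_i[i] * b_i;
        c_i[i] += a_r[i] * b_i;
    }
}
```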
commit 4156c0880d9aea4ff04a9c4fa139ba8c437d8bfb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 9 16:03:14 2014 -0600
Fixed obscure level-2 packing / general stride bug.
Details:
- Fixed a bug in certain structured level-2 operations that manifested
only when the structured matrix was provided to BLIS as a matrix stored
with general stride. The bug was introduced in c472993b when the
densify field was removed from the packm control tree node and
associated APIs. Since then, the packed object was unconditionally
marked with an uplo field of BLIS_DENSE. This is fine for level-3
operations where micro-panels are always densified, but in level-2
contexts, the underlying unblocked variant (fused or unfused) of
structured operations (e.g. trmv) still needs to know whether to
execute its "lower" or "upper" branches of code. Since this field
was unconditionally being set to BLIS_DENSE, the unblocked variants
always executed the "else" branch, which happened to be the
"lower" case code. Thus, running an upper case produced the wrong
answer. This most obviously manifested in the form of failures for
trmm, trmm3, and trsm in the test suite.
The bug was fixed by setting the packed object's uplo field to
BLIS_DENSE only if the schema indicated that micro-panels were to be
packed. Otherwise, we can assume we are packing to regular row or
column storage, as is the case with level-2 packing. Thanks to
Francisco Igual for reporting the testsuite failures and ultimately
leading us to this bug.
commit 689f60a578b461119e9ea90c74f642b9eb79addb
Merge: bef24e67 483e4d6a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Dec 7 14:03:30 2014 -0600
Merge pull request #21 from figual/master
Adding armv8a configuration and micro-kernels.
commit 483e4d6a3fdbef9d9ab47fb674c9476c70ca9f0f
Author: Francisco D. Igual <figual@ucm.es>
Date: Sun Dec 7 20:27:49 2014 +0100
Adding armv8a configuration and micro-kernels.
Only sgemm micro-kernel is fully functional at this point.
commit bef24e67e0f93579c2a80315348dc2e227f72a72
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Nov 26 18:00:56 2014 -0600
Fixed a type of race condition exposed by pthreads implementation.
Lead thread of the inner thread communicator could exit the subproblem, move on to the next iteration of the loop, and modify a1_pack, b1_pack, or c1_pack while other threads were still using them.
Barriers were inserted to fix this.
commit 76bde44411f0e34266bab9d666a54ef22be97320
Merge: e56e6143 f3d729e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 26 17:25:24 2014 -0600
Merge branch 'master' of github.com:flame/blis
commit f3d729e504ec012e7dc7e02b2ecd42e004c6894d
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Wed Nov 26 22:25:24 2014 -0600
Added static mutex to bli_init and bli_finalize
commit d71cc797866ff502ad1127527016f463267eef80
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Wed Nov 26 21:35:39 2014 -0600
Refactored bli_threading files and added support for pthreads
commit e56e61438ff7fcf25a48c0b7603f18df782b50b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 26 17:20:35 2014 -0600
Minor cleanups to bli_threading.h and friends.
Details:
- No longer need to define BLIS_ENABLE_MULTITHREADING manually in
bli_config.h; it now gets defined when BLIS_ENABLE_OPENMP or
BLIS_ENABLE_PTHREADS is defined.
- Added sanity check to prevent both BLIS_ENABLE_OPENMP and
BLIS_ENABLE_PTHREADS from being enabled simultaneously.
- Reorganization of bli_threading*.h header files, which led to
simplification of threading-related part of blis.h.
- added "-fopenmp -lpthread" to LDFLAGS of sandybridge make_defs.mk
file.
commit 3be2744cbe2c56d38c23fd818aa5c1f10cc7ea51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 21 12:28:08 2014 -0600
Update to template gemm ukernel comments.
Details:
- Updated comments on alignment of a1 and b1 to match wiki.
commit 994429c6881b2ade92d9d7949bcaebfbf2cc65eb
Merge: 58796abd 694029d9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 20 13:55:35 2014 -0600
Merge pull request #20 from TimmyLiu/master
#define PASTEF773 required by cblas compatibility layer
commit 694029d9d7db857d642ab536955c0621791108c8
Author: Timmy <timmy.liu@amd.com>
Date: Wed Nov 19 15:25:14 2014 -0600
#define PASTEF773 required by cblas compatibility layer
commit 58796abda66b133346f8d523b39178afc336351f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 6 14:31:52 2014 -0600
Removed KC constraint comments from _kernel.h files.
Details:
- Since 4674ca8c, the constraint that KC be a multiple of both MR and
NR has been relaxed, and thus it was time to remove the comments
from the top of the bli_kernel.h files of all configurations.
commit 7bbc95a54f706d43c7f7951f0e5995f86130cd52
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 29 10:52:23 2014 -0500
Added new piledriver micro-kernels.
Details:
- Added new micro-kernels for the AMD piledriver architecture (one
for each datatype).
- Updates and tweaks to piledriver configuration.
- Added 3xk packm micro-kernel support.
- Explicitly unrolled some of the smaller packm micro-kernels.
- Added notes to avx/sandybridge and piledriver micro-kernel files
acknowledging the influence of the corresponding kernel code in
OpenBLAS.
commit 59613f1d5500f6279963327db2fbc84bc9135183
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 23 17:21:37 2014 -0500
Added separate micro-panel alignment for A and B.
Details:
- Changed the recently-added micro-panel alignment macros so that we now
have two sets--one for micro-panels of matrix A and one for micro-
panels of matrix B: BLIS_UPANEL_[AB]_ALIGN_SIZE_?.
- Store each set of alignment values into a separate blksz_t object in
bli_gemm_cntl_init().
- Adjusted packm_init() to use the separate alignment values.
- Added query routines for the new alignment values to bli_info.c.
- Modified test suite output accordingly.
commit a8e12884ee1fddd3fd77ca5a68aa0cb857f3af57
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 23 11:35:48 2014 -0500
CHANGELOG update (0.1.6)
commit 38ea5022e4ed846112198c4e1672fcdaeb90dc71
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 23 11:35:45 2014 -0500
Version file update (0.1.6)
commit a3e6341bdb0e28411f935d6b4708a6389663e004
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 23 11:13:28 2014 -0500
Factored common code from blocksize functions.
Details:
- Split bli_determine_blocksize_[fb]() into two functions each, the
newer ones ending with the _sub suffix. These new sub-functions are
now called from bli_[gemm|trmm|trsm]_determine_kc_[fb](), which
eliminates redundant code and will allow any future tweaks to the
core sub-functions to automatically be inherited by the operation-
specific versions.
commit 4674ca8cffb58331ff7edf23bbe0e3f6a7558489
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 23 10:50:59 2014 -0500
Extended newly relaxed KC to hemm, symm.
Details:
- These changes were intended for the previous commit.
- Defined bli_gemm_determine_kc_[fb](),
which determine blocksizes for gemm-based operations, taking special
care to "nudge" the kc dimension up to a multiple of MR or NR for
hemm and symm operations, as needed.
- Changed bli_gemm_blk_var3f.c to call bli_gemm_determine_kc_f()
instead of bli_determine_blocksize_f().
- Comment updates to bli_trmm_blocksize.c, bli_trsm_blocksize.c.
commit ab954ba6f874eaca7b001804491f866ef6b9b327
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 22 17:21:58 2014 -0500
Relaxed constraint that KC be multiple of MR, NR.
Details:
- Relaxed a long-held requirement in register blocksizes that required
the kernel programmer to choose a KC that was divisible by both MR
and NR. This was very constraining on some architectures that did not
use register blocksizes that were powers of two. The constraint is
now enforced only for trmm and trsm, where it is needed, and it is
now handled by "nudging" kc upward at runtime, if necessary, to be a
multiple of MR or NR, as needed.
- Defined bli_trmm_determine_kc_[fb]() and bli_trsm_determine_kc_[fb](),
which determine blocksizes for trmm and trsm, taking special care to
"nudge" the kc dimension up to a multiple of MR or NR, as needed.
- Changed bli_trmm_blk_var3[fb].c to call bli_trmm_determine_kc_[fb]()
instead of bli_determine_blocksize_[fb]().
- Added safeguard to bli_align_dim_to_mult() that returns the dimension
unmodified if the dimension multiple is zero (to avoid division by
zero).
- Removed cpp guard/check for KC % MR == 0 and KC % NR == 0 from
bli_kernel_macro_defs.h.
- Whitespace, variable name changes to bli_blocksize.c.
- Removed old commented code from bli_gemm_cntl.c.
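The "nudging" of kc and the division-by-zero safeguard can be sketched as below. The function name is hypothetical and the body is an assumption about how such a helper might look, not a copy of bli_align_dim_to_mult().

```c
#include <stddef.h>

/* Round dim up to the nearest multiple of mult. If mult is zero, return dim
   unmodified so that no division by zero can occur. */
size_t sketch_align_dim_to_mult(size_t dim, size_t mult)
{
    if (mult == 0)
        return dim;

    return ((dim + mult - 1) / mult) * mult;
}
```

With a register blocksize of 6, for instance, a kc of 256 would be nudged up to 258, while a kc that is already a multiple is left alone.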
commit 95cdae65d6b88e043ee14bcd53cd2e800d7aecb4
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Oct 22 16:30:16 2014 -0500
Fixed bug in KNC microkernel where k=0 and beta != 1
commit e64dba5633fc49b768b5edc7762f2b5d8a4d0588
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 20 19:23:06 2014 -0500
Re-implemented micro-panel alignment.
Details:
- This commit re-implements a feature that was removed in commit
c2b2ab62. It was removed because, at the time, I wasn't sure how the
micro-panel alignment feature would interact with the 4m method (when
applied at the micro-kernel level), and so it seemed safer to disable
the feature entirely rather than allow possible breakage. This commit
revisits the issue and safely re-implements the feature in a way that
is compatible with 4m, 3m, 4mh, and 3mh (and native execution).
- Modified the static memory pool to account for micro-panel alignment
space.
- Modified packm_init and blocked variants to align whole micro-panels
by a datatype-specific alignment value that may be set by the
configuration. (If it is not set by the configuration, it will default
to BLIS_SIZEOF_?.)
- Modified macro-kernels so that:
- storage stride is handled properly given the new micro-panel
alignment behavior;
- indexing through 3m/4m/rih-type sub-panels, as is done by trmm and
trsm, is more robust (e.g. will work if the applicable packing
register blocksize is odd);
- imaginary strides are computed and stored within auxinfo_t structs,
which allows the virtual micro-kernels to more easily determine how
to index into the micro-panel operands.
- Modified virtual 3m and 4m micro-kernels to use the imaginary strides
within the auxinfo_t structs instead of panel strides.
- Deprecated the panel stride fields from the auxinfo_t structs.
- Updated test suite to print out the micro-panel alignment values.
commit add16b0e5402924301e7078e4ca5e3ef725bff0b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 17 11:49:24 2014 -0500
Added 3m4m test driver subdir of 'test'.
Details:
- Added a modified test driver for [cz]gemm that will test all 3m/4m
as well as assembly-based and OpenBLAS implementations of gemm
in single and multithreaded modes.
commit e171504a72406c61a173241d8bccf0a5ceb10582
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 17 11:25:59 2014 -0500
Use correct definition of bli_is_last_iter().
Details:
- As intended for previous commit, the new definition of
bli_is_last_iter() is now disabled in favor of the old
definition.
commit 0d954087b2b55d2f5f3c5e57d702b318ca2300f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 17 11:19:34 2014 -0500
Minor changes and fixes.
Details:
- Redefined bli_is_last_iter() to take thread_id and num_thread
arguments, which allows the macro to correctly compute whether a
given iteration is the last that the thread will compute in that
particular loop. The new definition, however, remains disabled
(commented out) until someone can look at this more closely, as
the new definition seems to actually hurt performance slightly.
- Whitespace and related updates to level-3 macro-kernels.
- Updated test suite so that performance results in the hundreds of
gigaflops do not disrupt the column alignment of the output.
commit d1e86e1876e433f54b501ec5a005b4ba7c5ce4e6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Oct 12 13:43:47 2014 -0500
More minor tweaks to sandybridge/avx micro-kernel.
Details:
- Re-enabled use of b_next for dgemm and cgemm micro-kernels.
commit 7b6fe4cae57cb22c09c1a97595e1a201a02cbcd2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Oct 12 12:01:51 2014 -0500
Minor tweaks to sandybridge/avx micro-kernels.
Details:
- Changed the MC blocksize for zgemm micro-kernel from 128 to 64.
- Removed usage of b_next in all x86_64/avx gemm micro-kernels.
commit a6a156e9feec47154e7a0fd43bcc006b1fc04aba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 10 14:26:41 2014 -0500
Added cgemm ukernel for avx/sandybridge.
Details:
- Implemented AVX-based cgemm micro-kernel (via GNU extended inline
assembly syntax).
- Updated sandybridge configuration accordingly.
commit 6f8575ab2580e167a022293b76ddf0514f71b613
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 10 10:01:45 2014 -0500
Added zgemm ukernel for avx/sandybridge.
Details:
- Implemented AVX-based zgemm micro-kernel (via GNU extended inline
assembly syntax).
- Updated sandybridge configuration accordingly.
commit 23ce7ee542a12ca40b4b6090ad2558d180e16d37
Merge: 99fd9a39 7a8ad47f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 9 16:41:22 2014 -0500
Merge branch 'master' of github.com:flame/blis
commit 99fd9a39718cb7281f6fb23f9fef7cca4fe514f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 9 16:38:04 2014 -0500
Fixed two minor bugs.
Details:
- Fixed a bug in the test suite for the trsm_ukr and gemmtrsm_ukr test
modules whereby the uplo bits of some packed matrix objects were not
being set properly, resulting in false FAILURE results for those
tests. Thanks to Tyler Smith for bringing this issue to my attention.
- Fixed a bug in bli_obj_alloc_buffer() that caused an unnecessary
"not yet implemented" abort() when creating a 1x1 object with non-unit
strides.
commit 7a8ad47fb2d100a9da93aa8cab774fcceeaab733
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Oct 8 15:52:13 2014 -0500
Minor changes to knc configuration, including a preference for row-major storage
Also fixed a bug in the knc micro-kernel where it would fail if k == 0
commit 76b7c34af0c09f47d9615b18857a356acddc788a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 2 14:15:38 2014 -0500
Fixed a bug in the pack schema-related bit macros.
Details:
- Expanded the BLIS_PACK_SCHEMA_BITS value in bli_type_defs.h to
include all six bits presently used in the pack schema bitfield of
the info field of obj_t structs. Prior to this commit, the macro
constant only included the lowest five bits, which excluded the
"is or is not packed" bit. This manifested as a strange bug in
probably many level-2 codes that invoked packing, though we only
observed it in ger before fixing. Thanks to Devin Matthews for
finding and reporting this bug.
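The effect of a too-narrow mask can be sketched as follows. The bit positions, the shift, and all SKETCH_ names are hypothetical and chosen only for illustration; in particular, placing the "is or is not packed" bit at the top of the six-bit field is an assumption, and the real layout in bli_type_defs.h may differ.

```c
#include <stdint.h>

/* Hypothetical position of the pack schema bitfield within the info word. */
#define SKETCH_SCHEMA_SHIFT 16u

/* Hypothetical five-bit mask (the buggy width) and six-bit mask (the fix). */
#define SKETCH_SCHEMA_BITS_5 (0x1Fu << SKETCH_SCHEMA_SHIFT)
#define SKETCH_SCHEMA_BITS_6 (0x3Fu << SKETCH_SCHEMA_SHIFT)

/* Hypothetical "is or is not packed" bit, assumed to be the sixth bit. */
#define SKETCH_PACKED_BIT    (0x20u << SKETCH_SCHEMA_SHIFT)

/* Extract the pack schema from an info word using the given mask. */
uint32_t sketch_pack_schema(uint32_t info, uint32_t mask)
{
    return info & mask;
}
```

Under these assumptions, a query made with the five-bit mask silently drops the packed bit, so a packed object can be mistaken for an unpacked one.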
commit a5763e332226598d70c47dfa9cad4578e15ef5f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 2 13:28:17 2014 -0500
Added extra output to bli_obj_print().
Details:
- Print extra values from info field of obj_t struct within
bli_obj_print().
commit 9bba209fc44fbfce943ba6a51cd8278a0cb6b159
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Sep 29 14:56:36 2014 -0500
Fixed bug when packing anywhere besides in blk_var_1 for gemm.
commit 614a4afc9272adb47e5a8b83b39d56c2804d95d6
Merge: b541b667 4a7df04e
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Sep 26 10:49:57 2014 -0500
Merge branch 'master' of http://github.com/flame/blis
commit 4a7df04e8a4ffdb9561d26426afd35e4fe15b013
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 22 16:06:15 2014 -0500
Added 30xk support for packm ukernels.
Details:
- Updated bli_kernel_*_macro_defs.h headers to include default
definitions for 30xk packm kernels.
- Extended function pointer arrays in bli_packm_cxk_*() out to 31 and
included 30xk kernels.
- Added 30xk kernels to frame/1m/packm/ukernels/bli_packm_ref_cxk_*.c.
commit b6d4bd792e0d44ce4b28afef343f5ff3ba89c285
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 22 16:02:37 2014 -0500
Fixed missing tabs from Makefile patch.
commit 32630f9b6f0d5ba28d5b56dae4c7288a37158743
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 19 17:18:20 2014 -0500
Comment update to virtual micro-kernels.
commit 13447cffead7c6d137a7a3ccbf9e552ed0477467
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 19 13:00:48 2014 -0500
Minor bugfix to top-level Makefile.
Details:
- Applied a patch that allows the top-level Makefile to work on certain
systems. The patch simply separates out the source-to-object code
generation rules for .c and .S files into two separate rules. Thanks
to Devin Matthews for submitting this patch.
commit e80a4537846416719c067ae08a53aeda978c572d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 18 10:24:20 2014 -0500
Fixed bug introduced by bugfix in 25b258d.
Details:
- We actually need to check alignment of lda*sizeof(double) and NOT
a+lda because in the latter case, alignment could cancel out and
still allow the optimized code to run when it shouldn't. Thanks
to Devin for pointing this out.
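The cancellation can be sketched with plain integer addresses. The function names are hypothetical and the 16-byte alignment used in the example is an assumption; the sketch only contrasts the two checks named above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Check based on a+lda: is the address of the next column aligned?
   The base address is passed as an integer to keep the sketch portable. */
bool sketch_next_column_is_aligned(uintptr_t a_addr, size_t lda, size_t align)
{
    return ((a_addr + lda * sizeof(double)) % align) == 0;
}

/* Check based on lda*sizeof(double): is the byte size of the leading
   dimension itself a multiple of the alignment? */
bool sketch_lda_bytes_is_aligned(size_t lda, size_t align)
{
    return ((lda * sizeof(double)) % align) == 0;
}
```

With a base address that is 8 bytes past a 16-byte boundary and lda equal to 3, the first check passes because the two misalignments cancel, while the second check correctly fails.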
commit 25b258d61f9c8cee64e922f4131784b6edb196dd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 18 10:10:49 2014 -0500
Fixed a non-fatal problem with bugfix in a68b316c.
Details:
- The bugfix in a68b316c was inadvertently checking alignment of the
leading dimension itself, rather than the byte size of the leading
dimension. Now, we simply check alignment of a+lda.
commit 96302d4fc81363410e41c3a3c43a65df44d97ad9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 18 09:43:40 2014 -0500
Renamed bli_info_get_*_ukr_type() functions.
Details:
- Added _string() suffix to bli_info_get_*_ukr_type() function names.
This makes them consistent with the bli_info_get_*_impl_string()
functions.
commit a68b316ca4852509f84ed50e01afac486bf70f58
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 17 11:10:07 2014 -0500
Fixed alignment bugs in level-1f kernels.
Details:
- Fixed bugs whereby the level-1f dotxf, axpyxf, and dotxaxpyf kernels
were attempting to compute problems with unaligned leading dimensions
with optimized code, rather than (correctly) using the reference
implementations. Thanks to Devin Matthews for reporting this bug.
commit 870761eb902e4866090d1d3446a345df3d6d4599
Merge: e9899be0 a2b59a37
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 16 18:20:49 2014 -0500
Merge branch 'master' of github.com:flame/blis
commit e9899be09044829e23386bd73e394f1dd7778210
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 16 18:19:32 2014 -0500
Added high-level implementations of 4m, 3m.
Details:
- Added "4mh" and "3mh" APIs, which implement the 4m and 3m methods at
high levels, respectively. APIs for trmm and trsm were NOT added due
to the fact that these approaches are inherently incompatible with
implementing 4m or 3m at high levels (because the input right-hand
side matrix is overwritten).
- Added 4mh, 3mh virtual micro-kernels, and updated the existing 4m and
3m so that all are stylistically consistent.
- Added new "rih" packing kernels (both low-level and structure-aware)
to support both 4mh and 3mh.
- Defined new pack_t schemas to support real-only, imaginary-only, and
real+imaginary packing formats.
- Added various level0 scalar macros to support the rih packm kernels.
- Minor tweaks to trmm macro-kernels to facilitate 4mh and 3mh.
- Added the ability to enable/disable 4mh, 3m, and 3mh, and adjusted
level-3 front-ends to check enabledness of 3mh, 3m, 4mh, and 4m (in
that order) and execute the first one that is enabled, or the native
implementation if none are enabled.
- Added implementation query functions for each level-3 operation so
that the user can query a string that describes the implementation
that is currently enabled.
- Updated test suite to output implementation types for each level-3
operation, as well as micro-kernel types for each of the five micro-
kernels.
- Renamed BLIS_ENABLE_?COMPLEX_VIA_4M macros to _ENABLE_VIRTUAL_?COMPLEX.
- Fixed an obscure bug when packing Hermitian matrices (regular packing
type) whereby the diagonal elements of the packed micro-panels could
get tainted if the source matrix's imaginary diagonal part contained
garbage.
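The selection order used by the level-3 front-ends (3mh, 3m, 4mh, 4m, then native) can be sketched as a simple chain of checks. The enum and function names are hypothetical; the real front-ends query enabledness through their own API.

```c
#include <stdbool.h>

/* Hypothetical labels for the available implementations. */
typedef enum {
    SKETCH_IMPL_3MH,
    SKETCH_IMPL_3M,
    SKETCH_IMPL_4MH,
    SKETCH_IMPL_4M,
    SKETCH_IMPL_NATIVE
} sketch_impl_t;

/* Return the first enabled implementation in the order 3mh, 3m, 4mh, 4m,
   falling back to the native implementation if none are enabled. */
sketch_impl_t sketch_choose_impl(bool en_3mh, bool en_3m,
                                 bool en_4mh, bool en_4m)
{
    if (en_3mh) return SKETCH_IMPL_3MH;
    if (en_3m)  return SKETCH_IMPL_3M;
    if (en_4mh) return SKETCH_IMPL_4MH;
    if (en_4m)  return SKETCH_IMPL_4M;
    return SKETCH_IMPL_NATIVE;
}
```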
commit a2b59a37f166f70a6dd5793db2530823ef590c2b
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Sep 15 10:44:44 2014 -0500
Fixed make defs so that they actually compile for bulldozer
commit 86fc7e40764f78ec217f50216ef4fa5b57dbfbc7
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Sep 15 10:35:46 2014 -0500
Added bulldozer configuration and updated piledriver micro-kernel
commit 0644e61a79a57f136be5f4c47b9099cff2af06e0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 11 12:55:34 2014 -0500
Minor updates to bli_packm_init.c.
commit 9dc9b44a057a08e20ad4d423344f0ecad54c1eb2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 11 12:03:28 2014 -0500
Renamed bli_obj_pack_status() to _pack_schema().
Details:
- Renamed the bli_obj_pack_status() macro to bli_obj_pack_schema() in
order to help avoid confusion as to what the macro returns.
commit cf5efdde0588a0d5b6ea57fe7d7be5000be06f8e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 11 11:47:56 2014 -0500
Pass pack_t schemas into ukernels via auxinfo_t.
Details:
- Modified macro-kernels to pass the pack_t schema values for matrices
A and B into the datatype-specific functions, where they are now
inserted into a newly-expanded auxinfo_t struct. This gives the
micro-kernels access to the pack_t schema values embedded in the
control trees, which determine the precise format into which the
matrix elements are packed.
- Updated a call to bli_packm_init_pack() in src/test_libblis.c to
remove densify argument. Meant to include this in commit c472993b.
commit cc8d2b82775cca3c2d51bf427f4e77c8024a6d15
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 9 13:48:22 2014 -0500
Updated old test drivers in 'test'.
commit c472993bbccb69e9ffc409c79b742426c8ad2ad4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 9 13:42:04 2014 -0500
Removed densify argument to packm_cntl_obj_create().
Details:
- Removed the "densify" bool_t argument to bli_packm_cntl_obj_create().
This argument was inserted very early in BLIS's development, when it
was anticipated that the developer may sometimes wish to pack a
Hermitian, symmetric, or triangular matrix without making it dense.
But as it turns out, if we are packing a matrix, we always want to
make it dense in some way or another due to the fact that the micro-
kernel only multiplies dense micro-panels. Thus, unless/until there
is a real need for the feature, it seems reasonable to remove it from
the packm_cntl API.
commit 5c43ee387146cd76dc59b730dac6683a8446b834
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 8 15:19:29 2014 -0500
Moved trmm4m/3m_cntl files to 'old' directory.
Details:
- Meant to include this in previous commit.
commit 7b2f469d5465ed73b1ca88124bc9a1987388aa27
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 8 14:49:50 2014 -0500
Retired trmm_t control tree definitions, usage.
Details:
- Replaced all trmm_t control tree instances and usage with that of
gemm_t. This change is similar to the recent retirement of the herk_t
control tree.
- Tweaked packm blocked variants so that the triangular code does NOT
assume that k is a multiple of MR (when A is triangular) or NR (when
B is triangular). This means that bottom-right micro-panels packed for
trmm will have different zero-padding when k is not already a multiple
of the relevant register blocksize. While this creates a seemingly
arbitrary and unnecessary distinction between trmm and trsm packing,
it actually allows trmm to be handled with one control tree, instead
of one for left and one for right side cases. Furthermore, since only
one tree is required, it can now be handled by the gemm tree, and thus
the trmm control tree definitions can be disposed of entirely.
- Tweaked trmm macro-kernels so that they do NOT inflate k up to a
multiple of MR (when A is triangular) or NR (when B is triangular).
- Misc. tweaks and cleanups to bli_packm_struc_cxk_4m.c and _3m.c, some
of which are to facilitate above-mentioned changes whereby k is no
longer required to be a multiple of register blocksize when packing
triangular micro-panels.
- Adjusted trmm3 according to above changes.
- Retired trmm_t control tree creation/initialization functions.
commit 576e9e9255a79dba9cd3c804267f51e0b4aa6e8a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Sep 7 16:12:52 2014 -0500
Retired herk_t control tree definitions, usage.
Details:
- Replaced all herk_t control tree instances and usage with that of
gemm_t, since the two types presently have the same fields. This means
that herk, her2k, syrk, and syr2k can simply use the gemm control tree
as-is, just as hemm and symm have been doing for some time now.
- Retired herk_t control tree creation/initialization functions.
- Retired many _target.c and .h files into 'old' directories.
commit b2fed052c9a23d858ef0afbe220b342bce9aa7f7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 3 17:07:25 2014 -0500
Minor code cleanup to bli_packm_struc_cxk*.c
Details:
- Realized that we don't need to track rs_p11 and cs_p11 for
Hermitian/symmetric case of bli_packm_struc_cxk*(). They are always
equal to rs_p and cs_p.
commit 023ce770966b3b5a98bba729c5af1f45e15ebb97
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 3 10:47:53 2014 -0500
Minor update to packm_cxk kernels.
Details:
- Changed m and n dimension parameter names to panel_dim and panel_len,
respectively, in packm_cxk, packm_cxk_3m, packm_cxk_4m kernel wrapper
functions. This makes the code a little easier to read since "m" and
"n" have connotations that are not applicable here.
- Comment updates.
commit 189def3667d9218adbeec45e2801fd074341a679
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 1 16:23:17 2014 -0500
Retired portions of bli_kernel_3m/4m_macro_defs.h.
Details:
- Removed sections of bli_kernel_[4m|3m]_macro_defs.h that defined
4m/3m-specific blocksizes after realizing that this can be done in
bli_gemm[4m|3m]_cntl.c, since that is (mostly) the only place they
are used.
- The maximum cache values for 4m/3m are still needed when computing mem
pool dimensions in bli_mem_pool_macro_defs.h. As a workaround, "local"
definitions in terms of the regular cache blocksizes are now in place.
- Similarly, the register blocksizes for 4m/3m are still needed in
bli_kernel_post_macro_defs.h. As a workaround, "local" definitions in
terms of the regular register blocksizes are now in place.
commit af521ee6f2a77d61c98b833e85c09969987bc00d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 1 14:06:46 2014 -0500
Changed semantics of blocksize extensions.
Details:
- Changed semantics of cache and register blocksize extensions so that
the extended values are tracked, rather than just the marginal
extensions.
- BLIS_EXTEND_[MKN]C_? has been renamed BLIS_MAXIMUM_[MKN]C_?.
- BLIS_EXTEND_[MKN]R_? has been renamed BLIS_PACKDIM_[MKN]R_?.
- bli_blksz_ext_*() APIs have been renamed to bli_blksz_max_*(). Note
that these "max" query routines grab the maximum value for cache
blocksizes and the packdim value for register blocksizes.
- bli_info_*() API has been updated accordingly.
- All configurations have been updated accordingly.
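The change in semantics above can be sketched as follows. This is only an illustration: the struct, the function names, and the values 128 and 32 are hypothetical and are not the BLIS blksz_t type or its API. It shows the old convention (store the marginal extension) next to the new one (store the extended value itself).

```c
/* Hypothetical type for illustration; not the BLIS blksz_t. */
typedef struct {
    long def;  /* default blocksize */
    long max;  /* extended ("maximum") blocksize, assumed max >= def */
} ex_blksz_t;

/* New convention: the extended value itself is stored. A value given
   under the old convention (marginal extension) is converted once. */
static ex_blksz_t ex_blksz_from_ext(long def, long ext)
{
    ex_blksz_t b = { def, def + ext };
    return b;
}

/* The marginal extension can still be recovered when needed. */
static long ex_blksz_ext(ex_blksz_t b)
{
    return b.max - b.def;
}
```

For instance, a default of 128 with an old-style extension of 32 would be recorded as a maximum of 160 under the new convention.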
commit 07f23aefd52f5ba4960dbd46e59b180a2136b8e9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 31 11:58:50 2014 -0500
Pass pack schema into packm_struc_cxk*().
Details:
- Changed the interface to the packm_struc_cxk*() kernels to include
the pack_t schema. This allows the implementation to more easily
determine how the micro-panel is stored (row-stored column panel
or column-stored row panel).
- Updated packm blocked variants to pass in the schema.
- Updated packm_ker_t function pointer definition accordingly.
commit f032ba9b1186cb02184574d339565f53d733aa42
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Aug 30 16:21:20 2014 -0500
Reorganized packm implementation.
Details:
- Reorganized packm variants and structure-aware kernels so that all
routines for a given pack format (4m, 3m, regular) reside in a single
file.
- Renamed _blk_var4 to _blk_var2 and generalized so that it will work for
both 4m and 3m, and adjusted 4m/3m _cntl_init() functions accordingly.
- Added a new packm_ker_t function pointer type to bli_kernel_type_defs.h
to facilitate function pointer typecasting in the datatype-specific
packm_blk_var2() functions.
- Deprecated _blk_var3.
- Fixed a bug in the triangular micro-panel packing facility that
affected trmm and trmm3 with unit diagonals.
commit c6793cecb70788bdf2c76ab8102504ea97be9d2a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 28 17:14:48 2014 -0500
Reorganized #includes for scalar macro headers.
Details:
- Reordered the #include statements in bli_scalar_macro_defs.h so that
conventional, ri-, and ri3-based macros are grouped together.
- Renamed bli_eqri.h (and macros within) to end with 'ris' suffix.
commit b4da8907284345be4374f87a88679c4886ab866e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 28 14:10:32 2014 -0500
Whitespace, comments updates on packm_blk_var?.c.
commit 46e46a1d83da586c3dd9fd7a01eb16067abbaee1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 28 12:05:45 2014 -0500
Minor updates to packm blocked, cxk_3m/4m code.
Details:
- Added 'const' qualifier to inlined packing code that handles
micro-panel packing that is too large for an existing packm ukernel.
- Comment updates.
commit 908dc688b5979995eaacb3aa937f241551a8df00
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 28 11:55:12 2014 -0500
Pass pack schema into blocked packm routines.
Details:
- Rather than passing the packm blocked routines a boolean value that
represents whether the matrix is being packed to row or column storage,
we now pass in the pack schema itself.
commit a0ff6066e06075ab5f92b19247b39b92ed15f1bf
Merge: c4c99c48 d40b32bc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 24 15:56:21 2014 -0500
Merge branch 'master' of github.com:flame/blis
commit c4c99c4813bf9817592a7899c5d33412fe22313f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 24 15:52:22 2014 -0500
Renamed packm scalar from beta to kappa.
Details:
- The packm implementation (i.e. source files in frame/1m/packm and
frame/1m/packm/ukernels) interchangeably used the names "beta" and
"kappa" to refer to the optional scalar to be applied during packing.
This commit renames all uses of "beta" to be "kappa", since "beta"
sometimes evokes the scalar specifically on the output matrix of a
level-2 or level-3 operation.
commit d40b32bc24ffbae24123e054307b3138969bb095
Merge: 9331f794 6c25c379
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 24 13:46:36 2014 -0500
Merge branch 'master' of github.com:flame/blis
commit 6c25c379fadb50834146e1614f7b80c093c2aad0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 24 13:44:10 2014 -0500
Consolidated unpackm ukernels into single file.
Details:
- Reorganized unpackm ukernels into a single file,
bli_unpackm_ref_cxk.c, in a manner similar to what was done for packm
ukernels in commit 4cc2b46.
commit 9331f79443223fe267676ee54c439e1ed320380c
Merge: 7fc48a7d 670b6392
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 24 10:54:21 2014 -0500
Merge branch 'master' of github.com:flame/blis
commit 670b63926a7f4fc694abc5b1582ef8a4f367f5a8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 24 10:46:27 2014 -0500
Added whitespace to bli_obj_scalar_ routine calls.
Details:
- Added extra spaces to align arguments of
bli_obj_scalar_init_detached_copy_of(). This misalignment was due to
the fact that the function was previously named
bli_obj_init_scalar_copy_of() and the name change, performed in
b444489f, was done via recursive sed commands which left subsequent
lines untouched.
commit 7fc48a7d920e07fd8e9528ab2565123f8f4e67f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Aug 23 16:50:58 2014 -0500
Combined 4m/3m bits into an expanded bitfield.
Details:
- Combined the 4m/3m bits into an expanded bitfield, which will encode
the packing "format" of the micro-panels. This will allow for more
easily and compactly encoding additional formats.
- Other minor comment/whitespace updates to bli_type_defs.h.
- Updated bli_obj_macro_defs.h and bli_param_macro_defs.h to use the new
format bitfield.
- Comment update to bli_kernel_post_macro_defs.h.
- Whitespace changes to bli_kernel_3m_macro_defs.h, _4m_macro_defs.h.
commit ef0143cc1417e4815e4cafd5a464cc83fe7a1e86
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Aug 23 14:02:27 2014 -0500
Renamed _ri, _ri3 packm ukernels to _4m, _3m.
Details:
- Renamed packm ukernels, _cxk dispatcher, and structure-aware _cxk
helper functions to use _4m and _3m instead of _ri and _ri3 suffixes.
- Updated names of cpp macros that correspond to packm ukernels.
commit b0ccac116158b5ed3316d34798748ba0c6d78672
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 21 19:21:52 2014 -0500
Cleaned up front-end layering for 4m/3m.
Details:
- Added an extra layer to level-3 front-ends (examples: bli_gemm_entry()
and bli_gemm4m_entry()) to hide the control trees from the code that
decides whether to execute native or 4m-based implementations. The
layering was also applied to 3m.
- Branch to 4m code based on the return value of bli_4m_is_enabled(),
rather than the cpp macros BLIS_ENABLE_?COMPLEX_VIA_4M. This lays
the groundwork for users to be able to change at runtime which
implementation is called by the main front-ends (e.g. bli_gemm()).
- Retired some experimental gemm code that hadn't been touched in
months.
commit bedec95451cabfa7a8906b51018a5e0572998a5e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 21 18:25:48 2014 -0500
Added bli_4m API for querying 4m enabled state.
Details:
- Added bli_4m.c (and header), which defines a simple API that can be
used to query, enable, and disable 4m-based complex support in BLIS.
The macros BLIS_ENABLE_?COMPLEX_VIA_4M are now used to initialize
the variable that determines the state (enabled or disabled).
- Changed bli_info*() API so that all cache and register blocksize-
related query routines return the blksz_t objects' values as they
exist at runtime, rather than return the values as determined by the
configuration system (e.g. bli_kernel.h, or defaults for those values
not specified). This sets the foundation for being able to change
those blocksizes at runtime.
commit b541b667cabfa6d41b50ad1e49209651ee6812cc
Merge: 699a8151 dd61307f
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Aug 20 14:44:51 2014 -0500
Merge branch 'master' of http://github.com/flame/blis
Conflicts:
frame/3/trsm/bli_trsm_blk_var2b.c
frame/3/trsm/bli_trsm_blk_var2f.c
commit 699a8151ca3d5021e834a1784ef45dcc3a3d17cd
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Aug 20 14:43:17 2014 -0500
Some improvements to trsm parallelism
commit dd61307f55bb6bc762fe0ef0446479d6c0536723
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 20 09:52:16 2014 -0500
Minor update to sandybridge MC_S, KC_S.
Details:
- Changed sandybridge MC and KC for single-precision real to 128 and 384,
respectively.
- Updated comments in template configuration's gemm micro-kernel file
to document the new "contiguous row preference" macro.
commit d0eec4bddd740ce360d0f655362c551287cf925b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 19 15:49:19 2014 -0500
Added optional row preference to ukernel config.
Details:
- Added the ability for the kernel developer to indicate the gemm micro-
kernel as having a preference for accessing the micro-tile of C via
contiguous rows (as opposed to contiguous columns). This property may
be encoded in bli_kernel.h as BLIS_?GEMM_UKERNEL_PREFERS_CONTIG_ROWS,
which may be defined or left undefined. Leaving it undefined leads to
the default assumption of column preference.
- Changed conditionals in frame/3/*/*_front.c that induce transposition
of the operation so that the transposition is induced only if there
is disagreement between the storage of C and the preference of the
micro-kernel. Previously, the only conditional that needed to be met
was that C was row-stored, which is to say that we assumed the micro-
kernel preferred column-contiguous access on C.
- Added a "prefers_contig_rows" property to func_t objects, and updated
calls to bli_func_obj_create() in _cntl.c files in order to support
the above changes.
- Removed the row-storage optimization from bli_trsm_front.c because
it is actually ineffective. This is because the right-side case of
trsm flips the A and B micro-panel operands (since BLIS only requires
left-side gemmtrsm/trsm kernels), meaning any transposition done
at the high level is then undone at the low level.
- Tweaked trmm, trmm3 _front.c files to eliminate a possible redundant
invocation of the bli_obj_swap() macro.
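The revised transposition conditional can be sketched as below. This is a simplified illustration, not the code in frame/3/*/*_front.c: the function names are hypothetical, and C is assumed to be either row-stored or column-stored (general stride is ignored).

```c
#include <stdbool.h>

/* Previous rule: transpose the operation whenever C is row-stored,
   which implicitly assumes a column-preferring micro-kernel. */
static bool ex_induce_trans_old(bool c_is_row_stored)
{
    return c_is_row_stored;
}

/* New rule: transpose only when the storage of C disagrees with the
   micro-kernel's preference. */
static bool ex_induce_trans(bool c_is_row_stored, bool ukr_prefers_rows)
{
    return c_is_row_stored != ukr_prefers_rows;
}
```

With a column-preferring micro-kernel the two rules agree; they differ only when the micro-kernel prefers contiguous rows.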
commit 4cc2b464f29cafbfef9295b073b857fe0752f710
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 15 11:49:15 2014 -0500
Reorganized packm ukernels.
Details:
- Previously, packm micro-kernels were organized by the implied register
blocksize (panel dimension) assumed by the kernel, meaning conventional,
ri, and ri3 variations of some micro-kernel size were housed in the same
file. This commit reorganizes the micro-kernels so that all sizes reside
in the same file for each format type (conventional, ri, and ri3).
commit fcc10054a11b6fc3976986f57feccf741596cbf6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 13 12:32:06 2014 -0500
Tweaks to gemm4m, gemm3m virtual ukernels.
Details:
- Fixed a potential, but as-yet unobserved bug in gemm3m that would
allow undesirable inf/NaN propagation, since C was being scaled by
beta even if it was equal to zero.
- In gemm3m micro-kernel, we now avoid copying C to the temporary
micro-tile if beta is zero.
- Rearranged computation in gemm4m so that the temporary C micro-tile
is accessed less, and C is accessed only after the micro-kernel
calls. This improves performance marginally in most situations.
- Comment updates to both gemm4m and gemm3m micro-kernels.
commit cdcbacc2fa871317c8e7ef961ecc6d70ab22dc34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 12 12:45:38 2014 -0500
Removed redundant redef of packm ukr prototypes.
Details:
- Removed redundant macro code that redefined packm ukernel prototypes
when the previous macro was already sufficient. This helps de-clutter
the packm ukernel prototyping headers a little bit.
commit 82dac98d9032ccb598068a55ddf23d7898491e9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 12 12:36:25 2014 -0500
Relocated packm ukernel #includes.
Details:
- Consolidated the #include statements for packm ukernel headers from
bli_packm_cxk.h, bli_packm_cxk_ri.h, and bli_packm_cxk_ri3.h to
bli_packm.h.
- Comment/whitespace updates to bli_packm_blk_var3.c, _var4.c.
commit 7f77856e25aad5fc6f172ed3e57b6351804e31a4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 12 12:20:15 2014 -0500
Removed unused 4m/3m-related packm macro defs.
Details:
- Removed unused and unneeded s- and d-flavored macro definitions for
packm ukernels related to the complex 4m and 3m methods, as
implemented in BLIS.
commit bc1d86b2d4d436b1dfba2d0098501aaca9cbb8b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 7 19:01:20 2014 -0500
Sandy Bridge configuration, micro-kernel update.
Details:
- Minor updates to bli_config and bli_kernel.h for sandybridge
configuration.
- Renamed existing AVX intrinsic-based micro-kernel file to
bli_gemm_int_d8x4.c.
- Added new file, bli_gemm_asm_d8x4.c, which provides assembly-based
gemm micro-kernels for single- and double-precision real.
commit 98ec95877a95242e159b2bf0c879115a59e4c6e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 7 18:28:32 2014 -0500
Corrected comment for _obj_is_[row|col]_stored().
Details:
- Fixed a mistake in the comments introduced in the previous commit for
bli_obj_is_row_stored() and bli_obj_is_col_stored().
commit 43d5e419e1b424d2143817103dbee8ead797e8aa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 7 18:20:40 2014 -0500
Reverted _obj_is_[row|col]_stored() macros.
Details:
- Rolled back recent changes to bli_obj_is_row_stored() and
bli_obj_is_col_stored() so that those macros now only inspect the
strides (row or column). It turns out that the more sophisticated
definitions introduced in a51e32e are not necessary, because these
"obj" macros are virtually never used on packed matrices, and when
they are, they can use bli_obj_is_[row|col]_packed() macros, which
inspect the info bitfield.
commit 45692e3ad4b7e1d05ac4302398df4efce04b4284
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 7 13:21:15 2014 -0500
Reverted some accidental changes.
Details:
- Reverted some changes that were unintentionally included in the
previous commit (9526ce98). Thanks to Tony Kelman for pointing
this out. (Note: a few select changes were not reverted.)
commit 9526ce98812be908bc4915f2849b657fb6ce1b49
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 6 14:13:46 2014 -0500
Updated copyright headers of emscripten configuration files.
commit 30833ed71d56f231ddba21e632bcbbc90b12a97c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 6 12:12:03 2014 -0500
Minor edits to configurations' make_defs.mk files.
Details:
- Redefined CFLAGS, CFLAGS_NOOPT, and CFLAGS_KERNELS so that CFLAGS_NOOPT
is defined first and then the other two are defined in terms of
CFLAGS_NOOPT. This textually cleans up the definitions and makes them a
little easier to read.
commit 9d61afeae2ba70fe1df07e7546f6954ea83aed12
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 4 16:01:59 2014 -0500
CHANGELOG update (0.1.5)
commit bde56d0ecfd0ec20330fac290b91a6dca0cf94e9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 4 16:01:58 2014 -0500
Version file update (0.1.5)
commit 4c6ceea4be35d089630986eb5b959b9e97214077
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 4 15:49:59 2014 -0500
Added CBLAS compatibility layer.
Details:
- Added a new section in bli_config.h files of all configurations for
enabling CBLAS support. (Currently, the default is for the CBLAS layer
to be disabled.)
- Added a directory, frame/compat/cblas, to house CBLAS source code. A
subdirectory 'f77_sub' holds subroutine wrappers corresponding to
subroutines found in CBLAS that allow calling some BLAS routines with
the return value passed as the last argument rather than as an actual
(function) return value. This was probably intended to allow CBLAS to
avoid the whole f2c debacle altogether. However, since BLIS does not
assume the presence of a Fortran compiler, we had to provide similar
routines in C.
- A script, integrate-cblas-tarball.sh, is included to streamline the
integration of future revisions of the CBLAS source code.
- The current tarball, cblas.tgz, that was used with the above script to
generate the present set of CBLAS source code is also included.
- Updated blis.h to include necessary CBLAS-related headers.
commit caab62dac0fb0bd0d674118f409c81680db94d29
Merge: 383631b5 db97ce97
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 3 14:36:18 2014 -0500
Merge pull request #19 from kevinoid/fix-install-perms-error
Fix permissions error installing to non-owned directory
commit db97ce979b88c051922c2f946ce52d523c7a12c6
Author: Kevin Locke <kevin@kevinlocke.name>
Date: Sun Aug 3 12:48:04 2014 -0600
Fix permissions error installing to non-owned directory
When installing to a directory which is not owned by the installing
user, even when the user has write permission for the directory, the
installation can fail with an error similar to the following:
Installing libblis-0.1.4-7-sandybridge.a into /usr/local/lib/
install: cannot change permissions of ‘/usr/local/lib’: Operation not permitted
Makefile:658: recipe for target '/usr/local/lib/libblis-0.1.4-7-sandybridge.a' failed
make: *** [/usr/local/lib/libblis-0.1.4-7-sandybridge.a] Error 1
In the example case, the error occurred because the user attempted to
install to /usr/local and /usr/local/lib is owned by root with mode 2755
which the Makefile unsuccessfully attempted to change to 0755.
Given that installing to /usr/local is likely to be quite common and the
ownership/permissions are the default for Debian and Debian-derived
Linux distributions (perhaps others as well), this commit attempts to
support that use case by using mkdir rather than install to create the
directory (which is the same approach as Automake).
Signed-off-by: Kevin Locke <kevin@kevinlocke.name>
commit 383631b514c3d42b724640f57644eea276cc418c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 31 14:51:48 2014 -0500
Redefined bit field macros with bitshift operator.
Details:
- Redefined many of the macros that define bit fields and bit values in
the obj_t info field using the bitshift operator (<<). This makes it
easier to reorder bit fields, or expand existing bit fields, or add
new fields. The bitshifting should be evaluated by the compiler at
compile-time.
commit 137143345dc93cc9a83da5ba88b25bac7502de86
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 31 12:12:45 2014 -0500
Reimplemented unit blocksize fix in prev commit.
Details:
- Instead of inferring the storage format of the micro-panels from within
the packm variants, we now pass in a bool_t value that denotes whether
the packed matrix contains row-stored column panels or column-stored
row panels. This value can then be tested more easily inside the main
packm variant loop.
- Renumbered pack_t schema values in bli_type_defs.h so that there are
now five bits, each with different meaning:
- 4: packed or not packed?
- 3: packed for 3m?
- 2: packed for 4m?
- 1: packed to panels?
- 0: stored by rows or columns?
- Added new macros that test for status of above bits in schema bit
subfield, and renamed some existing macros related to 4m/3m.
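The five-bit layout listed above might be expressed with shift-based macros and predicates along these lines. All names are hypothetical (they are not the macros in bli_type_defs.h), and the meaning assigned to bit 0 (clear means row-stored) is an assumption, since the commit message does not say which value denotes rows.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow the list in the commit message. */
#define EX_PACK_RC_BIT     ( 0x1u << 0 )  /* stored by rows or columns? */
#define EX_PACK_PANEL_BIT  ( 0x1u << 1 )  /* packed to panels? */
#define EX_PACK_4M_BIT     ( 0x1u << 2 )  /* packed for 4m? */
#define EX_PACK_3M_BIT     ( 0x1u << 3 )  /* packed for 3m? */
#define EX_PACK_BIT        ( 0x1u << 4 )  /* packed or not packed? */

static bool ex_is_packed(uint32_t s)       { return ( s & EX_PACK_BIT ) != 0; }
static bool ex_is_panel_packed(uint32_t s) { return ( s & EX_PACK_PANEL_BIT ) != 0; }
static bool ex_is_4m_packed(uint32_t s)    { return ( s & EX_PACK_4M_BIT ) != 0; }
static bool ex_is_3m_packed(uint32_t s)    { return ( s & EX_PACK_3M_BIT ) != 0; }

/* Assumption: a clear bit 0 denotes row storage. */
static bool ex_is_row_stored(uint32_t s)   { return ( s & EX_PACK_RC_BIT ) == 0; }
```

Because each property has its own bit, a test inside the packm loop reduces to a single mask operation and does not depend on the strides.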
commit a51e32ec061941cd10119ea80115c82a40b1673f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 30 10:41:48 2014 -0500
Fixed unit register blocksize brokenness.
Details:
- Fixed a breakdown in BLIS's ability to differentiate between row-stored
and column-stored micro-panels when MR or NR is unit. When either
register blocksize (or both) is equal to one, inspecting the strides of
the affected packed micro-panel is no longer sufficient to determine
whether the micro-panel is a row-stored column panel or a column-stored
row panel (because both strides are unit). At that point, dimension
information is necessary when invoking the bli_is_row_stored_f() and
bli_is_col_stored_f() macros (and their "obj" counterparts). Thanks to
Ilya Polkovnichenko for reporting this bug.
- Added panel dimensions (m and n) to obj_t, which are set in
packm_init() and then passed into the blocked variants to support the
aforementioned update.
commit c2732272f0ac680a0ad19fa9db5d587398a1479a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 29 16:37:18 2014 -0500
Removed old/unused packm variants.
commit b97fa9a5a70fe0123e5eebd999b947461d38445f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 27 18:54:09 2014 -0500
Minor usage update to build/bump-version.sh.
commit b18ba5f62d98629cdd519ff4c96fc67ec1a62fb9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 27 18:52:05 2014 -0500
Added missing 'bla_' prefix to r_imag(), d_imag().
Details:
- Added "bla_" to f2c functions r_imag() and d_imag(). Thanks to Murtaza
Ali for pointing out the misnamed functions.
commit af7a8e6c042cade452130a6729377f1a3ef4e19e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 27 18:20:13 2014 -0500
CHANGELOG update (0.1.4)
commit a7537071b152ecff671f8716595d37dc09e4fd51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 27 18:20:12 2014 -0500
Version file update (0.1.4)
commit acff74041bf02c7b9fdfa24b507bca782a4c5fce
Merge: cdb9413e 47b243ef
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Jul 23 15:07:30 2014 -0500
Merge branch 'master' of https://github.com/flame/blis
commit cdb9413e140f8a198666250ec88fa34b5425a9c3
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Jul 23 15:05:15 2014 -0500
Enabled threading for a couple more loops in TRSM
JC loop is now enabled for the left-sided case
IC loop is now enabled for the right-sided case
commit 47b243ef08f4101de3d936f2373343e67eaa4dd5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 23 13:41:13 2014 -0500
Call setid for early return from herk/her2k.
Details:
- Added setid call (to zero imaginary parts of diagonal elements) to
early return branches of herk_front() and her2k_front() for cases
where alpha is zero. Thanks to Murtaza Ali for suggesting this fix.
- Comment update.
commit 3e7b0db5b0e24f5fd66c60bacabc019885ddbec5
Merge: 2f8a357d ed3e33d5
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Jul 23 13:40:44 2014 -0500
Merge branch 'master' of https://github.com/flame/blis
commit 2f8a357de5fb55163a969d888cf059f24b78125c
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Jul 23 13:40:12 2014 -0500
Some TRSM threading fixes/additions
commit ed3e33d548047be3283ff41268fdf716563bc542
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 22 14:40:43 2014 -0500
Tweaked behavior of herk, her2k for BLAS compat.
Details:
- Updated herk_front() and her2k_front() to explicitly set the imaginary
components of the diagonal entries of C to zero after the computation
is complete. This is needed in case downstream applications read the
full diagonal entries (i.e., including imaginary part), which could, in
the absence of this modification, accumulate numerical error from
subsequent rank-k/rank-2k updates.
- Updated BLAS compatibility wrappers for herk and her2k to return early
if:
n == 0 || ( ( alpha == 0 || k == 0 ) && beta == 1 )
This also results in the imaginary components of diagonal entries NOT
being set to zero (see above), which is consistent with BLAS.
- Updated mkherm to use setid instead of an inlined loop over the
diagonal.
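The early-return condition quoted above can be written as a small predicate. This is a sketch for the herk case, where alpha and beta are real; the function name is hypothetical and is not part of the compatibility layer.

```c
#include <stdbool.h>

/* True when the wrapper would return before doing any work, i.e. when
   n == 0 || ( ( alpha == 0 || k == 0 ) && beta == 1 ). */
static bool ex_herk_returns_early(long n, long k, double alpha, double beta)
{
    return n == 0 || ( ( alpha == 0.0 || k == 0 ) && beta == 1.0 );
}
```

When this predicate holds, C is left entirely untouched, so the imaginary parts of its diagonal are not zeroed.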
commit ea59a5c93cde1467a3715abc53dda4aecf961873
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 22 14:36:02 2014 -0500
Added new level-1d operation: setid.
Details:
- Defined a new level-1d operation, setid, which sets the imaginary
elements of an object's diagonal to a single scalar. This can be
useful, for example, when trying to make the diagonal of a Hermitian
matrix real-valued.
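A minimal sketch of what a setid-style operation does, assuming a plain array of complex values addressed with row and column strides. The type and function below are hypothetical stand-ins, not the BLIS object API or its dcomplex type.

```c
#include <stddef.h>

/* Hypothetical complex type for illustration. */
typedef struct { double real; double imag; } ex_dcomplex;

/* Set the imaginary part of every diagonal element of an m x n matrix
   to alpha_imag, leaving real parts and off-diagonal elements alone. */
static void ex_setid(size_t m, size_t n, double alpha_imag,
                     ex_dcomplex *a, size_t rs, size_t cs)
{
    size_t len = ( m < n ) ? m : n;  /* length of the main diagonal */
    for ( size_t i = 0; i < len; ++i )
        a[ i * rs + i * cs ].imag = alpha_imag;
}
```

Calling it with alpha_imag equal to zero makes the diagonal real-valued, which is the Hermitian use case mentioned above.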
commit 8965a965931318619ceaebd7c32edccf3022d0c7
Merge: 1785efb5 5b73e80b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 22 14:34:32 2014 -0500
Merge branch 'master' of github.com:flame/blis
commit 1785efb5420bc7b9c850a068cb5d99837071e877
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 22 14:33:01 2014 -0500
Minor improvements to invertd and setd.
Details:
- Added missing call to invertd_check() from front-end.
- Changed setd front-end call of scald_check() to setd_check().
commit 5b73e80b71c054c1945a06aff044ef629bc1a9a0
Merge: a41e68e0 20690fe3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 18 12:21:20 2014 -0500
Merge pull request #16 from Maratyszcza/emscripten
Emscripten port
commit a41e68e09e73b999fab0bb430a43dccfc63aab45
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 17 13:25:56 2014 -0500
Reimplemented BLIS initialization/finalization.
Details:
- Rewrote bli_init() and bli_finalize() with OpenMP critical sections
for thread-safety. Also added lots of explanatory comments.
- Renamed bli_init_safe() and bli_finalize_safe() with the _auto()
suffix, and reimplemented for simplicity. Updated all invocations
in BLAS compatibility layer to use _auto() suffix.
commit 36358948ea75074bda32a9f8c008f835b87d21db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 17 10:58:10 2014 -0500
Retired frame/3/gemm/other directory.
Details:
- Removed frame/3/gemm/other directory, which contained some outdated
and/or experimental variants.
commit c73261f17edf589e76bdbe297702a1fbbd69275f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 14 16:23:51 2014 -0500
More minor cleanups post-copyright update.
commit 2a09d24463d358be6243b24f112fad057c2aefe0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 14 16:17:09 2014 -0500
Reverted power7 symlinks destroyed by sed script.
Details:
- Reverted two symlinks, in kernels/power7/3/test, back to being symlinks
after recursive-sed.sh mistakenly replaced them with copies of the
actual files to which they referred. Meant to include this in previous
commit.
commit 7ed415824d3b2e78541b6f64e404ca5347c06d3d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 14 16:14:33 2014 -0500
Updated copyright headers (continued).
Details:
- Inserted "at Austin" into third clause of license declarations.
Meant to include this change in previous commit.
commit 5c2c6c85616834ff2716ece083118201d9df6dde
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 14 16:05:03 2014 -0500
Updated copyright headers to contain "at Austin".
Details:
- Updated copyright headers to include "at Austin" in the name of the
University of Texas.
- Updated the copyright years of a few headers to 2014 (from 2011 and
2012).
commit fcec68cda3f6e90ae055e7304e6674c1c5c8d010
Merge: 94c0df79 4a20ed1a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 14 11:35:34 2014 -0500
Merge branch 'master' of github.com:flame/blis
commit 94c0df797eda377931f29a41ba6a89c0ed58daca
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 14 11:24:36 2014 -0500
Changed order of zero dim / error checking.
Details:
- Updated level-2 and level-3 internal back-ends so that the operation's
_check() function is called BEFORE any attempt to return early due to
the presence of zero dimensions. This ordering makes more sense because
(for example) object dimensions should match even if one of them is
zero. Previously, a dimension mismatch could result in an early return
with no error message.
- Updated bli_check_object_buffer() so that NULL buffers result in an
error only if the object is dimensionally non-empty (i.e., only if both
of the object's dimensions are non-zero). This allows BLIS operations
to be performed on dimensionally empty objects (i.e., where at least one
dimension is zero).
- Updated the error message associated with bli_check_object_buffer()
to mention the newly relaxed constraint mentioned above, vis-a-vis
non-zero dimensions.
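The relaxed buffer check might look like the predicate below. It is a sketch only: the function name and signature are hypothetical and do not reproduce bli_check_object_buffer().

```c
#include <stdbool.h>
#include <stddef.h>

/* A NULL buffer is an error only when the object is dimensionally
   non-empty, i.e. when both dimensions are non-zero. */
static bool ex_buffer_is_invalid(const void *buf, long m, long n)
{
    return buf == NULL && m != 0 && n != 0;
}
```

An object with a NULL buffer and at least one zero dimension therefore passes the check.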
commit 20690fe3018ce17c8df61ce0bffecaa7911dc3a5
Author: Marat Dukhan <maratek@gmail.com>
Date: Sun Jul 13 22:50:56 2014 -0700
Emscripten port
commit 4a20ed1a3f5e9e5232df30aa0e568e6c00c56ce1
Merge: 6a515e98 8ccdfaef
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 13 17:45:01 2014 -0500
Merge pull request #14 from Maratyszcza/master
Support "make test" for PNaCl configuration
commit 6a515e988f2ae1628258a6dec2c0e9cf2d04790f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 13 17:38:33 2014 -0500
Implemented dsdot() and sdsdot() in compat layer.
Details:
- Replaced "not yet implemented" error messages in dsdot() and sdsdot()
with actual implementations. (These routines are so rarely used that
this log message will probably lead to some people learning of their
existence for the first time.)
commit 255668ddd1004552c6cc65035ec6486671ce99bb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 13 17:30:44 2014 -0500
Inserted gemv beta-scaling bug into compat layer.
Details:
- BLAS has a peculiar bug (or feature) whereby calling gemv on a vector
y of non-zero length and a vector x of zero length results in no action.
Given that the operation is y := beta*y + A*x, many (most?) individuals
would expect vector y to still be scaled by beta. BLIS, when called
natively, handles these cases intuitively (with beta scaling).
Unfortunately, many BLAS test suites actually check for the way this
situation is handled. Therefore, we have decided to implement this "bug"
in the compatibility layer so as to provide "bug-for-bug" compatibility
with BLAS.
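The difference between the two behaviors can be illustrated with a small sketch. Both functions are hypothetical and greatly simplified (column-major A, no transposition, unit strides, and the operation y := beta*y + A*x as written above); they are not the BLIS or BLAS implementations.

```c
#include <stddef.h>

/* "Native" behavior: y is scaled by beta even if x has zero length. */
static void ex_gemv_native(size_t m, size_t n,
                           const double *a, size_t lda,
                           const double *x, double beta, double *y)
{
    for ( size_t i = 0; i < m; ++i )
    {
        double acc = 0.0;
        for ( size_t j = 0; j < n; ++j )
            acc += a[ i + j * lda ] * x[ j ];
        y[ i ] = beta * y[ i ] + acc;
    }
}

/* "Compatibility" behavior: return before touching y when either
   dimension is zero, mirroring the BLAS quick return. */
static void ex_gemv_compat(size_t m, size_t n,
                           const double *a, size_t lda,
                           const double *x, double beta, double *y)
{
    if ( m == 0 || n == 0 ) return;
    ex_gemv_native( m, n, a, lda, x, beta, y );
}
```

With n equal to zero and beta equal to 2, the native sketch doubles y while the compatibility sketch leaves it unchanged.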
commit 570a154581bdb353fa13a219c7cb3c81d3dceffd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 12 17:51:05 2014 -0500
Comment/formatting updates to build scripts.
Details:
- Minor updates to comments and formatting in bump-version.sh and
update-version-file.sh scripts.
commit 26cd81990631ff799791629206e068126ff9e3a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 10 13:16:07 2014 -0500
Added bli_info_*() query functions.
Details:
- Added a new API family, bli_info_*(), which can be used to query
information about how BLIS was configured. Most of these values are
returned as gint_t, with the exception of the version string which
is char*.
- Changed how the testsuite driver queries information about how BLIS
was configured (from using macro constants directly to using the
new bli_info API).
- Removed bli_version.c and its header file.
- Added STRINGIFY_INT() macro to bli_macro_defs.h
- Renamed info_t type in bli_type_defs.h to objbits_t (not because of
an actual naming conflict, but because the name 'info_t' would now be
somewhat misleading in the presence of the new bli_info API, as the
two are unrelated).
commit 970b43141697d8c31a033f59513bb59d7cc78ab0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 10 09:30:00 2014 -0500
Minor bugfixes to BLAS compatibility layer.
Details:
- Changed bla_amax.c so that i?amax() routines now correctly return 0
if ( n < 1 || incx <= 0 ).
- Changed bla_rotg.c and bla_rotmg.c to use bli_fabs() macro instead of
f2c's abs() macro for float and double cases.
- Thanks to Murtaza Ali for suggesting the two fixes above.
- Updated label of fnormv to normfv in testsuite/input.operations.
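The i?amax guard described in the first item above can be sketched as follows. The function is a hypothetical double-precision stand-in, not the code in bla_amax.c; it returns a 1-based index as the BLAS routines do.

```c
/* Returns the 1-based index of the first element with the largest
   absolute value, or 0 if n < 1 or incx <= 0. */
static int ex_idamax(int n, const double *x, int incx)
{
    if ( n < 1 || incx <= 0 ) return 0;

    int    best     = 1;
    double best_abs = ( x[ 0 ] < 0.0 ) ? -x[ 0 ] : x[ 0 ];

    for ( int i = 1; i < n; ++i )
    {
        double v = x[ i * incx ];
        if ( v < 0.0 ) v = -v;
        if ( v > best_abs ) { best_abs = v; best = i + 1; }
    }
    return best;
}
```

The guard runs before any element of x is read, so an empty vector or a non-positive stride never causes a memory access.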
commit 8ccdfaef4c42ad8957af8607a1a9ee29b9277d4b
Author: Marat Dukhan <maratek@gmail.com>
Date: Tue Jul 8 23:14:36 2014 -0700
Replicated logic from testsuite/Makefile in top-level Makefile to support make test
commit caa6507ff3724c80d60987f309b8bbc5b50a9841
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 8 10:25:27 2014 -0500
Minor cleanup to standalone test drivers.
Details:
- Very minor code changes to standalone test drivers in 'test' directory.
- Added *.so files to '.gitignore'.
commit 6c65e9a58fe55990ebb99ec3986443e18af35338
Merge: cb12e456 daca500d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 8 10:13:49 2014 -0500
Merge branch 'master' of github.com:flame/blis
commit cb12e456f94c196c093e52f02a7cbca0032fc86e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 8 10:07:46 2014 -0500
Fixed possible level-3 inf/NaN issue when beta=0.
Details:
- Redefined xpbys_mxn and xpbys_mxn_u/_l macros to employ a copy
(instead of scaling by beta) when beta is zero. This will stamp out
any possible infs or NaNs in the output matrix, if it happens to be
uninitialized. Thanks to Tony Kelman for isolating this bug.
commit daca500db5e2448ba0da8047b75eb0f88d9f40e3
Merge: ab3bc915 47023502
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Jul 3 12:52:52 2014 -0500
Merge branch 'master' of http://github.com/flame/blis
commit 4702350278af31f662b458127777dd4d85a3192f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 3 11:48:23 2014 -0500
Defined _ukernel_void() wrappers to micro-kernels.
Details:
- Added wrappers for micro-kernels so that users may invoke the
micro-kernels without knowing what the function names actually are.
This is useful when an application wishes to call the micro-kernel
from a shared library instance of BLIS, where the application may not
necessarily have the luxury of grabbing the micro-kernel name(s) from
C preprocessor macros at compile-time. Also, since the wrappers use
void* pointers, one's environment does not need to be aware of some
BLIS types such as scomplex and dcomplex. These wrappers now join the
level-1 and level-1f kernel wrappers, which pre-dated this commit.
- Removed the wrapper definitions and prototypes from the micro-kernel
test suite modules, and replaced calls to them with calls to the new
wrappers mentioned above.
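A sketch of the wrapper pattern, under loudly stated assumptions: the 2 x 2 register blocksize, the packing layout, the argument list, and both function names below are hypothetical and are not the actual BLIS micro-kernel signature.

```c
/* Hypothetical typed micro-kernel: C := beta * C + alpha * A * B for a
   2 x 2 block of C, with A packed as k columns of 2 elements and B
   packed as k rows of 2 elements. */
static void example_dgemm_ukr_2x2( int k,
                                   const double* alpha,
                                   const double* a, const double* b,
                                   const double* beta,
                                   double* c, int rs_c, int cs_c )
{
    for ( int j = 0; j < 2; ++j )
        for ( int i = 0; i < 2; ++i )
        {
            double ab = 0.0;

            for ( int p = 0; p < k; ++p )
                ab += a[ p * 2 + i ] * b[ p * 2 + j ];

            double* cij = &c[ i * rs_c + j * cs_c ];
            *cij = (*beta) * (*cij) + (*alpha) * ab;
        }
}

/* Hypothetical void* wrapper. The caller needs neither the name of the
   typed kernel nor any library-specific types to call it. */
void example_dgemm_ukernel_void( int k, void* alpha, void* a, void* b,
                                 void* beta, void* c, int rs_c, int cs_c );

void example_dgemm_ukernel_void( int k, void* alpha, void* a, void* b,
                                 void* beta, void* c, int rs_c, int cs_c )
{
    example_dgemm_ukr_2x2( k, alpha, a, b, beta, c, rs_c, cs_c );
}
```

Because the wrapper has a fixed name, an application that loads the library dynamically could look it up by symbol instead of relying on preprocessor macros at compile time.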
commit ab3bc9153b914fbaf259e15b66c91d628e7c8661
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Jul 3 11:19:43 2014 -0500
Fixed a bug for TRSM when BLIS_ENABLE_MULTITHREADING is not set but the multithreading environment variables are turned on
commit b8134b720b985783ee6a582a3eb5d6c51f00d051
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Jul 2 16:02:39 2014 -0500
Quick and dirty multithreading for TRSM
Should work fine for a small number of threads (up to 8 or maybe even 16).
However, performance is as yet untested.
This parallelizes the "JR" loop for the left sided cases
and the "IR" loop for the right sided cases.
Future work is to parallelize the outer loops as well.
commit e8ef69692831db07ddbe9485a5e504ac3f03e496
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 2 14:59:27 2014 -0500
Added shared library support to build system.
Details:
- Modified top-level Makefile to support building shared (dynamic)
libraries.
- Updated most configurations' make_defs.mk files to include necessary
compiler/linker flags needed by top-level Makefile.
- Note that by default, all configurations presently do NOT build
shared libraries. To enable, one must change the value of
BLIS_ENABLE_DYNAMIC_BUILD to 'yes'.
commit b80df0f2cffb015da02e70a82b8512da9891ab67
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 13:52:39 2014 -0500
Added bump-version.sh script to 'build' directory.
Details:
- Added a bash script, bump-version.sh, to aid in incrementing the BLIS
version string.
commit 9ef1f1e21d083697fc730e48d7d9169c201f3da2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 13:48:17 2014 -0500
CHANGELOG update (0.1.3)
commit 036cc634918463b1caa0fd89c9a211f2f5639af7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 13:48:17 2014 -0500
Version file update (0.1.3)
commit 09d9a3bf6763932d9f571085b2cfd1b8631eccba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 13:43:26 2014 -0500
Reverting version file to test new version script.
Details:
- Changed version file contents to 0.1.2 so that I can test out a new
version file bumping script.
commit ebb33965981dcb2b0bdee5fc7fdf6c959420f311
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 11:22:50 2014 -0500
Added 'version' file.
commit 2cb9a5501a3cbeb6692cf68e896087ba73b6af69
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 10:42:29 2014 -0500
Removed 'version' from .gitignore file.
commit b40dcefc5ee31f67aa3990e2e9d2ef8ed1386a25
Merge: 7101a8ee b693b0cd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 10:39:05 2014 -0500
Merge pull request #11 from Maratyszcza/stable
[sc]axpy kernels for PNaCl
commit b693b0cddcfb41450e3c09a3ab97acb44c1ccdec
Author: Marat Dukhan <maratek@gmail.com>
Date: Sun Jun 22 13:44:25 2014 -0700
[SC]AXPY kernels for PNaCl
commit 7101a8eec0327d6c3a7eb36eb4b0fd45c1c6d162
Merge: ad48dca2 020a831b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 19 21:46:50 2014 -0500
Merge pull request #10 from Maratyszcza/stable
Portable Native Client port
commit 020a831bc5f61744cb8354886aa679b99b1285f6
Author: Marat Dukhan <maratek@gmail.com>
Date: Thu Jun 19 00:58:26 2014 -0700
Code clean-up in PNaCl port
commit 491be4f91ed725522f5cc7184053857c6c376ada
Author: Marat Dukhan <maratek@gmail.com>
Date: Thu Jun 19 00:45:44 2014 -0700
Optimized dot product kernels for PNaCl
commit 4b8e71aab80182873a2e138eb07902b8d8fd5480
Author: Marat Dukhan <maratek@gmail.com>
Date: Thu Jun 19 00:43:25 2014 -0700
Use AR rcs flags for PNaCl target to avoid warning
commit 031deb2a5c718d569bde842590a791b812f4cf1d
Author: Marat Dukhan <maratek@gmail.com>
Date: Wed Jun 18 03:11:34 2014 -0700
PNaCl configuration: use pnacl-ar instead of ar (fixes build issue on Mac)
commit 68a02976e3c3638f0a9821342e269a1743e3ace3
Author: Marat Dukhan <maratek@gmail.com>
Date: Wed Jun 18 03:10:25 2014 -0700
Compile pnacl configuration in GNU11 mode to avoid warning about non-standard features
commit 6f8462eb0ec278b89731e73ef583386a3371d095
Author: Marat Dukhan <maratek@gmail.com>
Date: Wed Jun 18 03:08:46 2014 -0700
Fix inconsistent VERBOSE macro in Makefile
commit b2ffb4de8b6872cb23537ad282e557d11dcd9c8b
Author: Marat Dukhan <maratek@gmail.com>
Date: Sun Jun 15 18:41:30 2014 -0400
Reformatted PNaCl GEMM kernels
commit 6de2d472d98baa215264a776f3d5291780a6a085
Author: Marat Dukhan <maratek@gmail.com>
Date: Sun Jun 15 08:44:31 2014 -0400
CGEMM and ZGEMM kernels for PNaCl
commit f064711a5e6fb3852c17c7520909b09dc27665f2
Author: Marat Dukhan <maratek@gmail.com>
Date: Sun Jun 15 06:27:37 2014 -0400
SGEMM and DGEMM kernels for PNaCl
commit ad48dca22913a363899f0bef45553898718eebb1
Merge: ee2b6792 7118f87e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jun 14 15:10:13 2014 -0500
Merge pull request #9 from tkelman/memalign_windows
Use _aligned_malloc instead of posix_memalign on Windows
commit 7118f87e18b4941423472afc00215c1d1f2a1fcd
Author: Tony Kelman <tony@kelman.net>
Date: Sat Jun 14 06:53:20 2014 -0700
Use _aligned_malloc instead of posix_memalign on Windows
commit ee2b679281ca45fb40b2198e293bc3bc3d446632
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Jun 6 12:41:55 2014 -0500
Only include omp.h if BLIS_ENABLE_OPENMP is set
commit 19c05dfaac43c627f86e897c8c00f1f9440754aa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 5 10:54:16 2014 -0500
CHANGELOG update (for 0.1.2).
commit 00f232f8ed1f7c41619b12ebf779ebe2c3b2d3cd
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Jun 2 13:40:57 2014 -0500
Added single-precision micro-kernel for Knights Corner aka MIC aka Xeon Phi
commit 3fc60e491426f6248c0feae88d971e4d1f88fb95
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 21 11:34:42 2014 -0500
Fixed ldim alignment bug in core2 gemm ukernel.
Details:
- Fixed a bug in the dunnington/core2 gemm micro-kernels that resulted in
a segmentation fault if a column-stored matrix's starting address was
aligned, but its leading dimension was such that its second column was
unaligned. Basically, the micro-kernel was assuming that aligned load
instructions were safe when they actually were not. An extra condition
that checks the alignment of cs_c (ie: the leading dimension in the
column storage case) has now been added. Thanks to Michael Lehn for
reporting this bug.
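The extra condition can be sketched as below. This is only an illustration of the check, not the micro-kernel code; example_can_use_aligned_loads is a hypothetical helper, and it assumes double-precision elements and a non-negative column stride:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: aligned vector loads on every column of a
   column-stored matrix are safe only if the starting address is aligned
   AND the column stride, measured in bytes, is a multiple of the
   alignment. Checking the starting address alone misses the case where
   the second column begins at an unaligned address. */
static int example_can_use_aligned_loads( const double* c, size_t cs_c,
                                          size_t align )
{
    const int base_ok   = ( (uintptr_t)c % align ) == 0;
    const int stride_ok = ( ( cs_c * sizeof( double ) ) % align ) == 0;

    return base_ok && stride_ok;
}
```

For example, with 16-byte alignment and 8-byte doubles, a leading dimension of 3 gives a 24-byte stride, so the second column starts 8 bytes past an aligned boundary and the check fails.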
commit 77a2d8dac8b242d7a202c9aabda3927ab68cf987
Merge: 8c5d6071 21fb0893
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 20 09:53:19 2014 -0500
Merge pull request #8 from tlrmchlsmth/master
Added multithreading to most level-3 operations.
commit 21fb089387ee7c87f6dc53b0f60f68b48d3ff3e8
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon May 19 20:38:55 2014 -0700
Reverting changes to dunnington and reference configs
Now they are unchanged from the main branch of BLIS
commit 8a0ef0e0db5880730425926f8ba56b457a2ba764
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri May 16 13:44:14 2014 -0500
Fixed rounding error in bli_get_range_weighted
commit 0b4b1680334528b1b60bc696537600f763198e92
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri May 16 12:23:37 2014 -0500
Fixed bug with disabling JC loop threading for right sided trmm
commit 5c048a90d8dfa1dbde4e45fbc10ffcbdfe59d960
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed May 14 16:20:06 2014 -0500
Disabled parallelism for right-sided TRMM JC loop
The loop has dependent iterations.
commit 13a4c717ed0e273359dbaf5554cc4fa70b087d71
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed May 14 14:59:04 2014 -0500
Fixed bug with bli_get_range_weighted
commit 45957cc7745e9bb1698408d72f53ef192e960820
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue May 13 17:14:46 2014 -0500
Allowed threading to be turned off
No longer requires OpenMP to compile
Define the following in bli_config.h in order to enable multithreading:
BLIS_ENABLE_MULTITHREADING
BLIS_ENABLE_OPENMP
Also fixes a bug with bli_get_range_weighted
commit bd1dc98ce599d74513a553fe3b37a2ebca1c3812
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon May 12 17:26:19 2014 -0500
Disabled multithreading of the kc loop
commit 456df0372170bd7ca2c7e2d85365a69f1f04de88
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Apr 30 12:28:00 2014 -0500
Replaced register blocksize hack with querying the register blocksize for determining parallelism granularity
commit f4fdfe8fc573553eb36795b79cdf681270dab71b
Merge: 31bb065b 8c5d6071
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Apr 30 11:46:35 2014 -0500
Merge http://github.com/flame/blis
commit 8c5d6071e24ba10a53669390a47287e86ff354ce
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 29 12:26:12 2014 -0500
Added _check() routines for fprint[mv], rand[mv].
Details:
- Added _check() routines for fprintm, fprintv, randm, and randv.
- Added invocations to the above routines from their respective
front-ends.
commit 262cdabcc885bcf6636f4d8bb7d320f95e81d820
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 28 16:48:25 2014 -0500
Changed treatment of NULL object buffers.
Details:
- Relaxed the constraint in bli_obj_attach_buffer_check(), which required
the buffer address being attached to be non-NULL. This is acceptable
because the user was already able to create and use objects with NULL
buffers (via bli_obj_create_without_buffer(), which initializes the
buffer to NULL).
- Inserted calls to newly defined function, bli_check_object_buffer(),
into nearly all operations' _check() or _int_check() functions. This
allows BLIS to abort peacefully if a computational routine is called
with an object containing a NULL buffer. By contrast, under such
conditions, BLAS would typically fail with a segmentation fault.
- Within operation front-ends, moved the calls to _check()/_int_check()
so that zero dimensions are checked first (and if found, execution
returns with trivial or no computation). This resolves issue #7. Thanks
to Jack Poulson for reporting this bug.
commit 31bb065ba40ae0c5a614e743b8025abca012b99e
Merge: 20e24430 7c619599
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Apr 23 12:30:19 2014 -0500
Merge http://github.com/flame/blis
commit 7c61959955c8ba78160d0ed4d1979022029d963b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 10 17:18:36 2014 -0500
Can now query register blocksizes from blk algs.
Details:
- Added a new field to blksz_t objects that allows one to attach a
sub-object. Doing this allows us to associate a register blocksize with
any given cache blocksize. That way, the register blocksize can be
queried wherever the cache blocksize would normally be accessible
(e.g. a blocked algorithm).
- Modified bli_gemm_cntl.c (and 4m/3m variants) so that the register
blocksizes are attached to the cache blocksizes after they are created.
commit 58671597d3d450817b2eda576c05ed6dadd8af6d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 10 15:35:30 2014 -0500
Minor cleanups to level-2 _cntl.c files.
Details:
- Changed level-2 _cntl.c files so that the blocksizes for gemv are
imported and used, rather than blocksizes being declared locally.
- Whitespace changes to gemv_cntl.c and gemm_cntl.c files (as well as
4m/3m variants).
- Removed test/old/test_blis2.c.
commit 20e24430a772bc0fbaf24dec2f8c544096fd3f4e
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date: Tue Apr 8 17:50:44 2014 +0000
Some fixes for the bgq kernels
commit bde697f75ec1e7f2decebee0c9bd620b4c134cd5
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 16:43:44 2014 -0500
Add -openmp to ldflags as well
commit c332be8cd471eeace7b4fa4ae7443088b6a68ec3
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 16:37:50 2014 -0500
Added -openmp flag to Xeon Phi build for convenience
commit e7ca9e4b4a24d585c9aec8293fc7bb79e4171ad0
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 16:31:15 2014 -0500
Used BLIS_DEFAULT_*_MR for rounding partitioning instead of BLIS_DEFAULT_*_MC
commit 7b9b228c6fa4cfb70b1ebb855b009a036e85fac3
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 16:29:10 2014 -0500
Fix for tree barrier freeing bug
commit 5ec93bd9a76096312d51c326ccde1e9bd0a436ab
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 15:09:10 2014 -0500
Bunch of minor fixes
Removed barrier after unpackm in all level3 blocked variants
Now there is an implicit barrier inside unpackm that only occurs if C is packed (which is usually not the case)
Moved the enabling of the tree barriers into bli_config.h
Fed the default MR and NR for double precision into bli_get_range instead of the number 8
commit 575fb9b0b08f3bdb56ccde056da619d1585617c1
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 12:13:29 2014 -0500
Changed default blocking factor to default double precision MR and NR
commit ab9c7880335c281432d5809fe0dec46753d22569
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 11:38:11 2014 -0500
Added faster tree barriers necessary for performance for Xeon Phi
Fixed up some stuff in the thread info free functions
Disabled threading for TRSM so that it actually works when threading environment variables are set
commit ec58a7923cccac08632670caadf3cf6ff5dce766
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 10:22:48 2014 -0500
Freeing thread info paths.
Also made herk IC and JC loops do weighted partitioning
commit 2b6848b2397d6d84ca4e5f792fc51ad05e351a36
Merge: 4e3eb39a 21a0efb3
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 09:54:54 2014 -0500
Merge http://github.com/flame/blis
Conflicts:
kernels/bgq/1/bli_axpyv_opt_var1.c
kernels/bgq/1/bli_dotv_opt_var1.c
commit 4e3eb39aca4df0b9fdc003d468f368a2f2ba597d
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date: Fri Apr 4 14:50:03 2014 +0000
Some fixes to the bgq config
MR and NR for double complex were wrong
Default fusing factor for double precision was wrong as well
commit 21a0efb33d7435139e9c43c1a4787a6bff533e26
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 3 16:38:44 2014 -0500
Fixed follow-up to issue #6.
commit c318157a9bee8ea6e59be16f99f65d9271fe0d27
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 3 16:24:34 2014 -0500
Fixed issue #6 (incorrect 'restrict' usage).
Details:
- Fixed improper usage of restrict keyword in axpyv and dotv bgq kernels.
(However, there may be other instances of similar misuse elsewhere in
BLIS.) Thanks to Jeff Hammond for reporting this issue.
commit b5150a1bf3bd89598e2b3aeac110eb5b44ac6c12
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 3 12:25:45 2014 -0500
Added #include "arm_neon.h" to ARM gemm ukernel.
Details:
- Inserted #include "arm_neon.h" into gemm ukernel source file for
arm/neon. Thanks to Jean-Michel Hautbois for suggesting this fix.
commit 2041c264517b6c590fd4f7e8253e6911b622d1c3
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Apr 3 10:30:03 2014 -0500
Added barriers needed prior to doing scalar reset for rank-k updates.
commit 47a90e69dfde3f4f8fdf90654248a6b499fbadbc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 1 14:34:31 2014 -0500
Attempted to fix uninitialized variable warnings.
Details:
- Added initialization statements to various macros used in level 1m and
1m-like operations. I wasn't able to reproduce the reported behavior,
so hopefully this takes care of it. Thanks to Jeff Hammond for the
report.
commit d27b4f690c14b1f836f8c7a3c0e91e09d852f02e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 1 12:57:24 2014 -0500
Use generic paths for toolchain in POWER7.
Details:
- Fixed issue #4. Thanks to Jeff Hammond for contributing changes.
commit 1584ae1c83c3a8c1af76acb46404747507650f19
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Mar 28 15:15:48 2014 -0500
Fixed race condition involving scalar reset
commit 459dde4acc09e49380da58fb7b246db488884ad9
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Mar 27 17:06:45 2014 -0500
Made barrier after packing implicit.
This also fixed a bug where barriers in the blocked variants were inserted after the inner packing routines,
but not the outer packing routines.
This allowed, for instance, computation to begin before the block of B had finished being packed.
commit 9f78ec6e7e95fcad89a167b27cad7e2d74b6d122
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Mar 27 14:18:46 2014 -0500
Some fixes for the internal functions,
was inappropriately only having thread chief do some things.
commit a6fd48345424e097f71652be013aa897e098b41e
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date: Wed Mar 26 17:19:46 2014 +0000
Added test drivers for level 3 BLAS that run tests in parallel using MPI
commit 73b3db594864be0f9be9a0eb29bf961fa9c95f29
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date: Wed Mar 26 15:39:05 2014 +0000
Some fixes for the bgq configuration
commit f0824a04fc75e231c3a3d7757fa4e7294173282f
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 24 15:21:42 2014 -0500
Initial commit to enable threading in TRSM,
Also enabled weighted partitioning for herk, trmm
Fixed bug where multiple threads would try to modify the same state in the internal level 3 functions
Correctly computed a_next and b_next for gemm, herk macrokernels
a_next and b_next point to the current micropanels in trmm
commit 23d9eab354fbc88165889832955e126772bf8488
Merge: 5d5dc2ee fd3e32a5
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Mar 20 16:54:35 2014 -0500
Merge https://github.com/flame/blis
commit 5d5dc2eedef2f7c90d61371a1b457be5c06cf583
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Mar 20 16:43:36 2014 -0500
Parallelized trmm and trmm3
Also fixed bugs in packm
commit fd3e32a5f419fa412f46afe4dd1c3a26e15f3eb4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 20 13:59:48 2014 -0500
Refined INSERT_GENTFUNC macro usage.
Details:
- Defined new INSERT_GENTFUNC macros so that the macro always takes
exactly the number of arguments needed for the particular operation or
variant being defined. Many operations were using INSERT_GENTFUNC
macros that expected one auxiliary argument even though none were
needed. Those instances have now been updated. Most of these instances
were in the level-0 and -1v operations, as well as some operations
defined in frame/util.
commit 9b0e715f29338a1a1d6445907d2445c35f011121
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 19 15:47:54 2014 -0500
Minor simplifications to trmm, trsm macro-kernels.
Details:
- Simplified some code that would have allowed the diagonal of a trmm
or trsm triangular matrix to intersect the short end of a micro-panel.
This is disallowed via higher-level constraints on cache blocksizes, so
this code was never needed and only served to obfuscate.
- Updated some comments in trmm, trsm macro-kernels.
commit a3902750b9ab4923433f7e353f3669c3c419f8e4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 19 12:35:17 2014 -0500
Reorganized norm operations.
Details:
- Completely reorganized norm operations:
- Renames:
- fnormsc, fnormv, fnormm -> normfsc, normfv, normfm (2-norm)
- absumv -> norm1v (vector 1-norm)
- New operations:
- norm1m (matrix 1-norm)
- normiv, normim (infinity-norm)
- amaxv (BLAS-like absolute maximum value index)
- asumv (BLAS-like absolute sum)
- Deprecated absumm, as it did not correspond to any actual norm.
(However, an inlined version now exists in the testsuite module for
randm.)
commit c0140cb752f27e99742f85d23be2181c00a1335e
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Mar 19 11:21:16 2014 -0500
Fixed packm variants 3 and 4 where every thread was trying to manipulate the same state
Now just performed by the master thread.
commit fb42983bd9943711baa7d1c6496de1215bb816ef
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 18 16:37:28 2014 -0500
Fixed a barrier bug and a thread decorator bug
commit aa2405f8b23d0f8d2ec04790882f2176ef2e8fd8
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 18 15:23:09 2014 -0500
Fixing function pointer issues with thread decorator
commit ec8b88f93533942d3711191873310e7ff281bda6
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 18 14:35:37 2014 -0500
Enabled threading for packm blocked variants 3 and 4
commit 0ac534cdf657bbf04601abfe719ba2887aab5da7
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 18 13:26:27 2014 -0500
Added decorator for calling parallelized internal functions
Will allow for easy support for different threading models
commit 5296f58975f7d351f88909cc80b6d0cffd73def7
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 17 17:15:35 2014 -0500
Fixing some bugs with herk parallelization
commit c51d0110831eb89361b4720bf7ed75edbd26ebce
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 17 15:00:47 2014 -0500
Initial multithreading support for HERK
commit c720b141568d1f289146bf34ded08001f2c0dfbb
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 17 11:39:32 2014 -0500
Switched to using environment variables to control threading.
The environment variables all follow the format BLIS_X_NT,
where X is the index of the loop as described in our paper
Anatomy of High Performance Many-Threaded Matrix Multiplication.
These indices are IR, JR, IC, KC, and JC.
Also enabled parallelism for hemm and symm, but these are currently untested.
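A minimal sketch of how variables in the BLIS_X_NT format could be read. The helper names are hypothetical, and the fallback of one thread for unset or malformed values is an assumption of this sketch, not documented behavior:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical parser: returns the thread count in s, or 1 when s is
   NULL or is not a positive decimal integer. */
static long example_parse_nt( const char* s )
{
    if ( s == NULL ) return 1;

    char* end = NULL;
    long  v   = strtol( s, &end, 10 );

    if ( end == s || *end != '\0' || v < 1 ) return 1;

    return v;
}

/* Hypothetical lookup: loop is one of "IR", "JR", "IC", "KC", "JC",
   giving the variable names BLIS_IR_NT, BLIS_JR_NT, and so on. */
static long example_loop_nt( const char* loop )
{
    char name[ 32 ];

    snprintf( name, sizeof( name ), "BLIS_%s_NT", loop );

    return example_parse_nt( getenv( name ) );
}
```

From a shell, such a variable would be set before launching the application, for instance BLIS_JC_NT=2 to request two-way parallelism in the JC loop.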
commit 92233cf64274b27b2217c5cfffe75443ff6137a4
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 11 14:16:08 2014 -0500
Some fixes to gemm thread info tree creation,
Changed microkernel tests to use the new BLIS_PACKM_SINGLE_THREADED
instead of BLIS_SINGLE_THREADED
commit 020f80c30289d8bcaa688bf600b01fae9b23b54f
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 11 12:08:17 2014 -0500
Added files specific to threading for gemm and packm operations
commit 8d8f4352a41926bc923e47be836365b6b726aff2
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 10 15:47:28 2014 -0500
Added single threaded thread info data structures specifically for gemm and packm
commit 0e8677761175189583ca7d855e24b2bbdd2dada8
Merge: 2e727a02 b3bff631
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 10 15:16:21 2014 -0500
Merge branch 'master' of https://github.com/tlrmchlsmth/blis
commit 2e727a025a8f796d2b6bd14f489d0ee72e7d1fc7
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 10 15:14:33 2014 -0500
Modifying the thread info data structures
This change makes each operation have its own thread info type,
allowing more fine control of threading in operations that have different types of suboperations
commit a770590cf21a459f04bf941c58ee2afd272cc441
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 3 14:31:44 2014 -0600
Minor fixes to sumsqv, abmaxv.
Details:
- Minor update to bli_sumsqv_unb_var1() to bring it up-to-date with
LAPACK 3.5.0's zlassq.f, which, starting with 3.4.2, returns NaN when
the vector (or matrix) contains a NaN.
- Minor change to bli_abmaxv_unb_var1() to more closely mimic the
behavior of netlib BLAS's izamax(). There, a "less than or equal to"
operator is used in the search instead of "less than", which would
change the element index returned if there were multiple maximum values.
- Added macro function definitions for bli_isinf() and bli_isnan(), which
are currently implemented in terms of isinf() and isnan() from math.h.
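The effect of the comparison operator on ties can be sketched as follows. The two functions are hypothetical and use 0-based indices; the sketch does not claim which form bli_abmaxv_unb_var1() or netlib's izamax() contains, only that the choice decides whether the first or the last of several equal maxima is reported:

```c
#include <stddef.h>

static double example_abs( double v ) { return v < 0.0 ? -v : v; }

/* Replace the running maximum only on a strictly larger value, so ties
   resolve to the first occurrence. Assumes n >= 1. */
static size_t example_abmax_first( size_t n, const double* x )
{
    size_t best = 0;

    for ( size_t i = 1; i < n; ++i )
        if ( example_abs( x[i] ) > example_abs( x[best] ) ) best = i;

    return best;
}

/* Replace the running maximum on a larger or equal value, so ties
   resolve to the last occurrence. Assumes n >= 1. */
static size_t example_abmax_last( size_t n, const double* x )
{
    size_t best = 0;

    for ( size_t i = 1; i < n; ++i )
        if ( example_abs( x[i] ) >= example_abs( x[best] ) ) best = i;

    return best;
}
```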
commit b3bff631eadf98b15cb422fb4a8e2f855c23e8a7
Merge: 2c158fb8 e8757b03
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 27 16:53:24 2014 -0600
Merge https://github.com/flame/blis
commit 2c158fb885c27f7b599dc1e85b57edd684f19223
Merge: e4738c48 c2b2ab62
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 27 16:46:23 2014 -0600
Merge https://github.com/flame/blis
Conflicts:
frame/1m/packm/bli_packm_blk_var1.c
commit e8757b03a74f9891632242e9a90efb32150826f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 27 16:40:07 2014 -0600
Use "%ld" as int format specifier in fprintm.
Details:
- Changed "%d" to "%ld" when printing integers via bli_fprintm().
- Meant to include this in previous commit.
commit c663ce3b5170fee7dfb5b528b650d70c8e932cac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 27 16:32:57 2014 -0600
Fixed various bugs when C99 complex is enabled.
Details:
- Fixed various bugs in packm_*_cxk(), the 4m/3m micro-kernels, and
elsewhere in the framework that were not yet set up to work properly
when BLIS_ENABLE_C99_COMPLEX is defined in bli_config.h
- Extensive changes to f2c-derived files in frame/compat/f2c to allow
C99 complex storage. Most of these changes center around accessing
real and imaginary components via bli_?real()/bli_?imag() accessor
macros, and setting of values via bli_?sets() assignment macros.
(Thanks to Vladimir Sukarev for pointing out that _ENABLE_C99_COMPLEX
was broken.)
commit e4738c48e00b89391d9baa1fd0aa62d1ea2f95e6
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 27 16:29:46 2014 -0600
Added support for parallelism in gemm micro-kernel
commit bfe214b633765ed40b57b330fbb84c332663aa40
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 27 15:53:10 2014 -0600
Fixed bug with parallel packing, and bug with allocating an array of thread infos
In packm variant 1, the variable p_begin was incremented each iteration, causing a dependency.
This dependency was removed, allowing each iteration to be executed in parallel.
Somewhere in bli_threading.c, I was allocating an array of pointers instead of an array of structs.
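The p_begin dependency can be illustrated with a small sketch. The function names and the idea of recording one offset per panel are hypothetical simplifications, not the packm variant itself:

```c
#include <stddef.h>

/* Loop-carried form: each iteration needs the value of p_begin left
   behind by the previous iteration, so iterations cannot be handed to
   different threads as written. */
static void example_offsets_serial( size_t n_panels, size_t panel_stride,
                                    size_t* offsets )
{
    size_t p_begin = 0;

    for ( size_t i = 0; i < n_panels; ++i )
    {
        offsets[ i ] = p_begin;
        p_begin += panel_stride;
    }
}

/* Independent form: the offset is computed from the loop index alone,
   so any iteration can run on any thread in any order. */
static void example_offsets_independent( size_t n_panels, size_t panel_stride,
                                         size_t* offsets )
{
    for ( size_t i = 0; i < n_panels; ++i )
        offsets[ i ] = i * panel_stride;
}
```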
commit 6193d9ceea552e67170dba45abde04c64271c705
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 27 14:09:19 2014 -0600
Fixed bug in thread trees
commit ac5a2de1d17ffd460b00fee9757898525a09abae
Merge: 01b125e8 bd3c7ecf
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 27 11:59:33 2014 -0600
Merge branch 'master' of https://github.com/tlrmchlsmth/blis
commit 01b125e815f19410e8e0611d088b84570e499e93
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 27 11:55:45 2014 -0600
First pass at adding parallelism to BLIS.
Added a multithreading infrastructure that should be independent of multithreading implementation in the future.
Currently, gemm blocked variants 1f and 2f, and packm blocked variant 1 are parallelized.
commit c2b2ab62707e4174892aff3ce65f36f54878fae5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 26 12:46:45 2014 -0600
Deprecated panel stride alignment in bli_config.h.
Details:
- Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE from bli_config.h of all
configurations. It was already going unused in packm_init() since the
recent 4m/3m commit. This setting was rarely, if ever, useful, and its
existence only posed a potential risk for 4m/3m-based implementations.
- Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE usage from mem_pool_macro_defs.h.
- Updated comments regarding CONTIG_STRIDE_ALIGN_SIZE in template
micro-kernels.
commit f18aee83a5ac1b14808686fc3c5a3c846a1d99b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 25 17:58:42 2014 -0600
CHANGELOG update (for 0.1.1).
commit fde5f1fdece19881f50b142e8611b772a647e6d2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 25 13:34:56 2014 -0600
Added extensive support for configuration defaults.
Details:
- Standard names for reference kernels (levels-1v, -1f and 3) are now
macro constants. Examples:
BLIS_SAXPYV_KERNEL_REF
BLIS_DDOTXF_KERNEL_REF
BLIS_ZGEMM_UKERNEL_REF
- Developers no longer have to name all datatype instances of a kernel
with a common base name; [sdcz] datatype flavors of each kernel or
micro-kernel (level-1v, -1f, or 3) may now be named independently.
This means you can now, if you wish, encode the datatype-specific
register blocksizes in the name of the micro-kernel functions.
- Any datatype instance of any kernel (1v, 1f, or 3) that is left
undefined in bli_kernel.h will default to the corresponding reference
implementation. For example, if BLIS_DGEMM_UKERNEL is left undefined,
it will be defined to be BLIS_DGEMM_UKERNEL_REF.
- Developers no longer need to name level-1v/-1f kernels with multiple
datatype chars to match the number of types the kernel WOULD take in
a mixed type environment, as in bli_dddaxpyv_opt(). Now, one char is
sufficient, as in bli_daxpyv_opt().
- There is no longer a need to define an obj_t wrapper to go along with
your level-1v/-1f kernels. The framework now provides a _kernel()
function which serves as the obj_t wrapper for whatever kernels are
specified (or defaulted to) via bli_kernel.h
- Developers no longer need to prototype their kernels, and thus no
longer need to include any prototyping headers from within
bli_kernel.h. The framework now generates kernel prototypes, with the
proper type signature, based on the kernel names defined (or defaulted
to) via bli_kernel.h.
- If the complex datatype x (of [cz]) implementation of the gemm micro-
kernel is left undefined by bli_kernel.h, but its same-precision real
domain equivalent IS defined, BLIS will use a 4m-based implementation
for the datatype x implementations of all level-3 operations, using
only the real gemm micro-kernel.
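The defaulting behavior for an undefined kernel can be sketched with the preprocessor. The macro names BLIS_DGEMM_UKERNEL and BLIS_DGEMM_UKERNEL_REF come from the description above; the function names and their trivial bodies are hypothetical placeholders, and the real framework headers are not reproduced here:

```c
/* Hypothetical reference kernel; returns an id so the selection is
   observable in this sketch. */
static int example_dgemm_ukernel_ref( void ) { return 1; }

#define BLIS_DGEMM_UKERNEL_REF example_dgemm_ukernel_ref

/* A configuration's bli_kernel.h would define BLIS_DGEMM_UKERNEL at this
   point if it supplied an optimized kernel. This sketch leaves it
   undefined on purpose. */

/* Fall back to the reference implementation when nothing was chosen. */
#ifndef BLIS_DGEMM_UKERNEL
#define BLIS_DGEMM_UKERNEL BLIS_DGEMM_UKERNEL_REF
#endif

/* Hypothetical call site: uses whichever kernel the macro resolved to. */
static int example_call_dgemm_ukernel( void )
{
    return BLIS_DGEMM_UKERNEL();
}
```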
commit 15b51e990f1d21333b5f7af97c211756247336e5
Merge: 6363a9f6 fc04b5eb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 21 09:04:32 2014 -0600
Merge branch 'master' of github.com:fgvanzee/blis
commit fc04b5eb69868c341ce03f5ef1f02de4b8c121b0
Merge: b29e1c2b d1813c9d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 21 09:04:13 2014 -0600
Merge pull request #3 from figual/master
New ARM armv7a kernels and Assembly file consideration in Makefile
commit d1813c9dee34410833db5061e6588ec1a6c9ecd4
Author: Francisco Igual <figual@pandaboard.(none)>
Date: Fri Feb 21 15:14:31 2014 +0100
Added new armv7a micro-kernels and configuration files from Werner Saar.
commit 0cd098c03a000ed9426a7e9135190696da8cadbc
Author: Francisco Igual <figual@pandaboard.(none)>
Date: Fri Feb 21 15:12:30 2014 +0100
o Modified Makefile to consider .S assembly microkernels.
commit 6363a9f658257fe3d814a3dce5308f807adb54a2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 19 17:00:52 2014 -0600
Added level-3 support for complex via 4m-/3m.
Details:
- Added the ability to induce complex domain level-3 operations via new
virtual complex micro-kernels which are implemented via only real
domain micro-kernels. Two new implementations are provided: 4m and 3m.
4m implements complex matrix multiplication in terms of four real
matrix multiplications, whereas 3m uses only three and thus is
capable of even higher (than peak) performance. However, the 3m method
has somewhat weaker numerical properties, making it less desirable
in general.
- Further refined packing routines, which were recently revamped, and
added packing functionality for 4m and 3m.
- Some modifications to trmm and trsm macro-kernels to facilitate indexing
into micro-panels which were packed for 4m/3m virtual kernels.
- Added 4m and 3m interfaces for each level-3 operation.
- Various other minor changes to facilitate 4m/3m methods.
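The arithmetic behind the two methods named above can be sketched on scalars. This is only an illustration: mul_4m and mul_3m are hypothetical names, not BLIS interfaces, and in the actual methods each product below would presumably be a real-domain matrix multiplication performed by the real micro-kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only; these are not BLIS functions.
   Computes (ar + i*ai) * (br + i*bi) with four real multiplies. */
void mul_4m(double ar, double ai, double br, double bi,
            double *cr, double *ci)
{
    assert(cr != NULL && ci != NULL);
    *cr = ar * br - ai * bi;
    *ci = ar * bi + ai * br;
}

/* Same product with three real multiplies and extra additions. */
void mul_3m(double ar, double ai, double br, double bi,
            double *cr, double *ci)
{
    assert(cr != NULL && ci != NULL);
    double p1 = ar * br;
    double p2 = ai * bi;
    double p3 = (ar + ai) * (br + bi);
    *cr = p1 - p2;
    *ci = p3 - p1 - p2;   /* imaginary part recovered by subtraction */
}
```

The 3m variant trades one multiplication for several additions and subtractions. The commit's remark about weaker numerical properties may be related to that extra subtraction, though the commit itself does not elaborate.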
commit b29e1c2b278c177e104c84ba462820ee8296df6c
Merge: ee60377e bd3c7ecf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 14 14:11:54 2014 -0600
Merge pull request #2 from tlrmchlsmth/master
Fixes and improvements to xeon phi implementation.
commit bd3c7ecfb54a9b9851c7d364f41c21e4cff52f6f
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Feb 14 14:05:57 2014 -0600
Removing changes to input.general and input.operations
commit ce066863683cb4e910270cf8ab8e138b01ff3358
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Feb 14 13:40:24 2014 -0600
Fixed more Xeon Phi bugs, especially with scattered update
commit 31134b5c7076423aee1b4f494e925f27171d97e6
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Feb 14 11:19:44 2014 -0600
Some fixes, changes, and improvements to the microkernel for the Xeon Phi
commit ee60377e467862b9d8a7205c45dce5cf66c78c46
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 13 14:03:31 2014 -0600
Shifted some fields in info_t.
Details:
- Shifted the pack order, pack buffer type, and structure type fields
to make room for an extra bit in the pack type/status field.
commit bd3ab1ad4cf42f8bc30ab262acf8eccb49bb1a08
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 13 09:29:55 2014 -0600
Minor fixes to trsm consistent with prev on trmm.
Details:
- Removed use of bli_min() and bli_max() that were only being used to
try to support situations where the diagonal would intersect the
short end of some micro-panels, which is a situation that is disallowed
at a higher level by various constraints on the register and cache
blocksize. This only affected trsm_ll and trsm_lu.
- Use panel stride as passed into the macro-kernel rather than compute
it via k and PACKMR/PACKNR. This affects all macro-kernels of trsm.
commit 6260b0b5f8bd248f3f66e5a1c6854bdbd9d02ad0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 13 09:19:56 2014 -0600
Fixed obscure bug in trmm_ll, trmm_lu.
Details:
- Fixed an obscure bug in left-hand trmm that would only manifest when
non-zero register blocksize extensions (PACKMR > MR or PACKNR > NR)
are used.
- Removed use of bli_min() and bli_max() that were only being used to
try to support situations where the diagonal would intersect the
short end of some micro-panels, which is a situation that is disallowed
at a higher level by various constraints on the register and cache
blocksize. This only affected trmm_ll and trmm_lu.
- Use panel stride as passed into the macro-kernel rather than compute
it via k and PACKMR/PACKNR. This affects all macro-kernels of trmm.
commit 16915c1c1e55c660bf82141cdadf7c0860d5b464
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 11 10:54:19 2014 -0600
Fixed an obscure bug in packm_cxk().
Details:
- Fixed a bug in packm_cxk() whereby the packm ukernel was being chosen
from ldp, which is always equal to PACKMR or PACKNR. The problem with
this is that the pack ukernels were implicitly assuming that the
panel dimension of the panel being packed was equal to ldp, which
is not the case when the register blocksize extensions are non-zero
(ie: when PACKMR > MR or PACKNR > NR, whichever is applicable). This
problem has been fixed by passing ldp into the pack ukernels, which
now walk through the packed micro-panel region by incrementing by this
value, rather than incrementing by the inherent panel dimension value
assumed by each packm ukernel (e.g. 4 in the case of packm_ref_4xk).
- Also fixed a very minor edge case inefficiency whereby pack ukernels
smaller than the default were not being used in edge cases, and instead
those situations were being handled by scal2m. This is related to the
issue above, because the pack ukernel itself was being chosen based on
ldp instead of the panel dimension.
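The distinction between the panel dimension and ldp described above can be sketched as follows. This is a minimal sketch under stated assumptions: pack_panel is a hypothetical name and signature, not the BLIS packm ukernel interface, and zero-filling the extension region is a choice made for this sketch rather than something the commit states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the BLIS packm ukernel interface.
   Copies a panel_dim-by-k panel of a (unit row stride, column stride
   lda) into p, stepping through p by ldp per column instead of by
   panel_dim. */
void pack_panel(size_t panel_dim, size_t k,
                const double *a, size_t lda,
                double *p, size_t ldp)
{
    assert(ldp >= panel_dim);
    for (size_t j = 0; j < k; ++j) {
        for (size_t i = 0; i < panel_dim; ++i)
            p[j * ldp + i] = a[j * lda + i];
        /* Assumption of this sketch: zero the extension region. */
        for (size_t i = panel_dim; i < ldp; ++i)
            p[j * ldp + i] = 0.0;
    }
}
```

When ldp equals panel_dim the two stepping schemes coincide, which would explain why the bug only showed up with non-zero register blocksize extensions.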
commit b7da57b282c5a5e2208946e60309d2352f55351d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 11 10:28:23 2014 -0600
Updated calls to packm_blk_var2() in testsuite.
Details:
- In ukernel testsuite modules, replaced calls to packm_blk_var2() with
_var1(). Meant to include this in previous commit.
commit c255a293e25b2223c88e8800267cd06ad2a90041
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 10 14:31:24 2014 -0600
Consolidated packm_blk_var2 and var3.
Details:
- Consolidated the functionality previously supported by packm_blk_var2()
and packm_blk_var3() into a new variant, packm_blk_var1().
- Updates to packm_gen_cxk(), packm_herm_cxk(), and packm_tri_cxk()
to accommodate above changes.
- Removed packm_blk_var3() and retired packm_blk_var2() to
frame/1m/packm/old.
- Updated all level-3 _cntl_init() functions so that the new, more
versatile packm_blk_var1 is used for all level-3 matrix packing.
commit 32d8f264ae7b28155f5d7b21dcc5ecb78da2e0ab
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Feb 9 10:07:37 2014 -0600
Refactored packm variants.
Details:
- Revised packm_blk_var2() and _var3() by encapsulating the general,
hermitian/symmetric, and triangular panel-packing subproblems into
separate functions: packm_gen_cxk(), packm_herm_cxk(), and
packm_tri_cxk(), respectively. Also, homogenized the packm code as
well as the new specialized packm_*_cxk() code to further improve
readability.
commit 6c8067028707947fcdf4f856a272e15bb9ed91e3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 7 11:27:15 2014 -0600
Renamed enumerated type in testsuite and modules.
Details:
- Renamed the test suite's "mt_impl_t" enumerated type to "iface_t", and
renamed all corresponding "impl" variables to "iface".
commit 6c12598b1bc567f0b08f58aebdc753a1c1390378
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 6 18:26:35 2014 -0600
Employ simpler INSERT_ macro for ref ukernels.
Details:
- Defined a new macro, INSERT_GENTFUNC_BASIC0, which takes only one
argument--the base name of the function--and employed this macro
in the reference micro-kernel files instead of the _BASIC macro,
which takes one auxiliary argument. That argument was not being
used and probably just acted to unnecessarily obfuscate.
commit 32cae66326b68706d0e695cfd60c9ca5bc32c534
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 6 18:06:42 2014 -0600
Fixed some instances of sloppy 'restrict' usage.
Details:
- Fixed some technical incorrectness with some usage of the 'restrict'
keyword in the reference trsm micro-kernels.
- Tweak to testsuite/Makefile that causes rebuild if libblis was
touched.
commit 7aceef7683e2a2aff3c7ec2a73508036af2e19e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 6 17:31:19 2014 -0600
Updated comments in macro-kernels.
Details:
- Updated (and fixed some errors in) the "Assumptions/assertions" comment
section of macro-kernels.
- Changed register blocksizes of reference configuration to MR = 8 and
NR = 4. It's always good for MR != NR in the reference configuration
since it may help uncover bugs related to non-square micro-kernels.
commit 8fd292aa78950bcdf556605718f09d13f9575abc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 6 14:32:21 2014 -0600
Pass panel dimensions into macro-kernels.
Details:
- Modified the interfaces to the datatype-specific macro-kernels so that:
- pd_a and pd_b are passed in (which contain the panel dimensions of
packed panels of a and b).
- rs_a and cs_b are no longer passed in (they were guaranteed to be 1).
- Modified implementations of datatype-specific macro-kernels so pd_a,
pd_b, cs_a, and rs_b are used instead of cpp macros for MR, NR, PACKMR,
and PACKNR, respectively.
- Declare temporary c matrices (ct) as being maxmr-by-maxnr, which for now
is equivalent to being mr-by-nr. maxmr and maxnr are declared in a new
header file bli_kernel_post_macro_defs.h.
commit 3404e6657eabb017cd1580a2f1dd8e6fb13df923
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 5 11:19:10 2014 -0600
Deprecated incremental blocksize macro const defs.
Details:
- Removed macro constant definitions related to incremental blocksizes
from all configurations' bli_kernel.h files. This change is minor and
is mostly a cleanup related to a previous commit.
commit 1e9afd39a63e0a58167d4439c1a0a880a4a35657
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 4 20:15:19 2014 -0600
Comment updates (removed vestiges of "bd").
commit 5cf58f7c2d5bc0d2d94d9576f7158d8f133b7aac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 4 09:15:19 2014 -0600
Added early returns for "object is zeros" case.
Details:
- Added some logic to packm_init(), pack_int() and gemm_int() so that
(a) objects marked as BLIS_ZEROS are not packed, and (b) those
objects are not computed with. This functionality is not currently
needed by any existing implementations, but may be used in the
future.
commit 6bbd4be769a9b344a55abe5ddaca1a99fd29f7b4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 3 13:15:25 2014 -0600
Added 'f' on some gemm and trmm blocked variants.
Details:
- Added 'f' to some block variant files/functions to be consistent with
other file/functions' naming convention. Here, the f indicates
partitioning in the "forward" direction.
commit eb13cb2c6b182df5e2a9b88c76f50e2cee25b9e0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 3 11:07:01 2014 -0600
Removed redundant non-gemm blksz_t creation.
Details:
- Removed code that creates duplicate blksz_t objects for herk, trmm,
and trsm. Instead, the gemm blksz_t objects are accessed via extern
and used directly. This reduces the amount of code associated with
each of the three _cntl_init() and _cntl_finalize() functions.
commit 0a023a7d9e58e53b8c204a5f49aa8ca9afeba938
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 29 14:02:08 2014 -0600
Introduced new level-3 front-end layer.
Details:
- Added new _front() functions for each level-3 operation. This is done
so that the choosing of the control tree (and *only* the choosing of
the control tree) happens in what was previously the "front end"
(e.g. bli_gemm()). That control tree is then passed into the _front()
function, which then performs up-front tasks such as parameter
checking.
commit 251c5d112196d37b183e554bc9d406104aed65fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 28 19:40:29 2014 -0600
Removed redundant hemm, her2k control trees.
Details:
- Removed code that generated a control tree specifically for hemm and
symm. Instead, the gemm control tree is now configured so that it
works for gemm, hemm, or symm.
- Retired most her2k code, as it was not being used. (Currently, her2k is
implemented as two invocations of herk.) I couldn't think of many
situations where her2k variants were needed.
- Removed some older her2k code.
commit 5a36e5bf2f59d1e85d6dbce32a07d604c5e82d11
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 27 11:13:00 2014 -0600
Embed func_t microkernel objects in control trees.
Details:
- Modified all control tree node definitions to include a new field of
type func_t*, which is similar to a blksz_t except that it contains
one function pointer (each typed simply as void*) for each datatype.
We use the func_t* to embed pointers to the micro-kernels to use for
the leaf-level nodes of each control tree. This change is a natural
extension of control trees and will allow more flexibility in the
future.
- Modified all macro-kernel wrappers to obtain the micro-kernel pointers
from the incoming (previously ignored) control tree node and then pass
the queried pointer into the datatype-specific macro-kernel code, which
then casts the pointer to the appropriate type (new typedefs residing
in bli_kernel_type_defs.h) and then uses the pointer to call the micro-
kernel. Thus, the micro-kernel function is no longer "hard-coded" (that
is, determined when the datatype-specific macro-kernel functions are
instantiated by the C preprocessor).
- Added macros to bli_kernel_macro_defs.h that build datatype-specific
base names if they do not exist already, and then use those to build
datatype-specific micro-kernel function names. This will allow
developers extra flexibility if they wanted to, for example, name each
of their datatype-specific micro-kernels differently (e.g. double
real might be named bli_dgemm_opt_4x4() while double complex might be
named bli_zgemm_opt_2x2()).
- Inserted appropriate code into _cntl_init() functions that allocates
and initializes a func_t object for the corresponding micro-kernels.
The gemm ukernel func_t object is created once, in bli_gemm_cntl_init(),
and then reused via extern wherever possible.
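The run-time dispatch described above might look roughly like the sketch below. All names here (generic_fp, ukr_table, toy_dgemm_ukr, toy_macro_kernel) are hypothetical and are not the BLIS definitions. The commit says each pointer is typed as void*; this sketch substitutes a generic function pointer type, because ISO C only guarantees conversions between function pointer types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the BLIS func_t. */
typedef void (*generic_fp)(void);

enum { DT_FLOAT = 0, DT_DOUBLE = 1, DT_COUNT = 2 };

/* One function pointer slot per datatype. */
typedef struct { generic_fp ptr[DT_COUNT]; } ukr_table;

typedef void (*dgemm_ukr_fp)(size_t k, const double *a,
                             const double *b, double *c);

/* Toy 1x1 "micro-kernel": c += sum over p of a[p] * b[p]. */
void toy_dgemm_ukr(size_t k, const double *a, const double *b, double *c)
{
    for (size_t p = 0; p < k; ++p)
        *c += a[p] * b[p];
}

/* Toy "macro-kernel": queries the table, casts the pointer to the
   datatype-specific type, then calls through it. */
void toy_macro_kernel(const ukr_table *t, size_t k,
                      const double *a, const double *b, double *c)
{
    assert(t != NULL && t->ptr[DT_DOUBLE] != NULL);
    dgemm_ukr_fp ukr = (dgemm_ukr_fp)t->ptr[DT_DOUBLE];
    ukr(k, a, b, c);
}
```

Because the kernel is looked up at run time, a different micro-kernel can be swapped in by changing the table entry, without re-instantiating the macro-kernel.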
commit 6cbd6f1c7f1915180aa28939833afde48665c5ae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 24 10:38:29 2014 -0600
Removed commented mixed domain macro-kernel code.
Details:
- Removed commented-out code from macro-kernels that was supposed to
facilitate implementing mixed domain (complex times real) matrix
multiplication. This functionality is (probably) still possible,
but I'm getting tired of looking at the code every time I edit
a macro-kernel. Plus, there are probably ways of doing it at a
higher level, via control trees.
commit 29778be1119f1a884330d7f8dc424a2df4101d58
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 22 16:03:11 2014 -0600
Removed b_aux field from cntl nodes.
Details:
- Removed b_aux field from all control tree node definitions. This field
was being used in certain optimizations (incremental blocking) that were
not actually being employed within BLIS, and are probably not employed
by others.
- Updated all _cntl_obj_create() function definitions and invocations
according to above change.
- Retired bli_gemm_blk_var4.c, which was one such function that employed
incremental blocking, but which was never called by BLIS itself.
commit 06ac727a42ec9e832c7832745036702014638f99
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 15 16:44:52 2014 -0600
Updated some comments in level-3 front ends.
commit d628bf1da1560f1f5126a1ddfed8714f0a4b8da3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 15 11:40:12 2014 -0600
Consolidated pack_t enums; retired VECTOR value.
Details:
- Changed the pack_t enumerations so that BLIS_PACKED_VECTOR no longer has
its own value, and instead simply aliases to BLIS_PACKED_UNSPEC. This
makes room in the three pack_t bits of the info field of obj_t so that
two values are now unused, and may be used for other future purposes.
- Updated sloppy terminology usage in comments in level-2 front-ends.
(Replaced "is contiguous" with more accurate "has unit stride".)
commit ddc8c1c379b4787be5954802906593d7ea144452
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 13 14:55:43 2014 -0600
Suppress warning in Makefile (UNINSTALL_LIBS).
Details:
- Redirect errors to /dev/null when using 'find' to locate libraries that
would be uninstalled upon executing "make uninstall-old". Before, if the
Makefile was read before $(INSTALL_PREFIX)/lib existed, a "No such file
or directory" message was emitted. This message was harmless, but is now
suppressed in this situation.
commit f8f67d7251bffc05020e20527c100c8115fd5e55
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 10 09:06:11 2014 -0600
Typecast bli_getopt() return value in testsuite.
Details:
- In the test suite driver, inserted an explicit typecast of the return
value of bli_getopt() prior to parsing. The lack of typecast caused a
problem on at least one system whereby a return value of -1 was
interpreted as a garbage character. Thanks to Francisco Igual for finding
and submitting this fix.
commit e7f154fe2ed3e10e2323cefe5d25c2c23ac902c4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 10 08:48:07 2014 -0600
Applied edge case fix to arm/neon microkernel.
Details:
- Applied an edge case bugfix, courtesy of Francisco Igual, to the current
double precision real gemm microkernel in kernels/arm/neon/3.
commit 89c76a8a51d070d263c13bfa5ace65769509f2b4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jan 9 12:08:37 2014 -0600
Allow building outside source distribution.
Details:
- Modified build system (mostly configure and top-level Makefile) so that
a user can build a BLIS library outside of the top-level directory of
the source distribution.
- Added "test" target to Makefile so that the user can run "make test",
which will compile, link, and run the testsuite binary. This works even
if the build directory is externally located, thanks to the test suite
binary's new -g and -o command-line options. Also, when creating the
test suite via the top-level Makefile, the linking is against the
local archive, in lib/<configname>, rather than at <install_prefix>/lib.
- Modified testsuite/Makefile so that it links against the library built
locally, in ../lib/<configname>.
- Added "-lm" to LDFLAGS of most configurations' make_defs.mk.
- Various other cleanups to build system.
commit 12fa82ec12cc340ab28552997d9d50f7c98691f8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 8 16:09:26 2014 -0600
Implemented bli_getopt().
Details:
- Added bli_getopt.c and .h files to frame/base. These files implement
a custom version of getopt(), which may be used to parse command line
options passed into a program via argc/argv. I am implementing this
function myself, as opposed to using the version available via unistd.h,
for portability reasons, as the only requirement is string.h (which
is available via the standard C library).
- Modified test suite to allow the user to specify the file name (and/or
path) to the parameters and operations input files: -g may be used to
specify the general input file and -o to specify the operations input
file. If -g or -o or both are not given, default filenames are assumed
(as well as their existence in the current directory).
commit cafb58e86ea5cfb21b9eedc57ca8ebbf24252098
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 6 13:28:36 2014 -0600
Updated template micro-kernels to use auxinfo_t.
Details:
- Updated template micro-kernel implementations (located in
config/template/kernels), to adhere to the new auxinfo_t interface.
Meant to include this change in a0331fb1.
- Changed template configuration to use 64-bit integers (for both BLIS
and the BLAS compatibility layer).
commit 9ab126b499c3805045020cb89a8a5848e28d3bf5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 6 12:13:26 2014 -0600
Removed error checks in netlib->BLIS param mapping
Details:
- Disabled error checking in netlib-to-BLIS parameter mapping functions.
If the char value input to these functions was not one of the defined
values, bli_check_error_code() with the appropriate error code value
would be called, resulting in an abort(). This was unnecessary and
redundant since these routines are currently only used within the
BLAS compatibility layer, and they are only called AFTER parameter
checking has already been performed on the original BLAS char values.
If the application tried to override xerbla() to prevent an abort()
from being called, this error checking would still get in the way.
Thus, instead of reporting the error situation to the framework (ie:
calling abort()), an arbitrary BLIS parameter value is now chosen and
the function returns normally. Thanks to Jeff Hammond for finding and
reporting this issue.
commit 2cb13600f9f9601c60e7f96f4ca159d169ade9cb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 3 12:29:13 2014 -0600
Updated year in copyright headers to 2014.
commit 290fa54e0083c9c837188b8321b13b1b282e7b0c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 20 14:10:26 2013 -0600
Store variable panel strides in trmm/trsm auxinfo.
Details:
- Changed the value being stored into the auxinfo_t structure in trmm
and trsm macro-kernels. Whereas before we stored whatever value was
provided to the macro-kernel implementation via ps_a/ps_b, now we
store the stride that will advance to the next variable-length
micro-panel of the triangular matrix A (left) or B (right).
- Whitespace changes to the files affected above.
commit e3a6c7e77667fd749248df3f75f880266c3136ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 19 16:29:31 2013 -0600
Macroized conditionals for a2/b2 in macro-kernels.
Details:
- Replaced conditional expressions in macro-kernels related to computing
the addresses a2 and b2 (a_next and b_next) with a preprocessor macro
invocation, bli_is_last_iter(), that tests the same condition.
- Updated gemm_ukr module to use auxinfo_t argument.
- Whitespace changes in test suite ukr modules.
commit a0331fb10a50393e31d16339053b75b944132da1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 19 14:50:11 2013 -0600
Introduced auxinfo_t argument to micro-kernels.
Details:
- Removed a_next and b_next arguments to micro-kernels and replaced them
with a pointer to a new datatype, auxinfo_t, which is simply a struct
that holds a_next and b_next. The struct may hold other auxiliary
information that may be useful to a micro-kernel, such as micro-panel
stride. Micro-kernels may access struct fields via accessor macros
defined in bli_auxinfo_macro_defs.h.
- Updated all instances of micro-kernel definitions, micro-kernel calls,
as well as macro-kernels (for declaring and initializing the structs)
according to above change.
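A rough sketch of the interface change described above follows. The struct layout, the macro names, and toy_ukr are assumptions made for illustration; the actual accessor macros are the ones in bli_auxinfo_macro_defs.h and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout and macro names, not the BLIS definitions. */
typedef struct {
    const void *a_next;  /* address of the next micro-panel of A */
    const void *b_next;  /* address of the next micro-panel of B */
} aux_sketch_t;

#define aux_set_next_a(aux, p) ((aux)->a_next = (p))
#define aux_set_next_b(aux, p) ((aux)->b_next = (p))
#define aux_next_a(aux)        ((aux)->a_next)
#define aux_next_b(aux)        ((aux)->b_next)

/* Toy micro-kernel: a single struct pointer replaces the separate
   a_next and b_next arguments. A real kernel might use these
   addresses, for example as prefetch hints. */
double toy_ukr(size_t k, const double *a, const double *b,
               const aux_sketch_t *aux)
{
    assert(aux != NULL);
    (void)aux_next_a(aux);
    (void)aux_next_b(aux);
    double acc = 0.0;
    for (size_t p = 0; p < k; ++p)
        acc += a[p] * b[p];
    return acc;
}
```

Adding another field to the struct later would not change the micro-kernel signature. That matches the commit's note that the struct may hold other auxiliary information.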
commit 392428dea4001fe4384efe29f6cde32f8abeeb35
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 12 19:01:47 2013 -0600
Added "ri" scalar macros.
Details:
- Added set of basic scalar macros that take arguments' real and
imaginary components separately, named like the previous set except
with the "ris" (instead of "s") suffix.
- Redefined the previous set of scalar macros (those that take arguments
"whole") in terms of the new "ri" set.
- Renamed setris and getris macros to sets and gets.
- Renamed setimag0 macros to seti0s.
- Use bli_?1 macro instead of a local constant in bla_trmv.c, bla_trsv.c.
commit f60c8adc2f61eaba06b892f4e73000159de93056
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 10 14:39:56 2013 -0600
Minor updates to dunnington configuration.
Details:
- Added commented alternatives to dunnington configuration's bli_kernel.h.
- Minor reformatting of optimization flag variables in make_defs.mk.
commit 4ef20150492db254b5baf2368add62e19b0ac11b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 9 18:53:03 2013 -0600
Tweaks to dunnington configuration (x86_64/core2).
Details:
- Updated BLIS_DEFAULT_KC_D from 256 to 384.
- Enabled cache blocksize extension of up to 25% for MC and KC (for
double-precision real).
commit 5ad2ce7bf5ba3ea955e6d517bfd270e02820263b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 9 18:30:49 2013 -0600
Minor x86_64 (core2) kernel fixes.
Details:
- Fixed copy-and-paste bug whereby [scz]gemmtrsm_u_opt_d4x4 kernels
for x86_64/core2 were calling the wrong reference code (l instead
of u).
- Fixed some unused variables in x86_64/core2 dotaxpyv and dotxaxpyf
kernels.
- Minor typecasting fix in testsuite/src/test_libblis.c.
- Makefile updates.
commit d289f5d3a9c0e1a68a17c1c32b736e282a289c4c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 5 10:56:13 2013 -0600
Whitespace changes to level-2 blocked variants.
Details:
- Joined some lines in level-2 blocked variants to match formatting used
in level-3 blocked variants.
- Streamlined implementation of bli_obj_equals() in bli_query.c.
commit b444489f100d218bc8ef29b01ff8489c358559f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 3 16:08:30 2013 -0600
Added new "attached" scalar representation.
Details:
- Added infrastructure to support a new scalar representation, whereby
every object contains an internal scalar that defaults to 1.0. This
facilitates passing scalars around without having to house them in
separate objects. These "attached" scalars are stored in the internal
atom_t field of the obj_t struct, and are always stored to be the same
datatype as the object to which they are attached. Level-3 variants no
longer take scalar arguments, however, level-3 internal back-ends still
do; this is so that the calling function can perform subproblems such
as C := C - alpha * A * B on-the-fly without needing to change either
of the scalars attached to A or B.
- Removed scalar argument from packm_int().
- Observe and apply attached scalars in scalm_int(), and removed scalar
from interface of scalm_unb_var1().
- Renamed the following functions (and corresponding invocations):
bli_obj_init_scalar_copy_of()
-> bli_obj_scalar_init_detached_copy_of()
bli_obj_init_scalar() -> bli_obj_scalar_init_detached()
bli_obj_create_scalar_with_attached_buffer()
-> bli_obj_create_1x1_with_attached_buffer()
bli_obj_scalar_equals() -> bli_obj_equals()
- Defined new functions:
bli_obj_scalar_detach()
bli_obj_scalar_attach()
bli_obj_scalar_apply_scalar()
bli_obj_scalar_reset()
bli_obj_scalar_has_nonzero_imag()
bli_obj_scalar_equals()
- Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c.
- Renamed the following macros:
bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1()
bli_obj_is_scalar() -> bli_obj_is_1x1()
- Defined new macros to set and copy internal scalars between objects:
bli_obj_set_internal_scalar()
bli_obj_copy_internal_scalar()
- In level-3 internal back-ends, added conditional blocks where alpha and
beta are checked for non-unit-ness. Those values for alpha and beta are
applied to the scalars attached to aliases of A/B/C, as appropriate,
before being passed into the variant specified by the control tree.
- In level-3 blocked variants, pass BLIS_ONE into subproblems instead of
alpha and/or beta.
- In level-3 macro-kernels, changed how scalars are obtained. Now, scalars
attached to A and B are multiplied together to obtain alpha, while beta
is obtained directly from C.
- In level-3 front-ends, removed old function calls meant to provide
future support for mixed domain/precision. These can be added back later
once that functionality is given proper treatment. Also, removed the
creating of copy-casts of alpha and beta since typecasting of scalars
is now implicitly handled in the internal back-ends when alpha and
beta are applied to the attached scalars.
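The attached-scalar scheme described above can be sketched in a real-only, 1x1 setting. The type obj_sketch and the functions obj_make, obj_apply_scalar, and toy_gemm are hypothetical stand-ins, not the obj_t struct or the bli_obj_scalar_* functions named in the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, real-only stand-in for an object with an attached
   scalar. The buffer holds a 1x1 "matrix" for brevity. */
typedef struct {
    double *buf;
    double  scalar;  /* attached scalar, defaults to 1.0 */
} obj_sketch;

obj_sketch obj_make(double *buf)
{
    obj_sketch o = { buf, 1.0 };
    return o;
}

/* Fold an extra factor into the attached scalar. */
void obj_apply_scalar(obj_sketch *o, double s)
{
    assert(o != NULL);
    o->scalar *= s;
}

/* Toy macro-kernel computing C := beta * C + alpha * A * B, where
   alpha is the product of the scalars attached to A and B, and beta
   is the scalar attached to C. No scalar arguments are passed. */
void toy_gemm(const obj_sketch *a, const obj_sketch *b, obj_sketch *c)
{
    double alpha = a->scalar * b->scalar;
    double beta  = c->scalar;
    *c->buf = beta * (*c->buf) + alpha * (*a->buf) * (*b->buf);
}
```

In this sketch, a caller that wants C := C - A * B applies -1.0 to an alias of A and leaves the scalar attached to C at its default.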
commit 992de486d6f23e69a623abd15ae77d7881d13871
Merge: 9552e6ee fd4ac636
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 2 13:58:46 2013 -0600
Unimplemented kernels now call reference.
Details:
- Updated arm, bgq, loongson3a, and x86_64 kernels so that unimplemented
datatypes call the corresponding reference kernel. Previously, these
kernel functions called abort() with a "not yet implemented" error
message.
commit fd4ac636d9a55cec1476a444bd4e70def219dc8f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 2 13:50:36 2013 -0600
Unimplemented kernels now call reference.
Details:
- Updated micro-kernels for arm, bgq, loongson3a, and x86_64 so that
unimplemented kernel functions simply call the corresponding reference
implementation. (Previously, these unimplemented functions would
abort() with a "not yet implemented" message.)
commit 9552e6ee824d4345d5e908e869e071d19829819a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Nov 24 11:40:31 2013 -0600
Removed optional scaling from packm control tree.
Details:
- Removed does_scale field from packm control tree node and
bli_packm_cntl_obj_create() interface. Adjusted all invocations of
_cntl_obj_create() accordingly.
- Redefined/renamed macros that are used in aliasing so that now,
bli_obj_alias_to() does a full alias (shallow copy) while
bli_obj_alias_for_packing() does a partial alias that preserves the
pack_mem-related fields of the aliasing (destination) object.
- Removed bli_trmm3_cntl.c, .h after realizing that the trmm control tree
will work just fine for bli_trmm3().
- Removed some commented vestiges of the typecasting functionality needed
to support heterogeneous datatypes.
commit e65c476284db9ef64b23191a21c2584b1083342f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 19 10:05:35 2013 -0600
Minor updates to packm_blk_var2.c and _blk_var3.c.
Details:
- Comment updates to packm_blk_var2.c and packm_blk_var3.c.
- In packm_blk_var2(), call setm_unb_var1(), scal2m_unb_var1() directly
instead of setm(), scal2m().
commit 9e1d0d4bca48eda54301d8976f203e2544c9df3a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 18 18:11:07 2013 -0600
Added trsm_l, trsm_u ukernels for x86_64/core2.
Details:
- Added standalone trsm_l/trsm_u micro-kernels for x86_64 (core2).
These kernels are based on the gemmtrsm_l/gemmtrsm_u micro-kernels
that already existed in kernels/x86_64/core2-sse3/3.
commit 85e7e02ea3a9190b6fcff5d46b00d41c79cb1242
Merge: 67761e22 70720054
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 18 12:02:00 2013 -0600
Merge branch 'master'. Forgot to git-pull.
commit 67761e224c92500eecf9c1540cc72bdd2fb27679
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 18 11:57:40 2013 -0600
Attempting to fix errors in bgq build.
Details:
- Removed restrict declaration from b_cast and c_cast from
bli_trsm_lu_ker_var2.c and bli_trsm_rl_ker_var2.c. Curiously, they
are causing problems for xlc only in those two files and no other
macro-kernels.
- Fixed (hopefully) kernel function parameter type declarations in
kernels/bgq/1f/bli_axpyf_opt_var1.c and kernels/bgq/3/bli_gemm_8x8.c.
commit 707200541d344f98cf34c9801954dbb36fbe0447
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 18 11:17:31 2013 -0600
Syntax error fix in x86_64/core2 gemmtrsm_u ukr.
commit bbe2b84a49e7785d4d0c514cda34adfbe66478b0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 18 11:11:06 2013 -0600
Updated Makefile in test, testsuite.
Details:
- Updated Makefiles in test and testsuite directories to use the new
BLIS header installation directory scheme, which is to compile with
-I<PREFIX>/include/blis instead of -I<PREFIX>/include.
commit 9bd7fcfd436625ca2108128086671319362f4d92
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 18 10:58:09 2013 -0600
Outer-to-inner 'restrict' fix in macro-kernels.
Details:
- Fixed sloppy placement of 'restrict' pointer declarations in level-3
macro-kernels. Previously, all restricted pointers were being declared
at the outer-most function scope level. While this violates the C99
standard, very few of the compilers used with BLIS so far have seemed
to care. The lone exception has been IBM's xlc. Thanks to Tyler Smith
for identifying this bug (and suggesting the fix).
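One way to read the fix above is that restrict-qualified pointers move from function scope into the innermost block that uses them. The sketch below shows that pattern only; add_rows is a hypothetical function, not BLIS macro-kernel code, and it requires a C99 or later compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not BLIS code. Adds each row of b into the
   matching row of c (both m-by-n, row-major). The restrict-qualified
   pointers are declared inside the loop body, so each pair is scoped
   to a single iteration. The caller must ensure b and c do not
   overlap. */
void add_rows(size_t m, size_t n, const double *b, double *c)
{
    assert(b != NULL && c != NULL);
    for (size_t i = 0; i < m; ++i) {
        const double *restrict bi = b + i * n;
        double       *restrict ci = c + i * n;
        for (size_t j = 0; j < n; ++j)
            ci[j] += bi[j];
    }
}
```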
commit 50549a6a31dd26cf63a013e0ede16b2c7ce835b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Nov 17 18:31:27 2013 -0600
Changed header install directory to include/blis.
Details:
- Changed top-level Makefile so that headers are installed to
$(INSTALL_PREFIX)/include/blis/. (Header directories are no longer
named by version/configuration and then symlinked.)
- Added uninstall targets, including uninstall-old to clean out old
library archives.
- Added GREP makefile definitions to all configurations' make_defs.mk.
commit d70733abddfb9a95661897e1e4f3c1f3cfa7cbaa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 16 17:34:25 2013 -0600
Added ARM kernels, configurations.
Details:
- Added kernels for ARM, and configurations for Cortex-A9 and Cortex-A15.
Thanks to Francisco Igual for contributing these kernels and
configurations.
commit d37c2cff62089c86983c2f79762f4b5329037373
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 13 10:47:11 2013 -0600
Minor comment and Makefile changes.
Details:
- Added missing 'check-config' and 'check-make-defs' targets to
testsuite/Makefile.
- Removed unused 'test' target from top-level Makefile.
- Comment changes to testsuite input files.
commit 19885f893a17b91ee79bead0620d0f913392d4c5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 11 12:09:21 2013 -0600
Updated some kernel comment headers.
Details:
- Updated bgq and piledriver comment headers to use BLIS copyright header
instead of libflame.
commit 1a4d698f42981d74fe5f29b980031e1ee7dc42d5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 11 10:15:40 2013 -0600
CHANGELOG update (for 0.1.0).
commit 089048d5895a30221b6b1976c9be93ad6443420d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 9 17:18:00 2013 -0600
Added object wrappers to 1f test suite modules.
Details:
- Added missing object wrappers to level-1f test suite modules. This was
only apparent if you were configuring with something other than the
reference configuration.
- Commented out object-wrappers in level-1f front-ends. These were not
working as intended when the reference configuration was selected, because
most kernel sets, such as those in the template set, do not have object
wrappers.
- Whitespace changes to template micro-kernels.
- Comment changes to template level-1f kernel headers.
commit 9ef3752079de10124bed906b5d28479d04aa8187
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 8 17:20:47 2013 -0600
Updated template kernels wrt KernelsHowTo wiki.
Details:
- Merged latest state of KernelsHowTo wiki into template micro-kernels
located in config/template/kernels/3.
commit 376bbb59c8944e29c5c1ff6637920d8451370afa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 8 11:17:34 2013 -0600
Removed support for duplication.
Details:
- Removed support for duplication from the gemmtrsm/trsm micro-kernels
and all framework code.
- Updated test suite modules according to above changes.
commit 68a5910974b62b4df853fae2a68cb04df9d5a19c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 7 11:36:11 2013 -0600
Added comments to testsuite/input.operations.
Details:
- Added extensive comments to the top of testsuite/input.operations,
which describe how to edit the file.
- Removed input.operations.0 and input.operations.1.
- Changed input.general to test all datatypes ("sdcz") by default.
commit a98f78b715fb256a519870071bb5266130d70b21
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 6 15:32:47 2013 -0600
Changed dim_t and inc_t to be signed integers.
Details:
- Redefined dim_t and inc_t in terms of gint_t (instead of guint_t).
This will facilitate interoperability with Fortran in the future.
(Fortran does not support unsigned integers.)
- Redefined many instances of stride-related macros so that they return
or use the absolute value of the strides, rather than the raw strides
which may now be signed. Added new macros bli_is_row_stored_f() and
bli_is_col_stored_f(), which assume positive (forward-oriented) strides,
and changed the packm_blk_var[23] variants to use these macros instead
of the existing bli_is_row_stored(), bli_is_col_stored().
- Added/adjusted typecasting to various functions/macros, including
bli_obj_alloc_buffer(), bli_obj_buffer_at_off(), and various pointer-
related macros in bli_param_macro_defs.h.
- Redefined bli_convert_blas_incv() macro so that the BLAS compatibility
layer properly handles situations where vector increments are negative.
Thanks to Vladimir Sukharev for pointing out this issue.
- Changed type of increment parameters in bli_adjust_strides() from dim_t
to inc_t. Likewise in bli_check_matrix_strides().
- Defined bli_check_matrix_object(), which checks for negative strides.
- Redefined bli_check_scalar_object() and bli_check_vector_object() so
that they also check for negative stride.
- Added instances of bli_check_matrix_object() to various operations'
_check routines.
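A minimal sketch of the negative-increment handling, assuming the reference BLAS convention in which a vector of n elements with incx < 0 is traversed backward starting at offset (n-1)*|incx|. The names first_element(), sum_strided(), and the *_sketch typedefs are hypothetical; this is not the bli_convert_blas_incv() macro itself, only the pointer adjustment such a conversion has to make once increments are signed.

```c
#include <stddef.h>

typedef long dim_t_sketch;  /* stand-in for a signed dim_t */
typedef long inc_t_sketch;  /* stand-in for a signed inc_t */

/* Return the address of the first logical element of a BLAS-style vector. */
const double *first_element(dim_t_sketch n, const double *x, inc_t_sketch incx)
{
    if (incx < 0 && n > 0)
        return x + (n - 1) * (-incx);
    return x;
}

/* Sum the vector in logical order, indexing with the signed increment. */
double sum_strided(dim_t_sketch n, const double *x, inc_t_sketch incx)
{
    const double *p = first_element(n, x, incx);
    double sum = 0.0;

    for (dim_t_sketch i = 0; i < n; ++i)
        sum += p[i * incx];
    return sum;
}
```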
commit 1f8afc3e08a4312cfe810be86aedeacbc57275c5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 6 10:09:10 2013 -0600
Minor comment update to BLAS compat files.
commit 1abbf768afafc158d44e4d5c4a135cfd9e277f13
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 4 15:50:00 2013 -0600
Fixed bugs in scalv and setv.
Details:
- Fixed bugs similar to those addressed in cca1e1f51dc6, whereby
a segmentation fault may occur if beta is not the same type as
the vector operand for scalv and setv.
- Changed axpyv and scal2v front-ends in a similar fashion.
commit f5953259a1842ee48e5833c22ac86e68a337bfe1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 4 14:43:55 2013 -0600
Fixed a bug related to Hermitian matrix diagonals.
Details:
- Fixed a bug whereby BLIS assumed that the imaginary components of the
diagonal elements of Hermitian matrices were already zero. This property
is now enforced when the matrix is packed (bli_packm_blk_var2). Thanks
to Vladimir Sukharev for reporting this bug.
- Minor comment updates to template kernels.
commit d70f2b089dac8b9e4c19295dfa6014c36afee2ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 2 17:19:40 2013 -0500
Added scaling to abval2s, sqrt2s macros.
Details:
- Re-defined abval2s and sqrt2s macros to use scaling to avoid underflow
and overflow from squaring the real and imaginary components. (This is
the same technique used to fix recent bugs in invscals/invscaljs and
inverts.)
commit c5b1ed9409ae2f71d04041eef5da9a0080b5784a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 1 10:28:04 2013 -0500
Added new dotxaxpyf variant 2.
Details:
- Added a new variant for dotxaxpyf that is based on dotxf and axpyf
kernels. By default, this variant is not used by any other operation.
commit 97f89fbcf202d72fc440b614708e352ea31633e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 1 10:16:39 2013 -0500
Fixed bug in complex invscals.
Details:
- Fixed complex inversion in invscals and invscaljs whereby the
imaginary component was being computed incorrectly.
- Use bli_fmaxabs() instead of bli_fabs() when choosing the scalar
in inverts, invscals, and invscaljs.
- Changed bli_abs() and bli_fabs() macro definitions to use "<="
operator instead of "<".
commit eda42a21d17a2742eab69ab801ed530b82488c8a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 31 18:00:44 2013 -0500
Defined missing symbols in bla_rotg.c
Details:
- Defined local equivalents of libf2c's r_sign(), d_sign(), c_abs(), and
z_abs(), which are needed by bla_rotg.c. Also defined r_abs() and
d_abs() for completeness. Thanks to Vladimir Sukharev for reporting
these bugs.
commit cca1e1f51dc67a2c3725d5c1837256831aaf70f8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 30 14:39:01 2013 -0500
Fixed bugs in scalm and setm.
Details:
- Fixed bugs in scalm and setm that resulted in segmentation faults when
beta is not the same type as the matrix operand. Thanks to Vladimir
Sukharev for reporting this bug.
- Changed axpym and scal2m front-ends in fashion similar to that of scalm
and setm; namely, the alpha scalar is copy-cast to the type of the first
matrix operand.
- Changed the template and reference configurations' bli_config.h files
so that the number of memory allocator blocks of A and B are set based
on BLIS_MAX_NUM_THREADS.
- Comment updates to bli_obj.c and variable rename in bla_nrm2.c.
commit 2807013a4761c2b84b3944de64d23483ad7ef2fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 24 14:32:20 2013 -0500
Fixed over/under-flow in complex inversion.
Details:
- Fixed the complex bli_?inverts() macros, which were inverting elements
in an "unsafe" manner, such that very large and very small values were
unnecessarily over/under-flowing. Thanks to Vladimir Sukharev for
reporting this bug.
- Comment update to bli_sumsqv_unb_var1.c.
- Removed redundant bli_min() macro in bli_scalar_macro_defs.h.
- Changed 1.0F to 1.0 for bli_drands() macro.
commit 45a80c625f84edb2ade6ac25efe2b9c589d7e0df
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 23 12:15:25 2013 -0500
Fixed parameter checking issue in BLAS syr[2]k.
Details:
- Fixed a minor parameter checking bug in the BLAS compatibility layer
for [sd]syrk and [sd]syr2k. Specifically, if 'C' is passed in for the
trans parameter of either operation, it is (a) allowed, and (b) treated
as 'T' (whereas previously it was disallowed). Thanks to Vladimir
Sukharev for finding and reporting this bug.
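The corrected behavior can be sketched as below. The function names are hypothetical and this is not the compatibility layer's actual checking code; it only shows that, for real datatypes, 'C' is accepted and then handled as 'T', since conjugation has no effect on real values.

```c
#include <ctype.h>

/* Hypothetical check: nonzero if trans is acceptable for real syrk/syr2k. */
int syrk_trans_is_valid(char trans)
{
    char t = (char)toupper((unsigned char)trans);
    return t == 'N' || t == 'T' || t == 'C';
}

/* Hypothetical mapping: 'C' is treated as 'T' for real datatypes. */
char syrk_effective_trans(char trans)
{
    char t = (char)toupper((unsigned char)trans);
    return (t == 'C') ? 'T' : t;
}
```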
commit a091a219bda55e56817acd4930c2aa4472e53ba5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 14 10:11:29 2013 -0500
Minor fixes to piledriver configuration, ukernel.
Details:
- Applied a patch from Tyler that fixes minor staleness in the piledriver
configuration and gemm micro-kernel.
- Very minor changes to test suite input files.
commit dacdde27aee4fb90b14880136d7f20c6b234e2c6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 11 11:37:19 2013 -0500
Added Fran's Sandy Bridge kernels/configuration.
Details:
- Added a kernel directory for kernels developed by Francisco Igual for
the Sandy Bridge architecture, including a dgemm ukernel coded with
AVX intrinsics.
- Added a configuration for Sandy Bridge using values supplied by Fran.
commit 03106d650e4030d4c9831683448376f92fc52d41
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 11 10:40:38 2013 -0500
Fixed minor perf bug in gemm_ker_var2.
Details:
- Fixed a minor performance bug in bli_gemm_ker_var2.c (and the experimental
bli_gemm_ker_var5.c) whereby the addresses for a_next and b_next are not
computed correctly (i.e., do not wrap around) at the edge cases. Thanks to
Tze Meng for helping me identify this bug.
commit b053337387dbdef9035be03538222670a21707ca
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 10 18:26:55 2013 -0500
Added fusing factors, MR/NR to test suite output.
Details:
- Updated the test suite driver (and modules where appropriate) so that
the level-1f fusing factors are output along with the variable dimension.
While this is not strictly necessary, since the fusing factors are output
in the initial parameter summary, it allows extra reassurance to the user
since the fusing factors appear alongside the variable dimension, which
together give a complete picture of the problem size. Similar changes were
made for outputting the register blocksizes when reporting results for the
micro-kernel test modules.
commit be4833bd91c5a58d0bfc52daaadf7ba543a77acf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 10 14:20:06 2013 -0500
Added test suite modules for level-1f, 3 kernels.
Details:
- Added test modules in test suite for level-1f kernels and level-3
micro-kernels. (Duplication in the micro-kernels, for now, is NOT
supported by these test modules.)
- Added section override switches to test suite's input.operations file.
- Added obj_t APIs for level-1f front-ends and their unblocked variants to
facilitate the level-1f test modules. Also added front-end for dupl
operation.
- Added obj_t-based check routines for level-1f operations, which are
called from the new front-ends mentioned above.
- Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing
factors as a function of datatype, which is needed by their respective
test modules.
- Whitespace changes to bli_kernel.h of all existing configurations.
commit 680188d46bb15b9a1a2867638104939dc77ca2a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 10 13:23:37 2013 -0500
Cleaned up old test drivers.
Details:
- Minor updates to old test drivers in preparation for our participation
in ACM TOMS's replicated results initiative.
commit 3690bdd4f95769c935c410414112102cc3e108b1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 10 11:45:33 2013 -0500
More updates to level-1f kernels for core2-sse3.
Details:
- Changed types in function signatures to match new prototypes. Meant to
include this in previous commit.
commit 661d5120cd7071f9b0c5cefc95f99f1361370ade
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 10 11:27:27 2013 -0500
Fixed outdated fusing factor macros in 1f kernels.
Details:
- Updated level-1f kernels for x86_64 and bgq to use renamed fusing factor
macros. Meant to include this in 5e54f46c. Thanks to Fran for pointing
this out.
commit 73aa1e9f31d1b2a319c7e711ced6db3f9835c832
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 1 17:01:18 2013 -0500
Added section overrides to test suite.
Details:
- Added new lines of input to the test suite's input.operations file, which
allows the user to disable entire sections (levels) of tests. Before this
change, the user had to manually disable each operation test's "master
switch". (This is why input.operations.0 existed: to allow a more
convenient starting point for someone who only wanted to test one or a
few operations.)
commit 5e54f46ccb76beab892d530b693e07c6bf6db7cf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 30 12:58:18 2013 -0500
Added template implementations and other tweaks.
Details:
- Added a 'template' configuration, which contains stub implementations of the
level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
lots of in-file comments and documentation.
- Modified some variable/parameter names for some 1/1f operations. (e.g.
renaming vector length parameter from m to n.)
- Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
to bli_kernel.h.
- Modified test suite to print out fusing factors for axpyf, dotxf, and
dotxaxpyf, as well as the default fusing factor (which are all equal
in the reference and template implementations).
- Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
reference variants were implemented in terms of front-end routines rather
than directly in terms of the kernels. (For example, axpy2v was implemented
as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
- Changed the interface to dotxf so that it matches that of axpyf, in that
A is assumed to be m x b_n in both cases, and for dotxf A is actually used
as A^T.
- Minor variable naming and comment changes to reference micro-kernels in
frame/3/gemm/ukernels and frame/3/trsm/ukernels.
commit 97aaf220a847363b4da35935eca17790c0ef71f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 17 10:51:36 2013 -0500
Added new kernels, configurations.
Details:
- Added various micro-kernels for the following architectures:
Intel MIC
IBM BG/Q
IBM Power7
AMD Piledriver
Loongson 3A
and reorganized kernels directory. Thanks to Tyler Smith, Mike Kistler,
and Xianyi Zhang for contributing these kernels.
- Added configurations corresponding to above architectures, and renamed
"clarksville" configuration to "dunnington".
commit fe979c5a114c877506a5697cdab1fc8cf2bcd303
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 13 14:31:53 2013 -0500
Removed default configuration behavior.
Details:
- Changed the configure script so that it no longer defaults to the
reference configuration. This change is being made so that the
developer has a firm awareness of which configuration is being used
to configure BLIS. Thanks to Mike Kistler and Bryan Marker for this
suggested change.
commit da77e9614f54f92f703f01e3b9bd67a83280150c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 13 12:00:37 2013 -0500
Minor improvements to static memory allocator.
Details:
- Expanded on cpp macro definitions from bli_mem.c and relocated them to
a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded
functionality includes computing the pool size for each datatype (using
that datatype's cache blocksizes) and using the maximum to size the
actual pool array. This addresses the somewhat common pitfall whereby a
developer updates cache blocksizes in bli_kernel.h for only one datatype
(say, single-precision real), while the memory pools are sized using the
double-precision real values. Then, when the developer attempts to link
to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with
a message saying the static memory pool was exhausted. Clearly, this
message is misleading when the pool was not sized properly to begin with.
- Removed previously disabled code in bli_kernel_macro_defs.h that was
meant to check for size consistency among the various cache blocksizes.
(Obviously the memory pool size-based solution mentioned above is better.)
- Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a
reasonable place to put these constants, rather than further crowd up
bli_config.h.
- Updated testsuite driver to output memory pool sizes for A, B, and C.
- Minor comment updates to bli_config.h.
- Removed 'flame' configuration. It was beginning to get out-of-date, and
I hadn't used it in months. We can always re-create it later.
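The pool-sizing idea from the first item can be sketched with a few cpp macros. All names and blocksize values below are made up for illustration and do not come from bli_mem_pool_macro_defs.h: a size is computed per datatype from that datatype's cache blocksizes, and the maximum is used to size the static array.

```c
#include <stddef.h>

/* Hypothetical cache blocksizes per datatype (values are made up). */
#define MC_S 384
#define KC_S 256
#define MC_D 128
#define KC_D 256

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Bytes needed for one packed block of A, per datatype. */
#define POOL_A_SIZE_S ((size_t)MC_S * KC_S * sizeof(float))
#define POOL_A_SIZE_D ((size_t)MC_D * KC_D * sizeof(double))

/* Size the pool by the largest requirement, not by one chosen datatype. */
#define POOL_A_SIZE MAX2(POOL_A_SIZE_S, POOL_A_SIZE_D)

static char pool_a[POOL_A_SIZE];

/* Report the pool size, as a test driver might print it. */
size_t pool_a_size(void)
{
    return sizeof(pool_a);
}
```

With this arrangement, enlarging only the single-precision blocksizes still grows the pool, which is the pitfall the commit describes.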
commit 631f347b7a99cb02757c534fd3ec5f723a2fdb0e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 10 17:17:28 2013 -0500
Added ESSL and Accelerate targets to test drivers.
Details:
- Added ESSL and Accelerate (OS X) targets to standalone test drivers'
Makefile in "test" directory. Thanks to Jeff Hammond for suggesting
/ providing this patch.
commit 7ae4d7a41d13ef5f1ceee217c000a5cf77a11128
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 10 16:35:12 2013 -0500
Various changes to treatment of integers.
Details:
- Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be
assigned values of 32, 64, or some other value. The former two result in
defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter
causes integers to be defined in terms of a default type (e.g. long int).
- Updated bli_config.h in reference and clarksville configurations according
to above changes.
- Updated test drivers in test and testsuite to avoid type warnings associated
with format specifiers not matching the types of their arguments to printf()
and scanf().
- Inserted missing #include "bli_system.h" into blis.h (which was slated for
inclusion in d141f9eeb6d1).
- Added explicit typecasting of dim_t and inc_t to macros in
bli_blas_macro_defs.h (which are used in BLAS compatibility layer).
- Slight changes to CREDITS and INSTALL files.
- Slight tweaks to Windows build system, mostly in the form of switching to
Windows-style CRLF newlines for certain files.
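A sketch of how BLIS_INT_TYPE_SIZE could select the integer typedefs. The names BLIS_INT_TYPE_SIZE, gint_t, and guint_t come from the entry above; the exact preprocessor logic is an assumption, and the default of 64 is set here only so the fragment stands alone.

```c
#include <stdint.h>

/* Normally set in bli_config.h; defined here so the sketch is self-contained. */
#ifndef BLIS_INT_TYPE_SIZE
#define BLIS_INT_TYPE_SIZE 64
#endif

#if BLIS_INT_TYPE_SIZE == 32
typedef int32_t  gint_t;
typedef uint32_t guint_t;
#elif BLIS_INT_TYPE_SIZE == 64
typedef int64_t  gint_t;
typedef uint64_t guint_t;
#else
/* Any other value falls back to a default C type. */
typedef long int          gint_t;
typedef unsigned long int guint_t;
#endif
```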
commit 068437736b41d51a1f5ec47839f059bf58a20413
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 9 14:07:58 2013 -0500
Fixed set-but-not-used compiler (gcc) warnings.
Details:
- Used void-casts of certain variables to appease gcc (and perhaps other
compilers) when such variables are only used in the complex instances of
the functions. Special thanks to Karl Rupp for suggesting a portable fix
for these warnings.
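The void-cast idiom looks roughly like this. The function dot_real() is a hypothetical stand-in for the real-domain instance of a type-generic routine in which a variable is assigned but only read by the complex instances.

```c
/* Hypothetical real-domain instance of a type-generic function body. */
double dot_real(int n, const double *x, const double *y, int conjx)
{
    int    conjx_use;
    double rho = 0.0;

    conjx_use = conjx;   /* set here, but read only by the complex instances */
    (void)conjx_use;     /* counts as a use, so the compiler stays quiet */

    for (int i = 0; i < n; ++i)
        rho += x[i] * y[i];

    return rho;
}
```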
commit 6dc85f63dcd5282340c9e00d585e97d70a21edc3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 9 13:48:52 2013 -0500
Small fix to Windows defs.mk makefile fragment.
Details:
- Commented out a !include statement that was attempting to include a
version file that does not yet exist. For now, the version string is
hard-coded into defs.mk.
commit d141f9eeb6d1de7044b7429adf52d11c6fca620c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 9 13:09:16 2013 -0500
Added Windows build system.
Details:
- Added a 'windows' directory, which contains a Windows build system
similar to that of libflame's. Thanks to Martin for getting this up
and running.
- Spun off system header #includes into bli_system.h, which is included
in blis.h
- Added a Windows section to bli_clock.c (similar to libflame's).
commit 9b320e7406fb69e8b61a0085abe2ed89a96bdb68
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 9 11:04:46 2013 -0500
Edited bli_?lamch.c to avoid Windows keyword.
Details:
- Renamed "small" variable to "smnum" to avoid collision with Windows type
by the same name. This change is needed in advance of the upcoming Windows
build system.
commit 9013ad6ff2e9ace35e0cf44c32795c2f3d5be628
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 4 13:36:07 2013 -0500
Switched integer typedefs (again) to C types.
Details:
- Redefined gint_t and guint_t in terms of the standard C types long int
and unsigned long int, respectively.
- Changed testsuite default max problem size to 500.
- Changed testsuite input.operations to use square problems for level-3
operation tests.
commit 981a60cfa07abac2e93697dfe12b0f076ab00a38
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 4 12:09:11 2013 -0500
Falling back to 32-bit integers for dim_t, etc.
Details:
- In light of recent segfaulting issues when compiling on 32-bit systems,
I've changed the default typedef for gint_t and guint_t from int64_t and
uint64_t to int32_t and uint32_t, respectively.
- Disabled 64-bit integers in the blas2blis layer for the reference
configuration.
- Added type sizes of gint_t, guint_t, and the four floating-point datatypes
to introductory output of the testsuite.
commit b776ddcd4338b34f172ef78da0ac1d771a771ab4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 3 21:58:07 2013 -0500
Applied temp fix to typecasting bug in testsuite.
Details:
- Applied a temporary fix to the typecasting bug in the testsuite driver.
The fix involves casting both numerator and denominator to unsigned long.
This fix is more voodoo than science, as I can't be sure why it even
works.
commit 9ee6e125373869c4213c017ce772c38ecefba103
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 3 21:53:27 2013 -0500
Changed dimension spec for gemm in testsuite.
Details:
- Encountered a bizarre typecasting bug whereby the test suite was not
computing the proper dimension from the problem size and dimension
specification when the latter was set to -3. Will investigate.
Thanks to Fran for finding this "bug".
commit e8be081e68c385ab44d0fea8dade21d40c200b79
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 28 15:52:34 2013 -0500
Generalized matlab and file output in testsuite.
Details:
- Added a new option in input.general that allows outputting in
matlab/octave format so that one can output in matlab format
independently from outputting to files.
- Adjusted input.operations according to above.
- Added input.operations.0 and input.operations.1 with all options
disabled and enabled, respectively.
commit d352c746e5683037d41b5061dfb5ce08e1d0843b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 27 13:41:46 2013 -0500
Added single/real gemm micro-kernel for x86_64.
Details:
- Added a single-precision real gemm micro-kernel in
kernels/x86_64/3/bli_gemm_opt_d4x4.c.
- Adjusted the single-precision real register blocksizes in
config/clarksville/bli_kernel.h to be 8x4.
- Added a missing comment to bli_packm_blk_var2.c that was present in
bli_packm_blk_var3.c
commit dedda523dc5dc779ecc34e6a03dc74cb8eb220de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 19 12:07:41 2013 -0500
Fixed bug in bli_acquire_mpart_t2b(), _l2r().
Details:
- Fixed a bug in bli_acquire_mpart_t2b() and bli_acquire_mpart_l2r()
that caused incorrect partitioning when SUBPART0 was requested. This
bug was introduced in 46d3d09d49ad. Thanks to Bryan for isolating
this bug.
- Removed dupl kernels from kernels/x86_64/3 directory.
- Uncommented beta == 0 optimization code in
kernels/x86_64/3/bli_gemm_opt_d4x4.c.
commit 12dbd2f33455e9384fe2070cbdd660fd4a7fceb5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 8 14:39:35 2013 -0500
Moved init_safe(), finalize_safe() to BLAS compat.
Details:
- Moved the bli_init_safe() and bli_finalize_safe() function calls from the
BLAS-like BLIS layer to the BLAS compatibility layer. Having these auto-
initializers in the BLIS layer wasn't buying us anything because the user
could still call the library with uninitialized global scalar constants,
for example. Thus, we will just have to live with the constraint that
bli_init() MUST be called before calling ANY routine with a bli_ prefix.
- Added the missing _init_safe() and _finalize_safe() calls to the level-1
BLAS compatibility wrappers.
commit 8abfe55f2ae5d89df18e1b26a5a28d94b0936683
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 8 13:30:19 2013 -0500
Miscellaneous updates.
Details:
- Changed the BLIS_HEAP_STRIDE_ALIGN_SIZE in the configurations from 16 to
BLIS_CACHE_LINE_SIZE (typically 64).
- Changed the use of nr in sizing of bd buffer to packnr in level-3 macro-
kernels.
- Reformulated gemm_ker_var2 to look more like the other level-3 macro-
kernels, in that the interior and edge-case handling is expressed once
inside the loops in the n and m dimensions, rather than the edge-case
handling being "unrolled" and expressed as distinct code regions. The
previous macro-kernel now lives in retired form in the subdirectory
other/bli_gemm_ker_var2.c.old.
- Updated experimental gemm_ker_var5 according to above change.
- Fixed bug in bli_her2k.c whereby incorrect transformations were being
applied to optimize the macro-kernel access pattern on C when C is
row-stored.
- Various updates inside of test/exec_sizes.
commit 1aa05736ff49e7cc5f121acf615460fe9a87852c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 7 12:27:04 2013 -0500
Fixed bug in interface of bla_ger_check().
Details:
- Fixed the misplaced lda parameter in the function signature of
bla_ger_check(). Thanks to Tyler for finding this bug.
commit 685aad25353fb200de4ca97a8bc0feeebde51d0f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 6 12:25:51 2013 -0500
Fixed cpp guard typos in frame/compat/check files.
Details:
- Fixed instances of BLIS_ENABLE_BLIS2BLAS that should have been
BLIS_ENABLE_BLAS2BLIS. Thanks to Tyler for catching this.
- Fixed various syntax errors in the code that had yet to be compiled
due to the aforementioned bug.
commit f4ec28e723d28d998f1038f82da6986e44320ef6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 1 11:24:23 2013 -0500
Added basic OpenMP-based gemm and packm files.
Details:
- Integrated Tyler's parallelized packm_blk_var2 and gemm_ker_var2
into the following auxiliary files
frame/1m/packm/other/bli_packm_blk_var2.c
frame/3/gemm/other/bli_gemm_ker_var2.c
The routine in the first file uses a basic OpenMP parallel region to
parallelize the packing of blocks of A and panels of B, while the
second uses a similar parallel region to parallelize along the n
dimension of the gemm macro-kernel.
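Parallelizing along the n dimension can be sketched as follows. This is a plain triple loop rather than the actual macro-kernel, the function name gemm_parallel_n() is hypothetical, and the use of "parallel for" (instead of a bare parallel region with manual work division) is an assumption. Without OpenMP enabled, the pragma is ignored and the loop runs serially.

```c
/* C := C + A*B, column-major storage; A is m x k, B is k x n, C is m x n. */
void gemm_parallel_n(int m, int n, int k,
                     const double *a, const double *b, double *c)
{
    int j;

    /* Each thread owns a disjoint set of columns of C, so no locking is needed. */
    #pragma omp parallel for
    for (j = 0; j < n; ++j)
    {
        for (int i = 0; i < m; ++i)
        {
            double sum = 0.0;

            for (int p = 0; p < k; ++p)
                sum += a[i + p * m] * b[p + j * k];

            c[i + j * m] += sum;
        }
    }
}
```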
commit f8980edf9c318453bb1962ac4939c06bf11e6d5e
Merge: 67a8b949 6e7e4523
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 26 11:14:27 2013 -0500
Merge branch 'master' of https://code.google.com/p/blis
commit 67a8b9498d13b038deb316ac163e62c5b17da2ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 26 11:12:37 2013 -0500
Added missing cpp kernel blocksize constraints.
Details:
- Added missing C preprocessor guards in bli_kernel_macro_defs.h that enforce
constraints on the register blocksizes relative to the cache blocksizes.
Thanks to Tyler for helping me stumble across this issue.
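A compile-time guard of this kind might look like the sketch below. The macro names and values are hypothetical, and the specific constraint shown (each cache blocksize being a whole multiple of the corresponding register blocksize) is an assumption about what the guards enforce.

```c
/* Hypothetical blocksizes for one datatype. */
#define SKETCH_MC 128
#define SKETCH_NC 4096
#define SKETCH_MR 8
#define SKETCH_NR 4

/* Fail the build early instead of misbehaving at run time. */
#if (SKETCH_MC % SKETCH_MR) != 0
#error "MC must be a multiple of MR."
#endif

#if (SKETCH_NC % SKETCH_NR) != 0
#error "NC must be a multiple of NR."
#endif

/* Number of MR-row micro-panels in one MC-row block of A. */
int micropanels_per_block(void)
{
    return SKETCH_MC / SKETCH_MR;
}
```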
commit 6e7e452343014e8f86640874dc1dbadca4a642a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 22 14:50:57 2013 -0500
Fixed minor warnings and misc issues.
Details:
- Fixed various warnings output by gcc 4.6.3-1, including removing some
set-but-not-used variables and addressing some instances of typecasting
of pointer types to integer types of different sizes.
commit 03f6c3599743bc837a7d40eb5b415b1bf4f2a4e9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 22 12:54:32 2013 -0500
Tightened some macros that detect datatypes.
Details:
- Modified the definitions of some macros, such as bli_is_real(), so that
the "special" bit is taken into account so that BLIS_INT is differentiated
from BLIS_FLOAT.
- Whitespace changes to bli_obj_macro_defs.h.
- Removed BLIS_SPECIAL_BIT definition from bli_type_defs.h, since it wasn't
being used.
commit b33e2f4443b9043b554963320280ff7783773652
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 19 17:15:03 2013 -0500
CHANGELOG update (for 0.0.9).
commit 0680916fdd532f7a4716b11a2515243b2c08d00f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 18 18:04:34 2013 -0500
Added BLAS error checking to compatibility layer.
Details:
- Added frame/compat/check directory, which now houses companion _check()
routines for each of the BLAS wrappers in frame/compat. These _check()
routines are called from the compatibility wrappers and mimic the
error-checking present in the netlib BLAS.
- Edited bla_xerbla.c so that xerbla() translates the operation string to
uppercase before printing.
- Redefined util routines in frame/compat/f2c/util in terms of level0
macros.
- Added prototypes for util routines, f2c routines, lsame(), and xerbla().
- Commented out prototypes in test/test_*.c since Fortran integers are now
int64_t by default (and the prototypes that were present in the files
used int).
- Removed redundant #include "bli_f2c.h" in bli_?lamch.c and bli_lsame.c,
since blis.h was already being included.
- Other minor changes to code in frame/compat/f2c.
commit 4e80ad28c97273db3366428ec44020da7944964d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 18 17:53:31 2013 -0500
Added support for C99 complex types/arithmetic.
Details:
- Added support for C99 complex types to bli_type_defs.h and overloaded
complex arithmetic to the scalar-level macros in include/level0. This
includes a somewhat substantial reorganization and re-layering of much
of the existing machinery present in the level0 macros.
- Added new #define for BLIS_ENABLE_C99_COMPLEX to bli_config.h files,
commented-out by default, which optionally enables the use of built-in
C99 complex types and arithmetic.
- Minor changes to clarksville and reference configs' make_defs.mk files.
- Removed a macro definition from bli_param_macro_defs.h that was not being
used (bli_proj_dt_to_real_if_imag_eq0).
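The overloading of scalar-level macros can be sketched as below. BLIS_ENABLE_C99_COMPLEX and the dcomplex type with .real/.imag fields appear in this log; the sketch_* names are hypothetical and the real level0 macros are layered differently. The point is that one macro name expands either to built-in complex arithmetic or to explicit real/imaginary arithmetic.

```c
#ifdef BLIS_ENABLE_C99_COMPLEX
  #include <complex.h>
  typedef double complex dcomplex;
  #define sketch_real(x)         creal(x)
  #define sketch_imag(x)         cimag(x)
  #define sketch_scal2s(a, x, y) ((y) = (a) * (x))
#else
  typedef struct { double real; double imag; } dcomplex;
  #define sketch_real(x)         ((x).real)
  #define sketch_imag(x)         ((x).imag)
  /* y := a * x, using temporaries so that y may alias a or x. */
  #define sketch_scal2s(a, x, y) \
      do { \
          double yr_ = (a).real * (x).real - (a).imag * (x).imag; \
          double yi_ = (a).real * (x).imag + (a).imag * (x).real; \
          (y).real = yr_; \
          (y).imag = yi_; \
      } while (0)
#endif

/* Hypothetical constructor usable with either representation. */
dcomplex sketch_make(double re, double im)
{
#ifdef BLIS_ENABLE_C99_COMPLEX
    return re + im * I;
#else
    dcomplex z = { re, im };
    return z;
#endif
}
```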
commit 6072d7c848e837ba20d607f7b727438ada31bdcf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 17 12:27:45 2013 -0500
Fixed bugs in trsm, trmm macro-kernels.
Details:
- Fixed a bug in trsm_rl_ker_var2() caused by incorrect edge case handling.
- Fixed a bug in trsm_rl_ker_var2() and trsm_ru_ker_var2() whereby k was
incorrectly being adjusted upward by MR, instead of NR. The rl and ru
trmm macro-kernels were updated in a similar fashion.
- Fixed a bug in trsm_ru_ker_var2() that was due to a missing negation on
diagoffb when recomputing k to skip a zero region below where the
diagonal intersects the right side of the block. The corresponding
trmm macro-kernel was also updated.
- Fixed a bug in trsm_ru_ker_var2() where the adjustment of k (by NR)
needed to be placed AFTER the block that recomputes k to skip the zero
region (if present). The other three trsm macro-kernels, as well as the
trmm macro-kernels, were updated in the same manner, for consistency.
- Fixed a bug in trmm_lu_ker_var2() in which the wrong dimension (n) was
being updated to skip a zero region to the left of where the diagonal
of A intersects the top edge of the block.
- Comment updates to all trsm and trmm macro-kernels.
- Comment updates to bli_packm_init.c.
commit 47410a48f9b91e94ce4c67633686ffd1f2ad0275
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 10 14:53:59 2013 -0500
Added f2c'ed Givens rotation wrappers.
Details:
- Retired (for now) existing ?rot*() BLAS compatibility wrappers to 'attic'
along with other wrappers for which no BLIS implementation exists.
- Added f2c-generated codes for applicable datatype flavors of rot, rotg,
rotm, and rotmg operations.
commit e5f90f3a8dbe671104bcb9d8b4e3409de01805da
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 10 13:40:12 2013 -0500
Removed copynz defs from bli_kernel.h files.
Details:
- Removed COPYNZ_KERNEL definition from the bli_kernel.h files in each
configuration. (Meant to include this in previous commit.)
commit aec12d90f596e8c04b1ad178258a1cd38108f59d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 10 13:33:30 2013 -0500
Removed copynzv, copynzm and related codes.
Details:
- Removed copynzv and copynzm operation directories. These operations
implemented a variation of copyv/m that, in the case of real source
and complex destination operands, leaves the imaginary component
untouched (rather than setting it to zero). I realize now that the
special case(s) (e.g. gemm with real A and B but complex C) that I
thought required this operation actually can be handled more simply.
- Removed level0 scalar macros implementing copynzs, copynzjs.
commit b0a0a0f274a761788531b5d281cc3b411b7124ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 9 17:15:38 2013 -0500
Added handling of restrict, stdint.h for non-C99.
Details:
- Removed the #include <stdint.h> from blis.h and inserted a cpp macro block
in bli_type_defs.h that #includes <stdint.h> for C++ and C99, and otherwise
manually typedefs the types we need (which, for now, are unconditionally
int64_t and uint64_t).
- Moved basic typedefs to top of bli_type_defs.h, and comment changes.
- Added cpp macro block to bli_macro_defs.h that #defines restrict as
nothing for C++ and non-C99.
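The two cpp macro blocks described in this commit could look roughly like the sketch below. The exact conditions and the fallback typedefs are assumptions, not the actual contents of bli_type_defs.h or bli_macro_defs.h, and sketch_dot() is a hypothetical function that only shows restrict in use.

```c
/* Sketch only: the conditions and fallback typedefs are assumptions. */
#if defined(__cplusplus) || \
    ( defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L )
  /* C++ and C99: let the system header provide the integer types. */
  #include <stdint.h>
#else
  /* Pre-C99 C: typedef only the types we need. This assumes the compiler
     offers a 64-bit 'long long' as an extension. */
  typedef long long          int64_t;
  typedef unsigned long long uint64_t;
#endif

#if defined(__cplusplus) || \
    !( defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L )
  /* C++ and pre-C99 C have no 'restrict' keyword, so define it as nothing. */
  #define restrict
#endif

/* Hypothetical function: compiles whether or not 'restrict' is a keyword. */
static int64_t sketch_dot( int64_t n,
                           const int64_t* restrict x,
                           const int64_t* restrict y )
{
    int64_t i, sum = 0;

    for ( i = 0; i < n; ++i ) sum += x[i] * y[i];

    return sum;
}
```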
commit 4b7e7970f1af4a1ab121e07657e2b78b9fcd7671
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 8 15:20:34 2013 -0500
Migrated integer usage to stdint.h types.
Details:
- Changed the way bli_type_defs.h defines integer types so that dim_t,
inc_t, doff_t, etc. are all defined in terms of gint_t (general signed
integer) or guint_t (general unsigned integer).
- Renamed Fortran types fchar and fint to f77_char and f77_int.
- Define f77_int as int64_t if a new configuration variable,
BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise.
These types are defined in stdint.h, which is now included in blis.h.
- Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed
in terms of scomplex.
- Renamed "char" type in f2c files to "character" and typedef'ed in terms
of char.
- Updated bla_amax() wrappers so that the return type is defined directly
as f77_int, rather than letting the prototype-generating macro decide
the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros,
so I removed them. Also, changed the body of the wrapper so that a
gint_t is passed into abmaxv, which is THEN typecast to an f77_int
before returning the value.
- Updated f2c code that accessed .r and .i fields of complex and
doublecomplex types so that they use .real and .imag instead (now that
we are using scomplex and dcomplex).
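As a rough illustration of the f77_int selection and the updated amax wrapper, consider the sketch below. The names sketch_abmaxv() and sketch_idamax() are hypothetical stand-ins, not the real BLIS functions, and the one-based return value and early exit follow the usual BLAS i?amax convention rather than anything stated in this commit.

```c
#include <stdint.h>

typedef int64_t gint_t;           /* general signed integer */

/* f77_int is 64-bit only if the configuration asks for it. */
#ifdef BLIS_ENABLE_BLIS2BLAS_INT64
typedef int64_t f77_int;
#else
typedef int32_t f77_int;
#endif

/* Hypothetical stand-in for abmaxv: stores the zero-based index of the
   element of x with the largest absolute value (first one wins ties). */
static void sketch_abmaxv( gint_t n, const double* x, gint_t incx,
                           gint_t* index )
{
    gint_t i, best = 0;

    for ( i = 1; i < n; ++i )
    {
        double v = x[ i    * incx ];
        double w = x[ best * incx ];

        if ( v < 0.0 ) v = -v;
        if ( w < 0.0 ) w = -w;
        if ( v > w ) best = i;
    }

    *index = best;
}

/* BLAS-style wrapper: a gint_t is passed into abmaxv, and only THEN is
   the result typecast to f77_int for the caller. */
f77_int sketch_idamax( const f77_int* n, const double* x,
                       const f77_int* incx )
{
    gint_t index = 0;

    /* Assumed BLAS convention: return 0 for empty or invalid input. */
    if ( *n < 1 || *incx <= 0 ) return 0;

    sketch_abmaxv( ( gint_t )*n, x, ( gint_t )*incx, &index );

    /* Assumed BLAS convention: the returned index is one-based. */
    return ( f77_int )( index + 1 );
}
```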
commit 372501398564fdba3d5a3db86c30bc1039b185ff
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 8 11:24:18 2013 -0500
Added experimental bli_gemm_ker_var5().
Details:
- Added support for an experimental gemm macro-kernel that incrementally
packs one micro-panel of B at a time. This is useful for certain
special cases of gemm where m is small.
- Minor changes to default values of clarksville configuration.
- Defined BLIS_PACKED_BLOCKS as part of pack_t type, even though we
do not yet have any use (or implementation support) for block storage.
- Comment update to bli_packm_init.c.
commit 9915d667a79f23e3a2a2516247c560e9063a1646
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 7 13:28:39 2013 -0500
Defined "total" blocksize query functions.
Details:
- Defined bli_blksz_total_for_type() and bli_blksz_total_for_obj() to query
the default blocksize plus blocksize extension (using the type or the type
of an object).
- Comment update in bli_packm_cxk.c.
commit 46d3d09d49aded1d9f1b468c83fce75e07d631dc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 27 13:19:56 2013 -0500
Consolidated lower/upper her[2]k blocked variants.
Details:
- Consolidated lower and upper blocked variants for herk and her2k, and
renamed the resulting variants, according to the same changes recently
made to trmm and trsm.
- Implemented support for four new subpartition types:
BLIS_SUBPART1T
BLIS_SUBPART1B
BLIS_SUBPART1L
BLIS_SUBPART1R
which correspond to "merged" partitions that include the middle "1"
partition as well as either the neighboring "0" or "2" partition. This is
used to clean up code in herk/her2k var2 that attempts to partition away
the strictly zero region above or below the diagonal of a matrix operand
that is being marched through diagonally.
- Added safeguards to herk macro-kernels that skip any leading or trailing
zero region in the panel of C that is passed in. This is now needed given
that herk/her2k var1 no longer partitions off this zero region before
calling the macro-kernel (via bli_her[2]k_int()).
- Updated comments and other whitespace changes to trmm/trsm macro-kernels.
commit 02002ef6f3d2746665982793db36714bd69bccc9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 24 17:08:14 2013 -0500
Added row-storage optimizations for trmm, trsm.
Details:
- Implemented algorithmic optimizations for trmm and trsm whereby the right
side case is now handled explicitly, rather than induced indirectly by
transposing and swapping strides on operands. This allows us to walk through
the output matrix with favorable access patterns no matter how it is stored,
for all parameter combinations.
- Renamed trmm and trsm blocked variants so that there is no longer a
lower/upper distinction. Instead, we simply label the variants by which
dimension is partitioned and whether the variant marches forwards or
backwards through the corresponding partitioned operands.
- Added support for row-stored packing of lower and upper triangular matrices
(as provided by bli_packm_blk_var3.c).
- Fixed a performance bug in bli_determine_blocksize_b() whereby the cache
blocksize extensions (if non-zero) were not being used to appropriately size
the first iteration (ie: the bottom/right edge case).
- Updated comments in bli_kernel.h to indicate that both MC and NC must be
whole multiples of MR AND NR. This is needed for the case of trsm_r where,
in order to reuse existing left-side gemmtrsm fused micro-kernels, the
packing of A (left-hand operand) and B (right-hand operand) is done with
NR and MR, respectively (instead of MR and NR).
commit d1e81ddc848ee47bc188735883d14582bdd0cabc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 13 11:14:21 2013 -0500
Minor generalizing tweaks to trmm blk var1, var2.
commit 0efb7974f104206ba3985276f2180a9b14fe9f9b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 12 16:40:04 2013 -0500
CHANGELOG update.
commit 5b641c3bab31eac6a1795b9f6e3f86c59651ca50
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 12 16:02:12 2013 -0500
Use separate CFLAGS for "kernels" directories.
Details:
- Added a new "special" directory type: any source code within directories
named "kernels" will be compiled with a separate CFLAGS_KERNELS set of
compiler flags. This allows the developer to specify a separate set of
flags (e.g. optimization flags) for compiling kernels while maintaining a
standard set for regular framework code.
- Fixed a bug in the top-level Makefile that was causing "noopt" code
to be compiled with the standard set of compilation flags.
- Updated make_defs.mk in reference, flame, and clarksville configurations
according to above changes.
commit 08475e7c7653ba598665071a617d10f0d8f763c2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 11 12:18:39 2013 -0500
Various level-3 optimizations for row storage.
Details:
- Implemented remaining two cases within bli_packm_blk_var2(), which allow
packing from a lower or upper-stored symmetric/Hermitian matrix to column
panels (which are row-stored). Previously one could only pack to row panels
(which are column-stored).
- Implemented various optimizations in the level-3 front-ends that allow more
favorable access through row-stored matrices for gemm, hemm, herk, her2k,
symm, syrk, and syr2k.
- Cleaned up code in level-3 front-ends that has to do with setting target and
execution datatypes.
commit 05a657a6b92e8d34efa5c57ae6a18a4f35ec0841
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 7 11:04:10 2013 -0500
Added beta == 0 optimization to x86_64 ukernel.
Details:
- Modified x86_64 gemm microkernel so that when beta is zero, C is not read
from memory (nor scaled by beta).
- Fixed minor bug in test suite driver when "Test all combinations of storage
schemes?" switch is disabled, which would result in redundant tests being
executed for matrix-only (e.g. level-1m, level-3) operations if multiple
vector storage schemes were specified.
- Restored debug flags as default in clarksville configuration.
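The beta == 0 optimization can be illustrated with a plain C reference micro-kernel. This is only a sketch under assumed conventions (a hypothetical sketch_dgemm_ukr() with A packed by columns and B packed by rows); the actual x86_64 micro-kernel is vectorized and differs in detail.

```c
#include <stddef.h>

/* Hypothetical reference micro-kernel computing C := beta*C + alpha*A*B.
   a: packed mr x k micro-panel, stored column by column (stride mr).
   b: packed k x nr micro-panel, stored row by row (stride nr).
   c: mr x nr block with row stride rs_c and column stride cs_c. */
void sketch_dgemm_ukr( size_t mr, size_t nr, size_t k,
                       double alpha, const double* a, const double* b,
                       double beta, double* c, size_t rs_c, size_t cs_c )
{
    size_t i, j, l;

    for ( i = 0; i < mr; ++i )
    {
        for ( j = 0; j < nr; ++j )
        {
            double* cij = c + i * rs_c + j * cs_c;
            double  ab  = 0.0;

            for ( l = 0; l < k; ++l )
                ab += a[ l * mr + i ] * b[ l * nr + j ];

            if ( beta == 0.0 )
                *cij = alpha * ab;                  /* C is never read */
            else
                *cij = beta * ( *cij ) + alpha * ab;
        }
    }
}
```

Besides saving the load of C, overwriting rather than scaling means that stale NaN or Inf values in C cannot propagate into the result when beta is zero.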
commit f1aa6b81cc421516dd77dd0f18f7c432724e6ef2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 6 13:36:06 2013 -0500
Whitespace changes to old test drivers.
Details:
- Replaced tabs with four spaces in places where indentation was already
in place.
commit 9feb4c23d2e36f3d8b5417a3802c69f94b29f749
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 4 14:57:46 2013 -0500
Fixed unaligned handling in axpyf, dotxaxpyf.
Details:
- Fixed over-cautious handling of unaligned operands in vector intrinsic
implementation of axpyf kernel.
- Fixed over- and under-cautious handling of unaligned operands in vector
intrinsic implementation of dotxaxpyf kernel.
commit 22b06cfcd2e3205c8325a246c2279e4b1047c066
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 3 16:54:52 2013 -0500
Updated level-1/-1f [vector intrinsic] kernels.
Details:
- Updated level-1/-1f kernels so that non-unit and un-aligned cases are
handled by reference implementation (rather than aborted).
- Added -fomit-frame-pointer to default make_defs.mk for clarksville
configuration.
- Defined bli_offset_from_alignment() macro.
- Minor edits to old test drivers.
commit 0288c827d3659bb225ac9c10f168b623ed0106a2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jun 1 08:02:23 2013 -0500
Updated ukernels for x86_64.
Details:
- Tweaked micro-kernels and configuration for clarksville.
- Updated/cleaned up old test drivers in test directory.
- Fixed syntax bug in trsv_unb_var1 and trsv_unf_var1 (introduced
recently).
commit 85a6d1c9a52c2b27c71a3a3e341c51d7ba263749
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon May 6 11:05:08 2013 -0500
Replaced axpys usage with subs in trsv.
Details:
- Replaced instances of axpys with alpha equal to -1 with subs.
- Use BLIS_MAX_TYPE_SIZE to define BLIS_CONSTANT_SLOT_SIZE instead of
sizeof(dcomplex).
commit 2d9c667f3c48a12cab64e5ad09d5fcb9f4c19d78
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 24 16:28:10 2013 -0500
Fixed x86_64 kernel bugs and other minor issues.
Details:
- Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in
unaligned subpartitions. We were already going out of our way a bit to
handle edge cases in the first iteration for blocked variants, and this
was simply the unblocked-fused extension of that idea.
- Fixed control tree handling in her/her2/syr/syr2 that was not taking
into account how the choice of variant needed to be altered for
upper-stored matrices (given that only lower-stored algorithms are
explicitly implemented).
- Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b()
macros to provide inlined versions of bli_determine_blocksize_[fb]() for
use by unblocked-fused variants.
- Integrated new blocksize_dim macros into gemv/hemv unf variants for
consistency with that of the bugfix for trmv/trsv (both of which now
use the same macros).
- Modified bli_obj_vector_inc() so that 1 is returned if the object is a
vector of length 1 (ie: 1 x 1). This fixes a bug whereby under certain
conditions (e.g. dotv_opt_var1), an invalid increment was returned, which
was invalid only because the code was expecting 1 (for purposes of
performing contiguous vector loads) but got a value greater than 1 because
the column stride of the object (e.g. rho) was inflated for alignment
purposes (albeit unnecessarily since there is only one element in the
object).
- Replaced some old invocations of set0 with set0s.
- Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly.
- Fixed increment bug in cleanup loop of gemm ukernel for x86_64.
- Added safeguard to test modules so that testing a problem with a zero
dimension does not result in a failure.
- Tweaked handling of zero dimensions in level-2 and level-3 operations'
internal back-ends to correctly handle cases where output operand still
needs to be scaled (e.g. by beta, in the case of gemm with k = 0).
commit d57ec42b34f8447c88adeffa95cf22f8c115ad51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 3 17:35:32 2013 -0500
Renamed _trans_status() macro.
Details:
- Mistakenly forgot to rename the _trans_status() macro and instances in
previous commit.
commit 9e2b227866af429a4a6fb7dbb8c457bbdda2f136
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 3 17:24:58 2013 -0500
Renamed _set_trans(), _trans_status() macros.
Details:
- Renamed the following macros:
bli_obj_set_trans() -> bli_obj_set_onlytrans()
bli_obj_trans_status() -> bli_obj_onlytrans_status()
to remove ambiguity as to which bits are read/updated.
commit 2f8174509ea9f844db11ebd9389de5168e85b132
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 1 15:06:30 2013 -0500
Unconditionally check memory pool(s) for errors.
Details:
- Changed bli_mem_acquire_m() in bli_mem.c so that we still check if the
memory pool is exhausted before checking out and returning a block, even
if BLIS error checking has been disabled. These errors are useful because
they likely indicate that BLIS was improperly configured for the code
being run.
commit 75405a2b83679b6aff38d7e7425199d623a7b0a9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 1 15:00:30 2013 -0500
CHANGELOG update.
commit 6bfa96f84887dec0b4cf8be5d38dd634c2f8951d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 30 19:35:54 2013 -0500
Absorbed blocksize extensions into main objects.
Details:
- Revamped some parts of commit b6ef84fad1c9 by adding blocksize extension
fields to the blksz_t object rather than have them as separate structs.
- Updated all packm interfaces/invocations according to above change.
- Generalized bli_determine_blocksize_?() so that edge case optimization
happens if and only if cache blocksizes are created with non-zero
extensions.
- Updated comments in bli_kernel.h files to indicate that the edge case
blocksize extension mechanism is now available for use.
commit bc7c8005cedbe50961ac2a99aeeabf4e9f9a8e9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 25 17:16:59 2013 -0500
Added option to disable err checking in testsuite.
Details:
- Added a new line to input.general that allows one to specify the error-
checking level to use for each BLIS experiment. The only two levels
supported for now are "no error checking" and "full error checking".
commit 096b366ddcfe386f44419ef84d8df8be13825f86
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 25 16:43:43 2013 -0500
Use cntl trees that block in n dimension.
Details:
- Updated _cntl.c files for each level-3 operation to induce blocked
algorithms that first partition in the n dimension with a blocksize
of NC. Typically this is not an issue since only very large problems
exceed NC. But developers often run very large problems, and
so this extra blocking should be the default.
- Removed some recently introduced but now unused macros from
bli_param_macro_defs.h.
commit b6e24b23cb4dfc488c1c9c70d596539c2287f72e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 25 12:06:12 2013 -0500
Use PASTEMAC in macro-kernels (over MAC2 or MAC3).
Details:
- Replaced multi-type invocations of copys_mxn, xpbys_mxn, etc. (PASTEMAC2
and PASTEMAC3) with those that only use a single type (PASTEMAC).
- Added extra macros to bli_adds_mxn_uplo.h and bli_xpbys_mxn_uplo.h to
accommodate above change.
- Fixed comment typo in bli_config.h files.
- Added .nfs* pattern to .gitignore.
commit df80acf517dde180ddcc5835c6136b2fa7556d4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 19:43:23 2013 -0500
Fixed computation of b_next in L3 macro-kernels.
Details:
- Restructured herk_l and herk_u macro-kernels in the image of trmm
and trsm, in that the edge cases are captured by the main loop, rather
than trying to have "cleanup" sections that result in four distinct
parts (interior, bottom edge, right edge, bottom-right edge) of the
code.
- Fixed the way b_next was being computed in the non-gemm level-3
macro-kernels (herk, trmm, trsm). The way they are computed now matches
that of gemm.
commit 3671528cf8efe4b445d196665143a5c50c2c6048
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 19:12:14 2013 -0500
Fixed minor bug in computing b_next in gemm.
commit db072a5b4a039a9a668ef951333ecfb5bd3a74b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 17:49:10 2013 -0500
Fixed rare edge case bug in herk_l macro-kernel.
Details:
- Fixed a potential bug in herk_l at the m_left edge case. If MR was
chosen to be much larger than NR, then one could encounter edge cases
in the MC dimension that fall entirely below the diagonal, which
the previous implementation of the herk_l macro-kernel was not allowing
for.
commit 1dab11e37d1cb403cbe75b73a644c00de534f104
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 17:17:11 2013 -0500
Updated x86 gemmtrsm ukernels to use alpha.
commit 9d10d7dd9bc92a993fea7162bfa5983f75506f49
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 16:00:18 2013 -0500
Added a_next, b_next arguments to micro-kernels.
Details:
- Added two more arguments to the gemm and gemmtrsm microkernels: the
addresses of the next micro-panels of A and B. By passing these
pointers into the micro-kernel, we allow the micro-kernel author to
prefetch micro-panels of A and B as necessary (though this is
completely optional; these addresses may also be safely ignored).
- Updated all seven macro-kernels so that they compute and pass in
a_next and b_next. Note that ONLY the gemm macro-kernel computes
a_next and b_next with the precise semantics we want. I will go back
and fix the other macro-kernels in the near future.
- Added 'restrict' to various micro-kernels from which it was missing.
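One way a gemm macro-kernel might compute a_next and b_next and hand them to the micro-kernel is sketched below. The function names, the fixed 2x2 register blocking, and the wrap-around rule at the last iteration are assumptions made for illustration; the prefetch is optional and is compiled only where the GCC/Clang builtin is available.

```c
#include <stddef.h>

enum { SK_MR = 2, SK_NR = 2 };    /* hypothetical register blocksizes */

/* Hypothetical micro-kernel: C (SK_MR x SK_NR, row stride ldc) += A*B.
   a_next and b_next are hints only and may be safely ignored. */
static void sketch_gemm_ukr( size_t k, const double* a, const double* b,
                             double* c, size_t ldc,
                             const double* a_next, const double* b_next )
{
    size_t i, j, l;

#if defined(__GNUC__)
    __builtin_prefetch( a_next );  /* optional: warm the next micro-panels */
    __builtin_prefetch( b_next );
#else
    ( void )a_next;
    ( void )b_next;
#endif

    for ( l = 0; l < k; ++l )
        for ( i = 0; i < SK_MR; ++i )
            for ( j = 0; j < SK_NR; ++j )
                c[ i * ldc + j ] += a[ l * SK_MR + i ] * b[ l * SK_NR + j ];
}

/* Hypothetical macro-kernel over packed A and B. Assumes m and n are
   whole multiples of SK_MR and SK_NR (no edge cases handled). */
void sketch_gemm_macro( size_t m, size_t n, size_t k,
                        const double* a, const double* b,
                        double* c, size_t ldc )
{
    size_t m_iter = m / SK_MR, n_iter = n / SK_NR;
    size_t ps_a = SK_MR * k, ps_b = SK_NR * k;     /* panel strides */
    size_t i, j;

    for ( j = 0; j < n_iter; ++j )
    {
        const double* b1 = b + j * ps_b;

        for ( i = 0; i < m_iter; ++i )
        {
            const double* a1     = a + i * ps_a;
            const double* a_next = a1 + ps_a;      /* next panel of A  */
            const double* b_next = b1;             /* same panel of B  */

            if ( i == m_iter - 1 )
            {
                /* Last panel of A: wrap A and advance (or wrap) B. */
                a_next = a;
                b_next = ( j == n_iter - 1 ) ? b : b1 + ps_b;
            }

            sketch_gemm_ukr( k, a1, b1,
                             c + i * SK_MR * ldc + j * SK_NR, ldc,
                             a_next, b_next );
        }
    }
}
```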
commit f3815dc84d385c514a5acaf1e925424a57be2f51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 11:12:33 2013 -0500
Added code for backward edge-case blocking.
Details:
- Edited bli_determine_blocksize_b() to include experimental (and
currently disabled) code that computes extended blocks.
- Updated comments related to above changes.
- Enabled use of x86 gemmtrsm ukernel in config/flame/bli_kernel.h.
commit 4fe1435f20e8fc7dd72f795ac58c8e236e6c631b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 22 19:00:43 2013 -0500
Updated dupl implementation to use PACKNR and NR.
Details:
- Updated frame/util/dupl/bli_dupl_unb_var1.c to utilize PACKNR and NR
explicitly to navigate b1 so that situations where PACKNR > NR are
supported.
- Moved the 4x2 and 4x4 reference micro-kernels in frame/3/gemm/ukernels and
frame/3/trsm/ukernels to kernels/c99/.
- Updated clarksville and flame configurations.
commit 2d6f9e83799a46d52d7901e275f8fd67f0a0edc6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Apr 21 15:10:34 2013 -0500
Disabled blocksize checks for memory pools.
Details:
- Temporarily disabled checks that ensure that enough memory will be allocated
by the contiguous memory allocator for all types, given that the values for
double precision real are the ones used to allocate the space. These checks
can easily go awry in certain situations, especially if you are developing for
only one datatype. So for now, they are probably more trouble than they are
worth.
commit b6ef84fad1c9884c84b7f1350a0bcdfe1737e8f2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Apr 21 15:00:24 2013 -0500
Allow ldim of packed micro-panels != MR, NR.
Details:
- Made substantial changes throughout the framework to decouple the leading
dimension (row or column stride) used within each packed micro-panel from
the corresponding register blocksize. It appears advantageous on some
systems to use, for example, packed micro-panels of A where the column
stride is greater than MR (whereas previously it was always equal to MR).
- Changes include:
- Added BLIS_EXTEND_[MNK]R_? macros, which specify how much extra padding
to use when packing micro-panels of A and B.
- Adjusted all packing routines and macro-kernels to use PACKMR and PACKNR
where appropriate, instead of MR and NR.
- Added pd field (panel dimension) to obj_t.
- New interface to bli_packm_cntl_obj_create().
- Renamed bli_obj_packed_length()/_width() macros to
bli_obj_padded_length()/_width().
- Removed local #defines for cache/register blocksizes in level-3 *_cntl.c.
- Print out new cache and register blocksize extensions in test suite.
- Also added new BLIS_EXTEND_[MNK]C_? macros for future use in using a larger
blocksize for edge cases, which can improve performance at the margins.
commit 59fca58dbe678d79c1df0916b022afbeac7c48fa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 19 15:26:29 2013 -0500
Fixed bug in compatibility layer (her2k/syr2k).
Details:
- Fixed a bug in the BLAS compatibility layer, specifically in bla_her2k.c
and bla_syr2k.c, that caused incorrect computation to occur when the BLAS
interface caller requests the [conjugate-]transpose case. Thanks to Bryan
Marker for reporting the behavior that led to this bug.
commit 09eacbd1ab1380a95a0e9625726b45e43ed102d6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 18 19:39:13 2013 -0500
Changed old level3 test drivers to call front-ends.
Details:
- Changed old level-3 test drivers, in 'test' directory, to always call the
front-end object API instead of the internal back-end with the locally
defined control tree.
commit 83e45de23e565138b8fde06fb11cfedc973b7246
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 18 18:33:03 2013 -0500
Allow packm_init() to reacquire a too-small mem_t.
Details:
- Changed bli_packm_init() to react differently to a situation where a pack
obj_t has an already-allocated mem_t entry that has a buffer that is smaller
than what will be needed to hold the block/panel that now needs to be
packed. Previously, this situation was treated with an abort() since I
assumed something was horribly wrong. I have changed the code so that it now
reacts by releasing the previous mem_t and re-acquires a new mem_t with the
new information. (This change was done at the request of Bryan Marker to
facilitate code generation via DxT.)
commit a6990434173b0cf651f8521194f3aef738deb7d2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 18 13:52:47 2013 -0500
Fixed bug in packing block of A for hemm/symm.
Details:
- Fixed a bug in bli_packm_blk_var2() that affected the packing functionality
of hemm and symm. The bug occurs whenever attempting to pack a Hermitian or
symmetric matrix where the block of A being packed intersects the diagonal,
but some of its micro-panels do not intersect the diagonal and lie completely
in the unstored region. Thanks to Francisco Igual for reporting this bug.
- Comment updates to both _blk_var2.c and _blk_var3.c.
commit c92e7590e1934f830814ab614c794215ebe0c415
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 17 20:53:29 2013 -0500
Activated bli_packm_acquire_mpart_t2b().
Details:
- Removed the overly-paranoid bli_abort() from the end of
bli_packm_acquire_mpart_t2b(), to allow others to experiment with
partitioning through packed blocks of A. Also, and more importantly,
changed an earlier check that was causing an erroneous (but
coincidentally redundant) abort(). Also, updated some of the comments
in bli_packm_part.c.
commit bea579e9f009a44e08008eb14d09f38748ab2b53
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 19:43:14 2013 -0500
Allow creation of "empty" objects.
Details:
- Modified bli_obj_alloc_buffer() to allow allocating an empty buffer, and
modified bli_adjust_strides() to explicitly handle m = n = 0.
- Updated bli_check_matrix_strides() to allow cases where m = n = 0.
commit 7904e20f2e6908571ee5008da2a08084198eefae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 17:37:16 2013 -0500
Fixed "root" object bug in bli_her[2]k/syr[2]k.
Details:
- Fixed an obscure bug in the front-ends for herk, her2k, syrk, and syr2k,
that manifested as the incorrect triangle being updated. It occurred when
the user would pass in a matrix object that was correctly marked as
symmetric/Hermitian and lower-stored, but whose root object was never marked
as lower (or upper). We now alias and re-assign root status for matrix C
within the front-ends. Note that trmm and trsm were already doing this,
albeit for a slightly different reason (to allow the internal back-end to
choose which algorithm to run--lower or upper--based on the uplo of the root
object for both left and right side cases). Thanks to Bryan Marker for
leading me to this bug.
commit 19155a768dd97b57cfb59c32fa8e54a344ec66e1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 11:24:03 2013 -0500
Fixed overzealous type-checking in bli_getsc().
Details:
- Relaxed type checking in getsc so that the input object could be a constant
and not just a proper floating-point type. (If it is a constant, default to
extracting the dcomplex values.) Thanks to Bryan Marker for reporting this
bug.
- Added definition for bli_is_constant() in bli_param_macro_defs.h
- Comment updates to various level-0 scalar routines.
commit 2ee6bbca2953d04c967685da9735b3eaf8a4b813
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 19:27:57 2013 -0500
Fixed bug in bli_obj_is_packed() and renamed.
Details:
- This macro is used to determine whether the partitioning routines should
call a corresponding packm_part routine instead. However, it was
unintentionally catching matrices that were marked as "packed" by virtue
of them simply being marked as BLIS_PACKED_UNSPEC in, say, bli_gemv().
The macro has now been renamed to bli_obj_is_panel_packed(), and now only
checks for row or column panel packing. (Note that I first attempted to
fix this bug in a571af816d72.) Thanks to Bryan Marker for reporting the
erroneous behavior that led me to this bug.
commit 99b99eebe70336b5f28039a4a084aa7f5fa7059d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 17:54:43 2013 -0500
Removed local reference ukernel blocksize macros.
Details:
- Removed locally defined gemm microkernel blocksize macros from _mxn
reference microkernel definition and header. Meant to include this in
a recent/previous commit (0020ef7c8271).
commit 6a538fa7b164655f41cea5b9c8d3902438bda66b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 14:40:31 2013 -0500
Formatting change to mods in previous commit.
commit ea079d35591e808971d2d98a1a7d9f89bc1f7c2f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 14:31:40 2013 -0500
Set structure of objects in level-2 BLIS APIs.
Details:
- Added missing statement to set structure field of local objects in
top-level BLIS (BLAS-like) API wrappers. Thanks to Bryan Marker for
reporting this bug.
commit d9948c541c0446e20e249a1ccc83709ce51b7aa8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 10:21:26 2013 -0500
Tweak to test suite function string construction.
Details:
- Fixed a minor bug in the way that the test suite would construct function
name strings when the user anchored all parameters in input.operations.
In this case, the test driver would mistake this situation for one where
the operation simply had no parameters to begin with, and thus would not
include the parameter string in the function string that is output for
every result.
commit ca9e435c57c5c7a000d2a32681dd8070ba850abd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 09:59:46 2013 -0500
Fixed a bug in reference implementation of dupl.
Details:
- Fixed a bug in reference implementation of dupl (bli_dupl_unb_var1.c),
which resulted in incorrect duplication.
- Updated old test drivers according to recently updated packm control tree
creation interface.
- Added 'restrict' to x86 gemm microkernel interface.
commit 26cbd52e364bbe439e3744101cd5a6cbcb82dffd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Apr 14 19:05:33 2013 -0500
Modified bli_kernel.h include order in blis.h.
Details:
- Delayed #include of bli_kernel.h in blis.h to prevent a situation where
_kernel.h includes an optimized microkernel header, which uses BLIS types
such as dim_t and inc_t, which would precede the definition of those types
in bli_type_defs.h.
- Moved the #include of bli_kernel_macro_defs.h in bli_macro_defs.h to blis.h
(immediately after that of bli_kernel.h).
commit 3414a23c38b0de45a8034b3dda2fc4b5a755e4e1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 16:53:16 2013 -0500
CHANGELOG update.
commit ec16c52f2ecf419c749175ce0a297441c10f1c68
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 16:41:16 2013 -0500
Updated INSTALL file (now redirects to website).
commit 0020ef7c82711a7ebf08e5174f939bee2563184c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 15:26:35 2013 -0500
Removed gemmtrsm-, trsm-specific blocksize macros.
Details:
- Modified gemmtrsm micro-kernel wrappers to use new aliased blocksize macros
instead of operation-specific ones.
- Removed local, gemmtrsm-specific blocksize macro definitions found in
micro-kernel header files.
(Meant to include above changes in 31b100e7bf4a.)
- Added comments to reference gemmtrsm micro-kernel wrapper implementation.
commit 1a9f427b85bb95aaa9e54c8ff8ecad8734b361ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 12 15:25:54 2013 -0500
Added/renamed alignment constants to _config.h.
Details:
- Added new memory alignment constants:
BLIS_HEAP_STRIDE_ALIGN_SIZE (previously assumed to be same as SYSTEM_MEM)
BLIS_CONTIG_ADDR_ALIGN_SIZE (previously assumed to be same as PAGE_SIZE)
BLIS_STACK_BUF_ALIGN_SIZE (previously not enforced)
and renamed existing ones
BLIS_SYSTEM_MEM_ALIGN_SIZE -> BLIS_HEAP_ADDR_ALIGN_SIZE
BLIS_CONTIG_MEM_ALIGN_SIZE -> BLIS_CONTIG_STRIDE_ALIGN_SIZE
to better convey what the alignment factor is used for (and what it is
not used for).
- Removed BLIS_ENABLE_SYSTEM_MEM_ALIGN. Dynamic memory alignment is now
disabled by setting BLIS_HEAP_STRIDE_ALIGN_SIZE to 1.
- Inserted instances of __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE)))
into macro-kernels to specify stack alignment of temporary buffers.
- Modified test suite driver to output new constants.
- Removed bli_align_dim_to_sys() and bli_align_dim_to_cmem(). Instead, we now
use bli_align_dim_to_size(), which takes a third argument (the desired
alignment).
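A three-argument alignment helper in the spirit of bli_align_dim_to_size() might be written as follows. This is a sketch with a hypothetical name, sketch_align_dim_to_size(), and it assumes the alignment is either 1 or a whole multiple of the element size; the real macro may differ.

```c
#include <stddef.h>

/* Hypothetical helper: round 'dim' up so that dim * elem_size bytes is a
   whole multiple of align_size. Assumes align_size is 1 or a multiple of
   elem_size. An align_size of 1 leaves the dimension unchanged. */
static size_t sketch_align_dim_to_size( size_t dim, size_t elem_size,
                                        size_t align_size )
{
    size_t bytes  = dim * elem_size;
    size_t padded = ( ( bytes + align_size - 1 ) / align_size ) * align_size;

    return padded / elem_size;
}
```

For example, five doubles (40 bytes) aligned to 32 bytes round up to 64 bytes, that is, a dimension of 8, while an alignment of 1 returns the dimension unchanged, which matches how alignment is disabled above.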
commit a77d10e87e3c0ab55ec14d74c285bc95c06285c3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 12 11:40:55 2013 -0500
Fixed a bug in axpyv/axpym when alpha is unit.
Details:
- Fixed bug whereby axpyv and axpym were incorrectly simplifying to a copy,
rather than an add, when alpha = 1. Thanks to Bryan Marker for identifying
this bug.
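The intended special-casing can be shown with a small sketch. The function sketch_daxpyv() is hypothetical and handles only unit-stride double-precision vectors; it is not the BLIS implementation.

```c
#include <stddef.h>

/* Hypothetical unit-stride axpyv: y := y + alpha * x. */
void sketch_daxpyv( size_t n, double alpha, const double* x, double* y )
{
    size_t i;

    /* alpha == 0: y is left unchanged. */
    if ( alpha == 0.0 ) return;

    /* alpha == 1: simplifies to an ADD (y := y + x), not a copy. */
    if ( alpha == 1.0 )
    {
        for ( i = 0; i < n; ++i ) y[i] += x[i];
        return;
    }

    /* General case. */
    for ( i = 0; i < n; ++i ) y[i] += alpha * x[i];
}
```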
commit 0495bd1d6de5995fe2fb79b321eec79e961eb7a5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 16:39:25 2013 -0500
Moved _POSIX_C_SOURCE def to compiler cmd line.
Details:
- Removed the #define of _POSIX_C_SOURCE in bli_config.h (for both reference
and clarksville configurations) and added "-D_POSIX_C_SOURCE=200112L" to
the compiler command line arguments in make_defs.mk (for both configs).
Thanks to Devin Matthews for suggesting this change.
commit d43d1a0a2ef6de4bc57627566aef8e3fdb458b8c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 16:28:17 2013 -0500
Appended 'f2c_' to abs, min, max macros in f2c.h.
Details:
- Renamed abs, min, max, dmin, and dmax macros in bli_f2c.h so that they
would not conflict with anything defined by the user (or the language).
Thanks to Devin Matthews for suggesting this fix.
- Updated all instances of the above macros accordingly.
commit 31b100e7bf4aeaa4ceafefd2b6c3102d5fbc4cbb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 11:11:52 2013 -0500
Added new kernel blocksize macro aliases.
Details:
- Added new macros that alias level-3 cache and register blocksize macros
to names that can be constructed via the PASTEMAC macro. These aliased
macro definitions live inside bli_kernel_macro_defs.h, which is now
#included after bli_kernel.h.
- Modified macro-kernels to use new aliased blocksize macros instead of
operation-specific ones.
- Removed local, operation-specific kernel blocksize macro definitions
(found in macro-kernel header files).
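The aliasing idea can be sketched as below: a blocksize macro is given a second name that token pasting can construct from a type character. The PASTEMAC definition, the alias names (bli_smr, bli_dmr), and the blocksize values shown here are assumptions for illustration and may not match bli_kernel_macro_defs.h.

```c
/* Assumed form of the pasting macro: builds bli_<ch><op>. */
#define PASTEMAC(ch,op) bli_ ## ch ## op

/* Illustrative per-type register blocksizes (values are made up). */
#define BLIS_DEFAULT_MR_S 8
#define BLIS_DEFAULT_MR_D 4

/* Hypothetical aliases whose names PASTEMAC can construct. */
#define bli_smr BLIS_DEFAULT_MR_S
#define bli_dmr BLIS_DEFAULT_MR_D

/* Type-generic code can now select the blocksize from the type
   character alone. PASTEMAC(d,mr) pastes to bli_dmr, which is then
   rescanned and expands to BLIS_DEFAULT_MR_D. */
static int sketch_mr_s( void ) { return PASTEMAC(s,mr); }
static int sketch_mr_d( void ) { return PASTEMAC(d,mr); }
```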
commit bd2b24ba65b36d7c07c5918a3838ce2ff57c4b48
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 10:35:39 2013 -0500
Updated CREDITS file.
commit 79328c15410215737f3f14cd069328cf52aa11fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 10:32:14 2013 -0500
Reverted testsuite object files' home to 'obj'.
Details:
- Removed 'obj' and 'lib' from .gitignore.
- Added testsuite/obj/.gitkeep (which is an empty file).
- Updated testsuite/Makefile accordingly.
- Thanks to Vernon Austel for pointing out the .gitkeep trick for tracking
empty directories in git.
commit 4afe3bfd82c03e1e97b58b7d250588a0d28541e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 9 17:45:39 2013 -0500
Renamed/moved object scalar constant macros.
Details:
- Replaced scalar constant macro definitions in bli_const_defs.h with a single,
simpler macro in bli_obj_macro_defs.h.
- Updated invocations of old macros accordingly.
- Removed bli_const_defs.h.
commit 357893f5be5c56ab7b062874005e77e614b23f06
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 9 14:48:15 2013 -0500
Applied fix from prev commit to gemmtrsm_?_ref_4x4
Details:
- Fixed hard-coded kernels in bli_gemmtrsm_l_ref_4x4.c and
bli_gemmtrsm_u_ref_4x4.c.
commit 54988e8dca44475610bcaee5a7bc1c40e8921402
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 19:08:43 2013 -0500
Fixed a performance bug in trsm.
Details:
- Fixed a bug in the reference implementations of the gemmtrsm wrappers
(bli_gemmtrsm_l_ref_mxn.c and bli_gemmtrsm_u_ref_mxn.c) whereby the
reference gemm microkernel was hard-coded, and thus always called, even
when GEMM_UKERNEL was defined to point to an optimized microkernel. This
manifested as artificially low trsm performance for all problem sizes, but
especially for small problem sizes as it only affected blocks of A that
intersected the diagonal. Thanks to Mike Kistler of IBM for helping me
find this bug.
commit a7252e40b5c351eef9a1df531ea0ef25cb5fb705
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 16:08:22 2013 -0500
Generate testsuite objects in 'src'.
Details:
- Tweaked the testsuite makefile so that object files are stored in 'src'
rather than 'obj', since (a) the top-level .gitignore dictates that
obj directories are to be ignored, and (b) git has problems
tracking empty directories. Now, users do not need to create their own
obj directories within their own local clones of BLIS.
commit 803871c55b60d3c225ad9a0607fa507a9c16aab7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 15:18:42 2013 -0500
Minor formatting changes.
commit a571af816d72727e16cad37007e7043b9d6fa362
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 15:00:13 2013 -0500
Fixed definition of bli_is_packed_object() macro.
Details:
- Changed the definition of bli_is_packed_object() so that it keys off of the
value of the pack schema bits in the info field of obj_t, rather than
comparing the obj_t buffer with that of the mem_t entry. This was the cause
of a very low probability bug whereby uninitialized memory caused the macro
to evaluate to TRUE even though the object in question was not packed.
Thanks to Vernon Austel of IBM for helping discover this bug.
- Changed an abort() in bli_packm_part() to a not-yet-implemented.
commit 3be14c32f735ecc6169d3ab6370cf8b69162acec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 6 12:54:45 2013 -0500
Updated information in testsuite output header.
Details:
- Added to the information that is echoed at the beginning of the test suite's
output, and also re-labeled some existing information.
commit 874707c1b183a4dd9a91dbfd4ea1522384c190df
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 5 17:19:43 2013 -0500
Fixed edge case handling bug in herk macrokernels.
Details:
- Fixed a bug present in bli_herk_l_ker_var2() and bli_herk_u_ker_var2() that
only manifests when BLIS is configured such that MR != NR. The bug involves
incorrectly detecting edge cases, which resulted in some parts of matrix C
potentially being skipped and not updated, depending on the problem size.
- Updated the default values of MR and NR in config/reference/bli_kernel.h to
8 and 4, respectively, so that I can better stress the framework on a
day-to-day basis. (The fact that they were both equal to 4 for so long is
why I did not stumble upon this bug much sooner.)
commit 7cbda15291d3e01300e71c286b9657b7ef0708bf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 4 15:25:43 2013 -0500
Added reference microkernels for arbitrary MR, NR.
Details:
- Added a new set of reference gemm, gemmtrsm, and trsm micro-kernels that
contain explicit loops over MR and NR, thus allowing them to be used
unmodified by developers who want to build a reference library with
custom register blocksizes.
- Changed config/reference/bli_kernel.h to use above ukernels by default.
- Changed interfaces of new and existing gemm, gemmtrsm, and trsm micro-kernels
to use 'restrict' keyword.
- Added -funroll-loops option to config/reference/make_defs.mk.
- Updated comments in bli_kernel.h describing constraints on register and
cache blocksizes.
- Updated _adds_mxn.h, _copys_mxn.h, and _xpbys_mxn.h macro files so that
single-char macros are also defined.
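
As a rough illustration of the first item above, a reference gemm micro-kernel with explicit loops over MR and NR might look like the following. This is a sketch under stated assumptions, not the BLIS source: the function name, argument order, packed layouts of a and b, and the default values of MR and NR are all hypothetical.

```c
#include <assert.h>

/* Register blocksizes; hypothetical defaults that can be overridden
   at compile time (e.g. -DMR=4 -DNR=4). */
#ifndef MR
#define MR 8
#endif
#ifndef NR
#define NR 4
#endif

/* Hypothetical reference micro-kernel: C := beta*C + alpha*A*B.
   a: k column slivers of MR contiguous elements (packed A).
   b: k row slivers of NR contiguous elements (packed B).
   c: MR x NR block with row stride rs_c and column stride cs_c. */
void gemm_ukr_ref_sketch(int k, double alpha,
                         const double *restrict a,
                         const double *restrict b,
                         double beta,
                         double *restrict c, int rs_c, int cs_c)
{
    double ab[MR * NR] = {0.0}; /* local accumulator, column-major */

    assert(k >= 0);

    /* Rank-1 update per iteration of k, looping explicitly over MR, NR. */
    for (int l = 0; l < k; ++l)
        for (int j = 0; j < NR; ++j)
            for (int i = 0; i < MR; ++i)
                ab[i + j * MR] += a[l * MR + i] * b[l * NR + j];

    /* Scale and accumulate into C. A zero beta overwrites C instead of
       multiplying, so stale values in C cannot propagate. */
    for (int j = 0; j < NR; ++j)
        for (int i = 0; i < MR; ++i) {
            double *cij = &c[i * rs_c + j * cs_c];
            double old = (beta == 0.0) ? 0.0 : beta * (*cij);
            *cij = old + alpha * ab[i + j * MR];
        }
}
```

Because the loop bounds are compile-time constants, an option such as -funroll-loops gives the compiler a chance to unroll the inner loops fully for whichever MR and NR are chosen.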
commit 6684b73d5501f91d24a79e26655a42819c9b3114
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 2 13:06:20 2013 -0500
Implemented amax operation and related changes.
Details:
- Implemented amax operation in BLIS.
- Activated BLAS2BLIS routine mapping for new amax BLIS implementation.
- Added integer support to [f]printv, [f]printm.
- Added integer support to level-0 copys macros.
- Updated printing of configuration information in test suite driver.
- Comment changes to _config.h files.
- Added comments to bla_dot.c to remind the reader what sdsdot()/dsdot() are
used for.
commit fb68087f8727cd5fd656a742a110e54fb1c91db9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 15:10:16 2013 -0500
More memory alignment-related tweaks.
Details:
- Renamed BLIS_MEMORY_ALIGNMENT_SIZE to BLIS_CONTIG_MEM_ALIGN_SIZE.
- Renamed BLIS_ENABLE_MEMORY_ALIGNMENT to BLIS_ENABLE_SYSTEM_MEM_ALIGN.
- Added BLIS_SYSTEM_MEM_ALIGN_SIZE, which controls only the alignment
passed into posix_memalign() or equivalent.
- Defined new function, bli_align_dim_to_cmem(), which applies the
contiguous memory alignment (rather than the system/malloc alignment).
commit 9682ef61dbf9a8846c8b0826d4de24bc216cd641
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 14:14:53 2013 -0500
Always define memory alignment size cpp constant.
Details:
- Removed guard around #define for memory alignment size constant.
Memory alignment should always be enabled, and so this value should
always be defined.
commit 3a787cccaae16531474f34398e3c0cf4f49b8cd8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 13:59:19 2013 -0500
Renamed memory alignment macro constant.
Details:
- Renamed all occurrences of BLIS_MEMORY_ALIGNMENT_BOUNDARY to
BLIS_MEMORY_ALIGNMENT_SIZE.
commit 37308f9a502b56d94fa52a7df71c676a46c3be3d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 12:43:14 2013 -0500
Align packed panel strides with system alignment.
Details:
- Pass panel strides through bli_align_dim_to_sys() to ensure that each
subsequent packed panel of A and B begins at an aligned address. (The
first panel is presumably aligned to system alignment because it is
aligned to a page boundary, which is typically much larger.)
- Rearranged code in packm_init_pack() to prevent additional conditional
blocks as a result of the aforementioned change.
- Adjusted contiguous memory allocator so that the system memory alignment
is used to allocate enough space for each block no matter what kind of
register blocking is used (even if register blocksize is unit and every
row/column needs maximal padding).
- Adjusted default blocksizes in reference configuration so that MC*KC
and KC*NC result in identical footprints for all datatypes.
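
The panel-stride alignment in the first item can be sketched as rounding each panel's byte footprint up to the alignment size. The helper names below are hypothetical (they are not bli_align_dim_to_sys()), and the sketch assumes the alignment is a multiple of the element size:

```c
#include <assert.h>
#include <stddef.h>

/* Round `value` up to the nearest multiple of `multiple`. */
size_t align_up_sketch(size_t value, size_t multiple)
{
    assert(multiple > 0);
    return ((value + multiple - 1) / multiple) * multiple;
}

/* Hypothetical panel stride, in elements, padded so that if the first
   packed panel starts on an align_bytes boundary, every subsequent
   panel does too. */
size_t panel_stride_sketch(size_t panel_dim, size_t k,
                           size_t elem_size, size_t align_bytes)
{
    assert(elem_size > 0 && align_bytes % elem_size == 0);

    size_t bytes = panel_dim * k * elem_size;
    return align_up_sketch(bytes, align_bytes) / elem_size;
}
```

For example, a 4-by-3 panel of doubles occupies 96 bytes; with a 64-byte alignment the stride is padded to 128 bytes, or 16 elements.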
commit 40a0654ada5f256beb3da80ebba015a3c71fb61f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 20:18:12 2013 -0500
CHANGELOG update.
commit b65cdc57d9e51fa00e3c03539cfb7e045707d0f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 20:01:49 2013 -0500
Migrated 'bl2' prefix to 'bli'.
Details:
- Changed all filename and function prefixes from 'bl2' to 'bli'.
- Changed the "blis2.h" header filename to "blis.h" and changed all
corresponding #include statements accordingly.
- Fixed incorrect association for Fran in CREDITS file.
commit 132bffcef7441f32d02cc7485aef6a0648e0ef1e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 18:49:36 2013 -0500
Removed several 'old' directories and files.
Details:
- Removed most of the 'old' directories scattered throughout the framework,
which includes alternate/half-baked/broken implementations.
commit 551ea4767a3ea6c263f12aaca94bc2642cee4cfa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 18:00:10 2013 -0500
Removed #include "blis2.h" from low-level headers.
Details:
- Removed #include of "blis2.h" from various lower-level, operation-specific
header files throughout the framework. Given that these low-level headers
are included within blis2.h in a very specific order, #include'ing blis2.h
within them directly is unnecessary.
commit bc7b318ed0960edeb4537797dd8c91de0d942ca9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 17:18:58 2013 -0500
Added cpp guards to conflicting libflame typedefs.
Details:
- Added cpp guards around the definitions of dim_t, scomplex, and dcomplex.
This is a temporary hack to allow interoperability with libflame. (Similarly
temporary changes are being made to libflame's type definitions file.)
commit f469907503fcdc24dff0174c569170e6e756e045
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:20:15 2013 -0500
Renamed MAX_PREFETCH_BYTE_OFFSET to MAX_PRELOAD_.
Details:
- Renamed BLIS_MAX_PREFETCH_BYTE_OFFSET to
BLIS_MAX_PRELOAD_BYTE_OFFSET since "prefetch" is kind of a loaded word
(e.g. "prefetch" instructions, which are different than the particular
kind of prefetching/preloading referred to by this constant).
commit d1023bfbc6668a58a01ee4f82ded2319911e7b19
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:09:59 2013 -0500
Removed build/old directory.
commit 718888849c48d99f83eea6b8f83bc1998cffef7e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:07:01 2013 -0500
Deprecated 'flame' configuration.
Details:
- Removed 'flame' configuration, as it was horribly out-of-date.
- Comment changes to bl2_blocksize.c and bl2_mem.c.
commit bba38cf4e9d28058c14483f44fa074a6d2852ad9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 19 18:07:40 2013 -0500
Added missing conjbeta argument to scald.
commit 1f82b51d06d0279dded3f2b87ba59403f3ed0af6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 18 15:37:20 2013 -0500
Relocated packed mem_t dimension fields to obj_t.
Details:
- Removed the m and n (and elem_size) fields from the mem_t object, and added
m_packed and n_packed fields to obj_t. These new fields track the same values
as the old ones. From an abstraction standpoint, it seemed awkward to store
those dimensions inside the mem_t.
- Updated interfaces to bl2_mem_acquire_*() so that only a byte size argument
is passed in, instead of m, n, and elem_size.
- Updated bl2_packm_init_pack() and bl2_packv_init_pack() to inline the
functionality of bl2_mem_alloc_update_m() and bl2_mem_alloc_update_v(),
respectively.
- Updated packm variants to access the packed length and width fields from
their new locations.
commit 36c782857bf9b8ac1b1dac47a70f689a4407e2cc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 18 10:37:03 2013 -0500
CHANGELOG update.
commit e7d41229d3b1674e74f47d7f29fae004a745201a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 15 17:12:36 2013 -0500
Re-implemented contiguous memory allocator.
Details:
- Completely re-wrote the contiguous memory allocator (bl2_mem.c). The new
allocator instantiates and initializes three separate memory pool objects,
each one associated with a separate array of contiguous memory blocks, each
block of fixed and uniform size. (The three pools are for allocating mc-by-kc
blocks of A, kc-by-nc panels of B, and mc-by-nc panels of C.) The pool
objects use a stack structure internally to track which blocks in the region
have been "checked out" to a thread and which are still available. Critical
regions are now clearly marked and adaptable to parallel environments (e.g.
OpenMP). Memory pools are set up when bl2_init() is called.
- Added a new field to the packm control tree node, which indicates what kind
of packed buffer is being allocated. The enumerated type for this argument
is defined as packbuf_t in bl2_type_defs.h.
- Updated level-3 _cntl.c files to pass in the appropriate value for a new
packbuf_t argument to bl2_packm_cntl_obj_create().
- Moved some macros called by packm_init_pack() from bl2_obj_macro_defs.h to
bl2_mem_macro_defs.h.
- Added BLIS_MAX_NUM_THREADS to bl2_config.h, which we use as the default
number of blocks of A reserved for the memory allocator.
- Deprecated bl2_align_dim(). Replaced usage with that of
bl2_align_dim_to_mult(). Turns out that typically we don't need to align
a dimension to the system alignment, since that value has to do with
starting addresses, whereas the values we are dealing with are unitless
dimensions.
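
The stack-based pool described in the first item of this commit can be sketched roughly as follows. This is an illustration only: the type and function names, the block count, and the block size are hypothetical and do not reflect the actual contents of bl2_mem.c.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_NUM_BLOCKS 4    /* hypothetical number of blocks in the pool */
#define POOL_BLOCK_SIZE 1024 /* hypothetical fixed, uniform block size */

typedef struct {
    unsigned char storage[POOL_NUM_BLOCKS][POOL_BLOCK_SIZE];
    void *free_stack[POOL_NUM_BLOCKS]; /* blocks still available */
    int top;                           /* number of entries on the stack */
} pool_sketch_t;

/* Push every block onto the stack; done once at library init time. */
void pool_init(pool_sketch_t *p)
{
    for (int i = 0; i < POOL_NUM_BLOCKS; ++i)
        p->free_stack[i] = p->storage[i];
    p->top = POOL_NUM_BLOCKS;
}

/* Check out a block, or return NULL if the pool is exhausted.
   In a threaded build this body would be a critical region. */
void *pool_checkout(pool_sketch_t *p)
{
    if (p->top == 0)
        return NULL;
    return p->free_stack[--p->top];
}

/* Return a block to the pool. Also a critical region when threaded. */
void pool_checkin(pool_sketch_t *p, void *block)
{
    assert(block != NULL && p->top < POOL_NUM_BLOCKS);
    p->free_stack[p->top++] = block;
}
```

Three such pools, one each for blocks of A, panels of B, and panels of C, would give the arrangement the commit describes. A stack makes both checkout and checkin constant-time and keeps the critical regions very short.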
commit 1e76cae00cb0a04544aaae1ade878686b238d283
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 15 12:21:42 2013 -0500
Perform her2k var1 loops in sequence.
Details:
- Changed variant 1 of her2k so that the two rank-k products are computed
and accumulated in sequence rather than fused into one loop. This is
necessary if BLIS is to be configured to provide only enough contiguous
memory for one panel of B.
commit c95c270eba91ae4efc26603beddfd0292caa919b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 7 14:42:15 2013 -0600
Enhanced tracking of dimensions for mem_t objects.
Details:
- Added new fields to mem_t struct definition to track the allocated (as
opposed to the currently used) dimensions of the memory region. This
allows packm_init() to be more robust in situations where memory is
already allocated but is more than needed for the current packing job.
- Updated logic in bl2_obj_set_buffer_with_cached_packm_mem() macro, used
in packm_init(), to update the "currently used" dimensions of the mem_t
object if the requested dimensions are smaller than the allocated
dimensions.
commit e99281a0f41d482fddeffa239bfc8e13e6d13d4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 7 14:00:10 2013 -0600
Fixed test suite flop formulas for ops with side.
Details:
- Fixed incorrect flop counts in test suite modules for hemm, symm, trmm,
trmm3, and trsm.
- Comment updates in herk macro-kernels.
commit ef8cbfc44dd620fdcbdb51cdb173217194bebe31
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 2 12:47:06 2013 -0600
Added "version" to .gitignore.
Details:
- Added "version" to .gitignore file so that the file does not show up when
running 'git status', or accidentally get pulled into the index when
running 'git add' or 'git add --all'.
commit e9e0747c2f6c178f53ac46ab794acbb7b8c4fea8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 2 12:43:54 2013 -0600
Removed version file from version control.
Details:
- Removed version file from version control to prevent git errors that occur
when trying to pull new commits.
commit bb612f864e9c17dd9805e9446840f02259619469
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 1 12:55:42 2013 -0600
Updated behavior of bl2_obj_induce_trans() macro.
Details:
- Changed bl2_obj_induce_trans() so that the transposition bit is no longer
updated as part of the macro. All current uses of the macro have been
coupled with instances of bl2_obj_set_trans() to clear the bit.
- Added Jed to CREDITS file.
commit f24e29b789e7314764a818ceb3063126936c986f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 18:15:41 2013 -0600
Replaced banded/packed BLAS2 stubs with f2c code.
Details:
- Retired the blas2blis wrappers that simply called abort with a "not yet
implemented" message. This includes all of the level-2 banded and packed
routines.
- Replaced the aforementioned with the corresponding netlib implementations
having been run through f2c (with some customization).
- Added directories named 'attic' to build/gen-make-frags/ignore_list.
commit 1454c1a14207766dfed372b8e38b47fa384f5198
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 12:38:45 2013 -0600
Moved Fortran name-mangling macro to bl2_config.h.
Details:
- Moved the Fortran-77 name-mangling macros from bl2_blas_macro_defs.h to the
configuration directory (bl2_config.h, specifically) given that it can be
expected to be tweaked by some developers.
commit ede75693e5a36c6006087c4a7df834175b604504
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 12:11:24 2013 -0600
Implemented blas2blis compatibility layer.
Details:
- Added the blas2blis compatibility layer, located in frame/compat. This
includes virtually all of the BLAS, including banded and packed level-2
operations.
- Defined bl2_init_safe(), bl2_finalize_safe(). The former allows a conditional
initialization, which stores the "exit status" in an err_t, which is then
read by the latter function to determine whether finalization should actually
take place.
- Added calls to bl2_init_safe(), bl2_finalize_safe() to all level-2 and
level-3 BLAS-like wrappers.
- Added configuration option to instruct BLIS to remain initialized whenever
it automatically initializes itself (via bl2_init_safe()), until/unless the
application code explicitly calls bl2_finalize().
- Added INSERT_GENTFUNC* and INSERT_GENTPROT* macros to facilitate type
templatization of blas2blis wrappers.
- Defined level-0 scalar macro bl2_??swaps().
- Defined level-1v operation bl2_swapv().
- Defined some "Fortran" types to bl2_type_defs.h for use with BLAS
wrappers.
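
The conditional initialization described for bl2_init_safe() and bl2_finalize_safe() can be sketched as below. All names are hypothetical stand-ins (the real functions use err_t and the library's own state), so treat this as an illustration of the pattern rather than the actual implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status type standing in for err_t. */
typedef enum {
    SKETCH_SUCCESS = 0,
    SKETCH_ALREADY_INITIALIZED = 1
} sketch_err_t;

static int g_initialized = 0;

void sketch_init(void)           { g_initialized = 1; }
void sketch_finalize(void)       { g_initialized = 0; }
int  sketch_is_initialized(void) { return g_initialized; }

/* Initialize only if nobody has done so yet, and record which case
   occurred so the matching finalize call can act accordingly. */
void sketch_init_safe(sketch_err_t *status)
{
    assert(status != NULL);
    if (g_initialized) {
        *status = SKETCH_ALREADY_INITIALIZED;
    } else {
        sketch_init();
        *status = SKETCH_SUCCESS;
    }
}

/* Finalize only if the paired sketch_init_safe() call was the one that
   actually performed the initialization. */
void sketch_finalize_safe(sketch_err_t status)
{
    if (status == SKETCH_SUCCESS)
        sketch_finalize();
}
```

A BLAS-like wrapper would call sketch_init_safe() on entry and sketch_finalize_safe() on exit, so that a library the application has already initialized is never torn down by the wrapper.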
commit 995edf43e21c1868732dbdd7fee14b08730218bd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 21 14:30:50 2013 -0600
Updated version file. (Forgot to in prev commit).
commit e823b08aaf7b65ecc6ddc30570709ea8a4b52aa7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 21 12:00:17 2013 -0600
Fixed some scalar types in BLAS-like Herm APIs.
Details:
- Some of the scalars of Hermitian operations, such as alpha in her,
alpha and beta in herk, and beta in her2k, need to be real. These
arguments were typed incorrectly as the complex types. This has been
fixed. Note the issue was only present in the BLAS-like APIs for
these operations (not the native object-based interfaces).
commit 5ece050a669e74ba4a711d1d4669239d22d45642
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 20 15:50:54 2013 -0600
Updated version file. (Forgot to in prev commit).
commit f243034b8b430d4684680ea8eddfd246e73fefc0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 20 14:11:36 2013 -0600
Changed API of packm_init_pack() to use blksz_t.
Details:
- Changed the interface of packm_init_pack() so that mult_m and mult_n
are passed in as type blksz_t* instead of dim_t.
- Made a similar change for packv_init_pack().
commit da0c22f24107be9f33e0ea2dae52e5534b1fd0e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 15 09:59:48 2013 -0600
Minor changes to lower levels of scalm and setm.
Details:
- Removed diagx parameter from lower-level interfaces of scalm.
- Modified scalm_basic_check() to expect an object with a nonunit diagonal.
- Changed setm_unb_var1() so that having an implicit unit diagonal results
in only the strictly lower or upper triangle of the matrix being modified.
commit 2c836adadcd2a7d7f217033ac4d7fcad03d5bd55
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 14 10:42:56 2013 -0600
Updated beta == zero semantics of mulsc.
Details:
- Updated beta == zero semantics of mulsc. Hopefully this is the last
operation that needed updating.
- Added Devin to CREDITS file.
commit 722b66c7dcaaaa1b109e7c8b1d53fd71a9af8240
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 14 10:18:00 2013 -0600
Removed some calls to setv() in test modules.
Details:
- Removed calls to setv() in test modules whose sole purpose was to
initialize vectors to zero to ensure that nan's and inf's would not
taint the computation. Now that beta == zero semantics have been
updated to clear the output operand (when beta is zero), rather than
multiply against it, these setv() calls are no longer needed.
commit e6ac623a902f776c42f85eadbf76996d9770a0db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 18:44:59 2013 -0600
Properly implemented beta == 0 semantics.
Details:
- Changed name of set0 and set0_mxn macros to set0s and set0s_mxn,
respectively.
- Added code to the following operations that sets the output operand to
zero if the corresponding scalar is zero (rather than performing the
floating-point multiply, or in the case of setv, copying the value).
This will prevent nan's and inf's from creeping into results from
uninitialized memory.
- axpy
- dotxv
- scalv
- scal2v
- setv
- gemv
- ger
- hemv
- her
- her2
- gemm reference ukernels
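
A minimal sketch of these semantics for a scalv-like operation, assuming double precision and a positive stride. The name scalv_sketch is hypothetical. Multiplying by zero does not clear a NaN or an infinity (0 * NaN and 0 * inf are both NaN under IEEE 754), so the zero case must store zeros explicitly:

```c
#include <assert.h>

/* Hypothetical scalv-style routine: x := beta * x. */
void scalv_sketch(int n, double beta, double *x, int incx)
{
    assert(n >= 0 && incx > 0);

    if (beta == 0.0) {
        /* Overwrite rather than multiply, so that NaN or inf values in
           uninitialized memory cannot survive. */
        for (int i = 0; i < n; ++i)
            x[i * incx] = 0.0;
        return;
    }

    for (int i = 0; i < n; ++i)
        x[i * incx] *= beta;
}
```

With this behavior, callers no longer need to zero an output operand before invoking an operation with a zero scalar.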
commit aedccbc85d491e41711a0c6eb0d246d8700a199a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 18:29:53 2013 -0600
Fixed stale interface to packm_unb_var1().
Details:
- Removed the control tree from the interface to packm_unb_var1(), which
I meant to do when it was un-deprecated.
commit c23135669f7a8a545e2e11ef559bf284be8bc65c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 13:21:00 2013 -0600
Un-deprecated packm_unb_var1.c (needed by l2 ops).
Details:
- Added bl2_packm_unb_var1() back into the mix once I realized that level-2
operations still need this routine for packing matrices. Now, whether
level-2 operations should be packing matrices to begin with is another
matter. But this fixes the segmentation fault one would have gotten when
running bl2_gemv() on a general stride matrix.
commit cf49e35f9819f9d93ebdca4703ade5abab28f6f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 18:39:35 2013 -0600
Removed cntl tree usage from packm implementation.
Details:
- Added new fields to obj_t info field:
- invert_diag
- pack_order_if_upper
- pack_order_if_lower
These fields allow packm_init() to embed information that begins
in the control tree into the object so that the packm implementation
does not need to use control trees at all. This is being done to aid
Bryan's DxT code generation.
- Added macros that operate on above fields.
- Changed packm_init(), packm_blk_var2(), and packm_blk_var3() according
to above changes.
- Made similar (but much simpler) changes to packv.
- Deprecated packm_blk_var1(), packm_unb_var1(), and packm_densify().
These were part of prototype implementations and are no longer needed.
commit eb139ae256651af7820b93ef982626180195b87f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 12:39:30 2013 -0600
Replaced bl2_abs() with _fabs() where appropriate.
commit 474bac30c99928f9e87315972bcb45c632c0b7ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 12:23:48 2013 -0600
Removed level-0 macros projrs, grabis.
Details:
- Replaced instances of projrs and grabis macros with newer,
more general-purpose getris.
commit 03a260a457c8964e4603a655cee0d40ac17affba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 11:45:34 2013 -0600
Restored executable permissions to scripts.
Details:
- Restored executable (0755) permissions to scripts that were touched by
the recursive sed script that updated the copyright headers in the
previous commit.
commit 1274e1243775e5e705114257a43176f63635227f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 14:37:47 2013 -0600
Updated copyright headers from 2012 to 2013.
commit 3b620cc8e90c53c79129bd9dd89ae6b77c2446f1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 13:38:07 2013 -0600
CHANGELOG update.
commit 768fcebaa8be0eb936a6e7a02cd8a19438c79d99
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 13:20:44 2013 -0600
Added unified test suite, and many fixes.
Details:
- Added a highly configurable, unified test suite.
- Removed DUPB configuration constant from bl2_kernel.h and macro-kernel
header files. Now, instead, DUPB is computed as (NDUP != 1) within each
macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into
incorrectly when DUPB was set to FALSE but the NDUP was still non-unit.
By encoding both pieces of information into one constant in _kernel.h,
it seems somewhat less likely others will encounter this bug in the
future.
- Added level-2 cache blocksizes to _kernel.h for reference configuration,
and defined blocksizes in _cntl.c files to these default values.
- Changed semantics of her2k and syr2k such that these operations no longer
expect the B matrix to already be conjugate-transposed (or just transposed
for syr2k). However, these semantics are preserved for the internal
mechanics of the implementations, including the internal back-end and all
blocked variants.
- Inserted checks for real-valued alpha and beta for herk/her2k and herk,
respectively.
- Relaxed general object structure constraints in _basic_check() for gemv, ger.
- Changed her front-end to NOT copy-cast to real projection; instead, this is
replaced by selecting either the real part or both parts within the unblocked
algorithm implementation, depending on the value of conjh.
- Added conjh to all _check routines for her so that the code knows when to
verify that alpha has an imaginary component equal to zero (for her, but
not syr).
- Changed control tree for her to forgo packing.
- Added unit diagonal support to fnormm.
- Redefined real versions of abval2s macros in terms of fabs(), fabsf().
- Redefined complex versions of sqrt2s macros using the actual "complex square
root" formula.
- Created new level-0 object-based routines, suffixed with "sc" (for "scalar").
- Defined new level-1v, -1d, and -1m versions of add and sub operations
(two-operand add and subtract).
- Added new scalar macros:
- getris: acquire real and imaginary components.
- setris: set real and imaginary components.
- addjs: addition with conjugated x.
- subjs: subtraction with conjugated x.
- Defined new utility operations:
- absumv: element-wise sum of absolute values for vector elements.
- absumm: element-wise sum of absolute values for matrix elements.
- mkherm: convert existing matrix to Hermitian.
- mksymm: convert existing matrix to symmetric.
- mktrim: convert existing matrix to triangular.
- Added various error checking routines.
- Added bl2_clock_min_diff(), which is used to more cleanly measure the
wall clock time of a code block.
- Added general stride support to bl2_obj_alloc_buffer().
- Added bl2_obj_init_scalar().
- Updated parameter mapping in bl2_param_map.c.
- Added support for queriable version string.
- Fixed a bug in the her2k macro-kernels (which currently are simply
implemented in terms of two invocations of herk) whereby beta was being
applied to both the first and second rank-k updates, rather than only
the first.
- Fixed a bug in trmm/trsm whereby transpose and right side cases were not
properly implemented due to erroneous assumptions regarding aliasing and
root objects.
- Fixed a bug in the upper triangular trsm macro-kernel in which the wrong
MR x NR block of B was being updated.
- Fixed a bug in the inverts macro in the double real case whereby the
value was typecast to float before inversion. This affected non-unit cases
of dtrsm.
- Fixed a bug in the reference kernels for gemmtrsm whereby the minus one
constant was being applied incorrectly.
- Fixed a bug in the overall treatment of non-unit alpha for trsm. The code
now mimics the rank-k strategy of gemm, whereby alpha is applied during
the first iteration of variant 3, with BLIS_ONE passed in instead for
subsequent iterations. This also required passing alpha into the macro-
kernels as well as the fused gemmtrsm micro-kernels.
- Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being
called for blocks strictly above the diagonal. While this sounds good in
theory, this cannot be done because gemm_ker_var2 expects row panels of
A to be packed from top to bottom, while for trsm_u, A is actually packed
from bottom to top due to the reverse (BR->TL) nature of the algorithm.
- Fixed a bug in packm_cxk() whereby panel packings with unit panel
dimensions were mishandled due to incorrect arguments to the copyv kernel.
Also changed the copyv kernel invocation to scal2v so that these edge
cases are properly handled when scaling is requested.
- Fixed a bug in packv_int() whereby an uninitialized object is passed in
instead of the source object.
- Fixed a bug whereby level-2 code could allocate memory dynamically via
bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed
a potential future bug whereby a mem_t object that is actually no longer
"allocated" from the static pool is mistaken for being allocated due to
failure to NULLify the buffer when the block was most recently released.
- Fixed a bug in bl2_acquire_mpart_*() whereby the uplo field was mistakenly
toggled when the requested subpartition needed to be "reflected" due to it
residing in an unstored region.
commit be94fb84c0351602d7585269f29998e3bf83f899
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 4 10:55:21 2013 -0600
Added missing 'd' to fused gemmtrsm function name.
commit 879a179e1dee36f0c56765f2ab91a26861019b34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 4 10:37:27 2013 -0600
Added debug statements to bl2_mm_acquire_m().
Details:
- Added printf() statements to bl2_mm_acquire_m() to help debug issues
with prematurely exhausted memory pool.
- Removed 'd' from kernel names of reference kernels in clarksville
configuration's bl2_kernel.h
commit 806e74beb4eafeef620a555ffbb3f6779e29c7b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 17:07:50 2012 -0600
Defined Frobenius norm operations.
Details:
- Added level-0 grabis macro operation to grab imaginary component of one
variable and copy it to the real component of another variable.
- Defined sumsqv operation, which computes the sum of the absolute squares
of the elements of a vector. This implementation is modeled after ?lassq
in netlib LAPACK.
- Defined fnormv and fnormm operations, which compute the Frobenius norm on
vectors and matrices, respectively. These operations are treated as one-
operand operations where the output norm value is the real projection of
the datatype of the input operand. Both operations are implemented in terms
of sumsqv.
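
The scaled sum-of-squares idea behind sumsqv can be sketched as follows, in the style of LAPACK's ?lassq. The function name and signature are hypothetical, and NaN handling is omitted. The routine maintains scale and sumsq such that scale^2 * sumsq equals the running sum of squares, which avoids overflow and underflow in the intermediate squares:

```c
#include <assert.h>

/* Hypothetical lassq-style update. On entry, *scale and *sumsq describe
   a running total scale^2 * sumsq; start with scale = 0, sumsq = 1.
   On exit they also include the squares of the n elements of x. */
void sumsqv_sketch(int n, const double *x, int incx,
                   double *scale, double *sumsq)
{
    assert(n >= 0 && incx > 0);

    for (int i = 0; i < n; ++i) {
        double xi = x[i * incx];
        double absx = (xi < 0.0) ? -xi : xi;

        if (absx == 0.0)
            continue; /* zeros contribute nothing */

        if (*scale < absx) {
            /* New largest magnitude: rescale the accumulated sum. */
            double r = *scale / absx;
            *sumsq = 1.0 + *sumsq * r * r;
            *scale = absx;
        } else {
            double r = absx / *scale;
            *sumsq += r * r;
        }
    }
}
```

The Frobenius norm then follows as scale * sqrt(sumsq). For x = (3, 4) the routine ends with scale = 4 and sumsq = 1.5625, giving a norm of 5.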
commit 66e80ce1aec099b2b2b0c4f295e38add2c921383
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 17:02:55 2012 -0600
Added GENT*R macros; tweaked bl2_machval defs.
Details:
- Added function and prototype macro-generating macros for GENTFUNCR and
GENTPROTR, which are one-operand macros with auxiliary real projection
types.
- Tweaked bl2_machval files to use new macros.
commit 2fecc88ca22142020573f168da715e8e9f3dd7de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 11:35:14 2012 -0600
Fixed harmless macro bug in level-1m operations.
Details:
- Fixed some inconsistent usage of n_iter_max and n_iter in the two
bl2_set_dims_incs_uplo_[12]m macros. The right thing ended up happening
despite the bug, which is why I had not discovered it until now.
commit 8945db6ec9f82168cf72411ad408b4fdb44ae0d1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 15:07:36 2012 -0600
Renamed x86,x86_64 kernels to indicate 'd' fusing.
Details:
- Renamed x86 and x86_64 kernels to contain a 'd' before the fusing shape
to emphasize that the fusing shape is not for all datatype instances, but
rather just for one (that of double-precision real). Other fusing shapes
would be proportional to their precision and domain "byte footprints".
- Corresponding changes to config/clarksville/bl2_kernel.h.
commit 6fbbdd4e194d06096ad08c5db61127be338067db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 14:34:02 2012 -0600
More tweaks to _config.h, _kernel.h; smem tweaks.
Details:
- Moved kernel-related definitions from bl2_config.h to bl2_kernel.h.
- Replaced #define of _GNU_SOURCE with #define of _POSIX_C_SOURCE. This
accomplishes the same thing (enabling posix_memalign()) without enabling
all of the GNU extensions we don't need.
- Defined the size of the static memory pool in terms of MC, KC, and NC,
as well as two new constants that determine how many MCxKC blocks and
how many KCxNC blocks should be allocated (defined in bl2_config.h).
- In the case of static memory pool exhaustion, replaced the generic
bl2_abort() with a specific error code call.
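As an illustration of the _POSIX_C_SOURCE change above, here is a minimal sketch of requesting posix_memalign() through a POSIX feature-test macro rather than _GNU_SOURCE. The value 200112L and the helper name alloc_aligned_sketch are assumptions made for this example and are not taken from the BLIS sources.

```c
/* The feature-test macro must be defined before any system header is
   included. 200112L (POSIX.1-2001) is an assumed value for this sketch. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns size bytes aligned to align, or NULL on
   failure. align must be a power of two and a multiple of sizeof(void *). */
static void *alloc_aligned_sketch(size_t align, size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    return p;                         /* release with free() */
}
```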
commit 5d8bdb21c48e8fb11bef6128a242122cc1470a99
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 17 16:07:36 2012 -0600
Minor reordering of bl2_config.h definitions.
commit 4a83f67490136a898f558e273b76a687aed8b893
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 17 12:35:54 2012 -0600
Consolidated configuration headers.
Details:
- Merged contents of bl2_arch.h into bl2_config.h for reference and
clarksville configurations.
- Updated CREDITS, INSTALL, LICENSE, README files.
commit 0670c33cc14612f636ef09ede4133404ae0af6ba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 14 12:45:26 2012 -0600
Fixed bug in reference gemm ukernels.
Details:
- Fixed a bug whereby, for the reference gemm ukernels, the matrix product
was not correctly accumulated and scaled (by alpha) into the output matrix
C. (Thanks to Fran for finding this bug.)
- Whitespace changes to reference trsm kernels.
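For context, the accumulate-and-scale step mentioned above can be sketched as below. This is not the BLIS reference micro-kernel. The name gemm_ukr_sketch, the block sizes MR and NR, the packing layout, and the argument list are all assumptions chosen for illustration. The product of the two packed micro-panels is first accumulated into a local buffer, then scaled by alpha and added to beta times C.

```c
#include <stddef.h>

enum { MR = 4, NR = 4 };  /* hypothetical register block dimensions */

/* Hypothetical sketch: C := beta * C + alpha * A * B for one MR x NR block.
   a holds an MR x k micro-panel stored column by column; b holds a
   k x NR micro-panel stored row by row; c uses row stride rs_c and
   column stride cs_c. */
static void gemm_ukr_sketch(size_t k, double alpha,
                            const double *a, const double *b,
                            double beta, double *c,
                            size_t rs_c, size_t cs_c)
{
    double ab[MR * NR] = { 0.0 };

    /* Accumulate the product as a sequence of k rank-1 updates. */
    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                ab[i + j * MR] += a[i + p * MR] * b[j + p * NR];

    /* Scale by alpha and accumulate into the output block. */
    for (size_t j = 0; j < NR; ++j)
        for (size_t i = 0; i < MR; ++i) {
            double *cij = &c[i * rs_c + j * cs_c];
            *cij = beta * (*cij) + alpha * ab[i + j * MR];
        }
}
```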
commit e2e7cb2fbe615be4d375bc2dce88d03d98fadc9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 13 18:17:54 2012 -0600
Expanded reference packm/unpackm kernel set to 16.
Details:
- Added 10xk, 12xk, 14xk, and 16xk reference kernels for packm and
unpackm.
- Updated bl2_[un]packm_cxk() to silently use scal2m if an "out of range"
kernel size is requested. (Thanks to Tyler for finding this bug.)
- Updated bl2_kernel.h to contain new _KERNEL definitions, according
to above changes, for 'reference' and 'clarksville' configurations.
- Updated CHANGELOG.
- Removed "output*.m" from .gitignore.
commit 17455a8bce038dd570356ab0c5c11d9a89f20248
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 17:23:32 2012 -0600
Minor updates towards 0.0.1.
commit 7ad4ebef38b8e6eea9b6091844ba7294ec870271
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 16:18:40 2012 -0600
Tweaks to get BLIS compiling again on clarksville.
Details:
- Updated header files and make_defs.mk in config/clarksville.
- Fixes to bl2_mem.c (now that SMEM_M, SMEM_N are gone).
- Moved definition of blksz_t from bl2_cntl.h to bl2_type_defs.h.
- Shuffled include statements in blis2.h.
commit cc58ea86010b1f046134d13b546c878389df9af5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 14:55:12 2012 -0600
Added template fragment.mk; updated .gitignore.
commit 714c527b0eb153b7e2040b79349edc8372f743fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 19:54:04 2012 -0600
Added 'changelog' make target; other tweaks.
Details:
- Updated CHANGELOG.
- Added 'changelog' target to Makefile that runs 'git log --decorate' and
overwrites CHANGELOG with the output.
- Other trivial changes.
commit e4e5404d26aded4873278e85faf6f14ac32115b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 17:34:53 2012 -0600
Define static memory pool size in bl2_config.h.
commit 19bb507d0de6a2bd3ce37cf616bdcd6b419ed641
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 17:18:00 2012 -0600
Refined INSTALL text; added 'showconfig' target.
Details:
- Added 'showconfig' target to Makefile.
- Added header files and ./config/<configname>/make_defs.mk as prerequisites
to object file rules.
- Added config.mk as prerequisite to library install rules.
- Edited and added to INSTALL file.
commit 26cb659dd79636489db5a051aa60fff80273a7b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 15:34:53 2012 -0600
Added auto-detection of version string (via git).
Details:
- Added build/update-version-file.sh script for auto-detecting "version"
string and updating 'version' file accordingly. (If .git directory is
not present, then it is assumed this copy of BLIS is a downloaded
release, in which case 'version' file is left unchanged.)
- Added invocation of update-version-file.sh to configure script.
commit b0ecd0ff52fa6ffc9e1d9eb44c365f7f009a6204
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 14:27:11 2012 -0600
Wrote first draft of INSTALL file.
commit bcbe81235a35ccfdbcc2f2319a0ca6e04f75a785
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 12:42:35 2012 -0600
Updated standalone test Makefile and other fixes.
Details:
- Major edits to test/Makefile to bring up-to-date wrt new build system;
should no longer be broken.
- Minor edits to top-level Makefile.
- Fixed copy-and-paste bugs in
- frame/1m/packm/ukernels/bl2_packm_ref_?xk.c
- frame/1m/unpackm/ukernels/bl2_unpackm_ref_?xk.c
commit 2f272b40f43307909736327f49d17737c7a05d37
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 4 19:22:14 2012 -0600
Added build system and continued reorganization.
Details:
- Added/renamed packm, unpackm kernels.
- Added machine value routines.
- Added param_map facility.
- Renamed AUTHORS to CREDITS.
- Added Makefile; continued to expand upon existing configure script.
- #define fuse_fac macros in operation headers if not defined already
(by the user in bl2_kernels.h).
commit 00f3498a8943be1b387f0d5c029c8c7891687ad5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 3 12:36:11 2012 -0600
Initial commit.