cimport libc.stdio
from libc.stdlib cimport malloc, calloc, free
from cpython.mem cimport PyMem_Malloc, PyMem_Free
from libc.string cimport memcpy, memset, strncmp
cimport numpy as np
import numpy as np
import sys
from cpython cimport Py_INCREF, PyNumber_Index
from cpython.object cimport Py_EQ, Py_NE
from cpython.slice cimport PySlice_GetIndicesEx
def api_version():
"""api_version()
"""
# (library version, module version)
return (GPUARRAY_API_VERSION, 0)
def abi_version():
"""abi_version()
"""
    major_version = GPUARRAY_ABI_VERSION // 1000
minor_version = GPUARRAY_ABI_VERSION % 1000
return (major_version, minor_version)
np.import_array()
# to export the numeric value
SIZE = GA_SIZE
SSIZE = GA_SSIZE
# Numpy API steals dtype references and this breaks cython
cdef object PyArray_Empty(int a, np.npy_intp *b, np.dtype c, int d):
Py_INCREF(c)
return _PyArray_Empty(a, b, c, d)
cdef bytes _s(s):
if isinstance(s, unicode):
return (<unicode>s).encode('ascii')
if isinstance(s, bytes):
return s
raise TypeError("Expected a string")
cdef size_t countis(l, object val):
cdef size_t count
cdef size_t i
count = 0
for i in range(len(l)):
if l[i] is val:
count += 1
return count
def cl_wrap_ctx(size_t ptr):
"""
cl_wrap_ctx(ptr)
Wrap an existing OpenCL context (the cl_context struct) into a
GpuContext class.
"""
cdef gpucontext *(*cl_make_ctx)(void *, int)
cdef GpuContext res
cl_make_ctx = <gpucontext *(*)(void *, int)>gpuarray_get_extension("cl_make_ctx")
if cl_make_ctx == NULL:
raise RuntimeError, "cl_make_ctx extension is absent"
res = GpuContext.__new__(GpuContext)
res.ctx = cl_make_ctx(<void *>ptr, 0)
if res.ctx == NULL:
raise RuntimeError, "cl_make_ctx call failed"
return res
def cuda_wrap_ctx(size_t ptr, bint own):
"""
    cuda_wrap_ctx(ptr, own)
Wrap an existing CUDA driver context (CUcontext) into a GpuContext
class.
If `own` is true, libgpuarray is now responsible for the context and
it will be destroyed once there are no references to it.
Otherwise, the context will not be destroyed and it is the calling
code's responsibility.
"""
cdef gpucontext *(*cuda_make_ctx)(void *, int)
cdef int flags
cdef GpuContext res
cuda_make_ctx = <gpucontext *(*)(void *, int)>gpuarray_get_extension("cuda_make_ctx")
if cuda_make_ctx == NULL:
raise RuntimeError, "cuda_make_ctx extension is absent"
res = GpuContext.__new__(GpuContext)
flags = 0
if not own:
flags |= GPUARRAY_CUDA_CTX_NOFREE
res.ctx = cuda_make_ctx(<void *>ptr, flags)
if res.ctx == NULL:
raise RuntimeError, "cuda_make_ctx call failed"
return res
import numpy
cdef dict NP_TO_TYPE = {
np.dtype('bool'): GA_BOOL,
np.dtype('int8'): GA_BYTE,
np.dtype('uint8'): GA_UBYTE,
np.dtype('int16'): GA_SHORT,
np.dtype('uint16'): GA_USHORT,
np.dtype('int32'): GA_INT,
np.dtype('uint32'): GA_UINT,
np.dtype('int64'): GA_LONG,
np.dtype('uint64'): GA_ULONG,
np.dtype('float32'): GA_FLOAT,
np.dtype('float64'): GA_DOUBLE,
np.dtype('complex64'): GA_CFLOAT,
np.dtype('complex128'): GA_CDOUBLE,
np.dtype('float16'): GA_HALF,
}
cdef dict TYPE_TO_NP = dict((v, k) for k, v in NP_TO_TYPE.items())
def register_dtype(np.dtype dtype, cname):
"""
register_dtype(dtype, cname)
Make a new type known to the cluda machinery.
    This function returns the associated internal typecode for the new
    type.
Parameters
----------
dtype: numpy.dtype
new type
cname: str
C name for the type declarations
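
    Examples
    --------
    A minimal sketch (``vec2`` is a hypothetical struct type, assumed to
    be declared to the backend under that C name):

    >>> import numpy as np
    >>> vec2 = np.dtype([('x', np.float32), ('y', np.float32)])
    >>> code = register_dtype(vec2, 'vec2')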
"""
cdef gpuarray_type *t
cdef int typecode
cdef char *tmp
t = <gpuarray_type *>malloc(sizeof(gpuarray_type))
if t == NULL:
raise MemoryError, "Can't allocate new type"
tmp = <char *>malloc(len(cname)+1)
if tmp == NULL:
free(t)
raise MemoryError
memcpy(tmp, <char *>cname, len(cname)+1)
t.size = dtype.itemsize
t.align = dtype.alignment
t.cluda_name = tmp
typecode = gpuarray_register_type(t, NULL)
if typecode == -1:
free(tmp)
free(t)
raise RuntimeError, "Could not register type"
NP_TO_TYPE[dtype] = typecode
    TYPE_TO_NP[typecode] = dtype
    return typecode
cdef np.dtype typecode_to_dtype(int typecode):
res = TYPE_TO_NP.get(typecode, None)
if res is not None:
return res
else:
raise NotImplementedError, "TODO"
# This function takes a flexible dtype as accepted by the functions of
# this module and ensures it becomes a numpy dtype.
cdef np.dtype dtype_to_npdtype(dtype):
if dtype is None:
return None
if isinstance(dtype, int):
return typecode_to_dtype(dtype)
try:
return np.dtype(dtype)
except TypeError:
pass
if isinstance(dtype, np.dtype):
return dtype
raise ValueError("data type not understood", dtype)
# This is a stupid wrapper to avoid the extra argument introduced by having
# dtype_to_typecode declared 'cpdef'.
cdef int get_typecode(dtype) except -1:
return dtype_to_typecode(dtype)
cpdef int dtype_to_typecode(dtype) except -1:
"""
dtype_to_typecode(dtype)
Get the internal typecode for a type.
Parameters
----------
dtype: numpy.dtype
type to get the code for
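
    Examples
    --------
    A minimal sketch; the exact numeric value of a typecode is an
    internal detail:

    >>> code = dtype_to_typecode('float32')
    >>> isinstance(code, int)
    True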
"""
if isinstance(dtype, int):
return dtype
try:
dtype = np.dtype(dtype)
except TypeError:
pass
if isinstance(dtype, np.dtype):
res = NP_TO_TYPE.get(dtype, None)
if res is not None:
return res
raise ValueError, "don't know how to convert to dtype: %s"%(dtype,)
def dtype_to_ctype(dtype):
"""
dtype_to_ctype(dtype)
Return the C name for a type.
Parameters
----------
dtype: numpy.dtype
type to get the name for
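
    Examples
    --------
    A minimal sketch:

    >>> dtype_to_ctype('float32')
    'float'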
"""
cdef int typecode = dtype_to_typecode(dtype)
cdef const gpuarray_type *t = gpuarray_get_type(typecode)
cdef bytes res
if t.cluda_name == NULL:
raise ValueError, "No mapping for %s"%(dtype,)
res = t.cluda_name
return res.decode('ascii')
cdef ga_order to_ga_order(ord) except <ga_order>-2:
if ord == "C" or ord == "c":
return GA_C_ORDER
elif ord == "A" or ord == "a" or ord is None:
return GA_ANY_ORDER
elif ord == "F" or ord == "f":
return GA_F_ORDER
else:
raise ValueError, "Valid orders are: 'A' (any), 'C' (C), 'F' (Fortran)"
cdef int strides_ok(GpuArray a, strides):
# Check that the passed in strides will not go outside of the
# memory of the array. It is assumed that the strides are of the
# proper length.
cdef ssize_t max_axis_offset
cdef size_t lower = a.ga.offset
cdef size_t upper = a.ga.offset
cdef size_t itemsize = gpuarray_get_elsize(a.ga.typecode)
cdef size_t size
cdef unsigned int i
gpudata_property(a.ga.data, GA_BUFFER_PROP_SIZE, &size)
for i in range(a.ga.nd):
if a.ga.dimensions[i] == 0:
return 1
max_axis_offset = <ssize_t>(strides[i]) * <ssize_t>(a.ga.dimensions[i] - 1)
if max_axis_offset > 0:
if upper + max_axis_offset > size:
return 0
upper += max_axis_offset
else:
if lower < <size_t>(-max_axis_offset):
return 0
lower += max_axis_offset
return (upper + itemsize) <= size
class GpuArrayException(Exception):
"""
Exception used for most errors related to libgpuarray.
"""
class UnsupportedException(GpuArrayException):
pass
cdef type get_exc(int errcode):
if errcode == GA_VALUE_ERROR:
return ValueError
if errcode == GA_DEVSUP_ERROR:
return UnsupportedException
else:
return GpuArrayException
cdef bint py_CHKFLAGS(GpuArray a, int flags):
return GpuArray_CHKFLAGS(&a.ga, flags)
cdef bint py_ISONESEGMENT(GpuArray a):
return GpuArray_ISONESEGMENT(&a.ga)
cdef void array_fix_flags(GpuArray a):
GpuArray_fix_flags(&a.ga)
cdef int array_empty(GpuArray a, gpucontext *ctx,
int typecode, unsigned int nd, const size_t *dims,
ga_order ord) except -1:
cdef int err
err = GpuArray_empty(&a.ga, ctx, typecode, nd, dims, ord)
if err != GA_NO_ERROR:
raise get_exc(err), gpucontext_error(ctx, err)
cdef int array_fromdata(GpuArray a,
gpudata *data, size_t offset, int typecode,
unsigned int nd, const size_t *dims,
const ssize_t *strides, int writeable) except -1:
cdef int err
err = GpuArray_fromdata(&a.ga, data, offset, typecode, nd, dims,
strides, writeable)
if err != GA_NO_ERROR:
raise get_exc(err), gpucontext_error(gpudata_context(data), err)
cdef int array_view(GpuArray v, GpuArray a) except -1:
cdef int err
err = GpuArray_view(&v.ga, &a.ga)
if err != GA_NO_ERROR:
raise get_exc(err), GpuArray_error(&a.ga, err)
cdef int array_sync(GpuArray a) except -1:
cdef int err
with nogil:
err = GpuArray_sync(&a.ga)
if err != GA_NO_ERROR:
raise get_exc(err), GpuArray_error(&a.ga, err)
cdef int array_index(GpuArray r, GpuArray a, const ssize_t *starts,
const ssize_t *stops, const ssize_t *steps) except -1:
cdef int err
err = GpuArray_index(&r.ga, &a.ga, starts, stops, steps)
if err != GA_NO_ERROR:
raise get_exc(err), GpuArray_error(&a.ga, err)
cdef int array_take1(GpuArray r, GpuArray a, GpuArray i,
int check_err) except -1:
cdef int err
err = GpuArray_take1(&r.ga, &a.ga, &i.ga, check_err)
if err != GA_NO_ERROR:
if err == GA_VALUE_ERROR:
raise IndexError, GpuArray_error(&r.ga, err)
raise get_exc(err), GpuArray_error(&r.ga, err)
cdef int array_setarray(GpuArray v, GpuArray a) except -1:
cdef int err
err = GpuArray_setarray(&v.ga, &a.ga)
if err != GA_NO_ERROR:
raise get_exc(err), GpuArray_error(&v.ga, err)
cdef int array_reshape(GpuArray res, GpuArray a, unsigned int nd,
const size_t *newdims, ga_order ord,
bint nocopy) except -1:
cdef int err
err = GpuArray_reshape(&res.ga, &a.ga, nd, newdims, ord, nocopy)
if err != GA_NO_ERROR:
raise get_exc(err), GpuArray_error(&a.ga, err)
cdef int array_transpose(GpuArray res, GpuArray a,
const unsigned int *new_axes) except -1:
cdef int err
err = GpuArray_transpose(&res.ga, &a.ga, new_axes)
if err != GA_NO_ERROR:
raise get_exc(err), GpuArray_error(&a.ga, err)
cdef int array_clear(GpuArray a) except -1:
GpuArray_clear(&a.ga)
cdef bint array_share(GpuArray a, GpuArray b):
return GpuArray_share(&a.ga, &b.ga)
cdef gpucontext *array_context(GpuArray a) except NULL:
cdef gpucontext *res
res = GpuArray_context(&a.ga)
if res is NULL:
raise GpuArrayException, "Invalid array or destroyed context"
return res
cdef int array_move(GpuArray a, GpuArray src) except -1:
cdef int err
err = GpuArray_move(&a.ga, &src.ga)
if err != GA_NO_ERROR:
raise get_exc(err), GpuArray_error(&a.ga, err)
cdef int array_write(GpuArray a, void *src, size_t sz) except -1:
cdef int err
with nogil:
err = GpuArray_write(&a.ga, src, sz)
if err != GA_NO_ERROR:
raise get_exc(err), GpuArray_error(&a.ga, err)
cdef int array_read(void *dst, size_t sz, GpuArray src) except -1:
cdef int err
with nogil:
err = GpuArray_read(dst, sz, &src.ga)
if err != GA_NO_ERROR:
raise get_exc(err), GpuArray_error(&src.ga, err)
cdef int array_memset(GpuArray a, int data) except -1:
cdef int err
err = GpuArray_memset(&a.ga, data)
if err != GA_NO_ERROR:
raise get_exc(err), GpuArray_error(&a.ga, err)
cdef int array_copy(GpuArray res, GpuArray a, ga_order order) except -1:
cdef int err
err = GpuArray_copy(&res.ga, &a.ga, order)
if err != GA_NO_ERROR:
raise get_exc(err), GpuArray_error(&a.ga, err)
cdef int array_transfer(GpuArray res, GpuArray a) except -1:
cdef int err
with nogil:
err = GpuArray_transfer(&res.ga, &a.ga)
if err != GA_NO_ERROR:
raise get_exc(err), GpuArray_error(&a.ga, err)
cdef int array_split(_GpuArray **res, GpuArray a, size_t n, size_t *p,
unsigned int axis) except -1:
cdef int err
err = GpuArray_split(res, &a.ga, n, p, axis)
if err != GA_NO_ERROR:
raise get_exc(err), GpuArray_error(&a.ga, err)
cdef int array_concatenate(GpuArray r, const _GpuArray **a, size_t n,
unsigned int axis, int restype) except -1:
cdef int err
err = GpuArray_concatenate(&r.ga, a, n, axis, restype)
if err != GA_NO_ERROR:
raise get_exc(err), GpuArray_error(a[0], err)
cdef const char *kernel_error(GpuKernel k, int err) except NULL:
return gpucontext_error(gpukernel_context(k.k.k), err)
cdef int kernel_init(GpuKernel k, gpucontext *ctx,
unsigned int count, const char **strs, const size_t *len,
const char *name, unsigned int argcount, const int *types,
int flags) except -1:
cdef int err
cdef char *err_str = NULL
err = GpuKernel_init(&k.k, ctx, count, strs, len, name, argcount,
types, flags, &err_str)
if err != GA_NO_ERROR:
if err_str != NULL:
try:
py_err_str = err_str.decode('UTF-8')
finally:
free(err_str)
raise get_exc(err), py_err_str
raise get_exc(err), gpucontext_error(ctx, err)
cdef int kernel_clear(GpuKernel k) except -1:
GpuKernel_clear(&k.k)
cdef gpucontext *kernel_context(GpuKernel k) except NULL:
cdef gpucontext *res
res = GpuKernel_context(&k.k)
if res is NULL:
raise GpuArrayException, "Invalid kernel or destroyed context"
return res
cdef int kernel_sched(GpuKernel k, size_t n, size_t *gs, size_t *ls) except -1:
cdef int err
err = GpuKernel_sched(&k.k, n, gs, ls)
if err != GA_NO_ERROR:
raise get_exc(err), kernel_error(k, err)
cdef int kernel_call(GpuKernel k, unsigned int n, const size_t *gs,
const size_t *ls, size_t shared, void **args) except -1:
cdef int err
err = GpuKernel_call(&k.k, n, gs, ls, shared, args)
if err != GA_NO_ERROR:
raise get_exc(err), kernel_error(k, err)
cdef int kernel_property(GpuKernel k, int prop_id, void *res) except -1:
cdef int err
err = gpukernel_property(k.k.k, prop_id, res)
if err != GA_NO_ERROR:
raise get_exc(err), kernel_error(k, err)
cdef GpuContext pygpu_default_context():
return default_context
cdef GpuContext default_context = None
cdef int ctx_property(GpuContext c, int prop_id, void *res) except -1:
cdef int err
err = gpucontext_property(c.ctx, prop_id, res)
if err != GA_NO_ERROR:
raise get_exc(err), gpucontext_error(c.ctx, err)
def set_default_context(GpuContext ctx):
"""
set_default_context(ctx)
Set the default context for the module.
The provided context will be used as a default value for all the
other functions in this module which take a context as parameter.
Call with `None` to clear the default value.
    If you don't call this function, the `context` argument of all
    other functions remains mandatory.
This can be helpful to reduce clutter when working with only one
context. It is strongly discouraged to use this function when
working with multiple contexts at once.
Parameters
----------
ctx: GpuContext
default context
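
    Examples
    --------
    A minimal sketch (assumes `ctx` is a previously created context):

    >>> set_default_context(ctx)
    >>> a = zeros((3, 4))  # no explicit context needed anymore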
"""
global default_context
default_context = ctx
def get_default_context():
"""
get_default_context()
Return the currently defined default context (or `None`).
"""
return default_context
cdef GpuContext ensure_context(GpuContext c):
global default_context
if c is None:
if default_context is None:
raise TypeError, "No context specified."
return default_context
return c
cdef bint pygpu_GpuArray_Check(object o):
return isinstance(o, GpuArray)
def count_platforms(kind):
"""
count_platforms(kind)
    Return the number of platforms on the host that are compatible with `kind`.
"""
cdef unsigned int platcount
cdef int err
err = gpu_get_platform_count(_s(kind), &platcount)
if err != GA_NO_ERROR:
raise get_exc(err), gpucontext_error(NULL, err)
return platcount
def count_devices(kind, unsigned int platform):
"""
count_devices(kind, platform)
    Return the number of devices in `platform` that are compatible with `kind`.
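
    Examples
    --------
    A minimal sketch (assumes an OpenCL runtime is present):

    >>> n = count_devices('opencl', 0)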
"""
cdef unsigned int devcount
cdef int err
err = gpu_get_device_count(_s(kind), platform, &devcount)
if err != GA_NO_ERROR:
raise get_exc(err), gpucontext_error(NULL, err)
return devcount
cdef GpuContext pygpu_init(dev, gpucontext_props *p):
cdef int err
cdef GpuContext res
if dev.startswith('cuda'):
kind = b"cuda"
if dev[4:] == '':
devnum = -1
else:
devnum = int(dev[4:])
gpucontext_props_cuda_dev(p, devnum)
elif dev.startswith('opencl'):
kind = b"opencl"
devspec = dev[6:].split(':')
        if len(devspec) < 2:
            raise ValueError, "OpenCL device specifier incorrect: should be opencl<int>:<int>, got: " + dev
        if not devspec[0].isdigit() or not devspec[1].isdigit():
            raise ValueError, "OpenCL device specifier incorrect: should be opencl<int>:<int>, got: " + dev
else:
gpucontext_props_opencl_dev(p, int(devspec[0]), int(devspec[1]))
else:
raise ValueError, "Unknown device format:" + dev
res = GpuContext.__new__(GpuContext)
res.kind = kind
err = gpucontext_init(&res.ctx, <char *>res.kind, p)
if err != GA_NO_ERROR:
raise get_exc(err), gpucontext_error(NULL, err)
return res
def init(dev, sched='default', single_stream=False, kernel_cache_path=None,
max_cache_size=sys.maxsize, initial_cache_size=0):
"""
init(dev, sched='default', single_stream=False, kernel_cache_path=None,
max_cache_size=sys.maxsize, initial_cache_size=0)
Creates a context from a device specifier.
Device specifiers are composed of the type string and the device
id like so::
"cuda0"
"opencl0:1"
For cuda the device id is the numeric identifier. You can see
what devices are available by running nvidia-smi on the machine.
Be aware that the ordering in nvidia-smi might not correspond to
the ordering in this library. This is due to how cuda enumerates
devices. If you don't specify a number (e.g. 'cuda') the first
available device will be selected according to the backend order.
For opencl the device id is the platform number, a colon (:) and
the device number. On Debian, the clinfo package can
list available platforms and devices. Or, you can experiment with
the values, unavailable ones will just raise an error, and there
are no gaps in the valid numbers.
Parameters
----------
dev: str
device specifier
sched: {'default', 'single', 'multi'}
optimize scheduling for which type of operation
    single_stream: bool
        enable single stream mode
    kernel_cache_path: str
        path to the on-disk cache for compiled kernels
    max_cache_size: int
        maximum size, in bytes, of the allocation cache
    initial_cache_size: int
        initial size, in bytes, of the allocation cache
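
    Examples
    --------
    A minimal sketch (assumes the machine has at least one CUDA
    device):

    >>> ctx = init('cuda0')

    or, for the second device of the first OpenCL platform:

    >>> ctx = init('opencl0:1')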
"""
cdef gpucontext_props *p = NULL
cdef int err
cdef bytes kernel_cache_path_b
err = gpucontext_props_new(&p)
if err != GA_NO_ERROR:
raise MemoryError
try:
if sched == 'single':
err = gpucontext_props_sched(p, GA_CTX_SCHED_SINGLE)
elif sched == 'multi':
err = gpucontext_props_sched(p, GA_CTX_SCHED_MULTI)
elif sched != 'default':
raise TypeError('unexpected value for parameter sched: %s' % (sched,))
if err != GA_NO_ERROR:
raise get_exc(err), gpucontext_error(NULL, err)
if kernel_cache_path:
kernel_cache_path_b = _s(kernel_cache_path)
gpucontext_props_kernel_cache(p, <const char *>kernel_cache_path_b)
err = gpucontext_props_alloc_cache(p, initial_cache_size,
max_cache_size)
if err != GA_NO_ERROR:
raise get_exc(err), gpucontext_error(NULL, err)
if single_stream:
            gpucontext_props_set_single_stream(p)
except:
gpucontext_props_del(p)
raise
return pygpu_init(dev, p)
def zeros(shape, dtype=GA_DOUBLE, order='C', GpuContext context=None,
cls=None):
"""
zeros(shape, dtype='float64', order='C', context=None, cls=None)
Returns an array of zero-initialized values of the requested
shape, type and order.
Parameters
----------
shape: iterable of ints
number of elements in each dimension
dtype: str, numpy.dtype or int
type of the elements
order: {'A', 'C', 'F'}
layout of the data in memory, one of 'A'ny, 'C' or 'F'ortran
context: GpuContext
context in which to do the allocation
cls: type
class of the returned array (must inherit from GpuArray)
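
    Examples
    --------
    A minimal sketch (assumes `ctx` is a previously created context):

    >>> z = zeros((2, 3), dtype='float32', context=ctx)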
"""
res = empty(shape, dtype=dtype, order=order, context=context, cls=cls)
array_memset(res, 0)
return res
cdef GpuArray pygpu_zeros(unsigned int nd, const size_t *dims, int typecode,
ga_order order, GpuContext context, object cls):
cdef GpuArray res
res = pygpu_empty(nd, dims, typecode, order, context, cls)
array_memset(res, 0)
return res
cdef GpuArray pygpu_empty(unsigned int nd, const size_t *dims, int typecode,
ga_order order, GpuContext context, object cls):
cdef GpuArray res
context = ensure_context(context)
res = new_GpuArray(cls, context, None)
array_empty(res, context.ctx, typecode, nd, dims, order)
return res
cdef GpuArray pygpu_fromgpudata(gpudata *buf, size_t offset, int typecode,
unsigned int nd, const size_t *dims,
const ssize_t *strides, GpuContext context,
bint writable, object base, object cls):
cdef GpuArray res
res = new_GpuArray(cls, context, base)
array_fromdata(res, buf, offset, typecode, nd, dims,
strides, writable)
return res
cdef GpuArray pygpu_copy(GpuArray a, ga_order ord):
cdef GpuArray res
res = new_GpuArray(type(a), a.context, None)
array_copy(res, a, ord)
return res
cdef int pygpu_move(GpuArray a, GpuArray src) except -1:
array_move(a, src)
return 0
def empty(shape, dtype=GA_DOUBLE, order='C', GpuContext context=None,
cls=None):
"""
empty(shape, dtype='float64', order='C', context=None, cls=None)
Returns an empty (uninitialized) array of the requested shape,
type and order.
Parameters
----------
shape: iterable of ints
number of elements in each dimension
dtype: str, numpy.dtype or int
type of the elements
order: {'A', 'C', 'F'}
layout of the data in memory, one of 'A'ny, 'C' or 'F'ortran
context: GpuContext
context in which to do the allocation
cls: type
class of the returned array (must inherit from GpuArray)
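
    Examples
    --------
    A minimal sketch (assumes `ctx` is a previously created context);
    the contents of the result are uninitialized:

    >>> e = empty((4,), dtype='float64', context=ctx)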
"""
cdef size_t *cdims
cdef unsigned int nd
try:
nd = <unsigned int>len(shape)
except TypeError:
nd = 1
shape = [shape]
cdims = <size_t *>calloc(nd, sizeof(size_t))
if cdims == NULL:
raise MemoryError, "could not allocate cdims"
try:
for i, d in enumerate(shape):
cdims[i] = d
return pygpu_empty(nd, cdims,
dtype_to_typecode(dtype), to_ga_order(order),
context, cls)
finally:
free(cdims)
def asarray(a, dtype=None, order='A', GpuContext context=None):
"""
asarray(a, dtype=None, order='A', context=None)
Returns a GpuArray from the data in `a`
    If `a` is already a GpuArray and all other parameters match, then
    the object itself is returned.  If `a` is an instance of a subclass
    of GpuArray then a view of the base class will be returned.
    Otherwise a new object is created and the data is copied into it.
`context` is optional if `a` is a GpuArray (but must match exactly
the context of `a` if specified) and is mandatory otherwise.
Parameters
----------
a: array-like
data
dtype: str, numpy.dtype or int
type of the elements
order: {'A', 'C', 'F'}
layout of the data in memory, one of 'A'ny, 'C' or 'F'ortran
context: GpuContext
context in which to do the allocation
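
    Examples
    --------
    A minimal sketch (assumes `ctx` is a previously created context):

    >>> import numpy as np
    >>> g = asarray(np.arange(10, dtype='float32'), context=ctx)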
"""
return array(a, dtype=dtype, order=order, copy=False, context=context,
cls=GpuArray)
def ascontiguousarray(a, dtype=None, GpuContext context=None):
"""
ascontiguousarray(a, dtype=None, context=None)
Returns a contiguous array in device memory (C order).
`context` is optional if `a` is a GpuArray (but must match exactly
the context of `a` if specified) and is mandatory otherwise.
Parameters
----------
a: array-like
input
dtype: str, numpy.dtype or int
type of the return array
context: GpuContext
context to use for a new array
"""
return array(a, order='C', dtype=dtype, ndmin=1, copy=False,
context=context)
def asfortranarray(a, dtype=None, GpuContext context=None):
    """
    asfortranarray(a, dtype=None, context=None)
    Returns a contiguous array in device memory (Fortran order).
`context` is optional if `a` is a GpuArray (but must match exactly
the context of `a` if specified) and is mandatory otherwise.
Parameters
----------
a: array-like
input
dtype: str, numpy.dtype or int
type of the elements
context: GpuContext
context in which to do the allocation
"""
return array(a, order='F', dtype=dtype, ndmin=1, copy=False,
context=context)
def may_share_memory(GpuArray a not None, GpuArray b not None):
"""
may_share_memory(a, b)
Returns True if `a` and `b` may share memory, False otherwise.
"""
return array_share(a, b)
def from_gpudata(size_t data, offset, dtype, shape, GpuContext context=None,
strides=None, writable=True, base=None, cls=None):
"""
from_gpudata(data, offset, dtype, shape, context=None, strides=None, writable=True, base=None, cls=None)
Build a GpuArray from pre-allocated gpudata
Parameters
----------
data: int
pointer to a gpudata structure
offset: int
offset to the data location inside the gpudata
dtype: numpy.dtype
data type of the gpudata elements
shape: iterable of ints
shape to use for the result
context: GpuContext
context of the gpudata
strides: iterable of ints
strides for the results (C contiguous if not specified)
writable: bool
is the data writable?
base: object
base object that keeps gpudata alive
cls: type
view type of the result
Notes
-----
This function might be deprecated in a later release since the only
way to create gpudata pointers is through libgpuarray functions
that aren't exposed at the python level. It can be used with the
value of the `gpudata` attribute of an existing GpuArray.
.. warning::
This function is intended for advanced use and will crash the
interpreter if used improperly.
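
    Examples
    --------
    A minimal sketch, rebuilding an array from the buffer of an
    existing contiguous float32 GpuArray `g` of 10 elements:

    >>> h = from_gpudata(g.gpudata, 0, 'float32', (10,),
    ...                  context=g.context, base=g)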
"""
cdef size_t *cdims = NULL
cdef ssize_t *cstrides = NULL
cdef unsigned int nd
cdef size_t size
cdef int typecode
context = ensure_context(context)
try:
nd = <unsigned int>len(shape)
except TypeError:
nd = 1
shape = [shape]
if strides is not None and len(strides) != nd:
raise ValueError, "strides must be the same length as shape"
typecode = dtype_to_typecode(dtype)
try:
cdims = <size_t *>calloc(nd, sizeof(size_t))
cstrides = <ssize_t *>calloc(nd, sizeof(ssize_t))
if cdims == NULL or cstrides == NULL:
raise MemoryError
for i, d in enumerate(shape):
cdims[i] = d
if strides:
for i, s in enumerate(strides):
cstrides[i] = s
else:
size = gpuarray_get_elsize(typecode)
for i in range(nd-1, -1, -1):
cstrides[i] = size
size *= cdims[i]
return pygpu_fromgpudata(<gpudata *>data, offset, typecode, nd, cdims,
cstrides, context, writable, base, cls)
finally:
free(cdims)
free(cstrides)
def array(proto, dtype=None, copy=True, order=None, unsigned int ndmin=0,
GpuContext context=None, cls=None):
"""
    array(obj, dtype=None, copy=True, order=None, ndmin=0, context=None, cls=None)
Create a GpuArray from existing data
This function creates a new GpuArray from the data provided in
`obj` except if `obj` is already a GpuArray and all the parameters
match its properties and `copy` is False.
The properties of the resulting array depend on the input data
except if overridden by other parameters.
    This function is similar to :func:`numpy.array` except that it returns
    GpuArrays.
Parameters
----------
obj: array-like
data to initialize the result
dtype: string or numpy.dtype or int
data type of the result elements
copy: bool
return a copy?
order: str
memory layout of the result
ndmin: int
minimum number of result dimensions
context: GpuContext
allocation context
cls: type
result class (must inherit from GpuArray)
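
    Examples
    --------
    A minimal sketch (assumes `ctx` is a previously created context):

    >>> a = array([[1, 2], [3, 4]], dtype='float32', context=ctx)
    >>> a.shape
    (2, 2)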
"""
return carray(proto, dtype, copy, order, ndmin, context, cls)
cdef carray(proto, dtype, copy, order, unsigned int ndmin,
GpuContext context, cls):
cdef GpuArray res
cdef GpuArray arg
cdef GpuArray tmp
cdef np.ndarray a
if isinstance(proto, GpuArray):
arg = proto
if context is not None and context.ctx != array_context(arg):
raise ValueError, "cannot copy an array to a different context"
if (not copy
and (dtype is None or dtype_to_typecode(dtype) == arg.typecode)
and (order is None or order == 'A' or
(order == 'C' and py_CHKFLAGS(arg, GA_C_CONTIGUOUS)) or
(order == 'F' and py_CHKFLAGS(arg, GA_F_CONTIGUOUS)))):
if arg.ga.nd < ndmin:
shp = arg.shape
idx = (1,)*(ndmin-len(shp))
shp = idx + shp
arg = arg.reshape(shp)
if not (cls is None or arg.__class__ is cls):
arg = arg.view(cls)
return arg
shp = arg.shape
if len(shp) < ndmin:
idx = (1,)*(ndmin-len(shp))
shp = idx + shp
if order is None or order == 'A':
if py_CHKFLAGS(arg, GA_C_CONTIGUOUS):
order = 'C'
elif py_CHKFLAGS(arg, GA_F_CONTIGUOUS):
order = 'F'
if cls is None:
cls = type(proto)
res = empty(shp, dtype=(dtype or arg.dtype), order=order, cls=cls,
context=arg.context)
res.base = arg.base
if len(shp) < ndmin:
tmp = res[idx]
else:
tmp = res
array_move(tmp, arg)
return res
context = ensure_context(context)
# We need a contiguous array for the copy
if order != 'C' and order != 'F':
order = 'C'
a = numpy.array(proto, dtype=dtype_to_npdtype(dtype), order=order,
ndmin=ndmin, copy=False)
res = pygpu_empty(np.PyArray_NDIM(a), <size_t *>np.PyArray_DIMS(a),
dtype_to_typecode(a.dtype), to_ga_order(order),
context, cls)
array_write(res, np.PyArray_DATA(a), np.PyArray_NBYTES(a))
return res
cdef void (*cuda_enter)(gpucontext *)
cdef void (*cuda_exit)(gpucontext *)
cuda_enter = <void (*)(gpucontext *)>gpuarray_get_extension("cuda_enter")
cuda_exit = <void (*)(gpucontext *)>gpuarray_get_extension("cuda_exit")
cdef class GpuContext:
"""
Class that holds all the information pertaining to a context.
The currently implemented modules (for the `kind` parameter) are
"cuda" and "opencl". Which are available depends on the build
options for libgpuarray.
The flag values are defined in the gpuarray/buffer.h header and
are in the "Context flags" group. If you want to use more than
one value you must bitwise OR them together.
If you want an alternative interface check :meth:`~pygpu.gpuarray.init`.
Parameters
----------
kind: str
module name for the context
devno: int
device number
flags: int
context flags
"""
def __dealloc__(self):
if self.ctx != NULL:
gpucontext_deref(self.ctx)
def __reduce__(self):
raise RuntimeError, "Cannot pickle GpuContext object"
def __init__(self):
if type(self) is GpuContext:
raise RuntimeError, "Called raw GpuContext.__init__"
def __enter__(self):
if cuda_enter == NULL:
raise RuntimeError("cuda_enter not available")
if cuda_exit == NULL:
raise RuntimeError("cuda_exit not available")
if self.kind != b"cuda":
raise ValueError("Context manager only works for cuda")
cuda_enter(self.ctx)
return self
def __exit__(self, t, v, tb):
cuda_exit(self.ctx)
property ptr:
"Raw pointer value for the context object"
def __get__(self):
return <size_t>self.ctx
property devname:
"Device name for this context"
def __get__(self):
cdef char tmp[256]
ctx_property(self, GA_CTX_PROP_DEVNAME, tmp)
return tmp.decode('ascii')
property unique_id:
"Device PCI Bus ID for this context"
def __get__(self):
cdef char tmp[16]
ctx_property(self, GA_CTX_PROP_UNIQUE_ID, tmp)
return tmp.decode('ascii')
property lmemsize:
"Size of the local (shared) memory, in bytes, for this context"
def __get__(self):
cdef size_t res
ctx_property(self, GA_CTX_PROP_LMEMSIZE, &res)
return res
property numprocs:
"Number of compute units for this context"
def __get__(self):
cdef unsigned int res
ctx_property(self, GA_CTX_PROP_NUMPROCS, &res)
return res
property bin_id:
"Binary compatibility id"
def __get__(self):
cdef const char *res
ctx_property(self, GA_CTX_PROP_BIN_ID, &res)
            return res
property total_gmem:
"Total size of global memory on the device"
def __get__(self):
cdef size_t res
ctx_property(self, GA_CTX_PROP_TOTAL_GMEM, &res)
return res
property free_gmem:
"Size of free global memory on the device"
def __get__(self):
cdef size_t res
ctx_property(self, GA_CTX_PROP_FREE_GMEM, &res)
return res
property maxlsize0:
"Maximum local size for dimension 0"
def __get__(self):
cdef size_t res
ctx_property(self, GA_CTX_PROP_MAXLSIZE0, &res)
return res
property maxlsize1:
"Maximum local size for dimension 1"
def __get__(self):
cdef size_t res
ctx_property(self, GA_CTX_PROP_MAXLSIZE1, &res)
return res
property maxlsize2:
"Maximum local size for dimension 2"
def __get__(self):
cdef size_t res
ctx_property(self, GA_CTX_PROP_MAXLSIZE2, &res)
return res
property maxgsize0:
"Maximum global size for dimension 0"
def __get__(self):
cdef size_t res
ctx_property(self, GA_CTX_PROP_MAXGSIZE0, &res)
return res
property maxgsize1:
"Maximum global size for dimension 1"
def __get__(self):
cdef size_t res
ctx_property(self, GA_CTX_PROP_MAXGSIZE1, &res)
return res
property maxgsize2:
"Maximum global size for dimension 2"
def __get__(self):
cdef size_t res
ctx_property(self, GA_CTX_PROP_MAXGSIZE2, &res)
return res
property largest_memblock:
"Size of the largest memory block you can allocate"
def __get__(self):
cdef size_t res
ctx_property(self, GA_CTX_PROP_LARGEST_MEMBLOCK, &res)
return res
cdef class flags(object):
cdef int fl
def __cinit__(self, fl):
self.fl = fl
def __reduce__(self):
return (flags, (self.fl,))
def __getitem__(self, idx):
cdef const char *key
cdef size_t n
cdef char c
if isinstance(idx, unicode):
idx = idx.encode('UTF-8')
if isinstance(idx, bytes):
key = idx
n = len(idx)
else:
raise KeyError, "Unknown flag"
if n == 1:
c = key[0]
if c == 'C':
return self.c_contiguous
elif c == 'F':
return self.f_contiguous
elif c == 'W':
return self.writeable
elif c == 'B':
return self.behaved
elif c == 'O':
return self.owndata
elif c == 'A':
return self.aligned
elif c == 'U':
return self.updateifcopy
elif c == 'X':
return self.writebackifcopy
elif n == 2:
if strncmp(key, "CA", n) == 0:
return self.carray
if strncmp(key, "FA", n) == 0:
return self.farray
elif n == 3:
if strncmp(key, "FNC", n) == 0:
return self.fnc
elif n == 4:
if strncmp(key, "FORC", n) == 0:
return self.forc
elif n == 6:
if strncmp(key, "CARRAY", n) == 0:
return self.carray
if strncmp(key, "FARRAY", n) == 0:
return self.farray
elif n == 7:
if strncmp(key, "FORTRAN", n) == 0:
return self.fortran
if strncmp(key, "BEHAVED", n) == 0:
return self.behaved
if strncmp(key, "OWNDATA", n) == 0:
return self.owndata
if strncmp(key, "ALIGNED", n) == 0:
return self.aligned
elif n == 9:
if strncmp(key, "WRITEABLE", n) == 0:
return self.writeable
elif n == 10:
if strncmp(key, "CONTIGUOUS", n) == 0:
return self.c_contiguous
elif n == 12:
if strncmp(key, "UPDATEIFCOPY", n) == 0:
return self.updateifcopy
if strncmp(key, "C_CONTIGUOUS", n) == 0:
return self.c_contiguous
if strncmp(key, "F_CONTIGUOUS", n) == 0:
return self.f_contiguous
elif n == 15:
if strncmp(key, "WRITEBACKIFCOPY", n) == 0:
return self.writebackifcopy
raise KeyError, "Unknown flag"
def __repr__(self):
return '\n'.join(" %s : %s" % (name.upper(), getattr(self, name))
for name in ["c_contiguous", "f_contiguous",
"owndata", "writeable", "aligned",
"updateifcopy", "writebackifcopy"])
def __richcmp__(self, other, int op):
cdef flags a
cdef flags b
if not isinstance(self, flags) or not isinstance(other, flags):
return NotImplemented
a = self
b = other
if op == Py_EQ:
return a.fl == b.fl
elif op == Py_NE:
return a.fl != b.fl
raise TypeError, "undefined comparison for flag object"
property c_contiguous:
def __get__(self):
return bool(self.fl & GA_C_CONTIGUOUS)
property contiguous:
def __get__(self):
return self.c_contiguous
property f_contiguous:
def __get__(self):
return bool(self.fl & GA_F_CONTIGUOUS)
property fortran:
def __get__(self):
return self.f_contiguous
property updateifcopy:
# Not supported.
def __get__(self):
return False
property writebackifcopy:
# Not supported.
def __get__(self):
return False
property owndata:
# There is no equivalent for GpuArrays and it is always "True".
def __get__(self):
return True
property aligned:
def __get__(self):
return bool(self.fl & GA_ALIGNED)
property writeable:
def __get__(self):
return bool(self.fl & GA_WRITEABLE)
property behaved:
def __get__(self):
return (self.fl & GA_BEHAVED) == GA_BEHAVED
property carray:
def __get__(self):
return (self.fl & GA_CARRAY) == GA_CARRAY
# Yes these are really defined like that according to numpy sources.
# I don't know why.
property forc:
def __get__(self):
return ((self.fl & GA_F_CONTIGUOUS) == GA_F_CONTIGUOUS or
(self.fl & GA_C_CONTIGUOUS) == GA_C_CONTIGUOUS)
property fnc:
def __get__(self):
return ((self.fl & GA_F_CONTIGUOUS) == GA_F_CONTIGUOUS and
not (self.fl & GA_C_CONTIGUOUS) == GA_C_CONTIGUOUS)
property farray:
def __get__(self):
return ((self.fl & GA_FARRAY) != 0 and
not ((self.fl & GA_C_CONTIGUOUS) != 0))
property num:
def __get__(self):
return self.fl
cdef GpuArray new_GpuArray(object cls, GpuContext ctx, object base):
cdef GpuArray res
if ctx is None:
raise RuntimeError, "ctx is None in new_GpuArray"
if cls is None or cls is GpuArray:
res = GpuArray.__new__(GpuArray)
else:
res = GpuArray.__new__(cls)
res.base = base
res.context = ctx
return res
cdef GpuArray pygpu_view(GpuArray a, object cls):
cdef GpuArray res = new_GpuArray(cls, a.context, a.base)
array_view(res, a)
return res
cdef int pygpu_sync(GpuArray a) except -1:
array_sync(a)
return 0
cdef GpuArray pygpu_empty_like(GpuArray a, ga_order ord, int typecode):
cdef GpuArray res
if ord == GA_ANY_ORDER:
if (py_CHKFLAGS(a, GA_F_CONTIGUOUS) and
not py_CHKFLAGS(a, GA_C_CONTIGUOUS)):
ord = GA_F_ORDER
else:
ord = GA_C_ORDER
if typecode == -1:
typecode = a.ga.typecode
res = new_GpuArray(type(a), a.context, None)
array_empty(res, a.context.ctx, typecode,
a.ga.nd, a.ga.dimensions, ord)
return res
cdef np.ndarray pygpu_as_ndarray(GpuArray a):
return _pygpu_as_ndarray(a, None)
cdef np.ndarray _pygpu_as_ndarray(GpuArray a, np.dtype ldtype):
cdef np.ndarray res
if not py_ISONESEGMENT(a):
a = pygpu_copy(a, GA_ANY_ORDER)
if ldtype is None:
ldtype = a.dtype
res = PyArray_Empty(a.ga.nd, <np.npy_intp *>a.ga.dimensions,
ldtype, (py_CHKFLAGS(a, GA_F_CONTIGUOUS) and
not py_CHKFLAGS(a, GA_C_CONTIGUOUS)))
array_read(np.PyArray_DATA(res), np.PyArray_NBYTES(res), a)
return res
cdef GpuArray pygpu_index(GpuArray a, const ssize_t *starts,
const ssize_t *stops, const ssize_t *steps):
cdef GpuArray res
res = new_GpuArray(type(a), a.context, a.base)
try:
array_index(res, a, starts, stops, steps)
    except ValueError:
raise IndexError, "index out of bounds"
return res
cdef GpuArray pygpu_reshape(GpuArray a, unsigned int nd, const size_t *newdims,
ga_order ord, bint nocopy, int compute_axis):
cdef GpuArray res
res = new_GpuArray(type(a), a.context, a.base)
if compute_axis < 0:
array_reshape(res, a, nd, newdims, ord, nocopy)
return res
cdef unsigned int caxis = <unsigned int>compute_axis
if caxis >= nd:
raise ValueError("compute_axis is out of bounds")
cdef size_t *cdims
cdef size_t tot = 1
cdef unsigned int i
for i in range(nd):
if i != caxis:
tot *= newdims[i]
cdims = <size_t *>calloc(nd, sizeof(size_t))
if cdims == NULL:
raise MemoryError, "could not allocate cdims"
cdef size_t d
try:
for i in range(nd):
d = newdims[i]
if i == caxis:
d = a.size // tot
if d * tot != a.size:
raise GpuArrayException, "..."
cdims[i] = d
array_reshape(res, a, nd, cdims, ord, nocopy)
return res
finally:
free(cdims)
cdef GpuArray pygpu_transpose(GpuArray a, const unsigned int *newaxes):
cdef GpuArray res
res = new_GpuArray(type(a), a.context, a.base)
array_transpose(res, a, newaxes)
return res
cdef int pygpu_transfer(GpuArray res, GpuArray a) except -1:
array_transfer(res, a)
return 0
def _split(GpuArray a, ind, unsigned int axis):
"""
    _split(a, ind, axis)
    Split `a` along `axis` at the positions listed in `ind`, returning
    a list of sub-arrays.  Split positions past the end of the axis are
    clamped to the end.
"""
cdef list r = [None] * (len(ind) + 1)
cdef Py_ssize_t i
if not axis < a.ga.nd:
raise ValueError, "split on non-existant axis"
cdef size_t m = a.ga.dimensions[axis]
cdef size_t v
cdef size_t *p = <size_t *>PyMem_Malloc(sizeof(size_t) * len(ind))
if p == NULL:
raise MemoryError()
cdef _GpuArray **rs = <_GpuArray **>PyMem_Malloc(sizeof(_GpuArray *) * len(r))
if rs == NULL:
PyMem_Free(p)
raise MemoryError()
try:
for i in range(len(r)):
r[i] = new_GpuArray(type(a), a.context, a.base)
rs[i] = &(<GpuArray>r[i]).ga
for i in range(len(ind)):
v = ind[i]
# cap the values to the end of the array
p[i] = v if v < m else m
array_split(rs, a, len(ind), p, axis)
return r
finally:
PyMem_Free(p)
PyMem_Free(rs)
cdef GpuArray pygpu_concatenate(const _GpuArray **a, size_t n,
unsigned int axis, int restype,
object cls, GpuContext context):
cdef res = new_GpuArray(cls, context, None)
array_concatenate(res, a, n, axis, restype)
return res
def _concatenate(list al, unsigned int axis, int restype, object cls,
GpuContext context):
"""
    _concatenate(al, axis, restype, cls, context)
    Concatenate the GpuArrays in the list `al` along `axis` into a new
    array with typecode `restype`.
"""
cdef Py_ssize_t i
context = ensure_context(context)
cdef const _GpuArray **als = <const _GpuArray **>PyMem_Malloc(sizeof(_GpuArray *) * len(al))
if als == NULL:
raise MemoryError()
try:
for i in range(len(al)):
if not isinstance(al[i], GpuArray):
raise TypeError, "expected GpuArrays to concatenate"
als[i] = &(<GpuArray>al[i]).ga
return pygpu_concatenate(als, len(al), axis, restype, cls, context)
finally:
PyMem_Free(als)
cdef int (*cuda_get_ipc_handle)(gpudata *, GpuArrayIpcMemHandle *)
cdef gpudata *(*cuda_open_ipc_handle)(gpucontext *, GpuArrayIpcMemHandle *, size_t)
cuda_get_ipc_handle = <int (*)(gpudata *, GpuArrayIpcMemHandle *)>gpuarray_get_extension("cuda_get_ipc_handle")
cuda_open_ipc_handle = <gpudata *(*)(gpucontext *, GpuArrayIpcMemHandle *, size_t)>gpuarray_get_extension("cuda_open_ipc_handle")
def open_ipc_handle(GpuContext c, bytes hpy, size_t l):
"""
open_ipc_handle(c, hpy, l)
Open an IPC handle to get a new GpuArray from it.
Parameters
----------
c: GpuContext
context
hpy: bytes
binary handle data received
l: int
size of the referred memory block
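
    Examples
    --------
    A minimal sketch (assumes `h` holds the bytes produced by
    :meth:`GpuArray.get_ipc_handle` in another process, `ctx` is a cuda
    context and `n` is the size of the block in bytes):

    >>> ptr = open_ipc_handle(ctx, h, n)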
"""
cdef char *b
cdef GpuArrayIpcMemHandle h
cdef gpudata *d
b = hpy
memcpy(&h, b, sizeof(h))
d = cuda_open_ipc_handle(c.ctx, &h, l)
if d is NULL:
raise GpuArrayException, gpucontext_error(c.ctx, 0)
return <size_t>d
cdef class GpuArray:
"""
Device array
To create instances of this class use
:meth:`~pygpu.gpuarray.zeros`, :meth:`~pygpu.gpuarray.empty` or
:meth:`~pygpu.gpuarray.array`. It cannot be instantiated
directly.
    You can also subclass this class and make the module create your
    instances by passing the `cls` argument to any method that returns a
    new GpuArray.  This way of creating the class will NOT call your
    :meth:`__init__` method.
    You can also implement your own :meth:`__init__` method, but you
    must take care to properly initialize the GpuArray C fields before
    using it or you will most likely crash the interpreter.
"""
def __dealloc__(self):
array_clear(self)
def __cinit__(self):
memset(&self.ga, 0, sizeof(_GpuArray))
def __init__(self):
if type(self) is GpuArray:
raise RuntimeError, "Called raw GpuArray.__init__"
def __reduce__(self):
raise RuntimeError, "Cannot pickle GpuArray object"
cdef __index_helper(self, key, unsigned int i, ssize_t *start,
ssize_t *stop, ssize_t *step):
cdef Py_ssize_t dummy
cdef Py_ssize_t k
try:
k = PyNumber_Index(key)
if k < 0:
k += self.ga.dimensions[i]
if k < 0 or (<size_t>k) >= self.ga.dimensions[i]:
raise IndexError, "index %d out of bounds" % (i,)
start[0] = k
step[0] = 0
return
except TypeError:
pass
if isinstance(key, slice):
PySlice_GetIndicesEx(key, self.ga.dimensions[i],
start, stop, step, &dummy)
if stop[0] < start[0] and step[0] > 0:
stop[0] = start[0]
elif key is Ellipsis:
start[0] = 0
stop[0] = self.ga.dimensions[i]
step[0] = 1
else:
raise IndexError, "cannot index with: %s" % (key,)
def write(self, np.ndarray src not None):
"""
write(src)
        Write a host numpy array into this device GpuArray.
        This method is as fast as or faster than :func:`asarray` because
        it skips allocating a new device buffer: the data of `src` is
        written directly into this GpuArray's existing buffer.  The
        GpuArray and the numpy array must match in byte size and data
        type, and the GpuArray must be well behaved and contiguous.  If
        `src` is not aligned or does not match in contiguity, it is
        first copied to a new numpy array that is.  This GpuArray and
        `src` may have different shapes.
Parameters
----------
src: numpy.ndarray
source array in host
Raises
------
ValueError
If this GpuArray is not compatible with `src` or if it is
not well behaved or contiguous.
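
        Examples
        --------
        A minimal sketch (assumes `g` is a contiguous float32 GpuArray
        of 10 elements):

        >>> import numpy as np
        >>> g.write(np.arange(10, dtype='float32'))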
"""
if not self.flags.behaved:
raise ValueError, "Destination GpuArray is not well behaved: aligned and writeable"
if self.flags.c_contiguous:
src = np.asarray(src, order='C')
elif self.flags.f_contiguous:
src = np.asarray(src, order='F')
else:
raise ValueError, "Destination GpuArray is not contiguous"
if self.dtype != src.dtype:
raise ValueError, "GpuArray and Numpy array do not have matching data types"
cdef size_t npsz = np.PyArray_NBYTES(src)
cdef size_t sz = gpuarray_get_elsize(self.ga.typecode)
cdef unsigned i
for i in range(self.ga.nd):
sz *= self.ga.dimensions[i]
if sz != npsz:
raise ValueError, "GpuArray and Numpy array do not have the same size in bytes"
array_write(self, np.PyArray_DATA(src), sz)
def read(self, np.ndarray dst not None):
"""
read(dst)
        Read this GpuArray into a host numpy array.
        This method is as fast as or faster than the :meth:`__array__`
        method (and thus :func:`numpy.asarray`) because it skips
        allocating a new host buffer: the data is read into the existing
        ndarray `dst`.  The GpuArray and the numpy array must match in
        byte size, contiguity and data type; `dst` must be writeable and
        properly aligned in host memory, and `self` must be contiguous.
        This GpuArray and `dst` may have different shapes.
Parameters
----------
dst: numpy.ndarray
destination array in host
Raises
------
ValueError
            If this GpuArray is not compatible with `dst` or if `dst`
            is not well behaved.
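
        Examples
        --------
        A minimal sketch (assumes `g` is a contiguous float32 GpuArray
        of 10 elements):

        >>> import numpy as np
        >>> out = np.empty(10, dtype='float32')
        >>> g.read(out)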
"""
if not np.PyArray_ISBEHAVED(dst):
raise ValueError, "Destination Numpy array is not well behaved: aligned and writeable"
if (not ((self.flags.c_contiguous and self.flags.aligned and
dst.flags['C_CONTIGUOUS']) or
(self.flags.f_contiguous and self.flags.aligned and
dst.flags['F_CONTIGUOUS']))):
raise ValueError, "GpuArray and Numpy array do not match in contiguity or GpuArray is not aligned"
if self.dtype != dst.dtype:
raise ValueError, "GpuArray and Numpy array do not have matching data types"
cdef size_t npsz = np.PyArray_NBYTES(dst)
cdef size_t sz = gpuarray_get_elsize(self.ga.typecode)
cdef unsigned i
for i in range(self.ga.nd):
sz *= self.ga.dimensions[i]
if sz != npsz:
raise ValueError, "GpuArray and Numpy array do not have the same size in bytes"
array_read(np.PyArray_DATA(dst), sz, self)
def get_ipc_handle(self):
"""
        get_ipc_handle()
        Return an IPC handle for this array's data as bytes.  It can be
        passed to another process and opened with
        :func:`open_ipc_handle` to access the same memory (cuda only).
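
        Examples
        --------
        A minimal sketch (assumes `g` is a GpuArray in a cuda context):

        >>> h = g.get_ipc_handle()  # send `h` to another process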
"""
cdef GpuArrayIpcMemHandle h
cdef int err
if cuda_get_ipc_handle is NULL:
raise SystemError, "Could not get necessary extension"
if self.context.kind != b'cuda':
raise ValueError, "Only works for cuda contexts"
err = cuda_get_ipc_handle(self.ga.data, &h)
if err != GA_NO_ERROR:
raise get_exc(err), GpuArray_error(&self.ga, err)
res = <bytes>(<char *>&h)[:sizeof(h)]
return res
def __array__(self, ldtype=None):
"""
__array__(ldtype=None)
Return a :class:`numpy.ndarray` with the same content.
Automatically used by :meth:`numpy.asarray`.
"""
return _pygpu_as_ndarray(self, ldtype)
def __bool__(self):
"""
__bool__()
"""
if self.size == 0:
return False
elif self.size == 1:
return bool(numpy.asarray(self))
else:
raise ValueError('The truth value of a multi-element array is ambiguous')
def _empty_like_me(self, dtype=None, order='C'):
"""
_empty_like_me(dtype=None, order='C')
Returns an empty (uninitialized) GpuArray with the same
properties except if overridden by parameters.
"""
cdef int typecode
cdef GpuArray res
if dtype is None:
typecode = -1
else:
typecode = dtype_to_typecode(dtype)
return pygpu_empty_like(self, to_ga_order(order), typecode)
def copy(self, order='C'):
"""
copy(order='C')
Return a copy of this array.
Parameters
----------
order: {'C', 'A', 'F'}
memory layout of the copy
"""
return pygpu_copy(self, to_ga_order(order))
def transfer(self, GpuContext new_ctx):
"""
transfer(new_ctx)
Copy the content of this array to a new array allocated in the
context `new_ctx`, possibly on another device.
"""
cdef GpuArray r
if not GpuArray_ISONESEGMENT(&self.ga):
# For now raise an error, may make it work later
raise ValueError("transfer() only works for contigous source")
r = pygpu_empty(self.ga.nd, self.ga.dimensions, self.ga.typecode,
GA_C_ORDER if GpuArray_IS_C_CONTIGUOUS(&self.ga) else GA_F_ORDER,
new_ctx, None)
pygpu_transfer(r, self) # Will raise an error if needed
return r
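# Sketch (assumes two initialized contexts `ctx0` and `ctx1`):
#     g0 = pygpu.zeros((16,), dtype='float32', context=ctx0)
#     g1 = g0.transfer(ctx1)  # copy of g0 allocated in ctx1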
def __copy__(self):
return pygpu_copy(self, GA_C_ORDER)
def __deepcopy__(self, memo):
if id(self) in memo:
return memo[id(self)]
else:
return pygpu_copy(self, GA_C_ORDER)
def sync(self):
"""
sync()
Wait for all pending operations on this array.
This is done automatically when reading or writing from it,
but can be useful as a separate operation for timings.
"""
pygpu_sync(self)
def view(self, object cls=GpuArray):
"""
view(cls=GpuArray)
Return a view of this array.
The returned array shares device data with this one and both
will reflect changes made to the other.
Parameters
----------
cls: type
class of the view (must inherit from GpuArray)
"""
return pygpu_view(self, cls)
def astype(self, dtype, order='A', copy=True):
"""
astype(dtype, order='A', copy=True)
Cast the elements of this array to a new type.
This function returns a new array with all elements cast to
the supplied `dtype`, but otherwise unchanged.
If `copy` is False and the type and order already match, `self`
is returned.
Parameters
----------
dtype: str or numpy.dtype or int
type of the elements of the result
order: {'A', 'C', 'F'}
memory layout of the result
copy: bool
Always return a copy?
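Examples
--------
A minimal sketch, where `g` is assumed to be a float64 GpuArray::
    g32 = g.astype('float32')  # new float32 array, same values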
"""
cdef GpuArray res
cdef int typecode = dtype_to_typecode(dtype)
cdef ga_order ord = to_ga_order(order)
if (not copy and typecode == self.ga.typecode and
((py_CHKFLAGS(self, GA_F_CONTIGUOUS) and ord == GA_F_ORDER) or
(py_CHKFLAGS(self, GA_C_CONTIGUOUS) and ord == GA_C_ORDER))):
return self
res = self._empty_like_me(dtype=typecode, order=order)
array_move(res, self)
return res
def reshape(self, shape, order='C'):
"""
reshape(shape, order='C')
Returns a new array with the given shape and order.
The new shape must have the same size (total number of
elements) as the current one.
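Examples
--------
A minimal sketch, where `g` is assumed to hold 6 elements; one
axis may be -1, in which case its size is inferred::
    h = g.reshape((2, -1))  # h.shape == (2, 3)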
"""
cdef size_t *newdims
cdef unsigned int nd
cdef unsigned int i
cdef int compute_axis
try:
nd = <unsigned int>len(shape)
except TypeError:
nd = 1
shape = [shape]
newdims = <size_t *>calloc(nd, sizeof(size_t))
if newdims == NULL:
raise MemoryError, "calloc"
compute_axis = -1
try:
for i in range(nd):
if shape[i] == -1:
assert compute_axis == -1
compute_axis = i
newdims[i] = 1
else:
newdims[i] = shape[i]
return pygpu_reshape(self, nd, newdims, to_ga_order(order), 0, compute_axis)
finally:
free(newdims)
def transpose(self, *params):
"""
transpose(*params)
Return a view of this array with the axes permuted. Called with
no arguments or None, it reverses the order of the axes.
"""
cdef unsigned int *new_axes
cdef unsigned int i
if len(params) == 1 and isinstance(params[0], (tuple, list)):
params = params[0]
if len(params) == 0 or tuple(params) == (None,):
return pygpu_transpose(self, NULL)
else:
if len(params) != self.ga.nd:
raise ValueError("axes don't match: " + str(params))
new_axes = <unsigned int *>calloc(self.ga.nd, sizeof(unsigned int))
try:
for i in range(self.ga.nd):
new_axes[i] = params[i]
return pygpu_transpose(self, new_axes)
finally:
free(new_axes)
def __len__(self):
if self.ga.nd > 0:
return self.ga.dimensions[0]
else:
raise TypeError, "len() of unsized object"
def __getitem__(self, key):
cdef unsigned int i
if key is Ellipsis:
return self.__cgetitem__(key)
# A list or a sequence of list should trigger "fancy" indexing.
# This is not implemented yet.
# Conversely, if a list contains slice or Ellipsis objects, it behaves
# the same as a tuple.
if isinstance(key, list):
if any(isinstance(k, slice) or k is Ellipsis for k in key):
return self.__getitem__(tuple(key))
else:
raise NotImplementedError, "fancy indexing not supported"
try:
iter(key)
except TypeError:
key = (key,)
else:
if all(isinstance(k, list) for k in key):
raise NotImplementedError, "fancy indexing not supported"
key = tuple(key)
# Need to massage Ellipsis here, to avoid packing it into a tuple.
if countis(key, Ellipsis) > 1:
raise IndexError, "cannot use more than one Ellipsis"
# The following code replaces an Ellipsis found in the key by
# the corresponding number of slice(None) objects, depending on the
# number of dimensions. As example, this allows indexing on the last
# dimension with a[..., 1:] on any array (including 1-dim). This
# is also required for numpy compat.
try:
ell_idx = key.index(Ellipsis)
except ValueError:
pass
else:
# Need number of axes minus missing dimensions extra slice(None)
# objects, not counting None entries and the Ellipsis itself
num_slcs = self.ga.nd - (len(key) - countis(key, None) - 1)
fill_slices = (slice(None),) * num_slcs
key = key[:ell_idx] + fill_slices + key[ell_idx + 1:]
# Remove the None entries for indexing
getitem_idcs = tuple(k for k in key if k is not None)
# For less than 1 index, fill up with slice(None) to the right.
# This allows indexing a[1:] in multi-dimensional arrays, where the
# slice is applied along the first axis only. It also allows
# a[()], which simply is a view in Numpy.
if len(getitem_idcs) <= 1:
getitem_idcs = (getitem_idcs +
(slice(None),) * (self.ga.nd - len(getitem_idcs)))
# Slice into array, then reshape, accommodating for None entries in key
sliced = self.__cgetitem__(getitem_idcs)
if countis(key, None) == 0:
# Avoid unnecessary reshaping if there was no None
return sliced
else:
new_shape = []
i = 0
if sliced.shape:
for k in key:
if isinstance(k, int):
continue
elif k is None:
new_shape.append(1)
else:
new_shape.append(sliced.shape[i])
i += 1
# Add remaining entries from sliced.shape if existing (happens
# for 1 index or less if ndim >= 2).
new_shape.extend(sliced.shape[i:])
return sliced.reshape(new_shape)
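# Basic-indexing sketch (assumes `g` is a 3-d GpuArray): g[..., 1:]
# slices along the last axis, g[None] prepends a length-1 axis,
# g[1] drops the first axis and g[()] is a plain view, mirroring
# Numpy basic indexing. Fancy (list/array) indexing is rejected
# above with NotImplementedError.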
cdef __cgetitem__(self, key):
cdef ssize_t *starts
cdef ssize_t *stops
cdef ssize_t *steps
cdef unsigned int i
cdef unsigned int d
cdef unsigned int el
if key is Ellipsis:
return pygpu_view(self, None)
elif self.ga.nd == 0:
if isinstance(key, tuple) and len(key) == 0:
return self
else:
raise IndexError, "0-d arrays can't be indexed"
starts = <ssize_t *>calloc(self.ga.nd, sizeof(ssize_t))
stops = <ssize_t *>calloc(self.ga.nd, sizeof(ssize_t))
steps = <ssize_t *>calloc(self.ga.nd, sizeof(ssize_t))
try:
if starts == NULL or stops == NULL or steps == NULL:
raise MemoryError
d = 0
if isinstance(key, (tuple, list)):
if Ellipsis in key:
# The following code replaces the first Ellipsis
# found in the key by a bunch of them depending on
# the number of dimensions. As example, this
# allows indexing on the last dimension with
# a[..., 1:] on any array (including 1-dim). This
# is also required for numpy compat.
el = key.index(Ellipsis)
if isinstance(key, tuple):
key = (key[:el] +
(Ellipsis,)*(self.ga.nd - (len(key) - 1)) +
key[el+1:])
else:
key = (key[:el] +
[Ellipsis,]*(self.ga.nd - (len(key) - 1)) +
key[el+1:])
if len(key) > self.ga.nd:
raise IndexError, "too many indices"
for i in range(0, len(key)):
self.__index_helper(key[i], i, &starts[i], &stops[i],
&steps[i])
d += <unsigned int>len(key)
else:
self.__index_helper(key, 0, starts, stops, steps)
d += 1
for i in range(d, self.ga.nd):
starts[i] = 0
stops[i] = self.ga.dimensions[i]
steps[i] = 1
return pygpu_index(self, starts, stops, steps)
finally:
free(starts)
free(stops)
free(steps)
def __setitem__(self, idx, v):
cdef GpuArray tmp, gv
if isinstance(idx, list):
if any(isinstance(i, slice) or i is Ellipsis for i in idx):
self.__setitem__(tuple(idx), v)
else:
raise NotImplementedError, "fancy indexing not supported"
try:
iter(idx)
except TypeError:
idx = (idx,)
else:
if all(isinstance(i, list) for i in idx):
raise NotImplementedError, "fancy indexing not supported"
idx = tuple(idx)
if countis(idx, Ellipsis) > 1:
raise IndexError, "cannot use more than one Ellipsis"
# Remove None entries, they should be ignored (as in Numpy)
idx = tuple(i for i in idx if i is not None)
tmp = self.__cgetitem__(idx)
gv = carray(v, self.ga.typecode, False, 'A', 0, self.context, GpuArray)
array_setarray(tmp, gv)
def take1(self, GpuArray idx):
"""
take1(idx)
Return a new array containing the elements of this array selected
along the first axis by the 1-d index array `idx`.
"""
cdef GpuArray res
cdef size_t odim
if idx.ga.nd != 1:
raise ValueError, "Expected index with nd=1"
odim = self.ga.dimensions[0]
try:
self.ga.dimensions[0] = idx.ga.dimensions[0]
res = pygpu_empty_like(self, GA_C_ORDER, -1)
finally:
self.ga.dimensions[0] = odim
array_take1(res, self, idx, 1)
return res
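# Sketch (assumes `g` is a 2-d GpuArray and `ctx` its context):
# select rows 0 and 2 with an integer index array on the device:
#     i = pygpu.asarray(numpy.array([0, 2], dtype='uint32'), context=ctx)
#     rows = g.take1(i)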
def __hash__(self):
raise TypeError, "unhashable type '%s'" % (self.__class__,)
def __nonzero__(self):
cdef int sz = self.size
if sz == 0:
return False
if sz == 1:
return bool(numpy.asarray(self))
else:
raise ValueError, "Truth value of array with more than one element is ambiguous"
property shape:
"shape of this ndarray (tuple)"
def __get__(self):
cdef unsigned int i
res = [None] * self.ga.nd
for i in range(self.ga.nd):
res[i] = self.ga.dimensions[i]
return tuple(res)
def __set__(self, newshape):
# We support -1 only in a call to reshape
cdef size_t *newdims
cdef unsigned int nd
cdef unsigned int i
cdef int err
nd = <unsigned int>len(newshape)
newdims = <size_t *>calloc(nd, sizeof(size_t))
if newdims == NULL:
raise MemoryError, "calloc"
try:
for i in range(nd):
newdims[i] = newshape[i]
err = GpuArray_reshape_inplace(&self.ga, nd, newdims, GA_C_ORDER)
if err != GA_NO_ERROR:
raise get_exc(err), GpuArray_error(&self.ga, err)
finally:
free(newdims)
property T:
def __get__(self):
return pygpu_transpose(self, NULL)
property size:
"The number of elements in this object."
def __get__(self):
cdef size_t res = 1
cdef unsigned int i
for i in range(self.ga.nd):
res *= self.ga.dimensions[i]
return res
property strides:
"data pointer strides (in bytes)"
def __get__(self):
cdef unsigned int i
res = [None] * self.ga.nd
for i in range(self.ga.nd):
res[i] = self.ga.strides[i]
return tuple(res)
def __set__(self, newstrides):
cdef unsigned int i
if len(newstrides) != self.ga.nd:
raise ValueError("new strides are the wrong length")
if not strides_ok(self, newstrides):
raise ValueError("new strides go outside of allocated memory")
for i in range(self.ga.nd):
self.ga.strides[i] = newstrides[i]
array_fix_flags(self)
property ndim:
"The number of dimensions in this object"
def __get__(self):
return self.ga.nd
property dtype:
"The dtype of the element"
def __get__(self):
return typecode_to_dtype(self.ga.typecode)
property typecode:
"The gpuarray typecode for the data type of the array"
def __get__(self):
return self.ga.typecode
property itemsize:
"The size of the base element."
def __get__(self):
return gpuarray_get_elsize(self.ga.typecode)
property flags:
"""Return a flags object describing the properties of this array.
This is mostly numpy-compatible with some exceptions:
* Flags are always constant (numpy allows modification of certain flags in certain circumstances).
* OWNDATA is always True, since the data is refcounted in libgpuarray.
* UPDATEIFCOPY/WRITEBACKIFCOPY are not supported, therefore always False.
"""
def __get__(self):
return flags(self.ga.flags)
property offset:
"Return the offset into the gpudata pointer for this array."
def __get__(self):
return self.ga.offset
property data:
"""Return a pointer to the raw OpenCL buffer object.
This will fail for arrays that have an offset.
"""
def __get__(self):
if self.context.kind != b"opencl":
raise TypeError("This is for OpenCL arrays.")
if self.offset != 0:
raise ValueError("This array has an offset.")
# This wizardry grabs the actual backend pointer since it's
guaranteed to be the first element of the gpudata
# structure.
return <size_t>((<void **>self.ga.data)[0])
property base_data:
"Return a pointer to the backing OpenCL object."
def __get__(self):
if self.context.kind != b"opencl":
raise TypeError("This is for OpenCL arrays.")
# This wizardry grabs the actual backend pointer since it's
guaranteed to be the first element of the gpudata
# structure.
return <size_t>((<void **>self.ga.data)[0])
property gpudata:
"Return a pointer to the raw backend object."
def __get__(self):
if self.context.kind != b"cuda":
raise TypeError("This is for CUDA arrays.")
# This wizardry grabs the actual backend pointer since it's
guaranteed to be the first element of the gpudata
# structure.
return <size_t>((<void **>self.ga.data)[0]) + self.offset
def __str__(self):
return str(numpy.asarray(self))
def __repr__(self):
try:
return 'gpuarray.' + repr(numpy.asarray(self))
except Exception:
return 'gpuarray.array(<content not available>)'
cdef class GpuKernel:
"""
GpuKernel(source, name, types, context=None, have_double=False, have_small=False, have_complex=False, have_half=False, cuda=False, opencl=False)
Compile a kernel on the device
The kernel function is retrieved using the provided `name` which
must match what you named your kernel in `source`. You can safely
reuse the same name multiple times.
The `have_*` parameters are there to tell libgpuarray that we need
the particular type or feature to work for this kernel. If the
request can't be satisfied a :class:`.UnsupportedException` will be
raised in the constructor.
Once you have the kernel object you can simply call it like so::
k = GpuKernel(...)
k(param1, param2, n=n)
where `n` is the minimum number of threads to run. libgpuarray
will try to stay close to this number but may run a few more
threads to match the hardware preferred multiple and stay
efficient. You should watch out for this in your code and make
sure to test against the size of your data.
If you want more control over thread allocation you can use the
`gs` and `ls` parameters like so::
k = GpuKernel(...)
k(param1, param2, gs=gs, ls=ls)
If you choose to use this interface, make sure to stay within the
limits of `k.maxlsize` or the call will fail.
Parameters
----------
source: str
complete kernel source code
name: str
function name of the kernel
types: list or tuple
list of argument types
context: GpuContext
device on which the kernel is compiled
have_double: bool
ensure working doubles?
have_small: bool
ensure types smaller than float will work?
have_complex: bool
ensure complex types will work?
have_half: bool
ensure half-floats will work?
cuda: bool
kernel is cuda code?
opencl: bool
kernel is opencl code?
Notes
-----
With the cuda backend, unless you use the cluda include, you must
either pass the mangled name of your kernel or declare the
function 'extern "C"', because cuda uses a C++ compiler
unconditionally.
.. warning::
If you do not set the `have_` flags properly, you will either
get a device-specific error (the good case) or silent
completely bogus data (the bad case).
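Examples
--------
A minimal sketch, assuming `ctx` is an initialized CUDA GpuContext
and :func:`pygpu.zeros` is available; the kernel bounds-checks
against `n` because a few extra threads may be run::
    src = ('extern "C" __global__ void incr(float *a, unsigned int n) {'
           '  unsigned int i = threadIdx.x + blockIdx.x * blockDim.x;'
           '  if (i < n) a[i] += 1.0f; }')
    k = GpuKernel(src, "incr", [GpuArray, 'uint32'], context=ctx, cuda=True)
    g = pygpu.zeros((100,), dtype='float32', context=ctx)
    k(g, numpy.uint32(g.size), n=g.size)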
"""
def __dealloc__(self):
cdef unsigned int numargs
cdef int *types
cdef unsigned int i
cdef int res
# We need to do all of this at the C level to avoid touching
# python stuff that could be gone and to avoid exceptions
if self.k.k is not NULL:
res = gpukernel_property(self.k.k, GA_KERNEL_PROP_NUMARGS, &numargs)
if res != GA_NO_ERROR:
return
res = gpukernel_property(self.k.k, GA_KERNEL_PROP_TYPES, &types)
if res != GA_NO_ERROR:
return
for i in range(numargs):
if types[i] != GA_BUFFER:
free(self.callbuf[i])
kernel_clear(self)
free(self.callbuf)
def __reduce__(self):
raise RuntimeError, "Cannot pickle GpuKernel object"
def __cinit__(self, source, name, types, GpuContext context=None,
have_double=False, have_small=False, have_complex=False,
have_half=False, cuda=False, opencl=False, *a, **kwa):
cdef const char *s[1]
cdef size_t l
cdef unsigned int numargs
cdef unsigned int i
cdef int *_types
cdef int flags = 0
source = _s(source)
name = _s(name)
self.context = ensure_context(context)
if have_double:
flags |= GA_USE_DOUBLE
if have_small:
flags |= GA_USE_SMALL
if have_complex:
flags |= GA_USE_COMPLEX
if have_half:
flags |= GA_USE_HALF
if cuda:
flags |= GA_USE_CUDA
if opencl:
flags |= GA_USE_OPENCL
s[0] = source
l = len(source)
numargs = <unsigned int>len(types)
self.callbuf = <void **>calloc(len(types), sizeof(void *))
if self.callbuf == NULL:
raise MemoryError
_types = <int *>calloc(numargs, sizeof(int))
if _types == NULL:
raise MemoryError
try:
for i in range(numargs):
if (types[i] == GpuArray):
_types[i] = GA_BUFFER
else:
_types[i] = dtype_to_typecode(types[i])
self.callbuf[i] = malloc(gpuarray_get_elsize(_types[i]))
if self.callbuf[i] == NULL:
raise MemoryError
kernel_init(self, self.context.ctx, 1, s, &l,
name, numargs, _types, flags)
finally:
free(_types)
def __call__(self, *args, n=None, gs=None, ls=None, shared=0):
"""
__call__(*args, n=None, gs=None, ls=None, shared=0)
Run the kernel with the given arguments. Either `n` (the minimum
number of threads to run) or both `gs` and `ls` must be given.
"""
if n is None and (ls is None or gs is None):
raise ValueError, "Must specify size (n) or both gs and ls"
self.do_call(n, gs, ls, args, shared)
cdef do_call(self, py_n, py_gs, py_ls, py_args, size_t shared):
cdef size_t n
cdef size_t gs[3]
cdef size_t ls[3]
cdef size_t tmp
cdef unsigned int nd
cdef const int *types
cdef unsigned int numargs
cdef unsigned int i
nd = 0
if py_ls is None:
ls[0] = 0
nd = 1
else:
if isinstance(py_ls, int):
ls[0] = py_ls
nd = 1
elif isinstance(py_ls, (list, tuple)):
if len(py_ls) > 3:
raise ValueError, "ls is not of length 3 or less"
nd = len(py_ls)
if nd >= 3:
ls[2] = py_ls[2]
if nd >= 2:
ls[1] = py_ls[1]
if nd >= 1:
ls[0] = py_ls[0]
else:
raise TypeError, "ls is not int or list"
if py_gs is None:
if nd != 1:
raise ValueError, "nd mismatch for gs (None)"
gs[0] = 0
else:
if isinstance(py_gs, int):
if nd != 1:
raise ValueError, "nd mismatch for gs (int)"
gs[0] = py_gs
elif isinstance(py_gs, (list, tuple)):
if len(py_gs) > 3:
raise ValueError, "gs is not of length 3 or less"
if len(py_gs) != nd:
raise ValueError, "nd mismatch for gs (tuple)"
if nd >= 3:
gs[2] = py_gs[2]
if nd >= 2:
gs[1] = py_gs[1]
if nd >= 1:
gs[0] = py_gs[0]
else:
raise TypeError, "gs is not int or list"
numargs = self.numargs
if len(py_args) != numargs:
raise TypeError, "Expected %d arguments, got %d," % (numargs, len(py_args))
kernel_property(self, GA_KERNEL_PROP_TYPES, &types)
for i in range(numargs):
self._setarg(i, types[i], py_args[i])
if py_n is not None:
if nd != 1:
raise ValueError, "n is specified and nd != 1"
n = py_n
kernel_sched(self, n, &gs[0], &ls[0])
kernel_call(self, nd, gs, ls, shared, self.callbuf)
cdef _setarg(self, unsigned int index, int typecode, object o):
if typecode == GA_BUFFER:
if not isinstance(o, GpuArray):
raise TypeError, "expected a GpuArray"
self.callbuf[index] = <void *>((<GpuArray>o).ga.data)
elif typecode == GA_SIZE:
(<size_t *>self.callbuf[index])[0] = o
elif typecode == GA_SSIZE:
(<ssize_t *>self.callbuf[index])[0] = o
elif typecode == GA_FLOAT:
(<float *>self.callbuf[index])[0] = o
elif typecode == GA_DOUBLE:
(<double *>self.callbuf[index])[0] = o
elif typecode == GA_BYTE:
(<signed char *>self.callbuf[index])[0] = o
elif typecode == GA_UBYTE:
(<unsigned char *>self.callbuf[index])[0] = o
elif typecode == GA_SHORT:
(<short *>self.callbuf[index])[0] = o
elif typecode == GA_USHORT:
(<unsigned short *>self.callbuf[index])[0] = o
elif typecode == GA_INT:
(<int *>self.callbuf[index])[0] = o
elif typecode == GA_UINT:
(<unsigned int *>self.callbuf[index])[0] = o
elif typecode == GA_LONG:
(<long *>self.callbuf[index])[0] = o
elif typecode == GA_ULONG:
(<unsigned long *>self.callbuf[index])[0] = o
else:
raise ValueError("Bad typecode in _setarg: %d "
"(please report this, it is a bug)" % (typecode,))
property maxlsize:
"Maximum local size for this kernel"
def __get__(self):
cdef size_t res
kernel_property(self, GA_KERNEL_PROP_MAXLSIZE, &res)
return res
property preflsize:
"Preferred multiple for local size for this kernel"
def __get__(self):
cdef size_t res
kernel_property(self, GA_KERNEL_PROP_PREFLSIZE, &res)
return res
property numargs:
"Number of arguments to kernel"
def __get__(self):
cdef unsigned int res
kernel_property(self, GA_KERNEL_PROP_NUMARGS, &res)
return res
|