.. _NEP42:
==============================================================================
NEP 42 — New and extensible DTypes
==============================================================================
:title: New and extensible DTypes
:Author: Sebastian Berg
:Author: Ben Nathanson
:Author: Marten van Kerkwijk
:Status: Accepted
:Type: Standard
:Created: 2019-07-17
:Resolution: https://mail.python.org/pipermail/numpy-discussion/2020-October/081038.html
.. note::
This NEP is third in a series:
- :ref:`NEP 40 <NEP40>` explains the shortcomings of NumPy's dtype implementation.
- :ref:`NEP 41 <NEP41>` gives an overview of our proposed replacement.
- NEP 42 (this document) describes the new design's datatype-related APIs.
- :ref:`NEP 43 <NEP43>` describes the new design's API for universal functions.
******************************************************************************
Abstract
******************************************************************************
NumPy's dtype architecture is monolithic -- each dtype is an instance of a
single class. There's no principled way to expand it for new dtypes, and the
code is difficult to read and maintain.
As :ref:`NEP 41 <NEP41>` explains, we are proposing a new architecture that is
modular and open to user additions. dtypes will derive from a new ``DType``
class serving as the extension point for new types. ``np.dtype("float64")``
will return an instance of a ``Float64`` class, a subclass of root class
``np.dtype``.
This NEP is one of two that lay out the design and API of this new
architecture. This NEP addresses dtype implementation; :ref:`NEP 43 <NEP43>` addresses
universal functions.
.. note::
Details of the private and external APIs may change to reflect user
comments and implementation constraints. The underlying principles and
choices should not change significantly.
******************************************************************************
Motivation and scope
******************************************************************************
Our goal is to allow user code to create fully featured dtypes for a broad
variety of uses, from physical units (such as meters) to domain-specific
representations of geometric objects. :ref:`NEP 41 <NEP41>` describes a number
of these new dtypes and their benefits.
Any design supporting dtypes must consider:
- How shape and dtype are determined when an array is created
- How array elements are stored and accessed
- The rules for casting dtypes to other dtypes
In addition:
- We want dtypes to comprise a class hierarchy open to new types and to
subhierarchies, as motivated in :ref:`NEP 41 <NEP41>`.
And to provide this,
- We need to define a user API.
All these are the subjects of this NEP.
- The class hierarchy, its relation to the Python scalar types, and its
important attributes are described in `nep42_DType class`_.
- The functionality that will support dtype casting is described in `Casting`_.
- The implementation of item access and storage, and the way shape and dtype
are determined when creating an array, are described in :ref:`nep42_array_coercion`.
- The functionality for users to define their own DTypes is described in
`Public C-API`_.
The API here and in :ref:`NEP 43 <NEP43>` is entirely on the C side. A Python-side version
will be proposed in a future NEP. A future Python API is expected to be
similar, but provide a more convenient API to reuse the functionality of
existing DTypes. It could also provide shorthands to create structured DTypes
similar to Python's
`dataclasses <https://docs.python.org/3.8/library/dataclasses.html>`_.
******************************************************************************
Backward compatibility
******************************************************************************
The disruption is expected to be no greater than that of a typical NumPy
release.
- The main issues are noted in :ref:`NEP 41 <NEP41>` and will mostly affect
heavy users of the NumPy C-API.
- Eventually we will want to deprecate the API currently used for creating
user-defined dtypes.
- Small, rarely noticed inconsistencies are likely to change. Examples:
- ``np.array(np.nan, dtype=np.int64)`` behaves differently from
``np.array([np.nan], dtype=np.int64)`` with the latter raising an error.
This may require identical results (either both error or both succeed).
- ``np.array([array_like])`` sometimes behaves differently from
``np.array([np.array(array_like)])``
- array operations may or may not preserve dtype metadata
- Documentation that describes the internal structure of dtypes will need
to be updated.
The new code must pass NumPy's regular test suite, giving some assurance that
the changes are compatible with existing code.
******************************************************************************
Usage and impact
******************************************************************************
We believe the few structures in this section are sufficient to consolidate
NumPy's present functionality and also to support complex user-defined DTypes.
The rest of the NEP fills in details and provides support for the claim.
Again, though Python is used for illustration, the implementation is a C API only; a
future NEP will tackle the Python API.
After implementing this NEP, creating a DType will be possible by implementing
the DType base class outlined below,
which is further described in `nep42_DType class`_:
.. code-block:: python
:dedent: 0
    class DType(np.dtype):
        type : type        # Python scalar type
        parametric : bool  # (may be indicated by superclass)

        @property
        def canonical(self) -> bool:
            raise NotImplementedError

        def ensure_canonical(self : DType) -> DType:
            raise NotImplementedError
For casting, a large part of the functionality is provided by the "methods" stored
in ``_castingimpl``
.. code-block:: python
:dedent: 0
        @classmethod
        def common_dtype(cls : DTypeMeta, other : DTypeMeta) -> DTypeMeta:
            raise NotImplementedError

        def common_instance(self : DType, other : DType) -> DType:
            raise NotImplementedError

        # A mapping of "methods" each detailing how to cast to another DType
        # (further specified at the end of the section)
        _castingimpl = {}
For array-coercion, also part of casting:
.. code-block:: python
:dedent: 0
        def __dtype_setitem__(self, item_pointer, value):
            raise NotImplementedError

        def __dtype_getitem__(self, item_pointer, base_obj) -> object:
            raise NotImplementedError

        @classmethod
        def __discover_descr_from_pyobject__(cls, obj : object) -> DType:
            raise NotImplementedError

        # initially private:
        @classmethod
        def _known_scalar_type(cls, obj : object) -> bool:
            raise NotImplementedError
Another element of the casting implementation is the ``CastingImpl``:
.. code-block:: python
:dedent: 0
    casting = Union["safe", "same_kind", "unsafe"]

    class CastingImpl:
        # Object describing and performing the cast
        casting : casting

        def resolve_descriptors(
                self, DTypes : Tuple[DTypeMeta],
                given_dtypes : Tuple[DType|None]) -> (casting, Tuple[DType]):
            raise NotImplementedError

        # initially private:
        def _get_loop(...) -> lowlevel_C_loop:
            raise NotImplementedError
which describes the casting from one DType to another. In
:ref:`NEP 43 <NEP43>` this ``CastingImpl`` object is used unchanged to
support universal functions.
Note that what is called ``CastingImpl`` here will carry the generic name
``ArrayMethod``, to accommodate both casting and universal functions.
******************************************************************************
Definitions
******************************************************************************
.. glossary::
dtype
The dtype *instance*; this is the object attached to a numpy array.
DType
Any subclass of the base type ``np.dtype``.
coercion
Conversion of Python types to NumPy arrays and values stored in a NumPy
array.
cast
Conversion of an array to a different dtype.
parametric type
A dtype whose representation can change based on a parameter value,
like a string dtype with a length parameter. All members of the current
``flexible`` dtype class are parametric. See
:ref:`NEP 40 <parametric-datatype-discussion>`.
promotion
Finding a dtype that can perform an operation on a mix of dtypes without
loss of information.
safe cast
A cast is safe if no information is lost when changing type.
On the C level we use ``descriptor`` or ``descr`` to mean
*dtype instance*. In the proposed C-API, these terms will distinguish
dtype instances from DType classes.
.. note::
NumPy has an existing class hierarchy for scalar types, as
seen :ref:`in the figure <nep-0040_dtype-hierarchy>` of
:ref:`NEP 40 <NEP40>`, and the new DType hierarchy will resemble it. The
types are used as an attribute of the single dtype class in the current
NumPy; they're not dtype classes. They neither harm nor help this work.
.. _nep42_DType class:
******************************************************************************
The DType class
******************************************************************************
This section reviews the structure underlying the proposed DType class,
including the type hierarchy and the use of abstract DTypes.
Class getter
==============================================================================
To create a DType instance from a scalar type users now call
``np.dtype`` (for instance, ``np.dtype(np.int64)``). Sometimes it is
also necessary to access the underlying DType class; this comes up in
particular with type hinting because the "type" of a DType instance is
the DType class. Taking inspiration from type hinting, we propose the
following getter syntax::
np.dtype[np.int64]
to get the DType class corresponding to a scalar type. The notation
works equally well with built-in and user-defined DTypes.
This getter eliminates the need to create an explicit name for every
DType, crowding the ``np`` namespace; the getter itself signifies the
type. It also opens the possibility of making ``np.ndarray`` generic
over DType class using annotations like::
np.ndarray[np.dtype[np.float64]]
The above is fairly verbose, so it is possible that we will include
aliases like::
Float64 = np.dtype[np.float64]
in ``numpy.typing``, thus keeping annotations concise but still
avoiding crowding the ``np`` namespace as discussed above. For a
user-defined DType::
class UserDtype(dtype): ...
one can do ``np.ndarray[UserDtype]``, keeping annotations concise in
that case without introducing boilerplate in NumPy itself. For a
user-defined scalar type::
class UserScalar(generic): ...
we would need to add a typing overload to ``dtype``::
@overload
__new__(cls, dtype: Type[UserScalar], ...) -> UserDtype
to allow ``np.dtype[UserScalar]``.
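For illustration, the proposed getter and such an alias might be used like
this (a sketch of the intended semantics only; it does not run against
current NumPy)::

    Float64 = np.dtype[np.float64]      # the DType *class* for float64
    f64 = np.dtype("float64")           # a dtype *instance* of that class
    assert isinstance(f64, Float64)

    # once ``np.ndarray`` is generic over the DType class:
    def normalize(arr: np.ndarray[Float64]) -> np.ndarray[Float64]:
        ...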
The initial implementation probably will return only concrete (not abstract)
DTypes.
*This item is still under review.*
Hierarchy and abstract classes
==============================================================================
We will use abstract classes as building blocks of our extensible DType class
hierarchy.
1. Abstract classes are inherited cleanly, in principle allowing checks like
``isinstance(np.dtype("float64"), np.inexact)``.
2. Abstract classes allow a single piece of code to handle a multiplicity of
input types. Code written to accept Complex objects can work with numbers
of any precision; the precision of the results is determined by the
precision of the arguments.
3. There's room for user-created families of DTypes. We can envision an
abstract ``Unit`` class for physical units, with a concrete subclass like
``Float64Unit``. Calling ``Unit(np.float64, "m")`` (``m`` for meters) would
be equivalent to ``Float64Unit("m")``.
4. The implementation of universal functions in :ref:`NEP 43 <NEP43>` may require
a class hierarchy.
**Example:** A NumPy ``Categorical`` class would be a match for pandas
``Categorical`` objects, which can contain integers or general Python objects.
NumPy needs a DType that it can assign a Categorical to, but it also needs
DTypes like ``CategoricalInt64`` and ``CategoricalObject`` such that
``common_dtype(CategoricalInt64, String)`` raises an error, but
``common_dtype(CategoricalObject, String)`` returns an ``object`` DType. In
our scheme, ``Categorical`` is an abstract type with ``CategoricalInt64`` and
``CategoricalObject`` subclasses.
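To make the intended structure concrete, such a family might be sketched as
follows (purely illustrative Python; the real creation API is C-only, and the
``__common_dtype__`` bodies are simplified assumptions)::

    class Categorical(np.dtype):              # abstract; cannot be instantiated
        ...

    class CategoricalInt64(Categorical):      # concrete subclass
        @classmethod
        def __common_dtype__(cls, other):
            # no common DType with e.g. String; the caller raises an error
            return NotImplemented

    class CategoricalObject(Categorical):     # concrete subclass
        @classmethod
        def __common_dtype__(cls, other):
            # simplified: fall back to the ``object`` DType
            return np.dtype[object]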
Rules for the class structure, illustrated :ref:`below <nep42_hierarchy_figure>`:
1. Abstract DTypes cannot be instantiated. Instantiating an abstract DType
raises an error, or perhaps returns an instance of a concrete subclass.
Raising an error will be the default behavior and may be required initially.
2. While abstract DTypes may be superclasses, they may also act like Python's
abstract base classes (ABC) allowing registration instead of subclassing.
It may be possible to simply use or inherit from Python ABCs.
3. Concrete DTypes may not be subclassed. In the future this might be relaxed
to allow specialized implementations such as a GPU float64 subclassing a
NumPy float64.
The
`Julia language <https://docs.julialang.org/en/v1/manual/types/#man-abstract-types-1>`_
has a similar prohibition against subclassing concrete types.
For example, methods such as ``__common_instance__`` or
``__common_dtype__`` (described later) cannot work for a subclass unless
they were designed very carefully.
It helps avoid unintended vulnerabilities to implementation changes that
result from subclassing types that were not written to be subclassed.
We believe that the DType API should rather be extended to simplify wrapping
of existing functionality.
The DType class requires C-side storage of methods and additional information,
to be implemented by a ``DTypeMeta`` class. Each ``DType`` class is an
instance of ``DTypeMeta`` with a well-defined and extensible interface;
end users ignore it.
.. _nep42_hierarchy_figure:
.. figure:: _static/dtype_hierarchy.svg
:figclass: align-center
Miscellaneous methods and attributes
==============================================================================
This section collects definitions in the DType class that are not used in
casting and array coercion, which are described in detail below.
* Existing dtype methods (:class:`numpy.dtype`) and C-side fields are preserved.
* ``DType.type`` replaces ``dtype.type``. Unless a use case arises,
``dtype.type`` will be deprecated.
This indicates a Python scalar type which represents the same values as
the DType. This is the same type as used in the proposed `Class getter`_
and for `DType discovery during array coercion`_.
(This may also be set for abstract DTypes; this is necessary
for array coercion.)
* A new ``self.canonical`` property generalizes the notion of byte order to
indicate whether data has been stored in a default/canonical way. For
existing code, "canonical" will just signify native byte order, but it can
take on new meanings in new DTypes -- for instance, to distinguish a
complex-conjugated instance of Complex which stores ``real - imag`` instead
of ``real + imag``. The ISNBO ("is
native byte order") flag might be repurposed as the canonical flag.
* Support is included for parametric DTypes. A DType will be deemed parametric
if it inherits from ParametricDType.
* DType methods may resemble or even reuse existing Python slots. Thus Python
special slots are off-limits for user-defined DTypes (for instance, defining
``Unit("m") > Unit("cm")``), since we may want to develop a meaning for these
operators that is common to all DTypes.
* Sorting functions are moved to the DType class. They may be implemented by
defining a method ``dtype_get_sort_function(self, sortkind="stable") ->
sortfunction`` that must return ``NotImplemented`` if the given ``sortkind``
is not known (a sketch follows this list).
* Functions that cannot be removed are implemented as special methods.
Many of these were previously defined as part of the :c:type:`PyArray_ArrFuncs`
slot of the dtype instance (``PyArray_Descr *``) and include functions
such as ``nonzero``, ``fill`` (used for ``np.arange``), and
``fromstr`` (used to parse text files).
These old methods will be deprecated and replacements
following the new design principles added.
The API is not defined here. Since the old methods can be deprecated and
renamed replacements added, it is acceptable if these new methods have to be
modified later.
* Use of ``kind`` for non-built-in types is discouraged in favor of
``isinstance`` checks. ``kind`` will return the ``__qualname__`` of the
object to ensure uniqueness for all DTypes. On the C side, ``kind`` and
``char`` are set to ``\0`` (NULL character).
While ``kind`` will be discouraged, the current ``np.issubdtype``
may remain the preferred method for this type of check.
* A method ``ensure_canonical(self) -> dtype`` returns a new dtype (or
``self``) with the ``canonical`` flag set.
* Since NumPy's approach is to provide functionality through ufuncs,
functions like sorting that will be implemented in DTypes might eventually be
reimplemented as generalized ufuncs.
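As referenced in the sorting item above, such a method might look roughly like
this (a hypothetical sketch; only the method name and the ``NotImplemented``
convention come from the list above)::

    def dtype_get_sort_function(self, sortkind="stable"):
        if sortkind != "stable":
            # this DType only knows how to sort stably
            return NotImplemented

        def sortfunction(buffer, length):
            # low-level sort of ``length`` elements of this dtype in ``buffer``
            ...

        return sortfunction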
.. _nep_42_casting:
******************************************************************************
Casting
******************************************************************************
We review here the operations related to casting arrays:
- Finding the "common dtype," returned by :func:`numpy.promote_types` and
:func:`numpy.result_type`
- The result of calling :func:`numpy.can_cast`
We show how casting arrays with ``astype(new_dtype)`` will be implemented.
`Common DType` operations
==============================================================================
When input types are mixed, a first step is to find a DType that can hold
the result without loss of information -- a "common DType."
Array coercion and concatenation both return a common dtype instance. Most
universal functions use the common DType for dispatching, though they might
not use it for a result (for instance, the result of a comparison is always
bool).
We propose the following implementation:
- For two DType classes::
__common_dtype__(cls, other : DTypeMeta) -> DTypeMeta
Returns a new DType, often one of the inputs, which can represent values
of both input DTypes. This should usually be minimal:
the common DType of ``Int16`` and ``Uint16`` is ``Int32`` and not ``Int64``.
``__common_dtype__`` may return ``NotImplemented`` to defer to ``other`` and,
like Python operators, subclasses take precedence (their
``__common_dtype__`` method is tried first).
- For two instances of the same DType::
__common_instance__(self: SelfT, other : SelfT) -> SelfT
For nonparametric built-in dtypes, this returns a canonicalized copy of
``self``, preserving metadata. For nonparametric user types, this provides
a default implementation.
- For instances of different DTypes, for example ``>float64`` and ``S8``,
the operation is done in three steps:
1. ``Float64.__common_dtype__(type(>float64), type(S8))``
returns ``String`` (or defers to ``String.__common_dtype__``).
2. The casting machinery (explained in detail below) provides the
information that ``">float64"`` casts to ``"S32"``
3. ``String.__common_instance__("S8", "S32")`` returns the final ``"S32"``.
The benefit of this handoff is to reduce duplicated code and keep concerns
separate. DType implementations don't need to know how to cast, and the
results of casting can be extended to new types, such as a new string encoding.
This means the implementation will work like this::
    def common_dtype(DType1, DType2):
        common_dtype = DType1.__common_dtype__(DType2)
        if common_dtype is NotImplemented:
            common_dtype = DType2.__common_dtype__(DType1)
            if common_dtype is NotImplemented:
                raise TypeError("no common dtype")
        return common_dtype

    def promote_types(dtype1, dtype2):
        common = common_dtype(type(dtype1), type(dtype2))

        if type(dtype1) is not common:
            # Find what dtype1 is cast to when cast to the common DType
            # by using the CastingImpl as described below:
            castingimpl = get_castingimpl(type(dtype1), common)
            safety, (_, dtype1) = castingimpl.resolve_descriptors(
                (common, common), (dtype1, None))
            assert safety == "safe"  # promotion should normally be a safe cast

        if type(dtype2) is not common:
            ...  # same as the above branch, but for dtype2

        if dtype1 is not dtype2:
            return common.__common_instance__(dtype1, dtype2)
        return dtype1
Some of these steps may be optimized for nonparametric DTypes.
Since the type returned by ``__common_dtype__`` is not necessarily one of the
two arguments, it's not equivalent to NumPy's "safe" casting.
Safe casting works for ``np.promote_types(int16, int64)``, which returns
``int64``, but fails for::
np.promote_types("int64", "float32") -> np.dtype("float64")
It is the responsibility of the DType author to ensure that the inputs
can be safely cast to the ``__common_dtype__``.
Exceptions may apply. For example, casting ``int32`` to
a (long enough) string is at least at this time considered "safe".
However ``np.promote_types(int32, String)`` will *not* be defined.
**Example:**
``object`` always chooses ``object`` as the common DType. For
``datetime64``, no type promotion with any other datatype is defined, but if
someone were to implement a new higher-precision datetime, then::
HighPrecisionDatetime.__common_dtype__(np.dtype[np.datetime64])
would return ``HighPrecisionDatetime``, and the casting implementation,
as described below, may need to decide how to handle the datetime unit.
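A hypothetical ``__common_dtype__`` for such a higher-precision datetime might
be sketched as (illustrative only; the class names are assumptions)::

    @classmethod
    def __common_dtype__(cls, other):
        # cls is HighPrecisionDatetime
        if other is np.dtype[np.datetime64] or other is cls:
            # the higher-precision DType can represent both inputs
            return cls
        return NotImplemented  # defer to ``other.__common_dtype__``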
**Alternatives:**
- We're pushing the decision on common DTypes to the DType classes. Suppose
instead we could turn to a universal algorithm based on safe casting,
imposing a total order on DTypes and returning the first type that both
arguments could cast to safely.
It would be difficult to devise a reasonable total order, and it would have
to accept new entries. Beyond that, the approach is flawed because
importing a type can change the behavior of a program. For example, a
program requiring the common DType of ``int16`` and ``uint16`` would
ordinarily get the built-in type ``int32`` as the first match; if the
program adds ``import int24``, the first match becomes ``int24`` and the
smaller type might make the program overflow for the first time. [1]_
- A more flexible common DType could be implemented in the future where
``__common_dtype__`` relies on information from the casting logic.
Since ``__common_dtype__`` is a method, such a default implementation
could be added at a later time.
- The three-step handling of differing dtypes could, of course, be coalesced.
It would lose the value of splitting in return for a possibly faster
execution. But few cases would benefit. Most cases, such as array coercion,
involve a single Python type (and thus dtype).
The cast operation
==============================================================================
Casting is perhaps the most complex and interesting DType operation. It
is much like a typical universal function on arrays, converting one input to a
new output, with two distinctions:
- Casting always requires an explicit output datatype.
- The NumPy iterator API requires access to functions that are lower-level
than what universal functions currently need.
Casting can be complex and may not implement all details of each input
datatype (such as non-native byte order or unaligned access). So a complex
type conversion might entail 3 steps:
1. The input datatype is normalized and prepared for the cast.
2. The cast is performed.
3. The result, which is in a normalized form, is cast to the requested
form (non-native byte order).
Further, NumPy provides different casting kinds or safety specifiers:
* ``equivalent``, allowing only byte-order changes
* ``safe``, requiring a type large enough to preserve value
* ``same_kind``, requiring a safe cast or one within a kind, like float64 to float32
* ``unsafe``, allowing any data conversion
and in some cases a cast may be just a view.
We need to support the two current signatures of ``arr.astype``:
- For DTypes: ``arr.astype(np.String)``
- current spelling ``arr.astype("S")``
- ``np.String`` can be an abstract DType
- For dtypes: ``arr.astype(np.dtype("S8"))``
We also have two signatures of ``np.can_cast``:
- Instance to class: ``np.can_cast(dtype, DType, "safe")``
- Instance to instance: ``np.can_cast(dtype, other_dtype, "safe")``
On the Python level ``dtype`` is overloaded to mean class or instance.
A third ``can_cast`` signature, ``np.can_cast(DType, OtherDType, "safe")``, may be used
internally but need not be exposed to Python.
During DType creation, DTypes will be able to pass a list of ``CastingImpl``
objects, which can define casting to and from the DType.
One of them should define the cast between instances of that DType. It can be
omitted if the DType has only a single implementation and is nonparametric.
Each ``CastingImpl`` has a distinct DType signature:
``CastingImpl[InputDtype, RequestedDtype]``
and implements the following methods and attributes:
* To report safeness,
``resolve_descriptors(self, DTypes : Tuple[DTypeMeta], given_dtypes : Tuple[DType|None]) -> (casting, Tuple[DType])``.
The ``casting`` output reports safeness (safe, unsafe, or same-kind), and
the tuple is used for multistep casting, as in the example below.
* To get a casting function,
``get_loop(...) -> function_to_handle_cast (signature to be decided)``
returns a low-level implementation of a strided casting function
("transfer function") capable of performing the
cast.
Initially the implementation will be *private*, and users will only be
able to provide strided loops with this signature.
* For performance, a ``casting`` attribute taking a value of ``equivalent``, ``safe``,
``unsafe``, or ``same-kind``.
**Performing a cast**
.. _nep42_cast_figure:
.. figure:: _static/casting_flow.svg
:figclass: align-center
The above figure illustrates a multistep
cast of an ``int24`` with a value of ``42`` to a string of length 20
(``"S20"``).
We've picked an example where the implementer has only provided limited
functionality: a function to cast an ``int24`` to an ``S8`` string (which can
hold all 24-bit integers). This means multiple conversions are needed.
The full process is:
1. Call
``CastingImpl[Int24, String].resolve_descriptors((Int24, String), (int24, "S20"))``.
This provides the information that ``CastingImpl[Int24, String]`` only
implements the cast of ``int24`` to ``"S8"``.
2. Since ``"S8"`` does not match ``"S20"``, use
``CastingImpl[String, String].get_loop()``
to find the transfer (casting) function to convert an ``"S8"`` into an ``"S20"``.
3. Fetch the transfer function to convert an ``int24`` to an ``"S8"`` using
``CastingImpl[Int24, String].get_loop()``
4. Perform the actual cast using the two transfer functions:
``int24(42) -> S8("42") -> S20("42")``.
``resolve_descriptors`` allows the implementation for
``np.array(42, dtype=int24).astype(String)``
to call
``CastingImpl[Int24, String].resolve_descriptors((Int24, String), (int24, None))``.
In this case the result of ``(int24, "S8")`` defines the correct cast:
``np.array(42, dtype=int24).astype(String) == np.array("42", dtype="S8")``.
**Casting safety**
To compute ``np.can_cast(int24, "S20", casting="safe")``, only the
``resolve_descriptors`` function is required and
is called in the same way as in :ref:`the figure describing a cast <nep42_cast_figure>`.
In this case, the calls to ``resolve_descriptors`` also provide the
information that ``int24 -> "S8"`` as well as ``"S8" -> "S20"`` are safe
casts, and thus ``int24 -> "S20"`` is also a safe cast.
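The following pseudocode sketches how the two ``resolve_descriptors`` results
might be combined for ``np.can_cast`` (illustrative only; ``get_castingimpl``
is reused from the promotion pseudocode above, while ``minimal_casting`` and
``casting_is_at_least`` are hypothetical helpers)::

    def can_cast(from_dtype, to_dtype, casting="safe"):
        impl = get_castingimpl(type(from_dtype), type(to_dtype))
        safety, (_, reachable) = impl.resolve_descriptors(
            (type(from_dtype), type(to_dtype)), (from_dtype, to_dtype))

        if reachable != to_dtype:
            # e.g. only int24 -> "S8" is implemented; resolve the remaining
            # "S8" -> "S20" step using the within-DType cast:
            within = get_castingimpl(type(to_dtype), type(to_dtype))
            safety2, _ = within.resolve_descriptors(
                (type(to_dtype), type(to_dtype)), (reachable, to_dtype))
            safety = minimal_casting(safety, safety2)  # least safe of the two

        return casting_is_at_least(safety, casting)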
In some cases, no cast is necessary. For example, on most Linux systems
``np.dtype("long")`` and ``np.dtype("longlong")`` are different dtypes but are
both 64-bit integers. In this case, the cast can be performed using
``long_arr.view("longlong")``. The information that a cast is a view will be
handled by an additional flag. Thus ``casting`` can take 8 values in
total: the original 4 of ``equivalent``, ``safe``, ``unsafe``, and ``same-kind``,
plus ``equivalent+view``, ``safe+view``, ``unsafe+view``, and
``same-kind+view``. NumPy currently defines ``dtype1 == dtype2`` to be True
only if byte order matches. This functionality can be replaced with the
combination of "equivalent" casting and the "view" flag.
(For more information on the ``resolve_descriptors`` signature see the
:ref:`nep42_C-API` section below and :ref:`NEP 43 <NEP43>`.)
**Casting between instances of the same DType**
To keep the number of casting steps down,
``CastingImpl[DType, DType]`` must be capable of converting between any two
instances of that DType.
In general the DType implementer must include this ``CastingImpl[DType, DType]``,
unless the DType has only a single (singleton) instance.
**General multistep casting**
We could implement certain casts, such as ``int8`` to ``int24``,
even if the user provides only an ``int16 -> int24`` cast. This proposal does
not provide that, but future work might find such casts dynamically, or at least
allow ``resolve_descriptors`` to return arbitrary ``dtypes``.
If ``CastingImpl[Int8, Int24].resolve_descriptors((Int8, Int24), (int8, int24))``
returns ``(int16, int24)``, the actual casting process could be extended to include
the ``int8 -> int16`` cast. This adds a step.
**Example:**
The implementation for casting integers to datetime would generally
say that this cast is unsafe (because it is always an unsafe cast).
Its ``resolve_descriptors`` function may look like::
    def resolve_descriptors(self, DTypes, given_dtypes):
        from_dtype, to_dtype = given_dtypes
        from_dtype = from_dtype.ensure_canonical()  # ensure not byte-swapped
        if to_dtype is None:
            raise TypeError("Cannot convert to a NumPy datetime without a unit")
        to_dtype = to_dtype.ensure_canonical()  # ensure not byte-swapped

        # This is always an "unsafe" cast, but for int64, we can represent
        # it by a simple view (if the dtypes are both canonical).
        # (represented as C-side flags here).
        safety_and_view = NPY_UNSAFE_CASTING | _NPY_CAST_IS_VIEW
        return safety_and_view, (from_dtype, to_dtype)
.. note::
While NumPy currently defines integer-to-datetime casts, with the possible
exception of the unit-less ``timedelta64`` it may be better to not define
these casts at all. In general we expect that user defined DTypes will be
using custom methods such as ``unit.drop_unit(arr)`` or ``arr *
unit.seconds``.
**Alternatives:**
- Our design objectives are:
- Minimize the number of DType methods and avoid code duplication.
- Mirror the implementation of universal functions.
- The decision to use only the DType classes in the first step of finding the
correct ``CastingImpl``, in addition to defining ``CastingImpl.casting``,
makes it possible to retain the current default implementation of
``__common_dtype__`` for existing user-defined dtypes; this could be
expanded in the future.
- The split into multiple steps may seem to add complexity rather than reduce
it, but it consolidates the signatures of ``np.can_cast(dtype, DTypeClass)``
and ``np.can_cast(dtype, other_dtype)``.
Further, the API guarantees separation of concerns for user DTypes. The user
``Int24`` dtype does not have to handle all string lengths if it does not
wish to do so. Further, an encoding added to the ``String`` DType would
not affect the overall cast. The ``resolve_descriptors`` function
can keep returning the default encoding and the ``CastingImpl[String,
String]`` can take care of any necessary encoding changes.
- The main alternative is moving most of the information that is here pushed
into the ``CastingImpl`` directly into methods on the DTypes. But this
obscures the similarity between casting and universal functions. It does
reduce indirection, as noted below.
- An earlier proposal defined methods such as ``__can_cast_to__(self, other)``
that would dynamically return a ``CastingImpl``. This
removes the requirement to define all possible casts at DType creation
(of one of the involved DTypes).
Such an API could be added later. It resembles Python's ``__getattr__`` in
providing additional control over attribute lookup.
**Notes:**
``CastingImpl`` is used as a name in this NEP to clarify that it implements
all functionality related to a cast. It is meant to be identical to the
``ArrayMethod`` proposed in NEP 43 as part of restructuring ufuncs to handle
new DTypes. All type definitions are expected to be named ``ArrayMethod``.
The way dispatching works for ``CastingImpl`` is planned to be limited
initially and fully opaque. In the future, it may or may not be moved into a
special UFunc, or behave more like a universal function.
.. _nep42_array_coercion:
Coercion to and from Python objects
==============================================================================
When storing a single value in an array or taking it out, it is necessary to
coerce it -- that is, convert it -- to and from the low-level representation
inside the array.
Coercion is slightly more complex than typical casts. One reason is that a
Python object could itself be a 0-dimensional array or scalar with an
associated DType.
Coercing to and from Python scalars requires two to three
methods that largely correspond to the current definitions:
1. ``__dtype_setitem__(self, item_pointer, value)``
2. ``__dtype_getitem__(self, item_pointer, base_obj) -> object``;
``base_obj`` is for memory management and usually ignored; it points to
an object owning the data. Its only role is to support structured datatypes
with subarrays within NumPy, which currently return views into the array.
The function returns an equivalent Python scalar (i.e. typically a NumPy
scalar).
3. ``__dtype_get_pyitem__(self, item_pointer, base_obj) -> object`` (initially
hidden for new-style user-defined datatypes, may be exposed on user
request). This corresponds to the ``arr.item()`` method also used by
``arr.tolist()`` and returns Python floats, for example, instead of NumPy
floats.
(The above is meant for C-API. A Python-side API would have to use byte
buffers or similar to implement this, which may be useful for prototyping.)
When a certain scalar
has a known (different) dtype, NumPy may in the future use casting instead of
``__dtype_setitem__``.
A user datatype is (initially) expected to implement
``__dtype_setitem__`` for its own ``DType.type`` and all basic Python scalars
it wishes to support (e.g. ``int`` and ``float``). In the future a
function ``known_scalar_type`` may be made public to allow a user dtype to signal
which Python scalars it can store directly.
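As a sketch, a user DType storing a C double and accepting Python ``int`` and
``float`` might implement these methods roughly as follows (illustrative
pseudocode; the ``item_pointer.write``/``read`` helpers mirror the pseudocode
used below and are not an actual API)::

    import struct  # used here only to illustrate the low-level packing

    def __dtype_setitem__(self, item_pointer, value):
        if not isinstance(value, (self.type, int, float)):
            raise TypeError(f"cannot store {type(value)} in this dtype")
        item_pointer.write(struct.pack("<d", float(value)))

    def __dtype_getitem__(self, item_pointer, base_obj) -> object:
        value, = struct.unpack("<d", item_pointer.read(8))
        return self.type(value)  # return the equivalent scalar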
**Implementation:** The pseudocode implementation for setting a single item in
an array from an arbitrary Python object ``value`` is (some
functions here are defined later)::
    def PyArray_Pack(dtype, item_pointer, value):
        DType = type(dtype)
        if DType.type is type(value) or DType.known_scalartype(type(value)):
            return dtype.__dtype_setitem__(item_pointer, value)

        # The dtype cannot handle the value, so try casting:
        arr = np.array(value)
        if arr.dtype is object or arr.ndim != 0:
            # not a numpy or user scalar; try using the dtype after all:
            return dtype.__dtype_setitem__(item_pointer, value)

        arr = arr.astype(dtype)
        item_pointer.write(arr[()])
where the call to ``np.array()`` represents the dtype discovery and is
not actually performed.
**Example:** Current ``datetime64`` returns ``np.datetime64`` scalars and can
be assigned from ``np.datetime64``. However, the datetime
``__dtype_setitem__`` also allows assignment from date strings ("2016-05-01")
or Python integers. Additionally the datetime ``__dtype_get_pyitem__``
function actually returns a Python ``datetime.datetime`` object (most of the
time).
**Alternatives:** This functionality could also be implemented as a cast to and
from the ``object`` dtype.
However, coercion is slightly more complex than typical casts.
One reason is that in general a Python object could itself be a
zero-dimensional array or scalar with an associated DType.
Such an object has a DType, and the correct cast to another DType is already
defined::
np.array(np.float32(4), dtype=object).astype(np.float64)
is identical to::
np.array(4, dtype=np.float32).astype(np.float64)
Implementing the first ``object`` to ``np.float64`` cast explicitly
would require the user to duplicate, or fall back to, existing
casting functionality.
It is certainly possible to describe the coercion to and from Python objects
using the general casting machinery, but the ``object`` dtype is special and
important enough to be handled by NumPy using the presented methods.
**Further issues and discussion:**
- The ``__dtype_setitem__`` function duplicates some code, such as coercion
from a string.
``datetime64`` allows assignment from string, but the same conversion also
occurs for casting from the string dtype to ``datetime64``.
We may in the future expose the ``known_scalartype`` function to allow the
user to implement such duplication.
For example, NumPy would normally use
``np.array(np.string_("2019")).astype(datetime64)``
but ``datetime64`` could choose to use its ``__dtype_setitem__`` instead
for performance reasons.
- There is an issue about how subclasses of scalars should be handled. We
anticipate that NumPy will stop automatically detecting the dtype of
``np.array(float64_subclass)`` as float64. The user can still provide
``dtype=np.float64``. However, the above automatic casting using
``np.array(scalar_subclass).astype(requested_dtype)`` will fail. In many
cases, this is not an issue, since the Python ``__float__`` protocol can be
used instead. But in some cases, this will mean that subclasses of Python
scalars will behave differently.
.. note::
*Example:* ``np.complex256`` should not use ``__float__`` in its
``__dtype_setitem__`` method in the future unless it is a known floating
point type. If the scalar is a subclass of a different high precision
floating point type (e.g. ``np.float128``) then this currently loses
precision without notifying the user.
In that case ``np.array(float128_subclass(3), dtype=np.complex256)``
may fail unless the ``float128_subclass`` is first converted to the
``np.float128`` base class.
DType discovery during array coercion
==============================================================================
An important step in the use of NumPy arrays is creation of the array from
collections of generic Python objects.
**Motivation:** Although the distinction is not clear currently, there are two main needs::
np.array([1, 2, 3, 4.])
needs to guess the correct dtype based on the Python objects inside.
Such an array may include a mix of datatypes, as long as they can be
promoted.
A second use case is when users provide the output DType class, but not the
specific DType instance::
np.array([object(), None], dtype=np.dtype[np.string_]) # (or `dtype="S"`)
In this case the user indicates that ``object()`` and ``None`` should be
interpreted as strings.
The need to consider the user provided DType also arises for a future
``Categorical``::
np.array([1, 2, 1, 1, 2], dtype=Categorical)
which must interpret the numbers as unique categorical values rather than
integers.
There are three further issues to consider:
1. It may be desirable to create datatypes associated
with normal Python scalars (such as ``datetime.datetime``) that do not
have a ``dtype`` attribute already.
2. In general, a datatype could represent a sequence, however, NumPy currently
assumes that sequences are always collections of elements
(the sequence cannot be an element itself).
An example would be a ``vector`` DType.
3. An array may itself contain arrays with a specific dtype (even
general Python objects). For example:
``np.array([np.array(None, dtype=object)], dtype=np.String)``
poses the issue of how to handle the included array.
Some of these difficulties arise because finding the correct shape
of the output array and finding the correct datatype are closely related.
**Implementation:** There are two distinct cases above:
1. The user has provided no dtype information.
2. The user provided a DType class -- as represented, for example, by ``"S"``
representing a string of any length.
In the first case, it is necessary to establish a mapping from the Python type(s)
of the constituent elements to the DType class.
Once the DType class is known, the correct dtype instance needs to be found.
In the case of strings, this requires finding the string length.
These two cases shall be implemented by leveraging two pieces of information:
1. ``DType.type``: The current type attribute to indicate which Python scalar
type is associated with the DType class (this is a *class* attribute that always
exists for any datatype and is not limited to array coercion).
2. ``__discover_descr_from_pyobject__(cls, obj) -> dtype``: A classmethod that
returns the correct descriptor given the input object.
Note that only parametric DTypes have to implement this.
For nonparametric DTypes using the default instance will always be acceptable.
The Python scalar type which is already associated with a DType through the
``DType.type`` attribute maps from the DType to the Python scalar type.
At registration time, a DType may choose to allow automatic discovery for
this Python scalar type.
This requires a lookup in the opposite direction, which will be implemented
using a global mapping (dictionary-like) of::
known_python_types[type] = DType
Correct garbage collection requires additional care.
If both the Python scalar type (``pytype``) and ``DType`` are created dynamically,
they will potentially be deleted again.
To allow this, it must be possible to make the above mapping weak.
This requires that the ``pytype`` holds a reference to the ``DType`` explicitly.
Thus, in addition to building the global mapping, NumPy will store the ``DType`` as
``pytype.__associated_array_dtype__`` in the Python type.
This does *not* define the mapping and should *not* be accessed directly.
In particular, potential inheritance of the attribute does not mean that NumPy will use the
superclass's ``DType`` automatically. A new ``DType`` must be created for the
subclass.
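Schematically, registering such a DType for its scalar type thus performs two
assignments (a sketch; ``QuantityScalar`` and ``QuantityDType`` are hypothetical
names)::

    # global (possibly weak) mapping used for discovery during coercion:
    known_python_types[QuantityScalar] = QuantityDType

    # back-reference stored on the scalar type; it only keeps the DType
    # alive while the weak mapping exists and is never used for lookup:
    QuantityScalar.__associated_array_dtype__ = QuantityDType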
.. note::
Python integers do not have a clear/concrete NumPy type associated right
now. This is because during array coercion NumPy currently finds the first
type capable of representing their value in the list of `long`, `unsigned
long`, `int64`, `unsigned int64`, and `object` (on many machines `long` is
64 bit).
Instead they will need to be implemented using an ``AbstractPyInt``. This
DType class can then provide ``__discover_descr_from_pyobject__`` and
return the actual dtype which is e.g. ``np.dtype("int64")``. For
dispatching/promotion in ufuncs, it will also be necessary to dynamically
create ``AbstractPyInt[value]`` classes (creation can be cached), so that
they can provide the current value based promotion functionality provided
by ``np.result_type(python_integer, array)`` [2]_ .
To allow a DType to accept as scalars inputs that are not basic Python
types or instances of ``DType.type``, the ``known_scalar_type`` method is used.
This can allow discovery of a ``vector`` as a scalar (element) instead of a sequence
(for the command ``np.array(vector, dtype=VectorDType)``) even when ``vector`` is itself a
sequence or even an array subclass. This will *not* be public API initially,
but may be made public at a later time.
**Example:** The current datetime DType requires a
``__discover_descr_from_pyobject__`` which returns the correct unit for string
inputs. This allows it to support::
np.array(["2020-01-02", "2020-01-02 11:24"], dtype="M8")
by inspecting the date strings. Together with the common dtype
operation, this allows it to automatically find that the datetime64 unit
should be "minutes".
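A sketch of such a classmethod for a parametric string DType, which must
discover the required length, might look like this (illustrative only; the
constructor signature is an assumption)::

    @classmethod
    def __discover_descr_from_pyobject__(cls, obj):
        # ``cls`` is the String DType class; return the dtype instance
        # (here: with sufficient length) able to hold the given object:
        return cls(length=len(str(obj)))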
**NumPy internal implementation:** The implementation to find the correct dtype
will work similar to the following pseudocode::
    def find_dtype(array_like):
        common_dtype = None
        for element in array_like:
            # default to object dtype, if unknown
            DType = known_python_types.get(type(element), np.dtype[object])
            dtype = DType.__discover_descr_from_pyobject__(element)

            if common_dtype is None:
                common_dtype = dtype
            else:
                common_dtype = np.promote_types(common_dtype, dtype)
        return common_dtype
In practice, the input to ``np.array()`` is a mix of sequences and array-like
objects, so that deciding what is an element requires checking whether it
is a sequence.
The full algorithm (without user provided dtypes) thus looks more like::
    def find_dtype_recursive(array_like, dtype=None):
        """
        Recursively find the dtype for nested sequences (arrays are not
        supported here).
        """
        DType = known_python_types.get(type(array_like), None)

        if DType is None and is_array_like(array_like):
            # Code for a sequence; an array_like may have a DType we
            # can use directly:
            for element in array_like:
                dtype = find_dtype_recursive(element, dtype=dtype)
            return dtype
        elif DType is None:
            DType = np.dtype[object]

        # dtype discovery and promotion as in `find_dtype` above
If the user provides ``DType``, then this DType will be tried first, and the
``dtype`` may need to be cast before the promotion is performed.
**Limitations:** The motivational point 3. of a nested array
``np.array([np.array(None, dtype=object)], dtype=np.String)`` is currently
(sometimes) supported by inspecting all elements of the nested array.
User DTypes will implicitly handle these correctly if the nested array
is of ``object`` dtype.
In some other cases NumPy will retain backward compatibility for existing
functionality only.
NumPy uses such functionality to allow code such as::
>>> np.array([np.array(["2020-05-05"], dtype="S")], dtype=np.datetime64)
array([['2020-05-05']], dtype='datetime64[D]')
which discovers the datetime unit ``D`` (days).
This possibility will not be accessible to user DTypes without an
intermediate cast to ``object`` or a custom function.
The use of a global type map means that an error or warning has to be given if
two DTypes wish to map to the same Python type. In most cases user DTypes
should only be implemented for types defined within the same library to avoid
the potential for conflicts. It will be the DType implementor's responsibility
to be careful about this and to avoid registration when in doubt.
**Alternatives:**
- Instead of a global mapping, we could rely on the scalar attribute
``scalar.__associated_array_dtype__``. This only creates a difference in
behavior for subclasses, and the exact implementation can be undefined
initially. Scalars will be expected to derive from a NumPy scalar. In
principle NumPy could, for a time, still choose to rely on the attribute.
- An earlier proposal for the ``dtype`` discovery algorithm used a two-pass
approach, first finding the correct ``DType`` class and only then
discovering the parametric ``dtype`` instance. It was rejected as
needlessly complex. But it would have enabled value-based promotion
in universal functions, allowing::
np.add(np.array([8], dtype="uint8"), [4])
to return a ``uint8`` result (instead of ``int16``), which currently happens for::
np.add(np.array([8], dtype="uint8"), 4)
(note the list ``[4]`` instead of scalar ``4``).
This is not a feature NumPy currently has or desires to support.
**Further issues and discussion:** It is possible to create a DType
such as Categorical, array, or vector which can only be used if ``dtype=DType``
is provided. Such DTypes cannot roundtrip correctly. For example::
np.array(np.array(1, dtype=Categorical)[()])
will result in an integer array. To get the original ``Categorical`` array
``dtype=Categorical`` will need to be passed explicitly.
This is a general limitation, but round-tripping is always possible if
``dtype=original_arr.dtype`` is passed.
.. _nep42_c-api:
******************************************************************************
Public C-API
******************************************************************************
DType creation
==============================================================================
To create a new DType the user will need to define the methods and attributes
outlined in the `Usage and impact`_ section and detailed throughout this
proposal.
In addition, some methods similar to those in :c:type:`PyArray_ArrFuncs` will
be needed for the slots struct below.
As mentioned in :ref:`NEP 41 <NEP41>`, the interface to define this DType
class in C is modeled after :PEP:`384`: Slots and some additional information
will be passed in a slots struct and identified by ``ssize_t`` integers::
    static struct PyArrayMethodDef slots[] = {
        {NPY_dt_method, method_implementation},
        ...,
        {0, NULL}
    };

    typedef struct{
      PyTypeObject *typeobj;    /* type of python scalar or NULL */
      int flags;                /* flags, including parametric and abstract */
      /* NULL terminated CastingImpls; the array is copied and references are stolen */
      CastingImpl **castingimpls;
      PyType_Slot *slots;
      PyTypeObject *baseclass;  /* Baseclass or NULL */
    } PyArrayDTypeMeta_Spec;

    PyObject* PyArray_InitDTypeMetaFromSpec(PyArrayDTypeMeta_Spec *dtype_spec);
All of this is passed by copying.
**TODO:** The DType author should be able to define new methods for the
DType, up to defining a full object, and, in the future, possibly even
extending the ``PyArrayDTypeMeta_Type`` struct. We have to decide what to make
available initially. A solution may be to allow inheriting only from an
existing class: ``class MyDType(np.dtype, MyBaseclass)``. If ``np.dtype`` is
first in the method resolution order, this also prevents an undesirable
override of slots like ``==``.
The ``slots`` will be identified by names which are prefixed with ``NPY_dt_``
and are:
* ``is_canonical(self) -> {0, 1}``
* ``ensure_canonical(self) -> dtype``
* ``default_descr(self) -> dtype`` (return must be native and should normally be a singleton)
* ``setitem(self, char *item_ptr, PyObject *value) -> {-1, 0}``
* ``getitem(self, char *item_ptr, PyObject *base_obj) -> object or NULL``
* ``discover_descr_from_pyobject(cls, PyObject) -> dtype or NULL``
* ``common_dtype(cls, other) -> DType, NotImplemented, or NULL``
* ``common_instance(self, other) -> dtype or NULL``
Where possible, a default implementation will be provided if the slot is
omitted or set to ``NULL``. Nonparametric dtypes do not have to implement:
* ``discover_descr_from_pyobject`` (uses ``default_descr`` instead)
* ``common_instance`` (uses ``default_descr`` instead)
* ``ensure_canonical`` (uses ``default_descr`` instead).
Sorting is expected to be implemented using:
* ``get_sort_function(self, NPY_SORTKIND sort_kind) -> {out_sortfunction, NotImplemented, NULL}``.
For convenience, it will be sufficient if the user implements only:
* ``compare(self, char *item_ptr1, char *item_ptr2, int *res) -> {-1, 0, 1}``
**Limitations:** The ``PyArrayDTypeMeta_Spec`` struct is clumsy to extend (for
instance, by adding a version tag to the ``slots`` to indicate a new, longer
version). We could use a function to provide the struct; it would require
memory management but would allow ABI-compatible extension (the struct is
freed again when the DType is created).
CastingImpl
==============================================================================
The external API for ``CastingImpl`` will be limited initially to defining:
* ``casting`` attribute, which can be one of the supported casting kinds.
This is the safest cast possible. For example, casting between two NumPy
strings is of course "safe" in general, but may be "same kind" in a specific
instance if the second string is shorter. If neither type is parametric the
``resolve_descriptors`` must use it.
* ``resolve_descriptors(PyArrayMethodObject *self, PyArray_DTypeMeta *DTypes[2],
PyArray_Descr *dtypes_in[2], PyArray_Descr *dtypes_out[2], NPY_CASTING *casting_out)
-> int {0, -1}`` The out
dtypes must be set correctly to dtypes which the strided loop
(transfer function) can handle. Initially the result must have instances
of the same DType class as the ``CastingImpl`` is defined for. The
``casting`` will be set to ``NPY_EQUIV_CASTING``, ``NPY_SAFE_CASTING``,
``NPY_UNSAFE_CASTING``, or ``NPY_SAME_KIND_CASTING``.
A new, additional flag,
``_NPY_CAST_IS_VIEW``, can be set to indicate that no cast is necessary and a
view is sufficient to perform the cast. The function should return
``-1`` when an error occurs. If a cast is not possible (but no error
occurred), a ``-1`` result should be returned *without* an error set.
*This point is under consideration; we may use ``-1`` to indicate
a general error and a different return value for an impossible cast.*
This means that it is *not* possible to inform the user about why a cast is
impossible.
* ``strided_loop(char **args, npy_intp *dimensions, npy_intp *strides,
...) -> int {0, -1}`` (signature will be fully defined in :ref:`NEP 43 <NEP43>`)
This is identical to the proposed API for ufuncs. The additional ``...``
part of the signature will include information such as the two ``dtype``\s.
More optimized loops are in use internally, and
will be made available to users in the future (see notes).
Although verbose, the API will mimic the one for creating a new DType:
.. code-block:: C
    typedef struct{
        int flags;                  /* e.g. whether the cast requires the API */
        int nin, nout;              /* Number of Input and outputs (always 1) */
        NPY_CASTING casting;        /* The "minimal casting level" */
        PyArray_DTypeMeta *dtypes;  /* input and output DType class */
        /* NULL terminated slots defining the methods */
        PyType_Slot *slots;
    } PyArrayMethod_Spec;
The focus differs between casting and general ufuncs. For example, for casts
``nin == nout == 1`` is always correct, while for ufuncs ``casting`` is
usually expected to be ``"no"``.
**Notes:** We may initially allow users to define only a single loop.
Internally NumPy optimizes far more, and this should be made public
incrementally in one of two ways:
* Allow multiple versions, such as:
* contiguous inner loop
* strided inner loop
* scalar inner loop
* Or, more likely, expose the ``get_loop`` function which is passed additional
information, such as the fixed strides (similar to our internal API).
* The casting level denotes the minimal guaranteed casting level and can be
``-1`` if the cast may be impossible. For most non-parametric casts, this
value will be the casting level. NumPy may skip the ``resolve_descriptors``
call for ``np.can_cast()`` when the result is ``True`` based on this level.
The example does not yet include setup and error handling. Since these are
similar to the UFunc machinery, they will be defined in :ref:`NEP 43 <NEP43>` and then
incorporated identically into casting.
The slots/methods used will be prefixed with ``NPY_meth_``.
**Alternatives:**
- Aside from name changes and signature tweaks, there seem to be few
alternatives to the above structure. The proposed API using ``*_FromSpec``
function is a good way to achieve a stable and extensible API. The slots
design is extensible and can be changed without breaking binary
compatibility. Convenience functions can still be provided to allow creation
with less code.
- One downside is that compilers cannot warn about function-pointer
incompatibilities.
******************************************************************************
Implementation
******************************************************************************
Steps for implementation are outlined in the Implementation section of
:ref:`NEP 41 <NEP41>`. In brief, we first will rewrite the internals of
casting and array coercion. After that, the new public API will be added
incrementally. We plan to expose it in a preliminary state initially to gain
experience. All functionality currently implemented on the dtypes will be
replaced systematically as new features are added.
******************************************************************************
Alternatives
******************************************************************************
The space of possible implementations is large, so there have been many
discussions, conceptions, and design documents. These are listed in
:ref:`NEP 40 <NEP40>`. Alternatives have also been discussed in the
relevant sections above.
******************************************************************************
References
******************************************************************************
.. [1] To be clear, the program is broken: It should not have stored a value
in the common DType that was below the lowest int16 or above the highest
uint16. It avoided overflow earlier by an accident of implementation.
Nonetheless, we insist that program behavior not be altered just by
importing a type.
.. [2] NumPy currently inspects the value to allow the operations::
np.array([1], dtype=np.uint8) + 1
np.array([1.2], dtype=np.float32) + 1.
to return a ``uint8`` or ``float32`` array respectively. This is
further described in the documentation for :func:`numpy.result_type`.
******************************************************************************
Copyright
******************************************************************************
This document has been placed in the public domain.