# Error Handling Rationale and Proposal
## Table of Contents
- [Fundamentals](#fundamentals)
- [Kinds of propagation](#kinds-of-propagation)
- [Kinds of error](#kinds-of-error)
- [Simple domain errors](#simple-domain-errors)
- [Recoverable errors](#recoverable-errors)
- [Universal errors](#universal-errors)
- [Logic failures](#logic-failures)
- [Analysis](#analysis)
- [Propagation methods](#propagation-methods)
- [Marked propagation](#marked-propagation)
- [Typed propagation](#typed-propagation)
- [Typed manual propagation](#typed-manual-propagation)
- [Typed automatic propagation](#typed-automatic-propagation)
- [The default typing rule](#the-default-typing-rule)
- [Enforcement](#enforcement)
- [Specificity](#specificity)
- [Tradeoffs of typed propagation](#tradeoffs-of-typed-propagation)
- [Error Types](#error-types)
- [Implementation design](#implementation-design)
- [Implicit manual propagation](#implicit-manual-propagation)
- [`setjmp` / `longjmp`](#setjmp--longjmp)
- [Table-based unwinding](#table-based-unwinding)
- [Clean-up actions](#clean-up-actions)
- [`finally`](#finally)
- [`defer`](#defer)
- [Destructors](#destructors)
- [Survey](#survey)
- [C](#c)
- [C++](#c-1)
- [Objective-C](#objective-c)
- [Java](#java)
- [C#](#c-2)
- [Haskell](#haskell)
- [Rust](#rust)
- [Go](#go)
- [Scripting languages](#scripting-languages)
- [Proposal](#proposal)
- [Automatic propagation](#automatic-propagation)
- [Typed propagation](#typed-propagation-1)
- [Higher-order polymorphism](#higher-order-polymorphism)
- [Generic polymorphism](#generic-polymorphism)
- [Error type](#error-type)
- [Marked propagation](#marked-propagation-1)
- [Asserting markers](#asserting-markers)
- [Other syntax](#other-syntax)
- [Clean-up actions](#clean-up-actions-1)
- [`using`](#using)
- [C and Objective-C Interoperation](#c-and-objective-c-interoperation)
- [Error types](#error-types-1)
- [Objective-C method error patterns](#objective-c-method-error-patterns)
- [Detecting an error](#detecting-an-error)
- [The error parameter](#the-error-parameter)
- [CoreFoundation functions](#corefoundation-functions)
- [Other C APIs](#other-c-apis)
- [Implementation design](#implementation-design-1)
This paper surveys the error-handling world, analyzes various ideas
which have been proposed or are in practice in other languages, and
ultimately proposes an error-handling scheme for Swift together with
import rules for our APIs.
## Fundamentals
I need to establish some terminology first.
### Kinds of propagation
I've heard people talk about **explicit vs. implicit propagation**. I'm
not going to use those terms, because they're not helpful: there are at
least three different things about error-handling that can be more or
less explicit, and some of the other dimensions are equally important.
The most important dimensions of variation are:
- Whether the language allows functions to be designated as producing
errors or not; such a language has **typed propagation**.
- Whether, in a language with typed propagation, the default rule is
that a function can produce an error or that it can't; this is the
language's **default propagation rule**.
- Whether, in a language with typed propagation, the language enforces
this statically, so that a function which cannot produce an error
cannot call a function which can without handling it; such a
language has **statically-enforced typed propagation**. (A language
could instead enforce this dynamically by automatically inserting
code to assert if an error propagates out. C++ does this.)
- Whether the language requires all potential error sites to be
identifiable as potential error sites; such a language has **marked
propagation**.
- Whether propagation is done explicitly with the normal data-flow and
control-flow tools of the language; such a language has **manual
propagation**. In contrast, a language where control implicitly
jumps from the original error site to the proper handler has
**automatic propagation**.
### Kinds of error
What is an error? There may be many different possible error conditions
in a program, but they can be categorized into several kinds based on
how programmers should be expected to react to them. Since the
programmer is expected to react differently, and since the language is
the tool of the programmer's reaction, it makes sense for each group to
be treated differently in the language.
To be clear, in many cases the kind of error reflects a conscious
decision by the author of the error-producing code, and different
choices might be useful in different contexts. The example I'm going to
use of a "simple domain error" could easily be instead treated as a
"recoverable error" (if the author expected automatic propagation to be
more useful than immediate recovery) or even a "logic failure" (if the
author wished to prevent speculative use, e.g. if checking the
precondition was very expensive).
In order of increasing severity and complexity:
#### Simple domain errors
A simple domain error is something like calling `String.toInt()` on a
string that isn't an integer. The operation has an obvious precondition
about its arguments, but it's useful to be able to pass other values to
test whether they're okay. The client will often handle the error
immediately.
Conditions like this are best modeled with an optional return value.
They don't benefit from a more complex error-handling model, and using
one would make common code unnecessarily awkward. For example,
speculatively trying to parse a `String` as an integer in Java requires
catching an exception, which is far more syntactically heavyweight (and
inefficient without optimization).
Because Swift already has good support for optionals, these conditions
do not need to be a focus of this proposal.
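
As a minimal sketch of this pattern, here is a speculative integer parse modeled with an optional return value. It is written in Rust only because that language's `Option` plays the same role as an optional here; the helper name `to_int` is hypothetical and simply wraps the standard `str::parse`.

```rust
/// Hypothetical helper: returns `None` when `text` is not a decimal integer.
fn to_int(text: &str) -> Option<i64> {
    // `parse` reports failure as an `Err`; `ok()` discards the error detail,
    // which is reasonable for a simple domain error.
    text.parse::<i64>().ok()
}

fn main() {
    // The caller tests the result immediately; no handler machinery is needed.
    for candidate in ["42", "forty-two"] {
        match to_int(candidate) {
            Some(n) => println!("{:?} is the integer {}", candidate, n),
            None => println!("{:?} is not an integer", candidate),
        }
    }
}
```

The caller can probe arbitrary input and react on the spot, which is exactly the use that a heavier error-handling model would make awkward.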
#### Recoverable errors
Recoverable errors include file-not-found, network timeouts, and similar
conditions. The operation has a variety of possible error conditions.
The client should be encouraged to recognize the possibility of the
error condition and consider the right way to handle it. Often, this
will be by aborting the current operation, cleaning up after itself if
needed, and propagating the error to a point where it can more sensibly
be handled, e.g. by reporting it to the user.
These are the error conditions that most of our APIs use `NSError` and
`CFError` for today. Most libraries have some similar notion. This is
the focus of this proposal.
#### Universal errors
The difference between universal errors and ordinary recoverable errors
is less the kind of error condition and more the potential sources of
the error in the language. An error is universal if it could arise from
such a wealth of different circumstances that it becomes nearly
impracticable for the programmer to directly deal with all the sources
of the error.
Some conditions, if they are to be brought into the scope of
error-handling, can only conceivably be dealt with as universal errors.
These include:
- Asynchronous conditions, like the process receiving a `SIGINT`, or
the current thread being cancelled by another thread. These
conditions could, in principle, be delivered at an arbitrary
instruction boundary, and handling them appropriately requires
extraordinary care from the programmer, the compiler, and the
runtime.
- Ubiquitous errors, like running out of memory or overflowing the
stack, that essentially any operation must be assumed capable of
producing.
But other kinds of error condition can essentially become universal
errors with the introduction of abstraction. Reading the size of a
collection, or reading a property of an object, is not an operation that
a programmer would normally expect to produce an error. However, if the
collection is actually backed by a database, the database query might
fail. If the user must write code as if any opaque abstraction might
produce an error, they are stuck in the world of universal errors.
Universal errors mandate certain language approaches. Typed propagation
of universal errors is impossible, except in special cases that can be
guaranteed not to produce errors. Marked propagation would provoke a user
revolt. Propagation must be automatic, and the implementation must be
"zero-cost", or as near to it as possible, because checking for an error
after every single operation would be prohibitive.
For these reasons, in our APIs, universal error conditions are usually
implemented using Objective-C exceptions, although not all uses of
Objective-C exceptions fall in this category.
This combination of requirements means that every operation must be
implicitly "unwindable" starting from almost any call site it makes. For
the stability of the system, this unwinding process must restore any
invariants that might have been temporarily violated; but the compiler
cannot assist the programmer in this. The programmer must consciously
recognize that an error is possible while an invariant is broken, and
they must do this proactively --- that, or track it down when they
inevitably forget. This requires thinking quite rigorously about one's
code, both to foresee all the error sites and to recognize that an
important invariant is in flux.
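
The hazard can be sketched concretely. The example below uses a Rust panic as a stand-in for a universal error (an assumption made purely for illustration), and the `Ledger` type and its `audit` callback are hypothetical. The invariant is that `total` equals the sum of `entries`; unwinding out of the opaque call in the middle of `record` leaves it broken.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Invariant: `total` always equals the sum of `entries`.
struct Ledger {
    entries: Vec<i64>,
    total: i64,
}

impl Ledger {
    /// Records an amount. `audit` stands in for any opaque call that might
    /// unwind while the invariant is temporarily broken.
    fn record(&mut self, amount: i64, audit: impl Fn(i64)) {
        self.entries.push(amount); // the invariant is in flux from here
        audit(amount);             // an unwind here skips the next line
        self.total += amount;      // the invariant is restored here
    }

    fn is_consistent(&self) -> bool {
        self.entries.iter().sum::<i64>() == self.total
    }
}

fn main() {
    let mut ledger = Ledger { entries: Vec::new(), total: 0 };
    ledger.record(10, |_| {});
    println!("consistent after a normal call: {}", ledger.is_consistent());

    // Simulate a universal error surfacing inside the opaque call.
    let outcome = catch_unwind(AssertUnwindSafe(|| {
        ledger.record(5, |_| {
            panic!("simulated universal error");
        });
    }));
    println!("unwound: {}", outcome.is_err());
    println!("consistent after unwinding: {}", ledger.is_consistent());
}
```

Nothing in the source of `record` signals that the middle line can leave early, which is why the programmer has to anticipate it unaided.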
How much of a problem this poses depends quite a lot on the code being
written. There are some styles of programming that make it pretty
innocuous. For example, a highly functional program which
conscientiously kept mutation and side-effects to its outermost loops
would naturally have very few points where any invariants were in flux;
propagating an error out of an arbitrary place within an operation would
simply abandon all the work done up to that point. However, this happy
state falls apart quite quickly the more that mutation and other
side-effects come into play. Complex mutations cannot be trivially
reversed. Packets cannot be unsent. And it would be quite amazing for us
to assert that code shouldn't be written that way, understanding nothing
else about it. As long as programmers do face these issues, the language
has some responsibility to help them.
Therefore, in my judgment, promoting the use of universal errors is
highly problematic. They undermine the easy comprehension of code, and
they undermine the language's ability to help the programmer reason
about errors. This design will instead focus on explicitly trackable
errors of the sort that `NSError` is used for today on Apple platforms.
However, there are some important reasons not to rule out universal
errors completely:
- They remain the only viable means of bringing certain error
conditions into the error-handling model, as discussed above. Of
these, most run into various objections; the most important
remaining use case is "escaping", where an unexpected implementation
of an API that was not designed to throw finds itself needing to.
- Objective-C and C++ exceptions are a legitimate interoperation
problem on any conceivable platform Swift targets. Swift must have
some sort of long-term answer for them.
These reasons don't override the problems with universal errors. It is
inherently dangerous to implicitly volunteer functions for unwinding
from an arbitrary point. We don't want to promote this model. However,
it is certainly possible to write code that handles universal errors
correctly; and pragmatically, unwinding through most code will generally
just work. Swift could support a secondary, untyped propagation
mechanism using "zero-cost" exceptions. Code can be written carefully to
minimize the extent of implicit unwinding, e.g. by catching universal
errors immediately after calling an "escaping" API and rethrowing them
with normal typed propagation.
However, this work is outside of the scope of Swift 2.0. We can
comfortably make this decision because doing so doesn't lock us out of
implementing it in the future:
- We do not currently support propagating exceptions through Swift
functions, so changing `catch` to catch them as well would not be a
major compatibility break.
- With some admitted awkwardness, external exceptions can be reflected
into an `Error`-like model automatically by the catch mechanism.
- In the meanwhile, developers who must handle an Objective-C
exception can always do so by writing a stub in Objective-C to
explicitly "bridge" the exception into an `NSError` out parameter.
This isn't ideal, but it's acceptable.
#### Logic failures
The final category is logic failures, including out-of-bounds array
accesses, forced unwrap of `nil` optionals, and other kinds of
assertions. The programmer has made a mistake, and the failure should be
handled by fixing the code, not by attempting to recover dynamically.
High-reliability systems may need some way to limp on even after an
assertion failure. Tearing down the process can be viewed as a vector
for a denial-of-service attack. However, an assertion failure might
indicate that the process has been corrupted and is under attack, and
limping on anyway may open the system up for other, more serious forms
of security breach.
The correct handling of these error conditions is an open question and
is not a focus of this proposal. Should we decide to make them
recoverable, they will likely follow the same implementation mechanism
as universal errors, if not necessarily the same language rules.
## Analysis
Let's take a deeper look into the different dimensions of error-handling
I laid out above.
### Propagation methods
At a language level, there are two basic ways an error can be propagated
from an error site to something handling it.
The first is that it can be done with the normal evaluation, data flow,
and control flow processes of the language; let's call this **manual
propagation**. Here's a good example of manual propagation using special
return values in an imperative language, C:
```c
struct object *read_object(void) {
  char buffer[1024];
  ssize_t numRead = read(0, buffer, sizeof(buffer));
  if (numRead < 0) return NULL;
  ...
}
```
Here's an example of manual propagation of an error value through
out-parameters in another imperative language, Objective-C:
```objc
- (BOOL) readKeys: (NSArray<NSString*>**) strings error: (NSError**) err {
  while (1) {
    NSString *key;
    // Propagate manually: on failure, return early with *err as set by the callee.
    if (![self readKey: &key error: err]) {
      return FALSE;
    }
    ...
  }
  ...
}
```
Here's an example of manual propagation using an ADT in an impure
functional language, SML; it's somewhat artificial because the SML
library actually uses exceptions for this:
```sml
fun read_next_cmd () =
  case readline(stdin) of
    NONE => NONE
  | SOME line => if ...
```
All of these excerpts explicitly test for errors using the language's
standard tools for data flow and then explicitly bypass the evaluation
of the remainder of the function using the language's standard tools for
control flow.
The other basic way to propagate errors is in some hidden, more
intrinsic way not directly reflected in the ordinary control flow rules;
let's call this **automatic propagation**. Here's a good example of
automatic propagation using exceptions in an imperative language, Java:
```java
String next = readline();
```
If `readline` encounters an error, it throws an exception; the language
then terminates scopes until it dynamically reaches a `try` statement
with a matching handler. Note the lack of any code at all implying that
this might be happening.
The chief disadvantages of manual propagation are that it's tedious to
write and requires a lot of repetitive boilerplate. This might sound
superficial, but these are serious concerns. Tedium distracts
programmers and makes them careless; careless error-handling code can be
worse than useless. Repetitive boilerplate makes code less readable,
hurting maintainability; it occupies the programmer's time, creating
opportunity costs; it discourages handling errors *well* by making it
burdensome to handle them *at all*; and it encourages shortcuts (such as
extensive macro use) which may undermine other advantages and goals.
The chief disadvantage of automatic propagation is that it obscures the
control flow of the code. I'll talk about this more in the next section.
Note that automatic propagation needn't be intrinsic in a language. The
propagation is automatic if it doesn't correspond to visible constructs
in the source. This effect can be duplicated as a library with any
language facility that allows restructuring of code (e.g. with macros or
other term-rewriting facilities) or overloading of basic syntax (e.g.
Haskell mapping its `do` notation onto monads).
Note also that multiple propagation strategies may be "in play" for any
particular program. For example, Java generally uses exceptions in its
standard libraries, but some specific APIs might opt to instead return
`null` on error for efficiency reasons. Objective-C provides a fairly
full-featured exceptions model, but the standard APIs (with a few
important exceptions) reserve them solely for unrecoverable errors,
preferring manual propagation with `NSError` out-parameters instead.
Haskell has a large number of core library functions which return
`Maybe` values to indicate success or error, but it also offers at least
two features resembling traditional, automatically-propagating
exceptions (the `ErrorT` monad transformer and exceptions in the `IO`
monad).
So, while I'm going to talk as if languages implement a single
propagation strategy, it should be understood that reality will always
be more complex. It is literally impossible to prevent programmers from
using manual propagation if they want to. Part of the proposal will
discuss using multiple strategies at once.
### Marked propagation
Closely related to the question of whether propagation is manual or
automatic is whether it is marked or unmarked. Let's say that a language
uses **marked propagation** if there is something *at the call site*
which indicates that propagation is possible from that point.
To a certain extent, every language using manual propagation uses marked
propagation, since the manual code to propagate the error approximately
marks the call which generated the error. However, it is possible for
the propagation logic to get separated from the call.
Marked propagation is at odds with one other major axis of language
design: a language can't solely use marked propagation if it ever
performs implicit operations that can produce errors. For example, a
language that wanted out-of-memory conditions to be recoverable errors
would have to consider everything that could allocate memory to be a source
of propagation; in a high-level language, that would include a large
number of implicit operations. Such a language could not claim to use
marked propagation.
This all matters because unmarked propagation is a pretty
nasty thing to end up with; it makes it impossible to directly see what
operations can produce errors, and therefore to directly understand the
control flow of a function. This leaves you with two options as a
programmer:
- You can carefully consider the actual dynamic behavior of every
function called by your function.
- You can carefully arrange your function so that there are no
critical sections where a universal error can leave things in an
unwanted state.
There are techniques for making the second more palatable. Chiefly, they
involve never writing code that relies on normal control flow to
maintain invariants and clean up after an operation; for example, always
using constructors and destructors in C++ to manage resources. This is
compulsory in C++ with exceptions enabled because of the possibility of
implicit code that can throw, but it could theoretically be used in
other languages. However, it still requires a fairly careful and
rigorous style of programming.
It is possible to imagine a marked form of automatic propagation, where
the propagation itself is implicit except that (local) origination
points have to be explicitly marked. This is part of our proposal, and
I'll discuss it below.
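
One existing approximation of this shape is Rust's `?` operator, sketched below. Propagation is implicit in that no branching code is written, yet every point where an error can leave the function carries a visible marker. The function name `add_parsed` is hypothetical; the parsing calls are the standard `str::parse`.

```rust
use std::num::ParseIntError;

/// Parses two integers and adds them. Each `?` marks a call that may
/// propagate an error out of this function; unmarked code cannot.
fn add_parsed(a: &str, b: &str) -> Result<i64, ParseIntError> {
    let x = a.parse::<i32>()?; // marked: may return early with the error
    let y = b.parse::<i32>()?; // marked: may return early with the error
    // Widening to i64 first means this addition cannot overflow.
    Ok(i64::from(x) + i64::from(y))
}

fn main() {
    println!("{:?}", add_parsed("2", "40"));
    println!("{:?}", add_parsed("2", "forty"));
}
```

A reader can find every exit path by scanning for the marker, without inspecting the functions being called.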
### Typed propagation
The next major question is whether error propagation is explicitly
tracked and limited by the language. That is, is there something
explicitly *in the declaration of a function* that tells the programmer
whether it can produce errors? Let's call this **typed propagation**.
#### Typed manual propagation
Whether propagation is typed is somewhat orthogonal to whether it's
manual or marked, but there are some common patterns. The most dominant
forms of manual propagation are all typed, since they pass the failure
out of the callee, either as a direct result or in an out-parameter.
Here's another example of an out-parameter:
```objc
- (instancetype)initWithContentsOfURL:(NSURL *)url encoding:(NSStringEncoding)enc error:(NSError **)error;
```
Out-parameters have some nice advantages. First, they're a reliable
source of marking; even if the actual propagation gets separated from
the call, you can always detect a call that can generate errors as long
as its out-parameter has a recognizable name. Second, some of the
boilerplate can be shared, because you can use the same variable as an
out-parameter multiple times; unfortunately, you can't use this to
"cheat" and only check for an error once unless you have some
conventional guarantee that later calls won't spuriously overwrite the
variable.
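
The "check only once" pitfall can be sketched as follows. This is an illustrative Rust rendering of an out-parameter, with a hypothetical `step` function; it deliberately clears the shared slot on success, which is the kind of spurious overwrite described above.

```rust
/// Hypothetical step that reports failure through an out-parameter.
/// On success it clears the slot, overwriting any earlier error.
fn step(input: i32, err: &mut Option<String>) -> i32 {
    if input < 0 {
        *err = Some(format!("negative input: {}", input));
        0
    } else {
        *err = None; // spurious overwrite of whatever was there before
        input * 2
    }
}

fn main() {
    let mut err = None;
    let a = step(-1, &mut err); // fails and sets `err`
    let b = step(3, &mut err);  // succeeds and clears `err`
    // Checking only once, at the end, misses the first failure.
    println!("a = {}, b = {}, err = {:?}", a, b, err);
}
```

Sharing the variable is safe only if every callee promises to leave it untouched on success, or if the caller checks after each call.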
A common alternative in functional languages is to return an `Either`
type:
```rust
trait Writer {
    fn write_line(&mut self, s: &str) -> Result<(), IoError>;
}
```
This forces the caller to deal with the error if they want to use the
result. This works well unless the call does not really have a
meaningful result (as `write_line` does not); then it depends on whether
the language makes it easy to accidentally ignore results. It also tends to
create a lot of awkward nesting:
```rust
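// `parse_int` and `ParseError` are assumed to be defined elsewhere.
fn parse_two_ints_and_add_them() -> Result<i32, ParseError> {
    match parse_int() {
        Err(e) => Err(e),
        Ok(x) => match parse_int() {
            Err(e) => Err(e),
            Ok(y) => Ok(x + y),
        },
    }
}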
```
Here, another level of nesting is required for every sequential
computation that can fail. Overloaded evaluation syntax like Haskell's
`do` notation would help with both of these problems, but only by
switching to a kind of automatic propagation.
Manual propagation can be untyped if it occurs through a side channel.
For example, consider an object which set a flag on itself when it
encountered an error instead of directly returning it; or consider a
variant of POSIX which expected you to separately check `errno` to see
if a particular system call failed.
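
A minimal sketch of the flag-on-the-object variant, using a hypothetical `DigitReader` type: the method signature says nothing about failure, so neither the compiler nor the reader of a call site has any reason to expect one.

```rust
/// A hypothetical reader that records failure in a flag instead of
/// returning it, so its method signatures say nothing about errors.
struct DigitReader {
    failed: bool,
}

impl DigitReader {
    /// Returns the digit's value, or 0 after setting `failed` on bad input.
    fn read_digit(&mut self, c: char) -> u32 {
        match c.to_digit(10) {
            Some(d) => d,
            None => {
                self.failed = true; // side channel: the caller must remember to check
                0
            }
        }
    }
}

fn main() {
    let mut reader = DigitReader { failed: false };
    let sum = reader.read_digit('7') + reader.read_digit('x');
    // Nothing forced a check after either call; the placeholder 0 flowed into `sum`.
    println!("sum = {}, failed = {}", sum, reader.failed);
}
```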
#### Typed automatic propagation
Languages with typed automatic propagation vary along several
dimensions.
##### The default typing rule
The most important question is whether you opt in to producing errors or
opt out of them. That is, is a function with no specific annotation able
to produce errors or not?
The normal resilience guideline is that you want the lazier option to
preserve more flexibility for the implementation. A function that can
produce errors is definitely more flexible, since it can do more things.
Contrariwise, changing a function that doesn't produce errors into a
function that does clearly changes its contract in ways that callers
need to respond to. Unfortunately, this has some unpleasant
consequences:
- Marked propagation would become very burdensome. Every call would
involve an annotation, either on the function (to say it cannot
generate errors) or on the call site (to mark propagation). Users
would likely rebel against this much bookkeeping.
- Most functions cannot generate recoverable errors in the way I've
defined that. That is, ignoring sources of universal errors, most
functions can be reasonably expected to not be able to produce
errors. But if that's not the default state, that means that most
functions would need annotations; again, that's a lot of tedious
bookkeeping. It's also a lot of clutter in the API.
- Suppose that you notice that a function incorrectly lacks an
annotation. You go to fix it, but you can't without annotating all
of the functions it calls, ad infinitum; like `const` correctness in
C++, the net effect is to punish conscientious users for trying to
improve their code.
- A model which pretends that every function is a source of errors is
likely to be overwhelming for humans. Programmers ought to think
rigorously about their code, but expecting them to also make
rigorous decisions about all the code their code touches is probably
too much. Worse, without marked propagation, the compiler can't
really help the programmer concentrate on the known-possible sources
of error.
- The compiler's analysis for code generation has to assume that all
sorts of things can produce errors when they really can't. This
creates a lot of implicit propagation paths that are actually 100%
dead, which imposes a serious code-size penalty.
The alternative is to say that, by default, functions are not able
to generate errors. This agrees with what I'm assuming is the most
common case. In terms of resilience, it means expecting users to think
more carefully about which functions can generate errors before
publishing an API; but this is similar to how Swift already asks them to
think carefully about types. Also, they'll have at least added the right
set of annotations for their initial implementation. So I believe this
is a reasonable alternative.
##### Enforcement
The next question is how to enforce the typing rules that prohibit
automatic propagation. Should it be done statically or dynamically? That
is, if a function claims to not generate errors, and it calls a function
that generates errors without handling the error, should that be a
compiler error or a runtime assertion?
The only real benefit of dynamic enforcement is that it makes it easier
to use a function that's incorrectly marked as being able to produce
errors. That's a real problem if all functions are assumed to produce
errors by default, because the mistake could just be an error of
omission. If, however, functions are assumed to not produce errors, then
someone must have taken deliberate action that introduced the mistake. I
feel like the vastly improved static type-checking is worth some
annoyance in this case.
Meanwhile, dynamic enforcement undermines most of the benefits of typed
propagation so completely that it's hardly worth considering. The only
benefit that really remains is that the annotation serves as meaningful
documentation. So for the rest of this paper, assume that typed
propagation is statically enforced unless otherwise indicated.
##### Specificity
The last question is how specific the typing should be: should a
function be able to state the specific classes of errors it produces, or
should the annotation be merely boolean?
Experience with Java suggests that getting over-specific with exception
types doesn't really work out for the best. It's useful to be able to
recognize specific classes of error, but libraries generally want to
reserve flexibility about the exact kind of error they produce, and so
many errors just end up falling into broad buckets. Different libraries
end up with their own library-specific general error classes, and
exception lists end up just restating the library's own dependencies or
wrapping the underlying errors in ways that lose critical information.
#### Tradeoffs of typed propagation
Typed propagation has a number of advantages and disadvantages, mostly
independent of whether the propagation is automatic.
The chief advantage is that it is safer. It forces programmers to do
*something* to handle or propagate errors. That comes with some
downsides, which I'll talk about, but I see this as a fairly core static
safety guarantee. This is especially important in an environment where
shuttling operations between threads is common, since it calls out the
common situation where an error needs to propagate back to the
originating thread somehow.
Even if we're settled on using typed propagation, we should be aware of
the disadvantages and investigate ways to ameliorate them:
- Any sort of polymorphism gets more complicated, especially
  higher-order functions. Functions which cannot generate errors are
  in principle subtypes of functions which can. But:
    - Composability suffers. A higher-order function must decide
      whether its function argument is allowed to generate errors. If
      not, the function may be significantly limiting its usability,
      or at least making itself much more difficult to use with
      error-generating functions. If so, passing a function that does
      not may require a conversion (an awkward explicit one if using
      manual propagation), and the result of the call will likely
      claim to be able to generate errors when, in fact, it cannot.
      This can be solved with overloads, but that's a lot of
      boilerplate and redundancy, especially for calls that take
      multiple functions (like the function composition operator).
    - If an implicit conversion is allowed, it may need to introduce
      thunks. In some cases, these thunks would be inlineable, except
      that, actually, it is quite useful for code to be able
      to reverse this conversion and dynamically detect functions that
      cannot actually generate errors. For example, an algorithm might
      be able to avoid some unnecessary bookkeeping if it knows that
      its function argument never fails. This poses some
      representation challenges.
- It tends to promote decentralized error handling instead of letting
  errors propagate to a level that actually knows how to handle them.
    - Some programmers will always be tempted to incorrectly pepper
      their code with handlers that just swallow errors instead of
      correctly propagating them to the right place. This is often
      worse than useless; it would often be better if the error just
      propagated silently, because the result can be a system in an
      inconsistent state with no record of why. Good language and
      library facilities for propagating errors can help avoid this,
      especially when moving actions between threads.
    - There are many situations where errors are not actually possible
      because the programmer has carefully restricted the input. For
      example, matching `/[0-9]{4}/` and then parsing the result as an
      integer. It needs to be convenient to do this in a context that
      cannot actually propagate errors, but the facility to do this
      needs to be carefully designed to discourage use for swallowing
      real errors. It might be sufficient if the facility does not
      actually swallow the error, but instead causes a real failure.
    - It is possible that the ease of higher-order programming in
      Swift might ameliorate many of these problems by letting users
      write error-handling combinators. That is, in situations where
      a lazy Java programmer would find themselves writing a
      `try/catch` to swallow an exception, Swift would allow them to
      do something more correct with equal convenience.
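The restricted-input case can be sketched in Rust, where `expect` is a facility of roughly the kind described: it does not swallow the error but turns it into a real failure (a panic). The `parse_year` function is hypothetical, and the hand-written digit check is an assumed, anchored stand-in for the `/[0-9]{4}/` match.

```rust
// Hypothetical example: the signature cannot propagate a parse error, so
// the "impossible" error is asserted away rather than swallowed.
fn parse_year(s: &str) -> Option<u32> {
    // Stand-in for matching /[0-9]{4}/ against the whole string.
    let is_four_digits = s.len() == 4 && s.bytes().all(|b| b.is_ascii_digit());
    if !is_four_digits {
        return None; // ordinary domain error: the input is not a year
    }
    // Four ASCII digits always fit in a u32. If this ever fails, it is a
    // logic failure, and `expect` crashes loudly instead of hiding it.
    Some(s.parse::<u32>().expect("four ASCII digits always parse as u32"))
}
```

The point of the design is that the same facility is unattractive for swallowing real errors: using it on input that has not been validated produces a crash, not a silently ignored failure.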
One other minor advantage of marked, statically-enforced typed
propagation: it's a boon for certain kinds of refactoring. Specifically,
when a refactor makes an operation error-producing when it wasn't
before, the absence of any of those properties makes the refactor more
treacherous and increases the odds of accidentally introducing a bug. If
propagation is untyped, or the typing isn't statically enforced, the
compiler isn't going to help you at all to find call sites which need to
have error-checking code. Even with static typed propagation, if the
propagation isn't marked specifically on the call site, the compiler
won't warn you about calls made from contexts that can handle or
implicitly propagate the error. But if all these things are true, the
compiler will force you to look at all the existing call sites
individually.
### Error Types
There are many kinds of error. It's important to be able to recognize
and respond to specific error causes programmatically. Swift should
support easy pattern-matching for this.
But I've never really seen a point to coarser-grained categorization
than that; for example, I'm not sure how you're supposed to react to an
arbitrary, unknown IO error. And if there are useful error categories,
they can probably be expressed with predicates instead of public
subclasses. I think we start with a uni-type here and then challenge
people to come up with reasons why they need anything more.
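As a rough sketch of what "predicates instead of public subclasses" could look like, here is a single error type with one category predicate. The `FileError` cases and the `is_transient` category are invented for illustration only.

```rust
// One flat error type; the specific cases are hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum FileError {
    NotFound(String),
    PermissionDenied(String),
    Interrupted,
    Other(String),
}

impl FileError {
    // A category expressed as a predicate rather than as a superclass.
    fn is_transient(&self) -> bool {
        matches!(self, FileError::Interrupted)
    }
}

// Callers pattern-match on the specific causes they understand and fall
// back on predicates for everything else.
fn describe(e: &FileError) -> String {
    match e {
        FileError::NotFound(path) => format!("no such file: {}", path),
        _ if e.is_transient() => "transient failure, retry".to_string(),
        other => format!("unhandled: {:?}", other),
    }
}
```

Adding a new category later means adding a method, not restructuring a public class hierarchy.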
### Implementation design
There are several different common strategies for implementing automatic
error propagation. (Manual propagation doesn't need special attention in
the implementation design.)
The implementation has two basic tasks common to most languages:
- Transferring control through scopes and functions to the appropriate
handler for the error.
- Performing various semantic "clean up" tasks for the scopes that
  were abruptly terminated:
    - tearing down local variables, like C++ variables with
      destructors or strong/weak references in ARC-like languages;
    - releasing heap-allocated local variables, like captured
      variables in Swift or `__block` variables in ObjC;
    - executing scope-specific termination code, like C#'s `using` or
      Java/ObjC's `synchronized` statements; and
    - executing ad hoc cleanup blocks, like `finally` blocks in Java
      or `defer` actions in Swift.
Any particular call frame on the stack may have clean-ups or potential
handlers or both; call these **interesting frames**.
#### Implicit manual propagation
One strategy is to implicitly produce code to check for errors and
propagate them up the stack, imitating the code that the programmer
would have written under manual propagation. For example, a function
call could return an optional error in a special result register; the
caller would check this register and, if appropriate, unwind the stack
and return the same value.
Since propagation and unwinding are explicit in the generated code, this
strategy hurts runtime performance along the non-error path more than
the alternatives, and more code is required to do the explicit
unwinding. Branches involved in testing for errors are usually very easy
to predict, so in hot code the direct performance impact is quite small,
and the total impact is dominated by decreased code locality. Code can't
always be hot, however.
These penalties are suffered even by uninteresting frames unless they
appear in tail position. (An actual tail call isn't necessary; there
just can't be anything that error propagation would skip.) And functions
must do some added setup work before returning.
The upside is that the error path suffers no significant penalties
beyond the code-size impact. The code-size impact can be significant,
however: there is sometimes quite a lot of duplicate code needed for
propagation along the error path.
This approach is therefore relatively even-handed about the error vs.
the non-error path, although it requires some care in order to minimize
code-size penalties for parallel error paths.
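The following sketch models, in ordinary Rust, roughly what the generated code would look like under this strategy. The pair return value stands in for the normal result register plus the special error register; `callee`, `caller`, and the clean-up counter are all hypothetical.

```rust
#[derive(Debug, PartialEq)]
struct Error(&'static str);

// Models a callee under this convention: a normal result plus an
// optional error in a separate "register".
fn callee(fail: bool) -> (i32, Option<Error>) {
    if fail {
        (0, Some(Error("callee failed")))
    } else {
        (42, None)
    }
}

// Models the code a compiler would emit for a caller with one clean-up.
fn caller(fail: bool, cleanups_run: &mut u32) -> (i32, Option<Error>) {
    let (value, err) = callee(fail);
    // Implicit check inserted after the call; it costs a branch even when
    // no error occurs.
    if let Some(e) = err {
        *cleanups_run += 1; // clean-up duplicated on the error path
        return (0, Some(e)); // propagate the same error value
    }
    *cleanups_run += 1; // the same clean-up on the normal path
    (value + 1, None)
}
```

Note that the clean-up appears twice in `caller`; this is the duplicated error-path code that drives the code-size impact.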
#### `setjmp` / `longjmp`
Another strategy is to dynamically maintain a thread-local stack of
interesting frames. A function with an interesting frame must save
information about its context in a buffer, like `setjmp` would, and then
register that buffer with the runtime. If the scope returns normally,
the buffer is accordingly unregistered. Starting propagation involves
restoring the context for the top of the interesting-frames stack; the
place where execution returns is called the "landing pad".
The advantage of this is that uninteresting frames don't need to do any
work; context restoration just skips over them implicitly. This is
faster both for the error and non-error paths. It is also possible to
optimize this strategy so that (unlike `setjmp`) the test for an error
is implicitly elided: use a slightly different address for the landing
pad, so that propagating errors directly restore to that location.
The downside is that saving the context and registering the frame are
not free:
- Registering the frame requires an access to thread-local state,
which on our platforms means a function call because we're not
willing to commit to anything more specific in the ABI.
- Jumping across arbitrary frames invalidates the callee-save
registers, so the registering frame must save them all eagerly. In
calling conventions with many callee-save registers, this can be
very expensive. However, this is only necessary when it's possible
to resume normal execution from the landing pad: if the landing pad
only has clean-ups and therefore always restarts propagation, those
registers will have been saved and restored further out.
- Languages like C++, ObjC ARC, and Swift that have non-trivial
clean-ups for many local variables tend to have many functions with
interesting frames. This means both that the context-saving
penalties are higher and that skipping uninteresting frames is a
less valuable optimization.
- By the same token, functions in those languages often have many
different clean-ups and/or handlers. For example, every new
non-trivial variable might introduce a new clean-up. The function
must either register a new landing pad for each clean-up (very
expensive!) or track its current position in a way that a
function-wide landing pad can figure out what scope it was in.
This approach can be hybridized with the unwinding approach below so
that the interesting-frames stack abstractly describes the clean-ups in
the frame instead of just restoring control somewhere and expecting the
frame to figure it out. This can decrease the code size impact
significantly for the common case of frames that just need to run some
clean-ups before propagating the error further. It may even completely
eliminate the need for a landing pad.
The ObjC/C++ exceptions system on iOS/ARM32 is kind of like that hybrid.
Propagation and clean-up code is explicit in the function, but the
registered context includes the "personality" information from the
unwinding tables, which makes the decision whether to land at the
landing pad at all. It also uses an optimized `setjmp` implementation
that both avoids some context-saving and threads the branch as described
above.
The ObjC exceptions system on pre-modern runtimes (e.g. on PPC and i386)
uses the standard `setjmp` / `longjmp` functions. Every protected scope
saves the context separately. This is all implemented in a very unsafe
way that does not behave well in the presence of inlining.
Overall, this approach requires a lot of work in the non-error path of
functions with interesting frames. Given that we expect functions with
interesting frames to be very common in Swift, this is not an
implementation approach we would consider in the abstract. However, it
is the implementation approach for C++/ObjC exceptions on iOS/ARM32, so
we need to at least interoperate with that.
#### Table-based unwinding
The final approach is side-table stack unwinding. This relies on being
able to accurately figure out how to unwind through an arbitrary
function on the system, given only the return address of a call it made
and the stack pointer at that point.
On our system, this proceeds as follows. From an instruction pointer,
the system unwinder looks up what linked image (executable or dylib)
that function was loaded from. The linked image contains a special
section, a table of unwind tables indexed by their offset within the
linked image. Every non-leaf function should have an entry within this
table, which provides sufficient information to unwind the function from
an arbitrary call site.
This lookup process is quite expensive, especially since it has to
repeat all the way up the stack until something actually handles the
error. This makes the error path extremely slow. However, no explicit
setup code is required along the non-error path, and so this approach is
sometimes known as "zero-cost". That's something of a misnomer, because
it does have several costs that can affect non-error performance. First,
there's a small amount of load-time work required in order to resolve
relocations to symbols used by the unwind tables. Second, the error path
often requires code in the function, which can decrease code locality
even if never executed. Third, the error path may use information that
the non-error path would otherwise discard. And finally, the unwind
tables themselves can be fairly large, although this is generally only a
binary-size concern because they are carefully arranged to not need to
be loaded off of disk unless an exception is thrown. But overall,
"zero-cost" is close enough to correct.
To unwind a frame in this sense specifically means:
- Deciding whether the function handles the error.
- Cleaning up any interesting scopes that need to be broken down
(either to get to the handler or to leave the function).
- If the function is being fully unwound, restoring any callee-save
registers which the function might have changed.
This is language-specific, and so the table contains language-specific
"personality" information, including a reference to a function to
interpret it. This mechanism means that the unwinder is extremely
flexible; not only can it support arbitrary languages, but it can
support different language-specific unwinding table layouts for the same
language.
Our current personality records for C++ and Objective-C contain just
enough information to decide (1) whether an exception is handled by the
frame and (2) if not, whether a clean-up is currently active. If either
is true, it restores the context of a landing pad, which manually
executes the clean-ups and enters the handler. This approach generally
needs as much code in the function as implicit manual propagation would.
However, we could optimize this for many common cases by causing
clean-ups to be called automatically by the interpretation function.
That is, instead of a landing pad that looks notionally like this:
```c++
void *exception = /*...*/;
SomeCXXType::~SomeCXXType(&foo);
objc_release(bar);
objc_release(baz);
_Unwind_Resume(exception);
```
The unwind table would have a record that looks notionally like this:
```c++
CALL_WITH_FRAME_ADDRESS(&SomeCXXType::~SomeCXXType, FRAME_OFFSET_OF(foo))
CALL_WITH_FRAME_VALUE(&objc_release, FRAME_OFFSET_OF(bar))
CALL_WITH_FRAME_VALUE(&objc_release, FRAME_OFFSET_OF(baz))
RESUME
```
And no code would actually be needed in the function. This would
generally slow the error path down, because the interpretation function
would have to interpret this mini-language, but it would move all the
overhead out of the function and into the error table, where it would be
more compact.
This is something that would also benefit C++ code.
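To give a feel for the interpreted clean-up records, here is a toy model in Rust. Everything in it is an assumption made for illustration: the frame is just a slice of word-sized slots, `release` stands in for something like `objc_release`, and a log of strings replaces real side effects. It is not how any existing unwinder represents its tables.

```rust
// A frame is modeled as a slice of word-sized slots.
type Frame = [u64];

// Signature of a clean-up routine; the log stands in for real side effects.
type CleanupFn = fn(u64, &mut Vec<String>);

// One record in a hypothetical per-call-site clean-up table.
enum CleanupOp {
    // Call the routine with the value stored in the given frame slot.
    CallWithFrameValue(CleanupFn, usize),
    // Stop interpreting and continue unwinding into the caller.
    Resume,
}

// Stand-in for a routine like `objc_release`.
fn release(value: u64, log: &mut Vec<String>) {
    log.push(format!("release({})", value));
}

// The "interpretation function": runs the records in order and reports
// whether unwinding should resume in the caller.
fn run_cleanups(table: &[CleanupOp], frame: &Frame, log: &mut Vec<String>) -> bool {
    for op in table {
        match op {
            CleanupOp::CallWithFrameValue(routine, slot) => routine(frame[*slot], &mut *log),
            CleanupOp::Resume => return true,
        }
    }
    false
}
```

The function being unwound contributes only data (the table); all of the control flow lives in the shared interpreter, which is where the code-size saving would come from.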
### Clean-up actions
Many languages have a built-in language tool for performing arbitrary
clean-up when exiting a scope. This has two benefits. The first is that,
even ignoring error propagation, it acts as a "scope guard" which
ensures that the clean-up is done if the scope is exited early due to a
`return`, `break`, or `continue` statement; otherwise, the programmer
must carefully duplicate the clean-up in all such places. The second
benefit is that it makes clean-up tractable in the face of automatic
propagation, which creates so many implicit paths of control flow out of
the scope that expecting the programmer to cover them all with explicit
catch-and-rethrow blocks would be ridiculous.
There's an inherent tension in these language features between putting
explicit clean-up code in the order it will be executed and putting it
near the code it's cleaning up after. The former means that a
top-to-bottom read of the code tells you what actions are being
performed when; you don't have to worry about code implicitly
intervening at the end of a scope. The latter makes it easy to verify at
the point that a clean-up is needed that it will eventually happen; you
don't need to scan down to the finally block and analyze what happens
there.
#### `finally`
Java, Objective-C, and many other languages allow `try` statements to
take a `finally` clause. The clause is an ordinary scope and may take
arbitrary actions. The `finally` clause is performed when the preceding
controlled scopes (including any `catch` clauses) are exited in any way:
whether by falling off the end, directly branching or returning out, or
throwing an exception.
`finally` is a rather awkward and verbose language feature. It separates
the clean-up code from the operation that required it (although this has
benefits, as discussed above). It adds a lot of braces and indentation,
so edits that add new clean-ups can require a lot of code to be
reformatted. When the same scope needs multiple clean-ups, the
programmer must either put them in the same `finally` block (and thus
create problems with clean-ups that might terminate the block early) or
stack them up in separate blocks (which can really obscure the otherwise
simple flow of code).
#### `defer`
Go provides a `defer` statement that just enqueues arbitrary code to be
executed when the function exits. (More details of this appear in the
survey of Go.)
This allows the defer action to be written near the code it "balances",
allowing the reader to immediately see that the required clean-up will
be done (but this has drawbacks, as discussed above). It's very compact,
which is nice as most defer actions are short. It also allows multiple
actions to pile up without adding awkward nesting. However, the
function-exit semantics exacerbate the problem of searching for
intervening clean-up actions, and they introduce semantic and
performance problems with capturing the values of local variables.
#### Destructors
C++ allows types to define destructor functions, which are called when a
value of that type goes out of scope.
These are often used directly to clean up the ownership or other
invariants on the type's value. For example, an owning-pointer type
would free its value in its destructor, whereas a hash-table type would
destroy its entries and free its buffer.
But they are also often used idiomatically just for the implicit
destructor call, as a "scope guard" to ensure that something is done
before the current operation completes. For an example close to my own
heart, a compiler might use such a guard when parsing a local scope to
ensure that new declarations are removed from the scope chains even if
the function exits early due to a parse error. Unfortunately, since type
destructors are C++'s only tool for this kind of clean-up, introducing
ad-hoc clean-up code requires defining a new type every time.
The unique advantage of destructors compared to the options above is
that destructors can be tied to temporary values created during the
evaluation of an expression.
Generally, a clean-up action becomes necessary as the result of some
"acquire" operation that occurs during an expression. `defer` and
`finally` do not take effect until the next statement is reached, which
creates an atomicity problem if code can be injected after the acquire.
(For `finally`, this assumes that the acquire appears *before* the
`try`. If instead the acquire appears *within* the `try`, there must be
something which activates the clean-up, and that has the same atomicity
problem.)
In contrast, if the acquire operation always creates a temporary with a
destructor that does the clean-up, the language automatically guarantees
this atomicity. This pattern is called "resource acquisition is
initialization", or "RAII". Under RAII, all resources that require
clean-up are carefully encapsulated within types with user-defined
destructors, and the act of constructing an object of that type is
exactly the act of acquiring the underlying resource.
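Rust's `Drop` trait plays the same role as a C++ destructor, so the scope-guard idiom can be sketched there. The `ScopeGuard` type and `parse_scope` function are hypothetical, loosely modeled on the compiler example above; the log exists only to make the ordering observable.

```rust
use std::cell::RefCell;

// A guard whose only job is to run a closure when it goes out of scope.
struct ScopeGuard<F: FnMut()> {
    cleanup: F,
}

impl<F: FnMut()> Drop for ScopeGuard<F> {
    fn drop(&mut self) {
        (self.cleanup)();
    }
}

// Hypothetical parser step: the scope is always left, even when the
// function returns early with an error.
fn parse_scope(log: &RefCell<Vec<&'static str>>, fail_early: bool) -> Result<(), &'static str> {
    log.borrow_mut().push("enter scope");
    let _guard = ScopeGuard {
        cleanup: || {
            log.borrow_mut().push("leave scope");
        },
    };
    if fail_early {
        return Err("parse error"); // `_guard` is dropped here
    }
    log.borrow_mut().push("parsed body");
    Ok(()) // and here
}
```

Unlike C++, a generic guard over a closure avoids defining a new type for every ad hoc clean-up, but the clean-up still runs implicitly at the end of the scope rather than where it is written.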
Swift does not support user-defined destructors on value types, but it
does support general RAII-like programming with class types and `deinit`
methods, although (at the moment) the user must take special care to
keep the object alive, as Swift does not normally guarantee the
destruction order of objects.
RAII is very convenient when there's a definable "resource" and
somebody's already wrapped its acquisition APIs to return
appropriately-destructed objects. For other tasks, where a reasonable
programmer might balk at defining a new type and possibly wrapping an
API for a single purpose, a more *ad hoc* approach may be warranted.
## Survey
### C
C doesn't really have a consensus error-handling scheme. There's a
built-in unwinding mechanism in `setjmp` and `longjmp`, but it's
disliked for a host of good reasons. The dominant idiom in practice is
for a function to encode failure using some unreasonable value for its
result, like a null pointer or a negative count. The bad value(s) are
often function-specific, and sometimes even argument- or state-specific.
On the caller side, it is unfortunately idiomatic (in some codebases) to
have a common label for propagating failure at the end of a function
(hence `goto fail`); this is because there's no inherent language
support for ensuring that necessary cleanup is done before propagating
out of a scope.
### C++
C++ has exceptions. Exceptions can have almost any type in the language.
Propagation typing is tied only to declarations; an indirect function
pointer is generally assumed to be able to throw. Propagation typing
used to allow functions to be specific about the kinds of exceptions
they could throw (`throw(std::exception)`), but this is deprecated in
favor of just indicating whether a function can throw
(`noexcept(false)`).
C++ aspires to making out-of-memory a recoverable condition, and so
allocation can throw. Therefore, it is essentially compulsory for the
language to assume that constructors might throw. Since constructors are
called pervasively and implicitly, it makes sense for the default rule
to be that all functions can throw. Since many error sites are implicit,
there is little choice but to use automatic unmarked propagation. The
only reasonable way to clean up after a scope in such a world is to
allow the compiler to do it automatically. C++ programmers therefore
rely idiomatically on a pattern of shifting all scope cleanup into the
destructors of local variables; sometimes such local values are created
solely to set up a cleanup action in this way.
Different error sites occur with a different set of cleanups active, and
there are a large number of such sites. In fact, prior to C++11,
compilers were forced to assume by default that destructor calls could
throw, so cleanups actually created more error sites. This all adds up
to a significant code-size penalty for exceptions, even in projects
which don't directly use them and which have no interest in recovering
from out-of-memory conditions. For this reason, many C++ projects
explicitly disable exceptions and rely on other error propagation
mechanisms, on which there is no widespread consensus.
### Objective-C
Objective-C has a first-class exceptions mechanism which is similar in
feature set to Java's: `@throw` / `@try` / `@catch` / `@finally`.
Exception values must be instances of an Objective-C class. The language
does a small amount of implicit frame cleanup during exception
propagation: locks held by `@synchronized` are released, stack copies of
`__block` variables are torn down, and ARC `__weak` variables are
destroyed. However, the language does not release object pointers held
in local variables, even (by default) under ARC.
Objective-C exceptions used to be implemented with `setjmp`, `longjmp`,
and thread-local state managed by a runtime, but the only surviving
platform we support which does that is i386, and all others now use a
"zero-cost" implementation that interoperates with C++ exceptions.
Objective-C exceptions are *mostly* only used for unrecoverable
conditions, akin to what I called "failures" above. There are a few
major exceptions to this rule, where APIs do use exceptions to report
errors.
Instead, Objective-C mostly relies on manual propagation, predominantly
using out-parameters of type `NSError**`. Whether the call failed is
usually *not* indicated by whether a non-`nil` error was written into
this parameter; calls are permitted both to succeed and write an error
object into the parameter (which should be ignored) and to report an
error without creating an actual error object. Instead, whether the call
failed is reported in the formal return value. The most common
convention is for a false `BOOL` result or null object result to mean an
error, but ingenious programmers have come up with many other
conventions, and there do exist APIs where a null object result is
valid.
CF APIs, meanwhile, have their own magnificent set of somewhat
inconsistent conventions.
Therefore, we can expect that incrementally improving CF / Objective-C
interoperation is going to be a long and remarkably painful process.
### Java
Java has a first-class exceptions mechanism with unmarked automatic
propagation: `throw` / `try` / `catch` / `finally`. Exception values
must be instances of something inheriting from `Throwable`. Propagation
is generally typed with static enforcement, with the default being that
a call cannot throw exceptions *except* for subclasses of `Error` and
`RuntimeException`. The original intent was that these classes would be
used for catastrophic runtime errors (`Error`) and programming mistakes
caught by the runtime (`RuntimeException`), both of which we would
classify as unrecoverable failures in our scheme; essentially, Java
attempts to promote a fully statically-enforced model where truly
catastrophic problems can still be handled when necessary.
Unfortunately, these motivations don't seem to have been communicated
very well to developers, and the result is kind of a mess.
Java allows methods to be very specific about the kinds of exception
they throw. In my experience, exceptions tend to fall into two
categories:
- There are some very specific exception kinds that callers know to
look for and handle on specific operations. Generally these are
obvious, predictable error conditions, like a host name not
resolving, or like a string not being formatted correctly.
- There are also a lot of very vague, black-box exception kinds that
can't really be usefully responded to. For example, if a method
throws `IOException`, there's really nothing a caller can do except
propagate it and abort the current operation.
So specific typing is useful if you can exhaustively handle a small
number of specific failures. As soon as the exception list includes any
kind of black box type, it might as well be a completely open set.
### C#
C#'s model is almost exactly like Java's except that it is untyped: all
methods are assumed to be able to throw. For this reason, it also has a
simpler type hierarchy, where all exceptions just inherit from
`Exception`.
The rest of the hierarchy doesn't really make any sense to me. Many
things inherit directly from `Exception`, but many other things inherit
from a subclass called `SystemException`. `SystemException` doesn't seem
to be any sort of logical grouping: it includes all the
runtime-assertion exceptions, but it also includes every exception
that's thrown anywhere in the core library, including XML and IO
exceptions.
C# also has a `using` statement, which is useful for binding something
over a precise scope and then automatically disposing it on all paths.
It's just built on top of `try` / `finally`.
### Haskell
Haskell provides three different common error-propagation mechanisms.
The first is that, like many other functional languages, it supports
manual propagation with a `Maybe` type. A function can return `Nothing`
to indicate that it couldn't produce a more useful result. This is the
most common failure method for functions in the functional subset of the
library.
The `IO` monad also provides true exceptions with unmarked automatic
propagation. These exceptions can only be handled as an `IO` action, but
are otherwise untyped: there is no way to indicate whether an `IO`
action can or cannot throw. Exceptions can be thrown either as an `IO`
action or as an ordinary lazy functional computation; in the latter
case, the exception is only thrown if the computation is evaluated for
some reason.
The `ErrorT` monad transformer provides typed automatic propagation. In an
amusing twist, since the only native computation of `ErrorT` is
`throwError`, and the reason to write a computation specifically in
`ErrorT` is if it's throwing, and every other computation must be
explicitly lifted into the monad, `ErrorT` effectively uses marked
propagation by omission, since everything that *can't* throw is
explicitly marked with a `lift`:
```haskell
prettyPrintShiftJIS :: ShiftJISString -> ErrorT TranscodeError IO ()
prettyPrintShiftJIS str = do
  lift $ putChar '"'     -- lift turns an IO computation into an ErrorT computation
  case transcodeShiftJISToUTF8 str of
    Left error -> throwError error
    Right value -> lift $ putEscapedString value
  lift $ putChar '"'
```
### Rust
Rust distinguishes between *failures* and *panics*.
A panic is an assertion, designed for what I called logic failures;
there's no way to recover from one, it just immediately crashes.
A failure is just when a function doesn't produce the value you might
expect, which Rust encourages you to express with either `Option<T>`
(for simple cases, like what I described as simple domain errors) or
`Result<T, E>` (which is effectively the same, except carrying an error).
In either case, it's typed manual propagation, although Rust does at
least offer a standard macro which wraps the common
pattern-match-and-return pattern for `Result<T, E>`.
The error type in Rust is a very simple protocol, much like this
proposal suggests.
### Go
Go uses an error result, conventionally returned as the final result of
functions that can fail. The caller is expected to manually check
whether this is nil; thus, Go uses typed manual propagation.
The error type in Go is an interface named `error`, with one method that
returns a string description of the error.
Go has a `defer` statement:
```go
defer foo(x, y)
```
The argument has to be a call (possibly a method call, possibly a call
to a closure that you made specifically to immediately call). All the
operands are evaluated immediately and captured in a deferred action.
Immediately after the function exits (through whatever means), all the
deferred actions are executed in LIFO order. Yes, this is tied to
function exit, not scope exit, so you can have a dynamic number of
deferred actions as a sort of implicit undo stack. Overall, it's a nice
if somewhat quirky way to do ad-hoc cleanup actions.
It is also a key part of a second, funky kind of error propagation,
which is essentially untyped automatic propagation. If you call `panic`
--- and certain builtin operations like array accesses behave as if they
call it --- it immediately unwinds the stack, running deferred actions as it
goes. If a function's deferred action calls `recover`, the panic stops,
the rest of the deferred actions for the function are called, and the
function returns. A deferred action can write to the named results,
allowing a function to turn a panic error into a normal, final-result
error. It's conventional to not panic over API boundaries unless you
really mean it; recoverable errors are supposed to be done with
out-results.
### Scripting languages
Scripting languages generally all use (untyped, obviously) automatic
exception propagation, probably because it would be quite error-prone to
do manual propagation in an untyped language. They pretty much all fit
into the standard C++/Java/C# style of `throw` / `try` / `catch`. Ruby
uses different keywords for it, though.
I feel like Python uses exceptions a lot more than most other scripting
languages do, though.
## Proposal
### Automatic propagation
Swift should use automatic propagation of errors, rather than relying on
the programmer to manually check for them and return out. It's just a
lot less boilerplate for common error handling tasks. This introduces an
implicit control flow problem, but we can ameliorate that with marked
propagation; see below.
There's no compelling reason to deviate from the `throw` / `catch`
legacy here. There are other options, like `raise` / `handle`. In
theory, switching would somewhat dissociate Swift from the legacy of
exceptions; people coming from other languages have a lot of assumptions
about exceptions which don't necessarily apply to Swift. However, our
error model is similar enough to the standard exception model that
people are inevitably going to make the connection; there's no getting
around the need to explain what we're trying to do. So using different
keywords just seems petty.
Therefore, Swift should provide a `throw` expression. It requires an
operand of type `Error` and formally yields an arbitrary type. Its
dynamic behavior is to transfer control to the innermost enclosing
`catch` clause which is satisfied by the operand. A quick example:
```swift
if timeElapsed() > timeThreshold { throw HomeworkError.Overworked }
```
A `catch` clause includes a pattern that matches an error. We want to
repurpose the `try` keyword for marked propagation, which it seems to
fit far better, so `catch` clauses will instead be attached to a
generalized `do` statement:
```swift
do {
  // ...
} catch HomeworkError.Overworked {
  // a conditionally-executed catch clause
} catch _ {
  // a catch-all clause
}
```
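To make the `throw` / `do` / `catch` shape concrete, here is a minimal self-contained sketch written in current Swift syntax. The `doHomework` and `attempt` functions and the eight-hour threshold are hypothetical names invented for illustration; only the `HomeworkError.Overworked` case comes from the examples above.

```swift
// Hypothetical error domain, mirroring the HomeworkError used in this proposal.
enum HomeworkError: Error {
    case Overworked
    case Impossible
}

// Hypothetical throwing function: fails once too much time has elapsed.
func doHomework(hoursElapsed: Int) throws -> String {
    if hoursElapsed > 8 { throw HomeworkError.Overworked }
    return "done"
}

func attempt(hoursElapsed: Int) -> String {
    do {
        return try doHomework(hoursElapsed: hoursElapsed)
    } catch HomeworkError.Overworked {
        // A conditionally-executed catch clause.
        return "took a break"
    } catch {
        // A catch-all clause.
        return "gave up"
    }
}
```

Control transfers to the first clause whose pattern matches the thrown value, so `attempt(hoursElapsed: 10)` takes the `Overworked` branch rather than the catch-all.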
Swift should also provide some tools for doing manual propagation. We
should have a standard Rust-like `Result<T>` enum in the library, as
well as a rich set of tools, e.g.:
- A function to evaluate an error-producing closure and capture the
result as a `Result<T>`.
- A function to unpack a `Result<T>` by either returning its value or
propagating the error in the current context.
- A futures library that traffics in `Result<T>` when applicable.
- An overload of `dispatch_sync` which takes an error-producing
closure and propagates an error in the current context.
- etc.
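The first two tools in the list above can be sketched as follows. This uses the two-parameter `Result<Success, Failure>` type from Swift's standard library, which differs from the single-parameter `Result<T>` strawman written here; `parseInt`, `ParseError`, and `doubled` are hypothetical names.

```swift
// Hypothetical error type and throwing function for illustration.
enum ParseError: Error {
    case notANumber(String)
}

func parseInt(_ text: String) throws -> Int {
    guard let value = Int(text) else { throw ParseError.notANumber(text) }
    return value
}

// Capture: evaluate an error-producing closure and store the outcome.
let good: Result<Int, Error> = Result(catching: { try parseInt("42") })
let bad: Result<Int, Error> = Result(catching: { try parseInt("forty-two") })

// Unpack: return the value, or propagate the stored error in this context.
func doubled(_ result: Result<Int, Error>) throws -> Int {
    return try result.get() * 2
}
```

The captured value can be stored or passed across a boundary where automatic propagation is not available, then turned back into a thrown error at the point where someone is prepared to handle it.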
### Typed propagation
Swift should use statically-enforced typed propagation. By default,
functions should not be able to throw. A call to a function which can
throw within a context that is not allowed to throw should be rejected
by the compiler.
Function types should indicate whether the function throws; this needs
to be tracked even for first-class function values. Functions which do
not throw are subtypes of functions that throw.
This would be written with a `throws` clause on the function declaration
or type:
```swift
// This function is not permitted to throw.
func foo() -> Int {
  // Therefore this is a semantic error.
  return try stream.readInt()
}

// This function is permitted to throw.
func bar() throws -> Int {
  return try stream.readInt()
}

// 'throws' is written before the arrow to give a sensible and
// consistent grammar for function types and implicit () result types.
func baz() throws {
  if let byte = try stream.getOOB() where byte == PROTO_RESET {
    reset()
  }
}

// 'throws' appears in a consistent position in function types.
func fred(_ callback: (UInt8) throws -> ()) throws {
  while true {
    let code = try stream.readByte()
    if code == OPER_CLOSE { return }
    try callback(code)
  }
}

// It only applies to the innermost function for curried functions;
// this function has type:
//   (Int) -> (Int) throws -> Int
func jerry(_ i: Int)(j: Int) throws -> Int {
  // It's not an error to use 'throws' on a function that can't throw.
  return i + j
}
```
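The subtyping rule stated earlier (functions which do not throw are subtypes of functions that throw) can be illustrated with first-class function values. This is a sketch in current Swift syntax; `alwaysFine`, `mightFail`, and `Closed` are hypothetical names.

```swift
struct Closed: Error {}

func alwaysFine() -> Int { return 1 }
func mightFail() throws -> Int { throw Closed() }

// A non-throwing function converts implicitly to a throwing function type.
let a: () throws -> Int = alwaysFine
let b: () throws -> Int = mightFail
let handlers = [a, b]

// Every call through the throwing type must be handled, because the static
// type no longer says whether a particular element can throw.
func results() -> [Int?] {
    var out: [Int?] = []
    for handler in handlers {
        do {
            out.append(try handler())
        } catch {
            out.append(nil)
        }
    }
    return out
}

// The reverse conversion is rejected at compile time:
// let c: () -> Int = mightFail
```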
The reason to use a keyword here is that it's much nicer for function
declarations, which generally outnumber function types by at least an
order of magnitude. A punctuation mark would be easily lost or mistaken
amidst all the other punctuation in a function declaration, especially
if the punctuation mark were something like `!` that can validly appear
at the end of a parameter type. It makes sense for the keyword to appear
close to the return type, as it's essentially a part of the result and a
programmer should be able to see both parts in the same glance. The
keyword appears before the arrow for the simple reason that the arrow is
optional (along with the rest of the return type) in function and
initializer declarations; having the keyword appear in slightly
different places based on the presence of a return type would be silly
and would make adding a non-void return type feel awkward. The keyword
itself should be descriptive, and it's particularly nice for it to be a
form of the verb used by the throwing expression, conjugated as if
performed by the function itself. Thus, `throw` becomes `throws`; if we
used `raise` instead, this would be `raises`, which I personally find
unappealing for reasons I'm not sure I can put a name to.
It shouldn't be possible to overload functions solely based on whether
the functions throw. That is, this is not legal:
```swift
func foo() { ... } // called in contexts that cannot throw
func foo() throws { ... } // called in contexts that can throw
```
It is valuable to be able to overload higher-order functions based on
whether an argument function throws; it is easy to imagine algorithms
that can be implemented more efficiently if they do not need to worry
about exceptions. (We do not, however, particularly want to encourage a
pattern of duplicating implementations.) Such overloading is
straightforward if the primary type-checking pass is able to reliably
decide whether a function value can throw.
Typed propagation checking can generally be performed in a secondary
pass over a type-checked function body: if a function is not permitted
to throw, walk its body and verify that there are no `throw` expressions
or calls to functions that can `throw`. If all throwing calls must be
marked, this can be done prior to type-checking to decide syntactically
whether a function can apparently throw; of course, the later pass is
still necessary, but the ability to do this dramatically simplifies the
implementation of the type-checker, as discussed below. Certain
type-system features may need to be curtailed in order to make this
implementation possible for schedule reasons. (It's important to
understand that this is *not* the motivation for marked propagation.
It's just a convenient consequence that marked propagation makes this
implementation possible.)
Reliably deciding whether a function value can throw is easy for
higher-order uses of declared functions. The problem, as usual, is
anonymous functions. We don't want to require closures to be explicitly
typed as throwing or non-throwing, but the fully-accurate inference
algorithm requires a type-checked function body, and we can't always
type-check an anonymous function independently of its enclosing context.
Therefore, we will rely on being able to do a pass prior to
type-checking to syntactically infer whether a closure throws, then
making a second pass after type-checking to verify the correctness of
that inference. This may break certain kinds of reasonable code, but the
multi-pass approach should let us heuristically unbreak targeted cases.
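As a sketch of what this inference means for a programmer, the two closures below carry no `throws` annotation, and whether each one throws follows from the marked calls in its body. The code is written in current Swift syntax; `first(of:)` and `Empty` are hypothetical names.

```swift
struct Empty: Error {}

func first(of text: String) throws -> Character {
    guard let c = text.first else { throw Empty() }
    return c
}

// The body contains a marked call that is not handled locally, so the
// closure is inferred to have type (String) throws -> Character.
let initial = { (text: String) in try first(of: text) }

// The body handles every error itself, so this closure is inferred as
// non-throwing and can be called without `try`.
let safeInitial = { (text: String) -> Character in
    do { return try first(of: text) } catch { return "?" }
}
```

Calling `initial` requires a `try` at the call site, while `safeInitial` does not.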
Typed propagation has implications for all kinds of polymorphism:
#### Higher-order polymorphism
We should make it easy to write higher-order functions that behave
polymorphically w.r.t. whether their arguments throw. This can be done
in a fairly simple way: a function can declare that it throws if any of
a set of named arguments do. As an example (using strawman syntax):
```swift
func map<T, U>(_ array: [T], fn: T throws -> U) throwsIf(fn) -> [U] {
...
}
```
There's no need for a more complex logical operator than disjunction.
You can construct really strange code where a function throws only if
one of its arguments doesn't, but it'd be contrived, and it's hard to
imagine how they could be type-checked without a vastly more
sophisticated approach. Similarly, you can construct situations where
whether a function can throw is value-dependent on some other argument,
like a "should I throw an exception" flag, but it's hard to imagine such
cases being at all important to get right in the language. This schema
is perfectly sufficient to express normal higher-order stuff.
In fact, while the strawman syntax above allows the function to be
specific about exactly which argument functions cause the callee to
throw, that's already overkill in the overwhelmingly likely case of a
function that throws if any of its argument functions throw (and there's
probably only one). So it would probably be better to just have a single
`rethrows` annotation, with vague plans to allow it to be parameterized
in the future if necessary.
This sort of propagation-checking would be a straightforward extension
of the general propagation checker. The normal checker sees that a
function isn't allowed to propagate out and looks for propagation
points. The conditional checker sees that a function has a conditional
propagation clause and looks for propagation points, assuming that the
listed functions don't throw (including when looking at any conditional
propagation clauses). The parameter would have to be a `let`.
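A minimal sketch of the single-annotation form, using the `rethrows` spelling in current Swift syntax rather than the `throwsIf(fn)` strawman. The `transform` and `checked` functions and the `Negative` error are hypothetical.

```swift
struct Negative: Error {}

// `rethrows`: this function throws only if its function argument does.
func transform<T, U>(_ items: [T], _ fn: (T) throws -> U) rethrows -> [U] {
    var out: [U] = []
    for item in items {
        // The only propagation point is the call to the argument function.
        out.append(try fn(item))
    }
    return out
}

// Non-throwing argument: the call site needs no marker.
let doubled = transform([1, 2, 3]) { $0 * 2 }

// Throwing argument: the call must be marked and the caller must propagate.
func checked(_ xs: [Int]) throws -> [Int] {
    return try transform(xs) { x in
        if x < 0 { throw Negative() }
        return x
    }
}
```

The conditional checker accepts `transform` because its body has no propagation points once `fn` is assumed not to throw.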
We probably do need to get higher-order polymorphism right in the first
release, because we will need it for the short-circuiting operators.
#### Generic polymorphism
It would be useful to be able to parameterize protocols, and protocol
conformances, on whether the operations produce errors. Lacking this
feature means that protocol authors must decide to either conservatively
allow throwing conformances, and thus force all generic code using the
protocol to deal with probably-spurious errors, or aggressively forbid
them, and thus forbid conformances by types whose operations naturally
throw.
There are several different ways we could approach this problem, and
after some investigation I feel confident that they're workable.
Unfortunately, they are clearly out-of-scope for the first release. For
now, the standard library should provide protocols that cannot throw,
even though this limits some potential conformances. (It's worth noting
that such conformances generally aren't legal today, since they'd need
to return an error result somehow.)
A future direction for both generic and higher-order polymorphism is to
consider error propagation to be one of many possible effects in a
general, user-extensible effect tracking system. This would allow the
type system to check that certain specific operations are only allowed
in specific contexts: for example, that a blocking operation is only
allowed in a blocking context.
#### Error type
The Swift standard library will provide `Error`, a protocol with a very
small interface (which is not described in this proposal). The standard
pattern should be to define the conformance of an `enum` to the type:
```swift
enum HomeworkError : Error {
  case Overworked
  case Impossible
  case EatenByCat(Cat)
  case StopStressingMeWithYourRules
}
```
The `enum` provides a namespace of errors, a list of possible errors
within that namespace, and optional values to attach to each option.
For now, the list of errors in a domain will be fixed, but permitting
future extension is just ordinary enum resilience, and the standard
techniques for that will work fine in the future.
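The sketch below shows how the three roles of the `enum` (namespace, list of errors, attached values) appear at a handler. It is written in current Swift syntax; the `Cat` struct, the `NetworkDown` error, and the `excuse(for:)` function are hypothetical.

```swift
// Hypothetical payload type.
struct Cat { let name: String }

enum HomeworkError: Error {
    case Overworked
    case Impossible
    case EatenByCat(Cat)
}

// A hypothetical error from some other domain.
struct NetworkDown: Error {}

func excuse(for error: Error) -> String {
    do {
        throw error
    } catch HomeworkError.EatenByCat(let cat) {
        // The value attached to the case is available to the handler.
        return "\(cat.name) ate it"
    } catch let error as HomeworkError {
        // Any other error within this domain.
        return "homework problem: \(error)"
    } catch {
        // Errors from other domains.
        return "unknown problem"
    }
}
```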
Note that this corresponds very cleanly to the `NSError` model of an
error domain, an error code, and optional user data. We expect to import
system error domains as enums that follow this approach and implement
`Error`. `NSError` and `CFError` themselves will also conform to
`Error`.
The physical representation (still being nailed down) will make it
efficient to embed an `NSError` as an `Error` and vice-versa. It should
be possible to turn an arbitrary Swift `enum` that conforms to `Error`
into an `NSError` by using the qualified type name as the domain key,
the enumerator as the error code, and turning the payload into user
data.
It's acceptable to allocate memory whenever an error is needed, but our
representation should not inhibit the optimizer from forwarding a
`throw` directly to a `catch` and removing the intermediate error
object.
### Marked propagation
Swift should use marked propagation: there should be some lightweight
bit of syntax decorating anything that is known to be able to throw (other
than a `throw` expression itself, of course).
Our proposed syntax is to repurpose `try` as something that can be
wrapped around an arbitrary expression:
```swift
// This try applies to readBool().
if try stream.readBool() {
  // This try applies to both of these calls.
  let x = try stream.readInt() + stream.readInt()

  // This is a semantic error; it needs a try.
  var y = stream.readFloat()

  // This is okay; the try covers the entire statement.
  try y += stream.readFloat()
}
```
Developers can "scope" the `try` very tightly by writing it within
parentheses or on a specific argument or list element:
```swift
// Semantic error: the try only covers the parenthesized expression.
let x = (try stream.readInt()) + stream.readInt()
// The try applies to the first array element. Of course, the
// developer could cover the entire array by writing the try outside.
let array = [ try foo(), bar(), baz() ]
```
Some developers may wish to do this to make the specific throwing calls
very clear. Other developers may be content with knowing that something
within a statement can throw.
We also briefly considered the possibility of putting the marker into
the call arguments clause, e.g.:
```swift
parser.readKeys(&strings, try)
```
This works as long as the only throwing calls are written syntactically
as calls; this covers calls to free functions, methods, and
initializers. However, it effectively requires Swift to forbid operators
and property and subscript accessors from throwing, which may not be a
reasonable limitation, especially for operators. It is also somewhat
unnatural, and it forces users to mark every single call site instead of
allowing them to mark everything within a statement at once.
Autoclosures pose a problem for marking. For the most part, we want to
pretend that the expression of an autoclosure is being evaluated in the
enclosing context; we don't want to have to mark both a call within the
autoclosure and the call to the function taking the autoclosure! We
should teach the type-checking pass to recognize this pattern: a call to
a function that `throwsIf` an autoclosure argument does.
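A sketch of the autoclosure pattern in current Swift syntax, where the conditional-throwing annotation is spelled `rethrows`. The `both` helper is a hypothetical stand-in for a short-circuiting operator, and `isReachable` and `check` are likewise invented for illustration.

```swift
struct Unreachable: Error {}

func isReachable(_ host: String) throws -> Bool {
    if host.isEmpty { throw Unreachable() }
    return host != "offline.example"
}

// Hypothetical short-circuiting helper: throws only if its autoclosure does.
func both(_ lhs: Bool, _ rhs: @autoclosure () throws -> Bool) rethrows -> Bool {
    guard lhs else { return false }
    return try rhs()
}

func check(_ host: String, enabled: Bool) throws -> Bool {
    // One marker covers the throwing call inside the autoclosure and the
    // call to `both` itself.
    return try both(enabled, isReachable(host))
}
```

When `enabled` is false the autoclosure is never evaluated, so `check("", enabled: false)` returns `false` without throwing.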
There's a similar problem with functions that are supposed to feel like
statements. We want you to be able to write:
```swift
autoreleasepool {
  let string = try parseString()
  // ...
}
```
without marking the call to `autoreleasepool`, because this undermines
the ability to write functions that feel like statements. However, there
are other important differences between these trailing-closure uses and
true built-in statements, such as the behavior of `return`, `break`, and
`continue`. An attribute which marks the function as being
statement-like would be a necessary step towards addressing both
problems. Doing this reliably in closures would be challenging, however.
#### Asserting markers
Typed propagation is a hypothesis-checking mechanism and so suffers from
the standard problem of false positives. (Basic soundness eliminates
false negatives, of course: the compiler is supposed to force
programmers to deal with *every* source of error.) In this case, a false
positive means a situation where an API is declared to throw but an
error is actually dynamically impossible.
For example, a function to load an image from a URL would usually be
designed to produce an error if the image didn't exist, the connection
failed, the file data was malformed, or any of a hundred other problems
arose. The programmer should be expected to deal with that error in
general. But a programmer might reasonably use the same API to load an
image completely under their control, e.g. from their program's private
resources. We shouldn't make it too syntactically inconvenient to "turn
off" error-checking for such calls.
One important point is that we don't want to make it too easy to
*ignore* errors. Ignored errors usually lead to a terrible debugging
experience, even if the error is logged with a meaningful stack trace;
the full context of the failure is lost and can be difficult to
reproduce. Ignored errors also have a way of compounding, where an error
that's "harmlessly" ignored at one layer of abstraction causes another
error elsewhere; and of course the second error can be ignored, etc.,
but only by making the program harder and harder to understand and
debug, leaving behind log files that are increasingly jammed with the
detritus of a hundred ignored errors. And finally, ignoring errors
creates a number of type-safety and security problems by encouraging
programs to blunder onwards with meaningless data and broken invariants.
Instead, we just want to make it (comparatively) easy to turn a static
problem into a dynamic one, much as assertions and the `!` operator do. Of
course, this needs to be an explicit operation, because otherwise we
would completely lose typed propagation; and it should be call-specific,
so that the programmer has to make an informed decision about individual
operations. But we already have an explicit, call-site-specific
annotation: the `try` operator. So the obvious solution is to allow a
variant of `try` that asserts that an error is not thrown out of its
operand; and the obvious choice there within our existing design
language is to use the universal "be careful, this is unsafe" marker by
making the keyword `try!`.
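A small sketch of the intended use, assuming a hypothetical `parsePort` function: the same API is called once with input that can really be malformed and once with a literal under the programmer's control.

```swift
struct MalformedPort: Error {}

// Hypothetical throwing parser.
func parsePort(_ text: String) throws -> Int {
    guard let port = Int(text), (1...65535).contains(port) else {
        throw MalformedPort()
    }
    return port
}

// User input: the error is a real possibility and is handled.
func port(fromUserInput text: String) -> Int? {
    do {
        return try parsePort(text)
    } catch {
        return nil
    }
}

// A literal under our control: the error is dynamically impossible, so we
// assert that instead of writing a handler. If the assumption were wrong,
// this line would stop the program rather than continue with bad data.
let defaultPort = try! parsePort("8080")
```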
It's reasonable to ask whether `try!` is actually *too* easy to write,
given that this is, after all, an unsafe operation. One quick rejoinder
is that it's no worse than the ordinary `!` operator in that sense. Like
`!`, it's something that a cautious programmer might want to investigate
closer, and you can easily imagine codebases that expect uses of it to
always be explained in comments. But more importantly, just like `!`
it's only *statically* unsafe, and it will reliably fail when the
programmer is wrong. Therefore, while you can easily imagine (and
demonstrate) incautious programmers flailing around with it to appease
the type-checker, that's not actually a tenable position for the overall
program: eventually the programmer will have to learn how to use the
feature, or else their program simply won't run.
Furthermore, while `try!` does somewhat undermine error-safety in the
hands of a careless programmer, it's still better to promote this kind
of unsafety than to implicitly promote the alternative. A careless
programmer isn't going to write good error handling just because we
don't give them this feature. Instead, they'll write out a `do/catch`
block, and the natural pressure there will be to silently swallow the
error --- after all, that takes less boilerplate than asserting or
logging.
In a future release, when we add support for universal errors, we'll
need to reconsider the behavior of `try!`. One possibility is that
`try!` should simply start propagating its operand as a universal error;
this would allow emergency recovery. Alternatively, we may want `try!`
to assert that even universal errors aren't thrown out of it; this would
provide a more consistent language model between the two kinds of
errors. But we don't need to think too hard about this yet.
### Other syntax
#### Clean-up actions
Swift should provide a statement for cleaning up with an *ad hoc*
action.
Overall, I think it is better to use a Go-style `defer` than a
Java-style `try ... finally`. While this makes the exact order of
execution more obscure, it does make it obvious that the clean-up *will*
be executed without any further analysis, which is something that
readers will usually be interested in.
Unlike Go, I think this should be tied to scope-exit, not to
function-exit. This makes it very easy to know the set of `defer`
actions that will be executed when a scope exits: it's all the `defer`
statements in exactly that scope. In contrast, in Go you have to
understand the dynamic history of the function's execution. This also
eliminates some semantic and performance oddities relating to variable
capture, since the `defer` action occurs with everything still in scope.
One downside is that it's not as good for "transactional" idioms which
push an undo action for everything they do, but that style has
composition problems across function boundaries anyway.
I think `defer` is a reasonable name for this, although we might also
consider `finally`. I'll use `defer` in the rest of this proposal.
`defer` may be followed by an arbitrary statement. The compiler should
reject an action that might terminate early, whether by throwing or with
`return`, `break`, or `continue`.
Examples:
```swift
if exists(filename) {
  let file = open(filename, O_READ)
  defer close(file)

  while let line = try file.readline() {
    // ...
  }

  // close occurs here, at the end of the formal scope.
}
```
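The scope-exit rule can be observed by recording the order of events. This sketch uses current Swift syntax, which requires braces around the deferred action, unlike the brace-less form shown above; `process`, `ReadFailed`, and the log strings are hypothetical.

```swift
struct ReadFailed: Error {}

func process(fail: Bool, log: inout [String]) throws {
    log.append("open")
    // Runs when the function's outermost scope exits, by any path.
    defer { log.append("close") }

    for chunk in 1...2 {
        // Tied to the loop body's scope: runs at the end of each iteration.
        defer { log.append("end chunk \(chunk)") }
        if fail { throw ReadFailed() }
        log.append("read chunk \(chunk)")
    }
}
```

On the successful path the log reads open, read chunk 1, end chunk 1, read chunk 2, end chunk 2, close. When the first iteration throws, only the actions of the scopes actually entered run, innermost first: open, end chunk 1, close.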
We should consider providing a convenient way to mark that a `defer`
action should only be taken if an error is thrown. This is a convenient
shorthand for controlling the action with a flag that's only set to true
at the end of an operation. The flag approach is often more useful,
since it allows the action to be taken for *any* early exit, e.g. a
`return`, not just for error propagation.
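The flag approach mentioned above might look like the following sketch, in current Swift syntax. The `writeAll` function, the `WriteFailed` error, and the use of an array as the "store" are assumptions made for illustration.

```swift
struct WriteFailed: Error {}

func writeAll(_ lines: [String], into store: inout [String]) throws {
    let checkpoint = store
    var committed = false
    // Roll back on *any* early exit, whether a throw or a return.
    defer { if !committed { store = checkpoint } }

    for line in lines {
        if line.isEmpty { throw WriteFailed() }
        store.append(line)
    }
    // Only set at the very end of the operation.
    committed = true
}
```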
#### `using`
Swift should consider providing a `using` statement which acquires a
resource, holds it for a fixed period of time, optionally binds it to a
name, and then releases it whenever the controlled statement exits.
`using` has many similarities to `defer`. It does not subsume `defer`,
which is useful for many ad-hoc and tokenless clean-ups. But it is
convenient for the common pattern of a type-directed clean-up.
We do not expect this feature to be necessary in the first release.
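In the meantime, a type-directed clean-up can be approximated as a library function built on `defer`. The sketch below is one possible shape, not a proposed API: the `Disposable` protocol, the `withResource` function, and the `TempBuffer` class are all hypothetical.

```swift
// Hypothetical protocol describing a type-directed clean-up.
protocol Disposable: AnyObject {
    func dispose()
}

// Hypothetical helper: binds the resource for the duration of `body` and
// releases it on every exit path.
func withResource<R: Disposable, T>(_ resource: R, _ body: (R) throws -> T) rethrows -> T {
    defer { resource.dispose() }
    return try body(resource)
}

final class TempBuffer: Disposable {
    private(set) var isDisposed = false
    var bytes: [UInt8] = []
    func dispose() {
        isDisposed = true
        bytes.removeAll()
    }
}

let buffer = TempBuffer()
let count = withResource(buffer) { b -> Int in
    b.bytes.append(contentsOf: [1, 2, 3])
    return b.bytes.count
}
```

A function like this still has the trailing-closure limitations discussed under marked propagation (`return`, `break`, and `continue` do not behave as they would in a built-in statement), which is part of the case for a dedicated statement.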
### C and Objective-C Interoperation
It's of paramount importance that Swift's error model interact as
cleanly with Objective-C APIs as we can make it.
In general, we want to try to import APIs that produce errors as
throwing; if this fails, we'll import the API as an ordinary
non-throwing function. This is a safe approach only under the assumption
that importing the function as throwing will require significant changes
to the call. That is, if a developer writes code assuming that an API
will be imported as throwing, but in fact Swift fails to import the API
that way, it's important that the code doesn't compile.
Fortunately, this is true for the common pattern of an error
out-parameter: if Swift cannot import the function as throwing, it will
leave the out-parameter in place, and the compiler will complain if the
developer fails to pass an error argument. However, it is possible to
imagine APIs where the "meat" of the error is returned in a different
way; consider a POSIX API that simply sets `errno`. Great care would
need to be taken when such an API is only partially imported as
throwing.
Let's wade into the details.
#### Error types
`NSError` and `CFError` should implement the `Error` protocol. It should
be possible to turn an arbitrary Swift `enum` that conforms to `Error`
into an `NSError` by using the qualified type name as the domain key,
the enumerator as the error code, and turning the payload into user
data.
Recognizing system enums as error domains is a matter of annotation.
Most likely, Swift will just special-case a few common domains in the
first release.
#### Objective-C method error patterns
The most common error pattern in ObjC by far is for a method to have an
autoreleased `NSError**` out-parameter. We don't currently propose
automatically importing anything as `throws` when it lacks such a
parameter.
If any APIs take an `NSError**` and *don't* intend for it to be an error
out-parameter, they will almost certainly need it to be marked.
##### Detecting an error
Many of these methods have some sort of significant result which is used
for testing whether an error occurred:
- The most common pattern is a `BOOL` result, where a false value
means an error occurred. This seems unambiguous.
Swift should import these methods as if they'd returned `Void`.
- Also common is a pointer result, where a `nil` result usually means
an error occurred.
I've been told that there are some exceptions to this rule, where a
`nil` result is valid and the caller is apparently meant to check
for a non-`nil` error. I haven't been able to find any such APIs in
Cocoa, though; the claimed APIs I've been referred to do have
nullable results, but returned via out-parameters with a
`BOOL` formal result. So it seems to be a sound policy
decision for Objective-C that `nil` results are errors by default.
CF might be a different story, though.
When a `nil` result implies that an error has occurred, Swift should
import the method as returning a non-optional result.
- A few CF APIs return `void`. As far as I can tell, for all of these,
the caller is expected to check for a non-`nil` error.
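The first two conventions in the list above can be sketched in plain Swift, with hand-written wrappers playing the role of the importer. The `legacySave` and `legacyLoad` functions are hypothetical stand-ins for Objective-C methods with an error out-parameter, and the fallback error used when the callee leaves the out-parameter unset is an assumption of this sketch, not something the proposal specifies.

```swift
struct LegacyError: Error { let code: Int }

// Hypothetical stand-ins for APIs with an error out-parameter.
func legacySave(_ text: String, error: inout LegacyError?) -> Bool {
    if text.isEmpty { error = LegacyError(code: 1); return false }
    return true
}

func legacyLoad(_ key: String, error: inout LegacyError?) -> String? {
    if key != "greeting" { error = LegacyError(code: 2); return nil }
    return "hello"
}

// BOOL convention: a false result means an error occurred, so the throwing
// form behaves as if the method returned Void.
func save(_ text: String) throws {
    var error: LegacyError?
    if !legacySave(text, error: &error) {
        throw error ?? LegacyError(code: -1)  // fallback is an assumption
    }
}

// Pointer convention: a nil result means an error occurred, so the throwing
// form returns a non-optional result.
func load(_ key: String) throws -> String {
    var error: LegacyError?
    guard let value = legacyLoad(key, error: &error) else {
        throw error ?? LegacyError(code: -1)  // fallback is an assumption
    }
    return value
}
```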
For other sentinel cases, we can consider adding a new clang attribute
to indicate to the compiler what the sentinel is:
- There are several APIs returning `NSInteger` or `NSUInteger`. At
least some of these return 0 on error, but that doesn't seem like a
reasonable general assumption.
- `AVFoundation` provides a couple methods returning
`AVKeyValueStatus`. These produce an error if the API returned
`AVKeyValueStatusFailed`, which, interestingly enough, is not the
zero value.
The clang attribute would specify how to test the return value for an
error. For example:
```objc
+ (NSInteger)writePropertyList:(id)plist
                      toStream:(NSOutputStream *)stream
                        format:(NSPropertyListFormat)format
                       options:(NSPropertyListWriteOptions)opt
                         error:(out NSError **)error
  NS_ERROR_RESULT(0);

- (AVKeyValueStatus)statusOfValueForKey:(NSString *)key
                                  error:(NSError **)outError
  NS_ERROR_RESULT(AVKeyValueStatusFailed);
```
We should also provide a Clang attribute which specifies that the
correct way to test for an error is to check the out-parameter. Both of
these attributes could potentially be used by the static analyzer, not
just Swift. (For example, they could try to detect an invalid error
check.)
A constant value would be sufficient for the cases I've seen, but if the
argument has to be generalized to a simple expression, that's still
feasible.
##### The error parameter
The obvious import rule for Objective-C methods with `NSError**`
out-parameters is to simply mark them `throws` and remove the selector
clause corresponding to the out-parameter. That is, a method like this
one from `NSAttributedString`:
```objc
- (NSData *)dataFromRange:(NSRange)range
documentAttributes:(NSDictionary *)dict
error:(NSError **)error;
```
would be imported as:
```swift
func dataFromRange(
  _ range: NSRange,
  documentAttributes dict: NSDictionary
) throws -> NSData
```
However, applying this rule haphazardly causes problems for Objective-C
interoperation, because multiple methods can be imported the same way.
The model is far more comprehensible to both compiler and programmer if
the original Objective-C declaration can be unambiguously reconstructed
from a Swift declaration.
There are two sources of this ambiguity:
- The error parameter could have appeared at an arbitrary position in
the selector; that is, both `foo:bar:error:` and `foo:error:bar:`
would appear as `foo:bar:` after import.
- The error parameter could have had an arbitrary selector chunk; that
is, both `foo:error:` and `foo:withError:` would appear as `foo:`
after import.
To allow reconstruction, then, we should only apply the rule when the
error parameter is the last parameter and the corresponding selector is
either `error:` or the first chunk. Empirically, this seems to do the
right thing for all but two sets of APIs in the public API:
- The `ISyncSessionDriverDelegate` category on `NSObject` declares
half-a-dozen methods like this:
```objc
- (BOOL)sessionDriver:(ISyncSessionDriver *)sender
didRegisterClientAndReturnError:(NSError **)outError;
```
Fortunately, these delegate methods were all deprecated in Lion, and
Swift currently doesn't even import deprecated methods.
- `NSFileCoordinator` has half a dozen methods where the `error:`
clause is second-to-last, followed by a block argument. These
methods are not deprecated as far as I know.
Of course, user code could also fail to follow this rule.
I think it's acceptable for Swift to just not import these methods as
`throws`, leaving the original error parameter in place exactly as if
they didn't follow an intelligible pattern in the header.
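The eligibility rule above can be modeled in a few lines. This is only a sketch under stated assumptions: the function name `importedThrowingName`, the representation of a selector as an array of chunks, and the `name(_:label:)` output spelling are all hypothetical illustrations, not the importer's actual interface.

```swift
/// Returns the Swift-side name for a selector imported as `throws`,
/// or nil when the rule does not apply and the error parameter is
/// left in place. (Hypothetical helper, for illustration only.)
func importedThrowingName(selectorChunks: [String], errorIndex: Int) -> String? {
    // The error parameter must be the last parameter.
    guard let last = selectorChunks.indices.last, errorIndex == last else {
        return nil
    }
    let base = selectorChunks[0]
    // If the error parameter owns the first chunk, the chunk is kept as
    // the base name and only the parameter itself is dropped.
    if errorIndex == 0 {
        return "\(base)()"
    }
    // Otherwise the chunk must be spelled exactly `error:`.
    guard selectorChunks[errorIndex] == "error" else {
        return nil
    }
    // Drop the trailing `error:` chunk; the first parameter is unlabeled.
    let labels = ["_"] + selectorChunks[1..<errorIndex]
    return "\(base)(" + labels.map { "\($0):" }.joined() + ")"
}
```

Under this model, `foo:error:` maps to a name while `foo:withError:` and `foo:error:bar:` are rejected, so no two distinct selectors collapse to the same imported name.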
This translation rule would import methods like this one from
`NSDocument`:
```objc
- (NSDocument *)duplicateAndReturnError:(NSError **)outError;
```
like so:
```swift
func duplicateAndReturnError() throws -> NSDocument
```
Leaving the `AndReturnError` bit around feels unfortunate to me, but I
don't see what we could do without losing the ability to automatically
reconstruct the Objective-C signature. This pattern is common but hardly
universal; consider this method from `NSManagedObject`:
```objc
- (BOOL)validateForDelete:(NSError **)error;
```
This would be imported as:
```swift
func validateForDelete() throws
```
This seems like a really nice import.
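To make the effect of such an import concrete, here is a pure-Swift sketch of the relationship between the two forms. The `hasDependents` parameter, the `ValidationError` type, and both function bodies are assumptions made up for illustration; the first function merely stands in for an Objective-C method that returns `NO` and fills in its error out-parameter.

```swift
struct ValidationError: Error, Equatable {
    let reason: String
}

// Stand-in for the Objective-C convention: a false result means failure,
// and the out-parameter is filled in with the error.
func validateForDelete(hasDependents: Bool, error: inout Error?) -> Bool {
    if hasDependents {
        error = ValidationError(reason: "object still has dependents")
        return false
    }
    return true
}

// What the imported `throws` form does conceptually: call the original,
// test the result, and throw the out-parameter's value on failure.
func validateForDelete(hasDependents: Bool) throws {
    var error: Error?
    if !validateForDelete(hasDependents: hasDependents, error: &error) {
        throw error ?? ValidationError(reason: "unknown failure")
    }
}

// Callers of the throwing form no longer see the Bool or the out-parameter.
do {
    try validateForDelete(hasDependents: true)
} catch {
    print("validation failed: \(error)")
}
```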
#### CoreFoundation functions
CF APIs use `CFErrorRef` pretty reliably, but there are two problems.
First, we're not as confident about the memory management rules for the
error object. Is it always returned at +1?
Second, I'm not as confident about how to detect that an error has
occurred:
- There are a lot of functions that return `Boolean` or `bool`. It's
likely that these functions consistently use the same convention as
Objective-C: false means error.
- Similarly, there are many functions that return an object reference.
Again, we'd need a policy on whether to treat `nil` results as
errors.
- There are a handful of APIs that return a `CFIndex`, all with
apparently the same rule that a zero value means an error. (These
are serialization APIs, so writing nothing seems like a reasonable
error.) But just like Objective-C, that does not seem like a
reasonable default assumption.
- `ColorSyncProfile` has several related functions that return
`float`! These are both apparently meant to be checked by testing
whether the error result was filled in.
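One possible detection policy for the cases listed above can be sketched as follows. Everything here is an assumption for illustration: the `ReturnValue` enumeration, the `didFail` function, and in particular the choice to treat null references as errors and to fall back on the error out-parameter for `CFIndex` and `float` results. The text above leaves those policy questions open.

```swift
// Hypothetical summary of what a CF-style call handed back.
enum ReturnValue {
    case boolean(Bool)            // Boolean or bool result
    case reference(isNull: Bool)  // object reference result
    case index(Int)               // CFIndex result
    case float(Float)             // float result
}

// One candidate policy, not a settled rule.
func didFail(_ result: ReturnValue, errorFilledIn: Bool) -> Bool {
    switch result {
    case .boolean(let ok):
        // Same convention as Objective-C: false means error.
        return !ok
    case .reference(let isNull):
        // Assumes a policy in which a null result is treated as an error.
        return isNull
    case .index, .float:
        // No safe default for these result types, so test whether the
        // error out-parameter was filled in.
        return errorFilledIn
    }
}
```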
There are also some APIs that do not use `CFErrorRef`. For example, most
of the `CVDisplayLink` APIs in CoreVideo return their own `CVReturn`
enumeration, many with more than one error value. Obviously, these will
not be imported as throwing unless CoreVideo writes an overlay.
#### Other C APIs
In principle, we could import POSIX functions into Swift as throwing
functions, filling in the error from `errno`. It's nearly impossible to
imagine doing this with an automatic import rule, however; much more
likely, we'd need to wrap them all in an overlay.
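A hand-written wrapper of the kind an overlay might contain could look roughly like this. The names `POSIXFailure` and `closeThrowing` are hypothetical; only `close` and `errno` come from the C library.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

// Hypothetical error type carrying the raw errno value.
struct POSIXFailure: Error, Equatable {
    let code: Int32
}

// Hypothetical overlay-style wrapper around close(2).
func closeThrowing(_ descriptor: Int32) throws {
    if close(descriptor) != 0 {
        // Read errno immediately, before any other call can overwrite it.
        throw POSIXFailure(code: errno)
    }
}
```

The wrapper has to know that `close` signals failure with a nonzero result, which is exactly the per-function knowledge that an automatic import rule would lack.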
### Implementation design
Error propagation for the kinds of explicit, typed errors that I've been
focusing on should be handled by implicit manual propagation. It would
be good to bias the implementation somewhat towards the non-error path,
perhaps by moving error paths to the ends of functions and so on, and
perhaps even by processing cleanups with an interpretive approach
instead of directly inlining that code, but we should not bias so
heavily as to seriously compromise performance. In other words, we
should not use table-based unwinding.
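As a rough source-level model of what implicit manual propagation means, compare a throwing function with a hand-lowered version in which every call returns a value that is checked immediately. The `parse` and `sum` functions and the use of `Result` are illustrative assumptions; the compiler's actual lowering is not expressed in terms of `Result`.

```swift
struct ParseError: Error, Equatable {
    let input: String
}

// What the programmer writes: errors propagate implicitly through `try`.
func parse(_ text: String) throws -> Int {
    guard let value = Int(text) else { throw ParseError(input: text) }
    return value
}

func sum(_ a: String, _ b: String) throws -> Int {
    return try parse(a) + parse(b)
}

// A model of manual propagation: each call site tests for an error and
// branches to an early exit, with no unwind tables involved.
func parseLowered(_ text: String) -> Result<Int, ParseError> {
    guard let value = Int(text) else { return .failure(ParseError(input: text)) }
    return .success(value)
}

func sumLowered(_ a: String, _ b: String) -> Result<Int, ParseError> {
    let x: Int
    switch parseLowered(a) {
    case .success(let value): x = value
    case .failure(let error): return .failure(error)  // error path
    }
    let y: Int
    switch parseLowered(b) {
    case .success(let value): y = value
    case .failure(let error): return .failure(error)  // error path
    }
    return .success(x + y)
}
```

The cost on the non-error path is one test and branch per call, which is why the text suggests biasing code layout toward that path rather than eliminating the checks.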
Error propagation for universal errors should be handled by table-based
unwinding. `catch` handlers can catch both, mapping unwind exceptions to
`Error` values as necessary. With a carefully designed interpretation
function tailored to the specific needs of Swift, we can avoid most
of the code-size impact by shifting it to the unwind tables, which
needn't ever be loaded in the common case.