File: Accessors.rst

package info (click to toggle)
swiftlang 6.1.3-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 2,791,532 kB
  • sloc: cpp: 9,901,743; ansic: 2,201,431; asm: 1,091,827; python: 308,252; objc: 82,166; f90: 80,126; lisp: 38,358; pascal: 25,559; sh: 20,429; ml: 5,058; perl: 4,745; makefile: 4,484; awk: 3,535; javascript: 3,018; xml: 918; fortran: 664; cs: 573; ruby: 396
file content (1507 lines) | stat: -rw-r--r-- 63,419 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
1474
1475
1476
1477
1478
1479
1480
1481
1482
1483
1484
1485
1486
1487
1488
1489
1490
1491
1492
1493
1494
1495
1496
1497
1498
1499
1500
1501
1502
1503
1504
1505
1506
1507
:orphan:

Optimizing Accessors in Swift
=============================

Definitions
-----------

An abstract storage declaration is a language construct that declares
a means of accessing some sort of abstract entity.  I'll just say
"storage" hereafter.

Swift provides three storage declarations:

* a single named entity, called a *variable* and declared with ``var``
* a single named entity which can never be reassigned, called a *constant* and declared with ``let``
* a compound unnamed entity accessed with an index, called a *subscript* and declared with ``subscript``

These features are similar to those in other languages.  Swift notably
lacks compound named entities, such as C#'s indexed properties; the
design team intentionally chose to favor the use of named single
entities of subscriptable type.

It's useful to lump the two kinds of single named entities together;
I'll just call them both "variables" hereafter.

Subscripts must always be instance type members.  When a variable is
a type member, it's called a *property*.

When I say that these entities are *abstract*, I mean that they're not
directly tied to any particular implementation.  All of them may be
backed directly by memory storing values of the entity's type, or they
may simply provide a way of invoking arbitrary code to access a
logical memory, or they may be a little of both.

Full-value accesses
-------------------

All accesses to storage, no matter the implementation, can be performed
with two primitive operations:

* a full-value load, which creates a new copy of the current value
* a full-value store, which overwrites the current value with a
  different, independent value

A function which implements a full-value load is called a *getter*;
a full-value store, a *setter*.

An operation which calls for a full-value load into a temporary, then
a modification of the temporary, then a full-value store of the
temporary into the original entity, is called a *write-back*.

Implementing accesses with full-value accesses introduces two
problems: subobject clobbering and performance.

Subobject clobbering
~~~~~~~~~~~~~~~~~~~~

Subobject clobbering is a semantic issue.  It occurs when there are
two changes to an entity in flight at once, as in::

   swap(&point.x, &point.y)

(By "at once", I mean synchronously.  Unlike Java, Swift is willing to
say that an *asynchronous* simultaneous access (mutating or not) to an
entity that's being modified is completely undefined behavior, with
any sort of failure permitted, up to and including memory corruption
and crashes.  As Swift transitions towards being a systems language,
it can selectively choose to define some of this behavior,
e.g. allowing racing accesses to different elements of an array.)

Subobject clobbering is a problem with user expectation.  Users are
unlikely to write code that intentionally modifies two obviously
overlapping entities, but they might very reasonably write code that
modifies two "separate" sub-entities.  For example, they might write
code to swap two properties of a struct, or two elements of an array.
The natural expansion of these subobject accesses using whole-object
accesses generates code that "loses" the changes made to one of the
objects::

  var point0 = point
  var x = point0.x
  var point1 = point
  var y = point1.y
  swap(&x, &y)
  point1.y = y
  point = point1
  point0.x = x
  point = point0

Note that ``point.y`` is left unchanged.

Local analysis
^^^^^^^^^^^^^^

There are two straightforward solutions to subobject clobbering, both
reliant on doing local analysis to recognize two accesses which
obviously alias (like the two references to ``point`` in the above
example).  Once you've done this, you can either:

* Emit an error, outlawing such code.  This is what Swift currently
  does (but only when the aliasing access must be implemented with
  full-value loads and stores).
* Use a single temporary, potentially changing the semantics.

It's impossible to guarantee the absence of subobject clobbering with
this analysis without extremely heavy-handed languages changes.
Fortunately, subobject clobbering is "only" a semantics issue, not a
memory-safety issue, at least as long as it's implemented with
full-value accesses.

Reprojection
^^^^^^^^^^^^

A more general solution is to re-project the modified subobject from
scratch before writing it back.  That is, you first acquire an initial
value like normal for the subobject you wish to modify, then modify
that temporary copy in-place.  But instead of then recursively
consuming all the intermediate temporaries when writing them back, you
drop them all and recompute the current value, then write the modified
subobject to it, then write back all the way up.  That is::

  var point0 = point
  var x = point0.x
  var point1 = point
  var y = point1.y
  swap(&x, &y)
  point1 = point      // reload point1
  point1.y = y
  point = point1
  point0 = point      // reload point0
  point0.x = x
  point = point0

In this example, I've applied the solution consistently to all
accesses, which protects against unseen modifications (e.g. during the
call to ``swap``) at the cost of performing two extra full-value
loads.

You can heuristically lower this by combining it with a simple local
analysis and only re-projecting when writing back to other l-values
besides the last.  In other words, generate code that will work as
long as the entity is not modified behind abstraction barriers or
through unexpected aliases::

  var point0 = point
  var x = point0.x
  var point1 = point
  var y = point1.y
  swap(&x, &y)
  point1.y = y        // do not reload point1
  point = point1
  point0 = point      // reload point0
  point0.x = x
  point = point0

Note that, in either solution, you've introduced extra full-value
loads.  This may be quite expensive, and it's not guaranteed to be
semantically equivalent.

Performance
~~~~~~~~~~~

There are three major reasons why full-value accesses are inefficient.

Unnecessary subobject accesses
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The first is that they may load or store more than is necessary.

As an obvious example, imagine a variable of type ``(Int, Int)``; even
if my code only accesses the first element of the tuple, full-value
accesses force me to read or write the second element as well.  That
means that, even if I'm purely overwriting the first element, I
actually have to perform a full-value load first so that I know what
value to use for the second element when performing the full-value
store.

Additionally, while unnecessarily loading the second element of an
``(Int, Int)`` pair might seem trivial, consider that the tuple could
actually have twenty elements, or that the second element might be
non-trivial to copy (e.g. if it's a retainable pointer).

Abstraction barriers
^^^^^^^^^^^^^^^^^^^^

A full-value load or store which you can completely reason about is one
thing, but if it has to be performed as a call, it can be a major
performance drag.

For one, calls do carry a significant amount of low-level overhead.

For another, optimizers must be extremely conservative about what a
call might do.  A retainable pointer might have to be retained and
later released purely to protect against the possibility that a getter
might, somehow, cause the pointer to otherwise be deallocated.

Furthermore, the conventions of the call might restrict performance.
One way or another, a getter for a retainable pointer generally
returns at +1, meaning that as part of the return, it is retained,
forcing the caller to later release.  If the access were instead
direct to memory, this retain might be avoidable, depending on what
the caller does with the pointer.

Copy-on-write
^^^^^^^^^^^^^

These problems are compounded by copy-on-write (COW) types.  In Swift,
a copy-on-write value embeds an object reference.  Copying the value
has low immediate cost, because it simply retains the existing
reference.  However, modifying a value requires the reference to be
made unique, generally by copying the data held by the value into a
fresh object.  I'll call this operation a *structural copy* in an
effort to avoid the more treacherous term "deep copy".

COW types are problematic with full-value accesses for several reasons.

First, COW types are often used to implement aggregates and thus often
have several distinguishable subobjects which users are likely to
think of as independent.  This heightens the dangers of subobject
clobbering.

Second, a full-value load of a COW type implies making the object
reference non-unique.  Changing the value at this point will force a
structural copy.  This means that modifying a temporary copy has
dramatically worse performance compared to modifying the original
entity in-place.  For example::

  window.name += " (closing)"

If ``&window.name`` can be passed directly to the operator, and the
string buffer is uniquely referenced by that string, then this
operation may be as cheap as copying a few characters into the tail of
the buffer.  But if this must be done with a write-back, then the
temporary will never have a unique reference, and there will always
be an unneeded structural copy.

Conservative access patterns
----------------------------

When you know how storage is implemented, it's straightforward to
generate an optimal access to it.  There are several major reasons why
you might not know how a storage declaration is implemented, though:

* It might be an abstract declaration, not a concrete declaration.
  Currently this means a protocol member, but Swift may someday add
  abstract class members.

* It might be a non-final class member, where the implementation you
  can see is potentially overridable by a subclass.

* It might be a resilient declaration, where you know only that the
  entity exists and know nothing statically about its implementation.

In all of these cases, you must generate code that will handle the
worst possible case, which is that the entity is implemented with a
getter and a setter.  Therefore, the conservative access pattern
includes opaque getter and setter functions.

However, for all the reasons discussed above, using unnecessary
full-value accesses can be terrible for performance.  It's really bad
if a little conservatism --- e.g. because Swift failed to devirtualize
a property access --- causes asymptotic inefficiencies.  Therefore,
Swift's native conservative access pattern also includes a third
accessor which permits direct access to storage when possible.  This
accessor is called ``materializeForSet``.

``materializeForSet`` receives an extra argument, which is an
uninitialized buffer of the value type, and it returns a pointer and a
flag.  When it can provide direct access to storage for the entity, it
constructs a pointer to the storage and returns false.  When it can't,
it performs a full-value load into the buffer and returns true.  The
caller performs the modification in-place on the returned pointer and
then, if the flag is true, passes the value to the setter.

The overall effect is to enable direct storage access as a dynamic
optimization when it's impossible as a static optimization.

For now, ``materializeForSet`` is always automatically generated based
on whether the entity is implemented with a computed setter.  It is
possible to imagine data structures that would benefit from having
this lifted to a user-definable feature; for example, a data structure
which sometimes holds its elements in memory but sometimes does not.

``materializeForSet`` can provide direct access whenever an address
for the storage can be derived.  This includes when the storage is
implemented with a ``mutableAddress`` accessor, as covered below.
Observing accessors currently prevent ``materializeForSet`` from
offering direct access; that's fixable for ``didSet`` using a slightly
different code pattern, but ``willSet`` is an inherent obstacle.

Independent of any of the other optimizations discussed in this
whitepaper, ``materializeForSet`` had the potential to immediately
optimize the extremely important case of mutations to COW values in
un-devirtualized class properties, with fairly minimal risk.
Therefore, ``materializeForSet`` was implemented first, and it shipped
in Xcode 6.1.

Direct access at computed addresses
-----------------------------------

What entities can be directly accessed in memory?  Non-computed
variables make up an extremely important set of cases; Swift has
enough built-in knowledge to know that it can provide direct access to
them.  But there are a number of other important cases where the
address of an entity is not built-in to the compiler, but where direct
access is nonetheless possible.  For example, elements of a simple
array always have independent storage in memory.  Most benchmarks on
arrays would profit from being able to modify array elements in-place.

There's a long chain of proposals in this area, many of which are
refinement on previous proposals.  None of these proposals has yet
shipped in Xcode.

Addressors
~~~~~~~~~~

For something like a simple array (or any similar structure, like a
deque) which is always backed by a buffer, it makes sense for the
implementor to simply define accessors which return the address of
the element.  Such accessors are called *addressors*, and there are
two: ``address`` and ``mutableAddress``.

The conservative access pattern can be generated very easily from
this: the getter calls ``address`` and loads from it, the setter calls
``mutableAddress`` and stores to it, and ``materializeForSet``
provides direct access to the address returned from
``mutableAddress``.

If the entity has type ``T``, then ``address`` returns an
``UnsafePointer<T>`` and ``mutableAddress`` returns an
``UnsafeMutablePointer<T>``.  This means that the formal type of the
entity must exactly match the formal type of the storage.  Thus, the
standard subscript on ``Dictionary<K,V>`` cannot be implemented using
addressors, because the formal type of the entity is ``V?``, but the
backing storage holds a ``V``.  (And this is in keeping with user
expectations about the data structure: assigning ``nil`` at a key is
supposed to erase any existing entry there, not create a new entry to
hold ``nil``.)

This simple addressor proposal was the first prong of our efforts to
optimize array element access.  Unfortunately, while it is useful for
several other types (such as ``ContiguousArray`` and
``UnsafeMutablePointer``), it is not flexible enough for the ``Array``
type.

Mixed addressors
~~~~~~~~~~~~~~~~

Swift's chief ``Array`` type is only a simple array when it is not
interacting with Objective-C.  Type bridging requires ``Array`` to be
able to store an immutable ``NSArray`` instance, and the ``NSArray``
interface does not expose the details of how it stores elements.  An
``NSArray`` is even permitted to dynamically generate its values in
its ``objectAtIndex:`` method.  And it would be absurd for ``Array``
to perform a structural copy during a load just to make non-mutating
accesses more efficient!  So the load access pattern for ``Array``'s
subscript declaration must use a getter.

Fortunately, this requirement does not preclude using an addressor for
mutating accesses.  Mutations to ``Array`` always transition the array
to a unique contiguous buffer representation as their first step.
This means that the subscript operator can sensibly return an address
when it's used for the purposes of mutation: in other words, exactly
when ``mutableAddress`` would be invoked.

Therefore, the second prong of our efforts to optimize array element
access was to allow entities to be implemented with the combination of
a ``get`` accessor and a ``mutableAddress`` accessor.  This is
straightforward in the user model, where it simply means lifting a
restriction.  It's more complex behind the scenes because it broke
what was previously a clean conceptual division between "physical" and
"logical" l-values.

Mixed addressors have now been adopted by ``Array`` to great success.
As expected, they substantially improved performance mutating COW
array elements.  But they also fix an important instance of subobject
clobbering, because modifications to different subobjects (notably,
different elements of the same array) can occur simultaneously by
simply projecting out their addresses in the unique buffer.  For
example, this means that it's possible to simply swap two elements
of an array directly::

  swap(&array[i], &array[j])

  // Expanded:
  array.transitionToUniquelyReferenced()
  let address_i = array.buffer.storage + i
  array.transitionToUniquelyReferenced()
  let address_j = array.buffer.storage + j
  swap(address_i, address_j)

Mixed addressors weren't completely implemented until very close to
the Xcode 6.1 deadline, and they changed code-generation patterns
enough to break a number of important array-specific optimizations.
Therefore, the team sensibly decided that they were too risky for that
release, and that there wasn't enough benefit from other applications
to justify including any of the addressor work.

In a way, that was a fortunate decision, because the naive version of
addressors implemented so far in Swift creates a safety hole which
would otherwise have been exposed to users.

Memory unsafety of addressors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The semantics and memory safety of operations on COW types rely on a
pair of simple rules:

* A non-mutating operation must own a reference to the buffer for
  the full course of the read.

* A mutating operation must own a unique reference to the buffer
  for the full course of the mutation.

Both rules tend to be naturally satisfied by the way that operations
are organized into methods.  A value must own a reference to its
buffer at the moment that a method is invoked on it.  A mutating
operation immediately transitions the buffer to a unique reference,
performing a structural copy if necessary.  This reference will remain
valid for the rest of the method as long as the method is *atomic*: as
long as it does not synchronously invoke arbitrary user code.

(This is a single-threaded notion of atomicity.  A second thread which
modifies the value simultaneously can clearly invalidate the
assumption.  But that would necessarily be a data race, and the
language design team is willing to say that such races have fully
undefined behavior, and arbitrary consequences like memory corruption
and crashes are acceptable in their wake.)

However, addressors are not atomic in this way: they return an address
to the caller, which may then interleave arbitrary code before
completing the operation.  This can present the opportunity for
corruption if the interleaved code modifies the original value.
Consider the following code::

  func operate(value: inout Int, count: Int) { ... }

  var array: [Int] = [1,2,3,4]
  operate(&array[0], { array = []; return 0 }())

The dynamic sequence of operations performed here will expand like so::

  var array: [Int] = [1,2,3,4]
  let address = array.subscript.mutableAddress(0)
  array = []
  operate(address, 0)

The assignment to ``array`` within the closure will release the buffer
containing ``address``, thus passing ``operate`` a dangling pointer.

Nor can this be fixed with a purely local analysis; consider::

  class C { var array: [Int] }
  let global_C = C()

  func assign(value: inout Int) {
    C.array = []
    value = 0
  }

  assign(&global_C.array[0])

Fixing the memory safety hole
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Conceptually, the correct fix is to guarantee that the rules are
satisfied by ensuring that the buffer is retained for the duration of
the operation.  Any interleaving modifications will then see a
non-uniquely-referenced buffer and perform a structural copy::

  // Project the array element.
  let address = array.subscript.mutableAddress(0)

  // Remember the new buffer value and keep it retained.
  let newArrayBuffer = array.buffer
  retain(newArrayBuffer)

  // Reassign the variable.
  release(array.buffer)
  array.buffer = ...

  // Perform the mutation.  These changes will be silently lost, but
  // they at least won't be using deallocated memory.
  operate(address, 0)

  // Release the "new" buffer.
  release(newArrayBuffer)

Note that this still leaves a semantic hole if the original value is
copied in interleaving code before the modification, because the
subsequent modification will be reflected in the copy::

  // Project the array element.
  let address = array.subscript.mutableAddress(0)

  // Remember the new buffer value and keep it retained.
  let newArrayBuffer = array.buffer
  retain(newArrayBuffer)

  // Copy the value.  Note that arrayCopy uses the same buffer that
  // 'address' points into.
  let arrayCopy = array
  retain(arrayCopy.buffer)

  // Perform the mutation.
  operate(address, 0)

  // Release the "new" buffer.
  release(newArrayBuffer)

This might be unexpected behavior, but the language team is willing to
accept unexpected behavior for this code.  What's non-negotiable is
breaking memory safety.

Unfortunately, applying this fix naively reintroduces the problem of
subobject clobbering: since a modification of one subobject
immediately retains a buffer that's global to the entire value, an
interleaved modification of a different subobject will see a
non-unique buffer reference and therefore perform a structural copy.
The modifications to the first subobject will therefore be silently
lost.

Unlike the interleaving copy case, this is seen as unacceptable.
Notably, it breaks swapping two array elements::

  // Original:
  swap(&array[i], &array[j])

  // Expanded:

  // Project array[i].
  array.transitionToUniquelyReferenced()
  let address_i = array.buffer.storage + i
  let newArrayBuffer_i = array.buffer
  retain(newArrayBuffer_i)

  // Project array[j].  Note that this transition is guaranteed
  // to have to do a structural copy.
  array.transitionToUniquelyReferenced()
  let address_j = array.buffer.storage + j
  let newArrayBuffer_j = array.buffer
  retain(newArrayBuffer_j)

  // Perform the mutations.
  swap(address_i, address_j)

  // Balance out the retains.
  release(newArrayBuffer_j)
  release(newArrayBuffer_i)

get- and setForMutation
~~~~~~~~~~~~~~~~~~~~~~~

Some collections need finer-grained control over the entire mutation
process. For instance, to support divide-and-conquer algorithms using
slices, sliceable collections must "pin" and "unpin" their buffers
while a slice is being mutated to grant permission for the slice
to mutate the collection in-place while sharing ownership. This
flexibility can be exposed by a pair of accessors that are called
before and after a mutation. The "get" stage produces both the
value to mutate and a state value (whose type must be declared) to
forward to the "set" stage. A pinning accessor can then look something
like this::

  extension Array {
    subscript(range: Range<Int>) -> Slice<Element> {
      // `getForMutation` must declare its return value, a pair of both
      // the value to mutate and a state value that is passed to
      // `setForMutation`.
      getForMutation() -> (Slice<Element>, PinToken) {
        let slice = _makeSlice(range)
        let pinToken = _pin()
        return (slice, pinToken)
      }

      // `setForMutation` receives two arguments--the result of the
      // mutation to write back, and the state value returned by
      // `getForMutation`.
      setForMutation(slice, pinToken) {
        _unpin(pinToken)
        _writeSlice(slice, backToRange: range)
      }
    }
  }

``getForMutation`` and ``setForMutation`` must appear as a pair;
neither one is valid on its own. A ``get`` and ``set`` accessor
should also still be provided for simple read and write operations.
When the compiler has visibility that storage is implemented in
terms of ``getForMutation`` and ``setForMutation``, it lowers a mutable
projection using those accessors as follows::

  // A mutation like this (assume `reverse` is a mutating method):
  array[0...99].reverse()
  // Decomposes to:
  let index = 0...99
  (var slice, let state) = array.`subscript.getForMutation`(index)
  slice.reverse()
  array.`subscript.setForMutation`(index, slice, state)

To support the conservative access pattern,
a `materializeForSet` accessor can be generated from `getForMutation`
and `setForMutation` in an obvious fashion: perform `getForMutation`
and store the state result in its scratch space, and return a
callback that loads the state and hands it off to `setForMutation`.

The beacon of hope for a user-friendly future: Inversion of control
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Addressors and ``{get,set}ForMutation`` expose important optimizations
to the standard library, but are undeniably fiddly and unsafe constructs
to expose to users. A more natural model would be to
recognize that a compound mutation is a composition of nested scopes, and
express it in the language that way. A strawman model might look something
like this::

  var foo: T {
    get { return getValue() }
    set { setValue(newValue) }

    // Perform a full in-out mutation. The `next` continuation is of
    // type `(inout T) -> ()` and must be called exactly once
    // with the value to hand off to the nested mutation operation.
    mutate(next) {
      var value = getValue()
      next(&value)
      setValue(value)
    }
  }

This presents a natural model for expressing the lifetime extension concerns
of addressors, and the state maintenance necessary for pinning ``getForMutation``
accessors::

  // An addressing mutator
  mutate(next) {
    withUnsafePointer(&resource) {
      next(&$0.memory)
    }
  }

  // A pinning mutator
  mutate(next) {
    var slice = makeSlice()
    let token = pin()
    next(&slice)
    unpin(token)
    writeBackSlice(slice)
  }

For various semantic and implementation efficiency reasons, we don't want to
literally implement every access as a nesting of closures like this. Doing so
would allow for semantic surprises (a mutate() operation never invoking its
continuation, or doing so multiple times would be disastrous), and would
interfere with the ability for `inout` and `mutating` functions to throw or
otherwise nonlocally exit. However, we could present this model using
*inversion of control*, similar to Python generators or async-await.
A `mutate` operation could `yield` the `inout` reference to its inner value,
and the compiler could enforce that a `yield` occurs exactly once on every
control flow path::

  // An addressing, yielding mutator
  mutate {
    withUnsafePointer(&resource) {
      yield &$0.memory
    }
  }

  // A pinning mutator
  mutate {
    var slice = makeSlice()
    let token = pin()
    yield &slice
    unpin(token)
    writeBackSlice(slice)
  }

This obviously requires more implementation infrastructure than we currently
have, and raises language and library design issues (in particular,
lifetime-extending combinators like ``withUnsafePointer`` would need either
a ``reyields`` kind of decoration, or to become macros), but represents a
promising path toward exposing the full power of the accessor model to
users in an elegant way.

Acceptability
-------------

This whitepaper has mentioned several times that the language team is
prepared to accept such-and-such behavior but not prepared to accept
some other kind of behavior.  Clearly, there is a policy at work.
What is it?

General philosophy
~~~~~~~~~~~~~~~~~~

For any given language problem, a perfect solution would be one which:

* guarantees that all operations complete without crashing or
  corrupting the program state,

* guarantees that all operations produce results according to
  consistent, reliable, and intuitive rules,

* does not limit or require complex interactions with the remainder
  of the language, and

* imposes no performance cost.

These goals are, however, not all simultaneously achievable, and
different languages reach different balances.  Swift's particular
philosophy is as follows:

* The language should be as dynamically safe as possible.
  Straightforward uses of ordinary language features may cause
  dynamic failure, but the should never corrupt the program state.
  Any unsafe language or library features (other than simply calling
  into C code) should be explicitly labeled as unsafe.

  A dynamic failure should mean that the program reliably halts,
  ideally with a message clearly describing the source of the
  failure.  In the future, the language may allow for emergency
  recovery from such failures.

* The language should sit on top of C, relying only on a relatively
  unobtrusive runtime.  Accordingly, the language's interactions
  with C-based technologies should be efficient and obvious.

* The language should allow a static compiler to produce efficient
  code without dynamic instrumentation.  Accordingly, static
  analysis should only be blocked by incomplete information when
  the code uses an obviously abstract language feature (such as
  calling a class method or an unknown function), and the language
  should provide tools to allow programmers to limit such cases.

  (Dynamic instrumentation can, of course, still help, but it
  shouldn't be required for excellent performance.)

General solutions
~~~~~~~~~~~~~~~~~

A language generally has six tools for dealing with code it considers
undesirable.  Some of this terminology is taken from existing
standards, others not.

* The language may nonetheless take steps to ensure that the code
  executes with a reliable result.  Such code is said to have
  *guaranteed behavior*.

* The language may report the code as erroneous before it executes.
  Such code is said to be *ill formed*.

* The language may reliably report the code as having performed an
  illegal operation when it executes.  Such code is said to be
  *asserting* or *aborting*.

* The language may allow the code to produce an arbitrary-but-sound
  result.  Such code is said to have *unspecified behavior* or to
  have produced an *unspecified value*.

* The language may allow the code to produce an unsound result which
  will result in another of these behaviors, but only if used.
  Such code is said to have produced a *trap value*.

* The language may declare the code to be completely outside of the
  guarantees of the language.  Such code is said to have
  *undefined behavior*.

In keeping with its design philosophy, Swift has generally limited
itself to the first four solutions, with two significant exceptions.

The first exception is that Swift provides several explicitly unsafe
language and library features, such as ``UnsafePointer<T>`` and
``unowned(unsafe)``.  The use of these features is generally subject
to undefined behavior rules.

The second exception is that Swift does not make any guarantees about
programs in the presence of race conditions.  It is extremely
difficult to make even weak statements about the behavior of a program
with a race condition without either:

* heavily restricting shared mutable state on a language level, which
  would require invasive changes to how the language interacts with C;

* forcing implicit synchronization when making any change to
  potentially shared memory, which would cripple performance and
  greatly complicate library implementation; or

* using a garbage collector to manage all accessible memory, which
  would impose a very large burden on almost all of Swift's language
  goals.

Therefore, Swift does surrender safety in the presence of races.

Acceptability conditions for storage accesses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Storage access involves a tension between four goals:

* Preserving all changes when making simultaneous modifications to
  distinct subobjects; in other words, avoiding subobject clobbering

* Performing a predictable and intuitive sequence of operations when
  modifying storage that's implemented with a getter and setter

* Avoiding unnecessary copies of a value during a modification,
  especially when this forces a structural copy of a COW value

* Avoiding memory safety holes when accessing storage that's been
  implemented with memory.

Reprojection_ is good at preserving changes, but it introduces extra
copies, and it's less intuitive about how many times getters and
setters will be called.  There may be a place for it anyway, if we're
willing to accept the extra conceptual complexity for computed
storage, but it's not a reasonable primary basis for optimizing the
performance of storage backed by memory.

Solutions permitting in-place modification are more efficient, but
they do have the inherent disadvantage of having to vend the address
of a value before arbitrary interleaving code.  Even if the address
remains valid, and the solution to that avoids subobject clobbering,
there's an unavoidable issue that the write can be lost because the
address became dissociated from the storage.  For example, if your
code passes ``&array[i]`` to a function, you might plausibly argue
that changes to that argument should show up in the ``i``\ th element of
``array`` even if you completely reassign ``array``.  Reprojection_
could make this work, but in-place solutions cannot efficiently do so.
So, for any in-place solution to be acceptable, there does need to be
some rule specifying when it's okay to "lose track" of a change.

Furthermore, the basic behavior of COW means that it's possible to
copy an array with an element under modification and end up sharing
the same buffer, so that the modification will be reflected in a value
that was technically copied beforehand.  For example::

  var array = [1,2,3]
  var oldArray : [Int] = []

  // This function copies array before modifying it, but because that
  // copy is of a value undergoing modification, the copy will use
  // the same buffer and therefore observe updates to the element.
  func foo(element: inout Int) {
    oldArray = array
    element = 4
  }

  // Therefore, oldArray[2] will be 4 after this call.
  foo(&array[2])

Nor can this be fixed by temporarily moving the modified array aside,
because that would prevent simultaneous modifications to different
elements (and, in fact, likely cause them to assert).  So the rule
will also have to allow this.

However, both of these possibilities already come up in the design of
both the library and the optimizer.  The optimizer makes a number of
assumptions about aliasing; for example, the general rule is that
storage bound to an ``inout`` parameter cannot be accessed through
other paths, and while the optimizer is not permitted to compromise
memory safety, it is permitted to introduce exactly this kind of
unexpected behavior where aliasing accesses may or may not the storage
as a consistent entity.

Formal accesses
^^^^^^^^^^^^^^^

That rule leads to an interesting generalization.  Every modification
of storage occurs during a *formal access* (FA) to that storage.  An
FA is also associated with zero or more *designated storage names*
(DSNs), which are ``inout`` arguments in particular execution records.
An FA arises from an l-value expression, and its duration and DSN set
depend on how the l-value is used:

* An l-value which is simply loaded from creates an instantaneous FA
  at the time of the load.  The DSN set is empty.

  Example::

    foo(array)
    // instantaneous FA reading array

* An l-value which is assigned to with ``=`` creates an
  instantaneous FA at the time of the primitive assignment.  The DSN
  set is empty.

  Example::

    array = []
    // instantaneous FA assigning array

  Note that the primitive assignment strictly follows the evaluation
  of both the l-value and r-value expressions of the assignment.
  For example, the following code::

    // object is a variable of class type
    object.array = object.array + [1,2,3]

  produces this sequence of formal accesses::

    // instantaneous FA reading object (in the left-hand side)
    // instantaneous FA reading object (in the right-hand side)
    // instantaneous FA reading object.array (in the right-hand side)
    // evaluation of [1,2,3]
    // evaluation of +
    // instantaneous FA assigning object.array

* An l-value which is passed as an ``inout`` argument to a call
  creates an FA beginning immediately before the call and ending
  immediately after the call.  (This includes calls where an
  argument is implicitly passed ``inout``, such as to mutating
  methods or user-defined assignment operators such as ``+=`` or
  ``++``.) The DSN set contains the ``inout`` argument within the
  call.

  Example::

    func swap<T>(lhs: inout T, rhs: inout T) {}

    // object is a variable of class type
    swap(&leftObject.array, &rightObject.array)

  This results in the following sequence of formal accesses::

    // instantaneous FA reading leftObject
    // instantaneous FA reading rightObject
    // begin FA for inout argument leftObject.array (DSN={lhs})
    // begin FA for inout argument rightObject.array (DSN={rhs})
    // evaluation of swap
    // end FA for inout argument rightObject.array
    // end FA for inout argument leftObject.array

* An l-value which is used as the base of a member storage access
  begins an FA whose duration is the same as the duration of the FA
  for the subobject l-value.  The DSN set is empty.

  Example::

    swap(&leftObject.array[i], &rightObject.array[j])

  This results in the following sequence of formal accesses::

    // instantaneous FA reading leftObject
    // instantaneous FA reading i
    // instantaneous FA reading rightObject
    // instantaneous FA reading j
    // begin FA for subobject base leftObject.array (DSN={})
    // begin FA for inout argument leftObject.array[i] (DSN={lhs})
    // begin FA for subobject base rightObject.array (DSN={})
    // begin FA for inout argument rightObject.array[j] (DSN={rhs})
    // evaluation of swap
    // end FA for subobject base rightObject.array[j]
    // end FA for inout argument rightObject.array
    // end FA for subobject base leftObject.array[i]
    // end FA for inout argument leftObject.array

* An l-value which is the base of an ! operator begins an FA whose
  duration is the same the duration of the FA for the resulting
  l-value.  The DSN set is empty.

  Example::

    // left is a variable of type T
    // right is a variable of type T?
    swap(&left, &right!)

  This results in the following sequence of formal accesses::

    // begin FA for inout argument left (DSN={lhs})
    // begin FA for ! operand right (DSN={})
    // begin FA for inout argument right! (DSN={rhs})
    // evaluation of swap
    // end FA for inout argument right!
    // end FA for ! operand right
    // end FA for inout argument left

* An l-value which is the base of a ? operator begins an FA whose
  duration begins during the formal evaluation of the l-value
  and ends either immediately (if the operand was nil) or at the
  end of the duration of the FA for the resulting l-value.
  In either case, the DSN set is empty.

  Example::

    // left is a variable of optional struct type
    // right is a variable of type Int
    left?.member += right

  This results in the following sequence of formal accesses, assuming
  that ``left`` contains a value::

    // begin FA for ? operand left (DSN={})
    // instantaneous FA reading right (DSN={})
    // begin FA for inout argument left?.member (DSN={lhs})
    // evaluation of +=
    // end FA for inout argument left?.member
    // end FA for ? operand left

The FAs for all ``inout`` arguments to a call begin simultaneously at
a point strictly following the evaluation of all the argument
expressions.  For example, in the call ``foo(&array, array)``, the
evaluation of the second argument produces a defined value, because
the FA for the first argument does not begin until after all the
arguments are formally evaluated.  No code should actually be emitted
during the formal evaluation of ``&array``, but for an expression like
``someClassReference.someArray[i]``, the class r-value and index
expressions would be fully evaluated at that time, and then the
l-value would be kept abstract until the FA begins.  Note that this
requires changes in SILGen's current code generation patterns.

The FA rule for the chaining operator ``?`` is an exception to the
otherwise-simple intuition that formal accesses begin immediately
before the modification begins.  This is necessary because the
evaluation rules for ``?`` may cause arbitrary computation to be
short-circuited, and therefore the operand must be accessed during the
formal evaluation of the l-value.  There were three options here:

* Abandon short-circuiting for assignments to optional l-values.  This
  is a very high price; short-circuiting fits into user intuitions
  about the behavior of the chaining operator, and it can actually be
  quite awkward to replicate with explicit accesses.

* Short-circuit using an instantaneous formal access, then start a
  separate formal access before the actual modification.  In other
  words, evaluation of ``X += Y`` would proceed by first determining
  whether ``X`` exists (capturing the results of any r-value
  components), discarding any projection information derived from
  that, evaluating ``Y``, reprojecting ``X`` again (using the saved
  r-value components and checking again for whether the l-value
  exists), and finally calling the ``+=`` operator.

  If ``X`` involves any sort of computed storage, the steps required
  to evaluate this might be... counter-intuitive.

* Allow the formal access to begin during the formal evaluation of the
  l-value. This means that code like the following will have
  unspecified behavior::

    array[i]?.member = deriveNewValueFrom(array)

In the end, I've gone with the third option.  The intuitive
explanation is that ``array`` has to be accessed early in order to
continue the evaluation of the l-value.  I think that's
comprehensible to users, even if it's not immediately obvious.

Disjoint and non-disjoint formal accesses
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I'm almost ready to state the core rule about formal accesses, but
first I need to build up a few more definitions.

An *abstract storage location* (ASL) is:

* a global variable declaration;

* an ``inout`` parameter declaration, along with a reference
  to a specific execution record for that function;

* a local variable declaration, along with a reference to a
  specific execution record for that declaration statement;

* a static/class property declaration, along with a type having
  that property;

* a struct/enum instance property declaration, along with an
  ASL for the base;

* a struct/enum subscript declaration, along with a concrete index
  value and an ASL for the base;

* a class instance property declaration, along with an instance of
  that class; or

* a class instance subscript declaration, along with a concrete
  index value and an instance of that class.

Two abstract storage locations may be said to *overlap*.  Overlap
corresponds to the imprecise intuition that a modification of one
location directly alters the value of another location.  Overlap
is an "open" property of the language: every new declaration may
introduce its own overlap behavior.  However, the language and
library make certain assertions about the overlap of some locations:

* An ``inout`` parameter declaration overlaps exactly the set
  of ASLs overlapped by the ASL which was passed as an argument.

* If two ASLs are both implemented with memory, then they overlap
  only if they have the same kind in the above list and the
  corresponding data match:

    * execution records must represent the same execution
    * types must be the same
    * class instances must be the same
    * ASLs must overlap

* For the purposes of the above rule, the subscript of a standard
  library array type is implemented with memory, and the two
  indexes match if they have the same integer value.

* For the purposes of the above rule, the subscript of a standard
  library dictionary type is implemented with memory, and the two
  indexes match if they compare equal with ``==``.

Because this definition is open, it is impossible to completely
statically or dynamically decided it.  However, it would still be
possible to write a dynamic analysis which decided it for common
location kinds.  Such a tool would be useful as part of, say, an
ASan-like dynamic tool to diagnose violations of the
unspecified-behavior rule below.

The overlap rule is vague about computed storage partly because
computed storage can have non-obvious aliasing behavior and partly
because the subobject clobbering caused by the full-value accesses
required by computed storage can introduce unexpected results that can
be reasonably glossed as unspecified values.

This notion of abstract storage location overlap can be applied to
formal accesses as well.  Two FAs ``x`` and ``y`` are said to be
*disjoint* if:

* they refer to non-overlapping abstract storage locations or

* they are the base FAs of two disjoint member storage accesses
  ``x.a`` and ``y.b``.

Given these definitions, the core unspecified-behavior rule is:

  If two non-disjoint FAs have intersecting durations, and neither FA
  is derived from a DSN for the other, then the program has
  unspecified behavior in the following way: if the second FA is a
  load, it yields an unspecified value; otherwise, both FAs store an
  unspecified value in the storage.

Note that you cannot have two loads with intersecting durations,
because the FAs for loads are instantaneous.

Non-overlapping subobject accesses make the base accesses disjoint
because otherwise code like ``swap(&a[0], &a[1])`` would have
unspecified behavior, because the two base FAs are to clearly
overlapping locations and have intersecting durations.

Note that the optimizer's aliasing rule falls out from this rule.  If
storage has been bound as an ``inout`` argument, accesses to it
through any path not derived from the ``inout`` argument will start a
new FA for overlapping storage, the duration of which will necessarily
intersect duration with that of the FA through which the ``inout``
argument was bound, causing unspecified behavior.  If the ``inout``
argument is forwarded to another call, that will start a new FA which
is validly based on a DSN of the first; but an attempt to modify the
storage through the first ``inout`` argument while the second call is
active will create a third FA not based on the DSN from the second
``inout`` call, causing a conflict there.  Therefore a function may
assume that it can see all accesses to the storage bound to an
``inout`` argument.

If you didn't catch all that...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

That may have been a somewhat intense description, so here's a simple
summary of the rule being proposed.

If storage is passed to an ``inout`` argument, then any other
simultaneous attempt to read or write to that storage, including to
the storage containing it, will have unspecified behavior.  Reads
from it may see partially-updated values, or even values which will
change as modifications are made to the original storage; and writes
may be clobbered or simply disappear.

But this only applies during the call with the ``inout`` argument: the
evaluation of other arguments to the call will not be interfered with,
and as soon as the call ends, all these modifications will resolve
back to a quiescent state.

And this unspecified behavior has limits.  The storage may end up with
an unexpected value, with only a subset of the writes made to it, and
copies from it may unexpectedly reflect modifications made after they
were copied.  However, the program will otherwise remain in a
consistent and uncorrupted state.  This means that execution will be
able to continue apace as long as these unexpected values don't trip
up some higher-level invariant.

Tracking formal accesses
------------------------

Okay, now that I've analyzed this to death, it's time to make a
concrete proposal about the implementation.

As discussed above, the safety hole with addressors can be fixed by
always retaining the buffer which keeps the address valid.  Assuming
that other uses of the buffer follow the general copy-on-write
pattern, this retain will prevent structural changes to the buffer
while the address is in use.

But, as I also discussed above, this introduces two problems:

Copies during modification
~~~~~~~~~~~~~~~~~~~~~~~~~~

Copying a COW aggregate value always shares the same buffer that was
stored there at the time of the copy; there is no uniqueness check
done as part of the copy.  Changes to subobjects will then be
instantly reflected in the "copy" as they are made to the original.
The structure of the copy will stay the same, but the values of
its subobjects will appear to spontaneously change.

I want to say that this behavior is acceptable according to the
formal-access rule I laid out above.  How does that reasoning work?

First, I need to establish what kind of behavior is at work here.  It
clearly isn't guaranteed behavior: copies of COW values are normally
expected to be independent.  The code wasn't rejected by the compiler,
nor did it dynamically assert; it simply seems to misbehave.  But
there are limits to the misbehavior:

* By general COW rules, there's no way to change the structure of an
  existing buffer unless the retain count is 1.  For the purposes of
  this analysis, that means that, as long as the retain count is
  above 1, there's no way to invalidate the address returned by the
  addressor.

* The buffer will be retained for as long as the returned address
  is being modified.  This retain is independent of any storage
  which might hold the aggregate value (and thus also retain the buffer).

* Because of this retain, the only way for the retain count to drop
  to 1 is for no storage to continue to refer to the buffer.

* But if no storage refers to the buffer, there is no way to
  initiate an operation which would change the buffer structure.

Thus the address will remain valid, and there's no danger of memory
corruption.  The only thing is that the program no longer makes useful
guarantees about the value of the copied aggregate.  In other words,
the copy yielded an unspecified value.

The formal-access rule allows loads from storage to yield an
unspecified value if there's another formal access to that storage in
play and the load is (1) not from an l-value derived from a name in
the other FA's DSN set and (2) not from a non-overlapping subobject.
Are these conditions true?

Recall that an addressor is invoked for an l-value of the form::

  base.pointee

or::

  base[i]

Both cases involve a formal access to the storage ``base`` as the base
of a subobject formal access.  This kind of formal access always has
an empty DSN set, regardless of how the subobject is used.  A COW
mutable addressor will always ensure that the buffer is uniquely
referenced before returning, so the only way that a value containing
that buffer can be copied is if the load is a non-subobject access to
``base``.  Therefore, there are two simultaneous formal accesses to
the same storage, and the load is not from an l-value derived from the
modification's DSN set (which is empty), nor is it for a
non-overlapping subobject.  So the formal-access rule applies, and
an unspecified value is an acceptable result.

The implementation requirement here, then, is simply that the
addressor must be called, and the buffer retained, within the duration
of the formal access.  In other words, the addressor must only
be called immediately prior to the call, rather than at the time
of the formal evaluation of the l-value expression.

What would happen if there *were* a simultaneous load from a
non-overlapping subobject?  Accessing the subobject might cause a
brief copy of ``base``, but only for the duration of copying the
subobject.  If the subobject does not overlap the subobject which was
projected out for the addressor, then this is harmless, because the
addressor will not allow modifications to those subobjects; there
might be other simultaneous formal accesses which do conflict, but
these two do not.  If the subobject does overlap, then a recursive
analysis must be applied; but note that the exception to the
formal-access rule will only apply if non-overlapping subobjects were
projected out from *both* formal accesses.  Otherwise, it will be
acceptable for the access to the overlapping subobject to yield an
unspecified value.

Avoiding subobject clobbering during parallel modification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The other problem is that the retain will prevent simultaneous changes
to the same buffer.  The second change will cause a structural copy,
and the first address will end up modifying a buffer which is no
longer referenced: in other words, the program will observe subobject
clobbering.  A similar analysis to the one from the last section
suggests that this can be described as unspecified behavior.

Unfortunately, this unspecified behavior is unwanted: it violates the
guarantees of the formal-access rule as I laid it out above, because
it occurs even if you have formal accesses to two non-overlapping
subobjects.  So something does need to be done here.

One simple answer is to dynamically track whether a COW buffer is
currently undergoing a non-structural mutation.  I'll call this *NSM
tracking*, and I'll call buffers which are undergoing non-structural
mutations *NSM-active*.

The general rules of COW say that mutating operations must ensure that
their buffer is uniquely referenced before performing the
modification.  NSM tracking works by having non-structural mutations
perform a weaker check: the buffer must be either uniquely referenced
or be NSM-active.  If the non-structural mutation allows arbitrary
code to run between the start of the mutation and the end --- as an
addressor does --- it must both retain the buffer and flag it as
NSM-active for the entire duration.

Because the retain still occurs, and because any *structural* changes
to the buffer that might invalidate the addresses of subobjects are
still blocked by that retain, all of the earlier analysis about the
memory safety of simultaneous accesses still applies.  The only change
is that simultaneous non-structural modifications, as would be created
by simultaneous formal accesses to subobjects, will now be able to
occur on a single buffer.

A set of simultaneous formal accesses on a single thread follows a
natural stack protocol, or can be made to do so with straightforward
SILGen and SIL optimizer consideration.  Therefore, the runtime can
track whether a buffer is NSM-active on a thread using a single bit,
which nested modifications can be told not to clear.  Call this the
*NSM bit*.  Ignoring multithreading considerations for a moment, since
the NSM bit is only ever set at the same as a retain and only ever
cleared at the same time as a release, it makes sense to pack this
into the strong reference count.  There is no need to support this
operation on non-Swift objects.  The runtime should provide three new
functions:

* A function to test whether an object is either uniquely referenced
  or NSM-active.  Call this ``swift_isUniquelyReferencedForNSM``.

* A function to perform the above test and, if the test passes and
  the NSM bit is not set, atomically retain the object and set
  the NSM bit.  It should return both the result of the test and an
  object to later set as NSM-inactive.  That object will be nil if
  the test failed or the NSM bit was already set.  Call this
  ``swift_tryRetainForNSM``.

* A function to atomically clear the NSM bit and release the object.
  Call this ``swift_releaseForNSM``.

These operations should also be reflected in SIL.

Concurrent modifications and the non-structural modification bit
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

What about concurrency?  Two concurrent non-structural modifications
could race to set the NSM bit, and then the winning thread could clear
it before the other thread's modification is complete.  This could
cause memory-unsafe behavior, since the losing thread would be
modifying the object through an address while not retaining the value.

The major question here is whether this is a significant objection.
It's accepted that race conditions have undefined behavior.  Is such
code inherently racy?

The answer appears to be "no", and that it is possible to write code
which concurrently writes to existing non-overlapping elements of a
COW aggregate without causing races; but that such code is extremely
fraught, and moreover it is extremely fraught regardless of whether
NSM-activeness is tracked with a single bit or a wider count.  Consider:

* If the shared aggregate value is ever non-uniquely referenced, two
  threads concurrently modifying it will race to unique the array.
  This unavoidably has undefined behavior, because uniquing the
  array requires the previous value to eventually be released, and a
  race may cause an over-release.

* Assume that it's possible to guarantee that the aggregate value's
  buffer is uniquely referenced before any threads concurrently
  access it.  Now, all of the threads are performing different
  concurrent accesses.

  * If any of the accesses is a structural modification, there will
    be a race to re-unique the buffer.

  * If all of the accesses are non-structural modifications, then
    there will be no races as long as the retain-and-set and
    release-and-clear operations are atomic: when starting any
    particular operation, the buffer will always either be uniquely
    referenced or have the bit set.

  * If any of the accesses is a read, and that read does not occur
    during a non-structural modification, then the buffer may
    briefly become non-uniquely referenced and there will be a
    race from concurrent modifications to re-unique it.

  * If any of the accesses is a read, and that read occurs during a
    non-structural modification, and the optimizer does not re-order
    the read's retain/release around the retainForNSM/releaseForNSM
    operations, then it matters how NSM-activeness is tracked.

    If there is complete tracking (i.e. a count, not just a single
    bit), the retain for the read will only occur while the buffer
    is flagged as NSM-active, and so it will have no effect.

    If there is incomplete tracking (i.e. just a single NSM bit),
    then there is a potential for undefined behavior.  Suppose two
    threads race to set the NSM bit.  The loser then initiates a
    read and retains the buffer.  Before the loser releases the
    buffer, the winner clears the NSM bit.  Now another thread might
    see that the buffer is non-uniquely referenced and not
    NSM-active, and so it will attempt to unique the buffer.

    It is probably unreasonable to require the optimizer to never
    reorder ordinary retains and releases past retainForNSM and
    releaseForNSM operations.

More importantly, the use case here (many threads concurrently
accessing different elements of a shared data structure) just
inherently doesn't really work well with a COW data structure.  Even
if the library were able to make enough guarantees to ensure that,
with the right pattern of accesses, there would never be a structural
copy of the aggregate, it would still be extremely inefficient,
because all of the threads would be competing for atomic access to the
strong reference count.

In short, I think it's reasonable for the library to say that programs
which want to do this should always use a type with reference
semantics.  Therefore, it's reasonable to ignore concurrent accesses
when deciding how to best track whether an aggregate is undergoing
non-structural modification.  This removes the only objection I
can see to tracking this with a single NSM bit.

Code generation patterns
~~~~~~~~~~~~~~~~~~~~~~~~

The signatures and access patterns for addressors will need to change
in order to ensure memory-safety.

``mutableAddress`` currently returns an ``UnsafeMutablePointer``; it
will need to return ``(Builtin.NativeObject?, UnsafeMutablePointer)``.
The owner pointer must be a native object; we cannot efficiently
support either uniqueness checking or the NSM bit on non-Swift
objects.  SILGen will mark that the address depends on the owner
reference and push a cleanup to ``releaseForNSM`` it.

``address`` currently returns an ``UnsafePointer``; it will need to
return ``(Builtin.NativeObject?, UnsafePointer)``.  I do not currently
see a reason to allow non-Swift owners, but the model doesn't depend
on that.  SILGen will mark that the address depends on the owner
reference and push a cleanup to ``release`` it.

In order to support ultimately calling an addressor in the
conservative access path, ``materializeForSet`` must also return an
owner reference.  Since ``materializeForSet`` calls ``mutableAddress``
in this case, SILGen will follow that pattern for calls.  SILGen will
also assume that the need to perform a ``releaseForNSM`` is exclusive
with the need to call the setter.

Mutating operations on COW types will now have two different paths for
making a buffer mutable and unique: one for structural mutations and
another for non-structural mutations.  I expect that this will require
separate semantics annotations, and the optimizer will have to
recognize both.

``releaseForNSM`` operations will not be reorderable unless the
optimizer can prove that the objects are distinct.

Summary of proposal and plan
----------------------------

Let me summarize what I'm proposing:

* Swift's core approach to optimizing accesses should be based
  around providing direct access to memory, either statically or
  dynamically.  In other words, Swift should adopt addressors on
  core data structures as much as possible.

* Swift should fix the current memory hole with addressors by
  retaining for the duration of the access and, for modifications,
  flagging the buffer as NSM-active.  The implementation plan
  follows:

  * The runtime implements the NSM-bit and its entrypoints.

  * SIL provides operations for manipulating and querying the NSM
    bit.  IRGen implements these operations using the runtime
    functions.  Builtins are exposed.

  * The standard library changes data structures to do different
    uniquing for structural and non-structural modifications.  This
    patch is not yet committed.

  * The optimizer reacts to the above.  When both are settled, they
    can be committed.

  * SILGen changes the emission patterns for l-values so that
    addresses and writebacks are live only during the formal
    access.

  * Sema changes the signature of ``address``, ``mutableAddress``,
    and ``materializeForSet`` to return an optional owner reference.
    Sema changes ``materializeForSet`` synthesis to return the
    owner correctly.  SILGen implements the desired code patterns.

    The standard library changes its addressor implementations
    to continue to compile, but for staging purposes, it only uses
    nil owners.

  * The standard library changes addressor implementations to
    use meaningful owners.  This patch is not yet committed.

  * The optimizer reacts to the above.  When both are settled, they
    can be committed.