:orphan:

Text Formatting in Swift
========================

:Author: Dave Abrahams
:Author: Chris Lattner
:Author: Dave Zarzycki
:Date: 2013-08-12


.. contents:: Index

**Abstract:** We propose a system for creating textual representations
of Swift objects. Our system unifies conversion to ``String``, string
interpolation, printing, and representation in the REPL and debugger.

Scope
-----

Goals
.....

* The REPL and LLDB ("debuggers") share formatting logic
* All types are "debug-printable" automatically
* Making a type "printable for humans" is super-easy
* ``toString()``-ability is a consequence of printability.
* Customizing a type's printed representations is super-easy
* Format variations such as numeric radix are explicit and readable
* Large textual representations do not (necessarily) ever need to be
  stored in memory, e.g. if they're being streamed into a file or over
  a remote-debugging channel.

Non-Goals
.........

.. sidebar:: Rationale

  Localization (including single-locale linguistic processing such as
  what's found in Clang's diagnostics subsystem) is the only major
  application we can think of for dynamically-constructed format
  strings, [#dynamic]_ and is certainly the most important consumer of
  that feature.  Therefore, localization and dynamic format strings
  should be designed together, and *under this proposal* the only
  format strings are string literals containing interpolations
  ("``\(...)``"). Cocoa programmers can still use Cocoa localization
  APIs for localization jobs.

  In Swift, only the most common cases need to be very terse.
  Anything "fancy" can afford to be a bit more verbose. If and when
  we address localization and design a full-featured dynamic string
  formatter, it may make sense to incorporate features of ``printf``
  into the design.

* **Localization** issues such as pluralizing and argument
  presentation order are beyond the scope of this proposal.

* **Dynamic format strings** are beyond the scope of this proposal.

* **Matching the terseness of C**\ 's ``printf`` is a non-goal.

CustomStringConvertible Types
-----------------------------

``CustomStringConvertible`` types can be used in string literal interpolations,
printed with ``print(x)``, and can be converted to ``String`` with
``x.toString()``.

The simple extension story for beginners is as follows:

  To make your type ``CustomStringConvertible``, simply declare conformance to
  ``CustomStringConvertible``::

    extension Person : CustomStringConvertible {}

  and it will have the same printed representation you see in the
  interpreter (REPL). To customize the representation, give your type
  a ``func format()`` that returns a ``String``::

    extension Person : CustomStringConvertible {
      func format() -> String {
        return "\(lastName), \(firstName)"
      }
    }

The formatting protocols described below allow more efficient and
flexible formatting as a natural extension of this simple story.
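
For comparison, shipping Swift settled on a different spelling of this story: the requirement is a ``description`` property rather than a ``format()`` method, and ``String(describing:)`` plays the role of ``toString()``. The sketch below uses those standard-library names, not the proposal's, and assumes a minimal ``Person`` with the ``firstName`` and ``lastName`` fields used above:

```swift
// Shipping Swift spells the proposal's `format()` as the `description`
// property of CustomStringConvertible; this is not the proposal's API.
struct Person {
  var firstName: String
  var lastName: String
}

extension Person: CustomStringConvertible {
  // Plays the role of the proposal's `func format() -> String`
  var description: String {
    return "\(lastName), \(firstName)"
  }
}

let ada = Person(firstName: "Ada", lastName: "Lovelace")
print(ada)                               // prints "Lovelace, Ada"
let interpolated = "Author: \(ada)"      // interpolation uses `description`
let converted = String(describing: ada)  // the analogue of `toString()`
```
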

Formatting Variants
-------------------

``CustomStringConvertible`` types with parameterized textual representations
(e.g. number types) *additionally* support a ``format(...)`` method [#format]_
parameterized according to that type's axes of variability::

  print(offset)
  print(offset.format()) // equivalent to previous line
  print(offset.format(radix: 16, width: 5, precision: 3))

Although ``format(...)`` is intended to provide the most general
interface, specialized formatting interfaces are also possible::

  print(offset.hex())
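
The proposal leaves the exact signature of ``format(...)`` open. As a rough sketch of an integer variant, the hypothetical ``format(radix:width:fill:)`` and ``hex()`` below (illustrative names, not standard-library API) are built on the standard library's ``String(_:radix:uppercase:)`` initializer; ``precision`` is omitted, and padding on the left is an assumption:

```swift
// Hypothetical formatting variants; `format(radix:width:fill:)` and
// `hex()` are illustrative names, not standard-library API.
extension BinaryInteger {
  /// Renders `self` in the given radix, left-padded with `fill`
  /// until the result is at least `width` characters long.
  func format(radix: Int = 10, width: Int = 0, fill: Character = " ") -> String {
    let digits = String(self, radix: radix)
    let padding = Swift.max(0, width - digits.count)
    return String(repeating: fill, count: padding) + digits
  }

  /// A specialized shorthand for the common hexadecimal case.
  func hex() -> String {
    return format(radix: 16)
  }
}

let offset = 255
print(offset.format())                     // prints "255"
print(offset.format(radix: 16, width: 5))  // prints "   ff"
print(offset.hex())                        // prints "ff"
```
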


Design Details
--------------

Output Streams
..............

The most fundamental part of this design is ``TextOutputStream``, a thing
into which we can stream text: [#character1]_

::

  protocol TextOutputStream {
    mutating func append(_ text: String)
  }

Every ``String`` can be used as a ``TextOutputStream`` directly::

  extension String : TextOutputStream {
    mutating func append(_ text: String) {
      self += text
    }
  }
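
Shipping Swift adopted this protocol under the same name but with a ``mutating func write(_ string: String)`` requirement, and ``print(_:separator:terminator:to:)`` streams into any conforming type. A minimal sketch against that API: the hypothetical ``CountingStream`` consumes text without storing it (the "never stored in memory" goal in miniature), and a plain ``String`` serves as a stream too:

```swift
// A hypothetical stream that tallies what it receives instead of
// storing it, so arbitrarily large output costs constant memory.
struct CountingStream: TextOutputStream {
  private(set) var characterCount = 0
  private(set) var lineCount = 0

  mutating func write(_ string: String) {
    characterCount += string.count
    lineCount += string.filter { $0 == "\n" }.count
  }
}

var counter = CountingStream()
print("hello", to: &counter)  // "hello" plus the "\n" terminator
print(1, 2, 3, to: &counter)  // "1 2 3" plus the "\n" terminator

// String is itself a TextOutputStream
var buffer = ""
print("hello", to: &buffer)
```
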

Debug Printing
..............

Via compiler magic, *everything* conforms to the ``CustomDebugStringConvertible``
protocol. To change the debug representation for a type, you don't
need to declare conformance: simply give the type a ``debugFormat()``::

  /// A thing that can be printed in the REPL and the Debugger
  protocol CustomDebugStringConvertible {
    associatedtype DebugRepresentation : TextOutputStreamable = String

    /// Produce a textual representation for the REPL and
    /// Debugger.
    func debugFormat() -> DebugRepresentation
  }

Because ``String`` is a ``TextOutputStreamable``, your implementation of
``debugFormat`` can just return a ``String``. If you want to write
directly to the ``TextOutputStream`` for efficiency reasons
(e.g. if your representation is huge), you can return a custom
``DebugRepresentation`` type.


.. Admonition:: Guideline

   Producing a representation that can be consumed by the REPL
   and LLDB to produce an equivalent object is strongly encouraged
   where possible!  For example, ``String.debugFormat()`` produces
   a representation starting and ending with "``"``", where special
   characters are escaped, etc. A ``struct Point { var x, y: Int }``
   might be represented as "``Point(x: 3, y: 5)``".
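
In shipping Swift, the debug representation is the ``debugDescription`` property of ``CustomDebugStringConvertible``, and ``String(reflecting:)`` retrieves it. A small sketch that follows the guideline for the ``Point`` example (the conformance is an illustration, not code from the standard library):

```swift
// Shipping Swift's spelling of the proposal's `debugFormat()` is the
// `debugDescription` property of CustomDebugStringConvertible.
struct Point {
  var x, y: Int
}

extension Point: CustomDebugStringConvertible {
  // Mirrors the memberwise initializer, so the output reads as source
  var debugDescription: String {
    return "Point(x: \(x), y: \(y))"
  }
}

let p = Point(x: 3, y: 5)
let shown = String(reflecting: p)        // "Point(x: 3, y: 5)"
let quoted = String(reflecting: "a\"b")  // String's debug form adds quotes and escapes the inner quote
```
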

(Non-Debug) Printing
....................

The ``CustomStringConvertible`` protocol provides a "pretty" textual representation
that can be distinct from the debug format. For example, when ``s``
is a ``String``, ``s.format()`` returns the string itself,
without quoting.

Conformance to ``CustomStringConvertible`` is explicit, but if you want to use the
``debugFormat()`` results for your type's ``format()``, all you
need to do is declare conformance to ``CustomStringConvertible``; there's nothing to
implement::

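  /// A thing that can be print()ed and toString()ed.
  protocol CustomStringConvertible : CustomDebugStringConvertible {
    associatedtype PrintRepresentation : TextOutputStreamable = DebugRepresentation

    /// Produce a "pretty" textual representation.
    ///
    /// In general you can return a String here, but if you need more
    /// control, return a custom TextOutputStreamable type
    func format() -> PrintRepresentation

    /// Simply convert to String
    func toString() -> String
  }

  extension CustomStringConvertible
    where PrintRepresentation == DebugRepresentation {
    // By default, the "pretty" representation is the debug one
    func format() -> PrintRepresentation {
      return debugFormat()
    }
  }

  extension CustomStringConvertible {
    // You'll never want to reimplement this: stream the
    // representation into an empty String
    func toString() -> String {
      var result = ""
      self.format().writeTo(&result)
      return result
    }
  }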
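
The default ``toString()`` above simply streams the representation into an empty ``String``. Here is a sketch of the same pattern using shipping Swift's ``TextOutputStreamable``, whose requirement is spelled ``write(to:)`` rather than ``writeTo(_:)``; the free function ``toString`` is a hypothetical helper named after the proposal. Feeding it a ``String`` also shows the pretty/debug split described above: the pretty form is the text itself, while ``String(reflecting:)`` quotes and escapes it.

```swift
// Hypothetical helper mirroring the proposal's default toString():
// stream the value into an empty String and return the result.
func toString<T: TextOutputStreamable>(_ value: T) -> String {
  var result = ""
  value.write(to: &result)
  return result
}

let s = "tab\there"
let pretty = toString(s)           // the text itself, unquoted
let debug = String(reflecting: s)  // quoted, with the tab escaped as \t
```
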

``TextOutputStreamable``
........................

Because it's not always efficient to construct a ``String``
representation before writing an object to a stream, we provide a
``TextOutputStreamable`` protocol, for types that can write themselves into a
``TextOutputStream``. Every ``TextOutputStreamable`` is also a
``CustomStringConvertible``, naturally::

  protocol TextOutputStreamable : CustomStringConvertible {
    func writeTo<T: TextOutputStream>(_ target: inout T)
  }

  extension TextOutputStreamable where PrintRepresentation == Self {
    // You'll never want to reimplement this
    func format() -> Self {
      return self
    }
  }
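
To see why streaming can beat building a ``String`` first, consider a representation whose full text is large but cheap to generate. The hypothetical ``Repeated`` type below, sketched against shipping Swift's ``TextOutputStreamable``, writes its text one piece at a time, so a stream that does not store what it receives never needs the whole expansion in memory:

```swift
// A hypothetical lazy representation: `count` copies of `unit`,
// written directly into the target stream one copy at a time.
struct Repeated: TextOutputStreamable {
  var unit: String
  var count: Int

  func write<Target: TextOutputStream>(to target: inout Target) {
    for _ in 0..<count {
      target.write(unit)
    }
  }
}

let ruler = Repeated(unit: "-=", count: 3)
var line = ""
ruler.write(to: &line)
let viaInterpolation = "[\(ruler)]"  // interpolation accepts TextOutputStreamable too
```
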

How ``String`` Fits In
......................

``String``\ 's ``debugFormat()`` yields a ``TextOutputStreamable`` that
adds surrounding quotes and escapes special characters::

  extension String : CustomDebugStringConvertible {
    func debugFormat() -> EscapedStringRepresentation {
      return EscapedStringRepresentation(_value: self)
    }
  }

  struct EscapedStringRepresentation : TextOutputStreamable {
    var _value: String

    func writeTo<T: TextOutputStream>(_ target: inout T) {
      target.append("\"")
      // Escape each Unicode scalar (quotes, backslashes, control
      // characters) as it is written
      for c in _value.unicodeScalars {
        target.append(c.escaped(asASCII: false))
      }
      target.append("\"")
    }
  }

Besides modeling ``TextOutputStream``, ``String`` also conforms to
``TextOutputStreamable``::

  extension String : TextOutputStreamable {
    func writeTo<T: TextOutputStream>(_ target: inout T) {
      target.append(self) // Append yourself to the stream
    }

    func format() -> String {
      return self
    }
  }

This conformance allows *most* formatting code to be written entirely
in terms of ``String``, simplifying usage. Types with other needs can
expose lazy representations like ``EscapedStringRepresentation``
above.
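
A runnable approximation of ``EscapedStringRepresentation`` in shipping Swift, assuming the standard ``TextOutputStreamable`` protocol and ``Unicode.Scalar.escaped(asASCII:)`` for the per-scalar escaping. The escaped text is produced only when the value is written into a stream; for simple inputs like the one below it matches ``String(reflecting:)``:

```swift
// A lazy quoted-and-escaped view of a String; the escaped text is
// only produced when the value is written into a stream.
struct EscapedStringRepresentation: TextOutputStreamable {
  var value: String

  func write<Target: TextOutputStream>(to target: inout Target) {
    target.write("\"")
    for scalar in value.unicodeScalars {
      // Escapes quotes, backslashes, and control characters like \n
      target.write(scalar.escaped(asASCII: false))
    }
    target.write("\"")
  }
}

var out = ""
EscapedStringRepresentation(value: "say \"hi\"\n").write(to: &out)
```
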

Extended Formatting Example
---------------------------

The following code is a scaled-down version of the formatting code
used for ``Int``. It represents an example of how a relatively
complicated ``format(...)`` might be written::

  protocol CustomStringConvertibleInteger
    : ExpressibleByIntegerLiteral, Comparable, SignedNumeric, CustomStringConvertible {
    static func %(lhs: Self, rhs: Self) -> Self
    static func /(lhs: Self, rhs: Self) -> Self
    init(_ x: Int)
    func toInt() -> Int
  }

  extension CustomStringConvertibleInteger {
    func format(radix: Int = 10, fill: String = " ", width: Int = 0)
      -> RadixFormat<Self> {
      return RadixFormat(value: self, radix: radix, fill: fill, width: width)
    }
  }

  struct RadixFormat<T: CustomStringConvertibleInteger> : TextOutputStreamable {
    var value: T, radix = 10, fill = " ", width = 0

    func writeTo<S: TextOutputStream>(_ target: inout S) {
      _writeSigned(value, target: &target)
    }

    // Write the given positive value to the stream, most significant
    // digit first; return the number of digits written
    func _writePositive<S: TextOutputStream>(
      _ value: T, stream: inout S
    ) -> Int {
      if value == 0 { return 0 }
      let radix = T(self.radix)
      let rest = value / radix
      let nDigits = _writePositive(rest, stream: &stream)
      let digit = UInt32((value % radix).toInt())
      let baseCharOrd: UInt32 = digit <= 9
        ? ("0" as UnicodeScalar).value
        : ("A" as UnicodeScalar).value - 10
      stream.append(String(Character(UnicodeScalar(baseCharOrd + digit)!)))
      return nDigits + 1
    }

    func _writeSigned<S: TextOutputStream>(
      _ value: T, target: inout S
    ) {
      var count = 0    // characters produced so far, including any sign
      var result = ""  // digits are buffered so padding can precede them

      if value == 0 {
        result = "0"
        count += 1
      }
      else {
        let absVal = abs(value)
        if value < 0 {
          target.append("-")
          count += 1
        }
        count += _writePositive(absVal, stream: &result)
      }

      // Pad up to the requested field width
      while count < width {
        count += 1
        target.append(fill)
      }
      target.append(result)
    }
  }

  extension Int : CustomStringConvertibleInteger {
    func toInt() -> Int { return self }
  }
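
The recursive digit-writing strategy above can be checked against the standard library. The sketch below is a simplified, hypothetical ``Int``-only ``RadixFormat`` that conforms to shipping Swift's ``TextOutputStreamable``; it uses lowercase digits and ``Int.magnitude`` (so ``Int.min`` is handled), and, like the example above, it places padding between the sign and the digits:

```swift
// A simplified, Int-only RadixFormat: digits come out of the recursion
// most significant first and are streamed straight into the target.
struct RadixFormat: TextOutputStreamable {
  var value: Int
  var radix = 10  // assumed to be in 2...36
  var fill = " "
  var width = 0

  // Writes the digits of `n` (nothing at all when `n` is zero).
  private func writeDigits<S: TextOutputStream>(_ n: UInt, to stream: inout S) {
    if n == 0 { return }
    let base = UInt(radix)
    writeDigits(n / base, to: &stream)  // higher-order digits first
    let digit = Int(n % base)
    let scalar = digit < 10
      ? Unicode.Scalar(UInt8(48 + digit))       // "0" through "9"
      : Unicode.Scalar(UInt8(97 + digit - 10))  // "a" through "z"
    stream.write(String(Character(scalar)))
  }

  func write<S: TextOutputStream>(to target: inout S) {
    // Buffer the digits so the padding can be written before them
    var digits = ""
    if value == 0 {
      digits = "0"
    } else {
      writeDigits(value.magnitude, to: &digits)
    }
    if value < 0 { target.write("-") }
    let used = digits.count + (value < 0 ? 1 : 0)
    for _ in 0..<max(0, width - used) {
      target.write(fill)
    }
    target.write(digits)
  }
}

var text = ""
RadixFormat(value: -255, radix: 16, fill: "0", width: 6).write(to: &text)
```

Comparing such a formatter with ``String(_:radix:)`` over edge cases like ``0`` and ``Int.min`` is a cheap way to gain confidence in it.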


Possible Extensions (a.k.a. Complications)
------------------------------------------

We are not proposing these extensions. Since we have given them
considerable thought, they are included here for completeness and to
ensure our proposed design doesn't rule out important directions of
evolution.

``TextOutputStream`` Adapters
.............................

Most text transformations can be expressed as adapters over generic
``TextOutputStream``\ s. For example, it's easy to imagine an upcasing
adapter that transforms its input to upper case before writing it to
an underlying stream::

  struct UpperStream<UnderlyingStream:TextOutputStream> : TextOutputStream {
    mutating func append(_ x: String) { base.append(x.uppercased()) }
    var base: UnderlyingStream
  }
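
Against shipping Swift's ``TextOutputStream``, whose requirement is ``write(_:)``, the adapter can be sketched as follows; ``uppercased()`` is the standard-library spelling of ``toUpper()``. Because the adapter holds its base stream by value, the caller reads the updated base back out of it afterward:

```swift
// Upcases everything written through it before forwarding to `base`.
struct UpperStream<Base: TextOutputStream>: TextOutputStream {
  var base: Base

  mutating func write(_ string: String) {
    base.write(string.uppercased())
  }
}

var shout = UpperStream(base: "")
print("hello,", "world", to: &shout)
let result = shout.base  // write-back: recover the updated base
```
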

However, upcasing is a trivial example: many such transformations, such
as ``trim()`` or regex replacement, are stateful, which implies some
way of indicating "end of input" so that buffered state can be
processed and written to the underlying stream:

.. parsed-literal::

  struct TrimStream<UnderlyingStream:TextOutputStream> : TextOutputStream {
    mutating func append(_ x: String) { ... }
    **mutating func close() { ... }**
    var base: UnderlyingStream
    var bufferedWhitespace: String
  }

This makes general ``TextOutputStream`` adapters more complicated to write
and use than ordinary ``TextOutputStream``\ s.
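
To make the statefulness concrete, here is one way ``TrimStream`` might be filled in, sketched against shipping Swift's ``TextOutputStream``. Leading whitespace can be dropped immediately, but trailing whitespace cannot be recognized until the input ends, so it is buffered and ``close()`` discards it before handing back the underlying stream. The ``sawText`` flag and the method bodies are assumptions; only the outline comes from the text above:

```swift
// Trims leading and trailing whitespace from everything written through it.
struct TrimStream<Base: TextOutputStream>: TextOutputStream {
  var base: Base
  var bufferedWhitespace = ""
  var sawText = false  // has any non-whitespace been written yet?

  mutating func write(_ string: String) {
    for ch in string {
      if ch.isWhitespace {
        // Leading whitespace is dropped; later whitespace is held
        // until we learn whether more text follows it.
        if sawText { bufferedWhitespace.append(ch) }
      } else {
        base.write(bufferedWhitespace)
        bufferedWhitespace = ""
        base.write(String(ch))
        sawText = true
      }
    }
  }

  /// Signals end of input: trailing whitespace is discarded and the
  /// underlying stream is returned so the caller can write it back.
  mutating func close() -> Base {
    bufferedWhitespace = ""
    return base
  }
}

var trimmer = TrimStream(base: "")
print("  hello  ", "world \n", to: &trimmer)
let trimmed = trimmer.close()
```
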

``TextOutputStreamable`` Adapters
.................................

For every conceivable ``TextOutputStream`` adapter there's a corresponding
``TextOutputStreamable`` adapter. For example::

  struct UpperStreamable<UnderlyingStreamable : TextOutputStreamable>
    : TextOutputStreamable {
    var base: UnderlyingStreamable

    func writeTo<T: TextOutputStream>(_ target: inout T) {
      var adaptedStream = UpperStream(base: target)
      self.base.writeTo(&adaptedStream)
      target = adaptedStream.base
    }
  }

Then, we could extend ``TextOutputStreamable`` as follows::

  extension TextOutputStreamable {
    typealias Upcased = UpperStreamable<Self>
    func toUpper() -> Upcased {
      return Upcased(base: self)
    }
  }

and, finally, we'd be able to write:

.. parsed-literal::

  print(n.format(radix:16)\ **.toUpper()**)

The complexity of this back-and-forth adapter dance is daunting, and
it might well be better handled in the language once we have some
formal model of inversion of control, such as coroutines. We think it
makes more sense to build the important transformations directly into
``format()`` methods, allowing, e.g.:

.. parsed-literal::

  print(n.format(radix:16, **case:.upper**))
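
Both approaches can be sketched in shipping Swift. The block below restates the stream adapter and the streamable adapter against the standard ``TextOutputStream`` and ``TextOutputStreamable`` protocols, so it is self-contained; the built-in alternative already exists as the ``uppercase:`` parameter of the standard library's ``String(_:radix:uppercase:)``:

```swift
// Stream adapter: upcases text before forwarding it to `base`.
struct UpperStream<Base: TextOutputStream>: TextOutputStream {
  var base: Base
  mutating func write(_ string: String) {
    base.write(string.uppercased())
  }
}

// Streamable adapter: writes `base` through an UpperStream, then
// copies the adapted stream's base back into the caller's target.
struct UpperStreamable<Wrapped: TextOutputStreamable>: TextOutputStreamable {
  var base: Wrapped
  func write<Target: TextOutputStream>(to target: inout Target) {
    var adapted = UpperStream(base: target)
    base.write(to: &adapted)
    target = adapted.base
  }
}

extension TextOutputStreamable {
  func toUpper() -> UpperStreamable<Self> {
    return UpperStreamable(base: self)
  }
}

let n = 48879
let viaAdapter = "\(String(n, radix: 16).toUpper())"  // adapter approach
let builtIn = String(n, radix: 16, uppercase: true)    // built-in option
```

Both produce the same text, but the built-in parameter needs no adapter types and no write-back step.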

Possible Simplifications
------------------------

One obvious simplification might be to fearlessly use ``String`` as
the universal textual representation type, rather than having a
separate ``TextOutputStreamable`` protocol that doesn't necessarily
create a fully-stored representation. This approach would trade some
efficiency for considerable design simplicity. It is reasonable to
ask whether the efficiency cost would be significant in real cases,
and the truth is that we don't have enough information to know. At
least until we do, we opt not to trade away any CPU, memory, and
power.

If we were willing to say that only ``class``\ es can conform to
``TextOutputStream``, we could eliminate the explicit ``inout`` where
``TextOutputStream``\ s are passed around. Then, we'd simply need a
``class StringStream`` for creating ``String`` representations. It
would also make ``TextOutputStream`` adapters a *bit* simpler to use
because you'd never need to "write back" explicitly onto the target
stream. However, stateful ``TextOutputStream`` adapters would still need a
``close()`` method, which is a natural place to return a copy of
the underlying stream so that it can be "written back":

.. parsed-literal::

  struct AdaptedStreamable<T : TextOutputStreamable> {
    ...
    func writeTo<Target: TextOutputStream>(_ target: inout Target) {
      // create the stream that transforms the representation
      var adaptedTarget = adapt(target, adapter)
      // write the Base object to the target stream
      base.writeTo(&adaptedTarget)
      // Flush the adapted stream and, in case Target is a value type,
      // write its new value
      **target = adaptedTarget.close()**
    }
    ...
  }

We think anyone writing such adapters can handle the need for explicit
write-back, and the ability to use ``String`` as a ``TextOutputStream``
without additionally allocating a ``StringStream`` on the heap seems
to tip the balance in favor of the current design.

--------

.. [#format] Whether ``format(...)`` is to be a real protocol or merely
   an ad-hoc convention is TBD. So far, there's no obvious use for a
   generic ``format`` with arguments that depend on the type being
   formatted, so an ad-hoc convention would be just fine.

.. [#character1] We don't support streaming individual code points
   directly because it's possible to create invalid sequences of code
   points. For any code point that, on its own, represents a valid
   ``Character`` (a.k.a. Unicode `extended grapheme cluster`__), it is
   trivial and inexpensive to create a ``String``. For more
   information on the relationship between ``String`` and
   ``Character`` see the (forthcoming, as of this writing) document
   *Swift Strings State of the Union*.

   __ http://www.unicode.org/glossary/#extended_grapheme_cluster

.. [#dynamic] In fact it's possible to imagine a workable system for
   localization that does away with dynamic format strings altogether,
   so that all format strings are fully statically-checked and some of
   the same formatting primitives can be used by localizers as by
   fully-privileged Swift programmers. This approach would involve
   compiling/JIT-ing localizations into dynamically-loaded modules.
   In any case, that will wait until we have native Swift dylibs.