.. _formatting-algorithm:

====================
Formatting Algorithm
====================

The formatter works by attempting to select an appropriate ``position`` and
``wrap`` (collectively referred to as a "layout") for each node in the layout
tree. Positions are represented by ``(row, col)`` pairs and the wrap dictates
how children of that node are positioned.

--------
Wrapping
--------

``cmake-format`` implements three styles of wrapping.
The default wrapping for all nodes is horizontal wrapping. If horizontal
wrapping fails to emit an admissible layout, then a node will advance to
either vertical wrapping or nested wrapping (which one depends on the type of
node).

Horizontal Wrapping
===================

Horizontal wrapping is like "word wrap". Each child is assigned a position
immediately following its predecessor, so long as that child fits in the
remaining space up to the column limit. Otherwise the child is moved to the
next line::

    |                       |<- col-limit
    | ██████ ███ ██ █████   |
    | ███████████████ ████  |
    | █████████ ████        |
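
The greedy placement described above can be sketched in a few lines of Python.
This is only an illustration, not the ``cmake-format`` implementation: the
function name ``hwrap`` and its parameters are hypothetical, children are
reduced to their widths, and a single space is assumed between siblings.

```python
def hwrap(widths, col_limit, start_col=0):
    """Assign a (row, col) position to each child, word-wrap style."""
    positions = []
    row, col = 0, start_col
    for width in widths:
        # Move to the next line if the child would cross the column limit,
        # unless the cursor is already at the start of a line.
        if col > start_col and col + width > col_limit:
            row, col = row + 1, start_col
        positions.append((row, col))
        col += width + 1  # assumed: one space separates siblings
    return positions


# Widths taken from the diagram above, with a column limit of 22.
for position in hwrap([6, 3, 2, 5, 15, 4, 9, 4], col_limit=22):
    print(position)
```

With these widths the sketch produces three rows of four, two, and two
children, which matches the diagram.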

Note that a line comment can force an early newline::

    |                       |<- col-limit
    | ██████ ███ #          |
    | ██ █████              |
    | ███████████████ ████  |
    | █████████ ████        |

Note that wrapping happens at the depth of the layout tree, so if we have
multiple groups of multiple arguments each, then each group will be placed
as if it were a single unit::

    |                               |<- col-limit
    | (██████ ███) (██ █████)       |
    | (███ ██ ███████████████ ████) |

Groups may be parenthetical groups (as above) or keyword groups::

    |                               |<- col-limit
    | ▒▒▒▒▒▒▒ ███ ▒▒▒ █████         |
    | ▒▒▒▒▒ ██ ███████████████ ████ |

or any other grouping assigned by the parser.

In the event that a subgroup cannot be packed within a single line of full
column width, it will be wrapped internally, and the next group placed on
the next line::

    |                               |<- col-limit
    | ▒▒▒▒▒ ███ ▒▒▒▒ █████          |
    | ▒▒▒▒ ██ ███████████████ ████  |
    |      ██ █████                 |
    | ▒▒▒▒▒▒▒ ██ ██ █ ▒▒ █          |

In particular, the following is never a valid packing (where the two groups are
siblings in the layout tree)::

    |                               |<- col-limit
    | ▒▒▒ █████ ▒▒▒ ██              |
    |               ███████████████ |
    |               ████ ██ █████   |

Vertical Wrapping
=================

Vertical wrapping assigns each child to the next row::

    ██████
    ███
    ██
    █████
    ███████████████
    ████

Again, note that this happens at the depth of the layout tree. In particular
children may be wrapped horizontally within the subtrees::

    | ▒▒▒▒▒▒ ███ ██████       |<- col-limit
    | ▒▒▒ ██████ ██           |
    | ▒▒▒▒ ████ █████ ██████  |
    |      ██████ ██████      |
    |      ████ ██████████    |
    | ▒▒ ███ ████             |


Nesting
=======

Nesting places children in a column which is one ``tab_size`` to the
right of the parent node's position, and one line below. For example::

    |                       |<- col-limit
    | ▒▒▒▒▒                 |
    |   ██ ███ ██ █████     |
    |   ████████████████    |
    |   █████████ ████      |

In a more deeply nested layout tree, we might see the following::

    |                           |<- col-limit
    | ▓▓▓▓▓                     |
    |   ▒▒▒▒▒                   |
    |     ██ ███ ██ █████       |
    |     ████████████████      |
    |     █████████ ████        |
    |   ▒▒▒                     |
    |     ████ ███ █            |
    |   ▒▒▒▒▒▒                  |
    |     ████ ███ █            |

Depending on how ``cmake-format`` is configured, elements at different depths
may be nested differently. For example::

    |                           |<- col-limit
    | ▓▓▓▓▓                     |
    |   ▒▒▒▒▒ ██ ███ ██ █████   |
    |         ████████████████  |
    |         █████████ ████    |
    |   ▒▒▒ ████ ███ █          |
    |   ▒▒▒▒▒▒ ████ ███ █       |

Note that the only nodes that can nest are ``STATEMENT`` and ``KWARGGROUP``
nodes. These nodes necessarily only have one child, an ``ARGGROUP`` node.
Therefore there really isn't a notion of "wrapping" for these nodes.

--------------------
Formatting algorithm
--------------------

For top-level nodes in the layout tree (i.e. ``COMMENT``, ``STATEMENT``,
``BODY``, ``FLOW_CONTROL``, etc.) the positioning is straightforward and
these nodes are laid out in a single pass. Each child is positioned on the
first line after the output cursor of its predecessor, and at a column
``config.format.tab_size`` to the right of its parent.

``STATEMENT`` nodes, however, are laid out over several passes until the
text for that subtree is accepted. Each pass is governed by a
specification mapping pass number to a wrap decision (i.e. a
boolean indicating whether to wrap vertically or nest children).

Layout Passes
=============

The current algorithm works as a kind of top-down refinement. When a node is
laid out by calling its ``reflow()`` method, it is informed of its parent's
current pass number (``passno``). It then iterates through its own ``passno``
from zero up to its parent's ``passno`` and terminates at the first admissible
layout. Note that within the layout of the node itself, its current
``passno`` can only affect its ``wrap`` decision. However, because each of its
children will advance through their own passes, the overall layout of a subtree
may change between two different passes, even if the node at the subtree root
didn't change its ``wrap`` decision between those passes.

This approach seems to work well even for
:ref:`deeply nested <install-case-study>` or
:ref:`complex <conditionals-case-study>` statements.
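
The pass iteration described in this section can be sketched as follows. This
is a simplified model, not the actual ``reflow()`` method: the ``try_layout``
callback and ``toy_layout`` are hypothetical, and treating the parent's
``passno`` as an inclusive upper bound is an assumption.

```python
def reflow(try_layout, parent_passno):
    """Try passes 0..parent_passno and stop at the first admissible layout.

    `try_layout(passno)` is a hypothetical callback that returns a
    `(layout, admissible)` pair for the given pass.
    """
    result = None
    for passno in range(parent_passno + 1):  # inclusive bound is an assumption
        layout, admissible = try_layout(passno)
        result = (passno, layout)
        if admissible:
            break
    return result


def toy_layout(passno):
    # Hypothetical node: horizontal wrapping (pass 0) is inadmissible,
    # the vertical or nested alternative (pass 1 and later) is accepted.
    if passno == 0:
        return "horizontal", False
    return "vertical-or-nested", True


print(reflow(toy_layout, parent_passno=0))  # (0, 'horizontal')
print(reflow(toy_layout, parent_passno=3))  # (1, 'vertical-or-nested')
```

In the first call the node cannot advance beyond its parent's pass, so it
returns its last attempt even though that attempt was not admissible. In the
second call it stops at pass 1, the first admissible layout.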

Newline decision
================

When a node is in horizontal layout mode (``wrap=False``), there are several
reasons why the algorithm might choose to insert a newline between two
of its children.

1. If a token would overflow the column limit, insert a newline (i.e. the
   usual notion of wrapping).
2. If the token is the last token before a closing parenthesis, and the
   token plus the parenthesis would overflow the column limit, then insert a
   newline.
3. If a token is preceded by a line comment, then the token cannot be placed
   on the same line as the comment (or it will become part of the comment) so
   a newline is inserted between them.
4. If a token is a line comment which is not associated with an argument (e.g.
   it is a "free" comment at the current scope) then it will not be placed
   on the same line as a preceding argument token. If it were, then subsequent
   parses would associate this comment with that argument. In such a case, a
   newline is inserted between the preceding argument and the line comment.
5. If the node is an interior node, and one of its children is internally
   wrapped (i.e. consumes more than two lines) then it will not be placed
   on the same line as another node. In such a case a newline is inserted.
6. If the node is an interior node and a child fails to find an admissible
   layout at the current cursor, a newline is inserted and a new layout attempt
   is made for the child.
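
The first three rules in the list above can be expressed as a small predicate.
This is a minimal sketch with hypothetical names (``needs_newline`` and its
parameters); rules 4 to 6 depend on parser and subtree state and are left out.

```python
def needs_newline(cursor_col, token_width, col_limit,
                  is_last_before_paren=False, prev_is_line_comment=False):
    """Decide whether a newline must precede the next token."""
    # Rule 3: anything placed after a line comment would become part of it.
    if prev_is_line_comment:
        return True
    end_col = cursor_col + token_width
    # Rule 2: the closing parenthesis must fit on the same line as the token.
    if is_last_before_paren:
        end_col += 1
    # Rule 1: the token itself must not overflow the column limit.
    return end_col > col_limit


print(needs_newline(18, 4, col_limit=22))                             # False
print(needs_newline(18, 4, col_limit=22, is_last_before_paren=True))  # True
print(needs_newline(0, 4, col_limit=22, prev_is_line_comment=True))   # True
```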

Admissible layouts
==================

There are several reasons why a layout may be deemed inadmissible:

1. If the bounding box of a node overflows the column limit
2. If a node is horizontally wrapped at the current ``passno`` but consumes
   more than ``max_lines_hwrap`` lines
3. If the node is horizontally wrapped at the current ``passno`` but the node
   path is marked as ``always_wrap``
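
The three conditions above can be combined into a single check. The sketch
below uses a hypothetical function ``is_admissible`` whose arguments summarize
a finished layout; only ``max_lines_hwrap`` and ``always_wrap`` are names taken
from the text, and the real implementation may be organized differently.

```python
def is_admissible(right_col, num_lines, hwrapped, col_limit,
                  max_lines_hwrap, always_wrap=False):
    """Return True if a layout passes the three checks listed above."""
    # 1. The bounding box must not overflow the column limit.
    if right_col > col_limit:
        return False
    # 2. A horizontally wrapped node may not use too many lines.
    if hwrapped and num_lines > max_lines_hwrap:
        return False
    # 3. A node marked always_wrap may not be horizontally wrapped.
    if hwrapped and always_wrap:
        return False
    return True


print(is_admissible(70, 2, True, col_limit=80, max_lines_hwrap=2))   # True
print(is_admissible(70, 3, True, col_limit=80, max_lines_hwrap=2))   # False
print(is_admissible(70, 3, False, col_limit=80, max_lines_hwrap=2))  # True
```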

Comments
========

A (multi-line) comment on the last row does not contribute to the height for
the purposes of the ``max_lines_hwrap`` threshold, but one on any other line
does. Another way to say this is that comments are excluded from the size
computation, but their influence on other arguments is not::

    # This content is 3 lines tall
    foobarbaz_hello(▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
       ▏argument_one argument_two # this comment is two lines long and it   ▕
       ▏                          # forces the next argument onto line three▕
       ▏argument_three argument_four)                                       ▕
       ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    # This is only 2 lines tall
    foobarbaz_hello(▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
       ▏argument_one argument_two argument_three▕
       ▏argument_four # this comment is two lines long and wraps but it
       ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔# has no contribution to the size of the content.
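
One way to model this height rule is to record, for each row of the content,
whether it holds any argument token, and to take the last such row as the
height. The helper ``content_height`` is a hypothetical illustration of the
two examples above, not the computation ``cmake-format`` performs internally.

```python
def content_height(rows_have_arguments):
    """Height of the content, ignoring trailing comment-only rows."""
    height = 0
    for index, has_argument in enumerate(rows_have_arguments, start=1):
        if has_argument:
            height = index
    return height


# First example: the comment continuation pushes an argument to row three.
print(content_height([True, False, True]))  # 3
# Second example: the comment continuation is the last row and is ignored.
print(content_height([True, True, False]))  # 2
```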

Dealing with comments during horizontal wrapping can be a little tricky.
They always induce a newline where they end, but they may also
require a newline in front of the commented argument. See the examples in
:ref:`Case Studies/Comments <comments-case-study>`. We don't necessarily need
to deal with this right now, because the user can always force the issue by
adding a string to the comment that forces a minimum comment width, like this::

    set(HEADERS header_a.h header_b.h header_c.h
        header_d.h # This comment is pretty long and if it's argument is close
                   # to the edge of the column then the comment gets wrapped
                   # very poorly ------------------------
        header_e.h header_f.h)

The string of dashes ``------------------------`` is long enough that the
minimum width of the comment block is given by::

    # This comment is pretty
    # long and if it's
    # argument is close to the
    # edge of the column then
    # the comment gets wrapped
    # very poorly
    # ------------------------

This would prevent it from being crammed into the right-most slot.
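
The minimum width can be checked with Python's standard ``textwrap`` module,
which is used here only as a stand-in for the formatter's own comment
reflowing. An unbreakable word sets a lower bound on the width of the wrapped
block, and wrapping the example comment at that bound reproduces the block
shown above.

```python
import textwrap

comment = (
    "This comment is pretty long and if it's argument is close to the "
    "edge of the column then the comment gets wrapped very poorly "
    + "-" * 24  # the string of dashes from the example
)

# The longest unbreakable word is the narrowest width the block can take.
min_width = max(len(word) for word in comment.split())
lines = textwrap.wrap(comment, width=min_width, break_long_words=False)

print(min_width)
for line in lines:
    print("# " + line)
```

The sketch reports a minimum width of 24 characters (26 including the leading
``# ``) and wraps the comment into the same seven lines.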