File: CONTRIBUTING.md

package info (click to toggle)
ruff 0.0.291%2Bdfsg1-4
  • links: PTS, VCS
  • area: main
  • in suites: sid
  • size: 27,672 kB
  • sloc: python: 30,930; sh: 189; makefile: 9
file content (426 lines) | stat: -rw-r--r-- 15,868 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
# Contributing to the Ruff Formatter

The goal of our formatter is to be compatible with Black except for rare edge cases (mostly
involving comment placement). This document outlines the expected development workflow for the
formatter and walks through some of its internals.

## Testing your changes

You can use the `ruff_python_formatter` binary to format individual files and show debug info.
It's fast to compile because it doesn't depend on `ruff`. The easiest way is to create a
`scratch.py` (or `scratch.pyi`) in the project root and run:

```shell
cargo run --bin ruff_python_formatter -- --emit stdout scratch.py
```

...which supports the `--print-ir` and `--print-comments` flag. (We recommend running with
`--print-comments`.)

<details>
<summary>Usage example</summary>

Command

```shell
cargo run --bin ruff_python_formatter -- --emit stdout --print-comments --print-ir scratch.py
```

Input

```python
def f():  # a
    pass
```

Output

```text
[
  "def f",
  group([group(["()"]), source_position(7)]),
  ":",
  line_suffix(["  # a"]),
  expand_parent,
  indent([hard_line_break, "pass", source_position(21)]),
  hard_line_break,
  source_position(21),
  hard_line_break,
  source_position(22)
]
{
    Node {
        kind: StmtFunctionDef,
        range: 0..21,
        source: `def f(): # a⏎`,
    }: {
        "leading": [],
        "dangling": [
            SourceComment {
                text: "# a",
                position: EndOfLine,
                formatted: true,
            },
        ],
        "trailing": [],
    },
}
def f():  # a
    pass
```

</details>

The other option is to use the playground (also check the playground README):

```shell
cd playground && npm install && npm run dev:wasm && npm run dev
```

Run`npm run dev:wasm` and reload the page in the browser to refresh.

**Tests** Running the entire ruff test suite is slow, `cargo test -p ruff_python_formatter` is a
lot faster. We use [insta](https://insta.rs/) to create snapshots of all tests in
`crates/ruff_python_formatter/resources/test/fixtures/ruff`. We have copied the majority of tests
over from Black to check the difference between Ruff and Black output. Whenever we have no more
differences on a Black input file, the snapshot is deleted.

**Ecosystem checks** `scripts/formatter_ecosystem_checks.sh` runs Black compatibility and stability
checks on a number of selected projects. It will print the similarity index, the percentage of lines
that remains unchanged between Black's formatting and our formatting. You could compute it as the
number of neutral lines in a diff divided by the neutral plus the removed lines. We run this script
in CI, you can view the results in a PR page under "Checks" > "CI" > "Summary" at the bottom of the
page. The stability checks catch for three common problems: The second
formatting pass looks different than the first (formatter instability or lack of idempotency),
printing invalid syntax (e.g. missing parentheses around multiline expressions) and panics (mostly
in debug assertions). You should ensure that your changes don't decrease the similarity index.

**Terminology** For `()`, `[]` and `{}` we use the following terminology:

- Parentheses: `(`, `)` or all kind of parentheses (`()`, `[]` and `{}`, e.g.
    `has_own_parentheses`)
- Brackets: `[`, `]`
- Braces: `{`, `}`

## `format_dev`

It's possible to format an entire project:

```shell
cargo run --bin ruff_dev -- format-dev --write /path/to/my_project
```

Available options:

- `--write`: Format the files and write them back to disk.
- `--stability-check`: Format twice (but don't write to disk without `--write`) and check for
    differences and crashes.
- `--multi-project`: Treat every subdirectory as a separate project. Useful for ecosystem checks.
- `--error-file`: Write all errors to the given file.
- `--log-file`: Write all messages to the given file.
- `--stats-file`: Use together with `--multi-project`, this writes the similarity index as unicode
    table to the given file.

**Large ecosystem checks** It is also possible to check a large number of repositories. This dataset
is large (~60GB), so we only do this occasionally:

```shell
# Get the list of projects
curl https://raw.githubusercontent.com/akx/ruff-usage-aggregate/master/data/known-github-tomls-clean.jsonl > github_search.jsonl
# Repurpose this script to download the repositories for us
python scripts/check_ecosystem.py --checkouts target/checkouts --projects github_search.jsonl -v $(which true) $(which true)
# Check each project for formatter stability
cargo run --bin ruff_dev -- format-dev --stability-check --error-file target/formatter-ecosystem-errors.txt --multi-project target/checkouts
```

**Shrinking** To shrink a formatter error from an entire file to a minimal reproducible example,
you can use `ruff_shrinking`:

```shell
cargo run --bin ruff_shrinking -- <your_file> target/shrinking.py "Unstable formatting" "target/debug/ruff_dev format-dev --stability-check target/shrinking.py"
```

The first argument is the input file, the second is the output file where the candidates
and the eventual minimized version will be written to. The third argument is a regex matching the
error message, e.g. "Unstable formatting" or "Formatter error". The last argument is the command
with the error, e.g. running the stability check on the candidate file. The script will try various
strategies to remove parts of the code. If the output of the command still matches, it will use that
slightly smaller code as starting point for the next iteration, otherwise it will revert and try
a different strategy until all strategies are exhausted.

## Helper structs

To abstract formatting something into a helper, create a new struct with the data you want to
format and implement `Format<PyFormatContext<'_>> for MyStruct`. Below is a small dummy example.

```rust
/// Helper to hide the fields for the struct
pub(crate) fn empty_parenthesized<'content>(
    comments: &'content [SourceComment],
    has_plus_prefix: bool,
) -> FormatEmptyParenthesized<'content> {
    FormatEmptyParenthesized {
        comments,
        has_plus_prefix,
    }
}

/// The wrapper struct
pub(crate) struct FormatEmptyParenthesized<'content> {
    comments: &'content [SourceComment],
    has_plus_prefix: bool,
}

impl Format<PyFormatContext<'_>> for FormatEmptyParenthesized<'_> {
    /// Here we implement the actual formatting
    fn fmt(&self, f: &mut Formatter<PyFormatContext>) -> FormatResult<()> {
        if self.has_plus_prefix {
            text("+").fmt(f)?; // This is equivalent to `write!(f, [text("*")])?;`
        }
        write!(
            f,
            [
                text("("),
                soft_block_indent(&dangling_comments(&self.comments)),
                text(")")
            ]
        )
    }
}
```

If the struct is used across modules, also adds constructor function that hides the fields of the
struct. Since it implements `Format`, you can directly use it in write calls:

```rust
write!(f, [empty_parenthesized(dangling_end_of_line_comments)])?;
```

Check the `builders` module for existing primitives.

## Adding new syntax

Occasionally, Python will add new syntax. After adding it to `ruff_python_ast`, run `generate.py`
to generate stubs for node formatting. This will add a `Format{{Node}}` struct
that implements `Default` (and `AsFormat`/`IntoFormat` impls in `generated.rs`, see orphan rules
below).

```rust
#[derive(Default)]
pub struct FormatStmtReturn;
```

We implement `FormatNodeRule<{{Node}}> for Format{{Node}}`. Inside, we destructure the item to make
sure we're not missing any field. If we want to write multiple items, we use an efficient `write!`
call, for single items `.format().fmt(f)` or `.fmt(f)` is sufficient.

```rust
impl FormatNodeRule<StmtReturn> for FormatStmtReturn {
    fn fmt_fields(&self, item: &StmtReturn, f: &mut PyFormatter) -> FormatResult<()> {
        // Here we destructure item and make sure each field is listed.
        // We generally don't need range is it's underscore-ignored
        let StmtReturn { range: _, value } = item;
        // Implement some formatting logic, in this case no space (and no value) after a return with
        // no value
        if let Some(value) = value {
            write!(
                f,
                [
                    text("return"),
                    // There are multiple different space and newline types (e.g.
                    // `soft_line_break_or_space()`, check the builders module), this one will
                    // always be translate to a normal ascii whitespace character
                    space(),
                    // `return a, b` is valid, but if it wraps we'd need parentheses.
                    // This is different from `(a, b).count(1)` where the parentheses around the
                    // tuple are mandatory
                    value.format().with_options(Parenthesize::IfBreaks)
                ]
            )
        } else {
            text("return").fmt(f)
        }
    }
}
```

If something such as list or a tuple can break into multiple lines if it is too long for a single
line, wrap it into a `group`. Ignoring comments, we could format a tuple with two items like this:

```rust
write!(
    f,
    [group(&format_args![
        text("("),
        soft_block_indent(&format_args![
            item1.format()
            text(","),
            soft_line_break_or_space(),
            item2.format(),
            if_group_breaks(&text(","))
        ]),
        text(")")
    ])]
)
```

If everything fits on a single line, the group doesn't break and we get something like `("a", "b")`.
If it doesn't, we get something like

```Python
(
    "a",
    "b",
)
```

For a list of expression, you don't need to format it manually but can use the `JoinBuilder` util,
accessible through `.join_comma_separated`. Finish will write to the formatter internally.

```rust
f.join_comma_separated(item.end())
    .nodes(elts.iter())
    .finish()
// Here we have a builder that separates each element by a `,` and a [`soft_line_break_or_space`].
// It emits a trailing `,` that is only shown if the enclosing group expands. It forces the enclosing
// group to expand if the last item has a trailing `comma` and the magical comma option is enabled.
```

If you need avoid second mutable borrows with a builder, you can use `format_with(|f| { ... })` as
a formattable element similar to `text()` or `group()`.

## Comments

Comments can either be own line or end-of-line and can be marked as `Leading`, `Trailing` and `Dangling`.

```python
# Leading comment (always own line)
print("hello world")  # Trailing comment (end-of-line)
# Trailing comment (own line)
```

Comments are automatically attached as `Leading` or `Trailing` to a node close to them, or `Dangling`
if there are only tokens and no nodes surrounding it. Categorization is automatic but sometimes
needs to be overridden in
[`place_comment`](https://github.com/astral-sh/ruff/blob/be11cae619d5a24adb4da34e64d3c5f270f9727b/crates/ruff_python_formatter/src/comments/placement.rs#L13)
in `placement.rs`, which this section is about.

```Python
[
    # This needs to be handled as a dangling comment
]
```

Here, the comment is dangling because it is preceded by `[`, which is a non-trivia token but not a
node, and  followed by `]`, which is also a non-trivia token but not a node. In the `FormatExprList`
implementation, we have to call `dangling_comments` manually and stub out the
`fmt_dangling_comments` default from `FormatNodeRule`.

```rust
impl FormatNodeRule<ExprList> for FormatExprList {
    fn fmt_fields(&self, item: &ExprList, f: &mut PyFormatter) -> FormatResult<()> {
        // ...

        write!(
            f,
            [group(&format_args![
                text("["),
                dangling_comments(dangling), // Gets all the comments marked as dangling for the node
                soft_block_indent(&items),
                text("]")
            ])]
        )
    }

    fn fmt_dangling_comments(&self, _node: &ExprList, _f: &mut PyFormatter) -> FormatResult<()> {
        // Handled as part of `fmt_fields`
        Ok(())
    }
}
```

A related common challenge is that we want to attach comments to tokens (think keywords and
syntactically meaningful characters such as `:`) that have no node on their own. A slightly
simplified version of the `while` node in our AST looks like the following:

```rust
pub struct StmtWhile {
    pub range: TextRange,
    pub test: Box<Expr<TextRange>>,
    pub body: Vec<Stmt<TextRange>>,
    pub orelse: Vec<Stmt<TextRange>>,
}
```

That means in

```python
while True:  # Trailing condition comment
    if f():
        break
    # trailing while comment
# leading else comment
else:
    print("while-else")
```

the `else` has no node, we're just getting the statements in its body.

The preceding token of the leading else comment is the `break`, which has a node, the following
token is the `else`, which lacks a node, so by default the comment would be marked as trailing
the `break` and wrongly formatted as such. We can identify these cases by looking for comments
between two bodies that have the same indentation level as the keyword, e.g. in our case the
leading else comment is inside the `while` node (which spans the entire snippet) and on the same
level as the `else`. We identify those case in
[`handle_own_line_comment_around_body`](https://github.com/astral-sh/ruff/blob/4bdd99f8822d914a59f918fc46bbd17a88e2fe47/crates/ruff_python_formatter/src/comments/placement.rs#L390)
and mark them as dangling for manual formatting later. Similarly, we find and mark comment after
the colon(s) in
[`handle_end_of_line_comment_around_body`](https://github.com/astral-sh/ruff/blob/4bdd99f8822d914a59f918fc46bbd17a88e2fe47/crates/ruff_python_formatter/src/comments/placement.rs#L238C4-L238C14)
.

The comments don't carry any extra information such as why we marked the comment as trailing,
instead they are sorted into one list of leading, one list of trailing and one list of dangling
comments per node. In `FormatStmtWhile`, we can have multiple types of dangling comments, so we
have to split the dangling list into after-colon-comments, before-else-comments, etc. by some
element separating them (e.g. all comments trailing the colon come before the first statement in
the body) and manually insert them in the right position.

A simplified implementation with only those two kinds of comments:

```rust
fn fmt_fields(&self, item: &StmtWhile, f: &mut PyFormatter) -> FormatResult<()> {

    // ...

    // See FormatStmtWhile for the real, more complex implementation
    let first_while_body_stmt = item.body.first().unwrap().end();
    let trailing_condition_comments_end =
        dangling_comments.partition_point(|comment| comment.slice().end() < first_while_body_stmt);
    let (trailing_condition_comments, or_else_comments) =
        dangling_comments.split_at(trailing_condition_comments_end);

    write!(
        f,
        [
            text("while"),
            space(),
            test.format(),
            text(":"),
            trailing_comments(trailing_condition_comments),
            block_indent(&body.format())
            leading_comments(or_else_comments),
            text("else:"),
            block_indent(&orelse.format())
        ]
    )?;
}
```

## The orphan rules and trait structure

For the formatter, we would like to implement `Format` from the rust_formatter crate for all AST
nodes, defined in the rustpython_parser crate. This violates Rust's orphan rules. We therefore
generate in `generate.py` a newtype for each AST node with implementations of `FormatNodeRule`,
`FormatRule`, `AsFormat` and `IntoFormat` on it.

![excalidraw showing the relationships between the different types](orphan_rules_in_the_formatter.svg)