1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365
|
from debputy.linting.lint_util import LintState
# Contributing to diagnostics or reformatting (such as `debputy lint`)
This document is aimed at introducing a potential new contributor to the
diagnostics and reformatting framework in `debputy`.
## Prerequisites
The reader is expected to:
* Be familiar with Python coding
* Have some basic understanding of diagnostics ("linting") or
formatting (`debputy reformat`)
## Context and important modules
In `debputy`, linting is built as a part of the language server protocol
(LSP) feature. This means that most of the linting code starts in the
`debputy.lsp` package with a few helper utilities in `debputy.linting`.
Finally, the entry point is in
`debputy.commands.debputy_cmd.lint_and_lsp_cmds.py`, which is in turn
imported by `debputy.commands.debputy_cmd` for setting up the command
line.
The supported languages are defined in `debputy.lsp.languages`. Each
language provides a number of annotated functions that defines the
entry point for the relevant LSP feature. The relevant functions will
be the one annotated with `@lint_diagnostics(_DISPATCH_RULE)` for diagnostics.
For formatting, the functions are the ones annotated with:
* `@lsp_cli_reformat_document(_DISPATCH_RULE)`
* `@lsp_format_document(_DISPATCH_RULE)`
* `@lsp_will_save_wait_until(_DISPATCH_RULE)`
These three should generally delegate to a common base method that does the
actual formatting by converting arguments to a `LintState`. Long term, the
three should be merged so writing formatting for a language requires less
coding.
In either case, the function is invoked at least once per file of this type
being processed. In LSP mode, a file can be processed multiple times as the
file is being edited or saved leading to multiple calls for the same file.
The tests live in `tests/lint_tests/lint` (diagnostics) and
`tests/lint_tests/reformat` (reformatting). For historical reasons,
they both share the prefix `tests/lint_tests`. A better name there is
welcome.
### Restrictions
It is important that the CLI features and the LSP module share the same
code base, so the logic is in one place.
Because the code base is shared with the LSP module, it is important to
**never** access files directly. In the LSP mode, files might have unsaved
content that is not visible in the file system.
Additionally, due to how the LSP module works, the code must always track
very offset and line perfectly and *never* mutate it in place as it will
corrupt the state for future calculations.
For the files that can have multiple formats or versions (such as
`debian/copyright`), the code must start by validating the file is the
correct format first. The `deb822`-based files have a built-in mechanism
for this. Other formats must roll their own.
## Key entities
Here are the following key entities in `debputy`'s diagnostics module:
* The `LintState`
* `TERange` / `TEPosition` vs. Range` / `Position`
* A `Diagnostic`
* A `Quickfix`
* A `TextEdit`
### `LintState`
The `LintState` provides accessors to the diagnostics and reformatting.
It abstracts away many of the language server specific APIs vs. CLI
support to simplify writing the logic. It also simplifies the setup
for testing as a `LintState` has considerably fewer moving parts than
the `LanguageServer` / `DebputyLanguageServer` classes.
The `LintState` will provide you with:
* `content` / `lines`: The raw contents of the file being processed
right now.
* `path` / `doc_uri`: The path or URI to the file being processed.
Note: There is **no** guarantee that exists or contains the same
as `content` / `lines`. A file can be open in the editor having
a full and valid path without ever being stored to disk.
* `parsed_deb822_file_content`: If the file is `deb822` based,
then this attribute will contain a parsed version of the file.
#### Diagnostic specific features for `LintState` (linting only)
For diagnostics, the following attributes or members might be
of interest:
* `emit_diagnostic`: The central pillar of the linting engine as
is makes the diagnostic available to the user.
* `slow_iter`: Wraps an iterable in an awaitable that occasionally
yields. This is mainly useful for the LSP case where `debputy`
should still respond to editor requests.
* `source_package` / `binary_packages`: Access to the fields of
of the `debian/control` file for the `Source` stanza or one
of the `Package` stanzas.
* `related_diagnostic_information`: If a diagnostic should
highlight some related content, this function can be called
to setup the related ranges. Useful for things like
"X and Y conflict" cases, etc.
* `spellchecker`: Provides a spellchecker if the file context
human-readable test to be spellchecked. A placeholder will
be provided if spellchecking is disabled, so the code does
not does have to care for that.
* `dh_sequencer_data` / `debputy_metadata`: Data related to
`dh` or `debputy` that might be useful if the diagnostic
depends on the attributes provided by these.
#### Formatting specific features for `LintState` (reformat only)
* `position_codec`: Codec for mapping between `debputy` and the editor.
This is only relevant for `TextEdit` or when creating LSP objects
directly. See Range / Position for details*.
* `effective_preference`: The active formatting preference.
### `TERange` / `TEPosition` vs. Range` / `Position`
Positions and ranges are central in everything. All diagnostics
and formatting (`TextEdit`s) involve selecting a range. For
diagnostics, the range decides what the editor (or CLI) should
highlight. For reformatting, the formatting is done by creating
`TextEdit`s and every edit involves a range, which should be
replaced by a piece of text. Here an empty range is used for
inserting text and empty replacement for deleting.
Every range is in turn made up of two positions. A start position
and an end position.
Unfortunately, this gets a bit more complex here, since some more
aspects on how define positions.
#### `python-debian` vs LSP position and ranges
In the diagnostics, almost everything uses `python-debian` position.
They are imported as `TERange` and `TEPosition`. These are always in
"server encoding" (see next section)
However, for a few concern cases in diagnostics and for all `TextEdit`s
in reformatting, the code will need LSP position or ranges. Typically,
the code will start with a `TERange` and then convert it to a
LSP `Range` (`server encoded`) via `te_range_to_lsp`. Ditto for `TEPosition`
to LSP `Position` via `te_position_to_lsp`. Finally, the entity has to
be converted to the client encoding.
Note: `debputy` could swap to LSP position/ranges internally with no
loss of functionality. However, any `deb822` format would still need
to interface with `TERange` and `TEPosition` since those are the positions
used by `python-debian` (the parser for the `deb822` formats).
#### `server encoding` vs. `client encoding`
The language server protocol operates with a client encoding and a server
encoding of the text. The two sides decides their own internal way of
encoding strings. However, when transferring position and ranges, they
agree on one of `UTF-16`, `UTF-32`, or `UTF-8` as the encoding and all
positions are based on that regardless of what the server/client uses.
In the code base, the term `server encoding` is used for positions or
ranges based on Python's string indexes. As an example, the `1:2` in
`"æøå"[1:2]` is a valid Python range for the string and will extract
the letter `ø`. Assuming this constitutes the first line of a file,
then it would be defined as the following LSP Range (`server encoding`)
plus how to convert it to encoding that the editor supports.
```python
import lsprotocol.types as types
from debputy.linting.lint_util import LintState
lint_state: LintState = ...
sever_encoded_range = types.Range(
types.Position(0, 1),
types.Position(0, 2),
)
client_encoded_range = lint_state.position_codec.range_to_client_units(
lint_state.lines,
sever_encoded_range,
)
```
The general advice is stay in the `python-debian` entities
(`TERange` + `TEPosition`) whenever possible. Then, convert to
LSP format using `te_range_to_lsp` and finally convert to client
range. When naming variables containing LSP entities, there is
a convention to use `_server_units` as suffix for variables
containing server encoded variables.
Caveat: The example above is only valid when the range being
encoded is from the file currently being processed. While it is
the most common case, it will not suffice for cross-file
diagnostics. The `lint_state.lines` should match the file the
range is from.
### `Diagnostic` (linting only)
A `Diagnostic` is used to flag something as incorrect. The `debputy`
code internally uses the Language Server Protocol (LSP) definition of a
diagnostic, because the module aims at providing diagnostics via
the LSP. The linting code does not use the LSP `Diagnostic` directly
(using instead `LintState.emit_diagnostic`), which enables `debputy`
to add some extra features mostly transparently. Nevertheless, the API is
constrained by the LSP spec in some ways and that is generally by design.
Every diagnostic has 5 key pieces of information:
* The file it applies to. This is generally *implicit* in `debputy`
hidden inside the `LintState` object itself.
* A text range in the file, that should be highlighted. Unlike most
other linters, a simple line number is insufficient. The code must
identify the *range* to be marked as incorrect (accounting for
whitespace, comments, etc.).
The `emit_diagnostics` expects a `TERange`.
* A severity (`error`, `warning`, ...) that is used to classify how
important the issue is. The `debputy` framework has more severities
than the LSP. These severities are visible in `debputy lint`. In the
LSP mode, the extra severities are mapped to an existing LSP severity
transparently to the linting code.
* A text message to show to the user for the diagnostic in question.
This is a one-liner and this limitation comes from the LSP.
* A source/authority of the diagnostic. This is also shown to the
user with the goal of aiding them in understanding where the
requirement comes from. The authority is typically something like
`Policy 3.4`.
The code base checks for known authorities to ensure consistency,
so you may need to add a new authority. See
`DIAG_SOURCE_WITHOUT_SECTIONS` and `DIAG_SOURCE_WITH_SECTIONS`
in `lint_util.py` for known authorities.
Diagnostics are emitted via the `LintState.emit_diagnostic` method.
#### Quick fixes
Beyond the bare minimum, a `Diagnostic` can also benefit from quick-fixes.
Each diagnostic can have a list of quick-fixes. Each element in that list
defines an option for how to fix the diagnostic. Additionally, the quick-fixes
may be used for auto-fixing with `debputy lint --auto-fix` mode.
The list should generally be the return value of one of:
* `propose_correct_text_quick_fix`: When used, the range provided in the
diagnostic will be provided with the given string passed to this function.
* `propose_insert_text_on_line_after_diagnostic_quick_fix`: When used, insert
the text provided to this function on the fist line starting immediately
after the end position of the diagnostic range. Remember a trailing newline
if you intend to create a new line.
* `propose_remove_range_quick_fix`: When used, the range provided to
`emit_diagnostic` will be removed. Anything remaining on the line
(such as trailing newlines) will remain.
* `propose_remove_line_quick_fix`: When used, the range provided to
`emit_diagnostic` must span no more than a single line. Said line will
be deleted.
Note: Feel free to add more of these quick fixes to `debputy.lsp.quickfixes`.
If the fix provided is not good enough for non-interactive fixing such as
`debputy lint --auto-fix`, then pass `enable_non_interactive_auto_fix=False`
to `LintState.emit_diagnostic`. The common case is with spellchecking since
it has a lot of false positives.
# Testing
Most of the relevant tests live in `tests/lint_tests`. There might be some
special-cases or very complex tests in `tests/lsp_tests`.
## Testing diagnostics
When testing diagnostics, the test will generally need a small file that
triggers the issue. Each existing format with tests will define a
`line_linter` fixture that takes a list of lines and runs it through
the diagnostics code.
It is important to note that the diagnostics returned are LSP diagnostics.
As a consequence, they use LSP severities which are not 1:1 with `debputy`
severities. As an example `"pedantic"` and `"spelling"` both become
`DiagnosticSeverity.Hint`.
Useful tricks:
* Ranges can be turned into a string for a simpler comparison than creating
a full LSP Range object.
* The `diag_range_to_text` helper can convert the lines + range into the
concrete text matched by that range.
* It is possible to test the quick fix with a bit of extra lifting. See
`test_dch_lint_signoff_line_email_date_spacing` from
`test_lint_changelog.py`
* If the contents of `debian/control` is relevant (and it is not the `dcpy` test),
remember to set `line_linter.dctrl_lines`.
An example from `tests/lint_tests/lint/test_lint_debputy.py`:
```python
def test_debputy_lint_syntax_error_a(line_linter: LintWrapper) -> None:
lines = textwrap.dedent(
"""\
manifest-version: '0.1'
foo
"""
).splitlines(keepends=True)
diagnostics = line_linter(lines)
assert len(diagnostics) == 1
issue = diagnostics[0]
# We do not check the error message here, since it is directly from the YAML library
# and therefore an implementation detail.
assert f"{issue.range}" == "1:0-1:3"
assert issue.severity == DiagnosticSeverity.Error
assert diag_range_to_text(lines, issue.range) == "foo"
```
## Testing reformatting
When testing reformatting, the test will generally need a small file
that should be reformatted. Each existing format with tests will define a
`reformater` fixture that takes a list of lines and runs it through
the reformatting code. Note the test has to provide the desired style
for the formatting to do anything.
Generally:
* Often the test should do at least two passes. The first run without a
style to ensure the code backs out without a style. The second run can
then be done with the `black` style or the style being tested.
* Edits can be applied via the `apply_formatting_edits` and then compared
against an expected content.
An example is `test_dctrl_reformat_preserve_comments` from
`tests/lint_tests/reformat/test_reformat_dctrl.py`.
|