Building a linter
=================

.. sectionauthor:: Sebastian Ehlert <@awvwgk>
.. image:: https://img.shields.io/badge/difficulty-beginner-brightgreen
   :alt: Difficulty: Beginner

This tutorial shows how to use TOML Fortran to build a linter for your configuration files.
Linters provide a way to encourage or enforce a certain style, or to flag common usage errors.

Target selection
----------------

This tutorial looks into finding lint in the package manifest of the Fortran package manager (`fpm <https://fpm.fortran-lang.org>`_).
We will use its plugin mechanism to create a new subcommand called ``lint``.
We start by setting up the package manifest for our linter:

.. code-block:: toml
   :caption: fpm.toml

   name = "fpm-lint"
   version = "0.1.0"

   [dependencies]
   toml-f.git = "https://github.com/toml-f/toml-f.git"

Configuration of the linter
---------------------------

To configure our linter, we use the `extra section <https://fpm.fortran-lang.org/en/spec/manifest.html#additional-free-data-field>`__ of the manifest, which is reserved for tools integrating with fpm, and boldly claim *extra.fpm.lint* as our configuration section.
Using the package manifest has two advantages: first, this document is present in every project using fpm; second, if we can read our configuration from the manifest, we already know it is valid TOML.

.. code-block:: toml
   :caption: fpm.toml

   # ...
   [extra.fpm.lint]
   package-name = true
   bare-keys = true

Now we will set up our main program to run the linter.

.. literalinclude:: fpm-lint/app/main0.f90
   :language: fortran
   :caption: app/main.f90

We create a utility module for the *get_argument* function, which retrieves the manifest name.
In most cases we can default to *fpm.toml*, but for testing it is convenient to pass the file name as an argument.

.. literalinclude:: fpm-lint/src/utils.f90
   :language: fortran
   :caption: src/utils.f90
   :lines: 1-5, 7, 13-15, 49-69

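The included lines only show an excerpt of the utility module.
As a self-contained sketch of the idea, relying only on the standard ``command_argument_count`` and ``get_command_argument`` intrinsics, the helper could be written as a subroutine that leaves its result unallocated when the argument is missing.
The module name *cli_sketch* is hypothetical, and the interface of the actual *get_argument* may differ.

.. code-block:: fortran

   !> Hypothetical sketch of a command line helper, not the tutorial's src/utils.f90
   module cli_sketch
     implicit none
     private

     public :: get_argument

   contains

     !> Retrieve the command line argument at position idx.
     !> The result stays unallocated if the argument does not exist.
     subroutine get_argument(idx, arg)
       !> Position of the requested argument
       integer, intent(in) :: idx
       !> Value of the argument, unallocated on failure
       character(len=:), allocatable, intent(out) :: arg

       integer :: length, stat

       ! Out-of-range positions cannot be retrieved
       if (idx < 0 .or. idx > command_argument_count()) return

       ! First query the length, then allocate and fetch the value
       call get_command_argument(idx, length=length, status=stat)
       if (stat /= 0) return

       allocate(character(len=length) :: arg)
       if (length > 0) then
         call get_command_argument(idx, arg, status=stat)
         if (stat /= 0) deallocate(arg)
       end if
     end subroutine get_argument

   end module cli_sketch

The main program can then fall back to the default manifest by calling ``get_argument(1, manifest)`` followed by ``if (.not. allocated(manifest)) manifest = "fpm.toml"``.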
The first source of errors we can encounter is the parsing of the TOML document itself.
Handling this is outside of our responsibility, but we still want to check whether the error is reported correctly.

.. literalinclude:: fpm-lint/example/0-invalid.toml
   :language: toml
   :caption: fpm.toml (invalid)

Running the linter on this document stops with the following message, produced by the *toml_load* procedure.

.. ansi-block::

   ❯ fpm run -- invalid.toml
   [0;1;31merror[0m[0;1m: Invalid expression for value[0m
    [0;1;34m-->[0m invalid.toml:4:15
     [0;1;34m|[0m
   4 [0;1;34m|[0m package-name =
     [0;1;34m|[0m               [0;1;31m^[0m [0;1;31munexpected newline[0m
     [0;1;34m|[0m

With this case covered, we proceed with reading the configuration for our linter.
Our configuration from the package manifest will be stored in a *lint_config* type, which we define in a separate module.
Reading the configuration starts from the root table, meaning we first have to descend through several subtables (*extra*, *fpm*, and *lint*) before we can process the options for our linter.
We want to report errors with rich context information here as well; therefore, we request the *origin* in every call to the *get_value* interface and produce a report using the *context* we obtained in the main program.

.. literalinclude:: fpm-lint/src/config.f90
   :language: fortran
   :caption: src/config.f90

For convenience, we defined a *make_error* routine to allocate the error handler and store our report from the context.
At this point, we should check whether our error reporting works and run the linter on an incorrect TOML document.

.. literalinclude:: fpm-lint/example/0-incorrect.toml
   :language: toml
   :caption: fpm.toml

.. dropdown:: current main program

   Putting everything together in the main program should look like this.

   .. literalinclude:: fpm-lint/app/main1.f90
      :language: fortran
      :caption: app/main.f90

Running our linter on this file correctly flags an error, since a string value is provided instead of a boolean value.

.. ansi-block::

   ❯ fpm run -- fpm.toml
   [0;1;31merror[0m[0;1m: Entry in 'package-name' must be boolean[0m
    [0;1;34m-->[0m fpm.toml:4:16-21
     [0;1;34m|[0m
   4 [0;1;34m|[0m package-name = "true"
     [0;1;34m|[0m                [0;1;31m^^^^^^[0m [0;1;31mexpected boolean value[0m
     [0;1;34m|[0m

Finally, we define a logging mechanism to capture the actual linting messages, which are not fatal.
The logger provides two procedures: *add_message* to store a message and *show_log* to display all stored messages.

.. literalinclude:: fpm-lint/src/logger.f90
   :language: fortran
   :caption: src/logger.f90

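The logger itself can stay quite small.
The following is a minimal sketch that keeps the messages as plain strings in a growable array and prints them in insertion order.
The names *lint_logger*, *message_type*, and *logger_sketch* are hypothetical and need not match *src/logger.f90*, which may store additional information per message.

.. code-block:: fortran

   !> Hypothetical sketch of a message logger, not the tutorial's src/logger.f90
   module logger_sketch
     use, intrinsic :: iso_fortran_env, only : output_unit
     implicit none
     private

     public :: lint_logger, message_type, add_message, show_log

     !> Wrapper to store strings of different lengths in one array
     type :: message_type
       character(len=:), allocatable :: raw
     end type message_type

     !> Collection of non-fatal lint messages
     type :: lint_logger
       !> Number of stored messages
       integer :: count = 0
       !> Storage for the messages, grown on demand
       type(message_type), allocatable :: messages(:)
     end type lint_logger

   contains

     !> Store a message in the logger
     subroutine add_message(logger, message)
       type(lint_logger), intent(inout) :: logger
       character(len=*), intent(in) :: message

       type(message_type), allocatable :: tmp(:)

       if (.not. allocated(logger%messages)) allocate(logger%messages(8))
       ! Double the storage once it is exhausted
       if (logger%count >= size(logger%messages)) then
         allocate(tmp(2 * size(logger%messages)))
         tmp(:logger%count) = logger%messages(:logger%count)
         call move_alloc(tmp, logger%messages)
       end if

       logger%count = logger%count + 1
       logger%messages(logger%count)%raw = message
     end subroutine add_message

     !> Display all stored messages in the order they were added
     subroutine show_log(logger)
       type(lint_logger), intent(in) :: logger

       integer :: i

       do i = 1, logger%count
         write(output_unit, '(a)') logger%messages(i)%raw
       end do
     end subroutine show_log

   end module logger_sketch

Deferring the output this way lets all checks run to completion before anything is printed, which also leaves room to reorder the messages later.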
Recommended package name
------------------------

As a first linting check, we inspect the package name by applying the following rules:

1. the package name should be a TOML bare key so that it does not require quotes in *dependency* sections; characters like dots, colons, or slashes are not allowed
2. TOML generally favors lowercase dashed keys; therefore, we discourage capitalization (camelCase and PascalCase) as well as underscores (snake_case)
3. there are several ways to declare strings in TOML, and we want to favor the basic (double-quoted) string

An example of a package name we would discourage is *fpmLinter*, as seen in the manifest below.

.. literalinclude:: fpm-lint/example/1-camel-case.toml
   :language: toml
   :caption: fpm.toml

Let's start with our implementation of this check.
For convenience, we re-export the other modules from the *fpm_lint* module, which allows a single clean import in the main program.
Then we define the *lint_data* procedure, where we first check whether the *name* key is present.
If it is not, we create a message at the *info* level and leave our block scope, since all further checks rely on the presence of the entry.
Next, we check whether the entry is provided as a basic string or as something else, like a literal string, which we can flag.
Furthermore, we use the *verify* intrinsic to check that the package name contains only lowercase letters, numbers, and dashes.

.. literalinclude:: fpm-lint/src/lint.f90
   :language: fortran
   :caption: src/lint.f90
   :lines: 1-14, 16-66, 191

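The character check on its own does not depend on TOML Fortran at all.
As a minimal sketch, assuming the allowed set is exactly the lowercase ASCII letters, the digits, and the dash, it reduces to a single call of the ``verify`` intrinsic, which returns zero when every character of the string occurs in the given set.
The function *is_lowercase_dashed* and its module are hypothetical names for illustration.

.. code-block:: fortran

   !> Hypothetical sketch of the character check for package names
   module name_check_sketch
     implicit none
     private

     public :: is_lowercase_dashed

     !> Characters accepted in a recommended package name
     character(len=*), parameter :: allowed_chars = &
       & "abcdefghijklmnopqrstuvwxyz0123456789-"

   contains

     !> Check whether a non-empty name only uses lowercase letters, digits, and dashes
     pure function is_lowercase_dashed(name) result(valid)
       character(len=*), intent(in) :: name
       logical :: valid

       ! verify returns the position of the first character not in the set, or zero
       valid = len(name) > 0 .and. verify(name, allowed_chars) == 0
     end function is_lowercase_dashed

   end module name_check_sketch

A name like *fpmLinter* fails because ``verify`` stops at the uppercase ``L``, while *fpm_lint* fails on the underscore.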
.. tip::

   The ``toml_level`` parameter provides a statically initialized derived type enumerating all available report levels.
   Similarly, the ``token_kind`` parameter provides an enumeration of the token kinds.
   You can think of each of them as an enumerator with a proper namespace.

.. dropdown:: current main program

   Putting everything together in the main program should look like this.

   .. literalinclude:: fpm-lint/app/main2.f90
      :language: fortran
      :caption: app/main.f90

Running this check on the camelCase package name from above produces the following output.

.. ansi-block::

   ❯ fpm run -- fpm.toml
   [0;1;35minfo[0m[0;1m: Package name should be lowercase with dashes[0m
    [0;1;34m-->[0m fpm.toml:1:8-18
     [0;1;34m|[0m
   1 [0;1;34m|[0m name = "fpmLinter"
     [0;1;34m|[0m        [0;1;35m^^^^^^^^^^^[0m
     [0;1;34m|[0m

.. admonition:: Exercise
   :class: note

   Add a check for the length of the package name: everything under three characters is probably a bad choice, and so is an overly long package name.
   Create an example to trigger the error with your new check.
   What happens if an overly long camelCase package name is used?

Bare key paths preferred
------------------------

TOML allows quoting keys; however, this can become visually distracting if some keys are quoted and others are not.
With our package name rule in place, there should be no need to quote any keys, even in dependency sections.
To determine whether a string is used in the context of a key, we need a way to identify all keys.
We could check all entries in the data structure by implementing a visitor object that walks through all tables and checks the keys.
However, this is somewhat inefficient, and we can also miss keys that are not recorded in the data structure.

.. literalinclude:: fpm-lint/example/2-dotted-keys.toml
   :language: toml
   :caption: fpm.toml

In this example, the second occurrence of the key ``toml-f`` only references the table, which is already defined on the line before.
The quotation marks are visually identifiable as lint, and we need a programmatic way to flag them.
Instead of working with the data structure, we use the parser to record more tokens in the context.
Rather than using the context only to report errors, we use it to identify keys.
This is done by increasing the *context_detail* option in the *config* keyword argument of the parser to one.
Now all tokens except for whitespace and comments are recorded.

.. code-block:: fortran
   :caption: app/main.f90

   call toml_load(table, manifest, error=error, context=context, &
      & config=toml_parser_config(color=color, context_detail=1))

.. tip::

   Increasing ``context_detail`` to two also records whitespace and comments.
   This can be useful when writing checks for whitespace or indentation styles.

Our linter pass works as follows:

1. identify all relevant keys in the manifest
2. check whether they are keypath tokens
3. create a report for any key that is a string or a literal

Our implementation reflects this by first collecting an array of *toml_key* objects in *list* and then iterating over all entries to check whether they are of the correct *token_kind*.

.. literalinclude:: fpm-lint/src/lint.f90
   :language: fortran
   :caption: src/lint.f90
   :lines: 67-96

To create the list, we need to implement the *identify_keys* procedure.
The TOML rules for key paths are simple: key paths can appear before an equal sign, and they can only be present in table bodies or inline tables.
This can be implemented with a stack that stores whether the current scope belongs to a table, an array, or a value.
We always push a new scope on the token that opens it, *i.e.* a value is opened by an equal sign, an array by a left bracket, and an inline table by a left curly brace.
To distinguish table headers from inline arrays, we only push arrays onto our stack after an equal sign.
Finally, we default to a table scope if no other scope is present, which completes the rules required to identify key paths.
The endings of the scopes can be identified in the same way; for example, an array ends at a right bracket and an inline table at a right curly brace.
We can then check whether the current scope on the top of the stack allows key paths and record those in our list.

.. literalinclude:: fpm-lint/src/lint.f90
   :language: fortran
   :caption: src/lint.f90
   :lines: 98-189

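To see the scoping rules in isolation, the following sketch runs them over a plain array of token kinds instead of the recorded context.
The integer constants *tk_...* are hypothetical stand-ins for the ``token_kind`` enumeration (all other tokens map to *tk_other*), a fixed-depth array replaces the growable stack, and the assumption that commas and newlines close a value is ours rather than taken from the tutorial's sources.
The function returns a mask marking every token that sits in a key path position.

.. code-block:: fortran

   !> Hypothetical sketch of the scope stack used to find key path positions
   module key_scope_sketch
     implicit none
     private

     public :: mark_keys
     public :: tk_other, tk_keypath, tk_string, tk_literal, tk_equal, tk_comma, &
       & tk_newline, tk_lbracket, tk_rbracket, tk_lbrace, tk_rbrace

     !> Stand-ins for the token kinds recorded in the context
     integer, parameter :: tk_other = 0, tk_keypath = 1, tk_string = 2, &
       & tk_literal = 3, tk_equal = 4, tk_comma = 5, tk_newline = 6, &
       & tk_lbracket = 7, tk_rbracket = 8, tk_lbrace = 9, tk_rbrace = 10

     !> Kinds of scopes kept on the stack
     integer, parameter :: table_scope = 1, array_scope = 2, value_scope = 3

   contains

     !> Mark all tokens which are located in a key path position
     function mark_keys(kinds) result(is_key)
       !> Kinds of the tokens in document order
       integer, intent(in) :: kinds(:)
       !> True for tokens in a key path position
       logical :: is_key(size(kinds))

       integer :: stack(64), top, it

       top = 0
       is_key = .false.
       do it = 1, size(kinds)
         select case(kinds(it))
         case(tk_equal)
           call push(value_scope)
         case(tk_lbracket)
           ! Brackets in a table scope belong to table headers, not arrays
           if (current() /= table_scope) call push(array_scope)
         case(tk_rbracket)
           if (current() == array_scope) call pop()
         case(tk_lbrace)
           call push(table_scope)
         case(tk_rbrace)
           ! Close the last value (if any) and then the inline table itself
           if (current() == value_scope) call pop()
           if (current() == table_scope) call pop()
         case(tk_comma, tk_newline)
           ! Commas and newlines end values, but not arrays
           if (current() == value_scope) call pop()
         case(tk_keypath, tk_string, tk_literal)
           is_key(it) = current() == table_scope
         end select
       end do

     contains

       !> Open a new scope
       subroutine push(scope)
         integer, intent(in) :: scope
         if (top == size(stack)) error stop "scope stack overflow"
         top = top + 1
         stack(top) = scope
       end subroutine push

       !> Close the innermost scope
       subroutine pop()
         if (top > 0) top = top - 1
       end subroutine pop

       !> Innermost scope, defaulting to the top-level table
       function current() result(scope)
         integer :: scope
         if (top > 0) then
           scope = stack(top)
         else
           scope = table_scope
         end if
       end function current

     end function mark_keys

   end module key_scope_sketch

Combining the mask with the token kinds, for instance ``is_key .and. (kinds == tk_string .or. kinds == tk_literal)``, selects exactly the quoted keys that the lint pass should report.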
For convenience, we implement *push_back* and *pop* procedures to add scopes to and remove them from our stack.
The *pop* procedure additionally checks whether the scope to be removed matches the expected one, which saves us some repetition in the loop.
In our utility module, we implement the *resize* procedure for an array of integers:

.. literalinclude:: fpm-lint/src/utils.f90
   :language: fortran
   :caption: src/utils.f90
   :lines: 1-48, 69

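A minimal sketch of these helpers could look as follows, assuming the stack is an allocatable integer array paired with a separate *top* counter.
Here *pop* is written as a subroutine that silently keeps a non-matching scope, and *resize* doubles the capacity by default; the module name and the initial capacity of eight are arbitrary choices, not taken from the tutorial's sources.

.. code-block:: fortran

   !> Hypothetical sketch of the stack helpers, not the tutorial's src/utils.f90
   module scope_stack_sketch
     implicit none
     private

     public :: resize, push_back, pop

   contains

     !> Reallocate an integer array to a new size, keeping the leading entries
     subroutine resize(array, n)
       integer, allocatable, intent(inout) :: array(:)
       !> New size, by default the capacity is doubled
       integer, intent(in), optional :: n

       integer, allocatable :: tmp(:)
       integer :: new_size, keep

       if (present(n)) then
         new_size = n
       else if (allocated(array)) then
         new_size = max(2 * size(array), 8)
       else
         new_size = 8
       end if

       if (allocated(array)) call move_alloc(array, tmp)
       allocate(array(new_size))
       if (allocated(tmp)) then
         keep = min(size(tmp), new_size)
         array(:keep) = tmp(:keep)
       end if
     end subroutine resize

     !> Push a scope onto the stack, growing the storage if needed
     subroutine push_back(stack, top, scope)
       integer, allocatable, intent(inout) :: stack(:)
       integer, intent(inout) :: top
       integer, intent(in) :: scope

       if (.not. allocated(stack)) call resize(stack)
       if (top >= size(stack)) call resize(stack)
       top = top + 1
       stack(top) = scope
     end subroutine push_back

     !> Remove the innermost scope, but only if it matches the expected one
     subroutine pop(stack, top, scope)
       integer, allocatable, intent(in) :: stack(:)
       integer, intent(inout) :: top
       integer, intent(in) :: scope

       if (top < 1) return
       if (stack(top) == scope) top = top - 1
     end subroutine pop

   end module scope_stack_sketch

Keeping *top* separate from the array size means popping never reallocates, and growing by doubling keeps the number of reallocations logarithmic in the nesting depth.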
.. dropdown:: current main program

   Putting everything together in the main program should look like this.

   .. literalinclude:: fpm-lint/app/main3.f90
      :language: fortran
      :caption: app/main.f90

At this point, we can add a call to the new check in our main program and run the linter on the example above.

.. ansi-block::

   ❯ fpm run -- fpm.toml
   [0;1;35minfo[0m[0;1m: String used in key path[0m
    [0;1;34m-->[0m fpm.toml:5:1-8
     [0;1;34m|[0m
   5 [0;1;34m|[0m "toml-f".tag = "v0.2.3"
     [0;1;34m|[0m [0;1;35m^^^^^^^^[0m [0;1;35muse bare key instead[0m
     [0;1;34m|[0m

Now for something trickier: an inline table, to check whether our scoping rules work correctly.

.. literalinclude:: fpm-lint/example/2-inline-table.toml
   :language: toml
   :caption: fpm.toml

Our linter correctly identifies the *tag* entry as a string in a key path context and produces the appropriate message.

.. ansi-block::

   ❯ fpm run -- fpm.toml
   [0;1;35minfo[0m[0;1m: String used in key path[0m
    [0;1;34m-->[0m fpm.toml:4:53-57
     [0;1;34m|[0m
   4 [0;1;34m|[0m toml-f = {git = "https://github.com/toml-f/toml-f", "tag" = "v0.2.3"}
     [0;1;34m|[0m                                                     [0;1;35m^^^^^[0m [0;1;35muse bare key instead[0m
     [0;1;34m|[0m

.. admonition:: Exercise
   :class: note

   Previously, we flagged the use of a literal string as the value of the package name; however, a package manifest can contain many more string values.
   Create a check for all string values in the manifest to ensure they use double quotes.
   Collect string values (*string*, *literal*, *mstring*, and *mliteral*) from array and value scopes for this purpose.
   Can you make a meaningful suggestion if a literal string contains characters that must be escaped in a double-quoted string?

Summary
-------

This concludes the linting we wanted to implement for the fpm package manifest.
For a feature-complete linter, the rule set usually grows over time and might also shift as new rules are adopted.
Our linter currently provides only a few rules, but it can be extended with more checks as the need arises.

.. admonition:: Exercise

   Our output is currently ordered by the checks rather than by the position of the reports in the TOML document.
   The reports might be more intuitive if they were sorted according to their source lines.
   Record the position of the first highlighted character together with each message in the logger.
   Have the logger sort the messages by this position before printing them.

.. important::

   In this tutorial, you have learned how to report custom error messages in your TOML input data.
   You can now

   - report colorized error messages with rich context information
   - create error messages when reading a TOML data structure
   - control the details captured in the context describing the TOML document
   - check a TOML document based on the token information in the context