1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569
|
Development notes
=================
Collection of various notes about development practices, how-to's or
checklists.
Adding a new ioctl, extending an existing one
---------------------------------------------
- add code to `strace <https://github.com/strace/strace>`__ so the ioctl calls
are parsed into a human readable form. Most of the ioctls are already
`implemented <https://github.com/strace/strace/blob/master/src/btrfs.c>`__ and
can be used a reference.
Tracepoints
-----------
The tracepoint message format should be compact and consistent, so please stick
to the following format:
- *key=value* no spaces around *=*
- separated by spaces, not commas
- named values: print value and string, like "%llu(%s)", no space between,
string in parens
- avoid abbreviating key values too much, (e.g. use 'size' not 'sz')
- hexadecimal values are always preceded by *0x* (use "0x%llx")
- use *struct btrfs_inode* for inode types, not plain *struct inode*
- inode number type is *u64*, use *btrfs_ino* if needed
- key can be printed as *[%llu,%u,%llu]*
- enum types need to be exported as *TRACE_DEFINE_ENUM*
**Example:**
event: *btrfs__chunk*
string: ``"root=%llu(%s) offset=%llu size=%llu num_stripes=%d sub_stripes=%d type=%s"``
Error messages, verbosity
-------------------------
- use *btrfs_\** helpers (btrfs_err, btrfs_info, ...), they print a filesystem
identification like ``BTRFS info (device sdb): ...``
- first letter in the string is lowercase
- message contents
- be descriptive
- keep the text length reasonable (fits one line without wrapping)
- no typos
- print values that refer to what happened (inode number, subvolume
id, path, offset, ...)
- print error value from the last call
- no *"\\n"* at the end of the string
- no *".*'' at the end of text
- un-indent the string so it fits under 80 columns
- don't split long strings, overflow 80 is ok in this case (we want
to be able to search for the error messages in the sources easily)
- value representation
- decimal: offsets, length, ordinary numbers
- hexadecimal: checksums
- hexadecimal + string: bitmasks (e.g. raid profiles, flags)
- intervals of integers:
- closed interval (end values inclusive): [0, 4096]
- half-open (right value excluded): [0, 4096)
- half-open (left value excluded): (0, 4096] -- that one may look
strange but is used in several places
Message level
-------------
- btrfs_err -- such messages have high visibility so use them for serious
problems that need user attention
- btrfs_warn -- conditions that are not too serious but can point to potential
problems, the system should be still in a good state
- btrfs_info -- use for informative messages that are useful to see what's
happening in the filesystem or might help debugging problems in the future
and are worth keeping in the logs
Error handling and transaction abort
------------------------------------
Please keep all transaction abort exactly at the place where they happen and do
not merge them to one. This pattern should be used everywhere and is important
when debugging because we can pinpoint the line in the code from the syslog
message and do not have to guess which way it got to the merged call.
Error handling and return values
--------------------------------
Functions are supposed to handle all errors of the callees and clean up the
local context before returning. If a function does not need to pass errors to
the caller it can be switched to return *void*. Before doing so make sure that:
- the function does not call any BUG/BUG_ON
- all callees properly handle errors and do not call BUG/BUG_ON in place of
error handling
- the whole call chain starting from the function satisfies the above
Handling unexpected conditions
------------------------------
This is different than error handling. An unexpected condition happens when the
code invariants/assumptions do not hold and there's no way to recover from the
situation. This means that returning an error to the caller can't be done and
continuing would only propagate the logic error further. The reasons for that
bug can be two fold: internal (a genuine bug) or external (e.g. memory bitflip,
memory corrupted by other subsystem). In this case it is allowed to use the
nuclear option and do BUG_ON, that is otherwise highly discouraged.
There are several ways how to react to the unexpected conditions:
- btrfs_abort_transaction()
The recommended way if and only if we can not recover from the situation and
have a transaction handle.
This would cause the filesystem to be flipped read-only to prevent further
corruption.
Additionally call trace would be dumped for the first btrfs_abort_transaction()
call site.
- ASSERT()
Conditionally compiled in and crashes when the condition is false.
This should only be utilized for debugging purposes, acts as a fail-fast
option for developers, thus should not be utilized for error handling.
It's recommended only for very basic (thus sometimes unnecessary) requirements.
Such usage should be easy to locate, have no complex call chain.
E.g. to rule out invalid function parameter combinations.
Should not be utilized on any data/metadata reads from disks, as they can be
invalid. For sanity check examples of on-disk metadata, please refer to
`Tree checker`.
- WARN_ON
Unconditional and noisy checks, but still allow the code to continue.
Should only be utilized if a call trace is critical for debugging.
Not recommended if:
* The call site is unique or can be easily located
In that case, an error message is recommended.
* The call site would eventually lead to a btrfs_abort_transaction() call
btrfs_abort_transaction() call would dump the stack anyway.
If the call trace is critical, it's recommended to move the
btrfs_abort_transaction() call closer to the place where the error happens.
- BUG_ON
Should not be utilized, and is incrementally removed or replaced in the code.
Error injection using eBPF
--------------------------
Functions can be annotated to enable error injection using the eBPF scripts.
See e.g. ``disk-io.c:open_ctree``. For btrfs-specific injection, the annotation
is ALLOW_ERROR_INJECTION, but beware that this only overrides the return value
and this can leak memory or other resources. For error injection to generic
helpers (e.g. memory allocation), you can use something like
``bcc/tools/inject.py kmalloc btrfs_alloc_device() -P 0.001``
Resources:
- `eBPF <https://ebpf.io/>`_
- `BCC tools <https://github.com/iovisor/bcc>`_
Warnings and issues found by static checkers and similar tools
--------------------------------------------------------------
There are tools to automatically identify known issues in code and report them
as problems to be fixed, but not all such reports are valid or relevant in the
context of the code base. The fix should really fix the code, not just the
tool's warning. Such patches will be rejected with explanation first time and
ignored when sent repeatedly. Patches fixing real problems with a good
explanation are welcome. If you're not sure about sending such patch, please
ask the https://kernelnewbies.org/KernelJanitors for help.
Do not blindly report issues caught by:
- checkpatch.pl -- the script is good for catching some coding style but this
whole wiki page exists to be explicit what we want, not necessarily what
checkpatch wants
- clang static analyzer -- lots of the reports are not real problems and may
depend on a condition that's not recognized by the checker
- gcc -Wunused -- any of the -Wunused-\* options can report a valid issue but
it must be viewed in wider context and not just removed to get rid of the
warning
- codespell -- fixing typos is fine but should be done in batches and over
whole code base
Hints:
- if you find an issue, look in the whole code base if there are more instances
same or following a similar pattern
- look into git history of the changed code, why it got there and when, it may
help to understand if it's a bug or e.g. a stale code
Coding style preferences
------------------------
Before applying recommendations from this sections, please make sure you're
familiar with the `kernel coding style guide
<https://www.kernel.org/doc/html/latest/process/coding-style.html%7Cgeneric>`__.
The purpose of coding style is to maintain unified and consistent look & feel
of the patches and code, keeping distractions to minimum which decreases
cognitive load and allows focus on the important things. Coding style is not
only where to put white space or curly brackets but also coding patterns with
meaning that is established and understood in the developer group. The code in
linux kernel is maintained for a long period of time and maintainability is of
crucial importance. This means it does take time to write good code, with
attention to detail. Once written the code could stay unchanged for years but
will be read many times. `Read more
<https://simpleprogrammer.com/maintaining-code/>`__.
General advice: *Try to keep the same style and formatting of the code that's
already there.*
Patches
^^^^^^^
- for patch subject use ``btrfs:`` followed by a lowercase
- read the patch again and fix all typos and grammar
- size units should use short form K/M/G/T or IEC form KiB/MiB/GiB
- don't write references to parameters to subject (like removing @pointer)
- do not end subject with a dot '.'
- parts of btrfs that could have a subject prefix to point to a specific subsystem
- scrub, tests, integrity-checker, tree-checker, discard, locking, sysfs,
raid56, qgroup, compression, send, ioctl
- additional information
- if there's a stack trace relevant for the patch, add it there (lockdep,
crash, warning)
- steps to reproduce a bug (that will also get turned to a proper fstests
case)
- sample output before/after if it could have impact on userspace
- `pahole <https://linux.die.net/man/1/pahole>`__ output if structure is
being reorganized and optimized
Function declarations
^^^^^^^^^^^^^^^^^^^^^
- avoid prototypes for static functions, order them in new code in a way that
does not need it
- but don't move static functions just to get rid of the prototype
- exported functions have ``btrfs_`` prefix
- do not use functions with double underscore, there's only a few valid uses of
that, namely when *\__function* is doing the core of the work with looser
checks, no locks or more parameters than *function*
- function type and name are on the same line, if this is too long, the
parameters continue on the next line (indented)
- ``static inline`` functions should be small (measured by their resulting binary
size) and not used in .c unless it's justified that inlining makes a difference
- conditional definitions should follow the style below, where the full
function body is in .c
.. code-block:: c
#ifdef CONFIG_BTRFS_DEBUG
void btrfs_assert_everything_is_fine(void *ptr);
#else
void btrfs_assert_everything_is_fine(void *ptr) { }
#endif
Headers
^^^^^^^
- leave one newline before ``#endif`` in headers
- include headers that add usage of a data structure or API, also remove such
header with last use of the API
- keep the includes in headers self-contained and compilable, add forward
declarations for types if possible
Comments
^^^^^^^^
- function comment goes to the .c file, not the header
- ``kdoc style`` is recommended but the exact syntax is not mandated and
we're using only subset of the formatting
- the first line (mandatory): contains a brief description of what
the function does and should provide a summary information
- do write in the imperative style "Iterate all pages and clear
some bits"
- don't write "This function is a helper to ...", "This is used
to ..."
- parameter description (optional):
- each line describes the parameter
- the list needs to be in the same order as for the function
- the list needs to be complete
- trivial parameters don't need to be explained, e.g. fs_info is
clear so the description could be 'the filesystem'
- context of the parameters matters a lot in some cases and
cannot be inferred from the name, then it should be documented
.. code-block:: c
/*
* Look for blocks in the given offset.
*
* @fs_info: trivial parameters should be in the list but with some short description
* @offset: describe the context of the argument, e.g. offset to page or inode ...
*
* Long description comes here if necessary.
*
* Return value semantics if it's not obvious
*/
- comment new ``enum/define`` values, brief description or pointers to the code
that uses them
- comment new ``data structures``, their purpose and context of use
- do not put struct member comments on the same line, put it on the line
before and do not trim/abbreviate the text
- comment text that fits on one line can use the */\* text \*/* format, slight
overflow of 80 chars is OK
Misc
^^^^
- fix spelling, grammar and formatting of comments that get moved or changed
- fix coding style in code that's only moved
- one newline between functions
- 80 chars per line are recommended but longer lines are OK (up to 90) if the
code "looks better" without the line break, e.g. if half of the word is beyond 80 chars
but it's clear what it is, or function prototypes do not need to wrap arguments
Locking
^^^^^^^
- do not use ``spin_is_locked`` but ``lockdep_assert_held``
- do not use ``assert_spin_locked`` without reading it's semantics (it does
not check if the caller hold the lock)
- use ``lockdep_assert_held`` and its friends for lock assertions (``lockdep_assert_not_held``,
``lockdep_assert_held_write``)
- add lock assertions to functions called deep in the call chain
- *(to be decided)* do not use ``scoped_guard()`` instead of explicit locking section
- *(to be decided)* using ``guard()`` is possible in justified cases where it improves
readability, i.e. short function (20 lines) with entire body under one lock
Code
^^^^
- default function return value is ``int ret``, temporary return values should
be named like ``ret2`` etc
- structure initializers should use ``{ 0 }``
- do not use ``short int`` type if possible, if it fits to char/u8 use it instead,
or plain int/u32
- memory barriers need to be always documented
- add ``const`` to parameters that are not modified
- use ``bool`` for indicators and bool status variables (not int)
- use matching int types (size, signedness), with exceptions
- use ``ENUM_BIT`` for enumerated bit values (that don't have assigned fixed numbers)
- add function annotations ``__cold``, ``__init``, ``__attribute_const__`` if applicable
- use automatic variable cleanup for:
- *struct btrfs_path* with ``BTRFS_PATH_AUTO_FREE``
- any variable allocated by *kmalloc/kfree* using ``AUTO_KFREE`` or ``AUTO_KVFREE``
- for structures sparsely used do ``__free(...)`` when it does not make sense
to define the cleanup macro
- use of ``unlikely()`` annotation is OK and recommended for the following cases:
- control flow of the function is changed due to error handling and it
leads to *never-happens* errors like ``EUCLEAN``, ``EIO``
Output
^^^^^^
- when dumping a lot of data after an error, consider what will remain visible
last
- in case of ``btrfs_print_leaf``, print the specific error message after
that
Expressions, operators
^^^^^^^^^^^^^^^^^^^^^^
- spaces around ``binary operators``, no spaces for unary operators
- extra braces around expressions that might be harder to understand wrt
precedence are fine (logical and/or, shifts with other operations)
- *a \* b + c*, *(a << b) + c*, *(a % b) + c*
- ``64bit division`` is not allowed, either avoid it completely, or use bit
shifts or use div_u64 helpers; do not use *do_div* for division as it's a
macro and has side effects
- do not use chained assignments: no *a = b = c;*
Variable declarations, parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- declaration block in a function should do only simple initializations
(pointers, constants); nothing that would require error handling or has
non-trivial side effects
- use ``const`` extensively
- add temporary variable to store a value if it's used multiple times in the
function, or if reading the value needs to chase a long pointer chain
- do not mix declarations and statements (although kernel uses C standard that
allows that)
Kernel config options
---------------------
Compile-time config options for kernel that can help debugging, testing. They
usually take a hit on performance or resources (memory) so they should be
selected wisely. The options in **bold** should be safe to use by default for
debugging builds.
Please refer to the option documentation for further details.
- devices for testing
- **CONFIG_BLK_DEV_LOOP** - enable loop device
- for fstests: **DM_FLAKEY**, **CONFIG_FAIL_MAKE_REQUEST**
- **CONFIG_SCSI_DEBUG** - fake scsi block device
- memory
- **CONFIG_SLUB_DEBUG** - boot with slub_debug
- CONFIG_DEBUG_PAGEALLOC + CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT (on
newer kernels)
- CONFIG_SCHED_STACK_END_CHECK
- CONFIG_PAGE_POISONING
- CONFIG_HAVE_DEBUG_KMEMLEAK
- CONFIG_FAILSLAB -- fault injection to kmalloc
- CONFIG_DEBUG_LIST
- CONFIG_BUG_ON_DATA_CORRUPTION
- btrfs
- **CONFIG_BTRFS_DEBUG**
- **CONFIG_BTRFS_ASSERT**
- **CONFIG_BTRFS_EXPERIMENTAL**
- **CONFIG_BTRFS_FS_RUN_SANITY_TESTS** -- basic tests on module load
- locking
- CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_MUTEXES
- CONFIG_DEBUG_LOCK_ALLOC
- CONFIG_PROVE_LOCKING, CONFIG_LOCKDEP
- CONFIG_LOCK_STAT
- CONFIG_PROVE_RCU
- CONFIG_DEBUG_ATOMIC_SLEEP
- sanity checks
- CONFIG_DEBUG_STACK_USAGE, CONFIG_HAVE_DEBUG_STACKOVERFLOW,
CONFIG_DEBUG_STACKOVERFLOW
- CONFIG_STACKTRACE
- CONFIG_KASAN -- address sanitizer
- CONFIG_UBSAN -- undefined behaviour sanitizer
- CONFIG_KCSAN -- concurrency checker
- verbose reporting
- CONFIG_DEBUG_BUGVERBOSE
- tracing
- CONFIG_TRACING etc
BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Not a bug and please don't report it. The lockdep structures can get in some
cases full and cannot properly track locks anymore. There's only a workaround
to increase the kernel config value of CONFIG_LOCKDEP_CHAINS_BITS, default is
16, 18 tends to work, increase if needed.
fstests setup
-------------
The `fstests <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/>`_
suite has very few "hard" requirements and will succeed without
actually running many tests. In order to ensure full test coverage, your test
environment should provide the settings from the following sections. Please
note that newly added tests silently add new dependencies, so you should always
review results after an update.
Kernel config options for complete test coverage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- ``CONFIG_FAULT_INJECTION=y``
- ``CONFIG_FAULT_INJECTION_DEBUG_FS=y``
- ``CONFIG_FAIL_MAKE_REQUEST=y``
- ``CONFIG_DM_FLAKEY=m`` or ``y``
- ``CONFIG_DM_THIN_PROVISIONING=m`` or ``y``
- ``CONFIG_DM_SNAPSHOT=m`` or ``y``
- ``CONFIG_DM_DELAY=m`` or ``y``
- ``CONFIG_DM_ERROR=m`` or ``y``
- ``CONFIG_DM_LOG_WRITES=m`` or ``y``
- ``CONFIG_DM_DUST=m`` or ``y``
- ``CONFIG_DM_ZERO=m`` or ``y``
- ``CONFIG_BLK_DEV_LOOP=m`` or ``y``
- ``CONFIG_EXT4_FS=m`` or ``y``
- ``CONFIG_SCSI_DEBUG=m``
- ``CONFIG_BLK_DEV_ZONED=y`` for zoned mode test coverage
- ``CONFIG_IO_URING==y``
Kernel config options for better bug reports
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
See the list in the section above for more options.
User space utilities and development library dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- acl
- attr
- btrfsprogs
- dbench
- dmsetup (device-mapper)
- duperemove
- e2fsprogs
- fio
- fsverity-utils
- libacl
- libaio
- libattr
- libcap-progs
- libuuid
- lvm2
- openssl
- parted
- psmisc (killall)
- xfsprogs >= 4.3.1 (``xfs_io -c reflink`` is required)
Note: This list may be incomplete.
Storage environment
^^^^^^^^^^^^^^^^^^^
- at least 4 identically sized partitions/disks/virtual disks, specified using
``$SCRATCH_DEV_POOL``
- some tests may require 8 equally sized ``SCRATCH_DEV_POOL`` partitions
- some tests need at least 10G of free space, as determined by ``df``, i.e.
the size of the device may need to be larger, 12G should work
- some tests require ``$LOGWRITES_DEV``, yet another separate block device,
for power fail testing
- for testing trim and discard, the devices must be capable of that (physical
or virtual)
Other requirements
^^^^^^^^^^^^^^^^^^
- An ``fsgqa`` user and group must exist.
- An ``fsgqa2`` user and group must exist.
- The user ``nobody`` must exist.
- An ``123456-fsgqa`` user and group must exist.
|