Glossary
========

Terms in *italics* also appear in this glossary.

allocator
	Usually *allocator* means the *block* allocator, i.e. the logic
	inside a filesystem which decides where to place newly allocated blocks
	in order to maintain several constraints (like data locality, low
	fragmentation).

	In btrfs, allocator may also refer to the *chunk* allocator, i.e. the
	logic behind placing chunks on devices.

balance
	An operation that can be done to a btrfs filesystem, for example
	through ``btrfs balance start /path``. A
	balance passes all data in the filesystem through the *allocator*
	again. It is primarily intended to rebalance the data in the filesystem
	across the *devices* when a device is added or removed. A balance
	will regenerate missing copies for the redundant *RAID* levels, if a
	device has failed. As of Linux kernel 3.3, a balance operation can be
	made selective about which parts of the filesystem are rewritten.

barrier
	An instruction to the disk hardware to ensure that everything before
	the barrier is physically written to permanent storage before anything
	after it. Used in btrfs's *copy-on-write* approach to ensure
	filesystem consistency.

block
	A single physically and logically contiguous piece of storage on a
	device, of size e.g. 4K.

block group
	The unit of allocation of space in btrfs. A block group is laid out on
	the disk by the btrfs *allocator*, and will consist of one or more
	*chunks*, each stored on a different *device*. The number of chunks
	used in a block group will depend on its *RAID* level.

B-tree
	The fundamental storage data structure used in btrfs. Except for the
	*superblocks*, all of btrfs *metadata* is stored in one of several
	B-trees on disk. B-trees store key/item pairs. While the same code is
	used to implement all of the B-trees, there are a few different
	categories of B-tree. The name *btrfs*
	refers to its use of B-trees.

btrfsck
	Tool in *btrfs-progs* that checks a filesystem *offline* (i.e.
	unmounted), and reports on any errors in the filesystem structures it
	finds.  By default the tool runs in read-only mode as fixing errors is
        potentially dangerous.  See also *scrub*.

btrfs-progs
	User mode tools to manage btrfs-specific features. Maintained at
        http://github.com/kdave/btrfs-progs.git . The main frontend to btrfs
        features is the standalone tool *btrfs*, although
        other tools such as *mkfs.btrfs* and *btrfstune* are also part of
        btrfs-progs.

chunk
	A part of a *block group*. Chunks are either 1 GiB in size (for data)
	or 256 MiB (for *metadata*).

chunk tree
	A layer that keeps information about mapping between physical and
	logical block addresses. It's stored within the *system* group.

cleaner
	Usually referred to in the context of deleted subvolumes. It's a background
	process that removes the actual data once a subvolume has been deleted.
	Cleaning can involve lots of IO and CPU activity depending on the
	fragmentation and amount of shared data with other subvolumes.

copy-on-write
	Also known as *COW*. The method that btrfs uses for modifying data.
	Instead of directly overwriting data in place, btrfs takes a copy of
	the data, alters it, and then writes the modified data back to a
	different (free) location on the disk. It then updates the *metadata*
	to reflect the new location of the data. In order to update the
	metadata, the affected metadata blocks are also treated in the same
	way. In COW filesystems, files tend to fragment as they are modified.
	Copy-on-write is also used in the implementation of *snapshots* and
	*reflink copies*. A copy-on-write filesystem is, in theory,
	*always* consistent, provided the underlying hardware supports
	*barriers*.
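
        The update path above can be modelled with a toy block store. This is
        a minimal sketch for illustration, not btrfs code: the ``CowStore``
        class, its naive allocator and the single ``root`` pointer standing in
        for the *metadata* are all assumptions.

        .. code-block:: python

            class CowStore:
                """Toy copy-on-write block store (illustrative only)."""

                def __init__(self):
                    self.blocks = {}    # block number -> bytes
                    self.next_free = 0  # naive allocator: always a fresh block
                    self.root = None    # "metadata": block holding current data

                def _allocate(self, data):
                    nr = self.next_free
                    self.next_free += 1
                    self.blocks[nr] = data
                    return nr

                def write(self, data):
                    # Never overwrite in place: store the new version in a
                    # free block, then switch the metadata pointer to it.
                    new_nr = self._allocate(data)
                    old_nr, self.root = self.root, new_nr
                    return old_nr

                def read(self):
                    return self.blocks[self.root]


            store = CowStore()
            store.write(b"version 1")
            snapshot_root = store.root  # keeping the old pointer keeps the old data
            store.write(b"version 2")

        In this model the old block is still intact after the second write,
        which is the property that makes snapshots cheap.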

COW
	See *copy-on-write*.

default subvolume
	The *subvolume* in a btrfs filesystem which is mounted when mounting
	the filesystem without using the ``subvol=`` mount option.

device
	A Linux block device, e.g. a whole disk, partition, LVM logical volume,
	loopback device, or network block device. A btrfs filesystem can reside
	on one or more devices.

df
	A standard Unix tool for reporting the amount of space used and free in
	a filesystem. The standard tool does not give accurate results on
	btrfs, but the *btrfs* command from *btrfs-progs* has an implementation
	of *df* (``btrfs filesystem df``) which shows space available in more
	detail. See the FAQ entry "Why does df show incorrect free space for my
	RAID volume?" for a more detailed explanation of btrfs free space
	accounting.

DUP
	A form of "*RAID*" which stores two copies of each piece of data on
	the same *device*. This is similar to *RAID-1*, and protects
	against *block*-level errors on the device, but does not provide any
	guarantees if the entire device fails. By default, ``mkfs.btrfs`` from
	btrfs-progs 5.15 and later uses the *DUP* profile for metadata on all
	single-device filesystems (earlier versions used the *single* profile
	on non-rotational devices), and the *RAID-1* profile on filesystems
	with more than one device.

ENOSPC
	Error code returned by the OS to a user program when the filesystem
	cannot allocate enough space to fulfill the user's request. In most
	filesystems, it indicates there is no free space available in the
	filesystem. Due to the additional space requirements from btrfs's
	*COW* behaviour, btrfs can sometimes return ENOSPC when there is
	apparently (in terms of *df*) a large amount of space free. This is
	effectively a bug in btrfs, and (if it is repeatable), using the mount
	option ``enospc_debug`` may give a report
	that will help the btrfs developers. See the FAQ entry on free space.

extent
	Contiguous sequence of bytes on disk that holds file data.

	A file stored on disk with 3 extents means that it consists of three
	fragments of contiguous bytes. See *filefrag*. A file in one extent
	would mean it is not fragmented.
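
        The relationship between extents and fragmentation can be sketched by
        counting runs of consecutive blocks. This is only an illustration:
        ``count_extents`` and the block numbers are hypothetical, and real
        tools obtain the extent list from the kernel rather than from a
        Python list.

        .. code-block:: python

            def count_extents(physical_blocks):
                """Count runs of consecutive block numbers, in file order."""
                extents = 0
                previous = None
                for block in physical_blocks:
                    # A new extent starts whenever the next block does not
                    # directly follow the previous one on disk.
                    if previous is None or block != previous + 1:
                        extents += 1
                    previous = block
                return extents


            fragmented = [100, 101, 102, 500, 501, 900]  # three separate runs
            contiguous = [100, 101, 102, 103]            # a single run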

extent buffer
	An abstraction to allow access to *B-tree* blocks larger than the page size.

fallocate
	Command line tool in util-linux, and a syscall, that reserves space in
	the filesystem for a file, without actually writing any file data to
	the filesystem. The first data write will turn the preallocated extents
	into regular ones. See the *fallocate(1)* and *fallocate(2)* manual
	pages for more details.
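
        A minimal sketch of preallocation from Python follows. It uses
        ``os.posix_fallocate()``, which wraps *posix_fallocate(3)* rather than
        calling *fallocate(2)* directly, so behaviour on filesystems without
        native support may differ. The ``preallocate`` helper and the file
        name are assumptions.

        .. code-block:: python

            import os
            import tempfile


            def preallocate(path, size):
                """Reserve `size` bytes for `path` without writing file data."""
                fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
                try:
                    os.posix_fallocate(fd, 0, size)  # offset 0, length `size`
                finally:
                    os.close(fd)
                return os.stat(path).st_size


            with tempfile.TemporaryDirectory() as tmp:
                reserved = preallocate(os.path.join(tmp, "prealloc.bin"), 1024 * 1024)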

filefrag
	A tool to show the number of extents in a file, and hence the amount of
	fragmentation in the file. It is usually part of the e2fsprogs package
	on most Linux distributions. While initially developed for the ext2
	filesystem, it works on Btrfs as well. It uses the *FIEMAP* ioctl.

free space cache
	Btrfs doesn't track free space; it only tracks allocated space. Free
	space is by definition any holes in the allocated space, but finding
	these holes is actually fairly I/O intensive. The free space cache
	stores a condensed representation of what is free. It is updated on
	every *transaction* commit.
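
        The idea that free space is whatever lies between allocated ranges can
        be sketched as follows. This is an illustration only: ``free_ranges``
        and the ``(start, length)`` tuples are hypothetical and do not reflect
        the on-disk format of the cache.

        .. code-block:: python

            def free_ranges(allocated, total_size):
                """Return the (start, length) holes between allocated extents."""
                holes = []
                cursor = 0  # end of the space accounted for so far
                for start, length in sorted(allocated):
                    if start > cursor:
                        holes.append((cursor, start - cursor))
                    cursor = max(cursor, start + length)
                if cursor < total_size:
                    holes.append((cursor, total_size - cursor))
                return holes


            allocated = [(0, 4), (10, 6), (6, 2)]
            holes = free_ranges(allocated, 20)

        Computing the holes requires walking every allocated range, which is
        the cost that a cached, condensed representation avoids.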

fsync
	On Unix and Unix-like operating systems (Linux being one of the
	latter), the ``fsync()`` system call causes all buffered changes to the
	data behind a file descriptor to be flushed to the underlying block
	device. When a file is modified on a modern operating system, the
	changes are generally not written to the disk immediately but are
	buffered in memory for performance reasons. Calling ``fsync()`` causes
	any in-memory changes to be written to disk.
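
        A short sketch of a write followed by ``fsync()`` from Python is shown
        below. The ``durable_write`` helper and the file name are assumptions;
        ``os.fsync()`` is the standard library wrapper for the system call.

        .. code-block:: python

            import os
            import tempfile


            def durable_write(path, data):
                """Write `data` to `path` and ask the kernel to flush it."""
                with open(path, "wb") as f:
                    f.write(data)
                    f.flush()             # drain Python's own buffer to the kernel
                    os.fsync(f.fileno())  # ask the kernel to write it to storage


            with tempfile.TemporaryDirectory() as tmp:
                target = os.path.join(tmp, "example.dat")
                durable_write(target, b"hello")
                with open(target, "rb") as f:
                    content = f.read()

        The ``flush()`` call matters here: without it the data may still sit
        in the process's own buffer, where ``fsync()`` cannot see it.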

generation
	An internal counter which is incremented for each *transaction*. When a
	*metadata* block is written (using *copy-on-write*), the current
	generation is stored in the block, so that blocks which are too new
	(and hence possibly inconsistent) can be identified.

key
	A fixed-size tuple used to identify and sort items in a *B-tree*.
	The key is broken up into 3 parts: *objectid*, *type*, and
	*offset*. The *type* field indicates how each of the other two
	fields should be used, and what to expect to find in the item.

item
	A variable-size structure stored in B-tree leaves. Items hold
	different types of data depending on key type.
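
        How a three-part *key* orders items can be sketched with a named
        tuple, assuming keys compare by objectid, then type, then offset. The
        ``Key`` class, the numeric values and the item descriptions below are
        placeholders chosen for illustration.

        .. code-block:: python

            from typing import NamedTuple


            class Key(NamedTuple):
                objectid: int
                type: int
                offset: int


            # Hypothetical items: the strings stand in for variable-size item data.
            items = {
                Key(257, 108, 4096): "file extent at offset 4096",
                Key(256, 1, 0): "inode item",
                Key(257, 1, 0): "inode item",
                Key(257, 108, 0): "file extent at offset 0",
            }

            # Tuples compare field by field, so sorting groups everything
            # that belongs to one objectid together.
            ordered = sorted(items)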

log tree
	A *B-tree* that temporarily tracks ongoing metadata updates until a full
	transaction commit is done. It's a performance optimization of
	``fsync``. The log entries tracked in the tree are replayed if the
	filesystem is not unmounted cleanly.

metadata
	Data about data. In btrfs, this includes all of the internal data
	structures of the filesystem, including directory structures,
	filenames, file permissions, checksums, and the location of each file's
	*extents*. All btrfs metadata is stored in *B-trees*.

mkfs.btrfs
	The tool (from *btrfs-progs*) to create a btrfs filesystem.

offline
	A filesystem which is not mounted is offline. Some tools (e.g.
	*btrfsck*) will only work on offline filesystems. Compare *online*.

online
	A filesystem which is mounted is online. Most btrfs tools will only
	work on online filesystems. Compare *offline*.

orphan
        A file that's still in use (opened by a running process) but all
        directory entries of that file have been removed.

RAID
	A class of different methods for writing some additional redundant data
	across multiple *devices* so that if one device fails, the missing
	data can be reconstructed from the remaining ones. See *RAID-0*,
	*RAID-1*, *RAID-5*, *RAID-6*, *RAID-10*, *DUP* and
	*single*. Traditional RAID methods operate across multiple devices of
	equal size, whereas btrfs's RAID implementation works inside *block
	groups*.

RAID-0
	A form of *RAID* which provides no form of error recovery, but
	stripes a single copy of data across multiple devices for performance
	purposes. The stripe size is currently fixed at 64 KiB.
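
        Round-robin striping with a fixed stripe size can be sketched as
        below. This shows generic RAID-0 arithmetic only: the ``locate``
        helper is hypothetical, and btrfs performs its own mapping inside
        *block groups*.

        .. code-block:: python

            STRIPE_SIZE = 64 * 1024  # the stripe size stated above


            def locate(offset, num_devices, stripe_size=STRIPE_SIZE):
                """Map a byte offset to (device index, offset on that device)."""
                stripe_nr = offset // stripe_size
                device = stripe_nr % num_devices  # stripes rotate over the devices
                row = stripe_nr // num_devices    # full rounds completed so far
                return device, row * stripe_size + offset % stripe_size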

RAID-1
	A form of *RAID* which stores two complete copies of each piece of
	data. Each copy is stored on a different *device*. btrfs requires a
	minimum of two devices to use RAID-1. This is the default for btrfs's
	*metadata* on more than one device.

RAID-5
	A form of *RAID* which stripes a single copy of data across multiple
	*devices*, including one device's worth of additional parity data.
	Can be used to recover from a single device failure.
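
        Single-parity recovery can be illustrated with XOR, the usual choice
        for this kind of parity. The sketch below is a generic example, not
        btrfs code: ``xor_blocks`` and the four-byte "blocks" are assumptions.

        .. code-block:: python

            from functools import reduce


            def xor_blocks(blocks):
                """XOR equally sized byte strings together, byte by byte."""
                return bytes(
                    reduce(lambda a, b: a ^ b, column) for column in zip(*blocks)
                )


            data = [b"AAAA", b"BBBB", b"CCCC"]  # one stripe element per data device
            parity = xor_blocks(data)           # stored on a further device

            # Device 1 fails: rebuild its element from the survivors plus parity.
            rebuilt = xor_blocks([data[0], data[2], parity])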

RAID-6
	A form of *RAID* which stripes a single copy of data across multiple
	*devices*, including two devices' worth of additional parity data. Can
	be used to recover from the failure of two devices.

RAID-10
	A form of *RAID* which stores two complete copies of each piece of
	data, and also stripes each copy across multiple devices for
	performance.

reflink
	Parameter to ``cp``, allowing it to take advantage of the
	capabilities of *COW*-capable filesystems. Allows for files to be
	copied and modified, with only the modifications taking up additional
	storage space. May be considered as *snapshots* on a single file rather
	than a *subvolume*. Example: ``cp --reflink file1 file2``

relocation
	The process of moving block groups within the filesystem while
	maintaining full filesystem integrity and consistency. This
	functionality underlies the *balance* and *device* removal features.

scrub
	An *online* filesystem checking tool. Reads all the data and metadata
	on the filesystem, and uses *checksums* and the duplicate copies from
	*RAID* storage to identify and repair any corrupt data.

seed device
	A read-only device can be used as a filesystem seed or template (e.g. a
	CD-ROM containing an OS image). Read/write devices can be added to
	store modifications (using *copy-on-write*), and changes to the writable
	devices are persistent across reboots. The original device remains
	unchanged and can be removed at any time (after Btrfs has been
	instructed to copy over all missing blocks). Multiple read/write file
	systems can be built from the same seed.

single
	A "*RAID*" level in btrfs, storing a single copy of each piece of data.
	The default for data (as opposed to *metadata*) in btrfs. Before
	btrfs-progs 5.15, single was also the default metadata profile for
	non-rotational (SSD, flash) devices.

snapshot
	A *subvolume* which is a *copy-on-write* copy of another subvolume. The
	two subvolumes share all of their common (unmodified) data, which means
	that snapshots can be used to keep the historical state of a filesystem
	very cheaply. After the snapshot is made, the original subvolume and
	the snapshot are of equal status: the original does not "own" the
	snapshot, and either one can be deleted without affecting the other
	one.

subvolume
	A tree of files and directories inside a btrfs filesystem that can be
	mounted as if it were an independent filesystem. A subvolume is created
	by taking a reference on the root of another subvolume. Each btrfs
	filesystem has at least one subvolume, the *top-level subvolume*, which
	contains everything else in the filesystem. Additional subvolumes can
	be created and deleted with the *btrfs* tool. All subvolumes share the
	same pool of free space in the filesystem. See also *default subvolume*.

superblock
	The *block* on the disk, at a fixed known location and of fixed size,
	which contains pointers to the disk blocks containing all the other
	filesystem *metadata* structures. btrfs stores multiple copies of the
	superblock on each *device* in the filesystem, at offsets 64 KiB, 64 MiB
	and 256 GiB (a further offset of 1 PiB is reserved), provided the device
	is large enough.
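
        The first three offsets follow a simple pattern, sketched below. The
        sketch assumes that the primary copy sits at 64 KiB and that each
        further copy sits at 16 KiB shifted left by 12 bits per copy, and it
        stops at three copies. The ``superblock_offset`` helper is
        hypothetical.

        .. code-block:: python

            KIB = 1024


            def superblock_offset(mirror):
                """Byte offset of superblock copy `mirror` (0 is the primary)."""
                if mirror == 0:
                    return 64 * KIB
                # Assumed rule: 16 KiB shifted left by 12 bits per mirror.
                return (16 * KIB) << (12 * mirror)


            offsets = [superblock_offset(m) for m in range(3)]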

system array
	Cryptic name of *superblock* metadata describing how to assemble a
	filesystem from multiple devices. Prior to mount, either the command
	``btrfs dev scan`` has to be called, or all the devices have to be
	specified via the mount option ``device=/dev/ice``.

top-level subvolume
	The *subvolume* at the very top of the filesystem. This is the only
	subvolume present in a newly-created btrfs filesystem. Internally it has
	ID 5, but it can also be referenced as ID 0 (e.g. within the
	*set-default* subcommand of *btrfs*).

transaction
	A consistent set of changes. To avoid generating very large amounts of
	disk activity, btrfs caches changes in RAM for up to 30 seconds (less
	if the filesystem is running short on space or handling many *fsync*
	calls), and then writes (commits) these changes out to disk in one go
	(using *copy-on-write* behaviour). This period of caching is called a
	transaction. Only one transaction is active on the filesystem at any
	one time.

transid
	An alternative term for *generation*.

writeback
	*Writeback* in the context of the Linux kernel can be defined as the
	process of writing "dirty" memory from the page cache to the disk,
	when certain conditions are met (timeout, number of dirty pages over a
	ratio).