1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394
|
fstransform idea comes from now-abandoned program convertfs (http://tzukanov.narod.ru/convertfs)
and from the author's desire to return to C/C++ programming after a long hiatus.
fstransform is designed and implemented from scratch with safety in mind:
every reasonable precaution is taken in order not to lose/corrupt data.
Anyway, the operations performed by fstransform, and especially the
goal NOT to need or use a backup, are intrinsically VERY RISKY.
For this reason, no matter how well (or badly) designed fstransform
actually is, there are chances that by using it you will LOSE YOUR DATA.
Please understand that the risk is only yours, and that you take full
responsibility for using fstransform. In other words, the following standard
disclaimer applies:
THE AUTHOR DECLINES ALL RESPONSIBILITIES FOR ANY DAMAGE THAT MAY DERIVE
FROM USING THE PROGRAMS AND PROCEDURES DESCRIBED IN THIS DOCUMENT
Having said that, let's move to the requirements that led to fstransform:
############################### Requirements ##################################
1) be able function with very limited free disk space
2) be able to remap in-place a block device from any filesystem to any other,
preserving file, directories, soft links, hard links, special devices,
timestamps and owner/group permissions if the following prerequisites
are satisfied:
a) source filesystem supports sparse files (holes) and ioctl(FIEMAP or FIBMAP)
b) both source and target filesystems are POSIX-like,
i.e. they support links, special devices, timestamps and UNIX-like permissions
3) be reasonably fast
operations on disk must be O(N), i.e. linear with device size,
and perform large contiguous reads and writes as much as possible.
#################### Requirements not yet implemented ########################
4) be SAFE.
If relevant hardware and drivers are working correctly,
data must NEVER be lost or corrupted, not even in case of program crash
or sudden power loss.
Additionally, relevant hardware and drivers must be checked for known defects
at program start, I/O problems must be detected as soon as possible,
both using S.M.A.R.T. and by detecting I/O errors,
and every reasonable effort must be made not to lose or corrupt data
even in case of defective/failing disks or other relevant hardware
and their drivers.
5) store work progress status on disk,
and be able to resume work or completely undo it when program is started again.
################################# Reminders ###################################
* instantiate template classes only T = ft_uint and T = ft_uoff
* components must be emulable, especially OS I/O
* fstatfs(loop_fd) to stat device file-system, including used space (=S) and total inodes (=I)
* create loop file-system with number of inodes >= I
* Immediately check on created loop file-system that available space >= S
* move files inside loop file-system, monitoring available space on BOTH
original device and loop file-system
* at fsremap start, check that device is mounted read-only
############################## fsremap algorithm #################################
================== Example initial situation =================
device
+-----------------------------------------------------------------------------+
|01|02| |04| | |07|08| | | | | | |15| | | |19|20|21| | | | |26|
+-----------------------------------------------------------------------------+
loop file inside device
+-----------------------------------------------------------------------------+
| | | | |21|26| | |19|17|10|15|08|11| |07|09| | | | |02|01| |06| |
+-----------------------------------------------------------------------------+
============== Example final situation (target) ==============
loop-file (physical blocks)
+-----------------------------------------------------------------------------+
|01|02| | | |06|07|08|09|10|11| | | |15| |17| |19| |21| | | | |26|
+-----------------------------------------------------------------------------+
device backup inside loop-file (physical blocks) - somewhere in these blocks
+-----------------------------------------------------------------------------+
| | |**|**|**| | | | | | |**|**|**| |**| |**| |**| |**|**|**|**| |
+-----------------------------------------------------------------------------+
================ Algorithm simulation ================
0) compute LOOP-FILE extents, DEVICE extents and FREE-SPACE extents
1) find LOOP-FILE (logical) holes, i.e. LOOP-HOLES
+-----------------------------------------------------------------------------+
| | | | | | | | | | | | | | | | | | | | | | | | | | |
+------03-04-05-------------------12-13-14----16----18----20----22-23-24-25---+
2) re-number used DEVICE blocks, setting ->logical to values
from loop-file holes (LOOP-HOLES). do not greedily use low hole numbers:
a) prefer holes with ->logical numbers equal to DEVICE ->physical block number:
they produce an INVARIANT block, already in its final destination
(marked with @@)
b) spread the remaining ->logical across rest of holes (use best-fit allocation)
device renumbered blocks - original numbers are above
+01-02----04-------07-08-------------------15----------19-20-21-------------26+
|03|05| |04| | |12|13| | | | | | |16| | | |18|20|23| | | | |25|
+---------@@----------------------------------------------@@------------------+
2.1) mark as INVARIANT (with @@) the LOOP-FILE (logical) blocks already in their
final destination and forget them (no work on those).
also compute total length of blocks remaining in LOOP-FILE.
in this case, none
+01-02----04-------07-08-------------------15----------19-20-21-------------26+
|03|05| |04| | |12|13| | | | | | |16| | | |18|20|23| | | | |25|
+---------@@----------------------------------------------@@------------------+
3) merge renumbered DEVICE blocks with remaining LOOP-FILE blocks (remember who's who)
merged blocks
+01-02----04-------07-08-------------------15----------19-20-21-------------26+
|03|05| |04|21|26|12|13|19|17|10|15|08|11|16|07|09| |18|20|23|02|01| |06|25|
+---------@@----------------------------------------------@@------------------+
4) compute (physical) intersection of FREE-SPACE and LOOP-HOLES,
and mark it as INVARIANT FREE-SPACE (with !!).
we can use these blocks as partial or total replacement for STORAGE - see 5)
if there are sections with enough consecutive blocks (>= blocks to relocate / 1024)
in this case, only 24
+01-02----04-------07-08-------------------15----------19-20-21-------------26+
|03|05| |04|21|26|12|13|19|17|10|15|08|11|16|07|09| |18|20|23|02|01| |06|25|
+---------@@----------------------------------------------@@----------!!------+
5) sort
Favor in-order disk reads, even if they cause out-of-order disk writes
Reason: exploit read-ahead and write-back to speed up execution
Reads and writes will happen in moderately large in-order chunks, such that:
a) chunk size must fit into RAM
b) storage must not overflow
storage is the contatenation of primary-storage and secondary-storage.
primary-storage is the intersection of FREE-SPACE and LOOP-HOLES,
aligned to system PAGE_SIZE.
secondary storage is the file ${dir}/.fsremap/job.$$.storage
+==============+
| | | | | | storage (SMALL!)
+==============+
5.0) start with position P=1
5.1) if storage has N free blocks (in this case N=5),
lookup the contents of the consecutive positions P..P+N-1
on disk (in this case 01..05).
5.2) note the blocks in such positions (in this case 03|05| |04|21).
ignore both free blocks and blocks already in their final destination:
in case we find either kind, go back to 5.1 and increase N accordingly
(in some cases we could loop between 5.1 and 5.2 many times)
(in this case, one block is free and another - 04 - is in its final
destination so we set N=7, extend the lookup to 01..07 and find
03|05| |@@|21|26|12 - again, remember to ignore 04, it's marked @@)
5.3) sort the found blocks by their (logical) number (in this case 03 05 12 21 26)
5.4) for each block, check the contents of its final destination
(in this case, pos[03]=free, pos[05]=21, pos[12]=15, pos[21]=23, pos[26]=25).
if a final destination is free, move the block there and mark
its old position as free, otherwise put the block in a "move queue"
5.5) for each non-free final destination, examine the "lookup" block it contains
and check if such block final destination is free or not.
in case it is free, directly move there the "lookup" block,
else move the "lookup" block into storage.
also remember to update the "move queue", as some "lookup" blocks
may also belong to it
(in this case, final destinations are 21 15 23 25.
none of them is free, so blocks 03 05 12 21 26 are moved to storage)
+======21=26===+
|21|15|23|25| | storage
+==============+
5.6) mark as free the blocks moved away from final destinations checked at 5.4)
+01-02----04-------07-08-------------------15----------19-20------------------+
|03|05| |04| |26|12|13|19|17|10| |08|11|16|07|09| |18|20| |02|01| |06| |
+---------@@----------------------------!!----------------@@----!!----!!------+
5.7) move blocks in the "move queue" to their final destinations,
if they were not moved already to their final destination by step 5.5
(note: step 5.5 could have moved them to storage)
remember to read and write them in disk order, and to mark their initial
position as free AFTER they have been written in their final destination,
and recorded as such.
(in this case: read 03|05 21|26|12, write 03 05 12 21 26)
and to mark them with @@
+------01-04-02-------08----------07-------15----------19-20------------------+
| | |03|04|05| | |13|19|17|10|12|08|11|16|07|09| |18|20|21|02|01| |06|26|
+------@@-@@-@@-------------------@@----!!----------------@@-@@-!!----!!----@@+
5.8) in storage, mark as free any block that was just written (to its
final destination) during previous step (in this case, 21).
+======21=26===+
| |15|23|25| | storage
+==============+
5.9) scan storage for blocks that can be directly written to their
final destination (i.e. if it's free) and move them
(in this case none)
5.10) if storage is at least 50% full then perform extra pass 6,
else set P = P+N and restart from 5.1 if new P is less than device size
6) extra pass if storage is at least 50% full
6.1) find at least N/2 and up to N blocks that can be directly moved
to their final destination
note: since storage has at least N/2 blocks, there are at least N/2
free final destinations, and thus at least N/2 blocks that could be moved
to such destinations.
how to find: get the position number of free blocks not marked '!!'
note: the blocks could be in storage if we arrived here from step 6.2)
(in this case 01 02 06 07 18)
read them in disk order (in this case 07 18 02 01 06)
and write them into their final destination in disk order (in this case
01 02 07 07 18)
+------01-04-02-------08----------07-------15-------19----20------------------+
|01|02|03|04|05|06|07|13|19|17|10|12|08|11|16| |09|18| |20|21| | | | |26|
+@@-@@-@@-@@-@@-@@-@@-------------@@----!!----------@@----@@-@@-!!----!!----@@+
6.2) repeat 6.1) until storage is 25% full or less,
or until work is finished,
or until less than N/2 blocks can be moved (whatever comes first)
(in this case move 16 19 23 25)
+------01-04-02-------08----------07----------15----19----20-------21----26---+
|01|02|03|04|05|06|07|13| |17|10|12|08|11| |16|09|18|19|20|21| |23| |25|26|
+@@-@@-@@-@@-@@-@@-@@-------------@@----!!----@@----@@-@@-@@-@@-!!-@@-!!-@@-@@+
+==============+
| |15| | | | storage
+==============+
6.3) OBSOLETE - free positions marked as '!!' are primary-storage
move as many blocks as possible from storage to free positions
marked '!!'
(in this case 15)
+------01-04-02-------08----------07----------15----19----20-------21----26---+
|01|02|03|04|05|06|07|13| |17|10|12|08|11| |16|09|18|19|20|21|15|23| |25|26|
+@@-@@-@@-@@-@@-@@-@@-------------@@----!!----@@----@@-@@-@@-@@-!!-@@-!!-@@-@@+
+==============+
| | | | | | storage
+==============+
6.4) set P = P + N and restart from 5.1 if new P is less than device size
(in this case N = 7 and new P = 8)
[continued]
[5.1] N=5, lookup positions 08..12
[5.2] pos[09]=free, pos[12]=@@, so increase N=7 and try again
[5.1] N=7, lookup positions 08..14, found blocks 13 17 10 08 11 (ignore 12,
is marked '@@')
[5.2] 5 positions are used (= storage size), continue with [5.3]
[5.3] order blocks by their number, 08 10 11 13 17
[5.4] for each block, check the contents of its final destination
pos[08]=13, pos[10]=17, pos[11]=10, pos[13]=08, pos[17]=09
read blocks in final destinations, 13 17 10 08 09
[5.5] check the contents of such blocks (13 17 10 08 09) final destinations.
for each such block, in case its own final destination is free move it there,
else move it into storage
only 09 can be moved directly (17 requires 09 to be moved first,
13 requires 08 to be moved first),
so put 13 17 10 08 to storage.
+08============+
|13|17|10|08| | storage
+==============+
[5.6] mark the blocks moved away from final destinations as free
+------01-04-02-------------------07----------15----19----20-------21----26---+
|01|02|03|04|05|06|07| |09| | |12| |11| |16| |18|19|20|21|15|23| |25|26|
+@@-@@-@@-@@-@@-@@-@@----@@-------@@----!!----@@----@@-@@-@@-@@-!!-@@-!!-@@-@@+
[5.7] move the blocks selected at step [5.2] into their final destinations,
if they were not moved already in their final destination by step 5.5
(note: step 5.5 could have moved them to storage)
read/write in disk order, update marks.
read 13 17 10 08 11 (11 from main device, others from storage)
write 08 10 11 13 17 to their final destinations
+------01-04-02-------------------07-08-------15----19----20-------21----26---+
|01|02|03|04|05|06|07|08|09|10|11|12|13| | |16|17|18|19|20|21|15|23| |25|26|
+@@-@@-@@-@@-@@-@@-@@-@@-@@-@@-@@-@@-@@-!!----@@-@@-@@-@@-@@-@@-!!-@@-!!-@@-@@+
[5.8] in storage, mark as free any block that was just written (to its
final destination) during previous step (in this case, all).
+==============+
| | | | | | storage
+==============+
[5.9] scan storage for blocks that can be directly written to their
final destination (i.e. if it's free) and move them
(in this case none)
[5.10] if storage is more than 50% full then perform extra pass 6,
else set P = P+N and restart from 5.1 if new P is less than device size
(in this case storage is empty, so since N=7, set new P=15 and
restart from 5.1)
[5.1] N=5, lookup positions 15..19
[5.2] pos[15]=free, all other = @@, so increase N=10 and try again
[5.1] N=10, lookup positions 15..24
[5.2] ignoring free and @@ positions, we find pos[22] = 15
so increase N=14 and try again
[5.1] N=14, lookup positions 15..28 -> truncate to device length (26)
so N=12, lookup 15..26
[5.2] ignoring free and @@ positions, we still find only pos[22] = 15
and cannot increase N more, we are at end of device
[5.3] order found blocks by their number, 15
[5.4] for each block, check the contents of its final destination
pos[15]=free
read blocks in final destinations, none
[5.5] check the contents of such blocks (none) final destinations
and for each such block -> empty loop
[5.6] mark the blocks moved away from final destinations (none) as free
[5.7] move the blocks selected at step [5.2] into their final destinations,
if they were not moved already in their final destination by step 5.5
(note: step 5.5 could have moved them to storage)
read/write in disk order, update marks.
we move block 15 to its destination
+------01-04-02-------------------07-08-------15----19----20-------21----26---+
|01|02|03|04|05|06|07|08|09|10|11|12|13| |15|16|17|18|19|20|21| |23| |25|26|
+@@-@@-@@-@@-@@-@@-@@-@@-@@-@@-@@-@@-@@-!!-@@-@@-@@-@@-@@-@@-@@-!!-@@-!!-@@-@@+
[5.8] in storage, mark as free any block that was just written (to its
final destination) during previous step (in this case, none).
[5.9] scan storage for blocks that can be directly written to their
final destination (i.e. if it's free) and move them
(in this case none)
[5.10] if storage is more than 50% full then perform extra pass 6,
else set P = P+N and restart from 5.1 if new P is less than device size
(in this case storage is empty, so since N=12, set new P=26,
is not less than device size -> we finished)
|