\newpage
\section{The RS01 codec}
\label{rs01}
This section describes the dvdisaster RS01 Reed-Solomon codec. 
It was conceived during the summer of 2004 for creating 
error correction files in the first dvdisaster versions.
At that time, CD media were still predominant, 
and typical machines were based on Pentium 4 (tm) processors.
Measured by today's standards, physical RAM and hard disk
space were scarce, and hard disk random I/O in particular
was extremely slow. 

\smallskip

In order to work efficiently with the available technology,
RS01 was designed to be as space efficient as possible 
and to minimize hard disk random access. 
Optimizing the data layout for random access efficiency
led to a parity byte distribution which leaves the error correction
file vulnerable to damage. RS01 has
occasionally been criticized for not being able to recover 
from damaged error correction files, but this criticism
is not really fair. RS01 error correction
files were never designed to be stored on fragile
media. They are supposed to
be stored either on hard disk or on optical
media which are themselves protected by dvdisaster error
correction. This has the following consequences:

Unlike optical media, hard disks do not degrade
gradually. Hard disks are usually either 100\% readable or 
completely dead, so we can assume that error correction
files on hard disk are either completely readable or entirely lost.

Storing error correction files on optical media is a different
story. While an error correction file can protect itself to some
degree against lost sectors (as RS03 ecc files do), it is still
subject to the shortcomings of file level error correction. 
The biggest disadvantage of file level error correction is
that file system meta data are not protected.
If meta data such as a directory node become damaged, all files
in that directory are lost regardless of the redundancy contained
within the files. Therefore any medium containing error 
correction files must itself be protected by an image level
error correction layer (by using RS01, RS02 or RS03 on that medium), 
since only image level error correction prevents meta 
data sectors from becoming a single point of failure. See the
discussion at \url{https://web.archive.org/web/20180428070843/http://dvdisaster.net/en/qa32.html} for
more information on the advantages of image level data protection
over file level approaches.

\smallskip

Nevertheless, the time has come to phase out the RS01 codec.
Consider creating an error correction file with 32 roots 
for a 650\,MiB image using both codecs\footnote{The benchmark was
done using the GNU/Linux version 
of dvdisaster 0.79.4 on an AMD Athlon(tm) II X4 615e 
processor. RS03 used all 4 cores of the machine.
Both image and ecc files were stored in {\tt /dev/shm}
to rule out I/O effects.}:

\begin{center}
\begin{tabular}{|l|r|r|}
\hline
codec & ecc file size & encoding time \\
\hline
RS01 & 94.58MiB & 46.2s \\
RS03 & 96.68MiB &  2.4s \\
\hline
\end{tabular}
\end{center}

RS03 is about 2.2\% less storage efficient than RS01 because
its data layout has been rearranged for better parallelization.
This is more than offset by a 19-fold speed improvement, as
RS03 can use multiple cores and SSE2 extensions
(the actual speedup naturally depends on the
hardware used).
Since all other properties of RS03 at least match those
of RS01, it is reasonable to begin phasing out RS01 in dvdisaster.

%\smallskip

dvdisaster V0.80 will be the first and only version 
featuring all three codecs. In version 0.82, users
will be shown a note that RS01 has been deprecated.
In subsequent releases, support for encoding RS01 will
be removed. Of course, the capability to use and decode
RS01 will remain in dvdisaster indefinitely.
Existing RS01 error correction files should remain in use,
and there is no need to replace them with RS03 ones.

\subsection{Physical layout}

\begin{figure}
 \begin{center}
 \includegraphics[width=67mm]{spiral-rs01.eps}
 \caption{Interpretation of physical layout in the .iso image}
 \label{layout-phy-one}
 \end{center}
\end{figure}

RS01 is meant to protect data which has already been written to an optical
medium, so the parity data cannot be appended to the medium and must instead 
be kept in a separate error correction file. Like all dvdisaster
codecs, RS01 is based on an RS(255,k) Reed-Solomon code, with each
ecc block consisting of $n$ data bytes and $k$ parity bytes, where
$n+k=255$.

The $n$ data bytes are taken from an iso image generated from the medium.
Reading data directly from the optical drive during encoding would slow down the
process tremendously due to massive random access over the medium, and would
quickly wear out the drive mechanics. Producing the .iso image, however, 
requires only one fast linear read, accesses the drive in the way it is designed to be used,
and puts the data on a hard disk, which can sustain the required random access I/O.

Reed-Solomon codes
work best when errors are evenly distributed over all ecc blocks.
Therefore the $n$ data bytes used for creating an ecc block must be picked from
locations which are evenly distributed over the medium, with the largest possible
distance between any two of them. To obtain a suitable data distribution,
RS01 takes into account that optical media are recorded as a single long 
spiral\footnote{Multiple layered
media contain one spiral for each physical layer, but are otherwise conceptually
identical.} of sectors, each containing 2048 bytes.
The first sector lies at the innermost position of the spiral and has index 0;
numbering continues towards the outside of the spiral. The .iso image
is a 1:1 mapping of this storage scheme, with the first 2048 bytes
holding the contents of sector 0, the next 2048 bytes holding sector 1, and so on.

When encoding with $n$ data bytes per ecc block, the iso image is divided into
$n$ layers which physically map to the medium as shown in fig.~\ref{layout-phy-one}. 
This distributes each ecc block reasonably well over the medium surface.
However, since the image size need not
be a multiple of the layer size, the $n$-th layer may be physically shorter
than the layer size. For encoding purposes, the non-existent sectors in layer
$n$ are treated as sectors filled with 2048 zero bytes. 
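The mapping from layer coordinates to image bytes can be sketched in Python. This is a minimal illustration rather than dvdisaster code: it assumes that layer $j$ is the $j$-th run of consecutive sectors of the image, each run being one layer size long (which matches the padding rule above), and the function names {\tt data\_byte} and {\tt ecc\_block\_data} are hypothetical. Offsets beyond the end of the image read as zero, modelling the virtual padding sectors.

```python
SECTOR_SIZE = 2048  # bytes per sector


def data_byte(image: bytes, layer_size: int, i: int, j: int) -> int:
    """Return d_{i,j}, the i-th byte of the j-th layer (both 1-based).

    layer_size is the layer length in sectors. Layer j is assumed to be the
    j-th run of layer_size consecutive sectors of the image; offsets past the
    end of the image fall into the virtual padding and read as zero.
    """
    layer_bytes = layer_size * SECTOR_SIZE
    if not 1 <= i <= layer_bytes:
        raise IndexError("byte index lies outside the layer")
    offset = (j - 1) * layer_bytes + (i - 1)
    return image[offset] if offset < len(image) else 0


def ecc_block_data(image: bytes, layer_size: int, n: int, i: int) -> bytes:
    """Collect the n data bytes d_{i,1}, ..., d_{i,n} of ecc block i."""
    return bytes(data_byte(image, layer_size, i, j) for j in range(1, n + 1))
```

Under this assumption, neighbouring bytes of one ecc block are exactly one layer apart in the image, so every block is spread over the whole spiral.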

\subsection{Logical ecc file layout}

\begin{figure}
 \begin{center}
 \includegraphics[width=\textwidth]{rs01-layout.eps}
 \caption{Logical RS01 layout}
 \label{layout-logical-one}
 \end{center}
\end{figure}

The ecc file layout, and therefore the relationship between the iso image
contents and the ecc file, is shown in 
figure \ref{layout-logical-one}. The first 4096 bytes of the ecc file
contain the ecc header whose format is described in appendix \ref{eh}.
For RS01, only the data fields marked with ``all'' or ``RS01'' are
relevant; all other fields should be set to zero.

The ecc header is followed by the CRC section of the ecc file. If the
iso image contains $s$ sectors, the next $4s$ bytes in the ecc file
contain the CRC32 checksums $c_1,\dots,c_s$ of the iso image sectors: let $b_1,\dots,b_{2048}$ denote 
the bytes of the first data sector, $b_{2049},\dots,b_{4096}$ those of the
second data sector, and so on. Then $c_1 = CRC32(b_1,\dots,b_{2048})$,
$c_2 = CRC32(b_{2049},\dots,b_{4096})$, etc. Note that in contrast to
RS02 and RS03, bytes from the CRC section are not included in the ecc block
calculation and are therefore not protected by ecc.
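A sketch of the CRC section computation in Python follows. It assumes the widespread CRC-32 variant implemented by {\tt zlib.crc32}; the exact CRC32 variant and the byte order dvdisaster uses when writing the values to the ecc file are not specified here, so the sketch only returns the checksums as integers. Zero-padding a trailing partial sector follows the special case described at the end of this section and is likewise an assumption; {\tt crc\_section} is a hypothetical name.

```python
import zlib

SECTOR_SIZE = 2048  # bytes per sector


def crc_section(image: bytes) -> list[int]:
    """Return [c_1, ..., c_s]: one CRC32 value per 2048-byte image sector.

    Assumption: the CRC-32 variant of zlib.crc32 is used, and a trailing
    partial sector is zero-padded to a full sector before checksumming.
    """
    crcs = []
    for start in range(0, len(image), SECTOR_SIZE):
        sector = image[start:start + SECTOR_SIZE].ljust(SECTOR_SIZE, b"\0")
        crcs.append(zlib.crc32(sector))
    return crcs
```

With four bytes per value, the section occupies {\tt 4 * len(crc\_section(image))} bytes, i.e.\ $4s$.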

\smallskip

The remainder of the ecc file contains the parity bytes of the
ecc blocks. For an ecc file built with $k$ roots, 
the iso image is logically divided into 
$n = 255-k$ layers as shown in figure \ref{layout-logical-one}.
$d_{i,j}$ denotes the $i$-th byte in the $j$-th layer.
In order to create the first ecc block, bytes $d_{1,1}$ to $d_{1,n}$ are taken from the
$n$ layers. Then the RS(255,k) code is applied to them (see appendix \ref{rs} for its parameters),
and the 
resulting $k$ parity bytes $e_{1,1}$ up to $e_{k,1}$ are stored
in the ecc file. The resulting ecc block is marked grey in the
figure. The remaining ecc blocks are calculated and stored accordingly.
In total, the ecc section contains $k \cdot ls$ bytes of parity information,
where $ls$ is the layer size in bytes (one ecc block for each byte position within a layer),
with the $k$ parity bytes of each ecc block being stored consecutively.
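The resulting file layout reduces to simple offset arithmetic. The following Python sketch assumes that the ecc blocks are stored in the order of their byte position within the layers (block 0 first); {\tt parity\_offset} and {\tt ecc\_file\_size} are hypothetical helper names, and the layer size in sectors is the value computed in the next subsection.

```python
HEADER_SIZE = 4096  # ecc header
SECTOR_SIZE = 2048  # bytes per sector


def parity_offset(s: int, k: int, block: int, r: int) -> int:
    """File offset of parity byte r (0-based) of ecc block `block` (0-based).

    s is the number of image sectors and k the number of roots. Assumes the
    ecc blocks are stored in order of their byte position within a layer.
    """
    if not 0 <= r < k:
        raise IndexError("parity index out of range")
    return HEADER_SIZE + 4 * s + block * k + r


def ecc_file_size(s: int, k: int, layer_size: int) -> int:
    """Header + CRC section (4 bytes per sector) + k parity bytes per ecc block."""
    ls = layer_size * SECTOR_SIZE  # layer size in bytes = number of ecc blocks
    return HEADER_SIZE + 4 * s + k * ls
```

For the 650\,MiB benchmark image ($s = 332800$ sectors, $k = 32$, layer size 1493 sectors) this yields 99,180,544 bytes, or about 94.59\,MiB, which agrees with the RS01 entry of the benchmark table up to rounding.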

\subsection{Calculating the layout for encoding}

The RS01 layout is fully determined by the number of roots of the error correction code
and the iso image size in sectors (from now on, ``size'' always means ``number of
2048-byte sectors''). The number of roots can be freely chosen by the user from the
range $[8, 100]$. The iso image size is measured directly
from the iso image file.

\smallskip

The number of data layers is simply calculated from the number of roots, $k$:

\[ data\ layers = 255 - k\]

The size of each layer is:

\[ layer\ size = \left\lceil\frac{medium\ size}{data\ layers}\right\rceil\]

At the end of the last layer, $data\ layers \cdot layer\ size - medium\ size$
zero-filled padding sectors are used in the encoding process.
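The two formulas translate directly into code. A minimal Python sketch ({\tt rs01\_layout} is a hypothetical name) that also enforces the permitted range of roots:

```python
def rs01_layout(medium_size: int, k: int) -> tuple[int, int, int]:
    """Return (data_layers, layer_size, padding_sectors).

    medium_size is the image size in 2048-byte sectors, k the number of roots.
    """
    if not 8 <= k <= 100:
        raise ValueError("RS01 accepts 8 to 100 roots")
    data_layers = 255 - k
    layer_size = -(-medium_size // data_layers)  # integer ceiling division
    padding = data_layers * layer_size - medium_size
    return data_layers, layer_size, padding
```

For the 650\,MiB image from the benchmark ($332800$ sectors) and 32 roots, this gives 223 data layers of 1493 sectors each, plus 139 padding sectors.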

\subsection{Getting the layout when recovering defective media}

The required parameters are taken from the ecc header stored in
the error correction file (see appendix \ref{eh}). In particular,
the number of roots is taken from the {\em eccBytes} field, and
the medium size is recorded in the {\em sectors} field.

\subsection{md5 checksums}

RS01 provides two md5 checksums for integrity checking.
The md5 sum of the iso image is calculated and stored in the
{\em mediumSum} field of the ecc header. 
Another md5 sum is calculated over the ecc file, excluding the
first 4096 bytes, and stored in the {\em eccSum} field of
the ecc header. It can be used to verify the integrity of the
ecc file itself. The ecc header is protected by its own CRC
checksum which is stored in the {\em selfCRC} field.
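The relation between the two md5 sums and the file contents can be sketched with Python's {\tt hashlib}. The sketch assumes that {\em mediumSum} covers the image bytes exactly as stored (without virtual padding) and returns raw 16-byte digests; how the digests are encoded in the header fields is defined in appendix \ref{eh}. The function names {\tt medium\_sum} and {\tt ecc\_sum} are hypothetical.

```python
import hashlib

HEADER_SIZE = 4096  # the ecc header is not covered by eccSum


def medium_sum(image: bytes) -> bytes:
    """Digest for the mediumSum field: md5 over the complete image."""
    return hashlib.md5(image).digest()


def ecc_sum(ecc_file: bytes, chunk: int = 1 << 20) -> bytes:
    """Digest for the eccSum field: md5 over the ecc file minus its header.

    The data is fed to md5 in chunks, as a streaming implementation would
    do for a large file; the chunks must be processed strictly in order.
    """
    h = hashlib.md5()
    body = memoryview(ecc_file)[HEADER_SIZE:]
    for start in range(0, len(body), chunk):
        h.update(body[start:start + chunk])
    return h.digest()
```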

\smallskip

The md5 checksum generation is the major obstacle to parallelizing
the encoder, since md5 must process its input strictly sequentially.
In RS03, md5sum generation has been made optional because
the RS03 layout allows sufficient consistency checks 
by doing a quick error syndrome check using the Reed-Solomon code.

\subsection{Special cases}

Error correction files can be created for any type of input file, not just iso files,
as long as the input files are ``reasonably'' long\footnote{Input files should contain
at least $2048 \cdot (255-k)$ bytes, so that there is at least one sector for each data
layer.}. Since input files are processed in units of 2048-byte sectors, 
files whose byte size is not an integer multiple of 2048 are virtually padded 
with zeroes. In that case, the {\em inLast} field of the ecc header
contains the real byte size of the last file ``sector'', so that recovering the
last file sector does not write out the padding bytes. A value of zero in the
{\em inLast} field means that the last sector contains 2048 bytes.
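The bookkeeping for arbitrary input files can be sketched as follows. The function name {\tt input\_geometry} is hypothetical, and rejecting input files below the recommended minimum size is a choice of this sketch rather than documented dvdisaster behavior.

```python
SECTOR_SIZE = 2048  # bytes per sector


def input_geometry(file_size: int, k: int) -> tuple[int, int]:
    """Return (sectors, inLast) for an input file of file_size bytes.

    A trailing partial sector counts as a full, zero-padded sector; inLast
    holds its real byte count, with 0 meaning a complete 2048-byte sector.
    """
    if file_size < SECTOR_SIZE * (255 - k):
        raise ValueError("input needs at least one sector per data layer")
    sectors = -(-file_size // SECTOR_SIZE)  # integer ceiling division
    in_last = file_size % SECTOR_SIZE
    return sectors, in_last
```

During recovery, only the first {\em inLast} bytes of the reconstructed last sector (or all 2048 bytes if {\em inLast} is zero) would be written back to the file.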