1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353
|
\documentclass[a4paper,12pt,english]{article}
\usepackage{babel}
\usepackage[]{graphicx}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands.
%% Because html converters don't know tabularnewline
\providecommand{\tabularnewline}{\\}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Textclass specific LaTeX commands.
\newenvironment{lyxcode}
{\begin{list}{}{
\setlength{\rightmargin}{\leftmargin}
\setlength{\listparindent}{0pt}% needed for AMS classes
\raggedright
\setlength{\itemsep}{0pt}
\setlength{\parsep}{0pt}
\normalfont\ttfamily}%
\item[]}
{\end{list}}
\newenvironment{lyxlist}[1]
{\begin{list}{}
{\settowidth{\labelwidth}{#1}
\setlength{\leftmargin}{\labelwidth}
\addtolength{\leftmargin}{\labelsep}
\renewcommand{\makelabel}[1]{##1\hfil}}}
{\end{list}}
\begin{document}
\title{Specification of the PFS File Format\\version 1.5}
\maketitle
\section{Introduction}
This document contains a detailed specification of the pfs file
format. PFS file format is intended to store in particular
high-dynamic range images, but it is also flexible enough to store
additional data, like depth map or motion flow. To learn how PFS
format can be useful and why it is different from existing image
formats, see a list of Frequently Asked Questions of the PFS tools
package. Information in this section should be sufficient to implement
pfs compatible reader or writer.
\section{Copyright}
Copyright (c) 2003 Rafal Mantiuk and Grzegorz Krawczyk
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
copy of the license is included in the section entitled "GNU Free
Documentation License".
\section{Change History}
\begin{itemize}
\item 1.0 (15.12.2004 RM) --- Original version
\item 1.1 (14.02.2005 RM) --- Colon ':' is no longer allowed in the name of a tag; Added a conceptual UML data-model
\item 1.2 (16.03.2006 RM) --- Fixed typos (thanks to grendelkhan)
\item 1.3 (16.08.2006 RM) --- Added a list of registered channel names; added comments on endianness)
\item 1.4 (28.06.2007 RM) --- Fixed column-/row-major ambiguity (thanks to Matt)
\item 1.5 (06.08.2007 RM) --- Specified maximum string lengths and valid value ranges. Added suggestion to prepend custom channel names with ''x''. (thanks to Martin)
\item 1.6 (03.10.2008 RM) --- Added a new tag 'BITDEPTH'
\end{itemize}
\section{General Structure}
A pfs-stream is a byte-stream that contains one or more frames. Each
frame is stored in the pfs-stream right after another frame, without
any markers or separators between them. End of file indicates that
there are no more frames in the pfs-stream.
A conceptual data model of the pfs stream is shown in
Figure~\ref{cap:data-model}. A pfs-stream can contain any number of
frames, which include any number of channels. Both frame and channel
has an associated tag-container, which can contain any number of tags
(name=value pairs).
\begin{figure}
\centering \includegraphics[width=\textwidth]{data_model.png}
\caption{A conceptual UML data model of the pfs-stream}
\label{cap:data-model}
\end{figure}
A structure of a single frame is shown in Table~\ref{cap:pfs-frame}.
A frame encoded in the pfs format consists of a text header followed
by binary data. The header contains information on frame size, number
of channels and tags. Data items in the header are separated by end of
line (EOL) characters. Note that only a unix-type EOL character is
allowed (a single character of the ASCII code 10). MSDOS-type EOL is
not allowed. To avoid portability problems, use C string
``{\tt\textbackslash{}x0a}'' instead of ``{\tt\textbackslash{}n}''.
Note that the header ends with a {\tt ENDH} string, which is not
followed by an EOL character. This prevents C formated IO functions,
like {\tt fscanf}, from reading both the EOL character and first bytes of
binary data that happen to have the values of white spaces.
The binary data part of a frame follows text header and holds a 2D
array of floating points for each channel. The structure of a channel
is described in the next section.
%
\begin{table}
\begin{tabular}{|l|l|p{5cm}|}
\hline
Data&
Type&
Description\tabularnewline
\hline
\hline
\textbf{PFS1\P}&
&
Constant identifier\tabularnewline
\hline
width height\P&
\emph{int int}&
Width and height of each channel (1--65535) \tabularnewline
\hline
channelCount\P&
\emph{int}&
Number of channels (1--1024)\tabularnewline
\hline
frameTagCount\P&
\emph{int}&
Number of tags associated with the frame (0--1024)\tabularnewline
\hline
\emph{for i=1:frameTagCount}&
&
\tabularnewline
\hline
~~tagName\textbf{=}tagValue\P&
\emph{string string}&
An i-th tag: name=value pair (max length of \texttt{strcat(string, string)} - 1023)\tabularnewline
\hline
\emph{for i=1:channelCount}&
&
\tabularnewline
\hline
~~channelName\P&
\emph{string}&
Name of the i-th channel (max string length -- 32)\tabularnewline
\hline
~~channelTagCount\P&
\emph{int}&
Number of tags associated with i-th channel (0--1024)\tabularnewline
\hline
~~\emph{for j=1:channelTagCount}&
&
\tabularnewline
\hline
~~~~tagName\textbf{=}tagValue\P&
\emph{string string}&
An i-th tag: name=value pair (max length of \texttt{strcat(string, string)} - 1023)\tabularnewline
\hline
\textbf{ENDH}&
&
Signalizes the end of the header\tabularnewline
\hline
\emph{for i=1:channelCount}&
&
\tabularnewline
\hline
~~channelData&
\emph{bin-float}{[}w{]}{[}h{]}&
Row-major array of 32-bit floating points\tabularnewline
\hline
\end{tabular}
\caption{\label{cap:pfs-frame}A structure of pfs-frame. Bold font denotes
literal strings. '\P'~is a unix-type end of line character
(ASCII code 10). Types: \emph{int} -- integer value given as string;
\emph{string} -- a string of one or more characters (1-byte ASCII); \emph{bin-float}{[}w{]}{[}h{]}
-- a row major array of 32-bit floating point numbers in binary format. The range of valid values and maximum string lengths are given in the parenthesis in the ``description'' column.}
\end{table}
\section{Channels}
Channels are two-dimensional arrays of floating point numbers, which
can contain anything from color data to motion flow and depth (z-buffer)
information. Channels are written in a stream in the same order as they
are listed in the header. Note that no assumption can be made on the
order of channels --- it may differ from file to file and some
applications may even reverse that order.
For the sake of performance, channel data are encoded in binary
format. Each cell (pixel) of an array should be encoded as a IEEE
Standard 754 Floating Point Number. Fortunately, this representation
is used by CPU for most architectures, including Intel-based PC's, so
it is enough to store the values in memory as a C 'float' type and
then write them to an IO stream. An array should be encoded in a
stream in a row major order --- all cells of the first row are
followed by the cells of the second row and so on. The encoding starts
from the top left corner. The bytes should be encoded in the
little-endian order (LSB), which is appropriate for the x86
platform\footnote{Current implementation of the pfs library does not
handle big-endian system, so the pfs files generated on x86 and
powerPC platforms are not compatible}.
This version of the pfs format specification defines the following
channels:
\begin{description}
\item {\bf X} --- X color component of the CIE XYZ color space. See comments below.
\item {\bf Y} --- Y color component of the CIE XYZ color space. See comments below.
\item {\bf Z} --- Z color component of the CIE XYZ color space. See comments below.
\item {\bf DEPTH} --- Depth channel. Format of the depth information is currently not standardized.
\item {\bf ALPHA} --- Alpha channel that encodes transparency. The values should be in the range from 0 to 1.
\end{description}
If other information than color, depth and alpha needs to be stored in
a channel, a custom name starting with a lower-case ''x'' letter
should be used. If you think that a particular channel name should be
registered in this document, feel free to suggest it on the {\tt pfstools@googlegroups.com}
discussion group.
Color information must be represented in CIE XYZ ($2^\circ$ standard
observer) color space. Depending on the value of the LUMINANCE tag,
color data are linear (RELATIVE -- linearly related to luminance),
linear and calibrated (ABSOLUTE -- the values of Y channel represent
luminance in $cd/m^2$), or non-linear (DISPLAY -- gamma-corrected).
The preferred representation is a 'linear and calibrated'.
'non-linear' is only a temporary representation used before data is
written to low dynamic range image files (PNG, JPG) or displayed. For
more information refer to the next section --- \ref{sec:tags}. The
channels must be named using upper case letters: 'X', 'Y' and 'Z'. For
color images, all three X, Y and Z channels must be given. It is
enough to include 'Y' channel for gray-level images.
\section{Tags}
\label{sec:tags}
pfs format allows for storing tags, which are 'name=value' pairs. Tags
are robust mechanism for specifying additional information within a
pfs-stream. Both the name and value should be a text string, i.e. if a
floating point number is to be stored, it should be converted into a
text string. The name must not contain '=' and ':' character (some
programs use notation 'channel\_name:tag\_name' to assign a tag to the
channel). Spaces in front of and after '=' are allowed, although they
will not be skipped and they will be interpreted as a part of name or
value strings. Tags can be associated with a frame or separate
channel, depending how they are placed in the text header of the pfs
frame (see Table~\ref{cap:pfs-frame}). Each tag associated with a
single frame or channel must have an unique name. There is no limit to
the number of tags.
In general, tags can be used to store any user defined data, but there
are also several tags that are reserved and are part of the pfs
format. Those tags let precisely define content of the pfs stream.
The reserved tags are listed below:
\begin{description}
\item LUMINANCE specifies a photometric content of the XYZ color
channels or Y luminance channel alone. LUMINANCE tag distinguishes
between different methods of storing lumiance data. This tag must
be associated with a frame and the frame must contain either all
XYZ channels or Y channel. Possible values are: ABSOLUTE, RELATIVE,
and DISPLAY.
\begin{description}
\item ABSOLUTE Set the LUMINANCE tag to this value if the high-dynamic
range data is calibrated. Y channel should contain absolute
luminance values given in $cd/m^2$. Note that luminance values
must be $\geq0$ for calibrated content.
\item RELATIVE High-dynamic range data is of RELATIVE LUMINANCE if
it is not calibrated, that is Y channel data is proportional
to luminance, but nothing is known about absolute value of
luminance. This is the case of most of the high-dynamic range
images available on Internet.
\item DISPLAY If the data comes from low-dynamic range file and no
colorimetric information is provided in ICC profile, the data is
simply given in pixel values of an unknown display device. This
is the worst case, but also most common as most of the
low-dynamic range formats (jpeg, png, tiff) do not contain any
data that can be used to recover luminance values. The values of the Y
channel can range from $0$ to $1$ and are already gamma
corrected.
\end{description}
If the LUMINANCE tag is not specified in the pfs-stream, RELATIVE is assumed.
Example: LUMINANCE=ABSOLUTE
\item VISUAL\_CONTENT tag specifies visual content of the XYZ color
channels or Y luminance channel alone. If an image was captured as
high-dynamic range photograph, it has one-to-one correspondence
with the real word, therefore its VISUAL\_CONTENT is MEASUREMENT.
However, if the same image has been tone mapped (or gamut mapped)
it still resembles appearance of the real word scene, but it has no
longer one-to-one correspondence. In this case its VISUAL\_CONTENT
is RENDERING. This tag is typically set to rendering after tone
mapping of high-dynamic range images. This tag must be associated
with a frame and the frame must contain either all XYZ channels or
Y channel.
\begin{description}
\item MEASUREMENT Set VISUAL\_CONTENT tag to MEASUREMENT if the
image has one-to-one correspondence with the real word, i.e. it
contains measured data.
\item RENDERING Set VISUAL\_CONTENT to RENDERING if the image has the same
appearance as the real world scene, but there is no photometric
correspondence.
\end{description}
If the tag is not specified in the pfs-stream, MEASUREMENT is assumed.
If LUMINANCE tag is set to DISPLAY, the VISUAL\_CONTENT is assumed to be
RENDERING.
Example: VISUAL\_CONTENT=RENDERING
\item WHITE\_Y defines the value of channel Y that is perceived as
reference white. The value should be given in units of an Y
channel. If LUMINANCE=ABSOLUTE, the white point is given in
$cd/m^2$. The value of this tag should be a single floating point
number in a text format.
Example: WHITE\_Y=100
\item WHITE\_x, WHITE\_y tags define relative coordinates of color
that is perceived as reference white. The value of those tags must
be a floating point number in a text format. The value is given
as a relative colorimatric coordinates of the CIE XYZ space ($2^\circ$
standard observer) $y=Y/(X+Y+Z)$ and $x=X/(X+Y+Z)$.
If the tag is not specified, D65 white point is assumed: $x=0.3127\:y=0.3290$
Example: WHITE\_x=0.3457~~WHITE\_y=0.3585
\item FILE\_NAME contains a string with the name of a source file.
This can be used to locate the origin of a frame.
Example: FILE\_NAME=my\_sequence\_frame\_0000.hdr
\item BITDEPTH specifies the number of bits, that are required to
store a single channel in a display-referred image
(LUMINANCE=DISPLAY) without quality loss. The number should be from
1 to 32.
Example: BITDEPTH=16
\end{description}
\section{Discussion}
The structure of pfs files is a compromise between simplicity and
robustness and (as well) between portability and performance. Therefore
a small header is stored in text format as this is the easiest to
decode on different platforms. Storing image data as text would result
in too much overhead and thus binary format is used. Since parsing a file
is typically more difficult to implement than writing it, pfs format
tries to make parsing files as easy as possible. This is why sizes of
all variable length structures are given first and then actual data
follows.
\end{document}
|