\documentclass[11pt]{article}
\usepackage{fullpage}
\usepackage{palatino}
\usepackage{psfig}

\title{The MPEG Library\\
       Version 1.2\footnote{Also covers version 1.3}}
\author{Greg Ward\\({\tt greg@bic.mni.mcgill.ca})}
\date{February, 1996}

%\renewcommand{\familydefault}{cmss}
%\renewcommand{\bfdefault}{b}   % make ``bold'' come out medium-width

% \code - for typesetting a little bit of computer code (in typewriter font)
\newcommand{\code}[1]{\texttt{#1}}


% ttdescription - a description-like environment where item descriptions are
% typeset using "\tt", followed by a colon

\newcommand{\ttlabel}[1]{\texttt{#1:}\quad\hfil}
\newenvironment{ttdescription}[1] {\newbox\holder
  \setbox\holder=\hbox{\ttlabel#1} \dimen0=\wd\holder
 \begin{list}{}
   {\labelsep=-0.25in \rightmargin=0.25in \leftmargin=\dimen0
     \addtolength{\leftmargin}{0.25in}
  \labelwidth=\leftmargin
  \let\makelabel\ttlabel}}%     this comment is needed to hide the newline
{\end{list}}

% \prototype - for typesetting a function prototype
\newcommand{\prototype}[1]{%
  \textbf{Function prototype:}\par \smallskip \code{#1}\par\medskip }

\newenvironment{Arguments}[1]{%
\noindent\textbf{Arguments:}%
\begin{ttdescription}{#1}}
{\end{ttdescription}\medskip}

\newenvironment{Notes}{%
\noindent\textbf{Notes:}\par\smallskip}
{\medskip}

\begin{document}

\maketitle

\tableofcontents

\section{Introduction and Background}

The MPEG Library is based on an effort from the University of
California at Berkeley to create a portable, software-based MPEG
decoder  \cite{Patel93}.  This resulted in the widely distributed (and
widely modified) \code{mpeg\_play}, a highly optimized MPEG decoder
that was specifically geared towards displaying under X Windows.  The
value of having a portable, software MPEG decoder is amply
demonstrated by the number of programs that have been adapted from
this original Berkeley source (including ports to the Linux SVGA
library, Silicon Graphics hardware, and a non-display MPEG information
utility).  However, the utility of the decoder was limited by the
difficulty of extracting the useful, MPEG-related source code from the
X11-specific, display-related source.  Essentially what was needed was
a simple interface that would allow a programmer to extract frames
from an MPEG stream (either before or after converting to RGB colour
space), and then to do with the image data as he or she saw fit.

The MPEG Library is intended to fill this need.  It was developed at
the Montreal Neurological Institute in the summer of 1994 in order to
facilitate the development of a high-performance, feature-heavy
MPEG player for Silicon Graphics workstations.  Since then, the
Library has found a use in numerous applications, notably as one of
several optional libraries used for extending the well-known
ImageMagick suite of graphics applications.

\section{Programming with the MPEG Library}

Using the Library is quite straightforward, and is analogous to the
way in which files have been traditionally handled: you open an MPEG
stream to initialize internal data structures, and then read frames
until the stream is exhausted.  At any point, you can rewind the
stream to start over; however, random access is not supported.  (This is
not a fundamental limitation of MPEG, but given the nature of the
decoding engine at the heart of the MPEG Library, don't expect to see
random access implemented here any time soon.)  When you are
finished with the stream, you close it to clean up.

Here is a simple example program to open an MPEG stream (named by the
first command-line argument) and read all frames from it.  Since
displaying images is as non-portable as it is desirable, I have
included calls to dummy routines \code{InitializeDisplay()} and
\code{DisplayFrame()}; actually defining these is up to you.
\begin{verbatim}
#include <stdio.h>
#include <stdlib.h>
#include "mpeg.h"

/* Dummy display routines; defining these is up to you. */
void InitializeDisplay (int width, int height);
void DisplayFrame (int width, int height, char *pixels);

int main (int argc, char *argv[])
{
   FILE       *mpeg;
   ImageDesc   img;
   Boolean     moreframes = TRUE;
   char       *pixels;

   mpeg = fopen (argv[1], "rb");
   SetMPEGOption (MPEG_DITHER, (int) FULL_COLOR_DITHER);
   OpenMPEG (mpeg, &img);

   InitializeDisplay (img.Width, img.Height);
   pixels = (char *) malloc (img.Size);
   while (moreframes)
   {
      moreframes = GetMPEGFrame (pixels);
      DisplayFrame (img.Width, img.Height, pixels);
   }
   CloseMPEG ();
   fclose (mpeg);
   free (pixels);
   return 0;
}
\end{verbatim}

For a concrete example, you might wish to consult \code{easympeg.c}, a
very simple SGI-specific MPEG player included with the Library.  Also,
I have omitted any error-checking or handling here; again, consult
\code{easympeg.c} for a more realistic example.\footnote{For an even
more realistic (but of course considerably larger) example, take a
look at \code{glmpeg\_play}.  This is the full-featured MPEG player
that was the impetus for creating the MPEG Library; it is available
from the same location as the library itself: 
\code{ftp://ftp.bic.mni.mcgill.ca/pub/mpeg}.}

Note in particular the following points about the above code:
\begin{itemize}
\item The caller must take care of opening and closing the file
containing the MPEG stream; the Library assumes that it is passed a
file ready for reading.
\item The \code{ImageDesc} structure contains all the information that
should be needed to display frames from the MPEG stream (although not
necessarily all the information you could possibly want to know about
an MPEG stream).  
%In particular, the fields \code{Height} and
%\code{Width} are the height and width of each frame in pixels;
%\code{Depth} is the depth (in bits) of each pixel; \code{PixelSize} is
%the actual number of bits stored per pixel; \code{Size} is the size in
%bytes of each entire decoded and uncompressed frame; and
%\code{BitmapPad} gives the ``quantum'' of a scan line.  (Each scan
%line starts on an even multiple of this many bits.)
\item \code{SetMPEGOption()} gives you some control over how frames
are decoded.  In addition to selecting a dithering mode, you can also
select the luminance and chrominance ranges used for dithering.  Note
that \code{SetMPEGOption()} must be called {\em before\/}
\code{OpenMPEG()} when setting the dithering method.
\item The MPEG data can be decoded using a variety of dithering
methods.  (Note that in this context, {\em dithering\/} refers to converting
from the luminance-chrominance, or YCrCb, colour space in which MPEG
data is encoded to the more conventional RGB scheme.)
\item You don't need to pass any parameters to \code{GetMPEGFrame()}
  or \code{CloseMPEG()} to tell them which MPEG stream you mean; this is
  because the Berkeley decoding engine (and hence the MPEG Library
  itself) depends heavily on global variables, and unfortunately
  cannot decode more than one MPEG at a time.
\end{itemize}
%Most of these
%involve both pixel values (which are filled in by calls to
%\code{GetMPEGFrame()} and a colour map which maps pixel values to RGB
%values.  The scheme used here, which is the default, uses
%\code{FULL\_COLOR\_DITHER} to return pixels as RGB triples; no colour
%map is involved.  The other useful dithering modes are
%\code{ORDERED\_DITHER} and \code{GRAY\_DITHER}, both of which offer
%better performance and consume less memory at the expense of image
%quality.

More detailed information is provided in the sections below.

\section{Concepts and Data Formats}

This section deals with the main concepts needed to control the MPEG
Library and to display the data it returns.  It does {\em not\/} deal
with the details of how MPEG streams are encoded, stored, or decoded.

\subsection{Dithering modes}
\label{sec:dithering}

A large number of dithering modes (in fact, all the modes provided by
the original \code{mpeg\_play}) are available.  A few produce
nonsensical results, but all have been fully tested in the context of
the MPEG Library and found to agree with the results given by
\code{mpeg\_play}.

``Dithering'' in this context is the conversion from
luminance-chrominance colour space (aka YCrCb, YIQ, or YUV, which is how
MPEG streams are encoded and is the same space used by NTSC television
signals) to some form of RGB space.  The implementors of
\code{mpeg\_play} found that outright conversion to red/green/blue
values takes both more time and memory than any other method they
experimented with, so most modes are colour mapped.  This means that
\code{OpenMPEG()} will create a colour map, which the user can access
via the \code{Colormap} field (an array of \code{ColormapEntry}
structures) in \code{ImageDesc}, and that the pixel values returned by
\code{GetMPEGFrame()} are indices into this colour map.  The dithering
mode affects the quality of the decoded
images, the number of bits used per pixel, and the colour depth of the
image.
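For readers unfamiliar with the colour-space conversion that ``dithering''
performs, the following C sketch shows the textbook per-pixel conversion
from YCrCb to RGB, using the ITU-R BT.601 coefficients for studio-range
samples (luminance from 16 to 235, chrominance centred on 128).  This is a
swapped-in reference formula, not the Library's own code: the functions
\code{ycbcr\_to\_rgb()} and \code{clamp\_to\_byte()} are hypothetical, and
the Library's full-colour routine may well use different (for example,
table-driven integer) arithmetic.  Doing this floating-point work for every
pixel of every frame is the kind of cost that led the \code{mpeg\_play}
authors to prefer colour-mapped modes.
\begin{verbatim}
/* Hypothetical helpers, not part of the MPEG Library.  This is the
 * textbook BT.601 studio-range conversion, not necessarily the exact
 * arithmetic used by the Library's dithering code. */

/* Round to the nearest integer and clamp to the 0..255 range. */
static unsigned char clamp_to_byte (double v)
{
   if (v <= 0.0)
      return 0;
   if (v >= 255.0)
      return 255;
   return (unsigned char) (v + 0.5);
}

/* Convert one sample (Y in 16..235, Cb/Cr centred on 128) to RGB. */
void ycbcr_to_rgb (int y, int cb, int cr,
                   unsigned char *r, unsigned char *g, unsigned char *b)
{
   double luma = 1.164 * (y - 16);      /* stretch 16..235 to 0..255 */

   *r = clamp_to_byte (luma + 1.596 * (cr - 128));
   *g = clamp_to_byte (luma - 0.813 * (cr - 128) - 0.391 * (cb - 128));
   *b = clamp_to_byte (luma + 2.018 * (cb - 128));
}
\end{verbatim}
For a mid-gray sample (Y $=126$, Cb $=$ Cr $=128$) all three outputs come
out as 128, since the chrominance terms vanish.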

The dithering mode is selected with \code{SetMPEGOption()}, using the
\code{MPEG\_DITHER} option and one of the following values:
\begin{ttdescription}{FULL\_COLOR\_DITHER}

\item[ORDERED\_DITHER] 8-bit colour-mapped; reasonable quality;
  decoding is almost as fast as \code{GRAY\_DITHER}
\item[ORDERED2\_DITHER] 8-bit colour-mapped; reasonable quality
\item[MBORDERED\_DITHER] 8-bit colour-mapped; reasonable quality
\item[FS4\_DITHER] 8-bit colour-mapped; colours are all wrong
\item[FS2\_DITHER] 8-bit colour-mapped; colours are all wrong
\item[FS2FAST\_DITHER] 8-bit colour-mapped using Floyd-Steinberg error
  diffusion; reasonable quality
\item[HYBRID\_DITHER] 8-bit colour-mapped; passable colour
\item[HYBRID2\_DITHER] 8-bit colour-mapped; slightly worse than
  \code{HYBRID\_DITHER}
\item[Twox2\_DITHER] 8-bit colour-mapped with pixels doubled; poor
  quality
\item[GRAY\_DITHER] a 256-shade grayscale rendering; nice
  quality and fastest decoding
\item[FULL\_COLOR\_DITHER] a high-quality 24-bit colour rendering;
  results in slowest decoding
\item[MONO\_DITHER] 1-bit monochrome dithering; use as last resort for
  1-bit displays
\item[THRESHOLD\_DITHER] ??
\end{ttdescription}
The descriptions here are my entirely subjective judgments of the image
quality with each dithering mode.  ``Reasonable'' quality is better than
``passable.''  Your mileage may vary.  ``8-bit'' or ``24-bit'' here
refers to the colour depth in the final images, i.e., the minimum number
of bits allocated to each pixel.  Authoritative information about the
actual pixel size can be found in the \code{ImageDesc} structure filled
in by \code{OpenMPEG()}; for instance, if you select
\code{FULL\_COLOR\_DITHER}, the colour depth is 24 bits, but 32 bits are
allocated per pixel.  Thus, the \code{PixelSize} field in
\code{ImageDesc} will be 32, and the \code{Depth} field will be 24.
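The relationship between \code{Depth}, \code{PixelSize}, and the size of a
decoded frame is simple arithmetic.  As a sketch (the helper names are
hypothetical, not Library functions), the frame size follows the formula
\code{Height*Width*PixelSize/8} given for the \code{Size} field, and the
difference between \code{PixelSize} and \code{Depth} is storage that
carries no colour information:
\begin{verbatim}
/* Hypothetical helpers, not part of the MPEG Library.  Results are in
 * bytes; BitmapPad is ignored, just as it is in the Size field. */
long frame_size_bytes (int height, int width, int pixel_size_bits)
{
   return (long) height * width * pixel_size_bits / 8;
}

/* Bytes per frame that are allocated but carry no colour, e.g. the
 * unused 8 bits of each 32-bit pixel in full-colour mode. */
long unused_bytes (int height, int width,
                   int pixel_size_bits, int depth_bits)
{
   return (long) height * width * (pixel_size_bits - depth_bits) / 8;
}
\end{verbatim}
For a hypothetical 320~$\times$~240 movie decoded with
\code{FULL\_COLOR\_DITHER}, \code{frame\_size\_bytes(240, 320, 32)} gives
307200 bytes, of which \code{unused\_bytes(240, 320, 32, 24)} gives 76800
unused; assuming a \code{PixelSize} of 8, an 8-bit colour-mapped mode needs
only 76800 bytes per frame.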

Note that the dithering mode must be set {\em before\/} \code{OpenMPEG()}
is called.  For example, to select gray-scale dithering and then open
the file \code{example.mpg} as an MPEG stream:
\begin{verbatim}
     char       filename[] = "example.mpg";
     FILE       *mpeg;
     ImageDesc  image;

     mpeg = fopen (filename, "rb");
     SetMPEGOption (MPEG_DITHER, (int) GRAY_DITHER);
     OpenMPEG (mpeg, &image);
\end{verbatim}

\subsection{Colour maps}
\label{sec:colour_maps}

Most dithering modes result in images whose pixel values are indices into an
8-bit colour map.  This colour map is accessed via the \code{ImageDesc}
structure, and it is created by \code{OpenMPEG()} based on the dithering type
selected by \code{SetMPEGOption()} (this is why the dithering type must be
set before calling \code{OpenMPEG()}).  Note that ``8-bit'' refers to
the size of the pixels: each pixel in a colour-mapped image is 8
bits long, so the colour map can have at most 256 entries.

The colour map is accessed via the \code{Colormap} field of
\code{ImageDesc}, which points to an array of \code{ColormapSize}
colour map entries.  Each colour map entry is a structure of the form
\begin{verbatim}
typedef struct
{
   short red, green, blue;
} ColormapEntry;
\end{verbatim}
and the colour map is created when \code{OpenMPEG()} is called.  If no
colour map is created (i.e., the dithering mode is
\code{FULL\_COLOR\_DITHER}), then \code{ColormapSize} will be $-1$ and
\code{Colormap} will be \code{NULL}.  For example:
\begin{verbatim}
   char      *filename;
   FILE      *MPEG;
   ImageDesc MPEGInfo;
   int       i;

   filename = argv[1];

   /* Prepare to read and decode an MPEG stream */

   MPEG = fopen (filename, "rb");
   if (MPEG == NULL || !OpenMPEG (MPEG, &MPEGInfo))
      exit (1);

   /* Do we have a colour-mapped mode? */

   if (MPEGInfo.Colormap != NULL)
   {
      for (i = 0; i < MPEGInfo.ColormapSize; i++)
      {
         mapcolor (i, MPEGInfo.Colormap[i].red,
                      MPEGInfo.Colormap[i].green,
                      MPEGInfo.Colormap[i].blue);
      }
   }
   /* ... */
\end{verbatim}
Here, we assume that the function \code{mapcolor()} is available to
set the system colour map.
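When the display cannot use a colour map directly (for instance, when
writing frames to a true-colour image file), you can expand each frame
yourself.  The sketch below assumes an 8-bit colour-mapped mode (a
\code{PixelSize} of 8) and colour map components already scaled to the
range 0 to 255; the function \code{expand\_colormapped()} is hypothetical,
and the structure simply repeats the \code{ColormapEntry} declaration shown
above so that the example is self-contained.
\begin{verbatim}
#include <stddef.h>

/* Copy of the ColormapEntry declaration from this section. */
typedef struct
{
   short red, green, blue;
} ColormapEntry;

/* Hypothetical helper: expand an 8-bit colour-mapped frame into packed
 * RGB triples (3 bytes per pixel).  Assumes colour map values are in
 * 0..255.  Pixels whose index falls outside the map (or any pixel when
 * there is no map) become black.  Returns the number of pixels done. */
size_t expand_colormapped (const unsigned char *pixels, size_t npixels,
                           const ColormapEntry *map, int mapsize,
                           unsigned char *rgb)
{
   size_t i;

   for (i = 0; i < npixels; i++)
   {
      unsigned char *out = rgb + 3 * i;
      int index = pixels[i];

      if (map != NULL && index < mapsize)
      {
         out[0] = (unsigned char) map[index].red;
         out[1] = (unsigned char) map[index].green;
         out[2] = (unsigned char) map[index].blue;
      }
      else
      {
         out[0] = out[1] = out[2] = 0;
      }
   }
   return npixels;
}
\end{verbatim}
Since \code{GetMPEGFrame()} fills a \code{char} buffer, cast it to
\code{unsigned char *} before passing it in, so that pixel values above 127
are not read as negative numbers on systems where \code{char} is signed.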

% NOTE to self: check how mpeg_play generates colour maps again;
% after all, its pixel values are 8 bits too, so it can't be using
% a huge colour map to keep from clobbering X's.  Problem basically
% is that grayscale dithering uses a 256-elt. colour map, which clobberss
% the system map on the 4D35's -- on Indy's too?  portia?

%Actually, the situation is slightly more complicated: if it were literally
%like this, then programs that use the MPEG Library would be stuck with
%using entries 0-255 (or 0-127, depending on the dithering mode) of the
%system colour map.  On windowed systems such as X Windows, this is often
%undesirable, as it might clobber colour map entries used by the system or
%other programs.  Thus, the MPEG Library provides a mechanism for mapping
%``ideal'' colour map indeces (which are in the range 0..127 or 255) to 
%real-world colour map indeces.  This 

\subsection{Image data format}
\label{sec:image_data}

The image data, as returned by \code{GetMPEGFrame()}, is formatted in
a straightforward way.  Pixels are stored in row-major order, starting
at the upper left-hand corner of the image.  The number of bits
allocated per pixel is given by the \code{PixelSize} field of
\code{ImageDesc}.  This is illustrated in
Figure~\ref{fig:image_format}, which shows a sample $8 \times
10$-pixel image, with the offset into the image data for each pixel.
If the pixels are 8 bits each, then this will be a simple byte offset.
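The offset shown in the figure can be computed directly.  The following
sketch (a hypothetical helper, not a Library function) assumes
\code{PixelSize} is a whole number of bytes and that scan lines are not
padded, which matches the way \code{Size} is calculated:
\begin{verbatim}
/* Hypothetical helper: byte offset of pixel (x, y), where (0, 0) is the
 * upper left-hand corner and pixels are stored in row-major order.
 * Assumes pixel_size_bits is a multiple of 8 and no scan-line padding. */
long pixel_byte_offset (int x, int y, int width, int pixel_size_bits)
{
   long pixel_index = (long) y * width + x;   /* offset in pixels */

   return pixel_index * (pixel_size_bits / 8);
}
\end{verbatim}
For an image 8 pixels wide with 8-bit pixels, pixel $(3, 2)$ is at byte 19;
with 32-bit full-colour pixels, the same pixel starts at byte 76.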

\begin{figure}[htbp]
  \centerline{\psfig{figure=image_format.eps}}
  \caption{Illustration of image data layout for a sample $8 \times 10$
    image.  The number at each pixel is just the offset into the image
    data array.}
  \label{fig:image_format}
\end{figure}


\section{Programming Reference}

\subsection{The \code{ImageDesc} structure}
Relevant declarations:
\begin{verbatim}
typedef struct
{
   unsigned char red, green, blue;
} ColormapEntry;

typedef struct
{
   int  Height;             /* in pixels */
   int  Width;              
   int  Depth;              /* image depth (bits) */
   int  PixelSize;          /* bits actually stored per pixel */
   int  Size;               /* bytes for whole image */
   int  BitmapPad;          /* "quantum" of a scanline -- */
                            /* each scanline starts on an even */
                            /* interval of this many bits */
   int  ColormapSize;     
   ColormapEntry *Colormap; /* an array of ColormapSize entries */
} ImageDesc;
\end{verbatim}

This structure provides (hopefully) all the information needed to
display an MPEG stream, although it doesn't necessarily provide all
the information you could possibly want to know about such a stream.
However, that's not the intent of the MPEG Library; if you really need
to know, for instance, just how many intra-frames are in a particular
MPEG, you might want to take a look at the \code{mpegstat} program,
which was also based on the Berkeley X11 player.%
\footnote{\code{mpegstat} should also be available at
   \code{ftp://ftp.bic.mni.mcgill.ca/pub/mpeg}.}

Here is the list of fields in the structure:
\begin{ttdescription}{ColormapSize}
\item[Height] the height, in pixels, of the movie.
\item[Width] the width, in pixels, of the movie.  Note that due to the
  block nature of MPEG encoding, the height and width will always be
  multiples of 16.
\item[Depth] the number of bits per pixel that are actually relevant
  to the display.  For most dithering methods, this will be 8 (i.e.,
  we usually use an 8-bit colour map); for full-colour dithering, it
  will be 24.
\item[PixelSize] the number of bits (not bytes!) of storage allocated
  per pixel.
\item[Size] the size, in bytes, of one entire decoded frame.  This
  is simply equal to \code{Height*Width*PixelSize/8}.  (Note:
  currently, \code{BitmapPad} is ignored in the calculation of
  \code{Size}.)
\item[BitmapPad] the ``quantum'' of a scan line; i.e., each scan line
  starts on an even interval of this many bits.
\item[ColormapSize] the number of entries in the colour map.  This is
  usually 128, but for most dithering methods it can be indirectly
  modified by the user of the Library.  It is zero in non-colourmapped
  modes.
\item[Colormap] the table used to map pixel values to
  red/\-green/\-blue values (which are themselves stored as bytes in
  the \code{ColormapEntry} structure).  It is \code{NULL} in
  non-colourmapped modes.
\end{ttdescription}
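Because \code{Size} ignores \code{BitmapPad}, it is worth seeing when
padding could make a difference.  The sketch below (a hypothetical helper)
computes the length of a scan line rounded up to a multiple of
\code{BitmapPad} bits, as the field's description suggests.  Since
\code{Width} is always a multiple of 16, an 8-bit scan line is a multiple of
128 bits and a 32-bit one a multiple of 512 bits, so padding to any power of
two up to 128 bits changes nothing.  Only if a mode such as
\code{MONO\_DITHER} packs pixels into single bits could the padded length
differ; whether the Library actually pads such scan lines this way is an
assumption, not something stated here.
\begin{verbatim}
/* Hypothetical helper: bytes per scan line when each line is rounded up
 * to a multiple of bitmap_pad bits.  A bitmap_pad of 0 or less means no
 * padding.  Assumed behaviour, not taken from the Library source. */
long padded_line_bytes (int width, int pixel_size_bits, int bitmap_pad)
{
   long bits = (long) width * pixel_size_bits;

   if (bitmap_pad > 0)
      bits = (bits + bitmap_pad - 1) / bitmap_pad * bitmap_pad;
   return (bits + 7) / 8;                 /* round up to whole bytes */
}
\end{verbatim}
For example, \code{padded\_line\_bytes(320, 8, 32)} is 320, the same as the
unpadded length, whereas \code{padded\_line\_bytes(16, 1, 32)} is 4 bytes
instead of 2.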


\subsection{\code{SetMPEGOption()}}
\prototype{void SetMPEGOption (MPEGOptionEnum Option, int Value)}
\noindent\code{Option} should be one of:
  \begin{ttdescription}{MPEG\_CMAP\_INDEX}
  \item[MPEG\_DITHER] Sets the dithering mode, which controls how YCrCb
    values are converted to RGB space.  \code{Value} should be a
    \code{DitherEnum} value, cast to \code{int}.  Dithering modes are
    explained above, in section~\ref{sec:dithering}.

  \item[MPEG\_LUM\_RANGE]
  \item[MPEG\_CR\_RANGE]
  \item[MPEG\_CB\_RANGE] These set the ranges of luminance and
    chrominance values.  The defaults are 8, 4, and 4.  (I do
    not understand the effects of changing these; my experiments
    indicate that doing so garbles perfectly good colour maps.)

%  \item[MPEG\_CMAP\_INDEX] This allows you to set the mapping of
%    ``ideal'' pixel values (i.e., indeces into the colour map returned
%    by \code{OpenMPEG()} in the \code{Color\-map} field of the
%    \code{ImageDesc} structure) to ``real-world'' pixel values, or
%    indeces into the system colour map that you were able to allocate.
%    See section~\ref{sec:colour_maps} for more information on
%    colour-mapped modes.  The \code{Value} passed to
%    \code{SetMPEGOption()} should be a pointer to an array of
%    \code{unsigned char} with \code{ColormapSize} entries (cast to
%    \code{int}, of course).  The contents of this array will be copied
%    into a private Library array, so you don't need to worry about 
%    keeping it around after calling \code{SetMPEGOption()}.

  \end{ttdescription}

\begin{Notes}
  \code{SetMPEGOption()} allows you to set a variety of options related
  to MPEG decoding.  The possible values for \code{Option} are described
  above; the possible values for \code{Value} depend on
  which \code{Option} you are setting.  Whatever \code{Value} is, it
  should of course be cast to an \code{int}.
\end{Notes}


\subsection{\code{OpenMPEG()}}
\prototype{Boolean OpenMPEG (FILE *MPEGfile, ImageDesc *Image)}
\begin{Arguments}{MPEGfile}
\item[MPEGfile] A file that is already open for reading, positioned at
  the beginning of an MPEG stream.
\item[Image] Pointer to a user-declared \code{ImageDesc} structure.
  You shouldn't change any of the fields in \code{*Image} yourself,
  either before or after calling \code{OpenMPEG()}; use
  \code{SetMPEGOption()} instead.
\end{Arguments}
\begin{Notes}
  \code{OpenMPEG()} prepares an MPEG stream for decoding.  It
  initializes internal data structures for decoding and dithering
  and---if applicable---creates a colour map.  After calling
  \code{OpenMPEG()}, the following fields in \code{*Image} will be
  set: \code{Height}, \code{Width}, \code{Depth}, \code{PixelSize},
  \code{Size}, \code{BitmapPad}, \code{ColormapSize}, and
  \code{Colormap}.
\end{Notes}


\subsection{\code{GetMPEGFrame()}}
\prototype{Boolean GetMPEGFrame (char *Frame)}
\begin{Arguments}{Frame}
\item[Frame] Pointer to a user-allocated chunk of memory.  Must have
  enough room for the decoded image, which can be determined from the
  \code{Size} field of \code{ImageDesc}.
\end{Arguments}
\begin{Notes}
  Decodes the next frame from the movie.  Returns \code{TRUE} if there
  are any frames left to decode in the movie, or \code{FALSE} if the
  decoded frame is the last frame in the movie.  That is, for a movie
  with $N$ frames, \code{GetMPEGFrame()} will return \code{TRUE} $N-1$
  times, and then the call to decode the last frame will return
  \code{FALSE}.  After that, the behaviour of \code{GetMPEGFrame()} is
  undefined (unless you call \code{RewindMPEG()}).
\end{Notes}
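The return convention matters when writing the decoding loop: the frame
decoded by the call that returns \code{FALSE} is still valid and should be
displayed.  The following sketch simulates this with a hypothetical
stand-in, \code{fake\_get\_frame()}, rather than calling the Library, and
uses the same call-then-test loop as the example program in the section on
programming with the Library:
\begin{verbatim}
/* Hypothetical stand-in for GetMPEGFrame(), illustrating only its
 * documented return value: TRUE (1) for the first N-1 frames, FALSE (0)
 * when the frame just decoded is the last one. */
typedef struct
{
   int total;     /* N, the number of frames in the simulated movie */
   int decoded;   /* frames decoded so far */
} FakeMovie;

int fake_get_frame (FakeMovie *movie)
{
   movie->decoded++;
   return movie->decoded < movie->total;   /* 1 while frames remain */
}

/* Call first, display, then test: every decoded frame is shown, and
 * the decoder is never called again after it returns 0.
 * Returns the number of frames that would be displayed. */
int count_displayed_frames (FakeMovie *movie)
{
   int more = 1, shown = 0;

   while (more)
   {
      more = fake_get_frame (movie);
      shown++;                  /* the frame just decoded is valid */
   }
   return shown;
}
\end{verbatim}
By contrast, a loop of the form
\code{while (GetMPEGFrame (pixels)) DisplayFrame (...);} would display only
$N-1$ frames, silently dropping the last one.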


\subsection{\code{RewindMPEG()}}
\prototype{void RewindMPEG (FILE *MPEGfile, ImageDesc *Image)}
\begin{Arguments}{MPEGfile}
\item[MPEGfile] The open, readable stream pointer that was also passed
  to \code{OpenMPEG()}.
\item[Image] The image descriptor that was passed to and filled in by
  \code{OpenMPEG()}.
\end{Arguments}
\begin{Notes}
  Repositions \code{MPEGfile}'s file-offset pointer to point to the
  beginning of the stream, and reinitializes internal MPEG Library
  structures to prepare for reading the MPEG again.  The first call to
  \code{GetMPEGFrame()} after calling \code{RewindMPEG()} will return
  the first frame of the movie, as though \code{OpenMPEG()} had just
  been called.
\end{Notes}
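Rewinding makes continuous, looped playback straightforward.  The sketch
below again replaces \code{GetMPEGFrame()} and \code{RewindMPEG()} with
hypothetical stand-ins, \code{fake\_get\_frame()} and \code{fake\_rewind()},
so that it runs without the Library.  The player rewinds as soon as a call
returns \code{FALSE}, and so never calls the decoder past the last frame,
where its behaviour is undefined.
\begin{verbatim}
/* Hypothetical stand-ins (not the Library's API) for GetMPEGFrame()
 * and RewindMPEG(), simulating a movie of `total' frames. */
typedef struct
{
   int total;
   int decoded;
} FakeStream;

static int fake_get_frame (FakeStream *s)
{
   s->decoded++;
   return s->decoded < s->total;   /* 0 once the last frame is decoded */
}

static void fake_rewind (FakeStream *s)
{
   s->decoded = 0;                 /* next call yields the first frame */
}

/* Play the movie `passes' times, rewinding after each 0 return.
 * Returns the total number of frames displayed. */
int play_looped (FakeStream *s, int passes)
{
   int shown = 0, pass;

   for (pass = 0; pass < passes; pass++)
   {
      int more = 1;

      while (more)
      {
         more = fake_get_frame (s);
         shown++;                  /* display the frame just decoded */
      }
      if (pass + 1 < passes)
         fake_rewind (s);
   }
   return shown;
}
\end{verbatim}
With the Library itself, \code{fake\_rewind (s)} would become
\code{RewindMPEG (mpeg, \&img)}, passing the same file pointer and image
descriptor that were given to \code{OpenMPEG()}.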

\section*{Acknowledgements}
Most of the credit for this package should go to the authors of
\code{mpeg\_play}; all I can really take credit for is shuffling code
around and coming up with a reasonably intelligent interface to the
decoding engine that does the real work.  

Thanks to Magnus Heldestat for contributing a sped-up version of
\code{24bit.c}, resulting in faster decoding and dithering of full-colour images.

\begin{thebibliography}{0}
  \bibitem{LeGall91} Didier LeGall, ``MPEG: A Video Compression
  Standard for Multimedia Applications,'' {\em Communications of the
    ACM\/}, Vol.~34, No.~4, April 1991, pp.~46--58.

  \bibitem{Patel93} Ketan Patel, Brian C. Smith, and Lawrence A. Rowe,
  ``Performance of a Software MPEG Video Decoder,'' {\em ACM Multimedia
    '93 Conference\/}, 1993.
\end{thebibliography}

\end{document}