1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454
|
<chapter> <title> The complex multi-layer input </title>
<para>
The idea behind the input module is to treat packets, without knowing
at all what is in it. It only takes a packet,
reads its ID, and delivers it to the decoder at the right time
indicated in the packet header (SCR and PCR fields in MPEG).
All the basic browsing operations are implemented without peeking at the
content of the elementary stream.
</para>
<para>
Thus it remains very generic. This also means you can't do stuff like
"play 3 frames now" or "move forward 10 frames" or "play as fast as you
can but play all frames". It doesn't even know what a "frame" is. There
is no privileged elementary stream, like the video one could be (for
the simple reason that, according to MPEG, a stream may contain
several video ES).
</para>
<sect1> <title> What happens to a file </title>
<para>
An input thread is spawned for every file read. Indeed, input structures
and decoders need to be reinitialized because the specificities of
the stream may be different. <function> input_CreateThread </function>
is called by the interface thread (playlist module).
</para>
<para>
At first, an input plug-in capable of reading the plugin item is looked
for [this is inappropriate : we should first open the socket,
and then probe the beginning of the stream to see which plug-in can read
it]. The socket is opened by either <function> input_FileOpen</function>,
<function> input_NetworkOpen</function>, or <function>
input_DvdOpen</function>. This function sets two very important parameters :
<parameter> b_pace_control </parameter> and <parameter> b_seekable
</parameter> (see next section).
</para>
<note> <para>
We could use so-called "access" plugins for this whole mechanism
of opening the input socket. This is not the case because we
thought only those three methods were to be used at present,
and if we need others we can still build them in.
</para> </note>
<para>
Now we can launch the input plugin's <function> pf_init </function>
function, and an endless loop doing <function> pf_read </function>
and <function> pf_demux</function>. The plugin is responsible
for initializing the stream structures
(<parameter>p_input->stream</parameter>), managing packet buffers,
reading packets and demultiplex them. But in most tasks it will
be assisted by functions from the advanced input API (c). That is
what we will study in the coming sections !
</para>
</sect1>
<sect1> <title> Stream Management </title>
<para>
The function which has opened the input socket must specify two
properties about it :
</para>
<orderedlist>
<listitem> <para> <emphasis> p_input->stream.b_pace_control
</emphasis> : Whether or not the stream can be read at our own
pace (determined by the stream's frequency and
the host computer's system clock). For instance a file or a pipe
(including TCP/IP connections) can be read at our pace, if we don't
read fast enough, the other end of the pipe will just block on a
<function> write() </function> operation. On the contrary, UDP
streaming (such as the one used by VideoLAN Server) is done at
the server's pace, and if we don't read fast enough, packets will
simply be lost when the kernel's buffer is full. So the drift
introduced by the server's clock must be regularly compensated.
This property controls the clock management, and whether
or not fast forward and slow motion can be done.</para>
<note> <title> Subtilities in the clock management </title> <para>
With a UDP socket and a distant server, the drift is not
negligible because on a whole movie it can account for
seconds if one of the clocks is slightly fucked up. That means
that presentation dates given by the input thread may be
out of sync, to some extent, with the frequencies given in
every Elementary Stream. Output threads (and, anecdotically,
decoder threads) must deal with it. </para>
<para> The same kind of problems may happen when reading from
a device (like video4linux's <filename> /dev/video </filename>)
connected for instance to a video encoding board.
There is no way we could differentiate
it from a simple <command> cat foo.mpg | vlc - </command>, which
doesn't imply any clock problem. So the Right Thing (c) would be
to ask the user about the value of <parameter> b_pace_control
</parameter>, but nobody would understand what it means (you are
not the dumbest person on Earth, and obviously you have read this
paragraph several times to understand it :-). Anyway,
the drift should be negligible since the board would share the
same clock as the CPU, so we chose to neglect it. </para> </note>
</listitem>
<listitem> <para> <emphasis> p_input->stream.b_seekable
</emphasis> : Whether we can do <function> lseek() </function>
calls on the file descriptor or not. Basically whether we can
jump anywhere in the stream (and thus display a scrollbar) or
if we can only read one byte after the other. This has less impact
on the stream management than the previous item, but it
is not redundant, because for instance
<command> cat foo.mpg | vlc - </command> is b_pace_control = 1
but b_seekable = 0. On the contrary, you cannot have
b_pace_control = 0 along with b_seekable = 1. If a stream is seekable,
<parameter> p_input->stream.p_selected_area->i_size </parameter>
must be set (in an arbitrary unit, for instance bytes, but it
must be the same as p_input->i_tell which indicates the byte
we are currently reading from the stream).</para>
<note> <title> Offset to time conversions </title> <para>
Functions managing clocks are located in <filename>
src/input/input_clock.c</filename>. All we know about a file
is its start offset and its end offset
(<parameter>p_input->stream.p_selected_area->i_size</parameter>),
currently in bytes, but it could be plugin-dependant. So
how the hell can we display in the interface a time in seconds ?
Well, we cheat. PS streams have a <parameter> mux_rate </parameter>
property which indicates how many bytes we should read in
a second. This is subject to change at any time, but practically
it is a constant for all streams we know. So we use it to
determine time offsets. </para> </note> </listitem>
</orderedlist>
</sect1>
<sect1> <title> Structures exported to the interface </title>
<para>
Let's focus on the communication API between the input module and the
interface. The most important file is <filename> include/input_ext-intf.h,
</filename> which you should know almost by heart. This file defines
the input_thread_t structure, the stream_descriptor_t and all programs
and ES descriptors included (you can view it as a tree).
</para>
<para>
First, note that the input_thread_t structure features two <type> void *
</type> pointers, <parameter> p_method_data </parameter> and <parameter>
p_plugin_data</parameter>, which you can respectivly use for buffer
management data and plugin data.
</para>
<para>
Second, a stream description is stored in a tree featuring program
descriptors, which themselves contain several elementary stream
descriptors. For those of you who don't know all MPEG concepts, an
elementary stream, aka ES, is a continuous stream of video or
(exclusive) audio data, directly readable by a decoder, without
decapsulation.
</para>
<para>
This tree structure is illustrated by the following
figure, where one stream holds two programs.
In most cases there will only be one program (to my
knowledge only TS streams can carry several programs, for instance
a movie and a football game at the same time - this is adequate
for satellite and cable broadcasting).
</para>
<mediaobject>
<imageobject>
<imagedata fileref="stream.eps" format="EPS" scalefit="1" scale="80"/>
</imageobject>
<imageobject>
<imagedata fileref="stream.gif" format="GIF" />
</imageobject>
<textobject>
<phrase> The program tree </phrase>
</textobject>
<caption>
<para> <emphasis> p_input->stream </emphasis> :
The stream, programs and elementary streams can be viewed as a tree.
</para>
</caption>
</mediaobject>
<warning> <para>
For all modifications and accesses to the <parameter>p_input->stream
</parameter> structure, you <emphasis>must</emphasis> hold
the p_input->stream.stream_lock.
</para> </warning>
<para>
ES are described by an ID (the ID the appropriate demultiplexer will
look for), a <parameter> stream_id </parameter> (the real MPEG stream
ID), a type (defined
in ISO/IEC 13818-1 table 2-29) and a litteral description. It also
contains context information for the demultiplexer, and decoder
information <parameter> p_decoder_fifo </parameter> we will talk
about in the next chapter. If the stream you want to read is not an
MPEG system layer (for instance AVI or RTP), a specific demultiplexer
will have to be written. In that case, if you need to carry additional
information, you can use <type> void * </type> <parameter> p_demux_data
</parameter> at your convenience. It will be automatically freed on
shutdown.
</para>
<note> <title> Why ID and not use the plain MPEG <parameter>
stream_id </parameter> ? </title> <para>
When a packet (be it a TS packet, PS packet, or whatever) is read,
the appropriate demultiplexer will look for an ID in the packet, find the
relevant elementary stream, and demultiplex it if the user selected it.
In case of TS packets, the only information we have is the
ES PID, so the reference ID we keep is the PID. PID don't exist
in PS streams, so we have to invent one. It is of course based on
the <parameter> stream_id </parameter> found in all PS packets,
but it is not enough, since private streams (ie. AC3, SPU and
LPCM) all share the same <parameter> stream_id </parameter>
(<constant>0xBD</constant>). In that case the first byte of the
PES payload is a stream private ID, so we combine this with
the stream_id to get our ID (if you did not understand everything,
it isn't very important - just remember we used our brains
before writing the code :-).
</para> </note>
<para>
The stream, program and ES structures are filled in by the plugin's
<function> pf_init()
</function> using functions in <filename> src/input/input_programs.c,
</filename> but are subject to change at any time. The DVD plugin
parses .ifo files to know which ES are in the stream; the TS plugin
reads the PAT and PMT structures in the stream; the PS plugin can
either parse the PSM structure (but it is rarely present), or build
the tree "on the fly" by pre-parsing the first megabyte of data.
</para>
<warning> <para>
In most cases we need to pre-parse (that is, read the first MB of data,
and go back to the beginning) a PS stream, because the PSM (Program
Stream Map) structure is almost never present. This is not appropriate,
though, but we don't have the choice. A few problems will arise. First,
non-seekable streams cannot be pre-parsed, so the ES tree will be
built on the fly. Second, if a new elementary stream starts after the
first MB of data (for instance a subtitle track won't show up
during the credits), it won't appear in the menu before we encounter
the first packet. We cannot pre-parse the entire stream because it
would take hours (even without decoding it).
</para> </warning>
<para>
It is currently the responsibility of the input plugin to spawn the necessary
decoder threads. It must call <function> input_SelectES </function>
<parameter>( input_thread_t * p_input, es_descriptor_t * p_es )
</parameter> on the selected ES.
</para>
<para>
The stream descriptor also contains a list of areas. Areas are logical
discontinuities in the stream, for instance chapters and titles in a
DVD. There is only one area in TS and PS streams, though we could
use them when the PSM (or PAT/PMT) version changes. The goal is that
when you seek to another area, the input plugin loads the new stream
descriptor tree (otherwise the selected ID may be wrong).
</para>
</sect1>
<sect1> <title> Methods used by the interface </title>
<para>
Besides, <filename> input_ext-intf.c </filename>provides a few functions
to control the reading of the stream :
</para>
<itemizedlist>
<listitem> <para> <function> input_SetStatus </function>
<parameter> ( input_thread_t * p_input, int i_mode ) </parameter> :
Changes the pace of reading. <parameter> i_mode </parameter> can
be one of <constant> INPUT_STATUS_END, INPUT_STATUS_PLAY,
INPUT_STATUS_PAUSE, INPUT_STATUS_FASTER, INPUT_STATUS_SLOWER.
</constant> </para>
<note> <para> Internally, the pace of reading is determined
by the variable <parameter>
p_input->stream.control.i_rate</parameter>. The default
value is <constant> DEFAULT_RATE</constant>. The lower the
value, the faster the pace is. Rate changes are taken into account
in <function> input_ClockManageRef</function>. Pause is
accomplished by simply stopping the input thread (it is
then awaken by a pthread signal). In that case, decoders
will be stopped too. Please remember this if you do statistics
on decoding times (like <filename> src/video_parser/vpar_synchro.c
</filename> does). Don't call this function if <parameter>
p_input->b_pace_control </parameter> == 0.</para> </note>
</listitem>
<listitem> <para> <function> input_Seek </function> <parameter>
( input_thread_t * p_input, off_t i_position ) </parameter> :
Changes the offset of reading. Used to jump to another place in a
file. You <emphasis>mustn't</emphasis> call this function if
<parameter> p_input->stream.b_seekable </parameter> == 0.
The position is a number (usually long long, depends on your
libc) between <parameter>p_input->p_selected_area->i_start
</parameter> and <parameter>p_input->p_selected_area->i_size
</parameter> (current value is in <parameter>
p_input->p_selected_area->i_tell</parameter>). </para>
<note> <para> Multimedia files can be very large, especially
when we read a device like <filename> /dev/dvd</filename>, so
offsets must be 64 bits large. Under a lot of systems, like
FreeBSD, off_t are 64 bits by default, but it is not the
case under GNU libc 2.x. That is why we need to compile VLC
with -D_FILE_OFFSET_BITS=64 -D__USE_UNIX98. </para> </note>
<note> <title> Escaping stream discontinuities </title>
<para>
Changing the reading position at random can result in a
messed up stream, and the decoder which reads it may
segfault. To avoid this, we send several NULL packets
(ie. packets containing nothing but zeros) before changing
the reading position. Indeed, under most video and audio
formats, a long enough stream of zeros is an escape sequence
and the decoder can exit cleanly.
</para> </note>
</listitem>
<listitem> <para> <function> input_OffsetToTime </function>
<parameter> ( input_thread_t * p_input, char * psz_buffer,
off_t i_offset ) </parameter> : Converts an offset value to
a time coordinate (used for interface display).
[currently it is broken with MPEG-2 files]
</para> </listitem>
<listitem> <para> <function> input_ChangeES </function>
<parameter> ( input_thread_t * p_input, es_descriptor_t * p_es,
u8 i_cat ) </parameter> : Unselects all elementary streams of
type <parameter> i_cat </parameter> and selects <parameter>
p_es</parameter>. Used for instance to change language or
subtitle track.
</para> </listitem>
<listitem> <para> <function> input_ToggleES </function>
<parameter> ( input_thread_t * p_input, es_descriptor_t * p_es,
boolean_t b_select ) </parameter> : This is the clean way to
select or unselect a particular elementary stream from the
interface.
</para> </listitem>
</itemizedlist>
</sect1>
<sect1 id="input_buff"> <title> Buffers management </title>
<para>
Input plugins must implement a way to allocate and deallocate packets
(whose structures will be described in the next chapter). We
basically need four functions :
</para>
<itemizedlist>
<listitem> <para> <function> pf_new_packet </function>
<parameter> ( void * p_private_data, size_t i_buffer_size )
</parameter> :
Allocates a new <type> data_packet_t </type> and an associated
buffer of i_buffer_size bytes.
</para> </listitem>
<listitem> <para> <function> pf_new_pes </function>
<parameter> ( void * p_private_data ) </parameter> :
Allocates a new <type> pes_packet_t</type>.
</para> </listitem>
<listitem> <para> <function> pf_delete_packet </function>
<parameter> ( void * p_private_data, data_packet_t * p_data )
</parameter> :
Deallocates <parameter> p_data</parameter>.
</para> </listitem>
<listitem> <para> <function> pf_delete_pes </function>
<parameter> ( void * p_private_data, pes_packet_t * p_pes )
</parameter> :
Deallocates <parameter> p_pes</parameter>.
</para> </listitem>
</itemizedlist>
<para>
All functions are given <parameter> p_input->p_method_data </parameter>
as first parameter, so that you can keep records of allocated and freed
packets.
</para>
<note> <title> Buffers management strategies </title>
<para> Buffers management can be done in three ways : </para>
<orderedlist>
<listitem> <para> <emphasis> Traditional libc allocation </emphasis> :
For a long time we have used in the PS plugin
<function> malloc()
</function> and <function> free() </function> every time
we needed to allocate or deallocate a packet. Contrary
to a popular belief, it is not <emphasis>that</emphasis>
slow.
</para> </listitem>
<listitem> <para> <emphasis> Netlist </emphasis> :
In this method we allocate a very big buffer at the
beginning of the problem, and then manage a list of pointers
to free packets (the "netlist"). This only works well if
all packets have the same size. It is used for long for
the TS input. The DVD plugin also uses it, but adds a
<emphasis> refcount </emphasis> flag because buffers (2048
bytes) can be shared among several packets. It is now
deprecated and won't be documented.
</para> </listitem>
<listitem> <para> <emphasis> Buffer cache </emphasis> :
We are currently developing a new method. It is
already in use in the PS plugin. The idea is to call
<function> malloc() </function> and <function> free()
</function> to absorb stream irregularities, but re-use
all allocated buffers via a cache system. We are
extending it so that it can be used in any plugin without
performance hit, but it is currently left undocumented.
</para> </listitem>
</orderedlist>
</note>
</sect1>
<sect1> <title> Demultiplexing the stream </title>
<para>
After being read by <function> pf_read </function>, your plugin must
give a function pointer to the demultiplexer function. The demultiplexer
is responsible for parsing the packet, gathering PES, and feeding decoders.
</para>
<para>
Demultiplexers for standard MPEG structures (PS and TS) have already
been written. You just need to indicate <function> input_DemuxPS
</function> and <function> input_DemuxTS </function> for <function>
pf_demux</function>. You can also write your own demultiplexer.
</para>
<para>
It is not the purpose of this document to describe the different levels
of encapsulation in an MPEG stream. Please refer to your MPEG specification
for that.
</para>
</sect1>
</chapter>
|