File: design-encoding.txt

Encoding and Muxing
-------------------

Summary
-------
 A. Problems
 B. Goals
 1. EncodeBin
 2. Encoding Profile System
 3. Helper Library for Profiles
 I. Use-cases researched


A. Problems this proposal attempts to solve
-------------------------------------------

* Duplication of pipeline code for GStreamer-based applications
  wishing to encode and/or mux streams, leading to subtle differences
  and inconsistencies across those applications.

* No unified system for describing encoding targets for applications
  in a user-friendly way.

* No unified system for creating encoding targets for applications,
  resulting in duplication of code across all applications,
  differences and inconsistencies that come with that duplication,
  and applications hardcoding element names and settings resulting in
  poor portability.



B. Goals
--------

1. Convenience encoding element

  Create a convenience GstBin for encoding and muxing several streams,
  hereafter called 'EncodeBin'.

  This element will only contain one single property, which is a
  profile.

2. Define an encoding profile system

3. Encoding profile helper library

  Create a helper library to:
  * create EncodeBin instances based on profiles, and
  * help applications to create/load/save/browse those profiles.




1. EncodeBin
------------

1.1 Proposed API
----------------

  EncodeBin is a GstBin subclass.

  It implements the GstTagSetter interface, by which it will proxy the
  calls to the muxer.

  Only two introspectable properties (i.e. usable without extra API):
  * A GstEncodingProfile*
  * The name of the profile to use

  When a profile is selected, encodebin will:
  * Add REQUEST sinkpads for all the GstStreamProfiles
  * Create the muxer and expose the source pad

  Whenever a request pad is created, encodebin will:
  * Create the chain of elements for that pad
  * Ghost the sink pad
  * Return that ghost pad

  This allows reducing the code to the minimum for applications
  wishing to encode a source for a given profile:

  ...

  encbin = gst_element_factory_make ("encodebin", NULL);
  g_object_set (encbin, "profile", "N900/H264 HQ", NULL);
  gst_element_link (encbin, filesink);

  ...

  vsrcpad = gst_element_get_static_pad (source, "src1");
  vsinkpad = gst_element_get_request_pad (encbin, "video_%u");
  gst_pad_link(vsrcpad, vsinkpad);

  ...


1.2 Explanation of the Various stages in EncodeBin
--------------------------------------------------

  This describes the various stages which can happen in order to end
  up with a multiplexed stream that can then be stored or streamed.

1.2.1 Incoming streams

  The streams fed to EncodeBin can be of various types:

  * Video
   * Uncompressed (but maybe subsampled)
   * Compressed
  * Audio
   * Uncompressed (audio/x-raw)
   * Compressed
  * Timed text
  * Private streams


1.2.2 Steps involved for raw video encoding

(0) Incoming Stream

(1) Transform raw video feed (optional)

 Here we modify the various fundamental properties of a raw video
 stream to be compatible with the intersection of:
  * The encoder GstCaps and
  * The specified "Stream Restriction" of the profile/target

 The fundamental properties that can be modified are:
  * width/height
    This is done with a video scaler.
    The DAR (Display Aspect Ratio) MUST be respected.
    If needed, black borders can be added to comply with the target DAR.
  * framerate
  * format/colorspace/depth
    All of this is done with a colorspace converter
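
 The width/height rule above (respect the DAR, add black borders if
 needed) can be illustrated with a small plain-C sketch. It assumes
 square pixels on both sides, so that the DAR is simply width/height.
 The names FitResult and fit_with_borders are hypothetical and are not
 part of any GStreamer API; a real implementation would also have to
 take the pixel aspect ratio into account.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, not part of any GStreamer API. */
typedef struct {
  int scaled_w;   /* width of the scaled picture */
  int scaled_h;   /* height of the scaled picture */
  int border_x;   /* black border on the left and on the right */
  int border_y;   /* black border on the top and on the bottom */
} FitResult;

/* Scale a src_w x src_h picture into a dst_w x dst_h frame while keeping
 * its display aspect ratio. Assumes square pixels on both sides. */
FitResult
fit_with_borders (int src_w, int src_h, int dst_w, int dst_h)
{
  FitResult r;

  assert (src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);

  /* Compare src_w/src_h with dst_w/dst_h by cross-multiplying, which
   * avoids floating point. */
  if ((int64_t) src_w * dst_h >= (int64_t) dst_w * src_h) {
    /* Source is at least as wide as the target: fill the width. */
    r.scaled_w = dst_w;
    r.scaled_h = (int) (((int64_t) dst_w * src_h) / src_w);
  } else {
    /* Source is taller than the target: fill the height. */
    r.scaled_h = dst_h;
    r.scaled_w = (int) (((int64_t) dst_h * src_w) / src_h);
  }

  /* Split the unused space evenly; odd remainders round down. */
  r.border_x = (dst_w - r.scaled_w) / 2;
  r.border_y = (dst_h - r.scaled_h) / 2;
  return r;
}
```

 For instance, fitting a 1920x1080 source into an 800x480 frame keeps
 the full width and pads the top and bottom, while a 640x480 source
 keeps the full height and pads the left and right.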

(2) Actual encoding (optional for raw streams)

 An encoder (with some optional settings) is used.

(3) Muxing

 A muxer (with some optional settings) is used.

(4) Outgoing encoded and muxed stream


1.2.3 Steps involved for raw audio encoding

 This is roughly the same as for raw video, except for (1)

(1) Transform raw audio feed (optional)

 We modify the various fundamental properties of a raw audio stream to
 be compatible with the intersection of:
  * The encoder GstCaps and
  * The specified "Stream Restriction" of the profile/target

 The fundamental properties that can be modified are:
 * Number of channels
 * Type of raw audio (integer or floating point)
 * Depth (number of bits required to encode one sample)


1.2.4 Steps involved for encoded audio/video streams

 Steps (1) and (2) are replaced by a parser if a parser is available
 for the given format.


1.2.5 Steps involved for other streams

 Other streams will just be forwarded as-is to the muxer, provided the
 muxer accepts the stream type.

 


2. Encoding Profile System
--------------------------

 This work is based on:
 * The existing GstPreset system for elements [0]
 * The gnome-media GConf audio profile system [1]
 * The investigation done into device profiles by Arista and
 Transmageddon [2 and 3]

2.2 Terminology
---------------

* Encoding Target Category
  A Target Category is a classification of devices/systems/use-cases
  for encoding.

  Such a classification is required in order for:
  * Applications with a very specific use-case to limit the number of
    profiles they can offer the user. A screencasting application has
    no use for the online service targets, for example.
  * Offering the user some initial classification in the case of a
    more generic encoding application (like a video editor or a
    transcoder).

  Ex:
   Consumer devices
   Online service
   Intermediate Editing Format
   Screencast
   Capture
   Computer

* Encoding Profile Target
  A Profile Target describes a specific entity for which we wish to
  encode.
  A Profile Target must belong to at least one Target Category.
  It will define at least one Encoding Profile.

  Ex (with category):
   Nokia N900 (Consumer device)
   Sony PlayStation 3 (Consumer device)
   Youtube (Online service)
   DNxHD (Intermediate editing format)
   HuffYUV (Screencast)
   Theora (Computer)

* Encoding Profile
  A specific combination of muxer, encoders, presets and limitations.

  Ex:
   Nokia N900/H264 HQ
   Ipod/High Quality
   DVD/Pal
   Youtube/High Quality
   HTML5/Low Bandwidth
   DNxHD

2.3 Encoding Profile
--------------------

An encoding profile requires the following information:

 * Name
   This string is not translatable and must be unique.
   A recommendation to guarantee uniqueness of the naming could be:
      <target>/<name>
 * Description
   This is a translatable string describing the profile
 * Muxing format
   This is a string containing the GStreamer media-type of the
   container format.
 * Muxing preset
   This is an optional string describing the preset(s) to use on the
   muxer.
 * Multipass setting
   This is a boolean describing whether the profile requires several
   passes.
 * List of Stream Profiles

2.3.1 Stream Profiles

A Stream Profile consists of:

 * Type
   The type of stream profile (audio, video, text, private-data)
 * Encoding Format
   This is a string containing the GStreamer media-type of the encoding
   format to be used. If encoding is not to be applied, the raw media
   type of the stream (for example audio/x-raw) will be used.
 * Encoding preset
   This is an optional string describing the preset(s) to use on the
   encoder.
 * Restriction
   This is an optional GstCaps containing the restriction of the
   stream that can be fed to the encoder.
   This will generally contain restrictions on video
   width/height/framerate or audio depth.
 * presence
   This is an integer specifying how many streams can be used in the
   containing profile. 0 means that any number of streams can be
   used.
 * pass
   This is an integer which is only meaningful if the multipass flag
   has been set in the profile. If it has been set it indicates which
   pass this Stream Profile corresponds to.
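
The fields listed above could be mirrored in plain C roughly as follows.
This is only a sketch: the type and function names are hypothetical, the
actual proposed API is the one in gstprofile.h and may differ, and the
check reads "presence" as an upper bound on the number of streams, which
is an assumption about the intended semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical plain-C mirror of the Stream Profile fields; this is not
 * the API proposed in gstprofile.h. */
typedef enum {
  STREAM_AUDIO,
  STREAM_VIDEO,
  STREAM_TEXT,
  STREAM_PRIVATE_DATA
} StreamType;

typedef struct {
  StreamType type;
  const char *format;       /* media type of the encoding format */
  const char *preset;       /* optional, NULL if unset */
  const char *restriction;  /* optional caps string, NULL if unset */
  unsigned presence;        /* 0 means any number of streams */
  unsigned pass;            /* only meaningful for multipass profiles */
} StreamProfile;

/* Return true if one more stream may be requested for this stream
 * profile, given how many streams already use it. Assumes that
 * "presence" is an upper bound. */
bool
stream_profile_accepts_more (const StreamProfile *sp, unsigned in_use)
{
  assert (sp != NULL);

  if (sp->presence == 0)
    return true;              /* unlimited */
  return in_use < sp->presence;
}
```

With a presence of 1, a first stream is accepted and a second one is
refused; with a presence of 0, any number of streams is accepted.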
 
2.4 Example profile
-------------------

The representation used here is XML only as an example. No decision is
made as to which formatting to use for storing targets and profiles.

<gst-encoding-target>
  <name>Nokia N900</name>
  <category>Consumer Device</category>
  <profiles>
    <profile>Nokia N900/H264 HQ</profile>
    <profile>Nokia N900/MP3</profile>
    <profile>Nokia N900/AAC</profile>
  </profiles>
</gst-encoding-target>

<gst-encoding-profile>
  <name>Nokia N900/H264 HQ</name>
  <description>
    High Quality H264/AAC for the Nokia N900
  </description>
  <format>video/quicktime,variant=iso</format>
  <streams>
    <stream-profile>
      <type>audio</type>
      <format>audio/mpeg,mpegversion=4</format>
      <preset>Quality High/Main</preset>
      <restriction>audio/x-raw,channels=[1,2]</restriction>
      <presence>1</presence>
    </stream-profile>
    <stream-profile>
      <type>video</type>
      <format>video/x-h264</format>
      <preset>Profile Baseline/Quality High</preset>
      <restriction>
        video/x-raw,width=[16, 800],\
        height=[16, 480],framerate=[1/1, 30000/1001]
      </restriction>
      <presence>1</presence>
    </stream-profile>
  </streams>
  
</gst-encoding-profile>

2.5 API
-------
  A proposed C API is contained in the gstprofile.h file in this directory.


2.6 Modifications required in the existing GstPreset system
-----------------------------------------------------------

2.6.1 Temporary preset.

  Currently a preset needs to be saved on disk in order to be
  used.

  This makes it impossible to have temporary presets (that exist only
  during the lifetime of a process), which might be required in the
  new proposed profile system.

2.6.2 Categorisation of presets.

  Currently presets are just aliases for a group of property/value
  pairs, without any meaning or explanation as to how they exclude
  each other.

  Take for example the H264 encoder. It can have presets for:
  * passes (1, 2 or 3 passes)
  * profiles (Baseline, Main, ...)
  * quality (Low, Medium, High)

  In order to programmatically know which presets exclude each other,
  we here propose the categorisation of these presets.

  This can be done in one of two ways:
  1. in the name (by making the name be [<category>:]<name>)
    This would give for example: "Quality:High", "Profile:Baseline"
  2. by adding a new _meta key
    This would give for example: _meta/category:quality
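
  As an illustration of the first option, the sketch below splits a
  name of the form [<category>:]<name> and treats two presets as
  mutually exclusive when they share a category. Both helpers are
  hypothetical and not part of the GstPreset API; the exclusion rule
  and the case-sensitive comparison are simplifying assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of the GstPreset API.
 * Splits "[<category>:]<name>" in place: on return *category points to
 * the category (or is NULL when there is none) and *name to the name. */
void
preset_split_category (char *preset, char **category, char **name)
{
  char *colon;

  assert (preset != NULL && category != NULL && name != NULL);

  colon = strchr (preset, ':');
  if (colon == NULL) {
    *category = NULL;
    *name = preset;
  } else {
    *colon = '\0';            /* terminate the category part */
    *category = preset;
    *name = colon + 1;
  }
}

/* Assumed rule: two presets exclude each other when both have a
 * category and the categories are equal (case-sensitive). */
bool
presets_exclude_each_other (const char *cat_a, const char *cat_b)
{
  return cat_a != NULL && cat_b != NULL && strcmp (cat_a, cat_b) == 0;
}
```

  Under these assumptions "Quality:High" and "Quality:Low" exclude each
  other, while "Quality:High" and "Profile:Baseline" can be combined.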

2.6.3 Aggregation of presets.

  More than one preset may need to be chosen for an element
  (quality, profile, pass).

  This means that the full configuration of an element currently
  cannot be described with a single string; several are needed.

  The proposal here is to extend the GstPreset API to be able to set
  all presets using one string and a well-known separator ('/').

  This change only requires changes in the core preset handling code.

  This would allow doing the following:
  gst_preset_load_preset (h264enc,
                          "pass:1/profile:baseline/quality:high");

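  The splitting step implied by the '/' separator in 2.6.3 could look
  like the following plain-C sketch. The helper name is hypothetical
  and is not part of the GstPreset API; skipping empty parts is a
  choice made here for illustration, not something the proposal
  specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the GstPreset API.
 * Splits "a/b/c" in place on '/' and stores up to max_parts pointers in
 * parts[]. Returns the number of parts stored. Empty parts (as in
 * "a//b") are skipped. */
size_t
preset_split_aggregate (char *aggregate, char **parts, size_t max_parts)
{
  size_t n = 0;
  char *p = aggregate;

  assert (aggregate != NULL && parts != NULL);

  while (*p != '\0' && n < max_parts) {
    char *slash = strchr (p, '/');

    if (slash != NULL)
      *slash = '\0';          /* terminate the current part */
    if (*p != '\0')
      parts[n++] = p;         /* keep non-empty parts only */
    if (slash == NULL)
      break;                  /* that was the last part */
    p = slash + 1;
  }
  return n;
}
```

  Each returned part could then be loaded as an individual preset on
  the element, in the order given.
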
2.7 Points to be determined
---------------------------

  This document hasn't determined yet how to solve the following
  problems:

2.7.1 Storage of profiles

  One proposal for storage would be to use a system wide directory
  (like $prefix/share/gstreamer-0.10/profiles) and store an XML file
  for each individual profile.

  Users could then add their own profiles in ~/.gstreamer-0.10/profiles.

  This poses some limitations as to what to do if some applications
  want to have some profiles limited to their own usage.


3. Helper library for profiles
------------------------------

 These helper methods could also be added to existing libraries (like
 GstPreset, GstPbUtils, ..).

 The proposed APIs are in the accompanying gstprofile.h file.

3.1 Getting user-readable names for formats

 This is already provided by GstPbUtils.

3.2 Hierarchy of profiles

 The goal is for applications to be able to present to the user a list
 of combo-boxes for choosing their output profile:

 [      Category      ]       # optional, depends on the application
 [    Device/Site/..  ]       # optional, depends on the application
 [      Profile       ]

 Convenience methods are offered to easily get lists of categories,
 devices, and profiles.

3.3 Creating Profiles

 The goal is for applications to be able to easily create profiles.

 Applications need a fast/efficient way to:
 * select a container format and see all compatible stream formats
 that can be used with it.
 * select a codec format and see which container formats can be used
 with it.

 The remaining parts concern the restrictions to encoder
 input.

3.4 Ensuring availability of plugins for Profiles

 When an application wishes to use a Profile, it should be able to
 query whether it has all the needed plugins to use it.

 This part will use GstPbUtils to query, and if needed install the
 missing plugins through the installed distribution plugin installer.


I. Use-cases researched
-----------------------

 This is a list of various use-cases where encoding/muxing is being
 used.

* Transcoding

  The goal is to convert any input file for a target use with as
  little loss of quality as possible.
  A specific variant of this is transmuxing (see below).

  Example applications: Arista, Transmageddon

* Rendering timelines

  The incoming streams are a collection of various segments that need
  to be rendered.
  Those segments can vary in nature (i.e. the video width/height can
  change).
  This requires the use of identity with the single-segment property
  activated to transform the incoming collection of segments to a
  single continuous segment.

  Example applications: PiTiVi, Jokosher

* Encoding of live sources

  The major risk to take into account is the encoder not encoding the
  incoming stream fast enough. This is outside of the scope of
  encodebin, and should be solved by using queues between the sources
  and encodebin, as well as implementing QoS in encoders and sources
  (the encoders emitting QoS events, and the upstream elements
  adapting themselves accordingly).

  Example applications: camerabin, cheese

* Screencasting applications

  This is similar to encoding of live sources.
  The difference being that due to the nature of the source (size and
  amount/frequency of updates) one might want to do the encoding in
  two parts:
  * The actual live capture is encoded with an 'almost-lossless' codec
  (such as huffyuv)
  * Once the capture is done, the file created in the first step is
  then rendered to the desired target format.

  Fixing sources to only emit region-updates and having encoders
  capable of encoding those streams would fix the need for the first
  step but is outside of the scope of encodebin.

  Example applications: Istanbul, gnome-shell, recordmydesktop

* Live transcoding

  This is the case of an incoming live stream which will be
  broadcast/transmitted live.
  One issue to take into account is to reduce the encoding latency to
  a minimum. This should mostly be done by picking low-latency
  encoders.

  Example applications: Rygel, Coherence

* Transmuxing

  Given a certain file, the aim is to remux the contents WITHOUT
  decoding into either a different container format or the same
  container format.
  Remuxing into the same container format is useful when the file was
  not created properly (for example, the index is missing).
  Whenever available, parsers should be applied on the encoded streams
  to validate and/or fix the streams before muxing them.

  Metadata from the original file must be kept in the newly created
  file.

  Example applications: Arista, Transmageddon

* Loss-less cutting

  Given a certain file, the aim is to extract a certain part of the
  file without going through the process of decoding and re-encoding
  that file.
  This is similar to the transmuxing use-case.

  Example applications: PiTiVi, Transmageddon, Arista, ...

* Multi-pass encoding

  Some encoders allow doing a multi-pass encoding.
  The initial pass(es) are only used to collect encoding estimates and
  are not actually muxed and output.
  The final pass uses the previously collected information, and its
  result is then muxed and output.

* Archiving and intermediary format

  The requirement is to have lossless

* CD ripping

  Example applications: Sound-juicer

* DVD ripping

  Example application: Thoggen



* Research links

  Some of these are still active documents, others are not.

[0] GstPreset API documentation
    http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPreset.html

[1] gnome-media GConf profiles
    http://www.gnome.org/~bmsmith/gconf-docs/C/gnome-media.html

[2] Research on a Device Profile API
    http://gstreamer.freedesktop.org/wiki/DeviceProfile

[3] Research on defining presets usage
    http://gstreamer.freedesktop.org/wiki/PresetDesign