# PyGST Tutorial/Combining Audio and Video
So far, our demos have been video-only for the sake of clarity. A few
people have asked for some specific examples explaining how to get audio
and video together in the same pipeline, so this article focuses
specifically on these challenges.
# Audio Elements
GStreamer also has various audio sink elements, but for the sake of
simplicity we can just use `autoaudiosink`, which figures out the
correct type for us. Working with audio sinks is much easier than
working with video sinks.
We have seen that GStreamer offers some utility elements for video,
including `videoscale` and `ffmpegcolorspace`. There are similar
elements for audio, including `audioconvert` and `audioresample`.
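As a quick illustration (this snippet is not part of the demos that follow; `audiotestsrc` is just a convenient test source), these audio utility elements can be chained straight into `autoaudiosink`:
```
import gobject
import pygst
pygst.require("0.10")
import gst

gobject.threads_init()

# build: audiotestsrc ! audioconvert ! audioresample ! autoaudiosink
pipeline = gst.Pipeline()
src = gst.element_factory_make("audiotestsrc")
convert = gst.element_factory_make("audioconvert")
resample = gst.element_factory_make("audioresample")
sink = gst.element_factory_make("autoaudiosink")
pipeline.add(src, convert, resample, sink)
gst.element_link_many(src, convert, resample, sink)

pipeline.set_state(gst.STATE_PLAYING)
gobject.MainLoop().run()
```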
# What Happens Inside Decodebin
Media files have a hierarchical structure. The top level of this is the
container format, and within that are one or more media streams. Files
with audio and video information have at least one audio and one video
stream. In order to completely decode a movie file, the container format
must first be read and interpreted in order to extract the encoded
streams. Then the streams must be decompressed and possibly converted to
a format suitable for processing or display. In GStreamer terminology,
the container format is read with a “demuxer”, which is short for
“demultiplexer”, and streams are decompressed with “decoders”.
For example, to decode an MPEG-2 audio/video stream, a demuxer first
splits the container into its video and audio streams, and each stream
is then passed to its own decoder. The same pattern applies to
Motion-JPEG video in an AVI container with MP3 audio: an AVI demuxer
feeds a JPEG decoder and an MP3 decoder.
You could create an ad-hoc pipeline for each independent scenario, but
that's not very flexible. Most container formats support a wide variety
of codecs, and the number of combinations of containers and codecs
supported by GStreamer is huge. Creating separate pipelines for each
scenario is impractical. What decodebin does is read a little bit of
information from its sink pad. Once it has worked out what container
format and streams are present in the data, it creates the appropriate
chain of demuxers and decoders. It creates one source pad for each
individual stream in the file. Therefore if a file contains one video
stream and one audio stream, decodebin creates two pads: one for video,
and one for audio. GStreamer refers to this type of behavior as
*autoplugging* and elements which do this type of thing as
“autopluggers”. Because decodebin can't know what's in a stream until it
reads it, the pads are not created until the pipeline transitions into
the ready or paused states. This is why we must link decodebin to other
elements in the “pad-added” signal handler.
In previous examples, we were only interested in the video stream, so we
simply ignored any pad that wasn't compatible with our video
colorspace converter. Now we have two possible targets to link when a
pad is created, and we need to be careful that the audio and video
source pads are linked to the appropriate processing elements.
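One common way to tell the two apart (shown here as a sketch, not taken from the demo code below, which uses `get_compatible_pad` instead; `audioqueue` and `videoqueue` are assumed to exist and already be linked downstream) is to inspect the caps of the newly created pad:
```
def on_pad_added(decoder, pad):
    # the media type is something like "audio/x-raw-int" or "video/x-raw-yuv"
    name = pad.get_caps()[0].get_name()
    if name.startswith("audio/"):
        pad.link(audioqueue.get_pad("sink"))
    elif name.startswith("video/"):
        pad.link(videoqueue.get_pad("sink"))

# "decode" is the decodebin instance
decode.connect("pad-added", on_pad_added)
```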
## Your First Attempt
Suppose we want to play a file containing both audio and video. What do
you think the pipeline will look like? Go ahead, grab a napkin and a pen
and take a wild stab at drawing the pipeline for this example.
Done?
You probably drew something like the pipeline in the following
gst-launch command. Try running it (be sure to set `${FILENAME}` to a
suitable path before trying these examples):
```
gst-launch-0.10 filesrc location=${FILENAME} ! decodebin name=decode \
decode. ! ffmpegcolorspace ! ximagesink \
decode. ! audioconvert ! autoaudiosink
```
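The same pipeline can also be constructed from Python with `gst.parse_launch`, which understands the gst-launch syntax directly. A minimal sketch (the file path is a placeholder):
```
import gobject
import pygst
pygst.require("0.10")
import gst

gobject.threads_init()

FILENAME = "/path/to/a/movie.avi"  # placeholder: point this at a real file
player = gst.parse_launch(
    "filesrc location=%s ! decodebin name=decode "
    "decode. ! ffmpegcolorspace ! ximagesink "
    "decode. ! audioconvert ! autoaudiosink" % FILENAME)
player.set_state(gst.STATE_PLAYING)
gobject.MainLoop().run()
```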
So far so good. Now suppose we want to transcode instead. This means we
will pipe decoded audio and video through *encoders* and then into a
*muxer*, which will output to a filesink. We will use Motion-JPEG with
raw audio in an AVI container. The resulting pipeline looks like this in
`gst-launch` syntax:
```
gst-launch-0.10 filesrc location=${FILENAME} ! decodebin name=decode \
decode. ! ffmpegcolorspace ! jpegenc ! avimux name=muxer \
decode. ! audioconvert ! muxer. \
muxer. ! filesink location=${FILENAME}.avi sync=false
```
This example probably works well enough, but now suppose we want a
preview of the compressed video, so we can tune our quality settings:
```
gst-launch-0.10 filesrc location=${FILENAME} ! decodebin name=decode \
decode. ! ffmpegcolorspace ! jpegenc ! tee ! avimux name=muxer \
tee0. ! jpegdec ! ffmpegcolorspace ! autovideosink sync=false \
decode. ! audioconvert ! muxer. \
muxer. ! filesink location=${FILENAME}.avi sync=false
```
This looks perfectly reasonable, but it probably will not work, because
it is missing something we haven't seen yet: `queue` elements.
## Queues
The `queue` element is used to allow concurrent execution of streams
within a pipeline. Essentially, it forces elements linked *downstream*
to do their processing in a separate thread. This is especially
important when multiplexing, and when using multiple sink elements.
Consequences of not using queues when required include link errors,
audio / video sync issues, and deadlocks.
When working with multiple streams in gstreamer, use the following rules
of thumb:
- Always add a queue before any sink element when the pipeline
contains multiple sinks
- Always add a queue before each input to a *muxer* (an element which
combines several input streams into one output stream)
One thing to be aware of is that queues introduce *latency*. The
placement of queues within a pipeline can affect the responsiveness of
the pipeline to things like property changes.
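For instance (the values below are illustrative, not taken from this article), a queue's buffering limits can be shrunk to trade smoothness for responsiveness:
```
import pygst
pygst.require("0.10")
import gst

queue = gst.element_factory_make("queue")
# keep at most 100 ms of data buffered; disable the buffer-count and
# byte-size limits so that only the time limit applies
queue.props.max_size_time = 100 * gst.MSECOND
queue.props.max_size_buffers = 0
queue.props.max_size_bytes = 0
```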
Here's the proper version of the previous transcoding example, complete
with queues:
```
gst-launch-0.10 filesrc location=${FILENAME} ! decodebin name=decode \
decode. ! ffmpegcolorspace ! jpegenc ! tee ! queue ! avimux name=muxer \
tee0. ! jpegdec ! ffmpegcolorspace ! queue ! autovideosink sync=false \
decode. ! queue ! audioconvert ! muxer. \
muxer. ! queue ! filesink location=${FILENAME}.avi sync=false
```
# Movie Player Demo
In this example we will write a simple movie player applet that can
handle both audio and video. The UI for this example is basically the
same as demo.py, so there's little to say about it. Let's jump straight
into creating the pipeline.
[source for this example](audio_video.py.md)

We override `createPipeline` so that it creates two sinks:
```
def createPipeline(self, w):
    """Given a window, creates a pipeline and connects it to the window"""
    # ... duplicate code omitted for brevity
    videosink = gst.element_factory_make("ximagesink", "sink")
    videosink.set_property("force-aspect-ratio", True)
    videosink.set_property("handle-expose", True)
    scale = gst.element_factory_make("videoscale", "scale")
    cspace = gst.element_factory_make("ffmpegcolorspace", "cspace")
    audiosink = gst.element_factory_make("autoaudiosink")
    audioconvert = gst.element_factory_make("audioconvert")
    # pipeline looks like: ... ! cspace ! scale ! sink
    #                      ... ! audioconvert ! autoaudiosink
    self.pipeline.add(cspace, scale, videosink, audiosink,
                      audioconvert)
    scale.link(videosink)
    cspace.link(scale)
    audioconvert.link(audiosink)
    return (self.pipeline, (cspace, audioconvert))
```
In `magic()`, the main difference is accepting the extra parameters, and
creating the audio and video queues:
```
src = gst.element_factory_make("filesrc", "src")
src.props.location = args[0]
dcd = create_decodebin()
audioqueue = gst.element_factory_make("queue")
videoqueue = gst.element_factory_make("queue")
pipeline.add(src, dcd, audioqueue, videoqueue)
src.link(dcd)
videoqueue.link(videosink)
audioqueue.link(audiosink)
dcd.connect("pad-added", onPadAdded)
```
Our dynamic linking code must now take into account one of two possible
targets:
```
def onPadAdded(source, pad):
    # first we see if we can link to the videosink
    tpad = videoqueue.get_compatible_pad(pad)
    if tpad:
        pad.link(tpad)
        return
    # if not, we try the audio sink
    tpad = audioqueue.get_compatible_pad(pad)
    if tpad:
        pad.link(tpad)
        return
```
[source for this example](audio_video.py.md)
# Video DJ Example
**Note: Currently there is an issue with the `volume` element which
prevents this example from working as well as it should. The audio
latency is much higher than it should be.**
[source for this example](audio_video_crossfade.py.md)
We've seen how to construct a simple pipeline that uses both audio and
video. Now, let's re-visit the video crossfade example, this time
cross-fading both audio and video. As with video, we need two kinds of
elements: an element to control audio volume, and an element to perform
the mixing. The volume element is called `volume` and the audio mixing
element is called `adder`. Adder requires that the incoming streams be
of the same type, width, depth, and rate, so we also need `audioconvert`
and `audioresample` to sanitize the incoming streams.
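If the two sources decode to different raw-audio formats, one option (my own suggestion; the example below simply relies on `audioconvert` and `audioresample` to negotiate a common format) is to pin the format down explicitly with a `capsfilter` placed just before `adder`:
```
import pygst
pygst.require("0.10")
import gst

# illustrative format values; the chain would look like
# ... ! audioconvert ! audioresample ! capsfilter ! adder
capsfilter = gst.element_factory_make("capsfilter")
capsfilter.props.caps = gst.Caps(
    "audio/x-raw-int,rate=44100,channels=2,width=16,depth=16")
```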
Go ahead: try to draw the pipeline for this example before looking at
the solution.
[audio\_crossfade\_pipeline.png](audio_crossfade_pipeline.png.md)
Notice that there are some recurring chains of elements. For audio, the
chain `audioconvert ! audioresample ! queue ! volume` appears twice, and
for video, the chain
`videorate ! videoscale ! ffmpegcolorspace ! queue ! alpha` appears twice.
To simplify the code for this example, I've factored the audio and
video code out into separate methods. First we make our `pad-added`
signal handler a proper method so that we can connect to it in multiple
places:
```
def onPad(self, decoder, pad, target):
    tpad = target.get_compatible_pad(pad)
    if tpad:
        pad.link(tpad)
```
This method creates a chain of audio elements between `decoder` and
`adder`. At the end, we save the volume element as an instance attribute
so that the UI can adjust its properties. The purpose of the `name`
parameter is to help generate unique attribute names. Notice that we are
passing the target of the dynamic link as a user parameter of the
decoder's `connect` method.
```
def addAudioChain(self, pipeline, name, decoder, adder):
    volume = gst.element_factory_make("volume")
    volume.props.volume = 0.5
    audioconvert = gst.element_factory_make("audioconvert")
    audiorate = gst.element_factory_make("audioresample")
    queue = gst.element_factory_make("queue")
    pipeline.add(volume, audioconvert, audiorate, queue)
    decoder.connect("pad-added", self.onPad, audioconvert)
    audioconvert.link(audiorate)
    audiorate.link(queue)
    queue.link(volume)
    volume.link(adder)
    setattr(self, "vol%s" % name, volume)
```
Now the code to create the chain of video elements. Notice how similar
it is in structure to the audio version:
```
def addVideoChain(self, pipeline, name, decoder, mixer):
    alpha = gst.element_factory_make("alpha")
    alpha.props.alpha = 1.0
    videoscale = gst.element_factory_make("videoscale")
    videorate = gst.element_factory_make("videorate")
    colorspace = gst.element_factory_make("ffmpegcolorspace")
    queue = gst.element_factory_make("queue")
    pipeline.add(alpha, videoscale, videorate, colorspace, queue)
    decoder.connect("pad-added", self.onPad, videorate)
    videorate.link(videoscale)
    videoscale.link(colorspace)
    colorspace.link(queue)
    queue.link(alpha)
    alpha.link(mixer)
    setattr(self, "alpha%s" % name, alpha)
```
For each input, we add a `filesrc` and decoder, then we combine the
audio and video chains.
```
def addSourceChain(self, pipeline, name, filename, mixer, adder):
    src = gst.element_factory_make("filesrc")
    src.props.location = filename
    dcd = create_decodebin()
    pipeline.add(src, dcd)
    src.link(dcd)
    self.addVideoChain(pipeline, name, dcd, mixer)
    self.addAudioChain(pipeline, name, dcd, adder)
```
Now our `magic` method is fairly concise. All we have to do is create
the `videomixer` and `adder` and connect them to our source element chains.
```
def magic(self, pipeline, (videosink, audiosink), args):
    """This is where the magic happens"""
    mixer = gst.element_factory_make("videomixer")
    adder = gst.element_factory_make("adder")
    pipeline.add(mixer, adder)
    mixer.link(videosink)
    adder.link(audiosink)
    self.addSourceChain(pipeline, "A", args[0], mixer, adder)
    self.addSourceChain(pipeline, "B", args[1], mixer, adder)
    self.alphaB.props.alpha = 0.5
```
The UI layout is similar to the video-only example, but with an extra
control: a balance adjuster so the user can compensate if the volume
varies significantly between sources A and B.
```
def customWidgets(self):
    self.crossfade = gtk.Adjustment(0.5, 0, 1.0)
    self.balance = gtk.Adjustment(1.0, 0.0, 2.0)
    crossfadeslider = gtk.HScale(self.crossfade)
    balanceslider = gtk.HScale(self.balance)
    self.crossfade.connect("value-changed", self.onValueChanged)
    self.balance.connect("value-changed", self.onValueChanged)
    ret = gtk.Table()
    ret.attach(gtk.Label("Crossfade"), 0, 1, 0, 1)
    ret.attach(crossfadeslider, 1, 2, 0, 1)
    ret.attach(gtk.Label("Balance"), 0, 1, 1, 2)
    ret.attach(balanceslider, 1, 2, 1, 2)
    return ret
```
When the slider moves, we want to set the alpha only on source B, as we
did in the video-only example. But we need to set the volume on both
audio sources to complementary values. We also perform the balance
computation entirely in the UI.
```
def onValueChanged(self, adjustment):
    balance = self.balance.get_value()
    crossfade = self.crossfade.get_value()
    self.volA.props.volume = (2 - balance) * (1 - crossfade)
    self.volB.props.volume = balance * crossfade
    self.alphaB.props.alpha = crossfade
```
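For example, with the balance slider centered at 1.0 and the crossfade at 0.25, source A gets volume (2 - 1.0) * (1 - 0.25) = 0.75 and source B gets 1.0 * 0.25 = 0.25, so the two volumes always sum to 1.0 while the balance stays centered.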
[source for this example](audio_video_crossfade.py.md)
## Exercises
1. Modify this example to work with an arbitrary number of sources.
Hint: create a separate alpha slider for each source.
2. Look up `v4l2src` and `autoaudiosrc` using `gst-inspect`. Use what
you learn to replace one of the sources with input from a webcam
and sound card.
3. Notice that this example does not place any queues between the
mixing and sink elements. This is to minimize the latency of the
slider control.
   1. Insert queues between the `videomixer` and `ximagesink`, as well
      as between the `adder` and `autoaudiosink` elements. How does
      this affect the demo?
   2. Try changing the properties of the queues. In particular, see
      how changing `max-size-time` and `min-threshold-time` affects
      latency. See if you can reduce latency to usable levels.
4. In this example, we factored out code to create recurring portions
of our pipeline into separate methods. GStreamer has a more general
way to abstract repeating combinations of elements, called a *Bin*.
We've already seen one example of such an element, `decodebin`.
Rewrite this example so that common sections of code are factored
into separate Bins. Hint: you will also need to learn about
*Ghost Pads*; see the sketch below for a starting point.
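To get started on exercise 4, here is a minimal sketch of a custom `gst.Bin` wrapping the recurring audio chain and exposing it through ghost pads. The class name and structure are illustrative, not part of the original example:
```
import pygst
pygst.require("0.10")
import gst

class AudioChainBin(gst.Bin):
    """Wraps audioconvert ! audioresample ! queue ! volume as one element."""
    def __init__(self):
        gst.Bin.__init__(self)
        convert = gst.element_factory_make("audioconvert")
        resample = gst.element_factory_make("audioresample")
        queue = gst.element_factory_make("queue")
        self.volume = gst.element_factory_make("volume")
        self.add(convert, resample, queue, self.volume)
        gst.element_link_many(convert, resample, queue, self.volume)
        # ghost pads re-expose the internal pads on the bin itself,
        # so the bin can be linked like any ordinary element
        self.add_pad(gst.GhostPad("sink", convert.get_pad("sink")))
        self.add_pad(gst.GhostPad("src", self.volume.get_pad("src")))
```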