.. ****************************************************************************
* Copyright © 2012-2014 Institut für Nachrichtentechnik, Universität Rostock *
* Copyright © 2006-2014 Quality & Usability Lab, *
* Telekom Innovation Laboratories, TU Berlin *
* *
* This file is part of the SoundScape Renderer (SSR). *
* *
* The SSR is free software: you can redistribute it and/or modify it under *
* the terms of the GNU General Public License as published by the Free *
* Software Foundation, either version 3 of the License, or (at your option) *
* any later version. *
* *
* The SSR is distributed in the hope that it will be useful, but WITHOUT ANY *
* WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS *
* FOR A PARTICULAR PURPOSE. *
* See the GNU General Public License for more details. *
* *
* You should have received a copy of the GNU General Public License along *
* with this program. If not, see <http://www.gnu.org/licenses/>. *
* *
* The SSR is a tool for real-time spatial audio reproduction providing a *
* variety of rendering algorithms. *
* *
* http://spatialaudio.net/ssr ssr@spatialaudio.net *
******************************************************************************
.. _renderers:
The Renderers
=============
General
-------
.. _reproduction_setups:
Reproduction Setups
~~~~~~~~~~~~~~~~~~~
The geometry of the actual reproduction setup is specified in ``.asd``
files, just like sound scenes. By default, it is loaded from the file
``/usr/local/share/ssr/default_setup.asd``. Use the ``--setup`` command
line option to load another reproduction setup file. Note that the
loudspeaker setups have to be convex. This is not checked by the SSR.
The loudspeakers appear at the outputs of your sound card in the same
order as they are specified in the ``.asd`` file, starting with channel
1.
A sample reproduction setup description:
::
<?xml version="1.0"?>
<asdf version="0.1">
<header>
<name>Circular Loudspeaker Array</name>
</header>
<reproduction_setup>
<circular_array number="56">
<first>
<position x="1.5" y="0"/>
<orientation azimuth="-180"/>
</first>
</circular_array>
</reproduction_setup>
</asdf>
We provide the following setups in the directory
``data/reproduction_setups/``:
- ``2.0.asd``: standard stereo setup at a distance of 1.5 m
- ``2.1.asd``: standard stereo setup at a distance of 1.5 m plus a
subwoofer
- ``5.1.asd``: standard 5.1 setup on a circle with a diameter of 3 m
- ``rounded_rectangle.asd``: Demonstrates how to combine circular arcs
and linear array segments.
- ``circle.asd``: This is a circular array of 3 m diameter composed
of 56 loudspeakers.
- ``loudspeaker_setup_with_nearly_all_features.asd``: This setup
describes all supported options. Open it with your favorite text
editor and have a look inside.
There is some limited freedom in assigning channels to
loudspeakers: If you insert the element ``<skip number="5"/>``, the
specified number of output channels are skipped and the following
loudspeakers get higher channel numbers accordingly.
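The channel numbering rule can be illustrated with a small Python sketch. It is not part of the SSR; the function name ``assign_channels`` and the tuple-based input format are assumptions made for this example only.

```python
def assign_channels(entries):
    """Map loudspeaker names to 1-based output channels.

    `entries` lists ("loudspeaker", name) or ("skip", count) tuples in the
    order in which the elements appear in the setup file (hypothetical
    input format, chosen for illustration only).
    """
    channels = {}
    next_channel = 1  # the first loudspeaker is assigned to channel 1
    for kind, value in entries:
        if kind == "skip":
            next_channel += value  # leave `value` output channels unused
        elif kind == "loudspeaker":
            channels[value] = next_channel
            next_channel += 1
        else:
            raise ValueError(f"unknown entry type: {kind!r}")
    return channels


setup = [("loudspeaker", "left"), ("loudspeaker", "right"),
         ("skip", 5), ("loudspeaker", "sub")]
print(assign_channels(setup))  # {'left': 1, 'right': 2, 'sub': 8}
```

With ``<skip number="5"/>`` placed after the second loudspeaker, channels 3 to 7 stay unused and the third loudspeaker ends up on channel 8.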
Of course, the binaural and BRS renderers do not load a loudspeaker
setup. By default, they assume the listener to reside in the coordinate
origin looking straight forward.
A Note on the Timing of the Audio Signals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The WFS renderer is the only renderer in which the timing of the audio
signals is somewhat peculiar. None of the other renderers imposes any
algorithmic delay on individual source signals. Of course, if you use a
renderer that is convolution based such as the BRS renderer, the
employed HRIRs do alter the timing of the signals due to their inherent
properties.
This is different with the WFS renderer. Here, the propagation
duration of sound from the position of the virtual source to the
loudspeaker array is also taken into account. This means that the farther
away a virtual source is located, the longer the delay imposed on its input signal.
This also holds true for plane waves: Theoretically, plane waves
originate from infinity. However, the SSR does consider the origin point
of the plane wave that is specified in ASDF. This origin point also
specifies the location of the symbol that represents the respective
plane wave in the GUI.
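As a rough illustration of how the delay grows with distance, the following Python sketch computes the propagation delay between one virtual source and one loudspeaker. The speed of sound of 343 m/s, the function name ``propagation_delay`` and the restriction to a single source/loudspeaker pair are assumptions; the manual does not describe how the WFS renderer evaluates the delay internally.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for this sketch


def propagation_delay(source_xy, loudspeaker_xy, sample_rate=44100):
    """Return the propagation delay as (seconds, whole samples)."""
    distance = math.dist(source_xy, loudspeaker_xy)  # Euclidean distance in m
    seconds = distance / SPEED_OF_SOUND
    return seconds, round(seconds * sample_rate)


# Two virtual sources behind a loudspeaker located at (0, 1.5)
near = propagation_delay((0.0, 2.0), (0.0, 1.5))
far = propagation_delay((0.0, 10.0), (0.0, 1.5))
print(near, far)
```

The source at 10 m is delayed by roughly 25 ms, the one at 2 m by about 1.5 ms.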
We are aware that this procedure can cause confusion and reduces the
ability of a given scene to translate well between different types of
renderers. In the upcoming version 0.4 of the SSR we will implement an
option that will allow you to specify for each individual source whether
the propagation duration of sound shall be considered by a renderer or
not.
Subwoofers
~~~~~~~~~~
All loudspeaker-based renderers support the use of subwoofers. Outputs of the
SSR that are assigned to subwoofers receive a full-bandwidth signal. You
therefore have to make sure yourself that your system lowpass-filters these
signals appropriately before they are emitted by the subwoofers.
You might need to adjust the level of your subwoofer(s) depending on the
renderers that you are using, because the overall radiated power of the normal
speakers cannot be predicted easily and we therefore cannot adjust for it
automatically. For example, no matter how many loudspeakers your setup is
composed of, the VBAP renderer will only use two loudspeakers at a time to
present a given virtual sound source. The WFS renderer, on the other hand, might
use 10 or 20 loudspeakers, which can clearly lead to a different sound pressure
level at a given receiver location.
For convenience, ASDF allows for specifying a permanent weight for loudspeakers
and subwoofers using the ``weight`` attribute:
::
<loudspeaker model="subwoofer" weight="0.5">
<position x="0" y="0"/>
<orientation azimuth="0"/>
</loudspeaker>
``weight`` is a linear factor that is always applied to the signal of this
speaker. The example above attenuates the signal by approx. 6 dB. You
can use two ASDF descriptions for the same reproduction setup that
differ only with respect to the subwoofer weights if you're using different
renderers on the same loudspeaker system.
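The relation between a linear weight and a level change in dB is the usual amplitude conversion. A short Python sketch (the helper names are hypothetical and not part of the SSR):

```python
import math


def weight_to_db(weight):
    """Level change in dB caused by a linear amplitude weight."""
    return 20.0 * math.log10(weight)


def db_to_weight(level_db):
    """Linear amplitude weight that produces the given level change."""
    return 10.0 ** (level_db / 20.0)


print(round(weight_to_db(0.5), 2))  # -6.02
print(round(db_to_weight(-3.0), 3))
```

This is handy when the desired subwoofer level is known in dB and has to be entered as a ``weight`` attribute.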
Distance Attenuation
~~~~~~~~~~~~~~~~~~~~
Note that in all renderers -- except for the BRS and generic renderers --, the
distance attenuation in the virtual space is :math:`\frac{1}{r}` with respect
to the distance :math:`r` of the respective virtual point source to the
reference position. Point sources closer than 0.5 m to the reference position
do not experience any increase of amplitude. Virtual plane waves do not
experience any algorithmic distance attenuation in any renderer.
You can specify your own preferred distance attenuation exponent :math:`exp`
(in :math:`\frac{1}{r^{exp}}`) either via the command line argument
``--decay-exponent=VALUE`` or the configuration option ``DECAY_EXPONENT`` (see
the file ``data/ssr.conf.example``). The higher the exponent, the faster the
amplitude decays over distance. The default exponent is
:math:`exp = 1` [1]_. Fig. :ref:`3.1 <distance_attenuation>` illustrates the effect
of different choices of the exponent. Note that the default decay of
:math:`\frac{1}{r}` is theoretically correct only for infinitesimally small
sound sources. Spatially extended sources, like most real-world sources, exhibit
a slower decay. So you might want to choose an exponent somewhere between
0.5 and 1. You can completely suppress any sort of distance attenuation by
setting the decay exponent to 0.
The amplitude reference distance, i.e. the distance from the reference
at which plane waves are as loud as the other source types (like point
sources), can be set in the SSR configuration file
(Section :ref:`Configuration File <ssr_configuration_file>`). The desired
amplitude reference distance for a given sound scene can be specified in
the scene description (Section :ref:`ASDF <asdf>`). The default value is 3 m.
The overall amplitude normalization is such that plane waves always exhibit the
same amplitude independent of what amplitude reference distance and what decay
exponent have been chosen. Consequently, virtual point sources also always
exhibit the same amplitude at the amplitude reference distance, whatever it has
been set to.
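The behaviour described in this section can be summarized in a few lines of Python. This is a sketch under assumptions, not the SSR implementation: the function name is hypothetical, and the gain is normalized to 1 at the amplitude reference distance, which is one way to satisfy the normalization described above (the absolute scaling used by the SSR is not stated here).

```python
def point_source_gain(distance, decay_exponent=1.0,
                      reference_distance=3.0, min_distance=0.5):
    """Distance-dependent gain of a virtual point source (illustrative)."""
    # Sources closer than 0.5 m do not get any louder
    effective = max(distance, min_distance)
    # 1/r^exp decay, normalized to 1 at the amplitude reference distance
    return (reference_distance / effective) ** decay_exponent


for exponent in (0.0, 0.5, 1.0, 2.0):
    gains = [point_source_gain(r, exponent) for r in (0.1, 0.5, 3.0, 6.0)]
    print(exponent, [round(g, 3) for g in gains])
```

For every exponent the gain at 3 m is the same, and an exponent of 0 removes the distance dependence altogether.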
.. _distance_attenuation:
.. figure:: images/distance_attenuation.png
:align: center
Illustration of the amplitude of virtual point sources as a function of
source distance from the reference point for different exponents
:math:`exp`. The exponents range from 0 to 2 (black color to gray color).
The amplitude reference distance is set to 3 m. Recall that sources
closer than 0.5 m to the reference position do not experience any further
increase of amplitude.
.. [1]
A note regarding previous versions of the WFS renderer: In the present SSR
version, the amplitude decay is handled centrally and equally for all
renderers that take distance attenuation into account (see Table
:ref:`2 <source_props>`). Previously, the WFS renderer relied on the distance
attenuation that was inherent to the WFS driving function. This amplitude
decay is very similar to an exponent of 0.5 (instead of the current default
exponent of 1.0). So you might want to set the decay exponent to 0.5 in WFS
to make your scenes sound like they used to.
Doppler Effect
~~~~~~~~~~~~~~
In the current version of the SSR the Doppler Effect in moving sources
is not supported by any of the renderers.
Signal Processing
~~~~~~~~~~~~~~~~~
All rendering algorithms are implemented on a frame-wise basis with an
internal precision of 32 bit floating point. The signal processing is
illustrated in Fig. :ref:`3.2 <signal_processing>`.
The input signal is divided into individual frames of size *nframes*,
where *nframes* is the frame size with which JACK is running. Then,
e.g., frame number :math:`n+1` is processed both with the previous rendering
parameters :math:`n` and with the current parameters :math:`n+1`\ .
The two processed frames are then crossfaded with cosine-shaped
slopes. In other words, the effective frame size of the signal processing
is :math:`2\cdot`\ *nframes* with 50% overlap. Due to the
fade-in of the frame processed with the current parameters :math:`n+1`\ ,
the algorithmic latency is slightly higher than for processing done with
frames purely of size *nframes* and no crossfade.
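The crossfading scheme can be sketched in Python as follows. A plain gain factor stands in for the "rendering parameters", and the raised-cosine slope is one possible cosine-shaped fade; both are assumptions for illustration, as are the function names.

```python
import math


def crossfade(old_frame, new_frame):
    """Fade from old_frame to new_frame with cosine-shaped slopes."""
    n = len(old_frame)
    if n != len(new_frame):
        raise ValueError("frames must have the same length")
    out = []
    for i in range(n):
        fade_in = 0.5 * (1.0 - math.cos(math.pi * i / n))  # rises from 0 towards 1
        fade_out = 1.0 - fade_in                            # falls from 1 towards 0
        out.append(fade_out * old_frame[i] + fade_in * new_frame[i])
    return out


def render_frame(frame, old_gain, new_gain):
    """Process one frame with previous and current parameters, then crossfade."""
    with_old = [old_gain * s for s in frame]
    with_new = [new_gain * s for s in frame]
    return crossfade(with_old, with_new)


result = render_frame([1.0] * 8, old_gain=1.0, new_gain=0.0)
print([round(s, 3) for s in result])
```

The output starts at the level of the old parameters and moves smoothly towards the level of the new ones within a single frame.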
.. _signal_processing:
.. figure:: images/signal_processing.png
:align: center
Illustration of the frame-wise signal processing
as implemented in the SSR renderers (see text)
The implementation approach described above is one version of the
standard way of implementing time-varying audio processing. Note however
that this means that with *all* renderers, moving sources are not
physically correctly reproduced. The physically correct reproduction of
moving virtual sources as in [Ahrens2008a]_ and [Ahrens2008b]_ requires a
different implementation
approach which is computationally significantly more costly.
.. [Ahrens2008a] Jens Ahrens and Sascha Spors. Reproduction of moving virtual
sound sources with special attention to the doppler effect. In 124th
Convention of the AES, Amsterdam, The Netherlands, May 17–20, 2008.
.. [Ahrens2008b] Jens Ahrens and Sascha Spors. Reproduction of virtual sound
sources moving at supersonic speeds in Wave Field Synthesis. In 125th
Convention of the AES, San Francisco, CA, Oct. 2–5, 2008.
.. _binaural_renderer:
Binaural Renderer
-----------------
Executable: ``ssr-binaural``
Binaural rendering is an approach where the acoustical influence of the
human head is electronically simulated to position virtual sound sources
in space. **Be sure that you are using headphones to listen.**
The acoustical influence of the human head is coded in so-called
head-related impulse responses (HRIRs) or, equivalently, in head-related transfer functions (HRTFs).
The HRIRs are loaded from the file ``/usr/local/share/ssr/default_hrirs.wav``. If you want
to use different HRIRs then use the ``--hrirs=FILE`` command line option or the
SSR configuration file
(Section :ref:`Configuration File <ssr_configuration_file>`) to specify
your custom location. The SSR connects its outputs automatically to
outputs 1 and 2 of your sound card.
For virtual sound sources that are closer to the reference position (=
the listener position) than 0.5 m, the HRTFs are interpolated with a
Dirac impulse. This ensures a smooth transition of virtual sources from
the outside of the listener's head to the inside.
SSR uses HRIRs with an angular resolution of :math:`1^\circ`\ . Thus,
the HRIR file contains 720 impulse responses (360 for each ear) stored
as a 720-channel .wav-file. The HRIRs all have to be of equal length and
have to be arranged in the following order:
- 1st channel: left ear, virtual source position :math:`0^\circ`
- 2nd channel: right ear, virtual source position :math:`0^\circ`
- 3rd channel: left ear, virtual source position :math:`1^\circ`
- 4th channel: right ear, virtual source position :math:`1^\circ`
- ...
- 720th channel: right ear, virtual source position :math:`359^\circ`
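The channel order listed above corresponds to a simple index rule. The following Python helper is a hypothetical illustration (it is not part of the SSR) and assumes that angles are rounded to the nearest degree and wrapped to the range 0 to 359:

```python
def hrir_channel(azimuth_deg, ear):
    """Return the 1-based channel of the HRIR file for a direction and ear."""
    angle = round(azimuth_deg) % 360          # 1 degree resolution, wrapped
    offset = {"left": 1, "right": 2}[ear]     # left ear first, then right ear
    return 2 * angle + offset


print(hrir_channel(0, "left"))     # 1
print(hrir_channel(1, "right"))    # 4
print(hrir_channel(359, "right"))  # 720
```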
If your HRIRs have a lower angular resolution, you have to interpolate them
to the target resolution or use the same HRIR for several adjacent
directions in order to fulfill the format requirements. Higher
resolution is not supported. Make sure that the sampling rate of the
HRIRs matches that of JACK. So far, we know that both 16-bit and 24-bit
word lengths work.
The SSR automatically loads and uses all HRIR coefficients it finds in
the specified file. You can use the ``--hrir-size=VALUE`` command line
option in order to limit the number of HRIR coefficients read and used
to ``VALUE``. You don't need to worry if your specified HRIR length
``VALUE`` exceeds the one stored in the file. You will receive a
warning, and the SSR will render the audio in any
case.
The actual size of the HRIRs is not restricted (apart from processing
power). The SSR cuts them into partitions of size equal to the JACK
frame buffer size and zero-pads the last partition if necessary.
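The partitioning step can be pictured with the following Python sketch. It only shows how an impulse response is cut into zero-padded blocks; the function name is an assumption, and the actual convolution engine of the SSR is not reproduced here.

```python
def partition_hrir(hrir, frame_size):
    """Split an impulse response into blocks of frame_size samples."""
    partitions = []
    for start in range(0, len(hrir), frame_size):
        block = list(hrir[start:start + frame_size])
        block += [0.0] * (frame_size - len(block))  # zero-pad the last block
        partitions.append(block)
    return partitions


hrir = [1.0] * 1000                      # dummy impulse response with 1000 taps
parts = partition_hrir(hrir, frame_size=256)
print(len(parts), [len(p) for p in parts])
```

With 1000 taps and a JACK frame size of 256, four partitions result and the last one carries 24 padding zeros.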
Note that there's some potential to optimize the performance of the SSR
by adjusting the JACK frame size and, accordingly, the number of
partitions when a specific number of HRIR taps is desired. The least
computational load arises when the audio frames have the same size as
the HRIRs. By choosing shorter frames and thus using partitioned
convolution, the system latency is reduced but the computational load is
increased.
The HRIR sets shipped with SSR
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SSR comes with two different HRIR sets: FABIAN and KEMAR (QU). They differ with respect to
the manikin that was used in the measurement (FABIAN vs. KEMAR). The reference for the
FABIAN measurement is [Lindau2007]_, and the reference for the KEMAR (QU) is
[Wierstorf2011]_. The low-frequency extension from [SpatialAudio]_ has been applied to the
KEMAR (QU) HRTFs.
You will find all sets in the folder ``data/impulse_responses/hrirs/``.
The suffix ``_eq`` in the file name indicates the equalized data. The unequalized data is
of course also there. See the file
``data/impulse_responses/hrirs/hrirs_fabian_documentation.pdf`` for a few more details on
the FABIAN measurement.
Starting with SSR release 0.5.0, the default HRIR set that is loaded is headphone
compensated, i.e., we equalized the HRIRs a bit in order to compensate for the alterations
that a typical pair of headphones would apply to the ear signals. Note that by design,
headphones do not have a flat transfer function. However, when performing binaural
rendering, we need the headphones to be transparent. Our equalization may not be
perfect for all headphones or earbuds as these can exhibit very different properties
between different models.
We chose a frequency sampling-based minimum-phase filter design. The transfer functions
and impulse responses of the two compensation filters are depicted in Fig. :ref:`3.3
<hrir_comp_filters>`. The impulse responses themselves can be found in the same folder
as the HRIRs (see above). The compensation filters are 513 taps long. Since the unequalized
HRIRs are 512 taps long, the equalized ones are 1024 taps long (512 + 513 - 1).
.. _hrir_comp_filters:
.. figure:: images/hrir_comp_filters.png
:align: center
Magnitude transfer functions and impulse responses of the headphone compensation /
equalization filters
Recall that there are several ways of defining which HRIR set is loaded, for example the
``HRIR_FILE_NAME`` property in the :ref:`SSR configuration file <ssr_configuration_file>`,
or the command line option ``--hrirs=FILE``.
.. [Lindau2007] Alexander Lindau and Stefan Weinzierl. FABIAN - Schnelle
Erfassung binauraler Raumimpulsantworten in mehreren Freiheitsgraden. In
Fortschritte der Akustik, DAGA Stuttgart, 2007.
.. [Wierstorf2011] Hagen Wierstorf, Matthias Geier, Alexander Raake, and Sascha Spors. A
Free Database of Head-Related Impulse Response Measurements in the Horizontal Plane
with Multiple Distances. In 130th Convention of the Audio Engineering Society (AES),
May 2011.
.. [SpatialAudio] https://github.com/spatialaudio/lf-corrected-kemar-hrtfs (commit 5b5ec8)
Preparing HRIR sets
~~~~~~~~~~~~~~~~~~~
You can easily prepare your own HRIR sets for use with the SSR by
adapting the MATLAB script ``data/matlab_scripts/prepare_hrirs_cipic.m``
to your needs. This script converts the HRIRs of the KEMAR manikin
included in the CIPIC database [AlgaziCIPIC]_ to the format that the SSR
expects. See the script for further information and for how to obtain the raw HRIRs. Note that
the KEMAR (CIPIC) HRIRs are not identical to the KEMAR (QU) ones.
.. [AlgaziCIPIC] V. Ralph Algazi. The CIPIC HRTF database.
https://web.archive.org/web/20170916053150/interface.cipic.ucdavis.edu/sound/hrtf.html.
.. _brs:
Binaural Room Synthesis Renderer
--------------------------------
Executable: ``ssr-brs``
The Binaural Room Synthesis (BRS) renderer is a binaural renderer (refer
to Section :ref:`Binaural Renderer <binaural_renderer>`) which uses one
dedicated
HRIR set for each individual sound source. The motivation is to have more
realistic reproduction than in simple binaural rendering. In this
context HRIRs are typically referred to as binaural room impulse
responses (BRIRs).
Note that the BRS renderer does not consider any specification of a
virtual source's position. The positions of the virtual sources
(including their distance) are exclusively coded in the BRIRs.
Consequently, the BRS renderer does not apply any distance attenuation.
It only applies the respective source's gain and the master volume. No
interpolation with a Dirac as in the binaural renderer is performed for
very close virtual sources. The only quantity which is explicitly
considered is the orientation of the receiver, i.e. the reference.
Therefore, specification of meaningful source and receiver positions is
only necessary when a correct graphical illustration is desired.
The BRIRs are stored in a format similar to the one for the HRIRs
for the binaural renderer (refer to
Section :ref:`Binaural Renderer <binaural_renderer>`). However, there is a
fundamental difference: To be consistent, the different
channels do not hold the data for different positions of the virtual
sound source but they hold the information for different head
orientations. Explicitly,
- 1st channel: left ear, head orientation :math:`0^\circ`
- 2nd channel: right ear, head orientation :math:`0^\circ`
- 3rd channel: left ear, head orientation :math:`1^\circ`
- 4th channel: right ear, head orientation :math:`1^\circ`
- ...
- 720th channel: right ear, head orientation :math:`359^\circ`
In order to assign a set of BRIRs to a given sound source an appropriate
scene description in ``.asd``-format has to be prepared (refer also to
Section :ref:`Audio Scenes <audio_scenes>`). As shown in ``brs_example.asd``
(from the example scenes), a virtual source has the optional property
``properties_file`` which holds the location of the file containing the
desired BRIR set. The location to be specified is relative to the folder
of the scene file. Note that -- as described above -- specification of the
virtual source's position does not affect the audio processing. If you
do not specify a BRIR set for each virtual source, then the renderer
will complain and refuse to process the respective source.
We have measured the BRIRs of the FABIAN
manikin in one of our mid-size meeting rooms called Sputnik with 8
different source positions. Due to the file size, we have not included
them in the release. You can obtain the data from [BRIRs]_.
.. [BRIRs] The Sputnik BRIRs can be obtained from here:
https://github.com/ssr-scenes/tu-berlin/tree/master/sputnik.
More BRIR repositories are compiled here: http://www.soundfieldsynthesis.org/other-resources/#impulse-responses.
.. _vbap:
Vector Base Amplitude Panning Renderer
--------------------------------------
Executable: ``ssr-vbap``
The Vector Base Amplitude Panning (VBAP) renderer uses the algorithm
described in [Pulkki1997]_. It tries to find a loudspeaker pair between which
the phantom source is located (in VBAP you speak of a phantom source rather
than a virtual one). If it does find a loudspeaker pair whose angle is
smaller than :math:`180^\circ` then it calculates the weights
:math:`g_l` and :math:`g_r` for the left and right loudspeaker as
.. math::
g_{l,r} = \frac{\cos\phi \sin \phi_0 \pm \sin \phi \cos \phi_0}
{2\cos \phi_0 \sin \phi_0}.
:math:`\phi_0` is half the angle between the two loudspeakers with
respect to the listening position, :math:`\phi` is the angle between the
position of the phantom source and the direction "between the
loudspeakers".
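The weight formula can be evaluated directly. The Python sketch below is an illustration only: the function name is hypothetical, angles are taken in degrees, and it is assumed that positive :math:`\phi` points towards the left loudspeaker (the "+" branch of the formula).

```python
import math


def vbap_gains(phi_deg, phi0_deg):
    """Return (g_l, g_r) for a phantom source between two loudspeakers.

    phi_deg:  source angle relative to the direction between the loudspeakers
    phi0_deg: half the angle between the two loudspeakers (0 < phi0 < 90)
    """
    if not 0.0 < phi0_deg < 90.0:
        raise ValueError("the loudspeaker pair must span less than 180 degrees")
    phi = math.radians(phi_deg)
    phi0 = math.radians(phi0_deg)
    denominator = 2.0 * math.cos(phi0) * math.sin(phi0)
    a = math.cos(phi) * math.sin(phi0)
    b = math.sin(phi) * math.cos(phi0)
    return (a + b) / denominator, (a - b) / denominator


for phi in (-30.0, 0.0, 15.0, 30.0):
    g_l, g_r = vbap_gains(phi, 30.0)
    print(phi, round(g_l, 3), round(g_r, 3))
```

For a standard stereo pair (:math:`\phi_0 = 30^\circ`), a source in the direction of one loudspeaker yields the weights 1 and 0, and a centered source yields equal weights.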
If the VBAP renderer cannot find a loudspeaker pair whose angle is
smaller than :math:`180^\circ`, then it uses the closest loudspeaker
provided that the latter is situated within :math:`30^\circ`\ . If not,
then it does not render the source. If you are in verbosity level 2
(i.e. start the SSR with the ``-vv`` option) you'll see a notification
about what's happening.
Note that all virtual source types (i.e. point and plane sources) are
rendered as phantom sources.
Contrary to WFS, non-uniform distributions of loudspeakers are ok here.
Ideally, the loudspeakers should be placed on a circle around the
reference position. You can optionally specify a delay for each
loudspeaker in order to compensate for some amount of misplacement. In the
ASDF (refer to Section :ref:`ASDF <asdf>`), each loudspeaker has the optional
attribute ``delay`` which determines the delay in seconds to be applied
to the respective loudspeaker. Note that the specified delay will be
rounded to an integer multiple of the temporal sampling period. With 44.1
kHz sampling frequency this corresponds to an accuracy of 22.676
:math:`\mu`\ s, or equivalently an accuracy of 7.78 mm in terms of
loudspeaker placement. Additionally, you can specify a weight for each
loudspeaker in order to compensate for irregular setups. In the ASDF
format (refer to Section :ref:`ASDF <asdf>`), each loudspeaker has the optional
attribute ``weight`` which determines the linear (!) weight to be
applied to the respective loudspeaker. An example would be
::
<loudspeaker delay="0.005" weight="1.1">
<position x="1.0" y="-2.0"/>
<orientation azimuth="-30"/>
</loudspeaker>
Delay defaults to 0 if not specified, weight defaults to 1.
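The accuracy figures quoted above follow from the sampling period. A small Python sketch reproduces them; the speed of sound of 343 m/s and the helper name ``quantize_delay`` are assumptions, and the rounding mode used by the SSR may differ from Python's ``round``.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for this sketch


def quantize_delay(delay_s, sample_rate=44100):
    """Round a delay to whole samples; return (samples, effective delay in s)."""
    samples = round(delay_s * sample_rate)
    return samples, samples / sample_rate


period_us = 1e6 / 44100                  # one sampling period in microseconds
step_mm = 1e3 * SPEED_OF_SOUND / 44100   # distance sound travels in one period

print(round(period_us, 3))   # 22.676
print(round(step_mm, 2))     # 7.78
print(quantize_delay(0.001))
```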
Although it is suitable in principle, we do not recommend using our amplitude
panning algorithm for dedicated 5.1 (or comparable) mixdowns. Our VBAP
renderer only uses adjacent loudspeaker pairs for panning, which does not
exploit the full potential of such a loudspeaker setup. For the mentioned
formats, specialized panning processes have been developed that also employ
non-adjacent loudspeaker pairs if desired.
The VBAP renderer is rather meant to be used with non-standardized
setups.
.. [Pulkki1997] Ville Pulkki. Virtual sound source positioning using Vector
Base Amplitude Panning. In Journal of the Audio Engineering Society (JAES),
Vol.45(6), June 1997.
.. _wfs:
Wave Field Synthesis Renderer
-----------------------------
Executable: ``ssr-wfs``
The Wave Field Synthesis (WFS) renderer is the only renderer so far
that discriminates between virtual point sources and plane waves. It
implements the simple (far-field) driving function given in [Spors2008]_. Note
that we have only
implemented a temporary solution to reduce artifacts when virtual sound
sources are moved. This topic is subject to ongoing research. We will
work on that in the future. In the SSR configuration file
(Section :ref:`Configuration File <ssr_configuration_file>`) you can
specify an overall predelay (this is necessary to render focused
sources) and the overall length of the involved delay lines. Both values
are given in samples.
.. [Spors2008] Sascha Spors, Rudolf Rabenstein, and Jens Ahrens. The theory of
Wave Field Synthesis revisited. In 124th Convention of the AES, Amsterdam,
The Netherlands, May 17–20, 2008.
Prefiltering
~~~~~~~~~~~~
As you might know, WFS requires a spectral correction in addition to
the delay and weighting of the input signal. Since this spectral
correction is equal for all loudspeakers, it needs to be performed only
once on the input. We are working on an automatic generation of the
required filter. Until then, we load the impulse response of the desired
filter from a .wav-file which is specified via the ``--prefilter=FILE``
command line option (see Section :ref:`Running SSR <running_ssr>`) or in the
SSR configuration file
(Section :ref:`Configuration File <ssr_configuration_file>`). Make sure
that the specified audio file contains only one channel. Files with a
differing number of channels will not be loaded. Of course, the sampling
rate of the file also has to match that of the JACK server.
Note that the filter will be zero-padded to the next highest power of 2.
If the resulting filter is then shorter than the current JACK frame
size, each incoming audio frame will be divided into subframes for
prefiltering. That means, if you load a filter of 100 taps and JACK
frame size is 1024, the filter will be padded to 128 taps and
prefiltering will be done in 8 cycles. This is done in order to save
processing power since typical prefilters are much shorter than typical
JACK frame sizes. Zero-padding the prefilter to the JACK frame size
usually produces large overhead. If the prefilter is longer than the
JACK frame buffer size, the filter will be divided into partitions whose
length is equal to the JACK frame buffer size.
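The padding and partitioning rules can be checked with a short Python sketch. It is an illustration under assumptions: the function names and the returned dictionary are hypothetical, the JACK frame size is assumed to be a power of 2, and for long filters the partition count is derived from the padded length.

```python
def next_power_of_two(n):
    """Smallest power of 2 that is greater than or equal to n."""
    power = 1
    while power < n:
        power *= 2
    return power


def prefilter_layout(filter_taps, jack_frame_size):
    """Describe how a prefilter would be applied for a given JACK frame size."""
    padded = next_power_of_two(filter_taps)
    if padded <= jack_frame_size:
        # Short filter: each audio frame is split into subframes
        cycles = jack_frame_size // padded
        partitions = 1
    else:
        # Long filter: the filter is split into frame-sized partitions
        cycles = 1
        partitions = -(-padded // jack_frame_size)  # ceiling division
    return {"padded_taps": padded, "cycles_per_frame": cycles,
            "partitions": partitions}


print(prefilter_layout(100, 1024))
print(prefilter_layout(2048, 1024))
```

The first call reproduces the example from the text: 100 taps are padded to 128 taps and a frame of 1024 samples is prefiltered in 8 cycles.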
If you do not specify a filter, then no prefiltering is performed. This
results in a boost of bass frequencies in the reproduced sound field.
In order to assist you in the design of an appropriate prefilter, we
have included the MATLAB script
``data/matlab_scripts/make_wfs_prefilter.m`` which does the job. In the
very top of the file, you can specify the sampling frequency, the
desired length of the filter as well as the lower and upper frequency
limits of the spectral correction. The lower limit should be chosen such
that the subwoofer of your system receives a signal which is not
spectrally altered. This is due to the fact that only loudspeakers which
are part of an array of loudspeakers need to be corrected. The lower
limit is typically around 100 Hz. The upper limit is given by the
spatial aliasing frequency. The spatial aliasing is dependent on the
mutual distance of the loudspeakers, the distance of the considered
listening position to the loudspeakers, and the array geometry. See [Spors2006]_ for
detailed information on how to determine the spatial aliasing frequency
of a given loudspeaker setup. The spatial aliasing frequency is
typically between 1000 Hz and 2000 Hz. For a theoretical treatment of
WFS in general and also the prefiltering, see [Spors2008]_.
The script ``make_wfs_prefilter.m`` will save the impulse response of
the designed filter in a file like ``wfs_prefilter_120_1500_44100.wav``.
From the file name you can extract that the spectral correction starts
at 120 Hz and goes up to 1500 Hz at a sampling frequency of 44100 Hz.
Check the folder ``data/impulse_responses/wfs_prefilters`` for a small
selection of prefilters.
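If you manage several prefilters, the parameters can be read back from the file name. The following Python sketch assumes that all files follow the naming pattern of the example above; the function name is hypothetical.

```python
import re

# wfs_prefilter_<lower limit in Hz>_<upper limit in Hz>_<sampling rate in Hz>.wav
PATTERN = re.compile(r"wfs_prefilter_(\d+)_(\d+)_(\d+)\.wav$")


def parse_prefilter_name(filename):
    """Extract the frequency limits and the sampling rate from a file name."""
    match = PATTERN.search(filename)
    if match is None:
        raise ValueError(f"unexpected prefilter file name: {filename!r}")
    lower_hz, upper_hz, sample_rate = (int(v) for v in match.groups())
    return {"lower_hz": lower_hz, "upper_hz": upper_hz,
            "sample_rate": sample_rate}


info = parse_prefilter_name("wfs_prefilter_120_1500_44100.wav")
print(info)  # {'lower_hz': 120, 'upper_hz': 1500, 'sample_rate': 44100}
```

This makes it easy to verify that the sampling rate of a prefilter matches that of the JACK server before loading it.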
.. [Spors2006] Sascha Spors and Rudolf Rabenstein. Spatial aliasing artifacts
produced by linear and circular loudspeaker arrays used for Wave
Field Synthesis. In 120th Convention of the AES, Paris, France,
May 20–23, 2006.
Tapering
~~~~~~~~
When the listening area is not enclosed by the loudspeaker setup,
artifacts arise in the reproduced sound field due to the limited
aperture. This problem of spatial truncation can be reduced by so-called
tapering. Tapering is essentially an attenuation of the loudspeakers
towards the ends of the setup. As a consequence, the boundaries of the
aperture become smoother which reduces the artifacts. Of course, no
benefit comes without a cost. In this case the cost is amplitude errors
for which the human ear fortunately does not seem to be too sensitive.
In order to taper, you can assign the optional attribute ``weight`` to
each loudspeaker in ASDF format (refer to the section on ASDF). The
``weight`` determines the linear (!) weight to be applied to the
respective loudspeaker. It defaults to 1 if it is not specified.
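
Suitable weights can be computed offline and then entered as ``weight`` attributes. The sketch below is one possible choice and not a shape prescribed by the SSR: it assumes a linear array and fades the outermost loudspeakers on both sides with a raised-cosine ramp. The function name ``tapering_weights`` and the parameter ``taper_fraction`` are hypothetical.

```python
import math


def tapering_weights(num_speakers: int, taper_fraction: float = 0.2) -> list[float]:
    """Linear weights: 1 in the middle, raised-cosine fade towards both ends.

    taper_fraction is the (assumed) share of loudspeakers faded on each side.
    """
    if num_speakers < 1:
        raise ValueError("need at least one loudspeaker")
    if not 0.0 <= taper_fraction <= 0.5:
        raise ValueError("taper_fraction must lie between 0 and 0.5")
    n_taper = int(round(taper_fraction * num_speakers))
    weights = [1.0] * num_speakers
    for i in range(n_taper):
        # (i + 1) / (n_taper + 1) keeps the outermost weight above zero.
        ramp = 0.5 * (1.0 - math.cos(math.pi * (i + 1) / (n_taper + 1)))
        weights[i] = ramp
        weights[num_speakers - 1 - i] = ramp
    return weights


# Hypothetical array of 10 loudspeakers, two of them faded on each side.
for index, weight in enumerate(tapering_weights(10, 0.2)):
    print(index, round(weight, 3))
```

The values are linear gains, matching the way the ``weight`` attribute is interpreted.
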
.. _aap:
Ambisonics Amplitude Panning Renderer
-------------------------------------
Executable: ``ssr-aap``
The Ambisonics Amplitude Panning (AAP) renderer does very simple
Ambisonics rendering. It does amplitude panning by simultaneously using
all loudspeakers that are not subwoofers to reproduce a virtual source
(contrary to the VBAP renderer which uses only two loudspeakers at a
time). Note that the loudspeakers should ideally be arranged on a circle
and the reference should be the center of the circle. The renderer
checks for that and applies delays and amplitude corrections to all
loudspeakers that are closer to the reference than the farthest. This
also includes subwoofers. If you do not want close loudspeakers to be
delayed, then simply specify their location in the same direction as
their actual position but at a larger distance from the reference. Then
the graphical illustration will not be perfectly aligned with the real
setup, but the audio processing will take place as intended. Note that
the AAP renderer ignores delays assigned to an individual loudspeaker in
ASDF. On the other hand, it does consider weights assigned to the
loudspeakers. This allows you to compensate for irregular loudspeaker
placement.
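
The idea behind the distance correction can be illustrated with a small sketch. It is not the SSR implementation: the formulas are assumptions, namely a delay equal to the difference in propagation time to the farthest loudspeaker and a gain following a 1/r law. The function name and the speed of sound are likewise chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def distance_compensation(positions, reference=(0.0, 0.0)):
    """Return one (delay_in_seconds, linear_gain) pair per loudspeaker.

    The farthest loudspeaker gets delay 0 and gain 1. Closer loudspeakers
    are delayed and attenuated (assumed 1/r amplitude decay).
    """
    distances = [math.dist(position, reference) for position in positions]
    farthest = max(distances)
    if farthest == 0.0:
        raise ValueError("all loudspeakers coincide with the reference")
    result = []
    for distance in distances:
        delay = (farthest - distance) / SPEED_OF_SOUND
        gain = distance / farthest
        result.append((delay, gain))
    return result


# Hypothetical setup: one loudspeaker at 2 m, one at 1 m from the reference.
for delay, gain in distance_compensation([(2.0, 0.0), (0.0, 1.0)]):
    print(f"delay {delay * 1000:.2f} ms, gain {gain:.2f}")
```
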
Note finally that AAP does not allow the distance of a virtual
sound source to be encoded since it is a simple panning renderer. All sources will
appear at the distance of the loudspeakers.
If you do not explicitly specify an Ambisonics order, then the maximum
order which makes sense on the given loudspeaker setup will be used. The
automatically chosen order is :math:`(L-1)/2` for an odd number
:math:`L` of loudspeakers, and is chosen accordingly for even numbers.
You can manually set the order via a command line option
(Section :ref:`Running SSR <running_ssr>`) or the SSR configuration file
(Section :ref:`Configuration File <ssr_configuration_file>`). We therefore
do not explicitly discriminate between "higher order" and "lower order"
Ambisonics since this is not a fundamental property. And where does
"lower order" end and "higher order" start anyway?
Note that the graphical user interface will not indicate the activity of
the loudspeakers since theoretically all loudspeakers contribute to the
sound field of a virtual source at any time.
Conventional driving function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By default we use the standard Ambisonics panning function presented,
for example, in [Neukom2007]_. It reads
.. math::

    d(\alpha_0) = \frac{\sin\left ( \frac{2M+1}{2} \ (\alpha_0 -
    \alpha_\textrm{s})\right )} {(2M+1) \ \sin \left ( \frac{\alpha_0 -
    \alpha_\textrm{s}}{2} \right ) },

where :math:`\alpha_0` is the azimuth angle of the position of the
considered secondary source, :math:`\alpha_\textrm{s}` is the azimuth
angle of the position of the virtual source, both in radians, and :math:`M` is
the Ambisonics order.
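
The panning function is easy to evaluate numerically. The following sketch is not the SSR code; it assumes a hypothetical circular setup of :math:`L = 2M+1` equiangular loudspeakers and handles the removable singularity at :math:`\alpha_0 = \alpha_\textrm{s}`, where the function tends to 1.

```python
import math


def conventional_weight(alpha_0: float, alpha_s: float, order: int) -> float:
    """Conventional Ambisonics panning weight d(alpha_0), angles in radians."""
    x = alpha_0 - alpha_s
    denominator = (2 * order + 1) * math.sin(x / 2.0)
    if abs(denominator) < 1e-12:
        # Limit of the expression where loudspeaker and source directions coincide.
        return 1.0
    return math.sin((2 * order + 1) * x / 2.0) / denominator


# Hypothetical setup: order M = 2, hence L = 5 equiangular loudspeakers.
order = 2
num_speakers = 2 * order + 1
source_azimuth = 0.3  # radians
weights = [
    conventional_weight(2.0 * math.pi * l / num_speakers, source_azimuth, order)
    for l in range(num_speakers)
]
for l, weight in enumerate(weights):
    print(l, round(weight, 3))
```

For this setup the weights add up to 1 and some of them are negative. The negative weights are the property that motivates in-phase rendering.
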
In-phase driving function
~~~~~~~~~~~~~~~~~~~~~~~~~
The conventional driving function leads to both positive and negative
weights for individual loudspeakers. An object (e.g. a listener)
introduced into the listening area can lead to an imperfect interference
of the wave fields of the individual loudspeakers and therefore to an
inconsistent perception. Furthermore, conventional Ambisonics panning
can lead to audible artifacts for fast source motions since it can
happen that the weights of two adjacent audio frames have a different
algebraic sign.
These problems can be worked around when only positive weights are
applied on the input signal (*in-phase* rendering). This can be
accomplished via the in-phase driving function given e.g. in [Neukom2007]_
reading
.. math::

    d(\alpha_0) = \cos^{2M} \left (\frac{\alpha_0 - \alpha_\textrm{s}}{2}
    \right ) \ . \nonumber

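
A short numerical check makes the sign property visible. The sketch below is illustrative only; it assumes a hypothetical circular setup of five equiangular loudspeakers and order :math:`M = 2`.

```python
import math


def in_phase_weight(alpha_0: float, alpha_s: float, order: int) -> float:
    """In-phase panning weight cos^(2M)((alpha_0 - alpha_s) / 2), angles in radians."""
    return math.cos((alpha_0 - alpha_s) / 2.0) ** (2 * order)


# Hypothetical setup: 5 equiangular loudspeakers, order M = 2.
order = 2
num_speakers = 5
source_azimuth = 0.3  # radians
in_phase_weights = [
    in_phase_weight(2.0 * math.pi * l / num_speakers, source_azimuth, order)
    for l in range(num_speakers)
]
for l, weight in enumerate(in_phase_weights):
    print(l, round(weight, 3))
```

Because the exponent :math:`2M` is even, no weight can become negative. The weight is 1 for a loudspeaker in the source direction and falls to 0 for a loudspeaker directly opposite.
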
Note that in-phase rendering leads to a less precise localization of the
virtual source and other unwanted perceptions. You can enable in-phase
rendering via the corresponding command-line option or you can set the
``IN_PHASE_RENDERING`` property in the SSR configuration file (see
section :ref:`Configuration File <ssr_configuration_file>`) to be
``TRUE`` or ``true``.
.. [Neukom2007] Martin Neukom. Ambisonic panning. In 123rd Convention of the
AES, New York, NY, USA, Oct. 5–8, 2007.
.. _dca:
Distance-coded Ambisonics Renderer
----------------------------------
Executable: ``ssr-dca``
Distance-coded Ambisonics (DCA) is sometimes also termed "Nearfield Compensated Higher-Order Ambisonics". This renderer implements the driving functions from [Spors2011]_. The difference from the AAP renderer is a long story, which we will elaborate on at a later point.
Note that the DCA renderer is experimental at this stage. It currently supports orders of up to 28. There are some complications regarding how the user specifies the locations of the loudspeakers and how the renderer handles them. The rendered scene might appear mirrored or rotated. If you are experiencing this, you might want to play around with the assignment of the outputs and the loudspeakers to fix it temporarily. Or contact us.
Please bear with us. We are going to take care of this soon.
.. [Spors2011] Sascha Spors, Vincent Kuscher, and Jens Ahrens. Efficient Realization of Model-Based Rendering for 2.5-dimensional Near-Field Compensated Higher Order Ambisonics. In IEEE WASPAA, New Paltz, NY, USA, 2011.
.. _genren:
Generic Renderer
----------------
Executable: ``ssr-generic``
The generic renderer turns the SSR into a multiple-input-multiple-output
convolution engine. You have to use an ASDF file in which the attribute
``properties_file`` of the individual sound source has to be set
properly. That means that the indicated file has to be a multichannel
file with as many channels as there are loudspeakers in the setup.
The impulse response in the file at channel 1 represents the driving
function for loudspeaker 1 and so on.
Be sure that you load a reproduction setup with the corresponding number
of loudspeakers.
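
Conceptually, each loudspeaker signal is the sum over all sources of the source signal convolved with the matching channel of that source's impulse response file. The offline sketch below illustrates this with NumPy. It is not how the SSR processes audio in real time, and the function name ``render_generic`` and the array layout (taps by loudspeakers) are assumptions.

```python
import numpy as np


def render_generic(source_signals, impulse_responses):
    """Offline multiple-input-multiple-output convolution.

    source_signals:    list of 1-D arrays, one per virtual source
    impulse_responses: list of 2-D arrays shaped (taps, num_speakers),
                       one per virtual source; column k drives loudspeaker k + 1
    Returns an array shaped (samples, num_speakers).
    """
    num_speakers = impulse_responses[0].shape[1]
    length = max(
        len(signal) + ir.shape[0] - 1
        for signal, ir in zip(source_signals, impulse_responses)
    )
    output = np.zeros((length, num_speakers))
    for signal, ir in zip(source_signals, impulse_responses):
        if ir.shape[1] != num_speakers:
            raise ValueError("channel count must equal the number of loudspeakers")
        for channel in range(num_speakers):
            driven = np.convolve(signal, ir[:, channel])
            output[: len(driven), channel] += driven
    return output


# Hypothetical example: one source, two loudspeakers, two-tap impulse responses.
signal = np.array([1.0, 2.0, 3.0])
ir = np.array([[1.0, 0.0],
               [0.0, 0.5]])
print(render_generic([signal], [ir]))
```
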
It is obviously not possible to move virtual sound sources since the
loaded impulse responses are static. We use this renderer in order to
test advanced methods before implementing them in real-time or to
compare two different rendering methods by having one sound source in
one method and another sound source in the other method.
Download the ASDF examples from https://github.com/SoundScapeRenderer/example-scenes and check out the file ``generic_renderer_example.asd`` which comes with all required data.
See also :ref:`here <mimo>` for more general signal processing examples using SSR.
.. _loudspeaker_properties:

================== ================ ======
..                 individual delay weight
------------------ ---------------- ------
binaural renderer  *-*              *-*
BRS renderer       *-*              *-*
VBAP renderer      *+*              *+*
WFS renderer       *-*              *+*
AAP renderer       autom.           *+*
generic renderer   *-*              *-*
================== ================ ======

Table 1: Loudspeaker properties considered by the different renderers.

.. _source_props:

================= ====== ===== ======== ================ ==================== =================
..                gain   mute  position orientation [2]_ distance attenuation model
----------------- ------ ----- -------- ---------------- -------------------- -----------------
binaural renderer *+*    *+*   *+*      *+*              *+*                  only w.r.t. ampl.
BRS renderer      *+*    *+*   *-*      *-*              *-*                  *-*
VBAP renderer     *+*    *+*   *+*      *+*              *+*                  only w.r.t. ampl.
WFS renderer      *+*    *+*   *+*      *+*              *+*                  *+*
AAP renderer      *+*    *+*   *+*      *-*              *+*                  only w.r.t. ampl.
generic renderer  *+*    *+*   *-*      *-*              *-*                  *-*
================= ====== ===== ======== ================ ==================== =================

Table 2: Virtual source's properties considered by the different renderers.

Summary
-------
Tables :ref:`1 <loudspeaker_properties>` and :ref:`2 <source_props>` summarize
the functionality of the
SSR renderers.
.. [2]
So far, only planar sources have a defined orientation. By default, their
orientation is always pointing from their nominal position to the reference
point no matter where you move them. Any other information or updates on the
orientation are ignored. You can change this behavior by using either the
command line option ``--no-auto-rotation``, using the ``AUTO_ROTATION``
configuration parameter, or hitting ``r`` in the GUI.
|