VLevel 0.3 technical.txt
This is an outline of how VLevel works, which will hopefully be useful
to anyone who helps develop it. (And me in a month.)
Code Layout
The core of VLevel is the VolumeLeveler class. It contains the
look-ahead buffer. It works on blocks of double-precision numbers,
called values (value_t). A sample is made up of one value per channel
(channels values in all). Many of VolumeLeveler's functions take
lengths in samples instead of values, so be careful.
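To make the samples-versus-values distinction concrete, here is a
minimal sketch of the length conversions. Only value_t and channels
come from the text above; the helper names are hypothetical and are
not part of VolumeLeveler.

```cpp
#include <cassert>
#include <cstddef>

// value_t is named in the text above; double matches "double-precision".
typedef double value_t;

// Hypothetical helper: how many values a length given in samples covers.
std::size_t values_for_samples(std::size_t samples, std::size_t channels) {
    assert(channels > 0);  // a sample needs at least one channel
    return samples * channels;
}

// Hypothetical helper: how many whole samples a run of values holds.
std::size_t samples_for_values(std::size_t values, std::size_t channels) {
    assert(channels > 0);
    assert(values % channels == 0);  // a partial sample suggests a length mix-up
    return values / channels;
}

// Hypothetical helper: bytes needed to store a length given in samples.
std::size_t bytes_for_samples(std::size_t samples, std::size_t channels) {
    return values_for_samples(samples, channels) * sizeof(value_t);
}
```

With two channels, a length of 1024 samples covers 2048 values, so
passing a value count where a sample count is expected would double
the length.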
vlevel-bin drives the VolumeLeveler class, and also uses CommandLine
to parse its options.
Soon, there will be a Ruby script using SoX that makes Ogg, FLAC, and WAV
files work with a nice drag-n-drop GUI.
General Idea
The intent is to make the quiet parts louder. This is done by
computing the volume at each point, then scaling it as described in
the Math section. The complex part is finding out what the volume
is at each point (VolumeLeveler::avg_amp). It must vary smoothly so
the volume changes aren't obvious, but it must always be above the
peaks of the waveform, or clipping could occur.
To do this, there is a fifo buffer. Imagine a graph of position
vs. amplitude showing the contents of the buffer. A horizontal line
is drawn across it, representing VolumeLeveler::avg_amp. From where
avg_amp crosses the y-axis, a line is drawn with the maximum
possible slope to intercept one of the amplitude points. This line,
which is as steep as possible, is the smooth line that avg_amp will
follow.
When the value at the head of the buffer is removed, it is scaled
based on avg_amp. avg_amp is then incremented by the slope of the
line. If we reach the point the line was drawn to (max_slope_pos),
we search the fifo for the next point of highest slope. Otherwise,
we only need to check the incoming sample to see if a line drawn to
it has the highest slope.
y                             y  (a few samples later)
^                             ^
|      / max_slope            |
|     /                       |
|    /s                       |s\---------- avg_amp
|   / s                       |s \
|  /  s                       |s  \ max_slope
| / s s                       |s s \
|--s-ss-s----avg_amp          |s s  \
| ss ss s s                   |s s   ss
|ssssssssss                   |ssssss
+------------> x              +---------> x
Sorry for the ASCII art. The result is that the average amplitude
(avg_amp) varies slowly and always stays above the amplitude of each
sample. When the samples are removed, they are scaled as described in
the next section.
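The search for the steepest line can be sketched as follows. This is
a simplified illustration, not the actual VolumeLeveler code: it
assumes a plain vector of absolute amplitudes with the head of the
buffer at index 0, and the names SlopeResult, find_max_slope and
stays_above are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of the search: the steepest slope and the point its line is drawn to.
struct SlopeResult {
    double slope;     // change in avg_amp per sample
    std::size_t pos;  // index of the point the line reaches (max_slope_pos)
};

// amps[i] is the absolute amplitude i positions behind the head of the
// buffer. amps[0] is the head itself, which is scaled with the current
// avg_amp, so the search starts at i = 1.
SlopeResult find_max_slope(const std::vector<double>& amps, double avg_amp) {
    assert(amps.size() >= 2);  // need at least one point past the head
    SlopeResult best = { amps[1] - avg_amp, 1 };
    for (std::size_t i = 2; i < amps.size(); ++i) {
        double slope = (amps[i] - avg_amp) / static_cast<double>(i);
        if (slope >= best.slope) {  // on a tie, keep the later point
            best.slope = slope;
            best.pos = i;
        }
    }
    return best;
}

// Checks that following the line keeps avg_amp at or above every sample
// past the head (with a small tolerance for rounding).
bool stays_above(const std::vector<double>& amps, double avg_amp,
                 const SlopeResult& r) {
    for (std::size_t i = 1; i < amps.size(); ++i) {
        if (avg_amp + r.slope * static_cast<double>(i) < amps[i] - 1e-12) {
            return false;
        }
    }
    return true;
}
```

Because the chosen slope is the largest of all candidates, the line
from the current avg_amp passes on or above every other point in the
buffer, which is what keeps avg_amp above the peaks.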
Math
Once we have avg_amp, each sample is scaled when it is output
according to this:
output_sample = sample * avg_amp ^ (-strength)
This is derived as follows:
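First, we convert the amplitude of avg_amp to decibels (1 = 0dB).
(Amplitudes conventionally use a factor of 20 rather than 10, but the
constant cancels when we convert back, so the result is the same
either way.)
avg_amp_db = 10 * log10(avg_amp)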
When avg_amp is below 1, avg_amp_db is less than zero. We want to
scale it to be closer to
zero, in such a way that if strength is 1, it will become zero, and
if strength is 0 it will remain unchanged.
ideal_amp_db = avg_amp_db * (1 - strength)
ideal_amp_db = 10 * log10(avg_amp) * (1 - strength)
Now we convert back from decibels to an amplitude:
ideal_amp = 10 ^ (ideal_amp_db / 10)
ideal_amp = 10 ^ (log10(avg_amp) * (1 - strength))
ideal_amp = (10 ^ log10(avg_amp)) ^ (1 - strength)
ideal_amp = avg_amp ^ (1 - strength)
Now we find out what we should multiply the samples by to change
their peak amplitude, avg_amp, to their ideal peak amplitude,
ideal_amp:
multiplier = ideal_amp / avg_amp
multiplier = avg_amp ^ (1 - strength) / avg_amp
multiplier = avg_amp ^ (-strength)
And finally, we multiply the sample by the multiplier:
output_sample = sample * multiplier
output_sample = sample * avg_amp ^ (-strength)
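As a quick check on the derivation, here is a small sketch that
computes the multiplier both ways: directly from the closed form, and
the long way through decibels. The function names are made up for
the example and are not taken from the VLevel source.

```cpp
#include <cassert>
#include <cmath>

// Closed form from the derivation: multiplier = avg_amp ^ (-strength).
double multiplier(double avg_amp, double strength) {
    assert(avg_amp > 0.0);  // the power and the log need a positive amplitude
    return std::pow(avg_amp, -strength);
}

// The same value computed step by step through decibels.
double multiplier_via_db(double avg_amp, double strength) {
    assert(avg_amp > 0.0);
    double avg_amp_db = 10.0 * std::log10(avg_amp);
    double ideal_amp_db = avg_amp_db * (1.0 - strength);
    double ideal_amp = std::pow(10.0, ideal_amp_db / 10.0);
    return ideal_amp / avg_amp;
}
```

With avg_amp = 0.25 and strength = 0.5, both give a multiplier of 2
(up to rounding), so a passage peaking at 0.25 is raised to 0.5.
With strength = 0 the multiplier is 1, and with strength = 1 it is
1 / avg_amp.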
Undoing the effect
If the original value of strength wasn't too close to 1 (at exactly 1
the formula below divides by zero), you can undo the leveling by
giving the undo option. It works by changing strength as shown below.
When we first leveled, we scaled the amplitudes like so:
ideal_amp_db = avg_amp_db * (1 - strength)
To get that back, we solve for avg_amp_db:
avg_amp_db = ideal_amp_db / (1 - strength)
In this pass, however, the original avg_amp_db becomes ideal_amp_db,
and the original (1 - strength) becomes 1 / (1 - strength). Now
we skip ahead a bit:
multiplier = avg_amp ^ (1 - strength) / avg_amp
Substituting as explained above and continuing:
multiplier = avg_amp ^ (1 / (1 - strength)) / avg_amp
multiplier = avg_amp ^ ((1 / (1 - strength)) - 1)
multiplier = avg_amp ^ (strength / (1 - strength))
But how do we get VLevel to do this? Well, we can give it any
strength, and it does this:
multiplier = avg_amp ^ (-strength)
And we want it to do this:
multiplier = avg_amp ^ (undo_strength / (1 - undo_strength))
So...
-strength = undo_strength / (1 - undo_strength)
strength = undo_strength / (undo_strength - 1)
By choosing strength as above before starting VLevel, we can then
undo the first VLevel, with no change to the main algorithm.
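The strength conversion can be checked with a small sketch. It makes
a simplifying assumption: the signal is a steady tone, so avg_amp is
just the amplitude of the signal in each pass. The function names
are made up for the example.

```cpp
#include <cassert>
#include <cmath>

// Strength to give VLevel so that it undoes a pass made with undo_strength.
double strength_for_undo(double undo_strength) {
    assert(undo_strength != 1.0);  // the formula divides by zero at 1
    return undo_strength / (undo_strength - 1.0);
}

// Multiplier used by the main algorithm, unchanged.
double multiplier(double avg_amp, double strength) {
    assert(avg_amp > 0.0);
    return std::pow(avg_amp, -strength);
}

// Levels a steady amplitude, then undoes it. Ideally returns the input.
double level_then_undo(double amp, double undo_strength) {
    double leveled = amp * multiplier(amp, undo_strength);
    double s = strength_for_undo(undo_strength);
    return leveled * multiplier(leveled, s);
}
```

For example, an original strength of 0.5 gives an undo strength of
-1. An amplitude of 0.25 is leveled to 0.5, and the second pass
multiplies it by 0.5 ^ 1 = 0.5, bringing it back to 0.25.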
To be totally precise, we'd also have to make a min_multiplier with
a value of 1 / orig_max_multiplier, but that would be slow, and does
anybody care if we drop the static anyway?
It's not perfect, probably because avg_amp moves linearly, not
logarithmically, so there are some rounding errors. Someday I might
try changing that, but it's a big change.