Psychoacoustic Models in TwoLAME
================================


Introduction
------------

In MPEG audio encoding, a psychoacoustic model (PAM) is used to determine which 
parts of the waveform being encoded are sonically important.  The PAM 
looks for loud sounds which may mask soft sounds, noise which may affect the level 
of sounds nearby, sounds which are too soft for us to hear and should be ignored, 
and so on.  The information from the PAM is used to determine which parts of the 
spectrum should get more bits and thus be encoded at greater quality, and which 
parts are inaudible or unimportant and should thus get fewer bits.

In MPEG Audio Layer II encoding, 1152 sound samples are read in - this constitutes 
a 'frame'. For each frame the PAM outputs just *32* values, one per subband: 
the Signal to Masking Ratio (SMR) in that subband. This is important!
There are only 32 values to determine how to allocate bits for 1152 samples - this 
is a pretty coarse technique.
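
To make the coarseness concrete: 1152 samples spread over 32 subbands is 36 
samples per subband, and every one of those 36 samples is steered by the same 
single SMR value. The sketch below (Python for brevity, not TwoLAME's C code) 
shows one simple way 32 SMR values could drive bit allocation: keep handing 
one more bit per sample to the subband whose mask-to-noise ratio (SNR minus 
SMR) is currently worst. The names `allocate_bits`, `DB_PER_BIT` and 
`MAX_BITS` are made up for illustration, the 6.02 dB-per-bit figure is only a 
rule of thumb, and a real Layer II encoder also has to deal with allocation 
tables, scalefactor bits and header overhead, all of which are ignored here.

```python
SUBBANDS = 32
SAMPLES_PER_FRAME = 1152
SAMPLES_PER_SUBBAND = SAMPLES_PER_FRAME // SUBBANDS  # 36 samples share one SMR

DB_PER_BIT = 6.02  # assumption: rough SNR gained per extra quantiser bit
MAX_BITS = 16      # assumption: illustrative cap on bits per subband sample


def allocate_bits(smr, bit_budget):
    """Greedy sketch: give one more bit per sample to the subband whose
    mask-to-noise ratio (estimated SNR minus SMR) is currently the lowest."""
    if len(smr) != SUBBANDS:
        raise ValueError("expected one SMR value per subband")
    bits = [0] * SUBBANDS
    remaining = bit_budget
    # One extra bit for a subband costs one bit for each of its 36 samples.
    while remaining >= SAMPLES_PER_SUBBAND:
        candidates = [sb for sb in range(SUBBANDS) if bits[sb] < MAX_BITS]
        if not candidates:
            break
        worst = min(candidates, key=lambda sb: bits[sb] * DB_PER_BIT - smr[sb])
        bits[worst] += 1
        remaining -= SAMPLES_PER_SUBBAND
    return bits
```

With an SMR of 20 dB in subband 0 and 0 dB everywhere else, a budget of 
3 x 36 bits all goes to subband 0, because its noise is still above the mask 
after each step.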

The different PAMs listed below use different techniques to decide on these 32 
values. Some models are better than others - meaning that the 32 values they choose 
do a better job of putting the bits where they should go.  Even with a really 
bad PAM (e.g. Model -1) you can still get satisfactory results a lot of the time.
All of these models have strengths and weaknesses.  The model you end up using 
will be the one that sounds best to your ears on your audio.

Psychoacoustic Model -1
-----------------------

This PAM doesn't actually look at the samples being encoded to decide upon the 
output values.  There is simply a set of 32 default values which are used, 
regardless of input.
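
As a rough sketch of how little this model does (Python for brevity, not 
TwoLAME's C code): the function below takes a frame of samples and never looks 
at it. The table `DEFAULT_SMR` is a hypothetical placeholder that simply 
falls off with subband number; it is *not* the table of defaults that 
TwoLAME ships, and the name `psycho_minus1` is made up too.

```python
# Hypothetical placeholder table, NOT the defaults shipped with TwoLAME:
# starts at 30 dB in subband 0, drops 3 dB per subband, floors at -60 dB.
DEFAULT_SMR = [max(-60.0, 30.0 - 3.0 * sb) for sb in range(32)]


def psycho_minus1(samples):
    """Return 32 SMR values without inspecting `samples` at all."""
    return list(DEFAULT_SMR)  # copy, so callers cannot modify the table
```

Silence and full-scale noise get exactly the same 32 values, which is the 
whole point of this model.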

*Pros*: Faaaast. Low complexity. Surprisingly good.
"Surprising" in that the other PAMs go to the effort of calculating FFTs
and subbands and masking, and this one does absolutely *nothing*. 
Zip. Nada. Diddly Squat. This model might be the best example of why 
it is hard to make a good model - if having no computations sounds OK, 
how do you improve on it?

*Cons*: Absolutely no attempt to consider any of the masking effects that 
would help the audio sound better. 


Psychoacoustic Model 0
----------------------

This PAM looks at the sizes of the 'scalefactors' for the audio and combines 
it with the Absolute Threshold of Hearing (ATH) to make the 32 SMR values.
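
One way to picture the idea (a sketch in Python, not TwoLAME's actual 
formula): turn each subband's scalefactor into a level in dB and subtract the 
ATH for that subband. Everything specific here is an assumption made for 
illustration: scalefactors are taken as linear peak magnitudes in (0, 1], 
full scale is mapped to 96 dB, the ATH is sampled at the centre of each 
subband, and the ATH curve is the widely used Terhardt approximation. The 
names `ath_db` and `smr_from_scalefactors` are hypothetical.

```python
import math

SUBBANDS = 32


def ath_db(freq_hz):
    """Approximate absolute threshold of hearing in dB SPL
    (Terhardt's approximation; frequency is clamped to at least 20 Hz)."""
    f = max(freq_hz, 20.0) / 1000.0  # formula works in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def smr_from_scalefactors(scalefactors, sample_rate=44100.0, full_scale_db=96.0):
    """Sketch: per-subband level in dB (from the scalefactor) minus the ATH."""
    band_width = sample_rate / 2.0 / SUBBANDS  # each subband covers this many Hz
    smr = []
    for sb, sf in enumerate(scalefactors):
        # Assumption: sf is a linear magnitude, 1.0 meaning full scale.
        level_db = full_scale_db + 20.0 * math.log10(max(sf, 1e-10))
        centre_hz = (sb + 0.5) * band_width
        smr.append(level_db - ath_db(centre_hz))
    return smr
```

Halving a scalefactor lowers that subband's SMR by about 6 dB, and for equal 
scalefactors the highest subbands come out with the lowest SMR because the 
ATH climbs steeply at high frequencies.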

*Pros*: Faaast. Low complexity.

*Cons*: This model has absolutely no mathematical basis and does not use 
any perceptual model of hearing.  It simply juggles some of the numbers of 
the input sound to determine the values. Feel free to hack the daylights out 
of this PAM - add multipliers, constants, log-tables *anything*. Tweak it until 
you begin to like the sound.


Psychoacoustic Model 1 and 2
----------------------------

These PAMs are from the ISO standard. Being the standard doesn't mean that 
they are any good. Look at LAME, which basically threw out the MP3 standard 
psycho models and made its own (GPSYCHO).

*Pros*: A reference for future PAMs.

*Cons*: Terrible ISO code, buggy tables, poor documentation.


Psychoacoustic Model 3
----------------------

A re-implementation of psychoacoustic model 1.  ISO 11172 was used as the guide 
for re-writing this PAM from the ground up.

*Pros*: No more obscure tables of values from the ISO code. Hopefully a good 
base to work from for tweaking PAMs.

*Cons*: At the moment, doesn't really sound any better than PAM1.


Psychoacoustic Model 4
----------------------

A cleaned-up version of PAM2.

*Pros*: Faster than PAM2. No more obscure tables of values from the ISO 
standard. Hopefully a good base to work from for improving the PAMs.

*Cons*: Still has the same "warbling"/"Davros" problems as PAM2.



Future psychoacoustic models
----------------------------

There's a heap that could be done. Unfortunately, I've got a set of tin 
ears, crappy speakers and a noisy computer room.  If you've got the 
capability to do proper PAM testing then please feel free to do so. 
Otherwise, I'll just keep plodding along with new ideas as they 
arise, such as:

- Temporal masking (there's no pre-echo or anything in TwoLAME)
- Left/right masking
- A PAM that's fully tuneable from the command line?
- Graphical output of SMR values etc. Would allow better debugging of PAMs
- Re-sampling routines
- Low/high pass filtering