.. py:module:: torchaudio.transforms

torchaudio.transforms
=====================

.. currentmodule:: torchaudio.transforms

The ``torchaudio.transforms`` module contains common audio processing and feature extraction transforms. The following diagram shows the relationship between some of the available transforms.


.. image:: https://download.pytorch.org/torchaudio/tutorial-assets/torchaudio_feature_extractions.png

Transforms are implemented using :class:`torch.nn.Module`. Common ways to build a processing pipeline are to define a custom Module class or to chain Modules together using :class:`torch.nn.Sequential`, then move the pipeline to a target device and data type.

.. code::

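   import torch
   from torchaudio.transforms import (
       FrequencyMasking,
       MelScale,
       Resample,
       Spectrogram,
       TimeMasking,
       TimeStretch,
   )

   # Define custom feature extraction pipeline.
   #
   # 1. Resample audio
   # 2. Convert to complex spectrogram
   # 3. Apply time stretch, then convert to power spectrogram
   # 4. Apply masking augmentations
   # 5. Convert to mel-scale
   #
   class MyPipeline(torch.nn.Module):
       def __init__(
           self,
           input_freq=16000,
           resample_freq=8000,
           n_fft=1024,
           n_mel=256,
           stretch_factor=0.8,
       ):
           super().__init__()
           self.resample = Resample(orig_freq=input_freq, new_freq=resample_freq)

           # TimeStretch (phase vocoder) expects a complex spectrogram,
           # so keep the complex values here (power=None).
           self.spec = Spectrogram(n_fft=n_fft, power=None)

           # The stretch rate is passed as ``fixed_rate``; ``n_freq`` must
           # match the number of frequency bins of the spectrogram.
           self.stretch = TimeStretch(n_freq=n_fft // 2 + 1, fixed_rate=stretch_factor)

           self.spec_aug = torch.nn.Sequential(
               FrequencyMasking(freq_mask_param=80),
               TimeMasking(time_mask_param=80),
           )

           self.mel_scale = MelScale(
               n_mels=n_mel, sample_rate=resample_freq, n_stft=n_fft // 2 + 1)

       def forward(self, waveform: torch.Tensor) -> torch.Tensor:
           # Resample the input
           resampled = self.resample(waveform)

           # Convert to complex spectrogram
           spec = self.spec(resampled)

           # Stretch in time, then convert to power spectrogram
           spec = self.stretch(spec).abs().pow(2)

           # Apply SpecAugment masking
           spec = self.spec_aug(spec)

           # Convert to mel-scale
           mel = self.mel_scale(spec)

           return mel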


.. code::

   # Instantiate a pipeline
   pipeline = MyPipeline()

   # Move the pipeline to CUDA
   pipeline.to(device=torch.device("cuda"), dtype=torch.float32)

   # Perform the transform; waveform must be on the same device as the pipeline
   features = pipeline(waveform)
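
Chaining transforms with :class:`torch.nn.Sequential` can be sketched as follows. This is a minimal illustration rather than a recommended configuration: the parameter values and the random input ``waveform`` are placeholders chosen for demonstration.

.. code::

   import torch
   from torchaudio.transforms import AmplitudeToDB, MelSpectrogram, Resample

   # Chain transforms in order; parameter values are illustrative only.
   pipeline = torch.nn.Sequential(
       Resample(orig_freq=16000, new_freq=8000),
       MelSpectrogram(sample_rate=8000, n_fft=400, n_mels=40),
       AmplitudeToDB(stype="power"),
   )

   # Placeholder input: one second of random mono audio at 16 kHz.
   waveform = torch.randn(1, 16000)

   # Output has shape (channel, n_mels, time).
   features = pipeline(waveform)

A custom Module class is the better fit when a step needs extra arguments or intermediate processing between transforms, which ``Sequential`` cannot express.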

Please check out the tutorials that cover in-depth usage of transforms.

.. minigallery:: torchaudio.transforms

Utility
-------

.. autosummary::
    :toctree: generated
    :nosignatures:

    AmplitudeToDB
    MuLawEncoding
    MuLawDecoding
    Resample
    Fade
    Vol
    Loudness
    AddNoise
    Convolve
    FFTConvolve
    Speed
    SpeedPerturbation
    Deemphasis
    Preemphasis

Feature Extractions
-------------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    Spectrogram
    InverseSpectrogram
    MelScale
    InverseMelScale
    MelSpectrogram
    GriffinLim
    MFCC
    LFCC
    ComputeDeltas
    PitchShift
    SlidingWindowCmn
    SpectralCentroid
    Vad

Augmentations
-------------

The following transforms implement popular augmentation techniques known as *SpecAugment* :cite:`specaugment`.

.. autosummary::
    :toctree: generated
    :nosignatures:

    FrequencyMasking
    TimeMasking
    TimeStretch
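
As a minimal sketch of how the masking transforms might be applied, the example below masks a spectrogram along the frequency and time axes. The input ``spec`` is a random placeholder and the mask parameters are arbitrary; ``TimeStretch`` is left out because it operates on complex spectrograms.

.. code::

   import torch
   from torchaudio.transforms import FrequencyMasking, TimeMasking

   # Placeholder power spectrogram with shape (channel, freq, time).
   spec = torch.rand(1, 201, 100)

   # Each transform masks one randomly placed band (filled with zeros by
   # default); the parameter bounds the maximum width of the band.
   masking = torch.nn.Sequential(
       FrequencyMasking(freq_mask_param=30),
       TimeMasking(time_mask_param=20),
   )

   # The output has the same shape as the input.
   augmented = masking(spec)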

Loss
----

.. autosummary::
    :toctree: generated
    :nosignatures:

    RNNTLoss

Multi-channel
-------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    PSD
    MVDR
    RTFMVDR
    SoudenMVDR