|
.. py:module:: torchaudio.transforms

torchaudio.transforms
=====================

.. currentmodule:: torchaudio.transforms

The ``torchaudio.transforms`` module contains common audio processing and feature extraction operations. The following diagram shows the relationship between some of the available transforms.

.. image:: https://download.pytorch.org/torchaudio/tutorial-assets/torchaudio_feature_extractions.png

Transforms are implemented using :class:`torch.nn.Module`. Common ways to build a processing pipeline are to define a custom ``Module`` class or to chain Modules together with :class:`torch.nn.Sequential`, and then move the pipeline to a target device and data type.
.. code::

   # Define custom feature extraction pipeline.
   #
   # 1. Resample audio
   # 2. Convert to power spectrogram
   # 3. Apply augmentations
   # 4. Convert to mel-scale
   #
   class MyPipeline(torch.nn.Module):
       def __init__(
           self,
           input_freq=16000,
           resample_freq=8000,
           n_fft=1024,
           n_mel=256,
           stretch_factor=0.8,
       ):
           super().__init__()
           self.resample = Resample(orig_freq=input_freq, new_freq=resample_freq)
           self.spec = Spectrogram(n_fft=n_fft, power=2)
           self.spec_aug = torch.nn.Sequential(
               TimeStretch(stretch_factor, fixed_rate=True),
               FrequencyMasking(freq_mask_param=80),
               TimeMasking(time_mask_param=80),
           )
           self.mel_scale = MelScale(
               n_mels=n_mel, sample_rate=resample_freq, n_stft=n_fft // 2 + 1)

       def forward(self, waveform: torch.Tensor) -> torch.Tensor:
           # Resample the input
           resampled = self.resample(waveform)

           # Convert to power spectrogram
           spec = self.spec(resampled)

           # Apply SpecAugment
           spec = self.spec_aug(spec)

           # Convert to mel-scale
           mel = self.mel_scale(spec)

           return mel
.. code::

   # Instantiate a pipeline
   pipeline = MyPipeline()

   # Move the computation graph to CUDA
   pipeline.to(device=torch.device("cuda"), dtype=torch.float32)

   # Perform the transform
   features = pipeline(waveform)
Please check out the tutorials that cover in-depth usage of transforms.

.. minigallery:: torchaudio.transforms

Utility
-------
.. autosummary::
   :toctree: generated
   :nosignatures:

   AmplitudeToDB
   MuLawEncoding
   MuLawDecoding
   Resample
   Fade
   Vol
   Loudness
   AddNoise
   Convolve
   FFTConvolve
   Speed
   SpeedPerturbation
   Deemphasis
   Preemphasis
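``MuLawEncoding`` and ``MuLawDecoding`` implement µ-law companding, which compresses the dynamic range of a waveform before quantization. A plain-Python sketch of the underlying continuous formula (the actual transforms additionally quantize the compressed signal to integer values in ``[0, mu]``):

```python
import math

def mu_law_encode(x: float, mu: int = 255) -> float:
    """Continuous mu-law companding of a sample in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_decode(y: float, mu: int = 255) -> float:
    """Inverse of mu_law_encode."""
    return math.copysign((math.exp(abs(y) * math.log1p(mu)) - 1.0) / mu, y)
```

The formula expands small amplitudes and compresses large ones, so low-level signal detail survives coarse quantization.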
Feature Extractions
-------------------
.. autosummary::
   :toctree: generated
   :nosignatures:

   Spectrogram
   InverseSpectrogram
   MelScale
   InverseMelScale
   MelSpectrogram
   GriffinLim
   MFCC
   LFCC
   ComputeDeltas
   PitchShift
   SlidingWindowCmn
   SpectralCentroid
   Vad
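``MelScale`` maps linear-frequency spectrogram bins onto the perceptually motivated mel scale. The HTK-style conversion it uses by default is a simple logarithmic formula; a plain-Python sketch:

```python
import math

def hz_to_mel(freq: float) -> float:
    """HTK-style Hz -> mel conversion: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse HTK-style mel -> Hz conversion."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

The scale is roughly linear below 1 kHz and logarithmic above it, so equal mel intervals correspond to roughly equal perceived pitch intervals.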
Augmentations
-------------
The following transforms implement the popular augmentation technique known as *SpecAugment* :cite:`specaugment`.
.. autosummary::
   :toctree: generated
   :nosignatures:

   FrequencyMasking
   TimeMasking
   TimeStretch
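The core idea of the masking transforms is simple: zero out a randomly chosen contiguous band of frequency bins (rows) or time frames (columns) of a spectrogram. A minimal plain-Python sketch of frequency masking on a list-of-lists spectrogram (the real transforms operate on batched tensors and draw the band width uniformly up to ``freq_mask_param``):

```python
import random

def frequency_mask(spec, freq_mask_param, mask_value=0.0):
    """Zero out one random contiguous band of frequency rows."""
    n_freq = len(spec)
    width = random.randint(0, min(freq_mask_param, n_freq))
    start = random.randint(0, n_freq - width)
    return [
        [mask_value] * len(row) if start <= i < start + width else list(row)
        for i, row in enumerate(spec)
    ]

spec = [[1.0] * 4 for _ in range(8)]          # 8 freq bins x 4 frames
masked = frequency_mask(spec, freq_mask_param=3)
```

Time masking is the same operation applied along the frame axis instead of the frequency axis.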
Loss
----
.. autosummary::
   :toctree: generated
   :nosignatures:

   RNNTLoss
Multi-channel
-------------
.. autosummary::
   :toctree: generated
   :nosignatures:

   PSD
   MVDR
   RTFMVDR
   SoudenMVDR
|