.. py:module:: torchaudio.transforms

torchaudio.transforms
=====================

.. currentmodule:: torchaudio.transforms

The ``torchaudio.transforms`` module contains common audio processing and feature extraction transforms. The following diagram shows the relationship between some of the available transforms.


.. image:: https://download.pytorch.org/torchaudio/tutorial-assets/torchaudio_feature_extractions.png

Transforms are implemented as :class:`torch.nn.Module` subclasses. Common ways to build a processing pipeline are to define a custom Module class or to chain Modules together using :class:`torch.nn.Sequential`, and then move the pipeline to a target device and data type.

.. code::

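   import torch
   from torchaudio.transforms import (
       FrequencyMasking,
       MelScale,
       Resample,
       Spectrogram,
       TimeMasking,
       TimeStretch,
   )

   # Define custom feature extraction pipeline.
   #
   # 1. Resample audio
   # 2. Convert to complex spectrogram
   # 3. Apply time stretch, then convert to power spectrogram
   # 4. Apply masking augmentations
   # 5. Convert to mel-scale
   #
   class MyPipeline(torch.nn.Module):
       def __init__(
           self,
           input_freq=16000,
           resample_freq=8000,
           n_fft=1024,
           n_mel=256,
           stretch_factor=0.8,
       ):
           super().__init__()
           self.resample = Resample(orig_freq=input_freq, new_freq=resample_freq)

           # power=None keeps the complex spectrogram that TimeStretch expects
           self.spec = Spectrogram(n_fft=n_fft, power=None)

           # n_freq must match the number of frequency bins of the spectrogram
           self.stretch = TimeStretch(n_freq=n_fft // 2 + 1, fixed_rate=stretch_factor)

           self.spec_aug = torch.nn.Sequential(
               FrequencyMasking(freq_mask_param=80),
               TimeMasking(time_mask_param=80),
           )

           self.mel_scale = MelScale(
               n_mels=n_mel, sample_rate=resample_freq, n_stft=n_fft // 2 + 1)

       def forward(self, waveform: torch.Tensor) -> torch.Tensor:
           # Resample the input
           resampled = self.resample(waveform)

           # Convert to complex spectrogram
           spec = self.spec(resampled)

           # Time-stretch, then convert to power spectrogram
           spec = self.stretch(spec).abs().pow(2)

           # Apply frequency and time masking (SpecAugment)
           spec = self.spec_aug(spec)

           # Convert to mel-scale
           mel = self.mel_scale(spec)

           return mel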
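
To see why ``MelScale`` receives ``n_stft=n_fft // 2 + 1``, it can help to trace tensor sizes through the pipeline by hand. The following is a plain-Python sketch of the sizes before any time stretching. It assumes the ``Spectrogram`` defaults that the pipeline above relies on (``hop_length = n_fft // 2`` and centered frames); the helper name ``expected_shapes`` is hypothetical and not part of torchaudio.

```python
import math


def expected_shapes(num_samples, input_freq=16000, resample_freq=8000,
                    n_fft=1024, n_mel=256):
    """Trace tensor sizes through resample -> spectrogram -> mel-scale."""
    # Resampling scales the sample count by resample_freq / input_freq,
    # rounded up.
    resampled = math.ceil(num_samples * resample_freq / input_freq)

    # A one-sided STFT keeps n_fft // 2 + 1 frequency bins.
    n_stft = n_fft // 2 + 1

    # Assumed defaults: hop_length = n_fft // 2 and centered (padded) frames,
    # which give 1 + resampled // hop_length frames.
    hop_length = n_fft // 2
    n_frames = 1 + resampled // hop_length

    return {
        "resampled": resampled,
        "spectrogram": (n_stft, n_frames),
        "mel": (n_mel, n_frames),
    }


# One second of 16 kHz audio
print(expected_shapes(16000))
```

The mel-scale step only changes the frequency dimension, which is why its ``n_stft`` argument has to agree with the number of bins produced by the spectrogram.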


.. code::

   # Instantiate a pipeline
   pipeline = MyPipeline()

   # Move the transform modules and their buffers to CUDA
   pipeline.to(device=torch.device("cuda"), dtype=torch.float32)

   # Perform the transform; `waveform` must be a tensor on the same device
   features = pipeline(waveform)

Please check out the tutorials that cover in-depth usage of transforms.

.. minigallery:: torchaudio.transforms

Utility
-------

.. autosummary::
    :toctree: generated
    :nosignatures:

    AmplitudeToDB
    MuLawEncoding
    MuLawDecoding
    Resample
    Fade
    Vol
    Loudness
    AddNoise
    Convolve
    FFTConvolve
    Speed
    SpeedPerturbation
    Deemphasis
    Preemphasis

Feature Extractions
-------------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    Spectrogram
    InverseSpectrogram
    MelScale
    InverseMelScale
    MelSpectrogram
    GriffinLim
    MFCC
    LFCC
    ComputeDeltas
    PitchShift
    SlidingWindowCmn
    SpectralCentroid
    Vad

Augmentations
-------------

The following transforms implement popular augmentation techniques known as *SpecAugment* :cite:`specaugment`.

.. autosummary::
    :toctree: generated
    :nosignatures:

    FrequencyMasking
    TimeMasking
    TimeStretch

Loss
----

.. autosummary::
    :toctree: generated
    :nosignatures:

    RNNTLoss

Multi-channel
-------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    PSD
    MVDR
    RTFMVDR
    SoudenMVDR