1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121
|
# coding: utf-8
# Code source: Brian McFee
# License: ISC
"""
==============
PCEN Streaming
==============
This notebook demonstrates how to use streaming IO with `librosa.pcen`
to do dynamic per-channel energy normalization on a spectrogram incrementally.
This is useful when processing long audio files that are too large to load all at
once, or when streaming data from a recording device.
It also illustrates how to use a pre-allocated output buffer for block-wise
short-time Fourier transforms. This provides a minor speed boost and reduction
in memory usage when processing audio streams.
"""
##################################################
# We'll need numpy and matplotlib for this example
import numpy as np
import matplotlib.pyplot as plt
import soundfile as sf
import librosa
######################################################################
# First, we'll start with an audio file that we want to stream
filename = librosa.ex('humpback')
#####################################################################
# Next, we'll set up the block reader to work on short segments of
# audio at a time.
# We'll generate 64 frames at a time, each frame having 2048 samples
# and 75% overlap.
#
n_fft = 2048
hop_length = 512
# fill_value pads out the last frame with zeros so that we have a
# full frame at the end of the signal, even if the signal doesn't
# divide evenly into full frames.
sr = librosa.get_samplerate(filename)
stream = librosa.stream(filename, block_length=16,
frame_length=n_fft,
hop_length=hop_length,
mono=True,
fill_value=0)
#######################################################################
# For this example, we'll compute PCEN on each block, find the maximum
# response over frequency, and store the results in a list.
# Make an array to store the frequency-averaged PCEN values
pcen_blocks = []
# Initialize the PCEN filter delays to steady state
zi = None
# Create a handle for storing the block STFT outputs
# After the first block has been processed, we can reuse
# this buffer instead of allocating a new one for each block.
D = None
for y_block in stream:
# Compute the STFT (without padding, so center=False)
D = librosa.stft(y_block, n_fft=n_fft, hop_length=hop_length,
center=False, out=D)
# Compute PCEN on the magnitude spectrum, using initial delays
# returned from our previous call (if any)
# store the final delays for use as zi in the next iteration
P, zi = librosa.pcen(np.abs(D), sr=sr, hop_length=hop_length,
zi=zi, return_zf=True)
# Compute the max PCEN over frequency, and append it to our list
pcen_blocks.extend(np.max(P, axis=0))
# Cast to a numpy array for use downstream
pcen_blocks = np.asarray(pcen_blocks)
#####################################################################
# For the sake of comparison, let's see how it would look had we
# run PCEN on the entire spectrum without block-wise processing
y, sr = librosa.load(filename, sr=sr)
# Keep the same parameters as before
D = librosa.stft(y, n_fft=n_fft, hop_length=hop_length, center=False)
# Compute pcen on the magnitude spectrum.
# We don't need to worry about initial and final filter delays if
# we're doing everything in one go.
P = librosa.pcen(np.abs(D), sr=sr, hop_length=hop_length)
pcen_full = np.max(P, axis=0)
#####################################################################
# Plot the PCEN spectrum and the resulting magnitudes
# First, plot the spectrum
fig, ax = plt.subplots(nrows=2, sharex=True)
librosa.display.specshow(P, sr=sr, hop_length=hop_length,
x_axis='time', y_axis='log', ax=ax[0])
ax[0].set(title='PCEN spectrum')
ax[0].label_outer()
# Now we'll plot the pcen curves
times = librosa.times_like(pcen_full, sr=sr, hop_length=hop_length)
ax[1].plot(times, pcen_full, linewidth=3, alpha=0.25, label='Full signal PCEN')
times = librosa.times_like(pcen_blocks, sr=sr, hop_length=hop_length)
ax[1].plot(times, pcen_blocks, linestyle=':', label='Block-wise PCEN')
ax[1].legend()
# Zoom in to a short patch to see the fine details
ax[1].set(xlim=[30, 40]);
|