plotting spectrogram in audio analysis

Question

I am working on speech recognition using a neural network. To do so, I need the spectrograms of my training audio files (.wav). How can I compute these spectrograms in Python?

Yeah, got some good resources from your answer. @Oleg Melnikov
– kks, Commented Dec 26, 2017 at 13:04

Oleg Melnikov · Accepted Answer · 2017-12-24 20:58:36Z

9

There are numerous ways to do so. The easiest is to look at the methods proposed in the Kernels of the Kaggle competition TensorFlow Speech Recognition Challenge (just sort by most voted). One of them is particularly clear and simple and contains the following function. Its inputs are a numeric vector of samples extracted from the wav file, the sample rate, the frame (window) size in milliseconds, the step (stride or skip) size in milliseconds, and a small offset eps that keeps the logarithm finite when a spectrogram value is zero.

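from scipy.io import wavfile
from scipy import signal
import numpy as np

def log_specgram(audio, sample_rate, window_size=20,
                 step_size=10, eps=1e-10):
    # Convert window and step sizes from milliseconds to samples.
    nperseg = int(round(window_size * sample_rate / 1e3))
    nstep = int(round(step_size * sample_rate / 1e3))
    # SciPy expects the overlap between frames, i.e. window minus step.
    # With the defaults (20 ms window, 10 ms step) this equals the step.
    noverlap = nperseg - nstep
    freqs, times, spec = signal.spectrogram(audio,
                                    fs=sample_rate,
                                    window='hann',
                                    nperseg=nperseg,
                                    noverlap=noverlap,
                                    detrend=False)
    # Transpose to (time, frequency); eps avoids taking log(0).
    return freqs, times, np.log(spec.T.astype(np.float32) + eps)

# path_to_wav_file is a placeholder for the path to your own mono .wav file.
sample_rate, audio = wavfile.read(path_to_wav_file)
freqs, times, spectrogram = log_specgram(audio, sample_rate)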

The outputs are defined in the SciPy manual, with one exception: the spectrogram is rescaled with a monotonic function (the logarithm), which compresses larger values much more than smaller ones while preserving their order. This way no extreme value in spec dominates the computation. Alternatively, one can cap the values at some quantile, but the log (or even the square root) is preferred. There are many other ways to normalize the heights of the spectrogram, i.e. to prevent extreme values from "bullying" the output :)
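To make the rescaling options concrete, here is a minimal sketch comparing the log, the square root and quantile capping. The array values and the 0.9 quantile are made up for illustration and are not taken from the Kaggle kernel.

```python
import numpy as np

# Hypothetical power values: mostly small, with one extreme outlier (1e6).
spec = np.array([[1e-6, 1e-3, 1.0],
                 [1e-2, 1e2, 1e6]])
eps = 1e-10

log_spec = np.log(spec + eps)        # strongest compression
sqrt_spec = np.sqrt(spec)            # milder compression
cap = np.quantile(spec, 0.9)         # assumed cut-off quantile
capped_spec = np.minimum(spec, cap)  # clip everything above the quantile

def value_range(a):
    """Difference between the largest and smallest entry."""
    return a.max() - a.min()

for name, arr in [("raw", spec), ("sqrt", sqrt_spec), ("log", log_spec)]:
    print(name, value_range(arr))
```

The log and the square root keep the ordering of the values and only shrink the gaps between them, whereas capping flattens everything above the chosen quantile to the same value.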

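freqs (f) : ndarray, array of sample frequencies.
times (t) : ndarray, array of segment times.
spec (Sxx) : ndarray, spectrogram of x. By default, the last axis of Sxx corresponds to the segment times. Note that log_specgram returns the transpose (spec.T), so in its output the first axis is time and the last axis is frequency.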
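If the three outputs are still unclear, it may help to inspect them on a signal whose content is known. The sketch below assumes a synthetic one-second 440 Hz tone sampled at 16 kHz (both values are chosen only for illustration) and calls scipy.signal.spectrogram directly with a 20 ms window and 10 ms overlap.

```python
import numpy as np
from scipy import signal

sample_rate = 16000                       # assumed sampling rate in Hz
t = np.arange(sample_rate) / sample_rate  # one second of sample times
audio = np.sin(2 * np.pi * 440.0 * t)     # synthetic 440 Hz tone

nperseg = 320    # 20 ms window at 16 kHz
noverlap = 160   # frames overlap by 10 ms, so the hop is also 10 ms

freqs, times, spec = signal.spectrogram(audio, fs=sample_rate, window='hann',
                                        nperseg=nperseg, noverlap=noverlap,
                                        detrend=False)

print(freqs.shape)   # one entry per frequency bin (nperseg // 2 + 1 bins)
print(times.shape)   # one entry per frame
print(spec.shape)    # (len(freqs), len(times))

# The bin with the most average power should sit next to the tone frequency.
peak_hz = freqs[spec.mean(axis=1).argmax()]
print(peak_hz)
```

Here freqs tells you which frequency (in Hz) each row of spec belongs to, times tells you the centre time (in seconds) of each column, and spec[i, j] is the power at freqs[i] around times[j].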

Alternatively, you can check the train.py and models.py code in the GitHub repo of the TensorFlow example on audio recognition.

Here is another thread that explains and gives code on building spectrograms in Python.

edited Dec 24, 2017 at 20:58

answered Dec 23, 2017 at 17:13

Oleg Melnikov


3 Comments

kks Over a year ago

Can you explain what is returned in freqs, times and spec? I have seen the documentation but am still confused. @Oleg Melnikov

Oleg Melnikov Over a year ago

@kks: see the additional explanation for the output :) I hope it helps.

Greg Sadetsky Over a year ago

Thank you very much! Just to add a small point that if the loaded .wav file is stereo, you will see a "noverlap must be less than nperseg" error, which is a red herring. You can get the first channel's audio signal by doing audio = audio[:, 0] and then your log_specgram will work great. :-) Thanks again!
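As a sketch of the stereo workaround: wavfile.read returns an array of shape (n_samples, n_channels) for multi-channel files, so the signal has to be reduced to one dimension first. The helper name to_mono is hypothetical, and averaging the channels is an alternative to taking audio[:, 0], not something the answer itself prescribes.

```python
import numpy as np

def to_mono(audio):
    """Return a 1-D signal: pass mono through, average the channels otherwise."""
    audio = np.asarray(audio)
    if audio.ndim == 1:
        return audio
    # Average across the channel axis; use audio[:, 0] to keep only channel 0.
    return audio.astype(np.float32).mean(axis=1)

# Fake stereo data: left channel all ones, right channel all zeros.
stereo = np.stack([np.ones(8), np.zeros(8)], axis=1)  # shape (8, 2)
mono = to_mono(stereo)
print(mono.shape)
```

Calling audio = to_mono(audio) right after wavfile.read and before log_specgram should avoid the misleading noverlap error.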

Wes Hardaker · Accepted Answer · 2023-09-14 22:04:04Z

4

SciPy serves this purpose.

import scipy.io.wavfile
import scipy.signal

# Read the .wav file
sample_rate, data = scipy.io.wavfile.read('directory_path/file_name.wav')

# Spectrogram of the .wav file
sample_freq, segment_time, spec_data = scipy.signal.spectrogram(data, sample_rate)
# Note: sample_rate is the sampling frequency of the file (a single number),
# while sample_freq is the array of frequency bins of the spectrogram

Use Matplotlib to visualize the spectrogram:

import matplotlib.pyplot as plt
plt.pcolormesh(segment_time, sample_freq, spec_data)
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()
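Raw power values often span many orders of magnitude, so the plot can look almost uniformly dark. One common remedy is to convert to decibels before plotting. The helper below is a hypothetical sketch (its name, the eps default and the choice of the maximum as reference are assumptions), not part of SciPy or Matplotlib.

```python
import numpy as np

def power_to_db(spec_data, eps=1e-10):
    """Convert a power spectrogram to decibels relative to its maximum."""
    spec_data = np.asarray(spec_data, dtype=np.float64)
    ref = spec_data.max()
    # eps keeps the logarithm finite for bins with zero power.
    return 10.0 * np.log10((spec_data + eps) / (ref + eps))

example = power_to_db([[1.0, 0.1, 0.01]])
print(example)
```

Passing power_to_db(spec_data) to plt.pcolormesh in place of spec_data should make the quieter parts of the signal visible.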

edited Sep 14, 2023 at 22:04

Wes Hardaker


answered Dec 23, 2017 at 17:24

Saran


Lasith Niroshan · Accepted Answer · 2017-12-23 17:14:37Z

0

You can use the NumPy, SciPy and Matplotlib packages to make spectrograms. See the following post: http://www.frank-zalkow.de/en/code-snippets/create-audio-spectrograms-with-python.html

answered Dec 23, 2017 at 17:14

Lasith Niroshan
