6

I am working on speech recognition using a neural network. To do so, I need the spectrograms of the training audio files (.wav). How can I get those spectrograms in Python?

3
  • See this Python module: Speech Recognition. Commented Dec 23, 2017 at 17:00
  • @kks, has my answer helped you? Commented Dec 23, 2017 at 19:41
  • Yeah ... got some good resources from your answer. @Oleg Melnikov Commented Dec 26, 2017 at 13:04

3 Answers

9

There are many ways to do this. The easiest is to look at the methods proposed in the Kernels for the Kaggle competition TensorFlow Speech Recognition Challenge (just sort by most voted). One of them is particularly clear and simple and contains the following function. Its inputs are a numeric vector of samples read from the wav file, the sample rate, the frame (window) size in milliseconds, the step (stride or skip) size in milliseconds, and a small offset eps that keeps the logarithm finite.

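from scipy.io import wavfile
from scipy import signal
import numpy as np

# path_to_wav_file is a placeholder for the path to your .wav file
sample_rate, audio = wavfile.read(path_to_wav_file)

def log_specgram(audio, sample_rate, window_size=20,
                 step_size=10, eps=1e-10):
    # Window and step sizes are given in milliseconds; convert them to samples
    nperseg = int(round(window_size * sample_rate / 1e3))
    step = int(round(step_size * sample_rate / 1e3))
    # SciPy expects the overlap between windows, not the step between them
    # (with the defaults, 20 ms window and 10 ms step, both are 10 ms)
    noverlap = nperseg - step
    freqs, times, spec = signal.spectrogram(audio,
                                    fs=sample_rate,
                                    window='hann',
                                    nperseg=nperseg,
                                    noverlap=noverlap,
                                    detrend=False)
    # Transpose to (time, frequency) and take the log; eps avoids log(0)
    return freqs, times, np.log(spec.T.astype(np.float32) + eps)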

Outputs are as defined in the SciPy manual, except that the spectrogram is transposed (so rows correspond to time segments) and rescaled with a monotonic function (the logarithm). The log compresses large values far more than small ones while preserving their order, so no extreme value in spec dominates the computation. Alternatively, one can cap the values at some quantile, but the log (or even the square root) is preferred. There are many other ways to normalize the heights of the spectrogram, that is, to prevent extreme values from "bullying" the output :)
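To make the rescaling options concrete, here is a minimal sketch that compares the log, the square root, and quantile capping. The array is synthetic (random values plus one artificial outlier), and the 99% quantile and the dynamic_range helper are assumptions chosen only for illustration; they are not part of the Kaggle kernel.

```python
import numpy as np

# Synthetic "spectrogram": mostly small power values plus one extreme outlier
rng = np.random.default_rng(0)
spec = rng.uniform(1e-6, 1e-3, size=(50, 81))
spec[10, 5] = 1e3  # a single extreme value

eps = 1e-10
log_spec = np.log(spec + eps)          # log rescaling, as in log_specgram
sqrt_spec = np.sqrt(spec)              # milder square-root rescaling
cap = np.quantile(spec, 0.99)          # assumed cap: the 99% quantile
capped_spec = np.minimum(spec, cap)    # clip everything above the cap

def dynamic_range(x):
    """Spread of the values: max minus min (hypothetical helper)."""
    return float(x.max() - x.min())

for name, arr in [("raw", spec), ("log", log_spec),
                  ("sqrt", sqrt_spec), ("capped", capped_spec)]:
    print(f"{name:>6}: spread = {dynamic_range(arr):.4g}")
```

The log and the square root keep the ordering of the values (the outlier is still the largest entry), whereas capping flattens everything above the chosen quantile to the same value.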

freqs (f) : ndarray, Array of sample frequencies.
times (t) : ndarray, Array of segment times.
spec (Sxx) : ndarray, Spectrogram of x. By default, the last axis of Sxx corresponds to the segment times.
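As a rough illustration of these three outputs, the sketch below runs scipy.signal.spectrogram on a synthetic one-second 440 Hz tone instead of a real .wav file. The 16 kHz sample rate and the tone itself are assumptions made so that the example is self-contained.

```python
import numpy as np
from scipy import signal

sample_rate = 16000                        # assumed sampling rate in Hz
t = np.arange(sample_rate) / sample_rate   # one second of sample times
audio = np.sin(2 * np.pi * 440.0 * t)      # synthetic 440 Hz tone

nperseg = 320    # 20 ms window at 16 kHz
noverlap = 160   # 10 ms overlap, which gives a 10 ms step
freqs, times, spec = signal.spectrogram(audio, fs=sample_rate,
                                        window='hann',
                                        nperseg=nperseg,
                                        noverlap=noverlap,
                                        detrend=False)

print(freqs.shape)   # one entry per frequency bin, from 0 to sample_rate / 2
print(times.shape)   # one entry per window: its centre time in seconds
print(spec.shape)    # (len(freqs), len(times)): power per bin and window

# The bin with the most energy should sit next to the 440 Hz tone
peak_bin = spec.mean(axis=1).argmax()
print(freqs[peak_bin])
```

Here spec[i, j] is the power at frequency freqs[i] in the window centred at times[j]; log_specgram returns the transpose, so its rows are indexed by time instead.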

Alternatively, see the train.py and models.py code in the GitHub repo for the TensorFlow audio recognition example.

Here is another thread that explains how to build spectrograms in Python and gives code for it.


3 Comments

Can you explain what is returned in freqs, times and spec? I have read the documentation but am still confused. @Oleg Melnikov
@kks: see the additional explanation for the output :) I hope it helps.
Thank you very much! Just to add a small point that if the loaded .wav file is stereo, you will see a "noverlap must be less than nperseg" error, which is a red herring. You can get the first channel's audio signal by doing audio = audio[:, 0] and then your log_specgram will work great. :-) Thanks again!
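The stereo fix from the comment above can be sketched as follows. The to_mono helper and its how argument are hypothetical names introduced for this example, and the two-channel signal is synthetic; the sketch assumes the (n_samples, n_channels) layout that wavfile.read returns for multi-channel files.

```python
import numpy as np
from scipy import signal

sample_rate = 16000                         # assumed sampling rate in Hz
t = np.arange(sample_rate) / sample_rate
left = np.sin(2 * np.pi * 440.0 * t)        # synthetic left channel
right = np.sin(2 * np.pi * 880.0 * t)       # synthetic right channel
stereo = np.stack([left, right], axis=1)    # shape (n_samples, 2)

def to_mono(audio, how="first"):
    """Reduce multi-channel audio to one channel (hypothetical helper)."""
    if audio.ndim == 1:
        return audio                        # already mono
    if how == "first":
        return audio[:, 0]                  # keep the first channel only
    return audio.mean(axis=1)               # otherwise average the channels

mono = to_mono(stereo)
freqs, times, spec = signal.spectrogram(mono, fs=sample_rate,
                                        nperseg=320, noverlap=160)
print(mono.shape, spec.shape)
```

Taking the first channel matches the comment; averaging the channels is another common choice, though for integer audio it turns the samples into floats.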
4

SciPy serves this purpose.

from scipy import signal
from scipy.io import wavfile

# Read the .wav file
sample_rate, data = wavfile.read('directory_path/file_name.wav')

# Spectrogram of the .wav file
sample_freq, segment_time, spec_data = signal.spectrogram(data, sample_rate)
# Note: sample_rate is the sampling frequency of the audio in Hz, whereas
# sample_freq is the array of frequency bins of the spectrogram

Use matplotlib to visualize the spectrogram:

import matplotlib.pyplot as plt
plt.pcolormesh(segment_time, sample_freq, spec_data)
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()  


0

You can use the NumPy, SciPy and matplotlib packages to make spectrograms. See the following post: http://www.frank-zalkow.de/en/code-snippets/create-audio-spectrograms-with-python.html
