I'm trying to use speech_recognition 3.1.2 with Python 3.4, but I've been having trouble the entire time.
Initially, when trying to use just the example WAV recognizer, I was getting TypeError: 'str' does not support the buffer interface, so I combed through the source and made the following change:
    def read(self, size = -1):
        buffer = self.wav_reader.readframes(self.wav_reader.getnframes() if size == -1 else size)
        if type(buffer) is str:
            buffer = buffer.encode(encoding="utf-8", errors="strict")
        print(buffer)
        if self.wav_reader.getnchannels() != 1: # stereo audio
            try:
                buffer = audioop.tomono(buffer, self.wav_reader.getsampwidth(), 1, 1) # convert stereo audio data to mono
            except Exception as e:
                print(e)
        return buffer
from the original:
    def read(self, size = -1):
        buffer = self.wav_reader.readframes(self.wav_reader.getnframes() if size == -1 else size)
        if self.wav_reader.getnchannels() != 1: # stereo audio
            buffer = audioop.tomono(buffer, self.wav_reader.getsampwidth(), 1, 1) # convert stereo audio data to mono
        return buffer
While it no longer throws an error, the transcription quality is terrible. I can run python -m speech_recognition with great accuracy, so I'm not sure what is happening. I raised energy_threshold to 4000 to make sure it wasn't an ambient noise issue, and I even tried two different recognition services (IBM and Google Speech Recognition). Also, for some reason the last two buffers are empty strings, which I then have to convert to bytes objects:
b''
b''
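To sanity-check the empty buffers outside the library, here is a minimal sketch that reads a WAV file in chunks and records what readframes returns. It assumes only the standard-library wave module; make_test_wav and read_chunks are hypothetical helper names (not part of speech_recognition), and the silent in-memory file is just a stand-in for a real recording. On recent Python 3 versions every chunk should be a bytes object, with a single empty one marking the end of the file.

```python
import io
import wave


def make_test_wav(n_frames=1000, channels=2, sample_width=2, rate=16000):
    """Build a small silent WAV file in memory (stand-in for a real recording)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)
        writer.setframerate(rate)
        # One frame = channels * sample_width bytes; zeros are silence.
        writer.writeframes(b"\x00" * (n_frames * channels * sample_width))
    buf.seek(0)
    return buf


def read_chunks(wav_file, chunk_frames=256):
    """Read a WAV in fixed-size chunks and return its format plus every chunk."""
    with wave.open(wav_file, "rb") as reader:
        info = {
            "channels": reader.getnchannels(),
            "sample_width": reader.getsampwidth(),
            "frame_rate": reader.getframerate(),
            "n_frames": reader.getnframes(),
        }
        chunks = []
        while True:
            data = reader.readframes(chunk_frames)
            chunks.append(data)
            if not data:  # an empty chunk signals end of file
                break
    return info, chunks


info, chunks = read_chunks(make_test_wav())
print(info)
print([(type(c).__name__, len(c)) for c in chunks])
```

If the chunks are already bytes here, the str check in my modified read may only ever be triggered by the empty end-of-file buffers, which would mean the poor transcription comes from somewhere else (for example the sample rate or channel handling reported in info).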
Any advice would be awesome!