Python Speech_Recognition Bad Results

Question

I'm trying to use speech_recognition 3.1.2 with Python 3.4, but I've been having trouble the entire time.

Initially, when trying to run just the example WAV recognizer, I was getting TypeError: 'str' does not support the buffer interface, so I combed through the source and changed the read method to the following:

    def read(self, size = -1):
        buffer = self.wav_reader.readframes(self.wav_reader.getnframes() if size == -1 else size)
        if type(buffer) is str:
            buffer = buffer.encode(encoding="utf-8", errors="strict")
            print(buffer)
        if self.wav_reader.getnchannels() != 1: # stereo audio
            try:
                buffer = audioop.tomono(buffer, self.wav_reader.getsampwidth(), 1, 1) # convert stereo audio data to mono
            except Exception as e:
                print(e)
        return buffer

from:

    def read(self, size = -1):
        buffer = self.wav_reader.readframes(self.wav_reader.getnframes() if size == -1 else size)
        if self.wav_reader.getnchannels() != 1: # stereo audio
            buffer = audioop.tomono(buffer, self.wav_reader.getsampwidth(), 1, 1) # convert stereo audio data to mono
        return buffer

While it no longer throws an error, the transcription quality is terrible. I can run python -m speech_recognition with great accuracy, so I'm not sure what is happening. I raised energy_threshold to 4000 to make sure it wasn't an ambient noise issue, and I tried two different recognition services (IBM and Google Speech Recognition). Also, for some reason the last two buffers are empty strings, which I then have to convert to bytes objects:

b''
b''

Any advice would be awesome!

There might be massive differences between Python 2 and 3 in the way strings are handled; have a look at python3porting.com/problems.html. So I'm not sure the change is trivial.
– toine, Commented Nov 4, 2015 at 15:58

@toine That's what I had originally thought too, but it looks like they wrote everything in Python 3, which is weird considering the errors I've been getting.
– Obj3ctiv3_C_88, Commented Nov 4, 2015 at 16:03

Anthony Zhang · Accepted Answer · 2015-11-05 02:11:44Z

I've pushed out v3.1.3, which should fix the issue. Upgrade with pip install --upgrade SpeechRecognition and try out the fix!

There were actually two factors here:

1. There is a bug in Python's chunk module: when the file pointer is at or past the end of the file, it returns an empty string rather than an empty bytes object. This was fixed a few months ago, but most Python versions in use today still have that bug.
2. Stereo audio was not properly converted into mono: the samples were mixed down, but the channel count was still reported as stereo. This resulted in some interesting-sounding audio!
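
The two factors above can be illustrated with a minimal sketch. This is not the code from v3.1.3: the helper names (to_bytes, stereo_to_mono_16bit, read_mono) are hypothetical, only 16-bit stereo input is handled, and the downmix averages the two channels with the array module instead of calling audioop.tomono as the library's read method does (audioop was removed from the standard library in Python 3.13).

```python
import array
import io
import sys
import wave


def to_bytes(data):
    """Hypothetical helper: normalize an end-of-file str to bytes."""
    # Affected Python versions could return '' instead of b'' at end of file.
    return data.encode("latin-1") if isinstance(data, str) else data


def stereo_to_mono_16bit(frames):
    """Average interleaved left/right 16-bit samples into one channel."""
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples) - 1, 2)),
    )
    if sys.byteorder == "big":
        mono.byteswap()
    return mono.tobytes()


def read_mono(wav_file):
    """Return (frames, channels, sample_rate, sample_width) for a WAV file.

    16-bit stereo input is mixed down, and the returned channel count is
    updated to 1 so the caller does not keep treating the data as stereo.
    """
    with wave.open(wav_file, "rb") as reader:
        channels = reader.getnchannels()
        width = reader.getsampwidth()
        frames = to_bytes(reader.readframes(reader.getnframes()))
        if channels == 2 and width == 2:
            frames = stereo_to_mono_16bit(frames)
            channels = 1
        return frames, channels, reader.getframerate(), width
```

The important detail for the second factor is that the channel count travels with the converted data: if the frames are mono but whatever consumes them still assumes two channels, the audio is interpreted with the wrong layout.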

See the changes here: https://github.com/Uberi/speech_recognition

