I'm trying to use speech_recognition 3.1.2 with Python 3.4, but I've been having trouble the entire time.

Initially, when trying just the example WAV recognizer, I was getting TypeError: 'str' does not support the buffer interface, so I combed through the source and changed the read method to the following:

    def read(self, size = -1):
        buffer = self.wav_reader.readframes(self.wav_reader.getnframes() if size == -1 else size)
        if type(buffer) is str:
            buffer = buffer.encode(encoding="utf-8", errors="strict")
            print(buffer)
        if self.wav_reader.getnchannels() != 1: # stereo audio
            try:
                buffer = audioop.tomono(buffer, self.wav_reader.getsampwidth(), 1, 1) # convert stereo audio data to mono
            except Exception as e:
                print(e)
        return buffer

The original method was:

    def read(self, size = -1):
        buffer = self.wav_reader.readframes(self.wav_reader.getnframes() if size == -1 else size)
        if self.wav_reader.getnchannels() != 1: # stereo audio
            buffer = audioop.tomono(buffer, self.wav_reader.getsampwidth(), 1, 1) # convert stereo audio data to mono
        return buffer

While it doesn't throw an error now, the transcription quality is terrible. I can run python -m speech_recognition with great accuracy, so I'm not sure what is happening. I raised energy_threshold to 4000 to make sure it wasn't an ambient noise issue, and I tried two different recognition services (IBM and Google Speech Recognition). Also, for some reason the last two buffers are empty strings, which I then have to convert to bytes objects:

b''
b''

Any advice would be awesome!

  • There might be massive differences between Python 2 and 3 in the way strings are handled; have a look at python3porting.com/problems.html. So I'm not sure the change is super trivial. Commented Nov 4, 2015 at 15:58
  • @toine That's what I had originally thought too, but it looked like they wrote everything in Python 3, which was weird considering the errors I've got. Commented Nov 4, 2015 at 16:03

1 Answer

I've pushed out v3.1.3, which should fix the issue. Upgrade with pip install --upgrade SpeechRecognition and try out the fix!

There were actually two factors here:

  • There is a bug in Python's standard chunk module where it returns an empty string (str) rather than an empty bytes object if the file pointer is at or past the end of the file. This was fixed a few months ago, but most Python versions in use today still have that bug.
  • Stereo audio was not properly converted into mono: the data was still marked as having two channels. This resulted in some interesting-sounding audio!
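
To illustrate the two factors, here is a minimal sketch of a read helper that guards against both. It is not the code that shipped in v3.1.3; the function names (normalize_frames, stereo_to_mono_16bit, read_mono) are hypothetical, and it assumes 16-bit PCM with samples in the platform's native byte order. It averages the channels in pure Python instead of calling audioop, so the logic is easy to follow.

```python
import array
import wave


def normalize_frames(frames):
    """Return frames as bytes, mapping the stray empty str to b''."""
    if isinstance(frames, str):
        if frames:
            # A non-empty str is not raw audio; encoding it would corrupt the data.
            raise TypeError("expected bytes, got a non-empty str")
        return b""
    return frames


def stereo_to_mono_16bit(frames):
    """Average left/right samples of interleaved 16-bit PCM (assumed format)."""
    samples = array.array("h")  # signed 16-bit on common platforms
    samples.frombytes(frames)
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples) - 1, 2)),
    )
    return mono.tobytes()


def read_mono(wav_reader, size=-1):
    """Read frames from an open wave.Wave_read object and return mono bytes."""
    count = wav_reader.getnframes() if size == -1 else size
    frames = normalize_frames(wav_reader.readframes(count))
    if wav_reader.getnchannels() == 2 and wav_reader.getsampwidth() == 2:
        frames = stereo_to_mono_16bit(frames)
    return frames
```

Whatever consumes the result must then treat it as single-channel audio; continuing to describe the data as stereo is what produced the distorted sound described above.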

See the changes here: https://github.com/Uberi/speech_recognition
