0

I am working with a end to emd speech recognition system. i have language model for a language in .lm extension a and other inference and pronunciation models.I want it to make prediction from that models can any one suggest me how to do it in python. I can get mfcc's from the audio file and i have language model how to connect these two to make predictions.Thank you in advance.

I am looking for how to use and what library is to be used in python.

2
  • I have some clarifying questions - is the .lm model generated with KenLM? And what are you using for the character prediction part of your model - something like DeepSpeech or Kaldi? Commented Feb 22, 2023 at 21:58
  • yes its generated with n gram model using kenLM Commented Feb 24, 2023 at 16:35

1 Answer 1

0

End to end speech recognition systems use many components, and you will need to investigate and join these components together for your system.

  • Firstly, you will need a way to record audio and generate an audio file or stream. The speech recognition library in PyPI is a good place to start for this. It also uses several other models to do the matching of audio to written text, but you can use the Microphone class in this package to capture audio.

  • You then need a way to do character or phoneme prediction. There are several options for this layer of your project, but what you want is probably an LSTM - long, short-term memory type of model. If you search for LSTM for automatic speech recognition you will probably find some Colab Notebooks or Jupyter notebooks around that implement it from scratch using Torch or Tensorflow.

  • You then need a layer that decodes the characters that have been predicted and matches them to words - connectionist temporal classification on Distill is a good general approach to this and the pyctcdecode library is a good starting place. This takes KenLM models as an input.

Putting these layers together will require some Python experience, but is something you should be able to achieve with a notebook.

If you want to look at an end to end system that already does this, then check out the Deepspeech PlayBook, which walks you through the end to end implementation of a sequence to sequence speech recognition model.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.