11 questions
0
votes
1
answer
182
views
Whisper model Real-time endpoint container deployment failed on Azure ML
I tried to deploy Whisper on Azure ML.
I am using the Whisper-openAI-v3 model for deployment.
The endpoint creation succeeds, but the deployment fails with the error ResourceOperationFailed,
and so the ...
0
votes
0
answers
179
views
Whisper Inference
Why, in the transcribe stage, do we remove N_FRAMES from the mel, and why does the for loop over mel_segment skip the last segment if it is shorter than 3000 frames? Suppose that the mel = [80, 4100]: the first mel ...
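For context on the excerpt above: in Whisper's transcribe, the mel is padded up front and a short final window is padded to N_FRAMES via pad_or_trim rather than discarded. A simplified, pure-Python sketch of the fixed-hop windowing (ignoring Whisper's timestamp-driven seeking; `iter_segments` is my own name, not Whisper's):

```python
N_FRAMES = 3000  # 30 s of audio at Whisper's 100 mel frames per second

def iter_segments(total_frames):
    """Yield (start, length) windows over a mel of total_frames frames,
    stepping in fixed 3000-frame hops. The final short window is kept
    (in Whisper it is padded up to N_FRAMES, not skipped)."""
    seek = 0
    while seek < total_frames:
        length = min(N_FRAMES, total_frames - seek)
        yield seek, length
        seek += length

# A mel of shape [80, 4100]: one full 3000-frame window, then a
# 1100-frame remainder that gets padded rather than dropped.
print(list(iter_segments(4100)))  # [(0, 3000), (3000, 1100)]
```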
0
votes
1
answer
149
views
Mozilla deepspeech from deepspeech import Model not working
I'm trying to use Mozilla DeepSpeech to transcribe text, but I'm running into issues importing the Model module.
Here is my code:
from deepspeech.model import model
model_file_path='deepspeech-0.9.3-...
1
vote
0
answers
13
views
Sphinxtrain: Unable to lookup word that exists in the dictionary
I'm adapting a Sphinx model for Brazilian Portuguese with my own data by following their tutorial and got stuck on the bw command in the "Accumulating observation counts" section. I made ...
0
votes
1
answer
526
views
react-speech-recognition package not working
It's a simple React package that converts user audio to text. I installed the package and tried its basic code example, but it shows an error: "RecognitionManager.js:247 Uncaught ReferenceError: ...
0
votes
1
answer
1k
views
Why is Word Information Lost (WIL) calculated the way it is?
Word Information Lost (WIL) is a measure of the performance of an automated speech recognition (ASR) service (e.g. AWS Transcribe, Google Speech-to-Text, etc.) against a gold standard (usually human-...
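For reference, the standard definition behind this question: with H hits, S substitutions, D deletions, and I insertions, WIL = 1 − H²/(N₁N₂), where N₁ = H + S + D is the reference length and N₂ = H + S + I is the hypothesis length. A minimal sketch of that formula (the function name is my own):

```python
def wil(hits, substitutions, deletions, insertions):
    """Word Information Lost: 1 - H^2 / (N1 * N2)."""
    n_ref = hits + substitutions + deletions   # N1: words in the reference
    n_hyp = hits + substitutions + insertions  # N2: words in the hypothesis
    return 1.0 - (hits * hits) / (n_ref * n_hyp)

# 9 hits and 1 substitution over a 10-word reference and 10-word hypothesis:
print(round(wil(9, 1, 0, 0), 4))  # 0.19
```

A perfect transcript (all hits, no errors) gives WIL = 0, and the H²/(N₁N₂) term penalizes both missed reference words and spurious hypothesis words.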
4
votes
0
answers
841
views
Speaker Diarization is disabled even for supported languages in Google Speech-to-Text API V2
I'm trying to use Google's Speech-to-Text v2 API for transcription and speaker diarization.
Per this supported languages page, I should be able to create a Recognizer using the "long" model ...
0
votes
1
answer
33
views
How does placing the output (word) labels on the initial transitions of the words in an FST lead to effective composition?
I am going through hbka.pdf (the WFST paper): https://cs.nyu.edu/~mohri/pub/hbka.pdf
A WFST figure for reference
Here the input label i, the output label o, and the weight w of a transition are marked on the ...
1
vote
0
answers
693
views
Fine-tuned Whisper-medium always predicts "" for all samples
I'm trying to fine-tune whisper-medium for the Korean language.
Here is the tutorial that I followed.
And here is my experiment setup:
python==3.9.16
transformers==4.27.4
tokenizers==0.13.3
torch==2.0.0
...
3
votes
3
answers
4k
views
How to get all hugging face models list using python?
Is there any way to get a list of models available on Hugging Face, e.g. for Automatic Speech Recognition (ASR)?
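One common approach, assuming the huggingface_hub package is installed: HfApi.list_models accepts a filter tag such as "automatic-speech-recognition" plus a limit. Parameter details vary across huggingface_hub versions, so treat this as a hedged starting point rather than a definitive answer:

```python
# Sketch: list Hugging Face models tagged for ASR via the Hub API.
# Requires: pip install huggingface_hub (and network access).
from huggingface_hub import HfApi

api = HfApi()
# filter matches model tags; "automatic-speech-recognition" is the
# pipeline tag used for ASR models on the Hub.
models = api.list_models(filter="automatic-speech-recognition", limit=10)
for model in models:
    print(model.modelId)
```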
5
votes
1
answer
8k
views
How to segment and transcribe an audio from a video into timestamped segments?
I want to segment a video transcript into chapters based on the content of each line of speech. The transcript would be used to generate a series of start and end timestamps for each chapter. This is ...
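One common starting point for questions like this: Whisper's transcribe returns result["segments"], where each segment carries start, end, and text fields, and those segments can then be grouped into chapters. A minimal duration-based sketch (`group_into_chapters` is my own name; content-based chaptering, as the question asks for, would replace the length check with a topic-boundary test):

```python
def group_into_chapters(segments, chapter_seconds=60.0):
    """Group transcript segments (dicts with 'start', 'end', 'text',
    as produced by e.g. Whisper's transcribe) into chapters of at most
    roughly chapter_seconds, returning (start, end, text) tuples."""
    chapters = []
    current, start = [], None
    for seg in segments:
        if start is None:
            start = seg["start"]
        current.append(seg)
        # Close the chapter once it reaches the target duration.
        if seg["end"] - start >= chapter_seconds:
            chapters.append((start, seg["end"],
                             " ".join(s["text"] for s in current)))
            current, start = [], None
    if current:  # flush any trailing partial chapter
        chapters.append((start, current[-1]["end"],
                         " ".join(s["text"] for s in current)))
    return chapters

segs = [
    {"start": 0.0, "end": 30.0, "text": "intro"},
    {"start": 30.0, "end": 70.0, "text": "setup"},
    {"start": 70.0, "end": 90.0, "text": "outro"},
]
print(group_into_chapters(segs))
# [(0.0, 70.0, 'intro setup'), (70.0, 90.0, 'outro')]
```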