5,355 questions
0 votes · 0 answers · 49 views
PyInstaller executable throws “OSError: [WinError 50] The request is not supported” when using speech_recognition (FLAC error)
I’m building a Python voice assistant using the speech_recognition library.
Everything works perfectly when I run the code from PyCharm or the terminal,
but when I convert it to an .exe using Auto Py ...
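A common workaround for this FLAC error, assuming speech_recognition fails to locate a flac converter from inside the one-file .exe: bundle a flac.exe alongside the script (e.g. with `--add-binary "flac.exe;."`) and put PyInstaller's extraction directory on PATH before the first recognition call. A minimal sketch; the flac.exe name and `--add-binary` layout are assumptions, not confirmed from the question:

```python
import os
import sys

def bundled_binary_dir():
    """Return the PyInstaller extraction directory when running frozen, else None."""
    if getattr(sys, "frozen", False):
        return getattr(sys, "_MEIPASS", os.path.dirname(sys.executable))
    return None

def prepend_to_path(path_env, directory):
    """Return a PATH-style string with `directory` searched first."""
    return directory + os.pathsep + path_env

# Run this before the first recognizer call so a flac.exe bundled via
# `pyinstaller --add-binary "flac.exe;."` can be found on PATH.
_dir = bundled_binary_dir()
if _dir is not None:
    os.environ["PATH"] = prepend_to_path(os.environ.get("PATH", ""), _dir)
```

Outside the frozen .exe this is a no-op, so the script keeps working from PyCharm or the terminal.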
1 vote · 1 answer · 55 views
TTS onDone callback never fires on Samsung (Android 15) post-SpeechRecognizer, even with AUDIOFOCUS_REQUEST_GRANTED
I'm facing a very specific, reproducible bug and I've hit a wall after trying all the standard solutions. I would appreciate any insight.
I am developing a voice assistant setup flow where the app ...
0 votes · 1 answer · 80 views
Amazon Nova Sonic — should contentStart / contentEnd be sent once per session or once per user turn?
I'm integrating Amazon Nova Sonic (the speech-to-speech foundation model available through Amazon Bedrock) using the bidirectional streaming API.
The official Amazon Nova Sonic User Guide explains that:...
1 vote · 1 answer · 75 views
How to transcribe audio files (m4a/wav) on Android? Can SpeechRecognizer API be used for this?
I have an audio file (in .m4a / .wav format) stored on the Android device, and I need to transcribe the speech content from it into text.
From my understanding, the built-in SpeechRecognizer API in ...
0 votes · 1 answer · 48 views
Android: Google Recognizer Intent: EXTRA_PREFER_OFFLINE and API 33+
Consider this Kotlin code to init a Google speech recognizer:
recognizerIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
    .apply {
        putExtra(
            ...
0 votes · 0 answers · 76 views
How to handle limitations and platform differences when using expo-speech-recognition for voice input?
I’m implementing a virtual assistant in my Expo app and want to use expo-speech-recognition for voice input. I’ve read that Android and iOS handle speech recognition differently at the engine level:
...
0 votes · 1 answer · 70 views
How to remove the overlay STT box from the SpeechRecognizer API in Android Studio?
While making an STT app in Android Studio (Jetpack Compose), I encountered this in the SpeechRecognizer when I ran the app:
STT in app
I want to remove that so the UI looks cleaner. Is there a way ...
0 votes · 0 answers · 30 views
Azure Speech Service Speaker Diarization: How to Optimize Real-Time Transcription Latency (Node.js + Angular)
I'm using Azure Speech-to-Text with speaker diarization in a real-time transcription app.
Backend: Node.js (v18), using microsoft-cognitiveservices-speech-sdk and WebSocket server.
Frontend: Angular (...
0 votes · 0 answers · 78 views
Python TensorFlow Speech Recognition -1073741819 (0xC0000005) Error
I'm working on a speech recognition project using TensorFlow in Python. Normally, TensorFlow can only be used with a CPU or an NVIDIA GPU. I have an AMD Radeon 7600S GPU. Because of this, I installed ...
1 vote · 0 answers · 73 views
Voice Wake Word Not Working on Mobile Browsers Using SpeechRecognition in React
I'm building a React web app that uses the Web Speech API (SpeechRecognition) to detect a wake word (like "Hey Wiz") from the user’s microphone input.
The functionality works perfectly in ...
0 votes · 0 answers · 76 views
Standalone Android application with AlphaCephei (Vosk) library
I need to integrate the AlphaCephei library to my Android application.
I found a sample, but it contains two modules - one is an app with demo functionality, and the other is the model located in the ...
0 votes · 1 answer · 144 views
How to detect speech silence in Twilio Media Streams for real-time transcription using deepgram?
Twilio continuously sends audio chunks every 20 milliseconds, even during periods of silence. These chunks may contain silent audio data, making it challenging to identify "real silence" by ...
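One approach to the problem above, assuming Twilio's default media format (base64-encoded G.711 mu-law at 8 kHz): decode each 20 ms payload to linear PCM, compute its RMS energy, and call a chunk silent when it falls below a threshold; "real silence" is then N consecutive silent chunks. A minimal sketch; the 500.0 threshold is an assumption you would tune against your own audio:

```python
import base64
import math

def mulaw_decode(b):
    """Decode one G.711 mu-law byte to a 16-bit linear sample."""
    b = ~b & 0xFF
    sample = (((b & 0x0F) << 3) + 0x84) << ((b >> 4) & 0x07)
    sample -= 0x84
    return -sample if b & 0x80 else sample

def chunk_rms(payload_b64):
    """RMS energy of one Twilio media payload (base64 mu-law bytes)."""
    raw = base64.b64decode(payload_b64)
    if not raw:
        return 0.0
    return math.sqrt(sum(mulaw_decode(b) ** 2 for b in raw) / len(raw))

def is_silence(payload_b64, threshold=500.0):
    return chunk_rms(payload_b64) < threshold
```

Counting, say, 25 consecutive silent chunks (~500 ms) before flushing to Deepgram avoids flagging the natural gaps between words as end-of-speech.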
0 votes · 0 answers · 51 views
Speech recognition model giving garbled output
I used the following GitHub repo: Speech Recognition.
But since it didn't have code to train and save the model, I looked online and added code to speech_recognition to train and save the model and ...
0 votes · 0 answers · 163 views
How to create a speech recognition model from scratch in Python
I am looking to create a speech recognition model from scratch without using an existing model. I have already used Whisper successfully but I need to create a model that I can train myself whose ...
0 votes · 1 answer · 398 views
Why is Ollama answering every question and past question I have asked?
I am currently hosting Ollama locally on my laptop and importing it into a Python file. Every time I ask it a question, I append it to my 'messages' array. I then feed the entire 'messages' array to ...
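For the Ollama history question above: if every past prompt appears to get re-answered, a frequent cause is printing the model's whole output for the full transcript (or re-sending old prompts as fresh user turns) instead of displaying only the newest assistant message. A minimal sketch of history handling, assuming the /api/chat-style message list; `send` is a hypothetical stand-in for the actual HTTP call:

```python
def make_history(system_prompt):
    """Start a chat history in the message-list shape Ollama's /api/chat expects."""
    return [{"role": "system", "content": system_prompt}]

def ask(history, user_text, send):
    """Record the user turn, call the model, record its reply, and return
    only the newest reply for display (never the whole transcript)."""
    history.append({"role": "user", "content": user_text})
    reply = send(history)
    history.append({"role": "assistant", "content": reply})
    return reply
```

The model still sees the full `history` for context; the fix is only in what you print and in keeping strict user/assistant alternation.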
0 votes · 1 answer · 98 views
In Apple's Speech framework is SFTranscriptionSegment timing supposed to be off and speechRecognitionMetadata nil until isFinal?
I'm working in Swift/SwiftUI, running Xcode 16.3 on macOS 15.4, and I've seen this when running in the iOS simulator and in a macOS app run from Xcode. I've also seen this behaviour with 3 different ...
0 votes · 1 answer · 126 views
WebRTC video issue in iOS-Safari browsers
We use the Microsoft Avatar service as in the sample below. The avatar video is generated through a TURN server and sent to our app. In iOS Safari browsers alone, on first load only, the WebRTC audio track ...
0 votes · 0 answers · 51 views
How to read data from Asterisk 11 to an AGI server (ding-dong) for speech to text
I have an Asterisk server and a Node.js AGI server (the ding-dong npm lib).
I want to enable speech to text so that I can run NLP on it.
One quick way is to record the file on Asterisk and then do STT using Google Speech ...
2 votes · 0 answers · 63 views
How to perform speech recognition and audio recording simultaneously on Android?
[Q] How to record and transcribe (STT) audio at the same time on Android?
I'm building a feature in an Android app that allows users to speak a sentence — the app needs to recognize the speech in real ...
0 votes · 0 answers · 45 views
Problem: Text Not Pasted After Speech Recognition
The code uses Vosk for speech recognition and is supposed to paste the transcribed text into the current input field using pyperclip.copy() and pyautogui.hotkey('ctrl', 'v'). The speech is recognized ...
1 vote · 0 answers · 73 views
Android SpeechRecognizer start sound too low or missing when using Bluetooth SCO audio routing (SDK 23–35)
I’m working on an Android app that includes hands-free voice interaction using the SpeechRecognizer API. It must be compatible from SDK 23 to 35.
In most usage scenarios, the app runs outdoors with ...
0 votes · 1 answer · 143 views
I can't seem to get my Android app to work with Vosk on macOS 12.7
I have been through the wringer trying to get voice recognition working for an Android app I'm developing on macOS 12.7, using Python 3.10, Kivy 2.3, the speechrecognition requirement, and Vosk 3.44; I've been ...
0 votes · 1 answer · 229 views
React Speech Recognition not working on mobile browser
I'm working on a Next.js project and trying to use react-speech-recognition. It works well on Chrome Desktop, but it doesn't work on Chrome Android. I also tried the push-to-talk method, but it still ...
0 votes · 0 answers · 58 views
How to Capture Loopback Audio with SpeechRecognition (PyAudio)?
I’m working on a project where I need to use the speech_recognition module to process audio in real-time. However, from my research, it seems that speech_recognition (which works with pyaudio) only ...
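A stock PyAudio build can only open capture devices, so loopback recording works only when the OS exposes system output as a capture device (Windows "Stereo Mix", a PulseAudio/PipeWire "monitor" source, or a virtual cable such as VB-CABLE); you then select that device's index when opening the stream. A hypothetical helper over the dicts PyAudio returns from `get_device_info_by_index()`:

```python
def find_loopback_device(devices):
    """Pick the first input-capable device whose name suggests system audio.
    `devices` mirrors PyAudio's get_device_info_by_index() dicts."""
    keywords = ("loopback", "stereo mix", "cable output", "monitor")
    for dev in devices:
        name = dev.get("name", "").lower()
        if dev.get("maxInputChannels", 0) > 0 and any(k in name for k in keywords):
            return dev
    return None
```

The returned index can be passed as `device_index` to `speech_recognition.Microphone`, which accepts any capture device, loopback or not.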
1 vote · 1 answer · 116 views
window.SpeechRecognition || window.webkitSpeechRecognition; is not working
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1....
0 votes · 0 answers · 45 views
SpeechRecognition and PocketSphinx don't work
I'm following a YouTube tutorial to make an assistant in Python. When I say "Piggy", it responds as usual, but after that, when I say "how are you", it gives me this error. The guy in the ...
0 votes · 0 answers · 75 views
Error with speech recognition: not-allowed
<!DOCTYPE html>
<script>
// Wait for the DOM to load before setting up the event listener
document.addEventListener("DOMContentLoaded", function() {
    const ...
0 votes · 1 answer · 158 views
Unable to install pyaudio via homebrew
In file included from src/pyaudio/device_api.c:1:
In file included from src/pyaudio/device_api.h:7:
/Library/Frameworks/Python.framework/Versions/3.13/include/python3.13/Python.h:19:10: ...
0 votes · 1 answer · 94 views
Unable to export custom language model data (Speech framework)
I am trying to customise a language model but I get an error when exporting.
I created a project and copied example code from Apple:
import Speech
class Data {
    func export() async throws {
        ...
0 votes · 0 answers · 63 views
"BrokenPipeError: [Errno 32] Broken pipe" when sending a python scripts output to a while loop
I've been pulling my hair out for a few hours now; I cannot seem to get this working. I have been to the 3rd page of Google results, but I still cannot get this right.
code:
#!/bin/bash
python3.12 -m ...
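A BrokenPipeError like the one above means the process reading the script's stdout (the shell while-loop, `head`, etc.) exited and closed the pipe. On the Python side the usual fix is to treat that as end-of-output rather than a crash; a minimal sketch:

```python
import os
import sys

def emit(lines):
    """Print lines to stdout, stopping quietly if the reader closes the pipe."""
    written = 0
    try:
        for line in lines:
            print(line, flush=True)
            written += 1
    except BrokenPipeError:
        # The consumer went away. Point stdout at /dev/null so the interpreter
        # doesn't complain again while flushing buffers at shutdown.
        os.dup2(os.open(os.devnull, os.O_WRONLY), sys.stdout.fileno())
    return written
```

Flushing on every line also matters here: without it, a long-running recognizer can buffer output and the shell loop sees nothing until exit.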
1 vote · 1 answer · 119 views
Web Audio API preprocessing not improving Azure Speech SDK recognition accuracy for real-time meeting transcription
I'm working on a real-time speech-to-text application where microphone input is processed through Web Audio API before being sent to Azure Speech SDK. The main issue is that some audio content is ...
0 votes · 2 answers · 210 views
Flutter: Voice command 'open' not activating microphone listening state with Azure Speech Services
I'm building a Flutter application that uses Azure Speech Services for voice commands. When I say "open", the microphone should start listening (indicated by turning red), but it's not ...
0 votes · 0 answers · 70 views
PyAnnote Speaker Verification: All Speakers Getting Perfect 1.000 Similarity Scores
I'm experiencing an issue with PyAnnote's speaker verification where all speakers are getting perfect similarity scores (1.000), even when they are clearly different voices.
Environment
pyannote....
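When every pair of speakers scores exactly 1.000, the usual suspects are comparing an embedding against itself, or reading a cosine *distance* (1 - similarity, as returned by e.g. scipy's `cdist` with `metric="cosine"`) as if it were a similarity. A hand-rolled similarity over two embedding vectors is a quick sanity check:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity in [-1, 1]; exactly 1.0 only for parallel vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```

If this also returns 1.0 for two different speakers' embeddings, the vectors themselves are identical, which points at the embedding-extraction step rather than the comparison.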
0 votes · 1 answer · 143 views
How do I send transcribed text from speech directly to another endpoint or an Azure Function from my speech resource?
I am using an Azure Speech Service resource to transcribe real-time audio from my mic using microsoft-cognitiveservices-speech-sdk. I want to send the transcribed text to another endpoint (or to an azure ...
0 votes · 0 answers · 73 views
Getting "_loaded : false" even after installing the @react-native-voice/voice and using the provided example code. React Native 0.77.0
I am using react native 0.77.0 and facing this
{_loaded: false, _listeners: null, _events: {…}}
_events : {onSpeechStart: ƒ, onSpeechRecognized: ƒ, onSpeechEnd: ƒ, onSpeechError: ƒ, onSpeechResults: ƒ,...
0 votes · 1 answer · 388 views
Azure speech service continuous speech recognition
I'm pretty new to the Azure speech service, and I'm using a Twilio/Plivo service to connect a number with Azure STT and process the audio further after transcription.
My problem is when I speak something, it's ...
0 votes · 1 answer · 260 views
Azure Pronunciation Assessment Could not deserialize speech context error
I am trying to implement a pronunciation assessment system using Azure's JS SDK (see doc).
I get the following error in console:
"Could not deserialize speech context. websocket error code: 1007&...
0 votes · 0 answers · 240 views
How to use speech_recognition and pyannote.audio simultaneously
How can I use the data from speech_recognition's listen() function as an embedding to compare with previously recorded .wav files of different speakers talking so that I can print (speaker): (...
0 votes · 0 answers · 109 views
Why is the Google recognizer missing from my SpeechRecognition library?
I'm trying to use the Google recognizer from the SpeechRecognition library in Python
import speech_recognition as sr
rec = sr.Recognizer()
with sr.Microphone() as mic:
    rec....
1 vote · 0 answers · 56 views
speech_recognition and gtts don't understand numbers lower than 11
I put together straightforward code that asks the user to choose between option 1, oranges, and option 2, pears:
options = {
    (1, "1", "one", "number one", "...
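One way to cope with the mismatch described above (the recognizer returning "one" where the code expects "1", or vice versa) is to normalize both forms before matching. A sketch with hypothetical helper names; the words-to-digits table only covers 0-10, matching the symptom in the question:

```python
# Speech recognizers often return small numbers as words ("one") and larger
# ones as digits ("11"), so normalize everything to digit strings first.
_SMALL = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
          "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

def normalize_number(heard):
    t = heard.strip().lower()
    if t.startswith("number "):          # "number one" -> "one"
        t = t[len("number "):]
    return str(_SMALL[t]) if t in _SMALL else t

def pick_option(heard, options):
    """Look up the recognized phrase in an options dict keyed by digit strings."""
    return options.get(normalize_number(heard))
```

With options keyed as `{"1": "oranges", "2": "pears"}`, both "one" and "1" land on the same choice.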
1 vote · 1 answer · 173 views
WebSocket Connection Issue with Docker Compose and React JS
I'm encountering an issue with local deployment. The problem is with WebSocket connections when using Docker Compose to run a Kaldi server and a React JS frontend. The setup works fine when the Kaldi ...
0 votes · 1 answer · 122 views
Vosk speech to text stops working when i disconnect my external mic
In a Tauri JS app, I am recording audio from JS, processing it, and sending the data to a Python child process through a Rust handler. In the Python script, I am using Vosk to convert speech to text in real time.
...
0 votes · 1 answer · 122 views
How to enable word level Confidence for MS Azure Speech to Text Service for Node JS
According to this, it's possible to get per word confidence levels in the JSON output for the Azure STT service. The issue is that I cannot seem to find out how to do this using the Node JS library (...
0 votes · 2 answers · 5k views
Use Vosk speech recognition with Python
I'm trying to use Vosk speech recognition in a Python script, but the result is always :
{
"text" : ""
}
It's not a problem with my file because when I use in DOS "vosk-...
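An empty {"text": ""} from Vosk with no error is very often a format problem: the models expect mono, 16-bit, uncompressed PCM at the model's sample rate. A quick stdlib check before blaming the model; the accepted-rate tuple is an assumption for the common small models:

```python
import os
import tempfile
import wave

def vosk_format_problems(path):
    """Vosk small models expect mono 16-bit uncompressed PCM WAV at 8/16 kHz;
    anything else often yields {"text": ""} without any error."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            problems.append("not mono")
        if wf.getsampwidth() != 2:
            problems.append("not 16-bit")
        if wf.getcomptype() != "NONE":
            problems.append("compressed")
        if wf.getframerate() not in (8000, 16000):
            problems.append(f"unusual sample rate {wf.getframerate()}")
    return problems

# Demo: a 44.1 kHz stereo file fails two of the checks.
path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(path, "wb") as wf:
    wf.setnchannels(2)
    wf.setsampwidth(2)
    wf.setframerate(44100)
    wf.writeframes(b"\x00\x00" * 200)
print(vosk_format_problems(path))  # ['not mono', 'unusual sample rate 44100']
```

If problems are reported, resample first (e.g. `ffmpeg -i in.wav -ac 1 -ar 16000 out.wav`) before feeding the file to the recognizer.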
0 votes · 2 answers · 52 views
Calculate The Delay of The Recording File
I'm building an application to calculate delay based on keywords found. The method I used is inaccurate, or even wrong (it throws an error). The methods used are as follows:
@Override
public void onResults(Bundle ...
0 votes · 1 answer · 55 views
How to Use OpenTok SDK and Speech Recognizer Simultaneously for Audio/Video Calls in Android?
We have developed an audio/video calling feature using the OpenTok SDK in our Android app. Now, we need to integrate the SpeechRecognizer API to transcribe voice to text during an ongoing OpenTok call....
0 votes · 1 answer · 111 views
RecognizerIntent.EXTRA_LANGUAGE recently doesn't change the Recognizer language
I have code in my application which recognizes the "Persian" language and performs a speech-to-text function:
Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent....
0 votes · 0 answers · 294 views
Montreal Forced Aligner (MFA) taking too much time (almost 18 days, still running) to train a 33 GB corpus
We are using Montreal Forced Aligner (MFA) 3.x to train an acoustic model on a large dataset (~33 GB of audio and transcripts in an Indian language). The training process takes an extremely long time (...
2 votes · 0 answers · 53 views
How can I prevent muffled audio?
I am making an assistant in Python using SpeechRecognition and some other libraries. I use this library for both getting the voice and turning it into text.
But when I try to listen to some audio, ...
-1 votes · 1 answer · 72 views
Speech to Text prints only one sentence at a time
I'm building a Swift app that allows a user to speak into their phone and save the transcription into a textview. The issue arises when, after I speak a sentence and it transcribes it, the textfield ...