5,355 questions
0 votes · 0 answers · 49 views
PyInstaller executable throws “OSError: [WinError 50] The request is not supported” when using speech_recognition (FLAC error)
I’m building a Python voice assistant using the speech_recognition library.
Everything works perfectly when I run the code from PyCharm or the terminal,
but when I convert it to an .exe using Auto Py ...
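A common workaround for this FLAC error, assuming speech_recognition fails to locate a flac converter from inside the one-file .exe: bundle a flac.exe alongside the script (e.g. with `--add-binary "flac.exe;."`) and put PyInstaller's extraction directory on PATH before the first recognition call. A minimal sketch; the flac.exe name and `--add-binary` layout are assumptions, not confirmed from the question:

```python
import os
import sys

def bundled_binary_dir():
    """Return the PyInstaller extraction directory when running frozen, else None."""
    if getattr(sys, "frozen", False):
        return getattr(sys, "_MEIPASS", os.path.dirname(sys.executable))
    return None

def prepend_to_path(path_env, directory):
    """Return a PATH-style string with `directory` searched first."""
    return directory + os.pathsep + path_env

# Run this before the first recognizer call so a flac.exe bundled via
# `pyinstaller --add-binary "flac.exe;."` can be found on PATH.
_dir = bundled_binary_dir()
if _dir is not None:
    os.environ["PATH"] = prepend_to_path(os.environ.get("PATH", ""), _dir)
```

Outside the frozen .exe this is a no-op, so the script keeps working from PyCharm or the terminal.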
1 vote · 1 answer · 55 views
TTS onDone callback never fires on Samsung (Android 15) post-SpeechRecognizer, even with AUDIOFOCUS_REQUEST_GRANTED
I'm facing a very specific, reproducible bug and I've hit a wall after trying all the standard solutions. I would appreciate any insight.
I am developing a voice assistant setup flow where the app ...
0 votes · 1 answer · 80 views
Amazon Nova Sonic — should contentStart / contentEnd be sent once per session or once per user turn?
I'm integrating Amazon Nova Sonic (the speech-to-speech foundation model available through Amazon Bedrock) using the bidirectional streaming API.
The official Amazon Nova Sonic User Guide explains that:...
1 vote · 1 answer · 75 views
How to transcribe audio files (m4a/wav) on Android? Can SpeechRecognizer API be used for this?
I have an audio file (in .m4a / .wav format) stored on the Android device, and I need to transcribe the speech content from it into text.
From my understanding, the built-in SpeechRecognizer API in ...
0 votes · 1 answer · 48 views
Android: Google Recognizer Intent: EXTRA_PREFER_OFFLINE and API 33+
Consider this Kotlin code to init a Google speech recognizer:
recognizerIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
    .apply {
        putExtra(
            ...
0 votes · 0 answers · 76 views
How to handle limitations and platform differences when using expo-speech-recognition for voice input?
I’m implementing a virtual assistant in my Expo app and want to use expo-speech-recognition for voice input. I’ve read that Android and iOS handle speech recognition differently at the engine level:
...
0 votes · 1 answer · 70 views
How to remove the overlay STT box from the SpeechRecognizer API in Android Studio?
While making an STT app in Android Studio (Jetpack Compose), I encountered this in the SpeechRecognizer when I ran the app:
STT in app
I want to remove that so the UI looks cleaner. Is there a way ...
0 votes · 0 answers · 30 views
Azure Speech Service Speaker Diarization: How to Optimize Real-Time Transcription Latency (Node.js + Angular)
I'm using Azure Speech-to-Text with speaker diarization in a real-time transcription app.
Backend: Node.js (v18), using microsoft-cognitiveservices-speech-sdk and WebSocket server.
Frontend: Angular (...
0 votes · 0 answers · 78 views
Python TensorFlow Speech Recognition -1073741819 (0xC0000005) Error
I'm working on a speech recognition project using TensorFlow in Python. Normally, TensorFlow can only be used with a CPU or an NVIDIA GPU. I have an AMD Radeon 7600S GPU. Because of this, I installed ...
1 vote · 0 answers · 73 views
Voice Wake Word Not Working on Mobile Browsers Using SpeechRecognition in React
I'm building a React web app that uses the Web Speech API (SpeechRecognition) to detect a wake word (like "Hey Wiz") from the user’s microphone input.
The functionality works perfectly in ...
0 votes · 0 answers · 76 views
Standalone Android application with AlphaCephei (Vosk) library
I need to integrate the AlphaCephei library to my Android application.
I found a sample, but it contains two modules - one is an app with demo functionality, and the other is the model located in the ...
0 votes · 1 answer · 144 views
How to detect speech silence in Twilio Media Streams for real-time transcription using deepgram?
Twilio continuously sends audio chunks every 20 milliseconds, even during periods of silence. These chunks may contain silent audio data, making it challenging to identify "real silence" by ...
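One approach to the problem above, assuming Twilio's default media format (base64-encoded G.711 mu-law at 8 kHz): decode each 20 ms payload to linear PCM, compute its RMS energy, and call a chunk silent when it falls below a threshold; "real silence" is then N consecutive silent chunks. A minimal sketch; the 500.0 threshold is an assumption you would tune against your own audio:

```python
import base64
import math

def mulaw_decode(b):
    """Decode one G.711 mu-law byte to a 16-bit linear sample."""
    b = ~b & 0xFF
    sample = (((b & 0x0F) << 3) + 0x84) << ((b >> 4) & 0x07)
    sample -= 0x84
    return -sample if b & 0x80 else sample

def chunk_rms(payload_b64):
    """RMS energy of one Twilio media payload (base64 mu-law bytes)."""
    raw = base64.b64decode(payload_b64)
    if not raw:
        return 0.0
    return math.sqrt(sum(mulaw_decode(b) ** 2 for b in raw) / len(raw))

def is_silence(payload_b64, threshold=500.0):
    return chunk_rms(payload_b64) < threshold
```

Counting, say, 25 consecutive silent chunks (~500 ms) before flushing to Deepgram avoids flagging the natural gaps between words as end-of-speech.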
0 votes · 0 answers · 51 views
Speech recognition model giving garbled output
I used the following GitHub repo: Speech Recognition.
But since it didn't have code to train and save the model, I looked online and added code to speech_recognition to train and save the model and ...
0 votes · 0 answers · 163 views
How to create a speech recognition model from scratch in Python
I am looking to create a speech recognition model from scratch without using an existing model. I have already used Whisper successfully but I need to create a model that I can train myself whose ...
0 votes · 1 answer · 398 views
Why is Ollama answering every question and past question I have asked?
I am currently hosting Ollama locally on my laptop and importing it into a Python file. Every time I ask it a question, I append it to my 'messages' array. I then feed the entire 'messages' array to ...
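For the Ollama history question above: if every past prompt appears to get re-answered, a frequent cause is printing the model's whole output for the full transcript (or re-sending old prompts as fresh user turns) instead of displaying only the newest assistant message. A minimal sketch of history handling, assuming the /api/chat-style message list; `send` is a hypothetical stand-in for the actual HTTP call:

```python
def make_history(system_prompt):
    """Start a chat history in the message-list shape Ollama's /api/chat expects."""
    return [{"role": "system", "content": system_prompt}]

def ask(history, user_text, send):
    """Record the user turn, call the model, record its reply, and return
    only the newest reply for display (never the whole transcript)."""
    history.append({"role": "user", "content": user_text})
    reply = send(history)
    history.append({"role": "assistant", "content": reply})
    return reply
```

The model still sees the full `history` for context; the fix is only in what you print and in keeping strict user/assistant alternation.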
0 votes · 1 answer · 98 views
In Apple's Speech framework is SFTranscriptionSegment timing supposed to be off and speechRecognitionMetadata nil until isFinal?
I'm working in Swift/SwiftUI, running Xcode 16.3 on macOS 15.4, and I've seen this when running in the iOS simulator and in a macOS app run from Xcode. I've also seen this behaviour with 3 different ...
0 votes · 1 answer · 126 views
WebRTC video issue in iOS-Safari browsers
We use the Microsoft Avatar service as in the sample below. The avatar video is generated through a TURN server and sent to our app. In iOS Safari browsers alone, on first load only, the WebRTC audio track ...
0 votes · 0 answers · 51 views
How to read data from Asterisk 11 to an AGI server (ding-dong) for speech to text
I have an Asterisk server and a Node.js AGI server (the ding-dong npm lib).
I want to enable speech to text so that I can run NLP on it.
One quick way is to record the file on Asterisk and then do STT using Google Speech ...
2 votes · 0 answers · 63 views
How to perform speech recognition and audio recording simultaneously on Android?
[Q] How to record and transcribe (STT) audio at the same time on Android?
I'm building a feature in an Android app that allows users to speak a sentence — the app needs to recognize the speech in real ...
0 votes · 0 answers · 45 views
Problem: Text Not Pasted After Speech Recognition
The code uses Vosk for speech recognition and is supposed to paste the transcribed text into the current input field using pyperclip.copy() and pyautogui.hotkey('ctrl', 'v'). The speech is recognized ...
1 vote · 0 answers · 73 views
Android SpeechRecognizer start sound too low or missing when using Bluetooth SCO audio routing (SDK 23–35)
I’m working on an Android app that includes hands-free voice interaction using the SpeechRecognizer API. It must be compatible from SDK 23 to 35.
In most usage scenarios, the app runs outdoors with ...
0 votes · 1 answer · 143 views
I can't seem to get my Android app to work with Vosk on macOS 12.7
I have been through the wringer trying to get voice recognition working for an Android app I'm developing on macOS 12.7, using Python 3.10, Kivy 2.3, the speechrecognition requirement, and Vosk 3.44; I've been ...
0 votes · 1 answer · 229 views
React Speech Recognition not working on mobile browser
I'm working on a Next.js project and trying to use react-speech-recognition. It works well on Chrome Desktop, but it doesn't work on Chrome Android. I also tried the push-to-talk method, but it still ...
0 votes · 0 answers · 58 views
How to Capture Loopback Audio with SpeechRecognition (PyAudio)?
I’m working on a project where I need to use the speech_recognition module to process audio in real-time. However, from my research, it seems that speech_recognition (which works with pyaudio) only ...
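A stock PyAudio build can only open capture devices, so loopback recording works only when the OS exposes system output as a capture device (Windows "Stereo Mix", a PulseAudio/PipeWire "monitor" source, or a virtual cable such as VB-CABLE); you then select that device's index when opening the stream. A hypothetical helper over the dicts PyAudio returns from `get_device_info_by_index()`:

```python
def find_loopback_device(devices):
    """Pick the first input-capable device whose name suggests system audio.
    `devices` mirrors PyAudio's get_device_info_by_index() dicts."""
    keywords = ("loopback", "stereo mix", "cable output", "monitor")
    for dev in devices:
        name = dev.get("name", "").lower()
        if dev.get("maxInputChannels", 0) > 0 and any(k in name for k in keywords):
            return dev
    return None
```

The returned index can be passed as `device_index` to `speech_recognition.Microphone`, which accepts any capture device, loopback or not.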
1 vote · 1 answer · 116 views
window.SpeechRecognition || window.webkitSpeechRecognition; is not working
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1....
0 votes · 0 answers · 45 views
SpeechRecognition and PocketSphinx don't work
I'm following a YouTube tutorial to make an assistant in Python. When I say "Piggy", it responds as usual, but after that, when I say "how are you", it gives me this error. The guy in the ...
0 votes · 0 answers · 75 views
Error with speech recognition: not-allowed
<!DOCTYPE html>
<script>
// Wait for the DOM to load before setting up the event listener
document.addEventListener("DOMContentLoaded", function() {
    const ...
0 votes · 1 answer · 158 views
Unable to install pyaudio via homebrew
In file included from src/pyaudio/device_api.c:1:
In file included from src/pyaudio/device_api.h:7:
/Library/Frameworks/Python.framework/Versions/3.13/include/python3.13/Python.h:19:10: ...
0 votes · 1 answer · 94 views
Unable to export custom language model data (Speech framework)
I am trying to customise a language model but I get an error when exporting.
I created a project and copied example code from Apple:
import Speech
class Data {
    func export() async throws {
        ...
0 votes · 0 answers · 63 views
"BrokenPipeError: [Errno 32] Broken pipe" when sending a python scripts output to a while loop
I've been pulling my hair out for a few hours now; I cannot seem to get this working. I have been to the 3rd page of Google results, but I still cannot get this right.
code:
#!/bin/bash
python3.12 -m ...
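A BrokenPipeError like the one above means the process reading the script's stdout (the shell while-loop, `head`, etc.) exited and closed the pipe. On the Python side the usual fix is to treat that as end-of-output rather than a crash; a minimal sketch:

```python
import os
import sys

def emit(lines):
    """Print lines to stdout, stopping quietly if the reader closes the pipe."""
    written = 0
    try:
        for line in lines:
            print(line, flush=True)
            written += 1
    except BrokenPipeError:
        # The consumer went away. Point stdout at /dev/null so the interpreter
        # doesn't complain again while flushing buffers at shutdown.
        os.dup2(os.open(os.devnull, os.O_WRONLY), sys.stdout.fileno())
    return written
```

Flushing on every line also matters here: without it, a long-running recognizer can buffer output and the shell loop sees nothing until exit.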
1 vote · 1 answer · 119 views
Web Audio API preprocessing not improving Azure Speech SDK recognition accuracy for real-time meeting transcription
I'm working on a real-time speech-to-text application where microphone input is processed through Web Audio API before being sent to Azure Speech SDK. The main issue is that some audio content is ...
0 votes · 2 answers · 210 views
Flutter: Voice command 'open' not activating microphone listening state with Azure Speech Services
I'm building a Flutter application that uses Azure Speech Services for voice commands. When I say "open", the microphone should start listening (indicated by turning red), but it's not ...
0 votes · 0 answers · 70 views
PyAnnote Speaker Verification: All Speakers Getting Perfect 1.000 Similarity Scores
I'm experiencing an issue with PyAnnote's speaker verification where all speakers are getting perfect similarity scores (1.000), even when they are clearly different voices.
Environment
pyannote....
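When every pair of speakers scores exactly 1.000, the usual suspects are comparing an embedding against itself, or reading a cosine *distance* (1 - similarity, as returned by e.g. scipy's `cdist` with `metric="cosine"`) as if it were a similarity. A hand-rolled similarity over two embedding vectors is a quick sanity check:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity in [-1, 1]; exactly 1.0 only for parallel vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```

If this also returns 1.0 for two different speakers' embeddings, the vectors themselves are identical, which points at the embedding-extraction step rather than the comparison.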
0 votes · 1 answer · 143 views
How do I send transcribed text from speech directly to another endpoint or an Azure Function from my speech resource?
I am using an Azure Speech Service resource to transcribe real-time audio from my mic using microsoft-cognitiveservices-speech-sdk. I want to send the transcribed text to another endpoint (or to an azure ...
0 votes · 0 answers · 73 views
Getting "_loaded : false" even after installing the @react-native-voice/voice and using the provided example code. React Native 0.77.0
I am using react native 0.77.0 and facing this
{_loaded: false, _listeners: null, _events: {…}}
_events : {onSpeechStart: ƒ, onSpeechRecognized: ƒ, onSpeechEnd: ƒ, onSpeechError: ƒ, onSpeechResults: ƒ,...
0 votes · 1 answer · 388 views
Azure speech service continuous speech recognition
I'm pretty new to the Azure speech service, and I'm using a Twilio/Plivo service to connect a number with Azure STT and process the audio further after transcription.
My problem is when I speak something, it's ...
0 votes · 1 answer · 260 views
Azure Pronunciation Assessment Could not deserialize speech context error
I am trying to implement a pronunciation assessment system using Azure's JS SDK (see doc).
I get the following error in console:
"Could not deserialize speech context. websocket error code: 1007&...
0 votes · 0 answers · 240 views
How to use speech_recognition and pyannote.audio simultaneously
How can I use the data from speech_recognition's listen() function as an embedding to compare with previously recorded .wav files of different speakers talking so that I can print (speaker): (...
0 votes · 0 answers · 109 views
Why is the Google recognizer missing from my SpeechRecognition library?
I'm trying to use the Google recognizer from the SpeechRecognition library in Python
import speech_recognition as sr
rec = sr.Recognizer()
with sr.Microphone() as mic:
    rec....
1 vote · 0 answers · 56 views
speech_recognition and gtts don't understand numbers lower than 11
I put together straightforward code that asks the user to choose between option 1, oranges, and option 2, pears:
options = {
    (1, "1", "one", "number one", "...
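One way to cope with the mismatch described above (the recognizer returning "one" where the code expects "1", or vice versa) is to normalize both forms before matching. A sketch with hypothetical helper names; the words-to-digits table only covers 0-10, matching the symptom in the question:

```python
# Speech recognizers often return small numbers as words ("one") and larger
# ones as digits ("11"), so normalize everything to digit strings first.
_SMALL = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
          "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

def normalize_number(heard):
    t = heard.strip().lower()
    if t.startswith("number "):          # "number one" -> "one"
        t = t[len("number "):]
    return str(_SMALL[t]) if t in _SMALL else t

def pick_option(heard, options):
    """Look up the recognized phrase in an options dict keyed by digit strings."""
    return options.get(normalize_number(heard))
```

With options keyed as `{"1": "oranges", "2": "pears"}`, both "one" and "1" land on the same choice.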
1 vote · 1 answer · 173 views
WebSocket Connection Issue with Docker Compose and React JS
I'm encountering an issue with local deployment. The problem is with WebSocket connections when using Docker Compose to run a Kaldi server and a React JS frontend. The setup works fine when the Kaldi ...
0 votes · 1 answer · 122 views
Vosk speech to text stops working when i disconnect my external mic
In a Tauri JS app, I am recording audio from JS, processing it, and sending the data to a Python child process through a Rust handler. In the Python script, I am using Vosk to convert speech to text in real time.
...
0 votes · 1 answer · 122 views
How to enable word level Confidence for MS Azure Speech to Text Service for Node JS
According to this, it's possible to get per word confidence levels in the JSON output for the Azure STT service. The issue is that I cannot seem to find out how to do this using the Node JS library (...
0 votes · 2 answers · 5k views
Use Vosk speech recognition with Python
I'm trying to use Vosk speech recognition in a Python script, but the result is always :
{
"text" : ""
}
It's not a problem with my file because when I use in DOS "vosk-...
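An empty {"text": ""} from Vosk with no error is very often a format problem: the models expect mono, 16-bit, uncompressed PCM at the model's sample rate. A quick stdlib check before blaming the model; the accepted-rate tuple is an assumption for the common small models:

```python
import os
import tempfile
import wave

def vosk_format_problems(path):
    """Vosk small models expect mono 16-bit uncompressed PCM WAV at 8/16 kHz;
    anything else often yields {"text": ""} without any error."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            problems.append("not mono")
        if wf.getsampwidth() != 2:
            problems.append("not 16-bit")
        if wf.getcomptype() != "NONE":
            problems.append("compressed")
        if wf.getframerate() not in (8000, 16000):
            problems.append(f"unusual sample rate {wf.getframerate()}")
    return problems

# Demo: a 44.1 kHz stereo file fails two of the checks.
path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(path, "wb") as wf:
    wf.setnchannels(2)
    wf.setsampwidth(2)
    wf.setframerate(44100)
    wf.writeframes(b"\x00\x00" * 200)
print(vosk_format_problems(path))  # ['not mono', 'unusual sample rate 44100']
```

If problems are reported, resample first (e.g. `ffmpeg -i in.wav -ac 1 -ar 16000 out.wav`) before feeding the file to the recognizer.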
0 votes · 2 answers · 52 views
Calculate The Delay of The Recording File
I'm building an application to calculate delay based on keywords found. The method I used is inaccurate, or even wrong (it throws an error). The methods used are as follows:
@Override
public void onResults(Bundle ...
0 votes · 1 answer · 55 views
How to Use OpenTok SDK and Speech Recognizer Simultaneously for Audio/Video Calls in Android?
We have developed an audio/video calling feature using the OpenTok SDK in our Android app. Now, we need to integrate the SpeechRecognizer API to transcribe voice to text during an ongoing OpenTok call....
0 votes · 1 answer · 111 views
RecognizerIntent.EXTRA_LANGUAGE recently doesn't change the Recognizer language
I have code in my application which recognizes the "Persian" language and performs a speech-to-text function:
Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent....
0 votes · 0 answers · 294 views
Montreal Forced Aligner (MFA) taking too much time (almost 18 days, still running) to train a 33 GB corpus
We are using Montreal Forced Aligner (MFA) 3.x to train an acoustic model on a large dataset (~33 GB of audio and transcripts in an Indian language). The training process takes an extremely long time (...
2 votes · 0 answers · 53 views
How can I prevent muffled audio?
I am making an assistant in Python using SpeechRecognition and some other libraries. I use this library for both getting the voice and turning it into text.
But when I try to listen to some audio, ...
-1 votes · 1 answer · 72 views
Speech to Text prints only one sentence at a time
I'm building a Swift app that allows a user to speak into their phone and save the transcription into a textview. The issue arises when, after I speak a sentence and it transcribes it, the textfield ...