0 votes · 0 answers · 49 views
I’m building a Python voice assistant using the speech_recognition library. Everything works perfectly when I run the code from PyCharm or the terminal, but when I convert it to an .exe using Auto Py ...
asked by Konstantin_Violinov

1 vote · 1 answer · 55 views
I'm facing a very specific, reproducible bug and I've hit a wall after trying all the standard solutions. I would appreciate any insight. I am developing a voice assistant setup flow where the app ...
asked by Andrei Babenko

0 votes · 1 answer · 80 views
I'm integrating Amazon Nova Sonic (the speech-to-speech foundation model available through Amazon Bedrock) using the bidirectional streaming API. The official Amazon Nova Sonic User Guide explains that: ...
asked by JJ Kam

1 vote · 1 answer · 75 views
I have an audio file (in .m4a / .wav format) stored on the Android device, and I need to transcribe the speech content from it into text. From my understanding, the built-in SpeechRecognizer API in ...
asked by Tushar raina

0 votes · 1 answer · 48 views
Consider this Kotlin code to init a Google speech recognizer: recognizerIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH) .apply { putExtra( ...
asked by Yanay Lehavi

0 votes · 0 answers · 76 views
I’m implementing a virtual assistant in my Expo app and want to use expo-speech-recognition for voice input. I’ve read that Android and iOS handle speech recognition differently at the engine level: ...
asked by hoangnv_ral

0 votes · 1 answer · 70 views
While making an STT app in Android Studio (Jetpack Compose), I encountered this in the SpeechRecognizer when I ran the app: STT in app. I want to delete that so the UI looks cleaner. Is there a way ...
asked by Cold

0 votes · 0 answers · 30 views
I'm using Azure Speech-to-Text with speaker diarization in a real-time transcription app. Backend: Node.js (v18), using microsoft-cognitiveservices-speech-sdk and WebSocket server. Frontend: Angular (...
asked by SGR

0 votes · 0 answers · 78 views
I'm working on a speech recognition project using TensorFlow in Python. Normally, TensorFlow can only be used with a CPU or an NVIDIA GPU. I have an AMD Radeon 7600S GPU. Because of this, I installed ...
asked by Ömer Faruk Solmaz

1 vote · 0 answers · 73 views
I'm building a React web app that uses the Web Speech API (SpeechRecognition) to detect a wake word (like "Hey Wiz") from the user’s microphone input. The functionality works perfectly in ...
asked by Varun V

0 votes · 0 answers · 76 views
I need to integrate the AlphaCephei library into my Android application. I found a sample, but it contains two modules: one is an app with demo functionality, and the other is the model located in the ...
asked by Carlos

0 votes · 1 answer · 144 views
Twilio continuously sends audio chunks every 20 milliseconds, even during periods of silence. These chunks may contain silent audio data, making it challenging to identify "real silence" by ...
asked by Lahfir

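For the Twilio question above: Twilio Media Streams delivers 8 kHz G.711 μ-law audio in 20 ms frames, so a common way to spot "real silence" is to decode each chunk to linear PCM and threshold its RMS energy. A minimal pure-Python sketch — the 200.0 threshold and the helper names are assumptions to tune against your own line noise, not Twilio constants:

```python
import math

def ulaw_byte_to_pcm16(b: int) -> int:
    """Decode one G.711 mu-law byte to a 16-bit linear PCM sample."""
    u = ~b & 0xFF
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample

def chunk_rms(mulaw_bytes: bytes) -> float:
    """RMS energy of a mu-law chunk after decoding to linear PCM."""
    samples = [ulaw_byte_to_pcm16(b) for b in mulaw_bytes]
    return math.sqrt(sum(s * s for s in samples) / len(samples))

SILENCE_RMS_THRESHOLD = 200.0  # assumption: tune on real call audio

def is_silent_chunk(mulaw_bytes: bytes,
                    threshold: float = SILENCE_RMS_THRESHOLD) -> bool:
    return chunk_rms(mulaw_bytes) < threshold

def trailing_silence_ms(chunks, chunk_ms: int = 20) -> int:
    """Milliseconds of consecutive silent chunks at the end of the stream."""
    ms = 0
    for chunk in reversed(chunks):
        if not is_silent_chunk(chunk):
            break
        ms += chunk_ms
    return ms
```

A typical refinement is to treat the caller as finished only after some window (say 500 ms) of consecutive silent chunks, rather than reacting to a single quiet frame.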
0 votes · 0 answers · 51 views
I used the following GitHub repo: Speech Recognition. But since it didn't have code to train and save the model, I looked online and added code to speech_recognition to train and save the model and ...
asked by FaisalShakeel

0 votes · 0 answers · 163 views
I am looking to create a speech recognition model from scratch without using an existing model. I have already used Whisper successfully but I need to create a model that I can train myself whose ...
asked by FaisalShakeel

0 votes · 1 answer · 398 views
I am currently hosting Ollama locally on my laptop and importing it into a Python file. Every time I ask it a question, I append it to my 'messages' array. I then feed the entire 'messages' array to ...
asked by Shmuck

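For the Ollama question above: since the whole 'messages' array is re-sent with every request, the conversation eventually outgrows the model's context window. A common mitigation is to keep the system prompt and drop the oldest turns once a budget is exceeded. A sketch, assuming the usual role/content message shape — the character budget here is a crude stand-in for real token counting:

```python
def trim_messages(messages, max_chars=8000):
    """Keep the system prompt (if present first) plus the most recent
    turns whose combined content fits within max_chars."""
    system = messages[:1] if messages and messages[0]["role"] == "system" else []
    rest = messages[len(system):]
    kept, total = [], 0
    # Walk backwards so the newest turns survive the trim.
    for msg in reversed(rest):
        total += len(msg["content"])
        if total > max_chars:
            break
        kept.append(msg)
    return system + list(reversed(kept))
```

Calling `trim_messages(messages)` just before each request keeps the payload bounded while preserving the instructions and the most recent exchange.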
0 votes · 1 answer · 98 views
I'm working in Swift/SwiftUI, running Xcode 16.3 on macOS 15.4, and I've seen this when running in the iOS Simulator and in a macOS app run from Xcode. I've also seen this behaviour with 3 different ...
asked by colourmebrad

0 votes · 1 answer · 126 views
We use the Microsoft Avatar service as in the sample below. The avatar video is generated through a TURN server and sent to our app. In iOS Safari browsers alone, on first load only, the WebRTC audio track ...
asked by balabp

0 votes · 0 answers · 51 views
I have an Asterisk server and a Node.js AGI server (the ding-dong npm lib). I want to enable speech-to-text so that I can run NLP. One quick way is to record the file on Asterisk and then do STT using Google Speech ...
asked by Code Guru

2 votes · 0 answers · 63 views
[Q] How to record and transcribe (STT) audio at the same time on Android? I'm building a feature in an Android app that allows users to speak a sentence — the app needs to recognize the speech in real ...
asked by 임세현

0 votes · 0 answers · 45 views
The code uses Vosk for speech recognition and is supposed to paste the transcribed text into the current input field using pyperclip.copy() and pyautogui.hotkey('ctrl', 'v'). The speech is recognized ...
asked by reza

1 vote · 0 answers · 73 views
I’m working on an Android app that includes hands-free voice interaction using the SpeechRecognizer API. It must be compatible from SDK 23 to 35. In most usage scenarios, the app runs outdoors with ...
asked by Pierre Wargnier

0 votes · 1 answer · 143 views
I have been through the wringer trying to get voice recognition working for an Android app I'm developing on macOS 12.7, using Python 3.10, Kivy 2.3, the speechrecognition requirement, and Vosk 3.44. I've been ...
asked by GX1705

0 votes · 1 answer · 229 views
I'm working on a Next.js project and trying to use react-speech-recognition. It works well on Chrome Desktop, but it doesn't work on Chrome Android. I also tried the push-to-talk method, but it still ...
asked by Nico A.L

0 votes · 0 answers · 58 views
I’m working on a project where I need to use the speech_recognition module to process audio in real-time. However, from my research, it seems that speech_recognition (which works with pyaudio) only ...
asked by Priyal Deep

1 vote · 1 answer · 116 views
<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1....
asked by Manzer A

0 votes · 0 answers · 45 views
I'm following a YouTube tutorial to make an assistant in Python. When I say "Piggy", it responds as usual, but after that, when I say "how are you", it gives me this error. The guy in the ...
asked by Jhpark0303

0 votes · 0 answers · 75 views
<!DOCTYPE html> <script> // Wait for the DOM to load before setting up the event listener document.addEventListener("DOMContentLoaded", function() { const ...
asked by Mohit ChaudharI

0 votes · 1 answer · 158 views
In file included from src/pyaudio/device_api.c:1: In file included from src/pyaudio/device_api.h:7: /Library/Frameworks/Python.framework/Versions/3.13/include/python3.13/Python.h:19:10: ...
asked by Archit Rajput

0 votes · 1 answer · 94 views
I am trying to customise a language model but face an error when exporting. I created a project and copied the example code from Apple: import Speech class Data { func export() async throws { ...
asked by Goran

0 votes · 0 answers · 63 views
I've been pulling my hair out for a few hours now; I cannot seem to get this working. I have been to the 3rd page of Google results but I cannot seem to get this right. Code: #!/bin/bash python3.12 -m ...
asked by usr_40476

1 vote · 1 answer · 119 views
I'm working on a real-time speech-to-text application where microphone input is processed through Web Audio API before being sent to Azure Speech SDK. The main issue is that some audio content is ...
asked by Su Myat

0 votes · 2 answers · 210 views
I'm building a Flutter application that uses Azure Speech Services for voice commands. When I say "open", the microphone should start listening (indicated by turning red), but it's not ...
asked by pomoworko.com

0 votes · 0 answers · 70 views
I'm experiencing an issue with PyAnnote's speaker verification where all speakers are getting perfect similarity scores (1.000), even when they are clearly different voices. Environment: pyannote....
asked by user29588450

0 votes · 1 answer · 143 views
I am using an Azure Speech Service resource to transcribe real-time audio from my mic using microsoft-cognitiveservices-speech-sdk. I want to send the transcribed text to another endpoint (or to an Azure ...
asked by Abdullah Nadeem

0 votes · 0 answers · 73 views
I am using React Native 0.77.0 and facing this: {_loaded: false, _listeners: null, _events: {…}} _events : {onSpeechStart: ƒ, onSpeechRecognized: ƒ, onSpeechEnd: ƒ, onSpeechError: ƒ, onSpeechResults: ƒ,...
asked by Aniket Biswas

0 votes · 1 answer · 388 views
I'm pretty new to Azure Speech Service and I'm using a Twilio/Plivo service for connecting a number with Azure STT and processing it further after transcription. My problem is when I speak something, it's ...
asked by Henven

0 votes · 1 answer · 260 views
I am trying to implement a pronunciation assessment system using Azure's JS SDK (see doc). I get the following error in the console: "Could not deserialize speech context. websocket error code: 1007" ...
asked by nico_lrx

0 votes · 0 answers · 240 views
How can I use the data from speech_recognition's listen() function as an embedding to compare with previously recorded .wav files of different speakers talking so that I can print (speaker): (...
asked by Flamethrower

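For the speaker-identification question above: speech_recognition's listen() returns raw AudioData, not an embedding, so both the live capture and the enrolled .wav files first need to pass through a speaker-embedding model (resemblyzer and pyannote.audio are common choices). Once you have vectors, the comparison itself is just cosine similarity. A sketch — the 0.75 threshold and the two-dimensional vectors are illustrative only; real embeddings have hundreds of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def identify_speaker(embedding, enrolled, threshold=0.75):
    """Return the enrolled name with the highest cosine similarity,
    or None if no enrolled speaker clears the threshold."""
    best_name, best_score = None, -1.0
    for name, ref in enrolled.items():
        score = cosine_similarity(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

The threshold has to be tuned on your own recordings: too low and different speakers collide, too high and the same speaker fails to match across sessions.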
0 votes · 0 answers · 109 views
I'm trying to use the google recognizer from the SpeechRecognition library in Python import speech_recognition as sr rec = sr.Recognizer() with sr.Microphone() as mic: rec....
asked by Quasartioon

1 vote · 0 answers · 56 views
I put together straightforward code that asks the user to choose between option 1, oranges, and option 2, pears: options = { (1, "1", "one", "number one", "...
asked by Louie Morais

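For the options-dict question above: keying a dict by a tuple of aliases works, but the lookup then has to scan the tuples rather than index the dict directly. A sketch of matching normalized recognized text against the aliases — the alias tuples here are illustrative, since the original question's tuples are truncated:

```python
# Illustrative reconstruction of the alias-keyed options dict.
options = {
    (1, "1", "one", "number one", "oranges"): "oranges",
    (2, "2", "two", "number two", "pears"): "pears",
}

def match_option(recognized: str):
    """Map a recognized utterance to a choice by scanning each alias tuple.
    Normalizes case and surrounding whitespace before comparing."""
    text = recognized.strip().lower()
    for aliases, choice in options.items():
        if any(str(alias) == text for alias in aliases):
            return choice
    return None
```

An alternative design is to invert the structure up front into a flat `{alias: choice}` dict, which makes each lookup O(1) instead of a scan.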
1 vote · 1 answer · 173 views
I'm encountering an issue with local deployment. The problem is with WebSocket connections when using Docker Compose to run a Kaldi server and a React JS frontend. The setup works fine when the Kaldi ...
asked by Fedor

0 votes · 1 answer · 122 views
In a Tauri JS app I am recording audio from JS, processing it, and sending the data to a Python child process through a Rust handler. In the Python script I am using Vosk to convert speech to text in real time. ...
asked by Zeeshan Ahmad Khalil

0 votes · 1 answer · 122 views
According to this, it's possible to get per word confidence levels in the JSON output for the Azure STT service. The issue is that I cannot seem to find out how to do this using the Node JS library (...
asked by Matthew Knill

0 votes · 2 answers · 5k views
I'm trying to use Vosk speech recognition in a Python script, but the result is always: { "text" : "" }. It's not a problem with my file, because when I use in DOS "vosk-...
asked by Rémi Descamps

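For the Vosk empty-result question above: when the command-line tool transcribes a file but the Python script returns an empty "text", a frequent culprit is feeding Vosk audio that is not 16-bit mono PCM at the sample rate passed to KaldiRecognizer. A small pre-flight checker — a sketch; the function name is mine, not part of the Vosk API:

```python
import wave

def check_wav_for_vosk(path: str) -> list:
    """Return a list of problems that commonly make Vosk output an empty
    transcript. Vosk expects 16-bit mono PCM, and the WAV's frame rate
    must match the rate given to KaldiRecognizer."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            problems.append(f"expected mono, got {wf.getnchannels()} channels")
        if wf.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {wf.getsampwidth() * 8}-bit")
        if wf.getcomptype() != "NONE":
            problems.append(f"expected uncompressed PCM, got {wf.getcomptype()}")
    return problems
```

If the checker reports problems, re-encoding the file (for example with `ffmpeg -i in.m4a -ac 1 -ar 16000 -sample_fmt s16 out.wav`) usually fixes the empty transcript.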
0 votes · 2 answers · 52 views
I'm building an application to calculate delay based on keywords found. The method I used is not accurate, or even wrong (error). The methods used are as follows: @Override public void onResults(Bundle ...
asked by Sir Arhm

0 votes · 1 answer · 55 views
We have developed an audio/video calling feature using the OpenTok SDK in our Android app. Now, we need to integrate the SpeechRecognizer API to transcribe voice to text during an ongoing OpenTok call....
asked by Nihar Prabhu

0 votes · 1 answer · 111 views
I have code in my application which recognizes the "Persian" language and provides a speech-to-text function: Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH); intent....
asked by Ashile The Great

0 votes · 0 answers · 294 views
We are using Montreal Forced Aligner (MFA) 3.x to train an acoustic model on a large dataset (~33GB of audio and transcripts in an Indian language). The training process takes an extremely long time (...
asked by Swayangjit

2 votes · 0 answers · 53 views
I am making an assistant in Python using SpeechRecognition and some other libraries. I use this library for both getting the voice and turning it into text. But when I try to listen to some audio, ...
asked by PrinceMask

-1 votes · 1 answer · 72 views
I'm building a Swift app that allows a user to speak into their phone and save the transcription into a text view. The issue comes in when, after I speak a sentence and it transcribes it, the textfield ...
asked by Le'Anthony Howell
