555 questions
0
votes
0
answers
52
views
TarsosDSP Pitch Detection Implementation: Sudden Pitch Drops After Note Release with FFT_YIN
Introduction
I'm building a real-time pitch detection app in Kotlin/Android using TarsosDSP. The app captures audio input, detects the fundamental frequency using the FFT_YIN algorithm, and displays ...
0
votes
0
answers
169
views
How to transcribe local audio File/Blob with Transformers.js pipeline? (JSON.parse error)
I'm working on a browser-based audio transcription app using Transformers.js by Xenova. I'm trying to transcribe a .wav file selected by the user using the following code:
import { pipeline } from '@...
0
votes
0
answers
136
views
Why is pyannote speaker diarization returning "Unknown" for speaker label in real-time audio processing?
I'm working on a real-time speech processing pipeline using pyannote-audio, and I’m using the pyannote/speaker-diarization-3.1 pipeline with Hugging Face token authentication.
My code captures live ...
2
votes
0
answers
101
views
Speaker Diarization
I need to upload an audio file where two or more speakers are having a conversation, and at times their speech overlaps. The requirement is to segment the audio into distinct chunks, each ...
1
vote
0
answers
193
views
How to Link Zoom (X-Axis) of Two Separate Plotly Plots in Streamlit?
I want to visualize audio data in Streamlit with two separate Plotly plots: one for the Time Domain waveform and one for the MFCC (Mel-frequency cepstral coefficients). I want to link their X-axes so ...
0
votes
0
answers
57
views
PJSIP audio has low volume at the beginning if AEC is enabled
After a call is negotiated and connected, the outgoing (tx) sound is very low and often distorted for approximately the first 5 seconds. The same thing happens after a long silence. If we disable aec (...
0
votes
1
answer
58
views
None Gradients for a model with 2 outputs
I have a model that has a GRU implementation inside and processes audio samples. In each forward pass I process a single sample of an audio file. To imitate the GRU behavior correctly, I have returned ...
1
vote
0
answers
249
views
Twilio Real-Time Media Streaming to WebSocket Receives Only Noise Instead of Speech
I'm setting up a Twilio Voice call with real-time media streaming to a WebSocket server for speech-to-text processing using Google Cloud Speech-to-Text. The connection is established successfully, and ...
0
votes
0
answers
319
views
Trying to Collect Audio from YouTube video without Downloading
I am trying to get just the audio data of songs from YouTube videos to analyze without downloading (Python). I started by using yt-dlp with the following code:
def search_youtube(song_name, ...
0
votes
0
answers
70
views
PyAnnote Speaker Verification: All Speakers Getting Perfect 1.000 Similarity Scores
I'm experiencing an issue with PyAnnote's speaker verification where all speakers are getting perfect similarity scores (1.000), even when they are clearly different voices.
Environment
pyannote....
1
vote
0
answers
97
views
Why can't I send an audio stream from JavaScript via SignalR to a .NET Hub?
I’m trying to send an audio stream captured in JavaScript from a browser tab to a .NET SignalR Hub. My goal is to stream audio in chunks/realTime to the server and broadcast it to all connected ...
0
votes
1
answer
191
views
How to Restrict Azure Speech SDK AudioConfig to Only System Audio and Exclude Microphone Input?
Question:
I am working on a Blazor project where I integrate Azure Speech Service to perform speech-to-text transcription on system audio during screen sharing. However, I am facing an issue where ...
1
vote
0
answers
38
views
How do DAWs process sends/returns with busses if they are routed to themselves?
I'm looking into creating the sends/returns/busses logic for my own audio application. When checking existing DAWs (Logic Pro / Ableton Live) I noticed you can make a routing loop by sending busses/...
1
vote
0
answers
95
views
Trouble converting PDM audio from a Seeed Studio XIAO nRF52840 for speech transcription — only getting white noise
I'm currently working on an iOS app that uses Bluetooth to stream audio data from a Seeed Studio XIAO nRF52840 (Sense) board, which has a PDM microphone. The board is running the OMI Friend firmware, ...
0
votes
0
answers
52
views
How to calculate the total number of different frequencies that are present in an audio signal's spectrum?
How can I calculate the total number of different frequencies that are present in an audio signal's spectrum? Is that possible? The platform is Linux.
0
votes
1
answer
350
views
ffmpeg command problem in Python to remove silences at the beginning and end. error: returned non-zero exit status 4294967274
I have a problem regarding ffmpeg use in Jupyter notebooks. I'm trying to use this function for an audio file, but I get the error below. I have tried changing the file to wav and different specifications ...
1
vote
0
answers
75
views
Trying to build scream detection using pretrained model
I have been wondering why a “smart” phone needs elaborate manual steps to trigger SOS when it already has enough inputs to detect panic, like mic, camera, GPS, gyroscope, etc.
I found this model (...
1
vote
0
answers
29
views
Python `speech_recognition` Script Stuck on `recognizer.listen()` Without `phrase_time_limit`
I'm working with the speech_recognition library in Python, and my script is getting stuck on the recognizer.listen() call. I want to continuously listen for commands after detecting the wake word but ...
0
votes
1
answer
146
views
Speaker identification embeddings audio fragment length
I have a database of audio samples, each matched to a specific speaker, like
nick_sample1.mp3
nick_sample2.mp3
...
nick_sampleN.mp3
john_sample1.mp3
john_sample2.mp3
...
john_sampleK.mp3
The task is to match a ...
0
votes
0
answers
378
views
How can I identify silences in audio in Node.js?
I'm working on implementing a real-time chatbot using OpenAI's GPT-4o model. So far I've got mostly everything working except one key piece, which is knowing when the audio streamed by the frontend is ...
1
vote
1
answer
999
views
How to obtain confidence scores from transcriptions using faster_whisper?
I am using the faster_whisper Python library to transcribe audio files. Currently, I am able to get the transcriptions of audio data, but I'm unable to retrieve confidence scores for these ...
1
vote
0
answers
169
views
Increase volume from Google's Text-to-Speech (WAV Audio processing, React Native)
I am using Google's Text to Speech API on my backend. I want to play the resulting audio in my Expo application, but I have a low volume problem on physical devices (but not on simulators).
Right now, ...
1
vote
2
answers
107
views
Using DCT to create real-time "levels" animation for microphone input
For context: I'm trying to create a simple "level monitor" animation of audio data streaming from a microphone. I'm running this code on an iOS device and leaning heavily on the Accelerate ...
0
votes
1
answer
109
views
Not able to use harmonic and rp_entropy function from librosa
I'm trying to extract certain features from an audio file using librosa, but it kept raising an AttributeError for the harmonic and entropy functions. I tried to change my python version from 3.8 to 3....
0
votes
1
answer
283
views
Automating Copyrighted Music Silencing in YouTube Videos Using the YouTube API
I currently manually mute copyrighted songs in my YouTube videos by navigating to the video, clicking on “Restrictions,” and then muting the specific copyrighted track. However, I’d like to achieve ...
2
votes
0
answers
52
views
Clicking/distortion noise at start of mixed audio in Java
I am attempting to mix multiple .wav sound files (merging musical notes to create a chord) in Java. The end result is mostly great, except during the first 23ms of the mixed audio where there's a loud ...
1
vote
1
answer
225
views
ToneJS PitchShift with MediaStream
I'm currently building an app with pitch-shifting functionality, and I've found out that ToneJS can do that job. I'd like to know if I can extract only the pitch-shifted part of a track from a media ...
0
votes
2
answers
912
views
Apply gain to specific frequencies using pyDub
I want to increase the volume of a specific frequency in a wav file, making it louder (more audible) than the rest of the frequencies.
What I've done so far (or at least I believe so) is to find the ...
0
votes
1
answer
294
views
Sounddevice Output Overflow
I'm having trouble of an unknown kind with the sounddevice module for Python.
The code below creates a GUI where you can output a sine wave of a user-defined frequency. Additionally, one can modulate its ...
0
votes
1
answer
313
views
Query by Example (Searching in audio database using audio query)
I have been given an audio database consisting of speech recordings. The speech can be in any language, so transcripts for the speech are not available.
Now I will be given one query. I want to see for which ...
1
vote
0
answers
64
views
Are there any libraries/APIs that can take a large audio file and identify music in it?
I'm trying to process large video/audio files and extract timestamps and songs played during the video.
For example, processing a large Twitch stream VOD to find out that songs A, B and C were played ...
1
vote
1
answer
1k
views
Writing to Virtual Audio Cable via Python (PyAudio, Sounddevice)
I am trying to write data to a virtual audio cable using Python. So far I don't really care which module I use, so I tried PyAudio and Sounddevice. I installed the Virtual Cable from VB-Audio. It does ...
0
votes
0
answers
181
views
How can I trim silence at the start and end of a recording (wav) in Python?
I am trying to remove the silence at the start and end of this recording so that I just have the voice in between. There is no need to remove any silence in between, just the start and end portions of the ...
0
votes
1
answer
416
views
Is there a way to get audio from an opencv webcam capture? [duplicate]
I am trying to do some audio processing in python, but I need both the video and audio as I wish to output both when I am done. I have looked up other examples, but they all use video files instead of ...
4
votes
2
answers
3k
views
How can I implement real-time sentiment analysis on live audio streams using Python?
I'm currently working on a project where I need to perform real-time sentiment analysis on live audio streams using Python. The goal is to analyze the sentiment expressed in the spoken words and ...
1
vote
1
answer
457
views
Passing a numpy audio array around different audio libraries
I'm working on a project which involves numerous audio-processing tasks with text-to-speech, but I've hit a small snag. I'm going to be processing possibly hundreds of TTS audio segments, so I want to ...
0
votes
1
answer
1k
views
Offset in timing when transcribing noise-reduced audio with OpenAI's Whisper
I'm working on a project that involves transcribing audio files using OpenAI's Whisper. To improve the quality of the transcriptions, I'm trying to reduce the noise in my audio files using the ...
0
votes
1
answer
871
views
android.media.audiofx how to create a 10 band equalizer
I am trying to write my own MP3 player using .NET MAUI.
One of the libraries is based on the Android media player. As I read in the documentation, it has an Equalizer class in the android.media.audiofx ...
0
votes
0
answers
165
views
How do I calculate the frequency of a sound using Python
I have a .wav file, and I need to write code that tells me the frequency in Hz of the sound. How do I go about this? I've done some research, and all of the results point me to something that's related to FFT, ...
0
votes
1
answer
700
views
Automatic separation between consonants and vowels in speech recording
Given an audio file of speech, (for example the file you can download from here), that looks like this:
If we examine it we will find that the vowels are the areas with the largest amplitude, and the ...
1
vote
1
answer
891
views
Understanding mel-scaled spectrogram for a simple sine wave
I generate a simple sine wave with a frequency of 100 and calculate an FFT to check that the obtained frequency is correct.
Then I calculate melspectrogram but do not understand what its output means? ...
1
vote
1
answer
283
views
Modify ALAC channel configuration inside M4V (MP4) container
I want to modify channel configuration of an ALAC audio stream inside MP4-container without multiplexing the container.
So I need to change letter H of Audio Data Transport Stream with a hex editor ...
0
votes
1
answer
44
views
Abnormal sound prediction predicts no abnormal sound for all frames
I'm trying to write a function that detects all abnormal shot sounds in an audio file extracted from the video file using moviepy. The result should be a dataframe that contains the frame count of video ...
2
votes
1
answer
89
views
"ValueError: x and y must have same first dimension" when trying to plot the signal amplitude of a wav file using Python
Using Python, I am trying to plot the signal amplitude of a wav file, however I am getting the following error "ValueError: x and y must have same first dimension". Here is my code:
import ...
2
votes
2
answers
914
views
No mic get detected as sound.query_devices() returns empty list?
I'm trying to get the feed of the mic using the "sounddevice" library in Python.
import sounddevice as sd
print(sd.query_devices())
But it returns an empty list.
I tried arecord -f cd -d 6 test....
0
votes
0
answers
460
views
Is there a way to extract audio data from an mp3 file picked by a user in Android Studio?
My project involves working with audio files to process the audio signal they contain to compare them to other files. The main problem I've been facing for many hours now is that I don't know how or ...
0
votes
1
answer
101
views
Take audio input if the user is saying a command, else recognize that the user is quiet and terminate the audio input loop / listening from the microphone
I'm trying to build a voice assistant. I'm using the speech_recognition library for this. Below is my code:
def takeCommand():
    print("taking command")
    r = Recognizer()
    m = Microphone()
    with ...
4
votes
1
answer
4k
views
How to crop an audio file based on the timestamps present in a list
So, I have a very long audio file. I have manual annotations (start and end times in seconds) of the important parts which I need from the whole audio in a text file. I have ...
-1
votes
1
answer
82
views
Audio processing: How to obtain similar data from audio records that are recorded by different microphones, by the same person
I am currently developing a speaker recognition program which should recognize the speaker by listening to the microphone. I'm a newbie at audio processing and machine learning, but I trained a neural ...
1
vote
0
answers
461
views
How to extend an audio file to a new duration by adding silence to the beginning and end of the audio
I have a wav file named a.wav.
Let's say it is 9.432 seconds long.
I want to extend it to 13.321 seconds.
So I need to add 1.945 seconds of silence to both the beginning and the end of the audio.
The reason is I am ...