Newest 'audio-processing' Questions

0 votes

0 answers

52 views

TarsosDSP Pitch Detection Implementation: Sudden Pitch Drops After Note Release with FFT_YIN

Introduction I'm building a real-time pitch detection app in Kotlin/Android using TarsosDSP. The app captures audio input, detects the fundamental frequency using the FFT_YIN algorithm, and displays ...

Manu campano ortega

3

asked Jul 25 at 10:30

0 votes

0 answers

169 views

How to transcribe local audio File/Blob with Transformers.js pipeline? (JSON.parse error)

I'm working on a browser-based audio transcription app using Transformers.js by Xenova. I'm trying to transcribe a .wav file selected by the user using the following code: import { pipeline } from '@...

piyush

1

asked Jul 10 at 8:44

0 votes

0 answers

136 views

Why is pyannote speaker diarization returning "Unknown" for speaker label in real-time audio processing?

I'm working on a real-time speech processing pipeline using pyannote-audio, and I’m using the pyannote/speaker-diarization-3.1 pipeline with Hugging Face token authentication. My code captures live ...

Hadil Sghair

1

asked May 21 at 12:10

2 votes

0 answers

101 views

Speaker Diarization

I need to upload an audio file where two or more speakers are having a conversation, and at times their speech overlaps. The requirement is to segment the audio into distinct chunks, each ...

Anjali Pandey

21

asked May 14 at 11:33

1 vote

0 answers

193 views

How to Link Zoom (X-Axis) of Two Separate Plotly Plots in Streamlit?

I want to visualize audio data in Streamlit with two separate Plotly plots: one for the Time Domain waveform and one for the MFCC (Mel-frequency cepstral coefficients). I want to link their X-axes so ...

faith76

21

asked Apr 12 at 21:37

0 votes

0 answers

57 views

PJSIP audio has low volume at the beginning if AEC is enabled

After a call is negotiated and connected, for the first 5 seconds (approx.), the outgoing (tx) sound is very low and often distorted. The same happens after a long period of silence. If we disable aec (...

Serdar KÖYLÜ

1

asked Mar 16 at 16:50

0 votes

1 answer

58 views

None Gradients for a model with 2 outputs

I have a model that has a GRU implementation inside and processes audio samples. In each forward pass I process a single sample of an audio file. To imitate the GRU behavior correctly, I have returned ...

Zahra Kokhazad

9

asked Feb 24 at 15:15

1 vote

0 answers

249 views

Twilio Real-Time Media Streaming to WebSocket Receives Only Noise Instead of Speech

I'm setting up a Twilio Voice call with real-time media streaming to a WebSocket server for speech-to-text processing using Google Cloud Speech-to-Text. The connection is established successfully, and ...

dannym25

11

asked Feb 21 at 22:02

0 votes

0 answers

319 views

Trying to Collect Audio from YouTube video without Downloading

I am trying to get just the audio data of songs from YouTube videos to analyze without downloading (Python). I started with using yt-dlp with the following code def search_youtube(song_name, ...

Anthony Reid

119

asked Feb 14 at 1:22

0 votes

0 answers

70 views

PyAnnote Speaker Verification: All Speakers Getting Perfect 1.000 Similarity Scores

I'm experiencing an issue with PyAnnote's speaker verification where all speakers are getting perfect similarity scores (1.000), even when they are clearly different voices. Environment pyannote....

user29588450

1

asked Feb 10 at 20:47

1 vote

0 answers

97 views

Why can't I send an audio stream from JavaScript via SignalR to a .NET Hub?

I’m trying to send an audio stream captured in JavaScript from a browser tab to a .NET SignalR Hub. My goal is to stream audio in chunks, in real time, to the server and broadcast it to all connected ...

Levan Amashukeli

175

asked Jan 1 at 19:19

0 votes

1 answer

191 views

How to Restrict Azure Speech SDK AudioConfig to Only System Audio and Exclude Microphone Input?

Question: I am working on a Blazor project where I integrate Azure Speech Service to perform speech-to-text transcription on system audio during screen sharing. However, I am facing an issue where ...

Levan Amashukeli

175

asked Dec 19, 2024 at 19:50

1 vote

0 answers

38 views

How do DAWs process sends/returns with busses if they are routed to themselves?

I'm looking into creating the sends/returns/busses logic for my own audio application. When checking existing DAWs (Logic Pro / Ableton Live) I noticed you can make a routing loop by sending busses/...

Rene

127

asked Nov 9, 2024 at 22:08

1 vote

0 answers

95 views

Trouble converting PDM audio from a Seeed Studio XIAO nRF52840 for speech transcription — only getting white noise

I'm currently working on an iOS app that uses Bluetooth to stream audio data from a Seeed Studio XIAO nRF52840 (Sense) board, which has a PDM microphone. The board is running the OMI Friend firmware, ...

guillaume olivieri

11

asked Nov 5, 2024 at 17:16

0 votes

0 answers

52 views

How to calculate the total number of different frequencies that are present in audio signal spectrum?

How to calculate the total number of different frequencies that are present in audio signal spectrum? Is that possible? Linux platform.

Lexx Luxx

284

asked Sep 14, 2024 at 11:33

0 votes

1 answer

350 views

ffmpeg command problem in Python to remove silences at the beginning and end. error: returned non-zero exit status 4294967274

I have a problem using ffmpeg in Jupyter notebooks. I'm trying to use this function on an audio file, but I get the error below. I have tried changing the file to wav and different specifications ...

Teko JR

21

asked Sep 1, 2024 at 17:02

1 vote

0 answers

75 views

Trying to build scream detection using pretrained model

I have been wondering why a “smart” phone needs elaborate manual steps to trigger SOS when it already has enough inputs to detect panic, like the mic, camera, GPS, gyroscope, etc. I found this model (...

Chakradar Raju

2,821

asked Aug 25, 2024 at 17:13

1 vote

0 answers

29 views

Python `speech_recognition` Script Stuck on `recognizer.listen()` Without `phrase_time_limit`

I'm working with the speech_recognition library in Python, and my script is getting stuck on the recognizer.listen() call. I want to continuously listen for commands after detecting the wake word but ...

Sandip Mishra

11

asked Aug 7, 2024 at 10:19

0 votes

1 answer

146 views

Speaker identification embeddings audio fragment length

I have a base of audio samples matched with concrete speaker like nick_sample1.mp3 nick_sample2.mp3 ... nick_sampleN.mp3 john_sample1.mp3 john_sample2.mp3 ... john_sampleK.mp3 The task is to match a ...

Anton Maiorov

183

asked Jul 2, 2024 at 8:53

0 votes

0 answers

378 views

How can I identify silences in audio in Node.js?

I'm working on implementing a real-time chatbot using OpenAI's GPT-4o model. So far I've got almost everything working except one key piece, which is knowing when the audio streamed by the frontend is ...

santiago calvo

103

asked May 25, 2024 at 23:46

1 vote

1 answer

999 views

How to obtain confidence scores from transcriptions using faster_whisper?

I am using the faster_whisper Python library to transcribe audio files. Currently, I am able to get the transcriptions of audio data, but I'm unable to retrieve confidence scores for these ...

Hankie

45

asked May 13, 2024 at 10:10

1 vote

0 answers

169 views

Increase volume from Google's Text-to-Speech (WAV Audio processing, React Native)

I am using Google's Text to Speech API on my backend. I want to play the resulting audio in my Expo application, but I have a low volume problem on physical devices (but not on simulators). Right now, ...

Martin

71

asked May 6, 2024 at 21:35

1 vote

2 answers

107 views

Using DCT to create real-time "levels" animation for microphone input

For context: I'm trying to create a simple "level monitor" animation of audio data streaming from a microphone. I'm running this code on an iOS device and leaning heavily on the Accelerate ...

Joshua Sullivan

1,129

asked Apr 29, 2024 at 3:21

0 votes

1 answer

109 views

Not able to use harmonic and rp_entropy function from librosa

I'm trying to extract certain features from an audio file using librosa, but it kept raising an AttributeError for the harmonic and entropy functions. I tried to change my python version from 3.8 to 3....

kevinrotern

21

asked Apr 12, 2024 at 18:10

0 votes

1 answer

283 views

Automating Copyrighted Music Silencing in YouTube Videos Using the YouTube API

I currently manually mute copyrighted songs in my YouTube videos by navigating to the video, clicking on “Restrictions,” and then muting the specific copyrighted track. However, I’d like to achieve ...

s.mai

33

asked Mar 7, 2024 at 11:33

2 votes

0 answers

52 views

Clicking/distortion noise at start of mixed audio in java

I am attempting to mix multiple .wav sound files (merging musical notes to create a chord) in Java. The end result is mostly great, except during the first 23ms of the mixed audio where there's a loud ...

TJRC

444

asked Mar 5, 2024 at 13:05

1 vote

1 answer

225 views

ToneJS PitchShift with MediaStream

I'm currently building an app with pitch-shifting functionality, and I've found out that ToneJS can do that job. I'd like to know if I can extract only the pitch-shifted part of a track from a media ...

Nazarii Shvets

13

asked Jan 31, 2024 at 15:31

0 votes

2 answers

912 views

Apply gain to specific frequencies using pyDub

I want to increase the volume of a specific frequency in a wav file, making it louder (more audible) than the rest of the frequencies. What I've done so far (or at least I believe I have) is to find the ...

Rosilda

3

asked Dec 21, 2023 at 5:53

0 votes

1 answer

294 views

Sounddevice Output Overflow

I'm having problems of an unknown kind with the sounddevice module for Python. The code below outputs a GUI where you can output a sine wave of a user-defined frequency. Additionally, one can modulate its ...

martinr

1

asked Dec 10, 2023 at 12:52

0 votes

1 answer

313 views

Query by Example (Searching an audio database using an audio query)

I have been given an audio database consisting of speech recordings. The speech can be in any language, so transcripts are not available. Now I will be given one query. I want to see for which ...

Arun Labana

1

asked Nov 10, 2023 at 11:45

1 vote

0 answers

64 views

Are there any libraries/APIs that can take a large audio file and identify music in it?

I'm trying to process large video/audio files and extract timestamps and songs played during the video. For example, processing a large Twitch stream VOD to find out that songs A, B and C were played ...

ruzat

11

asked Oct 16, 2023 at 21:30

1 vote

1 answer

1k views

Writing to Virtual Audio Cable via Python (PyAudio, Sounddevice)

I am trying to write data to a virtual audio cable using Python. So far I don't really care which module I use, so I tried PyAudio and sounddevice. I installed the Virtual Cable from VB-Audio. It does ...

JRolfes

11

asked Sep 24, 2023 at 11:34

0 votes

0 answers

181 views

How can I trim silence at the start and end of a recording (wav) in Python?

I am trying to remove the silence at the start and end of this recording, so that I just have the voice in between, no need to remove any silence in between, just the start and end portions of the ...

Ijaz Ahmed

5

asked Sep 22, 2023 at 9:20

0 votes

1 answer

416 views

Is there a way to get audio from an opencv webcam capture? [duplicate]

I am trying to do some audio processing in python, but I need both the video and audio as I wish to output both when I am done. I have looked up other examples, but they all use video files instead of ...

Bengemon825

1

asked Aug 21, 2023 at 1:58

4 votes

2 answers

3k views

How can I implement real-time sentiment analysis on live audio streams using Python?

I'm currently working on a project where I need to perform real-time sentiment analysis on live audio streams using Python. The goal is to analyze the sentiment expressed in the spoken words and ...

Aqurds

59

asked Aug 16, 2023 at 19:15

1 vote

1 answer

457 views

Passing a numpy audio array around different audio libraries

I'm working on a project which involves numerous audio-processing tasks with text-to-speech, but I've hit a small snag. I'm going to be processing possibly hundreds of TTS audio segments, so I want to ...

Tessa Painter

2,024

asked Aug 13, 2023 at 13:39

0 votes

1 answer

1k views

Offset in timing when transcribing noise-reduced audio with OpenAI's Whisper

I'm working on a project that involves transcribing audio files using OpenAI's Whisper. To improve the quality of the transcriptions, I'm trying to reduce the noise in my audio files using the ...

Hana Baron

1

asked Jun 25, 2023 at 9:14

0 votes

1 answer

871 views

android.media.audiofx: how to create a 10-band equalizer

I am trying to write my own MP3 player using .NET MAUI. One of the libraries is based on the Android media player. As I read in the documentation, it has an Equalizer class in the android.media.audiofx ...

Wasyster

2,583

asked Jun 23, 2023 at 19:20

0 votes

0 answers

165 views

How do I calculate the frequency of a sound using python

I have a .wav file, and I need to write code that tells me the frequency of the sound in Hz. How do I go about this? I've done some research and all of it points me to something that's related to FFT, ...

TupacShakur

1

asked Jun 8, 2023 at 18:12

0 votes

1 answer

700 views

Automatic separation between consonants and vowels in speech recording

Given an audio file of speech, (for example the file you can download from here), that looks like this: If we examine it we will find that the vowels are the areas with the largest amplitude, and the ...

codeDom

1,849

asked Jun 7, 2023 at 9:23

1 vote

1 answer

891 views

Understanding mel-scaled spectrogram for a simple sine wave

I generate a simple sine wave with a frequency of 100 and calculate an FFT to check that the obtained frequency is correct. Then I calculate melspectrogram but do not understand what its output means? ...

codeDom

1,849

asked Jun 6, 2023 at 4:51

1 vote

1 answer

283 views

Modify ALAC channel configuration inside M4V (MP4) container

I want to modify channel configuration of an ALAC audio stream inside MP4-container without multiplexing the container. So I need to change letter H of Audio Data Transport Stream with a hex editor ...

FLX

311

asked Jun 4, 2023 at 9:31

0 votes

1 answer

44 views

Abnormal sound prediction predicts no abnormal sound for all frames

I'm trying to write a function that detects all abnormal shot sounds in an audio file extracted from the video file using moviepy. The result should be a dataframe that contains the frame count of video ...

iw2fs

19

asked May 7, 2023 at 10:27

2 votes

1 answer

89 views

"ValueError: x and y must have same first dimension" when trying to plot the signal amplitude of a wav file using Python

Using Python, I am trying to plot the signal amplitude of a wav file, however I am getting the following error "ValueError: x and y must have same first dimension". Here is my code: import ...

Jay

21

asked Apr 10, 2023 at 2:52

2 votes

2 answers

914 views

No mic gets detected, as sound.query_devices() returns an empty list?

I'm trying to get the feed of the mic using the "sounddevice" library in Python. import sounddevice as sd print(sd.query_devices()) But it returns an empty list. I tried arecord -f cd -d 6 test....

imtiaz ul Hassan

388

asked Apr 5, 2023 at 18:09

0 votes

0 answers

460 views

Is there a way to extract audio data from an mp3 file picked by a user on Android Studio?

My project involves working with audio files to process the audio signal they contain to compare them to other files. The main problem I've been facing for many hours now is that I don't know how or ...

Adrien S.

1

asked Apr 2, 2023 at 19:29

0 votes

1 answer

101 views

Take audio input if the user is saying a command, else recognize that the user is quiet and terminate the audio input loop / listening from the microphone

I'm trying to build a voice assistant. I'm using the speech_recognition library for this. Below is my code: def takeCommand(): print("taking command") r = Recognizer() m = Microphone() with ...

styles

27

asked Mar 23, 2023 at 9:08

4 votes

1 answer

4k views

How to crop an audio file based on the timestamps present in a list

So, I have an audio file which is very long in duration. I have manual annotations (start and end duration in seconds) of the important parts which I need from the whole audio in a text file. I have ...

Medium

43

asked Mar 14, 2023 at 13:30

-1 votes

1 answer

82 views

Audio processing: How to obtain similar data from audio records that are recorded by different microphones, by the same person

I am currently developing a speaker recognition program which should recognize the speaker by listening to the microphone. I'm a newbie at audio processing and machine learning, but I trained a neural ...

Arjein

35

asked Mar 14, 2023 at 0:07

1 vote

0 answers

461 views

How to extend an audio file to a new duration by adding silence to the beginning and end of the audio

I have a wav file named a.wav. Let's say it is 9.432 seconds long. I want to extend it to 13.321 seconds, so I need to add 1.945 seconds of silence to both the beginning and the end of the audio. The reason is I am ...

Furkan Gözükara

24k

asked Feb 11, 2023 at 13:33
