0 votes
0 answers
52 views

I'm building a real-time pitch detection app in Kotlin/Android using TarsosDSP. The app captures audio input, detects the fundamental frequency using the FFT_YIN algorithm, and displays ...
Manu campano ortega's user avatar
0 votes
0 answers
169 views

I'm working on a browser-based audio transcription app using Transformers.js by Xenova. I'm trying to transcribe a .wav file selected by the user using the following code: import { pipeline } from '@...
piyush's user avatar
  • 1
0 votes
0 answers
136 views

I'm working on a real-time speech processing pipeline using pyannote-audio, and I’m using the pyannote/speaker-diarization-3.1 pipeline with Hugging Face token authentication. My code captures live ...
Hadil Sghair's user avatar
2 votes
0 answers
101 views

I need to upload an audio file where two or more speakers are having a conversation, and at times their speech overlaps. The requirement is to segment the audio into distinct chunks, each ...
Anjali Pandey's user avatar
1 vote
0 answers
193 views

I want to visualize audio data in Streamlit with two separate Plotly plots: one for the Time Domain waveform and one for the MFCC (Mel-frequency cepstral coefficients). I want to link their X-axes so ...
faith76's user avatar
  • 21
0 votes
0 answers
57 views

After a call is negotiated and connected, the outgoing (tx) sound is very low and often distorted for roughly the first 5 seconds. The same happens after a long silence. If we disable AEC (...
Serdar KÖYLÜ's user avatar
0 votes
1 answer
58 views

I have a model that contains a GRU implementation and processes audio samples. In each forward pass I process a single sample of an audio file. To imitate the GRU behavior correctly, I have returned ...
Zahra Kokhazad's user avatar
1 vote
0 answers
249 views

I'm setting up a Twilio Voice call with real-time media streaming to a WebSocket server for speech-to-text processing using Google Cloud Speech-to-Text. The connection is established successfully, and ...
dannym25's user avatar
0 votes
0 answers
319 views

I am trying to get just the audio data of songs from YouTube videos to analyze without downloading (Python). I started with using yt-dlp with the following code def search_youtube(song_name, ...
Anthony Reid's user avatar
0 votes
0 answers
70 views

I'm experiencing an issue with PyAnnote's speaker verification where all speakers are getting perfect similarity scores (1.000), even when they are clearly different voices. Environment pyannote....
user29588450's user avatar
1 vote
0 answers
97 views

I’m trying to send an audio stream captured in JavaScript from a browser tab to a .NET SignalR Hub. My goal is to stream audio in chunks/realTime to the server and broadcast it to all connected ...
Levan Amashukeli's user avatar
0 votes
1 answer
191 views

I am working on a Blazor project where I integrate Azure Speech Service to perform speech-to-text transcription on system audio during screen sharing. However, I am facing an issue where ...
Levan Amashukeli's user avatar
1 vote
0 answers
38 views

I'm looking into creating the sends/returns/busses logic for my own audio application. When checking existing DAWs (Logic Pro / Ableton Live) I noticed you can make a routing loop by sending busses/...
Rene's user avatar
  • 127
1 vote
0 answers
95 views

I'm currently working on an iOS app that uses Bluetooth to stream audio data from a Seeed Studio XIAO nRF52840 (Sense) board, which has a PDM microphone. The board is running the OMI Friend firmware, ...
guillaume olivieri's user avatar
0 votes
0 answers
52 views

How can I calculate the total number of different frequencies present in an audio signal's spectrum? Is that possible? I'm on Linux.
Lexx Luxx's user avatar
  • 284
0 votes
1 answer
350 views

I have a problem using ffmpeg in Jupyter notebooks. I'm trying to use this function on an audio file, but I get the error below. I have tried changing the file to wav and different specifications ...
Teko JR's user avatar
  • 21
1 vote
0 answers
75 views

I have been wondering why a “smart” phone needs elaborate manual steps to trigger SOS when it already has enough inputs to detect panic, such as mic, camera, GPS, gyroscope, etc. I found this model (...
Chakradar Raju's user avatar
1 vote
0 answers
29 views

I'm working with the speech_recognition library in Python, and my script is getting stuck on the recognizer.listen() call. I want to continuously listen for commands after detecting the wake word but ...
Sandip Mishra's user avatar
0 votes
1 answer
146 views

I have a base of audio samples, each matched with a specific speaker, like nick_sample1.mp3 nick_sample2.mp3 ... nick_sampleN.mp3 john_sample1.mp3 john_sample2.mp3 ... john_sampleK.mp3 The task is to match a ...
Anton Maiorov's user avatar
0 votes
0 answers
378 views

I'm working on implementing a real-time chatbot using OpenAI's GPT-4o model. So far I've got mostly everything working except one key piece, which is knowing when the audio streamed by the frontend is ...
santiago calvo's user avatar
1 vote
1 answer
999 views

I am using the faster_whisper Python library to transcribe audio files. Currently, I am able to get the transcriptions of audio data, but I'm unable to retrieve confidence scores for these ...
Hankie's user avatar
  • 45
1 vote
0 answers
169 views

I am using Google's Text to Speech API on my backend. I want to play the resulting audio in my Expo application, but I have a low volume problem on physical devices (but not on simulators). Right now, ...
Martin's user avatar
  • 71
1 vote
2 answers
107 views

For context: I'm trying to create a simple "level monitor" animation of audio data streaming from a microphone. I'm running this code on an iOS device and leaning heavily on the Accelerate ...
Joshua Sullivan's user avatar
0 votes
1 answer
109 views

I'm trying to extract certain features from an audio file using librosa, but it kept raising an AttributeError for the harmonic and entropy functions. I tried to change my python version from 3.8 to 3....
kevinrotern's user avatar
0 votes
1 answer
283 views

I currently manually mute copyrighted songs in my YouTube videos by navigating to the video, clicking on “Restrictions,” and then muting the specific copyrighted track. However, I’d like to achieve ...
s.mai's user avatar
  • 33
2 votes
0 answers
52 views

I am attempting to mix multiple .wav sound files (merging musical notes to create a chord) in Java. The end result is mostly great, except during the first 23ms of the mixed audio where there's a loud ...
TJRC's user avatar
  • 444
1 vote
1 answer
225 views

I'm currently building an app with pitch-shifting functionality, and I've found out that ToneJS can do that job. I'd like to know if I can extract only the pitch-shifted part of a track from a media ...
Nazarii Shvets's user avatar
0 votes
2 answers
912 views

I want to increase the volume of a specific frequency in a wav file, making it louder (more audible) than the rest of the frequencies. What I've done so far (or at least I believe so) is to find the ...
Rosilda's user avatar
0 votes
1 answer
294 views

I'm having trouble of an unknown kind with the sounddevice module for Python. The code below produces a GUI where you can output a sine wave of a user-defined frequency. Additionally, one can modulate its ...
martinr's user avatar
0 votes
1 answer
313 views

I have been given an audio database consisting of speech recordings. The speech can be in any language, so transcripts are not available. Now I will be given one query. I want to see for which ...
Arun Labana's user avatar
1 vote
0 answers
64 views

I'm trying to process large video/audio files and extract timestamps and songs played during the video. For example, processing a large Twitch stream VOD to find out that songs A, B and C were played ...
ruzat's user avatar
  • 11
1 vote
1 answer
1k views

I am trying to write data to a virtual audio cable using Python. So far I don't really care which module I use, so I tried PyAudio and sounddevice. I installed the Virtual Cable from VB-Audio. It does ...
JRolfes's user avatar
  • 11
0 votes
0 answers
181 views

I am trying to remove the silence at the start and end of this recording, so that I just have the voice in between, no need to remove any silence in between, just the start and end portions of the ...
Ijaz Ahmed's user avatar
0 votes
1 answer
416 views

I am trying to do some audio processing in python, but I need both the video and audio as I wish to output both when I am done. I have looked up other examples, but they all use video files instead of ...
Bengemon825's user avatar
4 votes
2 answers
3k views

I'm currently working on a project where I need to perform real-time sentiment analysis on live audio streams using Python. The goal is to analyze the sentiment expressed in the spoken words and ...
Aqurds's user avatar
  • 59
1 vote
1 answer
457 views

I'm working on a project which involves numerous audio-processing tasks with text-to-speech, but I've hit a small snag. I'm going to be processing possibly hundreds of TTS audio segments, so I want to ...
Tessa Painter's user avatar
0 votes
1 answer
1k views

I'm working on a project that involves transcribing audio files using OpenAI's Whisper. To improve the quality of the transcriptions, I'm trying to reduce the noise in my audio files using the ...
Hana Baron's user avatar
0 votes
1 answer
871 views

I am trying to write my own MP3 player using .NET MAUI. One of the libraries is based on the Android media player. As I read in the documentation, it has an Equalizer class in the android.media.audiofx ...
Wasyster's user avatar
  • 2,583
0 votes
0 answers
165 views

I have a .wav file, and I need to write code that tells me the frequency in Hz of the sound. How do I go about this? I've done some research, and everything points me to something that's related to FFT, ...
TupacShakur's user avatar
0 votes
1 answer
700 views

Given an audio file of speech, (for example the file you can download from here), that looks like this: If we examine it we will find that the vowels are the areas with the largest amplitude, and the ...
codeDom's user avatar
  • 1,849
1 vote
1 answer
891 views

I generate a simple sine wave with a frequency of 100 and calculate an FFT to check that the obtained frequency is correct. Then I calculate a mel spectrogram but do not understand what its output means. ...
codeDom's user avatar
  • 1,849
1 vote
1 answer
283 views

I want to modify channel configuration of an ALAC audio stream inside MP4-container without multiplexing the container. So I need to change letter H of Audio Data Transport Stream with a hex editor ...
FLX's user avatar
  • 311
0 votes
1 answer
44 views

I'm trying to write a function that detects all abnormal shot sounds in an audio file extracted from a video file using moviepy. The result should be a dataframe that contains the frame count of video ...
iw2fs's user avatar
  • 19
2 votes
1 answer
89 views

Using Python, I am trying to plot the signal amplitude of a wav file, however I am getting the following error "ValueError: x and y must have same first dimension". Here is my code: import ...
Jay's user avatar
  • 21
2 votes
2 answers
914 views

I'm trying to get the feed of the mic using the "sounddevice" library in Python. import sounddevice as sd print(sd.query_devices()) But it returns an empty list. I tried arecord -f cd -d 6 test....
imtiaz ul Hassan's user avatar
0 votes
0 answers
460 views

My project involves working with audio files to process the audio signal they contain to compare them to other files. The main problem I've been facing for many hours now is that I don't know how or ...
Adrien S.'s user avatar
0 votes
1 answer
101 views

I'm trying to build a voice assistant. I'm using speech_recognition library for the same. Below is my code: def takeCommand(): print("taking command") r = Recognizer() m = Microphone() with ...
styles's user avatar
  • 27
4 votes
1 answer
4k views

So, I have an audio file which is very long in duration. I have manual annotations (start and end duration in seconds) of the important parts which I need from the whole audio in a text file. I have ...
Medium's user avatar
  • 43
-1 votes
1 answer
82 views

I am currently developing a speaker recognition program which should recognize the speaker by listening to the microphone. I'm a newbie at audio processing and machine learning, but I trained a neural ...
Arjein's user avatar
  • 35
1 vote
0 answers
461 views

I have a wav file named a.wav. Let's say it is 9.432 seconds long. I want to extend it to 13.321 seconds. So I need to add 1.945 seconds of silence to both the beginning and end of the video. The reason is I am ...
Furkan Gözükara's user avatar
