How to reliably capture high-quality audio/transcripts from Google Meet using a bot (without official API)? [closed]

Asked yesterday

Modified yesterday

Viewed 26 times

Closed. This question needs to be more focused. It is not currently accepting answers.

Closed yesterday.

I’m building a system where a bot joins a Google Meet call and extracts a live transcript.
Right now, I inject JavaScript into the Meet tab (through a browser automation bot) and scrape the captions from the DOM. This works, but the transcription quality is very poor:

- Many words are wrong or missing
- Google Meet system messages (join/leave/prompts) appear inside the transcript
- Sometimes only partial captions appear
- The accuracy is far below what Google Meet itself shows to users
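
Two of these symptoms (system messages in the transcript, partial captions) can be reduced in post-processing. The following is a minimal sketch, assuming the scraper yields `(speaker, text)` snapshots in order. The patterns in `SYSTEM_PATTERNS` are hypothetical placeholders, since the real wording of Meet notices depends on locale and UI version.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical patterns for system notices; adapt them to the strings that
# actually show up in your scraped captions.
SYSTEM_PATTERNS = [
    re.compile(r"^.+ (joined|left) the (meeting|call)\.?$", re.IGNORECASE),
]

Caption = Tuple[str, str]  # (speaker, text)


def is_system_message(text: str) -> bool:
    """Return True if the text matches one of the notice patterns."""
    return any(p.search(text) for p in SYSTEM_PATTERNS)


def clean_captions(snapshots: Iterable[Caption]) -> List[Caption]:
    """Drop system notices and collapse growing partial captions."""
    cleaned: List[Caption] = []
    for speaker, text in snapshots:
        text = " ".join(text.split())  # normalise whitespace
        if not text or is_system_message(text):
            continue
        if cleaned:
            last_speaker, last_text = cleaned[-1]
            if speaker == last_speaker and text.startswith(last_text):
                # Same utterance, now longer (or identical): keep the newer one.
                cleaned[-1] = (speaker, text)
                continue
            if speaker == last_speaker and last_text.startswith(text):
                # Older, shorter snapshot of the same utterance: ignore it.
                continue
        cleaned.append((speaker, text))
    return cleaned
```

This only handles captions that grow by appending text. If Meet rewrites earlier words of a caption, the prefix check will not catch it, and none of this fixes words that were misrecognised in the first place.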

As far as I can tell, Google Meet does not provide an official API for live captions, speaker labels, or meeting audio, and browser restrictions prevent JavaScript from capturing tab audio directly when no human is present to grant permission.

What I want to know

Is there any reliable / free / open-source method to capture high-quality audio or transcripts from Google Meet when a bot joins the call?

Details about my environment

- The bot is running on an Ubuntu VM (Civo cloud)
- I can run a headful Chrome instance (via Puppeteer or Selenium)
- I’m okay with recording system/tab audio if possible
- I want to avoid paid APIs (e.g., Vexa, paid STT APIs)
- The goal is to feed the audio into a local STT engine (Whisper, WhisperX, etc.)
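
For the last point, the hand-off to a local STT engine could look roughly like the sketch below. It assumes the meeting audio already exists as a PCM WAV file and that the `openai-whisper` package is installed. The function names, the chunk file naming, and the 30-second default (chosen because Whisper processes audio in 30-second windows) are illustrative assumptions, not a recommended configuration.

```python
import wave
from pathlib import Path
from typing import List


def split_wav(src: str, out_dir: str, chunk_seconds: int = 30) -> List[Path]:
    """Split a PCM WAV file into fixed-length chunks and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks: List[Path] = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = reader.getframerate() * chunk_seconds
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break
            path = out / f"chunk_{index:04d}.wav"
            with wave.open(str(path), "wb") as writer:
                # Copy channel count, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                writer.setparams(params)
                writer.writeframes(frames)
            chunks.append(path)
            index += 1
    return chunks


def transcribe_chunks(chunks: List[Path], model_name: str = "base") -> List[str]:
    """Transcribe each chunk with a local Whisper model."""
    import whisper  # openai-whisper; imported lazily so split_wav works without it

    model = whisper.load_model(model_name)
    return [model.transcribe(str(path))["text"].strip() for path in chunks]
```

Cutting at fixed offsets can split a word across two chunks, so a real pipeline would probably cut on silence or overlap the chunks slightly.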

What I’ve already tried

- DOM scraping of captions → poor quality, noisy, system messages mixed with speech
- Exploring Chrome's `getDisplayMedia` → cannot auto-grant permissions from a bot; fails due to the user-gesture requirement
- Investigating WebRTC internals → seems impossible to intercept the audio tracks of other participants from JS
- Searching for a Meet API → none exists for transcripts/audio

My questions

- Is there a technically feasible way to capture Google Meet tab/system audio on a Linux VM using a bot?
  - e.g., using a PulseAudio monitor, null sinks, Chrome flags, or tabCapture
- Has anyone successfully implemented a Google Meet bot → audio capture → local transcription (Whisper) pipeline?
- Are there any reliable open-source approaches, or is the only stable method to record system audio at the OS level and bypass Meet entirely?
- Are there any known limitations with Chrome/Puppeteer + Meet that I should be aware of?
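
To make the PulseAudio null-sink option concrete, here is a minimal, untested sketch of the OS-level route. It assumes PulseAudio (or a PulseAudio-compatible server) is running on the VM, `pactl` is available, and `ffmpeg` was built with PulseAudio input support. The sink name `meet_sink` and the helper names are hypothetical. `PULSE_SINK` is the PulseAudio client environment variable that selects the default output sink for a process.

```python
import os
import subprocess
from typing import Dict, List

SINK_NAME = "meet_sink"  # hypothetical name for the virtual sink


def null_sink_cmd(sink: str = SINK_NAME) -> List[str]:
    """pactl command that creates a virtual sink with no hardware behind it."""
    return ["pactl", "load-module", "module-null-sink", f"sink_name={sink}"]


def chrome_env(sink: str = SINK_NAME) -> Dict[str, str]:
    """Environment for the Chrome process so its audio plays into the sink."""
    env = dict(os.environ)
    env["PULSE_SINK"] = sink
    return env


def record_cmd(out_path: str, sink: str = SINK_NAME) -> List[str]:
    """ffmpeg command that records the sink's monitor as 16 kHz mono WAV."""
    return [
        "ffmpeg", "-y",
        "-f", "pulse", "-i", f"{sink}.monitor",
        "-ac", "1", "-ar", "16000",
        out_path,
    ]


def start_capture(out_path: str) -> subprocess.Popen:
    """Create the sink, then start recording its monitor in the background."""
    subprocess.run(null_sink_cmd(), check=True)
    return subprocess.Popen(record_cmd(out_path))
```

The browser automation would then launch Chrome with `chrome_env()` as its environment, so that whatever the Meet tab plays ends up in the recording. This captures the mixed audio of all participants, so speaker labels would still have to come from somewhere else (for example, diarization in WhisperX).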

My goal

I’m not trying to break security. I just want to implement a bot that can hear the meeting audio (as a human attendee does), transcribe it locally, and avoid the low-quality DOM caption scraping.

What is the best technical approach to achieve this?

asked yesterday

varaha

New contributor
