
I'm building a phone call application using Twilio Media Streams.

The workflow is as follows:

Twilio Media Stream → Google STT (Streaming) → LLM → TTS

I'm using the sample code from the following GitHub repository: https://github.com/twilio/media-streams/tree/master/python/realtime-transcriptions

I've modified the on_transcription_response function as shown below:

def on_transcription_response(response):
    if not response.results:
        return

    result = response.results[0]
    if not result.alternatives:
        return

    transcription = result.alternatives[0].transcript
    print(f"Transcription: {transcription} is_final: {result.is_final}")

The issue is that result.is_final never returns True, which prevents me from sending the transcription to the LLM.

I tried adding an is_silence function to pause when silence is detected, but is_final still always returns False.

import audioop

def is_silence(buffer, threshold=500):
    pcm = audioop.ulaw2lin(buffer, 2)  # Convert to 16-bit PCM
    rms = audioop.rms(pcm, 2)          # Calculate root mean square amplitude
    return rms < threshold

def add_request(self, buffer):
    if is_silence(buffer):
        print("Skipping silence based on amplitude")
        return
    self._queue.put(bytes(buffer), block=False)
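To sanity-check the amplitude threshold, here is a self-contained sketch exercising the same logic on synthetic μ-law buffers (it repeats is_silence so it runs on its own; note the stdlib audioop module is deprecated and removed in Python 3.13):

```python
import audioop
import math

def is_silence(buffer, threshold=500):
    pcm = audioop.ulaw2lin(buffer, 2)  # Convert mu-law to 16-bit PCM
    rms = audioop.rms(pcm, 2)          # Root mean square amplitude
    return rms < threshold

# 20 ms of digital silence at 8 kHz: mu-law code 0xFF decodes to 0.
silent = b"\xff" * 160

# 20 ms of a loud 440 Hz tone, built as 16-bit PCM then encoded to mu-law.
pcm = b"".join(
    int(20000 * math.sin(2 * math.pi * 440 * n / 8000)).to_bytes(2, "little", signed=True)
    for n in range(160)
)
loud = audioop.lin2ulaw(pcm, 2)

print(is_silence(silent))  # True
print(is_silence(loud))    # False
```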

Additionally, I need to continuously recognize speech with language_code="yue-Hant-HK", as the caller may speak at any time during the call. I’m not looking to stop recognition after a single utterance—the STT should stay active and detect complete sentences dynamically.
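For reference, this is the streaming config I would expect to need for that (a sketch assuming the google-cloud-speech client library and Twilio's 8 kHz μ-law audio; field names mirror the real API):

```python
from google.cloud import speech

recognition_config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.MULAW,  # Twilio Media Streams send 8-bit mu-law
    sample_rate_hertz=8000,
    language_code="yue-Hant-HK",
)
streaming_config = speech.StreamingRecognitionConfig(
    config=recognition_config,
    interim_results=True,    # partial hypotheses arrive with is_final=False
    single_utterance=False,  # keep the stream open across utterances
)
```

Even with single_utterance=False, a single streaming request is time-limited, so long calls typically need periodic stream restarts.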

Any suggestions on how to handle this with Google STT streaming while keeping is_final working properly?

cheers

1 Answer


Try setting single_utterance=True in the streaming config, or manually half-close the stream when the API sends an END_OF_SINGLE_UTTERANCE speech event.

If that doesn't work, the issue needs further investigation; please open a new issue on the issue tracker, describing your problem.
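The restart pattern described above can be sketched without the client library: drain one single_utterance stream until the END_OF_SINGLE_UTTERANCE event and collect the final transcript, then let the caller open a fresh stream. The classes below are simplified stand-ins for the real streaming response types (names mirror the API):

```python
from dataclasses import dataclass, field

END_OF_SINGLE_UTTERANCE = "END_OF_SINGLE_UTTERANCE"

@dataclass
class Alternative:
    transcript: str

@dataclass
class Result:
    alternatives: list
    is_final: bool = False

@dataclass
class Response:
    results: list = field(default_factory=list)
    speech_event_type: str = ""

def consume_stream(responses):
    """Drain one single_utterance=True stream.

    Returns (final_transcript, should_restart). With single_utterance=True
    the API half-closes after one utterance, so the caller reopens a new
    stream to keep recognizing for the rest of the call.
    """
    final_transcript = None
    for response in responses:
        if response.speech_event_type == END_OF_SINGLE_UTTERANCE:
            # The server detected end of speech; stop sending audio but
            # keep reading any trailing final result.
            continue
        for result in response.results:
            if result.is_final and result.alternatives:
                final_transcript = result.alternatives[0].transcript
    return final_transcript, True  # restart as long as the call is live

# Example: two interim hypotheses, then the event, then the final result.
responses = [
    Response(results=[Result([Alternative("nei")], is_final=False)]),
    Response(results=[Result([Alternative("nei hou")], is_final=False)]),
    Response(speech_event_type=END_OF_SINGLE_UTTERANCE),
    Response(results=[Result([Alternative("nei hou")], is_final=True)]),
]
transcript, restart = consume_stream(responses)
print(transcript)  # nei hou
```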


2 Comments

Thank you, @shiro. I'm currently working on setting single_utterance=True, checking for the END_OF_SINGLE_UTTERANCE event, and then closing and rebuilding the stream to flush the recognized sentence. However, is_final still never turns True for me.
In that case, you need to open an issue on the issue tracker.
