
I'm using the Azure Speech SDK for speech-to-text transcription via recognizeOnceAsync. The current code resembles:

// SpeechSDK is provided as a global by the browser bundle imported below
var recognizer;
var speechConfig = SpeechSDK.SpeechConfig.fromSubscription('SUB_KEY', 'SUB_REGION');
var audioConfig  = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);
new Promise(function (resolve, reject) {
    recognizer.recognizeOnceAsync(
        function (result) {
            recognizer.close();
            recognizer = undefined;
            resolve(result.text);
        },
        function (err) {
            recognizer.close();
            recognizer = undefined;
            reject(err);
        }
    );
}).then(r => {
    console.log(`Azure STT interpreted: ${r}`);
}).catch(err => alert(err));

In the HTML file I import the Azure package like so:

<script src="https://aka.ms/csspeech/jsbrowserpackageraw"></script>

The issue is that I would like to increase the amount of silence allowed before recognizeOnceAsync returns its result (i.e., you should be able to pause and take a breath without the method assuming you're done talking). Is there a way to do this with fromDefaultMicrophoneInput? I've tried various things, such as:

const SILENCE_UNTIL_TIMEOUT_MS = 5000;
speechConfig.SpeechServiceConnection_EndSilenceTimeoutMs = SILENCE_UNTIL_TIMEOUT_MS;
audioConfig.setProperty("Speech_SegmentationSilenceTimeoutMs", SILENCE_UNTIL_TIMEOUT_MS);

but none seem to extend the "silence time allowance" correctly.
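For reference, the documented route is setProperty with a PropertyId from the page linked below (values are passed as strings). A sketch of what I would expect to work, assuming the SpeechSDK global from the browser bundle and that setProperty accepts a PropertyId here:

// Sketch of the documented approach; property values must be strings.
// The answer below explains why Speech_SegmentationSilenceTimeoutMs is
// currently not honored by the JS SDK.
speechConfig.setProperty(
    SpeechSDK.PropertyId.Speech_SegmentationSilenceTimeoutMs,
    "5000"
);
speechConfig.setProperty(
    SpeechSDK.PropertyId.SpeechServiceConnection_EndSilenceTimeoutMs,
    "5000"
);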

This is the resource which I have been looking at: https://learn.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/propertyid?view=azure-node-latest

1 Answer


Based on what you're describing, you need to set the segmentation silence timeout. Unfortunately, there is currently a bug in the JS SDK and PropertyId.Speech_SegmentationSilenceTimeoutMs is not being applied correctly.

As a workaround, you can instead set the segmentation timeout on the underlying service connection, as follows:

// Using the npm package; with the browser bundle, use the SpeechSDK global instead.
import { SpeechConfig, SpeechRecognizer, Connection } from "microsoft-cognitiveservices-speech-sdk";

const speechConfig = SpeechConfig.fromSubscription(subscriptionKey, subscriptionRegion);
speechConfig.speechRecognitionLanguage = "en-US";

const reco = new SpeechRecognizer(speechConfig);

// Override the segmentation settings on the service connection directly,
// bypassing the broken PropertyId plumbing.
const conn = Connection.fromRecognizer(reco);
conn.setMessageProperty("speech.context", "phraseDetection", {
    "INTERACTIVE": {
        "segmentation": {
            "mode": "custom",
            "segmentationSilenceTimeoutMs": 5000
        }
    },
    mode: "Interactive"
});

reco.recognizeOnceAsync(
    (result) => {
        console.log("Recognition done!");
        // do something with the recognition result
    },
    (error) => {
        console.log("Recognition failed. Error: " + error);
    });

Please note that the allowed range for the segmentation timeout is 100-5000 ms (inclusive).
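If you're using the browser bundle from your question rather than the npm package, the same workaround should translate directly through the SpeechSDK global. A minimal sketch, assuming your existing speechConfig and audioConfig:

// Same workaround via the browser bundle's global namespace, reusing the
// microphone AudioConfig from the question.
var recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);
var connection = SpeechSDK.Connection.fromRecognizer(recognizer);
connection.setMessageProperty("speech.context", "phraseDetection", {
    "INTERACTIVE": {
        "segmentation": {
            "mode": "custom",
            "segmentationSilenceTimeoutMs": 5000   // clamped to 100-5000 ms
        }
    },
    mode: "Interactive"
});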


3 Comments

Thanks for the snippet! Marked as answer. Not sure why there's little to no documentation on these speechConfig/Recognizer attributes and how to vary them.
I used the latest package v1.25.1 but was not able to see any effect using the setProperty method. The code shared by Ralph worked fine! Refer to this doc for details on these settings and their impacts.
@Ralph are there modes other than interactive?
