
I'm using the Azure Speech SDK for speech-to-text transcription via recognizeOnceAsync. The current code resembles:

// SpeechSDK is provided as a global by the browser bundle imported below
var recognizer;
var speechConfig = SpeechSDK.SpeechConfig.fromSubscription('SUB_KEY', 'SUB_REGION');
var audioConfig  = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);
new Promise(function (resolve, reject) {
    recognizer.recognizeOnceAsync(
        function (result) {
            recognizer.close();
            recognizer = undefined;
            resolve(result.text);
        },
        function (err) {
            recognizer.close();
            recognizer = undefined;
            reject(err);
        }
    );
}).then(r => {
    console.log(`Azure STT interpreted: ${r}`);
}).catch(err => alert(err));

In the HTML file I import the Azure package like so:

<script src="https://aka.ms/csspeech/jsbrowserpackageraw"></script>

The issue is that I would like to increase the amount of silence allowed before recognizeOnceAsync returns its result (i.e., you should be able to pause and take a breath without the method assuming you're done talking). Is there a way to do this with fromDefaultMicrophoneInput? I've tried various things, such as:

const SILENCE_UNTIL_TIMEOUT_MS = 5000;
speechConfig.SpeechServiceConnection_EndSilenceTimeoutMs = SILENCE_UNTIL_TIMEOUT_MS;
audioConfig.setProperty("Speech_SegmentationSilenceTimeoutMs", SILENCE_UNTIL_TIMEOUT_MS);

but none seem to extend the "silence time allowance" correctly.
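For reference, the documented route is setProperty with a PropertyId from the page linked below (values are passed as strings). A sketch of what I would expect to work, assuming the SpeechSDK global from the browser bundle and that setProperty accepts a PropertyId here:

// Sketch of the documented approach; property values must be strings.
// The answer below explains why Speech_SegmentationSilenceTimeoutMs is
// currently not honored by the JS SDK.
speechConfig.setProperty(
    SpeechSDK.PropertyId.Speech_SegmentationSilenceTimeoutMs,
    "5000"
);
speechConfig.setProperty(
    SpeechSDK.PropertyId.SpeechServiceConnection_EndSilenceTimeoutMs,
    "5000"
);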

This is the resource which I have been looking at: https://learn.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/propertyid?view=azure-node-latest

1 Answer


Based on what you're describing, you need to set the segmentation silence timeout. Unfortunately, there is currently a bug in the JS SDK and PropertyId.Speech_SegmentationSilenceTimeoutMs is not being applied correctly.

As a workaround, you can instead set the segmentation timeout on the underlying service connection, as follows:

// Using the npm package; with the browser bundle, use the SpeechSDK global instead.
import { SpeechConfig, SpeechRecognizer, Connection } from "microsoft-cognitiveservices-speech-sdk";

const speechConfig = SpeechConfig.fromSubscription(subscriptionKey, subscriptionRegion);
speechConfig.speechRecognitionLanguage = "en-US";

const reco = new SpeechRecognizer(speechConfig);

// Override the segmentation settings on the service connection directly,
// bypassing the broken PropertyId plumbing.
const conn = Connection.fromRecognizer(reco);
conn.setMessageProperty("speech.context", "phraseDetection", {
    "INTERACTIVE": {
        "segmentation": {
            "mode": "custom",
            "segmentationSilenceTimeoutMs": 5000
        }
    },
    mode: "Interactive"
});

reco.recognizeOnceAsync(
    (result) => {
        console.log("Recognition done!");
        // do something with the recognition result
    },
    (error) => {
        console.log("Recognition failed. Error: " + error);
    });

Please note that the allowed range for the segmentation timeout is 100-5000 ms (inclusive).
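If you're using the browser bundle from your question rather than the npm package, the same workaround should translate directly through the SpeechSDK global. A minimal sketch, assuming your existing speechConfig and audioConfig:

// Same workaround via the browser bundle's global namespace, reusing the
// microphone AudioConfig from the question.
var recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);
var connection = SpeechSDK.Connection.fromRecognizer(recognizer);
connection.setMessageProperty("speech.context", "phraseDetection", {
    "INTERACTIVE": {
        "segmentation": {
            "mode": "custom",
            "segmentationSilenceTimeoutMs": 5000   // clamped to 100-5000 ms
        }
    },
    mode: "Interactive"
});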


3 Comments

Thanks for the snippet! Marked as answer. Not sure why there's little to no documentation on these speechConfig/Recognizer attributes and how to vary them.
I used the latest package v1.25.1 but was not able to see any effect using the setProperty method. The code shared by Ralph worked fine! Refer to this doc for details on these settings and their impacts.
@Ralph are there modes other than interactive?
