
I'm building a Mac and iOS app using Google's WebRTC (m137, via LiveKit's binary distribution). For a regular app it's very convenient that it automatically starts playback/playout of all incoming audio streams. However, I'm building an app where I want to process the audio/PCM data before playing it back through my own audio output framework, and I can't find any API to stop that automatic playback.

RTCAudioTrack has a promising API with RTCAudioRenderers, and I figured that calling removeAllRenderers() would remove the built-in renderer that plays back audio. But it does nothing.

After digging for hours, I found RTCAudioDeviceModule, which seems to control the whole playback machinery. It even mentions isPlayoutEnabled in its delegate protocol, and has things like isPlayoutInitialized, so it looks tantalizingly close to having a settable isPlayoutEnabled property? But there isn't one‽

Is there some other API I'm missing for disabling playout? The idea seems to be there in the API, but I just can't find anything that stops WebRTC from playing back incoming audio (at least not without muting the track, which I don't want to do -- I still want to receive its PCM through an RTCAudioRenderer).
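For context, the PCM tap I'm after looks roughly like this (a sketch only; I'm assuming the LK-prefixed LiveKit symbols and the render(pcmBuffer:) requirement of the RTCAudioRenderer protocol -- the class name is mine):

```swift
import AVFAudio

/// Hypothetical renderer that receives decoded PCM for the track it is attached to.
final class ProcessingRenderer: NSObject, LKRTCAudioRenderer {
    func render(pcmBuffer: AVAudioPCMBuffer) {
        // ...process the samples here, then hand them to my own
        // audio output framework for playback...
    }
}

// Attaching it delivers PCM, but does NOT stop WebRTC's own
// automatic playout -- that's exactly the problem:
// remoteAudioTrack.add(ProcessingRenderer())
```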

1 Answer

I've spent many hours reading the C++ implementation to come up with this solution. It's kind of a crazy solution: I hope I'm stupid and there's another, cleaner API just sitting there on some class I haven't seen yet, but this is the best I've got so far.

// This code uses the LiveKit distribution of WebRTC, hence all the WebRTC symbols are prefixed LK.

/// LKRTCAudioDeviceModule delegate whose only purpose is to stop playout/playback of every incoming audio track. This is because we want to process the PCM stream before playback, and not have GoogleWebRTC play it back in stereo. I couldn't find any API to change this behavior without overriding this delegate.
class PlaybackDisablingAudioDeviceModuleDelegate: NSObject, LKRTCAudioDeviceModuleDelegate
{
    func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, didReceiveSpeechActivityEvent speechActivityEvent: LKRTCSpeechActivityEvent)
    {
    }
    
    func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, didCreateEngine engine: AVAudioEngine) -> Int
    {
        return 0
    }
    
    func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, willEnableEngine engine: AVAudioEngine, isPlayoutEnabled: Bool, isRecordingEnabled: Bool) -> Int
    {
        return 0
    }
    
    func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, willStartEngine engine: AVAudioEngine, isPlayoutEnabled: Bool, isRecordingEnabled: Bool) -> Int
    {
        return 0
    }
    
    func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, didStopEngine engine: AVAudioEngine, isPlayoutEnabled: Bool, isRecordingEnabled: Bool) -> Int
    {
        return 0
    }
    
    func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, didDisableEngine engine: AVAudioEngine, isPlayoutEnabled: Bool, isRecordingEnabled: Bool) -> Int
    {
        return 0
    }
    
    func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, willReleaseEngine engine: AVAudioEngine) -> Int
    {
        return 0
    }
    
    func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, engine: AVAudioEngine, configureInputFromSource source: AVAudioNode?, toDestination destination: AVAudioNode, format: AVAudioFormat, context: [AnyHashable : Any]) -> Int
    {
        return 0
    }
    
    var mixer: AVAudioMixerNode!
    func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, engine: AVAudioEngine, configureOutputFromSource source: AVAudioNode, toDestination destination: AVAudioNode?, format: AVAudioFormat, context: [AnyHashable : Any]) -> Int
    {
        guard let destination else { fatalError("expected an output destination node") }
        guard mixer == nil else { fatalError("output already configured") }
        
        // Splice a muted mixer between WebRTC's source node and the hardware
        // output. WebRTC keeps running its render path (so RTCAudioRenderer
        // still receives PCM), but nothing reaches the speaker.
        mixer = AVAudioMixerNode()
        engine.attach(mixer)
        
        engine.disconnectNodeOutput(source)
        engine.connect(source, to: mixer, format: format)
        
        engine.disconnectNodeInput(destination)
        engine.connect(mixer, to: destination, format: format)
        
        mixer.outputVolume = 0
        return 0
    }
    
    func audioDeviceModuleDidUpdateDevices(_ audioDeviceModule: LKRTCAudioDeviceModule)
    {
    }
}

// Now, use this delegate from your RTCPeerConnectionFactory
private static let audioDeviceObserver = PlaybackDisablingAudioDeviceModuleDelegate()
private static let factory: LKRTCPeerConnectionFactory = {
    LKRTCInitializeSSL()
    let videoEncoderFactory = LKRTCDefaultVideoEncoderFactory()
    let videoDecoderFactory = LKRTCDefaultVideoDecoderFactory()
    // LKRTCAudioDeviceModuleDelegate is never called unless audioDeviceModuleType
    // is switched from the default to .audioEngine.
    // This took two days of debugging and reading through WebRTC source code. Goddammit.
    let factory = LKRTCPeerConnectionFactory(
        audioDeviceModuleType: .audioEngine, // !! Important
        bypassVoiceProcessing: false,
        encoderFactory: videoEncoderFactory,
        decoderFactory: videoDecoderFactory,
        audioProcessingModule: nil
    )
    // Disable automatic playback of incoming audio streams using our delegate
    factory.audioDeviceModule.observer = audioDeviceObserver
    return factory
}()

The big gotcha here is that the whole RTCAudioDeviceModuleDelegate protocol doesn't do anything unless you also change the RTCAudioDeviceModuleType from .default to .audioEngine. This isn't documented anywhere. 🤬

If you want to go spelunking in the C++ behind how WebRTC for macOS/iOS does playback, here are some notes:

  • There are TWO AudioDeviceModules

    1. webrtc::AudioEngineDevice at webrtc/modules/audio_device/audio_engine_device.h|mm is Apple-specific and looks fairly modern.

    2. ObjCAudioDeviceModule at webrtc/sdk/objc/native/src/objc_audio_device.mm is also Apple-specific, and looks ancient and sloppily written.

  • RTCPeerConnectionFactory.mm initializes one or the other based on the RTCAudioDeviceModuleType.

    • RTCAudioDeviceModuleTypeAudioEngine gives you #1, webrtc::AudioEngineDevice

    • RTCAudioDeviceModuleTypePlatformDefault gives you #2, ObjCAudioDeviceModule

      • This one is the default, and thus the one you're using unless you opt in to .audioEngine.
  • Either is exposed through the public API RTCAudioDeviceModule at webrtc/sdk/objc/api/peerconnection/RTCAudioDeviceModule.h|m

    • It seems awfully hardcoded for webrtc::AudioEngineDevice though?! Even though that’s not the default?!

EDIT:

In my first submission, I set destination!.auAudioUnit.isOutputEnabled = false, and that did indeed turn off playout. But it also disabled all audio, so PCM wouldn't be delivered through RTCAudioRenderer either. I've updated the answer to use an AVAudioMixerNode instead, which mutes playout but still allows PCM to be rendered.
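For comparison, the abandoned first approach, sketched inside the same configureOutputFromSource delegate callback (this is what NOT to do if you still need PCM):

```swift
// Inside audioDeviceModule(_:engine:configureOutputFromSource:...):
//
//     destination.auAudioUnit.isOutputEnabled = false
//
// This silences playout, but it appears to halt WebRTC's render path
// entirely, so RTCAudioRenderer.render(pcmBuffer:) never fires.
// The muted AVAudioMixerNode above keeps the render path alive instead.
```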
