I've spent many hours reading the C++ implementation to come up with this solution. It's kind of a crazy solution: I hope I'm stupid and there's another, cleaner API just sitting there on some class I haven't seen yet, but this is the best I've got so far.
// This code uses the LiveKit distribution of WebRTC, hence all the WebRTC symbols are prefixed LK.
/// An LKRTCAudioDeviceModule delegate whose only purpose is to stop playout of every incoming audio track. We want to process the PCM stream ourselves before playback, rather than have GoogleWebRTC play it back directly, and I couldn't find any API to change this behavior other than overriding this delegate.
class PlaybackDisablingAudioDeviceModuleDelegate: NSObject, LKRTCAudioDeviceModuleDelegate
{
func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, didReceiveSpeechActivityEvent speechActivityEvent: LKRTCSpeechActivityEvent)
{
}
func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, didCreateEngine engine: AVAudioEngine) -> Int
{
return 0
}
func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, willEnableEngine engine: AVAudioEngine, isPlayoutEnabled: Bool, isRecordingEnabled: Bool) -> Int
{
return 0
}
func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, willStartEngine engine: AVAudioEngine, isPlayoutEnabled: Bool, isRecordingEnabled: Bool) -> Int
{
return 0
}
func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, didStopEngine engine: AVAudioEngine, isPlayoutEnabled: Bool, isRecordingEnabled: Bool) -> Int
{
return 0
}
func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, didDisableEngine engine: AVAudioEngine, isPlayoutEnabled: Bool, isRecordingEnabled: Bool) -> Int
{
return 0
}
func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, willReleaseEngine engine: AVAudioEngine) -> Int
{
return 0
}
func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, engine: AVAudioEngine, configureInputFromSource source: AVAudioNode?, toDestination destination: AVAudioNode, format: AVAudioFormat, context: [AnyHashable : Any]) -> Int
{
return 0
}
var mixer: AVAudioMixerNode!
func audioDeviceModule(_ audioDeviceModule: LKRTCAudioDeviceModule, engine: AVAudioEngine, configureOutputFromSource source: AVAudioNode, toDestination destination: AVAudioNode?, format: AVAudioFormat, context: [AnyHashable : Any]) -> Int
{
print("Disabling WebRTC output playback")
guard let destination else { fatalError("configureOutput called with no destination node") }
guard mixer == nil else { fatalError("configureOutput called twice") }
// Re-route source -> mixer -> destination and mute the mixer, so nothing
// reaches the speaker while PCM keeps flowing to attached renderers.
mixer = AVAudioMixerNode()
engine.attach(mixer)
engine.disconnectNodeOutput(source)
engine.connect(source, to: mixer, format: format)
engine.disconnectNodeInput(destination)
engine.connect(mixer, to: destination, format: format)
mixer.outputVolume = 0
return 0
}
func audioDeviceModuleDidUpdateDevices(_ audioDeviceModule: LKRTCAudioDeviceModule)
{
}
}
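Once playback is muted this way, you can receive the raw PCM yourself by attaching a renderer to the remote audio track. A minimal sketch, assuming the LiveKit distribution exposes the upstream RTCAudioRenderer protocol as LKRTCAudioRenderer with render(pcmBuffer:) — verify the exact names in your build's headers:

```swift
import AVFoundation

// Sketch only: LKRTCAudioRenderer / render(pcmBuffer:) mirror the upstream
// RTCAudioRenderer ObjC protocol; check the header in your LiveKit version.
final class PCMTap: NSObject, LKRTCAudioRenderer
{
	func render(pcmBuffer: AVAudioPCMBuffer)
	{
		// Process the decoded PCM here (analysis, custom playback, etc.).
		// This keeps firing because the mixer only mutes the speaker path.
	}
}

// Usage: attach the tap to a remote audio track once it arrives, e.g.
// remoteAudioTrack.add(PCMTap())
```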
// Now, use this delegate from your PeerConnectionFactory
private static let audioDeviceObserver = PlaybackDisablingAudioDeviceModuleDelegate()
private static let factory: LKRTCPeerConnectionFactory = {
LKRTCInitializeSSL()
let videoEncoderFactory = LKRTCDefaultVideoEncoderFactory()
let videoDecoderFactory = LKRTCDefaultVideoDecoderFactory()
// LKRTCAudioDeviceModuleDelegate is not called unless audioDeviceModuleType is switched from the default to .audioEngine.
// This took two days of debugging and reading through WebRTC source code. Goddammit.
let factory = LKRTCPeerConnectionFactory(
audioDeviceModuleType: .audioEngine, // !! Important
bypassVoiceProcessing: false,
encoderFactory: videoEncoderFactory,
decoderFactory: videoDecoderFactory,
audioProcessingModule: nil
)
// Disable automatic playback of incoming audio streams using our delegate
factory.audioDeviceModule.observer = audioDeviceObserver
return factory
}()
The big gotcha here is that the whole RTCAudioDeviceModuleDelegate protocol doesn't do anything unless you also change the RTCAudioDeviceModuleType from .default to .audioEngine. This isn't documented anywhere. 🤬
If you want to go C++ spelunking around how WebRTC for mac/ios does playback, here are some notes:
There are TWO AudioDeviceModules:
- webrtc::AudioEngineDevice at webrtc/modules/audio_device/audio_engine_device.h|mm is Apple-specific and looks fairly modern.
- ObjCAudioDeviceModule at webrtc/sdk/objc/native/src/objc_audio_device.mm is also Apple-specific, and looks ancient and sloppily written.
RTCPeerConnectionFactory.mm initializes one or the other based on the type:
- RTCAudioDeviceModuleTypeAudioEngine gives you #1, webrtc::AudioEngineDevice.
- RTCAudioDeviceModuleTypePlatformDefault gives you #2, ObjCAudioDeviceModule. This is the default, and thus the one you're using unless you override the type.
Either one is exposed through the public API RTCAudioDeviceModule at webrtc/sdk/objc/api/peerconnection/RTCAudioDeviceModule.h|m. That public API seems awfully hardcoded for webrtc::AudioEngineDevice though, even though that's not the default?!
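To summarize the dispatch, paraphrased as pseudocode (the real logic lives in ObjC++ in RTCPeerConnectionFactory.mm; names here are approximate, not copied from the repo):

```swift
// Paraphrase of RTCPeerConnectionFactory.mm — illustration only:
// switch audioDeviceModuleType {
// case .audioEngine:
//     adm = webrtc::AudioEngineDevice()   // delegate callbacks fire
// case .platformDefault:
//     adm = ObjCAudioDeviceModule()       // delegate callbacks never fire
// }
```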
EDIT:
In my first submission, I did destination!.auAudioUnit.isOutputEnabled = false, and that did indeed turn off playout. But it also disabled all audio, so PCM wouldn't be rendered through RTCAudioRenderer. I've updated the answer to use an AVAudioMixerNode instead, which mutes playout but still allows the PCM to be rendered.
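For clarity, here are the two approaches side by side. The mixer referenced below is the AVAudioMixerNode created in configureOutput above; the commented-out line is the rejected approach:

```swift
// Rejected: disables the entire output unit, so LKRTCAudioRenderer stops
// receiving PCM as well.
// destination!.auAudioUnit.isOutputEnabled = false

// Current: mute only the speaker path. The engine keeps pulling audio
// through source -> mixer -> destination, so renderers still receive PCM.
mixer.outputVolume = 0
```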