After adding background typing sounds, we noticed an increase in voice warping during the initial phase of a call. We traced it to the sample rate negotiation between the TTS audio from the AI provider and the background audio. The background audio mixer is hardcoded to 48 kHz, with no option to override it. The TTS provider does not support 48 kHz, so resampling is needed.
Is there a way to modify the audio mixer implementation and the preloaded audio samples so that they match the sample rate of the AI provider and avoid this issue?
This also only occurs on resource-constrained machines (cloud); our local development machines (MacBooks) are not affected.
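To illustrate what we have in mind, here is a rough sketch of converting the preloaded samples once at load time, so that no per-frame resampling is needed during the call. It is not taken from the mixer's code: `resample_pcm16` is a hypothetical helper, the samples are assumed to be mono 16-bit PCM, and the 24000 Hz target is only a placeholder for whatever rate the TTS provider actually uses. It relies on plain linear interpolation with no low-pass filter, so a real implementation would probably want a proper resampler to avoid aliasing when downsampling.

```python
from array import array


def resample_pcm16(samples: array, src_rate: int, dst_rate: int) -> array:
    """Resample mono 16-bit PCM with linear interpolation (hypothetical helper)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) == 0:
        return array("h", samples)  # nothing to convert, return a copy

    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = array("h")

    for i in range(out_len):
        pos = i * step
        idx = int(pos)
        if idx >= last:
            # Past the final interval: hold the last source sample.
            out.append(samples[last])
            continue
        frac = pos - idx
        value = samples[idx] * (1.0 - frac) + samples[idx + 1] * frac
        out.append(int(round(value)))
    return out


# Assumed rates: 48000 is the mixer's hardcoded rate from the report above,
# 24000 is a placeholder for the TTS provider's rate.
MIXER_RATE = 48000
TTS_RATE = 24000

# Stand-in for one second of a preloaded typing sample.
typing_sample = array("h", [0] * MIXER_RATE)
converted = resample_pcm16(typing_sample, MIXER_RATE, TTS_RATE)
```

If the mixer accepted a configurable sample rate, the converted samples could be passed in directly and the TTS audio would not need to be resampled to 48 kHz at all.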