Audio mixer hard-coded sample rate causes voice warping

After adding background typing sounds, we noticed an increase in voice warping during the initial phase of a call. We traced it to the sample rate negotiation between the TTS audio from the AI provider and the background audio. The background audio mixer is hardcoded to 48 kHz with no option to override it. The TTS provider does not support 48 kHz, so resampling is needed.

Is there a way to modify the audio mixer implementation and the preloaded audio samples so they match the sample rate of the AI provider and avoid this issue?

This also only occurs on resource-constrained machines (cloud); our local development machines (MacBooks) do not suffer from it.
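To make the question concrete, here is a minimal sketch of what pre-resampling a preloaded background clip to the provider's rate could look like. It is not the LiveKit mixer API: `resample_linear` is a hypothetical helper, the 24 kHz target is an assumed provider rate, and linear interpolation is used only for illustration (a real pipeline would want a band-limited resampler to avoid aliasing).

```python
from array import array


def resample_linear(samples: array, src_rate: int, dst_rate: int) -> array:
    """Resample mono 16-bit PCM by linear interpolation (illustrative only)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) == 0:
        return array("h", samples)

    n_out = len(samples) * dst_rate // src_rate
    last = len(samples) - 1
    out = array("h")
    for i in range(n_out):
        # Position of output sample i on the source timeline.
        pos = i * src_rate / dst_rate
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Blend the two neighbouring source samples.
        value = samples[left] * (1.0 - frac) + samples[right] * frac
        out.append(int(round(value)))
    return out


# Hypothetical example: a short ramp "recorded" at 48 kHz, converted to an
# assumed provider rate of 24 kHz before it is handed to the mixer.
ramp_48k = array("h", [0, 100, 200, 300, 400, 500, 600, 700])
ramp_24k = resample_linear(ramp_48k, 48_000, 24_000)
print(list(ramp_24k))  # [0, 200, 400, 600]
```

Doing this once at load time would move the conversion cost out of the real-time path, which may matter on the resource-constrained machines where the warping shows up.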

The source audio and what is actually published are not necessarily the same; the audio gets resampled. Why is this causing an issue for you?

Because it results in audio distortion. I uploaded two fragments where this happened, but we have quite a few more where this is the case.

What am I listening for in those two files?

The distorted “hey Sam” in the first few seconds of each fragment.
This started showing up after we added background audio.

What version of the LiveKit libraries are you running, and which TTS, STT, and LLM are you using?
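One way to collect the version information, as a small sketch: it assumes a Python environment and that the relevant packages are named `livekit` and `livekit-agents`; the list is a placeholder and should be extended with whichever TTS, STT, and LLM plugin packages are actually installed.

```python
from importlib import metadata

# Assumed package names; add the TTS/STT/LLM plugin packages you use.
PACKAGES = ["livekit", "livekit-agents"]


def installed_versions(names):
    """Return a mapping of package name to installed version string."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version}")
```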

It would be helpful if you could take this example

and make an example I can run that will reproduce the issue along with any steps I need to reproduce it.

I just tried to set this up and reproduce it, but I did not see the slurred speech that you are describing.

Can you please share your setup and the steps to reproduce?