I’m building a real-time voice interaction system using the LiveKit SDK, where responses from my backend (LLM + TTS) can sometimes take longer than expected.
To improve the user experience, I’m considering adding a fallback mechanism:
- After sending a user query, I start a timer (e.g., 1.5 seconds).
- If no response audio has started by then, I play a short filler message like “Let me check that for you…”.
- If the actual response arrives while the filler is playing, I want the filler to finish playing and then switch to the real response audio.
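To make the flow above concrete, here is a rough sketch of the timing logic in plain `asyncio`, independent of LiveKit. The names `respond_with_filler`, `get_response`, and `play` are hypothetical placeholders for whatever the backend call and audio playback turn out to be; none of them are LiveKit APIs.

```python
import asyncio
from typing import Awaitable, Callable, List


async def respond_with_filler(
    get_response: Callable[[], Awaitable[str]],  # hypothetical: awaits the LLM + TTS result
    play: Callable[[str], Awaitable[None]],      # hypothetical: plays one clip to completion
    filler: str = "Let me check that for you...",
    timeout_s: float = 1.5,
) -> List[str]:
    """Play `filler` if the response is not ready within `timeout_s`, then play the response."""
    played: List[str] = []

    # Start the backend request right away so it keeps running while we wait.
    response_task = asyncio.ensure_future(get_response())

    # Wait up to `timeout_s`; asyncio.wait does not cancel the task on timeout.
    done, _pending = await asyncio.wait({response_task}, timeout=timeout_s)

    if not done:
        # Response is late: the filler plays to completion before anything else.
        await play(filler)
        played.append(filler)

    # The response may already be ready by now; either way, play it next.
    response = await response_task
    await play(response)
    played.append(response)
    return played
```

In this sketch the real response is never dropped: if it arrives mid-filler it simply waits until the filler finishes. What I don't know is where (or whether) logic like this can hook into the LiveKit pipeline.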
However, since the STT → LLM → TTS pipeline is managed internally by LiveKit, I don’t have direct control over:
I wanted to ask:
- What’s the recommended way in LiveKit to handle this kind of timed fallback behavior?
- Is there a best practice for interrupting and replacing an ongoing audio track with a new one?
- Should this be handled entirely at the application layer, or is there any built-in support/pattern in LiveKit for such cases?
- If anyone has implemented something similar, I’d love to hear how you handled audio track switching and synchronization.
- Even a minimal sample showing how to inject and interrupt audio alongside the pipeline would be really helpful.
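For the interruption question, this is roughly the behavior I mean, again as a plain `asyncio` sketch rather than anything LiveKit-specific. `play_interruptible`, `play_chunk`, and the `stop` event are hypothetical names; the assumption is that audio can be pushed in small chunks and that something else sets `stop` when the replacement audio is ready.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def play_interruptible(
    chunks: Iterable[bytes],
    play_chunk: Callable[[bytes], Awaitable[None]],  # hypothetical: pushes one audio chunk
    stop: asyncio.Event,                             # set by the caller to interrupt playback
) -> int:
    """Push chunks one at a time, stopping early once `stop` is set.

    Returns the number of chunks that were actually played.
    """
    sent = 0
    for chunk in chunks:
        # Check between chunks so playback ends on a chunk boundary.
        if stop.is_set():
            break
        await play_chunk(chunk)
        sent += 1
    return sent
```

With small chunks the cut-over latency stays low, but I'm unsure how this maps onto LiveKit's audio tracks.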
Any guidance or examples would be appreciated. Thanks!