Handling response latency: Playing fallback/filler audio if no response arrives within a timeout

I’m building a real-time voice interaction system using the LiveKit SDK, where responses from my backend (LLM + TTS) can sometimes take longer than expected.

To improve the user experience, I’m considering adding a fallback mechanism:

  • After sending a user query, I start a timer (e.g., 1.5 seconds)

  • If no response audio has started by then, I play a short filler message like “Let me check that for you…”

  • If the actual response arrives while the filler is playing, I want the filler to finish playing and only then switch to the real response audio
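The behaviour described in the bullets above can be sketched at the application layer with plain asyncio. This is only a minimal illustration, not LiveKit API: `respond_with_filler`, `get_response`, `play`, and the fake backend in `demo` are hypothetical names, and `play` stands in for whatever actually speaks the text.

```python
import asyncio


async def respond_with_filler(get_response, play, timeout=1.5,
                              filler="Let me check that for you…"):
    """Play `filler` only if the real response is not ready within `timeout` seconds."""
    response_task = asyncio.ensure_future(get_response())
    # asyncio.wait() returns on timeout without cancelling the pending task,
    # so the backend keeps working while the filler is spoken.
    done, _pending = await asyncio.wait({response_task}, timeout=timeout)
    if not done:
        # Timer expired first: the filler plays to completion before anything else.
        await play(filler)
    # Either the response was already ready, or we wait for it after the filler.
    await play(await response_task)


async def demo(backend_delay):
    """Run one turn against a fake backend and return what was 'played', in order."""
    played = []

    async def fake_backend():
        await asyncio.sleep(backend_delay)  # stand-in for LLM + TTS latency
        return "real response"

    async def fake_play(text):
        played.append(text)  # stand-in for audio playback

    await respond_with_filler(fake_backend, fake_play, timeout=0.05, filler="filler")
    return played
```

With a fast backend, `asyncio.run(demo(0.0))` records only the real response; with a slow one, `asyncio.run(demo(0.2))` records the filler followed by the real response.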

However, since the STT → LLM → TTS pipeline is managed internally by LiveKit, I don’t have direct control over:

I wanted to ask:

  1. What’s the recommended way in LiveKit to handle this kind of timed fallback behavior?

  2. Is there a best practice for interrupting and replacing an ongoing audio track with a new one?

  3. Should this be handled entirely at the application layer, or is there any built-in support/pattern in LiveKit for such cases?

  4. If anyone has implemented something similar, I’d love to hear how you handled audio track switching and synchronization

  5. Even a minimal sample showing how to inject and interrupt audio alongside the pipeline would be really helpful.

Any guidance or examples would be really helpful. Thanks!

@darryncampbell Did you get a chance to review this? It’s important for us as we’re planning to launch the product soon.

What I want here is: if the LLM response arrives before the timeout expires, the filler message should be cancelled rather than played.

I checked this example: agents/examples/voice_agents/fast-preresponse.py at main · livekit/agents · GitHub, but it plays a filler message on every turn and is not tied to a timeout.

I can’t find an example which shows this, but what sounds most sensible to me is:

  1. Start a timer in on_user_turn_completed() (see Pipeline nodes and hooks | LiveKit Documentation)
  2. Cancel the timer in llm_node() when the agent responds (see Pipeline nodes and hooks | LiveKit Documentation)
  3. If the timer fires, call generate_reply() to speak the filler message
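As a rough sketch of the timer part of these three steps, here is a cancellable one-shot timer in plain asyncio. It does not import LiveKit; `FillerTimer`, `speak_filler`, and `demo` are hypothetical names, and the comments only mark where the hooks named above would presumably call into it. Once the timer has fired, `cancel()` is a no-op so a filler that has already started is not cut off.

```python
import asyncio


class FillerTimer:
    """One-shot timer that awaits `on_timeout` unless cancelled before it fires."""

    def __init__(self, timeout, on_timeout):
        self._timeout = timeout
        self._on_timeout = on_timeout
        self._task = None
        self._fired = False

    def start(self):
        self.cancel()  # drop any timer still pending from a previous turn
        self._fired = False
        self._task = asyncio.create_task(self._run())

    def cancel(self):
        # Only cancel while still waiting; after firing, let the filler finish.
        if self._task is not None and not self._fired:
            self._task.cancel()

    async def _run(self):
        await asyncio.sleep(self._timeout)
        self._fired = True
        await self._on_timeout()


async def demo(llm_delay):
    """Simulate one turn and return the order of events."""
    events = []

    async def speak_filler():
        # Assumption: in an agent, this is where step 3 (generate_reply()) would go.
        events.append("filler")

    timer = FillerTimer(0.05, speak_filler)
    timer.start()                   # step 1: where on_user_turn_completed() would run
    await asyncio.sleep(llm_delay)  # stand-in for LLM latency
    timer.cancel()                  # step 2: where llm_node() would see the first output
    events.append("response")
    await asyncio.sleep(0.1)        # give a cancelled or pending timer time to settle
    return events
```

Running `asyncio.run(demo(0.0))` yields only the response, because the timer is cancelled before it fires; `asyncio.run(demo(0.2))` yields the filler and then the response.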

We also have this page, which you have probably seen, for a similar use case: External data and RAG | LiveKit Documentation