Cloud turn detector failures and late STT final warnings after enabling LiveKit Inference audio turn detection/VAD

After switching from LiveKit plugin models to LiveKit Inference models, I started seeing new warnings/issues that were not present before.

Issues:

  1. Late EOU detection warning:
eou detection ran after the audio eot turn was already flushed (likely a late stt final). consider raising min_delay in the endpointing options to accommodate slow stt.

  2. Cloud turn detector failures:
cloud turn detector failed (eot prediction timed out); falling back to local mini model

  3. Worker/job memory usage has also increased significantly.

Previous setup:

  • Silero VAD plugin
  • LiveKit plugin turn detector

New setup:

  • LiveKit Inference VAD
  • New LiveKit Inference audio turn detector

Current settings:

VAD:

MIN_SILENCE_DURATION=0.25

Endpointing:

MIN_ENDPOINTING_DELAY=0.15
MAX_ENDPOINTING_DELAY=1.2
ENDPOINTING_MODE=fixed
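A minimal sketch of these settings as a typed config, for reasoning about the latency floor they imply. The env-var names are the poster's own, and how they are wired into LiveKit is not shown. The `rough_commit_floor_s` helper is hypothetical and assumes the usual flow where VAD must observe MIN_SILENCE_DURATION of silence before the endpointing min delay starts counting. That assumption has not been verified against LiveKit's source. With the values above, the floor is roughly 0.25 + 0.15 = 0.40 s.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class EndpointingSettings:
    """Hypothetical holder for the env-var settings above (not a LiveKit class)."""

    min_silence_duration: float
    min_endpointing_delay: float
    max_endpointing_delay: float
    endpointing_mode: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "EndpointingSettings":
        settings = cls(
            min_silence_duration=float(env.get("MIN_SILENCE_DURATION", "0.25")),
            min_endpointing_delay=float(env.get("MIN_ENDPOINTING_DELAY", "0.15")),
            max_endpointing_delay=float(env.get("MAX_ENDPOINTING_DELAY", "1.2")),
            endpointing_mode=env.get("ENDPOINTING_MODE", "fixed"),
        )
        # A min delay larger than the max delay can never be honored, so fail early.
        if settings.min_endpointing_delay > settings.max_endpointing_delay:
            raise ValueError("MIN_ENDPOINTING_DELAY must not exceed MAX_ENDPOINTING_DELAY")
        return settings

    def rough_commit_floor_s(self) -> float:
        # Assumed flow: VAD silence window first, then the endpointing min delay.
        return self.min_silence_duration + self.min_endpointing_delay


if __name__ == "__main__":
    s = EndpointingSettings.from_env()
    print(s, f"rough floor ~{s.rough_commit_floor_s():.2f}s")
```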

Session/Room/Call IDs:

  • RM_HQMqnZ3afh2U
  • RM_GfoXD29Yfto6
  • RM_7BcQza5BEoAA

I suspect the low endpointing delay may be too aggressive, causing the turn to close before the STT final / EOU prediction completes. However, this issue did not occur before, and I need the delay to stay as low as possible for very responsive replies.

Questions:

  • Is this expected when moving from local/plugin VAD + turn detector to LiveKit Inference models?
  • Should min_delay be increased when using cloud turn detection? Will that add latency? I want latency to be as low as possible.
  • Is there a recommended configuration for low latency but reliable EOU detection?

I would appreciate guidance on recommended production settings. Or should we revert to the LiveKit plugins we used before?

@Kaushal_Shah, you filed this as livekit/agents#6177 as well, and a maintainer has already answered there. The key point: min_endpointing_delay=0.15 also serves as the timeout for the cloud EOU prediction. The future is awaited with asyncio.wait([fut], timeout=min_delay) [audio_recognition.py], and per the maintainer the audio model takes ~80-100 ms to run, so a 150 ms budget leaves no room for the network round-trip. The prediction times out, the turn is committed without it, and cancel_inference(timed_out=True) switches you to the local mini model [inference/eot/base.py]. The stated defaults for the audio detector are 300 ms min / 2.5 s max; raising min toward 300 ms stops the timeout-and-fallback thrash, while a lower max is fine.
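A toy reproduction of that timeout pattern, for intuition only. `fake_cloud_eot_prediction` and `predict_with_budget` are hypothetical stand-ins, not LiveKit functions. The 0.18 s round-trip is an assumed figure (about 100 ms of model time plus about 80 ms of network). The only piece taken from the explanation above is the shape: the prediction is awaited with `asyncio.wait(..., timeout=min_delay)`, and a timeout cancels it and falls back.

```python
import asyncio
import contextlib


async def fake_cloud_eot_prediction(round_trip_s: float) -> float:
    """Hypothetical stand-in for the cloud EOU call: waits, then returns a made-up probability."""
    await asyncio.sleep(round_trip_s)
    return 0.92


async def predict_with_budget(min_delay: float, round_trip_s: float) -> str:
    task = asyncio.create_task(fake_cloud_eot_prediction(round_trip_s))
    # min_delay doubles as the prediction timeout, as described above.
    done, _pending = await asyncio.wait([task], timeout=min_delay)
    if task in done:
        return f"cloud prediction {task.result():.2f}"
    # Roughly where the real code calls cancel_inference(timed_out=True).
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return "timed out, falling back to local mini model"


async def main() -> None:
    round_trip = 0.18  # assumed: ~100 ms model time + ~80 ms network
    for min_delay in (0.15, 0.30):
        print(f"min_delay={min_delay}: {await predict_with_budget(min_delay, round_trip)}")


if __name__ == "__main__":
    asyncio.run(main())
```

With a 150 ms budget, the simulated call loses the race and takes the fallback branch. With 300 ms, it completes. This is the same trade-off as raising min toward the 300 ms default.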

The late-EOU warning is benign and being rephrased there: the detector already committed the turn correctly, then a late STT final fired a redundant second EOU call. The audio model decides without waiting on the transcript, so that ordering is possible where the old text-based model couldn’t hit it. It isn’t your endpointing closing the turn early. Memory is still open on the issue; the maintainer asked for your measurement method, so that part belongs there.
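A toy model of that ordering, assuming a single turn object with a `flushed` flag. `TurnSketch` and its methods are hypothetical names, not LiveKit's classes. Because the audio EOT can commit the turn before the transcript is final, a final that arrives afterwards can only hit the already-flushed branch. That branch is the redundant second check behind the warning.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TurnSketch:
    """Toy model of one user turn; not LiveKit's implementation."""

    flushed: bool = False
    events: List[str] = field(default_factory=list)

    def on_audio_eot(self) -> None:
        # The audio model decides from audio alone, so it can commit the turn
        # before the STT final transcript arrives.
        if not self.flushed:
            self.flushed = True
            self.events.append("turn committed by audio EOT")

    def on_stt_final(self, transcript: str) -> str:
        if self.flushed:
            # A late final triggers a second, redundant EOU check; the turn is
            # already committed, so this only warns and changes nothing.
            self.events.append(f"late final ignored: {transcript!r}")
            return "warn: eou ran after turn was already flushed"
        self.events.append(f"final received: {transcript!r}")
        return "ok"


turn = TurnSketch()
turn.on_audio_eot()                       # audio model commits first
print(turn.on_stt_final("book a table"))  # late STT final arrives afterwards
print(turn.events)
```

A text-based detector can only decide after a transcript exists, so in this model its final always arrives while `flushed` is still False.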

Thanks @Muhammad_Usman_Bashir. Yes, I opened it here as well to get more insights from you all.

Both issues still remain for me, even after increasing the endpointing delay:

  1. The warning still appears after increasing the minimum endpointing delay to MIN_ENDPOINTING_DELAY=0.5.
  2. Memory usage is also logged in Sentry when it crosses the job memory warning threshold set in AgentServer. Previously the threshold was 750 and we never got any warnings; now we have had to raise it to 1024 to stop them.

I am reverting to the text turn detector to see whether both issues are resolved.

Which warning do you see after increasing the endpointing? The warning about ‘eou detection ran after the audio…’, or ‘cloud turn detector failed…’?

eou detection ran after the audio eot turn was already flushed (likely a late stt final). consider raising min_delay in the endpointing options to accommodate slow stt.

This warning. Also, in some sessions the turn detector for the v1 model times out or fails and switches to the mini model for the rest of the session.

Thanks. I see Chenghao already replied in livekit/agents#6177 ("Cloud turn detector timeout and late EOU detection warnings after switching to LiveKit Inference VAD/turn detector"). For context, he is the engineer responsible for implementing this new feature, so I am also learning from his responses.

and also in some sessions the turn detector for v1 model times out or failed and switch to mini model for rest of session

Can you share a session or sessions where you see this?

Sure, no problem. Thanks for all the help, @darryncampbell.
I have reverted to the older text turn-taking model for now, so I don't have any new sessions, but these were the old ones:

  • RM_HQMqnZ3afh2U
  • RM_GfoXD29Yfto6
  • RM_7BcQza5BEoAA

Thanks. I see the "eou detection ran after the audio eot turn was already flushed..." error in the Agent Observability logs for each of these sessions, but I don't see the error related to "also in some sessions the turn detector for v1 model times out or failed and switch to mini model for rest of session". Where should I look for this?

Sorry about the confusion over the memory usage issue: it is not linked to the LiveKit audio turn detector or Silero VAD. It was caused by the version upgrade from 1.6.0 to 1.6.2.

  1. On 1.6.0, RAM usage is mostly under 700 MB, as I don't see any warning logs.
  2. On 1.6.1, it is around 800 MB.
  3. On 1.6.2, it is more than 950 MB.
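For comparing versions side by side, here is a minimal, Unix-only sketch. It reports a process's peak resident memory with the standard-library `resource` module and compares it against warn thresholds like the 750/1024 values above. It is an assumption that this matches what AgentServer measures. `ru_maxrss` covers only the calling process, so it has to run inside the job process. It is also a high-water mark rather than current usage. `peak_rss_mb` and `compare_to_warn` are hypothetical helper names.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Peak resident set size of the calling process, in MB (1024-based, Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def compare_to_warn(threshold_mb: float) -> str:
    peak = peak_rss_mb()
    status = "over" if peak > threshold_mb else "under"
    return f"peak RSS {peak:.0f} MB is {status} the {threshold_mb:.0f} MB warn threshold"


if __name__ == "__main__":
    # Thresholds taken from the thread; call this from inside the job process.
    for threshold in (750, 1024):
        print(compare_to_warn(threshold))
```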

As for sessions with the mini-model fallback error, I will have to dig some up. It is quite hard, but I will share them with you soon.