LiveKit + Gemini Live pause issue

We’re using LiveKit + Gemini Live (2.5 native audio), and roughly 4 out of 10 times we run into a pause issue. We see this in the logs:

{"message": "_SegmentSynchronizerImpl.playback_finished called before text/audio input is done", "level": "WARNING", "name": "livekit.agents", "text_done": false, "audio_done": true, "room": "call-_12057091293_Hi3", "agent_id": "ad883c87-8b52-42a4-8418-8e", "pid": 626649, "job_id": "AJ_YSmHaRfLqrKy", "room_id": "RM_nptn", "timestamp": "2026-05-25T12:15:55.426413+00:00"}

Is there a fix? Can anyone help resolve this?

@naahuja, the warning fires in _SegmentSynchronizerImpl.mark_playback_finished() when the text or audio stream hasn’t been marked done by the time playback completes [livekit-agents/livekit/agents/voice/transcription/synchronizer.py]. Your text_done: false, audio_done: true means the transcript stream for that segment didn’t get its end_text_input() call before Gemini’s audio for the segment finished. In that state the function logs the warning and returns early, which is the “pause” you see: the next segment can’t be processed cleanly until the previous segment’s state resolves.
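To make the text_done / audio_done gating concrete, here is a heavily simplified, hypothetical model of the check described above. It is not LiveKit’s code: the class SegmentGate and its methods are invented for illustration, and only the flag names and the warning text are taken from the log line.

```python
import logging

logger = logging.getLogger("segment_gate_sketch")


class SegmentGate:
    """Hypothetical, simplified model of one transcript/audio segment (not LiveKit code).

    Playback can only be finalized once BOTH input streams are marked done.
    """

    def __init__(self) -> None:
        self.text_done = False
        self.audio_done = False
        self.finalized = False

    def end_text_input(self) -> None:
        self.text_done = True

    def end_audio_input(self) -> None:
        self.audio_done = True

    def mark_playback_finished(self) -> bool:
        """Return True if the segment was finalized, False if playback ended too early."""
        if not (self.text_done and self.audio_done):
            # Same shape as the warning in the log: bail out without finalizing.
            logger.warning(
                "playback_finished called before text/audio input is done",
                extra={"text_done": self.text_done, "audio_done": self.audio_done},
            )
            return False
        self.finalized = True
        return True


if __name__ == "__main__":
    seg = SegmentGate()
    seg.end_audio_input()                # audio stream is closed...
    print(seg.mark_playback_finished())  # ...but the transcript is not: False, segment stays open
    seg.end_text_input()
    print(seg.mark_playback_finished())  # both streams done: True
```

In this model the text_done=false / audio_done=true state simply leaves the segment unfinalized; whatever depends on that segment closing has to wait.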

This looks like a known class of Gemini Live turn-end races. Several related issues are open in livekit/agents: #5742 (trailing audio truncation when a tool call ends the turn) and #5556 (audio input pause during synchronous tool execution).

Two things to try:

  1. Update livekit-agents to the latest release. Fixes to the Gemini Live plugin land regularly, and a recent release may already cover the segment-marking case.
  2. Test once with a non-native-audio Gemini Live variant. That isolates whether the race is specific to Gemini’s native-audio path (where audio and transcript are emitted asynchronously) or is a general segment-marking issue in the plugin.

If it persists on the latest release, open a livekit/agents issue with: your livekit-agents version, the exact Gemini model id, the full warning plus ~30 lines of surrounding agent logs, and the 4/10 reproduction frequency. Intermittent repros with that context are what maintainers need to chase down a race.
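Collecting that context can be scripted. A minimal sketch, assuming your agent writes JSON log lines like the one above to a file; the function names, the script name, and the 30-line window are illustrative choices, not anything LiveKit provides. It uses only the standard library (importlib.metadata for the installed version).

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional


def installed_version(package: str = "livekit-agents") -> Optional[str]:
    """Return the installed version of `package`, or None if it isn't installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def warning_context(lines: List[str], needle: str, radius: int = 30) -> List[List[str]]:
    """For each line containing `needle`, return it with up to `radius` lines on each side."""
    return [
        lines[max(0, i - radius): i + radius + 1]
        for i, line in enumerate(lines)
        if needle in line
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python collect_report.py path/to/agent.log
    needle = "playback_finished called before text/audio input is done"
    with open(sys.argv[1], encoding="utf-8") as f:
        log_lines = f.read().splitlines()
    print("livekit-agents version:", installed_version())
    for block in warning_context(log_lines, needle):
        print("\n".join(block))
        print("-" * 40)
```

Each printed block is one occurrence of the warning with its surrounding lines, ready to paste into the issue alongside the version string.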

Hi Usman, thanks for getting back to me. I’ll definitely try these steps. While brainstorming, though, I wondered whether it could also be a VAD issue. We already have Silero VAD running in the pipeline, and Gemini Live comes with its own VAD. Could the two VADs be conflicting? @Muhammad_Usman_Bashir

@naahuja, that’s a real possibility with Gemini Live’s native-audio path. The AgentSession turn_detection parameter governs which VAD drives turn decisions. With a RealtimeModel that has server-side turn detection (Gemini Live native audio qualifies), set turn_detection='realtime_llm' to defer to the model’s server-side VAD and stop driving turns from Silero [livekit/agents/livekit-agents/livekit/agents/voice/agent_activity.py].

Without it, Silero ends the user turn locally based on its own speech detection, while Gemini’s server-side VAD marks turn-end on its side. Local commits from Silero can land while Gemini is still mid-segment, which is exactly the kind of race that produces text_done=false / audio_done=true in the SegmentSynchronizer.

Check your AgentSession constructor: if turn_detection isn’t explicitly set to 'realtime_llm', set it and re-test. Silero can stay in the pipeline for interruption detection (a separate concern); it just shouldn’t drive turn-end when Gemini is doing that server-side.

If that resolves the pause, your hypothesis is right and the open issues (#5742, #5556) are unrelated to your case.
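The division of labor described above (server VAD ends the turn, local VAD only flags interruptions) can be modeled in a few lines. This is a hypothetical sketch, not LiveKit’s implementation: TurnArbiter and its event methods are invented names, and the mode strings are only borrowed from the turn_detection option. The server is modeled as always ending the turn, matching the both-VADs-active scenario.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TurnArbiter:
    """Hypothetical model of who may end the user's turn (not LiveKit code)."""

    turn_detection: str = "realtime_llm"  # mode names borrowed from AgentSession
    agent_speaking: bool = False
    events: List[str] = field(default_factory=list)

    def on_local_vad_end_of_speech(self) -> None:
        # Local VAD (e.g. Silero) thinks the user stopped talking.
        if self.turn_detection != "realtime_llm":
            self.events.append("commit_user_turn(local)")
        # In 'realtime_llm' mode this is ignored for turn-taking.

    def on_local_vad_start_of_speech(self) -> None:
        # Local VAD can still flag a barge-in while the agent is talking.
        if self.agent_speaking:
            self.events.append("interrupt_agent")

    def on_server_turn_end(self) -> None:
        # Modeled as always firing: Gemini's server-side VAD ends the turn on its side.
        self.events.append("commit_user_turn(server)")


if __name__ == "__main__":
    for mode in ("vad", "realtime_llm"):
        arbiter = TurnArbiter(turn_detection=mode)
        arbiter.on_local_vad_end_of_speech()
        arbiter.on_server_turn_end()
        print(mode, arbiter.events)
```

With turn_detection='vad' one utterance produces two turn commits from independent sources (the double-driver race); with 'realtime_llm' only the server’s commit remains, while interruption detection still works.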

Hi @Muhammad_Usman_Bashir, I switched to the non-native model, livekit-agents is on the latest version (1.5.1), and I’m using turn_detection='realtime_llm', so no Silero either.
I’m still seeing the pause, even with the non-native model.

@naahuja, since it still pauses with the non-native model, the latest release, and realtime_llm, both earlier hypotheses are ruled out. That moves this out of configuration and into open-bug territory.

There’s an open issue with the exact symptom: livekit/agents#5096 (March 2026, “Gemini Realtime EndCallTool not working”) reporting the same text_done=false / audio_done=true playback_finished warning on Gemini Live, no maintainer response yet. An earlier closed issue (#2644) addressed a related case via PR #2661 in June 2025, but #5096 shows the class isn’t fully closed.

Your repro is actually high-value for the maintainers because it proves the race isn’t native-audio specific. Comment on #5096 (or open a new issue cross-referencing it) with: livekit-agents 1.5.1, the non-native Gemini Live model id you tested, turn_detection='realtime_llm', no Silero, 4/10 reproduction frequency, plus the full warning and ~30 lines of surrounding logs. Intermittent races with definitive ruling-out of common config causes are exactly what maintainers need to chase the root cause.

Hi @Muhammad_Usman_Bashir, could this be a tool-latency issue? My tool takes 300-500ms.
Just an observation.

@naahuja, that hypothesis lines up with the fix proposed in livekit/agents#5556 (PR still open). It targets Gemini 3.1 specifically but documents the underlying mechanism: “Gemini forces synchronous tool calling, which means the model blocks until tool responses arrive” while the plugin continues forwarding mic audio. The PR’s fix is a _tool_call_pending flag that suppresses audio frames between toolCall arrival and tool response.

At 300-500ms your tool isn’t hitting the ~12s cancellation threshold the PR describes, but the same race opens a smaller window: while the agent blocks on tool execution, audio frames keep flowing, and Gemini’s audio_done can fire on the post-tool segment while text_done is still gated on the tool result. Same warning shape as yours.
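The gating idea can be sketched independently of the plugin. This illustrates the technique only and is not the code in #5556: ToolAwareAudioForwarder, its methods, and the 10 ms frame size are assumptions; only the _tool_call_pending flag name comes from the PR description. Frames pushed between a tool call and its response are dropped instead of being forwarded to the model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToolAwareAudioForwarder:
    """Hypothetical forwarder that mutes mic audio while a tool call is pending."""

    sent: List[bytes] = field(default_factory=list)
    dropped: int = 0
    _tool_call_pending: bool = False

    def on_tool_call(self) -> None:
        self._tool_call_pending = True

    def on_tool_response(self) -> None:
        self._tool_call_pending = False

    def push_frame(self, frame: bytes) -> None:
        if self._tool_call_pending:
            self.dropped += 1  # model is blocked on the tool: don't feed it audio
            return
        self.sent.append(frame)  # normal path: forward to the realtime session


def frames_in_window(latency_ms: int, frame_ms: int = 10) -> int:
    """Number of audio frames that arrive while a tool call of `latency_ms` is pending."""
    return latency_ms // frame_ms


if __name__ == "__main__":
    fwd = ToolAwareAudioForwarder()
    fwd.push_frame(b"before")
    fwd.on_tool_call()
    for _ in range(frames_in_window(500)):
        fwd.push_frame(b"during")
    fwd.on_tool_response()
    fwd.push_frame(b"after")
    print(len(fwd.sent), fwd.dropped)
```

At the assumed 10 ms per frame, a 300-500ms tool call means 30-50 mic frames that would otherwise reach the model while it is blocked, which is the smaller window described above.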

First check whether your tool function is async def or sync def:

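  import asyncio
  import time

  from livekit.agents import function_tool

  # Two alternative versions of the same tool: keep only one.

  # sync, blocks the agent's event loop while running
  @function_tool
  def my_tool(arg: str) -> str:
      """Placeholder tool."""
      time.sleep(0.4)  # stand-in for some_blocking_call(arg), 300-500ms here
      return f"result for {arg}"

  # async, lets segment marking and audio handling progress in parallel
  @function_tool
  async def my_tool(arg: str) -> str:
      """Placeholder tool."""
      await asyncio.sleep(0.4)  # stand-in for await some_async_call(arg), 300-500ms but non-blocking
      return f"result for {arg}"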

If it’s sync, converting to async (and using await on any I/O) should narrow the race window even before #5556 lands.
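Why sync vs async matters can be checked with plain asyncio, no LiveKit required. A minimal sketch: the ticker coroutine is an assumed stand-in for the agent’s background audio and segment work (the real agent runs many more tasks), and the demo counts how many ~10 ms ticks it gets while a 400 ms “tool” runs. If the tool has to call a blocking client library, asyncio.to_thread (Python 3.9+) is one standard way to keep that call off the event loop.

```python
import asyncio
import time


async def ticker(stop: asyncio.Event, counter: list) -> None:
    """Stand-in for background work (audio forwarding, segment marking)."""
    while not stop.is_set():
        counter[0] += 1
        await asyncio.sleep(0.01)


async def blocking_tool() -> None:
    time.sleep(0.4)  # sync call inside the loop: nothing else runs for 400 ms


async def awaiting_tool() -> None:
    await asyncio.sleep(0.4)  # yields to the loop while "waiting on I/O"


async def threaded_tool() -> None:
    await asyncio.to_thread(time.sleep, 0.4)  # blocking call moved to a worker thread


async def ticks_during(tool) -> int:
    """Count ticker iterations that happen while `tool` runs."""
    stop, counter = asyncio.Event(), [0]
    task = asyncio.create_task(ticker(stop, counter))
    await asyncio.sleep(0)  # let the ticker start
    start = counter[0]
    await tool()
    ticks = counter[0] - start
    stop.set()
    await task
    return ticks


async def main() -> dict:
    return {t.__name__: await ticks_during(t) for t in (blocking_tool, awaiting_tool, threaded_tool)}


if __name__ == "__main__":
    print(asyncio.run(main()))
```

blocking_tool gets zero ticks because time.sleep never yields to the event loop; the other two typically get a few dozen, depending on timer resolution.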

For your GitHub issue write-up, include the exact tool latency (300-500ms) and whether the function is sync or async: that’s the missing piece connecting #5556's mechanism to your model and tool config. The maintainer comment on #5556 also suggests the fix should generalize via tool_behavior config, covering non-3.1 cases.