High latency (5-8 seconds) with Google Gemini Realtime plugin over SIP

This question originally came up in our Slack community and the thread has been consolidated here for long-term reference.

I’m working on a voice assistant using LiveKit SIP (with Twilio) and the Google Gemini Realtime plugin. Calls connect successfully and conversation flows, but I’m facing 5-8 second latency for agent replies.

I’ve tested both self-hosted (Docker Compose) and LiveKit Cloud - latency is the same on both.

My stack:

  • Model: gemini-2.5-flash-native-audio-preview via google.realtime
  • SIP: Twilio Elastic SIP Trunk

Optimizations tried:

  • Added silero.VAD for faster speech-end detection
  • Added turn_detector.EnglishModel() for turn-taking
  • Reduced min_silence_duration to 0.4

Has anyone else experienced this delay with the Gemini Realtime plugin?

Others have experienced latency with Gemini Realtime as well. Some observations:

Timeout errors: Check your logs. The Gemini Realtime plugin can hit timeout errors frequently, and each one adds to the lag. This is more common on the free tier of the Gemini API.

Tool calls: If you’re using tool calls, they might be slowing things down.

Alternative models: For comparison, GPT-4o and GPT-4o-mini can achieve sub-1000ms latency. While Gemini is good, it’s currently not as reliable or consistent as OpenAI’s models in terms of latency and response quality.

VAD tuning: Your approach of reducing min_silence_duration is correct. You might also experiment with other VAD parameters.
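
To check the timeout point above, something like the following could tally timeout lines in an agent log. This is only a minimal sketch: the sample log lines, the log layout (date, time, level, message) and the regex are all made up for illustration, and real plugin logs will look different.

```python
import re
from collections import Counter

# Hypothetical log excerpt; real plugin log lines will look different.
SAMPLE_LOG = """\
2025-01-01 10:00:01 INFO agent session started
2025-01-01 10:00:07 WARNING realtime request timed out, retrying
2025-01-01 10:00:15 INFO agent reply sent
2025-01-01 10:00:21 ERROR TimeoutError while waiting for model response
2025-01-01 10:00:30 INFO agent reply sent
"""

# Assumed pattern: "timeout", "timed out", "time-out", etc., in any case.
TIMEOUT_PATTERN = re.compile(r"time(d)?[ _-]?out", re.IGNORECASE)


def count_timeouts(log_text: str) -> Counter:
    """Count timeout-related lines, grouped by log level."""
    counts = Counter()
    for line in log_text.splitlines():
        if TIMEOUT_PATTERN.search(line):
            parts = line.split()
            # Assumed layout: date, time, level, message...
            level = parts[2] if len(parts) > 2 else "UNKNOWN"
            counts[level] += 1
    return counts


timeout_counts = count_timeouts(SAMPLE_LOG)
print(dict(timeout_counts))
```

If the count grows with every call, the timeouts are a likely contributor to the delay and are worth chasing before any further VAD tuning.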

Hello, I have been facing the same issue for a long time now. Can you please suggest what other parameters can be used to fix this? Or is this expected behavior?

My stack:

  • Model: gemini-2.5-flash-native-audio-preview-12-2025 via google.realtime
  • SIP: Twilio Elastic SIP Trunk and Plivo SIP Trunk

Optimizations tried:

  • Added silero.VAD for faster speech-end detection with min_speech_duration=0.1, min_silence_duration=0.25, activation_threshold=0.5, deactivation_threshold=0.40, prefix_padding_duration=0.0
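
For a sense of scale, here is a back-of-the-envelope sketch of how much of the reported delay the VAD silence window could account for. It assumes (and this is only an assumption) that min_silence_duration is added in full to every reply; silence_share is a hypothetical helper, not part of any plugin.

```python
def silence_share(min_silence_s: float, observed_latency_s: float) -> float:
    """Fraction of the observed reply latency taken by the VAD silence window."""
    return min_silence_s / observed_latency_s


# Values taken from this thread: silence windows of 0.4 s and 0.25 s,
# observed reply latency of 5 to 8 seconds.
for min_silence in (0.4, 0.25):
    for latency in (5.0, 8.0):
        share = silence_share(min_silence, latency)
        print(f"min_silence_duration={min_silence}s, latency={latency}s: {share:.1%}")
```

Under that assumption the silence window is well under a tenth of the 5-8 second delay, so further VAD tuning alone is unlikely to close the gap.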

This question has a lot of views, so I will update the answer.

The best general resource for understanding and improving agent latency is this blog:

To address some of the specifics in the question:

  • Under ‘optimizations tried’, the original question implies the OP is using LiveKit’s turn detection. Gemini, like other realtime models, has its own built-in turn detection, which should be used unless you have a good reason to need a separate turn detection model: Gemini Live API plugin | LiveKit Documentation
  • I have seen reports that the provider tools in Gemini Live can add latency, so it is worth testing without them.
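
As a rough illustration of the first point, a session that leaves turn-taking to Gemini might look like the sketch below. This is a minimal sketch, not a verified configuration: it assumes that omitting the vad and turn_detection arguments leaves turn detection to the realtime model, build_session is a hypothetical helper name, and the model name is simply copied from the question. Check the plugin documentation for the exact options in your version.

```python
def build_session():
    """Sketch: rely on Gemini's built-in turn detection for turn-taking."""
    # Imported lazily so this file still loads without the packages installed.
    from livekit.agents import AgentSession
    from livekit.plugins import google

    return AgentSession(
        # No vad= or turn_detection= arguments here. The assumption is that
        # leaving them out hands turn detection to the realtime model.
        llm=google.realtime.RealtimeModel(
            # Model name as given in the question; adjust to the one you use.
            model="gemini-2.5-flash-native-audio-preview",
        ),
    )
```

Calling build_session() needs the LiveKit Agents packages and Google credentials in place. Comparing reply latency with and without the separate silero.VAD and turn_detector.EnglishModel() setup should show whether the extra turn detection layer is part of the delay.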