This question originally came up in our Slack community and the thread has been consolidated here for long-term reference.
I’m working on a voice assistant using LiveKit SIP (with Twilio) and the Google Gemini Realtime plugin. Calls connect successfully and conversation flows, but I’m facing 5-8 second latency for agent replies.
I’ve tested both self-hosted (Docker Compose) and LiveKit Cloud - latency is the same on both.
My stack:
- Model: `gemini-2.5-flash-native-audio-preview` via `google.realtime`
- SIP: Twilio Elastic SIP Trunk
Optimizations tried:
- Added `silero.VAD` for faster speech-end detection
- Added `turn_detector.EnglishModel()` for turn-taking
- Reduced `min_silence_duration` to 0.4
Has anyone else experienced this delay with the Gemini Realtime plugin?
Others have experienced latency with Gemini Realtime as well. Some observations:
Timeout errors: Check your logs - the Gemini realtime plugin can run into timeout errors frequently, which contributes to lag. If you’re on the free tier Gemini API, this is more common.
Tool calls: If you’re using tool calls, they might be slowing things down.
Alternative models: For comparison, GPT-4o and GPT-4o-mini can achieve sub-1000ms latency. While Gemini is good, it’s currently not as reliable or consistent as OpenAI’s models in terms of latency and response quality.
VAD tuning: Your approach of reducing min_silence_duration is correct. You might also experiment with other VAD parameters.
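To put the VAD settings in perspective, a rough back-of-the-envelope check helps. The silence window is how long the VAD waits after speech stops before declaring the turn over, so it is roughly the delay that end-of-speech detection adds by itself. The sketch below uses only the numbers reported in this thread (0.4 s and 0.25 s silence windows, 5-8 s observed latency). The helper name `vad_share` is made up for illustration, and the calculation ignores any extra delay from a separate turn detection model.

```python
def vad_share(min_silence_s: float, observed_latency_s: float) -> float:
    """Fraction of the observed reply latency spent waiting out the VAD silence window."""
    if observed_latency_s <= 0:
        raise ValueError("observed latency must be positive")
    return min_silence_s / observed_latency_s


# Silence windows and observed latencies reported in this thread.
for min_silence in (0.4, 0.25):
    for observed in (5.0, 8.0):
        share = vad_share(min_silence, observed)
        print(f"min_silence={min_silence}s, observed={observed}s -> {share:.1%} of the delay")
```

Even in the most favorable case (0.4 s out of 5 s), the silence window accounts for under a tenth of the delay, so further VAD tuning alone is unlikely to close the gap. The remaining seconds are being spent elsewhere, for example in the model's response time, timeouts, or tool calls.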
Hello, I have been facing the same issue for a long time now. Can you please suggest what other parameters can be used to fix this? Or is this expected behavior?
My stack:
Optimizations tried:
- Added `silero.VAD` for faster speech-end detection with `min_speech_duration=0.1`, `min_silence_duration=0.25`, `activation_threshold=0.5`, `deactivation_threshold=0.40`, `prefix_padding_duration=0.0`
This question has a lot of views, so I will update and improve the answer.
The best general resource for understanding and improving agent latency is this blog:
To address some of the specifics in the question:
- Under ‘optimizations tried’, the original question implies the OP is using LiveKit’s turn detection. Gemini, like other realtime models, has its own built-in turn detection, which should be used unless you have a good reason to need a separate turn detection model. See: Gemini Live API plugin | LiveKit Documentation
- I have seen reports that the provider tools in Gemini Live can add latency, so it is worth testing without those.
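When testing variations like these (with and without provider tools, with and without a separate turn detection model), it helps to compare numbers rather than impressions. Here is a minimal sketch, assuming you can record two timestamps per turn yourself: when the caller stopped speaking and when the agent's first audio started. The `Turn` class and `summarize` function are hypothetical helpers, not part of LiveKit or the Gemini plugin, and the timestamps are placeholders rather than real measurements.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Turn:
    user_stopped_at: float   # seconds, when the caller finished speaking
    agent_started_at: float  # seconds, when the agent's first audio began

    @property
    def latency(self) -> float:
        return self.agent_started_at - self.user_stopped_at


def summarize(turns: list[Turn]) -> dict[str, float]:
    """Return median and worst-case reply latency for one configuration."""
    latencies = [t.latency for t in turns]
    if not latencies:
        raise ValueError("no turns recorded")
    return {"median": median(latencies), "max": max(latencies)}


# Placeholder timestamps, not real measurements: replace with your own logs.
runs = {
    "baseline": [Turn(0.0, 6.0), Turn(10.0, 17.0), Turn(20.0, 25.0)],
    "variant": [Turn(0.0, 2.0), Turn(10.0, 11.0), Turn(20.0, 23.0)],
}
for name, turns in runs.items():
    stats = summarize(turns)
    print(f"{name}: median {stats['median']:.1f}s, max {stats['max']:.1f}s")
```

Change one thing at a time between runs so that any difference can be attributed to that change.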