Others have experienced latency with Gemini Realtime as well. Some observations:
Timeout errors: Check your logs. The Gemini realtime plugin can hit timeout errors frequently, and each one adds to the lag. This is more common on the free tier of the Gemini API.
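Tool calls: If you’re using tool calls, they might be slowing things down. The model generally has to wait for a tool's result before it can answer, so a slow tool can show up directly as response lag.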
Alternative models: For comparison, GPT-4o and GPT-4o-mini can achieve sub-1000 ms latency. Gemini is good, but it’s currently not as reliable or consistent as OpenAI’s models in terms of latency and response quality.
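VAD tuning: Your approach of reducing min_silence_duration is correct, since a shorter silence window means the end of a turn is detected sooner. Setting it too low can cut users off during natural pauses, so lower it gradually. You might also experiment with other VAD parameters.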
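To check whether tool calls are actually part of the problem, it can help to time them. Below is a minimal sketch, not tied to any particular agent framework: `timed_tool`, `TOOL_TIMINGS`, and `lookup_weather` are hypothetical names made up for illustration, and the `asyncio.sleep` stands in for a slow external call. Whether the wrapper stacks cleanly with your framework's own tool decorator is an assumption you'd need to verify.

```python
import asyncio
import functools
import time

# Collected (tool_name, seconds) pairs; a hypothetical in-memory store for this sketch.
TOOL_TIMINGS: list[tuple[str, float]] = []


def timed_tool(fn):
    """Wrap an async tool function and record how long each call takes."""

    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return await fn(*args, **kwargs)
        finally:
            # Record the duration even if the tool raised an exception.
            TOOL_TIMINGS.append((fn.__name__, time.perf_counter() - start))

    return wrapper


@timed_tool
async def lookup_weather(city: str) -> str:
    # Hypothetical tool; the sleep stands in for a slow external API call.
    await asyncio.sleep(0.05)
    return f"Sunny in {city}"


if __name__ == "__main__":
    print(asyncio.run(lookup_weather("Paris")))
    for name, seconds in TOOL_TIMINGS:
        print(f"{name}: {seconds * 1000:.0f} ms")
```

If the recorded times are a large share of the lag you're seeing, the tools are worth optimizing (caching, shorter timeouts on upstream calls) before blaming the model. If they're small, the delay is more likely coming from the model or from end-of-turn detection.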