If you use the Gemini + LiveKit plugin directly, you’ll likely experience high latency. In my case, it was around 3-3.5 seconds.
After deeper investigation, I found that TTFT (time to first token) was the main issue: it was sitting at roughly 2 to 2.5 seconds, which is extremely high.
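If you want to check time to first token in your own setup, here is a minimal sketch of the idea. It does not use the LiveKit or Gemini APIs. `fake_model_stream` is a hypothetical stand-in for a streaming response, and the 50 ms delay is an arbitrary illustration value, not a measurement.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_item(stream: Iterable[str]) -> Tuple[float, str]:
    """Return (seconds until the first item arrived, the first item)."""
    start = time.perf_counter()
    iterator = iter(stream)
    first = next(iterator)  # blocks until the stream yields its first chunk
    return time.perf_counter() - start, first


def fake_model_stream(startup_delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response."""
    time.sleep(startup_delay_s)  # simulates work done before the first token
    yield "hello"
    yield "world"


ttft, token = time_to_first_item(fake_model_stream(0.05))
print(f"time to first token: {ttft * 1000:.0f} ms, first token: {token!r}")
```

Running the same measurement once with tools enabled and once without should make the difference visible, assuming your real stream can be wrapped the same way.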
Digging further, the root cause turned out to be enabled tools, specifically the Google Search tool. These tools significantly slow down the agent.
The fix is simple once you identify it:
- Remove search tools
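As a rough sketch of what this looks like in code, the snippet below filters search tools out of a tool list before it would be handed to the agent. The `Tool` class, the tool names, and `without_search_tools` are all hypothetical and only illustrate the idea; the actual plugin has its own tool types and configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool descriptor; real plugins define their own types."""
    name: str


def without_search_tools(tools: List[Tool]) -> List[Tool]:
    """Return only the tools whose name does not mention 'search'."""
    return [t for t in tools if "search" not in t.name.lower()]


# Hypothetical tool list for illustration only.
configured = [Tool("google_search"), Tool("get_weather"), Tool("book_meeting")]
realtime_tools = without_search_tools(configured)
print([t.name for t in realtime_tools])  # ['get_weather', 'book_meeting']
```

In practice the simplest version of the fix is to not register the search tool at all; a filter like this is only useful if the tool list is built somewhere else.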
There is currently no clear mention in the documentation that these tools can cause such high latency, which makes the issue harder to diagnose.
Additionally, I’ve seen some discussions suggesting that disabling noise cancellation may help. I haven’t fully verified this, but removing search tools definitely has a major impact.
After removing them:
- TTFT (time to first token) dropped from ~2–2.5 seconds to ~100 ms (overall latency around 1 to 1.5 seconds)
Such a low value is possible because the model is speech-to-speech and can start generating output tokens before the speaker finishes speaking. In some cases, the measured latency can even come out negative, which is ideal for real-time systems.
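To make the negative case concrete, here is a small worked example. It assumes latency is measured from the moment the user stops speaking to the moment the first output arrives; the function name and the timestamps are hypothetical, not taken from any real trace.

```python
def response_latency_ms(user_speech_end_ms: float, first_output_ms: float) -> float:
    """Latency relative to the end of the user's speech (negative = early start)."""
    return first_output_ms - user_speech_end_ms


# Hypothetical timestamps, in milliseconds since the session started.
late_start = response_latency_ms(user_speech_end_ms=5000, first_output_ms=5100)
early_start = response_latency_ms(user_speech_end_ms=5000, first_output_ms=4900)

print(late_start)   # 100: first output arrived 100 ms after the user stopped
print(early_start)  # -100: first output arrived 100 ms before the user stopped
```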
Conclusion
To significantly reduce latency:
- Remove search tools
- Optionally test disabling noise cancellation
This alone can dramatically improve responsiveness.