We have been testing with Gemini Live 3.1 and see some really weird behaviour. For example, when we speak Dutch, it sometimes transcribes in a completely different language, even though the model does seem to hear what we say correctly. Is this a known limitation of the model?
Also, when we connect an STT model to the AgentSession (to get better transcription, for example), we end up with two streams of user transcription.
I am thinking about running, for example, Soniox STT or Gladia STT alongside Gemini Live 3.1, to get correct transcription while avoiding two transcript streams being recorded in the chat transcript. What would be a good approach for this?
Another point: is it recommended to use Silero VAD with realtime models like Gemini?
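For the dual-transcript question, one approach worth trying is to pass a dedicated STT plugin (and Silero VAD) to the `AgentSession` and disable the realtime model's own input transcription so only one user transcript stream is produced. A minimal sketch, assuming the LiveKit Agents Python SDK with the `google`, `soniox`, and `silero` plugins installed; the `input_audio_transcription=None` parameter is my assumption about how to turn off the model-side transcript, so check the plugin docs before relying on it:

```python
# Hedged sketch: AgentSession with a realtime LLM plus a separate STT for
# user transcripts. Plugin/parameter names are assumptions, not verified.
from livekit.agents import AgentSession
from livekit.plugins import google, soniox, silero

session = AgentSession(
    llm=google.beta.realtime.RealtimeModel(
        # Assumption: passing None here disables Gemini's own input
        # transcription, so only the dedicated STT stream remains.
        input_audio_transcription=None,
    ),
    stt=soniox.STT(),      # dedicated STT for accurate user transcripts
    vad=silero.VAD.load(), # turn detection alongside the realtime model
)
```

If the parameter name differs in your plugin version, the general idea still holds: keep exactly one source of user transcription enabled.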
It is a speech-to-speech model, bro. It doesn't operate on text. If you want the transcription for post-call analysis, I can't say for sure, but there should be a method within the plugin to do so.
Also, if you're planning to use this agent for telephony, don't. I have wasted days trying to fix an unsolvable problem. The latency is going to be somewhere around 1.5 to 2.5 whole seconds for Gemini to take in the input audio and generate output speech. I don't know exactly why this happens over telephony; as you probably know, telephony providers use 8 kHz audio, which may be part of it. All in all, it's not a good idea, because you will hit a latency bottleneck you cannot do anything about: Gemini 3.1 Flash Live takes the bulk of the processing time, and there is literally no way to speed it up.
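On the 8 kHz point: telephony audio usually has to be upsampled before it reaches the model (Gemini's Live audio input expects higher-rate PCM, commonly 16 kHz). The resampling itself is cheap and is unlikely to explain seconds of latency. A minimal sketch of naive linear-interpolation upsampling, assuming NumPy; a production pipeline would use a proper polyphase/band-limited resampler instead:

```python
import numpy as np

def resample_linear(samples: np.ndarray, src_rate: int, dst_rate: int) -> np.ndarray:
    """Naive linear-interpolation resampler (illustration only;
    real pipelines should use a band-limited filter)."""
    n_out = int(len(samples) * dst_rate / src_rate)
    src_t = np.arange(len(samples)) / src_rate   # timestamps of input samples
    dst_t = np.arange(n_out) / dst_rate          # timestamps of output samples
    return np.interp(dst_t, src_t, samples)

# A 20 ms telephony frame at 8 kHz is 160 samples; at 16 kHz it becomes 320.
frame_8k = np.zeros(160)
frame_16k = resample_linear(frame_8k, 8000, 16000)
```

The takeaway matches the post above: sample-rate conversion adds microseconds, so the 1.5 to 2.5 s figure is dominated by the model's own processing, not the audio plumbing.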
Given that you are looking to test across models and keep monitoring this across your test cases, we built Cekura.ai for exactly that: automated 1000+ test cases across 100+ metrics and scenario judges, with A/B testing across models.