LiveKit inference for Gemini 3.1 Flash Lite: when?

Gemini 3.1 Flash Lite is 2.5x faster than Gemini 2.5 Flash and also scores higher on intelligence benchmarks. It would be nice to have 3.1 Flash Lite available for its time-to-first-token response and overall speed. Colocating it with the voice agent would make the STT → LLM → TTS → telephony pipeline even lower latency.

https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-lite/

You can access https://ai.google.dev/gemini-api/docs/models/gemini-3.1-flash-lite-preview through the Google plugin today:

from livekit.plugins import google

llm = google.LLM(
    model="gemini-3.1-flash-lite-preview",
)

It is not yet available through LiveKit inference, but it should be soon.

We are waiting for that.

It’s live btw :slight_smile: Though it seems like it might be adding a bit of latency rather than decreasing it. I’m not sure yet; need to do more testing.
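One way to make that testing concrete is to time how long the first token takes to arrive from each backend. Here's a minimal, backend-agnostic sketch (the helper name and the fake stream are made up for illustration; plug in a real LLM token stream to compare):

```python
import time
from typing import Iterable, Iterator, Tuple

def time_to_first_token(stream: Iterable[str]) -> Tuple[float, Iterator[str]]:
    """Return (seconds until the first chunk arrives, the full stream replayed)."""
    it = iter(stream)
    start = time.perf_counter()
    first = next(it)  # blocks until the source emits its first token
    elapsed = time.perf_counter() - start

    def replay() -> Iterator[str]:
        yield first
        yield from it

    return elapsed, replay()

# Fake stream that delays its first token, standing in for a real LLM response:
def fake_stream() -> Iterator[str]:
    time.sleep(0.05)
    yield "Hello"
    yield " world"

ttft, tokens = time_to_first_token(fake_stream())
text = "".join(tokens)
print(f"TTFT: {ttft:.3f}s, text: {text!r}")
```

Running the same measurement against the plugin path and the inference path (same prompt, a few dozen trials each) should show whether inference is actually adding latency.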