All LiveKit Inference Gemini LLMs return "completion_tokens=0" and stop responding; this suddenly started today without any code change

CX_PROGRAMMER_AND_DESIGNER · May 27, 2026, 11:31am

Yesterday I was testing my product using LiveKit Inference endpoints with STT = deepgram/nova-3, LLM = gemini 3.1 flash lite, and TTS = Cartesia/sonic-3, and it was working perfectly. Today, without any code change, my product goes silent after one or two turns and nothing further happens. When I checked the logs, I saw that the LLM is returning “completion_tokens=0”. I also tried switching the LLM, and other models still work fine today, so the issue is limited to the Google models.

CWilson · May 27, 2026, 11:53am

Is it possible you are hitting Google’s moderation? Have you checked with them on what may be happening?

CX_PROGRAMMER_AND_DESIGNER · May 27, 2026, 5:45pm

Thanks for the hint. That could be the reason, because Google rolled out its latest moderation changes on May 21.
But how do I confirm it? While streaming, I can’t see any logs that explain why the LLM returns “completion_tokens=0”.
Basically, I want advice and guidance on how to evaluate my prompts against Google’s latest moderation.

CWilson · May 27, 2026, 6:05pm

The Google community will be the experts on this. I don’t have specific knowledge of Google’s recommendations on this topic.

Muhammad_Usman_Bashir · May 28, 2026, 2:19am

@CX_PROGRAMMER_AND_DESIGNER, completion_tokens=0 with empty output is the signature of a Gemini safety block. The reason lies in the raw response: finishReason="SAFETY" plus safetyRatings (category + probability), and promptFeedback.blockReason if the prompt itself was blocked [ ai.google.dev/gemini-api/docs/safety-settings ]. LiveKit Inference surfaces token counts, not those fields, so you can’t see the cause through it.
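To make those fields concrete, here is a minimal Python sketch that reads a raw Gemini generateContent JSON response and reports why the output came back empty. The function name explain_empty_completion and the sample payload are hypothetical and hand-written for illustration, not a captured log; only the field names (finishReason, safetyRatings, promptFeedback.blockReason) come from the Gemini docs cited above.

```python
from typing import Any


def explain_empty_completion(response: dict[str, Any]) -> list[str]:
    """Collect safety-related reasons from a raw Gemini generateContent response."""
    reasons: list[str] = []

    # Prompt-level block: the request was rejected before any generation.
    block_reason = response.get("promptFeedback", {}).get("blockReason")
    if block_reason:
        reasons.append(f"prompt blocked: {block_reason}")

    # Candidate-level block: generation ended for something other than a normal stop.
    for index, candidate in enumerate(response.get("candidates", [])):
        finish_reason = candidate.get("finishReason")
        if finish_reason and finish_reason not in ("STOP", "MAX_TOKENS"):
            # Only list the categories that were actually flagged.
            flagged = [
                f"{rating.get('category')}={rating.get('probability')}"
                for rating in candidate.get("safetyRatings", [])
                if rating.get("probability") not in (None, "NEGLIGIBLE")
            ]
            detail = f" ({', '.join(flagged)})" if flagged else ""
            reasons.append(f"candidate {index} finished with {finish_reason}{detail}")

    return reasons


# Hand-written sample payload (an assumption for illustration, not a real log).
sample = {
    "candidates": [
        {
            "finishReason": "SAFETY",
            "safetyRatings": [
                {"category": "HARM_CATEGORY_HARASSMENT", "probability": "MEDIUM"},
                {"category": "HARM_CATEGORY_HATE_SPEECH", "probability": "NEGLIGIBLE"},
            ],
        }
    ],
    "usageMetadata": {"promptTokenCount": 42, "candidatesTokenCount": 0},
}

for line in explain_empty_completion(sample):
    print(line)
```

An empty list from this helper would suggest the zero-token turn was not a safety block, in which case the cause is probably elsewhere.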

In my opinion, to see it: replay a failing turn in Google AI Studio or against the Gemini API directly, where finishReason and safetyRatings are visible. To fix over-moderation: set Gemini’s safetySettings per-category thresholds (HARM_CATEGORY_HATE_SPEECH, HARM_CATEGORY_HARASSMENT, HARM_CATEGORY_SEXUALLY_EXPLICIT, HARM_CATEGORY_DANGEROUS_CONTENT) to a looser level such as BLOCK_ONLY_HIGH or BLOCK_NONE. Inference doesn’t expose these, but the direct Google plugin does: google.LLM(safety_settings=[...]) forwards them to Gemini [ livekit/agents/…/google/llm.py ]. Switching to the plugin with your own key gives both the block-reason visibility and the threshold control.
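A rough sketch of what that switch could look like in Python follows. The helper names build_safety_settings and make_llm are hypothetical. Passing the settings as plain dicts, and the model and api_key keyword arguments, are assumptions about the plugin, so check them against the plugin version you have installed. The model id is left to the caller on purpose; use whichever Gemini model you already run.

```python
from typing import Any

# Category and threshold names as listed in Gemini's safety-settings docs.
CATEGORIES = (
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
)
THRESHOLDS = {
    "BLOCK_LOW_AND_ABOVE",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_ONLY_HIGH",
    "BLOCK_NONE",
}


def build_safety_settings(threshold: str = "BLOCK_ONLY_HIGH") -> list[dict[str, str]]:
    """Return one category/threshold entry per harm category."""
    if threshold not in THRESHOLDS:
        raise ValueError(f"unknown threshold: {threshold}")
    return [{"category": category, "threshold": threshold} for category in CATEGORIES]


def make_llm(model: str, api_key: str, threshold: str = "BLOCK_ONLY_HIGH") -> Any:
    """Build the direct Google plugin LLM with relaxed safety thresholds.

    Assumption: google.LLM accepts model, api_key and safety_settings keywords,
    and the settings may be given as dicts rather than SafetySetting objects.
    """
    # Imported lazily so build_safety_settings works without the plugin installed.
    from livekit.plugins import google

    return google.LLM(
        model=model,
        api_key=api_key,
        safety_settings=build_safety_settings(threshold),
    )
```

Starting with BLOCK_ONLY_HIGH rather than BLOCK_NONE keeps some filtering in place while you confirm whether moderation is really what is silencing the agent.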

CWilson · May 28, 2026, 7:10pm

@CX_PROGRAMMER_AND_DESIGNER Do you use Agent Insights? If so, can you share a session where you saw the zero tokens? If not, do you have the full agent log from when this happened?

Someone on our team tried to reproduce this but did not see this behavior exactly.
