Yesterday I was testing my product using LiveKit Inference endpoints with STT = deep gram/nova3 and LLM = gemini 3.1 flash lite and TTS = Cartesia/sonic-3 and it was working perfectly fine , without any code change today suddenly I see that after one or two turns my product just stays silent and nothing happens and when I checked the logs I saw that my LLM is returning “Completion_tokens=0”. I also tried to change the LLM model and it works amazing with other models today as well , but its an issue with just google models
Is it possible you are hitting Google’s Moderation? Have checked with them on what may be happening?
Thanks for the hint , it could be the reason because google rolled out its latest moderations on 21st may .
But how do I know that , while streaming I am unable to see those logs that why the LLM returns “Completion_token=0”.
Basically I want advice and guidance on how do I evaluate my prompts on the latest moderations by google
Google community will be the experts on this. I don’t have specific knowledge of Googles recommendations on this topic.
@CX_PROGRAMMER_AND_DESIGNER, completion_tokens=0 with empty output is the signature of a Gemini safety block. The reason lies in the raw response: finishReason="SAFETY" plus safetyRatings (category + probability), and promptFeedback.blockReason if the prompt itself was blocked [ ai.google.dev/gemini-api/docs/safety-settings ]. LiveKit Inference surfaces token counts, not those fields, so you can’t see the cause through it.
In my opinion, to see it: replay a failing turn in Google AI Studio or against the Gemini API directly, where finishReason and safetyRatings show. To fix over-moderation: set Gemini’s safetySettings per-category thresholds
( HARM_CATEGORY_HATE_SPEECH, HARASSMENT, SEXUALLY_EXPLICIT, DANGEROUS )
like BLOCK_ONLY_HIGH or BLOCK_NONE. Inference doesn’t expose these, but the direct Google plugin does: google.LLM(safety_settings=[...]) forwards them to Gemini [ livekit/agents/…/google/llm.py ]. Switching to the plugin with your own key gives both the block-reason visibility and the threshold control.
@CX_PROGRAMMER_AND_DESIGNER Do you use Agent insights? If so can you share a session where you saw the zero tokens. If not using agent insights do you have full agent log from when this happened?
Someone on our team tried to reproduce this but did not see this behavior exactly.