I have encountered the following issue when using the Cerebras provider with LiveKit:
At times, the agent gets stuck in the “thinking” state, after which it reverts back to the “listening” state. In the logs, I see the following entry:
{"llm":"cerebras.LLM","totalChunks":1,"textLength":0,"msg":"FallbackAdapter: Provider succeeded"}.
Interestingly, this problem only occurs when reasoning_effort is enabled. If I disable it, the model does not get stuck; however, the response quality drops significantly. Are there any recommendations regarding this issue, or has anyone encountered something similar?
@bogdan_taradada, textLength=0 is the smoking gun: the provider returned a chunk with no content, so the agent had nothing to speak and reverted to listening.
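If you want to count how often this happens, here is a minimal sketch that flags such entries in a log stream. It assumes the logs are one JSON object per line with the `msg` and `textLength` keys shown in your entry; `is_empty_response` is a hypothetical helper name, not part of LiveKit.

```python
import json


def is_empty_response(log_line: str) -> bool:
    """Return True for a 'Provider succeeded' entry that carried no text.

    Assumes one JSON object per line, shaped like the entry quoted above.
    """
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return False  # not a JSON log line, ignore it
    if not isinstance(entry, dict):
        return False
    return (
        entry.get("msg") == "FallbackAdapter: Provider succeeded"
        and entry.get("textLength") == 0
    )


sample = '{"llm":"cerebras.LLM","totalChunks":1,"textLength":0,"msg":"FallbackAdapter: Provider succeeded"}'
print(is_empty_response(sample))
```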
- The reason this only fires with reasoning_effort on is structural. Cerebras puts reasoning output in a separate reasoning field on the message object, distinct from content [inference-docs.cerebras.ai/api-reference/chat-completions].
- The Cerebras plugin extends OpenAILLM and only customizes request compression, so response parsing flows through the shared OpenAI-compatible path [livekit-plugins-cerebras/livekit/plugins/cerebras/llm.py].
- That parser reads delta.content (after stripping tokens) plus an optional extra_content field, and returns None when both are empty. There is no handling of delta.reasoning [livekit-agents/livekit/agents/inference/llm.py _parse_choice, lines ~527-534].
- With reasoning_effort on, the model's output can end up entirely in the reasoning channel while the content channel comes back empty, which is exactly what your log shows.
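To make the failure mode concrete, here is a simplified stand-in for the parsing behaviour described above. It is not the actual _parse_choice code: the `Delta` class and `parse_choice_sketch` are hypothetical names, and the logic only mirrors the description (content and extra_content are consulted, reasoning is not).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delta:
    """Hypothetical stand-in for the delta of one streamed chunk."""
    content: Optional[str] = None
    reasoning: Optional[str] = None
    extra_content: Optional[dict] = None


def parse_choice_sketch(delta: Delta) -> Optional[str]:
    """Mirror the described behaviour: only content/extra_content count."""
    if not delta.content and not delta.extra_content:
        return None  # delta.reasoning is never looked at
    return delta.content or ""


# A chunk that carries only reasoning yields nothing speakable.
print(parse_choice_sketch(Delta(reasoning="Let me think about this...")))
# A chunk with regular content passes through unchanged.
print(parse_choice_sketch(Delta(content="Hello!")))
```

Under this model, a response whose only chunk carries reasoning produces no text at all, which would match totalChunks=1 with textLength=0.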
Some practical options: keep reasoning_effort off and compensate in the system prompt, or handle the reasoning field in the parser. It's a small change inside _parse_choice. If you need reasoning plus speech today, run reasoning as a separate call and feed only the final answer into the speakable turn.
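A rough sketch of the separate-call approach follows. The two callables are assumptions: `reason` would wrap an LLM call with reasoning_effort enabled, `respond` a call with it disabled. Neither is a LiveKit or Cerebras API, and the prompt layout is only illustrative.

```python
from typing import Callable


def speakable_answer(
    user_text: str,
    reason: Callable[[str], str],
    respond: Callable[[str], str],
) -> str:
    """Run reasoning first, then ask for a short answer that is safe to speak.

    `reason` and `respond` are hypothetical wrappers around your own LLM calls.
    """
    notes = reason(user_text)
    prompt = f"{user_text}\n\nNotes (do not read aloud):\n{notes}"
    answer = respond(prompt).strip()
    if not answer:
        # Surface the empty-content case instead of silently going quiet.
        raise ValueError("speaking call returned no content")
    return answer


# Usage with stand-in functions in place of real LLM calls:
fake_reason = lambda text: "User wants opening hours; the store opens at 9."
fake_respond = lambda prompt: " We open at 9 am. "
print(speakable_answer("When do you open?", fake_reason, fake_respond))
```

Raising on an empty answer is a design choice: it lets the caller retry or fall back rather than letting the agent drop back to listening with nothing said.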