Cerebras plugin

I have encountered the following issue when using the Cerebras provider with LiveKit:

At times, the agent gets stuck in the “thinking” state, after which it reverts back to the “listening” state. In the logs, I see the following entry:
{"llm":"cerebras.LLM","totalChunks":1,"textLength":0,"msg":"FallbackAdapter: Provider succeeded"}.

Interestingly, this problem only occurs when reasoning_effort is enabled. If I disable it, the model does not get stuck; however, the response quality drops significantly. Are there any recommendations regarding this issue, or has anyone encountered something similar?

@bogdan_taradada, textLength=0 is the smoking gun: the provider returned a chunk with no content, so the agent had nothing to speak and reverted to listening.

  • The reason this only fires with reasoning_effort on is structural. Cerebras puts reasoning output in a separate reasoning field on the message object, distinct from content [ inference-docs.cerebras.ai/api-reference/chat-completions ].

  • The Cerebras plugin extends OpenAILLM and only customizes request compression, so response parsing flows through the shared OpenAI-compatible path [ livekit-plugins-cerebras/livekit/plugins/cerebras/llm.py ].

  • That parser reads delta.content (after stripping tokens) plus an optional extra_content field, and returns None when both are empty. There is no handling of delta.reasoning [livekit-agents/livekit/agents/inference/llm.py _parse_choice, lines ~527-534 ].

  • With reasoning_effort on, the effort budget lands in the reasoning channel and the content channel can come back empty, which is exactly what your log shows.

Some of practical options: keep reasoning_effort off and compensate in the system prompt It’s a small change inside _parse_choice. If you need reasoning plus speech today, run reasoning as a separate call and feed only the final answer into the speakable turn.