In one session, the agent didn’t follow up after calling a tool: it said it would do something, invoked the tool, and then went silent. Any attempt to nudge it with another message produced the same behavior again: it said it would do something, called the tool, and then didn’t follow up.
I couldn’t reproduce this in other sessions, but I suspect it’s likely to happen again, even if rarely.
@royibernthal
This is likely due to a behavioral difference between xAI Realtime and OpenAI Realtime in how post-tool turns are triggered.
With xAI (and other non-OpenAI realtime models), LiveKit swaps tools for the reply then restores them, but that restoration doesn’t automatically trigger a follow-up turn the way OpenAI Realtime does. Grok just goes silent if it doesn’t self-initiate.
Two things to try: explicitly call generate_reply() after your tool returns, and check max_tool_steps. If the limit is hit and the final LLM call produces no audio with xAI, you’d see exactly this pattern.
The inconsistency across sessions points to model-side behavior rather than your code.
@royibernthal, the agents framework handles the multi-step case explicitly: after tool execution, if num_steps >= max_tool_steps + 1, it forces tool_choice="none" on the follow-up call to “guarantee a final text response instead of silently stopping” [agent_activity.py:2855-2914].
xAI Realtime under that tool_choice="none" constraint can return text without producing audio. That fits your symptom: tool fires, then silence. The intermittency would line up with whether the specific turn happens to hit max_steps_reached or draining state.
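To make the step-limit condition quoted above concrete, here is a minimal standalone sketch of the decision. It is not LiveKit's actual code: `followUpToolChoice`, the `ToolChoice` type, and the example limit of 3 are all assumptions for illustration; only the comparison `num_steps >= max_tool_steps + 1` comes from the quoted source.

```typescript
// Hypothetical stand-in for the framework's tool_choice values.
type ToolChoice = 'auto' | 'none';

// Mirrors the condition quoted above: once the step count reaches
// maxToolSteps + 1, the follow-up call is forced to tool_choice "none".
// The name and signature are illustrative, not part of any LiveKit API.
function followUpToolChoice(numSteps: number, maxToolSteps: number): ToolChoice {
  return numSteps >= maxToolSteps + 1 ? 'none' : 'auto';
}

// Example trace with an assumed limit of 3 tool steps.
const maxToolSteps = 3;
for (let step = 1; step <= 5; step++) {
  console.log(`step ${step}: tool_choice=${followUpToolChoice(step, maxToolSteps)}`);
}
```

Under this reading, a tool chain only ever hits the forced "none" call when it runs one step past the limit, which would be consistent with the symptom showing up in some sessions and not others.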
Two practical steps follow. Raise max_tool_steps if your tool chains are deeper than the default, and add a defensive watchdog on the post-tool turn:
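let nudged = false // ensures at most one nudge per run of empty turns
session.on(voice.AgentSessionEventTypes.MetricsCollected, (ev) => {
  // assumption: only realtime-model metrics carry outputTokens; skip other metric types
  if (ev.metrics.type !== 'realtime_model_metrics') return
  // a turn that produced output clears the guard
  if (ev.metrics.outputTokens > 0) { nudged = false; return }
  // if the post-tool turn produced no audio output, nudge once
  if (!nudged) {
    nudged = true
    session.generateReply()
  }
})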
That same handler doubles as confirmation if you’re trying to reproduce: outputTokens=0 on the post-tool RealtimeModelMetrics confirms the empty-audio path.
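If it helps with reproduction, the empty-output check can be pulled out into a small predicate that is easy to log and unit-test on its own. This is only a sketch: `MetricsLike` is a hypothetical minimal shape rather than the library's real metrics type, and the `'realtime_model_metrics'` discriminator string is an assumption to verify against your installed version.

```typescript
// Assumed minimal shape; the real metrics objects carry more fields.
interface MetricsLike {
  type: string;
  outputTokens?: number;
}

// True only for realtime-model metrics that report zero output tokens.
// Other metric types, or metrics without outputTokens, return false.
function isEmptyRealtimeTurn(m: MetricsLike): boolean {
  return m.type === 'realtime_model_metrics' && m.outputTokens === 0;
}

// Example: log each metrics object together with the verdict.
const samples: MetricsLike[] = [
  { type: 'realtime_model_metrics', outputTokens: 0 },
  { type: 'realtime_model_metrics', outputTokens: 42 },
  { type: 'stt_metrics' },
];
for (const m of samples) {
  console.log(m.type, 'empty turn:', isEmptyRealtimeTurn(m));
}
```

Calling this from the metrics handler and logging the result next to a timestamp should show whether the silent turns line up with tool calls.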