Agent speaking audio_text tokens out loud

This question originally came up in our Slack community and the thread has been consolidated here for long-term reference.

I’m facing an unusual problem with my agent. Sometimes it sends and speaks values like <|audio_text|>, as in the example below:

{
  "id": "item_Clh60LcmD6UIgb8CEL6U4",
  "type": "message",
  "role": "assistant",
  "content": ["<|audio_text|>"]
}

My AgentSession uses OpenAI Realtime with Azure and ElevenLabs TTS. Any idea what this could be?

The <|audio_text|> token comes from the LLM and should stay internal to it, but it can occasionally leak into the output.

Rather than trying to adjust the LLM with prompts, it’s probably more reliable to intercept and replace these in a custom llm_node. See this example that removes <think> tags:

The shared link is incorrect.

It gives a 404 Not Found.

@Ahmed_Aziz Apologies for the inconvenience. I have updated the link.