Agent speaking audio_text tokens out loud

LiveKit-Community · January 21, 2026, 1:40pm

This question originally came up in our Slack community and the thread has been consolidated here for long-term reference.

I’m facing an unusual problem with my agent. Sometimes it sends and speaks values like <|audio_text|>, as in the example below:

{
  "id": "item_Clh60LcmD6UIgb8CEL6U4",
  "type": "message",
  "role": "assistant",
  "content": ["<|audio_text|>"]
}

My AgentSession uses OpenAI Realtime with Azure and ElevenLabs TTS. Any idea what this could be?

LiveKit-Community · January 21, 2026, 1:40pm

That <|audio_text|> token comes from the LLM. It is meant to stay internal to the model, but it can occasionally leak into the output.

Rather than trying to adjust the LLM with prompts, it’s probably more reliable to intercept and replace these in a custom llm_node. See this example that removes <think> tags:

github.com/livekit-examples/python-agents-examples, file docs/examples/replacing_llm_output/replacing_llm_output.py (branch main):
      
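    # Excerpt from inside the custom llm_node: stream the LLM output and rewrite it chunk by chunk.
    async with self._llm.chat(chat_ctx=chat_ctx, tools=tools, tool_choice=None) as stream:
        async for chunk in stream:
            if chunk is None:
                continue

            # Chunks may be objects with a .delta.content field or plain strings.
            content = getattr(chunk.delta, 'content', None) if hasattr(chunk, 'delta') else str(chunk)
            if content is None:
                # No text content in this chunk; pass it through unchanged.
                yield chunk
                continue

            # Drop the opening tag and replace the closing tag with a spoken phrase.
            processed_content = content.replace("<think>", "").replace("</think>", "Okay, I'm ready to respond.")
            print(f"Original: {content}, Processed: {processed_content}")

            # Write the cleaned text back only when something changed.
            if processed_content != content:
                if hasattr(chunk, 'delta') and hasattr(chunk.delta, 'content'):
                    chunk.delta.content = processed_content
                else:
                    chunk = processed_content

            yield chunk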

Ahmed_Aziz · March 5, 2026, 9:30pm

the shared link is incorrect

Ahmed_Aziz · March 5, 2026, 9:30pm

gives a 404 not found

darryncampbell · March 6, 2026, 8:09am

@Ahmed_Aziz apologies for the inconvenience, I have updated the link
