This question originally came up in our Slack community and the thread has been consolidated here for long-term reference.
I’m building a multilingual voice agent using LiveKit agents with LangChain + ElevenLabs TTS. The agent needs to dynamically switch TTS voice/language based on what language the LLM responds in (e.g., user speaks Spanish → LLM replies in Spanish → TTS should use Spanish voice).
My current approach polls the LangGraph state every 0.5s for new AI messages, runs language detection, and calls tts.update_options(). But this creates a race condition: TTS often starts speaking before the monitor detects the language change.
Is there a cleaner way to hook into the TTS pipeline before audio generation starts?
Override the tts_node method in your agent. tts_node receives an AsyncIterable[str] of text chunks from the LLM, so you can inspect the stream and switch voices before any audio is generated:
```python
from typing import AsyncIterable

from langdetect import detect  # pip install langdetect
from livekit.agents import Agent, ModelSettings


async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
    async def process_text():
        accumulated_text = ""
        language_checked = False
        async for chunk in text:
            accumulated_text += chunk
            # Detect language once you have enough text, and only once per turn
            if not language_checked and len(accumulated_text.strip()) >= 20:
                language_checked = True
                detected_lang = detect(accumulated_text)  # returns a code like "es"
                if detected_lang != self.current_language:
                    self.tts.update_options(
                        language=detected_lang,
                        voice_id=LANGUAGE_VOICE_MAPPING[detected_lang],
                    )
                    self.current_language = detected_lang
            yield chunk

    return Agent.default.tts_node(self, process_text(), model_settings)