This question originally came up in our Slack community and the thread has been consolidated here for long-term reference.
I’m building a voice agent for conversations in local languages like Hindi, Punjabi, and Tamil.
Questions:
- Is there a way to use turn_detection with local languages?
- If I have to use VAD, is there a way to pass the language in metadata and use turn_detection when it’s a supported language, and fall back to VAD when it’s not?
-
The multilingual turn detection model will work with Hindi, but for Punjabi and Tamil you would have to use VAD.
-
You can switch turn detection dynamically using session.update_options:
Enable multilingual turn detection:
session.update_options(turn_detection=MultilingualModel())
Fall back to VAD:
session.update_options(turn_detection="vad")
You would need to configure the language beforehand or determine it during the call (e.g., using LLM), as most STT providers don’t report the detected language on the fly.