I am implementing a real-time voice assistant using LiveKit Agents with Deepgram STT and ElevenLabs TTS.
Problem:
Deepgram is not recognizing Hindi speech correctly in live streaming. English works fine, but Hindi or Hinglish is either transcribed incorrectly or not detected at all.
Current setup:
- STT: Deepgram (nova-2 / nova-3)
- Language: hi / multi
- LiveKit AgentSession
- VAD: silero
- Noise cancellation enabled
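A minimal sketch of how a setup like this might be wired, for reference. The `VoiceSetup` dataclass and `build_session` helper are hypothetical names used only for illustration. The plugin calls (`deepgram.STT`, `elevenlabs.TTS`, `silero.VAD.load`) are written as I understand the LiveKit Agents Python plugins and should be checked against the installed version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSetup:
    """Hypothetical container for the settings listed above."""
    stt_model: str = "nova-3"        # or "nova-2"
    stt_language: str = "multi"      # or "hi"
    vad: str = "silero"
    noise_cancellation: bool = True


def build_session(setup: VoiceSetup):
    """Build an AgentSession from the setup (assumes the LiveKit plugins are installed).

    Imports are local so the config can be inspected without LiveKit present.
    The plugins are assumed to read their API keys from environment variables.
    Noise cancellation is not configured here; it is assumed to be enabled
    separately when the session is started in a room.
    """
    from livekit.agents import AgentSession
    from livekit.plugins import deepgram, elevenlabs, silero

    return AgentSession(
        stt=deepgram.STT(model=setup.stt_model, language=setup.stt_language),
        tts=elevenlabs.TTS(),
        vad=silero.VAD.load(),
    )


if __name__ == "__main__":
    # Print the settings that would be passed to the plugins.
    print(VoiceSetup())
```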
Issues:
- Hindi sentences are cut off or misinterpreted
- Hinglish fails completely
- Latency is also high
Example:
Input: “नमस्ते आप कैसे हैं”
Output: incorrect or empty
What I tried:
- language="hi"
- language="multi"
- changing the endpointing delay
- using noise cancellation
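The combinations above can be enumerated systematically so that each one is tested against the same Hindi and Hinglish audio. This is only a sketch: the endpointing values are hypothetical placeholders, not the values I actually used, and the key names mirror what I assume the Deepgram plugin's `STT` constructor accepts (`model`, `language`, `endpointing_ms`). Some model and language pairs may be rejected by the API.

```python
from itertools import product

MODELS = ("nova-2", "nova-3")
LANGUAGES = ("hi", "multi")
ENDPOINTING_MS = (25, 100, 300)  # hypothetical values for illustration


def trial_matrix():
    """Return every model/language/endpointing combination as a kwargs dict."""
    return [
        {"model": model, "language": language, "endpointing_ms": endpointing}
        for model, language, endpointing in product(MODELS, LANGUAGES, ENDPOINTING_MS)
    ]


if __name__ == "__main__":
    for trial in trial_matrix():
        print(trial)
```

Each dict could then be passed to the STT constructor and the resulting transcript and latency logged per trial, which makes it easier to see whether the model, the language setting, or the endpointing delay is responsible for the cut-off sentences.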
Questions:
- Which Deepgram model works best for Hindi in streaming mode?
- Should I use "multi" or "hi" as the language?
- Is LiveKit Inference STT better than the direct Deepgram plugin?
- Are there any best practices for low-latency Hindi voice agents?
Goal:
Achieve accurate, real-time (<500 ms) Hindi conversation.
Any help would be appreciated.