Hey guys, I have been using GPT-4o mini as the model, Sarvam v3 for TTS, and Deepgram Nova-3 for STT, and I am getting abnormal latency. Is it because of LiveKit or something in my code? Tool calls are also taking about 10 seconds. The voice quality isn't very good either; it keeps breaking up. What should I do or optimise?
How do you deploy it? Is it self-hosted? Also, where are you calling from, and where are the different components located?
I would start by looking at Agent Insights and usage metrics. That should help you start isolating the issue. If the voice quality is not good, try a different model and see if it fits your needs better.
This doc may help:
The data @Shyamal_Narang needs to share before anyone can advance this:
- Geography. g48in's question still matters. Sarvam is India-based; if the LiveKit-hosted worker isn't running in an India region, every TTS request adds ocean-crossing RTT. Confirm where the worker actually spawns; Agent Insights > Sessions shows the region.
- Four numbers from Agent Insights. End of user speech → turn detected; turn detected → first LLM token; first LLM token → first TTS audio chunk; per-call total. The “tool calls take 10s” question becomes tractable once you know which phase owns the time. Most likely post-tool: LLM second round-trip + Sarvam first-byte from scratch.
- Sarvam streaming knobs. “Voice keeps breaking” with Sarvam often comes from the `min_buffer_size` / `max_chunk_length` defaults (50 / 150) being too aggressive on unstable RTT. Try `min_buffer_size=150`, `max_chunk_length=300` and compare.
- Filler while the tool runs. Play a brief line (“checking that for you…”) immediately before invoking the tool; it masks 5–10s of post-tool synthesis. Free perceived-latency win.

Without those Insights numbers, this stays guesswork.
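For the filler idea in the list above, here is a minimal sketch of the pattern in plain `asyncio`. It is not LiveKit code: `say` and `slow_tool` are hypothetical stand-ins for whatever your agent uses to speak and to run the tool, and the sleep durations are made up. The only point is that the filler is scheduled before the tool is awaited, so the caller hears something while the tool is still working.

```python
import asyncio


async def say(text: str, spoken: list[str]) -> None:
    """Hypothetical stand-in for the agent's speak call; records what was said."""
    spoken.append(text)
    await asyncio.sleep(0.01)  # pretend synthesis/playout takes a moment


async def slow_tool(query: str) -> str:
    """Hypothetical stand-in for a slow tool call."""
    await asyncio.sleep(0.05)
    return f"result for {query}"


async def call_tool_with_filler(query: str, spoken: list[str]) -> str:
    # Schedule the filler line as its own task, then await the tool.
    # The filler starts running as soon as the tool first yields control.
    filler = asyncio.create_task(say("Checking that for you...", spoken))
    result = await slow_tool(query)
    await filler  # make sure the filler has finished before the real answer
    await say(result, spoken)
    return result


def run_demo() -> tuple[str, list[str]]:
    spoken: list[str] = []
    result = asyncio.run(call_tool_with_filler("order 42", spoken))
    return result, spoken
```

Calling `run_demo()` returns the tool result plus the list of lines spoken, with the filler first. In a real agent you would swap the two stand-ins for your framework's own speak and tool calls.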
As others have stated, make sure you have good observability at every turn, and then start your tuning from there. Voice has too many moving parts; latency tuning has to be done systematically. We’ve also used Sarvam for one of our agents, and we have written about latency tuning here: Voice Agent Latency: The Sub-Second Tuning Playbook | ByondLabs
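To make "observability at every turn" concrete, here is a rough sketch that turns four per-turn timestamps into the intervals asked for earlier in the thread and picks out the phase that owns the most time. The field and phase names are my own (not an Agent Insights export format), and "total" here is just end of user speech to first TTS audio for one turn, which is an assumption about what to sum.

```python
from dataclasses import dataclass


@dataclass
class TurnTimestamps:
    """Timestamps in seconds for one turn; field names are illustrative."""
    end_of_user_speech: float
    turn_detected: float
    first_llm_token: float
    first_tts_audio: float


def phase_breakdown(t: TurnTimestamps) -> dict[str, float]:
    """Split one turn into the three intervals plus their total."""
    return {
        "turn_detection": t.turn_detected - t.end_of_user_speech,
        "llm_first_token": t.first_llm_token - t.turn_detected,
        "tts_first_audio": t.first_tts_audio - t.first_llm_token,
        "total": t.first_tts_audio - t.end_of_user_speech,
    }


def slowest_phase(breakdown: dict[str, float]) -> str:
    """Return the name of the phase (excluding the total) that took longest."""
    phases = {name: secs for name, secs in breakdown.items() if name != "total"}
    return max(phases, key=phases.get)


# Example turn with made-up numbers: TTS first audio dominates.
example = phase_breakdown(TurnTimestamps(0.0, 0.5, 1.5, 4.0))
```

Running this over a handful of normal turns and a handful of tool-call turns should show whether the extra time sits in the LLM or the TTS phase, which is where the tuning should start.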