Optimizing Voice Agent Latency, Tool Calling Delays, and Audio Quality Issues with GPT-4o Mini, Sarvam V3 TTS, Deepgram Nova 3 STT, and LiveKit

Shyamal_Narang · May 4, 2026, 6:53pm

Hey guys, I have been using GPT-4o mini as the model, Sarvam v3 as TTS, and Deepgram Nova 3 as STT, and I am getting abnormal latency. Is it because of LiveKit, or is it something in my code? Tool calls are also taking around 10 seconds each. The voice quality isn't very good either; it keeps breaking up. What should I do or optimize?

g48in · May 5, 2026, 9:53am

How do you deploy it? Is it self-hosted? Also, where are you calling from, and where are the different components located?

CWilson · May 5, 2026, 6:35pm

I would start by looking at Agent insights and usage metrics. That should help you start isolating the issue. If the voice quality is not good, try a different model and see if it fits your needs better.

This doc may help:

Shyamal_Narang · May 7, 2026, 7:32pm

@CWilson + @g48in

It's hosted by LiveKit, and yes, I have seen this documentation and set up my infrastructure accordingly…

Muhammad_Usman_Bashir · May 10, 2026, 8:48pm

The data @Shyamal_Narang needs to share before anyone can advance this:

Geography. g48in's question still matters. Sarvam is India-based; if the LiveKit-hosted worker isn't running in an India region, every TTS request adds ocean-crossing RTT. Confirm where the worker actually spawns; Agent Insights >> Sessions shows the region.
Four numbers from Agent Insights. End-of-user-speech >> turn detected; turn detected >> first LLM token; first LLM token >> first TTS audio chunk; per-call total.

The “tool calls take 10s” question becomes tractable once you know which phase owns the time. Most likely post-tool: LLM second round-trip + Sarvam first-byte from scratch.
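To make "which phase owns the time" concrete, here is a minimal Python sketch that splits one turn into the three phases listed above and picks the slowest. The field names and the example timestamps are hypothetical; they are not the Agent Insights schema, so copy the real values in by hand.

```python
from dataclasses import dataclass


@dataclass
class TurnTimestamps:
    """Timestamps in seconds for one turn. Field names are hypothetical."""
    user_speech_end: float
    turn_detected: float
    first_llm_token: float
    first_tts_audio: float


def phase_breakdown(t: TurnTimestamps) -> dict[str, float]:
    """Split one turn into the three phases plus the end-to-end total."""
    return {
        "end_of_speech_to_turn_detected": t.turn_detected - t.user_speech_end,
        "turn_detected_to_first_llm_token": t.first_llm_token - t.turn_detected,
        "first_llm_token_to_first_tts_audio": t.first_tts_audio - t.first_llm_token,
        "total": t.first_tts_audio - t.user_speech_end,
    }


def slowest_phase(breakdown: dict[str, float]) -> str:
    """Return the name of the phase that contributes the most time."""
    phases = {name: secs for name, secs in breakdown.items() if name != "total"}
    return max(phases, key=phases.get)


# Made-up numbers for illustration only.
turn = TurnTimestamps(
    user_speech_end=0.0,
    turn_detected=0.5,
    first_llm_token=1.25,
    first_tts_audio=3.25,
)
breakdown = phase_breakdown(turn)
print(breakdown)
print("slowest:", slowest_phase(breakdown))
```

With these made-up numbers the TTS phase dominates, which would point at Sarvam first-byte latency rather than the LLM. Run it once for a normal turn and once for a tool-calling turn and compare.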

Sarvam streaming knobs. “Voice keeps breaking” with Sarvam often comes from min_buffer_size / max_chunk_length defaults (50 / 150) being too aggressive on unstable RTT. Try min_buffer_size=150, max_chunk_length=300 and compare.
Filler while the tool runs. Play a brief line (“checking that for you…”) immediately before invoking the tool; it masks 5–10s of post-tool synthesis. Free perceived-latency win.
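The filler idea can be sketched with plain asyncio. This is only an illustration of the ordering, not LiveKit code: `say` and `slow_tool` are hypothetical stand-ins for your agent's speech call and your actual tool, and the sleeps simulate playback and tool time.

```python
import asyncio


async def say(text: str, spoken: list[str]) -> None:
    """Hypothetical stand-in for the agent's speech output."""
    spoken.append(text)
    await asyncio.sleep(0.05)  # simulated playback time


async def slow_tool() -> str:
    """Hypothetical tool that takes a while to return."""
    await asyncio.sleep(0.2)  # simulated tool latency
    return "Your order has shipped."


async def call_tool_with_filler(spoken: list[str]) -> str:
    # Schedule the filler line first so it plays while the tool runs.
    filler = asyncio.create_task(say("Checking that for you...", spoken))
    result = await slow_tool()
    # Make sure the filler has finished before speaking the real answer.
    await filler
    await say(result, spoken)
    return result


spoken: list[str] = []
result = asyncio.run(call_tool_with_filler(spoken))
print(spoken)
```

The point of the design is that the user hears something right away while the tool and the second LLM round-trip are still in flight; the actual latency is unchanged.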

Without those Insights numbers, this stays guesswork.

Sarvam TTS plugin guide | LiveKit Documentation

Raghu_Udiyar · May 11, 2026, 9:04am

As others have stated, make sure you have good observability at every turn, and then start your tuning from there. Voice has too many moving parts, so latency tuning has to be done systematically. We've also used Sarvam for one of our agents, and we have written about latency tuning here: Voice Agent Latency: The Sub-Second Tuning Playbook | ByondLabs
