EOT value leads to around 30% of e2e latency. Tried reducing min/max endpointing in turn delay plugin but no luck

Kaushal_Shah · June 17, 2026, 11:33am

Our LLM TTFT + TTS TTFB is usually around 1.2 seconds on average, but e2e latency is still much higher. I suspect the cause is the end-of-turn delay, which is mostly around 500 ms on average.

I tried reducing the endpointing delays:

turn_handling=TurnHandlingOptions(
    turn_detection=MultilingualModel(),
    endpointing={
        "mode": "fixed",
        "min_delay": 0.2,
        "max_delay": 1.2,
    },
    interruption={
        "enabled": True,
        "resume_false_interruption": True,
        "mode": "adaptive",
        "min_words": 2,
    },
    preemptive_generation={"enabled": True},
),

I expected the end-of-turn delay to drop by 200-300 ms, but it is still around 500 ms.
These settings are passed to the agent session, and I am using Silero VAD with min_silence_duration = 0.2.

Chat metrics data for one user turn:

"end_of_turn_delay": 0.5877835750579834,
"started_speaking_at": 1781695592.438063,
"stopped_speaking_at": 1781695593.151138,
"transcription_delay": 0.5335736274719238,
"on_user_turn_completed_delay": 0.00003577399979803886

Chat metrics data for the AI turn after it:

"e2e_latency": 1.832134485244751,
"llm_node_ttft": 0.8538708429998678,
"tts_node_ttfb": 0.38110785300000316,
"playback_latency": 0.00008678436279296875,
"started_speaking_at": 1781695594.9832726,
"stopped_speaking_at": 1781695596.3034112

I am using Deepgram Nova STT.

Am I missing anything, or what should I do to make e2e latency as low as possible? From what I see, reducing the end-of-turn delay would significantly reduce e2e latency, but I am not sure why the changes above do not work.

I am using livekit-agents and the turn detector, both at version 1.6.0.

Kaushal_Shah · June 17, 2026, 11:51am

Adding few session details

RM_PGWcYFqCAcFT, RM_8Lnc83ntTuz8, RM_adPKQsMe3D3d

Muhammad_Usman_Bashir · June 17, 2026, 9:19pm

@Kaushal_Shah, your metrics explain it: end_of_turn_delay (0.588) and transcription_delay (0.533) are both measured from the moment the user stops speaking [ livekit/agents voice/audio_recognition.py ], so only ~55ms separates them. That ~55ms is all the turn detector and min_delay/max_delay control; the 0.53s underneath is the wait for Deepgram’s final transcript, which TurnHandlingOptions.endpointing can’t touch.

The EOT lever is the STT side, not turn handling:

from livekit.plugins import deepgram
stt = deepgram.STT(model="nova-3", endpointing_ms=25, no_delay=True)

endpointing_ms already defaults to 25 [ livekit/agents deepgram stt.py ], so if you’re on the default, most of that 533ms is Deepgram’s own finalization plus network RTT. A closer region or a faster STT beats any knob. And note that llm_node_ttft (0.85) is your biggest e2e slice anyway, ahead of EOT.
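
As a rough check of that arithmetic, here is a minimal sketch that uses only the numbers posted above. It assumes, for this one turn, that e2e latency is approximately end_of_turn_delay + llm_node_ttft + tts_node_ttfb; that is an approximation read off these metrics, not a statement of how the library defines e2e_latency. The variable names stages, accounted and turn_detector_share are made up for the example.

```python
# Values copied from the metrics posted earlier in the thread (seconds).
end_of_turn_delay = 0.5877835750579834
transcription_delay = 0.5335736274719238
llm_node_ttft = 0.8538708429998678
tts_node_ttfb = 0.38110785300000316
e2e_latency = 1.832134485244751

# Part of the end-of-turn delay that remains after the final transcript
# arrives; this is the slice that min_delay/max_delay could influence.
turn_detector_share = end_of_turn_delay - transcription_delay

# Assumption for this example: these three stages make up most of e2e.
stages = {
    "end_of_turn_delay": end_of_turn_delay,
    "llm_node_ttft": llm_node_ttft,
    "tts_node_ttfb": tts_node_ttfb,
}
accounted = sum(stages.values())

print(f"after final transcript: {turn_detector_share * 1000:.0f} ms")
for name, value in stages.items():
    print(f"{name}: {value:.3f} s ({value / e2e_latency:.0%} of e2e)")
print(f"unaccounted: {(e2e_latency - accounted) * 1000:.0f} ms")
```

With these inputs the slice after the final transcript comes out at about 54 ms, and the three stages take roughly 32%, 47% and 21% of e2e, which matches the "around 30%" in the topic title and shows the LLM time to first token as the largest share.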

Kaushal_Shah · June 18, 2026, 1:27pm

Ah, makes sense, silly me.
Yeah, so EOT is composed of both STT finalization and turn detector time.

Thanks for the clarification.
