EOT value leads to around 30% of e2e latency, tried reducing min/max endpointing in turn delay plugin but no luck

Our LLM TTFT + TTS TTFB is usually around 1.2 seconds on average, but e2e latency is still much higher. I suspect the end-of-turn delay, since it is mostly around 500 ms on average.

I tried reducing the endpointing delays:

    turn_handling=TurnHandlingOptions(
        turn_detection=MultilingualModel(),
        endpointing={
            "mode": "fixed",
            "min_delay": 0.2,
            "max_delay": 1.2,
        },
        interruption={
            "enabled": True,
            "resume_false_interruption": True,
            "mode": "adaptive",
            "min_words": 2,
        },
        preemptive_generation={"enabled": True},
    ),

I expected the end-of-turn delay to drop by 200-300 ms, but it is still around 500 ms.
These settings are passed to the agent session, and I am using Silero VAD with min_silence_duration = 0.2.

One of the chat metrics entries for a user turn:

    "end_of_turn_delay": 0.5877835750579834,
    "started_speaking_at": 1781695592.438063,
    "stopped_speaking_at": 1781695593.151138,
    "transcription_delay": 0.5335736274719238,
    "on_user_turn_completed_delay": 0.00003577399979803886

One of the chat metrics entries for the AI turn after it:

    "e2e_latency": 1.832134485244751,
    "llm_node_ttft": 0.8538708429998678,
    "tts_node_ttfb": 0.38110785300000316,
    "playback_latency": 0.00008678436279296875,
    "started_speaking_at": 1781695594.9832726,
    "stopped_speaking_at": 1781695596.3034112

I am using Deepgram Nova STT.

Am I missing anything, or what should I do to make e2e latency as low as possible? From what I see, reducing the end-of-turn delay would significantly reduce e2e latency, but I am not sure why the changes above do not work.

I am using livekit-agents and the turn detector, both at version 1.6.0.

Adding a few session details:

RM_PGWcYFqCAcFT, RM_8Lnc83ntTuz8, RM_adPKQsMe3D3d

@Kaushal_Shah, your metrics explain it: end_of_turn_delay (0.588) and transcription_delay (0.533) are both measured from stop-speaking [ livekit/agents voice/audio_recognition.py ], so only ~55ms separates them. That ~55ms is all the turn detector and min_delay/max_delay control; the 0.53s underneath is the wait for Deepgram’s final transcript, which TurnHandlingOptions.endpointing can’t touch.
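To make that arithmetic concrete, here is a minimal sketch that uses only the two numbers from the user-turn metrics above. The variable names `post_transcript_wait` and `stt_share` are illustrative, and reading the difference as "time spent after the final transcript arrived" is an assumption that follows from both values being measured from stop-speaking.

```python
# Values copied from the user-turn metrics posted above (seconds).
end_of_turn_delay = 0.5877835750579834
transcription_delay = 0.5335736274719238

# Both delays start at stop-speaking, so the difference is the time that
# elapsed after the final transcript arrived (assumed here to be the
# turn detector plus the endpointing delay).
post_transcript_wait = end_of_turn_delay - transcription_delay

# Fraction of the end-of-turn delay spent waiting for the final transcript.
stt_share = transcription_delay / end_of_turn_delay

print(f"post-transcript wait: {post_transcript_wait * 1000:.0f} ms")
print(f"STT share of end_of_turn_delay: {stt_share:.0%}")
```

For this turn the wait after the transcript comes out at roughly 54 ms, and the transcript wait is about 91% of the end-of-turn delay, which is why lowering min_delay had no visible effect.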

The EOT lever is the STT side, not turn handling:

from livekit.plugins import deepgram
stt = deepgram.STT(model="nova-3", endpointing_ms=25, no_delay=True)

endpointing_ms already defaults to 25 [ livekit/agents deepgram stt.py ], so if you’re on it, most of that 533ms is Deepgram’s own finalize plus network RTT; a closer region or a faster STT beats any knob. And note llm_node_ttft (0.85) is your biggest e2e slice anyway, ahead of EOT.
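A rough way to see how the slices compare is to divide each posted metric by e2e_latency. This is only a sketch on the numbers from this thread: treating e2e latency as approximately the sum of these three stages is an assumption (the figures happen to add up to within about 10 ms here), and the dictionary layout is purely illustrative.

```python
# Values copied from the metrics posted above (seconds).
e2e_latency = 1.832134485244751
components = {
    "end_of_turn_delay": 0.5877835750579834,
    "llm_node_ttft": 0.8538708429998678,
    "tts_node_ttfb": 0.38110785300000316,
}

# Share of e2e latency attributable to each stage.
shares = {name: value / e2e_latency for name, value in components.items()}

# Whatever the three stages do not account for (assumption: small overheads).
unaccounted = e2e_latency - sum(components.values())

# Print the stages from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {components[name]:.3f}s ({share:.0%})")
print(f"unaccounted: {unaccounted * 1000:.0f} ms")
```

On this single turn, LLM TTFT is about 47% of e2e latency, EOT about 32%, and TTS TTFB about 21%, so a faster LLM first token would move the total more than any endpointing change.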

Ah, makes sense, silly me.
Yeah, so EOT is composed of both STT finalization and turn detector time.

Thanks for the clarification.