I am struggling with 11Labs performance within LiveKit. I am getting a 3-4 second turnaround with ElevenLabs. I am trying to migrate from 11Labs and hence trying to keep the same setup for now, but it’s at least a second slower.
Has anyone been able to get decent performance with 11Labs? Is it possible to achieve 2 seconds or less?
STT: scribe_v2_realtime is slower than Deepgram streaming. You already have livekit-plugins-deepgram; switch to nova-3 streaming for several hundred ms of TTFT savings.
TTS knobs: your params are tuned for fidelity. For latency: style: 0, stability: 0.5, similarity_boost: 0.5, speed: 1.0, drop sync_alignment, keep auto_mode: true. Each fidelity-tuned param adds server-side processing time on Flash v2.5.
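To make the suggested values concrete, here is a minimal sketch that applies them to an existing settings dict. The key names are the ones listed above; the starting values in `current` are hypothetical, and how the resulting dict gets passed to the ElevenLabs plugin is not shown here, so treat it as an illustration rather than the plugin's actual API.

```python
# Latency-oriented values quoted in the post above.
LATENCY_OVERRIDES = {
    "style": 0,
    "stability": 0.5,
    "similarity_boost": 0.5,
    "speed": 1.0,
    "auto_mode": True,
}
# Options the post suggests dropping entirely.
DROPPED_KEYS = {"sync_alignment"}


def latency_tuned(params: dict) -> dict:
    """Return a copy of params with dropped keys removed and overrides applied."""
    tuned = {k: v for k, v in params.items() if k not in DROPPED_KEYS}
    tuned.update(LATENCY_OVERRIDES)
    return tuned


# Hypothetical fidelity-tuned starting point, for illustration only.
current = {
    "style": 0.4,
    "stability": 0.8,
    "similarity_boost": 0.9,
    "speed": 1.0,
    "sync_alignment": True,
    "auto_mode": True,
}
tuned = latency_tuned(current)
print(tuned)
```

Keeping the overrides in one place makes it easy to A/B the two profiles against the same test call.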
Turn detection: swap “vad” for MultilingualModel from livekit-plugins-turn-detector. Semantic EOU often fires before your 200ms silence window.
Also, you’re on livekit-agents==1.3.10; latest is 1.5.9. inference_class="priority" for lower-TTFT routing on LK Inference shipped in 1.5.7.
Run @darryncampbell’s observability breakdown first. The Sessions dashboard splits each turn into STT/EOU/LLM/TTS so you’ll know which knob actually moves your number.
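Once you have the per-stage numbers, a rough budget like the sketch below shows where the time goes. The stage values are made up for illustration, and treating the stages as strictly sequential is a simplification (in practice STT and EOU overlap), so read the total as an upper-bound estimate.

```python
from dataclasses import dataclass


@dataclass
class TurnLatency:
    """Per-stage latency for one turn, in seconds (hypothetical field names)."""
    stt: float
    eou: float
    llm_ttft: float
    tts_ttfb: float

    def stages(self) -> dict:
        return {
            "stt": self.stt,
            "eou": self.eou,
            "llm_ttft": self.llm_ttft,
            "tts_ttfb": self.tts_ttfb,
        }

    def total(self) -> float:
        # Assumes the stages run back to back with no overlap.
        return sum(self.stages().values())

    def bottleneck(self) -> str:
        # Name of the stage contributing the most latency.
        stages = self.stages()
        return max(stages, key=stages.get)


# Illustrative numbers only, not measurements from this thread.
turn = TurnLatency(stt=0.3, eou=0.5, llm_ttft=1.6, tts_ttfb=0.4)
print(f"total: {turn.total():.2f}s, largest stage: {turn.bottleneck()}")
```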
I am left with LLM performance, which is abysmal. I understand geography plays a big part.
Gemini 2.5 flash with 0 temp and 0 thinking budget produces 1.6 sec TTFT on average (mode)
Gemini 3.1 flash lite with 0 temp and minimal reasoning effort produces 0.9 sec TTFT
Both are faster directly from Google than through LiveKit Inference
Meanwhile, GPT 5.2 chat (0 temp, low effort) produces a larger variance of TTFT, with mode around 1.75 seconds
In ElevenLabs, GPT 5.2 produces TTFT in the 400 ms to 550 ms range
In your experience, do you find LiveKit Inference better, or calling the vendor directly?
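For anyone comparing providers the same way, it helps to report the mode together with the median and spread, since a high-variance provider can share a mode with a steadier one. A minimal sketch using only the standard library; the sample values and the 50 ms bin width are assumptions for illustration, not the measurements quoted above.

```python
import statistics


def summarize_ttft(samples_s, bin_width_s=0.05):
    """Summarize TTFT samples (seconds): binned mode, median, stdev, and max."""
    # Bin to the nearest bin_width_s so the mode is meaningful for float data.
    binned = [round(round(s / bin_width_s) * bin_width_s, 3) for s in samples_s]
    return {
        "mode": statistics.mode(binned),
        "median": statistics.median(samples_s),
        "stdev": statistics.stdev(samples_s),
        "max": max(samples_s),
    }


# Hypothetical samples: mostly clustered, with one fast and one slow outlier.
samples = [1.74, 1.76, 1.75, 1.20, 2.60, 1.77, 1.73]
summary = summarize_ttft(samples)
for name, value in summary.items():
    print(f"{name}: {value:.3f}s")
```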