High end-to-end latency in LiveKit voice agent

Hey LiveKit team, I’m building an end-to-end voice agent using LiveKit. Everything is working, but I’m seeing unexpectedly high end-to-end latency.

What’s confusing is that I’m using the same pipeline you used to build the voice agent on your site — the one that resolves queries and can be used to test the voice flow:

  • STT: Deepgram

  • LLM: Gemini 2.5 Flash

  • TTS: Cartesia

  • For V2V: Gemini 2.5 Flash (multimodal)

Despite matching the pipeline, latency is still significantly higher on my side, and I’m not sure what I’m doing wrong. What would you recommend as the next steps to debug this? If there’s a preferred checklist/instrumentation approach, I’m happy to follow it.

Thanks

I am trying to build out this FAQ for this topic: Frequently Asked Questions (FAQ) - #4

I’m using the same pipeline that you have used for building the voice agent that resolves queries and can be used to test voice flow

Sorry, what are you comparing your implementation to exactly?

I am comparing it with the “Talk to LiveKit” agent on the LiveKit website.

Sure, we’re not doing anything special on the homepage agent.

If you look at the “Agent Configuration” panel to the right of the agent, it will tell you which models each agent uses (each agent has different models, and we might switch these out from time to time as new models are introduced). We always use LiveKit Inference to communicate with models.

For example, Hayley’s model configuration currently looks like this:

```python
stt="deepgram/nova-3",
llm="openai/gpt-4.1-mini",
tts="rime/arcana:astra",
```

Each agent is also initialized with some instructions, but these do not contain anything special - for our purposes they mostly provide hints about LiveKit, since many customers use the front page agent to ask about our product.

The session configuration for Hayley, and most of the other agents, is as follows (these are not necessarily recommended settings, they are just what work for us on the homepage):

```python
min_endpointing_delay=0.2,
max_endpointing_delay=3,
preemptive_generation=True,
false_interruption_timeout=1,
resume_false_interruption=True,
min_interruption_words=0,
```

These settings are documented as part of turn detection and speech generation. In many cases we just use the default values.

The homepage agent tends to use the latest version of LiveKit Agents; at the time of writing, that is 1.4.0, the latest Python release. This also pulls in the latest version of the turn detector model, `turn_detection=MultilingualModel()`.
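Putting the model and session settings above together, a minimal version of this setup looks roughly like the following. This is a sketch, not the exact homepage code; it assumes `livekit-agents` 1.x with the turn-detector plugin installed, with model strings resolved through LiveKit Inference:

```python
# Sketch of an AgentSession using the homepage-style configuration.
# Assumes livekit-agents >= 1.x and the turn-detector plugin are installed.
from livekit.agents import AgentSession
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    # Model strings are resolved through LiveKit Inference
    stt="deepgram/nova-3",
    llm="openai/gpt-4.1-mini",
    tts="rime/arcana:astra",
    turn_detection=MultilingualModel(),
    # Turn detection / speech generation settings from above
    min_endpointing_delay=0.2,
    max_endpointing_delay=3,
    preemptive_generation=True,
    false_interruption_timeout=1,
    resume_false_interruption=True,
    min_interruption_words=0,
)
```

`preemptive_generation=True` is the biggest latency lever here: it lets the LLM and TTS start before end-of-turn is confirmed, at the cost of occasional wasted generations.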

The agent is hosted on LiveKit Cloud, in the US, with no special settings.

The web front end used to access the homepage agent is not doing anything special and is analogous to any of our front-end starters, such as livekit-examples/agent-starter-react (a complete voice AI frontend app for LiveKit Agents with Next.js) on GitHub, or even the Agent Playground.
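As a first debugging step, I’d log the per-stage latency metrics that agent sessions emit (STT duration, LLM time-to-first-token, TTS time-to-first-byte, end-of-utterance delay) and compare them against what you expect. A sketch of hooking those up, assuming `livekit-agents` 1.x and an existing `session` object:

```python
# Sketch: log per-stage latency metrics emitted by an AgentSession.
# Assumes livekit-agents >= 1.x; `session` is an AgentSession you already created.
from livekit.agents import metrics, MetricsCollectedEvent

usage_collector = metrics.UsageCollector()

@session.on("metrics_collected")
def _on_metrics_collected(ev: MetricsCollectedEvent):
    # Logs timing for each pipeline stage (e.g. LLM TTFT, TTS TTFB),
    # which tells you whether the extra latency is in STT, LLM, TTS,
    # or in endpointing/network before any model is even called.
    metrics.log_metrics(ev.metrics)
    usage_collector.collect(ev.metrics)
```

If the per-stage numbers match ours but end-to-end latency is still high, the difference is usually network distance to the models or the region your agent is deployed in, rather than the pipeline itself.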