Why is there such a latency difference between Azure GPT and OpenAI GPT?

Shubhankar_Kumar · May 12, 2026, 7:57am

On every turn I see a latency difference of around 1.5 to 2 seconds. I checked the code and found that the Azure plugin wrapper has stream set to false. Does anyone have an idea about this? I ask because I received a grant from Azure.

darryncampbell · May 12, 2026, 10:57am

Assuming everything else remains the same, if you are seeing an additional 1.5-2 seconds of LLM latency on Azure OpenAI compared with OpenAI, my initial suspicion is that you are connecting to an Azure endpoint in a different region from your STT, TTS, or agent, which introduces several long geographic network hops on every turn.

I would also recommend going through "Understand and Improve Voice Agent Latency" (LiveKit) to understand where your latency is coming from.

Where do you see this? I thought the plugin streamed.

Muhammad_Usman_Bashir · May 13, 2026, 8:58pm

@darryncampbell’s instinct is right. On main, both vanilla LLM(...) and LLM.with_azure(...) go through the same LLMStream parent in livekit-agents/livekit/agents/inference/llm.py:396, which calls client.chat.completions.create(..., stream=True, stream_options={"include_usage": True}). Streaming is on by default; no stream=False anywhere on the Azure path.

Where did you see stream=False? If it was a different method on LLM (a sync helper or eval shim), that’s not the path AgentSession exercises.

Real Azure-vs-OpenAI latency culprits:

Region routing. Azure OpenAI has fewer deployment regions than the OpenAI public API. If your STT/TTS or agent worker is in a region with no Azure deployment, every turn crosses oceans.
Azure Content Safety inline. Default Azure deployments run content moderation per request, adding 100-300ms first-token latency that OpenAI direct doesn’t.
PTU vs pay-as-you-go. Provisioned throughput has very different latency than PAYG; check your Azure deployment’s billing model.

Capture per-turn timing from Agent Insights (LLM TTFT specifically) on both providers under identical region setup. Whichever phase owns the 1.5-2s becomes obvious.
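
If you want a rough number outside Agent Insights, here is a minimal sketch of timing time-to-first-token (TTFT) on a streamed response. It is not LiveKit's or OpenAI's API: `measure_ttft` and `fake_stream` are hypothetical names, and `fake_stream` is only a stand-in that simulates a provider with a configurable first-token delay. The idea is that the same helper can wrap any iterable of chunks, so both providers are measured the same way.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_ttft(open_stream: Callable[[], Iterable[object]]) -> Tuple[float, float]:
    """Return (seconds until first chunk, seconds until stream end) for one request.

    open_stream is a zero-argument callable that starts the request and
    returns an iterable of chunks. The clock starts before it is called,
    so connection setup and network round trips are included.
    """
    start = time.perf_counter()
    first = None
    for _chunk in open_stream():
        if first is None:
            first = time.perf_counter() - start
    total = time.perf_counter() - start
    if first is None:
        raise ValueError("stream produced no chunks")
    return first, total


def fake_stream(first_delay: float, chunks: int = 3, gap: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a provider stream (assumption, not a real client)."""
    time.sleep(first_delay)  # simulated network + queueing + first-token latency
    for i in range(chunks):
        if i:
            time.sleep(gap)  # simulated inter-token gap
        yield f"tok{i}"


if __name__ == "__main__":
    # Two simulated providers that differ only in first-token delay.
    for name, delay in [("provider_a", 0.05), ("provider_b", 0.20)]:
        ttft, total = measure_ttft(lambda d=delay: fake_stream(d))
        print(f"{name}: ttft={ttft:.3f}s total={total:.3f}s")
```

In a real comparison, `open_stream` would wrap your own streaming call for each provider (for example the `client.chat.completions.create(..., stream=True)` call mentioned above), run from the same machine and region as the agent worker. Take several samples per provider, since a single request can be skewed by a cold connection.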
