Why is there such a latency difference between Azure OpenAI GPT and OpenAI GPT?

On every turn I see around 1.5 to 2 seconds of extra latency. I checked the code and found that the Azure plugin wrapper has stream set to false. Does anyone have an idea what is going on? I ask because I received a grant from Azure.

Assuming everything else remains the same, if you are seeing an additional 1.5-2 seconds of latency between OpenAI and Azure OpenAI for the LLM, my initial suspicion is that you are connecting to an Azure endpoint in a different region from your STT, TTS, or agent, which introduces several large geographic network hops on each turn.

I would also recommend going through "Understand and Improve Voice Agent Latency" in the LiveKit docs to understand where your latency is coming from.

Where do you see this? I thought the plugin streamed.

@darryncampbell’s instinct is right. On main, both vanilla LLM(...) and LLM.with_azure(...) go through the same LLMStream parent in livekit-agents/livekit/agents/inference/llm.py:396, which calls client.chat.completions.create(..., stream=True, stream_options={"include_usage": True}). Streaming is on by default; no stream=False anywhere on the Azure path.

Where did you see stream=False? If it was a different method on LLM (a sync helper or eval shim), that’s not the path AgentSession exercises.
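To see why streaming matters for perceived latency, here is a minimal sketch that measures time to first token (TTFT) over any iterator of text chunks. `measure_ttft` and `fake_stream` are hypothetical helpers written for illustration; they are not part of livekit-agents or the OpenAI SDK, and the fake stream only simulates delays with `time.sleep`.

```python
import time
from typing import Iterable, Iterator, List, Optional, Tuple


def measure_ttft(chunks: Iterable[str]) -> Tuple[Optional[float], float, str]:
    """Return (seconds until first non-empty chunk, total seconds, full text)."""
    start = time.perf_counter()
    ttft: Optional[float] = None
    parts: List[str] = []
    for chunk in chunks:
        # Record the moment the first non-empty chunk arrives.
        if chunk and ttft is None:
            ttft = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    return ttft, total, "".join(parts)


def fake_stream(first_delay: float, gap: float, tokens: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for a streamed LLM response (simulated delays only)."""
    time.sleep(first_delay)
    for i, tok in enumerate(tokens):
        if i:
            time.sleep(gap)
        yield tok
```

With a streamed response the agent can start TTS once the first chunk arrives, so TTFT is the number to watch. With a non-streamed response the whole completion arrives as one chunk, so TTFT and total time are effectively the same.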

Real Azure-vs-OpenAI latency culprits:

  • Region routing. Azure OpenAI has fewer deployment regions than the OpenAI public API. If your STT/TTS or agent worker is in a region with no Azure deployment, every turn crosses oceans.
  • Azure Content Safety inline. Default Azure deployments run content moderation per request, adding 100-300ms first-token latency that OpenAI direct doesn’t.
  • PTU vs pay-as-you-go. Provisioned throughput has very different latency than PAYG; check your Azure deployment’s billing model.
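A rough way to check the region-routing suspicion from the machine that runs your agent is to time a TCP connect to each endpoint. This is only a sketch: `tcp_connect_ms` is a hypothetical helper, it measures network round trip to the edge and not model latency, and the Azure host in the comment is a placeholder for your own resource name.

```python
import socket
import time


def tcp_connect_ms(host: str, port: int = 443, timeout: float = 3.0) -> float:
    """Return the time in milliseconds taken to open a TCP connection."""
    start = time.perf_counter()
    # create_connection resolves the host and completes the TCP handshake.
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


# Example usage (run from the agent worker's machine; hosts are placeholders):
#   print(tcp_connect_ms("api.openai.com"))
#   print(tcp_connect_ms("<your-resource>.openai.azure.com"))
```

A large gap in connect time between the two hosts points at geography rather than at the plugin. Keep in mind that the time includes DNS resolution and that a public API may terminate connections at a nearby edge, so treat the result as a hint, not proof.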

Capture per-turn timing from Agent Insights (LLM TTFT specifically) on both providers under identical region setup. Whichever phase owns the 1.5-2s becomes obvious.
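Once you have per-turn timings for both providers, the comparison can be sketched as below. The phase names (`stt`, `llm_ttft`, `tts`) and the dict layout are assumptions made for illustration, not the Agent Insights export format, and `phase_medians` and `largest_gap` are hypothetical helpers.

```python
from statistics import median
from typing import Dict, List, Tuple

# One turn maps a phase name to its duration in seconds (assumed layout).
Turn = Dict[str, float]


def phase_medians(turns: List[Turn]) -> Dict[str, float]:
    """Median duration of each phase across all turns."""
    phases = turns[0].keys()
    return {p: median(t[p] for t in turns) for p in phases}


def largest_gap(baseline: List[Turn], other: List[Turn]) -> Tuple[str, float]:
    """Return the phase where `other` is slowest relative to `baseline`."""
    base, comp = phase_medians(baseline), phase_medians(other)
    gaps = {p: comp[p] - base[p] for p in base}
    phase = max(gaps, key=gaps.get)
    return phase, gaps[phase]
```

Pass the OpenAI turns as `baseline` and the Azure turns as `other`. If the returned phase is `llm_ttft`, the extra time is inside the LLM call; if it is another phase, look at STT/TTS placement instead.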