Why is GPT-5.4 pricing via LiveKit Inference about 2x OpenAI direct?

Hi team,

I’m evaluating LiveKit Inference for GPT-5.4 and noticed that the published token pricing appears to be 2x the equivalent OpenAI direct API pricing. Same thing for GPT-5.5.

Is that expected, or am I comparing the wrong model/unit?

A few specific questions:

  • Does the LiveKit Inference price include additional managed infrastructure, routing, availability, or billing overhead?
  • Does LiveKit Inference support/pass through OpenAI cached-input pricing where applicable?
  • Are there latency or reliability benefits that would make LiveKit Inference the recommended path over calling OpenAI directly from a LiveKit Agent?
  • For production voice agents, is the intended guidance “use LiveKit Inference for convenience/managed integration, direct OpenAI for cost optimization”?

Just trying to understand the tradeoff before choosing the architecture.

Good question, let me ask internally about this. Those two models you mentioned are outliers in terms of cost.

I notice that for gpt-5.4 mini and gpt-5.4 nano the prices are the same between LiveKit Inference and Open AI

Does the LiveKit Inference price include additional managed infrastructure, routing, availability, or billing overhead?

No, the costs listed on Build voice, video, and physical AI | LiveKit are only for the provider integration, the infrastructure costs are covered by Pricing | LiveKit.

Also consider the tradeoff in performance and latency - the models you list (gpt 5.4 and 5.5) are highly capable, but will be slow - models such as the mini model will provide a better trade-off for voice use cases as they will be quicker to respond.

While @darryncampbell checks the internal pricing question, the architectural alternative is straightforward: skip LiveKit Inference and use the OpenAI plugin directly with your own OPENAI_API_KEY. You pay OpenAI direct rates with zero markup, and you keep all of LiveKit Agents (turn detection, plugins, telephony, Insights). Inference is opt-in convenience routing.

So your read on question 4 is correct: Inference for managed routing/billing, direct OpenAI for cost optimization. Mini-class models are the right voice choice either way per @darryncampbell’s latency point.

@Abdulaziz looking closer, our prices match what OpenAI term ‘Long context’ (Pricing | OpenAI API) but I have raised this with the internal team to investigate.

The intention is that Inference pricing should not be a barrier to adoption compared with going directly to the provider, this is why in most cases I would expect the prices to either match, or be similar.

Thanks Darryn, that makes sense and matches what I’m seeing.

The key point for us is that our requests do not cross the long-context threshold. OpenAI’s pricing page says standard GPT-5.4 pricing applies under ~270K context, and the GPT-5.4 model page says the long-context uplift applies only when prompts exceed 272K input tokens.

Our production voice-agent requests are well below that, so if LiveKit Inference prices every GPT-5.4 request at the long-context rate, it would be a material premium versus using the OpenAI plugin directly.

Can you confirm whether LiveKit intends to support the standard under-threshold GPT-5.4 price for requests that do not use long context?

We would prefer to keep using LiveKit Inference, especially since our LiveKit agent is hosted in the EU and we expect the managed provider path to be better for latency. But if GPT-5.4 will remain priced at the long-context rate regardless of actual request size, we’ll likely route GPT-5.4 directly to OpenAI for cost reasons.

That’s fair, I expect most voice agent implementations will fall under short-context like yourselves.

Can you confirm whether LiveKit intends to support the standard under-threshold GPT-5.4 price for requests that do not use long context?

I suspect so, but I don’t want to commit to that on a public forum :slight_smile: let me ping again internally. It may be more nuanced - I’m not familiar with how our billing for Inference is set up behind the scenes or the details of our pricing relationship with OpenAI.

But if GPT-5.4 will remain priced at the long-context rate regardless of actual request size, we’ll likely route GPT-5.4 directly to OpenAI for cost reasons.

That makes complete sense, and I would do the same in your position.

@darryncampbell’s “long-context rate applied universally” finding lines up with what we both saw. That reframes the 2x as a billing-tier mismatch on LiveKit’s side rather than an intentional markup, which is actually the better outcome for you. Once it’s reconciled, your direct-vs-Inference cost gap on sub-272K requests should close on its own.

For the EU latency angle: that’s a real trade-off if you switch to direct OpenAI even temporarily. An EU-hosted agent calling OpenAI’s US endpoints adds roughly 100-200ms versus Inference’s managed path. Worth measuring on a representative call before committing to direct routing as the permanent setup.

Cc: @Abdulaziz

Thanks Darryn, appreciate you checking internally.

That sounds reasonable. We’ll hold off on changing architecture for now and wait for the official answer.