Error: 429 Too Many Requests on agent-gateway.livekit.cloud

Hi — Inference gateway returns HTTP 429 for my Cloud project.

Project name: Production Grade Voice Bot
Plan: Build

What I run: LiveKit Agents Python, agent.py console on Windows.
Error: 429 Too Many Requests on agent-gateway.livekit.cloud

When: 2026-04-19 around 11:00 am (my local timezone: Saudi Arabia)
Billing page shows Inference STT/TTS/LLM usage (so it is not at $0 due to “no usage”), but the gateway still returns 429 immediately.

Question: Is there a burst/minute/regional cap or account flag on Inference gateway for Build? Can you check my project?

I am NOT sharing API keys here.

I’ve experienced this issue too; I ultimately just stopped using LiveKit’s inference.stt. I was seeing odd behavior where, on connect, the client apparently sent 4 bursts of STT connection requests, which immediately got 429 responses.

Does this crash your session?

I suspect this is a bug. The previous post I made about this was never resolved, because I just switched to a direct STT connection.

One question, though: do you call await ctx.connect() explicitly? If so, is it right above your session.start()?

There are concurrency limits on Inference STT / TTS. Please see this section of the pricing page: Pricing | LiveKit

If you look at your plan quotas (Sign in | LiveKit Cloud), you have a peak usage of 2 STT and 3 TTS. That total of 5 equals the concurrency limit on Build, so I assume you tried to create a 6th connection, which triggered the 429.
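
If it helps, here is a rough client-side sketch of how you could keep your own code under that cap. It is not part of the LiveKit SDK: ConnectionBudget, fake_stream, and demo are hypothetical names, and the limit of 5 is just the combined STT + TTS figure quoted in this thread, so check your own plan quotas before relying on it.

```python
import asyncio

# Assumption: 5 concurrent Inference STT/TTS connections in total on Build,
# taken from the numbers quoted in this thread.
MAX_CONCURRENT = 5


class ConnectionBudget:
    """Hypothetical guard: at most `limit` connections may be open at once."""

    def __init__(self, limit: int = MAX_CONCURRENT) -> None:
        self._sem = asyncio.Semaphore(limit)
        self.active = 0  # connections currently open
        self.peak = 0    # highest value `active` has reached

    async def __aenter__(self) -> "ConnectionBudget":
        # Wait here instead of opening a connection beyond the cap.
        await self._sem.acquire()
        self.active += 1
        self.peak = max(self.peak, self.active)
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        self.active -= 1
        self._sem.release()
        return False  # do not swallow exceptions


async def fake_stream(budget: ConnectionBudget, seconds: float) -> None:
    # Hypothetical stand-in for opening a real STT or TTS stream.
    async with budget:
        await asyncio.sleep(seconds)


async def demo(n: int = 6) -> int:
    # Try to open n streams at once and report the peak concurrency reached.
    budget = ConnectionBudget()
    await asyncio.gather(*(fake_stream(budget, 0.01) for _ in range(n)))
    return budget.peak


if __name__ == "__main__":
    print(asyncio.run(demo(6)))
```

With six simulated streams, the sixth waits for a free slot rather than being opened immediately, which is the situation that would otherwise hit the limit. This only limits connections opened by a single process; it does not coordinate across multiple workers or sessions sharing the same project.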