Trying to understand how the cloud limits on STT and TTS reset

In my LiveKit project, I had my agent workers running locally, and I may not have cleaned up socket connections properly. I shut down all development VPS instances and local agent workers to eliminate WebSocket leaks, but my project still shows all five of the five available connections in use. I’m just starting the development and testing phases, and while I could switch to a different plan, I would like to stay on the build plan during this phase.

Could anyone help me with this situation?

Thanks.

I believe you are referring to the LiveKit Inference concurrency limit (see the LiveKit pricing page). This allows 5 concurrent connections to model providers through LiveKit Inference, so if you had STT, TTS and LLM all provided through LK Inference, one session would use 3 concurrent connections. Launching another session at the same time would max out your connections on the free tier.
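To make the arithmetic concrete, here is a minimal sketch of that counting. The function names and the single shared pool of 5 are assumptions taken from the description above, not a LiveKit API.

```python
# Hypothetical helpers: the names are illustrative, and the limit of 5 is
# the Build-plan figure quoted in this reply.
BUILD_PLAN_CONCURRENCY = 5


def inference_connections(sessions: int, models_per_session: int) -> int:
    """Connections held when every session routes this many models
    (e.g. STT + TTS + LLM = 3) through LiveKit Inference."""
    return sessions * models_per_session


def fits_limit(sessions: int, models_per_session: int,
               limit: int = BUILD_PLAN_CONCURRENCY) -> bool:
    """True if the total number of connections stays within the limit."""
    return inference_connections(sessions, models_per_session) <= limit


print(inference_connections(1, 3))  # 3
print(fits_limit(1, 3))             # True
print(fits_limit(2, 3))             # False: 6 connections against a limit of 5
```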

After your agent leaves the room (or your participant leaves the room, causing the room to shut down and the agent to leave), these concurrent connections are freed automatically. You should not need to manage any WebSocket connections yourself, as that is handled by LiveKit’s Agent framework.

Thanks for the quick reply. However, the LK Inference limits don’t seem to be freed automatically, unless that only happens after a specific time frame. It’s been 12+ hours and the limit still shows as used.

| Type | Limit | Peak usage (past 7 days) |
|---|---|---|
| Concurrent participants | 100 | 2 |
| Concurrent Egress requests | 2 | 0 |
| Concurrent Ingress requests | 2 | 0 |
| Concurrent agent sessions | 5 | 0 |
| Agents deployed on LiveKit Cloud | 1 | 1 |
| Concurrent STT | 5 | 5 |
| Concurrent TTS | 5 | 7 |
| LLM requests per minute | 100 | 31 |
| LLM tokens per minute | 600,000 | 39,863 |

The project ID is p_3ugvi54jk02, if that helps.

That table shows peak usage over the past 7 days; it is not real-time.
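As an illustration of why a peak figure can stay at the limit long after everything is closed, here is a small sketch. The sample data, timestamps and function name are made up for the example and are not pulled from any LiveKit API.

```python
from datetime import datetime, timedelta


def peak_and_current(samples, now, window=timedelta(days=7)):
    """samples: list of (timestamp, concurrent_count) pairs, in any order.
    Returns (peak within the window, most recent value within the window)."""
    recent = [(t, n) for t, n in samples if now - window <= t <= now]
    if not recent:
        return 0, 0
    peak = max(n for _, n in recent)
    current = max(recent, key=lambda s: s[0])[1]
    return peak, current


# Hypothetical usage history: 5 concurrent STT streams two days ago,
# nothing running for the last 12 hours.
now = datetime(2025, 1, 8, 12, 0)
samples = [
    (now - timedelta(days=2), 5),
    (now - timedelta(hours=12), 0),
    (now - timedelta(minutes=1), 0),
]

peak, current = peak_and_current(samples, now)
print(peak, current)  # 5 0
```

The peak column keeps reporting 5 until that sample ages out of the 7-day window, even though current usage is 0.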

The best place to monitor current usage is the sessions dashboard, https://cloud.livekit.io/projects/p_/sessions (you can see that it shows all your sessions as closed).

You can also use the LiveKit CLI to list the current sessions (rooms):

lk room list

It seems the issue is with wss://agent-gateway.livekit.cloud/v1/stt: the gateway rejects with 429 (no inference credits). Is there any way to top up these credits without switching to another plan? Could you please advise on this?

Thank you

Cc: @Kiyado_Labs

@darryncampbell or @CWilson should weigh in on whether Build credits can be topped up without changing plans (commercial question).

For the technical alternative: you can skip LiveKit Inference entirely and plug provider SDKs directly with your own API keys (OpenAI, Deepgram, Cartesia, ElevenLabs, etc.). The agent framework supports all major providers natively; you pay the provider directly, no agent-gateway.livekit.cloud credit gate. Useful for dev/testing where the included Inference credits drain quickly.
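If you would rather keep LiveKit Inference as the first choice and only drop to a direct provider when the gateway returns 429, the pattern looks roughly like the sketch below. Everything in it is a stand-in: `RateLimited`, `lk_inference_stt` and `direct_provider_stt` are hypothetical names used to show the control flow, not real LiveKit or provider calls.

```python
class RateLimited(Exception):
    """Hypothetical error standing in for an HTTP 429 from the gateway."""


def transcribe_with_fallback(audio: bytes, primary, fallback) -> str:
    """Try the primary STT callable; on a rate-limit error, use the fallback."""
    try:
        return primary(audio)
    except RateLimited:
        return fallback(audio)


def lk_inference_stt(audio: bytes) -> str:
    # Stand-in for a gateway call that is out of inference credits.
    raise RateLimited("429: no inference credits")


def direct_provider_stt(audio: bytes) -> str:
    # Stand-in for a provider SDK call made with your own API key.
    return f"transcript of {len(audio)} bytes"


result = transcribe_with_fallback(b"\x00" * 320, lk_inference_stt, direct_provider_stt)
print(result)  # transcript of 320 bytes
```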

Understood.

No, the thinking behind the build plan is to make it as easy for developers to get started as possible, and that includes not requiring you to enter a credit card.

The amount of free Inference credits we provide on build is in line with the free allowance you would get by going to the model providers directly.

Users on the ship plan or higher, where we have a credit card on file, can go over the free limit and are charged accordingly, but there is no option to add a credit card to the build tier.

Yep, I have been using them as a fallback. Thanks for letting me know.

I’m not sure that’s really good developer UX. Free credits can be exhausted pretty quickly after just a couple of tests. Most developers wouldn’t mind topping up inference credits, and being able to do that while staying on the build plan would be a much better UX. Yes, I understand we can always fall back to the native adapters.

At least allow the option to add a debit or credit card to the build plan so inference continues to work after the free credits run out at $2.50. As a developer who wants to test, break things and test again, the build plan is the best option in my opinion. Having to pay $50 per month for the ship plan, which is aimed at team collaboration, isn’t worth it, especially since the only additional free inference credits are $5.00, if I’m not mistaken. Restricting billing to the ship and scale plans isn’t the best option either for someone who just wants to test the features without upgrading to a different plan.

@Kiyado_Labs, one hybrid worth trying if you still want some LK Inference exposure on Build: move STT and TTS to direct provider keys (Deepgram, Cartesia, ElevenLabs, etc.) and keep LLM on LK Inference.

STT and TTS hold persistent WebSocket connections for the whole session, so they dominate your inference concurrency. LLM calls are bursty HTTP turns. Pushing the chatty streams off LK Inference onto provider free tiers gives the most headroom per Build credit while still letting you test the LK routing layer.
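A rough way to see the headroom difference, assuming the per-type limits from the table earlier in the thread (5 concurrent STT, 5 concurrent TTS) and one STT stream plus one TTS stream per session. The function name and data layout are hypothetical.

```python
# Per-type concurrency limits copied from the Build-plan table in this thread.
# LLM is rate-limited per minute there, so it is not treated as a concurrency cap.
LIMITS = {"stt": 5, "tts": 5}


def max_sessions(on_lk_inference, limits=LIMITS):
    """on_lk_inference: set of components routed through LK Inference.
    Assumes each session holds one persistent stream per listed component.
    Returns None when no concurrency limit applies."""
    binding = [limits[c] for c in on_lk_inference if c in limits]
    return min(binding) if binding else None


print(max_sessions({"stt", "tts", "llm"}))  # 5
print(max_sessions({"llm"}))                # None
```

With only the LLM on LK Inference, the STT and TTS concurrency caps no longer apply, and what remains is the per-minute LLM request and token limits plus whatever credits you have left.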