Why is my concurrent STT count high after sequential test calls?

This question originally came up in our Slack community and the thread has been consolidated here for long-term reference.

I’m testing my phone agent and made calls one after another. Now I see “Concurrent STT is 20” and I’m about to exceed my plan limit. Since I made calls sequentially, why did concurrent STT become 20?

Does only LiveKit Inference STT count toward this limit? My custom STT plugin using Qwen APIs shouldn’t count, right?

“Concurrent STT” refers to how many STT streams/connections are open at the same time (usually WebSockets), not how many calls you’ve made in total.

With phone agents, each call typically opens an STT stream and keeps it open for the entire AgentSession. If a previous session has not fully closed (or is slow to close) when the next call starts, its stream is still open and still counts, so concurrency can climb even when you dial “one after another.”
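To make the overlap concrete, here is a minimal sketch that computes peak concurrency from stream open/close times. It is only an illustration of the counting idea, not how LiveKit actually measures or bills concurrency. The function names (`peak_concurrency`, `simulate`) and all timings (60 s between calls, 30 s call length, 300 s of lingering) are made-up assumptions.

```python
from typing import List, Tuple

Stream = Tuple[float, float]  # (opened_at, closed_at) in seconds


def peak_concurrency(streams: List[Stream]) -> int:
    """Return the largest number of streams open at the same instant."""
    events = []
    for opened_at, closed_at in streams:
        events.append((opened_at, 1))    # stream opens
        events.append((closed_at, -1))   # stream closes
    # Sort by time; at equal timestamps, closes (-1) sort before opens (+1),
    # so a stream that closes exactly when the next opens does not overlap.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Hypothetical timings, chosen only for illustration.
CALL_GAP_S = 60      # a new call is dialed every 60 s
CALL_LENGTH_S = 30   # each call lasts 30 s


def simulate(num_calls: int, linger_s: float) -> List[Stream]:
    """Sequential calls whose STT stream stays open linger_s after hangup."""
    return [
        (i * CALL_GAP_S, i * CALL_GAP_S + CALL_LENGTH_S + linger_s)
        for i in range(num_calls)
    ]


clean = peak_concurrency(simulate(5, linger_s=0))        # streams close at hangup
lingering = peak_concurrency(simulate(5, linger_s=300))  # streams stay open 5 min
print(clean, lingering)
```

With these assumed numbers, the clean case never has more than one stream open, while the lingering case has all five open at once, even though the calls themselves never overlapped.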