Managing concurrent STT sessions

Hi — we’re seeing persistent STT concurrency that doesn’t drop to zero after sessions are closed, and we want to verify whether this is expected behavior or if we are missing a cleanup step.

What we observe

  • Runtime metric shows: Concurrent STT max 5, current 3

  • This persists even after:

    • user disconnects

    • room is closed/deleted

    • app-side sessions are marked finished

  • We also occasionally see teardown warnings and then:

    • failed to send usage report: http status: 401

    • followed by a native abort in the voice-agent process:

      • libc++abi ... mutex lock failed: Invalid argument

Our implementation (Node, @livekit/agents)

  • AgentSession created with:

    • stt: new inference.STT({ model: 'deepgram/nova-3', language: 'multi' })

    • llm: new inference.LLM({ model: 'openai/gpt-4.1-mini' })

    • tts: createGracefulTTS()

    • VAD + multilingual turn detector

  • Session starts with:

    • await session.start({ agent, room, inputOptions })

  • We rely on the normal participant disconnect to close the session.

  • Logs typically show:

    • AgentSession closed

    • disconnected from room

    • Job process shutdown

  • We also have admin force-close:

    • list/delete rooms via RoomServiceClient

    • remove participants before delete

    • mark app-side sessions finished
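For reference, the force-close path above looks roughly like this (a sketch, not our exact code; `RoomServiceLike` mirrors the subset of `livekit-server-sdk`'s `RoomServiceClient` methods we call, and error handling is omitted):

```typescript
// Sketch of the admin force-close sequence. In the real code `svc` is a
// RoomServiceClient instance from livekit-server-sdk; this interface only
// mirrors the three methods we call on it.
interface RoomServiceLike {
  listParticipants(room: string): Promise<{ identity: string }[]>;
  removeParticipant(room: string, identity: string): Promise<void>;
  deleteRoom(room: string): Promise<void>;
}

async function forceCloseRoom(svc: RoomServiceLike, room: string): Promise<void> {
  // Remove participants first so agent sessions see a clean disconnect,
  // then delete the room. Every step is awaited so cleanup actually
  // completes before the caller moves on.
  const participants = await svc.listParticipants(room);
  for (const p of participants) {
    await svc.removeParticipant(room, p.identity);
  }
  await svc.deleteRoom(room);
}
```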

Question

Is STT “current” expected to remain >0 for a while after teardown (provider-side TTL / delayed accounting), or should it drop to 0 immediately when the AgentSession closes and the room is deleted? We couldn’t tell from the Server lifecycle | LiveKit Documentation page.

Also, are there recommended best practices to guarantee STT stream cleanup in Node agents (beyond room delete + participant remove + session close), especially when there are occasional teardown warnings?

Useful details

  • SDK: @livekit/agents / plugin versions around 1.0.47

  • STT provider: Deepgram via inference.STT

  • Environment: local dev, frequent start/stop cycles

Thanks — happy to share more logs if needed.

How are you measuring “persistent STT concurrency”?

We’re basing that on the LiveKit runtime metric/dashboard, which shows Concurrent STT current > 0 even after the related session has been torn down.

Concretely, we observe:

  • user disconnected

  • room deleted / closed

  • app-side session marked finished

  • agent logs show AgentSession closed, disconnected from room, and job shutdown

But the LiveKit-side STT current metric on the dashboard still remains non-zero for some time afterward. We’re calling it “persistent” because we would expect it to return to 0 once everything is torn down.

To be clear, we have not yet proven that this metric is the root cause. What we are seeing is that at some point the API starts returning an error regardless and all STT connections fail. (The error is generic and does not mention quota, but quota is the only plausible explanation we have.)

Hi Giovanni, welcome!

A few clarifying questions and initial thoughts:

On the dashboard metric staying > 0:

The concurrency metric on the dashboard can lag behind actual session teardown — it’s not a real-time reflection of active STT streams. There’s some delay in accounting, especially during frequent start/stop cycles in local dev. So seeing “current 3” after teardown doesn’t necessarily mean 3 STT streams are still open. I wouldn’t use the dashboard metric alone to diagnose a quota issue.

On the actual failures you’re seeing:

The more concerning part is this:

at some point the API starts returning an error regardless and all STT connections fail

Can you share the exact error message/code you’re getting when STT connections start failing? That will help us determine whether this is actually a concurrency limit issue or something else (auth, rate limiting, provider-side, etc.).

On the teardown warnings:

The sequence you’re seeing:

  1. failed to send usage report: http status: 401

  2. libc++abi ... mutex lock failed: Invalid argument (native abort)

The 401 on the usage report suggests an auth token expiring during teardown — this shouldn’t affect STT cleanup, but the native abort that follows is a bug. Are you able to reproduce this consistently? If so, can you share:

  • Your @livekit/agents version (you mentioned ~1.0.47 — exact version would help)

  • The full stack trace / log around the abort

  • Whether this happens on every session teardown or only intermittently

On cleanup best practices for Node agents:

Your current approach (participant remove → room delete → app-side session close) is correct. A couple of things to check:

  1. Are you calling session.close() explicitly, or only relying on participant disconnect events? In local dev with frequent restarts, disconnect events can get lost. Explicitly closing the session is more reliable.

  2. When you force-close via RoomServiceClient, make sure you’re awaiting the room deletion — if the process exits before the cleanup completes, streams may linger until they time out server-side.

  3. For local dev specifically, if you’re killing the agent process (Ctrl+C / SIGKILL), graceful shutdown may not complete. Try handling SIGINT/SIGTERM to call session.close() before exit.
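Point 3 can be sketched as a small, generic shutdown hook (a minimal sketch; `cleanup` is a placeholder for whatever your agent’s teardown needs, e.g. an awaited `session.close()` plus room deletion):

```typescript
// Run an async cleanup exactly once on SIGINT/SIGTERM, then exit.
// `cleanup` stands in for the agent's own teardown (e.g. awaiting
// session.close() and any RoomServiceClient calls).
type Cleanup = () => Promise<void>;

function registerGracefulShutdown(cleanup: Cleanup): void {
  let shuttingDown = false;
  const handler = async (signal: string) => {
    if (shuttingDown) return; // a second Ctrl+C shouldn't re-run cleanup
    shuttingDown = true;
    console.log(`received ${signal}, closing sessions before exit...`);
    try {
      await cleanup();
    } finally {
      process.exit(0);
    }
  };
  process.once('SIGINT', () => { void handler('SIGINT'); });
  process.once('SIGTERM', () => { void handler('SIGTERM'); });
}
```

With this in place, a Ctrl+C in local dev gives the session a chance to close its STT streams instead of leaving them to time out server-side.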

The most useful next step would be getting the exact error when STT starts failing. That’ll tell us whether we’re looking at a concurrency limit, an auth issue, or something provider-side with Deepgram.

Hi - I realised this is probably because I ran out of free quota.
Now, that’s definitely not a problem for me, but as feedback I have to say the UX could be improved.

  1. It’s unclear how I used my free quota (minutes, concurrent sessions, egress, ingress?); there is no clear dashboard showing how much I have left
  2. I only got a message saying “You exceeded your quota for March”, and that was pretty much it

@Giovanni_Braghieri I appreciate the feedback. I will bring this back to the UX team.