Managing concurrent STT sessions

Hi — we’re seeing persistent STT concurrency that doesn’t drop to zero after sessions are closed, and we want to verify whether this is expected behavior or if we are missing a cleanup step.

What we observe

  • Runtime metric shows: Concurrent STT max 5, current 3

  • This persists even after:

    • user disconnects

    • room is closed/deleted

    • app-side sessions are marked finished

  • We also occasionally see teardown warnings and then:

    • failed to send usage report: http status: 401

    • followed by a native abort in the voice-agent process:

      • libc++abi ... mutex lock failed: Invalid argument

Our implementation (Node, @livekit/agents)

  • AgentSession created with:

    • stt: new inference.STT({ model: 'deepgram/nova-3', language: 'multi' })

    • llm: new inference.LLM({ model: 'openai/gpt-4.1-mini' })

    • tts: createGracefulTTS()

    • VAD + multilingual turn detector

  • Session starts with:

    • await session.start({ agent, room, inputOptions })

  • We rely on the normal participant disconnect to close the session.

  • Logs typically show:

    • AgentSession closed

    • disconnected from room

    • Job process shutdown

  • We also have admin force-close:

    • list/delete rooms via RoomServiceClient

    • remove participants before delete

    • mark app-side sessions finished
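For reference, the force-close path above looks roughly like this (a sketch, not our exact code; `RoomServiceLike` mirrors the subset of `livekit-server-sdk`'s `RoomServiceClient` methods we call, and error handling is omitted):

```typescript
// Sketch of the admin force-close sequence. In the real code `svc` is a
// RoomServiceClient instance from livekit-server-sdk; this interface only
// mirrors the three methods we call on it.
interface RoomServiceLike {
  listParticipants(room: string): Promise<{ identity: string }[]>;
  removeParticipant(room: string, identity: string): Promise<void>;
  deleteRoom(room: string): Promise<void>;
}

async function forceCloseRoom(svc: RoomServiceLike, room: string): Promise<void> {
  // Remove participants first so agent sessions see a clean disconnect,
  // then delete the room. Every step is awaited so cleanup actually
  // completes before the caller moves on.
  const participants = await svc.listParticipants(room);
  for (const p of participants) {
    await svc.removeParticipant(room, p.identity);
  }
  await svc.deleteRoom(room);
}
```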

Question

Is STT “current” expected to remain >0 for a while after teardown (provider-side TTL / delayed accounting), or should it drop to 0 immediately when the AgentSession closes and the room is deleted? We couldn’t tell from the Server lifecycle | LiveKit Documentation page.

Also, are there recommended best practices to guarantee STT stream cleanup in Node agents (beyond room delete + participant remove + session close), especially when there are occasional teardown warnings?

Useful details

  • SDK: @livekit/agents / plugin versions around 1.0.47

  • STT provider: Deepgram via inference.STT

  • Environment: local dev, frequent start/stop cycles

Thanks — happy to share more logs if needed.

How are you measuring “persistent STT concurrency”?

We’re basing that on the LiveKit runtime metric/dashboard, which shows Concurrent STT current > 0 even after the related session has been torn down.

Concretely, we observe:

  • user disconnected

  • room deleted / closed

  • app-side session marked finished

  • agent logs show AgentSession closed, disconnected from room, and job shutdown

But the LiveKit-side STT current metric on the dashboard still remains non-zero for some time afterward. We’re calling it “persistent” because we would expect it to return to 0 once everything is torn down.

To be clear, we have not yet proven that this metric is the root cause. What we are seeing is that at some point the API starts returning an error regardless and all STT connections fail. (The error is generic and does not mention quota, but quota is the only plausible explanation we have.)

Hi Giovanni, welcome!

A few clarifying questions and initial thoughts:

On the dashboard metric staying > 0:

The concurrency metric on the dashboard can lag behind actual session teardown — it’s not a real-time reflection of active STT streams. There’s some delay in accounting, especially during frequent start/stop cycles in local dev. So seeing “current 3” after teardown doesn’t necessarily mean 3 STT streams are still open. I wouldn’t use the dashboard metric alone to diagnose a quota issue.

On the actual failures you’re seeing:

The more concerning part is this:

at some point the API starts returning an error regardless and all STT connections fail

Can you share the exact error message/code you’re getting when STT connections start failing? That will help us determine whether this is actually a concurrency limit issue or something else (auth, rate limiting, provider-side, etc.).

On the teardown warnings:

The sequence you’re seeing:

  1. failed to send usage report: http status: 401

  2. libc++abi ... mutex lock failed: Invalid argument (native abort)

The 401 on the usage report suggests an auth token expiring during teardown — this shouldn’t affect STT cleanup, but the native abort that follows is a bug. Are you able to reproduce this consistently? If so, can you share:

  • Your @livekit/agents version (you mentioned ~1.0.47 — exact version would help)

  • The full stack trace / log around the abort

  • Whether this happens on every session teardown or only intermittently

On cleanup best practices for Node agents:

Your current approach (participant remove → room delete → app-side session close) is correct. A couple of things to check:

  1. Are you calling session.close() explicitly, or only relying on participant disconnect events? In local dev with frequent restarts, disconnect events can get lost. Explicitly closing the session is more reliable.

  2. When you force-close via RoomServiceClient, make sure you’re awaiting the room deletion — if the process exits before the cleanup completes, streams may linger until they time out server-side.

  3. For local dev specifically, if you’re killing the agent process (Ctrl+C / SIGKILL), graceful shutdown may not complete. Try handling SIGINT/SIGTERM to call session.close() before exit.
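Point 3 can be sketched as a small, generic shutdown hook (a minimal sketch; `cleanup` is a placeholder for whatever your agent’s teardown needs, e.g. an awaited `session.close()` plus room deletion):

```typescript
// Run an async cleanup exactly once on SIGINT/SIGTERM, then exit.
// `cleanup` stands in for the agent's own teardown (e.g. awaiting
// session.close() and any RoomServiceClient calls).
type Cleanup = () => Promise<void>;

function registerGracefulShutdown(cleanup: Cleanup): void {
  let shuttingDown = false;
  const handler = async (signal: string) => {
    if (shuttingDown) return; // a second Ctrl+C shouldn't re-run cleanup
    shuttingDown = true;
    console.log(`received ${signal}, closing sessions before exit...`);
    try {
      await cleanup();
    } finally {
      process.exit(0);
    }
  };
  process.once('SIGINT', () => { void handler('SIGINT'); });
  process.once('SIGTERM', () => { void handler('SIGTERM'); });
}
```

With this in place, a Ctrl+C in local dev gives the session a chance to close its STT streams instead of leaving them to time out server-side.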

The most useful next step would be getting the exact error when STT starts failing. That’ll tell us whether we’re looking at a concurrency limit, an auth issue, or something provider-side with Deepgram.

Hi - I realised this is probably because I ran out of free quota.
Now, that’s definitely not a problem for me, but as feedback I have to say the UX could be improved.

  1. It’s unclear how I used my free quota (minutes, concurrent sessions, egress, ingress?); there is no clear dashboard showing how much I have left
  2. I only got a message saying “You exceeded your quota for March”, and that was pretty much it

@Giovanni_Braghieri I appreciate the feedback. I will bring this back to the UX team.