Observed STT/TTS Concurrency Behavior During Internal Testing

Test Setup

  • Internal Users: 5
  • Stack: LiveKit Agent JS (hosted on LiveKit Agent Cloud)
  • SDK Version: {please confirm current version if needed}
  • Plan: Pro (20 users)

Scenario 1 – Normal Usage
Five users joined the voice agent simultaneously and stayed connected for about 5–6 minutes.
Everything worked as expected: audio, STT/TTS, and agent responses were stable, and concurrency reached around 5–6 users.

Scenario 2 – Reconnect Workflow
We repeated the same test, but this time each user:

  1. Spoke for ~30 seconds
  2. Closed the voice session
  3. Reopened it shortly after

During this flow, we observed:

  • STT/TTS utilization quickly rising to ~80% despite still having only five users.
  • Agent sessions beginning to fail or behave inconsistently.
  • Previously closed sessions appearing to remain active for a noticeable duration before being released.

Although our plan allows up to 20 concurrent users, we seemed to hit limits with just these repeated reconnects. This makes us suspect that STT/TTS resources (or related workers) may not be terminating immediately after session closure, which could also have cost implications.

We would appreciate help understanding:

  • Whether there are recommended lifecycle or teardown steps we should be handling explicitly on our side.
  • Whether there are known considerations around rapid reconnect patterns.
  • How STT/TTS session cleanup is managed, and how we can ensure resources are released promptly.

Hi, the session lifecycle is described in "Server lifecycle | LiveKit Documentation", and some additional options are detailed in "Agent session | LiveKit Documentation".

Just to clarify, the limit of 20 applies to concurrent agent sessions and to LiveKit Inference concurrency.

Where are you seeing the STT/TTS utilization at 80%? If you are using LiveKit Inference for both STT and TTS, that would be 2 concurrent connections per user.
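
To make the arithmetic behind that last point concrete, here is a rough TypeScript sketch. It assumes each session holds one STT and one TTS connection, and that the limit of 20 counts every such connection; the thread does not confirm how the limit is counted. The function names and the "lingering sessions" parameter are hypothetical, used only for illustration, and the figure of 3 lingering sessions is just one combination that would give 80%, not a measured value.

```typescript
// Assumption: one STT plus one TTS connection per session, as suggested in the reply above.
const CONNECTIONS_PER_SESSION = 2;

// Assumption: the limit of 20 counts each inference connection; not confirmed in this thread.
const ASSUMED_LIMIT = 20;

/**
 * Connections held by live sessions plus closed sessions that have not
 * been released yet (hypothetical model of the reconnect scenario).
 */
function inferenceConnections(activeSessions: number, lingeringSessions: number): number {
  return (activeSessions + lingeringSessions) * CONNECTIONS_PER_SESSION;
}

/** Share of the assumed limit in use, as a percentage. */
function utilizationPercent(connections: number, limit: number = ASSUMED_LIMIT): number {
  return (connections * 100) / limit;
}

// Scenario 1: five users, nothing lingering.
const steady = inferenceConnections(5, 0);
console.log(steady, utilizationPercent(steady)); // 10 50

// Scenario 2 (illustrative): five users plus three closed sessions not yet released.
const withLingering = inferenceConnections(5, 3);
console.log(withLingering, utilizationPercent(withLingering)); // 16 80
```

Under these assumptions, five steady sessions would already sit at half of the limit, so a handful of sessions that stay alive briefly after a reconnect could plausibly push utilization to around 80%. Whether that is what actually happened depends on where the 80% figure was read and how the limit is counted.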