Hi guys,
We’re running into a serious concurrency issue and could use some guidance.
Even on the Scale plan, with only ~50 concurrent users, we’re seeing a huge increase in agent join latency. In many cases, this delay also causes the agent’s first (welcome) message to be dropped entirely.
This is concerning because the Scale plan is advertised to support up to 600 concurrent users, which is the level we’re planning for. We’ve already requested and been granted 600 concurrency on the dashboard, but performance degradation is already severe at just 50 users.
Observed latency percentiles (from dashboard):
- p50: 13,498 ms
- p90: 245,331 ms
- p99: 276,611 ms
Additional context:
- We are not using avatars in this flow; this is a simple agent session only.
- We tested concurrency using Artillery, simulating virtual users starting agent sessions.
- Sessions were initiated simultaneously (burst load).
- There were no dispatch errors in the agent logs during testing, only extremely long agent join delays.
- We increased num_idle_processes in an attempt to keep enough warm capacity, but it didn't materially improve join latency.
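For context, the shape of our burst test looked roughly like the sketch below (the target and endpoint are placeholders, not our real URLs, and we're relying on Artillery's `phases` / `arrivalCount` options): all virtual users arrive within a single second instead of ramping up.

```yaml
config:
  target: "https://example.com"   # placeholder, not our real endpoint
  phases:
    # Burst shape: 50 virtual users all arrive within 1 second
    - duration: 1
      arrivalCount: 50
scenarios:
  - name: start-agent-session
    flow:
      - post:
          url: "/start-session"   # placeholder session-creation endpoint
```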
Questions we’d appreciate clarity on:
- Are there known limitations around simultaneous / burst dispatch calls, even within the approved concurrency limit?
- Does autoscaling or worker spin-up time impact agent join latency under burst conditions?
- Is there a recommended approach to achieve the maximum number of dispatches with minimal delay?
- Are there any best practices or configuration changes we should apply if we expect near-simultaneous session starts?
We understand that staggering session creation by a few seconds is more realistic, but we’d like to understand the hard limits and expected behavior when sessions are created concurrently, so we can plan accordingly.
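As a strawman for the staggered approach, here is a minimal sketch of batched session creation with a pause between batches; `create_session` is a stand-in for whatever call actually starts an agent session in our stack, and the batch size and delay are arbitrary.

```python
import asyncio


async def create_session(session_id: int) -> int:
    """Placeholder for the real dispatch / session-creation call."""
    await asyncio.sleep(0)
    return session_id


async def staggered_dispatch(total: int, batch_size: int = 10,
                             delay_s: float = 1.0) -> list[int]:
    """Create `total` sessions in batches, pausing between batches
    so workers have time to spin up instead of absorbing one burst."""
    results: list[int] = []
    for start in range(0, total, batch_size):
        batch = range(start, min(start + batch_size, total))
        # Sessions within a batch still start concurrently.
        results += await asyncio.gather(*(create_session(i) for i in batch))
        if start + batch_size < total:
            await asyncio.sleep(delay_s)  # stagger the next batch
    return results
```

With `total=600` and `batch_size=50` this would spread the same load over about twelve seconds rather than one burst.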
Note: This thread also exists on Slack; I'm posting here for better reach. If the duplication is an issue, please let me know. We're just hoping to resolve this soon.
Can you please provide your project ID and a session ID that demonstrates a long join time?
Your project ID can be found in the URL when you are on your dashboard. It starts with "p_".
We're seeing duplicate on_enter() logs for the same encounter UUID, even though each client session is supposed to be uniquely identified and we assign a unique UUID when creating the dispatches.
The log is emitted inside the on_enter() method, yet it’s being triggered twice for the same encounterId, coming from two different client locations / workers.
Expected
on_enter() fires once per unique encounter/session.
Observed
The same encounterId is logged twice, with different worker IDs, PIDs, timestamps, and client locations.
The duration field reflects the latency from when the dispatch call was created.
Could you explain why this is happening?
{
  "timestamp_utc": "2026-02-07T15:41:27.589105Z",
  "message": "Agent Entered",
  "duration": "49.21s",
  "encounterId": "1f6f8b17-95be-4797-bb03-21e805d3283a",
  "job_id": "AJ_c9cQy6riGnDC",
  "room_id": "RM_Lh7CFV3pTDH8",
  "worker_id": "CAW_QJChRq6xL3j7",
  "pid": "10207",
  "client_location": "Seattle, Washington, US",
  "code_location": {
    "file": "/app/IntakeAgent.py",
    "function": "on_enter",
    "line": 342
  }
}
{
  "timestamp_utc": "2026-02-07T15:40:43.639816Z",
  "message": "Agent Entered",
  "duration": "5.26s",
  "encounterId": "1f6f8b17-95be-4797-bb03-21e805d3283a",
  "job_id": "AJ_c9cQy6riGnDC",
  "room_id": "RM_Lh7CFV3pTDH8",
  "worker_id": "CAW_mtxuHnnpbKxL",
  "pid": "1371",
  "client_location": "Ashburn, Virginia, US",
  "code_location": {
    "file": "/app/IntakeAgent.py",
    "function": "on_enter",
    "line": 342
  }
}
What Happened
The logs show a clear sequence of events for the room RM_Lh7CFV3pTDH8:
- 10:40:41 - First job assignment to the worker CAW_mtxuHnnpbKxL (Ashburn)
- 10:40:43 - Agent joins and becomes active
- 10:41:18 - "short ice connection" error occurs
- 10:41:23 - Agent disconnects (PEER_CONNECTION_DISCONNECTED)
- 10:41:26 - System logs "no worker available to handle job" with "context deadline exceeded" error
- 10:41:26 - Job reassigned to new worker CAW_QJChRq6xL3j7 (Seattle)
- 10:41:27 - Second agent joins the room
This resulted in two separate agent instances joining the same room with the same job_id but different worker_id values, triggering your on_enter() twice.
Why This Happens
The LiveKit agent system automatically retries failed jobs. manager.go:924-946 When an agent disconnects unexpectedly (not intentionally), the system checks if the job should be retried. manager.go:77-105
The first agent (Ashburn) disconnected due to network issues (PEER_CONNECTION_DISCONNECTED), which is not considered an intentional disconnect. This triggered the retry logic, causing the system to assign the same job to a different worker (Seattle).
The Duration Discrepancy
The duration values you're seeing (5.26s vs 49.21s) represent the time from when the job was first created to when each agent entered.
Notes
The agent retry mechanism is working as designed to ensure job completion despite transient failures. However, if your application logic assumes on_enter() fires exactly once per job, you'll need to add idempotency checks using the job_id to detect and handle reconnections. The same job_id appearing with different worker_id values indicates a retry scenario.
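As an illustration of the idempotency check (this is a hypothetical sketch, not LiveKit API code), you can keep a record of job_ids you've already handled and skip one-time logic on a retry:

```python
# Hypothetical guard: run one-time on_enter() logic only for the
# first occurrence of a job_id. A retried job carries the same
# job_id but a different worker_id.
_seen_jobs: set[str] = set()


def should_run_on_enter(job_id: str) -> bool:
    """Return True only the first time a job_id is seen."""
    if job_id in _seen_jobs:
        return False  # retry of a job we already handled
    _seen_jobs.add(job_id)
    return True
```

Note that an in-memory set only works within a single process; since the retry lands on a different worker, in practice you'd key this on job_id in shared storage (e.g. Redis or your database).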
Also….
I think you are reusing room names. It is best to use a unique room name each time if you can.
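For example, a minimal way to derive a unique room name per session (the prefix here is arbitrary, just for illustration):

```python
import uuid


def unique_room_name(prefix: str = "intake") -> str:
    """Generate a unique room name so a retried or stale job can
    never collide with a fresh session in a reused room."""
    return f"{prefix}-{uuid.uuid4()}"
```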
The question is: why, during burst load testing, do some of these workers face this sort of disconnection?
Bringing the conversation over from Slack: I understand your burst load testing creates sessions simultaneously. The scaling is designed for more realistic arrival patterns rather than a sudden influx of simultaneous calls. This probably also explains the reassignment described above.
I would suggest retesting with a more realistic load profile. If you expect these sudden spikes in production, please get in touch and we can get you onto an enterprise plan, provisioned appropriately.
Thanks for your assistance, guys. We noticed that adding a one-second delay between batches simulates a more realistic scenario, and we did not face issues in that case. We will get in touch once we have analyzed our expected usage/load and are ready to move toward enterprise.
Thanks!
Thanks! Earlier I couldn't find the agent load-testing guide on your docs website, so I assumed load testing was only covered for conference calls. I can see it in the link you shared; very helpful. Thanks!