Hi guys,
We’re running into a serious concurrency issue and could use some guidance.
Even on the Scale plan, with only ~50 concurrent users, we’re seeing a huge increase in agent join latency. In many cases, this delay also causes the agent’s first (welcome) message to be dropped entirely.
This is concerning because the Scale plan is advertised to support up to 600 concurrent users, which is the level we’re planning for. We’ve already requested and been granted 600 concurrency on the dashboard, but performance degradation is already severe at just 50 users.
Observed latency percentiles (from dashboard):
- p50: 13,498 ms
- p90: 245,331 ms
- p99: 276,611 ms
Additional context:
- We are not using avatars in this flow; this is a simple agent session only.
- We tested concurrency using Artillery, simulating virtual users starting agent sessions.
- Sessions were initiated simultaneously (burst load).
- There were no dispatch errors in the agent logs during testing, only extremely long agent join delays.
- We increased num_idle_processes in an attempt to keep enough warm capacity, but it didn't materially improve join latency.
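For context, the shape of our burst test looked roughly like the sketch below (the target and endpoint are placeholders, not our real URLs, and we're relying on Artillery's `phases` / `arrivalCount` options): all virtual users arrive within a single second instead of ramping up.

```yaml
config:
  target: "https://example.com"   # placeholder, not our real endpoint
  phases:
    # Burst shape: 50 virtual users all arrive within 1 second
    - duration: 1
      arrivalCount: 50
scenarios:
  - name: start-agent-session
    flow:
      - post:
          url: "/start-session"   # placeholder session-creation endpoint
```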
Questions we’d appreciate clarity on:
- Are there known limitations around simultaneous / burst dispatch calls, even within the approved concurrency limit?
- Does autoscaling or worker spin-up time impact agent join latency under burst conditions?
- Is there a recommended approach to achieve the maximum number of dispatches with minimal delay?
- Are there any best practices or configuration changes we should apply if we expect near-simultaneous session starts?
We understand that staggering session creation by a few seconds is more realistic, but we’d like to understand the hard limits and expected behavior when sessions are created concurrently, so we can plan accordingly.
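As a strawman for the staggered approach, here is a minimal sketch of batched session creation with a pause between batches; `create_session` is a stand-in for whatever call actually starts an agent session in our stack, and the batch size and delay are arbitrary.

```python
import asyncio


async def create_session(session_id: int) -> int:
    """Placeholder for the real dispatch / session-creation call."""
    await asyncio.sleep(0)
    return session_id


async def staggered_dispatch(total: int, batch_size: int = 10,
                             delay_s: float = 1.0) -> list[int]:
    """Create `total` sessions in batches, pausing between batches
    so workers have time to spin up instead of absorbing one burst."""
    results: list[int] = []
    for start in range(0, total, batch_size):
        batch = range(start, min(start + batch_size, total))
        # Sessions within a batch still start concurrently.
        results += await asyncio.gather(*(create_session(i) for i in batch))
        if start + batch_size < total:
            await asyncio.sleep(delay_s)  # stagger the next batch
    return results
```

With `total=600` and `batch_size=50` this would spread the same load over about twelve seconds rather than one burst.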
Note: This thread also exists on Slack; I'm posting here for better reach. If the duplication is an issue, please let me know. We're just hoping to resolve this soon.
Can you please provide your project ID and a session ID that demonstrates a long join time?
Your project ID can be found in the URL when you are on your dashboard. It starts with "p_".
We're seeing duplicate on_enter() logs for the same encounter UUID, even though each client session is supposed to be uniquely identified and we assign a unique UUID when creating the dispatches.
The log is emitted inside the on_enter() method, yet it’s being triggered twice for the same encounterId, coming from two different client locations / workers.
Expected
on_enter() fires once per unique encounter/session.
Observed
The same encounterId is logged twice, with different worker IDs, PIDs, timestamps, and client locations.
The duration field reflects the latency from when the dispatch call was created.
Could you explain why this is happening?
{
  "timestamp_utc": "2026-02-07T15:41:27.589105Z",
  "message": "Agent Entered",
  "duration": "49.21s",
  "encounterId": "1f6f8b17-95be-4797-bb03-21e805d3283a",
  "job_id": "AJ_c9cQy6riGnDC",
  "room_id": "RM_Lh7CFV3pTDH8",
  "worker_id": "CAW_QJChRq6xL3j7",
  "pid": "10207",
  "client_location": "Seattle, Washington, US",
  "code_location": {
    "file": "/app/IntakeAgent.py",
    "function": "on_enter",
    "line": 342
  }
}
{
  "timestamp_utc": "2026-02-07T15:40:43.639816Z",
  "message": "Agent Entered",
  "duration": "5.26s",
  "encounterId": "1f6f8b17-95be-4797-bb03-21e805d3283a",
  "job_id": "AJ_c9cQy6riGnDC",
  "room_id": "RM_Lh7CFV3pTDH8",
  "worker_id": "CAW_mtxuHnnpbKxL",
  "pid": "1371",
  "client_location": "Ashburn, Virginia, US",
  "code_location": {
    "file": "/app/IntakeAgent.py",
    "function": "on_enter",
    "line": 342
  }
}
What Happened
The logs show a clear sequence of events for the room RM_Lh7CFV3pTDH8:
- 10:40:41 - First job assignment to the worker CAW_mtxuHnnpbKxL (Ashburn)
- 10:40:43 - Agent joins and becomes active
- 10:41:18 - "short ice connection" error occurs
- 10:41:23 - Agent disconnects (PEER_CONNECTION_DISCONNECTED)
- 10:41:26 - System logs "no worker available to handle job" with "context deadline exceeded" error
- 10:41:26 - Job reassigned to new worker CAW_QJChRq6xL3j7 (Seattle)
- 10:41:27 - Second agent joins the room
This resulted in two separate agent instances joining the same room with the same job_id but different worker_id values, triggering your on_enter() twice.
Why This Happens
The LiveKit agent system automatically retries failed jobs. manager.go:924-946 When an agent disconnects unexpectedly (not intentionally), the system checks if the job should be retried. manager.go:77-105
The first agent (Ashburn) disconnected due to network issues (PEER_CONNECTION_DISCONNECTED), which is not considered an intentional disconnect. This triggered the retry logic, causing the system to assign the same job to a different worker (Seattle).
The Duration Discrepancy
The duration values you're seeing (5.26s vs 49.21s) represent the time from when the job was first created to when each agent entered.
Notes
The agent retry mechanism is working as designed to ensure job completion despite transient failures. However, if your application logic assumes on_enter() fires exactly once per job, you'll need to add idempotency checks using the job_id to detect and handle reconnections. The same job_id appearing with different worker_id values indicates a retry scenario.
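As an illustration of the idempotency check (this is a hypothetical sketch, not LiveKit API code), you can keep a record of job_ids you've already handled and skip one-time logic on a retry:

```python
# Hypothetical guard: run one-time on_enter() logic only for the
# first occurrence of a job_id. A retried job carries the same
# job_id but a different worker_id.
_seen_jobs: set[str] = set()


def should_run_on_enter(job_id: str) -> bool:
    """Return True only the first time a job_id is seen."""
    if job_id in _seen_jobs:
        return False  # retry of a job we already handled
    _seen_jobs.add(job_id)
    return True
```

Note that an in-memory set only works within a single process; since the retry lands on a different worker, in practice you'd key this on job_id in shared storage (e.g. Redis or your database).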
Also….
I think you are reusing room names. It is best to use a unique room name each time if you can.
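For example, a minimal way to derive a unique room name per session (the prefix here is arbitrary, just for illustration):

```python
import uuid


def unique_room_name(prefix: str = "intake") -> str:
    """Generate a unique room name so a retried or stale job can
    never collide with a fresh session in a reused room."""
    return f"{prefix}-{uuid.uuid4()}"
```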
The question is: why, during burst load testing, do some of these workers face this sort of disconnection?
Bringing the conversation over from Slack: I understand your burst load testing creates sessions simultaneously. The scaling is designed for more realistic arrival patterns rather than a sudden influx of simultaneous calls. This probably also explains the reassignment described above.
I would suggest retesting with a more realistic load profile. If you expect these sudden spikes in production, please get in touch and we can get you onto an enterprise plan, provisioned appropriately.
Thanks for your assistance, guys. We noticed that adding a one-second delay between batches simulates a more realistic scenario, and we did not face issues in that case. We will get in touch once we have analyzed our expected usage/load and are ready to move toward enterprise.
Thanks!
Thanks! Earlier I couldn't find the agent load-testing guide on your docs website, so I assumed load testing was only covered for conference calls. I can see it in the link you shared; very helpful. Thanks!