Slow room creation

Rushi_Patel · June 6, 2026, 12:58pm

I am using a self-hosted LiveKit stack (with the Python Agents SDK) and experiencing a severe 7–8 second delay during call initialization across both web widgets and our Meta SIP inbound trunk. The issue is consistently a slow room-to-agent dispatch bottleneck; the frontend participant connects instantly, but there is a massive delay before the agent worker receives the job request and handles the first utterance, even though our agent container is running with active, pre-warmed idle processes. What internal server configurations or underlying Redis/Signaling components should we look into to fix this dispatch lag on a self-hosted setup, and is there a known issue with CreateRoom explicit agent dispatch (agents field) failing or delaying jobs compared to calling CreateDispatch manually?

khan · June 6, 2026, 2:56pm

This looks like a known issue in some self-hosted deployments. There are a couple of GitHub reports describing similar behavior, where pre-warmed workers remain idle and job dispatch is delayed by 15–50 seconds.

A few things to check:

1. Use explicit CreateDispatch instead of the agents field

If you’re currently creating rooms with the agents field, try switching to an explicit CreateDispatch flow. LiveKit generally recommends explicit dispatching for SIP integrations and more complex deployments, as it tends to be more reliable and easier to troubleshoot.

2. Verify worker and room placement in multi-node setups

If you’re running a multi-node self-hosted cluster, take a look at issue #3645. Agent workers only serve the node they register with. If a room is created on a different node than the one hosting the worker, dispatch can be delayed or appear to stall. That could explain the 7–8 second gap you’re seeing.

Also, check your Redis deployment. Since Redis handles the signaling layer for job routing, high pub/sub latency or unstable connections can introduce additional dispatch delays.
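One rough way to check this is to time a cheap operation repeatedly from the same host that runs the LiveKit server or the agent worker. The sketch below is only an illustration: `measure_round_trips` is a hypothetical helper, not part of any LiveKit or Redis library, and it times whatever callable you hand it.

```python
import statistics
import time
from typing import Callable, Dict


def measure_round_trips(op: Callable[[], object], samples: int = 50) -> Dict[str, float]:
    """Call `op` repeatedly and summarize how long each call takes, in milliseconds."""
    if samples < 2:
        raise ValueError("need at least 2 samples")
    timings_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        op()  # e.g. a Redis PING issued from the LiveKit server host
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    timings_ms.sort()
    return {
        "samples": samples,
        "min_ms": timings_ms[0],
        "median_ms": statistics.median(timings_ms),
        "p95_ms": timings_ms[int(0.95 * (samples - 1))],
        "max_ms": timings_ms[-1],
    }


if __name__ == "__main__":
    # Stand-in operation so the sketch runs anywhere; replace it with a real
    # Redis call when running on the LiveKit server or agent host.
    print(measure_round_trips(lambda: time.sleep(0.001), samples=20))
```

With redis-py installed, `op` could be the `ping` method of a client such as `redis.Redis(host="redis.internal", port=6379)` (the hostname is made up). A PING only measures command round-trip time, not pub/sub delivery, so treat the result as a lower bound on what each dispatch message pays.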

github.com/livekit/agents

Significant Job Dispatch Delay (15-50 seconds) Between Room Creation and Agent Job Receipt

opened 10:26PM - 19 Aug 25 UTC

closed 06:48PM - 30 Oct 25 UTC

ClvHbl

bug

## Summary

We're experiencing consistent, significant delays (15-50 seconds) between when a LiveKit room is created and when the agent worker receives the job request. According to LiveKit documentation, dispatch should complete in under 150ms, but we're seeing delays that are 100-300x longer than expected.

## Environment

- **LiveKit Server**: Self-hosted on GCP (Cloud Run)
- **LiveKit Agents**: Python SDK (latest version)
- **Deployment**: GCE Managed Instance Group with 20 worker processes
- **Configuration**:
  - `num_idle_processes=20`
  - `job_executor_type=JobExecutorType.PROCESS`
  - `load_threshold=0.85`

## The Problem

When a user requests to join a room:

1. Frontend requests token from backend
2. Backend creates LiveKit token and room successfully (< 1 second)
3. Frontend connects to LiveKit room as a participant
4. **15-50 second delay occurs here** (agent job dispatch)
5. Agent worker finally receives job request
6. Agent joins room and operates normally

This delay is causing users to wait in empty rooms, often disconnecting before the agent arrives. The frontend participant is successfully in the room, but the agent dispatch to join that same room is delayed.

## Detailed Timeline Examples

### Example 1: 25-second delay

```
20:41:02.092 - [Backend] LiveKit token requested
20:41:02.469 - [Backend] Room created: agent_c69454ee-f917-47e1-9aec-617dce1cdd1b_session_session_1755636061982_4m8xh23
20:41:03.095 - [Backend] Customer info received via WebSocket
20:41:22.365 - [Backend] User disconnected (gave up waiting!)
20:41:27.694 - [Agent] Finally received job request (25 seconds late)
20:41:27.719 - [Agent] Connecting to room (user already gone)
```

### Example 2: Working correctly

```
20:39:32.994 - [Backend] LiveKit token requested
20:39:34.062 - [Agent] Received job request (1 second - normal!)
20:39:34.100 - [Agent] Connecting to room
20:39:37.659 - [Agent] Agent entered room
```

## Key Observations

1. **Highly inconsistent**: The issue occurs unpredictably - sometimes the system works perfectly for dozens of conversations without any delays, then suddenly multiple sessions in a row experience 15-50 second delays.
2. **Worker availability is not the issue**: We have 20 processes initialized and ready, but only 3 were ever used (PIDs: 37, 41, 43). The other 17 processes remained idle while jobs were delayed.
3. **Pattern inconsistency**: The delay varies between 15-50 seconds when it occurs and happen even with just a single session across the entire system.

## What We've Verified

- ✅ All 20 worker processes are initialized and ready
- ✅ Worker successfully registers with LiveKit server
- ✅ Backend creates rooms immediately (no delay on our end)
- ✅ Network connectivity is stable
- ✅ No memory/CPU pressure on agent instances
- ✅ No errors in LiveKit server logs

## Questions

1. Is there a configuration setting that controls job dispatch timing/retry logic?
2. Are there any known issues with job dispatch to multi-process workers?
3. Could this be related to WebSocket connection/reconnection delays?
4. Are there any undocumented rate limits or throttling mechanisms in self-hosted LiveKit?

github.com/livekit/livekit

Distributed LiveKit: Agent worker can only serve the node it registered on

opened 06:17AM - 06 May 25 UTC

closed 05:56AM - 24 May 25 UTC

umarniz

## Summary

In a 3-node LiveKit cluster, a single **LiveKit Agent worker** registers successfully on **Node C**. When a new room request lands on **Node B**, the server cannot hand that job to the worker on Node C. Instead it

1. Tries (and fails) to assign the job on Node B.
2. Falls back to **Node A**, which creates the room **without any Agent worker attached**.

Result: the room starts, but the Agent never connects and hence there is only silence.

------

## Topology

```
Node A ── LiveKit server
Node B ── LiveKit server (receives the room request)
Node C ── LiveKit server (worker registers here)
```

------

## Logs

### Node A

```
2025-05-06T05:42:55.411Z INFO livekit service/roomallocator.go:164 selected node for room {"room":"<room_name>","selectedNodeID":"<livekit_node_id>"}
2025-05-06T05:42:55.423Z INFO livekit.api service/twirp.go:124 API RoomService.CreateRoom {"service":"RoomService","method":"CreateRoom","room":"<room_name>","request":{"name":"<room_name>","metadata":"<metadata>"},"duration":"16.212838ms","status":"200"}
2025-05-06T05:42:55.596Z INFO livekit.webhook webhook/url_notifier.go:124 sent webhook {"event":"room_started","id":"<event_id>","webhookTime":1746510175,"room":"<room_name>","roomID":"<room_id>","url":"<webhook_url>","queueDuration":"42.041µs","sendDuration":"173.904564ms"}
2025-05-06T05:42:55.646Z INFO livekit agent/client.go:161 failed to send job request {"error":"no workers with sufficient capacity","namespace":"","jobType":"JT_ROOM","agentName":""}
2025-05-06T05:42:56.493Z INFO livekit service/roommanager.go:405 starting RTC session {"room":"<room_name>","roomID":"<room_id>","participant":"<participant_id>","pID":"<process_id>","remote":false,"nodeID":"<livekit_node_id>","numParticipants":0,"participantInit":{"Identity":"<participant_id>","Client":{"sdk":"<sdk>","version":"<sdk_version>","os":"<os>","osVersion":"<os_version>","deviceModel":"<device_model>","browser":"<browser>","browserVersion":"<browser_version>"}}}
2025-05-06T05:42:56.496Z INFO livekit service/roommanager.go:938 created TURN password {"username":"<turn_username>","password":"<turn_pass>"}
2025-05-06T05:42:56.808Z INFO livekit.transport rtc/transport.go:546 ice reconnected or switched pair {"room":"<room_name>","roomID":"<room_id>","participant":"<participant_id>","pID":"<process_id>","remote":false,"transport":"PUBLISHER","existingPair":{"localProtocol":"udp","localCandidateType":"host","localAddress":"<local_ip>","localPort":"<local_port>","remoteProtocol":"udp","remoteCandidateType":"prflx","remoteAddress":"<remote_ip>","remotePort":"<remote_port>"},"newPair":{"localProtocol":"udp","localCandidateType":"host","localAddress":"<local_ipv6>","localPort":"<local_ipv6_port>","remoteProtocol":"udp","remoteCandidateType":"host","remoteAddress":"<remote_ipv6>","remotePort":"<remote_ipv6_port>"}}}
2025-05-06T05:42:56.922Z INFO livekit.pub rtc/participant.go:1826 mediaTrack published {"room":"<room_name>","roomID":"<room_id>","participant":"<participant_id>","pID":"<process_id>","remote":false,"kind":"audio","trackID":"<track_id>","webrtcTrackID":"<webrtc_track_id>","rid":"","SSRC":"<ssrc>","mime":"audio/red","trackInfo":{"sid":"<track_id>","type":"AUDIO","source":"MICROPHONE","mimeType":"audio/red","mid":"1","stream":"camera"},"fromSdp":true}
2025-05-06T05:42:56.957Z INFO livekit.webhook webhook/url_notifier.go:124 sent webhook {"event":"track_published","id":"<event_id>","webhookTime":1346510176,"room":"<room_name>","roomID":"<room_id>","participant":"<participant_id>","pID":"<process_id>","url":"<webhook_url>","queueDuration":"14.513µs","sendDuration":"32.059281ms"}
2025-05-06T05:42:57.704Z INFO livekit rtc/room.go:469 participant active {"room":"<room_name>","roomID":"<room_id>","participant":"<participant_id>","pID":"<process_id>","remote":false,"publisherCandidates":"<redacted>","subscriberCandidates":"<redacted>","connectionType":"udp"}
2025-05-06T05:42:57.744Z INFO livekit.webhook webhook/url_notifier.go:124 sent webhook {"event":"participant_joined","id":"<event_id>","webhookTime":1346510177,"room":"<room_name>","roomID":"<room_id>","participant":"<participant_id>","pID":"<process_id>","url":"<webhook_url>","queueDuration":"34.951µs","sendDuration":"39.824442ms"}
```

### Node B

```
2025-05-06T05:42:55.632Z WARN livekit.agents service/agentservice.go:401 failed to assign job to worker {"jobID": "<job_id>", "namespace": "", "agentName": "", "jobType": "JT_ROOM", "room": "<room_name>", "roomID": "<room_id>", "workerID": "<OLD_WORKER_ID_1>", "retry": true, "error": "worker not available"}
2025-05-06T05:42:55.639Z WARN livekit.agents service/agentservice.go:401 failed to assign job to worker {"jobID": "<job_id>", "namespace": "", "agentName": "", "jobType": "JT_ROOM", "room": "<room_name>", "roomID": "<room_id>", "workerID": "<OLD_WORKER_ID_2>", "retry": true, "error": "worker not available"}
2025-05-06T05:42:55.644Z WARN livekit.agents service/agentservice.go:401 failed to assign job to worker {"jobID": "<job_id>", "namespace": "", "agentName": "", "jobType": "JT_ROOM", "room": "<room_name>", "roomID": "<room_id>", "workerID": "<OLD_WORKER_ID_3>", "retry": true, "error": "worker not available"}
2025-05-06T05:42:55.644Z WARN livekit.agents service/agentservice.go:391 no worker available to handle job {"jobID": "<job_id>", "namespace": "", "agentName": "", "jobType": "JT_ROOM", "room": "<room_name>", "roomID": "<room_id>", "error": "no workers with sufficient capacity"}
```

### Node C

```
2025-05-06T05:42:44.567Z INFO livekit.agents service/agentservice.go:292 worker registered {"namespace": "", "jobType": "JT_ROOM", "agentName": "", "workerID": "<CORRECT_WORKED_ID>"}
```

------

## Expected behavior

- **Node B** should be able to dispatch the `JT_ROOM` job to the worker already registered on **Node C**, or the cluster should transparently proxy the request.

## Actual behavior

- **Node B** fails to find *any* available worker and eventually the allocator selects **Node A**, which creates the room without an Agent worker.

------

## Reproduction steps

1. Run three LiveKit nodes (A, B, C) backed by the same Redis/state store (running in sentinel with one master node).
2. Start **one** Agent worker *only* on Node C (job type `JT_ROOM`).
3. Send a `CreateRoom` request that the allocator routes to Node B.
4. Observe *worker-not-available* warnings on Node B and room creation on Node A.

------

## Environment

| Component      | Version                                 |
| -------------- | --------------------------------------- |
| LiveKit server | `v2.8.0`                                |
| LiveKit Agent  | `v0.x` (worker)                         |
| Redis          | `7.2`                                   |
| Deployment     | Docker Compose (one container per node) |
| OS             | Ubuntu 22.04                            |

------

## Question

**Why can’t Node B delegate the job to the registered worker on Node C?** Is worker discovery limited to the local node by design, or is there an extra configuration flag required to make workers globally visible across the cluster?

Any insight or pointers would be greatly appreciated!

Muhammad_Usman_Bashir · June 7, 2026, 2:25am

@Rushi_Patel, 7–8 s is way outside the documented baseline (“max dispatch time under 150 ms” per the Agent dispatch page of the LiveKit documentation). For self-hosted, the dispatch path is server → Redis → worker. Top suspects:

Redis network latency between LK server and your agent container.
Worker agent_name mismatch (job sits unrouted until a timeout).
PR #4488 (open) is fixing a 3s psrpc timeout when ListDispatch hits a non-existent room. If your dispatch hits that path on the first call, you’d see ~3s eaten there.

On the CreateRoom-agents-field vs CreateDispatch question: there are known issues. Bug #4357 (closed Mar 2026) was about dispatches via RoomServiceClient.createRoom missing the room field. Issue #4390 documents that roomConfig.agents from a JWT token is silently ignored when the room already exists. The recommended path is the explicit CreateAgentDispatchRequest:

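  import asyncio

  from livekit import api

  async def main():
      # Reads LIVEKIT_URL, LIVEKIT_API_KEY and LIVEKIT_API_SECRET from the environment
      lkapi = api.LiveKitAPI()
      try:
          await lkapi.agent_dispatch.create_dispatch(
              api.CreateAgentDispatchRequest(
                  agent_name="my-agent",   # must match worker registration exactly
                  room="my-room",           # auto-created if missing
                  metadata='{"user_id": "12345"}',
              )
          )
      finally:
          await lkapi.aclose()  # close the underlying HTTP session

  asyncio.run(main())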

To isolate the bottleneck, log timestamps at:

1. participant connect,
2. CreateDispatch call,
3. worker entrypoint start.

The gap between (2) and (3) is Redis or worker registration. The gap between (1) and (2) is your own backend logic.
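As a minimal sketch of that bookkeeping, assuming you can ship the three timestamps to one place (for example via your logs), something like the following turns them into the two gaps. `DispatchTimeline` and the mark names are hypothetical and not part of the LiveKit SDK.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DispatchTimeline:
    """Collects named timestamps for one call and reports the gaps between them."""

    marks: Dict[str, float] = field(default_factory=dict)

    def mark(self, name: str, at: Optional[float] = None) -> None:
        # Wall-clock seconds, because the marks come from different processes
        # and hosts; this assumes their clocks are NTP-synchronized.
        self.marks[name] = time.time() if at is None else at

    def gaps(self) -> Dict[str, float]:
        m = self.marks
        return {
            # (1) -> (2): time spent in your own backend logic
            "backend_s": m["create_dispatch"] - m["participant_connect"],
            # (2) -> (3): time spent in Redis routing / worker registration
            "dispatch_s": m["entrypoint_start"] - m["create_dispatch"],
        }


if __name__ == "__main__":
    # Illustrative timestamps only, not measurements from a real deployment.
    timeline = DispatchTimeline()
    timeline.mark("participant_connect", at=100.0)
    timeline.mark("create_dispatch", at=100.4)
    timeline.mark("entrypoint_start", at=107.9)
    print(timeline.gaps())
```

In practice you would call `mark()` without `at` in each component, or parse the timestamps out of log lines. If the clocks on your hosts drift, the cross-host gap will be off by that drift.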

Rushi_Patel · June 8, 2026, 6:40am

Since you mentioned Redis network latency between the LiveKit server and my agent container, one thing I want to ask is whether the geographic location of the servers affects latency.
All my servers (Redis, LiveKit, livekit-sip and my agent) are running in different regions around the world.
If geographic distribution does have a significant impact on latency, could you please help explain the underlying reasons? I’d like to share a clear technical explanation with my team to support the recommendation of deploying these services within the same region.

khan · June 8, 2026, 7:17am

@Rushi_Patel Yes, geographic distribution can absolutely contribute to dispatch delays, and it’s worth checking how your infrastructure is deployed.

In a self-hosted setup, Redis acts as the messaging layer for job dispatch. A typical flow looks like this:

LiveKit Server → Redis → Agent Worker → LiveKit Server

If these components are spread across different regions, every network hop introduces additional latency. Once you add SIP infrastructure in another region, the total round-trip time can grow even further.
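A back-of-the-envelope way to show this to a team is to add up the per-hop latency along that path. The sketch below uses made-up one-way latencies (1 ms inside a region, 120 ms between continents) and assumes one or more full traversals of the path per dispatch. The real number of traversals depends on LiveKit internals, so these are illustrative figures, not measurements.

```python
from typing import Dict, Tuple

Hop = Tuple[str, str]

# Hops in the dispatch path described above.
DISPATCH_PATH = [
    ("livekit-server", "redis"),
    ("redis", "agent-worker"),
    ("agent-worker", "livekit-server"),
]


def path_latency_ms(one_way_ms: Dict[Hop, float], traversals: int = 1) -> float:
    """Sum the one-way latency of every hop, times the number of traversals."""
    return traversals * sum(one_way_ms[hop] for hop in DISPATCH_PATH)


# Assumed latencies: everything in one region vs. each component on a
# different continent.
same_region = {hop: 1.0 for hop in DISPATCH_PATH}
cross_region = {hop: 120.0 for hop in DISPATCH_PATH}

if __name__ == "__main__":
    for label, latencies in (("same region", same_region), ("cross region", cross_region)):
        single = path_latency_ms(latencies)
        several = path_latency_ms(latencies, traversals=5)
        print(f"{label}: {single:.0f} ms per traversal, {several:.0f} ms for 5 traversals")
```

Even this crude model shows why the delay compounds: a hop that costs nothing noticeable inside one datacenter is paid again on every message exchanged between components.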

One key difference is that LiveKit Cloud handles geographic placement automatically. In self-hosted deployments, there is no built-in geographic affinity, so placement and routing need to be managed through your infrastructure design.

As a best practice, try to keep the following components in the same region and, ideally, the same datacenter:

Redis
LiveKit Server
LiveKit SIP
Agent Workers

Keeping Redis pub/sub traffic local minimizes latency and can significantly improve dispatch times.
