Hey team
I’ve been running a deployed agent on LiveKit Cloud for several months without issues. About 2–3 weeks ago it stopped working completely, and after extensive debugging I can confirm it’s not a code or config problem.
Details:
Agent ID: CA_kdH3VYVKtzsr
Project ID: p_5wnpagm7zno
Region: us-east
Plan: Build (free)
Agent: Python SDK, livekit-agents==1.5.1, explicit dispatch with agent_name="outbound-caller"
Dispatched from: .NET backend via AgentDispatchServiceClient
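For clarity, the dispatch path looks roughly like this. My backend actually uses the .NET AgentDispatchServiceClient; this is a hedged Python-equivalent sketch using the livekit-api package, and the function name and room argument are illustrative, not from my code:

```python
# Hypothetical Python equivalent of the .NET AgentDispatchServiceClient call.
# Assumes livekit-api is installed and LIVEKIT_URL / LIVEKIT_API_KEY /
# LIVEKIT_API_SECRET are set in the environment.
import asyncio

async def create_dispatch(room_name: str) -> str:
    from livekit import api  # imported lazily so the module loads without the SDK

    lkapi = api.LiveKitAPI()  # reads connection details from the environment
    dispatch = await lkapi.agent_dispatch.create_dispatch(
        api.CreateAgentDispatchRequest(
            agent_name="outbound-caller",  # must match the worker's agent_name
            room=room_name,
        )
    )
    await lkapi.aclose()
    return dispatch.id  # an "AD_..." ID, same as what my backend receives
```

The key detail is that agent_name on the dispatch must match the agent_name the worker registered with; my .NET call returns a valid AD_ ID the same way this sketch would.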
What’s happening:
lk agent status always shows Sleeping with CPU: 0m, Mem: 0
AgentDispatchServiceClient.CreateDispatch() returns a valid AD_ ID with no errors
The agent never joins the room; the dispatch is orphaned
lk agent restart returns “Restarted”, but status shows Sleeping again immediately afterward
Runtime logs are inaccessible: “The agent has shut down due to inactivity. It will automatically start again when a new session begins.” But it never starts again
Build logs are completely clean: all 9 Docker steps succeed, no errors
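For reference, this is the shape of the check loop I kept re-running while debugging. It only wraps the lk agent status command already quoted above; the function name and the naive string match on the output are my own, not anything from the CLI docs:

```python
# Sketch of my debugging check: shell out to the LiveKit CLI and look for
# the "Sleeping" state in its output. Assumes `lk` is installed and
# authenticated against the project; output parsing is deliberately naive.
import subprocess

def agent_is_sleeping() -> bool:
    result = subprocess.run(
        ["lk", "agent", "status"],
        capture_output=True,
        text=True,
        check=False,  # don't raise; we only inspect the output
    )
    return "Sleeping" in result.stdout
```

In my case this returned True before a dispatch, immediately after CreateDispatch(), and immediately after lk agent restart.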
Confirmed NOT the issue:
All secrets present and valid (OPENAI_API_KEY, DEEPGRAM_API_KEY, ASSEMBLYAI_API_KEY, SIP_OUTBOUND_TRUNK_ID, AWS keys)
LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET are supplied at the project level, so they don't need to be set as secrets
Free plan quota not exceeded
Room stays open for several minutes after dispatch (not an emptyTimeout issue)
Dispatch API works correctly: valid AD_ IDs are returned every time
Self-hosted works perfectly:
When I run the exact same agent code locally with python agent.py dev, pointed at the same LiveKit Cloud project, it works flawlessly: it registers as a worker, receives the dispatch, joins the room, and completes the full session, including DTMF, STT, TTS, and the S3 upload. Zero issues.
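The registration path exercised in both environments is, in outline, the following. This is a simplified sketch assuming livekit-agents 1.x; the entrypoint body stands in for my real session logic:

```python
# Minimal sketch of the worker side, assuming livekit-agents 1.x.
# Locally this is started with: python agent.py dev
def build_worker_options():
    from livekit import agents  # livekit-agents package, imported lazily

    async def entrypoint(ctx):
        # Called once a dispatch assigns a job to this worker.
        await ctx.connect()  # join the room named in the dispatch
        # ... real session logic (DTMF, STT, TTS, S3 upload) goes here ...

    return agents.WorkerOptions(
        entrypoint_fnc=entrypoint,
        # Setting agent_name disables automatic dispatch; the worker only
        # receives jobs created explicitly for "outbound-caller".
        agent_name="outbound-caller",
    )

# The script's entry point then calls agents.cli.run_app(build_worker_options()).
```

Locally, this same registration shows up as a connected worker within seconds; on Cloud, the worker never registers at all.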
So the code is correct. The cloud deployment of the same code is broken.
My conclusion: the deployment CA_kdH3VYVKtzsr appears stuck in a broken infrastructure state where the cold-start mechanism silently fails. The container is triggered to wake on dispatch but never actually boots, never registers as a worker, and the pending dispatch expires with no worker to claim it.
This started happening ~2–3 weeks ago with no changes on my end.
Questions:
Can you investigate what’s happening with this specific deployment on your infrastructure?
Is there a known issue with cold start + explicit dispatch on the Build plan recently?
Would deleting and redeploying fix this, or is it a broader platform issue?
Thanks!