In the morning, the agent and SIP participant both join fine.
As volume picks up, I see 20+ concurrent sessions that are not there, and a SIP caller cannot join the room.
Agent joining spikes and gets stuck in limbo.
Are you self-hosting agents, or is this LiveKit Cloud? Can you give me a session ID or call ID for a session where you saw the issue?
What do you see in your agent’s logs when that happens?
I am on LiveKit Cloud. It has happened multiple times over several days.
Here is an ID where it worked, RM_Nvkp5fdUzmea, and then the next ID, RM_a4ZmN5oe4PKq.
After the failed session I saw 20+ zombie stale rooms stack up and callers could not connect to the agent.
Thank you so much
It might be related to this patch:
FOUND IT. Look at this:
▎ fix(agents): release initMutex after warming to restore pool concurrency
This is exactly the bug I spotted in the code earlier. The initMutex was held through proc.join(), meaning through the entire lifetime of each process, including while handling a job. That forced jobs to run serially inside a single worker, regardless of numIdleProcesses. My original theory was right; I just got confused when I saw concurrent PIDs in your current logs.
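To make the claim above concrete, here is a minimal sketch of how holding a lock for the whole lifetime of a process serializes jobs, and how releasing it right after warm-up restores concurrency. This is not the LiveKit agents code: the Mutex class, runPool, sleep, and the timings are all hypothetical stand-ins chosen for illustration, and only the name initMutex is borrowed from the patch title.

```typescript
// Minimal promise-based mutex (illustrative only, not the LiveKit implementation).
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  // Resolves with a release function once the lock has been acquired.
  async lock(): Promise<() => void> {
    let release!: () => void;
    const next = new Promise<void>((resolve) => (release = resolve));
    const prev = this.tail;
    this.tail = prev.then(() => next);
    await prev; // wait for the previous holder to release
    return release;
  }
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Hypothetical pool: starts `jobs` processes and returns the peak number of
// jobs that were running at the same time.
async function runPool(holdLockDuringJob: boolean, jobs: number): Promise<number> {
  const initMutex = new Mutex();
  let running = 0;
  let maxConcurrent = 0;

  const runProcess = async () => {
    const release = await initMutex.lock();
    await sleep(5); // warm-up: intentionally one process at a time
    if (!holdLockDuringJob) release(); // fixed behaviour: release right after warming

    running++;
    maxConcurrent = Math.max(maxConcurrent, running);
    await sleep(50); // the job runs for the rest of the process lifetime
    running--;

    if (holdLockDuringJob) release(); // buggy behaviour: lock held until the process ends
  };

  await Promise.all(Array.from({ length: jobs }, runProcess));
  return maxConcurrent;
}
```

In this sketch, runPool(true, n) never has more than one job running at a time, no matter how many processes are started, while runPool(false, n) lets jobs overlap once each process has finished warming. Whether this matches the actual code path in the patch is an assumption based only on its description.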
If you look at the event logs for RM_Nvkp5fdUzmea and RM_a4ZmN5oe4PKq in the LiveKit Cloud dashboard, you’ll notice that the agent DID join; it just took a while for it to appear.
As volume picks up i see 20+ concurrent sessions that are not there
I think the issue is that you are on the Ship tier, which only allows 20 concurrent agent sessions. I suspect the agent is slow to join because it’s waiting for another session to finish.
I don’t actually see any dispatch errors on your agent dashboard in LiveKit Cloud, but you’ll need to upgrade to the Scale tier to support more than 20 concurrent agent sessions.
It is a good intuition, but this is not the case.
There were never 20+ concurrent sessions. LiveKit Cloud created zombie sessions that I could not even locate on the dashboard. Hopefully it is fixed with the above GitHub patch.