I am taking a hybrid approach to autoscaling the agents in my app, with both managed and self-hosted workers, and I am curious about the max-concurrency behavior of the cloud-managed agents when they run alongside self-hosted agents. I understand that LiveKit load-balances participant requests across available workers. If a Ship-tier worker reaches its maximum of 20 concurrent calls, is it taken out of the pool, or will a request routed to it result in an error for the user?
On the OSS side, the dispatch is clean: each worker computes its load with load_fnc, compares it against load_threshold (prod default 0.7), and flips to WorkerStatus.WS_FULL via UpdateWorkerStatus when it crosses the threshold. WS_FULL workers drop out of the dispatch pool, and jobs route to other available workers. If everything’s full, jobs queue server-side rather than erroring (cf. livekit/agents issue #1024, “Jobs keep enqueued, even for empty rooms”).
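To make the flow above concrete, here is a rough toy model of it in plain Python. It does not use the LiveKit SDK: `Worker`, `Dispatcher`, `capacity`, and the `>=` comparison are all hypothetical stand-ins, and the 0.7 threshold is just the value quoted above.

```python
from collections import deque
from dataclasses import dataclass, field

LOAD_THRESHOLD = 0.7  # value quoted above; treated here as an assumption


@dataclass
class Worker:
    name: str
    capacity: int  # hypothetical per-worker job capacity
    active: int = 0

    def load(self) -> float:
        # Stand-in for load_fnc: fraction of capacity currently in use.
        return self.active / self.capacity

    def is_full(self) -> bool:
        # Stand-in for the WS_FULL flip; using >= is an assumption.
        return self.load() >= LOAD_THRESHOLD


@dataclass
class Dispatcher:
    workers: list
    pending: deque = field(default_factory=deque)

    def dispatch(self, job: str):
        # Route to the first worker that has not reported itself full.
        for worker in self.workers:
            if not worker.is_full():
                worker.active += 1
                return worker.name
        # Every worker is full: queue the job instead of raising an error.
        self.pending.append(job)
        return None
```

With a 20-slot worker listed first and a 10-slot worker second, the first 14 jobs land on the first worker, the next 7 on the second, and the 22nd waits in `pending`. The real server's selection order among available workers may differ; first-fit is only the simplest thing to model.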
What’s not publicly verifiable: whether Ship-tier’s “20 concurrent calls” is enforced via that same flip (likely, same framework) or as a hard reject at exactly 20. In the soft case, self-hosted workers under the same agent_name absorb overflow naturally. In the hard case, self-host has to be an explicit fallback. Worth a quick controlled ramp test, or @CWilson can confirm in a line.
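For what a ramp test would be trying to tell apart, here is a small sketch of the two hypotheses as a toy model. Nothing in it reflects confirmed LiveKit Cloud behavior: `route`, `cap`, and `mode` are hypothetical names, and the hard case assumes the router keeps sending calls to the managed worker after it hits the cap.

```python
def route(n_calls: int, cap: int = 20, mode: str = "soft") -> dict:
    """Count where n_calls end up under a soft or hard cap (toy model)."""
    if mode not in ("soft", "hard"):
        raise ValueError("mode must be 'soft' or 'hard'")
    counts = {"managed": 0, "self_hosted": 0, "rejected": 0}
    for _ in range(n_calls):
        if counts["managed"] < cap:
            # Below the cap, the managed worker takes the call.
            counts["managed"] += 1
        elif mode == "soft":
            # Soft case: managed worker leaves the pool, overflow is absorbed.
            counts["self_hosted"] += 1
        else:
            # Hard case: the call is still routed to managed and is rejected.
            counts["rejected"] += 1
    return counts
```

Ramping to 25 calls in this model gives 5 calls on the self-hosted side in the soft case and 5 rejections in the hard case, so the signal to watch for in a real test is whether call 21 shows up on a self-hosted worker or as a user-facing error.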
Cc: @Andrew_Hilton