Cross-process agent handoff — any plans for first-class support?

We’re building a production voice AI system on LiveKit with a parent agent that delegates to multiple specialist sub-agents (identification, user management, etc.) using enter_agent() / handoff. Today the whole system runs as a single monolith — one ECS task, one worker process.

We want to move to independently deployable sub-agents, where each specialist runs in its own container/process and can be deployed, versioned, and scaled independently. This is a standard microservices concern — changing one sub-agent shouldn’t require redeploying and retesting the entire system.

We’ve researched the current options and understand the constraints:

  1. enter_agent() is purely in-process — it calls update_agent() under the hood, so there’s no way to hand off to an agent running in a different worker.

  2. Agent dispatch (RoomAgentDispatch) supports dispatching multiple named agents to the same room from separate workers (docs, confirmed working in #2220). But there’s no built-in protocol for coordinating handoff between them — who’s “active,” passing conversation context/state, muting/unmuting, or managing turn detection across agent switches.

  3. send_text() data channels can be used for DIY coordination between agents in the same room, but it’s a significant amount of custom infrastructure (state serialization, ack protocols, audio isolation, turn detection stability).
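
To give a sense of what the DIY coordination in option 3 involves, here is a minimal sketch of a handoff request/ack envelope that could be sent as text over a data channel. Everything in it is an assumption for illustration: the message type strings, the `HandoffMessage` fields, and the `state_key` pointer are hypothetical names, not a LiveKit protocol, and the transport itself is left out.

```python
import json
import uuid
from dataclasses import dataclass, asdict

# Hypothetical message types for a DIY handoff protocol; not part of LiveKit.
HANDOFF_REQUEST = "handoff.request"
HANDOFF_ACK = "handoff.ack"


@dataclass
class HandoffMessage:
    type: str            # HANDOFF_REQUEST or HANDOFF_ACK
    handoff_id: str      # correlates an ack with its request
    sender: str          # name of the agent sending the message
    target: str          # name of the agent expected to act on it
    state_key: str = ""  # pointer to externally stored context, if any

    def encode(self) -> str:
        """Serialize to a JSON string suitable for a text data channel."""
        return json.dumps(asdict(self))

    @staticmethod
    def decode(raw: str) -> "HandoffMessage":
        """Rebuild a message from its JSON form."""
        return HandoffMessage(**json.loads(raw))


def new_request(sender: str, target: str, state_key: str) -> HandoffMessage:
    """Create a handoff request with a fresh correlation id."""
    return HandoffMessage(HANDOFF_REQUEST, uuid.uuid4().hex, sender, target, state_key)


def ack_for(request: HandoffMessage) -> HandoffMessage:
    """Build the ack: same handoff_id, sender and target swapped."""
    return HandoffMessage(HANDOFF_ACK, request.handoff_id, request.target, request.sender)


def is_ack_for(request: HandoffMessage, msg: HandoffMessage) -> bool:
    """True only if msg acknowledges this exact request from its target."""
    return (
        msg.type == HANDOFF_ACK
        and msg.handoff_id == request.handoff_id
        and msg.sender == request.target
    )
```

The outgoing agent would send the encoded request, keep speaking (or hold) until `is_ack_for` matches an incoming message, and only then go quiet. Audio isolation and turn detection would still need to be handled separately.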

Our questions:

  • Is LiveKit planning to add cross-process agent handoff — something like enter_agent() but targeting a named agent running in a different worker?

  • Alternatively, are there plans for a multi-agent coordination layer that handles the mute/unmute, turn detection, and context passing when multiple agents share a room?

  • Is there a public roadmap or feature prioritization board we can follow for this kind of architectural feature?

We want to avoid building a custom solution if LiveKit has this on the near-term roadmap.

Can you tell me more about why agent dispatch is not solving the issue? That was my first thought on how to solve this when I started reading your message.

It is not clear to me why you have many agents in the room at the same time. Could it be on demand instead? In my head, I expected Agent A would dispatch Agent B (with context in the metadata). When Agent B arrives, Agent A would leave the room, so it would only be the User and Agent B. If Agent C was needed, Agent B would dispatch it with context and exit upon Agent C's arrival.

What are you finding complex about state transfer?

Can you tell me a little more about your use case so I can communicate this properly to the agents team?

Thanks for engaging on this, Chris. Happy to clarify.

Why agent dispatch alone doesn’t solve it

Agent dispatch handles the routing: getting the right agent worker into the room. That part works great. The gap is everything that happens around the dispatch during a handoff:

  1. Conversation history transfer: When Agent A dispatches Agent B, Agent B arrives with an empty ChatContext. The caller has been talking for 5 minutes, has identified themselves, explained their problem, and Agent B knows none of this. There’s no built-in mechanism to transfer the conversation history from Agent A to Agent B. We’d need to serialize the full ChatContext (system prompt, user messages, assistant messages, tool calls and results), pass it via metadata or data channel, and reconstruct it on the other side.

  2. Shared mutable session state: Our agents accumulate rich state during a call: caller identity, verified account info, facility data, what workflow steps have been completed, etc. (~120+ fields). This state is currently an in-memory Python object shared across agents in our monolith. In a dispatch model, we’d need to serialize it, store it externally (DynamoDB or similar), and have Agent B hydrate it on arrival. Some of our state contains runtime objects (async closures for lazy-loaded API clients, audio control handles) that aren’t serializable without refactoring.

  3. Audio gap during transition: When Agent A dispatches Agent B and then leaves, there’s a period where the caller hears nothing. Agent B needs to connect, load state, initialize STT/TTS, and start its pipeline. For a voice call, even 500ms of dead air feels jarring, and realistic transition time is likely longer.
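
As a rough illustration of points 1 and 2, the sketch below snapshots a conversation history and a session state to JSON while leaving runtime-only handles out, then rebuilds both on the receiving side. It is a simplified stand-in under stated assumptions: `ChatTurn` and `SessionState` are hypothetical types with a handful of fields, not LiveKit's ChatContext or the real ~120-field state object, and `api_client_factory` stands for any non-serializable runtime object that the receiving agent has to re-create itself.

```python
import json
from dataclasses import dataclass, field, fields, asdict
from typing import Any, Callable, Optional


@dataclass
class ChatTurn:
    role: str     # "system", "user", "assistant", or "tool"
    content: str


@dataclass
class SessionState:
    caller_id: Optional[str] = None
    verified: bool = False
    completed_steps: list = field(default_factory=list)
    # Runtime-only handle: rebuilt on arrival, never serialized.
    api_client_factory: Optional[Callable[[], Any]] = None


# Hypothetical list of fields that must not cross the process boundary.
RUNTIME_ONLY = {"api_client_factory"}


def snapshot(history: list, state: SessionState) -> str:
    """Serialize history plus the serializable part of the state."""
    state_dict = {
        f.name: getattr(state, f.name)
        for f in fields(state)
        if f.name not in RUNTIME_ONLY
    }
    return json.dumps({
        "history": [asdict(turn) for turn in history],
        "state": state_dict,
    })


def restore(raw: str, api_client_factory: Callable[[], Any]):
    """Rebuild history and state; the caller supplies fresh runtime objects."""
    data = json.loads(raw)
    history = [ChatTurn(**turn) for turn in data["history"]]
    state = SessionState(**data["state"], api_client_factory=api_client_factory)
    return history, state
```

The explicit `RUNTIME_ONLY` set is the part that maps to the refactoring mentioned above: each closure or audio handle has to be identified, excluded from the snapshot, and given a way to be reconstructed by the receiving agent.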

The serial dispatch model you described is actually close to what we’ve been considering. It’s the most viable path. But the “dispatch with context in metadata” part is where the complexity lives. Metadata has size limits, and our context (conversation history + session state) can be substantial.
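
One way to work around the size limit is to send the context inline only when it is small, and otherwise store it externally and put just a lookup key in the metadata. The sketch below shows that pattern under stated assumptions: `MAX_INLINE_BYTES` is a placeholder budget rather than LiveKit's actual limit, the `StateStore` interface and `InMemoryStore` are hypothetical stand-ins for something like DynamoDB, and the envelope format is invented for illustration.

```python
import json
import uuid
from typing import Protocol


class StateStore(Protocol):
    """Hypothetical interface for an external store such as DynamoDB."""
    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> str: ...


class InMemoryStore:
    """Stand-in store for local testing only."""
    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> str:
        return self._data[key]


# Placeholder budget; the real metadata limit should be checked, not assumed.
MAX_INLINE_BYTES = 4096


def build_metadata(context: dict, store: StateStore,
                   max_inline_bytes: int = MAX_INLINE_BYTES) -> str:
    """Return dispatch metadata: the context itself, or a key pointing to it."""
    body = json.dumps(context)
    if len(body.encode("utf-8")) <= max_inline_bytes:
        return json.dumps({"mode": "inline", "context": context})
    key = f"handoff/{uuid.uuid4().hex}"
    store.put(key, body)
    return json.dumps({"mode": "ref", "key": key})


def load_context(metadata: str, store: StateStore) -> dict:
    """Resolve metadata produced by build_metadata back into the context."""
    envelope = json.loads(metadata)
    if envelope["mode"] == "inline":
        return envelope["context"]
    return json.loads(store.get(envelope["key"]))
```

The dispatching agent would pass the result of `build_metadata` as the dispatch metadata, and the arriving agent would call `load_context` before starting its pipeline. The extra store round trip adds to the transition time described in point 3.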

Our use case

We run a production voice AI call center agent on LiveKit (SIP inbound via telephony). A single call involves a parent agent that delegates to specialist sub-agents for different workflows: identifying the caller, managing their account, handling specific transaction types, etc. Today it’s a monolith: one container, one worker process, in-process enter_agent() handoffs.

Why we want to break it apart: independent deployability. When we change the logic in one specialist agent (say, the account management agent), we currently have to redeploy and retest the entire system. We want each specialist agent in its own container/ECS task so teams can deploy, version, and scale them independently. Standard microservices motivation.

What we’re hoping to learn from the agents team

We can absolutely build the state serialization, external state store, and ChatContext transfer ourselves. But before we invest in that infrastructure, we want to know:

  • Is LiveKit considering adding ChatContext transfer as a built-in capability when dispatching a new agent to a room? (e.g., Agent B automatically receives Agent A’s conversation history)

  • Is there any planned support for seamless audio handoff, where Agent B’s pipeline is warmed up before Agent A disconnects, avoiding dead air?

  • Or more generally, is “multi-agent handoff with context preservation” something on the roadmap at all?

If this is on your near-term roadmap, we’d rather wait and build on first-class support than create a custom solution that your framework later supersedes.