Achieving multi‑agent awareness and state synchronization with LiveKit Data Channels

Good morning, everyone,

I’m Elijah, and I want to share an architectural pattern I recently implemented for orchestrating multiple AI agents within a single LiveKit room. When you move from a single agent to a crew of specialized agents in the same room, you quickly hit a synchronization wall: agents talking over each other, redundant processing of the same audio track, and no shared sense of “who is doing what right now.” Here’s how I’m using LiveKit’s room primitives and Data Channels to build a shared “brain state” so agents can stay aware of each other’s actions in real time.

Screenshot 1: three agents in the same LiveKit room, each subscribed to the user track, coordinated via Data Channel state.

Screenshot 2: example JSON payload showing agent_id, status, and action_lock used for synchronization.

Live view of a three‑agent round‑robin session in a single LiveKit room, where each agent takes a turn responding while coordination happens behind the scenes via Data Channel state messages.

The Architecture: How It’s Wired

1. The Room Topology

Instead of spinning up separate rooms, all agents connect to the same primary Room as independent participants (a minimal connection sketch follows below).

  • Every agent subscribes to the user’s audio track.
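Here’s a minimal sketch of that wiring with the Python livekit SDK. LIVEKIT_URL and AGENT_TOKEN are placeholders (each agent mints its own token for the same room name), and exact signatures may vary a bit across SDK versions:

```python
import asyncio
from livekit import rtc

LIVEKIT_URL = "wss://your-project.livekit.cloud"  # placeholder
AGENT_TOKEN = "..."  # per-agent token, minted for the same room name

async def run_agent() -> None:
    room = rtc.Room()

    # Log every audio track this agent ends up subscribed to; in the
    # real pipeline this is where the STT/intent stage would attach.
    @room.on("track_subscribed")
    def on_track_subscribed(
        track: rtc.Track,
        publication: rtc.RemoteTrackPublication,
        participant: rtc.RemoteParticipant,
    ):
        if track.kind == rtc.TrackKind.KIND_AUDIO:
            print(f"subscribed to audio from {participant.identity}")

    # auto_subscribe means each agent picks up the user's audio track
    # as soon as it is published, with no per-track bookkeeping.
    await room.connect(
        LIVEKIT_URL,
        AGENT_TOKEN,
        options=rtc.RoomOptions(auto_subscribe=True),
    )
    await asyncio.Event().wait()  # keep the agent participant alive

asyncio.run(run_agent())
```

Run one process like this per agent and they all land in the same room as peers, each receiving the user’s audio independently.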

2. The Awareness Layer (LiveKit Data Channels)

This is the core of the synchronization pattern. Relying on transcript completion is too slow for real-time orchestration. Instead, I use LiveKit Data Channels as a high-speed, decentralized event bus (see the broadcast sketch after this list).

  • When Agent A detects an intent it needs to act on, it instantly broadcasts a JSON state payload via the Data Channel.

  • Agent B (and any other agent in the room) receives this event in milliseconds.

  • The payload includes fields like agent_id, status: “processing” | “idle”, and an action_lock boolean. When another agent sees action_lock: true for a given task, it temporarily mutes or pauses its own response pipeline until it receives a status: “resolved” event.
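For reference, the sending side can be a few lines. This is a sketch rather than my exact code: the agent-state topic string is arbitrary, and publish_data’s keyword arguments have shifted slightly between livekit SDK versions:

```python
import json
from livekit import rtc

STATE_TOPIC = "agent-state"  # assumed topic; any agreed-on string works

async def broadcast_state(room: rtc.Room, status: str, action_lock: bool) -> None:
    """Broadcast this agent's execution state to every other participant."""
    payload = {
        "agent_id": room.local_participant.identity,
        "status": status,          # "processing" | "idle" | "resolved"
        "action_lock": action_lock,
    }
    # Reliable delivery, so no agent ever misses a lock/unlock transition.
    await room.local_participant.publish_data(
        json.dumps(payload),
        reliable=True,
        topic=STATE_TOPIC,
    )
```

Agent A calls broadcast_state(room, "processing", True) the moment it claims a task, and broadcast_state(room, "resolved", False) once it finishes.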

3. State Reconciliation

By using Data Channels, the agents aren’t just reacting to audio; they are reacting to each other’s internal execution states. If an agent is executing a background task, the other agents know to either hold the user’s attention or stand by.
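The receive side is where that reconciliation happens. Here’s the matching handler as a sketch; pause_pipeline and resume_pipeline are hypothetical hooks into whatever your agent’s response loop exposes:

```python
import json
from livekit import rtc

STATE_TOPIC = "agent-state"  # must match the broadcast topic above

def attach_awareness(room: rtc.Room, pause_pipeline, resume_pipeline) -> None:
    """Gate this agent's response pipeline on other agents' state events."""

    @room.on("data_received")
    def on_data_received(packet: rtc.DataPacket):
        if packet.topic != STATE_TOPIC:
            return
        state = json.loads(packet.data)
        if state.get("agent_id") == room.local_participant.identity:
            return  # ignore our own broadcasts
        if state.get("action_lock"):
            pause_pipeline()   # another agent holds the lock: stand by
        elif state.get("status") == "resolved":
            resume_pipeline()  # lock released: resume normal turn-taking
```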

The Takeaway

Using Data Channels for inter-agent communication completely eliminated audio collisions and allowed for seamless “handoffs” between specialized models. LiveKit handles the routing and latency so well that the agents feel like a single, cohesive intelligence.

I’m currently exploring better ways to share conversation context across agents so they don’t each burn tokens re‑parsing the same user prompt, while still keeping per‑agent prompts specialized.

Has anyone else experimented with multi-agent routing using this kind of pub/sub pattern over Data Channels? I’d love to see how others are handling the state locks or if there are cleaner ways to manage the track subscriptions dynamically!


Looks neat. It makes me reminisce a little about the good ole days of token ring networks.


Elijah, this is really slick work.

Using LiveKit Data Channels as a low-latency awareness layer and lightweight event bus for agent state (with agent_id, status, and an action_lock to prevent collisions) is a clean, pragmatic solution to the exact synchronization wall you described. The fact that it enables seamless handoffs without waiting on transcript completion is especially smart.

Your post nudged me to finally collect some related thinking I’ve been doing on turn-taking, fairness, and interruption handling, from 3-way chats up to larger groups. If it’s useful: https://eniveld.substack.com/p/teaching-livekit-agents-to-talk-like

Curious about your next step too: when you talk about sharing context across agents without everyone burning tokens re-parsing the same stream, are you leaning toward a shared rolling summary, a central context broker, or something else? That feels like the next big unlock.

David, thanks a ton — really appreciate this, and your article was super clarifying. I love how you treat “who has the floor” as a first-class policy layer with explicit eligibility and fairness instead of just reacting to STT timing.

On the context side, I’m leaning toward a central context broker that owns floor/thread state and a rolling summary, then publishes low-latency artifacts (events, summaries, features) over the same Data Channel bus I’m using for agent awareness. Individual agents keep thin, role-specific overlays instead of re-parsing the whole stream.
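In case it helps the comparison, here’s roughly the artifact shape I’m imagining the broker would publish on that bus; every field name here is provisional:

```python
from typing import Literal, TypedDict

# Provisional shape for broker-published artifacts on the shared
# Data Channel bus; all field names are still up for debate.
class BrokerArtifact(TypedDict):
    kind: Literal["event", "summary", "features"]
    thread_id: str        # conversation thread the artifact belongs to
    floor_holder: str     # agent_id the broker currently grants the floor
    rolling_summary: str  # broker-owned summary agents overlay on top of
    version: int          # monotonic counter for cheap reconciliation
```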

The piece I’m still figuring out is how opinionated that broker should be versus letting specialist agents disagree, and how tightly its floor model should track your eligibility / spectator-mode logic. Would love to compare notes on what this looks like in your head — is it more of a single “conductor” service, or something agents negotiate with more loosely?