Multi-agent turn coordination — text-stream ack protocol vs native SDK approach?

Il_Gav · March 25, 2026, 2:58pm

Hello,

We have two agents in the same LiveKit room using livekit-agents~=1.5. Agent A speaks first, then sends a text payload to Agent B via send_text(topic="finding"). Agent B reacts via session.say() + wait_for_playout(), then sends send_text("done", topic="turn-ack"). Agent A awaits the ack before proceeding to the next item.

This works well in testing (zero overlap between agents). We chose this over VAD-based gating because TTS has natural inter-sentence pauses that triggered false “end of speech” signals.
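
For reference, the handshake can be sketched roughly as below. This is a simulation only: FakeRoom, recv_text, agent_a, agent_b and the ack_timeout value are hypothetical stand-ins built on asyncio queues, not LiveKit calls. The comments mark where session.say(), wait_for_playout() and send_text() would sit in the real agents.

```python
import asyncio


class FakeRoom:
    """Hypothetical in-memory stand-in for room text streams: one queue per topic."""

    def __init__(self):
        self.topics = {}

    def _queue(self, topic):
        return self.topics.setdefault(topic, asyncio.Queue())

    async def send_text(self, text, topic):
        await self._queue(topic).put(text)

    async def recv_text(self, topic):
        return await self._queue(topic).get()


async def agent_a(room, findings, log, ack_timeout=5.0):
    for finding in findings:
        log.append(("A", "speak", finding))  # stands in for Agent A's own speech
        await room.send_text(finding, topic="finding")
        # Wait for B's ack before the next item; time out rather than hang forever.
        ack = await asyncio.wait_for(room.recv_text("turn-ack"), ack_timeout)
        log.append(("A", "ack", ack))


async def agent_b(room, count, log):
    for _ in range(count):
        finding = await room.recv_text("finding")
        log.append(("B", "speak", finding))  # stands in for session.say()
        await asyncio.sleep(0.01)            # stands in for wait_for_playout()
        await room.send_text("done", topic="turn-ack")


async def run_demo():
    room, log = FakeRoom(), []
    findings = ["item-1", "item-2"]
    await asyncio.gather(
        agent_a(room, findings, log),
        agent_b(room, len(findings), log),
    )
    return log


if __name__ == "__main__":
    for entry in asyncio.run(run_demo()):
        print(entry)
```

The timeout on the ack is an addition in this sketch, so that Agent A does not block indefinitely if Agent B crashes or leaves the room.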

Questions:

1. Is there a native SDK mechanism for explicit agent-to-agent turn coordination we should be using instead? (e.g., something in the handoff/task system, or a built-in signaling pattern?)
2. Any concerns with using send_text() as a low-frequency signaling channel? (Our rate is ~1 message per 15-20s.)
3. For scaling to 3+ agents in the same room, would you recommend a different pattern?

Thanks!

CWilson · March 26, 2026, 1:37am

There isn’t a lower-level “turn lock” primitive in the SDK specifically for agent-to-agent coordination. The native pattern for structured multi-agent control is agent sessions with handoffs, tasks, and task groups, where one controlling agent transfers execution explicitly rather than relying on media timing. See Workflows and the linked Agents & handoffs section.

Using send_text() as a low-frequency signaling channel (1 message per 15–20s) is fully reasonable. It rides on the same reliable data mechanisms as other text streams and is appropriate for explicit coordination.

For 3+ agents, consider a single controlling agent (or task group) orchestrating handoffs instead of peer-to-peer acks, which keeps flow centralized and testable.
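
As a rough illustration of the centralized shape (not the SDK's handoff or task-group API), the sketch below has one controller grant the floor to each agent in turn and wait for that turn to finish before moving on. Worker, take_turn, orchestrate and turn_timeout are hypothetical names, and asyncio.sleep stands in for speech playout.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical agent that speaks only when the controller gives it a turn."""

    name: str

    async def take_turn(self, item, transcript):
        transcript.append(f"{self.name} start {item}")
        await asyncio.sleep(0.01)  # stands in for speech playout
        transcript.append(f"{self.name} end {item}")


async def orchestrate(workers, items, turn_timeout=5.0):
    """Run every worker's turn strictly one after another, per item."""
    transcript = []
    for item in items:
        for worker in workers:
            # Only the controller decides who speaks next; there are no peer acks.
            await asyncio.wait_for(worker.take_turn(item, transcript), turn_timeout)
    return transcript


if __name__ == "__main__":
    team = [Worker("A"), Worker("B"), Worker("C")]
    for line in asyncio.run(orchestrate(team, ["x", "y"])):
        print(line)
```

Because every turn passes through a single loop, adding a fourth agent is a one-line change, and the ordering can be checked in a unit test without any audio.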

Are these agents meant to act as distinct “personas” in one conversation, or as cooperative background workers?

Il_Gav · March 26, 2026, 2:54pm

Hi,

Thanks for the tips and doc references, this is really helpful. I think we now need to switch to this approach.

Currently we have one persona with multiple agents under the hood, same voice. STT+VAD is needed since a real human can join.

One follow-up question: for a future use case we want 2–4 independent personas in one room. I think we can drive interruptions programmatically via @function_tool rather than STT and VAD. Do you think that’s a good idea or is there a better approach?
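
To make the idea concrete, here is a minimal sketch of what I have in mind, under heavy assumptions: Floor and request_floor are hypothetical, request_floor is the logic a tool call would trigger, and asyncio.sleep stands in for TTS playout. Nothing here is LiveKit API.

```python
import asyncio


class Floor:
    """Hypothetical arbiter: at most one persona speaks at a time."""

    def __init__(self):
        self._current = None  # task of the active speaker, if any
        self.events = []

    async def _speak(self, name, duration):
        self.events.append(f"{name} start")
        try:
            await asyncio.sleep(duration)  # stands in for TTS playout
            self.events.append(f"{name} finished")
        except asyncio.CancelledError:
            self.events.append(f"{name} interrupted")
            raise

    async def request_floor(self, name, duration):
        """Would be the body of a tool call: interrupt the current speaker, then speak."""
        if self._current is not None and not self._current.done():
            self._current.cancel()
            # Wait until the interrupted speaker has actually stopped.
            await asyncio.gather(self._current, return_exceptions=True)
        self._current = asyncio.create_task(self._speak(name, duration))
        return self._current


async def demo():
    floor = Floor()
    await floor.request_floor("persona-1", 1.0)
    await asyncio.sleep(0.01)  # persona-1 is mid-utterance
    last = await floor.request_floor("persona-2", 0.01)
    await last
    return floor.events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the sketch is that the interruption is an explicit call rather than something inferred from audio, so it would not be affected by pauses in TTS output.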

Thanks

CWilson · March 26, 2026, 9:59pm

I am not sure. It really comes down to your use case and what you are trying to achieve: how those “personas” interact, whether only one is active at a time, and so on.
