Instability with LiveKit plugins for Azure OpenAI Realtime

sahil.dutta · May 25, 2026, 9:02am

Hi LiveKit Team,

I’m facing an issue while using GPT Realtime 2 as a single model for STT, LLM, and TTS.

Configuration:

import os

from livekit.plugins import openai

realtime_kwargs = {
    "azure_deployment": "gpt-realtime-2",
    "azure_endpoint": "https://my-openai-resource.openai.azure.com",
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],  # read the key from the environment
    "api_version": "2024-10-01-preview",
    "temperature": 0.7,
    "modalities": ["audio", "text"],
    "voice": "alloy",
}

# A single realtime model handles speech input, reasoning, and speech output
llm_service = openai.realtime.RealtimeModel.with_azure(**realtime_kwargs)

Error observed:

{
  "message": "expected to receive only one message generation from the realtime API",
  "level": "WARNING",
  "name": "livekit.agents"
}

After this warning, the agent suddenly stops speaking/responding until the conversation is triggered again.
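To pin down when the stall starts, one option is to watch for that exact warning on the `livekit.agents` logger named in the log entry. This is a minimal stdlib sketch that assumes the warning text stays as shown; `StallWarningWatcher` and `attach_watcher` are hypothetical helpers, not part of LiveKit.

```python
import logging

# Text of the warning from the log entry above (assumed stable across releases).
STALL_WARNING = "expected to receive only one message generation"


class StallWarningWatcher(logging.Handler):
    """Hypothetical handler that records each time the single-generation warning fires."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.hits: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        message = record.getMessage()
        if STALL_WARNING in message:
            # Swap this for a metric, an alert, or logic that re-prompts the agent.
            self.hits.append(message)


def attach_watcher() -> StallWarningWatcher:
    """Attaches a fresh watcher to the logger name seen in the warning."""
    watcher = StallWarningWatcher()
    logging.getLogger("livekit.agents").addHandler(watcher)
    return watcher


if __name__ == "__main__":
    w = attach_watcher()
    logging.getLogger("livekit.agents").warning(
        "expected to receive only one message generation from the realtime API"
    )
    print(len(w.hits))  # 1
```

Counting hits per session makes it easy to correlate the warning with the moments the agent goes silent.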

Could you please help identify whether this is related to multi-generation handling or compatibility with GPT-Realtime-2?

sahil.dutta · May 25, 2026, 9:08am

@darryncampbell, could you help me out with this?

Muhammad_Usman_Bashir · May 25, 2026, 3:02pm

@sahil.dutta, the warning lines up with an assumption in the Realtime plugin: “Our code assumes a response will generate only one item with type ‘message’” (livekit/agents, livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model.py). When the Realtime API emits multiple message items in a single response, downstream handling breaks, which produces the “stops until the next trigger” behavior you are seeing.
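A simplified picture of that failure mode, as a toy model rather than the plugin's actual code: a consumer built on the one-message assumption handles the first item and discards the rest, while a consumer that drains the stream handles every item in order. It assumes a response behaves like an async stream of message items; `Message`, `fake_response`, `consume_first_only`, and `consume_all` are hypothetical names.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class Message:
    message_id: str
    text: str


async def fake_response(messages: list[Message]) -> AsyncIterator[Message]:
    """Stand-in for a realtime response that may yield several message items."""
    for msg in messages:
        await asyncio.sleep(0)  # yield control, as a network stream would
        yield msg


async def consume_first_only(stream: AsyncIterator[Message]) -> list[str]:
    """Mimics the one-message assumption: later items are warned about and dropped."""
    played: list[str] = []
    async for msg in stream:
        if played:
            print("expected to receive only one message generation")
            continue  # nothing after the first item gets spoken
        played.append(msg.text)
    return played


async def consume_all(stream: AsyncIterator[Message]) -> list[str]:
    """Handles every message item, one after another."""
    return [msg.text async for msg in stream]


if __name__ == "__main__":
    msgs = [Message("m1", "Hello"), Message("m2", "How can I help?")]
    print(asyncio.run(consume_first_only(fake_response(msgs))))  # ['Hello']
    print(asyncio.run(consume_all(fake_response(msgs))))  # ['Hello', 'How can I help?']
```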

Two things to try:

1. Update your api_version. You're on 2024-10-01-preview (October 2024). The plugin's _AZURE_EVENT_MAPPING (same file) normalizes Azure's old beta event names to the OpenAI GA event names, so a newer api_version may avoid the multi-message case. Check Microsoft's documentation for the current Azure OpenAI Realtime api_version and bump to it.
2. Drop text from modalities. Try modalities: ['audio'] only. The single-message assumption is more likely to hold when only audio is emitted; with text + audio, the output can land as separate items depending on the API version.
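Both suggestions only change the kwargs. Here is a minimal sketch that assumes you pass the result to `openai.realtime.RealtimeModel.with_azure(**kwargs)` as in the original config. The environment variable names are hypothetical choices for this example, and the fallback api_version is the one from the original post; look up the current value in Microsoft's docs rather than guessing.

```python
import os

# Fallback is the version from the original post; override it via the environment.
DEFAULT_API_VERSION = "2024-10-01-preview"


def build_realtime_kwargs(audio_only: bool = True) -> dict:
    """Builds kwargs for RealtimeModel.with_azure from (hypothetical) env variables."""
    return {
        "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-realtime-2"),
        "azure_endpoint": os.environ.get(
            "AZURE_OPENAI_ENDPOINT", "https://my-openai-resource.openai.azure.com"
        ),
        "api_key": os.environ.get("AZURE_OPENAI_API_KEY", ""),
        # Suggestion 1: bump the version without editing code.
        "api_version": os.environ.get("AZURE_OPENAI_API_VERSION", DEFAULT_API_VERSION),
        "temperature": 0.7,
        # Suggestion 2: audio-only output, per the reply above.
        "modalities": ["audio"] if audio_only else ["audio", "text"],
        "voice": "alloy",
    }
```

Usage would then be `openai.realtime.RealtimeModel.with_azure(**build_realtime_kwargs())`, and flipping `audio_only=False` restores the original two-modality setup for comparison.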

darryncampbell · May 26, 2026, 10:26am

Have you tried the latest Agents release, 1.5.12? It contains:

github.com/livekit/agents: feat(realtime): support multi-message generation per response (#5763)
Opened by longcw at 06:46 UTC on 18 May 2026 (main ← longc/multi-message-realtime-v2, +212 -150)

Summary
- Process each `MessageGeneration` from `generation_ev.message_str…eam` serially via `perform_audio_forwarding` + `perform_text_forwarding` + `wait_for_playout`. Only one flush is in flight at a time.
- Per-msg state is derived directly from the `playback_finished` event:
  - `full` → emit `ChatMessage(interrupted=False)` with the msg's `message_id`
  - `partial` → emit `ChatMessage(interrupted=True)` and call `_rt_session.truncate(...)` with this msg's local `playback_position` (not a cumulative offset)
  - `skipped` → drop locally and call `update_chat_ctx(...)` so the realtime server removes never-played items from its history
- `_on_first_frame` now early-returns once `started_speaking_at` is set, so per-msg first-frame callbacks don't re-fire `_update_agent_state("speaking")` for each message.

Alternative considered
#5690 makes multi-message work by flushing per message; that needs the synchronizer to keep pending/finalizing impls alive and serialize concurrent flushes in `room_io/_output.py`. Our AudioOutput assumes there is only one speech at a time, so serializing per message at the `wait_for_playout` boundary (this PR) avoids both changes.

Closes https://github.com/livekit/agents/pull/5690, https://github.com/livekit/agents/issues/5684
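The control flow in that PR summary can be pictured with a stdlib toy model: messages are awaited strictly one at a time, and each playback outcome maps to one action. Everything here (`PlaybackOutcome`, `PlayedMessage`, `play_one`, `handle_generation`, and the action strings) is hypothetical and only mirrors the summary; it is not the code from #5763.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum


class PlaybackOutcome(Enum):
    FULL = "full"
    PARTIAL = "partial"
    SKIPPED = "skipped"


@dataclass
class PlayedMessage:
    message_id: str
    outcome: PlaybackOutcome
    playback_position: float  # seconds played of this message only, not cumulative


async def play_one(msg: PlayedMessage) -> PlayedMessage:
    """Stand-in for audio/text forwarding followed by waiting for playout."""
    await asyncio.sleep(0)
    return msg


async def handle_generation(messages: list[PlayedMessage]) -> list[str]:
    actions: list[str] = []
    for msg in messages:
        # Awaiting each message before the next keeps one flush in flight at a time.
        result = await play_one(msg)
        if result.outcome is PlaybackOutcome.FULL:
            actions.append(f"record {result.message_id} interrupted=False")
        elif result.outcome is PlaybackOutcome.PARTIAL:
            actions.append(f"record {result.message_id} interrupted=True")
            actions.append(f"truncate {result.message_id} at {result.playback_position}")
        else:
            # Never played: drop locally and ask the server to forget it too.
            actions.append(f"remove {result.message_id} from server history")
    return actions
```

Serializing at the playout boundary keeps the "one speech at a time" assumption of the audio output intact, which is the trade-off the PR describes against #5690.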

sahil.dutta · June 2, 2026, 11:32am

Thanks, mate, the error went away after upgrading to the latest version. However, I'm now seeing a lot of captured noise and very unstable behavior when using the realtime model. I found https://playground.livekit.io/, which demos LiveKit with a realtime model, and I'm trying to replicate that setup in my agent. Am I right that it uses two models: whisper-1 for transcription and a realtime model for the LLM and TTS? If so, how do we configure the realtime model to handle only the LLM and TTS, not the STT? Can you help me out with this?
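Since the fix depends on running 1.5.12 or later (the release the reply above says contains #5763), it can help to assert the installed version at startup. This is a minimal stdlib sketch that assumes the PyPI distribution name `livekit-agents`; `parse_release` and `has_multi_message_fix` are hypothetical helpers, and pre-release suffixes are simply ignored.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = (1, 5, 12)  # release said above to contain #5763


def parse_release(text: str) -> tuple[int, ...]:
    """Parses the leading numeric part of a version such as '1.5.12' or '1.6.0rc1'."""
    parts: list[int] = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # stop at pre-release or dev suffixes like 'rc1'
    return tuple(parts)


def has_multi_message_fix(dist: str = "livekit-agents") -> bool:
    """Returns True if the installed distribution is at least MIN_VERSION."""
    try:
        installed = version(dist)
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= MIN_VERSION
```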

darryncampbell · June 2, 2026, 2:38pm

The code for that (realtime-playground/agent/main.py in the livekit-examples/realtime-playground repository on GitHub) is from November 2024, which may as well be a lifetime ago in this industry.

This is a better resource for the OpenAI Realtime model: https://docs.livekit.io/agents/models/realtime/plugins/openai/#usage. It includes STT, LLM and TTS, and I suggest just using the defaults to get started.

If you run through our Voice AI quickstart (https://docs.livekit.io/agents/start/voice-ai/), you'll end up using our agent starter. The quickstart assumes a pipeline architecture, but there's a commented-out line in agent.py (agent-starter-python/src/agent.py in the livekit-examples/agent-starter-python repository on GitHub) with instructions for using a realtime model (architecture) instead.
