Bugs and proposed enhancements related to follow-up actions after a generate_reply timeout

Terry_So · June 25, 2026, 7:51am

Hi @Long Chen

We have filed bug reports and enhancement proposals in the LiveKit Agents repo about the follow-up actions after a `generate_reply` timeout:

github.com/livekit/agents

Playback still happens for late response.create (after the timeout of its corresponding generate_reply)

opened 06:46AM - 25 Jun 26 UTC

moz164164

bug

### Bug Description

According to https://deepwiki.com/search/after-failed-to-generate-a-rep_2136620f-4226-4394-96f6-b0867e3fd25b?mode=deep, a `generate_reply` timeout does not send `response.cancel` to stop the `response.create` request, so the OpenAI request pipeline triggered by `generate_reply` keeps running even after the internal `generate_reply` timeout has fired. In this situation, a late `response.created` event (arriving after the timeout of its corresponding `generate_reply`) won't resolve any future, so no `SpeechHandle` is ever wired up to consume the audio events generated by the timed-out `generate_reply`. That should mean no playback for it.

However, I triggered "failed to generate a reply: generate_reply timed out." by reducing the default timeout in `livekit/plugins/openai/realtime/realtime_model.py` from 10 to 0.01 and then calling `await agent_session.generate_reply()`. **Playback** for the timed-out `generate_reply` **does happen** after the late `response.created` event.

### Expected Behavior

Playback should not happen for a late `response.created` (after the timeout of its corresponding `generate_reply`).

### Reproduction Steps

1. Reduce the default `generate_reply` timeout in `livekit/plugins/openai/realtime/realtime_model.py` from 10 to 0.01.
2. Use the OpenAI realtime model:

```python
# Import paths as in recent livekit-agents / openai SDK releases; adjust to your versions.
from livekit.agents.types import NOT_GIVEN
from livekit.plugins import openai
from openai.types.realtime import AudioTranscription
from openai.types.realtime.realtime_audio_input_turn_detection import ServerVad

model = openai.realtime.RealtimeModel(
    model="gpt-realtime",
    voice="marin",
    turn_detection=ServerVad(
        type="server_vad",
        prefix_padding_ms=300,
        silence_duration_ms=500,
        threshold=0.5,
        create_response=False,
        interrupt_response=False,
    ),
    temperature=0.6,
    input_audio_noise_reduction=NOT_GIVEN,
    # `lang` is the transcription language code defined elsewhere in our app
    input_audio_transcription=AudioTranscription(language=lang, model="whisper-1"),
    max_session_duration=55 * 60,
)
```

3. Call `await agent_session.generate_reply()`.

### Operating System

Linux, macOS

### Models Used

"gpt-realtime"

### Package Versions

```bash
"livekit~=1.1",
"livekit-agents[azure,openai,turn-detector,silero,elevenlabs]==1.6.0",
"livekit-api~=1.1",
"livekit-plugins-noise-cancellation~=0.2.0"
```

### Session/Room/Call IDs

_No response_

### Proposed Solution

Maybe offer developers a choice to stop `response.create` when the `generate_reply` timeout happens, so that the corresponding playback does not happen. I have mentioned this here: https://github.com/livekit/agents/issues/6223

### Screenshots and Recordings

_No response_
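The expectation in this report can be modelled in a few lines. The sketch below is a toy model, not LiveKit's implementation: `ReplyTracker`, `on_response_created` and `on_audio_delta` are hypothetical names. It shows why a timed-out `generate_reply` leaves a late `response.created` with no future to resolve, and where a guard would have to drop that response's audio; the reported bug is that playback happens anyway.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class ReplyTracker:
    """Toy model of client-side reply bookkeeping (hypothetical, not LiveKit code)."""

    pending: list[asyncio.Future] = field(default_factory=list)
    orphaned: set[str] = field(default_factory=set)  # response ids nobody waits for
    played: list[str] = field(default_factory=list)  # response ids whose audio was played
    expired: int = 0  # generate_reply calls that timed out

    async def generate_reply(self, timeout: float) -> str | None:
        fut = asyncio.get_running_loop().create_future()
        self.pending.append(fut)
        try:
            # Resolved by the next response.created event, if it arrives in time.
            return await asyncio.wait_for(fut, timeout)
        except asyncio.TimeoutError:
            # Local give-up only: nothing tells the server to stop.
            self.pending.remove(fut)
            self.expired += 1
            return None

    def on_response_created(self, response_id: str) -> None:
        if self.pending:
            self.pending.pop(0).set_result(response_id)
        else:
            # Late event: there is no future left to resolve.
            self.orphaned.add(response_id)

    def on_audio_delta(self, response_id: str) -> None:
        # The guard the report expects: skip audio for orphaned responses.
        if response_id in self.orphaned:
            return
        self.played.append(response_id)


async def main() -> ReplyTracker:
    tracker = ReplyTracker()
    await tracker.generate_reply(timeout=0.01)  # nothing answers in time
    tracker.on_response_created("resp_1")       # late server event
    tracker.on_audio_delta("resp_1")            # dropped by the guard
    return tracker


tracker = asyncio.run(main())
print(tracker.expired, tracker.played, tracker.orphaned)  # 1 [] {'resp_1'}
```

Matching `response.created` events to waiters by arrival order, as `pop(0)` does here, is itself an assumption of the sketch: if a second `generate_reply` were already pending when the late event arrives, the event would be handed to the wrong caller, which is one more reason to cancel the server-side response on timeout.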

github.com/livekit/agents

Allow developers to choose whether to stop response.create when generate_reply times out

opened 07:17AM - 25 Jun 26 UTC

moz164164

enhancement

### Feature Type

I cannot use LiveKit without it

### Feature Description

According to https://deepwiki.com/search/after-failed-to-generate-a-rep_2136620f-4226-4394-96f6-b0867e3fd25b?mode=deep, a `generate_reply` timeout does not send `response.cancel` to stop the `response.create` request, so the OpenAI request pipeline triggered by `generate_reply` keeps running even after the internal `generate_reply` timeout has fired. We would like developers to be able to choose whether to stop `response.create` when `generate_reply` times out, for three reasons:

1. Our use case needs the OpenAI request pipeline triggered by `generate_reply` to stop completely when the internal `generate_reply` timeout fires.
2. According to https://deepwiki.com/search/after-failed-to-generate-a-rep_2745306e-349a-44c8-9b7a-3bbdc43d37ce?mode=fast, when `generate_reply` times out locally, the OpenAI server may still be processing the original request, and a subsequent `generate_reply` call can then fail with "OpenAI Realtime API returned an error: RealtimeError(message='Conversation already has an active response in progress:". The image below shows the details.

   <img width="1399" height="1453" alt="Image" src="https://github.com/user-attachments/assets/02feda21-d9b7-495e-847f-88a89d7cdd90" />

3. We don't need audio playback for the `response.create` of the timed-out `generate_reply`. Letting the OpenAI server keep processing it is costly and wasteful, since both input and output tokens are billed.

### Workarounds / Alternatives

_No response_

### Additional Context

To force "failed to generate a reply: generate_reply timed out.", follow these steps:

1. Reduce the default `generate_reply` timeout in `livekit/plugins/openai/realtime/realtime_model.py` from 10 to 0.01.
2. Use the OpenAI realtime model:

```python
# Import paths as in recent livekit-agents / openai SDK releases; adjust to your versions.
from livekit.agents.types import NOT_GIVEN
from livekit.plugins import openai
from openai.types.realtime import AudioTranscription
from openai.types.realtime.realtime_audio_input_turn_detection import ServerVad

model = openai.realtime.RealtimeModel(
    model="gpt-realtime",
    voice="marin",
    turn_detection=ServerVad(
        type="server_vad",
        prefix_padding_ms=300,
        silence_duration_ms=500,
        threshold=0.5,
        create_response=False,
        interrupt_response=False,
    ),
    temperature=0.6,
    input_audio_noise_reduction=NOT_GIVEN,
    # `lang` is the transcription language code defined elsewhere in our app
    input_audio_transcription=AudioTranscription(language=lang, model="whisper-1"),
    max_session_duration=55 * 60,
)
```

3. Call `await agent_session.generate_reply()`.
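The requested opt-in can be sketched as follows, assuming a raw transport callable. `CancellableReplies`, `send_event` and `cancel_on_timeout` are hypothetical names, not LiveKit APIs; the only real protocol detail used is the OpenAI Realtime client event `response.cancel`, sent here without a `response_id` because `response.created` has not arrived yet when the local timeout fires. The sketch also re-raises the timeout instead of only logging it.

```python
from __future__ import annotations

import asyncio
from typing import Any, Callable


class CancellableReplies:
    """Hypothetical client wrapper with an opt-in cancel-on-timeout flag."""

    def __init__(self, send_event: Callable[[dict[str, Any]], None]) -> None:
        # `send_event` stands in for whatever writes client events to the websocket.
        self._send = send_event
        self._pending: asyncio.Future[str] | None = None

    async def generate_reply(self, timeout: float, cancel_on_timeout: bool = False) -> str:
        self._pending = asyncio.get_running_loop().create_future()
        self._send({"type": "response.create"})
        try:
            # Resolved by on_response_created() if the server answers in time.
            return await asyncio.wait_for(self._pending, timeout)
        except asyncio.TimeoutError:
            if cancel_on_timeout:
                # No response_id is known yet, so cancel whatever is in progress.
                self._send({"type": "response.cancel"})
            raise  # surface the timeout to the caller
        finally:
            self._pending = None

    def on_response_created(self, response_id: str) -> None:
        if self._pending is not None and not self._pending.done():
            self._pending.set_result(response_id)


async def main() -> list[dict[str, Any]]:
    sent: list[dict[str, Any]] = []
    client = CancellableReplies(sent.append)
    try:
        await client.generate_reply(timeout=0.01, cancel_on_timeout=True)
    except asyncio.TimeoutError:
        pass  # the application now knows the reply timed out
    return sent


print(asyncio.run(main()))
# [{'type': 'response.create'}, {'type': 'response.cancel'}]
```

Keeping the flag off by default preserves today's behaviour. A real implementation would likely also have to tolerate the server rejecting a `response.cancel` that arrives when no response is in progress.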

github.com/livekit/agents

llm.RealtimeError("generate_reply timed out.") for "failed to generate a reply: generate_reply timed out" cannot be captured by exception

opened 07:33AM - 25 Jun 26 UTC

moz164164

bug

### Bug Description

According to https://deepwiki.com/search/after-failed-to-generate-a-rep_f1da2841-db51-441a-ad41-86e5f2ebc777?mode=fast, when using the standard `AgentSession.generate_reply()` method, the framework handles the timeout internally and logs it instead of raising it to our application code. As a result, when a `generate_reply` timeout happens, **no** `RealtimeError` can be caught with:

```python
import logging

from livekit.agents.llm import RealtimeError

LOGGER = logging.getLogger(__name__)

# Inside an async function, with an active AgentSession `agent_session`
try:
    await agent_session.generate_reply()
except RealtimeError as e:
    LOGGER.exception("Error generating reply: %s", e)
```

Therefore, we cannot easily detect when the timeout happens and take follow-up action. I think this is a bug caused by a gap in the design of LiveKit Agents.

### Expected Behavior

Raise `RealtimeError` when a timeout happens during `await agent_session.generate_reply()`.

### Reproduction Steps

1. Reduce the default `generate_reply` timeout in `livekit/plugins/openai/realtime/realtime_model.py` from 10 to 0.01.
2. Use the OpenAI realtime model:

```python
# Import paths as in recent livekit-agents / openai SDK releases; adjust to your versions.
from livekit.agents.types import NOT_GIVEN
from livekit.plugins import openai
from openai.types.realtime import AudioTranscription
from openai.types.realtime.realtime_audio_input_turn_detection import ServerVad

model = openai.realtime.RealtimeModel(
    model="gpt-realtime",
    voice="marin",
    turn_detection=ServerVad(
        type="server_vad",
        prefix_padding_ms=300,
        silence_duration_ms=500,
        threshold=0.5,
        create_response=False,
        interrupt_response=False,
    ),
    temperature=0.6,
    input_audio_noise_reduction=NOT_GIVEN,
    # `lang` is the transcription language code defined elsewhere in our app
    input_audio_transcription=AudioTranscription(language=lang, model="whisper-1"),
    max_session_duration=55 * 60,
)
```

3. Call `await agent_session.generate_reply()` inside the `try`/`except RealtimeError` block shown in the bug description.

### Operating System

macOS, Linux

### Models Used

gpt-realtime

### Package Versions

```bash
"livekit~=1.1",
"livekit-agents[azure,openai,turn-detector,silero,elevenlabs]==1.6.0",
"livekit-api~=1.1",
"livekit-plugins-noise-cancellation~=0.2.0"
```

### Session/Room/Call IDs

_No response_

### Additional Context

_No response_

### Screenshots and Recordings

_No response_
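Since the framework logs the timeout rather than raising it, one possible stopgap is to watch the log stream. This is a sketch under assumptions: it presumes the timeout text reaches Python `logging` under a logger below `livekit`, either in the message or in the attached exception. `ReplyTimeoutWatcher` and `TIMEOUT_MARKER` are hypothetical names, and the demo emits the log line itself instead of running a real session.

```python
from __future__ import annotations

import logging

# Assumption: the timeout is reported with this text, in the log message or
# in the attached exception. Verify the exact wording and logger name
# against your livekit-agents version.
TIMEOUT_MARKER = "generate_reply timed out"


class ReplyTimeoutWatcher(logging.Handler):
    """Hypothetical helper: records log entries that report a generate_reply timeout."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.hits: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        text = record.getMessage()
        # Also inspect the exception, in case it was logged via logger.exception().
        if record.exc_info and record.exc_info[1] is not None:
            text += f" {record.exc_info[1]}"
        if TIMEOUT_MARKER in text:
            self.hits.append(record)


watcher = ReplyTimeoutWatcher()
# A handler on the "livekit" parent logger also receives records propagated
# from child loggers such as "livekit.agents" (assumed logger names).
logging.getLogger("livekit").addHandler(watcher)

# Stand-in for the framework's own log call, for demonstration only.
logging.getLogger("livekit.agents.demo").error(
    "failed to generate a reply: generate_reply timed out."
)
print(len(watcher.hits))  # 1
```

Comparing `len(watcher.hits)` before and after `await agent_session.generate_reply()` would indicate whether a timeout was logged in between; with several replies in flight at once, it cannot tell which call timed out.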

Please look into all of them and keep us posted.

darryncampbell · June 26, 2026, 8:34am

Thanks for the submissions. I'll let the engineering team comment on the issues; I can see there has already been some activity from them. More complex issues usually take a little while to process.
