Terry_So
(Terry So)
June 25, 2026, 7:51am
1
Hi @Long Chen
We have filed two bug reports and one enhancement request in the LiveKit agents repo, all related to follow-up actions after a `generate_reply` timeout:
opened 06:46AM - 25 Jun 26 UTC
bug
### Bug Description
According to [https://deepwiki.com/search/after-failed-to-generate-a-rep_2136620f-4226-4394-96f6-b0867e3fd25b?mode=deep](https://deepwiki.com/search/after-failed-to-generate-a-rep_2136620f-4226-4394-96f6-b0867e3fd25b?mode=deep), a `generate_reply` timeout does not trigger `response.cancel` to stop the `response.create`, so the OpenAI request pipeline triggered by `generate_reply` keeps running even after the internal `generate_reply` timeout has fired. In this situation, a late `response.created` (arriving after the timeout of its corresponding `generate_reply`) will not resolve any future, so no `SpeechHandle` is ever wired up to consume the audio events generated by the timed-out `generate_reply`, which should result in no playback for it.
However, I forced "failed to generate a reply: generate_reply timed out." by reducing the default timeout in `livekit/plugins/openai/realtime/realtime_model.py` from 10 to 0.01 and then calling `await agent_session.generate_reply()`. **Playback** for the timed-out `generate_reply` **does happen** once the late `response.created` arrives (after the timeout).
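The race described above can be reproduced without LiveKit at all. The following is a minimal sketch using only `asyncio`; `simulate_late_response` and `fake_server` are hypothetical names, and the "server" is a local task standing in for the OpenAI side, not the plugin's actual code. It shows that a local timeout on the waiting side does nothing to stop the work it was waiting for.

```python
import asyncio


async def simulate_late_response(timeout: float = 0.01, server_delay: float = 0.1) -> list[str]:
    """Simulate a local wait that times out while the remote work keeps going."""
    log: list[str] = []
    loop = asyncio.get_running_loop()
    created = loop.create_future()  # stands in for the "response.created" future

    async def fake_server() -> None:
        # Stand-in for the server handling `response.create`; it knows nothing
        # about the client's local timeout.
        await asyncio.sleep(server_delay)
        log.append("response.created arrived")
        if not created.done():
            created.set_result("resp_1")

    server_task = asyncio.create_task(fake_server())
    try:
        # shield() keeps `created` itself alive when wait_for gives up.
        await asyncio.wait_for(asyncio.shield(created), timeout)
    except asyncio.TimeoutError:
        log.append("generate_reply timed out")

    await server_task  # the server-side work still runs to completion
    return log


if __name__ == "__main__":
    print(asyncio.run(simulate_late_response()))
```

With the default arguments the timeout entry is logged first and the late arrival second, which mirrors the order of events seen in the report.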
### Expected Behavior
Playback should not happen for a late `response.create` (after its corresponding `generate_reply` has timed out).
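One way to picture the expected behavior is a small registry that remembers which requests have timed out and drops their late responses. This is only a sketch under assumptions: `PendingReplies`, its methods, and the idea of matching on a client-chosen request id are hypothetical and are not taken from the LiveKit source.

```python
import asyncio


class PendingReplies:
    """Tracks in-flight requests by a client-chosen id (hypothetical helper)."""

    def __init__(self) -> None:
        self._pending: dict[str, asyncio.Future] = {}
        self._timed_out: set[str] = set()

    def start(self, request_id: str) -> asyncio.Future:
        fut = asyncio.get_running_loop().create_future()
        self._pending[request_id] = fut
        return fut

    def mark_timed_out(self, request_id: str) -> None:
        self._pending.pop(request_id, None)
        self._timed_out.add(request_id)

    def on_response_created(self, request_id: str) -> bool:
        """Return True if audio for this response should be played."""
        if request_id in self._timed_out:
            self._timed_out.discard(request_id)
            return False  # late arrival: drop it instead of playing it
        fut = self._pending.pop(request_id, None)
        if fut is not None and not fut.done():
            fut.set_result(request_id)
            return True
        return False  # unknown id: nothing is waiting for it


async def demo() -> tuple[bool, bool]:
    replies = PendingReplies()

    # Request 1 times out locally, then its response shows up late.
    fut = replies.start("req_1")
    try:
        await asyncio.wait_for(fut, timeout=0.01)
    except asyncio.TimeoutError:
        replies.mark_timed_out("req_1")
    late_played = replies.on_response_created("req_1")

    # Request 2 is answered while it is still pending.
    replies.start("req_2")
    on_time_played = replies.on_response_created("req_2")
    return late_played, on_time_played
```

In this sketch `demo()` reports that the late response is dropped while the on-time one is accepted for playback.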
### Reproduction Steps
1. Reduce the default timeout from 10 to 0.01 in `livekit/plugins/openai/realtime/realtime_model.py`.
2. Use the OpenAI realtime model with the following configuration:

```python
# `lang` is a language code defined by the application
openai.realtime.RealtimeModel(
    model="gpt-realtime",
    voice="marin",
    turn_detection=ServerVad(
        type="server_vad",
        prefix_padding_ms=300,
        silence_duration_ms=500,
        threshold=0.5,
        create_response=False,
        interrupt_response=False,
    ),
    temperature=0.6,
    input_audio_noise_reduction=NOT_GIVEN,
    input_audio_transcription=AudioTranscription(language=lang, model="whisper-1"),
    max_session_duration=55 * 60,
)
```

3. Call `await agent_session.generate_reply()`.
### Operating System
Linux, macOS
### Models Used
"gpt-realtime"
### Package Versions
```bash
"livekit~=1.1",
"livekit-agents[azure,openai,turn-detector,silero,elevenlabs]==1.6.0",
"livekit-api~=1.1",
"livekit-plugins-noise-cancellation~=0.2.0"
```
### Session/Room/Call IDs
_No response_
### Proposed Solution
Maybe offer developers an option to stop `response.create` when `generate_reply` times out, so that the corresponding playback never happens. I have described this here:
https://github.com/livekit/agents/issues/6223
### Additional Context
_No response_
### Screenshots and Recordings
_No response_
opened 07:17AM - 25 Jun 26 UTC
enhancement
### Feature Type
I cannot use LiveKit without it
### Feature Description
According to [https://deepwiki.com/search/after-failed-to-generate-a-rep_2136620f-4226-4394-96f6-b0867e3fd25b?mode=deep](https://deepwiki.com/search/after-failed-to-generate-a-rep_2136620f-4226-4394-96f6-b0867e3fd25b?mode=deep), a `generate_reply` timeout does not trigger `response.cancel` to stop `response.create`, so the OpenAI request pipeline triggered by `generate_reply` keeps running even after the internal `generate_reply` timeout has fired.
We would like developers to be able to choose whether to stop `response.create` when `generate_reply` times out, for the following three reasons:
1. Our use case requires the OpenAI request pipeline triggered by `generate_reply` to stop completely when the internal `generate_reply` timeout fires.
2. According to [https://deepwiki.com/search/after-failed-to-generate-a-rep_2745306e-349a-44c8-9b7a-3bbdc43d37ce?mode=fast](https://deepwiki.com/search/after-failed-to-generate-a-rep_2745306e-349a-44c8-9b7a-3bbdc43d37ce?mode=fast), when `generate_reply` times out locally, the OpenAI server may still be processing the original request, and a subsequent `generate_reply` call can trigger the "OpenAI Realtime API returned an error: RealtimeError(message='Conversation already has an active response in progress:" error. The image below shows the details:
<img width="1399" height="1453" alt="Image" src="https://github.com/user-attachments/assets/02feda21-d9b7-495e-847f-88a89d7cdd90" />
3. We don't need audio playback for the `response.create` of a timed-out `generate_reply`. It is costly and wasteful for the OpenAI server to keep processing it (input and output tokens cost money).
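The requested option could look roughly like the sketch below. Everything here is an assumption for illustration: `FakeRealtimeSession`, `generate_reply_with_cancel`, and the `cancel_on_timeout` flag are hypothetical names, and events are plain dicts rather than the plugin's real types. The only piece taken from the Realtime API is the `response.cancel` client event itself.

```python
import asyncio


class FakeRealtimeSession:
    """Stand-in for a realtime connection; it only records the events sent."""

    def __init__(self) -> None:
        self.sent: list[dict] = []

    def send_event(self, event: dict) -> None:
        self.sent.append(event)


async def generate_reply_with_cancel(
    session: FakeRealtimeSession,
    created: asyncio.Future,
    timeout: float,
    cancel_on_timeout: bool = True,  # hypothetical opt-in flag
) -> bool:
    """Send `response.create`; on a local timeout, optionally send `response.cancel`."""
    session.send_event({"type": "response.create"})
    try:
        await asyncio.wait_for(created, timeout)
        return True
    except asyncio.TimeoutError:
        if cancel_on_timeout:
            # Ask the server to stop the in-progress response so it does not
            # keep generating audio nobody will play.
            session.send_event({"type": "response.cancel"})
        return False


async def run_once(cancel_on_timeout: bool) -> tuple[bool, list[str]]:
    session = FakeRealtimeSession()
    never_created = asyncio.get_running_loop().create_future()  # never resolved
    ok = await generate_reply_with_cancel(session, never_created, 0.01, cancel_on_timeout)
    return ok, [event["type"] for event in session.sent]
```

Making the flag opt-in keeps the current behavior as the default for anyone who relies on late responses still being produced, while letting use cases like ours stop the server-side work, which should also avoid the "active response in progress" conflict on the next call.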
### Workarounds / Alternatives
_No response_
### Additional Context
To force "failed to generate a reply: generate_reply timed out.", follow the steps below:
1. Reduce the default timeout from 10 to 0.01 in `livekit/plugins/openai/realtime/realtime_model.py`.
2. Use the OpenAI realtime model with the following configuration:

        openai.realtime.RealtimeModel(
            model="gpt-realtime",
            voice="marin",
            turn_detection=ServerVad(
                type="server_vad",
                prefix_padding_ms=300,
                silence_duration_ms=500,
                threshold=0.5,
                create_response=False,
                interrupt_response=False,
            ),
            temperature=0.6,
            input_audio_noise_reduction=NOT_GIVEN,
            input_audio_transcription=AudioTranscription(language=lang, model="whisper-1"),
            max_session_duration=55 * 60,
        )

3. Call `await agent_session.generate_reply()`.
opened 07:33AM - 25 Jun 26 UTC
bug
### Bug Description
According to [https://deepwiki.com/search/after-failed-to-generate-a-rep_f1da2841-db51-441a-ad41-86e5f2ebc777?mode=fast](https://deepwiki.com/search/after-failed-to-generate-a-rep_f1da2841-db51-441a-ad41-86e5f2ebc777?mode=fast), when using the standard `AgentSession.generate_reply()` method, the framework handles the timeout internally and logs it instead of raising it to our application code.
As a result, when `generate_reply` times out, no `RealtimeError` exception **can be captured** through:
```python
try:
    await agent_session.generate_reply()
except RealtimeError as e:
    # Per this report, this branch is never reached on a timeout
    LOGGER.exception("Error generating reply: %s", e)
```
Therefore, we cannot easily notice when the timeout happens or take follow-up action. I believe this is a bug caused by a gap in the design of LiveKit Agents.
### Expected Behavior
Raise `RealtimeError` when a timeout happens during `await agent_session.generate_reply()`.
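The difference between the reported and the expected behavior can be shown in isolation. This is a minimal sketch, not the framework's implementation: `RealtimeError` is redefined locally as a stand-in, and `generate_reply_swallowing`, `generate_reply_raising`, and `caller_sees_error` are hypothetical names.

```python
import asyncio
import logging

logger = logging.getLogger("reply-demo")


class RealtimeError(Exception):
    """Local stand-in for the framework's error type (defined here for the demo)."""


async def _wait_for_reply(timeout: float) -> str:
    never = asyncio.get_running_loop().create_future()  # never resolved
    try:
        return await asyncio.wait_for(never, timeout)
    except asyncio.TimeoutError:
        raise RealtimeError("generate_reply timed out.") from None


async def generate_reply_swallowing(timeout: float) -> None:
    """Reported behavior: the failure is logged, so the caller sees nothing."""
    try:
        await _wait_for_reply(timeout)
    except RealtimeError as e:
        logger.error("failed to generate a reply: %s", e)


async def generate_reply_raising(timeout: float) -> None:
    """Requested behavior: the error reaches the caller."""
    await _wait_for_reply(timeout)


async def caller_sees_error(fn) -> bool:
    """Mimics application code wrapping the call in try/except."""
    try:
        await fn(0.01)
    except RealtimeError:
        return True
    return False
```

Only the raising variant lets the application's `except RealtimeError` branch run, which is what we need in order to react to the timeout.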
### Reproduction Steps
1. Reduce the default timeout from 10 to 0.01 in `livekit/plugins/openai/realtime/realtime_model.py`.
2. Use the OpenAI realtime model with the following configuration:

```python
# `lang` is a language code defined by the application
openai.realtime.RealtimeModel(
    model="gpt-realtime",
    voice="marin",
    turn_detection=ServerVad(
        type="server_vad",
        prefix_padding_ms=300,
        silence_duration_ms=500,
        threshold=0.5,
        create_response=False,
        interrupt_response=False,
    ),
    temperature=0.6,
    input_audio_noise_reduction=NOT_GIVEN,
    input_audio_transcription=AudioTranscription(language=lang, model="whisper-1"),
    max_session_duration=55 * 60,
)
```

3. Wrap the call in a `try`/`except`:

```python
try:
    await agent_session.generate_reply()
except RealtimeError as e:
    LOGGER.exception("Error generating reply: %s", e)
```
### Operating System
macOS, Linux
### Models Used
gpt-realtime
### Package Versions
```bash
"livekit~=1.1",
"livekit-agents[azure,openai,turn-detector,silero,elevenlabs]==1.6.0",
"livekit-api~=1.1",
"livekit-plugins-noise-cancellation~=0.2.0"
```
### Session/Room/Call IDs
_No response_
### Proposed Solution
_No response_
### Additional Context
_No response_
### Screenshots and Recordings
_No response_
Please tackle all of them and keep us posted.
Thanks for the submissions. I’ll let the engineering teams comment on the PRs; I see there has already been some activity from that team. More complex PRs usually take a little while to process.