Summary
When llm=google.beta.realtime.RealtimeModel (gemini-3.1-flash-live-preview) is used together with a separate tts=google.beta.GeminiTTS (gemini-3.1-flash-tts-preview) instance on the same AgentSession, a sequence of two session.say() calls that deliver a scripted greeting reliably hangs on the second call for ~20 seconds. During that time there are no logs and no exception, and the second say() does not complete, even though allow_interruptions=False is set and session.input.set_audio_enabled(False) was called beforehand. The same code path works correctly when tts is swapped for a non-Google provider (e.g. sarvam.TTS). In that case both calls complete in well under 2 seconds combined.
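To pin down where the ~20 seconds go, each say() can be wrapped in a small timing helper so the stall shows up in the logs even though the SDK itself logs nothing. This is a minimal sketch using only the standard library. The name timed and its 2-second default threshold are hypothetical and not part of livekit-agents. The only assumption about the SDK is that session.say() returns something awaitable, which the code in this report already relies on.

```python
import logging
import time
from typing import Awaitable, TypeVar

T = TypeVar("T")

logger = logging.getLogger("greeting-timing")


async def timed(label: str, aw: Awaitable[T], warn_after: float = 2.0) -> tuple[T, float]:
    """Await `aw`, log how long it took, and warn if it exceeded `warn_after` seconds.

    Hypothetical helper: returns the awaited result together with the elapsed time.
    """
    start = time.monotonic()
    result = await aw
    elapsed = time.monotonic() - start
    if elapsed > warn_after:
        # This is the branch that would fire for the ~20 s hang on the second say().
        logger.warning("%s took %.2fs (threshold %.2fs)", label, elapsed, warn_after)
    else:
        logger.info("%s took %.2fs", label, elapsed)
    return result, elapsed
```

In the greeting, each bare await session.say(...) would become _, elapsed = await timed("second say()", session.say(...)), so the elapsed time of each call is recorded whether or not the GeminiTTS path stalls.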
Separately, even when the hang does not occur, we've observed that session.input.set_audio_enabled(False) and the session's aec_warmup_duration interruption window do not compose as expected. The AEC-warmup interruption suppression appears to be a fixed timer that runs independently of how long the say() calls actually take, so it can expire mid-greeting and let STT-detected noise or echo interrupt the agent before our manually disabled input window ends.
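The mismatch is easiest to see as simple arithmetic. The sketch below assumes (this is not confirmed from the livekit-agents source) that the AEC warmup timer and the greeting both start at t=0 and that the greeting steps run back to back. It computes how long the scripted greeting outlasts a fixed warmup window. Step and warmup_shortfall are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One item in the scripted greeting timeline (hypothetical helper)."""
    label: str
    duration: float  # seconds the step occupies, including any playout


def warmup_shortfall(warmup: float, steps: list[Step]) -> float:
    """Return how many seconds the greeting outlasts a fixed warmup window.

    Assumes the warmup timer and the first step both start at t=0 and that the
    steps run back to back. A result of 0.0 means the window covers the greeting.
    """
    greeting_end = sum(step.duration for step in steps)
    return max(0.0, greeting_end - warmup)
```

With aec_warmup_duration=6 and illustrative durations of 1.0 s (hello clip), 1.5 s (the asyncio.sleep) and 20.0 s (the hung second say()), warmup_shortfall returns 16.5: for 16.5 s of the greeting, only the manual input disable is left to suppress interruptions. If the second say() took 0.4 s instead, the result would be 0.0 and the warmup window alone would cover the whole greeting.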
Environment
- livekit-agents version: ~=1.5
- Realtime model: gemini-3.1-flash-live-preview
- TTS (problem case): google.beta.GeminiTTS, model gemini-3.1-flash-tts-preview, voice Sulafat
- TTS (working case): sarvam.TTS, model bulbul:v3
- Session config: AgentSession(aec_warmup_duration=6, ...)
Code
```python
session = AgentSession[Call_State](
    userdata=call_state,
    llm=realtime_model,
    tts=tts_model,
    user_away_timeout=user_away_timeout,
    aec_warmup_duration=6,
)
```
```python
session.input.set_audio_enabled(False)
logger.warning("disabling the user input for saying the Hello and first intro!!")
await session.say(
    "Hello",
    audio=audio_frames_from_file(
        file_path=f"src/assets/audio_files_for_hello/{audio_file}",
    ),
)
logger.warning("said hello!!")
await asyncio.sleep(1.5)
await session.say("Hey there, I am an AI calling bot. Can we have a quick chat?")
logger.warning("said second sentence from the script")  # <-- never reached for ~20s
session.input.set_audio_enabled(True)
logger.warning("enabled the user input")
```
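One way to keep the manually disabled input window from depending on say() completing is to re-enable input in a finally block and bound each step with a timeout. This is a workaround sketch, not a fix for the underlying hang. The names guarded_greeting, per_step_timeout and pause are hypothetical, and the code assumes only what the snippet above already uses: session.input.set_audio_enabled(bool) and an awaitable returned by session.say(). Steps are passed as zero-argument callables so that each say() is only called when its turn comes. Note that asyncio.wait_for only stops waiting on timeout; it may not stop speech the session has already queued.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger("greeting")


async def guarded_greeting(
    session: Any,
    steps: list[Callable[[], Awaitable[Any]]],
    per_step_timeout: float = 10.0,
    pause: float = 1.5,
) -> bool:
    """Run scripted say() steps with user audio input disabled (hypothetical helper).

    Input is re-enabled in `finally`, so a hung or failing step cannot leave
    input switched off. Returns False if any step timed out.
    """
    session.input.set_audio_enabled(False)
    ok = True
    try:
        for i, step in enumerate(steps):
            if i:
                # Mirrors the asyncio.sleep(1.5) between the two say() calls.
                await asyncio.sleep(pause)
            try:
                await asyncio.wait_for(step(), timeout=per_step_timeout)
            except asyncio.TimeoutError:
                logger.warning("greeting step %d timed out after %.1fs", i, per_step_timeout)
                ok = False
                break  # stop the script instead of queuing further lines
    finally:
        session.input.set_audio_enabled(True)
    return ok
```

For the script in this report, the two steps would be lambdas wrapping the existing session.say(...) calls. With per_step_timeout well below 20 s, the second step's hang would be logged and input would be re-enabled rather than staying off for the full stall.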