I have an LLM + STT + TTS pipeline and I need to programmatically toggle the STT/TTS parts on and off.
Right now, even when I try to disable them via audio.set_audio_enabled(), they don’t fully shut down.
The audio stream continues, and the session eventually crashes because it’s still attempting to process recognized audio.
I think I’ve tried every trick I could find, but none of them worked. Please help.
Hi, did you see this? Agent session | LiveKit Documentation — it also links to a couple of examples that show how to toggle room I/O:
Yes, I reviewed those examples. When I use session.input.set_audio_enabled(False), the AgentSession crashes after retrying speech recognition. It still attempts to process the audio event, even though no audio is being sent.
This is the issue I see:
```
WARNING:livekit.agents:failed to recognize speech: Audio Timeout Error: Long duration elapsed without audio. Audio should be sent close to real time. [], retrying in 2.0s
```
I want the user to start with text-based interaction only, and during the session allow them to connect to the audio pipeline for spoken interaction.
Both examples crash, or just one? I’ll try to reproduce. Which version of LiveKit agents are you using?
The toggle_io one. I’m on version 1.4.2, and this kind of workaround is required to prevent the crash. However, I’m not sure whether this is the intended or recommended approach, since it requires accessing internal APIs:
```python
def _deactivate_stt_node(self, session) -> None:
    """Stop the STT stream to prevent Audio Timeout errors.

    Args:
        session: The active AgentSession
    """
    if (
        hasattr(session, '_activity')
        and session._activity is not None
        and session._activity._audio_recognition is not None
    ):
        session._activity._audio_recognition.update_stt(None)
        logger.info("🔇 STT stream stopped")
OK, for agents/examples/voice_agents/toggle_io.py at main · livekit/agents · GitHub, when I connect to a room I see the following exception:

```
Exception: cannot access local participant before connecting
```
I will need to fix that in the sample, but the fix is to add:

```python
await ctx.connect()
```

immediately after `await session.start(…)`.
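In other words, the corrected entrypoint ordering looks roughly like this. This is a sketch with stub classes standing in for livekit's JobContext and AgentSession, so it runs without the library; the only point it demonstrates is the call order:

```python
import asyncio

# Stand-in classes so the ordering sketch is self-contained.
class StubSession:
    def __init__(self):
        self.started = False
    async def start(self):
        self.started = True

class StubContext:
    def __init__(self):
        self.connected = False
    async def connect(self):
        self.connected = True

async def entrypoint(ctx: StubContext) -> StubSession:
    session = StubSession()
    await session.start()   # start the AgentSession first
    await ctx.connect()     # then connect to the room (the missing call)
    return session

session = asyncio.run(entrypoint(StubContext()))
```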
After you apply that fix and connect to a room, it should work. You can test it as follows:
- Start the agent: `uv run examples/voice_agents/toggle_io.py dev`
- Use the Agents Playground as a front end: https://agents-playground.livekit.io/
- Press Connect in the Playground and allow the agent to connect to your room
- Under the RPC options in the playground, specify the method name as toggle_input and the payload as either audio_off or audio_on; you should see the agent stop or start responding to your speech.
- The example is written with the OpenAI Realtime LLM, but it should work with any LLM/STT/TTS; you’ll need to modify it to match your setup if you don’t have an OpenAI key.
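Under the hood, the toggle_input RPC boils down to mapping the payload onto `session.input.set_audio_enabled`. A minimal sketch of that mapping, with a stub session standing in for livekit's AgentSession so it runs on its own (the real handler is registered as an RPC method in the toggle_io example):

```python
# Stub standing in for AgentSession.input so the sketch is self-contained.
class StubInput:
    def __init__(self):
        self.audio_enabled = True
    def set_audio_enabled(self, enabled: bool) -> None:
        self.audio_enabled = enabled

class StubSession:
    def __init__(self):
        self.input = StubInput()

def toggle_input(session: StubSession, payload: str) -> str:
    # Payloads used in the Playground test above: audio_on / audio_off.
    if payload not in ("audio_on", "audio_off"):
        return f"unknown payload: {payload}"
    session.input.set_audio_enabled(payload == "audio_on")
    return f"audio input {'on' if session.input.audio_enabled else 'off'}"

session = StubSession()
print(toggle_input(session, "audio_off"))  # audio input off
print(toggle_input(session, "audio_on"))   # audio input on
```

Disabling audio input this way is what triggered the Audio Timeout warning earlier in the thread, which is why the sample pairs it with properly stopping the STT stream rather than just starving it of audio.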
Hey! I ran into this exact issue when building voice agents that needed to “mute” themselves during certain workflows. Have you tried controlling it at the VAD (Voice Activity Detection) level? That’s been the cleanest approach for me.
The idea is that VAD sits at the entry point of your pipeline before STT even kicks in. When you disable VAD, the agent stops detecting speech entirely, so the whole pipeline stays idle without needing to tear down STT/TTS.
Here’s what worked for me:
```python
# Control listening state via VAD
async def toggle_listening(agent: VoiceAssistant, enabled: bool):
    if enabled:
        agent.vad.start()  # Resume speech detection
    else:
        agent.vad.stop()   # Ignore audio input
```
If you need something heavier (like completely pausing multi-turn conversations), I’d recommend pausing the entire agent session instead:
```python
# Pause the full agent context, not just listening
async def toggle_agent(agent: VoiceAssistant, active: bool):
    if active:
        await agent.resume()
    else:
        await agent.pause()
```
VAD control is way more efficient because the audio tracks keep flowing; you’re just not processing them. Pause/resume is better when you’re switching between different interaction modes entirely.
What’s your use case? Are you trying to implement push-to-talk, or is it more like “agent speaks, user can’t interrupt”?