ElevenLabs STT Realtime and Silero VAD | Proper setup

Hello, I have a question about setting up an agent with ElevenLabs STT and Silero VAD:

The ElevenLabs and Silero plugins are both v1.3.11.

from livekit.agents import AgentSession, stt
from livekit.plugins import elevenlabs, silero

vad_instance = silero.load(
    vad_min_silence_duration=0.3,
    smart_turn_threshold=0.3,
    max_silence_duration=0.5,
)

stt_base = elevenlabs.STT(
    model="scribe_v2_realtime",
    language_code="pl",
    server_vad=None,
)

stt_instance = stt.StreamAdapter(
    stt=stt_base,
    vad=vad_instance,
)

session = AgentSession(
    turn_detection="vad",
    vad=vad_instance,
    stt=stt_instance,
    llm=...,
    tts=...,
    discard_audio_if_uninterruptible=False,
)

This setup works best of everything I have tried. It handles interruptions very quickly and doesn't bottleneck the LLM connection, but it processes two audio streams at the same time. My question: is there a way to send a commit signal to the ElevenLabs STT model from Silero VAD? I want Silero to commit the transcription, handle interruptions, and send the end-of-speech signal so the LLM starts producing a response. Right now I can't seem to wire it up so that there is no lag on the LLM side and no double processing from Silero.

Can you share a code example of how it should be implemented properly?

Short answer: you don’t manually “commit” ElevenLabs STT from Silero. When you use StreamAdapter(stt=..., vad=...), the adapter runs its own Silero stream, buffers audio while speech is detected, and on END_OF_SPEECH sends the buffered segment to the wrapped STT’s recognize() and emits the result as a final transcript. A Silero END_OF_SPEECH event is also what AgentSession uses to end the user’s turn and trigger LLM generation (when turn_detection="vad").
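A simplified, pure-Python model of that batching behaviour may help. It assumes a VAD that labels each frame as speech or silence: frames are buffered while speech is active, and when speech ends the whole segment goes to a batch recognize call whose result becomes the final transcript. ToyStreamAdapter and fake_recognize are hypothetical stand-ins, not LiveKit classes, and real Silero only ends speech after min_silence_duration of silence rather than after one silent frame.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A "frame" is a (is_speech, payload) pair here; in LiveKit it would be an
# audio frame and the speech flag would come from Silero.
Frame = Tuple[bool, str]


@dataclass
class ToyStreamAdapter:
    """Hypothetical model of VAD-driven batching, not LiveKit's StreamAdapter."""

    recognize: Callable[[List[str]], str]  # batch STT call on a whole segment
    events: List[Tuple[str, str]] = field(default_factory=list)
    _buffer: List[str] = field(default_factory=list)
    _in_speech: bool = False

    def push_frame(self, frame: Frame) -> None:
        is_speech, payload = frame
        if is_speech:
            if not self._in_speech:
                self._in_speech = True
                self.events.append(("START_OF_SPEECH", ""))
            self._buffer.append(payload)
        elif self._in_speech:
            # End of speech is the "commit" point. For brevity a single silent
            # frame ends speech here; real Silero waits min_silence_duration.
            self._in_speech = False
            self.events.append(("END_OF_SPEECH", ""))
            text = self.recognize(self._buffer)
            self._buffer = []
            if text:
                self.events.append(("FINAL_TRANSCRIPT", text))


def fake_recognize(segment: List[str]) -> str:
    # Stand-in for a batch STT request on the buffered audio.
    return " ".join(segment)


adapter = ToyStreamAdapter(recognize=fake_recognize)
for f in [(False, ""), (True, "dzień"), (True, "dobry"), (False, ""), (False, "")]:
    adapter.push_frame(f)
print(adapter.events)
```

The point of the model: there is no separate commit call anywhere; the transition out of speech is the commit.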

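If you’re seeing double processing, it’s usually because both of these are true:

  1. ElevenLabs server_vad is enabled, and

  2. you’re also wrapping the STT with StreamAdapter + Silero, so two endpointers decide independently when speech ends.

Also note that StreamAdapter opens its own Silero stream, separate from the one AgentSession runs for interruption handling, so every audio frame passes through Silero twice. That is the second audio stream you are seeing; it costs CPU but does not produce a second transcription.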
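To see why two endpointers cause trouble, here is a toy simulation, assuming each endpointer commits after a fixed number of consecutive silent frames. The thresholds are made-up numbers, not ElevenLabs’ or Silero’s defaults, and commit_points is a hypothetical helper. With a short pause inside one sentence, the eager endpointer commits mid-sentence and splits the utterance, while the patient one sees a single segment.

```python
from typing import List


def commit_points(frames: List[int], min_silence_frames: int) -> List[int]:
    """Return frame indices at which an endpointer would commit.

    frames: 1 = speech, 0 = silence. A commit fires once speech has been seen
    and `min_silence_frames` consecutive silent frames follow it.
    """
    commits: List[int] = []
    in_speech = False
    silent_run = 0
    for i, is_speech in enumerate(frames):
        if is_speech:
            in_speech = True
            silent_run = 0
        elif in_speech:
            silent_run += 1
            if silent_run >= min_silence_frames:
                commits.append(i)
                in_speech = False
                silent_run = 0
    return commits


# One sentence with a short 3-frame pause in the middle, then a long silence.
audio = [1, 1, 1, 0, 0, 0, 1, 1, 1] + [0] * 10

eager = commit_points(audio, min_silence_frames=2)    # hypothetical server-side VAD
patient = commit_points(audio, min_silence_frames=5)  # hypothetical Silero setting
print("eager:", eager)
print("patient:", patient)
```

If both endpointers are live on the same audio, the eager one fragments the turn; keeping only one of them avoids that.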

To make Silero the single source of truth:

  • Disable ElevenLabs server-side VAD.

  • Pass the Silero VAD instance itself to StreamAdapter (not a VAD stream); the adapter opens its own stream internally.

  • Keep turn_detection="vad" in AgentSession.

Correct pattern:

from livekit.agents import AgentSession
from livekit.agents.stt import StreamAdapter
from livekit.plugins import elevenlabs, silero

# 1. Load Silero VAD
vad = silero.VAD.load(
    min_speech_duration=0.1,
    min_silence_duration=0.3,
)

# 2. ElevenLabs STT with server_vad disabled
stt_base = elevenlabs.STT(
    model="scribe_v2_realtime",
    language_code="pl",
    server_vad=None,  # ensure no server-side endpointing
)

# 3. Wrap with StreamAdapter so Silero decides when a segment is committed.
#    Pass the VAD itself; StreamAdapter creates its own VAD stream per STT stream.
stt = StreamAdapter(
    stt=stt_base,
    vad=vad,
)

session = AgentSession(
    stt=stt,
    vad=vad,                 # for interruption handling
    turn_detection="vad",    # VAD drives EoS → LLM
    llm=...,
    tts=...,
    discard_audio_if_uninterruptible=False,
)
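The two durations passed to silero.VAD.load decide when START_OF_SPEECH and END_OF_SPEECH fire, which sets the commit latency. As a back-of-the-envelope sketch, assuming the VAD scores fixed 32 ms windows (512 samples at 16 kHz) and ends speech once accumulated silence reaches min_silence_duration, the hypothetical helper below converts durations into window counts. The real plugin’s bookkeeping may differ in detail.

```python
import math


def windows_needed(duration_s: float, window_ms: float = 32.0) -> int:
    """Number of whole VAD windows needed to accumulate `duration_s` seconds."""
    # round() guards against float noise such as 10.000000000000002.
    return math.ceil(round(duration_s * 1000 / window_ms, 6))


# Values from the example above.
min_speech_duration = 0.1
min_silence_duration = 0.3

speech_windows = windows_needed(min_speech_duration)
silence_windows = windows_needed(min_silence_duration)

# Earliest END_OF_SPEECH after the user stops talking, in milliseconds,
# under the 32 ms window assumption.
eos_delay_ms = silence_windows * 32
print(speech_windows, silence_windows, eos_delay_ms)
```

Under these assumptions, lowering min_silence_duration is the direct lever for a faster commit, at the cost of splitting turns on natural pauses.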


In this setup:

Silero → emits END_OF_SPEECH
StreamAdapter → sends the buffered speech segment to ElevenLabs via recognize() and emits the final transcript
AgentSession → receives final transcript + EoS → starts LLM

There is no separate “commit signal” API; VAD end-of-speech is the commit.