ElevenLabs STT Realtime and Silero VAD | Proper setup

Rinvo_Team · March 4, 2026, 6:32pm

Hello, I have a question regarding the agent setup with ElevenLabs STT and Silero VAD:

The elevenlabs and silero plugins are both v1.3.11.

from livekit.agents import AgentSession
from livekit.plugins import elevenlabs, silero

vad_instance = silero.load(
    vad_min_silence_duration=0.3,
    smart_turn_threshold=0.3,
    max_silence_duration=0.5,
)

stt_base = elevenlabs.STT(
    model="scribe_v2_realtime",
    language_code="pl",
    server_vad=None
)

stt_instance = stt.StreamAdapter(
    stt=stt_base,
    vad=vad_instance,
)

session = AgentSession(
    turn_detection="vad",
    vad=vad_instance,
    stt=stt_instance,
    llm=...,
    tts=...,
    discard_audio_if_uninterruptible=False,
)

This setup works best among those I tried. It handles interruptions very fast and doesn't bottleneck the LLM connection, but it processes two audio streams at the same time. My question is: is there a way to send a commit signal to the ElevenLabs STT model via Silero VAD? I want Silero to commit the transcription, handle interruptions, and send the end-of-speech signal to the LLM so it starts producing a response. Right now I don't seem to be able to wire it up so that there is no lag on the LLM side and no double processing from Silero.

Can you share a code example on how it should be implemented properly?

CWilson · March 4, 2026, 9:17pm

Short answer: You don’t manually “commit” ElevenLabs STT from Silero. When you use StreamAdapter(stt=..., vad=...), the adapter listens for the VAD END_OF_SPEECH event and then calls end_input() on the underlying STT stream. That END_OF_SPEECH event is what AgentSession uses to trigger LLM generation (when turn_detection="vad").

If you’re seeing double processing, it’s usually because both:

ElevenLabs server_vad is enabled, and
You’re also wrapping it with StreamAdapter + Silero.
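
To see why two endpointers on the same audio tend to cause trouble, here is a toy sketch in plain Python. It does not use the LiveKit or ElevenLabs APIs: `segment_speech`, the per-frame flags, and both silence thresholds are hypothetical, chosen only to show that two detectors with different settings can cut the same audio into different turns.

```python
from typing import List, Tuple


def segment_speech(flags: List[bool], min_silence_frames: int) -> List[Tuple[int, int]]:
    """Split per-frame speech flags into (start, end) segments, end exclusive.

    A segment is closed once `min_silence_frames` consecutive
    non-speech frames follow the last speech frame.
    """
    segments: List[Tuple[int, int]] = []
    start = None        # index of the first frame of the open segment
    last_speech = -1    # index of the most recent speech frame
    silence = 0         # consecutive silence frames seen so far
    for i, is_speech in enumerate(flags):
        if is_speech:
            if start is None:
                start = i
            last_speech = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                segments.append((start, last_speech + 1))
                start = None
                silence = 0
    if start is not None:
        # Audio ended while a segment was still open.
        segments.append((start, last_speech + 1))
    return segments


# 1 = speech frame, 0 = silence frame: two phrases separated by a short pause.
flags = [bool(x) for x in [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]]

local_turns = segment_speech(flags, min_silence_frames=3)   # hypothetical local VAD
server_turns = segment_speech(flags, min_silence_frames=5)  # hypothetical server VAD
```

With these made-up thresholds the local detector reports two turns while the server-side one reports a single turn, so anything downstream that listens to both would see conflicting turn boundaries.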

To make Silero the single source of truth:

Disable ElevenLabs server-side VAD.
Use StreamAdapter with the Silero VAD instance itself (the adapter opens its own VAD stream internally, so do not pass the result of vad.stream()).
Keep turn_detection="vad" in AgentSession.

Correct pattern:

from livekit.agents import AgentSession
from livekit.agents.stt import StreamAdapter
from livekit.plugins import elevenlabs, silero

# 1. Load Silero VAD
vad = silero.VAD.load(
    min_speech_duration=0.1,
    min_silence_duration=0.3,
)

# 2. ElevenLabs STT with server_vad disabled
stt_base = elevenlabs.STT(
    model="scribe_v2_realtime",
    language_code="pl",
    server_vad=None,  # ensure no server-side endpointing
)

# 3. Wrap with StreamAdapter so Silero controls commit.
#    StreamAdapter expects the VAD object, not a VAD stream.
stt = StreamAdapter(
    stt=stt_base,
    vad=vad,
)

session = AgentSession(
    stt=stt,
    vad=vad,                 # for interruption handling
    turn_detection="vad",    # VAD drives EoS → LLM
    llm=...,
    tts=...,
    discard_audio_if_uninterruptible=False,
)

In this setup:

Silero → emits END_OF_SPEECH
StreamAdapter → calls end_input() on ElevenLabs
AgentSession → receives final transcript + EoS → starts LLM

There is no separate “commit signal” API; VAD end-of-speech is the commit.
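
The flow above can be mimicked with a small stdlib-only simulation. Everything in it is hypothetical (`FakeSTT`, `ToyAdapter`, the event names as plain strings); it does not use LiveKit classes and is only meant to illustrate the idea that the end-of-speech event, rather than a separate commit call, is what finalizes a transcript.

```python
from typing import List, Tuple


class FakeSTT:
    """Stand-in recognizer: 'transcribes' by joining the buffered frames."""

    def finalize(self, frames: List[str]) -> str:
        return " ".join(frames)


class ToyAdapter:
    """Buffers frames during speech and finalizes them on end of speech."""

    def __init__(self, stt: FakeSTT) -> None:
        self._stt = stt
        self._buffer: List[str] = []
        self.events: List[Tuple[str, str]] = []

    def on_vad_event(self, kind: str, frame: str = "") -> None:
        if kind == "START_OF_SPEECH":
            self._buffer = []
            self.events.append(("START_OF_SPEECH", ""))
        elif kind == "FRAME":
            self._buffer.append(frame)
        elif kind == "END_OF_SPEECH":
            # End of speech acts as the commit: finalize what was buffered.
            self.events.append(("END_OF_SPEECH", ""))
            text = self._stt.finalize(self._buffer)
            self.events.append(("FINAL_TRANSCRIPT", text))
            self._buffer = []


def run_turn(words: List[str]) -> List[Tuple[str, str]]:
    """Feed one simulated user turn through the adapter and return its events."""
    adapter = ToyAdapter(FakeSTT())
    adapter.on_vad_event("START_OF_SPEECH")
    for word in words:
        adapter.on_vad_event("FRAME", word)
    adapter.on_vad_event("END_OF_SPEECH")
    return adapter.events


events = run_turn(["hello", "world"])
```

In this toy model a session loop would only need to wait for the final transcript that follows the end-of-speech event before starting the LLM; no extra commit step appears anywhere in the chain.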
