Hello, I have a question about an agent setup that uses ElevenLabs STT with Silero VAD.
The elevenlabs and silero plugins are both v1.3.11:
from livekit.agents import AgentSession, stt
from livekit.plugins import elevenlabs, silero
vad_instance = silero.load(
    vad_min_silence_duration=0.3,
    smart_turn_threshold=0.3,
    max_silence_duration=0.5,
)
stt_base = elevenlabs.STT(
    model="scribe_v2_realtime",
    language_code="pl",
    server_vad=None,
)
stt_instance = stt.StreamAdapter(
    stt=stt_base,
    vad=vad_instance,
)
session = AgentSession(
    turn_detection="vad",
    vad=vad_instance,
    stt=stt_instance,
    llm=...,
    tts=...,
    discard_audio_if_uninterruptible=False,
)
Of the setups I tried, this one works best. It handles interruptions very quickly and doesn't bottleneck the LLM connection, but it processes two audio streams at the same time. My question is: is there a way to send a commit signal to the ElevenLabs STT model from Silero VAD? I want Silero to commit the transcription, handle interruptions, and send the end-of-speech signal to the LLM so that it starts producing a response. Right now I can't seem to wire it up so that there is no lag on the LLM side and no double processing from Silero.
Can you share a code example on how it should be implemented properly?
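Short answer: you don’t manually “commit” ElevenLabs STT from Silero. When you use StreamAdapter(stt=..., vad=...), the adapter listens for the VAD’s END_OF_SPEECH event and finalizes the current utterance on the underlying STT at that point. As far as I know, it does this by running recognition on the audio buffered for that speech segment rather than by forwarding a commit message, so check the StreamAdapter source for your installed version. That same END_OF_SPEECH event is what AgentSession uses to trigger LLM generation when turn_detection="vad".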
If you’re seeing double processing, it’s usually because both:
- ElevenLabs server_vad is enabled, and
- you’re also wrapping it with StreamAdapter + Silero.
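To see why two endpointing sources lead to duplicate work, here is a toy simulation in plain Python. It is not LiveKit or ElevenLabs code: the function end_of_speech_times, the 0.1 s frame size, and both silence thresholds are made up for illustration. Each endpointer watches the same per-frame speech flags and declares end of speech once it has seen enough consecutive silence.

```python
from typing import List

FRAME_S = 0.1  # hypothetical frame duration in seconds


def end_of_speech_times(speech_flags: List[bool], min_silence_s: float) -> List[float]:
    """Return the times (seconds) at which end of speech would be declared.

    End of speech fires once `min_silence_s` of consecutive silence follows speech.
    """
    needed = round(min_silence_s / FRAME_S)  # number of silent frames required
    events: List[float] = []
    in_speech = False
    silent_run = 0
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            in_speech = True
            silent_run = 0
        elif in_speech:
            silent_run += 1
            if silent_run >= needed:
                events.append(round((i + 1) * FRAME_S, 3))
                in_speech = False
                silent_run = 0
    return events


# One utterance: 0.5 s of speech followed by 1.0 s of silence.
flags = [True] * 5 + [False] * 10

local_vad = end_of_speech_times(flags, min_silence_s=0.3)   # stands in for Silero
server_vad = end_of_speech_times(flags, min_silence_s=0.6)  # stands in for server-side VAD

print(local_vad, server_vad)  # prints: [0.8] [1.1]
```

Both endpointers close the same utterance, but at different times. If each one triggers its own finalization, the utterance is processed twice, which is why only one of them should be active.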
To make Silero the single source of truth:
- Disable ElevenLabs server-side VAD.
- Pass the Silero VAD instance itself to StreamAdapter. The adapter takes a VAD object, not a VAD stream, and opens its own stream internally.
- Keep turn_detection="vad" in AgentSession.
Correct pattern:
from livekit.agents import AgentSession
from livekit.agents.stt import StreamAdapter
from livekit.plugins import elevenlabs, silero

# 1. Load Silero VAD once; the same instance is shared below
vad = silero.VAD.load(
    min_speech_duration=0.1,
    min_silence_duration=0.3,
)

# 2. ElevenLabs STT with server_vad disabled
stt_base = elevenlabs.STT(
    model="scribe_v2_realtime",
    language_code="pl",
    server_vad=None,  # ensure no server-side endpointing
)

# 3. Wrap with StreamAdapter so Silero decides when an utterance ends.
#    Pass the VAD instance, not vad.stream(): the adapter creates its own stream.
stt = StreamAdapter(
    stt=stt_base,
    vad=vad,
)

session = AgentSession(
    stt=stt,
    vad=vad,  # for interruption handling
    turn_detection="vad",  # VAD drives EoS → LLM
    llm=...,
    tts=...,
    discard_audio_if_uninterruptible=False,
)
In this setup:
Silero → emits END_OF_SPEECH
StreamAdapter → finalizes the utterance on ElevenLabs
AgentSession → receives final transcript + EoS → starts LLM
There is no separate “commit signal” API; VAD end-of-speech is the commit.
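The flow above can be pictured with a small toy model in plain Python. Everything here is hypothetical: ToyAdapter, its push method, and the string event names are stand-ins for illustration, not LiveKit classes, and the sketch makes no claim about how the real adapter finalizes an utterance internally. It only shows the idea that end of speech is the single trigger for both the final transcript and the LLM turn.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyAdapter:
    """Hypothetical stand-in for a VAD-driven STT adapter (not the LiveKit class)."""

    transcribe: Callable[[List[str]], str]  # stand-in for the wrapped STT
    on_final: Callable[[str], None]         # stand-in for the session's turn handler
    _buffer: List[str] = field(default_factory=list)
    _speaking: bool = False

    def push(self, event: str, frame: str = "") -> None:
        if event == "START_OF_SPEECH":
            self._speaking = True
            self._buffer.clear()
        elif event == "FRAME" and self._speaking:
            self._buffer.append(frame)
        elif event == "END_OF_SPEECH" and self._speaking:
            self._speaking = False
            # End of speech is the only "commit": finalize, then hand off the turn.
            self.on_final(self.transcribe(self._buffer))


llm_prompts: List[str] = []
adapter = ToyAdapter(
    transcribe=lambda frames: " ".join(frames),
    on_final=llm_prompts.append,
)

for event, frame in [
    ("FRAME", "noise"),        # ignored: no speech detected yet
    ("START_OF_SPEECH", ""),
    ("FRAME", "hello"),
    ("FRAME", "world"),
    ("END_OF_SPEECH", ""),
]:
    adapter.push(event, frame)

print(llm_prompts)  # prints: ['hello world']
```

Because there is exactly one end-of-speech source, the LLM handler is called exactly once per utterance, and audio that arrives outside a detected speech segment never reaches the transcriber.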