I’m working with a self-hosted LiveKit Agents worker and I’m looking into the new adaptive interruption mode.
From the docs, I understand that adaptive interruption handling currently runs on LiveKit Cloud inference infrastructure, while the turn detector model itself is available as an open-weights model. I also understand that adaptive interruption handling is separate from the LLM and is used to distinguish real barge-ins from backchanneling.
Is there any roadmap to make the adaptive interruption/barge-in model available for self-hosted workers as well?
More specifically:
Will the adaptive interruption model ever be released as an open-weights model, similar to the turn detector?
Could it become available for custom/self-hosted deployments without deploying the agent to LiveKit Cloud?
Is there any planned way to use it with arbitrary STT/LLM/TTS pipelines, assuming VAD and aligned transcripts are available?
If not, is VAD-only interruption handling currently the recommended path for fully self-hosted workers?
This would be useful for deployments where the worker infrastructure is self-hosted but we still want the smoother interruption behavior shown in the adaptive interruption demos.
We do not currently have plans to release it for self-hosted agents. You do get access to it for local development and testing while not running in LiveKit Cloud, but for production-level usage we require the agent to be hosted on LiveKit Cloud.
@Robbe, on a self-hosted worker, the closest pattern today is layering your own classifier on top of VAD-mode interruption.
My recommendation is to keep interruption mode as VAD so you get baseline barge-in detection, then gate the actual interrupt through your own classifier. On each interim STT chunk, run a backchannel-vs-barge-in model against the partial transcript (small fine-tuned classifier, or a quick LLM prompt depending on latency budget). If it returns “real interruption”, trigger the session’s interrupt path manually; if “backchannel”, let the agent keep speaking.
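To make the gating idea concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: `InterruptGate`, `is_real_interruption`, and the word lists are hypothetical names, not part of LiveKit Agents, and the keyword heuristic is only a stand-in for a fine-tuned classifier or an LLM prompt. The `interrupt` callable is whatever your session exposes for interrupting the agent; how you subscribe to interim transcripts, and whether automatic interruptions have to be turned off so that the gate makes the final call, depends on your LiveKit Agents version and is not shown.

```python
import re
from typing import Callable

# Hypothetical vocabulary; swap in a trained classifier or LLM call in practice.
BACKCHANNEL_TOKENS = {"uh", "huh", "mm", "hmm", "mhm", "yeah", "yes",
                      "ok", "okay", "right", "sure"}
BACKCHANNEL_PHRASES = {"i see", "got it"}


def normalize(text: str) -> str:
    """Lowercase and replace punctuation (except apostrophes) with spaces."""
    return re.sub(r"[^a-z0-9']+", " ", text.lower()).strip()


def is_real_interruption(partial_transcript: str, max_words: int = 3) -> bool:
    """Return True for a likely barge-in, False for a likely backchannel."""
    text = normalize(partial_transcript)
    if not text or text in BACKCHANNEL_PHRASES:
        return False
    words = text.split()
    # Short utterances made only of filler tokens count as backchannels.
    if len(words) <= max_words and all(w in BACKCHANNEL_TOKENS for w in words):
        return False
    return True


class InterruptGate:
    """Calls `interrupt` at most once per agent utterance, and only when
    the classifier labels the partial transcript a real interruption."""

    def __init__(self, interrupt: Callable[[], None],
                 classify: Callable[[str], bool] = is_real_interruption):
        self._interrupt = interrupt
        self._classify = classify
        self._fired = False

    def on_interim_transcript(self, text: str, agent_speaking: bool) -> bool:
        """Feed each interim STT chunk here; returns True if it interrupted."""
        if not agent_speaking or self._fired:
            return False
        if self._classify(text):
            self._fired = True
            self._interrupt()
            return True
        return False

    def reset(self) -> None:
        """Call when the agent starts a new utterance."""
        self._fired = False
```

In use, you would construct one gate per session, pass the session's interrupt method as `interrupt`, call `on_interim_transcript` from your interim-transcript handler, and call `reset()` whenever the agent begins speaking again. The classifier is injected so that the heuristic can be replaced without touching the gating logic.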
You won’t match Cloud quality straight out of the box because they’ve presumably trained on a large internal dataset of voice agent calls, but it gives you the smoother behavior without the specific adaptive model.