Solving end-of-turn detection: LiveKit Turn Detector v1.0

We shipped LiveKit Turn Detector v1.

Instead of reading transcripts, it listens to speech directly and combines semantic and acoustic cues into a single end-of-turn prediction.

The result: high accuracy and low latency. It was the best model we tested across 14 languages.

Available on LiveKit

Watch the demo:

Announcement blog:

Resources


I also updated the FAQ to add clarity around the v1 and v1-mini distinction as it relates to self-hosted agents:

I want to mention the existence of the Krisp Turn-taking V3.0 model.

The blog post "A solution to Turn-Taking and Interruption Prediction in Voice AI" describes evaluation techniques that are similar to yours.

Additionally, Krisp LLC has shared an open dataset for the turn-taking (TT) task, drawn from real communication audio. See Krisp-AI/turn-taking-test-v1 · Discussions.

Finally, it would be good to see evaluation results for Krisp Turn-taking V3.0 in your blog post as well.

Thanks for the good explanations and visualizations in your blogs.

Hey @Artur_Kobelyan, welcome to the community. We might need to consider some kind of avatar flair for people like yourself to make it easier to recognise partners.

Thanks for sharing. I’m not sure exactly how the models were chosen, and I expect there just needed to be a cut-off somewhere, but I’ll re-share the feedback with the model team.

Thanks, Darryn. Anything related to TT is super interesting to me since I actually worked on this project back at Krisp LLC. I’m still pretty new to the community, so my apologies if I posted this comment in the wrong channel!