gpt-realtime-2 + LiveKit: VAD does not work well

Hi folks, I have started using gpt-realtime-2, but the VAD is horrible. Any backchanneling such as “uh ha” or “ok” causes the agent to stop talking. My current settings are below:

turn_detection:
  type: server_vad                 # semantic_vad | server_vad
  eagerness: low                   # low | medium | high (semantic_vad only)
  create_response: true
  interrupt_response: true
  threshold: 0.8                   # server_vad only — energy detection threshold
  prefix_padding_ms: 300           # server_vad only — ms of audio before speech
  silence_duration_ms: 700         # server_vad only — ms of silence to end turn

I have tried playing with various settings but it’s still not working well.

Is this a known issue or am I missing something?

@James_Lau, server_vad is the issue: it’s energy-only, so short utterances like “uh ha” or “ok” trigger an interruption regardless of threshold. Switch to semantic_vad, which classifies on the actual words. Note that eagerness: low in your config has no effect; it only applies when type is semantic_vad.

  # livekit-agents==1.5.x
  from livekit.plugins.openai import realtime
  from openai.types.beta.realtime.session import TurnDetection

  llm = realtime.RealtimeModel(
      turn_detection=TurnDetection(
          type="semantic_vad",
          eagerness="low",
          create_response=True,
          interrupt_response=True,
      ),
  )
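To see which keys in a config like the one in the question are actually in effect, here is a minimal sketch in plain Python. It does not call LiveKit or OpenAI; split_turn_detection is a hypothetical helper, and the key-to-type mapping is taken from the comments in the original config, so treat it as an assumption and check it against the current API reference.

```python
from typing import Any

# Assumption: applicability of each key, as annotated in the config above.
COMMON_KEYS = {"create_response", "interrupt_response"}
TYPE_KEYS = {
    "server_vad": {"threshold", "prefix_padding_ms", "silence_duration_ms"},
    "semantic_vad": {"eagerness"},
}


def split_turn_detection(cfg: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Hypothetical helper: return (effective settings, ignored keys)."""
    vad_type = cfg["type"]
    if vad_type not in TYPE_KEYS:
        raise ValueError(f"unknown turn_detection type: {vad_type!r}")
    allowed = {"type"} | COMMON_KEYS | TYPE_KEYS[vad_type]
    effective = {k: v for k, v in cfg.items() if k in allowed}
    ignored = sorted(k for k in cfg if k not in allowed)
    return effective, ignored


# The settings from the question, as a plain dict.
original = {
    "type": "server_vad",
    "eagerness": "low",
    "create_response": True,
    "interrupt_response": True,
    "threshold": 0.8,
    "prefix_padding_ms": 300,
    "silence_duration_ms": 700,
}

effective, ignored = split_turn_detection(original)
print(ignored)  # ['eagerness']

# Switching the type flips which keys matter.
_, ignored_semantic = split_turn_detection({**original, "type": "semantic_vad"})
print(ignored_semantic)  # ['prefix_padding_ms', 'silence_duration_ms', 'threshold']
```

Running a check like this before building the TurnDetection object makes it obvious when a tuning knob is silently doing nothing.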

Worth knowing: with realtime models, LiveKit-side InterruptionOptions are mostly ignored; only enabled and discard_audio_if_uninterruptible apply. All tuning has to happen on the model’s own TurnDetection. If semantic_vad with eagerness="low" still over-interrupts, the escape hatch is to set turn_detection=None on the model and run LiveKit’s turn detector instead, but that needs an STT plugin, which doubles transcription cost. Filler-word filtering as a separate knob has been requested in livekit/agents issue #4450 (“Ignore Filler Words During Interruption Detection”), but it is not yet in main.
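Until something like the filler-word knob exists, and if you go the STT route so that you have a transcript of the interrupting utterance, a rough sketch of such a filter could look like the following. This is not LiveKit’s API: BACKCHANNELS, normalize and is_backchannel are hypothetical names, the phrase list is only an illustrative assumption, and how you wire the result into your interruption handling is up to your agent code.

```python
import re

# Hypothetical phrase list; tune it for your users and language.
BACKCHANNELS = {
    "uh huh", "uh ha", "mm hmm", "i see",
    "ok", "okay", "yeah", "yep", "right",
}


def normalize(transcript: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", transcript.lower())
    return " ".join(text.split())


def is_backchannel(transcript: str) -> bool:
    """True if the utterance consists only of known backchannel phrases."""
    words = normalize(transcript).split()
    if not words:
        return False  # nothing recognized; let the caller decide
    i = 0
    while i < len(words):
        # Try a two-word phrase first, then a single word.
        for size in (2, 1):
            phrase = " ".join(words[i:i + size])
            if i + size <= len(words) and phrase in BACKCHANNELS:
                i += size
                break
        else:
            return False  # found a word that is not a backchannel
    return True


print(is_backchannel("Uh ha."))         # True
print(is_backchannel("OK, okay"))       # True
print(is_backchannel("ok wait, stop"))  # False
```

The idea is to treat an utterance as a real interruption only when is_backchannel returns False. A fixed phrase list is crude, so expect to adjust it per language and per user base.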

https://docs.livekit.io/agents/integrations/realtime/openai/

Hi everyone, thanks for reporting this! We’re working on full support for gpt-realtime-2 in this PR, feel free to follow along for progress 🙂