Real-time STT with auto language detection and code-switching support

This question originally came up in our Slack community and the thread has been consolidated here for long-term reference.

I’m looking for real-time STT that can auto-detect language on the fly (no language hint at init) for a voice app.

Ideally it should handle mid-utterance code-switching (e.g., Spanish ↔️ English) with low latency.

What providers/models are people using with LiveKit today? I’m currently using Groq Whisper Large.

Options that support auto language detection:

Both of these can handle code-switching scenarios with reasonable latency.

For STT specifically (if you're not using a realtime API):

  • Deepgram Nova-3
  • AssemblyAI Universal Model