Realtime model with Azure whisper STT

Benjamin_Lowe · February 20, 2026, 9:45pm

The Realtime model natively supports server-side OpenAI Whisper transcription, so I am a little unclear why you would want to use an external Azure Whisper. Does it perform significantly better than OpenAI's Whisper?

We are also interested in using an external STT with the Realtime model for transcripts, since more recent STT engines offer better transcription performance than Whisper. Ideally, though, the STT would be an independent layer that does not affect the model itself. I think that is possible with LiveKit if we disable the Realtime model's server-side transcription. One concern with that approach is that the blog post "Developer notes on the Realtime API" suggests the service relies on server-side transcription for cost efficiency:

The GA service will automatically drop some audio tokens when a transcript is available to save tokens.
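To make the two options concrete, here is a minimal Python sketch of a Realtime `session.update` event with server-side input transcription either switched on or off. The nested field layout (`session.audio.input.transcription`) is my assumption about the GA session shape, and `session_update` is a hypothetical helper. In LiveKit the OpenAI plugin builds this event for you, so check the current OpenAI Realtime API reference and the plugin's options before relying on it.

```python
import json
from typing import Optional


def session_update(transcription_model: Optional[str]) -> dict:
    """Hypothetical helper that builds a Realtime session.update event.

    ASSUMPTION: GA field layout session.audio.input.transcription.
    A model name enables server-side input transcription; None emits an
    explicit JSON null, which is assumed to switch it off.
    """
    transcription = None if transcription_model is None else {"model": transcription_model}
    return {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "audio": {"input": {"transcription": transcription}},
        },
    }


if __name__ == "__main__":
    # Option A: server-side Whisper on (what the audio-token dropping note relies on).
    print(json.dumps(session_update("whisper-1")))
    # Option B: server-side transcription off; transcripts would come from an external STT.
    print(json.dumps(session_update(None)))
```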

So I think we would need both the server-side Realtime Whisper transcription (for model input and cost efficiency) and an external STT run by LiveKit. However, I don't think LiveKit is really built for running two simultaneous STT engines like that, so I am currently thinking it might not be possible in LiveKit as-is, as @darryncampbell wrote.
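The "two STT engines" setup boils down to fanning one audio stream out to two consumers. Below is a framework-agnostic asyncio sketch of that pattern, not LiveKit's API: `mic_frames`, `fan_out`, and `consumer` are hypothetical stand-ins. One queue would feed the Realtime session (which keeps its server-side Whisper transcription), and the other would feed an external STT.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def mic_frames(n: int) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for the user's audio track: yields n fake PCM frames."""
    for i in range(n):
        # 320 bytes = 10 ms of 16 kHz, 16-bit mono PCM
        yield bytes([i % 256]) * 320
        await asyncio.sleep(0)


async def fan_out(source: AsyncIterator[bytes], sinks: List[asyncio.Queue]) -> None:
    """Copy every frame from one source into each sink queue, then signal end-of-stream."""
    async for frame in source:
        for q in sinks:
            await q.put(frame)
    for q in sinks:
        await q.put(None)  # sentinel: no more audio


async def consumer(name: str, q: asyncio.Queue, received: Dict[str, int]) -> None:
    """Hypothetical consumer that just counts bytes; a real one would forward
    frames to the Realtime session or to an external STT engine."""
    total = 0
    while (frame := await q.get()) is not None:
        total += len(frame)
    received[name] = total


async def main(n_frames: int = 50) -> Dict[str, int]:
    realtime_q: asyncio.Queue = asyncio.Queue(maxsize=16)
    stt_q: asyncio.Queue = asyncio.Queue(maxsize=16)
    received: Dict[str, int] = {}
    await asyncio.gather(
        fan_out(mic_frames(n_frames), [realtime_q, stt_q]),
        consumer("realtime_model", realtime_q, received),
        consumer("external_stt", stt_q, received),
    )
    return received


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both consumers see the same frames. Because the queues are bounded, a slow external STT would eventually stall audio to the Realtime model as well; in a real agent it is probably safer to drop frames on the STT branch than to block.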

Related topics

Topic	Replies	Views	Activity
Unstability with livekit plugins for azure openai realtime Getting Started	5	49	June 2, 2026
Gpt-realtime-1.5 leaks audio control tokens (<|audio_text|>, <|caption_quality_N|>) into text stream when run with modalities=["text"] Agents tts, realtime	1	35	April 20, 2026
Agent speaking audio_text tokens out loud Agents llm, openai	4	65	March 6, 2026
Best STT Alternative to OpenAI whisper-1 for Japanese in LiveKit Agents stt, openai	2	71	March 9, 2026
Real-time STT with auto language detection and code-switching support Agents stt	1	50	January 21, 2026