Support for Live STT Partial Transcripts in Python SDK for OpenAI models

Hi LiveKit team,

I’m using the LiveKit Python Agents SDK with the OpenAI STT plugin and the gpt-4o-mini-transcribe model, and have a question regarding interim transcript streaming.

My use case is live captioning/transcript updates while the user is speaking. However, no transcript events are emitted while the user is actively speaking. Both partial and final transcript events arrive only after the user stops speaking and VAD detects the end of the utterance.

What I’m looking for is:

  • Continuous transcript updates while the user is actively speaking.
  • Interim/partial transcripts streamed in real time.
  • The ability to surface those updates immediately in the UI.

Is this expected behavior with the Python Agents SDK and the OpenAI STT plugin when using the gpt-4o-mini-transcribe model?

If realtime interim transcripts are supported for gpt-4o-mini-transcribe, is there any configuration or API that needs to be enabled? If not, what is the recommended approach for implementing live transcript streaming with the Python SDK?

For context, I’m using AgentSession with the OpenAI STT plugin and the gpt-4o-mini-transcribe model.

Thanks!

Are you open to changing your STT provider? That model won’t reliably stream progressive updates.
For example, you could switch to:

    stt=inference.STT(model="deepgram/nova-3", language="multi"),

Then you’ll get partials as your user speaks:

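    import logging
    from livekit.agents import UserInputTranscribedEvent

    logger = logging.getLogger(__name__)

    # `session` is your existing AgentSession instance.
    @session.on("user_input_transcribed")
    def _on_transcript(ev: UserInputTranscribedEvent):
        if ev.is_final:
            logger.info(f">>> FINAL TRANSCRIPT: {ev.transcript!r}")
        else:
            logger.info(f">>> PARTIAL TRANSCRIPT: {ev.transcript!r}")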
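
If it helps, here is a rough sketch of how you could turn those partial and final events into a single caption line for your UI. CaptionBuffer and its methods are hypothetical names made up for this example, not part of the LiveKit SDK. It assumes each partial replaces the previous partial for the current utterance, and that a final closes the utterance:

```python
from dataclasses import dataclass, field


@dataclass
class CaptionBuffer:
    """Hypothetical helper (not a LiveKit API): builds one caption string."""

    finalized: list[str] = field(default_factory=list)  # completed utterances
    partial: str = ""  # latest interim text for the utterance in progress

    def update(self, transcript: str, is_final: bool) -> str:
        """Apply one transcript event and return the text to display."""
        text = transcript.strip()
        if is_final:
            # A final closes the utterance: keep it and drop the partial.
            if text:
                self.finalized.append(text)
            self.partial = ""
        else:
            # Assumption: each partial supersedes the previous partial.
            self.partial = text
        return self.render()

    def render(self) -> str:
        """Join finalized utterances with the in-progress partial, if any."""
        parts = self.finalized + ([self.partial] if self.partial else [])
        return " ".join(parts)


if __name__ == "__main__":
    captions = CaptionBuffer()
    for transcript, is_final in [
        ("hello", False),
        ("hello wor", False),
        ("Hello world.", True),
        ("how are", False),
    ]:
        print(captions.update(transcript, is_final))
```

Inside the handler above you would call something like captions.update(ev.transcript, ev.is_final) and push the returned string to your frontend, however your app does that.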

Hi Darryn,

For now we want to stay with OpenAI models since we have a partnership with them. To get this working, do we need to write a custom plugin? According to Claude’s analysis, this feature seems to be supported for Node-based agents.

Does this work with Node.js? I tried passing in use_realtime=True, which I believe is what you are referring to, and the STT doesn’t provide the updates you are looking for.

Ok, so are we saying this is something related to OpenAI, and that there are reliability issues on their end?

I wouldn’t call it a reliability issue. I didn’t think it was possible for gpt-4o-mini-transcribe to return partial transcripts continuously as the user is speaking, even if you configure it for realtime. See the GPT-4o mini Transcribe model page in the OpenAI API docs: it doesn’t offer realtime transcription.

Ok, I tried the streaming model offerings from OpenAI (Whisper real-time) and I get the required behavior.

So it seems it’s not there with the STT model offerings from OpenAI.