I’m using the LiveKit Python Agents SDK with the OpenAI STT plugin and the gpt-4o-mini-transcribe model, and have a question regarding interim transcript streaming.
My use case is live captioning: transcript updates shown while the user is speaking. However, I’m currently seeing no transcript events at all while the user is actively speaking. Both partial and final transcript events arrive only after the user stops speaking and VAD determines the end of the utterance.
What I’m looking for is:
Continuous transcript updates while the user is actively speaking.
Interim/partial transcripts streamed in real time.
The ability to surface those updates immediately in the UI.
Is this expected behavior with the Python Agents SDK and the OpenAI STT plugin when using the gpt-4o-mini-transcribe model?
If realtime interim transcripts are supported for gpt-4o-mini-transcribe, is there any configuration or API that needs to be enabled? If not, what is the recommended approach for implementing live transcript streaming with the Python SDK?
For context, I’m using AgentSession with the OpenAI STT plugin and the gpt-4o-mini-transcribe model.
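To make the UI side of the request concrete, here is a minimal sketch of how I would expect to consume interim and final transcripts once they arrive. It assumes each event carries the transcript text and an is_final flag (I believe the session's user_input_transcribed event exposes transcript and is_final, but treat that as an assumption). CaptionBuffer and on_transcript are hypothetical names for illustration, not part of the SDK.

```python
from dataclasses import dataclass, field


@dataclass
class CaptionBuffer:
    """Hypothetical helper: committed (final) segments plus one pending interim segment."""

    committed: list[str] = field(default_factory=list)
    interim: str = ""

    def on_transcript(self, text: str, is_final: bool) -> str:
        """Apply one transcript event and return the caption line to display."""
        text = text.strip()
        if is_final:
            # A final result replaces whatever interim text was pending.
            if text:
                self.committed.append(text)
            self.interim = ""
        else:
            # Each interim result supersedes the previous one for this utterance.
            self.interim = text
        return self.render()

    def render(self) -> str:
        """Join committed segments with the pending interim segment, if any."""
        parts = self.committed + ([self.interim] if self.interim else [])
        return " ".join(parts)


# Usage: feed events in arrival order and push each returned line to the UI.
captions = CaptionBuffer()
captions.on_transcript("hello", is_final=False)
captions.on_transcript("hello wor", is_final=False)
captions.on_transcript("Hello world.", is_final=True)
```

With the behavior I’m seeing today, only the final call would ever fire, so the caption would jump straight to the full sentence at the end of the utterance.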
For now we want to stay on OpenAI models, since we have a partnership with them. To get this working, do we need to write a custom plugin? According to an analysis by Claude, this feature appears to be supported for Node-based agents.
Does this work with Node.js? I tried passing in use_realtime=True, which I believe is what you are referring to, and the STT doesn’t provide the updates you are looking for.
I wouldn’t say it was a reliability issue. I didn’t think it was possible for gpt-4o-mini-transcribe to return partial transcripts continuously as the user is speaking, even if you configure it for realtime. See the “GPT-4o mini Transcribe Model | OpenAI API” page: it doesn’t offer realtime transcription.