Hey, for the folks using the GPT Realtime API for their agents: do you ever hit an issue where the text the model generates alongside the audio goes missing at times?
That is, the transcript doesn't show up in the frontend, but the agent still speaks the full audio.
As far as I know, with GPT Realtime you can't see the exact text the model understood. The transcript instead comes from a separate STT model running in parallel (Whisper, for example), so there may be discrepancies between what the model actually hears and what gets transcribed.
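For the missing agent-side transcript, one thing that might help narrow it down is buffering the streamed transcript pieces per item and checking which spoken items ended up with no text. Below is a rough sketch, not a definitive fix. The event names (`response.audio_transcript.delta`, `response.audio_transcript.done`) and the `item_id`, `delta`, and `transcript` fields are my assumptions about the event shape and may differ in your API version; `TranscriptBuffer` is a hypothetical helper, not part of any SDK.

```python
from collections import defaultdict

# ASSUMPTION: these event names follow the Realtime API naming as I
# understand it; verify them against the API version you are using.
DELTA_EVENT = "response.audio_transcript.delta"
DONE_EVENT = "response.audio_transcript.done"


class TranscriptBuffer:
    """Hypothetical helper: collects streamed transcript text per item."""

    def __init__(self):
        self._partial = defaultdict(str)  # item_id -> text streamed so far
        self.final = {}                   # item_id -> text to display

    def handle(self, event: dict) -> None:
        """Feed every server event (already parsed to a dict) through here."""
        item_id = event.get("item_id", "unknown")
        if event.get("type") == DELTA_EVENT:
            self._partial[item_id] += event.get("delta", "")
        elif event.get("type") == DONE_EVENT:
            # Prefer the full transcript on the done event; fall back to the
            # concatenated deltas if that field is empty or missing.
            self.final[item_id] = event.get("transcript") or self._partial[item_id]

    def missing(self, spoken_item_ids):
        """Return item ids that produced audio but have no transcript text."""
        return [i for i in spoken_item_ids if not self.final.get(i)]
```

If `missing(...)` comes back non-empty for items you definitely heard, the transcript events never arrived (or arrived empty), which points at the API side rather than your frontend rendering.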