I am trying to use the Gemini realtime model (gemini-live-2.5-flash-native-audio) for my voice agent, and I am seeing the following issues:
- Hallucinations.
- Sometimes the model does not respond. You need to say hello 3-4 times before it generates any response.
- It cannot pronounce some words properly.
If anyone is using the Gemini realtime model in production, can you suggest how you are ensuring that these issues do not occur?