Hello.
We do turn-by-turn injection of dynamic current state into the LLM context. With GPT models we used the “assistant” role for this injection so the dynamic state could sit near the end of the message stream on every turn, keeping the long static prefix eligible for prompt caching.
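For concreteness, here is a minimal sketch of that injection pattern (function and field names are illustrative, not our real code): the stable history comes first, and the per-turn state is appended just before the new user message.

```python
# Sketch: append dynamic per-turn state near the END of the stream so the
# long static prefix stays byte-identical across turns (cache-friendly).
def build_messages(history, dynamic_state, user_input):
    return (
        history
        + [{"role": "assistant", "content": f"[current state] {dynamic_state}"}]
        + [{"role": "user", "content": user_input}]
    )

msgs = build_messages(
    history=[
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"},
    ],
    dynamic_state="battery=72%, location=kitchen",
    user_input="what's my battery?",
)
```

Note that when the history ends with an assistant reply, this produces two consecutive assistant messages, which is exactly the shape that trips the Gemini merge described below.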
When we switched to Gemini-3-Flash, we discovered a nasty bug. Gemini apparently does not accept consecutive history messages from the same role, so the Google plugin code merges adjacent same-role messages:
# google.py:43-48 — implicit merge of consecutive same-role messages
if role != current_role:
    if current_role is not None and parts:
        turns.append({"role": current_role, "parts": parts})
    parts = []
    current_role = role
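Filling in the loop around that excerpt (a self-contained reproduction, not the plugin's actual code), you can see the failure mode directly: the model's real reply and our injected state collapse into one assistant turn, so the model treats the injected text as something it previously said.

```python
# Minimal reproduction of the consecutive-same-role merge shown above.
def merge_consecutive_roles(messages):
    turns, parts, current_role = [], [], None
    for msg in messages:
        role, text = msg["role"], msg["content"]
        if role != current_role:
            if current_role is not None and parts:
                turns.append({"role": current_role, "parts": parts})
            parts = []
            current_role = role
        parts.append(text)
    # Flush the final turn.
    if current_role is not None and parts:
        turns.append({"role": current_role, "parts": parts})
    return turns

merged = merge_consecutive_roles([
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "assistant", "content": "[current state] battery=72%"},
    {"role": "user", "content": "what's my battery?"},
])
# The two assistant messages are now ONE turn with two parts, so the injected
# state reads as part of the model's own prior utterance.
```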
This is OK for the most part. But occasionally, especially around error messages in tool calls, the model leaks the injected context to the user. We’ve literally witnessed the model vomiting the injected content at the user through the TTS! Rare, but it happens.
Switching the injection to the “system” role eliminates this concern, but as I understand it breaks caching: the dynamic turn-by-turn context now lands much earlier in the data stream, so everything after it misses the cache on every turn.
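My mental model of why early injection is so costly (a toy illustration, assuming the cache matches on the longest common token prefix, which is how I understand implicit prompt caching to work; real caches operate on tokens or chunks but the failure mode is the same):

```python
# Toy model: a prefix cache reuses tokens up to the first position that differs
# between the previous request and the current one.
def cached_prefix_len(prev_tokens, cur_tokens):
    n = 0
    for a, b in zip(prev_tokens, cur_tokens):
        if a != b:
            break
        n += 1
    return n

static = ["sys"] * 100  # stable system prompt + conversation history

# Late injection: dynamic state appended after the static prefix.
turn1_late = static + ["state=A", "user1"]
turn2_late = static + ["state=B", "user2"]

# Early injection: dynamic state placed before the static prefix.
turn1_early = ["state=A"] + static + ["user1"]
turn2_early = ["state=B"] + static + ["user2"]
```

With late injection the entire 100-token static prefix is still reusable between turns; with early injection the very first token differs, so nothing is reused.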
Do you have any canonical advice here? What role is best for injecting dynamic turn-by-turn state?
Thanks