Dangerous assistant turn merging with Gemini

Hello.

We inject dynamic current state into the LLM context on every turn. With GPT models we used the “assistant” role for this injection to keep prompt caching effective: the dynamic state sat near the end of the payload sent to the LLM each turn, so the long static prefix remained cacheable.

When we switched to Gemini-3-Flash, we discovered a nasty bug. It appears that Gemini does not support multiple consecutive history messages from the same role, so the Google plugin code merges neighboring same-role messages:

  # google.py:43-48 — implicit merge of consecutive same-role messages
  if role != current_role:
      if current_role is not None and parts:
          turns.append({"role": current_role, "parts": parts})
      parts = []
      current_role = role
  # when role == current_role, execution falls through and the message
  # content accumulates into `parts`, merging it into the previous turn
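For illustration, here is a self-contained sketch of that merging behavior against a hypothetical history (not the actual plugin code). Note how an assistant-role state injection ends up inside the model's own previous turn:

```python
def merge_turns(messages):
    """Collapse consecutive same-role messages into single turns,
    mimicking the merge logic quoted above."""
    turns = []
    current_role = None
    parts = []
    for role, text in messages:
        if role != current_role:
            if current_role is not None and parts:
                turns.append({"role": current_role, "parts": parts})
            parts = []
            current_role = role
        parts.append(text)
    if current_role is not None and parts:
        turns.append({"role": current_role, "parts": parts})
    return turns

history = [
    ("user", "What's the weather?"),
    ("assistant", "It's sunny today."),
    ("assistant", "[STATE] battery=12%, location=kitchen"),  # injected context
    ("user", "Thanks!"),
]
# The two assistant messages collapse into one turn, so the model sees the
# injected state as part of what it previously said.
print(merge_turns(history))
```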
This is OK for the most part. But occasionally, especially around error messages in tool calls, the model leaks the injected context to the user. We have literally heard the model read the injected content aloud through TTS. Rare, but it happens.

Switching the injection to the “system” role eliminates this concern, but, as I understand it, breaks caching: the dynamic turn-by-turn context now lands much earlier in the payload, so everything after it falls outside the reusable cached prefix.
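The caching tradeoff can be illustrated with a toy prefix comparison. Prefix-based prompt caching generally reuses only the longest unchanged prefix between consecutive requests, so where the dynamic state sits determines how much stays cacheable (the payload strings below are purely illustrative):

```python
def common_prefix_len(a, b):
    """Length of the shared prefix between two prompt payloads."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

static = "SYSTEM: you are a voice agent...\nUSER: hi\nASSISTANT: hello\n"

# Dynamic state appended late: the entire static prefix stays reusable.
late_1 = static + "STATE: battery=90%\n"
late_2 = static + "STATE: battery=85%\n"

# Dynamic state injected early (e.g. into the system prompt): the cacheable
# prefix ends at the first changed character.
early_1 = "STATE: battery=90%\n" + static
early_2 = "STATE: battery=85%\n" + static

print(common_prefix_len(late_1, late_2))   # large: most of the prompt
print(common_prefix_len(early_1, early_2)) # small: cache broken near the top
```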

Do you have any canonical advice here? What role is best for injecting dynamic turn-by-turn state?

Thanks

I would recommend using the system role and measuring the effect this has on cached tokens using Data hooks | LiveKit Documentation. I don’t expect it to have a large impact, but it would be good to measure.
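As a sketch of what that measurement could look like, assuming per-turn metric events expose prompt and cached token counts (the dict keys here are placeholders, not LiveKit's actual metrics schema — substitute the real fields from the data hooks when wiring this up):

```python
class CacheStats:
    """Aggregate cached vs. total prompt tokens across turns."""

    def __init__(self):
        self.prompt_tokens = 0
        self.cached_tokens = 0

    def record(self, event: dict):
        # Placeholder field names; map these to the actual metric payload.
        self.prompt_tokens += event.get("prompt_tokens", 0)
        self.cached_tokens += event.get("cached_tokens", 0)

    @property
    def hit_rate(self) -> float:
        return self.cached_tokens / self.prompt_tokens if self.prompt_tokens else 0.0

stats = CacheStats()
# Simulated per-turn events: compare this rate before/after moving the
# injection to the system role.
for ev in [{"prompt_tokens": 1000, "cached_tokens": 900},
           {"prompt_tokens": 1100, "cached_tokens": 200}]:
    stats.record(ev)
print(f"cache hit rate: {stats.hit_rate:.0%}")
```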

If that doesn’t work out, you could filter the injected content in tts_node, but that’s probably not the best architecture.
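If you did go that route, a filter over the streamed text might look like the sketch below. The `[STATE]` marker, the helper names, and the demo stream are all hypothetical; a tts_node override would wrap the incoming text stream with something like `filter_state`:

```python
import asyncio
import re

# Hypothetical delimiter for injected context; match whatever marker the
# injection actually uses.
STATE_RE = re.compile(r"\[STATE\][^\n]*\n?")

async def filter_state(text_stream):
    """Strip injected-state lines from a streamed text source before TTS."""
    async for chunk in text_stream:
        cleaned = STATE_RE.sub("", chunk)
        if cleaned:
            yield cleaned

async def demo():
    async def stream():
        for chunk in ["Sure, ", "[STATE] battery=12%\n", "it's sunny today."]:
            yield chunk
    return "".join([c async for c in filter_state(stream())])

print(asyncio.run(demo()))  # "Sure, it's sunny today."
```

One caveat with this approach: a marker split across chunk boundaries won't match the regex, so a real filter would need to buffer partial chunks, which is part of why it's not a great architecture.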
