Hello.
We do turn-by-turn injection of dynamic current state into the LLM context. With GPT models we used the “assistant” role for this injection so the dynamic state could sit near the end of the message stream on every turn, keeping the long static prefix eligible for prompt caching.
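For concreteness, here is a minimal sketch of that injection pattern (function and field names are illustrative, not our real code): the stable history comes first, and the per-turn state is appended just before the new user message.

```python
# Sketch: append dynamic per-turn state near the END of the stream so the
# long static prefix stays byte-identical across turns (cache-friendly).
def build_messages(history, dynamic_state, user_input):
    return (
        history
        + [{"role": "assistant", "content": f"[current state] {dynamic_state}"}]
        + [{"role": "user", "content": user_input}]
    )

msgs = build_messages(
    history=[
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"},
    ],
    dynamic_state="battery=72%, location=kitchen",
    user_input="what's my battery?",
)
```

Note that when the history ends with an assistant reply, this produces two consecutive assistant messages, which is exactly the shape that trips the Gemini merge described below.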
When we switched to Gemini-3-Flash, we discovered a nasty bug. Gemini apparently does not accept consecutive history messages from the same role, so the Google plugin code merges adjacent same-role messages:
# google.py:43-48 — implicit merge of consecutive same-role messages
if role != current_role:
    if current_role is not None and parts:
        turns.append({"role": current_role, "parts": parts})
    parts = []
    current_role = role
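Filling in the loop around that excerpt (a self-contained reproduction, not the plugin's actual code), you can see the failure mode directly: the model's real reply and our injected state collapse into one assistant turn, so the model treats the injected text as something it previously said.

```python
# Minimal reproduction of the consecutive-same-role merge shown above.
def merge_consecutive_roles(messages):
    turns, parts, current_role = [], [], None
    for msg in messages:
        role, text = msg["role"], msg["content"]
        if role != current_role:
            if current_role is not None and parts:
                turns.append({"role": current_role, "parts": parts})
            parts = []
            current_role = role
        parts.append(text)
    # Flush the final turn.
    if current_role is not None and parts:
        turns.append({"role": current_role, "parts": parts})
    return turns

merged = merge_consecutive_roles([
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "assistant", "content": "[current state] battery=72%"},
    {"role": "user", "content": "what's my battery?"},
])
# The two assistant messages are now ONE turn with two parts, so the injected
# state reads as part of the model's own prior utterance.
```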
This is OK for the most part. But occasionally, especially around error messages in tool calls, the model leaks the injected context to the user. We’ve literally witnessed the model vomiting the injected content at the user through the TTS! Rare, but it happens.
Switching the injection to the “system” role eliminates this concern, but as I understand it breaks caching: the dynamic turn-by-turn context now lands much earlier in the data stream, so everything after it misses the cache on every turn.
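My mental model of why early injection is so costly (a toy illustration, assuming the cache matches on the longest common token prefix, which is how I understand implicit prompt caching to work; real caches operate on tokens or chunks but the failure mode is the same):

```python
# Toy model: a prefix cache reuses tokens up to the first position that differs
# between the previous request and the current one.
def cached_prefix_len(prev_tokens, cur_tokens):
    n = 0
    for a, b in zip(prev_tokens, cur_tokens):
        if a != b:
            break
        n += 1
    return n

static = ["sys"] * 100  # stable system prompt + conversation history

# Late injection: dynamic state appended after the static prefix.
turn1_late = static + ["state=A", "user1"]
turn2_late = static + ["state=B", "user2"]

# Early injection: dynamic state placed before the static prefix.
turn1_early = ["state=A"] + static + ["user1"]
turn2_early = ["state=B"] + static + ["user2"]
```

With late injection the entire 100-token static prefix is still reusable between turns; with early injection the very first token differs, so nothing is reused.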
Do you have any canonical advice here? What role is best for injecting dynamic turn-by-turn state?
Thanks