Gemini Explicit Context Caching (cached_content) drops system_instruction in livekit.plugins.google — How to inject dynamic session variables?

Harshita_Sukumar_Patil · June 9, 2026, 6:59am

Hi,

We are working on optimizing costs for our voice agent, which uses gemini-2.5-flash via the livekit.plugins.google plugin. Our system prompts are large (5,000+ tokens), so we use Gemini’s Explicit Context Caching by passing a pre-warmed cached_content ID.
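For context, a pre-warmed cache like this is typically created once, outside the call path, with the google-genai SDK's client.caches.create(...). The sketch below is a minimal illustration under stated assumptions: it targets Vertex AI (matching the projects/.../locations/asia-south1/... resource name in the warning), and STATIC_PROMPT, create_static_prompt_cache, the display name and the TTL are placeholders. Because the plugin strips tools and tool_config as well as system_instruction when cached_content is set, any tool schemas the agent needs would also have to be baked in here. Check the Gemini caching docs for minimum token counts and TTL limits.

```python
from typing import Optional, Sequence

# Placeholder for the large, caller-independent system prompt (5,000+ tokens).
# It must contain no per-call values, so one cache can serve every caller.
STATIC_PROMPT = "You are the voice assistant for our lending desk. (static rules, policies and scripts only)"


def create_static_prompt_cache(
    project: str,
    location: str,
    model: str = "gemini-2.5-flash",
    tools: Optional[Sequence[object]] = None,
    ttl: str = "3600s",
) -> str:
    """Create an explicit context cache holding only static content; return its resource name."""
    # Imported here so the sketch can be loaded without google-genai installed.
    from google import genai
    from google.genai import types

    client = genai.Client(vertexai=True, project=project, location=location)
    cache = client.caches.create(
        model=model,
        config=types.CreateCachedContentConfig(
            display_name="voice-agent-static-prompt",
            system_instruction=STATIC_PROMPT,
            # Tool schemas must live in the cache too, because they are
            # stripped from requests that set cached_content.
            tools=list(tools) if tools else None,
            ttl=ttl,
        ),
    )
    # On Vertex AI this looks like
    # "projects/<project>/locations/<location>/cachedContents/<id>".
    return cache.name
```

The returned name is the value that would be passed as the plugin's cached_content option. Refreshing or extending the cache before its TTL expires is left out of this sketch.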

However, because this is an inbound voice bot, every single phone call contains dynamic runtime variables unique to that user session (e.g., customer_name, account_balance, loan_eligibility).

If we bake these variables into the static cache, we would need a separate cache for every caller, which causes massive cache-miss overhead and risks variable hallucination across callers. To avoid this, we tried passing the dynamic variables in the agent’s system_instruction field alongside the cached_content ID, expecting the two to be combined.

Instead, the plugin drops the system_instruction parameter entirely and logs this warning:

{
  "message": "dropping ['system_instruction'] from Gemini request because cached_content='projects/225719900046/locations/asia-south1/cachedContents/119712627008995328' is set; these fields must be baked into the CachedContent resource", 
  "level": "WARNING", 
  "name": "livekit.plugins.google"
}

Questions:

1. Is it a strict limitation of the Gemini API, or of the LiveKit integration, that prevents adding runtime system_instruction rules on top of an explicit cached_content resource?
2. What is the recommended LiveKit pattern for using explicit context caching for the static instructions while still supplying dynamic session metadata safely on a per-job basis?

darryncampbell · June 9, 2026, 9:12am

It is a Gemini API restriction, and the plugin documents it in the cached_content parameter docstring in agents/livekit-plugins/livekit-plugins-google/livekit/plugins/google/llm.py (livekit/agents on GitHub, main branch):

cached_content (str, optional): Resource name of an explicit context cache to attach to every request from this LLM instance, e.g. "cachedContents/abc123" for the Gemini API or "projects/<project>/locations/<location>/cachedContents/abc123" for VertexAI. The cache must already exist — create it via client.caches.create(...) and pass the returned name. Gemini rejects generateContent requests that combine cached_content with system_instruction, tools, or tool_config, so when this option is set the plugin bakes those fields out of every outgoing request; the cache resource itself must contain whichever of them the model needs (typically the system prompt and the tool schemas). Useful for long-lived static prefixes where implicit caching is unreliable. See https://ai.google.dev/gemini-api/docs/caching for details and minimum prefix-token requirements. Defaults to None.
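To make the consequence concrete, here is a small, self-contained model of the behavior the docstring describes. It is illustrative only, not the plugin's actual code: the request is represented as a plain dict, and CACHE_CONFLICTING_FIELDS and strip_for_cached_content are hypothetical names. Any per-call data placed in system_instruction is removed before the request is sent, so it never reaches the model.

```python
# Fields that, per the plugin docstring, Gemini rejects when combined with cached_content.
CACHE_CONFLICTING_FIELDS = ("system_instruction", "tools", "tool_config")


def strip_for_cached_content(config: dict) -> tuple[dict, list[str]]:
    """Return (request actually sent, names of fields that were dropped)."""
    if not config.get("cached_content"):
        # No explicit cache: nothing conflicts, send the request unchanged.
        return dict(config), []
    dropped = [f for f in CACHE_CONFLICTING_FIELDS if config.get(f) is not None]
    kept = {k: v for k, v in config.items() if k not in dropped}
    return kept, dropped


request = {
    "cached_content": "cachedContents/abc123",
    "system_instruction": "customer_name=Asha",  # per-call data: silently lost
    "temperature": 0.7,
}
kept, dropped = strip_for_cached_content(request)
print(dropped)  # ['system_instruction']
```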

I would store only static data in the cached content, then pass the dynamic variables as user messages at the start of the session, as described here:
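A minimal sketch of that pattern, assuming the static rules already live in the cache: the per-call values are rendered into one short, clearly delimited block that is sent as the first user message, so a single cache resource can be reused for every caller. build_session_context is a hypothetical helper and the sample values are made up. In LiveKit Agents 1.x the resulting string could be added to the agent's initial context with something like ChatContext.add_message(role="user", content=...); verify the exact API against your installed version.

```python
from typing import Mapping


def build_session_context(variables: Mapping[str, object]) -> str:
    """Render per-call variables as a delimited block for the first user message.

    Hypothetical helper: keys are sorted so the output is deterministic.
    """
    lines = [f"- {key}: {value}" for key, value in sorted(variables.items())]
    return "Session context for this call (use only these values):\n" + "\n".join(lines)


message = build_session_context(
    {
        "customer_name": "Asha",
        "account_balance": "1,200.00",
        "loan_eligibility": "eligible",
    }
)
print(message)
```

Because the values travel in that session's own chat context rather than in the shared cache, one caller's data never ends up in another caller's prompt.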

darryncampbell · June 9, 2026, 1:09pm

This looked familiar
