MCP tool definitions bloating prompt tokens and increasing latency in voice agents — how to handle this?

We’re building a voice agent using LiveKit Agents (Python) and ran into a token bloat problem when integrating MCP servers like Google Calendar.

The issue:
Each MCP tool injects its full schema/definition into the prompt on every LLM call. For a voice agent, latency is critical — ideally we want to stay under ~2k tokens per request. But even adding a single tool like “Create Event” from Google Calendar MCP adds ~3k tokens on its own, which immediately blows past that budget.

What we tried:
We know about the allowed_tools filter to only expose specific tools instead of the full MCP server. We tried this, but even one tool from Google Calendar MCP adds ~3k tokens — so filtering tools alone doesn’t solve the problem.

What we’re looking for:

  • Is there a way to reduce or compress the tool schema that gets injected into the prompt?
  • Has anyone written leaner custom wrappers around MCP tools to reduce token footprint?
  • Is there a pattern others use to balance MCP tool availability vs. latency in voice agents?

Any advice or examples would be really helpful!

One option is to use a model/provider that supports transparent prompt caching, so the tool definitions are processed once and served from cache on subsequent turns instead of costing tokens and latency on every request.
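For example, Anthropic's Messages API lets you mark the tools array as a cacheable prefix via `cache_control`. A minimal sketch — the helper name and the trimmed `create_event` schema here are illustrative, not the real Calendar MCP definition:

```python
# Mark the tool list cacheable so the ~3k-token schemas are processed once
# and read from cache on later turns (Anthropic prompt caching: the last
# block tagged "ephemeral" caches the whole prefix up to and including it).

def mark_tools_cacheable(tools: list[dict]) -> list[dict]:
    """Attach cache_control to the final tool, making the tools array a cacheable prefix."""
    if not tools:
        return tools
    tools = [dict(t) for t in tools]  # shallow copies; leave caller's dicts untouched
    tools[-1]["cache_control"] = {"type": "ephemeral"}
    return tools

# Trimmed stand-in for the Google Calendar MCP "Create Event" schema:
create_event = {
    "name": "create_event",
    "description": "Create a calendar event.",
    "input_schema": {"type": "object", "properties": {"title": {"type": "string"}}},
}

cached_tools = mark_tools_cacheable([create_event])
# Then pass tools=cached_tools to client.messages.create(...). The first call
# pays full price to write the cache; subsequent calls hit it, cutting both
# cost and time-to-first-token.
```

Note this doesn't shrink the prompt, it just makes re-sending it cheap — so it attacks the latency half of the problem, not the token budget itself.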

If your agent doesn’t need all of the Calendar functionality, you could write a custom tool that abstracts away the details and injects only the minimal schema your agent actually needs.
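Concretely: expose a tiny schema to the LLM and expand it into the full MCP payload server-side. A sketch under assumptions — `expand_to_mcp_args` and the field names are illustrative, not the real Calendar MCP contract:

```python
from datetime import datetime, timedelta

# The slim schema the LLM sees (~100 tokens) in place of the full ~3k-token one.
SLIM_TOOL_SCHEMA = {
    "name": "create_event",
    "description": "Create a calendar event.",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "start_iso": {"type": "string", "description": "e.g. 2024-06-01T10:00"},
            "duration_min": {"type": "integer"},
        },
        "required": ["title", "start_iso"],
    },
}

def expand_to_mcp_args(title: str, start_iso: str, duration_min: int = 30) -> dict:
    """Fill in everything the upstream MCP tool wants but the LLM never needs to see."""
    start = datetime.fromisoformat(start_iso)
    end = start + timedelta(minutes=duration_min)
    return {
        "summary": title,
        "start": {"dateTime": start.isoformat()},
        "end": {"dateTime": end.isoformat()},
        # Defaults a voice agent never varies — hardcode instead of schematizing:
        "reminders": {"useDefault": True},
    }
```

Your tool handler then forwards the expanded dict to the MCP server's create-event call, so the prompt only ever carries the slim schema.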

Prompt pruning plus sub-agents also helps: route tool calls through a sub-agent so the main voice loop keeps a lean prompt, and only the sub-agent’s context carries the heavy tool schemas.
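Sketch of that dispatch pattern — `run_calendar_subagent` is a hypothetical stand-in for a second LLM call whose context has the full Google Calendar MCP tools attached; the main agent only ever pays for the tiny dispatch tool:

```python
# The only calendar-related schema in the voice agent's prompt (~50 tokens):
DISPATCH_TOOL = {
    "name": "calendar",
    "description": "Handle any calendar request, stated in plain English.",
    "parameters": {
        "type": "object",
        "properties": {"request": {"type": "string"}},
        "required": ["request"],
    },
}

def run_calendar_subagent(request: str) -> str:
    # In a real system: a separate LLM call with the full MCP tool definitions
    # in its own context, invisible to the main voice loop. Stubbed here.
    return f"[subagent handled: {request}]"

def handle_tool_call(name: str, args: dict) -> str:
    """Main agent's tool dispatcher: forward calendar work to the sub-agent."""
    if name == "calendar":
        return run_calendar_subagent(args["request"])
    raise ValueError(f"unknown tool: {name}")
```

The trade-off is an extra LLM round-trip when the calendar tool fires, but every turn that doesn’t touch the calendar stays within the ~2k-token budget.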