We’re building a voice agent using LiveKit Agents (Python) and ran into a token bloat problem when integrating MCP servers like Google Calendar.
The issue:
Every MCP tool's full schema/definition gets injected into the prompt on every LLM call. Latency is critical for a voice agent, so ideally we want to stay under ~2k tokens per request. But a single tool like "Create Event" from the Google Calendar MCP server adds ~3k tokens on its own, which immediately blows past that budget.
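For context, here's roughly how we're measuring the footprint. This is a minimal sketch: the schema below is a made-up stand-in for the real Calendar tool definition (the actual one is far larger), and ~4 characters/token is just the common rough heuristic, not an exact tokenizer count:

```python
import json

# Made-up stand-in for an MCP tool definition; the real Google Calendar
# "Create Event" schema is much bigger (nested objects, enums, long descriptions).
create_event_tool = {
    "name": "create_event",
    "description": "Create a calendar event with title, time, and attendees.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "summary": {"type": "string", "description": "Event title"},
            "start": {"type": "string", "description": "RFC3339 start time"},
            "end": {"type": "string", "description": "RFC3339 end time"},
            "attendees": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Attendee email addresses",
            },
        },
        "required": ["summary", "start", "end"],
    },
}

def estimate_tokens(obj) -> int:
    """Rough token estimate: ~4 characters per token for English/JSON."""
    return len(json.dumps(obj)) // 4

print(estimate_tokens(create_event_tool))
```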
What we tried:
We know about the allowed_tools filter for exposing only specific tools instead of the full MCP server. We tried it, but as noted above, even a single Google Calendar tool adds ~3k tokens, so filtering which tools are exposed doesn't solve the problem by itself.
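For concreteness, what we tried amounts to something like this client-side sketch (the tool dicts and the filter_tools helper are illustrative, not any SDK's actual API). The point is that each surviving tool still carries its complete schema:

```python
def filter_tools(tools: list[dict], allowed: set[str]) -> list[dict]:
    """Keep only tools whose names are in `allowed`.
    Each surviving tool still carries its full schema, which is why
    filtering alone doesn't shrink the per-request token footprint."""
    return [t for t in tools if t["name"] in allowed]

tools = [
    {"name": "create_event", "inputSchema": {"type": "object"}},
    {"name": "list_events", "inputSchema": {"type": "object"}},
    {"name": "delete_event", "inputSchema": {"type": "object"}},
]
print([t["name"] for t in filter_tools(tools, {"create_event"})])  # ['create_event']
```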
What we’re looking for:
- Is there a way to reduce or compress the tool schema that gets injected into the prompt?
- Has anyone written leaner custom wrappers around MCP tools to reduce token footprint?
- Is there a pattern others use to balance MCP tool availability vs. latency in voice agents?
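On questions 1 and 2, the direction we've been sketching is a hand-written slim schema that keeps only the fields our agent actually needs, plus a thin local wrapper that forwards calls to the real MCP tool. Everything here is illustrative: the schemas are made up, and `call_mcp_tool` is a hypothetical stand-in for however your framework invokes the MCP server:

```python
import json

def slim_schema(full: dict, keep: set[str]) -> dict:
    """Drop descriptions and unused parameters from a JSON Schema,
    keeping only the types of the fields in `keep`."""
    props = full.get("properties", {})
    return {
        "type": "object",
        "properties": {
            name: {"type": spec.get("type", "string")}
            for name, spec in props.items()
            if name in keep
        },
        "required": [r for r in full.get("required", []) if r in keep],
    }

# Made-up stand-in for the verbose upstream schema.
full = {
    "type": "object",
    "properties": {
        "summary": {"type": "string", "description": "Event title shown in the calendar UI"},
        "start": {"type": "string", "description": "RFC3339 start time, e.g. 2024-01-01T10:00:00Z"},
        "end": {"type": "string", "description": "RFC3339 end time"},
        "location": {"type": "string", "description": "Free-form location"},
        "attendees": {"type": "array", "items": {"type": "string"}, "description": "Emails"},
    },
    "required": ["summary", "start", "end"],
}

slim = slim_schema(full, keep={"summary", "start", "end"})
print(len(json.dumps(slim)) < len(json.dumps(full)))  # True

async def create_event(summary: str, start: str, end: str) -> dict:
    """Slim local tool the LLM sees; forwards to the real MCP tool.
    `call_mcp_tool` is hypothetical -- substitute your MCP client's call."""
    return await call_mcp_tool("create_event", {"summary": summary, "start": start, "end": end})
```

The trade-off is that the wrapper has to be kept in sync with the upstream tool by hand, but for a voice agent that only ever fills a handful of fields, dropping descriptions and unused parameters should recover most of the budget.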
Any advice or examples would be really helpful!