I’m building a voice AI agent using LiveKit and running into latency issues due to a large system prompt.
The prompt contains all the logic, flows, and business rules the assistant needs, but it’s becoming too long and impacting response time.
What are best practices to:
- Reduce prompt size without losing important behavior?
- Structure prompts for real-time voice agents?
- Offload logic (tools, state machines, memory, etc.) effectively?
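For concreteness, here is a minimal, framework-agnostic sketch of the "split the prompt + external state" option. It uses a small state machine in which each step contributes only its own short instructions and tool list, and collected facts live in a Python object instead of in the prompt. All names (`CORE_PROMPT`, `State`, `Conversation`, the states and tool names) are hypothetical, and no LiveKit APIs are assumed. How `build_prompt()` would be fed into the agent's instructions depends on the LiveKit Agents version in use.

```python
from dataclasses import dataclass, field

# Hypothetical always-on core prompt: persona and global rules only.
CORE_PROMPT = "You are a concise, friendly phone assistant. Keep replies to one or two sentences."


@dataclass
class State:
    name: str
    instructions: str                 # only the rules needed in this step
    tools: tuple[str, ...] = ()       # hypothetical tool names the LLM may call here
    transitions: dict[str, str] = field(default_factory=dict)  # event -> next state


# Hypothetical flow; each step's text stays short instead of one giant prompt.
STATES = {
    "greeting": State("greeting", "Greet the caller and ask how you can help.", (),
                      {"wants_booking": "booking", "wants_faq": "faq"}),
    "booking": State("booking", "Collect date, time, and name, then confirm.",
                     ("check_availability", "create_booking"), {"booked": "wrap_up"}),
    "faq": State("faq", "Answer using lookup_faq; never guess policy details.",
                 ("lookup_faq",), {"done": "wrap_up"}),
    "wrap_up": State("wrap_up", "Summarize what was done and say goodbye."),
}


@dataclass
class Conversation:
    state: str = "greeting"
    slots: dict[str, str] = field(default_factory=dict)  # external state, kept out of the prompt text

    def handle_event(self, event: str) -> None:
        # Transitions are decided in code; unknown events leave the state unchanged.
        nxt = STATES[self.state].transitions.get(event)
        if nxt is not None:
            self.state = nxt

    def build_prompt(self) -> str:
        # Core prompt + current step's instructions + a compact summary of known facts.
        s = STATES[self.state]
        known = ", ".join(f"{k}={v}" for k, v in sorted(self.slots.items())) or "none"
        return f"{CORE_PROMPT}\n\nCurrent step: {s.name}\n{s.instructions}\nKnown facts: {known}"

    def allowed_tools(self) -> tuple[str, ...]:
        return STATES[self.state].tools
```

With this shape, each turn sends only `CORE_PROMPT` plus one step's instructions, so the prompt should stay roughly the same size as more flows are added.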
Would you recommend splitting the prompt, using external state, or relying more on function calling?
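The function-calling option can be sketched in a similar way. A business rule that might otherwise be a paragraph of prompt text lives in a plain Python function, and the model only needs the tool's name and arguments and gets back a compact result. The `TOOLS` registry, the `tool` decorator, the `dispatch()` helper, and the 30-day refund rule are all made up for illustration. In a real agent the function would be registered through whatever tool/function-calling mechanism the LLM and LiveKit integration provide, rather than through this hand-rolled dispatcher.

```python
import json
from typing import Any, Callable

# Hypothetical registry: business rules live in code, not in the system prompt.
TOOLS: dict[str, Callable[..., dict[str, Any]]] = {}


def tool(fn: Callable[..., dict[str, Any]]) -> Callable[..., dict[str, Any]]:
    """Register a function under its own name so the LLM can request it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def check_refund_eligibility(days_since_purchase: int, opened: bool) -> dict[str, Any]:
    # Illustrative rule only: refunds within 30 days, unopened items only.
    if days_since_purchase > 30:
        return {"eligible": False, "reason": "too_late"}
    if opened:
        return {"eligible": False, "reason": "item_opened"}
    return {"eligible": True, "reason": "ok"}


def dispatch(name: str, arguments_json: str) -> str:
    """Run a tool call requested by the LLM and return a compact JSON string."""
    fn = TOOLS.get(name)
    if fn is None:
        return json.dumps({"error": f"unknown tool {name}"})
    try:
        result = fn(**json.loads(arguments_json))
    except (TypeError, ValueError) as exc:  # bad arguments or malformed JSON
        return json.dumps({"error": str(exc)})
    return json.dumps(result)
```

Keeping tool results to a few short fields should also keep the follow-up LLM turn small, which matters for voice latency.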
Any architecture patterns or examples would be greatly appreciated.