This question originally came up in our Slack community and the thread has been consolidated here for long-term reference.
I’m working on a voice agent that follows a script like a human would. It should:
- Stick to the structure
- Say some lines exactly when needed
- Adapt when the user goes off track
- Guide the conversation back to the script
We also need this to be modular since we handle many different scripts. Have you seen approaches or patterns for this type of behavior?
For very scripted conversations, community members recommend combining tool calls with `session.say()`, since prompting alone is not deterministic.
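A minimal, framework-agnostic sketch of that pattern: the LLM calls a tool, and the tool delivers the scripted line verbatim through `say()` rather than letting the model paraphrase it. The `Session` stub and the `SCRIPT_LINES`/`deliver_script_line` names here are illustrative, not a real API; in LiveKit Agents the equivalent call is `session.say(text)` on the live agent session.

```python
# Scripted lines that must be spoken word-for-word (hypothetical content).
SCRIPT_LINES = {
    "greeting": "Hi, thanks for calling. How can I help you today?",
    "disclaimer": "This call may be recorded for quality assurance.",
}


class Session:
    """Stand-in for an agent session exposing a say() method."""

    def __init__(self) -> None:
        self.spoken: list[str] = []

    def say(self, text: str) -> None:
        # A real session would synthesize speech; here we just record it.
        self.spoken.append(text)


def deliver_script_line(session: Session, line_id: str) -> str:
    """Tool handler: speak a scripted line verbatim, never paraphrased."""
    text = SCRIPT_LINES[line_id]
    session.say(text)
    # The return value goes back to the LLM as the tool result.
    return f"delivered:{line_id}"


session = Session()
deliver_script_line(session, "disclaimer")
```

Because the exact wording lives in the tool handler rather than the prompt, the model decides *when* to deliver a line, but cannot rewrite *what* is said.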
Recommended approach:
- Break it into multiple agents: Use different agents for different parts of the script
- Create an evaluation pipeline: Test that the agent doesn’t go off script
- Iterate on prompts: Start with simple instructions and make incremental changes (change, test, repeat)
- Use function tools for state transitions: Have the LLM call tools to transition between script steps, with dynamic enum values for valid next steps
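The last point can be sketched as a small state machine: the script is a graph of steps, and the transition tool's schema is rebuilt every turn so its `enum` lists only the currently valid next steps. The LLM therefore cannot jump to an arbitrary point in the script. All names here (`SCRIPT_FLOW`, `ScriptFlow`, `advance_script`) are hypothetical, not part of any SDK.

```python
# Script as a state machine: each step maps to its legal next steps.
SCRIPT_FLOW = {
    "greeting": ["verify_identity"],
    "verify_identity": ["collect_issue", "end_call"],
    "collect_issue": ["resolve", "escalate"],
    "resolve": ["end_call"],
    "escalate": ["end_call"],
    "end_call": [],
}


class ScriptFlow:
    def __init__(self, start: str = "greeting") -> None:
        self.step = start

    def valid_next_steps(self) -> list[str]:
        """The dynamic enum values offered to the LLM this turn."""
        return SCRIPT_FLOW[self.step]

    def tool_schema(self) -> dict:
        """JSON-schema fragment for the transition tool, rebuilt per turn."""
        return {
            "name": "advance_script",
            "parameters": {
                "type": "object",
                "properties": {
                    "next_step": {
                        "type": "string",
                        "enum": self.valid_next_steps(),
                    },
                },
                "required": ["next_step"],
            },
        }

    def advance(self, next_step: str) -> str:
        # Guard server-side too: never trust the model to respect the enum.
        if next_step not in self.valid_next_steps():
            raise ValueError(f"illegal transition {self.step} -> {next_step}")
        self.step = next_step
        return self.step


flow = ScriptFlow()
flow.advance("verify_identity")
```

Regenerating the schema on every turn is the key move: the same tool name stays registered, but its accepted arguments shrink to whatever the script allows from the current step.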
The tradeoff is that very strict scripting can feel robotic when users go off-topic. Finding the right balance between structure and flexibility requires iteration.