I’m writing E2E tests for a multi-task voice agent using AgentSession. My top-level agent sequentially awaits child tasks, each of which calls session.say() in its on_enter().
turn1 = await session.run(user_input="…")  # captures user input
After a task transition happens:
parent agent starts a new child task → on_enter() calls session.say("…")
turn2 = await session.run(user_input="…")  # captures only the user input, NOT the new task's on_enter speech
The problem: when a task transition happens after session.run() completes, the new task's on_enter() speech (via session.say()) is not captured in any RunResult. The RunResult from the first session.run() has already completed before the transition, and the next session.run() captures only the events that follow its own user input.
With session.start(agent, capture_run=True), the initial on_enter speech is captured. Is there an equivalent mechanism for task-to-task transitions that happen after session.run() completes?
So I see you follow the same strategy by just capturing the user input in your tests:
# NameTask asks for name first
result1 = await session.run(user_input="My name is Alex Johnson")
result1.expect.next_event().is_function_call(name="record_name")
# Wait a bit for next task to start
await asyncio.sleep(0.1)
# PhoneTask asks for phone
result2 = await session.run(user_input="My phone is 949-555-1234")
result2.expect.next_event().is_function_call(name="record_phone")
result2.expect.skip_next_event_if(type="function_call_output")
# Confirm phone
await asyncio.sleep(0.1)
result2_confirm = await session.run(user_input="Yes, that's correct")
result2_confirm.expect.next_event().is_function_call(name="confirm_phone")
# AgeTask asks for age
await asyncio.sleep(0.1)
result3 = await session.run(user_input="I'm 25 years old")
result3.expect.next_event().is_function_call(name="record_age")
result3.expect.skip_next_event_if(type="function_call_output")
# Confirm age
await asyncio.sleep(0.1)
result3_confirm = await session.run(user_input="Yes, that's correct")
result3_confirm.expect.next_event().is_function_call(name="confirm_age")
We use session.say at the beginning of the task. Is there a way to assert that it was called with a certain input?
Not with the evaluation framework. You would need an end-to-end audio test for that. There are lots of third-party frameworks available and we don't recommend a specific one, but https://www.cekura.ai/ has been discussed a lot in the community (though I have no experience with it).
Alternatively, outside of the evaluation framework, you could just test that the string you pass as the greeting message contains the text you expect:
def test_greeting_message():
    assert "hello" in get_greeting_message().lower()
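If you also want to check that on_enter() actually passes that string to say(), one option is a plain unit test with a mocked session. This is only a rough sketch using Python's standard unittest.mock, not anything from the evaluation framework: PhoneTaskStandIn and this get_greeting_message() are hypothetical stand-ins for your own task and helper, and the mock replaces the real AgentSession entirely, so it only verifies the argument, not what is actually spoken.

```python
import asyncio
from unittest.mock import AsyncMock


def get_greeting_message() -> str:
    # Hypothetical helper: the single place the greeting text is defined.
    return "Hello! Can I get your phone number?"


class PhoneTaskStandIn:
    """Hypothetical stand-in for a task whose on_enter() speaks a greeting."""

    def __init__(self, session) -> None:
        self.session = session

    async def on_enter(self) -> None:
        # Mirrors the pattern from the question: say() is called on entry.
        await self.session.say(get_greeting_message())


async def run_on_enter_with_mock_session() -> AsyncMock:
    # The session is a mock, so no audio, LLM, or real session is involved.
    session = AsyncMock()
    await PhoneTaskStandIn(session).on_enter()
    return session


def test_on_enter_says_greeting() -> None:
    session = asyncio.run(run_on_enter_with_mock_session())
    # say() must have been awaited exactly once, with the greeting text.
    session.say.assert_awaited_once_with(get_greeting_message())
    assert "phone" in session.say.await_args.args[0].lower()
```

With a real task you would need to find a way to substitute the mock for the session it uses, which depends on how your task is constructed, so treat the wiring above as an assumption.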