I have an async tool (get_data) that makes an HTTP call taking 2–6 seconds. I want the agent to speak while the tool runs (e.g., “Let me look that up”), then process the result when it arrives. Two speech turns total: one interim, one with the data. Is this possible? This is typically what I see:
Timeline (from logs)
T+0.0s LLM calls get_data(query="Alice")
T+0.0s ctx.update("Looking up data...") → _pending_fut resolves
T+0.0s LLM speaks: "OK, let me check if Alice is available." ✅ expected
T+2.0s HTTP response arrives (tool holds result, waiting for inactive)
T+3.9s LLM re-calls get_data({}) → duplicate rejected
T+3.9s LLM speaks: "I'm searching for Alice, one moment..." ❌ redundant
T+8.6s LLM speaks: "Still looking for Alice..." ❌ redundant
T+15.4s ctx.update(result_json) → LLM speaks result ✅ expected
Currently I get several redundant interim utterances from the LLM, and sometimes duplicate tool calls as well. Is there a recommended pattern for async tools that need to speak once and then deliver a result later? The ctx.update() + return None pattern works for the data-delivery side, but the interim update causes the LLM to retry the tool.
Environment
- livekit-agents 1.5.6
- LLM: gemini-2.5-flash (via livekit.plugins.google)
- Python 3.13
My code
from livekit.agents import function_tool
from livekit.agents.llm.async_toolset import AsyncRunContext, AsyncToolset


class MyToolset(AsyncToolset):
    def __init__(self):
        super().__init__(id="my-tools")  # on_duplicate_call defaults to "confirm"

    @function_tool()
    async def get_data(
        self,
        context: AsyncRunContext,
        query: str | None = None,
    ) -> str | None:
        """Look up records by query."""
        # Option A: let the agent speak while the HTTP call runs
        await context.update("Looking up data...")
        # Option B (tried instead of the update above):
        # context.session.generate_reply(
        #     instructions="Briefly tell the caller you're looking up the contact. One short sentence.",
        #     allow_interruptions=False,
        #     chat_ctx=None,
        #     tools=[],
        # )

        # _http_get is my own HTTP helper (not shown)
        result = await self._http_get("/data", params={"q": query})

        # Wait until the agent finishes its interim speech
        await context.session.wait_for_inactive()

        # Deliver data as an update (triggers generate_reply via _deliver_reply)
        await context.update(result)

        # Return None so _execute_tool skips _enqueue_reply (avoids double narration)
        return None
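
To pin down the behavior I am after independently of LiveKit, here is a minimal plain-asyncio sketch. Everything in it (SingleFlightLookup, fake_fetch, demo) is a hypothetical name for illustration and not part of livekit-agents. It also assumes duplicate calls carry the same query, which is not quite what my logs show (the retry came with empty arguments). A repeated call made while the first is still in flight joins the pending work instead of starting a new request or a new announcement, so there are exactly two speech turns.

```python
import asyncio


class SingleFlightLookup:
    """Hypothetical helper: one interim message and one result per query."""

    def __init__(self, fetch, speak):
        self._fetch = fetch  # async callable doing the slow work
        self._speak = speak  # callable that emits (here: records) a speech turn
        self._in_flight: dict[str, asyncio.Task] = {}

    async def get_data(self, query: str) -> str:
        task = self._in_flight.get(query)
        is_owner = task is None
        if is_owner:
            # First call for this query: announce once, then start the work.
            self._speak(f"Let me look up {query}.")
            task = asyncio.create_task(self._fetch(query))
            self._in_flight[query] = task
        try:
            # Duplicate callers wait on the same task instead of re-fetching.
            result = await task
        finally:
            if is_owner:
                self._in_flight.pop(query, None)
        if is_owner:
            # Only the original call narrates the result.
            self._speak(f"Result: {result}")
        return result


async def demo() -> tuple[list[str], int]:
    turns: list[str] = []
    calls = 0

    async def fake_fetch(query: str) -> str:
        # Stand-in for the slow HTTP call.
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.05)
        return f"{query} is available"

    lookup = SingleFlightLookup(fake_fetch, turns.append)
    # Simulate the LLM calling the tool twice while the first call is pending.
    await asyncio.gather(lookup.get_data("Alice"), lookup.get_data("Alice"))
    return turns, calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

What I cannot work out is how to get the equivalent of this single-flight behavior from AsyncToolset without the interim update prompting the LLM to call the tool again.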