AI Agent async tool calls cause a lot of LLM conversation messages

I have an async tool (get_data) that makes an HTTP call taking 2–6 seconds. I want the agent to speak while the tool runs (e.g., “Let me look that up”), then process the result when it arrives. Two speech turns total: one interim, one with the data. Is this possible? This is typically what I see:

Timeline (from logs)

  T+0.0s   LLM calls get_data(query="Alice")
  T+0.0s   ctx.update("Looking up data...") → _pending_fut resolves
  T+0.0s   LLM speaks: "OK, let me check if Alice is available."            ✅ expected
  T+2.0s   HTTP response arrives (tool holds result, waiting for inactive)
  T+3.9s   LLM re-calls get_data({}) → duplicate rejected
  T+3.9s   LLM speaks: "I'm searching for Alice, one moment..."             ❌ redundant
  T+8.6s   LLM speaks: "Still looking for Alice..."                          ❌ redundant
  T+15.4s  ctx.update(result_json) → LLM speaks result                      ✅ expected

Currently I get several intermediate utterances from the LLM, and sometimes duplicate tool calls too. Is there a recommended pattern for async tools that need to deliver results? The ctx.update() + return None pattern works for the data delivery side, but the interim update causes the LLM to retry the tool.

Environment

  • livekit-agents 1.5.6
  • LLM: gemini-2.5-flash (via livekit.plugins.google)
  • Python 3.13

My code

  from livekit.agents import function_tool
  from livekit.agents.llm.async_toolset import AsyncRunContext, AsyncToolset


  class MyToolset(AsyncToolset):
      def __init__(self):
          super().__init__(id="my-tools")  # on_duplicate_call defaults to "confirm"

      @function_tool()
      async def get_data(
          self,
          context: AsyncRunContext,
          query: str | None = None,
      ) -> str | None:
          """Look up records by query."""
          # Let the agent speak while the HTTP call runs
          await context.update("Looking up data...")
          # OR, as an alternative to the update above:
          context.session.generate_reply(
              instructions="Briefly tell the caller you're looking up the contact. One short sentence.",
              allow_interruptions=False,
              chat_ctx=None,
              tools=[],
          )

          result = await self._http_get("/data", params={"q": query})

          # Wait until the agent finishes its interim speech
          await context.session.wait_for_inactive()

          # Deliver data as an update (triggers generate_reply via _deliver_reply)
          await context.update(result)

          # Return None so _execute_tool skips _enqueue_reply — avoids double narration
          return None

I would try explicitly setting on_duplicate_call to reject.
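
To illustrate what rejecting a duplicate call means in practice, here is a minimal standalone sketch using only asyncio. It is not LiveKit's implementation: `InFlightGuard`, `slow_lookup`, and the rejection message are hypothetical names made up for this example, and the returned string is placeholder data.

```python
import asyncio


class InFlightGuard:
    """Hypothetical helper: allows one running call per key and rejects duplicates."""

    def __init__(self) -> None:
        self._running: set[str] = set()

    async def run(self, key: str, coro_fn):
        # A second call with the same key is turned away while the first is in flight.
        if key in self._running:
            return "rejected: call already in progress"
        self._running.add(key)
        try:
            return await coro_fn()
        finally:
            # Always clear the key so later calls are accepted again.
            self._running.discard(key)


async def slow_lookup() -> str:
    # Stand-in for the slow HTTP request; the return value is placeholder data.
    await asyncio.sleep(0.05)
    return "Alice: available"


async def demo() -> list[str]:
    guard = InFlightGuard()
    first = asyncio.create_task(guard.run("get_data", slow_lookup))
    await asyncio.sleep(0)  # yield once so the first call starts running
    second = await guard.run("get_data", slow_lookup)  # arrives while first is in flight
    return [await first, second]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the sketch is that the duplicate never reaches the slow request, so only one result is ever delivered back to the conversation.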

I also recommend avoiding generate_reply and sticking with your first option, await context.update(…).

If you specifically want the agent to speak only once before the result is returned, it may be more predictable to use a say() prior to the tool call. Your timeline shows the HTTP call taking 15 seconds, not 2-6, and in that context, don't the default ‘redundant’ progress updates feel acceptable?

Hi, and thanks for your input!

I think I actually found a more robust solution after your reply and some more thinking. I probably had it a bit wrong regarding `AsyncRunContext` versus the Python function itself being async.

My working prototype now uses the regular `RunContext`, but with a (non-awaited) call to `context.session.generate_reply(…)` before invoking the awaited HTTP call.

Basically:

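    import asyncio

    from livekit.agents import RunContext

    async def get_contacts(
        self,
        context: RunContext,
        name: str | None = None,
        department: str | None = None,
    ) -> str:
        # Start the interim reply without awaiting it, so it can play during the lookup
        reply_task = context.session.generate_reply(
            instructions="Briefly tell the caller you're looking up the contact. One short sentence.",
            allow_interruptions=False,
            chat_ctx=None,
            tools=[],
        )

        # Assumes the lookup is keyed on `name`; extend params if `department` should filter too
        api_task = self._http_get("/data", params={"q": name})

        # Wait for both the interim speech and the HTTP call to finish
        _, result = await asyncio.gather(reply_task, api_task)

        return result.text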

I think this will work and is the most pragmatic way to have the agent talk while it executes a tool. What do you reckon?
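
For anyone who wants to check the concurrency part of this in isolation, here is a small sketch with no LiveKit dependency. `speak_interim` and `http_get` are hypothetical stand-ins for `context.session.generate_reply(…)` and `self._http_get(…)`, and the sleep durations are arbitrary.

```python
import asyncio


async def speak_interim(events: list[str]) -> None:
    # Stand-in for the interim reply; the sleep simulates playout time.
    events.append("speech:start")
    await asyncio.sleep(0.01)
    events.append("speech:end")


async def http_get(query: str, events: list[str]) -> str:
    # Stand-in for the HTTP request; the sleep simulates network latency.
    events.append("http:start")
    await asyncio.sleep(0.05)
    events.append("http:end")
    return f"result for {query}"


async def lookup(query: str) -> tuple[str, list[str]]:
    events: list[str] = []
    # gather() runs both awaitables concurrently and returns results in argument order.
    _, result = await asyncio.gather(
        speak_interim(events),
        http_get(query, events),
    )
    return result, events


if __name__ == "__main__":
    print(asyncio.run(lookup("Alice")))
```

Both stand-ins start before either finishes, so the speech overlaps the request. Note that gather() only returns once both are done, so an interim reply that outlasts the HTTP call would delay the tool result by the difference.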

(`say()` is unfortunately out of the question, because the agent adapts its language to the callee, so we need the LLM to be involved and do the translation. And the timing is from a slow development environment; sorry for the confusion.)

Please keep an eye on https://github.com/livekit/agents/pull/5841. It’s not yet merged, but the intention is that this should address your use case.