I'd like whether the agent speaks before making a tool call to be configurable on the LiveKit side. It matters a lot for my application.
Because this isn't configurable right now, I've run into a few bugs:
- The llm_node output both speech and a tool call, but the tool call ran before the speech. For example, the agent performed a web search, the results were sent to my frontend, and only then did the agent say “let me look that up for you”.
- The agent started to say “let me look that up for you”, but the tool execution interrupted its speech, so we only heard the first word or so.
- The agent produces no pre-tool-call speech at all. The tool call adds latency, so there's an awkward silence while the user waits for the agent to finish the tool call.
Currently I don't prompt the agent to speak before calling the tool. I used to, but it was very unreliable: it often seemed like there was a race condition between tool execution and speech streaming.
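To make the ordering concrete, here's a toy model in plain Python asyncio. To be clear, this is not LiveKit code: speak, web_search, and turn are hypothetical stand-ins, and the sleep timings are made up. With ordered=False, the fast tool publishes its result before the slower spoken preamble finishes, which is the out-of-order history I'm seeing. With ordered=True, the search still starts immediately, but its result is held on an asyncio.Event until the preamble has finished playing.

```python
import asyncio
from typing import List, Optional

# Toy model only: speak, web_search and turn are hypothetical, not LiveKit APIs.


async def speak(text: str, log: List[str], done: asyncio.Event) -> None:
    """Pretend to stream TTS audio one word at a time, then signal completion."""
    for _word in text.split():
        await asyncio.sleep(0.01)  # made-up per-word synthesis + playout time
    log.append(f"said: {text}")
    done.set()  # the preamble has finished playing


async def web_search(query: str, log: List[str],
                     hold_until: Optional[asyncio.Event]) -> str:
    """Pretend tool call: fast to run, optionally held back before publishing."""
    await asyncio.sleep(0.005)  # the search itself starts immediately
    result = f"results for {query!r}"
    if hold_until is not None:
        await hold_until.wait()  # don't publish before the preamble is done
    log.append(f"tool: web_search({query!r})")  # i.e. result sent to the frontend
    return result


async def turn(ordered: bool) -> List[str]:
    """One agent turn: spoken preamble and tool call run concurrently."""
    log: List[str] = []
    preamble_done = asyncio.Event()
    await asyncio.gather(
        speak("Let me look that up for you.", log, preamble_done),
        web_search("weather", log, preamble_done if ordered else None),
    )
    return log


if __name__ == "__main__":
    print(asyncio.run(turn(ordered=False)))  # tool entry first: out of order
    print(asyncio.run(turn(ordered=True)))   # preamble first, then the tool
```

Because the tool still starts right away and only its result waits, the tool latency stays overlapped with the speech instead of being added after it. That's roughly the behavior I'd like to be able to switch on.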
Simply instructing the agent not to output pre-tool-call speech and then slapping in a self.session.say() is NOT a solution. First, latency matters and I can't really afford a TTS round trip. Second, it's a nasty workaround given that, ya know, we were just in the llm_node, since that's what called the tool. Third, I don't want the agent saying the same thing every single time. And finally, the agent might have already output pre-tool-call speech, in which case it would be double-speaking.
Similarly, slapping generate_reply() at the top of every function_tool is not a solution, since that's a full LLM and TTS round trip. I can't afford that latency, and again, the agent may have already spoken pre-tool-call speech.
This really irks me because right now the tool call often somehow happens before the speech output. Not only does the agent already have the results when it says “let me look that up for you”, but my transcribed chat history ends up out of order: it shows the tool call first and then the agent saying “let me look that up for you”.
I've tried the workarounds, and this is my 2nd or 3rd post on this subject. I can't be shrugged off again with “just do session.say()”. This is causing major problems for my application.