Hi everyone, had some questions about LiveKit + LLM tool calling.
When the LLM returns both text content and a function call in the same turn, are they both added to current_speech.chat_items immediately, or is there a specific order/timing?
Is checking ChatMessage the most reliable way to detect what was actually spoken, or should I use agent_speech_committed events instead?
If I need to check this from within a function tool (like hangup_call), is current_speech.chat_items the right place to look, or should I wait for agent_speech_committed events?
Use case:
I need to determine if a closure greeting was already spoken in the current turn before deciding whether to generate one, to avoid repeating the same greeting.
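To make the check concrete, here's a minimal self-contained sketch of the dedup I'm after (all names and markers here are hypothetical, just illustrating the shape):

```python
# Markers that identify a closure greeting (hypothetical examples).
GREETING_MARKERS = ("goodbye", "thank you for calling")

def needs_closure_greeting(spoken_texts: list[str]) -> bool:
    """Generate a greeting only if none of the texts spoken this turn
    already contains one of the greeting markers."""
    return not any(
        marker in text.lower()
        for text in spoken_texts
        for marker in GREETING_MARKERS
    )

print(needs_closure_greeting(["Let me transfer you."]))          # True
print(needs_closure_greeting(["Goodbye, thanks for calling!"]))  # False
```

The open question is where `spoken_texts` should come from mid-turn.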
Thanks for any guidance!
I believe there are two LLM calls for a turn that needs a tool: one to decide which tool to use, and a second to generate the output.
llm_call_1(user_query) → returns tool names → tools are executed → return values collected → llm_call_2 (with the tool responses in context) → generates the final response
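That flow can be sketched with a stubbed LLM; everything below is hypothetical and only illustrates the two-call shape, not LiveKit's internals:

```python
def fake_llm(messages):
    """Stub LLM: first call returns a tool call, second returns final text."""
    if any(m["role"] == "tool" for m in messages):
        return {"content": "It is 18°C in Paris.", "tool_calls": []}
    return {"content": None,
            "tool_calls": [{"name": "get_weather", "args": {"city": "Paris"}}]}

def get_weather(city: str) -> str:
    return "18°C"  # canned tool result

def run_turn(user_query: str) -> str:
    messages = [{"role": "user", "content": user_query}]
    reply = fake_llm(messages)                 # llm_call_1: decides on tools
    for call in reply["tool_calls"]:
        result = get_weather(**call["args"])   # execute the tool
        messages.append({"role": "tool", "content": result})
    return fake_llm(messages)["content"]       # llm_call_2: final response

print(run_turn("What's the weather in Paris?"))  # It is 18°C in Paris.
```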
I would suggest using the emitted events to fetch the committed contents. The `conversation_item_added` event would be pretty handy.
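For example, you could accumulate committed text from that event. The real registration would go through the session's `conversation_item_added` event; this sketch uses a tiny stand-in emitter so the shape is clear (the stand-in class and payload shape are hypothetical):

```python
class MiniSession:
    """Stand-in for a session's event interface; real sessions emit
    "conversation_item_added" once an item is committed to chat_ctx."""
    def __init__(self):
        self._handlers = {}

    def on(self, event, handler):
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event, payload):
        for handler in self._handlers.get(event, []):
            handler(payload)

committed_texts = []
session = MiniSession()
session.on("conversation_item_added",
           lambda ev: committed_texts.append(ev["text"]))

# Simulate two items being committed after playout:
session.emit("conversation_item_added", {"text": "Hello, how can I help?"})
session.emit("conversation_item_added", {"text": "Goodbye!"})
print(committed_texts)  # ['Hello, how can I help?', 'Goodbye!']
```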
`conversation_item_added` is triggered when TTS is done speaking and the item is added to `chat_ctx`, but my requirement is to get the message while TTS is speaking it, which should be available in `speech_handle.chat_items`. Previously this was working and I was getting it in `speech_handle.chat_items`, but in 1.3.11 it is not coming: only `FunctionCall` items are present, and the `ChatMessage` is not there.
I am talking about the case where the LLM outputs ‘content’ and ‘tools’ in the same API call.
I thought they should be added in the order they are generated.
I’m not quite following where you are retrieving the chat_items in your workflow, perhaps a code snippet would help?
You are correct. They used to be available in the previous version, but now things are breaking. Here is a code snippet that will help explain:
```python
from livekit.agents.llm import ChatMessage

def some_tool(ctx):
    current_speech = ctx.session.current_speech
    if current_speech:
        for item in current_speech.chat_items:
            if isinstance(item, ChatMessage):
                ...  # some logic
```
We are getting the function tool message but not the chat message. We also tried reading from the agent's chat context:
```python
from livekit.agents.llm import ChatMessage

chat_items = context.session.current_agent.chat_ctx.items
if chat_items:
    for item in chat_items:
        if isinstance(item, ChatMessage):
            logger.info(f"message: {item.text_content}")
```
It would be interesting to know in which version the behaviour changed.
Thanks for looking into this. I’ve tested both approaches, and unfortunately neither works for our use case:
**Scenario 1: Using `current_speech.chat_items`**
When the LLM returns both text content and a function call in the same turn, `current_speech.chat_items` only contains `FunctionCall` items. No `ChatMessage` objects are present. This is possibly a bug, as it was working fine a few versions back.
**Scenario 2: Using `current_agent.chat_ctx.items`**
Here, chat items are added to `current_agent.chat_ctx.items` only after TTS has finished playing back the current speech, but we need that message as the speech is created or while it is being spoken.
From our logs:
- **Before `wait_for_playout()`**: `chat_ctx.items` contains messages up to the previous turn, but not the message that’s currently being spoken
- **After `wait_for_playout()`**: The TTS spoken message `ChatMessage` appears in `chat_ctx.items`
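To make the two failure modes concrete, here's a self-contained sketch (the `ChatMessage`/`FunctionCall` classes below are plain stand-ins mirroring LiveKit's names, and the greeting marker is hypothetical) of why a greeting check fails in both places:

```python
from dataclasses import dataclass

@dataclass
class ChatMessage:   # stand-in for livekit.agents.llm.ChatMessage
    text_content: str

@dataclass
class FunctionCall:  # stand-in for livekit.agents.llm.FunctionCall
    name: str

def greeting_spoken(items) -> bool:
    """True if any ChatMessage in `items` looks like a closure greeting."""
    return any(
        isinstance(i, ChatMessage) and "goodbye" in i.text_content.lower()
        for i in items
    )

# Scenario 1: current_speech.chat_items during the tool call (as observed
# in 1.3.11) only holds the FunctionCall, so the check misses the greeting.
current_speech_items = [FunctionCall(name="hangup_call")]
print(greeting_spoken(current_speech_items))  # False

# Scenario 2: chat_ctx.items before vs. after wait_for_playout().
before_playout = [ChatMessage(text_content="Sure, one moment.")]
after_playout = before_playout + [ChatMessage(text_content="Goodbye!")]
print(greeting_spoken(before_playout))  # False: not committed yet
print(greeting_spoken(after_playout))   # True: but playout already finished
```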
I spent some time trying to reproduce this, modifying the agent starter to invoke multiple function calls in a turn and accessing speech via `current_speech.chat_items`.
I wasn't able to access the previous speech even with 1.3.5, so I assume I'm not matching your scenario.