Hi,
We have use cases where our agents initiate outbound calls to users. These calls start without a welcome message, because in a real-world scenario the person who picks up the phone speaks first, and then the caller speaks second.
So what happens in our case:
1. Agent calls
2. User picks up
3. User says: “hello this is John”
4. Agent speaks: “hi this is assistant blablabla”
The issue we faced is that the LLM and TTS connections were not warmed up, so the gap between steps 3 and 4 took up to 3.5 seconds in the worst cases.
With the usual on_enter approach, where the user calls the agent, a welcome message is spoken first, so the LLM and TTS connections are already set up and ready by the time the user speaks.
So for this use case we have built a function for our outbound calls that prewarms the TTS and LLM when the session starts, without a welcome message. I have not seen anything standard for this out of the box in LiveKit Agents, but it might be beneficial to include something like this in the framework in the near future.
With this approach we have reduced the time from step 3 to step 4 from 3.5 s to 1.2-1.4 s.
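
To illustrate the idea, here is a minimal sketch of prewarming at session start. It is not the actual function described above and it does not use the LiveKit Agents API: `FakeStreamingClient`, `prewarm`, `first_reply_latency`, and the 0.2 s handshake delays are all hypothetical stand-ins, assuming the LLM and TTS clients open their connections lazily on first use.

```python
import asyncio
import time


class FakeStreamingClient:
    """Hypothetical stand-in for an LLM or TTS client with a lazily opened connection."""

    def __init__(self, name: str, connect_delay: float) -> None:
        self.name = name
        self.connect_delay = connect_delay  # simulated handshake cost in seconds
        self.connected = False

    async def connect(self) -> None:
        # Open the connection once; later calls return immediately.
        if not self.connected:
            await asyncio.sleep(self.connect_delay)
            self.connected = True

    async def request(self, payload: str) -> str:
        # A cold client pays the handshake cost on its first request.
        await self.connect()
        return f"{self.name}:{payload}"


async def prewarm(*clients: FakeStreamingClient) -> None:
    # Open all connections concurrently, so the wait is the slowest handshake
    # rather than the sum of all of them.
    await asyncio.gather(*(client.connect() for client in clients))


async def first_reply_latency(warm: bool) -> float:
    llm = FakeStreamingClient("llm", connect_delay=0.2)
    tts = FakeStreamingClient("tts", connect_delay=0.2)
    if warm:
        # Assumed to run at session start, before the user has spoken.
        await prewarm(llm, tts)
    # Time only the turn itself: user utterance in, synthesized reply out.
    start = time.perf_counter()
    text = await llm.request("hello this is John")
    await tts.request(text)
    return time.perf_counter() - start


def measure() -> tuple[float, float]:
    cold = asyncio.run(first_reply_latency(warm=False))
    warm = asyncio.run(first_reply_latency(warm=True))
    return cold, warm


if __name__ == "__main__":
    cold, warm = measure()
    print(f"cold first reply: {cold:.2f}s, prewarmed first reply: {warm:.2f}s")
```

The point of the sketch is only that the connection setup moves out of the first user turn and into the time before the user speaks. In a real agent, the fake clients would be replaced by whatever warm-up the actual LLM and TTS plugins support.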