This question originally came up in our Slack community and the thread has been consolidated here for long-term reference.
I have a function call that takes 10-20 seconds to complete (e.g., creating a ticket). During this time, if the user says “hi” or “hello”, those messages are still transcribed and sent to the LLM.
As a result, once the function completes, the agent restarts the conversation instead of responding with the function result.
Rather than stopping STT entirely (which it isn’t designed for), you can hook the agent’s STT node and drop the transcribed input until your function call is complete.
There is an example that drops input until a wake word is detected; you can adapt it to drop input until your function call is done.
Note: session.input.set_audio_enabled(False) stops audio frames from the room being passed into the session, which pauses your pipeline. If you specifically need to prevent STT from processing audio that is already buffered, you’ll need the node-hooking approach.
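
For illustration, here is a minimal sketch of the "drop input while the function is running" idea, written with only the standard library so it runs on its own. It is not the wake-word example mentioned above and it is not LiveKit API: InputGate, gated and demo are hypothetical names, and transcription events are stood in for by plain strings. In a real agent you would presumably apply the same filter inside your overridden STT node, wrapping whatever events the default node yields.

```python
import asyncio
from typing import AsyncIterable, AsyncIterator, List, Optional


class InputGate:
    """Hypothetical helper: tracks whether a long-running call is in progress."""

    def __init__(self) -> None:
        self._active = 0  # a counter, so overlapping calls are handled too

    @property
    def busy(self) -> bool:
        return self._active > 0

    def __enter__(self) -> "InputGate":
        self._active += 1
        return self

    def __exit__(self, *exc) -> None:
        self._active -= 1


async def gated(events: AsyncIterable[str], gate: InputGate) -> AsyncIterator[str]:
    """Pass events through, discarding any that arrive while the gate is busy."""
    async for event in events:
        if gate.busy:
            continue  # the user spoke during the function call: drop it
        yield event


async def demo() -> List[str]:
    """Simulate a user talking before, during and after a slow function call."""
    gate = InputGate()
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()

    async def source() -> AsyncIterator[str]:
        # Stand-in for the STT event stream; None marks the end of the stream.
        while True:
            item = await queue.get()
            if item is None:
                return
            yield item

    passed: List[str] = []

    async def consume() -> None:
        async for event in gated(source(), gate):
            passed.append(event)

    consumer = asyncio.create_task(consume())

    await queue.put("create a ticket")
    await asyncio.sleep(0.01)      # let the consumer handle the request

    with gate:                     # stand-in for the 10-20 second function call
        await queue.put("hi")
        await queue.put("hello")
        await asyncio.sleep(0.01)  # both arrive while the gate is busy

    await queue.put("thanks")
    await queue.put(None)
    await consumer
    return passed


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['create a ticket', 'thanks']
```

In your function tool you would wrap the slow work in `with gate:` so that "hi" and "hello" never reach the LLM, and the agent can answer with the function result once it returns.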