Does LLM output stream directly to TTS or wait for complete response?

This question originally came up in our Slack community and the thread has been consolidated here for long-term reference.

Does LLM output get streamed directly to the TTS model, or does it wait for the entire response?

I’m using Inference and can’t find documentation on this. Do I have to create separate node functions?

Inference uses the Pipeline model, where LLM output flows into the TTS model as it is generated. You do not have to wait for the entire response.
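To illustrate the idea (not the framework's actual API — all names below are hypothetical), a streaming pipeline typically buffers LLM tokens only until a sentence boundary, then hands that chunk to TTS so speech can start while generation is still in progress. A minimal sketch:

```python
# Hypothetical sketch: stream LLM tokens into TTS at sentence boundaries
# instead of waiting for the complete response. Names are illustrative only.

def sentence_chunks(token_stream):
    """Yield sentence-sized chunks as soon as each boundary appears."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Flush on sentence-ending punctuation so TTS can begin speaking.
        if buffer.rstrip().endswith((".", "!", "?")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():  # flush any trailing partial sentence
        yield buffer.strip()

def fake_llm():
    # Simulated token stream standing in for a real LLM response.
    for token in ["Hello", " there", ".", " How", " can", " I", " help", "?"]:
        yield token

spoken = []
for chunk in sentence_chunks(fake_llm()):
    spoken.append(chunk)  # in a real pipeline: send chunk to the TTS model

print(spoken)  # → ['Hello there.', 'How can I help?']
```

The key point is that each chunk is dispatched as soon as its boundary arrives, so time-to-first-audio depends only on the first sentence, not the whole response.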

See the nodes documentation for more details.