Long-running Voice Sessions incur Significant Costs for WebSocket-based STT Models

For streaming STT models like Deepgram, Assembly Universal Pro, and ElevenLabs Scribe v2 Realtime, the current setup works well for dense, back-and-forth voice interactions. The challenge is with long-running voice sessions that include extended silence, where keeping STT active can become quite expensive, especially with ElevenLabs.

Does the framework support automatically disabling STT after a configurable period of no audio activity, then reconnecting once new VAD events are detected? An optimization flag like this could help reduce costs significantly, with the trade-off of some reconnection latency.
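A rough sketch of what such an idle-timeout policy could look like. Everything here is hypothetical: the class, method names, and the flag itself are illustrative only, and no framework is confirmed to expose this behavior.

```python
import time


class IdleSttManager:
    """Hypothetical policy object: decide when to tear down an idle STT
    connection and when to reconnect on new VAD activity."""

    def __init__(self, idle_timeout_s: float = 30.0):
        self.idle_timeout_s = idle_timeout_s  # configurable period of no audio activity
        self.connected = True                 # assume STT starts connected
        self._last_activity = time.monotonic()

    def on_vad_event(self) -> bool:
        """Call on each VAD speech event. Returns True if STT should
        reconnect now (accepting some reconnection latency)."""
        self._last_activity = time.monotonic()
        if not self.connected:
            self.connected = True
            return True
        return False

    def tick(self) -> bool:
        """Call periodically. Returns True if STT should be torn down
        because the idle timeout has elapsed."""
        idle_for = time.monotonic() - self._last_activity
        if self.connected and idle_for >= self.idle_timeout_s:
            self.connected = False
            return True
        return False
```

The caller would close the provider WebSocket when `tick()` returns True and re-establish it when `on_vad_event()` returns True; the trade-off is that the first utterance after silence pays the reconnect latency.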

I think usage is only incurred when you actually send audio, so you could block or allow the stream based on VAD detection.
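To illustrate the gating idea in this suggestion: a minimal, framework-agnostic sketch that forwards audio frames to the STT stream only while VAD reports speech (plus a short hangover window). The `VadGate` class and its methods are invented for illustration, not a real framework API.

```python
class VadGate:
    """Forward audio frames only during speech, plus a short hangover
    window after speech ends, so word endings are not clipped."""

    def __init__(self, hangover_frames: int = 10):
        self.hangover_frames = hangover_frames
        self._remaining = 0      # hangover frames left to forward
        self.forwarded = []      # frames that would go to the STT websocket

    def push(self, frame: bytes, is_speech: bool) -> bool:
        """Returns True if the frame was forwarded, False if dropped."""
        if is_speech:
            self._remaining = self.hangover_frames  # refresh hangover
            self.forwarded.append(frame)
            return True
        if self._remaining > 0:
            self._remaining -= 1                    # spend hangover budget
            self.forwarded.append(frame)
            return True
        return False  # silence outside the hangover window: drop
```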

No. Once the WebSocket connections to the STT provider are established, which happens during session.start(), there is no easy way to tear them down and resume the session later. Muting the stream or not sending audio won't affect your STT usage.

STT will only run when there is a user in the room and the session has been started, so some customers delay starting the session until the user's front-end signals that they are 'ready'. That only helps before the session starts, though, not after.
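The delayed-start pattern above can be sketched with a plain asyncio event; the session object and the way the 'ready' signal arrives are stand-ins for whatever your framework and front-end actually provide.

```python
import asyncio


async def start_when_ready(session, ready_event: asyncio.Event) -> None:
    """Hold off on session.start() until the front-end signals readiness.

    Because the STT websockets are only opened inside session.start(),
    no STT usage accrues while the user sits on a join/lobby screen.
    """
    await ready_event.wait()
    await session.start()
```

The `ready_event` would typically be set by a data-channel or RPC message from the client; once `session.start()` has run, this pattern no longer helps.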
