Voxtral TTS API 1,230ms TTFB in real-time voice agent pipeline

Mahimai_Raja · March 29, 2026, 5:11pm

Hi, I’m building a real-time voice AI agent using LiveKit and tested Voxtral TTS (voxtral-mini-tts-2603) via the /v1/audio/speech endpoint with SSE streaming.

I based my code on the merged PR "feat(mistral): add voxtral TTS support" by jeanprbt (Pull Request #5245, livekit/agents on GitHub).

I’m consistently seeing ~1,230ms TTFB (time to first audio byte) on warm connections. For comparison, here’s what I’m getting from other TTS providers in the same pipeline:

Provider	TTFB
Cartesia Sonic	~40-90ms
Smallest.ai Lightning v3.1	~250ms
Mistral Voxtral	~1,230ms

My setup:

SSE streaming (stream: true)
Response format: mp3
Short conversational text (~1-2 sentences)
Measured from POST to first speech.audio.delta event
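For anyone who wants to reproduce the measurement, here is a minimal sketch of how the "POST to first speech.audio.delta" timing could be taken from an SSE line stream. The helper name `time_to_first_audio`, the injectable `clock`, and the handling of both an `event:` field and a JSON `type` field are my own assumptions for illustration; I have not confirmed which framing the API actually uses.

```python
import json
import time
from typing import Callable, Iterable, Optional


def time_to_first_audio(
    sse_lines: Iterable[str],
    started_at: float,
    clock: Callable[[], float] = time.perf_counter,
    audio_event: str = "speech.audio.delta",
) -> Optional[float]:
    """Return seconds from started_at to the first audio event, or None.

    Hypothetical helper: accepts the event name either in an SSE
    `event:` field or in a JSON `type` field inside `data:` (assumption).
    """
    current_event = None
    for raw in sse_lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            current_event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = line[len("data:"):].strip()
            event_type = current_event
            if event_type is None:
                # No explicit event field: fall back to the JSON payload.
                try:
                    event_type = json.loads(payload).get("type")
                except (ValueError, AttributeError):
                    event_type = None
            if event_type == audio_event:
                # Stop the clock on the first audio chunk only.
                return clock() - started_at
        elif line == "":
            # A blank line ends the current SSE event.
            current_event = None
    return None
```

In practice `started_at` would be taken with `time.perf_counter()` immediately before sending the POST, and `sse_lines` would be the decoded line iterator of whatever HTTP client is in use, so that connection setup on a warm connection is excluded but request upload and server queueing are included.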

Is anyone else seeing similar latency? The docs mention ~90ms processing time, so I’m wondering whether I’m missing something in my configuration or whether this is expected during the early rollout period.

Cesar_Sanz_Martinez · April 2, 2026, 4:04am

Unfortunately, it’s the same in our case: around 1,000ms TTFB. I had high hopes for this model.

Josselin_Lecocq · April 2, 2026, 2:54pm

According to the Mistral docs (Text to Speech), “End-to-end API time-to-first-audio varies by format (~0.8s for pcm, ~3s for mp3)”, which is what you see. Unfortunately, 90 ms is the model processing time, not the TTFB. Additionally, Voxtral TTS doesn’t currently support input streaming.
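Since the documented time-to-first-audio depends on the output format, one way to check this in your own pipeline is to send the same text with each format and compare medians. The sketch below is only illustrative: the model name and `stream` flag come from the original post, while the `input` and `response_format` field names and both helper names are assumptions that should be checked against the API reference.

```python
import statistics
from typing import Dict, List


def build_speech_request(text: str, response_format: str = "pcm") -> Dict[str, object]:
    """Build a JSON body for the speech endpoint (field names are assumed)."""
    return {
        "model": "voxtral-mini-tts-2603",    # model name from the original post
        "input": text,                       # assumed field name
        "response_format": response_format,  # assumed field name; "pcm" or "mp3"
        "stream": True,                      # SSE streaming, as in the original setup
    }


def median_ttfb_ms(samples_s: Dict[str, List[float]]) -> Dict[str, float]:
    """Median TTFB per format in milliseconds, from samples in seconds."""
    return {
        fmt: round(statistics.median(values) * 1000, 1)
        for fmt, values in samples_s.items()
        if values  # skip formats with no measurements
    }
```

Collecting several TTFB samples per format and passing them to `median_ttfb_ms` gives a like-for-like comparison that is less sensitive to a single slow request than one-off timings.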

Mahimai_Raja · April 6, 2026, 6:28pm

That clarifies a lot, thanks @Josselin_Lecocq
