Pronunciation Dictionary Limitations in Inference (Cartesia Sonic 3)

champion · April 8, 2026, 12:23pm

Hey everyone,

I’m currently building a Voice AI agent using LiveKit and experimenting with the Cartesia provider (Sonic 3 model).

I had a question regarding pronunciation control during inference. I noticed there seems to be a limitation when loading a custom pronunciation dictionary (word → phoneme mapping).

From an engineering perspective, I’m trying to understand:

1. Why is there a restriction on loading pronunciation dictionaries during inference?

2. Is this due to latency constraints, model architecture, or provider-level limitations?

3. If we need fine-grained pronunciation control (especially for domain-specific terms, names, etc.), what is the recommended approach?

For example, in my use case, I need consistent and accurate pronunciation for dynamically generated content, and a static preloaded dictionary doesn’t fully solve the problem.

Would really appreciate insights from the team or anyone who has tackled this in production.

darryncampbell · April 9, 2026, 8:05am

For Cartesia, the pronunciation_dict_id option is available in Agents for both Python and JS, however it doesn’t seem to be documented (I’ll follow up on that).

The pronunciation_dict_id takes a unique identifier corresponding to a dictionary stored in your Cartesia account, so it would be more difficult (but I’m sure not impossible) for us to expose this through LiveKit Inference. That is most likely the real reason it hasn’t been implemented (yet).

You have two options in this short term:

Use SSML tags, SSML Tags - Cartesia Docs (our docs: Cartesia TTS | LiveKit Documentation )
Use the Cartesia TTS plugin with pronunciation_dict_id

Topic		Replies	Views
Support for Cartesia sonic-3-latest model Agents agent-development , python , tts , cartesia	3	68	May 1, 2026
Is there support for ElevenLabs pronunciation dictionaries? Agents tts , elevenlabs	1	17	January 21, 2026
Does cartesia/sonic-3-latest still route to Sonic 3.5, or Sonic 3? Agents livekit-inference , cartesia	2	36	May 27, 2026
Elevenlabs Voice ID outside of Defaults - Livekit Inference Agents livekit-inference , elevenlabs	3	36	April 12, 2026
Lowest latency STT/TTS/LLM stack for German - what's your experience? Agents agent-development , stt , llm , tts	1	77	March 13, 2026

Pronunciation Dictionary Limitations in Inference (Cartesia Sonic 3)

Related topics