ElevenLabs eleven_v3 with LiveKit Agents JS plugin fails with WebSocket 403

Hi, I’m running a LiveKit Agents JS voice worker and trying to use ElevenLabs TTS with eleven_v3.

Environment:

@livekit/agents: 1.3.2
@livekit/agents-plugin-elevenlabs: 1.3.2
@livekit/rtc-node: 0.13.27
Runtime: Node 20+
LiveKit Cloud server: 1.10.1

My TTS config is roughly:

tts: new elevenlabs.TTS({
  apiKey: process.env.ELEVENLABS_API_KEY,
  model: "eleven_v3",
  voiceId: process.env.ELEVENLABS_DUTCH_VOICE_ID,
  language: "nl",
})

When a SIP call starts, the worker joins the room correctly. STT, room connection, and egress all appear fine, but the first agent speech fails with:

WebSocket connection error: Unexpected server response: 403
failed to synthesize speech, retrying...
LiveKit agent session emitted error
source: TTS
errorName: APIConnectionError
errorMessage: could not connect to ElevenLabs

After looking through the plugin source, it seems the default AgentSession TTS path uses the ElevenLabs plugin’s streaming path, which opens the multi-context WebSocket endpoint:

wss://api.elevenlabs.io/v1/text-to-speech/{voiceId}/multi-stream-input?model_id=eleven_v3...

I also reproduced the WebSocket handshake directly with the same API key and voice ID:

eleven_v3 -> 403
eleven_flash_v2_5 -> opens
eleven_turbo_v2_5 -> opens

ElevenLabs docs state that multi-context WebSockets are not available for eleven_v3. They also say v3 is available through the Create Speech / Stream Speech HTTP endpoints by specifying model_id: "eleven_v3".

So this looks like a model/transport mismatch: the LiveKit ElevenLabs plugin accepts eleven_v3 as a model string, but the default AgentSession streaming path uses an ElevenLabs WebSocket endpoint that does not support that model.

Questions:

  1. Is eleven_v3 intended to be unsupported with @livekit/agents-plugin-elevenlabs inside AgentSession?

  2. Should the plugin validate this earlier and reject eleven_v3 for WebSocket streaming?

  3. If I want to keep eleven_v3, is the recommended path:

    • LiveKit Inference with elevenlabs/eleven_v3, assuming supported/default voices,

    • A custom ttsNode / custom TTS adapter that calls ElevenLabs HTTP Stream Speech sentence-by-sentence, or

    • Something else?
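To illustrate question 2, the kind of early check I mean would be roughly this (a hypothetical app-side helper, not plugin code; the model list is an assumption based on the docs statement above):

```typescript
// Models that, per the ElevenLabs docs, are not available over the
// multi-context WebSocket endpoint (illustrative list, not exhaustive).
const WS_UNSUPPORTED_MODELS = new Set(["eleven_v3"]);

// Fail at construction time instead of with a 403 on the first utterance.
function assertWsStreamable(modelId: string): void {
  if (WS_UNSUPPORTED_MODELS.has(modelId)) {
    throw new Error(
      `${modelId} is not available over the multi-stream-input WebSocket; ` +
      `use the HTTP Stream Speech endpoint instead.`,
    );
  }
}
```

Calling `assertWsStreamable("eleven_v3")` before building the TTS would surface the mismatch immediately rather than mid-call.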

Small update after digging further: I found a partial workaround by wrapping the ElevenLabs plugin TTS in LiveKit’s tts.StreamAdapter:

tts: new tts.StreamAdapter(
  new elevenlabs.TTS({
    apiKey: args.env.elevenLabsApiKey,
    model: "eleven_v3",
    voiceId:
      args.language === "nl"
        ? args.env.elevenLabsDutchVoiceId
        : args.env.elevenLabsEnglishVoiceId,
    language: args.language,
  }),
  new tokenize.basic.SentenceTokenizer(),
)

This appears to avoid the 403 because the adapter’s streaming interface calls the underlying synthesize() method per sentence, which uses ElevenLabs’ HTTP Stream Speech endpoint, instead of the plugin’s native .stream() path that opens the multi-stream-input WebSocket.

So it does make eleven_v3 usable inside AgentSession, but it is not a great production fix. Since every sentence becomes a separate ElevenLabs request, the delivery across sentence boundaries is noticeably less natural. Prosody/phonetic continuity is not preserved the way it would be with a single contextual synthesis stream.
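One mitigation I'm considering (untested sketch): the ElevenLabs Create/Stream Speech request body accepts a `previous_text` context hint, so a custom sentence-by-sentence path could thread each sentence's predecessor through. Whether this meaningfully improves boundary prosody for eleven_v3 specifically is an assumption I haven't verified; `voiceId`/`apiKey` are placeholders and error handling is omitted:

```typescript
// Pure helper: per-sentence request body, passing the previous sentence
// as `previous_text` so the model has context across the boundary.
function requestBody(sentences: string[], i: number) {
  return {
    model_id: "eleven_v3",
    text: sentences[i],
    previous_text: i > 0 ? sentences[i - 1] : undefined,
  };
}

// Hypothetical driver over the HTTP Stream Speech endpoint.
async function synthesizeAll(sentences: string[], voiceId: string, apiKey: string) {
  const chunks: ArrayBuffer[] = [];
  for (let i = 0; i < sentences.length; i++) {
    const res = await fetch(
      `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream?output_format=pcm_16000`,
      {
        method: "POST",
        headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
        // JSON.stringify drops the undefined previous_text on the first sentence.
        body: JSON.stringify(requestBody(sentences, i)),
      },
    );
    chunks.push(await res.arrayBuffer());
  }
  return chunks;
}
```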

@CWilson Can you take a look at this issue please?

Maybe that voice id is not valid?

The voice ID is not the issue; I have tested with numerous working IDs, and the error still occurs.

I was able to confirm that you do need to wrap that one in the stream adapter, as you did.

Alright, thank you for the clarification!