TTS recommendations for natural conversational voice agents

This question originally came up in our Slack community and the thread has been consolidated here for long-term reference.

Can someone recommend a good TTS provider? I was considering ElevenLabs, but it's too expensive.

Here’s my current setup:

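from livekit.agents import AgentSession
from livekit.plugins import deepgram, openai
from openai.types.beta.realtime.session import TurnDetection

# stt_config is the poster's own Deepgram settings dict (not shown in the thread)
session = AgentSession(
    stt=deepgram.STT(**stt_config),
    llm=openai.realtime.RealtimeModel(
        voice="verse",
        # "audio" in modalities means the Realtime model speaks with its own
        # built-in voice, so no separate TTS plugin is involved in this setup
        modalities=["text", "audio"],
        turn_detection=TurnDetection(
            type="semantic_vad",
            eagerness="high",
            create_response=True,
            interrupt_response=True,
        ),
    ),
)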

I’m trying to achieve a more human-like conversation with the agent.
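In this setup, `modalities=["text", "audio"]` makes the Realtime model produce speech with its own voice, so trying a different TTS provider also means switching the model to text-only output and passing a `tts` plugin to the `AgentSession`. The following is a minimal sketch of that change, assuming the LiveKit Agents Deepgram, OpenAI, and Cartesia plugins are installed and their API keys are set in the environment. `build_session` and `"YOUR_CLONED_VOICE_ID"` are hypothetical placeholders, and Cartesia is just one example of a TTS plugin that could fill that slot.

```python
from livekit.agents import AgentSession
from livekit.plugins import cartesia, deepgram, openai
from openai.types.beta.realtime.session import TurnDetection


def build_session(stt_config: dict) -> AgentSession:
    """Realtime model for the conversation, separate TTS plugin for the voice."""
    return AgentSession(
        stt=deepgram.STT(**stt_config),
        llm=openai.realtime.RealtimeModel(
            # Text-only output: the model no longer speaks with a built-in voice...
            modalities=["text"],
            turn_detection=TurnDetection(
                type="semantic_vad",
                eagerness="high",
                create_response=True,
                interrupt_response=True,
            ),
        ),
        # ...so the session sends the generated text to this TTS plugin instead.
        # "YOUR_CLONED_VOICE_ID" is a placeholder for a Cartesia voice ID.
        tts=cartesia.TTS(voice="YOUR_CLONED_VOICE_ID"),
    )
```

The `voice="verse"` argument is dropped here because the Realtime model's own voice is no longer used once its output is text-only.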

Community recommendations:

  • Cartesia TTS with a cloned voice: good quality. However, Cartesia STT is not recommended for phone calls. Pricing: https://cartesia.ai/pricing
  • Google TTS: really good quality
  • Deepgram: recommended for STT, especially for phone calls

For phone call use cases, the combination most often suggested in the thread is Cartesia TTS with Deepgram STT.
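A Cartesia TTS plus Deepgram STT combination can be wired up as a classic STT, LLM, and TTS pipeline in LiveKit Agents. This is a sketch under stated assumptions, not a recommended production config: it assumes the Deepgram, OpenAI, Cartesia, and Silero plugins are installed with API keys in the environment. The model names (`nova-3`, `gpt-4o-mini`) are examples that do not come from the thread, and `build_phone_session` and the voice ID are hypothetical placeholders.

```python
from livekit.agents import AgentSession
from livekit.plugins import cartesia, deepgram, openai, silero


def build_phone_session() -> AgentSession:
    """STT -> LLM -> TTS pipeline along the lines suggested in the thread."""
    return AgentSession(
        # Voice activity detection: notices when the caller starts or stops talking
        vad=silero.VAD.load(),
        # Deepgram handles speech-to-text on the caller's audio
        stt=deepgram.STT(model="nova-3"),
        # Any text LLM can go here; the model name is only an example
        llm=openai.LLM(model="gpt-4o-mini"),
        # Cartesia speaks the reply; replace the placeholder with a (cloned) voice ID
        tts=cartesia.TTS(voice="YOUR_CLONED_VOICE_ID"),
    )
```

Because each stage is a separate plugin, providers can be swapped independently. For example, `google.TTS()` from `livekit.plugins.google` could replace the Cartesia line to compare voices without touching STT or the LLM.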