Hey, silly question: I’m setting up my LiveKit voice agent. There are many different TTS plugins and hundreds of voices per provider… how do you guys select the right voice? I mean, do you just go on “vibes”, or did you build yourselves some kind of evals?
I’ve seen third-party voice benchmarks floating around. Each has its strengths and weaknesses, and the landscape is pretty dynamic right now.
Some are good at industry-specific terminology, while others focus on more realistic voices, etc.
I don’t really have a metric to share, but language, regional support, etc. are often parameters folks use when making their decision.
Hopefully it’s not too off-topic, but on my side I feel like most of the TTS voices are so generic and artificial. I’m looking for something that feels a bit more playful, with character. Spotify recently introduced that AI DJ feature, and I love the voice and the overall character. Any idea what they are using? Or maybe something similar?
@CWilson do you recommend just trying different voices on different providers’ sites? How do people usually do it?
What benchmarks have you seen?
Even with speech-to-speech voices? They feel artificial to you?
The tone etc. is somehow off. I don’t know how to explain it, but it’s all very “official”, generic, too perfect. However, you are right, the STS models generally feel a bit more realistic in my experience.
Thank you so much @CWilson, it’s gold. Again though, I think that doesn’t solve the issue of voices being super generic. I’m looking into the available options on the market now; maybe it’s really also a matter of the TTS. If I find something appealing, I’ll share it with you guys.