I’m building a voice AI agent with LiveKit Agents (Python) and using AssemblyAI for real-time STT. I need acoustic sentiment or emotion (from the audio, not from the transcript text).
Current situation
- AssemblyAI’s streaming STT doesn’t expose sentiment/emotion; their batch Audio Intelligence API offers sentiment only (positive/negative/neutral), and we’re currently running that in parallel for acoustic sentiment.
- I’d prefer a provider that exposes sentiment or emotion in the same real-time pipeline (e.g. as part of the STT response or a companion stream), to avoid the extra latency and custom buffering.
Question
- Is anyone aware of an STT or voice provider (e.g. Deepgram, ElevenLabs, Cartesia, or others) that exposes sentiment and/or emotion (e.g. happy, frustrated, neutral) from the user’s voice in real time or with low latency?
- If you’ve integrated sentiment/emotion in a LiveKit agent (with any provider or custom model), I’d be interested in how you did it (e.g. streaming vs. batch, and how the agent consumes the signal).
Thanks.