STT / voice pipeline: providers with built-in sentiment or emotion analysis?

I’m building a voice AI agent with LiveKit Agents (Python) and using AssemblyAI for real-time STT. I need acoustic sentiment or emotion (from the audio, not from the transcript text).

Current situation

  • AssemblyAI’s streaming STT doesn’t expose sentiment/emotion. Their batch Audio Intelligence API offers sentiment analysis only (positive/negative/neutral), and it’s derived from the transcript text rather than the acoustics, so it doesn’t really cover our need. We’re running it in parallel as a stopgap.

  • I’d prefer a provider that exposes sentiment or emotion in the same real-time pipeline (e.g. as part of the STT response or a companion stream) to avoid extra latency and custom buffering.
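For context, the “custom buffering” in the parallel batch path looks roughly like this: accumulate streamed PCM frames and flush fixed-duration chunks to a worker that calls the batch API. This is a minimal sketch, not our exact code; the `on_chunk` callback stands in for whatever hits the provider (e.g. AssemblyAI’s batch endpoint), and the frame sizes/sample rate are assumptions:

```python
from typing import Callable


class SentimentChunker:
    """Buffer streamed PCM frames and flush fixed-size chunks to a
    batch sentiment worker (placeholder for the real API call)."""

    def __init__(self, on_chunk: Callable[[bytes], None],
                 sample_rate: int = 16000, bytes_per_sample: int = 2,
                 chunk_seconds: float = 5.0):
        self._on_chunk = on_chunk
        # Number of bytes that make up one chunk_seconds-long chunk.
        self._target = int(sample_rate * bytes_per_sample * chunk_seconds)
        self._buf = bytearray()

    def push(self, frame: bytes) -> None:
        """Append one streamed frame; flush every full chunk."""
        self._buf.extend(frame)
        while len(self._buf) >= self._target:
            chunk = bytes(self._buf[:self._target])
            self._buf = self._buf[self._target:]
            self._on_chunk(chunk)  # hand off to the batch sentiment worker
```

The latency cost is obvious here: a chunk is only analyzed after `chunk_seconds` of audio has accrued, plus the batch round-trip, which is exactly what a provider with in-stream sentiment would avoid.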

Question

  • Is anyone aware of an STT or voice provider (e.g. Deepgram, ElevenLabs, Cartesia, or others) that exposes sentiment and/or emotion (e.g. happy, frustrated, neutral) from the user’s voice in real time or with low latency?

  • If you’ve integrated sentiment/emotion in a LiveKit agent (with any provider or custom model), I’d be interested in how you did it (e.g. streaming vs batch, and how you use it in the agent).

Thanks.