Audio Intelligence for Voice Agents

Most STT engines can only transcribe speech to text; they cannot hear or recognize anything else. That’s why we developed Tyto: to bring audio intelligence to Voice AI and give your agent context about the acoustic environment.

Tyto sits at the input of the voice pipeline, scoring incoming audio for failure risk and adding context across several dimensions, such as noise, reverb, packet loss, interfering speech, media devices and more. With Tyto, agent builders can extract information from the audio stream for post-call analytics (e.g. automatically flagging, finding and analysing call failures) and for real-time adjustments to calls (e.g. informing the LLM or S2S model about the audio context and degradation so the agent can respond dynamically during a conversation).
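
To make the two use cases concrete, here is a rough Python sketch of how per-window audio scores might be turned into a short note for the LLM and into a post-call flag. This is not Tyto's actual API: the `AudioContext` fields, the thresholds, the hint wording and all function names are assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical shape of the scores for one audio window. Tyto's real output
# format is not described in this post, so every field name is an assumption.
@dataclass
class AudioContext:
    failure_risk: float        # assumed range: 0.0 (clean) to 1.0 (likely to fail)
    noise: float
    reverb: float
    packet_loss: float
    interfering_speech: float

# Assumed wording for each dimension when it is passed to the LLM.
HINTS = {
    "noise": "background noise is high",
    "reverb": "the room sounds reverberant",
    "packet_loss": "the connection is dropping audio",
    "interfering_speech": "other people are talking nearby",
}

def degraded_dimensions(ctx, threshold=0.6):
    """Return the names of dimensions scoring at or above the (assumed) threshold."""
    return [name for name in HINTS if getattr(ctx, name) >= threshold]

def context_note(ctx):
    """Real-time use: build a short note to add to the LLM or S2S prompt."""
    issues = degraded_dimensions(ctx)
    if not issues:
        return ""
    return "Audio context: " + "; ".join(HINTS[name] for name in issues) + "."

def should_flag_call(windows, risk_threshold=0.7, min_fraction=0.2):
    """Post-call use: flag a call if enough of its windows were high risk."""
    if not windows:
        return False
    risky = sum(1 for w in windows if w.failure_risk >= risk_threshold)
    return risky / len(windows) >= min_fraction
```

In a real agent, the note from `context_note` would be added to the model's context before the next turn, and `should_flag_call` would run over the stored scores once the call ends. The thresholds would need tuning against real calls.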

Here’s a demo of an agent that can automatically detect if you are far from the microphone, have an audio issue due to low internet bandwidth, or have multiple interfering speakers.

Real-time demo: https://ai-coustics-tyto-demo--ph.modal.run/
Video: https://youtu.be/RjDujIWAkuA
GitHub: ai-coustics/Project-Tyto-Real-Time-Demo (branch ver/python-livekit)

Feel free to play around with it. We’re curious to learn how you’d use Tyto and what other info/intelligence you’d like to see extracted from the audio signal. Happy coding!

I would encourage anyone to have a play with this. Call audio quality is one of the trickiest things to monitor in production, so this is really interesting.