Telephony Voice AI listening errors

Hey,

We’ve been using the LiveKit Agents framework for quite some time and have built our orchestration layer around it at vozzo.ai (you can check it out at platform.vozzo.ai for reference).

The issue we’re facing is a high STT word error rate on agents connected through Plivo Zentrunk telephony in India. Everything works well on web calls, but over telephony (likely because of codec limitations and poor input audio quality) speech-to-text accuracy drops significantly. This breaks the overall experience and leaves clients unhappy. We also tried Deepgram, Sarvam, and the ElevenLabs voice AI platform and saw similar word errors.
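When comparing vendors like Deepgram, Sarvam, and ElevenLabs on the same calls, it helps to score every transcript the same way. Below is a minimal WER sketch, assuming you have a human-checked reference transcript per call. The function names are hypothetical, and the normalization (lowercase, ASCII punctuation stripped) is an assumption you would adapt for Indic scripts or code-mixed speech.

```python
import string

# Assumed normalization: lowercase + strip ASCII punctuation.
# Adapt this for Indic scripts / code-mixed transcripts.
_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, split on whitespace."""
    return text.lower().translate(_PUNCT).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with a word-level Levenshtein distance."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: ref word missing from hyp
                curr[j - 1] + 1,     # insertion: extra word in hyp
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "I want to book an appointment for Monday"
    hyp = "I want to look an appointment Monday"
    print(f"WER: {word_error_rate(ref, hyp):.2f}")  # WER: 0.25
```

Scoring the same reference set for web calls and telephony calls separately shows how much of the error comes from the channel rather than from the STT vendor.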

If anyone has found effective ways or hacks to improve STT performance in such telephony setups, I’d really appreciate your guidance.

Thanks in advance!

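Does Plivo support G.722? If so, that may perform better (I assume you are using G.711, i.e. PCMU/PCMA, now?). If not, is there an option to switch to a trunk provider that supports G.722? G.711 is narrowband (8 kHz sampling, roughly 300–3400 Hz of speech), while G.722 is wideband (16 kHz sampling, roughly 50–7000 Hz), so the STT model receives much more of the higher-frequency content that helps distinguish consonants.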
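To check how much of the drop comes from the codec itself, one option is to push clean web-call recordings through a simulated G.711 μ-law path and rerun STT on them. Here is a rough pure-Python sketch (the `audioop` module, which used to provide μ-law conversion, was removed in Python 3.13). It assumes 16 kHz mono 16-bit PCM as a list of ints. The function names are hypothetical, and the 2:1 decimation uses simple pair averaging, which is only a crude stand-in for a proper anti-aliasing low-pass filter and resampler.

```python
import math

BIAS = 0x84   # 132, standard G.711 mu-law bias
CLIP = 32635  # clip level so that magnitude + BIAS fits in 15 bits


def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit linear PCM sample as an 8-bit G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent) from the highest set bit.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(byte: int) -> int:
    """Decode one 8-bit G.711 mu-law byte to a 16-bit linear PCM sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def simulate_g711(pcm16k: list[int]) -> list[int]:
    """Crudely approximate a narrowband G.711 call leg:
    16 kHz -> 8 kHz by averaging sample pairs (NOT a proper low-pass filter),
    then a mu-law encode/decode round trip. Returns 8 kHz linear PCM."""
    pcm8k = [(pcm16k[i] + pcm16k[i + 1]) // 2 for i in range(0, len(pcm16k) - 1, 2)]
    return [ulaw_to_linear(linear_to_ulaw(s)) for s in pcm8k]


if __name__ == "__main__":
    # 20 ms of a 1 kHz test tone at 16 kHz
    tone = [int(8000 * math.sin(2 * math.pi * 1000 * n / 16000)) for n in range(320)]
    out = simulate_g711(tone)
    print(len(out))  # 160 samples at 8 kHz
```

Upsampling the result back to 16 kHz before STT will not restore anything above 4 kHz. That lost band is why a wideband codec can help in ways that resampling on the receiving side cannot.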

Have you already tried adding speech enhancement, such as ai-coustics, to your pipeline?