Hey,
We’ve been using the LiveKit agent framework for quite some time and have built our orchestration layer around it at vozzo.ai (you can check it out at platform.vozzo.ai for reference).
The issue we’re facing is high STT word error rates on agents integrated with Plivo Zentrunk telephony in India. Everything works well on web calls, but over telephony, likely due to codec limitations and poor input audio quality, speech-to-text accuracy drops significantly. This breaks the overall experience and leads to unhappy clients. We also tried Deepgram, Sarvam, and the ElevenLabs voice AI platform and saw similar word error rates.
If anyone has found effective techniques or workarounds to improve STT performance in telephony setups like this, I’d really appreciate your guidance.
Thanks in advance!