Caller audio is often too low for reliable STT pickup unless the caller speaks loudly

Hi everybody,

I’m troubleshooting a voice quality/input issue on inbound phone calls routed through Telnyx to a LiveKit voice agent (French language use case).

Symptoms:

  • Caller audio is often too low for reliable STT pickup unless the caller speaks loudly.
  • This is much more noticeable on real smartphone/PSTN calls than on desktop/local mic tests.

Current stack:

  • Telnyx SIP trunk → LiveKit → Deepgram STT
  • STT: deepgram nova-3 (fr)
  • VAD/turn detection: Silero VAD + turn_detection="vad"

What we already tested:

  • Removing noise cancellation on the LiveKit side improved local/desktop behavior.
  • Issue remains on telephony calls.

Are there recommended Telnyx settings to improve inbound speech level/clarity for AI voice agents?
Any known best-practice codec strategy for AI STT reliability on mobile/PSTN callers?

Thanks in advance for your help!

Hi @Christophe_Chapiteau

Just wondering - what kind of noise cancellation have you used? Have you also considered using different technology for the VAD?

I assume you do not have noise cancellation enabled on the trunk. Right?

There is only so much that can be done once we receive the audio. If there are too many PADs in the loop, it can be hard to recover an intelligible signal. You may want to address this with your trunk provider instead of trying to fix it after the loss.

Hi @Pawel_Lach

Thanks for your answer. I am using Silero. Is there a better VAD you would recommend?

Hi @CWilson

Yes, that makes sense.

I ran more tests with Telnyx, and audio quality is noticeably better when I force G.711 only on the SIP connection.

So you were right, the issue seems to be happening upstream, likely during codec negotiation and/or transcoding, rather than something LiveKit can fully recover afterward.

I’m now investigating the trunk configuration first.

Thanks for pointing me in that direction.

@Christophe_Chapiteau you can try out Quail Voice Focus 2.1 for noise cancellation and speaker isolation, as well as the ai-coustics VAD (see "Voice Activity Detection (VAD)" in the ai-coustics Docs).

However, I would first try to increase the input gain somehow to get a stronger signal. Then you could combine that with Voice Focus and the VAD.
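
To make the gain idea concrete, here is a minimal sketch of a fixed digital gain applied to raw audio before it reaches STT. It assumes 16-bit signed PCM in native byte order; the function name `apply_gain_db` is made up for illustration and is not part of LiveKit, Telnyx, or ai-coustics. Keep in mind that digital gain also raises the noise floor, so it is no substitute for fixing the level upstream.

```python
from array import array


def apply_gain_db(pcm: bytes, gain_db: float) -> bytes:
    """Scale 16-bit signed PCM samples by gain_db decibels, clipping to the int16 range."""
    factor = 10 ** (gain_db / 20)  # convert dB to a linear amplitude factor
    samples = array("h")           # "h" = signed 16-bit, native byte order (assumption)
    samples.frombytes(pcm)
    for i, sample in enumerate(samples):
        scaled = int(round(sample * factor))
        # Hard-clip instead of letting the value wrap around
        samples[i] = max(-32768, min(32767, scaled))
    return samples.tobytes()
```

For example, `apply_gain_db(frame_bytes, 6.0)` roughly doubles the amplitude of one frame. Hard clipping distorts loud passages, so a modest gain is safer than a large one.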

@Christophe_Chapiteau, your G.711 finding lines up with the standard recommendation for telephony AI agents. For French/EU PSTN calls, prefer PCMA (G.711 a-law) on the Telnyx trunk, not PCMU. PCMU is the US default and on EU calls usually adds an extra transcode hop. Forcing PCMA-only in the Telnyx SIP Connection codec preferences keeps the path PSTN → Telnyx → LiveKit with no codec conversion, which is the cleanest signal Deepgram will see.
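
If you want to check what actually gets negotiated, one option is to look at the SDP in a SIP trace and see which audio codecs are listed, and in what order. Below is a rough sketch of that check. The helper name `offered_audio_codecs` and the sample SDP are made up for illustration; the static payload numbers (0 = PCMU, 8 = PCMA, 9 = G722, 18 = G729) come from RFC 3551. It only looks at the last `m=audio` line it finds, which is fine for a typical single-stream call.

```python
# Static RTP payload types from RFC 3551 (audio subset)
STATIC_PAYLOADS = {0: "PCMU", 8: "PCMA", 9: "G722", 18: "G729"}


def offered_audio_codecs(sdp: str) -> list[str]:
    """Return codec names in the preference order listed on the m=audio line."""
    rtpmap = {}  # payload type -> encoding name from a=rtpmap lines
    order = []   # payload types in the order they appear on m=audio
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=audio"):
            # Format: m=audio <port> <proto> <payload types...>
            order = [int(pt) for pt in line.split()[3:] if pt.isdigit()]
        elif line.startswith("a=rtpmap:"):
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            rtpmap[int(pt)] = encoding.split("/")[0]
    # Prefer the explicit rtpmap name, fall back to the static table
    return [rtpmap.get(pt) or STATIC_PAYLOADS.get(pt, f"pt{pt}") for pt in order]


sample_sdp = """v=0
m=audio 49170 RTP/AVP 8 0 101
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
"""
print(offered_audio_codecs(sample_sdp))  # ['PCMA', 'PCMU', 'telephone-event']
```

If PCMA is not first (or something like Opus or G729 shows up on the PSTN-facing leg), that is a hint that a transcode may be happening somewhere in the path.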

On the VAD question: Silero is solid for telephony as a VAD. If you want stronger turn detection than VAD-only, switch turn_detection from "vad" to the MultilingualModel from livekit-plugins-turn-detector. It’s a semantic, multilingual turn detector that handles French, and it tends to produce fewer false interruptions on noisy lines than VAD-only.
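
A rough sketch of what that switch could look like, assuming a recent livekit-agents 1.x setup with the deepgram, silero, and turn-detector plugins installed. The wrapper name `build_session` is only for illustration, and the LLM/TTS arguments are left out; check the import path and arguments against the plugin versions you have. Silero stays in place as the VAD, and the semantic model only takes over the end-of-turn decision.

```python
def build_session():
    """Build an AgentSession that uses the semantic turn detector on top of Silero VAD."""
    # Imported lazily so this file can be loaded without the LiveKit plugins installed.
    from livekit.agents import AgentSession
    from livekit.plugins import deepgram, silero
    from livekit.plugins.turn_detector.multilingual import MultilingualModel

    return AgentSession(
        stt=deepgram.STT(model="nova-3", language="fr"),  # same STT as in the original post
        vad=silero.VAD.load(),                            # VAD is still used for speech detection
        turn_detection=MultilingualModel(),               # replaces turn_detection="vad"
    )
```

This is meant to be called from inside the agent's entrypoint, since the turn detector expects to run in a job context.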