Caller audio is often too low for reliable STT pickup unless the caller speaks loudly

Hi everybody,

I’m troubleshooting a voice quality/input issue on inbound phone calls routed through Telnyx to a LiveKit voice agent (French language use case).

Symptoms:

  • Caller audio is often too low for reliable STT pickup unless the caller speaks loudly.
  • This is much more noticeable on real smartphone/PSTN calls than on desktop/local mic tests.

Current stack:

  • Telnyx SIP trunk → LiveKit → Deepgram STT
  • STT: deepgram nova-3 (fr)
  • VAD/turn detection: Silero VAD + turn_detection=“vad”

What we already tested:

  • Removing noise cancellation on the LiveKit side improved local/desktop behavior.
  • Issue remains on telephony calls.

Are there recommended Telnyx settings to improve inbound speech level/clarity for AI voice agents?
Any known best-practice codec strategy for AI STT reliability on mobile/PSTN callers?

Thanks in advance for your help!

Hi @Christophe_Chapiteau

Just wondering - what kind of noise cancellation have you used? Have you also considered using different technology for the VAD?

I assume you do not have noise cancellation enabled on the trunk. Right?

There is only so much that can be done once we receive the audio. If there are too many PADs in the loop, it can be hard to recover an intelligible signal. You may want to address this with your trunk provider instead of trying to fix it after the loss.

Hi @Pawel_Lach

Thanks for your answer. I am using Silero. Is there a better VAD you would recommend?

Hi @CWilson

Yes, that makes sense.

I ran more tests with Telnyx, and audio quality is noticeably better when I force G.711 only on the SIP connection.

So you were right, the issue seems to be happening upstream, likely during codec negotiation and/or transcoding, rather than something LiveKit can fully recover afterward.

I’m now investigating the trunk configuration first.

Thanks for pointing me in that direction.

@Christophe_Chapiteau you can try out Quail Voice Focus 2.1 for noise cancellation and speaker isolation, as well as its VAD (see "Voice Activity Detection (VAD)" in the ai-coustics docs).

However, I would first try to increase the input gain to get a stronger signal. Then you could combine it with Voice Focus and the VAD.
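
To make the "increase the input gain" idea concrete, here is a minimal sketch in plain Python. It assumes the caller audio is available as 16-bit signed mono PCM in native byte order; the helper names `rms_dbfs` and `apply_gain` are hypothetical and are not part of LiveKit, Telnyx, or ai-coustics. It only shows the arithmetic: measure the level, apply a fixed gain in dB, and clamp so loud peaks do not wrap around.

```python
import array
import math


def rms_dbfs(pcm: bytes) -> float:
    """RMS level of 16-bit signed mono PCM, in dB relative to full scale."""
    samples = array.array("h", pcm)
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square / (32768.0 ** 2))


def apply_gain(pcm: bytes, gain_db: float) -> bytes:
    """Scale 16-bit PCM by gain_db, clamping to the int16 range."""
    factor = 10 ** (gain_db / 20)
    samples = array.array("h", pcm)
    boosted = array.array(
        "h",
        (max(-32768, min(32767, round(s * factor))) for s in samples),
    )
    return boosted.tobytes()


# A quiet test signal: a square wave at amplitude 1000 (about -30 dBFS).
quiet = array.array("h", [1000, -1000] * 80).tobytes()
louder = apply_gain(quiet, 12.0)
print(round(rms_dbfs(quiet), 1), round(rms_dbfs(louder), 1))
```

A fixed gain also raises line noise by the same amount, so it is worth logging the measured level on a few real PSTN calls before settling on a value.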

@Christophe_Chapiteau, your G.711 finding lines up with the standard recommendation for telephony AI agents. For French/EU PSTN, prefer PCMA (G.711 a-law) on the Telnyx trunk, not PCMU. PCMU is the US default and on EU calls usually adds an extra transcode hop. Forcing PCMA-only in the Telnyx SIP Connection codec preferences keeps the path PSTN > Telnyx > LiveKit with no codec conversion, which is the cleanest signal Deepgram will see.
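
One way to confirm which codec is actually offered is to look at the SDP in a captured INVITE or 200 OK. The sketch below is an assumption about how you might check this yourself, not a Telnyx or LiveKit tool: `offered_audio_codecs` and `is_pcma_only` are hypothetical helpers, and the sample SDP is made up. The only fixed facts it relies on are the static RTP payload types from RFC 3551 (0 = PCMU, 8 = PCMA).

```python
# Static RTP payload types for G.711 (RFC 3551).
STATIC_AUDIO_PAYLOADS = {0: "PCMU", 8: "PCMA"}


def offered_audio_codecs(sdp: str) -> list[str]:
    """Return codec names listed on the m=audio line, in preference order."""
    lines = [line.strip() for line in sdp.splitlines()]

    # Collect explicit payload-type mappings, e.g. "a=rtpmap:8 PCMA/8000".
    rtpmap = {}
    for line in lines:
        if line.startswith("a=rtpmap:"):
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            rtpmap[int(pt)] = encoding.split("/")[0]

    names = []
    for line in lines:
        if line.startswith("m=audio "):
            # Format: m=audio <port> <proto> <payload types...>
            for pt in line.split()[3:]:
                number = int(pt)
                names.append(
                    rtpmap.get(number)
                    or STATIC_AUDIO_PAYLOADS.get(number, f"pt{number}")
                )
    return names


def is_pcma_only(sdp: str) -> bool:
    """True if PCMA is the only voice codec (telephone-event is ignored)."""
    voice = [c for c in offered_audio_codecs(sdp) if c != "telephone-event"]
    return voice == ["PCMA"]


sample_sdp = (
    "v=0\r\n"
    "m=audio 40000 RTP/AVP 8 101\r\n"
    "a=rtpmap:8 PCMA/8000\r\n"
    "a=rtpmap:101 telephone-event/8000\r\n"
)
print(offered_audio_codecs(sample_sdp), is_pcma_only(sample_sdp))
```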

On the VAD question: Silero is solid for telephony as a VAD. If you want stronger turn detection than VAD-only, switch turn_detection from “vad” to the MultilingualModel from livekit-plugins-turn-detector. It’s a semantic, multilingual turn detector that handles French. Fewer false interrupts on noisy lines than VAD-only.
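
For reference, a minimal sketch of that switch, assuming a recent LiveKit Agents Python release with the deepgram, silero, and turn-detector plugins installed. The LLM and TTS parts of the session are left out, and the `entrypoint` body is illustrative only, so adapt it to your existing agent code.

```python
from livekit import agents
from livekit.agents import AgentSession
from livekit.plugins import deepgram, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        # Same STT as in the original stack: Deepgram nova-3, French.
        stt=deepgram.STT(model="nova-3", language="fr"),
        # Silero stays in place as the VAD.
        vad=silero.VAD.load(),
        # Previously turn_detection="vad"; now the semantic turn detector.
        turn_detection=MultilingualModel(),
        # llm=..., tts=... omitted here; keep whatever you already use.
    )
    # Start the session with your agent as before (omitted in this sketch).
```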

Thanks a lot for your detailed answer, @Muhammad_Usman_Bashir

Here is my current configuration in the Telnyx dashboard.

Accepted codecs: G.711U and G.711A only

SIP transport protocol: TCP

DTMF: RFC 2833

Since my use case is for French/EU PSTN, would it be better to keep only G.711A (PCMA) and remove G.711U (PCMU) entirely?

For SIP transport, would UDP or TLS be a better choice than TCP for this setup?

For DTMF, should I keep RFC 2833, or would Inband or SIP INFO be better in practice?

Thanks again for your help.

Appreciate it, @Christophe_Chapiteau. Three quick answers:

Codecs: yes, for French/EU PSTN-only inbound, drop G.711U and keep only G.711A (PCMA). PCMU should never be the negotiated codec on EU calls anyway; removing it forces the cleanest path and rules out the rare case where it would be selected and transcoded.

SIP transport: prefer TLS for production (encrypted signaling). TCP offers little practical advantage over UDP in this setup, and SIP transport carries signaling only, separate from RTP, so it doesn’t affect audio quality.

DTMF: keep RFC 2833. It’s the de facto standard and what LiveKit SIP handles cleanly. Inband is fragile (tones live in the RTP stream; any processing can mangle them); SIP INFO works but isn’t as universal.
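
To illustrate why RFC 2833 is more robust than inband tones: the digit travels as a small structured RTP payload rather than as audio. Below is a minimal sketch that decodes the 4-byte telephone-event payload (event, end bit, volume, duration) as laid out in RFC 2833/4733. `parse_telephone_event` is a hypothetical helper for inspecting captured packets, not something LiveKit or Telnyx exposes, and the sample bytes are made up.

```python
import struct

# Event codes 0-15 map to these DTMF symbols (RFC 2833 / RFC 4733).
EVENT_DIGITS = "0123456789*#ABCD"


def parse_telephone_event(payload: bytes) -> dict:
    """Decode the 4-byte telephone-event payload carried in RTP."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    # Layout: event (8 bits) | E bit, R bit, volume (6 bits) | duration (16 bits)
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return {
        "digit": EVENT_DIGITS[event] if event < len(EVENT_DIGITS) else None,
        "end": bool(flags & 0x80),   # E bit: this packet ends the event
        "volume": flags & 0x3F,      # tone power in -dBm0 (0 to 63)
        "duration": duration,        # in RTP timestamp units
    }


# Made-up example: digit "5", end of event, volume 10, duration 800 units
# (100 ms at the 8000 Hz clock used with G.711).
sample = bytes([5, 0x8A, 0x03, 0x20])
print(parse_telephone_event(sample))
```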

Thank you very much for your answer, @Muhammad_Usman_Bashir. I’ll switch from TCP to TLS.