Hi @everyone,
We’re running livekit-agents==1.4.6 (Python) with a SIP inbound trunk through Asterisk 20.5.0 (IncrediblePBX). Our agent works well for MTN and AirtelTigo callers in Ghana, but we’ve had persistent audio issues specifically with Telecel Ghana callers (number prefixes 020/050) and would love input from anyone who has navigated carrier-specific audio differences in telephony setups.
Architecture / audio path
Telecel handset
→ VoipNow SIP (PCMA/8000, plain RTP, ptime=20ms)
→ Asterisk 20.5.0 (SIP trunk, transcodes to PCMU or Opus)
→ LiveKit SIP trunk (WebRTC/ICE, offers PCMU + Opus/48000)
→ Agent (Python, livekit-agents 1.4.6)
The key distinction vs. other Ghanaian carriers: Telecel’s network uses AMR-NB internally, which VoipNow transcodes to PCMA (G.711 A-law) before the call reaches Asterisk. Because AMR-NB is a low-bitrate speech codec, this AMR→G.711 tandem leaves the audio with spectral characteristics that differ from a call that was G.711 end to end.
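For reference, here is the packet arithmetic for the path above as a small sketch. The PCMA figures come straight from the SDP (8000 Hz, ptime=20ms); the 20 ms Opus frame size is an assumption on our side, and the helper name is just for illustration.

```python
def samples_per_packet(clock_rate_hz: int, ptime_ms: int) -> int:
    """Number of audio samples carried by one RTP packet."""
    return clock_rate_hz * ptime_ms // 1000


PTIME_MS = 20

# PCMA/8000 leg (VoipNow -> Asterisk): G.711 uses one byte per sample.
pcma_samples = samples_per_packet(8000, PTIME_MS)
pcma_payload_bytes = pcma_samples * 1

# Opus/48000 leg (Asterisk -> LiveKit), assuming 20 ms frames as well.
opus_samples = samples_per_packet(48000, PTIME_MS)

packets_per_second = 1000 // PTIME_MS

# Our VAD min_silence_duration of 0.7 s expressed in 20 ms packets.
silence_packets = int(0.7 * 1000) // PTIME_MS

print(pcma_samples, pcma_payload_bytes, opus_samples, packets_per_second, silence_packets)
```

So a single lost or late packet on the PCMA leg is 20 ms of audio, and the VAD needs roughly 35 consecutive quiet packets before it declares end of speech.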
Current config for Telecel sessions:
from livekit.agents import AgentSession
from livekit.plugins import silero

# VAD tuned empirically for Telecel calls
session_vad = silero.VAD.load(activation_threshold=0.35, min_silence_duration=0.7)

session = AgentSession(
    vad=session_vad,
    turn_detection=None,  # English EOU model rejects AMR-transcoded audio
    preemptive_generation=False,
    min_interruption_duration=0.5,
    false_interruption_timeout=4.0,
    min_endpointing_delay=0.7,
    max_endpointing_delay=2.0,
)
And in the agent class, for Telecel we omit turn_detection from super().__init__() entirely (pass NOT_GIVEN, not None) so the agent inherits the session’s VAD-only setting rather than overriding it.
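For context, the per-carrier switch is driven by the caller's number prefix, roughly like the sketch below. The helper names are hypothetical, and it assumes the caller ID arrives either in national format (020...) or with the 233 country code. Prefix matching is only a heuristic: if a number has been ported, the prefix will not reflect the current carrier.

```python
TELECEL_PREFIXES = ("020", "050")  # prefixes we treat as Telecel Ghana


def to_national(number: str) -> str:
    """Normalize '+233 20 ...' or '23320...' to national form '020...'."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if digits.startswith("233"):
        digits = "0" + digits[3:]
    return digits


def is_telecel(number: str) -> bool:
    """True if the caller's number starts with a Telecel prefix."""
    return to_national(number).startswith(TELECEL_PREFIXES)


print(is_telecel("+233 20 123 4567"))
print(is_telecel("0241234567"))
```

When `is_telecel` returns true we build the session with the VAD-only settings shown above; otherwise we use our default configuration.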
Questions / things we’re unsure about
1. Recommended VAD parameters for G.711 telephony?
Is there a known-good activation_threshold / min_silence_duration combination for standard SIP telephony where comfort noise is present? We tuned empirically but would love a reference baseline.
2. Correct RoomOptions API in 1.4.6?
We’re seeing this deprecation warning:
RoomInputOptions and RoomOutputOptions are deprecated, use RoomOptions instead
We’re currently using room_io.RoomOptions(noise_cancellation=noise_cancellation.BVCTelephony()) for non-Telecel and room_io.RoomOptions() for Telecel — is this the right structure? Is noise_cancellation a direct field on RoomOptions, or should it be nested (e.g. input=RoomInputOptions(noise_cancellation=...))?
3. Silero “inference is slower than realtime” warnings
{"message": "inference is slower than realtime", "delay": 0.45, ...}
{"message": "inference is slower than realtime", "delay": 0.48, ...}
These appear during the first ~20s of the call while a background RAG cache warm-up task runs (20 async HTTP calls). They resolve once warm-up completes. Is this expected resource contention, or should we move the warm-up to a separate worker/process? Would force_cpu=True on Silero help or hurt here?
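One mitigation we are considering before moving the warm-up to a separate process is capping its concurrency, so the 20 requests do not all land on the event loop at once. The sketch below uses only the standard library; `warm_cache` and `fake_fetch` are hypothetical stand-ins for our real warm-up code, and we have not confirmed that this removes the Silero warnings.

```python
import asyncio


async def warm_cache(keys, fetch, max_concurrency=4):
    """Run fetch(key) for every key, at most max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(key):
        async with sem:  # wait for a free slot before starting the request
            return await fetch(key)

    # gather preserves the order of the input keys
    return await asyncio.gather(*(one(k) for k in keys))


async def fake_fetch(key):
    """Stand-in for one warm-up HTTP call."""
    await asyncio.sleep(0.01)
    return f"warmed:{key}"


results = asyncio.run(warm_cache(range(20), fake_fetch))
print(len(results))
```

If the contention is CPU-bound work inside the warm-up (for example response parsing) rather than the number of in-flight requests, a cap like this would probably not be enough and a separate process would be the safer option.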
4. “Input is shorter by X samples; silence prepended” warning
Input is shorter by 27638 samples; silence has been prepended to align the input channel.
We see this once per Telecel call. Is this a symptom of Telecel’s RTP jitter causing packet gaps that the framework pads with silence? Should we be configuring a jitter buffer anywhere in the Asterisk/LiveKit pipeline, or is this handled automatically?
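To put the number in perspective, here is the gap converted to time. We do not know which sample rate the warning refers to, so the candidate rates below are assumptions.

```python
MISSING_SAMPLES = 27638  # from the warning above
CANDIDATE_RATES_HZ = (8000, 16000, 24000, 48000)  # assumed, not confirmed


def samples_to_seconds(samples: int, rate_hz: int) -> float:
    """Duration represented by a sample count at a given sample rate."""
    return samples / rate_hz


for rate in CANDIDATE_RATES_HZ:
    print(f"{rate} Hz: {samples_to_seconds(MISSING_SAMPLES, rate):.3f} s")
```

Depending on the rate, the padded silence is somewhere between roughly 0.58 s (48 kHz) and 3.45 s (8 kHz), which is many 20 ms packets in either case.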
5. turn_detection=None at both session and agent level — is omitting from Agent correct?
We found that passing turn_detection=None explicitly to super().__init__() in our Agent subclass overrides the session’s VAD-only setting, whereas NOT_GIVEN (or omitting the argument) inherits it. Is there documentation on the precedence rules between Agent and AgentSession for vad, turn_detection, and stt? We couldn’t find this spelled out clearly.
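To make the question concrete, this is the mental model we are working from, written as a standalone sketch. It is our assumption about the behaviour, not the library's actual code: `NOT_GIVEN` here is a local stand-in sentinel and `resolve` is a hypothetical helper.

```python
class _NotGiven:
    """Sentinel type meaning 'the caller did not pass this argument'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()  # local stand-in, not imported from livekit


def resolve(agent_value, session_value):
    """Agent-level value wins unless it was left as NOT_GIVEN."""
    if agent_value is NOT_GIVEN:
        return session_value  # inherit from the session
    return agent_value  # explicit value, including None, overrides


print(resolve(NOT_GIVEN, "session-setting"))
print(resolve(None, "session-setting"))
```

If this model is right, an explicit None at the agent level is treated as a real value rather than as "unset", which would match what we observed. Confirmation either way would be appreciated.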
Environment
- livekit-agents==1.4.6, livekit-rtc==1.1.2
- livekit-plugins-deepgram, livekit-plugins-silero, livekit-plugins-cartesia
- Python 3.11, Docker (Linux/amd64)
- Asterisk 20.5.0 (IncrediblePBX), SIP trunk to LiveKit cloud (Germany region)
- STT: Deepgram nova-3 (English), TTS: Cartesia, LLM: Gemini 2.0 Flash
Thanks in advance — happy to share more logs or config details.