SIP inbound — agent receives only zero-filled audio frames despite SIP ingress upstream > 0

@CWilson
Thank you. Will dig into the outbound path on our side (VPC Flow Logs, possibly engaging AWS support if needed). If we identify the drop point and need any additional info from your side, will follow up. Appreciate the investigation.

@dongwan.hong, I looked at this closely, and the strongest single hypothesis given your Reply #16 evidence is that LK’s SIP ingress associates inbound RTP with an active SIP call using the signaling source IP, and your rtpengine-ON path breaks that association.

Look at the data. With rtpengine OFF, the RTP source is the carrier IPs (203.240.134.x), which match the SIP signaling source the carrier uses, and calls work. With rtpengine ON, the RTP source is your EIP 13.125.119.208 and calls fail, even though that EIP is in allowed_addresses. CWilson confirmed in reply #19 of this thread that allowed_addresses applies only to signaling, so the discriminator can’t be the trunk ACL. The remaining candidate is the SFU’s own RTP-to-call binding, which would naturally use the signaling source IP as the key.

If Kamailio still proxies signaling unchanged (LK sees SIP from carrier IPs) but rtpengine rewrites RTP to your EIP, the two flows don’t tie back to the same call on LK’s side. That would match the “only one RTP stream” CWilson sees in his dashboard screenshots.

This also fits the May 29 to May 31 timing: your prior x86/Ubuntu setup likely ran Kamailio + rtpengine on host networking with both emitting from the EIP. The new ARM/AL2023 Docker setup may have changed which IP rtpengine actually emits RTP from (Docker bridge NAT vs host networking).

Two concrete suggestions from my side:

  • Run tcpdump on the EC2 host (not inside the container) watching the primary ENI for outbound packets to 161.115.163.x. That confirms the actual source IP the packets carry after any Docker NAT translation. Note that on EC2 the capture shows the ENI’s private address, because the EIP mapping is applied at the internet gateway, so the check is whether the source is the private IP your EIP is associated with. If it isn’t, you have an in-container source problem. Re-check rtpengine’s --interface flag against your Docker network mode; the flag controls local-bind and advertised IP separately and has a NAT-aware form for splitting them.

  • Configure Kamailio to also rewrite SIP signaling source to your EIP, making signaling and RTP come from the same IP. That’s the cleanest fix if LK’s RTP-to-call binding does key on signaling source.
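To make the first suggestion easier to act on, here is a minimal sketch that tallies which source addresses appear in `tcpdump -n` text output for packets headed to the 161.115.163.x range. The regex assumes the usual `IP src.port > dst.port:` line shape, the function name is hypothetical, and the sample lines are illustrative rather than real capture output.

```python
import re
from collections import Counter

# Matches the address part of `tcpdump -n` UDP lines, e.g.
# "IP 10.0.0.5.30364 > 161.115.163.144.51090: UDP, length 172"
LINE_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)

def source_ips_towards(lines, dst_prefix):
    """Count packets per source IP for destinations starting with dst_prefix."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group(3).startswith(dst_prefix):
            counts[m.group(1)] += 1
    return counts

# Illustrative sample lines (assumed, not taken from a real capture).
sample = [
    "12:00:00.000001 IP 10.220.100.245.30364 > 161.115.163.144.51090: UDP, length 172",
    "12:00:00.020001 IP 10.220.100.245.30364 > 161.115.163.144.51090: UDP, length 172",
    "12:00:00.030001 IP 10.220.100.245.5060 > 203.240.134.10.5060: UDP, length 512",
]
counts = source_ips_towards(sample, "161.115.163.")
print(counts)
```

If more than one source address shows up, or the address is a Docker bridge address, that points at the container networking rather than at LK.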

For background, sipwise/rtpengine#1621 documents a related class of asymmetric source-validation issue (STRICT_SOURCE + ASYMMETRIC flag interaction). Different specifics, same conceptual shape.

@CWilson @Muhammad_Usman_Bashir
Closing the loop — silence is fully resolved. Root cause was on our side (AWS), not on LiveKit.

Cause:
Our VPC’s network ACL (NACL) and SG egress rules for UDP only allowed port range 30000-40000. LiveKit Cloud (Tokyo) advertises m=audio ports in the 50000-65535 range per call (we observed 50636, 51090, 52116, 53947, 54343, 55395, 57833, 59611 across different calls), so every outbound RTP/RTCP packet to LiveKit was REJECTED at our VPC boundary.

How we found it:
Enabled VPC Flow Logs on our SIP-proxy VPC. Insights query immediately showed:
10.220.100.245.30364 → 161.115.163.144.51090 UDP REJECT
10.220.100.245.30365 → 161.115.163.144.51091 UDP REJECT
SIP signaling (UDP 5060) and HTTPS were ACCEPT — only RTP outbound to LiveKit’s high ports was rejected.
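For anyone repeating this, a rough sketch of how rows like the two above could be grouped by rejected destination port. It parses the summarized `src.port → dst.port PROTO ACTION` form shown in this post, not the raw VPC Flow Logs record format, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Parses the summarized row shape used in this post:
# "<src_ip>.<src_port> → <dst_ip>.<dst_port> <PROTO> <ACCEPT|REJECT>"
ROW_RE = re.compile(
    r"(\d+\.\d+\.\d+\.\d+)\.(\d+)\s*(?:→|->)\s*"
    r"(\d+\.\d+\.\d+\.\d+)\.(\d+)\s+(\w+)\s+(ACCEPT|REJECT)"
)

def rejected_dst_ports(rows):
    """Map destination IP -> sorted list of rejected destination ports."""
    ports = defaultdict(set)
    for row in rows:
        m = ROW_RE.search(row)
        if m and m.group(6) == "REJECT":
            ports[m.group(3)].add(int(m.group(4)))
    return {ip: sorted(p) for ip, p in ports.items()}

rows = [
    "10.220.100.245.30364 → 161.115.163.144.51090 UDP REJECT",
    "10.220.100.245.30365 → 161.115.163.144.51091 UDP REJECT",
]
rejected = rejected_dst_ports(rows)
print(rejected)  # {'161.115.163.144': [51090, 51091]}
```

Seeing the rejected destination ports side by side is what makes the mismatch with an allowed egress range stand out.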

Fix:
NACL ingress rule 350 + egress rule 440: UDP 30000-40000 → 30000-65535
SG egress UDP rule: 30000-40000 → 30000-65535
After both changes, RTP and RTCP both ACCEPT, calls work end-to-end (TTS heard + STT receiving).
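As a quick sanity check of the before/after ranges, here is a small sketch that tests the m=audio ports we observed against the old and new UDP egress ranges. The variable and function names are just for illustration.

```python
# Allowed UDP egress ranges before and after the fix (inclusive bounds).
OLD_RANGE = range(30000, 40001)   # 30000-40000
NEW_RANGE = range(30000, 65536)   # 30000-65535

# m=audio ports observed from LiveKit Cloud across different calls.
OBSERVED = [50636, 51090, 52116, 53947, 54343, 55395, 57833, 59611]

def blocked_ports(ports, allowed):
    """Return the ports that fall outside the allowed range."""
    return [p for p in ports if p not in allowed]

blocked_before = blocked_ports(OBSERVED, OLD_RANGE)
blocked_after = blocked_ports(OBSERVED, NEW_RANGE)
print(len(blocked_before), len(blocked_after))  # 8 0
```

Every observed port was outside the old range, which is consistent with all outbound RTP being rejected before the change.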

Thanks for the patient investigation and for sharing the edge-side PCAPs — those confirmations were what kept narrowing the diagnosis until we found Flow Logs as the missing observation.

Sorry for the noise on the WebRTC cross-tenant correlation — that was a separate issue with similar surface symptoms that I conflated.

Thanks for the follow-up, @dongwan.hong. Great that we have this resolved :lk-party: