SIP inbound — agent receives only zero-filled audio frames despite SIP ingress upstream > 0

Hi LiveKit team,

Every inbound SIP call shows the same pattern: the agent’s TTS greeting reaches the caller fine, but the agent’s subscribed track for the SIP participant delivers only zero-filled audio frames (every PCM sample is literally 0x00). Caller hangs up after ~30s due to one-way audio.

The decisive evidence is a disagreement between SDK callbacks and the SFU’s own dashboard state - described below.

Setup

  • Region: Japan (Tokyo SIP ingress)
  • Inbound trunk: media encryption disabled, Krisp disabled, allowed-addresses ACL matches sender, codec PCMU/G.711 µ-law negotiated cleanly (PCMU/8000, RTP/AVP plaintext, ptime 20)
  • Carrier: Korean PSTN provider sending plain RTP (no SRTP, no carrier-side transcoding)
  • Our SIP edge: Kamailio + rtpengine relay (verified at every hop with pcap)
  • Agent: livekit-rtc 1.1.8 (Python), AutoSubscribe = AUDIO_ONLY, no noise-cancellation plugin

End-to-end trace for a single ~30s call

Every hop verified with pcap, amplitude decode, LiveKit dashboard, and agent logs:

  • Carrier → our SIP edge (RTP ingress): PCMU, ~1500 packets, max amplitude 32,124, voice ratio ~10% (real speech) ✅
  • Our SIP edge relay (bidirectional): same packet count out, amplitude preserved ✅
  • Our SIP edge → LiveKit Cloud (RTP egress): packets sent to LiveKit’s advertised endpoint, matching its 200 OK SDP ✅
  • LiveKit dashboard, SIP participant Total upstream: ~8.78 KB / 31 s ≈ 2.3 kbps, consistent with Opus-DTX-encoded speech at the observed voice ratio. So media reaches the SFU. ✅
  • Agent SDK level: track_subscribed callback fires; frame format correct (10 ms mono, 16 kHz, 160 samples/channel). ✅
  • Agent subscribed track audio content: abs(int16_samples).max() = 0 for every 3-second window across the entire 27+ s session. Every PCM sample is literally 0x00. ❌
  • Agent → SIP (TTS downlink): caller hears the greeting (mostly; see note below). ✅
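To make the rate argument in the trace concrete, here is a small back-of-envelope calculation (function names are illustrative). It assumes the dashboard's "KB" means 1000 bytes and counts only a bare 12-byte RTP header per packet, with no UDP/IP overhead; both are assumptions, not LiveKit-documented definitions. The point is the gap: ~2.3 kbps on the dashboard versus about 69 kbps for plain PCMU at 20 ms ptime, which is what the "Opus DTX, not raw PCMU" reading rests on.

```python
# Back-of-envelope rates for the trace above.
# Assumptions: dashboard "KB" = 1000 bytes; RTP overhead = a bare 12-byte
# header per packet (no CSRCs, extensions, UDP or IP headers).

def kbps(total_bytes: float, seconds: float) -> float:
    """Average bitrate in kbit/s."""
    return total_bytes * 8 / seconds / 1000


def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Samples per channel in one frame of the given duration."""
    return sample_rate_hz * frame_ms // 1000


def pcmu_rtp_kbps(ptime_ms: int = 20, rtp_header_bytes: int = 12) -> float:
    """G.711 at 8 kHz (1 byte per sample) plus the RTP header."""
    packets_per_s = 1000 / ptime_ms
    payload_bytes = samples_per_frame(8000, ptime_ms)
    return packets_per_s * (payload_bytes + rtp_header_bytes) * 8 / 1000


print(f"dashboard upstream: {kbps(8.78 * 1000, 31):.2f} kbps")
print(f"plain PCMU over RTP: {pcmu_rtp_kbps():.1f} kbps")
print("agent frame (16 kHz, 10 ms):", samples_per_frame(16000, 10), "samples/channel")
print("frames per 3 s window:", 3000 // 10)
```

The same helper confirms the agent-side frame format: 10 ms at 16 kHz is 160 samples per channel, so 300 frames make one 3-second logging window.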

Downlink note: most calls have working downlink, but intermittently ringback + greeting both go silent — caller hears nothing. Suggests this isn’t strictly uplink-only.

Decisive server-side symptom — SDK and SFU disagree

The agent SDK reports track_subscribed fired locally. But on the LiveKit Cloud dashboard for the same session:

  1. The SIP participant’s “Subscribers” table is empty, even though the agent appears as a participant in the room.
  2. The Session Events tab contains no track_published, track_subscribed, or track_unpublished events at all. Only: Room created, Participant joining, Participant active, Participant left, Room ended.

So:

  • SDK side: tracks are published and subscribed.
  • SFU side: track lifecycle was never registered.

This cleanly explains why audio frames arrive at the agent but are zero-filled — the SFU has no subscriber to forward to, so the agent’s track pump receives empty/silence buffers.

Agent log excerpt

event=livekit.track.existing track_sid=TR_AM… participant=sip_
event=livekit.uplink.first_frame samples_per_channel=160 sample_rate=16000 peak_amp=0
event=livekit.uplink.frames count=300 peak_amp=0
event=livekit.uplink.frames count=600 peak_amp=0
event=livekit.uplink.frames count=900 peak_amp=0
… (continues every 300 frames / 3 s through end of call)
event=livekit.uplink.frames count=2700 peak_amp=0

peak_amp = numpy.abs(numpy.frombuffer(frame.data, dtype=numpy.int16)).max(), so 0 means every sample in every frame is byte-for-byte 0x00.
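A minimal, SDK-independent sketch of the per-window silence check behind those log lines (helper names are illustrative). It assumes each frame's data is native-endian 16-bit PCM, as the numpy expression treats it, and groups 300 frames of 10 ms into one 3-second window. It uses the standard-library array module and plain Python integers instead of numpy int16, so a -32768 sample reports a peak of 32768 rather than wrapping (numpy.abs on an int16 array returns -32768 for that value). Wiring it to the agent's subscribed track is left out; the assumption is that you feed it bytes(frame.data) for each frame.

```python
from array import array


def peak_abs(pcm: bytes) -> int:
    """Largest |sample| in a buffer of native-endian int16 PCM.

    Python ints do not overflow, so -32768 yields 32768 instead of wrapping.
    """
    samples = array("h")
    samples.frombytes(pcm)
    return max((abs(s) for s in samples), default=0)


def window_peaks(frames, frames_per_window: int = 300):
    """Yield (frame_count, peak) once per window, e.g. 300 x 10 ms = 3 s."""
    peak = 0
    for count, pcm in enumerate(frames, start=1):
        peak = max(peak, peak_abs(pcm))
        if count % frames_per_window == 0:
            yield count, peak
            peak = 0  # start the next window fresh
```

A window reporting 0 means every sample in all of its frames was exactly zero, which is the condition the log lines above record.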

What we ruled out

  • SIP signaling / SDP — pcap of INVITE / 100 / 180 / 200 OK / ACK / BYE both directions: clean negotiation, public IPs on both c= lines, PCMU agreed.
  • RTP not reaching LiveKit — pcap on our edge confirms packets to LiveKit’s advertised endpoint; LiveKit dashboard upstream is non-zero.
  • SRTP / encryption — trunk media encryption is disabled.
  • Krisp / noise cancellation — disabled at trunk; no nc plugin in agent code.
  • Allowed-addresses ACL — exact /32 match with the sending IP.
  • Source-latching / NAT — LiveKit’s advertised c= IP matches the IP it received our RTP on.
  • Carrier-side audio quality — pcap decode confirms real speech (max amplitude 32,124, ~148 voice frames out of ~1500 packets).
  • Trunk config in general — every option set to its safest value (encryption off, Krisp off, headers default).
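The edge-side amplitude decode can be reproduced with a plain G.711 μ-law decoder, using the common bias-and-shift decode found in G.711 reference implementations (helper names are illustrative). Note that 32,124 is the largest magnitude any μ-law byte decodes to, so the measured max amplitude is a full-scale sample. The voice_threshold used for the voice ratio is a hypothetical per-packet peak level, not whatever threshold produced the ~148 / ~1500 figure.

```python
def ulaw_to_linear(code: int) -> int:
    """Decode one G.711 mu-law byte to a 16-bit linear sample."""
    code = ~code & 0xFF               # mu-law bytes are stored inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 = bias
    return -magnitude if sign else magnitude


def payload_stats(payloads, voice_threshold: int = 1000):
    """Return (max |amplitude|, voice ratio) over a list of PCMU RTP payloads.

    voice_threshold is a hypothetical per-packet peak level; tune it yourself.
    """
    overall_peak = 0
    voiced = 0
    for payload in payloads:
        peak = max((abs(ulaw_to_linear(b)) for b in payload), default=0)
        overall_peak = max(overall_peak, peak)
        if peak >= voice_threshold:
            voiced += 1
    ratio = voiced / len(payloads) if payloads else 0.0
    return overall_peak, ratio
```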

What we’d like you to check

  1. SFU-side state for the SIP participant track on our project — is the track lifecycle (publish/subscribe) being registered at all? The dashboard suggests not.
  2. Why does the agent SDK receive track_subscribed callbacks while the SFU has no record of the track lifecycle for the same session?
  3. Any SIP ingress audio processing (VAD, noise gate, codec validation) that might be stripping audio before publish?
  4. Tokyo SIP ingress + PCMU + Korean PSTN routing — any known issue?

We can trigger fresh calls at any time you want SFU logs captured in real time — just let us know a timestamp window.

SDK versions

livekit 1.1.8 (agent process)
livekit-agents (compatible)

Carrier publishes via SIP, not via a LiveKit SDK on that side.

Happy to share pcaps, agent logs, SDP traces, dashboard screenshots, and exact room/participant/session IDs privately with a LiveKit engineer.

Thanks!

We checked all four conditions in the Telephony dashboard PCAP, and they look consistent with the diagnosis you described:

INVITE offer that reached LiveKit (visible in the TCP signaling inside the PCAP — confirms what we sent arrived intact):
m=audio 30498 RTP/AVP 8 18 0 101
a=rtpmap:0 PCMU/8000
a=rtcp:30499
a=ptime:20

200 OK from LiveKit:
m=audio 55395 RTP/AVP 0 101 ← non-zero ✅
a=rtpmap:0 PCMU/8000 ← present ✅
a=rtpmap:101 telephone-event/8000
a=ptime:20
a=sendrecv
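Since the whole thread turns on which address each side should send RTP to, here is a minimal SDP reader for the audio destination (the function name is illustrative). It follows the RFC 4566 rule that a media-level c= line overrides the session-level one, reads only the first m=audio section, and ignores ICE, bundling and session-level direction attributes. It is a diagnostic sketch, not a full SDP parser. The 200 OK excerpt above omits its c= line; that line, not the o= line, gives the IP to send to.

```python
def parse_audio_sdp(sdp: str) -> dict:
    """Extract the audio destination and codecs from an SDP body."""
    info = {"ip": None, "port": None, "payload_types": [], "rtpmap": {},
            "direction": "sendrecv"}  # RFC 3264 default when unspecified
    session_ip = None
    in_audio = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            fields = line[2:].split()
            # Only the first m=audio section is read.
            in_audio = fields[0] == "audio" and info["port"] is None
            if in_audio:
                info["port"] = int(fields[1])
                info["payload_types"] = [int(pt) for pt in fields[3:]]
        elif line.startswith("c="):
            address = line.split()[-1].split("/")[0]  # "c=IN IP4 <addr>[/ttl]"
            if in_audio:
                info["ip"] = address          # media-level c= wins
            elif info["port"] is None:
                session_ip = address          # session-level fallback
        elif in_audio and line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            info["rtpmap"][int(pt)] = encoding
        elif in_audio and line in ("a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive"):
            info["direction"] = line[2:]
    if info["ip"] is None:
        info["ip"] = session_ip
    return info
```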

RTP stream LiveKit sent back to us (1260 packets / 25s, dashboard PCAP):

  • Payload type = 0 (PCMU), stable across all packets ✅
  • SSRC = 0xe5d20db9, stable ✅
  • Sequence numbers monotonic (0x0000 → 0x04ed), no jumps ✅
  • No PT remapping mid-call ✅
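The stream checks listed above (stable PT, stable SSRC, no sequence jumps) can be scripted against RTP payloads pulled out of a capture. A minimal sketch based on the fixed 12-byte RTP header from RFC 3550 (function names are illustrative); it ignores CSRC lists and header extensions, which do not affect these three fields. Sequence numbers are compared modulo 2^16, so a wrap from 0xFFFF to 0x0000 is not counted as a gap.

```python
import struct


def parse_rtp_header(packet: bytes):
    """Return (payload_type, sequence, timestamp, ssrc) from an RTP v2 packet."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP v2 packet")
    _, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return b1 & 0x7F, seq, ts, ssrc  # low 7 bits of byte 1 = payload type


def check_stream(packets):
    """Summarise payload types, SSRCs and sequence gaps for one RTP stream."""
    headers = [parse_rtp_header(p) for p in packets]
    gaps = 0
    for prev, cur in zip(headers, headers[1:]):
        # 16-bit sequence numbers wrap, so compare modulo 2**16.
        if (cur[1] - prev[1]) % 0x10000 != 1:
            gaps += 1
    return {
        "payload_types": {h[0] for h in headers},
        "ssrcs": {h[3] for h in headers},
        "sequence_gaps": gaps,
    }
```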

RTCP: the dashboard PCAP contains zero RTCP packets. No Receiver Reports on port 30499 (the RTCP port we advertised in a=rtcp:30499), and no SR/RR/SDES on any other port; purely one-way downlink RTP plus the TCP signaling.
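To confirm the absence of RTCP in your own edge capture (for example, Receiver Reports expected on the advertised a=rtcp:30499 port), UDP payloads can be classified by their second byte. This sketch (the function name is illustrative) applies the RFC 5761 demultiplexing rule: a version-2 packet whose second byte falls in 192-223 is RTCP (200 = SR, 201 = RR, 202 = SDES, 203 = BYE, 204 = APP); anything else with version 2 is treated as RTP. It does not validate lengths or walk compound RTCP packets.

```python
RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def classify_udp_payload(payload: bytes) -> str:
    """Label a UDP payload as RTP, an RTCP packet type, or other."""
    if len(payload) < 2 or payload[0] >> 6 != 2:
        return "other"  # not RTP/RTCP version 2
    second = payload[1]
    # RFC 5761: values 192-223 are RTCP packet types; in RTP this byte is
    # marker bit + payload type, which avoids that range in practice.
    if 192 <= second <= 223:
        return "RTCP " + RTCP_TYPES.get(second, str(second))
    return "RTP"
```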

Combined with the empty Subscribers table and the absence of track_published / track_subscribed events in Session Events, this strongly looks like “SIP server receiving RTP but failing track publication to the room”, exactly as you described.

Identifiers for SFU-side log inspection on our project (Tokyo region):

  • LiveKit Call ID: SCL_zN6Qkxnxb9hb
  • SIP Call-ID: 992990d0486f0cb13c4600af5772acd47f02bb8511bb5fd90-0017-1220
  • Time window: 2026-06-02 15:32:31 — 15:32:59 UTC (~28 s)
  • Caller-side EIP: 13.125.119.208
  • LiveKit external SIP IP observed: 161.115.163.49

Happy to share room/participant IDs or additional sessions privately with anyone on the LiveKit backend team.

I think the fact that the PCAP only shows RTP packets flowing one way is key. To me, that implies a network setup or firewall issue.

@darryncampbell
Thanks for taking a look — I think the dashboard metric itself argues against a firewall issue, but want to flag the data point clearly in case it got missed in my earlier post:

The LiveKit dashboard reports Total upstream ≈ 8.78 KB / 31 s on the SIP participant page for the broken call. If our RTP weren’t making it through to LiveKit’s media plane, that value would be zero. Instead it lines up almost exactly with what you’d expect from Opus DTX at the ~10% voice ratio we measured at our edge — so the inbound side appears to be reaching LiveKit.

Other pieces around the firewall direction:

  • Our edge captures show full bidirectional RTP (LiveKit → us ~1500 packets, us → LiveKit ~1500 packets) on the same UDP 30000–40000 ports and the EIP (13.125.119.208) whitelisted in the trunk’s allowed-addresses.
  • Edge SG + NACL are open on UDP 30000–40000 in both directions; SIP UDP 5080 ingress from the carrier IP. We’ve gone through each layer.
  • Caller hears the downlink TTS fine on most calls, so the RTP that does flow back to us is decodable end-to-end.

The dashboard PCAP only containing downlink RTP could reasonably be an artifact of where it captures (egress-only?) rather than an indication that nothing arrived inbound — given the non-zero upstream metric points the other way.

I don’t want to overclaim — I can’t see SFU state from outside, so the “server-side track publication failure” framing is my best inference from what’s visible in the dashboard:

  • Subscribers table empty on the SIP participant
  • Session Events tab has no track_published / track_subscribed / track_unpublished entries — only Room created, Participant joining / active / left, Room ended
  • No RTCP RRs sent back to us from LiveKit on a=rtcp port

If someone with SFU log access could check what happened for this specific session, that would confirm or rule out the interpretation:

  • LiveKit Call ID: SCL_zN6Qkxnxb9hb
  • SIP Call-ID: 992990d0486f0cb13c4600af5772acd47f02bb8511bb5fd90-0017-1220
  • Window: 2026-06-02 15:32:31 — 15:32:59 UTC
  • Caller EIP: 13.125.119.208
  • Tokyo region

Our edge captures show full bidirectional RTP (LiveKit → us ~1500 packets, us → LiveKit ~1500 packets) on the same UDP 30000–40000 ports and the EIP (13.125.119.208) whitelisted in the trunk’s allowed-addresses.

If the PCAP on our side shows only one-way audio, that implies to me that something is amiss at the network layer.

I suggest checking whether the issue is specific to SIP by testing your agent with the Agent Console (see the “Agent Console” page in the LiveKit documentation). If that works fine, you know the issue is related to SIP rather than to track publication / subscription.

Is this affecting EVERY call?

The LiveKit dashboard reports Total upstream ≈ 8.78 KB / 31 s on the SIP participant page

Hmmm, I don’t see that, but I’m also not entirely sure where that data comes from - I would trust the pcap in this instance.

@darryncampbell
Quick replies + a new data point:

Is this affecting EVERY call?
Yes: 100% of inbound calls on the project, sustained since May 31. Across ~5-10 attempts, every call shows uplink peak_amp=0 in every 3-second window for the full call duration.

Re: “I don’t see [Total upstream]” / “would trust the pcap”: the path to that pane on our side is:
Sessions → click the session → Participants list → click sip_ → Participant page, top stat pane labeled “Total upstream” / “Total downstream”

For our broken call:
Total upstream: 8.78 KB / 31 s
Total downstream: 49.93 KB
Connection: UDP, Region Japan, TTC 13ms
Published tracks: (No results)

NEW data point — dashboard session playback also shows silence:
We just observed a fresh call (RM_cyNiwqS6ePFk) using the dashboard session player. With the User/Agent track toggle:

  • User (subscriber inbound, SIP caller’s audio): 0% — flat silence
  • Agent (subscriber inbound, agent’s TTS): also 0% — silent

Even the dashboard’s own recorded playback for this session has both directions silent. This is an internal LiveKit-side view of what was forwarded, independent of our edge or our agent SDK.

Re: Agent Console isolation test: fair call. We will run a WebRTC-direct test on our project to formally rule out SIP-specific factors and report back. (Our agent SDK is 1.1.8 and Console requires 1.5.2+, so this needs an upgrade on our side first.)

Identifiers for SFU-side correlation on our project:
Project ID: p_55yx00llx9k
Region: Japan (Tokyo)
Latest broken room: RM_cyNiwqS6ePFk
Earlier broken room: RM_PCviNgqYUoei (and others, all today)
Sample SIP call: SCL_zN6Qkxnxb9hb
Sample SIP Call-ID: 992990d0486f0cb13c4600af5772acd47f02bb8511bb5fd90-0017-1220
Sample window: 2026-06-02 15:32:31 — 15:32:59 UTC
Caller EIP: 13.125.119.208

For PCAP cross-checking on our side: edge tcpdump shows 4 symmetric flows on UDP 30000-40000 (carrier<->edge<->LiveKit, ~1500 packets each direction, voice content preserved end-to-end). Happy to share the raw .pcap if useful.

Standing by — happy to fire fresh repros at any window you specify for real-time SFU log capture.

Is the RTP stream to LiveKit going to the IP/Port in the ACK SDP?

@darryncampbell
Update — Agent Console test result: WebRTC path is functional.

We just ran your suggested isolation test. We launched the agent from the Agent Console (WebRTC, no SIP), spoke into the browser, and the agent received audio cleanly and responded with TTS: full duplex working.

Updated ask: could SFU-side state be checked specifically for the SIP → room track publication path for our project?
Project ID: p_55yx00llx9k
Region: Japan (Tokyo)
Latest broken SIP room: RM_cyNiwqS6ePFk
Sample SIP call: SCL_zN6Qkxnxb9hb
SIP Call-ID: 992990d0486f0cb13c4600af5772acd47f02bb8511bb5fd90-0017-1220
Window: 2026-06-02 15:32:31 — 15:32:59 UTC
Caller EIP: 13.125.119.208

Specifically: when a SIP participant joins via the inbound trunk, is its audio track actually being published to the room? On the dashboard, the SIP participant’s “Published tracks” list shows “No results” and Session Events has no track_published event, but upstream bytes are non-zero. The ingress-to-publish path could be stuck somewhere inside the SIP service.

Happy to fire fresh repros at any window you specify.

@CWilson
ACK contains no SDP body in our case — early offer/answer (INVITE carries offer, 200 OK carries answer). Confirmed from the pcap: the ACK has no Content-Length / no SDP, just To/From/CSeq/Via.
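With early offer/answer the ACK normally has no body; with late offer (INVITE without SDP), the 200 OK carries the offer and the ACK carries the answer. A quick way to tell which case a captured message is in is to check its Content-Type and Content-Length. A minimal sketch (the function name is illustrative); it handles the RFC 3261 compact header forms ("c" for Content-Type, "l" for Content-Length) but not header folding or multipart bodies.

```python
def sip_body_is_sdp(message: str) -> bool:
    """True if a raw SIP message carries a non-empty SDP body."""
    head, _, body = message.partition("\r\n\r\n")
    headers = {}
    for line in head.split("\r\n")[1:]:  # skip the request/status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    ctype = headers.get("content-type", headers.get("c", ""))
    # Content-Length may be absent over UDP; fall back to the body size.
    length = int(headers.get("content-length", headers.get("l", len(body.encode()))))
    return length > 0 and ctype.lower().startswith("application/sdp")
```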

So our RTP egress is anchored on the 200 OK SDP:
200 OK c=: 161.115.163.128
200 OK m=: audio 55395 RTP/AVP 0 101

Our edge tcpdump confirms RTP destination matches exactly:
Our edge → 161.115.163.128:55395 (1500 packets, PCMU)

One observation that might or might not be relevant: in the same call, the RTP we receive from LiveKit arrives with src 161.115.163.49:55395 — different IP from the advertised c= (.128), but same port. We assume that’s normal LiveKit distributed media fanout (advertised ingress IP ≠ active media-server IP), and we’ve been latching/forwarding on the src we actually see. Let me know if that asymmetry is something we should be handling differently on our side.
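Whether to send to the advertised c= or to the observed source is the classic symmetric-RTP ("latching") choice. Below is a generic sketch of that logic, not rtpengine's implementation and not a statement of what LiveKit expects (the class name is illustrative): send to the SDP address until the first packet arrives from the peer, then send to that packet's source, and flag when the two differ.

```python
from typing import Optional, Tuple

Addr = Tuple[str, int]


class RtpLatch:
    """Choose the peer address for outgoing RTP in one leg of a call."""

    def __init__(self, advertised: Addr):
        self.advertised = advertised          # from the peer's SDP c=/m=
        self.latched: Optional[Addr] = None   # first observed source

    def on_packet(self, source: Addr) -> None:
        # Latch once; later packets from other sources are not followed.
        if self.latched is None:
            self.latched = source

    def destination(self) -> Addr:
        return self.latched or self.advertised

    def mismatch(self) -> bool:
        """True when the observed source differs from the SDP address."""
        return self.latched is not None and self.latched != self.advertised
```

Usage, with addresses from this thread for illustration only: RtpLatch(("161.115.163.128", 55395)) followed by on_packet(("161.115.163.49", 55395)) switches destination() to .49 and reports mismatch() as True.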

Identifiers for this specific call:
Call ID: SCL_zN6Qkxnxb9hb
SIP Call-ID: 992990d0486f0cb13c4600af5772acd47f02bb8511bb5fd90-0017-1220
Window: 2026-06-02 15:32:31 — 15:32:59 UTC
Project: p_55yx00llx9k (Tokyo)

Happy to share the pcap.

I’m not sure what’s happening with the dashboard, but I see tracks being published in the server logs for SCL_zN6Qkxnxb9hb. The key point though is that if there is no data in the pcap, the agent wouldn’t be hearing anything. That PCAP is captured at the edge of our network.

I’m wondering what changed to start this issue on May 31st; that was a Sunday, and I see no changes on our side at that time.

@darryncampbell
Thanks for the inside view. We have two data points that pull opposite ways:

Our edge tcpdump shows ~1500 RTP packets (PCMU, real voice, max amp 32,124) going to 161.115.163.128:55395 — exactly the c= advertised in your 200 OK. Dashboard “Total upstream” on the SIP participant also reports ~8.78 KB / 31s, consistent with Opus-DTX of that PCMU.

But your edge PCAP shows no inbound. One thing that could explain the gap: the RTP we receive back arrives with src 161.115.163.49:55395, a different IP but the same port. If your captured edge is .49 (or anything other than .128), it wouldn’t see our packets; we never send there.

Could you confirm which LiveKit IP your edge PCAP was captured on?
We send to .128, receive from .49.

Re May 31: we recreated the SIP proxy EC2 instance on May 29 (x86/Ubuntu → ARM/AL2023, Kamailio + rtpengine as Docker containers). Close timing. Edge pcap shows full bidirectional voice though, and Agent Console (WebRTC) on the same project works fine, so the issue is SIP-path specific; it's just unclear whether it's our SIP edge or LiveKit SIP ingress.

I might need to defer to @CWilson on this, but if I look at the PCAP, I see the c= (connection address) of the 200 OK set to 161.115.163.49, so why are you sending to .128?

@darryncampbell
You’re right, apologies: the “.128” I cited was the c= from a different, earlier call for which we have an edge pcap, not from SCL_zN6Qkxnxb9hb. For that call the 200 OK c= is indeed 161.115.163.49, as you note.

Just ran a fresh repro with paired captures on both sides of our edge.

LiveKit 200 OK c= : 161.115.163.144
LiveKit 200 OK m=audio port: 57833
Our edge → 161.115.163.144: 1573 packets ← matches the c= exactly
Our edge ← 161.115.163.144: 1572 packets ← LiveKit replied from same IP

So our rtpengine egress goes to whatever c= LiveKit advertises in the 200 OK — consistent across calls. The advertised IP varies per call (.128 / .49 / .144 across three calls), but each time we send to exactly what LiveKit just advertised, with 1:1 packet match end-to-end at our edge.

The call still resulted in silence — agent received uplink frames but peak_amp=0 across the full call.

So either:

  • Our packets are getting dropped between our EIP (13.125.119.208) and LiveKit’s SIP-ingress at .144 — but the 1572 packets we received back from .144 on the same UDP socket suggests the path is reachable in at least one direction.
  • Or LiveKit’s edge PCAP is captured on a different node than the one .144 routes to internally.

For SFU correlation:
Project: p_55yx00llx9k (Tokyo)
Room SID: RM_SFRKsJZoeeaZ
Room name: aicx-call-_01073794936_FTuzJLCw2CTX
Call window: 2026-06-02 15:30:55 — 15:32:10 UTC
LiveKit advertise: 161.115.163.144:57833
Caller EIP: 13.125.119.208

As @darryncampbell mentioned earlier, we are not receiving RTP from the SIP provider. This is almost certainly an asymmetric firewall / NAT / security-group issue.

There are two possible outcomes:

  • (a) No packets leave → rtpengine/Kamailio isn’t forwarding uplink to LiveKit’s answered address (check that it honors the 200 OK c=/m= and isn’t latching to a stale port).
  • (b) Packets leave, but LiveKit’s side still shows packets_input: 0 → a firewall/SG between them is dropping inbound UDP to 161.115.163.144:59611.

Also, confirm your security groups allow inbound UDP from LiveKit’s media range, not just outbound.

One other possible issue: the SIP provider offered PCMA(8), G729(18), PCMU(0); LiveKit answered PCMU(0). Make sure PCMU is supported by the SIP provider, or don’t offer it if it isn’t. This is probably not the cause, since the caller can hear the agent.
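Payload-type bookkeeping for that offer/answer: 0, 8 and 18 are static assignments from RFC 3551 (PCMU, PCMA, G.729), while 101 is dynamic and only means telephone-event because an a=rtpmap line says so. Under RFC 3264 an answer may only contain formats from the offer, so a minimal check is the intersection taken in the answer's order. A sketch with an illustrative function name; it does not compare fmtp parameters.

```python
# Static RTP payload types (RFC 3551). Dynamic types (96-127) need a=rtpmap.
STATIC_PT = {0: "PCMU/8000", 8: "PCMA/8000", 18: "G729/8000"}


def negotiated_codecs(offer_pts, answer_pts, answer_rtpmap=None):
    """Codecs present in both offer and answer, in the answerer's order."""
    rtpmap = {**STATIC_PT, **(answer_rtpmap or {})}
    offered = set(offer_pts)
    return [rtpmap.get(pt, f"PT {pt}") for pt in answer_pts if pt in offered]
```

For the call above, offer [8, 18, 0, 101] and answer [0, 101] with a=rtpmap:101 telephone-event/8000 leave PCMU plus telephone-event.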

You can download an example PCAP here and see no RTP was received by LiveKit:
https://cloud.livekit.io/projects/p_/telephony/SCL_LdAEFJnPsjFU/inbound

@CWilson
Update — isolation test localized the issue to our SIP-edge rtpengine. Disabling rtpengine_manage() / rtpengine_delete() in Kamailio (SDP forwarded untouched, no AICX relay) restores prior behavior:

rtpengine ON  (RTP src = our EIP 13.125.119.208):
  0 inbound packets at LiveKit edge, full silence both directions.

rtpengine OFF (RTP src = carrier IP pool 203.240.134.x,
               203.251.250.250-5):
  Greeting heard, STT intermittent — pre-regression behavior.

So the failure is specific to RTP arriving at your SIP ingress with src = our EIP, even though that EIP is exactly what the trunk’s allowed_addresses whitelists (13.125.119.208/32). RTP from the carrier’s IPs (NOT in allowed_addresses) gets through. RTP from the whitelisted IP doesn’t. This also makes our earlier WebRTC cross-tenant correlation (topic #1314) likely unrelated: a different issue with similar symptoms.

Question: does allowed_addresses apply only to SIP signaling, or does the ingress independently validate RTP src (e.g. must match SIP source, or c= media IPs)? rtpengine legitimately rewrites c= to our EIP and sends RTP from that EIP, but the SIP signaling source is 203.240.134.4 — if RTP src is being compared against the signaling source, our packets would be filtered.
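The question distinguishes three possible rules for accepting RTP. The sketch below evaluates each one separately against a call's addresses; it is purely hypothetical and describes no confirmed LiveKit behaviour (later in the thread, allowed_addresses is confirmed to apply to SIP signaling only). With rtpengine in the path, the RTP source 13.125.119.208 matches the rewritten c= and the /32 allowlist entry but not the signaling source 203.240.134.4, which is the mismatch the question is probing. The function name is illustrative.

```python
from ipaddress import ip_address, ip_network


def candidate_rtp_checks(rtp_src, signaling_src, sdp_ips, allowlist=()):
    """HYPOTHETICAL: evaluate three possible RTP-source rules separately.

    None of these rules is documented LiveKit behaviour; this only makes the
    question concrete for one call's addresses.
    """
    src = ip_address(rtp_src)
    return {
        "matches_signaling_source": src == ip_address(signaling_src),
        "matches_sdp_c_line": any(src == ip_address(ip) for ip in sdp_ips),
        "in_allowlist": any(src in ip_network(cidr) for cidr in allowlist),
    }
```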

Original motivation: the carrier sends RTP from 6 IPs we can’t enumerate in allowed_addresses, so rtpengine was meant to give LiveKit a single stable RTP src to whitelist. Now we’re stuck: direct mode keeps the original intermittent STT, and rtpengine mode would fix that, but its packets are dropped.
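If enumeration is the obstacle, contiguous carrier ranges can usually be collapsed into a few CIDR entries. A minimal sketch with Python's ipaddress.summarize_address_range (the wrapper name is illustrative), reading "203.251.250.250-5" as .250 through .255; that reading of the notation is an assumption. The 203.240.134.x pool is elided in the post, so it is not expanded here, and whether any allowlist applies to RTP at all is a separate question.

```python
from ipaddress import IPv4Address, summarize_address_range


def cidrs_for_range(first: str, last: str):
    """Smallest list of CIDR blocks covering an inclusive IPv4 range."""
    return [str(net) for net in summarize_address_range(IPv4Address(first), IPv4Address(last))]


print(cidrs_for_range("203.251.250.250", "203.251.250.255"))
# ['203.251.250.250/31', '203.251.250.252/30']
```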

Project:   p_55yx00llx9k  (Tokyo)
Trunk:     aicx-dev-sejong-trunk, allowed 13.125.119.208/32
Broken (rtpengine) sample: SCL_wQjJmKduHQfy

If RTP source validation is the mechanism, what’s the recommended way to (a) declare additional allowed RTP sources, or (b) get the ingress to accept RTP from the whitelisted IP consistently?

If you remove the values from “allowed_addresses”, which should allow all addresses, do you get a different result?

I will need to double-check if “allowed_addresses” applies to RTP. I thought it was only the SIP signaling, but I need to verify that.

@CWilson
Tested with allowed_addresses cleared (empty). Two back-to-back calls, same rtpengine config:

Call 1: RM_uUN4GGrtKN4n  (2026-06-04 00:06:42–00:07:52 KST)
  LiveKit advertise: 161.115.163.191:57623
  Our edge: 1531 sent / 1502 received (1:1 socket pair)
  Result:   TTS ✅ heard greeting, STT ❌

Call 2: RM_mn9sjVKKniKN  (2026-06-04 00:10:01–00:10:52 KST)
  LiveKit advertise: 161.115.163.186:51376
  Our edge: 688 sent / 688 received (1:1 socket pair)
  Result:   TTS ❌ no greeting, STT ❌

Same setup, a few minutes apart, different outcomes. Clearing allowed_addresses had partial effect (TTS came back intermittently) but didn’t fully fix it.

Could you look at the SFU/ingress logs for these two rooms and compare what differs between .191 (partial success) vs .186 (full silence)?

Project: p_55yx00llx9k (Tokyo)

I verified that “allowed_addresses” applies only to SIP Signaling, not to RTP streams.

I only see one RTP stream.

Same thing for this session. I am not sure how to help you. Something between you and us is blocking your RTP stream to us.