Hi LiveKit team,
Every inbound SIP call shows the same pattern: the agent’s TTS greeting reaches the caller fine, but the agent’s subscribed track for the SIP participant delivers only zero-filled audio frames (every PCM sample is literally 0x00). The caller hangs up after ~30 s because of the one-way audio.
The decisive evidence is a disagreement between SDK callbacks and the SFU’s own dashboard state - described below.
Setup
- Region: Japan (Tokyo SIP ingress)
- Inbound trunk: media encryption disabled, Krisp disabled, allowed-addresses ACL matches sender, codec PCMU/G.711 µ-law negotiated cleanly (PCMU/8000, RTP/AVP plaintext, ptime 20)
- Carrier: Korean PSTN provider sending plain RTP (no SRTP, no carrier-side transcoding)
- Our SIP edge: Kamailio + rtpengine relay (verified at every hop with pcap)
- Agent: livekit-rtc 1.1.8 (Python), AutoSubscribe = AUDIO_ONLY, no noise-cancellation plugin
End-to-end trace for a single ~30s call
Every hop verified with pcap, amplitude decode, LiveKit dashboard, and agent logs:
- Carrier → our SIP edge (RTP ingress) — PCMU, ~1500 packets, max amplitude 32,124, voice ratio ~10% (real speech)

- Our SIP edge relay (bidirectional) — same packet count out, amplitude preserved

- Our SIP edge → LiveKit Cloud (RTP egress) — packets sent to LiveKit’s advertised endpoint matching its 200 OK SDP

- LiveKit dashboard, SIP participant Total upstream — ~8.78 KB / 31 s ≈ 2.3 kbps, consistent with Opus-DTX-encoded speech at the observed voice ratio. So media reaches the SFU.

- Agent SDK level — track_subscribed callback fires; frame format correct (10 ms mono, 16 kHz, 160 samples/channel).

- Agent subscribed track audio content — abs(int16_samples).max() = 0 for every 3-second window across the entire 27+ s session. Every PCM sample is literally 0x00.

- Agent → SIP (TTS downlink) — caller hears the greeting (mostly — see note below).

Downlink note: most calls have a working downlink, but intermittently the ringback and the greeting both go silent and the caller hears nothing. This suggests the problem isn’t strictly uplink-only.
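For anyone who wants to cross-check the figures in the trace, here is a small arithmetic sketch. It is only a sanity check under stated assumptions: the dashboard's "KB" is taken as 1000 bytes, and each PCMU packet is assumed to carry 20 ms of audio at 8000 one-byte samples per second. All variable names are ours.

```python
# Sanity-check the numbers quoted in the trace.
# Assumptions (not confirmed by LiveKit): "KB" = 1000 bytes; one PCMU packet
# carries PTIME_MS of audio at 8000 samples/s, one byte per sample.

PTIME_MS = 20
PCMU_RATE_HZ = 8000
packets_seen = 1500  # approximate packet count from the pcap

# Carrier side: call length and raw PCMU payload rate.
call_duration_s = packets_seen * PTIME_MS / 1000            # 30.0 s
pcmu_payload_bytes = PCMU_RATE_HZ * PTIME_MS // 1000        # 160 bytes per packet
pcmu_payload_bps = pcmu_payload_bytes * 8 * 1000 // PTIME_MS  # 64000 bit/s

# Dashboard side: "Total upstream" of ~8.78 KB over 31 s.
upstream_bps = 8.78 * 1000 * 8 / 31                         # about 2266 bit/s

# Agent side: frame length and the length of one log window.
AGENT_RATE_HZ = 16000
samples_per_frame = 160
frame_ms = samples_per_frame * 1000 / AGENT_RATE_HZ         # 10.0 ms
log_window_s = 300 * frame_ms / 1000                        # 3.0 s per 300 frames

print(call_duration_s, pcmu_payload_bps, round(upstream_bps), frame_ms, log_window_s)
```

The numbers are internally consistent: ~1500 packets at 20 ms is ~30 s, and 300 frames of 10 ms is the 3 s logging window. The dashboard upstream is far below the 64 kbps raw PCMU payload rate, so it cannot be an untouched copy of the RTP payload.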
Decisive server-side symptom — SDK and SFU disagree
The agent SDK reports track_subscribed fired locally. But on the LiveKit Cloud dashboard for the same session:
- The SIP participant’s “Subscribers” table is empty, even though the agent appears as a participant in the room.
- The Session Events tab contains no track_published, track_subscribed, or track_unpublished events at all. Only: Room created, Participant joining, Participant active, Participant left, Room ended.
So:
- SDK side: tracks are published and subscribed.
- SFU side: track lifecycle was never registered.
This would explain why audio frames arrive at the agent but are zero-filled: if the SFU has no subscriber to forward to, the agent’s track pump would receive only empty/silence buffers.
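To make the SDK-side view easy to compare against the dashboard, a snapshot like the following could be logged from the agent at the same moment the dashboard is checked. This is a minimal sketch, not our production code: `snapshot_subscriptions` is a hypothetical helper, and it assumes the attribute names we believe the Python SDK exposes on its room, participant and publication objects (`remote_participants`, `track_publications`, `subscribed`, `muted`, `track`). Please verify them against the installed version.

```python
def snapshot_subscriptions(room):
    """Return one dict per remote track publication, as the local SDK sees it.

    `room` is expected to look like a connected livekit.rtc Room; only the
    attributes named below are read, so any object with the same shape works.
    """
    rows = []
    for identity, participant in room.remote_participants.items():
        for sid, pub in participant.track_publications.items():
            rows.append(
                {
                    "participant": identity,
                    "track_sid": sid,
                    "subscribed": pub.subscribed,    # SDK's local subscription flag
                    "muted": pub.muted,
                    "has_track": pub.track is not None,  # track object attached?
                }
            )
    return rows
```

If this reports `subscribed=True` and `has_track=True` for the SIP participant while the dashboard's "Subscribers" table stays empty, that is the disagreement described above in a form that can be attached to a ticket.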
Agent log excerpt
event=livekit.track.existing track_sid=TR_AM… participant=sip_
event=livekit.uplink.first_frame samples_per_channel=160 sample_rate=16000 peak_amp=0
event=livekit.uplink.frames count=300 peak_amp=0
event=livekit.uplink.frames count=600 peak_amp=0
event=livekit.uplink.frames count=900 peak_amp=0
… (continues every 300 frames / 3 s through end of call)
event=livekit.uplink.frames count=2700 peak_amp=0
peak_amp = numpy.abs(numpy.frombuffer(frame.data, dtype=numpy.int16)).max(), so 0 means every sample in every frame is byte-for-byte 0x00.
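For reference, the per-window measurement behind those log lines can be reproduced without numpy. The sketch below is a dependency-free approximation of what our agent does, not the exact code: `peak_amplitude` and `UplinkMeter` are illustrative names, and it assumes each frame is handed over as raw native-endian int16 bytes (for example `bytes(frame.data)`).

```python
from array import array


def peak_amplitude(pcm: bytes) -> int:
    """Largest absolute sample value in a buffer of native-endian int16 PCM."""
    samples = array("h")
    samples.frombytes(pcm)  # raises ValueError if len(pcm) is not a multiple of 2
    return max((abs(s) for s in samples), default=0)


class UplinkMeter:
    """Counts frames and reports the peak amplitude once per window."""

    def __init__(self, window_frames: int = 300):
        self.window_frames = window_frames
        self.count = 0
        self.window_peak = 0

    def push(self, pcm: bytes):
        """Feed one frame; returns a log line at each window boundary, else None."""
        self.count += 1
        self.window_peak = max(self.window_peak, peak_amplitude(pcm))
        if self.count % self.window_frames:
            return None
        line = f"event=livekit.uplink.frames count={self.count} peak_amp={self.window_peak}"
        self.window_peak = 0  # start a fresh window
        return line
```

Working with Python integers also sidesteps the int16 wraparound that `numpy.abs` has for the sample value -32768. That edge case does not affect the finding here, since a peak of 0 can only come from all-zero samples.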
What we ruled out
- SIP signaling / SDP — pcap of INVITE / 100 / 180 / 200 OK / ACK / BYE both directions: clean negotiation, public IPs on both c= lines, PCMU agreed.
- RTP not reaching LiveKit — pcap on our edge confirms packets to LiveKit’s advertised endpoint; LiveKit dashboard upstream is non-zero.
- SRTP / encryption — trunk media encryption is disabled.
- Krisp / noise cancellation — disabled at trunk; no nc plugin in agent code.
- Allowed-addresses ACL — exact /32 match with the sending IP.
- Source-latching / NAT — LiveKit’s advertised c= IP matches the IP it received our RTP on.
- Carrier-side audio quality — pcap decode confirms real speech (max amplitude 32,124, ~148 voice frames out of ~1500 packets).
- Trunk config in general — every option set to its safest value (encryption off, Krisp off, headers default).
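The carrier-side amplitude check can be reproduced from RTP payloads with a plain G.711 µ-law decoder. The sketch below is an illustration rather than the exact tool we used: it implements the standard µ-law expansion, while `payload_peak`, `voice_ratio` and especially the `threshold` value are hypothetical choices for classifying a packet as "voice".

```python
def ulaw_to_linear(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed linear sample (range +/-32124)."""
    u = ~byte & 0xFF                      # mu-law bytes are stored bit-inverted
    magnitude = (((u & 0x0F) << 3) + 0x84) << ((u >> 4) & 0x07)
    magnitude -= 0x84                     # remove the encoder bias
    return -magnitude if u & 0x80 else magnitude


def payload_peak(payload: bytes) -> int:
    """Largest absolute decoded sample in one RTP payload."""
    return max((abs(ulaw_to_linear(b)) for b in payload), default=0)


def voice_ratio(payloads, threshold: int = 1000) -> float:
    """Fraction of payloads whose peak reaches `threshold` (an assumed cutoff)."""
    payloads = list(payloads)
    if not payloads:
        return 0.0
    voiced = sum(1 for p in payloads if payload_peak(p) >= threshold)
    return voiced / len(payloads)
```

Two details are worth noting. The largest value µ-law can decode to is 32124, so the observed maximum of 32,124 is a full-scale peak. And µ-law silence is the byte 0xFF (or 0x7F), which decodes to 0, so an RTP payload never needs to contain 0x00 bytes for the line to be silent; the all-zero samples we see are on the agent's linear PCM side.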
What we’d like you to check
- SFU-side state for the SIP participant track on our project — is the track lifecycle (publish/subscribe) being registered at all? The dashboard suggests not.
- Why does the agent SDK receive track_subscribed callbacks while the SFU has no record of the track lifecycle for the same session?
- Any SIP ingress audio processing (VAD, noise gate, codec validation) that might be stripping audio before publish?
- Tokyo SIP ingress + PCMU + Korean PSTN routing — any known issue?
We can trigger fresh calls at any time if you want SFU logs captured in real time; just give us a timestamp window.
SDK versions
livekit 1.1.8 (agent process)
livekit-agents (compatible)
Carrier publishes via SIP, not via a LiveKit SDK on that side.
Happy to share pcaps, agent logs, SDP traces, dashboard screenshots, and exact room/participant/session IDs privately with a LiveKit engineer.
Thanks!

