Yesterday, REFERs stopped matching on outbound

Timeline: Transfers/REFERs were working fine until yesterday, approx. 5pm EST.

SIP transfer_sip_participant is suddenly failing with 407 Proxy Authentication Required: From-URI realm mismatch. It seems the LiveKit SIP backend is now passing the wrong SIP domain in the From host.

@Mark_Diaz, For a Cloud-side LK SIP backend regression on outbound REFER, support is the right channel. Backend logs for your project ID will tell the team whether the From URI construction shifted.

livekit/sip has had commits in the past two weeks touching SDP behavior, inbound auth realm customization, and 407 state handling, but none of them explicitly describe a REFER From URI change. Mentioning the timing window on your support ticket gives the team something to correlate against.

Transfer API reference: Call forwarding | LiveKit Documentation

Do you have a call id that failed?

Last test: ‘SCL_q4mhpfYxC7y4’

It was working fine until Monday approx 5pm EST. SIP engineering team tells me the LK sip gateway from_host is no longer matching the realm, as it was previously.

I looked at the code, and the LiveKit-side REFER From URI behavior has not changed in ~8 months. The change that aligns with your timeline must be on the SIP provider’s side.

Do you have a call ID that previously worked? It is not exactly clear to me why you would expect your SIP provider to even have credentials for 5sii....sip.livekit.cloud, since that is a LiveKit domain.

I agree. I also looked at the code but I was not sure if your repo was in sync with what you’re running on your cloud.

SCL_rYdAiTBCm67q

So I looked at the PCAPs for SCL_rYdAiTBCm67q (success) and SCL_q4mhpfYxC7y4 (fail), and the REFERs are nearly identical. So I am not sure why your team believes it's an issue on the LiveKit side.
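For anyone who wants to repeat that comparison on their own captures, here is a minimal sketch of a header-level diff between two REFERs. It assumes each REFER has already been exported from the PCAP as raw text; `parse_sip_headers`, `diff_headers`, and the sample messages with `example` hosts are hypothetical and not taken from the calls above. It skips folded (multi-line) headers and ignores headers that always differ between calls.

```python
from __future__ import annotations


def parse_sip_headers(raw: str) -> dict[str, list[str]]:
    """Map lowercased header names to their values, skipping the start line and body."""
    head = raw.replace("\r\n", "\n").split("\n\n", 1)[0]
    headers: dict[str, list[str]] = {}
    for line in head.split("\n")[1:]:  # first line is the request line
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return headers


def diff_headers(a: str, b: str, ignore=("call-id", "cseq", "via")) -> dict[str, tuple]:
    """Return headers whose values differ between two messages, minus the ignored ones."""
    ha, hb = parse_sip_headers(a), parse_sip_headers(b)
    changed = {}
    for name in sorted(set(ha) | set(hb)):
        if name not in ignore and ha.get(name) != hb.get(name):
            changed[name] = (ha.get(name), hb.get(name))
    return changed


# Hypothetical sample messages; real ones would come from the PCAP export.
OK_REFER = (
    "REFER sip:bob@carrier.example SIP/2.0\r\n"
    "From: <sip:alice@pbx.example>;tag=1\r\n"
    "To: <sip:bob@carrier.example>;tag=2\r\n"
    "Call-ID: aaa\r\n"
    "CSeq: 2 REFER\r\n"
    "Refer-To: <sip:+15550100@carrier.example>\r\n"
    "\r\n"
)
FAILED_REFER = OK_REFER.replace("Call-ID: aaa", "Call-ID: bbb")

print(diff_headers(OK_REFER, FAILED_REFER))
```

On real traces the From and To tags will differ per call, so expect those headers to show up and compare the host parts by eye.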

Just to be sure I took a look at all the SIP changes between the two calls:

livekit/sip: bfbe8eb per-call media timeout
             dba15f9 configurable TLS ALPN
             36d7d59 track log consolidation
             be9e564 CANCEL race
             bab77e1 SDP local-IP / symmetric mode
             549c718 Allow customizing auth realm for INBOUND calls
             d3e37ef media timeout correctness
             cb9c0b4 disable TURN for LK RTC

None of these touch the outbound REFER From-URI path.

What the PCAPs also show: the FreeSWITCH node that handled the May 14 REFER was 157.xxx.xxx.28:11000; the one that 407’d on May 21 was 157.xxx.xxx.27:11000. Different Kazoo nodes, same trunk.

I would check whether that’s consistent with your SIP proxy rolling out a config change or upgrade to part of its FreeSWITCH cluster (e.g., a new ACL/auth profile on .27) between your two tests, while .28 was either upgraded later or carries a different policy.

Why is the proxy challenging an in-dialog REFER?

@Mark_Diaz, The PCAP comparison settles it: near-identical REFERs from the LK side on both calls, the only material difference being which Kazoo node handled them (157.x.x.28 accepted May 14, 157.x.x.27 407'd May 21). That rules out the LK-side From URI theory.

The practical next step is with your SIP provider. Take both node IPs to whoever operates that FreeSWITCH/Kazoo cluster and ask why .27 rejects the From URI realm that .28 accepts. The common cause is per-node auth/realm config drift: one node picked up a config change around your May 21 timeline that altered how it validates the From host, while the other kept the prior behavior. Same trunk, divergent node config.

They can confirm by diffing the SIP profile and realm ACL between the two nodes. That diff is where the 407 is coming from.
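As a quick check before diffing node configs, the From host in the REFER can be compared with the realm in the 407 challenge. The sketch below assumes the rejecting node expects the From host to equal its challenge realm, which is the theory in this thread rather than confirmed carrier behavior. The helper names and sample header values are hypothetical, and IPv6 hosts are not handled.

```python
from __future__ import annotations

import re


def from_host(from_header: str) -> str | None:
    """Extract the host part of the SIP URI in a From header value."""
    m = re.search(r"sips?:(?:[^@>;\s]+@)?([^:;>\s]+)", from_header)
    return m.group(1).lower() if m else None


def challenge_realm(proxy_authenticate: str) -> str | None:
    """Extract the realm parameter from a Proxy-Authenticate header value."""
    m = re.search(r'realm="([^"]*)"', proxy_authenticate, re.IGNORECASE)
    return m.group(1).lower() if m else None


def realm_matches_from(from_header: str, proxy_authenticate: str) -> bool:
    """True when the From host equals the challenge realm (assumed carrier rule)."""
    host = from_host(from_header)
    return host is not None and host == challenge_realm(proxy_authenticate)


# Hypothetical values; substitute the headers from the REFER and its 407.
refer_from = '"Agent" <sip:agent@gateway.example:5060>;tag=abc'
challenge = 'Digest realm="trunk.example", nonce="xyz", algorithm=MD5'

print(from_host(refer_from), challenge_realm(challenge))
print(realm_matches_from(refer_from, challenge))
```

Running this against the same REFER as answered by .27 and .28 would show whether the two nodes advertise different realms or apply a different rule to the same From host.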

As I am the SIP carrier :slight_smile: I will push my engineering team to investigate. I greatly appreciate the diligence, clarification and SIP traces. I’ll update soon! Thank you team!