Outbound SIP call: AI agent speaks before callee's phone rings — missing 180 Ringing in PCAP

Problem

We’re using a LiveKit SIP trunk (via Twilio) for outbound calls. For certain destination numbers, the AI agent starts speaking before the callee’s phone has even started ringing. The callee hears only the tail end of the agent’s first utterance when they pick up, or misses it entirely.

We captured PCAPs of two calls from the same LiveKit SIP trunk to compare:

  • Call A (problematic): Agent spoke before the callee’s phone rang
  • Call B (normal): Agent spoke only after the callee answered

Both calls originate from the same LiveKit SIP endpoint (3tqm7ro6kbs.sip.livekit.cloud:9000) via Twilio (xxxxxx.pstn.twilio.com).

Our outbound call flow (code-level context)

Our application uses create_sip_participant with wait_until_answered=True to place the outbound call:

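from livekit import api

# livekit_api is assumed to be an api.LiveKitAPI() client created elsewhere.
# With wait_until_answered=True, this await returns only once the call is answered.
sip_participant_response = await livekit_api.sip.create_sip_participant(
    api.CreateSIPParticipantRequest(
        sip_trunk_id=trunk_id,
        room_name=room_name,
        sip_call_to=phone_number,
        sip_number=from_number,
        participant_identity=phone_number,
        play_dialtone=True,
        play_ringtone=True,
        wait_until_answered=True,
    )
)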

Only after create_sip_participant returns successfully (i.e., the call is “answered” per SIP 200 OK) does our code start the agent’s response. So our application correctly waits for the call to be answered before speaking. The issue is that the SIP 200 OK arrives before the callee’s phone actually starts ringing.

PCAP Evidence

Call A — Agent spoke too early (no 180 Ringing)

T=0.000s  INVITE  → Twilio
T=0.002s  100 Trying
T=0.023s  407 Proxy Auth
T=0.027s  INVITE (with auth)
T=0.028s  100 Trying
                          ← ** No 180 Ringing **
T=3.016s  200 OK          ← Call "answered" in ~3s (Server: Twilio, Session Name: Twilio Media Gateway)
T=3.034s  RTP starts      ← Both directions, agent speaks immediately

Key observations:

  • No 180 Ringing response between 100 Trying and 200 OK
  • 200 OK arrives only ~3 seconds after the authenticated INVITE — too fast for a human to answer
  • RTP begins 18ms after 200 OK with bidirectional audio immediately
  • The 200 OK comes from Server: Twilio with Session Name: Twilio Media Gateway
  • 200 OK has SDP with sendrecv, codec PCMU, media endpoint 168.86.138.29:12676

Call B — Normal behavior (180 Ringing present)

T=0.000s  INVITE  → Twilio
T=0.001s  100 Trying
T=0.021s  407 Proxy Auth
T=0.025s  INVITE (with auth)
T=0.026s  100 Trying
T=1.627s  180 Ringing     ← Phone is ringing (no SDP, Content-Length: 0)
T=12.449s 200 OK          ← Callee answers after ~11s of ringing
T=12.458s RTP starts      ← Agent speaks only now
T=39.721s BYE             ← Normal call end

Key observations:

  • 180 Ringing arrives at T=1.6s (no SDP body, signaling-only)
  • 200 OK arrives at T=12.4s — consistent with a human answering after several rings
  • RTP begins 9ms after 200 OK
  • 200 OK has SDP with sendrecv, codec PCMU, media endpoint 168.86.139.31:14814

Side-by-side comparison

Metric             Call A (problematic)                 Call B (normal)
180 Ringing        Absent                               Present at T=1.6s
200 OK timing      T=3.0s (~3s post-INVITE)             T=12.4s (~12s post-INVITE)
First RTP packet   T=3.034s                             T=12.458s
Agent spoke        Immediately on 200 OK                Immediately on 200 OK
Callee experience  Heard disclosure before phone rang   Normal: heard disclosure after picking up
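
The comparison can also be checked mechanically. The following is a minimal sketch, assuming timeline text in the same "T=...s  EVENT" shape as the two listings in this post. The function names and the 5-second "too fast for a human" threshold are illustrative assumptions, not part of LiveKit or Twilio.

```python
import re
from typing import Dict, List, Tuple

# Matches timeline lines such as "T=3.016s  200 OK" or "T=0.000s  INVITE".
EVENT_RE = re.compile(r"T=(\d+\.\d+)s\s+(\d{3}|[A-Z]+)")

CALL_A = """\
T=0.000s  INVITE
T=0.002s  100 Trying
T=0.023s  407 Proxy Auth
T=0.027s  INVITE (with auth)
T=0.028s  100 Trying
T=3.016s  200 OK
T=3.034s  RTP starts
"""

CALL_B = """\
T=0.000s  INVITE
T=0.001s  100 Trying
T=0.021s  407 Proxy Auth
T=0.025s  INVITE (with auth)
T=0.026s  100 Trying
T=1.627s  180 Ringing
T=12.449s 200 OK
T=12.458s RTP starts
T=39.721s BYE
"""


def parse_timeline(text: str) -> List[Tuple[float, str]]:
    """Extract (seconds, event) pairs; event is a status code or a method name."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def classify(events: List[Tuple[float, str]],
             min_plausible_answer_s: float = 5.0) -> Dict[str, object]:
    """Flag a 200 OK that has no prior 180/183 and arrives implausibly fast.

    Assumes the simplified timeline only contains the 200 OK for the INVITE.
    """
    answers = [t for t, ev in events if ev == "200"]
    invites = [t for t, ev in events if ev == "INVITE"]
    if not answers or not any(t <= answers[0] for t in invites):
        return {"saw_ringing": False, "answer_delay_s": None, "suspicious": False}
    answer_t = answers[0]
    # Measure from the last (authenticated) INVITE sent before the answer.
    last_invite_t = max(t for t in invites if t <= answer_t)
    saw_ringing = any(ev in ("180", "183") and t < answer_t for t, ev in events)
    delay = answer_t - last_invite_t
    suspicious = (not saw_ringing) and delay < min_plausible_answer_s
    return {"saw_ringing": saw_ringing, "answer_delay_s": delay, "suspicious": suspicious}


if __name__ == "__main__":
    print("Call A:", classify(parse_timeline(CALL_A)))
    print("Call B:", classify(parse_timeline(CALL_B)))
```

With the two timelines from this post, Call A is flagged (no ringing, answer about 3 s after the authenticated INVITE) and Call B is not. The threshold is a heuristic only: a fast answer without 180 Ringing can also be a legitimate pickup or an IVR.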

Twilio recording confirms in-band ringback after 200 OK

We confirmed via the Twilio call recording for Call A that ringback tone (ringing sound) is audible in the audio after the SIP 200 OK was received. This means:

  1. Twilio (or the downstream carrier) sent a 200 OK at T=3s, establishing the media path
  2. Ringback tone was then played in-band over RTP — the callee’s phone was still ringing
  3. Our agent received the 200 OK, treated the call as answered, and started speaking
  4. The agent’s speech and the in-band ringback overlapped — the callee had not yet picked up
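
If ringback really is arriving in-band after the 200 OK, it could in principle be detected in the received audio. The following is a rough sketch under several assumptions: the ringback is the North American 440 Hz + 480 Hz tone pair (other countries and carriers use different tones or custom audio), the audio has already been decoded from PCMU to linear samples at 8 kHz, and the 0.8 threshold is an untuned guess. It uses the Goertzel algorithm on 50 ms frames; none of the names below come from LiveKit or Twilio.

```python
import math
from typing import List, Sequence

# Assumption: North American ringback tone pair; other regions differ.
RINGBACK_FREQS_HZ = (440.0, 480.0)


def goertzel_power(samples: Sequence[float], sample_rate: int, freq_hz: float) -> float:
    """Squared DFT magnitude at the bin nearest freq_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * freq_hz / sample_rate)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def ringback_fraction(samples: Sequence[float], sample_rate: int = 8000) -> float:
    """Approximate share of frame energy carried by the two ringback tones."""
    n = len(samples)
    energy = sum(x * x for x in samples)
    if n == 0 or energy == 0.0:
        return 0.0
    tone_power = sum(goertzel_power(samples, sample_rate, f) for f in RINGBACK_FREQS_HZ)
    # For a sinusoid on a DFT bin, 2*|X(k)|^2 / n equals its energy in the frame.
    return 2.0 * tone_power / (n * energy)


def looks_like_ringback(samples: Sequence[float], sample_rate: int = 8000,
                        threshold: float = 0.8) -> bool:
    return ringback_fraction(samples, sample_rate) >= threshold


def synth_tone(freqs: Sequence[float], n: int = 400, sample_rate: int = 8000,
               amp: float = 0.4) -> List[float]:
    """Synthetic test signal: sum of sinusoids (400 samples = 50 ms at 8 kHz)."""
    return [sum(amp * math.sin(2.0 * math.pi * f * i / sample_rate) for f in freqs)
            for i in range(n)]


if __name__ == "__main__":
    print(looks_like_ringback(synth_tone((440.0, 480.0))))  # dual-tone frame
    print(looks_like_ringback(synth_tone((1000.0,))))       # unrelated tone
```

A 400-sample frame at 8 kHz gives 20 Hz bins, so 440 Hz and 480 Hz fall exactly on bins 22 and 24. A real detector would also need to check the on/off cadence across frames before concluding the phone is still ringing.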

Analysis

Our application behaves correctly in both cases — it waits for create_sip_participant (with wait_until_answered=True) to return, then starts agent speech after the SIP 200 OK is received.

In Call A, the SIP 200 OK arrives from Server: Twilio / Twilio Media Gateway after only 3 seconds with no prior 180 Ringing, and the Twilio recording confirms that in-band ringback was still playing after the 200 OK. The PCAP only captures the LiveKit ↔ Twilio leg, so we cannot see what is happening between Twilio and the downstream terminating carrier. Both calls use the same Twilio media IP range (168.86.138.x / 168.86.139.x) and SIP proxy (54.172.60.3 — AWS ec2-54-172-60-3.compute-1.amazonaws.com).
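
One possible application-side mitigation, sketched here under assumptions rather than as a recommended fix: after create_sip_participant returns, hold the agent's first utterance until the callee is actually heard, with a timeout as a fallback. The `callee_spoke` event is hypothetical; it would have to be set by whatever voice-activity detection the application already runs on the callee's audio. No LiveKit API is shown or implied.

```python
import asyncio
from typing import Tuple


async def wait_for_callee_speech(callee_spoke: asyncio.Event, max_wait_s: float) -> bool:
    """Return True if callee speech was signalled before the timeout, else False."""
    try:
        await asyncio.wait_for(callee_spoke.wait(), timeout=max_wait_s)
        return True
    except asyncio.TimeoutError:
        return False


async def demo() -> Tuple[bool, bool]:
    # Case 1: a (simulated) VAD callback fires shortly after the call connects.
    answered = asyncio.Event()
    asyncio.get_running_loop().call_later(0.01, answered.set)
    heard = await wait_for_callee_speech(answered, max_wait_s=1.0)

    # Case 2: nobody speaks, so the wait ends at the timeout.
    silent = asyncio.Event()
    heard_silent = await wait_for_callee_speech(silent, max_wait_s=0.05)
    return heard, heard_silent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The trade-off is that a callee who picks up and says nothing delays the agent by the full timeout, so the timeout value would need tuning against real calls.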

Questions

  1. Has anyone else encountered this with Twilio SIP trunking where certain destination numbers receive a 200 OK without a prior 180 Ringing?
  2. Is there a recommended way to handle this “early answer / false connect” scenario?
  3. Should this be raised with Twilio directly, given that the 200 OK originates from their media gateway?

Any guidance would be appreciated. Happy to share PCAPs privately if needed.

Based purely on the signaling you described, this sounds like a Twilio / downstream carrier issue to me.

Thanks for the response. I will raise this with Twilio support.