Server-initiated migration fails to resume on agents 1.4.6 — subscriber + publisher PC fail, no recovery, process killed (expected fixed in >1.4.2 per agents #4705)

Zaheer_Abbas · June 16, 2026, 1:47pm

Setup: voice agents on livekit-agents 1.4.6 / livekit 1.1.2 / livekit-api 1.0.7 (Python 3.11), inbound SIP telephony, LiveKit Cloud (project p_3tqm7ro6kbs). Each call = one agent job; ctx.connect() is invoked at job entry.

Background: we previously reported subscriber PeerConnection failures in livekit/agents#4705, where the guidance was that this should be addressed in >1.4.2. We have since upgraded to 1.4.6 and still hit it — this time clearly triggered by a server-initiated migration, after which the connection never recovers.

What we saw (one inbound call, 2026-06-16 UTC). The call ran normally for ~34s. Then LiveKit Cloud issued a migration, and the connection never recovered from the Resume:

13:07:10.796  Participant migration — rtc_engine: received session close: "server request to leave"  (reason: Migration / Resume)
13:07:21.805  rtc_session:762   signal_event taking too much time: Answer(SessionDescription { type: "answer", ... a=ice-lite ... a=recvonly ... })
13:07:29.414  rtc_session:1161  Subscriber pc state failed   → resuming connection... attempt: 0
13:07:29.510  rtc_session:919   Wrong packet sequence while retrying: 1046
13:07:47.204  rtc_session:1161  Publisher pc state failed    → resume
13:11:52.443  rtc_session:1161  Subscriber pc state failed   → resume
13:15:19       livekit.agents: process exited with non-zero exit code -9

Sequence: a server migration → during the Resume the SDK spent >10s processing the new subscriber Answer (the signal_event taking too much time watchdog) → the subscriber PeerConnection failed → the publisher PeerConnection failed → the SDK retried Resume ~3× over ~4.5 minutes (Wrong packet sequence while retrying) and never reconnected → the process exited with -9. From the caller’s side the agent went silent at ~13:07:11, mid-conversation — a dropped call.

IDs (LiveKit Cloud, project p_3tqm7ro6kbs): room RM_viPL8m3AqU8d, job AJ_m4tsujv4U4Fs, worker AW_GnAj5VMfxtbX, window ~13:07:10–13:15:19 UTC on 2026-06-16.

Please help us investigate this issue and let us know what mitigations we can put in place.

Happy to share full agent logs for the room/job above.

Zaheer_Abbas · June 16, 2026, 1:49pm

GitHub issue livekit/agents #4705: Participant Migration and State Mismatch issues in certain SIP calls
Opened 08:42 PM, 03 Feb 2026 UTC by zaheerabbas-prodigal · closed 07:55 PM, 13 Feb 2026 UTC · label: bug

### Bug Description

We're seeing intermittent issues around SIP participant migration/reconnect while running `livekit-agents` (Python) on Kubernetes. In LiveKit Analytics (LiveKit Cloud dashboard) the SIP identity often shows up twice in the same room (leave + immediate rejoin), and in our logs we frequently see:

`livekit::rtc_engine - received session close: "server request to leave" Migration Resume`

After that point we see two related behaviors:

1) `ParticipantDisconnected` fires but `participant.disconnect_reason` is `null`
   - In Analytics, we see the SIP participant leave labeled "Participant left (MIGRATION)" and the agent participant leave labeled "Participant left (STATE_MISMATCH)".
   - We currently use `disconnect_reason` to decide whether to treat the disconnect as a migration/reconnect vs. a real hangup. When it's `null` we can't reliably distinguish the two, and sometimes end the call early by deleting the room.
2) In other cases we never get `ParticipantDisconnected` at all
   - Shortly after Migration Resume we see `signal_event taking too much time: Answer(SessionDescription ...)`.
   - Then ~60-90s later the agent job is force-killed because it doesn't exit cleanly.
   - In some cases the call goes into `pc_state failed` indefinitely and does not recover unless the process is killed manually.

Example logs for a single call (`disconnect_reason` is null):

```json
{"message": "livekit::rtc_engine:474:livekit::rtc_engine - received session close: \"server request to leave\" Migration Resume", "level": "WARNING", "name": "livekit", "worker_id": "AW_mK6w97YC5D3J", "pid": 270148, "timestamp": "2026-01-23T20:03:21.285929+00:00"}
{"message": "livekit::rtc_engine:474:livekit::rtc_engine - received session close: \"signal client closed: \\\"stream closed\\\"\" UnknownReason Resume", "level": "WARNING", "name": "livekit", "worker_id": "AW_mK6w97YC5D3J", "timestamp": "2026-01-23T20:03:21.286151+00:00"}
{"message": "livekit::rtc_engine:773:livekit::rtc_engine - resuming connection... attempt: 0", "level": "ERROR", "name": "livekit", "worker_id": "AW_mK6w97YC5D3J", "pid": 270148, "timestamp": "2026-01-23T20:03:21.286258+00:00"}
{"levelname": "INFO", "process": 270148, "event": "Participant Disconnected", "participant": "rtc.RemoteParticipant(sid=PA_KmSLmD9jKnjv, identity=sip_+XXXXXX, name=Phone +XXXXXX)", "kind": 3, "identity": "sip_+XXXX", "disconnect_reason": null, "timestamp": "2026-01-23T20:03:24.347553+00:00"}
```

Example logs in the "stuck" case:

```json
{"message": "livekit::rtc_engine:474:livekit::rtc_engine - received session close: \"server request to leave\" Migration Resume", "level": "WARNING", "name": "livekit", "pid": 139944, "job_id": "AJ_vau2WWRWuKjG", "timestamp": "2026-01-30T22:09:45.271724+00:00"}
{"message": "livekit::rtc_engine:474:livekit::rtc_engine - received session close: \"signal client closed: \\\"stream closed\\\"\" UnknownReason Resume", "level": "WARNING", "name": "livekit", "pid": 139944, "job_id": "AJ_vau2WWRWuKjG", "timestamp": "2026-01-30T22:09:45.271995+00:00"}
{"message": "livekit::rtc_engine:773:livekit::rtc_engine - resuming connection... attempt: 0", "level": "ERROR", "name": "livekit", "pid": 139944, "timestamp": "2026-01-30T22:09:45.272345+00:00"}
{"message": "livekit::rtc_engine::rtc_session:715:livekit::rtc_engine::rtc_session - signal_event taking too much time: Answer(SessionDescription { r#type: \"answer\", sdp: \"v=0\\r\\no=- 6603763378326092889 1769810986 IN IP4 0.0.0.0\\r\\ns=-\\r\\nt=0 0\\r\\na=msid-semantic:WMS *\\r\\na=fingerprint:sha-256 2F:7D:65:03:79:2A:C8:E8:2A:DB:C8:EA:63:80:24:D0:E1:A5:0A:99:ED:84:F8:5E:64:3C:E8:38:1E:EC:74:5B\\r\\na=ice-lite\\r\\na=extmap-allow-mixed\\r\\na=group:BUNDLE 0 1\\r\\nm=audio 9 UDP/TLS/RTP/SAVPF 63 111 0 8\\r\\nc=IN IP4 0.0.0.0\\r\\na=setup:active\\r\\na=mid:0\\r\\na=ice-ufrag:moWOANYkpxRtTanA\\r\\na=ice-pwd:tUgqKFpgfzQQFgccENCyKxxIPUQJOHnC\\r\\na=rtcp-mux\\r\\na=rtcp-rsize\\r\\na=rtpmap:63 red/48000/2\\r\\na=fmtp:63 111/111\\r\\na=rtpmap:111 opus/48000/2\\r\\na=fmtp:111 minptime=10;useinbandfec=1;usedtx=1\\r\\na=rtpmap:0 PCMU/8000\\r\\na=rtpmap:8 PCMA/8000\\r\\na=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level\\r\\na=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid\\r\\na=recvonly\\r\\nm=application 9 UDP/DTLS/SCTP webrtc-datachannel\\r\\nc=IN IP4 0.0.0.0\\r\\na=setup:active\\r\\na=mid:1\\r\\na=sendrecv\\r\\na=sctp-port:5000\\r\\na=max-message-size:65535\\r\\na=ice-ufrag:moWOANYkpxRtTanA\\r\\na=ice-pwd:tUgqKFpgfzQQFgccENCyKxxIPUQJOHnC\\r\\n\", id: 0, mid_to_track_id: {} })", "level": "ERROR", "name": "livekit", "pid": 139944, "job_id": "AJ_vau2WWRWuKjG", "timestamp": "2026-01-30T22:09:56.411066+00:00"}
{"message": "livekit::rtc_engine::rtc_session:1041:livekit::rtc_engine::rtc_session - Subscriber pc state failed", "level": "ERROR", "name": "livekit", "pid": 139944, "job_id": "AJ_vau2WWRWuKjG", "room_id": "RM_RWpxP322Ynvs", "timestamp": "2026-01-30T22:10:05.096736+00:00"}
{"message": "livekit::rtc_engine:474:livekit::rtc_engine - received session close: \"pc_state failed\" UnknownReason Resume", "level": "WARNING", "name": "livekit", "pid": 139944, "job_id": "AJ_vau2WWRWuKjG", "room_id": "RM_RWpxP322Ynvs", "timestamp": "2026-01-30T22:10:05.097985+00:00"}
{"message": "livekit::rtc_engine:773:livekit::rtc_engine - resuming connection... attempt: 0", "level": "ERROR", "name": "livekit", "pid": 139944, "job_id": "AJ_vau2WWRWuKjG", "room_id": "RM_RWpxP322Ynvs", "timestamp": "2026-01-30T22:10:05.098116+00:00"}
{"message": "process did not exit in time, killing process", "level": "ERROR", "name": "livekit.agents", "pid": 139944, "job_id": "AJ_vau2WWRWuKjG", "room_id": "RM_RWpxP322Ynvs", "timestamp": "2026-01-30T22:11:06.059865+00:00"}
{"message": "killing process", "level": "INFO", "name": "livekit.agents", "pid": 139944, "job_id": "AJ_vau2WWRWuKjG", "room_id": "RM_RWpxP322Ynvs", "timestamp": "2026-01-30T22:11:06.060096+00:00"}
{"message": "sending SIGUSR1 signal to process", "level": "INFO", "name": "livekit.agents", "pid": 139944, "job_id": "AJ_vau2WWRWuKjG", "room_id": "RM_RWpxP322Ynvs", "timestamp": "2026-01-30T22:11:06.060173+00:00"}
{"message": "process exited with non-zero exit code -10", "level": "ERROR", "name": "livekit.agents", "pid": 139944, "job_id": "AJ_vau2WWRWuKjG", "room_id": "RM_RWpxP322Ynvs", "timestamp": "2026-01-30T22:11:06.085801+00:00"}
```

Questions:

1. What typically triggers a SIP participant “Migration” in LiveKit Cloud: SIP/dialer behavior (trunk/provider, SIP edge, network) vs. LiveKit-initiated migration? Are there recommended configuration changes or best practices to reduce how often this happens (or avoid it entirely)? Also, when migration happens, `ParticipantDisconnected.disconnect_reason` is sometimes null (which is derived from the “Unknown” status, [per code here](https://github.com/livekit/python-sdks/blob/main/livekit-rtc/livekit/rtc/participant.py#L142C1-L144C1)), which makes it hard to safely decide whether to keep the call alive. What's the correct signal or recommended handling here?
2. What does “STATE_MISMATCH” mean in this context (what state is mismatching, and why)? What are the common root causes, and are there specific mitigations we can apply to reduce the frequency of these events?

### Expected Behavior

- If the server considers a SIP participant leave to be a migration/reconnect, the SDK callback should include a non-null `disconnect_reason` (or another reliable signal) so apps can avoid treating it as a real hangup.
- If Analytics shows SIP participant "MIGRATION" or agent participant "STATE_MISMATCH", we'd expect the SDK callback to report something consistent.
- WebRTC signaling (`Answer(SessionDescription ...)`) during/after Migration Resume should be handled in the RTC package.

### Reproduction Steps

We don't have a deterministic minimal repro yet (production-only so far), but we have provided additional room/job IDs and full logs.

- I have ensured this is not a networking issue on our end.
- I have also looked at the SIP PCAP logs available on the LiveKit dashboard but didn't notice any inconsistency.

### Operating System

Linux (containerized on AWS EKS, Kubernetes, us-east-2, python 3.11)

### Models Used

STT: Deepgram; TTS: ElevenLabs; VAD: Silero

### Package Versions

```bash
livekit==1.0.23
livekit-agents==1.3.10
livekit-api==1.0.7
livekit-blingfire==1.1.0
livekit-plugins-anthropic==1.3.10
livekit-plugins-cartesia==1.3.10
livekit-plugins-deepgram==1.3.10
livekit-plugins-elevenlabs==1.3.10
livekit-plugins-google==1.3.10
livekit-plugins-noise-cancellation==0.2.5
livekit-plugins-openai==1.3.10
livekit-plugins-silero==1.3.10
livekit-plugins-turn-detector==1.3.10
livekit-protocol==1.1.1
```

### Session/Room/Call IDs

Below are the room IDs. LiveKit Cloud Project ID: `p_3tqm7ro6kbs`

#### Case A - `disconnect_reason: null`

| Date | Job ID | Room ID | Trunk Provider |
|---|---|---|---|
| 2026-01-30 | AJ_LgybQoHWHrkk | RM_qbb2qTaMaNji | TCN |
| 2026-01-30 | AJ_wvD4L2VH6daA | RM_dkqfDw3QVRBR | TCN |
| 2026-01-29 | AJ_8As8YVjvbSP2 | RM_8Vi9SAaw9bPb | TCN |
| 2026-01-23 | AJ_aJdzGbmjGSjJ | RM_fxTBN2HdgiWD | TCN |
| 2026-01-12 | AJ_HRXkAVJJGjZg | RM_NrLqAFpnZghB | TCN |
| 2026-01-09 | AJ_toohMYkYky5w | RM_m7FRVEEpXb6H | TCN |
| 2026-01-08 | AJ_Q6wXWJt2cysu | RM_ugZKppSeJ6kt | TCN |
| 2026-01-05 | AJ_saxUGSUn582i | RM_VBwD4aEFvjeW | TCN |
| 2025-12-30 | AJ_7stG8Zwipi3d | RM_UYLyTqfXguKo | TCN |

#### Case B - `signal_event taking too much time` + forced kill (STATE_MISMATCH on agent participant unless noted)

| Date | Job ID | Room ID | Trunk Provider | Agent `STATE_MISMATCH`? | Notes |
|---|---|---|---|---|---|
| 2026-01-30 | AJ_vau2WWRWuKjG | RM_RWpxP322Ynvs | TCN | Yes | |
| 2026-01-30 | AJ_FxykEihmLkRZ | RM_xwGg6ycTR5mM | TCN | Yes | |
| 2026-01-30 | AJ_pGHoG2R4rsGh | RM_H35AZSeWg7Rw | TCN | Yes | |
| 2026-01-30 | AJ_65HkxUK6sN5C | RM_uwV8tvyThkCh | TCN | Yes | |
| 2026-01-30 | AJ_prtvEq4PmyfR | RM_CszADLmSGjYG | TCN | No | Process killed with log `signal_event taking too much time` |
| 2026-01-30 | AJ_Fcz7bngRtZeT | RM_NWiZ4dYXzoop | TCN | Yes | |
| 2026-01-28 | AJ_83dBFfUka3X8 | RM_zhSMDQisi85x | TCN | Yes | |
| 2026-01-22 | AJ_Yq94XDEtxGHy | RM_BNogVezdPxUR | Twilio | Yes | |
| 2026-01-05 | AJ_zGnGbJRYqisF | RM_cf6Zih5UK2k3 | TCN | No | Process killed with log `signal_event taking too much time` |
| 2026-01-05 | AJ_KRSHaoG93jh9 | RM_t2DUGMJvgwmq | TCN | Yes | |

#### Case B.2 - hang (not auto-killed; requires manual intervention)

| Date | Job ID | Room ID | Trunk Provider | Notes |
|---|---|---|---|---|
| 2026-02-04 | AJ_ak4H8xArtcWr | RM_qVJj2MFdhfo6 | TCN | Process hangs indefinitely; requires manual kill |

Logs for Case B.2:

```json
{"message": "livekit::rtc_engine:474:livekit::rtc_engine - received session close: \"server request to leave\" Migration Resume", "level": "WARNING", "name": "livekit", "worker_id": "AW_FoKfHKbkVaAq", "call_id": "call_AJ_ak4H8xArtcWr", "pid": 186848, "job_id": "AJ_ak4H8xArtcWr", "room_id": "RM_qVJj2MFdhfo6", "timestamp": "2026-02-03T20:07:47.178137+00:00"}
{"message": "livekit::rtc_engine:474:livekit::rtc_engine - received session close: \"signal client closed: \\\"stream closed\\\"\" UnknownReason Resume", "level": "WARNING", "name": "livekit", "worker_id": "AW_FoKfHKbkVaAq", "pid": 186848, "job_id": "AJ_ak4H8xArtcWr", "room_id": "RM_qVJj2MFdhfo6", "timestamp": "2026-02-03T20:07:47.178858+00:00"}
{"message": "livekit::rtc_engine:773:livekit::rtc_engine - resuming connection... attempt: 0", "level": "ERROR", "name": "livekit", "pid": 186848, "job_id": "AJ_ak4H8xArtcWr", "room_id": "RM_qVJj2MFdhfo6", "timestamp": "2026-02-03T20:07:47.179020+00:00"}
{"message": "livekit::rtc_engine::rtc_session:715:livekit::rtc_engine::rtc_session - signal_event taking too much time: Answer(SessionDescription { r#type: \"answer\", sdp: \"v=0\\r\\no=- 7106784265579657634 1770149268 IN IP4 0.0.0.0\\r\\ns=-\\r\\nt=0 0\\r\\na=msid-semantic:WMS *\\r\\na=fingerprint:sha-256 49:6D:BC:DF:5C:CC:07:18:6D:4A:9D:76:71:27:EF:74:AC:51:43:8B:9A:8A:4D:DC:B0:46:8C:79:A5:1A:FF:C9\\r\\na=ice-lite\\r\\na=extmap-allow-mixed\\r\\na=group:BUNDLE 0 1\\r\\nm=audio 9 UDP/TLS/RTP/SAVPF 63 111 0 8\\r\\nc=IN IP4 0.0.0.0\\r\\na=setup:active\\r\\na=mid:0\\r\\na=ice-ufrag:NoojDQQFKcpybZYn\\r\\na=ice-pwd:YtZlOyYwiYbhwLCAoJGjulBjJIMPwjqw\\r\\na=rtcp-mux\\r\\na=rtcp-rsize\\r\\na=rtpmap:63 red/48000/2\\r\\na=fmtp:63 111/111\\r\\na=rtpmap:111 opus/48000/2\\r\\na=fmtp:111 minptime=10;useinbandfec=1;usedtx=1\\r\\na=rtpmap:0 PCMU/8000\\r\\na=rtpmap:8 PCMA/8000\\r\\na=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid\\r\\na=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level\\r\\na=recvonly\\r\\nm=application 9 UDP/DTLS/SCTP webrtc-datachannel\\r\\nc=IN IP4 0.0.0.0\\r\\na=setup:active\\r\\na=mid:1\\r\\na=sendrecv\\r\\na=sctp-port:5000\\r\\na=max-message-size:65535\\r\\na=ice-ufrag:NoojDQQFKcpybZYn\\r\\na=ice-pwd:YtZlOyYwiYbhwLCAoJGjulBjJIMPwjqw\\r\\n\", id: 0, mid_to_track_id: {} })", "level": "ERROR", "name": "livekit", "worker_id": "AW_FoKfHKbkVaAq", "pid": 186848, "job_id": "AJ_ak4H8xArtcWr", "room_id": "RM_qVJj2MFdhfo6", "timestamp": "2026-02-03T20:07:58.256015+00:00"}
{"message": "livekit::rtc_engine::rtc_session:1041:livekit::rtc_engine::rtc_session - Subscriber pc state failed", "level": "ERROR", "name": "livekit", "worker_id": "AW_FoKfHKbkVaAq", "pid": 186848, "job_id": "AJ_ak4H8xArtcWr", "room_id": "RM_qVJj2MFdhfo6", "timestamp": "2026-02-03T20:08:07.207458+00:00"}
{"message": "livekit::rtc_engine:474:livekit::rtc_engine - received session close: \"pc_state failed\" UnknownReason Resume", "level": "WARNING", "name": "livekit", "worker_id": "AW_FoKfHKbkVaAq", "pid": 186848, "job_id": "AJ_ak4H8xArtcWr", "room_id": "RM_qVJj2MFdhfo6", "timestamp": "2026-02-03T20:08:07.208092+00:00"}
{"message": "livekit::rtc_engine:773:livekit::rtc_engine - resuming connection... attempt: 0", "level": "ERROR", "name": "livekit", "worker_id": "AW_FoKfHKbkVaAq", "pid": 186848, "job_id": "AJ_ak4H8xArtcWr", "room_id": "RM_qVJj2MFdhfo6", "timestamp": "2026-02-03T20:08:07.208260+00:00"}
{"message": "livekit::rtc_engine::rtc_session:870:livekit::rtc_engine::rtc_session - Wrong packet sequence while retrying: 135 > 128, 7 packets missing", "level": "WARNING", "name": "livekit", "worker_id": "AW_FoKfHKbkVaAq", "pid": 186848, "job_id": "AJ_ak4H8xArtcWr", "room_id": "RM_qVJj2MFdhfo6", "timestamp": "2026-02-03T20:08:07.309522+00:00"}
---below logs are generated every 5 mins until process is killed manually---
{"message": "livekit::rtc_engine::rtc_session:1041:livekit::rtc_engine::rtc_session - Publisher pc state failed", "level": "ERROR", "name": "livekit", "worker_id": "AW_FoKfHKbkVaAq", "pid": 186848, "job_id": "AJ_ak4H8xArtcWr", "room_id": "RM_qVJj2MFdhfo6", "timestamp": "2026-02-03T20:11:41.134958+00:00"}
{"message": "livekit::rtc_engine:474:livekit::rtc_engine - received session close: \"pc_state failed\" UnknownReason Resume", "level": "WARNING", "name": "livekit", "worker_id": "AW_FoKfHKbkVaAq", "pid": 186848, "job_id": "AJ_ak4H8xArtcWr", "room_id": "RM_qVJj2MFdhfo6", "timestamp": "2026-02-03T20:11:41.135196+00:00"}
```

### Proposed Solution

NA

### Additional Context

- This happens intermittently (production only so far); we don't have a deterministic repro.
- Trunk providers: mostly TCN; one occurrence via Twilio.
- In Analytics the SIP identity often appears twice in the same room (leave + immediate rejoin) around the time we see "Migration Resume".
- In the stuck cases, `process did not exit in time, killing process` happens ~60-90s after `signal_event taking too much time`.

### Screenshots and Recordings

_No response_

Reported here too

CWilson · June 16, 2026, 2:26pm

I think the reconnect hardening is not in 1.1.2 but in 1.1.8, along with this PR in 1.6.x: fix quick reconnect participant keyerror by tinalenguyen · Pull Request #5979 · livekit/agents · GitHub. Right?

Zaheer_Abbas · June 16, 2026, 2:40pm

The PR you linked above seems to address a different issue.

Participant Migration and State Mismatch issues in certain SIP calls · Issue #4705 · livekit/agents · GitHub → check this issue, where I raised the same STATE_MISMATCH and Subscriber pc state failed errors. It was mentioned here (Participant Migration and State Mismatch issues in certain SIP calls · Issue #4705 · livekit/agents · GitHub) that this was fixed in livekit/agents 1.4.2, which corresponds to rtc 1.2.0.

CWilson · June 16, 2026, 2:45pm

I think your mapping is a little off. Agents 1.4.2 and 1.4.6 both pin `livekit==1.1.2` in their pyproject.toml, and the current livekit-agents 1.6.0 pins `livekit==1.1.9`. Being on agents 1.4.6 leaves you on the same rtc 1.1.2, which is where I believe the bug lives.

It does not seem to be fixed in 1.4.6, as you demonstrated at the top of this thread.
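Because the disagreement here is about which `livekit` (rtc) build an agents release actually resolves to, it can help to log the installed versions from inside the running worker instead of inferring them from the agents version. Below is a stdlib-only sketch using `importlib.metadata`. `parse_release`, `at_least`, and `report` are hypothetical helper names. The simple parser ignores PEP 440 pre-release ordering (use `packaging.version` if you need that), and the `1.1.9` minimum is only the pin mentioned above for agents 1.6.0, not a version confirmed to contain the fix.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_release(v: str) -> Tuple[int, ...]:
    """'1.1.2' -> (1, 1, 2). Stops at the first non-numeric suffix, so
    '1.1.2rc1' -> (1, 1, 2); this is NOT full PEP 440 ordering."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # segment carried a suffix such as "rc1"; ignore the rest
    return tuple(parts)


def at_least(installed: str, minimum: str) -> bool:
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "1.1" compares equal to "1.1.0".
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def report(dist: str, minimum: str) -> str:
    try:
        installed = version(dist)
    except PackageNotFoundError:
        return f"{dist}: not installed"
    verdict = "ok" if at_least(installed, minimum) else f"older than {minimum}"
    return f"{dist}=={installed}: {verdict}"


if __name__ == "__main__":
    # "livekit" is the rtc package. 1.1.9 is only what agents 1.6.0 reportedly
    # pins in this thread, not a version confirmed to contain the fix.
    print(report("livekit", "1.1.9"))
    print(report("livekit-agents", "1.6.0"))
```

Logging this once per worker at startup makes it unambiguous which rtc version was running when an incident like the one above occurred.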

I think these are the needed PRs:

fix full_reconnect downgrade & don't ignore Leave messages by theomonnom · Pull Request #893 · livekit/rust-sdks · GitHub
Participant Migration and State Mismatch issues in certain SIP calls · Issue #4705 · livekit/agents · GitHub

We may also need this one that is not merged yet:

harden reconnect behaviour by lukasIO · Pull Request #1148 · livekit/rust-sdks · GitHub

Zaheer_Abbas · June 16, 2026, 3:25pm

Sorry, I mentioned 1.2.0 in my earlier message by mistake.

Check this GH comment from February: Participant Migration and State Mismatch issues in certain SIP calls · Issue #4705 · livekit/agents · GitHub
And the PR linked to that issue: fix full_reconnect downgrade & don't ignore Leave messages by theomonnom · Pull Request #893 · livekit/rust-sdks · GitHub. This PR was released in rtc 1.1.2.
This comment on the issue (Participant Migration and State Mismatch issues in certain SIP calls · Issue #4705 · livekit/agents · GitHub), posted on February 13, explicitly says this will be fixed in the next livekit/agents release, and that release (Release livekit-agents@1.4.2 · livekit/agents · GitHub) was published on February 17.


You also mentioned the issue I raised myself, and the fix that was merged and released in rtc 1.1.2.

I don’t think my mapping is off here.

This still seems to be an issue, even after the fix that was applied in February.

CWilson · June 16, 2026, 3:28pm

OK, maybe the agents team will weigh in on that comment you made in the PR. I don't think it is fixed in 1.4.6, but maybe I am wrong.

If it is still broken in 1.6.0, it won't be fixed until 1.6.1 at the earliest. I will flag this thread to the team.

If you have a code example that reproduces the issue and the steps they should follow, I can test it with 1.6.0 and see if it works.

Zaheer_Abbas · June 16, 2026, 3:57pm

The problem is that this is NOT reproducible: it occurs intermittently, and only during production SIP calls. I haven't been able to reproduce it myself under a range of adverse network conditions, and it seems to be an issue on the LiveKit Cloud server side.

CWilson · June 16, 2026, 3:58pm

I will ask if we can add some extra logging in 1.6.1 and see if we can get something that will help you find it.

From the server logs, it looks like the agent applied the migration Answer too late (by then it was stale); that is the same event the SDK logs as signal_event taking too much time.
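Since the failure cannot be triggered on demand, one way to build a case list (and to check an upgrade later) is to scan the agents' own JSON logs for the pattern described here: a Migration session close followed by `signal_event taking too much time`. Below is a minimal sketch, assuming newline-delimited JSON records shaped like those in the issue above (`message`, `pid`, `timestamp`). It correlates by `pid` on the assumption that each job runs in its own process. The marker strings are copied from those logs, and the function names are hypothetical. Lines that are not valid JSON are skipped.

```python
import json
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Substrings copied from the rtc_engine / rtc_session log lines in this thread.
MIGRATION_CLOSE = '"server request to leave" Migration Resume'
SLOW_SIGNAL = "signal_event taking too much time"


def find_slow_resumes(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """For each migration close later followed (same pid) by a
    'signal_event taking too much time' record, return (pid, seconds between)."""
    migration_at: Dict[str, datetime] = {}
    hits: List[Tuple[str, float]] = []
    for raw in lines:
        raw = raw.strip()
        if not raw.startswith("{"):
            continue  # skip separators like "---below logs ...---"
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if "timestamp" not in rec or "pid" not in rec:
            continue  # cannot correlate records without these fields
        msg = rec.get("message", "")
        pid = str(rec["pid"])
        ts = datetime.fromisoformat(rec["timestamp"])
        if MIGRATION_CLOSE in msg:
            migration_at[pid] = ts
        elif SLOW_SIGNAL in msg and pid in migration_at:
            delay = (ts - migration_at.pop(pid)).total_seconds()
            hits.append((pid, delay))
    return hits


def scan_file(path: str) -> List[Tuple[str, float]]:
    with open(path, encoding="utf-8") as fh:
        return find_slow_resumes(fh)
```

For the stuck-case records in the issue, the gap between the migration close and the slow-Answer log is about 11.1 s, close to the ~11.0 s gap (13:07:10.796 to 13:07:21.805) in the June 16 call.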

CWilson · June 17, 2026, 11:49am

The team dug deeper into the server-side logs for the session referenced above. We believe this PR will address the issue. It appears the failure happens when the resume is somehow unsuccessful, and this PR addresses that case.

Once this is released, I hope you can let us know if the issue is resolved for you.

Zaheer_Abbas · June 17, 2026, 2:22pm

Thanks for the update, @CWilson. Any ETA on when this will be released, and which versions of livekit/agents and the rtc SDKs we will need to upgrade to?

Is there any way to verify that this fixes things before we upgrade? Last time we were also told that upgrading to 1.4.x would solve the issue.

CWilson · June 17, 2026, 2:58pm

That PR includes a test that reproduces what we saw in the server logs 100% of the time, and it passes once the fix is applied.

See: tests/peer_connection_signaling_test.rs

The only way to know for sure is to try it in your test environment. Running that PR yourself would be a good way to verify that it fixes the issue.

There may be a release today, but I am not sure if this will make it in time for that. I am pushing to get it added, but the changes still need to be reviewed before it can be released. I am not sure what the version numbers will be.

I can post in this thread once it is released.

Zaheer_Abbas · June 17, 2026, 4:34pm

OK, thanks. We will test it once it is released, using the test under the tests folder. Much appreciated.
