LiveKit Cloud GStreamer Pipeline Crashing with Connection Timeout (~4 min mark)

Urgent: Intermittent CONNECTION_TIMEOUT / action=Resume causing livekitwebrtcsink publisher pipeline crashes

Hi LiveKit community,

I want to flag an urgent issue with our LiveKit integration that started recently and is actively disrupting customer sessions.

Summary

We use the GStreamer LiveKit plugin, livekitwebrtcsink, in a pipeline to publish a video/audio feed into our LiveKit rooms as a WebRTC participant.

This pipeline consistently crashes roughly 4 minutes after a participant connects.

From the logs, we can see the pipeline receives a connection-timeout message with action=Resume, which appears to signal that the client should attempt to resume the session. However, the GStreamer LiveKit client does not handle this reconnection signal and instead stops entirely.
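To make the expected behavior concrete, here is a minimal Python sketch of the decision we believe the signaller should be making. The types and field names mirror what appears in our logs; they are illustrative, not the plugin's actual API:

```python
from dataclasses import dataclass

# Illustrative model of the leave message fields seen in our logs;
# NOT the plugin's actual types.
@dataclass
class LeaveRequest:
    reason: str          # e.g. "ConnectionTimeout"
    action: str          # e.g. "Resume"
    can_reconnect: bool

def should_resume(leave: LeaveRequest) -> bool:
    """What we expected: action=Resume means attempt to resume the
    session rather than tearing the pipeline down."""
    return leave.action == "Resume"

# The message we observe: the client stops instead of resuming.
observed = LeaveRequest(reason="ConnectionTimeout", action="Resume",
                        can_reconnect=False)
print(should_resume(observed))  # → True: a resume attempt is expected
```

In other words, our reading is that action=Resume should trigger a resume attempt inside the client, not a pipeline shutdown.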

Key details

  • The affected participant only publishes video and audio. It does not subscribe to anything.

  • It connects to the LiveKit room using an API key/secret pair rather than a pre-generated access token.

  • The issue appeared simultaneously across all of our environments:

    • production

    • staging

    • local machines

  • There were no changes on our side when this began. No deploys, updates, or config changes.

  • We have been connecting to LiveKit Cloud server version v1.9.11 in staging for over a month, confirmed via logs.

  • This issue previously appeared and then stopped on its own:

    • first occurrences started around March 3, 2026

    • they stopped around 2026-03-04 21:30:00 UTC

  • New occurrences started again on 2026-03-11 13:30:00 UTC and are currently ongoing.

Because the behavior started and stopped abruptly with no changes on our side, and affected all environments simultaneously, this strongly suggests an infrastructure-side factor is involved.

Possible connection to UDP-to-TCP fallback / reconnect handling

We suspect this may be related to the UDP-to-TCP fallback / reconnect behavior introduced around LiveKit v1.8.5, since that logic can produce this kind of timeout message.

Specifically, we are seeing a LeaveRequest with:

  • reason: ConnectionTimeout

  • action: Resume

  • region candidates included in the payload

But instead of resuming, the GStreamer pipeline stops.
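Our assumption is that the region candidates in the payload exist so a resuming client can fail over to the nearest region. A minimal sketch of that selection, using the `url`/`distance` fields exactly as they appear in the RegionInfo entries in our logs (the helper name is ours, and the URLs below are placeholders, not our real endpoints):

```python
def pick_nearest_region(regions: list[dict]) -> str:
    """Return the URL of the region candidate with the smallest
    distance. Field names mirror the RegionInfo entries in the logs."""
    if not regions:
        raise ValueError("no region candidates in LeaveRequest")
    return min(regions, key=lambda r: r["distance"])["url"]

# Two candidates shaped like the staging log below (placeholder URLs):
candidates = [
    {"region": "osaopaulo1b",
     "url": "https://example.osaopaulo1b.production.livekit.cloud",
     "distance": 89412},
    {"region": "ojohannesburg1a",
     "url": "https://example.ojohannesburg1a.production.livekit.cloud",
     "distance": 7503272},
]
print(pick_nearest_region(candidates))  # → the osaopaulo1b URL
```

If that is the intended use, the plugin stopping outright means the region list is effectively ignored.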

Example session

One affected session from a user report:

room-gst-producer receiving Participant left (CONNECTION_TIMEOUT)

Session RM_dDBeHJhEr9ke

Relevant logs

A few seconds before the crash:

0:04:01.527049115   436 0x7aa948007780 DEBUG   webrtc-livekit-signaller net/webrtc/src/livekit_signaller/imp.rs:316:gstrswebrtc::livekit_signaller::imp::Signaller::on_signal_event::{{closure}}:<GstLiveKitWebRTCSinkSignaller@0x5cb778f8a250> Connection quality: ConnectionQualityUpdate { updates: [ConnectionQualityInfo { participant_sid: "PA_5NLgLKCgQJMm", quality: Lost, score: 1.252 }] }

At the point where the GStreamer pipeline crashes:

0:04:04.970865139   436 0x7aa948007780 DEBUG   webrtc-livekit-signaller net/webrtc/src/livekit_signaller/imp.rs:337:gstrswebrtc::livekit_signaller::imp::Signaller::on_signal_event::{{closure}}:<GstLiveKitWebRTCSinkSignaller@0x5cb778f8a250> Leave: LeaveRequest { can_reconnect: false, reason: ConnectionTimeout, action: Resume, regions: Some(RegionSettings { regions: [RegionInfo { region: "osaopaulo1b", url: "https://teleo-staging-bk1tdl1w.osaopaulo1b.production.livekit.cloud", distance: 89412 }, RegionInfo { region: "ojohannesburg1a", url: "https://teleo-staging-bk1tdl1w.ojohannesburg1a.production.livekit.cloud", distance: 7503272 }, RegionInfo { region: "oashburn1b", url: "https://teleo-staging-bk1tdl1w.oashburn1b.production.livekit.cloud", distance: 7603430 }, RegionInfo { region: "ochicago1b", url: "https://teleo-staging-bk1tdl1w.ochicago1b.production.livekit.cloud", distance: 8335785 }, RegionInfo { region: "omarseille1b", url: "https://teleo-staging-bk1tdl1w.omarseille1b.production.livekit.cloud", distance: 9152002 }, RegionInfo { region: "ophoenix1b", url: "https://teleo-staging-bk1tdl1w.ophoenix1b.production.livekit.cloud", distance: 9312041 }] }) }

What we have done so far

As a short-term mitigation, we have implemented a pipeline restart when this specific failure occurs.

That reduces the impact, but it is not a real fix, and customers are still being disrupted.
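Our mitigation is essentially a supervisor loop: run the pipeline, and when it dies with this specific failure, restart it with backoff. A simplified, generic sketch (the pipeline is stubbed out as a callable; the function name and backoff policy are ours, not LiveKit's):

```python
import time

def run_with_restart(run_pipeline, max_restarts=5, base_delay=1.0,
                     sleep=time.sleep):
    """Re-run `run_pipeline` whenever it reports the timeout failure,
    with exponential backoff; give up after max_restarts attempts.
    Returns the number of restarts performed."""
    restarts = 0
    while True:
        ok = run_pipeline()  # True = clean exit, False = timeout crash
        if ok or restarts >= max_restarts:
            return restarts
        sleep(base_delay * (2 ** restarts))  # back off before restarting
        restarts += 1

# Example: a pipeline stub that crashes twice, then stays up.
attempts = {"n": 0}
def flaky_pipeline():
    attempts["n"] += 1
    return attempts["n"] > 2

print(run_with_restart(flaky_pipeline, sleep=lambda s: None))  # → 2
```

In production the callable wraps the actual GStreamer pipeline and only returns False when the specific ConnectionTimeout/Resume failure is detected.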

Our questions

We would really appreciate help understanding:

  1. Whether there is anything on the LiveKit infrastructure side that could be contributing to these ConnectionTimeout / action=Resume events

  2. Whether livekitwebrtcsink is expected to handle this signal differently

  3. Whether there were any infrastructure or transport-related changes around:

    • March 3 to March 4, 2026

    • March 11, 2026

Quick update: we have refactored our implementation to use the LiveKit GStreamer publisher instead of the GStreamer LiveKit Rust plugin (livekitwebrtcsink), since it is officially maintained by LiveKit and does not disconnect after 4 minutes.