LiveKit Cloud GStreamer Pipeline Crashing with Connection Timeout (~4 min mark)

Tom_Fanella · March 11, 2026, 4:40pm

Urgent: Intermittent CONNECTION_TIMEOUT / action=Resume causing livekitwebrtcsink publisher pipeline crashes

Hi LiveKit community,

I wanted to flag an urgent issue with our LiveKit integration. It started recently and is actively disrupting customer sessions.

Summary

We use the GStreamer LiveKit plugin, livekitwebrtcsink, in a pipeline to publish a video/audio feed into our LiveKit rooms as a WebRTC participant.

This pipeline has been consistently crashing at around 4 minutes into a participant’s connection.

From the logs, we can see the pipeline receives a connection timeout message with action=Resume, which appears to signal that it should attempt to reconnect. However, the GStreamer LiveKit client does not seem to handle this reconnection signal and instead stops entirely.

Key details

- The affected participant only publishes video and audio. It does not subscribe to anything.
- It connects to the LiveKit room using an API key/secret, not a token.
- The issue appeared simultaneously across all of our environments:
  - production
  - staging
  - local machines
- There were no changes on our side when this began: no deploys, updates, or config changes.
- We have been connecting to LiveKit Cloud server version v1.9.11 in staging for over a month, confirmed via logs.
- This issue previously appeared and then stopped on its own:
  - First occurrences started around March 3, 2026.
  - They stopped around 2026-03-04 21:30:00 UTC.
- New occurrences started again on 2026-03-11 13:30:00 UTC and are currently ongoing.

Because the behavior started and stopped suddenly without changes on our side, and affected all environments at once, this strongly suggests there may be an infrastructure-side factor involved.

Possible connection to UDP to TCP fallback / reconnect handling

We suspect this may be related to the UDP to TCP fallback / reconnect behavior introduced around LiveKit v1.8.5, since that logic can produce this kind of timeout message.

Specifically, we are seeing a LeaveRequest with:

- reason: ConnectionTimeout
- action: Resume
- region candidates included in the payload

But instead of resuming, the GStreamer pipeline stops.
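To make the expected behavior concrete, here is a minimal Python sketch of how a publisher might dispatch on a server-sent LeaveRequest. It assumes the three actions defined in LiveKit's signaling protocol (Disconnect, Resume, Reconnect); the LeaveAction and LeaveRequest classes and both helper functions are hypothetical and are not the livekitwebrtcsink code. Note that the LeaveRequest in our logs carries can_reconnect: false alongside action: Resume, so logic that only consults can_reconnect would stop instead of resuming. We have not confirmed that this is what the plugin does.

```python
from dataclasses import dataclass
from enum import IntEnum


class LeaveAction(IntEnum):
    # Assumed to mirror LeaveRequest.Action in LiveKit's signaling protocol.
    DISCONNECT = 0
    RESUME = 1
    RECONNECT = 2


@dataclass
class LeaveRequest:
    # Hypothetical stand-in for the decoded signaling message.
    can_reconnect: bool
    reason: str
    action: LeaveAction


def next_step(req: LeaveRequest) -> str:
    """Decide what to do from the action field."""
    if req.action == LeaveAction.RESUME:
        # Re-establish signaling and transport while keeping the session and tracks.
        return "resume"
    if req.action == LeaveAction.RECONNECT:
        # Tear everything down and join the room again from scratch.
        return "full_reconnect"
    # DISCONNECT (or anything unknown): stop publishing.
    return "stop"


def next_step_legacy(req: LeaveRequest) -> str:
    """Hypothetical older logic that ignores action and only reads can_reconnect."""
    return "full_reconnect" if req.can_reconnect else "stop"


# Fields as logged at 0:04:04.97 in the affected session.
logged = LeaveRequest(can_reconnect=False, reason="ConnectionTimeout", action=LeaveAction.RESUME)
print(next_step(logged), next_step_legacy(logged))  # resume stop
```

With the logged fields, next_step returns "resume" while next_step_legacy returns "stop", which matches the symptom we see.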

Example session

One affected session from a user report:

room-gst-producer receiving Participant left (CONNECTION_TIMEOUT)

Session RM_dDBeHJhEr9ke

Relevant logs

A few seconds before the crash:

0:04:01.527049115   436 0x7aa948007780 DEBUG   webrtc-livekit-signaller net/webrtc/src/livekit_signaller/imp.rs:316:gstrswebrtc::livekit_signaller::imp::Signaller::on_signal_event::{{closure}}:<GstLiveKitWebRTCSinkSignaller@0x5cb778f8a250> Connection quality: ConnectionQualityUpdate { updates: [ConnectionQualityInfo { participant_sid: "PA_5NLgLKCgQJMm", quality: Lost, score: 1.252 }] }

At the point where the GStreamer pipeline crashes:

0:04:04.970865139   436 0x7aa948007780 DEBUG   webrtc-livekit-signaller net/webrtc/src/livekit_signaller/imp.rs:337:gstrswebrtc::livekit_signaller::imp::Signaller::on_signal_event::{{closure}}:<GstLiveKitWebRTCSinkSignaller@0x5cb778f8a250> Leave: LeaveRequest { can_reconnect: false, reason: ConnectionTimeout, action: Resume, regions: Some(RegionSettings { regions: [RegionInfo { region: "osaopaulo1b", url: "https://teleo-staging-bk1tdl1w.osaopaulo1b.production.livekit.cloud", distance: 89412 }, RegionInfo { region: "ojohannesburg1a", url: "https://teleo-staging-bk1tdl1w.ojohannesburg1a.production.livekit.cloud", distance: 7503272 }, RegionInfo { region: "oashburn1b", url: "https://teleo-staging-bk1tdl1w.oashburn1b.production.livekit.cloud", distance: 7603430 }, RegionInfo { region: "ochicago1b", url: "https://teleo-staging-bk1tdl1w.ochicago1b.production.livekit.cloud", distance: 8335785 }, RegionInfo { region: "omarseille1b", url: "https://teleo-staging-bk1tdl1w.omarseille1b.production.livekit.cloud", distance: 9152002 }, RegionInfo { region: "ophoenix1b", url: "https://teleo-staging-bk1tdl1w.ophoenix1b.production.livekit.cloud", distance: 9312041 }] }) }
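For triage across many sessions, a small parser can pull the reason, action, and region candidates out of lines like the one above. This Python sketch assumes the signaller keeps printing LeaveRequest in this Rust Debug format at DEBUG level; the regular expressions and the parse_leave name are our own, not part of GStreamer or LiveKit.

```python
import re
from typing import Optional

# Matches the Debug-formatted LeaveRequest printed by the LiveKit signaller.
LEAVE_RE = re.compile(
    r"LeaveRequest \{ can_reconnect: (?P<can_reconnect>true|false), "
    r"reason: (?P<reason>\w+), action: (?P<action>\w+)"
)
# Matches each RegionInfo entry inside the optional regions payload.
REGION_RE = re.compile(
    r'RegionInfo \{ region: "(?P<region>[^"]+)", url: "[^"]*", distance: (?P<distance>\d+) \}'
)


def parse_leave(line: str) -> Optional[dict]:
    """Return the LeaveRequest fields from one log line, or None if absent."""
    m = LEAVE_RE.search(line)
    if m is None:
        return None
    regions = [(r["region"], int(r["distance"])) for r in REGION_RE.finditer(line)]
    return {
        "can_reconnect": m["can_reconnect"] == "true",
        "reason": m["reason"],
        "action": m["action"],
        # Closest region candidate first.
        "regions": sorted(regions, key=lambda item: item[1]),
    }
```

Applied to the line above, this yields reason ConnectionTimeout, action Resume, can_reconnect False, and osaopaulo1b as the closest region candidate.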

What we have done so far

As a short-term mitigation, we have implemented a pipeline restart when this specific failure occurs.

That helps reduce impact, but it is not a real fix and is still causing disruption for customers.
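The restart mitigation can be sketched as a small supervisor with exponential backoff. This assumes the application can tell this particular failure apart (for example from the LeaveRequest log line or a pipeline error) and surfaces it as an exception; LeaveTimeout, run_pipeline, and run_with_restarts are hypothetical names. Because each restart tears down the pipeline and joins again, this reduces but does not remove customer disruption.

```python
import time
from typing import Callable


class LeaveTimeout(Exception):
    """Hypothetical: raised when the signaller reports ConnectionTimeout / action=Resume."""


def run_with_restarts(
    run_pipeline: Callable[[], None],
    max_restarts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run the pipeline, restarting only on LeaveTimeout. Returns the restart count."""
    restarts = 0
    while True:
        try:
            run_pipeline()
            return restarts  # clean exit (EOS or a deliberate stop)
        except LeaveTimeout:
            if restarts >= max_restarts:
                raise  # give up and let the caller alert
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** restarts))
            restarts += 1
```

Restarting only on this specific exception keeps unrelated failures, such as bad credentials, from turning into a restart loop.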

Our questions

We would really appreciate help understanding:

- Whether there is anything on the LiveKit infrastructure side that could be contributing to these ConnectionTimeout / action=Resume events
- Whether livekitwebrtcsink is expected to handle this signal differently
- Whether there were any infrastructure or transport-related changes around:
  - March 3 to March 4, 2026
  - March 11, 2026

Tom_Fanella · March 15, 2026, 4:59am

Quick update: we have refactored our implementation to use LiveKit's GStreamer publisher instead of the GStreamer LiveKit Rust plugin (livekitwebrtcsink), since the publisher is officially maintained by LiveKit and does not disconnect after 4 minutes.

sbarber · June 3, 2026, 6:44pm

We are running on ARM64 Ubuntu 24.04 (Orin NX) and see the same issue.

We bisected it: the issue was introduced in livekit/livekit-server:v1.9.12. The 4-minute drop-out does not occur with livekit/livekit-server:v1.9.11.

We built and are running with gst-plugins-rs v0.12.4 / gstreamer-1.24.2.

Nate_Munger · June 18, 2026, 7:59pm

We also experienced this exact issue. It was introduced between the LiveKit versions mentioned above, via the transitive dependency Pion, in commit pion/webrtc@332878f ("AlwaysNegotiateDataChannels configuration flag", on GitHub).

This change incorrectly handles bundle-only media sections on port zero. There is a fix in Pion, commit pion/webrtc@ec1dd73 ("Fix bundle-only data channel startup", #3447), which at the time of this comment is unreleased.

Therefore, once Pion cuts a new release and LiveKit bumps to it, this issue will likely go away for anyone watching this thread who would like to keep using GStreamer.

In the short term, for those who build from source, I can share a patch that forces GStreamer to use a nonzero port, which fixes this until the upstream fix lands. This bites GStreamer because it uses port 0 by default, whereas LiveKit's native clients do not.
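To check whether an offer contains the pattern described here, the following Python sketch scans SDP for media sections that combine port 0 with a=bundle-only. It also shows one hypothetical way to force a nonzero port: rewriting the port to 9 (the placeholder port commonly seen in browser-generated offers) and dropping a=bundle-only, so the section stays an ordinary member of the BUNDLE group. This is not the patch offered above and has not been verified against LiveKit Cloud; all function names are our own.

```python
from typing import List


def sections(sdp: str) -> List[List[str]]:
    """Split SDP text into [session lines, m-section lines, m-section lines, ...]."""
    groups: List[List[str]] = [[]]
    for line in sdp.splitlines():
        if line.startswith("m="):
            groups.append([])  # every m= line starts a new media section
        groups[-1].append(line)
    return groups


def port_zero_bundle_only(sdp: str) -> List[str]:
    """Return the media kinds of m-sections that use port 0 together with a=bundle-only."""
    hits = []
    for sec in sections(sdp)[1:]:
        media, port = sec[0][2:].split()[:2]
        if port == "0" and "a=bundle-only" in sec:
            hits.append(media)
    return hits


def force_nonzero_port(sdp: str, port: int = 9) -> str:
    """Hypothetical munge: give bundle-only sections a nonzero port and drop a=bundle-only."""
    out: List[str] = []
    for sec in sections(sdp):
        if sec and sec[0].startswith("m=") and "a=bundle-only" in sec:
            fields = sec[0].split(" ")
            if fields[1] == "0":
                fields[1] = str(port)
                sec = [" ".join(fields)] + [l for l in sec[1:] if l != "a=bundle-only"]
        out.extend(sec)
    return "\r\n".join(out) + "\r\n"  # SDP lines are CRLF-terminated
```

Running port_zero_bundle_only on a captured offer is a cheap way to confirm you are hitting this case before applying any patch.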
