Missing webhook requests

Hello,

Over the past couple of days, we have noticed strange webhook behaviour from LiveKit. We set up our LiveKit Cloud project to send us all the room events as webhook requests so we can process them and decide what to do with the users in our room.

We particularly rely on the “participant_joined” and “participant_left” events, which tell us whether we should change the state of the participant on our side and kick off background workers to add users to or remove them from our room.

What we noticed recently is that some of the requests (events) are not coming through. I tweaked the code to save every incoming request in our database and compare it with the LiveKit dashboard. Here are the results for one of the sessions:

LiveKit Dashboard:

We can clearly see two users joining the call. However, our database log shows only three events received per user from LiveKit:

Track unpublished (2026-02-17 18:11:47.745356000 +0000)

{“id”=>“EV_jT6Y4xyzXsjM”, “room”=>{“sid”=>“RM_nHPUfmxXrHry”, “name”=>“974b6e04-0617-4ae7-90f9-7202e5e998e9”}, “event”=>“track_unpublished”, “track”=>{“mid”=>“1”, “sid”=>“TR_AMNTtQxevMRbsr”, “codecs”=>[{“cid”=>“a769975c-cf87-480a-b18a-4c4732cfacae”, “mid”=>“1”, “mimeType”=>“audio/red”}], “source”=>“MICROPHONE”, “stream”=>“camera”, “version”=>{“unixMicro”=>“1771351848526591”}, “mimeType”=>“audio/red”, “audioFeatures”=>[“TF_AUTO_GAIN_CONTROL”, “TF_ECHO_CANCELLATION”, “TF_NOISE_SUPPRESSION”, “TF_ENHANCED_NOISE_CANCELLATION”], “backupCodecPolicy”=>“SIMULCAST”}, “createdAt”=>“1771351907”, “participant”=>{“sid”=>“PA_eU7nwP2s3FcN”, “identity”=>“ab7d6065-44a4-4385-b225-b95a490bf062”}}

Track unpublished (2026-02-17 18:11:47.832157000 +0000)

{“id”=>“EV_FtXjz39yTTYT”, “room”=>{“sid”=>“RM_nHPUfmxXrHry”, “name”=>“974b6e04-0617-4ae7-90f9-7202e5e998e9”}, “event”=>“track_unpublished”, “track”=>{“mid”=>“2”, “sid”=>“TR_VCdefG3gQEgBdD”, “type”=>“VIDEO”, “width”=>960, “codecs”=>[{“cid”=>“cce04437-8b43-4b1d-8626-d8ab400f8d51”, “mid”=>“2”, “layers”=>[{“rid”=>“q”, “ssrc”=>3327131374, “width”=>384, “height”=>216, “bitrate”=>180000, “repairSsrc”=>2873069519}, {“rid”=>“h”, “ssrc”=>2135938973, “width”=>960, “height”=>540, “bitrate”=>800000, “quality”=>“MEDIUM”, “repairSsrc”=>478939091, “spatialLayer”=>1}], “mimeType”=>“video/VP8”, “videoLayerMode”=>“ONE_SPATIAL_LAYER_PER_STREAM”}], “height”=>540, “layers”=>[{“rid”=>“q”, “ssrc”=>3327131374, “width”=>384, “height”=>216, “bitrate”=>180000, “repairSsrc”=>2873069519}, {“rid”=>“h”, “ssrc”=>2135938973, “width”=>960, “height”=>540, “bitrate”=>800000, “quality”=>“MEDIUM”, “repairSsrc”=>478939091, “spatialLayer”=>1}], “source”=>“CAMERA”, “stream”=>“camera”, “version”=>{“unixMicro”=>“1771351848863170”}, “mimeType”=>“video/VP8”, “simulcast”=>true}, “createdAt”=>“1771351907”, “participant”=>{“sid”=>“PA_eU7nwP2s3FcN”, “identity”=>“ab7d6065-44a4-4385-b225-b95a490bf062”}}

Disconnected state (2026-02-17 18:11:47.882257000 +0000)

{“id”=>“EV_gjAfUWxDwFcg”, “room”=>{“sid”=>“RM_nHPUfmxXrHry”, “name”=>“974b6e04-0617-4ae7-90f9-7202e5e998e9”, “version”=>{“unixMicro”=>“1771351846497978”}, “creationTime”=>“1771351846”, “emptyTimeout”=>300, “enabledCodecs”=>[{“mime”=>“video/H264”}, {“mime”=>“video/VP8”}, {“mime”=>“video/VP9”}, {“mime”=>“video/AV1”}, {“mime”=>“audio/red”}, {“mime”=>“audio/opus”}, {“mime”=>“audio/PCMU”}, {“mime”=>“audio/PCMA”}], “creationTimeMs”=>“1771351846495”, “departureTimeout”=>20}, “event”=>“participant_left”, “createdAt”=>“1771351907”, “participant”=>{“sid”=>“PA_eU7nwP2s3FcN”, “name”=>“gilberth obando”, “state”=>“DISCONNECTED”, “region”=>“oashburn1b”, “version”=>10, “identity”=>“ab7d6065-44a4-4385-b225-b95a490bf062”, “joinedAt”=>“1771351847”, “joinedAtMs”=>“1771351847544”, “permission”=>{“canPublish”=>true, “canSubscribe”=>true, “canPublishData”=>true}, “isPublisher”=>true, “disconnectReason”=>“CLIENT_INITIATED”}}

Do you know how I can debug it further to understand why requests are not coming through? Any help would be appreciated.

Looking at your session RM_nHPUfmxXrHry, I see reports of “failed to send webhook” in our server logs. It looks like the request was tried 3 times and timed out each time.

Most commonly this is caused by the webhooks not being handled in a timely manner. Can you take a look through Best practices for managing webhook event streams | LiveKit and make sure you are following the guidance in your webhook receiver?

Hi @darryncampbell,

This gives me something to hook on to. Thanks. I looked into our server performance metrics, and indeed, there are periods when we’re timing out. I’ll read the document you provided to improve the implementation on our side.

For now, I’ll scale up our application because I can see that at least 5-10% of requests take ~10 seconds to process.

Can you, in the meantime, do the following:

  • Please let me know whether there is an imposed timeout on your side before you kill the request. Or are you perhaps waiting for the server to time out?
  • Can you provide me with a full log of these failing requests? What was the server response and the timestamp?

Thank you very much!

I don’t see it stated publicly, but it looks like a 3 second timeout, and we retry 3 times. This is not configurable.

The logs indicate that we timed out waiting for headers, for example participant_joined at 18:11:05.188 on Feb 17th (UTC).
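Since a timed-out request is retried, a receiver that finishes its work after the timeout may end up handling the same event more than once. A minimal sketch of guarding against that, assuming the event "id" field shown in the payloads above (for example EV_jT6Y4xyzXsjM) is stable across retries; `EventDeduplicator` is a hypothetical name, and the in-memory set is only for illustration (a shared store would be needed across processes):

```ruby
require "set"

# Hypothetical in-memory de-duplicator keyed on the webhook event "id".
# Assumption: a retried delivery carries the same "id" as the original.
class EventDeduplicator
  def initialize
    @seen = Set.new
    @lock = Mutex.new
  end

  # Returns true the first time an event id is seen, false on repeats.
  def first_delivery?(event)
    id = event.fetch("id")
    # Set#add? returns nil when the id was already present.
    @lock.synchronize { !@seen.add?(id).nil? }
  end
end

dedup = EventDeduplicator.new
event = { "id" => "EV_jT6Y4xyzXsjM", "event" => "track_unpublished" }
puts dedup.first_delivery?(event) # first delivery: process it
puts dedup.first_delivery?(event) # repeat delivery: skip it
```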

Thank you again. We identified performance issues on our side (database-related), and we will be working to fix it and reduce the average response time.

@darryncampbell I am coming back to you after several days to check several things.

We have improved the way we’re processing the incoming webhook requests by queueing them in the background job processor as per the documentation. This ensures a server can return a response in an acceptable time.

It’s been a couple of days of relative quiet, and we didn’t encounter issues on our side until yesterday. One of the users reported a sudden dropout from the session, and when I checked, the symptoms were the same. In the session (ID: RM_CrPb9zurMzsz), it appears that the webhook request timed out on our side for the user (ID: PA_mW55zDen2iz3).

It’s very likely that our server again couldn’t respond within the timeout threshold, but can you check your logs and confirm that?

This leads me to ask what alternative ways there are to ensure the requests always reach our server:

  • Is there a way to configure the timeout or number of retries?
  • Can I hit the API endpoint on the LiveKit side myself to check who the participants of the given room are? I see the Ruby SDK, where I could fetch participants using the method client.list_participants(room: name) but I am not entirely sure how to initialize the object:

    client = LiveKit::RoomServiceClient.new('https://my.livekit.instance',
        api_key: 'yourkey', api_secret: 'yoursecret')

What is https://my.livekit.instance? We don’t self-host a LiveKit instance; we use a cloud subscription.

Thank you for all your help.

I took a quick look at the session you shared, and we timed out waiting for a response from your server: Client.Timeout exceeded

Just to restate the guidance that was given above: your webhook receiver should just collect the webhook event, put it in a queue, and return 200 OK as quickly as possible. Processing the webhook in-band will cause issues as your user base grows.
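The collect, queue, and acknowledge pattern described above could be sketched roughly as follows. This is only an illustration under assumptions: `WebhookReceiver` and `process_next_event` are hypothetical names, the endpoint is written as a plain Rack-style callable, the in-process Queue stands in for a real background job system, and any authenticity check on the incoming request is left out.

```ruby
require "json"
require "stringio"

# Hypothetical Rack-style endpoint: read the raw body, enqueue it, acknowledge.
# No parsing, database access, or other slow work happens on the request path.
class WebhookReceiver
  def initialize(queue)
    @queue = queue
  end

  def call(env)
    @queue << env["rack.input"].read
    [200, { "content-type" => "text/plain" }, ["ok"]]
  end
end

# Runs off the request path (for example in a worker): parses one queued
# event and hands it to the caller's block.
def process_next_event(queue)
  event = JSON.parse(queue.pop)
  yield event
end

# Usage with a fake request body; Queue stands in for a durable job queue.
queue = Queue.new
receiver = WebhookReceiver.new(queue)
env = { "rack.input" => StringIO.new('{"id":"EV_example","event":"participant_joined"}') }
status, _headers, _body = receiver.call(env)
process_next_event(queue) { |event| puts "#{status}: #{event["event"]}" }
```

Because the response no longer depends on how long the event takes to handle, slow database work delays only the worker, not the acknowledgement.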

That would be your LiveKit cloud URL, https://<your subdomain>.livekit.cloud. It’s also listed here: Sign in | LiveKit Cloud

Is there a way to configure the timeout or number of retries?

No, sorry, it is not configurable.

Can I hit the API endpoint on the LiveKit side myself to check who the participants of the given room are?

Technically yes, that should work, but if I understand what you are saying, it would essentially be a polling architecture, rather than relying on webhooks to push updates, so it would be better to fix the webhook issues you are experiencing.
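If an occasional reconciliation pass is used as a safety net alongside webhooks, it might look roughly like the sketch below. `reconcile_participants` and `server_identities` are hypothetical helpers, and reading `.data.participants` from the `list_participants` response is an assumption about the SDK's response object that should be checked against the Ruby SDK documentation.

```ruby
require "set"

# Compares the identities the server reports for a room with the identities
# our own records consider present, and returns what differs.
def reconcile_participants(server_identities, local_identities)
  server = server_identities.to_set
  local = local_identities.to_set
  {
    missed_joins: (server - local).to_a,  # in the room, unknown locally
    missed_leaves: (local - server).to_a  # known locally, no longer in the room
  }
end

# `client` is anything responding to list_participants(room:), such as the
# LiveKit::RoomServiceClient from the question above.
# Assumption: the response exposes the list as `.data.participants`.
def server_identities(client, room_name)
  client.list_participants(room: room_name).data.participants.map(&:identity)
end

# Usage with plain arrays (no network access needed):
p reconcile_participants(%w[alice bob], %w[bob carol])
```

With a real client, built as in the question but pointing at https://<your subdomain>.livekit.cloud, the first argument would come from `server_identities(client, room_name)`.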