Start/Stop Recording Latency in LiveKit Rooms Using Egress

Recording start takes ~3 seconds and stop takes ~20–23 seconds from the moment the moderator clicks the button to the moment the UI updates and the voice announcement plays.

Environment

Web client: Next.js + @livekit/components-react

Recording: startRoomCompositeEgress → S3

Full Process Breakdown

START (~3 seconds)

Moderator clicks “Start Recording”
│
├─ fetch /api/record/start
│   ├─ POST to external API (get user S3 folder)      ~500ms–1s
│   ├─ egressClient.listEgress() (duplicate check)    ~300ms
│   ├─ egressClient.startRoomCompositeEgress()        ~1–2s ← LiveKit spins up compositor
│   ├─ redis.setex() store start time                 ~50ms
│   └─ roomService.sendData(RECORDING_STATUS_CHANGED, true)
│
├─ All clients receive data message → UI updates instantly
└─ LiveKit fires RecordingStatusChanged ~1–2s later (redundant)

Start is already well optimised. The ~3s is mostly LiveKit’s own compositor spin-up time — nothing we can cut.

STOP (~20–23 seconds) — the real problem

This is where the delay was coming from. There were four separate layers, each adding time:

Layer 1 — Our stop API route (artificial +2s)

Moderator clicks “Stop Recording”
│
└─ fetch /api/record/stop
    ├─ egressClient.listEgress()       ~300ms
    ├─ await setTimeout(2000ms)        ← ARTIFICIAL DELAY (removed)
    ├─ egressClient.stopEgress()       ~500ms
    └─ roomService.sendData(recording-stopped)

We had a 2 second intentional delay before calling stopEgress() to “allow transcript association”. This was unnecessary — transcript association happens in the webhook, not here.

Layer 2 — LiveKit egress winds down (unavoidable ~15–20s)

stopEgress() called
│
└─ LiveKit finalises the MP4 file
    ├─ Flushes video buffer
    ├─ Closes the egress compositor
    ├─ Uploads final MP4 to S3
    └─ Fires egress_ended webhook ← only after S3 upload complete

This is entirely on LiveKit’s infrastructure side. We cannot speed this up.

Layer 3 — Our webhook pipeline (artificial +8s)

egress_ended webhook received
│
├─ await setTimeout(3000ms) ← waiting for transcripts (reduced to 500ms)
├─ fetch transcript from Redis
├─ store recording segment
│
└─ maybeFinalizeMeeting()
    ├─ await setTimeout(5000ms) ← finalization guard (reduced to 1s)
    ├─ listParticipants()
    ├─ listEgress()
    └─ sendSessionRecordingsToExternalAPI()

Two more artificial delays totalling 8 seconds inside the webhook processing.

Layer 4 — UI was waiting for the wrong event

stopEgress() called
│
└─ RecordingStatusChanged event fires on all clients
    └─ ONLY fires after LiveKit fully winds down egress (~15–20s)
        └─ RecordingIndicator used useIsRecording() hook
            └─ hook is bound to RecordingStatusChanged
                └─ UI updates only after ~20s ← ROOT CAUSE

This was the biggest problem. useIsRecording() from @livekit/components-react only reacts to LiveKit’s native RecordingStatusChanged event — which LiveKit fires only after the egress fully winds down and the MP4 is finalised on S3. Our sendData message was being sent but RecordingIndicator was completely ignoring it because it was reading from useIsRecording() internally.
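A minimal sketch of the prop-driven approach described above, assuming the data message carries a small JSON payload. The payload shape (`type`, `isRecording`) and the names `parseRecordingMessage` and `resolveRecordingState` are hypothetical and not taken from our code; only the RECORDING_STATUS_CHANGED label appears in the flow above.

```typescript
// Hypothetical payload for the app-level message sent via roomService.sendData().
interface RecordingMessage {
  type: "RECORDING_STATUS_CHANGED";
  isRecording: boolean;
}

// Parse an already-decoded data-message string. Returns null for anything
// that is not a well-formed recording message, so unrelated data messages
// are ignored instead of flipping the indicator.
function parseRecordingMessage(raw: string): RecordingMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const msg = value as Record<string, unknown>;
  if (msg.type !== "RECORDING_STATUS_CHANGED") return null;
  if (typeof msg.isRecording !== "boolean") return null;
  return { type: "RECORDING_STATUS_CHANGED", isRecording: msg.isRecording };
}

// Decide what the indicator shows. The app-level message wins once one has
// been received; otherwise fall back to the native flag (the value that
// useIsRecording() would report).
function resolveRecordingState(
  nativeIsRecording: boolean,
  lastMessage: RecordingMessage | null,
): boolean {
  return lastMessage !== null ? lastMessage.isRecording : nativeIsRecording;
}
```

In a component, the result of `resolveRecordingState` would be passed to the indicator as a prop. A real implementation would probably also clear the stored message once the native flag catches up, which this sketch leaves out.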

Total Artificial Delay Breakdown

| Source | Delay | Status |
|---|---|---|
| stop/route.ts: setTimeout before stopEgress | +2s | Removed |
| livekit-webhook: setTimeout before transcript fetch | +3s | Reduced to 500ms |
| livekit-webhook: setTimeout in maybeFinalizeMeeting | +5s | Reduced to 1s |
| RecordingIndicator using useIsRecording() | +15–20s | Fixed (now prop-driven) |
| Total removed | ~25s | |

Questions

Is RecordingStatusChanged intentionally fired only after egress file finalisation, or is this a side effect?

Is there a recommended way to get an immediate acknowledgement signal when stopEgress() is accepted — separate from when the file is ready?

“Our current workaround uses optimistic state via sendData — but if stopEgress() fails silently, the UI shows stopped while recording is still active. We want to know if there is a safer official approach.”

Have we mentioned transcripts in this? Do we need to?

I am not sure what I am reading there. Do you have a tldr for it?

Recording Stop Delay Explanation:

When a user clicks “Stop Recording,” the process takes approximately 23 seconds because the egress (recording service running on LiveKit server) needs to finalize the video file. During this time, the egress must: flush all buffered video/audio frames, encode the final segments, close the MP4 file properly, and upload it to S3 storage. This is not instantaneous - it’s actual video processing work happening on the server.

The Problem with Immediate UI Updates:

If we show “Recording Stopped” in the UI immediately (before the egress actually stops), we create a critical race condition. The backend egress is still active and hasn’t released the room yet. If the user clicks “Start Recording” again during this window, the API call will fail with a 409 Conflict error because LiveKit detects an existing active egress for that room. The system prevents multiple simultaneous recordings of the same room to avoid conflicts and corrupted files.

Why We Must Wait:

We must wait for the egress_ended webhook event from LiveKit before allowing another recording to start. This ensures the previous egress has completely finished processing, the file is safely uploaded to S3, and the room is ready for a new recording session. Showing a loading state during these 23 seconds, while not ideal for UX, prevents users from encountering confusing errors and ensures data integrity.

The StopEgress call is async, so the actual recording takes time to stop (as mentioned in your comment). If you want to make it deterministic, you can use ListEgress and check whether the corresponding egress status has changed to stopped. Either do it when the user presses stop (and let the user know), or handle the 409 conflict when the user presses start, and wait for the egress to stop.
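The ListEgress check suggested above could be sketched as a small guard that runs before a new recording is started. This is an illustration under assumptions: the string status type is a local stand-in for the status values the server SDK reports, and `EgressSummary` and `hasBlockingEgress` are made-up names.

```typescript
// Local stand-in for the egress status values discussed in this thread.
type EgressStatus =
  | "EGRESS_STARTING"
  | "EGRESS_ACTIVE"
  | "EGRESS_ENDING"
  | "EGRESS_COMPLETE"
  | "EGRESS_FAILED";

// Hypothetical minimal view of one entry returned by a ListEgress call.
interface EgressSummary {
  egressId: string;
  status: EgressStatus;
}

// An egress that has not reached a terminal state is still occupying the
// room, so starting another recording now would likely hit the conflict
// described above.
function hasBlockingEgress(egresses: EgressSummary[]): boolean {
  return egresses.some(
    (e) => e.status !== "EGRESS_COMPLETE" && e.status !== "EGRESS_FAILED",
  );
}
```

The start route could call this on the ListEgress result and either tell the user the previous recording is still finalising or wait and retry, instead of surfacing the 409 directly.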

Following @Raghu_Udiyar’s poll-ListEgress suggestion, the three questions have clean answers:

  • Q1 (RecordingStatusChanged timing). It’s the room’s recording flag flipping, gated on whether any egress is still active for the room. Matches your observation: the flag only goes false after the last egress reaches EGRESS_COMPLETE. Design, not a side effect.

  • Q2 (immediate ack on stop). You already have it. stopEgress() returns EgressInfo with status flipping from EGRESS_ACTIVE to EGRESS_ENDING within ~500ms. That response is the ack; trust it for UI state. EGRESS_COMPLETE is the separate “file ready” signal.

  • Q3 (safer than optimistic sendData). Treat the stopEgress() response as authoritative. If status comes back EGRESS_ENDING, flip UI to “stopping” and poll ListEgress every 1 to 2s until EGRESS_COMPLETE or EGRESS_FAILED. Status staying ACTIVE or returning FAILED after the call surfaces the silent-failure case sendData can’t catch.
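The three answers above can be put together roughly as follows. This is a sketch, not the SDK's own API: `EgressApi` is a hypothetical minimal interface modelled on the `stopEgress()` and `listEgress()` calls used in this thread, the string statuses stand in for the SDK's status values, and `uiStateFor`, `stopAndWait`, the poll interval, and the poll limit are all illustrative choices.

```typescript
type EgressStatus =
  | "EGRESS_STARTING"
  | "EGRESS_ACTIVE"
  | "EGRESS_ENDING"
  | "EGRESS_COMPLETE"
  | "EGRESS_FAILED";

interface EgressInfoLike {
  egressId: string;
  status: EgressStatus;
}

// Hypothetical slice of an egress client, injected so the logic is testable.
interface EgressApi {
  stopEgress(egressId: string): Promise<EgressInfoLike>;
  listEgress(opts: { egressId: string }): Promise<EgressInfoLike[]>;
}

type UiState = "recording" | "stopping" | "stopped" | "failed";

// Map the egress status to what the UI should show.
function uiStateFor(status: EgressStatus): UiState {
  switch (status) {
    case "EGRESS_STARTING":
    case "EGRESS_ACTIVE":
      return "recording";
    case "EGRESS_ENDING":
      return "stopping";
    case "EGRESS_COMPLETE":
      return "stopped";
    case "EGRESS_FAILED":
      return "failed";
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Use the stopEgress() response as the ack, then poll until a terminal state.
async function stopAndWait(
  client: EgressApi,
  egressId: string,
  onState: (state: UiState) => void,
  intervalMs = 1500,
  maxPolls = 40,
): Promise<UiState> {
  const ack = await client.stopEgress(egressId);
  let state = uiStateFor(ack.status);
  onState(state);
  // Still "recording" after the call is the silent-failure case: return it
  // so the caller can surface an error instead of showing "stopped".
  for (let i = 0; i < maxPolls && state === "stopping"; i++) {
    await sleep(intervalMs);
    const [info] = await client.listEgress({ egressId });
    if (!info) break; // egress no longer listed; caller decides what to do
    const next = uiStateFor(info.status);
    if (next !== state) {
      state = next;
      onState(state);
    }
  }
  return state;
}
```

If the function returns "stopping", the poll limit was reached without a terminal status, and the caller can choose between continuing to wait for the egress_ended webhook and showing an error.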