Room Composite API returning `context deadline exceeded` and 404 for a room the agent is actively connected to

Posting for help. start_room_composite_egress returns not_found (“requested room does not exist”) on a room the agent is connected to and actively producing/consuming audio in. The agent runs the call end-to-end normally (audio flows, transcripts produced, TTS played out, voicemail detection fires, call ends cleanly) — only the egress recording fails.

Egress request

api.RoomCompositeEgressRequest(
    room_name=<the room my agent is connected to via job dispatch>,
    audio_only=True,
    audio_mixing=api.AudioMixing.DUAL_CHANNEL_AGENT,
    file_outputs=[...],
    webhooks=[...],
)

Client SDKs: livekit==1.1.2, livekit-agents==1.4.6 (Python).

The room_name passed to egress is the same name LiveKit gave us in the SDK’s received job request event — same string, no whitespace or encoding surprises.

Retry timeline (UTC, single call)

The call ran in room ID RM_QQeiEgbwzhkb throughout. The agent was inside that room and audio was flowing through it during every one of these retries.

| Timestamp | Event | Detail |
| --- | --- | --- |
| 12:05:53.159 | livekit.agents “received job request” | room_id=RM_QQeiEgbwzhkb, dispatch_id=AD_7ApXzCanvBAu, job_id=AJ_A4Z2GYK8rHKE |
| 12:05:53.694 | First start_room_composite_egress request begins | |
| 12:05:54.157 | Retry #1: TwirpError(code=unknown, message="context deadline exceeded", status=500) | request took ~463 ms server-side, then timed out |
| 12:05:58.361 | Retry #2: TwirpError(code=not_found, message="requested room does not exist", status=404) | from here on, every attempt returns the same 404 |

The transition between retry #1 (status=500 "context deadline exceeded") and retries #2-5 (status=404 "requested room does not exist") looks meaningful — like the first attempt hit some server-side timeout, after which subsequent name-based lookups settled into “not found”.
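For anyone handling the same 500-then-404 pattern on the client side, here is a minimal retry sketch. It is not a fix: in this call every retry still returned 404, so it only makes the failure handling explicit. It assumes the raised exception exposes a string `code` attribute, as the TwirpError repr above suggests. The names `call_with_backoff` and `RETRYABLE_CODES` are hypothetical, not part of the LiveKit SDK.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")

# Codes treated as possibly transient in this sketch. This is an assumption
# based only on the two codes seen in this thread, not on LiveKit guidance.
RETRYABLE_CODES = {"unknown", "not_found"}


async def call_with_backoff(
    start_fn: Callable[[], Awaitable[T]],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Call start_fn until it succeeds, doubling the delay after each failure."""
    last_exc: Optional[BaseException] = None
    for attempt in range(attempts):
        try:
            return await start_fn()
        except Exception as exc:
            # Duck-typed: any exception carrying a retryable `code` is retried.
            if getattr(exc, "code", None) not in RETRYABLE_CODES:
                raise
            last_exc = exc
            if attempt < attempts - 1:
                await sleep(base_delay * 2 ** attempt)
    # All attempts failed with retryable codes; surface the last error.
    assert last_exc is not None
    raise last_exc
```

In an agent, `start_fn` could be something like `lambda: lkapi.egress.start_room_composite_egress(request)`, where `lkapi` is a `livekit.api.LiveKitAPI` client. Logging the attempt number and error code on each failure gives a timeline like the one above without manual digging.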

Additionally — two rooms with the same name in the dashboard

While digging through the LiveKit dashboard to confirm RM_QQeiEgbwzhkb was healthy, I noticed there are actually two distinct room IDs sharing the same room name:

| Room ID | Status | Duration | Participants | Agent connection |
| --- | --- | --- | --- | --- |
| RM_8EGjd4UNE5Xw | Active, “In progress” | 43 min and counting | 0 | Agent never connected to this; not logged anywhere on the agent side |
| RM_QQeiEgbwzhkb | Active | 43 min and counting, actual duration ~30 secs | Agent + 1 SIP participant | The room our agent actually ran the call in |

RM_8EGjd4UNE5Xw was created roughly 5 seconds before RM_QQeiEgbwzhkb, has had zero participants throughout, and has been sitting “In progress” for 43 minutes with no participant ever joining. Both rooms have been shown as active for the last 43 minutes. I know that LiveKit dashboard is

I don’t know whether this is related to the egress 404, but if start_room_composite_egress resolves by room_name and there are two same-named rooms in some active/empty state, that ambiguity seems like a plausible explanation for the first attempt’s context deadline exceeded and the subsequent not_found responses.
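To spot this condition without eyeballing the dashboard, a small check over exported session records can flag any room name that maps to more than one room ID. This is a sketch under stated assumptions: the `(room_id, room_name)` pairs are collected however suits you (dashboard export, webhook events, agent logs), the function name `find_duplicate_room_names` is hypothetical, and the room name below is a placeholder because the real one is not shown in this post.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_duplicate_room_names(
    rooms: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Return each room name seen under more than one room ID, with those IDs."""
    ids_by_name: Dict[str, List[str]] = defaultdict(list)
    for room_id, name in rooms:
        # Ignore repeated records for the same room ID.
        if room_id not in ids_by_name[name]:
            ids_by_name[name].append(room_id)
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


# Room IDs from this post; "example-room-name" is a placeholder name.
sessions = [
    ("RM_8EGjd4UNE5Xw", "example-room-name"),
    ("RM_QQeiEgbwzhkb", "example-room-name"),
]
duplicates = find_duplicate_room_names(sessions)
```

A non-empty `duplicates` result for a room you are about to record would be a reason to log both IDs before calling start_room_composite_egress, since the request identifies the room by name only.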

Please help check what exactly took place here.

Hi Zaheer, thanks for that analysis. Presumably this was a one-off, isolated incident?

I wondered if this was somehow related to yesterday’s incident, LiveKit Status - Elevated Reports of Participant Connection Latency and Errors In US East Region. Looks like your room was created about 1200 UTC whereas the incident started at 1500 UTC, so at first glance they don’t seem related.

If I look at the server logs for your room creation, I see an error synchronizing the room across nodes, which probably led to the duplicate room ID for the same name that you found, which in turn likely led to the Egress start failure.

Looking at the trend of these synchronization errors across all projects (not just yours), I do see occasional blips starting around 1200 UTC, then a spike around 1500, then back down to 0 after yesterday’s incident was resolved. That does lead me to believe the two are related.

Got it - thanks for the update. Yeah it was one such case I found. Didn’t notice other cases. Let us know if you find more info internally

Also, I hope we don’t get billed for 120mins (80 mins * 2 rooms), as the call itself was 30 sec in a single room?

Looks like RM_QQeiEgbwzhkb now shows accurate start / end times, so no issues there.

RM_8EGjd4UNE5Xw is still showing in-progress. Did you try deleting it?

Either way, you shouldn’t be billed for this and the system should automatically detect this anomaly. If you do encounter billing issues, please be aware I can’t address them on this forum, but they would need to be handled by our support team.

RM_8EGjd4UNE5Xw is still showing in-progress. Did you try deleting it?

I am unable to see this on the dashboard as of now.

@darryncampbell - I am seeing this for one more call today with same exact behavior

Room ID - RM_WeV6fCgbcPeE

First request with 500 error code

raise TwirpError(
livekit.api.twirp_client.TwirpError: TwirpError(code=unknown, message=context deadline exceeded, status=500)

All retries returned 404

TwirpError(
livekit.api.twirp_client.TwirpError: TwirpError(code=not_found, message=requested room does not exist, status=404)

Please check on this

RM_WeV6fCgbcPeE is here: https://cloud.livekit.io/projects/p_3tqm7ro6kbs/sessions/RM_WeV6fCgbcPeE

@darryncampbell - Thanks for the response.

I was able to find RM_WeV6fCgbcPeE, but I am unable to find room ID RM_8EGjd4UNE5Xw, in response to your first question:

RM_8EGjd4UNE5Xw is still showing in-progress. Did you try deleting it?

My follow-up question was related to RM_WeV6fCgbcPeE:

We ran into an egress issue for the RM_WeV6fCgbcPeE room and got a 500 error from the egress API, which is what I wanted to raise.

You said here that you were speaking with @Milos_Pesic about that issue, but I don’t see that conversation publicly. For RM_WeV6fCgbcPeE, I see server logs related to “room synchronization”, which sounds similar to the issue referenced above.

I haven’t received a response from Milos. I sent him the community post link over Slack DM.

What is the best way to raise these failure cases with LiveKit team?

Egress recording is really important for us. Is there a way to debug this, and is there anything we can do to fix these transient issues that we keep seeing in Egress?

Three issues that I have reported:

  1. 500 error on the RoomComposite Egress request with context deadline exceeded; retries then fail with 404 in these cases - this current post
  2. Egress unexpectedly closes the recording even though the participants and room are active - this particular post
  3. The egress expired token post

3 is on us and we are fixing it. We need your help with 1 and 2, and would like to know whether we can do anything to avoid these issues.

Appreciate the help :folded_hands:

What is the best way to raise these failure cases with LiveKit team?

You are on our Scale plan so you are eligible for email support if needed (though myself and CWilson try to help as many people in community as possible, regardless of plan). Milos usually fixes things before they get raised :lk-launch: , but today we had an incident which has been distracting.

but today we had an incident which has been distracting

I know @darryncampbell - I was the one who reported that incident here - Slack

Ack on raising issue over email. I will have our team check if we can get on a better SLA with Enterprise plan.

Appreciate your and CWilson’s help :folded_hands:

@darryncampbell — this same start_room_composite_egress → not_found “requested room does not exist” on a live room recurred yesterday (2026-06-16), and this time it hit two independent calls in the same ~18s window, so flagging in case there was another cross-node room-sync blip.

Room IDs:

  1. RM_7Dm2S6ZZymQ2
  2. RM_rED2zGpSDdsa

This keeps recurring outside declared incidents, and egress recording is business-critical for us. Can you please help investigate and add the required fix for this?