Autoscaling Strategy for Self-Hosted LiveKit Egress Workers in a Real-Time Streaming Platform

Hi Team,

We are building a real-time streaming and transcription platform using a self-hosted LiveKit stack. Our architecture currently includes:

  • LiveKit Server (self-hosted)

  • Egress service for recording/streaming outputs

  • Transcription agent for real-time speech-to-text

  • Redis for coordination

  • Containerized deployment (ECS/Fargate or a Kubernetes-style cluster)

Current Problem

In our current setup:

  • Each egress request is assigned to a single egress worker.

  • If no worker is available, the request stays pending and eventually times out.

  • During high traffic (many rooms starting recordings simultaneously), we see egress capacity bottlenecks.

So effectively:

Room Recording Request
      ↓
LiveKit Server
      ↓
Egress Service
      ↓
Available Worker ?
   YES → Start recording
   NO  → Request timeout

Goal

We want to design a reliable autoscaling strategy so that:

  1. Egress workers scale automatically based on demand.

  2. Recording requests do not time out during bursts.

  3. Workers scale down when idle to save cost.

Questions

  1. What is the recommended autoscaling strategy for self-hosted LiveKit egress clusters?

  2. Should autoscaling be based on one of the following?

    • CPU usage

    • Memory usage

    • Number of pending egress jobs

    • Active room recordings

  3. Is there a way to queue egress jobs when workers are unavailable instead of failing immediately?

  4. Has anyone implemented horizontal autoscaling for egress workers successfully (Kubernetes / ECS)?

  5. Any recommended metrics to monitor for egress scaling (e.g., active pipelines, ffmpeg processes, Redis state)?

Use the metrics exposed by egress to identify how many jobs are running (it's exposed as livekit_egress_requests), and autoscale on that metric. On the client side, you can add retry logic that waits and keeps retrying until the egress request is accepted.

You should benchmark how many egress requests a single instance can handle, and use that as the autoscaling threshold.
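
Something like this, as a rough sketch (PROM_URL, ASG_NAME, and JOBS_PER_INSTANCE are placeholders; plug in your own Prometheus endpoint, ASG name, and benchmarked per-instance capacity):

```python
# Poll the livekit_egress_requests metric from Prometheus and set the
# desired capacity of the EC2 Auto Scaling Group running egress workers.
import math

import boto3
import requests

PROM_URL = "http://prometheus:9090"  # placeholder: your Prometheus endpoint
ASG_NAME = "livekit-egress-asg"      # placeholder: your ASG name
JOBS_PER_INSTANCE = 2                # from benchmarking a single instance
MIN_INSTANCES, MAX_INSTANCES = 1, 20

def active_egress_jobs() -> float:
    """Sum in-flight egress jobs across all workers."""
    resp = requests.get(
        f"{PROM_URL}/api/v1/query",
        params={"query": "sum(livekit_egress_requests)"},
        timeout=5,
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

def scale() -> None:
    desired = math.ceil(active_egress_jobs() / JOBS_PER_INSTANCE)
    desired = max(MIN_INSTANCES, min(MAX_INSTANCES, desired))
    boto3.client("autoscaling").set_desired_capacity(
        AutoScalingGroupName=ASG_NAME,
        DesiredCapacity=desired,
        HonorCooldown=True,  # respect the ASG cooldown to avoid thrashing
    )

if __name__ == "__main__":
    scale()
```

On Kubernetes, the equivalent would be an HPA on a custom metric or a KEDA scaler driven by the same Prometheus query.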

Hi @Raghu_Udiyar,

Is there any SDK-level support in LiveKit to implement retry-based logic for starting egress until a worker becomes available?

Currently our setup is:

  • LiveKit server (self-hosted)

  • LiveKit Egress running on AWS ECS (EC2 launch type)

  • EC2 instances behind an Auto Scaling Group

The issue we are seeing is:

When a new egress request arrives and no egress worker is immediately available, the request eventually times out. If the Auto Scaling Group then launches a new EC2 instance to run an egress worker, startup takes significant time (instance boot plus container startup), so the original request fails before the new capacity comes online.

What we are trying to understand:

  1. Does the LiveKit SDK provide built-in retry logic for egress start requests until a worker becomes available?

  2. Is there any recommended pattern for handling this scenario in production?

  3. Apart from keeping idle standby workers, are there any other strategies used by the community to handle burst egress workloads?

We are currently considering application-level retry with backoff, but wanted to check whether there is a recommended LiveKit-native solution.

Any suggestions or production patterns would be greatly appreciated.

Thanks!

I don't think the SDK provides it, but you should be able to wrap the call and retry, i.e. the application-level retry you mentioned. Otherwise, keeping sufficient headroom and autoscaling on the egress metrics is the way to go; that's what we use for scaling egress. There may also be room to optimise EC2 boot time (for example with smaller AMIs or an Auto Scaling warm pool).
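
For reference, a minimal sketch of that wrapper in Python. `start_egress` here is a stand-in for whichever server-SDK call you use to start the egress (e.g. a room composite start), and the retry budget is illustrative, not a recommendation:

```python
# Application-level retry with exponential backoff and jitter around the
# egress start call, so burst requests survive until new capacity is up.
import random
import time

def start_egress_with_retry(start_egress, max_attempts=6,
                            base_delay=2.0, max_delay=60.0):
    """Call start_egress() until it succeeds or the retry budget runs out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return start_egress()  # stand-in for your SDK's start-egress call
        except Exception:  # in practice, retry only "no worker available" errors
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter so simultaneous retries spread out
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(delay * random.uniform(0.5, 1.0))
```

The main sizing consideration is that the total retry window should cover your worst-case scale-up time (instance boot plus container startup), so a request that arrives during a burst is still pending when the new worker comes online.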