Best way to scale LiveKit Egress for recordings (private meetings + livestream platform)?

Hi everyone,

I’m building a live streaming + private meeting platform and looking for some architecture advice around scaling LiveKit egress.

Current stack

  • Angular frontend

  • Flutter Mobile

  • .NET backend

  • Self-hosted LiveKit server running on an Ubuntu EC2 instance

  • Redis for coordination

  • AWS infrastructure (EC2 / containerized services)

Recording use cases

The platform supports two types of sessions:

  1. Private meetings → using RoomComposite Egress

  2. Livestream classes → using Participant Egress (record only instructor)

Recording is optional and triggered by the instructor, so demand can vary a lot. For example, multiple instructors might start recordings at the same time.

The problem I’m trying to solve

Right now I haven’t implemented autoscaling yet, and I’m trying to design the right architecture before moving forward.

My concern is handling situations where many recordings start at once. Since egress workers handle recording jobs, I want to avoid requests failing or timing out due to lack of capacity.

What I’m trying to achieve

Ideally the system should:

  • Scale egress workers automatically when recording demand increases

  • Scale down when idle to save infrastructure cost

  • Handle bursts where many recordings start simultaneously

  • Support both RoomComposite and Participant egress jobs efficiently

Questions

For anyone running LiveKit in production:

  1. What is the recommended way to scale LiveKit egress workers?

  2. Should scaling be based on:

    • CPU usage

    • number of active recordings

    • pending egress jobs

    • pipelines per worker

  3. Has anyone implemented autoscaling egress workers successfully on AWS (ECS / EC2 / Kubernetes)?

  4. If LiveKit server load increases (many rooms), how do you typically scale the LiveKit media servers alongside egress workers?

I’m still in the architecture design stage, so any suggestions, reference architectures, or lessons learned would be really helpful.

Thanks!

Hi, we have implemented autoscaling for the entire stack. LiveKit has good instrumentation and architecture for this.

First, size the servers and determine the peak workload for the hardware you are using. Then use that metric to autoscale the cluster, keeping some headroom depending on the concurrency you expect. Egress exposes `livekit_egress_requests`, which you can use to determine how many active egress jobs are running on an instance.
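The scaling math above can be sketched as a small pure function. This is illustrative only: the function name, the `headroom` parameter, and the idea of a per-worker job limit are assumptions you'd calibrate from your own load tests, with `active_egresses` coming from however you aggregate `livekit_egress_requests` (e.g. a Prometheus query).

```python
import math

def desired_workers(active_egresses: int, max_per_worker: int, headroom: int = 1) -> int:
    """Compute how many egress workers to run for the current load.

    active_egresses: total egress jobs currently running across the cluster
    max_per_worker:  peak concurrent jobs one worker handled acceptably in load tests
    headroom:        extra idle workers kept warm to absorb sudden bursts
    """
    if max_per_worker <= 0:
        raise ValueError("max_per_worker must be positive")
    needed = math.ceil(active_egresses / max_per_worker)
    return needed + headroom

# e.g. 7 active recordings at 3 jobs/worker with 1 spare -> 4 workers
print(desired_workers(7, 3, headroom=1))
```

The headroom term matters because egress workers take time to boot; keeping a warm spare is what lets simultaneous "start recording" clicks succeed while the autoscaler catches up.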

To make it even more resilient, add retry logic on the client: wait and keep retrying until the egress request is accepted.
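A minimal sketch of that retry loop, assuming `start_fn` is whatever your backend uses to issue the egress request (a hypothetical placeholder here, not a LiveKit SDK call) and that it raises on failure:

```python
import random
import time

def start_egress_with_retry(start_fn, max_attempts: int = 5, base_delay: float = 1.0):
    """Call start_fn() until it succeeds, with exponential backoff and jitter.

    start_fn:     zero-argument callable that issues the egress request and
                  raises if the request is rejected (e.g. no worker capacity)
    max_attempts: give up and re-raise after this many failures
    base_delay:   first backoff interval in seconds; doubles each attempt
    """
    for attempt in range(max_attempts):
        try:
            return start_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the failure to the caller
            # Jitter spreads out retries so bursts don't all hit at once.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

The backoff window is what buys the autoscaler time: a request rejected during a burst succeeds a few seconds later once a new worker comes up, instead of surfacing an error to the instructor.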

Do the same for the LiveKit media servers: determine how many rooms each can accommodate, and autoscale accordingly.