Hi Team,
We are building a real-time streaming and transcription platform using a self-hosted LiveKit stack. Our architecture currently includes:
- LiveKit Server (self-hosted)
- Egress service for recording/streaming outputs
- Transcription agent for real-time speech-to-text
- Redis for coordination
- Containerized deployment (ECS/Fargate or a Kubernetes-style cluster)
Current Problem
In our current setup:
- Each egress request is assigned to a single egress worker.
- If no worker is available, the request stays pending and eventually times out.
- During high traffic (many rooms starting recordings simultaneously), we see egress capacity bottlenecks.
So effectively:
Room Recording Request
        ↓
LiveKit Server
        ↓
Egress Service
        ↓
Available Worker?
  YES → Start recording
  NO  → Request timeout
Goal
We want to design a reliable autoscaling strategy so that:
- Egress workers scale automatically based on demand.
- Recording requests do not time out during bursts.
- Workers scale down when idle to save cost.
Questions
- What is the recommended autoscaling strategy for self-hosted LiveKit egress clusters?
- What signals should autoscaling be based on?
- Is there a way to queue egress jobs when workers are unavailable instead of failing immediately?
- Has anyone implemented horizontal autoscaling for egress workers successfully (Kubernetes / ECS)?
- Any recommended metrics to monitor for egress scaling (e.g., active pipelines, ffmpeg processes, Redis state)?
Use the metrics exposed by egress to identify how many requests are running (available as livekit_egress_requests), and autoscale on that metric. On the client side, you could add retry logic that waits and keeps retrying until the egress request gets accepted.
You should benchmark how many egress requests a single instance can handle, and use that as the threshold for autoscaling.
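For the Kubernetes case, here is a minimal sketch of an HPA driven by that metric. It assumes livekit_egress_requests is already exposed to the autoscaler as a per-pod custom metric (e.g., via prometheus-adapter); the deployment name, replica bounds, and the per-pod target of 3 concurrent requests are placeholders to replace with your own benchmark numbers:

```yaml
# Sketch only: assumes livekit_egress_requests is exposed as a per-pod
# custom metric (e.g., through prometheus-adapter). Names and numbers
# below are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: livekit-egress
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: livekit-egress        # your egress deployment
  minReplicas: 2                # keep headroom so bursts don't hit cold starts
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: livekit_egress_requests
        target:
          type: AverageValue
          averageValue: "3"     # from benchmarking a single instance
```

Scaling on the job count rather than CPU lets the autoscaler react before the encoding pipelines saturate the machine.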
Hi @Raghu_Udiyar ,
Is there any SDK-level support in LiveKit to implement retry-based logic for starting egress until a worker becomes available?
Currently our setup is:
- LiveKit server (self-hosted)
- LiveKit Egress running on AWS ECS (EC2 launch type)
- EC2 instances behind an Auto Scaling Group
The issue we are seeing is:
When a new egress request arrives and no egress worker is immediately available, the request eventually times out. If the Auto Scaling Group launches a new EC2 instance to run the egress worker, it takes significant time (instance startup + container startup), so the original request fails before capacity becomes available.
What we are trying to understand:
- Does the LiveKit SDK provide built-in retry logic for egress start requests until a worker becomes available?
- Is there any recommended pattern for handling this scenario in production?
- Apart from keeping idle standby workers, are there any other strategies used by the community to handle burst egress workloads?
We are currently leaning towards implementing application-level retry with backoff, but wanted to check if there is a recommended LiveKit-native solution.
Any suggestions or production patterns would be greatly appreciated.
Thanks!
I don't think the SDK provides it, but you should be able to wrap the response and retry, i.e. the application-level retry you mentioned. Otherwise, keeping sufficient headroom and autoscaling based on the egress metrics is the way to go; that's what we use for scaling egress. There could be optimisation on EC2 boot time as well.
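To make the application-level retry concrete, here is a minimal sketch using the Go server SDK. The endpoint, credentials, room name, output path, and the startEgressWithRetry helper are all hypothetical, and in production you would likely inspect the error so that only capacity/timeout failures are retried:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/livekit/protocol/livekit"
	lksdk "github.com/livekit/server-sdk-go/v2"
)

// startEgressWithRetry (hypothetical helper) wraps StartRoomCompositeEgress
// with exponential backoff, so a request that arrives while all workers are
// busy keeps retrying while the autoscaler brings new capacity online,
// instead of failing on the first timeout.
func startEgressWithRetry(ctx context.Context, ec *lksdk.EgressClient,
	req *livekit.RoomCompositeEgressRequest, maxAttempts int) (*livekit.EgressInfo, error) {

	backoff := 2 * time.Second
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		info, err := ec.StartRoomCompositeEgress(ctx, req)
		if err == nil {
			return info, nil
		}
		lastErr = err // NOTE: in production, bail out early on non-retryable errors
		log.Printf("egress start %d/%d failed: %v; retrying in %s", attempt, maxAttempts, err, backoff)
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(backoff):
		}
		backoff *= 2 // exponential backoff: 2s, 4s, 8s, ...
	}
	return nil, fmt.Errorf("egress did not start after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	// Placeholder endpoint and credentials.
	ec := lksdk.NewEgressClient("https://livekit.example.com", "api-key", "api-secret")
	req := &livekit.RoomCompositeEgressRequest{
		RoomName: "my-room",
		Layout:   "grid",
		FileOutputs: []*livekit.EncodedFileOutput{{
			Filepath: "recordings/my-room.mp4",
		}},
	}
	// The overall deadline should cover worst-case instance + container startup.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()
	info, err := startEgressWithRetry(ctx, ec, req, 6)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("egress started:", info.EgressId)
}
```

The context deadline is the knob that decides how long a burst is allowed to wait for capacity; set it to at least the worst-case ASG instance boot plus container start time you observe.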