Egress service gets stuck

I am self-hosting LiveKit. The egress service sometimes stops abruptly after logging that it is connecting to Redis. The LiveKit server then cancels the recording after 10s because it gets no response from the egress service.

2026-02-13T07:05:59.764Z	INFO	egress	pipeline/watch.go:115	pipeline received EOS	{"nodeID": "NE_JGn2CW6uEUET", "handlerID": "EGH_DCTYA66WgMtY", "clusterID": "", "egressID": "EG_qfzmKVSQR9wb"}
2026-02-13T07:06:00.127Z	INFO	egress	info/io.go:269	egress_complete	{"nodeID": "NE_JGn2CW6uEUET", "clusterID": "", "egressID": "EG_qfzmKVSQR9wb", "requestType": "room_composite", "outputType": "file", "error": "", "code": 0, "details": "End reason: Source closed"}
2026-02-13T07:24:38.141Z	INFO	egress	server/server_rpc.go:59	request received	{"nodeID": "NE_JGn2CW6uEUET", "clusterID": "", "egressID": "EG_dz3JMaE3cvex"}
2026-02-13T07:24:38.141Z	INFO	egress	server/server_rpc.go:69	request validated	{"nodeID": "NE_JGn2CW6uEUET", "clusterID": "", "egressID": "EG_dz3JMaE3cvex", "requestType": "room_composite", "sourceType": "EGRESS_SOURCE_TYPE_WEB", "outputType": "file", "room": "study_698dc93713cf9c6e82b1a8e3_participant_698dcb4d13cf9c6e82b1a8e6_e0d60f", "request": {"RoomComposite":{"room_name":"study_698dc93713cf9c6e82b1a8e3_participant_698dcb4d13cf9c6e82b1a8e6_e0d60f","layout":"grid-dark","Output":null,"Options":null,"file_outputs":[{"file_type":1,"filepath":"study_698dc93713cf9c6e82b1a8e3_participant_698dcb4d13cf9c6e82b1a8e6_e0d60f/media","Output":{"Azure":{"account_name":"{account_name}","account_key":"{account_key}","container_name":"interview"}}}]}}}
2026-02-13T07:24:38.156Z	INFO	egress	redis/redis.go:144	connecting to redis	{"nodeID": "NE_JGn2CW6uEUET", "handlerID": "EGH_iSDJGyGmoBXM", "clusterID": "", "egressID": "EG_dz3JMaE3cvex", "simple": true, "addr": "localhost:6379"}

I am running egress, LiveKit, and Redis on a single machine. Attached is my init script for all the Docker containers.

Init script
#!/bin/sh
# This script will write all of your configurations to /opt/livekit.
# It'll also install LiveKit as a systemd service that starts automatically at boot.

# Parse command line arguments
API_KEY=""
API_SECRET=""

usage() {
    echo "Usage: $0 --api-key <api_key> --api-secret <api_secret>"
    echo ""
    echo "Required arguments:"
    echo "  --api-key      LiveKit API key"
    echo "  --api-secret   LiveKit API secret"
    exit 1
}

while [ $# -gt 0 ]; do
    case "$1" in
        --api-key)
            API_KEY="$2"
            shift 2
            ;;
        --api-secret)
            API_SECRET="$2"
            shift 2
            ;;
        --help|-h)
            usage
            ;;
        *)
            echo "Unknown option: $1"
            usage
            ;;
    esac
done

# Validate required arguments
if [ -z "$API_KEY" ] || [ -z "$API_SECRET" ]; then
    echo "Error: Both --api-key and --api-secret are required."
    usage
fi

echo "Using API Key: $API_KEY"

# create directories for LiveKit
mkdir -p /opt/livekit/caddy_data
mkdir -p /usr/local/bin

# Docker & Docker Compose will need to be installed on the machine
curl -fsSL https://get.docker.com -o /tmp/get-docker.sh
sh /tmp/get-docker.sh
curl -L "https://github.com/docker/compose/releases/download/v5.0.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
chmod 755 /usr/local/bin/docker-compose

sudo systemctl enable docker

# livekit config
cat << EOF > /opt/livekit/livekit.yaml
port: 7880
bind_addresses:
    - ""
rtc:
    tcp_port: 7881
    port_range_start: 50000
    port_range_end: 60000
    use_external_ip: true
    enable_loopback_candidate: false
redis:
    address: 127.0.0.1:6379
    username: ""
    password: ""
    db: 0
    use_tls: false
    sentinel_master_name: ""
    sentinel_username: ""
    sentinel_password: ""
    sentinel_addresses: []
    cluster_addresses: []
    max_redirects: null
turn:
    enabled: true
    domain: lk-turn.trykikilabs.com
    tls_port: 5349
    udp_port: 3478
    external_tls: true
ingress:
    rtmp_base_url: rtmp://lk.trykikilabs.com:1935/x
    whip_base_url: http:///w
keys:
    $API_KEY: $API_SECRET
webhook:
    api_key: $API_KEY
    urls:
      - https://app.trykikilabs.com/api/webhook/livekit


EOF

# caddy config
cat << EOF > /opt/livekit/caddy.yaml
logging:
  logs:
    default:
      level: INFO
storage:
  "module": "file_system"
  "root": "/data"
apps:
  tls:
    certificates:
      automate:
        - lk.trykikilabs.com
        - lk-turn.trykikilabs.com
  layer4:
    servers:
      main:
        listen: [":443"]
        routes:
          - match:
            - tls:
                sni:
                  - "app.trykikilabs.com"
            handle:
              - handler: proxy
                upstreams:
                  - dial: ["localhost:8443"]
          - match:
            - tls:
                sni:
                  - "lk-turn.trykikilabs.com"
            handle:
              - handler: tls
              - handler: proxy
                upstreams:
                  - dial: ["localhost:5349"]
          - match:
              - tls:
                  sni:
                    - "lk.trykikilabs.com"
            handle:
              - handler: tls
                connection_policies:
                  - alpn: ["http/1.1"]
              - handler: proxy
                upstreams:
                  - dial: ["localhost:7880"]


EOF

# update ip script
cat << "EOF" > /opt/livekit/update_ip.sh
#!/usr/bin/env bash
ip=`ip addr show |grep "inet " |grep -v 127.0.0. |head -1|cut -d" " -f6|cut -d/ -f1`
sed -i.orig -r "s/\\\"(.+)(\:5349)/\\\"$ip\2/" /opt/livekit/caddy.yaml


EOF

# docker compose
cat << EOF > /opt/livekit/docker-compose.yaml
# This docker-compose requires host networking, which is only available on Linux
# This compose will not function correctly on Mac or Windows
services:
  caddy:
    image: livekit/caddyl4
    command: run --config /etc/caddy.yaml --adapter yaml
    restart: unless-stopped
    network_mode: "host"
    volumes:
      - ./caddy.yaml:/etc/caddy.yaml
      - ./caddy_data:/data
  livekit:
    image: livekit/livekit-server:latest
    command: --config /etc/livekit.yaml
    restart: unless-stopped
    network_mode: "host"
    volumes:
      - ./livekit.yaml:/etc/livekit.yaml
  redis:
    image: redis:7-alpine
    command: redis-server /etc/redis.conf
    restart: unless-stopped
    network_mode: "host"
    volumes:
      - ./redis.conf:/etc/redis.conf
  egress:
    image: livekit/egress:latest
    restart: unless-stopped
    environment:
      - EGRESS_CONFIG_FILE=/etc/egress.yaml
    network_mode: "host"
    volumes:
      - ./egress.yaml:/etc/egress.yaml
    cap_add:
      - CAP_SYS_ADMIN
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8091/"]
      interval: 15s
      timeout: 3s
      retries: 3
      start_period: 20s
  ingress:
    image: livekit/ingress:latest
    restart: unless-stopped
    environment:
      - INGRESS_CONFIG_FILE=/etc/ingress.yaml
    network_mode: "host"
    volumes:
      - ./ingress.yaml:/etc/ingress.yaml


EOF

# systemd file
cat << EOF > /etc/systemd/system/livekit-docker.service
[Unit]
Description=LiveKit Server Container
After=docker.service
Requires=docker.service

[Service]
LimitNOFILE=500000
Restart=always
WorkingDirectory=/opt/livekit
# Shutdown container (if running) when unit is started
ExecStartPre=/usr/local/bin/docker-compose -f docker-compose.yaml down
ExecStart=/usr/local/bin/docker-compose -f docker-compose.yaml up
ExecStop=/usr/local/bin/docker-compose -f docker-compose.yaml down

[Install]
WantedBy=multi-user.target


EOF
# redis config
cat << EOF > /opt/livekit/redis.conf
bind 127.0.0.1 ::1
protected-mode yes
port 6379
timeout 300
tcp-keepalive 120


EOF
# egress config
cat << EOF > /opt/livekit/egress.yaml
redis:
    address: 127.0.0.1:6379
    username: ""
    password: ""
    db: 0
    use_tls: false
    sentinel_master_name: ""
    sentinel_username: ""
    sentinel_password: ""
    sentinel_addresses: []
    cluster_addresses: []
    max_redirects: null
api_key: $API_KEY
api_secret: $API_SECRET
ws_url: wss://lk.trykikilabs.com
health_port: 8091


EOF
# ingress config
cat << EOF > /opt/livekit/ingress.yaml
redis:
    address: 127.0.0.1:6379
    username: ""
    password: ""
    db: 0
    use_tls: false
    sentinel_master_name: ""
    sentinel_username: ""
    sentinel_password: ""
    sentinel_addresses: []
    cluster_addresses: []
    max_redirects: null
api_key: $API_KEY
api_secret: $API_SECRET
ws_url: wss://lk.trykikilabs.com
rtmp_port: 1935
whip_port: 8080
http_relay_port: 9090
logging:
    json: false
    level: ""
development: false
rtc_config:
    udp_port: 7885
    use_external_ip: true
    enable_loopback_candidate: false


EOF

chmod 755 /opt/livekit/update_ip.sh
/opt/livekit/update_ip.sh

systemctl enable livekit-docker
systemctl start livekit-docker

Hey @Ashish_Agarwal ,

You will have to look into the debug logs - please set the log level to DEBUG and try to reproduce. The logs will likely point you to the problem. Feel free to share them in case you don't see what's causing the issue.
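In `egress.yaml` that would look roughly like the fragment below (a sketch based on my reading of the egress config reference - double-check the option names against your egress version):

```yaml
# egress.yaml - turn on verbose logging
logging:
  level: debug   # debug, info, warn, or error; default info
```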

Best,

Milos

Hey @Milos_Pesic, thanks for the advice - I was able to track down the issue using the debug logs.

Issue: I was running it on a 4-core machine, which only supports 1 concurrent recording.
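For anyone hitting the same limit: egress reserves a CPU budget per recording type, so on a small node only one room composite fits at a time. If I understand the config correctly, those per-request costs can be tuned in `egress.yaml` via a `cpu_cost` block (option names taken from the egress config reference; the values shown are illustrative assumptions and may differ by version):

```yaml
# egress.yaml - per-request CPU reservations (values are assumptions, check your version's defaults)
cpu_cost:
  room_composite_cpu_cost: 3.0   # a 4-core node fits roughly one such recording plus headroom
  track_composite_cpu_cost: 2.0
  track_cpu_cost: 1.0
```

Lowering these lets more recordings run concurrently, at the risk of overloading the node.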

One small piece of feedback: could you update the documentation for logging in the egress service? It would really help future developers.

Current documentation:

log_level: debug, info, warn, or error (default info)

Updated docs:

logging: 
    level: debug, info, warn, or error (default info)

That would make it consistent with logging in the ingress service and the LiveKit server.

Hey @Ashish_Agarwal - I am glad you managed to figure out the root cause. Thanks a lot for pointing out the stale docs - we will fix them :folded_hands: