@Apple_Intelligence 2.7 GB is normal-ish for a Python voice agent bundling Silero VAD + turn detector. The bloat is mostly PyTorch + torchaudio pulled in by those models (1-2 GB alone, even CPU-only), not your code.
Concrete cuts in impact order:
python:3.12-slim base. Drops ~900 MB versus full python:3.12.
Multi-stage build. Install in a builder stage, copy only the resolved venv into runtime. Strips pip cache, build tools, __pycache__.
CPU-only torch wheels. Install from PyTorch’s CPU index so you don’t pull CUDA libs you can’t use.
Prewarm models in the builder. Trigger silero.VAD.load() and the turn detector download once during build so they’re baked into the image.
Realistic target: 800 MB to 1.2 GB. Below that needs Alpine or stripping torchaudio.
Yeah, I used the base image from the docs, but it’s still close to 2 GB. I’m including the Turn-D and Silero models in the image so they’re cached. I’ll go through it again once I’ve reviewed the docs. Thanks.
Yeah, I tried it now with the Python 3.12 slim image, and it reduced the size to around 2 GB, but it’s still quite large. I’m also attaching the Turn-D and Silero models in there so they’re cached and don’t need to be downloaded every time.
If possible, could you share any reference Dockerfile where you were able to achieve around a 1 GB image size? That would be really helpful.
@Apple_Intelligence, the starter darryn linked lands at ~1.25GB because of the multi-stage split: build tools and pip/uv cache stay in the builder; production copies only the resolved venv. Going from 2.7GB to 2GB on slim alone means you’re still missing that split. Reproduce the starter’s pattern and you should land near 1.25GB.
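For reference, the multi-stage split described above might look roughly like the sketch below. This is not the starter's actual Dockerfile. The file names `requirements.txt` and `agent.py` are placeholders, and it assumes your entrypoint uses the LiveKit Agents CLI, so that `download-files` and `start` are available as subcommands.

```dockerfile
# --- Builder stage: pip, its cache, and any build tooling stay here ---
FROM python:3.12-slim AS builder
WORKDIR /app

# Install into a dedicated venv so a single directory can be copied later.
RUN python -m venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"

# requirements.txt is a placeholder name for your dependency list.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# --- Runtime stage: same base image, so the venv's interpreter path still resolves ---
FROM python:3.12-slim
WORKDIR /app

# Copy only the resolved venv, at the same path it was created in.
COPY --from=builder /app/.venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"

# agent.py is a placeholder for your agent entrypoint.
COPY agent.py .

# Bake the Silero and turn detector model files into the image
# (assumes the entrypoint exposes the LiveKit Agents CLI).
RUN python agent.py download-files

CMD ["python", "agent.py", "start"]
```

The model download runs in the runtime stage here only so the model cache directory does not have to be located and copied across stages. If you prewarm in the builder instead, you would also need to copy that cache directory into the final image.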
Correcting my earlier reply on torch: I rechecked main. livekit-plugins-silero declares onnxruntime only; livekit-plugins-turn-detector declares onnxruntime + transformers. Both use ONNX backends, not torch. So the 322MB of torch in your image is coming from something else in your dep tree. Run pip show torch or pipdeptree -r --packages torch in the built image to find the requirer. Dropping that dependency is your real path to sub-1GB.
For 100+ calls/day, tune num_idle_processes and worker concurrency before splitting services.
@CWilson, @darryncampbell & @Muhammad_Usman_Bashir Based on your help and guidance, I was able to reduce the Docker image size to 1.2 GB. I tested it and it’s working fine. Thanks for the help, I really appreciate it.