What is causing the spike in my call processing?
I’m self-hosting LiveKit for development, and currently I can only handle around 15 concurrent calls. However, my CPU usage is only about 40%, and memory usage is still low.
The issue is that telephony stops dispatching new calls when there’s a spike in load.
I’ve already moved my model initialization into a prewarm phase, so I don’t think cold starts are the problem.
I’m trying to understand what is causing this spike and limiting my concurrency despite low resource usage.
I've attached a performance screenshot and the relevant code snippet:
Performance
Code Snippet
def compute_load(agent_server: Any) -> float:
    # ---- HARD MEMORY LIMIT (SAFETY ONLY) ----
    mem_used_gb = psutil.virtual_memory().used / (1024**3)
    if mem_used_gb >= AGENT_MAX_MEMORY_GB:
        return 0.95

    pressures: list[float] = []

    # ---- PRIMARY: CCR ----
    ccr = len(agent_server.active_jobs) / AGENT_MAX_CONCURRENT_CALLS
    pressures.append(min(ccr, 1.0))

    # ---- CPU (LiveKit default moving average) ----
    try:
        cpu_percent = _DefaultLoadCalc.get_load(agent_server)
    except Exception:
        cpu_percent = float("nan")
    if cpu_percent >= AGENT_MAX_CPU_PERCENT:
        return 0.9
    cpu_pressure = cpu_percent / AGENT_MAX_CPU_PERCENT
    pressures.append(min(cpu_pressure, 1.0))

    load = max(pressures)
    return load

def prewarm(proc: JobProcess) -> None:
    otel_manager = OTELManager()
    otel_manager.initialize()
    rest = RestClient()
    gql = GraphQLClient()
    intent_clf = joblib.load(intent_model_path)
    proc.userdata["vad"] = silero.VAD.load()
    proc.userdata["nc_models"] = {
        "telephony": nc.BVCTelephony(),
        "bvc": nc.BVC(),
        "nc": nc.NC(),
    }
    setup_grpc()
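
To make the question easier to reason about, here is a minimal standalone sketch of the arithmetic in compute_load, with the memory check left out. It does not import LiveKit. The constant values are hypothetical stand-ins (the real ones are not shown above), and ASSUMED_LOAD_THRESHOLD reflects an assumption that the worker stops accepting jobs once the reported load reaches a configured threshold. It only shows how the reported load can hit such a threshold on call count alone while CPU pressure is still low.

```python
from typing import Optional

# Hypothetical stand-in values; the real constants are not shown in the post.
AGENT_MAX_CONCURRENT_CALLS = 16
AGENT_MAX_CPU_PERCENT = 80.0
# Assumption: the worker refuses new jobs once load reaches this value.
ASSUMED_LOAD_THRESHOLD = 0.75


def reported_load(active_jobs: int, cpu_percent: float) -> float:
    """Mirror the arithmetic of compute_load, without the memory check."""
    pressures = [min(active_jobs / AGENT_MAX_CONCURRENT_CALLS, 1.0)]
    if cpu_percent >= AGENT_MAX_CPU_PERCENT:
        return 0.9
    pressures.append(min(cpu_percent / AGENT_MAX_CPU_PERCENT, 1.0))
    return max(pressures)


def first_blocked_call_count(cpu_percent: float) -> Optional[int]:
    """Smallest number of active jobs at which load reaches the threshold."""
    for n in range(AGENT_MAX_CONCURRENT_CALLS + 1):
        if reported_load(n, cpu_percent) >= ASSUMED_LOAD_THRESHOLD:
            return n
    return None


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        print(n, reported_load(n, cpu_percent=40.0))
    print("first blocked at:", first_blocked_call_count(40.0))
```

With these stand-in numbers, the call-count ratio alone reaches the assumed threshold at 12 active jobs, even though CPU pressure is only 0.5. Whether the same thing happens in the real setup depends on the actual constants and on the threshold the worker is configured with.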

