Upgrading python livekit-agents to 1.5.6 causing memory issues

Hello,

We have recently upgraded python livekit-agents from 1.5.1 to 1.5.6, and it caused massive spikes in memory. Here is the AWS graph before and after deployment.

NOTE: The sudden drop in memory usage is due to auto horizontal scaling of the instance.

The pattern suggests that something is using memory and not releasing it. In this update, we only upgraded livekit-agents from 1.5.1 to 1.5.6 and added the default preemptive generation configs; nothing else changed.

I haven’t heard anyone report this, but I will check with our team to see if they’ve seen it.

After downgrading, it’s working perfectly fine.

Thanks for this. I will pass that along to the team.

Our team ran a test of 1.5.6 (compared against 1.5.3) and did not see any unusual memory usage. Memory usage was the same in both cases.

Can you provide your agent configuration or other ways of reproducing this?

Sure. Here are the configurations:

#>>>>>>>>>>>>>>>>>>>>>>>>> BEFORE
[project]
name = "voicebot-livekit"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "aiohttp~=3.13",
    "livekit~=1.1.2",
    "livekit-agents[azure,cartesia,deepgram,elevenlabs,google,groq,inworld,openai,sarvam,silero,turn-detector]==1.5.1",
    "livekit-api~=1.1.0",
    "livekit-plugins-noise-cancellation~=0.2",
    "livekit-protocol~=1.1.2",
    "motor~=3.7",
    "pymongo>=4.12",
    "python-dotenv~=1.1",
]

#>>>>>>>>>>>>>>>>>>>>>>>>> AFTER
#>>>>>>>>>>>>>>>>>>>>>>>>> NOTE: Happens for the 1.5.2 as well. 
[project]
name = "voicebot-livekit"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "aiohttp~=3.13",
    "livekit~=1.1.2",
    "livekit-agents[azure,cartesia,deepgram,elevenlabs,google,groq,inworld,openai,sarvam,silero,turn-detector]==1.5.6", # Same result for 1.5.2 as seen in graph
    "livekit-api~=1.1.0",
    "livekit-plugins-noise-cancellation~=0.2",
    "livekit-protocol~=1.1.2",
    "motor~=3.7",
    "pymongo>=4.12",
    "python-dotenv~=1.1",
]

# EXAMPLE CONFIG
session = AgentSession(
    vad=VAD,  # << Prewarm silero.VAD.load()
    stt=deepgram.STT(model="nova-3-general"),
    llm=openai.LLM(model="gpt-4.1-mini"),
    tts=elevenlabs.TTS(model="eleven_flash_v2_5"),
    user_away_timeout=8.0,
    preemptive_generation=True,
    turn_handling=TurnHandlingOptions(
        turn_detection="vad",
        endpointing={
            "min_delay": 0.2,   # 200ms
        },
        interruption={
            "mode": "vad",
            "enabled": True,
            "discard_audio_if_uninterruptible": True,
            "min_duration": 0.4,   # 400ms
            "min_words": 1,
            "false_interruption_timeout": 2.0,
        },
    )
)

Thanks for that. So it looks like the memory issue started in 1.5.2 rather than 1.5.6?

I will pass this along.

Preemptive generation was added in 1.5.2, right? Shouldn’t the benchmarks compare 1.5.1 against something post-1.5.2?

Hello @CWilson, any update on the issue? Let me know if I can help in any way.
@Cars_Chandler, preemptive generation was added in 1.5.0.

Can you try disabling preemptive generation and see if that helps?

Hi @CWilson, I tried disabling preemptive generation and it got worse. Here is a memory chart marked with versions.

Even without any calls, memory keeps increasing. I’m not sure what’s happening. Can you help me? I am stuck on v1.5.1.

@CWilson, the “memory grows even without any calls” data point in #12 narrows the suspect range. With no sessions active, leak candidates are worker-level:

  • Heartbeat / signaling WebSocket
  • Telemetry / metrics buffers
  • Plugin import-time singletons
  • Background asyncio tasks accumulating
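For the last candidate in that list, a quick sanity check before reaching for a profiler is to sample the asyncio task count and a census of live objects at intervals while the worker sits idle. This is only a rough sketch: `object_census` and `watch_idle_worker` are hypothetical helper names, not part of livekit-agents, and how you schedule the coroutine on the worker's event loop is left open.

```python
import asyncio
import gc
from collections import Counter


def object_census(limit=5):
    """Count live GC-tracked objects by type name (hypothetical helper)."""
    counts = Counter(type(o).__name__ for o in gc.get_objects())
    return counts.most_common(limit)


async def watch_idle_worker(interval=60.0, rounds=3):
    """Sample the task count and object census `rounds` times, `interval` seconds apart."""
    samples = []
    for _ in range(rounds):
        # all_tasks() reports unfinished tasks on the currently running loop
        samples.append((len(asyncio.all_tasks()), object_census()))
        await asyncio.sleep(interval)
    return samples


if __name__ == "__main__":
    # Short interval for a standalone demo; use minutes on a real idle worker.
    for n_tasks, census in asyncio.run(watch_idle_worker(interval=0.1)):
        print(n_tasks, census)
```

If the task count or one object type climbs steadily across samples with no calls active, that narrows things down before the heavier profiling suggested next.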

@nilkanth, the fastest path to a concrete root cause is an allocator-level profile. Run the worker idle (no calls) for ~30 minutes under memray:

  # pip install memray
  memray run -o profile.bin your_worker.py start
  # stop after 30 min idle
  memray flamegraph profile.bin

Alternatively, do it in-process: take tracemalloc snapshots at startup and 30 minutes in, then call snapshot.compare_to(start_snapshot, 'lineno'). The top growers in the diff point directly at the source lines doing the allocating, which is far more actionable for @CWilson’s team than memory graphs.
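A minimal sketch of the tracemalloc route, in case it helps. `top_growth` is a made-up helper name, and the list of bytearrays only stands in for the 30 idle minutes; in the real worker you would take the first snapshot at startup and the second one later.

```python
import tracemalloc


def top_growth(before, after, limit=10):
    """Return up to `limit` source lines whose allocated size grew between snapshots."""
    # compare_to() sorts by absolute size difference, largest first
    stats = after.compare_to(before, "lineno")
    growers = [s for s in stats if s.size_diff > 0]
    return growers[:limit]


if __name__ == "__main__":
    tracemalloc.start(25)  # keep up to 25 frames per allocation traceback
    start_snapshot = tracemalloc.take_snapshot()

    # Stand-in for the idle period: memory that is allocated and kept alive.
    retained = [bytearray(1024) for _ in range(1000)]

    snapshot = tracemalloc.take_snapshot()
    for stat in top_growth(start_snapshot, snapshot):
        print(stat)
```

Each printed line shows a file and line number with its size and count deltas, so pasting the top ten here should be enough to tell whether the growth is in livekit-agents itself, a plugin, or application code. Note that tracemalloc only sees allocations made through Python's allocator, so growth inside native extensions may not show up; memray covers that case.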