Load Testing LiveKit Agents

We frequently see this question in the community, so we’re consolidating the responses here.

I’m building LiveKit agents and want to load test them before going to production. Specifically, I need to:

  • Simulate agents joining multiple rooms with users concurrently (1 agent and 1 user per room)
  • Measure and validate agent join delay
  • Understand whether there are any hard limits (per room, per project, etc.)
  • Follow best practices before running a large-scale load test

What’s the recommended approach for load testing LiveKit agents, and are there any limits or important considerations I should be aware of?

The LiveKit CLI already includes a built-in tool for agent load testing. Details can be found in the README.

The CLI tool allows you to:

  • Create any number of rooms
  • Automatically dispatch your agent to each room
  • Add a single participant per room
  • Have the participant echo back whatever the agent says

Together, these features make it easy to simulate real-world usage and measure join latency.

The tool is not designed for large-scale load tests. We strongly recommend contacting us before conducting any such test.

The docs do call this out, but it’s worth noting that the agent should initiate the interaction; otherwise, the participant will never have anything to echo back. You can add something like this after the session starts:

# Greet the user first
await session.generate_reply(
  instructions="Greet the user warmly and offer your assistance.",
)

The docs also say to use start, but you can run the load test against deployed agents as well. I recommend doing so, because you then also get agent stats for load and join latency on the cloud dashboard.

After the test completes, the CLI displays a table showing the agent’s join delay in each room.
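If you want to validate join delay against a target rather than eyeball the table, a small helper can summarize the numbers. This is a minimal sketch, assuming you have copied the per-room join delays (in seconds) into a list yourself. The function names, the sample values, and the 2-second budget are all hypothetical and not part of the CLI.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_join_delays(delays_s, budget_s):
    """Summarize join delays and count rooms that exceeded the budget."""
    return {
        "count": len(delays_s),
        "mean_s": mean(delays_s),
        "p50_s": percentile(delays_s, 50),
        "p95_s": percentile(delays_s, 95),
        "max_s": max(delays_s),
        "over_budget": sum(1 for d in delays_s if d > budget_s),
    }


# Hypothetical delays, transcribed by hand from a test run.
sample_delays_s = [0.8, 1.1, 0.9, 2.4, 1.0]
summary = summarize_join_delays(sample_delays_s, budget_s=2.0)
```

Looking at the p95 and the over-budget count alongside the mean helps surface the occasional slow join that an average would hide.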

In addition to the CLI output and agent stats mentioned previously, you can access the logs, transcripts, and audio for your runs, which are stored in Agent Insights.

Burst Behavior

The CLI does not create all rooms in a single burst. Instead:

  • A room is created
  • The agent joins
  • After the agent successfully joins, the next room is created

This produces a controlled ramp-up rather than a simultaneous spike.
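To make the difference concrete, here is a small simulation comparing the two patterns. It is only a sketch: it does not talk to LiveKit at all, the `asyncio.sleep` call is a stand-in for the time an agent takes to join, and every name in it is hypothetical.

```python
import asyncio


class JoinTracker:
    """Counts how many agent joins are pending at once (simulation only)."""

    def __init__(self):
        self.in_flight = 0
        self.peak = 0

    async def create_room_and_wait_for_agent(self, room, join_time_s=0.01):
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(join_time_s)  # stand-in for the agent joining
        self.in_flight -= 1


async def sequential_ramp(rooms, tracker):
    """Ramp-up as described above: the next room waits for the previous join."""
    for room in rooms:
        await tracker.create_room_and_wait_for_agent(room)


async def simultaneous_burst(rooms, tracker):
    """All rooms are created at once, so every join is pending together."""
    await asyncio.gather(
        *(tracker.create_room_and_wait_for_agent(r) for r in rooms)
    )


def peak_pending_joins(strategy, room_count):
    """Run one strategy and report the highest number of concurrent joins."""
    tracker = JoinTracker()
    rooms = [f"room-{i}" for i in range(room_count)]
    asyncio.run(strategy(rooms, tracker))
    return tracker.peak
```

With the sequential strategy there is never more than one join pending, while the burst strategy has as many pending joins as there are rooms.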

We’ve previously seen developers try to load test agents by having them join multiple rooms simultaneously, which is not a realistic scenario for most users. Our infrastructure is designed to scale for realistic usage patterns. If you expect to see sudden large spikes in production, please get in touch and we can provision you appropriately under an enterprise plan to support your traffic pattern.

Plan Limits and Usage Considerations

The load test is subject to the usage quotas and limits of your plan, as detailed on our pricing page.

In particular, limits may apply to:

  • Concurrent agent sessions
  • Concurrent LiveKit inference sessions
  • Overall usage quotas

Make sure your test configuration aligns with your plan’s capacity.


If the CLI Tool Doesn’t Meet Your Needs

If the load test available in the CLI does not meet your requirements, you can write a script that dispatches agents and participants to rooms directly.
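As a starting point, such a script might look like the sketch below. It assumes the `livekit-api` Python package, credentials in the `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET` environment variables, and an agent registered under a name for explicit dispatch. The room naming, the `stagger_s` pause, and the `join_participant` callable (which you would implement with a client SDK) are hypothetical choices for illustration, so check the dispatch call against the current SDK docs before relying on it.

```python
import asyncio


async def dispatch_agent(room_name, agent_name):
    """Explicitly dispatch a named agent to a room via the server API."""
    # Imported lazily so the orchestration below can be tried without it.
    from livekit import api

    lkapi = api.LiveKitAPI()  # reads URL and credentials from the environment
    try:
        await lkapi.agent_dispatch.create_dispatch(
            api.CreateAgentDispatchRequest(agent_name=agent_name, room=room_name)
        )
    finally:
        await lkapi.aclose()


async def run_custom_test(room_count, agent_name, dispatch, join_participant,
                          stagger_s=1.0):
    """Set up rooms one by one, pausing `stagger_s` seconds between rooms."""
    rooms = [f"loadtest-{i}" for i in range(room_count)]
    for i, room in enumerate(rooms):
        if i:
            await asyncio.sleep(stagger_s)  # avoid an unrealistic burst
        await dispatch(room, agent_name)    # one agent per room
        await join_participant(room)        # one user per room (hypothetical)
    return rooms
```

Passing `dispatch` and `join_participant` in as arguments lets you dry-run the flow with stand-in functions before pointing it at a real project, where you would pass `dispatch_agent` and your own participant logic. The same plan limits apply to a custom script as to the CLI test.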
