Testing strategy for LiveKit voice agents (roomless vs lk.chat UI tests)

Hey everyone! I am building automated tests for our LiveKit voice agent and wanted to sanity-check the approach before going too far.

I have two layers of tests:

1) Roomless conversation tests

I intend to use AgentSession.run(user_input=...) to validate the agent’s conversational behavior without a room, STT, TTS, or VAD.

The idea:

  • Spin up sessions
  • Send text input
  • Inspect RunResult.events for messages, tool calls, and outputs

Our Agent subclass uses @function_tool decorators and custom skills that internally call CommandSender methods.


2) Browser UI tests

From the browser we send text via:

room.localParticipant.sendText(text, { topic: "lk.chat" })

Then we verify rendering and behavior using Playwright.

Our understanding is that RoomIO automatically handles lk.chat and routes it through the same LLM pipeline as voice.


Questions

  1. Is session.run(user_input=...) the recommended way to test Agent subclasses without a room? Any pitfalls?
  2. Does lk.chat text follow the exact same pipeline as voice (including function tools), or are there differences?

Does this overall strategy make sense, or would you approach testing differently?

Is session.run(user_input=...) the recommended way to test Agent subclasses without a room? Any pitfalls?

Yes, this is the recommended way to test Agent subclasses without a room.
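As a rough sketch of what a roomless check can look like, the helper below runs one text turn and collects the names of the tools the agent called. It relies only on session.run(user_input=...) and RunResult.events from the question. The event shape is an assumption (a `type` string plus an `item` with a `name` on function-call events), and FakeSession and the tool name send_command are hypothetical stand-ins so the snippet runs offline. In a real test you would pass in a started AgentSession instead.

```python
import asyncio
from types import SimpleNamespace


async def tool_calls_for(session, user_input: str) -> list[str]:
    """Run one text turn and return the names of the tools the agent called."""
    result = await session.run(user_input=user_input)
    # Assumption: each event has a `type` string and an `item`; function-call
    # events use type "function_call" and expose the tool name as `item.name`.
    return [
        event.item.name
        for event in result.events
        if getattr(event, "type", None) == "function_call"
    ]


class FakeSession:
    """Hypothetical stand-in for AgentSession so the helper runs offline."""

    async def run(self, *, user_input: str):
        events = [
            SimpleNamespace(type="function_call", item=SimpleNamespace(name="send_command")),
            SimpleNamespace(type="function_call_output", item=SimpleNamespace(output="ok")),
            SimpleNamespace(type="message", item=SimpleNamespace(role="assistant")),
        ]
        return SimpleNamespace(events=events)


called = asyncio.run(tool_calls_for(FakeSession(), "turn on the lights"))
print(called)
```

One thing to watch for: if the tools end up calling CommandSender for real, the test has side effects, so it is probably worth stubbing that layer. The docs page mentioned further down covers the built-in assertion helpers; check there for the exact names.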

Does lk.chat text follow the exact same pipeline as voice (including function tools), or are there differences?

Yes, text sent with room.localParticipant.sendText… is handled by RoomIO and routed through the same LLM pipeline as voice input.

Overall it looks good. I can’t imagine you haven’t seen it, but the relevant docs page is "Testing and evaluation" in the LiveKit documentation.

Great. Thanks for the answers @darryncampbell, good to know the approach makes sense.

@cdutr are you trying to look at specific metrics/trends in the load testing?

Hi @Shashij_Gupta, I am more interested in analyzing behavior (using an LLM as a judge), finding bugs and crashes, and measuring overall latency. I am not very focused on load testing for now.
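For the latency part, a minimal sketch of what I have in mind: time each session.run() turn and reduce the samples to p50, p95, and max. SlowFakeSession is a hypothetical stand-in that only sleeps, and the helper names are my own. Since the roomless runs skip STT, TTS, and VAD, these numbers would cover the text turn only, not end-to-end voice latency.

```python
import asyncio
import statistics
import time


async def timed_turn(session, user_input: str) -> float:
    """Return wall-clock milliseconds for one session.run() text turn."""
    start = time.perf_counter()
    await session.run(user_input=user_input)
    return (time.perf_counter() - start) * 1000.0


def summarize_latency(samples_ms: list[float]) -> dict[str, float]:
    """Reduce per-turn latencies to p50 / p95 / max (needs at least 2 samples)."""
    ordered = sorted(samples_ms)
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(ordered, n=20, method="inclusive")[-1]
    return {"p50": statistics.median(ordered), "p95": p95, "max": ordered[-1]}


class SlowFakeSession:
    """Hypothetical stand-in for AgentSession; it only sleeps briefly."""

    async def run(self, *, user_input: str):
        await asyncio.sleep(0.01)


async def main() -> list[float]:
    session = SlowFakeSession()
    return [await timed_turn(session, f"turn {i}") for i in range(5)]


samples = asyncio.run(main())
print(summarize_latency(samples))
```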