Hey LiveKit community,
I’m looking for help debugging an intermittent issue with a production AI voice moderation app I built on LiveKit Cloud.
Context
I’ve built and deployed an AI voice moderation platform for focus groups using:
- LiveKit Cloud for real-time multi-user voice/video rooms
- Python + FastAPI backend
- Docker-based deployment on a VPS
- LiveKit Agents
- Deepgram for STT
- ElevenLabs for TTS
- OpenAI for conversation logic
- Anam for the AI avatar layer
The platform creates rooms, generates participant tokens, dispatches the AI moderator agent, and supports Zoom-like multi-participant focus group sessions.
Issue
During a recorded demo, the AI moderator asked a participant a question. The participant answered clearly, and everyone in the room could hear him. The demo recording also captured his voice clearly.
However, the AI voice agent did not reliably process what he was saying.
In LiveKit Agent Insights, the participant’s audio sounded very faint/weak compared with what was heard in the actual room/demo recording. My application logs showed moments where the system detected speech activity but did not receive usable STT transcripts.
Example pattern from my logs:
User started speaking
STT HEALTH CHECK: VAD detected speech 7s ago but no STT transcripts received for christopher! Nudging.
Later, some partial fragments came through, but the agent treated the participant’s answer as incomplete/off-topic because the transcript was missing or fragmented.
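For anyone unfamiliar with that warning, the check behind it has roughly the following shape. This is a simplified sketch, not the application's actual code: the class name `SttWatchdog`, its method names, and the 7-second default are illustrative assumptions, and the wiring to real VAD and STT events is left out.

```python
import time
from typing import Callable, Optional


class SttWatchdog:
    """Flags the case where VAD reports speech but no transcript follows.

    Hypothetical helper for illustration; not part of LiveKit Agents.
    """

    def __init__(self, timeout_s: float = 7.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock
        self._speech_started_at: Optional[float] = None

    def on_speech_started(self) -> None:
        # Called when VAD reports speech; keep the earliest unanswered start.
        if self._speech_started_at is None:
            self._speech_started_at = self._clock()

    def on_transcript(self, text: str) -> None:
        # Any non-empty transcript (interim or final) clears the pending state.
        if text.strip():
            self._speech_started_at = None

    def stalled_for(self) -> Optional[float]:
        """Return seconds since unanswered speech began, or None if healthy."""
        if self._speech_started_at is None:
            return None
        elapsed = self._clock() - self._speech_started_at
        return elapsed if elapsed >= self.timeout_s else None
```

A periodic task would call `stalled_for()` and log the warning (and nudge the participant) whenever it returns a value. The point is that the warning only proves VAD fired without a transcript following; it does not say whether the audio reaching STT was too quiet, too fragmented, or missing.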
What I checked
- The participant’s microphone appeared to be working in Session Analytics.
- Other participants could hear him clearly.
- The demo recording captured his response clearly.
- The issue seemed isolated to what the agent/STT pipeline was receiving.
- Agent Insights made his audio sound much weaker than the room/demo recording.
My question
What is the best way to debug this type of mismatch?
Specifically:
- Can LiveKit room audio/recording sound clear while the agent receives a much lower-quality or lower-volume subscribed track?
- Could this be caused by participant-side connectivity, packet loss, browser audio processing, audio level normalization, or track subscription behavior?
- Are there specific LiveKit metrics I should inspect, such as packet loss, jitter, audio level, connection quality, track SID, participant SID, or agent subscription state?
- Is there a recommended way to compare what the room received versus what the LiveKit Agent actually received?
- Could this be related to the LiveKit Agents audio pipeline, VAD configuration, or the downstream STT provider receiving weak/partial audio?
I’m not trying to assume this is a LiveKit infrastructure issue. I’m trying to determine whether the failure point is participant connectivity, browser/mic behavior, LiveKit track delivery, agent subscription/ingestion, VAD, or STT.
Any recommended debugging steps, metrics to export, or best practices for diagnosing “participant audible to humans but not reliably heard by agent” would be greatly appreciated.
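To make the "room versus agent" comparison concrete, one option would be to log the level of the raw audio the agent receives and compare it with the level of the same passage in the recording. The sketch below is only an illustration under stated assumptions: it expects 16-bit signed little-endian mono PCM bytes, the function name `rms_dbfs` is hypothetical, and how the frames are pulled from the agent's subscribed track is deliberately left out.

```python
import array
import math
import sys


def rms_dbfs(pcm: bytes) -> float:
    """RMS level of 16-bit signed little-endian mono PCM, in dBFS.

    Returns -inf for empty or all-zero input.
    """
    samples = array.array("h")
    # Drop a trailing odd byte so the buffer is a whole number of samples.
    samples.frombytes(pcm[: len(pcm) - (len(pcm) % 2)])
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian by assumption
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    # 32768 is full scale for 16-bit signed audio.
    return 20.0 * math.log10(math.sqrt(mean_square) / 32768.0)
```

If the agent-side frames for this participant sit far below the recording over the same time window, that would point at delivery or processing on the way to the agent rather than at VAD or STT settings.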
Session ID: RM_MQSDbz3SfBMD