You can now automatically detect voicemail and IVR with outbound phone agents
Outbound phone calls don’t always reach a human. Voicemail boxes and IVR menus come up far more often than most teams expect, and some lines never stop ringing. An agent has only the first few seconds of the call to gather enough context to decide how to interact.
This is what my earlier implementation did: the STT transcribes the audio played by the voicemail system, so the transcript reads something like “The person you are reaching is not available. Please leave a message.” These greetings follow similar patterns, so when the LLM reads one, it calls a voicemail tool I designed, which leaves the voicemail and ends the call. That is my current voicemail solution. How will this new feature help me upgrade, or is it any different from what I already have?
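For comparison, the transcript-based approach described above boils down to a phrase match on the STT output. This is a minimal sketch, assuming the transcript arrives as a plain string; the phrase list and the function name `looks_like_voicemail` are hypothetical and not part of LiveKit.

```python
import re

# Hypothetical phrases typical of carrier and voicemail greetings.
VOICEMAIL_PATTERNS = [
    r"\bnot available\b",
    r"\bleave (a|your) message\b",
    r"\bafter the (tone|beep)\b",
    r"\bvoice ?mail\b",
]
_VOICEMAIL_RE = re.compile("|".join(VOICEMAIL_PATTERNS), re.IGNORECASE)


def looks_like_voicemail(transcript: str) -> bool:
    """Return True if the STT transcript contains a known voicemail phrase."""
    return _VOICEMAIL_RE.search(transcript) is not None


if __name__ == "__main__":
    greeting = "The person you are reaching is not available. Please leave a message."
    print(looks_like_voicemail(greeting))               # True
    print(looks_like_voicemail("Hello? Who is this?"))  # False
```

A live person saying “just leave a message with my assistant” also matches, which is the false-positive risk a purely textual rule carries.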
I would suggest trying it out for your use case to see for yourself. The LiveKit Answering Machine Detection has been tested across many different scenarios and use cases. It detects not only answering machines (as in your use case) but also other types of systems, such as IVR. I know the LiveKit ML team worked through some tricky cases while developing this feature.
Another benefit is that there is less code for you to write and maintain, since detection is handled by the framework, and you get the collective improvements automatically as they are added.
Your STT pattern-matching approach has a structural latency floor: the greeting plays (3-5 s), STT transcribes it (200 ms-1 s per chunk), the LLM decides on its next turn (500 ms-2 s), and only then does the tool fire. Realistically, detection lands 5-10 s into the call. AMD operates on audio features (silence patterns, prompt cadence, post-beep gap) rather than transcripts, so it can classify before STT has produced anything.
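Adding up the stage ranges quoted above makes that floor concrete. This is a back-of-the-envelope sketch, assuming, as the breakdown does, that the stages run back to back; the number of STT chunks is an assumed parameter, and more chunks push the upper bound toward the 10 s end.

```python
# Stage latencies in milliseconds (min, max), taken from the ranges quoted above.
GREETING_MS = (3000, 5000)      # voicemail greeting plays
STT_PER_CHUNK_MS = (200, 1000)  # STT transcribes one audio chunk
LLM_DECISION_MS = (500, 2000)   # LLM decides on its next turn


def detection_latency_ms(stt_chunks: int = 1) -> tuple[int, int]:
    """Best- and worst-case time before the voicemail tool can fire, stages in series."""
    best = GREETING_MS[0] + stt_chunks * STT_PER_CHUNK_MS[0] + LLM_DECISION_MS[0]
    worst = GREETING_MS[1] + stt_chunks * STT_PER_CHUNK_MS[1] + LLM_DECISION_MS[1]
    return best, worst


print(detection_latency_ms())   # (3700, 8000)
print(detection_latency_ms(3))  # (4100, 10000)
```

Integer milliseconds keep the arithmetic exact; even the best case lands several seconds into the call.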
Three wins beyond what CWilson noted:
Language-agnostic. Audio detection doesn’t depend on English greetings.
Fewer false positives on humans. Pattern matching fires when a real human says “leave a message”; an audio classifier doesn’t.
IVR and voicemail need different strategies. Voicemail: play the message, then hang up. IVR: wait for the prompt, then send DTMF. Your current path treats both the same.
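The split between voicemail and IVR handling can be written as a small dispatch table. This is a sketch only: the category strings mirror the ones used in the agent code later in this thread, while `CallAction` and `next_action` are hypothetical placeholders for your own handlers.

```python
from enum import Enum


class CallAction(Enum):
    CONVERSE = "converse"            # talk to the human normally
    NAVIGATE_IVR = "navigate_ivr"    # wait for the menu prompt, then send DTMF
    LEAVE_MESSAGE = "leave_message"  # play the voicemail message, then hang up
    HANG_UP = "hang_up"              # nothing to talk to


# Hypothetical mapping from AMD category strings to follow-up actions.
ACTION_BY_CATEGORY = {
    "human": CallAction.CONVERSE,
    "uncertain": CallAction.CONVERSE,
    "machine-ivr": CallAction.NAVIGATE_IVR,
    "machine-vm": CallAction.LEAVE_MESSAGE,
    "machine-unavailable": CallAction.HANG_UP,
}


def next_action(category: str) -> CallAction:
    """Pick a follow-up action; unknown categories fall back to a normal conversation."""
    return ACTION_BY_CATEGORY.get(category, CallAction.CONVERSE)


print(next_action("machine-ivr"))  # CallAction.NAVIGATE_IVR
```

Falling back to a normal conversation for unknown categories treats “not sure” the same way as `uncertain`, which is the safer failure when a real person might be on the line.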
Migration is mostly subtractive: delete the regex and tool-dispatch logic, keep the leave-message audio.
One other thing I would add: the new answering machine detection should also more reliably handle the case where the user picks up and just says ‘hello?’ (or similar). This is the “fast-path heuristic” described here: https://livekit.com/blog/content/images/2026/05/amd_workflow.svg
@CWilson @darryncampbell Does AMD work with realtime voice models? I’m trying to implement AMD with Amazon Nova Sonic, but it’s not working. Here is my agent.py:
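The exact fast-path rules are in the linked diagram and aren’t spelled out in this thread, so the following is only a toy illustration of the idea: a very short greeting followed by a pause is treated as a human pickup without waiting for full classification. The thresholds, `Utterance`, and `fast_path_human` are assumptions for illustration, not LiveKit’s values or API.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str
    duration_s: float          # how long the callee spoke
    trailing_silence_s: float  # silence after they stopped


# Assumed thresholds, for illustration only.
MAX_GREETING_WORDS = 3
MAX_GREETING_S = 1.5
MIN_PAUSE_S = 0.8


def fast_path_human(u: Utterance) -> bool:
    """True for a short 'hello?'-style greeting followed by a pause, as a live person would do."""
    words = u.text.split()
    return (
        0 < len(words) <= MAX_GREETING_WORDS
        and u.duration_s <= MAX_GREETING_S
        and u.trailing_silence_s >= MIN_PAUSE_S
    )


print(fast_path_human(Utterance("Hello?", 0.6, 1.2)))  # True
print(fast_path_human(Utterance("You have reached the voicemail of John Smith", 3.1, 0.2)))  # False
```

Anything that misses the fast path would go on to the full classifier rather than being labelled a machine outright.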
from dotenv import load_dotenv
from livekit import api
from livekit import agents, rtc
from livekit.agents import AgentServer, AgentSession, Agent, room_io, mcp, AMD
from livekit.plugins import (
    aws,
    noise_cancellation
)
import os
from datetime import datetime
import json
import logging
import random
import string
from utils import load_prompt

load_dotenv()
logger = logging.getLogger(__name__)
agent_name = "test-agent"


class ContextAgent(Agent):
    print(f"AGENT: {Agent}")

    def __init__(self, context_vars=None, call_type=None) -> None:
        if call_type == "option1":
            instructions = load_prompt("option1_instructions.yaml")
        else:
            instructions = load_prompt("option2_instructions.yaml")
        if context_vars:
            instructions = instructions.format(**context_vars)
        super().__init__(instructions=instructions)

    async def on_enter(self):
        self.session.generate_reply(
            instructions="""
            Greet the customer by saying;
            For example;
            Hi {first_name}, I'm Linda from Dalabey Live. I have seen you're interested in the product {product_name}.
            Offer your assistance. You should start by speaking in English.
            """
        )


server = AgentServer()


@server.rtc_session(agent_name=agent_name)
async def my_agent(ctx: agents.JobContext):
    await ctx.connect()
    participant = await ctx.wait_for_participant()
    print(participant)
    print(f"Participant Attributes: {participant.attributes}")

    # Amazon Nova Sonic
    session = AgentSession(
        llm=aws.realtime.RealtimeModel(voice="tiffany")
    )
    await session.start(
        room=ctx.room,
        agent=ContextAgent(participant.attributes, participant.attributes.get("call_type")),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=lambda params: noise_cancellation.BVCTelephony() if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP else noise_cancellation.BVC(),
            ),
        ),
    )

    # AMD
    participant_identity = "phone_user"
    async with AMD(session, participant_identity=participant_identity) as detector:
        # I have already done CreateSIPParticipantRequest from an external API
        await ctx.wait_for_participant(identity=participant_identity)
        result = await detector.execute()
        print(f"AMD Category: {result.category}")
        if result.category == "human" or result.category == "uncertain":
            logger.info(
                "human answered the call or amd is uncertain, proceeding with normal conversation",
                extra={"transcript": result.transcript},
            )
        elif result.category == "machine-ivr":
            logger.info("ivr menu detected, starting navigation")
        elif result.category == "machine-vm":
            logger.info("voicemail detected, leaving a message")
            speech_handle = session.generate_reply(
                instructions=(
                    "You've reached voicemail. Leave a brief message asking "
                    "the customer to call back."
                ),
            )
            await speech_handle.wait_for_playout()
            ctx.shutdown("voicemail detected")
        elif result.category == "machine-unavailable":
            logger.info("mailbox unavailable, ending call")
            ctx.shutdown("mailbox unavailable")


if __name__ == "__main__":
    agents.cli.run_app(server)
AMD doesn’t work: the AMD category is always AMDCategory.MACHINE_UNAVAILABLE.
I’m sending the CreateSIPParticipantRequest from an external file/API.
@Kamal_Moha, there are three stacked issues in your aws.realtime.RealtimeModel + AMD path, verified against livekit-agents/livekit/agents/voice/amd/detector.py:
Ordering: session.start() triggers on_enter immediately, which calls self.session.generate_reply(...), so Nova Sonic begins speaking before AMD initializes. By the time detector.execute() runs, the realtime model is mid-utterance.
Realtime-model compatibility: AMD’s constructor takes suppress_compatibility_warning: bool = False, which suggests realtime models aren’t fully supported. Check the worker logs at AMD startup for that warning. _pause_authorization gates TTS output in a pipeline, but a realtime model owns audio end to end, so that gate doesn’t hold it back.
MACHINE_UNAVAILABLE default: that category is the fallback when AMD can’t classify the call (timeout or corrupted signal). Nova Sonic talking mid-turn during detection is exactly the expected failure mode.
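The fallback described above can be pictured as a classification with a deadline. This is a toy asyncio sketch, not how `detector.py` is written: `classify` is a stand-in coroutine and the timeout is an assumed value. It only shows the shape of the failure mode, where a missing verdict defaults to `machine-unavailable`.

```python
import asyncio


async def classify(audio_ready: asyncio.Event) -> str:
    """Stand-in classifier: waits for usable callee audio, then returns a verdict."""
    await audio_ready.wait()
    return "human"


async def detect_with_deadline(audio_ready: asyncio.Event, timeout_s: float) -> str:
    """Return the classifier's verdict, or 'machine-unavailable' if it misses the deadline."""
    try:
        return await asyncio.wait_for(classify(audio_ready), timeout=timeout_s)
    except asyncio.TimeoutError:
        return "machine-unavailable"


async def main() -> None:
    clean = asyncio.Event()
    clean.set()  # usable callee audio: the classifier answers immediately
    print(await detect_with_deadline(clean, timeout_s=0.1))    # human

    blocked = asyncio.Event()  # never set: no usable audio before the deadline
    print(await detect_with_deadline(blocked, timeout_s=0.1))  # machine-unavailable


if __name__ == "__main__":
    asyncio.run(main())
```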
Fix: remove generate_reply from on_enter, and move AMD(...) before session.start() so detection runs against clean callee audio. For Nova Sonic, the cleaner pattern is to run AMD on a pipeline (STT + LLM + TTS) and switch to the realtime model only after AMD returns human or uncertain. Canonical example: examples/telephony/amd.py.
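The suggested order, classify first and let the realtime model speak only afterwards, can be illustrated with stand-in objects. This is a toy asyncio sketch with no LiveKit calls: `run_call`, `fake_detect`, and the event strings are hypothetical. The point is that nothing greets the callee before the detector returns, which is the opposite of the on_enter greeting in the posted code.

```python
import asyncio

# Categories after which the realtime model is allowed to take over the call.
HANDOFF_CATEGORIES = {"human", "uncertain"}


async def run_call(detect, events: list[str]) -> None:
    """Toy call flow: classify first, hand off to the realtime model only afterwards.

    `detect` is any coroutine function returning an AMD-style category string.
    """
    events.append("pipeline session started (no greeting yet)")
    category = await detect()
    events.append(f"amd result: {category}")
    if category in HANDOFF_CATEGORIES:
        events.append("handoff to realtime model")
        events.append("realtime model greets caller")
    else:
        events.append("handled without realtime model")


async def fake_detect() -> str:
    await asyncio.sleep(0)  # stand-in for listening to the callee
    return "human"


log: list[str] = []
asyncio.run(run_call(fake_detect, log))
print("\n".join(log))
```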
@CWilson @darryncampbell it’s worth confirming whether AMD with realtime models is officially supported, given the compatibility-warning flag.
@Kamal_Moha Thank you for providing your source code, but before I try to replicate it: have you tested with a pipeline model, and does your code work there? My understanding is that AMD should work with a realtime model in an AgentSession (and the docs imply that it should).
Guys, I have an update: I tried this out, and this feature just does not suit my use case.
In my use case, the agent has to speak first: the user picks up the phone and stays silent for some time. Because of cold starts, my agent takes a while to start talking, let’s say 2 seconds.
In the meantime, AMD is getting activated and categorising the call as machine unresponsive.
Hi @RabbaniF22, I’m just trying to parse what you’re saying…
I see you have a few projects in your account, some of which are on Ship and some on Build. I wouldn’t expect cold starts to be an issue on your Ship plans (it looks like your production projects use Ship).
In the meantime, AMD is getting activated and categorising the call as machine unresponsive.
Wouldn’t you do the categorisation first, and then speak only after it’s determined that a human answered the phone (at which point your agent would speak first)?
No, I am not using those projects; I am using self-hosted agents at the company where I work. We generally make cold calls here, so the agent should start the conversation: after the user picks up the call, the agent should speak first, not the user.
When using AMD, since the user stays silent, AMD categorises the call as machine unresponsive, and in my code I map machine unresponsive to voicemail.
So the main issue is that valid pickups, where users refuse to speak first, are getting tagged as machine unresponsive.
session.room_io.set_participant(participant_identity): focus on the dialed participant.
AMD(session, participant_identity=...): construct the detector against the now-running session.
result = await detector.execute(): block until classification.
Branch on result.category: “human” or “uncertain” continues to the normal flow, “machine-ivr” starts IVR navigation, “machine-vm” triggers session.generate_reply with voicemail instructions, and “machine-unavailable” ends the call.
The detector attaches to the running session, so constructing it before session.start() means there’s no session to attach to. That’s what your “AgentSession isn’t running” error is telling you.
Two things to verify in your code: on_enter no longer calls generate_reply (in the canonical example, the first speech happens only after detector.execute() returns “human”), and AMD is constructed after session.start() has returned.
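To make the ordering constraint concrete, here is a toy model of it: a detector that refuses to attach to a session that hasn’t started. `ToySession` and `ToyDetector` are stand-ins written for this illustration, not LiveKit’s `AgentSession` or `AMD`; only the error text is borrowed from the message quoted above.

```python
class ToySession:
    """Stand-in for an agent session that must be started before use."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True


class ToyDetector:
    """Stand-in detector that can only attach to a running session."""

    def __init__(self, session: ToySession, participant_identity: str) -> None:
        if not session.running:
            raise RuntimeError("AgentSession isn't running")
        self.session = session
        self.participant_identity = participant_identity


session = ToySession()
try:
    ToyDetector(session, participant_identity="phone_user")  # too early: before start()
except RuntimeError as err:
    print(f"error: {err}")  # error: AgentSession isn't running

session.start()
detector = ToyDetector(session, participant_identity="phone_user")  # after start(): attaches
print(detector.participant_identity)  # phone_user
```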