Agent fails to publish its audio track at join

Setup: voice agents on livekit-agents 1.4.6 / livekit 1.1.2 / livekit-api 1.0.7 (Python 3.11), inbound SIP telephony, LiveKit Cloud (project p_3tqm7ro6kbs). On JobContext.connect(), the agent starts an AgentSession with RoomIO (which publishes one audio track for the agent’s voice) and a room-composite audio egress.

What happened (one call): the agent connected, then failed to publish its own audio output track: the publish call timed out with no response from the server. The session never recovered. It dropped into a "Subscriber pc state failed" → resume loop that, after the first minute, repeated every 5 minutes. The agent never produced any audio (the caller heard dead air and hung up), and the job process stayed alive for ≥ 2 h 18 m without ever self-terminating, while logging "process memory usage is high" and "inference is slower than realtime" warnings.

Root error at join (_init_task):

File ".../livekit/agents/voice/room_io/room_io.py", line 338, in _init_task
    await self._audio_output.start()
File ".../livekit/agents/voice/room_io/_output.py", line 65, in _publish_track
    self._publication = await self._room.local_participant.publish_track(...)
File ".../livekit/rtc/participant.py", line 699, in publish_track
    raise PublishTrackError(cb.publish_track.error)
livekit.rtc.participant.PublishTrackError: room error engine: internal error:
    track publication timed out, no response received from the server
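
A possible stopgap on the agent side is to bound startup with a timeout and request shutdown when it fails, so a job that cannot publish does not linger for hours. The sketch below is plain asyncio and does not import livekit. `start_or_shutdown`, `start`, `shutdown`, and `timeout_s` are hypothetical names chosen for illustration. In a real agent, `start` might wrap the session startup and `shutdown` might be something like `JobContext.shutdown`, but that wiring is an assumption and untested against 1.4.6. It is also not confirmed that the PublishTrackError raised inside `_init_task` propagates to the caller at all, in which case only the timeout branch would help.

```python
import asyncio
from typing import Awaitable, Callable


async def start_or_shutdown(
    start: Callable[[], Awaitable[None]],
    shutdown: Callable[[str], None],
    timeout_s: float = 20.0,
) -> bool:
    """Run start(); if it raises or exceeds timeout_s, call shutdown(reason).

    Returns True when startup succeeded, False when shutdown was requested.
    Both callables are placeholders supplied by the caller (assumption).
    """
    try:
        # Bound the whole startup so a hung publish cannot stall forever.
        await asyncio.wait_for(start(), timeout=timeout_s)
        return True
    except asyncio.TimeoutError:
        shutdown(f"startup exceeded {timeout_s}s")
    except Exception as exc:  # e.g. a publish error surfaced by the SDK
        shutdown(f"startup failed: {exc!r}")
    return False
```

The helper only decides when to give up. What "shutdown" means (ending the job, hanging up the SIP call, stopping the egress) is left to the callback.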

Timeline (UTC):

22:16:36  Entrypoint / Connected to LiveKit
22:16:38  Audio egress started
22:16:48  _init_task FAILS -> publish_track: "track publication timed out, no response received from the server"
22:16:52  rtc_engine: received session close -> resuming connection... attempt: 0
22:17:03  rtc_session: signal_event taking too much time: Answer(...)
22:17:09  rtc_session: Subscriber pc state failed -> resume ; "Wrong packet sequence while retrying"
22:17:25  Subscriber pc state failed -> resume
22:21:52  Subscriber pc state failed -> "received session close: pc_state failed" (resume)
22:26:52  Subscriber pc state failed ...                (repeats EVERY 5 minutes)
   ...
23:58:37+ "process memory usage is high" (every ~5s) + "inference is slower than realtime"
00:35:03  still logging the same loop (≥ 2h18m after start); process never exited in our capture window
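
The durations quoted in this report can be checked from the timestamps above. This is a small sketch using only the standard library. It assumes the 00:35:03 entry falls on 2026-06-20, the day after the start date, and it uses only the "session close" entries at 22:16:52 and 22:21:52 plus the 22:26:52 entry.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Start of the call and the last log line in our capture window.
start = datetime.strptime("2026-06-19 22:16:36", FMT)
last_seen = datetime.strptime("2026-06-20 00:35:03", FMT)  # assumed next day
elapsed = last_seen - start

# Entries that line up on the 5-minute cadence.
marks = ["22:16:52", "22:21:52", "22:26:52"]
times = [datetime.strptime(f"2026-06-19 {t}", FMT) for t in marks]
gaps = [later - earlier for earlier, later in zip(times, times[1:])]

print("elapsed:", elapsed)
print("gaps:", [str(g) for g in gaps])
```

The 5-minute cadence lines up with the first "received session close" at 22:16:52, not with the failed publish at 22:16:48.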

IDs (LiveKit Cloud, project p_3tqm7ro6kbs):

  • room: RM_G3frdtYX4CPj
  • agent job: AJ_w93PjKVfhwUK
  • start: 2026-06-19 22:16:36 UTC

Please help investigate. My guess is that this is similar to "Server-initiated migration fails to resume on agents 1.4.6 — subscriber + publisher PC fail, no recovery, process killed" (expected to be fixed in >1.4.2 per agents #4705)?