Reliably persisting transcriptions under moderate concurrency

Hey everyone,

I’m running into an issue where transcriptions occasionally fail to persist, especially under moderate concurrency (~15 concurrent sessions). At that level, 1–2 transcriptions per batch go missing. I’ve tried two approaches:

  1. Saving at session end via SessionReport: I call ctx.make_session_report() in an on_session_end/shutdown callback, but instead of writing JSON to disk I forward the data to another service over gRPC. This works most of the time, but some transcriptions still go missing.

    Data hooks | LiveKit Documentation

  2. Saving incrementally on conversation_item_added — This ensures data is captured per-turn, but it noticeably increases end-to-end latency.

Is there a recommended pattern for reliably persisting transcription data under concurrency without impacting latency?
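One pattern that may fit here, as a rough sketch rather than an official recommendation: keep the conversation_item_added handler cheap by pushing each item onto an in-memory queue, and let a background task make the network call. `TranscriptBuffer`, `send`, and the dict-shaped items are hypothetical names for illustration, not LiveKit APIs; `send` stands in for whatever wraps your gRPC call.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable


class TranscriptBuffer:
    """Buffers transcript items and forwards them from a background task."""

    def __init__(self, send: Callable[[dict[str, Any]], Awaitable[None]]) -> None:
        self._send = send  # hypothetical async uploader (e.g. wraps a gRPC stub)
        self._queue: asyncio.Queue[dict[str, Any]] = asyncio.Queue()
        self._worker: asyncio.Task[None] | None = None
        self.failed: list[dict[str, Any]] = []  # items whose upload raised

    def start(self) -> None:
        # Must be called from inside a running event loop.
        self._worker = asyncio.create_task(self._run())

    def add(self, item: dict[str, Any]) -> None:
        # Synchronous and non-blocking, so it is safe to call from an event handler.
        self._queue.put_nowait(item)

    async def _run(self) -> None:
        while True:
            item = await self._queue.get()
            try:
                await self._send(item)
            except Exception:
                # Keep the item so it can be retried or written elsewhere later.
                self.failed.append(item)
            finally:
                self._queue.task_done()

    async def close(self, timeout: float) -> bool:
        """Wait up to `timeout` seconds for pending items; return True if drained."""
        try:
            await asyncio.wait_for(self._queue.join(), timeout)
            drained = True
        except asyncio.TimeoutError:
            drained = False
        if self._worker is not None:
            self._worker.cancel()
        return drained
```

The idea is that the per-turn handler only calls `add()`, so the conversation path never waits on the network, and the shutdown callback calls `close()` with a timeout comfortably below shutdown_process_timeout. Anything left in `failed` can be retried or written to disk.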

When you persist locally, does it ever fail?

I am wondering if the gRPC call is taking too long and timing out. Do you see anything in your agent logs for the sessions whose transcripts failed to save?
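To check that theory, it may help to give the upload an explicit deadline and log every failure path, so a slow or failing call shows up in the logs instead of disappearing. This is a minimal sketch using only asyncio and logging; `upload_with_deadline`, the `upload` callable, the logger name, and the 5-second default are assumptions for illustration, not part of LiveKit or your service.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("transcript-upload")  # hypothetical logger name


async def upload_with_deadline(
    upload: Callable[[], Awaitable[None]],  # hypothetical: wraps your gRPC call
    *,
    session_id: str,
    timeout: float = 5.0,  # assumed value; tune to your service
) -> bool:
    """Run `upload()` under a deadline and log the outcome. Returns True on success."""
    try:
        await asyncio.wait_for(upload(), timeout)
    except asyncio.TimeoutError:
        logger.error(
            "transcript upload timed out after %.1fs (session=%s)", timeout, session_id
        )
        return False
    except Exception:
        # logger.exception records the full traceback, so the failure is not silent.
        logger.exception("transcript upload failed (session=%s)", session_id)
        return False
    logger.info("transcript upload succeeded (session=%s)", session_id)
    return True
```

If the timeout branch fires for the sessions that lose transcripts, that points at the remote call rather than at the agent.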

Locally it works fine. Most of the time when this happens, it fails silently and my success log is never printed. Other times it throws this error:

failed to publish transcription
Traceback (most recent call last):
  File "/opt/venv/lib/python3.12/site-packages/livekit/agents/voice/room_io/_output.py", line 307, in _publish_transcription
    await self._room.local_participant.publish_transcription(transcription)
  File "/opt/venv/lib/python3.12/site-packages/livekit/rtc/participant.py", line 285, in publish_transcription
    raise PublishTranscriptionError(cb.publish_transcription.error)
livekit.rtc.participant.PublishTranscriptionError: engine: connection error: engine is closed

OR

process did not exit in time, killing process

Is it because I use the job context shutdown instead of session.shutdown when the call ends, so the on_session_end function is never triggered?

If it is taking too long, will increasing shutdown_process_timeout help?

There is a time limit for shutdown hooks to complete. Sounds like this is your issue.

Note

Shutdown hooks should complete within a short amount of time. By default, the framework waits 10 seconds before forcefully terminating the process. You can adjust this timeout using the shutdown_process_timeout parameter in agent server options.
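Given that time limit, one option is to give the remote upload a budget well below the shutdown timeout and fall back to a local write if the budget is exceeded, since local persistence was reported to work. The sketch below is an assumption-laden illustration: `flush_on_shutdown`, `send`, `budget`, and `fallback_dir` are hypothetical names, and `report` is a plain dict standing in for the serialized session report.

```python
import asyncio
import json
from pathlib import Path
from typing import Any, Awaitable, Callable


async def flush_on_shutdown(
    report: dict[str, Any],  # stand-in for the serialized session report
    send: Callable[[dict[str, Any]], Awaitable[None]],  # hypothetical uploader
    *,
    budget: float,  # seconds; keep this well below the shutdown timeout
    fallback_dir: Path,
) -> str:
    """Try the remote upload within `budget`; on timeout or error, write JSON locally.

    Returns "remote" or "disk" so the caller can log which path was taken.
    """
    try:
        await asyncio.wait_for(send(report), timeout=budget)
        return "remote"
    except Exception:
        # Covers asyncio.TimeoutError as well as errors raised by `send`.
        # The write is blocking, which is assumed acceptable for a small
        # payload at shutdown.
        path = fallback_dir / f"{report['session_id']}.json"
        path.write_text(json.dumps(report))
        return "disk"
```

A separate job could later pick up the files in `fallback_dir` and retry the upload, so a slow remote service costs a delay instead of a lost transcript.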


I already increased shutdown_process_timeout to 45s and then 60s, but now there is a different error and the transcript is not saved even for a single call. This error shows up in STT:

[WARNING] voice-agent — Speech recognition failed, retrying in 2.0s

Root Cause:
Audio Timeout Error — Long duration elapsed without audio.
Audio must be streamed close to real-time.


gRPC Error:
grpc.aio._call.AioRpcError
Status Code : OUT_OF_RANGE (11)
Details : Audio Timeout Error

Debug:
"UNKNOWN: Error received from peer
{grpc_status:11,
grpc_message:"Audio Timeout Error: Long duration elapsed without audio."}"


Google API Exception:
google.api_core.exceptions.OutOfRange (400)
Message: Audio Timeout Error — Long duration elapsed without audio


LiveKit Exception:
livekit.agents._exceptions.APIStatusError
Status Code : 400
Retryable : False
Message : Audio Timeout Error — Long duration elapsed without audio


This is how I set up my cleanup function:

It looks like this is coming from Google STT, not from your transcript upload step. Since STT would not be running while you are uploading your transcripts, this might be a red herring.