I’m running into an issue where transcriptions are occasionally missing or not stored, especially under moderate concurrency (~15 concurrent sessions). At that level, 1–2 transcriptions per batch fail to persist.
Saving at session end via SessionReport — I use ctx.make_session_report() in an on_session_end/shutdown callback, but instead of writing JSON to disk I forward the data to another service over gRPC. This works most of the time, but some transcriptions still go missing.
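For context, the flush logic looks roughly like this (a minimal, self-contained sketch: `send_report` and `flush_transcript` are stand-ins for my gRPC client call and shutdown hook; the real code builds the payload from ctx.make_session_report()):

```python
import asyncio

# Stand-in for the gRPC upload call (illustrative name, not a real API).
async def send_report(payload: dict) -> bool:
    await asyncio.sleep(0.01)  # simulated network round-trip
    return True

async def flush_transcript(payload: dict, timeout: float = 5.0) -> bool:
    # Bound the upload so the hook cannot outlive the process shutdown
    # window, and log failures instead of letting them vanish silently.
    try:
        return await asyncio.wait_for(send_report(payload), timeout=timeout)
    except (asyncio.TimeoutError, ConnectionError) as exc:
        print(f"failed to persist transcript: {exc!r}")
        return False
```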
Locally it works fine. Most of the time when this happens, the failure is silent — no error is raised and my success log is never printed. Other times it throws this error:
failed to publish transcription
Traceback (most recent call last):
  File "/opt/venv/lib/python3.12/site-packages/livekit/agents/voice/room_io/_output.py", line 307, in _publish_transcription
    await self._room.local_participant.publish_transcription(transcription)
  File "/opt/venv/lib/python3.12/site-packages/livekit/rtc/participant.py", line 285, in publish_transcription
    raise PublishTranscriptionError(cb.publish_transcription.error)
livekit.rtc.participant.PublishTranscriptionError: engine: connection error: engine is closed
OR
process did not exit in time, killing process
Is it because I use the job context shutdown instead of session.shutdown when the call ends, so the on_session_end function is never triggered?
If it's simply taking too long, will increasing shutdown_process_timeout help?
There is a time limit for shutdown hooks to complete. Sounds like this is your issue.
Note
Shutdown hooks should complete within a short amount of time. By default, the framework waits 10 seconds before forcefully terminating the process. You can adjust this timeout using the shutdown_process_timeout parameter in agent server options.
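As a config fragment, raising that grace period looks roughly like this (assuming your installed livekit-agents version exposes shutdown_process_timeout on WorkerOptions — check against your version):

```python
from livekit import agents

async def entrypoint(ctx: agents.JobContext):
    ...  # your agent setup and shutdown callback registration

if __name__ == "__main__":
    agents.cli.run_app(
        agents.WorkerOptions(
            entrypoint_fnc=entrypoint,
            # Grace period before the framework force-kills the process;
            # the default is much shorter (~10s per the note above).
            shutdown_process_timeout=60.0,
        )
    )
```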
I already increased shutdown_process_timeout to 45s and then 60s, but transcripts are still not saved. Even for a single call, this error appears in the STT:
[WARNING] voice-agent — Speech recognition failed, retrying in 2.0s
Root Cause:
Audio Timeout Error — Long duration elapsed without audio.
Audio must be streamed close to real-time.
It looks like this is coming from the Google STT, not your transcription upload step. Since STT would not be active while you are uploading your transcripts, this is likely a red herring.
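Separately, for the "engine is closed" publish failures: one generic mitigation is to wrap the upload in a bounded retry with exponential backoff, so a transient connection error does not silently drop a transcript. A sketch under those assumptions (`publish` is any awaitable callable; `ConnectionError` stands in for the transient error class):

```python
import asyncio

async def publish_with_retry(publish, payload, attempts=3, base_delay=0.1):
    # Retry transient publish failures with exponential backoff; re-raise
    # on the final attempt so the caller can log and persist elsewhere.
    for attempt in range(attempts):
        try:
            return await publish(payload)
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt))

# Demo: a publisher that fails twice, then succeeds on the third try.
calls = {"n": 0}

async def flaky_publish(payload):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("engine is closed")
    return "ok"

print(asyncio.run(publish_with_retry(flaky_publish, {"text": "hi"})))  # → ok
```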