Title: How to add a language attribute to the user transcription stream (multi-language STT per participant)?
Hi team,
We’re building a multi-user transcriber where each participant has their own STT language configured via participant metadata, and the language can change mid-call (we rebuild the participant’s AgentSession when it does). Our web client subscribes to the lk.transcription text stream and needs to group lines per-language, so the natural place to put that info is a language attribute on the stream itself.
Before posting we spent a fair bit of time in the SDK looking for the right hook — I want to confirm we didn’t miss anything obvious, and get your take on the longer-term answer.
What we checked
- RoomOutputOptions (livekit-agents/livekit/agents/voice/room_io/types.py) exposes transcription_enabled, audio_enabled, sync_transcription, json_format, next_in_chain, and transcription_speed_factor. There is no attributes field or per-stream metadata hook.
- Agent.transcription_node looks like it’s for the agent’s TTS transcription, not the user STT transcription path. As far as we can tell, the user transcription flow is: AgentSession ──"user_input_transcribed"──▶ RoomIO._on_user_input_transcribed ──▶ RoomIO._forward_user_transcript ──▶ _user_tr_output.capture_text(...), and there’s no public override point on that path.
- Internally, _ParticipantStreamTranscriptionOutput (room_io/_output.py) already supports an attributes constructor kwarg, and _create_text_writer merges _additional_attributes into both the interim and the final writers, which is exactly the behavior we want. But the wrapper _ParticipantTranscriptionOutput instantiates it with attributes=None hardcoded, so there’s no way to pass anything through from RoomIO / RoomOutputOptions.
- Your own examples/other/translation/multi-user-translator.py solves this same problem by disabling auto-transcription and calling local_participant.stream_text(topic=TOPIC_TRANSCRIPTION, attributes={"language": ...}) manually.
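For context, here is roughly what the manual route from the last bullet would mean for us. This is a minimal sketch, not a copy of the example: publish_user_transcript is a hypothetical helper name, the "lk.segment_id" key is our assumption about what the auto-publisher sets, and the participant is duck-typed so the snippet runs without a room.

```python
import asyncio
import uuid
from typing import Any

# Topic name as used in this post; the "lk." attribute keys below are our
# assumption about what the SDK's auto-publisher sets on each stream.
TOPIC_TRANSCRIPTION = "lk.transcription"


async def publish_user_transcript(
    local_participant: Any, text: str, language: str, *, final: bool = True
) -> dict[str, str]:
    """Publish one transcript segment by hand, tagged with its language."""
    attributes = {
        "lk.segment_id": uuid.uuid4().hex,  # assumed key name
        "lk.transcription_final": "true" if final else "false",
        "language": language,  # custom attribute the web client groups by
    }
    # Open a text stream on the transcription topic, write the text, close it.
    writer = await local_participant.stream_text(
        topic=TOPIC_TRANSCRIPTION, attributes=attributes
    )
    await writer.write(text)
    await writer.aclose()
    return attributes
```

Even this stripped-down version has to pick segment IDs and the final flag itself, and it still skips interim updates and attributing the stream to the speaking participant.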
What we ended up shipping
After trying both approaches we went with a small helper that reaches into the private internals after room_io.start():
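    from livekit.agents.voice.room_io import RoomIO

    def _tag_user_transcription_language(room_io: RoomIO, language: str) -> None:
        # Private attribute: only set once room_io.start() has run.
        user_tr = getattr(room_io, "_user_tr_output", None)
        if user_tr is None:
            return
        # Name-mangled form of the wrapper's private __outputs list.
        outputs = getattr(user_tr, "_ParticipantTranscriptionOutput__outputs", None) or []
        for output in outputs:
            # Only the stream-based output carries _additional_attributes.
            attrs = getattr(output, "_additional_attributes", None)
            if isinstance(attrs, dict):
                attrs["language"] = language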
We picked this over the manual stream_text route (the multi-user-translator example) because it lets the SDK’s auto-publisher keep handling the writer lifecycle, segment IDs, lk.transcription_final flag, and the is_delta_stream=False user-side semantics — we only need to add one attribute, not reimplement the whole pipeline.
We’re aware the underscore-prefixed names mean this is not part of the supported API and could break on any SDK upgrade. Hence this post.
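To limit the damage if the private layout changes, one option is a variant that reports how many outputs it actually tagged, so an upgrade fails loudly instead of silently publishing untagged transcripts. The sketch below is self-contained: the classes are stand-ins that mirror the layout described above (they are not the SDK's classes), and the logger name is arbitrary.

```python
import logging
from typing import Any

logger = logging.getLogger("transcription-language")


def tag_user_transcription_language(room_io: Any, language: str) -> int:
    """Set `language` on every stream output found; return how many were tagged."""
    user_tr = getattr(room_io, "_user_tr_output", None)
    outputs = getattr(user_tr, "_ParticipantTranscriptionOutput__outputs", None) or []
    tagged = 0
    for output in outputs:
        attrs = getattr(output, "_additional_attributes", None)
        if isinstance(attrs, dict):
            attrs["language"] = language
            tagged += 1
    if tagged == 0:
        # Either RoomIO has not started or the private layout changed.
        logger.warning("no transcription output tagged with language=%s", language)
    return tagged


# --- Stand-ins for illustration only (hypothetical, not SDK classes) ---
class FakeStreamOutput:
    def __init__(self) -> None:
        self._additional_attributes: dict[str, str] = {}


class FakeLegacyOutput:
    """Has no _additional_attributes, so the helper skips it."""


class _ParticipantTranscriptionOutput:
    def __init__(self) -> None:
        # A double-underscore attribute is name-mangled by Python to
        # _ParticipantTranscriptionOutput__outputs, which is why the helper
        # looks it up under that name.
        self.__outputs = [FakeStreamOutput(), FakeLegacyOutput()]


class FakeRoomIO:
    def __init__(self) -> None:
        self._user_tr_output = _ParticipantTranscriptionOutput()
```

A caller could treat a return value of 0 as a signal to fall back to the manual stream_text path or to raise during startup.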
Questions
- Did we miss a public hook to attach attributes (like language) to the auto-published transcription text stream: anything on RoomIO, RoomOutputOptions, AgentSession, or Agent we overlooked?
- Is there a recommended override point for the user (STT) transcription analogous to Agent.transcription_node for the agent (TTS) side? If a hook exists or is planned, we’d happily switch to it.
- If neither exists today, would you be open to a small PR exposing attributes: dict[str, str] on RoomOutputOptions (or an equivalent setter on RoomIO) that flows through _ParticipantTranscriptionOutput into _ParticipantStreamTranscriptionOutput._additional_attributes? The plumbing is already there; it’s a handful of lines.
- If the answer is “use the manual stream_text pattern” from multi-user-translator.py, can you confirm that’s the officially recommended approach for this kind of per-stream metadata? We want to make sure we’re not building on something you consider transitional, and we’d revert to that path if it’s the supported answer.
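To make the third question concrete, this is the shape of the change we have in mind. Everything here is a stand-in: the *Sketch classes are hypothetical and only mimic the roles of RoomOutputOptions, _ParticipantTranscriptionOutput, and _ParticipantStreamTranscriptionOutput, and the attributes field does not exist in the SDK today.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RoomOutputOptionsSketch:
    """Stand-in for RoomOutputOptions with the proposed field added."""

    transcription_enabled: bool = True
    attributes: dict[str, str] = field(default_factory=dict)  # proposed, hypothetical


class StreamTranscriptionOutputSketch:
    """Stand-in for _ParticipantStreamTranscriptionOutput."""

    def __init__(self, attributes: Optional[dict[str, str]] = None) -> None:
        self._additional_attributes = dict(attributes or {})

    def writer_attributes(self, *, final: bool) -> dict[str, str]:
        # Extras go in first so SDK-owned keys always win; that ordering is
        # a choice made for this sketch, not a claim about the real merge.
        attrs = dict(self._additional_attributes)
        attrs["lk.transcription_final"] = "true" if final else "false"
        return attrs


class TranscriptionOutputSketch:
    """Stand-in for _ParticipantTranscriptionOutput: forwards the extras
    instead of passing attributes=None."""

    def __init__(self, options: RoomOutputOptionsSketch) -> None:
        self.stream_output = StreamTranscriptionOutputSketch(
            attributes=options.attributes
        )
```

With something like this, our per-participant setup would reduce to passing attributes={"language": ...} when we rebuild the session, with no private access at all.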
For reference, we’re on livekit-agents (latest at time of writing) and Python 3.11.
Thanks — happy to open a PR or share more of our setup if useful.