Attaching custom attributes to user transcription stream — public hook in RoomOutputOptions?

Title: How to add a language attribute to the user transcription stream (multi-language STT per participant)?


Hi team,

We’re building a multi-user transcriber where each participant has their own STT language configured via participant metadata, and the language can change mid-call (we rebuild the participant’s AgentSession when it does). Our web client subscribes to the lk.transcription text stream and needs to group lines per-language, so the natural place to put that info is a language attribute on the stream itself.

Before posting we spent a fair bit of time in the SDK looking for the right hook — I want to confirm we didn’t miss anything obvious, and get your take on the longer-term answer.

What we checked

  • RoomOutputOptions (livekit-agents/livekit/agents/voice/room_io/types.py) exposes transcription_enabled, audio_enabled, sync_transcription, json_format, next_in_chain, transcription_speed_factor. No attributes field or per-stream metadata hook.

  • Agent.transcription_node looks like it’s for the agent’s TTS transcription, not the user STT transcription path. As far as we can tell, the user transcription flow is:

    AgentSession  ──"user_input_transcribed"──▶  RoomIO._on_user_input_transcribed
                                            ──▶  RoomIO._forward_user_transcript
                                            ──▶  _user_tr_output.capture_text(...)
    

    and there’s no public override point on that path.

  • Internally, _ParticipantStreamTranscriptionOutput (room_io/_output.py) already supports an attributes constructor kwarg, and _create_text_writer merges _additional_attributes into both the interim and the final writers, which is exactly the behavior we want. But the outer wrapper that RoomIO holds, _ParticipantTranscriptionOutput, instantiates it with attributes=None hardcoded, so there’s no way to pass anything through from RoomIO / RoomOutputOptions.

  • Your own examples/other/translation/multi-user-translator.py solves this same problem by disabling auto-transcription and calling local_participant.stream_text(topic=TOPIC_TRANSCRIPTION, attributes={"language": ...}) manually.
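
For comparison, the manual route looks roughly like the sketch below. It assumes the stream_text(topic=..., attributes=...) call from that example returns a writer with async write() and aclose() methods. The helper name publish_tagged_transcript is ours, and the topic and final-flag strings are copied from the names mentioned in this post rather than imported from the SDK.

```python
import asyncio
from typing import Any

# Assumed values: the topic our web client subscribes to and the final-flag
# attribute name. In real code, prefer the SDK's own constants for these.
TOPIC_TRANSCRIPTION = "lk.transcription"
ATTRIBUTE_TRANSCRIPTION_FINAL = "lk.transcription_final"


async def publish_tagged_transcript(
    local_participant: Any, text: str, language: str, *, final: bool
) -> dict[str, str]:
    """Publish one transcript line on a text stream tagged with its language."""
    attributes = {
        "language": language,
        ATTRIBUTE_TRANSCRIPTION_FINAL: "true" if final else "false",
    }
    # Open a new text stream carrying the attributes, write the text, close it.
    writer = await local_participant.stream_text(
        topic=TOPIC_TRANSCRIPTION, attributes=attributes
    )
    try:
        await writer.write(text)
    finally:
        await writer.aclose()
    return attributes
```

Segment IDs and the interim-versus-final writer handling are left out of this sketch, and those are the parts we would rather not own.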

What we ended up shipping

After trying both approaches we went with a small helper that reaches into the private internals after room_io.start():

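from livekit.agents.voice.room_io import RoomIO


def _tag_user_transcription_language(room_io: RoomIO, language: str) -> None:
    """Set a "language" attribute on the auto-published user transcription stream."""
    # Private attribute; we call this helper after room_io.start().
    user_tr = getattr(room_io, "_user_tr_output", None)
    if user_tr is None:
        return
    # `__outputs` is name-mangled inside _ParticipantTranscriptionOutput.
    outputs = getattr(user_tr, "_ParticipantTranscriptionOutput__outputs", None) or []
    for output in outputs:
        # Outputs without an _additional_attributes dict are skipped.
        attrs = getattr(output, "_additional_attributes", None)
        if isinstance(attrs, dict):
            attrs["language"] = language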

We picked this over the manual stream_text route (the multi-user-translator example) because it lets the SDK’s auto-publisher keep handling the writer lifecycle, segment IDs, lk.transcription_final flag, and the is_delta_stream=False user-side semantics — we only need to add one attribute, not reimplement the whole pipeline.

We’re aware the underscore-prefixed names mean this is not part of the supported API and could break on any SDK upgrade. Hence this post.

Questions

  1. Did we miss a public hook to attach attributes (like language) to the auto-published transcription text stream — anything on RoomIO, RoomOutputOptions, AgentSession, or Agent we overlooked?
  2. Is there a recommended override point for the user (STT) transcription analogous to Agent.transcription_node for the agent (TTS) side? If a hook exists or is planned, we’d happily switch to it.
  3. If neither exists today, would you be open to a small PR exposing attributes: dict[str, str] on RoomOutputOptions (or an equivalent setter on RoomIO) that flows through _ParticipantTranscriptionOutput into _ParticipantStreamTranscriptionOutput._additional_attributes? The plumbing is already there — it’s a handful of lines.
  4. If the answer is “use the manual stream_text pattern” from multi-user-translator.py, can you confirm that’s the officially recommended approach for this kind of per-stream metadata? We want to make sure we’re not building on something you consider transitional, and we’d revert to that path if it’s the supported answer.

For reference, we’re on livekit-agents (latest at time of writing) and Python 3.11.

Thanks — happy to open a PR or share more of our setup if useful.

@Boussaid_Mohamed Your archaeology is right. No public hook exists for attaching custom attributes to the auto-published user transcription stream. Confirmed:

  • Agent.transcription_node is TTS-side per the pipeline nodes doc (https://docs.livekit.io/agents/logic/nodes/) (“adjust transcription before sending to the user”). Not the user STT path.
  • RoomOutputOptions doesn’t expose attributes; _ParticipantTranscriptionOutput hardcodes attributes=None, as you found.
  • No GitHub issue requests this yet; searching GitHub issues for related terms turns up adjacent requests, not yours.

So the long-term answer needs a maintainer call. The right channel is an issue on livekit/agents containing the text of this post, plus the PR you offered. The plumbing argument is strong: wire attributes: dict[str, str] from RoomOutputOptions through _ParticipantTranscriptionOutput to the existing _additional_attributes slot. Maintainers are usually receptive to changes like this when the alternative is users monkey-patching private fields.
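
To show how small the change could be, here is a rough sketch using stand-in classes. None of this is the SDK's actual code: the Sketch-suffixed names are hypothetical, and the constructor shapes are assumptions based only on what you described (an attributes kwarg on the inner output that ends up in _additional_attributes).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RoomOutputOptionsSketch:
    """Stand-in for RoomOutputOptions with the proposed new field."""

    transcription_enabled: bool = True
    # Proposed addition: extra attributes for the transcription text stream.
    attributes: dict[str, str] = field(default_factory=dict)


class StreamOutputSketch:
    """Stand-in for _ParticipantStreamTranscriptionOutput."""

    def __init__(self, attributes: Optional[dict[str, str]] = None) -> None:
        # Copy so later changes to the caller's dict do not leak in.
        self._additional_attributes = dict(attributes or {})

    def writer_attributes(self, base: dict[str, str]) -> dict[str, str]:
        # Merge the extra attributes into whatever the writer already sets.
        # In this sketch the extras win on key conflicts.
        return {**base, **self._additional_attributes}


class TranscriptionOutputSketch:
    """Stand-in for _ParticipantTranscriptionOutput."""

    def __init__(self, options: RoomOutputOptionsSketch) -> None:
        # The hardcoded attributes=None becomes a pass-through of the option.
        self.stream_output = StreamOutputSketch(attributes=options.attributes)
```

With options built as RoomOutputOptionsSketch(attributes={"language": "fr"}), every writer created by the stand-in output would carry language alongside its usual attributes. The real merge order on key conflicts would be for the maintainers to decide.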

multi-user-translator.py is the documented escape hatch today, but forces you to reimplement segment IDs, the lk.transcription_final flag, and interim writer lifecycle for what’s fundamentally a metadata tag. That’s the friction the PR would remove.

Your monkey-patch is fragile but reasonable in the meantime. Pin livekit-agents==X.Y.Z so an upgrade can’t silently break the _ParticipantTranscriptionOutput__outputs name-mangling, and guard the helper to log when the attribute path is missing instead of silently no-op’ing.
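
A guarded version of your helper could look like the sketch below. The private attribute names are the ones from your post. The logger name and the boolean return value are my additions, and room_io is typed as Any so the snippet stands alone.

```python
import logging
from typing import Any

logger = logging.getLogger("transcription_tagging")

# Name-mangled private attribute from the original helper; it may change
# between SDK releases, which is why every miss below is logged.
_OUTPUTS_ATTR = "_ParticipantTranscriptionOutput__outputs"


def tag_user_transcription_language(room_io: Any, language: str) -> bool:
    """Set the language attribute on the user transcription outputs.

    Returns True if at least one output was tagged; otherwise logs a
    warning naming the missing piece and returns False.
    """
    user_tr = getattr(room_io, "_user_tr_output", None)
    if user_tr is None:
        logger.warning("room_io has no _user_tr_output; language tag not applied")
        return False

    outputs = getattr(user_tr, _OUTPUTS_ATTR, None)
    if not outputs:
        logger.warning("%s missing or empty; language tag not applied", _OUTPUTS_ATTR)
        return False

    tagged = 0
    for output in outputs:
        attrs = getattr(output, "_additional_attributes", None)
        if isinstance(attrs, dict):
            attrs["language"] = language
            tagged += 1

    if tagged == 0:
        logger.warning("no output exposes an _additional_attributes dict")
    return tagged > 0
```

Treating a False return as an alert at session start means a renamed private field shows up in your logs on the first call after an upgrade, rather than as untagged lines in the client.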