Question: bounded evidence receipts from FunctionToolsExecutedEvent

Hi LiveKit Agents folks,

I’m working on a small Assay-side reducer for agent test/runtime artifacts. The goal is narrow: take one bounded FunctionToolsExecutedEvent capture and reduce it into reviewable tool-action receipts.

No LiveKit integration ask, no endorsement ask, and no roadmap ask. I just want to sanity-check the shape before building the reducer.

The proposed boundary is:

  • input: one serialized FunctionToolsExecutedEvent
  • output: one receipt per observed function tool call
  • include: function name, call_id when present, argument/output hashes, completed / is_error, event/call/output timestamps
  • exclude: raw arguments, raw output, transcript, audio, room state, usage, session identity, and traces
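A minimal sketch of that boundary for a single call/output pair, to make the include/exclude split concrete. The field names (name, call_id, arguments, output, is_error, created_at) are assumptions about the serialized shape, and make_receipt and sha256_hex are hypothetical helper names, not LiveKit API. It leaves out the completed field because its source is not settled here.

```python
import hashlib
import json
from typing import Any, Optional


def sha256_hex(value: Any) -> Optional[str]:
    """Hash a field so the raw value never enters the receipt."""
    if value is None:
        return None
    # Assumption: arguments/output are usually strings; anything else is
    # serialized with sorted keys so the hash is stable.
    text = value if isinstance(value, str) else json.dumps(value, sort_keys=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_receipt(
    event_created_at: Any,
    call: dict[str, Any],
    output: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Reduce one function call and its (possibly missing) output to a receipt."""
    has_output = output is not None
    return {
        "name": call.get("name"),
        "call_id": call.get("call_id"),  # stays None when absent
        "arguments_sha256": sha256_hex(call.get("arguments")),
        "output_sha256": sha256_hex(output.get("output")) if has_output else None,
        "is_error": output.get("is_error") if has_output else None,
        "event_created_at": event_created_at,
        "call_created_at": call.get("created_at"),
        "output_created_at": output.get("created_at") if has_output else None,
    }


# Hypothetical fixture values, for illustration only.
example_call = {"name": "lookup", "call_id": "c1",
                "arguments": '{"q": "secret"}', "created_at": 1.0}
example_output = {"call_id": "c1", "output": "raw result",
                  "is_error": False, "created_at": 2.0}
example_receipt = make_receipt(0.5, example_call, example_output)
```

The receipt carries only hashes and metadata, so raw arguments and raw output cannot leak through it by construction.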

Two questions where I’d value your read:

  1. Is pairing by call_id the right first rule, with order fallback only when IDs are absent? Are there retry/cancel/parallel cases where this breaks?

  2. Do LiveKit users commonly serialize session events to a file/stream for offline review, or would a small event-capture helper be needed for this to be useful?

If this is off or too narrow, happy to be corrected. The intent is to consume a published event shape carefully, not to make LiveKit own an evidence standard.

Context:

Verified live from livekit-agents/livekit/agents/voice/events.py on main: FunctionToolsExecutedEvent.zipped() is zip(self.function_calls, self.function_call_outputs, strict=False), i.e. pure list-order pairing, and a model validator enforces equal lengths on construction. call_id exists on FunctionCall, but the event’s own zipper doesn’t use it.

So your “prefer call_id when every entry has one, fall back to list order” rule is stricter than the SDK’s own contract. That is fine, and arguably safer for external audit, but worth knowing that it diverges. None outputs are typed as a real possibility (list[FunctionCallOutput | None]), but their meaning isn’t documented, which is exactly what issue #5696 in livekit/agents (“Docs: clarify FunctionToolsExecutedEvent call/output pairing semantics”) asks to have clarified.

On Q2: no canonical event-capture helper ships in the framework. The common pattern is session.on("function_tools_executed", handler) plus Pydantic’s model_dump_json() writing to your own sink, or OpenTelemetry via livekit.agents.telemetry for richer ops data. A small fixture-backed helper would be a useful contribution.

Cc: @Rul1an

Thanks, this is exactly the source-level check I was hoping for.

The SDK-contract vs audit-rule distinction is the useful bit. I’ll treat list order as the LiveKit behavior, since that is what zipped() does, and keep call_id as an extra consistency check only when every call/output pair has one.

So the reducer rule becomes:

  • primary pairing: SDK list order
  • optional audit check: if all entries carry call_id, mismatches fail the reduced receipt
  • unknown output: preserve None as observed, without inferring success/failure semantics from it
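Roughly, the rule above could look like the sketch below. It works on plain dicts from a deserialized event, pair_calls is a hypothetical name, and treating a None output as "not every entry carries call_id" (so the audit check is skipped) is my own reading of the rule, not something LiveKit defines.

```python
from typing import Any, Optional

Call = dict[str, Any]
Output = Optional[dict[str, Any]]


def pair_calls(
    calls: list[Call], outputs: list[Output]
) -> tuple[list[tuple[Call, Output]], Optional[bool]]:
    """Pair by list order, then optionally audit call_id agreement.

    Returns (pairs, call_id_consistent). The flag is None when the audit
    check is skipped because some entry has no call_id or an output is None.
    """
    if len(calls) != len(outputs):
        # The thread says the SDK enforces equal lengths on construction,
        # so a mismatch here suggests a damaged capture.
        raise ValueError("function_calls and function_call_outputs differ in length")

    # Primary pairing: list order, mirroring what zipped() is described as doing.
    pairs = list(zip(calls, outputs))

    every_entry_has_id = all(
        bool(call.get("call_id")) and output is not None and bool(output.get("call_id"))
        for call, output in pairs
    )
    if not every_entry_has_id:
        return pairs, None  # None outputs are preserved in pairs as observed

    consistent = all(call["call_id"] == output["call_id"] for call, output in pairs)
    return pairs, consistent


# Hypothetical fixture values, for illustration only.
ok_pairs, ok_flag = pair_calls(
    [{"call_id": "a"}, {"call_id": "b"}],
    [{"call_id": "a"}, {"call_id": "b"}],
)
bad_pairs, bad_flag = pair_calls(
    [{"call_id": "a"}, {"call_id": "b"}],
    [{"call_id": "b"}, {"call_id": "a"}],
)
none_pairs, none_flag = pair_calls([{"call_id": "a"}], [None])
```

A caller would fail the reduced receipt when the flag is False and proceed when it is True or None, so list order stays the only pairing mechanism either way.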

That keeps the receipt aligned with LiveKit instead of pretending call_id is the framework contract.

The capture-helper answer is useful too. If there is no canonical offline event sink, I’ll keep the first Assay-side example fixture-backed and small. A helper can stay separate: basically session.on("function_tools_executed", ...) plus model_dump_json() to JSONL, not an observability layer.
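For reference, a sketch of what that separate helper might amount to. make_jsonl_handler is a hypothetical name, and the stand-in event class exists only so the example runs without LiveKit installed; the only thing it assumes about a real event is the model_dump_json() method mentioned above.

```python
import io
import json
from typing import Any, Callable, TextIO


def make_jsonl_handler(sink: TextIO) -> Callable[[Any], None]:
    """Build a handler that appends each event as one JSON line to sink."""

    def handler(event: Any) -> None:
        # Assumption: the event is a Pydantic model exposing model_dump_json().
        sink.write(event.model_dump_json() + "\n")
        sink.flush()

    return handler


class StandInEvent:
    """Stand-in for a real event, used only to exercise the handler here."""

    def __init__(self, payload: dict[str, Any]) -> None:
        self._payload = payload

    def model_dump_json(self) -> str:
        return json.dumps(self._payload)


# With a live session, registration would follow the pattern from the thread:
#     session.on("function_tools_executed", make_jsonl_handler(open("events.jsonl", "a")))
buffer = io.StringIO()
handler = make_jsonl_handler(buffer)
handler(StandInEvent({"type": "function_tools_executed", "function_calls": []}))
handler(StandInEvent({"type": "function_tools_executed", "function_calls": []}))
captured_lines = buffer.getvalue().splitlines()
```

Each line is an independent JSON document, so the file can be replayed one event at a time as fixture input for the reducer.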

Appreciate the careful read. I’ll keep an eye on the pairing-semantics docs issue before treating the boundary as anything stronger than an observed SDK event shape.

Glad it helped! Build great things ✌🏻