- I am trying to calculate the total and per-turn cost of TTS models. Some models, such as the Gemini models, are priced on input text tokens and output audio tokens, so which should be counted? Both (convert the LLM-generated text to tokens, multiply by the input price, then add the cost of the output audio tokens), only the input, or only the output?
- For Chirp, legacy models, and other providers where pricing is defined per character, I am assuming the billed characters are the ones fed into the TTS model, so this is straightforward.
Both. Gemini TTS is token-priced like an LLM: input text tokens for what you send to synthesize, output audio tokens for the synthesized result. Sum them per Google’s published rates.
For per-turn accounting in LiveKit, don’t recompute from text length yourself. Subscribe to session_usage_updated on AgentSession; that event surfaces the actual token/character counts each plugin reports, which includes any normalization, SSML expansion, or repeats the plugin does internally.
For Chirp/ElevenLabs/Cartesia (character-priced), your read is right: characters fed to the TTS API, straight multiply.
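The arithmetic for both pricing models can be sketched as follows. The rates are placeholders, not Google's or any other provider's published prices, and the function names are hypothetical; substitute the current values from each provider's pricing page.

```python
from decimal import Decimal

# Placeholder rates for illustration only; these are NOT real published prices.
INPUT_USD_PER_1M_TOKENS = Decimal("1.00")    # input text tokens
OUTPUT_USD_PER_1M_TOKENS = Decimal("20.00")  # output audio tokens
USD_PER_1M_CHARACTERS = Decimal("30.00")     # character-priced providers

MILLION = Decimal(1_000_000)


def token_priced_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one turn for a token-priced TTS model: input text plus output audio."""
    return (
        Decimal(input_tokens) * INPUT_USD_PER_1M_TOKENS
        + Decimal(output_tokens) * OUTPUT_USD_PER_1M_TOKENS
    ) / MILLION


def character_priced_cost(characters: int) -> Decimal:
    """Cost of one turn for a character-priced TTS model: characters sent times rate."""
    return Decimal(characters) * USD_PER_1M_CHARACTERS / MILLION
```

Decimal is used instead of float so that many small per-turn amounts can be summed into a session total without rounding drift.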
I’ve implemented it as shown below, using the “gemini-3.1-flash-tts-preview” model via the GeminiTTS plugin imported from livekit.plugins.google.beta. The character count and audio duration are coming through, but input_tokens and output_tokens are not being returned. What’s the issue here? Please help me out:
session = AgentSession(
    vad=vad,
    min_interruption_duration=0.4,
    allow_interruptions=True,
    turn_detection="vad",
    min_endpointing_delay=0.05,
    preemptive_generation=False,
)

@session.on("session_usage_updated")
def on_session_usage_updated(ev):
    try:
        for usage in ev.usage.model_usage:
            logger.info(f"📊 [LIVE USAGE] {getattr(usage, 'provider', 'unknown')}/{getattr(usage, 'model', 'unknown')}: {usage}")
            logger.info(f"📊 [USAGE DICT] {getattr(usage, '__dict__', dir(usage))}")
    except Exception as e:
        logger.error(f"Error logging session usage: {e}")
Sample output log:
08:39:26.484 INFO dental-agent 📊 [USAGE DICT] {'type': 'tts_usage', 'provider': 'Gemini', 'model': 'gemini-3.1-flash-tts-preview', 'input_tokens': 0, 'output_tokens': 0, 'characters_count': 192, 'audio_duration': 13.52}
I think that might be a bug in the plugin.
I can add the following code to this file: agents/livekit-plugins/livekit-plugins-google/livekit/plugins/google/beta/gemini_tts.py (main branch of livekit/agents on GitHub)
response = await self._tts._client.aio.models.generate_content(
    model=self._tts._opts.model,
    contents=input_text,
    config=config,
)
# --- ADD THIS BLOCK ---
if response.usage_metadata:
    self._set_token_usage(
        input_tokens=response.usage_metadata.prompt_token_count or 0,
        output_tokens=response.usage_metadata.candidates_token_count or 0,
    )
# ----------------------
output_emitter.initialize(
    request_id=utils.shortuuid(),
    sample_rate=self._tts.sample_rate,
    num_channels=self._tts.num_channels,
    mime_type="audio/pcm",
)
With that change, I now see input and output tokens in the tts_usage log.
Does that fix your issue? If so, I can raise a PR for this (or you can).
Nice catch from @darryncampbell. That’s exactly the gap. My earlier “subscribe to session_usage_updated” pointer was right at the framework level but missed that the google.beta GeminiTTS plugin specifically wasn’t populating input_tokens / output_tokens from the response’s usage_metadata. The characters_count and audio_duration fields you see today come from the plugin’s own counters, not from Google’s response.
Darryn’s patch lifts the actual token counts from response.usage_metadata.prompt_token_count and candidates_token_count, which is where the Gemini API surfaces them. Once that lands, you’ll get the same per-turn cost math working for Gemini TTS that you already have for Chirp/ElevenLabs/Cartesia.
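As a rough sketch of that per-turn math, the function below prices one record shaped like the [USAGE DICT] log earlier in this thread and returns None when the token counts are missing, which is what the unpatched plugin produces. The rate table and the function name are hypothetical, and the rates are placeholders rather than published prices. Whether each event carries per-turn or cumulative counts is not established in this thread, so check that before summing results.

```python
from decimal import Decimal
from typing import Any, Mapping, Optional

# Placeholder USD rates per one million tokens: (input text, output audio).
# These are NOT published prices; replace them with the provider's current rates.
HYPOTHETICAL_TOKEN_RATES = {
    "gemini-3.1-flash-tts-preview": (Decimal("1.00"), Decimal("20.00")),
}


def tts_cost_from_usage(usage: Mapping[str, Any]) -> Optional[Decimal]:
    """Price one tts_usage record, or return None if token counts were not reported."""
    input_tokens = int(usage.get("input_tokens") or 0)
    output_tokens = int(usage.get("output_tokens") or 0)
    characters = int(usage.get("characters_count") or 0)

    # Text was synthesized but no tokens were reported: treat the cost as
    # unknown rather than silently recording it as zero.
    if input_tokens == 0 and output_tokens == 0 and characters > 0:
        return None

    input_rate, output_rate = HYPOTHETICAL_TOKEN_RATES[usage["model"]]
    return (input_tokens * input_rate + output_tokens * output_rate) / Decimal(1_000_000)
```

Returning None instead of zero keeps an unpatched plugin from quietly under-reporting cost in whatever totals are built on top of this.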
If you have a couple of minutes, the PR is a clean first contribution: five-line diff, low review surface. Otherwise, Darryn can raise it, but having you own it puts your name on the changelog.
Thanks bro, it worked. As for the PR, I don’t know who should raise it: should it be me, since I raised the issue, or you, since you provided the solution?
OK thanks, got it. And as for the PR, I can do it, right? Is it ethically OK?
@Yash_Zinzuwadiya the plugin is open source, so anyone can raise a PR.
I don’t know your circumstances, but many developers like to contribute to open source projects as it helps build their portfolio and shows a willingness to engage with software development more generally rather than just doing it because “it’s their job”. You then get your name in the list of contributors (the Contributors to livekit/agents page on GitHub) and it looks good to future employers. On the other hand, you may not want to fuss about with all that and take the attitude “it’s your problem, you fix it”, in which case I’d raise it.
Just let me know either way.
@darryncampbell OK, no problem. I would like to raise it, if that’s fine with you.