How to calculate the pricing for the Gemini TTS plugin

Yash_Zinzuwadiya · May 13, 2026, 3:10pm

I am trying to calculate the total and per-turn cost of TTS models. Some models, such as the Gemini models, are priced on both input text tokens and output audio tokens, so which should be counted? Both (tokenize the LLM-generated text, multiply by the input price, then add the output audio tokens multiplied by the output price), only the input, or only the output?
For Chirp, legacy models, and other providers where pricing is defined per character, I am assuming the billed characters are the ones fed into the TTS model, so that case is straightforward.

Muhammad_Usman_Bashir · May 13, 2026, 5:59pm

Both. Gemini TTS is token-priced like an LLM: input text tokens for what you send to synthesize, output audio tokens for the synthesized result. Sum them per Google’s published rates.
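A minimal sketch of that sum, assuming rates quoted in USD per million tokens. The function name is hypothetical and the rate values are placeholders, not Google's published prices, so substitute the current numbers from the pricing page.

```python
def gemini_tts_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_million: float,
    output_rate_per_million: float,
) -> float:
    """Cost in USD of one token-priced TTS request (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * input_rate_per_million
    output_cost = output_tokens / 1_000_000 * output_rate_per_million
    return input_cost + output_cost


# PLACEHOLDER rates for illustration only; look up the real ones.
INPUT_RATE = 1.00    # USD per 1M input text tokens (assumed)
OUTPUT_RATE = 20.00  # USD per 1M output audio tokens (assumed)

# Token counts here are made-up example values.
cost = gemini_tts_cost(50, 340, INPUT_RATE, OUTPUT_RATE)
print(f"per-turn cost: ${cost:.6f}")
```

Keeping the two rates as parameters makes it easy to update them when the published prices change.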

For per-turn accounting in LiveKit, don’t recompute from text length yourself. Subscribe to session_usage_updated on AgentSession; that event surfaces the actual token/character counts each plugin reports, which includes any normalization, SSML expansion, or repeats the plugin does internally.

For Chirp/ElevenLabs/Cartesia (character-priced), your read is right: characters fed to the TTS API, straight multiply.
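For the character-priced case, the straight multiply might look like the sketch below. The helper name and the rate are hypothetical, and how a given provider counts whitespace or markup is an assumption to verify against its billing documentation.

```python
def per_character_cost(characters: int, rate_per_million_chars: float) -> float:
    """Cost in USD for a character-priced TTS request (hypothetical helper)."""
    return characters / 1_000_000 * rate_per_million_chars


# PLACEHOLDER rate for illustration only, not any provider's real price.
RATE_PER_MILLION_CHARS = 30.00  # USD per 1M characters (assumed)

text = "Your appointment is confirmed for Monday at 10 a.m."
# len(text) assumes the provider bills every character sent, spaces included.
print(f"${per_character_cost(len(text), RATE_PER_MILLION_CHARS):.6f}")
```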

Yash_Zinzuwadiya · May 14, 2026, 4:08am

I’ve implemented it as shown below. I am using the “gemini-3.1-flash-tts-preview” model through the GeminiTTS plugin imported from livekit.plugins.google.beta. The character count and audio duration are now reported, but input_tokens and output_tokens are not being returned. What’s the issue here? Please help me out:

    session = AgentSession(
        vad=vad,
        min_interruption_duration=0.4,
        allow_interruptions=True,
        turn_detection="vad",
        min_endpointing_delay=0.05,
        preemptive_generation=False,
    )

    @session.on("session_usage_updated")
    def on_session_usage_updated(ev):
        try:
            for usage in ev.usage.model_usage:
                logger.info(f"📊 [LIVE USAGE] {getattr(usage, 'provider', 'unknown')}/{getattr(usage, 'model', 'unknown')}: {usage}")
                logger.info(f"📊 [USAGE DICT] {getattr(usage, '__dict__', dir(usage))}")
        except Exception as e:
            logger.error(f"Error logging session usage: {e}")

Sample output log:

    08:39:26.484 INFO dental-agent 📊 [USAGE DICT] {'type': 'tts_usage', 'provider': 'Gemini', 'model': 'gemini-3.1-flash-tts-preview', 'input_tokens': 0, 'output_tokens': 0, 'characters_count': 192, 'audio_duration': 13.52}

darryncampbell · May 14, 2026, 9:46am

I think that might be a bug in the plugin.

I can add the following code to this file: agents/livekit-plugins/livekit-plugins-google/livekit/plugins/google/beta/gemini_tts.py (main branch of livekit/agents on GitHub).

    response = await self._tts._client.aio.models.generate_content(
        model=self._tts._opts.model,
        contents=input_text,
        config=config,
    )

    # --- ADD THIS BLOCK ---
    if response.usage_metadata:
        self._set_token_usage(
            input_tokens=response.usage_metadata.prompt_token_count or 0,
            output_tokens=response.usage_metadata.candidates_token_count or 0,
        )
    # ----------------------

    output_emitter.initialize(
        request_id=utils.shortuuid(),
        sample_rate=self._tts.sample_rate,
        num_channels=self._tts.num_channels,
        mime_type="audio/pcm",
    )

And I now see input and output tokens in your tts_usage log.

Does that fix your issue? If so, I can raise a PR for this (or you can).

Muhammad_Usman_Bashir · May 14, 2026, 5:32pm

Nice catch from @darryncampbell. That’s exactly the gap. My earlier “subscribe to session_usage_updated” pointer was right at the framework level but missed that the google.beta GeminiTTS plugin specifically wasn’t populating input_tokens / output_tokens from the response’s usage_metadata. The characters and audio_duration fields you see today come from the plugin’s own counters, not from Google’s response.

Darryn’s patch lifts the actual token counts from response.usage_metadata.prompt_token_count and candidates_token_count, which is where the Gemini API surfaces them. Once that lands, you’ll get the same per-turn cost math working for Gemini TTS that you already have for Chirp/ElevenLabs/Cartesia.
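Once the token fields are populated, the per-turn math can be driven by a per-model rate table. The sketch below is only an illustration: `TTSUsage` is a hypothetical stand-in that mirrors the field names in the tts_usage log earlier in the thread (it is not the LiveKit class), the second model name is invented, and every rate is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class TTSUsage:
    """Hypothetical stand-in mirroring the tts_usage fields from the log."""
    model: str
    input_tokens: int = 0
    output_tokens: int = 0
    characters_count: int = 0


# PLACEHOLDER rates in USD per million units; replace with published prices.
RATES = {
    "gemini-3.1-flash-tts-preview": {"input_token": 1.0, "output_token": 20.0},
    "example-character-priced-model": {"character": 30.0},  # invented name
}


def turn_cost(usage: TTSUsage) -> float:
    """Pick the pricing mode from the rate table, not from which fields are non-zero."""
    rates = RATES[usage.model]
    if "character" in rates:
        return usage.characters_count / 1_000_000 * rates["character"]
    return (
        usage.input_tokens * rates["input_token"]
        + usage.output_tokens * rates["output_token"]
    ) / 1_000_000


# Example token counts are made up for illustration.
usage = TTSUsage("gemini-3.1-flash-tts-preview", input_tokens=50,
                 output_tokens=340, characters_count=192)
print(f"${turn_cost(usage):.6f}")
```

Choosing the pricing mode by model avoids silently billing a token-priced model by characters when the plugin reports zero tokens, as happened before the patch.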

If you have a couple of minutes, the PR is a clean first contribution: five-line diff, low review surface. Otherwise, Darryn can raise it, but having you own it puts your name on the changelog.

Yash_Zinzuwadiya · May 18, 2026, 6:25am

Thanks, it worked. As for the PR, I don’t know who should raise it: should it be me, since I raised the issue, or you, since you provided the solution?

Yash_Zinzuwadiya · May 18, 2026, 6:32am

OK, thanks, got it. As for the PR, I can raise it, right? Is that ethically OK?

darryncampbell · May 19, 2026, 12:49pm

@Yash_Zinzuwadiya the plugin is open source, so anyone can raise a PR.

I don’t know your circumstances, but many developers like to contribute to open source projects, as it helps build their portfolio and shows a willingness to engage with software development more generally rather than just doing it because “it’s their job”. You then get your name in the list of contributors to livekit/agents on GitHub, which looks good to future employers. On the other hand, you may not want to fuss with all that and may take the attitude “it’s your problem, you fix it”, in which case I’d raise it.

Just let me know either way.

Yash_Zinzuwadiya · May 20, 2026, 1:29pm

@darryncampbell OK, no problem. I would like to raise it, if that’s fine with you.
