LiveKit Inference thinking configuration for Gemini 2.5 and 3.5 Flash

Hello,

I would like to set the parameters below for gemini-2.5-flash and gemini-3.5-flash, since we use both of these models in production with LiveKit Inference. However, I noticed that not all settings are exposed for LiveKit Inference. Could you correct me on my implementation?

I’m currently setting the following:

from livekit.agents import inference

# Gemini 3.5 Flash through LiveKit Inference
flash_35 = inference.LLM(
    model="gemini-3.5-flash",
    provider="google",
    inference_class="priority",
    extra_kwargs={"temperature": 1, "reasoning_effort": "minimal"}
)

# Gemini 2.5 Flash through LiveKit Inference
flash_25 = inference.LLM(
    model="gemini-2.5-flash",
    provider="google",
    inference_class="priority",
    extra_kwargs={"temperature": 0.5, "reasoning_effort": "minimal"}
)

A few focused questions:

  1. Gemini 2.5 modulates thinking with a thinkingBudget parameter instead of reasoning_effort. Is this parameter exposed with LiveKit Inference? Will reasoning_effort work with this model?
  2. Does the parameter include_thoughts work in LiveKit Inference? It doesn’t seem to work, and Gemini 3.5 Flash leaks thinking tokens to the LLM-TTS pipeline, so I would like the ability to disable that. Here is the exception I see:

{“message”: “livekit.agents.inference.llm.LLM failed, switching to next LLM\nTraceback (most recent call last):\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/llm/fallback_adapter.py”, line 176, in _try_generate\n async for chunk in stream:\n …<3 lines>…\n yield chunk\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/llm/llm.py”, line 393, in anext\n raise exc # noqa: B904\n ^^^^^^^^^\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/llm/llm.py”, line 195, in _traceable_main_task\n await self._main_task()\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/llm/llm.py”, line 223, in _main_task\n await self._run()\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/inference/llm.py”, line 429, in _run\n raise APIStatusError(\n …<5 lines>…\n ) from None\nlivekit.agents._exceptions.APIStatusError: message=‘provider: google model: gemini-3.1-flash-lite, message: Error 400, Message: Corrupted thought signature., Status: INVALID_ARGUMENT, Details: . Corrupted thought signature.: Error 400, Message: Corrupted thought signature., Status: INVALID_ARGUMENT, Details: ’, status_code=400, retryable=True, body=provider: google model: gemini-3.1-flash-lite, message: Error 400, Message: Corrupted thought signature., Status: INVALID_ARGUMENT, Details: . 
Corrupted thought signature.: Error 400, Message: Corrupted thought signature., Status: INVALID_ARGUMENT, Details: ”, “level”: “WARNING”, “name”: “livekit.agents”, “exc_info”: “Traceback (most recent call last):\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/llm/fallback_adapter.py”, line 176, in _try_generate\n async for chunk in stream:\n …<3 lines>…\n yield chunk\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/llm/llm.py”, line 393, in anext\n raise exc # noqa: B904\n ^^^^^^^^^\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/llm/llm.py”, line 195, in _traceable_main_task\n await self._main_task()\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/llm/llm.py”, line 223, in _main_task\n await self._run()\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/inference/llm.py”, line 429, in _run\n raise APIStatusError(\n …<5 lines>…\n ) from None\nlivekit.agents._exceptions.APIStatusError: message=‘provider: google model: gemini-3.1-flash-lite, message: Error 400, Message: Corrupted thought signature., Status: INVALID_ARGUMENT, Details: . Corrupted thought signature.: Error 400, Message: Corrupted thought signature., Status: INVALID_ARGUMENT, Details: ’, status_code=400, retryable=True, body=provider: google model: gemini-3.1-flash-lite, message: Error 400, Message: Corrupted thought signature., Status: INVALID_ARGUMENT, Details: . Corrupted thought signature.: Error 400, Message: Corrupted thought signature., Status: INVALID_ARGUMENT, Details: ”, “pid”: 13166, “job_id”: “AJ_BXTwxkvd4zFm”, “room_id”: “RM_tQ43frNhvMpk”, “timestamp”: “2026-05-28T08:03:15.597624+00:00”}

Gemini 2.5 modulates thinking with a thinkingBudget parameter instead of reasoning_effort. Is this parameter exposed with LiveKit Inference? Will reasoning_effort work with this model?

Yes, reasoning_effort maps to thinkingBudget, but it accepts the values low, medium, or high (not minimal); please see the Google Gemini LLM page in the LiveKit documentation. The values in that table correspond to the budgets specified for 2.5.

Does the parameter include_thoughts work in LiveKit Inference? It doesn’t seem to work, and Gemini 3.5 Flash leaks thinking tokens to the LLM-TTS pipeline, so I would like the ability to disable that.

No, include_thoughts is not supported in Inference, and I don’t see it documented for the plugin either. Claude tells me the default value should be false, so I’m surprised you see thoughts leaking out.

The actual cause of the exception (according to Claude, but it seems very plausible) is that you have a fallback to gemini-3.1-flash-lite, which uses a different thought signature. That mismatch is what triggers the “Corrupted thought signature” error, so falling back to a different (non-Gemini) provider should work.

Alternatively, we have a recipe for this scenario, where thought reasoning escapes the LLM; the solution there is to use llm_node.

I haven’t tried it with Gemini, however.
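To illustrate the kind of filtering an llm_node override could apply, here is a minimal sketch in plain asyncio, not the recipe itself. It assumes the leaked reasoning arrives as text wrapped in <think>…</think> markers; that marker format is an assumption for illustration, and Gemini through Inference may mark thoughts differently. The names strip_thoughts and filter_chunks are hypothetical helpers, not LiveKit APIs.

```python
import asyncio
from collections.abc import AsyncIterable, AsyncIterator

# Assumed markers around reasoning text (hypothetical, adjust to what you observe).
OPEN, CLOSE = "<think>", "</think>"


def _partial_suffix(text: str, marker: str) -> int:
    """Length of the longest proper prefix of `marker` that `text` ends with."""
    for k in range(min(len(marker) - 1, len(text)), 0, -1):
        if text.endswith(marker[:k]):
            return k
    return 0


async def strip_thoughts(chunks: AsyncIterable[str]) -> AsyncIterator[str]:
    """Yield streamed text with everything between OPEN and CLOSE removed."""
    buf = ""
    in_thought = False
    async for chunk in chunks:
        buf += chunk
        while True:
            marker = CLOSE if in_thought else OPEN
            idx = buf.find(marker)
            if idx != -1:
                # Emit text before an opening marker; drop text before a closing one.
                if not in_thought and idx > 0:
                    yield buf[:idx]
                buf = buf[idx + len(marker):]
                in_thought = not in_thought
                continue
            # No complete marker: hold back a tail that might be a split marker.
            keep = _partial_suffix(buf, marker)
            if not in_thought and len(buf) > keep:
                yield buf[: len(buf) - keep]
            buf = buf[len(buf) - keep:]
            break
    # Flush leftover visible text once the stream ends.
    if buf and not in_thought:
        yield buf


def filter_chunks(chunks: list[str]) -> str:
    """Run the filter over a list of chunks and join the result (demo helper)."""
    async def source() -> AsyncIterator[str]:
        for chunk in chunks:
            yield chunk

    async def run() -> str:
        return "".join([piece async for piece in strip_thoughts(source())])

    return asyncio.run(run())


print(filter_chunks(["Hi ", "<thi", "nk>plan the", " reply</th", "ink>there", "!"]))
```

Inside an agent, the same generator would wrap the text deltas coming from the LLM stream before they reach TTS. The buffering matters because a marker can be split across two streamed chunks, as in the example input.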

Thanks,

A quick follow-up on your answer:

Does Gemini 3.5 Flash support the “minimal” reasoning_effort?
I’m asking in the context of this GitHub issue: Expose Gemini thinkingLevel=minimal in LiveKit Inference · Issue #5802 · livekit/agents · GitHub. It’s not in the documentation, but the issue is marked as resolved.

Thanks!

I was looking for that PR when answering your question!! I should have looked harder :slight_smile: .

I just looked at the code: minimal will be honoured for 3.5, but for 2.5 it is not handled, and it looks to me like it will disable thinking (I haven’t tested this).

I can’t speak to why it’s not documented, but it’s almost certainly because the docs haven’t caught up yet.
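Until the docs catch up, one way to keep the per-model differences explicit is to validate the value before building extra_kwargs. The sketch below only encodes what is said in this thread (2.5 takes low, medium, or high; 3.5 also honours minimal) and has not been checked against the LiveKit source. SUPPORTED_EFFORTS and build_extra_kwargs are hypothetical names for illustration, not part of LiveKit.

```python
# Reasoning efforts per model family, as described in this thread.
# Assumption: not verified against the LiveKit Inference implementation.
SUPPORTED_EFFORTS = {
    "gemini-2.5": {"low", "medium", "high"},
    "gemini-3.5": {"minimal", "low", "medium", "high"},
}


def build_extra_kwargs(model: str, temperature: float, reasoning_effort: str) -> dict:
    """Return an extra_kwargs dict, rejecting efforts the model family does not list."""
    for prefix, allowed in SUPPORTED_EFFORTS.items():
        if model.startswith(prefix):
            if reasoning_effort not in allowed:
                raise ValueError(
                    f"{reasoning_effort!r} is not listed for {model}; "
                    f"expected one of {sorted(allowed)}"
                )
            return {"temperature": temperature, "reasoning_effort": reasoning_effort}
    raise ValueError(f"no effort table for model {model!r}")


print(build_extra_kwargs("gemini-3.5-flash", 1, "minimal"))
print(build_extra_kwargs("gemini-2.5-flash", 0.5, "low"))
```

The returned dict can be passed as extra_kwargs to inference.LLM. Failing loudly for an unlisted value avoids relying on the unhandled path for 2.5, whose behaviour is untested.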

Thanks a lot!

It would be much clearer if the “minimal” setting disabled thinking entirely for the 2.5 model. That would make the configuration easier to understand and use for sure :slight_smile: