Hello,
I would like to set the parameters below for gemini-2.5-flash and gemini-3.5-flash, since we use both models in production with LiveKit Inference. However, I noticed that not all settings are exposed through LiveKit Inference. Could you check my implementation and correct me where it is wrong?
I'm currently setting the following:
from livekit.agents import inference

# Gemini 3.5 Flash (Python identifiers cannot start with a digit)
flash_35 = inference.LLM(
    model="gemini-3.5-flash",
    provider="google",
    inference_class="priority",
    extra_kwargs={"temperature": 1, "reasoning_effort": "minimal"},
)

# Gemini 2.5 Flash
flash_25 = inference.LLM(
    model="gemini-2.5-flash",
    provider="google",
    inference_class="priority",
    extra_kwargs={"temperature": 0.5, "reasoning_effort": "minimal"},
)
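To make the request concrete, here is a minimal sketch of the per-model thinking settings I would ideally pass. The helper `thinking_kwargs` is hypothetical, and the `thinking_budget` and `include_thoughts` keys are assumptions that mirror Google's own thinkingConfig fields; I do not know whether LiveKit Inference accepts or forwards them, which is exactly what I am asking.

```python
# Hypothetical helper: builds the kwargs I would like to pass per model family.
# ASSUMPTION: "thinking_budget" and "include_thoughts" are not confirmed
# LiveKit Inference options; they mirror Google's thinkingConfig fields.

def thinking_kwargs(model: str, temperature: float) -> dict:
    kwargs = {"temperature": temperature}
    if model.startswith("gemini-2.5"):
        # Gemini 2.5 is controlled by a token budget rather than an effort level;
        # a budget of 0 is meant to switch thinking off on 2.5 Flash.
        kwargs["thinking_budget"] = 0
    else:
        # Newer models: keep the effort-style setting from my current config.
        kwargs["reasoning_effort"] = "minimal"
    # In both cases I do not want thought text in the output stream.
    kwargs["include_thoughts"] = False
    return kwargs


print(thinking_kwargs("gemini-2.5-flash", 0.5))
print(thinking_kwargs("gemini-3.5-flash", 1))
```

If these keys were supported, the resulting dict would simply be passed as `extra_kwargs` in the two constructors above.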
A few focused questions:
- Gemini 2.5 controls thinking with a thinkingBudget parameter rather than reasoning_effort. Is thinkingBudget exposed in LiveKit Inference? If not, will reasoning_effort have any effect on this model?
- Does the include_thoughts parameter work in LiveKit Inference? It doesn't seem to, and Gemini 3.5 Flash leaks thinking tokens into the LLM-TTS pipeline, so I would like to be able to disable that. Example of the exception:
{“message”: “livekit.agents.inference.llm.LLM failed, switching to next LLM\nTraceback (most recent call last):\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/llm/fallback_adapter.py”, line 176, in _try_generate\n async for chunk in stream:\n …<3 lines>…\n yield chunk\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/llm/llm.py”, line 393, in anext\n raise exc # noqa: B904\n ^^^^^^^^^\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/llm/llm.py”, line 195, in _traceable_main_task\n await self._main_task()\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/llm/llm.py”, line 223, in _main_task\n await self._run()\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/inference/llm.py”, line 429, in _run\n raise APIStatusError(\n …<5 lines>…\n ) from None\nlivekit.agents._exceptions.APIStatusError: message=‘provider: google model: gemini-3.1-flash-lite, message: Error 400, Message: Corrupted thought signature., Status: INVALID_ARGUMENT, Details: . Corrupted thought signature.: Error 400, Message: Corrupted thought signature., Status: INVALID_ARGUMENT, Details: ’, status_code=400, retryable=True, body=provider: google model: gemini-3.1-flash-lite, message: Error 400, Message: Corrupted thought signature., Status: INVALID_ARGUMENT, Details: . 
Corrupted thought signature.: Error 400, Message: Corrupted thought signature., Status: INVALID_ARGUMENT, Details: ”, “level”: “WARNING”, “name”: “livekit.agents”, “exc_info”: “Traceback (most recent call last):\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/llm/fallback_adapter.py”, line 176, in _try_generate\n async for chunk in stream:\n …<3 lines>…\n yield chunk\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/llm/llm.py”, line 393, in anext\n raise exc # noqa: B904\n ^^^^^^^^^\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/llm/llm.py”, line 195, in _traceable_main_task\n await self._main_task()\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/llm/llm.py”, line 223, in _main_task\n await self._run()\n File “/app/.venv/lib/python3.13/site-packages/livekit/agents/inference/llm.py”, line 429, in _run\n raise APIStatusError(\n …<5 lines>…\n ) from None\nlivekit.agents._exceptions.APIStatusError: message=‘provider: google model: gemini-3.1-flash-lite, message: Error 400, Message: Corrupted thought signature., Status: INVALID_ARGUMENT, Details: . Corrupted thought signature.: Error 400, Message: Corrupted thought signature., Status: INVALID_ARGUMENT, Details: ’, status_code=400, retryable=True, body=provider: google model: gemini-3.1-flash-lite, message: Error 400, Message: Corrupted thought signature., Status: INVALID_ARGUMENT, Details: . Corrupted thought signature.: Error 400, Message: Corrupted thought signature., Status: INVALID_ARGUMENT, Details: ”, “pid”: 13166, “job_id”: “AJ_BXTwxkvd4zFm”, “room_id”: “RM_tQ43frNhvMpk”, “timestamp”: “2026-05-28T08:03:15.597624+00:00”}