We’re building a multi-agent system similar to the drive-thru example. However, the user might ask a question that we need to answer before continuing with the flow. I’ve been loading the whole FAQ knowledge base into the agent context, which was fine at first, but it has grown to the point where the context is dominated by the FAQs (approximate token counts):
- base: 300
- agent: 1100
- FAQs: 3500
- other session context: 200
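A quick tally of the breakdown above makes the imbalance concrete (figures are the approximate counts from the list, not measured values):

```python
# Rough per-request token budget, using the approximate figures above.
budget = {"base": 300, "agent": 1100, "faqs": 3500, "other_session": 200}

total = sum(budget.values())
faq_share = budget["faqs"] / total

print(total)               # 5100 tokens per request
print(f"{faq_share:.0%}")  # FAQs are ~69% of the prompt
```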
This obviously hurts latency. The console shows replies can drop from the current 4-5 s to ~2 s (which still isn't fast) if the FAQs aren't always in the context. I'm considering several options for the case where the user asks a question the agent doesn't have the knowledge to answer:
1. A tool that calls an LLM that does have the FAQs loaded in context, and returns the reply to the user's question
2. A tool that calls an LLM that does embedding-based RAG. This would be worse in terms of accuracy, but better in latency. It is feasible right now with the small KB we have
3. A tool that creates a Task to spawn an agent that has the FAQs loaded in context
4. A tool that creates a Task to spawn an agent that has a tool to do RAG
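For the embedding-based option, the retrieval tool itself can be quite small. A minimal sketch, assuming a hypothetical `FAQS` list and using bag-of-words cosine similarity as a stand-in for a real embedding model:

```python
import math
from collections import Counter

# Hypothetical FAQ knowledge base; a real system would load this from storage.
FAQS = [
    ("What are your opening hours?", "We are open 8am-10pm every day."),
    ("Do you take card payments?", "Yes, all major cards are accepted."),
]

def _vec(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-count vector."""
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_faq(user_question: str, top_k: int = 1) -> list[str]:
    """Tool entry point: return the top-k FAQ answers for a question."""
    q = _vec(user_question)
    ranked = sorted(FAQS, key=lambda f: _cosine(q, _vec(f[0])), reverse=True)
    return [answer for _, answer in ranked[:top_k]]
```

With a KB this small, the retrieved answers could either be returned to the user directly or passed to a cheap LLM call for rephrasing; either way, only the matched snippets (not all 3500 tokens of FAQs) touch the main agent's context.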
Discarded:
- A tool that returns the whole FAQ set as a message - this would persist in the conversation history!
---
My intuition is that (3) would be best for accuracy (all FAQs in context), but latency would again be slow when answering questions. At least it would free the main agent on the standard path.
On the other hand, (4) looks faster but more complicated.
What's the best practice here? Is there an example of this? I see this, but it assumes that (a) the last turn requires RAG, and (b) the user message is already formatted in the best way to query RAG.
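One common way to address both (a) and (b) at once is to expose retrieval as a tool: the agent only invokes it on the turns that actually need the FAQs, and the model writes a standalone search query as the tool argument instead of the raw user message being fed to RAG. A sketch of such a tool definition, as a generic JSON-schema-style dict rather than any specific vendor's API:

```python
# Illustrative tool definition; the name "answer_faq" and the exact schema
# shape are assumptions, not a particular framework's required format.
answer_faq_tool = {
    "name": "answer_faq",
    "description": (
        "Look up the FAQ knowledge base. Call this only when the user asks "
        "a question you cannot answer from the current context."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": (
                    "A self-contained rephrasing of the user's question, "
                    "with any pronouns and references resolved."
                ),
            }
        },
        "required": ["query"],
    },
}
```

Because the model decides when to call the tool, concern (a) goes away, and because it generates the `query` argument itself, concern (b) becomes a prompt-quality problem (the `description` nudges it to rewrite the question) rather than a pipeline limitation.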