July 4, 2026
When an AI agent needs to answer a question about your API, it has two main options: fetch the relevant page directly, or query a pre-built index of your content. The difference in answer quality is larger than most developers expect.
We ran a benchmark across 37 developer sites (pricing pages, documentation, and multi-page technical references) and had GPT-4o independently judge the answers. The result: web_fetch produced three times as many high-confidence wrong answers as indexed RAG (3 cases vs. 1 across 37 questions). That gap widens for JS-heavy sites and for questions that span multiple pages.
When Claude or another agent uses a web fetch tool, it issues an HTTP GET to the URL you give it, receives the HTML response, extracts text, and stuffs it into context. That's it. One URL, one pass, one chance to find the answer.
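To make the "one URL, one pass" point concrete, here is a minimal sketch of what a fetch tool does, using only the Python standard library. It is an illustration, not the implementation of any particular agent's tool; the names `extract_text` and `fetch_page_text` and the `example-agent/0.1` user agent are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def fetch_page_text(url: str, timeout: float = 10.0) -> str:
    """One GET, one pass: no JavaScript is executed and no links are followed."""
    req = Request(url, headers={"User-Agent": "example-agent/0.1"})  # hypothetical UA
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return extract_text(html)
```

Feeding `extract_text` a client-rendered shell such as `<title>Pricing</title><div id="root"></div>` yields only the title text, because the content that JavaScript would have rendered never exists in the response.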
This works fine when the URL is exactly right and the page is server-rendered. For a static docs page where you already know the exact path, web_fetch is perfectly reasonable. But those conditions fail more often than you'd think:
JS-rendered pages return almost nothing. Pricing pages, dashboards, and anything built with client-side React rendering often return a nearly empty shell. The fetch completes, the agent gets a few hundred characters of boilerplate, and it either guesses the rest or says it can't find the information. The second outcome is the better one.
Single-page fetches miss distributed content. Your rate limits might live in three places: the API reference, the pricing page, and a dedicated limits guide. A web_fetch of any one page misses the other two. The agent synthesizes an answer from partial information and presents it with the same confidence as if it had read everything.
Thin content triggers hallucination. When an LLM receives a fetch result with sparse content — say, a JS-heavy page that returned 200 characters — it doesn't always say "I couldn't find this." Sometimes it fills the gap with plausible-sounding but incorrect details drawn from training data. The result is a confidently wrong answer that's harder to catch than an outright refusal.
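One way to reduce this failure mode is to check how much text a fetch actually returned before handing it to the model. The sketch below assumes a cutoff of 500 characters, which is an illustrative number and not a value from the benchmark; `is_thin` and `build_context` are hypothetical helper names.

```python
from typing import Optional

MIN_USEFUL_CHARS = 500  # assumption: tune against pages you know render fully


def is_thin(page_text: str, min_chars: int = MIN_USEFUL_CHARS) -> bool:
    """True when the extracted text is too short to answer from."""
    visible = " ".join(page_text.split())  # collapse whitespace before counting
    return len(visible) < min_chars


def build_context(question: str, page_text: str) -> Optional[str]:
    """Return a prompt for the LLM, or None to signal 'refuse or fall back'."""
    if is_thin(page_text):
        return None
    return (
        "Answer using only this page.\n\n"
        f"PAGE:\n{page_text}\n\n"
        f"QUESTION: {question}"
    )
```

A caller that receives `None` can report that the page had no usable content, or fall back to another source, instead of asking the model to reason from 200 characters of boilerplate.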
We tested across three categories of sites: JS-heavy SPAs (pricing pages for Linear, Vercel, Stripe, Figma, Notion, Zoom, Airtable, monday, Miro, Loom), static documentation (Next.js, Tailwind, React, MDN, Vue, Express, Docker, TypeScript, GraphQL, Kubernetes, PostgreSQL, git, Redis, FastAPI), and multi-page synthesis questions (Supabase, Auth0, Anthropic, GitHub Actions, Cloudflare, MongoDB, SendGrid, Netlify, Render, Twilio, Datadog, Clerk, Resend).
For each site, we asked a question a developer would realistically ask. GPT-4o judged whether each answer covered the expected facts and whether it stated anything confidently wrong.
Key findings:
The pattern is consistent: web_fetch has better raw coverage when everything goes right, but indexed RAG has a higher floor when things go wrong, meaning fewer confidently wrong answers. In a production agent where incorrect answers have real consequences (wrong API calls, wrong auth configuration, wrong billing assumptions), that higher floor matters.
Pre-indexed RAG has a property web_fetch doesn't: it can return a low-similarity score when nothing relevant was found, and decline to answer rather than hallucinate. When the top chunk retrieved has a similarity score below a threshold, a well-designed RAG system returns "the indexed content doesn't cover this topic" rather than asking the LLM to reason from thin evidence.
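The threshold check described above can be sketched in a few lines. This assumes chunks have already been embedded into vectors and uses cosine similarity; the 0.75 threshold and the function name `retrieve_or_decline` are illustrative assumptions, and real systems tune the cutoff per embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_or_decline(query_vec, index, threshold=0.75):
    """index is a list of (chunk_text, vector) pairs.

    Returns (chunk_text, score) for the best match, or None when even the
    best match falls below the threshold, so the caller can decline.
    """
    if not index:
        return None
    best_text, best_score = max(
        ((text, cosine(query_vec, vec)) for text, vec in index),
        key=lambda pair: pair[1],
    )
    if best_score < threshold:
        return None  # "the indexed content doesn't cover this topic"
    return best_text, best_score
```

The important design choice is that the decision to decline happens before the LLM is called, so the model never sees weak evidence it might be tempted to pad out.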
Web_fetch has no equivalent mechanism. If the page loads, the agent has context — sparse or not — and the LLM is expected to produce an answer from it.
Pre-indexing also lets the system pull from multiple pages in a single query. When you ask "what's the rate limit on the free plan?", indexed RAG can retrieve relevant chunks from the pricing page, the API reference, and the limits guide in one query and synthesize a complete answer. Web_fetch on a single URL cannot.
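As a rough sketch of multi-page retrieval, the code below ranks chunks from every indexed page against one query and keeps the source URL on each chunk. The `Chunk` type, the helper names, and the toy vectors are assumptions for illustration; a real index would use a vector store and model-generated embeddings.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    url: str             # page the chunk was taken from
    text: str
    vector: list         # embedding of the chunk text


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, chunks, k=3):
    """Rank chunks from all pages together and keep the k best."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return ranked[:k]


def sources(chunks):
    """Distinct source URLs, in ranking order."""
    return list(dict.fromkeys(c.url for c in chunks))
```

Because every chunk carries its URL, the answer can draw on several pages at once and still point back to each page it used.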
Web_fetch is the right tool when you have an exact, stable URL for a static page that directly answers the question. If a user pastes a specific docs link and asks "what does this page say about authentication?", fetching that URL is fast and appropriate.
It's the wrong tool when you're trying to answer a question about a domain where you don't know the exact URL, when the site is JS-rendered, or when the answer might be spread across multiple pages.
The fastest thing you can do: publish an /llms.txt file. It won't help at query time — LLMs can't fetch it during inference unless explicitly told to — but it gives training corpora a dense, accurate summary of your product that's harder to hallucinate around.
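For reference, an /llms.txt file is plain Markdown. The sketch below generates one, assuming the layout from the public llms.txt proposal as we understand it: an H1 with the product name, a blockquote summary, then H2 sections containing link lists. The function name, product name, and URLs are placeholders.

```python
def render_llms_txt(name, summary, sections):
    """Render an llms.txt document.

    sections maps a heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Placeholder content: replace with your own product and URLs.
    print(render_llms_txt(
        "Example API",
        "REST API for sending transactional email.",
        {
            "Docs": [
                ("Rate limits", "https://example.com/docs/limits", "Per-plan request limits"),
                ("Authentication", "https://example.com/docs/auth", "API keys and scopes"),
            ],
        },
    ))
```

Serve the output at the root of your site as `/llms.txt` and regenerate it whenever the linked pages change.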
For runtime accuracy, you need to be in an index. Either build a RAG pipeline yourself (crawl, chunk, embed, serve), or use a hosted service. AgentReady crawls your docs, embeds the content, and serves it via MCP so agents can query it with a single tool call — including JS-rendered pages via headless rendering fallback.
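If you build the pipeline yourself, the chunking step is the easiest to get started with. Below is a minimal sketch of a word-based chunker with overlap, so a sentence that straddles a boundary still appears whole in one chunk. The sizes are illustrative assumptions, and this is a generic approach, not a description of how AgentReady chunks content. Crawling, embedding, and serving are out of scope here.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into chunks of up to `size` words, each sharing
    `overlap` words with the previous chunk."""
    if size <= 0 or overlap < 0 or overlap >= size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Each chunk would then be embedded and stored alongside its source URL so that retrieval can score it and cite it later.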
The value isn't just accuracy. When your docs are in an index, agents can cite exactly which page they pulled the answer from. That gives users a link to verify, which is something web_fetch answers rarely include.