July 4, 2026 · 7 min read

web_fetch vs Indexed RAG: Why AI Agents Get Facts Wrong About Your API

When an AI agent needs to answer a question about your API, it has two main options: fetch the relevant page directly, or query a pre-built index of your content. The difference in answer quality is larger than most developers expect.

We ran a benchmark across 37 developer sites (pricing pages, documentation, and multi-page technical references) and had GPT-4o independently judge the answers. The result: web_fetch produced three times as many high-confidence wrong answers as indexed RAG (3 cases vs 1 across 37 questions). The gap is concentrated in JS-heavy sites, where web_fetch had 2 such answers and indexed RAG had none.

What web_fetch actually does

When Claude or another agent uses a web fetch tool, it issues an HTTP GET to the URL you give it, receives the HTML response, extracts text, and stuffs it into context. That's it. One URL, one pass, one chance to find the answer.
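To make that concrete, here is a minimal sketch of a fetch tool using only the Python standard library. It is not how any particular agent implements its tool; the function names and the User-Agent string are our own placeholders. The point is what it leaves out: no JavaScript runs and no links are followed.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Reduce an HTML document to its text nodes, joined by spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def web_fetch(url: str, timeout: float = 10.0) -> str:
    """One GET, one pass: the raw HTML is all the agent ever sees."""
    # The User-Agent value is a hypothetical placeholder.
    request = Request(url, headers={"User-Agent": "example-agent/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_text(html)
```

Running `extract_text` on a client-rendered shell (a title, a script tag, and an empty root div) yields little more than the title, which is exactly the thin-content case described in the next section.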

This works fine when the URL is exactly right and the page is server-rendered. For a static docs page where you already know the exact path, web_fetch is perfectly reasonable. But those conditions fail more often than you'd think:

JS-rendered pages return almost nothing. Pricing pages, dashboards, and anything built with React client-side rendering often return a nearly empty shell. The fetch completes, the agent gets a few hundred characters of boilerplate, and it has to guess the rest — or says it can't find the information, which is the better outcome.

Single-page fetches miss distributed content. Your rate limits might live in three places: the API reference, the pricing page, and a dedicated limits guide. A web_fetch of any one page misses the other two. The agent synthesizes an answer from partial information and presents it with the same confidence as if it had read everything.

Thin content triggers hallucination. When an LLM receives a fetch result with sparse content — say, a JS-heavy page that returned 200 characters — it doesn't always say "I couldn't find this." Sometimes it fills the gap with plausible-sounding but incorrect details drawn from training data. The result is a confidently wrong answer that's harder to catch than an outright refusal.
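One way an agent could steer toward the refusal rather than the guess is a length check before the text ever reaches the model. The sketch below is an assumption on our part, not something the benchmark tested: the 500-character cutoff and the function names are hypothetical, and `ask_llm` stands in for whatever model call your agent makes.

```python
MIN_USEFUL_CHARS = 500  # hypothetical cutoff; tune against your own pages


def build_context(page_text, min_chars=MIN_USEFUL_CHARS):
    """Return whitespace-normalized text, or None if the page is too thin to trust."""
    cleaned = " ".join(page_text.split())
    if len(cleaned) < min_chars:
        return None
    return cleaned


def answer_or_decline(page_text, ask_llm):
    """Call the model only when the fetched page has enough content."""
    context = build_context(page_text)
    if context is None:
        return "The page returned too little content to answer reliably."
    return ask_llm(context)
```

A character count is a blunt signal. It catches the empty-shell case but not a long page that simply lacks the answer.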

What the benchmark showed

We tested across three categories of sites: JS-heavy SPAs (pricing pages for Linear, Vercel, Stripe, Figma, Notion, Zoom, Airtable, monday, Miro, Loom), static documentation (Next.js, Tailwind, React, MDN, Vue, Express, Docker, TypeScript, GraphQL, Kubernetes, PostgreSQL, git, Redis, FastAPI), and multi-page synthesis questions (Supabase, Auth0, Anthropic, GitHub Actions, Cloudflare, MongoDB, SendGrid, Netlify, Render, Twilio, Datadog, Clerk, Resend).

For each site, we asked a question a developer would realistically ask. GPT-4o judged whether each answer covered the expected facts and whether it stated anything confidently wrong.

Key findings:

JS-heavy SPAs: web_fetch and indexed RAG tied on coverage (63% each), but web_fetch had 2 high-confidence wrong answers vs 0 for RAG. The fetch succeeds but returns thin content; the LLM fills gaps incorrectly.
Static docs: web_fetch edges out indexed RAG on raw fact coverage (92% vs 81%) — this is the case where a single well-chosen URL often has everything you need. But both had zero hallucination cases, so the difference is coverage, not reliability.
Multi-page synthesis: web_fetch won on coverage again (66% vs 52%), and the two approaches tied on high-confidence wrong answers (1 each). The coverage gap here is partly because several multi-page RAG answers timed out on first index; warm runs close this gap.
Overall: web_fetch: 75% fact coverage, 3 high-confidence wrong answers. Indexed RAG: 66% coverage, 1 high-confidence wrong answer.
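The overall figures can be reproduced from the per-category numbers. The check below assumes one question per site (10 JS-heavy, 14 static, 13 multi-page, counted from the site lists above) and that the overall coverage is a question-weighted mean of the category percentages; the variable names are ours.

```python
# (questions, web_fetch coverage %, indexed RAG coverage %) per category.
# Question counts are taken from the site lists, assuming one question per site.
categories = {
    "js_heavy_spas": (10, 63, 63),
    "static_docs": (14, 92, 81),
    "multi_page": (13, 66, 52),
}

total = sum(n for n, _, _ in categories.values())
fetch_overall = sum(n * f for n, f, _ in categories.values()) / total
rag_overall = sum(n * r for n, _, r in categories.values()) / total

print(total, round(fetch_overall), round(rag_overall))  # 37 75 66
```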

The pattern: web_fetch matches or beats indexed RAG on raw coverage when everything goes right, but indexed RAG makes fewer confident mistakes when things go wrong. In a production agent where incorrect answers have real consequences (wrong API calls, wrong auth configuration, wrong billing assumptions), that safety margin matters.

Why indexed RAG has fewer confident mistakes

Pre-indexed RAG has a property web_fetch doesn't: it can tell when nothing relevant was retrieved and decline to answer rather than hallucinate. When the top retrieved chunk's similarity score falls below a threshold, a well-designed RAG system returns "the indexed content doesn't cover this topic" rather than asking the LLM to reason from thin evidence.
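A minimal sketch of that threshold check, assuming chunks are stored as (text, embedding) pairs and scored by cosine similarity. The 0.75 cutoff and the function names are hypothetical; a real system would calibrate the threshold against its own embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_or_decline(query_vec, chunks, threshold=0.75):
    """Return (text, score) for the best chunk, or (None, score) if it scores too low.

    chunks is a list of (text, embedding) pairs. The threshold is a
    hypothetical value chosen for illustration.
    """
    best_text, best_score = None, -1.0
    for text, vec in chunks:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_text, best_score = text, score
    if best_score < threshold:
        return None, best_score  # caller reports "not covered" instead of guessing
    return best_text, best_score
```

The important design choice is that the decline happens before the LLM is called, so there is no thin context for it to fill in.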

Web_fetch has no equivalent mechanism. If the page loads, the agent has context — sparse or not — and the LLM is expected to produce an answer from it.

Pre-indexing also lets the system pull from multiple pages in a single query. When you ask "what's the rate limit on the free plan?", indexed RAG can retrieve relevant chunks from the pricing page, the API reference, and the limits guide at once and synthesize a complete answer. Web_fetch on a single URL cannot.
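As a rough illustration of cross-page retrieval, the sketch below ranks chunks from every indexed page against one query and keeps each chunk's source URL attached. The `Chunk` type, the function names, and the assumption that embeddings are already unit-length (so a dot product serves as the similarity score) are ours.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    url: str         # page the chunk was crawled from
    text: str
    embedding: list  # assumed to be unit-length


def top_k(query_vec, index, k=3):
    """Rank every chunk in the index by dot product with the query; keep the best k."""
    scored = [
        (sum(q * e for q, e in zip(query_vec, chunk.embedding)), chunk)
        for chunk in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def format_context(chunks):
    """Prefix each chunk with its source URL so the answer can cite it."""
    return "\n\n".join(f"[{c.url}]\n{c.text}" for c in chunks)
```

Because the ranking runs over the whole index rather than one page, the top results can come from three different URLs in a single call.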

When to use web_fetch

Web_fetch is the right tool when you have an exact, stable URL for a static page that directly answers the question. If a user pastes a specific docs link and asks "what does this page say about authentication?", fetching that URL is fast and appropriate.

It's the wrong tool when you're trying to answer a question about a domain where you don't know the exact URL, when the site is JS-rendered, or when the answer might be spread across multiple pages.

Making your docs resilient to web_fetch failures

The fastest thing you can do: publish an /llms.txt file. It won't help at query time — LLMs can't fetch it during inference unless explicitly told to — but it gives training corpora a dense, accurate summary of your product that's harder to hallucinate around.
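For reference, a small generator for such a file might look like the sketch below. It assumes the layout described in the llms.txt proposal as we understand it (an H1 title, a blockquote summary, then H2 sections of annotated links); the product name, URLs, and function name are placeholders.

```python
def render_llms_txt(name, summary, sections):
    """Render an llms.txt body.

    sections maps a section title to a list of (label, url, note) tuples.
    The layout follows the llms.txt proposal as we understand it.
    """
    lines = [f"# {name}", "", f"> {summary}"]
    for title, links in sections.items():
        lines += ["", f"## {title}", ""]
        lines += [f"- [{label}]({url}): {note}" for label, url, note in links]
    return "\n".join(lines) + "\n"


# Hypothetical product and URLs, for illustration only.
content = render_llms_txt(
    "Example API",
    "REST API for sending transactional email.",
    {
        "Docs": [
            ("Rate limits", "https://example.com/docs/limits", "per-plan request limits"),
            ("Authentication", "https://example.com/docs/auth", "API keys and scopes"),
        ],
    },
)
```

Writing `content` to `/llms.txt` at the site root is all that is left to do.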

For runtime accuracy, you need to be in an index. Either build a RAG pipeline yourself (crawl, chunk, embed, serve), or use a hosted service. AgentReady crawls your docs, embeds the content, and serves it via MCP so agents can query it with a single tool call — including JS-rendered pages via headless rendering fallback.
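If you build the pipeline yourself, the chunking step is the easiest to sketch. The version below splits on whitespace into fixed-size windows with overlap so that a fact straddling a boundary appears whole in at least one chunk. The sizes are hypothetical defaults, and this is not a description of how AgentReady chunks content.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Each chunk would then be embedded and stored alongside its source URL, which is what makes per-page citation possible later.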

The value isn't just accuracy. When your docs are in an index, agents can cite exactly which page they pulled the answer from. That gives users a link to verify, which is something web_fetch answers rarely include.

Index your docs and make them reliably queryable by AI agents.
