Mikel Lindsaar

19 February 2026

You feed your agent that!?

Here’s a question worth asking: what is your AI agent actually reading when it fetches a web page?

Not just the content you care about. The content. All of it. The nav bar. The cookie banner. The JavaScript that populates the dropdown menus. The ads. The footer links. The analytics scripts. The tracking pixels. Every byte of HTML markup wrapping the 800 words of actual information you needed.

A typical documentation page, fetched raw, costs your agent somewhere between 12,000 and 20,000 tokens. The useful information inside it is usually 800 to 1,500 tokens. The rest is noise, and your LLM is billing you to read every character of it.

I ran the numbers. Across the agents I’ve been building, raw HTML fetches burn between 57% and 92% of tokens on content that contributes nothing to the output. That’s not a rounding error. That’s the majority of your API bill going to pay for nav bars.


What the garbage actually looks like

To be concrete about it, here’s what a fetch to the Python docs looks like without any processing:

  • Raw HTML response: approximately 18,400 tokens
  • Useful content (the actual documentation): approximately 1,200 tokens
  • Token waste: 93%

That 18,400-token call to Claude 3.5 Sonnet costs roughly $0.055. The same call through a proxy that strips the HTML and returns clean markdown costs around $0.004. If your agent is fetching 100 pages per day, that’s the difference between a $5.50/day bill and a $0.40/day bill. At scale, this is not a small number.
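If you want to sanity-check that arithmetic yourself, here’s a back-of-envelope version, assuming Claude 3.5 Sonnet’s $3-per-million input-token pricing at the time of writing (verify current pricing before trusting it):

# Back-of-envelope cost check. Token counts are from the
# Python-docs example above; your pages will vary.
awk 'BEGIN {
  price = 3 / 1e6                # dollars per input token (assumed pricing)
  raw   = 18400 * price          # ~ $0.0552 per raw HTML fetch
  clean =  1200 * price          # ~ $0.0036 per markdown fetch
  printf "per fetch: raw $%.4f vs clean $%.4f\n", raw, clean
  printf "per day at 100 fetches: raw $%.2f vs clean $%.2f\n", raw * 100, clean * 100
}'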

The reason this happens is straightforward. HTML was designed for browsers, not language models. Browsers ignore the markup and render the content. LLMs read everything, tokenize everything, and charge you for everything. Your agent is essentially paying to read assembly code when it could be reading a book.


The fix is one line of curl

The solution is to route fetches through a proxy that converts HTML to markdown before the content hits your LLM. That proxy is deshell.

Here’s what that looks like:

curl -H "X-DeShell-Key: YOUR_KEY" \
  https://proxy.deshell.ai/https://docs.python.org/3/library/json.html

You get back clean markdown. Headings, code blocks, paragraphs. No scripts, no nav, no cookie banners. The same documentation content, ready for an LLM to actually use, at a fraction of the token count.

The response includes an X-DeShell-Savings header that tells you exactly how many tokens were saved on that call. The first time I saw it on a real fetch, I thought it was miscalculated. It wasn’t.
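If you want to see the header on your own fetch, dump the response headers and grep for it (a minimal sketch; the exact format of the value may differ from what this shows):

# Dump response headers only, discard the body, and pull out the savings header
curl -s -D - -o /dev/null -H "X-DeShell-Key: YOUR_KEY" \
  https://proxy.deshell.ai/https://docs.python.org/3/library/json.html \
  | grep -i x-deshell-savings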


The part that’s actually interesting: shared cache

Here’s the part worth thinking about.

deshell maintains a shared cache across all agents using the service. When any agent fetches a URL, the converted markdown is cached and available to every other agent. Not cached per-account; cached across the network.

This matters because AI agents, at the aggregate level, are reading a lot of the same content. The Python docs. The OpenAI API reference. The GitHub README for every major library. When one agent converts that content, every subsequent agent gets it in under 50ms, preconverted, at no additional processing cost.

The current cache hit rate across the network is 62%. Roughly 73% of documentation pages are already cached by the time a new agent asks for them. The network already has 122.8 million tokens saved across all agents since launch.

This is the thing you cannot build for yourself. You could run your own HTML-to-markdown converter, and several exist. But your private cache only benefits your agents. The shared cache means every agent on the network is, collectively, doing the pre-processing work for everyone else. The more agents use it, the higher the hit rate, the cheaper and faster it gets for everyone.


MCP: drop it into Claude Desktop or Cursor in 30 seconds

If you’re using Claude Desktop, Cursor, Windsurf, or any MCP-compatible tool, there’s an MCP server:

npx @deshell/mcp

That’s the full installation. After that, Claude Desktop gets two new tools: deshell_scrape and deshell_search. Every web fetch your Claude sessions make goes through the proxy automatically. No code changes. No configuration beyond setting the API key.
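For reference, the Claude Desktop wiring is a standard MCP server entry in claude_desktop_config.json. The environment variable name below is my assumption, not something confirmed by the package; check the README for the exact key:

{
  "mcpServers": {
    "deshell": {
      "command": "npx",
      "args": ["@deshell/mcp"],
      "env": {
        "DESHELL_API_KEY": "YOUR_KEY"
      }
    }
  }
}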

The package is @deshell/mcp on NPM, published today.


Compared to the obvious alternatives

Jina Reader does something similar, and they’ve been doing it since April 2024. If you know about them, you’re probably wondering what the difference is.

A few things:

First, the shared cache. Jina caches per-user. deshell caches across the network. For AI agents reading common documentation, the hit rates are materially different.

Second, the MCP server. Jina has API endpoints; deshell has npx @deshell/mcp. For Claude Desktop users, that’s a real UX difference.

Third, proxy mode. You can set HTTPS_PROXY to point at deshell, and every HTTP call in your agent framework routes through it without any code changes. I’m not aware of another service in this space that offers that.
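In practice that looks something like the sketch below. The exact proxy host and how the key is supplied in proxy mode are assumptions on my part; the deshell docs will have the real incantation:

# Assumed proxy-mode setup — host and auth scheme may differ, check the docs
export HTTPS_PROXY="https://YOUR_KEY@proxy.deshell.ai"
python my_agent.py   # hypothetical agent script; its HTTPS fetches now return markdown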

Fourth, fallback behavior. If deshell can’t convert a page, it returns the raw HTML rather than an error. Your agent keeps working; it just pays more for that call. No dropped requests.
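One practical note: if you care about distinguishing a converted response from a fallback, you have to check for it yourself. A minimal sketch, using an HTML-sniffing heuristic that’s mine rather than a documented deshell contract:

# Guess whether a response fell back to raw HTML (heuristic, not a contract)
body=$(curl -s -H "X-DeShell-Key: YOUR_KEY" \
  "https://proxy.deshell.ai/https://docs.python.org/3/library/json.html")
case "$body" in
  "<!DOCTYPE"*|"<!doctype"*|"<html"*) echo "fallback: raw HTML (full token cost)" ;;
  *) echo "converted markdown" ;;
esac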


What’s available today

  • HTTP proxy: works with curl, any HTTP client, any framework
  • Converts HTML, PDFs, SPAs, and screenshots to markdown
  • Web search that returns markdown results instead of raw HTML
  • Site change monitoring with markdown fetch on update
  • MCP server: npx @deshell/mcp (Claude Desktop, Cursor, Windsurf, anything MCP-compatible)
  • OpenClaw skill: clawhub install deshell
  • ROI calculator at deshell.ai so you can estimate your own savings before committing to anything

Pricing

500 free credits per month to start. Credit packs from $5. No credit card required to get started.

The first thing I’d suggest: sign up, make one fetch through the proxy, and look at the X-DeShell-Savings header. That number will either be interesting to you or it won’t. If your agents are reading the web at any scale, it will be interesting.


Try it

https://deshell.ai/signup

No pressure. 500 free credits, no card, one curl command to see if the numbers make sense for your use case. If they do, great. If not, you’ve spent five minutes and learned something about where your token budget is actually going.