Every time an LLM agent connects to an MCP server, the full tool catalog is injected into the context window. For a 93-tool GitHub MCP server, that is 55,000 tokens—before the agent does anything. Connect three services (GitHub, Slack, Sentry) and you have burned 143,000 of your 200,000 token window. Seventy-two percent gone. On idle.
This post presents the numbers, explains where the overhead comes from, and shows how OnlyCLI-generated CLIs bring that cost down by 96–99%.
## The numbers
These figures come from published benchmarks by independent researchers and our own measurements. All token counts use the Claude Sonnet tokenizer; GPT-4 counts are comparable within 10%.
## MCP: schema injection on every turn
| Metric | Value |
|---|---|
| Tokens per MCP tool definition | 550–1,400 |
| GitHub MCP server (93 tools) | ~55,000 tokens |
| 3 services loaded (GitHub + Slack + Sentry) | ~143,000 tokens |
| % of 200K context window consumed on idle | 72% |
| Cost per request at Claude Sonnet pricing ($3/M input) | ~$0.17 for the GitHub schema alone |
Every completion request carries this payload, whether the model calls zero tools or ten. At 1,000 requests per day, schema overhead alone costs $170/day or $5,100/month. At 10,000 requests, $51,000/month.
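If you want to reproduce that arithmetic, the scaling is just multiplication. Here is a minimal Go sketch that takes the ~$0.17-per-request figure from the table above and scales it to the request volumes used throughout this post:

```go
package main

import "fmt"

func main() {
	// ~$0.17 of schema overhead per request: 55,000 tokens at $3 per million input tokens.
	const perRequest = 0.17
	for _, reqPerDay := range []float64{100, 1_000, 10_000} {
		fmt.Printf("%6.0f req/day -> $%5.0f/day, $%7.0f/month\n",
			reqPerDay, perRequest*reqPerDay, perRequest*reqPerDay*30)
	}
}
```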
## CLI: on-demand discovery
| Metric | Value |
|---|---|
| --help output for root command | ~150–250 tokens |
| --help for a single subcommand | ~80–150 tokens |
| One command invocation + JSON output | 200–3,000 tokens |
| Agent skill file (SKILL.md) summary | ~300–500 tokens |
| Overhead injected on idle turns | 0 tokens |
An agent using a CLI discovers capabilities by calling --help when needed. It does not carry a 55,000-token schema in every request. The discovery cost is paid once per conversation, not once per turn.
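As a rough illustration of that flow, here is a sketch of an agent-side wrapper that shells out to --help the first time a subcommand is needed and caches the text for the rest of the conversation. The describe helper and its cache are illustrative, not part of any generated CLI:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// helpCache holds --help text discovered during this conversation, so the
// cost is paid once per subcommand rather than on every turn.
var helpCache = map[string]string{}

// describe runs "<bin> <args...> --help" on first use and caches the output.
func describe(bin string, args ...string) (string, error) {
	key := bin + " " + strings.Join(args, " ")
	if text, ok := helpCache[key]; ok {
		return text, nil // already discovered: zero extra tokens this turn
	}
	out, err := exec.Command(bin, append(args, "--help")...).CombinedOutput()
	if err != nil {
		return "", err
	}
	helpCache[key] = string(out)
	return helpCache[key], nil
}

func main() {
	// ~80–150 tokens, paid only the first time this subcommand is needed.
	help, err := describe("./github", "repos", "get")
	if err != nil {
		panic(err)
	}
	fmt.Println(help)
}
```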
## Direct comparison: real task
Task: “What languages does the octocat/Hello-World repository use?”
| Approach | Tokens consumed | Cost (Claude Sonnet) |
|---|---|---|
| MCP (GitHub server loaded) | 44,026 | $0.132 |
| CLI (./github repos get --owner octocat --repo Hello-World --transform languages_url) | 1,365 | $0.004 |
| Ratio | 32x | 32x |
The MCP approach pays for all 93 tool definitions plus the JSON-RPC envelope. The CLI approach pays for the command string and the JSON response.
## Why MCP costs so much
MCP was designed for rich, interactive tool surfaces—IDE extensions, chat integrations, multi-step workflows. Its architecture assumes the host needs the full tool catalog available at all times:
- Full schema injection: Every registered tool’s JSON Schema is serialized into the system prompt. The model needs it to decide which tool to call. (A sketch of a single definition follows this list.)
- No lazy loading by default: Until Anthropic shipped Tool Search (late 2025), there was no standard way to defer schema loading. Even Tool Search still pulls full schemas on demand and is Anthropic-only.
- Multiplicative scaling: Each additional MCP server adds its complete catalog. Five servers with 20 tools each = 100 tool definitions = ~80,000 tokens of schema before any task.
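Here is that sketch: a hypothetical, heavily trimmed tool definition in the name/description/inputSchema shape that MCP tools use. Nothing below is taken from the real GitHub server; it only illustrates why even one tool runs to hundreds of tokens:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolDef mirrors the basic shape of an MCP tool definition.
type toolDef struct {
	Name        string         `json:"name"`
	Description string         `json:"description"`
	InputSchema map[string]any `json:"inputSchema"`
}

func main() {
	tool := toolDef{
		Name:        "get_repository",
		Description: "Get details for a repository, including visibility, topics, and default branch.",
		InputSchema: map[string]any{
			"type": "object",
			"properties": map[string]any{
				"owner": map[string]any{"type": "string", "description": "Repository owner"},
				"repo":  map[string]any{"type": "string", "description": "Repository name"},
			},
			"required": []string{"owner", "repo"},
		},
	}
	b, _ := json.MarshalIndent(tool, "", "  ")
	// Even this trimmed example is on the order of a hundred tokens once serialized;
	// real tools with long descriptions and nested parameters run 550–1,400 each,
	// and every registered tool is carried on every request.
	fmt.Println(string(b))
}
```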
## Why CLI costs so little
A generated CLI is a compiled binary. Its “schema” is the --help text, which the agent reads on demand:
```
$ ./github repos get --help
Get a repository

Usage:
  github repos get [flags]

Flags:
      --owner string   Repository owner
      --repo string    Repository name
```
That is ~80 tokens. The agent reads it once, then issues the command. On subsequent turns, it already knows the flags.
## The SKILL.md advantage
OnlyCLI projects can ship a SKILL.md—a compact, agent-facing summary of the top operations. A 20-command summary fits in ~400 tokens. The agent reads it at conversation start and has enough context to call any operation without ever loading a full schema.
Compare: 400 tokens (SKILL.md) vs 55,000 tokens (MCP schema). That is 137x fewer tokens for the discovery step.
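As a rough sketch of how an agent runtime might consume that file, assuming a SKILL.md sits next to the binary; the prompt wording and the four-characters-per-token estimate are illustrative, not part of OnlyCLI:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Read the compact summary once, at conversation start.
	skill, err := os.ReadFile("SKILL.md")
	if err != nil {
		panic(err)
	}

	// ~400 tokens of discovery context instead of a 55,000-token schema.
	systemPrompt := "You can call the ./github CLI. Supported operations:\n\n" + string(skill)

	// Very rough token estimate (about four characters per token).
	fmt.Printf("discovery context: ~%d tokens, loaded once per conversation\n", len(systemPrompt)/4)
}
```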
## Cost at scale
| Scale | MCP overhead/month | CLI overhead/month | Savings |
|---|---|---|---|
| 100 req/day | $510 | ~$0 | >99% |
| 1,000 req/day | $5,100 | ~$12 | 99.8% |
| 10,000 req/day | $51,000 | ~$120 | 99.8% |
“CLI overhead” accounts for occasional --help calls at the start of a conversation. MCP overhead assumes the schema is carried on every completion request (the default behavior).
## The emerging consensus
We are not alone in this analysis. Multiple independent projects have converged on the same conclusion:
- mcp2cli (CyberCorsairs): Replaces full schema injection with lazy CLI discovery. Claims 96–99% token savings.
- CLIHub (Kagan Yilmaz): Converts MCP servers to CLIs. Measures 92–98% savings.
- BuildMVPFast analysis: Documents $81K/month overhead at scale for MCP-heavy architectures.
- Vensas benchmark: Finds MCP 4–32x more expensive per task depending on complexity.
The pattern is clear: for stateless REST API access, MCP’s always-on schema model is a poor fit.
## When MCP is still worth it
MCP is justified when you need:
- Stateful sessions: Database connections, file handles, multi-step transactions.
- Server push: Real-time notifications from the tool to the model.
- Vendor constraints: When the only integration a vendor ships is MCP.
- Rich IDE integration: Where the host runtime handles lifecycle and the user benefits from interactive exploration.
For calling GET /repos/{owner}/{repo} and reading JSON? That is what CLIs were built for.
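To underline how little machinery that takes, here is a sketch of roughly what a generated repos get command does under the hood: one stateless GET plus JSON decoding. This is an illustration, not OnlyCLI's actual generated code, and it omits authentication and error handling:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// GET /repos/{owner}/{repo}: the same call the comparison task above makes.
	resp, err := http.Get("https://api.github.com/repos/octocat/Hello-World")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Decode only the fields we care about; LanguagesURL is the field
	// referenced by --transform in the earlier example.
	var repo struct {
		FullName     string `json:"full_name"`
		LanguagesURL string `json:"languages_url"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&repo); err != nil {
		panic(err)
	}
	fmt.Println(repo.FullName, repo.LanguagesURL)
}
```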
## Try it yourself
Generate a CLI from any OpenAPI spec and compare your token costs:
```bash
# Install
go install github.com/onlycli/onlycli/cmd/onlycli@latest

# Generate from GitHub's spec
onlycli generate \
  --spec https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/api.github.com/api.github.com.yaml \
  --name github --auth bearer --out ./github-cli

cd github-cli && go mod tidy && go build -o github .

# 1,107 commands, ~200 tokens to discover
./github --help
```
Every endpoint in the GitHub REST API becomes a typed command with flags, help text, and shell completion—at a fraction of the token cost of MCP.
For a deeper technical comparison, see Why Native CLI Beats MCP for LLM Agent Tool Use. To get started, follow the quick start guide.