CleanMyPrompt
2026-03-06 · CleanMyPrompt Team · 4 min read

LLM Token Costs Explained: How to Estimate and Cut Your AI API Bill

Token costs compound silently until they don't. This guide explains exactly how tokenization works, what drives up your token count, and the concrete steps that cut AI API costs by 30–50% without changing your prompts.

Tags: tokens, cost-optimization, openai, claude, gemini, api, token-compression

What Is a Token and Why Does It Cost Money?

Large language models don't read words; they read tokens. In English, a token averages roughly ¾ of a word, but exact counts depend on the tokenizer. With a GPT-style tokenizer, the phrase "context window" is typically 2 tokens, and the word "tokenization" splits into 2 or 3. Whitespace, punctuation, and code operators consume tokens too.

Every major LLM provider charges per token:

Model               Input cost (per 1M tokens)   Output cost (per 1M tokens)
GPT-4o              $2.50                        $10.00
Claude 3.5 Sonnet   $3.00                        $15.00
Gemini 1.5 Pro      $1.25                        $5.00
GPT-4o mini         $0.15                        $0.60

At $2.50/M input tokens, a single 10,000-token prompt costs $0.025. That sounds trivial. Run 5,000 prompts per day and you're at $125/day — $3,750/month — just on input tokens, before counting outputs.
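
The arithmetic above can be sketched as a small helper. This is only an illustration: the function name and the 30-day month are assumptions, and the price is the GPT-4o input rate quoted in the table.

```typescript
// Hypothetical helper: estimates monthly spend on input tokens only.
// Assumes a flat per-million-token price and a 30-day month by default.
function monthlyInputCostUsd(
  tokensPerPrompt: number,
  promptsPerDay: number,
  pricePerMillionUsd: number,
  daysPerMonth: number = 30
): number {
  const dailyTokens = tokensPerPrompt * promptsPerDay;
  const dailyCostUsd = (dailyTokens / 1_000_000) * pricePerMillionUsd;
  return dailyCostUsd * daysPerMonth;
}

// 10,000-token prompts, 5,000 per day, at $2.50 per 1M input tokens.
const monthlyEstimateUsd = monthlyInputCostUsd(10_000, 5_000, 2.5);
console.log(monthlyEstimateUsd.toFixed(2)); // "3750.00"
```

Output tokens are billed at a higher rate, so a fuller estimate would add a second term with the output price.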


What's Actually Driving Up Your Token Count

Most people focus on the content of their prompts. The bigger opportunity is in the noise that surrounds that content.

1. Comments and Documentation

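A TypeScript file with full JSDoc blocks can be 30–40% comments by token count. The model usually doesn't need those comments to understand a function, because it can read the signature and implementation directly. Keep comments that carry information the code doesn't, such as units ("amount in cents") or known pitfalls.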

// BEFORE: 2,340 tokens (whole file)
/**
 * Processes a payment transaction for the given customer.
 * @param amount - The amount in cents to charge
 * @param customerId - The Stripe customer ID
 * @returns Promise resolving to the Payment Intent
 * @throws PaymentError if the charge fails
 */
async function processPayment(amount: number, customerId: string): Promise<PaymentIntent> {
  // ... implementation unchanged ...
}

// AFTER: 1,290 tokens (whole file, 45% less)
async function processPayment(amount: number, customerId: string): Promise<PaymentIntent> {
  // ... implementation unchanged ...
}
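
As a rough illustration of the idea, JSDoc blocks can be removed with a regular expression. This is a naive sketch, not how CleanMyPrompt is implemented: the function name is hypothetical, and because it does not parse the code, a "/**" sequence inside a string literal would be removed as well.

```typescript
// Naive sketch: deletes every /** ... */ block plus the line break after it.
// A production tool would use a real tokenizer or parser instead of a regex.
function stripJsDoc(source: string): string {
  return source.replace(/[ \t]*\/\*\*[\s\S]*?\*\/[ \t]*\r?\n?/g, "");
}

const documentedSource = [
  "/**",
  " * Processes a payment transaction.",
  " * @param amount - The amount in cents to charge",
  " */",
  "async function processPayment(amount: number): Promise<void> {}",
].join("\n");

console.log(stripJsDoc(documentedSource));
// async function processPayment(amount: number): Promise<void> {}
```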

2. Filler Phrases in Natural Language

Written text is full of phrases that carry no semantic weight for AI comprehension:

Verbose phrase                     Compressed to   Tokens saved
"in order to"                      "to"            2
"at this point in time"            "now"           4
"due to the fact that"             "because"       4
"it is important to note that"     (removed)       6
"I would like to kindly request"   "Please"        5

Across a 1,000-word document, filler removal alone typically saves 8–12%.
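
A minimal sketch of filler removal, using a few phrases from the table. The phrase list and function name are illustrative assumptions; a real list would be longer, and this version does not restore capitalization when a removed phrase started a sentence.

```typescript
// Illustrative pairs of [verbose phrase, replacement].
// None of the phrases contain regex metacharacters, so they are used as-is.
const FILLER_PHRASES: Array<[string, string]> = [
  ["it is important to note that ", ""],
  ["at this point in time", "now"],
  ["due to the fact that", "because"],
  ["in order to", "to"],
];

// Replaces every occurrence of each phrase, ignoring case.
function removeFiller(text: string): string {
  let cleaned = text;
  FILLER_PHRASES.forEach((pair) => {
    cleaned = cleaned.replace(new RegExp(pair[0], "gi"), pair[1]);
  });
  return cleaned;
}

console.log(removeFiller("We cache results in order to reduce latency."));
// We cache results to reduce latency.
```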

3. Markdown Formatting Overhead

If you're passing markdown content to an API call, the formatting tokens cost you:

  • ### headers: 1–2 tokens per heading
  • **bold**: 2 tokens per bold phrase
  • --- dividers: 1 token each
  • > blockquotes: 1 token per line

When the formatting carries no meaning the model needs, stripping markdown from the documents you send typically saves 5–15%.
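
The four kinds of markup listed above can be stripped with a few regular expressions. This is a simplified sketch with a hypothetical function name; it is meant for prose and would also alter lines inside code samples, such as "#" comments in Python or shell.

```typescript
// Simplified sketch: handles only headers, bold, dividers, and blockquotes.
function stripMarkdown(text: string): string {
  return text
    .replace(/^#{1,6}[ \t]+/gm, "")          // "### Title" -> "Title"
    .replace(/\*\*(.+?)\*\*/g, "$1")         // "**bold**"  -> "bold"
    .replace(/^-{3,}[ \t]*$\r?\n?/gm, "")    // drop "---" divider lines
    .replace(/^>[ \t]?/gm, "");              // "> quote"   -> "quote"
}

const markdownSample = "### Setup\n**Run** the script\n---\n> Note: slow";
console.log(stripMarkdown(markdownSample));
// Setup
// Run the script
// Note: slow
```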

4. Redundant Whitespace

Double blank lines, trailing spaces, and inconsistent indentation all consume tokens. Normalization alone saves 3–8% on most real-world documents.
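
A minimal normalization pass might look like the sketch below. The function name is an assumption, and the sketch deliberately leaves leading indentation alone, since indentation is significant in languages such as Python and YAML.

```typescript
// Sketch: normalizes line endings, trailing whitespace, and blank-line runs.
function normalizeWhitespace(text: string): string {
  return text
    .replace(/\r\n/g, "\n")         // unify Windows line endings
    .replace(/[ \t]+$/gm, "")       // drop trailing spaces and tabs
    .replace(/\n{3,}/g, "\n\n")     // collapse runs of blank lines to one
    .replace(/^\n+|\n+$/g, "");     // drop leading and trailing blank lines
}

console.log(JSON.stringify(normalizeWhitespace("a  \n\n\n\nb\t\n")));
// "a\n\nb"
```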


How to Measure Your Actual Token Count

Word-count heuristics ("1 token ≈ ¾ word") are fine for intuition but unreliable for budgeting, especially for code and non-English text. Use exact tokenizers:

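For OpenAI models (GPT-4, GPT-4o), count with tiktoken (this assumes a tiktoken command-line wrapper is installed; the library itself exposes the same encoders programmatically):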

npx tiktoken encode --model gpt-4o myfile.txt

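For Claude: Anthropic's API returns token counts in the response metadata; check usage.input_tokens and usage.output_tokens. To count tokens before sending a request, Anthropic also offers a token-counting endpoint.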
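
Once you have the usage metadata, turning it into a per-request cost is a short calculation. The sketch below assumes the Claude 3.5 Sonnet prices from the table earlier in this guide; the function name and the sample usage values are made up for illustration and are not a real API response.

```typescript
// Token counts as reported in the usage object of a Messages API response.
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

// Prices in USD per 1M tokens (Claude 3.5 Sonnet, from the pricing table).
const INPUT_PRICE_USD = 3.0;
const OUTPUT_PRICE_USD = 15.0;

// Hypothetical helper: cost of one request from its usage metadata.
function requestCostUsd(usage: Usage): number {
  return (
    (usage.input_tokens / 1_000_000) * INPUT_PRICE_USD +
    (usage.output_tokens / 1_000_000) * OUTPUT_PRICE_USD
  );
}

// Illustrative values only.
const exampleUsage: Usage = { input_tokens: 8000, output_tokens: 500 };
console.log(requestCostUsd(exampleUsage).toFixed(4)); // "0.0315"
```

Logging this value per request gives you a baseline to compare against after compression.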

For any model via CleanMyPrompt: The web tool shows live token counts for GPT-4o, Claude, and Gemini before and after compression. Paste your text and switch between models to compare.


Real-World Compression Results

These are measured on typical production content:

Content type                          Before (tokens)   After (tokens)   Savings
TypeScript service file (400 lines)   2,340             1,290            45%
Legal contract summary                2,847             1,821            36%
Support ticket batch (10 tickets)     1,523             998              34%
Python FastAPI endpoint (280 lines)   1,820             980              46%
Marketing brief                       892               534              40%
.env file (post-redaction)            420               240              43%
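
The savings column is the relative reduction in tokens, rounded to a whole percent. A one-line helper (the name is hypothetical) makes the calculation explicit:

```typescript
// Percentage of tokens removed, rounded to the nearest whole percent.
function savingsPct(beforeTokens: number, afterTokens: number): number {
  return Math.round(((beforeTokens - afterTokens) / beforeTokens) * 100);
}

console.log(savingsPct(2340, 1290)); // 45  (TypeScript service file)
console.log(savingsPct(420, 240));   // 43  (.env file)
```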

Three Compression Levels

Not every use case needs maximum compression. CleanMyPrompt supports three levels:

Level              What it removes                                    Best for
conservative       Blank lines, trailing whitespace, filler phrases   General prompts, documentation
normal (default)   + single-line code comments, markdown headers      Code review context, explanations
aggressive         + JSDoc, unused imports, debug logs, dedup         RAG pipelines, large context windows, CI

For context sent to Copilot Chat or a RAG system: use aggressive. For prompts you write and review: use normal.
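
One way to picture the levels is as cumulative rule sets, where each level adds to the one before it. The sketch below only models the table; the rule labels and function name are illustrative and are not CleanMyPrompt's actual identifiers or configuration format.

```typescript
type Level = "conservative" | "normal" | "aggressive";

// Levels in increasing order of aggressiveness.
const LEVEL_ORDER: Level[] = ["conservative", "normal", "aggressive"];

// Rules each level adds on top of the previous one (labels are made up).
const RULES_ADDED: Record<Level, string[]> = {
  conservative: ["blank-lines", "trailing-whitespace", "filler-phrases"],
  normal: ["single-line-comments", "markdown-headers"],
  aggressive: ["jsdoc", "unused-imports", "debug-logs", "dedup"],
};

// Returns every rule active at the given level, including inherited ones.
function rulesFor(level: Level): string[] {
  let rules: string[] = [];
  const last = LEVEL_ORDER.indexOf(level);
  LEVEL_ORDER.slice(0, last + 1).forEach((current) => {
    rules = rules.concat(RULES_ADDED[current]);
  });
  return rules;
}

console.log(rulesFor("normal").length); // 5
```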


The Cost Math

Let's make this concrete. A team of 5 developers uses GPT-4o via the API for code review automation:

  • 50 code reviews per day
  • Average context per review: 8,000 tokens (before compression)
  • Daily input tokens: 400,000
  • Monthly cost: ~$30 (400,000 tokens × $2.50/M × 30 days)

After 40% compression:

  • Average context per review: 4,800 tokens
  • Daily input tokens: 240,000
  • Monthly cost: ~$18 (240,000 tokens × $2.50/M × 30 days)

Saving: $12/month from one team and one workflow. Scale this to 20 developers across 4 workflows (16 times the volume) and you're saving roughly $190/month without changing a single prompt.
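
To rerun this math with your own numbers, the single-team scenario can be written as a function of the compression ratio. This is a sketch under stated assumptions: input tokens only, the GPT-4o input price, a 30-day month, and a hypothetical function name.

```typescript
const PRICE_PER_MILLION_USD = 2.5; // GPT-4o input price used in this example
const DAYS_PER_MONTH = 30;         // assumption

// Monthly input-token savings for a given compression ratio (0.4 = 40%).
function monthlySavingsUsd(
  reviewsPerDay: number,
  tokensPerReview: number,
  compressionRatio: number
): number {
  const tokensSavedPerDay = reviewsPerDay * tokensPerReview * compressionRatio;
  return (tokensSavedPerDay / 1_000_000) * PRICE_PER_MILLION_USD * DAYS_PER_MONTH;
}

// 50 reviews per day, 8,000 tokens each, 40% compression.
console.log(monthlySavingsUsd(50, 8000, 0.4).toFixed(2)); // "12.00"
```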


How to Integrate Compression Into Your Pipeline

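Via the CLI (one command; cmp is also the name of a standard Unix file-comparison utility, so check which binary your shell resolves):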

npm install -g cleanmyprompt
cmp squeeze context.ts --level aggressive

Via the REST API (programmatic):

curl -X POST https://cleanmyprompt.io/api/v1/clean \
  -H "Content-Type: application/json" \
  -d '{"text": "your prompt here", "squeeze": true, "aggressive": true}'

The API returns the compressed text plus original_tokens, output_tokens, and savings_pct so you can log and track savings over time.
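
A sketch of how those fields could be aggregated over many calls. The interface covers only the three fields named above; the function name and the sample entries (taken from the results table) are illustrative.

```typescript
// The three statistics fields named above; other response fields are omitted.
interface CleanStats {
  original_tokens: number;
  output_tokens: number;
  savings_pct: number;
}

// Sums token counts across calls and recomputes the overall percentage from
// the totals. Averaging the per-call savings_pct values instead would give
// small documents the same weight as large ones.
function summarizeSavings(stats: CleanStats[]): { tokensSaved: number; overallPct: number } {
  let original = 0;
  let output = 0;
  stats.forEach((entry) => {
    original += entry.original_tokens;
    output += entry.output_tokens;
  });
  const tokensSaved = original - output;
  const overallPct = original === 0 ? 0 : (tokensSaved / original) * 100;
  return { tokensSaved, overallPct };
}

// Illustrative log entries, not real API output.
const loggedStats: CleanStats[] = [
  { original_tokens: 2340, output_tokens: 1290, savings_pct: 45 },
  { original_tokens: 892, output_tokens: 534, savings_pct: 40 },
];
const summary = summarizeSavings(loggedStats);
console.log(summary.tokensSaved, summary.overallPct.toFixed(1)); // 1408 "43.6"
```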

Via the VS Code extension: Ctrl+Shift+P → CMP: Squeeze File. This processes the current file in place and shows a token delta notification.


Start Measuring

The fastest way to see your actual savings: go to cleanmyprompt.io, paste a file you regularly send to an AI, and toggle Token Squeeze. The token counts update live.

For deeper integration: CleanMyPrompt CLI docs · REST API reference · VS Code extension
