What Is a Token and Why Does It Cost Money?
Large language models don't read words; they read tokens. A token is roughly ¾ of a word in English. The phrase "context window" is about 2 tokens, and the word "tokenization" is about 3, though exact counts depend on the model's tokenizer. Whitespace, punctuation, and code operators consume tokens too.
Every major LLM provider charges per token. List prices at the time of writing:
| Model | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Gemini 1.5 Pro | $1.25 | $5.00 |
| GPT-4o mini | $0.15 | $0.60 |
At $2.50/M input tokens, a single 10,000-token prompt costs $0.025. That sounds trivial. Run 5,000 prompts per day and you're at $125/day, or $3,750 over a 30-day month, on input tokens alone, before counting outputs.
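The arithmetic above can be sketched as a small helper. This is a minimal illustration, not a billing tool: the prices are copied from the table, and the function names and the 30-day month are assumptions made for the example.

```typescript
// Input prices in USD per 1M tokens, copied from the pricing table.
const INPUT_PRICE_PER_MILLION: Record<string, number> = {
  "GPT-4o": 2.5,
  "Claude 3.5 Sonnet": 3.0,
  "Gemini 1.5 Pro": 1.25,
  "GPT-4o mini": 0.15,
};

// Cost in USD of sending `tokens` input tokens to `model`.
function inputCost(model: string, tokens: number): number {
  const price = INPUT_PRICE_PER_MILLION[model];
  if (price === undefined) {
    throw new Error(`Unknown model: ${model}`);
  }
  return (tokens / 1000000) * price;
}

// Monthly input spend; daysPerMonth = 30 is an assumption of this sketch.
function monthlyInputCost(
  model: string,
  tokensPerPrompt: number,
  promptsPerDay: number,
  daysPerMonth: number = 30,
): number {
  return inputCost(model, tokensPerPrompt) * promptsPerDay * daysPerMonth;
}

// The example from the text: 10,000-token prompts, 5,000 prompts per day.
const perPrompt = inputCost("GPT-4o", 10000);
const perMonth = monthlyInputCost("GPT-4o", 10000, 5000);
```

Swapping the model name shows how much of the bill is a pricing decision rather than a prompt-size decision.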
What's Actually Driving Up Your Token Count
Most people focus on the content of their prompts. The bigger opportunity is in the noise that surrounds that content.
1. Comments and Documentation
A TypeScript file with full JSDoc blocks can be 30–40% comments by token count. The model doesn't need the comment to understand the function — it can read the function signature and implementation directly.
// BEFORE: 2,340 tokens for the full 400-line file (excerpt shown)
/**
 * Processes a payment transaction for the given customer.
 * @param amount - The amount in cents to charge
 * @param customerId - The Stripe customer ID
 * @returns Promise resolving to the Payment Intent
 * @throws PaymentError if the charge fails
 */
async function processPayment(amount: number, customerId: string): Promise<PaymentIntent> {
  // ...
}
// AFTER: 1,290 tokens for the full file (45% less)
async function processPayment(amount: number, customerId: string): Promise<PaymentIntent> {
  // ...
}
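A rough sketch of how comment stripping could work is shown here. It is a naive regex pass written for illustration, not CleanMyPrompt's implementation: the function name `stripComments` is hypothetical, and a regex cannot tell a real comment from comment-like text inside a string or template literal, so a production tool would use a parser.

```typescript
// Remove block comments (including JSDoc) and whole-line `//` comments.
// Trailing `//` comments after code are left alone so that strings such as
// "https://example.com" are not mangled.
function stripComments(source: string): string {
  return source
    .replace(/\/\*[\s\S]*?\*\/[ \t]*\n?/g, "") // block comments and their line break
    .replace(/^[ \t]*\/\/.*\n?/gm, "")         // lines that contain only a // comment
    .replace(/\n{3,}/g, "\n\n")                // collapse blank runs left behind
    .trim();
}
```

Keeping trailing comments is a deliberately conservative choice: it gives up a few tokens in exchange for never corrupting code.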
2. Filler Phrases in Natural Language
Written text is full of phrases that carry no semantic weight for AI comprehension:
| Verbose phrase | Compressed to | Tokens saved |
|---|---|---|
| "in order to" | "to" | 2 |
| "at this point in time" | "now" | 4 |
| "due to the fact that" | "because" | 4 |
| "it is important to note that" | (removed) | 7 |
| "I would like to kindly request" | "Please" | 5 |
Across a 1,000-word document, filler removal alone typically saves 8–12%.
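Filler removal of this kind can be sketched as a lookup of phrase patterns. The phrase list comes from the table; the function name `removeFiller` is hypothetical, and the sketch makes no attempt to restore capitalization when a phrase is removed at the start of a sentence.

```typescript
// Phrase patterns and replacements, taken from the table of filler phrases.
const FILLER_REPLACEMENTS: Array<[RegExp, string]> = [
  [/\bin order to\b/gi, "to"],
  [/\bat this point in time\b/gi, "now"],
  [/\bdue to the fact that\b/gi, "because"],
  [/\bit is important to note that\s*/gi, ""],
  [/\bI would like to kindly request\b/gi, "Please"],
];

// Apply every replacement, then tidy up any doubled spaces.
function removeFiller(text: string): string {
  let result = text;
  for (const [pattern, replacement] of FILLER_REPLACEMENTS) {
    result = result.replace(pattern, replacement);
  }
  return result.replace(/ {2,}/g, " ").trim();
}

const compressed = removeFiller(
  "We cache results in order to save money due to the fact that tokens cost money.",
);
```

Because the matching is purely textual, the output should be reviewed before it replaces prose that humans will also read.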
3. Markdown Formatting Overhead
If you're passing markdown content to an API call, the formatting tokens cost you:
- `###` headers: 1–2 tokens per heading
- `**bold**`: 2 tokens per bold phrase
- `---` dividers: 1 token each
- `>` blockquotes: 1 token per line
Stripping markdown from documents passed to models that don't render it typically saves 5–15%.
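A minimal sketch of markdown stripping, covering only the four constructs discussed in this section, might look like the following. The function name `stripMarkdown` is hypothetical, and the sketch does not protect fenced code blocks, so it should only be run on prose.

```typescript
// Remove heading markers, bold markers, divider lines, and blockquote markers.
function stripMarkdown(markdown: string): string {
  return markdown
    .replace(/^#{1,6}[ \t]+/gm, "")         // "## Title" -> "Title"
    .replace(/\*\*(.+?)\*\*/g, "$1")        // "**bold**" -> "bold"
    .replace(/^[ \t]*-{3,}[ \t]*$/gm, "")   // "---" divider lines
    .replace(/^>[ \t]?/gm, "")              // "> quote" -> "quote"
    .replace(/\n{3,}/g, "\n\n")             // collapse blank runs left behind
    .trim();
}
```

The heading and quote text is kept; only the markers are dropped, so the model still sees the document's structure through its wording.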
4. Redundant Whitespace
Double blank lines, trailing spaces, and inconsistent indentation all consume tokens. Normalization alone saves 3–8% on most real-world documents.
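Whitespace normalization is the simplest of the four to sketch. The version below is an illustration with a hypothetical function name; it deliberately leaves leading indentation untouched, since indentation is significant in languages such as Python and YAML.

```typescript
// Unify line endings, drop trailing whitespace, and allow at most one
// consecutive blank line. Leading indentation is not modified.
function normalizeWhitespace(text: string): string {
  return text
    .replace(/\r\n/g, "\n")      // Windows line endings -> "\n"
    .replace(/[ \t]+$/gm, "")    // trailing spaces and tabs on every line
    .replace(/\n{3,}/g, "\n\n")  // double (or more) blank lines -> one
    .trim();
}
```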
How to Measure Your Actual Token Count
Don't rely on word-count heuristics ("1 token ≈ ¾ word") for cost estimates. Use an exact tokenizer:
For OpenAI models (GPT-4, GPT-4o):
npx tiktoken encode --model gpt-4o myfile.txt
For Claude:
Anthropic's API returns token counts in response metadata (check usage.input_tokens), and its token-counting endpoint reports the input count before you send a request.
For any model via CleanMyPrompt: The web tool shows live token counts for GPT-4o, Claude, and Gemini before and after compression. Paste your text and switch between models to compare.
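Once responses carry usage metadata, tracking spend is a matter of summing it. The sketch below assumes response objects shaped like the Anthropic usage field mentioned above (`usage.input_tokens` and `usage.output_tokens`); the `totalUsage` helper and the sample values are hypothetical.

```typescript
// Usage metadata as it appears on a Messages API response.
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

// Sum usage across many responses, for example one day's worth of calls.
function totalUsage(responses: Array<{ usage: Usage }>): Usage {
  return responses.reduce<Usage>(
    (acc, r) => ({
      input_tokens: acc.input_tokens + r.usage.input_tokens,
      output_tokens: acc.output_tokens + r.usage.output_tokens,
    }),
    { input_tokens: 0, output_tokens: 0 },
  );
}

// Made-up sample values, for illustration only.
const dayTotal = totalUsage([
  { usage: { input_tokens: 8000, output_tokens: 600 } },
  { usage: { input_tokens: 4800, output_tokens: 550 } },
]);
```

Logging these totals before and after enabling compression gives a measured saving instead of an estimated one.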
Real-World Compression Results
These are measured on typical production content:
| Content type | Before (tokens) | After (tokens) | Savings |
|---|---|---|---|
| TypeScript service file (400 lines) | 2,340 | 1,290 | 45% |
| Legal contract summary | 2,847 | 1,821 | 36% |
| Support ticket batch (10 tickets) | 1,523 | 998 | 34% |
| Python FastAPI endpoint (280 lines) | 1,820 | 980 | 46% |
| Marketing brief | 892 | 534 | 40% |
| .env file (post-redaction) | 420 | 240 | 43% |
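The Savings column is the relative reduction in token count, rounded to a whole percent. A small helper makes the calculation explicit; the function name is hypothetical.

```typescript
// Percentage of tokens removed, rounded to the nearest whole percent.
function savingsPct(before: number, after: number): number {
  if (before <= 0) {
    throw new Error("before must be a positive token count");
  }
  return Math.round(((before - after) / before) * 100);
}

// TypeScript service file row: 2,340 tokens before, 1,290 after.
const serviceFileSavings = savingsPct(2340, 1290);
```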
Three Compression Levels
Not every use case needs maximum compression. CleanMyPrompt supports three levels:
| Level | What it removes | Best for |
|---|---|---|
| conservative | Blank lines, trailing whitespace, filler phrases | General prompts, documentation |
| normal (default) | + single-line code comments, markdown headers | Code review context, explanations |
| aggressive | + JSDoc, unused imports, debug logs, dedup | RAG pipelines, large context windows, CI |
For context sent to Copilot Chat or a RAG system: use aggressive. For prompts you write and review: use normal.
The Cost Math
Let's make this concrete. A team of 5 developers uses GPT-4o via the API for code review automation:
- 50 code reviews per day
- Average context per review: 8,000 tokens (before compression)
- Daily input tokens: 400,000
- Monthly cost: ~$30 (12M input tokens over 30 days at $2.50/M)
After 40% compression:
- Average context per review: 4,800 tokens
- Daily input tokens: 240,000
- Monthly cost: ~$18 (7.2M input tokens over 30 days)
Saving: $12/month from one team and one workflow. Scale this to 20 developers across 4 workflows (16 times the volume) and you're saving roughly $190/month without changing a single prompt.
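To rerun this estimate with your own numbers, the calculation can be written as a short function. This is a sketch under the same assumptions as the example (input tokens only, a 30-day month); the function and field names are hypothetical.

```typescript
// Monthly input spend in USD for a workflow.
function monthlyCost(
  callsPerDay: number,
  tokensPerCall: number,
  pricePerMillion: number,
  daysPerMonth: number = 30, // assumption: 30-day month, as in the example
): number {
  return ((callsPerDay * tokensPerCall * daysPerMonth) / 1000000) * pricePerMillion;
}

interface SavingsEstimate {
  before: number; // USD per month without compression
  after: number;  // USD per month with compression
  saved: number;  // difference in USD per month
}

// compressionRatio is the fraction of tokens removed, e.g. 0.4 for 40%.
function compressionSavings(
  callsPerDay: number,
  tokensPerCall: number,
  compressionRatio: number,
  pricePerMillion: number,
): SavingsEstimate {
  const before = monthlyCost(callsPerDay, tokensPerCall, pricePerMillion);
  const after = monthlyCost(callsPerDay, tokensPerCall * (1 - compressionRatio), pricePerMillion);
  return { before, after, saved: before - after };
}

// The code review example: 50 reviews/day, 8,000 tokens each, 40% compression, GPT-4o input price.
const oneTeam = compressionSavings(50, 8000, 0.4, 2.5);
```

Because the cost is linear in call volume, multiplying `callsPerDay` by some factor multiplies the saving by the same factor.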
How to Integrate Compression Into Your Pipeline
Via the CLI (one command):
npm install -g cleanmyprompt
cmp squeeze context.ts --level aggressive
Via the REST API (programmatic):
curl -X POST https://cleanmyprompt.io/api/v1/clean \
-H "Content-Type: application/json" \
-d '{"text": "your prompt here", "squeeze": true, "aggressive": true}'
The API returns the compressed text plus original_tokens, output_tokens, and savings_pct so you can log and track savings over time.
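For programmatic use from TypeScript, the same call can be sketched as a request builder plus a metrics parser. The endpoint, body fields, and metric names are taken from the curl example and the sentence above; everything else (the helper names, and the assumption that all three metrics arrive as JSON numbers) is hypothetical and should be checked against the API reference.

```typescript
// Endpoint copied from the curl example.
const CLEAN_ENDPOINT = "https://cleanmyprompt.io/api/v1/clean";

interface CleanRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the request without sending it, so it can be logged or tested.
function buildCleanRequest(text: string, aggressive: boolean): CleanRequest {
  return {
    url: CLEAN_ENDPOINT,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text, squeeze: true, aggressive }),
  };
}

// Metric fields named in the text; other response fields are not assumed here.
interface CleanMetrics {
  original_tokens: number;
  output_tokens: number;
  savings_pct: number;
}

// Validate and extract the metrics from a parsed JSON response.
// Assumption: each metric is a JSON number.
function extractMetrics(payload: unknown): CleanMetrics {
  if (typeof payload !== "object" || payload === null) {
    throw new Error("Response is not a JSON object");
  }
  const record = payload as Record<string, unknown>;
  const read = (key: string): number => {
    const value = record[key];
    if (typeof value !== "number") {
      throw new Error(`Missing numeric field: ${key}`);
    }
    return value;
  };
  return {
    original_tokens: read("original_tokens"),
    output_tokens: read("output_tokens"),
    savings_pct: read("savings_pct"),
  };
}

// Usage, in a runtime that provides fetch (for example Node 18+):
//   const req = buildCleanRequest(prompt, true);
//   const res = await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
//   const metrics = extractMetrics(await res.json());
```

Separating request construction from the network call keeps the logic testable without hitting the API.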
Via the VS Code extension:
Ctrl+Shift+P → CMP: Squeeze File — processes the current file in-place and shows a token delta notification.
Start Measuring
The fastest way to see your actual savings: go to cleanmyprompt.io, paste a file you regularly send to an AI, and toggle Token Squeeze. The token counts update live.
For deeper integration: CleanMyPrompt CLI docs · REST API reference · VS Code extension