How to Cut Your Copilot and ChatGPT Token Costs by 50% — Without Losing Meaning

Token Costs Are Silent and Compounding

If you use LLMs at any scale — a team of developers, a CI pipeline generating code reviews, a RAG system over a large codebase — token costs compound fast.

Let's be concrete. At the time of writing, GPT-4o costs roughly $2.50 per million input tokens. A typical TypeScript file with comments, blank lines, and boilerplate runs 1,500–3,000 tokens. Run that one file through 1,000 CI pipeline passes and you're paying for 1.5–3 million tokens, roughly $3.75–$7.50, in code context alone, before the model has produced a single output token. Multiply that by every file you send and every pipeline you run.
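To make that arithmetic easy to re-run with your own numbers, here is a minimal sketch. The function name and the constant are illustrative, and the price is simply the GPT-4o figure quoted above, so treat it as an assumption that will drift over time.

```typescript
// Input-token cost of re-sending one file as context on every pipeline pass.
// Assumption: the per-million price quoted in the article (USD).
const PRICE_PER_MILLION_INPUT_TOKENS = 2.5;

function contextCostUsd(tokensPerFile: number, passes: number): number {
  const totalTokens = tokensPerFile * passes;
  return (totalTokens / 1_000_000) * PRICE_PER_MILLION_INPUT_TOKENS;
}

const lowEstimate = contextCostUsd(1_500, 1_000); // 1.5 million tokens
const highEstimate = contextCostUsd(3_000, 1_000); // 3 million tokens
console.log(`$${lowEstimate.toFixed(2)} to $${highEstimate.toFixed(2)} per file`);
```

The result is per file. A pipeline that sends dozens of files as context on every pass scales it linearly.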

Token compression is a high-leverage, low-risk optimization. It doesn't change your prompts, and it is designed to leave model behavior alone: it removes material that costs tokens without adding signal.

What Token Compression Actually Removes

The CleanMyPrompt squeeze engine removes four categories of noise:

1. Comments and Docstrings

// This function calculates the total price
// including tax and discounts
// Author: Jane Smith, updated 2024-01-15
function calculateTotal(price: number, taxRate: number): number {

After compression:

function calculateTotal(price: number, taxRate: number): number {

The model doesn't need the comment to understand the function signature. If a comment or docstring is informative, keep it: --level conservative leaves all comments in place, and the default normal level strips only single-line comments. If it's noise, strip it.
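To give a feel for what a comment-stripping pass involves, here is a deliberately naive sketch. It is not the CleanMyPrompt implementation: stripLineComments is a hypothetical helper that only drops lines made up entirely of a // comment, and it does not parse the language.

```typescript
// Naive sketch: drop every line that consists only of a `//` comment.
// It does not parse the language, so a line inside a template literal
// that happens to start with `//` would be dropped as well.
function stripLineComments(source: string): string {
  return source
    .split("\n")
    .filter((line) => line.trim().indexOf("//") !== 0)
    .join("\n");
}

const commented = [
  "// This function calculates the total price",
  "// including tax and discounts",
  "function calculateTotal(price: number, taxRate: number): number {",
].join("\n");

const stripped = stripLineComments(commented);
console.log(stripped);
```

Working on whole lines keeps the sketch from touching a // that appears mid-line, for example inside a URL string. A production tool would need a real tokenizer to handle trailing and block comments safely.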

2. Unused Imports

This is particularly high-value in TypeScript and Python files that have grown organically:

import { useState, useEffect, useCallback, useRef, useMemo } from 'react'
import { debounce } from 'lodash'
import { formatDate } from '../utils/date'
import { validateEmail } from '../utils/validation'
// ... (only useState and useEffect are actually used below)

After compression, only the used identifiers remain. useCallback, useRef, useMemo, debounce, formatDate, validateEmail — gone if they're not referenced in the file body.
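A rough sketch of how such a pass could work, assuming a simple heuristic rather than a real parser: pruneNamedImports and localName are hypothetical helpers, they handle only single-line import { ... } from '...' statements, and they treat any whole-word occurrence of a name in the rest of the file (comments and strings included) as a use.

```typescript
// Heuristic sketch, not a parser: single-line named imports only.
function pruneNamedImports(source: string): string {
  const importLine = /^import\s*\{([^}]*)\}\s*from\s*(['"][^'"]+['"]\s*;?)\s*$/;
  const lines = source.split("\n");

  // Everything that is not an import line counts as the file body.
  const body = lines.filter((line) => !importLine.test(line)).join("\n");
  const identifiers: string[] = body.match(/[A-Za-z_$][\w$]*/g) || [];

  const kept: string[] = [];
  for (const line of lines) {
    const match = importLine.exec(line);
    if (match === null) {
      kept.push(line); // not a named import: leave untouched
      continue;
    }
    const used = match[1]
      .split(",")
      .map((spec) => spec.trim())
      .filter((spec) => spec !== "" && identifiers.indexOf(localName(spec)) !== -1);
    // Drop the whole statement when none of its names are referenced.
    if (used.length > 0) {
      kept.push("import { " + used.join(", ") + " } from " + match[2].trim());
    }
  }
  return kept.join("\n");
}

// `import { a as b }` binds the local name `b`; an inline `type` modifier is ignored.
function localName(spec: string): string {
  const parts = spec.replace(/^type\s+/, "").split(/\s+as\s+/);
  return parts[parts.length - 1];
}

const component = [
  "import { useState, useEffect, useCallback, useRef, useMemo } from 'react'",
  "import { debounce } from 'lodash'",
  "const [count, setCount] = useState(0)",
  "useEffect(() => setCount(count + 1), [])",
].join("\n");

const pruned = pruneNamedImports(component);
console.log(pruned);
```

The heuristic errs on the side of keeping imports. That is the safer failure mode if the squeezed file might ever be compiled rather than only read by a model.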

3. Excessive Blank Lines and Whitespace

Three blank lines between functions → one. Trailing spaces on every line → stripped. Indentation normalized. These changes are imperceptible to a human reader but measurable in token counts.
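As an illustration, the first two of those rules fit in a few lines. This is a sketch under the assumption that lines end in \n, and normalizeWhitespace is a hypothetical name. Indentation normalization is left out because it depends on the language (it is significant in Python, for instance).

```typescript
function normalizeWhitespace(source: string): string {
  return source
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, "")) // strip trailing spaces and tabs
    .join("\n")
    .replace(/\n{3,}/g, "\n\n"); // runs of blank lines collapse to a single blank line
}

const spaced = "function a() {}  \n\n\n\nfunction b() {}\t\n";
const tidy = normalizeWhitespace(spaced);
console.log(JSON.stringify(tidy));
```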

4. Repeated Text Blocks

If your file contains the same error message, the same config block, or the same boilerplate repeated multiple times, the deduplication pass collapses repeated patterns. This matters most in generated code, test fixtures, and data files.
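One simple way to approximate this, assuming that blocks are separated by blank lines and that only exact repeats count, is to keep the first occurrence of each block. The dedupeBlocks helper is hypothetical, and the real engine may match patterns more loosely.

```typescript
// Sketch: keep the first occurrence of each blank-line-separated block.
// Intended for text sent as model context. Removing repeated statements
// from code you plan to execute can change its behavior.
function dedupeBlocks(source: string): string {
  const seen: string[] = [];
  const kept = source.split(/\n\s*\n/).filter((block) => {
    const key = block.trim();
    if (seen.indexOf(key) !== -1) {
      return false; // exact repeat of an earlier block
    }
    seen.push(key);
    return true;
  });
  return kept.join("\n\n");
}

const fixtures = [
  "retries: 3\ntimeout: 30",
  "name: first-case",
  "retries: 3\ntimeout: 30",
].join("\n\n");

const deduped = dedupeBlocks(fixtures);
console.log(deduped);
```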

Real Numbers: Before and After

Here's a concrete example from a real file (verbose.ts, 156 lines of TypeScript with heavy comments and unused imports):

$ cmp squeeze verbose.ts --level aggressive --verbose

Input:    2,847 tokens
Output:   1,391 tokens
Savings:  1,456 tokens (51.1%)

Removed:
  - 31 comment lines
  - 8 blank lines
  - 4 unused imports (lodash, useCallback, useRef, formatDate)
  - 2 duplicate blocks

That is a 51% reduction, on a file that was not especially verbose by real-world standards.

Three Compression Levels

The CLI and VS Code extension both support three levels:

| Level | What it removes | Best for |
| --- | --- | --- |
| `conservative` | Blank lines, trailing whitespace | Code you're shipping (minimal changes) |
| `normal` (default) | + single-line comments | Code reviews, explanations |
| `aggressive` | + block comments, unused imports, dedup | RAG pipelines, large context windows |

Use aggressive for context sent to an LLM. Use conservative for code you're writing and will commit.
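The levels are cumulative: each one applies everything the previous level does, plus more. One way to picture that is a small lookup structure. This is a sketch of the levels as described above, not the tool's actual configuration format, and the pass names are made up for illustration.

```typescript
type Level = "conservative" | "normal" | "aggressive";
type Pass =
  | "blankLines"
  | "trailingWhitespace"
  | "lineComments"
  | "blockComments"
  | "unusedImports"
  | "dedup";

// What each level adds on top of the one before it (hypothetical pass names).
const ADDED_AT: Record<Level, Pass[]> = {
  conservative: ["blankLines", "trailingWhitespace"],
  normal: ["lineComments"],
  aggressive: ["blockComments", "unusedImports", "dedup"],
};

const LEVEL_ORDER: Level[] = ["conservative", "normal", "aggressive"];

// Collect every pass up to and including the requested level.
function passesFor(level: Level): Pass[] {
  const result: Pass[] = [];
  for (const current of LEVEL_ORDER) {
    for (const pass of ADDED_AT[current]) {
      result.push(pass);
    }
    if (current === level) {
      break;
    }
  }
  return result;
}

console.log(passesFor("normal").join(", "));
```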

How to Use It

CLI — single file:

npm install -g cleanmyprompt

cmp squeeze myfile.ts
# or with verbose output:
cmp squeeze myfile.ts --level aggressive --verbose

CLI — pipe to another command:

cmp squeeze context.ts --stdout | pbcopy   # macOS: copies to clipboard

CLI — whole directory (CI pipeline):

find src/ -type f -name "*.ts" -print0 | xargs -0 -I{} cmp squeeze {} --level aggressive

VS Code extension: Open a file → Command palette → CMP: Squeeze File

The extension processes the file in place and shows a notification with the token delta.

Pairing Compression with Redaction

The real workflow for any LLM pipeline:

1. Redact first (cmp fix): remove secrets and PII
2. Squeeze second (cmp squeeze): remove noise
3. Send to model: minimal tokens, no sensitive data

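Both operations are idempotent: running fix or squeeze again on already-processed output changes nothing further. That makes it safe to run them in sequence on the same file, and to re-run the whole sequence in a pipeline.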

cmp fix context.ts && cmp squeeze context.ts --level aggressive

Or from the VS Code command palette: CMP: Fix File then CMP: Squeeze File.
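If you are wiring the same two steps into your own tooling, the shape is plain function composition. The sketch below uses toy stand-ins for both steps. The sk- key pattern and the blank-line filter are assumptions for illustration only and are nowhere near what real redaction or compression requires.

```typescript
type Transform = (text: string) => string;

// Hypothetical stand-ins for `cmp fix` and `cmp squeeze`.
const redact: Transform = (text) => text.replace(/sk-[A-Za-z0-9]{8,}/g, "[REDACTED]");
const squeeze: Transform = (text) =>
  text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .join("\n");

// Redact first, then squeeze, matching the order above.
const prepare: Transform = (text) => squeeze(redact(text));

const rawContext = 'const key = "sk-abc123def456"\n\n\nfetchData(key)';
const once = prepare(rawContext);
const twice = prepare(once);

console.log(once);
console.log(once === twice); // a second run changes nothing for this input
```

Checking that a second run leaves the output unchanged is a cheap regression test for any step you add to the pipeline later.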

The Copilot Context Window Math

GitHub Copilot's context window for inline suggestions is approximately 2,000–4,000 tokens (varies by model). For Copilot Chat it's larger but still finite, and whether context size affects your bill depends on how your plan meters usage.

If you're using Copilot Chat to explain a complex module — say, a 500-line React component — squeezing it first:

- Reduces context window pressure (more room for the actual question and response)
- Directly reduces the per-query cost, if you're on a usage-tracked plan
- Speeds up the response (fewer tokens to process means lower latency)

The VS Code extension makes this frictionless: Cmd+Shift+P (Ctrl+Shift+P on Windows and Linux) → CMP: Squeeze File → paste the squeezed file into the chat as context.
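To see what "more room" means in numbers, here is a small sketch that reuses the token counts from the verbose.ts example. The 4,000-token window is an assumption taken from the upper end of the range quoted above, and remainingBudget is a hypothetical helper.

```typescript
// Tokens left for the question and the response once the file is in context.
function remainingBudget(windowTokens: number, contextTokens: number): number {
  return Math.max(0, windowTokens - contextTokens);
}

const ASSUMED_WINDOW_TOKENS = 4000; // assumption: upper end of the quoted range

const roomBefore = remainingBudget(ASSUMED_WINDOW_TOKENS, 2847); // unsqueezed file
const roomAfter = remainingBudget(ASSUMED_WINDOW_TOKENS, 1391); // squeezed file
console.log(`before: ${roomBefore} tokens free, after: ${roomAfter} tokens free`);
```

Under that assumption the squeezed file leaves more than twice as much room for the conversation itself.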

Summary

Token costs are not fixed. Every token you send is a choice. The CleanMyPrompt squeeze engine removes the tokens that add cost without adding signal — comments, unused imports, whitespace, duplicates.

If your context compresses by about 50%, as the example above did, the math is straightforward:

If you're spending $100/month on input tokens for context, you could be spending about $50.

Install the CLI:

npm install -g cleanmyprompt

Or install the VS Code extension and run CMP: Squeeze File on your next big context file.

More details: cleanmyprompt.io/cli