CleanMyPrompt
2026-03-08 · CleanMyPrompt Team · 3 min read

How to Reduce Gemini API Costs: Token Optimization for Google AI

Gemini 1.5 Pro and Flash have attractive pricing — but only if you manage token usage deliberately. Here are the concrete tactics that cut Gemini API costs by 30–50% without degrading output quality.

Tags: gemini, google-ai, tokens, cost-optimization, api

Gemini Pricing: The Opportunity and the Trap

Google's Gemini models are among the most cost-competitive in the market:

Model              Input (per 1M tokens)   Output (per 1M tokens)
Gemini 1.5 Flash   $0.075                  $0.30
Gemini 1.5 Pro     $1.25                   $5.00
Gemini 2.0 Flash   $0.10                   $0.40

The trap: Gemini's large context window (1M tokens for 1.5 Pro) makes it tempting to throw everything in, but you still pay per token. At Gemini 1.5 Pro's $1.25 per 1M input tokens, a 100,000-token context sent 500 times per day adds up to 50M input tokens, or $62.50/day, before a single output token.
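
The arithmetic behind that figure is easy to reproduce for your own traffic. A minimal sketch, assuming the Gemini 1.5 Pro input price from the table above; the function name `daily_input_cost` is illustrative and not part of any SDK:

```python
def daily_input_cost(tokens_per_request: int, requests_per_day: int,
                     price_per_million_usd: float) -> float:
    """Return the daily input-token cost in USD."""
    total_tokens = tokens_per_request * requests_per_day
    return total_tokens / 1_000_000 * price_per_million_usd


# 100,000-token context, 500 calls/day, Gemini 1.5 Pro input at $1.25 per 1M tokens
print(daily_input_cost(100_000, 500, 1.25))  # 62.5
```

Plugging in your own context size and request volume shows quickly whether trimming input is worth the engineering time.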

The solution isn't to use a smaller context. It's to send less noise.


The Five Highest-Impact Optimizations

1. Use Flash for Tasks That Don't Need Pro

Gemini 1.5 Flash is 94% cheaper than Gemini 1.5 Pro per input token. For tasks like:

  • Summarization
  • Classification
  • Entity extraction
  • Simple code generation

...Flash delivers comparable quality at a fraction of the cost. Save Pro for reasoning-intensive tasks where the quality difference is measurable.
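
One way to apply this is a small routing function in front of your API calls. The sketch below is an assumption about how you might label tasks; the task names and the `pick_model` helper are hypothetical, while the model IDs and prices come from the table above:

```python
# Task types the article lists as Flash-friendly; the labels are hypothetical.
FLASH_TASKS = {"summarization", "classification", "entity_extraction", "simple_codegen"}

FLASH_MODEL = "gemini-1.5-flash"
PRO_MODEL = "gemini-1.5-pro"


def pick_model(task_type: str) -> str:
    """Route cheap task types to Flash and everything else to Pro."""
    return FLASH_MODEL if task_type in FLASH_TASKS else PRO_MODEL


def input_savings_pct(cheap_price: float, expensive_price: float) -> int:
    """Percentage saved per input token by choosing the cheaper model."""
    return round((1 - cheap_price / expensive_price) * 100)


print(pick_model("classification"))        # gemini-1.5-flash
print(pick_model("multi_step_reasoning"))  # gemini-1.5-pro
print(input_savings_pct(0.075, 1.25))      # 94
```

Defaulting unknown task types to Pro is the conservative choice here: it costs more, but it never silently downgrades quality.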

2. Compress Input Text Before Sending

This is the highest-leverage optimization regardless of model. The CleanMyPrompt squeeze engine removes:

  • Filler phrases ("in order to", "it is important to note that")
  • Redundant whitespace and blank lines
  • Markdown formatting overhead when not needed
  • Repeated text blocks

Typical result: 25–40% token reduction on natural language, 40–50% on source code.

# CLI: compress before sending to any LLM
npm install -g cleanmyprompt
cmp squeeze context.txt --level aggressive

# REST API: compress in your pipeline
curl -X POST https://cleanmyprompt.io/api/v1/clean \
  -H "Content-Type: application/json" \
  -d '{"text": "your long document...", "squeeze": true, "aggressive": true}'
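
To make the idea concrete, here is a rough sketch of two of the transformations listed above (filler-phrase removal and whitespace collapsing). This is not the CleanMyPrompt implementation; the `FILLERS` table and the `squeeze` function are assumptions written for illustration only:

```python
import re

# Hypothetical filler table: phrase -> shorter replacement.
FILLERS = {
    "in order to": "to",
    "it is important to note that": "",
}


def squeeze(text: str) -> str:
    """Drop filler phrases, collapse spaces, and keep at most one blank line."""
    for phrase, replacement in FILLERS.items():
        text = re.sub(re.escape(phrase), replacement, text, flags=re.IGNORECASE)
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)    # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # allow at most one blank line
    return text.strip()


print(squeeze("We retry  in order to   recover.\n\n\n\nDone."))
```

A real compressor needs far more care (for example, it must not touch whitespace inside code or quoted strings), which is why a dedicated tool is usually the better choice.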

3. Use Structured Inputs Instead of Verbose Natural Language

Verbose natural language system prompts are a common source of wasted tokens:

# Verbose (roughly 45 tokens; exact counts depend on the tokenizer)
You are a highly skilled summarization assistant. Your task is to carefully read
the following document and produce a concise, accurate summary that captures the
most important points. Please ensure your summary is no longer than three sentences.

# Compressed (roughly 6 tokens)
Summarize in 3 sentences:

For structured data inputs, compact JSON is often cheaper than prose, but JSON punctuation costs tokens too, so count both forms for your own payloads:

// Structured form: field names and values only
{"customer": "Acme Corp", "issue": "payment failure", "order_id": "ORD-12345"}

// vs. the same facts in prose form
// "The customer is Acme Corp and they are reporting a payment failure
// on their recent order with ID ORD-12345."
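
Token counts differ between tokenizers, so it is worth measuring prompt variants rather than trusting estimates. The sketch below compares two variants with a pluggable counter; `token_savings` and `rough_count` are hypothetical helpers, and `rough_count` is only a word-count stand-in, not a real tokenizer:

```python
from typing import Callable


def token_savings(verbose: str, compact: str,
                  count_tokens: Callable[[str], int]) -> dict:
    """Compare two prompt variants using whatever token counter is supplied."""
    before = count_tokens(verbose)
    after = count_tokens(compact)
    return {"before": before, "after": after, "saved": before - after}


def rough_count(text: str) -> int:
    """Stand-in counter: whitespace-separated words. NOT a real tokenizer."""
    return len(text.split())


verbose = ("You are a highly skilled summarization assistant. Your task is to "
           "carefully read the following document and produce a concise, accurate "
           "summary that captures the most important points. Please ensure your "
           "summary is no longer than three sentences.")
compact = "Summarize in 3 sentences:"

print(token_savings(verbose, compact, rough_count))
```

With the `google-generativeai` Python SDK you could pass `lambda s: model.count_tokens(s).total_tokens` as the counter to get Gemini's own numbers.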

4. Batch Short Requests Into a Single Call

API overhead is real. For high-volume workflows (classifying 1,000 support tickets, extracting entities from 500 documents), batching reduces:

  • Per-request overhead tokens (system prompt repeated)
  • Network round-trips
  • Latency

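Gemini's context window leaves ample room to pack many short items into one prompt, and the shared instructions are then sent once instead of once per item. A batch of 10 classification tasks in one prompt typically costs 20–30% fewer total tokens than 10 separate calls.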
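
A batch prompt can be as simple as one instruction followed by numbered items. The following is a minimal sketch; the helper names, the answer-format sentence, and the 40-token instruction size are assumptions for illustration:

```python
def build_batch_prompt(instruction: str, items: list[str]) -> str:
    """Combine many short inputs into one numbered prompt with a single instruction."""
    numbered = "\n".join(f"{i}. {item}" for i, item in enumerate(items, start=1))
    return (f"{instruction}\n"
            "Answer with one line per item, prefixed by its number.\n\n"
            f"{numbered}")


def instruction_tokens_saved(instruction_tokens: int, n_items: int) -> int:
    """Tokens saved by sending the instruction once instead of once per item."""
    return instruction_tokens * (n_items - 1)


tickets = ["Card declined at checkout", "Cannot reset password", "Invoice shows wrong VAT"]
print(build_batch_prompt("Classify each support ticket as billing, account, or other.", tickets))
print(instruction_tokens_saved(instruction_tokens=40, n_items=10))  # 360
```

Numbering the items makes it straightforward to map each answer line back to its input, which matters once a batch grows beyond a handful of entries.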

5. Cache and Reuse Context

If you're repeatedly passing the same large document or codebase as context:

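  • Gemini 1.5's context caching (via the API) lets you store context server-side and reference it by ID. Cached tokens are billed at roughly a quarter of the fresh input rate, plus a storage charge for as long as the cache is kept alive, so caching pays off when the same context is reused many times.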
  • For local workflows, save a compressed version of your context and reuse it rather than regenerating on every call.
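
For the local-workflow case, one possible approach is to key the compressed output on a hash of the original text, so compression runs only when the context actually changes. This is a sketch under that assumption; `cached_compress` and the stand-in `collapse_whitespace` compressor are hypothetical:

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cached_compress(text: str, compress: Callable[[str], str],
                    cache_dir: Path) -> str:
    """Return compress(text), reusing a file cached under the SHA-256 of the input."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.txt"
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    result = compress(text)
    cache_file.write_text(result, encoding="utf-8")
    return result


calls = []  # records how often the compressor actually runs


def collapse_whitespace(text: str) -> str:
    """Stand-in compressor for the demo."""
    calls.append(1)
    return " ".join(text.split())


with tempfile.TemporaryDirectory() as tmp:
    first = cached_compress("a   b\n\nc", collapse_whitespace, Path(tmp))
    second = cached_compress("a   b\n\nc", collapse_whitespace, Path(tmp))

print(first == second, len(calls))  # True 1
```

Because the key is derived from the content, editing the source document automatically invalidates the cached copy.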

Redact Before You Send

Beyond cost, there's a security concern specific to Gemini's large context window: it's tempting to drop entire codebases or document archives in. Those archives almost certainly contain secrets.

Before passing any large context to Gemini:

cmp fix context/ --recursive    # redact secrets from all files
cmp squeeze context/ --recursive # then compress

Or use the web tool for one-off cleaning.
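
To illustrate what redaction involves, here is a deliberately small regex-based sketch. It is not how `cmp fix` works internally; the patterns are assumptions covering only a few commonly cited secret shapes, and real scanners handle many more:

```python
import re

# Illustrative patterns only; real secret scanners cover far more formats.
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                       # AWS access key ID shape
    re.compile(r"\bAIza[0-9A-Za-z_\-]{35}"),                   # commonly cited Google API key shape
    re.compile(r"(?i)\b(password|secret|api_key)\s*=\s*\S+"),  # key=value assignments
]


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace anything matching a known secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


print(redact("aws_key AKIAIOSFODNN7EXAMPLE and password = hunter2"))
# aws_key [REDACTED] and [REDACTED]
```

Note that the key=value pattern replaces the whole assignment, name included, which is the safer default when the surrounding text is going to a third-party API.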


Measuring Your Actual Savings

Gemini's API returns token counts in the response metadata:

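import google.generativeai as genai  # pip install google-generativeai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(prompt)  # prompt: your (compressed) input string
print(response.usage_metadata.prompt_token_count)      # input tokens used
print(response.usage_metadata.candidates_token_count)  # output tokens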

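Log these per request and compare the numbers before and after compression. Because input is billed per token, a fixed percentage reduction translates directly into the same percentage off your input bill, and it scales with request volume.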
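
A lightweight way to track this is to append each request's counts to a CSV file and compute the reduction afterwards. The sketch below assumes you pass in the two `usage_metadata` values yourself; `log_usage`, `savings_pct`, and the CSV layout are hypothetical:

```python
import csv
from datetime import datetime, timezone


def savings_pct(tokens_before: int, tokens_after: int) -> float:
    """Percentage of input tokens removed by compression."""
    return 100 * (tokens_before - tokens_after) / tokens_before


def log_usage(path: str, label: str, prompt_tokens: int, output_tokens: int) -> None:
    """Append one row per request so before/after runs can be compared later."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(
            [datetime.now(timezone.utc).isoformat(), label, prompt_tokens, output_tokens]
        )


print(savings_pct(1000, 650))  # 35.0
```

Tagging rows with a label such as "raw" or "compressed" keeps the two runs separable when you aggregate the file later.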


The Quick Win

If you're using Gemini via the API or Google AI Studio and want to cut costs today:

  1. Install the CleanMyPrompt CLI: npm install -g cleanmyprompt
  2. Run cmp squeeze your-context.txt --level aggressive before each call
  3. Check the token count in your response metadata

Or use the web tool to see your compression ratio before committing to the API integration.
