Gemini Pricing: The Opportunity and the Trap
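Google's Gemini models are among the most cost-competitive on the market. List prices in USD (for the 1.5 models, these rates apply to prompts up to 128K tokens; longer prompts cost twice as much):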
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Gemini 1.5 Flash | $0.075 | $0.30 |
| Gemini 1.5 Pro | $1.25 | $5.00 |
| Gemini 2.0 Flash | $0.10 | $0.40 |
The trap: Gemini's large context window (1M tokens for 1.5 Flash, 2M for 1.5 Pro) makes it tempting to throw everything in. But you still pay per token. On 1.5 Pro, a 100,000-token context sent 500 times per day is 50M input tokens, or $62.50/day, before a single output token.
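A small cost helper is a quick way to sanity-check that figure and to compare models at the same volume. This sketch hard-codes the input list prices from the table above, which are valid for 1.5 prompts up to 128K tokens, and it ignores output tokens. The function name is illustrative.

```python
# Input list prices (USD per 1M tokens) copied from the table above.
PRICE_PER_M_INPUT = {
    "gemini-1.5-flash": 0.075,
    "gemini-1.5-pro": 1.25,
    "gemini-2.0-flash": 0.10,
}


def daily_input_cost(model: str, tokens_per_call: int, calls_per_day: int) -> float:
    """Return the daily input-token cost in USD for a fixed-size context."""
    total_tokens = tokens_per_call * calls_per_day
    return total_tokens / 1_000_000 * PRICE_PER_M_INPUT[model]


if __name__ == "__main__":
    for name in PRICE_PER_M_INPUT:
        cost = daily_input_cost(name, tokens_per_call=100_000, calls_per_day=500)
        print(f"{name}: ${cost:.2f}/day")
```

At the same 100,000 tokens × 500 calls, 1.5 Flash comes to $3.75/day.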
The solution isn't to use a smaller context. It's to send less noise.
The Five Highest-Impact Optimizations
1. Use Flash for Tasks That Don't Need Pro
Gemini 1.5 Flash is 94% cheaper than Gemini 1.5 Pro per input token. For tasks like:
- Summarization
- Classification
- Entity extraction
- Simple code generation
...Flash often delivers comparable quality at a fraction of the cost; spot-check a sample of its outputs against Pro before switching a workload over. Save Pro for reasoning-intensive tasks where the quality difference is measurable.
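One way to apply this consistently is to route requests by task type in code. The sketch below assumes each request already carries a task label. The label names and `pick_model` are hypothetical; the model IDs are the 1.5 names from the table.

```python
# Hypothetical task labels that are usually fine on Flash; adjust to your workload.
FLASH_TASKS = {"summarization", "classification", "entity_extraction", "simple_codegen"}

# Input list prices (USD per 1M tokens) from the pricing table.
FLASH_INPUT, PRO_INPUT = 0.075, 1.25


def pick_model(task_type: str) -> str:
    """Route known-simple tasks to 1.5 Flash; anything else (or unknown) goes to 1.5 Pro."""
    return "gemini-1.5-flash" if task_type in FLASH_TASKS else "gemini-1.5-pro"


# Fraction of input cost saved per token by sending a task to Flash instead of Pro.
FLASH_INPUT_SAVINGS = 1 - FLASH_INPUT / PRO_INPUT  # about 0.94, i.e. 94%
```

Unknown labels fall through to Pro, which favors quality over cost. Reverse the default if your workload is mostly simple tasks.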
2. Compress Input Text Before Sending
This is the highest-leverage optimization regardless of model. The CleanMyPrompt squeeze engine removes:
- Filler phrases ("in order to", "it is important to note that")
- Redundant whitespace and blank lines
- Markdown formatting overhead when not needed
- Repeated text blocks
Typical result: 25–40% token reduction on natural language, 40–50% on source code.
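To make the rules above concrete, here is a rough, standalone approximation in Python. It is not CleanMyPrompt's squeeze engine; the filler list and regexes are illustrative assumptions. Because it collapses whitespace inside paragraphs, it suits prose only, not source code.

```python
import re

# Illustrative filler phrases -> shorter replacements (NOT CleanMyPrompt's rule set).
FILLERS = {
    "in order to": "to",
    "it is important to note that ": "",
    "due to the fact that": "because",
}
# Bold markup such as **text** or __text__ (would also mangle dunder names in code).
BOLD_MARKUP = re.compile(r"(\*\*|__)(.+?)\1")


def squeeze(text: str) -> str:
    """Rough prose compressor: strip bold markup, shorten fillers,
    collapse whitespace, and drop repeated paragraphs."""
    text = BOLD_MARKUP.sub(r"\2", text)
    for phrase, replacement in FILLERS.items():
        text = re.sub(re.escape(phrase), replacement, text, flags=re.IGNORECASE)
    seen = set()
    kept = []
    for para in re.split(r"\n\s*\n", text):  # split on blank lines
        normalized = " ".join(para.split())  # collapse runs of spaces/newlines
        if normalized and normalized not in seen:  # skip empty and repeated blocks
            seen.add(normalized)
            kept.append(normalized)
    return "\n\n".join(kept)
```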
```bash
# CLI: compress before sending to any LLM
npm install -g cleanmyprompt
cmp squeeze context.txt --level aggressive

# REST API: compress in your pipeline
curl -X POST https://cleanmyprompt.io/api/v1/clean \
  -H "Content-Type: application/json" \
  -d '{"text": "your long document...", "squeeze": true, "aggressive": true}'
```
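If your pipeline is in Python, you can make the same REST call with the standard library. This sketch copies the endpoint and body fields from the curl example. The JSON response format is not shown here, so `clean` simply returns it decoded. Both helper names are hypothetical.

```python
import json
import urllib.request

# Endpoint and body fields taken from the curl example.
CLEAN_URL = "https://cleanmyprompt.io/api/v1/clean"


def build_clean_request(text: str, aggressive: bool = True) -> urllib.request.Request:
    """Build the same POST request the curl command sends."""
    body = json.dumps({"text": text, "squeeze": True, "aggressive": aggressive}).encode("utf-8")
    return urllib.request.Request(
        CLEAN_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def clean(text: str, aggressive: bool = True, timeout: float = 30.0):
    """Send the request and return the decoded JSON response (schema not documented here)."""
    with urllib.request.urlopen(build_clean_request(text, aggressive), timeout=timeout) as resp:
        return json.load(resp)
```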
3. Use Structured Inputs Instead of Verbose Natural Language
Verbose natural language system prompts are a common source of wasted tokens:
```text
# Verbose (~45 tokens)
You are a highly skilled summarization assistant. Your task is to carefully read
the following document and produce a concise, accurate summary that captures the
most important points. Please ensure your summary is no longer than three sentences.

# Compressed (~6 tokens)
Summarize in 3 sentences:
```
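For structured data inputs, prefer JSON over prose. It drops connective words and makes every field explicit. For a single short record like this one, though, the two forms come out at a similar token count, so count both before assuming a saving:
```text
// Structured
{"customer": "Acme Corp", "issue": "payment failure", "order_id": "ORD-12345"}

// vs. prose form
// "The customer is Acme Corp and they are reporting a payment failure
// on their recent order with ID ORD-12345."
```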
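Two cheap ways to shrink structured payloads further are sketched below. The first drops the whitespace `json.dumps` adds by default. The second applies when many records share the same fields: it lists the field names once instead of repeating them per record. Fewer characters usually means fewer tokens, but not one-for-one, so confirm with the API's token counter. `to_rows` is a hypothetical helper and assumes no value contains `|`.

```python
import json

record = {"customer": "Acme Corp", "issue": "payment failure", "order_id": "ORD-12345"}

# Default json.dumps puts a space after every ':' and ','; compact separators drop it.
spaced = json.dumps(record)
compact = json.dumps(record, separators=(",", ":"))


def to_rows(records: list, keys: list) -> str:
    """Hypothetical helper: send field names once, then one '|'-joined row per record."""
    lines = ["|".join(keys)]
    lines += ["|".join(str(r[k]) for k in keys) for r in records]
    return "\n".join(lines)


print(compact)
print(to_rows([record], ["customer", "issue", "order_id"]))
```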
4. Batch Short Requests Into a Single Call
Per-request overhead adds up. For high-volume workflows (classifying 1,000 support tickets, extracting entities from 500 documents), batching reduces:
- Per-request overhead tokens (system prompt repeated)
- Network round-trips
- Latency
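Gemini's long context window can hold many items in a single prompt, which makes batching practical. A batch of 10 classification tasks in one prompt typically costs 20–30% fewer total tokens than 10 separate calls, because the shared instructions are sent once instead of ten times. The exact saving depends on how long those instructions are relative to each item.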
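Here is a minimal sketch of the batching pattern. The shared instruction is written once, the items are numbered, and the reply is mapped back to the items by number. It assumes the model follows the requested `<number>: <label>` format. Both function names are hypothetical. Unanswered items come back as `None` so they can be retried individually.

```python
from __future__ import annotations

import re

# Matches reply lines like "3: billing", "3. billing" or "3) billing".
_REPLY_LINE = re.compile(r"^[ \t]*(\d+)[ \t]*[:.)][ \t]*(.+?)[ \t]*$", re.MULTILINE)


def build_batch_prompt(instruction: str, items: list[str]) -> str:
    """Write the shared instruction once, then one numbered line per item."""
    numbered = "\n".join(f"{i}. {item}" for i, item in enumerate(items, start=1))
    return (
        f"{instruction}\n"
        "Reply with exactly one line per item, formatted as '<number>: <label>'.\n\n"
        f"{numbered}"
    )


def parse_batch_reply(reply: str, n_items: int) -> list[str | None]:
    """Map numbered reply lines back to item positions; missing answers stay None."""
    labels: list[str | None] = [None] * n_items
    for match in _REPLY_LINE.finditer(reply):
        index = int(match.group(1)) - 1
        if 0 <= index < n_items:  # ignore numbers outside the batch
            labels[index] = match.group(2)
    return labels


if __name__ == "__main__":
    tickets = ["Card was declined at checkout", "App crashes on login"]
    prompt = build_batch_prompt("Classify each ticket as billing, bug, or other.", tickets)
    print(prompt)
    # In practice the reply comes from the model; this string stands in for it.
    print(parse_batch_reply("1: billing\n2: bug", len(tickets)))
```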
5. Cache and Reuse Context
If you're repeatedly passing the same large document or codebase as context:
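- Gemini 1.5's context caching (via the API) lets you store context server-side and reference it by ID in later requests. Cached input tokens cost about 4x less than fresh input tokens. However, the cache is also billed for storage for every hour it lives, and 1.5 models require at least 32,768 tokens to create a cache, so it pays off only when the same large context is reused many times within the cache's lifetime.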
- For local workflows, save a compressed version of your context and reuse it rather than regenerating on every call.
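Whether server-side caching pays off depends on call volume versus storage time. The cost model below is a sketch with stated assumptions:

- It uses the 1.5 Pro input price from the table and a cached rate of one quarter of that.
- The caller passes in the storage rate. The $4.50 per 1M tokens per hour in the example is an assumption to verify against the current pricing page.
- It ignores any one-time cache-creation charge and the non-cached part of each prompt.

```python
def context_input_costs(
    context_tokens: int,
    calls: int,
    hours: float,
    input_rate: float,
    cached_rate: float,
    storage_rate_per_hour: float,
) -> tuple[float, float]:
    """Return (resend_usd, cached_usd) for reusing one context across `calls` requests.

    Rates are USD per 1M tokens; storage_rate_per_hour is USD per 1M tokens per hour.
    """
    millions = context_tokens / 1_000_000
    resend = millions * input_rate * calls
    cached = millions * cached_rate * calls + millions * storage_rate_per_hour * hours
    return resend, cached


if __name__ == "__main__":
    for calls in (2, 500):
        resend, cached = context_input_costs(
            100_000, calls, hours=24,
            input_rate=1.25,            # 1.5 Pro input price from the table
            cached_rate=1.25 / 4,       # "about 4x less" than fresh input
            storage_rate_per_hour=4.50, # assumed; check current pricing
        )
        print(f"{calls} calls/day: resend ${resend:.2f} vs. cache ${cached:.2f}")
```

With these assumed rates, caching comes to roughly $26/day against $62.50 at 500 calls/day. At 2 calls/day, however, storage alone (about $10.80) costs far more than resending the context.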
Redact Before You Send
Beyond cost, Gemini's large context window raises a security concern: it's tempting to drop entire codebases or document archives in, and those archives often contain secrets such as API keys, passwords, and private keys.
Before passing any large context to Gemini:
```bash
cmp fix context/ --recursive      # redact secrets from all files
cmp squeeze context/ --recursive  # then compress
```
Or use the web tool for one-off cleaning.
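To show what secret redaction involves, here is a deliberately small regex-based sketch. It is not how `cmp fix` works. The three patterns are illustrative assumptions: they will miss many secret formats and can also flag harmless text, so treat this as a last-resort filter, not a scanner.

```python
import re

# Illustrative patterns only; real secret scanners use much larger rule sets.
AWS_ACCESS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")
PRIVATE_KEY = re.compile(
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]+?-----END [A-Z ]*PRIVATE KEY-----"
)
# `name = value` / `name: value` where the name suggests a credential.
ASSIGNMENT = re.compile(
    r"(?i)((?:api[_-]?key|secret|token|password)\s*[:=]\s*)(['\"]?)[^\s'\"]+\2"
)


def redact(text: str) -> str:
    """Replace likely secrets with placeholders before text is sent to an LLM."""
    text = PRIVATE_KEY.sub("[REDACTED PRIVATE KEY]", text)
    text = AWS_ACCESS_KEY.sub("[REDACTED]", text)
    # Keep the variable name so the model still sees the shape of the config.
    return ASSIGNMENT.sub(lambda m: m.group(1) + "[REDACTED]", text)
```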
Measuring Your Actual Savings
Gemini's API returns token counts in the response metadata:
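```python
import google.generativeai as genai  # reads the API key from the GOOGLE_API_KEY env var
model = genai.GenerativeModel("gemini-1.5-flash")
prompt = open("context.txt").read()  # e.g. your squeezed context file
response = model.generate_content(prompt)
print(response.usage_metadata.prompt_token_count)  # input tokens used
print(response.usage_metadata.candidates_token_count)  # output tokens
```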
Log these per request, and compare the same workload before and after compression. Because every call is billed per token, a per-request reduction scales directly with call volume.
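A minimal way to do that logging is to write one CSV row per request, labeled by variant, and then compare mean input tokens between variants. The file layout, labels, and function names below are hypothetical. The token counts would come from `response.usage_metadata` as in the snippet above. The comparison is only meaningful when both labels ran the same workload.

```python
import csv
from pathlib import Path


def log_usage(path: str, label: str, prompt_tokens: int, output_tokens: int) -> None:
    """Append one request's token counts to a CSV, writing a header on first use."""
    is_new = not Path(path).exists()
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["label", "prompt_tokens", "output_tokens"])
        writer.writerow([label, prompt_tokens, output_tokens])


def input_reduction_pct(path: str, before: str = "baseline", after: str = "squeezed") -> float:
    """Percent drop in mean prompt tokens from the `before` label to the `after` label."""
    counts = {before: [], after: []}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["label"] in counts:
                counts[row["label"]].append(int(row["prompt_tokens"]))
    if not counts[before] or not counts[after]:
        raise ValueError("need at least one logged request for each label")
    mean_before = sum(counts[before]) / len(counts[before])
    mean_after = sum(counts[after]) / len(counts[after])
    return 100 * (1 - mean_after / mean_before)
```

For example, after each call: `log_usage("usage.csv", "squeezed", response.usage_metadata.prompt_token_count, response.usage_metadata.candidates_token_count)`.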
The Quick Win
If you're using Gemini via the API or Google AI Studio and want to cut costs today:
- Install the CleanMyPrompt CLI: `npm install -g cleanmyprompt`
- Run `cmp squeeze your-context.txt --level aggressive` before each call
- Check the token count in your response metadata
Or use the web tool to see your compression ratio before committing to the API integration.