Cost Estimation Before Execution
Session 10.4 · ~5 min read
Know the Bill Before You Press Enter
API calls cost money. Not much per call, but costs compound in batch processing. A batch of 100 articles that costs $15 is affordable. A batch of 100 articles where retries triple the number of generation calls, and an over-long prompt triples token usage, costs $135. The difference is the gap between estimation and guessing.
Cost estimation before execution means: calculating expected token counts, multiplying by per-token rates, adding a failure margin, and knowing the number before you commit. This is not optional at scale. It is how you prevent budget surprises.
Token Counting Fundamentals
API costs are measured in tokens. A token is roughly 0.75 words in English (or about 4 characters). A 1,000-word article is approximately 1,333 tokens of output. Your prompt (system message + user message + context) might be 2,000 to 5,000 tokens of input.
Costs are charged separately for input tokens and output tokens. Output tokens are typically 5 to 8 times more expensive than input tokens.
| Provider / Model | Input (per 1M tokens) | Output (per 1M tokens) | 1,000-word article cost* |
|---|---|---|---|
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.03 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.01 |
| GPT-5.2 | $1.75 | $14.00 | $0.02 |
| Gemini 2.5 Pro | $1.25 | $10.00 | $0.02 |
| Gemini 2.0 Flash | $0.30 | $2.50 | $0.004 |
* Estimated for a single generation call with ~3,000 input tokens and ~1,333 output tokens. Multi-agent chains multiply this by the number of agents.
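The per-article figures in the table can be reproduced with a few lines of code. A minimal sketch, assuming the rough 0.75-words-per-token heuristic and the table's Sonnet prices (the function names are illustrative, not a provider API):

```python
def estimate_tokens(word_count: int, tokens_per_word: float = 1.333) -> int:
    """Rough heuristic for English text: ~0.75 words per token."""
    return round(word_count * tokens_per_word)

def call_cost(input_tokens: int, output_tokens: int,
              input_rate: float, output_rate: float) -> float:
    """Cost of one API call in USD. Rates are USD per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 1,000-word article at the Sonnet rates in the table above
output_tokens = estimate_tokens(1000)                 # ~1,333 tokens
cost = call_cost(3000, output_tokens, 3.00, 15.00)
print(f"${cost:.3f}")                                 # ~$0.029, rounds to $0.03
```

Note how the output side dominates: roughly $0.020 of the $0.029 is output tokens, which is why output-heavy tasks are the first place to look when costs run high.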
The Cost Estimation Formula
For a batch of N items, each processed by an agent chain with A agents:
Per-item cost = sum of (input_tokens * input_rate + output_tokens * output_rate) for each agent in the chain.
Failure multiplier = 1 + (expected_failure_rate * average_retries). If 20% of items fail and each gets 1 retry, the multiplier is 1.2. If 10% fail with 2 retries each, the multiplier is 1.2 as well.
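The two formulas translate directly into code. A sketch, where each agent in the chain is represented as an (input_tokens, output_tokens) pair:

```python
def per_item_cost(agents, input_rate, output_rate):
    """Sum of call costs across the agent chain.
    agents: list of (input_tokens, output_tokens) pairs.
    Rates are USD per 1M tokens."""
    return sum((i * input_rate + o * output_rate) / 1_000_000
               for i, o in agents)

def failure_multiplier(failure_rate, avg_retries):
    """Expected cost inflation from retried failures."""
    return 1 + failure_rate * avg_retries

# Both examples from the text give the same multiplier: 1.2
print(failure_multiplier(0.20, 1), failure_multiplier(0.10, 2))
```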
| Scenario | Items | Per-item cost | Failure rate (one retry each) | Total estimated cost |
|---|---|---|---|---|
| Blog posts, 3-agent chain, Sonnet | 10 | $0.09 | 10% | $0.99 |
| Blog posts, 3-agent chain, Sonnet | 100 | $0.09 | 15% | $10.35 |
| Product descriptions, 4-agent, Haiku | 500 | $0.03 | 10% | $16.50 |
| Book chapters, 3-agent, Sonnet (long) | 25 | $0.35 | 20% | $10.50 |
Building a Cost Estimator
A cost estimator is a spreadsheet or script that takes your batch manifest and calculates the total cost before you execute. Inputs:
- Number of items in the manifest
- Estimated input tokens per agent per item (measure from your test runs)
- Estimated output tokens per agent per item
- API price per token (input and output, for your chosen model)
- Expected failure rate (from your error logs)
- Average retries per failure
Your AI coding assistant can build this in under 5 minutes. The spreadsheet version is a single formula row. Either way, run it before every batch.
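Putting the inputs together, here is a sketch of the script version. All token counts and rates below are illustrative placeholders; substitute your own measured values:

```python
from dataclasses import dataclass

@dataclass
class AgentProfile:
    input_tokens: int    # measured from test runs, per item
    output_tokens: int

def estimate_batch_cost(n_items, agents, input_rate, output_rate,
                        failure_rate=0.15, avg_retries=1):
    """Total expected batch cost in USD. Rates are USD per 1M tokens."""
    per_item = sum(
        (a.input_tokens * input_rate + a.output_tokens * output_rate) / 1_000_000
        for a in agents
    )
    multiplier = 1 + failure_rate * avg_retries
    return n_items * per_item * multiplier

# Illustrative 3-agent chain at Sonnet rates, 100 items, 15% failure rate
chain = [AgentProfile(3000, 1300), AgentProfile(4000, 1300), AgentProfile(4000, 1300)]
total = estimate_batch_cost(100, chain, 3.00, 15.00, failure_rate=0.15)
print(f"Estimated batch cost: ${total:.2f}")
```

Running this before a batch turns "probably around ten dollars" into a number you can compare against the actual bill afterward.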
Cost Optimization Strategies
Four strategies reduce batch costs without reducing quality:
| Strategy | How It Saves | Typical Savings |
|---|---|---|
| Use smaller models for appropriate tasks | Research and formatting agents can use Haiku/Flash instead of Sonnet/Pro | 40-70% per agent |
| Trim prompt length | Remove redundant instructions, reduce context to essentials | 10-30% on input costs |
| Prompt caching | Repeated system prompts are served from cache at a steep discount (up to 90% on some providers) | Up to 90% on system prompt tokens |
| Batch API | Submit jobs for async processing (not real-time) at 50% discount | 50% across all tokens |
Prompt caching and batch API discounts are significant. If your system prompt is 2,000 tokens and you run 100 items, that is 200,000 input tokens that can be read from cache at as little as 10% of the normal price. And for work that does not need real-time results, the batch API's 50% discount easily justifies the added latency.
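The caching arithmetic is worth checking by hand. A quick sketch at the table's Sonnet input rate, assuming the best-case 90% cache-read discount and ignoring any one-time cache-write premium some providers charge:

```python
system_prompt_tokens = 2000
items = 100
input_rate = 3.00 / 1_000_000    # USD per token (Sonnet input, table above)

full_price = system_prompt_tokens * items * input_rate
cached_price = full_price * 0.10           # cache reads at 10% of normal price
print(f"Uncached: ${full_price:.2f}, cached: ${cached_price:.3f}")
# Uncached: $0.60, cached: $0.060
```

On a small batch the absolute saving is modest; the discount matters most when the shared prompt is long relative to the per-item content, or when batches run daily.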
The cost of producing AI content is not zero. It is low enough to be dangerous. Low costs encourage waste: over-long prompts, unnecessary retries, premium models for simple tasks. Estimate costs before every batch. Track actual costs after. The discipline prevents waste from compounding.
Further Reading
- LLM API Pricing 2026: Compare 300+ Models, PricePerToken
- AI API Pricing Comparison 2026, IntuitionLabs
- LLM Cost Calculator, Morph
- Prompt Caching, Anthropic Documentation
Assignment
Build a cost estimator for your batch pipeline:
- Measure actual token counts from your test runs: input tokens per agent, output tokens per agent.
- Look up current per-token pricing for your chosen model.
- Calculate per-item cost across your full agent chain.
- Apply your failure rate from error logs (or estimate 15% if you do not have data yet).
- Run the estimator on your 10-item manifest from Session 10.2. What is the predicted cost?
After running the batch, compare estimated cost to actual cost. How close was your estimate? Adjust the estimator based on actual data.