
AI Token Pricing Explained: Input, Output and Per-Million Rates

Every AI request is metered in tokens. Here's how input, output, and per-million rates actually work — and how to keep your AI spend predictable.

By Veronica Nguyen · 7 min read

If you've looked at any AI provider's pricing page, you've seen rates like “$3 per million input tokens, $15 per million output tokens.” It looks simple — until you try to predict what a month of real usage will cost. This guide breaks down what tokens are, how per-million pricing actually works, and what really drives your bill.

What is a token, exactly?

A token is the unit AI models read and write in — a chunk of text that's usually a short word, part of a longer word, or a piece of punctuation. The model never sees “words”; it sees a stream of tokens, and billing counts every one of them in both directions.

1 token ≈ 4 characters of English text, or about three-quarters of a word.
1,000 tokens ≈ 750 words — roughly a page and a half of writing.
A typical document page is 500–600 tokens once headers and whitespace are stripped.
Code, numbers, and non-English text tokenize less efficiently — the same content can cost 20–40% more tokens.
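The rules of thumb above can be turned into a quick estimator. This is a minimal sketch, not a real tokenizer: the function names and the `overhead` parameter are hypothetical, and the constants simply encode the approximations stated above. For billing-grade numbers, use the provider's own tokenizer or the usage figures returned with each response.

```python
# Rule-of-thumb constants taken from the approximations above.
CHARS_PER_TOKEN = 4.0    # 1 token is roughly 4 characters of English text
WORDS_PER_TOKEN = 0.75   # 1 token is roughly three-quarters of a word


def estimate_tokens_from_chars(text: str, overhead: float = 0.0) -> int:
    """Rough token estimate from character count.

    overhead is a hypothetical fractional padding, e.g. 0.3 for code or
    non-English text that tokenizes less efficiently.
    """
    return round(len(text) / CHARS_PER_TOKEN * (1.0 + overhead))


def estimate_tokens_from_words(word_count: int, overhead: float = 0.0) -> int:
    """Rough token estimate from a word count."""
    return round(word_count / WORDS_PER_TOKEN * (1.0 + overhead))


if __name__ == "__main__":
    print(estimate_tokens_from_words(750))            # about 1,000 tokens
    print(estimate_tokens_from_chars("x" * 4000))     # about 1,000 tokens
```

Treat the result as a planning figure only; actual counts vary by tokenizer and content type.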

How per-million pricing works

Providers price tokens per million, with separate rates for input and output. Input is everything you send: the system prompt, conversation history, any retrieved documents, and the user's question. Output is everything the model generates back. Output rates are typically 3–6× higher than input rates, because generating text is more computationally expensive than reading it.

Cost = (input tokens × input rate + output tokens × output rate) ÷ 1,000,000

A realistic example: a knowledge-assistant query that sends 1,500 input tokens and gets a 300-token answer, on a mid-tier model priced at $3 / $15 per million, costs $0.0045 for input plus $0.0045 for output. That is $0.009 in total, about nine-tenths of a cent. Individual queries are cheap; volume and context size are what move the bill.
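The formula and the worked example can be checked with a few lines of Python. This is a sketch under the article's own numbers; the function names and the monthly volume of 100,000 queries are illustrative assumptions, not figures from any provider.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost in dollars of one request, with rates in dollars per million tokens."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000


def monthly_cost(per_request_cost: float, requests_per_month: int) -> float:
    """Scale a per-request cost to a monthly volume (volume is an assumption)."""
    return per_request_cost * requests_per_month


if __name__ == "__main__":
    cost = request_cost(1_500, 300, 3.00, 15.00)
    print(f"per query: ${cost:.4f}")
    # Hypothetical volume, to show how fractions of a cent add up.
    print(f"per month: ${monthly_cost(cost, 100_000):,.2f}")
```

At the hypothetical 100,000 queries a month, the nine-tenths of a cent per query becomes roughly $900.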

Per-million rates by model tier

Model tier	Input (USD per 1M tokens)	Output (USD per 1M tokens)	Best for
Flagship reasoning	$10–15	$50–75	Complex analysis, agentic workflows
Mid-tier workhorse	$2.50–5	$10–25	Most production assistants and RAG
Small / fast	$0.25–1	$1–5	Classification, routing, simple Q&A

Illustrative ranges — providers update pricing frequently, so always check current rate cards before modeling costs.
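To see what the tiers mean for a single query, the ranges in the table can be applied to the 1,500-input, 300-output example. This sketch hard-codes the illustrative ranges from the table above; the `TIERS` dictionary and `cost_range` function are hypothetical names, and the numbers are not current rate cards.

```python
# (input low, input high), (output low, output high) in dollars per million
# tokens, copied from the illustrative table above.
TIERS = {
    "Flagship reasoning": ((10.00, 15.00), (50.00, 75.00)),
    "Mid-tier workhorse": ((2.50, 5.00), (10.00, 25.00)),
    "Small / fast": ((0.25, 1.00), (1.00, 5.00)),
}


def cost_range(input_tokens: int, output_tokens: int, tier: str) -> tuple[float, float]:
    """Return (low, high) dollar cost of one request for a tier's rate range."""
    (in_lo, in_hi), (out_lo, out_hi) = TIERS[tier]
    low = (input_tokens * in_lo + output_tokens * out_lo) / 1_000_000
    high = (input_tokens * in_hi + output_tokens * out_hi) / 1_000_000
    return low, high


if __name__ == "__main__":
    for tier in TIERS:
        low, high = cost_range(1_500, 300, tier)
        print(f"{tier:<20} ${low:.6f} to ${high:.6f} per query")
```

On these ranges the same query spans well over an order of magnitude between the smallest and largest tier, which is why routing matters.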

What actually drives your bill

Long system prompts are resent with every single request. At $3 per million input tokens, a 2,000-token prompt adds $0.006 to each call, or $6,000 per million requests.
Conversation history grows every turn, so turn ten of a chat can cost several times as much as turn one.
Retrieved context (RAG) usually dominates input — the documents you attach dwarf the question itself.
Verbose outputs multiply your most expensive token type.
Retries, fallbacks, and evals are real usage too, even though no user ever sees them.
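The history effect is easy to model. The sketch below assumes the full history is resent on every turn with no caching or summarization; the default sizes (500-token system prompt, 100-token user messages, 300-token replies) are hypothetical values chosen only for illustration.

```python
def input_tokens_at_turn(turn: int, system_tokens: int = 500,
                         user_tokens: int = 100, reply_tokens: int = 300) -> int:
    """Input tokens sent on a 1-indexed turn when the whole history is resent."""
    # Every earlier turn contributes one user message and one model reply.
    history = (turn - 1) * (user_tokens + reply_tokens)
    return system_tokens + history + user_tokens


def cumulative_input_tokens(turns: int, **sizes: int) -> int:
    """Total input tokens billed across a conversation of the given length."""
    return sum(input_tokens_at_turn(t, **sizes) for t in range(1, turns + 1))


if __name__ == "__main__":
    for turn in (1, 5, 10):
        print(f"turn {turn:>2}: {input_tokens_at_turn(turn):,} input tokens")
    print(f"10-turn total: {cumulative_input_tokens(10):,} input tokens")
```

With these assumed sizes, turn ten sends seven times the input of turn one, even though the user's message is no longer.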

The “resend tax”

Models are stateless: the full prompt and history are resent on every turn. Prompt caching — supported by most major providers — can cut the cost of repeated input by up to 90%, and is usually the single biggest savings lever for assistants with long, stable prompts.
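A rough model of the saving, assuming cached input tokens are billed at a discounted fraction of the normal input rate. The 90% discount mirrors the "up to 90%" figure above and is an upper bound, not a guarantee; real pricing may also include cache-write surcharges and expiry windows, which this sketch ignores. The function name and parameters are hypothetical.

```python
def input_cost_with_cache(cached_tokens: int, fresh_tokens: int,
                          input_rate_per_m: float,
                          cached_discount: float = 0.90) -> float:
    """Dollar cost of one request's input when part of the prompt is cached.

    cached_discount is an assumed fraction taken off the input rate for
    cached tokens; cache-write fees and expiry are not modelled.
    """
    cached_rate = input_rate_per_m * (1.0 - cached_discount)
    return (cached_tokens * cached_rate
            + fresh_tokens * input_rate_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000-token stable prompt plus 500 fresh tokens at $3/M.
    uncached = input_cost_with_cache(0, 2_500, 3.00)
    cached = input_cost_with_cache(2_000, 500, 3.00)
    print(f"uncached input: ${uncached:.4f}")
    print(f"cached input:   ${cached:.4f}")
    print(f"saving:         {1 - cached / uncached:.0%}")
```

The saving on the whole request is smaller than the discount itself, because fresh tokens and output are still billed at full rate.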

Five ways to lower token spend

Cache long, stable prompts. System instructions and shared context should be written once and cached, not re-billed every call.
Route simple queries to smaller models. Classification and short factual answers rarely need a flagship model.
Trim and summarize context. Retrieve the three most relevant passages, not ten; summarize old history instead of replaying it.
Cap output length. Set max-token limits and ask for concise, structured answers.
Monitor per-feature usage. You can't optimize a bill you can't attribute. Tag requests by feature and watch the ratios.
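Per-feature attribution, the last item above, can start as a small in-process tally. This is a minimal sketch: the `UsageTracker` class and the feature names are hypothetical, and it assumes you already receive input and output token counts with each response.

```python
from collections import defaultdict


class UsageTracker:
    """Accumulates token usage and cost per feature tag."""

    def __init__(self, input_rate_per_m: float, output_rate_per_m: float):
        self.input_rate = input_rate_per_m
        self.output_rate = output_rate_per_m
        self.totals = defaultdict(lambda: {"requests": 0, "input": 0, "output": 0})

    def record(self, feature: str, input_tokens: int, output_tokens: int) -> None:
        """Add one request's token counts to the feature's running totals."""
        entry = self.totals[feature]
        entry["requests"] += 1
        entry["input"] += input_tokens
        entry["output"] += output_tokens

    def cost(self, feature: str) -> float:
        """Dollar cost accumulated so far for one feature."""
        entry = self.totals[feature]
        return (entry["input"] * self.input_rate
                + entry["output"] * self.output_rate) / 1_000_000

    def report(self) -> dict[str, float]:
        """Cost per feature, for spotting which feature drives the bill."""
        return {feature: self.cost(feature) for feature in list(self.totals)}


if __name__ == "__main__":
    tracker = UsageTracker(3.00, 15.00)
    tracker.record("search", 1_500, 300)
    tracker.record("search", 1_500, 300)
    tracker.record("summarize", 4_000, 800)
    for feature, dollars in tracker.report().items():
        print(f"{feature:<10} ${dollars:.4f}")
```

In production the same tags would more likely go into request metadata or logs, but the aggregation logic is the same.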

Key takeaways

You pay for both directions: input and output tokens, at separate per-million rates.
Output tokens typically cost 3–6× more than input tokens.
Real queries cost fractions of a cent — context size and volume are the multipliers.
Prompt caching and model routing are the two biggest cost levers in production.

Frequently asked questions

How many words is 1,000 tokens?

Roughly 750 English words, or 3,000–4,000 characters. A full page of a typical business document is usually 500–600 tokens.

Do I pay for the system prompt on every request?

Yes. Models are stateless, so your system prompt and conversation history are resent — and billed — with every call. Prompt caching can reduce the cost of that repeated input substantially.

Are tokens counted the same across providers?

No. Each provider uses its own tokenizer, and the same text can differ by 10–20% in token count. Compare models on cost per task, not cost per token.
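A small comparison shows why cost per task is the better yardstick. Both providers below are invented: "provider_b" has rates 10% lower but, by assumption, a tokenizer that produces 15% more tokens for the same text.

```python
def task_cost(input_tokens: int, output_tokens: int,
              input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Dollar cost of completing one task on one provider."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000


# Hypothetical measurements of the same task on two made-up providers.
PROVIDERS = {
    "provider_a": {"input_tokens": 1_000, "output_tokens": 200,
                   "input_rate": 3.00, "output_rate": 15.00},
    "provider_b": {"input_tokens": 1_150, "output_tokens": 230,
                   "input_rate": 2.70, "output_rate": 13.50},
}


def compare() -> dict[str, float]:
    """Cost per task for each hypothetical provider."""
    return {
        name: task_cost(p["input_tokens"], p["output_tokens"],
                        p["input_rate"], p["output_rate"])
        for name, p in PROVIDERS.items()
    }


if __name__ == "__main__":
    for name, dollars in compare().items():
        print(f"{name}: ${dollars:.6f} per task")
```

Under these assumptions the provider with the lower per-token rate ends up slightly more expensive per task.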
