OpenAI API pricing in 2026 is pay-as-you-go by the token, with no flat monthly fee. On the Standard tier, priced per 1 million tokens, the GPT-5.4 family runs from nano at $0.20 input and $1.25 output, to mini at $0.75 and $4.50, to GPT-5.4 at $2.50 and $15. The latest flagship, GPT-5.5, costs $5 input and $30 output, and GPT-5.5 Pro costs $30 and $180 for research-grade work. Cached input is roughly ten times cheaper than fresh input, the Batch API cuts rates by 50 percent in exchange for up to 24 hours of latency, and built-in tools like web search and file search bill on top. There is no permanent free tier, only occasional promotional credits for new accounts. Token prices verified on developers.openai.com on May 26, 2026.
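To make the per-1M-token arithmetic concrete, here is a minimal Python sketch that prices a single request from the Standard-tier rates quoted above. The model ID strings and the `request_cost` helper are assumptions for illustration, and the sketch deliberately ignores caching, tools, and long-context rates.

```python
# Standard-tier prices quoted in this article, in USD per 1M tokens,
# as (input, output). The model ID strings are assumed spellings.
PRICES_PER_1M = {
    "gpt-5.4-nano": (0.20, 1.25),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.5-pro": (30.00, 180.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the Standard-tier USD cost of one request (no caching, no tools)."""
    input_rate, output_rate = PRICES_PER_1M[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: a 2,000-token prompt with a 500-token reply.
    for model in PRICES_PER_1M:
        print(f"{model:>14}: ${request_cost(model, 2_000, 500):.6f}")
```

Dividing by 1,000,000 once at the end keeps the rates in the same "per 1M" units the pricing page uses, so the table can be updated by copying numbers straight across.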
GPT-5.4 nano: best for high-volume, simple, latency-sensitive tasks
GPT-5.4 mini: best for most production apps that want quality without flagship cost
GPT-5.4: best for apps needing stronger reasoning and long context
GPT-5.5: best for complex reasoning, coding, and agentic workloads
GPT-5.5 Pro: best for frontier research and the most demanding analysis
Output tokens cost far more than input. GPT-5.5 is $5 per 1M input but $30 per 1M output, and reasoning models bill their internal reasoning as output, so a verbose model can multiply your bill.
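A minimal sketch of why reasoning inflates the bill, assuming (as the line above says) that hidden reasoning tokens are billed at the output rate. The token counts, including the 3,000 reasoning tokens, are invented for illustration.

```python
# GPT-5.5 Standard-tier rates from this article, USD per 1M tokens.
GPT55_INPUT = 5.00
GPT55_OUTPUT = 30.00


def gpt55_cost(input_tokens: int, visible_output: int, reasoning_tokens: int = 0) -> float:
    """Cost of one GPT-5.5 call; reasoning tokens are billed at the output rate."""
    billed_output = visible_output + reasoning_tokens
    return (input_tokens * GPT55_INPUT + billed_output * GPT55_OUTPUT) / 1_000_000


# Hypothetical call: 1,000 prompt tokens and a 300-token answer.
terse = gpt55_cost(1_000, 300)
# Same call, but the model is assumed to spend 3,000 tokens reasoning first.
thinking = gpt55_cost(1_000, 300, reasoning_tokens=3_000)

print(f"without reasoning: ${terse:.4f}")     # $0.0140
print(f"with reasoning:    ${thinking:.4f}")  # $0.1040
print(f"multiplier:        {thinking / terse:.1f}x")  # 7.4x
```

The visible answer is identical in both calls; the roughly sevenfold difference comes entirely from output-rate billing on tokens the user never sees.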
Long-context requests cost more. GPT-5.5 jumps from $5 and $30 (short context) to $10 input and $45 output per 1M on long context, per the OpenAI pricing page.
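The long-context surcharge can be modeled as a rate switch. This sketch assumes the whole request moves to the long-context rates once the prompt exceeds a cut-off; the article does not state that cut-off, so `long_context_threshold` is a hypothetical parameter you must fill in from OpenAI's pricing page, and the thresholds in the example are placeholders.

```python
# GPT-5.5 rates quoted in this article, USD per 1M tokens, as (input, output).
SHORT_CONTEXT = (5.00, 30.00)
LONG_CONTEXT = (10.00, 45.00)


def gpt55_cost_with_context(input_tokens: int, output_tokens: int,
                            long_context_threshold: int) -> float:
    """Price a GPT-5.5 call, switching every token to long-context rates
    when the prompt exceeds `long_context_threshold` (an assumed rule)."""
    if input_tokens > long_context_threshold:
        input_rate, output_rate = LONG_CONTEXT
    else:
        input_rate, output_rate = SHORT_CONTEXT
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Placeholder thresholds, not OpenAI's real cut-off.
print(gpt55_cost_with_context(100_000, 2_000, long_context_threshold=50_000))
print(gpt55_cost_with_context(100_000, 2_000, long_context_threshold=200_000))
```

The same 100,000-token prompt costs nearly twice as much once it crosses the line, which is why trimming context just below a threshold can matter more than trimming output.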
Built-in tools bill on top of tokens. Web search is $10 per 1,000 calls (plus search content tokens at model rates), file search is $2.50 per 1,000 calls plus $0.10 per GB per day of storage, and Code Interpreter containers run from $0.03 to $1.92 per 20-minute session by memory size.
Cached input is roughly ten times cheaper than fresh input, but only applies to repeated prompt prefixes, so the discount helps long system prompts and not one-off requests.
The Batch API saves 50 percent on input and output but can take up to 24 hours to return, which rules it out for interactive features.
Regional processing and data-residency endpoints add a 10 percent uplift on the GPT-5.5 and GPT-5.4 models, per OpenAI's data-residency note.
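The regional uplift is a flat multiplier on top of whatever the call already costs. This sketch assumes it applies to every model ID beginning with `gpt-5.5` or `gpt-5.4`, including mini and nano; that scope is an assumption, so confirm it against OpenAI's data-residency note.

```python
DATA_RESIDENCY_UPLIFT = 0.10  # 10 percent, per the article


def with_residency_uplift(model: str, base_cost: float, regional: bool) -> float:
    """Apply the regional-processing uplift to GPT-5.5 and GPT-5.4 family models.

    Assumption: the uplift covers any model ID starting with "gpt-5.5" or
    "gpt-5.4"; other models are returned unchanged.
    """
    if regional and model.startswith(("gpt-5.5", "gpt-5.4")):
        return base_cost * (1 + DATA_RESIDENCY_UPLIFT)
    return base_cost


print(with_residency_uplift("gpt-5.5", 100.0, regional=True))
print(with_residency_uplift("gpt-image-2", 100.0, regional=True))
```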
OpenAI is winding down its self-serve fine-tuning platform. It is closed to new users, and existing fine-tunes bill at higher inference rates than base models.
The Sora 2 video API still appears on the pricing page (sora-2 at $0.10 per second, sora-2-pro up to $0.70 per second at 1080p), but OpenAI's Help Center says the Sora API is scheduled to shut down on September 24, 2026, so do not build long-term on it.
There is no perpetual free tier. New accounts sometimes receive promotional credits, but ongoing use is pay-as-you-go.
We verify these token prices on the first weekend of each calendar quarter against developers.openai.com/api/docs/pricing. Last full pass: May 26, 2026. Next planned: August 1, 2026.
We run production workloads on the OpenAI API, and the rule we repeat to every new engineer is that the model name is rarely the biggest line on the bill. Output volume is. GPT-5.5 reads as a $5 model, but its $30 output rate is where real spend lands once an agent starts generating long responses and reasoning traces.
The cost lever that surprised us most was prompt caching. After we moved a long, stable system prompt into the cache, the input cost of that endpoint fell roughly tenfold overnight, because cached input on GPT-5.5 is $0.50 per 1M versus $5 fresh. If you reuse a big system prompt, this is the single highest-return change you can make.
We default new features to GPT-5.4 mini at $0.75 and $4.50, measure quality, and promote to GPT-5.4 or GPT-5.5 only the specific calls that fail. Routing by task, rather than picking one flagship for everything, cut our blended cost to well under half that of an all-GPT-5.5 setup.
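A blended cost is just a traffic-weighted average of per-call costs. The sketch below compares a routed mix against sending everything to GPT-5.5. The 80/15/5 split and the 2,000-in, 500-out token counts are hypothetical, not our actual traffic, so the ratio it prints illustrates the method rather than reproducing our numbers.

```python
import math

# Standard-tier rates from this article, USD per 1M tokens (input, output).
PRICES = {
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.5": (5.00, 30.00),
}


def blended_cost_per_call(mix: dict, input_tokens: int, output_tokens: int) -> float:
    """Average USD cost per call when traffic is split across models.

    `mix` maps a model ID to its share of calls; shares must sum to 1.
    Every call is assumed to use the same token counts.
    """
    if not math.isclose(sum(mix.values()), 1.0):
        raise ValueError("traffic shares must sum to 1")
    total = 0.0
    for model, share in mix.items():
        input_rate, output_rate = PRICES[model]
        total += share * (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return total


# Hypothetical split: most calls stay on mini, a few are promoted.
routed = blended_cost_per_call({"gpt-5.4-mini": 0.80, "gpt-5.4": 0.15, "gpt-5.5": 0.05}, 2_000, 500)
flagship = blended_cost_per_call({"gpt-5.5": 1.0}, 2_000, 500)
print(f"routed: ${routed:.6f} per call, all-GPT-5.5: ${flagship:.6f} per call")
```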
One honest caveat for non-developers: if you just want to use GPT-5 yourself, do not touch the API. A $20 ChatGPT Plus subscription is cheaper and simpler. The API is for building products, where per-token billing, caching, and batch discounts pay off at scale.
For most builders in 2026, GPT-5.4 mini at $0.75 input and $4.50 output per 1M tokens is the best value, and it is where we start every new project. It handles the bulk of production chat, drafting, classification, and summarization at a fraction of flagship cost. Drop to GPT-5.4 nano at $0.20 and $1.25 for high-volume, simple jobs like routing and tagging where reasoning depth does not matter. Step up to GPT-5.4 at $2.50 and $15 when a task needs real reasoning or long context, and reserve GPT-5.5 at $5 and $30 for coding agents and genuinely hard problems. GPT-5.5 Pro at $30 and $180 is a research-grade tool, not a default. The biggest cost levers are not the model but output volume and caching: cache long system prompts, keep responses tight, and use the Batch API for anything that can wait. When NOT to use the API: if you are not a developer and just want to chat with GPT-5, a $20 per month ChatGPT Plus subscription is cheaper and simpler than wiring up token billing. The API earns its keep when you are building a product, not when you are a single user.
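The tiering above, plus the promote-on-failure habit, can be sketched as a lookup table and an escalation ladder. The task labels, the `pick_model` and `escalate` helpers, and the model ID spellings are all hypothetical. The mapping follows the paragraph above; very simple, high-volume classification could reasonably start on nano instead.

```python
from typing import Optional

DEFAULT_MODEL = "gpt-5.4-mini"

# Hypothetical task labels mapped to the tiers recommended above.
# GPT-5.5 Pro is deliberately absent: it is a research tool, not a default.
ROUTES = {
    "routing": "gpt-5.4-nano",
    "tagging": "gpt-5.4-nano",
    "chat": "gpt-5.4-mini",
    "drafting": "gpt-5.4-mini",
    "classification": "gpt-5.4-mini",
    "summarization": "gpt-5.4-mini",
    "reasoning": "gpt-5.4",
    "long_context": "gpt-5.4",
    "coding_agent": "gpt-5.5",
    "hard_problem": "gpt-5.5",
}

# Cheapest to most capable; used to promote a call that fails a quality check.
ESCALATION_LADDER = ["gpt-5.4-nano", "gpt-5.4-mini", "gpt-5.4", "gpt-5.5"]


def pick_model(task: str) -> str:
    """Return the starting model for a task label, defaulting to mini."""
    return ROUTES.get(task, DEFAULT_MODEL)


def escalate(model: str) -> Optional[str]:
    """Return the next model up the ladder, or None if already at the top."""
    position = ESCALATION_LADDER.index(model)
    if position + 1 < len(ESCALATION_LADDER):
        return ESCALATION_LADDER[position + 1]
    return None


print(pick_model("tagging"), pick_model("something_new"), escalate("gpt-5.4-mini"))
```

Defaulting unknown tasks to mini mirrors the "start on mini, measure, promote" rule: a new feature only moves up the ladder after it demonstrably fails.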
The OpenAI API is pay-as-you-go by the token, with no monthly fee. On the Standard tier per 1 million tokens, GPT-5.4 nano is $0.20 input and $1.25 output, mini is $0.75 and $4.50, GPT-5.4 is $2.50 and $15, GPT-5.5 is $5 and $30, and GPT-5.5 Pro is $30 and $180. Cached input is about ten times cheaper. These rates were verified on developers.openai.com on May 26, 2026.
GPT-5.4 nano is the cheapest model in the GPT-5 class at $0.20 per 1M input tokens and $1.25 per 1M output tokens, with cached input at just $0.02. It is built for high-volume, latency-sensitive work like classification, routing, and tagging. It trades reasoning depth for cost, so it is not the right choice for complex analysis or coding, where mini or GPT-5.4 earn their higher price.
Input tokens are what you send the model, including your prompt and any context. Output tokens are what the model generates back. Output almost always costs more. On GPT-5.5, input is $5 per 1M and output is $30 per 1M, a 6x gap. Reasoning models also bill their hidden reasoning steps as output tokens, which is why a chatty or heavily reasoning model can cost much more than the input figure suggests.
When the start of your prompt repeats across requests, OpenAI can serve that prefix from cache at roughly ten times less than fresh input. For GPT-5.5 that is $0.50 per 1M cached versus $5 fresh. The discount applies to repeated prompt prefixes, so it rewards long, stable system prompts reused across calls. One-off requests with unique prompts see no caching benefit.
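Here is a simplified model of the prefix discount on GPT-5.5, assuming the first call pays full price and every later call hits the cache for the whole shared prefix. Real hit rates vary, and output tokens are left out. The workload of 1,000 calls with a 10,000-token system prompt and 500 unique tokens each is invented.

```python
# GPT-5.5 rates from this article, USD per 1M tokens.
FRESH_INPUT = 5.00
CACHED_INPUT = 0.50


def input_cost(requests: int, shared_prefix_tokens: int, unique_tokens: int,
               caching: bool) -> float:
    """Input-side USD cost of `requests` calls that share one prompt prefix.

    Simplifying assumption: the first call is billed fresh in full, and every
    later call is billed at the cached rate for the entire shared prefix.
    """
    if requests <= 0:
        return 0.0
    if not caching:
        return requests * (shared_prefix_tokens + unique_tokens) * FRESH_INPUT / 1_000_000
    fresh = (shared_prefix_tokens + unique_tokens) + (requests - 1) * unique_tokens
    cached = (requests - 1) * shared_prefix_tokens
    return (fresh * FRESH_INPUT + cached * CACHED_INPUT) / 1_000_000


print(input_cost(1_000, 10_000, 500, caching=False))
print(input_cost(1_000, 10_000, 500, caching=True))
```

In this example the saving is about sevenfold rather than a clean tenfold, because the unique part of each prompt is still billed fresh; the closer the prefix comes to making up the whole prompt, the closer the saving gets to 10x.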
The Batch API runs your requests asynchronously and returns results within 24 hours, in exchange for a 50 percent discount on both input and output tokens. It is ideal for bulk jobs like generating embeddings, classifying large datasets, or producing content overnight. It is the wrong tool for anything interactive, since users will not wait hours. For real-time features, stay on the Standard tier.
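A minimal sketch of the Standard-versus-Batch decision for a bulk classification job on GPT-5.4 nano. The job size (one million items, 300 input and 5 output tokens each) is hypothetical; the 50 percent discount and the rule that interactive work stays on Standard come from the text above.

```python
# GPT-5.4 nano Standard-tier rates from this article, USD per 1M tokens.
NANO_INPUT = 0.20
NANO_OUTPUT = 1.25
BATCH_DISCOUNT = 0.50  # Batch API: 50 percent off input and output


def job_cost(items: int, input_per_item: int, output_per_item: int, use_batch: bool) -> float:
    """USD cost of a bulk nano job on Standard, or on Batch at half price."""
    standard = items * (input_per_item * NANO_INPUT + output_per_item * NANO_OUTPUT) / 1_000_000
    return standard * (1 - BATCH_DISCOUNT) if use_batch else standard


def choose_tier(interactive: bool) -> str:
    """Batch results can take up to 24 hours, so anything a user waits on stays Standard."""
    return "standard" if interactive else "batch"


print(job_cost(1_000_000, 300, 5, use_batch=False))
print(job_cost(1_000_000, 300, 5, use_batch=True))
```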
No, there is no perpetual free tier. The API is pay-as-you-go. OpenAI has at times granted small promotional credits to new accounts, but you cannot rely on a standing free allowance. If you only want to chat with GPT-5 as an individual rather than build software, a ChatGPT Plus subscription at $20 per month is cheaper and simpler than setting up API billing.
Tools bill on top of token usage. Web search is $10 per 1,000 calls on reasoning models, plus search content tokens at model rates, and $25 per 1,000 calls on the non-reasoning preview. File search is $2.50 per 1,000 calls plus $0.10 per GB per day of storage, with 1 GB free. Code Interpreter and Hosted Shell containers run from $0.03 for 1 GB to $1.92 for 64 GB per 20-minute session.
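Tool fees stack on top of token costs, so a monthly estimate needs its own line. This sketch uses the reasoning-model web search rate and the file search rates above. It assumes the free 1 GB of storage applies once across everything you store, and it excludes search content tokens and Code Interpreter containers; the usage figures in the example are hypothetical.

```python
WEB_SEARCH_PER_1K_CALLS = 10.00   # reasoning models; search content tokens billed separately
FILE_SEARCH_PER_1K_CALLS = 2.50
FILE_STORAGE_PER_GB_DAY = 0.10
FREE_STORAGE_GB = 1.0             # assumed to apply once across all stored data


def monthly_tool_fees(web_search_calls: int, file_search_calls: int,
                      stored_gb: float, days: int = 30) -> float:
    """Tool surcharges in USD, excluding search content tokens and containers."""
    web = web_search_calls / 1_000 * WEB_SEARCH_PER_1K_CALLS
    file_calls = file_search_calls / 1_000 * FILE_SEARCH_PER_1K_CALLS
    billable_gb = max(0.0, stored_gb - FREE_STORAGE_GB)
    storage = billable_gb * FILE_STORAGE_PER_GB_DAY * days
    return web + file_calls + storage


# Hypothetical month: 20,000 web searches, 50,000 file searches, 6 GB stored.
print(f"${monthly_tool_fees(20_000, 50_000, 6.0):,.2f}")
```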
Image generation with gpt-image-2 is $8 per 1M image input tokens and $30 per 1M output, with a smaller gpt-image-1-mini at $2.50 and $8. Realtime audio on gpt-realtime-2 is $32 input and $64 output per 1M, with gpt-realtime-mini at $10 and $20. Transcription is about $0.006 per minute on gpt-4o-transcribe. Sora 2 video is $0.10 per second, though its API is set to shut down on September 24, 2026.
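Per-minute and per-second media rates convert to a budget the same way. This sketch covers only transcription on gpt-4o-transcribe (at the approximate $0.006 per minute above) and sora-2 video, and it assumes proportional billing with no per-minute or per-clip rounding. Given the announced shutdown, the Sora figure is only useful for short-lived work.

```python
TRANSCRIBE_PER_MINUTE = 0.006  # approximate, per the article
SORA2_PER_SECOND = 0.10        # sora-2; API scheduled to shut down September 24, 2026


def transcription_cost(audio_seconds: float) -> float:
    """USD to transcribe `audio_seconds` of audio, assuming no rounding up to whole minutes."""
    return audio_seconds / 60 * TRANSCRIBE_PER_MINUTE


def sora2_cost(clip_seconds: float, clips: int) -> float:
    """USD to generate `clips` sora-2 videos of `clip_seconds` each."""
    return clip_seconds * clips * SORA2_PER_SECOND


print(f"100 hours of audio: ${transcription_cost(100 * 3600):.2f}")
print(f"20 ten-second clips: ${sora2_cost(10, 20):.2f}")
```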
They bill differently. ChatGPT Plus is a flat $20 per month for a person using the chat app. The API charges per token for software you build. If you are a single user chatting, Plus is far cheaper. If you are powering a product that makes thousands of calls, the API scales with usage and gives you model choice, caching, and batch discounts that a subscription does not. Most teams use Plus for staff and the API for product.
Combine four levers. Use the smallest model that passes your quality bar, often GPT-5.4 mini or nano. Cache stable system prompts to cut input cost roughly tenfold. Send anything non-interactive through the Batch API for a 50 percent discount. And keep outputs short, since output tokens cost several times more than input. Together these typically cut a naive bill by more than half without touching quality on most workloads.
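To see how the levers combine, here is a sketch that prices one invented monthly workload two ways: naively on GPT-5.5, and with a smaller model, a cached system prompt, tighter output, and Batch for the background job. Several assumptions are built in and labeled in the code: GPT-5.4 mini is assumed to pass the quality bar, its cached rate is approximated as one tenth of fresh input (the article gives no exact figure for mini), and every interactive call is assumed to hit the cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    input_rate: float   # USD per 1M fresh input tokens
    output_rate: float  # USD per 1M output tokens

    @property
    def cached_rate(self) -> float:
        # Approximation: the article says cached input is "roughly ten times
        # cheaper" than fresh input; real per-model cached rates may differ.
        return self.input_rate / 10


GPT55 = Rates(input_rate=5.00, output_rate=30.00)
MINI = Rates(input_rate=0.75, output_rate=4.50)


def cost(rates: Rates, calls: int, prefix: int, unique: int, output: int,
         cache: bool = False, batch: bool = False) -> float:
    """USD cost of `calls` similar requests.

    `prefix` tokens are a stable system prompt, billed at the cached rate when
    `cache` is True (assuming every call hits the cache); `unique` tokens are
    always fresh. `batch` applies the Batch API's 50 percent discount.
    """
    prefix_rate = rates.cached_rate if cache else rates.input_rate
    per_call = prefix * prefix_rate + unique * rates.input_rate + output * rates.output_rate
    total = calls * per_call / 1_000_000
    return total * 0.5 if batch else total


# Hypothetical month: an interactive endpoint with an 8,000-token system
# prompt, plus a background job that can tolerate Batch latency.
naive = (
    cost(GPT55, calls=100_000, prefix=8_000, unique=400, output=600)
    + cost(GPT55, calls=50_000, prefix=0, unique=1_000, output=200)
)
optimized = (
    cost(MINI, calls=100_000, prefix=8_000, unique=400, output=300, cache=True)
    + cost(MINI, calls=50_000, prefix=0, unique=1_000, output=200, batch=True)
)
print(f"naive:     ${naive:,.2f}")
print(f"optimized: ${optimized:,.2f}")
```

Most of the gap in this invented workload comes from the model swap, so treat the output as a template: plug in your own token counts and only keep the levers your quality checks allow.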
Only for the hardest tasks. GPT-5.5 at $5 input and $30 output is double the cost of GPT-5.4 at $2.50 and $15, and the quality gap shows up mainly on complex coding, multi-step reasoning, and agentic work. For everyday chat, drafting, and summarization, GPT-5.4 or even mini deliver near-identical results for less. Reserve GPT-5.5 for the slice of traffic that genuinely needs it, and route the rest to cheaper models.
OpenAI changes API rates several times a year as models launch and older ones drop in price, so a token-pricing page dates faster than a consumer plan would. We re-confirm every number here against developers.openai.com once a quarter, and the upcoming pass falls in August 2026. When this page and OpenAI disagree, OpenAI is correct, so check the live docs before committing a budget.