OpenAI Launches GPT-5.4 Mini and Nano for High-Volume AI Workloads
OpenAI dropped its most cost-efficient models yet on March 17, 2026—GPT-5.4 mini and nano—targeting developers building latency-sensitive applications where the flagship model's horsepower becomes overkill.
The mini variant runs more than twice as fast as GPT-5 mini while approaching the full GPT-5.4's performance on coding benchmarks. On SWE-Bench Pro, mini scored 54.4% compared to the flagship's 57.7%—a narrow gap that matters when you're paying 75 cents per million input tokens instead of premium rates.
Nano goes even cheaper at $0.20 per million input tokens and $1.25 per million output tokens. OpenAI positions it for classification, data extraction, and what they call "coding subagents"—smaller AI workers handling simpler tasks within larger systems.
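To make the per-million-token rates concrete, here is a minimal cost estimator using only the nano prices quoted above ($0.20 per million input tokens, $1.25 per million output tokens); the function name and the example workload are illustrative, not an OpenAI API.

```python
def nano_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate a GPT-5.4 nano bill from the quoted per-million rates:
    $0.20 per million input tokens, $1.25 per million output tokens."""
    return input_tokens / 1_000_000 * 0.20 + output_tokens / 1_000_000 * 1.25

# A hypothetical classification workload: 10M input tokens, 1M output tokens.
print(round(nano_cost_usd(10_000_000, 1_000_000), 2))  # 3.25
```

At these rates, even a ten-million-token batch job lands in single-digit dollars, which is the kind of arithmetic driving the high-volume positioning.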
The Subagent Play
Here's where this gets interesting for developers building agentic systems. OpenAI is explicitly pushing a tiered architecture: let GPT-5.4 handle planning and complex judgment while mini or nano subagents execute narrower tasks in parallel. In their Codex platform, mini uses only 30% of the GPT-5.4 quota.
The benchmark numbers back this up. Mini hit 72.1% on OSWorld-Verified for computer use tasks—nearly matching the flagship's 75%—while nano dropped to 39%. Translation: mini can interpret screenshots and navigate interfaces almost as well as the big model, but nano shouldn't touch those workflows.
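The tiered architecture OpenAI describes can be sketched as a simple routing table: the flagship keeps planning and judgment, mini takes coding and computer-use work, and nano handles classification and extraction. The model identifiers and task categories below are illustrative assumptions, not a published routing scheme.

```python
# Sketch of tiered subagent routing: send each task to the cheapest
# tier the benchmarks above support. Names are illustrative.
TIER_FOR_TASK = {
    "planning": "gpt-5.4",           # complex judgment stays on the flagship
    "coding": "gpt-5.4-mini",        # near-flagship SWE-Bench Pro score, lower cost
    "computer_use": "gpt-5.4-mini",  # nano's 39% OSWorld result rules it out here
    "classification": "gpt-5.4-nano",
    "extraction": "gpt-5.4-nano",
}

def pick_model(task_type: str) -> str:
    """Route a task to a model tier; fall back to the flagship when unsure."""
    return TIER_FOR_TASK.get(task_type, "gpt-5.4")

print(pick_model("classification"))  # gpt-5.4-nano
```

In practice the planner would spawn these cheaper calls in parallel, which is exactly the quota arithmetic the Codex numbers hint at.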
Where Each Model Fits
The performance spread tells you exactly what OpenAI optimized for:
Mini excels at coding (54.4% SWE-Bench Pro, 60% Terminal-Bench 2.0) and tool-calling (93.4% on τ2-bench telecom tasks). It supports a 400k context window with text and image inputs, web search, and function calling.
Nano trades capability for cost efficiency. It scored 52.4% on SWE-Bench Pro and 46.3% on Terminal-Bench 2.0, respectable numbers for a model priced at roughly a quarter of mini's input rate. But its long-context performance drops sharply, hitting just 33.1% on the 128K-256K needle-retrieval test.
Hebbia's CTO Aabhas Sharma noted that mini "matched or exceeded competitive models on several output tasks and citation recall at a much lower cost" while achieving "stronger source attribution than the larger GPT-5.4 model."
Availability
Mini is live across the API, Codex, and ChatGPT. Free and Go users can access it through the Thinking feature; other tiers get it as a rate-limit fallback for GPT-5.4 Thinking.
Nano remains API-only—a signal that OpenAI sees it primarily as infrastructure for developers rather than a consumer-facing product.
For teams running high-volume AI workloads, the math just changed. The question isn't whether to use smaller models anymore—it's figuring out which tasks actually need the flagship.