Post

GPT-5.5 Is Here — Everything You Need to Know

GPT-5.5 Is Here — Everything You Need to Know

OpenAI dropped GPT-5.5 yesterday — April 23, 2026 — just seven weeks after GPT-5.4. Internally codenamed “Spud”, it’s the first fully retrained base model since GPT-4.5, and OpenAI is calling it “a new class of intelligence for real work.” Here’s everything that actually matters.


What Is GPT-5.5?

GPT-5.5 is OpenAI’s latest frontier model, designed primarily for agentic and multi-step work — the kind of tasks where previous models needed constant hand-holding. Greg Brockman, OpenAI’s President, described it as “a big step towards more agentic and intuitive computing” and a foundation for OpenAI’s long-discussed super app — a unified platform combining ChatGPT, Codex, and an AI browser into one tool for enterprise customers.

The model comes in two variants:

VariantDescriptionAvailable To
GPT-5.5 (Thinking)Faster, smarter answers for complex workPlus, Pro, Business, Enterprise
GPT-5.5 ProDeepest reasoning, best for demanding tasksPro, Business, Enterprise only

What’s New and Actually Better

Agentic Coding — The Biggest Jump

GPT-5.5 is now the strongest coding model OpenAI has ever shipped. You can hand it a messy, multi-part engineering problem without step-by-step prompting — it plans, uses tools, checks its own work, and works toward a result autonomously.

Early testers reported saving up to 10 hours of work per week on engineering tasks. One founder described asking GPT-5.5 to diagnose and rewrite a broken system that had stumped his best engineer for days — GPT-5.4 couldn’t do it; GPT-5.5 could.

With GPT-5.5, Codex now goes further too — it can interact with web apps, test flows, click through pages, capture screenshots, and iterate on what it sees until the task is done.

Long-Context — A Massive Leap

This is arguably the most impressive improvement. On the MRCR v2 benchmark — which tests how reliably a model finds information buried deep in very long documents:

ModelMRCR v2 Score (512K–1M tokens)
GPT-5.436.6%
GPT-5.574.0%

That’s more than doubling long-context performance in a single generation. For anyone working with large codebases, legal documents, research papers, or lengthy contracts — this is a practical, immediate upgrade.

Speed — Same Latency, More Intelligence

Bigger models are usually slower. GPT-5.5 is an exception — it matches GPT-5.4’s per-token latency in real-world serving while delivering significantly higher quality. It also uses ~40% fewer tokens to complete the same tasks, which partially offsets the higher API price.

Computer Use

GPT-5.5 understands intent better, making it faster and more reliable at complex multi-step computer use tasks — navigating GUIs, filling forms, reading screens, and completing workflows with less guidance required.


Benchmarks vs The Competition

BenchmarkGPT-5.5Claude Opus 4.7Gemini 3.1 Pro
Terminal-Bench 2.0 (agentic coding)82.7%69.4%68.5%
FrontierMath Tier 4 (advanced math)35.4%22.9%16.7%
GDPval (knowledge work, 44 occupations)84.9%
SWE-Bench Pro (GitHub issue resolution)58.6%64.3%
MCP Atlas (tool use, Scale AI)75.3%79.1%78.2%
BrowseComp (web research)84.4%85.9%
AI Intelligence Index#1 — 60 points57 points57 points

GPT-5.5 leads on coding, math, and agentic work. Claude Opus 4.7 still wins on real-world GitHub issue resolution (SWE-Bench Pro) and tool use (MCP Atlas). Gemini 3.1 Pro edges ahead on web research and vision tasks.

⚠️ One Weakness — Hallucinations

Despite topping the accuracy benchmark (57% on AA Omniscience), GPT-5.5 has an 86% hallucination rate — the highest of the three frontier models tested. Claude Opus 4.7 sits at 36% and Gemini 3.1 Pro at 50%. For factual recall tasks where fabricated answers are costly, this is worth keeping in mind.


Pricing

ChatGPT Subscription (No Change)

PlanMonthly PriceGPT-5.5 Access
Free$0❌ Not available
Plus$20/month✅ GPT-5.5 Thinking
Pro$200/month✅ GPT-5.5 Thinking + Pro
Business / EnterpriseCustom✅ Full access

API Pricing (Coming Very Soon)

ModelInput (per 1M tokens)Output (per 1M tokens)
GPT-5.4$2.50$15.00
GPT-5.5$5.00$30.00
GPT-5.5 Pro$30.00$180.00

The API price doubled on paper — but because GPT-5.5 uses ~40% fewer tokens to complete the same tasks, the net real-world cost increase is closer to 20%. Batch and Flex modes are available at half the standard rate. Priority processing is 2.5x.

💡 Sam Altman’s take on pricing: Token efficiency gains mean the effective cost increase is much smaller than the sticker suggests — GPT-5.5 completes Codex tasks in fewer passes, meaning cheaper total runs despite the higher per-token rate.

Codex Bonus for Pro Users

OpenAI is giving Pro users 2x Codex usage through May 31, 2026 as a launch incentive.


What Hasn’t Changed

  • Same 1M token context window on the API (400K in Codex)
  • Same text + image input modalities
  • Same Responses and Chat Completions endpoints
  • API not live yet — coming “very soon” pending additional cybersecurity safeguards

The Super App Vision

Beyond the model itself, OpenAI used this launch to move closer to its “super app” ambition — a unified platform combining ChatGPT, Codex, and an AI browser for enterprise customers. Brockman described GPT-5.5 as “setting the foundation for how we’re going to use computers going forward.”

With 900 million weekly active users, 50 million subscribers, 9 million paying business users, and 4 million active Codex users, OpenAI is betting that GPT-5.5 also puts to rest the recent narrative that Anthropic has overtaken them in the enterprise race.


Who Should Upgrade?

Use CaseRecommendation
Hard coding, agentic tasks, multi-step work✅ Upgrade to GPT-5.5
Long document analysis, large codebases✅ Upgrade — long-context is dramatically better
High-volume summarization, classification⏳ Stick with GPT-5.4 or 5.4-mini
Correctness-critical legal / financial work✅ GPT-5.5 Pro (but watch the hallucination rate)
Free tier users❌ No access — Plus plan required

Tips

  • Don’t switch everything at once — route easy, high-volume tasks (classification, summaries) to GPT-5.4-mini; escalate hard ones to GPT-5.5. A smart router saves more than any prompt-level optimization.
  • Use Batch mode for offline evaluations, backfills, and nightly jobs — it’s 50% cheaper than standard API pricing.
  • Watch your reasoning tokens — the biggest billing surprises on GPT-5.5 come from reasoning-token spend at high effort. Set hard max_output_tokens caps on every call.
  • Cache your system prompts — cached input tokens are billed at a fraction of the standard rate. Anything reused across requests (system prompts, tool schemas, repo context) should be cached.
  • SWE-Bench Pro caveat — Claude Opus 4.7 still wins on real-world GitHub issue resolution at 64.3% vs 58.6%. If your primary use case is GitHub issue resolution at scale, benchmark both models on your own data before switching.
  • API isn’t live yet — if you need API access right now, GPT-5.4 is still the option. GPT-5.5 API is coming very soon per OpenAI.

Quick Reference

  
ReleasedApril 23, 2026
Internal codenameSpud
Weeks since GPT-5.47 weeks
Biggest improvementLong-context reasoning (36.6% → 74.0% on MRCR v2)
Best benchmark winTerminal-Bench 2.0 coding — 82.7% (vs 69.4% Claude)
Known weakness86% hallucination rate (vs 36% for Claude Opus 4.7)
API price vs GPT-5.42x sticker / ~20% net after token efficiency
Token efficiency~40% fewer tokens vs GPT-5.4
ChatGPT availabilityPlus, Pro, Business, Enterprise — April 23, 2026
API availabilityComing very soon
Free tier access❌ Not available
This post is licensed under CC BY 4.0 by the author.