Claude's self-improving agents, Codex runs in Chrome, and Anthropic's 300MW SpaceX deal
Plus: GPT-5.5 Instant cuts hallucinations by 52%, Subquadratic debuts a 12M-token context window, and Jupiter-v1-p enters red teaming.
Welcome back to Ampli AI — and welcome to the new readers who joined this week.
This week, the major labs shipped in parallel. Anthropic used its Code with Claude conference to launch self-improving agents with dreaming, outcomes, and multi-agent orchestration — features that make Claude agents meaningfully better at long-running tasks. OpenAI updated ChatGPT’s default model to GPT-5.5 Instant, cutting hallucinations by 52% and verbosity by 30%. And a Miami startup called Subquadratic came out of stealth with a 12-million-token context window, claiming 1,000x efficiency gains over standard attention. Meanwhile, the enterprise race heated up with both Anthropic and OpenAI launching PE-backed ventures, and China’s government moved to back DeepSeek at a $50B valuation.
TL;DR (30 sec)
Claude launched self-improving agents — dreaming, outcomes, and multi-agent orchestration went live at Code with Claude
Subquadratic debuted a 12M-token context window with $29M in seed funding, claiming 1,000x efficiency gains
OpenAI released GPT-5.5 Instant as ChatGPT’s new default — 52% fewer hallucinations, 30% shorter responses
Anthropic is red-teaming Jupiter-v1-p, a Claude 5-class model, ahead of its developer conference
AlphaEvolve, Google’s Gemini-powered coding agent, is now scaling impact across scientific fields
Codex now runs directly in Chrome on macOS and Windows — parallel tabs, no browser takeover
Anthropic signed a 300+ MW compute deal with SpaceX and doubled Claude Code rate limits
Both Anthropic ($1.5B) and OpenAI ($10B target) launched PE-backed enterprise AI ventures
DeepSeek in talks to raise billions from China’s National AI Fund at $50B valuation
Apple is planning to let iOS 27 users choose between multiple AI models
This Week’s Big Stories
Must-Read
Claude adds Self-Improving Agents — Anthropic shipped three features that change how managed agents learn and operate. Dreaming lets agents review past sessions to extract patterns and self-improve between runs. Outcomes let developers define success rubrics so agents self-correct until output meets the bar. And multi-agent orchestration enables a lead agent to delegate work to specialist agents running in parallel on a shared filesystem. Harvey, Netflix, Spiral, and Wisedocs are already using these in production. This is Anthropic moving agents from “do the thing” to “get better at doing the thing.”
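To make the outcomes pattern concrete, here is a minimal sketch of a generate-grade-retry loop. Everything here is illustrative: `meets_rubric`, `run_with_outcome`, and the rubric fields are hypothetical names, not Anthropic's actual API.

```python
# Sketch of the "outcomes" pattern: call a generator repeatedly until
# its output passes a developer-defined rubric. All names are
# illustrative, not Anthropic's real interface.

def meets_rubric(output: str, rubric: dict) -> bool:
    """Toy grader: require certain phrases and enforce a length cap."""
    has_terms = all(term in output for term in rubric["must_include"])
    within_length = len(output) <= rubric["max_chars"]
    return has_terms and within_length

def run_with_outcome(generate, rubric: dict, max_attempts: int = 3) -> str:
    """Call generate(feedback) until the rubric passes or attempts run out."""
    feedback = ""
    output = ""
    for _ in range(max_attempts):
        output = generate(feedback)
        if meets_rubric(output, rubric):
            return output
        feedback = "Previous attempt failed the rubric; revise."
    return output  # best effort after max_attempts

# Usage with a stub "model" that improves once it sees feedback:
def stub_model(feedback: str) -> str:
    return "summary: ok" if feedback else "draft"

result = run_with_outcome(stub_model, {"must_include": ["summary"], "max_chars": 100})
```

In a real deployment the grader would itself be a model call scoring against the rubric; the retry loop is the part that turns a one-shot agent into a self-correcting one.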
Subquadratic debuts a 12-million-token context window — Subquadratic launched out of stealth with $29M in seed funding and SubQ, the first frontier LLM that doesn’t use quadratic attention. The model uses Subquadratic Sparse Attention (SSA) — content-dependent token selection that scales linearly instead of quadratically with context length. At 1M tokens, it’s reportedly 50x faster and 50x cheaper than frontier competitors. Caveat worth noting: no public paper or model weights yet, and researchers are asking for independent reproduction.
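Since SSA itself is unpublished, here is only a generic sketch of the underlying idea of content-dependent sparse attention: each query attends to just its top-k keys, so attention cost grows with n·k rather than n². Note the honest caveat in the code: this toy still scores every key to pick the top-k, whereas a genuinely subquadratic method must also make the selection step cheap.

```python
import math

# Toy content-dependent sparse attention: each query keeps only its
# top-k keys by dot-product score, then softmaxes over that subset.
# Generic illustration only, not SubQ's actual (unpublished) SSA.
# Caveat: scoring all keys to find the top-k is itself O(n) per query;
# a real subquadratic method selects candidates without scoring every pair.

def sparse_attention(queries, keys, values, k=2):
    out = []
    for q in queries:
        # Score every key against this query (selection step).
        scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
        top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        # Softmax over only the selected scores.
        exps = [math.exp(scores[i]) for i in top]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the selected values only.
        dim = len(values[0])
        out.append([sum(w * values[i][d] for w, i in zip(weights, top))
                    for d in range(dim)])
    return out
```

With k fixed as context length grows, the attend-and-mix work per query stays constant, which is where the claimed linear scaling would come from.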
GPT-5.5 Instant becomes ChatGPT’s default — OpenAI replaced GPT-5.3 Instant as ChatGPT’s default model. The key numbers: 52.5% fewer hallucinated claims on high-stakes prompts (medicine, law, finance), 37.3% fewer inaccurate claims on user-flagged conversations, and 30% shorter responses. Enhanced personalization from past chats and connected Gmail is rolling out to Plus and Pro users. The model is also available in the API as chat-latest.
Anthropic tests Jupiter-v1-p ahead of developer conference — Anthropic was spotted red-teaming a new internal build codenamed Jupiter-v1-p ahead of Code with Claude. Internal model naming at Anthropic separates planet-codename safety probes from incremental point releases — Jupiter suggests a Claude 5-class model, the first new base model family since Claude 4. The red team round is consistent with Anthropic’s responsible scaling policy. No public release yet, but the timing with the conference is hard to ignore.
AlphaEvolve: Gemini-powered coding agent scaling impact — Google DeepMind’s AlphaEvolve is a coding agent that designs advanced algorithms and has started making discoveries on open problems across mathematics, computer science, and other fields. This is one of the clearest demonstrations yet of AI agents doing useful scientific work — not just generating code, but finding solutions humans hadn’t.
Codex now works directly in Chrome — OpenAI’s Codex can now run directly in Chrome on macOS and Windows, working in parallel across tabs without taking over the browser. It writes code under the hood to navigate structured pages and handle data flows. This makes browser-based automation a first-class Codex capability.
Anthropic signs SpaceX compute deal and doubles Claude limits — Anthropic secured a 300+ MW compute partnership with SpaceX’s Colossus data center, adding to capacity from Microsoft, NVIDIA, and Fluidstack. Claude Code rate limits were doubled across Pro, Max, Team, and Enterprise plans, and peak-hours throttling on Pro/Max was removed. The company also announced plans for international expansion to serve regulated industries.
Anthropic and OpenAI Launch Enterprise AI Ventures — Both companies announced separate enterprise AI ventures backed by major financial firms in the same week. Anthropic’s is valued at $1.5B; OpenAI’s is targeting a $10B valuation. The private equity money flowing into enterprise AI services signals that the deployment phase is getting real.
Apple Explores Multi-Model AI in iOS 27 — Apple is planning to let iOS 27 users choose from multiple AI models instead of being locked to one provider. This would make Apple the first major platform to treat AI models as interchangeable — a significant architectural bet and a potential distribution channel for every model provider.
Natural Language Autoencoders — Anthropic published research on NLAs — a method to translate model activations into human-readable language and back. This is interpretability research that could eventually let you read what an AI model is “thinking” in plain English rather than opaque vectors.
China to Invest in DeepSeek at $50 Billion Valuation — DeepSeek is in talks to raise several billion from China’s National Artificial Intelligence Industry Investment Fund at a $50B valuation. This government-backed fund has around $8.8B in capital. The investment would mark China’s most direct bet yet on a single AI lab.
Anthropic committed to $200B in cloud spending over five years — Reports emerged that Anthropic has committed to spending $200B on cloud services over the next five years, boosting Alphabet stock on the news. The scale of the commitment underscores just how compute-intensive frontier model training and inference have become.
Replit: Cursor deal, fighting Apple, and why Amjad Masad would rather not sell — Replit’s Amjad Masad says the company is nearing a billion-dollar annual run rate with 300% net revenue retention and positive gross margins — a contrast to Cursor’s negative margins. Masad is committed to independence but acknowledges conversations with potential acquirers, and expressed frustration with Apple’s App Store practices.
Build Tips & Engineering
MRC: Supercomputer networking for 100K+ GPU clusters — OpenAI partnered with AMD, Broadcom, Intel, Microsoft, and NVIDIA to develop MRC (Multipath Reliable Connection), a protocol that enables 100K+ GPU clusters using only two tiers of Ethernet switches. Already deployed across OpenAI’s largest GB200 supercomputers. Released through the Open Compute Project.
ProgramBench — A benchmark that challenges agents to recreate software executables without source code, using only documentation and experimentation. Over 248,000 behavioral tests across 200 tasks, from terminal utilities to compilers. Agents must design and implement entirely from scratch in a sandboxed environment.
Improving token efficiency in GitHub Agentic Workflows — GitHub’s deep dive into how they reduced token consumption in Copilot’s agentic workflows while maintaining quality. Practical lessons for anyone building agents that need to stay within token budgets.
How AI agent memory works — A comprehensive walkthrough of how modern agent memory systems are designed — short-term, long-term, episodic, and semantic memory layers, and how they interact during multi-turn tasks.
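As a rough illustration of the layered design that walkthrough describes, here is a toy memory with a bounded short-term buffer and a long-term store searched at prompt-build time. The class and method names are hypothetical, and real systems use embeddings and vector search rather than keyword overlap.

```python
from collections import deque

# Toy layered agent memory: a bounded short-term buffer of recent turns,
# plus a long-term store searched by simple word overlap. Illustrative
# only; production systems use embeddings and vector retrieval.

class AgentMemory:
    def __init__(self, short_term_size: int = 4):
        self.short_term = deque(maxlen=short_term_size)  # recent turns
        self.long_term = []                              # persistent notes

    def observe(self, turn: str):
        """Record a turn; turns evicted from short-term graduate to long-term."""
        if len(self.short_term) == self.short_term.maxlen:
            self.long_term.append(self.short_term[0])
        self.short_term.append(turn)

    def recall(self, query: str, k: int = 2):
        """Return the k long-term notes sharing the most words with the query."""
        q = set(query.lower().split())
        ranked = sorted(self.long_term,
                        key=lambda note: len(q & set(note.lower().split())),
                        reverse=True)
        return ranked[:k]

    def context(self, query: str):
        """Prompt context = retrieved long-term notes + recent turns."""
        return self.recall(query) + list(self.short_term)
```

The interaction the article describes happens in `context()`: retrieval pulls relevant long-term material back alongside the short-term window before each model call.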
Inside OpenAI’s Low-Latency Voice Infrastructure — OpenAI rebuilt its WebRTC stack with a split relay + transceiver architecture for real-time voice AI. Useful reading if you’re building anything with streaming audio.
Computer use is 45x more expensive than structured APIs — Reflex benchmarked computer-use agents against structured API approaches and found a 45x cost gap. Worth considering before you reach for browser automation when an API exists.
Accelerating Gemma 4 with multi-token prediction — Google’s approach to faster Gemma 4 inference using multi-token prediction drafting — speculative decoding that predicts multiple tokens ahead to reduce generation latency.
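The speculative-decoding idea behind that technique can be sketched in a few lines: a cheap draft model proposes several tokens, the full model verifies them in one pass, and the longest agreeing prefix is accepted. This is a generic sketch of the pattern, not Google's Gemma 4 implementation, and the model stubs are hypothetical.

```python
# Generic sketch of speculative decoding with a multi-token draft.
# draft_model(context, n) and target_model(context, n) each return a
# list of n tokens; both are stand-ins for real models.

def speculative_decode(draft_model, target_model, prompt, n_tokens, draft_len=4):
    out = list(prompt)
    target_len = len(prompt) + n_tokens
    while len(out) < target_len:
        draft = draft_model(out, draft_len)       # cheap multi-token proposal
        verified = target_model(out, len(draft))  # target's own tokens
        # Accept the longest agreeing prefix, plus one target token so
        # progress is guaranteed even on full disagreement.
        accepted = []
        for d, t in zip(draft, verified):
            if d != t:
                break
            accepted.append(d)
        if len(accepted) < len(verified):
            accepted.append(verified[len(accepted)])
        out.extend(accepted)
    return out[:target_len]
```

The latency win comes from the target model checking several proposed tokens per forward pass instead of emitting one at a time; when the draft agrees often, generation speeds up with no change in output.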
Tools & Product Updates
OpenAI Realtime Audio Models — Three new models in the API: GPT-Realtime-2 for conversational reasoning, GPT-Realtime-Translate for live multilingual translation, and GPT-Realtime-Whisper for streaming transcription.
Anthropic Orbit — proactive assistant — Orbit is a briefing and insights system in Claude and Claude Code that produces personalized briefings with actionable insights from connected work tools. Spotted in testing ahead of the Code with Claude conference.
Perplexity Personal Computer on Mac — Perplexity’s Personal Computer feature is now available to all Mac users, bringing system-level integration for search and assistance.
Meta Hatch AI Agent — Meta is developing Hatch, an AI agent with social skills, positioned as a consumer-grade competitor to OpenAI’s Operator. Currently in waitlist-based access.
Gemini API File Search is now multimodal — Google’s Gemini API File Search now supports multimodal queries, enabling more efficient and verifiable RAG pipelines.
Gemini API Webhooks — Google AI Studio introduced webhooks for the Gemini API to handle long-running jobs without polling.
Vercel Deepsec — Vercel shipped Deepsec, a security tool that scans codebases for vulnerabilities and suggests fixes. Used internally at Vercel.
Trusted Contact for ChatGPT — OpenAI added a Trusted Contact feature to ChatGPT, adding a safety layer for account recovery and verification.
🧬 Model Releases
GPT-5.5 Instant — OpenAI’s new default ChatGPT model. 52% fewer hallucinations, 30% shorter responses, enhanced personalization. Available in the API as chat-latest. Replaces GPT-5.3 Instant.
Jupiter-v1-p (testing) — Anthropic’s Claude 5-class model spotted in red teaming. No public release yet, but the planet codename signals a new base model family.
Gemini Flash upgrades — Google is testing upgrades for Gemini Flash, with a candidate performing competitively against Gemini 3.1 Pro on LM Arena. Signs suggest a Flash 3.2 rollout. Users are being transitioned from Gemini 2 Flash to 3/3.1 Flash-Lite.
Quick Bits
DeepSeek V4 — frontier-class at a fraction of the price — Simon Willison’s take on DeepSeek V4: near-frontier performance, dramatically lower cost. The open-weight price war continues.
White House Considers Vetting AI Models Before Release — The White House is considering pre-release vetting requirements for AI models, potentially adding a regulatory step before frontier models ship.
Top AI Companies Agree to Pentagon Deals — Major AI companies are signing up for classified Pentagon work, marking a shift in the industry’s relationship with defense.
Meta plans advanced agentic AI assistant — Meta is building a personalized AI assistant powered by its new Muse Spark model, designed for everyday task execution.
Google DeepMind partners with EVE Online — EVE Online’s complex economy and long-horizon decision-making make it a uniquely rich environment for testing AI systems.
GPT-5.5 price analysis — GPT-5.5 launched with a 2x price bump over its predecessor, partially offset by generating fewer tokens per response at equal quality.
Moonshot AI (Kimi) at $20B valuation — Kimi maker Moonshot AI raised at $20B in a Meituan-led round, continuing Chinese AI’s separate funding momentum.
Y Combinator’s $5B OpenAI stake — YC’s ~0.6% stake in OpenAI is now worth over $5B at current valuation. The most successful seed bet in accelerator history.
Harvey’s Legal Agent Benchmark — Harvey launched a benchmark for evaluating AI agents on legal tasks, creating a shared yardstick for the growing legal-AI space.
Deep Dives
Notes from inside China’s AI labs — Deep reporting on what’s happening inside China’s leading AI labs — culture, competitive dynamics, and how the US-China AI gap is evolving. Worth the 18-minute read if you’re trying to understand the global picture.
Google Rethinks Hallucinations Through Uncertainty — Google reframed hallucinations as failures to express uncertainty rather than knowledge gaps, proposing “faithful uncertainty” as an alignment mechanism. A conceptual shift that could change how we measure and mitigate hallucinations.
OpenAI Flips the Script — Analysis of OpenAI’s strategic evolution — how the company’s positioning, product strategy, and competitive approach have shifted.
World Models Can Change Everything — A deep dive into why world models — AI systems that build internal representations of reality — could represent the next paradigm shift.
Long AI Short AGI — Silicon Valley’s AGI narrative vs. the reality that intelligence is commoditizing like compute, bandwidth, and storage before it. The real winners may not be the ones with the best models.
Automating AI Research (Import AI #455) — Jack Clark’s take on AI systems that automate AI research itself — what’s working, what’s overhyped, and where the real leverage is.
That’s a wrap for this week.
If this was useful, the best thing you can do is share it with someone who’d get value from it too. Forward this email, drop it in your team’s Slack, or just send it to that one friend who’s always asking “what happened in AI this week?” Every share helps us keep this going.
Hit reply if I missed something — I read everything. See you next week. Stay curious.

