Claude Opus 4.8 GA (1M ctx / 128k out) — Two Anthropic Models Already Past Retirement

🚨 Breaking

Claude Opus 4.8 is now GA (claude-opus-4-8, released 2026-05-28). This is a new model ID — hard-coded references to prior Opus IDs will not pick it up automatically. Key specs: 1M token context window (default), 128k max output tokens. Tool and platform feature set is identical to claude-opus-4-7. Review Anthropic's "What's new in Claude Opus 4.8" for capability deltas before promoting to prod. (Release notes)
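
Because the new ID is not picked up automatically, one option is to resolve the model ID and output budget in a single place. The sketch below is illustrative only: the `ANTHROPIC_MODEL` environment variable, the `ModelConfig` class, and `load_model_config` are hypothetical names invented for this example, not part of any Anthropic SDK. It also assumes "128k" means 128,000 tokens, which should be confirmed against the release notes.

```python
import os
from dataclasses import dataclass

# Values taken from the brief above; the exact token count is an assumption.
DEFAULT_MODEL = "claude-opus-4-8"
MAX_OUTPUT_TOKENS = 128_000  # "128k" read as 128,000; verify before relying on it


@dataclass(frozen=True)
class ModelConfig:
    model: str
    max_tokens: int


def load_model_config(requested_max_tokens: int, env=None) -> ModelConfig:
    """Resolve the model ID from one place and clamp max_tokens to the ceiling."""
    env = os.environ if env is None else env
    if requested_max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    # ANTHROPIC_MODEL is a hypothetical variable name chosen for this sketch.
    model = env.get("ANTHROPIC_MODEL", DEFAULT_MODEL)
    return ModelConfig(
        model=model,
        max_tokens=min(requested_max_tokens, MAX_OUTPUT_TOKENS),
    )
```

Call sites would then read `cfg.model` and `cfg.max_tokens` instead of embedding literals, so the next model bump becomes a one-line or one-variable change.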

🗑️ Deprecations

Two models are already past their retirement date. Any code path still sending these IDs should be treated as broken and migrated immediately:

| Model | Retired | Drop-in Replacement |
| --- | --- | --- |
| `claude-3-opus-20240229` | 2026-01-05 | `claude-opus-4-8` |
| `claude-2.0` | 2025-07-21 | `claude-opus-4-8` |

(Deprecation page)
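
As a stopgap while references are being cleaned up, the table above can be encoded as a lookup that rewrites retired IDs at the call boundary. This is a minimal sketch, not an Anthropic-provided mechanism: `RETIRED_MODELS` and `resolve_model_id` are hypothetical names, and the mapping simply mirrors the replacements listed in this brief.

```python
import warnings

# Retired IDs and their replacements, copied from the table in this brief.
RETIRED_MODELS = {
    "claude-3-opus-20240229": "claude-opus-4-8",
    "claude-2.0": "claude-opus-4-8",
}


def resolve_model_id(model_id: str) -> str:
    """Return the replacement for a retired model ID, or the ID unchanged."""
    replacement = RETIRED_MODELS.get(model_id)
    if replacement is None:
        return model_id
    # Warn so the stale reference still gets found and fixed at the source.
    warnings.warn(
        f"{model_id} is retired; substituting {replacement}",
        DeprecationWarning,
        stacklevel=2,
    )
    return replacement
```

The warning is deliberate: silently remapping would hide the stale reference, whereas a logged `DeprecationWarning` keeps it visible until the source is updated.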

💰 Pricing

No pricing changes in today's brief.

🆕 New

vLLM v0.22.0 (2026-05-29) — DeepSeek V4 hardened and reorganized into a dedicated vllm/models/deepseek_v4/ package; gains NVFP4 fused MoE support. 459 commits from 230 contributors (63 new). Actionable if you self-host DeepSeek V4 at scale. (Release)

llama.cpp (builds b9393–b9415, 2026-05-29) — Five targeted fixes and additions shipped in one day: DeepSeekOCR 2 multimodal support with dynamic multi-tile resolution (b9414); Hexagon backend op fusion with RMS_NORM+MUL (b9402); a --skip-download flag for offline/cache workflows (b9415); a fix for Vulkan backend allreduce buffer corruption on non-COMPUTE tensors, which previously produced corrupt output silently (b9403); and a Gemma 4 audio RMS norm eps correction (b9393). (Releases)

💡 Tip of the Day

Grep your codebase for claude-3-opus-20240229 and claude-2.0 right now. Both retirement dates are in the past, so any surviving reference is a latent hard failure. Replace both with claude-opus-4-8. While you're there, audit your max_tokens settings against the 128k output ceiling on Opus 4.8: values sized for an older model's smaller limit may be lower than you now need, and any value above 128k is out of range.
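
The search can also be scripted so it runs in CI. The following is a rough sketch under stated assumptions: `find_retired_ids` and `scan_tree` are hypothetical helper names, and the file suffix list is an arbitrary starting point that should be adjusted to the repository.

```python
import re
from pathlib import Path

RETIRED_IDS = ("claude-3-opus-20240229", "claude-2.0")
# re.escape keeps the "." in "claude-2.0" literal instead of matching any character.
PATTERN = re.compile("|".join(re.escape(model_id) for model_id in RETIRED_IDS))


def find_retired_ids(text: str):
    """Return (line_number, model_id) pairs for each retired ID found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def scan_tree(root: str, suffixes=(".py", ".ts", ".json", ".yaml", ".yml", ".toml")):
    """Walk root and yield (path, line_number, model_id) for files with a listed suffix."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for lineno, model_id in find_retired_ids(text):
                yield path, lineno, model_id


if __name__ == "__main__":
    for path, lineno, model_id in scan_tree("."):
        print(f"{path}:{lineno}: {model_id}")
```

Escaping the IDs matters here: an unescaped `claude-2.0` pattern would also match strings such as `claude-2x0`, which is the same pitfall as running plain `grep` without `-F`.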