🚨 Breaking
Claude Opus 4.8 is now GA (claude-opus-4-8, released 2026-05-28). This is a new model ID — hard-coded references to prior Opus IDs will not pick it up automatically. Key specs: 1M token context window (default), 128k max output tokens. Tool and platform feature set is identical to claude-opus-4-7. Review Anthropic's "What's new in Claude Opus 4.8" for capability deltas before promoting to prod. (Release notes)
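
Because the new ID is not picked up automatically, one option is to resolve the model ID in a single place and cap output budgets there too. The following is a minimal sketch, not an official pattern: the `ANTHROPIC_MODEL_ID` environment variable and both helper names are hypothetical, and the 128,000 figure assumes "128k" means exactly that, so confirm it against the release notes.

```python
import os

# Assumption: "128k" in the brief is read as 128,000 tokens.
# Confirm the exact ceiling in the release notes before relying on it.
OPUS_4_8_MAX_OUTPUT_TOKENS = 128_000
DEFAULT_MODEL_ID = "claude-opus-4-8"


def resolve_model_id(env=os.environ):
    """Return the model ID from the (hypothetical) ANTHROPIC_MODEL_ID
    variable, falling back to the default when it is unset or empty."""
    return env.get("ANTHROPIC_MODEL_ID") or DEFAULT_MODEL_ID


def clamp_max_tokens(requested, ceiling=OPUS_4_8_MAX_OUTPUT_TOKENS):
    """Cap a requested output budget at the model's assumed ceiling."""
    if requested < 1:
        raise ValueError("max_tokens must be positive")
    return min(requested, ceiling)
```

With this in place, call sites pass `resolve_model_id()` and `clamp_max_tokens(n)` instead of literals, so the next model bump becomes a one-line or config-only change.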
🗑️ Deprecations
Two models are already past their retirement date. Treat any code path that still references these IDs as broken and migrate it now:
| Model | Retired | Drop-in Replacement |
|---|---|---|
| claude-3-opus-20240229 | 2026-01-05 | claude-opus-4-8 |
| claude-2.0 | 2025-07-21 | claude-opus-4-8 |
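
As a stopgap while references are being cleaned up, the two rows above can be encoded as a lookup that swaps a retired ID for its listed replacement and warns loudly. This is only a sketch: `RETIRED_MODELS` and `replace_retired` are hypothetical names, and the mapping contains nothing beyond what this brief lists.

```python
import warnings

# Retired ID -> replacement, taken directly from the table in this brief.
RETIRED_MODELS = {
    "claude-3-opus-20240229": "claude-opus-4-8",
    "claude-2.0": "claude-opus-4-8",
}


def replace_retired(model_id):
    """Return the listed replacement for a retired ID, else the ID unchanged."""
    replacement = RETIRED_MODELS.get(model_id)
    if replacement is None:
        return model_id
    warnings.warn(
        f"{model_id} is retired; using {replacement} instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return replacement
```

A shim like this hides the problem rather than fixing it, so it is best paired with a tracked task to remove the old IDs at the source.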
💰 Pricing
No pricing changes in today's brief.
🆕 New
vLLM v0.22.0 (2026-05-29): DeepSeek V4 support is hardened and reorganized into a dedicated vllm/models/deepseek_v4/ package, and gains NVFP4 fused MoE support. 459 commits from 230 contributors (63 new). Actionable if you self-host DeepSeek V4 at scale. (Release)
llama.cpp (builds b9393–b9415, 2026-05-29): Five targeted fixes and additions shipped in one day. DeepSeekOCR 2 multimodal support with dynamic multi-tile resolution (b9414); Hexagon backend op fusion with RMS_NORM+MUL (b9402); a --skip-download flag for offline/cache workflows (b9415); a fix for Vulkan backend allreduce buffer corruption on non-COMPUTE tensors, which previously corrupted output silently (b9403); and a Gemma 4 audio RMS norm eps correction (b9393). (Releases)
💡 Tip of the Day
Grep your codebase for claude-3-opus-20240229 and claude-2.0 right now. Both retirement dates are in the past, so any surviving reference is a latent hard failure waiting to surface. Replace both with claude-opus-4-8. While you're there, audit your max_tokens settings against the 128k output ceiling on Opus 4.8: values tuned for an older model may leave you with more or less headroom than you intended.
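
If plain grep is awkward to run across a mixed repository, the same search can be sketched in Python. The function name `find_retired_ids` and the default list of file suffixes are illustrative assumptions; adjust the suffixes to whatever your project actually contains.

```python
import re
from pathlib import Path

# The two retired IDs named in this brief.
RETIRED_IDS = ("claude-3-opus-20240229", "claude-2.0")
PATTERN = re.compile("|".join(re.escape(m) for m in RETIRED_IDS))

# Assumption: these suffixes cover the files worth scanning in your repo.
DEFAULT_SUFFIXES = (".py", ".ts", ".js", ".json", ".yaml", ".yml", ".toml")


def find_retired_ids(root, suffixes=DEFAULT_SUFFIXES):
    """Return (path, line number, matched ID) for each retired ID under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip unreadable or non-UTF-8 files
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in PATTERN.finditer(line):
                hits.append((str(path), lineno, match.group(0)))
    return hits
```

Calling `find_retired_ids(".")` from the repository root returns every hit as a tuple, which makes it easy to fail a CI job when the list is non-empty. The match is a plain substring search, so longer strings that merely contain one of the IDs are reported as well.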