2026-05-26 · No breaking changes, no deprecations, and no pricing changes.
🚨 Breaking
None today.
🗑️ Deprecations
None announced.
💰 Pricing
No pricing changes in today's brief.
🆕 What's New
llama.cpp b9330: Fixes a silent correctness bug for nemotron-h. ffn_latent_down and ffn_latent_up were declared as GGML_OP_MUL in LLM_TENSOR_INFOS, but nemotron-h routes them through ggml_mul_mat at runtime. The backend buffer probe tested the declared op, GGML_OP_MUL, which returned true unconditionally on q8_0 weights, so the wrong backend was assigned without any error. Both tensors are now tagged MUL_MAT. If you run nemotron-h locally with quantized weights, upgrade. → b9330
llama.cpp b9329 — Fast Walsh-Hadamard transform added for CUDA, with unrolls and warp-size-64 tuning. Pure throughput win for ops that use it; no API surface change. → b9329
Cline v3.85.0 — Three new model families added: GPT-5.5 on SAP AI Core, DeepSeek V4 Flash and Pro, Gemini 3.5 Flash (both Gemini and Vertex providers). Also fixes Vertex AI global endpoint handling for Claude models — if you route Claude through the Vertex global endpoint, this patch is worth pulling now. → v3.85.0
browser-use 0.12.9 — Session ID is now passed through to judge LLM calls (improves traceability in multi-session agent runs). New-tab pages no longer trigger spurious screenshots. → 0.12.9
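For readers unfamiliar with the transform named in the b9329 entry, here is a minimal pure-Python sketch of the textbook fast Walsh-Hadamard butterfly. It is not the CUDA kernel from that release; the function name `fwht` and the unnormalized output convention are choices made for this illustration only.

```python
def fwht(values):
    """Unnormalized Walsh-Hadamard transform (natural/Hadamard ordering).

    Illustrative reference only; the input length must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Combine blocks of size h pairwise with the (a + b, a - b) butterfly.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


signal = [1, 0, 1, 0, 0, 1, 1, 0]
spectrum = fwht(signal)
# Applying the unnormalized transform twice scales the input by its length.
roundtrip = [v // len(signal) for v in fwht(spectrum)]
```

The loop does O(n log n) additions and subtractions with no multiplications, which is why the transform is a natural fit for unrolled, warp-level GPU code.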
🌐 AI News
DVAO (HF Papers): Extends Group Relative Policy Optimization with dynamic variance-adaptive advantage weighting for multi-reward RL settings — relevant if you're blending multiple reward signals in RLHF pipelines. → 2605.25604
ParaVT (HF Papers): Addresses the sequential-tool-call bottleneck in video-agent RL — enables parallel tool dispatch (multiple tools per turn) rather than one per turn. Watch if you're building multi-tool agentic systems. → 2605.20342
HN trending — "Using AI to write better code more slowly" hit 295 points and 117 comments. Community debate on deliberate vs. velocity-first AI coding workflows — worth a skim if your team is calibrating AI-assist norms. → nolanlawson.com
💡 Tip of the Day
If nemotron-h is in your local inference stack, update llama.cpp to b9330 before your next run. The buffer-type (buft) probe bug produced no visible error: it silently assigned q8_0 weight tensors to the wrong backend. The result is wrong numerics, not a crash.
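To make the failure mode concrete, here is a toy Python model of probing a backend with the declared op instead of the op that actually runs. Everything in it is hypothetical: the `Backend` class, `pick_backend`, the backend names, and the capability sets are invented for illustration and are not llama.cpp or ggml APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """Hypothetical backend described by the (op, weight_type) pairs it can run."""
    name: str
    supported: frozenset

    def supports(self, op, weight_type):
        return (op, weight_type) in self.supported


def pick_backend(backends, probe_op, weight_type):
    """Return the first backend whose probe accepts (probe_op, weight_type)."""
    for backend in backends:
        if backend.supports(probe_op, weight_type):
            return backend
    raise RuntimeError("no backend accepts this op")


# Assumed capabilities: "accel" handles element-wise MUL on q8_0 but not MUL_MAT on q8_0.
accel = Backend("accel", frozenset({("MUL", "q8_0"), ("MUL_MAT", "f16")}))
cpu = Backend("cpu", frozenset({("MUL", "q8_0"), ("MUL_MAT", "q8_0"), ("MUL_MAT", "f16")}))
backends = [accel, cpu]

declared_op, runtime_op = "MUL", "MUL_MAT"

# Probing with the declared op succeeds on a backend that cannot run the real op.
chosen_by_declared = pick_backend(backends, declared_op, "q8_0")
# Probing with the op that actually executes selects a backend that can run it.
chosen_by_runtime = pick_backend(backends, runtime_op, "q8_0")
mismatch = not chosen_by_declared.supports(runtime_op, "q8_0")
```

In this toy, the probe itself never fails, which mirrors why the real bug produced no visible error: the check passed, just for the wrong operation.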