LLM API Daily 2026-05-24: llama.cpp quad-release, browser-use hardening, AI cost reality check

🚨 Breaking

None today. Zero breaking changes across all scanned providers and tools.

🗑️ Deprecations

None flagged in today's brief.

💰 Pricing

No pricing changes reported.

🆕 New

llama.cpp dropped four consecutive builds on 2026-05-23 (b9297, b9296, b9295, b9294):

b9297: Adds NVFP4 MTP scale tensors (#23563), links Qwen3.5 MTP tensors. Material if you self-host NVFP4-quantized or Qwen3.5 MTP models.
b9296: Fixes ggml to check the interface method before using the 2D-get fallback (#23514). Silent correctness fix.
b9295: Fixes the Vulkan find_package lookup for SPIRV-Headers on Windows (#23215). The broken lookup previously caused clean Windows Vulkan builds to fail.
b9294: Generalizes OpenCL Adreno MoE kernels (#23449). Relevant for mobile/edge inference on Adreno GPUs.
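
To decide whether a deployed build needs updating, the per-build notes above can be turned into a small lookup. This is only a sketch: the concern labels (`nvfp4_or_qwen35_mtp` and so on) and the helper names are hypothetical, invented for illustration, and it assumes you already know your current tag in the `bNNNN` form used above.

```python
import re

# Minimum build per concern, taken from the release notes above.
# The keys are hypothetical labels made up for this sketch.
MIN_BUILD = {
    "nvfp4_or_qwen35_mtp": 9297,
    "ggml_2d_get_fallback": 9296,
    "windows_vulkan_build": 9295,
    "opencl_adreno_moe": 9294,
}


def parse_build_tag(tag: str) -> int:
    """Turn a tag such as 'b9297' into the integer 9297."""
    match = re.fullmatch(r"b(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"not a bNNNN build tag: {tag!r}")
    return int(match.group(1))


def required_build(concerns: list[str]) -> int:
    """Return the lowest build that covers every listed concern (0 if none)."""
    return max((MIN_BUILD[c] for c in concerns), default=0)


def needs_upgrade(current_tag: str, concerns: list[str]) -> bool:
    """True when the current build predates a fix or feature you rely on."""
    return parse_build_tag(current_tag) < required_build(concerns)


if __name__ == "__main__":
    # Example: a Windows + Vulkan builder still pinned to b9294.
    print(needs_upgrade("b9294", ["windows_vulkan_build"]))
```

Since the four builds are consecutive, pulling the newest one covers every concern; the lookup only matters if you pin builds and want the smallest necessary bump.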

pydantic-ai v2.0.0b3 (2026-05-22, release): Third V2 beta, published alongside an Upgrade Guide. This is a major version, so read the guide before moving production agent pipelines.

browser-use 0.12.8 (release): Two security-adjacent hardening changes: the Unix socket file is now restricted to owner-only access, and evaluate() is refused on restricted browser profiles. Update if you run browser-use in shared-host or multi-tenant environments.
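
For readers unfamiliar with what "owner-only" means for a Unix socket file, here is a minimal Python sketch of the general technique on a POSIX system. It is not browser-use's actual implementation; the function names `bind_owner_only` and `socket_mode` are hypothetical, and the choice of mode 0600 is an assumption about what owner-only access means here.

```python
import os
import socket
import stat


def bind_owner_only(path: str) -> socket.socket:
    """Bind a Unix domain socket whose file only the owner can read or write."""
    try:
        os.unlink(path)  # remove a stale socket file from a previous run
    except FileNotFoundError:
        pass

    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # Tighten the umask first so the file never exists with group/other bits set.
    old_umask = os.umask(0o177)
    try:
        sock.bind(path)
    finally:
        os.umask(old_umask)  # restore the process-wide umask
    # Set the mode explicitly as well, in case the platform ignored the umask.
    os.chmod(path, 0o600)
    sock.listen(1)
    return sock


def socket_mode(path: str) -> int:
    """Return only the permission bits of the socket file."""
    return stat.S_IMODE(os.stat(path).st_mode)
```

On a shared host, other local users can otherwise connect to a socket file that is group- or world-accessible, which is why this matters most in multi-tenant setups.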

OpenAI Codex CLI 0.134.0-alpha.3 (release): Alpha track. No changelog detail available in today's brief.

🌐 AI Industry

Microsoft disclosed that running AI agents is currently more expensive than paying human employees for equivalent work (Fortune, 2026-05-22). The independent tracker isaiprofitable.com is generating active HN discussion on AI unit economics. Both are concrete data points if you're defending or stress-testing AI infra spend internally.

💡 Today's Action

If you self-host llama.cpp with NVFP4-quantized models or Qwen3.5 MTP variants, pull b9297: it is the first build that adds NVFP4 MTP scale tensors and links the Qwen3.5 MTP tensors. If your Windows + Vulkan pipeline is failing, b9295 fixes the SPIRV-Headers find_package regression that broke clean builds.