🚨 Breaking
No breaking changes reported today.
🗑️ Deprecations
No deprecations.
💰 Pricing
No pricing changes in the brief.
🆕 New
llama.cpp — six builds (b9436–b9444), 2026-05-30/31
- b9444: Server now handles If-None-Match weak ETags — release
- b9442: Tokenizer support added for jina-embeddings-v2-base-zh (whitespace tokenizer; lowercase defaults to true) — release
- b9441: Fixes ETag truncation bug in MSVC-compiled builds — release
- b9439: llama now defaults to using a single iGPU device — release
- b9437: llama-bench gains -fa auto support; -ngl default changed to -1 — release
- b9436: OpenCL backend adds bf16 support via f16 conversion — release
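For context on the b9444 entry: If-None-Match is evaluated with the weak comparison defined in RFC 9110, where two entity tags match if their quoted values are identical, whether or not either carries the W/ prefix. The sketch below illustrates that rule only. It is not llama.cpp's server code, and the function names are hypothetical.

```python
def parse_etag(tag: str) -> tuple[bool, str]:
    """Split an entity tag into (is_weak, opaque_tag)."""
    tag = tag.strip()
    weak = tag.startswith("W/")
    if weak:
        tag = tag[2:]  # drop the weakness prefix, keep the quoted value
    return weak, tag


def weak_match(a: str, b: str) -> bool:
    """RFC 9110 weak comparison: the W/ prefix is ignored on both sides."""
    return parse_etag(a)[1] == parse_etag(b)[1]


def if_none_match_hits(header: str, current_etag: str) -> bool:
    """Return True when a 304 Not Modified response would be appropriate."""
    if header.strip() == "*":
        return True  # "*" matches any current representation
    # Simplification (assumption): split on commas, which ignores the rare
    # case of a comma inside a quoted tag value.
    candidates = [part.strip() for part in header.split(",") if part.strip()]
    return any(weak_match(candidate, current_etag) for candidate in candidates)


if __name__ == "__main__":
    print(if_none_match_hits('W/"abc"', '"abc"'))  # weak tag vs strong tag
    print(if_none_match_hits('"x"', '"abc"'))      # different values
```

A client that caches a response under W/"abc" and later sends it back in If-None-Match should therefore get a match against a server tag of "abc".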
MiniMax M3 — Multimodal model (text + image + video in, text out), 1M-token context window, positioned for long-horizon agentic work and coding. Now on OpenRouter — model page
StepFun Step 3.7 Flash — MoE architecture: 196B total / ~11B active parameters, native vision encoder for image and video understanding. On OpenRouter — model page
🌐 AI Landscape
Research: "Representation Forcing for Bottleneck-Free Unified Multimodal Models" — proposes eliminating the frozen, separately pretrained VAE that current unified multimodal models rely on for image generation, removing a structural bottleneck — paper
Research: "LongTraceRL" — applies RLVR (reinforcement learning with verifiable rewards) using search agent trajectories with rubric rewards to address long-context reasoning failures in LLMs — paper
💡 Action Today
If you run llama-bench in CI: audit your -ngl flag usage when upgrading to b9437 or later, where the default changed to -1. Set the value you intend explicitly in your benchmark scripts rather than relying on the old default.
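One way to run that audit is a small script that flags llama-bench invocations with no explicit GPU-layer setting. This is a minimal sketch under stated assumptions: the command strings and the model.gguf path are hypothetical, and it assumes the flag is passed as a separate token, either -ngl or its long form --n-gpu-layers.

```python
import shlex

# Short and long spellings of the GPU-layers flag.
NGL_FLAGS = {"-ngl", "--n-gpu-layers"}


def sets_ngl_explicitly(command: str) -> bool:
    """Return True if the command line passes a GPU-layers flag."""
    tokens = shlex.split(command)
    return any(token in NGL_FLAGS for token in tokens)


def audit(commands: list[str]) -> list[str]:
    """Return llama-bench commands that rely on the default -ngl value."""
    return [
        command
        for command in commands
        if "llama-bench" in command and not sets_ngl_explicitly(command)
    ]


if __name__ == "__main__":
    # Hypothetical CI commands, used only for illustration.
    ci_commands = [
        "llama-bench -m model.gguf",
        "llama-bench -m model.gguf -ngl 99",
        "echo done",
    ]
    for command in audit(ci_commands):
        print(f"no explicit -ngl: {command}")
```

Feeding it the command lines extracted from your CI configuration gives a list of invocations whose results may shift after the upgrade.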