LLM API Daily 2026-06-01: llama.cpp b9436–b9444, MiniMax M3, StepFun Step 3.7 Flash

🚨 Breaking

No breaking changes reported today.

🗑️ Deprecations

No deprecations.

💰 Pricing

No pricing changes reported today.

🆕 New

llama.cpp — six builds (b9436–b9444), 2026-05-30/31

b9444: Server now handles If-None-Match weak ETags — release
b9442: Tokenizer support added for jina-embeddings-v2-base-zh (whitespace tokenizer; lowercase defaults to true) — release
b9441: Fixes ETag truncation bug in MSVC-compiled builds — release
b9439: llama now defaults to using a single iGPU device — release
b9437: llama-bench gains -fa auto support; -ngl default changed to -1 — release
b9436: OpenCL backend adds bf16 support via f16 conversion — release
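
For context on the b9444 item: under HTTP semantics (RFC 9110), If-None-Match uses weak comparison, so W/"abc" and "abc" count as a match and the server may answer 304 Not Modified. The sketch below illustrates that rule only. It is not llama.cpp's implementation, and the function names are hypothetical.

```python
def parse_etag(value: str) -> tuple[bool, str]:
    """Split an entity-tag into (is_weak, opaque_tag). Hypothetical helper."""
    value = value.strip()
    is_weak = value.startswith("W/")  # the weak prefix is case-sensitive
    if is_weak:
        value = value[2:]
    return is_weak, value


def if_none_match_matches(header: str, current_etag: str) -> bool:
    """Return True when a 304 Not Modified response would be allowed."""
    if header.strip() == "*":
        return True
    _, current_opaque = parse_etag(current_etag)
    # Naive comma split: fine for a sketch, but it would mis-handle
    # a comma inside a quoted tag.
    for candidate in header.split(","):
        _, opaque = parse_etag(candidate)
        # Weak comparison: ignore the W/ prefix, compare opaque tags only.
        if opaque and opaque == current_opaque:
            return True
    return False
```

A client that cached a response tagged W/"abc" and later sends If-None-Match: "abc" would match under this rule, and vice versa.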

MiniMax M3 — Multimodal model (text + image + video in, text out), 1M-token context window, positioned for long-horizon agentic work and coding. Now on OpenRouter — model page

StepFun Step 3.7 Flash — MoE architecture: 196B total / ~11B active parameters, native vision encoder for image and video understanding. On OpenRouter — model page

🌐 AI Landscape

Research: "Representation Forcing for Bottleneck-Free Unified Multimodal Models" — proposes eliminating the frozen, separately pretrained VAE that current unified multimodal models rely on for image generation, removing a structural bottleneck — paper

Research: "LongTraceRL" — applies RLVR (reinforcement learning with verifiable rewards) using search agent trajectories with rubric rewards to address long-context reasoning failures in LLMs — paper

💡 Action Today

If you run llama-bench in CI: once you upgrade to b9437 or later, audit your -ngl usage, because the default is now -1. Set the value you intend explicitly in every benchmark script rather than relying on the default, so results stay comparable across builds.
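
A rough way to automate that audit: the sketch below scans shell script text and reports llama-bench invocations that set neither -ngl nor --n-gpu-layers. It assumes one command per line (no backslash continuations), and the function name is hypothetical.

```python
import os
import shlex

NGL_FLAGS = {"-ngl", "--n-gpu-layers"}
BENCH_NAMES = {"llama-bench", "llama-bench.exe"}


def lines_missing_ngl(script_text: str) -> list[int]:
    """Return 1-based line numbers of llama-bench calls with no explicit -ngl."""
    missing = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines, comments, and the shebang
        try:
            tokens = shlex.split(stripped, comments=True)
        except ValueError:
            continue  # unbalanced quotes: not a single-line command
        runs_bench = any(os.path.basename(tok) in BENCH_NAMES for tok in tokens)
        if runs_bench and not NGL_FLAGS.intersection(tokens):
            missing.append(lineno)
    return missing


if __name__ == "__main__":
    sample = (
        "#!/bin/sh\n"
        "./build/bin/llama-bench -m model.gguf -fa auto\n"
        "./build/bin/llama-bench -m model.gguf -ngl 99 -fa auto\n"
    )
    print(lines_missing_ngl(sample))
```

Any line number it reports is a call that will pick up whatever default the installed build ships with.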