LLM API Daily 2026-06-01: llama.cpp b9436–b9444, MiniMax M3, StepFun Step 3.7 Flash

ApiDelta · 2026-06-01 · 298 words · apidelta.maxiaworld.app

🚨 Breaking

No breaking changes reported today.

🗑️ Deprecations

No deprecations.

💰 Pricing

No pricing changes in the brief.

🆕 New

llama.cpp — six builds (b9436–b9444), 2026-05-30/31

MiniMax M3 — Multimodal model (text, image, and video in; text out) with a 1M-token context window, positioned for long-horizon agentic work and coding. Now on OpenRouter.

StepFun Step 3.7 Flash — MoE architecture with 196B total / ~11B active parameters and a native vision encoder for image and video understanding. On OpenRouter.

🌐 AI Landscape

Research: "Representation Forcing for Bottleneck-Free Unified Multimodal Models" — proposes eliminating the frozen, separately pretrained VAE that current unified multimodal models rely on for image generation, which removes a structural bottleneck.

Research: "LongTraceRL" — applies RLVR (reinforcement learning with verifiable rewards), using search-agent trajectories and rubric rewards, to address long-context reasoning failures in LLMs.

💡 Action Today

If you run llama-bench in CI: audit your -ngl usage after upgrading past b9437, where the default changed to -1. Set the value you intend explicitly in every benchmark script rather than relying on the old default, so results stay comparable across builds.
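
One way to run that audit is a small lint step in CI. The sketch below is a minimal illustration, not part of llama.cpp: the function names and the sample script are hypothetical, and it assumes each llama-bench invocation sits on a single line and passes the value as a separate token after -ngl or --n-gpu-layers. Invocations split across lines with backslashes are skipped and need a manual check.

```python
import shlex

# Spellings of the GPU-layer offload flag, value passed as the next token.
NGL_FLAGS = ("-ngl", "--n-gpu-layers")


def has_explicit_ngl(tokens):
    """Return True if the token list sets the GPU-layer count explicitly."""
    for i, tok in enumerate(tokens):
        if tok in NGL_FLAGS:
            # The flag only counts if a value follows it.
            return i + 1 < len(tokens)
    return False


def find_implicit_ngl(script_text):
    """Return 1-based line numbers of llama-bench calls without an explicit -ngl."""
    flagged = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or shell comment
        try:
            tokens = shlex.split(stripped)
        except ValueError:
            continue  # unbalanced quotes or a trailing backslash: skip
        is_bench = any(t.endswith("llama-bench") for t in tokens)
        if is_bench and not has_explicit_ngl(tokens):
            flagged.append(lineno)
    return flagged


if __name__ == "__main__":
    # Hypothetical CI script; the model path is a placeholder.
    sample = (
        "#!/bin/sh\n"
        "./llama-bench -m model.gguf -p 512\n"
        "./llama-bench -m model.gguf -p 512 -ngl 99\n"
    )
    for n in find_implicit_ngl(sample):
        print(f"line {n}: llama-bench called without an explicit -ngl")
```

Failing the CI job when the returned list is non-empty keeps a future default change from silently shifting benchmark numbers.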

#api #llm #en #llama-cpp #open-source #multimodal #research #openrouter