LLM API Advisory — 2026-05-28
🚨 Breaking
No breaking changes declared in today's brief. Two Weaviate patch releases are nonetheless flagged as high-impact stability fixes worth immediate triage:
- Weaviate v1.37.6 and v1.36.16 both raise the SSB memory limit threshold from 80% to 90%, fix TestReplicationAbort/DecodeResponse flakiness, and ship backup/HNSW entry-point panic fixes. Release notes explicitly state "Breaking Changes: none," but the threshold shift changes when memory pressure events fire on production clusters. Patch the branch you're tracking. (v1.37.6 · v1.36.16)
🗑️ Deprecations
Nothing in today's brief.
💰 Pricing
Nothing in today's brief.
🆕 New Features
- Goose v1.36.0: Adds a goose review command for local code review, /goal agent self-evaluation before finishing, a TUI diff viewer, and a TUI CLI command. Useful if you run agentic coding workflows locally. (release)
- llama.cpp b9354–b9371 (four builds, 2026-05-27): MiniCPM5 tokenizer support (b9354); Hexagon Q4_1 quantization in MUL_MAT/MUL_MAT_ID with HVX (b9370); WebGPU legacy constants removed (b9371); Vulkan transfer queue preference fix on AMD UMA devices (b9357). (b9371 · b9370 · b9357 · b9354)
- Ollama v0.30.0-rc28 (pre-release): Architecture shift to direct llama.cpp support, replacing the GGML layer; adds MLX inference acceleration on Apple Silicon; retains GGUF compatibility. Still pre-release: the team is explicitly requesting feedback before GA. Do not deploy to production. (release)
- NVIDIA on Hugging Face: Two new model drops: nvidia/Assemble_Trocar (gr00t_n1_5 architecture) and nvidia/DeepSeek-V4-Pro-NVFP4 (NVFP4-quantized DeepSeek-V4-Pro, MIT license, 8-bit/fp8). (Assemble_Trocar · DeepSeek-V4-Pro-NVFP4)
💡 Action Today
If you run Weaviate in production, check your current SSB memory headroom: the 80%→90% threshold bump in v1.37.6/v1.36.16 shifts when memory-pressure events trigger. Apply the patch for the branch you track with memory monitoring enabled, and don't roll it out blind on a cluster that's already near capacity.
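To make the headroom check concrete, here is a minimal Python sketch of the arithmetic. It assumes you can already read a used-bytes figure and a memory limit from your own monitoring. The function name, its parameters, and the "at or above" comparison are all hypothetical, chosen for illustration, and say nothing about how Weaviate evaluates the threshold internally.

```python
def threshold_status(used_bytes: int, limit_bytes: int,
                     old_pct: int = 80, new_pct: int = 90) -> dict:
    """Compare current memory usage against the old and new thresholds.

    Assumption: a threshold counts as crossed when usage is at or above
    the given percentage of the limit. The real comparison may differ.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    if used_bytes < 0:
        raise ValueError("used_bytes must be non-negative")

    usage_pct = 100.0 * used_bytes / limit_bytes
    # Integer arithmetic keeps the byte counts exact.
    new_threshold_bytes = limit_bytes * new_pct // 100
    return {
        "usage_pct": usage_pct,
        "over_old_threshold": used_bytes * 100 >= limit_bytes * old_pct,
        "over_new_threshold": used_bytes * 100 >= limit_bytes * new_pct,
        "headroom_to_new_bytes": max(0, new_threshold_bytes - used_bytes),
    }


if __name__ == "__main__":
    gib = 1024 ** 3
    # Hypothetical node: 27 GiB used out of a 32 GiB limit.
    print(threshold_status(used_bytes=27 * gib, limit_bytes=32 * gib))
```

A node that reports over_old_threshold as true and over_new_threshold as false sits in the band where behavior changes after the patch: it was past the 80% line before and is under the 90% line afterward. Those are the nodes to watch most closely during rollout.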