Weaviate SSB threshold bump + Ollama drops GGML backend + Goose ships local review — 2026-05-28

LLM API Advisory — 2026-05-28

🚨 Breaking

No breaking changes declared in today's brief. Two Weaviate patch releases are nonetheless flagged as high-impact stability fixes worth immediate triage:

Weaviate v1.37.6 and v1.36.16 both raise the SSB memory limit threshold from 80% to 90%, fix TestReplicationAbort/DecodeResponse flakiness, and ship backup/HNSW entry-point panic fixes. Release notes explicitly state "Breaking Changes: none," but the threshold shift changes when memory pressure events fire on production clusters. Patch the branch you're tracking. (v1.37.6 · v1.36.16)

🗑️ Deprecations

Nothing in today's brief.

💰 Pricing

Nothing in today's brief.

🆕 New Features

Goose v1.36.0: Adds goose review local code review command, /goal agent self-evaluation before finishing, TUI diff viewer, and a TUI CLI command. Useful if you run agentic coding workflows locally. (release)
llama.cpp b9354–b9371 (four builds, 2026-05-27): MiniCPM5 tokenizer support (b9354); Hexagon Q4_1 quantization in MUL_MAT/MUL_MAT_ID with HVX (b9370); WebGPU legacy constants removed (b9371); Vulkan transfer queue preference fix on AMD UMA devices (b9357). (b9371 · b9370 · b9357 · b9354)
Ollama v0.30.0-rc28 (pre-release): Architecture shift to direct llama.cpp support, replacing the GGML layer; adds MLX inference acceleration on Apple Silicon; retains GGUF compatibility. Still pre-release — team is explicitly requesting feedback before GA. Do not deploy to production. (release)
NVIDIA on Hugging Face: Two new model drops — nvidia/Assemble_Trocar (gr00t_n1_5 architecture) and nvidia/DeepSeek-V4-Pro-NVFP4 (NVFP4-quantized DeepSeek-V4-Pro, MIT license, 8-bit/fp8). (Assemble_Trocar · DeepSeek-V4-Pro-NVFP4)

💡 Action Today

If you run Weaviate in production, check your current SSB memory headroom: the 80%→90% threshold bump in v1.37.6/v1.36.16 shifts when memory-pressure events trigger. Apply the patch for your active branch with memory monitoring active — don't roll it blind on a cluster that's already near capacity.