🚨 Breaking: None
🗑️ Deprecations: None
💰 Pricing: None
🆕 What's new:
- llama.cpp:
  - b9553 relaxes sampler name matching, so alternative spellings such as top-k are accepted alongside the canonical top_k.
  - b9551 optimizes the KV cache to avoid cell copies.
  - b9547 skips the mmproj download when the user supplies one.
  - b9544 fixes reasoning round-trip issues for LFM2/LFM2.5 models.
  - b9543 adds video support for Qwen3.5-based models via frame merge.
- Ollama v0.30.5 fixes the floating-point exception crash affecting gemma4:12b on x86/CUDA/Linux/Windows. v0.30.6 introduces Gemma 4 QAT (quantization-aware training) weights, published under tags ending in -qat, which reduce memory requirements for on-device inference.
- CohereLabs/BLS-Mini-Code-1.0 (Hugging Face) is a compact code model built on a mixture-of-experts (MoE) architecture.
- NVIDIA Nemotron 3.5 Content Safety (free on OpenRouter) is a 4B multimodal guardrail fine-tuned from Gemma-3-4B, moderating LLM/VLM inputs and outputs.
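
To illustrate what the relaxed sampler name matching in llama.cpp b9553 means in practice, here is a minimal Python sketch of the general idea: treat hyphens and underscores as equivalent before looking up a canonical name. This is not llama.cpp's actual code. The function `normalize_sampler_name` and the `CANONICAL_SAMPLERS` set are hypothetical names chosen for illustration, and the set is only a partial, assumed list.

```python
from typing import Optional

# Hypothetical, partial set of canonical sampler names (assumption for illustration).
CANONICAL_SAMPLERS = {"top_k", "top_p", "min_p", "temperature"}


def normalize_sampler_name(name: str) -> Optional[str]:
    """Map an alternative spelling such as 'top-k' to its canonical form 'top_k'.

    Returns None when the name does not match any known sampler.
    """
    # Treat hyphens as underscores and ignore case and surrounding whitespace.
    candidate = name.strip().lower().replace("-", "_")
    return candidate if candidate in CANONICAL_SAMPLERS else None


if __name__ == "__main__":
    for raw in ["top-k", "top_k", "min-p", "unknown"]:
        print(raw, "->", normalize_sampler_name(raw))
```

The real matching rules may be stricter or looser than this sketch, so check the b9553 release notes before relying on a particular spelling.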
🌐 AI news:
- A new research paper, "When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents", introduces ToolMaze, a benchmark that evaluates how well LLM agents recover from tool failures.
💡 Tip of the day: If you deploy Gemma 4 models in production, test the new QAT weights (available in Ollama v0.30.6, tags ending in -qat) to reduce memory footprint. QAT is designed to limit the accuracy loss that quantization normally causes, but verify this on your own workload before switching. For local inference with llama.cpp, upgrade to b9553 or later to benefit from relaxed sampler naming and the KV cache improvements.
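
To get a rough sense of why quantized weights matter for on-device inference, the back-of-the-envelope sketch below estimates weight memory as parameters × bits per weight / 8. The 12B parameter count is read off the gemma4:12b tag, and the 4-bit width is an assumption for illustration only: the bit width of the QAT weights is not stated here. The helper `weight_memory_gib` is a hypothetical name, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Estimate memory needed for model weights alone, in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed values: 12B parameters, 16-bit baseline vs. a hypothetical 4-bit quantization.
N_PARAMS = 12e9
baseline_gib = weight_memory_gib(N_PARAMS, 16)
quantized_gib = weight_memory_gib(N_PARAMS, 4)

if __name__ == "__main__":
    print(f"16-bit weights: {baseline_gib:.2f} GiB")
    print(f"4-bit weights:  {quantized_gib:.2f} GiB")
    print(f"Reduction:      {baseline_gib / quantized_gib:.1f}x")
```

Actual memory use depends on the quantization format, context length, and runtime, so measure on the target hardware before committing to a deployment.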