Daily LLM Advisory: Goose xAI Auth, Qwen3.7 Plus, Anthropic Thinking Tokens, llama.cpp Fixes

🆕 Nouveautés

Goose v1.37.0 – Adds xAI SuperGrok OAuth subscription provider (#9420). Also includes replay ACP images on session load and exposes raw provider supported models over ACP. Evaluate if your workflow benefits from xAI integration.

Qwen3.7 Plus – New cost-effective model on OpenRouter (details). Supports text+image input, text output. Consider for image captioning or where lower cost is needed.

Anthropic API (May 27) – Now includes usage.output_tokens_details.thinking_tokens in response, reporting extended thinking tokens. In streaming, appears only on final message_delta. No beta header required. Important for billing transparency.

llama.cpp multiple releases – Several fixes and improvements: - b9518: Disable on-device spec checkpoints - b9515: Deduplicate imatrix loading code - b9503: Fix Gemma 4 audio projector embedding size - b9500: Reduce Metal rset heartbeat from 500ms to 5ms - b9496: Fix Gemma 4 unified FPE - b9495: Use post-norm hidden state for MTP on Qwen3.5

If self-hosting with llama.cpp, upgrade to b9518.

💡 Conseil du jour

Audit your Anthropic extended thinking usage. With the new thinking_tokens field, track cost breakdowns and consider caching strategies if thinking tokens are high.