The AI signal, filtered from the noise.
The agents, models, and labs that actually matter — one sharp email every morning.
Free forever · No hype, no filler · Unsubscribe anytime
Latest from AgentFlow
Claude Sonnet 5: Almost Opus 4.8, at a Third of the Price
Anthropic shipped Claude Sonnet 5 (codename Fennec) on June 30, 2026, and the pitch is simple: near-Opus-4.8 quality at mid-tier prices. It is the most agentic Sonnet yet, ships a 1M-token context window, scores 82.1% on SWE-bench, and lands at $2/$10 per million tokens through August 31. Here is what actually shipped, the real benchmark gap to Opus 4.8, the new tokenizer that quietly changes your bill, the safety gains, and whether you should make it your default driver.
Speculative Decoding, and How DeepSeek DSpark Made Inference Up to 5x Faster
DeepSeek open-sourced DSpark on June 27, 2026, and the headline doing the rounds, "400x faster," overstates it: the figure is a percentage, not a multiplier. The real number is 50 to 400 percent faster (up to ~5x throughput), with 60 to 85 percent lower latency on DeepSeek-V4, zero retraining, and identical output. Here is how speculative decoding actually works, what DSpark adds on top (DFlash, a Markov head, a load-aware scheduler), the honest numbers, and how to use the MIT-licensed DeepSpec toolkit on your own models.
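For the percent-versus-multiplier point, here is a quick sanity check of the arithmetic. It is a minimal sketch, not part of DSpark or DeepSpec: the helper name `speedup_multiplier` is hypothetical, and it assumes "N percent faster" means throughput rises by N percent over the baseline.

```python
def speedup_multiplier(percent_faster: float) -> float:
    """Convert 'N percent faster' into a throughput multiplier.

    Assumption: 'N percent faster' means throughput is the baseline
    plus N percent of the baseline, so 100 percent faster is 2x.
    """
    return 1.0 + percent_faster / 100.0


# The range quoted in the teaser: 50 to 400 percent faster.
low = speedup_multiplier(50)
high = speedup_multiplier(400)
print(f"{low:.1f}x to {high:.1f}x")
```

Under that reading, 400 percent faster works out to 5x, which is a long way from 400x.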
Ornith 1.0: The Local Coding Model That Writes Its Own Scaffold
DeepReinforce shipped Ornith 1.0 on June 25, 2026, and the trick is not the benchmark, it is what the model learned to do. Ornith is trained with RL to write its own agent scaffold and solve the task in one joint loop, so the orchestration is baked into the weights instead of hand-built around them. It ships MIT-licensed in four sizes (9B dense, 31B dense, 35B MoE, 397B MoE), built on Gemma 4 and Qwen 3.5, with a 256K context. The 397B scores 82.4 on SWE-bench Verified, one point behind Opus 4.8, and the 9B and 35B run on a single consumer GPU. Here is how self-scaffolding actually works, the training recipe (token-level GRPO, async pipeline-RL, three anti-reward-hacking layers), the honest benchmark table, exactly which GPU runs which size, and why a model that carries its own orchestration changes what local, private, offline coding agents can do.
Everything that matters in AI,
straight to your inbox.
Join 12,000+ readers — daily, free, no spam.