AgentFlow Daily

The AI signal, filtered from the noise.

The agents, models, and labs that actually matter — one sharp email every morning.

Free forever · No hype, no filler · Unsubscribe anytime

Latest from AgentFlow

AI Models

Claude Sonnet 5: Almost Opus 4.8, at a Third of the Price

Anthropic shipped Claude Sonnet 5 (codename Fennec) on June 30, 2026, and the pitch is simple: near-Opus-4.8 quality at mid-tier prices. It is the most agentic Sonnet yet, ships a 1M-token context window, scores 82.1% on SWE-bench, and lands at $2/$10 per million tokens through August 31. Here is what actually shipped, the real benchmark gap to Opus 4.8, the new tokenizer that quietly changes your bill, the safety gains, and whether you should make it your default model.

Jun 30, 2026

AI Infrastructure

Speculative Decoding, and How DeepSeek DSpark Made Inference Up to 5x Faster

DeepSeek open-sourced DSpark on June 27, 2026, and the headline doing the rounds, "400x faster," confuses a percentage with a multiplier. The real number is 50 to 400 percent faster (up to ~5x throughput), with 60 to 85 percent lower latency on DeepSeek-V4, zero retraining, and identical output. Here is how speculative decoding actually works, what DSpark adds on top (DFlash, a Markov head, a load-aware scheduler), the honest numbers, and how to use the MIT-licensed DeepSpec toolkit on your own models.

Jun 27, 2026

AI Models

Ornith 1.0: The Local Coding Model That Writes Its Own Scaffold

DeepReinforce shipped Ornith 1.0 on June 25, 2026, and the trick is not the benchmark, it is what the model learned to do. Ornith is trained with RL to write its own agent scaffold and solve the task in one joint loop, so the orchestration is baked into the weights instead of hand-built around them. It ships MIT-licensed in four sizes (9B dense, 31B dense, 35B MoE, 397B MoE), built on Gemma 4 and Qwen 3.5, with a 256K context. The 397B scores 82.4 on SWE-bench Verified, one point behind Opus 4.8, and the 9B and 35B run on a single consumer GPU. Here is how self-scaffolding actually works, the training recipe (token-level GRPO, async pipeline-RL, three anti-reward-hacking layers), the honest benchmark table, exactly which GPU runs which size, and why a model that carries its own orchestration changes what local, private, offline coding agents can do.

Jun 25, 2026

Everything that matters in AI,
straight to your inbox.

Join 12,000+ readers — daily, free, no spam.