Agents

Building a Daily AI Briefing Agent

A from-scratch agent that scrapes the last 24 hours of AI news, ranks what matters, and writes a tight daily briefing — the theory, the code, and what actually happened when I ran it for a month.

Jithin Kumar PalepuMay 25, 202612 min read

I read too much AI news so you don't have to — except “I” got tired, so I built an agent to do it instead. This is the full build: the theory behind why a naive “summarize everything” agent fails, the code that actually works, and a month of results from running it every morning.

The theory: ranking beats summarizing

The obvious version of this agent — scrape a feed, dump it into a model, ask for a summary — produces mush. It treats a major model release and a minor library bump as equally important, because the model has no notion of signal. The fix is to split the job into two stages: gather widely, then rank ruthlessly. Only the top-ranked items get the expensive write-up treatment.

So the pipeline is four steps: collect raw items from a handful of sources, dedupe near-identical stories, rank by a model that scores importance, then write a tight briefing from the top N. Each stage is cheap except the last, and the last only runs on a few items.

The code

The heart of it is the rank-then-write split. Here is the trimmed-down orchestration loop — collection and emailing are omitted for brevity (full source in the repo):

async def build_briefing(sources: list[Source]) -> Briefing:
    # 1. collect widely (cheap, parallel)
    raw = await asyncio.gather(*[s.fetch() for s in sources])
    items = dedupe(flatten(raw))

    # 2. rank ruthlessly with a small model
    scored = await rank(items, model="gpt-small")
    top = sorted(scored, key=lambda x: x.score, reverse=True)[:8]

    # 3. write only the top items with the big model
    briefing = await write(top, model="gpt-large")
    return briefing

The ranking prompt is the part worth tuning. I score each item 0–10 on “would a busy AI engineer regret missing this?” and pass a one-line rubric with examples. That single framing did more for quality than any model upgrade.

Gather widely, rank ruthlessly, write only the top. The ranking step is where the quality lives — not the summarizing.

Results

I ran it every morning for a month and compared its picks against what I'd have flagged by hand.

Ninety-one percent agreement with my manual picks, for four cents and ninety seconds a day, is a trade I'll take. The misses were almost always niche research papers the ranker undervalued — which is exactly the gap a dedicated papers reviewer should fill.

What I'd change

Two things. First, the dedupe step is naive string-similarity; it occasionally merges two genuinely different stories that share a headline. An embedding-based dedupe would be more robust. Second, the ranker has no memory — it can't tell that a story is a follow-up to yesterday's. Giving it the last few days of briefings as context is the obvious next iteration.