Watching a bot attempt to fact-check

Please support the work of the Poynter Institute and MediaWise. Tokens are expensive, but supporting journalism, a free society and a healthy information ecosystem is priceless.

New here? What is Community Notes — and why is a bot writing them?

Community Notes is X's crowd-sourced fact-checking system: contributors add context to potentially misleading posts, and a note is shown publicly only if people who usually disagree with each other both rate it helpful. In a recent pilot, X opened an API that lets approved AI "Note Writers" draft notes automatically — and bots now write roughly half of all notes on the platform. This page follows one of those bots, in public, as it works.

This page shows what an AI Community Notes writer does in public, every day, in detail. The bot writes under the Community Notes alias Kind Raspberry Chickadee; X assigns bird names to keep contributor identities pseudonymous (my human alias is Melodious Glacier Quail). It reads the posts X surfaces, then looks for evidence that could fact-check each claim: fact-checks from IFCN signatories such as PolitiFact, but also government data (BLS jobs reports, CBO scores), primary records (Congress.gov, agency filings), reporting from major newsrooms and even other X posts that the current post may be misrepresenting. It submits a community note only when it finds a clean match. Most of the time, it declines.

[Image: a grey-headed bird with a raspberry-pink chest perched on a mossy branch surrounded by raspberries.] Kind Raspberry Chickadee as imagined by Google Gemini. Raspberry chickadees aren't real.

I was inspired by the excellent work of Alexios Mantzarlis at Indicator. I wanted to try to build a better bot writer that would actually address political misinformation, especially ahead of the elections. That's why it has a narrow beat, and why it will probably have a low "helpfulness rating." But that's what we're testing! You'll see I also did some light fine-tuning of an open-source model and pitted the frontier models against each other to see which ones are the least terrible at this.

I built it with lots of button smashing with Claude Code. But also with the deep expertise I've developed (sadly) over the last FIVE years of note-watching. I run it. And I publish every step it takes here so other journalists, researchers and the public can see how this kind of system actually behaves.

Posts seen. Pulled from X's eligible-posts endpoint.
On-beat. About US politicians, federal policy, or election integrity.
Notes written. Drafted and validated against the source.
Submitted to X. In test_mode until the account earns in.

Data refreshes every time the bot runs (currently every two hours).

The funnel

Where do posts drop out? Most never get past the first filter. That's by design — the bot's beat is narrow.

Step by step, what each stage means:

Eligible posts seen. Whatever X returns when the bot asks for posts it could note. Sports, gaming, celebrity drama, foreign politics — anything.
On-beat. A cheap Claude Haiku call decides whether the post is about US politicians or US political misinformation.
Evidence found. For on-beat posts, the bot searches PolitiFact, the Google Fact Check Tools API, and (if needed) the broader web. A post passes this stage if at least one fact-check or primary source comes back.
Note drafted. A Claude Opus call writes the note prose. The bot only writes when the evidence directly addresses the post's claim. When it doesn't, the bot returns NO_NOTE.
Submitted to X. Notes that pass length, URL, and evaluate_note pre-flight checks go to X. Still in test_mode — X requires it during the AI Note Writer pilot.
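
The stages above can be sketched as a simple filter chain that counts how many posts survive each step. This is a minimal illustration, not the bot's actual code: the Post and Stages types, the run_funnel function, the toy stubs and the 280-character limit are all hypothetical stand-ins for the real model calls, searches and pre-flight checks.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List

NO_NOTE = "NO_NOTE"  # sentinel the drafting stage returns when it declines
MAX_LEN = 280        # placeholder limit for this sketch, not X's documented value


@dataclass
class Post:
    post_id: str
    text: str


@dataclass
class Stages:
    """Hypothetical callables standing in for the model and search calls."""
    is_on_beat: Callable[[Post], bool]        # the cheap classifier call
    find_evidence: Callable[[Post], List]     # fact-checks or primary sources
    draft_note: Callable[[Post, List], str]   # note text, or NO_NOTE
    passes_preflight: Callable[[str], bool]   # length, URL and evaluate_note checks


def run_funnel(posts: List[Post], stages: Stages) -> Counter:
    """Count how many posts reach each stage of the funnel."""
    funnel = Counter()
    for post in posts:
        funnel["eligible_seen"] += 1
        if not stages.is_on_beat(post):
            continue  # off-beat: dropped before any expensive call
        funnel["on_beat"] += 1
        evidence = stages.find_evidence(post)
        if not evidence:
            continue  # nothing came back from the searches
        funnel["evidence_found"] += 1
        note = stages.draft_note(post, evidence)
        if note == NO_NOTE:
            continue  # the drafting model declined
        funnel["note_drafted"] += 1
        if stages.passes_preflight(note):
            funnel["submitted"] += 1
    return funnel


# Toy stubs so the sketch runs without any network or model access.
toy_posts = [
    Post("1", "Final score from last night's game"),
    Post("2", "Senator X voted against the bill"),
    Post("3", "Agency Y cut jobs by half"),
]
toy_stages = Stages(
    is_on_beat=lambda p: "Senator" in p.text or "Agency" in p.text,
    find_evidence=lambda p: ["fact-check"] if "Senator" in p.text else [],
    draft_note=lambda p, ev: "Voting records show otherwise.",
    passes_preflight=lambda note: len(note) <= MAX_LEN,
)
counts = run_funnel(toy_posts, toy_stages)
```

Ordering the cheapest check first is the point of the design: an off-beat post costs one small classifier call and nothing more.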

Why the bot declines

The bot says no a lot. Here's why, with counts:

The biggest two buckets are by design:

Off-beat. Most of what X surfaces isn't about US politics. The bot drops it before spending an Opus token on it.
Picker: candidate doesn't match claim. The bot found a fact-check on a related topic, but it doesn't directly rate this post's claim. Rather than stretch, the bot returns nothing.

The other two are safety rails:

Opus declined to write. The model was given evidence and decided the case wasn't airtight. This is what you want a fact-checking bot to do when in doubt.
URL validator rejected. A safety check catches notes where Claude tried to write a URL itself. The bot only ever cites URLs that came back from a real search result; if the prose Claude produced contains any http, https, or domain text, the note is dropped. This is the structural fix for a hallucination problem an earlier version of the bot had.
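
A check like the URL validator could look roughly like the sketch below. It assumes the model returns prose only and that the citation is attached afterwards from the set of URLs the searches returned. The function names, the regular expression and the short list of domain endings are illustrative assumptions, not the bot's real implementation.

```python
import re
from typing import Optional, Set

# Flags "http"/"https", "www." prefixes and bare domains such as "example.com".
# The list of endings is a small illustrative assumption.
URL_LIKE = re.compile(
    r"https?|www\.|\b[a-z0-9-]+\.(?:com|org|net|gov|edu)\b",
    re.IGNORECASE,
)


def prose_contains_url(prose: str) -> bool:
    """True if the model-written prose contains anything that looks like a URL."""
    return URL_LIKE.search(prose) is not None


def build_note(prose: str, source_url: str, search_urls: Set[str]) -> Optional[str]:
    """Return the final note text, or None if the note must be dropped."""
    if prose_contains_url(prose):
        return None  # the model tried to write a URL itself
    if source_url not in search_urls:
        return None  # only cite URLs that came back from a search result
    return f"{prose} {source_url}"
```

For example, build_note("The claim is false.", url, {url}) returns the prose with the URL appended, while any prose that mentions "politifact.com" or "https" is rejected outright. Dropping the note instead of trying to repair it keeps the rule simple: the model never gets to author a link.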