Knowledge Scraper — Autonomous Research Agent
An autonomous Python research agent: give it any topic and it plans sub-questions, searches the web, fetches and extracts page content, scores sources with Claude, and synthesizes a structured report with citations. Live progress is streamed over Server-Sent Events (SSE).

- Year: 2026
- Type: Day job
- Stack: 9
- Outcomes: 4
What needed solving
Internal teams burn hours gathering context on new topics: manual Googling, copy-pasting into docs, no source tracking, and no relevance ranking. Static search isn't enough; the work needs real reasoning about what to search next, based on what has already been found.
The solution
Built a closed-loop agent in Python. Claude plans 3–6 sub-questions from the topic; the agent runs them through DuckDuckGo, fetches the results with httpx (falling back to Playwright), extracts clean content with trafilatura, has Claude score each source for relevance and credibility, and then reflects on whether coverage is sufficient or another loop is needed. FastAPI exposes a Server-Sent Events stream so the UI shows the agent's progress in real time.
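The httpx-first, Playwright-fallback fetch step could look roughly like the following. It is a minimal, stdlib-only sketch under assumptions: `fetch_and_extract`, the `MIN_TEXT_CHARS` threshold, and the injected callables are hypothetical names, not the project's code. In a real deployment, `fast_fetch` would wrap an httpx GET, `browser_fetch` would render the page with Playwright, and `extract` would call `trafilatura.extract`, which returns `None` when it finds no main content.

```python
from typing import Callable, Optional

# Hypothetical callable types. In the stack described above, fast_fetch would
# wrap an httpx GET, browser_fetch a Playwright page render, and extract
# trafilatura.extract.
Fetcher = Callable[[str], str]              # URL -> raw HTML
Extractor = Callable[[str], Optional[str]]  # raw HTML -> main text, or None

MIN_TEXT_CHARS = 200  # assumed threshold for "the plain fetch found real content"


def fetch_and_extract(
    url: str,
    fast_fetch: Fetcher,
    browser_fetch: Fetcher,
    extract: Extractor,
) -> Optional[str]:
    """Return clean text for `url`, trying the cheap path before the browser."""
    thin: Optional[str] = None
    # 1. Cheap path: plain HTTP request, no JavaScript execution.
    try:
        text = extract(fast_fetch(url))
        if text and len(text) >= MIN_TEXT_CHARS:
            return text
        thin = text or None  # keep short text in case the browser also fails
    except Exception:
        pass  # timeout, HTTP error, TLS failure: fall through to the browser
    # 2. Expensive path: render the page in a headless browser, then extract.
    try:
        return extract(browser_fetch(url)) or thin
    except Exception:
        return thin
```

Trying the plain request first keeps the common case cheap; the headless browser only runs when the request fails or the page looks client-rendered, which this sketch approximates as "extracted text shorter than the threshold".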
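The plan, search, fetch, score and reflect cycle can be sketched as a generator that yields a progress event at every step, which is also a convenient shape for streaming. This is an illustrative sketch, not the project's implementation: every callable (`plan`, `search`, `fetch`, `score`, `is_sufficient`, `synthesize`) is a hypothetical stand-in for a Claude call or a DuckDuckGo/httpx step, and the `Budget` fields and `min_score` cut-off are assumed. Only URL and round caps are enforced here; a token budget could be tracked the same way from the usage counts each model response reports.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Source:
    url: str
    text: str
    score: float  # assumed scale: 0.0 (irrelevant) to 1.0 (highly relevant)


@dataclass
class Budget:
    max_urls: int    # hard cap on pages fetched per run
    max_rounds: int  # hard cap on plan/reflect iterations
    urls_used: int = 0

    def exhausted(self) -> bool:
        return self.urls_used >= self.max_urls


def research(
    topic: str,
    plan: Callable[[str, List[Source]], List[str]],      # topic + findings -> sub-questions
    search: Callable[[str], List[str]],                  # sub-question -> candidate URLs
    fetch: Callable[[str], Optional[str]],               # URL -> clean text or None
    score: Callable[[str, str], float],                  # (topic, text) -> relevance score
    is_sufficient: Callable[[str, List[Source]], bool],  # reflection: enough coverage?
    synthesize: Callable[[str, List[Source]], str],      # findings -> final report
    budget: Budget,
    min_score: float = 0.5,
) -> Iterator[dict]:
    """Run the closed loop, yielding one progress event per step."""
    sources: List[Source] = []
    seen = set()
    for round_no in range(1, budget.max_rounds + 1):
        # Plan: the planner sees what has been found so far, so later rounds
        # can target gaps instead of repeating earlier questions.
        questions = plan(topic, sources)
        yield {"event": "planned", "round": round_no, "questions": questions}
        for question in questions:
            if budget.exhausted():
                break
            yield {"event": "searching", "query": question}
            for url in search(question):
                if budget.exhausted():
                    break
                if url in seen:
                    continue  # the same page often answers several sub-questions
                seen.add(url)
                budget.urls_used += 1
                yield {"event": "fetching", "url": url}
                text = fetch(url)
                if not text:
                    continue
                s = score(topic, text)
                yield {"event": "scored", "url": url, "score": s}
                if s >= min_score:
                    sources.append(Source(url, text, s))
        # Reflect: stop once coverage is judged sufficient or the budget is gone.
        yield {"event": "reflecting", "round": round_no, "sources": len(sources)}
        if is_sufficient(topic, sources) or budget.exhausted():
            break
    yield {"event": "report", "text": synthesize(topic, sources)}
```

Because the budget is checked before every fetch and reflection can end the loop early, a run stops at whichever comes first: sufficient coverage, the URL cap, or the round cap.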
What changed
- Closed-loop agent (plan → search → fetch → extract → score → reflect → synthesize) runs end-to-end from a single prompt
- SSE-streamed progress so users see each step (searching, fetching URL X, scoring, reflecting) instead of staring at a spinner
- Configurable per-run token and URL budgets keep cost and runtime predictable
- Past runs persisted in SQLite so users can revisit, export, or share previous reports
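For SSE-streamed progress, each agent step has to be encoded in the `text/event-stream` wire format: `field: value` lines, with a blank line ending each message. A minimal sketch, assuming progress events are plain dicts carrying an `"event"` key (a hypothetical shape); `sse_frame` and `progress_stream` are illustrative names, not the project's API.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_frame(data: object, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Encode one Server-Sent Events message as text/event-stream."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")  # lets the client listen per step type
    # json.dumps (without indent) escapes newlines inside strings, so the
    # payload is always a single line and fits in a single data: field.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"  # blank line terminates the message


def progress_stream(events: Iterable[dict]) -> Iterator[str]:
    """Turn agent progress events (hypothetical dict shape) into SSE frames."""
    for i, ev in enumerate(events):
        yield sse_frame(ev, event=ev.get("event", "message"), event_id=str(i))
```

In FastAPI, a generator like this can be returned as `StreamingResponse(progress_stream(events), media_type="text/event-stream")`, and the browser's `EventSource` dispatches each frame to listeners registered for its `event` name.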
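Persisting past runs in SQLite needs little more than one table and the standard library's `sqlite3` module. The schema and the `save_run`, `load_run` and `export_markdown` helpers are hypothetical: a sketch of how revisiting and exporting could work, not the project's actual storage layer.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical schema: one row per run, sources stored as a JSON array.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    topic      TEXT NOT NULL,
    created_at TEXT NOT NULL,
    report     TEXT NOT NULL,
    sources    TEXT NOT NULL  -- JSON array of {"url": ..., "score": ...}
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows can be read by column name
    conn.execute(SCHEMA)
    return conn


def save_run(conn: sqlite3.Connection, topic: str, report: str, sources: List[dict]) -> int:
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO runs (topic, created_at, report, sources) VALUES (?, ?, ?, ?)",
            (topic, datetime.now(timezone.utc).isoformat(), report, json.dumps(sources)),
        )
    return cur.lastrowid


def load_run(conn: sqlite3.Connection, run_id: int) -> Optional[dict]:
    row = conn.execute("SELECT * FROM runs WHERE id = ?", (run_id,)).fetchone()
    if row is None:
        return None
    run = dict(row)
    run["sources"] = json.loads(run["sources"])
    return run


def export_markdown(run: dict) -> str:
    """Render a stored run as a shareable Markdown report with numbered sources."""
    refs = "\n".join(f"{i}. {s['url']}" for i, s in enumerate(run["sources"], start=1))
    return f"# {run['topic']}\n\n{run['report']}\n\n## Sources\n{refs}\n"
```

Storing sources as a JSON column keeps this to one table; a separate `sources` table would fit better if runs ever need to be queried by URL.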
Need something like this?
I take on a small number of projects each quarter. Let's talk if your idea fits.