Day job · 2025 · Solo developer · Production

AI Chatbot — RAG · Agents · Tools

A production retrieval-augmented chatbot over internal documents. Metadata-filtered retrieval feeds a tool-using agent loop; FastAPI exposes the API; the Python service runs on Render.

/ The challenge

What needed solving

Internal teams kept searching the same long handbooks and policy docs to answer the same questions. Plain keyword search missed context, and single-shot RAG over the whole corpus returned irrelevant chunks when a question touched only one section.

/ What I built

The solution

Built a metadata-aware RAG pipeline: each chunk carries source / section / tag metadata used to pre-filter the candidate set before semantic search. On top, an agent loop (LangChain tools + structured tool-calling) decides when to retrieve, when to summarize, and when to call structured-extraction tools. Exposed via a FastAPI service deployed on Render with streaming responses.
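To make the pre-filtering step concrete, here is a minimal sketch in plain Python, not the production code. It assumes chunks are held in memory with precomputed embeddings; the `Chunk` class, the `retrieve` helper, and the toy two-dimensional vectors are all hypothetical names and data chosen for illustration. A real deployment would use a vector store and embeddings from the configured model.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    # Hypothetical chunk record: text, a precomputed embedding, and metadata.
    text: str
    embedding: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(chunks, query_embedding, filters, k=3):
    # Stage 1: keep only chunks whose metadata matches every requested filter.
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in filters.items())
    ]
    # Stage 2: rank the surviving candidates by similarity to the query.
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]


# Toy corpus with made-up embeddings (assumption: real ones come from a model).
CHUNKS = [
    Chunk("Annual leave is 25 days.", [0.9, 0.1],
          {"source": "handbook", "section": "leave"}),
    Chunk("Sick leave needs a note after 3 days.", [0.7, 0.3],
          {"source": "handbook", "section": "leave"}),
    Chunk("Expenses are reimbursed monthly.", [0.95, 0.05],
          {"source": "handbook", "section": "expenses"}),
]

top = retrieve(CHUNKS, [1.0, 0.0], {"section": "leave"}, k=2)
print([c.text for c in top])
```

In this toy data the expenses chunk is the closest vector to the query, yet it never reaches the ranking step because the section filter removes it first. That is the behaviour the pre-filter is meant to provide.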

Python · FastAPI · LangChain · RAG · Agents & Tools · OpenAI · Metadata filtering · Render
/ Outcomes

What changed

  1. Metadata filtering narrows the candidate set before vector ranking, so fewer off-topic chunks reach the LLM
  2. Agent with tools handles structured questions ("extract X from section Y") beyond plain semantic search
  3. Streaming responses + per-request tracing make latency and quality issues observable in production
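The third outcome can be illustrated with a small sketch of per-request tracing around a token stream. The tracing stack used in production is not named here, so this uses only the standard library; `traced_stream`, the `emit` callback, and the trace field names are hypothetical.

```python
import time
import uuid
from typing import Callable, Iterable, Iterator


def traced_stream(tokens: Iterable[str],
                  emit: Callable[[dict], None]) -> Iterator[str]:
    """Yield tokens unchanged and emit one timing record per request."""
    request_id = uuid.uuid4().hex
    start = time.perf_counter()
    first_token_at = None
    count = 0
    try:
        for token in tokens:
            if first_token_at is None:
                # Time to first token is the latency users notice most.
                first_token_at = time.perf_counter()
            count += 1
            yield token
    finally:
        # Runs when the stream ends, fails, or is closed early.
        end = time.perf_counter()
        emit({
            "request_id": request_id,
            "tokens": count,
            "time_to_first_token_s": (
                None if first_token_at is None else first_token_at - start
            ),
            "total_s": end - start,
        })


# Stand-in for a model's token stream; traces are collected in a list.
traces = []
answer = "".join(
    traced_stream(iter(["Leave ", "is ", "25 ", "days."]), traces.append)
)
print(answer, traces[0]["tokens"])
```

In a FastAPI service, a generator like this could be handed to `StreamingResponse`, with `emit` forwarding the record to whatever logging or tracing backend is in use.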
/ Under the hood

Technical highlights

Staged retrieval (metadata filter → vector similarity → LLM rerank) improves precision on multi-topic corpora
Tool-using agent loop (retrieve · summarize · structured-extract) instead of single-shot RAG
FastAPI + Python service, deployed on Render with push-to-deploy
OpenAI for embeddings and chat completion; swappable per environment
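
The control flow of a tool-using loop like the one listed above can be sketched without any framework. This is an illustration only: it does not use LangChain, a rule-based `stub_policy` stands in for the model's tool-calling decision, and the tool functions, `DOCS` corpus, and `run_agent` helper are hypothetical.

```python
from typing import Callable

# Hypothetical toy corpus keyed by section name.
DOCS = {"leave": "Annual leave is 25 days. Sick leave needs a note after 3 days."}


def retrieve(section: str) -> str:
    """Return the text of a section, or an empty string if unknown."""
    return DOCS.get(section, "")


def summarize(text: str) -> str:
    """Stand-in summary: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def extract_number(text: str) -> str:
    """Stand-in structured extraction: first whole number in the text."""
    digits = [w for w in text.replace(".", " ").split() if w.isdigit()]
    return digits[0] if digits else ""


TOOLS: dict[str, Callable[[str], str]] = {
    "retrieve": retrieve,
    "summarize": summarize,
    "extract_number": extract_number,
}


def stub_policy(question: str, observations: list[str]) -> tuple[str, str]:
    """Rule-based stand-in for the LLM choosing the next tool call."""
    if not observations:
        return ("retrieve", "leave")  # assumption: section is already known
    if len(observations) == 1:
        if "how many" in question.lower():
            return ("extract_number", observations[-1])
        return ("summarize", observations[-1])
    return ("final", observations[-1])


def run_agent(question: str, policy, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        action, arg = policy(question, observations)
        if action == "final":
            return arg
        # Run the chosen tool and feed its output back to the policy.
        observations.append(TOOLS[action](arg))
    raise RuntimeError("agent did not finish within max_steps")


print(run_agent("How many days of annual leave?", stub_policy))
print(run_agent("What is the leave policy?", stub_policy))
```

The step cap is a deliberate choice: a loop driven by model output needs a hard bound so that a confused policy fails loudly instead of calling tools indefinitely.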

Need something like this?

I take on a small number of projects each quarter. Let's talk if your idea fits.