Making a portfolio AI-readable: llms.txt, structured data, and a crawler policy
Search is splitting into two audiences — Google's crawler and AI assistants like ChatGPT, Claude, and Perplexity. Here's exactly how I made this site legible to both, with a single source of truth so nothing drifts.
For years, "being findable" meant one thing: rank on Google. That's changing. A growing share of people now ask an AI assistant — ChatGPT, Claude, Perplexity, Gemini — and the answer they get is synthesized from a handful of sources the model decided to trust. You want to be one of those sources.
The good news: the work that makes a site legible to an AI assistant overlaps heavily with classic technical SEO. Here's the full pass I did on this site, and why each piece earns its place.
llms.txt — a map for language models
llms.txt is an emerging convention: a single Markdown file at /llms.txt that gives a model a clean, curated map of your site — who you are, and where the canonical pages live — without making it crawl and parse your HTML.
The format is simple: an H1 with your name, a blockquote summary, a short prose paragraph, then ## sections of [name](url): description links.
# Manuj Rai
> Full-Stack & AI Engineer based in Ahmedabad, India...
## Pages
- [About & Résumé](https://www.manuj.online/about): Background, experience, résumé.
- [Work — Case Studies](https://www.manuj.online/work): Project case studies.
## Projects
- [News Desk — Story-Discovery Agent](https://www.manuj.online/work/news-scraper): ...
The mistake is to hand-write it and let it rot. I generate it from the same profile.ts that drives the rest of the site, as a route handler, so it can never drift:
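// Render once at build time; the output only changes when profile.ts does.
export const dynamic = "force-static";

// Serve the generated Markdown as plain text.
export function GET() {
  return new Response(buildLlmsTxt(), {
    headers: { "content-type": "text/plain; charset=utf-8" },
  });
}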
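Here's a minimal sketch of what a builder like buildLlmsTxt could look like, following the format described earlier (H1, blockquote summary, prose paragraph, then ## sections of links). The Profile and LinkEntry shapes, the renderLink helper, and the example data are all assumptions for illustration; the real profile.ts isn't shown in this post, and this version takes the profile as an argument rather than importing it.

```typescript
// Hypothetical shape of the profile data; the real profile.ts is not shown in this post.
interface LinkEntry {
  name: string;
  url: string;
  description: string;
}

interface Profile {
  name: string;
  summary: string;
  intro: string;
  sections: Record<string, LinkEntry[]>;
}

// Render one "- [name](url): description" list item.
function renderLink(link: LinkEntry): string {
  return `- [${link.name}](${link.url}): ${link.description}`;
}

// Build the llms.txt body: H1, blockquote summary, prose paragraph, then ## sections.
function buildLlmsTxt(profile: Profile): string {
  const blocks: string[] = [
    `# ${profile.name}`,
    `> ${profile.summary}`,
    profile.intro,
  ];
  for (const [heading, links] of Object.entries(profile.sections)) {
    blocks.push(`## ${heading}`, links.map(renderLink).join("\n"));
  }
  return blocks.join("\n\n") + "\n";
}

// Illustrative data only; placeholder values, not the site's real content.
const exampleProfile: Profile = {
  name: "Example Person",
  summary: "One-line summary of who this is.",
  intro: "A short paragraph of context.",
  sections: {
    Pages: [
      { name: "About", url: "https://example.com/about", description: "Background and experience." },
    ],
  },
};

const llmsTxt = buildLlmsTxt(exampleProfile);
```

Because the same data object drives both the pages and this file, adding a project to the profile updates llms.txt on the next build with no extra step.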
I also ship a fuller /llms-full.txt that inlines the whole profile — bio, every case study, services, FAQ — so an assistant can answer detailed questions in one fetch instead of crawling every page.
A deliberate AI-crawler policy
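Your robots.txt already controls who crawls you, including AI bots. The default wildcard rule silently allows everyone, which is fine but says nothing about intent. Since this is a portfolio that wants to be cited, I name the AI crawlers explicitly and welcome them (strictly speaking, Google-Extended is a robots.txt control token for Google's AI products rather than a separate crawler, but it sits in the same group):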
User-Agent: GPTBot
User-Agent: ClaudeBot
User-Agent: PerplexityBot
User-Agent: Google-Extended
Allow: /
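If you'd rather generate that group than hand-maintain it, a small helper is enough. This is a sketch under assumptions: buildRobotsGroup and the aiCrawlers list are hypothetical names, and the wildcard group is added only to show how the two groups sit side by side.

```typescript
// User-agent tokens named in the policy above.
const aiCrawlers: string[] = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"];

// Build one robots.txt group: consecutive User-Agent lines share the rules that follow.
function buildRobotsGroup(userAgents: string[], allow: string[], disallow: string[] = []): string {
  const lines = [
    ...userAgents.map((ua) => `User-Agent: ${ua}`),
    ...allow.map((path) => `Allow: ${path}`),
    ...disallow.map((path) => `Disallow: ${path}`),
  ];
  return lines.join("\n");
}

// A wildcard group for everyone else, then the explicit AI-crawler group.
const robotsTxt =
  [buildRobotsGroup(["*"], ["/"]), buildRobotsGroup(aiCrawlers, ["/"])].join("\n\n") + "\n";
```

Keeping the crawler names in one array means adding a new bot is a one-line change.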
One subtle but important rule: don't Disallow a page you've marked noindex. A crawler has to fetch a page to see its noindex tag. Block it in robots.txt and the crawler never sees the instruction, so the URL can linger in the index as a bare, snippet-less entry for as long as other pages link to it. Keep noindex pages crawlable; let the meta tag do its job.
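That rule is easy to check mechanically. The sketch below flags pages that are both noindex and blocked; PageInfo, isDisallowed, and findHiddenNoindexPages are hypothetical names, and the matching is a deliberately simplified prefix check (real robots.txt matching also supports * and $ patterns).

```typescript
// Hypothetical page record: a path plus whether the page carries a noindex robots meta tag.
interface PageInfo {
  path: string;
  noindex: boolean;
}

// Simplified prefix match; real robots.txt matching also handles "*" and "$" patterns.
function isDisallowed(path: string, disallowPrefixes: string[]): boolean {
  return disallowPrefixes.some((prefix) => prefix !== "" && path.startsWith(prefix));
}

// Return noindex pages a crawler is barred from fetching, so the tag would never be seen.
function findHiddenNoindexPages(pages: PageInfo[], disallowPrefixes: string[]): string[] {
  return pages
    .filter((page) => page.noindex && isDisallowed(page.path, disallowPrefixes))
    .map((page) => page.path);
}

// Illustrative paths only.
const conflicts = findHiddenNoindexPages(
  [
    { path: "/drafts/old-post", noindex: true },
    { path: "/thanks", noindex: true },
    { path: "/about", noindex: false },
  ],
  ["/drafts/"],
);
```

Any path this returns needs one of the two signals removed: either drop the Disallow so the tag can be read, or drop the noindex and rely on the block alone.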
Structured data is the part AI actually reads
JSON-LD (schema.org) is the most machine-legible thing on your page, and both Google and LLM-powered tools lean on it. The single highest-leverage move is to stop scattering anonymous duplicate entities and consolidate the graph with @id references.
Define your Person and WebSite once, site-wide, then have every other node reference them instead of re-describing them:
// On a case-study page — point at the canonical entities, don't re-declare them
author: { "@id": `${siteUrl}/#person` },
isPartOf: { "@type": "WebSite", "@id": `${siteUrl}/#website` },
Now there's exactly one "Manuj Rai" entity, and a case study, a photo, and an About page all resolve to it. That's what lets a knowledge graph — or an assistant — connect the dots into a single, confident picture of who you are.
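To make the shape concrete, here's a rough sketch of a consolidated graph. The example.com URL, the Article type, the #article fragment, and the buildCaseStudyNode helper are assumptions for illustration, not this site's actual markup; only the #person and #website identifiers mirror the snippet above.

```typescript
// Placeholder origin; substitute the real site URL.
const siteUrl = "https://example.com";

const personId = `${siteUrl}/#person`;
const websiteId = `${siteUrl}/#website`;

// Declared once, site-wide: the canonical Person and WebSite nodes.
const siteGraph = {
  "@context": "https://schema.org",
  "@graph": [
    { "@type": "Person", "@id": personId, name: "Example Person", url: siteUrl },
    { "@type": "WebSite", "@id": websiteId, url: siteUrl, publisher: { "@id": personId } },
  ],
};

// Per-page node: references the canonical entities by @id instead of re-describing them.
function buildCaseStudyNode(slug: string, headline: string) {
  return {
    "@context": "https://schema.org",
    "@type": "Article", // assumed type for a case study
    "@id": `${siteUrl}/work/${slug}#article`,
    headline,
    author: { "@id": personId },
    isPartOf: { "@type": "WebSite", "@id": websiteId },
  };
}

// Serialized for embedding in a <script type="application/ld+json"> tag.
const siteJsonLd = JSON.stringify(siteGraph);
const caseStudyJsonLd = JSON.stringify(buildCaseStudyNode("example-project", "Example case study"));
```

The per-page node carries no name, bio, or image for the author; a consumer follows the @id to the single site-wide Person node for those.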
The small stuff that signals care
- security.txt (RFC 9116) at /.well-known/: expected on a security-conscious site, and cheap.
- A clean entity image: I made the home/Person OG card a full-bleed portrait so every share and knowledge panel ties a face to the name, and branded title cards for the marketing pages.
- Honesty over keyword-stuffing. AI assistants are good at detecting padding. Specific, true claims about real work beat a wall of skills — which is also why I write these build-grounded engineering deep-dives instead of think-pieces.
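For the security.txt item above, the file itself is tiny. RFC 9116 requires a Contact field and an Expires field; the sketch below builds both plus the optional Canonical field. buildSecurityTxt, the contact address, and the dates are placeholders, not this site's real values.

```typescript
// Build a minimal security.txt body. RFC 9116 requires Contact and Expires; Canonical is optional.
function buildSecurityTxt(contact: string, expires: Date, canonicalUrl?: string): string {
  const lines = [`Contact: ${contact}`, `Expires: ${expires.toISOString()}`];
  if (canonicalUrl) lines.push(`Canonical: ${canonicalUrl}`);
  return lines.join("\n") + "\n";
}

// Placeholder values for illustration.
const securityTxt = buildSecurityTxt(
  "mailto:security@example.com",
  new Date(Date.UTC(2030, 0, 1)),
  "https://example.com/.well-known/security.txt",
);
```

Since the Expires date has to stay in the future, generating the file at build time is less error-prone than editing a static copy by hand.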
Does it "work"?
The honest answer: AI-citation visibility is new and hard to measure cleanly, so treat this as positioning, not a guaranteed traffic lever. But none of it is speculative effort — every piece is also plain-good SEO and structured data. You're not betting on AI search; you're making your site legible to whoever is reading, human or model. That's a bet that pays either way.
Frequently asked questions
- What is llms.txt?
- It's an emerging convention: a single Markdown file at /llms.txt that gives AI assistants a clean, curated map of your site — who you are and where the canonical pages live — without making them crawl and parse your HTML.
- Does llms.txt actually help SEO?
- Indirectly. It's positioning for AI-assistant visibility rather than a guaranteed Google ranking lever, but every related step — structured data, a clean crawler policy, consolidated entities — is also plain-good technical SEO, so the effort pays off either way.
- Should I block AI crawlers like GPTBot in robots.txt?
- Only if you don't want to be cited. If you want AI assistants to quote you as a source, name crawlers like GPTBot, ClaudeBot, and PerplexityBot explicitly and allow them. And never Disallow a noindex page — a crawler must fetch it to see the noindex tag.
- What structured data matters most for AI readability?
- Consolidate your schema.org graph with @id references: define Person and WebSite once, site-wide, and have every other node reference them. Then there's exactly one entity an assistant or knowledge graph can resolve to.