June 18, 2026 · 3 min read · Updated June 24, 2026

Making a portfolio AI-readable: llms.txt, structured data, and a crawler policy

Search is splitting into two audiences — Google's crawler and AI assistants like ChatGPT, Claude, and Perplexity. Here's exactly how I made this site legible to both, with a single source of truth so nothing drifts.

SEO, AI, Next.js

On this page

llms.txt — a map for language models
A deliberate AI-crawler policy
Structured data is the part AI actually reads
The small stuff that signals care
Does it "work"?
FAQ

For years, "being findable" meant one thing: rank on Google. That's changing. A growing share of people now ask an AI assistant — ChatGPT, Claude, Perplexity, Gemini — and the answer they get is synthesized from a handful of sources the model decided to trust. You want to be one of those sources.

The good news: the work that makes a site legible to an AI assistant overlaps heavily with classic technical SEO. Here's the full pass I did on this site, and why each piece earns its place.

llms.txt — a map for language models

llms.txt is an emerging convention: a single Markdown file at /llms.txt that gives a model a clean, curated map of your site — who you are, and where the canonical pages live — without making it crawl and parse your HTML.

The format is simple: an H1 with your name, a blockquote summary, a short prose paragraph, then ## sections of [name](url): description links.

# Manuj Rai

> Full-Stack & AI Engineer based in Ahmedabad, India...

## Pages
- [About & Résumé](https://www.manuj.online/about): Background, experience, résumé.
- [Work — Case Studies](https://www.manuj.online/work): Project case studies.

## Projects
- [News Desk — Story-Discovery Agent](https://www.manuj.online/work/news-scraper): ...

The mistake is to hand-write it and let it rot. I generate it from the same profile.ts that drives the rest of the site, as a route handler, so it can never drift:

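// Next.js App Router route handler serving /llms.txt (e.g. app/llms.txt/route.ts).
// buildLlmsTxt() is the helper that renders the file from profile.ts; its import is omitted here.

// Render once at build time and serve the result as a static file.
export const dynamic = "force-static";

export function GET() {
  return new Response(buildLlmsTxt(), {
    // Plain text keeps the Markdown readable in browsers and simple fetchers.
    headers: { "content-type": "text/plain; charset=utf-8" },
  });
}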
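The post does not show buildLlmsTxt itself, so here is a minimal sketch of what such a builder could look like. The Profile shape, the field names, and the example data are all assumptions made for illustration, and this version takes the profile as a parameter rather than importing profile.ts.

```typescript
// Hypothetical shape of the data in profile.ts; the real file is not shown in the post.
interface LinkItem {
  name: string;
  url: string;
  description: string;
}

interface Profile {
  name: string;
  summary: string; // one-line blockquote under the H1
  intro: string; // short prose paragraph
  sections: { title: string; links: LinkItem[] }[];
}

// Renders the layout described above: H1, blockquote, paragraph, then ## link sections.
function buildLlmsTxt(profile: Profile): string {
  const lines: string[] = [`# ${profile.name}`, "", `> ${profile.summary}`, "", profile.intro];
  for (const section of profile.sections) {
    lines.push("", `## ${section.title}`);
    for (const link of section.links) {
      lines.push(`- [${link.name}](${link.url}): ${link.description}`);
    }
  }
  return lines.join("\n") + "\n";
}

// Example data invented for illustration (example.com is a placeholder domain).
const exampleProfile: Profile = {
  name: "Jane Doe",
  summary: "Full-stack engineer.",
  intro: "This file lists the canonical pages of the site.",
  sections: [
    {
      title: "Pages",
      links: [{ name: "About", url: "https://example.com/about", description: "Background and résumé." }],
    },
  ],
};

const llmsTxt = buildLlmsTxt(exampleProfile);
```

Because the same Profile object can feed the pages, the metadata, and this file, a change to one field shows up everywhere on the next build.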

I also ship a fuller /llms-full.txt that inlines the whole profile — bio, every case study, services, FAQ — so an assistant can answer detailed questions in one fetch instead of crawling every page.

A deliberate AI-crawler policy

Your robots.txt already controls who crawls you, including AI bots. The default wildcard rule silently allows everyone, which is fine, but it says nothing about intent. Since this is a portfolio that wants to be cited, I name the AI crawlers explicitly and welcome them. (Google-Extended is a control token rather than a separate crawler, but it is set the same way.)

User-Agent: GPTBot
User-Agent: ClaudeBot
User-Agent: PerplexityBot
User-Agent: Google-Extended
Allow: /

One subtle but important rule: don't Disallow a page you've marked noindex. A crawler has to fetch a page to see its noindex tag. Block it in robots.txt and the crawler never sees the tag, so a bare, snippet-less URL can linger in the index indefinitely. Keep noindex pages crawlable and let the meta tag do its job.
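To make both points concrete, here is a rough sketch that renders the crawler groups and flags any noindex page that a Disallow rule would hide. The function names, the /drafts/ rule, and the example paths are hypothetical, and the check uses plain prefix matching only (no * or $ wildcards, and no per-crawler group selection).

```typescript
// One robots.txt group: several user agents sharing the same rules.
interface RobotsGroup {
  userAgents: string[];
  allow: string[];
  disallow: string[];
}

// Renders groups in the plain robots.txt line format, separated by blank lines.
function renderRobotsTxt(groups: RobotsGroup[]): string {
  const blocks = groups.map((group) => {
    const lines: string[] = [];
    for (const ua of group.userAgents) lines.push(`User-agent: ${ua}`);
    for (const path of group.allow) lines.push(`Allow: ${path}`);
    for (const path of group.disallow) lines.push(`Disallow: ${path}`);
    return lines.join("\n");
  });
  return blocks.join("\n\n") + "\n";
}

// Returns the noindex paths that start with any Disallow prefix from any group.
// Simplification: prefix matching only, and every group is treated as applying.
function findBlockedNoindexPaths(groups: RobotsGroup[], noindexPaths: string[]): string[] {
  const prefixes: string[] = [];
  for (const group of groups) prefixes.push(...group.disallow);
  return noindexPaths.filter((path) =>
    prefixes.some((prefix) => prefix !== "" && path.indexOf(prefix) === 0),
  );
}

// Example policy; the /drafts/ rule and both paths are made up for illustration.
const groups: RobotsGroup[] = [
  { userAgents: ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"], allow: ["/"], disallow: [] },
  { userAgents: ["*"], allow: ["/"], disallow: ["/drafts/"] },
];

const robotsTxt = renderRobotsTxt(groups);
const stranded = findBlockedNoindexPaths(groups, ["/drafts/wip", "/thanks"]);
```

Here stranded contains only /drafts/wip: that page carries noindex but is also disallowed, so a crawler would never see the tag. A check like this could run in a build step or a test.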

Structured data is the part AI actually reads

JSON-LD (schema.org) is the most machine-legible thing on your page, and both Google and LLM-powered tools lean on it. The single highest-leverage move is to stop scattering anonymous duplicate entities and consolidate the graph with @id references.

Define your Person and WebSite once, site-wide, then have every other node reference them instead of re-describing them:

// On a case-study page — point at the canonical entities, don't re-declare them
author:   { "@id": `${siteUrl}/#person` },
isPartOf: { "@type": "WebSite", "@id": `${siteUrl}/#website` },

Now there's exactly one "Manuj Rai" entity, and a case study, a photo, and an About page all resolve to it. That's what lets a knowledge graph — or an assistant — connect the dots into a single, confident picture of who you are.
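As a sketch of how the consolidated graph might be assembled, the snippet below declares Person and WebSite once and lets a case-study node point at them by @id. The placeholder domain, the names, the Article type for case studies, the #article suffix, and the single @graph wrapper are assumptions for illustration; the site's actual nodes are not shown in the post.

```typescript
type JsonLdNode = { "@id"?: string; "@type"?: string; [key: string]: unknown };

const siteUrl = "https://example.com"; // placeholder domain

// Stable identifiers that every other node refers to.
const personId = `${siteUrl}/#person`;
const websiteId = `${siteUrl}/#website`;

// Declared once, site-wide.
const person: JsonLdNode = { "@type": "Person", "@id": personId, name: "Jane Doe", url: siteUrl };
const website: JsonLdNode = {
  "@type": "WebSite",
  "@id": websiteId,
  url: siteUrl,
  name: "Jane Doe",
  publisher: { "@id": personId },
};

// A case-study node references the canonical entities instead of re-describing them.
function caseStudyNode(slug: string, headline: string): JsonLdNode {
  const url = `${siteUrl}/work/${slug}`;
  return {
    "@type": "Article", // assumed type; the post does not say which one it uses
    "@id": `${url}#article`,
    headline,
    url,
    author: { "@id": personId },
    isPartOf: { "@type": "WebSite", "@id": websiteId },
  };
}

const graph = {
  "@context": "https://schema.org",
  "@graph": [person, website, caseStudyNode("example-project", "Example project")],
};

// The string that would go inside a <script type="application/ld+json"> tag.
const jsonLd = JSON.stringify(graph, null, 2);
```

The Person is described in full exactly once; every other mention is just an @id pointer, so there is no second, slightly different copy for a parser to reconcile.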

The small stuff that signals care

security.txt (RFC 9116) at /.well-known/ — expected on a security-conscious site, and cheap.
A clean entity image: I made the home/Person OG card a full-bleed portrait so every share and knowledge panel ties a face to the name, and I made branded title cards for the marketing pages.
Honesty over keyword-stuffing. AI assistants are good at detecting padding. Specific, true claims about real work beat a wall of skills — which is also why I write these build-grounded engineering deep-dives instead of think-pieces.
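For the security.txt item in the list above, a small sketch of rendering the file could look like this. RFC 9116 requires at least one Contact field and exactly one Expires field; the option names, the contact address, and the dates here are placeholders, not the values this site uses.

```typescript
interface SecurityTxtOptions {
  contacts: string[]; // each a URI, such as a mailto: or https: address
  expires: Date; // after this date the file should be treated as stale
  preferredLanguages?: string[];
  canonical?: string; // URL where this file is served
}

// Renders the body of /.well-known/security.txt from the options above.
function renderSecurityTxt(options: SecurityTxtOptions): string {
  if (options.contacts.length === 0) {
    throw new Error("security.txt needs at least one Contact field");
  }
  const lines = options.contacts.map((contact) => `Contact: ${contact}`);
  // toISOString() yields a UTC date-time such as 2027-01-01T00:00:00.000Z.
  lines.push(`Expires: ${options.expires.toISOString()}`);
  if (options.preferredLanguages && options.preferredLanguages.length > 0) {
    lines.push(`Preferred-Languages: ${options.preferredLanguages.join(", ")}`);
  }
  if (options.canonical) {
    lines.push(`Canonical: ${options.canonical}`);
  }
  return lines.join("\n") + "\n";
}

// Placeholder values for illustration only.
const securityTxt = renderSecurityTxt({
  contacts: ["mailto:security@example.com"],
  expires: new Date("2027-01-01T00:00:00Z"),
  canonical: "https://example.com/.well-known/security.txt",
});
```

Generating the file keeps the Expires date from going stale unnoticed, since it can be bumped in the same place as the rest of the site config.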

Does it "work"?

The honest answer: AI-citation visibility is new and hard to measure cleanly, so treat this as positioning, not a guaranteed traffic lever. But none of it is speculative effort — every piece is also plain-good SEO and structured data. You're not betting on AI search; you're making your site legible to whoever is reading, human or model. That's a bet that pays either way.

Frequently asked questions

What is llms.txt?
It's an emerging convention: a single Markdown file at /llms.txt that gives AI assistants a clean, curated map of your site (who you are and where the canonical pages live) without making them crawl and parse your HTML.

Does llms.txt actually help SEO?
Indirectly. It's positioning for AI-assistant visibility rather than a guaranteed Google ranking lever, but every related step (structured data, a clean crawler policy, consolidated entities) is also plain-good technical SEO, so the effort pays off either way.

Should I block AI crawlers like GPTBot in robots.txt?
Only if you don't want to be cited. If you want AI assistants to quote you as a source, name crawlers like GPTBot, ClaudeBot, and PerplexityBot explicitly and allow them. And never Disallow a noindex page: a crawler must fetch it to see the noindex tag.

What structured data matters most for AI readability?
Consolidate your schema.org graph with @id references: define Person and WebSite once, site-wide, and have every other node reference them. Then there's exactly one entity an assistant or knowledge graph can resolve to.

