AI bot access
If robots.txt blocks a bot, the platform behind it cannot crawl your content and therefore cannot cite it. To stay visible in AI products, make sure these crawlers are allowed:
| Bot / User-Agent | Platform |
|---|---|
| GPTBot, ChatGPT-User | OpenAI (ChatGPT) |
| PerplexityBot | Perplexity |
| ClaudeBot, anthropic-ai | Claude |
| Google-Extended | Gemini (a control token, not a separate crawler; AI Overviews follow Googlebot rules) |
| Bingbot | Copilot |
Some crawlers are used only for training (e.g. CCBot). Blocking training-only crawlers does not affect live search/citation; blocking the bots above does.
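To check which of the bots above a given robots.txt would turn away, the file can be run through a parser. The sketch below uses Python's standard `urllib.robotparser`. The `blocked_ai_bots` helper and the `sample` rules are hypothetical names made up for illustration, and real crawlers may apply slightly different matching rules than this parser does.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens copied from the table above.
AI_BOTS = ["GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot",
           "anthropic-ai", "Google-Extended", "Bingbot"]


def blocked_ai_bots(robots_txt: str, path: str = "/") -> list[str]:
    """Return the bots from AI_BOTS that may not fetch `path` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, path)]


# Illustrative rules only: one bot is blocked, everyone else is allowed.
sample = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_ai_bots(sample))  # ['PerplexityBot']
```

Pasting the contents of your own robots.txt into `sample` gives a quick audit. An empty list means none of the listed bots is blocked for the path you checked.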
Example robots.txt rules
robots.txt is a small text file at the root of your site (for example, visible4ai.com/robots.txt). Crawlers read it before fetching any page. Here are the two common cases:
Allow all major AI crawlers
This is the safe default if your goal is to be found in AI products. Explicit Allow rules make your intent obvious to anyone reading the file later.
```
# robots.txt — allow major AI crawlers
# (Allow: / is also the default when no rules exist;
# spelling it out makes intent explicit.)
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: bingbot
Allow: /
```
Block one bot, keep the rest
If you want to stay visible everywhere except one platform (commonly because that platform also uses your content for training), block that specific user-agent and leave a wildcard Allow for everyone else.
```
# robots.txt — opt one bot out, allow the rest
# Useful if you want to stay citable but exclude a specific
# crawler from training on your content.
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
```
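The wildcard group does not cancel the GPTBot rule. A crawler follows the group that names its own user-agent and falls back to `*` only when no group names it. Here is a minimal sketch of that behaviour, again with Python's `urllib.robotparser`. The `allowed` helper is a hypothetical name, and this parser is only a stand-in for how a well-behaved crawler is expected to read the file.

```python
from urllib.robotparser import RobotFileParser

# The opt-out rules from the example above.
RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Same rules with the wildcard group listed first.
REVERSED = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Report whether `user_agent` may fetch `path` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


for bot in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    print(bot, allowed(RULES, bot), allowed(REVERSED, bot))
```

With this parser, GPTBot is refused and the other two bots are allowed in both variants, so the order of the groups does not change the result.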
robots.txt is a request, not enforcement. Well-behaved crawlers (the ones listed above) honour it; abusive scrapers ignore it. If you need real blocking, use server-level rules (firewall, rate limit, or 403 by user-agent).
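As one illustration of the "403 by user-agent" option, the sketch below wraps a WSGI application in a small middleware that refuses requests whose User-Agent header contains a blocked substring. Everything here is an assumption made for the example: the `block_user_agents` and `status_for` helpers, the `hello_app` placeholder, and the `badscraper` token are hypothetical. User-Agent headers are easy to spoof, so treat this as a first filter rather than complete protection.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical substrings to refuse, matched case-insensitively.
BLOCKED_AGENT_SUBSTRINGS = ("badscraper",)


def block_user_agents(app, blocked=BLOCKED_AGENT_SUBSTRINGS):
    """Wrap a WSGI app so that blocked user-agents receive 403 Forbidden."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Placeholder application standing in for your real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = block_user_agents(hello_app)


def status_for(user_agent: str) -> str:
    """Call the wrapped app with a fake request and return the status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    seen = []
    app(environ, lambda status, headers: seen.append(status))
    return seen[0]


print(status_for("Mozilla/5.0 (compatible; BadScraper/1.0)"))  # 403 Forbidden
print(status_for("Mozilla/5.0"))                               # 200 OK
```

In production the same check usually lives in the web server, CDN, or firewall configuration so that blocked requests never reach application code.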