Glossary

robots.txt

The robots.txt file lives at the root of your website (e.g., https://example.com/robots.txt) and contains directives for web crawlers. It follows the Robots Exclusion Protocol (now standardized as RFC 9309), a convention that all major search engines respect. You can use it to block crawlers from specific directories, file types, or the entire site — and you can write separate rules for different user-agents (e.g., Googlebot vs. all bots).
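A minimal robots.txt with separate groups for Googlebot and all other crawlers might look like this (the paths here are purely illustrative):

```
# Rules for Googlebot only
User-agent: Googlebot
Disallow: /internal-search/

# Rules for every other crawler
User-agent: *
Disallow: /staging/
Disallow: /private/
```

Each group starts with one or more User-agent lines, and a crawler follows the most specific group that matches it — Googlebot would ignore the wildcard group entirely here.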

It’s worth noting what robots.txt doesn’t do: it doesn’t prevent a page from being indexed if that page has inbound links and no other indexing controls. It also doesn’t secure content. If you block a URL in robots.txt, Google won’t crawl it, but the page can still appear in search results if other sites link to it. For pages you want genuinely hidden from search results, a noindex meta tag or HTTP header is the right tool — and the page must remain crawlable for that to work, since a crawler can’t see a noindex directive on a URL it’s blocked from fetching.
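The meta tag version goes in the page itself:

```
<!-- In the page's <head>: tells crawlers not to index this page -->
<meta name="robots" content="noindex">
```

The equivalent HTTP response header is X-Robots-Tag: noindex, which is useful for non-HTML resources like PDFs where you can’t embed a meta tag.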

Common robots.txt Use Cases

Common uses include blocking staging environments, admin paths, internal search result pages, and parameter-based URLs from being crawled. On WordPress sites, you’ll typically see rules blocking /wp-admin/, /wp-includes/, and sometimes the /?s= search query pattern. Some sites block crawlers from thin or duplicate content sections.
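On a typical WordPress site, those rules often look something like this (the /?s= line blocks internal search result URLs; the Allow line keeps admin-ajax.php reachable because front-end features depend on it):

```
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /?s=
Allow: /wp-admin/admin-ajax.php
```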

WordPress serves a virtual robots.txt when there’s no physical file on disk, and plugins like Yoast SEO let you edit it from the admin dashboard. A physical file, if one exists, takes precedence over the virtual one WordPress generates.
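The virtual file WordPress serves is minimal — roughly the following (recent versions also append a Sitemap line pointing at the core-generated sitemap):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```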

robots.txt in Statamic

Statamic doesn’t manage robots.txt for you — it’s a static file you create in your public/ directory, or you can generate it dynamically from a route if you want environment-based behavior.

The environment-based approach is useful when you have a staging site that should never be indexed. You can set up a route that outputs Disallow: / when APP_ENV is anything other than production, and allow normal crawling rules on production. This prevents accidental indexing of staging content without manually maintaining separate files per environment.
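A sketch of that route in a Laravel routes file — Statamic sites are Laravel applications, so the standard routing and response helpers apply. The production rules and sitemap path below are placeholder assumptions, not Statamic defaults:

```php
// routes/web.php — serve robots.txt dynamically based on environment
Route::get('/robots.txt', function () {
    if (! app()->environment('production')) {
        // Staging, local, etc.: block all crawlers
        $content = "User-agent: *\nDisallow: /";
    } else {
        // Production: normal rules plus the sitemap location
        $content = "User-agent: *\nDisallow: /admin\n\nSitemap: " . url('/sitemap.xml');
    }

    return response($content, 200)->header('Content-Type', 'text/plain');
});
```

Remember to delete any physical public/robots.txt when you switch to this approach — the web server serves a static file directly, so the route would never be hit.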

During a migration, your robots.txt setup is one of the simpler items to transfer, but it’s easy to overlook. If your old WordPress site had customized rules — blocking certain paths, allowing specific bots, referencing a sitemap location — make sure those rules are replicated or deliberately updated in your Statamic setup. The sitemap path in particular may change.
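For example, if the old site’s robots.txt referenced a Yoast-generated sitemap index, the Sitemap line needs to point at wherever your Statamic sitemap actually lives — both paths below are just illustrative:

```
# Old (WordPress + Yoast SEO)
Sitemap: https://example.com/sitemap_index.xml

# New (Statamic)
Sitemap: https://example.com/sitemap.xml
```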

Review your current robots.txt as part of any pre-migration audit. See WordPress Site Audit for a broader checklist.

Need more clarity?

Book a discovery call and we’ll walk through your situation — what you have, what the migration looks like, and whether it’s the right move.

Book a Discovery Call →