Robots.txt is one of the most overlooked but critical files on your site. It’s tiny. It’s technical. And when misconfigured, it can block your best content from ever showing up in search.
I’ve seen good sites vanish from Google because of one line in this file. I’ve also used it strategically to clean up crawl behavior, prioritize high-value pages, and make sure search engines are spending their time where it counts.
Let me walk you through how I use robots.txt in technical SEO—and why it still matters more than most people think.
What You’ll Learn in This Article
Here’s what I’ll walk you through:
- What robots.txt actually does (and what it doesn’t)
- How I use it to guide search engine bots
- Common mistakes that hurt crawlability
- My approach to writing a clean, effective robots.txt file
- Tools I use to test and monitor it
What Robots.txt Actually Does
Robots.txt is a plain text file that sits at the root of your domain (example.com/robots.txt).
Its job? To tell search engines which parts of your site they can and can’t crawl.
It’s not a security measure. It doesn’t stop pages from being accessed.
It’s a set of crawl instructions for bots like Googlebot, Bingbot, and others.
Here’s a basic example:
User-agent: *
Disallow: /admin/
Disallow: /checkout/
That tells all bots not to crawl those two folders.
Simple? Yes. But powerful when used correctly.
How Robots.txt Affects SEO
Search engines have limited resources when crawling your site.
That’s called crawl budget—and while it’s not infinite, it’s yours to manage.
Here’s how I use robots.txt as part of my technical SEO process:
- Block low-value pages (admin, login, internal search results)
- Prevent duplication from filtered navigation (e.g. /?filter=price)
- Keep development or staging content from being crawled
- Optimize crawl efficiency so bots spend more time on ranking pages
In other words, robots.txt doesn’t improve rankings directly—it protects and prioritizes them.
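To make that concrete, here's a rough sketch of the kind of rules I mean. The paths and parameter names are placeholders (yours will depend on how your site builds URLs), and the * wildcard patterns are supported by Google and Bing:
User-agent: *
# Low-value areas
Disallow: /admin/
Disallow: /login/
# Internal search results
Disallow: /search/
# Filtered navigation that creates near-duplicate URLs
Disallow: /*?filter=
Disallow: /*?sort=
Blocking the parameter patterns is usually the biggest win on e-commerce sites, because faceted navigation can spawn thousands of crawlable URL combinations.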
Where Robots.txt Can Go Very Wrong
This is where things get risky. I’ve seen robots.txt files that:
- Block the entire site (Disallow: /)
- Block important assets like CSS or JS
- Block pages but still include them in sitemaps
- Prevent Google from rendering the page layout correctly
- Combine crawl blocks with “noindex” tags, confusing bots
One wrong line in robots.txt can hide your entire site from search engines.
If your site has indexing issues, this file is one of the first things I check.
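For reference, this is the two-line file that takes an entire site out of the crawl. It's often a leftover from a staging setup that went live without anyone removing it:
User-agent: *
Disallow: /
If you ever see this on a production site, fix it first and investigate the rest later.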
My Process for Writing a Smart Robots.txt File
Here’s how I approach it—line by line.
Step 1: Allow Everything by Default
Unless I have a reason to block it, I let bots crawl it.
User-agent: *
Disallow:
That’s a wide-open robots.txt file. It doesn’t block anything.
Step 2: Disallow Problem Areas
Then I get specific. I usually block:
- /wp-admin/ (except the admin-ajax.php file if needed)
- /cart/, /checkout/, or account-related URLs
- Internal search pages (e.g. /?s=)
- Tag archives or filtered product URLs
- Query parameters that generate duplicate content
Example:
User-agent: *
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /search
Step 3: Double Check Key Assets Are Crawlable
CSS, JS, fonts—these need to be crawlable so Google can render the page correctly.
If your robots.txt blocks these files, you’ll likely see “page resources blocked” errors in Google Search Console.
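On WordPress, for example, the usual pattern is to block the admin area but explicitly re-open the one file that front-end scripts still call. A sketch (adjust to your setup):
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
The same logic applies elsewhere: if a Disallow rule happens to cover a CSS, JS, or font path your templates load, add a more specific Allow for it or narrow the Disallow.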
Tools I Use to Audit and Test Robots.txt

You don’t have to guess whether your file is working. Here’s what I use:
- Google Search Console (robots.txt Tester) – test how bots interpret your rules
- Screaming Frog – check which URLs are blocked from crawling
- Ahrefs / Semrush Site Audits – for alerts when important URLs are disallowed
- Fetch as Google (in GSC) – to see if key elements are being rendered properly
If a page isn’t performing, I check if robots.txt is part of the problem before anything else.
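Beyond those tools, a quick script can sanity-check the live file against the URLs you actually care about. Here's a minimal sketch using Python's built-in robots.txt parser; the domain and URLs are placeholders:
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Check a handful of important URLs against the live rules
for url in [
    "https://example.com/",
    "https://example.com/cart/",
    "https://example.com/blog/",
]:
    status = "crawlable" if rp.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", status)
Running a check like this after every robots.txt change catches a bad edit before Google finds it.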
Bonus Tip: Use with Sitemaps and Noindex Properly
Robots.txt is not a replacement for meta noindex.
If you block a page from being crawled via robots.txt, Google can’t see the “noindex” tag.
That means the page might still show up in search—even if you didn’t want it to.
My rule of thumb:
- Use robots.txt to block crawl access
- Use noindex (via meta tags) to remove from search results
- Never block a page in robots.txt while still listing it in your sitemap as something to index
The three need to work together—not fight each other.
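In the file itself, that division of labor usually looks something like this (example.com is a placeholder):
User-agent: *
Disallow: /cart/
Disallow: /checkout/

# Pages you want removed from search results should stay crawlable
# and carry <meta name="robots" content="noindex"> in their HTML instead.

Sitemap: https://example.com/sitemap.xml
The Sitemap line is optional, but it points crawlers straight at the URLs you do want crawled and indexed.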
Final Takeaway: Robots.txt Isn’t Optional—It’s Strategic
Here’s the truth:
You don’t need a fancy SEO tool to fix crawl issues.
You need a clean robots.txt file that tells bots where to go—and where not to.
If you manage it well:
- Google crawls the right pages faster
- You reduce waste and duplication
- You improve crawl budget for your highest-value content
If you mess it up? Your rankings disappear, and you might not even know why.
That’s why I always start technical audits here.
If you haven’t checked your robots.txt in a while, this guide breaks it down.
Because visibility starts with access.