Robots.txt

What Is Robots.txt?

Robots.txt is a file instructing search engine crawlers which URLs they can access on your website. It’s primarily used to manage crawler traffic and avoid overloading your site with requests.

While major search engines like Google, Bing, and Yahoo recognize and respect robots.txt directives, it’s important to note that this file is not a foolproof method for preventing web pages from appearing in search results.

Why Is Robots.txt Important?

Most websites don’t need a robots.txt file.

That’s because Google can usually find and index all of the important pages on your site.

And Google will automatically NOT index pages that aren’t important or that are duplicate versions of other pages.

That said, there are 3 main reasons that you’d want to use a robots.txt file.

Block Non-Public Pages: Sometimes, you have pages on your site that you don’t want indexed. For example, you might have a staging version of a page, a login page, or an internal search results page. These pages need to exist, but you don’t want random people landing on them. In this case, you’d use robots.txt to block these pages from search engine crawlers and bots.

Maximize Crawl Budget: If you’re having trouble getting all of your pages indexed, you might have a crawl budget problem. By blocking unimportant pages with robots.txt, Googlebot can spend more of your crawl budget on the pages that actually matter.

Prevent Search Engine Indexing of Resources: Using meta directives can work just as well as robots.txt for preventing pages from getting indexed. However, meta directives don’t work well for multimedia resources, like PDFs and images. That’s where robots.txt comes into play.
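To make these three scenarios concrete, here’s a minimal robots.txt sketch. The paths are hypothetical examples for illustration, not rules you should copy as-is:

User-agent: *
# 1. Keep non-public pages (staging, login, internal search) out of crawlers' reach
Disallow: /staging/
Disallow: /login/
Disallow: /search/
# 2. Save crawl budget by skipping low-value parameter URLs (Google supports the * wildcard)
Disallow: /*?sort=
# 3. Keep multimedia resources, like PDFs, from being crawled
Disallow: /downloads/pdfs/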

The bottom line? Robots.txt tells search engine spiders not to crawl specific pages on your website.

You can check how many pages you have indexed in Google Search Console.

GSC – Page indexing

If the number matches the number of pages that you want indexed, you don’t need to bother with a robots.txt file.

But if that number is higher than you expected (and you notice indexed URLs that shouldn’t be indexed), then it’s time to create a robots.txt file for your website.

Best Practices

Create a Robots.txt File

Your first step is to actually create your robots.txt file.

Because it’s a plain text file, you can create one using Windows Notepad (or any other text editor).

And no matter how you ultimately make your robots.txt file, the format is exactly the same:

User-agent: X
Disallow: Y

User-agent is the specific bot that you’re talking to.

And everything that comes after “Disallow” is a page or section that you want to block.

Here’s an example:

User-agent: googlebot
Disallow: /images

This rule would tell Googlebot not to crawl the /images folder of your website.

You can also use an asterisk (*) to address any search engine bots that stop by your website.

Here’s an example:

User-agent: *
Disallow: /images

The “*” tells any and all spiders to NOT crawl your images folder.

This is just one of many ways to use a robots.txt file. This helpful guide from Google has more info on the different rules you can use to control which pages of your site bots can and can’t crawl.

Google Search Central – Robots.txt rules
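For example, here’s a hedged sketch combining a few of the rules that guide covers. The paths and sitemap URL are made up for illustration:

User-agent: *
# Block an entire folder...
Disallow: /private/
# ...but make an exception for one file inside it
Allow: /private/annual-report.html
# Block every URL ending in .pdf (the $ matches the end of a URL)
Disallow: /*.pdf$

# Point crawlers at your sitemap (this line isn't tied to any user-agent)
Sitemap: https://example.com/sitemap.xml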

Make Your Robots.txt File Easy to Find

Once you have your robots.txt file, it’s time to make it live.

Search engine crawlers only look for your robots.txt file in one place: the root directory of your site.

So make sure to place it at:

https://example.com/robots.txt

Note: The robots.txt filename is case-sensitive. So make sure the filename is all lowercase (“robots.txt”, not “Robots.txt”).

Check for Errors and Mistakes

It’s REALLY important that your robots.txt file is set up correctly. One mistake and your entire site could get deindexed.
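For example, the gap between blocking nothing and blocking everything is a single character. These are two separate (hypothetical) files, not one:

# Harmless: an empty Disallow value blocks nothing
User-agent: *
Disallow:

# Disastrous: a lone slash blocks crawlers from your entire site
User-agent: *
Disallow: /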

Fortunately, you don’t need to hope that your file is set up right. Google has a tool for testing your robots.txt file that you can use:

GSC – Settings – Robots.txt

It shows you your robots.txt file… and any errors and warnings that it finds.

As you can see, we block spiders from crawling our WP admin page.

We also use robots.txt to block crawling of WordPress auto-generated tag pages (to limit duplicate content).
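Rules like those typically look something like this. This is an approximation for illustration, not our exact file:

User-agent: *
# Keep crawlers out of the WordPress admin area...
Disallow: /wp-admin/
# ...but allow admin-ajax.php, which many themes and plugins rely on
Allow: /wp-admin/admin-ajax.php
# Block auto-generated tag archives to limit duplicate content
Disallow: /tag/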

Robots.txt vs. Meta Directives

Why would you use robots.txt when you can block pages at the page level with the “noindex” meta tag?

As I mentioned earlier, the noindex tag is tricky to implement on multimedia resources like videos and PDFs.

Also, if you have thousands of pages that you want to block, it’s sometimes easier to block the entire section of that site with robots.txt instead of manually adding a noindex tag to every single page.
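For example, one rule can cover an entire section instead of thousands of individual noindex tags (the /archive/ path is hypothetical):

User-agent: *
# A single line covers every URL under /archive/
Disallow: /archive/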

There are also edge cases where you don’t want to waste crawl budget on Googlebot visiting pages just to find a noindex tag.

That said:

Outside of those three edge cases, I recommend using meta directives instead of robots.txt. They’re easier to implement. And there’s less chance of a disaster happening (like blocking your entire site).

Learn More

Learn about robots.txt files: A helpful guide on how search engines use and interpret robots.txt.

What is a Robots.txt File? (An Overview for SEO + Key Insight): A no-fluff video on different use cases for robots.txt.
