Robots.txt files are a valuable tool for any website that wants to perform well in the search engines, and they're easy to create! In this article, we'll talk about what robots.txt files are, how they can help your site's SEO, and why setting one up correctly is so important.
Robots.txt files are the simplest way to communicate with search engine crawlers, and they rely on two main directives: Disallow and Allow. Disallow lets you specifically tell a crawler not to access certain parts of your website or online store; Allow does the opposite, explicitly permitting a crawler to visit a given path, usually to carve out an exception within a disallowed section.
In the eyes of search engines, it's important that every page crawlers can reach is worth crawling. Pages that offer crawlers nothing useful (like password protected ones), as well as those with errors in them, only get in the way. Robots.txt files can help you keep these pages off limits to crawlers and support a higher ranking in Google by keeping the rest of your site clean and navigable.
By using these directives correctly, you help search engines focus their attention on the pages that matter, which keeps the rest of your site clean and navigable in Google's eyes.
Robots.txt files tell crawlers whether they should crawl a page or not, and this can have an impact on your SEO ranking in the long run. If you want to rank in Google for keywords that appear only on certain pages of your site, make sure those pages are not blocked by a Disallow rule; conversely, pages you want kept out of search should be set with the Disallow directive.
This may include password protected pages, or certain sections of your site that include duplicate query parameters (like on e-commerce websites).
When you create your robots.txt file, use the Disallow directive and list one path per line, like so:
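For instance, a minimal file that blocks a hypothetical /blog/ section (the path here is just an example) looks like this:

```
User-agent: *
Disallow: /blog/
```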
That means that our blog is not accessible by search engine crawlers. We could also make this more specific by adding a path to the Disallow directive:
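Narrowing the rule to a subpath of that same hypothetical blog:

```
User-agent: *
Disallow: /blog/category/
```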
That means that crawlers cannot access any URLs on our blog that contain "category/". We can also make this broader and tell search engine crawlers they cannot access any of our content, like so:
Disallow: /*

This is the broadest way to tell crawlers not to visit our website.
In this case, we're telling all crawling bots that they should not access any pages on our site. A robots.txt file can contain as many Disallow lines as needed; note that each path should begin with a forward slash, and the * wildcard can match anything that follows it, as in: Disallow: /blog/*
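Putting several Disallow lines together in one file (all of these paths are hypothetical examples):

```
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Disallow: /blog/*
```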
If we were to add a line with the Allow directive, we would be telling search engines that they can access specific content inside an otherwise blocked subfolder.
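A common pattern, using hypothetical paths, is to block a folder while allowing one page inside it:

```
User-agent: *
Disallow: /blog/
Allow: /blog/featured-post/
```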
Crawl-delay: This directive is used to throttle how quickly a spider bot sends requests, by specifying how many seconds the bot should wait between each request. Here's a Crawl-delay example with an 8 second crawl delay:
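For crawlers that honor it, the value is a number of seconds:

```
User-agent: *
Crawl-delay: 8
```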
One thing to note is that Google doesn't use this type of directive, although other search engines like Bing do.
The robots.txt file should be uploaded to the root level of your site, or in other words: "in the home directory". Your site will likely have subdirectories inside that folder (such as blog), but the file itself must sit at the root; search engines only look for robots.txt there, so one placed in a subdirectory will simply be ignored.
Many hosting systems come with a robots.txt file installed by default, but it's always possible that you've deleted the one they provided and need to create your own. The easiest way to check is to visit yourdomain.com/robots.txt in a browser: if a file exists, it will load as plain text, and if you see a 404 error, you'll need to create one.
Robots.txt files are not a requirement for your website, though they can help you maintain better rankings in Google by making sure that the rest of your site is clean and navigable.
Many site owners have a robots.txt file added automatically, depending on their CMS (WordPress, for example), but never update the directives.
Robots.txt files really only become important if you have a large website and want to influence Google's crawl budget, or if there are certain sections of the site that you don't want crawled.
Even then, a better solution is often to add noindex or nofollow meta tags to these pages or site sections, so that Google doesn't index them or display them in the search results.
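A noindex/nofollow meta tag is a single line placed inside the page's head element:

```
<meta name="robots" content="noindex, nofollow">
```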
However, this can get tricky for multimedia elements like images or PDFs, which is where a robots.txt file comes into play.
The first step to setting up your robots.txt file is to write one. You can either use a WordPress plugin, or write one in a plain text editor like Notepad.
Robots.txt files typically have the following format:
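As a template (the bracketed values are placeholders to replace):

```
User-agent: [name of bot]
Disallow: [path you want to block]
```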
User-agent is the label for a specific type of bot.
After the Disallow directive, you put the path of any page or folder that you want to block.
Here's an example of what that looks like:
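Assuming a hypothetical /images/ folder on the site, a rule aimed at Google's crawler would be:

```
User-agent: Googlebot
Disallow: /images/
```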
This robots.txt rule is beneficial for SEO, as it tells Googlebot not to crawl the images folder of your website.
In addition, you can use an asterisk (*) as the user-agent to apply a rule to every bot that crawls your site, not just Google's.
So in this example:
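Using the same hypothetical images folder, but addressed to every bot:

```
User-agent: *
Disallow: /images/
```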
The "*" prevents all search spiders from crawling your images folder, not just Google.
One of the most important things you can do for your SEO is place your robots.txt file on your site in a way that's easily discoverable by search crawlers.
You don't want to place the file in any random directory; it belongs at the root of your domain, i.e. https://yourdomain.com/robots.txt. (Make sure the file is named exactly robots.txt in lower case, because the filename is case-sensitive.)
You'll also need to upload it through your web host's FTP client or file manager, placing it in the root of your domain.
It's vital that you ensure there are no errors or mistakes associated with your robots.txt file, or you run the risk of accidentally deindexing your entire website.
You can use Google's Robots Testing Tool to make sure that it's set up properly.
You also want to include your sitemaps within your robots.txt to help search engines crawl and discover your web content. This helps with discoverability, as well as crawl budget.
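The Sitemap directive takes the absolute URL of your sitemap; example.com below is a placeholder:

```
Sitemap: https://www.example.com/sitemap.xml
```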