Crawl budget is an often overlooked area of search engine optimization that can have serious implications for your website.
If Google isn’t actively crawling or discovering new site pages, it won’t index them in the search results, which will hurt your keyword rankings and minimize opportunities for driving more organic traffic to your website.
Read our latest guide to learn more about crawl budget, why it’s important for SEO, and how to optimize it for search engines like Google.
Crawl budget is a broad term that specifies how often Google crawls and indexes your site’s pages in a given period of time.
Website and navigation layout, duplicate content (within the site), soft 404 errors, low-value sites, website speed, and hacking problems are all factors that influence crawl budget.
This often surprises site owners: a search engine crawler can be active on a site for weeks or months while only ever crawling and indexing a fraction of its pages.
Crawl budget (or crawl demand) is important for SEO for the following reasons:
A massive amount of duplicate content, for example on large sites with thousands of articles or ecommerce websites with millions of product pages, can be a huge drawback for websites that are already suffering from crawling issues.
However, if your website is properly crawled, and you have a large amount of content on your website, Google will be able to index it.
If you don’t have a huge amount of content and you aren’t competing for rankings across a large number of keywords, you can get by with a smaller crawl budget.
Through search engine optimization, you can ensure that all of your website’s pages are useful and up to date, and can be crawled by Google and ranked for searches.
Crawl budget optimization is the method of ensuring that search engines will crawl and index all of the site's relevant pages in a timely manner.
Like I mentioned, small websites don't normally have a problem with crawl budget optimization, but large websites with thousands of URLs do.
However, as you'll see further down, the easiest approach to optimizing your crawl budget is to adopt SEO best practices, which will also have a positive influence on your keyword rankings.
A comprehensive crawl budget optimization plan will include things like setting a time frame in which to achieve target page speed and load times on mobile. You should also set and implement crawl budget best practices for each piece of content, as this will ensure that the pages you optimize are continuously crawled by search engines over time.
Looking to learn more about search engine optimization? Read our SEO beginner’s guide for everything you need to know about SEO and how to drive business results through search engines like Google and Bing.
Below we’ll walk through all the ways you can optimize the crawl budget for your website.
The first step is optimizing your site’s navigation and overall structure. Make sure that your most important pages are linked within your navigation, as well as the homepage.
You also want to reduce the page depth of the URLs on your website.
Page depth is how many clicks it takes a user to reach a given web page from the homepage. The closer a page is to the homepage, the more important Google considers it to be.
Best practice is to ensure that your page depth is 3 clicks or less from the homepage. The further your web pages are from the homepage, the less likely it is that they will be crawled.
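If you want a rough picture of page depth on your own site, here is a minimal sketch of a small breadth-first crawl that records how many clicks each discovered URL sits from the homepage. It assumes the third-party requests and beautifulsoup4 packages, and the example.com domain and 200-page cap are placeholders.

```python
# Minimal sketch: measure click depth from the homepage with a breadth-first crawl.
# Requires "requests" and "beautifulsoup4"; the domain and page cap are placeholders.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START_URL = "https://www.example.com/"  # replace with your homepage
MAX_PAGES = 200                         # keep the sample crawl small

def crawl_depths(start_url, max_pages=MAX_PAGES):
    site = urlparse(start_url).netloc
    depths = {start_url: 0}
    queue = deque([start_url])

    while queue and len(depths) < max_pages:
        url = queue.popleft()
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            # stay on the same domain and skip URLs we've already seen
            if urlparse(link).netloc == site and link not in depths:
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths

if __name__ == "__main__":
    for page, depth in sorted(crawl_depths(START_URL).items(), key=lambda kv: kv[1]):
        flag = "  <-- deeper than 3 clicks" if depth > 3 else ""
        print(f"{depth}  {page}{flag}")
```

Any URL flagged as deeper than 3 clicks is a candidate for better internal linking or a spot in your navigation.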
When it comes to crawling and indexation, search engines will choose the most relevant pages on your website.
Internal links are also a big factor for enabling Google’s spiders to properly crawl your website.
Internal linking optimization that aids crawl budget entails making sure every important page is reachable through crawlable links. For example, using pagination alongside infinite scrolling can improve internal linking on your website and ensure that your web pages are being discovered and indexed by search engines.
Simply put, a fast-loading website allows the Googlebot to crawl more pages on the same domain in less time. This is an indicator to Google that you have stable website architecture, as well as a signal to crawlers that your site is worth visiting because it can offer a good user experience due to quick page load times.
Fast site speed also encourages users to stay on your website and complete online transactions, because they can find your products and services quickly, which in turn helps increase traffic to your site.
The more pages users can browse within a short time frame, the better the chances that those pages will rank at the top of Google's search results. Page speed is also an important ranking factor in Google's algorithm following the announcement of Core Web Vitals.
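If you want a quick, scriptable page speed check, here is a minimal sketch that queries the public PageSpeed Insights v5 API for a page's mobile performance score. The page URL is a placeholder, and the response fields are read defensively in case the JSON shape differs from what is assumed here; heavier use of the API typically requires an API key.

```python
# Minimal sketch: query the PageSpeed Insights v5 API for a mobile performance score.
# The page URL is a placeholder; response fields are read defensively.
import requests

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def mobile_performance_score(page_url):
    resp = requests.get(
        PSI_ENDPOINT,
        params={"url": page_url, "strategy": "mobile"},
        timeout=60,
    )
    resp.raise_for_status()
    data = resp.json()
    score = (
        data.get("lighthouseResult", {})
        .get("categories", {})
        .get("performance", {})
        .get("score")
    )
    return score  # 0.0 to 1.0, or None if the field isn't present

if __name__ == "__main__":
    print(mobile_performance_score("https://www.example.com/"))
```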
Duplicate content is one aspect that can have a detrimental effect on crawl budget.
In this case, duplicate content refers to the same or quite close content that appears in several URLs on your website.
When related products are classified under several categories on larger sites, or when eCommerce sites serve the same content at multiple URLs, this is a prominent SEO issue because it can signal to Google that it shouldn't crawl your other product pages.
Duplicate content is also a problem for blogs. For instance, if you have many pages that target the same keywords and the content on those pages is identical, Google can consider this duplicate content.
This makes Googlebot's task of crawling your site more challenging, since it must decide which of those pages it should index.
If the crawl rate limit is used up crawling and indexing redundant content, pages that are more important may never be crawled or indexed.
Another way duplicate content can cause problems is in how Google decides which version of the content to prefer.
Google is very strict in this respect. With some exceptions for particular types of sites, such as eCommerce stores or magazines, Googlebot will generally show only one or two pages from the same website for a given query; when several of your pages compete for the same query, this is known as keyword cannibalization.
If you have duplicate pages competing for those queries, Googlebot can detect the duplication and filter the weaker pages out of its results.
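To spot likely duplicates before Google does, a simple text-similarity pass can help. Here is a minimal sketch that fetches a handful of URLs (placeholders below) and flags pairs whose stripped-down text is almost identical; it uses only the standard library plus requests, and the 90% threshold is an arbitrary starting point.

```python
# Minimal sketch: flag pairs of pages whose visible text is nearly identical.
# The URL list and the 90% similarity threshold are placeholders.
from difflib import SequenceMatcher
from itertools import combinations
import re

import requests

URLS = [
    "https://www.example.com/red-widget",
    "https://www.example.com/widgets/red-widget",
]

def page_text(url):
    html = requests.get(url, timeout=10).text
    # crude tag-stripping; a real audit would parse the HTML properly
    return re.sub(r"<[^>]+>", " ", html).lower()

if __name__ == "__main__":
    texts = {url: page_text(url) for url in URLS}
    for a, b in combinations(URLS, 2):
        ratio = SequenceMatcher(None, texts[a], texts[b]).ratio()
        if ratio > 0.9:  # treat 90%+ similarity as a duplicate-content candidate
            print(f"Possible duplicates ({ratio:.0%} similar): {a} and {b}")
```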
Another aspect that may affect crawl budget, similar to duplicate content, is thin content.
Thin content refers to web pages that have little or no content and offer little benefit to the user. They're often known as low-value or low-quality pages.
Pages with no text, empty pages, and outdated pages are all examples of content that is no longer relevant to either search engines or users.
To get the most out of your crawl budget, identify and repair thin content pages by expanding them with useful content, consolidating them into stronger pages, or removing them altogether. A quick way to surface candidates is shown in the sketch below.
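This minimal sketch flags pages whose visible word count falls below a threshold; it assumes the requests and beautifulsoup4 packages, and the URL list and 300-word cutoff are placeholders you would adjust for your own site.

```python
# Minimal sketch: flag pages whose visible word count falls below a threshold.
# Requires "requests" and "beautifulsoup4"; URLs and cutoff are placeholders.
import requests
from bs4 import BeautifulSoup

URLS = [
    "https://www.example.com/blog/post-1",
    "https://www.example.com/blog/post-2",
]
MIN_WORDS = 300

for url in URLS:
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    words = len(soup.get_text(" ", strip=True).split())
    if words < MIN_WORDS:
        print(f"Thin content candidate ({words} words): {url}")
```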
Thin content often isn't worth the money or effort a web designer or content writer would put into fixing it; in some cases, the cost of improving those pages outweighs what you get back in the end.
Nevertheless, it's still worth investing in fine-tuning your website's content and content strategy.
When you do invest, you can often get a higher return by working with a website developer who has some SEO experience. They can make better use of your budget and will know which pages to improve, remove, or consolidate.
404 errors are a prevalent problem for crawl budget because Google is wasting resources trying to recrawl pages that are missing on your website.
To minimize this, you want to 301 redirect any web pages that result in 404 error code statuses, or update any broken links on your website.
To find 404 errors, you can either view these URLs in Google Search Console, or run a technical audit using the Screaming Frog tool.
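If you already have a list of internal links (for example, exported from one of those tools), a short script can recheck them for 404s. This is a minimal sketch using the requests package; the URL list is a placeholder for your own links.

```python
# Minimal sketch: check a list of internal links for 404s so they can be fixed or redirected.
# The URL list is a placeholder for links pulled from your own site.
import requests

LINKS_TO_CHECK = [
    "https://www.example.com/old-page",
    "https://www.example.com/current-page",
]

for url in LINKS_TO_CHECK:
    try:
        status = requests.head(url, allow_redirects=True, timeout=10).status_code
    except requests.RequestException as exc:
        print(f"ERROR  {url}  ({exc})")
        continue
    if status == 404:
        print(f"404    {url}  -> add a 301 redirect or fix the link")
```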
Reducing the number of crawl errors on your website is another way to optimize your crawl budget. It's a waste of resources for Google to spend time crawling errors that shouldn't exist in the first place.
The best way to locate and correct crawl errors is to use Google Search Console's Index Coverage report (or the Crawl Stats report in the legacy version of the tool). You can identify any server errors within this report.
Another problem that can cause crawl budget issues are 301 redirect chains.
Let’s say URL A points to URL B. But if URL B then points to URL C, Google is wasting resources crawling this redirect chain.
You want to ensure that you don’t have 301 redirect chains occurring on your website. Again, you can use Screaming Frog to identify and pull a list of URLs that are suffering from 301 redirect chains.
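As a lightweight alternative, you can also spot chains by inspecting the redirect history that the requests library records when it follows redirects. This is a minimal sketch; the URL list is a placeholder, and any URL with two or more hops is flagged for cleanup.

```python
# Minimal sketch: detect redirect chains by inspecting requests' redirect history.
# The URL list is a placeholder; two or more hops indicates a chain worth fixing.
import requests

URLS = ["https://www.example.com/url-a"]

for url in URLS:
    resp = requests.get(url, allow_redirects=True, timeout=10)
    if len(resp.history) >= 2:
        hops = [r.url for r in resp.history] + [resp.url]
        print("Redirect chain:", " -> ".join(hops))
```

Once you know the final destination, update the original link or redirect so URL A points straight to URL C.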
Since search engines choose to frequently update their index with the most current content, popular URLs are crawled more often by search engines.
The number and quality of external links from referring domains can be considered one of the most important factors in determining whether a page is authoritative and should be crawled frequently.
Backlinks aid in the establishment of credibility with search engines, as well as the improvement of a page's PageRank and authority, which leads to higher rankings.
It's one of the most basic SEO principles that hasn't changed in years.
As a consequence, driving backlinks from other websites to your target pages encourages search engines to access those pages more often, increasing crawl budget.
Obtaining links from other websites is challenging, and it is one of the most difficult facets of SEO, but it can strengthen your domain and boost your overall SEO.
Finding reliable links isn't as easy as people think.
One of the factors that can adversely affect your site's search rankings is acquiring links from sites with low authority.
Reputable websites rarely link to low-quality pages.
A link only carries value as a backlink if the original author genuinely believes your page is a relevant resource; you can't simply bid on link opportunities and pay for them.
Many businesses that pay for links end up feeling exploited, having lost a large piece of their overall SEO link building budget.
Such practices rarely offer a solid return on investment.
Still, using a link building service provider is a good way to increase link building opportunities.
They will take on the responsibility of finding and executing a link building strategy. This allows you to spend more time on your core business.
Read our latest guide on linkbuilding for SEO for the top methods for driving backlinks to your website.
Robots.txt files are great for telling Google which pages on your website you want crawled. When Google’s crawl bot hits your website, it will first look at your robots.txt file to determine which directives to follow before crawling your site pages.
If you have low-quality pages, or pages you want to prevent from being crawled or indexed by Google, you can add these types of directives directly within the robots.txt file, which will help with optimizing your crawl budget.
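As a hypothetical illustration, a robots.txt file that keeps crawlers out of low-value sections might look like the snippet below. The directory names and parameter pattern are placeholders; which paths, if any, you should block depends entirely on your own site.

```
# Hypothetical robots.txt: allow everything except low-value sections
# that shouldn't consume crawl budget.
User-agent: *
Disallow: /cart/
Disallow: /search-results/
Disallow: /*?sort=

Sitemap: https://www.example.com/sitemap.xml
```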
All of your site pages should have canonical tags. These are HTML tags you insert into the <head> section of your site pages and are mainly used if you have duplicate content, or slight variations of the same page.
These tags basically tell Google that one URL is the “master copy”, and all of the other variant URLs should either be ignored, or pass SEO value to the “master copy”, which will help improve your keyword rankings for that page.
Canonical tags are especially important on ecommerce sites that use URL parameters to filter products based on things like color, price, or year.
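For example, a filtered product listing page might point back to its "master copy" with a tag like the following sketch. The URLs are placeholders; the same tag would go in the head section of each filtered variant.

```html
<!-- Hypothetical example: placed on https://www.example.com/shoes?color=red -->
<!-- and other filtered variants, all pointing to the master copy -->
<link rel="canonical" href="https://www.example.com/shoes" />
```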
It's also a smart idea to stop directing Google to pages with non-200 status codes.
Make sure you're linking to the live, preferred version of your URLs in your content to stop wasting your crawl budget. As a general rule, you should stop referring to URLs that aren't the content's ultimate destination.
For instance, you should not link to URLs that 301 or 302 redirect to another page, URLs that return a 404 or other error status, or non-canonical versions of a page.
This is especially important in your XML sitemap. Google uses your XML sitemap to discover your site pages and check for things such as when that page was uploaded or last updated.
If your XML sitemap contains any of the above URLs or status codes, you're wasting valuable crawl budget.
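A quick way to audit this is to fetch the sitemap and check each listed URL's status code. This is a minimal sketch using the standard library plus requests; the sitemap location is a placeholder, and redirects are deliberately not followed so that 301s get flagged too.

```python
# Minimal sketch: fetch an XML sitemap and flag any URL that doesn't return a 200.
# The sitemap location is a placeholder; redirects are flagged as well.
import xml.etree.ElementTree as ET

import requests

SITEMAP_URL = "https://www.example.com/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
for loc in root.findall(".//sm:loc", NS):
    url = loc.text.strip()
    status = requests.head(url, allow_redirects=False, timeout=10).status_code
    if status != 200:
        print(f"{status}  {url}  <- remove or update in the sitemap")
```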
If a page hasn't changed on the few occasions Google has crawled it, that page may be crawled less and less often, because the search engine tries to avoid keeping stale pages in its index (this is related to Google's "freshness" ranking factor).
Google prioritizes new content that’s frequently updated, not outdated pieces that haven’t been touched in years and may be unsatisfactory to searchers.
Having fresh content helps keep your site relevant for new search results. It has the added bonus of helping your site rank better and keeping users on your web pages, because they're getting the most up-to-date information.
Make sure you have an aggressive writing cadence (multiple articles per week) and that you’re updating your site pages as regularly as every 3-6 months.
Read our latest guide on how to update old content for SEO.