These days, most website owners are keenly aware of the important role that high quality content plays in getting noticed by Google. To that end, businesses and digital marketers are spending increasingly large amounts of time and resources to ensure that websites are spotted by search engine robots and therefore found by their target audiences.
But while every website owner wants high search engine rankings and the corresponding increases in traffic, there are certain areas of a site that are best hidden from the search engine crawlers completely.
You might wonder why it’s good to keep crawlers from indexing parts of your website. In short, it can actually help your overall rankings. If you’ve spent lots of time, money and energy crafting high quality content for your audience, you need to make sure that search engine crawlers understand that your blog posts and main pages are much more important than the more “functional” areas of your website.
Here are a few examples of web pages that you might want the search robots to ignore: client login areas, “thank you” pages that visitors only see after completing a purchase or signup, internal search result pages, and admin or account settings pages.
As you can see, there are plenty of instances where you should be actively dissuading search engines from listing certain areas of your site. Hiding these pages helps to ensure that your homepage and cornerstone content gets the attention it deserves.
So how can you instruct search engine robots to turn a blind eye to certain pages of your website? The answer lies in noindex, nofollow and disallow. These instructions allow you to define exactly how you want your website to be crawled by search engines.
Let’s dive right in and find out how they work.
As you can probably imagine, adding a noindex instruction to a web page tells a search engine to “not index” that particular area of your site. The web page will still be visible if a user clicks a link to the page or types its URL directly into a browser, but it will never appear in a Google search, even if it contains keywords that users are searching for.
The noindex instruction is typically placed in the <head> section of the page’s HTML code as a meta tag:
<meta name="robots" content="noindex">
It’s also possible to change the meta tag so that only specific search engines ignore the page. For example, if you only want to hide the page from Google, allowing Bing and other search engines to list the page, you’d alter the code in the following way:
<meta name="googlebot" content="noindex">
A less common option, because it’s a bit more difficult to configure, is delivering the noindex instruction as part of the server’s HTTP response headers:
HTTP/1.1 200 OK
…
X-Robots-Tag: noindex
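How this header gets set depends on your web server. As a rough sketch, on an Apache server with the mod_headers module enabled, you could add something like the following to your configuration or .htaccess file (the file name is just a placeholder):

# Requires mod_headers; applies the header to a single file
<Files "private-report.pdf">
  Header set X-Robots-Tag "noindex"
</Files>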
These days, most people build sites using a content management system like WordPress, which means you won’t have to fiddle around with complicated HTML code to add a noindex instruction to a page. The easiest way to add noindex is by downloading an SEO plugin such as All in One SEO or the ever-popular alternative from Yoast. These plugins allow you to apply noindex to a page by simply ticking a checkbox.
Adding a nofollow instruction to a web page doesn’t stop search engines from indexing it, but it tells them that you don’t want to endorse anything linked from that page. For example, if you are the owner of a large, high authority website and you add the nofollow instruction to a page containing a list of recommended products, the companies you have linked to won’t gain any authority (or rank increase) from being listed on your site.
Even if you’re the owner of a smaller website, nofollow can still be useful: think of links in user-submitted comments, paid or sponsored links, or links to sites you simply don’t want to vouch for.
Even if your pages only contain internal links to other areas of your website, it can be useful to include a nofollow instruction to help search engines understand the importance and hierarchy of the pages within your site. For example, every page of your site might contain a link to your “Contact” page. While that page is super important and you’d like Google to index it, you might not want the search engine to place more weight on that page than other areas of your site, just because so many of your other pages link to it.
Adding a nofollow instruction works in exactly the same way as adding the noindex instruction introduced earlier, and can be done by altering the page’s HTML <head> section:
<meta name="robots" content="nofollow">
If you only want certain links on a page to be tagged as nofollow, you can add rel="nofollow" attributes to the links’ HTML tags:
<a href="https://www.example.com/" rel="nofollow">example link</a>
WordPress website owners can also use the aforementioned All in One SEO or Yoast plugins to mark the links on a page as nofollow.
The last of the instructions we are discussing in this blog post is “disallow.” You might be thinking that this sounds a lot like noindex, but while the two are very similar, there are slight differences: a noindexed page can still be crawled by search engine robots, they just won’t show it in their results, whereas a disallowed page isn’t crawled at all (although its bare URL can still show up in search results if other sites link to it).
As you can see, disallowing a page means that you’re telling the search engine robots not to crawl it at all, a signal that the page has no SEO value whatsoever. Disallow is best used for the pages on your site that are completely irrelevant to most search users, such as client login areas or “thank you” pages.
Unlike noindex and nofollow, the disallow instruction isn’t included in a page’s HTML code or HTTP response, but instead is placed in a separate file named “robots.txt.”
A robots.txt file is a plain text file that can be created with any basic text editor and sits at the root of your site (www.example.com/robots.txt). Your site doesn’t need a robots.txt file for search engines to crawl it, but you will need one if you want to use the disallow directive to block access to certain pages. To do that, you simply list the relevant parts of your site in the robots.txt file like this:
User-agent: *
Disallow: /path/to/your/page.html
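You can list as many Disallow rules as you like, and you can block entire directories as well as individual pages. You can also address a specific crawler by replacing the wildcard with its user agent name (Googlebot, for example). A slightly fuller sketch, with placeholder paths, might look like this:

# These rules apply to all crawlers
User-agent: *
Disallow: /client-login/
Disallow: /thank-you.html
Disallow: /internal-search/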
WordPress website owners can use the All in One SEO plugin to quickly generate their own robots.txt file without the need to access the content management system’s underlying file structure directly.
Do you know for certain which parts of your website are marked as noindex and nofollow or are excluded from being indexed by a disallow rule? If you are not sure, you might consider taking an inventory and reviewing your past decisions.
One way to do such an inventory is to go to www.drlinkcheck.com, enter the URL of your site’s homepage, and hit the Start Check button.
Dr. Link Check’s primary function is to reveal broken links, but the service also provides detailed information on working links.
After the crawl of your site is complete, switch to the All Links report and create a filter to only show page links tagged as noindex:
You now have a custom report that shows you the pages that contain a noindex meta tag or return a noindex X-Robots-Tag HTTP header.
If you want to see all links that are marked as nofollow, switch to the All Links report, click on Add… in the filter bar, and select Nofollow/Dofollow from the drop-down menu.
By default, the Dr. Link Check crawler ignores all links disallowed by the rules found in the site’s robots.txt file. You can change that in the project settings:
Now switch to the Overview report and hit the Rerun check button to start a new crawl with the updated settings.
After the crawl has finished, open the All Links report, click on Add… in the filter section, and select robots.txt status to limit the list to links disallowed by your website’s robots.txt file.
While the vast majority of website owners are far more interested in getting search engines to notice their pages, the noindex, nofollow and disallow instructions are powerful tools for helping crawlers better understand a site’s content and for keeping the right sections hidden from search engine users.