Those of you who have been following this blog for a while know that I’m big on developing good internal linking structures. A good internal linking structure can not only increase the number of pages your site/blog has indexed, but it can actually increase the authority certain pages receive.
There are three “controllers” when it comes to this: the META noindex tag and its various forms, the robots.txt file, and the rel=”nofollow” attribute within links.
In recent months there has been a lot of confusion as to what each of these controls actually does. Does a robots.txt exclusion restrict Google from indexing a page? Does it prevent the page from receiving “link juice”? Does it still show up in the SERPs? The same questions apply to the META nofollow tag and rel=”nofollow”.
How exactly are each of these treated by Google? In this article we’re going to dive into each of these bot controllers and try to eliminate some of the confusion.
First I’ll explain what each one does and then I’ll go over the best way to control your internal linking.
The robots.txt File
The robots.txt file is a simple text file that you can create with any basic text editor like Notepad, WordPad, etc. It is uploaded into the root of a site and named robots.txt. For a more in-depth explanation, refer to my robots.txt guide.
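To give you an idea of what it looks like, here’s a minimal sketch of a robots.txt that blocks all crawlers from a hypothetical /sponsored/ directory (the directory name is just an illustration):

```text
# Apply the rules below to all crawlers
User-agent: *
# Block crawling of everything under /sponsored/
Disallow: /sponsored/
```

The file must sit at the root of the domain (e.g. example.com/robots.txt) for crawlers to find it.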
According to Matt Cutts, in an interview conducted by Eric Enge and published by Andy Beard, robots.txt will prevent the GoogleBot from crawling any page that is restricted within it. However, those pages can still obtain PageRank and can still be returned in the SERPs.
What does that mean? If you restrict a page with robots.txt, it only means that Google won’t read its content or follow links within the page(s). That tells me that although the page(s) can still receive PageRank, that PR will not be distributed to the links within the page, as it would be if the page weren’t restricted. This is why Google wants those who sell links to use robots.txt to restrict sponsored posts and other pages containing links that were sold.
This also tells me that in most cases it is NOT a good way to control your own internal linking structure, because if there is even one other link on the WWW pointing to that page, it will still accrue PR and it will still rank.
The META noindex tag
There are actually several forms of this META tag, but we’ll just talk about the most important ones.
- “NOINDEX” same as “NOINDEX, FOLLOW”
This tells Google not to index the page, but it still crawls the page and links are still followed.
- “NOFOLLOW” same as “INDEX, NOFOLLOW”
This tells Google to index this page, but to ignore the outgoing links.
- “NOINDEX, NOFOLLOW”
This tells Google not to index the page or follow the links contained within.
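Each of these directives goes in the page’s head section. Here’s a quick sketch of what the three variants look like in HTML:

```html
<head>
  <!-- Don't index this page, but do follow its links -->
  <meta name="robots" content="noindex, follow">

  <!-- Index this page, but ignore its outgoing links -->
  <meta name="robots" content="index, nofollow">

  <!-- Don't index the page and don't follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
```

(You would only use one of these per page, of course; they’re shown together here for comparison.)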
This is a very basic explanation of these tags, leaving aside whether or not they receive and/or pass PageRank, because, to be honest with you, it’s damn confusing to me too. However, if you follow my advice below, it won’t matter.
The rel=”nofollow” attribute
(*EDIT* Google has changed the way authority is passed to links on pages using the rel=”nofollow” attribute since this post was published.) This attribute, when inserted into links, tells Google to ignore the link. But if the page that link points to is linked to from another page on the WWW, it will still be indexed, crawled and assigned authority.
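In practice, the attribute goes directly on the anchor tag. A simple sketch (the URL is just a placeholder):

```html
<!-- Tell Google to ignore this link when passing authority -->
<a href="http://example.com/some-page/" rel="nofollow">Some Page</a>
```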
The Best Way to Control your Internal Linking
You can do pretty well by simply using the rel=”nofollow” attribute in many cases, but to be absolutely sure, and to have 100% control, you need to use a combination of robots.txt, META nofollow tags and rel=”nofollow”.
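As a sketch of how the three can work together, suppose you want to keep a hypothetical login page out of the index entirely (the /login/ path is just an illustration). You’d block it in robots.txt, add a META tag to the page itself, and nofollow your internal links to it:

```text
# robots.txt (at the site root)
User-agent: *
Disallow: /login/

# In the <head> of /login/ itself:
<meta name="robots" content="noindex, nofollow">

# On every internal link pointing to /login/:
<a href="/login/" rel="nofollow">Log in</a>
```

One caveat worth noting: if robots.txt blocks the page, Google may never crawl it to see the META tag, which is exactly why using all three together is the safest route.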
Leave out any one of these controllers and you can prevent your site from having the best possible internal linking structure!
photo credits: bogdan.glushak