Keys to a great robots.txt file

I know this is a fairly dry search engine optimization subject, but it is important nonetheless. Setting up and managing the robots.txt file on your website can be the difference between getting ranked and not. Below I will explain what the robots.txt file does and give you some tips for making sure it is correct and serving its purpose.


What is a robots.txt file?


A robots.txt file is a plain text file uploaded to your server that tells search engines which pages on your website they should not crawl or index, and which robots you do not want crawling your site at all. It is a very simple text file placed in the root folder of your web server so the search engines can find it, for example:

http://www.yoursite.com/robots.txt
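

Because the file always lives at that fixed address off the root of the domain, anything that can request a URL can read it. As a quick illustration, here is a minimal Python sketch (www.yoursite.com is just a placeholder for your own domain) that downloads and prints the file:


import urllib.request

# Placeholder address; swap in your own domain.
with urllib.request.urlopen("http://www.yoursite.com/robots.txt") as response:
    print(response.read().decode("utf-8"))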


Use the Disallow directive properly.


The robots.txt file's purpose is to tell the search engines which pages NOT to index. Your file should be made up of Disallow directives only, unless there is a case where you want a search engine to reach a specific page inside a blocked directory, for example:


User-agent: *
Disallow: /pictures
Allow: /pictures/mywidget.jpg


The search engines only check the robots.txt file to see what they are not supposed to do; there is no need to keep a long list of web pages that you want indexed with “Allow” directives. They will find those pages naturally unless a Disallow directive tells them otherwise.
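

If you want to double-check how a crawler interprets a mix of Disallow and Allow lines like the one above, Python's standard library includes a robots.txt parser you can experiment with. The sketch below is only an illustration, with one caveat: urllib.robotparser applies rules in the order they appear in the file, while the major search engines follow the most specific matching rule, so the Allow line is listed first here to keep both interpretations in agreement.


import urllib.robotparser

rules = """
User-agent: *
Allow: /pictures/mywidget.jpg
Disallow: /pictures
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# "AnyBot" is just a stand-in crawler name for the * group.
# The one allowed image inside the blocked directory is reachable:
print(parser.can_fetch("AnyBot", "http://www.yoursite.com/pictures/mywidget.jpg"))  # True
# Everything else under /pictures stays blocked:
print(parser.can_fetch("AnyBot", "http://www.yoursite.com/pictures/family.jpg"))  # False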


Know how spiders function.


The robots.txt file is read from top to bottom, and if a search engine runs into an error in the file, it may simply ignore everything below that error. It is therefore very important to make sure your robots.txt file has no errors. There are quite a few tools you can find online to check your file for errors; http://www.frobee.com/robots-txt-check is one example. The last thing you want is a simple file like robots.txt holding back your rankings because it was created with errors.
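

One way to catch obvious mistakes before a crawler does is a quick check of your own. The following is only a rough Python sketch, not a full validator: it assumes a file named robots.txt sits in the current directory and only flags lines that are missing a colon or that use a directive outside a small list of common ones.


# Hypothetical minimal checker; real validators test far more than this.
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

with open("robots.txt", encoding="utf-8") as f:
    for number, line in enumerate(f, start=1):
        line = line.split("#")[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # blank lines are fine
        if ":" not in line:
            print(f"Line {number}: missing ':' separator")
            continue
        directive = line.split(":", 1)[0].strip().lower()
        if directive not in KNOWN_DIRECTIVES:
            print(f"Line {number}: unknown directive '{directive}'")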


Creating your robots.txt file.


Many people don’t realize how simple it is to create a robots.txt file. It can be written in Notepad, WordPad or any text editor on your computer, as long as it is saved as plain text. Start by creating an empty file and name it “robots.txt”.


The first text in the file should read something like this:


User-agent: *
Disallow:


The * refers to all search engines, and the empty Disallow line blocks nothing, so this file allows every crawler to access the whole site. If you wanted to set rules for a specific search engine, you would use:


User-agent: Googlebot


This tells Googlebot that the rules that follow apply specifically to its crawling, although, in general, most websites address all the search engines at once with the * wildcard. Every page or directory that you do not want the search engines to crawl or index should get its own Disallow directive, for example:


User-agent: *
Disallow: /pictures
Disallow: /design-template
Disallow: /temp-blog


Or whatever pages you do not want to be crawled and indexed.
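

One detail worth knowing when you mix a Googlebot-specific group with a catch-all group: a crawler follows only the most specific group that names it and ignores the rest. A short sketch with Python's standard-library parser (the rules below are just hypothetical) shows the effect:


import urllib.robotparser

rules = """
User-agent: Googlebot
Disallow: /temp-blog

User-agent: *
Disallow: /pictures
Disallow: /design-template
Disallow: /temp-blog
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot obeys only its own group, so /pictures is open to it:
print(parser.can_fetch("Googlebot", "http://www.yoursite.com/pictures/"))  # True
# Other crawlers fall back to the * group and are blocked:
print(parser.can_fetch("SomeOtherBot", "http://www.yoursite.com/pictures/"))  # False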


Have Fun!


I hope this information has been helpful for creating and optimizing your robots.txt file. Have fun building it, and always remember to check it for errors whenever you create or update it.