Thursday, December 24, 2009

What Is the robots.txt File and How to Best Use It for Your Benefit?

Duplicate content is a big threat for an SEO company because it can hurt a site's ranking, yet in some cases it is unavoidable, for example when you want to offer visitors a printable version of a page. You might be wondering whether there is a way to keep search engines away from such a page, and the answer is the robots.txt file, which is a plain text file rather than HTML code.
For those who are new to SEO: when a page's path is listed under a Disallow rule in the robots.txt file, search engines are instructed to stay away from that page. This is especially useful when the site has duplicate pages or data you do not want showing up in search results. Images, JavaScript and stylesheets can be excluded from crawling in the same way, which also saves bandwidth.
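
As a rough illustration of the rules described above, here is a minimal sketch that feeds a sample robots.txt to Python's standard urllib.robotparser module and asks whether a crawler may fetch a given page. The paths /print/, /private/ and /images/ and the crawler name ExampleBot are assumptions made up for this example; substitute the directories of your own site.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content. The paths below are hypothetical examples,
# not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /private/
Disallow: /images/
"""


def is_allowed(robots_txt, url, user_agent="ExampleBot"):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The printable copy sits under a disallowed directory.
    print(is_allowed(ROBOTS_TXT, "http://www.mysite.com/print/page.html"))
    # The regular article is not covered by any Disallow rule.
    print(is_allowed(ROBOTS_TXT, "http://www.mysite.com/articles/page.html"))
```

Under these assumed rules the first check reports that the page is blocked and the second that it is allowed, which is the behaviour a well-behaved crawler would follow.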

It should be understood that robots.txt is merely an instruction to search engines, and it would be very foolish to rely on it for sensitive data. Crawlers are free to ignore it, and if a search engine cannot locate the robots.txt file, there is every possibility that the data will be indexed. So it is better to use a combination of methods for protecting sensitive data.
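
To make the point about a missing file concrete, the sketch below approximates that situation by giving Python's urllib.robotparser an empty rule set. This is only a simulation of what a polite crawler does when it finds no rules; the function name and the URL path are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed_without_rules(url, user_agent="ExampleBot"):
    """Check a URL against an empty rule set (no robots.txt rules found)."""
    parser = RobotFileParser()
    # An empty list of lines stands in for a robots.txt with no rules.
    parser.parse([])
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # With no rules in place, nothing tells the crawler to stay away.
    print(allowed_without_rules("http://www.mysite.com/private/report.html"))
```

With no rules to consult, the parser treats every URL as fetchable, which is why robots.txt alone should never be the only barrier in front of sensitive data.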

Search engines do not search the whole site to find the robots.txt file; they look only in the main (root) directory, for example www.mysite.com/robots.txt. The file must be placed there for search engines to find it and skip the files you do not want indexed. If it is placed anywhere else, you cannot blame the search engines for indexing all the pages.
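
The location rule can be sketched in a few lines of Python: whatever page a crawler starts from, the robots.txt address it checks is built from the site's scheme and host plus /robots.txt. The function name robots_url and the sample page path are assumptions for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Build the root-level robots.txt address for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.mysite.com/blog/2009/post.html"))
```

Note that a file saved at, say, www.mysite.com/blog/robots.txt never matches the address built this way, so crawlers would not consult it.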

