robots.txt FAQ
Many search engines download the robots.txt file only once per day, but they may send in many robots later that same day. It is therefore extremely important to change the robots.txt file one day, and then wait until the next day before creating the subdirectory and placing files into it. Always plan your changes one day ahead of time, or you may be very surprised to see your hidden pages appear in the SERPs.
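Before creating the subdirectory, it can help to confirm that the new rule really blocks it. The following is a minimal sketch using Python's standard urllib.robotparser module; the directory name /hidden/, the host example.com, and the robot name ExampleBot are placeholders made up for illustration, not anything taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block every robot from /hidden/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /hidden/
"""


def is_blocked(robots_txt, user_agent, url):
    """Return True if a robot that obeys robots.txt may NOT fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/hidden/page.html",
                "http://example.com/index.html"):
        print(url, "blocked:", is_blocked(ROBOTS_TXT, "ExampleBot", url))
```

This only shows how a compliant robot would read the rule. It says nothing about when a given search engine will next download the file, so the one-day wait still applies.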
Will a robots.txt file stop a page from appearing in the SERPs?
No. Even if you follow the procedure above, most robots will follow a link up to that page, and then cache just a link to that page, which includes the page title (from the head section of the HTML code) and the URL of the page itself. There will not be a description, and the page is unlikely to rank very highly in the SERPs, but it will be there, and you can find it by using the "site:" command (or by searching for the entire page title in quotation marks).
What happens to a page that is already in the SE's cache?
The robot will try to access and update the page, but it will be blocked, so it will keep the cache that it already has. In a few weeks (or months, depending upon the engine) the snippet and the link to the cache of the page will eventually disappear, but the link to the page will remain in the search engine. Also, the page title that was in the cache prior to the change will appear in the SERPs for a very long time, even if you change the page title after blocking the robots' access to the page.
Will a robots.txt file stop every robot?
No. There are many, many rogue robots that ignore the robots.txt file. Examples of this are sweepstakes scraper sites that want to show every sweepstakes to their visitors. Their bots completely ignore the robots.txt file (and all robots META tag instructions).
Is a robots.txt file necessary?
No, there are lots of sites on the web that do not have this file in place. However, you will see lots of 404 errors related to the robots.txt file in your server logs, so creating a robots.txt file that specifically allows the robots in will eliminate these errors and reduce your server log file size.
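A robots.txt file that lets every robot in can be as short as two lines: a wildcard User-agent line followed by an empty Disallow line. The sketch below checks that reading with Python's standard urllib.robotparser module; the robot name ExampleBot and the example.com URLs are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow value means "nothing is disallowed" for the
# matching robots, which here is all of them.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return the subset of urls that a compliant robot may fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    sample = [
        "http://example.com/",
        "http://example.com/about.html",
        "http://example.com/archive/2004/index.html",
    ]
    # Every sample URL should come back as allowed.
    print(allowed_urls(ALLOW_ALL, "ExampleBot", sample))
```

Saving those two lines as robots.txt in the site root means requests for the file are answered with the file itself instead of a 404.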
Is a robots.txt file needed if there are no links to any pages in a site?
No, the robots do not snoop around looking for hidden subdirectories with hidden files. HOWEVER, depending upon the registrar, new domains get submitted to search engines automatically and they send their robots out to index the site. ALSO, any link from a page that is part of the existing "visible" web will provide a path for the robots to follow, and they will all eventually follow that link in and cache whatever pages they can "get" so it is best to create a robots.txt file first. This will make certain that no legitimate search engine gets your hidden pages and caches them.
Can a robots.txt file delete pages from the search engines?
No. Only Google has a URL removal tool that can be used in conjunction with the robots.txt file, and great care must be taken with it, or your entire site can be deleted from Google's index for 180 days. Note that pages will return to Google's index after 180 days if the robots.txt restriction is removed. None of the other engines have a URL removal tool at the moment, so if they already have a cache of the page, they will not delete it based on the robots.txt file. The page will remain for what seems like an eternity, even after it starts returning a 404 server header code (page not found).
