User-Agent: *
Disallow: /a/
Disallow: /tools/
Disallow: /img
#
# If a site owner wishes to give instructions to web robots, the owner must
# place a text file called robots.txt in the root of the web site hierarchy
# (e.g. www.example.com/robots.txt). This text file should contain the
# instructions in a specific format (see examples below). Robots that are
# programmed to follow the instructions try to fetch this file and read the
# instructions before fetching any other file from the web site. If this
# file doesn't exist, web robots assume that the site owner wishes to
# provide no specific instructions.
#
# A robots.txt file on a website functions as a request that specified
# robots ignore specified files or directories in their search. This might
# be, for example, out of a preference for privacy from search engine
# results, or the belief that the content of the selected directories might
# be misleading or irrelevant to the categorization of the site as a whole,
# or out of a desire that an application only operate on certain data.
#
# For websites with multiple subdomains, each subdomain must have its own
# robots.txt file. If example.com had a robots.txt file but a.example.com
# did not, the rules that apply to example.com would not apply to
# a.example.com.
#
# Disadvantages
#
# The protocol is purely advisory. It relies on the cooperation of the web
# robot, so marking an area of a site out of bounds with robots.txt does not
# guarantee privacy. Some web site administrators have tried to use the
# robots file to make private parts of a website invisible to the rest of
# the world, but the file is necessarily publicly available and its content
# is easily checked by anyone with a web browser.
#
# For many years there was no official standards body or RFC for the
# robots.txt protocol. It was created by consensus in June 1994 by members
# of the robots mailing list (robots-request@nexor.co.uk) and was only
# published as an RFC (RFC 9309) in 2022.
#
# The parts of the site that should not be accessed are listed in a file
# called robots.txt in the top-level directory of the website.
Disallow: /links/
# The robots.txt patterns are matched by simple prefix comparison against
# the URL path, so care should be taken to make sure that patterns matching
# directories have the final '/' character appended. Otherwise all files
# with names starting with that string will match, rather than just those
# in the directory intended.
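#
# A small illustration of the matching rule above, kept in comments so that
# it does not change the rules in this file. The paths are hypothetical, and
# the behaviour shown assumes a robot that does plain prefix matching:
#
#   Disallow: /img     would block /img/logo.png, /img.html and /img-old/a.png
#   Disallow: /img/    would block /img/logo.png but not /img.html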