app/soc/content/robots.txt
author Lennard de Rijk <ljvderijk@gmail.com>
Tue, 31 Mar 2009 19:25:43 +0000
changeset 2044 3aa6123be2a7
parent 73 211a3eeacf27
permissions -rw-r--r--
Now using a GET request to fetch the data. This prevents a 411 error from occurring on a live website. The URL also no longer needs the timestamp, since it is already added automatically.
Patch by: Lennard de Rijk
Reviewed by: to-be-reviewed

# Directions for web crawlers.
# See http://www.robotstxt.org/wc/norobots.html.

User-agent: HTTrack
User-agent: puf
User-agent: MSIECrawler
User-agent: Nutch
Disallow: /
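
The four User-agent lines above share the single Disallow rule, so those crawlers are excluded from the whole site while every other crawler stays unrestricted. As a minimal sketch of how this could be checked, the snippet below feeds a copy of the rules to Python's standard urllib.robotparser. The helper names build_parser and is_allowed, and the "Googlebot" agent used for comparison, are illustrative assumptions and are not part of this repository.

```python
from urllib.robotparser import RobotFileParser

# Copy of the rules in this file; a real check would fetch them from the site.
ROBOTS_TXT = """\
# Directions for web crawlers.
# See http://www.robotstxt.org/wc/norobots.html.

User-agent: HTTrack
User-agent: puf
User-agent: MSIECrawler
User-agent: Nutch
Disallow: /
"""


def build_parser(text):
    """Return a RobotFileParser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(agent, path="/"):
    """Hypothetical helper: may `agent` fetch `path` under the rules above?"""
    return build_parser(ROBOTS_TXT).can_fetch(agent, path)


if __name__ == "__main__":
    # "Googlebot" is only an example of an agent that the file does not list.
    for agent in ("HTTrack", "puf", "MSIECrawler", "Nutch", "Googlebot"):
        print(agent, "allowed" if is_allowed(agent) else "blocked")
```

Note that urllib.robotparser compares agent names case-insensitively and by substring, so this only approximates how a given crawler interprets the file.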