app/soc/content/robots.txt
author Sverre Rabbelier <srabbelier@gmail.com>
Sun, 19 Apr 2009 17:42:44 +0000
changeset 2246 c29272f640b0
parent 73 211a3eeacf27
permissions -rw-r--r--
Tweak the 'load balancing' algorithm In order to reduce contention we randomly skipped jobs, but this caused many jobs to end up stopping early. Now instead we keep on going until we time out (also increased the chance of doing work). Patch by: Sverre Rabbelier

# Directions for web crawlers.
# See http://www.robotstxt.org/wc/norobots.html.

User-agent: HTTrack
User-agent: puf
User-agent: MSIECrawler
User-agent: Nutch
Disallow: /