Tweak the 'load balancing' algorithm
In order to reduce contention we randomly skipped jobs, but this
caused many jobs to end up stopping early. Now instead we keep on
going until we time out (also increased the chance of doing work).
Patch by: Sverre Rabbelier
# Directions for web crawlers.
# See http://www.robotstxt.org/wc/norobots.html.
User-agent: HTTrack
User-agent: puf
User-agent: MSIECrawler
User-agent: Nutch
Disallow: /