Opened 17 years ago
Closed 17 years ago
#140 closed task (fixed)
Problem Spiders Pulling Big Trac Changesets
Reported by: | warmerdam | Owned by: | warmerdam |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | SysAdmin | Keywords: | |
Cc: |
Description
Some spiders are still crawling trac.osgeo.org, ignoring robots.txt, and end up pulling huge changesets, bringing www.osgeo.org to its knees.
Change History (2)
comment:1 by , 17 years ago
comment:2 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Closing under the optimistic assumption that this will take care of the problem. I've primed forbidden_ips.txt with two known spider IPs.
Note:
See TracTickets
for help on using tickets.
I have applied the "Lay Spider Traps" pattern from:
I have added a spider trap at the bottom of http://trac.osgeo.org/index.html pointing to bad.html (the actual URL is deliberately withheld here!)
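A trap like this is typically a link that humans never see or click but that badly-behaved crawlers follow. The real markup in index.html is not shown in the ticket; a hypothetical sketch might look like:

```html
<!-- Hypothetical trap link: hidden from humans, but followed by
     crawlers that ignore robots.txt. The real markup may differ. -->
<a href="/bad.html" style="display:none">do not follow this link</a>
```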
This is redirected to /cgi-bin/bad.pl, a Perl script that appends the offending IP to /var/www/trac/forbidden_ips.txt along with a comment recording the user agent. All IPs in this file are then redirected to a 403 error by additional rewrite rules that use the file as a map. The Apache configuration magic is in /etc/httpd/conf.d/hosts/trac.conf:
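The actual directives in trac.conf are not quoted in the ticket; a minimal sketch of mod_rewrite rules implementing this scheme, assuming a plain-text `RewriteMap` over forbidden_ips.txt, could be:

```apache
# Hypothetical sketch -- the real rules in trac.conf may differ.
RewriteEngine On

# Each line of forbidden_ips.txt is "1.2.3.4 # user-agent comment";
# a txt RewriteMap keyed on the IP makes lookups cheap.
RewriteMap forbidden txt:/var/www/trac/forbidden_ips.txt

# Funnel the trap URL into the CGI script that records the caller's IP.
RewriteRule ^/bad\.html$ /cgi-bin/bad.pl [PT]

# Any IP found in the map gets a 403 on every request.
RewriteCond ${forbidden:%{REMOTE_ADDR}|NOT_FOUND} !=NOT_FOUND
RewriteRule ^ - [F]
```

Note that `RewriteMap` is only valid in server or virtual-host context, which fits its placement in trac.conf rather than an .htaccess file.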
The CGI script is /var/www/trac/cgi-bin/bad.pl.
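The source of bad.pl is not included in the ticket. Its presumed logic (read the caller's address and user agent from the standard CGI environment variables, append them to the forbidden-IPs map file) can be sketched in Python; the file path and response text here are assumptions:

```python
#!/usr/bin/env python3
# Hypothetical sketch of what a trap CGI like bad.pl does; the real
# script is Perl and its exact behavior is not shown in the ticket.
import os

FORBIDDEN = "/var/www/trac/forbidden_ips.txt"

def record_offender(ip, user_agent, path=FORBIDDEN):
    """Append the offending IP plus a user-agent comment to the map file."""
    with open(path, "a") as f:
        f.write(f"{ip} # {user_agent}\n")

if __name__ == "__main__" and os.environ.get("GATEWAY_INTERFACE"):
    # Running under a CGI server: log the caller and answer politely.
    record_offender(os.environ.get("REMOTE_ADDR", "unknown"),
                    os.environ.get("HTTP_USER_AGENT", "unknown"))
    print("Content-Type: text/plain\n")
    print("Goodbye.")
```

Appending one line per hit keeps the file usable directly as an Apache txt rewrite map, since the `#` comment is ignored on lookup.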
Note that bad.html was also added to /var/www/trac/robots.txt, so well-behaved crawlers that honor robots.txt never follow the trap.
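The exact robots.txt contents are not shown; the entry presumably amounts to something like:

```
# Hypothetical robots.txt entry -- only crawlers that ignore this
# directive will ever reach the trap and get themselves banned.
User-agent: *
Disallow: /bad.html
```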