Opened 17 years ago
Closed 16 months ago
#139 closed task (invalid)
Apache - robots
Reported by: | sbarnes | Owned by: | sbarnes |
---|---|---|---|
Priority: | normal | Milestone: | Sysadmin Contract 2023-II |
Component: | General | Keywords: | |
Cc: | hobu, warmerdam |
Description
The server was thrashed again at 9:00pm EST, 27 Aug 2007.
I rebooted via the UPS, then edited httpd.conf and changed MaxClients to 50.
Going through the logs, it seems we were being crawled by Google, MSN, Yahoo, and Twiceler.
I added a block in redirect.conf to try to block Twiceler, but it doesn't seem to be working.
I just emailed the owners of Twiceler and asked not to be crawled; their website claims they will do this if the bot is causing problems.
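The two mitigations described above (lowering MaxClients and blocking the crawler in redirect.conf) might look roughly like the following Apache 2.2-era snippet. This is a sketch, not the actual server configuration: the file layout and the User-Agent substring are assumptions. (Note that MaxClients was renamed MaxRequestWorkers in Apache 2.4.)

```apache
# httpd.conf -- prefork MPM tuning.
# 50 was the emergency value from this ticket; comment:4 later raises it to 150.
<IfModule mpm_prefork_module>
    MaxClients 50
</IfModule>

# redirect.conf -- deny the Twiceler crawler by User-Agent.
# The substring "twiceler" is an illustrative assumption; check the
# access log for the exact User-Agent string the bot sends.
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} twiceler [NC]
    RewriteRule .* - [F,L]
</IfModule>
```

If a rule like this "doesn't seem to be working", the usual causes are mod_rewrite not being enabled in the context where the rule lives, or the matched substring not appearing in the bot's actual User-Agent header.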
Change History (6)
comment:1 by , 17 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:2 by , 17 years ago
Cc: | added |
---|
Shawn,
I would suggest the MaxClients value be restored to normal to avoid occasional and surprising failures. Also of interest is ticket #139, which attempts to solve this on the server side with "spider traps".
comment:3 by , 17 years ago
The owners of Twiceler returned my email and indicated that they would block twiceler from crawling osgeo.org.
I'm leaving this open for a few days to make sure twiceler is leaving us alone, and for verification that the MaxClients change is ok.
shawn
comment:4 by , 17 years ago
I've upped MaxClients to 150.
Will continue watching the server before closing this.
comment:5 by , 7 years ago
This ticket has been cold for 10 years, and it is not clear whether proper handling of spider bots was ever actually resolved.
It is unacceptable to limit the total number of connections to fewer than actual web users need, and it is undesirable to completely stop indexing of discoverable data.
Proper use of robots.txt, and possibly connection/rate limiting for spiders, is a more appropriate solution.
The current configuration should be reviewed, corrected as necessary, and documented on the SAC wiki before closing this ticket.
comment:6 by , 16 months ago
Milestone: | → Sysadmin Contract 2023-II |
---|---|
Resolution: | → invalid |
Status: | assigned → closed |
Closing as invalid: this ticket corresponds to the old osgeo.org website, which is no longer in service.
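The approach comment:5 recommends, robots.txt plus rate limiting rather than a hard connection cap, could be sketched like this. The crawler name and delay value are illustrative assumptions, not the actual osgeo.org configuration.

```
# robots.txt, served from the site root
# Ask well-behaved crawlers to slow down. Crawl-delay was honored by
# some crawlers (e.g. Yahoo and Bing historically) but is ignored by
# Googlebot.
User-agent: *
Crawl-delay: 10

# Refuse a specific problem crawler entirely.
User-agent: Twiceler
Disallow: /
```

robots.txt is purely advisory: a misbehaving bot that ignores it still needs a server-side measure (a User-Agent deny rule or firewall-level rate limiting), which is why comment:5 suggests combining the two.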