Opened 17 years ago

Closed 10 months ago

#139 closed task (invalid)

Apache - robots

Reported by: sbarnes
Owned by: sbarnes
Priority: normal
Milestone: Sysadmin Contract 2023-II
Component: General
Keywords:
Cc: hobu, warmerdam

Description

The server was thrashed again at 9:00 pm EST on 27 Aug 2007.

I rebooted via the UPS, then edited httpd.conf and changed MaxClients to 50.

Going through the logs, it seems we were being crawled by Google, MSN, Yahoo, and Twiceler.
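One way to do this kind of log review is to tally requests per crawler. The following is a minimal sketch, not the procedure actually used here. It assumes the access log is in Apache's combined format (User-Agent as the last quoted field), and the marker substrings, the function name `count_crawler_hits`, and the sample lines are all illustrative assumptions.

```python
import re
from collections import Counter

# Assumed substrings for recognizing each crawler's User-Agent header.
CRAWLER_MARKERS = {
    "google": "googlebot",
    "msn": "msnbot",
    "yahoo": "slurp",
    "twiceler": "twiceler",
}

# In the combined log format the User-Agent is the last double-quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_crawler_hits(log_lines):
    """Return a Counter mapping crawler name to number of matching requests."""
    hits = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # line does not end with a quoted field; skip it
        agent = match.group(1).lower()
        for name, marker in CRAWLER_MARKERS.items():
            if marker in agent:
                hits[name] += 1
                break
    return hits


# Made-up sample lines for illustration only, not taken from the real logs.
SAMPLE_LINES = [
    '10.0.0.1 - - [27/Aug/2007:21:00:01 -0400] "GET / HTTP/1.1" 200 512 "-" "Twiceler-0.9"',
    '10.0.0.1 - - [27/Aug/2007:21:00:02 -0400] "GET /a HTTP/1.1" 200 512 "-" "Twiceler-0.9"',
    '10.0.0.2 - - [27/Aug/2007:21:00:03 -0400] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '10.0.0.3 - - [27/Aug/2007:21:00:04 -0400] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for name, count in count_crawler_hits(SAMPLE_LINES).most_common():
        print(name, count)
```

In practice the lines would come from iterating over the open log file rather than a hard-coded list.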

I added a block in redirect.conf to try to block Twiceler, but it doesn't seem to be working.

I just emailed the owners of Twiceler and asked not to be crawled. Their website claims they will do this if the bot is causing problems.

Change History (6)

comment:1 by sbarnes, 17 years ago

Owner: changed from tmitchell to sbarnes
Status: new → assigned

Twiceler's owners replied by email and indicated that they would block Twiceler from crawling osgeo.org.

I'm leaving this open for a few days to make sure Twiceler is leaving us alone and to verify that the MaxClients change is OK.

shawn

comment:2 by warmerdam, 17 years ago

Cc: warmerdam added

Shawn,

I would suggest the MaxClients value be restored to normal to avoid occasional and surprising failures. Also of interest is ticket #139, which attempts to solve this on the server side with "spider traps".

comment:3 by warmerdam, 17 years ago

Err, I mean #140 is the other ticket on this issue.

comment:4 by sbarnes, 17 years ago

I've upped the MaxClients to 150.

I will continue watching the server before closing this.

comment:5 by TemptorSent, 6 years ago

This ticket has been cold for 10 years. It is not clear whether the handling of spider bots was ever properly resolved.

It is unacceptable to limit the total number of connections to less than needed for actual web users, and it is undesirable to completely stop indexing of discoverable data.

Proper use of robots.txt and possibly connection/rate limiting for spiders is a more appropriate solution.
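As a rough illustration of the robots.txt approach, the sketch below checks a candidate rule set with Python's standard `urllib.robotparser` before it is deployed. The rules themselves are hypothetical (a full ban for Twiceler, a `Crawl-delay` and one disallowed `/search/` path for everyone else) and do not reflect the server's actual configuration. Note that `Crawl-delay` is a non-standard directive that not every crawler honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ban Twiceler outright, slow down everyone else.
ROBOTS_TXT = """\
User-agent: twiceler
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /search/
"""


def load_rules(text):
    """Parse robots.txt content from a string into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Twiceler should be refused everywhere; other bots only under /search/.
print("twiceler may fetch /:", rules.can_fetch("twiceler", "http://www.osgeo.org/"))
print("Googlebot may fetch /:", rules.can_fetch("Googlebot", "http://www.osgeo.org/"))
print("Googlebot may fetch /search/:", rules.can_fetch("Googlebot", "http://www.osgeo.org/search/"))
print("Googlebot crawl delay:", rules.crawl_delay("Googlebot"))
```

A check like this only confirms what well-behaved crawlers are asked to do. Bots that ignore robots.txt would still need server-side connection or rate limiting.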

The current configuration should be reviewed, corrected as necessary, and documented on the SAC wiki before closing this ticket.

comment:6 by cvvergara, 10 months ago

Milestone: Sysadmin Contract 2023-II
Resolution: invalid
Status: assigned → closed

Closing as invalid:

This ticket concerns the old osgeo.org website, which is no longer in service.
