Opened 16 years ago

Closed 8 years ago

#277 closed task (fixed)

Robots Are Attacking!

Reported by: warmerdam
Owned by: sac@…
Priority: normal
Milestone:
Component: SysAdmin
Keywords: trac
Cc:

Description

Today we were able to catch one of our load spikes in action. The server-status report indicated:

Srv	PID	Acc	M	CPU 	SS	Req	Conn	Child	Slot	Client	VHost	Request
0-0	28743	0/909/2477	W 	162.91	92	0	0.0	10.29	21.07 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/14384/branches/1.4?old_path=%2f&format=
1-0	2426	0/44/1752	W 	11.97	133	0	0.0	2.51	23.90 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/14384/branches/1.4?old_path=%2f&format=
2-0	2876	0/2/1699	W 	1.35	120	0	0.0	0.01	14.26 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/14384/branches/1.5?old_path=%2f&format=
3-0	2880	0/8/2075	W 	3.45	77	0	0.0	0.01	91.60 	70.91.111.164	trac.osgeo.org	GET /gdal/log/ HTTP/1.0
4-0	2882	0/11/2494	W 	4.84	0	0	0.0	0.14	32.74 	70.91.111.164	trac.osgeo.org	GET /gdal/log/sandbox/ajolma/swig HTTP/1.0
5-0	2883	0/6/1292	W 	1.81	10	0	0.0	0.03	17.24 	70.91.111.164	trac.osgeo.org	GET /gdal/log/trunk?rev=14376 HTTP/1.0
6-0	540	0/279/952	W 	53.38	109	0	0.0	6.77	14.25 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/14384/branches/1.5?old_path=%2f&format=
7-0	543	0/276/1812	W 	55.07	109	0	0.0	2.62	14.39 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/14384/branches/1.4?old_path=%2f&format=
8-0	20939	0/2031/2508	W 	390.31	200	0	0.0	25.80	30.36 	198.253.49.6	trac.osgeo.org	GET /ossim/doxygen/classossimImageData.html HTTP/1.1
9-0	2890	0/20/2507	W 	4.27	5	0	0.0	0.27	14.41 	74.6.22.97	trac.osgeo.org	GET /fdo/wiki/WikiFormatting HTTP/1.0
10-0	2893	0/0/1744	W 	181.85	101	0	0.0	0.00	42.93 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/14384/branches/1.4?old_path=%2f&format=
11-0	26129	0/1332/1966	W 	212.63	0	0	0.0	9.06	25.59 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/13196/sandbox/ajolma HTTP/1.0
12-0	546	0/277/785	W 	56.27	115	0	0.0	1.47	5.29 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/14384/branches/1.5?old_path=%2f&format=
13-0	2895	0/18/609	W 	4.95	0	0	0.0	0.44	5.58 	67.195.37.123	osgeo1.osgeo.org	GET /switchuilocale/id?destination=node%2F723 HTTP/1.0
14-0	548	0/283/982	W 	59.52	74	0	0.0	1.76	7.41 	70.91.111.164	trac.osgeo.org	GET /gdal/log/trunk HTTP/1.0
15-0	2896	0/0/591	W 	34.98	96	0	0.0	0.00	4.39 	70.91.111.164	trac.osgeo.org	GET /gdal/log/branches/1.4 HTTP/1.0
16-0	2897	0/7/733	W 	3.37	0	0	0.0	0.18	5.03 	209.169.157.146	osgeo1.osgeo.org	GET / HTTP/1.0
17-0	551	0/273/2312	W 	49.57	128	0	0.0	5.62	26.73 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/14384/branches/1.4?old_path=%2f&format=
18-0	552	0/262/1491	W 	44.07	127	0	0.0	1.06	22.71 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/14384/branches/1.5?old_path=%2f&format=
19-0	2898	0/9/295	W 	4.06	0	0	0.0	0.22	2.45 	70.91.111.164	trac.osgeo.org	GET /gdal/browser/sandbox/crschmidt?order=size HTTP/1.0
20-0	2899	0/5/433	W 	1.43	20	0	0.0	0.10	2.69 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/13196/sandbox/ajolma HTTP/1.0
21-0	20959	0/2073/2346	W 	382.95	9	0	0.0	27.21	28.29 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/13196/sandbox/ajolma HTTP/1.0
22-0	2900	0/3/456	W 	1.17	20	0	0.0	0.08	2.68 	70.91.111.164	trac.osgeo.org	GET /gdal/log/trunk?rev=14376 HTTP/1.0
23-0	20966	0/2043/2121	W 	362.15	3	0	0.0	39.28	40.03 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/13196/sandbox/ajolma HTTP/1.0
24-0	2901	0/1/377	W 	0.00	94	0	0.0	0.000	5.31 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/14384/branches/1.5?old_path=%2f&format=
25-0	20968	0/2090/2137	W 	406.93	1	0	0.0	48.82	49.00 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/13273/sandbox/crschmidt HTTP/1.0
26-0	2904	0/9/209	W 	3.43	2	0	0.0	0.15	0.97 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/11871/sandbox/hobu HTTP/1.0
27-0	558	0/265/519	W 	54.33	116	0	0.0	1.25	3.52 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/14384/branches/1.4?old_path=%2f&format=
28-0	559	0/282/438	W 	46.89	77	0	0.0	2.26	4.42 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/14384/branches/1.4?old_path=%2f&format=
29-0	20982	0/2112/2125	W 	394.18	1	0	0.0	22.55	22.84 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/11871/sandbox/hobu HTTP/1.0
30-0	2906	0/22/79	W 	6.23	0	0	0.0	0.44	1.24 	74.6.18.233	osgeo1.osgeo.org	GET /pipermail/mapserver-users/2003-December/047445.html HTTP/1
31-0	2907	0/12/1450	W 	2.25	58	0	0.0	0.09	8.68 	74.6.22.97	trac.osgeo.org	GET /grass/query?status=new&status=assigned&status=reopened&mil
32-0	19340	0/2268/2293	W 	429.97	78	0	0.0	29.11	29.57 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/14384/branches/1.5?old_path=%2f&format=
33-0	2910	0/15/177	W 	5.81	8	0	0.0	0.20	1.93 	70.91.111.164	trac.osgeo.org	GET /gdal/log/trunk?rev=14376 HTTP/1.0
34-0	2911	0/10/642	W 	2.71	0	0	0.0	0.36	4.87 	24.61.22.108	trac.osgeo.org	GET /mapguide/ HTTP/1.1
35-0	19351	0/2075/2088	W 	567.14	102	0	0.0	143.37	143.43 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/14384/branches/1.5?old_path=%2f&format=
36-0	2912	0/5/2090	W 	2.28	2	0	0.0	0.21	22.41 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/13273/sandbox/crschmidt HTTP/1.0
37-0	2913	0/11/1972	W 	3.02	0	0	0.0	0.15	14.52 	209.85.238.11	trac.osgeo.org	GET /gdal/timeline?milestone=on&ticket=on&changeset=on&wiki=on&
38-0	20988	0/2101/2118	W 	369.99	139	0	0.0	38.18	38.77 	192.5.156.252	svn.osgeo.org	REPORT /ossim/!svn/vcc/default HTTP/1.1
39-0	2914	0/9/219	W 	3.37	7	0	0.0	0.16	1.77 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/13196/sandbox/ajolma HTTP/1.0
40-0	2915	0/10/18	W 	3.79	15	0	0.0	0.21	0.79 	70.91.111.164	trac.osgeo.org	GET /gdal/log/trunk?rev=14376 HTTP/1.0
41-0	2916	0/13/81	W 	3.42	7	0	0.0	0.07	0.42 	74.6.22.97	trac.osgeo.org	GET /grass/query?status=new&status=assigned&status=reopened&mil
42-0	2917	0/8/20	W 	2.45	0	0	0.0	0.23	0.79 	72.171.0.144	trac.osgeo.org	GET /server-status HTTP/1.1
43-0	2918	0/10/39	W 	3.26	10	0	0.0	0.21	0.92 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/13196/sandbox/ajolma HTTP/1.0
44-0	2919	0/10/160	W 	5.24	7	0	0.0	0.25	1.13 	70.91.111.164	trac.osgeo.org	GET /gdal/log/trunk?rev=14376 HTTP/1.0
45-0	2920	0/9/54	W 	2.17	15	0	0.0	0.11	0.61 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/13196/sandbox/ajolma HTTP/1.0
46-0	18139	0/2315/2315	W 	539.51	123	0	0.0	153.35	153.35 	70.91.111.164	trac.osgeo.org	GET /gdal/changeset/14384/branches/1.4?old_path=%2f&format=
47-0	2921	0/22/50	W 	5.08	0	0	0.0	0.29	0.87 	70.91.111.164	trac.osgeo.org	GET /gdal/browser/sandbox/crschmidt?order=date HTTP/1.0
48-0	2928	0/0/100	W 	8.54	88	0	0.0	0.00	0.36 	70.91.111.164	trac.osgeo.org	GET /gdal/log/ HTTP/1.0
49-0	2929	0/2/90	W 	0.87	76	0	0.0	0.01	0.96 	70.91.111.164	trac.osgeo.org	GET /gdal/log/ HTTP/1.0

Of note is that a robot was hitting Trac heavily (at about 5 requests per second) for changesets, and Trac was not able to keep up, possibly because the client was unable to consume the results as fast as we were sending them back.

It is proposed that we put in place "maximum connections per IP" limits on trac.osgeo.org, similar to what we did on download.osgeo.org for #216.
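To make the pattern in the server-status report easier to see, here is a minimal sketch that tallies busy workers per client address. It assumes the whitespace-separated column layout shown above (Client is the 11th column); the function name `connections_per_client` and the five-row `SAMPLE` (copied from the report) are only for illustration.

```python
from collections import Counter

# Five rows copied from the server-status report above.
SAMPLE = """\
3-0 2880 0/8/2075 W 3.45 77 0 0.0 0.01 91.60 70.91.111.164 trac.osgeo.org GET /gdal/log/ HTTP/1.0
4-0 2882 0/11/2494 W 4.84 0 0 0.0 0.14 32.74 70.91.111.164 trac.osgeo.org GET /gdal/log/sandbox/ajolma/swig HTTP/1.0
5-0 2883 0/6/1292 W 1.81 10 0 0.0 0.03 17.24 70.91.111.164 trac.osgeo.org GET /gdal/log/trunk?rev=14376 HTTP/1.0
8-0 20939 0/2031/2508 W 390.31 200 0 0.0 25.80 30.36 198.253.49.6 trac.osgeo.org GET /ossim/doxygen/classossimImageData.html HTTP/1.1
9-0 2890 0/20/2507 W 4.27 5 0 0.0 0.27 14.41 74.6.22.97 trac.osgeo.org GET /fdo/wiki/WikiFormatting HTTP/1.0
"""


def connections_per_client(status_text):
    """Count server-status rows per client IP (11th whitespace-separated column)."""
    counts = Counter()
    for line in status_text.splitlines():
        # Split into at most 13 fields so the request string stays in one piece.
        fields = line.split(None, 12)
        if len(fields) < 13 or fields[0] == "Srv":
            continue  # skip the header row and anything malformed
        counts[fields[10]] += 1
    return counts


for client, busy in connections_per_client(SAMPLE).most_common():
    print(client, busy)
```

Run against the full report, a tally like this should make a single client holding most of the worker slots stand out immediately.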

Change History (6)

comment:1 by crschmidt, 16 years ago

  1. I've turned on "combined" (instead of "common") logging for Trac, so that if this happens again we can see whether bots are sending user-agents that include contact information.
  2. I've installed httpd-devel so that I can get the apxs binary (up2date -i httpd-devel).
  3. I've downloaded and installed limitipconn:
wget http://dominia.org/djao/limit/mod_limitipconn-0.23.tar.bz2
tar xjf mod_limitipconn-0.23.tar.bz2
cd mod_limitipconn-0.23
sudo make install
  4. Set MaxConnPerIP to 1, restarted Apache, and confirmed that reloading the GDAL Trac page resulted in a couple of 503s. Set MaxConnPerIP to 8, reloaded, and confirmed no 503s.

This matches the default of '8' max server connections in Firefox about:config on my mac.
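As a rough model of the behaviour tested above (not the module's actual implementation), the sketch below tracks concurrent connections per IP and answers 503 once an address is at its limit. The class `PerIPLimiter` and its method names are hypothetical; only the 503 response and the limits of 1 and 8 come from the notes above.

```python
from collections import defaultdict


class PerIPLimiter:
    """Toy model of a per-IP concurrent connection cap (hypothetical helper)."""

    def __init__(self, max_conn_per_ip):
        self.max_conn_per_ip = max_conn_per_ip
        self.active = defaultdict(int)  # ip -> currently open connections

    def open(self, ip):
        """Return 200 if the connection is accepted, 503 if the IP is at its limit."""
        if self.active[ip] >= self.max_conn_per_ip:
            return 503
        self.active[ip] += 1
        return 200

    def close(self, ip):
        """Release one connection slot for the IP, if it holds any."""
        if self.active[ip] > 0:
            self.active[ip] -= 1


# A browser opening 8 parallel connections fits within a limit of 8;
# a ninth simultaneous connection from the same address is refused.
limiter = PerIPLimiter(max_conn_per_ip=8)
statuses = [limiter.open("70.91.111.164") for _ in range(9)]
print(statuses)
```

With a limit of 1 the same model refuses every parallel request after the first, which is consistent with the 503s seen when reloading a page that pulls in several resources at once.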

We may want to apply this to other services if we see other problems like this occurring. For now, I'd like to leave it on Trac only and see what happens.
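With combined logging enabled, the user-agent is the last quoted field of each access-log line. The following is a minimal sketch of pulling it out, assuming the standard Apache combined format; the sample line, its address, and the bot name are made up for illustration and are not taken from our logs.

```python
import re

# Apache "combined" format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical log line, for illustration only.
SAMPLE_LINE = (
    '192.0.2.10 - - [01/Jan/2000:00:00:00 +0000] '
    '"GET /gdal/log/ HTTP/1.0" 200 1234 "-" '
    '"ExampleBot/1.0 (+http://example.org/bot)"'
)


def parse_combined(line):
    """Return the named fields of a combined-format line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


entry = parse_combined(SAMPLE_LINE)
print(entry["host"], entry["agent"])
```

The simple pattern does not handle escaped quotes inside a field, so treat it as a starting point rather than a complete parser.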

comment:2 by jbirch, 16 years ago

I wonder if it would be worth setting crawl-delay for the major spiders?

Yahoo and Microsoft support this directive in robots.txt, while for Google you have to set up a Webmasters Tools account and tell it to slow down in there.
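For reference, a sketch of what such a robots.txt might look like, checked with Python's standard `urllib.robotparser`. The user-agent tokens (`Slurp` for Yahoo, `msnbot` for Microsoft) and the 10-second delay are illustrative assumptions, not values anyone has agreed on here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the tokens and the delay are assumptions.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Crawlers named in the file get the delay; everyone else gets none.
delay_slurp = parser.crawl_delay("Slurp")
delay_other = parser.crawl_delay("SomeOtherBot")
print(delay_slurp, delay_other)
```

The directive is only advisory: a crawler has to choose to honour it.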

comment:3 by crschmidt, 16 years ago

Jason:

What problem are you trying to solve? The 'crawler' causing problems in this case was crawling from a Comcast internet connection: clearly not one of the 'big 3' search spiders, which are typically well behaved, according to all of my log-reading and observations.

Anything that opens 45 different connections to your server at once is simply a broken crawler, in my mind, no questions asked.

comment:4 by jbirch, 16 years ago

I guess that answers my question :)

I wasn't trying to solve a particular problem; you have dealt with that nicely. I was just wondering if setting those values would help conserve server resources in general.

comment:5 by crschmidt, 16 years ago

Yeah. In general, well-behaved bots are not a problem (so far as I can observe) -- only poorly behaved bots which would ignore our "please be polite" requests anyway.

comment:6 by neteler, 8 years ago

Resolution: fixed
Status: new → closed

Since we are even more or less surviving the current spam storm, closing.
