#726 closed defect (fixed)

MapGuide Stops Listening on 2811 (Client Connections)

Reported by:	rbranson	Owned by:
Priority:	high	Milestone:	2.1
Component:	Server	Version:	2.0.2
Severity:	trivial	Keywords:
Cc:	brucedechant, trevorwekel	External ID:

Description

Here's what I've got going:

MapGuide 2.0.2 on Windows 2008 Server 32-bit w/2GB RAM.
4x2.66GHz Core 2 Duo
Using the installers with Apache

My client "application" is OpenLayers with THREE layers that connect to MapGuide. We need to swap out the content of the "middle" layer in our sandwich, so there are a lot of concurrent connections going on.

The majority of the data set is worldwide vector geography down to 1:500,000. The feature sources are base oceans (just a square out to the extents), water polygons, water lines (very, very small rivers), nations, administrative areas, built-up areas (like metro areas), cities (187k of these), shorelines, national boundary lines, major routes, minor routes, and cities. Layers are created from these, up to 9 layers are pointed at a single source (in the case of cities because we have 9 significance levels) and we use some simple filter rules to get the features we want into each layer. We're not using any theming and instead we're breaking anything that would fall under the "theming" concept into separate layers. We just find that easier to manage. The "sandwich" layers are vector RF contours. These are "relatively" simple as they're really only useful down to 1:1,000,000 usually. All of it is in the same projection and I've double/quadruple verified all of this. All of this was imported from SHP files and converted to SDF upon import.

I've got one custom little PHP piece that finds cities by name and any RF patterns that spatially match. This was originally a source of problems because I wasn't closing my feature readers. That has been fixed. I thought maybe this was "triggering" the problem, but I could reproduce the problem without touching the PHP script.

My problem is that after some load is applied to the system, MapGuide stops listening on port 2811, requiring a restart of the server process to get it back. The bizarre thing is that the existing connections (from the map agent running in the Apache process to the MapGuide server) stay open, so for a little while some of the queries still go through. Eventually they all time out and things go south. The pattern seems to be when I swap that sandwich layer out and then immediately zoom down to a city point (this is replicated from the search results from that custom PHP piece). Vancouver, BC is particularly bad because it happens to pull features out of almost all of the layers. The admin (2810) and site (2812) ports remain listening.

I'm not seeing anything in the error logs. I tried to use the trace logging, but nothing really came of that. I spent a couple of hours tuning serverconfig.ini and could find ways to make some small progress, but the port would eventually just stop listening. Most of my focus was on the thread, queue, and max connection settings for both client and site, as well as the data connection pooling / caching features. I then reverted to the default serverconfig as a "control." I tried to single out a layer that was causing problems, but none of them appeared to be the culprit. Even taking most of the layers out would only "put off" the problem for a little while; eventually, after enough requests, it would surface.

I spent some time burning thru the source code and learning how it was all interconnected to see why it would just suddenly stop listening on a port, but I eventually got down in abstraction to the ACE framework and my morale sank. ACE seems really nifty, but tracing thru their source code was a little daunting :)

The only workaround that actually succeeded was editing the Apache configuration to limit the ThreadsPerChild value. I believe the default was 20. I dropped this to 12 and inched it up to 16 without problems. This is where I left it because the performance was ACCEPTABLE. Granted, it was a bit better with the default (better parallelism) but slower is better than eventual crashing.

So I've "fixed" the problem, but it feels somewhat like a cop-out on my part. Is this some kind of locking problem with too many FDO connections trying to access the same data source? I'd love to help in any way.
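The ThreadsPerChild workaround presumably works because each Apache worker thread can only be busy with a limited number of map-agent requests at a time, so capping the thread count roughly caps how many client connections hit port 2811 at once. A minimal Python sketch of that idea in isolation puts a semaphore in front of a hypothetical backend call: no matter how many worker threads submit requests, the number in flight never exceeds the cap. This is an analogy, not Apache's implementation; `run_load` and `call_backend` are hypothetical names, and the sleep stands in for waiting on the server.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(n_requests, n_workers, cap, work_seconds=0.005):
    """Submit n_requests from n_workers threads, letting at most `cap` reach the backend at once.

    Returns (results, peak): the request ids in submission order and the highest
    number of simultaneous backend calls observed.
    """
    slots = threading.BoundedSemaphore(cap)
    lock = threading.Lock()
    state = {"in_flight": 0, "peak": 0}

    def call_backend(request_id):
        # Hypothetical stand-in for one map-agent request to the server's client port.
        with slots:  # blocks while `cap` calls are already in flight
            with lock:
                state["in_flight"] += 1
                state["peak"] = max(state["peak"], state["in_flight"])
            time.sleep(work_seconds)  # pretend to wait on the server
            with lock:
                state["in_flight"] -= 1
        return request_id

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(call_backend, range(n_requests)))
    return results, state["peak"]
```

Calling `run_load(200, 64, 16)` pushes 200 requests through 64 workers while at most 16 reach the backend at once. The trade-off is the one both reporters describe: requests beyond the cap queue instead of running in parallel, so throughput drops.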

Back Links(4)

#480 Crash with MgInvalidStreamHeaderException with GETTILEIMAGE (load test) – comment:8
#1168 mapguide hangs after 20 users access maps
#1272 Change ACE WFMO reactor used by Windows to the ACE SELECT reactor
source:(default) – [3856]

Attachments (1)

test-case-726-55-max-connections.php (1.6 KB) - added by zspitzer 16 years ago.


Change History (26)

comment:1 by zspitzer, 16 years ago

By using lots of filters rather than theming, you're doing a full table scan for each layer, as SDF isn't indexed except spatially. This ends up being a lot of I/O...

What was the mgserver.exe process load like?
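To make the full-table-scan point concrete: if each of the nine city layers issues its own attribute-filtered query and the filter column has no index, every query reads every candidate city, so one draw can read up to 9 × 187,000 = 1,683,000 rows where a single classified query would read 187,000. The spatial index narrows the candidates to the current extent first, so real counts are lower; the 9× multiplier is the point. A minimal Python model of the two approaches (the function names and the `significance` field are hypothetical, not MapGuide or FDO API; the single-pass version is only how a themed layer would presumably behave):

```python
def per_layer_filter_scan(features, layer_predicates):
    """Model: one filtered query per layer, each examining every candidate feature.

    Returns (features matched per layer, total rows read).
    """
    matched_per_layer = []
    rows_read = 0
    for predicate in layer_predicates:
        matched = []
        for feature in features:
            rows_read += 1  # no attribute index, so every row is examined
            if predicate(feature):
                matched.append(feature)
        matched_per_layer.append(matched)
    return matched_per_layer, rows_read


def single_pass_classify(features, classify):
    """Model: one query whose results are bucketed by a classifier, like theming."""
    buckets = {}
    rows_read = 0
    for feature in features:
        rows_read += 1
        buckets.setdefault(classify(feature), []).append(feature)
    return buckets, rows_read


# Toy data: 1,800 cities spread over 9 significance levels.
cities = [{"id": i, "significance": i % 9} for i in range(1800)]
predicates = [lambda f, lvl=lvl: f["significance"] == lvl for lvl in range(9)]
layers, read_filtered = per_layer_filter_scan(cities, predicates)
buckets, read_single = single_pass_classify(cities, lambda f: f["significance"])
# read_filtered == 16200, read_single == 1800
```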

comment:2 by rbranson, 16 years ago

That's good to know. There's a lot of disk I/O, but we've also got a good bit of data. The problem with the port not listening anymore continued to happen even when I dropped all of the layers styled this way from the map. These layers had a 1:1 layer to feature source ratio.

in reply to: description comment:3 by amitmarty, 16 years ago

Priority:	low → high


comment:4 by amitmarty, 16 years ago

Severity:	trivial → critical

Hi, we are using 2.0.1 on Windows Server 2003 Enterprise and are facing a similar issue. When the number of connections on port 2811 exceeds 55, the MapGuide server becomes unresponsive. Since it opens 7 to 8 client connections per user connection, this is a severe problem for our use of MapGuide. I tried the method above of reducing ThreadsPerChild (my default was 50) to 12 or 20, but the performance of the application became unacceptable.

I do not see any issues logged in the MapGuide server logs, even when I turn on trace logging.

I have one base layer from a DWF image, and the data for the feature objects comes from a database.

To find how many connections are established, I am using the following command:

netstat -anp TCP | find "127.0.0.1:2811" | find /c "ESTABLISHED"
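The `find` pipeline above counts every ESTABLISHED line that mentions 127.0.0.1:2811. When the client (Apache or Tomcat) and mgserver run on the same machine, netstat lists a loopback connection once for each endpoint, so one connection can produce two matching lines. A small Python sketch can tally states separately for rows where 2811 is the local port (the server's side) and rows where it is the foreign port (the client's side); it also shows whether the LISTENING row for 2811 is still present, which is the symptom in this ticket. `count_port_states` is a hypothetical helper, and it assumes the four-column Windows layout (Proto, Local Address, Foreign Address, State).

```python
from collections import Counter


def count_port_states(netstat_text, port):
    """Tally TCP states for netstat rows that involve `port`.

    Returns (server_side, client_side): Counters keyed by state name, where
    server_side counts rows whose local address ends in :port and
    client_side counts rows whose foreign address ends in :port.
    """
    suffix = f":{port}"
    server_side, client_side = Counter(), Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows look like: TCP <local address> <foreign address> <state>
        if len(fields) != 4 or fields[0].upper() != "TCP":
            continue  # skip titles, column headers and blank lines
        _, local, foreign, state = fields
        if local.endswith(suffix):
            server_side[state] += 1
        if foreign.endswith(suffix):
            client_side[state] += 1
    return server_side, client_side
```

On Windows the input could come from `subprocess.run(["netstat", "-anp", "TCP"], capture_output=True, text=True).stdout`. A missing `server_side["LISTENING"]` entry for 2811 would mean the listener is gone, while a growing `CLOSE_WAIT` count shows which side is not closing its sockets.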

I am raising the severity of the issue from trivial to critical.

I would appreciate any feedback. Do let me know if I can help in any way.

Thank You

comment:5 by tomfukushima, 16 years ago

Cc:	brucedechant trevorwekel added

Bruce, Trevor?

comment:6 by amitmarty, 16 years ago

I've been investigating this further. We are using the Java API, so Tomcat is in the picture for us. When I check the connections to port 2811, the majority of them are from Tomcat. I have reduced the session timeout in Tomcat to 1 minute and still see the same problem.

comment:7 by amitmarty, 16 years ago

Hi, we have narrowed it down to the tile service. We have a base layer that comes from a DWF file. When this layer is on, a large number of connections are made to port 2811. After the number of ESTABLISHED connections goes over 55, no more connections are allowed to this port until the MapGuide server is restarted. If we remove the base map from the application, the number of connections drops dramatically, and pushing the server only takes it to about 35 connections on port 2811. The base layer is very important for us to give context to the feature objects.

We would appreciate any feedback.

comment:8 by amitmarty, 16 years ago

Possibly related to ticket #480

http://trac.osgeo.org/mapguide/ticket/480

comment:9 by brucedechant, 16 years ago

Thanks for the additional information it is much appreciated.

comment:10 by amitmarty, 16 years ago

Hi Bruce, I appreciate your taking a look at the information I sent in. Do you have any suggestions for working around this problem? It is basically stopping us from deploying and using our application.

Thanks Amit

comment:11 by brucedechant, 16 years ago

At this time I don't have a workaround other than what has already been stated, i.e. reducing the number of connections, which makes the performance unacceptable.

Hopefully, the root cause of this issue will be found and fixed.

comment:12 by amitmarty, 16 years ago

Bruce, I'm wondering if any progress has been made on this.

Thanks

follow-up: 14 comment:13 by brucedechant, 16 years ago

I have only briefly looked at this as I have other tasks that I am still working on. Hopefully, someone will be able to look into this more closely soon.

in reply to: 13 comment:14 by amitmarty, 16 years ago

Is there any way I can contact you offline? I would like to understand whether I can capture more data between Tomcat and MapGuide, so that I can add more information to this ticket about the circumstances under which this happens. It will probably be easier for someone to solve if they can easily reproduce it.

Thanks

comment:15 by brucedechant, 16 years ago

Essentially, any information that will help reproduce this defect consistently is critical to whoever investigates this.

comment:16 by zspitzer, 16 years ago

Would creating a package with 56 tiny SDFs, and then a script that opens 56 feature readers without closing them, achieve this?

Or a script that logs in 56 users?
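A blunter variant of the second idea is to open N raw TCP connections to port 2811 and simply hold them. A minimal Python sketch follows; `hold_connections` and `release` are hypothetical helpers. Note that a raw connect only shows that the operating system completed the TCP handshake (it can succeed while the connection is still waiting in the listen backlog), and it does not speak MapGuide's protocol, so it exercises less of the server than a script going through the PHP API.

```python
import socket


def hold_connections(host, port, count, timeout=5.0):
    """Open up to `count` TCP connections to (host, port) and keep them open.

    Stops at the first failed connect and returns the sockets opened so far,
    so len(result) shows roughly where new connections started to fail.
    """
    held = []
    for _ in range(count):
        try:
            held.append(socket.create_connection((host, port), timeout=timeout))
        except OSError:
            break
    return held


def release(sockets):
    """Close every socket returned by hold_connections."""
    for s in sockets:
        s.close()
```

Against a server this would be called as `hold_connections("127.0.0.1", 2811, 56)`, followed by `release(...)` once the server's behaviour has been observed.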

by zspitzer, 16 years ago

Attachment:	test-case-726-55-max-connections.php added

comment:17 by zspitzer, 16 years ago

Here's a quick attempt at a script that holds the connections open. It didn't work, though :(

comment:18 by amitmarty, 16 years ago

zspitzer, can you check how many CLOSE_WAIT connections you have after connecting 10 users and waiting for the timeouts to occur? Basically, what we are doing is loading a DWF base layer and loading features from the DB. There are concurrent users using different DWF and feature object combinations. After about 10 minutes (my timeouts are set low) you will see the CLOSE_WAIT connections to 2811 pile up.

This might also be part of the problem: when I either hit 55 connections on the client port (2811) or go above about 150 connections to port 2811 in the CLOSE_WAIT state, the server stops responding.
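A connection shows up in CLOSE_WAIT on the side whose peer has already closed, and it stays there until the local application closes its own socket. A pile-up of CLOSE_WAIT entries therefore points at whichever process owns those endpoints not calling close(). The minimal Python sketch below reproduces the application-level view on a loopback connection; `demonstrate_close_wait` is a hypothetical name, and it only illustrates TCP semantics, not MapGuide's or Tomcat's code.

```python
import socket


def demonstrate_close_wait():
    """Let the 'server' end close while the 'client' end keeps its socket open.

    Returns (eof_seen, still_open). Between the peer's close and our own
    close(), the operating system keeps the client endpoint in CLOSE_WAIT.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=5.0)
    server_end, _ = listener.accept()
    listener.close()

    server_end.close()                   # peer closes: its FIN reaches the client
    eof_seen = client.recv(1024) == b""  # the FIN shows up as a zero-length read
    still_open = client.fileno() != -1   # nothing has released the local socket yet
    # At this point netstat would list the client endpoint as CLOSE_WAIT.
    client.close()                       # the call a leaking client never makes
    return eof_seen, still_open
```

A client that reads the EOF but never reaches the final `close()` leaves one CLOSE_WAIT entry behind per connection, which matches the kind of pile-up described above.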

When I remove the DWF, it gets tough to push the number of client connections up. The way I reproduce it is to open the AJAX viewer, wait for the DWF (base layer) and feature objects to load, then close it and reopen it. If I do this about 10 to 15 times, the connections pile up and the server crashes.

Is there any way you can suggest to snoop on the traffic between Tomcat and MapGuide on port 2811?
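One way to watch that traffic is a small logging relay: it listens on a spare local port, forwards every connection to the real server port, and records when each connection opens and closes plus the direction and size of each chunk it passes along. This sketch assumes you can point the web tier at the relay's port instead of 2811 (how to configure that is not covered here), and because the wire format is not documented in this ticket it logs only byte counts. `start_logging_relay` and its helpers are hypothetical, not part of MapGuide; each relayed connection uses two threads, which suits debugging rather than production load.

```python
import socket
import threading


def _pump(src, dst, label, log, lock):
    """Copy bytes from src to dst until EOF, recording (label, size) for each chunk."""
    try:
        while True:
            chunk = src.recv(4096)
            if not chunk:
                break
            with lock:
                log.append((label, len(chunk)))
            dst.sendall(chunk)
    except OSError:
        pass  # the connection was torn down from the other direction
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # pass the half-close along
        except OSError:
            pass


def _relay(client, target, log, lock):
    """Relay one accepted client to target in both directions, then close both sockets."""
    try:
        upstream = socket.create_connection(target)
    except OSError:
        client.close()
        return
    with lock:
        log.append(("open", 0))
    threads = [
        threading.Thread(target=_pump, args=(client, upstream, "to-server", log, lock)),
        threading.Thread(target=_pump, args=(upstream, client, "to-client", log, lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Close explicitly so the relay itself never leaves CLOSE_WAIT entries behind.
    upstream.close()
    client.close()
    with lock:
        log.append(("close", 0))  # opens without matching closes = stuck connections


def start_logging_relay(listen_port, target, log):
    """Listen on 127.0.0.1:listen_port (0 = any free port) and relay to target=(host, port).

    Returns the port actually bound. The accept loop runs on a daemon thread
    for the lifetime of the process.
    """
    lock = threading.Lock()
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", listen_port))
    listener.listen(64)

    def accept_loop():
        while True:
            try:
                client, _ = listener.accept()
            except OSError:
                return
            threading.Thread(target=_relay, args=(client, target, log, lock), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener.getsockname()[1]
```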

Also, I am curious how the base layer rendering service works. It should be using the Apache mgmapagent.so file to get the base layer data, right?

Does the mgmapagent make a connection to the MapGuide client port?

It seems like some connections that get opened to 2811 get closed while others don't and I am trying to figure out how to reduce the scope of where I should look for the problem.

Thanks for checking

comment:19 by amitmarty, 16 years ago

Zac / Bruce, I want to follow the source code to try to find where the problem might be occurring. Here is my plan of action. Since I am using the Java API, all my connections come from the Tomcat server (at least the ones in the CLOSE_WAIT state). Can you give me hints, or point me to documentation, on how I can attach to the Tomcat process from Visual Studio, and on where the calls that open connections to mgserver are initiated? The connections for me would be:

DWF Base Map connections
Feature Object Connections from Mysql DB
Tile Cache Connections.

I am only using the AJAX client for now. I appreciate the feedback.

comment:20 by amitmarty, 16 years ago

Severity:	critical → trivial

After going through a lot of permutations and combinations, I finally replaced my MySQL connector JAR file with an older version, and I am glad to say the problem has gone away. I have run JMeter tests with up to 64 users accessing the application in parallel, and the 2811 connections from Tomcat have not gone above 10.

I am changing the severity back to trivial, as in the original ticket opened by someone else.

Thank You

comment:21 by amitmarty, 16 years ago

I guess I spoke too soon.

http://www.mail-archive.com/commons-httpclient-dev@jakarta.apache.org/msg04338.html

Above is a link to a problem with closing sockets in the HttpClient class.

And below is one in the JDK:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6215050

Could either of them be part of the issue?

comment:22 by jbirch, 16 years ago

"sftp" just posted something on MapGuide-Users that may be related to this:

http://tinyurl.com/b9ggh7

Basically, they're saying that because MapGuide uses ACE's ACE_WFMO_Reactor, it can only handle 64 (the following link says 62) active connections:

http://tinyurl.com/d7h6yx
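The underlying cap comes from Windows itself: WaitForMultipleObjects can wait on at most MAXIMUM_WAIT_OBJECTS (64) handles per call, and the linked message gives 62 for ACE_WFMO_Reactor. The toy model below (not ACE or MapGuide code) illustrates why leaked connections would eventually make a fixed-size handle table refuse new work while existing connections keep running. `FixedSlotReactor`, `simulate`, and the 3 slots assumed to be in use up front (a stand-in for listening sockets or any other handles the reactor watches) are hypothetical; the 8 connections per session comes from comment:4.

```python
class FixedSlotReactor:
    """Toy handle table with a hard capacity; not ACE code."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.handles = set()

    def register(self, handle):
        """Take a slot for `handle`; return False if every slot is already taken."""
        if len(self.handles) >= self.capacity:
            return False
        self.handles.add(handle)
        return True

    def unregister(self, handle):
        self.handles.discard(handle)


def simulate(capacity, reserved, sessions, opened_per_session, leaked_per_session):
    """Run viewer sessions that each open connections, then close all but a few.

    Returns the 0-based index of the first session that could not register all
    of its connections, or None if every session fit.
    """
    reactor = FixedSlotReactor(capacity)
    next_handle = 0
    for _ in range(reserved):  # slots already in use before any client arrives (assumed)
        reactor.register(next_handle)
        next_handle += 1
    for session in range(sessions):
        opened = []
        for _ in range(opened_per_session):
            if not reactor.register(next_handle):
                return session  # table full: this session cannot be served
            opened.append(next_handle)
            next_handle += 1
        # Close everything except the leaked handles, which stay registered forever.
        for handle in opened[leaked_per_session:]:
            reactor.unregister(handle)
    return None
```

Under these assumptions, `simulate(62, 3, 100, 8, 1)` returns 52: with one connection leaked per session, the 53rd session is the first that cannot get all 8 of its connections, whereas with no leaks (`leaked_per_session=0`) every session fits.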

comment:23 by jbirch, 16 years ago

More email on this in the internals group:

http://tinyurl.com/ddnsyw

I wonder if having the connections set to 100 in the serverconfig.ini file has anything to do with this. It's odd that the connections are being held open, though; I wonder if the PHP API isn't closing them correctly. Time for more blind testing, I guess. First I'm going to try lowering the client connections setting in serverconfig.ini, then I'll try running PHP in CGI mode instead of ISAPI, and finally I'll try the same with the MapAgent.

comment:24 by brucedechant, 16 years ago

Milestone:	→ 2.1
Resolution:	→ fixed
Status:	new → closed

Fixed.

See submission r3847

comment:25 by trevorwekel, 16 years ago

Backported to adsk 2.1 under http://trac.osgeo.org/mapguide/changeset/3855

Backported to 2.0.x under http://trac.osgeo.org/mapguide/changeset/3856
