Ticket #726 (closed defect: fixed)
MapGuide Stops Listening on 2811 (Client Connections)
| Reported by: | rbranson | Owned by: | |
|---|---|---|---|
| Priority: | high | Milestone: | 2.1 |
| Component: | Server | Version: | 2.0.2 |
| Severity: | trivial | Keywords: | |
| Cc: | brucedechant, trevorwekel | External ID: | |
Description
Here's what I've got going:
* MapGuide 2.0.2 on Windows 2008 Server 32-bit w/2GB RAM
* 4x2.66GHz Core 2 Duo
* Using the installers with Apache
My client "application" is OpenLayers with THREE layers that connect to MapGuide. We need to swap out the content of the "middle" layer in our sandwich, so there are a lot of concurrent connections going on.
The majority of the data set is worldwide vector geography down to 1:500,000. The feature sources are base oceans (just a square out to the extents), water polygons, water lines (very, very small rivers), nations, administrative areas, built-up areas (like metro areas), cities (187k of these), shorelines, national boundary lines, major routes, and minor routes. Layers are created from these. Up to 9 layers are pointed at a single source (in the case of cities, because we have 9 significance levels), and we use some simple filter rules to get the features we want into each layer. We're not using any theming; instead we break anything that would fall under the "theming" concept into separate layers. We just find that easier to manage. The "sandwich" layers are vector RF contours. These are "relatively" simple as they're really only useful down to 1:1,000,000 usually. All of it is in the same projection, and I've double/quadruple verified this. All of it was imported from SHP files and converted to SDF upon import.
I've got one custom little PHP piece that finds cities by name and any RF patterns that spatially match. This was originally a source of problems because I wasn't closing my feature readers. That has been fixed. I thought maybe this was "triggering" the problem, but I could reproduce the problem without touching the PHP script.
My problem is that after some load is applied to the system, MapGuide stops listening on port 2811, requiring a restart of the server process to get it back. The bizarre thing is that the current connections (from the map agent running in the Apache process to the MapGuide server) stay open, so for a little while some of the queries go thru. Eventually they all time out and things go south. The pattern seems to be when I swap that sandwich layer out and then immediately zoom down to a city point (this is replicated from the search results from that custom PHP piece). Vancouver, BC is particularly bad because it happens to pull features out of almost all of the layers. The admin (2810) and site (2812) ports continue to listen.
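For anyone trying to reproduce this, here is a rough sketch of how the three ports could be polled from a script while load is applied. It is plain Python using only the standard `socket` module, not anything from MapGuide. The host, the timeout, and the helper names `is_listening` and `check_ports` are assumptions made for illustration.

```python
import socket

# Ports named in this ticket: admin, client, site.
MAPGUIDE_PORTS = {"admin": 2810, "client": 2811, "site": 2812}


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host="127.0.0.1", ports=MAPGUIDE_PORTS):
    """Map each port name to whether something is accepting connections."""
    return {name: is_listening(host, port) for name, port in ports.items()}
```

Calling `check_ports()` on the server itself returns a dict such as `{"admin": ..., "client": ..., "site": ...}`. In the failure described here, I would expect `client` to flip to `False` while the other two stay `True`, but a successful connect only shows that the listener accepts connections, not that requests are actually being serviced.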
I'm not seeing anything in the error logs. I tried to use the trace logging but nothing really came of that. I spent a couple of hours tuning the serverconfig.ini and I could find ways to make some small progress, but the port would eventually just stop listening. Most of my focus was playing with the thread, queue, and max connections settings for both client and site, as well as the data connection pooling / caching features. I just reverted back to using the default serverconfig as a "control." I tried to single out a layer that was causing problems, but none of them appeared to be the cause. Even taking the majority of the layers out would only "put off" the problem for a little while; eventually, after enough requests, it would surface.
I spent some time burning thru the source code and learning how it was all interconnected to see why it would just suddenly stop listening on a port, but I eventually got down in abstraction to the ACE framework and my morale sank. ACE seems really nifty, but tracing thru their source code was a little daunting :)
The only workaround that actually succeeded was editing the Apache configuration to limit the ThreadsPerChild value. I believe the default was 20. I dropped this to 12 and inched it up to 16 without problems. This is where I left it because the performance was ACCEPTABLE. Granted, it was a bit better with the default (better parallelism), but slower is better than eventual crashing.
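As a sketch, the change amounts to a single directive in the Apache configuration. Which file holds it (httpd.conf or an included MPM file) depends on how the installer lays things out, so treat the placement as an assumption:

```apache
# Sketch only. 16 is the value that held up under load in this ticket.
# The file this lives in is an assumption; check the installed Apache config.
ThreadsPerChild 16
```

Apache needs a restart for the new value to take effect.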
So I've "fixed" the problem, but it seems somewhat like a cop-out on my part. Is this some kind of locking problem with too many FDO connections trying to access the same data source? I'd love to help in any way.

