Opened 8 years ago

Closed 8 years ago

#2596 closed defect (fixed)

Constant crashes under high load/many concurrent requests

Reported by: andymorf Owned by:
Priority: high Milestone: 3.1
Component: Server Version: 3.0.0
Severity: blocker Keywords: crash, high load
Cc: Andreas, Morf External ID:

Description

Firing 100-200 concurrent QUERYMAPFEATURES requests (as issued for map tips) at the mapagent leads to constant crashing of mgserver. Before the crash, lots of exceptions are logged:

 Error: Invalid argument(s):
        String argument is empty: className
 StackTrace:
  - MgRenderingServiceHandler.ProcessOperation line 83 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\rendering\RenderingServiceHandler.cpp
  - MgOpQueryFeatures.Execute line 125 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\rendering\OpQueryFeatures.cpp
  - MgServerRenderingService.QueryFeatures line 1093 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\rendering\ServerRenderingService.cpp
  - MgServerRenderingService.RenderForSelection line 1826 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\rendering\ServerRenderingService.cpp
  - MgServerFeatureService.SelectFeatures line 451 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\feature\ServerFeatureService.cpp
  - MgServerSelectFeatures.SelectFeatures line 331 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\feature\ServerSelectFeatures.cpp
  - MgServerSelectFeatures::ValidateParam line 826 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\feature\ServerSelectFeatures.cpp	
<2016-06-06T20:41:30> 	8788	MgStress	127.0.0.1	Administrator
 Error: The specified class was not found.
 StackTrace:
  - MgRenderingServiceHandler.ProcessOperation line 83 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\rendering\RenderingServiceHandler.cpp
  - MgOpQueryFeatures.Execute line 125 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\rendering\OpQueryFeatures.cpp
  - MgServerRenderingService.QueryFeatures line 1093 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\rendering\ServerRenderingService.cpp
  - MgServerRenderingService.RenderForSelection line 1826 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\rendering\ServerRenderingService.cpp
  - MgServerDescribeSchema.GetClassDefinition line 1029 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\feature\ServerDescribeSchema.cpp	
<2016-06-06T20:41:30> 	6236	MgStress	127.0.0.1	Administrator
 Error: An exception occurred in FDO component.
        Error occurred in Feature Source (Library://oradata/av.FeatureSource): c_KgOraSelectCommand.Execute : ERROR: FindClassDefinition() return NULL  (Cause: , Root Cause: c_KgOraSelectCommand.Execute : ERROR: FindClassDefinition() return NULL )
 StackTrace:
  - MgRenderingServiceHandler.ProcessOperation line 83 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\rendering\RenderingServiceHandler.cpp
  - MgOpQueryFeatures.Execute line 125 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\rendering\OpQueryFeatures.cpp
  - MgServerRenderingService.QueryFeatures line 1093 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\rendering\ServerRenderingService.cpp
  - MgServerRenderingService.RenderForSelection line 1826 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\rendering\ServerRenderingService.cpp
  - MgServerFeatureService.SelectFeatures line 451 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\feature\ServerFeatureService.cpp
  - MgServerSelectFeatures.SelectFeatures line 331 file c:\working\build_area\mapguide\3.1.0\x64\mgdev\server\src\services\feature\ServerSelectFeatures.cpp	
  • If the requests are sent one after another, every one of them executes and returns a correct response.
  • The KingOra provider is used.
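
For anyone trying to reproduce the concurrency pattern described above, here is a minimal harness sketch. It only models the "many calls at the same moment" part: `FireConcurrently` and the `request` callback are hypothetical names, and the callback stands in for whatever code issues a single QUERYMAPFEATURES request (the HTTP part is not shown and is left to the caller).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs `request` once on each of `threadCount` threads, releasing them all at
// roughly the same moment, and returns how many calls reported failure.
// `request` is a hypothetical stand-in for one QUERYMAPFEATURES call; it
// receives the worker index and returns true on success.
std::size_t FireConcurrently(std::size_t threadCount,
                             const std::function<bool(std::size_t)>& request)
{
    assert(threadCount > 0);
    std::atomic<bool> go{false};
    std::atomic<std::size_t> failures{0};
    std::vector<std::thread> workers;
    workers.reserve(threadCount);

    for (std::size_t i = 0; i < threadCount; ++i)
    {
        workers.emplace_back([&, i] {
            // Wait until every worker has been created, so the calls overlap.
            while (!go.load(std::memory_order_acquire))
                std::this_thread::yield();
            if (!request(i))
                failures.fetch_add(1, std::memory_order_relaxed);
        });
    }

    go.store(true, std::memory_order_release);
    for (auto& w : workers)
        w.join();
    return failures.load();
}
```

Calling it with a thread count of 100-200 matches the load described in this ticket. The callback must not throw, since an exception escaping a worker thread terminates the process.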

Attachments (2)

2596.patch (93.2 KB ) - added by jng 8 years ago.
Proposed patch
2596_v4.patch (4.0 KB ) - added by jng 8 years ago.


Change History (7)

comment:1 by jng, 8 years ago

Do you have any load testing code / data for reproducing this?

comment:2 by jng, 8 years ago

My current findings thus far.

I was finally able to reproduce this problem and strongly suspect that it is some kind of thread-safety issue around strings.

This error from the provider

c_KgOraSelectCommand.Execute : ERROR: FindClassDefinition() return NULL

is most likely because it was fed a corrupted string as a result of intense multi-threaded usage.

I found an old mapguide-internals thread which highlighted some pitfalls around STL string usage in a multi-threaded environment (http://osgeo-org.1560.x6.nabble.com/std-string-not-thread-safe-on-Linux-td4210940i20.html)

This leads me to suspect a thread-safety issue around MgUtil and its static sm_classNameQualifier STL string member (https://trac.osgeo.org/mapguide/browser/trunk/MgDev/Common/Foundation/System/Util.cpp#L1232). I think the corruption is being introduced by multiple threads concatenating from that same static STL string.

Why do I think this? Because this is the file where VS generally breaks when the access violations start in my load tests. The other places where it breaks on an access violation also happen to involve an FDO class name that would have gone through MgUtil one way or another.
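
To illustrate the kind of change this suspicion points at, here is a minimal sketch and not the contents of the attached patch. It assumes the qualifier is the ":" separator of an FDO-style "Schema:Class" name; `kClassNameQualifier` and `QualifyClassName` are hypothetical names. The idea is to keep the shared separator as a plain constant array rather than a static STL string object, and to build each result in a string owned by the calling thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the shared qualifier (assumed to be ":").
// A constexpr character array has no reference count or internal buffer
// for threads to race on.
static constexpr wchar_t kClassNameQualifier[] = L":";

// Builds "schema:class" into a string owned solely by the calling thread.
// Returns the bare class name when no schema name is given.
std::wstring QualifyClassName(const std::wstring& schemaName,
                              const std::wstring& className)
{
    assert(!className.empty());
    if (schemaName.empty())
        return className;

    std::wstring qualified;
    qualified += schemaName;
    qualified += kClassNameQualifier;
    qualified += className;
    return qualified;
}
```

Whether this matters in practice depends on the standard library in use: the mailing-list thread linked above concerns implementations where copies of a string can share one buffer, which is where concurrent use of a single static string becomes risky.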

by jng, 8 years ago

Attachment: 2596.patch added

Proposed patch

comment:3 by andymorf, 8 years ago

The last few tests of mgserver (release) with a debugger attached always crashed within geos::io::WKTReader.read(), and I stumbled over http://stackoverflow.com/questions/4057319/is-setlocale-thread-safe-function, which states that the setlocale() function used in CLocalizer.cpp operates globally (per process) – so saving and restoring the locale via pointers seemed risky to me.

So as a first attempt I commented out the line “CLocalizer clocale;” in WKTReader.read() and WKTWriter.writeFormatted() and rebuilt geos.dll. Astonishingly, my load tests have not been able to bring down mgserver so far… Simply leaving the locale handling commented out may not be the definitive solution, though.
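
As a hedged illustration of an alternative to switching the process-wide locale around each parse: a stream can be given the classic "C" locale privately, so numbers are read with "." as the decimal point without calling setlocale() at all. This is a generic C++ sketch, not GEOS code, and `ParseNumberClassic` is a hypothetical name.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Parses a decimal number using the classic "C" locale on a private stream.
// Only this stream's locale is changed; setlocale() is never called, so other
// threads are unaffected. Returns false if no number could be read.
bool ParseNumberClassic(const std::string& text, double& value)
{
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    in >> value;
    return !in.fail();
}
```

Because each call owns its stream, concurrent callers do not share any locale state that is being written to.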

comment:4 by jng, 8 years ago

Is there a preprocessor flag in GEOS that we can define so that the code in question in GEOS can be #ifdef'd out?

I don't want to modify Oem sources if we don't have to.

by jng, 8 years ago

Attachment: 2596_v4.patch added

comment:5 by jng, 8 years ago

Resolution: fixed
Status: new → closed

Fixed in trunk (r8986) and 3.0 (r9048).
