
Ticket #462 (closed defect: fixed)

Opened 2 years ago

Last modified 10 months ago

Server Crash and Unresponsive using OSGeo.Gdal FDO provider

Reported by: zspitzer Assigned to: brucedechant
Priority: medium Milestone: 2.1
Component: General Version: 2.0.2
Severity: blocker Keywords:
Cc: stevedang, brucedechant, haris External ID:

Description

I have just had a 2.0 RC4 server crash and was forced to reboot the entire machine, which also runs a lot of other applications.

I was working with a TIFF via GDAL when the server crashed with the following stack trace. The server remained up afterwards, but the service would not respond to a stop request and was left hung; nothing was written to the error log afterward.

I have seen similar behaviour with 1.2, but I wasn't able to reproduce the error reliably.

<2008-02-26T17:59:07> Administrator

Error: A file IO exception occurred: C:\Program Files\MapGuideOpenSource2.0\Server\Repositories\TileCache\a00059a0-ffff-ffff-8000-00188b8b4aed_en_7F0000010B060B050B04_MapDefinition3/S3/Base Layer Group/R0/C0/3_0.png

StackTrace:

  • MgTileServiceHandler.ProcessOperation line 83 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\server\src\services\tile\TileServiceHandler.cpp
  • MgOpGetTile.Execute line 150 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\server\src\services\tile\OpGetTile.cpp
  • MgServerTileService.GetTile line 263 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\server\src\services\tile\ServerTileService.cpp
  • MgByteSink::ToFile line 245 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\common\foundation\Data/ByteSink.cpp
  • MgByteSink.ToFile line 220 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\common\foundation\Data/ByteSink.cpp A file IO exception occurred: C:\Program Files\MapGuideOpenSource2.0\Server\Repositories\TileCache\a00059a0-ffff-ffff-8000-00188b8b4aed_en_7F0000010B060B050B04_MapDefinition3/S3/Base Layer Group/R0/C0/3_0.png

<2008-02-26T17:59:09> Administrator

Error: Failed to stylize layer: True Marble

An unclassified exception occurred.

StackTrace:

An unclassified exception occurred.

<2008-02-26T17:59:10> Administrator

Error: Failed to stylize layer: True Marble

An unclassified exception occurred.

StackTrace:

An unclassified exception occurred.

Attachments

mapguide_raster_unalloc.5.patch (1.6 kB) - added by jbirch on 03/27/2009 02:29:33 AM.
mapguide_raster_stability.patch (1.0 kB) - added by jbirch on 03/27/2009 02:30:11 AM.
grfp_addref.patch (6.1 kB) - added by traianstanev on 03/29/2009 01:27:03 AM.
GDAL raster provider patch to addref the FDO connection from the FeatureReader
gdal_nomutex_cacheschema.patch (12.1 kB) - added by traianstanev on 03/29/2009 01:07:47 PM.
Fix refcounting problem + cache schema

Change History

02/26/2008 08:40:03 PM changed by tomfukushima

Please try editing your serverconfig.ini file and change the line that reads DataConnectionPoolSizeCustom = to DataConnectionPoolSizeCustom = OSGeo.Gdal:1

and report whether this resolves your problem. Thanks, Tom

02/26/2008 08:41:13 PM changed by tomfukushima

Wow, what ugly formatting, let's try again. Change the line that reads

DataConnectionPoolSizeCustom =

to

DataConnectionPoolSizeCustom = OSGeo.Gdal:1

02/26/2008 09:16:28 PM changed by zspitzer

Just tried that. It seems better, but I'm still getting this error, after which most layers (SDF, not raster) don't render properly until I restart the service.

The service hasn't locked up again since the change

<2008-02-27T13:07:09> Administrator

Error: Failed to stylize layer: LayerDefinition35

An unclassified exception occurred.

StackTrace:

An unclassified exception occurred.

02/26/2008 10:36:03 PM changed by zspitzer

And then I am seeing:

Cannot create any more connections to the OSGeo.Gdal FDO provider.

02/26/2008 11:27:52 PM changed by tomfukushima

The connection problem has been fixed by submission r2978 (submitted after RC4 came out)

02/28/2008 11:16:54 PM changed by tomfukushima

  • status changed from new to closed.
  • resolution set to fixed.

03/03/2008 09:49:39 PM changed by zspitzer

  • status changed from closed to reopened.
  • resolution deleted.

Still occurring with the final 2.0.0 release version

<2008-03-04T02:09:12> Anonymous

Error: A file IO exception occurred: C:\Program Files\MapGuideOpenSource2.0\Server\Repositories\TileCache\RASTER_True Marble_True Marble Tiled EPSG 4283/S3/True Martble/R120/C150/26_12.png

StackTrace:

  • MgTileServiceHandler.ProcessOperation line 83 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_23.8\mgdev\server\src\services\tile\TileServiceHandler.cpp
  • MgOpGetTile.Execute line 150 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_23.8\mgdev\server\src\services\tile\OpGetTile.cpp
  • MgServerTileService.GetTile line 263 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_23.8\mgdev\server\src\services\tile\ServerTileService.cpp
  • MgByteSink::ToFile line 245 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_23.8\mgdev\common\foundation\Data/ByteSink.cpp
  • MgByteSink.ToFile line 220 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_23.8\mgdev\common\foundation\Data/ByteSink.cpp A file IO exception occurred: C:\Program Files\MapGuideOpenSource2.0\Server\Repositories\TileCache\RASTER_True Marble_True Marble Tiled EPSG 4283/S3/True Martble/R120/C150/26_12.png

<2008-03-04T09:16:13> Anonymous

Error: An unclassified exception occurred.

StackTrace:

<2008-03-04T09:16:14> Anonymous

Error: Failed to stylize layer: True Marble Aust 250m epsg 4283 B

Cannot create any more connections to the OSGeo.Gdal FDO provider.

StackTrace:

  • MgMappingUtil.StylizeLayers line 776 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_23.8\mgdev\server\src\services\mapping\MappingUtil.cpp Failed to stylize layer: True Marble Aust 250m epsg 4283 B

Cannot create any more connections to the OSGeo.Gdal FDO provider.

03/03/2008 09:54:08 PM changed by zspitzer

  • summary changed from Server Crash and server unresponsive to Server Crash and Unresponsive using OSGeo.Gdal FDO provider.

09/24/2008 12:47:20 AM changed by zspitzer

  • version changed from 2.0.0 to 2.0.2.

Problem still occurs with 2.0.2 release & tiled maps

11/25/2008 04:13:51 PM changed by tomfukushima

  • cc set to stevedang.

We did stability tests with the GDAL provider and GeoTIFF and found that it was stable. Can you try your test again with GeoTIFF (you can use the GDAL utilities to convert to this format) and see if the problem occurs again? We know of a memory leak, but it's only a few KB at a time so it should take a while before that causes memory problems.

11/25/2008 04:23:12 PM changed by zspitzer

A test case that will crash 2.0.2 is linked below.

You will need to tweak the path for the raster file, which is unmanaged and is defined as being under c:\data\raster\

http://ennoble.dreamhosters.com/tests//raster_ticket_462.mgp.zip

11/25/2008 06:39:24 PM changed by tomfukushima

Hi Zac, please add the steps to recreate the problem once the package is loaded.

11/25/2008 06:50:38 PM changed by zspitzer

Just open up the basic layout and start to zoom in and pan around.

It doesn't take long for MapGuide to stop rendering tiles.

http://localhost:8008/mapguide/mapviewerajax/?WEBLAYOUT=Library%3a%2f%2fraster%2fTrueMarble.8km.5400x2700+basic.WebLayout&LOCALE=en&USERNAME=Anonymous&PASSWORD=&

11/25/2008 07:40:17 PM changed by tomfukushima

Thanks Zac, I couldn't reproduce this using IE, but once I moved to using Google Chrome, a problem showed up right away. I only needed to do a single zoom and then pan. Steve, this is pretty easy to reproduce. Come see me if you want to see it. The server continues to work though, as I can continue to get in through Studio; it seems that the GDAL provider is hooped because I can't do anything with raster.

This is the error that I get:

<2008-11-25T17:34:24> 	Ajax Viewer	144.111.170.90	Anonymous
 Error: Failed to stylize layer: TrueMarble.8km.5400x2700 layer
        An unclassified exception occurred.
 StackTrace:
  - MgMappingUtil.StylizeLayers line 786 file d:\build\mapguide_open_source_v2.0\build_30.11\mgdev\server\src\services\mapping\MappingUtil.cpp	Failed to stylize layer: TrueMarble.8km.5400x2700 layer
An unclassified exception occurred.

03/27/2009 02:29:33 AM changed by jbirch

  • attachment mapguide_raster_unalloc.5.patch added.

03/27/2009 02:30:11 AM changed by jbirch

  • attachment mapguide_raster_stability.patch added.

03/27/2009 03:03:36 AM changed by jbirch

There are (at least) three problems with raster in 2.0.2.

The first is a memory leak that has since been fixed in 2.1.

The second is (my non-technical description) writing to unallocated memory. (see mapguide_raster_unalloc.5.patch)

The third is a defect in the way that MapGuide deals with single-threaded providers. The attachment mapguide_raster_stability.patch provides a workaround for this defect in conjunction with the GDAL provider.

However, this could conceptually happen with other providers. Haris is looking into the problem more in depth, but in the meantime explained the problem to me as follows, referencing the code around Line 660 of MappingUtil.cpp :

Assume two raster layers are accessed at the same time.

  1. The raster connection to Layer 1 is created
  2. ExecuteRasterQuery is executed and a Georaster class instance is created, which keeps a pointer to the connection (without adding to the ref count)
  3. That thread goes into StylizeLayers
  4. A second thread goes into ExecuteRasterQuery, but it is accessing another raster layer so it can't use the same connection
  5. The second thread creates a new connection to the raster provider, but because the pool size for single-threaded providers is limited to 1 (and also because the GDAL provider didn't increment the ref count) the connection manager deletes the first thread's connection
  6. The first thread, which is now in StylizeGridLayer, finds that its connection was deleted and the pointer is gone

Result: Exception and corrupted memory
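
The six steps above can be replayed in a few lines of C++. This is only a sketch under stated assumptions: `Connection`, `RasterReader`, `SingleSlotPool` and `ReplayTwoLayerScenario` are hypothetical stand-ins, not FDO or MapGuide classes, and the two threads are simulated by running their steps in order on one thread (the plain `int` count would not be safe under real concurrency). It illustrates the kind of change the grfp_addref.patch description suggests, where the reader takes its own reference so the pool dropping its reference no longer destroys the connection.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical intrusive ref-counted connection (not an FDO class).
class Connection {
public:
    explicit Connection(std::string layer) : layer_(std::move(layer)) {}
    void AddRef() { ++refs_; }
    void Release() {
        assert(refs_ > 0);
        if (--refs_ == 0) delete this;  // last owner gone: destroy
    }
    const std::string& Layer() const { return layer_; }
private:
    ~Connection() = default;  // destruction only through Release()
    std::string layer_;
    int refs_ = 1;            // the creator holds the first reference
};

// Hypothetical reader that takes its own reference to the connection.
// Without the AddRef() below, step 5 would leave conn_ dangling.
class RasterReader {
public:
    explicit RasterReader(Connection* c) : conn_(c) { conn_->AddRef(); }
    ~RasterReader() { conn_->Release(); }
    RasterReader(const RasterReader&) = delete;
    RasterReader& operator=(const RasterReader&) = delete;
    const std::string& Layer() const { return conn_->Layer(); }
private:
    Connection* conn_;
};

// Hypothetical pool limited to one cached connection.
class SingleSlotPool {
public:
    ~SingleSlotPool() { if (slot_) slot_->Release(); }
    Connection* Open(const std::string& layer) {
        if (slot_) slot_->Release();    // pool gives up the old connection
        slot_ = new Connection(layer);  // and caches the new one
        return slot_;
    }
private:
    Connection* slot_ = nullptr;
};

// Replays the scenario; returns the layer name seen by the first reader.
std::string ReplayTwoLayerScenario() {
    SingleSlotPool pool;
    RasterReader reader1(pool.Open("Layer 1"));  // steps 1-2
    pool.Open("Layer 2");                        // steps 4-5
    return reader1.Layer();                      // step 6: still valid
}
```

Because the reader holds a reference, the pool's `Release()` in step 5 only lowers the count, and the connection is destroyed when the reader itself goes out of scope.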

03/27/2009 11:59:06 AM changed by brucedechant

  • cc changed from stevedang to stevedang, brucedechant.

03/27/2009 12:26:17 PM changed by brucedechant

If the FDO connection manager is deleting a connection that is in use, that is a bug. Let me debug the server to see what is happening, because we only see this issue with GDAL. I would like to know where the bug is: either in the handling of single-threaded providers, or in GDAL not reference counting properly.
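
The manager-side half of that question can also be sketched. The following assumes a hypothetical `GuardedPool` built on `std::shared_ptr`, which is not how the MapGuide connection manager is implemented; it only shows what "never delete a connection in use" could look like for a pool of size 1. The pool refuses to replace a connection while anyone else still holds it and reports an error instead. The `use_count()` check is only reliable here because the sketch is single-threaded.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical connection type (not an FDO class).
struct Connection {
    std::string layer;
};

// Hypothetical pool of size 1 that never drops a connection in use.
class GuardedPool {
public:
    std::shared_ptr<Connection> Open(const std::string& layer) {
        if (slot_ && slot_->layer == layer) return slot_;  // reuse
        if (slot_ && slot_.use_count() > 1) {
            // Someone besides the pool still holds the connection.
            throw std::runtime_error("Cannot create any more connections");
        }
        slot_ = std::make_shared<Connection>(Connection{layer});
        return slot_;
    }
private:
    std::shared_ptr<Connection> slot_;
};

// A second layer is refused while the first connection is still held.
bool SecondLayerIsRefusedWhileFirstInUse() {
    GuardedPool pool;
    auto first = pool.Open("Layer 1");
    try {
        pool.Open("Layer 2");
    } catch (const std::runtime_error&) {
        return first->layer == "Layer 1";  // first connection untouched
    }
    return false;
}

// Once the first holder lets go, the slot can be reused.
bool SlotIsReusedAfterRelease() {
    GuardedPool pool;
    { auto first = pool.Open("Layer 1"); }
    return pool.Open("Layer 2")->layer == "Layer 2";
}
```

With this design a caller sees an error rather than a dangling pointer, so the failure is recoverable instead of corrupting memory.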

03/27/2009 12:48:44 PM changed by jbirch

  • cc changed from stevedang, brucedechant to stevedang, brucedechant, haris.

03/29/2009 01:27:03 AM changed by traianstanev

  • attachment grfp_addref.patch added.

GDAL raster provider patch to addref the FDO connection from the FeatureReader

03/29/2009 01:07:47 PM changed by traianstanev

  • attachment gdal_nomutex_cacheschema.patch added.

Fix refcounting problem + cache schema

04/04/2009 06:38:20 PM changed by jbirch

  • milestone changed from 2.0 to 2.1.

04/20/2009 04:02:46 PM changed by brucedechant

  • owner set to brucedechant.
  • status changed from reopened to new.

04/20/2009 04:26:05 PM changed by brucedechant

  • status changed from new to assigned.

04/20/2009 04:30:06 PM changed by brucedechant

  • status changed from assigned to closed.
  • resolution set to fixed.

Fixed.

See changeset r3829.