Ticket #462 (closed defect: fixed)

Opened 4 years ago

Last modified 3 years ago

Server Crash and Unresponsive using OSGeo.Gdal FDO provider

Reported by: zspitzer Owned by: brucedechant
Priority: medium Milestone: 2.1
Component: General Version: 2.0.2
Severity: blocker Keywords:
Cc: stevedang, brucedechant, haris External ID:

Description

I have just had a 2.0 RC4 server crash and was forced to reboot the entire machine, which also runs a lot of other applications.

I was playing with a TIFF via GDAL and the server crashed with the following stack trace. The server remained up afterwards, but the service would not respond to a stop request and was left hung; nothing was written to the error log afterward.

I have seen similar behaviour with 1.2, but I wasn't able to reproduce the error reliably.

<2008-02-26T17:59:07> Administrator

Error: A file IO exception occurred: C:\Program Files\MapGuideOpenSource2.0\Server\Repositories\TileCache\a00059a0-ffff-ffff-8000-00188b8b4aed_en_7F0000010B060B050B04_MapDefinition3/S3/Base Layer Group/R0/C0/3_0.png StackTrace:

<2008-02-26T17:59:09> Administrator

Error: Failed to stylize layer: True Marble

An unclassified exception occurred.

StackTrace:

An unclassified exception occurred.

<2008-02-26T17:59:10> Administrator

Error: Failed to stylize layer: True Marble

An unclassified exception occurred.

StackTrace:

An unclassified exception occurred.

Attachments

mapguide_raster_unalloc.5.patch (1.6 KB) - added by jbirch 3 years ago.
mapguide_raster_stability.patch (1.0 KB) - added by jbirch 3 years ago.
grfp_addref.patch (6.1 KB) - added by traianstanev 3 years ago.
GDAL raster provider patch to addref the FDO connection from the FeatureReader
gdal_nomutex_cacheschema.patch (12.1 KB) - added by traianstanev 3 years ago.
Fix refcounting problem + cache schema

Change History

Changed 4 years ago by tomfukushima

Please try editing your serverconfig.ini file and change the line that reads DataConnectionPoolSizeCustom = to DataConnectionPoolSizeCustom = OSGeo.Gdal:1

and report whether this resolves your problem. Thanks, Tom

Changed 4 years ago by tomfukushima

Wow, what ugly formatting, let's try again. Change the line that reads

DataConnectionPoolSizeCustom =

to

DataConnectionPoolSizeCustom = OSGeo.Gdal:1

Changed 4 years ago by zspitzer

Just tried that. It seems better, but I'm still getting this error, after which most layers (SDF, not raster) don't render properly until I restart the service.

The service hasn't locked up again since the change

<2008-02-27T13:07:09> Administrator

Error: Failed to stylize layer: LayerDefinition35

An unclassified exception occurred.

StackTrace:

An unclassified exception occurred.

Changed 4 years ago by zspitzer

And then I am seeing

Cannot create any more connections to the OSGeo.Gdal FDO provider.

Changed 4 years ago by tomfukushima

The connection problem has been fixed by submission r2978 (submitted after RC4 came out)

Changed 4 years ago by tomfukushima

  • status changed from new to closed
  • resolution set to fixed

Changed 4 years ago by zspitzer

  • status changed from closed to reopened
  • resolution fixed deleted

Still occurring with the final 2.0.0 release version

<2008-03-04T02:09:12> Anonymous

Error: A file IO exception occurred: C:\Program Files\MapGuideOpenSource2.0\Server\Repositories\TileCache\RASTER_True Marble_True Marble Tiled EPSG 4283/S3/True Martble/R120/C150/26_12.png StackTrace:

<2008-03-04T09:16:13> Anonymous

Error: An unclassified exception occurred. StackTrace:

<2008-03-04T09:16:14> Anonymous

Error: Failed to stylize layer: True Marble Aust 250m epsg 4283 B

Cannot create any more connections to the OSGeo.Gdal FDO provider.

StackTrace:

  • MgMappingUtil.StylizeLayers line 776 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_23.8\mgdev\server\src\services\mapping\MappingUtil.cpp Failed to stylize layer: True Marble Aust 250m epsg 4283 B

Cannot create any more connections to the OSGeo.Gdal FDO provider.

Changed 4 years ago by zspitzer

  • summary changed from Server Crash and server unresponsive to Server Crash and Unresponsive using OSGeo.Gdal FDO provider

Changed 3 years ago by zspitzer

  • version changed from 2.0.0 to 2.0.2

The problem still occurs with the 2.0.2 release and tiled maps.

Changed 3 years ago by tomfukushima

  • cc stevedang added

We did stability tests with the GDAL provider and GeoTIFF and found that it was stable. Can you try your test again with GeoTIFF (you can use the GDAL utilities to convert to this format) and see if the problem occurs again? We know of a memory leak, but it's only a few KB at a time so it should take a while before that causes memory problems.

Changed 3 years ago by zspitzer

Here is a test case which will crash 2.0.2.

You will need to tweak the path for the raster file, which is unmanaged and is defined as being under c:\data\raster\

 http://ennoble.dreamhosters.com/tests//raster_ticket_462.mgp.zip

Changed 3 years ago by tomfukushima

Hi Zac, please add the steps to recreate the problem once the package is loaded.

Changed 3 years ago by zspitzer

Just open up the basic layout and start to zoom in and pan around.

It doesn't take long for MapGuide to stop rendering tiles.

 http://localhost:8008/mapguide/mapviewerajax/?WEBLAYOUT=Library%3a%2f%2fraster%2fTrueMarble.8km.5400x2700+basic.WebLayout&LOCALE=en&USERNAME=Anonymous&PASSWORD=&

Changed 3 years ago by tomfukushima

Thanks Zac, I couldn't reproduce this using IE, but once I moved to using Google Chrome, a problem showed up right away. I only needed to do a single zoom and then pan. Steve, this is pretty easy to reproduce. Come see me if you want to see it. The server continues to work though, as I can continue to get in through Studio; it seems that the GDAL provider is hooped because I can't do anything with raster.

This is the error that I get:

<2008-11-25T17:34:24> 	Ajax Viewer	144.111.170.90	Anonymous
 Error: Failed to stylize layer: TrueMarble.8km.5400x2700 layer
        An unclassified exception occurred.
 StackTrace:
  - MgMappingUtil.StylizeLayers line 786 file d:\build\mapguide_open_source_v2.0\build_30.11\mgdev\server\src\services\mapping\MappingUtil.cpp	Failed to stylize layer: TrueMarble.8km.5400x2700 layer
An unclassified exception occurred.

Changed 3 years ago by jbirch

There are (at least) three problems with raster in 2.0.2.

The first is a memory leak that has since been fixed in 2.1.

The second is (my non-technical description) writing to unallocated memory (see mapguide_raster_unalloc.5.patch).

The third is a defect in the way that MapGuide deals with single-threaded providers. The attachment mapguide_raster_stability.patch provides a workaround for this defect in conjunction with the GDAL provider.

However, this could conceptually happen with other providers. Haris is looking into the problem in more depth, but in the meantime explained the problem to me as follows, referencing the code around line 660 of MappingUtil.cpp:

Assume two raster layers are accessed at the same time.

  1. A raster connection to layer 1 is created
  2. ExecuteRasterQuery is executed and class Georaster is created, which keeps a pointer to the connection (without adding a ref count)
  3. That thread goes into StylizeLayers
  4. A second thread goes into ExecuteRasterQuery, but is accessing another raster layer so it can't use the same connection
  5. The second thread creates a new connection to the raster provider, but because the pool size for single-threaded providers is limited to 1 (and also because the GDAL provider didn't increment the ref count) the connection manager deletes the first thread's connection
  6. The first thread, which is now in StylizeGridLayer, finds that its connection was deleted and the pointer is gone

Result: Exception and corrupted memory
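
The sequence above can be replayed in a small single-threaded sketch. This is not MapGuide or FDO code: `Connection`, `SingleSlotPool`, `NonOwningReader` and `OwningReader` are hypothetical names, and `std::shared_ptr` / `std::weak_ptr` stand in for the AddRef/Release style ref counting discussed in this ticket. The non-owning reader uses a `weak_ptr` so that the loss of the connection can be observed safely instead of dereferencing freed memory.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a provider connection; not an FDO type.
struct Connection {
    explicit Connection(std::string layer) : layer(std::move(layer)) {}
    std::string layer;
};

// Hypothetical pool that caches a single connection, as for a
// single-threaded provider with a pool size of 1. Opening a connection
// for a different layer drops the pool's reference to the previous one.
class SingleSlotPool {
public:
    std::shared_ptr<Connection> Open(const std::string& layer) {
        if (!slot_ || slot_->layer != layer) {
            slot_ = std::make_shared<Connection>(layer);  // evicts the old one
        }
        return slot_;
    }

private:
    std::shared_ptr<Connection> slot_;
};

// Reader that does not take its own reference (the situation in step 2).
struct NonOwningReader {
    std::weak_ptr<Connection> conn;
    bool ConnectionAlive() const { return !conn.expired(); }
};

// Reader that holds its own reference, so eviction from the pool
// cannot destroy the connection while the reader is still using it.
struct OwningReader {
    std::shared_ptr<Connection> conn;
    bool ConnectionAlive() const { return conn != nullptr; }
};

// Replays steps 1-6 and reports whether the first reader's connection
// is still alive after the second layer is opened.
template <typename Reader>
bool FirstReaderSurvivesEviction() {
    SingleSlotPool pool;
    Reader first{pool.Open("layer1")};  // steps 1-2
    pool.Open("layer2");                // steps 4-5: other layer, slot reused
    return first.ConnectionAlive();     // step 6
}
```

With these assumptions, `FirstReaderSurvivesEviction<NonOwningReader>()` returns false and `FirstReaderSurvivesEviction<OwningReader>()` returns true. The real defect involves two threads and raw pointers, so the actual failure is a crash or memory corruption rather than a clean false.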

Changed 3 years ago by brucedechant

  • cc brucedechant added

Changed 3 years ago by brucedechant

If the FDO connection manager is deleting a connection that is in use, that is a bug. Let me debug the server to see what is happening, because we only see this issue with GDAL. I would like to know where the bug is: either in the handling of single-threaded providers, or in GDAL not reference counting properly.

Changed 3 years ago by jbirch

  • cc haris added

Changed 3 years ago by traianstanev

GDAL raster provider patch to addref the FDO connection from the FeatureReader

Changed 3 years ago by traianstanev

Fix refcounting problem + cache schema

Changed 3 years ago by jbirch

  • milestone changed from 2.0 to 2.1

Changed 3 years ago by brucedechant

  • owner set to brucedechant
  • status changed from reopened to new

Changed 3 years ago by brucedechant

  • status changed from new to assigned

Changed 3 years ago by brucedechant

  • status changed from assigned to closed
  • resolution set to fixed

Fixed.

See changeset r3829.
