Opened 10 years ago

Last modified 5 years ago

#462 reopened defect

GDAL FDO provider stability issues (patch)

Reported by: zspitzer
Owned by: brucedechant
Priority: medium
Milestone: 3.0
Component: Rendering Service
Version:
Severity: blocker
Keywords:
Cc: stevedang, brucedechant, haris
External ID:

Description

I have just had a 2.0 RC4 server crash and was forced to reboot the entire machine, which runs a lot of other applications.

I was playing with a TIFF via GDAL and the server crashed with the following stack trace. The server remained up afterwards, but the service would not respond to a stop request and was left hung. Nothing was written to the error log after that.

I have seen similar behaviour with 1.2, but I wasn't able to reproduce the error reliably.

<2008-02-26T17:59:07> Administrator

Error: A file IO exception occurred: C:\Program Files\MapGuideOpenSource2.0\Server\Repositories\TileCache\a00059a0-ffff-ffff-8000-00188b8b4aed_en_7F0000010B060B050B04_MapDefinition3/S3/Base Layer Group/R0/C0/3_0.png
StackTrace:

  • MgTileServiceHandler.ProcessOperation line 83 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\server\src\services\tile\TileServiceHandler.cpp
  • MgOpGetTile.Execute line 150 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\server\src\services\tile\OpGetTile.cpp
  • MgServerTileService.GetTile line 263 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\server\src\services\tile\ServerTileService.cpp
  • MgByteSink::ToFile line 245 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\common\foundation\Data/ByteSink.cpp
  • MgByteSink.ToFile line 220 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\common\foundation\Data/ByteSink.cpp A file IO exception occurred: C:\Program Files\MapGuideOpenSource2.0\Server\Repositories\TileCache\a00059a0-ffff-ffff-8000-00188b8b4aed_en_7F0000010B060B050B04_MapDefinition3/S3/Base Layer Group/R0/C0/3_0.png

<2008-02-26T17:59:09> Administrator

Error: Failed to stylize layer: True Marble

An unclassified exception occurred.

StackTrace:

An unclassified exception occurred.

<2008-02-26T17:59:10> Administrator

Error: Failed to stylize layer: True Marble

An unclassified exception occurred.

StackTrace:

An unclassified exception occurred.

Attachments (4)

mapguide_raster_unalloc.5.patch (1.6 KB) - added by jbirch 9 years ago.
mapguide_raster_stability.patch (976 bytes) - added by jbirch 9 years ago.
grfp_addref.patch (6.1 KB) - added by traianstanev 9 years ago.
GDAL raster provider patch to addref the FDO connection from the FeatureReader
gdal_nomutex_cacheschema.patch (12.1 KB) - added by traianstanev 9 years ago.
Fix refcounting problem + cache schema


Change History (32)

comment:1 Changed 10 years ago by tomfukushima

Please try editing your serverconfig.ini file and change the line that reads DataConnectionPoolSizeCustom = to DataConnectionPoolSizeCustom = OSGeo.Gdal:1

and report whether this resolves your problem. Thanks, Tom

comment:2 Changed 10 years ago by tomfukushima

Wow, what ugly formatting, let's try again. Change the line that reads

DataConnectionPoolSizeCustom =

to

DataConnectionPoolSizeCustom = OSGeo.Gdal:1

comment:3 Changed 10 years ago by zspitzer

Just tried that. It seems better, but I'm still getting this error, after which most layers (SDF, not raster) don't render properly until I restart the service.

The service hasn't locked up again since the change.

<2008-02-27T13:07:09> Administrator

Error: Failed to stylize layer: LayerDefinition35

An unclassified exception occurred.

StackTrace:

An unclassified exception occurred.

comment:4 Changed 10 years ago by zspitzer

And then I am seeing:

Cannot create any more connections to the OSGeo.Gdal FDO provider.

comment:5 Changed 10 years ago by tomfukushima

The connection problem has been fixed by submission r2978 (submitted after RC4 came out)

comment:6 Changed 10 years ago by tomfukushima

Resolution: fixed
Status: new → closed

comment:7 Changed 10 years ago by zspitzer

Resolution: fixed
Status: closed → reopened

Still occurring with the final 2.0.0 release version

<2008-03-04T02:09:12> Anonymous

Error: A file IO exception occurred: C:\Program Files\MapGuideOpenSource2.0\Server\Repositories\TileCache\RASTER_True Marble_True Marble Tiled EPSG 4283/S3/True Martble/R120/C150/26_12.png
StackTrace:

<2008-03-04T09:16:13> Anonymous

Error: An unclassified exception occurred.
StackTrace:

<2008-03-04T09:16:14> Anonymous

Error: Failed to stylize layer: True Marble Aust 250m epsg 4283 B

Cannot create any more connections to the OSGeo.Gdal FDO provider.

StackTrace:

  • MgMappingUtil.StylizeLayers line 776 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_23.8\mgdev\server\src\services\mapping\MappingUtil.cpp Failed to stylize layer: True Marble Aust 250m epsg 4283 B

Cannot create any more connections to the OSGeo.Gdal FDO provider.

comment:8 Changed 10 years ago by zspitzer

Summary: Server Crash and server unresponsive → Server Crash and Unresponsive using OSGeo.Gdal FDO provider

comment:9 Changed 9 years ago by zspitzer

Version: 2.0.0 → 2.0.2

Problem still occurs with the 2.0.2 release and tiled maps.

comment:10 Changed 9 years ago by tomfukushima

Cc: stevedang added

We did stability tests with the GDAL provider and GeoTIFF and found that it was stable. Can you try your test again with GeoTIFF (you can use the GDAL utilities to convert to this format) and see if the problem occurs again? We know of a memory leak, but it's only a few KB at a time so it should take a while before that causes memory problems.

comment:11 Changed 9 years ago by zspitzer

A test case that will crash 2.0.2 is available at the link below.

You will need to adjust the path for the raster file, which is unmanaged and is defined as being under c:\data\raster\

http://ennoble.dreamhosters.com/tests//raster_ticket_462.mgp.zip

comment:12 Changed 9 years ago by tomfukushima

Hi Zac, please add the steps to recreate the problem once the package is loaded.

comment:13 Changed 9 years ago by zspitzer

Just open up the basic layout and start to zoom in and pan around.

It doesn't take long for MapGuide to stop rendering tiles.

http://localhost:8008/mapguide/mapviewerajax/?WEBLAYOUT=Library%3a%2f%2fraster%2fTrueMarble.8km.5400x2700+basic.WebLayout&LOCALE=en&USERNAME=Anonymous&PASSWORD=&

comment:14 Changed 9 years ago by tomfukushima

Thanks Zac, I couldn't reproduce this using IE, but once I moved to using Google Chrome, a problem showed up right away. I only needed to do a single zoom and then pan. Steve, this is pretty easy to reproduce. Come see me if you want to see it. The server continues to work though, as I can continue to get in through Studio; it seems that the GDAL provider is hooped because I can't do anything with raster.

This is the error that I get:

<2008-11-25T17:34:24> 	Ajax Viewer	144.111.170.90	Anonymous
 Error: Failed to stylize layer: TrueMarble.8km.5400x2700 layer
        An unclassified exception occurred.
 StackTrace:
  - MgMappingUtil.StylizeLayers line 786 file d:\build\mapguide_open_source_v2.0\build_30.11\mgdev\server\src\services\mapping\MappingUtil.cpp	Failed to stylize layer: TrueMarble.8km.5400x2700 layer
An unclassified exception occurred.

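Changed 9 years ago by jbirch

Attachment: mapguide_raster_unalloc.5.patch added

Changed 9 years ago by jbirch

Attachment: mapguide_raster_stability.patch added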

comment:15 Changed 9 years ago by jbirch

There are (at least) three problems with raster in 2.0.2.

The first is a memory leak that has since been fixed in 2.1.

The second is (my non-technical description) writing to unallocated memory (see mapguide_raster_unalloc.5.patch).

The third is a defect in the way that MapGuide deals with single-threaded providers. The attachment mapguide_raster_stability.patch provides a workaround for this defect in conjunction with the GDAL provider.

However, this could conceptually happen with other providers. Haris is looking into the problem in more depth, but in the meantime explained it to me as follows, referencing the code around line 660 of MappingUtil.cpp:

Assume two raster layers are accessed at the same time.

  1. A raster connection to Layer 1 is created
  2. ExecuteRasterQuery is executed and a Georaster class is created, which keeps a pointer to the connection (without adding a ref count)
  3. That thread goes into StylizeLayers
  4. A second thread goes into ExecuteRasterQuery, but it is accessing another raster layer so it can't use the same connection
  5. The second thread creates a new connection to the raster provider, but because the pool size for single-threaded providers is limited to 1 (and also because the GDAL provider didn't ref count++) the connection manager deletes the first thread's connection
  6. The first thread, which is now in StylizeGridLayer, finds that its connection was deleted and the pointer is gone

Result: Exception and corrupted memory
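
The sequence above can be illustrated with a minimal single-threaded sketch. All names here (Connection, SingleSlotPool, Reader, LiveAfterEviction) are hypothetical stand-ins invented for illustration; they are not FDO or MapGuide types, and this is not the code from any of the attached patches. It only shows why a reader that takes its own reference keeps its connection alive when a one-slot pool evicts it, whereas a reader holding a bare pointer would be left with a dangling one.

```cpp
#include <cassert>
#include <memory>

// Hypothetical ref-counted connection (not a real FDO class).
// The count is a plain int because this sketch is single-threaded.
class Connection {
public:
    static int liveCount;               // Connection objects currently alive
    Connection() : refs_(1) { ++liveCount; }
    void AddRef() { ++refs_; }
    void Release() { if (--refs_ == 0) delete this; }
private:
    ~Connection() { --liveCount; }      // only reachable through Release()
    int refs_;
};
int Connection::liveCount = 0;

// Hypothetical pool capped at one cached connection, standing in for the
// pool-size-1 case described for single-threaded providers.
class SingleSlotPool {
public:
    ~SingleSlotPool() { if (slot_) slot_->Release(); }
    Connection* Open() {
        if (slot_) slot_->Release();    // evict: drop the pool's own reference
        slot_ = new Connection();
        return slot_;
    }
private:
    Connection* slot_ = nullptr;
};

// Hypothetical reader that takes its own reference to the connection.
class Reader {
public:
    explicit Reader(Connection* c) : conn_(c) { conn_->AddRef(); }
    ~Reader() { conn_->Release(); }
    Reader(const Reader&) = delete;
    Reader& operator=(const Reader&) = delete;
private:
    Connection* conn_;
};

// Opens a connection for "layer 1", then opens another for "layer 2",
// which evicts the first. Returns how many connections are still alive.
int LiveAfterEviction(bool readerAddsRef) {
    SingleSlotPool pool;
    Connection* first = pool.Open();
    std::unique_ptr<Reader> reader;
    if (readerAddsRef) reader.reset(new Reader(first));
    // Without a Reader, `first` dangles after the next call and must not
    // be dereferenced; that is the failure described in step 6.
    pool.Open();
    return Connection::liveCount;
}
```

Calling LiveAfterEviction(false) leaves only the second connection alive, while LiveAfterEviction(true) leaves both alive until the reader goes out of scope. A real fix would also need thread-safe counting, which this sketch deliberately leaves out.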

comment:16 Changed 9 years ago by brucedechant

Cc: brucedechant added

comment:17 Changed 9 years ago by brucedechant

If the FDO connection manager is deleting a connection that is in use, that is a bug. Let me debug the server to see what is happening, because we only see this issue with GDAL. I would like to know where the bug is: either in the handling of single-threaded providers, or in GDAL not reference counting properly.

comment:18 Changed 9 years ago by jbirch

Cc: haris added

Changed 9 years ago by traianstanev

Attachment: grfp_addref.patch added

GDAL raster provider patch to addref the FDO connection from the FeatureReader

Changed 9 years ago by traianstanev

Attachment: gdal_nomutex_cacheschema.patch added

Fix refcounting problem + cache schema

comment:19 Changed 9 years ago by jbirch

Milestone: 2.0 → 2.1

comment:20 Changed 9 years ago by brucedechant

Owner: set to brucedechant
Status: reopened → new

comment:21 Changed 9 years ago by brucedechant

Status: new → assigned

comment:22 Changed 9 years ago by brucedechant

Resolution: fixed
Status: assigned → closed

Fixed.

See changeset r3829.

comment:23 Changed 6 years ago by zspitzer

Component: General → Rendering Service
Milestone: 2.1 → 2.4
Resolution: fixed
Status: closed → reopened
Summary: Server Crash and Unresponsive using OSGeo.Gdal FDO provider → GDAL FDO provider stability issues (patch)
Version: 2.0.2

Re-opening, as there are useful patches which haven't been applied yet.

comment:25 Changed 6 years ago by brucedechant

Please list and attach the new patches that need to be applied.

comment:26 Changed 6 years ago by brucedechant

Specifically the MapGuide source patches. FDO source patches should be added to the linked FDO trac ticket directly.

comment:27 Changed 5 years ago by jng

Milestone: 2.4 → 2.5

comment:28 Changed 5 years ago by jng

Milestone: 2.5 → 2.6