Opened 13 years ago

Closed 11 years ago

#4270 closed defect (fixed)

FileGDB update freezes / ogr2ogr problems when using a datasource both as input and output

Reported by: peifer Owned by: warmerdam
Priority: normal Milestone: 1.10.1
Component: OGR_SF Version:
Severity: normal Keywords: FileGDB
Cc: Even Rouault, rburhum, Kyle Shannon, springmeyer

Description

I used the below command to copy layer1 as layer2. layer2 is created, but the process runs endlessly on 100% CPU until I interrupt it.

ogr2ogr -f filegdb demo.gdb demo.gdb layer1 -nln layer2 -update

Attachments (1)

demo.gdb.zip (29.4 KB ) - added by peifer 13 years ago.
demo file geodatabase

Download all attachments as: .zip

Change History (21)

by peifer, 13 years ago

Attachment: demo.gdb.zip added

demo file geodatabase

comment:1 by peifer, 13 years ago

I forgot to mention that the progress meter also considers the work to be done:

0...10...20...30...40...50...60...70...80...90...100 - done.

comment:2 by Even Rouault, 13 years ago

Milestone: 1.9.0
Resolution: fixed
Status: newclosed

r23147 /trunk/gdal/apps/ogr2ogr.cpp: ogr2ogr: fix 'ogr2ogr -update a.gdb a.gdb srclayername -nln dstlayername' (#4270)

comment:3 by warmerdam, 12 years ago

Cc: Even Rouault rburhum added
Milestone: 1.9.0
Resolution: fixed
Status: closedreopened

With all due respect this needs to be fixed in the FileGDB driver. It is unacceptable to have this sort of hackery in the utilities things still won't work right for other applications without similar hacks. Possibly I'll have some time to work on this.

comment:4 by Even Rouault, 12 years ago

I agree the fix is not satisfactory. We could also use OpenShared() as a better way of having a single datasource handle, but that's not much better intrisically. IMHO the bug is in the FileGDB SDK itself, not in the driver.

See the following python script :

from osgeo import ogr
from osgeo import osr
import shutil

shutil.rmtree('test7270.gdb')

ds = ogr.GetDriverByName('FileGDB').CreateDataSource('test7270.gdb')
ds = None

ds = ogr.Open('test7270.gdb', update = 1)
sr = osr.SpatialReference()
sr.SetFromUserInput('WGS84')
ds.CreateLayer('alayer', geom_type = ogr.wkbPoint, srs = sr)

ds2 = ogr.Open('test7270.gdb')
print('after ds2 = ogr.Open(test7270.gdb)')
ds2 = None
print('after ds2 = None. Oops I never get to that point')

ds = None
print('after ds = None')

It stalls at :

after ds2 = ogr.Open(test7270.gdb)
^C
Program received signal SIGINT, Interrupt.
0x00007fffeda638ec in FileGDBAPI::Geodatabase::CloseGeodatabase() () from /home/even/FileGDB_API/lib/libFileGDBAPI.so
(gdb) bt
#0  0x00007fffeda638ec in FileGDBAPI::Geodatabase::CloseGeodatabase() () from /home/even/FileGDB_API/lib/libFileGDBAPI.so
#1  0x00007fffeda6a279 in FileGDBAPI::CloseGeodatabase(FileGDBAPI::Geodatabase&) () from /home/even/FileGDB_API/lib/libFileGDBAPI.so
#2  0x00007ffff5d83b82 in ~FGdbDataSource (this=0xa069a0, __in_chrg=<value optimized out>) at FGdbDatasource.cpp:67
#3  0x00007ffff5face70 in OGRSFDriverRegistrar::ReleaseDataSource (this=0x993ea0, poDS=0xa069a0) at ogrsfdriverregistrar.cpp:438
#4  0x00007ffff5fad24a in OGRReleaseDataSource (hDS=0xa069a0) at ogrsfdriverregistrar.cpp:518

But if you close ds before ds2, it works...

comment:5 by warmerdam, 12 years ago

It seems to me that the FileGDB driver itself should be doing keeping track of the low level database handles and reusing in a reference counted way if the FileGDB SDK can't handle multiple opens on the same file.

I realize this may require some work and it doesn't need to be done right away. But I wanted to reopen this ticket as a reminder that we should move in that direction.

comment:6 by Even Rouault, 12 years ago

Summary: FileGDB update freezesFileGDB update freezes / ogr2ogr problems when using a datasource both as input and output

I've encoutered a similar issue with sqlite driver. First issue is that sqlite3_step didn't like the database schema change when iterating over the source layer and creating the target layer. I mamaged to fix that by using sqlite3_prepare_v2() instead of sqlite3_prepare() that is compatible witch schema changes. However, this doesn't solve the whole problem since I finally get the following error :

ERROR 1: COMMIT transaction failed: database is locked
ERROR 1: BEGIN transaction failed: cannot start a transaction within a transaction

Looks like the 2 handles to the database deadlock... I'm afraid that there might be other drivers also that don't like when their backing file are changed behind their back... So for the time being, I'm generalizing the hack in ogr2ogr... r23677

comment:7 by Even Rouault, 12 years ago

Test case added in r23678

comment:8 by Even Rouault, 12 years ago

This becomes really interesting. The generalization doesn't work with the way the PG driver manages transactions. So for the PG driver, we must open 2 distinct connections... So it looks like the PG driver should ultimately be fixed too to have better transaction support. I somehow remember that there's an old ticket with an extensive patch to change transaction support in the PG driver.

r23679 /trunk/ (autotest/utilities/test_ogr2ogr.py gdal/apps/ogr2ogr.cpp): Restrict again r23677 to only FGdb and SQlite drivers. Known to cause problems with PG driver otherwise (#4270)

comment:9 by Kyle Shannon, 12 years ago

Cc: Kyle Shannon added

comment:10 by springmeyer, 11 years ago

Cc: springmeyer added

Not sure if it is related, but I just tried using the FileGDB (built against the API 1.2.0.136) via OGR (at r25770) on OS X in a multithreaded read-only, rendering environment with Mapnik.

I immediately hit a hang in each thread (which may actually be a crash within the thread):

    +         2498 mapnik::feature_style_processor<mapnik::agg_renderer<mapnik::image_32> >::apply(double)  (in libmapnik.dylib) + 476  [0x10400cbfc]
    +           2498 mapnik::feature_style_processor<mapnik::agg_renderer<mapnik::image_32> >::apply_to_layer(mapnik::layer const&, mapnik::agg_renderer<mapnik::image_32>&, mapnik::projection const&, double, double, unsigned int, unsigned int, mapnik::box2d<double> const&, int, std::set<std::string, std::less<std::string>, std::allocator<std::string> >&)  (in libmapnik.dylib) + 6916  [0x10400e954]
    +             2498 mapnik::feature_style_processor<mapnik::agg_renderer<mapnik::image_32> >::render_style(mapnik::layer const&, mapnik::agg_renderer<mapnik::image_32>&, mapnik::feature_type_style const*, mapnik::rule_cache const&, std::string const&, boost::shared_ptr<mapnik::Featureset>, mapnik::proj_transform const&)  (in libmapnik.dylib) + 144  [0x10400f430]
    +               2498 ogr_featureset::next()  (in ogr.input) + 54  [0x10c5b5cf6]
    +                 2498 FGdbBaseLayer::GetNextFeature()  (in libgdal.1.dylib) + 276  [0x1054dab64]  FGdbLayer.cpp:92
    +                   2498 FileGDBAPI::EnumRows::Next(FileGDBAPI::Row&)  (in libFileGDBAPI.dylib) + 46  [0x105ca879c]
    +                     2498 SqlSelectCommand::Next()  (in libFileGDBAPI.dylib) + 86  [0x105d8bd58]
    +                       2498 StandardDatafile::ReadNextRow(ScanContext&, FieldValue*, int*)  (in libFileGDBAPI.dylib) + 102  [0x105d96b5e]
    +                         2498 StandardDatafile::FindMatchingRow(ScanContext&)  (in libFileGDBAPI.dylib) + 139  [0x105d96345]
    +                           2498 StandardDatafile::SeekToRow(ScanContext&)  (in libFileGDBAPI.dylib) + 121  [0x105d9547f]
    +                             2498 StandardDatafile::SetFieldLocatorsForRead(ScanContext&)  (in libFileGDBAPI.dylib) + 80  [0x105d8fb54]
    +                               2498 FieldLocators::SetFieldLocatorsForRead(unsigned char*, int&)  (in libFileGDBAPI.dylib) + 199  [0x105d78f13]
    +                                 2498 VLUInt::Expand(unsigned char*, unsigned int&)  (in libFileGDBAPI.dylib) + 4  [0x105d2ad14]
}}

comment:11 by Even Rouault, 11 years ago

Dane,

Just to be sure of the context :

  • can you confirm that each of your threads is using a different OGR datasource ? (if not, it's clearly unsupported by OGR and anything can happen)
  • And in the case they use they use a different OGR datasource, are those datasources opening a different FileGDB directory ?

comment:12 by rburhum, 11 years ago

The FileGDB API has thread-affinity. The thread that opens it is the thread that should be accessing it.

OGR also has thread-affinity.

Can you comment on rouault's questions?

comment:13 by springmeyer, 11 years ago

Hey Guys,

In my case I am using Node.js to call into Mapnik, which calls into OGR. Node has a dedicated threadpool of N threads (usually 4). When targeting N threads for map rendering I create N copies of a mapnik ogr_datasource which wraps one-to-one a new OGRDatasource *. These are created in the "main loop", so on one thread, and then dispatched to the thread pool to be used (only at this time is OGR ask for features). No instance of an OGRDatasource * is used/accessed at the same time by multiple threads, so for all other OGR datasources I use frequently use (mainly GeoJSON and GPX) this works fine.

So independent instances of OGRDatasource * are accessing the same gdb directory on the disk. It sounds like based on the way that FileGDB works, this is not going to fly? If I want to concurrently access a FileGDB from multiple threads do I then need to copy the file on disk N times? Is this usage also incorrect in general with OGR - am I getting away with it simply based on how the GeoJSON/GPX drivers work?

comment:14 by Even Rouault, 11 years ago

Ragi,

To my knowledge, OGR doesn't have thread-affinity (if thread-affinity means that an object can be used by a single thread, the one that created it ?). An OGR datasource can be used by several threads, but not at the same time. After that, if the underlying library such as FileGDB API has thread-affinity, then of course this constraint goes up to the OGR datasource. Where is it documented that FileGDB API has thread-affinity ?

Dane, I remember that the FileGDB API creates lock files in the directory, so it might indeed be incorrect to use even different OGR datasources (of FileGDB) that point to the same directory. For most other drivers (and certainly GeoJSON and GPX), there's no such constraint, and the mapnik access pattern is OK.

comment:15 by rburhum, 11 years ago

Even, Dane,

Let me explain:

The difference between the GeoJSON and GPX drivers is that they may not have critical sections/semaphores so things just go on fine and dandy, but the FileGDB API most likely does. In fact, I know for a fact, that at some point in the past, the internal implementation in GeoDatabases included a Workspace + ThreadID as a hash on all the singletons that open these workspaces.

Nevertheless, I do not know enough about the node.js internals. I know it has a thread-pool and an event-loop that, whenever any of the tasks encounter an I/O, will cause that task to set a callback so that any of the other tasks can execute while the the I/O is happening.

Please note that this behavior is different than thread-affinity. Yes, two threads will not access the same memory at the same time, but that doesn't guarantee that the *same thread* that called open is the one doing the reading (i.e *true* thread affinity). The whole concept of a thread-pool is that any of the threads are equal and any of them can do the work. AFAIK, That is not true in this instance. AFAIK, if the FileGDB completely mimics the ArcObjects GeoDatabase internals, the it is highly likely that it also has thread-affinity. This would be because Multithreading in ArcObjects is implemented as STA COM Apartments. http://stackoverflow.com/questions/127188/could-you-explain-sta-and-mta

Is there a node.js specific way to guarantee thread-affinity? I have a feeling that copying files around may not work because of the thread-affinity issue... I hope I am wrong.

comment:16 by springmeyer, 11 years ago

I've never heard of thread affinity before, so I need to do more reading on this. It sounds like it's mostly a windows thing that will not play nicely with the general premise of using a thread pool to dispatch work that is queued in the main loop - which is what Node.js makes easy but is also a common pattern. So yes, in my case open is being called in a different (the main) thread than the reading is being done in (a thread in the pool). If that is what thread affinity blocks/disallows, then it makes total sense why I encountered this hang.

It sounds like a possible workaround for me is to do a deepcopy of the OGR Datasource in each thread pool, right before doing the reading to get a handle that has affinity to the thread. This would be expensive I presume, but maybe the only way forward.

comment:17 by rburhum, 11 years ago

Hi Dane,

Well, not just a Windows thing. In iOS, all the drawing has to be done on the main thread. Hence why there are methods like "performSelectorOnMainThread" exist https://developer.apple.com/library/mac/#documentation/Cocoa/Reference/Foundation/Classes/NSObject_Class/Reference/Reference.html

AFAIK, Android has the same restrictions.

I don't think DeepCopy will work. Again, I am hoping I am wrong on this.

:-(

comment:18 by rburhum, 11 years ago

I got a confirmation from ESRI that there are not critical sections or anything related to thread synchronization. If you are experience *blocking* and not a crash, I wonder if the problem is being caused somewhere else up the stack...

comment:19 by Even Rouault, 11 years ago

trunk r26307 "FileGDB: implement ref counting of the FileGDB SDK API' Geodatabase* object to avoid issues on Linux 64bit with interleaved opening and closing of databases (#4270)"

comment:20 by Even Rouault, 11 years ago

Milestone: 1.10.1
Resolution: fixed
Status: reopenedclosed
Version: svn-trunk

r26307 backported in 1.10 in r26310

Note: See TracTickets for help on using tickets.