Opened 14 years ago

Closed 14 years ago

#3335 closed defect (fixed)

CopyLayer() method should use transactions similar to ogr2ogr

Reported by: rdewit Owned by: chaitanya
Priority: normal Milestone:
Component: OGR_SF Version: unspecified
Severity: normal Keywords: CopyLayer
Cc:

Description (last modified by warmerdam)

Copying from any data source to SQLite (at least in Python) is painfully slow when you don't explicitly use transactions.

The generic implementation of the OGRDataSource::CopyLayer() method for a data source should use transactions in a similar way to ogr2ogr by default, and have an option with similar functionality to -gt in ogr2ogr to control how many features are in a transaction.

See discussion here: http://lists.osgeo.org/pipermail/gdal-dev/2010-January/023220.html

Change History (6)

comment:1 by warmerdam, 14 years ago

Component: PythonBindingsOGR_SF
Description: modified (diff)
Keywords: CopyLayer added
Owner: changed from hobu to chaitanya

Chaitanya,

Please implement this when convenient.

comment:2 by chaitanya, 14 years ago

Resolution: fixed
Status: newclosed

Applied a patch in trunk (r18845) and 1.7 branch (r18846).

comment:3 by Even Rouault, 14 years ago

Resolution: fixed
Status: closedreopened

Chaitanya,

I'm a bit skeptical about your approach in implementing this :

  • why are 2 loops necessary : the first that fills the area with the read features and the second one that writes them in the destination layer ? why not one single loop ? that would make things much more simple
  • The logic of the second loop (line 376->393) seems wrong. If I understand well the code, if poDstLayer->CreateFeature() at line 382 fails, it will set bStopTransaction = FALSE and the test at line 376 will loop again. So I'm pretty sure that this loop will never terminate in that case. Unless that for some reason CreateFeature() on a feature ends up being successful after a few iterations. That might happen with some databases backend. But most of the time the failures will fail each time (for example because of geometry type incompatibility if you copy a layer with multiple geometry types into a layer th). Also I don't understand the purpose of nFeaturesToAdd = i; at line 384.

So basically, I think a simpler and more robust approach when we want to use transactions would be to basically put a StartTransaction() before the loop and a CommitTransaction() afterwards. That would save the current code duplication.

It would be good also to control via the papszOptions the number of features in the group, so as the user to be able to change the hard-coded value or turn it off if wanted.

comment:4 by Even Rouault, 14 years ago

Hum, now after a bit of thinking, I understand better how it works. I'm realizing that nFeaturesToAdd = i must indeed prevent from the endless looping, and retry to insert the features in the current group that are placed just before the problematic feature but were discarded because of the RollbackTransaction(). Then once it is done, it stops converting all remaining features, right ?

So, I guess this is OK, but maybe a bit complicated, especially as ogr2ogr is much more basic than this when dealing with failed insertions in the middle of a transaction ;-)

Anyway, I see no test of CopyLayer() in autotest/ogr. And my last run of gcov on GDAL also confirms it is totally untested ( http://even.rouault.free.fr/coverage/ogr/ogrsf_frmts/generic/ogrdatasource.cpp.gcov.html ). It would be desirable that you add testing of it.

in reply to:  4 comment:5 by chaitanya, 14 years ago

Replying to rouault: ogr2ogr may have more basic code but I had a hard time following the transaction logic. That is the reason for this approach.

I too noticed the lack of testing. I will add a test in ogr_basic_test.py using the sqlite and pg drivers as destination datasources.

comment:6 by Even Rouault, 14 years ago

Resolution: fixed
Status: reopenedclosed

A test was added in ogr_sqlite.py in the context of #3617

Note: See TracTickets for help on using tickets.