Opened 13 years ago
Closed 13 years ago
#3335 closed defect (fixed)
CopyLayer() method should use transactions similar to ogr2ogr
|Reported by:||rdewit||Owned by:||chaitanya|
Description (last modified by )
Copying from any data source to SQLite (at least in Python) is painfully slow when you don't explicitly use transactions.
The generic implementation of the OGRDataSource::CopyLayer() method for a data source should use transactions in a similar way to ogr2ogr by default, and have an option with similar functionality to -gt in ogr2ogr to control how many features are in a transaction.
See discussion here: http://lists.osgeo.org/pipermail/gdal-dev/2010-January/023220.html
Change History (6)
comment:1 by , 13 years ago
|Component:||PythonBindings → OGR_SF|
comment:2 by , 13 years ago
|Status:||new → closed|
comment:3 by , 13 years ago
|Status:||closed → reopened|
I'm a bit skeptical about your approach in implementing this :
- why are 2 loops necessary : the first that fills the area with the read features and the second one that writes them in the destination layer ? why not one single loop ? that would make things much more simple
- The logic of the second loop (line 376->393) seems wrong. If I understand well the code, if poDstLayer->CreateFeature() at line 382 fails, it will set bStopTransaction = FALSE and the test at line 376 will loop again. So I'm pretty sure that this loop will never terminate in that case. Unless that for some reason CreateFeature() on a feature ends up being successful after a few iterations. That might happen with some databases backend. But most of the time the failures will fail each time (for example because of geometry type incompatibility if you copy a layer with multiple geometry types into a layer th). Also I don't understand the purpose of nFeaturesToAdd = i; at line 384.
So basically, I think a simpler and more robust approach when we want to use transactions would be to basically put a StartTransaction() before the loop and a CommitTransaction() afterwards. That would save the current code duplication.
It would be good also to control via the papszOptions the number of features in the group, so as the user to be able to change the hard-coded value or turn it off if wanted.
follow-up: 5 comment:4 by , 13 years ago
Hum, now after a bit of thinking, I understand better how it works. I'm realizing that nFeaturesToAdd = i must indeed prevent from the endless looping, and retry to insert the features in the current group that are placed just before the problematic feature but were discarded because of the RollbackTransaction(). Then once it is done, it stops converting all remaining features, right ?
So, I guess this is OK, but maybe a bit complicated, especially as ogr2ogr is much more basic than this when dealing with failed insertions in the middle of a transaction ;-)
Anyway, I see no test of CopyLayer() in autotest/ogr. And my last run of gcov on GDAL also confirms it is totally untested ( http://even.rouault.free.fr/coverage/ogr/ogrsf_frmts/generic/ogrdatasource.cpp.gcov.html ). It would be desirable that you add testing of it.
comment:5 by , 13 years ago
Replying to rouault: ogr2ogr may have more basic code but I had a hard time following the transaction logic. That is the reason for this approach.
I too noticed the lack of testing. I will add a test in ogr_basic_test.py using the sqlite and pg drivers as destination datasources.
comment:6 by , 13 years ago
|Status:||reopened → closed|
A test was added in ogr_sqlite.py in the context of #3617
Please implement this when convenient.