Opened 12 years ago
Closed 8 years ago
#4792 closed enhancement (fixed)
ogr2ogr - SQL Server data loading very slow
Reported by: | jduchesne | Owned by: | tamas |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | OGR_SF | Version: | 1.9.1 |
Severity: | normal | Keywords: | mssql |
Cc: | tamas |
Description
Following is a discussion we've had on the mailing list.
Basically, it seems that the SQL Server driver does not use batch/bulk insert.
It's at least 10 times slower than what it could be.
I left the priority and severity to normal, but in reality, for me, it means that I will have to go with a custom made solution. I could have done the fix myself I guess, but the time involved to setup a development environment for ogr2ogr would exceed the time to manually read a mapinfo file and right through jdbc.
Keep up the good work by the way.
Discussions:
Jimmy,
Seems we can get the significant improvement by using bulk inserts then. You may submit a ticket to keep this enhancements in scope. Some of the guys used the driver successfully by using FreeTDS on Linux.
Best regards,
Tamas
2012/8/27 Duchesne, Jimmy <JDuchesne@…> Thanks for the answers guys.
What you're saying concurs with what I was experiencing, and I believe, what I was seeing with the SQL Server Management profiler.
Also, I don't know if inserting with something else than WKT would actually improve the performance because my test, bulk inserts done with Java, were also inserting using WKT.
Finally, just to be sure, is there any issue with using this driver on a Linux OS?
Jimmy.
From: fwarmerdam@… fwarmerdam@… On Behalf Of Frank Warmerdam Sent: August-24-12 6:39 PM To: Duchesne, Jimmy Cc: gdal-dev@…; Szekeres Tamás Subject: Re: [gdal-dev] SQL Server Driver
Jimmy,
From an examination of MSSQLSpatialTableLayer::CreateFeature() it appears a regular INSERT statement is done for each feature written to the db rather than any sort of bulk mechanism. I could be missing something of course. Hopefully Tamas will provide a more experienced answer.
Best regards, Frank
On Fri, Aug 24, 2012 at 3:03 PM, Duchesne, Jimmy <JDuchesne@…> wrote:
Hello,
I've been using SQL Server Driver for a few days now, and I noticed it was very slow, whatever the parameters, such as GT, I was using.
I need to insert millions of rows in a table.
To give you an idea, it takes around 1 minute to import 10k records with the driver
On the other hand, if I write my own routine which reads a MapInfo file and does batch insert into the database, I can insert over 125k records per minute.
I tried reading the same TAB file with ogr2ogr but writing to PostGis and noticed the speed was about the same than my routine, which is enough for my needs.
Would it be possible that the SQL Server Driver does not do bulk insert?
Thanks for the help.
Jimmy Duchesne
Change History (8)
comment:1 by , 12 years ago
Cc: | added |
---|---|
Component: | Utilities → OGR_SF |
Keywords: | mssql added |
comment:2 by , 11 years ago
Owner: | changed from | to
---|
comment:3 by , 11 years ago
follow-up: 5 comment:4 by , 11 years ago
Tamas,
FYI, as far as the OGR side of things is concerned, there's a similar behaviour in the PG driver where you can define the configuration option PG_USE_COPY=YES. When it is set, CreateFeature() uses the PQputCopyData() API instead of INSERT SQL statement. At CommitTransaction() time (or at datasource closing), the current COPY is ended with PQputCopyEnd(). This works fine with ogr2ogr.
comment:5 by , 11 years ago
It would indeed require some tests to make sure how amount of improvement can be achieved. The driver is already aware of transaction handling which could somewhat improve the performance (by executing several inserts in a single transaction) but that seems to require further enhancement.
comment:6 by , 8 years ago
Status: | new → assigned |
---|
comment:8 by , 8 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Bulk insert is implemented in GDAL 2.x
It doesn't seem to be an option to implement bulk insert at the moment since the OGR API doesn't provide the option to pass multiple features in a single call. It would apply for at least a "CreateFeatures" method and make sure ogr2ogr can utilize this method. Another problem is that the ODBC standard does not directly support SQL Server bulk operations. We'd require to bind to specific drivers and nonstandard extensions to support this option.