Opened 5 years ago

Closed 17 months ago

#4792 closed enhancement (fixed)

ogr2ogr - SQL Server data loading very slow

Reported by: jduchesne Owned by: tamas
Priority: normal Milestone:
Component: OGR_SF Version: 1.9.1
Severity: normal Keywords: mssql
Cc: tamas

Description

Following is a discussion we've had on the mailing list.

Basically, it seems that the SQL Server driver does not use batch/bulk insert.

It's at least 10 times slower than what it could be.

I left the priority and severity to normal, but in reality, for me, it means that I will have to go with a custom made solution. I could have done the fix myself I guess, but the time involved to setup a development environment for ogr2ogr would exceed the time to manually read a mapinfo file and right through jdbc.

Keep up the good work by the way.

Discussions:

Jimmy,

Seems we can get the significant improvement by using bulk inserts then. You may submit a ticket to keep this enhancements in scope. Some of the guys used the driver successfully by using FreeTDS on Linux.

Best regards,

Tamas

2012/8/27 Duchesne, Jimmy <JDuchesne@…> Thanks for the answers guys.

What you're saying concurs with what I was experiencing, and I believe, what I was seeing with the SQL Server Management profiler.

Also, I don't know if inserting with something else than WKT would actually improve the performance because my test, bulk inserts done with Java, were also inserting using WKT.

Finally, just to be sure, is there any issue with using this driver on a Linux OS?

Jimmy.


From: fwarmerdam@… fwarmerdam@… On Behalf Of Frank Warmerdam Sent: August-24-12 6:39 PM To: Duchesne, Jimmy Cc: gdal-dev@…; Szekeres Tamás Subject: Re: [gdal-dev] SQL Server Driver

Jimmy,

From an examination of MSSQLSpatialTableLayer::CreateFeature?() it appears a regular INSERT statement is done for each feature written to the db rather than any sort of bulk mechanism. I could be missing something of course. Hopefully Tamas will provide a more experienced answer.

Best regards, Frank

On Fri, Aug 24, 2012 at 3:03 PM, Duchesne, Jimmy <JDuchesne@…> wrote:

Hello,

I've been using SQL Server Driver for a few days now, and I noticed it was very slow, whatever the parameters, such as GT, I was using.

I need to insert millions of rows in a table.

To give you an idea, it takes around 1 minute to import 10k records with the driver

On the other hand, if I write my own routine which reads a MapInfo? file and does batch insert into the database, I can insert over 125k records per minute.

I tried reading the same TAB file with ogr2ogr but writing to PostGis? and noticed the speed was about the same than my routine, which is enough for my needs.

Would it be possible that the SQL Server Driver does not do bulk insert?

Thanks for the help.

Jimmy Duchesne

Change History (8)

comment:1 Changed 5 years ago by Even Rouault

Cc: tamas added
Component: UtilitiesOGR_SF
Keywords: mssql added

comment:2 Changed 4 years ago by tamas

Owner: changed from warmerdam to tamas

comment:3 Changed 4 years ago by tamas

It doesn't seem to be an option to implement bulk insert at the moment since the OGR API doesn't provide the option to pass multiple features in a single call. It would apply for at least a "CreateFeatures?" method and make sure ogr2ogr can utilize this method. Another problem is that the ODBC standard does not directly support SQL Server bulk operations. We'd require to bind to specific drivers and nonstandard extensions to support this option.

comment:4 Changed 4 years ago by Even Rouault

Tamas,

FYI, as far as the OGR side of things is concerned, there's a similar behaviour in the PG driver where you can define the configuration option PG_USE_COPY=YES. When it is set, CreateFeature?() uses the PQputCopyData() API instead of INSERT SQL statement. At CommitTransaction?() time (or at datasource closing), the current COPY is ended with PQputCopyEnd(). This works fine with ogr2ogr.

comment:5 in reply to:  4 Changed 4 years ago by tamas

It would indeed require some tests to make sure how amount of improvement can be achieved. The driver is already aware of transaction handling which could somewhat improve the performance (by executing several inserts in a single transaction) but that seems to require further enhancement.

comment:6 Changed 19 months ago by tamas

Status: newassigned

comment:7 Changed 19 months ago by tamas

Implemented in r33775

comment:8 Changed 17 months ago by tamas

Resolution: fixed
Status: assignedclosed

Bulk insert is implemented in GDAL 2.x

Note: See TracTickets for help on using tickets.