Opened 9 years ago

Closed 9 years ago

Last modified 17 months ago

#2409 closed defect (fixed)

ogr2ogr -skipfailures backs out previous successful insert to postgis

Reported by: kyngchaos Owned by: warmerdam
Priority: normal Milestone: 1.5.2
Component: Utilities Version: 1.5.1
Severity: normal Keywords: ogr2ogr
Cc:

Description

When appending to a PostGIS table, using the -skipfailures flag, when an error is encountered and skipped, it looks like the previously successful feature insert is getting reverted/backed out. An example is appending a TIGER county shapefile to the table that is adjacent to a county already appended, where the lines along the county border are duplicated. I have the 'tlid' field set as the primary key (changed from the default ogc_fid field added by OGR on initial import).

ogr2ogr -update -append -skipfailures -a_srs EPSG:4269 -nln edges -f PostgreSQL PG:'dbname=tiger' fe_2007_55021_edges.shp

Where FIPS county 55025 (adjacent county to the south) has already been successfully and completely imported.

There are many failure errors, as expected for the duplicate shared lines along their common border. But now there are many missing lines inside the newly imported county that are not common county boundary lines.

When I look at the missing lines in the original shapefile, they appear to be the record just before a skipped county line. ie:

shape 1    OK
shape 2    missing interior line
shape 3    missing shared county boundary, the one skipped by ogr2ogr
shape 4    OK
...

None of the missing interior lines are in the error log, only the ones skipped.

Change History (9)

comment:1 Changed 9 years ago by warmerdam

Status: newassigned

I see your point. You should be able to use "-tg 1", an undocumented switch to set the transaction group size to 1 from the default 200. That way, only one feature is going to be handled per transaction.

If this fixes things for you, I will modify ogr2ogr so that -skipfailures implicitly changes the transaction group size to 1.

comment:2 Changed 9 years ago by kyngchaos

Ah, that does it. I ended up doing a similar thing with shp2pgsql - I had to strip out all the BEGIN; and END; lines in the sql before dumping that into psql.

It was quite a shock when my 25GB of TIGER in PostGIS truned out to be missing a lot of records! I'll use the -tg flag until it gets into the next release.

PS. Could there be a runtime-configurable value for MAX_NUMBER_OF_ERRORS_REPORTED in cpl_error/CPLDefaultErrorHandler()? In order to keep track of all the skipped features and verify that those are the only errors, the #defined 1000 was too little, and I commented out that limit so I could get all errors. Preferrably, some way to tell it no limit (0 or -1?).

comment:3 Changed 9 years ago by warmerdam

I have made the -skipfailures flag set the transaction group size to 1 in ogr2ogr.cpp in trunk (r14630) and 1.5 branch (r14632).

comment:4 Changed 9 years ago by warmerdam

Component: OGR_SFUtilities
Keywords: ogr2ogr added
Milestone: 1.5.2
Resolution: fixed
Severity: criticalnormal
Status: assignedclosed

William,

I have added a config option for the default error handler in trunk (r14632).

The option is "CPL_MAX_ERROR_REPORTS".

comment:5 in reply to:  4 Changed 9 years ago by kyngchaos

Replying to warmerdam:

I have added a config option for the default error handler in trunk (r14632).

The option is "CPL_MAX_ERROR_REPORTS".

Cool. So, it looks like -1 means it hasn't been set yet (within CPLDefaultErrorHandler), and 0 is no limit?

Thanks.

comment:6 Changed 9 years ago by warmerdam

William,

That is correct.

comment:7 Changed 21 months ago by tamas

Applied a change in ogr2ogr to prevent from the gt setting to override transaction group size of 1 set by skipfailures earlier (r30690, r30691, r30692)

comment:8 Changed 20 months ago by ethervoid

Hi!

Is this problem only affecting 'append' operations? if that is the case it'll be great to also check for the 'append' flag, in the if condition, in order to avoid problems like the one we have, with non-append operations that have 1 feature per transaction instead of the default 20k slowing down a lot the process.

Thank you!

comment:9 Changed 17 months ago by Jukka Rahkonen

It does not affect only "append". "skipfailures" must always lead to using gt=1 so that only one row at a time is tried to insert and commit and just the failures can be skipped.

Instead of "skipfailures" it is sometimes possible to filter out faulty input geometries with SQLite dialect:

-dialect SQLite -sql "SELECT * from source WHERE ST_IsValid(geometry)=1"
Note: See TracTickets for help on using tickets.