Ticket #2409 (closed defect: fixed)

Opened 3 months ago

Last modified 3 months ago

ogr2ogr -skipfailures backs out previous successful insert to postgis

Reported by: kyngchaos Assigned to: warmerdam
Priority: normal Milestone: 1.5.2
Component: Utilities Version: 1.5.1
Severity: normal Keywords: ogr2ogr
Cc:

Description

When appending to a PostGIS table with the -skipfailures flag, an error that is encountered and skipped appears to also revert the previously successful feature insert. An example is appending a TIGER county shapefile for a county adjacent to one already in the table, so the lines along the shared county border are duplicates. I have the 'tlid' field set as the primary key (changed from the default ogc_fid field added by OGR on initial import).

ogr2ogr -update -append -skipfailures -a_srs EPSG:4269 -nln edges -f PostgreSQL PG:'dbname=tiger' fe_2007_55021_edges.shp

Where FIPS county 55025 (adjacent county to the south) has already been successfully and completely imported.

There are many failure errors, as expected for the duplicate shared lines along their common border. But now there are many missing lines inside the newly imported county that are not common county boundary lines.

When I look at the missing lines in the original shapefile, each one appears to be the record just before a skipped county boundary line, i.e.:

shape 1    OK
shape 2    missing interior line
shape 3    missing shared county boundary, the one skipped by ogr2ogr
shape 4    OK
...

None of the missing interior lines are in the error log, only the ones skipped.

Change History

06/04/08 22:29:31 changed by warmerdam

  • status changed from new to assigned.

I see your point. You should be able to use "-gt 1", an undocumented switch that sets the transaction group size to 1 instead of the default 200. That way, only one feature is handled per transaction.

If this fixes things for you, I will modify ogr2ogr so that -skipfailures implicitly changes the transaction group size to 1.
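
The effect of the transaction group size can be illustrated with a small, self-contained sketch. This is not ogr2ogr's code: it uses Python's sqlite3 module in place of PostgreSQL, and the function name `load`, the sample `ROWS`, and the `edges` table layout are all hypothetical. It only shows the general mechanism, namely that rolling back a transaction after a skipped failure also discards earlier, still uncommitted inserts in the same group, while a group size of 1 leaves nothing uncommitted to lose. The exact set of rows lost in the report above may differ.

```python
import sqlite3

# Hypothetical sample: tlid 3 duplicates a boundary line that is already loaded.
ROWS = [(1, "interior"), (2, "interior"), (3, "shared boundary"), (4, "interior")]


def load(rows, group_size):
    """Insert rows keyed on tlid, committing after every group_size inserts.

    A failed insert is skipped, but the open transaction is rolled back,
    which also discards earlier uncommitted rows from the same group.
    """
    # isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE edges (tlid INTEGER PRIMARY KEY, name TEXT)")
    # Row already present from the previously imported adjacent county.
    conn.execute("INSERT INTO edges VALUES (3, 'shared boundary')")

    pending = 0
    conn.execute("BEGIN")
    for tlid, name in rows:
        try:
            conn.execute("INSERT INTO edges VALUES (?, ?)", (tlid, name))
            pending += 1
        except sqlite3.IntegrityError:
            # Skip the failing row; the rollback drops the whole open group.
            conn.execute("ROLLBACK")
            conn.execute("BEGIN")
            pending = 0
            continue
        if pending >= group_size:
            conn.execute("COMMIT")
            conn.execute("BEGIN")
            pending = 0
    conn.execute("COMMIT")

    tlids = [row[0] for row in conn.execute("SELECT tlid FROM edges ORDER BY tlid")]
    conn.close()
    return tlids


if __name__ == "__main__":
    print(load(ROWS, 200))  # rows inserted before the failure are lost
    print(load(ROWS, 1))    # every successful row survives
```

With a group size of 200 the two rows inserted before the duplicate are rolled back along with it. With a group size of 1 each successful row is committed before the next insert is attempted, so a later failure cannot touch it.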

06/04/08 23:24:01 changed by kyngchaos

Ah, that does it. I ended up doing a similar thing with shp2pgsql - I had to strip out all the BEGIN; and END; lines in the sql before dumping that into psql.

It was quite a shock when my 25GB of TIGER in PostGIS turned out to be missing a lot of records! I'll use the -gt flag until it gets into the next release.

PS. Could there be a runtime-configurable value for MAX_NUMBER_OF_ERRORS_REPORTED in cpl_error/CPLDefaultErrorHandler()? In order to keep track of all the skipped features and verify that those are the only errors, the #defined 1000 was too low, and I commented out that limit so I could get all errors. Preferably, there would be some way to tell it no limit (0 or -1?).

06/05/08 10:07:41 changed by warmerdam

I have made the -skipfailures flag set the transaction group size to 1 in ogr2ogr.cpp in trunk (r14630) and 1.5 branch (r14632).

06/05/08 10:15:47 changed by warmerdam

  • status changed from assigned to closed.
  • severity changed from critical to normal.
  • component changed from OGR_SF to Utilities.
  • milestone set to 1.5.2.
  • keywords set to ogr2ogr.
  • resolution set to fixed.

William,

I have added a config option for the default error handler in trunk (r14632).

The option is "CPL_MAX_ERROR_REPORTS".

06/05/08 10:35:30 changed by kyngchaos

Replying to warmerdam:

I have added a config option for the default error handler in trunk (r14632). The option is "CPL_MAX_ERROR_REPORTS".

Cool. So, it looks like -1 means it hasn't been set yet (within CPLDefaultErrorHandler), and 0 is no limit?

Thanks.

06/05/08 10:52:54 changed by warmerdam

William,

That is correct.
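
The semantics confirmed above (-1 meaning "not looked up yet", 0 meaning "no limit") can be modelled with a short sketch. This is a hypothetical Python model, not the actual CPLDefaultErrorHandler() code: the class `ErrorReporter` and its attributes are invented for illustration, and reading the option from an environment-style mapping is an assumption made to keep the example self-contained. The default of 1000 is the compiled-in limit mentioned earlier in this ticket.

```python
import os

DEFAULT_MAX_ERRORS = 1000  # compiled-in limit mentioned earlier in the ticket


class ErrorReporter:
    """Hypothetical model of an error handler with a configurable report cap."""

    def __init__(self, environ=None):
        # Assumption: the option is read from an environment-like mapping.
        self.environ = os.environ if environ is None else environ
        self.max_errors = -1  # -1: the limit has not been looked up yet
        self.count = 0
        self.reported = []

    def report(self, message):
        """Record message unless the cap is exceeded; return True if recorded."""
        if self.max_errors == -1:
            # Lazy one-time lookup on the first reported error.
            value = self.environ.get("CPL_MAX_ERROR_REPORTS", DEFAULT_MAX_ERRORS)
            self.max_errors = int(value)
        self.count += 1
        # A limit of 0 disables the cap entirely.
        if self.max_errors > 0 and self.count > self.max_errors:
            return False
        self.reported.append(message)
        return True


if __name__ == "__main__":
    unlimited = ErrorReporter({"CPL_MAX_ERROR_REPORTS": "0"})
    for i in range(1500):
        unlimited.report(f"error {i}")
    print(len(unlimited.reported))
```

In this model, setting the option to 0 keeps every report, which is what is needed to check that the skipped features are the only errors.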