Opened 15 years ago

Closed 13 years ago

Last modified 7 years ago

#110 closed enhancement (fixed)

shp2pgsql new option to not batch commit

Reported by: robe Owned by: mcayland
Priority: low Milestone: PostGIS Fund Me
Component: postgis Version: master
Keywords: Cc:

Description (last modified by jadams)

What steps will reproduce the problem?

  1. If you have some geometries that fail insert such as polygons without

closed rings, it kills the current batch and gives a more or less meaningless message

  1. This has happened to me a lot and a lot of users we train. A lot of the

time the data should just rightfully not be added.

  1. Write now to overcome this I generate a .sql file and use sed to remove

the begin commits which means I can't use the normal | to directly load

I think the simplest option is to allow users with a flag to have the option to not have begin commits so that everything is in its own transaction. That way at least a whole batch of good records aren't lost because of one bad apple.

I know we have talked about other options such as allowing these beasts into the database or nulling the geometry if invalid etc — though those options are trickier to implement and may not handle all cases.

Change History (17)

comment:1 by mcayland, 15 years ago

Two things here:

i) Inserting transactions throughout a large dataset is, IMHO, wrong. You either want the whole set of data inserted atomically, or you handle the consequences yourself. So I would suggest removing all of the code related to batching in LoadData() and just have one single BEGIN…COMMIT block. At least then if the insert fails, you can manually remove the BEGIN/COMMIT yourself and live with what gets inserted.

ii) The 1.4 version of loader currently throws an error if an invalid geometry is inserted; it would be fairly easy to modify this so that the erroneus geometry were simply omitted.

ATB,

Mark.

comment:2 by robe, 15 years ago

Though that would still leave me with not able to |.

The old shp2pgsql (i) does do that already. I always thought the reason for commits every 250 records was for some performance reason.

comment:3 by mcayland, 15 years ago

Sure. In terms of performance, putting everything in one transaction is a lot quicker than inserting everything separately. So removing the COMMIT every 250 records with actually speed things up, and saves you having to manually sort out by hand what got inserted and what didn't - you can just re-run the batch. It still makes sense to wrap everything within a single transaction.

So then the solution you're looking for is actually ii).

ATB,

Mark.

comment:4 by robe, 15 years ago

Yes 2 if possible — also have the option to NULL the geometry. I think others have asked for the ability to NULL.

Its a reality we get bad data and the last set I had to deal with was intentionally bad. They had used the same malformed geometry to represent the absense of a geometry — presumably to satisfy ESRI shapefile that requires a geometry for all records.

After trying to salvage these things — closing them and so forth — to my shock they were all the same malformed creatures.

comment:5 by robe, 15 years ago

Well doing i would be good too. Though not sure its worth it for 1.3.6.

Ah what am I thinking I can use sed to pipe the pipe I guess. So its not a big deal except on windows where I have no sed.

comment:6 by mcayland, 15 years ago

Sure. 1.3 codebase is maintenance only as far as I am concerned; any changes will go to 1.4. Feel free to submit a patch BTW ;)

ATB,

Mark.

comment:7 by bobdebm, 15 years ago

I was just wondering if there could be an option in shp2pgsql that puts all of the 'not isvalid(the_geom)' geometries in a same named table with a '_err' suffix?

comment:8 by mcayland, 15 years ago

The validity check at the moment is currently only OGC validity (i.e. closed rings, correct number of points), not topological (IsValid) validity. There is currently no link between the new liblwgeom and GEOS.

ATB,

Mark.

comment:9 by robe, 15 years ago

I like Bob's idea of throwing in an error table or giving the option to do so though as you said we can't do a true st_isvalid check.

I think the fact there is no GEOS binding is fine. The main issue to me is that these are just thrown out and are therefore hard to repair since its easier to repair things in the db - if we could even throw just the WKT in the table that would be better than nothing.

comment:10 by mcayland, 15 years ago

What is the current status of this? Is anyone planning to work on this for 1.4?

ATB,

Mark.

comment:11 by robe, 15 years ago

I think we can let this slide.

comment:12 by robe, 15 years ago

Description: modified (diff)
Milestone: 2.0
Version: 1.4

comment:13 by pramsey, 14 years ago

Owner: changed from robe to mcayland
Status: assignednew

comment:14 by pramsey, 14 years ago

Milestone: PostGIS 1.5.0PostGIS Future

I'm rolling this out to Future

comment:15 by pramsey, 14 years ago

Version: 1.4trunk

comment:16 by jadams, 13 years ago

Description: modified (diff)
Resolution: newfixed
Status: newclosed

At some point (probably the refactoring into -core that happened for 1.5) this behavior was changed to always use a single transaction, rather than one per 250 records.

I've added a "-e" command line flag that prevents using a transaction. Revision 6909 in trunk.

The discussion of what to do with invalid geometries has been moved into its own ticket: http://trac.osgeo.org/postgis/ticket/859

comment:17 by robe, 7 years ago

Milestone: PostGIS FuturePostGIS Fund Me

Milestone renamed

Note: See TracTickets for help on using tickets.