#110 closed enhancement (fixed)
shp2pgsql new option to not batch commit — at Version 16
Reported by: | robe | Owned by: | mcayland |
---|---|---|---|
Priority: | low | Milestone: | PostGIS Fund Me |
Component: | postgis | Version: | master |
Keywords: | Cc: |
Description (last modified by )
What steps will reproduce the problem?
- If you have some geometries that fail insert such as polygons without
closed rings, it kills the current batch and gives a more or less meaningless message
- This has happened to me a lot and a lot of users we train. A lot of the
time the data should just rightfully not be added.
- Write now to overcome this I generate a .sql file and use sed to remove
the begin commits which means I can't use the normal | to directly load
I think the simplest option is to allow users with a flag to have the option to not have begin commits so that everything is in its own transaction. That way at least a whole batch of good records aren't lost because of one bad apple.
I know we have talked about other options such as allowing these beasts into the database or nulling the geometry if invalid etc — though those options are trickier to implement and may not handle all cases.
Change History (16)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
Though that would still leave me with not able to |.
The old shp2pgsql (i) does do that already. I always thought the reason for commits every 250 records was for some performance reason.
comment:3 by , 16 years ago
Sure. In terms of performance, putting everything in one transaction is a lot quicker than inserting everything separately. So removing the COMMIT every 250 records with actually speed things up, and saves you having to manually sort out by hand what got inserted and what didn't - you can just re-run the batch. It still makes sense to wrap everything within a single transaction.
So then the solution you're looking for is actually ii).
ATB,
Mark.
comment:4 by , 16 years ago
Yes 2 if possible — also have the option to NULL the geometry. I think others have asked for the ability to NULL.
Its a reality we get bad data and the last set I had to deal with was intentionally bad. They had used the same malformed geometry to represent the absense of a geometry — presumably to satisfy ESRI shapefile that requires a geometry for all records.
After trying to salvage these things — closing them and so forth — to my shock they were all the same malformed creatures.
comment:5 by , 16 years ago
Well doing i would be good too. Though not sure its worth it for 1.3.6.
Ah what am I thinking I can use sed to pipe the pipe I guess. So its not a big deal except on windows where I have no sed.
comment:6 by , 16 years ago
Sure. 1.3 codebase is maintenance only as far as I am concerned; any changes will go to 1.4. Feel free to submit a patch BTW
ATB,
Mark.
comment:7 by , 16 years ago
I was just wondering if there could be an option in shp2pgsql that puts all of the 'not isvalid(the_geom)' geometries in a same named table with a '_err' suffix?
comment:8 by , 16 years ago
The validity check at the moment is currently only OGC validity (i.e. closed rings, correct number of points), not topological (IsValid) validity. There is currently no link between the new liblwgeom and GEOS.
ATB,
Mark.
comment:9 by , 16 years ago
I like Bob's idea of throwing in an error table or giving the option to do so though as you said we can't do a true st_isvalid check.
I think the fact there is no GEOS binding is fine. The main issue to me is that these are just thrown out and are therefore hard to repair since its easier to repair things in the db - if we could even throw just the WKT in the table that would be better than nothing.
comment:10 by , 16 years ago
What is the current status of this? Is anyone planning to work on this for 1.4?
ATB,
Mark.
comment:12 by , 16 years ago
Description: | modified (diff) |
---|---|
Milestone: | → 2.0 |
Version: | → 1.4 |
comment:13 by , 15 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
comment:14 by , 15 years ago
Milestone: | PostGIS 1.5.0 → PostGIS Future |
---|
I'm rolling this out to Future
comment:15 by , 15 years ago
Version: | 1.4 → trunk |
---|
comment:16 by , 14 years ago
Description: | modified (diff) |
---|---|
Resolution: | new → fixed |
Status: | new → closed |
At some point (probably the refactoring into -core that happened for 1.5) this behavior was changed to always use a single transaction, rather than one per 250 records.
I've added a "-e" command line flag that prevents using a transaction. Revision 6909 in trunk.
The discussion of what to do with invalid geometries has been moved into its own ticket: http://trac.osgeo.org/postgis/ticket/859
Two things here:
i) Inserting transactions throughout a large dataset is, IMHO, wrong. You either want the whole set of data inserted atomically, or you handle the consequences yourself. So I would suggest removing all of the code related to batching in LoadData() and just have one single BEGIN…COMMIT block. At least then if the insert fails, you can manually remove the BEGIN/COMMIT yourself and live with what gets inserted.
ii) The 1.4 version of loader currently throws an error if an invalid geometry is inserted; it would be fairly easy to modify this so that the erroneus geometry were simply omitted.
ATB,
Mark.