Opened 13 years ago
Closed 13 years ago
#1505 closed defect (fixed)
Deal with database dumps containing geometries with SRID > MAX_SRID
Reported by: | strk | Owned by: | pramsey |
---|---|---|---|
Priority: | high | Milestone: | PostGIS 2.0.0 |
Component: | postgis | Version: | master |
Keywords: | Cc: |
Description
Currently you'd be unable to restore. SRID < -1 is fine because it gets converted to 0. Should we convert SRID > MAX_SRID to something ?
See #1504 for such case happening in real world.
Attachments (1)
Change History (23)
comment:1 by , 13 years ago
comment:4 by , 13 years ago
We just need to define ranges, CRC or simple modulo would then work at fitting in. Does anyone feel like drafting such set ?
One range would likely be "system range", which is where we promise to put standard entries. Another range would be "user range" where we suggest users to put theirs. Another would be "reserved" which is those you can't put in spatial_ref_sys but possibly have special internal meaning. Finally we'd be adding a "lost+found" range, is that your proposal Paul ? Such "lost+found" range would be the range into which rows with SRID>MAX_SRID would be put, in a deterministic way. The user should then be warned about the need of taking his geometries out of that range and into the valid "user range" instead. Correct ?
comment:6 by , 13 years ago
Well, then next step is about setting those ranges, with spatialreference.org and the abundance of bits we have in mind.
comment:7 by , 13 years ago
A proposal for ranges: http://postgis.refractions.net/pipermail/postgis-devel/2012-February/018463.html
comment:8 by , 13 years ago
There's another issue with the idea of the CRC: spatial_ref_sys limit is lower than typmod limit. We allow "internal SRIDs" in typmod but not in spatial_ref_sys. So we don't want to use the same function to clamp. The data in the dump should be clamped to the spatial_ref_sys range.
comment:9 by , 13 years ago
I've attached an example dump with high srids. Another issue with it is that it has SRID checks constraints on the table…
by , 13 years ago
Attachment: | highsrid.dump.gz added |
---|
Example dump with high and -1 srid geoms and tables
comment:10 by , 13 years ago
I'm wondering if we could re-use the "internal" range also for lost+found. It would mean that although postgis ST_Transform will NOT lookup spatial_ref_sys for those SRID values you could still add them, and you could then be somehow notified about the matter by a trigger in the table or some other mean.
comment:11 by , 13 years ago
Alright I've a stub for postgis_restore.pl dealing with table constraints and spatial_ref_sys records. Still missing (beside ranges definition approval) is the actual geometry values. I've no intention to do that part in perl (parsing HEXWKB) so we'll really need cooperation between perl and core postgis and would then be much better to set some ranges in stone.
comment:12 by , 13 years ago
Ok, here's the new plan.
1) We drop the 999000 limit in spatial_ref_sys and transform that to a WARNING raising trigger 2) We clamp any value > the typmod max into the "reserved range" (currently 999000..999999)
That way geometry values with out-of-range srids would be dragged into the "reserved range" thus being kind of unusable for reprojection purposes but identifiable by looking at the postgis_restore.pl log.
Redefining ranges would still be useful.
comment:13 by , 13 years ago
I don't really agree with clamping into the reserved range. I think that overlarge SRIDs should be mapping into a usable range. I also don't want conflicts in the reserved range. I also wonder if we can't have all of this handled in the core, without any changes at all to postgis_restore.pl (that was certainly how I was envisioning implementing it).
comment:14 by , 13 years ago
Can't be all in the core because the table definition includes explicit SRIDs mentioned. Check out the attached dump.
I've no problem avoiding the conflict in the reserved range.
comment:15 by , 13 years ago
actually, I do have the problem of clamp_srid() being unable to deal with both dump imports and normal runs. When should it clamp out-of-reserved and when out-of-valid ?
comment:16 by , 13 years ago
To be clearer: if the dump has a check like this:
"enforce_srid_geom2" CHECK (st_srid(geom2) = 999002)
The core can't do anything about it, it'll always fail, so we need to fix at the restore end.
Converting to typmod could do, but anyway if we _are_ the core we can't refuse to use 999002 because otherwise our system of reserved srids handling would also fail.
comment:17 by , 13 years ago
So bottom line: for the sake of CLAMPING we can only clamp at the hard-limit (currently pretending to be 999999).
comment:18 by , 13 years ago
in turn this means that we _have_ to accept SRIDs in the range 999000..999999 anyway, which means we have to raise (or drop) the check on spatial_ref_sys. At that point why not reusing the range _also_ for lost+found ? Clamping is a corner-case operation anyway, and must/will be noisy. It's currently an EXCEPTION.
comment:19 by , 13 years ago
Oh, about spatial_ref_sys… we do not really need to raise the check as postgis_restore.pl already takes care of dropping it if can't be applied. So fair players will have the right check, while those upgrading and having hard deviant SRIDs will not.
comment:20 by , 13 years ago
So, early testing immediately hit the known limit:
WARNING: SRID value 20123123 converted to 999123 WARNING: SRID value 1123123123 converted to 999123
ERROR: duplicate key value violates unique constraint "spatial_ref_sys_pkey"
That's converting using modulo over the reserved space composed by 1000 slots. With same algorithm used by postgis_restore.pl and core.
comment:21 by , 13 years ago
Not sure what the point is here… we're mapping 4B values into 1000 slots, it's known we're going to have a chance of collision. Particularly if we use an even modulo (1000) peoples tendency to build SRIDs by starting from a round number will drastically increase odds of collision. Using a CRC is one way to avoid this. Another is to just go modulo 999.
comment:22 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Alright, as of r9145 we have the cooperative clamping in postgis_restore and core. The attached dump restores with a single non-blocking error:
ERROR: check constraint "spatial_ref_sys_srid_check" is violated by some row
That error is expected and doesn't prevent all entries from being restored. This was an already implemented "feature" of disabling the check and attempting to recreate it later for the sake of reestablishing it or raising an error to the user attention.
Actually, now that we clamp the values we do know that NO SRID > MAX_SRID will end up in spatial_ref_sys so could avoid that warning, except that the original spatial_ref_sys wants SRID to be also ⇐ MAX_USER_SRID, which is before the reserved zone…
Anyway, this ticket is completed. I would postpone cleanup (getting the range values in a central place, possibly at ./configure time) to after deciding about the actual range values.
OK, here's the plan…
If the SRID is > MAX_SRID we will take its 16-bit CRC. We'll then add that to the lower bound of a reserved space we can create to store out of range SRIDs. The advantage of the CRC is that it's deterministic, so we can apply the same trick if people insert a spatial_ref_sys record that is out of range: re-write the srid value and then insert it. That way a dump that includes geometries and spatial_ref_sys entries that are out of range will get re-written to use alternate SRIDs that are in range but still link up the geometry to the spatial_ref_sys srid value.
This is the best I can do. We can add in notices to try and get people to see the problem and be aware for the future. We might alternative re-write the auth_srid and auth_name when translating an out of bounds spatial_ref_sys entry, but that would be a nicety.