Opened 12 years ago

Closed 12 years ago

#1505 closed defect (fixed)

Deal with database dumps containing geometries with SRID > MAX_SRID

Reported by: strk Owned by: pramsey
Priority: high Milestone: PostGIS 2.0.0
Component: postgis Version: master
Keywords: Cc:

Description

Currently you'd be unable to restore. SRID < -1 is fine because it gets converted to 0. Should we convert SRID > MAX_SRID to something ?

See #1504 for such case happening in real world.

Attachments (1)

highsrid.dump.gz (43.8 KB ) - added by strk 12 years ago.
Example dump with high and -1 srid geoms and tables

Download all attachments as: .zip

Change History (23)

comment:1 by pramsey, 12 years ago

OK, here's the plan…

If the SRID is > MAX_SRID we will take its 16-bit CRC. We'll then add that to the lower bound of a reserved space we can create to store out of range SRIDs. The advantage of the CRC is that it's deterministic, so we can apply the same trick if people insert a spatial_ref_sys record that is out of range: re-write the srid value and then insert it. That way a dump that includes geometries and spatial_ref_sys entries that are out of range will get re-written to use alternate SRIDs that are in range but still link up the geometry to the spatial_ref_sys srid value.

This is the best I can do. We can add in notices to try and get people to see the problem and be aware for the future. We might alternative re-write the auth_srid and auth_name when translating an out of bounds spatial_ref_sys entry, but that would be a nicety.

comment:2 by robe, 12 years ago

+1

comment:3 by robe, 12 years ago

Another real world case #1152 .

comment:4 by strk, 12 years ago

We just need to define ranges, CRC or simple modulo would then work at fitting in. Does anyone feel like drafting such set ?

One range would likely be "system range", which is where we promise to put standard entries. Another range would be "user range" where we suggest users to put theirs. Another would be "reserved" which is those you can't put in spatial_ref_sys but possibly have special internal meaning. Finally we'd be adding a "lost+found" range, is that your proposal Paul ? Such "lost+found" range would be the range into which rows with SRID>MAX_SRID would be put, in a deterministic way. The user should then be warned about the need of taking his geometries out of that range and into the valid "user range" instead. Correct ?

comment:5 by pramsey, 12 years ago

Yes, correct.

comment:6 by strk, 12 years ago

Well, then next step is about setting those ranges, with spatialreference.org and the abundance of bits we have in mind.

comment:8 by strk, 12 years ago

There's another issue with the idea of the CRC: spatial_ref_sys limit is lower than typmod limit. We allow "internal SRIDs" in typmod but not in spatial_ref_sys. So we don't want to use the same function to clamp. The data in the dump should be clamped to the spatial_ref_sys range.

comment:9 by strk, 12 years ago

I've attached an example dump with high srids. Another issue with it is that it has SRID checks constraints on the table…

by strk, 12 years ago

Attachment: highsrid.dump.gz added

Example dump with high and -1 srid geoms and tables

comment:10 by strk, 12 years ago

I'm wondering if we could re-use the "internal" range also for lost+found. It would mean that although postgis ST_Transform will NOT lookup spatial_ref_sys for those SRID values you could still add them, and you could then be somehow notified about the matter by a trigger in the table or some other mean.

comment:11 by strk, 12 years ago

Alright I've a stub for postgis_restore.pl dealing with table constraints and spatial_ref_sys records. Still missing (beside ranges definition approval) is the actual geometry values. I've no intention to do that part in perl (parsing HEXWKB) so we'll really need cooperation between perl and core postgis and would then be much better to set some ranges in stone.

comment:12 by strk, 12 years ago

Ok, here's the new plan.

1) We drop the 999000 limit in spatial_ref_sys and transform that to a WARNING raising trigger 2) We clamp any value > the typmod max into the "reserved range" (currently 999000..999999)

That way geometry values with out-of-range srids would be dragged into the "reserved range" thus being kind of unusable for reprojection purposes but identifiable by looking at the postgis_restore.pl log.

Redefining ranges would still be useful.

comment:13 by pramsey, 12 years ago

I don't really agree with clamping into the reserved range. I think that overlarge SRIDs should be mapping into a usable range. I also don't want conflicts in the reserved range. I also wonder if we can't have all of this handled in the core, without any changes at all to postgis_restore.pl (that was certainly how I was envisioning implementing it).

comment:14 by strk, 12 years ago

Can't be all in the core because the table definition includes explicit SRIDs mentioned. Check out the attached dump.

I've no problem avoiding the conflict in the reserved range.

comment:15 by strk, 12 years ago

actually, I do have the problem of clamp_srid() being unable to deal with both dump imports and normal runs. When should it clamp out-of-reserved and when out-of-valid ?

comment:16 by strk, 12 years ago

To be clearer: if the dump has a check like this:

    "enforce_srid_geom2" CHECK (st_srid(geom2) = 999002)

The core can't do anything about it, it'll always fail, so we need to fix at the restore end.

Converting to typmod could do, but anyway if we _are_ the core we can't refuse to use 999002 because otherwise our system of reserved srids handling would also fail.

comment:17 by strk, 12 years ago

So bottom line: for the sake of CLAMPING we can only clamp at the hard-limit (currently pretending to be 999999).

comment:18 by strk, 12 years ago

in turn this means that we _have_ to accept SRIDs in the range 999000..999999 anyway, which means we have to raise (or drop) the check on spatial_ref_sys. At that point why not reusing the range _also_ for lost+found ? Clamping is a corner-case operation anyway, and must/will be noisy. It's currently an EXCEPTION.

comment:19 by strk, 12 years ago

Oh, about spatial_ref_sys… we do not really need to raise the check as postgis_restore.pl already takes care of dropping it if can't be applied. So fair players will have the right check, while those upgrading and having hard deviant SRIDs will not.

comment:20 by strk, 12 years ago

So, early testing immediately hit the known limit:

WARNING: SRID value 20123123 converted to 999123 WARNING: SRID value 1123123123 converted to 999123

ERROR: duplicate key value violates unique constraint "spatial_ref_sys_pkey"

That's converting using modulo over the reserved space composed by 1000 slots. With same algorithm used by postgis_restore.pl and core.

comment:21 by pramsey, 12 years ago

Not sure what the point is here… we're mapping 4B values into 1000 slots, it's known we're going to have a chance of collision. Particularly if we use an even modulo (1000) peoples tendency to build SRIDs by starting from a round number will drastically increase odds of collision. Using a CRC is one way to avoid this. Another is to just go modulo 999.

comment:22 by strk, 12 years ago

Resolution: fixed
Status: newclosed

Alright, as of r9145 we have the cooperative clamping in postgis_restore and core. The attached dump restores with a single non-blocking error:

ERROR:  check constraint "spatial_ref_sys_srid_check" is violated by some row

That error is expected and doesn't prevent all entries from being restored. This was an already implemented "feature" of disabling the check and attempting to recreate it later for the sake of reestablishing it or raising an error to the user attention.

Actually, now that we clamp the values we do know that NO SRID > MAX_SRID will end up in spatial_ref_sys so could avoid that warning, except that the original spatial_ref_sys wants SRID to be also ⇐ MAX_USER_SRID, which is before the reserved zone…

Anyway, this ticket is completed. I would postpone cleanup (getting the range values in a central place, possibly at ./configure time) to after deciding about the actual range values.

Note: See TracTickets for help on using tickets.