Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#3031 closed defect (fixed)

"POINT EMPTY" as WKB defaults to "MULTIPOINT EMPTY" when restoring with copy and failing validation

Reported by: jeanchristophesaville Owned by: pramsey
Priority: medium Milestone: PostGIS 2.1.6
Component: postgis Version: 2.1.x
Keywords: POINT EMPTY, MULTIPOINT EMPTY Cc:

Description

Using version "POSTGIS="2.1.2 r12389" GEOS="3.4.2-CAPI-1.8.2 r3921" PROJ="Rel. 4.8.0, 6 March 2012" LIBXML="2.6.26" LIBJSON="UNKNOWN""

I have a table with a column definition of: ALTER TABLE "myTable" ADD COLUMN point_as_4326 geometry(Point,4326);

In which I will store 'POINT EMPTY' for invalid point geometries

However when I try and restore a previous dump file to this table I get the following error.

COPY failed for table "position": ERROR: Geometry type (MultiPoint) does not match column type (Point)

Upon further investigation I found that my version of PostGIS seems to produce the same WKB for 'POINT EMPTY' and 'MULTIPOINT EMPTY' which in turn fails the column validation of geometry(Point,4326);

To demonstrate the problem behaviour:

SELECT ST_GeomFromText('POINT EMPTY') —> producess '010400000000000000'

SELECT ST_GeomFromText('MULTIPOINT EMPTY') —> producess '010400000000000000' SELECT ST_GeomFromText('POINT EMPTY') = ST_GeomFromText('MULTIPOINT EMPTY') —> produces true

SELECT ST_AsText('010400000000000000') —> produces 'MULTIPOINT EMPTY' — I expect 'POINT EMPTY' here

SELECT ST_AsText(ST_GeomFromText('POINT EMPTY')) —> However produces 'POINT EMPTY'

It seems that there are explicit constants for various empty geometry types as per the following SELECT ST_AsText('010200000000000000') —"LINESTRING EMPTY" SELECT ST_AsText('010300000000000000') —"POLYGON EMPTY" SELECT ST_AsText('010400000000000000') —"MULTIPOINT EMPTY" SELECT ST_AsText('010500000000000000') —"MULTILINESTRING EMPTY" SELECT ST_AsText('010600000000000000') —"MULTIPOLYGON EMPTY" SELECT ST_AsText('010700000000000000') —"GEOMETRYCOLLECTION EMPTY" SELECT ST_AsText('010800000000000000') —"CIRCULARSTRING EMPTY" SELECT ST_AsText('010900000000000000') —"COMPOUNDCURVE EMPTY"

However I'm not sure what happened to "POINT EMPTY" which I haven't come accross yet

In version "POSTGIS="1.5.8" GEOS="3.4.2-CAPI-1.8.2 r3921" PROJ="Rel. 4.8.0, 6 March 2012" LIBXML="2.6.26" USE_STATS (procs from 1.5 r5976 need upgrade)" SELECT ST_GeomFromText('POINT EMPTY') — produces '010700000000000000' = "GEOMETRYCOLLECTION EMPTY"

All of the above to ask why are we not using an explicit POINT EMPTY value and how can I get my previous backup restored without failing geometry type validation.

Change History (30)

comment:1 by robe, 9 years ago

Milestone: Management 2.0PostGIS 2.1.6

comment:2 by robe, 9 years ago

It's complicated :). As I recall it being explained to me - its a mismatch between the canonical and the internal representation. Since backup uses the canonical, canonical doesn't know how to represent POINT EMPTY so restore comes back as MULTIPOINT EMPTY.

pramsey can explain better why this limitation. I can't honestly remember why it was so complicated to get the canonical right. This is a known issue but I guess few have stumbled on the issue with restore which makes this a bit more serious.

I thought we had this in our ticket system somewhere as an open issue, but I can't find it.

comment:3 by pramsey, 9 years ago

Every type except point as a WKB representation that goes

endian[byte] + typenumber[integer] + numpoints[integer]

those are the patterns you're seeing for the other types. endian, type, and then a zero, indicating the object has zero elements or zero points

trouble is, the wkb for point is

endian[byte] + typenumber[integer] + x[double} + y[double]

there's nowhere to put in the information that the point is actually empty

our internal representation is slightly fluffier, but in order to retain backwards compatibility with postgis 1.x when we moved to 2.x we kept the same canonical representation… this allowed things like apps (mapserver, geoserver, fme, etc) to keep on functioning w/o changes against 2.0 which was great. the only problem we've had is this compromise we had to make on POINT EMPTY

probably the "best" fix would be to stop creating POINT EMPTY in *any* eventuality and always create a MULTIPOINT EMPTY if called upon for an empty pointy thing

comment:4 by pramsey, 9 years ago

Hm, other alternative representations for POINT EMPTY:

  • POINT(NaN NaN)
  • NULL

Neither is ideal, but each at least matches the type of the original, so wouldn't cause the constraint failure we have here.

comment:5 by strk, 9 years ago

POINT(NaN NaN) is good for me. At least there's another known implementor of it: http://lists.osgeo.org/pipermail/gdal-dev/2013-December/037734.html

I don't think this should go in 2.1 though.

comment:6 by pramsey, 9 years ago

On the one hand, it's a "big" change, on the other hand, without it, folks on the current stable branch cannot dump/restore a point table that has an EMPTY in it. That feels like a bug, and worthy of fixing here.

comment:7 by strk, 9 years ago

Changing the WKB to "POINT NAN NAN" won't help with restoring existing dumps, so we'll still need support from postgis_restore.pl to do that. Or, for those NOT using postgis_restore.pl, we might need to make the constraint looser (add OR ST_IsEmpty() ?)

comment:8 by pramsey, 9 years ago

Not for existing dump files, no, but it's not like people aren't out there running backups on their 2.1 databases every day. The sooner we fix this, the sooner it is fixed.

comment:9 by robe, 9 years ago

I'm fine with this going into 2.1

comment:10 by robe, 9 years ago

I should add why I think this is important to get in NOW — we can't count on constraints since most people use typmod now (like this guy) and besides there are so many ways people restore their data that we can't expect people to have postgis_restore.pl

comment:11 by pramsey, 9 years ago

OK, a patch for this at r13359. The patch is actually a suggestion from strk, to just allow MULTIPOINT EMPTY inputs a special pass through the typmod checker. It turned out to be less invasive code than I feared, so I went for that. The reason I did was to avoid emitting POINT(Nan nan) for our WKB output, which could potentially break downstream applications not expecting such a fun construct.

Question: for trunk, do we go to POINT(nan nan) or keep our funny little hack in place?

comment:12 by pramsey, 9 years ago

OK, so that fix actually breaks Regina's carefully constructed all-on-all regression test for typmod, which is so complex I cannot change it myself. I backed it out. Comments?

comment:13 by robe, 9 years ago

what test is that?

comment:14 by pramsey, 9 years ago

regress/typmod

comment:15 by robe, 9 years ago

I think you are giving me way too much credit. I don't think I wrote that. Looks more like strk's craftmanship.

comment:16 by robe, 9 years ago

Proof it's strk's work — #1085

comment:17 by robe, 9 years ago

I looked at the regress and your failures here:

http://debbie.postgis.net:8080/job/PostGIS_Regress/5936/consoleFull

I guess the question is if we need to change the test or just the regress output or your code. I thinkyou need to change your code (as its still not right, but the test output needs to be changed as well since it assumes screwed up behavior which we are trying to fix)

So here is how you read it. ( If I read this right, then I've probably been spending way too much time talking with strk :) )

pattern is: table_to_insert|srid_of_geom|type_of_geom|zmflag_geom|out_status

for out_status: OK means text insert pass KO means text insert fail BOK is binary insert pass (binary is from a COPY proces so the equivalent of our restore) BKO is binary insert failure GOK is text geography pass, GKO: geography plain text failure etc.

and sample of putput from debbie

-point|0|Point|0|KO-BKO-GKO:-BGKO
+point|0|Point|0|OK-BKO-GOK-BGKO
-pointm4326|0|Point|1|KO-BKO-GKO:-BGKO
+pointm4326|0|Point|1|OK-BKO-GOK-BGKO

So from the above assuming I am reading these okay, I'm a little concerned about the fact that you can do a text rep insert but not a binary as I thought this was supposed to fix that issue.

I could very well be misreading this. But I would expect the answers to be for new not the same as the old, but not what you are emitting

--this is what it should be
point|0|Point|0|OK-BOK-GOK-BGOK

-- this is what the old returns
point|0|Point|0|KO-BKO-GKO:-BGKO

-- this is still wrong (assume this is with your NAN NAN)
point|0|Point|0|OK-BKO-GOK-BGKO

To quickly test this is what I get with current which is wrong behavior (but what current emits)

CREATE SCHEMA tm;
CREATE TABLE tm.point(id serial, g geometry(point), gg geography(point);

-- note this is mimicking strk's test without resorting to backup and restore
 (this test is a bit flawed  as you really aren't testing POINT EMPTY but POINT EMTPY casted to geomety casted as text to unknown so it's dumb luck the answers look agreeable

-- meh KO
-- ERROR:  Geometry type (MultiPoint) does not match column type (Point)
********** Error **********
INSERT INTO tm.point(g)
VALUES('POINT EMPTY'::geometry::text || '');

-- meh GKO
INSERT INTO tm.point(gg)
VALUES('POINT EMPTY'::geography::text || '');

-- meh KO, BGKO 
-- ERROR:  Geometry type (MultiPoint) does not match column type (Point)
INSERT INTO tm.point(g, gg)
VALUES('POINT EMPTY'::geometry::bytea, 'POINT EMPTY'::geography::bytea);

comment:18 by strk, 9 years ago

Hey, Regina, thanks for this in-depth description of the test. I'm the author but I swear it was useful for me to read your documentation. Could you maybe write it (if not already present) in the typmod.sql test file, for future reference ?

I was wondering, reading your description, what is the ":" symbol for ? In

-pointm4326|0|Point|1|KO-BKO-GKO:-BGKO
+pointm4326|0|Point|1|OK-BKO-GOK-BGKO

comment:19 by strk, 9 years ago

btw, I agree about updating the test to STOP expecting the bogus behaviour of being unable to restore point empty

comment:20 by robe, 9 years ago

strk you have in your code :-BGKO (as signifier of binary geography fail) I wasn't sure why you put a : there maybe its a typo in the code we should remove.

comment:21 by strk, 9 years ago

looked, I agree it's a typo, to be removed once we change expected results.

comment:22 by strk, 9 years ago

Actually, typo fixed with r13361 in 2.1 branch and r13362 in trunk - so we don't mix style and functional changes.

comment:23 by strk, 9 years ago

Looking at the currently expected output, I see issues with both point and multipoint.

A sane expectance, for a (polygon) typmoded geometry/geography, is to accept only 2d polygons, of any srid:

polygon|0|Polygon|0|OK-BOK-GOK-BGOK
polygon|4326|Polygon|0|OK-BOK-GOK-BGOK
polygon||COUNT|4|                                                               
polygon||GCOUNT|4|

But we're also expecting (insanely, I'd say) that for a (multipoint) typmoded geometry/geography, to accept both 2d multipoints AND points:

multipoint|0|Point|0|OK-BOK-GOK-BGOK                                            
multipoint|0|MultiPoint|0|OK-BOK-GOK-BGOK                                       
multipoint|4326|Point|0|OK-BOK-GOK-BGOK                                         
multipoint|4326|MultiPoint|0|OK-BOK-GOK-BGOK
multipoint||COUNT|8|                                                            
multipoint||GCOUNT|8|                                

And (again insanely) that for a (point) typmoded geometry/geography, to accept NOTHING:

point||COUNT|0|                                                                 
point||GCOUNT|0| 

By "accept" here we mean by stepping trough WKB. The wrong failures in the point case are due to WKB converting the point into a multipoint. The wrong successes I'm not sure about (why is a point accepted in a multipoint?)

comment:24 by strk, 9 years ago

Oh, now I got why for a multipoint we accept both multipoint and point. It is because the _value_ of POINT EMPTY becomes a multipoint and is thus accepted. So it's still the same exact problem we're trying to address.

So, with the workaround fix in place, we should be accepting both points and multipoints (we only test empties) for the point case and we'd be still accepting both points and multipoints (empty) for the multipoint case. Basically it would be making the point typmod as "tolerant" as the multipoint one (when it comes to empties).

The test should probably be further enhanced to also test non-empties…

comment:25 by strk, 9 years ago

With r13363 in 2.1 branch and r13364 in trunk I've added the non-empty point and multipoint cases to the test, confirming that a "point" typmod'ed geometry/geography do can accept non-empty points and a "multipoint" typmod'ed geometry/geography cannot accept non-empty points:

point|0|Point|0|KO-BKO-GKO-BGKO
point|0|PointNE|0|OK-BOK-GOK-BGOK
point|4326|Point|0|KO-BKO-GKO-BGKO
point|4326|PointNE|0|OK-BOK-GOK-BGOK
point||COUNT|4|
point||GCOUNT|4|
multipoint|0|Point|0|OK-BOK-GOK-BGOK
multipoint|0|PointNE|0|KO-BKO-GKO-BGKO
multipoint|0|MultiPoint|0|OK-BOK-GOK-BGOK
multipoint|0|MultiPointNE|0|OK-BOK-GOK-BGOK
multipoint|4326|Point|0|OK-BOK-GOK-BGOK
multipoint|4326|PointNE|0|KO-BKO-GKO-BGKO
multipoint|4326|MultiPoint|0|OK-BOK-GOK-BGOK
multipoint|4326|MultiPointNE|0|OK-BOK-GOK-BGOK
multipoint||COUNT|12|
multipoint||GCOUNT|12|

comment:26 by pramsey, 9 years ago

OK, this has been a good process, the patch is now much more sensible (IMO) and committed at r13367. I have *not* changed the regress tests, they are going to fail, but that's on purpose, because I want you to review the failures and make sure they are sane to you. To my eyes they look right.

comment:27 by strk, 9 years ago

I'm ok with r13367, please update the test (you can add —expect to RUNTESTFLAGS) and forward-port the whole thing.

comment:28 by pramsey, 9 years ago

Resolution: fixed
Status: newclosed

Committed to trunk at r13372. All up-to-date. Should patch 2.0 also, I suppose? (same problem)

comment:29 by strk, 9 years ago

+1 for patching 2.0 too

comment:30 by pramsey, 9 years ago

Discussion and implementation of POINT EMPTY =⇒ POINT(NaN NaN) in #3181

Note: See TracTickets for help on using tickets.