Opened 14 years ago

Closed 14 years ago

#253 closed defect (fixed)

Gist causes wrong result from ~=

Reported by: nicklas Owned by: pramsey
Priority: medium Milestone: PostGIS 1.4.1
Component: postgis Version: 1.3.X
Keywords: Cc:

Description

Something seems to have happen between postgis 1.3 and 1.4 or postgresql 8.3 to 8.4. I couldn't reproduce on 8.3 1.3, but on 8.4 both with trunk and realeased 1.4.0

When running this on the attached dataset:

select a.the_geom ~= b.the_geom from test a,test b where a.gid != b.gid and a.the_geom ~= b.the_geom;

I get the little bit absurd answer:
1537;1553;f
1553;1537;f

so the answer from ~= is different when appearing in the select-part then in the where-part

Without the index this doesn't happen. Then they are sorted away by the where-part. Also if I take away any of the other polygons of the set this fenomena dissapears. My conclusion is this indicates that the gist-index is involved when built on this particular dataset. The dataset was bigger from the beginning but I tried to take away as much as possible still preserving the error.

Something seems to have happen between postgis 1.3 and 1.4 or postgresql 8.3 to 8.4. I couldn't reproduce on 8.3 1.3, but on 8.4 both with trunk and realeased 1.4.0

/Nicklas

Attachments (2)

test.backup (2.0 KB ) - added by nicklas 14 years ago.
testcase2.backup (1.8 KB ) - added by nicklas 14 years ago.

Download all attachments as: .zip

Change History (36)

by nicklas, 14 years ago

Attachment: test.backup added

comment:1 by nicklas, 14 years ago

I have changed the test-table. Now it is more obvious that it by some reason only compares the boundingboxes for two particular polygons in the where-part but comparing as expected vertex to vertes in the select-part. The new testset only contains the result of st_envelope expect for the problematic polygons where I have edited the envelope-boxes without affecting their bboxes.

If it, as I think, is a gist-bug, can it still be PostGIS or is it PostgreSQL then?

/Nicklas

comment:2 by robe, 14 years ago

Well it does appear to be something amiss in our PostGIS 1.4. I tried your example on a PostgreSQL 8.2 windows install. The 1.3 returns no results as expected but the 1.4 returns the same absurd results you have just described.

One database running 1.3 and another running 1.4. I think its our removal of RECHECK that is doing it. So maybe its not a bug in postgis, but our removal of RECHECK may be exposing a bug in GIST.

I haven't tried yet on PostgreSQL 8.4, but on my PostgresSQL 8.2, if I create a new spatial database with 1.4, but instead switch the operator classes definition to this

CREATE OPERATOR CLASS gist_geometry_ops DEFAULT
   FOR TYPE geometry USING gist AS
   OPERATOR 1  <<,
   OPERATOR 2  &<,
   OPERATOR 3  &&,
   OPERATOR 4  &>,
   OPERATOR 5  >>,
   OPERATOR 6  ~= RECHECK,
   OPERATOR 7  ~,
   OPERATOR 8  @,
   OPERATOR 9  &<|,
   OPERATOR 10  <<|,
   OPERATOR 11  |>>,
   OPERATOR 12  |&>,
   FUNCTION 1  lwgeom_gist_consistent(internal, geometry, integer),
   FUNCTION 2  lwgeom_gist_union(bytea, internal),
   FUNCTION 3  lwgeom_gist_compress(internal),
   FUNCTION 4  lwgeom_gist_decompress(internal),
   FUNCTION 5  lwgeom_gist_penalty(internal, internal, internal),
   FUNCTION 6  lwgeom_gist_picksplit(internal, internal),
   FUNCTION 7  lwgeom_gist_same(box2d, box2d, internal)

Then the problem goes away.

comment:3 by robe, 14 years ago

Okay looking at our 1.3.7SVN exhibits the same problem. I see in the 1.3.7 we took out RECHECK. I ran 1.3.7 on my 8.3 windows install and it exhibits the same problem. The comment says

— — Create opclass index bindings for PG>83 — (No RECHECK since this is now handled as part of the GiST Access methods) —

So maybe its not really handled or not handled right. Though my 8.3 dev install is old 8.3.1 (which one are you running Nicklas?). I was going to upgrade to the latest to see if it fixes the issue.

comment:4 by nicklas, 14 years ago

I have tried: 8.3.7 with postgis 1.3.5 wich gives the expected result, returning nothing

Then I have tried 8.4.1 with postgis 1.4.0 and my spike, both giving the strange result with returning two rows. It is difficult to investigate what is happening since the fenomen dissapears at once you take away any of the other polygons. It must be something with the way the index is built with this specific dataset. I guess it makes this a quite mean bug since it can hide wrong answers among many correct answers. how to see how the index is built in this case?

/Nicklas

comment:5 by robe, 14 years ago

It sounds similar to the bug we had a while ago where the bboxes were not being regenerated, though that was a GIST bug fixed in 8.3.5. I suspect this one is tooa gist bug too.

I also tried with 8.3.7 (and if I remove the RECHECK (basically running with a 1.3.7)), it behaves badly like the 1.4.0. So its nothing different in our codebase — just that RECHECK that is camouflaging what I think to be a gist bug. I'm not even quite clear what RECHECK does. Mark or Paul — can you explain that. I'm concerned this is serious that may bite us in other unpredictable ways with the more common &&.

comment:6 by mcayland, 14 years ago

Yeah sorry. I've been half-following this but haven't really had any spare time to pick this up. Things should be back to normal now, so I'm happy to dig into this over the next week to find out what's actually happening.

ATB,

Mark.

comment:7 by robe, 14 years ago

Wait a minute. ~= is supposed to be an exact geometry matcher (it is the true geometry equality operator), how can you possibly not have that set to not RECHECK. Of course it is wrong.

I think we need to put RECHECK back into the 1.3.7 and 1.4.* though that annoyingly requires a dump restore for people to fix.

Nick — sorry I didn't look at the other issues you mentioned — there might be something else going on — but definitely not having RECHECK on ~= seems wrong to me.

comment:8 by mcayland, 14 years ago

No, this is wrong as all indexable operators (which is all of the ones in the gist_geometry_ops class) only work on the bounding boxes of the geometries - remember that && is only a bounding box overlap. So ~= should return true if the bounding boxes are equal, regardless of whether or not the geometries are the same.

ATB,

Mark.

comment:9 by nicklas, 14 years ago

Not according to the manual Mark

http://www.postgis.org/documentation/manual-svn/ST_Geometry_Same.html[[BR]]

And the thing here is that ~= operator gives different answer in the where-part and the select-part.

/Nicklas

comment:10 by robe, 14 years ago

Mark,

I disagree with you on this. It has always been that ~= is an exact geometry operator. I have always used it as such. the = operator is the equality operator. Our documentation has always said that ~= is true geometry equality operator — its the only one of the operators that does not work on just bounding box there it is probably the only one that requires a recheck.

I refer you to the docs — see how in the index — the ~= explicitly says geometry while the others say bounding box.

http://www.postgis.org/documentation/manual-svn/ch07.html#Operators http://www.postgis.org/documentation/manual-svn/ST_Geometry_Same.html

It is the ONLY operator that works against the geometry — which makes it a bit different from the others.

comment:11 by robe, 14 years ago

I meant to say = is the bounding box equality operator which makes that terribly confusing and makes for that wonderful weird but fast behavior of our group bys, DISTINCT, order bys etc.

comment:12 by mcayland, 14 years ago

Intuitively this just seems plain wrong - all of the operators apart from this one just slice the index, so why is this one the exception? I would say this is a bug in the original behaviour. If anyone wants to compare geometry equality then they can use ST_Equals() or even create _ST prefix wrapper SQL functions which is what we have done for everything else.

ATB,

Mark.

comment:13 by robe, 14 years ago

I agree with you, but not sure its worthwhile to remove the RECHECK for this. There are a couple of problems with using ST_Equals and ST_OrderEquals which I am sure Kevin and I have brought up before — and may be in bug list somewhere.

The ST_Equals doesn't use a spatial index. Yes we should consider this a bug

CREATE OR REPLACE FUNCTION st_equals(geometry, geometry)

RETURNS boolean AS

'$libdir/postgis-1.5', 'geomequals'

LANGUAGE 'c' IMMUTABLE STRICT COST 1;

I personally never use ST_Equals when speed is a concern because of that reason or without an &&. I think we might actually have this in somewhere as bug that Kevin posted.

And that ain't really the same thing anyway. ST_Equals doesn't care about directionality and uses intersection matrix (so really a spatial equality so to speak) so is not a true geometry equality operator where as st_geometry_same is closer to true geometric equality.

It is more equivalent to ST_OrderingEquals which hmm although it appears to use a spatial index (the ordering is sometimes wrong I think so its slower than using ~=). It should just be using the explicit ST_Geometry_Same function (which shouldn't have an ST in front of it BTW) :).

comment:14 by robe, 14 years ago

Here is the ST_OrderingEquals definition — note that it too might be broken (without the recheck) if its inlined

CREATE OR REPLACE FUNCTION st_orderingequals(geometry, geometry)

RETURNS boolean AS

$BODY$

SELECT $1 && $2 AND $1 ~= $2 $BODY$

LANGUAGE 'sql' IMMUTABLE STRICT

comment:15 by robe, 14 years ago

I just thought of something. Note that I feel all these changes are so breaking in what people (granted possibly a minority of people are used to), that it should wait till 2.0.

The = operator uses a btree index not a GIST index (and I think it has to because of some sort of requirement that is baked into the PostgreSQL behavior for dealing with DISTINCT, GROUPING, ORDER etc.,

so the only equality operator we have that uses a gist index is ~=. Now I suppose you could leave the recheck out and change the ST_OrderingEquals to be

$1 ~= $2 AND _ST_OrderingEquals($1,$2)

ST_Equals — then ST_Equals could use

$1 ~= $2 AND _ST_Equals($1,$2)

(Right now note ST_Equals would have to use && which is much less efficient than a bounding box = because we have no bounding box = that uses GIST index) and we can't use ~= because well ST_Equals does not imply true geometry equality though would have a requirement that the bounding boxes are equal.

comment:16 by kneufeld, 14 years ago

Mark, I agree that intuitively this does seem wrong. We currently don't have an operator that tests just for bbox equality in gist_geometry_ops. ~= does seem to fit bill. I too think this is a bug in the original behaviour.

I like the idea to change ~= to be just bbox equality and change ST_Equals and ST_OrderingEquals appropriately as Regina suggested. (Yea, bbox equality that uses gist indexes!)

Also, I agree that this a behavioural change and will have to wait till 2.0. I for one use ~= (currently meaning exact geometry equality) a lot.

Regina, on the topic of equality, doesn't the definition of ST_OrderingEquals have a redundent index lookup in there? the && could probably be removed (at least until 2.0 when the def of ~= may change). And it looks like ST_Equals doesn't use any indexes. I was expecting to see "SELECT $1 && $2 AND _ST_Equals($1, $2)".

So, does this mean we need to put the RECHECK back in for — Kevin

comment:17 by kneufeld, 14 years ago

oops, posted too soon.

So, does this mean we need to put the RECHECK back in for ~= in 1.4.1 to fix the bug?

— Kevin

comment:18 by robe, 14 years ago

Milestone: postgis 1.5.0postgis 1.3.7
Version: 1.4.X1.3.X

I would say so and definitely put it back for 1.3.7SVN before 1.3 people are screwed. I don't think there is a way to do this without forcing everyone to dump reload or reindex. I experimented with sneaking it in by dropping cascade and rebuilding my indexes (but I guess there is some other stuff besides index that a drop cascade does since when I tried to build my index, it crashed :().

Anyway people who upgraded there db from 1.3 to 1.4 without a dump restore are probably not affected. Just new installs I suspect.

Luckily I have a feeling Kevin, that you and I are in a minority of the people who use this operator or really know what it does. So we can patch the ST_OrderingEquals so its not affected and that will alleviate further damage for those where the drop operator class is not an option.

comment:19 by mcayland, 14 years ago

Yeah, I think between the 3 of us we've roughly agreed on what needs to be done. Regina, we never actually took RECHECK out of 1.3 branch (and I don't think we should) so I would suggest putting the milestone back to 1.4.1.

Note we still haven't come to a concensus yet on how we should handle SQL-level upgrades. This is going to become more important as it seems we need this for 1.4.1 so we need to make a decision on this reasonably soon.

ATB,

Mark.

comment:20 by robe, 14 years ago

We did in branch. We don't have it in branch 1.3 anymore so someone took it out. Put it back.

comment:21 by mcayland, 14 years ago

I think you must be thinking of trunk/1.4 branch? According to the SVN log, it has never been there.

http://trac.osgeo.org/postgis/log/branches/1.3/lwgeom/lwpostgis.sql.in

ATB,

Mark.

comment:22 by robe, 14 years ago

Its in there as a conditional — only if you are running 8.4+ it seems so probably not as big of a deal but we should put it back.

See line 1347 (#else the if you are running 8.4+ branch)

comment:23 by robe, 14 years ago

here is the smoking gun r3287

comment:24 by mcayland, 14 years ago

Oh - that's fine as the RECHECK keyword isn't used anymore for 8.4+, and so the top part of the diff enables it within the AM instead.

ATB,

Mark.

comment:25 by robe, 14 years ago

Milestone: postgis 1.3.7postgis 1.4.1

comment:26 by nicklas, 14 years ago

ok I don't understand everything from the discussion above, but I have two questions, that at least to me seems relevant :-) If the problem is this RECHECK thing, why isn't it working as expected from Paul's comment in r3287. I read it as he is putting that recheck flag there, instead of in the operator class definition. If the recheck doesn't work as expected I guess that's a good explanation of the unpredictable behavior of the bug from Paul's comment that it is the conversion from float8 to float4 that makes it nessecary. But at the other side I think that should only cause a little bit to many true-answers (because all boxes is rounded to be a little bigger) for the bbox-comparation. Why does it make the planner choose not to do the vertex to vertex comparasion.

the second question, can this recheck-thing explain that there is different answer between select-part and where-part of the query. That part of the issue worries me a bit.

and the third unannounced question. Do we (read you) have control that no other functions are affected so we don't just hide the symtom instead of finding the cause.

/Nicklas

comment:27 by robe, 14 years ago

The reason why its different in the WHERE from the SELECT part is that the indexed calculation is only used in WHERE (the index doesn't kick in on the SELECT)

Since GIST is lossy (basically only works against the bbox) and ~= was not meant to be, the GIST answer is just bbox compare. What the RECHECK does is say if the index answer returns true, check the real value. Without the RECHECK, it would accept the answer of the index which in this case is wrong. So the compare based on index would just be checking the 4 numbers of the bbox.

the float4, float8 issue I haven't thought of, but that would cause the situation where the index answer is wrong even for a bbox case I guess.

by nicklas, 14 years ago

Attachment: testcase2.backup added

comment:28 by nicklas, 14 years ago

now I'm little confused. I thought I found this as a special case, because I had to struggle to minimize the dataset without destroying the error.

If you try se same thing at testcase2 it will work as expected and return nothing and the only difference is that I took away one of the other polygons and ran vacuum full analyze on it.

But now I see that it maybe is the planner that is cheating me. on testcase2 it
doesn't seem to use the index:
it seems like the change in behavior is because the planner decides the dataset is too small for using the index. Am I right? /Nicklas

comment:29 by robe, 14 years ago

That sounds about right. If the planner doesn't use the index you should see the behavior you expect whereas when it does — it will use the bboxes from the index to satisfy the WHERE and the geometry from the table to satisfy the SELECT.

not all databases behave this way by the way and PostgreSQL may change later to implement covered indexes the way say SQL Server does so that the index value may also be used if you mark the answer as not lossy (so it never needs to hit the table). In that case the difference will be totally camouflaged and harder to troubleshoot.

comment:30 by nicklas, 14 years ago

I just noticed that ~= is defined in postgresql too. As I read it there is the same definition as postgis of today that the geometries should be "the same" to return true even if the example given, makes me quite confused if the order of vertexes matters or not. http://www.postgresql.org/docs/8.4/static/functions-geometry.html

just a note toplans of changing ~= to just compare bboxes from the discussion above.

comment:31 by pramsey, 14 years ago

Using the 8.4 feature of conditional re-checking, I've re-enabled re-check on just the ~= operator (r4736).

comment:32 by pramsey, 14 years ago

Also applied into 1.4 branch at r4738.

comment:33 by pramsey, 14 years ago

Fix for older versions of PgSQL also committed to 1.4 branch at r4756. Turns out this can be done an operator at a time, so no harm to other ops.

comment:34 by pramsey, 14 years ago

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.