Opened 14 years ago

Closed 7 years ago

Last modified 7 years ago

#1014 closed enhancement (fixed)

GEOMETRY type is not "hashable"

Reported by: vince Owned by: pramsey
Priority: medium Milestone: PostGIS 2.5.0
Component: postgis Version: master
Keywords: GEOMETRY hash UNION Cc:

Description

It seems the GEOMETRY type is not hashable, which prevents recursive WITH clauses to work with the UNION keyword:

WITH RECURSIVE path (id, the_geom, z) AS (SELECT id, the_geom, z_min AS z FROM "bâti" WHERE z_min = (SELECT MAX(z_min) FROM "bâti") UNION SELECT "bâti".id, "bâti".the_geom, z FROM path, "bâti" WHERE ST_Distance ("bâti".the_geom, path.the_geom) < 20 AND "bâti".z_min > path.z - 10) SELECT * FROM path;
ERROR:  could not implement recursive UNION
DETAIL:  All column datatypes must be hashable.

Using UNION ALL instead of UNION works, but produces duplicates.

I have no clue on how to make the GEOMETRY type hashable, but if it is just providing a hash function, it might maybe worth it?

Change History (18)

comment:1 by pramsey, 14 years ago

What PgSQL version are you on?

comment:2 by vince, 14 years ago

9.0.3

psql -V
psql (PostgreSQL) 9.0.3
contains support for command-line editing

Is it tied somehow to PostGreSQL?

comment:3 by mcayland, 14 years ago

Unfortunately structures containing IEEE-754 floating point numbers cannot be marked HASHABLE because of the "negative zero" problem. There is an explanation of this here: http://www.virtualdub.org/blog/pivot/entry.php?id=259.

I suspect that this will have to be marked as a "won't fix" :(

comment:4 by robe, 14 years ago

Milestone: PostGIS 2.0.0PostGIS 1.5.3
Resolution: wontfix
Status: newclosed

comment:5 by vince, 14 years ago

How does PostGreSQL cope with float/double numbers?

comment:6 by mcayland, 14 years ago

It seems there is an explicit check for zero added into the floating point hash function. However, in the case of PostGIS we'd have to take a copy of the geometry, isolate the floating point numbers within it, fix the zero values, hash it and destroy the copy. That's quite a lot of tricky work to do in order to make this work.

If someone can show that making geometry hashable is a big win then I would consider a patch, but it's going to need a fair bit of work.

comment:7 by vince, 14 years ago

Of course I can't alone state that. I was attempting to implement some kind of steepest descent algorithm into a single recursive SQL query; was thinking also about identifying areas potentially liable to flooding by growing iteratively around water polygons until a certain z was reached. I guess this is far to be a major use of PostGIS.

There is, of course, a workaround: do not include the geometry column in the result table (the one that grows iteratively), and, at each stage, perform a join between the result table and the table containing geometry info (using a primary key, for example). The query is more complex but, if, as you say, writing a hash function is tricky and maybe also cumbersome, it might even be more efficient to go that way.

Anyhow, thanks a lot for your input and time.

comment:8 by chodgson, 14 years ago

Wouldn't it be just as valid to hash based on the bounding box? That's down to only 4 zero-checks. Might not be as efficient for special cases handling gridded data or similar (but would still work), but would work fine in the general cases where you have no reason to expect any two random geometries to have the same bounding box.

comment:9 by strk, 12 years ago

I just hit this issue. Can't we have a special "hash" function outputting a geohash ?

comment:10 by pramsey, 9 years ago

I hit it too. I ended up working around it by just hashing on the EWKB (bytea) and casting that back and forth. Seems silly to do that (it worked) when we could have a bad geometry hash instead.

Last edited 9 years ago by pramsey (previous) (diff)

comment:11 by pramsey, 9 years ago

Milestone: PostGIS 1.5.3PostGIS 2.2.0
Resolution: wontfix
Status: closedreopened
Type: defectenhancement
Version: 1.5.Xtrunk

I'm finding it hard to care about the issue mentioned in Mark's article. Basically, if I'm reading it right, it means that our hashing would provide a different value for:

POINT(0 0) and POINT(-0 0)

So if you tried to test equality with the hash function, you'd be told they aren't equal. And if you did a SELECT 'POINT(0 0)' UNION SELECT POINT(-0 0) you'd get back two rows instead of one.

This seems like hardly a deathly issue, and I being too cowboy here? We're already consuming POINT(NaN NaN) for infrastructural purposes, and not allowing convenient CTE recursion (in all cases) because we might accidentally duplicate 0 and -0 cases? Seem like nose cutting and face spiting.

comment:12 by pramsey, 9 years ago

Oh ha ha, turns out something is already happening…

pramsey=# select 'POINT(0 0)'::geometry UNION SELECT 'POINT(0 0)'::GEometry;
                  geometry                  
--------------------------------------------
 010100000000000000000000000000000000000000
(1 row)

pramsey=# select 'POINT(0 0)'::geometry UNION SELECT 'POINT(-0 0)'::GEometry;
                  geometry                  
--------------------------------------------
 010100000000000000000000000000000000000000
(1 row)

pramsey=# select 'POINT(-0 0)'::geometry UNION SELECT 'POINT(0 0)'::GEometry;
                  geometry                  
--------------------------------------------
 010100000000000000000000800000000000000000
(1 row)

comment:14 by pramsey, 9 years ago

Milestone: PostGIS 2.2.0PostGIS Future

Just a final note before punting (read the link given above for full details). In order to get a proper hashing behaviour, we need the = operator to be defined in both btree and hash opclasses, with the same symbol and the same semantics. Basically that means going away from bbox equality in the old btree ops. Perhaps moving the btree over to geohash comparisons for < and > (to get a semi-rational ORDER BY) and maybe a topological equality or a vertex-level equality for '='.

comment:15 by robe, 7 years ago

Milestone: PostGIS FuturePostGIS Fund Me

Milestone renamed

comment:16 by pramsey, 7 years ago

Milestone: PostGIS Fund MePostGIS 2.5.0

Now that b-tree uses geometry_cmp on the exact value, we can use it for hashability too.

comment:17 by pramsey, 7 years ago

Resolution: fixed
Status: reopenedclosed

In 16058:

Make geometry a hashable type, allowing recursive CTEs
to use geometry in their signatures.
Closes #1014

comment:18 by pramsey, 7 years ago

In 16059:

Equality test failed on points with same coordinates
and differing SRIDs, resulting in incorrect hashing.
Amazingly, this problem existed in *both* the btree
function and the new hash machinery (and it was the
btree machinery that was actually called for the
UNION operation).
References #1014

Note: See TracTickets for help on using tickets.