#1014 closed enhancement (fixed)
GEOMETRY type is not "hashable"
Reported by: | vince | Owned by: | pramsey |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 2.5.0 |
Component: | postgis | Version: | master |
Keywords: | GEOMETRY hash UNION | Cc: |
Description
It seems the GEOMETRY type is not hashable, which prevents recursive WITH clauses to work with the UNION keyword:
WITH RECURSIVE path (id, the_geom, z) AS (SELECT id, the_geom, z_min AS z FROM "bâti" WHERE z_min = (SELECT MAX(z_min) FROM "bâti") UNION SELECT "bâti".id, "bâti".the_geom, z FROM path, "bâti" WHERE ST_Distance ("bâti".the_geom, path.the_geom) < 20 AND "bâti".z_min > path.z - 10) SELECT * FROM path; ERROR: could not implement recursive UNION DETAIL: All column datatypes must be hashable.
Using UNION ALL instead of UNION works, but produces duplicates.
I have no clue on how to make the GEOMETRY type hashable, but if it is just providing a hash function, it might maybe worth it?
Change History (18)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
9.0.3
psql -V psql (PostgreSQL) 9.0.3 contains support for command-line editing
Is it tied somehow to PostGreSQL?
comment:3 by , 14 years ago
Unfortunately structures containing IEEE-754 floating point numbers cannot be marked HASHABLE because of the "negative zero" problem. There is an explanation of this here: http://www.virtualdub.org/blog/pivot/entry.php?id=259.
I suspect that this will have to be marked as a "won't fix"
comment:4 by , 14 years ago
Milestone: | PostGIS 2.0.0 → PostGIS 1.5.3 |
---|---|
Resolution: | → wontfix |
Status: | new → closed |
comment:6 by , 14 years ago
It seems there is an explicit check for zero added into the floating point hash function. However, in the case of PostGIS we'd have to take a copy of the geometry, isolate the floating point numbers within it, fix the zero values, hash it and destroy the copy. That's quite a lot of tricky work to do in order to make this work.
If someone can show that making geometry hashable is a big win then I would consider a patch, but it's going to need a fair bit of work.
comment:7 by , 14 years ago
Of course I can't alone state that. I was attempting to implement some kind of steepest descent algorithm into a single recursive SQL query; was thinking also about identifying areas potentially liable to flooding by growing iteratively around water polygons until a certain z was reached. I guess this is far to be a major use of PostGIS.
There is, of course, a workaround: do not include the geometry column in the result table (the one that grows iteratively), and, at each stage, perform a join between the result table and the table containing geometry info (using a primary key, for example). The query is more complex but, if, as you say, writing a hash function is tricky and maybe also cumbersome, it might even be more efficient to go that way.
Anyhow, thanks a lot for your input and time.
comment:8 by , 14 years ago
Wouldn't it be just as valid to hash based on the bounding box? That's down to only 4 zero-checks. Might not be as efficient for special cases handling gridded data or similar (but would still work), but would work fine in the general cases where you have no reason to expect any two random geometries to have the same bounding box.
comment:9 by , 12 years ago
I just hit this issue. Can't we have a special "hash" function outputting a geohash ?
comment:10 by , 9 years ago
I hit it too. I ended up working around it by just hashing on the EWKB (bytea) and casting that back and forth. Seems silly to do that (it worked) when we could have a bad geometry hash instead.
comment:11 by , 9 years ago
Milestone: | PostGIS 1.5.3 → PostGIS 2.2.0 |
---|---|
Resolution: | wontfix |
Status: | closed → reopened |
Type: | defect → enhancement |
Version: | 1.5.X → trunk |
I'm finding it hard to care about the issue mentioned in Mark's article. Basically, if I'm reading it right, it means that our hashing would provide a different value for:
POINT(0 0) and POINT(-0 0)
So if you tried to test equality with the hash function, you'd be told they aren't equal. And if you did a SELECT 'POINT(0 0)' UNION SELECT POINT(-0 0) you'd get back two rows instead of one.
This seems like hardly a deathly issue, and I being too cowboy here? We're already consuming POINT(NaN NaN) for infrastructural purposes, and not allowing convenient CTE recursion (in all cases) because we might accidentally duplicate 0 and -0 cases? Seem like nose cutting and face spiting.
comment:12 by , 9 years ago
Oh ha ha, turns out something is already happening…
pramsey=# select 'POINT(0 0)'::geometry UNION SELECT 'POINT(0 0)'::GEometry; geometry -------------------------------------------- 010100000000000000000000000000000000000000 (1 row) pramsey=# select 'POINT(0 0)'::geometry UNION SELECT 'POINT(-0 0)'::GEometry; geometry -------------------------------------------- 010100000000000000000000000000000000000000 (1 row) pramsey=# select 'POINT(-0 0)'::geometry UNION SELECT 'POINT(0 0)'::GEometry; geometry -------------------------------------------- 010100000000000000000000800000000000000000 (1 row)
comment:13 by , 9 years ago
This is looking harder than ever…
http://postgresql.nabble.com/Hashable-custom-types-tp5801576p5857261.html
comment:14 by , 9 years ago
Milestone: | PostGIS 2.2.0 → PostGIS Future |
---|
Just a final note before punting (read the link given above for full details). In order to get a proper hashing behaviour, we need the = operator to be defined in both btree and hash opclasses, with the same symbol and the same semantics. Basically that means going away from bbox equality in the old btree ops. Perhaps moving the btree over to geohash comparisons for < and > (to get a semi-rational ORDER BY) and maybe a topological equality or a vertex-level equality for '='.
comment:16 by , 7 years ago
Milestone: | PostGIS Fund Me → PostGIS 2.5.0 |
---|
Now that b-tree uses geometry_cmp on the exact value, we can use it for hashability too.
What PgSQL version are you on?