Opened 4 months ago
Closed 3 months ago
#5734 closed defect (fixed)
Wrong ST_EstimatedExtent result for geography column
Reported by: | strk | Owned by: | pramsey |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 3.4.3 |
Component: | postgis | Version: | 3.4.x |
Keywords: | Cc: |
Description
See this session:
strk=# select id, st_astext(g) from qgis_issue_30294; id | st_astext ----+---------------- 1 | POINT(-10 -10) 2 | POINT(20 30) (2 rows) strk=# analyze qgis_issue_30294; ANALYZE strk=# select st_estimatedextent('qgis_issue_30294', 'g'); st_estimatedextent ------------------------------------------------------------------------------- BOX(0.813017427921295 -0.173346117138863,0.970626592636108 0.298534214496613) (1 row)
The g
column is of geography type.
The estimated extent is plain wrong
Change History (18)
comment:1 by , 4 months ago
comment:2 by , 4 months ago
This problem affects QGIS downstream, see https://github.com/qgis/QGIS/pull/57514#issuecomment-2124451905 - at the moment QGIS just "assumes" the extent of geography-based layers is always -180..180 -90..90 as it cannot trust PostGIS to tell
comment:3 by , 4 months ago
The estimate itself seems off:
DEBUG: analyzing "public.qgis_issue_30294" DEBUG: "qgis_issue_30294": scanned 1 of 1 pages, containing 2 live rows and 0 dead rows; 2 rows in sample, 2 estimated total rows DEBUG: [gserialized_estimate.c:compute_gserialized_stats_mode:1419] compute_gserialized_stats called DEBUG: [gserialized_estimate.c:compute_gserialized_stats_mode:1420] # sample_rows: 2 DEBUG: [gserialized_estimate.c:compute_gserialized_stats_mode:1421] estimate of total_rows: 2 DEBUG: [gserialized_estimate.c:nd_box_from_gbox:602] GBOX((0.96984631,-0.17101008,0),(0.96984637,-0.17101006,0)) DEBUG: [gserialized_estimate.c:nd_box_from_gbox:602] GBOX((0.81379765,0.29619813,0),(0.81379771,0.29619816,0))
I don't see how the above matches the actual boxes:
strk=# select g::geometry::box2d from qgis_issue_30294; g ---------------------- BOX(-10 -10,-10 -10) BOX(20 30,20 30) (2 rows)
comment:4 by , 4 months ago
The gserialized_datum_get_gbox_p returns random data when the datum has no cached bbox:
DEBUG: [gserialized_gist_2d.c:gserialized_datum_get_internals_p:491] gserialized has bbox ? 0 DEBUG: [gserialized_gist_2d.c:gserialized_datum_get_internals_p:492] gserialized gpart size : 32 DEBUG: [gserialized_gist_2d.c:gserialized_datum_get_internals_p:493] gserialized_max_header_size(): 52 DEBUG: [gserialized_estimate.c:compute_gserialized_stats_mode:1464] gserialized_datum_get_gbox_p returned box[0.813798 0.813798, 0.296198 0.296198]
comment:5 by , 4 months ago
according to logs the box is even calculated:
DEBUG: [gserialized.c:gserialized_get_gbox_p:100] gserialized_get_gbox_p: version is 1 DEBUG: [gserialized2.c:lwgeom_from_gserialized2_buffer:1472] Got type 1 (Point), hasz=0 hasm=0 geodetic=1 hasbox=0 DEBUG: [gserialized2.c:gserialized2_get_gbox_p:607] box is calculated DEBUG: [gserialized_estimate.c:compute_gserialized_stats_mode:1464] gserialized_datum_get_gbox_p returned box[0.813798 0.813798, 0.296198 0.296198]
The culprit seems to be ptarray_calculate_gbox_geodetic
comment:6 by , 4 months ago
It looks like the lwgeom_calculate_gbox_geodetic
function is called often by tests in liblwgeom/cunit/cu_geodetic.c but its results are never checked. Is a geography bounding box having a different meaning from a geometry one ?
comment:7 by , 4 months ago
The problem seems at least as old as 1.5: https://gis.stackexchange.com/questions/173022/bounding-rectangle-of-a-geography
comment:8 by , 4 months ago
If the problem is "how can QGIS get a bounds for the data set" that's a little fiddly. In general the answer to "how do I get a 'normal' box" is "just use geometry". But for estimated extent that's not really on the table, as the information we want is already in geocentrics. One could take all the corners of the geocentric box, convert them back to degrees and then take the geometry extent of the collection…
comment:9 by , 4 months ago
Estimating extent for QGIS is the source problem yes, I cannot tell what other problems may arise by having the same datatype mean different things. The next question is: do we have a function to perform that conversion of a GBOX from geodetic to cartesian ?
comment:10 by , 4 months ago
Answering myself: there's no function to convert from geodesic GBOX to non-geodetic one. At least, the cast from gegraphy to geometry *drops* the GBOX to trigger re-computation.
comment:11 by , 4 months ago
So next problem is how to tell, from just reading the stats, whether the ND_BOX is geocentric or not, because that seems dependent on whether the column being analyzed is of type geography or geometry, but there's no such indication in the ND_STATS structure to read, am I right ? Also, there doesn't seem to be a Z value for the geodetic ones, isn't that mandatory to convert between geocentric and not ?
strk=# select jsonb_pretty(_postgis_stats('qgis_issue_30294_geom', 'g')::jsonb -> 'extent'); { "max": [ 20.15, 30.2 ], "min": [ -10.15, -10.2 ] } strk=# select jsonb_pretty(_postgis_stats('qgis_issue_30294', 'g')::jsonb -> 'extent'); { "max": [ 0.970627, 0.298534 ], "min": [ 0.813017, -0.173346 ] }
comment:13 by , 3 months ago
Given the amount of changes in PR https://git.osgeo.org/gitea/postgis/postgis/pulls/205 I guess that work could not be backported to the 3.4 branch or earlier, what do you think Paul ? If so, would it be worth a smaller changeset to just return NULL from ST_EstimatedExtent when the column is a geography type ?
comment:14 by , 3 months ago
Well the geodetic box code is nicely self-contained… I could probably do a low-touch patch to 3.4 that would be applicable to 3.3 and earlier with a little work, now that I know what the fix actually is, and the exact places I need to touch.
comment:15 by , 3 months ago
Here's a working branch that should be the minimal changes needed for 3.4 https://github.com/pramsey/postgis/tree/stable-3.4-extent
comment:17 by , 3 months ago
In 15e18d33e/git, applied refactor and geography support to master branch.
comment:18 by , 3 months ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
The index-mediated return is also NULL, even if an index is present: