#5337 closed defect (invalid)
Row estimate for gist geometry index is more than 500 times smaller than actual count
Reported by: | Björn Harrtell | Owned by: | pramsey |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 3.4.0 |
Component: | postgis | Version: | 3.2.x |
Keywords: | Cc: |
Description
I have a table that contains ~30 million mixed geometries of points, linestrings and polygons, mostly small but some few exceptions that are large. Data is not uniformly distributed.
Counting rows within a small bbox (~5% of full extent) where real density is high results in 1.552.761 rows. Row estimate when looking at the explained plan is 2.819.
The table is recently analyzed and I've attempted to adjust with set statistics on the geometry column to 1000 instead of default 100 hoping that could improve the estimate but it has little to no effect.
Data is open at https://datafordeler.dk/dataoversigt/geodanmark-vektor/geodanmark-vektor-brugerdefineret-filudtraek/ but unfortunately it is in Danish only (I think?) and requires a cumbersome process to register and download. I might be able to provide a table copy if anyone is interested to look into this issue.
postgis_full_version: POSTGIS="3.2.3 0" [EXTENSION] PGSQL="130" GEOS="3.9.2-CAPI-1.14.3" PROJ="9.0.1" LIBXML="2.9.4" LIBJSON="0.12.1" LIBPROTOBUF="1.2.1" WAGYU="0.5.0 (Internal)" (core procs from "3.1.1 aaf4c79" need upgrade)
Change History (7)
comment:2 by , 22 months ago
comment:3 by , 22 months ago
Also tried _postgis_selectivity with the same bbox as mentioned in description which is POLYGON ((716905.5 6168346.1, 716905.5 6185703.38, 741648.19 6185703.38, 741648.19 6168346.1, 716905.5 6168346.1))
and I get 0.055185432031709364. Assuming that means 5.5% of 28 million that seems to be a good estimate, it doesn't make sense to me why the row estimate is 2819 rows (→ Bitmap Heap Scan on geodkobjekt g (cost=68.27..5625.92 rows=2819 width=472)).
comment:4 by , 22 months ago
Mabye this is simply me doing things the wrong way. The bad row estimate is when using static geometry from a CTE. If I supply the geometry as literal I get a good estimate!
With geometry from joined single row CTE:
→ Bitmap Heap Scan on geodkobjekt g (cost=68.27..5625.92 rows=2819 width=472)
Recheck Cond: (geometri && (st_setsrid((st_extent(geodkobjekt.geometri))::geometry, 25832))) → Bitmap Index Scan on geodkobjekt_geometri_idx (cost=0.00..67.56 rows=2819 width=0)
Index Cond: (geometri && (st_setsrid((st_extent(geodkobjekt.geometri))::geometry, 25832)))
With literal geometry:
→ Parallel Bitmap Heap Scan on geodkobjekt g (cost=36847.48..1506295.63 rows=648121 width=0)
Recheck Cond: (geometri && '0103000020E8640000010000000500000000000000D3E0254166666686C687574100000000D3E0254185EB51D8B998574114AE476120A2264185EB51D8B998574114AE476120A2264166666686C687574100000000D3E0254166666686C6875741'::geometry) → Bitmap Index Scan on geodkobjekt_geometri_idx (cost=0.00..36458.60 rows=1555491 width=0)
Index Cond: (geometri && '0103000020E8640000010000000500000000000000D3E0254166666686C687574100000000D3E0254185EB51D8B998574114AE476120A2264185EB51D8B998574114AE476120A2264166666686C687574100000000D3E0254166666686C6875741'::geometry)
comment:5 by , 22 months ago
I've also tried rewriting the query to use a single row subquery in the WHERE clause instead of the CTE join, giving the same bad result as the CTE variant.
comment:6 by , 22 months ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Any dynamically created geometry will not be known at planning time and thus cannot be used in estimation. (st_setsrid((st_extent(geodkobjekt.geometri))::geometry, 25832)) is a dynamically created geometry. Typical way to deal with that is to either create the table really in separate query before this one instead of inline CTE, or to rewrite your query to be a direct JOIN on the tables without proxy calculations (which you can still add on top of the JOIN if needed).
comment:7 by , 22 months ago
Thanks @kompza. I had somehow become ignorant of this fact. Closing as non-issue.
Note to self might want to see if _postgis_selectivity or _postgis_stats can provide some insight (see https://git.osgeo.org/gitea/postgis/postgis/src/branch/master/postgis/postgis.sql.in#L548)