Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#3739 closed defect (wontfix)

ST_Within Not using index

Reported by: postgispaul Owned by: pramsey
Priority: medium Milestone: PostGIS 2.3.3
Component: postgis Version: 2.3.x
Keywords: Cc: pramsey

Description (last modified by robe)

Hi, I've got this query and I can't figure what I'm doing wrong. It doesn't use the geo-index.

Versions:

"POSTGIS="2.3.2 r15302" GEOS="3.6.1-CAPI-1.10.1 r4317" PROJ="Rel. 4.9.1, 04 March 2015" GDAL="GDAL 2.1.3, released 2017/20/01 GDAL_DATA not found" LIBXML="2.7.8" LIBJSON="0.12" (core procs from "2.3.0 r15146" need upgrade) RASTER (raster procs from "2.3.0 r (...)"
"PostgreSQL 9.6.1, compiled by Visual C++ build 1800, 64-bit"

This happens on a full planet_osm_polygon table. Definition:

CREATE TABLE public.planet_osm_polygon
(
  osm_id bigint,
  ... snip a number of irrelevant columns ...
  way geometry(Geometry,3857)
)
WITH (
  OIDS=FALSE
);


CREATE INDEX planet_osm_polygon_index
  ON public.planet_osm_polygon
  USING gist
  (way);

CREATE INDEX planet_osm_polygon_pkey
  ON public.planet_osm_polygon
  USING btree
  (osm_id);

I've done an analyze planet_osm_polygon but still the query planner surprises me.

select s.way from planet_osm_polygon s
join planet_osm_polygon ap on (ST_Within(s.way,ap.way))
where ap.osm_id=123456

results in a pkey lookup followed by a bitmap index scan

Nested Loop  (cost=6339.51..360823.14 rows=88540 width=229)
  Output: s.way
  ->  Index Scan using planet_osm_polygon_pkey on public.planet_osm_polygon ap  (cost=0.57..2.79 rows=1 width=229)
        Output: ap.osm_id, ap.highway, ap.surface, ap.railway, ap.waterway, ap.tunnel, ap.bridge, ap.oneway, ap.access, ap.name, ap.z_order, ap.way_area, ap.area, ap.natural, ap.water, ap.admin_level, ap.boundary, ap.landuse, ap.amenity, ap.parking, ap.a (...)
        Index Cond: (ap.osm_id = 123456)
  ->  Bitmap Heap Scan on public.planet_osm_polygon s  (cost=6338.94..359934.96 rows=88540 width=229)
        Output: s.osm_id, s.highway, s.surface, s.railway, s.waterway, s.tunnel, s.bridge, s.oneway, s.access, s.name, s.z_order, s.way_area, s.area, s.natural, s.water, s.admin_level, s.boundary, s.landuse, s.amenity, s.parking, s.aeroway, s.military, s (...)
        Recheck Cond: (ap.way ~ s.way)
        Filter: _st_contains(ap.way, s.way)
        ->  Bitmap Index Scan on planet_osm_polygon_index  (cost=0.00..6316.81 rows=265621 width=0)
              Index Cond: (ap.way ~ s.way)

adding ap.way ~ s.way and to the join condition:

select s.way from planet_osm_polygon s
join planet_osm_polygon ap on (ap.way ~ s.way and ST_Within(s.way,ap.way))
where ap.osm_id=123456

results in a pkey lookup and a normal index-scan

Nested Loop  (cost=1.12..373.05 rows=89 width=229)
  Output: s.way
  ->  Index Scan using planet_osm_polygon_pkey on public.planet_osm_polygon ap  (cost=0.57..2.79 rows=1 width=229)
        Output: ap.osm_id, ap.highway, ap.surface, ap.railway, ap.waterway, ap.tunnel, ap.bridge, ap.oneway, ap.access, ap.name, ap.z_order, ap.way_area, ap.area, ap.natural, ap.water, ap.admin_level, ap.boundary, ap.landuse, ap.amenity, ap.parking, ap.a (...)
        Index Cond: (ap.osm_id = 123456)
  ->  Index Scan using planet_osm_polygon_index on public.planet_osm_polygon s  (cost=0.55..369.37 rows=89 width=229)
        Output: s.osm_id, s.highway, s.surface, s.railway, s.waterway, s.tunnel, s.bridge, s.oneway, s.access, s.name, s.z_order, s.way_area, s.area, s.natural, s.water, s.admin_level, s.boundary, s.landuse, s.amenity, s.parking, s.aeroway, s.military, s (...)
        Index Cond: ((ap.way ~ s.way) AND (ap.way ~ s.way))
        Filter: _st_contains(ap.way, s.way)

One thing stands out here: the duplicate index condition ap.way ~ s.way

My question is: why isn't it planning to use the normal index-scan without adding ap.way ~ s.way. After all, it is part of the st_within definition:

CREATE OR REPLACE FUNCTION public.st_within(
    geom1 geometry,
    geom2 geometry)
  RETURNS boolean AS
'SELECT $2 OPERATOR(public.~) $1 AND public._ST_Contains($2,$1)'
  LANGUAGE sql IMMUTABLE
  COST 100;

Change History (34)

comment:1 by robe, 7 years ago

Description: modified (diff)

comment:2 by robe, 7 years ago

I reformatted your ticket since it was hard to follow.

Unfortunately I don't have OSM data to test with. I did a sample of a similar query on one of my datasets and it behaved correctly with the gist index kicking in first.

Can you try the following:

  1. Do you have the same issue with ST_Intersects?

That would at least isolate the issue to ST_Within since in theory && and ~ should be more or less optimized the same and use the same geo stats

  1. Sometimes I have seen indexes get corrupt. Particularly gist indexes (not just geometry but I've had issues with gist trigram for example that I also use a lot).

Queries that use them just go down to a crawl. I forget what the plan showed of these, but when I see the behavior, I just reindex the whole table or just the offending index and have a weekly routine to do this since it happens quite frequently (like once a month or so it happens to us on tables that are under massive updates).

(Could very well be a windows specific issue too).

Try:

reindex index public.planet_osm_polygon_index;

comment:3 by postgispaul, 7 years ago

Thanks for your reformatting and fast answer!

St_intersects does not have that issue. It's correctly using the index on ap.way && s.way without explicitly adding that condition.

I also did create a new

 CREATE OR REPLACE FUNCTION public.st_withintest(
    geom1 geometry,
    geom2 geometry)
  RETURNS boolean AS
'SELECT $2 OPERATOR(public.~) $1 AND public._ST_Contains($2,$1)'
  LANGUAGE sql IMMUTABLE
  COST 100;

function to rule out messed up function definitions, but that didn't help either.

Could that corrupt index be caused by postgres 9.6.1? I did see some related bugs fixed in 9.6.3 Anyway, reindex is running

comment:4 by robe, 7 years ago

Cc: pramsey added

Not sure. Gist indexes have always caused us problems on tables where we always need to do massive updates e.g. where we get new electronic updated info from a vendor that requires us to update more than 20%-30% of a table.

A lot of the fixes I've seen for example in 9.6.2 (haven't checked 9.6.3 upcoming list) had to do with how the planner handles parallel queries. Anyway I think I've got a freebsd box lying around (PostgreSQL 9.3 with 2.3.something) somewhere that has full osm data loaded (or full enough) and see if I can replicate.

It's odd that && doesn't give the same issue as ~ as those look like they have similar costs and should go down similar channels. I would think with a corrupt index it would suffer the same fate.

pramsey any thoughts on this one?

comment:5 by postgispaul, 7 years ago

FYI: Reindex just finished. No improvements though…

comment:6 by robe, 7 years ago

One more test, what happens if you change your ST_Withintest function to use

&& instead of ~

and use your ST_Withintest function instead of ST_Within.

That should determine if its ~ or _ST_Contains at fault here.

comment:7 by postgispaul, 7 years ago

with && it behaves normally:

Nested Loop  (cost=1.12..36813.06 rows=317184 width=229)
  Output: s.way
  ->  Index Scan using planet_osm_polygon_pkey on public.planet_osm_polygon ap  (cost=0.57..2.79 rows=1 width=229)
        Output: ap.osm_id, ap.highway, ap.surface, ap.railway, ap.waterway, ap.tunnel, ap.bridge, ap.oneway, ap.access, ap.name, ap.z_order, ap.way_area, ap.area, ap.natural, ap.water, ap.admin_level, ap.boundary, ap.landuse, ap.amenity, ap.parking, ap.a (...)
        Index Cond: (ap.osm_id = 123456)
  ->  Index Scan using planet_osm_polygon_index on public.planet_osm_polygon s  (cost=0.55..36721.72 rows=8855 width=229)
        Output: s.osm_id, s.highway, s.surface, s.railway, s.waterway, s.tunnel, s.bridge, s.oneway, s.access, s.name, s.z_order, s.way_area, s.area, s.natural, s.water, s.admin_level, s.boundary, s.landuse, s.amenity, s.parking, s.aeroway, s.military, s (...)
        Index Cond: (ap.way && s.way)
        Filter: _st_contains(ap.way, s.way)

comment:8 by robe, 7 years ago

Looking at difference between && and ~. Looks like for && we are using our own selectivity functions (gserialized_gist_sel_2d and gserialized_gist_joinsel_2d.

CREATE OPERATOR &&(
  PROCEDURE = geometry_overlaps,
  LEFTARG = geometry,
  RIGHTARG = geometry,
  COMMUTATOR = &&,
  RESTRICT = gserialized_gist_sel_2d,
  JOIN = gserialized_gist_joinsel_2d);

but for ~ we are using the PostgreSQL packaged containment selectivity contsel and contjoinsel functions.

CREATE OPERATOR ~(
  PROCEDURE = geometry_contains,
  LEFTARG = geometry,
  RIGHTARG = geometry,
  COMMUTATOR = @,
  RESTRICT = contsel,
  JOIN = contjoinsel);

I'm not sure if that is the culprit thought that would make it more sensitive to changes upstream. I still need to test on my 9.3 to see if it exhibits the same issue. I thought there was a time ago when && used use the built-in ones as well.

strk, pramsey is there a reason why we decided to reimplement the ones used for && and not for containment?

comment:9 by postgispaul, 7 years ago

What I did see though, is when I swap the $1 and $2 around OPERATOR(public.~) in the withintest function it does use the index. So perhaps it's some confusion down the line about which parameter should be fully inside?

comment:10 by robe, 7 years ago

Well geometry_contains is a bounding box check, defined here: http://postgis.net/docs/doxygen/2.3/df/d5e/gserialized__gist__2d_8c_source.html#l00779

which ultimately relies on: box2df_contains

http://postgis.net/docs/doxygen/2.3/df/d5e/gserialized__gist__2d_8c_source.html#l00294

if you switch the order, you may get a different answer (fewer results) and it would also affect the query optimizer. Do you get different results? I would expect so. .

The order from the function at spot glance looks correct.

comment:11 by postgispaul, 7 years ago

Indeed, I do get different results, so that's correct. What I meant was that perhaps when checking/evaluating the ~ for index-use the order is erroneously swapped

comment:12 by pramsey, 7 years ago

The default selectivity estimates from pgsql are bunk. We (I) may have made a critical mistake in swapping out the && operator, which while over-determined for these functions at least will be used correctly by the planner. The idea is that using the correct index op gives you a smaller result set to pass to the exact function, but that's no good if the index op is never called due to planning failures.

The pgsql defaults are just selectivity constants, they don't look at the data at all. They aren't worse than nothing, they *are* nothing. So, if things work when you define ~ to && in your call, then that's probably what we should do ourselves as a quick fix.

And an enhancement would be to do selectivity calculations correctly for @ and ~. Not sure how enthused I am about that :)

comment:13 by strk, 7 years ago

How about ~ AND && ? Anyone up for benchmarking adding both in the query ?

comment:14 by robe, 7 years ago

Maybe it's just late, but relooking at the plans I see nothing wrong here. Your original does use an index but chooses a bitmap index scan strategy instead of a pure index scan.

I don't have the same dataset, but I get similar plans. The geo index is used just like yours in your bad case. True it's in a bitmap index,but that's fine and common. I'm thinking there is nothing worth fixing here as far as plans go, but if you are seeing much worse performance with the bitmap index scan, that might be a separate issue.

postgispaul, can you rerun your examples with actual costs ( explain analyze instead of explain). Only thing I see here is the estimated and actual are a bit off, which is no big deal in most cases.

This is testing on a FreeBSD with an example that yields an output of 299 rows, goes thru similar plan as yours

first does btree index by id followed by bitmap index scan first using the ~ to bring in 783 rows of which 484 are then removed by a recheck and _ST_Contains. Same as yours, uses an index. I got thrown off because I forgot I have to read the plan from bottom up from each node.

PostgreSQL 9.3.14 on amd64-portbld-freebsd10.1, compiled by FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512, 64-bit POSTGIS="2.2.2 r14797" GEOS="3.5.0-CAPI-1.9.0 r4084" PROJ="Rel. 4.9.2, 08 September 2015" GDAL="GDAL 2.1.0, released 2016/04/25" LIBXML="2.9.4" LIBJSON="0.12" RASTER
EXPLAIN ANALYZE VERBOSE select s.way FROM  osm.us_west_planet_osm_polygon s
join osm.us_west_planet_osm_polygon ap on ( st_within(s.way, ap.way) )
where ap.osm_id=179980467 ;

Yields:

                                                                                                                                               QUERY PLAN

----------------------- Nested Loop  (cost=362.38..22734.22 rows=2001 width=252) (actual time=0.560..13.253 rows=299 loops=1)
   Output: s.way
   ->  Index Scan using us_west_planet_osm_polygon_pkey on osm.us_west_planet_osm_polygon ap  (cost=0.43..8.45 rows=1 width=252) (actual time=0.013..0.015 rows=1 loops=1)
         Output: ap.osm_id, ap.access, ap."addr:housename", ap."addr:housenumber", ap."addr:interpolation", ap.admin_level, ap.aerialway, ap.aeroway, ap.amenity, ap.area, ap.barrier, ap.bicycle, ap.brand, ap.bridge
nomination, ap.disused, ap.embankment, ap.foot, ap."generator:source", ap.harbour, ap.highway, ap.historic, ap.horse, ap.intermittent, ap.junction, ap.landuse, ap.layer, ap.leisure, ap.lock, ap.man_made, ap.militar
.population, ap.power, ap.power_source, ap.public_transport, ap.railway, ap.ref, ap.religion, ap.route, ap.service, ap.shop, ap.sport, ap.surface, ap.toll, ap.tourism, ap."tower:type", ap.tracktype, ap.tunnel, ap.w
         Index Cond: (ap.osm_id = 179980467)
   ->  Bitmap Heap Scan on osm.us_west_planet_osm_polygon s  (cost=361.95..22705.76 rows=2001 width=252) (actual time=0.542..13.168 rows=299 loops=1)
         Output: s.osm_id, s.access, s."addr:housename", s."addr:housenumber", s."addr:interpolation", s.admin_level, s.aerialway, s.aeroway, s.amenity, s.area, s.barrier, s.bicycle, s.brand, s.bridge, s.boundary,
, s.embankment, s.foot, s."generator:source", s.harbour, s.highway, s.historic, s.horse, s.intermittent, s.junction, s.landuse, s.layer, s.leisure, s.lock, s.man_made, s.military, s.motorcar, s.name, s."natural", s
c_transport, s.railway, s.ref, s.religion, s.route, s.service, s.shop, s.sport, s.surface, s.toll, s.tourism, s."tower:type", s.tracktype, s.tunnel, s.water, s.waterway, s.wetland, s.width, s.wood, s.z_order, s.way
         Recheck Cond: (ap.way ~ s.way)
         Filter: _st_contains(ap.way, s.way)
         Rows Removed by Filter: 484
         ->  Bitmap Index Scan on us_west_planet_osm_polygon_index  (cost=0.00..361.44 rows=6004 width=0) (actual time=0.314..0.314 rows=783 loops=1)
               Index Cond: (ap.way ~ s.way)
 Total runtime: 13.402 ms
(13 rows)

Now repeating using the ~

EXPLAIN ANALYZE VERBOSE select s.way FROM  osm.us_west_planet_osm_polygon s
join osm.us_west_planet_osm_polygon ap on ( ap.way ~ s.way AND st_within(s.way, ap.way) )
where ap.osm_id=179980467 ;

Yields:

                                                                                                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------Nested Loop  (cost=4.91..38.44 rows=2 width=252) (actual time=0.659..13.404 rows=299 loops=1)
   Output: s.way
   ->  Index Scan using us_west_planet_osm_polygon_pkey on osm.us_west_planet_osm_polygon ap  (cost=0.43..8.45 rows=1 width=252) (actual time=0.017..0.018 rows=1 loops=1)
         Output: ap.osm_id, ap.access, ap."addr:housename", ap."addr:housenumber", ap."addr:interpolation", ap.admin_level, ap.aerialway, ap.aeroway, ap.amenity, ap.area, ap.barrier, ap.bicycle, ap.brand, ap.bridge, ap.boundary, ap.building, ap.construction, ap.covered
nomination, ap.disused, ap.embankment, ap.foot, ap."generator:source", ap.harbour, ap.highway, ap.historic, ap.horse, ap.intermittent, ap.junction, ap.landuse, ap.layer, ap.leisure, ap.lock, ap.man_made, ap.military, ap.motorcar, ap.name, ap."natural", ap.office, ap.on
.population, ap.power, ap.power_source, ap.public_transport, ap.railway, ap.ref, ap.religion, ap.route, ap.service, ap.shop, ap.sport, ap.surface, ap.toll, ap.tourism, ap."tower:type", ap.tracktype, ap.tunnel, ap.water, ap.waterway, ap.wetland, ap.width, ap.wood, ap.z_
         Index Cond: (ap.osm_id = 179980467)
   ->  Bitmap Heap Scan on osm.us_west_planet_osm_polygon s  (cost=4.48..29.97 rows=2 width=252) (actual time=0.637..13.320 rows=299 loops=1)
         Output: s.osm_id, s.access, s."addr:housename", s."addr:housenumber", s."addr:interpolation", s.admin_level, s.aerialway, s.aeroway, s.amenity, s.area, s.barrier, s.bicycle, s.brand, s.bridge, s.boundary, s.building, s.construction, s.covered, s.culvert, s.cut
, s.embankment, s.foot, s."generator:source", s.harbour, s.highway, s.historic, s.horse, s.intermittent, s.junction, s.landuse, s.layer, s.leisure, s.lock, s.man_made, s.military, s.motorcar, s.name, s."natural", s.office, s.oneway, s.operator, s.place, s.population, s
c_transport, s.railway, s.ref, s.religion, s.route, s.service, s.shop, s.sport, s.surface, s.toll, s.tourism, s."tower:type", s.tracktype, s.tunnel, s.water, s.waterway, s.wetland, s.width, s.wood, s.z_order, s.way_area, s.way
         Recheck Cond: ((ap.way ~ s.way) AND (ap.way ~ s.way))
         Filter: _st_contains(ap.way, s.way)
         Rows Removed by Filter: 484
         ->  Bitmap Index Scan on us_west_planet_osm_polygon_index  (cost=0.00..4.47 rows=6 width=0) (actual time=0.363..0.363 rows=783 loops=1)
               Index Cond: ((ap.way ~ s.way) AND (ap.way ~ s.way))
 Total runtime: 13.570 ms
(13 rows)

Not much of a difference, but different from yours in that mine still uses a bitmap index scan instead of a plain index scan.

next defined this:

 CREATE OR REPLACE FUNCTION public.st_withinol(
    geom1 geometry,
    geom2 geometry)
  RETURNS boolean AS
'SELECT $2 OPERATOR(public.&&) $1 AND public._ST_Contains($2,$1)'
  LANGUAGE sql IMMUTABLE
  COST 100;

EXPLAIN ANALYZE VERBOSE select s.way FROM  osm.us_west_planet_osm_polygon s
join osm.us_west_planet_osm_polygon ap on (  st_withinol(s.way, ap.way) )
where ap.osm_id=179980467 ;

Still no difference, get a bitmap index scan again except slightly worse because the && index scan does not remove as many false positives leaving the _ST_Contains to do more work.

Nested Loop  (cost=37.40..2510.63 rows=49298 width=252) (actual time=0.719..16.919 rows=299 loops=1)
  Output: s.way
  ->  Index Scan using us_west_planet_osm_polygon_pkey on osm.us_west_planet_osm_polygon ap  (cost=0.43..8.45 rows=1 width=252) (actual time=0.012..0.013 rows=1 loops=1)
        Output: ap.osm_id, ap.access, ap."addr:housename", ap."addr:housenumber", ap."addr:interpolation", ap.admin_level, ap.aerialway, ap.aeroway, ap.amenity, ap.area, ap.barrier, ap.bicycle, ap.brand, ap.bridge, ap.boundary, ap.building, ap.construction, ap.covered, ap.culvert, ap.cutting, ap.denomination, ap.disused, ap.embankment, ap.foot, ap."generator:source", ap.harbour, ap.highway, ap.historic, ap.horse, ap.intermittent, ap.junction, ap.landuse, ap.layer, ap.leisure, ap.lock, ap.man_made, ap.military, ap.motorcar, ap.name, ap."natural", ap.office, ap.oneway, ap.operator, ap.place, ap.population, ap.power, ap.power_source, ap.public_transport, ap.railway, ap.ref, ap.religion, ap.route, ap.service, ap.shop, ap.sport, ap.surface, ap.toll, ap.tourism, ap."tower:type", ap.tracktype, ap.tunnel, ap.water, ap.waterway, ap.wetland, ap.width, ap.wood, ap.z_order, ap.way_area, ap.way
        Index Cond: (ap.osm_id = 179980467)
  ->  Bitmap Heap Scan on osm.us_west_planet_osm_polygon s  (cost=36.96..2500.18 rows=200 width=252) (actual time=0.703..16.831 rows=299 loops=1)
        Output: s.osm_id, s.access, s."addr:housename", s."addr:housenumber", s."addr:interpolation", s.admin_level, s.aerialway, s.aeroway, s.amenity, s.area, s.barrier, s.bicycle, s.brand, s.bridge, s.boundary, s.building, s.construction, s.covered, s.culvert, s.cutting, s.denomination, s.disused, s.embankment, s.foot, s."generator:source", s.harbour, s.highway, s.historic, s.horse, s.intermittent, s.junction, s.landuse, s.layer, s.leisure, s.lock, s.man_made, s.military, s.motorcar, s.name, s."natural", s.office, s.oneway, s.operator, s.place, s.population, s.power, s.power_source, s.public_transport, s.railway, s.ref, s.religion, s.route, s.service, s.shop, s.sport, s.surface, s.toll, s.tourism, s."tower:type", s.tracktype, s.tunnel, s.water, s.waterway, s.wetland, s.width, s.wood, s.z_order, s.way_area, s.way
        Recheck Cond: (ap.way && s.way)
        Filter: _st_contains(ap.way, s.way)
        Rows Removed by Filter: 513
        ->  Bitmap Index Scan on us_west_planet_osm_polygon_index  (cost=0.00..36.91 rows=600 width=0) (actual time=0.309..0.309 rows=812 loops=1)
              Index Cond: (ap.way && s.way)
Total runtime: 17.111 ms


Now strk for your comment about combining both:

 CREATE OR REPLACE FUNCTION public.st_withinolc(
    geom1 geometry,
    geom2 geometry)
  RETURNS boolean AS
'SELECT $2 OPERATOR(public.&&) $1 AND $2 OPERATOR(public.~) $1 AND public._ST_Contains($2,$1)'
  LANGUAGE sql IMMUTABLE
  COST 100;

EXPLAIN ANALYZE VERBOSE select s.way FROM  osm.us_west_planet_osm_polygon s
join osm.us_west_planet_osm_polygon ap on ( st_withinolc(s.way, ap.way) )
where ap.osm_id=179980467 ;
Nested Loop  (cost=0.85..17.15 rows=49 width=252) (actual time=0.327..13.473 rows=299 loops=1)
  Output: s.way
  ->  Index Scan using us_west_planet_osm_polygon_pkey on osm.us_west_planet_osm_polygon ap  (cost=0.43..8.45 rows=1 width=252) (actual time=0.013..0.013 rows=1 loops=1)
        Output: ap.osm_id, ap.access, ap."addr:housename", ap."addr:housenumber", ap."addr:interpolation", ap.admin_level, ap.aerialway, ap.aeroway, ap.amenity, ap.area, ap.barrier, ap.bicycle, ap.brand, ap.bridge, ap.boundary, ap.building, ap.construction, ap.covered, ap.culvert, ap.cutting, ap.denomination, ap.disused, ap.embankment, ap.foot, ap."generator:source", ap.harbour, ap.highway, ap.historic, ap.horse, ap.intermittent, ap.junction, ap.landuse, ap.layer, ap.leisure, ap.lock, ap.man_made, ap.military, ap.motorcar, ap.name, ap."natural", ap.office, ap.oneway, ap.operator, ap.place, ap.population, ap.power, ap.power_source, ap.public_transport, ap.railway, ap.ref, ap.religion, ap.route, ap.service, ap.shop, ap.sport, ap.surface, ap.toll, ap.tourism, ap."tower:type", ap.tracktype, ap.tunnel, ap.water, ap.waterway, ap.wetland, ap.width, ap.wood, ap.z_order, ap.way_area, ap.way
        Index Cond: (ap.osm_id = 179980467)
  ->  Index Scan using us_west_planet_osm_polygon_index on osm.us_west_planet_osm_polygon s  (cost=0.41..8.68 rows=1 width=252) (actual time=0.291..13.376 rows=299 loops=1)
        Output: s.osm_id, s.access, s."addr:housename", s."addr:housenumber", s."addr:interpolation", s.admin_level, s.aerialway, s.aeroway, s.amenity, s.area, s.barrier, s.bicycle, s.brand, s.bridge, s.boundary, s.building, s.construction, s.covered, s.culvert, s.cutting, s.denomination, s.disused, s.embankment, s.foot, s."generator:source", s.harbour, s.highway, s.historic, s.horse, s.intermittent, s.junction, s.landuse, s.layer, s.leisure, s.lock, s.man_made, s.military, s.motorcar, s.name, s."natural", s.office, s.oneway, s.operator, s.place, s.population, s.power, s.power_source, s.public_transport, s.railway, s.ref, s.religion, s.route, s.service, s.shop, s.sport, s.surface, s.toll, s.tourism, s."tower:type", s.tracktype, s.tunnel, s.water, s.waterway, s.wetland, s.width, s.wood, s.z_order, s.way_area, s.way
        Index Cond: ((ap.way && s.way) AND (ap.way ~ s.way))
        Filter: _st_contains(ap.way, s.way)
        Rows Removed by Filter: 484
Total runtime: 13.660 ms

switches to a regular old index scan instead of bitmap index, but timing I get is worse by a little bit presumably because I have that extra && check which is redundant.

I tried swapping the order of and && and got 13.521ms presumably because is just more efficient so gives && less to check.

comment:15 by postgispaul, 7 years ago

Wow, first: thanks for your extensive investigations!

A couple of remarks: I (probably, only window is early tomorrow) won't be able to get reasonable timings for the next week(s) as our server is going to be doing some serious processing. I thought Bitmap index scans are always worse (or at least: not better) than normal index scans? Still doesn't explain why the condition from the function itself isn't taken into consideration.

Most of the time when I encounter some weird effect in our software it pays off to dive in deep. It either leads to better understanding, or (more often than I care of) leads to a bug/edgecase/… in some deep-longforgotten piece of code. That was my main motivation to file this.

I hope to get some additional timings for you tomorrow, will let you know

Last edited 7 years ago by postgispaul (previous) (diff)

comment:16 by postgispaul, 7 years ago

Ok, just found some time on our server to dive into this. The thing is: I've simplified my example to the most simple one where the st_within isn't behaving as it should.

The initial query where I'm working with is even worse:

select ap.tags->'ISO3166-1' iso,s.admin_level admlev,s.way, s.osm_id from planet_osm_polygon s
join planet_osm_polygon ap on ST_Within(s.way,ap.way) 
where ap.tags?'ISO3166-1' AND ap.tags->'ISO3166-1' in ('KR') AND s.admin_level ~ '^\d+$'

gives me:

Nested Loop  (cost=0.28..193474644.34 rows=227104 width=271)
  Output: (ap.tags -> 'ISO3166-1'::text), s.admin_level, s.way, s.osm_id
  Join Filter: ((ap.way ~ s.way) AND _st_contains(ap.way, s.way))
  ->  Seq Scan on public.planet_osm_polygon s  (cost=0.00..12924619.20 rows=513037 width=239)
        Output: s.osm_id, s.highway, s.surface, s.railway, s.waterway, s.tunnel, s.bridge, s.oneway, s.access, s.name, s.z_order, s.way_area, s.area, s.natural, s.water, s.admin_level, s.boundary, s.landuse, s.amenity, s.parking, s.aeroway, s.military, s (...)
        Filter: (s.admin_level ~ '^\d+$'::text)
  ->  Materialize  (cost=0.28..1479.66 rows=1328 width=258)
        Output: ap.tags, ap.way
        ->  Index Scan using polygon_iso3166_1_idx on public.planet_osm_polygon ap  (cost=0.28..1473.02 rows=1328 width=258)
              Output: ap.tags, ap.way
              Index Cond: ((ap.tags -> 'ISO3166-1'::text) = 'KR'::text)

while

select ap.tags->'ISO3166-1' iso,s.admin_level admlev,s.way, s.osm_id from planet_osm_polygon s
join planet_osm_polygon ap on ap.way ~ s.way and ST_Within(s.way,ap.way) 
where ap.tags?'ISO3166-1' AND ap.tags->'ISO3166-1' in ('KR') AND s.admin_level ~ '^\d+$'

gives me

Nested Loop  (cost=0.83..486830.14 rows=227 width=271)
  Output: (ap.tags -> 'ISO3166-1'::text), s.admin_level, s.way, s.osm_id
  ->  Index Scan using polygon_iso3166_1_idx on public.planet_osm_polygon ap  (cost=0.28..1473.02 rows=1328 width=258)
        Output: ap.osm_id, ap.highway, ap.surface, ap.railway, ap.waterway, ap.tunnel, ap.bridge, ap.oneway, ap.access, ap.name, ap.z_order, ap.way_area, ap.area, ap.natural, ap.water, ap.admin_level, ap.boundary, ap.landuse, ap.amenity, ap.parking, ap.a (...)
        Index Cond: ((ap.tags -> 'ISO3166-1'::text) = 'KR'::text)
  ->  Index Scan using planet_osm_polygon_index on public.planet_osm_polygon s  (cost=0.55..365.47 rows=1 width=239)
        Output: s.osm_id, s.highway, s.surface, s.railway, s.waterway, s.tunnel, s.bridge, s.oneway, s.access, s.name, s.z_order, s.way_area, s.area, s.natural, s.water, s.admin_level, s.boundary, s.landuse, s.amenity, s.parking, s.aeroway, s.military, s (...)
        Index Cond: ((ap.way ~ s.way) AND (ap.way ~ s.way))
        Filter: ((s.admin_level ~ '^\d+$'::text) AND _st_contains(ap.way, s.way))

So without the extra condition, it's using a sequential scan, and that one is definitely slower than an index scan.

Do you still need those explain analyze results?

comment:17 by robe, 7 years ago

Yes it would help to have the explain analyze results to see what the actual number of rows it is comparing vs what it's actually getting and estimated vs. actual costs.

How long do each of these queries take? The slower vs. faster.

I'd also like to see the indexes you have in place, clearly the ones you have listed above is not all of them. For example it seems odd to me that your index cond is (ap.tags →'….' ..) unless you have a functional index on that. I would have also expected a gist or gin index on tags so you could write the query like

SELECT  ap.tags->'ISO3166-1' iso,s.admin_level admlev, s.way, s.osm_id 
FROM planet_osm_polygon s
JOIN planet_osm_polygon ap on ST_Within(s.way,ap.way) 
WHERE ap.tags @> 'ISO3166-1=>KR'::hstore  AND s.admin_level ~ '^\d+$'

If you can provide some sample data that exhibits the issue that would be really great. I know it might be hard that it might require your full dataset so do your best.

comment:18 by postgispaul, 7 years ago

Here are the results:

Fast one:

explain analyze
select ap.tags->'ISO3166-1' iso,s.admin_level admlev,s.way, s.osm_id from planet_osm_polygon s
join planet_osm_polygon ap on ap.way ~ s.way and ST_Within(s.way,ap.way)
where ap.tags?'ISO3166-1' AND ap.tags->'ISO3166-1' in ('KR') AND s.admin_level ~ '^\d+$' 
Nested Loop  (cost=0.83..486830.14 rows=227 width=271) (actual time=1319.929..92007.078 rows=1377 loops=1)
  ->  Index Scan using polygon_iso3166_1_idx on planet_osm_polygon ap  (cost=0.28..1473.02 rows=1328 width=258) (actual time=0.067..0.076 rows=3 loops=1)
        Index Cond: ((tags -> 'ISO3166-1'::text) = 'KR'::text)
  ->  Index Scan using planet_osm_polygon_index on planet_osm_polygon s  (cost=0.55..365.47 rows=1 width=239) (actual time=443.719..30636.884 rows=459 loops=3)
        Index Cond: ((ap.way ~ way) AND (ap.way ~ way))
        Filter: ((admin_level ~ '^\d+$'::text) AND _st_contains(ap.way, way))
        Rows Removed by Filter: 102799
Planning time: 0.340 ms
Execution time: 92018.858 ms

Slow one:

explain analyze
select ap.tags->'ISO3166-1' iso,s.admin_level admlev,s.way, s.osm_id from planet_osm_polygon s
join planet_osm_polygon ap on ST_Within(s.way,ap.way)
where ap.tags?'ISO3166-1' AND ap.tags->'ISO3166-1' in ('KR') AND s.admin_level ~ '^\d+$' 
Nested Loop  (cost=0.28..193474644.34 rows=227104 width=271) (actual time=368744.872..4795335.403 rows=1377 loops=1)
  Join Filter: ((ap.way ~ s.way) AND _st_contains(ap.way, s.way))
  Rows Removed by Join Filter: 1541343
  ->  Seq Scan on planet_osm_polygon s  (cost=0.00..12924619.20 rows=513037 width=239) (actual time=26.551..4588533.670 rows=514240 loops=1)
        Filter: (admin_level ~ '^\d+$'::text)
        Rows Removed by Filter: 265120823
  ->  Materialize  (cost=0.28..1479.66 rows=1328 width=258) (actual time=0.001..0.005 rows=3 loops=514240)
        ->  Index Scan using polygon_iso3166_1_idx on planet_osm_polygon ap  (cost=0.28..1473.02 rows=1328 width=258) (actual time=32.366..37.253 rows=3 loops=1)
              Index Cond: ((tags -> 'ISO3166-1'::text) = 'KR'::text)
Planning time: 107.125 ms
Execution time: 4795338.151 ms

indices on planet_osm_polygon:

polygon_iso3166_1_idx : CREATE INDEX polygon_iso3166_1_idx ON planet_osm_polygon USING btree (((tags -> 'ISO3166-1'::text))) WHERE (tags ? 'ISO3166-1'::text)
planet_osm_polygon_pkey : CREATE INDEX planet_osm_polygon_pkey ON planet_osm_polygon USING btree (osm_id)
planet_osm_polygon_index : CREATE INDEX planet_osm_polygon_index ON planet_osm_polygon USING gist (way)

So I used a partial index, because the ISO3166-1 tag is the only one we use in our queries, and a full gist/gin index would be wasting too much diskspace. And as for sample data, that's going to take a while, if at all possible.

But if you're able to get the bitmap/normal index difference at your end, wouldn't that be sufficient to get to the cause of the issue?

comment:19 by robe, 7 years ago

Well bitmap/normal index is not an issue. Usually in my case the bitmap and index scan speeds are pretty close and the bitmap often beats out the plain index scan.

So the only concern I have here is your seq scan one. Sadly I can't replicate with a smaller test sample even on my windows 64-bit 9.6.2 running PostGIS 2.4 dev (which should be more or less the same as 2.3.2) and same makeup as yours.

Here is my vein attempt just using full NJ state data I had lying around and using the same indexes you have except the iso had to replace with iso 2 since I didn't have iso 1.

EXPLAIN  ANALYZE SELECT  ap.tags->'ISO3166-2' iso,s.admin_level admlev, s.way, s.osm_id 
FROM planet_osm_polygon s
JOIN planet_osm_polygon ap on ST_Within(s.way,ap.way) 
WHERE  ap.tags?'ISO3166-2' AND ap.tags->'ISO3166-2' in ('US-NJ')  AND s.admin_level ~ '^\d+$';


                                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.30..28.74 rows=1 width=7236) (actual time=2.629..389.108 rows=488 loops=1)
   ->  Index Scan using polygon_iso3166_2_idx on planet_osm_polygon ap  (cost=0.13..8.14 rows=1 width=7231) (actual time=0.015..0.016 rows=1 loops=1)
         Index Cond: ((tags -> 'ISO3166-2'::text) = 'US-NJ'::text)
   ->  Bitmap Heap Scan on planet_osm_polygon s  (cost=4.18..20.58 rows=1 width=7204) (actual time=2.283..386.256 rows=488 loops=1)
         Recheck Cond: (ap.way ~ way)
         Filter: (((admin_level)::text ~ '^\d+$'::text) AND _st_contains(ap.way, way))
         Rows Removed by Filter: 3124
         Heap Blocks: exact=1348
         ->  Bitmap Index Scan on planet_osm_polygon_index  (cost=0.00..4.18 rows=4 width=0) (actual time=0.594..0.594 rows=3612 loops=1)
               Index Cond: (ap.way ~ way)
 Planning time: 4.018 ms
 Execution time: 392.913 ms
(12 rows)
EXPLAIN  ANALYZE SELECT  ap.tags->'ISO3166-2' iso,s.admin_level admlev, s.way, s.osm_id 
FROM planet_osm_polygon s
JOIN planet_osm_polygon ap on ap.way ~ s.way AND ST_Within(s.way,ap.way) 
WHERE  ap.tags?'ISO3166-2' AND ap.tags->'ISO3166-2' in ('US-NJ')  AND s.admin_level ~ '^\d+$';


                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.27..16.58 rows=1 width=7236) (actual time=1.707..398.057 rows=488 loops=1)
   ->  Index Scan using polygon_iso3166_2_idx on planet_osm_polygon ap  (cost=0.13..8.14 rows=1 width=7231) (actual time=0.016..0.017 rows=1 loops=1)
         Index Cond: ((tags -> 'ISO3166-2'::text) = 'US-NJ'::text)
   ->  Index Scan using planet_osm_polygon_index on planet_osm_polygon s  (cost=0.15..8.42 rows=1 width=7204) (actual time=1.480..395.016 rows=488 loops=1)
         Index Cond: ((ap.way ~ way) AND (ap.way ~ way))
         Filter: (((admin_level)::text ~ '^\d+$'::text) AND _st_contains(ap.way, way))
         Rows Removed by Filter: 3124
 Planning time: 0.156 ms
 Execution time: 398.725 ms
(9 rows)

SET enable_bitmapscan = false;
EXPLAIN  ANALYZE SELECT  ap.tags->'ISO3166-2' iso,s.admin_level admlev, s.way, s.osm_id 
FROM planet_osm_polygon s
JOIN planet_osm_polygon ap on  ST_Within(s.way,ap.way) 
WHERE  ap.tags?'ISO3166-2' AND ap.tags->'ISO3166-2' in ('US-NJ')  AND s.admin_level ~ '^\d+$';

                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.27..29.38 rows=1 width=7236) (actual time=1.607..402.173 rows=488 loops=1)
   ->  Index Scan using polygon_iso3166_2_idx on planet_osm_polygon ap  (cost=0.13..8.14 rows=1 width=7231) (actual time=0.007..0.008 rows=1 loops=1)
         Index Cond: ((tags -> 'ISO3166-2'::text) = 'US-NJ'::text)
   ->  Index Scan using planet_osm_polygon_index on planet_osm_polygon s  (cost=0.15..21.23 rows=1 width=7204) (actual time=1.545..397.700 rows=488 loops=1)
         Index Cond: (ap.way ~ way)
         Filter: (((admin_level)::text ~ '^\d+$'::text) AND _st_contains(ap.way, way))
         Rows Removed by Filter: 3124
 Planning time: 0.159 ms
 Execution time: 402.805 ms
(9 rows)

See in above the execution time the bitmap is fastest and even when you add in planning it's still faster. You can fiddle with maybe enable_seqscan=false to see what it does if it makes it go the right way.

It would be great if you can maybe try the above example too on your machine - to rule out machine/PostgreSQL version/ postgrsql settings specific issues.

The data I used was - http://postgis.us/downloads/nj.osm.pbf

If your bounding boxes are big enough (which is a especially big problem when using long lat projection and comparing countries), it could dismiss using an index entirely because it thinks its useless, never even bothers costing. The extra ~ would just add some extra incentive to make it consider the plan. pramsey can describe the innards better but I do think the histogram is used in some shape or form for the planner to decide if a plan is even worth considering.

comment:20 by robe, 7 years ago

Here is just the polygon table in format suitable for pg_restore:

http://postgis.us/downloads/nj_poly.backup

comment:21 by postgispaul, 7 years ago

Here are the results:

explain analyze 
SELECT  ap.tags->'ISO3166-2' iso,s.admin_level admlev, s.way, s.osm_id 
FROM planet_osm_polygon s
JOIN planet_osm_polygon ap on ST_Within(s.way,ap.way) 
WHERE  ap.tags?'ISO3166-2' AND ap.tags->'ISO3166-2' in ('US-NJ')  AND s.admin_level ~ '^\d+$';

Nested Loop  (cost=0.27..9.08 rows=1 width=128) (actual time=3.006..849.464 rows=488 loops=1)
  ->  Index Scan using polygon_iso3166_2_idx on planet_osm_polygon ap  (cost=0.13..2.34 rows=1 width=64) (actual time=0.014..0.017 rows=1 loops=1)
        Index Cond: ((tags -> 'ISO3166-2'::text) = 'US-NJ'::text)
  ->  Index Scan using planet_osm_polygon_index on planet_osm_polygon s  (cost=0.15..6.73 rows=1 width=96) (actual time=2.864..839.118 rows=488 loops=1)
        Index Cond: (ap.way ~ way)
        Filter: (((admin_level)::text ~ '^\d+$'::text) AND _st_contains(ap.way, way))
        Rows Removed by Filter: 3124
Planning time: 0.317 ms
Execution time: 851.518 ms

and

explain analyze 
SELECT  ap.tags->'ISO3166-2' iso,s.admin_level admlev, s.way, s.osm_id 
FROM planet_osm_polygon s
JOIN planet_osm_polygon ap on ap.way ~ s.way and ST_Within(s.way,ap.way) 
WHERE  ap.tags?'ISO3166-2' AND ap.tags->'ISO3166-2' in ('US-NJ')  AND s.admin_level ~ '^\d+$';

Nested Loop  (cost=0.27..4.98 rows=1 width=128) (actual time=2.895..842.255 rows=488 loops=1)
  ->  Index Scan using polygon_iso3166_2_idx on planet_osm_polygon ap  (cost=0.13..2.34 rows=1 width=64) (actual time=0.013..0.016 rows=1 loops=1)
        Index Cond: ((tags -> 'ISO3166-2'::text) = 'US-NJ'::text)
  ->  Index Scan using planet_osm_polygon_index on planet_osm_polygon s  (cost=0.15..2.62 rows=1 width=96) (actual time=2.721..832.200 rows=488 loops=1)
        Index Cond: ((ap.way ~ way) AND (ap.way ~ way))
        Filter: (((admin_level)::text ~ '^\d+$'::text) AND _st_contains(ap.way, way))
        Rows Removed by Filter: 3124
Planning time: 0.333 ms
Execution time: 844.151 ms

So results are different at my end, for this case there doesn't seem to be an issue. I did reduce the random_page_cost to 1 (all previous queries have been run with this cost). Changing the random_page_cost to the default 4 gave me this:

Nested Loop  (cost=4.30..28.67 rows=1 width=128) (actual time=4.821..873.925 rows=488 loops=1)
  ->  Index Scan using polygon_iso3166_2_idx on planet_osm_polygon ap  (cost=0.13..8.14 rows=1 width=64) (actual time=0.075..0.079 rows=1 loops=1)
        Index Cond: ((tags -> 'ISO3166-2'::text) = 'US-NJ'::text)
  ->  Bitmap Heap Scan on planet_osm_polygon s  (cost=4.18..20.51 rows=1 width=96) (actual time=4.705..863.589 rows=488 loops=1)
        Recheck Cond: (ap.way ~ way)
        Filter: (((admin_level)::text ~ '^\d+$'::text) AND _st_contains(ap.way, way))
        Rows Removed by Filter: 3124
        Heap Blocks: exact=1098
        ->  Bitmap Index Scan on planet_osm_polygon_index  (cost=0.00..4.18 rows=4 width=0) (actual time=1.424..1.424 rows=3612 loops=1)
              Index Cond: (ap.way ~ way)
Planning time: 0.332 ms
Execution time: 876.048 ms

and

Nested Loop  (cost=0.27..16.58 rows=1 width=128) (actual time=2.963..854.142 rows=488 loops=1)
  ->  Index Scan using polygon_iso3166_2_idx on planet_osm_polygon ap  (cost=0.13..8.14 rows=1 width=64) (actual time=0.015..0.017 rows=1 loops=1)
        Index Cond: ((tags -> 'ISO3166-2'::text) = 'US-NJ'::text)
  ->  Index Scan using planet_osm_polygon_index on planet_osm_polygon s  (cost=0.15..8.42 rows=1 width=96) (actual time=2.798..844.017 rows=488 loops=1)
        Index Cond: ((ap.way ~ way) AND (ap.way ~ way))
        Filter: (((admin_level)::text ~ '^\d+$'::text) AND _st_contains(ap.way, way))
        Rows Removed by Filter: 3124
Planning time: 0.307 ms
Execution time: 856.015 ms

so apart from the random_page_cost: no differences between our servers.

regarding your remarks

If your bounding boxes are big enough (which is a especially big problem when using long lat projection and comparing countries), it could dismiss using an index entirely because it thinks its useless, never even bothers costing. The extra ~ would just add some extra incentive to make it consider the plan. pramsey can describe the innards better but I do think the histogram is used in some shape or form for the planner to decide if a plan is even worth considering.

The bbox of south korea should be selective enough for the polygons of the whole world, I even set the analyze parameter (the one stating how many records to fetch for statistics) to the max.

And as for the extra ~ : I thought it wasn't possible to influence the planner (other than disabling specific actions)?

Note also that if the extra ~ is added, that condition is added twice to the index condition

comment:22 by robe, 7 years ago

Yah that is the thing that puzzles me the extra ~ making it do the right thing. If it did the same with just ~ I would have dismissed it as just a a stats issue with the fact that stats are only single columned and to do this justice you'd need compound on admin_level , geometry. Multicolumn stats. are coming in PostgreSQL 10.

I also thought there was a time when it collapsed redundant statements ~ AND ~, but perhaps they took that logic out.

Do you see a difference in costs between when you do something like

set enable_seqscan = false;

-- apply the original


Does it do the right thing or still try to do sequential scan?

pramsey , strk any thoughts on why doing ST_Within() .. vs. ~ AND ST_Within().. would make a difference.

I'm suspecting it's not even considering the solution with the index because it has to peak into the function to expose the ~ and that itself has a cost, so if it's already drummed up enough plans to try it may stop after x number.

That's a very wild uneducated guess.

comment:23 by postgispaul, 7 years ago

Total costs without enable_seqscan=false: 193474644

with enable_seqscan=false: 475346571

Nested Loop  (cost=6230.89..475346571.95 rows=227104 width=271)
  Output: (ap.tags -> 'ISO3166-1'::text), s.admin_level, s.way, s.osm_id
  ->  Index Scan using polygon_iso3166_1_idx on public.planet_osm_polygon ap  (cost=0.28..1473.02 rows=1328 width=258)
        Output: ap.osm_id, ap.highway, ap.surface, ap.railway, ap.waterway, ap.tunnel, ap.bridge, ap.oneway, ap.access, ap.name, ap.z_order, ap.way_area, ap.area, ap.natural, ap.water, ap.admin_level, ap.boundary, ap.landuse, ap.amenity, ap.parking, ap.a (...)
        Index Cond: ((ap.tags -> 'ISO3166-1'::text) = 'KR'::text)
  ->  Bitmap Heap Scan on public.planet_osm_polygon s  (cost=6230.62..357938.45 rows=171 width=239)
        Output: s.osm_id, s.highway, s.surface, s.railway, s.waterway, s.tunnel, s.bridge, s.oneway, s.access, s.name, s.z_order, s.way_area, s.area, s.natural, s.water, s.admin_level, s.boundary, s.landuse, s.amenity, s.parking, s.aeroway, s.military, s (...)
        Recheck Cond: (ap.way ~ s.way)
        Filter: ((s.admin_level ~ '^\d+$'::text) AND _st_contains(ap.way, s.way))
        ->  Bitmap Index Scan on planet_osm_polygon_index  (cost=0.00..6230.57 rows=265635 width=0)
              Index Cond: (ap.way ~ s.way)

with extra ~ condition: 486830

comment:24 by robe, 7 years ago

would it be a bother to provide the explain analyze? This shows the planner thinks using the index is more costly than not, but doesn't tell whether it is actually more costly. I would assume not - that your actual timings would be similar to having the extra ~ condition, but hard to tell.

comment:25 by postgispaul, 7 years ago

ok,

explain analyze
select ap.tags->'ISO3166-1' iso,s.admin_level admlev,s.way, s.osm_id from planet_osm_polygon s
join planet_osm_polygon ap on ST_Within(s.way,ap.way)
where ap.tags?'ISO3166-1' AND ap.tags->'ISO3166-1' in ('KR') AND s.admin_level ~ '^\d+$' 

results in:

Nested Loop  (cost=6230.89..475346571.95 rows=227104 width=271) (actual time=7009.454..15032.957 rows=1377 loops=1)
  ->  Index Scan using polygon_iso3166_1_idx on planet_osm_polygon ap  (cost=0.28..1473.02 rows=1328 width=258) (actual time=13.654..13.685 rows=3 loops=1)
        Index Cond: ((tags -> 'ISO3166-1'::text) = 'KR'::text)
  ->  Bitmap Heap Scan on planet_osm_polygon s  (cost=6230.62..357938.45 rows=171 width=239) (actual time=2291.334..4919.787 rows=459 loops=3)
        Recheck Cond: (ap.way ~ way)
        Filter: ((admin_level ~ '^\d+$'::text) AND _st_contains(ap.way, way))
        Rows Removed by Filter: 102799
        Heap Blocks: exact=14628
        ->  Bitmap Index Scan on planet_osm_polygon_index  (cost=0.00..6230.57 rows=265635 width=0) (actual time=2186.556..2186.556 rows=103258 loops=3)
              Index Cond: (ap.way ~ way)
Planning time: 0.292 ms
Execution time: 15054.474 ms

subsequent executions are in the 1800ms range due to caching. So in this case the speed is comparable to the query with the extra ~ condition in it

comment:26 by robe, 7 years ago

ahah so I think I was wrong, it is aware of the option to use the index, but has wrongly concluded it is worse to do so than to use it because it estimated a traversal count of 265635 rows when there is only 103258 rows.

Why extra ~ puts it over the edge is a bit of a mystery. I suspect it's just treating it as an extra condition and so multiplying the rows it has to traverse for the cost, which sways it back in the other direction. So it's just dumb luck.

One more test,

Can you move the ~ inside the ST_Within

so ~ and ~. I'm suspecting it will have the same affect as it being outside, but just want to confirm that.

Aside from Paul's (pramsey) note about improving ~ selectivity, perhaps managing this ourselves, I'm not sure there is much we can do about this. Even with that, I'm not sure how that would help, as it seems to be very data specific.

comment:27 by postgispaul, 7 years ago

You're right. Adding the ~ to the st_within (so it's present twice) is enough to persuade postgres to use the index correctly.

CREATE OR REPLACE FUNCTION public.st_withintest(
    geom1 geometry,
    geom2 geometry)
  RETURNS boolean AS
'SELECT $2 OPERATOR(public.~) $1 AND $2 OPERATOR(public.~) $1 AND public._ST_Contains($2,$1)'
  LANGUAGE sql IMMUTABLE
  COST 100;

explain analyze 
select ap.tags->'ISO3166-1' iso,s.admin_level admlev,s.way, s.osm_id from planet_osm_polygon s
join planet_osm_polygon ap on ST_Withintest(s.way,ap.way)
where ap.tags?'ISO3166-1' AND ap.tags->'ISO3166-1' in ('KR') AND s.admin_level ~ '^\d+$' }}}
results in 
{{{
Nested Loop  (cost=0.83..486830.14 rows=227 width=271) (actual time=15.872..1163.043 rows=1377 loops=1)
  ->  Index Scan using polygon_iso3166_1_idx on planet_osm_polygon ap  (cost=0.28..1473.02 rows=1328 width=258) (actual time=0.146..0.179 rows=3 loops=1)
        Index Cond: ((tags -> 'ISO3166-1'::text) = 'KR'::text)
  ->  Index Scan using planet_osm_polygon_index on planet_osm_polygon s  (cost=0.55..365.47 rows=1 width=239) (actual time=3.473..363.983 rows=459 loops=3)
        Index Cond: ((ap.way ~ way) AND (ap.way ~ way))
        Filter: ((admin_level ~ '^\d+$'::text) AND _st_contains(ap.way, way))
        Rows Removed by Filter: 102799
Planning time: 0.352 ms
Execution time: 1172.677 ms
}}}

comment:28 by robe, 7 years ago

Resolution: wontfix
Status: newclosed

I don't think thereis much we can do to remedy this in PostGIS especially when mixing with other gist types compounds the planners planning options.

comment:29 by komzpa, 7 years ago

I've met the same thing and digged into estimators to understand what's happening.

ANALYSE builds inexact histogram of data, with up to ~10000 cells. On the scale of the world, it means they are ~300x300 km cells, with all the cities falling into same cell.

When you join it, it multiplies a cell from table A by cell from table B to get upper limit. Thus inside a city you get a squared number of objects. When you're using unfiltered planet on both sides (with different conditions) it leads to catastrophic overestimate of row count.

PostGIS estimator is unsuitable for planet-sized datasets unfortunately.

Workarounds I've used:

  • kill postgis statistics using CLUSTER. Vacuum will come anyway and re-collect it, so a short-term thing to use in scripts;
  • bump work_mem to tremendously large values (20GB on my machine), so it thinks the intermediate result will fit into it anyway;
  • repeat the operator more times. Selectivity is still < 100%, each repeat will multiply it by itself, getting it lower and lower.
Version 0, edited 7 years ago by komzpa (next)

comment:30 by strk, 7 years ago

Kompza had you tried increasing the per-column STATISTICS target to get smaller cells ? How effective can that get ?

comment:31 by komzpa, 7 years ago

It's a hardcoded limit:

https://github.com/postgis/postgis/blob/6c9f8058eb0277ae0ddad323506f5c9377167e9f/postgis/gserialized_estimate.c#L1445

Increasing statistics will get more data inside cells, but will not increase number of cells.

For 2D world, it is sqrt(40 000 000. * 40 000 000 / (2 * 10 000)) ~= 282 842 m box.

It's the total number of cells - it's being rooted here: https://github.com/postgis/postgis/blob/6c9f8058eb0277ae0ddad323506f5c9377167e9f/postgis/gserialized_estimate.c#L1554

By default the STATISTICS is 100, thus world is split into 10x10 grid, 2 828 km box each. That covers almost all of Europe. If you multiply number of nodes in such box by number of countries in such box, you get insanely big number.

comment:32 by strk, 7 years ago

From the comment the number of cells is attstattarget2 with an upper limit of 10k*ndims per side (20k x 20k cells grid).

Does the comment not reflect the code ?

comment:33 by postgispaul, 7 years ago

Adding partial index create index on planet_osm_polygon using gist (way) where ap.tags?'ISO3166-1'; doesn't make any difference in my queries…

comment:34 by komzpa, 7 years ago

@strk the code uses this number as total number, not a side length. The comment isn't in sync with code.

Last edited 7 years ago by komzpa (previous) (diff)
Note: See TracTickets for help on using tickets.