Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#3548 closed defect (invalid)

Slow queries with ST_DWithin and geography columns

Reported by: rdallasgray Owned by: pramsey
Priority: medium Milestone: PostGIS 2.2.3
Component: postgis Version: 2.1.x
Keywords: ST_DWithin geography Cc:

Description

We noticed that a query to find points within a given distance of a line was running slowly. The points have SRID 4326 and are recorded with an indexed geography column.

The culprit appears to be the filter step where _st_dwithin is run against each of the records returned by the index.

We were able to work around the issue by introducing a second ST_DWithin condition which runs against an indexed geometry column (again SRID 4326). We set the distance parameter for this in a rough conversion of metres to degrees, allowing a large buffer to ensure we include all candidate points. This narrowed the field sufficiently and the query is now ~300% faster.

Interestingly, the order of the two ST_DWithin queries matters. If we run the geographic query first, we have poor performance. If we run the geometric query first, we have the better performance.

Attached are redacted queries and explain results.

QUERY 1

SELECT COUNT(*) FROM "locations"
WHERE (ST_DWithin(
  ST_SetSRID(ST_MakeLine(
  ARRAY[
    ST_MakePoint(2.1736, 41.38518) <... redacted, approx 200 points> ST_MakePoint(73.31681999999995, 55.05574999999999)
  ]
)
, 4326)::geography,
  locations.geog,
  5000,
  false
)
);

EXPLAIN 1

 Aggregate  (cost=2.76..2.77 rows=1 width=0) (actual time=6501.755..6501.757 rows=1 loops=1)
   Buffers: shared hit=15542
   ->  Index Scan using index_locations_on_geog on locations  (cost=0.28..2.76 rows=1 width=0) (actual time=8.016..6499.419 rows=1121 loops=1)
         Index Cond: (geog && '01<...>40'::geography)
         Filter: (('01<...>40'::geography && _st_expand(geog, 5000::double precision)) AND _st_dwithin('01<...>40'::geography, geog, 5000::double precision, false))
         Rows Removed by Filter: 19213
         Buffers: shared hit=15542
 Total runtime: 6502.081 ms

QUERY 2

SELECT COUNT(*) FROM "locations"
WHERE (ST_DWithin(
  ST_SetSRID(ST_MakeLine(
  ARRAY[
  ST_MakePoint(2.1736, 41.38518) <... redacted, approx 200 points> ST_MakePoint(73.31681999999995, 55.05574999999999)
  ]
)
, 4326)::geography,
  locations.geog,
  5000,
  false
)
AND (ST_DWithin(
  ST_SetSRID(ST_MakeLine(
  ARRAY[
  ST_MakePoint(2.1736, 41.38518) <... redacted, approx 200 points> ST_MakePoint(73.31681999999995, 55.05574999999999)
  ]
)
, 4326),
  locations.geom,
  0.26
)
)
);

EXPLAIN 2

 Aggregate  (cost=3.02..3.02 rows=1 width=0) (actual time=5976.269..5976.270 rows=1 loops=1)
   Buffers: shared hit=15468
   ->  Index Scan using index_locations_on_geog on locations  (cost=0.28..3.01 rows=1 width=0) (actual time=5.942..5973.619 rows=1121 loops=1)
         Index Cond: (geog && '01<...>40'::geography)
         Filter: ((geom && '0103000020E6100000010000000500000075029A081B9EFE3F800EF3E50590444075029A081B9EFE3F8FC151F2EA484C406DEB6E9EEA6452408FC151F2EA484C406DEB6E9EEA645240800EF3E50590444075029A081B9EFE3F800EF3E505904440'::geometry) AND ('01<...>40'::geography && _st_expand(geog, 5000::double precision)) AND ('01<...>40'::geometry && st_expand(geom, 0.26::double precision)) AND _st_dwithin('01<...>40'::geography, geog, 5000::double precision, false) AND _st_dwithin('01<...>40'::geometry, geom, 0.26::double precision))
         Rows Removed by Filter: 19213
         Buffers: shared hit=15468
 Total runtime: 5976.354 ms

QUERY 3

SELECT COUNT(*) FROM "locations"
WHERE (ST_DWithin(
  ST_SetSRID(ST_MakeLine(
  ARRAY[
    ST_MakePoint(2.1736, 41.38518) <... redacted, approx 200 points> ST_MakePoint(73.31681999999995, 55.05574999999999)
  ]
)
, 4326),
  locations.geom,
  0.26
)
)
AND (ST_DWithin(
  ST_SetSRID(ST_MakeLine(
  ARRAY[
    ST_MakePoint(2.1736, 41.38518) <... redacted, approx 200 points> ST_MakePoint(73.31681999999995, 55.05574999999999)
  ]
)
, 4326)::geography,
  locations.geog,
  5000,
  false
)
);

EXPLAIN 3

 Aggregate  (cost=3.02..3.02 rows=1 width=0) (actual time=1308.285..1308.287 rows=1 loops=1)
   Buffers: shared hit=15468
   ->  Index Scan using index_locations_on_geog on locations  (cost=0.28..3.01 rows=1 width=0) (actual time=1.917..1306.076 rows=1121 loops=1)
         Index Cond: (geog && '01...40'::geography)
         Filter: ((geom && '0103000020E6100000010000000500000075029A081B9EFE3F800EF3E50590444075029A081B9EFE3F8FC151F2EA484C406DEB6E9EEA6452408FC151F2EA484C406DEB6E9EEA645240800EF3E50590444075029A081B9EFE3F800EF3E505904440'::geometry) AND ('01...40'::geometry && st_expand(geom, 0.26::double precision)) AND ('01...40'::geography && _st_expand(geog, 5000::double precision)) AND _st_dwithin('01...40'::geometry, geom, 0.26::double precision) AND _st_dwithin('01...40'::geography, geog, 5000::double precision, false))
         Rows Removed by Filter: 19213
         Buffers: shared hit=15468
 Total runtime: 1308.386 ms

Change History (4)

comment:1 by rdallasgray, 8 years ago

Apologies — version info:

POSTGIS="2.1.8 r13780" GEOS="3.5.0-CAPI-1.9.0 r4084" PROJ="Rel. 4.9.1, 04 March 2015" GDAL="GDAL 1.11.2, released 2015/02/10" LIBXML="2.9.1" LIBJSON="UNKNOWN" RASTER

PostgreSQL 9.3.10 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit

comment:2 by pramsey, 8 years ago

Resolution: invalid
Status: newclosed

Spherical calculations are just more expensive than planar ones, as you've demonstrated.

Only thing I see about your query you might want to change is, rather than building up a 200-vertex line and calling dwithin on that, build up 199 2-vertex segments, and do a join against that set. The index winnow much better, since it won't be using a global bounding box for the whole line, but instead 199 much smaller ones for the segments.

comment:3 by rdallasgray, 8 years ago

OK, thanks, that makes sense.

How about the issue that the order in which the conditions are declared affects the performance? Should the query planner not be able to take into account that geometry is going to be considerably faster than geography and optimise accordingly?

comment:4 by pramsey, 8 years ago

If it knew what was faster/slower, it would take that into account, but we don't have costing on most of our functions (now) and therefor it just processes things it figures are equal left-to-right.

Note: See TracTickets for help on using tickets.