Opened 9 years ago

Closed 9 years ago

#3065 closed defect (wontfix)

Spatial index on LINESTRING not used under some circumstances

Reported by: realityexists Owned by: pramsey
Priority: medium Milestone: PostGIS 2.1.8
Component: postgis Version: 2.1.x
Keywords: Cc:

Description

POSTGIS="2.1.5 r13152" GEOS="3.4.2-CAPI-1.8.2 r3924" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.11.1, released 2014/09/24" LIBXML="2.7.8" LIBJSON="UNKNOWN" RASTER

PostgreSQL 9.3.6, compiled by Visual C++ build 1600, 64-bit

Windows 7 SP1 x64

I have a complex query that intersects linestrings with polygons. It first builds a temp table of the linestrings with a GIST index on the linestring column, then does the main intersection query, which normally takes ~20 minutes. With a particular set of linestrings this GIST index is not used (as shown by the EXPLAIN output) and the main query takes nearly 3 hours.

The temp table was created like this:

CREATE TEMP TABLE _sia_segment
AS SELECT ST_MakeLine(start_pos::geometry, end_pos::geometry)::geography AS line
        , NULL::integer AS segment_length, ... other stuff ...
FROM ...

UPDATE _sia_segment
SET segment_length = ST_Length(line);

CREATE INDEX ix__sia_segment_line ON _sia_segment USING GIST (line);

Also, not sure if it's related, but the CREATE INDEX command outputs messages like this:

DEBUG:  picksplit method for column 1 of index "ix__sia_segment_line" failed
HINT:  The index is not optimal. To optimize it, contact a developer, or try to use the column as the second one in the CREATE INDEX command.

If I run CREATE INDEX before the UPDATE then the UPDATE also outputs those messages.

I found that if I move the ST_Length() into the original SELECT statement the problem goes away - the index is used. So my original problem is solved, but if anyone wants to investigate the PostGIS behaviour I can try to create a repro. It should be easy for the picksplit error, though not so easy for the original problem (index not used).

Also, the problem only occurred when the CREATE TEMP TABLE and the main intersection query were done inside a transaction. Outside of a transaction the index was used just fine, despite the picksplit error. So the circumstances are quite particular, but reliably reproducible in my database.

Attachments (1)

repro3-debug3-logs.zip (243.5 KB ) - added by realityexists 9 years ago.
Debug log (level 3) from running repro 3. Problem occurs only in a transaction.

Download all attachments as: .zip

Change History (28)

comment:1 by strk, 9 years ago

The picksplit warning is being taken care in #3058

comment:2 by strk, 9 years ago

The picksplit method should be fixed in all development branches now (2.0, 2.1 and 2.2/trunk). Beside that (which you may test if you build the code yourself) you'd have to show the query and the explain output to get better help.

Note that the planner decides how to access the db based on statistics which may be outdated if you didn't run ANALYZE on the table after batch insert or update.

comment:3 by realityexists, 9 years ago

Thanks, I tested the 2.1 branch (r13295), now using PostgreSQL 9.2.6 on Linux x86. I can confirm that the picksplit message is gone, but the index is still not used. I got it down to a minimal repro. The tables are very simple:

CREATE TABLE test_poly
(
  ts_id integer,
  border geography
);

CREATE TABLE test_line
(
  segment_id integer,
  timerange tsrange,
  line geography,
  segment_length integer
);

Query:

BEGIN; -- Problem doesn't happen outside of a transaction

DROP TABLE IF EXISTS _temp_line;

CREATE TEMP TABLE _temp_line AS
SELECT * FROM test_line;

-- Problem doesn't happen without this UPDATE.
-- Also doesn't happen without the seemingly-irrelevant timerange column.
UPDATE _temp_line SET segment_length = ST_Length(line);

CREATE INDEX ix__temp_line_line ON _temp_line USING GIST (line);

-- ANALYZE _sia_segment; -- This doesn't help!

EXPLAIN
SELECT ts_id, segment_id
FROM test_poly a
JOIN _temp_line s ON ST_Intersects(a.border, s.line)

-- ROLLBACK;

EXPLAIN output:

Nested Loop  (cost=0.00..893167397.30 rows=1083307 width=8)
  Join Filter: ((a.border && s.line) AND (_st_distance(a.border, s.line, 0::double precision, false) < 1e-05::double precision))
  ->  Seq Scan on test_poly a  (cost=0.00..1363.05 rows=14805 width=880)
  ->  Materialize  (cost=0.00..8264.72 rows=219515 width=36)
        ->  Seq Scan on _temp_line s  (cost=0.00..5559.15 rows=219515 width=36)

But when running outside of a transaction or without the UPDATE statement or without the timerange column:

Nested Loop  (cost=0.00..116283.23 rows=1083307 width=8)
  ->  Seq Scan on test_poly a  (cost=0.00..1363.05 rows=14805 width=880)
  ->  Index Scan using ix__temp_line_line on _temp_line s  (cost=0.00..7.69 rows=7 width=36)
        Index Cond: (a.border && line)
        Filter: (_st_distance(a.border, line, 0::double precision, false) < 1e-05::double precision)

If you want the data I can upload that somewhere and send you the link. It's ~10MB compressed.

comment:4 by strk, 9 years ago

Use EXPLAIN ANALYZE to check how good the estimates are, then use "set enable_seqscan=off" or "=on" to compare the two runs.

comment:5 by realityexists, 9 years ago

I don't know what to make of the EXPLAIN ANALYZE output. Here it is. In a transaction:

Nested Loop  (cost=0.00..893167397.30 rows=1083307 width=8) (actual time=53.944..3005465.506 rows=821046 loops=1)
  Join Filter: ((a.border && s.line) AND (_st_distance(a.border, s.line, 0::double precision, false) < 1e-05::double precision))
  Rows Removed by Join Filter: 1624338609
  ->  Seq Scan on test_poly a  (cost=0.00..1363.05 rows=14805 width=880) (actual time=0.065..66.477 rows=14805 loops=1)
  ->  Materialize  (cost=0.00..8264.72 rows=219515 width=36) (actual time=0.006..46.816 rows=109771 loops=14805)
        ->  Seq Scan on _temp_line s  (cost=0.00..5559.15 rows=219515 width=36) (actual time=0.124..68.740 rows=109771 loops=1)
Total runtime: 3005878.650 ms

Oustide of a transaction:

Nested Loop  (cost=0.00..116283.23 rows=1083307 width=8) (actual time=1.224..869916.660 rows=821046 loops=1)
  ->  Seq Scan on test_poly a  (cost=0.00..1363.05 rows=14805 width=880) (actual time=0.043..36.121 rows=14805 loops=1)
  ->  Index Scan using ix__temp_line_line on _temp_line s  (cost=0.00..7.69 rows=7 width=36) (actual time=1.642..58.708 rows=55 loops=14805)
        Index Cond: (a.border && line)
        Filter: (_st_distance(a.border, line, 0::double precision, false) < 1e-05::double precision)
        Rows Removed by Filter: 41
Total runtime: 870215.643 ms

Even with enable_seqscan=off the index is not used. EXPLAIN output (in a transaction):

Nested Loop  (cost=20000000000.00..20893167397.30 rows=1083307 width=8)
  Join Filter: ((a.border && s.line) AND (_st_distance(a.border, s.line, 0::double precision, false) < 1e-05::double precision))
  ->  Seq Scan on test_poly a  (cost=10000000000.00..10000001363.05 rows=14805 width=880)
  ->  Materialize  (cost=10000000000.00..10000008264.73 rows=219515 width=36)
        ->  Seq Scan on _temp_line s  (cost=10000000000.00..10000005559.15 rows=219515 width=36)

comment:6 by pramsey, 9 years ago

It's almost as if, in the transactional context, the database doesn't know the index exists yet. Which might be true, I'm not certain that DDL is managed within the same transactional context as DML. Given the relatively narrow scope of this issue, and that I'm not even 10% certain it's a PostGIS issue, could we close off?

comment:7 by strk, 9 years ago

I cannot reproduce the problem here:

PostgreSQL 9.3.6 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit

POSTGIS="2.2.0dev r13327" GEOS="3.5.0dev-CAPI-1.9.0 r4038" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.11.1, released 2014/09/24" LIBXML="2.9.1" LIBJSON="0.12" RASTER

I've run the preparation and the query in your original submission, and got the index scan: `

QUERY PLAN


Nested Loop (cost=0.14..585.98 rows=328 width=8) (actual time=0.003..0.003 rows=0 loops=1)

→ Seq Scan on test_poly a (cost=0.00..22.30 rows=1230 width=36) (actual time=0.001..0.001 rows=0 loops=1) → Index Scan using ixtemp_line_line on _temp_line s (cost=0.14..0.45 rows=1 width=36) (never executed)

Index Cond: (a.border && line) Filter: (_st_distance(a.border, line, 0::double precision, false) < 1e-05::double precision)

Total runtime: 0.058 ms

(6 rows) `

comment:8 by strk, 9 years ago

Sorry, I meant I run the reduced test you wrote in http://trac.osgeo.org/postgis/ticket/3065#comment:3

Now I've also tried on the same PostgreSQL version but with the PostGIS version in 2.1 branch:

POSTGIS="2.1.6dev r13280" GEOS="3.5.0dev-CAPI-1.9.0 r4038" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.11.1, released 2014/09/24" LIBXML="2.9.1" LIBJSON="UNKNOWN" RASTER

Same plan:

                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.14..585.98 rows=328 width=8) (actual time=0.003..0.003 rows=0 loops=1)
   ->  Seq Scan on test_poly a  (cost=0.00..22.30 rows=1230 width=36) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Index Scan using ix__temp_line_line on _temp_line s  (cost=0.14..0.45 rows=1 width=36) (never executed)
         Index Cond: (a.border && line)
         Filter: (_st_distance(a.border, line, 0::double precision, false) < 1e-05::double precision)

comment:9 by robe, 9 years ago

I ran this on my windows 64-bit dev that has

POSTGIS="2.1.3 r12547" GEOS="3.5.0dev-CAPI-1.9.0 r3989" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.11.0, released 2014/04/16" LIBXML="2.7.8" LIBJSON="UNKNOWN" RASTER PostgreSQL 9.3.3, compiled by Visual C++ build 1600, 64-bit
BEGIN; -- Problem doesn't happen outside of a transaction

DROP TABLE IF EXISTS _temp_line;

CREATE TEMP TABLE _temp_line AS
SELECT * FROM test_line;

-- Problem doesn't happen without this UPDATE.
-- Also doesn't happen without the seemingly-irrelevant timerange column.
UPDATE _temp_line SET segment_length = ST_Length(line);

CREATE INDEX ix__temp_line_line ON _temp_line USING GIST (line);

-- ANALYZE _sia_segment; -- This doesn't help!

EXPLAIN
SELECT ts_id, segment_id
FROM test_poly a
JOIN _temp_line s ON ST_Intersects(a.border, s.line);

and it behaved as expected

Nested Loop  (cost=0.14..585.98 rows=328 width=8)
  ->  Seq Scan on test_poly a  (cost=0.00..22.30 rows=1230 width=36)
  ->  Index Scan using ix__temp_line_line on _temp_line s  (cost=0.14..0.45 rows=1 width=36)
        Index Cond: (a.border && line)
        Filter: (_st_distance(a.border, line, 0::double precision, false) < 1e-005::double precision)

I have to upgrade my dev to see if it's an issue with newer PostGIS and PostgreSQL but at least as far as I can tell the issue does not show on older.

comment:10 by robe, 9 years ago

Well I upgrade to 2.1.5 and it was still fine. I haven't tried upgrading to 9.3.6 yet. This is running on windows 7 btw.

comment:11 by robe, 9 years ago

Milestone: PostGIS 2.1.6

Okay I started with a spanking new install which should replicate yours, and still it does an index scan as I would expect even in a transaction

POSTGIS="2.1.5 r13152" GEOS="3.4.2-CAPI-1.8.2 r3924" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.11.1, released 2014/09/24" LIBXML="2.7.8" LIBJSON="UNKNOWN" RASTER PostgreSQL 9.3.6, compiled by Visual C++ build 1600, 64-bit
Nested Loop  (cost=0.14..585.98 rows=328 width=8)
  ->  Seq Scan on test_poly a  (cost=0.00..22.30 rows=1230 width=36)
  ->  Index Scan using ix__temp_line_line on _temp_line s  (cost=0.14..0.45 rows=1 width=36)
        Index Cond: (a.border && line)
        Filter: (_st_distance(a.border, line, 0::double precision, false) < 1e-005::double precision)

Not sure how you are getting a materialized step. I was running with factory settings so perhaps it's sensitive to your settings.

here is my show stuff

show work_mem; -- 1MB
show shared_buffers; -- 128MB

comment:12 by robe, 9 years ago

realityexists if you can provide a link to the data that would be swell as I realize now these tables in your test are empty so might not be the same thing you are seeing since your plans all show a lot more records than ours do.

As I recall I think I did see something somewhat bizarre with one of my datasets that had both a timerange column and a geometry column, but then I couldn't reproduce the issue. So I suspect something might be going on between the gistiness of timerange and geometry which is causing some sort of corruption with the index.

comment:13 by realityexists, 9 years ago

Well, here is something weird… I first created a repro on my original data (repro 1), then on somewhat simplified data (repro 2), then on further simplified data (repro 3). Now repro 3 (what I posted above) no longer works - the index gets used no matter what I do! Repro 2 still works, though. What's more, the problem now occurs even outside of a transaction (in repro 2)! This is still on the same DB server, same versions of Postgres and PostGIS, so I don't know what's changed, other than the phase of the moon!

Here is the data for it: https://www.sendspace.com/file/h0sy1y Restore the 2 tables and run bug3065_repro2.sql.

My show stuff is the same: 1MB work_mem, 128MB shared_buffers.

comment:14 by strk, 9 years ago

I can think of these possible reasons: (1) different statistics gathered, try ALTER TABLE .. SET STATISTICS (2) memory error in the postgis estimator, try re-building postgis with debugging on (3) memory bug in your version of PostgreSQL, check release notes ?

comment:15 by realityexists, 9 years ago

I forgot to mention: I also reproduced this on a different machine:

PostgreSQL 9.2.10 on i686-pc-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 32-bit POSTGIS="2.1.6dev r13356" GEOS="3.4.0dev-CAPI-1.8.0 r3879" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.9.0, released 2011/12/29" LIBXML="2.7.8" RASTER

Repro 3 (the simplest one) still works there, even without of a transaction now. I just created a brand new DB, restored the data and reproduced the issue there.

I haven't played around with SET STATISTICS at all, so I don't think that's a factor. No PostgreSQL release notes to check, since I'm running the latest minor releases for 9.2 and 9.3.

comment:16 by realityexists, 9 years ago

I built the above PostGIS version with —enable-debug and tried repro 3 again, but the debug output is 3.5 GB so far and still growing! You can try it for yourself, here's the data and query: https://www.sendspace.com/file/fx5gy9

by realityexists, 9 years ago

Attachment: repro3-debug3-logs.zip added

Debug log (level 3) from running repro 3. Problem occurs only in a transaction.

comment:17 by pramsey, 9 years ago

Not reproducing on OSX

                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..68054.38 rows=541720 width=8)
   ->  Seq Scan on test_poly a  (cost=0.00..1306.05 rows=14805 width=880)
   ->  Index Scan using ix__temp_line_line on _temp_line s  (cost=0.00..4.47 rows=4 width=36)
         Index Cond: (a.border && line)
         Filter: (_st_distance(a.border, line, 0::double precision, false) < 1e-05::double precision)
(5 rows)

comment:18 by realityexists, 9 years ago

Hmm, OK… Have you tried the other repro I uploaded?

comment:19 by pramsey, 9 years ago

No, using your latest bundle and testing both within and outside transactions, I get an index scan on OSX. PgSQL 9.2, PostGIS 2.1 svn

comment:20 by realityexists, 9 years ago

Could you please try the earlier one as well https://www.sendspace.com/file/h0sy1y ? As I mentioned, on my Windows instance I can't reproduce using the later one any more, but still can using this earlier one.

comment:21 by pramsey, 9 years ago

I can get that one (https://www.sendspace.com/file/h0sy1y ) to fail to index scan within a transaction, but not without.

comment:22 by realityexists, 9 years ago

OK, so - bug reproduced! :) It should still be using the index within a transaction, right? It does if you comment out the UPDATE statement or drop the timerange column.

comment:23 by pramsey, 9 years ago

Well, watching the selectivity calculation through both the "index used" (no transaction) and "index not used" (transaction) cases, it's exactly the same: there's stats on the airspace table but not on the _sia_segment table (no surprise, since it's brand new) and so the calculation falls back to a default selectivity. So our behaviour is consistent in both cases. I tend to think this is a PgSQL quirk, not a PostGIS quirk.

comment:24 by realityexists, 9 years ago

Hmm, OK, but why is the index used without the UPDATE statement? It would make some sense if the UPDATE caused the stats to be calculated, which then caused the index to be used - but in fact the opposite happens.

comment:25 by pramsey, 9 years ago

Milestone: PostGIS 2.1.6PostGIS 2.1.7

comment:26 by robe, 9 years ago

Milestone: PostGIS 2.1.7PostGIS 2.1.8

comment:27 by pramsey, 9 years ago

Resolution: wontfix
Status: newclosed

I'm not personally going to fix it, cause I can't find it making an error within my wheelhouse (the postgis code base)

Note: See TracTickets for help on using tickets.