Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#3344 closed enhancement (wontfix)

Unnecessary use of table/possible speed improvement

Reported by: postgispaul Owned by: pramsey
Priority: medium Milestone: PostGIS 2.1.9
Component: postgis Version: 2.1.x
Keywords: Cc: postgispaul

Description

Situation:

  • table polygons with geomety and a nameid
  • table names with id and name

query-plan for SELECT st_extent(geometry) FROM polygons gives me a sequential scan over the polygons table

but the query-plan for

SELECT st_extent(geometry) FROM polygons LEFT JOIN names ON names.id=polygons.nameid

gives me a sequential scan over polygons and after that a nested loop fetching the name from the names table.

This last step seems really superfluous as the extent doesn't change if a left join is involved or not.

This is particularly annoying when using postgresql/postgis with qgis if the left join is against a slow source, because qgis often does a SELECT st_extent

Change History (7)

comment:1 by robe, 9 years ago

Milestone: PostGIS 2.1.9
Resolution: wontfix
Status: newclosed

If there is an issue here, it is more to do with the PostgreSQL query planner and little to do with postgis.

First of all you didn't state which version of PostgreSQL or PostGIS you are using.

How the PostgreSQL planner operates on joins has changed over the years and would also I think be dependent on what keys you have in place. I think PostgresQL got smarter around 9.1 with how it handles left joins to determine if it has to look at the second table or not. If names.id isn't marked as a primary key in the second table, then it would definitely have to treat it as if it would effect the end result.

comment:2 by postgispaul, 9 years ago

Sorry for omitting the versions, they are: POSTGIS="2.1.7 r13414" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.11.1, released 2014/09/24 GDAL_DATA not found" LIBXML="2.7.8" LIBJSON="UNKNOWN" TOPOLOGY RASTER

PostgreSQL 9.4.1, compiled by Visual C++ build 1800, 64-bit

And as you say, it's more a Postgres issue: I agree, but wasn't sure about it. As for your last remark: names.id isn't a primary key, but I can't see why that should make a difference in the end-result.

comment:3 by postgispaul, 9 years ago

Cc: postgispaul added

comment:4 by robe, 9 years ago

It does. Because if the query planner can't guarantee that a join with names table doesn't change the count of rows to process, then it can't just ignore it. That's how this new feature (it was in 9.0 actually) works.

From 9.0 release notes: http://www.postgresql.org/docs/9.0/static/release-9-0.html


E.24.3.1.3. Optimizer

Remove unnecessary outer joins (Robert Haas)

Outer joins where the inner side is unique and not referenced above the join are unnecessary and are therefore now removed. This will accelerate many automatically generated queries, such as those created by object-relational mappers (ORMs).


Even though ST_Extent answer wouldn't change for this particular case you have, the planner doesn't know the details of what goes in ST_Extent, so can't assume if it passes in 3 duplicatate rows the end result is same as single row. It treats it like a black box.

So I would suggest if you can to put a primary key or unique key on the names.id field.

comment:5 by postgispaul, 9 years ago

Thanks for your thorough answer, and that reassures me that posting in the postgis area is correct after all…

But I still stand by my assumption that being a primary key does in fact *not* change the end-result in case of a left-join.

Edit2: just tested this with a left join lateral icw limit 1, but I guess that's a bit too tricky for postgres's query planner to understand (still uses names table)

I assume there isn't a mechanism present where st_extent publishes this kind of information (duplicates won't change the result) so that the query planner of postgres knows it can skip scanning that table? It is not possible for me to make id a primary key, because a polygon can have multiple names (different languages).

Edit: as a side-note: I'm really grateful for your responses, but to be honest, I find http://trac.osgeo.org/postgis/ticket/3341 a bit more annoying

Last edited 9 years ago by postgispaul (previous) (diff)

comment:6 by robe, 9 years ago

Paul, Nope there isn't currently anyway to tell postgres planner that info. Do you have an index at all on the names table. I would have expected at least an index scan thought that would depend on size of table.

We'll look at #3341 one in a bit. That one requires more reading to see if there is an issue or not or just a side-effect of 1) cartesian long/lat data not being a good choice for doing differencing (particularly for large areas), its better to transform and then transform back, 2) just the nature of double-precision rounding

comment:7 by postgispaul, 9 years ago

Ok, I understand. Whether or not indexscan is used is indeed dependent on the size, some names tables use them, some of them don't.

Note: See TracTickets for help on using tickets.