Opened 6 years ago

Last modified 2 years ago

#2602 new defect

In some cases ST_Distance (geography) would be faster building the index each time

Reported by: niqueco Owned by: pramsey
Priority: medium Milestone: PostGIS Fund Me
Component: postgis Version: master
Keywords: Cc:


For some complex inputs ST_Distance is very slow, but _st_distancetree (which always builds the index) is fast.

As explained to me by Paul Ramsey in IRC (edited transcript):

<pramsey> For any given SQL query that only run a single calcuation, you'll always get the brute force calculation.

eg, SELECT ST_Distance(a.geog, b.geog) from a, b where = 1 and = 2;

On the other hand, if you run a repeated calculation on one geography you should get cached behavior and circtree indexes coming into play: SELECT ST_Distance(a.geog, b.geog) from a, b where = 1;

So then we can circle back and ask if that logic is actually good logic. Perhaps we should build and use the index every time

I think, for cases where the number of vertices is large enough the overhead of building the index O(n) + O(m) then using it O(log(n)) + O(log(m)) is going to be lower than the brute force cost O(m*n). If the shapes are small, the machinery of the index will start to outweigh the cost of just running the calculation and being done w/ it

<niqueco> can't be, perhaps, some heuristics.... with the number of vertices?

<pramsey> I think, given your experience, it makes sense, since the performance difference is so huge

Testcase: (with attached sql data loaded)

select _st_distancetree(geo1, geo2) from x;

Total runtime: 225.428 ms

select st_distance(geo1, geo2) from x;

Total runtime: 244821.796 ms

Attachments (1)

x.sql.gz (115.2 KB) - added by niqueco 6 years ago.
Testcase data

Download all attachments as: .zip

Change History (3)

Changed 6 years ago by niqueco

Attachment: x.sql.gz added

Testcase data

comment:1 Changed 2 years ago by robe

Milestone: PostGIS FuturePostGIS Fund Me

Milestone renamed

comment:2 Changed 2 years ago by pramsey

This is true, maybe, but in particular if the traversal of the tree happens in some more deterministic order so that the first traversal attempts to go down the most likely branches of the tree, causing the pruning of branches to be more aggressive. As it stands, it's possible that, since the search is depth-first, useless branches can be traversed first, resulting in to pruning. Getting a good result from the trees is more-or-less a matter of luck: do we happen to traverse effective branches of the tree first?

Sorting the nodes in distance order first would ameliorate this.

Note: See TracTickets for help on using tickets.