Opened 11 years ago
Last modified 7 years ago
#2602 new defect
In some cases ST_Distance (geography) would be faster building the index each time
Reported by: | niqueco | Owned by: | pramsey |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS Fund Me |
Component: | postgis | Version: | master |
Keywords: | Cc: |
Description
For some complex inputs ST_Distance is very slow, but _st_distancetree (which always builds the index) is fast.
As explained to me by Paul Ramsey in IRC (edited transcript):
<pramsey> For any given SQL query that only run a single calcuation, you'll always get the brute force calculation.
eg, SELECT ST_Distance(a.geog, b.geog) from a, b where a.id = 1 and b.id = 2;
On the other hand, if you run a repeated calculation on one geography you should get cached behavior and circtree indexes coming into play: SELECT ST_Distance(a.geog, b.geog) from a, b where a.id = 1;
So then we can circle back and ask if that logic is actually good logic. Perhaps we should build and use the index every time
I think, for cases where the number of vertices is large enough the overhead of building the index O(n) + O(m) then using it O(log(n)) + O(log(m)) is going to be lower than the brute force cost O(m*n). If the shapes are small, the machinery of the index will start to outweigh the cost of just running the calculation and being done w/ it
<niqueco> can't be, perhaps, some heuristics…. with the number of vertices?
<pramsey> I think, given your experience, it makes sense, since the performance difference is so huge
Testcase: (with attached sql data loaded)
select _st_distancetree(geo1, geo2) from x;
Total runtime: 225.428 ms
select st_distance(geo1, geo2) from x;
Total runtime: 244821.796 ms
Attachments (1)
Change History (3)
by , 11 years ago
comment:2 by , 7 years ago
This is true, maybe, but in particular if the traversal of the tree happens in some more deterministic order so that the first traversal attempts to go down the most likely branches of the tree, causing the pruning of branches to be more aggressive. As it stands, it's possible that, since the search is depth-first, useless branches can be traversed first, resulting in to pruning. Getting a good result from the trees is more-or-less a matter of luck: do we happen to traverse effective branches of the tree first?
Sorting the nodes in distance order first would ameliorate this.
Testcase data