Opened 2 years ago

Closed 18 months ago

Last modified 17 months ago

#3361 closed enhancement (fixed)

v.select: very slow using GEOS operators

Reported by: mlennert
Owned by: grass-dev@…
Priority: normal
Milestone: 7.4.2
Component: Vector
Version: svn-trunk
Keywords: v.select GEOS within slow
Cc:
CPU: Unspecified
Platform: Unspecified

Description

I have not run similar tests with the other operators, but with the within operator v.select is very slow.

First I create a buffer around the NC railroads map:

v.buffer railroads dist=5000 out=rail5000

Then v.select:

time v.select ain=boundary_municp bin=rail5000 op=within out=select
real   2m13.989s
user   1m57.888s
sys    0m15.956s

Using the following script, I get the identical result much faster (maybe using v.distance is another option, but I haven't tried that):

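# Work on a copy of the municipalities map
g.copy vect=boundary_municp,munic
# Store the total area of each municipality
v.db.addcolumn munic col="totalarea double precision"
v.to.db munic op=area col=totalarea
# Intersect the municipalities with the buffer and store the area of each resulting piece
v.overlay ain=munic bin=rail5000 op=and out=munic_and_buffer
v.db.addcolumn munic_and_buffer col="area double precision"
v.to.db munic_and_buffer op=area col=area
sleep 1
# Extract the municipalities whose intersected area equals their total area, i.e. those lying entirely within the buffer
v.extract boundary_municp cat=$(db.select -c sql="select a_cat from munic_and_buffer where round(area,1)/round(a_totalarea,1)=1" | awk '{printf"%s,", $1}') output=select_bis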
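A side note on the last line of the script: the awk call `printf "%s,"` leaves a trailing comma at the end of the category list. Whether v.extract tolerates that is not something this ticket establishes, so the following is only a minimal sketch of building the list without the trailing comma. The function name `join_cats` is hypothetical, and the `printf` input merely simulates the one-value-per-line output of `db.select -c`.

```shell
#!/bin/sh
# join_cats (hypothetical helper): read one category per line on stdin and
# print them as a single comma-separated list without a trailing comma.
join_cats() {
    # NF skips empty lines; sep is empty for the first value and "," afterwards
    awk 'NF { printf "%s%s", sep, $1; sep = "," } END { if (sep) print "" }'
}

# Simulated category values, standing in for the db.select -c output
printf '12\n57\n103\n' | join_cats
# prints: 12,57,103
```

In the script above, this helper would take the place of the `awk '{printf"%s,", $1}'` stage inside the command substitution.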

Time for running entire script:

real	0m14.611s
user	0m6.084s
sys	0m5.084s
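To put the two wall-clock timings side by side, here is a small sketch that converts the `real` values reported by `time` into seconds and computes their ratio. The helper names `to_seconds` and `speedup` are made up for this illustration; the input values are the ones quoted in this description.

```shell
#!/bin/sh
# to_seconds (hypothetical helper): convert a value such as "2m13.989s" to seconds.
to_seconds() {
    # Split on 'm' and 's': field 1 is minutes, field 2 is seconds
    echo "$1" | LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# speedup (hypothetical helper): ratio of the first timing to the second.
speedup() {
    LC_ALL=C awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

# v.select with op=within versus the whole workaround script
speedup 2m13.989s 0m14.611s
# prints: 9.2
```

On these numbers the workaround script is roughly nine times faster than v.select with op=within.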

I stumbled across this because a student had a within operation that kept on running for hours and hours, and using an equivalent of the above script we were able to get the same result within minutes.

I imagine that by going through GEOS we lose the spatial index, or that there are other significant overheads, and that this is what causes such a serious slowdown. The difference is so large, however, that I wonder whether there is anything we could do to optimize v.select's GEOS operators. Or is the only solution to implement the same operators natively? Maybe a nice GSoC project?

I'm classifying this as an enhancement, but run times this long as soon as there is a significant amount of data come pretty close to being a bug in my view.

Change History (8)

comment:1 Changed 2 years ago by mlennert

Summary: "v.select: very slow on within (GEOS) operator" → "v.select: very slow using within (GEOS) operator"

comment:2 Changed 22 months ago by neteler

Milestone: 7.4.0 → 7.4.1

Ticket retargeted after milestone closed

comment:3 Changed 20 months ago by mlennert

Summary: "v.select: very slow using within (GEOS) operator" → "v.select: very slow using GEOS operators"

Actually, it is not only within. Comparing the native 'overlap' operator with its GEOS equivalent, the 'intersects' operator, I get a significant time difference:

time v.select -c ain=boundary_municp bin=rail5000 op=overlap out=select_overlap
real	0m27.363s
user	0m12.836s
sys	0m14.696s
time v.select -c ain=boundary_municp bin=rail5000 op=intersects out=select_intersects
real	1m12.190s
user	0m56.844s
sys	0m15.511s

comment:4 Changed 18 months ago by mmetz

Resolution: fixed
Status: new → closed

In 72705:

v.select: re-organize code to select features from vector map A by features from other vector map B (fixes #3361)

comment:5 in reply to:  4 ; Changed 18 months ago by mmetz

Replying to mmetz:

In 72705:

v.select: re-organize code to select features from vector map A by features from other vector map B (fixes #3361)

Assuming that the result will be a subset of map A, selected by features from map B, the code re-organization results in a substantial speed-up. v.select is now nearly as fast as the alternative in the description.

The results of operator=overlap and the GEOS-equivalent operator=intersects are identical, but the speed difference based on the example in the description

v.select ain=boundary_municp bin=rail5000 out=select op=overlap/intersects

is astonishing, as of trunk r72705.

comment:6 Changed 18 months ago by neteler

Milestone: 7.4.1 → 7.6.0

comment:7 in reply to:  5 Changed 18 months ago by mlennert

Replying to mmetz:

Assuming that the result will be a subset of map A, selected by features from map B, the code re-organization results in a substantial speed-up. v.select is now nearly as fast as the alternative in the description.

The results of operator=overlap and the GEOS-equivalent operator=intersects are identical, but the speed difference based on the example in the description

v.select ain=boundary_municp bin=rail5000 out=select op=overlap/intersects

is astonishing, as of trunk r72705.

As reported on the grass-users list, working with r72716, I actually get different results depending on whether I use intersects or overlap when working with atype=areas and btype=lines. I don't know whether this result is expected. I can provide the data privately if useful.
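One way to pin down such a difference is to compare the category lists of the two outputs. The sketch below assumes each list has already been written to a plain text file with one category per line (for instance with something like `v.category option=print`; that step is an assumption and not part of this ticket). The function name `cats_only_in` and the sample values are purely illustrative and are not the actual results.

```shell
#!/bin/sh
# cats_only_in FIRST SECOND (hypothetical helper): print the categories that
# appear in file FIRST but not in file SECOND, one per line.
cats_only_in() {
    sorted_first=$(mktemp)
    sorted_second=$(mktemp)
    # comm needs sorted input; -u also drops duplicate categories
    sort -u "$1" > "$sorted_first"
    sort -u "$2" > "$sorted_second"
    # -23 suppresses lines unique to SECOND and lines common to both
    comm -23 "$sorted_first" "$sorted_second"
    rm -f "$sorted_first" "$sorted_second"
}

# Made-up category lists standing in for the two v.select outputs
overlap_cats=$(mktemp)
intersects_cats=$(mktemp)
printf '1\n2\n3\n5\n' > "$overlap_cats"
printf '1\n2\n3\n4\n5\n' > "$intersects_cats"

echo "only in intersects:"
cats_only_in "$intersects_cats" "$overlap_cats"
rm -f "$overlap_cats" "$intersects_cats"
```

Running the helper in both directions shows which features each operator selects that the other does not.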

comment:8 in reply to:  4 Changed 17 months ago by neteler

Milestone: 7.6.0 → 7.4.2

Replying to mmetz:

In 72705:

v.select: re-organize code to select features from vector map A by features from other vector map B (fixes #3361)

Reopened for potential backport
