Opened 15 years ago

Closed 6 years ago

#1088 closed enhancement (fixed)

r.fillnulls: support other interpolation methods

Reported by: kyngchaos Owned by: grass-dev@…
Priority: normal Milestone: 7.2.1
Component: Raster Version: svn-develbranch6
Keywords: r.fillnulls, RST, bspline, r.buffer, r.grow.distance Cc:
CPU: All Platform: All

Description

Support other interpolation methods in r.fillnulls, like bilinear and bicubic. This can be done from v.surf.bspline. Patch attached for dev6 branch.

Attachments (2)

r.fillnulls.patch (2.7 KB ) - added by neteler 14 years ago.
Updated to current SVN state
rbufpy_dual_method.diff (5.1 KB ) - added by hamish 13 years ago.
yet another crack at it

Download all attachments as: .zip

Change History (37)

comment:1 by kyngchaos, 15 years ago

...I should make sure it works before attaching the patch...

comment:2 by kyngchaos, 15 years ago

Now the patch works.

I forgot to mention, the idea is that the rst method may not work well on large areas (blocks from segmentation don't align well). bilinear/bicubic seems to process smoother.

comment:3 by hamish, 15 years ago

CPU: UnspecifiedAll
Platform: UnspecifiedAll

thanks, this has been on the todo list for a long time (I 1/2 thought there was already a ticket for it, but guess not).

here's a question (maybe for Markus Metz): what method should be the default in grass 7?

porting this patch to python should be trivial.

cheers, Hamish

comment:4 by kyngchaos, 15 years ago

One thing that would be nice in v.surf.bspline is for it to use the MASK, like v.surf.rst does, otherwise there can be a lot of warnings about no data in a subregion when I want to mask off areas that I want to stay null. (there's a note in the patch about the MASK not applying, but I set it anyways for the future possibility)

I was going to add a feature request for this, but I'm burnt out today from testing the processing of the patch.

comment:5 by hamish, 15 years ago

MASKs used in GRASS are only used when _reading_ existing raster maps. They do not mask writing. If v.surf.rst respects MASKs at all it would be somewhat surprising news to me and should be well documented.

As the output raster map is only written for the current region, ignoring points outside of this box (and maybe a little buffer around it) can save some needless work, and so v.surf.rst does do that.

Hamish

ps- note to self:

- g.rename > /dev/null
+ g.rename --quiet

comment:6 by kyngchaos, 15 years ago

Sorry, I was being vague, v.surf.rst has an option to set a mask for writing (though I didn't realize that MASKs only apply to reading, thanks, though it makes sense).

in reply to:  description comment:7 by mmetz, 15 years ago

In grass 7, it's already implemented in r.resamp.bspline. r.resamp.bspline interpolates a raster to the current resolution, filling NULL cells on the fly, or optionally only interpolates NULL cells. Available methods are bilinear and bicubic, tested with SRTM data, filling large gaps in the European Alps.

For grass 6.x, I would use a different approach to r.fillnulls to follow the design of v.surf.bspline, because only interpolating NULL cells is already possible with v.surf.bspline:

Create a new raster where NULL cells in the original surface raster are set to something else than NULL, others to NULL. Convert both raster maps to vector points. Use the vector points representing NULL cells as sparse points input for v.surf.bspline. Recommended settings for v.surf.bspline in this case are sie = 2 * ewres, sin = 2 * nsres, lambda = 0.005 (best in my tests, should be somewhere between 0.001 and 0.01). The output raster holds the interpolated NULL cells and can be patched with the original surface.

Although all lidar tools work within the current region, none respects a MASK. I'm not sure how this could be implemented properly and efficiently, currently there will be IMHO harmless warnings about no points in the current subregion.

Markus M

comment:8 by kyngchaos, 15 years ago

Markus, see my patch. It's very similar to what you describe. A couple notes/questions:

I didn't completely understand the description for the sparse option, but it didn't look like it was needed here. Now it looks to me like it's essentially a mask option, like the maskmap option in v.surf.rst. For now, the mask is implied when the results are patched into the original raster. I guess a mask option (if sparse is such) only makes sense if it reduces processing time by ignoring the masked areas.

The description says 1 * resolution was good for regularly spaced points, which is what we get from a raster. Why 2* in your method?

For the lambda, the docs are unclear what orders of numbers have effects, beyond "the larger the number the more smoothing applied". The default is one. You say 0.001-0.01 work well. I figured 1 was something like 1*, or no smoothing. Can you clarify this option?

in reply to:  8 ; comment:9 by mmetz, 15 years ago

Replying to kyngchaos:

A couple notes/questions:

I didn't completely understand the description for the sparse option, but it didn't look like it was needed here. Now it looks to me like it's essentially a mask option, like the maskmap option in v.surf.rst. For now, the mask is implied when the results are patched into the original raster. I guess a mask option (if sparse is such) only makes sense if it reduces processing time by ignoring the masked areas.

Right. Unfortunately, using sparse points as input for v.surf.bspline does not reduce processing time substantially, although it could be implemented similar to what I did for r.resamp.bspline. The sparse points act somewhat similar to a mask in the way that only these sparse points are interpolated. The matrix calculations are done anyway, a waste of time if no sparse points are within the current subregion. Therefore I think your approach is much faster for just a few NULL cells in a large surface raster.

The description says 1 * resolution was good for regularly spaced points, which is what we get from a raster. Why 2* in your method?

This is a compromise between speed, smoothing, and interpolating both small and large gaps. The spline step is similar to tension in RST, larger spline step values mean that points influence each other over larger distances. 1 * resolution seems to be the best for very small gaps, but for larger steps, I would use larger spline steps. I did a bit of testing, punching holes into a DEM and interpolating these holes. Spline steps of 1.5 - 2 * resolution produced best results, i.e. interpolated values were closest to real values.

I have to update the manual for r.resamp.bspline because it uses 1.5 * resolution as default spline step. For a gap-free surface, I think 1 * resolution of the input raster is best

For the lambda, the docs are unclear what orders of numbers have effects, beyond "the larger the number the more smoothing applied". The default is one. You say 0.001-0.01 work well. I figured 1 was something like 1*, or no smoothing. Can you clarify this option?

Umm, that's trial and error what I did there. Lambda = 1 did usually not provide very nice results, too much smoothing. Lambda and spline step influence each other, I tried to find reasonably good default values, but sometimes some fine-tuning might be required. I guess more testing is needed to determine the smoothing behaviour of different lambda values, or there is someone out there who understands the theory behind the code and could help.

Markus M

in reply to:  9 comment:10 by mmetz, 15 years ago

Replying to mmetz:

Replying to kyngchaos:

A couple notes/questions:

I didn't completely understand the description for the sparse option, but it didn't look like it was needed here. Now it looks to me like it's essentially a mask option, like the maskmap option in v.surf.rst. For now, the mask is implied when the results are patched into the original raster. I guess a mask option (if sparse is such) only makes sense if it reduces processing time by ignoring the masked areas.

Right. Unfortunately, using sparse points as input for v.surf.bspline does not reduce processing time substantially, although it could be implemented similar to what I did for r.resamp.bspline.

I just noticed that I did implement it, interpolation in v.surf.bspline is skipped if a sparse points vector is given but no sparse points fall into the current subregion. The advantage of the sparse points approach for filling NULL cells would be that more than the currently 3 cells around a gap are used for interpolation, providing (in theory) better results for larger gaps.

Markus M

by neteler, 14 years ago

Attachment: r.fillnulls.patch added

Updated to current SVN state

comment:11 by neteler, 14 years ago

This patch (I have updated it right now to current SVN) is yet interesting and should IMHO be submitted.

comment:12 by hamish, 13 years ago

Milestone: 6.5.06.4.3
  • applied in devbr6 by MarkusN in r50114, with sie,sin = 3*res and the default lambda_i=1.
  • applied to trunk by luca in r50128 with sie,sin = 1*res and the default lambda_i=1.

see above comments for MarkusM's recommendation for using si=2*res and lambda=0.005. shall we apply those coeff's? MM: are those numbers going to be dataset dependent, or are they generally good for the way r.fillnulls works by extracting the centers of the 3 cell buffer pixels as the vector starting points?

  • in 6.5svn I added a step which zooms into the minimum region containing all holes before running v.surf.rst. For my data this made the region about half the size and r.fillnulls ran twice as fast. beforehand v.surf.rst and v.surf.bspline bicubic methods took about the same amount of time (6.5svn versions..), so for my particular srtm data the v.surf.rst method was now twice as fast as the v.surf.bspline bicubic method. I tried running v.surf.bspline with the region zoom but it made no difference besides using 30 subregions (with a few empty) instead of the original 77 subregions (with a lot empty). actually for bspline with the region zoom it took about 10-15% longer, so I only applied it to the RST method, i.e. processing time was mostly independent from region settings. I suspect the original subregion segments were slightly more efficient than the zoomed version due to a bit of luck of where the points were & what subregions were empty.

both RST and bspline results were (very nearly) identical to without the zoom.

nominating both the method= and the RST zoom patch for inclusion in 6.4.3.

trialing bspline in grass7 now... rst time was about the same as in devbr6, a wee bit faster due to partial OpenMP support.

Hamish

comment:13 by hamish, 13 years ago

Keywords: r.fillnulls added; fillnulls removed

trialing bspline in grass7 now...

the zoom trick for bspline is the same as in grass65: almost the same but slightly slower.

I changed the bspline coeffs to si=3*res and lambda_i = 0.005. Should we change the default option answer to that? I note in the cross-validation code there the maximum they try is 0.05, so maybe 1.0 is way out of useful range.

with that, time to complete was not quite as fast as zoomed-RST, but about twice as fast as in grass65. (partly due to si= settings, partly due to using some OpenMP, and perhaps partly due to speedups in trunk that MM was talking about?)

both RST and bspline results were (very nearly) identical to without the zoom.

(there I meant r.univar results, but really that should be done on the cells-to-be patched in, not the final result where n_patched is quite a small percentage of the whole. an over-zoomed-in d.rast overlay didn't show discernible visual color change)

Hamish

in reply to:  12 ; comment:14 by mmetz, 13 years ago

Replying to hamish:

  • applied in devbr6 by MarkusN in r50114, with sie,sin = 3*res and the default lambda_i=1.
  • applied to trunk by luca in r50128 with sie,sin = 1*res and the default lambda_i=1.

see above comments for MarkusM's recommendation for using si=2*res and lambda=0.005.

The sie/sin settings are the result of guestimation and experimenting. I found that for evenly placed points, e.g. a vectorized raster without gaps, 1 * res gives nice results. With r.fillnulls, the points are not evenly spaced and sie/sin needs to be larger than 1*res in order to avoid artifacts. For very large gaps, support points are needed anyway by both RST and bspline. For moderately large gaps, 2*res or 3*res seems to work fine with bspline.

The idea of adding bspline to r.fillnulls is to have a different interpolation method which should give different results, just like in r.resamp.interp. RST tends to provide more detail at the cost of overshoots, whereas bspline with method=bilinear does not produce overshoots and produces a smoother surface. Bspline with method=bicubic can produce results very similar to RST, but AFAIU the idea of bspline interpolation is to avoid overshoots and to produce a smoother surface. Therefore r.fillnulls could just as well have the method options rst,bspline with bspline using bilinear by default. The results would then be intentionally different.

I have modified r.fillnulls in trunk to use r.resamp.bspline which is designed for raster interpolation instead v.surf.bspline. That avoids the vectorization step and essentially reduces the processing to r.resamp.bspline -n followed by r.patch.

Markus M

PS: I fixed r.buffer in trunk, it could now be used (again) by r.fillnulls instead of r.grow.

in reply to:  14 ; comment:15 by hamish, 13 years ago

Replying to hamish:

see above comments for MarkusM's recommendation for using si=2*res and lambda=0.005.

Replying to mmetz:

The sie/sin settings are the result of guestimation and experimenting.

any idea about the lambda_i setting? I suggest to change the default to 0.01, but I've only tested with lat/lon srtm data so far. (v.surf.bspline cross-validation output is now a bit easier to digest in 6.5svn, btw; will port to trunk soon)

for that data res*3.0 seemed to do well for sie,sin. res*1.0 was not enough for the patched tiles I tried it with.

RST tends to provide more detail at the cost of overshoots,

fwiw in my tests yesterday v.surf.bspline bicubic with lambda=0.005 and si?=res*3 did the best job, v.surf.bspline with default lambda=1 wasn't totally horrible, but wasn't great, and rst was fairly obvious something had been patched in (but I didn't tune the tension at all).

whereas bspline with method=bilinear does not produce overshoots and produces a smoother surface.

but the downside of that is that it lowers the contrast/rounds down towards the mean more than a spline fit would.. which is an issue because at least for srtm the holes usually happen in the mountains. fwiw most of the overshoot errors from r.fillnulls are from the masked area afaict, so perhaps can be ignored.

Bspline with method=bicubic can produce results very similar to RST, but AFAIU the idea of bspline interpolation is to avoid overshoots and to produce a smoother surface. Therefore r.fillnulls could just as well have the method options rst,bspline with bspline using bilinear by default. The results would then be intentionally different.

I personally prefer bicubic to bilinear (will let a job run slower if it means a better end result), and I'd vote to keep method=rst,bicubic,bilinear instead of method= + method2= options. the user doesn't care about which middleware modules, are doing the work, just the beginning and the end.

I have modified r.fillnulls in trunk to use r.resamp.bspline which is designed for raster interpolation instead v.surf.bspline.

I'll try it out & see how it does against the others.

That avoids the vectorization step

fwiw with -z that wasn't too costly, but happy to try out a new option and see if it does as well as v.surf.bspline's bicubic + lambda=0.005 (which _really_ did nicely).

also, what would it take to get 'g.region vect=' to work with a level 1 map? right now you get a no-topology error if you try it with 'r.to.vect -b' (& so make the rst method prep work even smaller/faster/less ram).

and essentially reduces the processing to r.resamp.bspline -n followed by r.patch.

btw, you'll need to move that del_temp_region back to where it was. with it still in place when you run r.patch you're cropping the patched result with the zoomed-in region. the python lib create temp region fn includes a on-exit hook to automatically unset and delete itself when the module ends, so don't worry about leftovers.

PS: I fixed r.buffer in trunk, it could now be used (again) by r.fillnulls instead of r.grow.

you mean r.buffer.py :-) "r.buffer2" (the classic C version) in trunk always worked bug free for many years, and is about 50x faster (race them with time). Perhaps the python version was an old experiment to see if some modules could be combined in a single engine?

thanks, Hamish

in reply to:  15 ; comment:16 by mmetz, 13 years ago

Replying to hamish:

Replying to hamish:

see above comments for MarkusM's recommendation for using si=2*res and lambda=0.005.

Replying to mmetz:

The sie/sin settings are the result of guestimation and experimenting.

any idea about the lambda_i setting?

In my tests, 1 is usually way too high, 0.005 seems to do well in mountainous areas, 0.1 is better for nearly flat areas.

RST tends to provide more detail at the cost of overshoots,

fwiw in my tests yesterday v.surf.bspline bicubic with lambda=0.005 and si?=res*3 did the best job, v.surf.bspline with default lambda=1 wasn't totally horrible, but wasn't great, and rst was fairly obvious something had been patched in (but I didn't tune the tension at all).

I think it is not easy to avoid segmentation artifacts with RST (very high npmin, very low segmax maybe).

whereas bspline with method=bilinear does not produce overshoots and produces a smoother surface.

but the downside of that is that it lowers the contrast/rounds down towards the mean more than a spline fit would..

Sometimes this is a downside, sometimes not. For hydrological analysis, overshoots are a problem and must be dealt with somehow.

Bspline with method=bicubic can produce results very similar to RST, but AFAIU the idea of bspline interpolation is to avoid overshoots and to produce a smoother surface. Therefore r.fillnulls could just as well have the method options rst,bspline with bspline using bilinear by default. The results would then be intentionally different.

I personally prefer bicubic to bilinear (will let a job run slower if it means a better end result), and I'd vote to keep method=rst,bicubic,bilinear instead of method= + method2= options. the user doesn't care about which middleware modules, are doing the work, just the beginning and the end.

Sounds good. Anyway, the manual should also state that r.fillnulls is an attempt for semi-automated NULL filling, and if in doubt this must be done manually to obtain the desired results.

what would it take to get 'g.region vect=' to work with a level 1 map?

The routines are already in v.info (trunk).

and essentially reduces the processing to r.resamp.bspline -n followed by r.patch.

btw, you'll need to move that del_temp_region back to where it was. with it still in place when you run r.patch you're cropping the patched result with the zoomed-in region.

Oops. Changed in r50803. The grid geometry of the input raster is still maintained, though.

PS: I fixed r.buffer in trunk, it could now be used (again) by r.fillnulls instead of r.grow.

you mean r.buffer.py :-) "r.buffer2" (the classic C version) in trunk always worked bug free for many years, and is about 50x faster (race them with time). Perhaps the python version was an old experiment to see if some modules could be combined in a single engine?

Hmm. One of the two r.buffer's should probably be deactivated, after comparative testing with latlong and real projections.

Markus M

in reply to:  16 ; comment:17 by hamish, 13 years ago

Replying to hamish:

any idea about the lambda_i setting?

Replying to mmetz:

In my tests, 1 is usually way too high, 0.005 seems to do well in mountainous areas, 0.1 is better for nearly flat areas.

in trunk and devbr6 I have changed it to 0.01 now. I would suggest to backport to 6.4svn ASAP before it harms any more data, but would like to test it with some meter-based LIDAR data first, as I've only tried with lat/lon srtm so far.

I think it is not easy to avoid segmentation artifacts with RST (very high npmin, very low segmax maybe).

for r.fillnulls I'd expect the 3-cell buffer of most holes not to fall on segmentation boundaries. it wasn't that it had obvious segmentation lines in it, rather it just didn't follow the ridges and valleys in such a natural way. (qualitative eyeball analysis) I've been convinced for years that there must be a way to avoid the segmentation artifacts without massively overlapping the quadtree boxes.. still keeping an eye out.

Sounds good. Anyway, the manual should also state that r.fillnulls is an attempt for semi-automated NULL filling, and if in doubt this must be done manually to obtain the desired results.

yeah, the man page needs quite a bit of work actually.. and I can't quite figure out what the WARNING is trying to get you to delta against with r.mapcalc to verify the interpolation. seems an impossible task, since if you knew the correct answer you there'd be no reason to interpolate.

Hmm. One of the two r.buffer's should probably be deactivated, after comparative testing with latlong and real projections.

I get 4.3sec for a test in r.buffer.py, 0.140 sec for r.buffer.c, for a simple test starting at a single cell, with visually identical results. Actually I was pleasantly surprised that r.buffer.py did the right thing in lat/lon and made a pear-shaped buffer for the plate carree display, wider nearer to the poles.

Hamish

in reply to:  17 ; comment:18 by mmetz, 13 years ago

Replying to hamish:

Replying to hamish:

any idea about the lambda_i setting?

Replying to mmetz:

In my tests, 1 is usually way too high, 0.005 seems to do well in mountainous areas, 0.1 is better for nearly flat areas.

in trunk and devbr6 I have changed it to 0.01 now. I would suggest to backport to 6.4svn ASAP before it harms any more data, but would like to test it with some meter-based LIDAR data first, as I've only tried with lat/lon srtm so far.

Tested with LiDAR datasets, lambda=1 is too high, synced in 6.4 to trunk/devbr6 with default lambda=0.01.

I think it is not easy to avoid segmentation artifacts with RST (very high npmin, very low segmax maybe).

for r.fillnulls I'd expect the 3-cell buffer of most holes not to fall on segmentation boundaries. it wasn't that it had obvious segmentation lines in it, rather it just didn't follow the ridges and valleys in such a natural way. (qualitative eyeball analysis) I've been convinced for years that there must be a way to avoid the segmentation artifacts without massively overlapping the quadtree boxes.. still keeping an eye out.

I had obvious segmentation lines with RST. From my testing with the bspline method I think that there is no way around massively overlapping quadtree boxes.

Hmm. One of the two r.buffer's should probably be deactivated, after comparative testing with latlong and real projections.

I get 4.3sec for a test in r.buffer.py, 0.140 sec for r.buffer.c, for a simple test starting at a single cell, with visually identical results. Actually I was pleasantly surprised that r.buffer.py did the right thing in lat/lon and made a pear-shaped buffer for the plate carree display, wider nearer to the poles.

In trunk, r.buffer.c loads both input and output completely to memory, whereas r.buffer.py using r.grow.distance doesn't. Therefore r.buffer.c is faster for smaller maps but will freeze the system if it goes into swap space. r.buffer.py keeps the disk busy, but the system remains responsive.

Markus M

comment:19 by cmbarton, 13 years ago

The trick on sin and sie (spline interval north and spline interval east) is to set them to the minimum NS (sin) and EW (sie) distance between points. A first run of v.surf.bspline -e will provide a decent estimate of this to use. If you set these too large, it will take forever to run.

Michael

in reply to:  18 ; comment:20 by hamish, 13 years ago

Replying to mmetz:

I have modified r.fillnulls in trunk to use r.resamp.bspline which is designed for raster interpolation instead v.surf.bspline.

my test 2x3 deg srtm mosaic takes 40 seconds to complete with 'r.resamp.bspline method=bilinear' as the backend and 2 minutes using bicubic (slightly better results). v.surf.bspline was taking ~4-6 minutes for bicubic, and r.surf.rst takes about 4 minutes with the zoom-to-data trick. both rst(no zoom) and v.surf.bspline's bicubic took about 8 minutes in grass 6 (no OpenMP for both there [still inefficiently used, but helps a bit]).

Replying to mmetz:

In trunk, r.buffer.c loads both input and output completely to memory, whereas r.buffer.py using r.grow.distance doesn't. Therefore r.buffer.c is faster for smaller maps but will freeze

(ie degrade)

the system if it goes into swap space. r.buffer.py keeps the disk busy, but the system remains responsive.

hmmm, what if r.buffer.py checks rows*columns and decides which backend it should call based on the result. small regions get sent to one, large ones to another, & best of both worlds.

(as long as the two r.buffer2/r.grow.distance backends give identical results, which visually they do)

btw, you'll need to move that del_temp_region back to where it was. with it still in place when you run r.patch you're cropping the patched result with the zoomed-in region.

Oops. Changed in r50803. The grid geometry of the input raster is still maintained, though.

yeah, but if the interp and the orig are patched together on the orig's native grid (I'm not sure if it makes a practical difference here, but no objection to that), it will still need a 'r.mapcalc "resamp = patched" step to follow the rule that output maps are created using the (real) current region's grid settings, not the input map's native alignment. the alternative is to decalre that r.fillnulls works on the input map's native region completely and ignores the current region settings completely. but that stops you from working on sub-windows of a larger dataset.

what would it take to get 'g.region vect=' to work with a level 1 map?

The routines are already in v.info (trunk).

shall we move vector/v.info/level1.c into libvector as Vect_level1_describe(Plus_head *) or similar? seems quite reusable.

Hamish

in reply to:  20 ; comment:21 by mmetz, 13 years ago

Replying to hamish:

Replying to mmetz:

In trunk, r.buffer.c loads both input and output completely to memory, whereas r.buffer.py using r.grow.distance doesn't. Therefore r.buffer.c is faster for smaller maps but will freeze

(ie degrade)

the system if it goes into swap space. r.buffer.py keeps the disk busy, but the system remains responsive.

hmmm, what if r.buffer.py checks rows*columns and decides which backend it should call based on the result. small regions get sent to one, large ones to another, & best of both worlds.

Small and large is relative to the amount of free memory. You would need to check for that first before you decide what is small and what is large. IMHO too complicated, I would just use the safer method.

btw, you'll need to move that del_temp_region back to where it was. with it still in place when you run r.patch you're cropping the patched result with the zoomed-in region.

Oops. Changed in r50803. The grid geometry of the input raster is still maintained, though.

yeah, but if the interp and the orig are patched together on the orig's native grid (I'm not sure if it makes a practical difference here, but no objection to that), it will still need a 'r.mapcalc "resamp = patched" step to follow the rule that output maps are created using the (real) current region's grid settings, not the input map's native alignment. the alternative is to decalre that r.fillnulls works on the input map's native region completely and ignores the current region settings completely. but that stops you from working on sub-windows of a larger dataset.

Working with subwindows is supported, the subwindow is aligned to the input, though. Surface resampling is another issue and I would rather avoid implicit nearest-neighbour resampling. The standard resampling modules for surfaces are r.resamp.interp (anything but nearest) and r.resamp.stats. r.fillnulls should do just that, filling nulls, and not do resampling. Working with a subregion is fine of course.

what would it take to get 'g.region vect=' to work with a level 1 map?

The routines are already in v.info (trunk).

shall we move vector/v.info/level1.c into libvector as Vect_level1_describe(Plus_head *) or similar? seems quite reusable.

Could make sense, I would like to think a bit about it.

Markus M

by hamish, 13 years ago

Attachment: rbufpy_dual_method.diff added

yet another crack at it

in reply to:  21 ; comment:22 by hamish, 13 years ago

Replying to hamish:

hmmm, what if r.buffer.py checks rows*columns and decides which backend it should call based on the result. small regions get sent to one, large ones to another, & best of both worlds.

Replying to mmetz:

Small and large is relative to the amount of free memory. You would need to check for that first before you decide what is small and what is large. IMHO too complicated, I would just use the safer method.

see attached patch for a rough idea. check free memory is not a road I want to go down, so set a default at 12000x12000 cells (I just picked a number), and if it's bigger than that use the r.grow.distance method, otherwise the fast but memory intensive method. Add an option to allow the user to override the default.

4.5 sec for .py vs. 0.140 sec for .c. I know I've got at least one script which runs r.buffer in a loop for thousands of starting points, I don't want to think how long it will take at 4.5sec each iteration. (actually 0.250 sec for r.buffer2 if you use the python wrapper)

for a 12000x12000 region using r.buffer2: 17.4 sec, 156mb RAM. same using r.grow.distance: 10 minutes 7 second, 20mb RAM (takes forever reading and writing map, even on a fast RAID [it's CPU bound not IO bound])

20000x20000 region: r.buffer2 takes 51 sec, uses 400mb RAM (1 byte per cell). I'm not going to bother to find out how long r.grow.distance would take.

40000x40000 region: r.buffer2 takes 4 min 7sec (~1 min to read, ~1 min to buffer, ~2 min to write), uses 1.6gb RAM. r.grow.distance, no idea, maybe days to complete.

95% of work will be smaller than 40k2 cells. the r.grow.distance method seems like it is best left as a footnote + example in the r.buffer2 help page.

r.fillnulls should do just that, filling nulls, and not do resampling.

I am not convinced (not even a little). No exceptions to the standard map creation conventions unless it is absolutely fundamental to what the module does. If this was a edit-in-place module like r.null maybe I could buy the argument, but it isn't. Either deal with all of the input map's native bounds, or use the current region, not half and half. Using the current region settings gives the user the choice of which one they want.

thanks, Hamish

in reply to:  14 ; comment:23 by glynn, 13 years ago

Replying to mmetz:

PS: I fixed r.buffer in trunk, it could now be used (again) by r.fillnulls instead of r.grow.

What was the problem? Because the fix is needlessly inefficient. The efficient way to compare Euclidean distance against a fixed value is "distance^2 < limit^2" rather than "sqrt(distance^2) < limit". Calculating Euclidean distance invariably involves calculating distance-squared as an intermediate value; if you only need the value for a comparison, squaring one value is faster than taking the square root of the other.

in reply to:  23 ; comment:24 by mmetz, 13 years ago

Replying to glynn:

Replying to mmetz:

PS: I fixed r.buffer in trunk, it could now be used (again) by r.fillnulls instead of r.grow.

What was the problem?

metric=geodesic does not return squared distances, and for other metrics the distances as returned by r.grow.distance were in map units, not meters, but r.buffer requires meter as unit.

Because the fix is needlessly inefficient. The efficient way to compare Euclidean distance against a fixed value is "distance^2 < limit^2" rather than "sqrt(distance^2) < limit". Calculating Euclidean distance invariably involves calculating distance-squared as an intermediate value; if you only need the value for a comparison, squaring one value is faster than taking the square root of the other.

Ok, r.buffer now (r50830) uses squared distance for metric=squared and regular distance in meters for metric=geodesic.

Markus M

in reply to:  22 comment:25 by mmetz, 13 years ago

Replying to hamish:

Replying to mmetz:

r.fillnulls should do just that, filling nulls, and not do resampling.

I am not convinced (not even a little). No exceptions to the standard map creation conventions unless it is absolutely fundamental to what the module does. If this was a edit-in-place module like r.null maybe I could buy the argument, but it isn't. Either deal with all of the input map's native bounds, or use the current region, not half and half. Using the current region settings gives the user the choice of which one they want.

In theory I would agree, but not everybody knows that for a 3 arcsec SRTM g.region -a res=00:00:03 is wrong and only g.region align=SRTM properly aligns the region to the raster map.

If you want to respect the current region, you would need to remove the align=input option from g.region in r.fullnulls when you zoom to the holes. You could zoom to the raster buffer instead which has been created using the current region.

Markus M

in reply to:  24 ; comment:26 by glynn, 13 years ago

Replying to mmetz:

Ok, r.buffer now (r50830) uses squared distance for metric=squared and regular distance in meters for metric=geodesic.

I think that r.grow.distance needs this, right?

--- raster/r.grow.distance/main.c	(revision 50836)
+++ raster/r.grow.distance/main.c	(working copy)
@@ -208,8 +208,11 @@
     else
 	G_fatal_error(_("Unknown metric: '%s'"), opt.met->answer);
 
-    if (flag.m->answer)
+    if (flag.m->answer) {
 	scale = G_database_units_to_meters_factor();
+	if (strcmp(opt.met->answer, "squared") == 0)
+	    scale *= scale;
+    }
 
     in_fd = Rast_open_old(in_name, "");
 

comment:27 by hamish, 13 years ago

none of this gets away from the fact that r.buffer.py is consistently ~ 40x slower than r.buffer2 at all(?) region sizes. For my 8GB workstation the extra RAM needed by the classic r.buffer.c would mean I'd have to transition to the less ram-hungry r.grow.distance method at a region size of 90000x90000 (mem. req. here is 1 byte per region cell), which is, so far, still a corner case. r.buffer.py is an interesting exercise, but I'm not convinced that the r.grow.distance method needs to be anything more than a full example in the r.buffer help page, wrt alternate steps to take if it wants to use more RAM than you've got.

regards, Hamish

in reply to:  26 comment:28 by mmetz, 13 years ago

Replying to glynn:

Replying to mmetz:

Ok, r.buffer now (r50830) uses squared distance for metric=squared and regular distance in meters for metric=geodesic.

I think that r.grow.distance needs this, right?

> --- raster/r.grow.distance/main.c	(revision 50836)
> +++ raster/r.grow.distance/main.c	(working copy)
> @@ -208,8 +208,11 @@
>      else
>  	G_fatal_error(_("Unknown metric: '%s'"), opt.met->answer);
>  
> -    if (flag.m->answer)
> +    if (flag.m->answer) {
>  	scale = G_database_units_to_meters_factor();
> +	if (strcmp(opt.met->answer, "squared") == 0)
> +	    scale *= scale;
> +    }
>  
>      in_fd = Rast_open_old(in_name, "");
>  

Right. It worked in my test because map units were meters and scale thus 1. Patch applied in r50841.

Markus M

comment:29 by neteler, 12 years ago

Keywords: r.buffer r.grow.distance added

Anything to backport here?

in reply to:  29 ; comment:30 by mmetz, 12 years ago

Keywords: RST bspline added

Replying to neteler:

Anything to backport here?

Not yet. r.fillnulls has only options for RST parameters but not for bspline parameters. BTW, that applies to both 7 and 6.5.

Markus M

in reply to:  30 comment:31 by wenzeslaus, 9 years ago

Replying to mmetz:

Not yet. r.fillnulls has only options for RST parameters but not for bspline parameters.

I added the lambda parameter in r68301. It is a new feature, so no backport. There are no other parameters to add (ew_step and ns_step are used internally) except for memory which I'm not sure if it is needed (I can add it as well).

Once we decide on the memory, can we close it? (I'm not really sure of all the outcomes of the discussion above.)

comment:32 by wenzeslaus, 9 years ago

On more thing. I added the lambda option (for v.surf.bspline) into a new guisection called Spline options (the v.surf.rst options are in RST options section). Both are splines, so this might be a bit confusing. However, I did not name it Bspline options because I was not clear on what is b-spline and what is bicubic or bilinear spline interpolation with Tykhonov regularization which v.surf.bspline and related papers are taking about.

comment:33 by neteler, 9 years ago

Milestone: 6.4.36.4.6

comment:34 by marisn, 8 years ago

As r.fillnulls now supports RST and B-Spline options, I think this issue can be closed. If there are some missing features, separate issues should be opened.

in reply to:  34 comment:35 by neteler, 6 years ago

Milestone: 6.4.67.2.1
Resolution: fixed
Status: newclosed

Replying to marisn:

As r.fillnulls now supports RST and B-Spline options, I think this issue can be closed. If there are some missing features, separate issues should be opened.

Closing.

Note: See TracTickets for help on using tickets.