Opened 14 years ago

Closed 12 years ago

#807 closed defect (fixed)

r.watershed doesnt consider longer distance to diagonal neighbouring pixels

Reported by: aread Owned by: grass-dev@…
Priority: major Milestone: 6.4.4
Component: Raster Version: 6.4.0 RCs
Keywords: r.watershed Cc:
CPU: All Platform: All

Description

When using r.watershed on a low relief smooth dem I realised that the flow direction surface was almost 20 times more likely to go to one of the diagonal pixels rather then the edge pixels. I’m sure this is not right as I interpret the description in Ehlschlaeger (1989) this shouldn’t happen.

The defect is that r.watershed isn’t correcting for the larger distance between cells that share a corner then cells that share an edge. The result is that r.watershed is directing flow to the lowest cell not the steepest cell.

Ehlschlaeger (1989) in the last paragraph of section 4 describes how the search algorithm should be adjusted to account for diagonal cells being further away so that flow follows the steepest path. see http://chuck.ehlschlaeger.info/older/IGIS/paper.html

In http://trac.osgeo.org/grass/browser/grass/trunk/raster/r.watershed/ram/do_astar.c#L232 there is code in get_slope that accounts for the diagonal but it doesn’t seem to be called.

Change History (16)

comment:1 by mmetz, 14 years ago

Fixed in trunk r39722 and r39724 and in devbr6 r39726 and r39727.

I followed the description in Ehlschlaeger (1989) to the letter and the results seem to be correct now. I want to wait for independent confirmation before backporting to relbr6. Using elev_lid792_1m in the North Carolina dataset, the SFD results of 64 and 65 are strikingly different and IMHO the 64 results look wrong compared to the 65 results. Diagonal bias in flow direction is gone, now it's close to 50-50.

I noticed that a correction for diagonal bias was implemented in the original version as in grass3.2 of 1990. Does anybody know why it was removed? SVN history doesn't go back far enough.

MFD mode in grass65 and grass7 was only slightly affected, getting out of sinks should be improved by now.

Markus M

comment:2 by hamish, 14 years ago

Keywords: r.watershed added
Priority: criticalmajor

triaging priority.

how does the patch (or r.watershed for that matter) work with lat/lon locations? maybe that had something to do with the removal (????) FWIW svn (cvs) history back to 1999 (~r1 to r3) can be found in the grass54 branch even if the history has been broken in trunk. Also the old ViewCVS is AFAIK still up & running (link on trac wiki page).

Hamish

comment:3 by helena, 14 years ago

I finally had chance to look at it and it does make a significant difference (much more than I would have ever guessed) and I think that after some more testing the fix should go to the GRASS64 release as a bugfix.

Some examples from the nc_spm_08 data set 1m resolution old

http://skagit.meas.ncsu.edu/~helena/wrriwork/lakewh/flowD8dem.jpg

new

http://skagit.meas.ncsu.edu/~helena/grasswork/accum_5Kdiagfix.png

red is the stream from high res ortho

http://skagit.meas.ncsu.edu/~helena/grasswork/accum_5Kdiagfix2.png.png

I haven't done any serious testing but brief visual comparison indicates that the fix may improve the accuracy of stream extraction beyond just looking more realistic) red-sreams from ortho, blue-old, black-new

http://skagit.meas.ncsu.edu/~helena/grasswork/accum_diagfix_streams.png

example with the 30m ned shows quite a few differences - see the first stream from the west on these two images with the fix apparently better although still with some error:

http://skagit.meas.ncsu.edu/~helena/grasswork/accum_10Korig.png

http://skagit.meas.ncsu.edu/~helena/grasswork/accum_10kfix.png

It would be very useful if anybody who has some other data than nc_spm and some orthos to compare the results with runs another test and confirms that the change can go into 64, Markus, thanks for fixing it so quickly,

Helena

in reply to:  2 comment:4 by mmetz, 14 years ago

Replying to hamish:

how does the patch (or r.watershed for that matter) work with lat/lon locations?

Tested only in real projections, should most probably use geodesic distance for lat/lon locations?

I'm busy investigating the new patch for its behaviour in depressions, just to make sure I do not introduce another bug. Outside depressions it looks ok, flow directions are set as expected.

maybe that had something to do with the removal (????) FWIW svn (cvs) history back to 1999 (~r1 to r3) can be found in the grass54 branch even if the history has been broken in trunk. Also the old ViewCVS is AFAIK still up & running (link on trac wiki page).

The change happened between 1990 (grass32) and 1998 (grass40), I did not manage to find that commit :-(

Markus M

comment:5 by hamish, 14 years ago

Recommendation: wait to backport this to the 6.4.x line until after 6.4.0 has shipped, and have it go out in the 6.4.1 release instead. otherwise we only have time for cursory checks. Waiting for 6.4.1 buys us some time. We could add a BUGS section to the 6.4.0 help page if needed.

Hamish

in reply to:  5 comment:6 by mmetz, 14 years ago

Replying to hamish:

Recommendation: wait to backport this to the 6.4.x line until after 6.4.0 has shipped, and have it go out in the 6.4.1 release instead. otherwise we only have time for cursory checks. Waiting for 6.4.1 buys us some time.

Hmm, about time, nobody checked in the last 17 years or so? grass40 was released in 1992 I learned, and there the correction for diagonal flow bias was already disabled. It was probably not related to lat/lon projections because 1) I'm not sure if grass supported lat/lon back then, 2) there is a lot of slope calculation for the RUSLE factors that doesn't care about lat/lon locations and is still active. Making r.watershed lat/lon aware is (another) major rewrite... It was more probably related to a severe speed penalty imposed by the original implementation, avoided in the new implementation (which is also more in line with A* Search logic). It is really a pity that the commit comments from grass32 to grass40 are no longer available. I also did not find anything in the mailing list archives, although r.watershed was apparently regularly used.

Hydrologists at CSIRO in Australia are currently looking at r.watershed in depth and tear the algorithm apart, looking at all sorts of special cases. So far the corrected version produces pretty much optimal results.

We could add a BUGS section to the 6.4.0 help page if needed.

BTW, there is another bug in SECTION 4: Watershed determination, I'm busy fixing it.

Markus M

in reply to:  3 ; comment:7 by mmetz, 14 years ago

Replying to helena:

I finally had chance to look at it and it does make a significant difference (much more than I would have ever guessed) and I think that after some more testing the fix should go to the GRASS64 release as a bugfix.

Some examples from the nc_spm_08 data set 1m resolution old

http://skagit.meas.ncsu.edu/~helena/wrriwork/lakewh/flowD8dem.jpg

new

http://skagit.meas.ncsu.edu/~helena/grasswork/accum_5Kdiagfix.png

Hmm, that looks very much like the old results before the fix, the differences are probably due to floating point DEM support in grass65 and grass7, all versions of r.watershed before grass65 read DEMs as CELL, discarding any decimal places.

The new SFD results for elev_lid792_1m in nc_spm_08 with grass7 are very similar to r.terraflow's SFD accumulation. I am pretty sure that all algorithms used in r.terraflow are properly implemented (although I disapprove of sink filling, I think it's done the right way). IIUC, r.terraflow sorts all cells descending by elevation (sort stream) and then distributes flow along the steepest slope (sweep stream), taking into account the longer distance to diagonal cells. Thus, the results for non-sinks should be mostly identical between the two modules.

Here are some numbers on SFD flow direction for r.watershed in grass7:

before the fix:

edge direction no sink: 8302 (1.59%)
diag direction no sink: 513291 (98.41%)
edge out of sink: 233 (45.60%)
diag out of sink: 278 (54.40%)

after the fix:

edge direction no sink: 241894 (46.38%)
diag direction no sink: 279699 (53.62%)
edge out of sink: 233 (45.60%)
diag out of sink: 278 (54.40%)

Similar for other DEMs, each percentage is somewhere in the range of >40% to <60%. Although the new result doesn't look as smooth as the old result for elev_lid792_1m, it now corrects for diagonal flow bias. Note that sink traversal is unaffected; that is IMHO the big strength of r.watershed.

The fix not only affects the location of streams, the changes are also very prominent in areas with low flow accumulation, interesting for erosion modeling.

MFD is not affected, i.e. this fix would be mostly needed by 64 because it doesn't have MFD. OTOH, these are drastically different results and before submitting to 64 I would like to do more testing to 1) make sure I have properly understood and implemented Chuck Ehlschlaeger's description and 2) figure out why the correction present in grass32 was removed for grass4, there must have been a reason (maybe because the code did not exactly match the description and caused problems in special cases).

Markus M

in reply to:  7 comment:8 by hamish, 14 years ago

Replying to mmetz:

MFD is not affected, i.e. this fix would be mostly needed by 64 because it doesn't have MFD. OTOH, these are drastically different results and before submitting to 64 I would like to do more testing to 1) make sure I have properly understood and implemented Chuck Ehlschlaeger's description and 2) figure out why the correction present in grass32 was removed for grass4, there must have been a reason (maybe because the code did not exactly match the description and caused problems in special cases).

maybe a silly question, but have you tried emailing Chuck to ask? He is still contactable AFAIK & has been quite responsive to questions in the past. I'm not even sure if he knows about all the wonderful updates you've done and what r.watershed is now capable of.

Hamish

comment:9 by helena, 14 years ago

Markus was right - the difference was due to handling elevations as int versus fp, so now I got the version with the diagonal fix and here is the comparison of spatial patterns

grass64RC05 - SFD with integer elevation range ~ -200000 - +200000 note zig-zag main streams, missing flow accumulation along the road on the west ridge http://skagit.meas.ncsu.edu/~helena/grasswork/accum5K_gr64rc5_3d.jpg http://skagit.meas.ncsu.edu/~helena/grasswork/accum5K_gr64rc5.png

grass65 compiled in Sep 2009, SFD with FP before diagonal fix range ~ -200000 - +55000 (why so much lower than grass64?) no zig-zag on main streams, more realistic pattern on streams, lots of diagonals on hillslopes http://skagit.meas.ncsu.edu/~helena/grasswork/accum5K_gr65_sep09.jpg http://skagit.meas.ncsu.edu/~helena/grasswork/accum_5Ksep09.jpg

grass65 compiled Feb 2010, SFD with fp after diagonal fix range only slightly different, quite different pattern on hillslopes - note particularly NW section of the watershed where the previously diagonal flow changed to horizontal http://skagit.meas.ncsu.edu/~helena/grasswork/accum5K_gr65_feb10c_i.jpg http://skagit.meas.ncsu.edu/~helena/grasswork/accum_5Kdiag2010.jpg

I can see that somebody might have liked the diagonal biased version better than the correct one.

grass65 MFD - no difference between sep 2009 and feb 2010 range ~ -200000 - +16000 (lower than SFD as it should be, but still why such a diff between 64 and 65?) most realistic overall http://skagit.meas.ncsu.edu/~helena/grasswork/accum5K_gr65_mfd.jpg http://skagit.meas.ncsu.edu/~helena/grasswork/accum_5Kmfdi.jpg

Compared to r.watershed in GRASS65, the GRASS64 results look really bad for this high resolution data - I assume for lower resolution data the difference won't be as stark but still grass64 will be much worse than the grass65 version. Should grass65 version of r.watershed be backported to grass64? although the difference in values needs to be explained (it may be mistake on my side) and Markus M may have some additional issues that need to be addressed, what do others think? who makes the decision?

Helena

in reply to:  9 comment:10 by mmetz, 14 years ago

Replying to helena:

grass64RC05 - SFD with integer elevation range ~ -200000 - +200000

grass64svn - SFD with integer elevation range -251146 - +246789

grass65 compiled in Sep 2009, SFD with FP before diagonal fix range ~ -200000 - +55000 (why so much lower than grass64?)

grass65, SFD with diagonal fix range -256259 - +53004

Accumulation is actually a bit larger regarding absolute values with FP and the diagonal fix: 256259 in grass65 vs. 251146 in grass64.

The reason why the largest positive value is so much lower in 65 is that the main basin is now receiving offmap inflow from the North-West corner, downstream accumulation values become negative because they are now an underestimate. This is in turn an effect of floating point elevation support.

The reason why the largest absolute value in grass65 is a bit larger than the largest absolute value in grass64 is the correction for diagonal flow bias. In grass64, r.watershed finds the shortest path from any given cell to an outlet point. In grass65, surface flow follows the steepest downhill slope, even if a less steep downhill slope would be a shorter path to the respective outlet point. Thus in grass65 and grass7, streams make a few more twists and turns. are a bit longer and the maximum flow accumulation is slightly larger. The justification for the fix for diagonal flow bias is that any given cell does not "know" (the path to) its outlet point (the A* Search does though), therefore it can not follow the shortest path to the outlet point but follows the steepest downhill slope. Considering only its eight neighbours, this is the shortest way down to the lowest elevation.

SFD flow accumulation from r.terraflow is very similar to grass65 and SFD in r.terraflow also follows the steepest downhill slope.

Should grass65 version of r.watershed be backported to grass64? although the difference in values needs to be explained (it may be mistake on my side) and Markus M may have some additional issues that need to be addressed,

Only one, SFD still produces artefacts along edges: streams don't stop at region boundaries but creep along the boundaries, although flow direction can not be determined if elevation of some neighbours is unknown. Fixed for MFD but still present in SFD in order to keep results identical to previous versions.

I have left the 64 version pretty much as it is, only doing bug fixes, but actively developed the 65 version and the 70 version within the last year, the segmented mode in the 70 version is my favorite. I would opt to leave the 64 version as is, after fixing diagonal flow bias, but maybe the 65 version can go to 6.4.1?

Markus M

comment:11 by helena, 14 years ago

further comments on GRASS65 version, I looked again at the negative values and according to the manual:Negative numbers indicate that those cells possibly have surface runoff from outside of the current geographic region.

However, when running r.watershed with elev_lid792_1m in the nc_spm_08 data set I get this as the result for mfd (which looks great): http://skagit.meas.ncsu.edu/~helena/grasswork/accum_mfd.png
but this is where the negative values are: http://skagit.meas.ncsu.edu/~helena/grasswork/accum_mfd_neg.png
why does it switch from positive to negative in the middle of the slope even for the flow accumulation that apparently originates on the top of the hill well inside the area? Am I somehow misinterpreting the results? I can imagine the high accumulation values starting from outside, but the low negative accumulation values (in tens of cells) close to the center of the region have no way of starting outside.
Helena

Just a note that the man page has description and option twice, this should be an easy fix.

in reply to:  11 ; comment:12 by hamish, 14 years ago

Replying to helena:

However, when running r.watershed with elev_lid792_1m in the nc_spm_08 data set I get this as the result for mfd (which looks great): http://skagit.meas.ncsu.edu/~helena/grasswork/accum_mfd.png
but this is where the negative values are: http://skagit.meas.ncsu.edu/~helena/grasswork/accum_mfd_neg.png
why does it switch from positive to negative in the middle of the slope even for the flow accumulation that apparently originates on the top of the hill well inside the area?

I suspect that it hasn't switched to negative, just that the color map is stale and needs to be recalculated. (the slopes steeper than the last time r.colors was run go beyond the current color rules and so get mapped to the default color (white))

try re-running r.colors on it & d.redraw. and d.what.rast (or gui equiv) to query actual values.

Just a note that the man page has description and option twice, this should be an easy fix.

check the .tmp.html file in the module build dir. There's a chance that it didn't get cleaned and every time you rebuild another copy of the auto-generated help text gets appended. (I haven't seen that in a while, but maybe it's back)

the actual second OPTIONs section in it gives a paragraph explaining each option, which is too much for the command line quick help but useful to have. shrug.

regards, Hamish

in reply to:  12 comment:13 by helena, 14 years ago

Replying to hamish:

Replying to helena:

However, when running r.watershed with elev_lid792_1m in the nc_spm_08 data set I get this as the result for mfd (which looks great): http://skagit.meas.ncsu.edu/~helena/grasswork/accum_mfd.png
but this is where the negative values are: http://skagit.meas.ncsu.edu/~helena/grasswork/accum_mfd_neg.png
why does it switch from positive to negative in the middle of the slope even for the flow accumulation that apparently originates on the top of the hill well inside the area?

I suspect that it hasn't switched to negative, just that the color map is stale and needs to be recalculated. (the slopes steeper than the last time r.colors was run go beyond the current color rules and so get mapped to the default color (white))

try re-running r.colors on it & d.redraw. and d.what.rast (or gui equiv) to query actual values.

Sorry that my image was misunderstood - the image with white spaces is the exact same data as the image with full color except displayed as d.rast myaccum val=-250000-0 (or something like that) and then just to make sure I also extracted the negative values using r.mapcalc myaccum<0. And I did use d.what.rast to check the actual values - I was trying to find out whether they switch to negative along particular elevation contour, but they don't. So if you query downslope you get for example 25, 33, 56, -78,-90, ...

Just a note that the man page has description and option twice, this should be an easy fix.

check the .tmp.html file in the module build dir. There's a chance that it didn't get cleaned and every time you rebuild another copy of the auto-generated help text gets appended. (I haven't seen that in a while, but maybe it's back)

this is for the on-line man page http://grass.osgeo.org/grass65/manuals/html65_user/r.watershed.html as you have noticed it does not really repeat itself rather the OPTIONS section should have been a subsection of DESCRIPTION, maybe just removing the OPTION title and replacing it with sentence "Detailed explanation of flags and parameters:" would minimize the confusion.

the actual second OPTIONs section in it gives a paragraph explaining each option, which is too much for the command line quick help but useful to have. shrug.

just randomly checking r.surf.contour has section "Parameters" twice. There is probably more of it (nobody is apparently reading the manual pages anyway).

Helena

regards, Hamish

in reply to:  11 comment:14 by mmetz, 14 years ago

Replying to helena:

this is where the negative values are: http://skagit.meas.ncsu.edu/~helena/grasswork/accum_mfd_neg.png
why does it switch from positive to negative in the middle of the slope even for the flow accumulation that apparently originates on the top of the hill well inside the area?

The main fow originates from the top of the hill, but cells on the slope also receive flow from other neighbouring cells with higher elevation. Tracing this back, particularly for MFD flow dispersion, can lead to the edge of the region

I can imagine the high accumulation values starting from outside, but the low negative accumulation values (in tens of cells) close to the center of the region have no way of starting outside.

Yes, they have, although the absolute amount of surface flow making it from the edge of the region to there is minimal, <<< 1. If the edge of the region has higher elevation than the center of the region, and if there is an ever so slight downhill slope leading from the edge to the center, some edge flow will reach the center.

Long explanation:

As the manual says, "Negative numbers indicate that those cells possibly have surface runoff from outside of the current geographic region." or from cells with unknown elevation (MASKed or NULL), and this is what the results should show, if not, it's a bug.

The general slope direction in elev_lid792_1m is from about Northwest to Southeast and the terrain is quite flat. Many hillslopes seem to be neither strongly convex nor concave in tangential direction, and in MFD mode, flow is distributed to many neighbours. Therefore it can be expected that the sign of flow accumulation will also be spread considerably. Still, is it true that so many cells receive offmap flow or is it a bug?

In elev_lid.acc.mfd.png

http://skagit.meas.ncsu.edu/~helena/grasswork/accum_mfd_neg.png

only negative accumulation values are displayed, overlayed by elev_lid792_cont1m (1m contour lines). The negative sign is not propagated uphill (except for egressing depressions), but, starting from the borders of the region, slowly descends downhill (edge between coloured and white cells).

Looking at negative sfd accumulation values, displayed in elev_lid.acc.sfd.highlight.png

http://sites.google.com/site/markusmetzgiswork/elev_lid.acc.sfd.highlight.png

one of the crucial differences is where the red ellipse is located in the Northwest. Here, the hillslope seems to be rather flat in the tangential direction, if not convex, and surface flow including the negative sign is spread quite far.

Comparing elev_lid792.acc.sfd.zoom.png

http://sites.google.com/site/markusmetzgiswork/elev_lid.acc.sfd.zoom.png

and elev_lid792.acc.mfd.zoom.png

http://sites.google.com/site/markusmetzgiswork/elev_lid.acc.mfd.zoom.png

both show only negative flow, it is apparent that SFD struggles to find a reasonable flow direction on this hill and shows a straight line where the negative sign is propagated from the region's edge downhill into the main basin. MFD on the contrary spreads out the surface flow, typical for convex hills and similar to flat areas. The negative sign is spread together with the flow, and because this is at the top of the main basin in the current region, the negative sign is propagated wide and far downhill.

Negative flow also appears within the region midway down the large hill in the Northwestern quarter, keeping in mind that this is not really a hill. Elevation difference from the top of the hill to the first occurence of negative flow is about 60cm and from that point the elevation difference slowly increases. AFAICT, the negative sign is only propagated downhill.

If you are confident that there are indeed negative flow accumulation values where there should be positive flow accumulation values, please post the exact coordinates, so I can look in more detail what is going there.

Markus M

comment:15 by neteler, 12 years ago

Milestone: 6.4.06.4.4

For the record: devbr6 r39726 and r39727 have been backported some time ago to GRASS 6.4.svn.

comment:16 by mmetz, 12 years ago

Resolution: fixed
Status: newclosed

Closing ticket as fixed in all branches.

Markus M

Note: See TracTickets for help on using tickets.