Opened 8 years ago

Last modified 2 months ago

#1421 reopened enhancement

scalability of r.terraflow

Reported by: dnewcomb
Owned by: grass-dev@…
Priority: normal
Milestone: 7.8.0
Component: Raster
Version: svn-trunk
Keywords: r.terraflow, large grids
Cc:
CPU: x86-64
Platform: Linux

Description (last modified by hamish)

I have an FCELL grid of elevations for the state of North Carolina (51000 rows, 133000 columns, 6783000000 cells). I tried to run r.terraflow in GRASS 7 (8/8/2011 svn snapshot) and ran into the dimension limits. So I patched them according to Glynn's email, http://www.osgeo.org/pipermail/grass-user/2004-February/024722.html, and tried again. (Would it be better to change the dimension variable to int instead of short int?)

This time my Streams file builds to about 26 GB and then r.terraflow bombs with:

MFD flow direction
D8CUT=999999986991104.000000
Memory size: 808.00M (847249408) bytes
Memory manager registering memory in MM_IGNORE_MEMORY_EXCEEDED mode.
r.terraflow: grass2str.h:145: AMI_STREAM<T>*
cell2stream(char*, elevation_type, long int*) [with T =
float, elevation_type = float]: Assertion `nrows * ncols ==
str->stream_len()' failed.

The reported memory size is interesting, because I'm giving it 8 GB of the machine's 16 GB of RAM in the command. The temp directory has about 900 GB of space, so it has plenty of room.

The box is 64-bit Ubuntu 11.04.

Related to http://trac.osgeo.org/grass/ticket/1006 ?

Attachments (2)

types.h.diff (500 bytes) - added by dnewcomb 5 years ago.
diff file for types.h in r.terraflow directory
3scan.h.diff (520 bytes) - added by dnewcomb 5 years ago.
diff file for 3scan.h in r.terraflow directory


Change History (21)

comment:1 Changed 8 years ago by dnewcomb

OK, I got it to work with the big grid.

for types.h:

  typedef short dimension_type; /* represent dimension of the grid */
  static const dimension_type dimension_type_max=SHORT_MAX;

changed to:

  typedef long dimension_type; /* represent dimension of the grid */
  static const dimension_type dimension_type_max=LONG_MAX;

for 3scan.h: line 127

assert(ae == AMI_ERROR_END_OF_STREAM);

changed to :

assert((off_t)ae == AMI_ERROR_END_OF_STREAM);

line 141

assert(ae == AMI_ERROR_END_OF_STREAM)

changed to:

assert((off_t)ae == AMI_ERROR_END_OF_STREAM)

output from command:

GRASS 7.0.svn (ncstpft_nad83):/data2/grass7_svn/grass_trunk/bin.x86_64-unknown-linux-gnu > r.terraflow --overwrite elevation=nc_20ft_ncfpm filled=nc_fill direction=nc_direct swatershed=nc_sink accumulation=nc_flow_accum tci=nc_tci memory=8000 stream_dir=/data2/bareearth stats=/data2/bareearth/stats2.out
STREAM temporary files in /data2/bareearth
(THESE INTERMEDIATE STREAMS WILL NOT BE DELETED IN CASE OF ABNORMAL TERMINATION OF THE PROGRAM. TO SAVE SPACE PLEASE DELETE THESE FILES MANUALLY!)
MFD flow direction
D8CUT=999999986991104.000000
Memory size: 7.81G (8388608000) bytes
Memory manager registering memory in MM_IGNORE_MEMORY_EXCEEDED mode.
total elements=6783000000, nodata elements=3291491362
largest temporary files:
FILL: 454.84G (488376000000) [-1806934592 elements, 72B each]
FLOW: 312.17G (335184829248) [3491508638 elements, 96B each]
Will need at least 909.67G (976752000000) space available in /data2/bareearth


COMPUTING FLOW DIRECTIONS
classifying nodata (inner & boundary)
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7997.93MB
EMPQUEUEADAPTIVE: desired memory: 7997.93MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8386435434.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047221117
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7997.15MB
EMPQUEUEADAPTIVE: desired memory: 7997.15MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8385624130.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047119704
assigning preliminary directions
finding flat areas (plateaus and depressions)
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7997.41MB
EMPQUEUEADAPTIVE: desired memory: 7997.41MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8385894538.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047153505
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7996.64MB
EMPQUEUEADAPTIVE: desired memory: 7996.64MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8385083234.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047052092
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7995.86MB
EMPQUEUEADAPTIVE: desired memory: 7995.86MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8384271930.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1046950679
assigning directions on plateaus
generating watersheds and watershed graph
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7998.96MB
EMPQUEUEADAPTIVE: desired memory: 7998.96MB
sz_stream: 270424 buf_arity: 200 mm_overhead: 8705664 mm_avail: 8387517074.
EMPQUEUEADAPTIVE: memory overhead set to 8.30237MB
EMPQUEUEADAPTIVE: pqsize set to 261837856
flooding depressions
available memory: 7999MB (8387787594B)
UnionFind::makeSet: reallocate double 2000
UnionFind::makeSet: reallocate double 4000
UnionFind::makeSet: reallocate double 8000
UnionFind::makeSet: reallocate double 16000
UnionFind::makeSet: reallocate double 32000
UnionFind::makeSet: reallocate double 64000
UnionFind::makeSet: reallocate double 128000
UnionFind::makeSet: reallocate double 256000
UnionFind::makeSet: reallocate double 512000
UnionFind::makeSet: reallocate double 1024000
UnionFind::makeSet: reallocate double 2048000
UnionFind::makeSet: reallocate double 4096000
UnionFind::makeSet: reallocate double 8192000
UnionFind::makeSet: reallocate double 16384000
UnionFind::makeSet: reallocate double 32768000
warning: watershed 1 (R=1) not done
warning: watershed 31667557 (R=31688834) not done
warning: watershed 31667558 (R=31688834) not done
warning: watershed 31674901 (R=31688834) not done
warning: watershed 31676231 (R=31688834) not done
warning: watershed 31688834 (R=31688834) not done


REASSIGNING DIRECTIONS
finding flat areas (plateaus and depressions)
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7997.15MB
EMPQUEUEADAPTIVE: desired memory: 7997.15MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8385624138.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047119705
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7996.38MB
EMPQUEUEADAPTIVE: desired memory: 7996.38MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8384812834.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047018292
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7995.61MB
EMPQUEUEADAPTIVE: desired memory: 7995.61MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8384001530.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1046916879
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7994.83MB
EMPQUEUEADAPTIVE: desired memory: 7994.83MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8383190226.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1046815466
assigning directions on plateaus
creating flowStream: [AMI_STREAM /data2/bareearth/flowStream 0]
compute flow directions done.

100% 100% 100%


COMPUTING FLOW ACCUMULATION
creating sweep stream from fill output stream

sorting sweep stream

sweeping:
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7999.73MB
EMPQUEUEADAPTIVE: desired memory: 7999.73MB
sz_stream: 270424 buf_arity: 200 mm_overhead: 8705664 mm_avail: 8388328213.
EMPQUEUEADAPTIVE: memory overhead set to 8.30237MB
EMPQUEUEADAPTIVE: pqsize set to 261863204

100%

sorting sweep output stream

100%

r.terraflow complete.

comment:2 Changed 5 years ago by neteler

Component: Default → Raster
Keywords: r.terraflow large grids → r.terraflow, large grids

Could you please retry with a recent version of G7?

comment:3 Changed 5 years ago by dnewcomb

Running r.terraflow on Ubuntu 12.04.4 64-bit with grass-7.0.svn_src_snapshot_2014_03_22 and the same input grid (51000 rows, 133000 columns). Run from the GUI, the command stops with the error:

ERROR: [nrows=22004, ncols=33006] dimension_type overflow -- change dimension_type and recompile
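
The message above indicates that a dimension did not fit into dimension_type (33006 is larger than the usual maximum of a 16-bit short, 32767). A minimal sketch of such a range check follows; the names fits_dimension and dimension_examples are hypothetical and are not taken from the r.terraflow source.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper (not from the r.terraflow source): returns true when
// both grid dimensions can be stored in DimT without overflow.
template <typename DimT>
bool fits_dimension(long long nrows, long long ncols) {
    const long long max_dim = std::numeric_limits<DimT>::max();
    return nrows >= 0 && ncols >= 0 && nrows <= max_dim && ncols <= max_dim;
}

// Example checks using the numbers quoted in this ticket. The first check
// assumes a 16-bit short, which holds on common platforms.
inline void dimension_examples() {
    assert(!fits_dimension<short>(22004, 33006));  // 33006 > 32767
    assert(fits_dimension<int>(51000, 133000));    // fits in a 32-bit int
    assert(fits_dimension<long>(51000, 133000));   // fits in long as well
}
```

Note that a dimension type wide enough for each side of the grid does not guarantee that the product nrows * ncols fits in the same type.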

Changed 5 years ago by dnewcomb

Attachment: types.h.diff added

diff file for types.h in r.terraflow directory

Changed 5 years ago by dnewcomb

Attachment: 3scan.h.diff added

diff file for 3scan.h in r.terraflow directory

comment:4 Changed 5 years ago by dnewcomb

r.terraflow has been running for 9 hours with the modifications posted in the diff files, on a grid of 51000 rows and 133000 columns. It should take 5 days or so to complete on this computer.

The thing that confuses me at the moment is the FILL line in the temp file listing below. Why is there a large negative number of elements?

total elements=6783000000, nodata elements=3291487486
largest temporary files:
FILL: 454.84G (488376000000) [-1806934592 elements, 72B each]
FLOW: 312.17G (335185201344) [3491512514 elements, 96B each]
Will need at least 909.67G (976752000000) space available in /data1

comment:5 in reply to:  4 Changed 5 years ago by glynn

Replying to dnewcomb:

The thing that confuses me at the moment is the FILL line in the temp file listing below. Why is there a large negative number of elements?

r.terraflow/main.cpp:410:

  G_message( "\t\t FILL: %s [%d elements, %dB each]",
		  formatNumber(buf, fillmaxsize),
		  nrows * ncols, sizeof(waterWindowType));
  G_message( "\t\t FLOW: %s [%ld elements, %dB each]",
		  formatNumber(buf, flowmaxsize),
		  (long)(nrows * ncols - nodata_count), sizeof(sweepItem));

Even if dimension_type is changed to "long", the value is still being formatted as an "int" ("%d" conversion specifier).

Also, the cast to "long" in the second call is wrong. If nrows and ncols are of type "int" (or any smaller type, e.g. "short"), the multiplication will be performed as "int", which may overflow; casting the (possibly overflowed) result to "long" won't change that.

It should be:

  G_message( "\t\t FLOW: %s [%ld elements, %dB each]",
		  formatNumber(buf, flowmaxsize),
		  (long)nrows * ncols - nodata_count, sizeof(sweepItem));

Casting either of the operands to "long" will force the multiplication to be performed as "long" and yield a "long" result (however, note that "long" is still only 32 bits on 64-bit versions of Windows).
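
The effect can be reproduced with the numbers from this ticket. The sketch below is illustrative only: cell_count, wrap_to_int32 and overflow_examples are hypothetical names, and the 32-bit wrap is computed arithmetically instead of by actually overflowing an int, since signed overflow is undefined behaviour in C++.

```cpp
#include <cassert>
#include <cstdint>

// Cell count in 64-bit arithmetic: widening one operand before the
// multiplication makes the whole product 64-bit.
inline std::int64_t cell_count(int nrows, int ncols) {
    return static_cast<std::int64_t>(nrows) * ncols;
}

// Value that results when a 64-bit number is reduced to a signed 32-bit
// integer (two's complement wrap-around), computed portably.
inline std::int64_t wrap_to_int32(std::int64_t value) {
    const std::int64_t two32 = std::int64_t{1} << 32;
    std::int64_t r = value % two32;   // in (-2^32, 2^32)
    if (r < 0) r += two32;            // now in [0, 2^32)
    if (r >= two32 / 2) r -= two32;   // upper half maps to negative values
    return r;
}

// Checks against the figures printed in the log.
inline void overflow_examples() {
    const std::int64_t cells = cell_count(51000, 133000);
    assert(cells == 6783000000LL);                 // the true element count
    assert(wrap_to_int32(cells) == -1806934592LL); // the value in the FILL line
}
```

The wrapped value matches the negative element count in the FILL line, which is consistent with the count being reduced to 32 bits somewhere between the multiplication and the "%d" conversion.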

comment:6 Changed 5 years ago by hamish

Description: modified (diff)

add '{{{' and '}}}' around code block.

comment:7 Changed 5 years ago by hamish

G_message %ld changes applied in trunk with r59505, and devbr6 with r59507.

I'd note a few lines above this a cast to (long long) is also used.

Hamish

comment:8 Changed 5 years ago by hamish

fwiw, 'diff -u' which gives "Unified" diffs is the preferred diff format. It provides a few lines of context around the change.

Hamish

comment:9 in reply to:  8 Changed 5 years ago by dnewcomb

Replying to hamish:

fwiw, 'diff -u' which gives "Unified" diffs is the preferred diff format. It provides a few lines of context around the change.

Hamish

Thanks! Still learning..

comment:10 in reply to:  7 Changed 5 years ago by dnewcomb

Replying to hamish:

G_message %ld changes applied in trunk with r59505, and devbr6 with r59507.

I'd note a few lines above this a cast to (long long) is also used.

Hamish

Restarted the large grid run with r59507 and the edited 3scan.h and types.h. It now reads:

COMPUTING FLOW DIRECTIONS
classifying nodata (inner & boundary)
total elements=6783000000, nodata elements=3291487486
largest temporary files:
FILL: 454.84G (488376000000) [6783000000 elements, 72B each]
FLOW: 312.17G (335185201344) [3491512514 elements, 96B each]
Will need at least 909.67G (976752000000) space available in /data1
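
The sizes in this listing can be checked by hand with 64-bit arithmetic. The sketch below is a rough reconstruction from the log, not the code in main.cpp: TempEstimate, estimate_temp_space and temp_space_examples are hypothetical names, and the rule "twice the larger of the two files" is an assumption inferred from the printed numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical container for the three figures printed in the log.
struct TempEstimate {
    std::int64_t fill_bytes;
    std::int64_t flow_bytes;
    std::int64_t needed_bytes;
};

// FILL holds one element per cell; FLOW holds one element per non-nodata
// cell. The factor of 2 on the larger file is inferred from the log output.
inline TempEstimate estimate_temp_space(std::int64_t total_cells,
                                        std::int64_t nodata_cells,
                                        std::int64_t fill_elem_bytes,
                                        std::int64_t flow_elem_bytes) {
    TempEstimate e;
    e.fill_bytes = total_cells * fill_elem_bytes;
    e.flow_bytes = (total_cells - nodata_cells) * flow_elem_bytes;
    e.needed_bytes = 2 * std::max(e.fill_bytes, e.flow_bytes);
    return e;
}

// Checks against the figures in the listing (72B and 96B element sizes).
inline void temp_space_examples() {
    const TempEstimate e =
        estimate_temp_space(6783000000LL, 3291487486LL, 72, 96);
    assert(e.fill_bytes == 488376000000LL);
    assert(e.flow_bytes == 335185201344LL);
    assert(e.needed_bytes == 976752000000LL);
}
```

All three values exceed the 32-bit range, so every intermediate product has to be computed in a 64-bit type.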

comment:11 Changed 5 years ago by dnewcomb

Seems to have finished correctly in 53.3 hours.

comment:12 Changed 5 years ago by neteler

Backported to relbr7 in r61340.

(unrelated: dnewcomb, please add your large file calculation timings to http://grasswiki.osgeo.org/wiki/GRASS_GIS_Performance )

Can the ticket be closed?

comment:13 Changed 5 years ago by dnewcomb

Resolution: fixed
Status: new → closed

I will redo the timings when I get back from leave and post on the wiki.

comment:14 Changed 2 months ago by sbl

Changes by @dnewcomb do not seem to be applied in trunk... A glitch or on purpose?

comment:15 in reply to:  14 Changed 2 months ago by neteler

Replying to sbl:

Changes by @dnewcomb do not seem to be applied in trunk... A glitch or on purpose?

Do you refer to these two attachments?

types.h.diff (500 bytes) - added by dnewcomb 5 years ago.

diff file for types.h in r.terraflow directory

3scan.h.diff (520 bytes) - added by dnewcomb 5 years ago.

diff file for 3scan.h in r.terraflow directory

If they should be applied, pls open an updated PR at https://github.com/OSGeo/grass/pulls

comment:16 Changed 2 months ago by dnewcomb

My guess is a glitch. I am on the way out the door for vacation. Feel free to apply before I get back.

comment:18 Changed 2 months ago by sbl

Milestone: 7.0.0 → 7.8.0
Resolution: fixed
Status: closed → reopened
Version: svn-develbranch6 → svn-trunk

Backporting candidate when tested properly?

comment:19 in reply to:  18 Changed 2 months ago by wenzeslaus

Replying to sbl:

Backporting candidate when tested properly?

I would say this is a feature, so we should not backport this.
