Opened 13 years ago
Closed 4 years ago
#1421 closed enhancement (fixed)
scalability of r.terraflow
Reported by: | dnewcomb | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 7.8.3 |
Component: | Raster | Version: | svn-trunk |
Keywords: | r.terraflow, large grids | Cc: | |
CPU: | x86-64 | Platform: | Linux |
Description (last modified by )
I have an fcell grid of elevations for the state of North Carolina (51000 rows, 133000 columns, 6783000000 cells). I tried to run r.terraflow in GRASS 7 (8/8/2011 svn snapshot) and ran into the dimension limits. So I patched them according to Glynn's email, http://www.osgeo.org/pipermail/grass-user/2004-February/024722.html, and tried again. (Would it be better to change the dimension variable to int instead of short int?)
This time my Streams file builds to about 26 GB and then r.terraflow bombs with:
MFD flow direction
D8CUT=999999986991104.000000
Memory size: 808.00M (847249408) bytes
Memory manager registering memory in MM_IGNORE_MEMORY_EXCEEDED mode.
r.terraflow: grass2str.h:145: AMI_STREAM<T>* cell2stream(char*, elevation_type, long int*) [with T = float, elevation_type = float]: Assertion `nrows * ncols == str->stream_len()' failed.
The memory size is interesting, because I'm giving it 8 GB of RAM out of 16 GB in the command. The temp directory has about 900 GB of space, so it has plenty of room.
The box is 64 bit Ubuntu 11.04 related to ?
Attachments (2)
Change History (26)
comment:1 by , 13 years ago
comment:2 by , 11 years ago
Component: | Default → Raster |
---|---|
Keywords: | r.terraflow large grids → r.terraflow, large grids |
Could you please retry with a recent version of G7?
comment:3 by , 11 years ago
Running r.terraflow on 64-bit Ubuntu 12.04.4 with grass-7.0.svn_src_snapshot_2014_03_22 and the same input grid (51000 rows, 133000 columns). Run from the GUI, the command stops with this error:
ERROR: [nrows=22004, ncols=33006] dimension_type overflow -- change dimension_type and recompile
follow-up: 5 comment:4 by , 11 years ago
r.terraflow has been running for 9 hours, with the modifications posted in the diff files, on a grid of 51000 rows and 133000 columns. It should take 5 days or so to complete on this computer.
The thing that confuses me at the moment is the FILL line in the temp file listing below. Why is there a large negative number of elements?
total elements=6783000000, nodata elements=3291487486
largest temporary files:
FILL: 454.84G (488376000000) [-1806934592 elements, 72B each]
FLOW: 312.17G (335185201344) [3491512514 elements, 96B each]
Will need at least 909.67G (976752000000) space available in /data1
comment:5 by , 11 years ago
Replying to dnewcomb:
The thing that confuses me at the moment is the FILL line in the temp file listing below. Why is there a large negative number of elements?
r.terraflow/main.cpp:410:
G_message( "\t\t FILL: %s [%d elements, %dB each]",
           formatNumber(buf, fillmaxsize), nrows * ncols, sizeof(waterWindowType));
G_message( "\t\t FLOW: %s [%ld elements, %dB each]",
           formatNumber(buf, flowmaxsize), (long)(nrows * ncols - nodata_count), sizeof(sweepItem));
Even if dimension_type is changed to "long", the value is still being formatted as an "int" ("%d" conversion specifier).
Also, the cast to "long" in the second call is wrong. If nrows and ncols are of type "int" (or any smaller type, e.g. "short"), the multiplication will be performed as "int", which may overflow; casting the (possibly overflowed) result to "long" won't change that.
It should be:
G_message( "\t\t FLOW: %s [%ld elements, %dB each]",
           formatNumber(buf, flowmaxsize), (long)nrows * ncols - nodata_count, sizeof(sweepItem));
Casting either of the operands to "long" will force the multiplication to be performed as "long" and yield a "long" result (however, note that "long" is still only 32 bits on 64-bit versions of Windows).
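The negative element count can be reproduced outside r.terraflow. What follows is a minimal sketch, not the code in main.cpp: `as_signed32`, `wrapped_cell_count`, `cell_count` and `describe_fill` are hypothetical helper names. The 32-bit wraparound is emulated with unsigned arithmetic, because overflowing a signed int is undefined behaviour in C++. The sketch also prints the element size with `%zu`, since `sizeof` yields a `size_t` rather than an `int`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Reinterpret the low 32 bits of a product as a two's-complement value.
constexpr std::int64_t as_signed32(std::uint32_t low) {
    return low >= 0x80000000u ? static_cast<std::int64_t>(low) - 4294967296LL
                              : static_cast<std::int64_t>(low);
}

// What a 32-bit multiplication of the two dimensions ends up holding.
constexpr std::int64_t wrapped_cell_count(std::int32_t nrows, std::int32_t ncols) {
    return as_signed32(static_cast<std::uint32_t>(nrows) *
                       static_cast<std::uint32_t>(ncols));
}

// Widen one operand first, so the multiplication itself is done in 64 bits.
constexpr std::int64_t cell_count(std::int32_t nrows, std::int32_t ncols) {
    return static_cast<std::int64_t>(nrows) * ncols;
}

// Format the count with specifiers that match the argument types.
std::string describe_fill(std::int32_t nrows, std::int32_t ncols, std::size_t item_size) {
    char buf[96];
    std::snprintf(buf, sizeof buf, "[%lld elements, %zuB each]",
                  static_cast<long long>(cell_count(nrows, ncols)), item_size);
    return std::string(buf);
}
```

For the grid in this ticket, `wrapped_cell_count(51000, 133000)` gives -1806934592, the value in the FILL line, while `cell_count(51000, 133000)` gives 6783000000. Using `long long` (or `int64_t`) instead of `long` also avoids the 64-bit Windows case noted above.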
follow-up: 10 comment:7 by , 11 years ago
follow-up: 9 comment:8 by , 11 years ago
fwiw, 'diff -u' which gives "Unified" diffs is the preferred diff format. It provides a few lines of context around the change.
Hamish
comment:9 by , 11 years ago
Replying to hamish:
fwiw, 'diff -u' which gives "Unified" diffs is the preferred diff format. It provides a few lines of context around the change.
Hamish
Thanks! Still learning..
comment:10 by , 11 years ago
Replying to hamish:
G_message %ld changes applied in trunk with r59505, and devbr6 with r59507.
I'd note a few lines above this a cast to (long long) is also used.
Hamish
Restarted the large grid run with r59507 and the edited 3scan.h and types.h. It now reads:
COMPUTING FLOW DIRECTIONS
classifying nodata (inner & boundary)
total elements=6783000000, nodata elements=3291487486
largest temporary files:
FILL: 454.84G (488376000000) [6783000000 elements, 72B each]
FLOW: 312.17G (335185201344) [3491512514 elements, 96B each]
Will need at least 909.67G (976752000000) space available in /data1
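The sizes in this listing can be checked by hand. The sketch below only redoes the arithmetic from the printed numbers and is not r.terraflow code: the function names are hypothetical, and the observation that the "Will need at least" figure equals twice the FILL size comes from the output alone, so treating it as the actual rule is an assumption.

```cpp
#include <cstdint>

// FILL: one record per grid cell.
constexpr std::int64_t fill_bytes(std::int64_t total_cells, std::int64_t item_bytes) {
    return total_cells * item_bytes;
}

// FLOW: one record per cell that is not nodata.
constexpr std::int64_t flow_bytes(std::int64_t total_cells, std::int64_t nodata_cells,
                                  std::int64_t item_bytes) {
    return (total_cells - nodata_cells) * item_bytes;
}

// Bytes to binary gigabytes (2^30 bytes), the unit the "G" figures match.
constexpr double to_gib(std::int64_t bytes) {
    return static_cast<double>(bytes) / 1073741824.0;
}
```

With 6783000000 cells at 72 bytes each, `fill_bytes` gives 488376000000 bytes (about 454.84G). With 3291487486 nodata cells and 96 bytes per record, `flow_bytes` gives 335185201344 bytes (about 312.17G).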
comment:12 by , 10 years ago
Backported to relbr7 in r61340.
(unrelated: dnewcomb, please add your large file calculation timings to http://grasswiki.osgeo.org/wiki/GRASS_GIS_Performance )
Can the ticket be closed?
comment:13 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
I will redo the timings when I get back from leave and post on the wiki.
follow-up: 15 comment:14 by , 5 years ago
Changes by @dnewcomb do not seem to be applied in trunk... A glitch or on purpose?
comment:15 by , 5 years ago
Replying to sbl:
Changes by @dnewcomb do not seem to be applied in trunk... A glitch or on purpose?
Do you refer to these two attachments?
types.h.diff (500 bytes) - added by dnewcomb 5 years ago.
diff file for types.h in r.terraflow directory
3scan.h.diff (520 bytes) - added by dnewcomb 5 years ago.
diff file for 3scan.h in r.terraflow directory
If they should be applied, pls open an updated PR at https://github.com/OSGeo/grass/pulls
comment:16 by , 5 years ago
My guess is a glitch. I am on the way out the door for vacation. Feel free to apply before I get back.
follow-up: 19 comment:18 by , 5 years ago
Milestone: | 7.0.0 → 7.8.0 |
---|---|
Resolution: | fixed |
Status: | closed → reopened |
Version: | svn-develbranch6 → svn-trunk |
Backporting candidate when tested properly?
comment:19 by , 5 years ago
Replying to sbl:
Backporting candidate when tested properly?
I would say this is a feature, so we should not backport this.
comment:23 by , 5 years ago
Milestone: | → 7.8.3 |
---|
comment:24 by , 4 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Implemented in a slightly different way in: https://github.com/OSGeo/grass/pull/265
Will be available in GRASS 7.10
Thanks to MarkusM.
OK, I got it to work with the big grid.
for types.h:
typedef short dimension_type; /* represent dimension of the grid */
static const dimension_type dimension_type_max=SHORT_MAX;
changed to:
typedef long dimension_type; /* represent dimension of the grid */
static const dimension_type dimension_type_max=LONG_MAX;
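As a rough illustration of why the `short` typedef rejects this grid, here is a sketch of a range check against a candidate dimension type. `fits_dimension` is a hypothetical helper, not a function from r.terraflow, and the comments assume the common case of a 16-bit `short` and a 32-bit `int`.

```cpp
#include <limits>

// True if a row or column count can be stored in DimT without overflow.
template <typename DimT>
constexpr bool fits_dimension(long long n) {
    return n >= 0 && n <= static_cast<long long>(std::numeric_limits<DimT>::max());
}

// With a 16-bit short the largest representable dimension is 32767,
// so 51000 rows and 133000 columns are both rejected.
// A 32-bit int already holds either dimension; it is the product
// nrows * ncols (6783000000) that needs a 64-bit type.
```

On that assumption, `fits_dimension<short>(133000)` is false and `fits_dimension<int>(133000)` is true.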
for 3scan.h: line 127
changed to :
line 141
changed to:
output from command:
GRASS 7.0.svn (ncstpft_nad83):/data2/grass7_svn/grass_trunk/bin.x86_64-unknown-linux-gnu > r.terraflow --overwrite elevation=nc_20ft_ncfpm filled=nc_fill direction=nc_direct swatershed=nc_sink accumulation=nc_flow_accum tci=nc_tci memory=8000 stream_dir=/data2/bareearth stats=/data2/bareearth/stats2.out
STREAM temporary files in /data2/bareearth (THESE INTERMEDIATE STREAMS WILL NOT BE DELETED IN CASE OF ABNORMAL TERMINATION OF THE PROGRAM. TO SAVE SPACE PLEASE DELETE THESE FILES MANUALLY!)
MFD flow direction
D8CUT=999999986991104.000000
Memory size: 7.81G (8388608000) bytes
Memory manager registering memory in MM_IGNORE_MEMORY_EXCEEDED mode.
total elements=6783000000, nodata elements=3291491362
largest temporary files:
FILL: 454.84G (488376000000) [-1806934592 elements, 72B each]
FLOW: 312.17G (335184829248) [3491508638 elements, 96B each]
Will need at least 909.67G (976752000000) space available in /data2/bareearth
COMPUTING FLOW DIRECTIONS
classifying nodata (inner & boundary)
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7997.93MB
EMPQUEUEADAPTIVE: desired memory: 7997.93MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8386435434.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047221117
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7997.15MB
EMPQUEUEADAPTIVE: desired memory: 7997.15MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8385624130.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047119704
assigning preliminary directions
finding flat areas (plateaus and depressions)
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7997.41MB
EMPQUEUEADAPTIVE: desired memory: 7997.41MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8385894538.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047153505
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7996.64MB
EMPQUEUEADAPTIVE: desired memory: 7996.64MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8385083234.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047052092
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7995.86MB
EMPQUEUEADAPTIVE: desired memory: 7995.86MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8384271930.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1046950679
assigning directions on plateaus
generating watersheds and watershed graph
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7998.96MB
EMPQUEUEADAPTIVE: desired memory: 7998.96MB
sz_stream: 270424 buf_arity: 200 mm_overhead: 8705664 mm_avail: 8387517074.
EMPQUEUEADAPTIVE: memory overhead set to 8.30237MB
EMPQUEUEADAPTIVE: pqsize set to 261837856
flooding depressions
available memory: 7999MB (8387787594B)
UnionFind::makeSet: reallocate double 2000
UnionFind::makeSet: reallocate double 4000
UnionFind::makeSet: reallocate double 8000
UnionFind::makeSet: reallocate double 16000
UnionFind::makeSet: reallocate double 32000
UnionFind::makeSet: reallocate double 64000
UnionFind::makeSet: reallocate double 128000
UnionFind::makeSet: reallocate double 256000
UnionFind::makeSet: reallocate double 512000
UnionFind::makeSet: reallocate double 1024000
UnionFind::makeSet: reallocate double 2048000
UnionFind::makeSet: reallocate double 4096000
UnionFind::makeSet: reallocate double 8192000
UnionFind::makeSet: reallocate double 16384000
UnionFind::makeSet: reallocate double 32768000
warning: watershed 1 (R=1) not done
warning: watershed 31667557 (R=31688834) not done
warning: watershed 31667558 (R=31688834) not done
warning: watershed 31674901 (R=31688834) not done
warning: watershed 31676231 (R=31688834) not done
warning: watershed 31688834 (R=31688834) not done

REASSIGNING DIRECTIONS
finding flat areas (plateaus and depressions)
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7997.15MB
EMPQUEUEADAPTIVE: desired memory: 7997.15MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8385624138.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047119705
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7996.38MB
EMPQUEUEADAPTIVE: desired memory: 7996.38MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8384812834.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1047018292
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7995.61MB
EMPQUEUEADAPTIVE: desired memory: 7995.61MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8384001530.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1046916879
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7994.83MB
EMPQUEUEADAPTIVE: desired memory: 7994.83MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 8383190226.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 1046815466
assigning directions on plateaus
creating flowStream: [AMI_STREAM /data2/bareearth/flowStream 0]
compute flow directions done.

COMPUTING FLOW ACCUMULATION
creating sweep stream from fill output stream
sweeping:
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 7999.73MB
EMPQUEUEADAPTIVE: desired memory: 7999.73MB
sz_stream: 270424 buf_arity: 200 mm_overhead: 8705664 mm_avail: 8388328213.
EMPQUEUEADAPTIVE: memory overhead set to 8.30237MB
EMPQUEUEADAPTIVE: pqsize set to 261863204
sorting sweep output stream
r.terraflow complete.