#2764 closed defect (wontfix)
corrupt data written to FCELL and DCELL rasters, hard to re-produce
Reported by: | dylan | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 7.2.4 |
Component: | Raster | Version: | unspecified |
Keywords: | Cc: | ||
CPU: | x86-64 | Platform: | Linux |
Description
I think that I may have made some progress after a long battle with some very hard to reproduce errors. The attached bash scripts can be used in any location (LL and projected give the same results on my machine) and will test the following:
- generate a new DCELL map via
r.surf.gauss
- test resulting map via
r.univar
- repeat process, overwriting previous DCELL map
I have noticed that running these tests serially and in parallel produce the same results: ~2% of the maps generated generate an error message when read via r.univar
. This is the same error message that started my recent string of postings to GRASS-dev. After applying a patch (#2762) to add some more information to the error message, I see:
ERROR: Error reading raster data for row 1949 of <testing_gauss_7> (buffer size mismatch)
Tests were performed using r66514 of grass_trunk.
Attachments (6)
Change History (47)
by , 9 years ago
by , 9 years ago
Attachment: | test-map-creation.sh added |
---|
test map creation via r.surf.gauss (bash script)
comment:1 by , 9 years ago
Update
If the maps generated by r.surf.gauss
are deleted with g.remove
explicitly (instead of overwriting with the "--o" flag) then there are 0 cases (out of 1000) in which read errors are reported.
Testing on a different disk to see if that makes any difference.
follow-up: 8 comment:2 by , 9 years ago
Update
The same tests on a different disk (SSD) this time resulted in 0 errors.
The original test was done on a pair of standard hard drives, mirrored via Linux software RAID (level 1). Could this have anything to do with the reported read errors?
comment:4 by , 8 years ago
Perhaps a RAID configuration issue at your end?
Here a test on a SSD, no error (I have no RAID to test at time):
GRASS 7.2.svn (nc_spm_08_grass7):~ > g.region raster=elevation res=100 for i in `seq 1 1000` ; do echo -n "$i... " ; bash /tmp/test-map-creation.sh ; done 1... 2... 3... 4... 5... 6... 7... 8... 9... 10... 11... 12... 13... 14... 15... 16... 17... 18... 19... 20... 21... 22... 23... 24... 25... 26... 27... 28... 29... 30... 31... 32... 33... 34... [...] 967... 968... 969... 970... 971... 972... 973... 974... 975... 976... 977... 978... 979... 980... 981... 982... 983... 984... 985... 986... 987... 988... 989... 990... 991... 992... 993... 994... 995... 996... 997... 998... 999... 1000... GRASS 7.2.svn (nc_spm_08_grass7):~ >
comment:6 by , 8 years ago
Milestone: | 7.2.1 → 7.2.2 |
---|
comment:8 by , 7 years ago
Replying to dylan:
Update
The same tests on a different disk (SSD) this time resulted in 0 errors.
The original test was done on a pair of standard hard drives, mirrored via Linux software RAID (level 1). Could this have anything to do with the reported read errors?
The attached patch raster_get_row.patch adds more warnings and errors, maybe this helps for debugging.
follow-up: 10 comment:9 by , 7 years ago
After applying the patch, I see this (more informative) error message:
Decompression failed with error 0
This was in the context of a corrupt (?) map generated by r.horizon, when run in parallel like this:
seq 0 $step 330 | parallel --gnu --progress "bash make-hz-maps.sh {} $step $elev"
make-hz-makes.sh is a simple wrapper around r.horizon:
# start angle start=$1 # angle step size step=$2 # elevation map map=$3 # r.horizon requires start, step, and end angles # compute end angle via bc end=`echo "$start + $step" | bc` r.horizon --q --o elevation=$map step=$step start=$start end=$end output=hzangle
Am I abusing GRASS raster libraries by invocation of multiple instances of the same module that all access the same elevation map?
comment:10 by , 7 years ago
Replying to dylan:
After applying the patch, I see this (more informative) error message:
Decompression failed with error 0
The actual error code is missing, the patch was wrong, now updated.
Which compressor are you using?
This was in the context of a corrupt (?) map generated by r.horizon, when run in parallel like this:
seq 0 $step 330 | parallel --gnu --progress "bash make-hz-maps.sh {} $step $elev"make-hz-makes.sh is a simple wrapper around r.horizon:
# start angle start=$1 # angle step size step=$2 # elevation map map=$3 # r.horizon requires start, step, and end angles # compute end angle via bc end=`echo "$start + $step" | bc` r.horizon --q --o elevation=$map step=$step start=$start end=$end output=hzangleAm I abusing GRASS raster libraries by invocation of multiple instances of the same module that all access the same elevation map?
I don't think so, multiple instances of the same module should run independent of each other. The problem could be anywhere between the physical hard drive having problems with multiple simultaneous requests to read the same file, the OS handling such requests, and the GRASS file reading mechanism.
comment:11 by , 7 years ago
Using ZLIB compression on all raster maps and the NULL files are compressed.
Applied the new patch and get the following:
ERROR: Decompression failed with error -1
I have identified one of the offending maps, and attempts to read it via r.compress -u
or r.univar
fail. Attempting to write each cell to a file with r.stats
results in reasonable data output until the above error is encountered. I assume that this has to do with row-level decompression encountering corrupt data?
Are there any diagnostics that can be performed on the map which might shed some light on why it can't be decompressed?
Given the errors and manner of failure, it seems like the problems are happening at write-time.
follow-up: 15 comment:14 by , 7 years ago
Replying to dylan:
Relevant zlib manual page.
For ticket completion, here the related grass-user message:
https://lists.osgeo.org/pipermail/grass-user/2018-January/077572.html
On Wed, Jan 3, 2018 at 8:41 PM, Dylan Beaudette wrote:
Update: after applying the latest patch, I now see
ERROR: Decompression failed with error -1
I found the map that fails decompression. Is there any way to inspect the map in order to search for more clues as to what is wrong with it or how it might have happened?
All of the maps in this project are using the default ZLIB compression, along with compressed NULL files. Looking over the zlib manual (https://www.zlib.net/manual.html), I see several references to an error code of "-1":
#define Z_ERRNO (-1)
Z_ERRNO if there is an error writing the flushed data
Z_ERRNO on a file operation error
ZEXTERN const char * ZEXPORT gzerror OF((gzFile file, int *errnum));
Returns the error message for the last error which occurred on the given compressed file. errnum is set to zlib error number. If an error occurred in the file system and not in the compression library, errnum is set to Z_ERRNO and the application may consult errno to get the exact error code.
The last note is interesting and suggests that this specific "problem" may be happening at the file system or OS level. That said, I have only encountered this error in the context of FCELL or DCELL maps--which makes me think that it is some combination of GRASS and the underlying file system.
Some OS information:
3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Description: Ubuntu 14.04.5 LTS Release: 14.04
follow-up: 16 comment:15 by , 7 years ago
Replying to neteler:
Replying to dylan:
Relevant zlib manual page.
For ticket completion, here the related grass-user message:
https://lists.osgeo.org/pipermail/grass-user/2018-January/077572.html
On Wed, Jan 3, 2018 at 8:41 PM, Dylan Beaudette wrote:
Update: after applying the latest patch, I now see
ERROR: Decompression failed with error -1
I found the map that fails decompression. Is there any way to inspect the map in order to search for more clues as to what is wrong with it or how it might have happened?
All of the maps in this project are using the default ZLIB compression, along with compressed NULL files. Looking over the zlib manual (https://www.zlib.net/manual.html), I see several references to an error code of "-1":
#define Z_ERRNO (-1)
Z_ERRNO if there is an error writing the flushed data
Z_ERRNO on a file operation error
ZEXTERN const char * ZEXPORT gzerror OF((gzFile file, int *errnum));
We can't use gzerror() because libgis does not operate on a gzFile. Instead we could use the undocumented zError() function, now in the updated patch.
You will need at least trunk r71890 for this patch. Maybe trunk r71890 or later without the attached patch fixes the problem already.
The errors must be somehow related to G_read_compressed() because only FCELL/DCELL maps use G_read_compressed() while CELL maps use G_expand() directly.
follow-up: 17 comment:16 by , 7 years ago
Replying to mmetz:
Replying to neteler:
Replying to dylan:
Relevant zlib manual page.
For ticket completion, here the related grass-user message:
https://lists.osgeo.org/pipermail/grass-user/2018-January/077572.html
On Wed, Jan 3, 2018 at 8:41 PM, Dylan Beaudette wrote:
Update: after applying the latest patch, I now see
ERROR: Decompression failed with error -1
I found the map that fails decompression. Is there any way to inspect the map in order to search for more clues as to what is wrong with it or how it might have happened?
All of the maps in this project are using the default ZLIB compression, along with compressed NULL files. Looking over the zlib manual (https://www.zlib.net/manual.html), I see several references to an error code of "-1":
#define Z_ERRNO (-1)
Z_ERRNO if there is an error writing the flushed data
Z_ERRNO on a file operation error
ZEXTERN const char * ZEXPORT gzerror OF((gzFile file, int *errnum));
We can't use gzerror() because libgis does not operate on a gzFile. Instead we could use the undocumented zError() function, now in the updated patch.
You will need at least trunk r71890 for this patch. Maybe trunk r71890 or later without the attached patch fixes the problem already.
The errors must be somehow related to G_read_compressed() because only FCELL/DCELL maps use G_read_compressed() while CELL maps use G_expand() directly.
Thanks Markus. This is all starting to (maybe?) make sense, given the clues collected over the last couple of years:
- errors only reported at read-time, from maps generated by several different modules
- FCELL / DCELL maps occasionally contain corrupt ZLIB-compressed rows when created in parallel
- CELL maps not affected
- the likelihood of corrupt rows seems to be a function of the number of cells involved: number of maps * number of cells * number of iterations * number of concurrent processes
- this only seems to happen when working with relatively large maps 3000 x 3000 cells or larger
- the
*** buffer overflow detected ***
message encountered while reading the corrupt row - the zlib error number
- the fact that I have not yet encountered this problem with LZ4 compression
I think that I was still encountering errors, post-r71890. My latest build (with ZLIB-related errors) was r7200.
Could it be that the zlib library's length function isn't multi-thread safe and occasionally reports the wrong length in the context of writing a row of data? A simple, but quite heavy-handed test would be a debugging option for reading the last written row as a sanity check. I don't have the skills to implement this, so just an idea.
I'll try the latest patch today and report back.
follow-up: 18 comment:17 by , 7 years ago
Replying to dylan:
Could it be that the zlib library's length function isn't multi-thread safe and occasionally reports the wrong length in the context of writing a row of data?
In your examples, modules are run in parallel, i.e. independent of each other and calls to zlib's library functions are (should be) also independent of each other.
A simple, but quite heavy-handed test would be a debugging option for reading the last written row as a sanity check. I don't have the skills to implement this, so just an idea.
Yes, that's a good idea. I will think about such a test, to be activated at compile time in order to not add overhead on standard operation.
I'll try the latest patch today and report back.
I have added more informative warnings and errors in trunk r72030,1.
follow-up: 19 comment:18 by , 7 years ago
follow-up: 20 comment:19 by , 7 years ago
Replying to dylan:
Replying to mmetz:
I have added more informative warnings and errors in trunk r72030,1.
Switched back to ZLIB compression and got the following error:
WARNING: ZLIB decompression error -3: data error ERROR: Error uncompressing fp raster data for row 2000 of <beam.206>: error code -1
Could there be a lurking zlib compression bug in the code that only manifests during parallel invocation, or, when a very large number of rows are involved?
Switching back to LZ4 for more testing...
follow-up: 21 comment:20 by , 7 years ago
Replying to dylan:
Replying to dylan:
Replying to mmetz:
I have added more informative warnings and errors in trunk r72030,1.
Switched back to ZLIB compression and got the following error:
WARNING: ZLIB decompression error -3: data error ERROR: Error uncompressing fp raster data for row 2000 of <beam.206>: error code -1Could there be a lurking zlib compression bug in the code that only manifests during parallel invocation, or, when a very large number of rows are involved?
I don't think so because
- you can't reproduce the error with a SSD
- Markus N can't reproduce the error with a SSD
- I can't reproduce the error with a SSD or an old spinning HDD (no RAID) or a NAS with spinning disks and software RAID 5
Therefore I guess that something is wrong with your hard drive or there is a bug in your filesystem driver or there is a bug in your RAID management/configuration (in case you are still using a software RAID 1 system).
follow-up: 22 comment:21 by , 7 years ago
Replying to mmetz:
Could there be a lurking zlib compression bug in the code that only manifests during parallel invocation, or, when a very large number of rows are involved?
I don't think so because
- you can't reproduce the error with a SSD
- Markus N can't reproduce the error with a SSD
- I can't reproduce the error with a SSD or an old spinning HDD (no RAID) or a NAS with spinning disks and software RAID 5
Therefore I guess that something is wrong with your hard drive or there is a bug in your filesystem driver or there is a bug in your RAID management/configuration (in case you are still using a software RAID 1 system).
All good points, except that this iteration of errors (grass-user messages and tickets filed since 2017-12-30) have all been encountered while working on an ssd exclusively. Unless of course temp files are stored outside of the database/mapset. I'll work on a short list of all related tickets and mailing list posts.
follow-up: 23 comment:22 by , 7 years ago
Replying to dylan:
Replying to mmetz:
Could there be a lurking zlib compression bug in the code that only manifests during parallel invocation, or, when a very large number of rows are involved?
I don't think so because
- you can't reproduce the error with a SSD
- Markus N can't reproduce the error with a SSD
- I can't reproduce the error with a SSD or an old spinning HDD (no RAID) or a NAS with spinning disks and software RAID 5
Therefore I guess that something is wrong with your hard drive or there is a bug in your filesystem driver or there is a bug in your RAID management/configuration (in case you are still using a software RAID 1 system).
All good points, except that this iteration of errors (grass-user messages and tickets filed since 2017-12-30) have all been encountered while working on an ssd exclusively. Unless of course temp files are stored outside of the database/mapset. I'll work on a short list of all related tickets and mailing list posts.
If you are not using the test scripts attached to this ticket any more, can you please also add data and commands that trigger this error to this ticket? Thanks!
follow-up: 24 comment:23 by , 7 years ago
Replying to mmetz:
If you are not using the test scripts attached to this ticket any more, can you please also add data and commands that trigger this error to this ticket? Thanks!
Good point, I should probably have opened another ticket.
Several folks have posted a number of messages to GRASS-user and GRASS-dev over the last couple of years, related to errors reading raster maps after parallel processing:
- http://osgeo-org.1560.x6.nabble.com/Error-reading-raster-data-for-row-xxx-only-when-using-r-series-and-t-rast-series-td5229569i20.html
- https://lists.osgeo.org/pipermail/grass-dev/2015-July/075691.html
- https://lists.osgeo.org/pipermail/grass-dev/2015-July/075627.html
- #2762
This ticket was originally opened with a set of scripts to test for errors writing raster maps in parallel and it now seems that those tests no longer generate errors when:
- run on an SSD
- run on a standard HDD
- run on mirrored HDD (Linux RAID 1) as long as maps are deleted vs. overwritten via --o
This would have been a good place to close this ticket and re-open another one.
As of 2017-12-30, I was again encountering the hard-to-reproduce "ERROR: Error reading raster data for row 1949 of <map>" errors. Just as with last time in the context of parallel execution. I posted this message to GRASS-user and then proceeded to document progress as of comment 8 in this ticket.
I am attaching two scripts to illustrate the latest instance, beam-rad-at-tile.sh
and daily-rad.sh
, invoked like this:
# ... ## try ZLIB compression: # random errors as described in #2764 # export GRASS_COMPRESSOR=ZLIB ## try LZ4 compression: # no errors! export GRASS_COMPRESSOR=LZ4 # # ... looping code # bash beam-rad-at-tile.sh $tile_i # ...
Essentially, I am iterating over 5000x5000 {elevation, slope, aspect} tiles, computing horizon angle maps in parallel, computing daily beam radiance maps in parallel, summing daily maps, and then proceeding to the next tile. With ZLIB compression, I am randomly encountering raster read errors generated by r.horizon
, r.sun
, or r.mapcalc
in this context. This seems to happen with or without NULL cells in the active tile. Errors are not encountered when using LZ4 compression.
The system is an 8-core Intel i7 950 @ 3.07Ghz with GRASS database/mapset residing on an SSD.
I'll post one of the {elevation, slope, aspect} tiles if anyone is interested in tinkering with them. They seem to work fine both within GRASS and, after exporting, in other GIS software.
by , 7 years ago
Attachment: | daily-rad.sh added |
---|
by , 7 years ago
Attachment: | beam-rad-at-tile.sh added |
---|
comment:24 by , 7 years ago
Replying to dylan: [...]
As of 2017-12-30, I was again encountering the hard-to-reproduce "ERROR: Error reading raster data for row 1949 of <map>" errors. Just as with last time in the context of parallel execution. I posted this message to GRASS-user and then proceeded to document progress as of comment 8 in this ticket.
I am attaching two scripts to illustrate the latest instance,
beam-rad-at-tile.sh
anddaily-rad.sh
, invoked like this:# ... ## try ZLIB compression: # random errors as described in #2764 # export GRASS_COMPRESSOR=ZLIB ## try LZ4 compression: # no errors! export GRASS_COMPRESSOR=LZ4 # # ... looping code # bash beam-rad-at-tile.sh $tile_i # ...Essentially, I am iterating over 5000x5000 {elevation, slope, aspect} tiles,
The script make-hz-maps.sh called by beam-rad-at-tile.sh is missing. Is the exact looping code needed in order to reproduce, or does it also happen with any given tile?
computing horizon angle maps in parallel, computing daily beam radiance maps in parallel, summing daily maps, and then proceeding to the next tile. With ZLIB compression, I am randomly encountering raster read errors generated by
r.horizon
,r.sun
, orr.mapcalc
in this context. This seems to happen with or without NULL cells in the active tile. Errors are not encountered when using LZ4 compression.The system is an 8-core Intel i7 950 @ 3.07Ghz with GRASS database/mapset residing on an SSD.
I'll post one of the {elevation, slope, aspect} tiles if anyone is interested in tinkering with them.
Test data, or a script to generate test data would help.
by , 7 years ago
Attachment: | make-hz-maps.sh added |
---|
follow-up: 26 comment:25 by , 7 years ago
The errors are typically encountered within the 365 calls to r.sun
, any tile. I'll post an example tile today.
Most of the errors are encountered in the final call to r.series
within:
bash beam-rad-at-tile.sh $tile_i
follow-up: 27 comment:26 by , 7 years ago
Replying to dylan:
I'll post an example set of tile data shortly.
OK.
Most of the errors are encountered in the final call to
r.series
within:bash beam-rad-at-tile.sh $tile_i
That means r.sun has created corrupt output. BTW, considering that you are running several instances of r.sun in parallel, I wonder if you compiled GRASS with openmp and use the nprocs option of r.sun. In this case you would have several instances of r.sun and each instance of r.sun would be multi-threaded: no speed gain, more sources of potential errors.
Well crud, just got this after a 10 hour run, returned by
r.series
:WARNING: LZ4 decompression error ERROR: Error uncompressing fp raster data for row 3929 of <beam.106>: error code -1That is the first error using LZ4 compression after many successful tiles. I wonder if the faster compression results in a lower probability of row corruption? Within my current project, I seem to be encountering corrupt rows about 0.0001% of the time: 2 rows out of (5000 rows * 365 calls to
r.sun
).
Chances are very small that ZLIB and LZ4 have the same bug. It rather seems to be a write error when writing several files at (nearly) the same time.
This makes me wonder about my hardware and OS...
However, I recall having this kind of error (365 maps generated in parallel, and 1 or 2 rows in the entire set with corruption "discovered" by
r.series
) nearly every time I have usedr.sun
over the last 10 years. It just so happens that this time I am performing the same operation 54 times vs. the typical single run. In every situation I was using some version of Debian or Xubuntu on fairly common (multi-processor or multi-core) hardware.Thinking back, all of the errors have been encountered with hyper-threading enabled: both on a dual Xeon and currently i7 950.
follow-up: 28 comment:27 by , 7 years ago
Replying to mmetz:
[...]
That means r.sun has created corrupt output. BTW, considering that you are running several instances of r.sun in parallel, I wonder if you compiled GRASS with openmp and use the nprocs option of r.sun. In this case you would have several instances of r.sun and each instance of r.sun would be multi-threaded: no speed gain, more sources of potential errors.
Note that this sometimes happens in the follow-up step where there output from r.sun
is converted to MJ/sq.m by r.mapcalc
: "beam.106" (r.mapcalc
) vs. "temp_beam_106" (r.sun
).
I had previously compiled using --with-openmp. I'll try disabling it. So far, I haven't used the nprocs
argument to r.sun
.
Well crud, just got this after a 10 hour run, returned by
r.series
:WARNING: LZ4 decompression error ERROR: Error uncompressing fp raster data for row 3929 of <beam.106>: error code -1That is the first error using LZ4 compression after many successful tiles. I wonder if the faster compression results in a lower probability of row corruption? Within my current project, I seem to be encountering corrupt rows about 0.0001% of the time: 2 rows out of (5000 rows * 365 calls to
r.sun
).Chances are very small that ZLIB and LZ4 have the same bug. It rather seems to be a write error when writing several files at (nearly) the same time.
Right. Are there any functions below put_row.c where concurrent writes could be trouble? Is there anything else that I can do to help test?
I just posted an example tileset containing elevation, slope, and aspect maps. Sorry for the large file sizes. Note that it takes my machine about 10 hours per tile, with errors surfacing in the final call to r.series
. I'll re-write my code to check the intermediate files in the meantime.
Thanks!
follow-up: 29 comment:28 by , 7 years ago
Replying to dylan:
Replying to mmetz:
[...]
That means r.sun has created corrupt output. BTW, considering that you are running several instances of r.sun in parallel, I wonder if you compiled GRASS with openmp and use the nprocs option of r.sun. In this case you would have several instances of r.sun and each instance of r.sun would be multi-threaded: no speed gain, more sources of potential errors.
Note that this sometimes happens in the follow-up step where there output from
r.sun
is converted to MJ/sq.m byr.mapcalc
: "beam.106" (r.mapcalc
) vs. "temp_beam_106" (r.sun
).
This is a very simple r.mapcalc expression. You should be able to trigger the error by simply creating a number of maps in parallel with r.mapcalc, after that (serially) testing the outputs with r.univar. With r.mapcalc in daily-rad.sh, you are running several independent instances of r.mapcalc in parallel. With daily-rad.sh called by beam-rad-at-tile.sh you are not testing if GRASS is thread-safe, instead you are testing if your OS, filesystem and hard drive can handle multiple simultaneous IO requests. Please check your system messages and the health of your hard drives (e.g. with smartctl) first, before you proceed.
follow-up: 30 comment:29 by , 7 years ago
Replying to mmetz:
This is a very simple r.mapcalc expression. You should be able to trigger the error by simply creating a number of maps in parallel with r.mapcalc, after that (serially) testing the outputs with r.univar. With r.mapcalc in daily-rad.sh, you are running several independent instances of r.mapcalc in parallel. With daily-rad.sh called by beam-rad-at-tile.sh you are not testing if GRASS is thread-safe, instead you are testing if your OS, filesystem and hard drive can handle multiple simultaneous IO requests. Please check your system messages and the health of your hard drives (e.g. with smartctl) first, before you proceed.
Yeah, that is what I thought and what the original test scripts basically perform. As stated in my update circa 2 years ago, these tests run fine on both the RAID1 and SSD on this machine.
I don't see any troubling messages reported by dmesg
or smartctl
. Note that I don't have any issues with any other GRASS commands, or (as far as I can tell) general usage on this machine. I only see these errors when working with GRASS commands that:
- take a long time to run:
r.sun
ort.rast.mapcalc
(e.g. a couple of years ago) - operate on moderately large, floating-point maps
- are done in parallel, either via GNU
parallel
or as implemented in the temporal suite of modules
...hence the extreme difficulty in recreating the errors or further debugging.
For the record, I have been getting corrupt rows using LZ4 compression in 1/8 attempts vs. 1/2 attempts when using ZLIB compression.
Next error I'll re-compile without openmp (not using it, but just in case) and -g -Wall CFLAGS.
follow-up: 31 comment:30 by , 7 years ago
Replying to dylan:
[...] Note that I don't have any issues with any other GRASS commands, or (as far as I can tell) general usage on this machine. I only see these errors when working with GRASS commands that:
- take a long time to run:
r.sun
ort.rast.mapcalc
(e.g. a couple of years ago)- operate on moderately large, floating-point maps
- are done in parallel, either via GNU
parallel
or as implemented in the temporal suite of modules...hence the extreme difficulty in recreating the errors or further debugging.
Unfortunately, I was not able to recreate these errors with the provided test data and scripts.
I still think this is some obscure disk IO error. You could try to use nice
, e.g. nice r.sun ...
and nice r.mapcalc ...
in daily-rad.sh
. At least this helps when running many GRASS modules in parallel on HPC systems where results are written out to one single storage device.
follow-up: 32 comment:31 by , 7 years ago
Replying to mmetz:
Replying to dylan:
[...] Note that I don't have any issues with any other GRASS commands, or (as far as I can tell) general usage on this machine. I only see these errors when working with GRASS commands that:
- take a long time to run:
r.sun
ort.rast.mapcalc
(e.g. a couple of years ago)- operate on moderately large, floating-point maps
- are done in parallel, either via GNU
parallel
or as implemented in the temporal suite of modules...hence the extreme difficulty in recreating the errors or further debugging.
Unfortunately, I was not able to recreate these errors with the provided test data and scripts.
I still think this is some obscure disk IO error. You could try to use
nice
, e.g.nice r.sun ...
andnice r.mapcalc ...
indaily-rad.sh
. At least this helps when running many GRASS modules in parallel on HPC systems where results are written out to one single storage device.
Well thank you very much for all of your patience, patches, and testing. I'll try the nice
option. For now, I think that I can tolerate the much lower frequency of errors after switching to LZ4 compression. Perhaps the faster speed of LZ4 lowers the probability of concurrent write operations.
It is still quite puzzling that this kind of error has come up on several different machines while tracking GRASS trunk over a 10 year period. Maybe this is a subtle hint that it is time to build a new workstation...
I know this is a lot to ask, but did you try testing using ZLIB compression and running it multiple times? It took a couple of tiles before I noticed the error.
There will be an opportunity to test these same scripts in an HPC environment over the next two months. I'll be sure and report back any findings from those tests.
follow-up: 33 comment:32 by , 7 years ago
Replying to dylan:
Replying to mmetz:
Replying to dylan:
[...] Note that I don't have any issues with any other GRASS commands, or (as far as I can tell) general usage on this machine. I only see these errors when working with GRASS commands that:
- take a long time to run:
r.sun
ort.rast.mapcalc
(e.g. a couple of years ago)- operate on moderately large, floating-point maps
- are done in parallel, either via GNU
parallel
or as implemented in the temporal suite of modules...hence the extreme difficulty in recreating the errors or further debugging.
Unfortunately, I was not able to recreate these errors with the provided test data and scripts.
I still think this is some obscure disk IO error. You could try to use
nice
, e.g.nice r.sun ...
andnice r.mapcalc ...
indaily-rad.sh
. At least this helps when running many GRASS modules in parallel on HPC systems where results are written out to one single storage device.Well thank you very much for all of your patience, patches, and testing. I'll try the
nice
option. For now, I think that I can tolerate the much lower frequency of errors after switching to LZ4 compression. Perhaps the faster speed of LZ4 lowers the probability of concurrent write operations.
I don't think so because because with LZ4, more data need to be written, which takes longer.
It is still quite puzzling that this kind of error has come up on several different machines while tracking GRASS trunk over a 10 year period. Maybe this is a subtle hint that it is time to build a new workstation...
Markus Neteler in particular spent a lot of time to fix various systems for parallel execution of GRASS commands. GRASS itself was never the problem, instead the main problem was that the multiple outputs to be written to a single storage device were too much for that storage device.
I know this is a lot to ask, but did you try testing using ZLIB compression and running it multiple times? It took a couple of tiles before I noticed the error.
I did use ZLIB compression when running the test with the data and scripts provided. Do you mean I should run the test several times with the same data?
There will be an opportunity to test these same scripts in an HPC environment over the next two months. I'll be sure and report back any findings from those tests.
An HPC environment basically consists of a job scheduler (e.g slurm) and any number of execution hosts. You can set up a minimal HPC environment on a single standard workstation. More execution hosts mean more (hardware) trouble.
follow-up: 34 comment:33 by , 7 years ago
Replying to mmetz:
Markus Neteler in particular spent a lot of time to fix various systems for parallel execution of GRASS commands. GRASS itself was never the problem, instead the main problem was that the multiple outputs to be written to a single storage device were too much for that storage device.
OK. Good to know. Are there any other diagnostics for these kind of problems, other than looking through the output from dmesg
or kernel messages? I typically run dstat
while developing parallel processing scripts, but I haven't noticed any disk-thrashing in this case.
Update Looking at disk and I/O stats at a 1 second granularity via:
dstat -m -c --top-cpu --top-bio --top-io -d -D sdd1 -r --disk-util 1
I see that the SSD (sdd1) is idle for most of the time and then spikes to 30-80% of its "disk utilization" as reported by dstat --disk-util
when several r.sun
jobs finish (?). As a comparison, making a .tgz on the SSD yields ~ 12% disk utilization.
Each instance of r.sun
is writing out ~5000 rows of data over a period of about 8 minutes, so that is 5000 rows * 8 processes * 1/8 proc per minute * 1/60 minutes per second = ~80 write operations per second (assuming rows are written as processed). It would appear (from the dstat
output) that rows are written in batches?
I know this is a lot to ask, but did you try testing using ZLIB compression and running it multiple times? It took a couple of tiles before I noticed the error.
I did use ZLIB compression when running the test with the data and scripts provided. Do you mean I should run the test several times with the same data?
Yes (please). It wasn't until I had ran a couple of tiles that I encountered errors.
follow-up: 35 comment:34 by , 7 years ago
Replying to dylan:
Replying to mmetz:
Markus Neteler in particular spent a lot of time to fix various systems for parallel execution of GRASS commands. GRASS itself was never the problem, instead the main problem was that the multiple outputs to be written to a single storage device were too much for that storage device.
OK. Good to know. Are there any other diagnostics for these kind of problems, other than looking through the output from
dmesg
or kernel messages? I typically rundstat
while developing parallel processing scripts, but I haven't noticed any disk-thrashing in this case.
Markus N might know of other places to look for problematic messages.
Update Looking at disk and I/O stats at a 1 second granularity via:
dstat -m -c --top-cpu --top-bio --top-io -d -D sdd1 -r --disk-util 1I see that the SSD (sdd1) is idle for most of the time and then spikes to 30-80% of its "disk utilization" as reported by
dstat --disk-util
when severalr.sun
jobs finish (?).
After r.sun
comes r.mapcalc
with a simple expresssion, producing output quite fast. This could be the reason for the spikes.
As a comparison, making a .tgz on the SSD yields ~ 12% disk utilization.
Each instance of
r.sun
is writing out ~5000 rows of data over a period of about 8 minutes, so that is 5000 rows * 8 processes * 1/8 proc per minute * 1/60 minutes per second = ~80 write operations per second (assuming rows are written as processed). It would appear (from thedstat
output) that rows are written in batches?
No, rows are written out one at a time. The hard drive decides when pending changes are actually written out from the disk cache to the actual disk. As before, r.mapcalc
might produce output quite fast.
I know this is a lot to ask, but did you try testing using ZLIB compression and running it multiple times? It took a couple of tiles before I noticed the error.
I did use ZLIB compression when running the test with the data and scripts provided. Do you mean I should run the test several times with the same data?
Yes (please). It wasn't until I had ran a couple of tiles that I encountered errors.
I have run the test again, letting it overwrite the results of the previous run (no prior g.remove because you stated earlier that this helps trigger the errors), and it finished successfully again.
My system will be busy with other parallel GRASS processing tasks for the next couple of days, so no capacity here to test again soon.
comment:35 by , 7 years ago
Replying to mmetz:
Replying to dylan:
I see that the SSD (sdd1) is idle for most of the time and then spikes to 30-80% of its "disk utilization" as reported by
dstat --disk-util
when severalr.sun
jobs finish (?).After
r.sun
comesr.mapcalc
with a simple expresssion, producing output quite fast. This could be the reason for the spikes.
The reason for the spikes in disk utilization is most likely r.sun
because it writes out all output at once at the end.
Coincidentally, I encountered similar problems on a different system where many GRASS commands were running in parallel. The errors were reading and decompression errors of both the data file and the null file. There were 2 hosts doing the processing, the first host running 4 jobs in parallel, the second host running 8 jobs in parallel. Reading and decompression errors happened only on the first host running only 4 jobs in parallel. The problem was that the file system used for temporary data was nearly full and that it was not optimized for synchronous IO requests, slowing down the whole system and causing these errors. I freed some space on that filesystem and optimized its configuration, now all is running ok. Not perfect, the system is still sluggish in response, but I don't get any errors again.
Therefore this ticket must IMHO be closed as wontfix because the solution is to adjust hardware, filesystems, and OS to heavy IO read/write requests.
comment:37 by , 7 years ago
Milestone: | → 7.2.4 |
---|
comment:39 by , 6 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Replying to martinl:
Still relevant? Some outcome? Thanks.
All the additional error messages indicate that this problem can't be fixed inside GRASS. Since this problem occurs with different compression methods (ZLIB, LZ4), it's also not a problem of the compressor. It's probably hardware, filesystems, and/or OS having trouble with heavy disk IO.
comment:41 by , 6 years ago
Replying to mmetz:
In 74456:
from man 2 write
:
A successful return from write() does not make any guarantee that data has been committed to disk. On some filesystems, including NFS, it does not even guarantee that space has successfully been reserved for the data. In this case, some errors might be delayed until a future write(2), fsync(2), or even close(2). The only way to be sure is to call fsync(2) after you are done writing all your data.
main testing script (bash)