Opened 12 years ago

Last modified 5 years ago

#1662 new defect

Caching bug in 3D raster library with large data

Reported by: huhabla
Owned by: grass-dev@…
Priority: critical
Milestone: 7.6.2
Component: Raster3D
Version: svn-trunk
Keywords: 3D raster, tiles, cache
Cc:
CPU: x86-32
Platform: Linux

Description

A strange bug appeared on an Intel Atom netbook while converting a list of raster maps into a 3D raster map. The operating system is a 32Bit Ubuntu Linux:

GRASS 7.0.svn (etopo_epsg4326): t.rast.to.rast3 input=temptest output=temptest
 100%
Creating 3D raster map
ERROR: Rast3d_cache_hash_remove_name: name not in hashtable
ERROR: Unable to create raster3d map <temptest>

GRASS 7.0.svn (etopo_epsg4326): g.region -p3
projection: 3 (Latitude-Longitude)
zone:       0
datum:      wgs84
ellipsoid:  wgs84
north:      90N
south:      90S
west:       180W
east:       180E
top:        20.00000000
bottom:     0.00000000
nsres:      0:02
nsres3:     0:02
ewres:      0:02
ewres3:     0:02
tbres:      1
rows:       5400
rows3:      5400
cols:       10800
cols3:      10800
depths:     20
cells:      58320000
cells3:     1166400000

I cannot reproduce this bug on my 64Bit machine, and I have no other 32Bit Linux system available. It would be great if somebody with a 32Bit Linux system could reproduce it.

Best Soeren

Change History (17)

comment:1 by neteler, 11 years ago

Could you please provide the "etopo_epsg4326" location which includes the temptest map?

comment:2 by mlennert, 11 years ago

Could you provide the data ?

Can anyone else confirm this ?

comment:3 by huhabla, 11 years ago

The test case for this error has been integrated in the raster3d unit test suite:

test.g3d.lib unit=large

The raster3d test suite must be compiled by hand. Simply switch into the lib/raster3d/test directory and type make. A new module will be created with the name "test.g3d.lib".

GRASS 7.0.svn (nc_spm_08_grass7):~/src/grass7.0/grass_trunk/lib/raster3d/test > test.g3d.lib help

Description:
 Performs unit and integration tests for the g3d library

Usage:
 test.g3d.lib [-uial] [unit=string] [integration=string] [depths=value]
   [rows=value] [cols=value] [tile_size=value] [--verbose] [--quiet]

Flags:
  -u   Run all unit tests
  -i   Run all integration tests
  -a   Run all unit and integration tests
  -l   Switch zip compression on
 --v   Verbose module output
 --q   Quiet module output

Parameters:
         unit   Choose the unit tests to run
                options: coord,putget,large
  integration   Choose the integration tests to run
                options: 
       depths   The number of depths to be used for the large file put/get value test
                default: 20
         rows   The number of rows to be used for the large file put/get value test
                default: 5400
         cols   The number of columns to be used for the large file put/get value test
                default: 10800
    tile_size   The tile size in kilo bytes to be used for the large file put/get value test. Set the tile size to 2048 and the number of row*cols*depths > 130000 to reproduce the tile rle error.
                default: 32

I have no 32Bit Linux system to test this issue, but I think it is related to the 32Bit address space of the operating system and is therefore not a bug.

comment:4 by annakrat, 11 years ago

I tested it on 32Bit Ubuntu and I get a segmentation fault after 100% is finished. Should I test something else?

comment:5 by annakrat, 11 years ago

Here is the backtrace:

#0  0xb7f7c3ac in Rast3d_cache_hash_name2index (h=0x8051f98, name=1715470336) at cachehash.c:114
#1  0xb7f7adf7 in Rast3d_cache_elt_ptr (c=0x805a0f0, name=1715470336) at cache1.c:469
#2  0xb7f7b04c in Rast3d_cache_load (c=0x805a0f0, name=1715470336) at cache1.c:517
#3  0xb7f7c08b in Rast3d_flush_all_tiles (map=0x8054028) at cache.c:310
#4  0x0804bd57 in test_large_file_random (depths=20, rows=5400, cols=10800, tile_size=32) at test_put_get_value_large_file.c:283
#5  0x0804b421 in unit_test_put_get_value_large_file (depths=20, rows=5400, cols=10800, tile_size=32) at test_put_get_value_large_file.c:39
#6  0x08049c34 in main (argc=2, argv=0xbfffedd4) at test_main.c:160

Thank you for the instructions, here they are for completeness:

gdb test.g3d.lib

r unit=large
..... segfault
bt

in reply to:  5 ; comment:6 by huhabla, 11 years ago

Many thanks for the backtrace, Anna. I was able to reproduce this behavior with my shiny "new" 32Bit Atom netbook. The problem is that the cache file accessed in cache.c line 310 gets corrupted, so that a corrupted tile index is read from the file. This index is much too large for the tile array accessed in cachehash.c line 114, hence the crash because of a memory violation.

The cache file is corrupted because it exceeds the 32Bit limit of 4GB, so the offset computed in cache.c overflows, resulting in wrong file offsets. The size of size_t and off_t is 4 bytes on my 32Bit system, so they are not suited for files larger than 4GB.
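
The overrun described above can be illustrated with a small sketch. It assumes, purely for illustration, that tiles sit back to back in the cache file, so that a tile's byte offset is its index times the tile size in bytes. The helper names are hypothetical and are not part of lib/raster3d; uint32_t stands in for a 4-byte size_t or off_t.

```c
#include <stdint.h>

/* Hypothetical helper: byte offset of a tile when the arithmetic is
 * done in 32 bits. Unsigned arithmetic wraps modulo 2^32, so large
 * indices silently map to small (wrong) offsets. */
uint32_t tile_offset_32(uint32_t tile_index, uint32_t tile_bytes)
{
    return tile_index * tile_bytes;
}

/* Same computation widened to 64 bits before multiplying, so the
 * result stays correct past the 4GB boundary. */
uint64_t tile_offset_64(uint32_t tile_index, uint32_t tile_bytes)
{
    return (uint64_t)tile_index * tile_bytes;
}
```

With 32768-byte tiles (assuming the test's default tile_size of 32 kilobytes means 32 * 1024 bytes), tile 131072 starts exactly at byte 2^32. tile_offset_32() wraps that to offset 0, which would land on top of tile 0, while tile_offset_64() yields 4294967296.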

IMHO it's a problem of the 32Bit operating system. I don't know how to solve this issue ... LFS?? Any suggestions?

Conclusion: raster3D maps with more than a billion cells in case of type float, or more than 500 million cells in case of type double are not supported on 32Bit systems.
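
The cell counts in this conclusion follow from dividing the 4GB boundary by the cell size. The sketch below works that out, assuming uncompressed data, 4-byte float and 8-byte double cells, and no header or index overhead. Both function names are hypothetical.

```c
#include <stdint.h>

/* Largest number of cells whose raw data occupies at most 2^32 bytes
 * (4GB), for a given cell size in bytes. */
uint64_t max_cells_below_4gb(uint64_t bytes_per_cell)
{
    const uint64_t limit = (uint64_t)1 << 32;
    return limit / bytes_per_cell;
}

/* Returns 1 if `cells` cells of the given size need at most 4GB,
 * 0 otherwise. */
int fits_below_4gb(uint64_t cells, uint64_t bytes_per_cell)
{
    return cells <= max_cells_below_4gb(bytes_per_cell);
}
```

This gives 1073741824 cells for float and 536870912 cells for double. The region in this ticket has cells3 = 1166400000, so even as float it needs about 4.67 billion bytes and fits_below_4gb(1166400000, 4) returns 0.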


in reply to:  6 comment:7 by mlennert, 11 years ago

Replying to huhabla:

Conclusion: raster3D maps with more than a billion cells in case of type float, or more than 500 million cells in case of type double are not supported on 32Bit systems.

So does that mean this is a wontfix bug that should go into known issues ?

Moritz

in reply to:  6 ; comment:8 by hamish, 11 years ago

Replying to huhabla:

IMHO it's a problem of the 32Bit operating system. I don't know how to solve this issue ... LFS?? Any suggestions?

I see many times "long"* is used as a variable type in lib/raster3D/, e.g.: (* and some "unsigned long" too)

index.c:    long indexLast;
index.c:    indexLast = lseek(map->data_fd, (long)0, SEEK_END);

so to make it LFS compatible on 32bit those having to do with file offsets and cell counts should be changed to e.g. off_t or use a G_*() wrapper instead? (none for lseek(), but we do have G_ftell() and G_fseek())

fixable! :)

Hamish

comment:9 by hamish, 11 years ago

lib/raster3D/

index.c:    long indexLast;
index.c:    indexLast = lseek(map->data_fd, (long)0, SEEK_END);

for the record:

NAME
       lseek - reposition read/write file offset

SYNOPSIS
       #include <sys/types.h>
       #include <unistd.h>

       off_t lseek(int fd, off_t offset, int whence);
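
For the index.c lines quoted above, an LFS-aware variant might look like the following sketch. It assumes a glibc-style system where defining _FILE_OFFSET_BITS as 64 before the first system header makes off_t 64 bits wide even on 32-bit builds. file_end_offset() is a hypothetical helper, not an existing raster3d function.

```c
/* Must come before any system header so that off_t and lseek() use
 * 64-bit offsets on 32-bit systems (assumption: glibc-style LFS). */
#define _FILE_OFFSET_BITS 64

#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: position of the end of the file, kept in off_t
 * instead of long so that it is not truncated past the 32-bit range.
 * Returns (off_t)-1 on error, like lseek() itself. */
off_t file_end_offset(int fd)
{
    return lseek(fd, (off_t)0, SEEK_END);
}
```

The point of the sketch is only the types: both the offset argument and the variable receiving the result are off_t, matching the lseek() prototype, rather than long.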

in reply to:  8 comment:10 by glynn, 11 years ago

Replying to hamish:

I see many times "long"* is used as a variable type in lib/raster3D/, e.g.: (* and some "unsigned long" too)

Historically, most of my "bulk" fixes for things like LFS skipped over the raster3D library (because the number of cases in raster3D often exceeded those in the rest of the GRASS code base combined).

so to make it LFS compatible on 32bit those having to do with file offsets and cell counts should be changed to e.g. off_t or use a G_*() wrapper instead? (none for lseek(), but we do have G_ftell() and G_fseek())

File offsets should use off_t.

Cell counts ... are a problem. "long long" and "int64_t" aren't in C89, "long" is only 32 bits on 64-bit Windows, size_t is unsigned.

Windows itself doesn't have off_t (or fseeko/ftello). The POSIX functions in MSVCRT use either int (e.g. read, write) or long (e.g. lseek). Some of them have 64-bit variants using __int64 (e.g. _lseeki64, although the name has changed between MSVCRT versions).
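
A thin wrapper is one way the platform difference described above could be hidden from callers. The sketch below is an assumption-laden illustration, not a proposal for the actual GRASS API: seek64() and file_off64 are hypothetical names, the Windows branch assumes _lseeki64() is declared in <io.h> under that name in the C runtime being used, and the POSIX branch assumes _FILE_OFFSET_BITS is honoured.

```c
#if defined(_WIN32)
#include <io.h>                 /* _lseeki64() (name assumed, see above) */
typedef __int64 file_off64;     /* hypothetical 64-bit offset type */
#else
#define _FILE_OFFSET_BITS 64    /* request a 64-bit off_t on 32-bit POSIX */
#include <sys/types.h>
#include <unistd.h>
typedef off_t file_off64;       /* hypothetical 64-bit offset type */
#endif
#include <stdio.h>              /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical wrapper: reposition a file descriptor using a 64-bit
 * offset on both platforms. Returns the new offset, or -1 on error. */
file_off64 seek64(int fd, file_off64 offset, int whence)
{
#if defined(_WIN32)
    return _lseeki64(fd, offset, whence);
#else
    return lseek(fd, offset, whence);
#endif
}
```

Callers would then store offsets in file_off64 rather than long. This does not address the separate question of which type to use for cell counts.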

comment:11 by neteler, 8 years ago

Milestone: 7.0.0 → 7.0.3

comment:12 by neteler, 8 years ago

Milestone: 7.0.3

Ticket retargeted after milestone closed

comment:13 by neteler, 8 years ago

Milestone: 7.0.4

Ticket retargeted after 7.0.3 milestone closed

comment:14 by martinl, 8 years ago

Milestone: 7.0.4 → 7.0.5

comment:15 by neteler, 8 years ago

Milestone: 7.0.5 → 7.0.6

comment:16 by neteler, 6 years ago

Milestone: 7.0.6 → 7.0.7

comment:17 by martinl, 5 years ago

Milestone: 7.0.7 → 7.6.2