Opened 12 years ago

Closed 12 years ago

#4815 closed defect (fixed)

gdal_retile.py segmentation fault

Reported by: ajuvolkov Owned by: warmerdam
Priority: normal Milestone: 1.9.2
Component: GDAL_Raster Version: 1.9.1
Severity: normal Keywords: vrt
Cc:

Description

Hello, I got a segmentation fault while tiling a large ASTER GDEM dataset with gdal_retile.py.

Call stack from the core dump file:

#0  __memset_sse2 () at ../sysdeps/x86_64/multiarch/../memset.S:65 
#1  0x00007fda74904bff in VRTSourcedRasterBand::IRasterIO(GDALRWFlag, int, int, int, int, void*, int, int, GDALDataType, int, int) () from /usr/lib/libgdal.so.1 
#2  0x00007fda7494581a in GDALProxyRasterBand::IRasterIO(GDALRWFlag, int, int, int, int, void*, int, int, GDALDataType, int, int) () from /usr/lib/libgdal.so.1 
#3  0x00007fda7490913a in VRTComplexSource::RasterIO(int, int, int, int, void*, int, int, GDALDataType, int, int) () from /usr/lib/libgdal.so.1 
#4  0x00007fda74904997 in VRTSourcedRasterBand::IRasterIO(GDALRWFlag, int, int, int, int, void*, int, int, GDALDataType, int, int) () from /usr/lib/libgdal.so.1 
#5  0x00007fda74934113 in GDALDataset::IRasterIO(GDALRWFlag, int, int, int, int, void*, int, int, GDALDataType, int, int*, int, int, int) () from /usr/lib/libgdal.so.1 
#6  0x00007fda74934909 in GDALDataset::RasterIO(GDALRWFlag, int, int, int, int, void*, int, int, GDALDataType, int, int*, int, int, int) () from /usr/lib/libgdal.so.1 
#7  0x00007fda749af4e2 in GDALWarpOperation::WarpRegionToBuffer(int, int, int, int, void*, GDALDataType, int, int, int, int, double, double) () from /usr/lib/libgdal.so.1 
#8  0x00007fda7490ac87 in VRTWarpedDataset::ProcessBlock(int, int) () from /usr/lib/libgdal.so.1 
#9  0x00007fda7490b0c1 in VRTWarpedRasterBand::IReadBlock(int, int, void*) () from /usr/lib/libgdal.so.1 
#10 0x00007fda7494bbf5 in GDALRasterBand::GetLockedBlockRef(int, int, int) () from /usr/lib/libgdal.so.1 
#11 0x00007fda74958f52 in GDALRasterBand::IRasterIO(GDALRWFlag, int, int, int, int, void*, int, int, GDALDataType, int, int) () from /usr/lib/libgdal.so.1 
#12 0x00007fda750fb97c in ?? () from /usr/lib/python2.7/dist-packages/osgeo/_gdal.so 
#13 0x0000000000497ea4 in PyEval_EvalFrameEx () 
#14 0x000000000049f1c0 in PyEval_EvalCodeEx () 
#15 0x00000000004983b8 in PyEval_EvalFrameEx () 
#16 0x0000000000498602 in PyEval_EvalFrameEx () 
#17 0x0000000000498602 in PyEval_EvalFrameEx () 
#18 0x0000000000498602 in PyEval_EvalFrameEx () 
#19 0x000000000049f1c0 in PyEval_EvalCodeEx () 
#20 0x00000000004983b8 in PyEval_EvalFrameEx () 
#21 0x000000000049f1c0 in PyEval_EvalCodeEx () 
#22 0x00000000004a9081 in PyRun_FileExFlags () 
#23 0x00000000004a9311 in PyRun_SimpleFileExFlags () 
#24 0x00000000004aa8bd in Py_Main () 
#25 0x00007fda7597176d in __libc_start_main (main=0x41b980 <main>, argc=14, ubp_av=0x7fff89365758, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff89365748) at libc-start.c:226 
#26 0x000000000041b9b1 in _start ()

I have a core dump file. Its size is 35 MB. I can send it to you if necessary.

The GDAL version is 1.9.1, released 2012/05/15, from the repository http://ppa.launchpad.net/ubuntugis/ubuntugis-unstable/ubuntu

Attachments (1)

gdal_crash_example.tar.gz (328.0 KB ) - added by ajuvolkov 12 years ago.


Change History (12)

comment:1 by Even Rouault, 12 years ago

The core dump file might not be the most convenient way of debugging this; a link to the source ASTER GDEM dataset and the exact command line you've used would be better.

comment:2 by ajuvolkov, 12 years ago

ASTER GDEM is a very large dataset (about 250 GB), so downloading the full dataset is difficult. You can read this paper http://www.geos.ed.ac.uk/homes/mattal/ASTER.pdf to learn how to download it. I will try to reproduce the crash and get you more information about it.

comment:3 by ajuvolkov, 12 years ago

I think this problem is related to running out of memory. Sometimes I got out-of-memory messages, so I increased the swap on my machine. After that I began to get the segmentation fault. The crash occurred in the memset call; maybe it received a null pointer from malloc?

comment:4 by Even Rouault, 12 years ago

At a minimum, GDAL shouldn't crash in an out-of-memory situation; it should error out more gracefully. Generally that's what is done, but there might be a missing check somewhere.

Another possibility is that, on Linux, due to "memory overcommit" (see http://opsmonkey.blogspot.fr/2007/01/linux-memory-overcommit.html ), malloc() might succeed, but when the memory pages are actually accessed, for example by memset(), the kernel realizes that it cannot honour the allocation and kills the process. That's just a hypothesis: generally you would observe heavy swapping and the machine becoming sluggish first.

To reproduce this, I don't think I need the real dataset, just something with the same characteristics. So if you attach the full output of gdalinfo on your input dataset, that should be enough to recreate a fake one with similar characteristics. Please include the exact gdalwarp command line you are using too.

comment:5 by ajuvolkov, 12 years ago

Thank you for the interesting link. I had never heard of that kind of memory management on Linux. I think this is a similar case. I have a machine with 6 GB of memory and 4 CPU cores, and I run 4 processes of gdal_retile.py to increase the computation speed. Sometimes the system goes into extremely heavy swap usage, about 5 GB of extra memory.

It would be hard to reproduce my dataset exactly, but I will try to explain.

So, about my dataset.

I have a complex dataset covering the whole Earth. I combined 3 datasets (Blue Marble, SRTM3 v4, ASTER GDEM) using the VRT complex source mechanism:

1) Blue Marble is a small dataset with 500 meters per pixel resolution (at the equator) covering the whole Earth. gdalinfo:

Driver: VRT/Virtual Raster
Files: dataset.vrt
       dataset_int16.vrt
Size is 86400, 43200
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        TOWGS84[0,0,0,0,0,0,0],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9108"]],
    AUTHORITY["EPSG","4326"]]
Origin = (-180.000000000000000,90.000000000000000)
Pixel Size = (0.004166667000000,-0.004166667000000)
Corner Coordinates:
Upper Left  (-180.0000000,  90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
Lower Left  (-180.0000000, -90.0000144) (180d 0' 0.00"W, 90d 0' 0.05"S)
Upper Right ( 180.0000288,  90.0000000) (180d 0' 0.10"E, 90d 0' 0.00"N)
Lower Right ( 180.0000288, -90.0000144) (180d 0' 0.10"E, 90d 0' 0.05"S)
Center      (   0.0000144,  -0.0000072) (  0d 0' 0.05"E,  0d 0' 0.03"S)
Band 1 Block=128x128 Type=UInt16, ColorInterp=Gray
  NoData Value=0

2) SRTM3 v4 is a dataset with 90 meters per pixel resolution covering the area between 60 degrees north latitude and 54 degrees south latitude (872 GeoTIFF files, each ~72 MB). gdalinfo:

Driver: VRT/Virtual Raster
Files: dataset.vrt
       int16_dataset.vrt
Size is 432000, 144000
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (-180.000000000000000,60.000000000000000)
Pixel Size = (0.000833333333333,-0.000833333333333)
Corner Coordinates:
Upper Left  (-180.0000000,  60.0000000) (180d 0' 0.00"W, 60d 0' 0.00"N)
Lower Left  (-180.0000000, -60.0000000) (180d 0' 0.00"W, 60d 0' 0.00"S)
Upper Right ( 180.0000000,  60.0000000) (180d 0' 0.00"E, 60d 0' 0.00"N)
Lower Right ( 180.0000000, -60.0000000) (180d 0' 0.00"E, 60d 0' 0.00"S)
Center      (   0.0000000,  -0.0000000) (  0d 0' 0.01"E,  0d 0' 0.00"S)
Band 1 Block=128x128 Type=UInt16, ColorInterp=Gray
  NoData Value=0

3) ASTER GDEM has 30 meters per pixel resolution (22700 GeoTIFF files, each ~25 MB). gdalinfo:

Driver: VRT/Virtual Raster
Files: dataset.vrt
       int16_dataset.vrt
Size is 1296001, 597601
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (-180.000138888888898,83.000138888888884)
Pixel Size = (0.000277777777778,-0.000277777777778)
Corner Coordinates:
Upper Left  (-180.0001389,  83.0001389) (180d 0' 0.50"W, 83d 0' 0.50"N)
Lower Left  (-180.0001389, -83.0001389) (180d 0' 0.50"W, 83d 0' 0.50"S)
Upper Right ( 180.0001389,  83.0001389) (180d 0' 0.50"E, 83d 0' 0.50"N)
Lower Right ( 180.0001389, -83.0001389) (180d 0' 0.50"E, 83d 0' 0.50"S)
Center      (   0.0000000,   0.0000000) (  0d 0' 0.01"E,  0d 0' 0.01"N)
Band 1 Block=128x128 Type=UInt16, ColorInterp=Gray
  NoData Value=22769

I combined these datasets to eliminate the holes (nodata regions) in them.

Each dataset consists of many files. I built a list of these files for each dataset and called gdalbuildvrt -input_file_list to build 3 VRT files, one presenting each dataset.

Without going into details, I got these files:

datasets/bm/dataset.vrt
datasets/srtm3-v4/dataset.vrt
datasets/aster-gdem/dataset.vrt

Then I built a text file 'complex_source_list.txt' with the content:

datasets/bm/dataset.vrt
datasets/srtm3-v4/dataset.vrt
datasets/aster-gdem/dataset.vrt

and called gdalbuildvrt to get the resulting dataset VRT. Nodata regions are filled in reverse order of 'complex_source_list.txt'. gdalinfo:

Files: __harvester_tmp__/dataset.vrt
       datasets/bm/dataset.vrt
       datasets/srtm3-v4/dataset.vrt
       datasets/aster-gdem/dataset.vrt
Size is 1296001, 648000
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        TOWGS84[0,0,0,0,0,0,0],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9108"]],
    AUTHORITY["EPSG","4326"]]
Origin = (-180.000138888888898,90.000000000000000)
Pixel Size = (0.000277777777778,-0.000277777777778)
Corner Coordinates:
Upper Left  (-180.0001389,  90.0000000) (180d 0' 0.50"W, 90d 0' 0.00"N)
Lower Left  (-180.0001389, -90.0000000) (180d 0' 0.50"W, 90d 0' 0.00"S)
Upper Right ( 180.0001389,  90.0000000) (180d 0' 0.50"E, 90d 0' 0.00"N)
Lower Right ( 180.0001389, -90.0000000) (180d 0' 0.50"E, 90d 0' 0.00"S)
Center      (   0.0000000,   0.0000000) (  0d 0' 0.01"E,  0d 0' 0.01"N)
Band 1 Block=128x128 Type=UInt16, ColorInterp=Gray
  NoData Value=0
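The list-then-gdalbuildvrt step described above can be sketched as follows (paths and the `__harvester_tmp__` directory are taken from this comment; the gdalbuildvrt command is only composed and printed, not executed, since the source VRTs are not attached to the ticket):

```python
from pathlib import Path

# The three per-dataset VRTs from comment 5, in the order the reporter
# used; per the reporter, nodata regions in the result are filled from
# the sources in reverse order of this list.
sources = [
    "datasets/bm/dataset.vrt",
    "datasets/srtm3-v4/dataset.vrt",
    "datasets/aster-gdem/dataset.vrt",
]

tmp = Path("__harvester_tmp__")
tmp.mkdir(exist_ok=True)
list_file = tmp / "complex_source_list.txt"
list_file.write_text("\n".join(sources) + "\n")

# Compose (but do not run) the gdalbuildvrt call from the ticket.
cmd = ["gdalbuildvrt", "-input_file_list", str(list_file),
       str(tmp / "dataset.vrt")]
print("would run:", " ".join(cmd))
```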

I need to project Earth onto cube faces (http://www.progonos.com/furuti/MapProj/Normal/ProjPoly/Foldout/Cube/cube.html).

I call gdalwarp to get a gnomonic projection (for example, for the center at latitude 90N, longitude 0):

gdalwarp -wm 2000  -srcnodata -32768 -dstnodata 0 -q -te -6378137 -6378137 6378137 6378137 -t_srs "+proj=gnom  +lon_0=0 +lat_0=90 +ellps=WGS84 +datum=WGS84" -ts 195585 195585 -wo SOURCE_EXTRA=100 -wo SAMPLE_GRID=YES -r cubic -of VRT -overwrite "__harvester_tmp__\dataset.vrt" "__harvester_tmp__\cube_face.vrt"

gdalinfo:

Driver: VRT/Virtual Raster
Files: __harvester_tmp__/cube_face.vrt
       __harvester_tmp__/dataset.vrt
Size is 195585, 195585
Coordinate System is:
PROJCS["unnamed",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            TOWGS84[0,0,0,0,0,0,0],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9108"]],
        AUTHORITY["EPSG","4326"]],
    PROJECTION["Gnomonic"],
    PARAMETER["latitude_of_origin",90],
    PARAMETER["central_meridian",0],
    PARAMETER["false_easting",0],
    PARAMETER["false_northing",0]]
Origin = (-6378137.000000000000000,6378137.000000000000000)
Pixel Size = (65.221126364496257,-65.221126364496257)
Corner Coordinates:
Upper Left  (-6378137.000, 6378137.000) (135d 0' 0.00"W, 35d15'51.80"N)
Lower Left  (-6378137.000,-6378137.000) ( 45d 0' 0.00"W, 35d15'51.80"N)
Upper Right ( 6378137.000, 6378137.000) (135d 0' 0.00"E, 35d15'51.80"N)
Lower Right ( 6378137.000,-6378137.000) ( 45d 0' 0.00"E, 35d15'51.80"N)
Center      (   0.0000000,   0.0000000) (  0d 0' 0.01"E, 90d 0' 0.00"N)
Band 1 Block=512x128 Type=UInt16, ColorInterp=Gray
  NoData Value=0

Then I call gdal_retile.py to split this cube face into tiles (255x255 pixels).

I see strange behaviour from GDAL. If I work with the projection centered at (latitude 0, longitude 0), it works very fast and there is no extreme memory usage. If I work with the projection centered at (latitude 0, longitude 90), (latitude 0, longitude 180) or (latitude 90, longitude 0), it works very slowly and the system goes into extremely heavy swap usage. How can you explain it? I can explain the extra memory usage in the (latitude 90, longitude 0) case: it may occur because GDAL works with polar regions and needs to process very many source tiles in regions closer to the pole. I don't understand the extra memory usage at (latitude 0, longitude 90) compared to (latitude 0, longitude 0). Maybe the reason is in my VRT files? I built them by walking over the files on the hard drive. Should I put them in a specific order? How can the order of files influence GDAL's behaviour?

comment:6 by ajuvolkov, 12 years ago

I was able to reproduce the crash without the dataset files (only the VRT files are required). I think the "memory overcommit" has nothing to do with this problem. Run the run.sh file in the attached archive gdal_crash_example.tar.gz. Note that the gdal_retile.py there isn't the original script; it has some small changes for my convenience.

by ajuvolkov, 12 years ago

Attachment: gdal_crash_example.tar.gz added

comment:7 by Even Rouault, 12 years ago

trunk r24939 frmts/vrt/vrtsourcedrasterband.cpp -m "VRT: avoid crashes due to int overflow when dealing with requests filling a buffer larger than 2 GB (#4815)"

branches/1.9 r24940 frmts/vrt/vrtsourcedrasterband.cpp -m "VRT: avoid crashes due to int overflow when dealing with requests filling a buffer larger than 2 GB (#4815)"

This fixes the bug that causes the crash. (Thanks, by the way, for making it so easy to reproduce!)

The large memory use is a different issue, much more complex to analyze. At first sight, I'd suspect that gdalwarp'ing into a VRT is less clever than gdalwarp'ing into a TIFF, because it does not split requests that need a large area of the source dataset into smaller requests. Perhaps you could try warping into a TIFF instead of a VRT to see if it improves the situation. But in some situations, due to the particularities of the source and target SRS, computing a small target window might actually require a large amount of source data to be processed, as you correctly guessed.
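For scale, the "buffer larger than 2 GB" mentioned in the fix is easy to check with the numbers from this ticket: a single request covering the whole 195585x195585 UInt16 cube face (whether such a request is exactly what gdal_retile issued is an assumption; the principle holds for any request over 2 GB) needs a byte count that wraps a signed 32-bit int:

```python
# Byte count for a full 195585 x 195585 UInt16 buffer (sizes from this ticket).
n_pixels = 195585 * 195585      # 38_253_492_225 pixels
n_bytes = n_pixels * 2          # UInt16 is 2 bytes: 76_506_984_450 bytes, ~71 GiB

# What a signed 32-bit size computation keeps of that count:
wrapped = n_bytes & 0xFFFFFFFF
if wrapped >= 2**31:
    wrapped -= 2**32            # reinterpret the low 32 bits as signed

print(n_bytes)   # 76506984450
print(wrapped)   # -802426878: a negative size reaching memset() crashes
```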

comment:8 by ajuvolkov, 12 years ago

Thank you for the quick fix. I have a question about the integer overflow. I used the flag "-wm 2000" to limit the amount of memory that the warp API is allowed to use for caching. Why does it not prevent buffer allocations larger than 2 GB?

in reply to:  8 comment:9 by Even Rouault, 12 years ago

Replying to ajuvolkov:

I used the flag "-wm 2000" to limit the amount of memory that the warp API is allowed to use for caching. Why does it not prevent buffer allocations larger than 2 GB?

See my previous comment: when retrieving pixels from a VRT generated by gdalwarp -of VRT, the code path taken isn't exactly the same as the one taken by gdalwarp to a GeoTIFF. In particular, its logic seems to ignore the warping memory limit. Hence my suggestion to warp to a "concrete" format such as TIFF instead of VRT as a workaround.
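A sketch of this workaround, reusing the gdalwarp options from comment 5 but targeting GeoTIFF; the `-co BIGTIFF=YES` and `-co TILED=YES` creation options are my additions, needed on the assumption that a 195585x195585 UInt16 raster far exceeds the 4 GB classic-TIFF limit. The command is only printed here, not executed:

```python
import shlex

# gdalwarp command from comment 5, with "-of VRT" replaced by a concrete
# GeoTIFF target so the -wm working-memory limit is honoured.
cmd = [
    "gdalwarp", "-wm", "2000",
    "-srcnodata", "-32768", "-dstnodata", "0", "-q",
    "-te", "-6378137", "-6378137", "6378137", "6378137",
    "-t_srs", "+proj=gnom +lon_0=0 +lat_0=90 +ellps=WGS84 +datum=WGS84",
    "-ts", "195585", "195585",
    "-wo", "SOURCE_EXTRA=100", "-wo", "SAMPLE_GRID=YES",
    "-r", "cubic",
    "-of", "GTiff", "-co", "BIGTIFF=YES", "-co", "TILED=YES",  # assumed options
    "-overwrite",
    "__harvester_tmp__/dataset.vrt", "__harvester_tmp__/cube_face.tif",
]
print("would run:", shlex.join(cmd))
```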

comment:10 by ajuvolkov, 12 years ago

Thank you for the reply. I have a question about gdalwarp'ing into a TIFF. Imagine that we have a dataset in the Plate Carrée projection, divided into tiles. Say I want to make a gnomonic projection centered at lat=0, lon=0 and put it on the cube face. I take the tiles that cover the region lat: -45...45 degrees, lon: -45...45 degrees. For example, we get a tile set of 90x90 tiles (tile size is 1 degree). I reproject each tile separately into the gnomonic projection and obtain a 90x90 tile set. The new tiles will have nodata regions, because the source tiles are in the Plate Carrée projection. Would there be any problems with those nodata regions? I mean, if I build a VRT dataset from the new tiles and run gdal_retile on it, there will possibly be gaps between adjacent tiles. I am also afraid that some edge pixels could be discarded or won't stitch seamlessly with the edge pixels of neighboring tiles (while in Plate Carrée they stitch perfectly).

comment:11 by Even Rouault, 12 years ago

Component: Utilities → GDAL_Raster
Keywords: vrt added
Milestone: → 1.9.2
Resolution: → fixed
Status: new → closed

I'd like to keep this ticket focused on a specific issue, and in fact I think it has been addressed by the fix, so I am closing it. (You could open a new ticket about the large memory usage when using a warped VRT vs. regular warping.)

Your last question belongs on the gdal-dev mailing list, where other people can perhaps answer it.
