The gdalwarp utility
gdalwarp is an image mosaicing, reprojection and warping utility. The program can reproject to any supported projection, and can also apply GCPs stored with the image if the image is "raw" with control information.
The official documentation for the gdalwarp utility is http://www.gdal.org/gdalwarp.html.
- What if nodata is different in each image?
- Will increasing RAM increase the speed of gdalwarp?
- How does -et Error Threshold work?
- [GeoTIFF output] -co COMPRESS= is broken!
- [GeoTIFF output] Use -co TILED=YES when possible
What if nodata is different in each image?
If the nodata tag is set in the GeoTIFF header, gdalwarp will use it automatically; you don't need to do anything. However, -srcnodata overrides this, so if you are handling a batch of images with different values to be ignored you need to either a) pre-process them to have the same to-be-ignored value, or b) set the nodata value for each file. Use (b) if you need to preserve the original values for some reason, for example:
# for this image we want to ignore black (0)
gdalwarp -srcnodata 0 -dstnodata 0 orig-ignore-black.tif black-nodata.tif
# and now we want to ignore white (255)
gdalwarp -srcnodata 255 -dstnodata 255 orig-ignore-white.tif white-nodata.tif
# and finally ignore a particular blue-grey (RGB 125 125 150)
gdalwarp -srcnodata "125 125 150" -dstnodata "125 125 150" orig-ignore-grey.tif grey-nodata.tif
# now we can mosaic them all and not worry about nodata parameters
gdalwarp -dstnodata 0 black-nodata.tif grey-nodata.tif white-nodata.tif final-mosaic.tif
If you are wondering why -dstnodata is there: although gdalwarp automatically honours input nodata, before GDAL 1.11 it did not carry that value through to the output unless instructed to.
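Option (b) can also be scripted when there are many images. The following is a minimal Python sketch, not part of GDAL: the helper name `nodata_commands` and the "-nodata.tif" output naming are hypothetical, and the only gdalwarp flags used are the -srcnodata and -dstnodata flags shown above. It builds the argument lists and leaves running them to the caller.

```python
def nodata_commands(images, mosaic, mosaic_nodata="0"):
    """Build gdalwarp argument lists: one per image, then one for the mosaic.

    images maps a source file name to the value (or "R G B" triple) to ignore.
    The "<name>-nodata.tif" output naming is an assumption of this sketch.
    """
    commands, tagged = [], []
    for src, nodata in images.items():
        dst = src.rsplit(".", 1)[0] + "-nodata.tif"
        # Tag this image with its own to-be-ignored value.
        commands.append(
            ["gdalwarp", "-srcnodata", nodata, "-dstnodata", nodata, src, dst]
        )
        tagged.append(dst)
    # The tagged files now carry their nodata value, so the mosaic step
    # only needs to name the output nodata value.
    commands.append(["gdalwarp", "-dstnodata", mosaic_nodata] + tagged + [mosaic])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order; passing arguments as a list keeps a triple such as "125 125 150" together as one argument without shell quoting.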
Will increasing RAM increase the speed of gdalwarp?
Adding RAM will almost certainly increase the speed. That is not the same as saying that it is worth it, or that the speed increase will be significant. Disks are the slowest part of the process.
By default gdalwarp won't take much advantage of RAM. Using the flag "-wm 500" will operate on 500MB chunks at a time, which is better than the default. Increasing the I/O block cache size may also help. This can be done on the command line:
gdalwarp --config GDAL_CACHEMAX 500 -wm 500 ...
This uses 500MB of RAM for read/write caching, and 500MB of RAM for working buffers during the warp. Beyond that it is doubtful more memory will make a substantial difference.
Check CPU usage while gdalwarp is running. If it is substantially less than 100% then you know things are IO bound. Otherwise they are CPU bound.
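The CPU check can be approximated from a script. This is a rough sketch for Unix-like systems only (it relies on Python's `resource` module); the helper name `cpu_utilisation` is hypothetical. It runs a command and divides the CPU time the child process consumed by the elapsed wall-clock time.

```python
import resource
import subprocess
import time


def cpu_utilisation(argv):
    """Run argv and return (child CPU seconds) / (elapsed seconds)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    elapsed = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User plus system CPU time spent by the child that just finished.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return cpu / elapsed
```

For example, `cpu_utilisation(["gdalwarp", "in.tif", "out.tif"])` (file names are placeholders) returning a value close to 1.0 suggests a CPU-bound run, while a value well below 1.0 suggests the process spent most of its time waiting on I/O.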
Caution: increasing the value of the -wm parameter may lead to loss of performance in certain circumstances, particularly when warping *many* small rasters into a big mosaic. See http://trac.osgeo.org/gdal/ticket/3120 for more details.
Error allocating memory
In some cases using --config GDAL_CACHEMAX xxx -wm xxx can be detrimental. A symptom of this is:
ERROR 2: Out of memory allocating 365425784 byte destination buffer.
Our understanding is that 32-bit processes are frequently subject to memory fragmentation problems, so even in a process with, in theory, 2GB of heap space available it is often difficult to allocate large blocks of memory. In this case, with -wm 500 set, the main warp buffers can be quite large (about 365MB in the example) and the allocation of the buffer has failed. Use more modest buffers (or do not use the options at all), or else get a 64-bit executable for gdalwarp.
Warp and Cache Memory: Technical Details
GDAL_CACHEMAX affects the amount of space reserved for the low-level I/O buffers. When blocks are read from disk, or written to disk, they are cached in a global block cache by the GDALRasterBlock class. Once this cache exceeds GDAL_CACHEMAX, old blocks are flushed from the cache.
You can think of this as primarily an IO cache, and it mostly benefits you when you might need to read or write file blocks more than once. This could occur, for instance, in a scanline oriented input file which is processed in multiple chunks (horizontally) by gdalwarp.
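The effect of an undersized block cache on a multi-pass read can be illustrated with a toy simulation. This sketch is not GDAL code: it assumes least-recently-used eviction and counts whole blocks rather than bytes, and the function name `count_block_reads` and the 32-block, 7-pass scenario are made up for illustration.

```python
from collections import OrderedDict


def count_block_reads(accesses, cache_blocks):
    """Count disk reads for a sequence of block accesses with an LRU cache."""
    cache = OrderedDict()
    reads = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)  # cache hit: mark as most recently used
            continue
        reads += 1  # cache miss: the block has to be read from disk
        cache[block] = True
        if len(cache) > cache_blocks:
            cache.popitem(last=False)  # evict the least recently used block
    return reads


# A file of 32 scanline blocks, swept top to bottom once per horizontal chunk.
passes = list(range(32)) * 7
print(count_block_reads(passes, cache_blocks=32))  # every block read once
print(count_block_reads(passes, cache_blocks=16))  # every access misses
```

With room for all 32 blocks each block is read once; with room for only 16, each block has been evicted by the time the next sweep returns to it, so every pass re-reads the whole file.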
The -wm flag affects the warping algorithm. The warper totals up the memory required to hold the input and output image arrays and any auxiliary masking arrays, and if the total is larger than the "warp memory" allowed it subdivides the chunk into smaller chunks and tries again.
If the -wm value is very small there is some extra overhead in doing many small chunks, so setting it larger is better, but it is a matter of diminishing returns.
Overall, there is only so much more memory can do for you.
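The subdivision driven by -wm can be sketched as a recursive split. This is a simplified illustration, not the warper's actual code: the memory estimate assumes the source and destination windows are the same size and ignores mask arrays, and halving along the longer side is an assumption of this sketch. The function names are hypothetical.

```python
def estimate_bytes(width, height, bands=3, bytes_per_sample=1):
    """Rough cost of one chunk: one source plus one destination array.

    Assumption: a real warp needs a differently sized source window and
    possibly mask arrays as well.
    """
    return 2 * width * height * bands * bytes_per_sample


def split_chunks(xoff, yoff, width, height, warp_memory, bands=3, bytes_per_sample=1):
    """Return (xoff, yoff, width, height) windows that each fit in warp_memory."""
    fits = estimate_bytes(width, height, bands, bytes_per_sample) <= warp_memory
    if fits or (width == 1 and height == 1):
        return [(xoff, yoff, width, height)]
    if width >= height:
        # Halve the wider dimension and try each half again.
        half = width // 2
        return (split_chunks(xoff, yoff, half, height, warp_memory, bands, bytes_per_sample)
                + split_chunks(xoff + half, yoff, width - half, height, warp_memory,
                               bands, bytes_per_sample))
    half = height // 2
    return (split_chunks(xoff, yoff, width, half, warp_memory, bands, bytes_per_sample)
            + split_chunks(xoff, yoff + half, width, height - half, warp_memory,
                           bands, bytes_per_sample))
```

Calling `split_chunks(0, 0, 1024, 1024, warp_memory)` with a budget that only fits a 512x512 window yields four chunks; a larger budget yields fewer, larger chunks, which is where the diminishing returns come from.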
To get a better sense of how things are working, you might want to try running in debug mode. For instance:
gdalwarp --debug on abc.tif def.tif
When things shut down you may see messages like:
GDAL: 224 block reads on 32 block band 1 of utm.tif.
In this case it is saying that band 1 of utm.tif has 32 blocks, but that 224 block reads were done, implying that a lot of data had to be re-read, presumably because of a limited I/O cache.
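When the debug output is long, the re-read factor can be pulled out with a small script. This sketch assumes the message format is exactly as in the example line above; the function name `reread_factor` is hypothetical.

```python
import re

# Matches lines such as: GDAL: 224 block reads on 32 block band 1 of utm.tif.
BLOCK_READS = re.compile(
    r"GDAL: (\d+) block reads on (\d+) block band (\d+) of (.+)\.\s*$"
)


def reread_factor(line):
    """Return (file name, band, reads per block), or None for other lines."""
    match = BLOCK_READS.search(line)
    if match is None:
        return None
    reads, blocks, band, name = match.groups()
    return name, int(band), int(reads) / int(blocks)
```

A factor close to 1 means each block was read about once; a much larger factor points to blocks being evicted from the cache and read again.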
You will also see messages like:
GDAL: GDALWarpKernel()::GWKNearestNoMasksByte() Src=0,0,512x512 Dst=0,0,512x512
The Src/Dst windows show you the "chunk size" being used. In this case it is the whole image, which is very small. If you find things are being broken into a lot of chunks, increasing -wm may help somewhat.
But far more important than memory is ensuring you are going through an optimized path in the warper. If you ever see it reporting GDALWarpKernel()::GWKGeneralCase() you know things will be relatively slow. Basically, the fastest situations are nearest neighbour resampling on 8-bit data with no nodata or alpha masking in effect.
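The kernel messages can be summarised the same way, to see how many chunks were processed and whether any went through the general case. This sketch assumes the message layout matches the example line above; the function name `kernel_chunks` is hypothetical.

```python
import re

# Matches: GDALWarpKernel()::<kernel>() Src=x,y,WxH Dst=x,y,WxH
KERNEL = re.compile(
    r"GDALWarpKernel\(\)::(\w+)\(\)\s+"
    r"Src=\d+,\d+,(\d+)x(\d+)\s+Dst=\d+,\d+,(\d+)x(\d+)"
)


def kernel_chunks(lines):
    """Return one (kernel name, source size, destination size) per chunk line."""
    chunks = []
    for line in lines:
        match = KERNEL.search(line)
        if match:
            name, sw, sh, dw, dh = match.groups()
            chunks.append((name, (int(sw), int(sh)), (int(dw), int(dh))))
    return chunks
```

The length of the returned list is the number of chunks, and any entry whose kernel name is "GWKGeneralCase" marks a chunk that took the slow path.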
How does -et Error Threshold work?
By default gdalwarp uses a linear approximator for the transformations, with a permitted error of 0.125 pixels. This basically transforms three points on a scanline: the start, end and middle. Then it compares the linear approximation of the center based on the end points to the real thing and checks the error. If the error is less than the error threshold then the remaining points are approximated (in two chunks utilizing the center point). The error threshold (in pixels) can be controlled with the gdalwarp -et switch.
So if you want a true pixel-by-pixel reprojection for comparison, use -et 0, which disables this approximator entirely.
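The three-point test described above can be sketched in one dimension. This is an illustration under stated assumptions, not GDAL's implementation: it handles a single coordinate instead of x/y pairs, it assumes the scanline is split in half and retried when the error is too large, and the name `approx_transform` and the sample values are hypothetical.

```python
def approx_transform(transform, xs, max_error):
    """Transform the increasing positions xs, interpolating where it is safe.

    transform is the exact (expensive) function; max_error is the allowed
    difference, at the middle sample, between interpolated and exact results.
    """
    n = len(xs)
    if n <= 2 or max_error <= 0:
        # A threshold of 0 disables the approximation entirely.
        return [transform(x) for x in xs]
    mid = n // 2
    first, middle, last = transform(xs[0]), transform(xs[mid]), transform(xs[-1])
    # Linear guess for the middle sample, based on the two end points only.
    t = (xs[mid] - xs[0]) / (xs[-1] - xs[0])
    guess = first + t * (last - first)
    if abs(guess - middle) > max_error:
        # Too inaccurate: handle each half on its own (assumed behaviour).
        left = approx_transform(transform, xs[:mid + 1], max_error)
        right = approx_transform(transform, xs[mid:], max_error)
        return left + right[1:]
    # Accurate enough: interpolate in two pieces that share the exact middle.
    out = []
    for i, x in enumerate(xs):
        if i <= mid:
            a, b, fa, fb = xs[0], xs[mid], first, middle
        else:
            a, b, fa, fb = xs[mid], xs[-1], middle, last
        out.append(fa + (x - a) / (b - a) * (fb - fa))
    return out
```

For a transformation that is exactly linear along the scanline only three exact evaluations are needed, however many pixels there are; the more curved the transformation, the more often the scanline is split and the smaller the saving.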
[GeoTIFF output] -co COMPRESS= is broken!
When I compress an image with gdalwarp the result is often many times larger than the original!
By default gdalwarp operates on chunks that are not necessarily aligned with the boundaries of the blocks/tiles/strips of the output format, so this might cause repeated compression/decompression of partial blocks, leading to lost space in the output format.
The situation can be improved by using the OPTIMIZE_SIZE warping option ("-wo OPTIMIZE_SIZE=YES"), but note that depending on the source and target projections, it might also significantly slow down the warping process.
Another possibility is to use gdalwarp without compression and then follow up with gdal_translate with compression:
gdalwarp infile tempfile.tif ...options...
gdal_translate tempfile.tif outfile.tif -co compress=lzw ...etc.
Alternatively, you can use a VRT file as the output format of gdalwarp. The VRT file is just an XML file that will be created immediately. The gdal_translate operation will of course be a bit slower, as it will do the real warping operation.
gdalwarp -of VRT infile tempfile.vrt ...options...
gdal_translate tempfile.vrt outfile.tif -co compress=lzw ...etc.
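The VRT route is easy to wrap in a script. A minimal Python sketch follows; the helper name `vrt_then_compress` and the choice to name the intermediate VRT after the output file are assumptions of this example, and any extra warp options are passed through untouched.

```python
def vrt_then_compress(src, dst, warp_options=(), compress="LZW"):
    """Build the two commands: warp to a VRT, then translate with compression."""
    vrt = dst.rsplit(".", 1)[0] + ".vrt"  # intermediate file name (assumed)
    # Step 1: only writes the small XML description, so it returns quickly.
    warp = ["gdalwarp", "-of", "VRT", *warp_options, src, vrt]
    # Step 2: does the real warping while writing the compressed output.
    translate = ["gdal_translate", "-co", f"COMPRESS={compress}", vrt, dst]
    return [warp, translate]
```

Running the two returned lists in order with `subprocess.run(cmd, check=True)` reproduces the commands above without leaving a large uncompressed temporary file on disk.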
[GeoTIFF output] Use -co TILED=YES when possible
Due to the way gdalwarp proceeds, when generating a huge output file (width > 100,000 pixels for example), you should consider producing a tiled GeoTIFF file if it is an option for you (some software will only read strip TIFF files). If you have a lot of RAM, increasing the maximum cache size and the working buffers of the warping algorithm as explained above might also help if you generate a strip TIFF file.
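A quick calculation shows why the block layout matters for very wide outputs. This sketch counts how many output blocks a single warp chunk overlaps; it assumes one-row strips spanning the full image width for the strip case and 256x256 tiles for the tiled case, and the function names and the 512x512 chunk are illustrative only.

```python
def blocks_touched(xoff, yoff, width, height, block_w, block_h):
    """Number of blocks overlapped by the window (xoff, yoff, width, height)."""
    cols = (xoff + width - 1) // block_w - xoff // block_w + 1
    rows = (yoff + height - 1) // block_h - yoff // block_h + 1
    return cols * rows


def pixels_spanned(xoff, yoff, width, height, block_w, block_h):
    """Total pixels held by the overlapped blocks, written or not."""
    return blocks_touched(xoff, yoff, width, height, block_w, block_h) * block_w * block_h


# One 512x512 chunk written into a 100,000 pixel wide output.
print(pixels_spanned(0, 0, 512, 512, 100000, 1))  # strips: 51,200,000 pixels
print(pixels_spanned(0, 0, 512, 512, 256, 256))   # tiles: 262,144 pixels
```

Under these assumptions the strip layout has to touch full-width rows for every chunk, so far more data passes through the block cache than the chunk itself contains, whereas the tiled layout touches only the tiles under the chunk.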