How to generate efficiently a large Landsat7 mosaic


A common task may be to stitch together the freely available Landsat7 scenes to create a much larger image. For example, we would like to have a image covering a complete U.S. state such as California. Often a resolution of about 250m per pixel may suffice and then other sources such as MODIS become preferable. If higher resolutions are required it becomes necessary to deal with the creation of a large mosaic from many tiles which requires some efficiency in data processing. The goal will be to have a large single file as output.

The basic steps are:

1) use Global Land Survey selection of Landsat scenes and download all scenes. They come in separate geotiff files for each band, and are in UTM projections per UTM zone

2) create vrt for each scene to combine bands say 3,2,1

3) combine all vrt of the same UTM zone into a single vrt

4) reproject (warp) all combined vrts to geographic "projection", needs to create geotiffs

5) merge all reprojected geotiffs

6) post-process with gimp: levels, color balance, color enhance

Detailed steps

get LS7 scenes

Global Land Survey data are preselected Landsat scenes of good quality and no cloud cover. Make smart use of Glovis or earthexplorer to select GLS scenes and use their bulk download tool after "ordering". GLS2000 has scenes from before the failure of one of the LS7 sensors. WELD has nice mosaics but only after the failure of one of the LS7 sensors in 2003 and only allows for relatively small tiles to be extracted. The LS7 scenes are delivered in compressed tar archives which contain one geotiff file per band in UTM projection for the UTM zone appropriate for the scene.

Put all downloaded files in a working directory and unzip:

bash$ for zipped in *tar.gz; do tar -xzvf $zipped; done

This will take a while and create a lot of files. They are systematically named by path, row, acquisition date, UTM zone, and band.

use a vrt to combine bands per scene

We want to avoid to duplicate data as much as possible, and gdal's vrts make this easy. If we want to use LS7 bands 3, 2 and 1 ("natural color") as R, G and B channels, we can first combine the band using gdalbuiltvrt -separate. We do want to automate this as much as possible since we have many scenes. One way to do that is to first build a list of unique scene ids consisting of band and row which we will then use in a loop to combine the bands per scene:

bash$ ls p*tar.gz | sed 's/_.*//' > scenes.list

We take advantage of the systematic naming of all downloaded files and strip out everything but the path and row description at the beginning of the file name.

bash$ for s in `cat scenes.list`; do gdalbuildvrt -separate $s.vrt ${s}*30.tif ${s}*20.tif ${s}*10.tif; done

The loop will be relatively quick since it just creates the proper vrt descriptions as p_pathNumber_r_rowNumber.vrt

Add all vrts from the same UTM zone to a single vrt

This step is only necessary because we need to reproject all scenes to a single, shared projection (here latlong) using gdalwarp and gdalwarp does not deal with vrts which contain images in different projections. We could skip this step and just reproject all band-combined vrts from the previous step into a single merged geotif as in:

bash$ gdalwarp -srcnodata 0 -te -125 32 -113 42 -wm 500 -t_srs "+proj=latlong" -r bilinear p*.vrt allp_allr.tif

and we are done. This would be simplest way and avoids any duplication. However, gdalwarp can take a long time the way it processes many input files (about 2.5 hours for 30 scenes) and this is why we explore another way by merging all tiles from each UTM zone separately.

We can identify all scenes from the same UTM zone for example by

bash$ ls p*z10*10.tif | sed 's/_.*//'

This just lists all band 1 geotiffs in zone 10 and then gets rid of everything after the first underscore and leaves the path and row numbers. Now we can add up all zone 10 vrts into a single vrt:

bash$ gdalbuildvrt -srcnodata 0 all_zone10.vrt `ls p*z10*10.tif | sed 's/_.*/.vrt/'`

Note that we add the correct .vrt extension with sed substitution, and that we specify the nodata value to help with creating the mosaic. gdal is smart enough to deal with vrts within a vrt but gdalbuildvrt itself is not smart enough to resolve the internal vrts to their actual sources. Again, this step is fast since no large files are created.

Similarly we can build other UTM zone vrts:

bash$ gdalbuildvrt -srcnodata 0 all_zone11.vrt `ls p*z11*10.tif | sed 's/_.*/.vrt/'`

In theory, one could loop over all UTM zones involved this way if there are many.

reproject all UTM zone vrts

Next we need to reproject all large zone vrts to single, shared projection. One could just use one of UTM zones as this target projection but here we want to use geographic (latlong) projection meaning that pixels will be referenced by geographic coordinates.

This is just a standard gdalwarp invocation:

bash$ gdalwarp -t_srs "+proj=latlong" -r bilinear all_zone10.vrt all_zone10_geo.tif

We are using bilinear interpolation as a trade-off between time and quality although nearest neighbour may be ok since we are not reducing the resolution. gdalwarp picks a good target resolution based on the input resolution. This will take a while. gdalwarp will also mosaic overlapping areas together and will respect the nodata values defined above, eg. black areas in the scenes are ignored.

Again, one could loop over all UTM zones this way if there are many, or just repeat for each zone:

bash$ gdalwarp -t_srs "+proj=latlong" -r bilinear all_zone11.vrt all_zone11_geo.tif

This step will about double the total size of the working directory, say from 8 GB downloaded data to 16 GB.

merge all reprojected images

Finally, it is time to actually merge together the large geotiffs created in the previous step. Again, this is a standard invocation:

bash$ -n 0 -ul_lr -125 42 -113 32 -o mosaic.tif all_zone*geo.tif

This may take a long time and adds another increment of original download size, say from 16 GB to 24 GB. I found that it is necessary to declare again the input nodata value (-n 0) since gdalwarp does not carry it over from its input. Often it makes sense to trim the output (-ul_lr) since only small areas in the tiles around the border of the area interest are required. This cuts down on the file size right away and can provide easy to use borders for usage of the created mosaic.

It is worth noting that since up to this point we only used bash and gdal, all the processing could be combined into a single script for re-use. and experimentation (say other band combinations, interpolation, tile selection ...)


Gimp can deal with large tiffs pretty well. I found that enhancing the contrast per channel by restricting the channel input levels, then some global, careful color balancing followed by auto-color enhancement can bring out features of interest or help to produce an image which is better suited to the human eye. Gimp is also good for subsampling since it efficiently implements interpolation. For example, it may suffice to subsample from the original 30m resolution to 120m which cuts down file size by a factor of 16 but is still noticeably better than 250m MODIS resolution.

Edits to this page

Apr 28, 2014

Last modified 5 years ago Last modified on Apr 28, 2014 7:14:54 AM