Ticket #4484 (assigned defect)

Opened 4 months ago

Last modified 4 months ago

gdalwarp on large netcdf datasets sets data from first input dataset to <nodata> and generates error if -dstnodata option is used with netcdf-4 file

Reported by: hsumanto Owned by: etourigny
Priority: normal Milestone:
Component: default Version: unspecified
Severity: normal Keywords: netcdf
Cc: etourigny

Description

Was trying to combine 2 large netcdf tiles into a single tile using gdalwarp and found a few interesting issues below:

The two large netcdf tiles could found here:

 http://staff.vpac.org/~hsumanto/tile_x0_y0.nc

 http://staff.vpac.org/~hsumanto/tile_x1_y0.nc

gdalwarp into nc4c with -dstnodata option:
Error encountered and output dataset viewed in QGIS is NOT OK

gdalwarp -tr 500.0 500.0 -t_srs EPSG:3112 -te -2576984.0 -5211665.0 1992016.0 -1219665.0 -r near -dstnodata 9.969209968386869E36 -of netCDF -co COMPRESS=DEFLATE -co ZLEVEL=6 -co FORMAT=NC4C -co WRITE_GDAL_TAGS=yes tile_x0_y0.nc tile_x1_y0.nc tile_combined-nc4c.nc

Creating output file that is 9138P x 7984L.
Processing input file tile_x0_y0.nc.
Using internal nodata values (eg. 9.96921e+36) for image tile_x0_y0.nc.
0...10...20...30...40...50...60...70...80...90...100 - done.
Processing input file tile_x1_y0.nc.
Using internal nodata values (eg. 9.96921e+36) for image tile_x1_y0.nc.
ERROR 1: netcdf error #-122 : NetCDF: Attempt to define fill value when data already exists. .
at (netcdfdataset.cpp,SetNoDataValue,750)
0...10...20...30...40...50...60...70...80...90...100 - done.

gdalwarp into nc4c without -dstnodata option:
No error but output file when viewed in QGIS is NOT OK

gdalwarp -tr 500.0 500.0 -t_srs EPSG:3112 -te -2576984.0 -5211665.0 1992016.0 -1219665.0 -r near -of netCDF -co COMPRESS=DEFLATE -co ZLEVEL=6 -co FORMAT=NC4C -co WRITE_GDAL_TAGS=yes tile_x0_y0.nc tile_x1_y0.nc tile_combined-nc4c_withnodstnodata.nc

gdalwarp into geotiff:
No error and output file when viewed in QGIS is OK

gdalwarp -tr 500.0 500.0 -t_srs EPSG:3112 -te -2576984.0 -5211665.0 1992016.0 -1219665.0 -r near -dstnodata 9.969209968386869E36 tile_x0_y0.nc tile_x1_y0.nc tile_combined-tif.tif

Note: To view nc4c files with QGIS, you might have to convert nc4c into geotiff using gdal_translate as QGIS currently can not open nc4c files directly.

The QGIS screenshots are attached as well.

Attachments

output-geotiff.png Download (422.1 KB) - added by hsumanto 4 months ago.
The geotiff output file viewed in QGIS
output-nc4c.png Download (186.9 KB) - added by hsumanto 4 months ago.
NC4C output file when viewed in QGIS

Change History

Changed 4 months ago by hsumanto

The geotiff output file viewed in QGIS

Changed 4 months ago by hsumanto

NC4C output file when viewed in QGIS

Changed 4 months ago by rouault

  • cc etourigny added

Changed 4 months ago by etourigny

  • owner changed from warmerdam to etourigny
  • status changed from new to assigned
  • summary changed from gdalwarp on large netcdf datasets producing netcdf error #-122 (NetCDF: attempt to define fill value when data already exists) to gdalwarp on large netcdf datasets sets data from first input dataset to <nodata> and generates error if -dstnodata option is used with netcdf-4 file

The nodata error has been fixed in trunk (r23889), but it was not fatal. I have updated SetNoDataValue?() to only update the value when necessary. The error happens because gdalwarp calls SetNoDataValue?() for each input dataset, and netcdf-4 does not allow to change the nodata value after data is written.

For netcdf-4 support in qgis, you have to build qgis with gdal-1.9.0, your version probably uses 1.8. ubuntugis-unstable will soon include gdal-1.9.0.

The more serious issue needs more investigation. For some reason the data from the first dataset is set to nodata. If you use the "Value Tool" in QGis, you can see that that data is "null (no data)" rather than "out of extent".

Changed 4 months ago by hsumanto

I am wondering as well why the data from the first dataset is set to "null (no data)" while it actually contains some data (as shown in tile_combined-tif.tif which is the result of gdalwarp directly into geotiff).

Changed 4 months ago by hsumanto

Managed to reproduce the same issue on the below case too:
gdalwarp the two input nc4c dataset into nc format with or without -dstnodata option also set the first dataset into "null (no data)".

gdalwarp -tr 500.0 500.0 -t_srs EPSG:3112 -te -2576984.0 -52116
65.0 1992016.0 -1219665.0 -r near -of netCDF -co FORMAT=NC -co tile_x0_y0.nc tile_x1_y0.nc tile_combined-nc.nc

Changed 4 months ago by hsumanto

Etienne, this might be helpful information.

If I removed the -t_srs EPSG:3112 from gdalwarp, both the output dataset seems OK as it contains both correct data from the first and second datasets.

Give this a try.

gdalwarp into nc4c with no -t_srs option:
No error and output file when viewed in QGIS is OK

gdalwarp -tr 500.0 500.0 -te -2576984.0 -5211665.0 1992016.0 -1219665.0 -r near -of netCDF -co COMPRESS=DEFLATE -co ZLEVEL=6 -co FORMAT=NC4C -co WRITE_GDAL_TAGS=yes tile_x0_y0.nc tile_x1_y0.nc tile_combined-nc4c_withnosrs.nc

The issue needs more investigation now: why the first dataset is set to "null (no data)" when gdalwarp is run with -t_srs EPSG:3112 option.

Changed 4 months ago by hsumanto

Ignore my previous post, we could still reproduce the issue with or without -t_srs EPSG:3112 option. Sorry for making that mistake.

Changed 4 months ago by etourigny

The problem is because the warp operation uses chunking when data exceeds a certain threshold. The last chunk that is warped is written to file, but the previous chunks are overwritten to default values (nodata). The netcdf driver has not been designed with chunked IO in mind, so I need some time to sort out what is missing.

Try using --debug=on you will see various lines like

GDAL: GDALWarpKernel()::GWKGeneralCase()
Src=5397,573,1143x1996 Dst=0,0,1142x1996

GDAL: GDALWarpKernel()::GWKGeneralCase()
Src=6539,573,1143x1996 Dst=1142,0,1142x1996

In the meantime, the following work-around seems to work (at least on my machine, using your datasets) and also speeds up the process considerably: add the parameters '--config GDAL_CACHEMAX 1000 -wm 1000' to the command line. A larger dataset/data type will require larger cache values. See http://trac.osgeo.org/gdal/wiki/UserDocs/GdalWarp#WillincreasingRAMincreasethespeedofgdalwarp for more info.

Changed 4 months ago by hsumanto

Thanks for the reply Etienne, really appreciate it.

Note: See TracTickets for help on using tickets.