Opened 11 years ago

Closed 11 years ago

#2107 closed defect (fixed)

ASCII grid disregards nodatavalue when choosing data type

Reported by: asgerpetersen Owned by: Mateusz Łoskot
Priority: normal Milestone: 1.5.1
Component: GDAL_Raster Version: svn-trunk
Severity: normal Keywords: AAIGRID type statistics
Cc: warmerdam

Description

When choosing the data type for an ASCII grid, the driver ignores the nodata value, which can lead to badly wrong results.

Short example:

The grid

ncols        2
nrows        2
xllcorner    500000.00
yllcorner    6100000.00
cellsize     1
nodata_value -999999
 39 13
 -999999 22

returns the following statistics from gdalinfo -stats:

Band 1 Block=2x1 Type=Int16, ColorInterp=Undefined
  Minimum=-32145.000, Maximum=39.000, Mean=-12263.000, StdDev=13410.683
  NoData Value=-999999
  Metadata:
    STATISTICS_MINIMUM=-32145
    STATISTICS_MAXIMUM=39
    STATISTICS_MEAN=-12263

I think it should be possible to use a nodata value that is well outside the domain of the data values. So, in the above case, the driver should choose GDT_Float32 from the nodata value alone. A nice side effect is that the complete file scan becomes unnecessary in cases like this.
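To illustrate the overflow described above, here is a minimal Python sketch (not GDAL code). The helper names `fits_int16` and `wrap_int16` are hypothetical, and two's-complement wraparound is only one way an out-of-range value can be mangled; the sketch does not try to reproduce the exact statistics reported by gdalinfo.

```python
# Bounds of a signed 16-bit integer (the Int16 type reported by gdalinfo).
INT16_MIN, INT16_MAX = -32768, 32767

def fits_int16(value):
    """Return True if value is representable as a signed 16-bit integer."""
    return INT16_MIN <= value <= INT16_MAX

def wrap_int16(value):
    """Simulate two's-complement wraparound into the Int16 range (assumed behavior)."""
    return (value - INT16_MIN) % 65536 + INT16_MIN

nodata = -999999
print(fits_int16(nodata))   # the nodata value from the example grid does not fit
print(wrap_int16(nodata))   # what the value becomes if it is wrapped into 16 bits
```

Whatever the exact mechanism, a nodata value that does not fit the band type can no longer be recognized as nodata once it is stored.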

Change History (8)

comment:1 Changed 11 years ago by warmerdam

Cc: warmerdam added
Component: changed from default to GDAL_Raster
Keywords: AAIGRID added
Owner: changed from warmerdam to Mateusz Łoskot

I agree that the nodata value should be considered in the data range for ascii files.

comment:2 Changed 11 years ago by Mateusz Łoskot

Keywords: type statistics added
Status: changed from new to assigned
Version: svn-trunk

If I understand this report correctly, there are actually two issues:

  1. Use the nodata value to determine the data type of a grid, according to the following algorithm:
    if nodata is a floating-point number
    {
      set grid data type to GDT_Float32
    }
    else // nodata is an integral number
    {
      for each pixel in sample chunk
      {
        if floating-point number found
          set data type to GDT_Float32
        else
          set data type to GDT_Int16 (default type)
      }
    }
    
  2. When calculating statistics, the nodata value should also be compared, so that nodata is not outside the min/max range:
    min = MIN( calculated_min, nodata_value )
    max = MAX( calculated_max, nodata_value )
    

The 1st issue can be solved directly in the AAIGrid driver. The 2nd issue seems to be solvable only in the GDALComputeRasterMinMax function, so the fix would apply to all GDAL drivers. I think it is generally correct that the nodata value is within the range of all values:

min <= nodata < max
or
min < nodata <= max

Could you confirm that I've understood the problem correctly?

comment:3 Changed 11 years ago by warmerdam

Mateusz,

I believe point 2 is not right. Nodata values should not be included in min/max statistics if possible. I think you can just focus on point 1.
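As an illustration of excluding nodata from statistics, here is a minimal Python sketch using the four pixels of the example grid. The function name `min_max_mean` is hypothetical and this is not how GDAL implements its statistics; it only shows the principle of skipping nodata pixels before computing min, max and mean.

```python
def min_max_mean(values, nodata):
    """Compute (min, max, mean) over values, ignoring pixels equal to nodata."""
    valid = [v for v in values if v != nodata]
    if not valid:
        return None  # every pixel is nodata, so there are no statistics
    return min(valid), max(valid), sum(valid) / len(valid)

# Pixels of the 2x2 example grid from the description.
pixels = [39, 13, -999999, 22]
print(min_max_mean(pixels, nodata=-999999))
```

With nodata skipped, the minimum and maximum come only from the real data values 13, 22 and 39.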

comment:4 Changed 11 years ago by Mateusz Łoskot

Frank,

I'm an idiot. I've no idea why I wanted to use nodata in the statistics computation. It is a completely stupid idea that the nodata value is within the range of all values. Sorry for that.

comment:5 Changed 11 years ago by Mateusz Łoskot

Resolution: fixed
Status: changed from assigned to closed

Fixed in trunk (r13404)

comment:6 Changed 11 years ago by Mateusz Łoskot

Milestone: 1.5.1

Backported to branches/1.5 (r13407)

comment:7 Changed 11 years ago by asgerpetersen

Resolution: fixed
Status: changed from closed to reopened

Actually, that is not what I meant. My problem was overflow. It doesn't matter whether the nodata value is a float if there are no nodata values in the data, or if the nodata values in the data are represented as ints.

I'm not sure it is a good idea to make the data type float just because the nodata value is a float. Some drivers (I think GDAL included) always write the nodata value as a float regardless of the "real" data type.

What I actually meant was something like:

if nodata < -32768 or nodata > 32767
{
  set grid data type to GDT_Float32
}
else
{
  for each pixel in sample chunk
  {
    if floating-point number found
      set data type to GDT_Float32
    else
      set data type to GDT_Int16 (default type)
  }
}
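
The rule above can be sketched in Python as follows. This is a minimal illustration, not the AAIGrid driver code: the function name `choose_data_type` is hypothetical, the type names are returned as plain strings, and "floating-point number found" is assumed to mean a token that does not parse as an integer.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def is_integer_token(token):
    """Assumption: a token counts as integral if int() can parse it."""
    try:
        int(token)
        return True
    except ValueError:
        return False

def choose_data_type(nodata_token, sample_tokens):
    """Pick a band type label from the header nodata value and a pixel sample."""
    if nodata_token is not None:
        nodata = float(nodata_token)
        # A nodata value outside the Int16 range forces Float32, no scan needed.
        if nodata < INT16_MIN or nodata > INT16_MAX:
            return "GDT_Float32"
    # Otherwise scan the sample: any non-integral token forces Float32.
    for token in sample_tokens:
        if not is_integer_token(token):
            return "GDT_Float32"
    return "GDT_Int16"  # default type

print(choose_data_type("-999999", ["39", "13", "-999999", "22"]))
print(choose_data_type("-9999.0", ["39", "13", "22"]))
```

Note that a nodata value written as a float but lying within the Int16 range (the second call) does not by itself switch the type, which matches the concern about drivers that always write nodata as a float.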

I'm sorry I didn't come back to this sooner. You two are just too fast for me :-)

comment:8 Changed 11 years ago by Mateusz Łoskot

Resolution: fixed
Status: changed from reopened to closed

Thanks for the clarification. Right, this is slightly different from what I understood previously.

Re-fixed in trunk (r13432, r13433) and branches/1.5 (r13434)
