Opened 12 years ago

Closed 11 years ago

#2108 closed defect (fixed)

[PATCH] GDALRasterBand::ComputeStatistics() does not use arbitrary overviews

Reported by: dron Owned by: dron
Priority: normal Milestone:
Component: GDAL_Raster Version: unspecified
Severity: normal Keywords: statistics
Cc: warmerdam, Mateusz Łoskot

Description

GDALRasterBand::ComputeStatistics?() does note use arbitrary overviews when computing approximate statistics and hence does not have any benefit of reduced datasets. The whole dataset will be read even if it is not required (the block sampling and AdviseRead?() help a bit, but still).

The attached patch adds ability to request the subsampled overview of size ~2500 pixels and compute statistics on that reduced array.

Frank, please, take a look at the code as we talked about previously on IRC. If your are fine with this, I can add it both in trunk and in 1.5 branch. I think it can safely wait until the next maintenance release of 1.5.

Best regards, Andrey

Attachments (1)

arbitrary.diff (11.4 KB) - added by dron 12 years ago.

Download all attachments as: .zip

Change History (6)

Changed 12 years ago by dron

Attachment: arbitrary.diff added

comment:1 Changed 12 years ago by warmerdam

Keywords: statistics added

Andrey,

Skimming the .diff I find it quite hard to understand what is changing. It looks like the sampling of blocks mechanism is being altered in surprising ways. Is your change only intended to alter behavior for HasArbitraryOverviews? datasets?

I'm not comfortable with this going into the 1.5 branch as it is, though the problem may just me getting lost in the diff.

comment:2 Changed 12 years ago by Mateusz Łoskot

Cc: Mateusz Łoskot added

comment:3 in reply to:  1 Changed 12 years ago by dron

Replying to warmerdam:

It looks like the sampling of blocks mechanism is being altered in surprising ways. Is your change only intended to alter behavior for HasArbitraryOverviews? datasets?

Yes, for datasets providing arbitrary overviews I've added another way of subsampling. Instead of fetching blocks of data from overview layer I have added single IRasterIO() call to subsample the whole image to desired size (~2500 samples). ECW driver can do this subsampling very fast. On my test image of 21600x10800 in size this reduced overview is being fetched almost immediately, though the old block by block reading code works for several seconds, so it is noticeable improvement. I presume on larger datasets it will be even more noticeable.

I'm not comfortable with this going into the 1.5 branch as it is, though the problem may just me getting lost in the diff.

The only addition is the following snippet:

    if ( bApproxOK && HasArbitraryOverviews() )
    {
        double  dfReduction = sqrt(
            (double)nRasterXSize * nRasterYSize / GDALSTAT_APPROX_NUMSAMPLES );
        int     nXReduced = (int)( nRasterXSize / dfReduction );
        int     nYReduced = (int)( nRasterYSize / dfReduction );

        void    *pData =
            CPLMalloc(GDALGetDataTypeSize(eDataType)/8 * nXReduced * nYReduced);

        IRasterIO( GF_Read, 0, 0, nRasterXSize, nRasterYSize, pData,
                   nXReduced, nYReduced, eDataType, 0, 0 );

        /* this isn't the fastest way to do this, but is easier for now */
        for( int iY = 0; iY < nYReduced; iY++ )
        {
            for( int iX = 0; iX < nXReduced; iX++ )
            {
                int    iOffset = iX + iY * nXReduced;
                double dfValue = 0.0;

                switch( eDataType )
                {
                  case GDT_Byte:
                    dfValue = ((GByte *)pData)[iOffset];
                    break;
                  case GDT_UInt16:
                    dfValue = ((GUInt16 *)pData)[iOffset];
                    break;
                  case GDT_Int16:
                    dfValue = ((GInt16 *)pData)[iOffset];
                    break;
                  case GDT_UInt32:
                    dfValue = ((GUInt32 *)pData)[iOffset];
                    break;
                  case GDT_Int32:
                    dfValue = ((GInt32 *)pData)[iOffset];
                    break;
                  case GDT_Float32:
                    dfValue = ((float *)pData)[iOffset];
                    if (CPLIsNan(dfValue))
                        continue;
                    break;
                  case GDT_Float64:
                    dfValue = ((double *)pData)[iOffset];
                    if (CPLIsNan(dfValue))
                        continue;
                    break;
                  case GDT_CInt16:
                    dfValue = ((GInt16 *)pData)[iOffset*2];
                    break;
                  case GDT_CInt32:
                    dfValue = ((GInt32 *)pData)[iOffset*2];
                    break;
                  case GDT_CFloat32:
                    dfValue = ((float *)pData)[iOffset*2];
                    break;
                  case GDT_CFloat64:
                    dfValue = ((double *)pData)[iOffset*2];
                    break;
                  default:
                    CPLAssert( FALSE );
                }
                
                if( bGotNoDataValue && dfValue == dfNoDataValue )
                    continue;

                if( bFirstValue )
                {
                    dfMin = dfMax = dfValue;
                    bFirstValue = FALSE;
                }
                else
                {
                    dfMin = MIN(dfMin,dfValue);
                    dfMax = MAX(dfMax,dfValue);
                }

                dfSum += dfValue;
                dfSum2 += dfValue * dfValue;

                nSampleCount++;
            }
        }

        CPLFree( pData );
    }

comment:4 Changed 12 years ago by dron

I decided to commit this functionality with revision r13420. This ticket remains opened just for the case if we will decide to port it in 1.5 branch if there will be no problems reported on trunk.

comment:5 Changed 11 years ago by dron

Resolution: fixed
Status: newclosed

It seems there is no visible problems with the proposed patch, so the ticket can be closed now.

Note: See TracTickets for help on using tickets.