Opened 18 years ago

Closed 16 years ago

#1000 closed defect (fixed)

HFA reading code does not handle reduced precision properly

Reported by: sam.gillingham@… Owned by: warmerdam
Priority: normal Milestone: 1.5.3
Component: GDAL_Raster Version: unspecified
Severity: normal Keywords: hfa
Cc: Mateusz Łoskot, globalmapper

Description (last modified by Mateusz Łoskot)

Frank,

I am having some trouble reading some compressed img files which I will attach
to this bug. One is float and the other signed 32 bit int. The central block of
each of these images is compressed using the 'reduced precision' system (numruns
= -1). I see this was added back in GDAL 1.1.5. However, the values GDAL reads
out are quite different from what Imagine displays - the numbers are huge for
floats and zero for ints, where they should be similar to the surrounding
values.

I had a quick look and couldn't see anything obvious in the reading code and I
couldn't find any documentation about the layout of reduced precision blocks. Do
you have additional documentation or any insight as to what could be going wrong?

Thanks,

Sam.

Attachments (3)

reduced_precision.zip (75.0 KB ) - added by sam.gillingham@… 18 years ago.
img files that demonstrate the problem
UncompressBlock.cpp (15.0 KB ) - added by globalmapper 16 years ago.
Updated UncompressBlock implementation with possible EPT_f32 scaling fix
usgssubset.zip (215.9 KB ) - added by Sam Gillingham 16 years ago.
Subset of mb1_be_07.img from USGS that demonstrates problems with reduced precision with values around 512


Change History (42)

by sam.gillingham@…, 18 years ago

Attachment: reduced_precision.zip added

img files that demonstrate the problem

comment:1 by warmerdam, 17 years ago

I have confirmed this is still a problem. 

The int.img file blows the assertion in UncompressBlock (hfaband.cpp)
which indicates it is not a supported block code. 

The float file seems to be something different, but does seem to 
produce bad results. 

If you can come up with patches for this it would be quite helpful.  Otherwise, 
I'll try and have it addressed for the 1.4.1 release.

comment:2 by warmerdam, 17 years ago

On review I remember that the UncompressBlock() code in hfaband.cpp is
derived from the similar code in the arcinfo binary grid reader 
(gdal/frmts/aigrid/gridlib.c) but it hasn't been kept up to date as 
fixes and improvements were made in that module. 

I'm turning this over to Mateusz to try and synchronize the code. 

Mateusz, please ensure that int.img is added to the test suite, as it
is of modest size and would be a good check that the compression
support hasn't rotted.

The fixes should go into trunk and 1.4 branch. 


comment:5 by Mateusz Łoskot, 17 years ago

Description: modified (diff)
Milestone: 1.4.0
Status: new → assigned

comment:6 by warmerdam, 17 years ago

Milestone: 1.4.0 → 1.4.1

comment:7 by Mateusz Łoskot, 17 years ago

I added new test cases to the hfa.py test, reading metadata and calculating statistics from the data attached to this report: float.img and int.img.

comment:8 by Mateusz Łoskot, 17 years ago

Owner: changed from Mateusz Łoskot to warmerdam
Status: assigned → new

Unfortunately, I'm not able to reproduce the problem on a 32-bit machine. Frank, on a 64-bit machine, gets the assertion failure at line 537 of hfaband.cpp. So the conclusion is that this issue is related to the 32/64-bit architecture.

BTW, Sam, if you could provide some details about what architecture you use, that would be helpful.

comment:9 by warmerdam, 17 years ago

Cc: warmerdam added
Owner: changed from warmerdam to Mateusz Łoskot

The problem was not 32/64-bit related (just an artifact of how Mateusz tested for it).

There was missing code in UncompressBlock() for EPT_u32 and EPT_s32 types, which I have added.

And there was wrong code for handling non-run-length-encoded blocks in UncompressBlock() for EPT_f32. It turns out there is a funky scaling scheme used for the offset values. I have added a case for 16-bit offsets, which are what this file (float.img) uses, and now it works. There may well be cases with 8-bit or other offsets, but I don't know what scaling values they would use, so I've set the code to blow an assert in that case.
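
For reference, a minimal sketch of the 16-bit case this fix adds, as I understand it (the function name and signature are illustrative, not the actual GDAL code):

    /* Sketch: each pixel is stored as a 16-bit offset from the block
       minimum and rescaled by the "funky" factor mentioned above. */
    static float DecodeReducedPrecisionF32( float fBlockMin, int nRawOffset16 )
    {
        return fBlockMin + 0.25f * (nRawOffset16 / 65536.0f);
    }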

All changes applied to 1.4 branch and trunk.

Bouncing back to Mateusz to finalize the test script for these files. Mateusz, the range of values in float.img is small, so please add a test in hfa.py that reads pixel 100,100 and verifies its value to four decimal places of precision. Also, please include a test that does checksumming for int.img to ensure we get the exact values.
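
For what it's worth, a quick sketch of the requested checks using the GDAL C API (hfa.py will of course do this in Python; this is only to illustrate what is being verified, and omits error handling):

    #include <stdio.h>
    #include "gdal.h"
    #include "gdal_alg.h"   /* GDALChecksumImage() */

    int main()
    {
        GDALAllRegister();

        /* Read pixel (100,100) of float.img and print it to 4 places. */
        GDALDatasetH hDS = GDALOpen( "float.img", GA_ReadOnly );
        GDALRasterBandH hBand = GDALGetRasterBand( hDS, 1 );
        float fPixel = 0.0f;
        GDALRasterIO( hBand, GF_Read, 100, 100, 1, 1,
                      &fPixel, 1, 1, GDT_Float32, 0, 0 );
        printf( "pixel(100,100) = %.4f\n", fPixel );
        GDALClose( hDS );

        /* Checksum int.img to catch any change in the exact values. */
        hDS = GDALOpen( "int.img", GA_ReadOnly );
        hBand = GDALGetRasterBand( hDS, 1 );
        printf( "checksum = %d\n",
                GDALChecksumImage( hBand, 0, 0,
                                   GDALGetRasterXSize( hDS ),
                                   GDALGetRasterYSize( hDS ) ) );
        GDALClose( hDS );
        return 0;
    }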

All the histogram and statistics stuff in hfa.py isn't actually related to the image data - it is really just testing the reading of statistics metadata.

comment:10 by Mateusz Łoskot, 17 years ago

Status: new → assigned

in reply to:  9 comment:11 by Mateusz Łoskot, 17 years ago

I added tests for float.img and int.img: http://trac.osgeo.org/gdal/changeset/11144

comment:12 by Mateusz Łoskot, 17 years ago

Resolution: fixed
Status: assigned → closed

comment:13 by globalmapper, 16 years ago

Resolution: fixed
Status: closed → reopened

Some folks at the USGS have recently provided some .img files containing floating-point tiles decompressed in UncompressBlock that don't quite work with the above fix. After experimenting a bit, I found that using the following calculation for the new float value worked correctly:

fValue = fValue + (nRawValue / 32768.0);

while the old conversion of:

fValue = fValue + 0.25 * (nRawValue / 65536.0);

did not. However, the old conversion did work properly for the float.img sample file provided earlier in this thread.

The new value that works for the USGS data makes a lot of sense for converting a signed 16-bit offset to a floating-point offset (it gives a range of -1 to +1 for the offset). Is it possible that the float.img file is messed up, or is it the more unfortunate situation of different multiplier constants being used for different files? If the latter, does anyone have any idea where that multiplier might be stored?
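
To illustrate the range claim (a quick sketch, assuming the offset is read as a signed 16-bit value):

    /* A signed 16-bit offset spans [-32768, 32767], so dividing by
       32768.0 yields roughly [-1.0, +1.0), the range noted above. */
    short nOff = -32768;             /* extreme negative offset */
    float fScaled = nOff / 32768.0f; /* == -1.0f */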

The USGS data (which is quite large) is available at:

ftp://ftpext.usgs.gov/pub/cr/co/lakewood/Dan_Daniels/

Thanks,

Mike Global Mapper Support support@…

comment:14 by globalmapper, 16 years ago

After some further tests with more data files from the USGS, it appears that not only is the scale factor not constant for an entire file, but the correct scale factor can vary from tile to tile within a single file. This would seem to indicate that the appropriate scale factor is stored somewhere in the tile data. Does that help?

Thanks,

Mike Childs Global Mapper Software LLC support@…

comment:15 by warmerdam, 16 years ago

Keywords: hfa added
Milestone: 1.4.1 → 1.5.1
Priority: high → normal

Mike,

That is helpful, but clearly some more time in the hex editor will be required to work out this mystery. I do not anticipate working on this for at least a couple weeks, but if you want to dig into it that would be great.

I will try to address this issue for GDAL 1.5.1.

by globalmapper, 16 years ago

Attachment: UncompressBlock.cpp added

Updated UncompressBlock implementation with possible EPT_f32 scaling fix

comment:16 by globalmapper, 16 years ago

Severity: major → normal

Frank,

I did some more digging, and it seems that the appropriate scaling factor depends on the magnitude of the minimum value for the block, scaling with a power of 2. I have not been able to get things exactly worked out around boundaries (like 512), where I still get tiles using the wrong multiplier regardless of whether I round, take a ceiling, truncate, or try several other things when deriving the value used to determine the multiplier. Everywhere else (at least for the three files I tested), everything seems to work perfectly.

I have attached an updated UncompressBlock implementation for you to look at which includes the multiplier determination. I'm not quite up to 1.5.1 on the HFA library, so I didn't include the entire hfaband.cpp file. The important chunk of code for determining the scale is below:

/* -------------------------------------------------------------------- */
/*      For floating point output, the saved 16-bit value will be       */
/*      based on the minimum value for the tile. Note that there        */
/*      still seem to be some problems selecting the correct factor     */
/*      right near a power of 2 boundary (like 512). I have not been    */
/*      able to determine how the switchover occurs at power of 2       */
/*      boundaries for the minimum value.                               */
/* -------------------------------------------------------------------- */
        float fMultFactor = 1.0;
        if ( nDataType == EPT_f32 )
        {
            /* Reinterpret the raw block minimum as a float. */
            float fMinValue = *((float *) &nDataMin);

            /* Derive a power-of-two multiplier from the magnitude of
               the minimum: effectively 2^(bit length of |min| - 9). */
            int nAbsMin = ( fMinValue > 0.0 ) ? (int)fMinValue : (int)-fMinValue;
            int nDivShift = -9;
            for ( ; nAbsMin > 0; nDivShift++, nAbsMin >>= 1 ) {}
            if ( nDivShift < 0 )
                fMultFactor = 1.0 / ( 1 << -nDivShift );
            else
                fMultFactor = ( 1 << nDivShift );
        }
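
If I am reading that loop right, the multiplier works out to 2 raised to (the bit length of the integer part of |min|, minus 9). For example, a block minimum around 300 (bit length 9) gives a multiplier of 1, while a minimum of 512 (bit length 10) gives 2, which would explain why tiles whose minimum sits right at such a boundary are the ones that misbehave.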

comment:17 by warmerdam, 16 years ago

Mike,

Nice job! I've incorporated your changes in trunk. If/when you are satisfied with it, I'll migrate this back into the 1.5.x branch too. It would be lovely to have a small sample file demonstrating this problem if that is possible (I imagine not).

comment:18 by globalmapper, 16 years ago

Frank,

Unfortunately, I don't have a small sample demonstrating the remaining problem. The smaller of the two .img files at ftp://ftpext.usgs.gov/pub/cr/co/lakewood/Dan_Daniels/ (a temporary location, I'm sure) exhibits the problem where the terrain crosses 512 meters. That "smaller" file is still over 300 MB, though. I don't have any tools that generate Erdas Imagine files using the compressed floating point tiles that exhibit this problem. Do you?

comment:19 by warmerdam, 16 years ago

Cc: Mateusz Łoskot globalmapper added; warmerdam removed
Owner: changed from Mateusz Łoskot to warmerdam
Status: reopened → new

Mike,

I do not have a tool to produce these files. I think Sam manufactured me a small test file with Imagine when the bug was originally reported.

comment:20 by Sam Gillingham, 16 years ago

I created those original files using the Imagine subset tool from larger files. I have been intending to do the same with the USGS file, but unfortunately it is too large to download at work. I will hopefully download it at home tonight and create some subsets.

Good detective work, BTW! I suppose once we are happy that we understand this, hfacompress.cpp should be updated to use this mode where applicable. I'll have a look when I get some time.

comment:21 by globalmapper, 16 years ago

Everything seems to work perfectly with the decompression except very near power-of-2 boundaries (512 in particular). I did a lot of playing around trying to figure out exactly how the multiplier is determined at those boundaries, but was unable to come up with anything that worked.

I submitted a support request to Erdas Imagine support (apparently they are owned by Leica Geosystems now) to see if they could provide the exact algorithm, but I apparently need a Software Maintenance Agreement to even view their response to my inquiry, which may be difficult for me to get since I don't own any of their software. Does anyone else have a support agreement with them that would allow their response to be viewed? Here are the details for the support request:

Customer ID #: 300461, Support Incident #: 238092, Title of Case: Erdas Imagine IMG Format Specification

Thanks,

Mike Childs Global Mapper Software LLC support@…

comment:22 by Sam Gillingham, 16 years ago

Yes, we have a support agreement. I couldn't browse your support request, but I have made a similar one under our login. I'll post something if they get back to me.

I have also managed to download your USGS file. Send me some coordinates for problematic areas and I'll make some subsets with Imagine.

comment:23 by globalmapper, 16 years ago

Try the region around:

369500,3510500 - 370250,3510250

If you have Global Mapper (even the trial) installed, you can download the latest build from http://www.globalmapper.com/global_mapper9.zip and extract the contents over your existing v9.xx installation to see the problems. I would suggest creating a custom shader that has several dramatic changes of color in the range of 511 - 513 meters to easily visualize the problems.

Thanks,

Mike Global Mapper Support support@…

comment:24 by Sam Gillingham, 16 years ago

Can you send me those coordinates again? The image seems to go from 216279,3572711 - 240299,3557528

comment:25 by globalmapper, 16 years ago

Which file are you using? I forgot to mention that those coordinates are for the mb1_be_07.img file (the 350MB one).

comment:26 by Sam Gillingham, 16 years ago

I'm using mb1_be_01.img from the FTP link that I downloaded last night. Send me some coordinates for that one - they aren't too keen on me downloading large files at work.

comment:27 by globalmapper, 16 years ago

The mb1_be_01.img file does not have any problem areas; none of the tiles in it that are encoded with the 16-bit offset F32 format cross a power of 2 boundary. The mb1_be_07.img file has tons of them, though, as a large chunk of the image is very flat and right around 512 meters in height. If it would help, I can put a zipped-up version of the mb1_be_01.img file on the globalmapper.com server. It is only about 180 MB when zipped up.

comment:28 by Sam Gillingham, 16 years ago

Ahhh! That explains quite a bit. Yes zip that file up and I'll have a go at pulling it down.

comment:29 by globalmapper, 16 years ago

by Sam Gillingham, 16 years ago

Attachment: usgssubset.zip added

Subset of mb1_be_07.img from USGS that demonstrates problems with reduced precision with values around 512

comment:30 by Sam Gillingham, 16 years ago

Sorry about that - next time I'll actually read the thread properly before jumping in. I've performed the subset, and the resulting file appears to show the same problem - I've attached it to this ticket.

Let me know if there are any problems or you want me to create other subsets - or log other jobs with Leica for that matter. I'll keep hassling them about the format spec.

comment:31 by globalmapper, 16 years ago

Thanks for the reduced subset, it should make debugging and testing much easier.

Let me know if Leica ever responds with something on the format. We may be stuck without that.

comment:32 by Sam Gillingham, 16 years ago

OK, after 14 posts with Leica support, I may have to admit failure for now. They kept referring me to the Imagine C Toolkit documentation, which doesn't have any hard details. When I kept pressing them for more documentation, I was told that is all they have, and that I may have to talk to their 'consulting team', which sounds expensive.

I did find an old copy of the "ERDAS Field Guide", but it didn't seem to have any more info than what is already here: http://home.gdal.org/projects/imagine/iau_docu1.pdf.

As it happens Leica Australasia is based just down the road from here in Brisbane. I will ask them too, but don't hold your breath - they've been less than cooperative on other requests, and they are just a sales office.

comment:33 by globalmapper, 16 years ago

Thanks for checking with Leica. I looked through the 'ERDAS Field Guide' as well, but didn't find anything helpful. I guess we are stuck for now unless Leica Australia happens to be useful.

At least the current solution is correct in almost all cases and is only very slightly off in the boundary cases where it is wrong, so unless you are dealing with very precise data (which unfortunately the USGS folks are with their LIDAR-based data), you probably won't even notice.

Thanks,

Mike Childs Global Mapper Software LLC support@…

comment:34 by Sam Gillingham, 16 years ago

I think it's time for an update, but no good news, I'm afraid.

I spoke to the regional manager and he agreed to help - we are a large customer and have a genuine need to run our processing on Linux. Unfortunately, after he spoke to head office he became less helpful. His comments lead me to believe that the only complete documentation is in their source code. I'm not sure I can get any further without getting the lawyers in.

One more reason for us to move away from Imagine altogether...

comment:35 by Sam Gillingham, 16 years ago

I have had some more information from Leica (see below). This is from the online help. Does this help us? What in particular should I ask for?


Please find more info from our engineers. Please let me know if this is all you need.

Looks like there are different versions of the on-line help. I have highlighted the relevant clauses (that describe the range compression) in the help I am looking at in red below. There is an inconsistency in that the initial description does not allow for compression to 1-, 2-, and 4-bit dynamic ranges, but the later text is correct (that this will happen). Another thing that is not explicit is that if the algorithm decides it is not worthwhile to compress runs, numsegments will be set to -1 and you will just find the range-compressed data at the data offset. This is actually the special case to which the comment extracted below from the GDAL code applies.

For uncompressed blocks, the data are simply packed into the block one pixel value at a time. Each pixel is read from the block as indicated by its data type. All non-integer data are uncompressed.

The compression scheme used by ERDAS IMAGINE is a two level run-length encoding scheme. If the data are an integral type, then the following steps are performed:

The minimum and maximum values for a block are determined.

The byte size of the output pixels is determined by examining the difference between the maximum and the minimum. If the difference is less than or equal to 256, then 8-bit data are used. If the difference is less than 65,536, then 16-bit data are used; otherwise 32-bit data are used.

The minimum is subtracted from each of the values.

A run-length encoding scheme is used to encode runs of the same pixel value. The data minimum value occupies the first 4 bytes of the block. The number of run-length segments occupies the next 4 bytes, and the next 4 bytes are an offset into the block which indicates where the compressed pixel values begin. The next byte indicates the number of bits per pixel (1,2,4,8,16,32). These four values are encoded in the standard MIF format (as an Edms_RLCParams object at the start of the data block). Following this is the list of segment counts, following the segment counts are the pixel values. There is one segment count per pixel value.

No compression scheme is used if the data are non-integral.

min | num segments | data offset | num bits per value | data counts | data values

Each data count is encoded as follows:

next 8 bits | next 8 bits | next 8 bits | byte count + high 6 bits
byte 3      | byte 2      | byte 1      | byte 0

There may be 1, 2, 3, or 4 bytes per count. The first two bits of the first count byte contain 0, 1, 2, or 3, indicating that the count is contained in 1, 2, 3, or 4 bytes. The rest of that byte (6 bits) represents the six most significant bits of the count. Each following byte, if present, represents decreasing significance.

This order is different from the rest of the package. This was done so that the high byte with the encoded byte count would be first in the byte stream. This pattern is repeated as many times as indicated by the numsegments field.

The data values are compressed into the remaining space, packed into as many bits per pixel as indicated by the numbitpervalue field.
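
Putting that description together, a minimal sketch of decoding one segment count (illustrative code based on my reading of the text above, not the shipping GDAL implementation):

    /* Decode one run-length segment count. The first byte's top two
       bits give the number of additional count bytes (0-3); its low
       six bits are the most significant bits of the count, and each
       following byte adds eight bits of decreasing significance. */
    static int DecodeSegmentCount( const unsigned char **ppabyCount )
    {
        const unsigned char *pabyCount = *ppabyCount;
        int nExtraBytes = pabyCount[0] >> 6;
        int nCount = pabyCount[0] & 0x3f;

        for( int i = 1; i <= nExtraBytes; i++ )
            nCount = (nCount << 8) | pabyCount[i];

        *ppabyCount += nExtraBytes + 1;
        return nCount;
    }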

in reply to:  35 comment:36 by globalmapper, 16 years ago

Unfortunately, this doesn't help, as the documentation appears to be out of date. In particular, it says 'No compression scheme is used if the data are non-integral,' which is not the case with the problem data. Some form of RLE compression is being used for the floating point values in the problem files, but the exact nature of the RLE encoding for the floating point data is not described. Can you pass on that you need a description of the RLE encoding for floating point samples?

Thanks,

Mike Global Mapper Support support@…

comment:37 by Sam Gillingham, 16 years ago

I have received the following from Leica. I have also been told we will have to pay for any more info.


Our engineers also have noticed the same thing and the implementation may not match the documentation. If / when floating point data makes it down to the compression code it will be treated as 32-bit integers. This shouldn’t be a problem with respect to the integrity of the data, but will normally result in no compression.

in reply to:  37 comment:38 by globalmapper, 16 years ago

Holy Crap! I can't believe it was so simple and we completely overlooked it. By just treating the value as a 32-bit integer until the very end, when it is stuffed in the output array, it worked perfectly for all samples! The following line did the trick:

((float *) pabyDest)[nPixelsOutput] = *((float*)( &nDataValue ));
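
In other words, the assembled 32-bit bit pattern is reinterpreted as a float. A minimal standalone sketch of that reinterpretation (an alternative, strict-aliasing-safe formulation, not the committed code):

    #include <string.h>

    /* Reinterpret the bits of a 32-bit integer as a 32-bit float,
       exactly as the pointer cast above does. */
    static float ReinterpretAsFloat( int nDataValue )
    {
        float fValue;
        memcpy( &fValue, &nDataValue, sizeof(float) );
        return fValue;
    }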

As I am patching an older version of the hfaband.cpp file, here is the relevant updated part of UncompressBlock for you (or someone) to merge into the checked-in version:

/* ==================================================================== */
/*      If this is not run length encoded, but just reduced             */
/*      precision, handle it now.                                       */
/* ==================================================================== */
    if( nNumRuns == -1 )
    {
        pabyValues = pabyCData + 13;
        nValueBitOffset = 0;

        for( nPixelsOutput = 0; nPixelsOutput < nMaxPixels; nPixelsOutput++ )
        {
            int	nDataValue, nRawValue;

/* -------------------------------------------------------------------- */
/*      Extract the data value in a way that depends on the number      */
/*      of bits in it.                                                  */
/* -------------------------------------------------------------------- */
            if( nNumBits == 0 )
            {
                nRawValue = 0;
            }
            else if( nNumBits == 1 )
            {
                nRawValue =
                    (pabyValues[nValueBitOffset>>3] >> (nValueBitOffset&7)) & 0x1;
                nValueBitOffset++;
            }
            else if( nNumBits == 2 )
            {
                nRawValue =
                    (pabyValues[nValueBitOffset>>3] >> (nValueBitOffset&7)) & 0x3;
                nValueBitOffset += 2;
            }
            else if( nNumBits == 4 )
            {
                nRawValue =
                    (pabyValues[nValueBitOffset>>3] >> (nValueBitOffset&7)) & 0xf;
                nValueBitOffset += 4;
            }
            else if( nNumBits == 8 )
            {
                nRawValue = *pabyValues;
                pabyValues++;
            }
            else if( nNumBits == 16 )
            {
                nRawValue = 256 * *(pabyValues++);
                nRawValue += *(pabyValues++);
            }
            else if( nNumBits == 32 )
            {
                nRawValue = 256 * 256 * 256 * *(pabyValues++);
                nRawValue += 256 * 256 * *(pabyValues++);
                nRawValue += 256 * *(pabyValues++);
                nRawValue += *(pabyValues++);
            }
            else
            {
                printf( "nNumBits = %d\n", nNumBits );
                CPLAssert( FALSE );
                nRawValue = 0;
            }

/* -------------------------------------------------------------------- */
/*      Offset by the minimum value.                                    */
/* -------------------------------------------------------------------- */
            nDataValue = nRawValue + nDataMin;

/* -------------------------------------------------------------------- */
/*      Now apply to the output buffer in a type specific way.          */
/* -------------------------------------------------------------------- */
            if( nDataType == EPT_u8 )
            {
                ((GByte *) pabyDest)[nPixelsOutput] = (GByte) nDataValue;
            }
            else if( nDataType == EPT_u1 )
            {
                if( nDataValue == 1 )
                    pabyDest[nPixelsOutput>>3] |= (1 << (nPixelsOutput & 0x7));
                else
                    pabyDest[nPixelsOutput>>3] &= ~(1<<(nPixelsOutput & 0x7));
            }
            else if( nDataType == EPT_u2 )
            {
                /* Four 2-bit pixels per byte; select the position by
                   the low two bits of the pixel index. */
                if( (nPixelsOutput & 0x3) == 0 )
                    pabyDest[nPixelsOutput>>2] = (GByte) nDataValue;
                else if( (nPixelsOutput & 0x3) == 1 )
                    pabyDest[nPixelsOutput>>2] |= (GByte) (nDataValue<<2);
                else if( (nPixelsOutput & 0x3) == 2 )
                    pabyDest[nPixelsOutput>>2] |= (GByte) (nDataValue<<4);
                else
                    pabyDest[nPixelsOutput>>2] |= (GByte) (nDataValue<<6);
            }
            else if( nDataType == EPT_u4 )
            {
                if( (nPixelsOutput & 0x1) == 0 )
                    pabyDest[nPixelsOutput>>1] = (GByte) nDataValue;
                else
                    pabyDest[nPixelsOutput>>1] |= (GByte) (nDataValue<<4);
            }
            else if( nDataType == EPT_u16 )
            {
                ((GUInt16 *) pabyDest)[nPixelsOutput] = (GUInt16) nDataValue;
            }
            else if( nDataType == EPT_s16 )
            {
                ((GInt16 *) pabyDest)[nPixelsOutput] = (GInt16) nDataValue;
            }
            else if( nDataType == EPT_s32 )
            {
                ((GInt32 *) pabyDest)[nPixelsOutput] = nDataValue;
            }
            else if( nDataType == EPT_u32 )
            {
                ((GUInt32 *) pabyDest)[nPixelsOutput] = nDataValue;
            }
            else if( nDataType == EPT_f32 )
            {
/* -------------------------------------------------------------------- */
/*      Note, floating point values are handled as if they were signed  */
/*      32-bit integers.                                                */
/* -------------------------------------------------------------------- */
                ((float *) pabyDest)[nPixelsOutput] = *((float*)( &nDataValue ));
            }
            else
            {
                CPLAssert( FALSE );
            }
        }

        return CE_None;
    }

I have placed a new build of Global Mapper 9 at http://www.globalmapper.com/global_mapper9.zip with the change for you to try. Simply download that file and extract the contents into your existing v9.xx installation folder to give it a try.

Thanks,

Mike Childs Global Mapper Software LLC support@…

comment:39 by Sam Gillingham, 16 years ago

Frank,

Was this fix intended to be in the 1.5.2 release? Or has it been postponed?

comment:40 by warmerdam, 16 years ago

Doh!

Yes, it would have been desirable to get this into 1.5.2. Hopefully it will make it into 1.5.3.

comment:41 by Even Rouault, 16 years ago

Resolution: fixed
Status: new → closed

Fixed in trunk in r14709 and in branches/1.5 in r14710
