Ticket #3366 (closed defect: invalid)

Opened 5 years ago

Last modified 4 years ago

GDALDriver::CreateCopy() and gdal_translate failed on large image

Reported by: ozys
Owned by: warmerdam
Priority: high
Milestone:
Component: GDAL_Raster
Version: unspecified
Severity: major
Keywords:
Cc:

Description

Data used: NITF, 16-bit, scanline, 40k x 100k panchromatic image, JPEG2000 compressed.

We are trying to create a copy of the above data with a square/rectangular block size.

CreateCopy() memory use grows with the size of the image it copies. With the large image above, CreateCopy() completed about 80% of its task, by which point it had used all 8GB of swap and 98.5% of the 8GB of memory, and the system killed the process. A similar trend is observed with gdal_translate.

We also tested with uncompressed NITF, and a similar trend of growing memory use was observed.

valgrind was used to detect problems (gdal compiled with flags). No memory leak related to this issue was found. So it appears that CreateCopy() makes an ever-growing request for memory that seems to correlate with the size of the file it copies. This memory appears to be freed properly at the end of the copy, so valgrind does not report it as a leak.

We tested this using gdal 1.7 and the previous release of gdal, with identical results. The GDAL cache was also varied from 64MB to 1GB, with similar results (the task failed to complete and the process was killed by the system).
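
For scale, here is a rough calculation of the raw pixel volume involved. It is only a sketch: it assumes "40k x 100k" means exactly 40000 x 100000 pixels and a single 16-bit band, and the variable names are illustrative.

```python
# Back-of-the-envelope size of the uncompressed raster described above.
WIDTH = 40_000          # pixels per line ("40k", assumed exact)
HEIGHT = 100_000        # number of lines ("100k", assumed exact)
BYTES_PER_SAMPLE = 2    # 16-bit panchromatic, one band

raw_bytes = WIDTH * HEIGHT * BYTES_PER_SAMPLE
raw_gib = raw_bytes / 2**30

print(f"{raw_bytes:,} bytes = {raw_gib:.2f} GiB")
# prints "8,000,000,000 bytes = 7.45 GiB"
```

Under those assumptions the decoded image alone is close to the 8GB of physical memory on the machine, so a copy can only succeed if already-processed data is released as the copy proceeds.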

Attachments

libecwj2-3.3-NCSPhysicalMemorySize-Linux.patch (0.5 KB) - added by rouault 5 years ago.
Avoid overflow in Linux implementation of NCSPhysicalMemorySize() in libecwj2-3.3

Change History

Changed 5 years ago by rouault

Ozy,

  • What is the target format of CreateCopy()/gdal_translate? NITF?

  • Did you try what I suggested in my email, which worked like a charm for me? That is, create a big uncompressed NITF file with Python:
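from osgeo import gdal
# Create an empty single-band 40000 x 100000 Int16 NITF (uncompressed by default)
ds = gdal.GetDriverByName('NITF').Create('scanline.ntf', 40000, 100000, 1, gdal.GDT_Int16)
# Dropping the reference closes the dataset and flushes it to disk
ds = None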

And then:

gdal_translate -of NITF scanline.ntf tiled.ntf -co BLOCKSIZE=128

Changed 5 years ago by ozys

Even,

My target format is NITF, uncompressed.

The test you suggested worked fine. Memory use never exceeded 2.2GB on my system.

I take it that the test image created by your Python script is uncompressed. I am not familiar with using gdal from Python. How do you create a scanline NITF compressed with JPEG2000 in Python? I would like to run the gdal_translate test again using a compressed fake NITF file.

ozy

Here is a snapshot of my C++ code:

        GDALDataset* ds = (GDALDataset*) GDALOpen( src_image.c_str(), GA_ReadOnly );

        char **create_options = NULL;
        create_options = CSLSetNameValue( create_options, "IC", "NC" );
        create_options = CSLSetNameValue( create_options, "ICORDS", "G" );
        create_options = CSLSetNameValue( create_options, "BLOCKXSIZE", "128" );
        create_options = CSLSetNameValue( create_options, "BLOCKYSIZE", "128" );

        std::stringstream ss;

        if ( ds->GetMetadataItem( "RPC00A","TRE" ) != NULL )
        {
                ss.str("");
                ss << "RPC00A=" << ds->GetMetadataItem( "RPC00A","TRE" );
                create_options = CSLSetNameValue( create_options,"TRE", ss.str().c_str() );
        }else
        if ( ds->GetMetadataItem( "RPC00B","TRE" ) != NULL )
        {
                ss.str("");
                ss << "RPC00B=" << ds->GetMetadataItem( "RPC00B","TRE" );
                create_options = CSLSetNameValue( create_options,"TRE", ss.str().c_str() );
        }else
        {
                ss.str("");
                ss << "In scanline_to_block(), GetMetadataItem(\"RPC00A\",\"TRE\") and GetMetadataItem(\"RPC00B\",\"TRE\") return NULL";
                throw std::logic_error( ss.str() );
        }

        GDALDriver *driver = GetGDALDriverManager()->GetDriverByName("NITF");

        GDALDataset* out_ds= NULL;

        out_ds = driver->CreateCopy( out_image.c_str(), ds, FALSE, create_options, GDALTermProgress, NULL );

Changed 5 years ago by rouault

I've managed to generate a large JPEG2000 NITF from the scanline.ntf produced by the Python script, using the following command line: gdal_translate --config ECW_LARGE_OK YES -of NITF scanline.ntf scanline_j2k.ntf -co IC=C8

(This takes several minutes to complete. The resulting image is only 8-bit, not 16-bit, as the ECW driver doesn't support 16-bit.)

After that, I've done a gdalinfo -checksum / gdal_translate on scanline_j2k.ntf and I saw the memory usage increasing substantially, but it didn't go above 500 MB.

When using the JP2MrSID driver as the underlying driver, the memory usage stays quite low. So, the NITF driver is OK and I suspect this is an issue/feature of the ECW SDK or the way we use it in GDAL.

On further reading of the GDAL ECW driver and its documentation (http://gdal.org/frmt_ecw.html), I read that: "By default the ECW SDK will use up to one quarter of physical RAM for internal caching and work memory during decoding. This is normally fine, but the amount of memory to use can be adjusted by setting the GDAL_ECW_CACHE_MAXMEM configuration variable with an amount of memory in bytes. Config variables can be set in the environment or using the --config commandline switch."

Maybe the SDK does not work as advertised on your platform. I've tried setting the configuration variable GDAL_ECW_CACHE_MAXMEM to a small value (50 MB) and it did reduce the memory usage (although I wouldn't swear that it stayed exactly within the specified value). I'd encourage you to do the same, and hopefully it will solve your problem.
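
As a sketch of what that could look like from a script, the snippet below assembles a gdal_translate command line that passes GDAL_ECW_CACHE_MAXMEM through --config. The helper build_translate_command and the output name tiled.ntf are illustrative, not part of GDAL; 50 MB is written in bytes because the documentation quoted above says the value is in bytes.

```python
import os
import shutil
import subprocess

# 50 MB expressed in bytes, as the quoted documentation asks for a byte count.
ECW_CACHE_BYTES = 50 * 1024 * 1024


def build_translate_command(src, dst, cache_bytes=ECW_CACHE_BYTES):
    """Return a gdal_translate argv that caps the ECW SDK cache (hypothetical helper)."""
    return [
        "gdal_translate",
        "--config", "GDAL_ECW_CACHE_MAXMEM", str(cache_bytes),
        "-of", "NITF",
        "-co", "BLOCKSIZE=128",
        src, dst,
    ]


cmd = build_translate_command("scanline_j2k.ntf", "tiled.ntf")
print(" ".join(cmd))

# Only attempt the conversion when gdal_translate is installed and the input exists.
if shutil.which("gdal_translate") and os.path.exists("scanline_j2k.ntf"):
    subprocess.run(cmd, check=True)
```

Setting the same variable in the environment before launching the process should be equivalent, per the documentation quoted above.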

Changed 5 years ago by rouault

Actually, when looking a bit in the source code of the ECW SDK itself, I suspect there is a bug in the INT32 NCSPhysicalMemorySize(void) function of libecwj2-3.3/Source/C/NCSUtil/malloc.c.

In the Linux case, it expands to :

return(sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGESIZE))

If your system has more than 2GB of RAM, the result does not fit: the return type is only an INT32 and the multiplication is not checked for overflow (unlike the win32 case). You end up with a junk value, and later checks based on 1/4 of that junk value probably have unspecified behaviour.

The attached patch might help.
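
To illustrate the overflow described above, here is a small Python simulation. It is a sketch under assumptions: signed overflow is formally undefined in C, so the two's-complement wrap-around modelled by wrap_to_int32 is only the typical outcome; the 4096-byte page size is assumed; and the clamped variant shows one possible guard, not necessarily what the attached patch does. All function names are illustrative.

```python
INT32_MAX = 2**31 - 1


def wrap_to_int32(value):
    """Reinterpret an integer as a two's-complement signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def physical_memory_unchecked(pages, page_size):
    """Mimic returning pages * page_size through an INT32 with no overflow check."""
    return wrap_to_int32(pages * page_size)


def physical_memory_clamped(pages, page_size):
    """One possible guard: saturate at INT32_MAX instead of wrapping."""
    return min(pages * page_size, INT32_MAX)


PAGE_SIZE = 4096  # assumed page size, for illustration only

for gib in (1, 3, 8):
    pages = gib * 2**30 // PAGE_SIZE
    print(gib,
          physical_memory_unchecked(pages, PAGE_SIZE),
          physical_memory_clamped(pages, PAGE_SIZE))
# prints:
# 1 1073741824 1073741824
# 3 -1073741824 2147483647
# 8 0 2147483647
```

In this model a machine with exactly 8 GiB of RAM would report 0 bytes of physical memory, and a 3 GiB machine a negative amount, so any "one quarter of physical RAM" limit derived from that value is meaningless.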

Changed 5 years ago by rouault

Attachment libecwj2-3.3-NCSPhysicalMemorySize-Linux.patch added: avoid overflow in Linux implementation of NCSPhysicalMemorySize() in libecwj2-3.3

Changed 4 years ago by rouault

  • status changed from new to closed
  • resolution set to invalid

Closing. I've referenced the patch in http://trac.osgeo.org/gdal/wiki/ECW?action=diff&version=6
