Opened 10 years ago

Closed 10 years ago

#4228 closed defect (fixed)

[PATCH] HDF5 variable length string attributes

Reported by: vroeyt Owned by: warmerdam
Priority: normal Milestone: 1.9.0
Component: default Version: 1.8.0
Severity: normal Keywords: HDF5
Cc: antonio

Description

GDAL crashes (memory corruption problem) when it tries to read an HDF-5 attribute containing variable length string values.

I fixed the bug in hdf5dataset.cpp starting from line 625:

if( H5Tget_class( hAttrNativeType ) == H5T_STRING ) {

    /* Here I've made the bug fix */

}

See the attached hdf5dataset.cpp containing the bug fix.

Attachments (3)

hdf5dataset.cpp (35.1 KB) - added by vroeyt 10 years ago.
ticket_4228.patch (2.2 KB) - added by Even Rouault 10 years ago.
Patch built from merging the full provided file with the trunk version
sample.zip (49.2 KB) - added by vroeyt 10 years ago.


Change History (16)

Changed 10 years ago by vroeyt

Attachment: hdf5dataset.cpp added

comment:1 Changed 10 years ago by antonio

Cc: antonio added

Changed 10 years ago by Even Rouault

Attachment: ticket_4228.patch added

Patch built from merging the full provided file with the trunk version

comment:2 Changed 10 years ago by Even Rouault

Summary: HDF5 variable length string attributes → [PATCH] HDF5 variable length string attributes

vroeyt, to make the review process easier, it is better to attach patches rather than full files, and to base them on the svn trunk development version.

comment:3 Changed 10 years ago by antonio

vroeyt, can you please provide a simple dataset that reproduces the issue? Thanks.

Changed 10 years ago by vroeyt

Attachment: sample.zip added

comment:4 Changed 10 years ago by vroeyt

Added sample.zip (ZIP file containing a sample HDF5 file)

comment:5 Changed 10 years ago by antonio

Resolution: fixed
Status: new → closed

Thanks vroeyt. Fixed in trunk r23044.

comment:6 Changed 10 years ago by Even Rouault

Antonio,

I've committed r23047 /trunk/gdal/frmts/hdf5/hdf5dataset.cpp: HDF5: only free string array content on Linux for now (#4228)

How did I come up with this? I have the #define DEBUG_VSIMALLOC and #define DEBUG_VSIMALLOC_STATS lines uncommented in port/cpl_vsisimple.cpp in my build. That way I can detect mismatched uses of the standard malloc()/free() with CPLMalloc()/CPLFree(). I got a crash in hdf5_9 indicating such a mismatch.

The issue is that H5Aread() allocates the strings with its own memory allocator, so using CPLFree() isn't appropriate. On Linux you won't see a difference, but on Windows you could get crashes if the HDF5 library is compiled against a different C runtime than GDAL.

So for now, I've just replaced CPLFree() with free() and disabled the call on Windows. Better a memory leak than a crash.

I've looked a bit at the HDF5 documentation to see how to provide a custom memory allocator. I found H5Pset_vlen_mem_manager(), which looked promising, but I didn't manage to use it. If you have ideas...

comment:7 Changed 10 years ago by antonio

Thank you Even for fixing the issue. I will try to take a deeper look.

comment:8 Changed 10 years ago by antonio

OK Even, it should now be fixed in r23052. Can you please confirm?

It feels a little strange to use a function of the dataset API (H5Dvlen_reclaim) in this context, but it seems to work.

comment:9 Changed 10 years ago by Even Rouault

Works for me on Linux, and from an outsider's point of view it looks appropriate. We will see tomorrow whether Tamas' bots are still happy: http://www.gisinternals.com/sdk/

comment:10 Changed 10 years ago by Even Rouault

Resolution: fixed
Status: closed → reopened

Antonio,

I see that you have recently pushed autotest/gdrivers/data/vlstr_metadata.h5, which is 5 MB. This is really huge with respect to our usual practice. Quoting Frank in #4233: "We deliberately keep file sizes in svn quite modest to avoid bloating the autotest package and to avoid making svn grabs painfully large."

I've uploaded vlstr_metadata.h5 to http://download.osgeo.org/gdal/data/hdf5/vlstr_metadata.h5

Could you please svn delete autotest/gdrivers/data/vlstr_metadata.h5 and update hdf5.py so that it fetches the file from there? You can use gdaltest.download_file() for that; it is already used at the bottom of hdf5.py. Thanks.

comment:11 Changed 10 years ago by antonio

Sorry Even, I didn't realize the dataset was so large. I think I can shrink it to a few bytes; after all, it is currently used only for testing VL metadata. Is that OK for you?

comment:12 Changed 10 years ago by Even Rouault

Yes, using shortened versions of files (if you read only the metadata and not the imagery) is something that is regularly done in the autotests. As a rule of thumb, avoid putting files larger than 10 KB into SVN.

comment:13 Changed 10 years ago by antonio

Resolution: fixed
Status: reopened → closed

A small version of the dataset is now in svn (r23083).
