Opened 6 years ago

Closed 6 years ago

#4228 closed defect (fixed)

[PATCH] HDF5 variable length string attributes

Reported by: vroeyt Owned by: warmerdam
Priority: normal Milestone: 1.9.0
Component: default Version: 1.8.0
Severity: normal Keywords: HDF5
Cc: antonio

Description

GDAL crashes (memory corruption) when it tries to read an HDF5 attribute containing variable-length string values.

I fixed the bug in hdf5dataset.cpp starting from line 625:

    if( H5Tget_class( hAttrNativeType ) == H5T_STRING ) {
        /* ... bug fix applied here ... */
    }

See the attached hdf5dataset.cpp containing the bug fix.

Attachments (3)

hdf5dataset.cpp (35.1 KB) - added by vroeyt 6 years ago.
ticket_4228.patch (2.2 KB) - added by Even Rouault 6 years ago.
Patch built by merging the full provided file with the trunk version
sample.zip (49.2 KB) - added by vroeyt 6 years ago.


Change History (16)

Changed 6 years ago by vroeyt

Attachment: hdf5dataset.cpp added

comment:1 Changed 6 years ago by antonio

Cc: antonio added

Changed 6 years ago by Even Rouault

Attachment: ticket_4228.patch added

Patch built by merging the full provided file with the trunk version

comment:2 Changed 6 years ago by Even Rouault

Summary: HDF5 variable length string attributes → [PATCH] HDF5 variable length string attributes

vroeyt, to make the review process easier, it is better to attach patches rather than full files, and to base them on the svn trunk development version.

comment:3 Changed 6 years ago by antonio

vroeyt, can you please provide a simple dataset to reproduce the issue? Thanks

Changed 6 years ago by vroeyt

Attachment: sample.zip added

comment:4 Changed 6 years ago by vroeyt

Added sample.zip (ZIP file containing a sample HDF5 file)

comment:5 Changed 6 years ago by antonio

Resolution: fixed
Status: new → closed

Thanks vroeyt. Fixed in trunk r23044.

comment:6 Changed 6 years ago by Even Rouault

Antonio,

I've committed r23047 "/trunk/gdal/frmts/hdf5/hdf5dataset.cpp: HDF5: only free string array content on Linux for now (#4228)".

How did I come across this? In my build, I have #define DEBUG_VSIMALLOC and #define DEBUG_VSIMALLOC_STATS uncommented in port/cpl_vsisimple.cpp. That way I can detect mismatched uses of the standard malloc()/free() with CPLMalloc()/CPLFree(). I got a crash in hdf5_9 indicating such a mismatch.

The issue is that H5Aread() allocates the strings with its own memory allocator, so using CPLFree() isn't appropriate. On Linux, you won't see a difference, but on Windows you could get crashes if the HDF5 library is compiled against a different C runtime than GDAL.

So for now, I've just replaced CPLFree() with free() and disabled it in the Windows case. Better a memory leak than a crash.

I've looked a bit at the HDF5 documentation to see how to provide a custom memory allocator. I found H5Pset_vlen_mem_manager(), which looked promising, but I didn't manage to use it. If you have ideas...

comment:7 Changed 6 years ago by antonio

Thank you Even for fixing the issue. I will try to give a deeper look.

comment:8 Changed 6 years ago by antonio

OK Even, it should now be fixed in r23052. Can you please confirm?

It feels a little strange to use a function of the dataset API (H5Dvlen_reclaim) in this context, but it seems to work.

comment:9 Changed 6 years ago by Even Rouault

Works for me on Linux, and from an outsider's point of view it looks appropriate. We will see tomorrow whether Tamas' bots are still happy: http://www.gisinternals.com/sdk/

comment:10 Changed 6 years ago by Even Rouault

Resolution: fixed
Status: closed → reopened

Antonio,

I see that you have recently pushed autotest/gdrivers/data/vlstr_metadata.h5, which is 5 MB. This is really huge with respect to our usual practice. Quoting Frank in #4233: "We deliberately keep file sizes in svn quite modest to avoid bloating the autotest package and to avoid making svn grabs painfully large."

I've uploaded vlstr_metadata.h5 to http://download.osgeo.org/gdal/data/hdf5/vlstr_metadata.h5

Could you please svn delete autotest/gdrivers/data/vlstr_metadata.h5 and update hdf5.py so that it fetches the file from there? You can use gdaltest.download_file() for that; it is used at the bottom of hdf5.py. Thanks

comment:11 Changed 6 years ago by antonio

Sorry Even, I didn't realize the dataset was so large. I think I can shrink it to a few bytes; after all, it is currently used only for testing VL metadata. Is that OK with you?

comment:12 Changed 6 years ago by Even Rouault

Yes, using shortened versions of files (if you read only the metadata and not the imagery) is something that is regularly done in the autotests. As a rule of thumb, avoid putting files larger than 10 KB into SVN.

comment:13 Changed 6 years ago by antonio

Resolution: fixed
Status: reopened → closed

A small version of the dataset is now in svn (r23083).
