Opened 11 years ago

Closed 11 years ago

#4228 closed defect (fixed)

[PATCH] HDF5 variable length string attributes

Reported by: vroeyt Owned by: warmerdam
Priority: normal Milestone: 1.9.0
Component: default Version: 1.8.0
Severity: normal Keywords: HDF5
Cc: antonio

Description

GDAL crashes (memory corruption problem) when it tries to read an HDF-5 attribute containing variable length string values.

I fixed the bug in hdf5dataset.cpp starting from line 625:

if( H5Tget_class( hAttrNativeType ) == H5T_STRING ) {

Here I've made the bug fix

}

See the attached hdf5dataset.cpp containing the bug fix.

Attachments (3)

hdf5dataset.cpp (35.1 KB ) - added by vroeyt 11 years ago.
ticket_4228.patch (2.2 KB ) - added by Even Rouault 11 years ago.
Patch build from merging full provided file with trunk version
sample.zip (49.2 KB ) - added by vroeyt 11 years ago.

Download all attachments as: .zip

Change History (16)

by vroeyt, 11 years ago

Attachment: hdf5dataset.cpp added

comment:1 by antonio, 11 years ago

Cc: antonio added

by Even Rouault, 11 years ago

Attachment: ticket_4228.patch added

Patch build from merging full provided file with trunk version

comment:2 by Even Rouault, 11 years ago

Summary: HDF5 variable length string attributes[PATCH] HDF5 variable length string attributes

vroeyt, to make the review process easier, it is better to attach patches rather than full versions, and to base them on the svn trunk developement version.

comment:3 by antonio, 11 years ago

vroeyt can you please provide a simple dataset to reproduce the issue? Thanks

by vroeyt, 11 years ago

Attachment: sample.zip added

comment:4 by vroeyt, 11 years ago

Added sample.zip (ZIP - file containing sample HDF5-file)

comment:5 by antonio, 11 years ago

Resolution: fixed
Status: newclosed

Thanks vroeyt. Fixed in trunk r23044.

comment:6 by Even Rouault, 11 years ago

Antonio,

I've commited r23047 /trunk/gdal/frmts/hdf5/hdf5dataset.cpp: HDF5: only free string array content on Linux for now (#4228)

How did I come with this ? I have #define DEBUG_VSIMALLOC and #define DEBUG_VSIMALLOC_STATS uncommended defined in port/cpl_vsisimple.cpp in my build. That way I can detect mismatched uses of standard malloc/free with CPLMalloc()/CPLFree(). I got a crash in hdf5_9 indicating such a mismatch.

The issue is that H5Aread() allocates the strings with its own memory allocator, so using CPLFree() isn't appropriate. On Linux, you won't see a difference, but on Windows, you could get crashes if HDF5 lib is compiled with a different C runtime than GDAL.

So for now, I've just replaced CPLFree() by free() and disable it in the Windows case. Better a memory leak than a crash.

I've looked a bit in HDF5 doc to see how to provide a custom memory allocator, I found H5Pset_vlen_mem_manager() which looked promising, but didn't manage to use it. If you have ideas...

comment:7 by antonio, 11 years ago

Thank you Even for fixing the issue. I will try to give a deeper look.

comment:8 by antonio, 11 years ago

OK Even, it should be now fixed in r23052. Can you pleas confirm?

It feels a little bit strange to use a function of the dataset API (H5Dvlen_reclaim) in this context but it seems to work.

comment:9 by Even Rouault, 11 years ago

Works for me on Linux and from an outsider point of view, looks appropriate. We will see tomorrow if Tamas' bots are still happy : http://www.gisinternals.com/sdk/

comment:10 by Even Rouault, 11 years ago

Resolution: fixed
Status: closedreopened

Antonio,

I see that you have pushed recently autotest/gdrivers/data/vlstr_metadata.h5 which is 5MB large. This is really really huge with respect to our usual practice. Quoting Frank in #4233 "We deliberately keep file sizes in svn quite modest to avoid bloating the autotest package and to avoid making svn grabs painfully large."

I've uploaded vlstr_metadata.h5 to http://download.osgeo.org/gdal/data/hdf5/vlstr_metadata.h5

Could you please svn delete autotest/gdrivers/data/vlstr_metadata.h5 and update hdf5.py so it fetches the file from there. You can use gdaltest.download_file() for that. It is just used at the bottom of hdf5.py. Thanks

comment:11 by antonio, 11 years ago

Sorry Even, I didn't realized the dataset was so large. I think I can shrink it to few bytes, after all it is currently used only for testing VL metadata. Is it OK for you?

comment:12 by Even Rouault, 11 years ago

Yes, using shortened versions of files (if you read only the metadata and not the imagery) is something that is regularly done in autotests. As a rule of thumb, avoid putting files > 10 Ko into SVN

comment:13 by antonio, 11 years ago

Resolution: fixed
Status: reopenedclosed

A small version of the dataset is now in svn (r23083).

Note: See TracTickets for help on using tickets.