Opened 15 years ago

Closed 15 years ago

#3023 closed defect (invalid)

memory leak in hdf driver

Reported by: vincentschut Owned by: warmerdam
Priority: normal Milestone:
Component: GDAL_Raster Version: svn-trunk
Severity: normal Keywords: HDF4
Cc: Kyle Shannon

Description (last modified by warmerdam)

I'm afraid I've encountered a memory leak in the HDF driver. Context: because of massively parallel (cluster) processing, I'm reusing a Python instance for lots of jobs. Some of these jobs use GDAL to read and/or save data. In some cases I saw the memory use of my Python worker processes grow with each job until the whole lot got killed. I've got 8 GB on that server :-). I've been able to track the leak down to the use of gdal.Open on an HDF (MODIS) dataset. I guess this means that either the GDAL HDF driver or libhdf4 has the leak.

Here is some Python code (Linux-only, due to the code used to get the memory usage) that demonstrates a leak when opening an HDF file but none when opening a TIFF. Run it in a folder containing 'data.hdf' and 'data.tif', or change those names to an existing HDF and TIFF file.

Using 64-bit Linux, GDAL svn rev. 17228 (today), libhdf4.2r2, Python 2.6.2.

========= python code =========

import os
from osgeo import gdal

def getmemory():
   # Read this process's VmSize from /proc/<pid>/status (Linux only)
   # and return it in bytes.
   proc_status = '/proc/%d/status' % os.getpid()
   scale = {'kB': 1024.0, 'mB': 1024.0*1024.0,
            'KB': 1024.0, 'MB': 1024.0*1024.0}
   v = open(proc_status).read()
   i = v.index('VmSize:')
   v = v[i:].split(None, 3)
   return int(v[1]) * scale[v[2]]

nFiles = 100

m0 = getmemory()
print 'memory usage before:', m0
print
print nFiles, 'times the same hdf file'
for i in range(nFiles):
   gdal.OpenShared('data.hdf')

m1 = getmemory()
print 'memory usage now:', m1, '  difference:', m1-m0
print
print nFiles, 'times the same tif file'
for i in range(nFiles):
   gdal.OpenShared('data.tif')

m2 = getmemory()
print 'memory usage now:', m2, '  difference:', m2-m1
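
The VmSize parsing inside getmemory() can also be checked in isolation by feeding it a canned status string. A minimal sketch (parse_vmsize is just a testable variant of the function above, and the sample text is fabricated for illustration):

```python
def parse_vmsize(status_text):
    # Same logic as getmemory() above, but taking the status text as a
    # parameter so it can be exercised without a live /proc entry.
    scale = {'kB': 1024.0, 'mB': 1024.0 * 1024.0,
             'KB': 1024.0, 'MB': 1024.0 * 1024.0}
    i = status_text.index('VmSize:')
    fields = status_text[i:].split(None, 3)
    return int(fields[1]) * scale[fields[2]]

# Fabricated /proc/<pid>/status snippet:
sample = 'Name:\tpython\nVmSize:\t  123456 kB\nVmLck:\t 0 kB\n'
print(parse_vmsize(sample))  # 126418944.0 (123456 kB in bytes)
```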

Attachments (1)

valgrind.txt (3.0 KB ) - added by vincentschut 15 years ago.
valgrind output of gdalinfo on a (modis) hdf file


Change History (9)

comment:1 by Kyle Shannon, 15 years ago

Cc: Kyle Shannon added

comment:2 by warmerdam, 15 years ago

Component: default → GDAL_Raster
Description: modified (diff)
Keywords: HDF4 added
Status: new → assigned

I have tried running the command:

gdalinfo -mm HDF4_EOS:EOS_SWATH:"MOD28L2.A2001213.1525.004.2002197060500.hdf":Swath:sst

under valgrind, and it reports no apparent memory leaks. Are you working on Linux? Can you try using valgrind to identify the leaks you are encountering?

comment:3 by vincentschut, 15 years ago

The leak is in opening the main HDF file, not in opening an SDS (try changing the above Python snippet to open an SDS, and you'll see no leak; open the main HDF file and there is a leak). For info on my OS, library versions, etc., see the original report. It's there, really.
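
For illustration, a subdataset is opened through a composite name rather than the container path. A minimal sketch of building such a name (the swath and field names below are taken from the gdalinfo command in comment 2 and are specific to that product; any real file would use its own):

```python
def hdf4_eos_sds_name(hdf_path, swath, field):
    # Compose a GDAL HDF4_EOS subdataset name of the form
    # HDF4_EOS:EOS_SWATH:"<file>":<swath>:<field>
    return 'HDF4_EOS:EOS_SWATH:"%s":%s:%s' % (hdf_path, swath, field)

# gdal.OpenShared(hdf4_eos_sds_name('data.hdf', 'Swath', 'sst')) would then
# exercise the subdataset code path instead of the container file.
print(hdf4_eos_sds_name('data.hdf', 'Swath', 'sst'))
# HDF4_EOS:EOS_SWATH:"data.hdf":Swath:sst
```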

I ran 'valgrind --leak-check=full --show-reachable=yes gdalinfo data.hdf'; here's the output (gdalinfo's own output omitted):

==6686== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 1)
==6686== malloc/free: in use at exit: 6,801 bytes in 393 blocks.
==6686== malloc/free: 42,613 allocs, 42,220 frees, 15,364,923 bytes allocated.
==6686== For counts of detected errors, rerun with: -v
==6686== searching for pointers to 393 not-freed blocks.
==6686== checked 1,591,120 bytes.
==6686==
==6686== 200 bytes in 5 blocks are still reachable in loss record 1 of 3
==6686==    at 0x4C2391E: malloc (vg_replace_malloc.c:207)
==6686==    by 0x526860E: CPLCreateMutex (cpl_multiproc.cpp:722)
==6686==    by 0x52686E4: CPLCreateOrAcquireMutex (cpl_multiproc.cpp:119)
==6686==    by 0x5268742: CPLMutexHolder::CPLMutexHolder(void*, double, char const*, int) (cpl_multiproc.cpp:63)
==6686==    by 0x5235AF2: GetGDALDriverManager (gdaldrivermanager.cpp:73)
==6686==    by 0x509C418: GDALAllRegister (gdalallregister.cpp:76)
==6686==    by 0x4025E5: main (gdalinfo.c:89)
==6686==
==6686== 256 bytes in 1 blocks are still reachable in loss record 2 of 3
==6686==    at 0x4C2391E: malloc (vg_replace_malloc.c:207)
==6686==    by 0x53DD02E: NC_reset_maxopenfiles (in /usr/local/lib/libgdal.so.1.13.0)
==6686==    by 0x53DD096: NC_open (in /usr/local/lib/libgdal.so.1.13.0)
==6686==    by 0x53CC2A6: SDstart (in /usr/local/lib/libgdal.so.1.13.0)
==6686==    by 0x50E32EE: HDF4Dataset::Open(GDALOpenInfo*) (hdf4dataset.cpp:679)
==6686==    by 0x52303DA: GDALOpen (gdaldataset.cpp:2120)
==6686==    by 0x402702: main (gdalinfo.c:149)
==6686==
==6686== 6,345 bytes in 387 blocks are definitely lost in loss record 3 of 3
==6686==    at 0x4C2391E: malloc (vg_replace_malloc.c:207)
==6686==    by 0x541118B: VPgetinfo (in /usr/local/lib/libgdal.so.1.13.0)
==6686==    by 0x54114D5: Vinitialize (in /usr/local/lib/libgdal.so.1.13.0)
==6686==    by 0x53DB15E: NC_new_cdf (in /usr/local/lib/libgdal.so.1.13.0)
==6686==    by 0x53DD134: NC_open (in /usr/local/lib/libgdal.so.1.13.0)
==6686==    by 0x53CC2A6: SDstart (in /usr/local/lib/libgdal.so.1.13.0)
==6686==    by 0x5078B6A: EHopen (EHapi.c:374)
==6686==    by 0x50E3396: HDF4Dataset::Open(GDALOpenInfo*) (hdf4dataset.cpp:767)
==6686==    by 0x52303DA: GDALOpen (gdaldataset.cpp:2120)
==6686==    by 0x402702: main (gdalinfo.c:149)
==6686==
==6686== LEAK SUMMARY:
==6686==    definitely lost: 6,345 bytes in 387 blocks.
==6686==    possibly lost: 0 bytes in 0 blocks.
==6686==    still reachable: 456 bytes in 6 blocks.
==6686==    suppressed: 0 bytes in 0 blocks.

By the way, I *do* get 1410 bytes 'definitely lost' according to valgrind when running gdalinfo on an SDS instead of on an HDF file. However, in the real world (read: my processing chain) opening subdatasets does not increase my memory usage, while opening HDF files does.

comment:4 by vincentschut, 15 years ago

I see something mangled my valgrind output... I'll attach it, 'cause I don't know how to edit my previous comment to repair it.

by vincentschut, 15 years ago

Attachment: valgrind.txt added

valgrind output of gdalinfo on a (modis) hdf file

comment:5 by Even Rouault, 15 years ago

From the valgrind trace, it would seem that the leak comes from SWopen() in the HDF-EOS code path. But I see a call to SWclose() a few lines below, so they look properly paired.

(Opening the HDF file itself or one of its subdatasets doesn't go through the same source file, so it is not so surprising that you get different results w.r.t. memory leaks.)

I've run "valgrind --leak-check=full gdalinfo MISR_AM1_CGAS_MAY_2007_F15_0031.hdf", where this file is an HDF-EOS file coming from ftp://l4ftl01.larc.nasa.gov/MISR/MIL3MAE.004/2007.05.01/

--> No leak.

So it might come from your particular file, the libhdf4 version, a 32/64-bit issue, ... My environment: Ubuntu 8.04, 32-bit, libhdf4g 4.1r4-21.

If you could attach a (small) dataset with which you can reproduce the issue, or provide a link to it, that might help, along with the output of gdalinfo on it.

comment:6 by vincentschut, 15 years ago

I've been able to test the same file(s) on a different system, also 64-bit Linux but with a different hdf library release, which gives no leak. I'll investigate further, try to get the same hdf lib working on the faulty system, and report back. Thanks for investigating.

comment:7 by vincentschut, 15 years ago

Compiled and installed hdf4.2r4 (the leaking version was 4.2r2) and rebuilt GDAL. The leak seems gone; valgrind output:

==16306== LEAK SUMMARY:
==16306==    definitely lost: 0 bytes in 0 blocks.
==16306==    possibly lost: 0 bytes in 0 blocks.
==16306==    still reachable: 200 bytes in 5 blocks.
==16306==    suppressed: 0 bytes in 0 blocks.

In Python I still see some memory growth, but only once, when opening the first file. Consecutive gdal.Open calls don't leak memory, so I hope this is just some initialization overhead.
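
One way to separate such one-time initialization from a genuine per-open leak is to do a warm-up call before measuring. A minimal, GDAL-free sketch of that pattern (grows_per_call and the dummy action are hypothetical helpers; in practice the action would be lambda: gdal.OpenShared('data.hdf') and the measurement getmemory()):

```python
def grows_per_call(action, measure, n=100):
    # Run the action once as a warm-up so one-time allocations
    # (driver registration, caches) fall outside the measurement,
    # then report the growth in the measurement over n further calls.
    action()
    before = measure()
    for _ in range(n):
        action()
    return measure() - before

# Dummy demonstration: an action that retains one object per call
# (a stand-in for a leaking open), measured by container size.
retained = []
leak_growth = grows_per_call(lambda: retained.append(object()),
                             lambda: len(retained))
print(leak_growth)  # 100: one object retained per call after warm-up
```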

Thanks for working on this with me. I suppose we can flag it solved for now. If I still encounter leaks in my real-world processing I'll come back.

comment:8 by Even Rouault, 15 years ago

Resolution: invalid
Status: assigned → closed

Closing, then.
