Opened 15 years ago
Last modified 15 years ago
#3023 closed defect
memory leak in hdf driver
Reported by: | vincentschut | Owned by: | warmerdam |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | GDAL_Raster | Version: | svn-trunk |
Severity: | normal | Keywords: | HDF4 |
Cc: | Kyle Shannon | | |
Description
I'm afraid I've encountered a memory leak in the HDF driver. Context: because of massively parallel (cluster) processing, I'm reusing a Python instance for many jobs. Some of these jobs use GDAL to read and/or save data. In some cases, I saw the memory use of my Python worker processes grow with each job until the whole lot got killed, and I've got 8 GB on that server :-) . I've been able to track the leak down to the use of gdal.Open on an HDF (MODIS) dataset, so I guess either the GDAL HDF driver or libhdf4 has the leak.
Here is some Python code (Linux only, because it reads the memory usage from /proc) that demonstrates a leak when opening an HDF file, while there is no leak when opening a TIFF file. Run it in a folder containing 'data.hdf' and 'data.tif', or change those names to an existing HDF and TIFF file.
Using 64-bit Linux, GDAL svn rev. 17228 (today), libhdf4.2r2, Python 2.6.2.
========= python code =========
import os
from osgeo import gdal

def getmemory():
    # Return this process's virtual memory size (VmSize from /proc) in bytes.
    proc_status = '/proc/%d/status' % os.getpid()
    scale = {'kB': 1024.0, 'mB': 1024.0*1024.0,
             'KB': 1024.0, 'MB': 1024.0*1024.0}
    v = open(proc_status).read()
    i = v.index('VmSize:')
    v = v[i:].split(None, 3)  # ['VmSize:', '<number>', '<unit>', rest]
    return (int(v[1]) * scale[v[2]])

nFiles = 100

m0 = getmemory()
print 'memory usage before:', m0
print
# Open the same HDF file nFiles times; memory should stay flat.
print nFiles, 'times the same hdf file'
for i in range(nFiles):
    gdal.OpenShared('data.hdf')
m1 = getmemory()
print 'memory usage now:', m1, ' difference:', m1-m0
print
# Same loop with a TIFF file, for comparison.
print nFiles, 'times the same tif file'
for i in range(nFiles):
    gdal.OpenShared('data.tif')
m2 = getmemory()
print 'memory usage now:', m2, ' difference:', m2-m1
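To express the growth per open call instead of as a single total, the measurement in the script above can be wrapped in a small helper. This is a sketch that is not part of the original report: the names parse_status, current_memory and growth_per_call are hypothetical, it is written for Python 3 (the report uses Python 2.6), and like the original script it only works on Linux because it reads /proc. It makes one warm-up call first so that one-time allocations are not counted as a leak, and the field can be switched from VmSize (virtual size) to VmRSS (resident set size).

```python
import os

# Hypothetical helpers: unit multipliers for the size fields in /proc/<pid>/status.
_SCALE = {'kB': 1024, 'KB': 1024, 'mB': 1024 * 1024, 'MB': 1024 * 1024}


def parse_status(text, field='VmSize'):
    """Return the value of `field` in bytes, given the text of /proc/<pid>/status."""
    for line in text.splitlines():
        if line.startswith(field + ':'):
            # A line looks like "VmSize:\t  123456 kB" -> name, number, unit.
            _, value, unit = line.split()
            return int(value) * _SCALE[unit]
    raise KeyError(field)


def current_memory(field='VmSize'):
    """Read `field` for the current process (Linux only, like the original script)."""
    with open('/proc/%d/status' % os.getpid()) as f:
        return parse_status(f.read(), field)


def growth_per_call(fn, n=100, field='VmSize'):
    """Call fn() n times and return the average memory growth per call, in bytes."""
    fn()  # warm-up: keep one-time allocations (driver setup, caches) out of the figure
    before = current_memory(field)
    for _ in range(n):
        fn()
    after = current_memory(field)
    return (after - before) / float(n)
```

For example, comparing growth_per_call(lambda: gdal.OpenShared('data.hdf')) with the same call on 'data.tif' should show a per-call growth that stays clearly above zero for the HDF file only, if the leak reported here is present.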