Opened 13 years ago

Closed 9 years ago

#4276 closed defect (invalid)

Memory leak when importing raster with AIG driver

Reported by: gdalnovice
Owned by: warmerdam
Priority: high
Milestone:
Component: GDAL_Raster
Version: 1.8.0
Severity: critical
Keywords: MacOS
Cc: hobu, kyngchaos

Description

Hi,

I am using GDAL in Python to do some computations. I iterate over multiple ArcGIS rasters, importing each one so I can do some very simple calculations with Numpy. I have noticed a huge memory leak in doing this: my rasters are 2-3 GB each, and at each iteration I lose 2-7 GB of memory. Eventually this crashes the computer.

I am using GDAL 1.8 and Python 2.7.1 on a MacPro with 64 GB of memory and 12 cores. I had to resort to Mac's "purge" command to release the inactive memory generated by the leak, but it would be nice to do this without having to free memory manually at each iteration.

I am a novice, so I have not been able to determine the exact step at which the memory loss occurs, but if I iterate over the same two files over and over, the leak diminishes or even disappears, so I am guessing it is related to the import step. Here is the code I am using:

#Import tools to be used
import scikits.image.graph as graph
import os, sys, time
from osgeo import gdal, gdalconst
from osgeo.gdalconst import *
import numpy
from numpy import *

startTime = time.time()  # added: the original snippet used startTime without ever setting it

codes = open('countrycap.txt', 'r')
filediff = open('diffstats.txt', 'w')  # added: the original snippet wrote to filediff without opening it; filename assumed

for line in codes:
    # setup driver to import and export files and GDAL options
    # (registering once, outside the loop, would also work)
    driver = gdal.GetDriverByName('AIG')
    driver.Register()

    pais = line.rstrip()
    filepais = open(pais + '.txt', 'w')  # fixed: the original had a quoting bug, open('+pais+'.txt','w')
    filepais.write('code,DiffWeekMean,DiffMax,DiffMin,DiffSum \n')

    # define country for which to do the computation
    file1 = '/Paths/' + pais + 'dist'
    file2 = '/paths/' + pais + 'dist'

    iso1 = gdal.Open(file1, GA_ReadOnly)
    iso2 = gdal.Open(file2, GA_ReadOnly)
    print('Opened...')
    print(file1)
    print(file2)

    # Get data about rasters
    cols1 = iso1.RasterXSize
    rows1 = iso1.RasterYSize
    band1 = iso1.GetRasterBand(1)
    nodataval = band1.GetNoDataValue()
    proj = iso1.GetProjection()
    band2 = iso2.GetRasterBand(1)

    # Get data
    data1 = band1.ReadAsArray(0, 0, cols1, rows1)
    data2 = band2.ReadAsArray(0, 0, cols1, rows1)

    mask1 = numpy.greater_equal(data1, 0)
    data3 = numpy.choose(mask1, (nodataval, (data1 - data2) / (7 * 24)))
    mask2 = numpy.less(data1, 0)
    data4 = numpy.ma.masked_array(data3, mask=mask2)

    media = data4.mean()
    minimo = data4.min()
    maximo = data4.max()
    suma = data4.sum()

    print(str(pais)+','+str(media)+','+str(maximo)+','+str(minimo)+','+str(suma)+' \n')
    filediff.write(str(pais)+','+str(media)+','+str(maximo)+','+str(minimo)+','+str(suma)+' \n')
    filepais.write(str(pais)+','+str(media)+','+str(maximo)+','+str(minimo)+','+str(suma)+' \n')
    filepais.close()

    # figure out how long the script has taken so far
    endTime = time.time()
    print('The script has taken ' + str((endTime - startTime)/60) + ' minutes')

    # drop all references so GDAL can close the datasets and free their buffers
    del iso1, iso2, band1, band2
    del data1, data2, data3, data4, mask1, mask2
    del rows1, cols1, nodataval, proj
    del media, minimo, maximo, suma
    del driver

    os.system("purge")

filediff.close()

endTime = time.time()
print('The script took ' + str((endTime - startTime)/60) + ' minutes')

Change History (8)

comment:1 by gdalnovice, 13 years ago

Stepping through the script line by line, I notice that the memory leak seems to come from the ReadAsArray command.

comment:2 by Even Rouault, 13 years ago

Cc: hobu added

I've tried your script (slightly edited) on Linux with a small AIG dataset, which can be found at http://trac.osgeo.org/gdal/browser/trunk/autotest/gdrivers/data/abc3x1, and observed no leak.

So either there is something particular about your AIG dataset that is not triggered by the one I tried, or there is something very particular about the way MacOS handles memory.

One hypothesis is that MacOS caches freed memory a bit too aggressively, which is problematic if you read huge buffers of 2 GB, and that the "purge" command forces it to release that memory. If GDAL really leaked memory, purge would not be able to recover it, from my understanding of what I have read about the command.

So perhaps you could try not to read the whole raster in a single call, but proceed line by line for example, and see if it makes any difference.
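A minimal sketch of that line-by-line approach, assuming a single-band dataset already opened as iso1 (the running sum is only a stand-in for the real computation):

band = iso1.GetRasterBand(1)
total = 0.0
for row in range(iso1.RasterYSize):
    # read a single scanline: offset (0, row), window of full width x 1 row
    scanline = band.ReadAsArray(0, row, iso1.RasterXSize, 1)
    total += scanline.sum()  # placeholder per-line computation

This keeps at most one scanline's worth of pixel data in Python at any time, instead of a 2-3 GB array.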

CC'ing Howard, who has more experience with MacOS than me (mine is zero ;-)).

comment:3 by gdalnovice, 13 years ago

Thanks hobu for looking at this so promptly.

Some notes on what you did.

1.) I had previously tried the same code iterating 120 times over only one file, and it didn't crash. For some reason it seemed to leak memory only once and did not accumulate the leak over iterations. So it doesn't surprise me that you did not see any leak in your exercise, especially since your file is so much smaller.

2.) By deleting some of the structures I was able to decrease the amount of memory leaked, but I never managed to stop the leak from the ReadAsArray command. (I wonder if it is caused by the lack of a close() command in Python? See the sketch at the end of this comment.)

3.) Interestingly, if I stopped the process after iterating over X files, once memory had leaked, and restarted the iteration over the same group of files it had already processed, it leaked very little. Only once it iterated over files it had not processed before (X+1, X+2, ...) would the leakage increase again.

So, as you can see, it doesn't seem to depend on the fact that I read a huge file. Still, I had a version of the script that did the same processing block by block (way slower), and I seem to remember it had similar issues. Once my computer finishes some computations I'll try that one again and see if it causes the same problems. I'll let you know.
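On the close() question in point 2: GDAL datasets in Python are closed by dropping every reference to them, conventionally by assigning None. A minimal sketch (the dataset path is hypothetical):

from osgeo import gdal
from osgeo.gdalconst import GA_ReadOnly

ds = gdal.Open('/Paths/BRAdist', GA_ReadOnly)  # hypothetical dataset path
band = ds.GetRasterBand(1)
data = band.ReadAsArray(0, 0, ds.RasterXSize, ds.RasterYSize)
# ... computations on data ...
band = None  # drop the band reference first
ds = None    # dropping the last dataset reference closes the file and flushes its cached blocks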

comment:4 by Even Rouault, 13 years ago

Actually it was my comment (rouault), not Howard's (hobu) ;-) I just added him to the CC list.

About 1), I think you are perhaps seeing the effect of the GDAL block cache. But it is limited to 40 MB shared among all opened datasets, and when a dataset is closed, its blocks are freed from the cache. That should be insignificant.
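One way to rule the block cache out (a sketch; the 1 MB figure is an arbitrary test value):

from osgeo import gdal

print(gdal.GetCacheMax())      # current block cache limit, in bytes
gdal.SetCacheMax(1024 * 1024)  # shrink the cache to 1 MB for the test

If memory consumption per iteration does not change after shrinking the cache, the block cache is not the culprit.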

About 3), it really does sound like an OS issue with overly aggressive file caching. If you kill the process, it is the responsibility of the OS to make the memory it used (or even leaked) available to new processes.

I've retried your script with much larger datasets, and still did not see any leak.

So I'm afraid I can't reproduce your issue, and you will have to investigate MacOS behaviour related to file content caching.
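One way to separate a genuine process leak from OS-level file caching is to log the process's own peak resident set size each iteration. A sketch using only the standard library (note that ru_maxrss is reported in bytes on MacOS but in kilobytes on Linux):

import resource

def peak_rss_mb():
    # peak resident set size of this process; bytes on MacOS
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / (1024.0 * 1024.0)

# inside the processing loop:
print('peak RSS: %.1f MB' % peak_rss_mb())

If this number stays flat while the system's "inactive" memory keeps growing, the memory is being held by the OS file cache, not leaked by the process.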

comment:5 by gdalnovice, 13 years ago

Hmmm... no idea how to proceed here. In any case, if anyone else is having the same issue, they can work around it in the meantime using the "purge" command.

If someone has a guide on how to tackle this in order to find the problem, let me know. I'm more than happy to help; I just don't know where to start.

comment:6 by Jukka Rahkonen, 9 years ago

Keywords: MacOS added

Are there any MacOS developers out there who can say something about this memory leak issue? I do not know whom to CC.

comment:7 by Even Rouault, 9 years ago

Cc: kyngchaos added

CC'ing William Kyngesburye (aka kyngchaos), who maintains a MacOSX GDAL stack, in case he wants to try reproducing this old issue. Otherwise we might just close it as worksforme.

comment:8 by Even Rouault, 9 years ago

Resolution: invalid
Status: new → closed

No feedback. Closing.
