Opened 16 years ago
Last modified 5 years ago
#438 new defect
v.distance -a uses too much memory
Reported by: | mlennert | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 7.8.3 |
Component: | Vector | Version: | svn-trunk |
Keywords: | v.distance memory allocation | Cc: | |
CPU: | Unspecified | Platform: | Unspecified |
Description
Not sure if this should be considered a bug or a wish for enhancement... choosing bug for now, as it makes the module unusable with large files.
When trying to calculate a distance matrix between 20,000 points with v.distance -a, I get:
```
ERROR: G_realloc: unable to allocate 1985321728 bytes at main.c:568
```
As the machine only has 1 GB of RAM, this is to be expected, but v.distance should be rewritten so that it does not keep everything in memory, at least when the -a flag is used, and only allocates memory for the data that is actually requested.
Currently, it allocates memory for a large number of NEAR structures (3 ints + 10 doubles, i.e. for example 3×4 + 10×8 = 92 bytes per point), each of which contains space for all the potential upload options (lines 447-8 of vector/v.distance/main.c):
```c
anear = 2 * nfrom;
Near = (NEAR *) G_calloc(anear, sizeof(NEAR));
```
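For scale, here is a back-of-the-envelope sketch (my own illustration, not code from the module) of what the -a case would eventually need if every From × To pair is kept, assuming the ~92-byte NEAR size estimated above (actual sizeof(NEAR) depends on compiler padding):

```c
#include <stdio.h>

int main(void)
{
    /* Hypothetical figures: 20000 'from' and 20000 'to' points, and the
     * ~92 bytes per NEAR structure estimated above (3 ints + 10 doubles,
     * ignoring padding). Illustration only, not actual v.distance code. */
    long long nfrom = 20000, nto = 20000;
    long long near_size = 3 * 4 + 10 * 8;   /* 92 bytes               */
    long long pairs = nfrom * nto;          /* 400,000,000 pairs      */
    long long bytes = pairs * near_size;    /* roughly 37 GB in total */

    printf("pairs: %lld, memory for Near[]: %.1f GB\n", pairs, bytes / 1e9);
    return 0;
}
```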
It then, if necessary, adds memory for the entire From × To matrix (lines 566-8 of vector/v.distance/main.c) inside the loop over the 'to' objects (count = total number of distances calculated so far after each loop):
```c
if (anear <= count) {
    anear += 10 + nfrom / 10;
    Near = (NEAR *) G_realloc(Near, anear * sizeof(NEAR));
}
```
I'm not sure I completely understand this last part, as it seems to grow the allocation in steps: whenever the count of distances goes beyond nfrom*2 (or the later values of anear), the array is reallocated to hold the new anear entries. In my case, once count > 40000, anear = 40000 + 10 + 20000/10 = 42010, i.e. space is added for 2010 new NEAR structures, without knowing (AFAICT) how many more will actually still come...
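To make the growth pattern concrete, here is a small standalone simulation of the reallocation rule quoted above (purely illustrative; it only reproduces the `anear += 10 + nfrom / 10` arithmetic, not the module itself) that counts how many times the array would have to be grown before the full From × To matrix fits:

```c
#include <stdio.h>

int main(void)
{
    /* Simulate the growth rule for a hypothetical 20000 x 20000 run. */
    long long nfrom = 20000, nto = 20000;
    long long anear = 2 * nfrom;     /* initial allocation               */
    long long needed = nfrom * nto;  /* entries needed with the -a flag  */
    long long reallocs = 0;

    while (anear < needed) {
        anear += 10 + nfrom / 10;    /* grows by 2010 entries per call   */
        reallocs++;
    }
    printf("final anear = %lld entries after %lld reallocations\n",
           anear, reallocs);
    return 0;
}
```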
But, as I said, I don't understand the code well enough to make a definite judgement. It would seem, however, that it might be better to calculate each distance and update the table immediately, or perhaps to write the necessary queries to a temporary file so that they can all be run in one go at the end, without keeping everything in memory.
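As a rough sketch of that last idea (not based on the actual v.distance internals; the function names, table name and column layout below are made up for illustration), a streaming version could write one UPDATE statement per computed distance to a temporary SQL file instead of accumulating NEAR structures:

```c
#include <math.h>
#include <stdio.h>

/* Illustrative sketch only: compute each distance, write the matching
 * UPDATE immediately to a temp SQL file, and never hold the full
 * From x To matrix in memory. The file can be fed to the DB driver
 * in a single run at the end. */
static void stream_distances(FILE *sql, const double *fx, const double *fy,
                             int nfrom, const double *tx, const double *ty,
                             int nto)
{
    for (int i = 0; i < nfrom; i++) {
        for (int j = 0; j < nto; j++) {
            double d = hypot(fx[i] - tx[j], fy[i] - ty[j]);
            /* One statement per pair; nothing is kept in memory. */
            fprintf(sql, "UPDATE fromtable SET dist = %f "
                         "WHERE from_cat = %d AND to_cat = %d;\n", d, i, j);
        }
    }
}

int main(void)
{
    /* Tiny made-up example: 2 'from' and 2 'to' points. */
    double fx[] = {0, 1}, fy[] = {0, 1};
    double tx[] = {2, 3}, ty[] = {2, 3};
    stream_distances(stdout, fx, fy, 2, tx, ty, 2);
    return 0;
}
```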
Change History (10)
comment:1 by , 12 years ago (follow-up: comment:2)
comment:2 by , 11 years ago
Replying to neteler:

> Is this still an issue with the current 7.SVN version?

The code has changed.
I just checked on a machine with 8 GB of RAM:
```
v.distance -a -p from=ssbel from_type=centroid to=ssbel to_type=centroid upload=dist col=dist > dist_ssbel
```
where ssbel has a bit more than 20000 centroids, and it still crashed after memory _and_ swap went up to their maximum levels.
I'm aware that we're talking about 400,000,000 pairs of points, but I'm still hoping that there is a way to avoid such heavy memory usage.
Moritz
comment:3 by , 9 years ago
Milestone: 7.0.0 → 7.0.5
comment:4 by , 8 years ago
Milestone: 7.0.5 → 7.0.6
comment:5 by , 7 years ago
Milestone: 7.0.6 → 7.0.7
comment:6 by , 6 years ago
Milestone: 7.0.7 → 7.8.0
comment:10 by , 5 years ago
Milestone: → 7.8.3
Is this still an issue with the current 7.SVN version?