Opened 11 years ago

Last modified 7 months ago

#438 new defect

v.distance -a uses too much memory

Reported by: mlennert Owned by: grass-dev@…
Priority: major Milestone: 7.8.3
Component: Vector Version: svn-trunk
Keywords: v.distance memory allocation Cc:
CPU: Unspecified Platform: Unspecified

Description

Not sure if this should be considered a bug or an enhancement wish... choosing bug for now, as the issue makes the module unusable with large files.

When trying to calculate a distance matrix between 20 000 points with v.distance -a, I get:

ERROR: G_realloc: unable to allocate 1985321728 bytes at main.c:568

Since the machine only has 1 GB of RAM, the failure itself is expected, but v.distance should be rewritten so that it does not keep everything in memory, at least when the -a flag is used, and only allocates memory for the data actually requested.

Currently, it allocates memory for a large number of NEAR structures (3 ints + 10 doubles, i.e. 3x4 + 10x8 = 92 bytes per point on a typical 64-bit platform), each of which contains space for all the potential upload options (lines 447-448 of vector/v.distance/main.c):

        anear = 2 * nfrom;
        Near = (NEAR *) G_calloc(anear, sizeof(NEAR));

It then, where necessary, adds memory for the entire From x To matrix (lines 566-568 of vector/v.distance/main.c) inside the loop over the 'to' objects (count = total number of distances calculated so far):

             if (anear <= count) {
                   anear += 10 + nfrom / 10;
                   Near = (NEAR *) G_realloc(Near, anear * sizeof(NEAR));
             }

I'm not sure I completely understand this last part, as it seems to trigger repeated reallocations: whenever the count of distances goes beyond nfrom*2 (or, later, beyond the current value of anear), it reallocates the buffer at the new size. In my case, when count > 40000, anear = 40000 + 10 + 20000/10 = 42010, i.e. space is added for 2010 new NEAR structures at a time, without (AFAICT) knowing how many will actually still come...

But, as I said, I don't understand the code well enough to make a definite judgement. It would seem, however, that it might be better to calculate each distance and update the table immediately, or perhaps to write the necessary queries to a temporary file so they can all be run in one pass at the end, in either case without keeping everything in memory.

Change History (10)

comment:1 Changed 8 years ago by neteler

Is this still an issue with the current 7.SVN version?

comment:2 in reply to: 1 Changed 7 years ago by mlennert

Replying to neteler:

Is this still an issue with the current 7.SVN version?

The code has changed.

I just checked on a machine with 8GB of RAM and

v.distance -a -p from=ssbel from_type=centroid to=ssbel to_type=centroid upload=dist col=dist > dist_ssbel

where ssbel has a bit more than 20000 centroids, still crashes after memory _and_ swap usage go up to their maximum levels.

I'm aware that we're talking about 400,000,000 pairs of points, but I'm still hoping that there is a way to avoid such heavy memory usage.

Moritz

comment:3 Changed 4 years ago by martinl

Milestone: 7.0.0 → 7.0.5

comment:4 Changed 4 years ago by neteler

Milestone: 7.0.5 → 7.0.6

comment:5 Changed 2 years ago by neteler

Milestone: 7.0.6 → 7.0.7

comment:6 Changed 14 months ago by martinl

Milestone: 7.0.7 → 7.8.0

comment:7 Changed 10 months ago by neteler

Milestone: 7.8.0 → 7.8.1

Ticket retargeted after milestone closed

comment:8 Changed 8 months ago by neteler

Milestone: 7.8.1 → 7.8.2

Ticket retargeted after milestone closed

comment:9 Changed 7 months ago by neteler

Milestone: 7.8.2

Ticket retargeted after milestone closed

comment:10 Changed 7 months ago by neteler

Milestone: 7.8.3