Opened 8 years ago

Last modified 5 years ago

#3062 new defect

Segmentation fault with r.buffer

Reported by: escheper Owned by: grass-dev@…
Priority: major Milestone: 7.6.2
Component: Raster Version: 7.0.4
Keywords: r.buffer Cc:
CPU: Other Platform: Linux

Description

I'm trying to buffer raster regions at various resolutions. The following commands went well:

g.region w=-180.0 s=-90.0 e=180.0 n=90.0 res=0.00833333

r.buffer input=rasset output=rasbuf distances=10.0 units="kilometers" --o

The next commands give a segmentation fault at "Finding buffer zones..." (at around 37%).

g.region w=-180.0 s=-90.0 e=180.0 n=90.0 res=0.002777777

r.buffer input=rasset output=rasbuf distances=10.0 units="kilometers" --o

I'm using Ubuntu 16.04 LTS on an Intel Xeon E5-1650 server with >64GB memory.

Does anyone have a workaround or an idea what the cause is?

Attachments (1)

settlements.pack (57.4 KB ) - added by escheper 8 years ago.
packed input raster

Download all attachments as: .zip

Change History (19)

comment:1 by neteler, 8 years ago

Please make the map "rasset" available (via r.pack) or tell us how to generate it in order to enable us to reproduce the problem.

by escheper, 8 years ago

Attachment: settlements.pack added

packed input raster

comment:2 by escheper, 8 years ago

Added the file settlements.pack which is a small version of the original input raster. The size of the original raster is 16MB which cannot be added as an attachment. I could send this file by WeTransfer or is there another way?

comment:3 by escheper, 8 years ago

Did some further tests. Using the small version did not resulted in a segmentation fault. So the original raster is needed.

Last edited 8 years ago by escheper (previous) (diff)

comment:4 by escheper, 8 years ago

The original file can be downloaded here.

comment:5 by martinl, 8 years ago

Keywords: r.buffer added; buffer removed

comment:6 by neteler, 8 years ago

CMD='r.buffer input=r120749038 output=rasbuf distances=10.0 units=kilometers --o'
valgrind --tool=memcheck --leak-check=yes --show-reachable=yes  $CMD --o
==11148== Memcheck, a memory error detector

==11148== 
==11148== 933,120,000 bytes in 1 blocks are still reachable in loss record 73 of 73
==11148==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11148==    by 0x5076A4A: G__malloc (alloc.c:39)
==11148==    by 0x40333B: read_input_map (read_map.c:39)
==11148==    by 0x402814: main (main.c:141)
...

There is no G_free(map) it seems:

raster/r.buffer $ grep 'alloc\|free' *.c | grep -v GNU
parse_dist.c:    distances = (struct Distance *)G_calloc(count, sizeof(struct Distance));
read_map.c:    map = (MAPTYPE *) G_malloc((size_t) window.rows * window.cols * sizeof(MAPTYPE));
read_map.c:    cell = Rast_allocate_c_buf();
read_map.c:    G_free(cell);
support.c:    Rast_free_cats(&pcats);
write_map.c:    cell = Rast_allocate_c_buf();
write_map.c:    G_free(cell);

comment:7 by escheper, 8 years ago

Thanks for your support!

Reading the map didn't cause the problem. The method read_input_map() finishes without a fault. The segmentation fault occurs in execute_distance() in execute.c after the message "Finding buffer zones..." is shown.

After adding "G_free(map);" in read_input_map() the r.buffer command crashes within that method and execute_distances() isn't executed. So I removed the "G_free(map)" again.

After adding some debug statements I have noticed the program crashes at the following point in process_left().

        /* convert 1,2,3,4 to -1,0,1,2 etc. 0 becomes ndist */

        G_message(_("PL6"));
        G_message(_("cur_zone: %i ZONE_INCR: %i ndist: %i to_ptr: %i"),cur_zone,ZONE_INCR,ndist,to_ptr);
        if ((cur_zone = *--to_ptr))
            cur_zone -= ZONE_INCR;
        else
            cur_zone = ndist;

The last lines of the debug output was:

PL6

cur_zone: 16538 ZONE_INCR: 2 ndist: 1 to_ptr: -2116621104

Segmentation fault

I do not have much experience with C so maybe you can give me a clue.

in reply to:  7 comment:8 by neteler, 8 years ago

Replying to escheper:

        G_message(_("cur_zone: %i ZONE_INCR: %i ndist: %i to_ptr: %i"),cur_zone,ZONE_INCR,ndist,to_ptr);
        if ((cur_zone = *--to_ptr))
            cur_zone -= ZONE_INCR;
        else
            cur_zone = ndist;

...

cur_zone: 16538 ZONE_INCR: 2 ndist: 1 to_ptr: -2116621104

Good analysis, this number looks like an integer overflow to me (it is negative). A C expert wanted here...

comment:9 by escheper, 8 years ago

Changed the %i to %u. Now the numbers/pointers are positive. Also changed

	if ((cur_zone = *--to_ptr))

to

	if ((cur_zone == *--to_ptr))

but still get the segmentation fault :(.

comment:10 by annakrat, 8 years ago

I would say it is related to the fact that to_ptr is unsigned char, so it can overflow fast, although I don't understand the algorithm there. Note the comment in the code:

/* if MAPTYPE is changed to unsigned short, MAX_DIST can be set to 2^16-2
 * (if short is 2 bytes)
 */

So I would try to change it to unsigned short. Unfortunately I can't easily test the testcase, I have little memory right now.

in reply to:  10 ; comment:11 by escheper, 8 years ago

Replying to annakrat:

I would say it is related to the fact that to_ptr is unsigned char, so it can overflow fast, although I don't understand the algorithm there. Note the comment in the code:

/* if MAPTYPE is changed to unsigned short, MAX_DIST can be set to 2^16-2
 * (if short is 2 bytes)
 */

So I would try to change it to unsigned short. Unfortunately I can't easily test the testcase, I have little memory right now.

Thanks to your tip I found the solution.

Changing the MAPTYPE didn't solve the problem. But the problem was another overflow.

I'm using a 64,800 x 129,600 grid which results in 8,398,080,000 cells (or bytes). This is far beyond the maxint (2,147,483,647) and causes the overflow.

After changing most int variables in the r.buffer code in long types the segmentation fault disappeared :).

I didn't check the output visually, that's what I need to do next. Having no segmentation fault is very promising.

Except from changing the integer types to long, I have changed some "=" to "==" in "if" statements in process_left.c, process_at.c and process_rite.c. This didn't solve the segmentation fault but in my opinion the original code seems not to be correct.

I found some other lines that may not be correct too. But I'm not sure about it. These lines are:

process_rite.c(90): *to_ptr = (first_zone = i) + ZONE_INCR;
process_row.c(34): for (r = row; r >= 0 && (first_zone = find_distances(r)) >= 0; r--) 
process_row.c(47): r < window.rows && (first_zone = find_distances(r)) >= 0; r++) {
read_map.c(65): if ((*ptr++ = (*cell++ != 0))) {
read_map.c(81): if ((*ptr++ = !Rast_is_c_null_value(cell++))) {

Can anyone confirm this?

Remains the following questions:

  • How are other GRASS modules dealing with large grids? Do they have the same problem?
  • What will be the impact on the output raster after changing some "=" to "=="? Do we get different result buffers?
  • In what way should I do a "pull request" of my changes in de code?
Last edited 8 years ago by escheper (previous) (diff)

in reply to:  11 comment:12 by annakrat, 8 years ago

Replying to escheper:

Remains the following questions:

  • How are other GRASS modules dealing with large grids? Do they have the same problem?
  • What will be the impact on the output raster after changing some "=" to "=="? Do we get different result buffers?
  • In what way should I do a "pull request" of my changes in de code?

Create a diff and attach it here. You can make it simpler for us if you first identify the exact source of the error and then upload the patch which fixes without any extra changes. Changing int variables to long int might result in spending too much memory, so you shouldn't change it unless you consider the consequences.

If you think there are other changes which need to be done to run r.buffer correctly, please explain them. For example, it's not clear why "==" should replace "=" and where. And yes, very probably we get then different results.

in reply to:  11 comment:13 by mmetz, 8 years ago

Replying to neteler:

CMD='r.buffer input=r120749038 output=rasbuf distances=10.0 units=kilometers --o'
valgrind --tool=memcheck --leak-check=yes --show-reachable=yes  $CMD --o
==11148== Memcheck, a memory error detector

==11148== 
==11148== 933,120,000 bytes in 1 blocks are still reachable in loss record 73 of 73
==11148==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11148==    by 0x5076A4A: G__malloc (alloc.c:39)
==11148==    by 0x40333B: read_input_map (read_map.c:39)
==11148==    by 0x402814: main (main.c:141)
...

A segfault is reflected in valgrind as "invalid read of size ..." or "invalid write of size ...". Still reachable blocks are not a cause for segfault.

Replying to escheper:

[...] I found the solution.

Changing the MAPTYPE didn't solve the problem. But the problem was another overflow.

I'm using a 64,800 x 129,600 grid which results in 8,398,080,000 cells (or bytes). This is far beyond the maxint (2,147,483,647) and causes the overflow.

After changing most int variables in the r.buffer code in long types the segmentation fault disappeared :).

I didn't check the output visually, that's what I need to do next. Having no segmentation fault is very promising.

Note that long is a signed 32 bit integer except for POSIX 64 bit systems. On Windows, long is always 32 bit and would not solve the problem. Better use long long, or even better, ask for a platform-independent 64 bit integer type in GRASS.

comment:14 by martinl, 8 years ago

Is this really blocker for the release?

comment:15 by annakrat, 8 years ago

Priority: blockermajor

No, it seems to be only problem with huge data.

comment:16 by neteler, 7 years ago

Milestone: 7.0.57.0.6

comment:17 by neteler, 6 years ago

Milestone: 7.0.67.0.7

comment:18 by martinl, 5 years ago

Milestone: 7.0.77.6.2
Note: See TracTickets for help on using tickets.