Opened 12 years ago

Last modified 8 years ago

#1628 new defect

segfault in r.walk

Reported by: pertusus Owned by: grass-dev@…
Priority: normal Milestone: 6.4.6
Component: Raster Version: svn-releasebranch64
Keywords: r.cost, r.walk Cc:
CPU: x86-64 Platform: Linux

Description

After calling r.walk like

 r.walk elevation=dem start_points=stream_points stop_points=stop_points walk_coeff=1,10000,0,0 lambda=1.0 output=walk friction=friction max_cost=1600000

while finding cost path, there is a segfault. Here it is with the backtrace

(gdb) run
Starting program: /localdata/local/grass-6.5.svn/bin/r.walk elevation=dem start_points=stream_points stop_points=stop_points walk_coeff=1,10000,0,0 lambda=1.0 output=walk friction=friction max_cost=1600000
[Thread debugging using libthread_db enabled]
Walking costs are a=1.000000 b=10000.000000 c=0.000000 d=0.000000
Lambda is 1.000000
Slope_factor is -0.212500
Nseg is 4
Null cells excluded from cost evaluation.
Dev note: Adapted sites library used for vector points. (module should be
updated to GRASS 6 vector library)
Detaching after fork from child process 27060.
Dev note: Adapted sites library used for vector points. (module should be
updated to GRASS 6 vector library)
Detaching after fork from child process 27061.
DTM_Source map is: Integer cell type
9194 rows, 8736 cols
COST_Source map is: Integer cell type
4536 rows, 2370 cols
Output map is: Integer cell type
4536 rows, 2370 cols
EW resolution 1000.00003956 (1000.000040)
NS resolution 999.99998622 (999.999986)
Creating some temporary files...
Reading dem...
Reading friction...
 100%
Initializing output
 100%
Finding cost path

Program received signal SIGSEGV, Segmentation fault.
get () at memory.c:111
111         if (first_free->lower == NULL) {
(gdb) bt
#0  get () at memory.c:111
#1  0x0000000000402724 in insert (min_cost=<value optimized out>, row=999, col=1235)
    at btree.c:35
#2  0x00000000004054f1 in main (argc=347946718, argv=<value optimized out>)
    at main.c:1452

It is not easy to reproduce. I posted a tarball to reproduce here: https://poseidon.centre-cired.fr/pat/segfault_r_walk.tar.gz

To reproduce:

tar xzvf segfault_r_walk.tar.gz
cd segfault_r_walk

edit run.sh to have the right paths to grass and run

./run.sh.

I used v.in.ogr for the stop_points vector, as if I use what comes from v.out.ascii (the corresponding line is commented out in run.sh) the bug does not reproduce for me.

This bug is also in grass 6.4.2, with the exact same backtrace. It is also present in 6.4.1.

Change History (10)

comment:1 by mmetz, 12 years ago

I can reproduce the segfault with the test data. In GRASS 7, r.walk finishes successfully.

Backport r.cost, r.walk to 6.x?

Markus M

in reply to:  1 comment:2 by neteler, 10 years ago

Keywords: r.cost r.walk added
Milestone: 6.4.36.4.4
Version: 6.4.2svn-releasebranch64

Replying to mmetz:

I can reproduce the segfault with the test data. In GRASS 7, r.walk finishes successfully.

Backport r.cost, r.walk to 6.x?

Test data are still generating a crash in the test script with current 6.4.svn.

Which fix needs to be backported?

comment:3 by mlennert, 10 years ago

I tried with the test script and can confirm the crash. Testing a bit further the difference between the v.in.ascii and v.in.ogr imports, I saw that the latter has many more attribute columns. When I reduce the number of attribute columns, I get r.walk to work:

v.info -c stop_points_ogr
Displaying column types/names for database connection of layer 1:
INTEGER|cat
INTEGER|cat_point
INTEGER|cat_relocated
DOUBLE PRECISION|lon_relocated
DOUBLE PRECISION|lat_relocated
INTEGER|cat_
DOUBLE PRECISION|value
DOUBLE PRECISION|orig_lon
DOUBLE PRECISION|orig_lat

Running v.db.droptable before r.walk also makes r.walk work.

I do not have the time right now to check if this is linked to a specific attribute or just the general number of attributes, but maybe someone else can continue on that path ?

Backporting from grass7 seems quite invasive at this stage, unless the two modules are self-contained.

Moritz

comment:4 by mlennert, 10 years ago

Additional info: AFAICT, the segfault occurs in a call to get() (memory.c) within the insert() function (btree.c).

Moritz

in reply to:  3 ; comment:5 by glynn, 10 years ago

Replying to mlennert:

I do not have the time right now to check if this is linked to a specific attribute or just the general number of attributes, but maybe someone else can continue on that path ?

Could it be a problem with lib/sites?

The 6.x version of r.walk uses that library to obtain the start/stop points from a vector map using the 5.x sites API. The 7.x version uses the vector library directly.

In 7.x, the only clients of lib/sites are v.in.sites and v.vol.rst. The former only uses the G_oldsite_* functions to access genuine 5.x "sites" maps. The latter uses G_site_new_struct() to allocate a structure which is never used.

in reply to:  5 comment:6 by neteler, 10 years ago

Replying to glynn: ...

In 7.x, the only clients of lib/sites are v.in.sites and v.vol.rst. The former only uses the G_oldsite_* functions to access genuine 5.x "sites" maps. The latter uses G_site_new_struct() to allocate a structure which is never used.

(Unrelated to this ticket but to above comment): I have removed the lib/sites dependency of v.vol.rst in 7.x in r60272 (trunk) and r60273 (relbr7).

in reply to:  5 ; comment:7 by mlennert, 10 years ago

Replying to glynn:

Replying to mlennert:

I do not have the time right now to check if this is linked to a specific attribute or just the general number of attributes, but maybe someone else can continue on that path ?

Could it be a problem with lib/sites?

It would seem a likely candidate, but how would that influence what is happening a long time after the closure of the input file ? The segfault occurs in the routine finding the cost path in which, AFAICT, the lib/sites doesn't play a role anymore.

in reply to:  7 ; comment:8 by mmetz, 10 years ago

Replying to mlennert:

Replying to glynn:

Replying to mlennert:

I do not have the time right now to check if this is linked to a specific attribute or just the general number of attributes, but maybe someone else can continue on that path ?

Could it be a problem with lib/sites?

It would seem a likely candidate, but how would that influence what is happening a long time after the closure of the input file ? The segfault occurs in the routine finding the cost path in which, AFAICT, the lib/sites doesn't play a role anymore.

This can happen if the sites lib corrupts some memory that is not used by the sites lib itself. The segmentation fault can then appear anywhere else later on. You need to use valgrind to check memory usage and search for something like "invalid [read|write] ..." in the valgrind output, usually rather at the beginning than at the end of the valgrind output.

Apart from that, the usage of the wrappers provided by the sites lib is deprecated also in G6. I would suggest the replace the relevant code in the affected modules rather than fixing the sites lib.

in reply to:  8 comment:9 by mlennert, 10 years ago

Replying to mmetz:

Replying to mlennert:

Replying to glynn:

Replying to mlennert:

I do not have the time right now to check if this is linked to a specific attribute or just the general number of attributes, but maybe someone else can continue on that path ?

Could it be a problem with lib/sites?

It would seem a likely candidate, but how would that influence what is happening a long time after the closure of the input file ? The segfault occurs in the routine finding the cost path in which, AFAICT, the lib/sites doesn't play a role anymore.

This can happen if the sites lib corrupts some memory that is not used by the sites lib itself. The segmentation fault can then appear anywhere else later on. You need to use valgrind to check memory usage and search for something like "invalid [read|write] ..." in the valgrind output, usually rather at the beginning than at the end of the valgrind output.

I won't be able to work on this before next week, so if someone else wants to try...

Apart from that, the usage of the wrappers provided by the sites lib is deprecated also in G6. I would suggest the replace the relevant code in the affected modules rather than fixing the sites lib.

This sounds fairly invasive. Is this still feasible for 6.4.4 ?

Moritz

comment:10 by neteler, 8 years ago

Milestone: 6.4.46.4.6
Note: See TracTickets for help on using tickets.