Opened 16 years ago

Closed 16 years ago

Last modified 16 years ago

#565 closed defect (fixed)

r.colors: glibc double free or corruption with -ae

Reported by: hamish Owned by: grass-dev@…
Priority: minor Milestone: 6.5.0
Component: Raster Version: svn-develbranch6
Keywords: r.colors Cc:
CPU: x86-32 Platform: Linux

Description

Hi,

I'm unable to recreate it right now, but yesterday I was consistently getting a rather ugly *** glibc ... error which said something about an invalid double free().

looked rather worse than a standard segfault and not something I've seen much of before. this is on the 20 month old version of Debian/stable and I haven't changed anything so expect glibc versions is as stable & well tested as it ever could be.

the command line looked like

   r.colors map1 color=bcyr -ae

the input map is gone now and I've changed r.surf.volcano enough that I don't get it anymore, but it can be assumed that there were 0.0s in the map. (-g blocks 0s but -a doesn't?)

Hamish

Change History (9)

comment:1 by hamish, 16 years ago

Summary: r.colors: glibc double free error with -aer.colors: glibc double free or corruption with -ae

here we go, happened again..

GRASS65svn> r.colors map1 color=bcyr -ae
Reading raster map <map1@user7>...
 100%
*** glibc detected *** double free or corruption (!prev): 0x09a3a4f8 ***
Aborted
G> r.univar -g map1
n=301941
null_cells=0
min=-23.8941449919674
max=1999.8932486971
range=2023.78739368907
mean=68.8068337965498
mean_of_abs=71.8955312273628
stddev=259.955255789945
...
G> r.info -t map1
datatype=DCELL

G> gdb `which r.colors`
(gdb) run gauss color=bcyr -ae    
Starting program: [...]/dist.i686-pc-linux-gnu/bin/r.colors map1 color=bcyr -ae
 100%
*** glibc detected *** double free or corruption (!prev): 0x08e40a18 ***

Program received signal SIGABRT, Aborted.
0xb7f18424 in __kernel_vsyscall ()

(gdb) bt
#0  0xb7ef0424 in __kernel_vsyscall ()
#1  0xb7d31811 in raise () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7d32fb9 in abort () from /lib/tls/i686/cmov/libc.so.6
#3  0xb7d66dfa in __fsetlocking () from /lib/tls/i686/cmov/libc.so.6
#4  0xb7d6e68f in mallopt () from /lib/tls/i686/cmov/libc.so.6
#5  0xb7d6e732 in free () from /lib/tls/i686/cmov/libc.so.6
#6  0xb7ea2cd6 in G_free (buf=0x9b7ba18) at alloc.c:129
#7  0x0804b386 in get_fp_stats (name=0x9b72640 "map1", mapset=0x9b72030 "user7", 
    statf=0xbfd05f94, min=3.2146326349358647, max=7.601348984183665, geometric=0, 
    geom_abs=1) at stats.c:136
#8  0x0804a655 in main (argc=Cannot access memory at address 0x13d8
) at main.c:385

(gdb) frame 7
#7  0x0804b386 in get_fp_stats (name=0x9b72640 "map1", mapset=0x9b72030 "user7", 
    statf=0xbfd05f94, min=3.2146326349358647, max=7.601348984183665, geometric=0, 
    geom_abs=1) at stats.c:136
136         G_free(dcell);
(gdb) list
131             }
132         }
133     
134         G_percent(row, nrows, 2);
135         G_close_cell(fd);
136         G_free(dcell);
137     }

(gdb) bt full
...
#6  0xb7ea2cd6 in G_free (buf=0x9b7ba18) at alloc.c:129
No locals.
#7  0x0804b386 in get_fp_stats (name=0x9b72640 "map1", mapset=0x9b72030 "user7", 
    statf=0xbfd05f94, min=3.2146326349358647, max=7.601348984183665, geometric=0, 
    geom_abs=1) at stats.c:136
        dcell = (DCELL *) 0x9b7ba18
        row = 477
        col = 633
        nrows = 477
        ncols = 633
        fd = 7
#8  0x0804a655 in main (argc=Cannot access memory at address 0x13d8
) at main.c:385
        fpstats = {geometric = 0, geom_abs = 1, flip = 0, count = 1000, 
  min = 3.2146326349358647, max = 7.601348984183665, stats = 0x9b7cdf0, 
  total = 301941}
        overwrite = 1
        interactive = 0
...

this map was created with:

#spearfish
g.region n=4928010 s=4913700 w=590010 e=609000 res=30
#rows=477  cols=633   (odd number so there is a center cell)

#latest version from grass-addons svn
r.surf.volcano -g -r map2 peak=2000 crater=120

but that uses r.surf.gauss which changes each time you run it. The same r.colors command runs to 100% with no problem with the exact same call to r.surf.volcano.

r.mapcalc found no values exactly matching 0.00000 in the map.

It consistently breaks with this map, I can .tgz it if needed.

Here's the valgrind output:

G> CMD="r.colors map1 color=bcyr -ae"
G> valgrind -v --tool=memcheck --leak-check=yes  $CMD
==7130== Memcheck, a memory error detector.
==7130== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==7130== Using LibVEX rev 1658, a library for dynamic binary translation.
==7130== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==7130== Using valgrind-3.2.1-Debian, a dynamic binary instrumentation framework.
==7130== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==7130== 
--7130-- Command line
--7130--    r.colors
--7130--    map1
--7130--    color=bcyr
--7130--    -ae
--7130-- Startup, with flags:
--7130--    -v
--7130--    --tool=memcheck
--7130--    --leak-check=yes
--7130-- Contents of /proc/version:
--7130--   Linux version 2.6.26-bpo.1-686 (Debian 2.6.26-13~bpo40+1) (nobse@debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Mon Jan 12 20:06:26 UTC 2009
--7130-- Arch and hwcaps: X86, x86-sse1-sse2
--7130-- Valgrind library directory: /usr/lib/valgrind
--7130-- Reading syms from /lib/ld-2.3.6.so (0x4000000)
--7130-- Reading debug info from /lib/ld-2.3.6.so...
--7130-- ... CRC mismatch (computed A2342EFB wanted 8FC9FDDB)
--7130--    object doesn't have a symbol table
--7130-- Reading syms from /usr/local/src/grass/svn/grass65/dist.i686-pc-linux-gnu/bin/r.colors (0x8048000)
--7130-- Reading syms from /usr/lib/valgrind/x86-linux/memcheck (0x38000000)
--7130--    object doesn't have a dynamic symbol table
--7130-- Reading suppressions file: /usr/lib/valgrind/default.supp
--7130-- Reading syms from /usr/lib/valgrind/x86-linux/vgpreload_core.so (0x4019000)
--7130-- Reading syms from /usr/lib/valgrind/x86-linux/vgpreload_memcheck.so (0x401B000)
--7130-- Reading syms from /usr/local/src/grass/svn/grass65/dist.i686-pc-linux-gnu/lib/libgrass_gis.6.5.svn.so (0x4021000)
--7130-- Reading syms from /usr/local/src/grass/svn/grass65/dist.i686-pc-linux-gnu/lib/libgrass_datetime.6.5.svn.so (0x407A000)
--7130-- Reading syms from /usr/lib/libz.so.1.2.3 (0x409C000)
--7130--    object doesn't have a symbol table
--7130-- Reading syms from /lib/tls/i686/cmov/libm-2.3.6.so (0x40B0000)
--7130-- Reading debug info from /lib/tls/i686/cmov/libm-2.3.6.so...
--7130-- ... CRC mismatch (computed 70BA299D wanted 3AF3BB73)
--7130--    object doesn't have a symbol table
--7130-- Reading syms from /lib/tls/i686/cmov/libc-2.3.6.so (0x40D5000)
--7130-- Reading debug info from /lib/tls/i686/cmov/libc-2.3.6.so...
--7130-- ... CRC mismatch (computed B308380A wanted ED81AE67)
--7130--    object doesn't have a symbol table
--7130-- Reading syms from /lib/tls/i686/cmov/libdl-2.3.6.so (0x4206000)
--7130-- Reading debug info from /lib/tls/i686/cmov/libdl-2.3.6.so...
--7130-- ... CRC mismatch (computed DE27AFD0 wanted AA9202FA)
--7130--    object doesn't have a symbol table
--7130-- REDIR: 0x41407F0 (rindex) redirected to 0x401DF30 (rindex)
--7130-- REDIR: 0x413C970 (malloc) redirected to 0x401D306 (malloc)
--7130-- REDIR: 0x413FF30 (strcpy) redirected to 0x401E970 (strcpy)
--7130-- REDIR: 0x413A6B0 (free) redirected to 0x401CF20 (free)
--7130-- REDIR: 0x413FEC0 (strcmp) redirected to 0x401E2D0 (strcmp)
--7130-- REDIR: 0x4140430 (strlen) redirected to 0x401E200 (strlen)
--7130-- REDIR: 0x4140640 (strncmp) redirected to 0x401E260 (strncmp)
--7130-- REDIR: 0x413CD90 (realloc) redirected to 0x401D3B5 (realloc)
--7130-- REDIR: 0x4141610 (memmove) redirected to 0x401E520 (memmove)
--7130-- REDIR: 0x4141B20 (memcpy) redirected to 0x401EBE0 (memcpy)
--7130-- REDIR: 0x413C630 (calloc) redirected to 0x401C633 (calloc)
--7130-- REDIR: 0x413FD50 (index) redirected to 0x401E020 (index)
--7130-- REDIR: 0x413FBA0 (strcat) redirected to 0x401E5E0 (strcat)
--7130-- REDIR: 0x4141180 (memchr) redirected to 0x401E3F0 (memchr)
--7130-- REDIR: 0x41423C0 (rawmemchr) redirected to 0x401E5C0 (rawmemchr)
--7130-- REDIR: 0x4141680 (memset) redirected to 0x401E4F0 (memset)
==7130== Invalid read of size 4
==7130==    at 0x804B320: get_fp_stats (stats.c:129)
==7130==    by 0x804A654: main (main.c:385)
==7130==  Address 0x4223EDC is 12 bytes before a block of size 4,000 alloc'd
==7130==    at 0x401C6CA: calloc (vg_replace_malloc.c:279)
==7130==    by 0x402EC00: G__calloc (alloc.c:74)
==7130==    by 0x804B1B6: get_fp_stats (stats.c:101)
==7130==    by 0x804A654: main (main.c:385)
==7130== 
==7130== Invalid write of size 4
==7130==    at 0x804B325: get_fp_stats (stats.c:129)
==7130==    by 0x804A654: main (main.c:385)
==7130==  Address 0x4223EDC is 12 bytes before a block of size 4,000 alloc'd
==7130==    at 0x401C6CA: calloc (vg_replace_malloc.c:279)
==7130==    by 0x402EC00: G__calloc (alloc.c:74)
==7130==    by 0x804B1B6: get_fp_stats (stats.c:101)
==7130==    by 0x804A654: main (main.c:385)
  72%
valgrind: m_mallocfree.c:194 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 5120, hi = 5161.
Probably caused by overrunning/underrunning a heap block's bounds.

==7130==    at 0x38016993: report_and_quit (m_libcassert.c:136)
==7130==    by 0x38016CBF: vgPlain_assert_fail (m_libcassert.c:200)
==7130==    by 0x3802049E: vgPlain_arena_free (m_mallocfree.c:191)
==7130==    by 0x38036001: vgPlain_cli_free (replacemalloc_core.c:108)
==7130==    by 0x380019DB: die_and_free_mem (mc_malloc_wrappers.c:111)
==7130==    by 0x38036782: do_client_request (scheduler.c:1158)
==7130==    by 0x380380A0: vgPlain_scheduler (scheduler.c:869)
==7130==    by 0x38057012: run_a_thread_NORETURN (syswrap-linux.c:87)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable
==7130==    at 0x401CFA5: free (vg_replace_malloc.c:233)
==7130==    by 0x402ECD5: G_free (alloc.c:129)
==7130==    by 0x403F248: G_zlib_read (flate.c:190)
==7130==    by 0x4043E3A: read_data_fp_compressed (get_row.c:115)
==7130==    by 0x40442D0: read_data (get_row.c:238)
==7130==    by 0x4044F40: get_map_row_nomask (get_row.c:616)
==7130==    by 0x4045047: get_map_row_no_reclass (get_row.c:646)
==7130==    by 0x4045140: get_map_row (get_row.c:678)
==7130==    by 0x404538E: G_get_raster_row (get_row.c:898)
==7130==    by 0x4045435: G_get_d_raster_row (get_row.c:968)
==7130==    by 0x804B229: get_fp_stats (stats.c:109)
==7130==    by 0x804A654: main (main.c:385)


Note: see also the FAQ.txt in the source distribution.
It contains workarounds to several common problems.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what Linux distro you are using.  Thanks.

Hamish

comment:2 by hamish, 16 years ago

..., even with maps created with the same params which do run to 100% on the standard command line will barf in valgrind at the same 72% complete stage.

Same if I do 'g.region -d' first, so that's probably a red herring (I mentioned it because the volcano alg calcs distance from the map center so you can get an div0 anomaly there if the cells line up).

I can simplify the map creation line to:

r.surf.volcano -r map3

and it still breaks with valgrind. If I remove the -r rough-it-up flag (which sends the surface slightly negative) the min becomes 1.2345e-91 and valgrind runs to completion. it still complains about the same "Invalid read of size 4" and "Invalid write of size 4" errors though.

Hamish

comment:3 by hamish, 16 years ago

G65> CMD="r.colors map3 color=bcyr -ae"

G65> LD_PRELOAD=libefence.so.0.0 $CMD --v

  Electric Fence 2.1 Copyright (C) 1987-1998 Bruce Perens.
Reading raster map <map3@user7>...
Segmentation fault

it makes electric-fence segfault. that's not encouraging at all.

Hamish

in reply to:  1 comment:4 by glynn, 16 years ago

Replying to hamish:

here we go, happened again..

G> r.univar -g map1
...
min=-23.8941449919674
max=1999.8932486971
...
...
#7  0x0804b386 in get_fp_stats (name=0x9b72640 "map1", mapset=0x9b72030 "user7", 
    statf=0xbfd05f94, min=3.2146326349358647, max=7.601348984183665, geometric=0, 
    geom_abs=1) at stats.c:136

If any of the raster values lie outside of the range specified by the min/max parameters, memory corruption will occur due to modifying an out-of-range array element.

FWIW, I have noticed a bug in get_fp_stats(...,geom_abs=1): if the range is signed, values which are closer to zero than both min and max will also modify an out-of-range array element.

I'll fix that one shortly, but it won't help if the min/max parameters are bogus.

comment:5 by hamish, 16 years ago

Resolution: fixed
Status: newclosed
G> r.univar -g map1
...
min=-23.8941449919674
max=1999.8932486971
...
...
#7  0x0804b386 in get_fp_stats (name=0x9b72640 "map1", mapset=0x9b72030 "user7", 
    statf=0xbfd05f94, min=3.2146326349358647, max=7.601348984183665, geometric=0, 
    geom_abs=1) at stats.c:136

Glynn:

If any of the raster values lie outside of the range specified by the min/max parameters, memory corruption will occur due to modifying an out-of-range array element.

FWIW, I have noticed a bug in get_fp_stats(...,geom_abs=1): if the range is signed, values which are closer to zero than both min and max will also modify an out-of-range array element.

I'll fix that one shortly, but it won't help if the min/max parameters are bogus.

thanks, that solved the problem. backported to 6.4.0 and devbr6.

are the min/max values really bogus? r.univar and 'r.info -r' both give (-23,1999) for those values. The 'max' value seen in gdb frame # 7 (ie 7.60) is simply the natural log of the max value in the map (1999). The min value in frame # 7 (3.2146) is a bit weirder. AFAICT it comes from ln(abs(min)+1). But hopefully that weirdness was just product of this bug.

closing ticket, "thanks heaps"

Hamish

comment:6 by hamish, 16 years ago

fwiw, valgrind still reports an error, but now it is slightly different:

...
--3098-- REDIR: 0x4141680 (memset) redirected to 0x401E4F0 (memset)
==3098== Invalid read of size 4
==3098==    at 0x804B344: get_fp_stats (stats.c:132)
==3098==    by 0x804A654: main (main.c:385)
==3098==  Address 0x4224FA8 is 0 bytes after a block of size 4,000 alloc'd
==3098==    at 0x401C6CA: calloc (vg_replace_malloc.c:279)
==3098==    by 0x402EC00: G__calloc (alloc.c:74)
==3098==    by 0x804B1DA: get_fp_stats (stats.c:104)
==3098==    by 0x804A654: main (main.c:385)
==3098== 
==3098== Invalid write of size 4
==3098==    at 0x804B349: get_fp_stats (stats.c:132)
==3098==    by 0x804A654: main (main.c:385)
==3098==  Address 0x4224FA8 is 0 bytes after a block of size 4,000 alloc'd
==3098==    at 0x401C6CA: calloc (vg_replace_malloc.c:279)
==3098==    by 0x402EC00: G__calloc (alloc.c:74)
==3098==    by 0x804B1DA: get_fp_stats (stats.c:104)
==3098==    by 0x804A654: main (main.c:385)
 100%
Color table for raster map <map1> set to 'bcyr'
==3098== 
==3098== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 21 from 1)
==3098== 
==3098== 1 errors in context 1 of 2:
==3098== Invalid write of size 4
==3098==    at 0x804B349: get_fp_stats (stats.c:132)
==3098==    by 0x804A654: main (main.c:385)
==3098==  Address 0x4224FA8 is 0 bytes after a block of size 4,000 alloc'd
==3098==    at 0x401C6CA: calloc (vg_replace_malloc.c:279)
==3098==    by 0x402EC00: G__calloc (alloc.c:74)
==3098==    by 0x804B1DA: get_fp_stats (stats.c:104)
==3098==    by 0x804A654: main (main.c:385)
==3098== 
==3098== 1 errors in context 2 of 2:
==3098== Invalid read of size 4
==3098==    at 0x804B344: get_fp_stats (stats.c:132)
==3098==    by 0x804A654: main (main.c:385)
==3098==  Address 0x4224FA8 is 0 bytes after a block of size 4,000 alloc'd
==3098==    at 0x401C6CA: calloc (vg_replace_malloc.c:279)
==3098==    by 0x402EC00: G__calloc (alloc.c:74)
==3098==    by 0x804B1DA: get_fp_stats (stats.c:104)
==3098==    by 0x804A654: main (main.c:385)
--3098-- 
--3098-- supp:   21 Debian libc6 (2.3.x) stripped dynamic linker
==3098== 
==3098== IN SUMMARY: 2 errors from 2 contexts (suppressed: 21 from 1)
==3098== 
==3098== malloc/free: in use at exit: 84,827 bytes in 1,460 blocks.
==3098== malloc/free: 3,367 allocs, 1,907 frees, 7,063,699 bytes allocated.
==3098== 
==3098== searching for pointers to 1,460 not-freed blocks.
==3098== checked 164,128 bytes.
==3098== 
...

ie before it read

Address 0x4223EDC is 12 bytes before a block of size 4,000 alloc'd

now it reads ... is 0 bytes after a block of size 4,000 alloc'd.

? Hamish

comment:7 by hamish, 16 years ago

..., and running with different combinations of -ae, -a, -e shows that the remaining valgrind error is only seen when using -e. If just -a alone then it's error free.

Hamish

in reply to:  6 comment:8 by glynn, 16 years ago

Replying to hamish:

fwiw, valgrind still reports an error, but now it is slightly different:

ie before it read

Address 0x4223EDC is 12 bytes before a block of size 4,000 alloc'd

now it reads

... is 0 bytes after a block of size 4,000 alloc'd

?

Fencepost error. If x == statf->max, then i == statf->count, which overruns. The array needs to have statf->count+1 elements. Fixed in r36862.

are the min/max values really bogus? r.univar and 'r.info -r' both give (-23,1999) for those values. The 'max' value seen in gdb frame # 7 (ie 7.60) is simply the natural log of the max value in the map (1999).

Right; get_fp_stats() overwrites min/max with the log/log-abs values, and gdb is showing the updated values.

So, yes, the bug was due to failing to account for the fact that log-abs needs to include zero in the range calculation as well as min and max.

comment:9 by hamish, 16 years ago

thanks, it now passes valgrind and electric-fence tests without errors. fix backported.

Hamish

Note: See TracTickets for help on using tickets.