Opened 8 years ago

Closed 8 years ago

#2966 closed defect (fixed)

r.sim.water not working in 7.0.3 32bit and 64bit

Reported by: baharmon Owned by: grass-dev@…
Priority: normal Milestone: 7.0.4
Component: Raster Version: 7.0.3
Keywords: r.sim.water, r.sim.erosion, SIMWE, MinGW-w64, compiling, DLL Cc:
CPU: All Platform: MSWindows 8

Description

Running r.sim.water caused GRASS 7.0.3 32 and 64 bit to crash on Windows 8.2 and Ubuntu. Works fine on GRASS 7.0.2.

Example with NC sample data:

g.region rast=elev_lid792_1m -p

r.slope.aspect elevation=elev_lid792_1m dx=elev_lid792_dx_1m dy=elev_lid792_dy_1m

r.sim.water elevation=elev_lid792_1m dx=elev_lid792_dx_1m dy=elev_lid792_dy_1m depth=water_depth_1m disch=water_discharge_1m nwalk=10000 rain_value=100 niter=5

When run in a script I got error code 255.

grass.exceptions.CalledModuleError: Module run None
['r.sim.water', '--o', 'niterations=niter',
'elevation=elevation', 'rain_value=rain',
'depth=depth', 'dx=dx', 'dy=dy', 'nwalkers=walkers']
ended with error
Process ended with non-zero return code 255. See errors in
the (error) output.

Change History (6)

comment:1 by baharmon, 8 years ago

n.b.: Just a problem on Windows. (I made a typo before.)

comment:2 by annakrat, 8 years ago

Based on the debug output, the problem is somewhere between lines 301 and 371

Version 0, edited 8 years ago by annakrat (next)

comment:3 by wenzeslaus, 8 years ago

Just to be sure I tried valgrind --redzone-size=256 but it revealed nothing.

I was not really successful in understanding what 255 mean on MS Windows: The extended attributes are inconsistent. (ERROR_EA_LIST_INCONSISTENT). According to my search this error is quite often related to sound (?) but also it happens also when programming with subprocesses in .NET/MVC and with some SAS and IBM software where it relates to access rights for cmd.exe. In one case existing with return code 0 from a script fixed the issue (r.sim.water runs OK and returns 0 for me on Ubuntu). (Most of the other search results suggest to download and install some program which will "fix" your PC in few mouse clicks.)

comment:4 by wenzeslaus, 8 years ago

CPU: UnspecifiedAll
Keywords: r.sim.water r.sim.erosion SIMWE MinGW-w64 compiling added

It seems that it fails in between lines 303 and 325:

mixx = conv * cellhd.west;
maxx = conv * cellhd.east;
miyy = conv * cellhd.south;
mayy = conv * cellhd.north;
stepx = cellhd.ew_res * conv;
stepy = cellhd.ns_res * conv;
step = (stepx + stepy) / 2.;
mx = cellhd.cols;
my = cellhd.rows;
x_orig = cellhd.west * conv;
y_orig = cellhd.south * conv;
xmin = 0.;
ymin = 0.;
xp0 = xmin + stepx / 2.;
yp0 = ymin + stepy / 2.;
xmax = xmin + stepx * (float)mx;
ymax = ymin + stepy * (float)my;

There is nothing special there except for the fact that the variables are global defined, e.g. in simlib/hydro.c:

double xmin, ymin, xmax, ymax;
double mayy, miyy, maxx, mixx;

and declared as extern in simlib/waterglobs.h, e.g.:

extern double xmin, ymin, xmax, ymax;
extern double mayy, miyy, maxx, mixx;

There should be nothing bad about these definitions but StackOverflow explains that global variables, when placed in DLLs on MS Windows, are different. [http://stackoverflow.com/questions/19373061 StackOverflow.

What happens to global and static variables in a shared library when it is dynamically linked?]:

In the case of Windows (.exe and .dll), the extern global variables are not part of the exported symbols. In other words, different modules are in no way aware of global variables defined in other modules. This means that you will get linker errors if you try, for example, to create an executable that is supposed to use an extern variable defined in a DLL, because this is not allowed.

In the case of Unix-like environments (like Linux), the dynamic libraries, called "shared objects" with extension .so export all extern global variables (or functions). In this case, if you do load-time linking from anywhere to a shared object file, then the global variables are shared, i.e., linked together as one. Basically, Unix-like systems are designed to make it so that there is virtually no difference between linking with a static or a dynamic library.

It further says that you must do the following to actually export a global variable in Windows:

#ifdef COMPILING_THE_DLL
#define MY_DLL_EXPORT extern "C" __declspec(dllexport)
#else
#define MY_DLL_EXPORT extern "C" __declspec(dllimport)
#endif

MY_DLL_EXPORT int my_global;

It further says that the syntax is similar to the function export/import syntax which we are not using in GRASS except for few places (grep reveals mapcalc.tab.c, lz4.h, shapefil.h, and sqlp.tab.c).

The simlib directory is compiled into a shared object file/dynamic library (I have libgrass_sim.so in dist.../lib), so I assume the above applies here, although the code should lead to linker error rather then some failure during runtime (but that might apply just to MSVC compiler).

r.sim.water works in 7.0.2:

System Info
GRASS version: 7.0.2
GRASS SVN Revision: 66861
Build Date: 2015-11-19
Build Platform: i686-pc-mingw32
GDAL/OGR: 1.11.3
PROJ.4: 4.8.0
GEOS: 3.5.0
SQLite: 3.7.17
Python: 2.7.4
wxPython: 2.8.12.1
Platform: Windows-8-6.2.9200
> g.version -re
GRASS 7.0.2 (2015)
libgis Revision: 64733
libgis Date: 2015-02-25 01:56:29 +0100 (st, 25 2 2015)
...

So, it seems that it is related to transition from MinGW (32) to MinGW-w64.

Based on the above I think that our options are (assuming we want to support MS Windows):

  1. use the (messy) __declspec in some way
  2. modify the globals in simlib using functions instead of direct access (approach used in the GRASS libraries?)
  3. get rid of the global variables, use structures and function parameters instead
  4. copy (duplicate) the files into each module (r.sim.water and r.sim.erosion)
  5. use simlib as a static library
  6. don't use simlib directory as a library, just compile object files

comment:5 by wenzeslaus, 8 years ago

In r68320 I changed the code in raster/r.sim to use a structure which is then passed to the library which still uses global variables internally. There were actually extern library variables used even before the block mixx = conv * cellhd.west;, so I removed them as well, however, this also may indicate that the problem is somewhere else.

comment:6 by wenzeslaus, 8 years ago

Keywords: DLL added
Resolution: fixed
Status: newclosed

Tested with 64bit version on MS Windows 8. Results are identical for both r.sim.water and r.sim.sediment according to test on 64bit Linux. r68320 backported to 70 release branch in r68325.

Note: See TracTickets for help on using tickets.