Opened 10 years ago
Closed 8 years ago
#2272 closed defect (fixed)
Improve consistency in random generator usage
Reported by: | neteler | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 7.0.5 |
Component: | Default | Version: | svn-releasebranch70 |
Keywords: | random | Cc: | |
CPU: | Unspecified | Platform: | Unspecified |
Description
The usage of drand48()/srand48() should be consolitated, perhaps using G_math_rand() which is provided in lib/gmath/rand1.c:
# incomplete search find . -name '*.c' | xargs grep -l rand48 ./raster/r.mapcalc/xrand.c ./raster/r.mapcalc/evaluate.c ./raster/r.random/creat_rand.c ./lib/gmath/rand1.c ./lib/vector/rtree/rect.c ./vector/v.qcount/findquads.c ./vector/v.random/main.c ./vector/v.kcv/main.c grep rand48 */*/*.c | grep -v flag lib/gmath/rand1.c: srand48(-seed); lib/gmath/rand1.c: return (float)drand48(); raster/r.mapcalc/evaluate.c: srand48(seed_value); raster/r.mapcalc/xrand.c:#define drand48() ((double)rand()/((double)RAND_MAX + 1)) raster/r.mapcalc/xrand.c:#define mrand48() ((long)rand()) raster/r.mapcalc/xrand.c: unsigned long x = (unsigned long)mrand48(); raster/r.mapcalc/xrand.c: double x = drand48(); raster/r.mapcalc/xrand.c: double x = drand48(); raster/r.random/creat_rand.c:#define lrand48() (((long) rand() ^ ((long) rand() << 16)) & 0x7FFFFFFF) raster/r.random/creat_rand.c:#define srand48(sv) (srand((unsigned)(sv))) raster/r.random/creat_rand.c: return lrand48(); raster/r.random/creat_rand.c: srand48((long)time((time_t *) 0)); vector/v.kcv/main.c: rng = drand48; vector/v.kcv/main.c: srand48((long)getpid()); vector/v.qcount/findquads.c: #define R_INIT srand48 vector/v.qcount/findquads.c: #define RANDOM(lo,hi) (drand48()/R_MAX*(hi-lo)+lo) vector/v.random/main.c:double drand48() vector/v.random/main.c:#define srand48(sv) (srand((unsigned)(sv))) vector/v.random/main.c: struct Flag *rand, *drand48, *z, *notopo; vector/v.random/main.c: rng = drand48; vector/v.random/main.c: srand48((long)seed); vector/v.random/main.c: srand48((long)getpid());
See also r53234 etc.
Attachments (1)
Change History (12)
by , 10 years ago
comment:1 by , 10 years ago
Replying to neteler:
The usage of drand48()/srand48() should be consolitated, perhaps using G_math_rand() which is provided in lib/gmath/rand1.c:
First, G_math_rand() should be fixed. Currently, it returns a value strictly less than 1.0 if drand48 is used, or a value less than or equal to 1.0 if rand() is used.
Also, seeding should be a separate function.
And we should probably have a portable equivalent to lrand48(), i.e. something which returns a non-negative 31-bit integer. The version in r.random is broken, as bit 15 will always be zero if RAND_MAX is 32767 (the minimum value allowed by the ISO C standards).
My suggestion would be to re-implement the LCG used by lrand48() etc. Sample implementation
comment:2 by , 10 years ago
Priority: | normal → blocker |
---|
follow-up: 5 comment:3 by , 10 years ago
For the record, from grass-dev:
On Sun, Jul 27, 2014 at 1:58 AM, Glynn Clements wrote:
Most uses of rand, random, lrand48, etc have been replaced in r61415 and r61416.
Many of these modules seeded the RNG from either the clock or the PID, with no way to provide an explicit seed. In such cases, the use of G_srand48_auto() has been marked with a "FIXME" comment.
It would be possible to modify G_srand48_auto() to allow the use of an environment variable, but this has problems of its own (e.g. setting the variable manually then forgetting about it).
r.li.daemon/daemon.c uses a hard-coded seed of zero.
r61416 changes readcell.c in r.proj, i.rectify, and i.ortho.rectify. These all used rand() to select a random cache block for ejection.
While this wouldn't have affected the result, only the first RAND_MAX cache blocks would have been used. RAND_MAX is only guaranteed to be at least 32767, which would limit the effective cache size to 1 GiB (each block is 64 * 64 = 4096 "double"s = 32kiB).
Cases which haven't been changed are:
lib/raster3d/test/test_put_get_value_large_file.c. This appears to be a test case; does it matter?
include/iostream/quicksort.h uses random() or rand() to select a pivot for the quicksort algorithm. That file has no dependency on lib/gis (or anything else, except for <stdlib.h> for random/rand), and I didn't want to add one unnecessarily.
Again, this shouldn't affect the result, but there may be performance issues if the size of the array being sorted is significantly larger than RAND_MAX (in this situation, the algorithm will be O(n2) even in the best case).
Unless there's a specific reason not to, it may be better to simply replace all uses of that file with std::sort() from <algorithm>.
comment:4 by , 10 years ago
Priority: | blocker → critical |
---|
Bundled backport to relbranch70 in r61471. Downgrading priority.
comment:5 by , 10 years ago
Priority: | critical → major |
---|
Downgrading priority (or to be closed?).
Replying to neteler:
For the record, from grass-dev:
On Sun, Jul 27, 2014 at 1:58 AM, Glynn Clements wrote:
Cases which haven't been changed are:
lib/raster3d/test/test_put_get_value_large_file.c. This appears to be a test case; does it matter?
include/iostream/quicksort.h uses random() or rand() to select a pivot for the quicksort algorithm. That file has no dependency on lib/gis (or anything else, except for <stdlib.h> for random/rand), and I didn't want to add one unnecessarily.
Again, this shouldn't affect the result, but there may be performance issues if the size of the array being sorted is significantly larger than RAND_MAX (in this situation, the algorithm will be O(n2) even in the best case).
Unless there's a specific reason not to, it may be better to simply replace all uses of that file with std::sort() from <algorithm>.
follow-up: 7 comment:6 by , 10 years ago
By chance I found
raster/simwe/simlib/random.c
/* uniform random number generator (combined type) */
Should that be replaced as well (how)?
follow-up: 8 comment:7 by , 10 years ago
Replying to neteler:
By chance I found
raster/simwe/simlib/random.c
/* uniform random number generator (combined type) */
Should that be replaced as well (how)?
ulec() can be replaced by G_drand48().
seedg() isn't used.
seeds() is always called with fixed values (12345 and 67891). Either call G_srand48() with a fixed value (if repeatability is important), or call G_srand48_auto() for a non-deterministic seed (if it isn't), or add a seed= option to the modules.
FWIW, it seems possible for ulec() to return values slightly greater than 1:
ret_val = (double)iz *4.656613e-10; return ret_val;
(231-1) * 4.656613e-10 = 1.0000000267907612
comment:8 by , 10 years ago
Replying to glynn:
Replying to neteler:
By chance I found
raster/simwe/simlib/random.c
/* uniform random number generator (combined type) */
Should that be replaced as well (how)?
ulec() can be replaced by G_drand48().
seedg() isn't used.
seeds() is always called with fixed values (12345 and 67891). Either call G_srand48() with a fixed value (if repeatability is important), or call G_srand48_auto() for a non-deterministic seed (if it isn't), or add a seed= option to the modules.
Done in r63216 and backported. The seed is fixed value.
comment:10 by , 8 years ago
Milestone: | 7.0.0 → 7.0.5 |
---|
lrand48 implementation