Ticket #335 (new task)

Opened 5 years ago

Last modified 5 weeks ago

export floats and doubles with correct precision

Reported by: hamish Owned by: grass-dev@…
Priority: critical Milestone: 6.4.4
Component: Default Version: svn-develbranch6
Keywords: precision Cc:
Platform: All CPU: All

Description

see  http://thread.gmane.org/gmane.comp.gis.grass.devel/29622/

this started with lib/db/dbmi_base/valuefmt.c %lf -> %.15g

High priority: r.what, r.univar sum_str, r.info's fancy report min= max=, and r.colors. Also 'g.region -g' -> lib/gis/wind_format.c:

static int format_double(double value, char *buf)
{
    sprintf(buf, "%.8f", value);
    G_trim_decimal(buf);

    return 0;
}

I notice that raster/r.mapcalc/expression.c uses %.8g for a double.

for lat/lon, %.8f is approx 1mm (1852*60*1e-8; ie better than RTK GPS), for meter/feet based units it's very very small.

I guess we may as well to do this properly, i.e. split off FCELL values to something less precise ( %.6g, %f, ? ). Maybe new G_format_double() and G_format_float() fns for that?

$ svngrep -r '%\.[1-9][0-9][fg]' * | cut -f1 -d: | uniq

display/d.rast.edit/edit.c
display/d.zoom/set.c
general/g.setproj/main.c
general/g.setproj/get_num.c
general/g.setproj/get_deg.c
lib/proj/convert.c
lib/proj/get_proj.c
lib/vector/rtree/gammavol.c
lib/vector/Vlib/intersect.c
lib/gis/quant_io.c
lib/gis/gislib.dox
lib/gis/proj3.c
lib/gis/color_write.c
lib/gis/cats.c
lib/g3d/g3dkeys.c
lib/g3d/writeascii.c
lib/g3d/filecompare.c
lib/g3d/g3dcats.c
lib/db/dbmi_base/datetime.c
lib/db/dbmi_base/valuefmt.c
ps/ps.map/ps_fclrtbl.c
raster/r.colors/rules.c
raster/r.stats/raw_stats.c
raster/r.univar2/stats.c
raster/r.reclass/main.c
raster/r.recode/read_rules.c
raster/r.info/main.c
raster/r.quant/read_rules.c
raster/r.distance/report.c
raster/r.cats/main.c
raster/r.external/main.c
raster/r.what/main.c
raster/r.statistics/o_average.c
raster/r.average/main.c
scripts/r.in.srtm/r.in.srtm
vector/v.to.points/main.c
vector/v.segment/main.c
vector/v.label.sa/main.c
vector/v.what.rast/main.c
vector/v.to.db/update.c
vector/v.to.db/report.c
vector/v.kernel/main.c
vector/v.in.ascii/points.c

the list is long and manual review is needed for each one.

Hamish

Change History

  Changed 5 years ago by neteler

IMHO this needs to be fixed before 6.4.0 is released (hopefully end of the year with release candidates starting as soon as vector undo is implemented in the new digitizer).

Markus

  Changed 5 years ago by msieczka

  • priority changed from minor to critical

  Changed 4 years ago by hamish

v.univar too,

g.copy vect=archsites,tmp_arch
v.db.addcol tmp_arch column='x double, y double'
v.to.db tmp_arch option=coor columns=x,y

eval `v.univar -g tmp_arch column=x type=point | grep -w mean`
X_MEAN=$mean

eval `v.univar -g tmp_arch column=y type=point | grep -w mean`
Y_MEAN=$mean

echo "Center of point cloud: $X_MEAN, $Y_MEAN"

Center of point cloud: 598326, 4.92146e+06

would be nice for y coord to be intact (ie preserve 3 digits to the right of the decimal point (mm))

Hamish

follow-up: ↓ 5   Changed 4 years ago by hamish

'g.region -g' for lat/lon fixed for G_format_*() in r37725,6,7.

in reply to: ↑ 4   Changed 4 years ago by hamish

Replying to hamish:

'g.region -g' for lat/lon fixed for G_format_*() in r37725,6,7.

(see also #654)

  Changed 4 years ago by hamish

Hi,

is there any reason not to change the following to %.15g?

raster/r.colors/
rules.c:47:         sprintf(minstr, "%.25f", (double)min);
rules.c:48:         sprintf(maxstr, "%.25f", (double)max);

AFAIK %.25f for a double (DCELL) is just reporting noise. It causes out-of-range white areas on a raster map because the 0%,100% color rules are no longer == the actual floating point value (50% chance it falls on the outside of the range)

also, in lib/gis/color_write.c:

#define PRECISION 30
#define THRESHOLD .0000000000000000000000000000005
/* .5 * 10 ^(-30) */
...
static int format_min(char *str, double dval)
{
    double dtmp;

    sprintf(str, "%.*f", PRECISION, dval);
    G_trim_decimal(str);
    sscanf(str, "%lf", &dtmp);
    if (dtmp != dval) {         /* if  no zeros after decimal point were trimmed */
        sprintf(str, "%.*f", PRECISION, dval - THRESHOLD);
        /* because precision is probably higher than PRECISION */
    }

    return 0;
}

(max is similar but adds the spillover threshold; but apparently this doesn't have the desired effect as 5e-31 is swept away in the noise anyway. ?)

I blindly guess number originally came from an attempt to be a wee bit bigger than 1e-32. ???? (dates back to CVS rev1.1 so no help from the commit history)

i.e. can we always assume DCELL|double to be the same bit depth, and %.15g the most we'll ever be able to squeeze from it?

Hamish

  Changed 4 years ago by hamish

oblig "What Every Computer Scientist Should Know About Floating-Point Arithmetic", by David Goldberg:  http://docs.sun.com/source/806-3568/ncg_goldberg.html

  Changed 4 years ago by hamish

on grass-dev Glynn wrote:

We can assume that it always refers to IEEE 754 double precision format. Systems where this isn't the case are extremely uncommon, and will almost certainly fail to support GRASS without a great deal of additional effort.

And regardless of the precision of "double", I seriously doubt that anyone has actual data which is accurate to 15 significant digits (equivalent to specifying the circumference of the earth to within 40 nanometres, or the height of Everest to within 9 picometres).

r.colors fixed in trunk with r38750, r38752. (see also #720)

Hamish

  Changed 4 years ago by hamish

r.info's data range (both fancy and shell style) updated in devbr6 and trunk to use %.7g for floats and %.15g for doubles.

Hamish

  Changed 4 years ago by hamish

r.what's cell data output updated in devbr6 and trunk to use %.7g for floats and %.15g for doubles.

  Changed 4 years ago by hamish

  • platform changed from Unspecified to All
  • version changed from unspecified to svn-develbranch6
  • cpu changed from Unspecified to All
  • milestone changed from 6.4.0 to 6.4.1

good progress made on the critical cases, deferring the rest 'til 6.4.1.

  Changed 4 years ago by anonymous

  • milestone 6.4.1 deleted

Milestone 6.4.1 deleted

  Changed 3 years ago by hamish

  • milestone set to 6.4.1

  Changed 2 years ago by hamish

with respect to raster/r.stats/raw_stats.c (and so r.out.xyz), the _is_fp() array-filling in main.c needs to be changed to fill the array with the map type, instead of just categorical or FP. The array is already type int, so this shouldn't be too hard. Then raw_stats.c can output %.7g or %.15g depending on FCELL or DCELL (see r.info), instead of the current %.10f for all.

Hamish

  Changed 2 years ago by hamish

r.what.color in the dev branches changed to %.15g for FP maps, but that should be broken down by FCELL & DCELL, and FCELL should use %.7g.

Hamish

  Changed 2 years ago by hamish

r.reclass done in trunk and devbr6 with r45696,7. Should be backported to relbr64 after some period of testing.

Hamish

  Changed 20 months ago by hamish

something thing to keep in mind as we work through these: GRASS's processing modules can be very(!) useful for processing imagery from scanning electron microscopes. Typical pixel resolution for that might be 1 micron. aka the map units aren't necessarily in degrees, meters, or some flavor of feet.

at the other end of things, for star catalog az,el,distance point data projected from a lat/lon import the map units might be parsec or light-years. (I suspect r.mapcalc et al. would also be highly useful for processing large telescope images)

Hamish

  Changed 11 months ago by hamish

devbr6: the precision stored in colr/ files done in r52194, d.what.rast done in r52195.

follow-up: ↓ 21   Changed 9 months ago by neteler

Please backport to 6.4 since not much testing happens in 6.5 (but in 6.4.svn). (I see that many backports are missing!).

  Changed 9 months ago by martinl

  • milestone changed from 6.4.1 to 6.4.3

in reply to: ↑ 19 ; follow-up: ↓ 23   Changed 9 months ago by mmetz

Replying to neteler:

Please backport to 6.4 since not much testing happens in 6.5 (but in 6.4.svn). (I see that many backports are missing!).

Glynn: Binary to text conversions must use %.9g (float) and %.17g (double) in order to preserve fp values in a binary-decimal-binary round-trip, e.g. r.out.ascii followed by r.in.ascii.

The modules [r|v].out.ascii need to be added to the list.

Markus M

  Changed 3 months ago by hamish

  • milestone changed from 6.4.3 to 6.4.4

continued in 6.4.4 ...

in reply to: ↑ 21 ; follow-up: ↓ 25   Changed 3 months ago by hamish

Replying to mmetz:

Glynn: Binary to text conversions must use %.9g (float) and %.17g (double) in order to preserve fp values in a binary-decimal-binary round-trip, e.g. r.out.ascii followed by r.in.ascii.

take care that preserving b-d-b round trip ,on a single platform, is not the only task or consideration. For the r.out.ascii + r.in.ascii round trip it may well be appropriate, but while conceding that point I'd argue that r.*.bin or r.*.mat would be a better choice in those cases.

%.15,7g was chosen because it's perfectly reproducible and doesn't introduce useless .0000000000000001 crap into the data files which G_trim_decimal() can't handle. For things like r.univar, r.info, and v.db.select the output is at least in part for human consumption; there's no practical need to expose the FP noise. The main thing for us to focus on I think is all the remaining lossy %f and meaningless %.56f type stuff in the code, not splitting hairs over preserving quanta finer than GRASS_EPSILON.

wrt lib/gis/color_write.c, look closely & you'll see there is a +/- GRASS_EPSILON adjustment on the range to ensure that the rounding exceeds the range, and so you shouldn't ever get white spots at the peaks and pits.

best, Hamish

(this is something I hope python cleans up with 3.0)

  Changed 3 months ago by hamish

I wrote:

look closely & you'll see there is a +/- GRASS_EPSILON adjustment on the range

oops, I used it wrong, you already fixed it. (2^1023.9999999999999 - 2^1023.9999999999998) > GRASS_EPSILON

Hamish

in reply to: ↑ 23   Changed 3 months ago by mmetz

Replying to hamish:

Replying to mmetz:

Glynn: Binary to text conversions must use %.9g (float) and %.17g (double) in order to preserve fp values in a binary-decimal-binary round-trip, e.g. r.out.ascii followed by r.in.ascii.

take care that preserving b-d-b round trip ,on a single platform, is not the only task or consideration. For the r.out.ascii + r.in.ascii round trip it may well be appropriate, but while conceding that point I'd argue that r.*.bin or r.*.mat would be a better choice in those cases.

Please add this information to the manuals if you think this is crucial.

%.15,7g was chosen because it's perfectly reproducible

No, it is not. %.17,9g is reproducible. See IEEE 754 standard.

and doesn't introduce useless .0000000000000001 crap

On what platform do you get this crap? %.17,9g does not produce this crap.

into the data files which G_trim_decimal() can't handle.

G_trim_decimal() is not needed for %.17,9g.

The main thing for us to focus on I think is all the remaining lossy %f and meaningless %.56f type stuff in the code

Sure, therefore %.17,9g.

not splitting hairs over preserving quanta finer than GRASS_EPSILON.

Huh? This thread is not about GRASS_EPSILON. GRASS_EPSILON is 1.0e-15. A perfectly valid range of double fp data is e.g. 1.0e-200 to 1.0e-300, way smaller than GRASS_EPSILON. You do not want to corrupt these data.

BTW, I found that fp calculation errors are rather in the range of 1.0e-4 to 1.0e-8.

Markus M

follow-up: ↓ 27   Changed 3 months ago by hamish

%.15,7g was chosen because it's perfectly reproducible

No, it is not. %.17,9g is reproducible. See IEEE 754 standard.

I didn't mean reproducible as far as the binary stored value round-trip was concerned, I meant reproducible as far as getting the same ascii result on two different hardware platforms.

for example, getting r.md5sum (addon to aid the test_suite) to give the same (portable) result when checksumming r.out.ascii output when the data is stored as floating point &/or not strictly real (2.5000000). My understanding, for what it is, is that the least sig. digits can flicker depending on the CPU arch, perhaps the compiler, and programming language's implementation too.

I can't remember which right now, but one of the ogr/gdal/proj4 input streams was introducing .00000000001 style artifacts, it's probably worth digging that up as a real-world test case example as there are probably two classes of this problem with the same symptom: one to do with GRASS's use of %f and atof() in places, and another to do with the output of input values which were malformed before they came into grass.

Another interesting corner-case example to try with this is the earthquake demo from the grass-promo/tutorials/ svn, where the heavily logarithmic data scale within a single dataset really stretches what fits well in the ieee FP model, and so something more linear like magnitude is used to store it instead of raw extajoules.

A perfectly valid range of double fp data is e.g. 1.0e-200 to 1.0e-300, way smaller than GRASS_EPSILON. You do not want to corrupt these data.

(see the following post with the 2^1024 example where I corrected myself about that after seeing how you fixed the error in my earlier attempt to make sure the color min/max rule was bigger than the data range)

regards, Hamish

in reply to: ↑ 26 ; follow-up: ↓ 28   Changed 3 months ago by glynn

Replying to hamish:

No, it is not. %.17,9g is reproducible. See IEEE 754 standard.

I didn't mean reproducible as far as the binary stored value round-trip was concerned, I meant reproducible as far as getting the same ascii result on two different hardware platforms.

All common platforms use IEEE-754 representation, so I wouldn't expect differences due to hardware.

My understanding, for what it is, is that the least sig. digits can flicker depending on the CPU arch, perhaps the compiler, and programming language's implementation too.

Any differences are due to software, not hardware. The main reasons for differences are:

1. Whether the software produces the closest decimal value to the actual binary value, or the shortest decimal value which would convert to the actual binary value (those two aren't necessarily the same).

2. The rounding mode used in the event of a tie. E.g. 3.0/16=0.1875; if that is displayed with 3 fractional digits, should the result be 0.187 (round toward zero, round toward negative infinity) or 0.188 (round toward positive infinity, round toward nearest even value)?

3. Bugs. Many programmers don't understand the details of floating-point, and don't particularly care whether the least-significant digits are "correct" (particularly as that would requiring which flavour of "correct" you actually want).

in reply to: ↑ 27 ; follow-up: ↓ 29   Changed 3 months ago by neteler

  • milestone changed from 6.4.4 to 6.4.3

Replying to glynn:

Replying to hamish:

Replying to mmetz:

No, it is not. %.17,9g is reproducible. See IEEE 754 standard.

I didn't mean reproducible as far as the binary stored value round-trip was concerned, I meant reproducible as far as getting the same ascii result on two different hardware platforms.

All common platforms use IEEE-754 representation, so I wouldn't expect differences due to hardware.

Should r55122 be reverted then?

in reply to: ↑ 28   Changed 3 months ago by mmetz

Replying to neteler:

Replying to glynn:

Replying to hamish:

Replying to mmetz:

No, it is not. %.17,9g is reproducible. See IEEE 754 standard.

I didn't mean reproducible as far as the binary stored value round-trip was concerned, I meant reproducible as far as getting the same ascii result on two different hardware platforms.

All common platforms use IEEE-754 representation, so I wouldn't expect differences due to hardware.

Should r55122 be reverted then?

Reverted in r55183.

Markus M

  Changed 5 weeks ago by hamish

  • milestone changed from 6.4.3 to 6.4.4

to be continued ..

Note: See TracTickets for help on using tickets.