Opened 12 years ago
Closed 12 years ago
#2051 closed defect (fixed)
ST_AsGeoJSON printing too many digits
Reported by: | strk | Owned by: | strk |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 2.0.2 |
Component: | postgis | Version: | 2.0.x |
Keywords: | Cc: |
Description
I found that ST_AsGeoJSON is printing more digits than the double number can hold, or that's the only explanation I can find for this:
=# with inp as ( SELECT st_snaptogrid('POINT(59.987654321987654321 -59.987654321987654321)'::geometry, 0.01) as g ) select ST_AsText(g) as text, ST_AsGeoJSON(g) as json from inp; -[ RECORD 1 ]----------------------------------------------------------------- text | POINT(59.99 -59.99) json | {"type":"Point","coordinates":[59.990000000000002,-59.990000000000002]}
I understand 0.01 is not a power of 2, but I think being consistent between ST_AsText and ST_AsGeoJSON would be nice
Change History (17)
comment:1 by , 12 years ago
- ST_AsText uses DBL_DIG (16): http://tigcc.ticalc.org/doc/float.html#DBL_DIG
- ST_AsGeoJSON uses OUT_MX_DOUBLE_PRECISION (15): source:/trunk/liblwgeom/liblwgeom.h.in#L1337
comment:2 by , 12 years ago
Speaking of which, I don't think those OUT_XXX macros should be in liblwgeom.h, right ?
comment:3 by , 12 years ago
Ok I found the real difference. It's not the precision, but the format string being %.*f for geojson and %.*g for ST_AsText.
The code in liblwgeom/lwout_geojson.c suggest it was intentional, but I can't understand the rationale for it. Oliver, can you shed a light here ? The code is:
if (fabs(pt.x) < OUT_MAX_DOUBLE) sprintf(x, "%.*f", precision, pt.x); else sprintf(x, "%g", pt.x);
comment:4 by , 12 years ago
Uhm, answering myself it seems related to the aim to avoid the scientific notation ..
Anyway this is still a bug. Easier to see it as such:
SELECT ST_AsGeoJSON('POINT(59.99, -59.99)'::geometry); {"type":"Point","coordinates":[59.990000000000002,-59.990000000000002]}
comment:5 by , 12 years ago
Sandro,
Yeap the aim is indeed to avoid scientific notation, in export functions other than ST_AsText
comment:6 by , 12 years ago
Olivier: but I see there's a unit test for a geojson output _with_ scientific notation in it, so is that still valid geojson ?
comment:7 by , 12 years ago
Scientific notation should only occur when there's no other way to represent the coordinates. like in the unit test: POINT(1E300 -1E300)
As i recall it was a user ticket who lead to this fix
GeoJSON is small enough not to say anything about scientific notation,
comment:8 by , 12 years ago
Ok so I think the proper fix to this would be to count the number of integer digits and detract them from the given precision. That way we should never give random numbers when out of significant digits (as happening in this case, right?)
comment:9 by , 12 years ago
If you can find that ticket that lead to accepting scientific notation I'll be happy to take a look to see if my approach would fix that case too…
comment:10 by , 12 years ago
Well dunno, but the aim is to be able to set the number of decimal digit i.e ST_AsGeoJson('POINT(1.23456 123.456789)'::geometry, 3)
Should still return …[1.234,123.456]…
Ticket was: http://trac.osgeo.org/postgis/ticket/277
comment:11 by , 12 years ago
Alright, but that ticket is about a crash, so I guess it had more to do with memory management than with notation. Anyway I see what you mean… if the number would have too many non-decimal digits we probably don't want to print them all.
But I guess the same could be said for things like 1e-300. You'd get all zeros from those. Should we adopt the same scientific notation in that case too ? Or just return all zeros ? Is this behavior documented in the manual ?
comment:12 by , 12 years ago
In a GIS common world we deal with 1e-2 to 1e10.
1e300 seems useless/overkill, but obviously must not lead to a crash, so scientific notation appears as a nice workaround/answer, without having to compute exact memory to allocate for coordinates.
Indeed, not documented as not a real world feature.
(Same use case with 1e-300 from my point of view too)
comment:13 by , 12 years ago
So I've pushed my implementation of "decimal-digits-is-limited-by-max-significant-digits" to trunk with r10451.
I would also backport to the 2.0 branch but haven't run profiling tests to know exactly how the new code affects speed. Do you have some good dataset to stress this out and compare ?
comment:14 by , 12 years ago
A rude profiling tool:
time psql -xAtc "select sum(octet_length(st_asgeojson(g))) from ( select st_makepoint(random(), random()) as g from generate_series(1,10000000) ) as f;"
Takes 28.92 seconds for me on trunk and 28.64 on 2.0. Doesn't look like any dramatic slowdown, does it ?
comment:15 by , 12 years ago
Indeed, it seems still OK on perf side, i will test on real world data too
comment:16 by , 12 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Alright, backported to 2.0 branch with r10452 — also I added the test for 1e-300 case
Let me know if anything comes up
comment:17 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Assuming all is fine. Feel free to reopen if needed.