Opened 11 years ago

Closed 11 years ago

#2051 closed defect (fixed)

ST_AsGeoJSON printing too many digits

Reported by: strk Owned by: strk
Priority: medium Milestone: PostGIS 2.0.2
Component: postgis Version: 2.0.x
Keywords: Cc:

Description

I found that ST_AsGeoJSON prints more digits than a double can actually hold, or at least that's the only explanation I can find for this:

=# with inp as ( SELECT st_snaptogrid('POINT(59.987654321987654321 -59.987654321987654321)'::geometry, 0.01) as g ) select ST_AsText(g) as text, ST_AsGeoJSON(g) as json from inp;
-[ RECORD 1 ]-----------------------------------------------------------------
text | POINT(59.99 -59.99)
json | {"type":"Point","coordinates":[59.990000000000002,-59.990000000000002]}

I understand 0.01 is not a power of 2, but I think it would be nice for ST_AsGeoJSON to be consistent with ST_AsText.

Change History (17)

comment:1 by strk, 11 years ago

comment:2 by strk, 11 years ago

Speaking of which, I don't think those OUT_XXX macros should be in liblwgeom.h, right?

comment:3 by strk, 11 years ago

OK, I found the real difference. It's not the precision, but the format string: %.*f for GeoJSON and %.*g for ST_AsText.

The code in liblwgeom/lwout_geojson.c suggests it was intentional, but I can't understand the rationale for it. Olivier, can you shed some light here? The code is:

      if (fabs(pt.x) < OUT_MAX_DOUBLE) 
        sprintf(x, "%.*f", precision, pt.x);
      else
        sprintf(x, "%g", pt.x);

comment:4 by strk, 11 years ago

Uhm, answering myself: it seems related to the aim of avoiding scientific notation.

Anyway, this is still a bug. It's easier to see it as such:

SELECT ST_AsGeoJSON('POINT(59.99 -59.99)'::geometry);
{"type":"Point","coordinates":[59.990000000000002,-59.990000000000002]}

comment:5 by colivier, 11 years ago

Sandro,

Yep, the aim is indeed to avoid scientific notation in export functions other than ST_AsText.

comment:6 by strk, 11 years ago

Olivier: but I see there's a unit test for GeoJSON output _with_ scientific notation in it, so is that still valid GeoJSON?

comment:7 by colivier, 11 years ago

Scientific notation should only occur when there's no other way to represent the coordinates, like in the unit test: POINT(1E300 -1E300)

As I recall, it was a user ticket that led to this fix.

The GeoJSON spec is short enough that it doesn't say anything about scientific notation.

comment:8 by strk, 11 years ago

OK, so I think the proper fix for this would be to count the number of integer digits and subtract them from the given precision. That way we should never emit random digits once we run out of significant digits (as is happening in this case, right?).
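
Something along these lines (a rough, hypothetical sketch; print_coord and MAX_DOUBLE_DIGITS are just illustrative names, not the actual liblwgeom code):

    #include <stdio.h>
    #include <math.h>

    #define MAX_DOUBLE_DIGITS 15  /* roughly the significant digits a double holds */

    static void print_coord(char *buf, double value, int precision)
    {
        /* digits to the left of the decimal point */
        int ndigits = (fabs(value) < 1.0) ? 1 : (int)floor(log10(fabs(value))) + 1;

        /* subtract them from the requested number of decimal digits */
        int decimals = precision - ndigits;
        if (decimals < 0) decimals = 0;
        if (decimals > MAX_DOUBLE_DIGITS) decimals = MAX_DOUBLE_DIGITS;

        sprintf(buf, "%.*f", decimals, value);
    }

With precision 15 this would print 59.99 as 59.9900000000000 (13 decimal digits) rather than 59.990000000000002, so the bogus trailing digits go away.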

comment:9 by strk, 11 years ago

If you can find the ticket that led to accepting scientific notation, I'll be happy to take a look and see if my approach would fix that case too…

comment:10 by colivier, 11 years ago

Well, I don't know, but the aim is to be able to set the number of decimal digits, e.g. ST_AsGeoJSON('POINT(1.23456 123.456789)'::geometry, 3)

Should still return …[1.234,123.456]…

Ticket was: http://trac.osgeo.org/postgis/ticket/277

comment:11 by strk, 11 years ago

Alright, but that ticket is about a crash, so I guess it had more to do with memory management than with notation. Anyway, I see what you mean: if the number has too many non-decimal digits, we probably don't want to print them all.

But I guess the same could be said for things like 1e-300. You'd get all zeros from those. Should we adopt the same scientific notation in that case too? Or just return all zeros? Is this behavior documented in the manual?
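
For reference, a quick standalone check (plain C, outside PostGIS) of what fixed-point output does to 1e-300:

    #include <stdio.h>

    int main(void)
    {
        double tiny = 1e-300;
        printf("%.15f\n", tiny);  /* 0.000000000000000 -- every significant digit lost */
        printf("%g\n", tiny);     /* 1e-300 -- scientific notation preserves it */
        return 0;
    }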

comment:12 by colivier, 11 years ago

In the common GIS world we deal with values from 1e-2 to 1e10.

1e300 seems useless/overkill, but it obviously must not lead to a crash, so scientific notation looks like a nice workaround, without having to compute the exact memory to allocate for coordinates.

Indeed, it's not documented, since it's not a real-world feature.

(The same goes for 1e-300, from my point of view.)

comment:13 by strk, 11 years ago

So I've pushed my implementation of "decimal-digits-is-limited-by-max-significant-digits" to trunk with r10451.

I would also backport it to the 2.0 branch, but I haven't run profiling tests to know exactly how the new code affects speed. Do you have a good dataset to stress-test this and compare?

comment:14 by strk, 11 years ago

A crude profiling tool:

time psql -xAtc "select sum(octet_length(st_asgeojson(g))) from ( select st_makepoint(random(), random()) as g from generate_series(1,10000000) ) as f;"

Takes 28.92 seconds for me on trunk and 28.64 on 2.0. That doesn't look like a dramatic slowdown, does it?

comment:15 by colivier, 11 years ago

Indeed, it still seems OK on the performance side; I will test on real-world data too.

comment:16 by strk, 11 years ago

Owner: changed from pramsey to strk
Status: new → assigned

Alright, backported to the 2.0 branch with r10452; I also added a test for the 1e-300 case.

Let me know if anything comes up

comment:17 by strk, 11 years ago

Resolution: fixed
Status: assigned → closed

Assuming all is fine. Feel free to reopen if needed.
