Opened 14 years ago

Closed 12 years ago

#503 closed defect (fixed)

ST_Area on small polygons in EPSG:31467 gives incorrect results depending on 32-/64-system-architecture

Reported by: mjansen Owned by: pramsey
Priority: medium Milestone: PostGIS 2.0.0
Component: postgis Version: 1.5.X
Keywords: Cc: birgit.laggner@…

Description

See this thread on the postgis-users-list: http://old.nabble.com/st_area-and-0.001953125-td28217229.html

Running this SQL:

SELECT ST_Area(ST_GeomFromEWKT(foo.ewkt)) AS area FROM (
SELECT
'SRID=31467;POLYGON((3508395.5093 5910738.8973,3508505.2864 5910690.8533,3508300.641 5910780.4164,3508395.5093 5910738.8973))' as ewkt
UNION SELECT
'SRID=31467;POLYGON((3508395.5093 5949092.7294,3508333.8642 5949090.138,3508411.1808 5949093.3882,3508395.5093 5949092.7294))' as ewkt
UNION SELECT
'SRID=31467;POLYGON((3498246.2563 5815493.2389,3498246.2472 5815493.1259,3498246.2094 5815493.1202,3498246.2563 5815493.2389))' as ewkt
UNION SELECT
'SRID=31467;POLYGON((3539635.9601 5832797.6616,3539635.9574 5832797.6613,3539635.9565 5832797.6851,3539635.9601 5832797.6616))' as ewkt
UNION SELECT
'SRID=31467;POLYGON((3546842.9496103 5778962.75289926,3546842.9761781 5778962.61321963,3546842.976 5778962.613,3546842.9496103 5778962.75289926))' as ewkt
UNION SELECT
'SRID=31467;POLYGON((3427425.5879 5841806.726,3427416.0171 5841792.2466,3427416.01703247 5841792.24664866,3427425.5879 5841806.726))' as ewkt
UNION SELECT
'SRID=31467;POLYGON((3538420.34038636 5912791.09822265,3538420.31141462 5912790.97224757,3538420.3113 5912790.9723,3538420.34038636 5912791.09822265))' as ewkt
UNION SELECT
'SRID=31467;POLYGON((3577970.8916 5729534.987,3577827.5551 5729621.0532,3577805.6282 5729634.2192,3577970.8916 5729534.987))' as ewkt
UNION SELECT
'SRID=31467;POLYGON((3472882.79775403 5870816.50310222,3472882.7978 5870816.5031,3472875.19541818 5870717.64922727,3472882.79775403 5870816.50310222))' as ewkt
UNION SELECT
'SRID=31467;POLYGON((3557563.34513562 5791040.4530635,3557563.44511383 5791040.49625184,3557563.3452 5791040.4529,3557563.34513562 5791040.4530635))'
) as foo(ewkt);

Should return

         area         
----------------------
  0.00210952758789062
 8.58306884765625e-06
  0.00310611724853516
  0.00228118896484375
 8.58306884765625e-06
 3.24249267578125e-05
 0.000721931457519531
  0.00169944763183594
 1.43051147460938e-05
 0.000333786010742188

And does so on

POSTGIS="1.4.1" GEOS="3.1.0-CAPI-1.5.0" PROJ="Rel. 4.6.1, 21 August 2008" USE_STATS

running on

PostgreSQL 8.4.2 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.1-4ubuntu8) 4.4.1, 32-bit

For the following combinations of PostgreSQL / PostGIS

PostgreSQL 8.2.4 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
POSTGIS="1.3.1" GEOS="2.2.3-CAPI-1.1.1" PROJ="Rel. 4.4.9, 29 Oct 2004" USE_STATS
PostgreSQL 8.4.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.3 20100316 (prerelease), 64-bit
POSTGIS="1.5.1" GEOS="3.2.0-CAPI-1.6.0" PROJ="Rel. 4.7.1, 23 September 2009" LIBXML="2.7.6" USE_STATS
PostgreSQL 8.2.13 on x86_64-unknown-linux-gnu, compiled by GCC gcc (SUSE Linux) 4.3.2 [gcc-4_3-branch revision 141291]
POSTGIS="1.3.6" GEOS="3.1.1-CAPI-1.6.0" PROJ="Rel. 4.6.1, 21 August 2008" USE_STATS
PostgreSQL 8.2.15 on x86_64-unknown-linux-gnu, compiled by GCC gcc (SUSE Linux) 4.4.1 [gcc-4_4-branch revision 150839]
POSTGIS="1.3.6" GEOS="3.1.1-CAPI-1.6.0" PROJ="Rel. 4.7.1, 23 September 2009" USE_STATS (procs from 1.3.3 need upgrade)
PostgreSQL 8.4.2 on x86_64-unknown-linux-gnu, compiled by GCC gcc (SUSE Linux) 4.4.1 [gcc-4_4-branch revision 150839], 64-bit
POSTGIS="1.4.0" GEOS="3.2.0-CAPI-1.6.0" PROJ="Rel. 4.6.1, 21 August 2008" USE_STATS

The query returns areas of either 0 or 0.001953125, e.g.:

    area     
-------------
           0
 0.001953125
 0.001953125
 0.001953125
 0.001953125
 0.001953125
 0.001953125
 0.001953125
           0
 0.001953125

The one visible difference between the systems is that the code works as expected on 32-bit-machines, but gives strange results on 64-bit-machines.

Attachments (1)

areacalc.ods (10.1 KB ) - added by nicklas 14 years ago.
area calculation in spreadsheet for example in comment 9

Download all attachments as: .zip

Change History (17)

comment:1 by mjansen, 14 years ago

Cc: birgit.laggner@… added

comment:2 by mjansen, 14 years ago

Summary: ST_Area on small Polygons in EPSG:31467 gives incorrect results depending on 32-/64-system-architectureST_Area on small polygons in EPSG:31467 gives incorrect results depending on 32-/64-system-architecture

comment:3 by pramsey, 14 years ago

postgis15=# select st_area('POLYGON ((3498246.2563 5815493.2389, 3498246.2472 5815493.1259, 3498246.2094 5815493.1202, 3498246.2563 5815493.2389))');
   st_area   
-------------
 0.001953125
(1 row)

postgis15=# select st_area('POLYGON ((3498246.2563 5815493.2389, 3498246.2472 5815493.1259, 3498246.2094 5815493.1202, 3498246.2294 5815493.1802,  3498246.2563 5815493.2389))');
 st_area 
---------
       0
(1 row)

postgis15=# select st_area('POLYGON ((0.2563 0.2389, 0.2472 0.1259, 0.2094 0.1202, 0.2294 0.1802, 0.2563 0.2389))');    st_area   
-------------
 0.002329765
(1 row)

Here's the third example, also on an x64 box (OS/X in this case). Adding a fourth vertex between two of the others causes the result to shift to zero. And shifting the polygon back closer to the origin adds precision which brings the result closer to the result for the i386 numbers. Basically it looks like we are on a coarser precision grid, and the vertices are being snapped into that grid, which is pretty counter-intuitive since we are on 64 bit hardware. It's almost like we're in a 32-bit float space instead of a 64-bit double space. Something very deep is going on here.

comment:4 by blaggner, 14 years ago

I only wanted to remark that not only small polygons are affected by this phenomenon - all areas I calculated with st_area() are multiples of 0.001953125 (what's not as it should be as appears to me).

comment:5 by nicklas, 14 years ago

A few thoughts

I don't have any 64 bit box with PostGIS here.

Paul am I right that PostGIS doesn't know anything about 32 or 64 bit. That PostgreSQL serves PostGIS with a 32 bit environment. In that case, shouldn't it probably be something on the PostgreSQL side. If so, shouldn't it affect also other postgis native functions like length and distance.

I suspect that ST_GeomFromEWKT might trig something because otherwise it should have been catched it on the regression tests.

Since the result is the same and not random it is tempting to assume that something is happening after the answer is given in area function. If something happened before with the incoming values I guess it shouldn't give the same answer with different ingoing values.

/Nicklas

comment:6 by nicklas, 14 years ago

I get the same error at two different 32 bit windows system So 32/64 bit theory seems to go away.

/Nicklas

comment:7 by nicklas, 14 years ago

Something is happening in the last iteration through the points in lwgeom_polygon_area. If I skip the last iteration I instead get :

71247.6015625 136792.8515625 353203587.458984 149338001.519531 477.24609375 52089.759765625 3142129.515625 650966177.738281 324352.900390625 45459937.875

I can also see that the area is accumulating as expected (I assume as expected) all the way to the last iteration. The last point givesdifferent values also in the last iteration but when added to the previous summarized area then it ends strange.

/Nicklas

comment:8 by nicklas, 14 years ago

I'm sorry. I think I talk rubbish.

But it is very strange. I calculated the two first polygons from above in a spreadsheet with the algoritm in the area function and I get to the result 0.001953125 !!!!

And I get the same result for each step as the function.

I think this is about rounding and extream polygons. The number of decimals in the coordinates is only 3 or 4 and the polygons seems to be very long and thin triangels. I couldn't get QGIS to draw anything else than a line for one of them.

Can that be the answer? The precision of the koordinates is not enough to get a good answer from a polygon like this. Since it is so long and thin a very little different in coordinates mekes a very big difference in area if it affects the triangel to be thinner or fatter.

A note I calculated the answer in the spreadsheet in an environment that gave the "right" answer in PostGIS and had the wrong answer in the speadsheet.

So, I don't think it is a bug, it is just about floating point differnces in systems showing with extream plygons.

Rubbish again???

/Nicklas

comment:9 by pramsey, 14 years ago

Some of the polygons are very "not polygon like". They are extremely flat triangles. Which is why I suspected failures due to precision issues (the vertices being snapped to a regular lattice imposed by the precision of the representation) However, the triangle in my example

'POLYGON ((3498246.2563 5815493.2389, 3498246.2472 5815493.1259, 3498246.2094 5815493.1202, 3498246.2563 5815493.2389))'

has a normal shape. See what your manual routine returns for it? With a double we have 52 bits of precision, which is enough space to store 15 digits of information, which is more than we have here… The reason I wonder if we're being coerced into single precision is that there are only 23 bits of precision there, so a max untruncated number of 8388608, so most of the digits after the decimal point in these examples would be lost. This also fits with the idea that even large polygons are showing similar quantization effects — if the vertices are being pushed into a single precision grid.

comment:10 by nicklas, 14 years ago

I get 0.001953125000000 from the spreadsheet.

I add the spreadsheet

/Nicklas

by nicklas, 14 years ago

Attachment: areacalc.ods added

area calculation in spreadsheet for example in comment 9

comment:11 by blaggner, 14 years ago

Nicklas, I suppose OpenOffice isn't very useful for calculating such terms. I used another formula, not based on the cross product of the vectors but on the trapezes formed by 2 consecutive points and the x-axis, and - depending on if I have eliminated the brackets in my formula or not - I get your result in one case and a different result in the other. My KCalc (internal calculator from OpenSuse) and PostgreSQL itself are getting slightly different results from what OpenOffice gets. And, if I'm calculating the area of my triangle-shaped geometries directly in PostgreSQL using this formula I found, then I get areas that look very reasonable to me (values are corresponding with perimeter). Interesting is, if I am using your formula for the direct calculation in PostgreSQL, I get again 0.001953125. So, something might be problematic with that formula or its calcluation in PostgreSQL, too.

Should I add examples for my calculation experiments?

comment:12 by nicklas, 14 years ago

I think I have found something. I can see the same calculation gets different results from inside postgresql/postgis than if I run it in a standalone c program in the same environment.

I tried with the two first coordinates in Paul's example above: In the area function the calculation is: ringarea += ( p1.x * p2.y ) - ( p1.y * p2.x );

I put a lwnotice just before this calculation that looks like: lwnotice("ringarea before:%f, first part:%f, second part:%f, all:%f\n", ringarea, ( p1.x * p2.y ), ( p1.y * p2.x ), (( p1.x * p2.y ) - ( p1.y * p2.x )));

in Ubuntu (9.10 32 bit) this gives: NOTICE: ringarea before:0.000000, first part:20344027056218.062500, second part:20344027398598.898438, all:-342380.838655

in windows (XP 32 bit on the same machine) this gives: NOTICE: ringarea before:0.000000, first part:20344027056218.062000, second part:20344027398598.898000, all:-342380.835938

Then I tried to run it in windows from a program like this:

#include <stdio.h>

int 
main(){
double x1 = 3498246.2563;
double x2 = 5815493.2389;
double y1 = 3498246.2472;
double y2 = 5815493.1259;
  printf("first part=%f, second part = %f, all=%f\n", ( x1 * y2 ),( y1 * x2 ),(( x1 * y2 ) - ( y1 * x2 ))  );
}

first part=20344027056218.062000, second part = 20344027398598.898000, all=-342380.838655

So here I get exactly the same result in the end like ubuntu.

Then I did put the same code in the end of the area function inside postgis and out putted through lwnotice.

.........
		poly_area += ringarea;
	}
/*My extra code*/
double x1 = 3498246.2563;
double x2 = 5815493.2389;
double y1 = 3498246.2472;
double y2 = 5815493.1259;
  lwnotice("first part=%f, second part = %f, all=%f\n", ( x1 * y2 ),( y1 * x2 ),(( x1 * y2 ) - ( y1 * x2 ))  );

/*Back to original*/
	return poly_area;
}

Then I had the wrong result again : NOTICE: first part=20344027056218.062000, second part = 20344027398598.898000, all=-342380.835938

So something is different inside postgresql/postgis than outside. Can the truncation in the first and second part affect or is that only a limitation in the output.

/Nicklas

comment:13 by mcayland, 14 years ago

I wonder if this is a side-effect of now compiling with -ffloat-store. Nicklas, can you please try the following:

  • "make clean" in your 1.5 source tree
  • Re-run configure
  • Remove -ffloat-store from NUMERIC_FLAGS in liblwgeom/Makefile
  • "make install"

Then re-run your PostGIS test again and see if it makes a difference.

Mark.

comment:14 by nicklas, 14 years ago

Mark

I commented away the whole row since that was the only of NUMERIC_FLAGS, but I see no difference. Same problem in windows.

Fromwhat I have understood we have this problem on windows 32 bit and linux 64 bit. Has anyone had the problem on a linux 32 bit system?

Another interesting note, if I run the calculation directly in postgresql like this:

select ( x1 * y2) - ( y1 * x2 ) from
(select 
3498246.2563 as x1,
5815493.2389 as x2,
3498246.2472 as y1,
5815493.1259 as y2) a

then I get exactly the same answer (-342380.83848791) on both platforms. Not exactly the same as inside postgis in any of the cases but quite close to the answer from ubuntu 32 bit which seems to give the right area.

I guess that makes it more likely that the problem is isolated to postgis.

/Nicklas

comment:15 by robe, 13 years ago

Milestone: PostGIS 1.5.3PostGIS 2.0.0

comment:16 by pramsey, 12 years ago

Resolution: fixed
Status: newclosed

On my 64-bit OSX machine, I'm getting correct results now. We did have a patch for better precision in area calculations applied (#810) to trunk and 1.5 and 1.4. I'm closing until I hear otherwise.

Note: See TracTickets for help on using tickets.