Opened 11 years ago

Closed 11 years ago

#703 closed defect (fixed)

measures test failure

Reported by: strk Owned by: pramsey
Priority: medium Milestone: PostGIS 2.0.0
Component: postgis Version: master
Keywords: Cc:

Description

32bit ubuntu lucid system. measure test:

-distancetest2|0|50|LINESTRING(0 0,0 0)|LINESTRING(0 0,0 0)|LINESTRING(-40 -20,-10 20)|LINESTRING(-10 20,-40 -20)
+distancetest2|0|50|LINESTRING(-1.33226762955019e-15 -6.66133814775094e-16,-1.33226762955019e-15 -6.66133814775094e-16)|LINESTRING(-3.33066907387547e-16 6.66133814775094e-16,-3.33066907387547e-16 6.66133814775094e-16)|LINESTRING(-40 -20,-10 20)|LINESTRING(-10 20,-40 -20)

Sunds more than just precision.. DB is 8.4.5

Attachments (1)

test.c (591 bytes) - added by nicklas 11 years ago.

Download all attachments as: .zip

Change History (20)

comment:1 Changed 11 years ago by strk

Note that on 64bit systems (same distribution) works fine.

comment:2 Changed 11 years ago by nicklas

Hallo

Yes, I have seen this too. I think it is just a precision issue. As you say it doesn't show on linux 64 bit, and I have also found that I don't get it on windows.

I have thought about how to handle it (if it just is a precision issue)

The problem is that when I apply st_snaptogrid the return is null because the zero-length line is not valid and the second repeated point is removed.

So this falls back a little to the question if the design is right to deliberate output an invalid geometry. SQL-server choosed not to and instead return NULL from the function in their new ShortestLineTo?.

But if we should keep this (I like it this way) and we should test this "special case" which I think we should, then I guess we have to split the answer in ST_Startpoint and ST_Endpoint and put ST_Snapto grid on those.

Does that sound ok?

Or first, of course. Is it just a precision problem? The size of the error I think points that way. The error is quite, very, very small.

What is also interesting I think is that we had the same division with linux 32 bit giving one answer and linux 64 and windows giving another. But in that case ubuntu 32 seemed more accurate. That is described with a lot of messy postings from myself in #503. There it seems like ubuntu 64 and windows "rounded" their answers to multiples of 2 raised to ome small number I don't remember. Can that be the same thing happening here, that the linux 32 bit answer actually is the right answer including a floating point error and that ubuntu 64 and windwso hides this hides this very small error by some rounding mechanism.

I am just brainstorming, but I find it a little peculiar that we see the difference between linux 64/windows and linux 32 again.

/Nicklas

comment:3 Changed 11 years ago by strk

Sounds fine to me splitting in start/end point. In that case you also want to check you have 2 points in total and that the type is a LINESTRING.

comment:4 Changed 11 years ago by mcayland

Nicklas,

Could it be that you have a robustness issue in your distance algorithm? I remember experiencing similar problems when a fixing bugs in past, related to bad use of floating point arithmetic when the values are close to each other.

Would you be able to step through this query on a 32-bit machine to find out how the error is being introduced?

comment:5 Changed 11 years ago by nicklas

Yes, I will take a look at that.

comment:6 Changed 11 years ago by nicklas

Ok, here is what I have found so far

I have compared a 64 bit ubuntu 10.4 installation with a 32 bit 10.10 installation on different hardware.

Both is PostgreSQL 9.01

The line numbers refere to revision 6238

the query I have used to compare is.

select  st_astext(st_shortestline(a,b)) from (
	select geomfromtext('LINESTRING(-40 -20 ,10 5)') as a,
		geomfromtext('LINESTRING(-10 20, 1 -2)') as b
	) as foo;

Both systems seems to store a slightly wrong value in variable r on line 882 in measures.c r=8.0000000000000004e-1 instead of 0.8

Then, the difference between the systems occur on line 914 and 915 On the 32 bit system A->x+r*(B->x-A->x) gives 2.22044607925031308e-15 and the 64 bit system gives 0.00000000000000000e+00

A->x and B->x seems to be identical.

So both systems get a floating point error when saving variable r but on 64 bit system that error doesn't seem to be reflected in the result. It disappears in the next calculation.

To me it seems like 2.22044607925031308e-15 is to small to handle in one system but possible to handle in the other. Or is it just rounded by some other reason.

I will try to do the same outside postgis/postgresql in a standalone program to see if the differnce seems to be in the system itself. But I have no time right now.

/Nicklas

comment:7 Changed 11 years ago by nicklas

I see the same difference in a standalone c-program.

compile and run the attached file test.c

I get different result on the 32 bit system and the 64 bit system.

The values I use in test.c corresponds to the query in the post above.

I have not tried in windows (32 bit) but I expect it to behave like 64 bit linux from what I have seen earlier.

My results is:

Ubuntu 64 bit 10.4:
r_top=-1.1000000000000000e+03
r_bot=-1.3750000000000000e+03
r=8.0000000000000004e-01

px=0.0000000000000000e+00
py=0.0000000000000000e+00

Ubuntu 32 bit 10.10:
r_top=-1.1000000000000000e+03
r_bot=-1.3750000000000000e+03
r=8.0000000000000004e-01

px=2.2204460492503131e-15
py=1.1102230246251565e-15

Little strange. I think the mystery behind #503 is in the same neighbourhood.

/Nicklas

Changed 11 years ago by nicklas

Attachment: test.c added

comment:8 Changed 11 years ago by robe

FWIW: All measurement regression tests pass on windows 32-bit against PostgreSQL 9.0. But as noted in #705 it fails my crash test. Not sure if those 2 are related or not.

comment:9 Changed 11 years ago by mcayland

Uggg - that looks nasty :(

Does storing (A->x - B->x) in an intermediate variable help? Otherwise perhaps try compiling with the -ffloat-store flag to see if it's related to floating point memory/register transition on x86.

comment:10 Changed 11 years ago by nicklas

An even more distilled version is to just compile:

#include <stdio.h>

main()
{
double r=1100.0/1375.0;
double px = -40.0+r*(50.0);
printf("r=%.16e\npx=%.16e\n",r, px);
}

gives me on ubuntu 64 bit (and probably win 32 bit):
r=8.0000000000000004e-01
px=2.2204460492503131e-15

and on ubuntu 32 bit:
r=8.0000000000000004e-01
px=0.0000000000000000e+00

Regina, I don't think they are related because this issue, I have known some time but have not had time to look at and thought it is just one of those precision errors, and you say it didn't crash last friday.

/Nicklas

comment:11 Changed 11 years ago by nicklas

Mark

compiling like:
gcc test2.c -ffloat-store

seems to affect this: On the 32 bit system the result is then
r=8.0000000000000004e-01
px=0.0000000000000000e+00

In other words the same as on the 64 bit system.

What does that mean?

/Nicklas

comment:12 Changed 11 years ago by mcayland

Looks like you're being caught by the difference in 80-bit hardware floating point and software 64-bit floating point - since the hardware (i.e. registers) hold more precision then you can get different results depending upon whether you're working with a number in memory or in a register.

Google -ffloat-store for more information, plus also strk has some experience with this in GEOS.

comment:13 Changed 11 years ago by nicklas

Ok, I will try to investigate some.

Is this error different in any way from all the other precision errors we have where we use ST_Snaptogrid in the regression tests?

Or might a solution to this also solve the other cases?

/Nicklas

comment:14 Changed 11 years ago by strk

I think a sane rounding is a good idea to put in place in any case.

comment:15 Changed 11 years ago by nicklas

Yes, nasty I confirm.

I tried with intermediate variable and it doesn't help. How come? I don't understand it.

The row that makes difference between the systems is:

theP.x = A->x+r*(B->x-A->x);

A->x holds the value -40.0 and r*(B->x-A->x) is supposed to be 40.0

So I putted r*(B->x-A->x) in variable t so I then tried theP.x = A->x+t;

that gives the same error. So I substituted A->x with -40.0 => same error.

So I substitute t with 40.0 and the error disappears.

So t holds a value not equal to 40.0 but if I inspect the value like lwnotice("t=%.17e",t); I get just 4.00000000000000000+01

Another interesting thing is that using an intermediate variable removes the error in the stand alone version like:

#include <stdio.h>

main()
{
double r=1100.0/1375.0;
double t=r*(50);
double px = -40.0+r;
printf("r=%.16e\npx=%.16e\n",r, px);
}

So outside postgresql/postgis it works to put the value in an intermediate variable to trunkate it. But not inside Postgis. Why? How come? And where is the extra bytes saved inside postgis, when I have saved it to a double?

Any hint is appreciated.

Thanks Nicklas

comment:16 Changed 11 years ago by nicklas

Hmm, I jst read some more from #503 and saw that you, Mark, mentioned that we actually compile with -ffloat-store flag. Mark, why did you then ask me to try with that flag?

I ask because in my example above when I ran the problem part of the code outside postgis and postgresql, the -ffloat-store flag solved the problem.

But why doesn't it solve the problem inside postgis?

Do we, or do we not use -ffloat-store flag? I see it gets there correctly (as I understand it) in the makefile when I run ./configure, but it doesn't solve the problem.

And again, if we already use it, why did you want me to test it again, Mark. It is something here I don't understand.

/Nicklas

comment:17 Changed 11 years ago by pramsey

Nik, you'll note it only gets passed in the compilation of lwspheroid.c, since it has some performance drawbacks in restricting the kinds of code the compiler generates we only want to use it where we "have" to. In my case I needed it to settle down some precision issues with the spherical code, which were causing incorrect answers. I think that unless the precision issues you are getting are making things actually *wrong* (ie, getting a false where you should get a true) you should just ignore the precision issues and wrap the regression tests in some light rounding.

comment:18 Changed 11 years ago by nicklas

Ok, now it's camouflaged in r6543.

/Nicklas

comment:19 Changed 11 years ago by nicklas

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.