Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#3220 closed defect (fixed)

Mingw failure on cluster test

Reported by: robe Owned by: pramsey
Priority: blocker Milestone: PostGIS 2.2.0
Component: postgis Version: master
Keywords: Cc: pramsey, nicklas, dbaston

Description

I'm getting this failure only under mingw, for some reason regress testing against VC++ doesn't error out (well last I tried it didn't anyway, I'll retry to confirm), which explains why winnie is probably not complaining about this.

It looks like it might be just some sort of ordering difference since all the values are there, but in my mingw it comes in as a single geometry collection and regress expected looks like two geometry collections with polygon:

GEOMETRYCOLLECTION(LINESTRING(6 6,7 7))

coming in via a separate geometry collection. Anyway I have to upgrade to 9.4.4 anyway. Though my 9.5dev fails in same spot I think. I also notice I am running a slightly older GEOS 3.5 than winnie so could be caused by that too. Will close this out if that is the case.

PostgreSQL 9.4.1 on x86_64-w64-mingw32, compiled by gcc.exe (x86_64-win32-seh-rev1, Built by MinGW-W64 project) 4.8.3, 64-bit
  Postgis 2.2.0dev - r13860 - 2015-07-30 23:48:27
  scripts 2.2.0dev r13860
  GEOS: 3.5.0dev-CAPI-1.9.0 r4034
  PROJ: Rel. 4.9.1, 04 March 2015
  SFCGAL: 1.1.0

--- cluster_expected	2015-07-30 14:50:49 -0400
+++ /projects/postgis/tmp/2.2_pg9.4w64/test_31_out	2015-07-30 18:37:21 -0400
@@ -4,8 +4,7 @@
 t2|GEOMETRYCOLLECTION(LINESTRING(0 0,1 1),LINESTRING(5 5,4 4),LINESTRING(0 0,-1 -1),POLYGON((0 0,4 0,4 4,0 4,0 0)))
 t2|GEOMETRYCOLLECTION(LINESTRING(6 6,7 7))
 t2|GEOMETRYCOLLECTION(POLYGON EMPTY)
-t3|GEOMETRYCOLLECTION(LINESTRING(0 0,1 1),LINESTRING(5 5,4 4),LINESTRING(0 0,-1 -1),POLYGON((0 0,4 0,4 4,0 4,0 0)))
-t3|GEOMETRYCOLLECTION(LINESTRING(6 6,7 7))
+t3|GEOMETRYCOLLECTION(LINESTRING(0 0,1 1),LINESTRING(5 5,4 4),LINESTRING(0 0,-1 -1),LINESTRING(6 6,7 7),POLYGON((0 0,4 0,4 4,0 4,0 0)))
 t3|GEOMETRYCOLLECTION(POLYGON EMPTY)
 t4|GEOMETRYCOLLECTION(LINESTRING(0 0,1 1),LINESTRING(5 5,4 4),LINESTRING(0 0,-1 -1),LINESTRING(6 6,7 7),POLYGON((0 0,4 0,4 4,0 4,0 0)))
 t4|GEOMETRYCOLLECTION(POLYGON EMPTY)

Change History (24)

comment:1 by robe, 9 years ago

nope same failure with

PostgreSQL 9.4.1 on x86_64-w64-mingw32, compiled by gcc.exe (x86_64-win32-seh-rev1, Built by MinGW-W64 project) 4.8.3, 64-bit
  Postgis 2.2.0dev - r13860 - 2015-07-30 23:57:58
  scripts 2.2.0dev r13860
  GEOS: 3.5.0dev-CAPI-1.9.0 r4064
  PROJ: Rel. 4.9.1, 04 March 2015
  SFCGAL: 1.1.0

So older GEOS does not seem to be the issue.

comment:2 by strk, 9 years ago

could it be due to unpredictable sorting order in the aggregate ? does the test contain an ORDER BY predicate for it ?

comment:3 by robe, 9 years ago

there is an order by. Test looks like this:

CREATE TEMPORARY TABLE cluster_inputs (id int, geom geometry);
INSERT INTO cluster_inputs VALUES
(1, 'LINESTRING (0 0, 1 1)'),
(2, 'LINESTRING (5 5, 4 4)'),
(3, NULL),
(4, 'LINESTRING (0 0, -1 -1)'),
(5, 'LINESTRING (6 6, 7 7)'),
(6, 'POLYGON EMPTY'),
(7, 'POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))');

SELECT 't1', ST_AsText(unnest(ST_ClusterIntersecting(geom ORDER BY id))) FROM cluster_inputs;
SELECT 't2', ST_AsText(unnest(ST_ClusterIntersecting(ST_Accum(geom ORDER BY id)))) FROM cluster_inputs;
SELECT 't3', ST_AsText(unnest(ST_ClusterWithin(geom, 1.4 ORDER BY id))) FROM cluster_inputs;
SELECT 't4', ST_AsText(unnest(ST_ClusterWithin(ST_Accum(geom ORDER BY id), 1.5))) FROM cluster_inputs;

comment:4 by robe, 9 years ago

I'm going to upgrade to 9.4.4 and see if the issue goes away. Might be because my mingw 9.4 is older version than my EDB. If it is I'm not going to worry about it.

comment:5 by robe, 9 years ago

nope same issue with

PostgreSQL 9.4.4 on x86_64-w64-mingw32, compiled by gcc.exe (x86_64-win32-seh-rev1, Built by MinGW-W64 project) 4.8.3, 64-bit
  Postgis 2.2.0dev - r13860 - 2015-07-31 14:53:41
  scripts 2.2.0dev r13860
  GEOS: 3.5.0dev-CAPI-1.9.0 r4064
  PROJ: Rel. 4.9.1, 04 March 2015

comment:6 by pramsey, 9 years ago

No, your answer is wrong, it's not an ordering problem, it's a "getting it wrong" problem, but why your platform, that I dunno.

select st_distance(
  'GEOMETRYCOLLECTION(LINESTRING(0 0,1 1),LINESTRING(5 5,4 4),LINESTRING(0 0,-1 -1),POLYGON((0 0,4 0,4 4,0 4,0 0)))',
  'GEOMETRYCOLLECTION(LINESTRING(6 6,7 7))'
);
   st_distance   
-----------------
 1.4142135623731
(1 row)

The failing test is a "cluster within" with a tolerance of 1.4, so the two elements fall outside the tolerance and should be in different clusters, as the platforms that succeed (all of them but one) return.

comment:7 by robe, 9 years ago

Nope that one returns the same answer for me on my mingw64 that is failing this test:

test_address=# select version() || ' ' || postgis_full_version();
NOTICE:  Function postgis_topology_scripts_installed() not found. Is topology support enabled and topology.sql installed?
                                                                                                                                               ?column?
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.4.4 on x86_64-w64-mingw32, compiled by gcc.exe (x86_64-win32-seh-rev1, Built by MinGW-W64 project) 4.8.3, 64-bit POSTGIS="2.2.0dev r13905" GEOS="3.5.0dev
(1 row)


test_address=#
test_address=# select st_distance(
test_address(#   'GEOMETRYCOLLECTION(LINESTRING(0 0,1 1),LINESTRING(5 5,4 4),LINESTRING(0 0,-1 -1),POLYGON((0 0,4 0,4 4,0 4,0 0)))',
test_address(#   'GEOMETRYCOLLECTION(LINESTRING(6 6,7 7))'
test_address(# );
   st_distance
-----------------
 1.4142135623731
(1 row)

BTW I still get this annoying topology script not installed notice every once in a while. I think that happens if I had postgis_topology installed and then later uninstall it. I'll ticket that one separately if I can replicate.

Reran the tests and still fails on cluster:

Loading PostGIS into 'postgis_reg'
PostgreSQL 9.4.4 on x86_64-w64-mingw32, compiled by gcc.exe (x86_64-win32-seh-rev1, Built by MinGW-W64 project) 4.8.3, 64-bit
  Postgis 2.2.0dev - r13906 - 2015-08-15 06:08:55
  scripts 2.2.0dev r13906
  GEOS: 3.5.0dev-CAPI-1.9.0 r4072
  PROJ: Rel. 4.9.1, 04 March 2015

Checked still screwed:

SELECT 't3', ST_AsText(unnest(ST_ClusterWithin(geom, 1.39 ORDER BY id))) FROM cluster_inputs;


test_address=# SELECT 't3', ST_AsText(unnest(ST_ClusterWithin(geom, 1.39 ORDER BY id))) FROM cluster_inputs;
 ?column? |                                                              st_astext
----------+--------------------------------------------------------------------------------------------------------------------------------------
 t3       | GEOMETRYCOLLECTION(LINESTRING(0 0,1 1),LINESTRING(5 5,4 4),LINESTRING(0 0,-1 -1),LINESTRING(6 6,7 7),POLYGON((0 0,4 0,4 4,0 4,0 0)))
 t3       | GEOMETRYCOLLECTION(POLYGON EMPTY)
(2 rows)

If I'm the only one experiencing this issue, and it's not a symptom of a larger issue, I'm okay with closing this as won't fix. This behavior only exhibits when I am testing my 64-bit mingw binaries against my 64-bit mingw compiled PostgreSQL. It doesn't happen if I use the binaries in a VC++ compiled EDB PostgreSQL, which is my production use and what I target.

pramsey maybe I can see what it's doing if I inject some debug clauses. Where is the best place to put that?

Last edited 9 years ago by robe (previous) (diff)

comment:8 by dbaston, 9 years ago

Sorry robe, I just saw this…. The first thing I would check is whether the correct tolerance distance is making it to where it's needed at line 164 of lwgeom_geos_cluster.c.

I would work on this myself if I had a Windows build environment set up….you don't have anything I could RDP into, do you?

comment:9 by robe, 9 years ago

dbaston,

I'll check this out. Sadly no this issue only happens on windows desktop when testing against mingw PostgreSQL. On winnie I have her testing against VC++ builds, where this issue doesn't show, which is why I'm not so concerned if others aren't experiencing it. I'll checkout that line you mentioned and see what's happening there (after I get other tickets squared away).

comment:10 by robe, 9 years ago

dbaston,

Okay I put in a notice just before that line to show the mindist and cxt→tolerance values and your suspicions seem correct, it's not getting into the union loop.

For the test that is failing t3, this is what I see when I run against my mingw64 9.4 postgresql

NOTICE:  mindist: 0, cxt tolerance: 1.38242e+306
NOTICE:  mindist: 0, cxt tolerance: 1.38242e+306
NOTICE:  mindist: 4.24264, cxt tolerance: 1.38242e+306
NOTICE:  mindist: 7.07107, cxt tolerance: 1.38242e+306

So it looks like the e+306 is getting in the way of teh mindist ⇐ cxt→tolerance computation.

Now for comparison, If I run the same binaries against my 64-bit VC++ 9.4 build, here is what I get

NOTICE:  mindist: 0, cxt tolerance: 1.4
NOTICE:  mindist: 0, cxt tolerance: 1.4
NOTICE:  mindist: 0, cxt tolerance: 1.4
NOTICE:  mindist: 1.41421, cxt tolerance: 1.4
NOTICE:  mindist: 1.41421, cxt tolerance: 1.4

So it seems the mindist ⇐ cxt→tolerance is being tripped up by the scientific notation representation of 1.38….

Last edited 9 years ago by robe (previous) (diff)

comment:11 by robe, 9 years ago

Cc: pramsey nicklas added

Okay for completeness, I put in a notice to trap whenever the union operation is being called:

Test:

CREATE TEMPORARY TABLE cluster_inputs (id int, geom geometry);
INSERT INTO cluster_inputs VALUES
(1, 'LINESTRING (0 0, 1 1)'),
(2, 'LINESTRING (5 5, 4 4)'),
(3, NULL),
(4, 'LINESTRING (0 0, -1 -1)'),
(5, 'LINESTRING (6 6, 7 7)'),
(6, 'POLYGON EMPTY'),
(7, 'POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))');

SELECT 't3', ST_AsText(unnest(ST_ClusterWithin(geom, 1.4 ORDER BY id))) FROM cluster_inputs;

— failing mingw64 PostgreSQL using postgis mingw64 compiled binaries gives these notices

NOTICE:  mindist: 0, cxt tolerance: 1.38242e+306
NOTICE:  Performaing union
NOTICE:  mindist: 0, cxt tolerance: 1.38242e+306
NOTICE:  Performaing union
NOTICE:  mindist: 4.24264, cxt tolerance: 1.38242e+306
NOTICE:  Performaing union
NOTICE:  mindist: 7.07107, cxt tolerance: 1.38242e+306
NOTICE:  Performaing union

— passing VC++ build (against mingw64 compiled postgis binaries gives this

NOTICE:  mindist: 0, cxt tolerance: 1.4
NOTICE:  Performaing union
NOTICE:  mindist: 0, cxt tolerance: 1.4
NOTICE:  Performaing union
NOTICE:  mindist: 0, cxt tolerance: 1.4
NOTICE:  Performaing union
NOTICE:  mindist: 1.41421, cxt tolerance: 1.4
NOTICE:  mindist: 1.41421, cxt tolerance: 1.4

So the failing one always thinks 1.38242..+306 is bigger than mindist. I vaguely remember we having this problem before and pramsey or nicklas came up with a fix for it (some sort of macro as I recall). Maybe one of them can remember (added them to cc). I'll hunt in code base to see if I can bring that up.

comment:12 by dbaston, 9 years ago

Cc: dbaston added

OK, working backwards, next place I would check to see if the tolerance is correct is lwgeom_accum.c, line 133. If the value is correct here, then I would wonder if the Datum holding the tolerance is being freed before the value is actually read from it. Does this Datum need to be copied into a different MemoryContext?

comment:13 by dbaston, 9 years ago

Instead of passing p→data on lwgeom_accum.c, line 349, do we want to call datumCopy first?

comment:14 by pramsey, 9 years ago

I suppose trying is no harm… I wouldn't expect that memory to disappear or be over-written, as the execution isn't complete and the stack isn't going to be unwound on it… hm.

comment:15 by robe, 9 years ago

dbaston, pramsey - not sure how to get a read out from line 133. I tried doing this:

   POSTGIS_DEBUGF(2, "Datum value %g", PG_GETARG_DATUM(2));

and turning my postgresql clinet_min_messages to DEBUG2, but not seeing anything.

Last edited 9 years ago by robe (previous) (diff)

comment:16 by dbaston, 9 years ago

robe - I wonder if that's the problem - line 133 is never reached? Is PG_NARGS() not returning "3" ?

comment:17 by robe, 9 years ago

Hmm I thought I actually tried the binaries also on the VC++ instance which does give the right answer. Let me verify that (sometimes I get confused which instance I launched :) ). I'll also chech what PG_NARGS is returning.

comment:18 by robe, 9 years ago

finally got it - I think I just had to compile with enable-debug on.

this is what I get for the mingw postgres (which returns the wrong answer)

DEBUG:  [lwgeom_accum.c:pgis_geometry_accum_transfn:125] Number of args 3
DEBUG:  [lwgeom_accum.c:pgis_geometry_accum_transfn:134] Datum value 4.02286e-316

This is for the vc++ postgresql (which returns the right answer)

DEBUG:  [lwgeom_accum.c:pgis_geometry_accum_transfn:125] Number of args 3
DEBUG:  [lwgeom_accum.c:pgis_geometry_accum_transfn:134] Datum value 2.47697e-316

comment:19 by robe, 9 years ago

Okay on mingw after I change the debugf to

POSTGIS_DEBUGF(2, "Datum value %g", PG_GETARG_FLOAT8(2));

I get this:

DEBUG:  [lwgeom_accum.c:pgis_geometry_accum_transfn:134] Datum value 1.4

on my vc++ postgres compiled I get this which is really odd since this is the one giving the right answer (and yet I get a tolerance error)

DEBUG:  [lwgeom_accum.c:pgis_geometry_accum_transfn:134] Datum value 1.4


ERROR:  Tolerance not defined
********** Error **********

ERROR: Tolerance not defined
SQL state: XX000


comment:20 by robe, 9 years ago

Okeedoke - switching to:

POSTGIS_DEBUGF(2, "Datum value %g", DatumGetFloat8(p->data));

and putting that after the

p->data = PG_GETARG_DATUM(2);

Both give the 1.4 answer and VC++ no longer throws an error.

comment:21 by dbaston, 9 years ago

Looking through this more, I think that we need to copy the tolerance value into aggcontext. I'll put together a patch to do this.

comment:22 by dbaston, 9 years ago

OK robe, here is a patch that I hope may fix the issue:

https://patch-diff.githubusercontent.com/raw/postgis/postgis/pull/63.patch

comment:23 by robe, 9 years ago

Resolution: fixed
Status: newclosed

Yap this one does the trick. Committed at r14035. Though I forgot to retest on VC++, but winnie will do that.

comment:24 by robe, 9 years ago

vc++ still looks good too.

Note: See TracTickets for help on using tickets.