Opened 9 years ago

Closed 9 years ago

#2989 closed defect (fixed)

Debbie failing on interrupt test

Reported by: robe Owned by: strk
Priority: blocker Milestone: PostGIS 2.2.0
Component: postgis Version: master
Keywords: Cc:

Description

Though the failure is not consistent. For example her 9.2 was fine Same geos it seems

PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.4.5-8) 4.4.5, 64-bit
  Postgis 2.2.0dev - r13122 - 2014-11-10 11:31:08
  scripts 2.2.0dev r13122
  GEOS: 3.4.3dev-CAPI-1.8.3 r4029
  PROJ: Rel. 4.7.1, 23 September 2009

but her 9.1 and 9.3 failed with below

PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.4.5-8) 4.4.5, 64-bit
  Postgis 2.2.0dev - r13122 - 2014-11-10 11:37:15
  scripts 2.2.0dev r13122
  GEOS: 3.4.3dev-CAPI-1.8.3 r4029
  PROJ: Rel. 4.7.1, 23 September 2009

interrupt .. failed (diff expected obtained: /var/lib/jenkins/workspace/postgis/tmp/2_2_pg9.3w64/test_63_diff)
-----------------------------------------------------------------------------
--- interrupt_expected	2014-11-10 02:30:33.000000000 -0800
+++ /var/lib/jenkins/workspace/postgis/tmp/2_2_pg9.3w64/test_63_out	2014-11-10 03:36:40.000000000 -0800
@@ -1,4 +1,4 @@
 NOTICE:  liblwgeom code interrupted
 ERROR:  canceling statement due to statement timeout
-segmentize interrupted on time
+segmentize interrupted late: 00:00:00.193152
 1|LINESTRING(0 0,2 0,4 0)
-----------------------------------------------------------------------------
PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.4.5-8) 4.4.5, 64-bit
  Postgis 2.2.0dev - r13122 - 2014-11-10 11:35:05
  scripts 2.2.0dev r13122
  GEOS: 3.4.3dev-CAPI-1.8.3 r4029
  PROJ: Rel. 4.7.1, 23 September 2009

 interrupt .. failed (diff expected obtained: /var/lib/jenkins/workspace/postgis/tmp/2_2_pg9.3w64/test_63_diff)
-----------------------------------------------------------------------------
--- interrupt_expected	2014-11-10 02:30:33.000000000 -0800
+++ /var/lib/jenkins/workspace/postgis/tmp/2_2_pg9.3w64/test_63_out	2014-11-10 03:36:40.000000000 -0800
@@ -1,4 +1,4 @@
 NOTICE:  liblwgeom code interrupted
 ERROR:  canceling statement due to statement timeout
-segmentize interrupted on time
+segmentize interrupted late: 00:00:00.193152
 1|LINESTRING(0 0,2 0,4 0)
-----------------------------------------------------------------------------

Change History (12)

comment:1 by robe, 9 years ago

Owner: changed from pramsey to strk

comment:2 by robe, 9 years ago

winnie is failing too on interrupt. She's running against 3.5. Hers might be slightly out of date 3.5 since I think she was trying to run PostGIS and 3.5 build at same time. So I'll rerun hers after she is done rebuilding 3.5

FWIW her error on 11/6 was:

PostgreSQL 9.4beta3, compiled by Visual C++ build 1800, 64-bit
  Postgis 2.2.0dev
 - r13121
 - 2014-11-06 18:50:05
  scripts 2.2.0dev r13121
  GEOS: 3.5.0dev-CAPI-1.9.0 r4028
  PROJ: Rel. 4.8.0, 6 March 2012
  SFCGAL: 1.0.4

 interrupt .. failed (diff expected obtained: /projects/postgis/tmp/2.2.0dev_pg9.4_geos3.5.0dev_gdal1.11.1w64/test_63_diff)
-----------------------------------------------------------------------------
--- interrupt_expected	2014-11-06 13:48:16 -0500
+++ /projects/postgis/tmp/2.2.0dev_pg9.4_geos3.5.0dev_gdal1.11.1w64/test_63_out	2014-11-06 13:57:35 -0500
@@ -1,5 +1,5 @@
 ERROR:  canceling statement due to statement timeout
-buffer interrupted on time
+buffer interrupted late: 00:00:00.25
 NOTICE:  liblwgeom code interrupted
 ERROR:  canceling statement due to statement timeout
 segmentize interrupted on time
-----------------------------------------------------------------------------

comment:3 by robe, 9 years ago

and yet a hitrd permutation from debbie:

PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.4.5-8) 4.4.5, 64-bit
  Postgis 2.2.0dev - r13122 - 2014-11-10 11:37:15
  scripts 2.2.0dev r13122
  GEOS: 3.4.3dev-CAPI-1.8.3 r4029
  PROJ: Rel. 4.7.1, 23 September 2009

interrupt_relate .. failed (diff expected obtained: /var/lib/jenkins/workspace/postgis/tmp/2_2_pg9.1w64/test_98_diff)
-----------------------------------------------------------------------------
--- interrupt_relate_expected	2014-11-10 02:30:33.000000000 -0800
+++ /var/lib/jenkins/workspace/postgis/tmp/2_2_pg9.1w64/test_98_out	2014-11-10 03:39:34.000000000 -0800
@@ -1,5 +1,5 @@
 ERROR:  canceling statement due to statement timeout
-contains interrupted on time
+contains interrupted late: 00:00:00.209518
 ERROR:  canceling statement due to statement timeout
 covers interrupted on time
 ERROR:  canceling statement due to statement timeout
-----------------------------------------------------------------------------

comment:4 by strk, 9 years ago

Status: newassigned

These are all little timing issues, those bots evidently being slower than expected. It looks like tolerating 250ms would fix all cases reported so far. Maybe a better approach would be making the tolerance dependent on the time it takes for a reference call, like st_npoints. That way we should account for machine-specific speed (would still be possibly affected by machine _state_, but that's harder to control).

I'm on it.

comment:5 by strk, 9 years ago

I found all the cases had in common to be the first one running, while the timer started before creating the input tables. Resetting the timer right before execution of the test should be more stable. Committed as r13124. Please close if that's enough.

comment:6 by strk, 9 years ago

Sorry, r13124 will not pass make check due to unrelated change in r13123. Things should be fine with r13125.

comment:7 by robe, 9 years ago

Resolution: fixed
Status: assignedclosed

Debbie seems to be good. Winnie is still complaining but her current issue seems to have nothing to do with this (and she did run a successful run just before your very last commit). Rerunning hers to see if issue goes away, which I think it will. Closing this out for now.

comment:8 by robe, 9 years ago

Resolution: fixed
Status: closedreopened

comment:9 by robe, 9 years ago

strk sorry. Although debbie complains much less often she still complains about this every so often like in her GDAL Postgres runs and her PostgreSQL Dev weekly runs. In those she may be more taxed because she recompiles GDAL and recompiles PostgreSQL and then does the tests.

Here is what her 9.5 development run looked like:

 interrupt .. failed (diff expected obtained: /var/lib/jenkins/workspace/postgis/regress_pgdev/tmp/2_2_pg9.5w64/test_63_diff)
-----------------------------------------------------------------------------
--- interrupt_expected	2014-11-15 20:10:13.000000000 -0800
+++ /var/lib/jenkins/workspace/postgis/regress_pgdev/tmp/2_2_pg9.5w64/test_63_out	2014-11-15 20:16:18.000000000 -0800
@@ -1,4 +1,4 @@
 NOTICE:  liblwgeom code interrupted
 ERROR:  canceling statement due to statement timeout
-segmentize interrupted on time
+segmentize interrupted late: 00:00:00.188626
 1|LINESTRING(0 0,2 0,4 0)
-----------------------------------------------------------------------------

comment:10 by strk, 9 years ago

argh, sounds like a machine-dependent timing might be needed, even if still not infallible (is that an english word?)

comment:11 by strk, 9 years ago

For now I raised the tolerated delay to 250ms. r13233 in trunk (2.2.0dev)

comment:12 by robe, 9 years ago

Resolution: fixed
Status: reopenedclosed

She still fails sometimes but much much less and usually only when she's doing some intensive multi-tasking so closing this out.

Note: See TracTickets for help on using tickets.