Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#3226 closed defect (fixed)

interrupt relate regress fails on equals

Reported by: robe
Owned by: strk
Priority: medium
Milestone: PostGIS 2.2.0
Component: postgis
Version: master
Keywords:
Cc:

Description

 interrupt_relate .. failed (diff expected obtained: /var/lib/jenkins/workspace/postgis/regress_pgdev/tmp/2_2_pg9.5w64/test_105_diff)
-----------------------------------------------------------------------------
--- interrupt_relate_expected	2014-11-15 20:10:13.000000000 -0800
+++ /var/lib/jenkins/workspace/postgis/regress_pgdev/tmp/2_2_pg9.5w64/test_105_out	2015-07-31 16:49:09.000000000 -0700
@@ -6,7 +6,7 @@
 coveredby interrupted on time
 ERROR:  canceling statement due to statement timeout
 crosses interrupted on time
-ERROR:  canceling statement due to statement timeout
+t
 crosses interrupted on time
 ERROR:  canceling statement due to statement timeout
 crosses interrupted on time
-----------------------------------------------------------------------------

I accidentally left the interrupt tests enabled on the run that debbie does to test PostgreSQL Git changes. It looks like something changed in the output (not sure what that extra "t" is for), and that is causing interrupt_relate to fail consistently.

This started failing consistently on July 31st. I can't see anything that we changed, or that changed in PostgreSQL, that would cause this, so I'm a bit puzzled.

The PostgreSQL commit that triggered the run was this:

http://debbie.postgis.net:8080/job/PG_Version_9.5_Git/77/changes#detail0

which corresponds to this:

https://github.com/postgres/postgres/commit/edf26ed033f18bddc9bfe5c239388330150766a1

but in that period there were also a ton of PostGIS changes, up through r13865.

Change History (9)

comment:1 by strk, 7 years ago

I'm actually seeing this in PostgreSQL 9.3.6 too, on Ubuntu. Something is going wrong with the handling of interrupts or GEOS exceptions. It sounds like a recent regression; could it be due to the bounding box short-cut?

comment:2 by strk, 7 years ago

PS: better not to disable this on the bots, as it might actually be revealing a real bug here (#3214).

comment:3 by strk, 7 years ago

The full environment of my failure:

PostgreSQL 9.3.6 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
  Postgis 2.2.0dev - r13868 - 2015-07-30 09:53:31
  scripts 2.2.0dev r13868
  GEOS: 3.5.0dev-CAPI-1.9.0 r4059
  PROJ: Rel. 4.8.0, 6 March 2012

comment:4 by strk, 7 years ago

Summary: changed from "interrupt relate regress fails on PostgreSQL 9.5" to "interrupt relate regress fails on equals"

After fixing the test output labels with r13870 I can tell that the failure is with ST_Equals, and not the others.

comment:5 by strk, 7 years ago

Reverting r13864 (the short-circuited ST_Equals) seems to fix this. See #3223.

comment:6 by strk, 7 years ago

Alright, I see: the problem is that the interruption timer kicks in _before_ GEOS is even invoked (i.e. while the short-circuit is doing the memcmp operation).

We might change the test so that the short-circuit alone is not enough to return. Still, if the timer kicks in before GEOS is involved, the test might not effectively test the interruptibility of GEOS (for example, if in the future we add a vacuum_delay_point right after the short-circuit test).
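To illustrate the point about the short-circuit, here is a minimal Python sketch, not the PostGIS code. The names equals_with_shortcut and fake_geos_equals are hypothetical, and a bytes comparison stands in for the memcmp on the serialized geometries. It only shows that byte-identical inputs never reach the slow path, which is the part the interrupt test is meant to exercise.

```python
def equals_with_shortcut(a: bytes, b: bytes, slow_equals) -> bool:
    """Hypothetical stand-in for a short-circuited equality test."""
    # Fast path: byte-identical serializations are equal by definition,
    # so the expensive comparison is skipped entirely.
    if a == b:
        return True
    # Slow path: stand-in for the GEOS call, which is where the
    # interrupt test expects cancellation to be observed.
    return slow_equals(a, b)


calls = []


def fake_geos_equals(a: bytes, b: bytes) -> bool:
    # Records each invocation so we can see whether the slow path ran.
    calls.append((a, b))
    return sorted(a) == sorted(b)


same = bytes(range(8))

# Identical bytes: the shortcut answers and the slow path is never called.
identical_result = equals_with_shortcut(same, bytes(same), fake_geos_equals)
calls_after_identical = len(calls)

# Different bytes: the shortcut cannot answer, so the slow path runs.
reordered_result = equals_with_shortcut(same, same[::-1], fake_geos_equals)
calls_after_reordered = len(calls)
```

With identical operands the test only ever measures the fast path, so whatever it reports says nothing about whether the slow path can be interrupted.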

comment:7 by strk, 7 years ago

On a related note, the timer kicking in before GEOS gets there means it takes more than 100ms to do the memcmp for that "big" input (st_memsize=28832).

comment:8 by strk, 7 years ago

Resolution: set to fixed
Status: changed from new to closed

With r13871 the short-circuit is avoided by st_reverse'ing one of the operands. I'll call that enough to close this.
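A rough Python sketch of why reversing one operand defeats a byte-wise shortcut. The flat encoding in serialize and the edges helper are assumptions made for illustration and do not reflect the actual PostGIS serialization. Reversing the vertex order changes the bytes while the ring still describes the same shape, so a byte comparison can no longer answer the question on its own.

```python
import struct


def serialize(points):
    # Hypothetical flat encoding: little-endian doubles x0, y0, x1, y1, ...
    return b"".join(struct.pack("<2d", x, y) for x, y in points)


def edges(points):
    # Undirected edge set of the ring, independent of traversal direction.
    return {frozenset((p, q)) for p, q in zip(points, points[1:])}


# A closed triangular ring and the same ring traversed in reverse.
ring = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0)]
reversed_ring = ring[::-1]

# The serialized bytes differ, so a byte-wise comparison reports "not identical".
bytes_differ = serialize(ring) != serialize(reversed_ring)

# The undirected edges are the same, so the two rings are still the same shape.
same_shape = edges(ring) == edges(reversed_ring)
```

The real comparison therefore still has to run on inputs that are equal as geometries, which is what the interrupt test needs.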

comment:9 by robe, 7 years ago

I'll leave it on for the 9.5 change (git runs), and we can have it for the winnie git runs, but I'll keep it off for the regular regress. It's the only way to keep the peace between you and pramsey until you make the interrupt tests less false positivey :).
