Opened 3 years ago

Closed 3 years ago

#3609 closed defect (fixed)

debbie crashing on PostgreSQL 9.6 builds

Reported by: robe
Owned by: robe
Priority: blocker
Milestone: PostGIS PostgreSQL
Component: build/upgrade/install
Version: trunk
Keywords: postgresql 9.6
Cc:

Description

I thought this was just a fluke, but it's been happening for the past couple of runs. Given that winnie is doing okay and she builds against PostgreSQL 9.6beta3, I suspect something changed recently in PostgreSQL 9.6, since debbie builds against PostgreSQL 9.6 head.

-- the latest failure is this: https://debbie.postgis.net/job/PostGIS_Regress/4232/consoleFull

CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE TYPE
CREATE FUNCTION
CREATE CAST
CREATE FUNCTION
CREATE FUNCTION
CREATE CAST
CREATE CAST
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
psql:/var/lib/jenkins/workspace/postgis/branches/2.3/regress/00-regress-install/share/contrib/postgis/postgis.sql:5636: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
psql:/var/lib/jenkins/workspace/postgis/branches/2.3/regress/00-regress-install/share/contrib/postgis/postgis.sql:5636: connection to server was lost
-----------------------------------------------------------------------------

Change History (10)

comment:1 Changed 3 years ago by robe

Looking further at the failures, the builds started failing after these changes in the PostgreSQL 9.6 repo:

    Add missing casts in information schema (Commit 6a9e09c49e1405c47b0870de73fec5748302f92d by peter_e, from Clément Prévost)

    Do not let PostmasterContext survive into background workers. (Commit ef1b5af82339a49564037be656a3ff657fb2a246 by Tom Lane)

    Make INSERT-from-multiple-VALUES-rows handle targetlist indirection (Commit a3c7a993d5eb29df4d33075b83c75ae25f257897 by Tom Lane)

    Prevent "snapshot too old" from trying to return pruned TOAST tuples. (Commit 3e2f3c2e423b3ae906668c186bac79522b8e3e29 by rhaas)

    doc: Move indexterms to avoid whitespace issue in man pages (Commit 81568a971f2634bc447af2788eafee899f2db2a1 by peter_e)

The last one we can rule out since it's just a doc change.


comment:2 Changed 3 years ago by robe

Component: changed from buildbots to build/upgrade/install
Keywords: postgresql 9.6 added
Owner: changed from robe to strk

comment:3 Changed 3 years ago by robe

Owner: changed from strk to robe

comment:4 Changed 3 years ago by robe

I built the latest PostgreSQL git branch under mingw and was also able to reproduce the crash.

Log shows this:

DETAIL:  Failed process was running: CREATE OR REPLACE VIEW geography_columns AS
		SELECT
			current_database() AS f_table_catalog, 
			n.nspname AS f_table_schema, 
			c.relname AS f_table_name, 
			a.attname AS f_geography_column,
			postgis_typmod_dims(a.atttypmod) AS coord_dimension,
			postgis_typmod_srid(a.atttypmod) AS srid,
			postgis_typmod_type(a.atttypmod) AS type
		FROM 
			pg_class c, 
			pg_attribute a, 
			pg_type t, 
			pg_namespace n
		WHERE t.typname = 'geography'
	        AND a.attisdropped = false
	        AND a.atttypid = t.oid
	        AND a.attrelid = c.oid
	        AND c.relnamespace = n.oid
	        AND NOT pg_is_other_temp_schema(c.relnamespace)
	        AND has_table_privilege( c.oid, 'SELECT'::text );
HINT:  See C include file "ntstatus.h" for a description of the hexadecimal value.
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

What's interesting is that the crash doesn't happen when installing via extension. Only the old-fashioned make check triggers it.

-- so this works
make check RUNTESTFLAGS=--extension

-- so this crashes
make check
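
To compare the two install paths on the same checkout, a small wrapper along these lines could run each variant and report its exit status. This is only a sketch: the `run_variant` helper and its labels are hypothetical (not part of PostGIS), and it assumes it is sourced from the top of a configured PostGIS source tree and that make exits non-zero when the regression run fails.

```shell
# Hypothetical helper, not part of the PostGIS build system.
# Usage: run_variant LABEL COMMAND [ARGS...]
# Runs COMMAND quietly and prints whether it exited with status 0.
run_variant() {
    label=$1
    shift
    status=0
    # Capture the exit status without aborting under "set -e".
    "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 0 ]; then
        echo "$label: ok"
    else
        echo "$label: FAILED (exit $status)"
    fi
}
```

With that defined, `run_variant extension make check RUNTESTFLAGS=--extension` followed by `run_variant script make check` would print one line per variant. It only looks at the exit status, so it cannot tell a backend crash from an ordinary test failure; the server log is still needed for that.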


I think the culprit might be:


{{{
Prevent "snapshot too old" from trying to return pruned TOAST tuples. (Commit 3e2f3c2e423b3ae906668c186bac79522b8e3e29 by rhaas)
}}}

I say that because the run that included the other likely cause, Tom Lane's commit, was successful. That might have been a fluke, though, which is why I haven't ruled it out.

comment:5 Changed 3 years ago by robe

Backtrace of the error:

Reading symbols from C:\ming64gcc48\projects\postgresql\rel\pg9.6w64gcc48\bin\postgres.exe...done.
0x00000000777cae11 in ntdll!DbgBreakPoint ()
   from C:\Windows\SYSTEM32\ntdll.dll
(gdb) cont
Continuing.
[Thread 27196.0x7f9c exited with code 0]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 27196.0x7198]
GetOldestSnapshot () at snapmgr.c:422
422                     return OldestActiveSnapshot->as_snap;
(gdb) bt
#0  GetOldestSnapshot () at snapmgr.c:422
#1  0x000000000044efdd in init_toast_snapshot (toast_snapshot=0x299e9b0)
    at tuptoaster.c:2314
#2  0x000000000044f123 in toast_fetch_datum (attr=<optimized out>)
    at tuptoaster.c:1869
#3  0x00000000004508c6 in heap_tuple_untoast_attr (attr=0x59c04ca)
    at tuptoaster.c:179
#4  0x00000000007bc9da in pg_detoast_datum_packed (datum=<optimized out>)
    at fmgr.c:2266
#5  0x0000000000790620 in text_to_cstring (t=0x59c04ca) at varlena.c:186
#6  0x00000000007a3a95 in RelationBuildRuleLock (
    relation=relation@entry=0xe7dcd48) at relcache.c:732
#7  0x00000000007a621a in RelationBuildDesc (
    targetRelId=targetRelId@entry=58124, insertIt=insertIt@entry=0 '\000')
    at relcache.c:1035
#8  0x00000000007a6625 in RelationClearRelation (
    relation=relation@entry=0xe7dc4d0, rebuild=rebuild@entry=1 '\001')
    at relcache.c:2218
#9  0x00000000007a70b8 in RelationFlushRelation (relation=0xe7dc4d0)
    at relcache.c:2330
#10 RelationCacheInvalidateEntry (relationId=58124) at relcache.c:2392
#11 0x00000000007a075e in LocalExecuteInvalidationMessage (
    msg=msg@entry=0xe707990) at inval.c:568
#12 0x00000000007a0876 in ProcessInvalidationMessages (hdr=0x4e13088,
    func=0x7a0650 <LocalExecuteInvalidationMessage>) at inval.c:444
#13 0x00000000007a08a9 in CommandEndInvalidationMessages () at inval.c:1056
#14 0x0000000000489486 in AtCCI_LocalCache () at xact.c:1377
#15 CommandCounterIncrement () at xact.c:956
#16 0x00000000006b0d22 in exec_simple_query (
    query_string=0x100000000000000 <error: Cannot access memory at address 0x100000000000000>) at postgres.c:1133
#17 PostgresMain (argc=<optimized out>, argv=argv@entry=0x2495c8,
    dbname=0x18001700160015 <error: Cannot access memory at address 0x18001700160015>, username=<optimized out>) at postgres.c:4074
#18 0x000000000064794d in BackendRun (port=0x299f400) at postmaster.c:4262
#19 SubPostmasterMain (argc=argc@entry=3, argv=argv@entry=0x2b7e80)
    at postmaster.c:4752
#20 0x0000000000803ac8 in main (argc=3, argv=0x2b7e80) at main.c:205

comment:7 Changed 3 years ago by robe

Milestone: changed from PostGIS 2.3.0 to PostGIS PostgreSQL
Resolution: set to fixed
Status: changed from new to closed

Tom Lane reports this should be fixed now. I will rerun to confirm and reopen if it isn't.

comment:8 Changed 3 years ago by robe

Well, it's crashing somewhere else now, so I think it's a different issue. The original might have been fixed, but now it's segfaulting everywhere, particularly in bitmap index scans.

https://debbie.postgis.net/job/PostGIS_Regress/4242/consoleFull

-225|BOX(-6 1,3 8)
-226|t
+Segmentation fault
-----------------------------------------------------------------------------
 regress_bdpoly .. ok 
 regress_index .. failed (diff expected obtained: /var/lib/jenkins/workspace/postgis/tmp/2_3_pg9.6w64/test_60_diff)
-----------------------------------------------------------------------------
--- regress_index_expected	2016-05-11 14:35:41.335696873 +0000
+++ /var/lib/jenkins/workspace/postgis/tmp/2_3_pg9.6w64/test_60_out	2016-08-08 21:28:51.192302858 +0000
@@ -1,16 +1 @@
-scan_idx|Seq Scan
-2594|POINT(130.504303 126.53112)
-3618|POINT(130.447205 131.655289)
-7245|POINT(128.10466 130.94133)
-scan_seq|Index Scan
-2594|POINT(130.504303 126.53112)
-3618|POINT(130.447205 131.655289)
-7245|POINT(128.10466 130.94133)
-3+=5:true
-924+=60:true
-12621+=500:true
-50000+=600:true
-expr|3+=5:true
-expr|924+=60:true
-expr|12621+=500:true
-expr|50000+=600:true
+Segmentation fault
-----------------------------------------------------------------------------
 regress_index_nulls .. failed (diff expected obtained: /var/lib/jenkins/workspace/postgis/tmp/2_3_pg9.6w64/test_61_diff)
-----------------------------------------------------------------------------
--- regress_index_nulls_expected	2016-05-11 14:35:41.475696901 +0000
+++ /var/lib/jenkins/workspace/postgis/tmp/2_3_pg9.6w64/test_61_out	2016-08-08 21:28:51.212302024 +0000
@@ -1,4 +1 @@
-NOTICE:  table "indexnulls" does not exist, skipping
-NOTICE:  table "indexnulls" does not exist, skipping
-NOTICE:  table "indexempty" does not exist, skipping
-NOTICE:  table "indexempty" does not exist, skipping
+Segmentation fault
-----------------------------------------------------------------------------
 regress_management .. failed (diff expected obtained: /var/lib/jenkins/workspace/postgis/tmp/2_3_pg9.6w64/test_62_diff)
-----------------------------------------------------------------------------
--- regress_management_expected	2016-05-11 14:35:40.723696753 +0000
+++ /var/lib/jenkins/workspace/postgis/tmp/2_3_pg9.6w64/test_62_out	2016-08-08 21:28:51.228301357 +0000
@@ -1,3 +1 @@
-1
-The result: public.test_pt dropped.
-Unexistant: public.unexistent dropped.
+Segmentation fault


Even the build of PostgreSQL 9.6 is segfaulting.

comment:9 Changed 3 years ago by robe

Resolution: fixed
Status: changed from closed to reopened

comment:10 Changed 3 years ago by robe

Resolution: set to fixed
Status: changed from reopened to closed

Purging the source folder and re-pulling from git.postgresql.org seems to have fixed this issue.
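
The purge-and-reclone step could look roughly like the sketch below. The `fresh_checkout` and `run_or_echo` helpers and the `DRY_RUN` switch are hypothetical, and the ticket does not give the exact repository URL, branch, or folder used on debbie, so those are left as parameters.

```shell
# Hypothetical helpers; the names and the DRY_RUN switch are assumptions.

# Print the command when DRY_RUN=1, otherwise execute it.
run_or_echo() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: fresh_checkout REPO_URL BRANCH DEST_DIR
# Deletes DEST_DIR entirely, then clones BRANCH of REPO_URL into it again.
fresh_checkout() {
    if [ "$#" -ne 3 ] || [ -z "$3" ]; then
        echo "usage: fresh_checkout REPO_URL BRANCH DEST_DIR" >&2
        return 2
    fi
    run_or_echo rm -rf -- "$3" &&
        run_or_echo git clone --branch "$2" "$1" "$3"
}
```

Running it first with `DRY_RUN=1` prints the two commands without touching the disk, which is a reasonable check before letting `rm -rf` loose on a build workspace.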
