Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#1640 closed defect (fixed)

Segfault on hard-upgrade

Reported by: strk Owned by: strk
Priority: blocker Milestone: PostGIS 2.0.0
Component: postgis Version: master
Keywords: Cc:

Description (last modified by strk)

I gave hard-upgrade a try. Target: 2.0.0beta1. Source: unknown (a beast incrementally upgraded with minor_upgrade).

Well, outcome has a couple of issues:

  1. Full of WARNING: SRID 0101010000031746 converted to 999888 (in reserved zone)
  2. psql log end with server closed the connection unexpectedly

I'm cheasing it

Attachments (2)

segfault_min.sql (1.1 KB ) - added by strk 12 years ago.
minimal SQL triggering the crash
crash.sh (306 bytes ) - added by strk 12 years ago.
Script to reproduce the crash — needs postgis-1.5 being installed too

Download all attachments as: .zip

Change History (22)

comment:1 by strk, 12 years ago

Description: modified (diff)

comment:2 by strk, 12 years ago

So SRID handling (thus the WARNING) was fixed with r9361. Segfault still happens.

comment:3 by strk, 12 years ago

I suspect the segfault is due to non-skipping core I/O functions (typmod etc):

KEEP: FUNCTION geography_recv(internal, oid, integer)
KEEP: FUNCTION geography_send(geography)
KEEP: FUNCTION geography_typmod_dims(integer)
KEEP: FUNCTION geography_typmod_srid(integer)
KEEP: FUNCTION geography_typmod_type(integer)

by strk, 12 years ago

Attachment: segfault_min.sql added

minimal SQL triggering the crash

comment:4 by strk, 12 years ago

The attached file triggers a segfault for me when loaded into a just-enabled spatial database (with 2.0.0beta1).

comment:5 by strk, 12 years ago

Actually, even smaller SQL:

CREATE FUNCTION geography_typmod_dims(integer) RETURNS integer
    LANGUAGE c IMMUTABLE STRICT
    AS '$libdir/postgis-1.5', 'geography_typmod_dims';

CREATE TABLE face ( face_id integer, mbr geometry );

COPY face (face_id, mbr) FROM stdin;
643 0103000020BB0B00000100000005000000CDCCCC8CAA443941A4703DAA9C795241CDCCCC8CAA443941CDCCCC0CB5795241D8BF461CF1443941CDCCCC0CB5795241D8BF461CF1443941A4703DAA9C795241CDCCCC8CAA443941A4703DAA9C795241
\.

I don't get how geography_typmod_dims is involved into this, given there are NO traces of geographies!

comment:6 by strk, 12 years ago

Smaller!

CREATE FUNCTION geography_typmod_dims(integer) RETURNS integer
    LANGUAGE c IMMUTABLE STRICT
    AS '$libdir/postgis-1.5', 'geography_typmod_dims';

SELECT 'SRID=3003;POINT(0 0)'::geometry;

comment:7 by nicklas, 12 years ago

Is the crash depending on that 1.5 binaries is present or absent? Or is the crash totally 2.0 code causing it

Just a thought

comment:8 by strk, 12 years ago

The library is there. If I drop it the segfault goes away so this is clearly a function in postgis-1.5 library segfaulting. It's puzzling that calling the offending select again a second time doesn't segfault. But I guess that's what you get when you access arbitrary memory. I'm lucky to get a segfault first time.

I'll fix the upgrade script to properly skip those functions but still I'm curious about _why_ geography_typmod_dims is triggered at all. Maybe we're also failing to drop a TYPE or we reference geography_typmod_dims from the GEOMETRY type definition

comment:9 by strk, 12 years ago

No hits of the function from new scripts:

[strk:postgis(svn-trunk)] grep -i geography_typmod_dims postgis/postgis.sql
[strk:postgis(svn-trunk)] grep -i geography_typmod_dims topology/topology.sql
[strk:postgis(svn-trunk)] grep -i geography_typmod_dims raster/rt_pg/rtpostgis.sql
[strk:postgis(svn-trunk)]

Big dragons here

comment:10 by nicklas, 12 years ago

Do you create a new database when you narrowed the problematic part of the code?

If not, is it possible that there is some other 1.5 functions that is still there somewhere.

I guess all functions from an old dump that is not a part of 2.0 will actually be installed. We have no function to stop the removed functions froom getting installed from a 1.5 dump, do we?

by strk, 12 years ago

Attachment: crash.sh added

Script to reproduce the crash — needs postgis-1.5 being installed too

comment:11 by strk, 12 years ago

Nicklas, see the script I just attached. To be run from postgis top sourcedir. Needs postgis-1.5 installed. Can you reproduce ? It's 100% reproducible here.

comment:12 by strk, 12 years ago

Valgrind say on the matter:

==15124== Invalid write of size 8
==15124==    at 0x13587D97: ptarray_set_point4d (string3.h:52)
==15124==    by 0x13586FF4: ptarray_insert_point (ptarray.c:129)
==15124==    by 0x135871F5: ptarray_append_point (ptarray.c:166)
==15124==    by 0x13597174: wkt_parser_ptarray_add_coord (lwin_wkt.c:269)
==15124==    by 0x13592D1A: wkt_yyparse (lwin_wkt_parse.y:514)
==15124==    by 0x1359414D: lwgeom_parse_wkt (lwin_wkt_parse.y:65)
==15124==    by 0x135612E5: LWGEOM_in (lwgeom_inout.c:107)
==15124==    by 0x3F1A2B: InputFunctionCall (in /usr/lib/postgresql/8.4/bin/postgres)
==15124==    by 0x3F2569: OidInputFunctionCall (in /usr/lib/postgresql/8.4/bin/postgres)
==15124==    by 0x229E55: coerce_type (in /usr/lib/postgresql/8.4/bin/postgres)
==15124==    by 0x22A539: coerce_to_target_type (in /usr/lib/postgresql/8.4/bin/postgres)
==15124==    by 0x21DEF8: ??? (in /usr/lib/postgresql/8.4/bin/postgres)

Still not sure it's related to 1.5 at all…

comment:13 by nicklas, 12 years ago

I don't have 1.5 at the same server. I can install it later …

comment:14 by strk, 12 years ago

The line numbers in ptarray.c reflect the one from postgis-2.0 lib, not 1.5 But if I drop the geography_typmod_dims I still get no crash %-/

comment:15 by strk, 12 years ago

I wonder if this is a bug in PostgreSQL itself (8.4.10)

comment:16 by strk, 12 years ago

Well, I confirm it is fully within 2.0 library:

ptarray_insert_point enter: p=0x7feffc9a0
ptarray_set_point4d: destination pointer:(nil)

I'll be trying to further reduce the test and hopefully get pgsql out of our way. Thanks spurious 1.5 library for being instrumental in catching this.

comment:17 by strk, 12 years ago

Owner: changed from pramsey to strk
Status: newassigned

Bingo, ptarray_set_point4d doesn't check that the pointarray contains the referenced point. Existing test test_lwprint_default_format would trigger an assertion about that.

comment:18 by strk, 12 years ago

Magic, after adding assertions in the code I can't reproduce anymore. Easy come easy go.

comment:19 by strk, 12 years ago

Resolution: fixed
Status: assignedclosed

Well, keeping this open won't help anyway…

comment:20 by strk, 12 years ago

for the record: in r9365 I've drop the old signatures

Note: See TracTickets for help on using tickets.