#4691 closed defect (fixed)

Segfault when creating an index on geography(Point, 4326) column that contains empty points

Reported by: aktiur Owned by: Algunenano
Priority: blocker Milestone: PostGIS 2.5.5
Component: postgis Version: 2.5.x
Keywords: Cc:

Description

When creating a gist index on a geography(Point, 4326) column that happens to contain empty points (with coordinates of [nan, nan]), PostGIS will sometimes encounter a segmentation fault when freeing some memory.

This is not a fully deterministic error. I'm joining a dataset with 11000 points (of which 1000 are empty, the other 10000 are gaussian distributed around the center of France) and a script that imports it that seems to trigger that bug most of the time, but in some circumstances there is no segfault and I have to DROP the table and try again from the beginning to trigger it again (truncating is not enough).

I did not manage to trigger the bug with a smaller dataset with only 1100 points.

I also noticed replacing the column type with geometry does not seem to trigger the segfault, but I did not try with a bigger dataset.

I also joined a full stacktrace.

crash_test=# SELECT version();
                                                    version
---------------------------------------------------------------------------------------------------------------
 PostgreSQL 12.2 (Ubuntu 12.2-4) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-8ubuntu1) 9.3.0, 64-bit
(1 row)

crash_test=# SELECT postgis_full_version();
                                                                        postgis_full_version
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 POSTGIS="3.0.0 r17983" [EXTENSION] PGSQL="120" GEOS="3.8.0-CAPI-1.13.1 " PROJ="6.3.1" LIBXML="2.9.4" LIBJSON="0.13.1" LIBPROTOBUF="1.3.3" WAGYU="0.4.3 (Internal)"
(1 row)

I'm using PostGIS 3 but it seems to be the same bug as this Postgresql ticket #16283 where the reporter was using PostGIS 2.5.

Attachments (3)

stacktrace.txt (20.8 KB) - added by aktiur 17 months ago.
Full stacktrace at segfault
data.txt (469.7 KB) - added by aktiur 17 months ago.
Sample dataset (11,000 points including 1,000 empty points)
script.sql (241 bytes) - added by aktiur 17 months ago.
Sample script to trigger the segfault

Download all attachments as: .zip

Change History (13)

Changed 17 months ago by aktiur

Attachment: stacktrace.txt added

Full stacktrace at segfault

Changed 17 months ago by aktiur

Attachment: data.txt added

Sample dataset (11,000 points including 1,000 empty points)

Changed 17 months ago by aktiur

Attachment: script.sql added

Sample script to trigger the segfault

comment:1 Changed 17 months ago by aktiur

Here is the shell output when running the script

root@agir-test:~/postgresql# sudo -u postgres psql crash_test < script.sql
could not change directory to "/root/postgresql": Permission denied
NOTICE:  extension "postgis" already exists, skipping
CREATE EXTENSION
CREATE TABLE
COPY 11000
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost

comment:2 Changed 17 months ago by Algunenano

Milestone: PostGIS 2.5.5
Priority: lowcritical

I can reproduce it in master using the instructions attached. It doesn't always happen, but creating and dropping the index several times leads to the crash eventually.

comment:3 in reply to:  2 Changed 17 months ago by aktiur

Replying to Algunenano:

I can reproduce it in master using the instructions attached. It doesn't always happen, but creating and dropping the index several times leads to the crash eventually.

It seems the bigger the table, the higher the chance it happens. I guess importing the dataset several times increase the chances, in case it is hard to reproduce.

comment:4 Changed 16 months ago by robe

Priority: criticalblocker

comment:5 Changed 15 months ago by Algunenano

Owner: changed from pramsey to Algunenano

I think I have found the bug: gidx_merge seems to keep ownership of the new gidx pointer when the old one is unknown / empty instead of copying it. When this happens multiple times, the pointer might be freed several times in gserialized_gist_picksplit_addlist.

This seems to fix it: https://github.com/Algunenano/postgis/commit/c35256b1d41852f231451cd3bf3bf72a1dbeaa1d

I'm going to try to create a regress test for it, although it might be complicated since it seems like a heisenbug. I think it might affect all stable releases, but it was made more likely due to the changes in #4139

comment:6 Changed 15 months ago by Algunenano

Looking at Postgresql source code it appears that GIST uses random() in some cases to decide in which bucket to insert a tupe (when several of them are equally good) so finding a use minimal test case is going to be extremely hard.

I'm going to retest the patch for a while to confirm I don't see any crash / invalid reads and push it that way.

BTW, under valgrind it looks like this:

 2020-07-29 18:21:34.532 CEST [205377] [raul @ postgis_crash] LOG: statement: CREATE INDEX crash_test_index ON crash_test USING gist (point);
 Invalid read of size 8
 at 0x6946AB: pfree (mcxt.c:1035)
 by 0x8525833: gserialized_gist_picksplit (gserialized_gist_nd.c:0)
 by 0x66B188: FunctionCall2Coll (fmgr.c:1162)
 by 0x211D27: gistUserPicksplit (gistsplit.c:433)
 by 0x2113F9: gistSplitByKey (gistsplit.c:697)
 by 0x208BE8: gistSplit (gist.c:1451)
 by 0x2081E0: gistplacetopage (gist.c:299)
 by 0x207DFA: gistinserttuples (gist.c:1269)
 by 0x207DFA: gistinserttuple (gist.c:1222)
 by 0x207DFA: gistdoinsert (gist.c:876)
 by 0x212E84: gistBuildCallback (gistbuild.c:489)
 by 0x22DCBA: heapam_index_build_range_scan (heapam_handler.c:1664)
 by 0x212BB4: table_index_build_scan (tableam.h:1522)
 by 0x212BB4: gistbuild (gistbuild.c:196)
 by 0x29BF4D: index_build (index.c:2912)
 Address 0x4028002200000008 is not stack'd, malloc'd or (recently) free'd


 Process terminating with default action of signal 11 (SIGSEGV): dumping core
 General Protection Fault
 at 0x6946AB: pfree (mcxt.c:1035)
 by 0x8525833: gserialized_gist_picksplit (gserialized_gist_nd.c:0)
 by 0x66B188: FunctionCall2Coll (fmgr.c:1162)
 by 0x211D27: gistUserPicksplit (gistsplit.c:433)
 by 0x2113F9: gistSplitByKey (gistsplit.c:697)
 by 0x208BE8: gistSplit (gist.c:1451)
 by 0x2081E0: gistplacetopage (gist.c:299)
 by 0x207DFA: gistinserttuples (gist.c:1269)
 by 0x207DFA: gistinserttuple (gist.c:1222)
 by 0x207DFA: gistdoinsert (gist.c:876)
 by 0x212E84: gistBuildCallback (gistbuild.c:489)
 by 0x22DCBA: heapam_index_build_range_scan (heapam_handler.c:1664)
 by 0x212BB4: table_index_build_scan (tableam.h:1522)
 by 0x212BB4: gistbuild (gistbuild.c:196)
 by 0x29BF4D: index_build (index.c:2912)

comment:7 Changed 15 months ago by Raúl Marín <git@…>

In ee6afa6/git:

Avoid double freeing due to gidx_merge

References #4691

comment:8 Changed 15 months ago by Raúl Marín <git@…>

In 079c7861/git:

Avoid double freeing due to gidx_merge

References #4691

comment:9 Changed 15 months ago by Raúl Marín <git@…>

In a695e05/git:

Avoid double freeing due to gidx_merge

References #4691

comment:10 Changed 15 months ago by Raúl Marín <git@…>

Resolution: fixed
Status: newclosed

In 83eba0d/git:

Avoid double freeing due to gidx_merge

Closes #4691

Note: See TracTickets for help on using tickets.