Opened 5 years ago
Closed 4 years ago
#4691 closed defect (fixed)
Segfault when creating an index on geography(Point, 4326) column that contains empty points
Reported by: | aktiur | Owned by: | Algunenano |
---|---|---|---|
Priority: | blocker | Milestone: | PostGIS 2.5.5 |
Component: | postgis | Version: | 2.5.x -- EOL |
Keywords: | Cc: |
Description
When creating a gist index on a geography(Point, 4326) column that happens to contain empty points (with coordinates of [nan, nan]), PostGIS will sometimes encounter a segmentation fault when freeing some memory.
This is not a fully deterministic error. I'm joining a dataset with 11000 points (of which 1000 are empty, the other 10000 are gaussian distributed around the center of France) and a script that imports it that seems to trigger that bug most of the time, but in some circumstances there is no segfault and I have to DROP the table and try again from the beginning to trigger it again (truncating is not enough).
I did not manage to trigger the bug with a smaller dataset with only 1100 points.
I also noticed replacing the column type with geometry does not seem to trigger the segfault, but I did not try with a bigger dataset.
I also joined a full stacktrace.
crash_test=# SELECT version(); version --------------------------------------------------------------------------------------------------------------- PostgreSQL 12.2 (Ubuntu 12.2-4) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-8ubuntu1) 9.3.0, 64-bit (1 row) crash_test=# SELECT postgis_full_version(); postgis_full_version -------------------------------------------------------------------------------------------------------------------------------------------------------------------- POSTGIS="3.0.0 r17983" [EXTENSION] PGSQL="120" GEOS="3.8.0-CAPI-1.13.1 " PROJ="6.3.1" LIBXML="2.9.4" LIBJSON="0.13.1" LIBPROTOBUF="1.3.3" WAGYU="0.4.3 (Internal)" (1 row)
I'm using PostGIS 3 but it seems to be the same bug as this Postgresql ticket #16283 where the reporter was using PostGIS 2.5.
Attachments (3)
Change History (13)
by , 5 years ago
Attachment: | stacktrace.txt added |
---|
by , 5 years ago
Sample dataset (11,000 points including 1,000 empty points)
comment:1 by , 5 years ago
Here is the shell output when running the script
root@agir-test:~/postgresql# sudo -u postgres psql crash_test < script.sql could not change directory to "/root/postgresql": Permission denied NOTICE: extension "postgis" already exists, skipping CREATE EXTENSION CREATE TABLE COPY 11000 server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. connection to server was lost
follow-up: 3 comment:2 by , 5 years ago
Milestone: | → PostGIS 2.5.5 |
---|---|
Priority: | low → critical |
I can reproduce it in master using the instructions attached. It doesn't always happen, but creating and dropping the index several times leads to the crash eventually.
comment:3 by , 5 years ago
Replying to Algunenano:
I can reproduce it in master using the instructions attached. It doesn't always happen, but creating and dropping the index several times leads to the crash eventually.
It seems the bigger the table, the higher the chance it happens. I guess importing the dataset several times increase the chances, in case it is hard to reproduce.
comment:4 by , 4 years ago
Priority: | critical → blocker |
---|
comment:5 by , 4 years ago
Owner: | changed from | to
---|
I think I have found the bug: gidx_merge seems to keep ownership of the new gidx pointer when the old one is unknown / empty instead of copying it. When this happens multiple times, the pointer might be freed several times in gserialized_gist_picksplit_addlist.
This seems to fix it: https://github.com/Algunenano/postgis/commit/c35256b1d41852f231451cd3bf3bf72a1dbeaa1d
I'm going to try to create a regress test for it, although it might be complicated since it seems like a heisenbug. I think it might affect all stable releases, but it was made more likely due to the changes in #4139
comment:6 by , 4 years ago
Looking at Postgresql source code it appears that GIST uses random() in some cases to decide in which bucket to insert a tupe (when several of them are equally good) so finding a use minimal test case is going to be extremely hard.
I'm going to retest the patch for a while to confirm I don't see any crash / invalid reads and push it that way.
BTW, under valgrind it looks like this:
2020-07-29 18:21:34.532 CEST [205377] [raul @ postgis_crash] LOG: statement: CREATE INDEX crash_test_index ON crash_test USING gist (point); Invalid read of size 8 at 0x6946AB: pfree (mcxt.c:1035) by 0x8525833: gserialized_gist_picksplit (gserialized_gist_nd.c:0) by 0x66B188: FunctionCall2Coll (fmgr.c:1162) by 0x211D27: gistUserPicksplit (gistsplit.c:433) by 0x2113F9: gistSplitByKey (gistsplit.c:697) by 0x208BE8: gistSplit (gist.c:1451) by 0x2081E0: gistplacetopage (gist.c:299) by 0x207DFA: gistinserttuples (gist.c:1269) by 0x207DFA: gistinserttuple (gist.c:1222) by 0x207DFA: gistdoinsert (gist.c:876) by 0x212E84: gistBuildCallback (gistbuild.c:489) by 0x22DCBA: heapam_index_build_range_scan (heapam_handler.c:1664) by 0x212BB4: table_index_build_scan (tableam.h:1522) by 0x212BB4: gistbuild (gistbuild.c:196) by 0x29BF4D: index_build (index.c:2912) Address 0x4028002200000008 is not stack'd, malloc'd or (recently) free'd Process terminating with default action of signal 11 (SIGSEGV): dumping core General Protection Fault at 0x6946AB: pfree (mcxt.c:1035) by 0x8525833: gserialized_gist_picksplit (gserialized_gist_nd.c:0) by 0x66B188: FunctionCall2Coll (fmgr.c:1162) by 0x211D27: gistUserPicksplit (gistsplit.c:433) by 0x2113F9: gistSplitByKey (gistsplit.c:697) by 0x208BE8: gistSplit (gist.c:1451) by 0x2081E0: gistplacetopage (gist.c:299) by 0x207DFA: gistinserttuples (gist.c:1269) by 0x207DFA: gistinserttuple (gist.c:1222) by 0x207DFA: gistdoinsert (gist.c:876) by 0x212E84: gistBuildCallback (gistbuild.c:489) by 0x22DCBA: heapam_index_build_range_scan (heapam_handler.c:1664) by 0x212BB4: table_index_build_scan (tableam.h:1522) by 0x212BB4: gistbuild (gistbuild.c:196) by 0x29BF4D: index_build (index.c:2912)
Full stacktrace at segfault