Opened 4 years ago

Last modified 4 years ago

#4776 new defect

st_geometrytype causes 100% CPU loop in postgres13 on specific linestring

Reported by: tvijlbrief Owned by: pramsey
Priority: medium Milestone: PostGIS PostgreSQL
Component: postgis Version: 3.0.x
Keywords: Cc:

Description

I have a dataset which works fine with postgres12/postgis3.0.2 but a specific valid large linestring causes st_geometrytype() to hang:

=====demo13.sh fragment======================

create table demo(wkb_geometry geometry(Geometry,28992));

COPY demo (wkb_geometry) FROM stdin; 0102000020 data skipped (see attached for full script)

\.

select st_isvalid(wkb_geometry) from demo;

select ST_numpoints(wkb_geometry) from demo;

— Next statement will hang Postgres13 server in 100% CPU loop:

select ST_GeometryType(wkb_geometry) from demo;

=============================================

Output from script:

DROP TABLE CREATE TABLE COPY 1

st_isvalid


t

(1 row)

st_numpoints


1518

(1 row)

Script/server hangs….

Attachments (1)

demo13.sh (47.8 KB ) - added by tvijlbrief 4 years ago.
Script to reproduce the problem

Download all attachments as: .zip

Change History (16)

by tvijlbrief, 4 years ago

Attachment: demo13.sh added

Script to reproduce the problem

comment:1 by tvijlbrief, 4 years ago

To be clear, this script runs fine on postgres12 but fails with postgres13

comment:2 by tvijlbrief, 4 years ago

I have tested this with postgresql-13-postgis-3 from deb http://apt.postgresql.org/pub/repos/apt/ bionic-pgdg main

but also with a Postgis compiled from source and the postgres:13.0-alpine docker image.

comment:3 by robe, 4 years ago

This might be a PostGIS 3.0 probably

I tried on my development version PostGIS 3.1

POSTGIS="3.1.0dev 3.1.0alpha2-111-gd781ecf29" [EXTENSION] PGSQL="130" GEOS="3.9.0-CAPI-1.14.0" PROJ="6.3.2" GDAL="GDAL 3.2.0, released 2020/10/26" LIBXML="2.9.9" LIBJSON="0.12" LIBPROTOBUF="1.2.1" WAGYU="0.5.0 (Internal)" RASTER PostgreSQL 13.0 on x86_64-w64-mingw32, compiled by gcc.exe (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0, 64-bit
DROP TABLE
CREATE TABLE
COPY 1
 st_isvalid
------------
 t
(1 row)


 st_numpoints
--------------
         1518
(1 row)


 st_geometrytype
-----------------
 ST_LineString
(1 row)

But my PostgreSQL 13 with PostGIS 3.0 bad news - same issue

POSTGIS="3.0.2 3.0.2" [EXTENSION] PGSQL="130" GEOS="3.8.1-CAPI-1.13.3" SFCGAL="1.3.8" PROJ="6.3.2" GDAL="GDAL 3.2.0, released 2020/10/26 GDAL_DATA not found" LIBXML="2.9.9" LIBJSON="0.12" LIBPROTOBUF="1.2.1" WAGYU="0.4.3 (Internal)" TOPOLOGY RASTER PostgreSQL 13.0, compiled by Visual C++ build 1914, 64-bit

Gets this far

DROP TABLE
CREATE TABLE
COPY 1
 st_isvalid
------------
 t
(1 row)


 st_numpoints
--------------
         1518
(1 row)

hangs

comment:4 by robe, 4 years ago

Milestone: PostGIS 3.1.0PostGIS 3.0.3
Priority: mediumblocker

comment:5 by pramsey, 4 years ago

Running on the latest Pg13 and the PostGIS 3.0 stable branch I cannot replicate this. Could have been fixed in Pg, or in PostGIS? Or platform dependent? (I'm MacOS)

comment:6 by robe, 4 years ago

Given that it works on pramsey's 3.0 branch and both our 3.1 master and doesn't work on 3.0.2, I suspect it might have already been fixed as part of

Fixes for #4767 #4768 #4771 #4772

[c74c522066006fa295645138557320678d2a59b7/git]

comment:7 by tvijlbrief, 4 years ago

I also tried it with current Postgis master (3.1) and it does fail again with a Debian postgres13 docker base image. Looking at the Postgis source I cannot think of a reason why this particular linestring fails with st_geometrytype() and works OK with eg st_numpoints().

The only difference is that st_numpoints() fetches the whole TOAST object and st_geometrytype() just requests a slice at the front of the TOAST object. This looks to me like a low level issue with Postgres13 and TOAST objects of a specific size and or compression behavior.

I have monitored https://www.postgresql.org/list/pgsql-bugs/2020-10/ to see if a generic Postgres13 issue popped up, but it did not, so I decided to start with reporting this as a Postgis issue. Perhaps we should report this issue as a PostgreSQL-13 issue referring to this ticket?

It's a strange issue, also because it does not effect MacOS with this specific instance, and most linestrings work just OK. I was very relieved to see that @robe could reproduce the error.

Last edited 4 years ago by tvijlbrief (previous) (diff)

comment:9 by x4m, 4 years ago

Can you plz provide a backtrace of hanging backend?

comment:10 by tvijlbrief, 4 years ago

When I disable compression the hang is fixed:

create table demo(wkb_geometry geometry(Geometry,28992));

alter table demo alter column wkb_geometry set storage external;

Last edited 4 years ago by tvijlbrief (previous) (diff)

comment:11 by x4m, 4 years ago

It's important o understand was the hanging in 3.0.2 caused by some hard to reproduce bug in PG or something within PostGIS. I would be very greatfull if you could share a call stack of hanging postgres process.

comment:12 by tvijlbrief, 4 years ago

Top 10 stack entries when hanging:

#0  pglz_decompress (source=source@entry=0x55584d6c39b0 "", slen=<optimized out>,
    dest=dest@entry=0x55584d6c3a3c "", rawsize=rawsize@entry=52,
    check_complete=check_complete@entry=false) at ./build/../src/common/pg_lzcompress.c:767
#1  0x000055584c5906cb in toast_decompress_datum_slice (slicelength=52, attr=0x55584d6c39a8)
    at ./build/../src/backend/access/common/detoast.c:483
#2  detoast_attr_slice (attr=<optimized out>, sliceoffset=sliceoffset@entry=0, slicelength=52)
    at ./build/../src/backend/access/common/detoast.c:274
#3  0x000055584c9e6f2a in pg_detoast_datum_slice (datum=<optimized out>, first=first@entry=0,
    count=<optimized out>) at ./build/../src/backend/utils/fmgr/fmgr.c:1754
#4  0x00007f0bc257a7c1 in geometry_geometrytype (fcinfo=0x55584d6bea60) at lwgeom_ogc.c:199
#5  0x000055584c740c9e in ExecInterpExpr (state=0x55584d6be578, econtext=0x55584d6be260,
    isnull=<optimized out>) at ./build/../src/backend/executor/execExprInterp.c:699
#6  0x000055584c74ec62 in ExecEvalExprSwitchContext (isNull=0x7ffec8369497, econtext=0x55584d6be260,
    state=0x55584d6be578) at ./build/../src/include/executor/executor.h:313
#7  ExecProject (projInfo=0x55584d6be570) at ./build/../src/include/executor/executor.h:347
#8  ExecScan (node=<optimized out>, accessMtd=0x55584c773130 <SeqNext>,
    recheckMtd=0x55584c7731c0 <SeqRecheck>) at ./build/../src/backend/executor/execScan.c:238
#9  0x000055584c744bcd in ExecProcNode (node=0x55584d6be150)
    at ./build/../src/include/executor/executor.h:245
#10 ExecutePlan (execute_once=<optimized out>, dest=0x55584d6b8640, direction=<optimized out>,
    numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT,
    use_parallel_mode=<optimized out>, planstate=0x55584d6be150, estate=0x55584d6bded8)
    at ./build/../src/backend/executor/execMain.c:1646

comment:15 by robe, 4 years ago

Milestone: PostGIS 3.0.3PostGIS PostgreSQL
Priority: blockermedium

I'll keep it open until PostgreSQL 13.1 is released but flipped to a PostgreSQL problem. I guess that is why it worked on my development PostgreSQL 13 cause that was compiled from git where the fix was probably already in place and my PostGIS 3.0.2/PostgreSQL 13.0 are the released versions in the wild.

Note: See TracTickets for help on using tickets.