Opened 3 years ago

Closed 2 years ago

Last modified 2 years ago

#5139 closed defect (fixed)

PostGIS causes to_jsonb to no longer be parallel safe, ST_AsGeoJSON and ST_AsGML are also parallel unsafe

Reported by: samuelspurling Owned by: pramsey
Priority: medium Milestone: PostGIS 3.1.7
Component: postgis Version: 3.2.x
Keywords: Cc:

Description

Hi,

Below was using PostgreSQL 13 and PostGIS 3.2.1.

I believe that the automatic conversion of the geometry field to geojson when using to_json or to_jsonb is causing a query of ours to fail.

Below query has been trimmed down a lot to try and show the issue. In the below case 'a_large_table' needs to be big enough to cause the query planner to use parallel queries (or force it using other settings).

-- Does not work
WITH object_list AS (
	SELECT guid,'a_large_table' AS table_name, to_jsonb((a_large_table.geom)) AS json_data
	FROM a_large_table
	WHERE reference like 'A00171%'
)
SELECT * INTO TEMPORARY TABLE object_list_temp
FROM object_list;

This gives the output: ERROR: cannot start commands during a parallel operation

Where: SQL statement "SELECT auth_name
':'auth_srid FROM public.spatial_ref_sys WHERE srid='27700'"
-- If casting the geometry to text the query then works as expected
WITH object_list AS (
	SELECT guid,'a_large_table' AS table_name, to_jsonb((a_large_table.geom::text)) AS json_data
	FROM a_large_table
	WHERE reference like 'A00171%'
)
SELECT * INTO TEMPORARY TABLE object_list_temp
FROM object_list;

Above works correctly for comparison, only difference being the geometry casting to text inside the to_jsonb. It is also worth noting that the part inside the CTE does work correctly when run on its own in either case.

I believe the selection into the temporary table causes it to be parallel restricted (? I am not a DBA, please correct me if incorrect) which is why after surrounding the other query something inside it is not marked as parallel safe and fails.

Any help appreciated, but I believe this a bug.

Thanks

Change History (14)

comment:1 by pramsey, 2 years ago

Does this occur on PgSQL 14? I'm trying to replicate and failing.

=# explain with objs as (select id, fcname, to_jsonb(geom) from large_table) select * into temporary table large_table_tmp from objs where id > 990000;

                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
 Gather  (cost=1000.00..76996.72 rows=42582 width=41)
   Workers Planned: 2
   ->  Parallel Seq Scan on large_table  (cost=0.00..71738.52 rows=17742 width=41)
         Filter: (id > 990000)
(4 rows)


=# with objs as (select id, fcname, to_jsonb((large_table.geom)) from large_table) select * into temporary table large_table_tmp from objs where id > 990000;

SELECT 40000

comment:2 by robe, 2 years ago

I was able to trigger the same error -

ERROR:  cannot start commands during a parallel operation
CONTEXT:  SQL statement "SELECT auth_name||':'||auth_srid 		        FROM public.spatial_ref_sys WHERE srid='27700'"

on both {{{ PostgreSQL 13.4, compiled by Visual C++ build 1914, 64-bit POSTGIS="3.2.0 3.2.0" [EXTENSION] PGSQL="130" GEOS="3.10.1-CAPI-1.16.0" PROJ="7.2.1" LIBXML="2.9.9" LIBJSON="0.12" LIBPROTOBUF="1.2.1" WAGYU="0.5.0 (Internal)" }}}

and:

PostgreSQL 14.1, compiled by Visual C++ build 1914, 64-bit	POSTGIS="3.2.1 3.2.1" [EXTENSION] PGSQL="140" GEOS="3.10.2-CAPI-1.16.0" PROJ="7.2.1" GDAL="GDAL 3.4.2, released 2022/03/08" LIBXML="2.9.9" LIBJSON="0.12" LIBPROTOBUF="1.2.1" WAGYU="0.5.0 (Internal)" RASTER

Using the below script

-- create a fake table
--  28 secs 21 msec.
DROP TABLE IF EXISTS a_large_table;
CREATE TABLE a_large_table AS 
SELECT x AS id, ST_Point(40000 + random()*10000, -100000 + random()*10000, 27700) AS geom
FROM generate_series(1,10000000) AS x;

— this triggers the error

set max_parallel_workers_per_gather=4;
DROP TABLE IF EXISTS object_list_temp;
WITH object_list AS (
	SELECT id,'a_large_table' AS table_name, to_jsonb((a_large_table.geom)) AS json_data
	FROM a_large_table
	WHERE id::text LIKE '10000'
)
SELECT * INTO TEMPORARY TABLE object_list_temp
FROM object_list;

The explain looks like this:

Gather  (cost=1000.00..133115.60 rows=50000 width=68)
  Workers Planned: 4
  ->  Parallel Seq Scan on a_large_table  (cost=0.00..127115.60 rows=12500 width=68)
        Filter: ((id)::text ~~ '10000'::text)

It also fails for me with 2 parallel workers with same error. Using id > 9900000 also triggers the error.

@pramsey are you able to replicate with that above?

I thought maybe we had a short-circuit for 4326 so tried with the same in the example. I confirmed I can trigger with 4326 also.

However, if the dataset has no srid, then the error doesn't happen.

e.g.

-- create a fake table
--  28 secs 21 msec.
DROP TABLE IF EXISTS a_large_table;
CREATE TABLE a_large_table AS 
SELECT x AS id, ST_Point(40000 + random()*10000, -100000 + random()*10000) AS geom
FROM generate_series(1,10000000) AS x;

DROP TABLE IF EXISTS object_list_temp;
-- no error
set max_parallel_workers_per_gather=4;
WITH object_list AS (
	SELECT id,'a_large_table' AS table_name, to_jsonb(a_large_table.geom) AS json_data
	FROM a_large_table
	WHERE id::text LIKE '10000'
)
SELECT * INTO TEMPORARY TABLE object_list_temp
FROM object_list;

The explain of that looks the same as the failing ones

So the issue is the srid call that is causing the problem. Using a_large_table.geom::text simply avoids using ST_AsGeoJSON and the output is not a geojson object and no srid query needs to be done to pull the spatial ref sys metadata.

Version 0, edited 2 years ago by robe (next)

comment:3 by robe, 2 years ago

As I recall I think perhaps in 3.1 we didn't do a call to spatial_ref_sys to look up the auth and srid. I think we just hard-coded EPSG:<srid> of the geometry. So that change to look it up, cause "the auth could be different", probably triggered this issue. I'm not sure there is a fix short of going back to the way we used to do it. We could probably at least hard-code the 4326 case.

comment:4 by robe, 2 years ago

AS expected this is also an issue with ST_AsGeoJSON

WITH object_list AS (
	SELECT id,'a_large_table' AS table_name, ST_AsGeoJSON(a_large_table.geom) AS json_data
	FROM a_large_table
	WHERE id::text LIKE '10000'
)
SELECT * INTO TEMPORARY TABLE object_list_temp
FROM object_list;

triggers the same:

ERROR:  cannot start commands during a parallel operation
CONTEXT:  SQL statement "SELECT auth_name||':'||auth_srid FROM public.spatial_ref_sys WHERE srid='27700'"

comment:5 by robe, 2 years ago

Summary: PostGIS causes to_jsonb to no longer be parallel safePostGIS causes to_jsonb to no longer be parallel safe, ST_AsGeoJSON also parallel unsafe

comment:6 by robe, 2 years ago

I wanted to confirm that all output functions that try to output spatial_ref_sys metadata fail.

I confirmed that ST_AsGML also fails.

What is interesting is ST_AsKML doesn't. I assume it's because it's using ST_Transform to convert to WGS84 long lat, but why ST_Transform doesn't fail when it has to also query spatial_ref_sys at some point I haven't quite figured out. It looks like for the most part the projcache is used most of the time instead of directly calling into spatial_ref_sys, so that might be why.

comment:7 by robe, 2 years ago

Summary: PostGIS causes to_jsonb to no longer be parallel safe, ST_AsGeoJSON also parallel unsafePostGIS causes to_jsonb to no longer be parallel safe, ST_AsGeoJSON and ST_AsGML are also parallel unsafe

comment:8 by robe, 2 years ago

Milestone: PostGIS 3.2.2PostGIS 3.1.7

I'm pushing this to PostGIS 3.1. Even though the issue here manifests because of the auto-cast in to_jsonb, to now do proper casting of postgis geometry (instead of just returning the canonical hex), the underlying issue is the ST_AsGeoJSON, ST_AsGML and their call of the spatial_ref_sys table. Perhaps if we switched those to using cached values like ST_Transform, that might fix the issue or at least make it less common.

comment:9 by Regina Obe <lr@…>, 2 years ago

In 6906e407/git:

Use read only SPI to maintain parallel safety
for ST_AsGeoJSON and ST_AsGML
References #5139 for PostGIS 3.3.0
Using same solution from #3952

comment:10 by robe, 2 years ago

I'm going to backport the above to 3.2 and 3.1 after I have a test in place.

comment:11 by Regina Obe <lr@…>, 2 years ago

In 80ff129/git:

Use read only SPI to maintain parallel safety
for ST_AsGeoJSON and ST_AsGML
References #5139 for PostGIS 3.2.3

comment:12 by Regina Obe <lr@…>, 2 years ago

Resolution: fixed
Status: newclosed

In 8209512/git:

Use read only SPI to maintain parallel safety
for ST_AsGeoJSON and ST_AsGML
Closes #5139 for PostGIS 3.1.7

comment:13 by Regina Obe <lr@…>, 2 years ago

In 8ae2e1c1/git:

Fix ST_Point to use older signature in tests. References #5139 for PostGIS 3.1.7

comment:14 by samuelspurling, 2 years ago

Thank you all for fixing this, appreciated!

Note: See TracTickets for help on using tickets.