Opened 5 months ago

Last modified 4 weeks ago

#4129 new defect

Cannot update SecondarySnapshot during a parallel operation

Reported by: davidp Owned by: Bborie Park
Priority: high Milestone: PostGIS 2.4.6
Component: raster Version: 2.4.x
Keywords: SecondarySnapshot parallel Cc:

Description

Hi, since we upgraded our Postgres/Postgis? stack to version 10/2.4 this error starts to appear in our log:

ERROR: cannot update SecondarySnapshot? during a parallel operation

Most often in this context:

SQL statement: "SELECT proj4text FROM public.spatial_ref_sys WHERE srid = 5514 LIMIT 1"

or

SQL statement "SELECT TRUE" PL/pgSQL function _st_intersects(geometry,raster,integer)

There are many diferent sources of this error (PHP, Python, Mapnik), so i thing, there is marginal chance to be caused by our code (SQL statements). When i wrap problematic statements into PL/pgSQL function (which are parallel restricted by default), then this error disappear. So my suspicion falls on PostGIS functions and how they behave in PostgreSQL parallel mode.

For now i thing there is issue in these PostGIS functions:

  • ST_Transform(geometry, int) which executes "select proj4text from..."
  • ST_Intersects(geometry, raster)

Our system info is:

  • PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
  • POSTGIS="2.4.4 r16526" PGSQL="100" GEOS="3.5.1-CAPI-1.9.1 r4246" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 2.1.2, released 2016/10/24" LIBXML="2.9.4" LIBJSON="0.12.1" LIBPROTOBUF="1.2.1" RASTER

And PostreSQL cluster resides behind PGBouncer in session pooling mode.

Best Regards, David

Attachments (2)

pgbadger-error-report-1.png (23.9 KB) - added by davidp 4 weeks ago.
pgbadger-error-report-2.png (46.5 KB) - added by davidp 4 weeks ago.

Download all attachments as: .zip

Change History (9)

comment:1 Changed 5 months ago by Algunenano

Component: postgisraster
Owner: changed from pramsey to Bborie Park

This looks like there is some Postgis function marked as PARALLEL SAFE when it isn't.

ST_Transform might execute a query like the one you've posted above, but it's read only and doesn't do anything strange so it should be ok.

On the other hand ST_Intersects(geometry, raster) is defined as:

CREATE OR REPLACE FUNCTION _st_intersects(geom geometry, rast raster, nband integer DEFAULT NULL)
	RETURNS boolean AS $$
	DECLARE
		hasnodata boolean := TRUE;
		_geom geometry;
	BEGIN
		IF ST_SRID(rast) != ST_SRID(geom) THEN
			RAISE EXCEPTION 'Raster and geometry do not have the same SRID';
		END IF;

		_geom := ST_ConvexHull(rast);
...

And if we look at ST_ConverHull(raster) it is missing the parallel declaration:

-- Availability: 2.0.0
-- Changed: 2.1.4 raised cost
-- used to cast raster -> geometry
CREATE OR REPLACE FUNCTION st_convexhull(raster)
    RETURNS geometry
    AS 'MODULE_PATHNAME','RASTER_convex_hull'
    LANGUAGE 'c' IMMUTABLE STRICT
    COST 300;

This means that, by default rules, it's parallel unsafe, which makes anyone using it also unsafe. Now, I don't know if the parallelism declaration is missing or it really is unsafe, but that's where I would start looking into.

comment:2 in reply to:  1 Changed 5 months ago by davidp

Replying to Algunenano:

This means that, by default rules, it's parallel unsafe, which makes anyone using it also unsafe. Now, I don't know if the parallelism declaration is missing or it really is unsafe, but that's where I would start looking into.

I didn't thing so... PARALLEL UNSAFE apply only downstream (by my observations). So, function marked as parallel safe is executed in parallel mode, even if internally calls parallel unsafe function. Contrary, parallel unsafe function never executes in parallel mode, no matter how internally called functions are labeled.

Ind in our DB, functions are declared as follows:

CREATE OR REPLACE FUNCTION public._st_intersects(geom geometry, rast raster, nband integer DEFAULT NULL::integer)
 RETURNS boolean
 LANGUAGE plpgsql
 IMMUTABLE PARALLEL SAFE COST 1000
 ...

and

CREATE OR REPLACE FUNCTION public.st_intersects(geom geometry, rast raster, nband integer DEFAULT NULL::integer)
 RETURNS boolean
 LANGUAGE sql
 IMMUTABLE PARALLEL SAFE COST 1000
AS $function$ SELECT $1 OPERATOR(public.&&) $2::geometry AND public._st_intersects($1, $2, $3); $function$

And one correction to my original post... function default is PARALLEL UNSAFE, not RESTRICTED. Sorry, my fault.

Last edited 5 months ago by davidp (previous) (diff)

comment:3 Changed 5 months ago by Algunenano

So, function marked as parallel safe is executed in parallel mode, even if internally calls parallel unsafe function.

Yes, Postgresql trust the programmer when he declares a function parallel but, by definition, any function that calls another one that is parallel unsafe should be marked as unsafe too. If you don't, then your unsafe function might get called in parallel which leads to errors like this one or worse.

What I meant is, that if ST_ConverHull is unsafe (should be reviewed and tagged appropriately) then st_intersects(geom, raster, int) should be marked as unsafe too, but I haven't investigated any further.

comment:4 Changed 3 months ago by pramsey

Milestone: PostGIS 2.4.5PostGIS 2.4.6

comment:5 Changed 5 weeks ago by robe

In 16988:

Add missing schema qualifications for raster _ST_Intersects
references #4129 for PostGIS 3.0

comment:6 Changed 5 weeks ago by pramsey

So, that particular error message arises from GetLatestSnapshot which is not called at all in the PostGIS code base and pretty infrequently in the PgSQL code base. The only places that I can see that might happen during run time is some replication code paths? Are you running with replication on? I'm looking particularly at the st_transform() code line in PostGIS, and it all seems like it should be fine WRT parallelism. Any hints on replicating this?

comment:7 Changed 4 weeks ago by davidp

Yes, we are using synchronous streaming replication. Pretty simle scenario with one Master and one Hot Stanby.

There is SQL query that causes errors:

with adrm as( select *, ST_Distance( ST_TRANSFORM( ST_SETSRID( ST_MAKEPOINT( $1, $2 ), $4 ), 5514 ), definicni_bod) distance
	from ruian.rn_adresni_misto
	where ST_DWithin( ST_TRANSFORM( ST_SETSRID( ST_MAKEPOINT( $1, $2 ), $4 ), 5514 ), definicni_bod, $5 )
	order by distance asc limit $3 ) 
select json_agg( json_build_object( 'address', address, 'coords', coords, 'distance', distance )) as result
	from(select json_build_object( 'adrm_kod', adrm.kod, 'kraj', ruian.rn_vusc.nazev, 'kraj_kod', ruian.rn_okres.vusc_kod, 'okres', ruian.rn_okres.nazev, 'okres_kod', ruian.rn_obec.okres_kod, 'obec', ruian.rn_obec.nazev, 'obec_kod', ruian.rn_obec.kod, 'cast_obce', ruian.rn_cast_obce.nazev, 'cast_obce_kod', ruian.rn_cast_obce.kod, 'ulice', ruian.rn_ulice.nazev, 'ulice_kod', ruian.rn_ulice.kod, 'stavebni_objekt', ruian.rn_stavebni_objekt.kod, 'co', cislo_orientacni_hodnota, 'co_pism', cislo_orientacni_pismeno, 'cd', cislo_domovni, 'psc', adrp_psc ) address,
			st_asgeojson( ST_Transform( adrm.definicni_bod, $4 ))::json -> 'coordinates' coords, 
			round( adrm.distance::numeric, 2 ) distance
			from adrm
			left join ruian.rn_ulice on
			adrm.ulice_kod = ruian.rn_ulice.kod
			left join ruian.rn_stavebni_objekt on
			ruian.rn_stavebni_objekt.kod = adrm.stavobj_kod
			left join ruian.rn_cast_obce on
			ruian.rn_cast_obce.kod = ruian.rn_stavebni_objekt.cobce_kod
			left join ruian.rn_obec on
			ruian.rn_ulice.obec_kod = ruian.rn_obec.kod
			or ruian.rn_cast_obce.obec_kod = ruian.rn_obec.kod
			left join ruian.rn_okres on
			ruian.rn_obec.okres_kod = ruian.rn_okres.kod
			left join ruian.rn_vusc on
			ruian.rn_okres.vusc_kod = ruian.rn_vusc.kod) a;

where: $1 = X coordinate, $2 = Y coordinate, $3 = row limit, $4 = coordinates SRID, $5 = search radius

Error appears randomly when this query is executed in batch mode - eg 20k of executions... When i transformed this query into function (parallel unsafe), errors disappears.

Hope that helps...

Changed 4 weeks ago by davidp

Attachment: pgbadger-error-report-1.png added

Changed 4 weeks ago by davidp

Attachment: pgbadger-error-report-2.png added
Note: See TracTickets for help on using tickets.