Opened 2 years ago

Closed 8 days ago

Last modified 8 days ago

#4129 closed defect (fixed)

Cannot update SecondarySnapshot during a parallel operation

Reported by: davidp Owned by: Bborie Park
Priority: high Milestone: PostGIS 2.4.9
Component: raster Version: 2.4.x
Keywords: SecondarySnapshot parallel Cc:

Description

Hi, since we upgraded our Postgres/Postgis? stack to version 10/2.4 this error starts to appear in our log:

ERROR: cannot update SecondarySnapshot? during a parallel operation

Most often in this context:

SQL statement: "SELECT proj4text FROM public.spatial_ref_sys WHERE srid = 5514 LIMIT 1"

or

SQL statement "SELECT TRUE" PL/pgSQL function _st_intersects(geometry,raster,integer)

There are many diferent sources of this error (PHP, Python, Mapnik), so i thing, there is marginal chance to be caused by our code (SQL statements). When i wrap problematic statements into PL/pgSQL function (which are parallel restricted by default), then this error disappear. So my suspicion falls on PostGIS functions and how they behave in PostgreSQL parallel mode.

For now i thing there is issue in these PostGIS functions:

  • ST_Transform(geometry, int) which executes "select proj4text from..."
  • ST_Intersects(geometry, raster)

Our system info is:

  • PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
  • POSTGIS="2.4.4 r16526" PGSQL="100" GEOS="3.5.1-CAPI-1.9.1 r4246" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 2.1.2, released 2016/10/24" LIBXML="2.9.4" LIBJSON="0.12.1" LIBPROTOBUF="1.2.1" RASTER

And PostreSQL cluster resides behind PGBouncer in session pooling mode.

Best Regards, David

Attachments (2)

pgbadger-error-report-1.png (23.9 KB) - added by davidp 21 months ago.
pgbadger-error-report-2.png (46.5 KB) - added by davidp 21 months ago.

Download all attachments as: .zip

Change History (15)

comment:1 Changed 2 years ago by Algunenano

Component: postgisraster
Owner: changed from pramsey to Bborie Park

This looks like there is some Postgis function marked as PARALLEL SAFE when it isn't.

ST_Transform might execute a query like the one you've posted above, but it's read only and doesn't do anything strange so it should be ok.

On the other hand ST_Intersects(geometry, raster) is defined as:

CREATE OR REPLACE FUNCTION _st_intersects(geom geometry, rast raster, nband integer DEFAULT NULL)
	RETURNS boolean AS $$
	DECLARE
		hasnodata boolean := TRUE;
		_geom geometry;
	BEGIN
		IF ST_SRID(rast) != ST_SRID(geom) THEN
			RAISE EXCEPTION 'Raster and geometry do not have the same SRID';
		END IF;

		_geom := ST_ConvexHull(rast);
...

And if we look at ST_ConverHull(raster) it is missing the parallel declaration:

-- Availability: 2.0.0
-- Changed: 2.1.4 raised cost
-- used to cast raster -> geometry
CREATE OR REPLACE FUNCTION st_convexhull(raster)
    RETURNS geometry
    AS 'MODULE_PATHNAME','RASTER_convex_hull'
    LANGUAGE 'c' IMMUTABLE STRICT
    COST 300;

This means that, by default rules, it's parallel unsafe, which makes anyone using it also unsafe. Now, I don't know if the parallelism declaration is missing or it really is unsafe, but that's where I would start looking into.

comment:2 in reply to:  1 Changed 2 years ago by davidp

Replying to Algunenano:

This means that, by default rules, it's parallel unsafe, which makes anyone using it also unsafe. Now, I don't know if the parallelism declaration is missing or it really is unsafe, but that's where I would start looking into.

I didn't thing so... PARALLEL UNSAFE apply only downstream (by my observations). So, function marked as parallel safe is executed in parallel mode, even if internally calls parallel unsafe function. Contrary, parallel unsafe function never executes in parallel mode, no matter how internally called functions are labeled.

And one correction to my original post... function default is PARALLEL UNSAFE, not RESTRICTED. Sorry, my fault.

Version 0, edited 2 years ago by davidp (next)

comment:3 Changed 2 years ago by Algunenano

So, function marked as parallel safe is executed in parallel mode, even if internally calls parallel unsafe function.

Yes, Postgresql trust the programmer when he declares a function parallel but, by definition, any function that calls another one that is parallel unsafe should be marked as unsafe too. If you don't, then your unsafe function might get called in parallel which leads to errors like this one or worse.

What I meant is, that if ST_ConverHull is unsafe (should be reviewed and tagged appropriately) then st_intersects(geom, raster, int) should be marked as unsafe too, but I haven't investigated any further.

comment:4 Changed 23 months ago by pramsey

Milestone: PostGIS 2.4.5PostGIS 2.4.6

comment:5 Changed 21 months ago by robe

In 16988:

Add missing schema qualifications for raster _ST_Intersects
references #4129 for PostGIS 3.0

comment:6 Changed 21 months ago by pramsey

So, that particular error message arises from GetLatestSnapshot which is not called at all in the PostGIS code base and pretty infrequently in the PgSQL code base. The only places that I can see that might happen during run time is some replication code paths? Are you running with replication on? I'm looking particularly at the st_transform() code line in PostGIS, and it all seems like it should be fine WRT parallelism. Any hints on replicating this?

comment:7 Changed 21 months ago by davidp

Yes, we are using synchronous streaming replication. Pretty simle scenario with one Master and one Hot Stanby.

There is SQL query that causes errors:

with adrm as( select *, ST_Distance( ST_TRANSFORM( ST_SETSRID( ST_MAKEPOINT( $1, $2 ), $4 ), 5514 ), definicni_bod) distance
	from ruian.rn_adresni_misto
	where ST_DWithin( ST_TRANSFORM( ST_SETSRID( ST_MAKEPOINT( $1, $2 ), $4 ), 5514 ), definicni_bod, $5 )
	order by distance asc limit $3 ) 
select json_agg( json_build_object( 'address', address, 'coords', coords, 'distance', distance )) as result
	from(select json_build_object( 'adrm_kod', adrm.kod, 'kraj', ruian.rn_vusc.nazev, 'kraj_kod', ruian.rn_okres.vusc_kod, 'okres', ruian.rn_okres.nazev, 'okres_kod', ruian.rn_obec.okres_kod, 'obec', ruian.rn_obec.nazev, 'obec_kod', ruian.rn_obec.kod, 'cast_obce', ruian.rn_cast_obce.nazev, 'cast_obce_kod', ruian.rn_cast_obce.kod, 'ulice', ruian.rn_ulice.nazev, 'ulice_kod', ruian.rn_ulice.kod, 'stavebni_objekt', ruian.rn_stavebni_objekt.kod, 'co', cislo_orientacni_hodnota, 'co_pism', cislo_orientacni_pismeno, 'cd', cislo_domovni, 'psc', adrp_psc ) address,
			st_asgeojson( ST_Transform( adrm.definicni_bod, $4 ))::json -> 'coordinates' coords, 
			round( adrm.distance::numeric, 2 ) distance
			from adrm
			left join ruian.rn_ulice on
			adrm.ulice_kod = ruian.rn_ulice.kod
			left join ruian.rn_stavebni_objekt on
			ruian.rn_stavebni_objekt.kod = adrm.stavobj_kod
			left join ruian.rn_cast_obce on
			ruian.rn_cast_obce.kod = ruian.rn_stavebni_objekt.cobce_kod
			left join ruian.rn_obec on
			ruian.rn_ulice.obec_kod = ruian.rn_obec.kod
			or ruian.rn_cast_obce.obec_kod = ruian.rn_obec.kod
			left join ruian.rn_okres on
			ruian.rn_obec.okres_kod = ruian.rn_okres.kod
			left join ruian.rn_vusc on
			ruian.rn_okres.vusc_kod = ruian.rn_vusc.kod) a;

where: $1 = X coordinate, $2 = Y coordinate, $3 = row limit, $4 = coordinates SRID, $5 = search radius

Error appears randomly when this query is executed in batch mode - eg 20k of executions... When i transformed this query into function (parallel unsafe), errors disappears.

Hope that helps...

Changed 21 months ago by davidp

Attachment: pgbadger-error-report-1.png added

Changed 21 months ago by davidp

Attachment: pgbadger-error-report-2.png added

comment:8 Changed 10 months ago by robe

Milestone: PostGIS 2.4.6PostGIS 2.4.7

comment:9 Changed 10 months ago by pramsey

Milestone: PostGIS 2.4.7PostGIS 2.4.9

comment:10 Changed 2 months ago by robe

Is this still an issue. I thought it might have had to do with our calls to spatial_ref_sys in C code not being schema qualified, I think we fixed it in newer versions of PostGIS , but I forget if we went as far back as 2.4

comment:11 Changed 8 days ago by robe

Resolution: fixed
Status: newclosed

I suspect this is a similar issue to #4661 and is fixed by

40015c93/git

comment:12 Changed 8 days ago by Regina Obe <lr@…>

In 5e2440d/git:

Revise ST_ConcaveHull to run for 0.99 as well, ammend NEWs for #4129. Closes #4541 for PostGIS 2.5.5. references #4129 for PostGIS 2.5.5

comment:13 Changed 8 days ago by Regina Obe <lr@…>

In ac3d025/git:

Revise ST_ConcaveHull to run for 0.99 as well, ammend NEWs for #4129. Closes #4541 for PostGIS 3.0.2. references #4129 for PostGIS 3.0.2

Note: See TracTickets for help on using tickets.