Opened 10 years ago

Closed 9 years ago

Last modified 9 years ago

#590 closed task (fixed)

[raster] Two rasters version of ST_MapAlgebra

Reported by: pracine Owned by: pracine
Priority: high Milestone: PostGIS 2.0.0
Component: raster Version: master
Keywords: Cc:

Description

Specs should be added to the wiki soon. Otherwise refer to scripts/plpgsql/mapalgebra.sql

The first version should include optimization provided by the possibility to ST_SetValues over an extent (instead of setting one pixel at a time).

The conditions when to apply such optimizations are still to precise but at a first glance you want to set an extent to the values of one of the provided raster when the expression the equivalent of 'LAST' and you want to set an extent to a single value when there is a replacement value.

For better optimization we might have to replace the "replacement value" concept with four expressions: "NULL-NULL expression", "NULL-value expression", "value-NULL expression", "value-value expression". With this we could easily identify what is the function involved in non-overlapping area and apply the correct optimization. To be followed in the specs...

The possibility to reference neighbour pixels (to implement focal expressions) can be delayed to a further version.

Change History (21)

comment:1 Changed 10 years ago by jorgearevalo

Status: newassigned

comment:2 Changed 10 years ago by pracine

Owner: changed from jorgearevalo to pracine
Status: assignednew

comment:3 Changed 10 years ago by pracine

-Specifications still need to be finished before implementing this one.

comment:4 Changed 10 years ago by pracine

Status: newassigned

comment:5 Changed 10 years ago by pracine

Priority: mediumhigh

comment:6 Changed 10 years ago by pracine

The last incomplete PL/pgSQL version is in http://trac.osgeo.org/postgis/browser/trunk/raster/scripts/plpgsql/st_mapalgebra_optimized.sql

This optimized version tries to fill large areas in a single ST_SetValues call when possible.

comment:7 Changed 10 years ago by pracine

Milestone: PostGIS 2.0.0PostGIS Raster Future

comment:8 Changed 9 years ago by Bborie Park

Milestone: PostGIS Raster FuturePostGIS 2.0.0
Owner: changed from pracine to Bborie Park
Priority: highcritical
Status: assignednew
Summary: [raster] Implement the two rasters version of ST_MapAlgebra[raster] Two rasters version of ST_MapAlgebra

comment:9 Changed 9 years ago by Bborie Park

Status: newassigned

comment:10 Changed 9 years ago by Bborie Park

I'm concerned about several items in the working specs for ST_MapAlgebra before I get into writing it. I've put into blocks the text from the working spec I have questions about...

  1. In the item regarding the alignment of both rasters...
Both raster must be aligned. This is determined using the ST_SameAlignment() function described below. If a resampling is necessary they use the planned ST_Resample() function to resample the second raster to the first one before processing (or the first to the second if a MASTER parameter is provided). If ST_Resample() is not yet implemented when these functions are implemented, just return an error message: "ERROR: MapAlgebra? on rasters with different alignment not yet implemented."

I don't know if it is appropriate for ST_MapAlgebra to do the alignment automatically as the behavior would differ from the behavior of other exceptions, primarily different SRIDs. Technically, you could reproject the second raster to that of the first automatically as well but we're outputting an exception instead. Outputting an exception is in-line with the behavior found in the geometry functions.

In addition, the resampling of the second raster may change (probably will) depending on the resampling algorithm used, which I think is a decision that the end-user must make.

  1. For the resulting raster's extent...
     The computation extent (extentexpr in the functions below) and hence the extent of the resulting raster can be: 

        -'FIRST' (the extent of the first raster) (default),
        -'SECOND' (the extent of the first raster),
        -'INTERSECTION' (the extent of the intersection of both rasters) (could also be default),
        -'UNION' (the extent of the union of both rasters).

I think the default should be INTERSECTION or UNION as they are the most obvious from the user's standpoint. I'd prefer INTERSECTION as that is typically the most compact.

  1. For the expressions dealing with nodata...
    Alternative expressions, evaluated in place of the main expression, can be provided as parameter to deal with nodata values: 

        -nodata1expr is evaluated when the first raster pixel value is nodata or when the first raster is absent from the pixel area being evaluated
        -nodata2expr is evaluated when the second raster pixel value is nodata or when the second raster is absent from the pixel area being evaluated
        -nodatanodataexpr is evaluated when both rasters pixel values are nodata or when the both rastera are absent from the pixel area being evaluated
        -expression is evaluated when both raster pixel values are withdata.

    All expressions default to NULL. When an expression is NULL every pixel filling the condition of this expression is set to NULL (nodata).

My concern is with the last sentence "When an expression is NULL every pixel filling the condition of this expression is set to NULL (nodata).". What is the nodata value in this situation when the pixel is set to NULL?

comment:11 Changed 9 years ago by Bborie Park

I don't believe we want an additional parameter specifying which raster is the primary raster as per:

An optional option could be added to allow specifying which raster, between the first and the second, is the MASTER raster. The MASTER raster is used to determine the pixeltype, the alignment and the nodatavalue of the resulting raster. If the MASTER raster do not have a nodata value defined and it is necessary to set one in the resulting raster then the minimal possible value for the pixeltype should be used as nodata value. In case this option would not be added or in case it is not passed to the function, the first raster should be used to determine those informations. 

Instead, whichever raster is provided first to ST_MapAlgebra should be the MASTER/primary raster.

comment:12 Changed 9 years ago by robe

I concur with Bborie. We have way too many parameters for things already.

Sandro was complaining about his finger getting tired what is with all these permutations. Weren't you Sandro? We've already taken 3 keystrokes away from him with our ST_ - do we need to torture him any more?

comment:13 in reply to:  10 Changed 9 years ago by pracine

Replying to dustymugs:

I don't know if it is appropriate for ST_MapAlgebra to do the alignment automatically as the behavior would differ from the behavior of other exceptions, primarily different SRIDs. Technically, you could reproject the second raster to that of the first automatically as well but we're outputting an exception instead. Outputting an exception is in-line with the behavior found in the geometry functions.

In addition, the resampling of the second raster may change (probably will) depending on the resampling algorithm used, which I think is a decision that the end-user must make.

I would tend to agree. Normally the same way every raster in a table should have the same SRID, they should also be aligned.

  1. For the resulting raster's extent...
     The computation extent (extentexpr in the functions below) and hence the extent of the resulting raster can be: 

        -'FIRST' (the extent of the first raster) (default),
        -'SECOND' (the extent of the first raster),
        -'INTERSECTION' (the extent of the intersection of both rasters) (could also be default),
        -'UNION' (the extent of the union of both rasters).

I think the default should be INTERSECTION or UNION as they are the most obvious from the user's standpoint. I'd prefer INTERSECTION as that is typically the most compact.

I agree.

  1. For the expressions dealing with nodata...
    Alternative expressions, evaluated in place of the main expression, can be provided as parameter to deal with nodata values: 

        -nodata1expr is evaluated when the first raster pixel value is nodata or when the first raster is absent from the pixel area being evaluated
        -nodata2expr is evaluated when the second raster pixel value is nodata or when the second raster is absent from the pixel area being evaluated
        -nodatanodataexpr is evaluated when both rasters pixel values are nodata or when the both rastera are absent from the pixel area being evaluated
        -expression is evaluated when both raster pixel values are withdata.

    All expressions default to NULL. When an expression is NULL every pixel filling the condition of this expression is set to NULL (nodata).

My concern is with the last sentence "When an expression is NULL every pixel filling the condition of this expression is set to NULL (nodata).". What is the nodata value in this situation when the pixel is set to NULL?

It is the nodata value of the new raster which should be determined first by the following rule: 1) The nodata value of the first raster if it exist 2) if not, the nodata value of the second raster if it exist 3) if not, the min possible value according to the type of the raster. We could also add a parameter to specify the nodata value of the new raster.

comment:14 in reply to:  12 Changed 9 years ago by pracine

Replying to robe:

I concur with Bborie. We have way too many parameters for things already.

Sandro was complaining about his finger getting tired what is with all these permutations. Weren't you Sandro? We've already taken 3 keystrokes away from him with our ST_ - do we need to torture him any more?

We have to understand here that rasters have much more internal properties (pixeltype, nodata value, take nodata value into account or not) that we often WANT to be able to set than geometries. There are also many ways to set the alignment of the new raster. This is why we end up with many variants. I see two solutions to this:

1) We should systematically use DEFAULT. Most functions should be revised to see when variant can be removed by using DEFAULT. However from what I understood, using DEFAULT sometimes constraint your flexibility over the order of the parameters (DEFAULT parameters have to last).

2) We should have a "parameter order policy" respected by every functions. We could explain this policy only once in the doc and expect every function to follow this policy.

comment:15 Changed 9 years ago by bnordgren

I would observe a few things:

  1. This is a two band operation, not a two raster operation. Bands have "nodata scalars". Rasters have "nodata vectors".
  2. Perhaps we should have an output_nodata parameter which defaults to the "automatic" scheme described by Pierre, above, but which allows the user to specify a value if they feel like it.
  3. There is no reason whatsoever to assume that the two bands passed to this function come from the same table (and hence are imagined to be aligned for some special case). In fact, it will probably be common to combine tiles from different images.
  4. There is nothing (and should be nothing) requiring rasters from the same table to be aligned (or even to have the same SRID). The data in the raster_columns table is informative, not normative. As Pierre once told me, there is no guarantee that someone didn't insert a rogue raster and mess up your nice table. If you are in fact shooting for "one table per logical theme", you'd better allow for that theme to be global and the table to contain all the UTM zones. One table per theme per UTM zone is insanity personified. Perhaps this should be broken off as a separate discussion. Is there any code which actually depends on all tiles in an individual table being of the same SRID and aligned?

PS: If you're still interested in implementing this using to kick the tires on the raster iterator engine, I'm available to help.

comment:16 in reply to:  15 ; Changed 9 years ago by pracine

Replying to bnordgren:

  1. There is no reason whatsoever to assume that the two bands passed to this function come from the same table (and hence are imagined to be aligned for some special case). In fact, it will probably be common to combine tiles from different images.

Right. This is why I planned that ST_MapAlgebra would implicitely resample. However nothing prevent us from forcing users to make explicit this resampling:

SELECT ST_Mapalgebra(rast1, ST_Resample(rast2, rast1), expr, 'INTERSECTION') FROM cov1, cov2 WHERE ST_Intersects(rast1, rast2)

Are we losing performance here?

  1. There is nothing (and should be nothing) requiring rasters from the same table to be aligned (or even to have the same SRID). The data in the raster_columns table is informative, not normative. As Pierre once told me, there is no guarantee that someone didn't insert a rogue raster and mess up your nice table. If you are in fact shooting for "one table per logical theme", you'd better allow for that theme to be global and the table to contain all the UTM zones. One table per theme per UTM zone is insanity personified. Perhaps this should be broken off as a separate discussion. Is there any code which actually depends on all tiles in an individual table being of the same SRID and aligned?

PostGIS in general assume that all the geometry in one table are of the same SRID. PostGIS DO NOT reproject geometries having different SRID in spatial relationship functions (like ST_Intersection). I think it is wise to follow the same rule (We do not reproject rasters when performing spatial operation between two raster having different SRID).

The question we have to solve, since this is specific to raster, is "Should we resample implicitely when needed?" This is also the first time we are facing the issue since this is the first function involving two rasters (the question should have raised for ST_Intersects(raster, raster) however. Bborie, does that mean that it is possible for two rasters being resampled differently to return a different result for ST_Intersects()?). For sure there should be a warning if implicit resampling is performed.

comment:17 in reply to:  16 Changed 9 years ago by Bborie Park

The question we have to solve, since this is specific to raster, is "Should we resample implicitely when needed?" This is also the first time we are facing the issue since this is the first function involving two rasters (the question should have raised for ST_Intersects(raster, raster) however. Bborie, does that mean that it is possible for two rasters being resampled differently to return a different result for ST_Intersects()?). For sure there should be a warning if implicit resampling is performed.

For the ST_Intersects() function, the alignment of the two rasters do not matter. Only the SRID must be the same. This is because ST_Intersects uses the grid/pixel lines and sees where they intersect. So being aligned and resampling is unnecessary.

As for your question "does that mean that it is possible for two rasters being resampled differently to return a different result for ST_Intersects()?", no. Whatever resampling is done, the output resampled raster's extent would contain the entirety of the input raster.

comment:18 in reply to:  16 Changed 9 years ago by bnordgren

Replying to pracine:

Right. This is why I planned that ST_MapAlgebra would implicitely resample. However nothing prevent us from forcing users to make explicit this resampling:

SELECT ST_Mapalgebra(rast1, ST_Resample(rast2, rast1), expr, 'INTERSECTION') FROM cov1, cov2 WHERE ST_Intersects(rast1, rast2)

Are we losing performance here?

There's another axis here. You've mentioned "explicit" v. "implicit". You didn't mention "on-demand" v. "precomputed tiles". While I'm generally not a fan of implicit operations, I do not see a perfomance issue on that axis. However, "on-demand" resampling could well increase performance. In your example above, you have to cache four tiles to get good performance: rast1, rast2, ST_Resample(rast2, rast1), and the result. With on-demand resampling, you need to cache only three. This means that your tiles can be larger before they cause cache misses.

Now mapalgebra is a special case where this "precomputed tiles" resampling strategy works well. Your biggest performance hit will come with other queries, particularly queries which join a raster coverage with a vector coverage. Users can quickly find themselves causing an entire tile to be resampled in order to lookup one point. Then the tile is resampled again to lookup the next point. Offering on demand resampling would do much to keep users out of trouble...or eliminate the need for them to explicitly create, populate, use, and destroy a temporary table containing the resampled raster.

This is sort of a shameless plug for the spatial collection framework I wrote as part of the raster iterator, as it allows for on-demand operations. And clearly, one can use "on demand" operations to produce a precomputed raster, simply by iterating over all of the cells in the raster. Currently, however, the spatial collection framework is not exposed to the user. It's a toolbox for C programmers to use. It could be exposed to the user if there were a need for it.

PostGIS in general assume that all the geometry in one table are of the same SRID. PostGIS DO NOT reproject geometries having different SRID in spatial relationship functions (like ST_Intersection). I think it is wise to follow the same rule (We do not reproject rasters when performing spatial operation between two raster having different SRID).

It is rare for global vector datasets to be distributed in more than one SRID. It is common for global raster datasets to be distributed in more than one SRID. While I agree that following PostGIS's lead is a good default position, I would suggest that fundamental differences like this represent a solid justification for departing from a behavior which was designed to accommodate vector data.

The question we have to solve, since this is specific to raster, is "Should we resample implicitely when needed?"

In spite of my misgivings about implicit operations, I'd vote yes. Caveats: the resampling should be predictable and shouldn't add a whole bunch of controls to all functions which may need to resample. For instance, the second argument should always be resampled to the first argument; or the user should be able to specify an empty raster which defines the desired grid. If the user wants something other than nearest neighbor, they can explicitly resample.

comment:19 Changed 9 years ago by Bborie Park

Milestone: PostGIS 2.0.0PostGIS Raster Future
Owner: changed from Bborie Park to pracine
Priority: criticalhigh
Status: assignednew

comment:20 Changed 9 years ago by Bborie Park

Resolution: fixed
Status: newclosed

comment:21 Changed 9 years ago by robe

Milestone: PostGIS Raster FuturePostGIS 2.0.0
Note: See TracTickets for help on using tickets.