
Version 52 (modified by tkump, 11 years ago)

InDB rasters and GDALDataset creation.

GDAL Driver for PostGIS Raster Working Specifications

Current status of the driver (September 2012)

The driver is:

  • Able to read regularly and irregularly tiled rasters, whether the tiles share one pixel size or each has its own
  • Able to generate two kinds of raster objects, depending on the working mode:
    • ONE_RASTER_PER_ROW ('mode = 1' in the connection string, or no mode at all): The default mode. Each table row is treated as an independent raster. If the requested table has more than one row and no where clause has been specified in the connection string, every row is reported as a subdataset, unless you select the other working mode.
    • ONE_RASTER_PER_TABLE ('mode = 2' in connection string): Each table is considered as a raster coverage, and each row is a raster tile.
  • Able to provide a color interpretation for bands
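As an illustration of the two modes, the sketch below assembles a connection string in Python. The helper name `build_connection_string` and its defaults are hypothetical; the `PG:` prefix and the `host`, `dbname`, `schema`, `table`, `column`, `where` and `mode` keys are assumed to follow the key='value' layout the driver parses, so check them against the driver documentation for your GDAL version.

```python
def build_connection_string(dbname, table, *, host="localhost", schema="public",
                            column="rast", where=None, mode=1):
    """Assemble a PostGIS Raster connection string (illustrative helper only)."""
    if mode not in (1, 2):
        raise ValueError("mode must be 1 (ONE_RASTER_PER_ROW) or 2 (ONE_RASTER_PER_TABLE)")
    parts = [
        f"host='{host}'",
        f"dbname='{dbname}'",
        f"schema='{schema}'",
        f"table='{table}'",
        f"column='{column}'",
        f"mode='{mode}'",
    ]
    # The where clause is optional; a clause that itself contains single
    # quotes would need escaping, which this sketch does not attempt.
    if where is not None:
        parts.append(f"where='{where}'")
    return "PG:" + " ".join(parts)


# One coverage built from every row of the table (mode 2).
coverage = build_connection_string("gis", "tiles", mode=2)
# A single row selected with a where clause (default mode 1).
single = build_connection_string("gis", "tiles", where="rid = 7")
```

With mode 1 and no where clause, a multi-row table would be reported as a list of subdatasets, as described above.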

The driver is not:

Design principles

Topic: The basis

The main class of a GDAL driver is GDALDataset: A set of associated raster bands. So, 1 GDALDataset must be able to contain:

  • An untiled image stored in a raster table's row
  • A tiled image stored in a raster table (regular or irregular, rectangular or not, with or without missing tiles, with or without overlapping between tiles)
  • A raster object coverage from the rasterization of a vector coverage stored in a raster table (regular or irregular, rectangular or not, with or without missing tiles, with or without overlapping between tiles)

UPDATE: As Pierre suggested, there are only 2 arrangements:

  • Regularly tiled raster
  • Irregularly tiled raster

Bear in mind that a raster can consist of a single tile. In that case, 1 GDALDataset = 1 PostGIS Raster object (= 1 PostGIS Raster table row). Otherwise, 1 GDALDataset = several PostGIS Raster objects (= several PostGIS Raster rows). For this reason, the GDAL PostGIS Raster driver has 2 working modes: ONE_RASTER_PER_TABLE and ONE_RASTER_PER_ROW.

The driver deals with regularly and irregularly tiled rasters, and each tile can have its own pixel size. It uses the VRT and MEM drivers to allow this.

Question: Are 2 working modes enough to manage all the raster arrangements? [SOLVED]: YES

Pierre: I think yes. We have to distinguish "what we want to produce" from "what we have to deal with". The two modes answer "what we want to produce", and the different table arrangements are "what we have to deal with".

From a GDAL user's point of view, I know there is a bunch of raster rows in the DB and there are only two things I want to do: extract those raster rows one by one, creating one raster per row, or treat them all as a single raster and blend them together. Furthermore, I want to be able to SELECT those rows using a WHERE clause. If I want a single raster from the db, I have to build my WHERE clause accordingly. There is no need for an extra mode for this. Besides, I don't want to know, or have to know, what the raster table arrangement is. I expect the driver to be able to deal with them all.

Then, the driver has to deal with all the possible arrangements of those selected rows in both modes (overlaps, gaps, missing tiles, etc.). You tried to enumerate the possible arrangements above, but I think there are only two cases: the tiles are regularly tiled or they are not, whatever the number of tiles (1 or more). To me the irregular case is a generalization of the regular one.

Jorge: I think we have 3 cases: untiled raster, regularly tiled raster and irregularly tiled raster.

Jorge: ok, updated

If, and only if, you can optimize the regularly tiled case, then you write it as an exception. The problem is making sure the table is REALLY regularly tiled without relying on the user's knowledge. The mere introduction of the -a option to raster2pgsql.py, which allows appending tiles to an existing table, makes the "regularly blocked" flag untrustworthy. If we really want to maintain this flag, we will have to create something like an ST_ValidateRegularBlocking aggregate function.

Jorge: fully agree. The only way to ensure a raster is regularly tiled is a checking function. To be used carefully.

Pierre: Then, if we cannot rely on the raster_columns flag and if ST_ValidateRegularBlocking() would be too slow, we have to treat "regularly tiled" and "irregularly tiled" as a single case, hoping that the "regular" one will be faster because it involves less processing when merging the tiles together.

Jorge: Agree.
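To make concrete what a checking function would have to verify, here is a minimal client-side sketch in Python. It is not the proposed ST_ValidateRegularBlocking aggregate, which does not exist; the `Tile` type and `is_regularly_tiled` are hypothetical names. Treating a table with missing tiles as irregular is an assumption of this sketch.

```python
from typing import NamedTuple, Sequence


class Tile(NamedTuple):
    upper_left_x: float
    upper_left_y: float
    width: int        # in pixels
    height: int       # in pixels
    scale_x: float    # pixel width in world units
    scale_y: float    # pixel height in world units (usually negative)


def is_regularly_tiled(tiles: Sequence[Tile], tol: float = 1e-9) -> bool:
    """Return True if all tiles share one block size and fill a gap-free grid."""
    if not tiles:
        return False
    first = tiles[0]
    block_w = first.width * first.scale_x
    block_h = first.height * first.scale_y
    cells = set()
    for t in tiles:
        # Every tile must have the same dimensions and pixel size.
        if (t.width, t.height) != (first.width, first.height):
            return False
        if abs(t.scale_x - first.scale_x) > tol or abs(t.scale_y - first.scale_y) > tol:
            return False
        # The upper-left corner must sit a whole number of blocks away
        # from the first tile's corner.
        col = (t.upper_left_x - first.upper_left_x) / block_w
        row = (t.upper_left_y - first.upper_left_y) / block_h
        if abs(col - round(col)) > tol or abs(row - round(row)) > tol:
            return False
        cell = (round(col), round(row))
        if cell in cells:
            return False  # two tiles occupy the same grid cell
        cells.add(cell)
    # No missing tiles: the occupied cells must fill their bounding grid.
    cols = [c for c, _ in cells]
    rows = [r for _, r in cells]
    expected = (max(cols) - min(cols) + 1) * (max(rows) - min(rows) + 1)
    return len(cells) == expected
```

The check has to visit every tile, which is why such a validation is expected to be slow on large tables.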


Topic: Constructing the GDALDataset object

To construct a GDALDataset object, the driver must:

  • Open the dataset (create db connection)
  • Determine, in a 1st, very fast query to the db, by looking in the raster_overviews view, which lower resolution tables are available for the requested raster table
  • Determine, in a 2nd, fast enough query to the db, the extent and the maximum number of bands of the requested raster by aggregating the extents of all the rasters. This takes about 1 second on 360000 tiles even if there is no index. [DONE] Is this really solved for the InDB rasters? As far as I have tested with GDAL 1.10 and QGIS, it is really slow to calculate values from the wktraster data column.
  • Determine, in a 3rd, very fast query to the db, the pixel size and rotation, the band types and the nodata value for each band of ONLY ONE raster (LIMIT 1). The driver should assume those values are the same for every other raster in the table. If, when fetching the other tiles, it finds one that differs, it must report that this arrangement is not supported. [DONE]
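The three queries could look roughly like the SQL assembled below. This is a sketch, not the driver's actual statements: the helper names are hypothetical, the overview lookup assumes the column layout of the PostGIS raster_overviews view, and the third query inspects only one band (band 1 by default) for brevity.

```python
def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


# 1st query: which overview tables exist for the requested raster table.
# The %s placeholders are meant to be bound by the database client.
OVERVIEWS_SQL = (
    "SELECT o_table_schema, o_table_name, overview_factor "
    "FROM raster_overviews "
    "WHERE r_table_schema = %s AND r_table_name = %s "
    "ORDER BY overview_factor"
)


def extent_sql(schema, table, column="rast", where=None):
    """2nd query: overall extent and maximum band count of the selected rows."""
    col = quote_ident(column)
    sql = (
        f"SELECT ST_Extent(ST_Envelope({col})), max(ST_NumBands({col})) "
        f"FROM {quote_ident(schema)}.{quote_ident(table)}"
    )
    if where:
        # The where clause is passed through as written by the user.
        sql += f" WHERE {where}"
    return sql


def first_tile_sql(schema, table, column="rast", band=1):
    """3rd query: pixel size, skew, pixel type and nodata of ONE raster only."""
    col = quote_ident(column)
    return (
        f"SELECT ST_ScaleX({col}), ST_ScaleY({col}), "
        f"ST_SkewX({col}), ST_SkewY({col}), "
        f"ST_BandPixelType({col}, {int(band)}), "
        f"ST_BandNoDataValue({col}, {int(band)}) "
        f"FROM {quote_ident(schema)}.{quote_ident(table)} LIMIT 1"
    )
```

Only the second query touches every row; the first reads a catalog view and the third stops after a single row.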

Question: If in the first query we find a lower resolution table, must the rest of the work be performed with this lower resolution table? At least these 3 queries, until we want to read the actual raster data to burn it into the buffer. The queries should be faster on an overview table, but the pixel size will not be the same when using an overview table instead of the full resolution table. And you don't read from overviews unless you want to implement decimation because your buffer size is different from your raster size. Am I right?

Pierre: Yes to the first question. This is why we query the overview table first. Yes, you're right: I guess the driver must ask for a lower resolution only when a specific pixel size is requested by an application, right? Otherwise it requests only the highest resolution one or all of them (in the case of gdal_translate, for example), right? How does the driver know whether it has to produce only a specific resolution, only the best one, or all of them?

Jorge: The driver looks for the requested table. This is the highest resolution version of the raster, and the driver works with that table unless the user asks for a lower resolution. This happens, for example, when the user specifies the -outsize option in gdal_translate. The R/W GDAL functions include a buffer with the size needed to store the requested data at the requested resolution. If, by default, you use the lower resolution table regardless of the user's wishes, you must implement decimation by overriding the R/W GDAL functions. In general, you shouldn't implement decimation (it shouldn't be necessary).
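One possible rule for deciding when an overview table is worth reading is sketched below: compare the requested buffer size with the full raster size and pick the coarsest overview that still provides at least the requested resolution. The function name and the rule itself are assumptions made for illustration, not a description of what the driver does.

```python
def choose_overview_factor(raster_size, buffer_size, available_factors):
    """Return the overview factor to read from; 1 means the full resolution table.

    raster_size and buffer_size are (width, height) in pixels;
    available_factors are the overview factors found in the 1st query.
    """
    x_ratio = raster_size[0] / buffer_size[0]
    y_ratio = raster_size[1] / buffer_size[1]
    # Use the smaller ratio so that neither axis is undersampled.
    ratio = min(x_ratio, y_ratio)
    best = 1
    for factor in sorted(available_factors):
        if factor <= ratio:
            best = factor
    return best


# A 4000x4000 raster requested into a 1000x1000 buffer (as with -outsize)
# can be served from the factor-4 overview.
factor = choose_overview_factor((4000, 4000), (1000, 1000), [2, 4, 8])
```

When the buffer has the same size as the raster, the rule falls back to factor 1, which matches the default behaviour described by Jorge.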


Topic: Reading/Writing raster data

Once the basic structure has been constructed (the GDALDataset object and its related GDALRasterBand objects), we read/write the data following this general method in PostGISRasterBand::IRasterIO: fetch, in one long query, all the rasters along with their world georeferences (upperleftx and upperlefty, width and height), create MEM datasets from them, create a VRTDataset from the MEM datasets, and call IRasterIO on the VRT raster band.

The IReadBlock method (also implemented in the raster band class) just calls IRasterIO with the proper block parameters.
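The driver delegates the mosaicking to the VRT and MEM drivers, so the following Python sketch is only a simplified model of the placement logic: each tile is positioned in the output buffer from its world coordinates. It assumes a single band, no rotation, and one pixel size shared by all tiles; `burn_tiles` is a hypothetical name.

```python
def burn_tiles(tiles, origin_x, origin_y, pixel_w, pixel_h, cols, rows, nodata=0):
    """Place tiles into a cols x rows buffer using their world coordinates.

    tiles is an iterable of (upper_left_x, upper_left_y, data), where data is
    a list of pixel rows. pixel_h is negative for north-up rasters.
    """
    buf = [[nodata] * cols for _ in range(rows)]
    for ulx, uly, data in tiles:
        # Convert the tile's world corner into buffer pixel offsets.
        col_off = round((ulx - origin_x) / pixel_w)
        row_off = round((uly - origin_y) / pixel_h)
        for r, line in enumerate(data):
            br = row_off + r
            if not 0 <= br < rows:
                continue  # this tile row falls outside the buffer
            for c, value in enumerate(line):
                bc = col_off + c
                if 0 <= bc < cols:
                    buf[br][bc] = value  # later tiles overwrite earlier ones
    return buf


# Two 2x2 tiles in a 4x4 buffer whose upper-left corner is at (0, 4).
# The other two cells of the grid are missing and stay at nodata.
mosaic = burn_tiles(
    [(0, 4, [[1, 1], [1, 1]]), (2, 2, [[2, 2], [2, 2]])],
    origin_x=0, origin_y=4, pixel_w=1, pixel_h=-1, cols=4, rows=4,
)
```

Because placement depends only on coordinates, the order in which tiles arrive from the query does not matter unless tiles overlap.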


Topic: Some general thoughts

Question (Pierre, July 2012) - SOLVED (Implemented using the VRT driver, as suggested): In the worst case, a VRT is basically the same thing as a PostGIS raster table (just a bunch of non-aligned rasters), and I was wondering whether we could simply copy the code of the VRT driver and adjust it so that it reads all the rasters of a PostGIS query instead of reading them from the file system. Basically, the VRT file boils down to all the rasters returned by a WHERE query, whatever their alignment. If I'm right, then we would benefit from any smart optimization implemented by Frank in the VRT driver…

  • The driver should neither depend on a "rid" column nor request it. Currently, the 'rid' attribute is hardcoded in some queries. A better approach would be to look for a primary/unique key. This is useful when getting subdatasets (each subdataset should be identified by its primary key), but only when write ability exists. For reading, you could always create a GDAL-internal key for each loaded tile, assuming there is a "management" structure for the set of tiles. Pierre says that, if there's no primary key, you could use a query with "WHERE ST_UpperLeftX(rast) = XXX AND ST_UpperLeftY(rast) = YYY".
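A sketch of that fallback in Python: identify a tile by its key when one is known, otherwise by its upper-left corner. The function names are hypothetical, and the `%s` placeholders are meant to be bound by the database client rather than formatted into the string.

```python
def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def tile_selector(column="rast", primary_key=None, key_value=None, upper_left=None):
    """Return (where_clause, params) identifying one tile.

    Prefers a primary/unique key column; falls back to the tile's
    upper-left corner when no key column is known.
    """
    if primary_key is not None:
        return f"{quote_ident(primary_key)} = %s", [key_value]
    if upper_left is None:
        raise ValueError("need either a primary key or an upper-left corner")
    col = quote_ident(column)
    clause = f"ST_UpperLeftX({col}) = %s AND ST_UpperLeftY({col}) = %s"
    return clause, [upper_left[0], upper_left[1]]


by_key = tile_selector(primary_key="rid", key_value=42)
by_corner = tile_selector(upper_left=(500000.0, 4650000.0))
```

Note that the corner-based selector identifies a tile uniquely only if no two tiles share the same upper-left corner, which overlapping arrangements do not guarantee.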

Open Question: Why should the key be required only when write ability exists?

  • The driver should not depend on the "raster_columns" view, since entries in this view contain useful information ONLY if AddRasterConstraints was called on the table. It can, however, depend on "raster_overviews", since no overview exists unless it is visible in that view.
  • Regularly tiled arrangements should be optimized just by their natural order, not by trying to discover whether they are regularly tiled and then applying special processing to them. This is not necessary, and right now there is no quick way to determine whether a coverage is regularly tiled (except the flag in the raster_columns view, which exists only if it was explicitly set. We are planning to make an ST_IsRegularlyTiled() function in the future, but it should be a slow process). The trick is to burn data into the GDAL buffer based on the tiles' world coordinates compared with the buffer's world coordinates. That lets the driver transparently support untiled, regularly tiled and irregularly tiled (= missing and/or different-size tiles) rasters, in any order, all in the same way.
  • If, while we fetch the raster data, we discover that one or more of the selected rasters (mode=2) are not aligned with the previous rasters, then there are two options: 1) We notify the user that we cannot produce a raster from non-aligned tiles. 2) We query the database for new versions of those tiles resampled to the current buffer alignment (I don't think this is a good idea. It could be an explicit option of the driver, though.) The same applies to different pixel types and different nodata values. These things could be discovered earlier, when requesting the extent (2nd query), in a future version, if we had a set of ST_SameAlignment(), ST_SameBandPixelType() and ST_SameBandNodataValue() aggregates, each returning false if any of those values does not match the others. We could probably have a more general ST_SameBandType() aggregate instead of the last two.
  • We can, right now, make a version that works on any kind of non-rotated, aligned arrangement by using ST_Extent() along with ST_ScaleX, ST_ScaleY, ST_SkewX and ST_SkewY to determine the size and georeference of the raster (see query below). I don't yet see how we can do it on a rotated arrangement without an ST_RotatedExtent() function, which does not exist yet.
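The original query is not reproduced on this page. As a stand-in, the Python sketch below shows the arithmetic such a query feeds: given the box returned by ST_Extent() and the pixel size from ST_ScaleX/ST_ScaleY, it derives the raster size and a GDAL-style geotransform, then checks whether a tile corner falls on that pixel grid. The function names are hypothetical, and the sketch assumes zero skew and a north-up raster (negative scale_y).

```python
def grid_from_extent(xmin, ymin, xmax, ymax, scale_x, scale_y):
    """Return (geotransform, cols, rows) for a non-rotated, north-up raster."""
    if scale_x <= 0 or scale_y >= 0:
        raise ValueError("this sketch assumes scale_x > 0 and scale_y < 0")
    cols = round((xmax - xmin) / scale_x)
    rows = round((ymax - ymin) / -scale_y)
    # GDAL geotransform order: upper-left x, pixel width, row rotation,
    # upper-left y, column rotation, pixel height.
    geotransform = (xmin, scale_x, 0.0, ymax, 0.0, scale_y)
    return geotransform, cols, rows


def is_aligned(upper_left_x, upper_left_y, geotransform, tol=1e-9):
    """True if a tile corner lies a whole number of pixels from the origin."""
    origin_x, pixel_w, _, origin_y, _, pixel_h = geotransform
    col = (upper_left_x - origin_x) / pixel_w
    row = (upper_left_y - origin_y) / pixel_h
    return abs(col - round(col)) <= tol and abs(row - round(row)) <= tol


gt, cols, rows = grid_from_extent(0, 0, 100, 50, 10, -10)
```

A tile that fails `is_aligned` corresponds to the non-aligned case discussed above, where the driver would either notify the user or request a resampled tile.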