Changes between Initial Version and Version 1 of WKTRaster/GDALDriverSpecificationWorking


Timestamp:
Feb 17, 2011, 11:12:51 AM
Author:
jorgearevalo
Comment:

GDAL PostGIS Raster specs, for debate.

= '''GDAL Driver for PostGIS Raster Working Specifications''' =
{{{
#!div  style='background-color: #F4F4F4; padding: 5px; border: 1px solid gray; float: right; margin-left: 5px; width: 260px; font-size: small;' >

'''Quick Links'''

 * [wiki:WKTRaster PostGIS Raster Home Page]
 * [wiki:WKTRaster/PlanningAndFunding Planning & Funding]

 * [wiki:WKTRaster/SpecificationWorking02 Working Specifications for PostGIS 2.0]

 * [wiki:WKTRaster/SpecificationFinal01 Old Final Specifications for Beta 0.1.6]

}}}
----

== '''Current status of the driver (February 2011)''' ==

The driver is:
 * Able to read in-db evenly blocked rasters (all blocks with the same size)
 * Able to read in-db one-row rasters:
   - If the table actually has more than one row: by using a where clause in the connection string
   - If the table has more than one row: the table must have been marked as a "regularly blocked table", with -k in the loader
 * Able to manage two working modes:
   - ONE_RASTER_PER_ROW ('mode = 1' in the connection string, or no mode at all): the default mode. Each table row is treated as an independent raster. If the requested table has more than one row and no where clause has been specified in the connection string, all the table rows are reported as subdatasets, unless the other working mode is specified.
   - ONE_RASTER_PER_TABLE ('mode = 2' in the connection string): each table is treated as a raster coverage, and each row as a raster tile.
 * Too slow (it reads the metadata of the entire table to construct the [http://www.gdal.org/classGDALDataset.html GDALDataset object], and it needs one server round trip per [http://www.gdal.org/classGDALRasterBand.html GDALRasterBand::IReadBlock] call)
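
The mode selection described above can be sketched as follows. This is a hypothetical, simplified illustration, not the driver's code: it assumes a connection string made of space-separated `key=value` pairs (no spaces around `=` or inside values), and the names `parseOptions`, `modeFromOptions`, `decideOpenAction`, `WorkingMode` and `OpenAction` are invented for this example.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// The two working modes described above (values match 'mode = 1' / 'mode = 2').
enum class WorkingMode { ONE_RASTER_PER_ROW = 1, ONE_RASTER_PER_TABLE = 2 };

// Hypothetical outcome of opening a table.
enum class OpenAction { SingleRaster, ReportSubdatasets, TiledCoverage };

// Parse space-separated key=value pairs, e.g. "schema=public table=dem mode=2".
// Assumption: no spaces around '=' or inside values; the real connection
// string syntax may differ (quoting, spaces, etc.).
std::map<std::string, std::string> parseOptions(const std::string& conn) {
    std::map<std::string, std::string> opts;
    std::istringstream in(conn);
    std::string token;
    while (in >> token) {
        std::string::size_type eq = token.find('=');
        if (eq != std::string::npos)
            opts[token.substr(0, eq)] = token.substr(eq + 1);
    }
    return opts;
}

// No "mode" key, or any value other than "2", means ONE_RASTER_PER_ROW.
WorkingMode modeFromOptions(const std::map<std::string, std::string>& opts) {
    auto it = opts.find("mode");
    if (it != opts.end() && it->second == "2")
        return WorkingMode::ONE_RASTER_PER_TABLE;
    return WorkingMode::ONE_RASTER_PER_ROW;
}

// rowCount: number of rows in the requested table.
OpenAction decideOpenAction(WorkingMode mode, bool hasWhere, long rowCount) {
    assert(rowCount >= 0);
    if (mode == WorkingMode::ONE_RASTER_PER_TABLE)
        return OpenAction::TiledCoverage;      // each row is a tile
    if (!hasWhere && rowCount > 1)
        return OpenAction::ReportSubdatasets;  // one subdataset per row
    // Single-row table, or a where clause (assumed to select one row).
    return OpenAction::SingleRaster;
}
```

For example, a 10-row table opened in ONE_RASTER_PER_ROW mode without a where clause is reported as subdatasets, while the same table with a where clause opens as a single raster.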

The driver is not:
 * Able to read out-db rasters (developed, but not tested, and with known bugs)
 * Able to create new rasters
 * Able to manage all the [http://trac.osgeo.org/postgis/raw-attachment/wiki/WKTRaster/Documentation01/WKTRasterArrangements.gif PostGIS Raster arrangements]
 * Able to provide a color interpretation for bands


== '''Design principles''' ==

----
=== Topic: The basis ===
The main class of a GDAL driver is [http://www.gdal.org/classGDALDataset.html GDALDataset]: a set of associated raster bands. Therefore, one GDALDataset must be able to hold:
  * An untiled image stored in a raster table's row.
  * A tiled image stored in a raster table (regular or irregular, rectangular or not, with or without missing tiles, with or without overlap between tiles)
  * A raster object coverage from the rasterization of a vector coverage stored in a raster table (regular or irregular, rectangular or not, with or without missing tiles, with or without overlap between tiles)

In the first case, 1 GDALDataset = 1 PostGIS Raster object. In the other two cases, 1 GDALDataset = several PostGIS Raster objects. For this reason, the GDAL PostGIS Raster driver has '''2 working modes''': ONE_RASTER_PER_TABLE and ONE_RASTER_PER_ROW.

However, the driver currently only handles continuous tiled raster layers in which all the raster tiles have the same size, snap to the same grid and do not overlap (the ideal case).
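
A check for this ideal case could look like the sketch below. It assumes non-rotated tiles described by their upper-left corner, size in pixels and pixel size (roughly the fields ST_Metadata returns); `TileMeta` and `isIdealArrangement` are hypothetical names, and the pairwise overlap test is quadratic in the number of tiles, which is acceptable only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical per-tile metadata (roughly what ST_Metadata reports per row).
struct TileMeta {
    double upperLeftX, upperLeftY;
    int width, height;       // in pixels
    double scaleX, scaleY;   // pixel size; scaleY < 0 for north-up rasters
};

// True for the "ideal case": all tiles have the same size and pixel size,
// snap to the same pixel grid and do not overlap. Assumes no rotation.
bool isIdealArrangement(const std::vector<TileMeta>& tiles, double eps = 1e-9) {
    if (tiles.empty()) return false;
    const TileMeta& ref = tiles[0];
    assert(ref.scaleX != 0.0 && ref.scaleY != 0.0);
    std::vector<long> col(tiles.size()), row(tiles.size());
    for (std::size_t i = 0; i < tiles.size(); ++i) {
        const TileMeta& t = tiles[i];
        if (t.width != ref.width || t.height != ref.height) return false;
        // Pixel sizes must match (relative tolerance eps).
        if (std::fabs(t.scaleX - ref.scaleX) > eps * std::fabs(ref.scaleX) ||
            std::fabs(t.scaleY - ref.scaleY) > eps * std::fabs(ref.scaleY))
            return false;
        // Offset from the reference tile, in pixels: must be a whole number.
        double dx = (t.upperLeftX - ref.upperLeftX) / ref.scaleX;
        double dy = (t.upperLeftY - ref.upperLeftY) / ref.scaleY;
        if (std::fabs(dx - std::round(dx)) > 1e-6 ||
            std::fabs(dy - std::round(dy)) > 1e-6)
            return false;  // not on the same grid
        col[i] = std::lround(dx);
        row[i] = std::lround(dy);
    }
    // Equally sized tiles overlap iff their offsets differ by less than
    // one tile in both directions.
    for (std::size_t i = 0; i < tiles.size(); ++i)
        for (std::size_t j = i + 1; j < tiles.size(); ++j)
            if (std::labs(col[i] - col[j]) < ref.width &&
                std::labs(row[i] - row[j]) < ref.height)
                return false;
    return true;
}
```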

'''Open question''': Are 2 working modes enough to manage all the raster arrangements?
----
=== Topic: Constructing the GDALDataset object ===

To construct a GDALDataset object, the driver must:
  * Open the dataset (create the db connection)
  * Read some data about the dataset (metadata): srid, georeference information, projection information, raster data size, band information (number of bands, pixel size, color interpretation, if present) and any other driver-specific dataset information (e.g., in our case, schema and table name)
  * Construct the structure for raster bands, with instances of the [http://www.gdal.org/classGDALRasterBand.html GDALRasterBand] class. You need to provide some basic information: data type (pixel size), block size (GDAL has a concept of the natural block size of a raster, so that applications can organize data access efficiently for some file formats) and color interpretation (if any).
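
The data type part of that step amounts to a lookup from PostGIS Raster pixel types (as returned by ST_BandPixelType) to GDAL data types. The sketch below is hypothetical: it declares its own stand-in `DataType` enum (mirroring the names of GDAL's `GDALDataType`) so it compiles without GDAL, and it assumes that, in the regularly blocked case, one tile is used as the natural block.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Local stand-in for GDAL's GDALDataType so the sketch compiles without
// GDAL headers; a real driver would use the enum from gdal.h.
enum DataType { GDT_Unknown, GDT_Byte, GDT_UInt16, GDT_Int16,
                GDT_UInt32, GDT_Int32, GDT_Float32, GDT_Float64 };

struct BandLayout {
    DataType type;
    int bytesPerPixel;
    int blockXSize, blockYSize;  // natural block size reported to GDAL
};

// Map a PostGIS Raster pixel type to a GDAL data type. Sub-byte types are
// widened to Byte; 8BSI is stored as Byte here and would need extra
// signedness handling (assumption).
BandLayout bandLayout(const std::string& pixelType, int tileWidth, int tileHeight) {
    assert(tileWidth > 0 && tileHeight > 0);
    static const std::map<std::string, std::pair<DataType, int>> table = {
        {"1BB", {GDT_Byte, 1}},    {"2BUI", {GDT_Byte, 1}},
        {"4BUI", {GDT_Byte, 1}},   {"8BUI", {GDT_Byte, 1}},
        {"8BSI", {GDT_Byte, 1}},   {"16BUI", {GDT_UInt16, 2}},
        {"16BSI", {GDT_Int16, 2}}, {"32BUI", {GDT_UInt32, 4}},
        {"32BSI", {GDT_Int32, 4}}, {"32BF", {GDT_Float32, 4}},
        {"64BF", {GDT_Float64, 8}}};
    auto it = table.find(pixelType);
    if (it == table.end()) return {GDT_Unknown, 0, 0, 0};
    // Regularly blocked case: one tile = one natural block.
    return {it->second.first, it->second.second, tileWidth, tileHeight};
}
```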

The metadata must be read from the raster table, using SQL functions like ST_Extent (for the raster data extent) and ST_Metadata (for general raster metadata), or functions like ST_SRID, ST_Width, ST_Height, etc. When your GDALDataset matches only one raster row (a raster tile), this is not a problem. But when your GDALDataset matches a whole raster table (ONE_RASTER_PER_TABLE mode), you have 2 options:
  * Call the functions over the whole table and filter the result (e.g.: select distinct st_srid(rast) from raster_table, select distinct st_metadata(rast) from raster_table). This can be a really slow operation, but it lets you check whether all the tiles are as expected (for example: whether they have the same size, share the same srid, overlap or not, etc.)
  * Call the functions limiting the output to one result (e.g. with LIMIT 1). This is fast, but the result may be incorrect, because only one tile is inspected and the others are assumed to match it

Currently, the driver takes the first (and slow) option. This has caused performance problems (see ticket #497).
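
The two strategies, and the consistency check that only the full scan makes possible, can be sketched as follows. The SQL strings, the `TileInfo`/`DatasetInfo` structs and the function names are illustrative assumptions, not the driver's queries. The geotransform follows GDAL's six-coefficient convention (top-left x, pixel width, 0, top-left y, 0, negative pixel height) and assumes non-rotated tiles.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical SQL for the two strategies discussed above.
// fullScan = true  -> inspect every tile (slow, inconsistencies show up)
// fullScan = false -> inspect one tile (fast, assumes homogeneous tiles)
std::string metadataQuery(const std::string& schema, const std::string& table,
                          const std::string& column, bool fullScan) {
    const std::string cols = "ST_SRID(" + column + "), ST_ScaleX(" + column +
                             "), ST_ScaleY(" + column + ")";
    const std::string from = " FROM " + schema + "." + table;
    if (fullScan) return "SELECT DISTINCT " + cols + from;
    return "SELECT " + cols + from + " LIMIT 1";
}

struct TileInfo { double ulx, uly; int width, height; double scaleX, scaleY; int srid; };

struct DatasetInfo {
    bool ok;                 // false if the tiles are not homogeneous
    double geoTransform[6];  // GDAL-style affine transform, no rotation
    int rasterXSize, rasterYSize;
};

// Combine per-tile metadata from a full scan into dataset-level values.
DatasetInfo combineTiles(const std::vector<TileInfo>& tiles) {
    DatasetInfo info{};
    if (tiles.empty()) return info;
    const TileInfo& ref = tiles[0];
    assert(ref.scaleX != 0.0 && ref.scaleY != 0.0);
    double minX = ref.ulx, maxY = ref.uly;
    double maxX = ref.ulx + ref.width * ref.scaleX;
    double minY = ref.uly + ref.height * ref.scaleY;  // scaleY < 0
    for (const TileInfo& t : tiles) {
        // Exact comparison: values come straight from the database.
        if (t.srid != ref.srid || t.scaleX != ref.scaleX || t.scaleY != ref.scaleY)
            return info;  // mixed SRID or pixel size: not one dataset
        minX = std::min(minX, t.ulx);
        maxY = std::max(maxY, t.uly);
        maxX = std::max(maxX, t.ulx + t.width * t.scaleX);
        minY = std::min(minY, t.uly + t.height * t.scaleY);
    }
    info.ok = true;
    info.geoTransform[0] = minX;
    info.geoTransform[1] = ref.scaleX;
    info.geoTransform[3] = maxY;
    info.geoTransform[5] = ref.scaleY;  // [2] and [4] stay 0 (no rotation)
    info.rasterXSize = static_cast<int>(std::lround((maxX - minX) / ref.scaleX));
    info.rasterYSize = static_cast<int>(std::lround((minY - maxY) / ref.scaleY));
    return info;
}
```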

'''Open question''': How to fetch the information needed to construct the GDALDataset? Note that '''you are not asking for raster data yet'''. You only need metadata to construct the basic GDALDataset object.

----
=== Topic: Reading/Writing raster data ===

Once the basic structure (GDALDataset object and related GDALRasterBand objects) has been constructed, you need to choose the strategy for reading/writing raster data:
  * ''Natural'' block-oriented r/w: the driver reads/writes data in equally sized blocks. This is potentially the most efficient way to read/write data. The natural block size for the dataset is chosen when the GDALRasterBand is created, so '''it's the driver's responsibility to provide the desired value for block size'''. To use this method, your driver must provide an implementation of [http://www.gdal.org/classGDALRasterBand.html#09e1d83971ddff0b43deffd54ef25eef IReadBlock].
  * Region-oriented r/w: the driver reads/writes arbitrary regions of data. This is a potentially less efficient method, because the driver must take care of '''data type translation''' if the data type of the buffer is different from that of the GDALRasterBand. It must also take care of '''image decimation / replication''' if the buffer size (nBufXSize x nBufYSize) is different from the size of the region being accessed (nXSize x nYSize). To use this method, your driver must provide an implementation of [http://www.gdal.org/classGDALRasterBand.html#5497e8d29e743ee9177202cb3f61c3c7 IRasterIO].
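
As an illustration of the extra work region-oriented r/w implies, here is a minimal nearest-neighbour decimation/replication routine plus one data type conversion. Nearest neighbour is just one simple resampling choice, the clamping and rounding policy in `toByte` is an assumption, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Nearest-neighbour resampling of a row-major window of nXSize x nYSize
// pixels into a buffer of nBufXSize x nBufYSize pixels. A smaller buffer
// means decimation; a larger one means pixel replication.
template <typename T>
std::vector<T> resampleNearest(const std::vector<T>& window, int nXSize, int nYSize,
                               int nBufXSize, int nBufYSize) {
    assert(nXSize > 0 && nYSize > 0 && nBufXSize > 0 && nBufYSize > 0);
    assert(window.size() == static_cast<std::size_t>(nXSize) * nYSize);
    std::vector<T> buf(static_cast<std::size_t>(nBufXSize) * nBufYSize);
    for (int by = 0; by < nBufYSize; ++by) {
        // Source row containing the centre of buffer row `by`.
        int sy = static_cast<int>((by + 0.5) * nYSize / nBufYSize);
        for (int bx = 0; bx < nBufXSize; ++bx) {
            int sx = static_cast<int>((bx + 0.5) * nXSize / nBufXSize);
            buf[static_cast<std::size_t>(by) * nBufXSize + bx] =
                window[static_cast<std::size_t>(sy) * nXSize + sx];
        }
    }
    return buf;
}

// Data type translation example: double -> unsigned 8-bit, rounding to the
// nearest value and clamping to [0, 255]; NaN maps to 0 (assumed policy).
inline unsigned char toByte(double v) {
    if (!(v > 0.0)) return 0;
    if (v >= 255.0) return 255;
    return static_cast<unsigned char>(v + 0.5);
}
```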

Clearly, there is no single best method for reading/writing data in our case. In the ideal case of regularly blocked rasters, with no overlap and the same grid for all tiles, block-oriented r/w is the most appropriate strategy. In all the other cases, a more general r/w method must be provided.

Currently, the driver only implements the natural block-oriented r/w method. This is a limitation for 2 reasons:
  * Obviously, it only fits one raster arrangement
  * Each IReadBlock call forces a new server round trip, constructing a box and fetching the raster row that contains it. This can be really slow for huge raster coverages (a question also raised in ticket #497).
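
The per-block cost can be made concrete with the sketch below: it computes the georeferenced box of one block from a GDAL-style geotransform, builds one possible query for it, and counts how many round trips a full block-by-block read needs. The query text and the function names are hypothetical (the driver's actual SQL may differ); a north-up geotransform with no rotation is assumed.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct Envelope { double minX, minY, maxX, maxY; };

// Georeferenced box of block (nBlockXOff, nBlockYOff) for a geotransform
// gt[6] with no rotation and gt[5] < 0 (north-up).
Envelope blockEnvelope(const double gt[6], int nBlockXSize, int nBlockYSize,
                       int nBlockXOff, int nBlockYOff) {
    assert(gt[2] == 0.0 && gt[4] == 0.0);
    double left = gt[0] + static_cast<double>(nBlockXOff) * nBlockXSize * gt[1];
    double top = gt[3] + static_cast<double>(nBlockYOff) * nBlockYSize * gt[5];
    double right = left + nBlockXSize * gt[1];
    double bottom = top + nBlockYSize * gt[5];
    return {left, bottom, right, top};
}

// Hypothetical per-block query: fetch the tile whose envelope contains the box.
std::string blockQuery(const std::string& schema, const std::string& table,
                       const std::string& column, const Envelope& e, int srid) {
    std::ostringstream sql;
    sql.precision(17);
    sql << "SELECT " << column << " FROM " << schema << "." << table
        << " WHERE ST_Contains(ST_Envelope(" << column << "), ST_MakeEnvelope("
        << e.minX << ", " << e.minY << ", " << e.maxX << ", " << e.maxY << ", "
        << srid << "))";
    return sql.str();
}

// IReadBlock calls (hence server round trips) to read a whole band.
long roundTripsForFullRead(int rasterXSize, int rasterYSize,
                           int nBlockXSize, int nBlockYSize) {
    long bx = (rasterXSize + nBlockXSize - 1) / nBlockXSize;
    long by = (rasterYSize + nBlockYSize - 1) / nBlockYSize;
    return bx * by;
}
```

For a 10000 x 10000 pixel band with 100 x 100 pixel blocks, `roundTripsForFullRead` gives 10000 round trips, one per block.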

'''Open question''': How to get the needed metadata in the ONE_RASTER_PER_TABLE arrangement? As argued in ticket #497, executing ''ST_Extent'' or ''ST_Metadata'' without limits over a big table can be a really heavy process.

'''Open question''': What should the general r/w algorithm be?
jorgearevalo: I think ''read as much data as you can'' is the right strategy, because it minimizes server round trips. That is: construct a query that, using ''ST_Intersects'', fetches as many rows as possible. This query would be executed in the ''IRasterIO'' method. But I don't know how to choose the geographic limits of the query (how much data is ''as much data as you can''?)
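
One possible answer to this open question, sketched under explicit assumptions: cap each query with a hypothetical byte budget (`maxBytesPerQuery`), split the IRasterIO window into full-width strips that fit that budget, and fetch each strip with a single ST_Intersects query over its envelope. The budget, the strip shape and all the names are illustrative choices, not something the driver defines.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// A pixel window, as passed to IRasterIO (nXOff, nYOff, nXSize, nYSize).
struct Window { int xOff, yOff, xSize, ySize; };
struct Envelope { double minX, minY, maxX, maxY; };

// Split the window into full-width strips whose pixels fit the budget.
std::vector<Window> splitWindow(const Window& w, int bytesPerPixel,
                                long long maxBytesPerQuery) {
    std::vector<Window> strips;
    if (w.xSize <= 0 || w.ySize <= 0 || bytesPerPixel <= 0) return strips;
    long long rowBytes = static_cast<long long>(w.xSize) * bytesPerPixel;
    // At least one row per strip (even if one row exceeds the budget),
    // at most the whole window.
    long long rows = std::max(1LL, maxBytesPerQuery / rowBytes);
    int rowsPerStrip = static_cast<int>(std::min(rows, static_cast<long long>(w.ySize)));
    for (int y = 0; y < w.ySize; y += rowsPerStrip) {
        int h = std::min(rowsPerStrip, w.ySize - y);
        strips.push_back({w.xOff, w.yOff + y, w.xSize, h});
    }
    return strips;
}

// Georeferenced envelope of a window (north-up, no rotation assumed).
Envelope windowEnvelope(const double gt[6], const Window& w) {
    assert(gt[2] == 0.0 && gt[4] == 0.0);
    double left = gt[0] + w.xOff * gt[1];
    double top = gt[3] + w.yOff * gt[5];
    return {left, top + w.ySize * gt[5], left + w.xSize * gt[1], top};
}

// Hypothetical query fetching every tile that intersects one strip.
std::string intersectsQuery(const std::string& schema, const std::string& table,
                            const std::string& column, const Envelope& e, int srid) {
    std::ostringstream sql;
    sql.precision(17);
    sql << "SELECT " << column << " FROM " << schema << "." << table
        << " WHERE ST_Intersects(" << column << ", ST_MakeEnvelope("
        << e.minX << ", " << e.minY << ", " << e.maxX << ", " << e.maxY << ", "
        << srid << "))";
    return sql.str();
}
```

With a 1000 x 1000 window of 4-byte pixels and a 400000-byte budget, `splitWindow` returns 10 strips of 100 rows, i.e. 10 queries. Choosing the budget is exactly the open question above; a fixed value is only a placeholder.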

----