Version 12 (modified by jorgearevalo, 15 years ago)

Implementation of read-only GDAL driver for WKT Raster extension to PostGIS

This is one of the selected projects for Google Summer of Code 2009. Links to the weekly reports will be posted here during the project development, as well as useful information and conclusions.

Weekly reports

Weekly report #1 (23/05 - 29/05)
Weekly report #2 (29/05 - 05/06)
Weekly report #3 (05/06 - 12/06)
Weekly report #4 (12/06 - 19/06)
Weekly report #5 (19/06 - 26/06)

General overview

The main goal of this project is to create a new type of raster driver in the GDAL library. This driver will deal with a new type of data: the new PostGIS WKT Raster type (an extension of PostGIS aiming at developing support for raster).

The first issue is to match the GDAL Dataset architecture with the new WKT Raster type. So, basically, what is a "WKT Raster"?

  • A 'complete' image.
  • An image 'tile'.
  • A raster object. A new type of object, resulting from the rasterization of a vector coverage.

And a WKT Raster always has:

  • One or more raster bands.
  • Associated metadata, which includes georeference information.

Now, what is a GDAL Dataset? An assembly of related raster bands and some information common to them all (metadata)

So, the relation between "WKT Raster object" and "GDAL Dataset" seems to be very clear.

But there is an important issue here. The WKT Raster objects will be stored in PostgreSQL tables. So, a table with a column of type WKT Raster may be seen as:

  a. An image warehouse of untiled and (possibly) unrelated images.
  b. An irregularly tiled raster coverage.
  c. A regularly tiled raster coverage.
  d. A rectangular regularly tiled raster coverage.
  e. A tiled image.
  f. A raster object coverage resulting from the rasterization of a vector coverage.

Options c, d and e should be the easiest ones for the GDAL driver to read. They are rasters with a "regular blocking" structure. When a raster layer of one of these types is loaded, the following holds (list taken from the WKT Raster doc provided by Pierre Racine):

  1. All loaded tiles have the same width and height,
  2. All tiles do not overlap and their upper left corners follow a regular block grid,
  3. The global extent of the layer is rectangular and not rotated.
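
The first two conditions can be checked mechanically. The following is a minimal sketch, not part of the driver: the `Tile` record and the `is_regularly_blocked` function are hypothetical names, tiles are described in pixel coordinates only, and the grid is assumed to be anchored at the smallest tile corner. Rotation (the third condition) is not modeled here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Tile:
    """Hypothetical tile record: upper-left corner and size, all in pixels."""
    x: int
    y: int
    width: int
    height: int


def is_regularly_blocked(tiles: Iterable[Tile]) -> bool:
    """Return True if the tiles satisfy conditions 1 and 2 of the list above."""
    tiles = list(tiles)
    if not tiles:
        return False

    # Condition 1: every tile has the same width and height.
    w, h = tiles[0].width, tiles[0].height
    if any(t.width != w or t.height != h for t in tiles):
        return False

    # Condition 2: upper-left corners sit on a regular block grid
    # (assumed to start at the smallest corner) and no grid cell is
    # used twice, which rules out overlaps for equal-sized tiles.
    x0 = min(t.x for t in tiles)
    y0 = min(t.y for t in tiles)
    seen = set()
    for t in tiles:
        if (t.x - x0) % w or (t.y - y0) % h:
            return False
        cell = ((t.x - x0) // w, (t.y - y0) // h)
        if cell in seen:
            return False
        seen.add(cell)
    return True
```

For example, three 10x10 tiles at (0, 0), (10, 0) and (0, 10) pass the check, while a tile at (5, 0) next to one at (0, 0) fails it because its corner is off the grid.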

Then, for the basic version of the GDAL WKT Raster driver, we can focus on these tasks (reading rasters of types c, d and e). In any case, all the mandatory and optional tasks are listed in the project plan.

Implementing the Dataset

One GDAL Dataset will be created from a set of tiles read from one or more tables with a WKT Raster column. Yes, we may need to read from more than one table: in the case of tiled images. With this type of image, a single table does not represent a complete coverage; the other images forming the rest of the coverage are stored in other tables of tiled images. This structure is not very practical from a GIS analytical point of view, since any operation applied to the coverage must also be applied to every table.

There are three important methods that must be implemented in the Dataset class:

  • Open: a static method that establishes a connection with PostgreSQL and tries to access the table given as argument. Several checks must be performed here: whether the database has PostGIS support, whether the table (or tables) exists, whether it has a column of raster type, whether it has a GiST index, etc. (Any more?). Additionally, this method will create the RasterBand objects needed for fetching the raster bands' data, and create a pointer to the real data.
  • GetGeoTransform: Fetches the coefficients for transforming between pixel/line (P,L) raster space and projection coordinates (Xp, Yp) space.
  • GetProjectionRef: Fetches the projection coordinate system definition in OpenGIS WKT format. It should be suitable for use with the OGRSpatialReference class.
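
To make the role of GetGeoTransform concrete, here is a minimal sketch of how the six coefficients map pixel/line space to projection space, following the affine convention used by GDAL. The function name `pixel_to_projected` and the sample coefficients are only illustrative.

```python
from typing import Sequence, Tuple


def pixel_to_projected(gt: Sequence[float], pixel: float, line: float) -> Tuple[float, float]:
    """Apply a six-coefficient geotransform to a (pixel, line) position.

    gt[0], gt[3]: projected coordinates of the upper-left corner.
    gt[1], gt[5]: pixel width and pixel height (negative for north-up images).
    gt[2], gt[4]: rotation terms, zero for a north-up image.
    """
    xp = gt[0] + pixel * gt[1] + line * gt[2]
    yp = gt[3] + pixel * gt[4] + line * gt[5]
    return xp, yp


# Illustrative north-up raster with 30-unit pixels (values are made up).
gt = (500000.0, 30.0, 0.0, 4600000.0, 0.0, -30.0)
print(pixel_to_projected(gt, 10, 20))  # (500300.0, 4599400.0)
```

For a raster with regular blocking, a single geotransform of this kind can describe the whole layer, since its global extent is rectangular and not rotated.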

This is for a Dataset based on regular blocking arrangements. With non-regular blocking arrangements, things get more complicated. And how do we know whether the table can be considered a raster coverage with a regular blocking arrangement? By querying the raster_columns table.

This table is a key concept in the WKT Raster world. Like the PostGIS "geometry_columns" table, the PostGIS WKT Raster "raster_columns" table is a way for applications to get a quick overview of which tables have a raster column, and to get the main characteristics (metadata) of the rasters stored in these columns.

One of this table's fields is a boolean field called 'regular_blocking'. If it is set to 'true', then you treat all the rows as normal tiles and create a single GDALDataset. If it is set to 'false', then you should print an error message telling the user to check the "MODE" option. This MODE option could work like this:

-MODE=WHOLE_TABLE, in this case you first query the full extent covered by all the rasters contained in the table, create a 'black' image covering this extent and filled with nodata values, and then 'print' every image into the big raster. This is very similar to the case where the regular_blocking flag is set, but you will have to deal with true georeferences and possible overlaps. It should be very similar to what the gdal_merge.py tool does.

-MODE=FIRST_ROW, in this case you treat only the first row returned by a SQL query. This could be very similar to the 'where' parameter of the GEORASTER GDAL driver.

-MODE=ONE_FILE_PER_ROW, in this case you would read each row as an independent image. I'm not sure, though, that GDAL allows you to do this. A good WKT exporter/dumper would certainly allow it. The result would be one filesystem raster file per row.
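
The decision logic and the WHOLE_TABLE idea can be sketched as follows. This is only an illustration under several assumptions: `select_strategy`, `mosaic_whole_table` and the "SINGLE_DATASET" label are hypothetical names, rasters are placed by integer pixel offsets rather than true georeferences, and where rasters overlap the later row simply overwrites the earlier one.

```python
from typing import List, Optional, Tuple

MODES = ("WHOLE_TABLE", "FIRST_ROW", "ONE_FILE_PER_ROW")

# (x offset, y offset, rows of pixel values); a stand-in for one table row.
Raster = Tuple[int, int, List[List[int]]]


def select_strategy(regular_blocking: bool, mode: Optional[str]) -> str:
    """Pick a reading strategy from the regular_blocking flag and MODE option."""
    if regular_blocking:
        return "SINGLE_DATASET"  # all rows are tiles of one dataset
    if mode is None:
        raise ValueError("table is not regularly blocked: check the MODE option")
    if mode not in MODES:
        raise ValueError("unknown MODE: %s" % mode)
    return mode


def mosaic_whole_table(rasters: List[Raster], nodata: int = 0) -> Tuple[int, int, List[List[int]]]:
    """Paste every raster onto a nodata canvas covering their full extent."""
    if not rasters:
        raise ValueError("no rasters to mosaic")
    x_min = min(x for x, _, _ in rasters)
    y_min = min(y for _, y, _ in rasters)
    x_max = max(x + len(px[0]) for x, _, px in rasters)
    y_max = max(y + len(px) for _, y, px in rasters)

    # Start from an image filled with nodata, then 'print' each raster on it.
    canvas = [[nodata] * (x_max - x_min) for _ in range(y_max - y_min)]
    for x, y, px in rasters:
        for r, row in enumerate(px):
            for c, value in enumerate(row):
                canvas[y - y_min + r][x - x_min + c] = value
    return x_min, y_min, canvas
```

With a 2x2 raster at (0, 0) and a 1x1 raster at (3, 1), the canvas is 4 pixels wide and 2 high, and the gap between the two rasters keeps the nodata value.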

Anyway, as we said, I'm going to focus on reading the regular blocking arrangements.

Implementing the RasterBand

With the Dataset, we will be able to access the table (or tables) that form the WKT Raster object, and to know which type of raster we are dealing with. But we still need to read the contents of this raster, that is, to read the raster bands. There are two important methods to implement:

  • Constructor: in this method, the object gets important info, like the band number that it represents, the data type (data size) and the block size.
  • IReadBlock: this method is the one that really needs the WKT Raster data (the image or tile data). It takes as input the offset over the data (in tiled images, this offset is an index to move over the data pointed to by the data field in the Dataset) and a buffer in which to store the block (tile) read.

The full set of possible GDALDataType values is declared in gdal.h and includes GDT_Byte, GDT_UInt16, GDT_Int16 and GDT_Float32, but we will probably only consider GDT_Byte in this project. The block size is used to establish a natural or efficient block size to access the data with. For tiled datasets this will be the size of a tile, while for most other datasets it will be one scanline. So, in our basic reading of WKT Raster types, we should interpret this as the size of a tile.
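
As an illustration of the block-equals-tile idea, here is a minimal sketch of a band that serves blocks from an in-memory tile store. It is not GDAL code: the class `SketchRasterBand` and its methods are hypothetical, the tile store is a plain dictionary keyed by block offsets, and only one byte per pixel (the GDT_Byte case) is handled.

```python
from typing import Dict, Tuple


class SketchRasterBand:
    """Hypothetical stand-in for a raster band whose block is one tile."""

    def __init__(self, band_number: int, block_width: int, block_height: int,
                 tiles: Dict[Tuple[int, int], bytes], nodata: int = 0):
        self.band_number = band_number
        self.block_width = block_width
        self.block_height = block_height
        self.tiles = tiles      # keyed by (block x offset, block y offset)
        self.nodata = nodata    # byte value used for missing tiles

    def pixel_window(self, block_x_off: int, block_y_off: int) -> Tuple[int, int, int, int]:
        """Return (x, y, width, height) in pixels for a block offset."""
        return (block_x_off * self.block_width,
                block_y_off * self.block_height,
                self.block_width, self.block_height)

    def read_block(self, block_x_off: int, block_y_off: int) -> bytes:
        """Return one block of pixel data, or a nodata block if no tile exists."""
        expected = self.block_width * self.block_height  # 1 byte per pixel
        data = self.tiles.get((block_x_off, block_y_off))
        if data is None:
            return bytes([self.nodata]) * expected
        if len(data) != expected:
            raise ValueError("tile size does not match the block size")
        return data
```

In this sketch, a band with 2x2 blocks and a single stored tile at block (0, 0) returns that tile's four bytes for `read_block(0, 0)` and four nodata bytes for any other block.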

An important issue here is how the tiles will finally be handled when fetching the raster data. This is the part about which I have the most questions:

  • Is the data fetched with the RasterBand in HEXWKB format? (I think so.) I should code a method to transform this format into other ones...
  • Do we have to encode the fetched data in PNG/JPEG/TIFF format?
  • May we assume that the tiles stored in the same table are homogeneous? What exactly does 'homogeneous' mean here? Only the same pixel type?
  • How do we manage the fetched data? Simply copy the raster data sequentially into the destination? What if these raster data come from tiles with different pixel types or band configurations?

I think we should settle this and make decisions on these questions. My proposal:

  • Data fetched in HEXWKB format (really?).
  • Encoding to at least one out-db format (TIFF?).
  • Assume that tiles stored in the same table have the same pixel type. We can put them sequentially in the output format (whatever it is).
  • The same if tiles come from different tables. Too risky?
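
If the data does arrive as hexadecimal text, the first step is simply turning it back into bytes. The sketch below shows only that step, for one byte per pixel. It assumes the hex string contains nothing but pixel values in row order, and the name `hex_to_rows` is hypothetical. Any header fields that the actual serialization carries are not modeled here.

```python
from typing import List


def hex_to_rows(hex_text: str, width: int, height: int) -> List[List[int]]:
    """Decode hex text into `height` rows of `width` one-byte pixel values."""
    raw = bytes.fromhex(hex_text)
    if len(raw) != width * height:
        raise ValueError("decoded size does not match width * height")
    return [list(raw[r * width:(r + 1) * width]) for r in range(height)]


print(hex_to_rows("00ff10203040", 3, 2))  # [[0, 255, 16], [32, 48, 64]]
```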

Any suggestions?

Overviews

If the RASTER_COLUMNS "regular_blocking" value is true then "all blocks are equal sized, abutted and non-overlapping, started at top left origin", plus additional constraints. This regular blocking capability raises the possibility of having very large contiguous raster coverages (made up of many individual WKTRaster-s) which, in turn, raises potential performance problems. Other raster formats counter this by having overviews; a concept that is already supported by GDAL.

I think that this driver must support overviews, because I suppose it will be common to have large images stored in the database, which will produce large datasets when read. Anyway, this issue was being discussed until May 20th (http://postgis.refractions.net/pipermail/postgis-devel/2009-May/005629.html), and the final decision seems to be to have an additional table for overviews (this is the way Mateusz' script manages it). To follow this issue: http://trac.osgeo.org/postgis/wiki/WKTRaster/SpecificationWorking01#RASTER_OVERVIEWSMetadataTable. I'll keep an eye on it.
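
To give an idea of how much smaller overviews are, here is a minimal sketch that lists the pixel dimensions of successive overview levels. The power-of-two factors and the stopping rule (stop once the overview fits within `limit` pixels on each side) are assumptions made for illustration. The actual overview table layout is still under discussion, and `overview_sizes` is a hypothetical name.

```python
from typing import List, Tuple


def overview_sizes(width: int, height: int, limit: int = 256) -> List[Tuple[int, int, int]]:
    """Return (factor, width, height) for each power-of-two overview level."""
    if width <= 0 or height <= 0 or limit <= 0:
        raise ValueError("width, height and limit must be positive")
    levels = []
    factor = 1
    w, h = width, height
    while w > limit or h > limit:
        factor *= 2
        # Round up so a partial pixel at the edge is still represented.
        w = -(-width // factor)
        h = -(-height // factor)
        levels.append((factor, w, h))
    return levels


print(overview_sizes(1000, 600))  # [(2, 500, 300), (4, 250, 150)]
```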

Project plan

Participants info

  • Student: Jorge Arévalo (jorgearevalo at gis4free.org)
  • Mentors: Tamas Szekeres, Frank Warmerdam