Context Navigation

Changes between Version 2 and Version 3 of WKTRasterTutorial01

Timestamp:: 06/10/10 13:30:25 (15 years ago)
Author:: pracine
Comment:: —

Legend:

: Unmodified
: Added
: Removed
: Modified

TabularUnified WKTRasterTutorial01

-              v2
+              v3
 Launch a command prompt (or a shell under Linux) and make sure you can execute "shp2pgsql". If this works, you should get the help for the shp2pgsql command. If not, make sure the folder where it is located ("PostgreSQL_Installation_Path/bin) is in you PATH environement variable or simply type the complete path to shp2pgsql like this:
+{{{
         >"c:/Program Files/PostgreSQL/8.X/bin/shp2pgsql"
+}}}
 shp2pgsql works in two steps: The first step is to create a .sql file that you then have to execute using psql, an application which comes with PostgreSQL. To convert the caribou point shapefile in cariboupoint.sql, we do the following:
         >shp2pgsql.exe -s 32198 -I C:\Temp\Pierre\Data\CaribouPoints\cariboupoints.shp > C:\Temp\
+{{{     >shp2pgsql.exe -s 32198 -I C:\Temp\Pierre\Data\CaribouPoints\cariboupoints.shp > C:\Temp\
 Pierre\Data\CaribouPoints\cariboupoints.sql
+}}}
 The -s option tells the loader that the shapefile points are located in the Quebec Lambert spatial reference system which has SRID number 32198 in the PostGIS spatial_ref_sys table.
 …
 You can then execute the sql file with psql like this:
+{{{
         >psql -f C:\Temp\Pierre\Data\CaribouPoints\cariboupoints.sql demodb
+}}}
 == Visualizing the caribou point in OpenJUMP ==
 …
 . In the query field, type the following command:
+{{{
         SELECT id, ST_AsBinary(the_geom)
+        FROM cariboupoints
+        FROM cariboupoints;
+}}}
 You should see your caribou observation point appear in the OpenJUMP map window.
 …
 To display gdal2wktraster.py help, simply do:
+{{{
         >gdal2wktraster.py -h
+}}}
 We downloaded 8 zipped files. If you have 7-Zip installed, you can simply select all the zip files, right-click on them and select "7-Zip->Extract files..." 7-Zip will extract each TIF file in the same folder. You can then create another folder and move them there. Each file is 6000 pixels wide by 6000 pixel high. You can quickly preview the images if you have IrfanView installed. Efficient raster/vector analysis with PostGIS WKT Raster requires the raster files to be split into tiles in the database. Fortunately gdal2wktraster.py has an option to tile the images the size we like. We will tile them into 100x100 pixels resulting in 60 * 60 * 13 = 46800 tiles. gdal2wktraster.py also accepts wildcard so we will be able to load our 8 big raster in one sole command.
 …
 To load the raster files in PostGIS WKT Raster, do something like:
+{{{
         >gdal2wktraster.py -r C:\Temp\Pierre\Data\SRTM\tif\*.tif -t srtm_tiled -s 4326 -k 100x100 -I > C:\Temp\Pierre\Data\SRTM\srtm.sql
+}}}
 The -r option indicate the series of raster we want to load in the database.
 …
 The -I option tells the loader to create a spatial index on the raster tile table. If you forget to add this option you can always add the index by executing a SQL command similar to this one:
+{{{
         CREATE INDEX srtm_tiled_gist_idx ON srtm_tiled USING GIST (ST_ConvexHull(rast));
+}}}
 The result of the gdal2wktraster.py command is a 1.8 GB .sql file produced in 1 minute on my brand new Lenovo X201 labtop (Intel Core i5, 1.17 GHz, 3 GB or RAM :-).
 …
 The same way we loaded the caribou point sql command file, we will load this sql file using psql:
+{{{
         >psql -f C:\Temp\Pierre\Data\SRTM\srtm.sql demodb
+}}}
 This process took less than 4 minutes. You can quickly verify the success of the loading operation by looking at the number of row present in the srtm_tiled table. There should be 46800 rows.
 …
 You can then visualise the extent of each of those 46800 rasters by typing the following command in the OpenJUMP "Run Datastore Query" dialog:
+        SELECT rid, ST_AsBinary(rast::geometry) FROM srtm_tiled
+{{{
+        SELECT rid, ST_AsBinary(rast::geometry) FROM srtm_tiled;
+}}}
 You can also view the complete area covered by the raster coverage as one geometry by doing this:
+        SELECT ST_AsBinary(ST_Buffer(ST_Union(rast::geometry), 0.000001)) FROM srtm_tiled
+{{{
+        SELECT ST_AsBinary(ST_Buffer(ST_Union(rast::geometry), 0.000001)) FROM srtm_tiled;
+}}}
 As you can see, unlike most raster coverage loaded into a GIS, the area covered by this single table raster coverage is not stricly rectangular and the two big missing square areas are not filled with nodata values. Here you have only one table for the whole coverage which is build from many raster files loaded in a single step. You don't need 13 tables to store 13 rasters. We did this with one band SRTM files in TIF format, but it could have been 3 bands BMP files of JPEG and the total size of the coverage does not really matter (PostgreSQL has a limit of 32 Terabytes)...
 …
 If you want to go to the pixel level and verify the integrity of the values associated to a sample of the raster, (thanks to Jorge Arevalo and GDAL) you can view a vectorization of one of the tile (vectorizing all the table would be way too long) in OpenJUMP. Just type the following SQL query in the same query dialog:
+{{{
         SELECT ST_AsBinary((ST_DumpAsPolygons(rast)).geom), (ST_DumpAsPolygons(rast)).val
         FROM srtm_tiled
+        WHERE rid=34
+        WHERE rid=34;
+}}}
 You will notice that a small area in the upper left corner of the tile is missing. This is the nodata value area. Any analysis function in PostGIS WKT Raster takes the nodata value into account. That means those area will be ommited from the operation. For example in the upcoming intersection operation, if one buffer falls in a nodata value area, st_intersection() and st_intersects() will ignore any part of the raster with nodata values possibly resulting in no intersection at all.
 …
 You can verify that your raster has a nodatavalues set and the effective value of the nodata value by doing this query:
+{{{
         SELECT ST_BandHasNodataValue(rast), ST_BandNodataValue(rast)
         FROM srtm_tiled
+        LIMIT 1
+        LIMIT 1;
+}}}
 I you want a query to ignore the nodata value set and treat it like any other value, you can always replace any rast argument with ST_SetBandHasNodataValue(rast, FALSE) like in this query:
+{{{
         SELECT ST_AsBinary((ST_DumpAsPolygons(ST_SetBandHasNodataValue(rast, FALSE) )).geom), (ST_DumpAsPolygons(ST_SetBandHasNodataValue(rast, FALSE))).val
         FROM srtm_tiled
+        WHERE rid=XXX
+        WHERE rid=34;
+}}}
 == Producing a random point table similar to the caribou shapefile ==
 …
 Now that we know the extent of our elevation data, we can produce a table similar to the caribou_point one. For this we will use a function that generate a certain number of point geometry inside a polygon. Copy this function in the pgAdmin III query window and execute it:
+CREATE OR REPLACE FUNCTION ST_RandomPoints(geom geometry, nb int)
+{{{
+    CREATE OR REPLACE FUNCTION ST_RandomPoints(geom geometry, nb int)
     RETURNS SETOF geometry AS
     $$
 …
     $$
     LANGUAGE 'plpgsql' IMMUTABLE STRICT;
+}}}
 You can then generate a fake cariboupoints table with 814 points doing this query:
+{{{
         CREATE TABLE cariboupoints AS
         SELECT generate_series(1,1000) id,
                ST_RandomPoint(the_geom, 1000) the_geom
+        FROM (SELECT ST_Transform(ST_Extent(rast::geometry), 32198) FROM srtm_tiled) foo
+        FROM (SELECT ST_Transform(ST_Extent(rast::geometry), 32198) FROM srtm_tiled) foo;
+}}}
 Similarly you can visualise your own caribou point layer with this OpenJUMP query:
+{{{
         SELECT id, ST_AsBinary(the_geom)
+        FROM cariboupoints
+        FROM cariboupoints;
+}}}
 Now that we have our two coverages loaded in the database and that we are confident about their values and spatial reference, let's start doing some analysis queries...
 …
 This operation is also very straightforward for any casual PostGIS user. The caribou point table is in a spatial reference system suitable for measurements in meters so we can build our buffers in this system and then convert them to the raster one later in order to intersects them. We will make 1 km buffers around our points and save them as a new table. We do this operation in a pgAdmin III query window (not in OpenJUMP):
+{{{
         CREATE TABLE cariboupoint_buffers AS
         SELECT id, ST_Buffer(the_geom, 1000) the_geom
+        FROM cariboupoints
+        FROM cariboupoints;
+}}}
 Again you can display your buffers in OpenJUMP with this query:
+{{{
         SELECT id, ST_AsBinary(the_geom)
+        FROM cariboupoint_buffers
+        FROM cariboupoint_buffers;
+}}}
 We then reproject our buffers to WGS 84 so we can intersect them with our raster coverage:
+{{{
         CREATE TABLE cariboupoint_buffers_wgs AS
         SELECT id, ST_Transform(the_geom, 4326) the_geom
+        FROM cariboupoint_buffers
+        FROM cariboupoint_buffers;
+}}}
 Again the same kind of query to display them in OpenJUMP and confirm that they overlap with the raster coverage and see that now they are a bit squashed:
+{{{
         SELECT id, ST_AsBinary(the_geom)
+        FROM cariboupoint_buffers_wgs
+        FROM cariboupoint_buffers_wgs;
+}}}
 We could have done those two queries in one:
+{{{
         CREATE TABLE cariboupoint_buffers_wgs AS
         SELECT id, ST_Transform(ST_Buffer(the_geom, 1000), 4326) the_geom
+        FROM cariboupoints
+        FROM cariboupoints;
+}}}
 We then create a spatial index on this last table to make sure the next intersection operation will be done as quickly as possible:
+{{{
         CREATE INDEX cariboupoint_buffers_wgs_geom_idx ON cariboupoint_buffers_wgs USING GIST (the_geom);
+}}}
 == Intersecting the caribou buffers with the elevation rasters ==
 Our two coverages are now ready to be intersected. We will use the brand new st_intersection() function and st_intersects() operator which operates directly on raster and geometries by vectorizing only the necessary part of the raster before doing the intersection and is one of the main feature of PostGIS WKT Raster:
+{{{
         CREATE TABLE caribou_srtm_inter AS
         SELECT id,
 …
         FROM cariboupoint_buffers_wgs,
              srtm_tiled
+        WHERE st_intersects(rast, the_geom)
+        WHERE st_intersects(rast, the_geom);
+}}}
 This query takes about 30 minutes to complete. It can greatly be optimized by invoquing the st_intersection only once with the help of a subquery producing record values that we split later in the main query like this:
+{{{
         CREATE TABLE caribou_srtm_inter AS
         SELECT id,
 …
                    cariboupoint_buffers_wgs
               WHERE st_intersects(rast, the_geom)
+             ) foo
+             ) foo;
+}}}
 This version takes only 17 minutes. Almost half the time spent by the preceding one... This little increase in complexity certainly worths the gain in performance.
 …
 Once again you can visualise the result of the query by doing this query in OpenJUMP (you might have to increase RAM allowed to OpenJUMP by changing "-Xmx256M" to "-Xmx1024M" in "OpenJUMPInstallPath/bin/openjump.bat"):
+{{{
         SELECT id, val, ST_AsBinary(the_geom)
+        FROM caribou_srtm_inter
+        FROM caribou_srtm_inter;
+}}}
 One approach to make this operation without PostGIS WKT Raster would have been to first convert our 13 rasters to 13 shapefiles, import them in PostGIS and do a traditionnal intersection query between geometries only. The problem is that converting only one of those SRTM file is very difficult... The vector version of only one of those files exceed the limit of 2 GB imposed on shapefiles and the GML equivalent file is more than 10 GB... The WKT Raster approach goes like this: "since it is mostly impossible to convert those raster to vector, load them all into PostGIS as is and only the smallest part of them raster will be vectorized to do the requested operation", this using the very efficient GiST index, the versatility of GDAL and of power of the postGIS vector engine. XXXXXWe will later examine alternative solutions to this particular vectorization problem as well as how we could do the complete raster/vector operation using other software.
 …
 The last step of our analysis is to summarize the elevation for each buffer by computing 814 weighted mean elevations. "Weighted mean elevation" means that if the area occupied by pixels with an elevation equal to 234 m is greater than the area occupy pixels with an elevation equal to 56 m, then 234 has a higher weight in the equation (proportional to it relative area).This query should do the work:
+{{{
         CREATE TABLE result01 AS
         SELECT id,
 …
         FROM caribou_srtm_inter1
         GROUP BY id
+        ORDER BY id
+        ORDER BY id;
+}}}