Opened 14 years ago

Closed 12 years ago

#497 closed task (fixed)

[raster] Start testing displaying PostGIS raster with Mapserver

Reported by: robe Owned by: robe
Priority: medium Milestone: PostGIS 2.0.0
Component: raster Version: master
Keywords: Cc:


The good news is that I started to test wktraster in Mapserver and got something to show up and was in the right spot and about the right size. The bad news is I just got a black box where my raster should have been.

I suspect this is just user error on my part since I haven't tried displaying rasters in mapserver in a really long time. Couple of questions — should the standard pixel syntax supposed to work. Was going to try that next. Also not sure how to create overlays — I tried with my particular example and got some gdal2wktraster errors in line such and such. I presume will need this to improve speed.

I'm going to try next with a simpler image since the one I was using I got a black box too when using gdal_translate to export it out to jpeg.

Anyrate this is the connection string construct I used in case anyone is interested in trying. I presume that part is okay. The other stuff below I actually haven't tried yet, but copied from one of Frank W's examples. Should that syntax work okay or you don't know.

	NAME maraster
	TYPE raster
	DATA "PG:host=localhost port=5432 dbname='postgis_in_action' user='postgres' password='whatever' schema='ch13' table='maraster'"	
	PROCESSING "SCALE=-100.5,100.5"
		NAME "red"
		EXPRESSION ([pixel] < 20)
		COLOR 255 0 0
		NAME "green"
		EXPRESSION ([pixel] >= 1000 and [pixel] < 2000)
		COLOR 0 255 0

Change History (22)

comment:1 by robe, 14 years ago

IT WORKS IT WORKS. I was able to get my BIL file that I loaded in the db to display in Mapserver. Even setting color classes works and speed is not too bad. Sure will be better once I get the over view stuff going.

Anyrate the blank screen issue I was having seems to be caused by gdal how its handling my jpegs/pngs imported so not a mapsever issue at all.

comment:2 by pracine, 14 years ago

Please (if you have time???) try compare the performance of those three scenarios:

1) Serving the raster directly from the filesystem (without storing them in PostGIS)

2) Serving the raster when it is stored in WKT Raster (in-db)

3) Serving the raster when it is only referenced by WKT Raster (out-db). This scenario implies: that the gdal wkt raster driver is able to recognize that the raster is stored out-db, and that it can then "morphe" itself into the right driver (raster could be stored as tiff in the filesystem). I would be amazed if it already does that… Further more if the raster is tiles, the gdal driver must be able to load all the necessary tiles as different rasters.

Scenario 1) and 3) should have very similar performance.

Scenario 2) should be slower but is very convenient as everything is stored in the DB.

For sure a real scenario implies that the raster coverage is highly tiled.

comment:3 by pracine, 14 years ago

Milestone: WKTRaster 0.1.6WKTRaster Future

comment:4 by pracine, 14 years ago

Milestone: PostGIS Raster FuturePostGIS 2.0.0

Maybe we could add what we know up to now in the doc for PostGIS 2.0?

comment:5 by pracine, 14 years ago

Owner: changed from pracine to robe

comment:6 by pracine, 14 years ago

Summary: [wktraster] Start testing displaying PostGIS raster with Mapserver[raster] Start testing displaying PostGIS raster with Mapserver

comment:7 by robe, 13 years ago

I started to test this earnestly and as expected its kind of slow to put. Though it does eventually render. Of course I picked the cruelest of tests — trying to load aerials.

My layer looks something like this and I was pulling via OpenLayers.

	  NAME boston_aerials
	  TYPE raster
	  DATA "PG:host='localhost' port='5432' dbname='ma' user='ma' schema='aerials' table='o_2_boston' mode='2'"  

On inspecting the queries it seems to be doing some very intensive stupid things, but it does also do make box queries but using ~ instead of && which we were planning on deprecating and irradicating.

so basically these are the two kinds of queries I am seeing as I zoom on the map.

SELECT rid, rast FROM aerials.o_2_boston WHERE rast ~ ST_SETSRID(ST_makebox2d(st_point(225866,892151), st_point(225926,892211)),26986);

select (*, foo.rid from (select rid, st_metadata(rast) as md                         from aerials.o_2_boston) as foo

why does it need to pull all the meta data for the whole table? I'll do some more logging to make sure its not just that the query is being cut off and I'm not seeing the whole thing.

I'm using a couple of days old GDAL trunk with mapserver I believe.

in reply to:  7 comment:8 by jorgearevalo, 13 years ago

Replying to robe:

On inspecting the queries it seems to be doing some very intensive stupid things, but it does also do make box queries but using ~ instead of && which we were planning on deprecating and irradicating.

SELECT rid, rast FROM aerials.o_2_boston WHERE rast ~ ST_SETSRID(ST_makebox2d(st_point(225866,892151), st_point(225926,892211)),26986);

That query is the most important thing to change in the driver, IMHO. GDAL drivers try to read data blocks in a "smart" way (and that's not smart, I know. It's very slow). The GDAL API provides a RasterBlock function (for blocked rasters) and a more generic RasterIO function, harder to implement, but more flexible. The PostGIS Raster driver was built on the ReadBlock function, choosing blocks of the same size of the one you specify with -k option in raster2pgsql. So, basically, each call to ReadBlock fetches one database row.

And which row? As input parameters, you have 2 integers (nXBlockOff, nYBlockOff). You have to transform them into a specific database row. In the simplest case, is reasonable to suppose the (nXBlockOff = 0, nYBlockOff = 0) block will match the first raster block in the table, but as each raster table row really is a separated raster, you can't ensure that. The "mode" can tell the driver the expected arrangement for this case is "ONE_RASTER_PER_BLOCK" or "ONE_RASTER_PER_TABLE", and the simplest case is an specific case of the last one. I should use a smarter query in that case (the 0,0 block matches the row with rid = 0, and so on…)

Anyway, one server round per readblock call is a really slow way to fetch the data. I think I need to implement my own RasterIO function, to solve this. And yet, the other related problem to solve is: how to fetch the data? One server round per block (table row) is slow, but how safe is to suppose a bunch of rows are arranged along a grid? You need to get each block's georeference information to be sure. And that's slow.

Comments welcome. And if I'm talking non-sense, please let me know.

select (*, foo.rid from (select rid, st_metadata(rast) as md                         from aerials.o_2_boston) as foo

why does it need to pull all the meta data for the whole table? I'll do some more logging to make sure its not just that the query is being cut off and I'm not seeing the whole thing.

No, you don't need more logging. This query is used to fill the GDALDataset properties with the raster data. With "mode = 2", you're in ONE_RASTER_PER_TABLE mode. So, all the table will be considered as a raster and will match one single GDALDataset object.

Anyway, I should use "DISTINCT" here, to detect if all the raster rows have the same srid and snap to the same grid. I'll change it.

I'm using a couple of days old GDAL trunk with mapserver I believe.

That's the last version of GDAL driver. OK.

comment:9 by robe, 13 years ago

Jorge — I'm actually wondering if you should have another mode.

I guess the question is what do you need from the meta data. Doing a simple distinct won't do it and would be worse because it would still return many records since each tile has a different upperleftx,y and actually doing a DISTINCT even when I get rid of all of that still takes like 10 seconds to read my hires table (which is a small sample of 4 tiles and 20,000 records) (and 2 seconds for my low res). It gets exponentially worse the bigger your table. It really should be linear according to the number of tiles you are pulling (not the number of records in your table). So I'm really much more bothered about the meta data call than the tile call (that aside from the use of ~ seems like as efficient as it can be without too much fuss and for my case my OpenLayer tile sizes match my block size, so it would be making separate calls anyway via mapserver) because I suspect that is a good chunk of where the time is going.

I think the general use case is that people will have a folder of rasters that form a non-overlapping coverage like my case and bring them into the table. You can't really rely on the rid # either since in most cases (at least for mine), I may be bringing them in piecemeal like one town at a time or one state at a time especially for huge things where I need to provision disk space but I want people to see as a contiguous layer on the map.

But more often than not the table will have the same SRID — actually it has to because we enforce it with the SRID constraint during add.

Does gdal ever look in the raster_columns table. Part of my problem is that because the raster2pgsql didn't add the rast column to my overviews — it was simpler for me to just type in rast for column than put in an AddRasterColumn call.

comment:10 by robe, 13 years ago

Actually what I was thinking is you could use the window ROW_NUMBER() function to simulate contiguous rids for the range. Does that work (or maybe I don't understand the problem).

That would mean getting rid of support for PostgreSQL 8.3 (at least for rasters) which we are planning to do for PostGIS 2.0 anyway.

comment:11 by jorgearevalo, 13 years ago

I need to construct a GDALDataset object, so, I need the raster metadata. The driver considers 2 working modes:

ONE_RASTER_PER_ROW: The table is a store of not necessarily related raster objects. You need to specify which row you want in "where" clause of connection string. In that case, you only need metadata for one row, of course.

ONE_RASTER_PER_TABLE: All the table's rows are expected to form a coverage. Things go difficult here because you don't know if all the rows have the same srid, snap to the same grid and do not overlap. Anyway, in that case you need the metadata of your whole raster coverage. How would you get this?

My rough approach was to get metadata of all raster objects. In the ideal case, they all share srid, size, pixel size and skew. You can calculate the extent of the raster. That works for the ideal case, but this is slow, and doesn't take into account other raster arrangements. If not all the rows share grid alignment, I should resample all the rows to the grid of the first one, or provide a new grid alingment for all the rows.

Then, constructing a GDALDataset can be a very slow task, as you have experienced. How would be the better way to get the metadata for a whole raster coverage when you only have the metadata of each separated block and you don't know if all the blocks snap to the same grid or not?

About the ROW_NUMBER issue, I'll try to explain it better. GDAL only knows about "blocks" of data. When you create a new GDALDataset, you can say "my raster data have x pixels width and y pixels height, and is blocked. Each block has n pixels width and m pixels height". Then, GDAL core calculates how many ReadBlock calls needs. In each call, x and y offset are provided, based in the calculated number of blocks and the raster data size.

It's driver's responsibility to match that offsets with one raster block. I can't use "rid" because I don't know it offset (0, 0) matches the raster block with rid 0, and so on. I'm using that offset (0, 0) matches the raster block that starts in pixel 0, 0. With the georeference information and a spatial query, I can get the raster data block.

Of course, I am assuming all the raster blocks have the same dimensions, and this is not a requisite. The driver must deal with this problem too.

About raster_columns, not, the driver doesn't use it.

comment:12 by robe, 13 years ago

Jorge, I'm still not quite understanding . I'm still not quite clear why you need to pull the whole tables metadata (even with a distinct on those elements — which is pretty slow too). Wouldn't it be sufficient to pull the records that overlap the region of interest or even do a limit 1 call for the meta data? Or is the issue because you need to start a a pixel (0,0) and need to know its offset from that?

If I use mode='2' I was assuming that meant to create one GDALRaster dataset and assume that by saying mode='2' I'm basically saying its evenly blocked, it has the same srid and there are no overlapping tiles. otherwise I'd want multiple datasets.

If that is not the case then like I said we should introduce a mode='3' since I really think this is the most common use case.

That said I would think you could reduce your first call to

SELECT (, ( , etc.
FROM (select st_metadata(rast) as md   from aerials.o_2_boston WHERE rast && 
LIMIT 1) as foo }}}

or at the very list put in a rast && somebbox call in there.

In the second query you do a spatial range anyway so I would expect the metadata call to do the same is all. I'll play around with not tiling and see if it is better.

I think part of the issue of the slowness in this particular usecase I am testing is that when using OpenLayers I normally use the buffer tiles mode so the responsiveness is better (e.g. if you have a 600x600 screen — it will break it into several calls so you can see the screen loading — so I might have like 12 calls of 100x100 pixelsize each. So the fact you are pulling the meta data of the whole table on each call is really killing the performance.

comment:13 by jorgearevalo, 13 years ago

We're mixing 2 different queries here:

The metadata query is only used when creating the GDALDataset object, before querying for any data block. Just to inform GDAL about the properties of the raster. With "mode = 2", the driver expect 1 table = 1 raster coverage = 1 GDALDataset object, and it needs to know some things about the raster, like georreference (of the WHOLE coverage, not only 1 row), extent, srid (again, of the whole coverage). For this reason, I execute a query to fetch the metadata of all rows. I could limit the query to 1 result, like you said, but we have a problem here:

If I use mode='2' I was assuming that meant to create one GDALRaster dataset and assume that by saying mode='2' I'm basically saying its evenly blocked, it has the same srid and there are no overlapping tiles.

The driver expects that, but if the raster is not evenly blocked, if there're rows with different srid… it must do something. For example: warn the user and exit, or resample all the rows to the grid of the first one, or ask for a new grid alingment.

Of course, the first option is the easiest one, but the point is the driver still needs to know it. It must check if everything is just like expected (raster evenly blocked, all tiles with same srid, no overlapping tiles). It must get the metadata of all tiles, to check it. Or am I wrong with my approach?

Apart from this, a different query is the query executed when the driver really asks for data. No metadata involved in this query. Only x,y offsets that must be converted to a specific data block (using georeference information and a spatial operator, like &&, or ~). I argue the query I'm using is too slow, because implies one server round per readblock call.

SELECT rid, rast FROM aerials.o_2_boston WHERE rast ~ ST_SETSRID(ST_makebox2d(st_point(225866,892151), st_point(225926,892211)),26986);

comment:14 by pracine, 13 years ago

1) There is a confusion here with srid. srid have nothing to do with regular blocking or irregular blocking. The driver should assume that every row is in the same srid. You REALLY don't have to fetch all rows to know this. If by bad table design or error one or many raster get in a different srid, meaning the georeference would be in different unit or just very far from where it should be, then the extent of the whole table might be wrong and those rasters certainly will be wrongly geolocated. That`s life, we can do nothing againts it and the same happen with a geometry table with different srid. We do not support this because it is very bad user design… The same idea apply for pixelsize.

If really we want to support these kind of stupid arrangement (with different srid or different pixelsize) later then we will have to have really good reason to do so.

2) There are two ways to know the global extent or a table. a) Fetch the raster_column table b) Query the table with ST_Extent

Method a) is becoming less and less reliable. See ticket #542.

Method b) is very fast when there is an index on the table (which we should also assume since creating a table with many tile without creating an index would be foolish)

3) In mode = 2, the driver should first query the extent with ST_Extent and then just query all the tiles accompanied by their respective metadata in a unique query

4) You certainly don't need everything provided by ST_Metadata. You should query every metadata item with their respective function. ST_Metadata by itself is slow because the whole tile has to be red. If you get metadata items while you fetch the tiles it should be much faster since the server has to load the tile anyway. It will give you the metadata item for free = no time

5) To know if the table is regularly blocked is another story. From the beginning I have been against this concept for reason examplified by ticket 542. My point is: Just fetch the tile you need alltogether and burn them in your raster buffer using the georeference. If this is not possible with RasterBlock, then do it with RasterIO…

I think we need a clear an detailed specification document for the driver that we can discuss on. We haven't got one yet…

comment:15 by robe, 13 years ago

Pierre, I think I'm closer to your thinking that Jorge's. What I thought Jorge was saying was that he has to do an individual call for each block of data which I argue is way better than what he is actually doing with that ugly metadata call.

My assumption how this worked which I'm beginning to think I might be wrong is you should never have to do an ST_Extent of the whole table. Pierre an extent of a million record table ain't that fast either.

The way I envisioned how images were rendered in GDAL was

1) You request for the bounding box of you want. If you don't then it would do an extent of the whole table or (if you did a filter, the extent of your where clause).

e.g. Lets say I have a town with town_id = 1 that I want data for then

SELECT ST_Extent(rast) from raster_table where town_id = 1

should be sufficient.

As Pierre said I would assume at the very least the srid of all the rasters were the same.

So I could even do — though it looks yuckier — probably much faste

SELECT ST_Extent(rast), (* FROM raster_table CROSS JOIN (SELECT (ST_MetaData(rast)) As md fROM raster_table WHERE town_id =1 LIMIT 1) As foo where town_id = 1

Knowing the width/height of each raster and srid and extent of the rasters I'm asking for are the same would be sufficient to build the structure of the GDALDataset to populate with my rasters

2) Then you do the call for the rasters. My understanding is that while you could do this as a single call (since each raster has its meta data), Jorge has to do two calls because the first call needs to create the GDALDataset local memory table to hold the blocks and needs to know the extent of the blocks, and the second call does a call for each block.

I think what he is saying is that is a separate call for each rid. Which is actually fine and really not where I'm seeing the time spent. The pain I see is the meta data call.

3) It woulds sequential add the tiles and them clip to the extent of the area that was asked for. Possibly doing a query for each rid (which is may not be that slow because I figure for a good size — you may do 24 calls — I do 1000 calls of stuff in less than a second (granted these will be heavier))

What I was arguing which maybe it was lost, is that in modern web mapping, when you need to pull raster, and you have a 600x800 or whatever screen — you rarely ask for a 600x800 raster.

What the javascript (e.g. OpenLayers) does is says — I want to chop this up into (my mastery of elemental math is really bad so bare with me I can only deal with as , bs and xs)

Okay it will say — I need to make 24 calls of 200x400 size rasters (some are buffered calls not shown on the screen and some are loaded) — to give a seamless slippery map feel.

So what ends up happening is that meta data call you have gets called 24 times and in some(I don't even have any raster data in that location). If I did a bounding query it would be fast, but doing an extent of a 1 million record table is really slow. It really isn't cacheable because its not done as a single call with (what PostgreSQL would consider static data).

comment:16 by pracine, 13 years ago

I still think we need a detailled document. Everybody is assuming (or not assuming) too many things here and the discussion is somewhat confuse.

We need to know:

-Briefly what does the GDAL driver needs and does -What are the planned strategies to answer the driver needs -A progressive, step by step, implementation plan

All raster arrangements should be taken into account even if we plan to support them much later.

comment:17 by jorgearevalo, 13 years ago

Working on GDAL driver specs, for the wiki. Better place to discuss it.

comment:18 by jorgearevalo, 13 years ago

New wiki page created. I'm going to add the link to Planning&Funding page. Anyway, take a look:

comment:19 by pracine, 13 years ago

Great! I added some comments/questions…

comment:20 by pramsey, 12 years ago

This ticket feels ready to close…

comment:21 by robe, 12 years ago

Nope not ready. Jorge still needs to make fixes to teh driver and I'll retest after he has done that. Right now its pretty broken.

comment:22 by robe, 12 years ago

Resolution: fixed
Status: newclosed

okay as Pierre pointed out the outstanding issue is documented in #1422 so I guess we can close this out.

Note: See TracTickets for help on using tickets.