Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#5046 closed defect (fixed)

PostGis RasterDriver : Dataset Open taking too much time

Reported by: ryagz Owned by: jorgearevalo
Priority: high Milestone: 1.10.1
Component: GDAL_Raster Version: 1.10.0
Severity: major Keywords:
Cc: jorgearevalo

Description

I am having a postgis raster table which has raster data about 40 GB, when i try to load the dataset its taking too much time (about 40 mins) in the function "GDALDataset* PostGISRasterDataset::Open(GDALOpenInfo* poOpenInfo) " .

When i digged into the code i found out that in the function "GBool PostGISRasterDataset::SetRasterProperties(const char * pszValidConnectionString)"

there is an sql query issued on the database to get raster properties

if (pszWhere == NULL) {

osCommand.Printf(

"select srid, nbband, st_xmin(geom) as xmin, st_xmax(geom) as xmax, " "st_ymin(geom) as ymin, st_ymax(geom) as ymax from (select st_srid(%s) srid, " "st_extent(%s::geometry) geom, max(ST_NumBands(rast)) nbband from %s.%s " "group by st_srid(%s)) foo", pszColumn, pszColumn, pszSchema, pszTable, pszColumn);

}

else {

osCommand.Printf(

"select srid, nbband, st_xmin(geom) as xmin, st_xmax(geom) as xmax, " "st_ymin(geom) as ymin, st_ymax(geom) as ymax from (select st_srid(%s) srid, " "st_extent(%s::geometry) geom, max(ST_NumBands(rast)) nbband from %s.%s " "where %s group by st_srid(%s)) foo", pszColumn, pszColumn, pszSchema, pszTable, pszWhere, pszColumn);

}

This sql query is causing all the delay because its going to scan the entire table and get the raster properties. One idea what i would like to propose is if the user has already applied the function AddRasterConstraints on the raster table then all the raster properties which are required will be readily available in raster_columns table which is present in public schema.

So for the cases where RasterConstraints is already applied we could have the sql query some thing like this:

if (pszWhere == NULL) {

osCommand.Printf(

"select srid, nbband, st_xmin(geom) as xmin, st_xmax(geom) as xmax, " "st_ymin(geom) as ymin, st_ymax(geom) as ymax from (select srid srid, " "extent geom, num_bands nbband from raster_columns where r_table_schema = '%s' " "and r_table_name = '%s') foo", pszSchema, pszTable);

}

else {

osCommand.Printf(

"select srid, nbband, st_xmin(geom) as xmin, st_xmax(geom) as xmax, " "st_ymin(geom) as ymin, st_ymax(geom) as ymax from (select srid srid, " "extent geom, num_bands nbband from raster_columns where r_table_schema = '%s' " "and r_table_name = '%s' where %s) foo", pszSchema, pszTable, pszWhere);

}

I have tested the above mentioned sql query with the same table of 40GB size, the Dataset open speed has drastically improved, Now Dataset Open happens in couple of seconds.

I guess we could have a if else condition, that is if raster properties are available in raster_columns table use SQL Query 2 else use SQL Query 1 mentioned above.

Change History (11)

comment:1 by jorgearevalo, 11 years ago

Owner: changed from warmerdam to jorgearevalo
Status: newassigned

comment:2 by jorgearevalo, 11 years ago

Milestone: 1.10.1

comment:3 by jorgearevalo, 11 years ago

Version: svn-trunk1.10.0

comment:4 by jorgearevalo, 11 years ago

Resolution: fixed
Status: assignedclosed

Fixed in branches/1.10 (r25999) and trunk (r26000)

comment:5 by pracine, 11 years ago

It is not as much the size of the data that is important here as the number of rows (tiles) in the table. How many rows are there in this table?

comment:6 by jorgearevalo, 11 years ago

I agree. I'm having problems with a table of 1251 rows. Even solving this ticket, any time you want the metadata of all tiles (not even the data itself) and you scan the whole table, it's really slow.

comment:7 by pracine, 11 years ago

I do this query on a table of 300000 row in less than one second...

comment:8 by pracine, 11 years ago

What if you fetch only the extent? What if you do it with PgAdmin (or through another client)? The idea here is it seems strange that such a query take so long. Try to simplify it in order to find what make it slow.

comment:9 by jorgearevalo, 11 years ago

Are you talking about this query (reads from raster_columns)?

select srid, nbband, st_xmin(geom) as xmin, st_xmax(geom) as xmax, st_ymin(geom) as ymin, st_ymax(geom) as ymax from (select srid srid, extent geom, num_bands nbband from raster_columns where r_table_schema = '%s' and r_table_name = '%s' where %s) foo, pszSchema, pszTable, pszWhere);

Because that is fast. The slow one should be the other query (it is for me)

comment:10 by pracine, 11 years ago

I'm speaking about the first one above (not relying on raster_column).

comment:11 by jorgearevalo, 11 years ago

Interesting... That query really scan the whole table, fetching some metadata of all tiles. I understand if it is slow. What indices do you have in your table?

Note: See TracTickets for help on using tickets.