#5046 closed defect (fixed)
PostGis RasterDriver : Dataset Open taking too much time
Reported by: | ryagz | Owned by: | jorgearevalo |
---|---|---|---|
Priority: | high | Milestone: | 1.10.1 |
Component: | GDAL_Raster | Version: | 1.10.0 |
Severity: | major | Keywords: | |
Cc: | jorgearevalo |
Description
I am having a postgis raster table which has raster data about 40 GB, when i try to load the dataset its taking too much time (about 40 mins) in the function "GDALDataset* PostGISRasterDataset::Open(GDALOpenInfo* poOpenInfo) " .
When i digged into the code i found out that in the function "GBool PostGISRasterDataset::SetRasterProperties(const char * pszValidConnectionString)"
there is an sql query issued on the database to get raster properties
if (pszWhere == NULL) {
osCommand.Printf(
"select srid, nbband, st_xmin(geom) as xmin, st_xmax(geom) as xmax, " "st_ymin(geom) as ymin, st_ymax(geom) as ymax from (select st_srid(%s) srid, " "st_extent(%s::geometry) geom, max(ST_NumBands(rast)) nbband from %s.%s " "group by st_srid(%s)) foo", pszColumn, pszColumn, pszSchema, pszTable, pszColumn);
}
else {
osCommand.Printf(
"select srid, nbband, st_xmin(geom) as xmin, st_xmax(geom) as xmax, " "st_ymin(geom) as ymin, st_ymax(geom) as ymax from (select st_srid(%s) srid, " "st_extent(%s::geometry) geom, max(ST_NumBands(rast)) nbband from %s.%s " "where %s group by st_srid(%s)) foo", pszColumn, pszColumn, pszSchema, pszTable, pszWhere, pszColumn);
}
This sql query is causing all the delay because its going to scan the entire table and get the raster properties. One idea what i would like to propose is if the user has already applied the function AddRasterConstraints on the raster table then all the raster properties which are required will be readily available in raster_columns table which is present in public schema.
So for the cases where RasterConstraints is already applied we could have the sql query some thing like this:
if (pszWhere == NULL) {
osCommand.Printf(
"select srid, nbband, st_xmin(geom) as xmin, st_xmax(geom) as xmax, " "st_ymin(geom) as ymin, st_ymax(geom) as ymax from (select srid srid, " "extent geom, num_bands nbband from raster_columns where r_table_schema = '%s' " "and r_table_name = '%s') foo", pszSchema, pszTable);
}
else {
osCommand.Printf(
"select srid, nbband, st_xmin(geom) as xmin, st_xmax(geom) as xmax, " "st_ymin(geom) as ymin, st_ymax(geom) as ymax from (select srid srid, " "extent geom, num_bands nbband from raster_columns where r_table_schema = '%s' " "and r_table_name = '%s' where %s) foo", pszSchema, pszTable, pszWhere);
}
I have tested the above mentioned sql query with the same table of 40GB size, the Dataset open speed has drastically improved, Now Dataset Open happens in couple of seconds.
I guess we could have a if else condition, that is if raster properties are available in raster_columns table use SQL Query 2 else use SQL Query 1 mentioned above.
Change History (11)
comment:1 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:2 by , 11 years ago
Milestone: | → 1.10.1 |
---|
comment:3 by , 11 years ago
Version: | svn-trunk → 1.10.0 |
---|
comment:4 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:5 by , 11 years ago
It is not as much the size of the data that is important here as the number of rows (tiles) in the table. How many rows are there in this table?
comment:6 by , 11 years ago
I agree. I'm having problems with a table of 1251 rows. Even solving this ticket, any time you want the metadata of all tiles (not even the data itself) and you scan the whole table, it's really slow.
comment:8 by , 11 years ago
What if you fetch only the extent? What if you do it with PgAdmin (or through another client)? The idea here is it seems strange that such a query take so long. Try to simplify it in order to find what make it slow.
comment:9 by , 11 years ago
Are you talking about this query (reads from raster_columns)?
select srid, nbband, st_xmin(geom) as xmin, st_xmax(geom) as xmax, st_ymin(geom) as ymin, st_ymax(geom) as ymax from (select srid srid, extent geom, num_bands nbband from raster_columns where r_table_schema = '%s' and r_table_name = '%s' where %s) foo, pszSchema, pszTable, pszWhere);
Because that is fast. The slow one should be the other query (it is for me)
comment:11 by , 11 years ago
Interesting... That query really scan the whole table, fetching some metadata of all tiles. I understand if it is slow. What indices do you have in your table?
Fixed in branches/1.10 (r25999) and trunk (r26000)