Opened 22 months ago

Last modified 22 months ago

#5176 new defect

no data check in raster2pgsql and also ST_BandIsNoData are slow

Reported by: robe Owned by: robe
Priority: medium Milestone: PostGIS Fund Me
Component: raster Version: master
Keywords: Cc:

Description

I suspect it's too late to do anything about this in 3.3 and maybe nothing can be done.

In PostGIS 3.2 we added the -k to raster2pgsql to allow skipping no data check. From PostGIS 3.2 on, the no data check is automatically done unless -k is specified.

I am finding that this check is taking quite a long time for large files. By large I mean like a tiff of 300MB or more.

For example

I've been trying to output to disk a raster2pgsql for a 500 MB file which gdalinfo shows this:

Size is 637200, 270000
Coordinate System is:
:
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (-125.000000000000000,49.000000000000000)
Pixel Size = (0.000092592592593,-0.000092592592593)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (-125.0000000,  49.0000000) (125d 0' 0.00"W, 49d 0' 0.00"N)
Lower Left  (-125.0000000,  24.0000000) (125d 0' 0.00"W, 24d 0' 0.00"N)
Upper Right ( -66.0000000,  49.0000000) ( 66d 0' 0.00"W, 49d 0' 0.00"N)
Lower Right ( -66.0000000,  24.0000000) ( 66d 0' 0.00"W, 24d 0' 0.00"N)
Center      ( -95.5000000,  36.5000000) ( 95d30' 0.00"W, 36d30' 0.00"N)
Band 1 Block=637200x1 Type=Byte, ColorInterp=Gray
  NoData Value=0

my call looks something like this: My original test started 10 hrs ago hasn't spit out anything yet. I assume it is still stuck in the no data check loop.

raster2pgsql -I -t auto -Y 15000 -e file.tif >> test2.sql

# this one with the -k option started spitting out to disk within 5 minutes, after 2 hrs it's at 8GB

raster2pgsql -k -I -t auto -Y 15000 -e file.tif >> test2.sql

I was thinking that for these kinds of cases, perhaps loading all the junk in and purging the no data rows would be faster, but sadly

http://postgis.net/docs/RT_ST_BandIsNoData.html Doing something like below to force the check is equally slow.

SELECT ST_BandIsNoData(rast, true)
FROM sometable

Change History (2)

comment:1 by robe, 22 months ago

This isn't quite as bad of an issue as I thought. The slowness of no data checking seemed to be attributed to the block size.

After converting with:

gdal_translate -a_srs EPSG:4326 -co COMPRESS=LZW -co TILED=YES data/file.tif data/file_retiled.tif

which gives default of 256x256 blocks,

Upper Left  (-125.0000000,  49.0000000) (125d 0' 0.00"W, 49d 0' 0.00"N)
Lower Left  (-125.0000000,  24.0000000) (125d 0' 0.00"W, 24d 0' 0.00"N)
Upper Right ( -66.0000000,  49.0000000) ( 66d 0' 0.00"W, 49d 0' 0.00"N)
Lower Right ( -66.0000000,  24.0000000) ( 66d 0' 0.00"W, 24d 0' 0.00"N)
Center      ( -95.5000000,  36.5000000) ( 95d30' 0.00"W, 36d30' 0.00"N)
Band 1 Block=256x256 Type=Byte, ColorInterp=Gray
  NoData Value=0

then the no data checking finished within about 20 minutes. Still waiting for the load though.

comment:2 by robe, 22 months ago

For this particular file, it was like 95% no data and 5% data. After retiling, the time for the whole process for this took about 41 minutes

# this took 40 minutes
raster2pgsql -I -t 128x128 -Y 15000 -e file_retiled.tif t_128_128 >> test.sql

# this took 75.56 s
psql -f test.sql

I think going direct to PostgreSQL took about 25 minutes, but have to recheck

raster2pgsql -I -t 128x128 -Y 15000 -e file_retiled.tif t_128_128 | psql

So much of the slowness was just the tiff blocksize of Block=637200x1. It never completed so I killed it after 48 hrs.

Still would be nice to get the no data checking timing down a bit, but it does end up much faster doing the no data check than trying to load like 30GB of useless no data data in this case.

Note: See TracTickets for help on using tickets.