Opened 7 years ago

Closed 7 years ago

Last modified 6 years ago

#5828 closed defect (fixed)

gdal_translate performance degrades on wide VRTs with PNG output

Reported by: riveryeti Owned by: warmerdam
Priority: normal Milestone: 2.0.0
Component: GDAL_Raster Version: unspecified
Severity: normal Keywords: vrt
Cc:

Description (last modified by riveryeti)

I am using the commandline gdal_translate -of png -co "ZLEVEL=1" [filename].vrt [filename].png to translate two rasters from VRT to PNG (also converting to tiled tiff, but that works fine).

Given two vrt "rasters" with sizes as follows:

191488,46080 (wide raster)
59392,201728 (tall raster)

performance degrades to about 20bytes/sec on the wide raster when I hit about 30% processing (still at 30% + 1 dot 3 hours later). On the other hand, performance does not noticeably degrade when processing the tall raster, and the entire translation takes about 2-3 minutes.

both vrts are built from 1024x1024 uncompressed TIFF tiles generated using otherwise the same methods/settings/projection/coordinate system/etc.

gdalinfo from "good" vrt below (excluding file-list) vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv

Size is 59392, 201728
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (-123.610326674747300,48.157496099385916)
Pixel Size = (0.000001641647305,-0.000001100758214)
Corner Coordinates:
Upper Left  (-123.6103267,  48.1574961) (123d36'37.18"W, 48d 9'26.99"N)
Lower Left  (-123.6103267,  47.9354423) (123d36'37.18"W, 47d56' 7.59"N)
Upper Right (-123.5128260,  48.1574961) (123d30'46.17"W, 48d 9'26.99"N)
Lower Right (-123.5128260,  47.9354423) (123d30'46.17"W, 47d56' 7.59"N)
Center      (-123.5615763,  48.0464692) (123d33'41.67"W, 48d 2'47.29"N)
Band 1 Block=128x128 Type=Byte, ColorInterp=Red
  NoData Value=0
Band 2 Block=128x128 Type=Byte, ColorInterp=Green
  NoData Value=0
Band 3 Block=128x128 Type=Byte, ColorInterp=Blue
  NoData Value=0
Band 4 Block=128x128 Type=Byte, ColorInterp=Alpha
  NoData Value=0

gdalinfo from "bad" vrt below (excluding file-list) vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv

Size is 191488, 46080
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (-123.657297355231180,48.158143203488351)
Pixel Size = (0.000001374831964,-0.000000920181878)
Corner Coordinates:
Upper Left  (-123.6572974,  48.1581432) (123d39'26.27"W, 48d 9'29.32"N)
Lower Left  (-123.6572974,  48.1157412) (123d39'26.27"W, 48d 6'56.67"N)
Upper Right (-123.3940335,  48.1581432) (123d23'38.52"W, 48d 9'29.32"N)
Lower Right (-123.3940335,  48.1157412) (123d23'38.52"W, 48d 6'56.67"N)
Center      (-123.5256654,  48.1369422) (123d31'32.40"W, 48d 8'12.99"N)
Band 1 Block=128x128 Type=Byte, ColorInterp=Red
  NoData Value=0
Band 2 Block=128x128 Type=Byte, ColorInterp=Green
  NoData Value=0
Band 3 Block=128x128 Type=Byte, ColorInterp=Blue
  NoData Value=0
Band 4 Block=128x128 Type=Byte, ColorInterp=Alpha
  NoData Value=0

end.

Change History (18)

comment:1 Changed 7 years ago by Jukka Rahkonen

Description: modified (diff)

comment:2 Changed 7 years ago by Even Rouault

  • Is it specific to gdal 2.0dev ? (That would be surprising)
  • I think you're confusing wide and tall rasters. Size is 59392, 201728 means "width = 59392 and height = 201728", so this is a tall raster.

It is not completely surprising that wide rasters are slower than tall rasters as copying operates by chunks of lines. You could try increasing the GDAL block cache size to 200 MB for example by setting GDAL_CACHEMAX=200 as environment variable.
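As a rough illustration of why raster width matters for line-oriented copying, the sketch below estimates how many bytes of 128x128 blocks (the block size gdalinfo reports for both VRTs) are needed to hold one full-width row of blocks across all four Byte bands. The helper name block_row_bytes and the assumption that a whole row of blocks must be resident at once are simplifications for illustration, not a description of GDAL's actual cache logic.

```python
import math

def block_row_bytes(width, block_size=128, bands=4, bytes_per_sample=1):
    """Bytes needed to hold one full-width row of blocks for every band."""
    blocks_across = math.ceil(width / block_size)
    return blocks_across * block_size * block_size * bands * bytes_per_sample

# Widths taken from the two gdalinfo reports in the ticket description.
for label, width in (("wide", 191488), ("tall", 59392)):
    mib = block_row_bytes(width) / (1024 * 1024)
    print(f"{label}: {mib:.1f} MiB per row of blocks")
```

Under this simplified model the wide raster needs roughly three times as much cache per row of blocks as the tall one, and a 200 MB cache would cover either.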

comment:3 Changed 7 years ago by riveryeti

Description: modified (diff)

Whoops, thanks rouault for catching that. I mixed up the wide and tall labels. Corrected now. Haven't tested other versions. Did try changing cache size to 256 MB then 2048 MB. Could try larger value too (I have 192 GB RAM). Also tried converting single TIF --> PNG, and never got past 0 pct (only ran for 10 minutes).

comment:4 Changed 7 years ago by riveryeti

Almost 70% done now. For curiosity's sake I started a new cmd instance and tried running gdal_translate after I set GDAL_CACHEMAX=100000 and it still stalled at 30%.

Not sure if it will help with anything, but I have some stats I can relay...

after 42:01:06 of CPU time, here's what Task Manager says:

Memory - working set 134,300 K
Memory - peak working set 156,600 K
Memory - working set delta 0 K
Memory - Private working set 126,756 K
Memory - Commit size 135,520 K
Memory - Paged Pool 188 K
Memory - Non-paged Pool 13 K
Page Faults 61,515
Page Fault delta 0
Base Priority Normal
Handles 130
Threads 1
USER Objects 1
GDI Objects 4
I/O Reads 124,747,204
I/O Writes 1,145,332
I/O Other 2,859,733,901
I/O Read Bytes 500,755,968,154
I/O Write Bytes 3,132,090,337
I/O Other Bytes 11,266,925,531,594

comment:5 Changed 7 years ago by Even Rouault

I've done a small test by creating a fake huge tiled TIFF with python:

from osgeo import gdal
ds = gdal.GetDriverByName('GTiff').Create('huge.tif', 191488, 46080, 4, options = ['TILED=YES', 'SPARSE_OK=YES'])
ds = None

and then: "gdal_translate huge.tif huge.png -of png --config GDAL_CACHEMAX 1000"

A very quick profiling shows that most of the time is spent in PNG encoding, so it is not obvious there's something wrong on GDAL side. (Note: I've committed in r28415 a change that *potentially* could help, although I'm not sure it will make any difference)

comment:6 Changed 7 years ago by riveryeti

So if most of the time is spent on PNG encoding, should I be looking for the cause here?

https://trac.osgeo.org/gdal/browser/trunk/gdal/frmts/grib/degrib18/g2clib-1.0.4/enc_png.c

comment:7 Changed 7 years ago by Even Rouault

No, this is in libpng itself (a copy exists in frmts/png/libpng), but I doubt you can do anything. PNG compression is rather slow and probably not meant for such big images.

You'd be better off outputting to a TILED GeoTIFF with DEFLATE or LZW compression (with -co TILED=YES -co COMPRESS=DEFLATE for example, and possibly with -co PREDICTOR=2 added)

comment:8 Changed 7 years ago by riveryeti

Thanks rouault for your feedback. My preferred format is actually a tiled geotiff, and I generated one of those too using the following:

gdal_translate -of Gtiff -co "COMPRESS=JPEG" -co "JPEG_QUALITY=90" -co "TILED=YES" -co "PHOTOMETRIC=YCBCR" -co BLOCKYSIZE=256 -co BLOCKXSIZE=256 -co "TFW=YES" -b 1 -b 2 -b 3 [infile].vrt [outfile].tif

unfortunately I haven't been able to figure out how to get anything other than PNG to upload to gigapan, which is the only way I've figured out how to easily share the imagery I am generating (http://gigapan.com/gigapans?tags=Elwha)

In an interesting (to me) development, at some point in the last hour or so the process got motivated again and finished the remaining 30%.

I still don't understand why a wide image would take ~500x longer than a tall image, even though the tall image is 35% larger. Although if it's related to some inherent difference in behavior copying long vs wide images (like compression efficiency degrading the longer the line or chunk of x lines) that might make sense because most of the data was in the middle of the image. Maybe that's why I got a speed burst at the beginning and the end.

comment:9 Changed 7 years ago by Jukka Rahkonen

Hi,

I would suggest writing to Gigapan and asking them to do some development on their side to add support for the BigTIFF format. Their website currently says "TIFF files have a maximum size of 4 gigabytes. Recommended if your image is small enough to fit within 4 gigabytes, otherwise use Photoshop Raw."

PNG suits you in this special case, but otherwise a multi-gigabyte PNG is a very poor choice of format for geospatial images. PNG supports neither direct access to an arbitrary location inside an image nor overviews, both of which are essential features.

comment:10 Changed 7 years ago by Even Rouault

Resolution: invalid
Status: new → closed

libpng uses different heuristics to compress (horizontal differencing, vertical differencing or more complex combinations), so the compression time might depend on the content.

I'm closing this ticket as it does not *appear* there's anything we can do in GDAL and the issue might be in libpng itself (or a "feature" of libpng). I can be wrong of course, but I don't think anyone will investigate more on that (and it would probably require access to your precise dataset for deep analysis), so no point in keeping it open.

comment:11 Changed 7 years ago by riveryeti

Thanks rouault for helping me trace the source of the problem. I'll contact the appropriate libpng mailing list.

Out of curiosity, if I wanted to profile the process while it was running to see where it's spending the most time (for example to make sure it's not in the VRT decoding), how would I go about doing that?

Thanks again,

Andy

comment:12 Changed 7 years ago by Even Rouault

Regarding profiling, my favorite method is to run the process under a debugger and regularly pause it, look at where it has stopped, resume, break again, look, resume, etc. When there's a big bottleneck, you'll see it quickly because you'll often break in the same code. That's what I did in my test above.
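The same pause-and-look idea can be automated as a sampling loop. The sketch below is only a Python analogue of the technique, not a way to profile gdal_translate itself (for a native process you would pause it in a debugger as described above). It relies on the CPython-specific sys._current_frames(), and the function names sample_top_frames, slow_part, fast_part and workload are made up for the illustration.

```python
import collections
import sys
import threading
import time

def sample_top_frames(target, interval=0.01):
    """Run target() in a thread; at each pause, record the function it is executing."""
    counts = collections.Counter()
    worker = threading.Thread(target=target)
    worker.start()
    while worker.is_alive():
        # Look at where the worker thread currently is, like breaking in a debugger.
        frame = sys._current_frames().get(worker.ident)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)
    worker.join()
    return counts

def slow_part():
    # Busy loop standing in for the bottleneck.
    deadline = time.perf_counter() + 0.3
    while time.perf_counter() < deadline:
        pass

def fast_part():
    sum(range(1000))

def workload():
    fast_part()
    slow_part()

counts = sample_top_frames(workload)
print(counts.most_common(3))
```

The function that shows up in most samples is where the time is going, which is the same conclusion you reach by breaking repeatedly by hand.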

comment:13 Changed 7 years ago by riveryeti

I started a conversation over on the PNG listserv and there's some discussion there about what's libpng's fault and what's gdal's fault (some of the programming lingo is a little past me). There was also mention of challenging the closure. I'm not quite sure of the best way to proceed, being just a capable user who wants to contribute by pointing out when things seem to go wrong...

comment:14 Changed 7 years ago by jbowler

Resolution: invalid
Status: closed → reopened

I can repro the bug using Andy Ritchie's original test files on Linux (gdal 1.11.1). When I use perf to profile where the problem is, I see that only 0.79% of the CPU time is spent inside libpng and 0.52% of the time inside zlib, so the problem has nothing to do with PNG/LZ77 compression.

By simply ^C'ing gdal_translate under gdb I can see that on my system it takes about 2s to process each scan line and almost all of this time is spent *reading* the TIFF files.

From the profile and backtrace using ctrl-C I can see that most of the time is going into reading the directory of TIFF files (about 6600 files), which seems to be getting done at least once per scanline.

Andy's much bigger tall/narrow case takes about 1/40th of the time to process and then 75% of the time is spent compressing the image, as might be expected (LZW decode is fast, LZ77 encode is slow).

In fact libpng seems to take pretty much exactly the same time, per pixel, to compress both images; both the fast case and the slow (buggy) case.

I've reopened the bug, it's clearly a bug (it shouldn't take 40x longer to compress an image just because it has 25% more input files) and it's easy to repro, given Andy's test set.

comment:15 Changed 7 years ago by Even Rouault

OK, so if it seems to be due to VRT specifically as input, I'd recommend increasing the GDAL_MAX_DATASET_POOL_SIZE configuration option/environment variable from its default value, which is 100 (files), to a higher value. I assume that in the wide VRT there are more than 100 files in width. If so, at each line read the cache of 100 files isn't enough and all files have to be reopened again. Note: on Linux you cannot increase GDAL_MAX_DATASET_POOL_SIZE to more than 1000 without having administrative privilege.
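A toy model of this reopening effect, as a minimal sketch: it assumes the pool evicts the least recently used file, that the VRT is a regular grid of 1024-pixel-wide tiles (as described in the ticket), and that each scanline touches every tile in a row from left to right. The function count_opens is hypothetical and does not mirror GDAL's actual pool implementation.

```python
from collections import OrderedDict

def count_opens(tiles_per_row, scanlines, pool_size):
    """Count file opens when each scanline touches every tile in a row in order."""
    pool = OrderedDict()  # tile index -> placeholder, ordered least to most recently used
    opens = 0
    for _ in range(scanlines):
        for tile in range(tiles_per_row):
            if tile in pool:
                pool.move_to_end(tile)  # already open: mark as most recently used
            else:
                opens += 1  # not in the pool: the file has to be opened
                pool[tile] = True
                if len(pool) > pool_size:
                    pool.popitem(last=False)  # evict the least recently used file
    return opens

TILE = 1024  # tile width and height in pixels, from the ticket description
for label, width in (("wide", 191488), ("tall", 59392)):
    tiles = width // TILE
    opens = count_opens(tiles, scanlines=TILE, pool_size=100)
    print(f"{label}: {tiles} tiles per row, {opens} opens for {TILE} scanlines")
```

In this model, once a row holds more tiles than the pool, every access is a miss, so the number of opens grows with the number of scanlines instead of staying at one open per tile.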

comment:16 Changed 7 years ago by jbowler

The tiles are 1024x1024 pixels, the 'narrow' case therefore has 58 tiles in each row, the 'wide' case has 187.

I've run Andy's wide test case with GDAL_MAX_DATASET_POOL_SIZE=256. The gdal_translate programs use almost trivial amounts of memory: the narrow case (with the default setting) is taking under 200MiB of virtual memory and the wide case with 256 is taking under 400MiB of virtual memory. It seems pretty clear that most of the memory of the process is being used for the dataset pool.

With the setting gdal_translate (to PNG) takes 18 minutes, compared to over 350 minutes without. This is faster than the 'narrow' case which takes 24.5 minutes, but that's expected because the narrow case is 25% bigger.

perf report shows that, with the increased pool size, gdal_translate is spending its time (well, at least 50% of it) compressing the output.

Changing the pool size increases the time for gdalbuildvrt to run, presumably because it doesn't benefit from the caching because it reads each input TIFF only once.

Perhaps it would be a good idea for gdalbuildvrt to set the pool size to a small number and gdal_translate to set it to at least the number of tiles in a row when the translate code reads the input line-by-line. That would fix the bug. At the very least gdal_translate should warn that a very bad performance issue is about to happen if the cache isn't large enough.
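As a sketch of what "at least the number of tiles in a row" could look like from the user's side, the helper below derives a pool size from the raster width and tile width and builds a gdal_translate command line that passes it with --config. It assumes a regularly tiled VRT whose tile width is known to the caller; the function name translate_command, the file names and the margin of 16 extra entries are all made up for the illustration.

```python
import math

def translate_command(src, dst, raster_width, tile_width, margin=16):
    """Build a gdal_translate argv with a pool size covering one row of tiles."""
    tiles_per_row = math.ceil(raster_width / tile_width)
    pool_size = tiles_per_row + margin  # margin is an arbitrary safety allowance
    return [
        "gdal_translate", "-of", "PNG",
        "--config", "GDAL_MAX_DATASET_POOL_SIZE", str(pool_size),
        src, dst,
    ]

# Hypothetical file names; width and tile size are those of the wide test case.
cmd = translate_command("wide.vrt", "wide.png", raster_width=191488, tile_width=1024)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run. Keep in mind the Linux limit of 1000 mentioned in comment:15 when the computed value gets large.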

comment:17 Changed 7 years ago by Even Rouault

Component: default → GDAL_Raster
Keywords: vrt added
Milestone: 2.0
Resolution: fixed
Status: reopened → closed
Summary: gdal_translate (2.0.0dev) performance degrades on wide VRTs (VRT to PNG only, TIFF OK) → gdal_translate performance degrades on wide VRTs with PNG output

Picking the right value is very complicated for many reasons:

  • gdal_translate doesn't know that a VRT is made of several files
  • VRTs aren't necessarily regularly tiled
  • datasets referenced by a VRT may need several file descriptors and not just 1
  • the optimal size depends on the access pattern. When translating to a tiled dataset, we don't need a big value for GDAL_MAX_DATASET_POOL_SIZE. When translating to formats like PNG or JPEG that acquire the input datasets by line, the default is not enough. But the way the output driver operates is completely driver dependent, and gdal_translate has no idea about it. That would require some driver metadata to advertise its I/O strategy.

Anyway the following should hopefully be good enough: trunk r28454 "gdal_translate and gdalwarp: increase GDAL_MAX_DATASET_POOL_SIZE default value to 450. VRT format documentation: document GDAL_MAX_DATASET_POOL_SIZE (#5828)"

comment:18 Changed 6 years ago by Even Rouault

Milestone: 2.0 → 2.0.0

Milestone renamed
