Ticket #2158 (new defect)
gdal becomes painfully slow when used in directories with a large number of files
| Reported by: | sroberts | Owned by: | warmerdam |
|---|---|---|---|
| Priority: | normal | Milestone: | 1.8.1 |
| Component: | GDAL_Raster | Version: | 1.5.0 |
| Severity: | normal | Keywords: | |
| Cc: | rouault, daniel112b@…, kyle, mloskot |
Description
Starting with gdal-1.5.0, gdal operations done in directories with a large number of files have become very slow. I'm working in a directory that has around 54,000 files, a mix of GeoTIFFs and other auxiliary files. As an example, running gdaltindex on 500 GeoTIFFs in this directory using the command "gdaltindex foo.shp stere-2*tiff" takes only 3 seconds using version 1.4.2 but over 13 minutes using 1.5.0! Running gdalinfo on a single GeoTIFF in this directory jumped from 0.05 to 1.35 seconds when using 1.5.0. And viewing these GeoTIFFs using mapserver linked to gdal 1.5.0 increased from around a second to over 1/2 minute! Running strace on gdaltindex shows that for every GeoTIFF open there is a corresponding:
open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 7
It now appears that every time gdal opens a file it also generates a directory listing! I assume this is the source of the slowdown. And I assume this directory open is related to the 1.5.0 change "Added Identify() method on drivers (per RFC 11: Fast Format Identify)"? I should mention this is being done on RedHat Linux.
-Steve
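
To illustrate why a directory listing on every open would scale so badly, here is a minimal Python sketch of the access pattern suggested by the strace output. It does not use or reproduce GDAL's code. The function names, file names, and file counts are hypothetical, chosen only for illustration. With N files in the directory and M opens, the listing variant reads N x M directory entries, while plain opens touch only the M files.

```python
import os
import tempfile
import time


def make_files(directory, count):
    """Create `count` empty placeholder files (hypothetical names)."""
    for i in range(count):
        with open(os.path.join(directory, f"aux-{i:05d}.dat"), "w"):
            pass


def open_with_listing(paths):
    """Open each file after listing its directory, as the strace pattern suggests.

    Returns the total number of directory entries read.
    """
    entries_read = 0
    for path in paths:
        # Assumption: one full directory scan per open.
        entries_read += len(os.listdir(os.path.dirname(path)))
        with open(path, "rb"):
            pass
    return entries_read


def open_plain(paths):
    """Open each file without touching the directory. Returns the open count."""
    opened = 0
    for path in paths:
        with open(path, "rb"):
            opened += 1
    return opened


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        make_files(workdir, 2000)  # stand-in for the crowded directory
        targets = [os.path.join(workdir, f"aux-{i:05d}.dat") for i in range(50)]

        start = time.perf_counter()
        entries = open_with_listing(targets)
        listing_time = time.perf_counter() - start

        start = time.perf_counter()
        opened = open_plain(targets)
        plain_time = time.perf_counter() - start

        print(f"with listing: {entries} entries read in {listing_time:.4f}s")
        print(f"plain opens:  {opened} files opened in {plain_time:.4f}s")
```

Scaled to the numbers in this report (about 54,000 files and 500 opens), the same pattern would read roughly 27 million directory entries, which is consistent with the slowdown described, though the actual cause inside gdal still needs to be confirmed.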

