Ticket #2883 (new enhancement)

Opened 6 years ago

Last modified 3 years ago

GDALOpen() is too slow when the directory containing the file to open has many files

Reported by: Kasulle Owned by: warmerdam
Priority: normal Milestone:
Component: default Version: unspecified
Severity: normal Keywords:
Cc:

Description (last modified by warmerdam)

I'm using GDAL to access a tile pyramid. Each tile is a single file, so when the image is very large (on the order of 100,000 * 100,000 pixels), the directory storing the whole tile pyramid contains many files. In this case, the source image is 100,000 * 300,000 and the tile size is 512 * 512, so there are about 160,000 files in the directory.

When I use GDALOpen() to open one file, it takes about 0.6 s. When I copy the file to an empty directory and open it there, it takes only about 0.02 s.

My Analysis

I found that GDALOpen() first creates a GDALOpenInfo object, and the GDALOpenInfo constructor contains a statement like this:

 papszSiblingFiles = VSIReadDir( osDir ); 

When osDir contains many files, VSIReadDir() is very slow. For now I have just changed the statement to:

   papszSiblingFiles = NULL;

Is there a better solution?
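To illustrate why listing siblings scales with directory size, here is a minimal POSIX sketch that walks every entry of a directory, which is roughly the work any full directory listing has to do. It is not GDAL's VSIReadDir() implementation; the function name count_dir_entries is hypothetical and chosen only for this example.

```c
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GDAL.
 * Counts the entries in `path`, skipping "." and "..".
 * Returns -1 if the directory cannot be opened. */
long count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *entry;

    /* One readdir() call per entry: the cost grows with the number
     * of files, regardless of which single file we want to open. */
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;
        count++;
    }

    closedir(dir);
    return count;
}
```

Timing a call such as count_dir_entries("/path/to/tiles") on the populated directory and on a nearly empty one should give a rough idea of how much of the GDALOpen() difference comes from the listing alone.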

Change History

Changed 6 years ago by rouault

Yes, there is a solution/workaround. Define GDAL_DISABLE_READDIR_ON_OPEN=YES as an environment variable and it will skip listing the files of the directory.
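If changing the shell environment is inconvenient, the variable can also be set from inside the application before the first GDALOpen() call. The following is a minimal sketch using the POSIX setenv() function; the helper names disable_readdir_on_open and readdir_on_open_disabled are hypothetical and not part of GDAL.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() under strict C modes */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of GDAL.
 * Sets GDAL_DISABLE_READDIR_ON_OPEN=YES for the current process.
 * Returns 0 on success, -1 on failure (the setenv() result). */
int disable_readdir_on_open(void)
{
    /* Third argument 1: overwrite any value already present. */
    return setenv("GDAL_DISABLE_READDIR_ON_OPEN", "YES", 1);
}

/* Hypothetical helper: returns 1 if the variable is currently set
 * to exactly "YES", 0 otherwise. This only checks that one spelling. */
int readdir_on_open_disabled(void)
{
    const char *value = getenv("GDAL_DISABLE_READDIR_ON_OPEN");
    return value != NULL && strcmp(value, "YES") == 0;
}
```

Call disable_readdir_on_open() once at startup, before opening any dataset. GDAL also has its own CPLSetConfigOption() call for setting configuration options from code, which may be preferable where the process environment should not be modified.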

Frank, I'm wondering whether we should invert the default value for this option. I have the feeling that the use cases where the ReadDir is useful are not the majority, and I'm not sure the gain is really enough to justify major slowdowns when the directory contains a large number of files.

(This was already discussed in ticket #2158)

Changed 3 years ago by warmerdam

  • description modified
  • milestone 1.6.4 deleted

removing milestone.
