Opened 13 years ago

Closed 12 years ago

#1198 closed defect (fixed)

GDAL error reading ESRI binary GRIDs with > 1 tile index file

Reported by: jctoney@… Owned by: warmerdam
Priority: normal Milestone: 1.4.2
Component: GDAL_Raster Version: unspecified
Severity: normal Keywords: aigrid
Cc: matt.wilkie@…

Description (last modified by warmerdam)

When the uncompressed size of an ESRI binary GRID dataset is near 2 GB, a second tile index file (e.g., w001000x.adf) is created which references a second raster data file (e.g., w001000.adf). This is true even if the actual run-length-encoded data size is well under 2 GB. ESRI support site refers to these as "multitiled" rasters in Article ID 27056:

"A multitiled raster is usually created when the number of cells times the number of bytes per cell is close to 2 GB. Multitiled raster can be identified by looking in the workspace folder. If there is more than one wNNNNNNx.adf (N = numeric value) file, then the raster is multitiled."

GDAL (1.3.1 and 1.3.2) returns an error when it attempts to read data referenced by the second tile index file, e.g.,

ERROR 1: IReadBlock failed at X offset 0, Y offset 5016
ERROR 1: GetBlockRef failed at X block offset 0, Y block offset 5016

OpenEV will open these "multitiled" GRIDs, but does not correctly display the data referenced by the second tile index.

An example dataset (79470565.zip) is at: (slow)

ftp://ftp2.fs.fed.us/incoming/rmrs/missoula/ifsl/LANDFIRE/CToney_7.21.05/

or via http at:
http://www.spatialsimulation.org/LFDT/79470565.zip

Attachments (1)

gridlib.c (37.7 KB) - added by rengelke 13 years ago.
source code

Download all attachments as: .zip

Change History (22)

comment:1 Changed 13 years ago by warmerdam

Mateusz, 

Can you dig into this one. 

This Arc/Info binary grid format is a binary format that was reverse engineered.
The documentation is at:

  http://home.gdal.org/projects/aigrid/aigrid_format.html

The code in question is gdal/frmts/aigrid/aigopen.c and gridlib.c in the
same place.  You will need to work out how we are supposed to know when 
things split to multiple files (perhaps even by scanning the directory?)
and update the format docs and code accordingly. 

PS. the sample file is roughly 130MB!

If you run into problems, then wait for me to get back from my trip and I'll
help out or take this problem.  

comment:2 Changed 13 years ago by Mateusz Łoskot

I've just commited changes fixing this bug.

comment:3 Changed 13 years ago by Mateusz Łoskot

Ups! I'm sorry, I wanted to fix Bug 1195 but not this one. So, I'm reopening the Bug 1198. Sorry for confusing you.

comment:4 Changed 13 years ago by warmerdam

Description: modified (diff)
Milestone: 1.5.0
Priority: highnormal

Changed 13 years ago by rengelke

Attachment: gridlib.c added

source code

comment:5 Changed 13 years ago by rengelke

I ran into the same problem with some images I had. I modified the code to optionally read in these if they exist. aigrid.h adds counts and pointers for these. aigopen.c reads in this info if present in AIGOpen, and AIGReadTile and AIGReadFloatTile use this information. In gridlib.c I modified AIGReadBlockIndex to handle either file.

I looked at GDAL code for 1.4.1 and saw the Chicago problem, which didn't address my problem. I skipped the handling for that.

comment:6 Changed 12 years ago by warmerdam

Milestone: 1.5.01.4.2
Owner: changed from Mateusz Łoskot to warmerdam
Status: reopenednew

I'm going to take this and see what I can do today on it.

comment:7 Changed 12 years ago by warmerdam

Status: newassigned

Hello,

I've come up with a hacky solution to be able to access all the data when the dataset is multitiled.

In my case, the data files are: w001001.adf w001000.adf z001001.adf z001002.adf

Below this e-mail I've included a Perl script that reads hdr.adf, dblbnd.adf, sta.adf, and w001001x.adf, and produces some more info than gdalinfo. Especially useful, it gives the number of tiles in the index file, which you can use to compute where the next file starts.

In my case w001001x.adf has info for 506520 tiles. Divide this by HTilesPerRow (201) to get 2520, multiply by HTileYSize (4) to get 10800 rows. My HPixelSizeY is 5, so now I know the w001001.adf has data for 10800 x 5 = 50400 pixel rows.

From the boundary file I know that the upper left Y is at 475000,

so now I know the next file (apparently w001000.adf, strange ordering) starts at 475000 - 50400 = 424600.

The 2nd and 3rd file have the same number of tiles as the 1st, so this means the 3rd file starts at Y = 424600 - 50400 = 374200, and the 4th at 374200 - 50400 = 323800. The 4th file has 170448 tiles (number obtained by running the Perl script below as: read_adf_info.pl z001002x.adf). All the math checks out with the boundaries of the dataset.

To get data out of the 4th file, for example, I did the following:

  • create a subdirectory "data" in the dataset directory
  • move the w001* files in there
  • create symbolic links: ln -s ./data/z001002.adf w001001.adf ln -s ./data/z001002x.adf w001001x.adf (this to make sure gdal_translate uses the 4th file)
  • compute the -srcwin coordinates, taking into account the origin of the 4th file, which is down 3 x 50400 = 151200 from the dataset origin
  • run gdal_translate

This example, more specifically: my dataset upper left is at (13565, 475000). I wanted to extract a grid at upper left (194990, 317925). The upper left of the 4th file is at (13565, 323800), which, as far as gdal_translate is now concerned, is the upper left of the whole dataset.

So the delta Y = 323800 - 317925 = 5875, divide by cell size 5 = 5875 / 5 = row 1175. The delta X = 194990 - 13565 = 180925 / 5 = column 36285.

The grid to extract is 600 by 600 cells, so the command becomes:

gdal_translate -oi AAIGrid -srcwin 36285 1175 600 600 hdr.adf mygrid.agr

This is a grid cell for which I had a known other height field, and they corresponded :-)

I hope this helps people with larger datasets, let me know if anything is not clear,

Patrick


Perl script: read_adf_info.pl


#!/usr/bin/perl

print "\nread_adf_info.pl\nwritten by Patrick Min, May 2007, v1\n\n";

$adfx_filespec = $ARGV[0];

if ($adfx_filespec eq "") {

print "Usage: read_adf_info.pl [<adfx filespec>]\n"; print " if no filename is given, a default [w001001x.adf] is assumed\n\n"; $adfx_filespec = "w001001x.adf";

}

&read_bounds_info;

&read_stats_info;

&read_header;

&read_index_info;

print "done\n\n";

sub convert_msb_int32 {

my(@values) = @_;

my $int_value = $values[0] * 256 * 256 * 256 + $values[1] * 256 *

256 + $values[2] * 256 + $values[3];

return $int_value;

} # convert_msb_int32

sub read_msb_int32 {

my($handle) = @_; my $buf;

read($handle, $buf, 4); my @b = unpack("CCCC", $buf); my $int32 = &convert_msb_int32(@b);

return $int32;

} # read_msb_int32

sub read_index_info {

# # construct index filename #

open($index_handle, $adfx_filespec)
die "Could not open

[$adfx_filespec]\n";

binmode $index_handle;

# # read magic number # print "[$adfx_filespec] magic number ["; read($index_handle, $magic, 8); @b = unpack("CCCCCCCC", $magic); for($i=0; $i < 8; $i++) {

$hex = sprintf("%x", $b[$i]); print $hex;

} print "]\n";

read($index_handle, $dummy, 16);

$filesize = 2 * &read_msb_int32($index_handle); print "Filesize [$filesize], "; $in_mb = $filesize / 1048576; print " or [$in_mb MB]\n";

read($index_handle, $dummy, 72);

$nr_tiles = 0; open(LOG, "> tiles_index.log"); while(!(eof($index_handle))) {

# for($i=0; $i < $NR_TILES; $i++) {

$tile_offset[$nr_tiles] = 2 * &read_msb_int32($index_handle); print LOG "Tile $nr_tiles offset [$tile_offset[$nr_tiles]]\n";

$tile_sizes[$nr_tiles] = 2 * &read_msb_int32($index_handle); print LOG "Tile size [$tile_sizes[$nr_tiles]]\n";

$nr_tiles++;

} # for close LOG; print "read index info for $nr_tiles tiles\n";

close $index_handle;

} # read_index_info

sub read_bounds_info {

$bounds_filespec = "dblbnd.adf"; print "read_bounds_info($bounds_filespec)\n";

open(BOUNDS, $bounds_filespec)
die "Could not open [$bounds_filespec]\n";

binmode BOUNDS;

@vars = ("LLX", "LLY", "URX", "URY");

for($i=0; $i < 4; $i++) {

read(BOUNDS, $buf, 8); @b = unpack("CCCCCCCC", $buf);

for($j=0; $j < 8; $j++) {

$b2[$j] = $b[7 - $j];

}

$buf2 = pack("CCCCCCCC", @b2);

$value = unpack("d", $buf2);

print " $vars[$i]: [$value]\n";

} # for

close BOUNDS; print "\n";

} # read_bounds_info

sub read_msb_double64 {

my($handle) = @_; my $buf; my $j; my @b2; my $buf2;

read($handle, $buf, 8); my @b = unpack("CCCCCCCC", $buf);

for($j=0; $j < 8; $j++) {

$b2[$j] = $b[7 - $j];

}

$buf2 = pack("CCCCCCCC", @b2);

my $value = unpack("d", $buf2);

return $value;

} # read_msb_double64

sub read_stats_info {

$stats_filespec = "sta.adf"; print "read_stats_info($stats_filespec)\n";

open($stats_handle, $stats_filespec)
die "Could not open

[$stats_filespec]\n";

binmode $stats_handle;

@vars = ("SMin", "SMax", "SMean", "SStdDev");

for($i=0; $i < 4; $i++) {

$value = &read_msb_double64($stats_handle); print " $vars[$i]: [$value]\n";

} # for

close $stats_handle; print "\n";

} # read_stats_info

sub read_header {

$header_filespec = "hdr.adf"; print "read_header($header_filespec)\n";

open($hdr_handle, $header_filespec)
die "Could not open

[$header_filespec]\n";

binmode $hdr_handle;

print " HMagic ["; for($i=0; $i < 8; $i++) {

read($hdr_handle, $buf, 1); ($id) = unpack("a", $buf); print $id;

} print "]\n";

read($hdr_handle, $dummy, 8);

$celltype = &read_msb_int32($hdr_handle); print " HCellType [$celltype]\n";

read($hdr_handle, $dummy, 236);

$pixel_size_x = &read_msb_double64($hdr_handle); print " HPixelSizeX [$pixel_size_x]\n"; $pixel_size_y = &read_msb_double64($hdr_handle); print " HPixelSizeY [$pixel_size_y]\n";

$unknown = &read_msb_double64($hdr_handle); print " unknown1 [$unknown]\n"; $unknown = &read_msb_double64($hdr_handle); print " unknown2 [$unknown]\n";

$tiles_per_row = &read_msb_int32($hdr_handle); print " HTilesPerRow [$tiles_per_row]\n";

$tiles_per_column = &read_msb_int32($hdr_handle); print " HTilesPerColumn [$tiles_per_column]\n";

$tile_x_size = &read_msb_int32($hdr_handle); print " HTileXSize [$tile_x_size]\n";

$unknown = &read_msb_int32($hdr_handle); print " unknown [$unknown]\n";

$tile_y_size = &read_msb_int32($hdr_handle); print " HTileYSize [$tile_y_size]\n";

close($hdr_handle);

} # read_header

comment:9 Changed 12 years ago by warmerdam

Keywords: aigrid added

comment:10 Changed 12 years ago by maphew

Cc: matt.wilkie@… added

adding myself to cc: list. I have GRID and can spend some time generating various sized rasters for testing, just let me know what you'd like.

the section "Grid Data Structure" at http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=About_the_ESRI_grid_format says "The size of the tile for a grid is based on the number of rows and columns in the grid at the time of creation. The upper limit on the size of a tile is set by the application and is very large (currently set at 4,000,000 x 4,000,000 cells). As a result, most grids used for GIS applications are automatically stored in a single tile. The spatial data for a grid is automatically split across multiple tiles if the size of the grid at the time of creation is larger than the upper limit for the size of a tile"

and further down under "The Tile Files", The w001001.adf (q0x1y1) and w001001x.adf (q0x1y1x) files store the data and the index for the first or base tile in a grid. ... If additional tiles are used, they are automatically numbered based on their spatial relationship to the first tile. Tiles are implemented as variable-length binary files. With versions prior to ARC/INFO 7.x, these files were named q0x1y1 and q0x1y1x and will still work with the current software.

I'm guessing the key words there are spatial relationship.

comment:11 Changed 12 years ago by warmerdam

OK, I have done a preliminary implementation of multi-tile support in trunk. It works with a "two tile" sample dataset (listed above). But I'm not very confident of the naming convention, and some additional testing and samples will be needed. I'm using the following rule:

/* -------------------------------------------------------------------- */
/*      Compute the basename.                                           */
/* -------------------------------------------------------------------- */
    if( iTileY == 0 )
        sprintf( szBasename, "w%03d001", iTileX + 1 );
    else if( iTileY == 1 )
        sprintf( szBasename, "w%03d000", iTileX + 1 );
    else
        sprintf( szBasename, "z%03d%03d", iTileX + 1, iTileY - 2 );
   

This presumes that "X tiles" are the first three digit value, and that they will always be ascending starting with 001. I have seen, nor even heard of any datasets with more than 1 tile in the X direction. I suspect they have to be very very wide for this to kick in.

For the "Y tiles" I'm assuming they count back from 001 and when they go negative the "w" prefix changes to a "z". There have been some anecdotal reports of this. Depending on immediate feedback, I'll try and prepare a new FWTools release tomorrow so the new implementation can be tested by folks creating datasets (ie. Matt, and Chris).

comment:12 Changed 12 years ago by maphew

I started a process to create a 500,000 by 500,000 raster filled with random values yesterday at 5pm and it's still running this morning at almost 10am. I don't know if the process is working or hung. Earlier trials with larger extents failed with "System Error (0): Unexpected Error"; (lack of disk or swap space perhaps?).

I'll try and create a >4 million pixel image with something else and then translate that with IMAGEGRID and see what happens.

comment:13 Changed 12 years ago by warmerdam

Guys,

I've posted an FWTools for windows at:

http://home.gdal.org/fwtools/FWTools132.exe

with my current changes. Any testing appreciated.

comment:14 Changed 12 years ago by warmerdam

I've reposted the FWTools132.exe with a change to the file name convention and a many tile file now seems to work.

Further testing appreciated.

comment:15 Changed 12 years ago by 19771124

Is this can add myself as Cc?I now work with 84G data.

comment:16 Changed 12 years ago by 19771124

comment:17 Changed 12 years ago by liufei

Thanks for Mr. Warmerdam's work.

comment:18 Changed 12 years ago by warmerdam

The http://home.gdal.org/fwtools/FWTools132.exe link is working for me. Perhaps it is blocked by your firewall for some reason? I'd suggest trying again.

comment:19 Changed 12 years ago by ctoney

The latest update in FWTools132 seems to be working well. I ran a few tests on large "multi-tiled" GRIDs created from (slightly) different software. Tests were to check the display in OpenEV, then use Python/GDAL code to extract pixel values at several point locations across the rasters and compare to known correct values. So far I haven't found any exceptions to file naming convention (w001001, w001000, z001001, z001002, ...). The test datasets were the following:

  • 16-bit int, 154009 x 96559, 15-tiled GRID created in ArcInfo? Workstation 9.0 –

OpenEV display no errors, pixel extraction on about 40,000 points, no errors

  • random floating point, 50000 x 50000, 5-tiled GRID created in ArcGIS 9.2 – OpenEV display no errors, pixel extraction on 10 points, no errors
  • 32-bit int, 25096 x 24694, 2-tiled GRID from the USGS National Map (created with ArcObjects? code extracting from SDE raster?) – OpenEV display no errors, pixel extraction on about 38,000 points, no errors.

Thanks much for the update.

comment:20 Changed 12 years ago by maphew

I haven't yet been able to create a grid large enough to start the tiling process. I need to do some housekeeping and free up some disk space first. I just wanted to let everyone know why I've been quiet.

comment:21 Changed 12 years ago by warmerdam

Resolution: fixed
Status: assignedclosed

Changes backported into 1.4.x as r11617.

Closing as fixed. Please reopen if any problems are encountered in further testing. Especially of value is very *wide* grids which activate the horizontal tiling support.

Note: See TracTickets for help on using tickets.