Ticket #1900 (assigned defect)

Opened 1 year ago

Last modified 11 months ago

Get hdf band, inverse data set dimensions

Reported by: cermak Assigned to: dron (accepted)
Priority: normal Milestone:
Component: GDAL_Raster Version: unspecified
Severity: normal Keywords: hdf4
Cc: ilucena, dnadeau, dron

Description (Last modified by warmerdam)

Hi!

In the MODIS cloud product HDF files (MOD06, MYD06), some data sets are stored with the band dimension last instead of first.

gdalinfo myfile.hdf (or gdal.Dataset.GetSubDatasets) yields:
[snip]
SUBDATASET_52_NAME=HDF4_EOS:EOS_SWATH:'myfile.hdf':mod06:Cloud_Mask_1km
SUBDATASET_52_DESC=[2040x1354x2] Cloud_Mask_1km mod06 (8-bit integer)
[snip]

So obviously there are two bands of size 2040x1354 in the subdataset. However, in order for gdal to recognize them as two bands the dimensions would have to be [2x2040x1354].
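As a small illustration, the bracketed dimension list can be pulled out of the SUBDATASET_n_DESC string quoted above. This is only a sketch in plain Python; the helper name `parse_subdataset_dims` is hypothetical and not part of GDAL.

```python
import re


def parse_subdataset_dims(desc):
    """Return the dimension sizes from a description string such as
    '[2040x1354x2] Cloud_Mask_1km mod06 (8-bit integer)'."""
    match = re.match(r"\[(\d+(?:x\d+)*)\]", desc)
    if match is None:
        raise ValueError(f"no dimension list found in {desc!r}")
    return tuple(int(n) for n in match.group(1).split("x"))


dims = parse_subdataset_dims("[2040x1354x2] Cloud_Mask_1km mod06 (8-bit integer)")
print(dims)  # the band count is the LAST entry here, not the first
```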

Accordingly, when I run gdalinfo HDF4_EOS:EOS_SWATH:'myfile.hdf':mod06:Cloud_Mask_1km I get information on 2040 different bands, each of size 1354x2.

It would be useful to have a feature in GDAL that allows explicit selection of the dimension that holds the band count, or that even lets GDAL decide on this itself.

A sample file is at: ftp://ladsweb.nascom.nasa.gov/allData/5/MYD06_L2/2006/220/MYD06_L2.A2006220.1315.005.2006222153137.hdf

Thanks and all best, Jan

PS: hdfview can read these files.

Attachments

input.c (18.2 kB) - added by ilucena on 10/15/07 08:19:31.
Source code from Modis Swath Reprojection Tool

Change History

10/11/07 11:25:03 changed by warmerdam

  • keywords set to hdf4.
  • status changed from new to assigned.
  • component changed from default to GDAL_Raster.
  • description changed.
  • cc set to ilucena, dnadeau, dron.

Added a few people knowledgeable about HDF4 to cc list.

10/11/07 13:59:12 changed by ilucena

I downloaded the MODIS Swath Reprojection Tool with its source code, and in input.c I found the function FindInputDim() with this comment:

/* Figure out which are the line/sample dimensions and which are the extra dimensions. The line and sample dimensions are expected to be greater than MIN_LS_DIM_SIZE. If too many dimensions are "line/sample" dimensions, then it is an error. If not enough dimensions are available for "line/sample" dimensions, then it is also an error. */

I guess this is the only way to solve the problem, since calls to SDgetdimid() and SDdiminfo() cannot guarantee exact information about which entry in the dimension array is which.
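The rule described in that comment could be sketched roughly as follows. This is not the MRTSwath code, only a Python paraphrase of the comment; the function name and the threshold value of 250 are assumptions made for illustration.

```python
MIN_LS_DIM_SIZE = 250  # assumed threshold, chosen only for this sketch


def find_line_sample_dims(shape, min_ls_dim_size=MIN_LS_DIM_SIZE):
    """Split dimension indices into (line, sample, extra) by size.

    Dimensions larger than the threshold are treated as line/sample
    candidates; exactly two such candidates are required.
    """
    candidates = [i for i, n in enumerate(shape) if n > min_ls_dim_size]
    if len(candidates) > 2:
        raise ValueError("too many line/sample dimensions")
    if len(candidates) < 2:
        raise ValueError("not enough line/sample dimensions")
    line, sample = candidates  # line is assumed to come before sample
    extra = [i for i in range(len(shape)) if i not in candidates]
    return line, sample, extra


print(find_line_sample_dims((2040, 1354, 2)))
```

With this threshold, a shape such as (430, 500, 600) is rejected as ambiguous, which is exactly the weakness of a size-based rule.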

10/15/07 06:24:29 changed by dron

  • status changed from assigned to new.
  • owner changed from warmerdam to dron.

I have no idea what MIN_LS_DIM_SIZE should be. If we have a hyperspectral HDF (and I have such samples), it is possible that the number of bands will be greater than the image width. We could take the liberty of assuming that the number of bands cannot exceed 256, but that is still risky.

Jan,

Regarding the referenced sample, I noticed that its dimensions [2040x1354x2] are named "Cell_Along_Swath_1km,Cell_Across_Swath_1km,Cloud_Mask_1km_Num_Bytes".

Note that the third dimension is named "Cloud_Mask_1km_Num_Bytes". Is it possible that this has nothing to do with the number of bands? Maybe we should interpret the whole thing in a different way? The product specification could help here.

Best regards, Andrey

10/15/07 06:24:51 changed by dron

  • status changed from new to assigned.

10/15/07 08:19:31 changed by ilucena

  • attachment input.c added.

Source code from Modis Swath Reprojection Tool

10/15/07 08:52:02 changed by ilucena

Andrey,

Sorry I forgot to attach the input.c source code from MRTSwath.

I faced that problem before, and at that time I contacted several people at NASA/NCSA to sort it out. The conclusion was that not all HDF4 products are well documented.

The proof of that is that their own (NASA/NCSA) software MRT/MRTSwath tries to guess which dimension is which. That is way I mentioned their source code.

The solution that I implemented on Idrisi was something like that: I select the dimension with closed length (assuming that columns always come after rows):

[2x430x500] = 2 bands of 430x500 [430x500x2] = 2 bands of 430x500 [430x4x500] = 4 bands of 430x500 /*crazy but possible*/ [430x500x600] = 600 bands of 430x500 /*almost got it wrong*/

That is not a perfect solution but we can use it in case there is not sufficient information on the file itself or there is not enough knowledge about the product specification.
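A minimal Python sketch of that closest-length idea, as I read it from the four examples: pick the two dimensions whose sizes differ least as rows and columns (rows first in storage order), and treat the remaining one as the band count. The function name is hypothetical, and this is not the Idrisi code.

```python
from itertools import combinations


def guess_dims_closest_length(shape):
    """Guess (bands, rows, cols) for a 3-D shape.

    The pair of dimensions with the smallest size difference is taken
    as (rows, cols); combinations() yields index pairs in ascending
    order, so rows always precede columns. The leftover dimension is
    taken as the band dimension.
    """
    if len(shape) != 3:
        raise ValueError("expected exactly three dimensions")
    row, col = min(
        combinations(range(3), 2),
        key=lambda pair: abs(shape[pair[0]] - shape[pair[1]]),
    )
    band = ({0, 1, 2} - {row, col}).pop()
    return shape[band], shape[row], shape[col]


for shape in [(2, 430, 500), (430, 500, 2), (430, 4, 500), (430, 500, 600)]:
    print(shape, "->", guess_dims_closest_length(shape))
```

In the last case the size differences are 70 (430 vs 500) and 100 (500 vs 600), which shows how narrow the margin of such a guess can be.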

I hope that would help.

Best regards,

Ivan

10/15/07 08:56:27 changed by ilucena

Corrections:

- That is way I mentioned their source code.

+ That is why I mentioned their source code.

- closed length (assuming that columns always come after rows):

+ closest length (assuming that columns always come after rows):

Reformatting:

[2x430x500] = 2 bands of 430x500

[430x500x2] = 2 bands of 430x500

[430x4x500] = 4 bands of 430x500 /*crazy but possible*/

[430x500x600] = 600 bands of 430x500 /*almost got it wrong*/

10/15/07 09:53:35 changed by dron

GDAL is a general raster processing tool, and I hate to add more intelligence to our already overintelligent driver. I am thinking about open options such as XDIM, YDIM, and BANDDIM to specify the exact dimension indices. That would probably be the best and most general solution.
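To make the proposal concrete, here is a sketch of how such options might map a dimension array onto raster size and band count. The options do not exist in GDAL at this point; the function name, the zero-based indexing, and the returned keys are all assumptions for illustration.

```python
def map_dimensions(shape, xdim, ydim, banddim):
    """Interpret a 3-D shape using explicit, zero-based dimension indices.

    xdim, ydim and banddim stand in for the proposed XDIM, YDIM and
    BANDDIM options; together they must name each dimension exactly once.
    """
    axes = (xdim, ydim, banddim)
    if len(shape) != 3 or sorted(axes) != [0, 1, 2]:
        raise ValueError("xdim, ydim and banddim must be a permutation of 0, 1, 2")
    return {"width": shape[xdim], "height": shape[ydim], "bands": shape[banddim]}


# The Cloud_Mask_1km case from this ticket: along-swath, across-swath, bytes.
print(map_dimensions((2040, 1354, 2), xdim=1, ydim=0, banddim=2))
```

The user states the layout once and no guessing is needed, at the cost of requiring per-product knowledge.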

Best regards,

Andrey

10/15/07 10:20:50 changed by ilucena

Andrey,

As a GDAL user, what I need is to run MapServer (gdalindex) and/or ArcGIS ImageServer over thousands of HDF4 files and write the footprints of all datasets/bands into a shapefile. There is no way to pass a user option in that process.

As a GDAL programmer, what I can do is write the dimension-guessing logic in the driver and send it to you as a patch, so you can evaluate it and commit it if you want.

Best regards,

Ivan

10/15/07 12:00:01 changed by dron

Ivan,

Actually, it is not a problem to set MIN_LS_DIM_SIZE to some number (in your case it is 250). The problem is generalizing the whole process. We should develop a way to deduce the dimension map of an arbitrary dataset. There are datasets that have (width < num_bands), so I still do not understand how we can guess the image dimensions based on the dimension array alone.

Folks, I want to take a one-day timeout to think this through again. A maximum band number constant is the first solution; the second is introducing the global variables XDIM, YDIM, and BANDDIM. I do not think we will be able to implement the general case here. We will need control from the user side, because HDF is a user-oriented format.

Best regards, Andrey

10/16/07 00:51:43 changed by cermak

Replying to dron:

Regarding the referenced sample, I noticed that its dimensions [2040x1354x2] are named "Cell_Along_Swath_1km,Cell_Across_Swath_1km,Cloud_Mask_1km_Num_Bytes". Note that the third dimension is named "Cloud_Mask_1km_Num_Bytes". Is it possible that this has nothing to do with the number of bands? Maybe we should interpret the whole thing in a different way? The product specification could help here.

Hi Andrey,

That depends on your definition of a band. These are not 'bands' in the sense of different spectral channels in a radiometer. They are, however, bands in that they contain distinct subsets of information relating to the 'Cloud_Mask_1km' product. (Each band contains 1 byte of bit-coded information, hence 'Num_Bytes'.)
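To illustrate what "bit-coded" means in practice, here is a generic Python sketch for pulling a bit field out of one byte. The helper name and the example field positions are hypothetical and are not the actual Cloud_Mask_1km layout, which is defined in the product format description.

```python
def extract_bits(byte_value, start, width):
    """Return `width` bits of `byte_value` starting at bit `start`.

    Bit 0 is the least significant bit of the byte.
    """
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("byte_value must fit in one byte")
    if width < 1 or start < 0 or start + width > 8:
        raise ValueError("bit field must lie within bits 0..7")
    return (byte_value >> start) & ((1 << width) - 1)


sample = 0b00001101  # made-up byte value, not taken from a real file
print(extract_bits(sample, 0, 1))  # a hypothetical 1-bit flag at bit 0
print(extract_bits(sample, 1, 2))  # a hypothetical 2-bit field at bits 1-2
```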

Product format info is on http://modis-atmos.gsfc.nasa.gov/MOD06_L2/format.html

Best, Jan