Opened 16 years ago

Closed 13 years ago

#2412 closed defect (fixed)

Incorrect handling of HDF5 metadata in python bindings

Reported by: antonio Owned by: warmerdam
Priority: normal Milestone: 1.9.0
Component: GDAL_Raster Version: 1.5.1
Severity: normal Keywords: HDF5
Cc: dnadeau, Kyle Shannon, alexmantaut

Description

It is not possible to access HDF5 metadata from python bindings. Small test performed on Debian Sid (x86_64) with gdal 1.5.1

$ gdal-config --version
1.5.1

$ gdalinfo test.h5 Driver: HDF5/Hierarchical Data Format Release 5
Size is 512, 512
Coordinate System is `'
Metadata:
  G1:b1=1
  G1:b2=2
  G1:b3=3
Subdatasets:
  SUBDATASET_0_NAME=HDF5:"test.h5"://G1/D1
  SUBDATASET_0_DESC=[1x1] //G1/D1 (16-bit integer)
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  512.0)
Upper Right (  512.0,    0.0)
Lower Right (  512.0,  512.0)
Center      (  256.0,  256.0)

$ gdalinfo HDF5:test.h5://G1/D1szFilenname test.h5
Driver: HDF5Image/HDF5 Dataset
Size is 1, 1
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        TOWGS84[0,0,0,0,0,0,0],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9108"]],
    AXIS["Lat",NORTH],
    AXIS["Long",EAST],
    AUTHORITY["EPSG","4326"]]
Metadata:
  G1:b1=1
  G1:b2=2
  G1:b3=3
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,    1.0)
Upper Right (    1.0,    0.0)
Lower Right (    1.0,    1.0)
Center      (    0.5,    0.5)
Band 1 Block=1x1 Type=Int16, ColorInterp=Undefined
  Metadata:
    D1:c1=1
    D1:c2=2
    D1:c3=3

$ ipython
Python 2.5.2 (r252:60911, May  7 2008, 15:21:12)
Type "copyright", "credits" or "license" for more information.

IPython 0.8.2 -- An enhanced Interactive Python.
?       -> Introduction to IPython's features.
%magic  -> Information about IPython's 'magic' % functions.
help    -> Python's own help system.
object? -> Details about 'object'. ?object also works, ?? prints more.

In [1]: from osgeo import gdal

In [2]: d = gdal.Open('test.h5')

In [3]: d.GetMetadata()
Out[3]: {'G1': 'b3=3 '}

In [4]: d = gdal.Open('HDF5:test.h5://G1/D1')
szFilenname test.h5

In [5]: d.GetMetadata()
Out[5]: {'G1': 'b3=3 '}

In [6]: b = d.GetRasterBand(1)

In [7]: d.GetMetadata()
Out[7]: {'D1': 'c3=3 '}

Both in datasets and raster bands only the last metadata item is showed in form 'key=value', whith the name of HDF5 group/dataset is used as metadata key.

Attachments (2)

hdf5-metadata.patch (9.2 KB ) - added by antonio 14 years ago.
metadata.h5 (25.7 KB ) - added by antonio 14 years ago.

Download all attachments as: .zip

Change History (16)

comment:1 by warmerdam, 16 years ago

Cc: dnadeau added
Owner: changed from hobu to warmerdam
Status: newassigned

The problem is that ":" is considered to be a keyword/value seperator much like "=" is. So the metadata items end up being treated as a bunch of "G1" keywords and when put into a dictionary in Python they collide and only the last survives.

I think we need to review the naming convention for metadata in the HDF5 driver.

The problem isn't in the Python bindings per-se, it's just that it manifests there because they take the dictionary like behavior of the metadata more literally than the core API.

Denis - any thoughts on the keyword naming and if it can be easily changed?

in reply to:  1 comment:2 by antonio, 16 years ago

Replying to warmerdam:

The problem is that ":" is considered to be a keyword/value seperator much like "=" is. So the metadata items end up being treated as a bunch of "G1" keywords and when put into a dictionary in Python they collide and only the last survives.

I think we need to review the naming convention for metadata in the HDF5 driver.

The problem isn't in the Python bindings per-se, it's just that it manifests there because they take the dictionary like behavior of the metadata more literally than the core API.

Denis - any thoughts on the keyword naming and if it can be easily changed?

Oh, my suggestion is to use the full-path of the attribute: key="/G1/D1/c3". Is it feasible?

Another solution could be using XML, but I suppose this will take a larger effort.

comment:3 by warmerdam, 16 years ago

The keys are intended to be "proper tokens". I would advise against using special characters like slashes. Better (I think) to replace the : with an underscore.

But I'm not sure what the implications of doing this are.

in reply to:  3 comment:4 by antonio, 16 years ago

Replying to warmerdam:

The keys are intended to be "proper tokens". I would advise against using special characters like slashes. Better (I think) to replace the : with an underscore.

But I'm not sure what the implications of doing this are.

Yes you are absolutely right.

Also consider that HDF5 attributes may have names with space inside and products of the COSMO SkyMed mission have a lot of this ones:

/S01/SBI:Column Spacing
/S01/SBI:Bottom Left Geodetic Coordinates
/S01/SBI:Bottom Right Geodetic Coordinates
/S01:Calibration Constant
/:Acquisition Mode
...

It is also important to note that metadata needed to actually use COSMO SkyMed products are spread over the root group (/), the /S01 group and the /S01/SBI dataset.

It would be nice to have a mechanism that allows to get them all together in a consistent way.

comment:5 by Even Rouault, 14 years ago

I recently discovered the existence of the GetMetadata_List() method that returns a list of "key=value" metadata item without trying to separate the key and value as in the dictionnary object returned by GetMetadata(). This might help a lot for the issue at stake here.

in reply to:  5 comment:6 by antonio, 14 years ago

Replying to rouault:

I recently discovered the existence of the GetMetadata_List() method that returns a list of "key=value" metadata item without trying to separate the key and value as in the dictionnary object returned by GetMetadata(). This might help a lot for the issue at stake here.

thanks Even, good suggestion.

Anyway I hope a decision on a more radical solution will be taken soon. I volunteer to implement the patch.

Further than solution suggested in previous comments:

  1. a better naming convention, e.g.
        "S01|SBI||Bottom Right Geodetic Coordinates"
    
  2. XML

I would like to suggest a third possible way:

  1. using dataset domains (e.g. "S01" or "S01|SBI") and the HDF5 attribute name ("Bottom Right Geodetic Coordinates") as metadata key. In order to allow the user to know available domains a special attribute could be added to the default domain:
           HDF_DOMAINS=S01, S01|SBI, ...
    

comment:7 by warmerdam, 14 years ago

Once again, I would suggest underscores instead of a special character like "|" for directory delimeters in metadata item names.

Pushing metadata into many domains will make it inaccessable in most applications. The normal practice is to flatten it out into the default domain unless it is anticipated to be very bulky and of little general interest.

I have so far avoided metadata listing domain names - partly with the idea that a uniform mechanism for domain discovery might be added in the future. Also, it would be easy to get into the situation where only the default domain metadata with the HDF_DOMAINS is copied to a new dataset but the non-default domains are not copied.

A flattening patch with underscores would be welcome.

in reply to:  7 comment:8 by antonio, 14 years ago

Replying to warmerdam:

Once again, I would suggest underscores instead of a special character like "|" for directory delimeters in metadata item names.

[CUT]

A flattening patch with underscores would be welcome.

OK Frank.

Just few questions:

  • HDF5 node names and attribute names with spaces are OK or they need a special handling?
  • if node or attribute name already has an underscore inside what should we do? ignore them? convert them into a double underscore?
  • being able to identify the original HDF5 path from the metadata key is a requirement?

Do you have any suggestion?

comment:9 by warmerdam, 14 years ago

I think spaces need to be converted to underscores. The GDAL "metadata key" name should be a well behaved identifier.

There is no need to change existing underscores.

I do not think it is important to be able to convert the metadata key name back into an HDF path.

comment:10 by warmerdam, 14 years ago

Component: PythonBindingsGDAL_Raster
Keywords: python removed
Milestone: 1.5.41.7.3

by antonio, 14 years ago

Attachment: hdf5-metadata.patch added

by antonio, 14 years ago

Attachment: metadata.h5 added

in reply to:  9 comment:11 by antonio, 14 years ago

Hi Frank,

Replying to warmerdam:

I think spaces need to be converted to underscores. The GDAL "metadata key" name should be a well behaved identifier.

There is no need to change existing underscores.

I've just uploaded a patch for HDF5 metadata names convention change (it also includes tests) and a test dataset to be added to autotest/gdrivers/data.

Please note that with the new naming convention metadata names are always prefixed by the HDF5 path, so for example

$ gdalinfo HDF5:data/metadata.h5://G1/D1
Driver: HDF5Image/HDF5 Dataset
Files: data/metadata.h5
Size is 1, 1
Metadata:
  G1_attribute_with_underscores=0 
  G1_attribute_with_spaces=0d 
  G1_attribute_with_spaces_and_underscores=0.1 
  G1_attribute=value
  Group_with_spaces_attribute_with_underscores=0 
  Group_with_spaces_attribute_with_spaces=0d 
  Group_with_spaces_attribute_with_spaces_and_underscores=0.1 
  Group_with_spaces_attribute=value
  Group_with_spaces_and_underscores_attribute_with_underscores=0 
  Group_with_spaces_and_underscores_attribute_with_spaces=0d 
  Group_with_spaces_and_underscores_attribute_with_spaces_and_underscores=0.1 
  Group_with_spaces_and_underscores_attribute=value
  Group_with_underscores_attribute_with_underscores=0 
  Group_with_underscores_attribute_with_spaces=0d 
  Group_with_underscores_attribute_with_spaces_and_underscores=0.1 
  Group_with_underscores_attribute=value
  attribute=value
  attribute_with_spaces=0 
  attribute_with_underscores=0d 
  attribute_with_spaces_and_underscores=0.100000 
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,    1.0)
Upper Right (    1.0,    0.0)
Lower Right (    1.0,    1.0)
Center      (    0.5,    0.5)
Band 1 Block=1x1 Type=Int32, ColorInterp=Undefined
  Metadata:
    G1_D1_attribute_with_underscores=0d
    G1_D1_attribute=value
    G1_D1_attribute_with_spaces_and_underscores=0.100000 
    G1_D1_attribute_with_spaces=0 

Metadata attached to the raster band are prefixed with the HDF5 path even if it could be considered redundant. Do you think this is the correct behaviour? Is it preferable to to remove the HDF5 path component of the name for metadata attached to raster bands?

I do not think it is important to be able to convert the metadata key name back into an HDF path.

In my opinion being able to rebuild the original HDF5 path could be important in case one want to implement write support for the HDF5 driver.

comment:12 by Kyle Shannon, 14 years ago

Cc: Kyle Shannon added

comment:13 by alexmantaut, 13 years ago

Cc: alexmantaut added

Hi all,

I've tried to apply the patch posted earlier to gdal 1.8.0 but it fails on the 5º chunk (hdf5dataset.cpp from line 598). It seems that a significant part of the code on hdf5dataset.cpp has changed since the patch was released a year ago... Can somebody release a newer version of the patch?

comment:14 by warmerdam, 13 years ago

Milestone: 1.8.11.9.0
Resolution: fixed
Status: assignedclosed

I have slightly updated the patch and applied it in trunk (r22517).

At this point I think it is somewhat disruptive to apply in 1.8 branch.

Thanks!

Note: See TracTickets for help on using tickets.