Opened 16 years ago
Closed 13 years ago
#2412 closed defect (fixed)
Incorrect handling of HDF5 metadata in python bindings
Reported by: | antonio | Owned by: | warmerdam |
---|---|---|---|
Priority: | normal | Milestone: | 1.9.0 |
Component: | GDAL_Raster | Version: | 1.5.1 |
Severity: | normal | Keywords: | HDF5 |
Cc: | dnadeau, Kyle Shannon, alexmantaut |
Description
It is not possible to access HDF5 metadata from python bindings. Small test performed on Debian Sid (x86_64) with gdal 1.5.1
$ gdal-config --version 1.5.1 $ gdalinfo test.h5 Driver: HDF5/Hierarchical Data Format Release 5 Size is 512, 512 Coordinate System is `' Metadata: G1:b1=1 G1:b2=2 G1:b3=3 Subdatasets: SUBDATASET_0_NAME=HDF5:"test.h5"://G1/D1 SUBDATASET_0_DESC=[1x1] //G1/D1 (16-bit integer) Corner Coordinates: Upper Left ( 0.0, 0.0) Lower Left ( 0.0, 512.0) Upper Right ( 512.0, 0.0) Lower Right ( 512.0, 512.0) Center ( 256.0, 256.0) $ gdalinfo HDF5:test.h5://G1/D1szFilenname test.h5 Driver: HDF5Image/HDF5 Dataset Size is 1, 1 Coordinate System is: GEOGCS["WGS 84", DATUM["WGS_1984", SPHEROID["WGS 84",6378137,298.257223563, AUTHORITY["EPSG","7030"]], TOWGS84[0,0,0,0,0,0,0], AUTHORITY["EPSG","6326"]], PRIMEM["Greenwich",0, AUTHORITY["EPSG","8901"]], UNIT["degree",0.0174532925199433, AUTHORITY["EPSG","9108"]], AXIS["Lat",NORTH], AXIS["Long",EAST], AUTHORITY["EPSG","4326"]] Metadata: G1:b1=1 G1:b2=2 G1:b3=3 Corner Coordinates: Upper Left ( 0.0, 0.0) Lower Left ( 0.0, 1.0) Upper Right ( 1.0, 0.0) Lower Right ( 1.0, 1.0) Center ( 0.5, 0.5) Band 1 Block=1x1 Type=Int16, ColorInterp=Undefined Metadata: D1:c1=1 D1:c2=2 D1:c3=3 $ ipython Python 2.5.2 (r252:60911, May 7 2008, 15:21:12) Type "copyright", "credits" or "license" for more information. IPython 0.8.2 -- An enhanced Interactive Python. ? -> Introduction to IPython's features. %magic -> Information about IPython's 'magic' % functions. help -> Python's own help system. object? -> Details about 'object'. ?object also works, ?? prints more. In [1]: from osgeo import gdal In [2]: d = gdal.Open('test.h5') In [3]: d.GetMetadata() Out[3]: {'G1': 'b3=3 '} In [4]: d = gdal.Open('HDF5:test.h5://G1/D1') szFilenname test.h5 In [5]: d.GetMetadata() Out[5]: {'G1': 'b3=3 '} In [6]: b = d.GetRasterBand(1) In [7]: d.GetMetadata() Out[7]: {'D1': 'c3=3 '}
Both in datasets and raster bands only the last metadata item is showed in form 'key=value', whith the name of HDF5 group/dataset is used as metadata key.
Attachments (2)
Change History (16)
follow-up: 2 comment:1 by , 16 years ago
Cc: | added |
---|---|
Owner: | changed from | to
Status: | new → assigned |
comment:2 by , 16 years ago
Replying to warmerdam:
The problem is that ":" is considered to be a keyword/value seperator much like "=" is. So the metadata items end up being treated as a bunch of "G1" keywords and when put into a dictionary in Python they collide and only the last survives.
I think we need to review the naming convention for metadata in the HDF5 driver.
The problem isn't in the Python bindings per-se, it's just that it manifests there because they take the dictionary like behavior of the metadata more literally than the core API.
Denis - any thoughts on the keyword naming and if it can be easily changed?
Oh, my suggestion is to use the full-path of the attribute: key="/G1/D1/c3". Is it feasible?
Another solution could be using XML, but I suppose this will take a larger effort.
follow-up: 4 comment:3 by , 16 years ago
The keys are intended to be "proper tokens". I would advise against using special characters like slashes. Better (I think) to replace the : with an underscore.
But I'm not sure what the implications of doing this are.
comment:4 by , 16 years ago
Replying to warmerdam:
The keys are intended to be "proper tokens". I would advise against using special characters like slashes. Better (I think) to replace the : with an underscore.
But I'm not sure what the implications of doing this are.
Yes you are absolutely right.
Also consider that HDF5 attributes may have names with space inside and products of the COSMO SkyMed mission have a lot of this ones:
/S01/SBI:Column Spacing /S01/SBI:Bottom Left Geodetic Coordinates /S01/SBI:Bottom Right Geodetic Coordinates /S01:Calibration Constant /:Acquisition Mode ...
It is also important to note that metadata needed to actually use COSMO SkyMed products are spread over the root group (/), the /S01 group and the /S01/SBI dataset.
It would be nice to have a mechanism that allows to get them all together in a consistent way.
follow-up: 6 comment:5 by , 14 years ago
I recently discovered the existence of the GetMetadata_List() method that returns a list of "key=value" metadata item without trying to separate the key and value as in the dictionnary object returned by GetMetadata(). This might help a lot for the issue at stake here.
comment:6 by , 14 years ago
Replying to rouault:
I recently discovered the existence of the GetMetadata_List() method that returns a list of "key=value" metadata item without trying to separate the key and value as in the dictionnary object returned by GetMetadata(). This might help a lot for the issue at stake here.
thanks Even, good suggestion.
Anyway I hope a decision on a more radical solution will be taken soon. I volunteer to implement the patch.
Further than solution suggested in previous comments:
- a better naming convention, e.g.
"S01|SBI||Bottom Right Geodetic Coordinates"
- XML
I would like to suggest a third possible way:
- using dataset domains (e.g. "S01" or "S01|SBI") and the HDF5 attribute name ("Bottom Right Geodetic Coordinates") as metadata key.
In order to allow the user to know available domains a special attribute could be added to the default domain:
HDF_DOMAINS=S01, S01|SBI, ...
follow-up: 8 comment:7 by , 14 years ago
Once again, I would suggest underscores instead of a special character like "|" for directory delimeters in metadata item names.
Pushing metadata into many domains will make it inaccessable in most applications. The normal practice is to flatten it out into the default domain unless it is anticipated to be very bulky and of little general interest.
I have so far avoided metadata listing domain names - partly with the idea that a uniform mechanism for domain discovery might be added in the future. Also, it would be easy to get into the situation where only the default domain metadata with the HDF_DOMAINS is copied to a new dataset but the non-default domains are not copied.
A flattening patch with underscores would be welcome.
comment:8 by , 14 years ago
Replying to warmerdam:
Once again, I would suggest underscores instead of a special character like "|" for directory delimeters in metadata item names.
[CUT]
A flattening patch with underscores would be welcome.
OK Frank.
Just few questions:
- HDF5 node names and attribute names with spaces are OK or they need a special handling?
- if node or attribute name already has an underscore inside what should we do? ignore them? convert them into a double underscore?
- being able to identify the original HDF5 path from the metadata key is a requirement?
Do you have any suggestion?
follow-up: 11 comment:9 by , 14 years ago
I think spaces need to be converted to underscores. The GDAL "metadata key" name should be a well behaved identifier.
There is no need to change existing underscores.
I do not think it is important to be able to convert the metadata key name back into an HDF path.
comment:10 by , 14 years ago
Component: | PythonBindings → GDAL_Raster |
---|---|
Keywords: | python removed |
Milestone: | 1.5.4 → 1.7.3 |
by , 14 years ago
Attachment: | hdf5-metadata.patch added |
---|
by , 14 years ago
Attachment: | metadata.h5 added |
---|
comment:11 by , 14 years ago
Hi Frank,
Replying to warmerdam:
I think spaces need to be converted to underscores. The GDAL "metadata key" name should be a well behaved identifier.
There is no need to change existing underscores.
I've just uploaded a patch for HDF5 metadata names convention change (it also includes tests) and a test dataset to be added to autotest/gdrivers/data
.
Please note that with the new naming convention metadata names are always prefixed by the HDF5 path, so for example
$ gdalinfo HDF5:data/metadata.h5://G1/D1 Driver: HDF5Image/HDF5 Dataset Files: data/metadata.h5 Size is 1, 1 Metadata: G1_attribute_with_underscores=0 G1_attribute_with_spaces=0d G1_attribute_with_spaces_and_underscores=0.1 G1_attribute=value Group_with_spaces_attribute_with_underscores=0 Group_with_spaces_attribute_with_spaces=0d Group_with_spaces_attribute_with_spaces_and_underscores=0.1 Group_with_spaces_attribute=value Group_with_spaces_and_underscores_attribute_with_underscores=0 Group_with_spaces_and_underscores_attribute_with_spaces=0d Group_with_spaces_and_underscores_attribute_with_spaces_and_underscores=0.1 Group_with_spaces_and_underscores_attribute=value Group_with_underscores_attribute_with_underscores=0 Group_with_underscores_attribute_with_spaces=0d Group_with_underscores_attribute_with_spaces_and_underscores=0.1 Group_with_underscores_attribute=value attribute=value attribute_with_spaces=0 attribute_with_underscores=0d attribute_with_spaces_and_underscores=0.100000 Corner Coordinates: Upper Left ( 0.0, 0.0) Lower Left ( 0.0, 1.0) Upper Right ( 1.0, 0.0) Lower Right ( 1.0, 1.0) Center ( 0.5, 0.5) Band 1 Block=1x1 Type=Int32, ColorInterp=Undefined Metadata: G1_D1_attribute_with_underscores=0d G1_D1_attribute=value G1_D1_attribute_with_spaces_and_underscores=0.100000 G1_D1_attribute_with_spaces=0
Metadata attached to the raster band are prefixed with the HDF5 path even if it could be considered redundant. Do you think this is the correct behaviour? Is it preferable to to remove the HDF5 path component of the name for metadata attached to raster bands?
I do not think it is important to be able to convert the metadata key name back into an HDF path.
In my opinion being able to rebuild the original HDF5 path could be important in case one want to implement write support for the HDF5 driver.
comment:12 by , 14 years ago
Cc: | added |
---|
comment:13 by , 13 years ago
Cc: | added |
---|
Hi all,
I've tried to apply the patch posted earlier to gdal 1.8.0 but it fails on the 5º chunk (hdf5dataset.cpp from line 598). It seems that a significant part of the code on hdf5dataset.cpp has changed since the patch was released a year ago... Can somebody release a newer version of the patch?
comment:14 by , 13 years ago
Milestone: | 1.8.1 → 1.9.0 |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
I have slightly updated the patch and applied it in trunk (r22517).
At this point I think it is somewhat disruptive to apply in 1.8 branch.
Thanks!
The problem is that ":" is considered to be a keyword/value seperator much like "=" is. So the metadata items end up being treated as a bunch of "G1" keywords and when put into a dictionary in Python they collide and only the last survives.
I think we need to review the naming convention for metadata in the HDF5 driver.
The problem isn't in the Python bindings per-se, it's just that it manifests there because they take the dictionary like behavior of the metadata more literally than the core API.
Denis - any thoughts on the keyword naming and if it can be easily changed?