Opened 10 years ago

Closed 3 years ago

#4245 closed defect (wontfix)

PAM stores duplicate metadata even if the format and driver support

Reported by: etourigny Owned by: warmerdam
Priority: normal Milestone: closed_because_of_github_migration
Component: GDAL_Raster Version: unspecified
Severity: normal Keywords: pam, metatada, netcdf
Cc:

Description

PAM-enabled drivers always store metadata when called to save the xml in SerializeToXML(), regardless of internal support for metadata (e.g. gtiff, netcdf). This has the effect of duplicating information and can be confusing and lead to conflicting values. This also does not conform to the spirit of PAM, which is to add information that the driver/format cannot support (e.g. global metadata).

Of all the information stored in PAM files by SerializetoXML(), metadata is the only which is automatically copied.

For example, when calling gdalinfo -stats trmm.tif band statistics are saved in trmm.aux.xml for the raster bands, but all the global metadata is saved also, even there is no need for it as the tiff file has that metadata.

A simple fix would be to add an option in GDALPamDataset for saving global metadata through a new flag GPF_NOSAVEMETADATA in gdal_pam.h . This would not cause any disruptive changes, as nothing is changed unless a driver author adds that flag to his driver.

#define GPF_NOSAVEMETADATA      0x20  // do not try to save metadata - use if metadata is supported internally

and the metadata section in function GDALPamRasterBand::SerializeToXML() would be modified like:

+    if ( ! ( nPamFlags & GPF_NOSAVEMETADATA ) ) {
        CPLXMLNode *psMD;
        
        psMD = oMDMD.Serialize();
...
+    }

to implement this in a driver add to it's Open() function :

    poDS->nPamFlags |= GPF_NOSAVEMETADATA;

A more involved fix would be to track or detect which (new) metatada items cannot be saved to the file and only save those (e.g. when read-only such as in gdalinfo, or when unsupported by the file format).

Attaching a small patch for GDALPamDataset and test file trmm.tif

Attachments (3)

patch-pam-metadata.txt (1.7 KB) - added by etourigny 10 years ago.
trmm.tif (8.8 KB) - added by etourigny 10 years ago.
trmm.tif.aux.xml (2.5 KB) - added by etourigny 10 years ago.

Download all attachments as: .zip

Change History (8)

Changed 10 years ago by etourigny

Attachment: patch-pam-metadata.txt added

Changed 10 years ago by etourigny

Attachment: trmm.tif added

comment:1 Changed 10 years ago by Even Rouault

Etienne, I think that SerializeToXML() will only serialize the metadata that reached PAM, because it inspects its own oMDMD member, not the metadata at the dataset or band level, that would be unknown by him if the driver overrides the GetMetadata?(), GetMetadataItem?(), SetMetadata?() and SetMetadataItem?() methods to do its own management. Generally this is done by having a oDriverNameMDMD member in the driver and playing with PAM oMDMD member as well. The GTiff driver is a - complicated - example of how to do this (although there are some complexities/specificities in it you can drop, like the _temporary_ metadata domain management). I believe it behaves as you wish (it should minimize what goes to PAM to the minimum, and successfully merges the PAM and geotiff metadata), but AFAIR, this has been a long process to reach that state.

Although, if you just want to avoid the duplication of metadata in the CreateCopy?() case, I believe you could just replace the following line at the end of CreateCopy?() :

poDS->CloneInfo?( poSrcDS, GCIF_PAM_DEFAULT )

by

poDS->CloneInfo?( poSrcDS, GCIF_PAM_DEFAULT & ~ (GCIF_METADATA | GCIF_BAND_METADATA) )

This should have an excellent benefit/cost ratio ;-)

comment:2 Changed 10 years ago by etourigny

Even, thanks for the info, I will try this for netcdf. However, I believe there is a flaw in the geotiff PAM, in that it adds duplicate metadata.

I will attach a .aux.xml file resulting from "gdalinfo -stats", there is a bunch of metadata in there that is duplicated needlessly.

Changed 10 years ago by etourigny

Attachment: trmm.tif.aux.xml added

comment:3 Changed 10 years ago by Even Rouault

Looks like I was a bit too optimistic on the "should minimize what goes to PAM to the minimum" aspect ;-) But I've verified that the merging between PAM metadata and GTiff metadata works (with PAM metadata overriding GTiff metadata in case of "conflict"). Current behaviour can indeed cause weird behaviour : for example, if you have a metadata item that exists in both PAM and GTiff metadata (let's say with identical values), you open the geotiff in update mode, and change the value of the item, when you close it the value inside the geotiff will be updated with the new value, but when you reopen it, the old value that is still in PAM will be used. I'm afraid it would require substantial changes in the metadata management of the GTiff driver to fix that, because I think it implies one should keep track of what exists in PAM, in the GTiff, and the merged result, and proceed to updating both PAM and GTiff metadata differently in certain circumstances... So, if you really want to have "perfect" behaviour in the netCDF driver, the GTiff driver isn't the perfect example (the GTiff driver having complications since it stores metadata at dataset and band level, and that the metadata TIFF tag cannot exceed 32000 bytes in content, so when you go over that, you must fallback to PAM anyway).

comment:4 Changed 10 years ago by etourigny

Keywords: pam metatada netcdf added

comment:5 Changed 3 years ago by Even Rouault

Milestone: closed_because_of_github_migration
Resolution: wontfix
Status: newclosed

This ticket has been automatically closed because Trac is no longer used for GDAL bug tracking, since the project has migrated to GitHub?. If you believe this ticket is still valid, you may file it to https://github.com/OSGeo/gdal/issues if it is not already reported there.

Note: See TracTickets for help on using tickets.