
Version 14 (modified by heikki, 16 years ago)

--

ebXML : Transforming ISO19139 metadata to ebRIM : issues in the specification

author: Heikki Doeleman

This page describes uncertainties arising from the obscurity of OGC 07-038, (mainly) section F.


Introduction

The specification in OGC 07-038 section F of how to register ISO metadata in an ebRIM registry is rather obscure. Apart from a very loose use of language for specific technical concepts such as XML 'elements' and 'attributes' (that document usually calls anything an 'attribute' or a 'property', regardless), several other things are unclear. This page lists our uncertainties in how to interpret that document.


the list

General

  • The object type of DataSet is defined in the Basic Extension package (OGC 07-144r2) as "urn:ogc:def:ebRIM-ObjectType:OGC:Dataset". The CIM spec (07-038) makes many references to DataSet, but with object type "urn:x-ogc:specification:csw-ebrim-cim:ObjectType:Dataset". This type is not defined in 07-038 (nor anywhere else that we are aware of). So which one should we use? For now, I'm assuming the type from 07-144r2 is preferred, as it is actually defined, whereas the alternative type in 07-038 is not.

Table F.2 describes the creation of a ResourceMetadata object.

  • fileIdentifier : the table says it is not mapped, but "see Table F.1". Does this mean the information (already put in a MetadataInformation in table F.1) must be repeated in this ResourceMetadata ? In this case I opted for YES.
  • language : the table says it is not mapped, but "see Table F.1". Does this mean the information (already put in a MetadataInformation in table F.1) must be repeated in this ResourceMetadata ? In this case I opted for YES.
  • parentIdentifier : the table says it is not mapped, but "see Table F.1". Does this mean the information (already processed into an extra MetadataInformation in table F.1) must be repeated ? In this case I opted for NO, as there already is a parent MetadataInformation as per table F.1.

Section F.3

  • identificationInfo : "In this profile, the cardinality of this property is restricted to 1..1 for the ISO 19139 metadata files stored in the ebRIM Repository." Very good, but what to do with perfectly valid ISO19139 documents that have more than 1 identificationInfo ? Then again, Table F.2 says to apply the Section F.3 transformation to "each instance of the property (identificationInfo)". I'm doing it for-each, now.
  • Section F.3.1 distinguishes DataSet/DatasetCollection, Service and Application types of Information Resource by the value of 'hierarchyLevel'. In ISO19139 it is perfectly valid to have 0 or more than 1 hierarchyLevel. What to do in these cases ? For now, I'm just using the first one if there are several, and if there are none, I fall back to 'DataSet'.
  • Section F.3.1 tries to describe cases for distinguishing DataSet from DatasetCollection, when hierarchyLevel is 'dataset'. Says the spec: "in the case of an ISO 19139 compliant metadata record, the value of the MD_Metadata.hierarchyLevel property may serve as a discriminator since ISO 19139 extends the MD_ScopeCode codelist to add specific values for aggregation;" But alas, it's not defined exactly *which* codelist values map to aggregate. This is not obvious. I'm ignoring this for now.
  • Section F.3.2 is very hard to understand. Two remarks here: (1) "Application, Dataset and DatasetCollection instances are part of the metadata context and do not have any attributes." I don't see much point in generating ExtrinsicObjects without any attributes (they would just clutter up the registry, not being related to anything), so in my opinion we had better skip these. And (2) about the instance of DataMetadata it talks about: I'm assuming they mean that the earlier-created ResourceMetadata (which is an abstract type, so really couldn't be instantiated) is here further specified for the DataMetadata subtype. So I'm not creating a new ExtrinsicObject here, but continue processing the ResourceMetadata (now known to be DataMetadata) created earlier. Whether any of this is what the spec writer intended to convey, I don't know.
  • Table F.3 : abstract is mapped to Description. Now, a Description is something that has a LocalizedString, where the actual string value goes in the attribute 'value'. That one has a max length of 1024 ! I've seen many ISO19139 documents where the abstract is longer than that. These will all be rendered into invalid ebRIM documents. I think something needs to be done about this.
  • Table F.3 : spatialResolution may be mapped to <<slot>> resolution. In this case there are 2 values involved: "uom" (units) and the distance itself. I put these in a slot with a valuelist that has 2 values, arbitrarily putting the uom in the first value. Is there no specification about this ?
  • Table F.3 : descriptiveKeywords : "The classification defines both the keyword type, the keyword and its thesaurus." Keyword type can be one of "discipline", "place", "stratum", "temporal" or "theme". Little problem: the keywords in ISO descriptiveKeywords are not qualified with any type. How can our XSLT transformation decide the type of arbitrary keywords? For now I'm ignoring the keyword type classification.
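The hierarchyLevel workaround described above (use the first value, fall back to a dataset when there is none) could be sketched as follows. This is only an illustrative Python sketch, not the XSLT used for the actual transformation; the function name `pick_hierarchy_level` is hypothetical, and the default is assumed here to be the ISO code value 'dataset', standing in for the DataSet type.

```python
import xml.etree.ElementTree as ET

# ISO 19139 gmd namespace
NS = {"gmd": "http://www.isotc211.org/2005/gmd"}


def pick_hierarchy_level(md_metadata, default="dataset"):
    """Return the first non-empty hierarchyLevel code value in document order.

    Falls back to `default` when the record has no hierarchyLevel at all
    (assumption: 'dataset' stands in for the DataSet type mentioned above).
    """
    for code in md_metadata.findall("gmd:hierarchyLevel/gmd:MD_ScopeCode", NS):
        value = code.get("codeListValue")
        if value:
            return value
    return default
```

Called on a parsed gmd:MD_Metadata element, this returns a single code value however many hierarchyLevel elements the record has, so the F.3.1 type distinction always has something to work with.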

Table F.15

  • "The existence of an instance of MD_Metadata.referenceSystemInfo will possibly imply to create an instance of CitedItem along with an instance of the association Authority between IdentifiedItem and CitedItem."

*possibly* ? What is that supposed to mean ? I'm assuming : only if there is an authority element in referenceSystemInfo.

  • alternateTitle : this has cardinality 0..n, but this spec doesn't mention it. I'm assuming they mean to say "for each".
  • date : must be mapped to <<slot>> created, <<slot>> modified or <<slot>> issued. The spec does not say *how* this must be mapped. I'm using 'creation', 'revision' and 'publication' from the codelists used in ISO.
  • date : this has cardinality 1..n, but this spec doesn't mention it. I'm assuming they mean to say "for each".
  • identifier.MD_Identifier.code : "Identifiers with no codespace do not carry sufficient information and are not mapped to externalIdentifier, for which the codespace is required." BUT MD_Identifier *never* has a codespace, per the XSD. Only RS_Identifier, which is in its substitution group, may have a codespace. I'm assuming they intended to say RS_Identifier.
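The date mapping assumed above ('creation', 'revision' and 'publication' to the created, modified and issued slots, applied for each date) might look like this in a Python sketch. The function name and the (slot, value) tuple representation are hypothetical, chosen only for illustration; dates with any other dateType are simply skipped here, which is an assumption as well.

```python
import xml.etree.ElementTree as ET

# ISO 19139 namespaces
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# ISO CI_DateTypeCode value -> slot name, as assumed in the notes above
DATE_TYPE_TO_SLOT = {
    "creation": "created",
    "revision": "modified",
    "publication": "issued",
}


def citation_date_slots(ci_citation):
    """Return one (slot name, date string) pair per mappable CI_Date."""
    slots = []
    for ci_date in ci_citation.findall("gmd:date/gmd:CI_Date", NS):
        code = ci_date.find("gmd:dateType/gmd:CI_DateTypeCode", NS)
        slot = DATE_TYPE_TO_SLOT.get(code.get("codeListValue")) if code is not None else None
        # The date value may be a gco:Date or a gco:DateTime
        value = (ci_date.findtext("gmd:date/gco:Date", namespaces=NS)
                 or ci_date.findtext("gmd:date/gco:DateTime", namespaces=NS))
        if slot and value:
            slots.append((slot, value.strip()))
    return slots
```

Because the function loops over every gmd:date, a citation with several dates yields several slots, which matches the "for each" reading of the 1..n cardinality.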

Table F.16

  • Every time, a new Organization is created. So these Organizations are never re-used or shared between the data referring to them. This does not seem to make much sense to me.
  • individualName : the spec says it is ignored, but not "If needed". Well, I'm ignoring it.
  • organizationName : this must be organisationName (with 's') in ISO.
  • organizationName : this is not a required element in ISO. What if it is absent ? The created Organization will be rather non-descript.
  • about the CitedResponsibleParty Association : "The association Type has a set of subtypes operating to the same object types: PointOfContact, Author, Originator, Publisher." This is not true, no such subtypes are defined. From clues elsewhere in that document I take it this is handled by classifying the association.
  • the codelist values for gmd:role can be many other things than just 'pointOfContact', 'author', 'originator', or 'publisher'. If it is not one of those 4, I ignore it, so no classification will be created. Does it make sense to you?
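The handling of a responsible party described above can be illustrated with a small Python sketch: read organisationName (ISO spelling, and possibly absent) and keep the role only when it is one of the four values that get a classification. The function name `describe_party` and the returned dictionary are hypothetical and not part of the spec or of the actual XSLT.

```python
import xml.etree.ElementTree as ET

# ISO 19139 namespaces
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# The only gmd:role values that lead to a classification in the notes above
CLASSIFIED_ROLES = {"pointOfContact", "author", "originator", "publisher"}


def describe_party(party):
    """Summarise a gmd:CI_ResponsibleParty element for the Organization mapping."""
    # organisationName is optional in ISO, so this may be None
    name = party.findtext("gmd:organisationName/gco:CharacterString", namespaces=NS)
    role_el = party.find("gmd:role/gmd:CI_RoleCode", NS)
    role = role_el.get("codeListValue") if role_el is not None else None
    return {
        "organisation": name.strip() if name else None,
        # Any other role value is ignored, so no classification is created
        "classification": role if role in CLASSIFIED_ROLES else None,
    }
```

A party with role 'custodian' and no organisationName comes back with both entries set to None, which shows how nondescript the resulting Organization would be.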

TO BE CONTINUED

