
ebXML: Transforming ISO19139 metadata to ebRIM: issues in the specification

author: Heikki Doeleman

This page describes uncertainties arising from unclear passages in OGC 07-038, mainly in section F.


Introduction

Section F of OGC 07-038, which specifies how to register ISO metadata in an ebRIM registry, is rather obscure. Besides using technical terms very loosely, it leaves several other points unclear. This page lists our uncertainties about how to interpret that document and the choices we have made.


the list

  1. Type of DataSet

The object type of DataSet is defined in the Basic Extension Package (OGC 07-144r2) as "urn:ogc:def:ebRIM-ObjectType:OGC:Dataset". The CIM spec (07-038) refers to DataSet many times, but with the object type "urn:x-ogc:specification:csw-ebrim-cim:ObjectType:Dataset". This type is not defined in 07-038 (nor anywhere else that we are aware of). So which one should we use? For now we use the type defined in 07-144r2, because it is actually defined, whereas the alternative type used in 07-038 is not.

  1. Repeated fileIdentifier

Table F.2 says fileIdentifier is not mapped, but adds "see Table F.1". Does this mean that the information, already stored in a MetadataInformation per Table F.1, must be repeated in this ResourceMetadata? In this case we chose YES.

  1. Repeated language

Table F.2 says language is not mapped, but adds "see Table F.1". Does this mean that the information, already stored in a MetadataInformation per Table F.1, must be repeated in this ResourceMetadata? In this case we chose YES.
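For the two items above, here is a minimal sketch of what "repeated" amounts to. It uses Python's standard-library ElementTree rather than the XSLT we actually use, and the helper name repeated_values is hypothetical. It collects the fileIdentifier and language that go into the ResourceMetadata in addition to the MetadataInformation of Table F.1. parentIdentifier is deliberately left out, per the choice in the next item.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def repeated_values(md):
    """Return fileIdentifier and language of a gmd:MD_Metadata element.

    Hypothetical helper: under our reading of Table F.2, both values are
    repeated in the ResourceMetadata, besides going into the
    MetadataInformation of Table F.1. parentIdentifier is left out on purpose.
    """
    values = {}
    file_id = md.findtext("gmd:fileIdentifier/gco:CharacterString", namespaces=NS)
    if file_id:
        values["fileIdentifier"] = file_id.strip()
    # gmd:language holds either free text or a gmd:LanguageCode element.
    lang = md.findtext("gmd:language/gco:CharacterString", namespaces=NS)
    if not lang:
        code = md.find("gmd:language/gmd:LanguageCode", NS)
        lang = code.get("codeListValue") if code is not None else None
    if lang:
        values["language"] = lang.strip()
    return values


record = ET.fromstring(
    '<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"'
    ' xmlns:gco="http://www.isotc211.org/2005/gco">'
    "<gmd:fileIdentifier><gco:CharacterString>abc-123</gco:CharacterString></gmd:fileIdentifier>"
    '<gmd:language><gmd:LanguageCode codeListValue="eng"/></gmd:language>'
    "</gmd:MD_Metadata>"
)
print(repeated_values(record))  # {'fileIdentifier': 'abc-123', 'language': 'eng'}
```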

  1. Repeated parentIdentifier

Table F.2 says parentIdentifier is not mapped, but adds "see Table F.1". Does this mean that the information, already processed into an extra MetadataInformation per Table F.1, must be repeated? In this case we chose NO, because a parent MetadataInformation is already created per Table F.1.

  1. Cardinality of identificationInfo

"In this profile, the cardinality of this property is restricted to 1..1 for the ISO 19139 metadata files stored in the ebRIM Repository." On the other hand, in table F.2 it says to process identificationInfo "for each instance of the property". We chose to have the transformation do it for each instance, and the caller of the transformation (the ISO2ebRIMService) is responsible for enforcing the cardinality constraint: it rejects ISO19139 documents that do not have exactly 1 identificationInfo.
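The cardinality check performed by the ISO2ebRIMService could look roughly like the following Python sketch. The names check_identification_info and CardinalityError are hypothetical and not the service's actual API. The transformation itself still processes every identificationInfo; this check only decides whether a document is accepted at all.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"


class CardinalityError(ValueError):
    """Hypothetical: raised when a record violates the 1..1 restriction on identificationInfo."""


def check_identification_info(md):
    """Reject an MD_Metadata element unless it has exactly one identificationInfo."""
    # Only direct children of MD_Metadata are counted.
    count = len(md.findall(f"{{{GMD}}}identificationInfo"))
    if count != 1:
        raise CardinalityError(f"expected exactly 1 identificationInfo, found {count}")
    return md


def make_record(n):
    """Build a bare MD_Metadata element with n identificationInfo children."""
    md = ET.Element(f"{{{GMD}}}MD_Metadata")
    for _ in range(n):
        ET.SubElement(md, f"{{{GMD}}}identificationInfo")
    return md


for n in (0, 1, 2):
    try:
        check_identification_info(make_record(n))
        print(n, "accepted")
    except CardinalityError as err:
        print(n, "rejected:", err)
```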

  • Section F.3.1 distinguishes the DataSet/DatasetCollection, Service and Application types of Information Resource by the value of 'hierarchyLevel'. In ISO19139 it is perfectly valid to have no hierarchyLevel at all, or more than one. What should be done in these cases? For now, I use the first one if there are several, and fall back to 'DataSet' if there are none.
  • Section F.3.1 tries to describe how to distinguish DataSet from DatasetCollection when hierarchyLevel is 'dataset'. The spec says: "in the case of an ISO 19139 compliant metadata record, the value of the MD_Metadata.hierarchyLevel property may serve as a discriminator since ISO 19139 extends the MD_ScopeCode codelist to add specific values for aggregation;" Alas, it does not define exactly *which* codelist values map to an aggregate, and this is not obvious. I'm ignoring this for now.
  • Section F.3.2 is very hard to understand. Two remarks: (1) "Application, Dataset and DatasetCollection instances are part of the metadata context and do not have any attributes." I don't see much point in generating ExtrinsicObjects without any attributes: they would just clutter up the registry without being related to anything. In my opinion we had better skip them. (2) As for the instance of DataMetadata it talks about, I assume it means that the ResourceMetadata created earlier (an abstract type, so it could not really be instantiated) is further specified here as the DataMetadata subtype. So I'm not creating a new ExtrinsicObject here, but continue processing the earlier ResourceMetadata, now known to be a DataMetadata. Whether any of this is what the spec writer intended to convey, I don't know.
  • Table F.3: abstract is mapped to Description. A Description holds a LocalizedString, whose actual string goes in the attribute 'value', and that attribute has a maximum length of 1024 characters! I've seen many ISO19139 documents with longer abstracts, and all of them will be transformed into invalid ebRIM documents. I think something needs to be done about this.
  • Table F.3: spatialResolution may be mapped to <<slot>> resolution. Two values are involved here: the "uom" (unit of measure) and the distance itself. I put them in a slot whose valuelist has 2 values, arbitrarily putting the uom first. Is there really no specification for this?
  • Table F.3: descriptiveKeywords: "The classification defines both the keyword type, the keyword and its thesaurus." The keyword type can be one of "discipline", "place", "stratum", "temporal" or "theme". Small problem: in ISO the keyword type is optional (the gmd:type element of MD_Keywords may be absent), so keywords in descriptiveKeywords are not necessarily qualified with any type. How can our XSLT transformation decide the type of arbitrary, untyped keywords? For now I'm ignoring the keyword type classification.
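Several of the choices above are sketched below in Python with ElementTree. The sketch covers picking the hierarchyLevel, detecting abstracts longer than the 1024-character limit of ebRIM 3.0 FreeFormText, building the two-value resolution slot, and reading the optional keyword type where it is present. It reflects our reading of the spec, not the XSLT we actually run. All function names are hypothetical, and the slot name "resolution" is a placeholder.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
RIM = "urn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0"

# ebRIM 3.0 LocalizedString/@value is FreeFormText, at most 1024 characters.
MAX_FREE_FORM_TEXT = 1024


def effective_hierarchy_level(md):
    """First non-empty hierarchyLevel code, or 'dataset' when there is none."""
    for code in md.iterfind("gmd:hierarchyLevel/gmd:MD_ScopeCode", NS):
        if code.get("codeListValue"):
            return code.get("codeListValue")
    return "dataset"


def abstract_too_long(md):
    """True when the abstract would not fit in a Description's LocalizedString."""
    text = md.findtext(
        "gmd:identificationInfo/*/gmd:abstract/gco:CharacterString",
        default="", namespaces=NS,
    )
    return len(text) > MAX_FREE_FORM_TEXT


def resolution_slot(distance):
    """rim:Slot for a gco:Distance: uom first, then the value (our arbitrary order)."""
    # Slot name is a placeholder; the spec's actual slot identifier may differ.
    slot = ET.Element(f"{{{RIM}}}Slot", name="resolution")
    value_list = ET.SubElement(slot, f"{{{RIM}}}ValueList")
    for text in (distance.get("uom", ""), (distance.text or "").strip()):
        ET.SubElement(value_list, f"{{{RIM}}}Value").text = text
    return slot


def keyword_types(md):
    """(keyword, type) pairs; type is None when gmd:type is absent."""
    pairs = []
    path = "gmd:identificationInfo/*/gmd:descriptiveKeywords/gmd:MD_Keywords"
    for block in md.iterfind(path, NS):
        code = block.find("gmd:type/gmd:MD_KeywordTypeCode", NS)
        kw_type = code.get("codeListValue") if code is not None else None
        for kw in block.iterfind("gmd:keyword/gco:CharacterString", NS):
            pairs.append(((kw.text or "").strip(), kw_type))
    return pairs
```

Keywords that come back with type None are exactly the ones we cannot classify by keyword type.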

Section F.4 (Tables F.7, F.8, F.9)

  • This section states that for every occurrence of MD_Constraints, MD_LegalConstraints and/or MD_SecurityConstraints, an instance of Rights is created. It then states that for MD_LegalConstraints, an instance of Rights and an instance of LegalConstraints are created; likewise, for MD_SecurityConstraints, an instance of Rights and an instance of SecurityConstraints are created. Now, LegalConstraints and SecurityConstraints *extend* Rights. Wouldn't it be better if they actually acted as such, rather than being created in addition to an instance of their base class Rights?
  • The instance of Rights contains a <<slot>> abstract with the value of useLimitation. But useLimitation is an optional element in MD_Constraints, MD_LegalConstraints and MD_SecurityConstraints. What should be done when it is missing? At the moment, an 'empty' Rights is created.
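To make the current behaviour concrete, here is a Python sketch of which objects get created per constraint element. A missing useLimitation is reported as None, which corresponds to the 'empty' Rights. It assumes the constraints are read from resourceConstraints under identificationInfo, and the function name plan_rights is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Objects created per constraint element, as we read Tables F.7, F.8 and F.9.
CREATED_TYPES = {
    "MD_Constraints": ("Rights",),
    "MD_LegalConstraints": ("Rights", "LegalConstraints"),
    "MD_SecurityConstraints": ("Rights", "SecurityConstraints"),
}


def plan_rights(md):
    """List (object types, useLimitation) for each resource constraint.

    Hypothetical helper. Assumes the constraints come from resourceConstraints
    under identificationInfo. A useLimitation of None means the Rights is
    created 'empty', i.e. without an abstract slot, which is what we do now.
    Only the first useLimitation is taken, although ISO allows several.
    """
    plans = []
    for holder in md.iterfind("gmd:identificationInfo/*/gmd:resourceConstraints", NS):
        for constraint in holder:
            local_name = constraint.tag.rsplit("}", 1)[-1]
            if local_name in CREATED_TYPES:
                limitation = constraint.findtext(
                    "gmd:useLimitation/gco:CharacterString", namespaces=NS
                )
                plans.append((CREATED_TYPES[local_name], limitation))
    return plans
```

If LegalConstraints and SecurityConstraints acted as true subtypes of Rights, each entry in CREATED_TYPES would shrink to a single type.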

Table F.15

  • "The existence of an instance of MD_Metadata.referenceSystemInfo will possibly imply to create an instance of CitedItem along with an instance of the association Auhority between IdentifiedItem and CitedItem."

*Possibly*? What is that supposed to mean? I'm assuming it means: if there is an authority element in referenceSystemInfo.
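Under that assumption, the test for "possibly" reduces to looking for an authority citation inside each reference system identifier. The following Python sketch does this. The function name authority_titles is hypothetical, and it returns citation titles only for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Any identifier type (MD_Identifier or RS_Identifier) may carry an authority.
AUTHORITY_PATH = (
    "gmd:referenceSystemInfo/gmd:MD_ReferenceSystem/"
    "gmd:referenceSystemIdentifier/*/gmd:authority/gmd:CI_Citation"
)


def authority_titles(md):
    """Titles of the authority citations found in referenceSystemInfo.

    Hypothetical helper. Under our reading, each hit yields one CitedItem plus
    one Authority association; reference systems without an authority yield
    nothing.
    """
    return [
        citation.findtext("gmd:title/gco:CharacterString", default="", namespaces=NS).strip()
        for citation in md.iterfind(AUTHORITY_PATH, NS)
    ]
```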

  • alternateTitle: this has cardinality 0..n, but the spec doesn't mention it. I'm assuming they mean "for each".
  • date: must be mapped to <<slot>> created, <<slot>> modified or <<slot>> issued. The spec does not say *how* this must be mapped. I'm using the ISO date type codelist (CI_DateTypeCode) values 'creation', 'revision' and 'publication', respectively.
  • date: this has cardinality 1..n, but the spec doesn't mention it. I'm assuming they mean "for each".
  • identifier.MD_Identifier.code: "Identifiers with no codespace do not carry sufficient information and are not mapped to externalIdentifier, for which the codespace is required." BUT according to the XSD, MD_Identifier *never* has a codespace. Only RS_Identifier, a member of its substitution group, may have one. I'm assuming they intended to say RS_Identifier.
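The three assumptions above ("for each" alternateTitle and date, the date type mapping, and RS_Identifier only) are sketched below in Python. The function names and the slot keys used as dictionary keys are hypothetical, and the date mapping is our own choice, not something the spec states.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Our mapping of CI_DateTypeCode values to the date slots of Table F.15.
DATE_SLOTS = {"creation": "created", "revision": "modified", "publication": "issued"}


def citation_slots(citation):
    """Slot values of a gmd:CI_Citation, one entry per occurrence ("for each")."""
    slots = {"alternateTitle": [], "created": [], "modified": [], "issued": []}
    for alt in citation.iterfind("gmd:alternateTitle/gco:CharacterString", NS):
        slots["alternateTitle"].append((alt.text or "").strip())
    for ci_date in citation.iterfind("gmd:date/gmd:CI_Date", NS):
        code = ci_date.find("gmd:dateType/gmd:CI_DateTypeCode", NS)
        slot = DATE_SLOTS.get(code.get("codeListValue")) if code is not None else None
        # gmd:date holds either a gco:Date or a gco:DateTime.
        value = ci_date.findtext("gmd:date/gco:Date", namespaces=NS) or ci_date.findtext(
            "gmd:date/gco:DateTime", namespaces=NS
        )
        if slot and value:  # other date types are skipped
            slots[slot].append(value.strip())
    return slots


def external_identifiers(citation):
    """(codeSpace, code) pairs; identifiers without a codeSpace are skipped.

    Only gmd:RS_Identifier is considered, since MD_Identifier has no codeSpace.
    """
    pairs = []
    for rs_id in citation.iterfind("gmd:identifier/gmd:RS_Identifier", NS):
        code = rs_id.findtext("gmd:code/gco:CharacterString", namespaces=NS)
        space = rs_id.findtext("gmd:codeSpace/gco:CharacterString", namespaces=NS)
        if code and space:
            pairs.append((space.strip(), code.strip()))
    return pairs
```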

Table F.16

  • Every time, a new Organization is created, so these Organizations are never reused or shared among the data that refer to them. That does not seem to make much sense to me.
  • individualName: is ignored, but not "If needed". Well... I'm ignoring it.
  • organizationName: in ISO this element is spelled organisationName (with an 's').
  • organizationName: this is not a required element in ISO. What if it is absent? The created Organization will be rather nondescript.
  • About the CitedResponsibleParty Association: "The association Type has a set of subtypes operating to the same object types: PointOfCOntact, Author, Originator, Publisher." This is not true: no such subtypes are defined. From clues elsewhere in that document, I gather this is handled by classifying the association.
  • The codelist values for gmd:role can be many things other than 'pointOfContact', 'author', 'originator' or 'publisher'. If the role is not one of those 4, I ignore it, so no classification is created. Does that make sense to you?
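The Python sketch below combines the role filter we currently apply with the reuse of Organizations that the first bullet argues for. It is not what the transformation does today. OrganizationCache and responsible_parties are hypothetical names, and matching on a normalised organisationName is an assumption. Parties without a name still get their own unshared Organization, as now.

```python
import uuid
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Roles that get a classification on the CitedResponsibleParty association.
MAPPED_ROLES = {"pointOfContact", "author", "originator", "publisher"}


class OrganizationCache:
    """Hypothetical cache that reuses one Organization id per organisationName."""

    def __init__(self):
        self._ids = {}

    def id_for(self, name):
        if not name:
            # No organisationName: nothing to match on, so create an unshared one.
            return f"urn:uuid:{uuid.uuid4()}"
        key = " ".join(name.split()).lower()  # whitespace/case normalisation is an assumption
        if key not in self._ids:
            self._ids[key] = f"urn:uuid:{uuid.uuid4()}"
        return self._ids[key]


def responsible_parties(md, cache):
    """(organization id, role classification or None) per CI_ResponsibleParty."""
    result = []
    for party in md.iter("{http://www.isotc211.org/2005/gmd}CI_ResponsibleParty"):
        name = party.findtext(
            "gmd:organisationName/gco:CharacterString", default="", namespaces=NS
        ).strip()
        code = party.find("gmd:role/gmd:CI_RoleCode", NS)
        role = code.get("codeListValue") if code is not None else None
        # Roles outside the four mapped ones get no classification.
        result.append((cache.id_for(name), role if role in MAPPED_ROLES else None))
    return result
```

Matching on the name alone is a simplification: two distinct organisations that happen to share a name would be merged.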

TO BE CONTINUED


Last modified on Mar 22, 2009, 4:36:18 AM