"Seegrid will be due for a migration to confluence on the 1st of August. Any update on or after the 1st of August will NOT be migrated"

Metadata Profiles


Overview - Need to establish "Best Practice"

Metadata profiles are being widely established to allow common discovery and cataloging based on the ISO 19115 standard. The standard provides an extensive dictionary of reusable elements, with the explicit intention that profiles will be created to enable implementations.

There appear, however, to be unresolved issues in the implementation of such profiles. Partly this stems from the lack of an XML implementation binding for the generalised concept of profiles (ISO 19106), and as a result each jurisdiction is developing similar but inconsistent mechanisms.

Some entries in Bryan Lawrence's blog tease out some of the issues - see "to extend or not to extend ..." and "A proposal for profiling ISO19139".

A proposal

This is an attempt to summarize best practice, in a way that is conformant with ISO 19115, 19139 and consistent with the encoding rules used to convert UML to GML-style XML Schema.

Key references are
  • ISO 19115:2003
    • Annex C Metadata extensions and profiles
    • Annex F Metadata extension methodology in particular F.10 "Documentation of metadata extensions"
  • ISO DTS 19139:2005(E)
    • Annex A.3 Conformance Requirements - Extensions
    • Annex A.4 Conformance Requirements - Restrictions

Also used is
  • ISO 19118 revision - Committee Draft ISO document 211n2151
    • Annex A.5 Schema conversion rules is a slight elaboration of Annex E from GML 3.2 which outlines the UML-GML encoding rules, and in particular defines a set of UML tagged-values that support implementation-specific rules.

Profiles may both restrict and extend the standard model, as illustrated in the "eggs and bacon" graphic from ISO 19115 Annex C:
eggsAndBacon.PNG

Restrictions

Restrictions come primarily (only?) in two forms
  1. more stringent metadata obligation
  2. limiting the members that can be selected from a choice, union, substitution group or "wildcard" (including xs:anyType and xs:string)
    • (special case of the previous item) more restrictive value-space for a literal value (i.e. codeList, pattern, range of validity - including both text and numerics)

All obligations of the base schema must still be satisfied, and all restricted values must conform to the original value-spaces. An instance that is valid according to a profile derived by restriction must be valid against the base schema (i.e. we are considering that part of the bacon that overlaps the egg).

ISO 19139 Annex A.4 directs that

"restriction ... is done through annotation in UML"
  • the key point here is that the annotation mechanism is explicitly mandated in preference to the "override" mechanism available in UML (... whereby an attribute or role is re-declared on a specialized class, with its declaration replacing the declaration on the parent). There is a comment on this in the introductory paragraph to sub-clause 8.5.3.
  • detailed implementation: the directive is a little coy (maybe reflecting limitations of some UML tools). In fact all valid restrictions can be expressed as constraints, so the "annotation" may be expressed formally as a constraint on the specialized class. Furthermore, since we are dealing with a static data representation, we need only consider Invariants.

"... and enforced via a tool other than an XML Schema validator in the namespaces defined in this TS"
  • i.e. validity according to a restriction profile should be tested using an additional tool, in a separate pass independent of XML Schema validation used to test conformance to the ISO 19139 schema
  • detailed implementation: the most standardized constraint-language for XML is Schematron - ISO/IEC 19757-3:2006, so the "tool" could take the form of a Schematron schema and processor

Application example

For example, the ANZLIC ISO Profile specifies that the Obligation of MD_Metadata/fileIdentifier should be changed from the base schema value of "Optional" [0..1], to "Mandatory" [1..1]. This UML expresses the revised obligation as a constraint:
restrictionExample.png

Here the constraint is expressed in the language usually used in conjunction with UML - Object Constraint Language (OCL). In addition, the tagged value xsdDerivation is set to "false". This is a flag to the UML->XML encoding tool that the specialized class should not lead to generation of a new XML Schema type and element, since the specialization condition is expressed as a constraint on the element from the base schema instead.

In XML the constraint can be implemented using a Schematron assertion, which in this case may be formulated to be syntactically identical to the OCL, except for the appearance of the "gmd" namespace prefix:

   <sch:pattern name="fileIdentifier required">
      <sch:rule context="//gmd:MD_Metadata">
         <sch:assert test="count(gmd:fileIdentifier) = 1">fileIdentifier not present</sch:assert>
         <!-- the text "fileIdentifier not present" only gets emitted if the assertion fails -->
      </sch:rule>
   </sch:pattern>

It might be expected that a "symmetric" rule using the sch:report construct would have the same effect. e.g.

   <sch:pattern name="fileIdentifier required">
      <sch:rule context="//gmd:MD_Metadata">
         <sch:report test="count(gmd:fileIdentifier) = 0">fileIdentifier not present</sch:report>
      </sch:rule>
   </sch:pattern>

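However, this has not always been the case in our tests: the report based on count() did not appear to fire when no gmd:fileIdentifier was found (maybe ... still testing). Note that XPath 1.0 defines count() of an empty node-set as 0, so any such difference would come from the behaviour of a particular processor rather than from the rule itself.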
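
The intent of the fileIdentifier rule can also be checked without a Schematron processor. The following is a minimal Python sketch using only the standard library; the function name check_file_identifier and the two sample documents are hypothetical and are not part of any profile tooling. Like the assertion, it requires exactly one gmd:fileIdentifier per gmd:MD_Metadata, so it also reports a record that has more than one.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
NS = {"gmd": GMD}


def check_file_identifier(xml_text):
    """Return one message per gmd:MD_Metadata element that breaks the rule."""
    root = ET.fromstring(xml_text)
    failures = []
    # iter() also visits the root element, like the //gmd:MD_Metadata context
    for md in root.iter("{%s}MD_Metadata" % GMD):
        # same test as the assertion: count(gmd:fileIdentifier) = 1
        if len(md.findall("gmd:fileIdentifier", NS)) != 1:
            failures.append("fileIdentifier not present")
    return failures


# Hypothetical, heavily abbreviated sample records
WITH_ID = (
    '<gmd:MD_Metadata xmlns:gmd="%s"><gmd:fileIdentifier/></gmd:MD_Metadata>' % GMD
)
WITHOUT_ID = '<gmd:MD_Metadata xmlns:gmd="%s"/>' % GMD

print(check_file_identifier(WITH_ID))     # []
print(check_file_identifier(WITHOUT_ID))  # ['fileIdentifier not present']
```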

OCL vs Schematron

This pattern seems to be under control except that the constraint language normally used with UML (though not mandatory) is OCL. OCL is considerably more expressive than Schematron, and there is no general method to transform OCL into Schematron. It may be possible to describe an OCL profile that is compatible with Schematron. Or perhaps we can actually use Schematron as the constraint language in the UML model. The examples below show that this may not be as crazy as it sounds.

Schematron is useful for other conformance rules too

Schematron can effectively test some restrictions that cannot be expressed using the XML Schema grammar (e.g. conditional obligations, or co-constraints where the value or type of a node depends on the value or type of another node that is not in its direct parent tree). Hence, Schematron may be used to test conformance to the "Conformance Rules not enforceable with XML Schema" as listed in Annex A, Table A.1 of ISO DTS 19139. A sample schema for this is available at XmmlSVN:metadata/Tools/isoConformanceRules.sch
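
As an illustration of a conditional obligation of this kind, ISO 19115 requires otherConstraints to be documented on MD_LegalConstraints when accessConstraints or useConstraints has the value "otherRestrictions". The Python sketch below shows the logic of such a co-constraint; it is not taken from isoConformanceRules.sch, and the function name check_other_constraints and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
NS = {"gmd": GMD}


def check_other_constraints(root):
    """Flag MD_LegalConstraints that use 'otherRestrictions' without otherConstraints."""
    failures = []
    for legal in root.iter("{%s}MD_LegalConstraints" % GMD):
        codes = legal.findall("gmd:accessConstraints/gmd:MD_RestrictionCode", NS)
        codes += legal.findall("gmd:useConstraints/gmd:MD_RestrictionCode", NS)
        # the obligation on otherConstraints depends on the value of a sibling property
        triggered = any(c.get("codeListValue") == "otherRestrictions" for c in codes)
        if triggered and legal.find("gmd:otherConstraints", NS) is None:
            failures.append("otherConstraints not documented")
    return failures


# Hypothetical fragment: the restriction code is set but otherConstraints is missing
SAMPLE = """
<gmd:MD_LegalConstraints xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:accessConstraints>
    <gmd:MD_RestrictionCode codeList="#MD_RestrictionCode"
                            codeListValue="otherRestrictions"/>
  </gmd:accessConstraints>
</gmd:MD_LegalConstraints>
"""

print(check_other_constraints(ET.fromstring(SAMPLE.strip())))
# ['otherConstraints not documented']
```

XML Schema alone cannot express this rule, because whether gmd:otherConstraints is required depends on an attribute value elsewhere in the same entity.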

"Clone and Modify"?

An XML Schema for a restricted profile that can test both profile validity and base conformance in a single pass may be generated using a "clone-and-modify" approach, starting with the ISO 19139 schema. However, this is likely to lead to maintenance challenges, and in particular to general confusion if such a modified schema - that claims to define XML in the same namespaces as ISO 19139 - gets out into the wild. Hence, the clone-and-modify strategy should only be used in private, and only if a single-pass validation pipeline is essential for engineering reasons.

Extensions

Extensions come in many more forms. ISO 19115 Annex F identifies
  1. new metadata section
  2. new codelist
  3. new codelist element
  4. new metadata element (i.e. UML class attribute or association)
  5. new metadata entity (i.e. UML class)
and also recognises that these may be combined with restrictions (i.e. the bacon that overlaps part of the egg, and also flaps outside)

ISO 19115 Annex F provides a detailed methodology including
  • assessing the need to introduce an extension
  • documenting it
    1. using the Metadata Extension classes
      • the XML implementation is provided in the metadataExtension.xsd document in ISO 19139. Instances of these elements should accompany a metadata document that contains extensions
    2. using UML in a new package(s)
    3. in data dictionary elements following the pattern of ISO 19115 Annex B

ISO 19139 sub-clause 8.5.3 describes the XML Schema implementation of classes derived by inheritance in UML. In addition, Annex A.3 specifies the following related to metadata extensions:
  1. any restrictions combined into the profile must follow the same rules as for a profile that is purely restrictive - i.e. use constraints, not overrides
  2. new elements may not be added to the base classes - sub-classing must be used
  3. new entities must be in their own namespace (and UML package)
  4. XML elements that represent classes that are extended versions of classes from the base schema should be specified to carry the attribute gco:isoType. In an instance document its value is the name of the element/class from the base schema that it is substituting for.

The last item provides a way for a client application that only understands the base schema to process a metadata document that conforms to an extension profile.

Note: In ISO 19139 the @gco:isoType attribute is of type="xs:string". The value may be expressed as an xs:QName in order to precisely describe the XML implementation, rather than the UML class name.

This suggestion has been submitted to ISO as a proposed erratum to ISO 19139.

Application example

For example, the Australian Marine metadata Profile specifies that the primary metadata entity MP_Metadata should add an attribute revisionDate to the ANZLIC ANZ_Metadata class. This UML shows the extended class:
extensionExample.png

The new class is in a new package. The tagged value xsdDerivation is now set to "true" (which is its default value).

The XML Schema implementation is as follows:
<schema 
   targetNamespace="http://www.noo.gov.au/xml/mp" 
   xmlns="http://www.w3.org/2001/XMLSchema" 
   xmlns:mp="http://www.noo.gov.au/xml/mp" 
   xmlns:gco="http://www.isotc211.org/2005/gco" 
   xmlns:gmd="http://www.isotc211.org/2005/gmd">
    <!-- ============= -->
    <import namespace="http://www.isotc211.org/2005/gmd"/>
    <import namespace="http://www.isotc211.org/2005/gco"/>
    <!-- ============= -->
    <element name="MP_Metadata" type="mp:MP_Metadata_Type"/>
    <!-- .............................. -->
    <complexType name="MP_Metadata_Type">
        <complexContent>
            <extension base="gmd:MD_Metadata_Type">
                <sequence>
                    <element name="revisionDate" type="gco:Date_PropertyType"/>
                </sequence>
                <attribute ref="gco:isoType" fixed="gmd:MD_Metadata" use="required"/>
            </extension>
        </complexContent>
    </complexType>
    <!-- ============= -->
</schema> 

There is no XML schema implementation of ANZ_Metadata for the ANZLIC profile, so the MP entity extends the parent type from the base "gmd" schema.

In an instance document, an example of a metadata entity may appear (in part) as follows:
<mp:MP_Metadata 
   xmlns:mp="http://www.noo.gov.au/xml/mp" 
   xmlns:gco="http://www.isotc211.org/2005/gco" 
   xmlns:gmd="http://www.isotc211.org/2005/gmd" 
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
   xmlns:xlink="http://www.w3.org/1999/xlink"
   xmlns:gml="http://www.opengis.net/gml" 
   xsi:schemaLocation="http://www.noo.gov.au/xml/mp ./MP.xsd" 
   gco:isoType="gmd:MD_Metadata" 
   id="MP_test_1">
    <gmd:fileIdentifier>
        <gco:CharacterString>MP.xml</gco:CharacterString>
    </gmd:fileIdentifier>
    <gmd:contact xlink:href="urn:x-ogc:def:nil:OGC:unknown"/>
    <gmd:dateStamp>
        <gco:Date>2006-10-16</gco:Date>
    </gmd:dateStamp>
    <gmd:identificationInfo xlink:href="urn:x-ogc:def:nil:OGC:unknown"/>
    <mp:revisionDate>
        <gco:Date>2006-10-17</gco:Date>
    </mp:revisionDate>
</mp:MP_Metadata> 

mp:MP_Metadata carries the attribute gco:isoType="gmd:MD_Metadata" indicating to an ISO-aware reading application that it is a specialization of gmd:MD_Metadata

Converting a profile back to ISO

Any ISO-aware processor which is not configured to understand the MP schema can process this document with a simple rule: if an unknown element (from an unknown namespace) is encountered, inspect it to see if it has a gco:isoType attribute. If so, then replace its name with this value. If not, throw it away.

We have implemented an XSLT script to process documents this way - see XmmlSVN:metadata/Tools/isoTypeReplace.xslt
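
The rule can also be sketched in Python. This is not the XSLT script referred to above, whose contents are not reproduced here. The names PREFIXES and to_iso are hypothetical, the prefix map assumes the conventional gmd and gco prefixes, and treating only those two namespaces as "known" is a simplification. Removing the gco:isoType attribute once the element has been renamed is a choice made for this sketch.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
# Assumption: prefixes used in gco:isoType values map to these namespaces,
# and these are the only namespaces the reading application understands.
PREFIXES = {"gmd": GMD, "gco": GCO}
ISO_TYPE = "{%s}isoType" % GCO


def namespace_of(tag):
    """Namespace URI of an ElementTree tag written as '{uri}local'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def to_iso(elem):
    """Rewrite elem in place; return False if elem should be thrown away."""
    if namespace_of(elem.tag) not in PREFIXES.values():
        iso_type = elem.attrib.pop(ISO_TYPE, None)
        if iso_type is None:
            return False  # unknown element with no isoType: discard
        prefix, _, local = iso_type.rpartition(":")
        # An unprefixed value is treated as a class name in gmd (assumption).
        elem.tag = "{%s}%s" % (PREFIXES.get(prefix, GMD), local)
    for child in list(elem):
        if not to_iso(child):
            elem.remove(child)
    return True


# Abbreviated version of the instance document shown earlier
MP_DOC = """<mp:MP_Metadata xmlns:mp="http://www.noo.gov.au/xml/mp"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco"
    gco:isoType="gmd:MD_Metadata">
  <gmd:fileIdentifier>
    <gco:CharacterString>MP.xml</gco:CharacterString>
  </gmd:fileIdentifier>
  <mp:revisionDate><gco:Date>2006-10-17</gco:Date></mp:revisionDate>
</mp:MP_Metadata>"""

root = ET.fromstring(MP_DOC)
to_iso(root)
print(root.tag)  # {http://www.isotc211.org/2005/gmd}MD_Metadata
print([child.tag for child in root])  # only gmd:fileIdentifier remains
```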

Validation of a profile with both extensions and restrictions

In order to test conformance to the Australian Marine profile at least three operations must be performed:
  1. validate using the MP schema (which imports MD)
    • this will test both the MP extensions and base MD validity
  2. convert the MP instance back to an MD instance
    • this produces a document containing only those namespaces understood by the ANZLIC processor
  3. test that the document conforms to the ANZLIC constraints, using the ANZLIC Schematron schema

In any situation where profiles are stacked on other profiles, validation has to pay careful attention to "unwinding" multiple extensions and restrictions in the correct order.
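
One way to keep the order explicit is to describe each profile layer as a check plus an "unwind" step, and to apply the layers from the most specialised to the base. The Python sketch below illustrates the control flow only. The function validate_stack, the layer tuples and the stand-in checks are all hypothetical, and a document is represented here as a plain dict of element names; a real pipeline would call an XML Schema validator, the isoType conversion and a Schematron processor at the corresponding points.

```python
def validate_stack(doc, layers):
    """Check doc against stacked profiles, most specialised layer first.

    Each layer is (name, check, unwind): check(doc) returns a list of failure
    messages, and unwind(doc) converts the document to the form expected by
    the next layer (None for the last layer).
    """
    failures = []
    for name, check, unwind in layers:
        failures.extend("%s: %s" % (name, message) for message in check(doc))
        if unwind is not None:
            doc = unwind(doc)
    return failures


# Stand-ins for illustration only
def check_mp(doc):
    # step 1: the MP schema requires mp:revisionDate
    return [] if "mp:revisionDate" in doc else ["revisionDate missing"]


def unwind_mp(doc):
    # step 2: keep only content the ANZLIC processor understands
    return {k: v for k, v in doc.items() if k.startswith("gmd:")}


def check_anzlic(doc):
    # step 3: the ANZLIC constraint on fileIdentifier
    return [] if "gmd:fileIdentifier" in doc else ["fileIdentifier not present"]


LAYERS = [("MP", check_mp, unwind_mp), ("ANZLIC", check_anzlic, None)]

print(validate_stack({"mp:revisionDate": "2006-10-17"}, LAYERS))
# ['ANZLIC: fileIdentifier not present']
```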

OCL paths vs GML pattern and XPath

The following example combines the two specializations shown above, and also shows one of the other kinds of constraint, viz. use of a specialized vocabulary "ANZ Theme" on the descriptiveKeywords within ANZ_DataIdentification:
MetadataProfilePatterns.png

The constraint involves testing the value of a non-local parameter thesaurusName. Two implementations of the constraint are illustrated: the first uses OCL syntax, the second uses the XPath expression that would appear in the equivalent Schematron sch:assert statement on the XML implementation.

The OCL syntax for paths that traverse associations is different to XPath. Here the differences are because:
  • the OCL path separator "." is different to XPath "/"
  • the XPath expression reflects the UML->XML encoding rule, in which the UML class name appears in the path as well as the property names

However, the conversion from OCL to Schematron is still relatively straightforward. (I wonder if it always is?)
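
For simple navigation paths, the two differences listed above can be handled mechanically. The Python sketch below is an illustration under stated assumptions: the names PROPERTY_TYPES and ocl_path_to_xpath are hypothetical, the table covers only three properties and is keyed by property name alone (a real tool would key on class and property), and only plain navigation is handled, not OCL operations such as forAll or size().

```python
# Element that encodes the type of each property, following the UML->XML
# encoding rule in which property and class names alternate.
PROPERTY_TYPES = {
    "descriptiveKeywords": "gmd:MD_Keywords",
    "thesaurusName": "gmd:CI_Citation",
    "title": "gco:CharacterString",
}


def ocl_path_to_xpath(ocl_path, property_types=PROPERTY_TYPES, prefix="gmd"):
    """Convert an OCL navigation path to the corresponding XPath."""
    steps = []
    for name in ocl_path.split("."):
        if name == "self":
            continue  # the context node is implicit in XPath
        steps.append("%s:%s" % (prefix, name))  # property element
        steps.append(property_types[name])      # class (type) element
    return "/".join(steps)


print(ocl_path_to_xpath("self.descriptiveKeywords.thesaurusName.title"))
# gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:thesaurusName/gmd:CI_Citation/gmd:title/gco:CharacterString
```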

-- SimonCox - 04 Feb 2007


Other Issues

Externally defined values

According to ISO 19106, a profile is to include "Specifications of the applications of each referenced base standard or profile, stating the choice of classes or conformance subsets, and the selection of options, ranges of parameter values, for profiles;"

In the simplest cases, small, static lists of allowable terms may be defined (at the implementation level) using GML dictionaries (CodeListsAndDictionaries), but this does not hold true in the more general cases where metadata needs to be tied to actual data. In these cases the controlled vocabularies (including thesauri, ontologies etc.) used within the data product specification should be directly referenced by the metadata profile.

For example, a biodiversity-related data set ("Beetles of Lower Wombat") might refer to a subset of the (large) species taxonomy. It is clearly impractical to specify the list of species names in an enumeration: it will be large, structured and dynamic (every survey might add new species). Yet it is also clearly desirable to be able to locate the data set using, for example, the term "Scarabaeidae" from the species taxonomy.

Another basic pattern is the use of a set of features managed through another process to provide a frame of reference - for example, water quality data might reference a set of sampling stations on named rivers. In the meteorological domain, weather stations are referenced in data set metadata.

Even the process of referencing the feature type catalog (cf DGIWG profile) in use in the data set creates a "foreign key" to a set of externally managed entities.

Every metadata profile reviewed to date has adopted a different set of approaches. A common approach is required.

see TermResolutionMechanisms.

Tooling

Software to support metadata profiles, and data standards in general, is readily conceivable, but is hampered by a lack of clarity about the implementation and deployment model for metadata profiles, and by the lack of commonality in the way vocabularies are referenced.

The set of tools that can be made consistent, with this in place, might include:
  • metadata entry
  • catalogues
  • data modelling aids
  • data access services
  • discovery components

Implementations

An (incomplete) review of activities underway has shown the need for the development and adoption of a common best practice in these matters. Please contact Main.RobAtkinson to record links to other activities.

The following jurisdictions are known to be undertaking development of ISO 19115 based Metadata Profiles:

And the following subject domains with levels of international collaboration:

This is also expected to be common practice at sub-national and national, domain-specific activities.

Incorporating Metadata Profiles into domain models

Metadata needs to be managed during data creation, and exploited during data access. It is therefore imperative that metadata objects are included within the scope of domain models and application schemas. Incorporating metadata profiles allows explicit modelling of common semantics, and provides a way to express dependencies between registers of items and the references to these used during the discovery or binding process.

The following example shows the inclusion of a metadata profile into the ANZLIC Harmonised Data Framework (work in progress)

* ANZLIC HDM Metadata Profile relationships:
ISO_Profile_Perspective.JPG

Issues outstanding:
  • Do Feature and Attribute Level metadata derive from a super- or sub- profile of the ANZLICMetadataProfile?
  • If domain packages (e.g. HDM_Roads) derive from an international standard, what is the relationship between the local (ANZLIC) metadata profile and the one in use in the international standard? (This is shown as a <> association, on the basis that the derivation may be one of finding a consistent implementation, not direct derivation.)

Imagine now a domain specific metadata profile:

* Domain Metadata Profile Usage example:
DomainMetadataProfile.JPG

RoadsMetadataProfile is a specialised metadata profile constrained to describe datasets that realise the HDM_Roads domain model. (This allows for vastly more convenient data entry, as well as a means for establishing semantic interoperability of data services in this domain.)

Clearly there is a need to keep the RoadsMetadataProfile aligned with the HDM_Roads model, and common governance of the semantics is indicated. The model is designed to allow creation of data sets, and the metadata should be seen as part of the data set.

-- RobAtkinson - 29 Sep 2006

 
Topic attachments
Attachment | Size | Date | Who | Comment
DomainMetadataProfile.JPG | 36.7 K | 29 Sep 2006 - 12:06 | RobAtkinson | Domain Metadata Profile Usage example
ISO_Profile_Perspective.JPG | 38.8 K | 29 Sep 2006 - 11:51 | RobAtkinson | ANZLIC HDM Metadata Profile relationships
MetadataProfilePatterns.png | 13.7 K | 04 Feb 2007 - 14:17 | SimonCox | UML implementation of Metadata profile patterns
eggsAndBacon.PNG | 16.1 K | 02 Feb 2007 - 11:39 | SimonCox | Profiles may both restrict and extend the standard model
extensionExample.png | 5.4 K | 04 Feb 2007 - 11:35 | SimonCox | Example of a metadata-entity derived by extension
restrictionExample.png | 4.5 K | 04 Feb 2007 - 11:04 | SimonCox | ANZLIC restriction example
Topic revision: r16 - 15 Oct 2010, UnknownUser
 

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).