
Resolving terms referenced from within XML schemas and instance documents



Overview of issue

In XML-based deployments of information systems, we often need to constrain an element so that its content must belong to an externally defined codeSpace.

It has been proposed to use "GML Dictionaries" for this purpose, but they are clearly inadequate:
  • the syntax for resolution of a term (URI-to-dictionary#term) is not actually resolvable at the remote URI - this syntax explicitly demands that the entire dictionary is delivered to the client, which then locates

Architecture Workshop

See VocabularyBindingMechanismsWorkshop for plans to explore registry-oriented approaches to vocabulary bindings in XML and other metadata artefacts.

Available Mechanisms

Term patterns

Binding (of a term to a register or dictionary) can be achieved in several ways. The usual pattern, described above, is to append an ID to a URI:

  • URI#id

which could be used by the client to locate an id within the result set.
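A minimal sketch of what the URI#id pattern asks of a client, assuming the dictionary is a GML dictionary whose entries carry gml:id attributes. The dictionary content, the file name things.xml and the fetch callback are hypothetical stand-ins; a real client would retrieve the document over HTTP.

```python
from urllib.parse import urldefrag
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# Hypothetical dictionary content, standing in for the document that a
# GET on the dictionary URI would return.
SAMPLE_DICTIONARY = f"""
<gml:Dictionary xmlns:gml="{GML_NS}" gml:id="things">
  <gml:name>things</gml:name>
  <gml:dictionaryEntry>
    <gml:Definition gml:id="termId">
      <gml:name>An example term</gml:name>
    </gml:Definition>
  </gml:dictionaryEntry>
</gml:Dictionary>
"""


def locate_term(reference, fetch):
    """Resolve URI#id: fetch the whole document, then search it for the id."""
    # The fragment is stripped before the request is made, so the server
    # never sees which term the client wants.
    uri, fragment = urldefrag(reference)
    root = ET.fromstring(fetch(uri).strip())
    for element in root.iter():
        if element.get(f"{{{GML_NS}}}id") == fragment:
            return element
    return None


# Hypothetical reference; the lambda replaces a real HTTP fetch.
term = locate_term(
    "http://vocabs_r_us.com/things.xml#termId",
    lambda uri: SAMPLE_DICTIONARY,
)
term_name = term.find(f"{{{GML_NS}}}name").text
```

The whole dictionary crosses the wire regardless of how many terms are needed, which is the cost noted in the overview.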

An alternative would be
  • URI#xpointer

where the xpointer would be evaluated.

Another approach would be to implement a production rule:

URI#xpointer => (URI + xpointer)#xpointer

e.g.

CodeList="http://vocabs_r_us.com/taxonomyService?taxonomy=things" CodeListValue="termId"

(I've corrected this to what I think you meant? -- SimonCox - 22 Nov 2006)

becomes

http://vocabs_r_us.com/taxonomyService?taxonomy=things&term=termId#termId

(both server-side and client-side selection of the relevant node)

(Don't quite understand the point of this -- SimonCox - 22 Nov 2006)
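The production rule can be sketched as a small URL rewrite. This is only an illustration: the function name and the term_param argument are hypothetical, and the query parameter name "term" is taken from the example URL above rather than from any specification.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def apply_production_rule(code_list, code_list_value, term_param="term"):
    """Rewrite a CodeList URI so the term appears both in the query
    (server-side selection) and in the fragment (client-side selection)."""
    parts = urlsplit(code_list)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((term_param, code_list_value))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), code_list_value)
    )


resolved = apply_production_rule(
    "http://vocabs_r_us.com/taxonomyService?taxonomy=things", "termId"
)
print(resolved)
# http://vocabs_r_us.com/taxonomyService?taxonomy=things&term=termId#termId
```

A server that understands the query parameter can return only the requested term; a server that ignores it returns the whole taxonomy, and the client falls back to the fragment.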

Of course, there is a lot of existing work on identifier-resolution technologies. For example, see OpenURL (not sure how open it is, since a private company (ExLibris) appears to be involved) and XRI. We should inspect these to see whether there is anything to be learned or adopted.

-- SimonCox - 22 Nov 2006

CodeList

These mechanisms make use of the ISO19115 CodeList:

GML dictionaries (GMLD)

see CodeListsAndDictionaries

Note that the proposed mechanism #term is not "engineering feasible" in the general case, and also that a profile must specify explicitly that a GML dictionary is resolvable at the end of a CodeList URI.

Magic Direct URI (URI)

In this case a URI is provided, and its interpretation is on the basis of some "magic" - an implicit contract between the data creator and the software that resolves the linkage.

The URI will link directly to a resource where the term can be found. This covers, for instance, any URI that does not specify a particular resource type (i.e. the resource type is not specified in the profile).

The ISO 19139 spec itself shows examples of this:
<dateType>
  <CI_DateTypeCode codeList="./resources/codeList.xml#CI_DateTypeCode"
      codeListValue="publication">publication</CI_DateTypeCode>
</dateType>
 

Many such URIs are conceivable:

WFS reference: http://a-domain.org/mywfs?service=WFS&request=GetFeature&id=60405

SPARQL reference:

(can no longer find a URI binding to SPARQL, but one could imagine a production rule to express the term within a SPARQL query)

Notice that, in the ISO 19139 example above, the information seems to be redundant: codeListValue replicates the element content.
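The redundancy can be checked mechanically. The sketch below parses the ISO 19139 fragment exactly as quoted on this page (without namespace prefixes, which a real instance document would carry) and splits the codeList attribute into its document part and its fragment.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# The fragment as quoted above; real ISO 19139 documents are namespaced.
SNIPPET = """
<dateType>
  <CI_DateTypeCode codeList="./resources/codeList.xml#CI_DateTypeCode"
      codeListValue="publication">publication</CI_DateTypeCode>
</dateType>
"""

code = ET.fromstring(SNIPPET.strip()).find("CI_DateTypeCode")

# The fragment names the code list inside the file, not the term itself.
dictionary_uri, code_list_id = urldefrag(code.get("codeList"))

# The term is carried twice: as an attribute and as the element content.
is_redundant = code.get("codeListValue") == code.text
```

Note that in this usage the #fragment identifies the code list, and the term is selected by codeListValue, so the client still needs an implicit rule to find the term within the list.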

URN

In this case the CodeList is a URN reference that the client must resolve somehow.

Indirect URI (IURI)

In this case the CodeList is a URI that must be resolved by the client to find a term.

An example would be a link to a WSDL binding for a catalog service, or to a WFS capabilities document, etc., which would be interpreted in a specific way to allow a service invocation to be performed (a direct URI is a trivial case of this).

ISO 19115 CT_Catalogue

This seems to be a construct designed to allow packaging of dictionaries with data transfer:

"In practice, the information needed to exploit a dataset or an aggregate is not limited to their metadata. Particularly:
  • the metadata cites the feature and portrayal catalogues but does not embed them;
  • the metadata instances reference information such as codelists, unit of measures and coordinate reference systems that all need to be accessed.

All of those resources may be managed externally in on-line registries, but it is usually necessary, in the context of interchange by transfer, to be able to provide that information within the transfer datasets and transfer aggregates. The abstract concept of catalogue (CT_Catalogue) corresponds exactly to those resources needed to exploit the datasets, aggregates and their metadata. This concept is detailed in 7.4.4. Catalogues are associated to transfer datasets (MX_Dataset) and transfer aggregates (MX_Aggregate)."

ISO 19115 MD_Identifier, RS_Identifier (MD_I)

MD_Identifier is a construct that implements a value that is an externally defined identifier. This is a more heavyweight version of the CodeList concept; it is not clear when one or the other should be used.

e.g.
<RS_Identifier>
  <code>
    <gco:CharacterString>GDA 94</gco:CharacterString>
  </code>
  <codeSpace>
    <gco:CharacterString>DIPR</gco:CharacterString>
  </codeSpace>
</RS_Identifier>

codeSpace is just as limited here as in CodeList, and needs a resolution mechanism to work.
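One possible shape for such a resolution mechanism is a client-side table keyed by codeSpace. This is a sketch under stated assumptions only: the registry URL template is hypothetical, and the gco namespace declaration has been added to the fragment so that it parses on its own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

GCO = "http://www.isotc211.org/2005/gco"

# The fragment above, with the gco namespace declared so it is parseable.
SNIPPET = f"""
<RS_Identifier xmlns:gco="{GCO}">
  <code><gco:CharacterString>GDA 94</gco:CharacterString></code>
  <codeSpace><gco:CharacterString>DIPR</gco:CharacterString></codeSpace>
</RS_Identifier>
"""

# Hypothetical contract: each known codeSpace maps to a URL template.
CODESPACE_RESOLVERS = {
    "DIPR": "http://registry.example.org/dipr/{code}",
}


def read_identifier(xml_text):
    """Return (codeSpace, code) from an RS_Identifier fragment."""
    root = ET.fromstring(xml_text.strip())
    text_tag = f"{{{GCO}}}CharacterString"
    code_space = root.find(f"codeSpace/{text_tag}").text
    code = root.find(f"code/{text_tag}").text
    return code_space, code


def resolve(code_space, code):
    """Build a resolvable URL, or None when the codeSpace is unknown."""
    template = CODESPACE_RESOLVERS.get(code_space)
    if template is None:
        return None
    return template.format(code=quote(code))


code_space, code = read_identifier(SNIPPET)
url = resolve(code_space, code)
```

Without an agreed table of this kind (or a registry that publishes one), the codeSpace string alone gives the client nothing to dereference.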

Ad hoc sets of elements (nElements)

In this example, multiple related elements of the feature (e.g. a metadata record) are defined, such as:

  • FeatureCatalog: name of the FeatureCatalog in use
  • FeatureName: name of the feature from the Feature Type Catalog specified in FeatureCatalog
  • FeatureCatalogPublicationDate: date of publication of the FeatureCatalog

Usage in the wild

GMLD URI IURI nElements

Discussion

Towards best practice

A priori we can assume:

  • a simple case should allow GML dictionaries
  • the general mechanism should support external registries exposing registers
  • an extensible mechanism is required to support large, potentially dynamic taxonomies, addressable in whole or part.
  • the mechanism must allow remote resolution of a term, as well as local resolution (# syntax used with care!)

 
Topic revision: r7 - 15 Oct 2010, UnknownUser
 

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).