
Vocabulary harmonization - Mapping vocabularies with RDF and SKOS

Two or more vocabularies are said to be harmonized if they are represented in a single common system, which allows relations to be created among them; the kinds of relation available are inherent in the system or model used. For example, if two vocabularies are harmonized in a tree-type model, the only relations allowed among them are parent and child. A relational database schema can be another harmonization mechanism, allowing the terms to be expressed following a single schema and stored as records in a table. Relations among the terms depend on the design of the schema. For example, a table could have an attribute parent, which points to another record in the table where the parent term resides. Ontologies are another example of a harmonization model (this definition comes from the MMI pages).
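
The relational case can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module; the table name, column names, vocabulary names and terms are all hypothetical and do not come from any particular vocabulary service.

```python
import sqlite3

# Hypothetical schema: each term is one record; 'parent_id' points at the
# record holding the parent term (NULL for a top-level term).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE term ("
    " id INTEGER PRIMARY KEY,"
    " vocabulary TEXT NOT NULL,"
    " label TEXT NOT NULL,"
    " parent_id INTEGER REFERENCES term(id))"
)
# Terms from two (made-up) vocabularies share the one schema.
conn.executemany(
    "INSERT INTO term (id, vocabulary, label, parent_id) VALUES (?, ?, ?, ?)",
    [
        (1, "vocabA", "length unit", None),
        (2, "vocabA", "metre", 1),
        (3, "vocabB", "foot", 1),
    ],
)


def children_of(term_id):
    """Return the labels of the terms whose parent is the given record."""
    rows = conn.execute(
        "SELECT label FROM term WHERE parent_id = ? ORDER BY id", (term_id,)
    )
    return [label for (label,) in rows]


def parent_of(term_id):
    """Return the label of the parent term, or None for a top-level term."""
    row = conn.execute(
        "SELECT p.label FROM term t JOIN term p ON t.parent_id = p.id"
        " WHERE t.id = ?",
        (term_id,),
    ).fetchone()
    return row[0] if row else None
```

Because both vocabularies are stored in the same table, the only relation that can be expressed between their terms is the parent/child link that the schema provides.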

The problem

For some key vocabularies there are multiple potential sources available. Units of measure is a prime example: it is used by everyone, and almost everyone has had a go at creating a list. It would clearly be optimal for everyone to agree to use the same list. But since people are used to using different symbols for the same concepts (at least), and sometimes subtly different definitions, a transition strategy requires that we support 'mappings' between concepts, definitions, and labels.

Furthermore, some core vocabularies are not actually available from a single source, and may not be available in a suitable form.

Harmonization/selection of ontologies is a very active topic in the KR/semantic web world.

You can find below some implementation examples:

SKOS support for harmonization

SKOS provides some useful hooks to assist vocabulary harmonization, in particular:
  • mapping properties are designed for indicating equivalence of concepts from different schemes
  • lexical labels allow you to associate multiple labels with the same concept, with 'preferred' labels scoped by natural language, and unlimited 'alternative' labels
  • notations are special labels, typically scoped to a particular authority
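
As a rough illustration of how these three hooks fit together, the sketch below models a single concept in plain Python. The Concept class is a hypothetical stand-in, not part of SKOS or of any RDF library, and the preferred label "A one" is invented; the remaining values are taken from the example used later on this page.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Illustrative stand-in for a SKOS concept; not a real library class."""
    uri: str
    pref_labels: dict = field(default_factory=dict)    # language tag -> preferred label
    alt_labels: set = field(default_factory=set)       # any number of alternative labels
    notations: set = field(default_factory=set)        # (notation, authority datatype URI) pairs
    exact_matches: set = field(default_factory=set)    # equivalent concepts in other schemes

    def set_pref_label(self, lang, label):
        """Allow at most one preferred label per language tag."""
        existing = self.pref_labels.get(lang)
        if existing is not None and existing != label:
            raise ValueError(f"{self.uri} already has a preferred label for '{lang}'")
        self.pref_labels[lang] = label

    def all_keys(self):
        """Every label or notation by which the concept may be referred to."""
        return (
            set(self.pref_labels.values())
            | self.alt_labels
            | {notation for notation, _ in self.notations}
        )


a_1 = Concept("http://www.example.org/refs/symbols.xml#A_1")
a_1.set_pref_label("en", "A one")                        # hypothetical preferred label
a_1.alt_labels.update({"a_1", "symb1"})                  # lexical labels
a_1.notations.add(("SYMB1", "http://www.Authority1/"))   # notation scoped to an authority
a_1.exact_matches.add("urn:ogc:def:sym:SYMB:SymbA_1")    # mapping property
```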

Managing the repository

Some key use-cases for the vocabulary service involve retrieving a concept definition, or verifying that a definition exists, but using one of the variant labels as the key. For example, Authority1 wants to use the symbol 'SYMB1' to access the definition of http://www.example.org/refs/symbols.xml#A_1. In order to support this efficiently, the base vocabulary and the 'decorations' need to be in the same repository.
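
The lookup described above can be sketched with an in-memory list of triples. This is a simplified stand-in for a real repository query, not the interface of the vocabulary service; the definition text is invented for illustration.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
A_1 = "http://www.example.org/refs/symbols.xml#A_1"

# Base vocabulary and 'decorations' held together in one repository,
# represented here as (subject, predicate, object) tuples.
triples = [
    (A_1, SKOS + "definition", "Hypothetical definition text for A_1."),
    (A_1, SKOS + "notation", "SYMB1"),
    (A_1, SKOS + "altLabel", "a_1"),
    (A_1, SKOS + "altLabel", "symb1"),
]

# Properties whose values may be used as a lookup key.
KEY_PROPERTIES = {SKOS + "prefLabel", SKOS + "altLabel", SKOS + "notation"}


def build_index(all_triples):
    """Map each label or notation value to the concepts that carry it."""
    index = {}
    for subject, predicate, obj in all_triples:
        if predicate in KEY_PROPERTIES:
            index.setdefault(obj, set()).add(subject)
    return index


def find_definitions(key, all_triples):
    """Return {concept URI: definition} for concepts known by the given key."""
    concepts = build_index(all_triples).get(key, set())
    return {
        subject: obj
        for subject, predicate, obj in all_triples
        if predicate == SKOS + "definition" and subject in concepts
    }
```

The index can only be built in one pass because the labels and the definitions sit in the same triple set, which is the efficiency argument for keeping them in a single repository.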

However, the base vocabulary and the decorations are not under the same governance arrangements ('SYMB1' is governed by 'Authority1', and 'A_1' is governed by 'example.org'), and are not part of 'the same vocabulary'. So they should be registered and described separately, with the mapping relations as a third resource.

Metadata about a vocabulary and its members

Using SKOS to implement a vocabulary means that both the vocabulary as a whole and each of its members have a URI to identify them.

Metadata - such as Creation date, Last modification date, Version number, Status (Proposed, Valid, Superseded, Retired), Created by, Modified by, etc. - can be stored in a registry object (i.e. a register item). A registry object indicates which repository item it describes by using the URI of that item, which may be either a vocabulary or one of its members.
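
One possible shape for such a register item is sketched below. The field names mirror the metadata listed above, but the class itself, the dates, the version string and the user name are hypothetical and are not taken from any existing registry implementation.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Status(Enum):
    PROPOSED = "Proposed"
    VALID = "Valid"
    SUPERSEDED = "Superseded"
    RETIRED = "Retired"


@dataclass
class RegisterItem:
    """Registry metadata about one repository item (a vocabulary or a member)."""
    describes: str        # URI of the repository item this record is about
    status: Status
    version: str
    created: date
    created_by: str
    modified: date
    modified_by: str


# The registry is keyed by the URI of the item described, so the same
# structure serves for a whole vocabulary and for an individual concept.
registry = {}


def register(item):
    registry[item.describes] = item


register(RegisterItem(
    describes="http://www.example.org/refs/symbols.xml#A_1",
    status=Status.VALID,
    version="1",                    # hypothetical values throughout
    created=date(2012, 1, 1),
    created_by="editor1",
    modified=date(2012, 1, 1),
    modified_by="editor1",
))
```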

In the (common) case where the vocabulary has been adopted complete from an external authority, the items in the vocabulary do not need to be registered separately. The external authority is responsible for these being included in the vocabulary, and their governance rules apply. Refer to that authority for this information.

In the case where we are locally responsible for the curation of the vocabulary, including the selection and collation of the items in the vocabulary, a metadata record should be associated with each concept.

Metadata about the decorations

The 'decorations' illustrated above are statements about a vocabulary item. As commented previously, while they may be in the same repository as the vocabulary, their status is quite different from the resources in the core vocabulary, and they should have separate metadata in the registry.

RDF solution

Under the RDF model there is no standard 'container' for these statements that is separate from the resource they describe, so they cannot easily be identified as a collection.

Associating metadata with the decoration means making statements about statements. In pure RDF this is enabled by 'reifying' the statement, to make it a resource in its own right, with its own identifier. This is done by adding an explicit attribute to the statement, using the pattern shown in the following example (which is just an elaboration of the example given above):

   <rdf:Description rdf:about="http://www.example.org/refs/symbols.xml#A_1">
     <skos:notation rdf:ID="A_1-n2" rdf:datatype="http://www.Authority1/">SYMB1</skos:notation>
     <skos:altLabel rdf:ID="A_1-al1">a_1</skos:altLabel>
     <skos:altLabel rdf:ID="A_1-al2">symb1</skos:altLabel>
     <skos:exactMatch rdf:ID="A_1m1" rdf:resource="urn:ogc:def:sym:SYMB:SymbA_1"/>
   </rdf:Description>

The presence of the rdf:ID attribute tells an RDF processor that there is a resource of type rdf:Statement, with an identifier generated by appending '#' and the value of the rdf:ID to the document base URI. For example, if xml:base="http://www.example.org/" for the context, then the URI for the first statement above is http://www.example.org/#A_1-n2. This provides a handle for associating registration information/metadata with the decorations on a statement-by-statement basis.
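
The way these statement identifiers are derived can be sketched with the Python standard library. This is a deliberately simplified reading of the snippet above, not a conforming RDF/XML parser: it assumes the fragment is wrapped in an rdf:RDF element that carries the namespace declarations and xml:base, and it ignores nested xml:base attributes and relative rdf:about values.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# The example from this page, wrapped in an assumed rdf:RDF root element.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xml:base="http://www.example.org/">
  <rdf:Description rdf:about="http://www.example.org/refs/symbols.xml#A_1">
    <skos:notation rdf:ID="A_1-n2" rdf:datatype="http://www.Authority1/">SYMB1</skos:notation>
    <skos:altLabel rdf:ID="A_1-al1">a_1</skos:altLabel>
    <skos:altLabel rdf:ID="A_1-al2">symb1</skos:altLabel>
    <skos:exactMatch rdf:ID="A_1m1" rdf:resource="urn:ogc:def:sym:SYMB:SymbA_1"/>
  </rdf:Description>
</rdf:RDF>"""


def reified_statements(xml_text):
    """Map each statement URI to the (subject, predicate, object) it reifies."""
    root = ET.fromstring(xml_text)
    base = root.get(XML_BASE, "")
    statements = {}
    for description in root.findall(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")
        for prop in description:
            statement_id = prop.get(f"{{{RDF}}}ID")
            if statement_id is None:
                continue  # statement is not reified
            # ElementTree tags look like '{namespace}local'; join the two parts.
            predicate = prop.tag[1:].replace("}", "", 1)
            obj = prop.get(f"{{{RDF}}}resource") or (prop.text or "")
            # The statement URI is '#' + rdf:ID resolved against the base URI.
            statements[urljoin(base, "#" + statement_id)] = (subject, predicate, obj)
    return statements


statements = reified_statements(DOC)
# statements["http://www.example.org/#A_1-n2"] is the skos:notation statement.
```

Registry metadata for a decoration can then be keyed by the statement URI in the same way as metadata for a vocabulary or a concept.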

Sesame solution

The Sesame RDF service provides another hook that may be used instead. Within each repository, each triple may optionally be associated with a context. The value of the context parameter is a URI, which is typically set automatically to the filename of a set of triples uploaded into the repository, or may be set explicitly to provide a trace to its source. As a short-term measure, at least, AuScope will use the context to indicate governance arrangements for multiple sets of statements in a single repository.
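
The idea of using contexts to separate governance can be sketched in plain Python, treating each statement as a (subject, predicate, object, context) tuple. This is not the Sesame API; the context URIs and the definition text are hypothetical and chosen only to show three separately governed sets of statements sharing one repository.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
A_1 = "http://www.example.org/refs/symbols.xml#A_1"

# Hypothetical context URIs, one per governance arrangement.
CTX_BASE = "http://www.example.org/refs/symbols.xml"   # base vocabulary (example.org)
CTX_AUTH1 = "http://www.Authority1/"                   # decorations (Authority1)
CTX_MAPPINGS = "urn:example:mappings"                  # mapping relations

repository = [
    (A_1, SKOS + "definition", "Hypothetical definition text for A_1.", CTX_BASE),
    (A_1, SKOS + "notation", "SYMB1", CTX_AUTH1),
    (A_1, SKOS + "altLabel", "symb1", CTX_AUTH1),
    (A_1, SKOS + "exactMatch", "urn:ogc:def:sym:SYMB:SymbA_1", CTX_MAPPINGS),
]


def statements_in(context):
    """Triples governed under one context, e.g. for registration or export."""
    return [(s, p, o) for s, p, o, c in repository if c == context]


def contexts_for(subject):
    """All governance contexts that make statements about a resource."""
    return {c for s, _, _, c in repository if s == subject}


def describe(subject):
    """Query across every context, as the lookup use-cases require."""
    return [(p, o) for s, p, o, _ in repository if s == subject]
```

Queries still run over the whole repository, while each set of statements can be identified, described and replaced separately by its context.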
Topic revision: r15 - 31 Jul 2012, SimonCox
 

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).