
SKOS and SISSvoc

Role: VocabularyOwner


SISSvoc provides an HTTP interface to vocabularies formalized using the Simple Knowledge Organization System (SKOS), a W3C standard for implementing key elements of thesaurus best practice in RDF. On this page we provide a brief introduction to RDF and SKOS, with links to more detail, and some guidelines for preparing a vocabulary that will be well-behaved in the context of SISSvoc.

About RDF

RDF (Resource Description Framework) is a way to formalize descriptions of resources as a set of information triples, each composed of a Subject, a property and an Object. The Subject of each triple is a resource denoted by a URI (or occasionally an unnamed 'blank node'); the Object is either a resource or a 'literal' value such as a string or number. The property, itself identified by a URI, denotes a specific semantic relationship between the Subject and the Object.

[Diagram: DirectedGraphPlugin_1.png]

Since a resource may serve as either the Subject or the Object of multiple triples, a set of triples describing some resources forms a directed graph.
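To make the triple and graph views concrete, here is a minimal Python sketch (not part of SISSvoc). It assumes a made-up namespace (http://example.org/voc/) and a hypothetical helper `to_graph`; each triple is a plain (Subject, property, Object) tuple, and literal values are wrapped in a small `Literal` type so they can be told apart from URIs.

```python
from collections import defaultdict
from typing import NamedTuple

# Hypothetical example namespace; only the SKOS namespace is real.
EX = "http://example.org/voc/"
SKOS = "http://www.w3.org/2004/02/skos/core#"


class Literal(NamedTuple):
    """A literal value, as opposed to a resource identified by a URI."""
    value: str


# Each triple is (Subject, property, Object).
triples = [
    (EX + "granite", SKOS + "prefLabel", Literal("granite")),
    (EX + "granite", SKOS + "broader", EX + "igneous_rock"),
    (EX + "igneous_rock", SKOS + "broader", EX + "rock"),
]


def to_graph(triples):
    """Index triples as a directed graph: Subject -> list of (property, Object) edges."""
    graph = defaultdict(list)
    for subject, prop, obj in triples:
        graph[subject].append((prop, obj))
    return graph


graph = to_graph(triples)
# ex:igneous_rock is the Object of one triple and the Subject of another,
# which is what links the individual triples into a single graph.
```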

[Diagram: DirectedGraphPlugin_2.png]

RDF-primer offers a good introduction: http://www.w3.org/TR/rdf-primer/

A presentation by Tom Baker provides an introduction to, and a vision for, RDF metadata.

About SKOS

SKOS (Simple Knowledge Organization System) is an RDF (strictly, OWL) application for encoding simple vocabularies as RDF graphs. The items in a SKOS vocabulary are concepts whose labels (classified as either 'preferred' or 'alternate') correspond to the terms and synonyms of a conventional vocabulary. 'Simple' means that SKOS provides only a basic set of properties for recording semantic relationships between concepts: broader and narrower relations within a vocabulary, and various degrees of broad or close matching between items from different vocabularies. These properties were designed to correspond broadly to the structure found in conventional thesauri. Two explicit aggregation mechanisms are provided: Concept Schemes and Concept Collections.
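The following sketch shows how preferred and alternate labels let either a term or one of its synonyms resolve to the same concept. The namespace, the concepts and the `find_concepts` helper are hypothetical; labels are modelled as (text, language tag) pairs, since SKOS labels are usually language-tagged literals.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/voc/"  # hypothetical vocabulary namespace

# Labels are (text, language tag) pairs; Objects that are plain strings are URIs.
triples = [
    (EX + "granite", SKOS + "prefLabel", ("granite", "en")),
    (EX + "granite", SKOS + "altLabel", ("granitic rock", "en")),
    (EX + "granite", SKOS + "broader", EX + "igneous_rock"),
    (EX + "igneous_rock", SKOS + "prefLabel", ("igneous rock", "en")),
]

LABEL_PROPS = {SKOS + "prefLabel", SKOS + "altLabel"}


def find_concepts(triples, term, lang="en"):
    """Return the concepts whose preferred or alternate label equals term."""
    return sorted({s for s, p, o in triples if p in LABEL_PROPS and o == (term, lang)})


# find_concepts(triples, "granitic rock") returns ['http://example.org/voc/granite']
```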

SKOS-primer offers a great introduction: http://www.w3.org/TR/skos-primer/

Encoding a vocabulary

A variety of serialization syntaxes are available for persistence and transfer. The most commonly encountered are RDF/XML, which is supported by all tools but is verbose; Turtle, which is more human-readable; and N-Triples, which is closest to the database view (one triple per line). The W3C OWL 2 specification requires every OWL application to support RDF/XML, while the other serialization formats are optional. Because RDF/XML is therefore the only format an OWL tool can be relied on to read when loading external data (for example, documents referenced by the owl:imports property), in practice RDF/XML is the default for distributing RDF content. Most RDF tools (OpenRDF Sesame, Jena, TopBraid) will load data serialized in any of the standard formats and convert it to any of the others for export.
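To show why N-Triples is the format closest to the database view, here is a minimal sketch of an N-Triples writer: each triple becomes exactly one line, with URIs in angle brackets and literals in double quotes with an optional language tag. The helper names and example namespace are hypothetical, and the escaping handles only backslashes, double quotes and line breaks rather than the full N-Triples grammar; in practice one of the RDF tools above should do the serialization.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/voc/"  # hypothetical vocabulary namespace


def nt_term(term):
    """Serialize one term: URIs as <...>, literals ((text, lang) tuples) as quoted strings."""
    if isinstance(term, tuple):
        text, lang = term
        escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'
    return f"<{term}>"


def to_ntriples(triples):
    """One line per triple: Subject, property, Object, then a terminating ' .'."""
    return "".join(f"{nt_term(s)} {nt_term(p)} {nt_term(o)} .\n" for s, p, o in triples)


print(to_ntriples([
    (EX + "granite", SKOS + "prefLabel", ("granite", "en")),
    (EX + "granite", SKOS + "broader", EX + "igneous_rock"),
]), end="")
```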

Turtle is easy to view in a text editor and is probably the easiest format to use when initially loading data into RDF from an existing source, such as a database or spreadsheet. However, a SKOS vocabulary is best maintained using an RDF or OWL editor, such as Protégé or TopBraid, which maintains the integrity of relationships in the data automatically.
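As an illustration of loading an existing source, the sketch below turns a small spreadsheet export (CSV) into Turtle. The column names (`id`, `label`, `synonym`, `broader`), the `ex:` namespace, the `csv_to_turtle` helper, and the assumption that every `id` is a valid Turtle local name are all hypothetical.

```python
import csv
import io

# Hypothetical spreadsheet export: one row per term, an optional synonym,
# and the id of the broader term (blank for top concepts).
CSV_DATA = """id,label,synonym,broader
rock,rock,,
igneous_rock,igneous rock,,rock
granite,granite,granitic rock,igneous_rock
"""


def turtle_literal(text, lang="en"):
    """Quote text as a Turtle string literal with a language tag."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def csv_to_turtle(csv_text):
    """Emit one skos:Concept per CSV row; ids are assumed to be valid Turtle local names."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "@prefix ex: <http://example.org/voc/> .",  # hypothetical namespace
        "",
    ]
    for row in csv.DictReader(io.StringIO(csv_text)):
        statements = ["a skos:Concept", f"skos:prefLabel {turtle_literal(row['label'])}"]
        if row["synonym"]:
            statements.append(f"skos:altLabel {turtle_literal(row['synonym'])}")
        if row["broader"]:
            statements.append(f"skos:broader ex:{row['broader']}")
        lines.append(f"ex:{row['id']} " + " ;\n    ".join(statements) + " .")
    return "\n".join(lines) + "\n"


print(csv_to_turtle(CSV_DATA))
```

Note that the sketch asserts only skos:broader; the matching skos:narrower links are exactly the kind of relationship an RDF editor or reasoner would maintain afterwards.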

While RDF/XML is well-formed XML, standard XML processing tools based on XPath (e.g. XSLT, XQuery) are of limited use with it. Any given RDF graph can be serialized in RDF/XML in many different ways, giving completely different paths to the same data, so XPath-based processing quickly becomes unmanageable. For processing RDF data such as a SKOS vocabulary, an RDF API or an RDF query language (e.g. SPARQL) is recommended.
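The sketch below illustrates the XPath problem. The two RDF/XML documents encode exactly the same two triples (ex:granite is a skos:Concept whose prefLabel is "granite"), once as a typed node element and once as rdf:Description plus rdf:type, yet a path-based query written against the first finds nothing in the second. The example URI and the `concept_labels` helper are hypothetical; only Python's standard xml.etree.ElementTree is used.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Serialization 1: typed node element.
TYPED_NODE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/voc/granite">
    <skos:prefLabel>granite</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""

# Serialization 2: the same triples via rdf:Description and an explicit rdf:type.
DESCRIPTION = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://example.org/voc/granite">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <skos:prefLabel>granite</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>"""


def concept_labels(rdf_xml):
    """Naive path-based query: prefLabels of skos:Concept elements under rdf:RDF."""
    root = ET.fromstring(rdf_xml)
    return [e.text for e in root.findall("skos:Concept/skos:prefLabel", NS)]


print(concept_labels(TYPED_NODE), concept_labels(DESCRIPTION))
```

An RDF-aware tool would read both documents as the same graph, which is why querying the graph (e.g. with SPARQL) is more robust than querying the XML.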

SISSvoc and SKOS properties

SISSVoc queries a vocabulary using the standard SKOS predicates, such as prefLabel, broader and narrower. Any relationship that is intended to be used in SISSVoc requests must therefore be present in the data.
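A minimal sketch of why this matters, assuming a store that matches request patterns against asserted triples only (no reasoning). The `objects` helper and the example concepts are hypothetical and do not reflect SISSVoc's actual implementation: with only skos:broader asserted, a request for the narrower concepts of ex:igneous_rock comes back empty.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/voc/"  # hypothetical vocabulary namespace

# Only skos:broader has been asserted.
triples = {
    (EX + "granite", SKOS + "broader", EX + "igneous_rock"),
    (EX + "basalt", SKOS + "broader", EX + "igneous_rock"),
}


def objects(triples, subject, prop):
    """Match the pattern (subject, skos:prop, ?o) against asserted triples only."""
    return sorted(o for s, p, o in triples if s == subject and p == SKOS + prop)


print(objects(triples, EX + "granite", "broader"))
print(objects(triples, EX + "igneous_rock", "narrower"))
```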

When preparing a vocabulary it is common practice to assert only a subset of all the possible relationships initially. For example, you might build up hierarchies using just broader relationships. However, other relationships are implied by the definitions of the properties, and this is a key reason to use OWL and RDFS applications such as SKOS. For example, SKOS defines skos:broader as the inverse of skos:narrower, so each broader relation is implicitly complemented by a narrower relation with the Subject and Object reversed. Other inferences follow from properties that are defined to be subproperties (specializations) of other properties. For example, because

skos:broader rdfs:subPropertyOf skos:broaderTransitive .
skos:broaderTransitive rdf:type owl:TransitiveProperty .

a transitive relationship may be inferred between resources that are removed two or more steps by broader relationships.
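These inferences can be sketched as a small forward-chaining loop over three rules: skos:broader and skos:narrower are inverses, each is a subproperty of its transitive counterpart, and the transitive properties chain. The `infer` function and the example concepts are hypothetical, and the loop is a deliberately naive stand-in for a real reasoner (it rescans every triple on each pass).

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/voc/"  # hypothetical vocabulary namespace

INVERSE = {SKOS + "broader": SKOS + "narrower", SKOS + "narrower": SKOS + "broader"}
SUBPROPERTY = {SKOS + "broader": SKOS + "broaderTransitive",
               SKOS + "narrower": SKOS + "narrowerTransitive"}
TRANSITIVE = {SKOS + "broaderTransitive", SKOS + "narrowerTransitive"}


def infer(asserted):
    """Apply inverse, subproperty and transitivity rules until nothing new is derived."""
    triples = set(asserted)
    while True:
        new = set()
        for s, p, o in triples:
            if p in INVERSE:
                new.add((o, INVERSE[p], s))
            if p in SUBPROPERTY:
                new.add((s, SUBPROPERTY[p], o))
            if p in TRANSITIVE:
                # chain (s p o) and (o p o2) into (s p o2)
                for s2, p2, o2 in triples:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= triples:
            return triples
        triples |= new


asserted = {
    (EX + "granite", SKOS + "broader", EX + "igneous_rock"),
    (EX + "igneous_rock", SKOS + "broader", EX + "rock"),
}
inferred = infer(asserted)
```

Note that skos:broader itself is not transitive: the sketch derives ex:granite skos:broaderTransitive ex:rock, but not ex:granite skos:broader ex:rock.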

In principle, all the inferred properties can be determined at run-time by a query engine that supports 'reasoning'. The SKOS Reference states that "by convention, skos:broaderTransitive and skos:narrowerTransitive are not used to make assertions, but are instead used only to draw inferences"; i.e. they may be constructed by a reasoner but are not normally persisted in the triple-store directly. However, because a vocabulary is usually a slowly changing resource, a pragmatic solution for optimum performance is to persist the useful inferences as assertions. Hence, a vocabulary might be 'pre-conditioned' for optimal performance in SISSVoc by ensuring that properties that are implied by the SKOS/OWL/RDF inferencing rules and feature in the request interface are explicitly included in the set. Note, however, that vocabulary conditioning can and should be done using a reasoner, for both speed and accuracy!
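Pre-conditioning can be sketched as materializing the inferred triples and keeping only those whose property features in the request interface. The `precondition` function and its `request_props` parameter are hypothetical, and the hand-written chain walk stands in for a reasoner purely to show what ends up persisted; as recommended above, a real reasoner should do the conditioning itself.

```python
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/voc/"  # hypothetical vocabulary namespace


def broader_transitive(asserted):
    """Reasoner stand-in: derive skos:broaderTransitive by walking skos:broader chains."""
    parents = defaultdict(set)
    for s, p, o in asserted:
        if p == SKOS + "broader":
            parents[s].add(o)
    derived = set()
    for start in list(parents):
        stack, seen = list(parents[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:  # guards against cycles in the broader hierarchy
                continue
            seen.add(node)
            derived.add((start, SKOS + "broaderTransitive", node))
            stack.extend(parents.get(node, ()))
    return derived


def precondition(asserted, request_props):
    """Return the asserted triples plus inferred ones whose property is in request_props."""
    inferred = {(o, SKOS + "narrower", s) for s, p, o in asserted if p == SKOS + "broader"}
    inferred |= broader_transitive(asserted)
    return set(asserted) | {t for t in inferred if t[1] in request_props}


asserted = {
    (EX + "granite", SKOS + "broader", EX + "igneous_rock"),
    (EX + "igneous_rock", SKOS + "broader", EX + "rock"),
}
store = precondition(asserted, {SKOS + "narrower", SKOS + "broaderTransitive"})
```

After pre-conditioning, a plain pattern match over `store` answers narrower and broaderTransitive requests without any run-time reasoning.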

From the considerations above and experience elsewhere, a set of best practices has been developed to assist in preparing a well-behaved vocabulary. See VocabularyFormalizationInSKOS.
Topic revision: r39 - 29 May 2014, TerryRankine

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).