"Seegrid will be due for a migration to confluence on the 1st of August. Any update on or after the 1st of August will NOT be migrated"

Names, Identifiers, Labels and Handles

Names and identifiers are a sometimes confusing topic. In some contexts there is an apparently clear distinction, with names being "words" and identifiers being numbers or codes. However, in practice a "name" is just a memorable identifier, and these concepts slide into each other. Thus it is appropriate to consider them together and use a unified approach.

Most Objects and Features encoded in GML and XMML carry both a gml:id attribute and one or more gml:name elements. These have different roles:

  1. the value of the gml:id has type="xsd:ID" so must be unique within the (XML) document. It is a document fragment identifier, and acts as a handle for an XML element within the document, functionally equivalent to an RDBMS "primary key". This supports cross-references within a document, and references involving individual nodes (elements) within a system of documents.
  2. the value of a gml:name has type="gml:CodeType", which is a string with a "codeSpace" attribute. In the context of a GML object the value of a gml:name element is a label or identifier for the object described by the parent element.
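The uniqueness constraint on ids can be checked mechanically. The following is a minimal sketch in Python, assuming the simplified, un-namespaced "id" attribute used in the examples on this page; the function name duplicate_ids and the sample document are illustrative only. The standard-library parser used here is non-validating and does not enforce uniqueness itself, which is why an explicit check is shown.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative document in which the id "foo" is (wrongly) used twice.
DOC = """<Collection id="bar">
  <member><Feature id="foo"/></member>
  <member><Feature id="baz"/></member>
  <member><Feature id="foo"/></member>
</Collection>"""


def duplicate_ids(xml_text, id_attr="id"):
    """Return the sorted id values that occur on more than one element."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


print(duplicate_ids(DOC))
```

For a real GML document the id_attr argument would need to be the namespace-qualified attribute name rather than the plain "id" assumed here.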

Name - externally assigned designator for the real world object

The label or name is usually assigned by an external agency to the real-world object under consideration, according to the relevant business rules of that organisation. In many cases the rules will specify that the name is unique and has continuing significance in the external context, so if present it should be treated as persistent. Instead of "organisation", substitute "family" or "registry of births, deaths and marriages" and the rule still holds up.

Names may be any combination of characters, including whitespace. The value will often be "human readable", possibly with semantic significance. However, gml:name elements should be used to carry all externally-significant designators (names, labels, identifiers), whether or not they are human readable.

Some objects may have multiple names, which may be assigned by different authorities. In GML (and XMML) these are disambiguated by using the codeSpace attribute on the name element. The convention is that the value of a name element with no codeSpace is the canonical name in the context of the service providing the document. Aliases from other authorities are given in additional name elements which do carry a codeSpace.
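The codeSpace convention can be sketched in Python as follows. This is only an illustration, assuming un-namespaced element names; the function split_names and the canonical name "Tom" are hypothetical and not part of GML or XMML.

```python
import xml.etree.ElementTree as ET

# Hypothetical feature: one name without codeSpace, two aliases with one.
FEATURE = """<Feature id="foo">
  <name>Tom</name>
  <name codeSpace="ABC">efg56kj9_o</name>
  <name codeSpace="DEF">Dick</name>
</Feature>"""


def split_names(feature):
    """Return (canonical_name, {codeSpace: alias}) for a feature element.

    The first name without a codeSpace is treated as canonical; names
    that carry a codeSpace are collected as aliases keyed by codeSpace.
    """
    canonical = None
    aliases = {}
    for name in feature.findall("name"):
        code_space = name.get("codeSpace")
        if code_space is None:
            if canonical is None:
                canonical = name.text
        else:
            aliases[code_space] = name.text
    return canonical, aliases


canonical, aliases = split_names(ET.fromstring(FEATURE))
print(canonical, aliases)
```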

As an illustration of how this might be used, consider a simple instance document consisting of a collection containing two features, each of which has two names from different codeSpaces:
<Collection id="bar">
  <member>
    <Feature id="foo">
      <name codeSpace="ABC">efg56kj9_o</name>
      <name codeSpace="DEF">Dick</name>
    </Feature>
  </member>
  <member>
    <Feature id="baz">
      <name codeSpace="ABC">Dick</name>
      <name codeSpace="DEF">Harry</name>
    </Feature>
  </member>
</Collection>
The XPath expression
   //Feature[./name[@codeSpace="DEF"]="Dick"]
selects the first Feature, while the expression
   //Feature[./name[@codeSpace="ABC"]="Dick"]
selects the second.

This approach, focussing on the value of a gml:name property, allows access to the nodes in an XML document on the basis of the externally assigned name or identifier that applies to the object being described.
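The same selection by name and codeSpace can be sketched in Python. The standard-library ElementTree module supports only a subset of XPath, so the predicate is written here as an ordinary filter rather than as the full XPath expression; the helper name features_named is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# The two-feature collection used in the illustration above.
DOC = """<Collection id="bar">
  <member>
    <Feature id="foo">
      <name codeSpace="ABC">efg56kj9_o</name>
      <name codeSpace="DEF">Dick</name>
    </Feature>
  </member>
  <member>
    <Feature id="baz">
      <name codeSpace="ABC">Dick</name>
      <name codeSpace="DEF">Harry</name>
    </Feature>
  </member>
</Collection>"""


def features_named(root, code_space, value):
    """Return Feature elements having a name child with this codeSpace and value."""
    return [
        feature
        for feature in root.iter("Feature")
        if any(
            name.get("codeSpace") == code_space and name.text == value
            for name in feature.findall("name")
        )
    ]


root = ET.fromstring(DOC)
print([f.get("id") for f in features_named(root, "DEF", "Dick")])
print([f.get("id") for f in features_named(root, "ABC", "Dick")])
```

The first call finds the feature with id "foo" and the second the feature with id "baz", matching the two XPath expressions.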

ID value - system assigned handle for XML element

The value of gml:id will normally be assigned according to rules determined by the service provider responsible for generating the XML representation of the data. The rules for assigning document identifiers are, in general, outside the scope of an encoding standard.

The value may be opaque, and should not be assumed to have any significance outside the context of the document or system of documents, so caution should be applied in using it as a persistent identifier. Note also that strictly speaking the id applies to the XML element in its document context, rather than the real-world object being described. It has similar significance to a "Primary Key" in an RDBMS.

It may be convenient for the value to have some mnemonic significance, but this is not required. It is merely necessary that the value of the gml:id attribute is unique within the current context (i.e. the XML document). A particular service may use locally convenient rules to generate unique ids: these may use timestamps, sequence numbers, even random numbers as inputs.

The XPath required to identify the XML element describing the Feature is slightly simpler when using the id:
//Feature[@id="foo"]
for the first, and
//Feature[@id="baz"]
for the second.
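Lookup by id is simple enough that ElementTree's limited XPath support handles it directly. The sketch below also builds a dictionary keyed on id, which behaves much like a primary-key index; it assumes the un-namespaced "id" attribute of the simplified example, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

DOC = """<Collection id="bar">
  <member><Feature id="foo"/></member>
  <member><Feature id="baz"/></member>
</Collection>"""

root = ET.fromstring(DOC)

# Direct lookup with an attribute predicate.
foo = root.find(".//Feature[@id='foo']")

# An index of every element that carries an id, keyed by the id value.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}

print(foo.tag, foo.get("id"))
print(sorted(by_id))
```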

Attributes of type xs:ID are an XML document-oriented feature, similar to RDBMS keys. It is unwise to overload their semantics. gml:id is best used as an internal, opaque identifier, with no significance outside the XML document context. It is useful to support fragment-identification-by-XML-ID, and is the standard XML mechanism for doing so.

An example of a methodical set of rules for generating values for gml:id is provided by US Census: see the section headed gml:id Attributes in Census TIGER/GML in http://aries.geo.census.gov/TIGERGML/CensusTIGERGMLSchemas.html. Note that TIGER/GML documents involve a lot of topology, with a lot of re-use of geometry objects by reference, so a systematic approach to assigning values for gml:id is essential.

There are constraints on the lexical form of an ID in XML: it may not start with a digit or contain spaces. Thus, even if the value of the id is patterned after one or other label, you may see a "_" prepended to a value that starts with a digit in order to make it valid, and spaces similarly replaced with "_" characters.
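The kind of adjustment described above might be sketched as follows. This is an assumption-laden illustration, not a rule from any standard: the function to_xml_id is hypothetical, and it deliberately restricts itself to a conservative ASCII subset of the characters that XML allows in an ID.

```python
import re


def to_xml_id(label):
    """Derive a lexically valid ID from an arbitrary label.

    Characters outside a conservative ASCII subset (letters, digits,
    "_", "." and "-") are replaced with "_", and a "_" is prepended
    when the result would not start with a letter or underscore.
    """
    candidate = re.sub(r"[^A-Za-z0-9_.-]", "_", label.strip())
    if not re.match(r"[A-Za-z_]", candidate):
        candidate = "_" + candidate
    return candidate


print(to_xml_id("920500 A1"))
print(to_xml_id("WA_1_139459"))
```

Distinct labels can map to the same value under a rule like this (for example "a b" and "a_b"), so a service using it would still need to check uniqueness within the document.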

The specification for components of type "ID" is given here: http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#ID

Turning labels and handles into URIs

To use a handle or name in cross-referencing, its context in the form of the document identifier (often a URL) is prepended to form a globally unique identifier known as a URI Reference for the XML element.

The specification for components of type "anyURI" is given here: http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#anyURI

The Generic Syntax for URIs is described here: http://www.ietf.org/rfc/rfc2396.txt

ids

Using XPointer syntax, the URI reference for the element where id="foo" within the document "file:///C:/920500-A1-c.xml" is
file:///C:/920500-A1-c.xml#xpointer(id('foo')) 
which may also be expressed in abbreviated form (known as a shorthand pointer or barename)
file:///C:/920500-A1-c.xml#foo
or using the XPointer element syntax
file:///C:/920500-A1-c.xml#element(foo) 

The short form "#foo" (equivalent to "#xpointer(id('foo'))" or "#element(foo)") may be used for cross-referencing within a document. The empty path stem indicates that the xpointer is relative to the "current document". It has the equivalent forms "./#foo" or "./#xpointer(id('foo'))" or "./#element(foo)".

This URI reference identifies "the XML element that carries the id 'foo' within the context".

The use of a URI or URI Reference to refer to a resource or fragment is functionally equivalent to an RDBMS "foreign key". Examples using both long and short forms are shown in GmlProperty.
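As a rough illustration, the three fragment forms can be generated and taken apart with Python's standard urllib.parse module. The function id_reference and its "form" parameter are assumptions made for this sketch.

```python
from urllib.parse import urldefrag, urljoin


def id_reference(document_uri, xml_id, form="bare"):
    """Build a URI reference to the element carrying xml_id.

    form is "bare" (#foo), "xpointer" (#xpointer(id('foo'))) or
    "element" (#element(foo)). An empty document_uri yields the
    short form used for references within the same document.
    """
    fragments = {
        "bare": xml_id,
        "xpointer": f"xpointer(id('{xml_id}'))",
        "element": f"element({xml_id})",
    }
    if form not in fragments:
        raise ValueError(f"unknown form: {form}")
    return f"{document_uri}#{fragments[form]}"


doc = "http://my.big.org/test.xml"
ref = id_reference(doc, "foo", "xpointer")
print(ref)

# Split a reference back into document identifier and fragment.
print(urldefrag(ref).url, urldefrag(ref).fragment)

# Resolve a same-document reference against the document's own URI.
print(urljoin(doc, "#foo"))
```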

names

Using the XPointer scheme, the URI reference for the Feature element whose name subelement has the value "Dick" in codeSpace "DEF", within the document "http://my.big.org/test.xml", is
http://my.big.org/test.xml#xpointer(//Feature[./name[@codeSpace="DEF"]="Dick"]) 

Given the semantics of the gml:name property described above, this URI reference identifies "the XML element that describes the object that has a name 'Dick' assigned by the authority responsible for the codeSpace 'DEF'".

Note that this XPath/XPointer syntax can be adapted to any element path, not just those "Feature" elements involving "name" properties.
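A name-based reference of this kind could be assembled as in the sketch below. The function name_reference is hypothetical; it reproduces the unescaped form shown above and makes no attempt at the character escaping that a strict URI processor might require.

```python
def name_reference(document_uri, element, code_space, value):
    """Build an xpointer() URI reference selecting by name and codeSpace."""
    # The values are embedded in double-quoted XPath string literals,
    # so they must not themselves contain a double quote.
    if '"' in code_space or '"' in value:
        raise ValueError("double quotes are not supported in this sketch")
    xpath = f'//{element}[./name[@codeSpace="{code_space}"]="{value}"]'
    return f"{document_uri}#xpointer({xpath})"


print(name_reference("http://my.big.org/test.xml", "Feature", "DEF", "Dick"))
```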

Unique Identifier?

One issue with the GML pattern of gml:id and gml:name is that some of the information models that we expect to be implemented in GML have the notion of one "unique identifier" in the current context, plus a set of alternative identifiers assigned by other authorities. This includes gazetteer, CRS, and in fact any item that is registered (see the models in ISO 19135). For example, in the UML model for OGC Topic 2 [OGC 04-046r3], the attribute "name : RS_Identifier" serves as the "unique identifier" for each CRS and CRS-related object. The attribute "identifier [0..*] : RS_Identifier" is used to record object names/identifiers assigned by other authorities.

Should such identifiers be encoded as a gml:id or a gml:name?

The gml:id is the slot reserved for an identifier that applies to the XML element in the scope of its appearance within a particular document, and is usually assigned by the information management system since it is primarily significant in that context.

On the other hand, gml:name is the slot reserved for identifiers that apply to the object being described, that are typically assigned by an external agency, and should be used for identifiers that are required to be
  • persistent
  • subject to constraints (e.g. uniqueness) applicable to a context wider than just the document scope.

The uniqueness in these cases is essentially scoped to the registry and its governance model, and does not cancel the idea that different authorities will have different authoritative identifiers for the same item. However, in the registered-item context, one identifier is promoted and overrides GML's essentially neutral view of the status of multiple "name" properties, as all being equal. Note that the gml:Definition model restricts gml:_GML to ensure that at least one gml:name is present to accomplish this.

(There is discussion as to whether
  1. a new tag should be added to hold the "unique identifier", e.g. identifier
  2. the value should be anyURI rather than string+codeSpace.)

URN

It is frequently convenient to use a unique, persistent identifier for resources. Furthermore, in some cases it is useful if this identifier is not directly tied to a web-resolvable address. By using a URN a location-neutral identifier can be used. If an actual resource is required, a resolver must direct the request to the best source. Note the "best" source might depend on context: different resources might be delivered to different clients, but if they are identified by the same URN they are implied to be semantically equivalent.

URN syntax is described here: http://www.ietf.org/rfc/rfc2141.txt

URNs consist of a series of (normally) colon-separated fields, whose meaning is defined by the organisation responsible for the URN scheme. If desired, the fields can define a complex set of "parameters", which might constitute arguments to a function-call or service-request used by the resolver.
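A resolver's first step is typically to split the URN into its fields. The following minimal sketch assumes only the basic "urn:" prefix, namespace identifier and namespace-specific string layout of RFC 2141, and that the remaining fields are colon-separated; the function parse_urn is illustrative, and interpreting the fields is left to the scheme concerned.

```python
def parse_urn(urn):
    """Split a URN into (namespace_id, [fields of the namespace-specific string])."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1] or not parts[2]:
        raise ValueError(f"not a URN: {urn!r}")
    namespace_id, specific = parts[1], parts[2]
    return namespace_id, specific.split(":")


print(parse_urn("urn:ogc:def:crs:OGC:1.3:CRS84"))
print(parse_urn("urn:x-seegrid:definition:gml:NilReason:unknown"))
```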

The OGC URN Scheme

OGC has a (proposed) URN scheme with the following structure
urn:ogc:{category.label}:{resource.group}:{resource.type}{-resource.subtype}?{[doc.id]}?:{[resource.label]}?:{[release]}?:{[parameters]}
where category.label = [specification|service|tc|def]

The "def" branch of the OGC URN scheme is useful for identifiers for CRSs and UOMs - see OGC Best Practices Paper 06-023r1 (publicly available at http://portal.opengeospatial.org/files/?artifact_id=16339). Examples include

urn:ogc:def:crs:OGC:1.3:CRS84            WGS 84 longitude-latitude
urn:ogc:def:uom:OGC:1.0:radian           Angular radian
urn:ogc:def:nil:OGC:unknown              Nil value - reason: unknown
urn:ogc:def:documentType:OGC:schema-GML  GML Application Schema
urn:ogc:def:uom:SI:kg                    kilogram
urn:ogc:def:uom:UCUM:[ft_us]             US survey foot

A URN Scheme for XMML and SEE Grid

In XMML examples, we have found it useful to use URNs as identifiers for (at least) terms, objects and classifiers, for example:
urn:x-seegrid:definition:gml:NilReason:unknown

urn:x-seegrid:definition:xmml:AssayProcedure:XRF_1

urn:x-seegrid:feature:xmml:GeochemSpecimen:WA_1_139459

urn:x-seegrid:feature:xmml:GeochemMeasurement:WA_1_139459_Al2O3

urn:x-seegrid:feature:xmml:GeochemMeasurement
The value of a cross-reference may not be known "exactly" to the service writing the document. In this case, the link may be specified as a parameterised request to another service, such as a WFS GetFeature request.

However, even when a cross-reference includes a fragment identifier, the "context" of the identifier may be parameterised, for example by service identity and feature-type.
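A parameterised link of this kind might be built as in the following sketch, which assumes a WFS 1.1.0 key-value-pair GetFeature request. The service address http://example.org/wfs and the function get_feature_href are hypothetical; the feature type and identifier are borrowed from the URN examples above purely for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def get_feature_href(service_url, type_name, feature_id, version="1.1.0"):
    """Build a WFS GetFeature URL usable as the value of a cross-reference."""
    query = urlencode(
        {
            "service": "WFS",
            "version": version,
            "request": "GetFeature",
            "typeName": type_name,
            "featureId": feature_id,
        }
    )
    return f"{service_url}?{query}"


href = get_feature_href(
    "http://example.org/wfs",  # hypothetical service address
    "xmml:GeochemSpecimen",
    "WA_1_139459",
)
print(href)

# Decode the query string again to show the parameters it carries.
print(parse_qs(urlsplit(href).query))
```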

More later ...
Topic revision: r32 - 15 Oct 2010, UnknownUser
 

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).