"Seegrid will be due for a migration to confluence on the 1st of August. Any update on or after the 1st of August will NOT be migrated"

Persistent Identifiers in GeoSciML Services

Also see PersistentIdentifiersInGeoSciMLServicesDiscussion

Introduction

The following proposal relates to GeoSciML v3.0 and services implemented by Testbed 4 participants. The intent is to test the use of HTTP URIs as an alternative to URNs for persistent identifiers for features and classifiers. OGC now recommends HTTP URIs as the preferred means of identifying OGC resources, and CGI wishes to conform with OGC convention. If successful and accepted, this can be documented as a best practice or standard by the CGI Interoperability Working Group.

This proposal deliberately introduces nothing new. It relies on existing W3C practices or existing proposals (such as recent UK public sector recommendations).

Assumptions

  • This document assumes the use of GeoSciML 3.0 and that GeoSciML 3.0 is a GML 3.2 application schema.
  • GeoSciML feature types, classifiers and classifier schemes (controlled concepts and vocabularies) are non-information resources (see here).
  • This document assumes a certain amount of background reading not limited to the references below. Terms to understand include: Information resource; Non-information resource; Dereference (AKA resolve); Representation; Canonical form; Default representation.

Description

  1. All GeoSciML features and classifiers are to be identified with an HTTP URI structured according to the proposed CGI HTTP URI Scheme.
  2. References to these resources in structured/linked representations will use an HTTP URI value that behaves as a foreign key to dereference to a representation of the resource (see 3.).
  3. Requests to the HTTP URI for non-information resources will result in a 303 redirect (see Cool URIs 4.2: 303 URIs) whose response carries a URL that requests the default representation of the resource. The recommended ‘default’ response for a GeoSciML feature is a redirect to a WFS GetFeature request issued via HTTP GET.
  4. Where multiple representations of the resource may be available, for example documents encoded according to different versions of GeoSciML plus RDF, a client may use content-negotiation using the HTTP Accept: header to request the preferred representation (encoding) of the resource.
    • different clients may request different representations, for example
      • a browser will usually prefer HTML
      • a GML aware client will prefer some flavour of GML (GeoSciML)
      • a semantic-web client will usually prefer RDF (incl. SKOS, OWL)
    • HTML encoding should be the default representation if an HTTP Accept: value is not provided by the requester
      • this is for compatibility with the assumptions of generic web clients, and applications that are not GeoSciML- or GML-aware
      • note that a corollary of this is that GeoSciML-aware clients must use the HTTP Accept: header to get an XML representation
  5. In GeoSciML documents:
    • identifiers will be encoded as gml:identifier values;
    • references may be encoded as xlink:href or gml:CodeType values;
      • the accompanying codeSpace attribute values for identifiers will be http://www.ietf.org/rfc/rfc2616, the URI of the HTTP/1.1 specification (RFC 2616), which defines the http URI scheme (the URI generic syntax itself is specified in RFC 3986);
      • the accompanying codeSpace attribute values for references to classifiers encoded as ScopedName values shall be the URI of the classifier scheme; and
      • should a provider wish to also provide URNs for backwards compatibility with previous versions of GeoSciML they should be encoded as gml:name values with a codeSpace attribute value of http://www.ietf.org/rfc/rfc2141.
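
The content negotiation described in item 4 can be sketched from the client side. The following is a minimal illustration in Python, not part of the proposal: the `ACCEPT_BY_CLIENT` table, its q-values, and the `build_request` helper are assumptions made for the example. The request is only constructed, not sent.

```python
import urllib.request

# Hypothetical Accept header values per client type (illustrative only).
ACCEPT_BY_CLIENT = {
    "browser": "text/html",
    "gml": "application/gml+xml, application/xml;q=0.9, text/xml;q=0.8",
    "semantic-web": "application/rdf+xml",
}


def build_request(uri, client_kind):
    """Build (but do not send) a GET request with an explicit Accept header."""
    headers = {"Accept": ACCEPT_BY_CLIENT[client_kind]}
    return urllib.request.Request(uri, headers=headers, method="GET")


req = build_request(
    "http://resource.geosciml.org/classifier/cgi/lithology/106", "semantic-web"
)
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

A GeoSciML-aware client would use the "gml" entry (or something similar), since a request without an Accept: value is expected to yield HTML.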

Implementation

Dereferencing/resolution of the HTTP URIs is expected to use HTTP GET requests to web servers using commonly available rewrite engines (such as Apache’s mod_rewrite module).

CGI HTTP-URI Scheme

The CGI URI Scheme takes the existing CGI URN scheme and adapts it to conform to the syntax of the http URI scheme (RFC 2616; the generic URI syntax is defined in RFC 3986).

CGI HTTP-URI format

  • http://{host}/{cgi resource class}/{authority}/{resource specific identifier}

Facet definitions

  • {host} - location at which the HTTP-URI may be dereferenced.
    • will be related to the {authority}, but is not necessarily tightly coupled - for example one organisation or community may provide hosting services for another.
  • {cgi resource class} - resource classes as defined in the CGIResourceClassRegister.
  • {authority} - party that originally defined, or owns, the resource.
  • {resource specific identifier} - character string that may be faceted to allow the full character string (starting with 'http://') to be universally unique.
    • Wherever possible this should conform to 'Cool URI' standards, but it may vary for pragmatic reasons, for example to assist in dereferencing the URI.

Constraints

Examples

  • http://geology.data.vic.gov.au/feature/gsv/geologicunit/16777549126930817
  • http://resource.geosciml.org/classifier/cgi/lithology/106
  • http://resource.geosciml.org/classifierscheme/cgi/200811/simplelithology
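
To make the facet structure concrete, here is a minimal Python sketch that splits a CGI HTTP URI into the four facets defined above. The `CgiUri` type and the `parse_cgi_uri` function are hypothetical names introduced for illustration. The sketch assumes that everything after the authority segment belongs to the resource specific identifier, which may itself contain further '/' separated facets.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class CgiUri(NamedTuple):
    host: str
    resource_class: str
    authority: str
    resource_id: str


def parse_cgi_uri(uri):
    """Split http://{host}/{class}/{authority}/{identifier} into its facets."""
    parts = urlsplit(uri)
    if parts.scheme != "http" or not parts.netloc:
        raise ValueError(f"not an http URI: {uri!r}")
    # Split at most twice so a faceted identifier keeps its own slashes.
    segments = parts.path.lstrip("/").split("/", 2)
    if len(segments) != 3 or not all(segments):
        raise ValueError(f"expected class/authority/identifier in: {uri!r}")
    resource_class, authority, resource_id = segments
    return CgiUri(parts.netloc, resource_class, authority, resource_id)


scheme_uri = parse_cgi_uri(
    "http://resource.geosciml.org/classifierscheme/cgi/200811/simplelithology"
)
print(scheme_uri)
```

For the classifier scheme example, the identifier facet comes out as `200811/simplelithology`, showing a faceted resource specific identifier.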

Other URIs

  • Flags (nil values, namespaces, uom values, and other URIs not necessarily intended to be resolved) are to be implemented as HTTP URIs as formalised by the OGC.
  • URIs identifying actual information resources are beyond the scope of this document but will ultimately need to be formalised by the CGI IWG, specific communities (e.g. USGIN, GIN, INSPIRE or AuScope) or both.

Example Instances

Identifier value

<gml:identifier codeSpace="http://www.ietf.org/rfc/rfc2616">
    http://geology.data.vic.gov.au/feature/gsv/geologicunit/16777549126930817</gml:identifier>

Backwards compatible (GeoSciML v2.1 or lower) URN value:
<gml:name codeSpace="http://www.ietf.org/rfc/rfc2141">urn:cgi:feature:GSV:GeologicUnit:16777549126930817</gml:name>
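
The identifier and the backwards compatible URN above differ only in separators, letter case, and the host. A provider holding legacy URNs could derive the HTTP URI along the following lines. This is a sketch under assumptions: the mapping rule (colon separated facets become lowercase path segments) is inferred from this single example pair, and `urn_to_http_uri` is a hypothetical helper. Lowercasing would not be safe if resource specific identifiers were case sensitive.

```python
def urn_to_http_uri(urn, host):
    """Map urn:cgi:{class}:{authority}:{id facets} to a CGI HTTP URI.

    Assumed rule: facets become lowercase path segments under the given host.
    """
    parts = urn.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "cgi":
        raise ValueError(f"not a CGI URN: {urn!r}")
    resource_class, authority, *specific = parts[2:]
    path = "/".join([resource_class, authority, *specific]).lower()
    return f"http://{host}/{path}"


uri = urn_to_http_uri(
    "urn:cgi:feature:GSV:GeologicUnit:16777549126930817",
    "geology.data.vic.gov.au",
)
print(uri)
```

The host is passed in explicitly because, as the facet definitions note, it is related to the authority but not tightly coupled to it.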

Reference to feature

<gsml:specification xlink:href="http://geology.data.vic.gov.au/feature/gsv/geologicunit/16777549126930817"/>
<gsml:occurrence xlink:href="http://geology.data.vic.gov.au/feature/gsv/mappedfeature/930817"/>

Reference to classifier

<gsml:lithology xlink:href="http://resource.geosciml.org/classifier/cgi/lithology/106" xlink:title="basalt"/>

Reference to missing value

<gsml:value codeSpace="http://www.opengis.net/def/nil/OGC/0">http://www.opengis.net/def/nil/OGC/0/missing</gsml:value>

Reference to UOM

<gsml:principalValue uom="http://www.opengis.net/def/uom/OGC/1.0/metre">100.0</gsml:principalValue>

Dereferencing HTTP-URIs

This table summarises the MIME types expected to be supported for each CGI resource class, together with the expected HTTP response codes and content encoding. Example responses will be provided in time.
| Resource Class | Supported MIME types | Required | Response Code (Success/Fail) | Constraints | Example Response | Comments |
| classifier | application/rdf+xml | Y | 303/406 | SKOS/RDF representation | | |
| | application/xml | Y | 303/406 | SKOS/RDF representation | | |
| | text/xml | Y | 303/406 | SKOS/RDF representation | | |
| | text/html | D | 303/406 | | | |
| classifierScheme | application/rdf+xml | Y | 303/406 | SKOS/RDF representation | | |
| | application/xml | Y | 303/406 | SKOS/RDF representation | | |
| | text/xml | Y | 303/406 | SKOS/RDF representation | | |
| | text/html | D | 303/406 | | | |
| feature | application/xml | Y | 303/406 | Return GML feature-type, no service artifacts | | Service artifacts examples include WFS feature collections |
| | text/xml | Y | 303/406 | Return GML feature-type, no service artifacts | | Service artifacts examples include WFS feature collections |
| | application/gml+xml | Y | 303/406 | Return GML feature-type, no service artifacts | | MIME type formally registered yet? |
| | text/html | D | 303/406 | | | |
| | image/* | N | 303/406 | Georeferenced image. | | |
| service (WxS) | application/xml | Y | 303/406 | GetCapabilities document | | |
| | text/xml | Y | 303/406 | GetCapabilities document | | |
| | text/html | D | 303/406 | | | |
| more to come... | | | | | | |

Required MIME types

  • 'Y': required type. The resource should respond to a request for a representation of this type.
  • 'D': desirable type. It is strongly recommended that a request for this type be honoured with an appropriate response.
  • 'N': not required. The server may respond to a request for a representation of this type.

HTML vs XML representations

Support for HTML representations has not been mandated, as service providers may not have the ability to generate them. However:
  • We strongly recommend making HTML representations available; and
  • We strongly recommend making HTML representations the default response to any request that does not specify a desired MIME type in the request's Accept header.

Response Codes

  • 303 response codes are specified because a 303 response must not be cached, so the URL it returns is allowed to vary between requests. The user agent can determine from the subsequent response whether that response can be cached.
  • 406 response codes are specified based on the PersistentIdentifiersInGeoSciMLServicesDiscussion.
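
The table, the HTML default, and the 303/406 response codes can be combined into one decision, sketched below in Python. This is an illustration rather than a reference implementation. It assumes a server that supports every 'Y' and 'D' type in the table (the optional image/* row is left out), treats a missing Accept value or `*/*` as a request for the HTML default, and uses hypothetical names (`SUPPORTED`, `parse_accept`, `negotiate`).

```python
XML_TYPES = ["application/xml", "text/xml"]
SUPPORTED = {  # taken from the table above, image/* omitted
    "classifier": ["application/rdf+xml", *XML_TYPES, "text/html"],
    "classifierScheme": ["application/rdf+xml", *XML_TYPES, "text/html"],
    "feature": [*XML_TYPES, "application/gml+xml", "text/html"],
    "service": [*XML_TYPES, "text/html"],
}
DEFAULT_TYPE = "text/html"


def parse_accept(header):
    """Return (media range, q) pairs, highest q first."""
    ranges = []
    for item in header.split(","):
        media, *params = [p.strip() for p in item.split(";")]
        if not media:
            continue
        q = 1.0
        for param in params:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        ranges.append((media.lower(), q))
    return sorted(ranges, key=lambda pair: -pair[1])  # stable sort


def _matches(media_range, mime_type):
    if media_range == mime_type:
        return True
    return media_range.endswith("/*") and mime_type.startswith(media_range[:-1])


def negotiate(resource_class, accept):
    """Return (status code, MIME type): 303 on success, 406 on failure."""
    if not accept or not accept.strip():
        return 303, DEFAULT_TYPE
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_range == "*/*":
            return 303, DEFAULT_TYPE
        for mime_type in SUPPORTED[resource_class]:
            if _matches(media_range, mime_type):
                return 303, mime_type
    return 406, None


print(negotiate("feature", "text/html;q=0.5, application/gml+xml"))
print(negotiate("service", "application/rdf+xml"))
```

On a 303 result, the server would still need to build the Location URL for the chosen representation. That mapping is provider specific and is not shown here.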

Example Implementation in Apache

The following is from a prototype implementation using 303 redirects implemented in Apache’s mod_rewrite module. In this prototype the URL is mapped to a WFS GetFeature HTTP GET request using the featureid parameter. The gml:id values in the source WFS have been constructed to be compatible with the URI (this does not necessarily mean matching it exactly) for easy mapping. Nothing stops a provider from implementing a different mapping, or using a different rewrite engine, provided the redirect leads to a request that returns the right feature.

URLs below are not real to protect innocent development servers but this has been successfully implemented.

From httpd.conf:
# [snip]
##########################
## URI Rewrites for GSV testbed 4
##########################

<IfModule mod_rewrite.c>
# [snip]
RewriteEngine on

# Redirect GeologicUnit feature URIs to a WFS GetFeature request.
# NC = case-insensitive match, R=303 = "See Other" redirect, L = last rule.
# Each RewriteRule must stay on a single line.
RewriteRule ^/feature/gsv/geologicunit/(.*) http://geology.data.vic.gov.au/GeoSciML/GeologicUnit/wfs?service=WFS&version=1.1.0&request=GetFeature&typeName=gsml:GeologicUnit&featureid=gsml.geologicunit.$1 [NC,R=303,L]

# Same mapping for MappedFeature feature URIs.
RewriteRule ^/feature/gsv/mappedfeature/(.*) http://geology.data.vic.gov.au/GeoSciML/GeologicUnit/wfs?service=WFS&version=1.1.0&request=GetFeature&typeName=gsml:MappedFeature&featureid=gsml.mappedfeature.$1 [NC,R=303,L]
# [snip]
</IfModule>
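
For providers not using Apache, the same mapping can be expressed in a few lines of application code. The Python sketch below mirrors the two rewrite rules above. It is an illustration only: `RULES` and `redirect_for` are hypothetical names, and the endpoint is the placeholder URL from the configuration, which the text notes is not a real server.

```python
import re

# Placeholder endpoint copied from the configuration above (not a real server).
WFS_ENDPOINT = "http://geology.data.vic.gov.au/GeoSciML/GeologicUnit/wfs"

# (path pattern, WFS typeName, featureid prefix), one entry per RewriteRule.
RULES = [
    (re.compile(r"^/feature/gsv/geologicunit/(.*)", re.IGNORECASE),
     "gsml:GeologicUnit", "gsml.geologicunit."),
    (re.compile(r"^/feature/gsv/mappedfeature/(.*)", re.IGNORECASE),
     "gsml:MappedFeature", "gsml.mappedfeature."),
]


def redirect_for(path):
    """Return (303, Location URL) for a feature URI path, or None if no rule matches."""
    for pattern, type_name, id_prefix in RULES:
        match = pattern.match(path)
        if match:
            location = (
                f"{WFS_ENDPOINT}?service=WFS&version=1.1.0&request=GetFeature"
                f"&typeName={type_name}&featureid={id_prefix}{match.group(1)}"
            )
            return 303, location
    return None


print(redirect_for("/feature/gsv/geologicunit/16777549126930817"))
```

As in the Apache prototype, the mapping works because the gml:id values in the source WFS were constructed to be derivable from the URI.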

Background Reading

Topic revision: r15 - 08 Jun 2011, EricBoisvert
 

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).