Persistent Identifiers (PID) Service - Technology review

Overview

This page reviews and compares the available open-source PID Service solutions. It sets out the project-specific requirements and features taken into consideration, and assesses how well each available solution copes with those requirements.

Precedents

  • PURL
  • URL shorteners (bit.ly, etc.)
  • Arizona Resolver (Django-based)
  • Apache, .htaccess, etc.
  • “SWIM” Persistent ID Service

Considerations

Pattern mappings/regex

For example:
| From | To |
| --- | --- |
| ^/resource/feature/gswa/mineraloccurrence/(.+)$ | http://hostname/gswa-earthresource/wfs?service=WFS&version=1.1.0&request=GetFeature&typeName=er:MineralOccurrence&featureid=er.mineraloccurrence.$1 |
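
As a rough illustration of how such a pattern mapping could be applied, here is a minimal Python sketch. The rule table `RULES` and the function `resolve` are hypothetical names, not part of any of the reviewed products, and "hostname" is the placeholder from the example above. Note that Python's `re` module writes the back-reference as `\1` where the mapping uses `$1`.

```python
import re

# Hypothetical rule table: (compiled pattern, replacement template) pairs.
# The single rule is the example mapping; "hostname" is a placeholder.
RULES = [
    (
        re.compile(r"^/resource/feature/gswa/mineraloccurrence/(.+)$"),
        "http://hostname/gswa-earthresource/wfs?service=WFS&version=1.1.0"
        "&request=GetFeature&typeName=er:MineralOccurrence"
        r"&featureid=er.mineraloccurrence.\1",
    ),
]


def resolve(path):
    """Return the target URL for the first matching rule, or None."""
    for pattern, template in RULES:
        match = pattern.match(path)
        if match:
            # expand() substitutes \1, \2, ... with the captured groups.
            return match.expand(template)
    return None


print(resolve("/resource/feature/gswa/mineraloccurrence/12345"))
```

Rules are tried in order and the first match wins; whether a real service should prefer first match, longest match, or explicit priorities is a design decision this sketch does not settle.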

MIME type content negotiation

MIME type content negotiation should also support prioritisation. For example:

Accept: text/html, application/xhtml+xml, */*

where */* or * returns a default response (the default MIME type, e.g. application/atom+xml). See Extended negotiation.

If the user agent requests a resource in a format that is not available, but other formats of the resource are, the HTTP response code should be 406 (Not Acceptable) rather than 404, which implies there is nothing to find. 415 (Unsupported Media Type) is not appropriate here, since it refers to the format of the request body rather than to the Accept header.
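
The prioritisation and the 406 case can be sketched as follows. This is an illustration only: `parse_accept` and `negotiate` are hypothetical helper names, q-values are honoured where present, and ties are broken by the order in which the client listed the types, which is an assumption made here rather than something HTTP mandates.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, best first."""
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q, position))
    # Higher q first; ties keep the order in which the client listed them.
    entries.sort(key=lambda e: (-e[1], e[2]))
    return [(m, q) for m, q, _ in entries]


def negotiate(accept_header, available, default):
    """Return the MIME type to serve, or None if nothing is acceptable.

    `available` lists the MIME types that exist for the identifier and
    `default` is what */* or * resolves to. None maps to HTTP 406.
    """
    if not accept_header.strip():
        return default  # no preference stated: anything is acceptable
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means "explicitly not acceptable"
        if media_type in ("*/*", "*"):
            return default
        if media_type in available:
            return media_type
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "text/*" -> "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
    return None
```

For the header shown above, `negotiate("text/html, application/xhtml+xml, */*", ["application/rdf+xml", "text/html"], "application/atom+xml")` selects text/html; a request for a single unavailable type with no wildcard yields None, which the service would answer with 406.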

Fine grained content negotiation

The ability to request a document of a specific MIME type and its "subtype". For example, a GML document for a specific application schema (information model):

application/gml+xml; subtype=gml/3.1.1; schema=GeoSciML 2.0
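
One way such a request could be matched against the available representations is to compare the media type and then each requested parameter. The sketch below assumes that approach; `parse_media_type` and `matches` are hypothetical helpers, and the parameter names `subtype` and `schema` are simply those used in the example above, not a registered convention.

```python
def parse_media_type(value):
    """Split 'type/subtype; name=value; ...' into (media_type, params)."""
    media_type, *raw_params = value.split(";")
    params = {}
    for raw in raw_params:
        name, sep, val = raw.partition("=")
        if sep:
            # Parameter names are case-insensitive; values are kept as given.
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type.strip().lower(), params


def matches(requested, offered):
    """True if `offered` has the requested type and every requested parameter."""
    req_type, req_params = parse_media_type(requested)
    off_type, off_params = parse_media_type(offered)
    if req_type != off_type:
        return False
    return all(off_params.get(k) == v for k, v in req_params.items())


offered = "application/gml+xml; subtype=gml/3.1.1; schema=GeoSciML 2.0"
print(matches("application/gml+xml; subtype=gml/3.1.1", offered))
print(matches("application/gml+xml; schema=GeoSciML 3.0", offered))
```

A request that omits a parameter is treated as not caring about it, so a client asking only for `subtype=gml/3.1.1` still matches a GeoSciML 2.0 document; a stricter policy is equally possible.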

Extended negotiation

The ability to request the list of MIME types supported for a given persistent identifier. A policy needs to be developed on whether an Atom response (or similar) should be the default, or whether another default option should be configurable in the PID Service.
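
If an Atom response were chosen, the list of options might be expressed as one link element per available representation. The following is only a sketch of that idea: `atom_options` is a hypothetical function, the example.org URLs are made up, and the output is a fragment rather than a complete Atom entry (a valid entry would also need elements such as `updated` and an author).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def atom_options(pid_uri, representations):
    """Build an Atom entry with one <link> per available representation.

    `representations` maps a MIME type to the URL serving that format.
    """
    ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = pid_uri
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = f"Representations of {pid_uri}"
    for mime_type, href in representations.items():
        ET.SubElement(
            entry,
            f"{{{ATOM_NS}}}link",
            {"rel": "alternate", "type": mime_type, "href": href},
        )
    return ET.tostring(entry, encoding="unicode")


# Hypothetical identifier and target URLs, for illustration only.
print(atom_options(
    "http://example.org/id/feature/1",
    {
        "text/html": "http://example.org/doc/feature/1.html",
        "application/rdf+xml": "http://example.org/doc/feature/1.rdf",
    },
))
```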

SISS4BoM project specific requirements

Requirements analysis

  • URI policies, templates, formation rules - complexity level and specific use cases.
  • URI mapping and URI resolution rules management. User interface, manual configuration, programmable API?
  • URI mapping fallback rules.
  • Should partial URIs be resolvable? How should this be managed?
  • Content negotiation, negotiation protocol. Is it required at this stage at all?
  • Redirects vs. proxying requests.
  • Scalability. How many URI mappings/rules are we looking at?

An ontology example

The key use case considered for the PID is going from a URI for a real-world thing to a representation or description in the form of a digital document. However, the ontology space itself is not simple. Here is a very specific ontology example, from the work I’ve been doing with ISO, tempered by discussions with Epimorphics:

| URI | Response |
| --- | --- |
| http://def.seegrid.csiro.au/isotc211/iso19115/2003/metadata | a resource describing the whole ontology (#localName identifies the ontology elements, but is stripped by HTTP) |
| http://def.seegrid.csiro.au/isotc211/iso19115/2003/metadata/code/ | a list of codelists |
| http://def.seegrid.csiro.au/isotc211/iso19115/2003/metadata/code/MediumNameCode | the container resource for one of the codelists (actually a skos:ConceptScheme or skos:Collection) |
| http://def.seegrid.csiro.au/isotc211/iso19115/2003/metadata/code/MediumNameCode/ | a list of members of the codelist |
| http://def.seegrid.csiro.au/isotc211/iso19115/2003/metadata/code/MediumNameCode/9trackTape | a member of the codelist |

In each case there will be HTML, RDF/XML, TTL versions, also possibly ATOM, selected by conneg using HTTP headers.

Comparison

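| | PURL | URL shorteners | Arizona Resolver | Apache | SWIM |
| --- | --- | --- | --- | --- | --- |
| Features | | | | | |
| 1:1 mapping | DONE | DONE | DONE | DONE | DONE |
| URL stem/path substitution | ALERT! | ALERT! | | DONE | |
| Pattern mappings/regex | ALERT! | ALERT! | DONE | DONE | |
| MIME type content negotiation | ALERT! | ALERT! | DONE (partial) | | |
| Finer grained conneg | ALERT! | ALERT! | | | |
| Extended negotiation (e.g. Atom response with options) | ALERT! | ALERT! | ALERT! | ALERT! | ALERT! |
| Redirect | DONE | DONE | DONE | DONE | DONE |
| Proxy | ALERT! | ALERT! | ALERT! | DONE | |
| Management/programmability/scalability | | | | | |
| Configuration user interface | DONE | DONE | DONE | ALERT! | |
| Programmable API | DONE | ALERT! | DONE | ALERT! | |
| Decommissioning URIs | ALERT! | DONE | DONE | DONE | |
| Scalability (> 1 million URIs/rules) | ? | ? | DONE | ALERT! | |
| RDBMS back end | | | DONE | ALERT! | |
| Support and maintenance | | | | | |
| Community support | ALERT! | DONE | DONE | DONE | DONE |
| Open for improvements | ALERT! | ALERT! | DONE | ALERT! | DONE |

Legend: DONE = supported; ALERT! = not supported or problematic; ? = unknown; blank = no information recorded.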

PURL

  • Pros:
    • RESTful management interface
      • Supports creation/modification of batches of PURLs.
  • Cons:
    • Supports only redirection, with 30x and 4xx HTTP response codes (no proxying possible).
    • Existing PURLs cannot be decommissioned or updated.
    • Scalability is an open question (there was no way to test loading 1 million URLs).
    • Does not seem to be well supported. The latest PURL news is one year old, and the last press release on the web site is dated 11 July 2007.
    • Although http://purl.org/docs/brief_intro.html states that the PURL technology is open source, the download link http://www.oclc.org/purl/docs/download.html is dead.

-- PavelGolodoniuc - 25 Jul 2012
Topic revision: r4 - 21 Aug 2012, PavelGolodoniuc