Persistent Identifier Service (PID Service)

User Guide

For detailed user guide refer to:

Overview

The Persistent Identifier Service (PID Service) resolves persistent identifiers. All incoming HTTP requests are intercepted at the Apache HTTP Server level and passed to the PID Service dispatcher servlet. The dispatcher matches each incoming request against the patterns configured in the PID Service, which are stored in a persistent relational data store (e.g. PostgreSQL), and then performs a set of user-defined actions such as HTTP header manipulation, redirects, proxying requests or delegating resolution to another service. The architecture is extensible, and the service supports multiple control interfaces: a visual user interface (UI) and a programmable API for remote, unattended management of URI mapping rules.

Implementation has taken into account findings, requirements and observations discovered during technology review and prototype implementation phases that immediately preceded implementation of the PID Service:

License Agreement

CSIRO Open Source Software License Agreement applies.

Releases


Please note that the main PID Service repository has been migrated to GitHub at https://github.com/SISS/PID

Method

All incoming HTTP requests are intercepted at the Apache HTTP Server level and passed to the PID Service dispatcher servlet. The dispatcher compares the incoming request URI with the patterns configured in the PID Service and stored in a persistent relational data store (e.g. PostgreSQL). Once a match is found, it performs a set of finer-grained comparisons configured by the user for that particular URI, applying different actions depending on the particulars of the incoming request; for example, the behaviour may vary with the requested content type(s), the query string parameters or the file extension. Once a matching condition is found, it performs a set of user-defined actions such as HTTP header manipulation, redirects, proxying requests or delegating resolution to another service. If no matching condition is found, the service fires the default action: a user-defined HTTP response or proxying the request through to another location.
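The resolution flow described above can be sketched in a few lines of Python. This is only an illustration of the pattern, condition and default-action sequence, not the PID Service implementation: the names Condition, Mapping and resolve, the dictionary-shaped request and the action strings are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Condition:
    # Finer-grained test applied after the URI pattern has matched,
    # e.g. on the Accept header, a query string parameter or a file extension.
    matches: Callable[[dict], bool]
    actions: List[str]


@dataclass
class Mapping:
    pattern: str  # regular expression compared with the request URI
    conditions: List[Condition] = field(default_factory=list)
    default_actions: List[str] = field(default_factory=list)


def resolve(request: dict, mappings: List[Mapping]) -> Optional[List[str]]:
    """Return the actions of the first mapping whose pattern matches the URI.

    Returns None when no configured pattern matches at all (hypothetical
    convention chosen for this sketch).
    """
    for mapping in mappings:
        if re.search(mapping.pattern, request["uri"]) is None:
            continue
        # Pattern matched: look for the first condition that holds.
        for condition in mapping.conditions:
            if condition.matches(request):
                return condition.actions
        # No condition matched: fall back to the default action.
        return mapping.default_actions
    return None
```

In this sketch a request for /id/x with an Accept header of text/html would receive the actions of the first matching condition, while the same URI with an unmatched content type would receive the mapping's default actions.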

Core principle activity diagram

Installation

Prerequisites

Assumptions

  • %TOMCAT_HOME% - home directory of the Apache Tomcat container.
  • %HOSTNAME% - fully-qualified hostname of the deployment machine.
  • %DEPLOYMENT_DIR% - web application deployment directory (e.g. /usr/local).
  • %PIDSVC_HOME% - PID Service home directory (e.g. %TOMCAT_HOME%/webapps/pidsvc/).

Database Setup

The PID Service uses the PostgreSQL relational database management system (DBMS) as a persistent store for service configuration and URI mappings. Before deploying the PID Service itself, it is recommended to go through the following steps to create and configure the database. The following guide is based on PostgreSQL 9.1 deployed on a Debian OS, but similar steps apply on other platforms:

Log in as the postgres user:
sudo su - postgres

Create pidsvc-admin superuser:
createuser pidsvc-admin -P
Enter password for new role: <enter password>
Enter it again: <enter password>
Shall the new role be a superuser? (y/n) y

Create a new pidsvc database and set ownership:
createdb pidsvc -O pidsvc-admin

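Note that although pidsvc is the recommended default name for the database, you may change it if required (e.g. if you're using a shared DBMS for multiple instances of the PID Service).

Create the plpgsql language (on PostgreSQL 9.0 and later plpgsql is installed by default, so this command may report that the language is already installed):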
createlang plpgsql pidsvc

Run the postgresql.sql script supplied as part of the distribution package to create the database schema and populate it with default settings. The script can be obtained from the Subversion repository at https://www.seegrid.csiro.au/subversion/PID/trunk/pidsvc/src/main/db/postgresql.sql
wget https://www.seegrid.csiro.au/subversion/PID/trunk/pidsvc/src/main/db/postgresql.sql
psql -d pidsvc -f postgresql.sql

Note that the script will report a few errors/warnings that some database objects do not exist. Ignore these warnings.

Inspect the database using your favourite tool (e.g. pgAdmin III, Navicat, etc.) and check that it contains tables and views. If it does, you have configured the database correctly.

Service Deployment

The PID Service is provided as a pre-compiled web application (WAR archive), which is ready for immediate deployment. The installation of the PID Service is simple and straightforward, and will only take a few minutes.

  • Download the latest WAR-archive from SWRepo Download Server:
  • Deploy WAR file using one of the following methods:
    1. Deployment using Tomcat web interface:
      • Log on to http://%HOSTNAME%:8080/manager/html/
      • Using the "WAR file to deploy" form upload pidsvc.war file.
    2. Manual deployment into the webapps directory:
      • Drop the WAR file into %TOMCAT_HOME%/webapps/ directory.
    3. Manual deployment using separate context configuration file:
      • Create pidsvc.xml in %TOMCAT_HOME%/conf/Catalina/localhost/ with the following content:
        <Context path="/pidsvc"
           docBase="%DEPLOYMENT_DIR%/pidsvc/pidsvc.war"
           crossContext="false"
           reloadable="false">
           <Resource
              name="jdbc/pidsvc"
              auth="Container"
              type="javax.sql.DataSource"
              driverClassName="org.postgresql.Driver"
              url="jdbc:postgresql://%HOSTNAME%:5432/pidsvc"
              username="%USERNAME%"
              password="%PASSWORD%"
              maxActive="-1"
              minIdle="0"
              maxIdle="10"
              maxWait="10000"
              minEvictableIdleTimeMillis="300000"
              timeBetweenEvictionRunsMillis="300000"
              numTestsPerEvictionRun="20"
              poolPreparedStatements="true"
              maxOpenPreparedStatements="100"
              testOnBorrow="true"
              accessToUnderlyingConnectionAllowed="true"
              validationQuery="SELECT VERSION();"
           />
        </Context>
  • Unless you deployed using a separate context configuration file, create a JNDI resource for the database connection:
    • Add the following resource to %TOMCAT_HOME%/conf/context.xml:
      <Resource
         name="jdbc/pidsvc"
         auth="Container"
         type="javax.sql.DataSource"
         driverClassName="org.postgresql.Driver"
         url="jdbc:postgresql://%HOSTNAME%:5432/pidsvc"
         username="%USERNAME%"
         password="%PASSWORD%"
         maxActive="-1"
         minIdle="0"
         maxIdle="10"
         maxWait="10000"
         minEvictableIdleTimeMillis="300000"
         timeBetweenEvictionRunsMillis="300000"
         numTestsPerEvictionRun="20"
         poolPreparedStatements="true"
         maxOpenPreparedStatements="100"
         testOnBorrow="true"
         accessToUnderlyingConnectionAllowed="true"
         validationQuery="SELECT VERSION();"
      />
  • Restart Tomcat
    • service tomcat6 restart

External properties file

The pidsvc.properties configuration file may be kept outside the WAR archive. To do so, specify the path to the pidsvc.properties file in the pidsvc.settings environment entry by adding the following to your Tomcat context configuration file:

<Environment
   name="pidsvc.settings"
   value="D:\Projects\PIDService\pidsvc.properties"
   type="java.lang.String"
   override="false"
/>

JNDI Connection

By default the service uses the jdbc/pidsvc JNDI resource name for the database connection. If you decide to use another name, you will also need to update the %PIDSVC_HOME%/WEB-INF/mappingstore.properties configuration file by changing the following line:
jndiReferenceName = jdbc/pidsvc

Apache HTTP Server Configuration

The Apache HTTP Server intercepts incoming requests and passes them through the URI dispatcher servlet of the PID Service, unless they need to be proxied directly to an appropriate HTTP handler (e.g. the PID Service web management console and its API servlets). This is achieved with the mod_proxy, mod_headers and mod_rewrite modules.

  • Activate Apache modules:
    a2enmod proxy
    a2enmod proxy_http
    a2enmod proxy_ajp
    a2enmod headers
    a2enmod rewrite
  • Configure Apache HTTP Server to intercept requests and pass them through to the appropriate HTTP handler. The minimal configuration below needs to be merged with your current Apache HTTP Server configuration:
    <VirtualHost *>
       ServerName %HOSTNAME%
       RedirectMatch ^/$ /pidsvc
    
       ProxyRequests Off
       ProxyPreserveHost On
    
       <Location /pidsvc>
          ProxyPass ajp://localhost:8009/pidsvc keepalive=On
          ProxyPassReverse http://%HOSTNAME%/pidsvc
       </Location>
    
       RewriteEngine on
       RewriteRule ^(/(?!pidsvc(?:$|/)|favicon\.ico|robots\.txt|manager(?:$|/)).+)$ http://localhost:8080/pidsvc/dispatcher?$1 [NC,B,QSA,P,L]
    </VirtualHost>

The RedirectMatch line is optional and may be omitted.

This configuration intercepts all incoming requests and tries to resolve them via the PID Service dispatcher unless they start with /pidsvc/ (PID Service Management Web Console) or /manager/ (Tomcat Web Application Manager). If you deploy the PID Service on a shared machine, you will need to amend the rewrite rule to add an exception for each other application in the same way, by adding |your_app_name(?:$|/) to the ^(/(?!pidsvc(?:$|/)|favicon\.ico|robots\.txt|manager(?:$|/)).+)$ regular expression right after |manager(?:$|/).
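Before editing the live Apache configuration, an amended pattern can be checked offline. The sketch below copies the regular expression from the rewrite rule above and evaluates it with Python's re module, whose handling of the lookahead and grouping constructs used here is assumed to be close enough to Apache's PCRE engine for this purpose. The helper names with_exception and is_dispatched, and the geoserver application name, are hypothetical.

```python
import re

# Pattern copied from the RewriteRule in the Apache configuration above.
BASE_PATTERN = r"^(/(?!pidsvc(?:$|/)|favicon\.ico|robots\.txt|manager(?:$|/)).+)$"


def with_exception(pattern: str, app_name: str) -> str:
    """Add an exclusion for app_name right after the |manager(?:$|/) alternative."""
    anchor = "|manager(?:$|/)"
    extra = "|" + re.escape(app_name) + "(?:$|/)"
    return pattern.replace(anchor, anchor + extra, 1)


def is_dispatched(pattern: str, path: str) -> bool:
    """True if the rewrite rule would pass the path to the PID Service dispatcher."""
    # re.IGNORECASE mirrors the [NC] flag of the RewriteRule.
    return re.match(pattern, path, re.IGNORECASE) is not None


if __name__ == "__main__":
    amended = with_exception(BASE_PATTERN, "geoserver")
    for path in ("/id/feature/123", "/pidsvc/controller", "/manager/html", "/geoserver/wms"):
        print(path, is_dispatched(BASE_PATTERN, path), is_dispatched(amended, path))
```

With the unmodified pattern a path such as /geoserver/wms is sent to the dispatcher; after adding the exception it is left alone, while /pidsvc/ and /manager/ paths are excluded in both cases.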

Security Considerations

The PID Service exposes a few endpoints. Some of them should be available to anonymous users (e.g. the dispatcher), while others should require user authentication to protect the service from unauthorised use. This can be achieved by various means, depending on the security principles used in the target infrastructure. Access to security-sensitive endpoints may also be restricted by the firewall.

The table below provides an explanation of each endpoint and security requirements that should be taken into consideration.

| Endpoint | Description | Security Requirements |
| /pidsvc/dispatcher | PID Service dispatcher endpoint | Harmless read-only interface used to resolve URIs. The interface is used internally and direct access may be prohibited. |
| /pidsvc/controller | PID Service Application Programming Interface (API) | Anonymous access must be prohibited. The API is used to manage the service programmatically via web service calls. Access to the controller interface must only be granted to applications from authorised sources, such as the PID Service Management Web Console and any other applications that may need to manage URI mappings programmatically. |
| /pidsvc/info | AJAX auxiliary interface | Harmless read-only interface used to provide access to URI mappings in the data store via AJAX in the PID Service Management Web Console. Direct access from other sources may be prohibited. |
| /pidsvc/* | PID Service Management Web Console: graphical web-based user interface for service management and monitoring | Anonymous access must be prohibited. Authorisation is required to gain access to the management console. It is recommended to only allow access from the intranet and prohibit access from the outside world (can be configured in the firewall). |
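When reviewing an access-control setup against the table above, it can help to express the endpoint requirements as a small lookup. The following sketch is one hypothetical way to do that; the policy labels "open" and "protected", the ENDPOINT_POLICY list and the policy_for function are illustrative names, and treating every path outside /pidsvc as "open" is an assumption based on those paths being identifiers handed to the dispatcher.

```python
# Requirements transcribed from the endpoint table.
# "open"      = harmless read-only interface (may optionally be restricted further)
# "protected" = anonymous access must be prohibited
ENDPOINT_POLICY = [
    ("/pidsvc/dispatcher", "open"),
    ("/pidsvc/controller", "protected"),
    ("/pidsvc/info", "open"),
    ("/pidsvc", "protected"),  # everything else under /pidsvc/*
]


def policy_for(path: str) -> str:
    """Return the policy label for a request path (without its query string)."""
    for prefix, policy in ENDPOINT_POLICY:
        # Match the endpoint itself or anything below it, but not look-alike
        # names such as /pidsvc/information.
        if path == prefix or path.startswith(prefix + "/"):
            return policy
    # Assumption: any other path is a persistent identifier to be resolved.
    return "open"
```

The more specific endpoints are listed before the /pidsvc catch-all so that the first match wins.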

Use of Proxy Action

Using the Proxy action in a URI mapping causes the server to initiate an HTTP request to the destination URL on the user's behalf, preserving the HTTP Accept headers from the original request. The service maintainer is responsible for ensuring that firewall rules allow outbound requests to be initiated from the server. If for whatever reason the use of the Proxy action must be prohibited, the service custodian may disable it by changing the following setting in the /WEB-INF/pidsvc.properties file from true to false:
allowProxyAction = true

When the Proxy action is disabled, any attempt to use it in a URI mapping will be superseded by the 302 Simple Redirection action.
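The effect of the allowProxyAction setting can be illustrated with a short sketch. It is not the service's own code: load_properties and effective_action are hypothetical helpers, the parser handles only simple key = value lines, and treating a missing setting as true is an assumption.

```python
def load_properties(text: str) -> dict:
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        properties[key.strip()] = value.strip()
    return properties


def effective_action(requested: str, properties: dict) -> str:
    """Return the action that would actually be applied for a mapping."""
    # Assumption: the Proxy action is allowed when the setting is absent.
    allowed = properties.get("allowProxyAction", "true").lower() == "true"
    if requested == "Proxy" and not allowed:
        # A disabled Proxy action is superseded by a simple redirect.
        return "302 Simple Redirection"
    return requested
```

With allowProxyAction = false, a mapping that requests the Proxy action ends up with 302 Simple Redirection, while every other action is left unchanged.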

LDAP Authentication

Below is an example of an LDAP authentication configuration that can secure the configuration UI and the API:
<Location /pidsvc/>
   # Authentication
   AuthName %LDAP_GROUP_NAME%
   AuthType Basic
   AuthBasicProvider ldap
   AuthLDAPURL %LDAP_URL%
   AuthLDAPGroupAttribute memberUid
   AuthLDAPGroupAttributeIsDN off

   require ldap-group %LDAP_GROUP%
</Location>

Deployment testing

Once the service is deployed, it is recommended to test it using a test mapping rule. To run the test, download the InitialTest.xml file and import it following the Import procedure.

Once the backup is restored you should see a new rule named QR code in the list of One-to-one (1:1) mappings in the left-hand menu. Click on it and make sure you see a QR code on the "Mapping Configuration" page. Click on the QR code and make sure the service redirects you to http://www.google.com.au/?q=QR+Code

Service Performance and Reliability

A service performance and reliability study was carried out jointly by CSIRO and Australian Bureau of Meteorology engineers using a test set of URI mappings. The test procedure and results are detailed here.

-- PavelGolodoniuc - 10 Oct 2012
Topic attachments

  • InitialTest.xml (0.7 K, 26 Mar 2015 - 18:35, PavelGolodoniuc): Deployment test mapping rule
  • license.txt (3.5 K, 30 Mar 2015 - 15:58, PavelGolodoniuc): License

Topic revision: r25 - 15 Mar 2016, PavelGolodoniuc
 

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).