
ANDS - making GeoNetwork work for ANDS (RIF-CS and other bits)

Introduction

NOTE: The following information is based on experience with the AuScope catalog. However, it is a good reflection of how to make data available to ANDS in general and is therefore broadly applicable.

AuScope is engaged with the Australian National Data Service (ANDS) to provide metadata of research data to Research Data Australia (RDA).

ANDS harvests metadata through the OAI-PMH interface. The AuScope Grid uses the open source Geonetwork as its metadata catalogue, which functions as an OAI-PMH data provider.

This document briefly describes how Geonetwork implements the OAI-PMH protocol and how the ANDS harvester accesses the Geonetwork records.

Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) provides an application-independent interoperability framework based on metadata harvesting (http://www.openarchives.org/OAI/openarchivesprotocol.html). An OAI-PMH data provider such as Geonetwork implements the six OAI-PMH requests (verbs):
  • GetRecord
  • Identify
  • ListIdentifiers
  • ListMetadataFormats
  • ListRecords
  • ListSets
To get a list of records for a given metadata format (e.g. rif), issue the following request: http://auscope-portal.arrc.csiro.au/geonetwork/srv/en/oaipmh?verb=ListRecords&metadataPrefix=rif

This request returns a list of metadata records in RIF-CS format. By default, a single response contains at most 10 records. To view the complete list, use the resumption token: copy the value of the resumptionToken element at the end of the result page, for example:

<resumptionToken expirationDate="2011-09-15T06:42:42">/-/rif/-//-//-/xq7zo54t5gn5/-/10</resumptionToken>

Then make another request, passing that value as the resumptionToken argument: http://auscope-portal.arrc.csiro.au/geonetwork/srv/en/oaipmh?verb=ListRecords&resumptionToken=/-/rif/-//-//-/xq7zo54t5gn5/-/10
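
The two requests above can be strung together in code. The following is a minimal Java sketch, not part of Geonetwork or the ANDS harvester. The class and method names are made up for illustration. The token is pulled out with a simple regular expression (a real harvester would more likely use an XML parser), and it is percent-encoded before being placed in the query string, which is a conservative assumption rather than something the example request above does.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper only; the names are hypothetical.
class OaiPagingSketch {
    // Matches a resumptionToken element that carries a value.
    private static final Pattern TOKEN =
        Pattern.compile("<resumptionToken[^>]*>([^<]+)</resumptionToken>");

    /** First page: ListRecords for one metadata format. */
    static String firstPageUrl(String baseUrl, String metadataPrefix) {
        return baseUrl + "?verb=ListRecords&metadataPrefix=" + encode(metadataPrefix);
    }

    /** Later pages: only the verb and the token are sent. */
    static String nextPageUrl(String baseUrl, String token) {
        return baseUrl + "?verb=ListRecords&resumptionToken=" + encode(token);
    }

    /** Returns the token from a response body, or empty when no further page is offered. */
    static Optional<String> resumptionToken(String responseXml) {
        Matcher m = TOKEN.matcher(responseXml);
        if (!m.find()) {
            return Optional.empty();
        }
        String token = m.group(1).trim();
        return token.isEmpty() ? Optional.empty() : Optional.of(token);
    }

    // URLEncoder.encode(String, Charset) requires Java 10 or later.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String base = "http://auscope-portal.arrc.csiro.au/geonetwork/srv/en/oaipmh";
        String page = "<resumptionToken expirationDate=\"2011-09-15T06:42:42\">"
            + "/-/rif/-//-//-/xq7zo54t5gn5/-/10</resumptionToken>";
        System.out.println(firstPageUrl(base, "rif"));
        resumptionToken(page).ifPresent(t -> System.out.println(nextPageUrl(base, t)));
    }
}
```

A harvesting loop would fetch firstPageUrl, then keep fetching nextPageUrl until resumptionToken returns empty.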

Registry Interchange Format – Collections and Services (RIF-CS)

ANDS collects and stores metadata in the RIF-CS format. (http://ands.org.au/resource/rif-cs.html)

Metadata in Geonetwork is stored in iso19139 format by default. Geonetwork implements a metadata converter from iso19139 to RIF-CS (geonetwork/xml/schemas/iso19139/convert/rif.xsl).

To enable the RIF-CS capability, apply the patches to the Geonetwork (GN) 2.6.3 source and build from source.

ANDS RIF-CS Harvester

ANDS collects metadata using the RIF-CS harvester. The harvester sends OAI-PMH requests to Geonetwork based on the information provided in the Data Source configuration, such as the URI and the Harvest Frequency. Refer to the attached "Data Source Administrator Role" document for detailed information.

Access to RDA and Sandbox
  1. Create an ARCS IDP account (https://idp.arcs.org.au/idp_reg/).
  2. Log into ANDS online services https://services.ands.org.au/home/ (Sandbox: https://services.ands.org.au/sandbox/ ) using the ARCS IDP credentials.
  3. Send the AAF token and data source details to ANDS (services@ands.org.au) to enable access. The AAF token is shown when logging in to online services via the AAF: in big letters in the middle of the first screen, and at the top right of every screen.
Sample data source configuration in ANDS sandbox:
datasource.png

An ANDS harvest is triggered by clicking the Import button. Before using "Import", use "Test" to check the schema validity of your records. If a Harvest Frequency is configured, an ANDS harvest will start automatically at the scheduled date and time.
test_import.png

The ANDS harvester is intended to harvest all records from a data feed. The harvesting request is in the following form: http://portal.auscope.org/geonetwork/srv/en/oaipmh?verb=ListRecords&metadataPrefix=rif&from=2000-01-20T13:00:15Z&until=2011-02-08T00:00:00Z.

The date range specified in the request is intended to cover all metadata records in the repository. The from parameter is taken from the earliestDatestamp in the OAI-PMH Identify response: http://portal.auscope.org/geonetwork/srv/en/oaipmh?verb=Identify

<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2011-02-11T08:36:59Z</responseDate>
  <request verb="Identify">http://portal.auscope.org:80/geonetwork/srv/en/oaipmh</request>
  <Identify>
    <repositoryName>PortalGeonetwork</repositoryName>
    <baseURL>http://portal.auscope.org:80/geonetwork/srv/en/oaipmh</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>cg-admin@csiro.au</adminEmail>
    <earliestDatestamp>2010-06-16T20:30:12Z</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
  </Identify>
</OAI-PMH>

The until parameter is the current date in UTC time zone.
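
As a rough illustration of how such a request is put together, the Java sketch below builds the harvest URL from an earliestDatestamp value and a current time. It is not the ANDS harvester's code: the class and method names are hypothetical, and the current time is passed in as a parameter so that the result is reproducible.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Illustrative helper only; the names are hypothetical.
class HarvestWindowSketch {
    // Same shape as the granularity advertised by Identify: YYYY-MM-DDThh:mm:ssZ.
    private static final DateTimeFormatter OAI_UTC =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    /** Formats an instant as a UTC datestamp with second precision. */
    static String formatUtc(Instant instant) {
        return OAI_UTC.format(instant.truncatedTo(ChronoUnit.SECONDS));
    }

    /** from = earliestDatestamp reported by Identify, until = the supplied "now" in UTC. */
    static String harvestUrl(String baseUrl, String metadataPrefix,
                             String earliestDatestamp, Instant now) {
        return baseUrl + "?verb=ListRecords&metadataPrefix=" + metadataPrefix
            + "&from=" + earliestDatestamp
            + "&until=" + formatUtc(now);
    }

    public static void main(String[] args) {
        String base = "http://portal.auscope.org/geonetwork/srv/en/oaipmh";
        System.out.println(harvestUrl(base, "rif", "2010-06-16T20:30:12Z", Instant.now()));
    }
}
```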

Notes
  • The earliestDatestamp in the above "Identify" response is taken from the earliest changeDate in the Geonetwork Metadata table, which is not necessarily the earliest changeDate of the actual records. This causes issues for harvested CSW records (GA GeoMet/GeoCat records). It is not an issue for manually entered metadata or harvested OGC WxS records.
  • Date ranges in OAI-PMH requests are in the UTC time zone, whereas Geonetwork stores metadata dates in the local time zone. As a result, an OAI-PMH date search can miss some records.
  • In release 2.6.x, Geonetwork supports two modes of date search: temporal extent date and modification date. Since not all records have a temporal extent, the OAI date search must be set to modification date through the Geonetwork System Configuration screen.
  • Therefore, for an ANDS harvest request, Geonetwork should simply return all records. This is done by removing the date range condition from the Geonetwork search (refer to the patches). This feature is tailored to the ANDS harvester only and is not intended to be a generic Geonetwork feature.
// An external OAI harvester (here ANDS) that wants every record sends a
// fromDate equal to the earliestDatestamp reported by Identify.
// In that case, reset both ends of the date range to their defaults
// so that all records are harvested.
String query = "SELECT min(changeDate) as mcd FROM Metadata";
List list = dbms.select(query).getChildren();
if (list.size() > 0) {
   Element rec = (Element) list.get(0);
   // Earliest changeDate in the Metadata table (same value Identify reports)
   String earliestDatestamp = rec.getChildText("mcd");
   if (fromTime.getText().contains(earliestDatestamp)) {
      fromTime.setText(defaultFromTime);
      toTime.setText(defaultToTime);
      return;
   }
}

modificationDate.png
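
The time zone mismatch described in the notes above can be made concrete with a small Java sketch. It assumes that a changeDate is stored as an ISO local date-time without an offset (for example 2011-02-08T10:00:00) and that the server's zone is known. The class name and the Australia/Perth zone used in the usage example are illustrative assumptions, not values taken from the Geonetwork configuration.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Illustrative helper only; the names are hypothetical.
class ChangeDateUtcSketch {
    private static final DateTimeFormatter OAI_UTC =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'");

    /** Interprets a stored local changeDate in the server zone and renders it in UTC. */
    static String toOaiUtc(String localChangeDate, ZoneId serverZone) {
        LocalDateTime local = LocalDateTime.parse(localChangeDate);
        return local.atZone(serverZone)
                    .withZoneSameInstant(ZoneOffset.UTC)
                    .format(OAI_UTC);
    }

    public static void main(String[] args) {
        // Assumed server zone, for illustration only.
        System.out.println(toOaiUtc("2011-02-08T10:00:00", ZoneId.of("Australia/Perth")));
    }
}
```

With a server eight hours ahead of UTC, a record changed at 10:00 local time on 8 February has a UTC datestamp of 02:00 on the same day. If the local value is compared directly against a UTC until bound, it can fall outside the requested range.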

Data quality

Metadata providers must comply with the ANDS RIF-CS data quality requirements, such as schema validity and content quality. For detailed information, please refer to http://ands.org.au/guides/content-providers-guide.html

A data source must pass the ANDS Sandbox data source quality check. ANDS has developed a “Data Source Quality Check” tool in the ANDS Online Services Sandbox which helps data providers improve their metadata quality. To access this tool, you need to contact services@ands.org.au or your ANDS contact to create a Data Source Account in the Sandbox.

A common data quality issue is an incomplete record contact, where contact details such as phone, email and address are missing.

Data quality check:
qualitycheck.png

Check for Red items and fix them. Enrich data content by addressing the Recommended items. AuScope has agreed to provide Activity records to ANDS (detailed in the following section).

Known issues (To-Do)

1. Activity mapping

Status: working solution in place

When providing RIF-CS records to ANDS, each record in Geonetwork needs an activity record describing how the data is collected, maintained, etc. ISO19115 carries no information about activities, although an activity is a mandatory element in collections. The current solution is to create activity records manually and link them to collections. The manually created activities are stored in geonetwork/xml/schemas/iso19139/convert/activities.xml.

1. Manually create activity records in geonetwork/xml/schemas/iso19139/convert/activities.xml.

2. In geonetwork/xml/schemas/iso19139/convert/rif.xsl, read in the activities.

3. For each collection, loop through "$activities" and add a relatedObject (activity) to the collection. An activity is linked to a collection by mapping the record URL.

4. Create the activity record itself (copy the activity from activities.xml).

Sample activities.xml and rif.xsl are provided in the patches.

Notes:

The drawbacks of this approach are:

1. For each record in Geonetwork, an activity needs to be created manually. This solution was adopted at a time when there were only a few WxS harvesting nodes and no manually created iso19139 records in Geonetwork. It is inefficient when there is a large number of manually entered iso19139 records.

2. In a production environment, every time activities.xml changes, Tomcat needs a restart for the change to take effect.

2. Persistent Identifier for harvested records.

Status: Done

Most AuScope records are harvested from WxS services. Apparently, in Geonetwork "every time the OGC harvester runs, it will remove previously harvested information and create new ones. GeoNetwork will generate the id for all metadata (both service and datasets)." (http://geonetwork-opensource.org/manuals/2.6.3/users/admin/harvesting/index.html#ogc-service-harvesting-type)

At the moment, the OGC harvest frequency in AuScope Geonetwork is every 1 hour 30 minutes, which means the harvested records are recreated every 1.5 hours with new UUIDs. ANDS harvests AuScope records on a monthly basis and regards the recreated records as new records, which results in duplicate records in the ANDS collection repository. The issue can be reproduced in the ANDS Sandbox. A workaround is to remove all AuScope records from RDA before a new harvest is initiated. (The drawback is that the record links in RDA change with every harvest.)

A long-term solution is to have Geonetwork assign persistent identifiers to harvested records (accepted but no progress yet): http://trac.osgeo.org/geonetwork/wiki/Bolsena2010 (item 30).

https://twiki.auscope.org/wiki/bin/view/Grid/GeoNetworkHarvestingInvestigation (Metadata Identifiers).

Craig Jones suggested a good approach for fixing Geonetwork:

"Perhaps we could use the GetCapabilities url and append the layer name/coverage name/feature type name as a way of indicating which section of the GetCapabilities statement the metadata was harvested from (e.g add #feature-type-name to the getCapabilities url when hashing)."

A patch has been created to fix the issue (Ogcharvester.patch: OGC harvester patch addressing the non-persistent UUID issue).

The patch will be included in a future Geonetwork release.
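
The hashing idea quoted above can be sketched in a few lines of Java. This is an illustration of the approach only and makes no claim about how Ogcharvester.patch is implemented. The class name, the example URL and the layer name are hypothetical, and the use of a name-based (MD5, version 3) UUID is an assumption chosen because the JDK provides it directly.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Illustrative helper only; the names are hypothetical.
class StableUuidSketch {
    /**
     * Derives a repeatable UUID from the GetCapabilities URL plus the
     * layer, coverage or feature type name, joined with '#'.
     */
    static String stableUuid(String getCapabilitiesUrl, String layerName) {
        String key = getCapabilitiesUrl + "#" + layerName;
        // nameUUIDFromBytes always yields the same UUID for the same input bytes.
        return UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8)).toString();
    }

    public static void main(String[] args) {
        // Hypothetical service URL and feature type name.
        String url = "http://example.org/wfs?service=WFS&request=GetCapabilities";
        System.out.println(stableUuid(url, "gsml:Borehole"));
    }
}
```

Because the identifier depends only on the service URL and the layer name, re-running the harvester would produce the same UUID for the same layer, so a downstream harvester such as ANDS could recognise the record as an update instead of a new record.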

3. Providing ANZSRC code (subject category code from ABS) to ANDS.

Status: Open

At the moment, research subject categories are populated with the Geonetwork local vocabulary and user-specified subjects, such as:

<subject type="local">geoscientificInformation</subject>

ANDS requires all research data to have an ANZSRC code, such as:

<subject type="anzsrc-for">04</subject>

For now, the ANZSRC-FOR code 04 Earth Sciences is used for all AuScope records.

<subject type="anzsrc-for">04</subject>

In the long term, subject sub-category codes should be provided, such as Geology (0403), Geophysics (0404), etc.

4. RIF-CS v1.2.0 support

Status: Done

ANDS will harvest RIF-CS v1.0.1 records until 30 June 2011. From 1 July 2011, all RIF-CS records must comply with RIF-CS v1.2.0 schema. The AuScope Geonetwork RIF-CS converter has made the mandatory changes as per http://www.ands.org.au/resource/rifcsnov2010.html

Highlights:
   <xsl:element name="physical">
      <xsl:element name="addressPart">
         <xsl:attribute name="type">
            <xsl:text>faxNumber</xsl:text>
         </xsl:attribute>
         <xsl:value-of select="translate(translate(.,'+',''),' ','-')"/>
      </xsl:element>
   </xsl:element>
   <xsl:element name="physical">
      <xsl:element name="addressPart">
         <xsl:attribute name="type">
            <xsl:text>telephoneNumber</xsl:text>
         </xsl:attribute>
         <xsl:value-of select="translate(translate(.,'+',''),' ','-')"/>
      </xsl:element>
   </xsl:element>
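
For readers less familiar with XSLT, the nested translate() calls above remove every "+" and then turn each space into a hyphen. A rough Java equivalent is sketched below. The class name and the sample number are made up for illustration.

```java
// Illustrative helper only; the names are hypothetical.
class PhoneFormatSketch {
    /** Mirrors translate(translate(., '+', ''), ' ', '-'): drop '+', then spaces become '-'. */
    static String normalise(String number) {
        return number.replace("+", "").replace(' ', '-');
    }

    public static void main(String[] args) {
        // Made-up number; prints 61-2-5550-0100
        System.out.println(normalise("+61 2 5550 0100"));
    }
}
```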

Geoscience Australia Geomet/Geocat catalog

* GeometGeocat - Info on the Geoscience Australia Geomet/Geocat catalog

-- XiangtanLin - 27 Apr 2011
 
Topic attachments
  • Data_Source_Administrator_role_resp.pdf (103.4 K, 23 Feb 2011 - 07:04, XiangtanLin): Data Source Administrator role
  • datasource.png (120.5 K, 15 Jun 2011 - 14:21, XiangtanLin): data source configuration in ANDS sandbox
  • modificationDate.png (116.7 K, 16 Jun 2011 - 07:24, XiangtanLin): Change OAI provider Datesearch type
  • qualitycheck.png (153.7 K, 15 Jun 2011 - 14:45, XiangtanLin): Data quality check
  • test_import.png (135.3 K, 15 Jun 2011 - 14:39, XiangtanLin): test and import metadata
Topic revision: r21 - 06 Aug 2012, RiniAngreani
 

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).