
SEEgrid Information Services Roadmap



Summary

The development of this Roadmap for Implementing Interoperability of Geospatial Data for the Minerals Industry is supported financially and in-kind by the AusIndustry Innovation Access Program, Minerals Council of Australia, Government Geologists Information Advisory Council (GGIPAC), Geoscience Australia, Social Change Online, CSIRO Exploration and Mining, the CRC for Predictive Mineral Discovery and members of the minerals industry.

The Australian Government’s Innovation Access Program – Industry provides grants for industry-led projects as part of its support for innovation, investment and international competitiveness of Australian businesses. The program is offered through AusIndustry and promotes the take-up of technologies and best practice processes to enhance the competitiveness of Australian companies. This can include grants for international specialist visits to Australia, technology access workshops, and demonstration and awareness projects. Private sector companies, industry groups, universities, scientific agencies and training institutions are all eligible for funding.

The Roadmap exercise is being conducted in three parts, with a target completion date of March 2005. The provisional Roadmap was written during Part A, and will be followed by a "reference implementation" demonstration supported by GGIPAC and Geoscience Australia in Part B. This provisional Roadmap will be reviewed in light of the demonstration project and an open workshop will be held in March 2005. This workshop follows on from the inaugural Solid Earth and Environment Grid (SEEGrid) workshop held in August 2003.

This provisional Roadmap is a "living document" and as such its authors welcome comments and suggestions to improve its content, style and implementation into the community of practice. It was written by a consortium consisting of:
  1. Dr. Robert Woodcock CSIRO Exploration and Mining
  2. Dr. Simon Cox CSIRO Exploration and Mining
  3. Mr. Rob Atkinson Social Change Online
  4. Mr. Stuart Girvan Geoscience Australia
  5. Mr. Tim Mackey Geoscience Australia
  6. Dr. Lesley Wyborn Geoscience Australia

Goals

  1. SEEGrid will provide an infrastructure that enables multi-stakeholder collaborations for research, operational and outreach activities with particular emphasis on government, industry and academic sectors.
  2. SEEGrid will seek to leverage and empower existing and future activities through enabling interoperability within the sector, providing an overarching framework for interoperability, as opposed to prescribing technical solutions for individual projects.
  3. SEEGrid will not be based on any single technology platform, but will allow open source and proprietary solutions to co-exist.
  4. The SEEGrid Information Services Roadmap will provide a design to systematically evolve an effective community through emerging activities within the sector (as opposed to a single technology deployment exercise).
  5. The SEEGrid infrastructure will encompass the technical standards, policy and governance framework needed to allow semantic interoperability to be established within the community of practice.
  6. There may be either technical or business reasons why potential stakeholders will be unable to participate in initial stages, so SEEGrid must establish initial capabilities that demonstrate a long term value proposition and provide opportunities for engagement.
  7. SEEGrid will provide the bridge between data management, research, high performance computing and business needs. In particular, it will provide the means for technical collaboration between these functions.
  8. SEEGrid will be implemented within the context of the Australian Spatial Data Infrastructure. This means that it will:
    • conform to relevant government policies
    • be able to seamlessly access services provided by external agencies
    • become the primary mechanism for delivery of geosciences and environmental data and modelling capabilities to external agencies
    • be a focal point for resourcing the resolution of technical and institutional issues hindering interoperability within and across the SEE domain

Summary of requirements for interoperable Web Feature Services

This section highlights the critical issues underlying interoperability of data access services.

Information standards

  • Community data models
    • GML Application schema(s) for feature types of interest
    • code-lists/authority tables
    • feature-type index/classification (ontology?)

Software

  • server-side WFS software
    • connecting to existing DB and GIS systems
    • supporting a mapping to arbitrary (community) information model/schema
  • client (desktop, browser and/or server application) WFS software (to explore design, exercise and demonstrate information services)
    • basic semi-interactive portrayal service (i.e. drawing maps)
    • query-building interface
  • access to shared infrastructure services
    • service catalog - somewhere to advertise service access point to a user community
    • feature type catalog - a means of deploying the community schema as reference point so that the delivered data can be described

Softer-ware (process and governance)

  • conformance testing
    • WFS software
    • GML application schemas
    • service instances
  • design
    • clear "use cases" for the system
    • modelling of what information is required to support use cases
  • policy
    • access control strategy
    • accounting strategy
    • versioning strategy


Document structure

This document is organised according to the Reference Model for Open Distributed Processing (RM-ODP) analysis viewpoints. RM-ODP offers an approach that supports coordination in the development of complex software systems implemented in a distributed context.

The RM-ODP framework provides five generic and complementary viewpoints on the system and its environment:

  • The enterprise viewpoint, which focuses on the purpose, scope and policies for the system. It describes the business requirements and how to meet them.

  • The information viewpoint, which focuses on the semantics of the information and the information processing performed. It describes the information managed by the system and the structure and content type of the supporting data.

  • The computational viewpoint, which enables distribution through functional decomposition of the system into objects which interact at interfaces. It describes the functionality provided by the system and its functional decomposition.

  • The engineering viewpoint, which focuses on the mechanisms and functions required to support distributed interactions between objects in the system. It describes the distribution of processing performed by the system to manage the information and provide the functionality.

  • The technology viewpoint, which focuses on the choice of technology of the system. It describes the technologies chosen to provide the processing, functionality and presentation of information.

A viewpoint is a subdivision of the specification of a complete system, established to bring together those particular pieces of information relevant to some particular area of concern during the design of the system. Although separately specified, the viewpoints are not completely independent; key items in each are identified as related to items in the other viewpoints. However, the viewpoints are sufficiently independent to simplify reasoning about the complete specification. The mutual consistency among the viewpoints is ensured by the architecture defined by RM-ODP, and the use of a common object model provides the glue that binds them all together.

(from Vallecillo, RM-ODP: The ISO Reference Model for Open Distributed Processing)


Enterprise viewpoint

The enterprise viewpoint focusses on the institutional issues that need to be addressed in order for an organisation to participate in interoperability. Important issues that are addressed include:
  • the costs involved in participating (or not). This includes a cost-benefit analysis, comparing the cost of servicing individual requests by traditional means, as opposed to adopting this roadmap;
  • why and how to adopt technical standards;
  • creating a corporate attitude that technical standards enable the business, rather than constrain it;
  • developing a governance framework to define responsibility and control mechanisms.

See EnterpriseViewpoint.

Information viewpoint

The information viewpoint is concerned with establishing the semantic context of system components. This is specific to the application domain and the computational problems that it addresses.

For geographic information, the contemporary meta-model, advocated by ISO/TC 211 and Open GIS Consortium, focusses on geographic features, and the process focusses on information communities.

Establishing the information model for the community is primarily a matter of developing a catalogue of feature types of interest to the community.

See InformationViewpoint

Computational Viewpoint

This viewpoint captures the details of components and interfaces from a functional point of view, without regard to distribution.

SEEGrid will be established by linking business functions through network infrastructure. Each business function will be encapsulated through a service that provides a consistent and predictable means of accessing that function. This "Service Oriented Architecture" (SOA) approach distinguishes SEEGrid from distributed systems that use opaque proprietary protocols and/or persistent connections between stateful software components.

See ComputationalViewpoint

Engineering Viewpoint

The Engineering Viewpoint introduces practical issues about what components are deployed where. SEEGrid responds to the many challenges inherent in optimising deployment configurations and information transfers by enforcing sufficient semantic consistency that, in general, any process can be deployed anywhere on the network without having to redesign the information models.

See EngineeringViewpoint

Technology Viewpoint

Overview needed ...

See TechnologyViewpoint


SEEgrid Roadmap - Enterprise Viewpoint



Review of Business Drivers

This section reviews some of the core business drivers that impact on the architecture of a SEEGrid.

Economic factors

GRID computing and SDI share a common rationale - more opportunity and less cost. The key to this is the "commodification of access arrangements" - i.e. the establishment of a regime (aka a DataGrid) where access arrangements to resources (data, computing, network) are agreed beforehand, so that workflows can be established by users (or programmers) without the typical overhead of specialist skills and mandates such as accountants, business development, lawyers, policy development, etc.

A comparison of the "*Total Cost of Business Outcome*" [I made that up but we do need a good buzzword to sheet the concept home] between the SEEGrid concept and typical current practices can be made by fully accounting for all the activities required to deliver an information product, with and without a DataGrid. TCBO estimate example to be developed by TimMackey + PaulTreloar ??

The purpose of including this comparison within the SEEGrid Roadmap is to highlight the nature of real world interoperability considerations and ensure that all aspects are addressed within the scope of the governance framework and architecture.

External Influences

In addition to technology and standards bodies, and the direct influences of relevant policy frameworks, the SEEGrid framework must provide for effective collaboration with other significant drivers within its subject domain and technical environment.

GEON (US) - "A cyberinfrastructure for the geosciences"

The GEON project addresses many of the same issues within a broadly equivalent sector within the US. Much can be gained from closely monitoring its organisational and technical evolution. However, the approach of the GEON project may not be directly transferable to the SEEGrid context, because the institutional environment and the level of resources available are both different. The table below compares the GEON approach with SEEGrid requirements. Of particular interest is the effort to establish common ontologies within the GEON community.

GEON approach | Commonalities with SEEGrid | Differences with SEEGrid
Development of common ontologies through stakeholder | Deployment of shared ontologies as an interoperability principle; GEON common ontologies as useful resources | Resources required out of scope; SEEGrid domain is broader; stakeholders will be engaged over time
Single vendor GIS technology base (ESRI as industry sponsor) | GIS is a relevant technology; ESRI products in use by many stakeholders | Traditional GIS not most important technology focus; no resources or mandate to enforce a single vendor deployment
NSF funding | Analogous to the AEON community (see below), seen as SEEGrid stakeholders | GEON is research oriented (NSF), but with funds to develop infrastructure

NERC Data Grid (UK)

The NERC Data Grid (NDG) is a UK e-science project to establish a coherent access capability across the data repositories managed by NERC. This activity is particularly significant because:

  • It has a strong focus on custodial arrangements for data
  • It provides an exemplar for the bridge between government and research communities
  • It has undertaken significant research into the practical issues of managing stakeholder access to distributed data sources
  • It has identified that information community semantics and interoperability via Web Services is the next key development, and that a collaboration with SEEGrid is a fruitful way to undertake this exercise

This appears to be nicely complementary to the current project.

AEON - Australian Earth and Oceans Network

AEON is an emerging network of researchers in the "Earth and Ocean" domain in Australia, in the process of applying to the ARC for funding to be established as an ARC Network. The AEON community is comparable with GEON, but it will be resourced only to provide a coordination function. AEON will look to independently funded external activities (such as SEEGrid) for the establishment of the data access and computational infrastructures.

Close collaboration exists between the AEON and SEEGrid teams. The two activities are naturally complementary, potentially providing technical interoperability brokering and communication channel functions between two large, diverse stakeholder communities.

ASDI - Australian Spatial Data Infrastructure

The ASDI is not an operational program at this stage, nor has it yet specified, adopted or identified critical interoperability standards for transferred information and service metadata. Rather, SEEGrid is seen as a driver for the ASDI, which will be created through such initiatives adopting compatible frameworks. At this stage the ASDI has only deployed the Australian Spatial Data Directory - a Z39.50 based catalog of metadata about data sets. This will be useful as a classification vocabulary in the medium term. SEEGrid should provide insight into the future developments and requirements for such catalogs within a service oriented architecture.

There are expected to be a number of key activities that will contribute deployed capabilities or reusable standards to the emerging ASDI, with cross-jurisdictional activities being by far the most important. To date these include:
  • National Land and Water Resource Audit
  • ICSM common data model
  • National Oceans Office Portal
  • ASBIA Interoperability Demonstrator Pilot

SEEGrid will need to resource ongoing liaison with these activities to ensure cost-effective implementation.

Technical Standards

This section explores the fundamentals of why and how to adopt technical standards and which standards to adopt. Technical standards will allow the SEEGrid vision to be decomposed into specific components that can be safely developed, deployed and updated independently by multiple stakeholders.

A brief outline of relevant technical standards is provided in StandardsFramework.

Layered standards

A key concept in this Roadmap is that standards are layered. They build on each other, and each standard has a particular role to play. We must also distinguish between the marketing view of standards, which tends to focus on the potential benefit (or buzzword conformance) within a few of these roles, and a robust architecture, which must fully explore how the relevant standards fit together. We assume that certain layers can be taken for granted, while others - primarily the more local or domain-specific elements - need to have specified policies, technical development or deployment strategies applied.

For example, there is a plethora of technical standards underpinning the humble electrical power socket, and a small number of basic patterns for these. Nevertheless, when designing an electrical appliance, it is only necessary to conform to the well-known interface.

When developing web-resources it is not usually necessary to know the specifications for TCP/IP, DNS, or even much about HTTP. But at this early stage in the development of the Web Services paradigm, some understanding of lower-level or generic standards is required when designing a geoscience data grid.

For example, WSDL - Web Services Description Language - is a means of describing the syntax of an interface with a web service. Interoperability, however, also requires that the nature of the service and the data to be transferred have a common meaning to both the provider and the client. Thus OpenGIS Web Services may be described in WSDL, but so also may undocumented proprietary processing functions.

Multiple interfaces

Furthermore, a single business function may have multiple interfaces, each conforming to a different technology platform. Thus a single map rendering service can support a proprietary API and the OpenGIS Web Map Server interface. Of perhaps more direct relevance is that a single database can support multiple representations (Feature Types) through WFS interfaces, equivalent coverages through WCS interfaces, catalog functions through Z39.50, directory functions through LDAP and an ontology view through OWL (an XML language). All are standards, yet all have specific advantages or functions.
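
The idea of one business function behind several interfaces can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names (SampleStore, FeatureInterface, CoverageInterface), the record fields and the sample values are all hypothetical, and the two views merely mimic the spirit of feature and coverage access rather than implementing the WFS or WCS protocols.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Sample:
    """One stored record: a gravity reading at a location (illustrative fields)."""
    station_id: str
    x: float
    y: float
    gravity_mgal: float


class SampleStore:
    """Stands in for the single underlying database."""

    def __init__(self, samples: List[Sample]) -> None:
        self._samples = list(samples)

    def all(self) -> List[Sample]:
        return list(self._samples)


class FeatureInterface:
    """Feature-style view: each record is returned as a typed feature."""

    def __init__(self, store: SampleStore) -> None:
        self._store = store

    def get_features(self) -> List[Dict[str, object]]:
        return [
            {"type": "GravityStation", "id": s.station_id,
             "position": (s.x, s.y), "gravity_mgal": s.gravity_mgal}
            for s in self._store.all()
        ]


class CoverageInterface:
    """Coverage-style view: the same records as a position -> value mapping."""

    def __init__(self, store: SampleStore) -> None:
        self._store = store

    def get_coverage(self) -> Dict[Tuple[float, float], float]:
        return {(s.x, s.y): s.gravity_mgal for s in self._store.all()}


# Hypothetical content; both interfaces read from the same store.
store = SampleStore([
    Sample("GS-1", 120.0, -30.0, 979650.2),
    Sample("GS-2", 121.0, -30.5, 979655.8),
])
features = FeatureInterface(store).get_features()
coverage = CoverageInterface(store).get_coverage()
```

Neither view copies or restructures the stored data; each adapter only decides how the same records are presented to its clients.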

Criteria for relevance

A requirement of this Roadmap is to articulate the broad decomposition of the infrastructure into components, where the abstract interfaces define the roles of standards in delivering interoperability. From this perspective it will be possible to identify an initial set of specific standards that meet broad criteria for adoption.

The Engineering Viewpoint within this Roadmap will provide specific guidance as to the relevant standards layers and a baseline set for initial implementation phases.

Stakeholder relationship/system diagram (to be attached)

Deployment strategy

The immediate challenge facing SEEGrid is the establishment of deployed services that can act as a nucleus for future projects to enhance the capabilities available. Of particular importance are the data access services that establish common information models. From this base it becomes reasonable to expect the evolution of visualisation services and business applications (basic SDI issues), and also the capability to start to evolve additional data through the ability to promulgate modelling services and disseminate modelled results.

Thus, each project within the emerging SEEGrid infrastructure must provide for a legacy of discoverable information products.

Semantic resources

There will be a diverse set of initial capabilities, information models and business applications. It is not expected that all activities must fit into a common semantic framework, but it is necessary that each deployed component provides for the publication of relevant metadata in a machine readable form - including:

  • data structures
  • vocabularies (or more formal ontologies) used
  • service locations

This will allow the development of either "wrappers" or "crosswalks" between different components as the need arises, without special agreements or expensive re-engineering. The components involved may provide complementary service functions, or may even be from different application domains. Nevertheless, use of existing semantic components (e.g. catalog and data standards, both de facto and de jure) is to be encouraged, and must be promoted as a means to reduce the cost and effort of designing and building new components.
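
A minimal sketch of how published metadata could support a crosswalk is given below. Everything in it is an assumption made for illustration: the descriptor layout, the field names, the example.org URLs and the helper functions are hypothetical and do not correspond to any registered SEEGrid schema.

```python
import json
from typing import Dict

# Hypothetical machine-readable descriptors published by two components.
descriptor_a = {
    "service_location": "http://example.org/service-a",  # placeholder URL
    "data_structure": {"BoreholeName": "string", "TotalDepth": "metres"},
    "vocabulary": {"lithology": ["granite", "basalt"]},
}
descriptor_b = {
    "service_location": "http://example.org/service-b",  # placeholder URL
    "data_structure": {"hole_id": "string", "depth_m": "metres"},
    "vocabulary": {"rock_type": ["GRANITE", "BASALT"]},
}

# A crosswalk maps field names used by component B to those used by A.
crosswalk_b_to_a: Dict[str, str] = {
    "hole_id": "BoreholeName",
    "depth_m": "TotalDepth",
}


def crosswalk_is_valid(crosswalk: Dict[str, str], source: dict, target: dict) -> bool:
    """Check that every mapped field is declared in the published descriptors."""
    return all(
        src in source["data_structure"] and dst in target["data_structure"]
        for src, dst in crosswalk.items()
    )


def apply_crosswalk(record: Dict[str, object], crosswalk: Dict[str, str]) -> Dict[str, object]:
    """Rename the keys of a record; a field missing from the crosswalk raises KeyError."""
    return {crosswalk[key]: value for key, value in record.items()}


record_from_b = {"hole_id": "BH-042", "depth_m": 312.5}
record_as_a = apply_crosswalk(record_from_b, crosswalk_b_to_a)
print(json.dumps(record_as_a))
```

Because both descriptors are available as data, the crosswalk can be validated mechanically before it is used, which is the point of requiring publication in machine readable form.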

Technology deployment

Technology deployment strategy should prioritise the approaches required to deploy persistent data access services against:

  • key existing data sets
  • key existing data management technologies
  • open source options

The crunch

The main dilemma to be addressed is: An organisation with installed technology X is willing to serve its data, but the technology can only support an automatically generated view, published according to schemas that directly reflect the storage schema, and not the community information model. Options available are:

  1. support the registration of such services - thus requiring clients to undertake programming to interoperate
  2. resource the construction and deployment of an appropriate "wrapper", either installed locally around each service, or as a mediator service
  3. harvest the data in its entirety, using the proprietary schema, into a conformant repository (i.e. a "forward cache")
  4. develop supported profiles of community schemas that can be implemented through configuration of internal storage schemas

None of these is ideal.
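
Option 2 can be illustrated with a small Python sketch of a wrapper that translates records shaped by a storage schema into a community information model. The column names (BH_NM, TD_FT, LITH_CD), the code table and the target property names are hypothetical, chosen only to show the kind of renaming, unit conversion and vocabulary mapping such a wrapper would perform.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def legacy_service() -> List[Row]:
    """Stands in for a service that exposes its storage schema directly."""
    return [
        {"BH_NM": "BH-042", "TD_FT": 1000.0, "LITH_CD": "GRN"},
        {"BH_NM": "BH-043", "TD_FT": 250.0, "LITH_CD": "BSL"},
    ]


# Hypothetical authority table translating local codes to community terms.
LITHOLOGY_CODES = {"GRN": "granite", "BSL": "basalt"}

FEET_TO_METRES = 0.3048


def to_community_model(row: Row) -> Row:
    """Map one storage-schema row onto the (assumed) community model."""
    return {
        "featureType": "Borehole",
        "name": row["BH_NM"],
        "totalDepth_m": round(float(row["TD_FT"]) * FEET_TO_METRES, 2),
        "lithology": LITHOLOGY_CODES[str(row["LITH_CD"])],
    }


def wrapped_service(source: Callable[[], Iterable[Row]] = legacy_service) -> List[Row]:
    """The wrapper: clients call this and never see the storage schema."""
    return [to_community_model(row) for row in source()]


boreholes = wrapped_service()
```

The same mapping function could run locally around each service or inside a shared mediator; the sketch does not distinguish between these two deployments.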

Consideration must be given first to the testing of technology options against support for real community schemas, and the resourcing of technology options that are acceptable within the most common or strategic business contexts. This could be accomplished by specifying conformance to the community schema within procurement activities, so that vendors become motivated to establish suitable capabilities.

It is recommended that a register be kept of successful implementations of community schema with available technologies, allowing download of configuration files to assist new data suppliers to quickly meet a common standard. This requirement has minor implications for the system architecture, since such a register must be seen as part of the community schema-management process.

Governance Framework

Each component within SEEGrid must be available for discovery and use by some set of the SEEGrid stakeholders. This implies a lot more than mere technical connectivity and protocol conformance. The governance framework for establishing common registries (including at least data models, vocabularies, service registries and authorisation profiles) defines the information community ("SEEGrid"), acts as the reference point for common semantics, and provides for service accessibility (access, authorisation and accounting).

A goal of SEEGrid is to evolve the most effective and least-cost governance framework that allows sub-communities to converge on interoperable patterns. The governance model must also be capable of scaling to include variable access privileges and accounting.

The SEEGrid governance framework is intended to be established in the following initial phases:

Phase | Governance requirements | Responsibilities
1 - Demonstration | Common information model and a sample processing chain; vocabularies of data services published as network resources by data providers; registries managed by project team for duration of project | SEEGrid Roadmap project team (Geoscience Australia, PMD*CRC, CSIRO, Social Change Online)
2 - Initial Capability | Registries established under a Service Level Agreement; registries managed according to ISO 19135 principles; Feature Type Catalogs for all supported domain models; components must be conformant to a registered domain model OR publish a complete new domain model | TBD - Would be good to be able to flesh this out though
3 - DataGrid | Data repositories established; Virtual Data repository management (ability to store workflow outputs and preconfigured workflows); Access, Authentication, Accounting framework established; formal interoperability arrangements with international DataGrids | TBD - GRID Computing infrastructure services

Future phases would be based on specific requirements from emerging business applications, SDI and GRID infrastructures, and academic network capabilities. At some stage the close integration of modelling

Conformance

I made this up - seems like common sense but there may be a better blueprint or we might not want to include it all?

SEEGrid is likely to require a base set of standards to be mandated to ensure that there is a consistent implementation of community semantics across various components. Naturally, however, the SEEGrid framework will be extended to include new data sources, new processing capabilities and new technology opportunities. It is proposed that a set of conformance profiles be established under a versioning scheme so that:

  1. Under Version X, new service interfaces and data models may be added that do not require redefinition of current standards. These may be mandatory (for a given service) if they extend without replacing existing component standards. New interfaces may be added as optional without causing a version migration.
  2. Under Version X+1, some service interfaces and data models from Version X may be deprecated and new ones introduced as mandatory.
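
One possible reading of this versioning scheme is sketched below in Python. The profile structure, the interface names and the migration rule (a new version is needed only when something previously mandatory stops being mandatory) are assumptions made for illustration, not an agreed SEEGrid design.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class ConformanceProfile:
    """A versioned set of interface and data model identifiers (hypothetical)."""
    version: int
    mandatory: FrozenSet[str]
    optional: FrozenSet[str] = frozenset()
    deprecated: FrozenSet[str] = frozenset()


def conforms(implemented: Set[str], profile: ConformanceProfile) -> bool:
    """A component conforms if it implements every mandatory item."""
    return profile.mandatory <= implemented


def needs_version_migration(old: ConformanceProfile, new: ConformanceProfile) -> bool:
    """Assumed rule: additions are allowed within a version, removals are not."""
    return bool(old.mandatory - new.mandatory)


base = frozenset({"wfs-basic", "borehole-model-1"})

v1 = ConformanceProfile(version=1, mandatory=base)
# Same version: an optional interface is added, nothing is replaced.
v1_extended = ConformanceProfile(
    version=1, mandatory=base, optional=frozenset({"wcs-basic"})
)
# Next version: one data model is deprecated and a new one becomes mandatory.
v2 = ConformanceProfile(
    version=2,
    mandatory=frozenset({"wfs-basic", "borehole-model-2"}),
    deprecated=frozenset({"borehole-model-1"}),
)

component = {"wfs-basic", "borehole-model-1"}
```

Under these assumptions the component above conforms to both Version 1 profiles without change, but must be updated before it can claim conformance to Version 2.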

Conformance testing facilities for common services are encompassed in the notional architecture. These are a key support mechanism for both the community and the agencies managing SEEGrid infrastructure services.



SEEgrid Roadmap - Information Viewpoint



Introduction

An information model underlies every data access and processing system. The model defines what object types may be operated on, and constrains what operations are available. In the ideal case the model captures the essence of the technical language of the users.

Depending on how wide or narrow the user base is, the information model may be more or less specialised. In turn this constrains the level of specialisation of processing applications within the system. In general, information with richer semantics provides a basis for richer processing with less user intervention.

Distributed processing requires that the information is transferred between system components. The semantics of the information transferred constrains the use that can be made of it by other components. This is particularly important when using processing chains to accomplish complex operations. The type of information relayed by each link must make sense in context.

The discussion is kept mostly at a conceptual level, but for illustrative purposes we present some implementation-level examples, primarily using XML and XML Schema Language.

Data models and semantics

In a computational system involving information transfer, the information model is realised in file or stream formats. In many cases the description of the file format is the only formal documentation, so it may be necessary to analyse this as a proxy for the data model.

Some formats are targeted at generic processing applications, and only explicitly capture low-level semantics. For example: HTML documents are for rendering in a web browser; spreadsheets are for loading in an application that manipulates data stored in table cells. These are "generic", in the sense that the organisation of the information is independent of the application domain. Higher-level semantics may be given in metadata or may be implied by layout conventions - e.g. column headings in tables – but this information does not directly affect processing.

Richer data models are based on the conceptual significance of the data, not just its structure. For example, to support decision making in their domain, geoscientists usually talk about "faults", "plutons", "boreholes" and "measurements", not "points", "lines" and "polygons", and certainly not “tables”, “tuples”, “lists” or “pages” etc. The latter are geometry-centric and representation-centric abstractions, which are necessary at an implementation level, but are not used when information is being conveyed between practitioners.

Similarly, when information is being transferred between components of a distributed processing system, an effective encoding will capture its meaning, and not just its geometric abstraction. The aim is for the digital representation to reflect the language used by practitioners in the application domain.

In the context of web-services, the exchange format is usually based on XML, in an application language described using W3C XML Schema Language. XML documents provide direct serialisation of tree-based data structures. However, the tree may reflect models at various conceptual levels, from page-layout, through tables, to fully structured conceptual data models.

Information models for geographic information systems

Geometry-centric models

Most conventional GIS require the user to work with a geometry-centric data model (points, lines, polygons) loosely coupled to attribute tables. Useful maps can be produced from these using colour, symbolisation and overlay. The technology is mature and broadly available.

But these data models give only limited guidance as to the meaning of the data, since the same structures are shared by data types that are semantically distinct. For example, it is necessary to explicitly inform an application that the layer called "ROADS" is a valid basis for routing analysis, while "FENCES" isn't (unless they are electric fences), even though they are both formatted as sets of curves. More sophisticated processing typically starts by requiring a human to interpret layer or column names.
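
The ROADS and FENCES example can be made concrete with a short Python sketch. The layer contents, the function names and the ROUTABLE_LAYERS set are hypothetical; the point is only that the structure of the two layers is identical, so the knowledge about routing has to be supplied separately by a person.

```python
from typing import Dict, List, Set, Tuple

Curve = List[Tuple[float, float]]

# Geometry-centric storage: each layer is just a named set of curves.
layers: Dict[str, List[Curve]] = {
    "ROADS": [[(0.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (1.0, 1.0)]],
    "FENCES": [[(0.0, 1.0), (1.0, 1.0)]],
}


def structural_type(layer: List[Curve]) -> str:
    """Describe a layer by its structure alone, ignoring its name."""
    return "set of curves" if all(len(curve) >= 2 for curve in layer) else "other"


# Side information that a person must supply after reading the layer names.
ROUTABLE_LAYERS: Set[str] = {"ROADS"}


def can_route_on(layer_name: str) -> bool:
    """Routing suitability comes from the lookup above, not from the data."""
    return layer_name in ROUTABLE_LAYERS
```

Nothing in the stored geometry distinguishes the two layers; removing the hand-maintained lookup removes the application's ability to tell them apart.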

The converse also occurs, where information with the same underlying meaning is delivered in varying structures. For example, a physical field, such as gravity or temperature, may be represented either by a raster or by a set of point samples.

The meaning of the information in geometry-centric representations is typically captured in words accompanying the data. Interoperability can be achieved through use of standard words in layer names, attribute tables, and authority tables for attribute values. These are often established by a dominant data supplier, but the approach might be consolidated by a clearly articulated governance framework to establish and maintain the model and terminology for the community of interest.

However, there are other limitations in the geometry-centric model. In particular, an object is tied to a single geometry at a single scale. While this has certain advantages in software implementations, it biases the representation of real-world objects towards simple models using a limited set of geometries.

The feature model

Newer systems have moved away from geometry-centric modelling, in favour of models pitched at a higher semantic level, using a model for information whose central concept is the geographic Feature.

A feature instance is an identifiable object in the world, or the digital representation of it. As a rule-of-thumb, if an object
  1. has a name, serial number, or can otherwise be assigned a unique identifier
  2. is interesting enough that a description of it might be transferred from one party to another

then it is a candidate "feature". This may include things from the following general categories
  1. physical objects created by people, such as mines and boreholes
  2. natural objects, such as islands, faults, beds
  3. objects defined by a combination of natural and social/economic factors, such as ore-bodies
  4. transient objects such as events, including observation events
  5. "coverages" - objects that encapsulate the variation of a property in space and/or time

Note also that the same information set may be considered for different purposes, and therefore expressed as different feature types. For example, given a set of observations made on a set of specimens collected in a borehole, we might
  1. describe each observation separately, with its metadata describing the procedures applied, date and operator etc, as an "observation" feature
  2. bundle observations on a specimen-by-specimen basis, with a set of properties for each "specimen" feature
  3. bundle results to show the variation of a single property along the borehole, as a "coverage" feature

These are all legitimate views of the information, and thus sensible feature types to use. See more discussion at https://www.seegrid.csiro.au/twiki/bin/view/Xmml/InformationViews.
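
As a rough illustration of these three views, the following Python sketch regroups one small set of borehole observations in each way. The record layout, field names and sample values are invented for this example and are not taken from XMML.

```python
from collections import defaultdict

# Hypothetical observations on specimens from one borehole.
# Each tuple: (specimen id, depth in metres, property name, value)
observations = [
    ("S1", 10.0, "density", 2.65),
    ("S1", 10.0, "porosity", 0.12),
    ("S2", 20.0, "density", 2.70),
    ("S2", 20.0, "porosity", 0.08),
]

# View 1: one "observation" feature per measurement.
observation_features = [
    {"type": "Observation", "specimen": s, "property": p, "result": v}
    for s, _depth, p, v in observations
]

# View 2: one "specimen" feature per specimen, carrying a set of properties.
by_specimen = defaultdict(dict)
for s, _depth, p, v in observations:
    by_specimen[s][p] = v
specimen_features = [
    {"type": "Specimen", "id": s, "properties": props}
    for s, props in sorted(by_specimen.items())
]

# View 3: one "coverage" feature per property, varying along the borehole.
def coverage(prop):
    pairs = sorted((d, v) for _s, d, p, v in observations if p == prop)
    return {
        "type": "Coverage",
        "property": prop,
        "domain": [d for d, _v in pairs],   # depths
        "range": [v for _d, v in pairs],    # values at those depths
    }

density_coverage = coverage("density")
```

The same four measurements appear in all three views; only the grouping, and therefore the feature type, changes.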

Feature types and feature properties

Following classical category theory, features are classified into feature types on the basis of common sets of characteristics or properties (see General Feature Model (GFM) defined in ISO 19109). The properties may be:
  • attributes
  • associations
  • operations

In the REST architecture which underlies the Open GIS Consortium Web Services model, interactions are stateless. The practical impact of this is that while services exhibit behaviours, only static data may be transferred. Consistent with this, an XML "document" is a static data representation, so only attributes and associations can be described on the wire. For consistency with the GFM, the term "properties" is used in discussions of GML.

In languages based on GML, each feature instance is described using an XML element, whose name indicates the feature-type. Sub-elements and (occasionally) XML attributes record the properties.
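
A minimal sketch of this pattern, using Python's standard XML library, is given here. The "ex" namespace, the Borehole feature type and the totalDepth property are hypothetical names chosen for illustration; only the GML namespace URI and the description property come from GML itself.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"     # GML namespace
EX = "http://example.org/community"    # hypothetical community namespace

ET.register_namespace("gml", GML)
ET.register_namespace("ex", EX)

# The element name carries the feature type ("Borehole" is illustrative).
borehole = ET.Element(f"{{{EX}}}Borehole")

# Sub-elements record the properties of the feature.
ET.SubElement(borehole, f"{{{GML}}}description").text = "Exploration drill hole"
# An XML attribute is used here only for the unit of measure.
ET.SubElement(borehole, f"{{{EX}}}totalDepth", {"uom": "m"}).text = "350"

xml_text = ET.tostring(borehole, encoding="unicode")
```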

See https://www.seegrid.csiro.au/twiki/bin/view/Xmml/WebHome#GML for more detail on GML patterns and their relationship with other information model approaches.

Property values, value-spaces

We distinguish between the property-type (its semantic value, indicated in GML by the property XML element name) and the property-value (given by the element content). The value may be a literal (WXS simple type), or may have explicit substructure (usually instantiated in GML using sub-elements whose structure is described in a WXS complex type). For example:
  • all features have an (optional) property description whose value is some text describing the feature using natural language;
  • many features have a property position which has a Point as its value. Point is itself an "object", and has the property pos which contains the coordinates, and another property srsName gives the coordinate reference system.

For many properties in a domain-specific language, their values are required to be members of a specific value-space. Text values might be selected from a code-list or vocabulary, or be required to follow a certain pattern. (Boolean can be seen as a special case, where the value-space only has two members.) Numeric values must have a unit-of-measure and may be limited to a certain interval or precision. It is essential that the model provide a means to either
  • constrain the property values to the relevant value-space, or
  • indicate the value-space being used in the current instance.

A key part of establishing a community schema is the selection and prescription of the vocabularies, units, reference systems etc that describe the relevant value-spaces.
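
The first option, constraining values to a value-space, might be sketched as follows. The code-list, the unit list and the function names are assumptions made for this example; a community schema would prescribe its own vocabularies and units.

```python
from dataclasses import dataclass

# Hypothetical code-list and unit vocabulary, for illustration only.
LITHOLOGY_CODES = {"granite", "basalt", "sandstone"}
LENGTH_UNITS = {"m", "ft"}

@dataclass
class Measure:
    value: float
    uom: str  # the unit of measure travels with the number

def check_lithology(term: str) -> str:
    """Constrain a text value to the agreed code-list."""
    if term not in LITHOLOGY_CODES:
        raise ValueError(f"{term!r} is not in the lithology code-list")
    return term

def check_depth(measure: Measure) -> Measure:
    """Require a known unit and a value in the interval [0, infinity)."""
    if measure.uom not in LENGTH_UNITS:
        raise ValueError(f"unknown unit {measure.uom!r}")
    if measure.value < 0:
        raise ValueError("depth must be non-negative")
    return measure
```

The second option would instead leave the value unchecked and carry a reference to the code-list or unit system alongside it in the instance.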

Spatial properties

Although "features" as used here almost always have a spatial context, the feature model does not consider geometric and spatial properties of features to be different to other properties. A feature may have multiple geometries, each labelled with a role such as “centroid”, “boundary”, “trace”, “shape-at-1:25000-scale”, etc.

Note that even in a feature-oriented system, a geometry-centric model will often still be used behind the scenes. The goal of the feature-centric approach is to at least insulate the user from this abstraction, in favour of a model that operates at a level which is more natural for problem solving in the application domain. These alternatives thus correspond to different levels of abstraction, both of which may be realised in different layers in an implementation:
  • the feature model corresponding to domain concepts will be shown on the interface
  • a lower-level abstraction, such as the geometry-centric model, may be used internally and for storage.

But if a feature oriented view of the information can be provided on the user-interface, it is a small extra step to make this available through software interfaces. This then supports the deployment of a semantically-aware service architecture.

The coverage

The principal alternative to the feature view is the geographic coverage, which focusses on the variation of a property within the spatio-temporal domain of interest. This is described in ISO 19123.

Coverages are commonly encountered based on a grid (e.g. imagery) though any geometry complex of any dimensionality may be the domain of a discrete coverage. Note that the term coverage is used in a subtly different way in some common GIS software.

Coverage types

Often typed on basis of domain geometry

Semantic typing alternative

Information models and interoperability

A vocabulary is always tied to a community. The community may quite reasonably be scoped in a variety of ways, such as:
  • a single work-group or enterprise
  • a cartel or group of enterprises that have common transactional relationships
  • an industry, at a local, state, national or international level
  • a technical discipline, sub-discipline or group of disciplines
  • etc.

The size of the community that agrees on a data model fixes the boundaries of interoperability. In order to be a member of a particular community you must agree to speak and listen to the community language.

Local or private models

Most existing data models are schemas developed within a single organisation, within which software tools will be available to manage information according to the model. The interoperable community is effectively limited to the organisation.

The organisation may also choose to publish a description of the model - perhaps even a "GML View" - in order to make information products available. However, the client is stuck with the job of dealing with information provided in a model that is foreign to them. So if information from two different sources but covering the same topic is to be reconciled, the client must convert one or both datasets. A client that wishes to access many information services will need to understand all of their models.

Community models

The proposition of the SEEgrid architecture is that the relevant community for many purposes is larger than a single organisation, and communication between the parties within the community should use a common language. The language is primarily composed of the feature-types of interest, and information services should therefore provide data products that are based on the community feature types.

Publishing data in this way will usually involve a mapping from a private model (the storage schema) to a public model (the community schema), but the burden of translation is pushed back to the server. This is entirely appropriate, since the information service owner will have the best understanding of their internal data model and is in the best position to map it to the community model. The community model is the lingua franca, and having this common point means that interoperability requires order N mappings (to and from each private model to the community model) rather than order N². It is expected that this reduced complexity will eventually be seen to be the driver for significant cost/benefit advantages. (This is a topic covered in the EnterpriseViewpoint; however, the metrics for assessing some costs derive from the information design issues raised here.)
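
The difference in scaling can be made concrete with a short calculation. The sketch assumes that a separate mapping is needed for each direction of translation, which is one reasonable way of counting; the function names are illustrative.

```python
def pairwise_mappings(n: int) -> int:
    """Directed mappings needed when every private model maps to every other."""
    return n * (n - 1)

def hub_mappings(n: int) -> int:
    """Directed mappings needed when each private model maps to and from
    a single community model."""
    return 2 * n

for n in (3, 10, 50):
    print(n, pairwise_mappings(n), hub_mappings(n))
```

Under this way of counting, 10 organisations need 90 pairwise mappings but only 20 mappings via a community model, and the gap widens as the community grows.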

Note that it is entirely possible, and indeed quite reasonable, for feature types with the same name to be defined independently by different communities, resulting in definitions with different models. For example, the communities may have different interests in a feature type (e.g. use vs. maintenance), leading them to focus on different properties of the same feature. Effectively, these definitions are of different feature types, and the communities will be unable to share instances in any meaningful way. Practically this is managed by explicitly identifying the authority for the definition in the serialisation, e.g. using XML namespaces, where the name "use:Road" is completely distinct from "dig:Road".
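
The effect of namespaces can be seen by parsing a small document with Python's standard XML library. The namespace URIs and the property names are hypothetical and stand in for two defining authorities.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs for two communities that each define "Road".
doc = """
<collection xmlns:use="http://example.org/road-use"
            xmlns:dig="http://example.org/road-maintenance">
  <use:Road><use:speedLimit>80</use:speedLimit></use:Road>
  <dig:Road><dig:surface>asphalt</dig:surface></dig:Road>
</collection>
"""

root = ET.fromstring(doc)

# The parser expands each prefix to its namespace URI, so the two "Road"
# elements end up with different qualified names.
tags = [child.tag for child in root]
```

Software that compares the qualified names will therefore never confuse the two definitions, even though both use the local name "Road".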

Feature-type catalogue

The Feature-type catalogue (FTC) is the primary vocabulary for the community, defining the nouns in the application language. A complete FTC provides
  • a list of feature types
  • relationships between feature types
  • definitions of the feature types, in terms of their properties.

A FTC may be implemented using a number of different technologies. Use of a formal notation, such as UML, Express, or W3C XML Schema (WXS) is essential to remove ambiguity, and also for software production.

Within the OWS context, the default implementation is as a GML application schema. This conforms to the requirements of the Web Feature Service (WFS) interface, in which the response to a DescribeFeatureType request is the XML schema for the feature type.
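
For illustration, a DescribeFeatureType request in key-value form could be assembled as follows. The endpoint, the version value and the feature type name are assumptions for the example, not details of any particular service.

```python
from urllib.parse import urlencode

# Hypothetical service endpoint and feature type name.
endpoint = "http://example.org/wfs"
params = {
    "service": "WFS",
    "version": "1.1.0",               # assumed version, adjust to the service
    "request": "DescribeFeatureType",
    "typeName": "ex:Borehole",
}

# urlencode percent-escapes reserved characters such as ":".
url = f"{endpoint}?{urlencode(params)}"
```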

Organising the Feature Type Catalogue

The FTC may be instantiated as a simple list of feature type definitions: this is shown in ISO 19109 and ISO 19110. Thus, the complete XML Schema describing the GML representation of a set of feature types and property definitions - i.e. the GML Application Schema - can act as a formal FTC.

The WXS definitions in the application schema provide
  • a description of the structure of the feature types of interest, in terms of their content-model (i.e. the XML Schema type definition), leading directly to the associated syntax for encoding this, as an XML document
  • implications of some semantic relationships between feature types, using WXS "substitution groups" (and the supporting type derivation chains).

The semantic relationships between feature types are usually quite important. For both data discovery and publishing, it is necessary to be able to explore the set of available feature types, in terms of their definitions, but also in terms of the relationships between types. For example, for some operations a "fault" may be needed, while for other purposes the more generalised "geological boundary" may be adequate. However, faults should be included when requesting boundaries.

However, in this area WXS is limited. Elements may only be assigned to a single substitution group, and membership is constrained by the requirement that the WXS type of the member is derived from the WXS type of the head - effectively a single-inheritance model, supporting a single "semantic" hierarchy.
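
A single-parent hierarchy of this kind, and the way it lets a request for boundaries also return faults, can be sketched as follows. The type names and the table of parents are invented for the example and only mimic the behaviour of substitution groups.

```python
# Hypothetical single-parent hierarchy: each type has exactly one head,
# analogous to membership of one substitution group.
PARENT = {
    "Fault": "GeologicalBoundary",
    "Unconformity": "GeologicalBoundary",
    "GeologicalBoundary": "Feature",
    "Borehole": "Feature",
}

def is_a(feature_type, ancestor):
    """Walk up the single inheritance chain looking for the ancestor."""
    while feature_type is not None:
        if feature_type == ancestor:
            return True
        feature_type = PARENT.get(feature_type)
    return False

def select(features, requested_type):
    """Return features of the requested type or any type derived from it."""
    return [f for f in features if is_a(f["type"], requested_type)]

features = [
    {"type": "Fault", "id": "f1"},
    {"type": "Unconformity", "id": "u1"},
    {"type": "Borehole", "id": "b1"},
]
boundaries = select(features, "GeologicalBoundary")
```

Because each type has only one parent, "Fault" cannot also be placed under a second, independent classification in this structure.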

In many cases semantic relationships are underpinned by structural relationships, so inheritance of properties is appropriate. But this does not always match the conceptual model. Furthermore, the peculiarities of WXS sometimes get in the way of developing the required derivation chains. Finally, multiple independent classifications of feature-types may be required, for example in a "faceted" classification system. Thus, it may be useful to assert the semantic relationships between feature-types independently of the WXS definitions, and to provide multiple interfaces to the feature type catalogue in support of multiple hierarchies.

There are a few methods that might be used to support this. The OASIS Registry Information Model addresses the issue directly by supporting multiple classification views of the same objects. The Web Ontology Language (OWL) is an RDF-based serialisation for semantic relationships in the form of assertions linking one resource to another. The nature of the relationships between resources can be defined explicitly, and includes but is not limited to "subtype-of" type relationships.

Ontologies

Ontologies provide for formalisation of complex semantics. SEEGrid will use ontologies as its basic unit of semantic agreement (i.e. common understandings and classifications will be treated as ontologies, not just vocabularies). This means that "word lists" will be instantiated as objects that can, for example, be further described, cross-referenced, versioned etc.

This is consistent with the current state of the art within the ASDI context; for example, the National Oceans Office portal design requirements stress the role of ontologies. It is also consistent with mainstream IT developments; in particular, the ebXML standards framework has a registry model (RIM) that supports interrelationships in what is effectively a registry view of an ontology.

The set of feature types and the relationships between them defines a feature-type ontology for the application. Interfaces based on this are a key to
  1. discovery of suitable data by a potential data consumer, and
  2. assignment by a data provider of data to types from a community language

The technologies used for strong- and soft-typing come into play as follows:
  • Relationships between strongly-typed feature types are primarily expressed in the UML context by class-inheritance, or in the XML context by substitution group chains that depend in turn on type derivation by extension and restriction. XML Schema derivation only supports single-inheritance explicitly, and there are certain other restrictions on derivation emerging from the peculiarities of the XML Schema language. Overall, the set of relationships that can be described in either UML inheritance or XML Schema derivation is usually incomplete relative to the complete set of conceptual relationships that exist in the application domain.
  • Relationships between weak-typing classifiers can be asserted directly, for example by using OWL (Web Ontology Language). Relationships can be added arbitrarily, with no requirement for any underlying relationship between the feature-type definitions (i.e. the property sets can be completely disjoint). This supports more powerful and flexible discovery mechanisms, but it also does not ensure any conceptual integrity to the ontology.

These mechanisms need not be exclusive. For example, in order to gain the processing benefits of strong-typing, a feature type catalogue may be provided as an XML Schema defining the feature-type structures, but an associated OWL model may be provided describing the relationships between the feature types. The latter provides a highly structured "index" of the catalogue, supporting a richer "discovery" view with additional relationships that are not possible from the XML Schema.
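
The kind of "index" that asserted relationships provide can be sketched with plain subject-predicate-object triples. This is not OWL itself; the type names, the "subTypeOf" predicate and the helper functions are illustrative assumptions.

```python
# Relationship assertions as (subject, predicate, object) triples.
# A type may appear under several independent hierarchies.
triples = {
    ("Fault", "subTypeOf", "GeologicalBoundary"),
    ("Fault", "subTypeOf", "StructuralFeature"),   # second hierarchy
    ("Fold", "subTypeOf", "StructuralFeature"),
    ("GeologicalBoundary", "subTypeOf", "MappedFeature"),
}

def supertypes(feature_type):
    """All types reachable by following subTypeOf assertions."""
    found, frontier = set(), [feature_type]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subTypeOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def subtypes(feature_type):
    """All types that have feature_type among their supertypes."""
    return {s for s, _p, _o in triples if feature_type in supertypes(s)}
```

Here "Fault" is discoverable both as a geological boundary and as a structural feature, which a single substitution-group chain could not express. Nothing in the triples checks that the property sets of the related types are compatible.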

Duality of GML Feature Catalogue (strongly typed) and Ontology (used in weak typing slot)

The SEEGrid Roadmap introduces a novel, but practical, strategy for dealing with the dilemmas posed by different drivers for strong vs weak typing: a mechanism for formal equivalence. Whilst some experimentation is still required, strongly and weakly typed schemas can be derived from an "ontology of feature types" view, where the ontology supports abstraction relationships as well as lower level property data types. This is in fact a formalism of the ISO 19110 Feature Type Catalog and promises to have the following advantages:
  • manage a richer feature type description than possible just in UML or XML-schema
  • derive multiple mappings from the feature types to different weakly typed template objects
  • provide vocabulary and ontology views naturally to support classification of external objects
  • manage multiple representations in a single "code base"
  • easily extensible without compromising a self-contained schema or model
  • fits easily into community registry governance model
  • can be done in standard XML technologies (OWL, XML-schema, XSL, GML)

[diagram here]

Developing an Information model

A general methodology for developing a community application schema is outlined in ISO 19109, from which the feature catalogue emerges. It involves four steps:

  1. surveying requirements from the application field
  2. making a conceptual model using the concepts from the GFM
  3. describing the application schema (feature types) in a formal modelling language (e.g. UML + OCL)
  4. integrating the application schema with other standardised schemas (spatial schema, temporal schema, etc)

These steps are not consecutive, but provide a framework, upon which we can examine the status of the application domain of interest: Australian geoscience.

Requirements analysis

Development and maintenance of a feature-type catalogue will involve various levels of consultation and consensus (ISO 19135). However, in order to be acceptable and useful within the community, it must capture established usage at an appropriate level of detail.

Exploration geoscience is a relatively mature application domain, with a history of information sharing driven by an important mining industry, an active statutory sector, and containing a vigorous software-development sector. Thus many data models are already available. Primarily these are comprised of database schemas and export file formats from application software.

A particularly significant Australian initiative was AMIRA P431, which developed a consistent model for exploration data. Another important contemporary initiative is the North American Data Model (NADM) developed by USGS, GSC and the state and provincial surveys for mapping data and interpretations.

A significant fraction of the information for step 1 is therefore already available, at least implicitly, through reviewing the legacy models and formats. However, given the abundance of information available, determining the (current) model scope is necessary in order to allow focus on the relevant legacy material.

Given that many of the existing models were developed for limited or specialised purposes, it is important to do regular scope checks, and to take input more broadly than merely reviewing legacy models and formats.

Conceptual model development

In general, the modelling methodology used in geoscience has been inconsistent. Similar information (e.g. location) is handled differently across tables, even from the same organisation. In AMIRA P431 the meta-model is provided only implicitly, by its use of E-R notation and a particular CASE tool ("System Architect"). A notable exception was the sample and site oriented databases held at GA, which were unified under the "sites" model in the 1990s, though this is now being superseded. NADM is primarily based around a meta-model for "concept", and a firm distinction between observations (evidence) and interpretation. But overall, currently available models need considerable adaptation in order to use the GFM.

A critical part of step 2 is to identify distinct feature-types of interest.

At this point questions of feature-type granularity become important: when to split categories into more specialised types. Splitting results in simpler and cleaner data instances, fewer optional components in models, lighter-weight processing modules, and more specialisation within the associated authority tables constraining property value-spaces. However, a profusion of types is difficult to use without an effective index and accessibility mechanism, especially if there are extended derivation and inheritance chains. More types also imposes a model maintenance burden, and clients carry a cost in having to implement more modules.

A model with fewer, more generalised features, may be easier to maintain, and more elegant. But generalised features also require more flexibility in their model, with many properties "optional" because they are used by only a subset of the type. Generalisation also requires users to be willing to lump features, or apply abstractions that may not be immediately obvious (e.g. a well-log is merely a 1-D example of a "coverage", most usually encountered as the model for 2-D gridded data and imagery). Furthermore, in order to capture the complete semantics, it is likely that a "soft-typing" parameter will be required, to indicate the relevant sub-type when a more generalised feature-type is used.
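
The trade-off can be illustrated with two ways of modelling the same object. The class and property names are hypothetical; the point is only the shape of each model.

```python
from dataclasses import dataclass
from typing import Optional

# Specialised type: the feature type itself says what the object is,
# and every property is mandatory.
@dataclass
class Fault:
    name: str
    displacement_m: float

# Generalised type: a "soft-typing" parameter names the sub-type, and
# properties used by only some sub-types become optional.
@dataclass
class GeologicalBoundary:
    name: str
    boundary_type: str                      # soft-typing parameter
    displacement_m: Optional[float] = None  # only meaningful for faults

strong = Fault("Example Fault", 120.0)
weak = GeologicalBoundary("Example Fault", "fault", 120.0)
contact = GeologicalBoundary("Example Contact", "contact")
```

With the generalised type, a client must inspect boundary_type (and ideally check it against an agreed vocabulary) to recover what the specialised type states directly.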

Modeling has to offset the need for specialization and precision against simplicity, which in turn determines the balance of the processing burden between data provider and data consumer.

Thus, the issue of granularity is strongly related to strong- vs. weak-typing approaches, discussed at some length in StrongWeakTyping.

Formalising the conceptual model

Use of a conceptual schema language

The technology used in both modelling and implementation has an influence on the likelihood of success with each approach. The ISO 19100 standards formally require that a "model driven architecture" approach is used. This involves development of a complete information model, using a suitable conceptual schema language ("CSL", usually UML). Since UML is a graphical modelling and analysis notation, it must then be converted to the desired implementation model - for example, a database schema for persistence, and XML for transfer. The MDA theory is for this to be generated automatically as far as possible, by application of a set of conversion rules. The strength of this approach is that multiple implementations can be generated from the same conceptual model, with assurance that they are fully equivalent.

Alternative modelling platforms

This is strictly possible only if each implementation platform has the capability of fully implementing all of the capability of the CSL. In practice each implementation platform has different strengths and quirks, and indeed so does UML. So each translation inevitably distorts or dilutes the original model, which means that round-tripping involving anything more than a subset of the capabilities of each language is almost certainly imperfect. Furthermore, simple application of mechanical translation rules also means that potentially useful or efficient capabilities of the implementation platform are ignored if they do not map to a capability of the CSL.

For example, XML is a static data notation, so UML class operations must be either ignored, or represented by elements indistinguishable from those representing attributes and associations. When converting from XML to UML there is no way to detect these. There are richer rules concerning element cardinality, and choice models within content, in XML Schema than UML. Element order is significant in XML documents, so relationships between information items may be indicated by proximity instead of nesting, in a way that is meaningless in UML. Perhaps the biggest strength built-in to XML is hyperlinks, which supports associations with remote information items compactly and directly. Breaking the "closed-world" assumption of conventional information systems is perhaps the biggest innovation of web-based information architecture. The same capability can be modelled in UML, but is not native so does not emerge naturally from modelling exercises based in UML.

There is no question that UML is the most powerful tool available for describing and analysing object models, of which application schemas based on the GFM are a special case. Even so, for step 3, we have found that there is considerable merit in using other notations, such as W3C XML Schema Language, for modelling in parallel. Useful "idioms" emerge that can then be reflected back into a UML implementation. If WXS is preferred for modelling, then UML can serve a very useful function of documentation, since it is a standard graphical notation.

The GML data model (meta-model) is highly regular (see https://www.seegrid.csiro.au/twiki/bin/view/Xmml/GmlFeature), so provided that the GML Rules for application schema are adhered to, or provided the profile of UML described in ISO 19103 is used, then generation of GML schemas and instances from UML and vice versa is straightforward. The rules for converting both ways are given formally in an annex to the GML 3.1 recommendation.

Note that NADM does not follow ISO 19103, though this may be resolved through a collaboration currently underway under the auspices of IUGS.

Scope

When formalising the model, the scope issue surfaces particularly in terms of whether the intention is to develop a comprehensive model, or whether incremental, prioritised development, only leading to a limited set of feature types nominated by key stakeholders is acceptable. The latter approach reduces the risk associated with "big-bang" development, and is more scalable. But subsequent development of additional feature types inevitably leads to a re-examination of the existing "completed" components, which is likely to require that a versioning mechanism be introduced.

Use of a modular platform, such as XML with namespaces, and UML with packages, provides good support for incremental development.

Integration with standard components

In the context of Open GIS Consortium Web Services (OWS), step 4 is realised by the development of a GML Application Schema. This describes an XML language for the feature types in the application domain of interest.

A GML application language has the following key characteristics:
  • a pattern of element names and nesting that directly instantiates the GFM model for each feature type
  • standard capabilities (spatial, temporal, coordinate reference systems, others) are implemented through importing components provided in the core GML schemas
  • the normative version is expressed using WXS.

Development of a conformant GML application language is the subject of a major clause in the GML Recommendation paper. A paper on Developing and Managing GML Application Schemas compiled by Galdos Inc is available from GeoConnections Canada.

XMML and related languages

The eXploration and Mining Markup Language (XMML) project has been developing a GML Application schema for exploration geoscience in collaboration with several of the sponsors of this project.

For a summary of the project goals and current capabilities provided by XMML, see https://www.seegrid.csiro.au/twiki/bin/view/Xmml/ProjectSummary.

The Feature Types developed for XMML v 1.0 are primarily artefacts of exploration activities (boreholes, observations, procedures etc). Other relevant feature types are under development in complementary projects as follows:

  • Geology features are the subject of projects underway in some of the Australian surveys, in British Geological Survey, and most notably in the long-running North American Data Model from USGS/GSC, which focusses on types found on geological maps. Harmonisation of NADM with the OGC/ISO Feature model and GML/XMML encodings is underway.
  • Geochemistry/Assay data - through the ADX project
  • Plate tectonics descriptions - through the GPlates/GPML project

In most cases these will produce FTCs in other namespaces.


Governance of the FTC: how do I get my feature type included?

We need to establish syntactical and governance mechanisms so that interested members of the community can choose, per feature type, as appropriate:
  1. to reuse a feature type already defined and registered in the community FTC
  2. to extend a feature type, be it abstract or a similar type of object (eg "Hazard")
  3. to create a new feature type, and document it well enough to allow reuse by others (including conforming to the community baseline standards and governance models)
  4. to register a type as "foreign", with a mapping to a feature type from the community FTC for information purposes
  5. to create private feature types not intended for re-use by others.

Service interface to Features: who translates the data?

The OGC's Web Feature Service (WFS) is the canonical interface through which a data provider publishes descriptions of feature instances. However, the WFS specification is neutral regarding the design of the GML language exposed by a WFS service: a service using a GML-ised private model is conformant. There is no requirement to use a model defined by a community.

Thus perhaps the biggest information viewpoint issue is whether services publish their offerings merely using a version of their corporate data model, or alternatively map their data to some community schema on the interface. The former is easier for the service provider to implement, and is effectively what most of the COTS WFS systems support now. But it pushes out to the client the processing burden to reconcile data sourced from multiple services and expressed in different models.

Note that the community approach is at least implied, and is arguably quite explicit in ISO 19109 and ISO 19110. These standards do not concern the WFS interface directly, and there are plenty of vendors who resist the notion that a useful WFS (as opposed to a merely conformant WFS) includes a model/schema mapping layer.

This means that a degree of coherence between different service interfaces can be established using externally defined Feature Types:

[diagram: Relationship between Feature Types and Service Interfaces]

However, the "lazy" approach is used by most existing WFS software, in which the GML is generated by a direct mapping from the table structure of the source, so the feature-type definitions are directly related to the storage model. But if you want to compare information coming from different sources, then it has to be made commensurate somehow. It either gets done by the server or by the client, and we suggest that as the organization hosting the server understands their data best, they should be responsible for the mapping. We suggest that interoperability is best accomplished by the server accepting the responsibility of mapping to a community data model. This supports the deployment of lighter-weight clients that can be pre-configured to the model. But it requires some governance process for the "community".
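
A server-side mapping of this kind might look like the following sketch. The private column names, the community property names and the unit conversion are all invented for illustration.

```python
# Hypothetical private storage row, including a field not meant for release.
private_row = {"HOLE_ID": "DDH-001", "TD_FT": 1000.0, "INTERNAL_COST": 52000}

FEET_TO_METRES = 0.3048

def to_community_borehole(row):
    """Map a private record to an (illustrative) community Borehole type."""
    return {
        "type": "Borehole",
        "name": row["HOLE_ID"],
        # The community model is assumed to require depths in metres.
        "totalDepth": {
            "value": round(row["TD_FT"] * FEET_TO_METRES, 1),
            "uom": "m",
        },
        # INTERNAL_COST is a private field and is deliberately not published.
    }

published = to_community_borehole(private_row)
```

The provider writes this mapping once, and every client configured for the community model can then read the result without knowing the private schema.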

Using software designed to support the "lazy" approach, if the service provider wishes to publish using the community model, they must convert the storage accessed by the WFS to a schema corresponding to the public model. For most organisations this will result in replication of their data: once in their private model that serves most of their business purposes (and probably also includes private fields), and once in a cache to support the community WFS view. Synchronisation of the two data stores then becomes an issue, particularly if the WFS is transactional - i.e. an information upload as well as publishing service.

For more discussion of mappings between the feature model/GML and conventional tables schemas, see https://www.seegrid.csiro.au/twiki/bin/view/Xmml/InformationModels.


References

ebRIM
Registry Information Model http://www.oasis-open.org/committees/regrep/documents/2.0/specs/ebrim.pdf
OWL
OWL Web Ontology Language, Overview http://www.w3.org/TR/2004/REC-owl-features-20040210/



SEEgrid Roadmap - Computational Viewpoint



The computational viewpoint is concerned with the functional decomposition of the system into a set of services that interact at interfaces.

The SEEGrid computational architecture has two main challenges:
  1. Identify the necessary components
  2. Identify suitable interfaces between these components

The SEEGrid architecture will draw on "mainstream IT", with particular reference to those initiatives that address interoperability across enterprise boundaries.

This section will identify key patterns and relevant standards, and pay particular attention to the additional considerations that must be supported by the SEEGrid architecture by the addition of appropriate layers of the architecture.

Service Oriented Architectures

The basic SOA model is shown in Figure 1.


Figure 1: Service Oriented Architecture: Publish/Find/Bind Pattern

Of particular note is that in mainstream IT, business functions are essentially unbounded, and thus there is an assumption that a developer will find it necessary to create customised code to access a service. The SEEGrid architecture (in line with the SDI model) will seek to constrain service interfaces so that certain types of services are interoperable. This means that data access services will become standardised and a software framework can be established.
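
The publish/find/bind pattern of Figure 1 can be sketched in a few lines of Python. This is an in-memory illustration only: the class names, the "data-access" service type and the borehole service are hypothetical, and a plain function call stands in for binding to a network endpoint.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ServiceRecord:
    name: str
    service_type: str                # a constrained, well-known type
    endpoint: Callable[[str], str]   # stands in for a network address


class Registry:
    """Minimal registry supporting the publish and find steps."""

    def __init__(self):
        self._records = []

    def publish(self, record):
        self._records.append(record)

    def find(self, service_type):
        return [r for r in self._records if r.service_type == service_type]


def borehole_service(request):
    return f"boreholes matching {request!r}"


registry = Registry()
registry.publish(ServiceRecord("boreholes", "data-access", borehole_service))  # publish
matches = registry.find("data-access")                                         # find
reply = matches[0].endpoint("depth > 100")                                     # bind
```

Because the service type is constrained, the client needs no service-specific code to perform the bind step.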

The SEEGrid architecture also adopts a key principle from ISO19119 and the OGC-RM: a service can have multiple interfaces. This means, for example, that a given business function can support multiple information views (e.g. as a feature collection, a coverage or a controlled vocabulary) and multiple "Distributed Computing Platforms" (DCPs), i.e. technical implementations. Consequently, whilst there is a need to fully implement at least one DCP profile of component interfaces to ensure that services will interoperate semantically, specific functions can be integrated with external systems or specialised workflows. Thus the most likely pattern is to ensure that each component has at least one common mandatory implementation and optional alternatives. If it proves necessary, SEEGrid itself can then undergo coherent evolution through versioning of the mandatory profiles.
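
The pattern of one mandatory implementation plus optional alternatives can be sketched as follows. This is a hypothetical Python illustration: the view names, the sample records and the choice of "feature-collection" as the mandatory view are assumptions, not part of any SEEGrid profile.

```python
def as_feature_collection(samples):
    """Mandatory view: every service must be able to return features."""
    return [{"id": s["id"], "rock": s["rock"]} for s in samples]


def as_vocabulary(samples):
    """Optional view: the distinct rock names as a controlled vocabulary."""
    return sorted({s["rock"] for s in samples})


class MultiInterfaceService:
    MANDATORY_VIEW = "feature-collection"

    def __init__(self, samples):
        self._samples = samples
        self._views = {self.MANDATORY_VIEW: as_feature_collection}

    def add_optional_view(self, name, render):
        self._views[name] = render

    def interfaces(self):
        return sorted(self._views)

    def get(self, view=None):
        # Clients that name no view fall back to the mandatory one.
        render = self._views[view or self.MANDATORY_VIEW]
        return render(self._samples)


service = MultiInterfaceService([
    {"id": "s1", "rock": "granite"},
    {"id": "s2", "rock": "basalt"},
    {"id": "s3", "rock": "granite"},
])
service.add_optional_view("vocabulary", as_vocabulary)
```

Any client can rely on the mandatory view, while specialised workflows may request an alternative where one is offered.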

The key Service Oriented Architectures of note are those promoted by the W3C, UDDI and ebXML communities, and the Open Grid Services Interface. There are a number of harmonisation efforts under way, and a recently published approach (WSRF, the Web Services Resource Framework) seeks to bring the stateful nature of GRID processing into a position where it is an extension of the W3C Web Services work (SOAP, WSDL).

The actual drivers for the decisions rest in the business issues (Enterprise Viewpoint) and deployment practicalities (Engineering Viewpoint), but this viewpoint must establish the basic principles, feasibility, starting points and approaches to sustainability of the computational aspects.

The notional architecture section below describes the components, and how they might be partitioned. The interfaces between them will be based on the "stack" of standards shown in Figure 2. Note that standards are largely described by functional behaviour with current best practice implementations highlighted in (parentheses). Note also that care has been taken to separate the protocol functions from the content and metadata encoding standards.


Figure 2: "Layer" view of key protocol and content description standards

Thus, the SEEGrid architecture can be seen as a layering of standards and efforts, with each layer potentially evolving, being replaced or augmented by alternatives. The overall effort of creating a series of semantically interoperable capabilities nevertheless involves a very specific set of challenges.

Notional architecture

Figure 3 shows the key components required to implement the SEEGrid framework. The interfaces between these components are based on the suite of protocols and content encodings as per Figure 2.


Figure 3: Notional Component Architecture

An animated version of Figure 3, highlighting specific examples, is also available.

Points to note:
  • Each component may have multiple deployed instances
  • Each component may support multiple bindings of interfaces to technologies
  • All interactions are governed by the same set of rules
  • The diagram clearly shows which content and metadata types are managed as registries (using common community data models and semantics) and which can be provided by data access services that merely need to conform to the published data model. In other words, the registry contents define the semantic scope of SEEGrid.
  • Models and Observational data are regarded as equivalent
  • Feature Collections may provide ontologies in their own right - for example a set of named phenomena may be used to classify other resources
  • Model management is seen as an exercise in creating and publishing service chains and results. There must be a tight coupling between the service chaining framework and the registry model, consistent with other forms of data classification.
  • There is an implicit relationship between Feature Type Catalog, Features and Coverages, as explained in the Information Viewpoint

Overlapping GRID Infrastructures

GRID infrastructures will tend to deliver a set of functional services to a community defined by jurisdiction and domain. Thus, the UK NERC DataGrid will deliver data access services to the stakeholders of the Natural Environment Research Council (both as clients and providers).

SEEGrid will thus establish interconnectivity at both a technical and governance level with key GRID infrastructures. The first priority for SEEGrid will be to establish a "DataGRID" capability to allow the exploitation of emerging computational GRID capabilities, and the semantic interoperability with specialist modelling facilities and academic networks.


Integrating Suites of Interface Standards

The discussion above shows the key components and a functional breakdown of technical standards. SEEGrid will be based, in practice, on "suites" of interface standards that meet the minimum requirements:
  • managed by a transparent and robust process
  • logically consistent with SEEGrid requirements
  • implemented (including planned or implementable) by available software
  • preferably supported by open source "reference implementations"
  • provide a significant advantage for at least some interoperable services over alternative options

The key suites of interface standards of interest are the Web based SOA components of
  • ISO 19000 series
  • ISO 15000 series (OASIS ebXML)
  • W3C transport oriented protocols and metadata (HTTP, SOAP, WSDL etc)
  • OpenGIS Consortium interfaces (as per ASDI policy)
  • WSRF : Web Services Resource Framework (aka Open Grid Services Architecture)

The key suites of "content" standards are
  • W3C XML
  • W3C OWL
  • OGC GML (aka ISO19136) (especially XMML as a developed application schema)
  • NetCDF - used for existing "packaged" information products

The roles and layers described above are critical to understanding how these standards interact and are in fact mutually interdependent. SEEGrid will provide the mechanism to adopt technical standards as they prove their usefulness and ability to "map" onto the semantics of the overarching Service Oriented Architecture. The logic here is that if the semantics can be mapped then it is possible to build a data converter or gateway service between components. The onus is, however, on the proponents to provide the formal semantic mapping and sample gateway or data conversion services so that the broader SEEGrid community can evaluate the option, and then adopt it if advantageous.

SEEGrid will maintain a Web accessible formal ontology of such standards and tools to map between them, so that it will always be possible to identify how possible standards can be applied, and also the preferred options for interoperability within the SEEGrid community.
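
One way to picture such a register of mappings is as a directed graph in which each registered converter is an edge. The Python sketch below, which is an illustration rather than a proposed design, searches that graph for the shortest chain of converters between two standards. The converter pairs and their directions are assumptions chosen for the example.

```python
from collections import deque

# Hypothetical register of available converters as (source, target) pairs.
MAPPINGS = {
    ("NetCDF", "GML coverage"),
    ("GML application schema", "OWL ontology"),
    ("OWL ontology", "Feature Type Catalog"),
}


def conversion_chain(source, target, mappings=MAPPINGS):
    """Return the shortest list of standards linking source to target, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for start, end in sorted(mappings):
            if start == path[-1] and end not in seen:
                seen.add(end)
                queue.append(path + [end])
    return None  # no gateway or converter chain is registered


chain = conversion_chain("GML application schema", "Feature Type Catalog")
missing = conversion_chain("NetCDF", "OWL ontology")
```

A result of None indicates that a proponent would first need to register a mapping before the two standards could interoperate.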

Many of these mappings are obvious or are being explored as part of the standards tracks themselves. SEEGrid will nevertheless introduce an "information community" perspective into these efforts, and will doubtless be required to pioneer some level of implementation, refinement and community consensus approaches.

The key mappings to be explored within the next phases of SEEGrid are:

Phase 1b:

  • GML application schema / OpenGIS WFS implementations

Phase 2:

  • GML coverages using native NetCDF encoding
  • GML application schema to OWL ontology
  • OWL ontologies to ISO/OGC/ebXML registry concepts (Feature Type Catalog)

Phase 3 (in collaboration with NERC)

SEEgrid Roadmap - Engineering Viewpoint



Overview

The Enterprise, Information, and Computational viewpoints describe a system in terms of its purposes, its content, and its functions. The Engineering viewpoint relates these to specific components linked by a communications network. This viewpoint is concerned primarily with the interaction between distinct computational objects: its chief concerns are communication, computing systems, software processes and the clustering of computational functions at physical nodes of a communications network. The engineering viewpoint also provides terms for assessing the “transparency” of a system of networked components – that is, how well each piece works without detailed knowledge of the computational infrastructure.

SEEGrid is a SOA that allows for a great deal of flexibility in how services and data are deployed. The actual deployment of processing services, data and applications will evolve as efficiencies and organisational responsibilities become clearer. SEEGrid will focus on ensuring that the standards to be used allow the components to be deployed anywhere in the network. Thus, a portrayal service (visualisation) may occur within a modelling package, in a separate network service, in an application service or in a user's browser environment. SEEGrid needs to ensure that the semantics of the data and the process are well enough defined so that all these options are possible, and let component owners, software builders and network optimisation issues resolve the "best" way of solving a particular problem.

Deployment View

Shared (Authoritative) Components

The following components need to be managed as well-known, authoritative services:

Registries

Registries use "catalog" interfaces to support search and retrieval, but are characterised by the "governance framework" that allows a client to pose a meaningful query and interpret the data retrieved. Registries may be centralised or "federated" through harvesting, delegation of queries to "partitions", searched in parallel or linked through referral mechanisms. The exact mechanism is not important; however, the commonality of semantics and syntax (interoperability) between registries is critical. SEEGrid will seek to adopt (as a documented community consensus policy) externally provided registry infrastructure as it becomes available, but will establish its own registries and governance models where required.
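
As a rough illustration of one federation option, the Python sketch below searches several registry partitions in parallel and merges the results. The partition names and records are hypothetical, and an in-memory list stands in for each remote catalog. The sketch works only because every partition answers the same query in the same way, which is the commonality the paragraph above calls critical.

```python
from concurrent.futures import ThreadPoolExecutor


class RegistryPartition:
    """Stands in for one remote registry with a common search interface."""

    def __init__(self, name, records):
        self.name = name
        self._records = records

    def search(self, keyword):
        return [r for r in self._records if keyword.lower() in r.lower()]


def federated_search(partitions, keyword):
    """Query every partition in parallel and merge the de-duplicated hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        result_lists = pool.map(lambda p: p.search(keyword), partitions)
        merged = {hit for hits in result_lists for hit in hits}
    return sorted(merged)


partitions = [
    RegistryPartition("state-a", ["Borehole WFS", "Geochemistry WFS"]),
    RegistryPartition("state-b", ["Borehole WFS", "Geophysics coverage"]),
]
hits = federated_search(partitions, "wfs")
```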

Ontologies

The semantic framework to support interoperability must be shared, and this means that common vocabularies must be established as web accessible components. Further to this, the relationships between vocabularies as a means of classifying content (data) and as descriptors of structural information elements must be managed (strong vs weak typing), and finally, relationships between registered objects of different types must be managed in an authoritative way. The upshot is a general requirement to manage ontologies, initially at least through registration of normative description (e.g. in OWL syntax) in a register. In many cases, the ontology will actually be the contents of one or more registers.

Ownership and management of common ontologies has always been at the heart of any successful interoperability framework. SEEGrid faces the challenge of identifying an appropriate registry owner to host ontologies, in particular the Feature Type Catalog, that are in turn managed or extensible by the SEEGrid community.

Quality of Service

Quality of service is a significant engineering issue in an SOA. Much of the framework for handling intermittently available services and varying network traffic considerations is provided through basic Grid technology. SEEGrid will thus initially focus on developing an understanding of the potential data flows in the domain and characterising these according to data volume, sensitivity, etc. From this it will be possible to work out guidelines for different classes of services which can be characterised by run-time behaviour. What processing components get deployed where will be heavily influenced by the specific problem, and a general approach will need to be identified to aid in maximising reusability of services.

Redundancy, Brokering and Caching

These issues are all critical to a robust, scalable architecture. SEEGrid adopts the pragmatic approach that the first priority is to create semantically robust services that can be replicated, discovered and reused. Ongoing tracking and further analysis of requirements, and probably evolution of the core computing frameworks in this regard, is necessary.

Deploying new standards

The critical success factor of SEEGrid will be the ability to support ongoing adoption and adaptation of standards within sub-communities without unnecessarily constraining them, or allowing them to diverge to the point of non-interoperability.

The basic practice proposed is that a baseline of standards is published, and stakeholders invited to register their adoption of each standard. Each sub-community that wishes to adapt or change these must:
  • publish a rationale for a new set of standards, and invite comment for a set period
  • provide a migration strategy that addresses the needs of all stakeholders
  • register all the necessary components of the standard in appropriate formats
  • update the SEEGrid community standards ontology with relationships between new and existing standards
  • provide gateway services for legacy implementations where necessary (i.e. deploy a "wrapper" service, register stylesheets or vocabulary cross-walks etc)

By following this model, the entire community is always privy to what has changed and why, and has access to all the resources required to migrate. It also makes it easy for sub-groups to extend to meet business needs, but harder to create a completely incompatible island of technology.

Deployment of wrapper services around legacy technologies (where such services cannot support multiple interfaces as a matter of course) means that there will be a mild performance cost for not upgrading, but users are still left with a basic capability.
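
A wrapper of this kind might look like the following Python sketch. The legacy service, its field names, the cross-walk and the percent-to-ppm conversion are all hypothetical. The extra call and translation step is where the mild performance cost arises.

```python
class LegacyAssayService:
    """Stands in for an unmodified legacy system with its own vocabulary."""

    def fetch(self, site_code):
        return {"SITE": site_code, "CU_PCT": 0.5}


class CommunityWrapper:
    """Presents the legacy service through the community vocabulary."""

    # Hypothetical cross-walk: legacy field -> (community property, converter).
    CROSSWALK = {
        "SITE": ("siteIdentifier", lambda v: v),
        "CU_PCT": ("copperConcentrationPpm", lambda v: v * 10000),  # % to ppm
    }

    def __init__(self, legacy):
        self._legacy = legacy

    def get_feature(self, site_identifier):
        raw = self._legacy.fetch(site_identifier)  # extra hop to the legacy system
        feature = {}
        for field, value in raw.items():
            if field in self.CROSSWALK:
                name, convert = self.CROSSWALK[field]
                feature[name] = convert(value)
        return feature


wrapped = CommunityWrapper(LegacyAssayService())
feature = wrapped.get_feature("MT-ISA-01")
```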

It is also worth maintaining a register of client applications that rely on services, so that it becomes possible to contact all interested parties if an upgrade is planned. A user who does not register their interest takes their chances with stability.
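
A register of dependent clients can be very simple. The Python sketch below is a hypothetical illustration in which the service name and contact addresses are invented; it records which contacts rely on which service and produces one notice per registered contact when an upgrade is planned.

```python
from collections import defaultdict


class ClientRegister:
    """Records which client contacts depend on which service."""

    def __init__(self):
        self._contacts = defaultdict(set)

    def register(self, service_name, contact):
        self._contacts[service_name].add(contact)

    def upgrade_notices(self, service_name, message):
        """Return one (contact, message) pair per registered client."""
        contacts = sorted(self._contacts.get(service_name, ()))
        return [(contact, message) for contact in contacts]


register = ClientRegister()
register.register("borehole-wfs", "portal-team@example.org")
register.register("borehole-wfs", "modelling-group@example.org")
notices = register.upgrade_notices("borehole-wfs", "Interface upgrade planned")
```

Clients that never registered simply receive no notice.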

SEEgrid Roadmap - Technology Viewpoint



Overview

Available choices

The SEEGrid RoadMap does not prescribe a complete suite of specific technologies for a homogeneous network. Nevertheless, a set of technology choices is required to establish an initial capability and demonstrate the potential value.

The available technology options and practical examples of their implementation are given in Figure 1, using the components described in the Notional Component Architecture (Computational Viewpoint - Fig 3). The table gives implementers and end users of services some guidance as to possible technology options, but it should be noted that this list is neither exhaustive nor comprehensive.

Figure 1: Technological options and examples from Computational Viewpoint

Domain model & profiles

SEEGrid adopts the pragmatic approach of building in the capability to integrate legacy technologies, whilst preserving the maximum semantic interoperability for future applications. A consequence of this is a requirement for partial implementations of application schemas using limited profiles of the GML technology, compatible with the limited search interfaces available through current software implementations. For example, the "GML Profile Level 0", recently adopted by OGC, is designed to make simple geometry-centric WFSs more consistent. This particular tool won't resolve the challenges facing SEEGrid, but does point the way to an acceptable approach.

SEEGrid will thus need to establish, per application domain (i.e. many such activities with overlapping resources):

  • scope of GML application schemas
  • Feature Type Catalog (including additional relationships) for target schemas (i.e. the normative "Domain Model")
  • Governance framework for Domain Model
  • permitted simplified profiles ("information products")
  • data access services, including custodial and service level agreement provisions
  • mappings to relevant overlapping domains
  • conformance testing arrangements
  • sample queries and/or responses for development of interoperable

Reference Implementation

Publishing to an interoperable schema from a private storage schema

Data sources for a WFS implementation can potentially be either databases, typically relational, or existing GIS data sets. However, existing GIS data sets are in many cases likely to be based upon data held in relational databases, and best practice suggests that storing original or source data in a controlled database is a better management strategy than trying to maintain data in a format that is prone to uncontrolled mutation. Additionally, in the context of geosciences, target community schemas are complex and cannot be effectively represented in a flat GIS data structure. Thus the following discussion focuses on relational databases as a data source, although the principles may also apply to other sources.

For interoperability within the community, it is critical to avoid proliferation of undocumented "profiles" that amount only to the dumping of "private" storage schema (table designs) into the Web environment. A basic capability required is the ability to deploy a WFS that sources data from an arbitrary relational database schema and publishes it according to a community-endorsed GML schema (see discussion in InformationViewpoint#Service_interface_to_Features_wh). This step may be seen as creating an abstraction layer between the WFS response and the storage schema.

The WFS specification is agnostic about the relationship between the data sources and the public schema, and most existing WFS software does not make this distinction. Effectively, this means that publishing according to a community schema requires copying data from an established corporate system into a schema that mirrors the community schema, and deploying the WFS on top of that copy instead of the corporate system. The SEEGrid project is sponsoring enhancements to the open source WFS reference implementation ("GeoServer" - http://cite.occamlab.com/reference/) to support live mappings instead. This will provide a basic capability to establish interoperable WFS services supporting community GML application schemas, such as those using the "observations and measurements" pattern.
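
The idea of a live mapping can be sketched with Python's standard sqlite3 module. This is not how GeoServer implements it: the table, column and property names are hypothetical, and a single "greater than or equal" filter stands in for a full WFS query. The point is that a filter expressed in community terms is translated and run against the private table directly, so no mirrored copy is needed.

```python
import sqlite3

# Hypothetical mapping from community property names to private columns.
PROPERTY_TO_COLUMN = {"sampleIdentifier": "samp_id", "goldConcentration": "au_ppm"}


def query_live(conn, prop, minimum):
    """Answer a community-schema filter directly from the private table."""
    column = PROPERTY_TO_COLUMN[prop]  # KeyError for unpublished properties
    select_list = ", ".join(
        f'{col} AS "{name}"' for name, col in PROPERTY_TO_COLUMN.items()
    )
    sql = f"SELECT {select_list} FROM assay WHERE {column} >= ? ORDER BY samp_id"
    cursor = conn.execute(sql, (minimum,))
    names = [d[0] for d in cursor.description]
    return [dict(zip(names, row)) for row in cursor]


# The private store keeps a field (internal_cost) that is never published.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assay (samp_id TEXT, au_ppm REAL, internal_cost REAL)")
conn.executemany(
    "INSERT INTO assay VALUES (?, ?, ?)",
    [("S-001", 1.7, 120.0), ("S-002", 0.2, 95.0)],
)
features = query_live(conn, "goldConcentration", 1.0)
```

Column names are taken only from the fixed mapping, and the filter value is passed as a bound parameter rather than spliced into the SQL text.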

Query and Response: real-world queries against complex databases

Another extension is required to link the search (query) schema and the response (public) schema. It may be critical to understand which types of queries can be efficiently supported by the underlying database. This is not currently part of the WFS specification, but can be supported in a number of ways, including registering "query profiles" of Feature Types.

(An example from another domain: A WFS might support a "phone book entry" feature type, but the simple query profile might be limited to allow only name and suburb to be used as query parameters. This is necessary to prevent use of the service to do reverse lookups, which is illegal in Australia. The client should find query profiles of "phone book entry" and then use these to construct permissible queries against a WFS that advertises just "phone book entry" response types.)
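
The phone book example can be sketched as a check of query parameters against a registered query profile. The profile register, the profile name "simple" and the function below are hypothetical Python illustrations; query profiles are not part of the WFS specification, as noted above.

```python
# Hypothetical register of query profiles, keyed by feature type.
QUERY_PROFILES = {
    "phoneBookEntry": {"simple": {"name", "suburb"}},
}


def check_query(feature_type, profile, params):
    """Return the query parameters if the named profile permits all of them."""
    allowed = QUERY_PROFILES[feature_type][profile]
    rejected = sorted(set(params) - allowed)
    if rejected:
        raise ValueError(
            f"parameters not permitted by profile {profile!r}: {rejected}"
        )
    return dict(params)


permitted = check_query(
    "phoneBookEntry", "simple", {"name": "Smith", "suburb": "Carlton"}
)
```

A reverse lookup such as check_query("phoneBookEntry", "simple", {"phoneNumber": "..."}) raises ValueError, so the client can discover which queries are permissible before contacting the service.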

Another typical scenario of more direct relevance is the ability to handle some types of queries against complex data structures efficiently, where other queries might cause unacceptable load or not provide useful responsiveness.

In this case query profiles may be managed simply through a catalogue, so this is another example of using a registry to maintain an ontology of related objects, which can then be used to construct service chains. There is a strong semantic mapping between the possible service chains and the relationships in the ontology design, but the whole approach is quite simple and extensible.

Within the SEEGrid demonstrator phase, query profiles will be used to initiate service chains - i.e. the user will be asked to interact with a specific query form, not a complex data object. This will be implemented in a scalable manner, driven by the semantics of registered objects within the WebMap Composer spatial portal toolkit, which provides all the tools required for this type of user-focussed workflow management.


Topic revision: r2 - 15 Oct 2010, UnknownUser
 

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).