
Geospatial Metadata and ebXML Registry

The debate

Use of the OGC Catalogue for more than traditional "dataset metadata" has thrown up some challenges. In an attempt to work around these, and also to move closer to the IT mainstream, a number of other registry models have been considered. Most prominently, the OGC recently adopted the ebXML Registry Information Model (ebRIM) as the preferred technology for developing new metadata profiles.

However, ebXML Registry is a very general solution, with a correspondingly abstract model. Furthermore, it is a coupled Registry-Repository model, which is somewhat alien to the conventional view of "metadata". Making this real for specific applications requires a "profile" and there have been a lot of misunderstandings about exactly how this would be done. This has led to some concern that the "traditional" metadata use-cases are being sidelined.

I believe that some of the concerns are due to a fundamental misunderstanding of the role of the Repository element of the ebXML Registry. Here are some notes clipped from an email I wrote as part of a recent thread on this topic.

The Repository is important

The discussions do not seem to take proper account of a key pattern that is implemented by the ebXML Registry model.

Essentially it allows us to divide our consideration between four kinds of artefacts:
  • 1 resources that we don't know about
  • 2 resources that we know about but don't manage
  • 3 resources that we manage
  • 4 the index.

In the ebXML Registry these are implemented as follows (reverse order):
  • 4 Registry Objects (RO)
  • 3 Repository Items (RI)
    • each has a corresponding Extrinsic Object (EO) (a specialized RO) in the registry,
    • the RI may only be accessed via the RO (i.e. the EO) - this ensures the integrity of the resource management rule
  • 2 resources in other "repositories" (e.g. on other servers) that we know about but don't control
    • each has a corresponding External Link (EL) (a specialized RO) in the registry
    • synchronization between the EL and the resource is not guaranteed
  • 1 other resources
    • we let Google index these for us ;-)

The Extrinsic Object is effectively an indexed proxy for managed resources. The External Link is an indexed proxy for non-managed resources.
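
The proxy pattern above can be sketched in a few lines of Python. This is only an illustration of the idea, not an ebXML implementation: the class names follow the ebRIM terms used above, but the `Registry` class, its method names and its in-memory storage are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RegistryObject:
    """Case 4: an entry in the index."""
    id: str
    name: str


@dataclass
class ExtrinsicObject(RegistryObject):
    """Case 3: indexed proxy for a Repository Item that we manage."""
    mime_type: str = "application/octet-stream"


@dataclass
class ExternalLink(RegistryObject):
    """Case 2: indexed proxy for a resource we know about but don't manage."""
    external_uri: str = ""


class Registry:
    """Hypothetical in-memory registry-repository pair."""

    def __init__(self) -> None:
        self._objects: Dict[str, RegistryObject] = {}
        # Repository Items are keyed by the id of their ExtrinsicObject
        # and are never handed out except through that id.
        self._repository: Dict[str, bytes] = {}

    def submit_managed(self, eo: ExtrinsicObject, content: bytes) -> None:
        self._objects[eo.id] = eo
        self._repository[eo.id] = content

    def submit_link(self, el: ExternalLink) -> None:
        # Only the link is stored; the remote resource may change or vanish.
        self._objects[el.id] = el

    def get_object(self, object_id: str) -> RegistryObject:
        return self._objects[object_id]

    def get_repository_item(self, eo_id: str) -> bytes:
        obj = self._objects.get(eo_id)
        if not isinstance(obj, ExtrinsicObject):
            raise KeyError(f"{eo_id} is not an ExtrinsicObject, so it has no Repository Item")
        return self._repository[eo_id]
```

Asking for the Repository Item of an ExternalLink fails here by design: the registry holds only the link, not the resource.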

Application to metadata records

In the metadata game, the assumption has conventionally been that the complete metadata record is indexed. So people have often looked at how to jam all of the many fields from an ISO 19115/19119 "record" into Registry Object(s).

However, this may be conflating the different roles of metadata. In particular, not all of the information in an ISO record is related to discovery.

The ebXML Registry model enables another approach. The ISO metadata record can be treated as a "managed resource" in its own right, rather than as the index artefact. That is, the ISO "record" (encoded as an ISO 19139 document) is a Repository Item. Then a subset of the information it contains is harvested into its associated Extrinsic Object - in particular the information that is useful for the discovery purpose. That gives you the "quick look" index.

[N.B. An Association (another specialized RO) keeps track of the link between the ISO Record (via its EO) and the dataset that it describes (via its EL). ]

For more detail about a specific record, such as the non "discovery" metadata, you do another transaction on the Registry to obtain the full metadata record in all its ISO 19139 glory - i.e. the Repository Item corresponding to a discovered Extrinsic Object.
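
As a rough sketch of this two-step pattern, the Python below harvests two discovery fields from a heavily trimmed ISO 19139-style document into an index entry, searches the index, and then fetches the full document in a second call. The sample record is illustrative and not schema-valid, the choice of title and abstract as the discovery subset is an assumption, and `register`, `discover` and `get_full_record` are hypothetical names rather than operations of any real registry.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Illustrative, heavily trimmed ISO 19139-style record (not schema-valid).
SAMPLE_RECORD = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title><gco:CharacterString>Example geology map</gco:CharacterString></gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
      <gmd:abstract><gco:CharacterString>Long descriptive text.</gco:CharacterString></gmd:abstract>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
  <gmd:dataQualityInfo>(non-discovery detail omitted)</gmd:dataQualityInfo>
</gmd:MD_Metadata>"""


def harvest_discovery_fields(iso_xml: str) -> dict:
    """Pull only the discovery subset out of the full record."""
    root = ET.fromstring(iso_xml)
    ident = "gmd:identificationInfo/gmd:MD_DataIdentification"
    return {
        "title": root.findtext(
            f"{ident}/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString",
            namespaces=NS,
        ),
        "abstract": root.findtext(f"{ident}/gmd:abstract/gco:CharacterString", namespaces=NS),
    }


index = {}       # EO id -> harvested fields (the "quick look" index)
repository = {}  # EO id -> full ISO document (the Repository Item)


def register(eo_id: str, iso_xml: str) -> None:
    repository[eo_id] = iso_xml
    index[eo_id] = harvest_discovery_fields(iso_xml)


def discover(term: str) -> list:
    """First transaction: search the small index only."""
    term = term.lower()
    return [
        eo_id
        for eo_id, fields in index.items()
        if any(term in (value or "").lower() for value in fields.values())
    ]


def get_full_record(eo_id: str) -> str:
    """Second transaction: fetch the complete record behind a hit."""
    return repository[eo_id]


register("eo-1", SAMPLE_RECORD)
hits = discover("geology")
full = get_full_record(hits[0])
```

Note that a search for a term that appears only in the data-quality section finds nothing, because that part of the record was never promoted into the index.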

Under this model, the high-performance index is deliberately limited to containing a subset of information harvested from the rich metadata records. I believe this is the intention of the "Dublin Core" query set in OGC Catalogue. Following Dublin Core principles this is "cross-domain discovery metadata". In a GI profile we may choose to promote some more information up to the index, but IMHO it's kinda missing the point of the ebXML Registry to imagine that the entire ISO metadata record should be promoted into Registry Object(s).

Furthermore, an ISO 19115/19139 record is often not fully self-contained: the record may be "normalized" whereby some of the associations might be better implemented as "links" or "references" to other first-class resources. This raises the issue of setting rules concerning de-normalization for discovery, how many links to traverse, etc. (This is particularly hairy when considering the anyText case! You could end up denormalizing a major fraction of your database!!) Some aspects of this are actually made easier by mapping the ISO metadata Object Model onto the ebXML RIM. But again, we should be wary of developing gory ebRIM profiles that are just another encoding for ISO 19115/19119. Since we now have a nice XML document format for metadata (ISO 19139), let's manage ISO documents as artefacts in the Repository, and pull a subset of this out into RegistryObjects for indexing.

Finally, the model does provide a scalable way to associate metadata records with the resources that they describe, but which are managed externally. An external resource is recognised in the registry through an ExternalLink object, and a managed resource is recognised through an ExtrinsicObject object. An Association object (another kind of ebXML Registry Object) records a link between any pair of Registry Objects, so a full metadata record is associated with a dataset using an Association object which has an ExtrinsicObject and ExternalLink as its two members.
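
A minimal Python sketch of that linkage follows. The field names echo the source/target pairing of an ebRIM Association, but the "describes" association type, the ids and the helper functions are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Association:
    id: str
    association_type: str  # hypothetical label, e.g. "describes"
    source_object: str     # id of the ExtrinsicObject (the metadata record)
    target_object: str     # id of the ExternalLink (the externally managed dataset)


def datasets_described_by(associations: List[Association], eo_id: str) -> List[str]:
    """ExternalLink ids of the datasets that a given metadata record describes."""
    return [
        a.target_object
        for a in associations
        if a.association_type == "describes" and a.source_object == eo_id
    ]


def records_describing(associations: List[Association], el_id: str) -> List[str]:
    """ExtrinsicObject ids of the metadata records that describe a given dataset."""
    return [
        a.source_object
        for a in associations
        if a.association_type == "describes" and a.target_object == el_id
    ]


associations = [
    Association("as-1", "describes", "eo-iso-record-1", "el-dataset-1"),
    Association("as-2", "describes", "eo-iso-record-2", "el-dataset-1"),
]
```

Because the Association is itself a Registry Object, the link can be added, queried or removed without touching either the metadata document or the external dataset.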

-- SimonCox - 19 Jan 2007/9 Mar 2007

Further to the idea of a registry entry pointing to the full ISO record, this seems to me to easily allow multiple entries pointing to the same record. Thus, more tightly-focussed and useful entries may be created, umm, just like library card catalogues!

-- AndyDent - 08 Feb 2007

Issues: implementing flat queries

Something that is also overlooked in catalogue services, but is an essential part of the Z39.50 and OGC abstract catalog models, is the notion of abstraction. It is precisely this extraction of 'flat' query points from a richer structure that enables discovery, including search on non-text properties such as geographic coordinates and dates. It results in the presentation of the less-than-full record and ultimately, if it's of interest, the full record. What is still not clear to me is how consistently ExtrinsicObjects might be queried, though this seems a possibility. The Catalog WG has realized that any widespread useful adoption of the ebXML RIM approach will require some carefully scripted, illuminating examples or use cases to inform all aspects of the provider/consumer interaction with the registry. As you say, the model is very capable but capability and diversity of interpretation won't necessarily lead to technical interoperability.

-- DougNebert - 14 Mar 2007 - 02:17

The way I see this is:
  • given a standardized ebRIM profile for GI data (so we know which items of info are found in which ROs)
  • standard GI-queries should be enabled by matching ebRS "Stored queries"

But, as you say, there must be some worked examples to illustrate this.

While being generally an ebXML booster, one of my chief concerns in the OGC context is the neglect of the ebRS side of the equation. This is surprising really since the primary OGC focus is interfaces, not information models.

In this spirit, I suggest that an ebXML Registry profile of OGC Catalogue should focus on an ebRS binding (i.e. implementing the CS operations including core-queries as ebRS stored-queries) ... alongside the HTTP, Z39.50 and CORBA bindings. Of course the ebRS query depends on the ebRIM profile used, but making the query the goal would focus work better.

Then a CS implementation could be layered over an ebXML implementation, with the latter just serving to abstract the general registry function from the persistence layer (typically an RDBMS of course).
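
To make the stored-query idea concrete, here is a small Python sketch in which a client invokes a named query with flat parameters and never sees how the index is structured. It is not ebRS: the query names `findByTitle` and `findByBBox`, the `invoke_stored_query` function and the in-memory index are all hypothetical, standing in for whatever an agreed profile would define.

```python
from typing import Callable, Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

# Hypothetical flat index: one dict of harvested values per ExtrinsicObject.
INDEX = [
    {"id": "eo-1", "title": "Example geology map", "bbox": (110.0, -45.0, 155.0, -10.0)},
    {"id": "eo-2", "title": "Bathymetry grid", "bbox": (140.0, -50.0, 170.0, -30.0)},
]


def _bbox_intersects(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def _find_by_title(text: str) -> List[str]:
    return [r["id"] for r in INDEX if text.lower() in r["title"].lower()]


def _find_by_bbox(bbox: BBox) -> List[str]:
    return [r["id"] for r in INDEX if _bbox_intersects(r["bbox"], bbox)]


# The profile would fix these names and parameters; the bodies are free to
# change with the underlying information model.
STORED_QUERIES: Dict[str, Callable[..., List[str]]] = {
    "findByTitle": _find_by_title,
    "findByBBox": _find_by_bbox,
}


def invoke_stored_query(query_id: str, **params) -> List[str]:
    """Run a named query; the caller supplies only flat parameters."""
    return STORED_QUERIES[query_id](**params)


title_hits = invoke_stored_query("findByTitle", text="geology")
bbox_hits = invoke_stored_query("findByBBox", bbox=(160.0, -40.0, 165.0, -35.0))
```

The point of the design is that the query names and parameters form the interface, so the same calls could in principle be served by any implementation that honours the profile.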

-- SimonCox - 14 Mar 2007

I now see that I got tangled up in mechanics, and perhaps didn't see the issue that you were raising. A "flat" query also implies not needing to know the resource structure.

This is a special case of decoupling the query and response model.

-- SimonCox - 21 Aug 2007
Topic revision: r9 - 15 Oct 2010, UnknownUser
 

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).