
Schema documents, namespaces and validation

The role of schema validation

The GML representation of an Application Schema usually encapsulates many, but not all, of the rules that constrain the model for feature and other object types in an application domain. Certain desirable validation factors cannot be described using W3C XML Schema. For example, co-constraints, where the value of a node is constrained by a value in a sibling branch of the document, require additional processing, such as that provided by Schematron. Furthermore, higher-level conceptual constraints, such as those imposed by the GML Feature model and the Object-property encoding pattern, cannot be expressed in the schema.

Other apparently desirable constraints turn out, on further analysis, to be business or dataflow logic that is not appropriate in a static information model. Thus, it is often prudent to resist the temptation to describe such logic in the information model, even if it is apparently possible.

Thus, schema validation is usually necessary, but not sufficient, to ensure valid data instances.
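
The co-constraint case can be illustrated with a minimal sketch in Python. The element names interval, top and base are hypothetical and are not taken from XMML or GML. The rule that base must not be less than top relates two sibling values, so it is checked here in a separate pass after parsing, which is roughly the role a Schematron rule would play.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance fragment: the element names are illustrative only.
DOC = """
<intervals>
  <interval><top>10.0</top><base>12.5</base></interval>
  <interval><top>20.0</top><base>18.0</base></interval>
</intervals>
"""

def co_constraint_violations(xml_text):
    """Return the 1-based positions of intervals whose base is less than top."""
    root = ET.fromstring(xml_text)
    bad = []
    for position, interval in enumerate(root.findall("interval"), start=1):
        top = float(interval.findtext("top"))
        base = float(interval.findtext("base"))
        if base < top:  # the value of one node constrains its sibling
            bad.append(position)
    return bad

violations = co_constraint_violations(DOC)
```

Each value on its own is a valid number, so a per-element type check would accept both intervals. Only the comparison between siblings picks out the second one.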

Schemas vs schema documents

The role of XML schema documents is summarized in the following extract from the W3C XML Schema recommendation: "An XML Schema consists of components such as type definitions and element declarations. ... XML Schemas can be described in terms of an abstract data model ... To facilitate interoperation and sharing of schema information, a normative XML interchange format for schemas is provided". Elsewhere it states "A schema is represented in XML by one or more ·schema documents·, that is, one or more <schema> element information items. A ·schema document· contains representations for a collection of schema components, e.g. type definitions and element declarations, which have a common {target namespace}. A ·schema document· which has one or more <import> element information items corresponds to a schema with components with more than one {target namespace}".

The "XMML schema" comprises the components that are described in this documentation. The normative representation of the XMML schema uses the XML interchange format provided by W3C XML Schema. The descriptions of the set of components are factored into schema documents, as shown in XmmlSchemaDependencies, where each document gathers together components which are thematically related.

However, while the XML representation of each XMML schema component is normative, the packaging into schema documents is not. Schema components may be repackaged in various ways into schema documents for the convenience of particular applications. Depending on the processing engine used, the dependency of components on other components that are being re-used will often impose constraints on packaging into documents.

Namespaces

XML namespaces provide a means to identify the source of definitions of components in an XML document. A side-effect of this is the mechanism for avoiding ambiguity arising from name clashes within XML documents composed of elements and attributes defined by more than one authority. A namespace has a unique identifier.

The identifier for the XMML namespace is http://www.opengis.net/xmml.

Namespace identifiers are commonly constructed with the form of a URI, often a URL. This is convenient, since if people only construct namespaces in a domain that they have some control over, it is a mechanism that allows decentralised construction of namespace identifiers with little risk of clashes. However, it is important to realise that in this context, even though it may look like a dereferenceable URL, the namespace identifier is merely a unique text string and there should be no other assumption about its significance or usefulness.

For background, see the W3C Namespaces spec. Also see David Orchard's blog entry on this topic. The W3C note on URIs, URLs, and URNs is also interesting.

Versioning

There are significant design issues, and no universally accepted convention, around the area of schema versions and namespace identifiers. Many authorities, including TBL of W3C, OASIS, UN and the US Navy propose that versioning information (or at least a datestamp) should usually be visible in (namespace) identifiers.

However, respected observers including Dave Orchard and Dare Obasanjo of Microsoft (summarised here) discuss some practical needs for making documents forward compatible with v2 schemas, which seem to point in the other direction. We will be returning to this issue when the opportunity arises.

Within OGC there has been a lot of discussion of namespace identifiers for versions and profiles, particularly triggered by increased use of GML profiles - see Namespaces for versions and profiles of XML Schemas.

Namespace prefixes

Within each XML document a local prefix is assigned for each namespace used, using an attribute in the xmlns namespace such as
xmlns:gml="http://www.opengis.net/gml"
xmlns:xmml="http://www.opengis.net/xmml"
which assign the prefix gml for http://www.opengis.net/gml, and xmml for http://www.opengis.net/xmml. Namespace-qualified names then appear within the document in "colon-ised" form, e.g. gml:description.

The namespace declaration may appear as an attribute on any element in an XML document. This binds the prefix to the namespace for the scope of the element. It is most common for namespace declarations to appear on the root element, so the scope of the declaration is the whole document.

While there may be a "conventional" namespace-prefix for many well known namespaces, the actual prefix is local to a document. Thus, "xsd:element" in one document may be completely equivalent to "xs:element" in another document, providing the prefixes "xsd" and "xs" are bound to the same namespace identifier. Thus, a processor should always replace the prefix with the full identifier internally. Strictly speaking, "gml:description" is a lexical shorthand for the tuple that might be expressed more fully as (http://www.opengis.net/gml , description). The corollary is that a processor should never assume a specific namespace prefix.
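
The prefix-to-identifier replacement described above can be sketched in a few lines of Python. The function name expand and the shape of the bindings dictionary are assumptions made for this illustration. A real processor would take the bindings from the xmlns declarations in scope at each element.

```python
GML_NS = "http://www.opengis.net/gml"
XSD_NS = "http://www.w3.org/2001/XMLSchema"

def expand(qname, bindings):
    """Resolve 'prefix:local' to a (namespace, local) tuple.

    `bindings` maps each in-scope prefix to its namespace identifier;
    the empty string stands for the no-prefix namespace.
    """
    prefix, _, local = qname.rpartition(":")
    return bindings[prefix], local

# "gml:description" is shorthand for this tuple.
expanded = expand("gml:description", {"gml": GML_NS})

# Different prefixes bound to the same identifier give the same expanded name.
same = expand("xsd:element", {"xsd": XSD_NS}) == expand("xs:element", {"xs": XSD_NS})
```

Comparing the expanded tuples, rather than the prefixed strings, is what lets a processor avoid assuming any particular prefix.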

Within each document the null string may be used as the prefix for one namespace. In this case the namespace-prefix assignment looks like
xmlns="http://www.w3.org/2001/XMLSchema"
For compactness and readability, the "no-prefix namespace" is usually the namespace contributing the most names within a document. For example, XML Schema documents often leave the XML Schema namespace itself unprefixed, so element declarations appear unqualified:
   <element name="Mineral" type="xmml:MaterialType" substitutionGroup="xmml:Material"/>
which is equivalent to
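   <xs:element name="Mineral" type="xmml:MaterialType" substitutionGroup="xmml:Material"/>
when the prefix "xs" is bound to the XML Schema namespace. Note that the attributes carry no prefix in either form: a default namespace declaration applies to element names only, and the attributes of XML Schema elements are not in any namespace.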
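
As a rough check of this equivalence, the following Python sketch parses the declaration twice with the standard library parser. The first form has the XML Schema namespace as the no-prefix namespace and the second has it bound to xs. In the prefixed form only the element name takes the prefix, and the attributes are written without one. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
XMML_NS = "http://www.opengis.net/xmml"

# XML Schema namespace bound to the empty prefix (the no-prefix namespace).
unprefixed = ET.fromstring(
    '<element xmlns="%s" xmlns:xmml="%s" name="Mineral" '
    'type="xmml:MaterialType" substitutionGroup="xmml:Material"/>' % (XSD_NS, XMML_NS)
)

# The same declaration with the XML Schema namespace bound to "xs".
prefixed = ET.fromstring(
    '<xs:element xmlns:xs="%s" xmlns:xmml="%s" name="Mineral" '
    'type="xmml:MaterialType" substitutionGroup="xmml:Material"/>' % (XSD_NS, XMML_NS)
)

# ElementTree reports element names in expanded '{namespace}local' form.
same_element = unprefixed.tag == prefixed.tag
# Attribute names are reported without a namespace in both cases.
same_attributes = unprefixed.attrib == prefixed.attrib
```

The parser leaves prefixed attribute values such as xmml:MaterialType as plain strings, so resolving those is left to the schema processor.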

Significance of namespace URIs

XML namespace identifiers are conventionally encoded as a URI. This is commonly given as a URL, which is sometimes taken to imply a network address for the schema documents. This is incorrect: the namespace identifier is only meant to provide a unique string in order to disambiguate names. However, a RDDL directory may be provided at the network address corresponding to a namespace identifier. This will often support resolution of the namespace to the schema documents that describe components in the namespace. For example, the RDDL document at the URL used as the XMML namespace identifier http://www.opengis.net/xmml provides links back to the XMML TWiki and XMML Subversion repository.

Namespaces and schema documents

All components described in a single schema document are in one namespace, indicated by the targetNamespace attribute on the schema element. However, more than one schema document may describe components in a single namespace. The complete schema is composed using <include> elements, which (effectively) copy the declarations from another document with the same targetNamespace into the current document. Includes are transitive for most processors.

Components from another namespace may be used providing they are made available via an <import> element. Note that each specific external namespace may only be introduced once - if the schema document has more than one <import> element for the same namespace, then only one will be processed. The behaviour of processors is unpredictable when several schema documents with the same targetNamespace are composed, and where each imports the same external namespace. Some processors (Xerces C++) only process the first import statement encountered for each namespace. Thus, it is necessary to ensure that the schemaLocation (see below) that is provided points to a document that includes all the components required from the external namespace by all the schema documents in the include tree.
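
The "first import wins" behaviour can be modelled with a short Python sketch. This is a simulation under stated assumptions, not the implementation of Xerces or any other processor. The file names feature.xsd and geometry.xsd are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema document with two imports of the same foreign namespace.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.opengis.net/xmml">
  <xs:import namespace="http://www.opengis.net/gml" schemaLocation="feature.xsd"/>
  <xs:import namespace="http://www.opengis.net/gml" schemaLocation="geometry.xsd"/>
</xs:schema>
"""

def effective_imports(schema_text):
    """Map each imported namespace to the first schemaLocation seen for it."""
    chosen = {}
    for imp in ET.fromstring(schema_text).iter(XS + "import"):
        # setdefault keeps the first location and ignores later ones.
        chosen.setdefault(imp.get("namespace"), imp.get("schemaLocation"))
    return chosen

imports = effective_imports(SCHEMA)
```

Under this model geometry.xsd is never read, so any component defined only there would be unavailable. This is why the single location that is honoured needs to reach every required component.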

Namespaces and applications

Within the XML development community there are precedents for assigning either one or several namespaces to a set of schema components for a single application. The use of a single namespace for XMML is consistent with the non-normative factoring of the XML representation of components between schema documents. Note, however, that since XMML reuses GML components, XMML instance documents will contain elements from both namespaces, plus any others that may be imported either directly, or transitively (such as xlink).

UML Packages vs XML namespaces/schema documents

UML uses packages to collect related components in a way which is similar to the packaging of XML component representations into schema documents. However, for the reasons given above, the packaging of schema components into schema documents is non-normative, so there is no necessary correspondence between UML packages and XML schema documents.

Furthermore, within the ISO 19100 series of International Standards, prefixes following the pattern "AA_" are used to distinguish classes from different packages in a way that resembles XML namespaces. It might be useful to preserve this similarity.

Locations of schema documents

Two different attributes concerning the locations of schema documents appear in XML documents:

  • in schema documents, the <include> and <import> elements carry an unqualified schemaLocation attribute. Its value indicates the location of a schema document containing descriptions of components in the local target namespace (for <include>) or in an external or foreign namespace (for <import>).
  • in instance documents, any element may carry an xsi:schemaLocation attribute, though it is common to put them all on the root element. Its value is a sequence of two-member tuples. Within each tuple the first element is an identifier of a namespace used in the document, and the second indicates the location of a schema document containing descriptions of components in the namespace. As with all namespace-related XML attributes, this is scoped to the context of the element that carries the attribute, and its descendants.

In each case the purpose of the schemaLocation attribute is effectively to provide a schema document which is the source of components in a particular namespace. The schema document is indicated by its location. The location is given as a URI. This may be in the form of an absolute or relative path, a URL indicating a network-accessible resource, or another URI. Note, however, that a URI resolver may redirect the reference to a local resource - see XML Catalogs below.
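
The tuple structure of the xsi:schemaLocation value can be unpacked as in the following Python sketch. The Borehole element name and the example.org schema locations are placeholders chosen for illustration. Only the two namespace identifiers come from the text above.

```python
import xml.etree.ElementTree as ET

XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"

# Hypothetical instance document; the schema document URLs are placeholders.
INSTANCE = """
<xmml:Borehole xmlns:xmml="http://www.opengis.net/xmml"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.opengis.net/xmml http://example.org/xmml/xmml.xsd
                        http://www.opengis.net/gml http://example.org/gml/gml.xsd"/>
"""

def schema_locations(element):
    """Return {namespace identifier: schema document location} for one element."""
    tokens = element.get(XSI_SCHEMA_LOCATION, "").split()
    # Tokens alternate: namespace, location, namespace, location, ...
    return dict(zip(tokens[0::2], tokens[1::2]))

locations = schema_locations(ET.fromstring(INSTANCE))
```

The value is whitespace-separated, so line breaks inside the attribute are harmless. An odd number of tokens would indicate a malformed value, which this sketch does not check for.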

Path conventions used in XMML

In the XMML distribution, certain conventions are followed for the design of URIs used in schemaLocation values.

Note that these conventions are the goal. Due to the constraints of the development environment (we do not have Catalog support for all processors), the external locations appearing within the documents in the repository are currently set to relative paths. Thus, to use these schemas directly, your directory structure must be a mirror of the repository tree.

include and import within schema documents

  • where a schema document <import>s another document describing components in a foreign namespace, the schemaLocation URI is absolute, preferably using the path to the definitive repository maintained by the schema custodian, except when the reference is to a "stub" schema as described below;
  • where a schema document <include>s another document describing components in the same namespace, the schemaLocation URI is a relative path, except within a "stub" schema as described below.

This strategy relies on stable local relative paths, but allows a set of schema documents for a single namespace to be moved "en masse" while preserving the interdependencies.
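
The effect of moving a set of documents can be sketched with standard URI resolution in Python. All URLs here are hypothetical, and the helper name resolve is an assumption for this illustration. A relative schemaLocation is resolved against the URI of the document that contains it, while an absolute one is unaffected by where that document lives.

```python
from urllib.parse import urljoin

def resolve(containing_document_uri, schema_location):
    """Resolve a schemaLocation value against the document that carries it."""
    return urljoin(containing_document_uri, schema_location)

# A relative <include> follows the including document wherever it is moved.
original = resolve("http://example.org/schemas/xmml/borehole.xsd", "geometry.xsd")
moved = resolve("http://mirror.example.net/copy/xmml/borehole.xsd", "geometry.xsd")

# An absolute <import> keeps pointing at the custodian's repository.
imported = resolve(
    "http://mirror.example.net/copy/xmml/borehole.xsd",
    "http://example.org/gml/feature.xsd",
)
```

After the move, the include resolves inside the copied tree while the import still reaches the original foreign schema.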

Instance documents

  • within data instances, all URIs indicating paths to schemaLocations should be given in absolute form, preferably using a network accessible form.

Note, however, that a URI resolver may redirect the reference to a local resource - see XML Catalogs below.

Stub schemas

As noted above, more than one schema document may describe components in a single namespace. An instance document may require access to several components from a namespace which are not accessible by binding a single schema document to the namespace in an xsi:schemaLocation attribute. Similarly, a schema document may require access to several components from a foreign namespace which are not accessible by importing a single external schema document. In these cases the following method may be used.

The components are pre-collected in a "stub" schema document, on which the targetNamespace is the required (foreign) namespace, and which is composed otherwise simply of a series of <include> elements - e.g. see XmmlSchemaRepository:trunk/XMML/gml4coverage.xsd. Each stub schema is constructed as-needed for specific schemas in the local namespace, so functionally is part of the distribution package required for the local namespace. Thus the stub-schema will usually be moved as part of the package describing the local namespace, rather than as part of the foreign namespace.

The following convention supports this use-case:
  • within the stub schema, the schemaLocation URI on each <include> is absolute
  • where a schema document <import>s a stub schema, the schemaLocation URI is relative

This strategy allows a set of schema documents for a single namespace to be moved "en masse" while preserving the interdependencies.
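
The shape of a stub schema can be sketched by generating one in Python. The include locations below are hypothetical placeholders and do not reproduce the contents of gml4coverage.xsd. The function name build_stub is likewise an assumption for this illustration.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XSD_NS)  # serialise with the conventional prefix

def build_stub(target_namespace, include_locations):
    """Build a schema element that only <include>s documents of one namespace."""
    schema = ET.Element("{%s}schema" % XSD_NS, {"targetNamespace": target_namespace})
    for location in include_locations:
        ET.SubElement(schema, "{%s}include" % XSD_NS, {"schemaLocation": location})
    return schema

# The targetNamespace is the foreign namespace; include locations are absolute.
stub = build_stub(
    "http://www.opengis.net/gml",
    [
        "http://example.org/gml/base/feature.xsd",   # placeholder location
        "http://example.org/gml/base/coverage.xsd",  # placeholder location
    ],
)
stub_text = ET.tostring(stub, encoding="unicode")
```

A schema document in the local namespace would then import this stub by a relative schemaLocation, so the stub travels with the local package.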

XML Catalogs

An OASIS Catalog document may be used to resolve references within XML documents to locatable resources. Most validators support Catalog, provided they are correctly configured. For example, XML Spy users should modify their "CustomCatalog.xml" document to point to local or remote versions of the schema documents, as desired. Other processing environments will use other methods to load the catalog document.

There are two key Catalog use-cases:

Local copies

It may be necessary to redirect remote URLs to a copy of the resource, usually a local one, which is preferred by the user.

This redirect is useful if
  • the user is disconnected from the network, or
  • the user wants to validate using a local copy which has been modified in some useful ways (e.g. suppressing deprecated components).
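
The redirection to local copies can be sketched as a prefix rewrite in Python. This only models the idea and does not reproduce how any validator or Catalog implementation works. The URLs, the local path and the function name rewrite are all hypothetical.

```python
def rewrite(uri, rewrites):
    """Replace the longest matching prefix of uri, or return uri unchanged."""
    matches = [prefix for prefix in rewrites if uri.startswith(prefix)]
    if not matches:
        return uri
    longest = max(matches, key=len)
    return rewrites[longest] + uri[len(longest):]

# Hypothetical mapping from a remote repository to a local working copy.
REWRITES = {"http://example.org/schemas/": "file:///home/user/schemas/"}

local = rewrite("http://example.org/schemas/gml/feature.xsd", REWRITES)
untouched = rewrite("http://other.example.net/a.xsd", REWRITES)
```

Because only the prefix changes, the relative layout of the schema documents is preserved in the local copy.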

URNs

In some cases the URI for schemas in a foreign namespace is given as a URN, such as

  • urn:opengis:specification:gml:schema-xsd:feature:v3.1.0

In order to allow schema validation, the URN (which is a location-independent name for the resource) must be resolved to a path that the processor can follow, such as the network URL

or a local path, such as

  • C:\Documents and Settings\localUser\My Documents\xmml-dev-svn\trunk\gml\base\feature.xsd

Within:

<uri> elements are used to describe the necessary resolution of each URN to the URL where the current network-accessible schema resides.
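
A minimal sketch of this resolution, in Python, is shown below. The catalog content is hypothetical: the URN is the one quoted above, but the example.org URL is a placeholder and not the actual location of the schema. The function names are also assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog; the target URL is a placeholder, not the real location.
CATALOG = """
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="urn:opengis:specification:gml:schema-xsd:feature:v3.1.0"
       uri="http://example.org/gml/3.1.0/base/feature.xsd"/>
</catalog>
"""

def load_uri_map(catalog_text):
    """Collect the name -> uri mapping from the <uri> entries of a catalog."""
    root = ET.fromstring(catalog_text)
    return {e.get("name"): e.get("uri") for e in root.iter("{%s}uri" % CATALOG_NS)}

def resolve(identifier, uri_map):
    """Return the mapped location, or the identifier itself if unmapped."""
    return uri_map.get(identifier, identifier)

uri_map = load_uri_map(CATALOG)
resolved = resolve("urn:opengis:specification:gml:schema-xsd:feature:v3.1.0", uri_map)
```

Pointing the uri attribute at a local path instead of a network URL covers the disconnected case with the same catalog entry.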
Topic revision: r26 - 15 Oct 2010, UnknownUser
 

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).