Strong- vs weak- typing for features
See also a discussion of strong- vs weak- typing for properties
Designing a Feature-type catalogue requires that the set of feature types of interest to the community be identified, and content models for these types be agreed.
The number of members of the feature-type catalogue is determined by the breadth of the domain of interest, and the level-of-detail at which modelling takes place.
The Feature Model
The General Feature Model (ISO 19109) defines feature types through their set of properties (attributes and operations).
The set of properties represents the consensus of a community concerned with this feature type: e.g. a "Road" may have the standard properties "width", "centre-line", "classification", "surface treatment" etc.
This approach to classification is the "classical" one. (Lakoff, in Women, Fire and Dangerous Things, summarises work in cognitive linguistics that challenges the classical approach.)
In practice, it requires that the type of a feature instance is determined before it can be described.
The feature model may be instantiated in either a "strong-typed" or "weak-typed" (also known as soft-typed) manner.
When applying the strong-typing pattern, the feature type is assigned or determined first, then the necessary properties are listed and values provided.
In an XML implementation, this can result in relatively simple feature instances where the feature-type is given by an XML element name, and the properties are subelements with appropriate names.
<Road>
  <description> ... </description>
  <surfaceTreatment> Bitumen </surfaceTreatment>
</Road>
This is the usual GML encoding pattern.
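As a minimal sketch of how a consumer might read such an instance, the following Python reads the feature type from the element name and the properties from the subelement names. The wrapping Road element and the function name read_strong_typed are assumptions made for illustration; a real GML instance would also carry namespaces and identifiers.

```python
import xml.etree.ElementTree as ET

# Hypothetical strong-typed instance; the <Road> wrapper is assumed here.
STRONG_ROAD = """<Road>
  <description>...</description>
  <surfaceTreatment>Bitumen</surfaceTreatment>
</Road>"""


def read_strong_typed(xml_text):
    """Return (feature_type, properties) for a strong-typed feature."""
    root = ET.fromstring(xml_text)
    # The feature type is carried by the element name itself.
    feature_type = root.tag
    # Each subelement name is a property name; its text is the value.
    properties = {child.tag: (child.text or "").strip() for child in root}
    return feature_type, properties


feature_type, properties = read_strong_typed(STRONG_ROAD)
print(feature_type, properties)
```

Because the names are fixed by the type, software can be written against "Road" and its known properties without inspecting any attribute values.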
The strong-typing approach is particularly suitable for domains where the items of interest are artefacts that are created or asserted (e.g. constructions, observation features), so their type is known a priori.
Otherwise, the strong-typed features may be thought of as "snapshots" of features representing the "current interpretation and model".
When applying a full weak-typing pattern, a generic feature has generic properties, and maybe a property that specifies (or refines) the "feature-type" itself.
The value of this feature-type parameter may be selected from a controlled source (list or authority table), but does not directly affect other aspects of the feature description.
For maximum flexibility the feature has an unlimited number of properties, which are themselves soft-typed.
For a particular feature type, the weak-typed model should be isomorphic with the equivalent strong-typed model (i.e. with the same graph or tree structure) but the semantics are carried by a slightly different syntax.
In an XML implementation this leads to less tidy instances, where semantic information occurs as element and attribute values rather than in tag names, for example:
<property name="description" type="string"> ... </property>
<property name="surfaceTreatment" type="token">Bitumen</property>
Like the strong-typed equivalent above, the data model for "Road" is exposed in the data instance, through the labelling and nesting of elements.
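The isomorphism can be illustrated with a small Python sketch that reads a weak-typed instance into a feature type plus a set of named values. The generic Feature wrapper and the use of a property named "featureType" to carry the type label are assumptions for illustration only, as is the function name.

```python
import xml.etree.ElementTree as ET

# Hypothetical weak-typed instance. The <Feature> wrapper and the
# "featureType" property are assumed, not taken from a published schema.
WEAK_ROAD = """<Feature>
  <property name="featureType" type="token">Road</property>
  <property name="description" type="string">...</property>
  <property name="surfaceTreatment" type="token">Bitumen</property>
</Feature>"""


def read_weak_typed(xml_text):
    """Return (feature_type, properties) for a weak-typed feature."""
    root = ET.fromstring(xml_text)
    values = {}
    for prop in root.findall("property"):
        # The property name is an attribute value, not an element name.
        values[prop.get("name")] = (prop.text or "").strip()
    # The type label is just another value; fall back to the generic tag.
    feature_type = values.pop("featureType", root.tag)
    return feature_type, values


print(read_weak_typed(WEAK_ROAD))
```

The result has the same shape as would be obtained from a strong-typed Road element, but every name had to be looked up in the data rather than in the markup.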
The weak-typing pattern is much more flexible at "run-time" (i.e. in the instance), but provides no constraints on the structure according to feature-type.
The weak-typing approach is particularly suitable for domains where the classification of items of interest emerges from their properties, or where varying or evolving classification methodologies apply.
The weak-type provided by the data supplier may even be treated as a "hint", and the user may take responsibility for classification on the basis of the properties and their values.
In this encoding, the way that such a feature might be built on-demand by a set of "joins" to a narrow database table of properties (see Data source considerations below) is quite obvious.
Because of this, we can refer to the weak-typed model as more "normalised".
In common with highly normalised database table schemas, the additional flexibility comes with an attendant processing burden.
Typing information == schema information
In the soft-typed pattern the encoding in the data instance is closer to a "schema-level" view.
The typing information in the soft-typed version is essentially a "mini-schema language", using attributes and properties called (or functionally equivalent to) "name" and "type" etc.
For example, the weakly-typed Road instance shown above is almost the same as the W3C XML Schema that defines the strong-typed version:
<element name="description" type="string"/>
<element name="surfaceTreatment" type="token"/>
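To make the correspondence concrete, here is a hedged Python sketch that turns the "name" and "type" attributes of weak-typed property elements into element declarations of the kind shown above. The Feature wrapper and the function name schema_declarations are assumptions; a real generator would also need namespaces, type prefixes and an enclosing complex type.

```python
import xml.etree.ElementTree as ET

# Hypothetical weak-typed instance; the <Feature> wrapper is assumed.
WEAK_PROPERTIES = """<Feature>
  <property name="description" type="string">...</property>
  <property name="surfaceTreatment" type="token">Bitumen</property>
</Feature>"""


def schema_declarations(xml_text):
    """List one element declaration per weak-typed property."""
    root = ET.fromstring(xml_text)
    declarations = []
    for prop in root.findall("property"):
        # The instance-level "name" and "type" map directly onto the
        # attributes of a schema element declaration.
        name, type_ = prop.get("name"), prop.get("type")
        declarations.append(f'<element name="{name}" type="{type_}"/>')
    return declarations


for line in schema_declarations(WEAK_PROPERTIES):
    print(line)
```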
Another way of thinking about it is that putting schema-level information in the instance amounts to a kind of "just-in-time" schema.
But if this is all we are doing, then why not generate a just-in-time XML Schema? That is:
- the instance would be strong-typed, but
- the schemaLocation would refer to an XML Schema generated dynamically, for only this instance
The disadvantages of this full-on just-in-time XML Schema approach are:
- the "feature-type" catalogue becomes largely dynamic, rather than agreed in advance by a domain community, so
- processing software must be fully XML schema-aware, and cannot be pre-configured for specific feature types.
A sensible "soft-typing" approach permits limited "schema" information at selected positions in an instance. This allows us to retain a set of generic feature-types, for which consuming software can be pre-configured if desired, with flexibility constrained to the necessary, more reasonable level.
Generic feature-types vs. Root feature-types
The different capabilities of XML Schema and UML lead to slightly divergent approaches in determining the higher-level feature types.
The constraint language is an integral part of XML Schema, whereas it is an add-on (in the form of OCL) in UML.
In UML the abstract types at the top of a hierarchy tend to have few attributes, and specialisation simply involves adding attributes.
This corresponds with the XML Schema "extension" mechanism.
In GML languages based on XML Schema we tend to introduce high-level types with many generalised attributes, and then specialise by restriction as well as extension.
The "restriction" phase (which can only be implemented using a constraint language in UML) is effectively where weakly-typed, but fully capable, objects are turned into strongly-typed objects.
So in GML the high-level types serve two functions:
- to act as root objects at the top of some derivation tree
- to act as weakly-typed objects which may cover a range of more strongly-typed objects.
These two objectives are usually separated more in the UML-based models: the root objects have semantically meaningful names but very few attributes, whereas if a fully-capable object type is required then it is the end of a derivation chain, not the beginning.
Members of the feature-type catalogue
So how "generic" should our implemented feature-types be?
The examples shown above are in fact merely positions on what will normally be a continuum of types.
For example, the hierarchies:
show sequences in which the set of properties on a sample might be progressively refined.
In the first sequence it is pretty clear that, towards the more specialised end, data that should probably be the values of properties is sneaking up into the type classification.
Such specialised typing would certainly be a scalability and maintenance nightmare.
But exactly where the line should be drawn requires a more thorough exploration of some use cases and other issues.
Consider some example instances of the latter two:
<numericProperty property="frictionAngle" uom="deg">20.1</numericProperty>
<numericProperty property="cohesion" uom="MPa">1.3</numericProperty>
<numericProperty property="density" uom="kg.m-3">2450.</numericProperty>
<categoryProperty property="taste" codeSpace="http://www.ga.gov.au/lists/tastes">sour</categoryProperty>
The second example only contains a subset of the properties that are available and shown in the first.
Let us look at these alternative encodings from a number of points of view.
Data source considerations
In a rock-property database it is unlikely that the set of measurements on each specimen is homogeneous.
Furthermore, it is questionable whether the set of properties that might be measured can even be enumerated.
Thus, most datastores are likely to be highly normalised, with a narrow table with columns such as
In order to report such data in the context of a feature-instance, the following options are possible:
- report all values in a weak-typed feature such as RockWithProperties shown above.
- select the values required to populate a strong-typed feature such as RockWithMohrCoulombProperties.
It is not at all clear that there would be a significant burden on the data provider either way.
In fact, in many observational settings, the properties being observed, or the exact set involved in a particular observation, is highly variable from observation to observation.
There are a few cases (e.g. gravity measurements?) where the complexity of the observation metadata, and historical practice, suggest that a strongly typed measurement type is appropriate.
But in general, every geophysics survey is likely to have a slightly different suite of observations, and the same applies to geochemistry.
In principle it is possible to define strongly-typed data structures for each specific measurement suite.
But, just as a "normalised" approach is often taken in storing such data, a soft-typed approach, where the observed property is a variable as well as the value, is more scalable and ultimately more practical.
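The two reporting options listed above can be sketched against a narrow property table using Python's built-in sqlite3 module. The table name, column names and specimen identifier are hypothetical; the property names and values are those used in the example instances earlier on this page.

```python
import sqlite3

# Hypothetical narrow table: one row per (specimen, property) measurement.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rock_property "
    "(specimen TEXT, property TEXT, value REAL, uom TEXT)"
)
conn.executemany(
    "INSERT INTO rock_property VALUES (?, ?, ?, ?)",
    [
        ("s1", "frictionAngle", 20.1, "deg"),
        ("s1", "cohesion", 1.3, "MPa"),
        ("s1", "density", 2450.0, "kg.m-3"),
    ],
)


def weak_typed(specimen):
    """Option 1: report every stored value as a generic property."""
    rows = conn.execute(
        "SELECT property, value, uom FROM rock_property "
        "WHERE specimen = ? ORDER BY rowid",
        (specimen,),
    )
    return [{"property": p, "value": v, "uom": u} for p, v, u in rows]


MOHR_COULOMB = ("frictionAngle", "cohesion")


def strong_typed(specimen):
    """Option 2: select only the values a fixed content model needs."""
    found = {row["property"]: row["value"] for row in weak_typed(specimen)}
    missing = [name for name in MOHR_COULOMB if name not in found]
    if missing:
        raise ValueError(f"specimen {specimen} lacks {missing}")
    return {name: found[name] for name in MOHR_COULOMB}


print(weak_typed("s1"))
print(strong_typed("s1"))
```

In this sketch the strong-typed report is only a selection over the same rows, which is consistent with the observation that neither option places a large burden on the provider.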
Data discovery considerations
... registries must be aware of type specialisation hierarchies - must ensure that we find specialised types when looking for more general types ...
Data retrieval considerations
A client for the data is likely to have a purpose in mind for which they require a specific set of properties.
If we assume that the features would be delivered through a WFS, then formulation of requests is the most significant issue.
For a client requiring values of Mohr-Coulomb parameters for a sample, the WFS request against the most strongly typed feature is relatively straightforward:
would be most of what you need. Note that no <wfs:PropertyName> elements are required within the Query clause when the client wants the full set of mandatory properties for a RockWithMohrCoulombProperties feature.
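A request of this kind might be assembled as in the following Python sketch. It assumes the WFS 1.1.0 namespace and an unqualified feature type name; the function name is hypothetical, and a real request would normally also bind a namespace prefix for the feature type.

```python
import xml.etree.ElementTree as ET

# WFS 1.1 namespace; the unqualified typeName below is an assumption.
WFS = "http://www.opengis.net/wfs"
ET.register_namespace("wfs", WFS)


def strong_typed_request(type_name):
    """Build a GetFeature request that asks for one feature type."""
    request = ET.Element(
        f"{{{WFS}}}GetFeature", {"service": "WFS", "version": "1.1.0"}
    )
    # No PropertyName children: the type itself fixes the properties.
    ET.SubElement(request, f"{{{WFS}}}Query", {"typeName": type_name})
    return request


request = strong_typed_request("RockWithMohrCoulombProperties")
print(ET.tostring(request, encoding="unicode"))
```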
On the other hand, construction of a request in respect of the feature with soft-typed properties is more difficult.
(dunno if I have the namespaces right here)
Such a request asks for features, reporting all (?) numericProperty properties, where the selected features must satisfy the condition that the property attribute of a numericProperty property element must match the values "frictionAngle", "cohesion" etc.
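One possible shape for such a request is sketched below in Python. It assumes the WFS 1.1.0 and OGC Filter namespaces, a feature type called RockWithProperties, and a server that accepts an attribute path such as numericProperty/@property in a filter; none of these is guaranteed, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

WFS = "http://www.opengis.net/wfs"
OGC = "http://www.opengis.net/ogc"
ET.register_namespace("wfs", WFS)
ET.register_namespace("ogc", OGC)


def soft_typed_request(type_name, property_names):
    """Build a GetFeature request filtered on soft-typed property names."""
    request = ET.Element(
        f"{{{WFS}}}GetFeature", {"service": "WFS", "version": "1.1.0"}
    )
    query = ET.SubElement(request, f"{{{WFS}}}Query", {"typeName": type_name})
    # Report the generic numericProperty elements.
    ET.SubElement(query, f"{{{WFS}}}PropertyName").text = "numericProperty"
    filt = ET.SubElement(query, f"{{{OGC}}}Filter")
    # Combine several conditions with And; a single one stands alone.
    if len(property_names) > 1:
        parent = ET.SubElement(filt, f"{{{OGC}}}And")
    else:
        parent = filt
    for name in property_names:
        test = ET.SubElement(parent, f"{{{OGC}}}PropertyIsEqualTo")
        # Assumed path syntax for the "property" attribute.
        path = ET.SubElement(test, f"{{{OGC}}}PropertyName")
        path.text = "numericProperty/@property"
        ET.SubElement(test, f"{{{OGC}}}Literal").text = name
    return request


request = soft_typed_request("RockWithProperties", ["frictionAngle", "cohesion"])
print(ET.tostring(request, encoding="unicode"))
```

Even in this simplified form the client has to express in a filter what the strongly typed request states with a single type name.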
Governance and maintenance of community schema
... more detailed models, stronger typing, type-proliferation → more onerous governance and maintenance framework, probably a smaller community ...
... more generic models, weaker typing, fewer types → may support larger, less specialised communities, once the catalogue is designed there will be less maintenance, but requires that users can cope with higher levels of abstraction (back to point-, line- and polygon-features!) ...
Pros and Cons
The use of strong- and weak-typing patterns has various advantages and disadvantages.
Maintenance of the catalogue
From the point of view of the community responsible for maintaining the catalogue of feature types:
- Strong-typing requires that an exhaustive catalogue of feature type definitions, including their property sets, is maintained. Refinement of the application multiplies the number of feature types that must be defined and maintained. Versioning must be handled carefully.
- Weak-typing requires only that a vocabulary of feature-type labels be maintained. Adding an item to a list is easier than creating a complete type definition, so versioning is not so onerous or risky. However, the relationship between classifiers and content models is indirect.
- Under strong-typing, the definition of a feature type is reflected directly in the data structure. For example, using XML the feature type is fully described by the XML Schema for the element representing this feature type. Validity of feature instances is fully constrained by schema validation.
- Under weak-typing, the set of properties is not constrained by structural validity. Feature-instance production is described by rules, rather than a grammar.
- In a strong-typing environment, the consuming software will typically be pre-configured for feature types from the community catalogue. The type of the feature provides an explicit switch for parsing and processing, and the validity of the structure is easily verified. Alternatively, the consuming software may be auto-configured by processing the relevant community schema, in which case access to the latter and knowledge of its schema language is the only requirement.
- Weakly-typed data is more highly normalised, and consumption depends on (often repetitive) application of more generic operations. However, the semantics are indirect, and use of the data typically requires traversal of multiple structures (tables) and reconstruction of data structures requires more processing.
- The status quo has been weakly typed structures with varying degrees of formal control over the vocabularies used, and no common technology approach for sharing such vocabularies. This has proven sufficiently ineffective at achieving interoperability that there is wide recognition that alternatives are required.
- It must be determined, possibly on a case-by-case basis, whether the effort of moving to formalised semantics and an interoperable syntax is better spent on an upgrade to weakly typed, better managed patterns (with only incremental benefit) or on a move to a strongly typed pattern (a radical change with potentially large benefits).
- There is, as yet, a relatively small body of experience in managing, and even more particularly using, shared vocabularies. There is less experience in managing strong feature typing regimes. SEEGrid and related projects are directly addressing this issue.
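The point made in the list above, that weak-typed instances are governed by rules rather than by a grammar, can be illustrated with a short Python sketch. The rule table and function name are hypothetical; in practice the rules would come from the community's vocabulary and might constrain value types and units as well.

```python
# Hypothetical rule table: feature-type label -> property names it requires.
RULES = {
    "Road": {"description", "surfaceTreatment"},
    "RockWithMohrCoulombProperties": {"frictionAngle", "cohesion"},
}


def check_weak_typed(feature_type, properties):
    """Return a list of rule violations for a weak-typed feature."""
    if feature_type not in RULES:
        return [f"unknown feature-type label: {feature_type}"]
    # Extra properties are allowed; only required ones are checked.
    missing = sorted(RULES[feature_type] - set(properties))
    return [f"missing property: {name}" for name in missing]


print(check_weak_typed("Road", {"description": "...", "surfaceTreatment": "Bitumen"}))
print(check_weak_typed("Road", {"description": "..."}))
```

A strong-typed instance would have the same check performed by schema validation, with no application code involved.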