Observations and Sampling
Much of the information concerning the natural world that is transferred or otherwise shared is based on Observations and Sampling. These provide the evidence
that forms the basis for the description of features, interpretations and models.
In many cases, the details of how the observations were actually made are of little interest, and it is sufficient to embed the results in the description of a feature. However, in other cases, more information is needed. It turns out that, when classified appropriately, the kinds of information associated with reporting an act of observation are common across all disciplines.
Consider the following statements:
- The 7th banana weighed 270gm on the kitchen scales this morning
- The attitude of the foliation at outcrop 321 of the Leederville Formation was 63/085, measured using a Brunton compass/clinometer on 2006-08-08
- Specimen H69 was determined on 1999-01-14 by Amy Bachrach to be of the species Eucalyptus Caesia
- IR image ASgh67c of Camp Iota was obtained by Aster in 2003
- Sample WMC997t collected at Empire Dam on 1996-03-30 was found to have 5.6 g/T Au as measured by ICPMS at ABC Labs on 1996-05-31
- The X-Z Geobarometer determined that the ore-body was at depth 3.5 km at 1.75 Ga
- The simulation run on 2004-09-09 indicated a pressure reduction of 4 MPa in geologic unit Q at 600 Ma
All of these sentences contain the same kinds of information, though not in the same order. (UML object diagrams for these, using the model described in the following sections, are shown in #Examples.)
The structure of this information can be summarized by the Observations and Measurements information model, known as O&M. Version 2 of O&M is published in two parts: ISO 19156:2011 Geographic Information - Observations and Measurements (http://www.iso.org/iso/catalogue_detail.htm?csnumber=32574, also published as OGC Abstract Specification - Topic 20, available for free at http://www.opengeospatial.org/standards/as), with an implementation published as Observations and Measurements - XML Implementation. A provisional OWL ontology for O&M is available from CSIRO.
The model was derived from some earlier work by Fowler and Odell, described in Fowler's 1997 book "Analysis Patterns". It is summarized in the following UML class diagram:
Notation: the GFI_Feature class is a "wildcard" representing feature types from any application schema.
- Observation model as UML class diagram:
This can be put into words as follows:
| An Observation is an action whose result is an estimate of the value of some property of the feature-of-interest, obtained using a specified procedure
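The definition above can be sketched as a small data structure. This is an illustrative Python rendering only, not part of the standard; the class and attribute names simply mirror the UML model:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Observation:
    """An act whose result estimates the value of a property of a feature."""
    feature_of_interest: str  # the feature whose property is being estimated
    observed_property: str    # the property whose value is estimated
    procedure: str            # the process used (sensor, method, algorithm)
    result: Any               # the estimate; its type must suit the property

# "The 7th banana weighed 270gm on the kitchen scales this morning"
obs = Observation(
    feature_of_interest="banana #7",
    observed_property="mass",
    procedure="kitchen scales",
    result="270 g",
)
print(obs.observed_property, "=", obs.result)
```

Each of the example statements at the top of this page decomposes into the same four slots, which is the point of the model.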
The key insights are:
- to separate
  - the observation act from
  - the procedure - which may be used for other observations
  - the feature-of-interest - which has many properties, the values of each of which may be estimated more than once, at different times or using different procedures
- and to recognise that the outcome of an Observation is a result, the value of which constitutes an estimate of the value of a property
The observation provides the context within which a specific procedure is associated with the feature-of-interest. The procedure may be generic and re-usable, or may be once-only with particular bound parameters. SensorML provides a general functional model and XML encoding for describing observation procedures.
The type of the result must be commensurate with the observed property. For example, it may be a scaled number (in which case the act is often referred to as a "Measurement"), textual - including a category (in which case the act may be called a Classification or Category Observation), or geometric (for observations of location, shape, etc). It may be a compound value with structured components. If the property varies in some way within the feature of interest, then the property value, for which the observation result provides an estimate, is a function (i.e. a coverage - see below).
Feature of interest
The concept of feature-of-interest is key to the Observation model. This links observations to the General Feature Model (GFM) (ISO 19109), which formalizes the notion that a feature has a type, which is defined by its characteristic properties. Thus, the observed property must be found within the type of the feature-of-interest of the observation.
A corollary of the GFM is that, at an instance level, every property is ultimately bound to a specific feature. In the context of an Observation this is realized by the mandatory presence of the feature-of-interest. (But see #Unknown_features
for how to represent an observation whose feature-of-interest is not (yet) known.)
Clearly distinguishing the feature-of-interest from the observation enables multiple observations to be made concerning the same feature. These may be repeats for the same observed property at different times or using different procedures, or separate observations of different properties.
Separate encapsulation of the feature-of-interest and the procedure allows these to carry different location information. This supports a common treatment of in-situ observations (sensor positioned with the subject, in the world), remote sensing (sensor separated from the subject, which remains in place), laboratory observations (subject taken from the world and moved to the instrument), simulations (location and time of the feature-of-interest far removed from the machine or algorithm doing the simulation), etc.
The spatio-temporal context for analysis of the state of the world is usually inherent in the feature-of-interest, not the procedure. Thus, "location" is not an unambiguous property of "Observation".
Similar arguments apply to time. In general the time of interest for analysis is the time associated with the real-world phenomena. This may be inherent in the identity of the feature of interest, but most often an observation is a snapshot of an observedProperty which varies with time in the feature.
Three time-stamps are associated directly with the Observation class:
- phenomenonTime is the time for which the observation result provides an estimate of the value of the property in the "real-world"
- resultTime is the time when the result became available (if different)
- validTime is the time period during which the result should be used (this is important where the result is a forecast)
See below for some usage examples.
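The distinction between the three time-stamps is clearest for a forecast. A minimal sketch, with hypothetical values and plain Python dates standing in for ISO 19108 temporal objects:

```python
from datetime import date

# A hypothetical weather forecast issued on 2024-06-01 for 2024-06-03,
# superseded by the next forecast cycle a day later.
forecast = {
    "phenomenonTime": date(2024, 6, 3),  # time the estimate applies to, in the real world
    "resultTime": date(2024, 6, 1),      # when the result became available
    "validTime": (date(2024, 6, 1), date(2024, 6, 2)),  # period during which to use it
}

# The phenomenon time lies *after* the result time: the hallmark of a forecast.
assert forecast["phenomenonTime"] > forecast["resultTime"]
```

For a routine in-situ measurement, by contrast, phenomenonTime and resultTime typically coincide and validTime is usually omitted.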
Observed property and result type
The generic Observation model has a wildcard ("Any") for the result. This follows from the fact that features have many types of property. However, it is required that the result-type be commensurate with the observed-property.
For scalar-valued properties (e.g. mass, length, density, species) that are constant in value for the feature of interest, the result can be encoded as a scaled number, a term from a dictionary, etc.
For complex-valued properties (e.g. time, position, velocity, weather, water quality, material-composition, colour), the result must be structured to reflect this - e.g.
- EPSG:4939 (-31.939, 115.832, 45)
- RGB 112, 34, 118
- XML structured data, etc.
GML structures could be used if appropriate, or perhaps SWE Common, as described in SensorML.
Note that the encoding requires both (a) that the elements are clearly delimited, and (b) that the tuple- or record-structure is indicated in some way.
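One way to satisfy both requirements is to pair the delimited values with a declared record structure. A sketch, with invented field names, using the position and colour examples above:

```python
# A complex-valued result must carry (a) clearly delimited elements and
# (b) an indication of the record structure. Here the structure is a
# tuple of field names declared alongside the tuple of values.
position_result = {
    "recordType": ("latitude", "longitude", "height"),  # (b) structure
    "crs": "EPSG:4939",                                 # reference system
    "values": (-31.939, 115.832, 45.0),                 # (a) delimited elements
}
colour_result = {
    "recordType": ("red", "green", "blue"),
    "values": (112, 34, 118),
}

# The declared structure lets a consumer match each element to its meaning:
fields = dict(zip(position_result["recordType"], position_result["values"]))
print(fields["height"])  # 45.0
```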
Some properties vary within the scope of the identified feature. The variation may be a function of space (e.g. colour within a scene; lithology along a borehole; salinity along an estuary) or time (e.g. temperature at a weather station; salinity at a water quality monitoring station; location of a vehicle).
The value of one of these properties must be expressed as a function or coverage
. Hence, the result of an observation of one of these properties is a coverage. This case is particularly important in remote sensing and environmental monitoring. The sampling-manifolds, described below, are usually associated with variable properties.
There are two common ways to encode a coverage:
- as a function from the "domain" (i.e. spatio-temporal extent delimited by the feature) to the "range"
- this form is commonly used for imagery, where the geometry is a grid and so can be described in a compact form
- the GML XML schema components for coverages follow this approach
- as a list of "geometry-value" pairs, one for each element of the domain
Note that an element of the range (== "value" in the geometry-value pair) may be complex if required.
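The two encodings can be sketched side by side, using illustrative (invented) values for salinity along an estuary:

```python
# Two encodings of the same coverage: salinity at stations along an estuary.

# 1. Domain/range form: the domain and the range are listed in
#    corresponding blocks (compact when the domain is regular, e.g. a grid).
domain = [0.0, 5.0, 10.0, 15.0]   # distance along the estuary, km
range_ = [1.2, 8.4, 19.7, 30.1]   # salinity, PSU

# 2. Geometry-value pairs: one pair per element of the domain
#    (convenient for sparse or irregular sampling).
pairs = list(zip(domain, range_))  # [(0.0, 1.2), (5.0, 8.4), ...]

# The two forms carry the same information and are interconvertible:
assert [v for _, v in pairs] == range_
```

Note that each "value" here is a scalar for simplicity; as stated above, an element of the range may itself be a record.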
The development of WaterML2
in OGC has led to a rigorous examination of O&M. Mostly it has stood up well, but in one area the discussion has been considerable. In water observation time-series current practice is to embed certain 'metadata' related to points within a time-series, such as a flag to indicate different interpolation rules, or to attach specific quality measures or even commentary (e.g. "engineer on site") to individual points within the series. How much of this should be captured in O&M? Opinions differ. Here is a perceptive commentary from Gavin Walker:
O&M starts with the concept of an observation. In its simplest form this is an observation object with a single simple result. Observations can relate to each other, though O&M doesn't say much about the types of relationships. Where individual observations form a temporal series, the related observations could be specialised to include useful information about the relationships between these, such as interpolation parameters, though I can imagine other types of relationships.
Ok. Enter the time series. Here we say that observations form a series in the time domain with a degree of homogeneity. We can take the set of individual observations, pull out all the common bits, and be left with a series with common metadata. There is nothing in the time series that could not have been done with individual observations. The construct is here because it is a common way of thinking.
How is a series represented? There have been two broad schools of thought: interleaved (i.e. time-value pairs) and coverages. We are mostly arguing over the former. The latter represents the data in domain and range blocks and is well suited to uniform and dense data sets. Uniform and dense metadata can be represented by additional coverages or masks, while other metadata can be provided in a coverage containing a reference. Interleaved assumes reasonably sporadic data and provides a mechanism to provide metadata inline. The defaulting mechanism, as you pointed out I-Lin, is about identifying blocks of homogeneity in metadata, which is exactly what O&M time series sought to define. Most of the discussions seem to be around encoding style and not conceptual issues.
I think the biggest problem with the idea of time series is that it creates ambiguity as to whether an observation is a collection or an object - i.e. separating a sheep from the flock. While O&M has the concept of a sampling feature collection, it leaves the concept of an observation collection to SOS. Collections of observations are fundamental to water. I think if we describe properly what an observation collection means for water, and specialise related observations appropriately, then we will get a handle on the concept of blocking.
In many practical cases, observations are not made on the feature of ultimate interest to an investigation, because either or both
- the feature is inaccessible (e.g. concealed, or too large for exhaustive observation)
- this introduces the concept of sampling, whereby observations are made on a subset of the complete feature, with the intention that the sample represents the whole
- the properties are not directly observable (e.g. the feature is remote, or for other reasons does not provide a direct physical signal)
- however, there are sensible properties that may be combined and/or further processed to obtain an estimate of the property of interest
A similar strategy is typically used to overcome both of these challenges, involving the use of a proximate "sampling feature" for initial observations. The sampling feature is accessible and has properties that are sensible. Similar kinds of sampling features have been used to investigate spatial features across various application domains.
Sampling features classified by shape
A primary classification of sampling features is based on their dimensionality (points, curves, surfaces, solids).
- Sampling manifolds model:
A number of domain-specific types are shown informatively on this diagram, to demonstrate how they map to the generic model.
"Extensive" sampling features are used to assess the spatial variation of a property value, such as the colour of an image or scene, or the variation of rock type within a borehole. The sampling feature provides a manifold within which observations may be made. Note, however, that the rate of sub-sampling within the manifold is not fixed by the feature geometry. Furthermore, different sub-sampling strategies may be associated with different observations on the same sampling feature.
Another common application of sampling is to take a specimen from the ultimate feature of interest, for ex-situ
observation and analysis.
- Specimen model:
It is common for a single specimen instance to be the feature-of-interest of many observations. Certain properties, such as source location, collection and preparation procedures, are more naturally associated with the specimen, rather than the observation.
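This arrangement can be sketched as follows. The attribute names here are hypothetical, loosely following the Specimen model, and are not the normative property names:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Specimen:
    """A sampling feature retrieved from the world for ex-situ observation."""
    sampled_feature: str       # the domain feature the specimen represents
    sampling_location: str     # source location belongs to the specimen,
    preparation: List[str]     # ...as do collection/preparation details,
    related_observations: List[str] = field(default_factory=list)  # not to each observation

core = Specimen(
    sampled_feature="GeologicUnit Q",
    sampling_location="borehole DH-1, 120-121 m",
    preparation=["split", "crushed", "pulverized"],
)
# One specimen instance serves as the feature-of-interest of many observations:
core.related_observations += ["Au assay", "U-Pb geochronology"]
```

Recording the location and preparation once, on the specimen, avoids repeating them on every observation that uses the specimen.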
A provisional OWL ontology for Sampling
is available from CSIRO. An example RDF encoding of a specimen description corresponding to the example in the OGC XML repository
is shown here:
# (The subject IRI and several property names are elided in the source;
#  "..." marks an elided term.)
...
    rdf:type sam:Specimen ;
    rdfs:comment "A specimen encoded using the RDF representation of the O&M Sampling Feature model"^^xsd:string ;
    rdfs:label "SIO specimen abc123"^^xsd:string ;
    sam:currentLocation <http://example.org/various/Warehouse3/shelf9/box67> ;
    sam:materialClass ex:rock ;
    ... [ rdf:type sam:PreparationStep ;
          sam:processOperator ex:JohnDoe ] ;
    sam:sampledFeature ex:midAtlanticRidge ;
    ... [ rdf:type sam:SamplingFeatureComplex ;
          sam:samplingMethod <http://ldeo.columbia.edu/sampling/ghostbuster> ] ;
    ... [ rdf:type tm:Instant ] ;
    ... [ rdf:type basic:Weight ;
          basic:uom <http://www.opengis.net/def/uom/UCUM/0/kg> ] ;
    sam:specimenType ex:splitCore .
Ted Habermann (NOAA) has also proposed a mapping to an ISO 19115/19139 metadata record.
Relationships of sampling features to other features
There are interesting relationships between sampling features and other features. In particular:
- every sampling feature exists because of an intention to sample or represent one or more domain features
an ObservationWell (a kind of SamplingCurve) samples one or more Aquifers;
a RockSample (a kind of Specimen) samples a GeologicUnit;
an Outcrop (a kind of SamplingPoint) samples a GeologicStructure and one or more GeologicUnits;
a Scene (a kind of SamplingSurface) samples a Landscape
- a sampling feature exists because observations have been or will be made which utilize it
an ObservationWell may carry a set of Logs (observations whose result is a 1-D Coverage);
a RockSample may carry a set of Assay measurements and Geochronology measurements
- samples are commonly related to other samples, through sub-sampling, as part of an array, etc
Intervals (SamplingCurves) and Specimens may be contained within, or retrieved from, an ObservationWell;
a RockSample may yield a set of Splits or Separates (sub-samples);
a Specimen may be taken from an Outcrop
For example, a specimen must be tied to the domain feature (e.g. an organism, a material, etc, as its sampledFeature), but may also be associated with a sampling point or interval, etc, if the details of the sampling location within the domain feature are of interest.
- Abstract sampling feature model:
See #Unknown_features for how to manage the case where the identity of the sampled feature is not (yet) known.
Note that the sampledFeature is expected to be a 'domain feature' - i.e. a real-world feature that is not a sampling feature or observation.
In fact the relationship between observations, sampling features and domain features is subtle, and data providers may make different choices about how much information about the sampling regime to provide to the data consumer. For example, a provider may choose to indicate the domain feature as the observation feature of interest, and bundle the description of the sampling feature with the observation process as a 'protocol'.
- Observations, sampling and protocols:
The observation model requires that a feature of interest be identified, and that the observed property be part of the definition of the type of the feature of interest. This constraint is required for consistency with the GFM, as described in #Feature_of_interest.
In some circumstances, while the type of the feature-of-interest of an observation, or of the sampled-feature of a sampling feature, is clear, its identity is not (yet) known. In this situation the necessary information can be conveyed by
- indicating the type of the target feature, while
- admitting that the identity of the target is unknown, by using a "null" for the value of the feature reference
In the GML-conformant encoding of O&M this may be accomplished using the standard xlink attributes on the relevant property elements. xlink:href points to the target instance, while xlink:role indicates its nature or type. For example:
<sa:sampledFeature xlink:role="urn:cgi:featureType:CGI:GeoSciML:2.0:GeologicUnit" xlink:href="http://www.opengis.net/def/nil/OGC/0/unknown"/>
<sa:shape><gml:Point> ... </gml:Point></sa:shape>
<om:phenomenonTime><gml:TimePeriod> ... </gml:TimePeriod></om:phenomenonTime>
<om:resultTime><gml:TimeInstant> ... </gml:TimeInstant></om:resultTime>
<om:featureOfInterest xlink:role="urn:cgi:featureType:CGI:GeoSciML:2.0:GeologicUnit" xlink:href="http://www.opengis.net/def/nil/OGC/0/unknown"/>
URNs such as urn:cgi:propertyType:GSV:magneticSusceptibility are defined in the CGIIdentifierScheme.
What is this all for?
The Observation viewpoint is applicable when
- the assignment of a property-value to a feature has a finite error
- the "metadata" associated with the assignment of the property-value (i.e. the data capture event) is of interest.
The Sampling Feature view is useful
- to describe the specific sampling strategy associated with an observation
- this would use the Observation/featureOfInterest/SamplingFeature arrangement
- to provide access to a collection of observations made using the same sampling regime
- this would use the SamplingFeature/relatedObservation[*]/Observation arrangement
In some technical applications the sampling-strategy is well known to the practitioners, so it provides a primary access route to the observations. For example "observations of fish occurrence made on that cruise", or "observations of geologic structure made at that outcrop", or "all the observations made on that drill-core". In remote sensing and earth simulations many earth-system properties are sampled in a 2-D scene or 3-D grid, and the grid is "standardized" and shared between users. The ultimate feature of interest is the ocean, atmosphere, earth, or a geologic unit or sequence, etc, but the sampling feature provides a conventional access route to a useful subset of observations related to the domain feature.
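This access pattern amounts to grouping a flat set of observations by the sampling feature each used as its feature-of-interest. A sketch with hypothetical identifiers:

```python
from collections import defaultdict

# A flat list of observations, each recording which sampling feature it used.
observations = [
    {"featureOfInterest": "cruise-42/station-1", "observedProperty": "fishOccurrence"},
    {"featureOfInterest": "cruise-42/station-2", "observedProperty": "fishOccurrence"},
    {"featureOfInterest": "outcrop-321",         "observedProperty": "foliation"},
]

# The sampling feature becomes the primary access route:
by_sampling_feature = defaultdict(list)
for obs in observations:
    by_sampling_feature[obs["featureOfInterest"]].append(obs)

# "all the observations made at that outcrop":
print(len(by_sampling_feature["outcrop-321"]))  # 1
```

This corresponds to the SamplingFeature/relatedObservation[*]/Observation arrangement mentioned above.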
See also the discussion of InformationViews, which shows how the complete "observation" view may be important at certain stages of processing/interpretation, while other views, which take certain pieces of the observation information (usually the result) and package it differently, are available for other purposes.
Relationship to GML observation
The structure of the O&M Observation model is essentially the same as GML observation. However, the O&M Observation schema differs slightly. In particular,
- the resultOf property - which in GML always contains a GML Object, even for simple values - is replaced by the result property with the type "Any"
- the subject - which in GML may be either a feature or a geometry - is replaced by the featureOfInterest property, whose value must be a feature instance
Observations vs. Interpretations
Some modelling frameworks make a fundamental distinction between observations and interpretations as the basis for their information modelling approach. This supports a framework in which "observations" are given precedence and archived, while "interpretations" are more transient, being the result of applying the current algorithms and paradigms to the currently available observations.
An alternative view is that the distinction is not absolute, but is more one of degree. Even the most trivial "observations" are mediated by some theory or procedure. For example, the primary measurement using a mercury-in-glass thermometer is the position of the meniscus relative to graduations; this provides the length of the column, and a theory of thermal expansion plus a calibration etc. allows conversion to an inferred temperature. Other observations and measurements all involve some kind of processing from the primary observable. For modern instruments the primary observable is almost always a voltage or resistance from some kind of sensing element, so the "procedure" typically involves calibrations, etc., built on a theory of operation for the sensor. But the same high-level information model - that every "value" is an estimate of the value of a property, with a finite error, generated using a procedure - applies to both "observations" and "interpretations". It is just that the higher the semantic value of the estimate, the more theory and processing is involved.
In some instances it may be useful to explicitly describe the processing chain
that takes more primitive observations (e.g. an image) and retrieves higher level observations (e.g. the presence of a certain type of feature instance) through the application of one or more processing steps.
- Object model - The 7th banana weighed 270gm on the kitchen scales this morning:
- Object model - The attitude of the foliation at outcrop 321 of the Leederville Formation was 63/085, measured using a Brunton on 2006-08-08:
- Object model - Specimen H69 was determined on 1999-01-14 by Amy Bachrach to be of the species Eucalyptus Caesia:
- Object model - IR image ASgh67c of Camp Iota was obtained by Aster in 2003:
- Object Model - Sample WMC997t collected at Empire Dam on 1996-03-30 was found to have 5.6 g/T Au as measured by ICPMS at ABC Labs on 1996-05-31:
- Object model - The X-Z Geobarometer determined that the ore-body was at depth 3.5 km at 1.75 Ga:
- Object model - The simulation run on 2004-09-09 indicated a pressure reduction of 4 MPa in geologic unit Q at 600 Ma:
Wells and aquifers
- Object diagram showing relationships between two wells, three intervals, four observations, two sensors, three aquifers:
This describes an observational strategy concerning the contents of three aquifers.
Evert's well samples two aquifers, at separate intervals in the well. Rob's well samples the other aquifer.
Two instrument types are available. The Fooglemeter 2000 is used to analyze water from both intervals in the first well. The Farklemeter XP is used to analyze water from one interval in the first well, and from the second well. These produce a collection of observations.
All observations have the same observedProperty. Only one result is shown in this diagram; it is of type DiscreteTimeInstantCoverage, since the observations are monitoring the observed property over time.
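A result of this kind is, in essence, a list of time-instant/value pairs. A sketch with hypothetical readings (the variable names are illustrative, not taken from the model):

```python
from datetime import date

# A DiscreteTimeInstantCoverage-style result: the observed property
# monitored over time at one sampling interval in a well.
water_level = [
    (date(2006, 8, 1), 12.3),  # (time instant, observed value)
    (date(2006, 8, 2), 12.1),
    (date(2006, 8, 3), 11.9),
]

# The domain of the coverage is the set of instants; the range is the values.
domain = [t for t, _ in water_level]
assert domain == sorted(domain)  # instants in temporal order
```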
Sampling on a ferry track
The following scenario was proposed:
we have been trying to figure out how to represent observations along transects, like water quality measurements made on a ferry going across the Albemarle-Pamlico Sound. In that instance, there is a "track" that can be represented by a line, the data have latitude, longitude, time and a bunch of measured variables in each record.
The Observation and Sampling model provides for (at least) two treatments of this case. In both, the ferry track is modeled as a spatial sampling feature whose shape is a curve (in this case the specialization called SamplingCurve).
In the first treatment, there is one observation associated with the track. The complexity that multiple observations were made along the track is encapsulated in the result, whose type is a coverage.
- Ferry sampling scenario - observation result is a coverage:
In the second treatment, a series of stations (sampling points) are identified along the track, where individual samples were taken. Each of these has a specific point as the value of its shape property, and an explicit relationship with the track. There is one observation associated with each sampling point. The result of each is simpler, being a record of the various parameters measured at that station. Note that the sampling-point instances all have the same sampled feature, and all observations have the same observedProperty.
- Ferry sampling scenario - set of observations whose results are simple records:
Borehole and outcrop
- Object diagram showing specimens, outcrops, boreholes and associated observations:
This describes a set of observations concerning some geological features - two units and one fault.
Christian's Well samples both units, and Steve's Knoll samples one unit and the fault. A specimen retrieved from the well samples one unit. A specimen obtained from the knoll samples the other unit.
An observation on one specimen provides an estimate of the age of the sampled unit. An observation on the other specimen provides an estimate of the gold concentration in its sampled unit. Observations made at the outcrop measure the orientation of the bedding and fault.
Assay suite on a specimen collected from an outcrop
Element assays are usually done on a "pulp" that is obtained from a specimen collected in the field. While both might be considered aspects of the analytical procedure, the details of the procedure used to produce the pulp are usually separated from the analytical procedure (the machine that goes "ping"). Treating it this way, we get a series of sampling features (the knoll and two specimens at different states of preparation) with various relationships to the ultimate feature of interest (the rock unit).
Note that the Observation/result (the actual assay values) are formatted as a Record, providing values for each of the elements analysed.
- Assay on Pulp from Specimen from Knoll from Unit: