Information viewpoints and subsets

A specific set of information may be used for many purposes. Depending on the precise use-case, it may be convenient or necessary to re-factor or subset the information available.

For example, consider the information associated with a set of geochemical analyses. A summary of the results might tabulate concentrations of analytes against specimens.

Figure: Tabulation of geochemical assay results (table-views.png)
The boxes highlight three other views consisting primarily of subsets of the information:

  1. Each cell contains a result for a single analyte using a single procedure for a single specimen
  2. Each column describes the variation in the concentration of one analyte as a function of specimen
  3. Each row describes a single specimen in terms of its set of analyte concentrations
Teasing these out a little more:

Cell or "Observation" view

The table "coordinates" of a single result are the analyte/method and the specimen. For complete understanding it will also be necessary to have access to additional metadata relevant to a result, such as a description of the method, instrument, operator, time and date, uncertainties, etc.

This view of the information might be characterised as the observation view, focussing on all the information concerning a single estimate of a single property. The observation event may be modelled as a Feature, where the specimen, result, method, etc. are each a GmlProperty of the observation feature.

In a "GML" representation this might appear:

<Observation gml:id="ABC-123-Cu-A">
   <subject>
      <Specimen gml:id="ABC-123">
         <location>
            <Point gml:id="pon456"><pos srsName="gda1994">-33.0 145.8</pos></Point>
         </location>
         <samplingMethod> ... </samplingMethod>
         ... a bunch more properties of the specimen on which the observation was made ...
      </Specimen>
   </subject>
   <method>
      <Procedure gml:id="assay-a">
         <description>Assay procedure a which uses instrument I369 by company C621 at their office L890</description>
      </Procedure>
   </method>
   <observationTime>2003-12-10T16:35:30.00+08:00</observationTime>
   <property xlink:href="urn:iugs:ontology:materials:concentration:Cu"/>
   <result uom="percent">3.45</result>
   <quality> ... </quality>
</Observation>

The observation focus is typical during data capture, for insertion of a result into a data-store, and for evaluation and quality control of single results. Thus, the "observation" view is most important in the earlier stages of the value-adding chain. Information provided in this primitive, fully-normalised view can be readily composited to construct the row- or column-oriented summary views, so this view may be considered the most generic.

The table of Geochemistry Results supplied by GA follows this pattern.
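To make the compositing claim concrete, here is a minimal Python sketch (used only for illustration; it is not part of any GML toolkit). It assumes a hypothetical flat `Observation` record; the field names, the procedure id `assay-b` and the second specimen `ABC-124` are invented for the example. Both the column (coverage) view and the row (feature) view are simple selections over the same list of observations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical flat record: one cell of the results table."""
    specimen: str   # specimen id, e.g. "ABC-123"
    analyte: str    # e.g. "Cu"
    procedure: str  # procedure id, e.g. "assay-a"
    value: float
    uom: str


# Hypothetical sample data ("assay-b" and "ABC-124" are invented).
OBSERVATIONS = [
    Observation("ABC-123", "Au", "assay-Au", 1.23, "ppm"),
    Observation("ABC-123", "Cu", "assay-a", 3.45, "percent"),
    Observation("ABC-123", "Cu", "assay-b", 4.23, "percent"),
    Observation("ABC-124", "Cu", "assay-b", 3.12, "percent"),
]


def column_view(observations, analyte, procedure):
    """Coverage-like view: one analyte/procedure as a function of specimen."""
    return {
        o.specimen: o.value
        for o in observations
        if o.analyte == analyte and o.procedure == procedure
    }


def row_view(observations, specimen):
    """Feature-like view: every result for one specimen, keyed by
    (analyte, procedure) so that repeated analytes stay distinguishable."""
    return {
        (o.analyte, o.procedure): o.value
        for o in observations
        if o.specimen == specimen
    }
```

Keying the row view by (analyte, procedure) keeps the two Cu results for ABC-123 apart; once the procedure is dropped, that distinction cannot be recovered from the row alone.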

Column or "Coverage" view

The variation of a property as a function of location is sampled by the values in a single column. This view of the information corresponds to a coverage, which is a spatial function. It may be represented:

  • as a set of simple features each having a location plus one other property
  • using a special coverage feature, which uses a more compact notation in which the values in the domain and range are grouped separately.
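The two representations carry the same information. A minimal sketch, assuming locations are plain (lat, lon) tuples and that the second point below is hypothetical, showing that one form converts to the other and back simply by splitting and re-pairing lists:

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (lat, lon); axis order is an assumption here


def features_to_coverage(features: List[Tuple[Point, float]]):
    """Split (location, value) simple features into a compact coverage:
    a domain list and a range list held in the same order."""
    domain = [loc for loc, _ in features]
    range_values = [value for _, value in features]
    return domain, range_values


def coverage_to_features(domain: List[Point], range_values: List[float]):
    """Re-pair a compact coverage into simple features. The lists must be
    the same length because they correspond purely by position."""
    if len(domain) != len(range_values):
        raise ValueError("domain and range must have the same length")
    return list(zip(domain, range_values))


# Hypothetical sample: the second location is invented for illustration.
FEATURES = [((-33.0, 145.8), 4.23), ((-33.1, 145.9), 3.12)]
```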
In a "GML" representation this might appear:

<Coverage gml:id="cub345">
   <metaDataProperty>
      ... description of data capture method, processing and interpolation (if necessary), etc ...
   </metaDataProperty>
   <domain>
      <MultiPoint gml:id="d345">
         <member>
            <Point gml:id="pon456"><pos srsName="gda1994">-33.0 145.8</pos></Point>
         </member>
         <member ... />
         <member ... />
         <member ... />
      </MultiPoint>
   </domain>
   <range>
      <BandList>
         <measureList uom="percent" property="urn:iugs:ontology:materials:concentration:Cu">4.23 3.12 1.02 ...</measureList>
      </BandList>
   </range>
</Coverage>

(The values of Cu concentration are understood to be listed in the same sequence as the members of the domain.)
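A sketch of how a client might rely on that ordering convention, using Python's standard `xml.etree.ElementTree`. To make the fragment parse, it assumes the usual GML namespace URI for the `gml:` prefix, drops the metadata block, and adds two hypothetical points (`pon457`, `pon458`) so that the domain matches the three listed values:

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the coverage above; pon457/pon458 are hypothetical.
COVERAGE_XML = """<Coverage xmlns:gml="http://www.opengis.net/gml" gml:id="cub345">
  <domain>
    <MultiPoint gml:id="d345">
      <member><Point gml:id="pon456"><pos srsName="gda1994">-33.0 145.8</pos></Point></member>
      <member><Point gml:id="pon457"><pos srsName="gda1994">-33.1 145.9</pos></Point></member>
      <member><Point gml:id="pon458"><pos srsName="gda1994">-33.2 146.0</pos></Point></member>
    </MultiPoint>
  </domain>
  <range>
    <BandList>
      <measureList uom="percent" property="urn:iugs:ontology:materials:concentration:Cu">4.23 3.12 1.02</measureList>
    </BandList>
  </range>
</Coverage>"""


def coverage_pairs(xml_text):
    """Pair each domain point with its range value purely by position."""
    root = ET.fromstring(xml_text)
    points = [
        tuple(float(c) for c in pos.text.split())
        for pos in root.iterfind("domain/MultiPoint/member/Point/pos")
    ]
    values = [
        float(v) for v in root.find("range/BandList/measureList").text.split()
    ]
    if len(points) != len(values):
        raise ValueError("domain and range lengths differ")
    return list(zip(points, values))
```

Nothing in the encoding ties a value to its point except list order, so the length check is the only safeguard the client has.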

A coverage may also be considered as an elaboration of the observation view, in which the domain corresponds to a composite subject (e.g. set of grid cells) and the range to a composite, but homogeneously-typed, result (i.e. a value for each component of the domain). This view is common in imagery, where the method, instrument, operator, etc. are standard metadata.

The coverage focus is typical during exploratory data analysis and anomaly (or "feature") detection. Information refactored in this way is "data prepared for analysis".

Row or "Feature" view

Each row describes the values of all the properties (of interest) for a single specimen. The specimen may be modelled as a Feature.

In a "GML" representation this might appear:

<Specimen gml:id="ABC-123">
   <position>
      <Point gml:id="pon456"><pos srsName="gda1994">-33.0 145.8</pos></Point>
   </position>
   <property propertyType="urn:iugs:ontology:materials:concentration:Au" uom="ppm">1.23</property>
   <property propertyType="urn:iugs:ontology:materials:concentration:Cu" uom="percent">3.45</property>
   <property propertyType="urn:iugs:ontology:materials:concentration:Cu" uom="percent">4.23</property>
   <property propertyType="urn:iugs:ontology:materials:concentration:As" uom="ppm">0.5</property>
   <property propertyType="urn:iugs:ontology:materials:concentration:Sb" uom="ppm">0.34</property>
</Specimen>

This "Specimen" description is essentially the same as the Specimen that was the target of the Observation above, but we have used multiple "soft-typed" property elements to carry the chemistry data. In the example here there are two values for the copper concentration, but the level of detail provided by the value of the "propertyType" attribute does not expose the fact that the analytical method differed.
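The ambiguity shows up as soon as the soft-typed properties are read back. A minimal sketch with `xml.etree.ElementTree`, assuming a GML namespace declaration is added so that the `gml:id` attribute parses; otherwise the Specimen is the one shown above:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The Specimen above, plus an (assumed) GML namespace declaration.
SPECIMEN_XML = """<Specimen xmlns:gml="http://www.opengis.net/gml" gml:id="ABC-123">
  <position>
    <Point gml:id="pon456"><pos srsName="gda1994">-33.0 145.8</pos></Point>
  </position>
  <property propertyType="urn:iugs:ontology:materials:concentration:Au" uom="ppm">1.23</property>
  <property propertyType="urn:iugs:ontology:materials:concentration:Cu" uom="percent">3.45</property>
  <property propertyType="urn:iugs:ontology:materials:concentration:Cu" uom="percent">4.23</property>
  <property propertyType="urn:iugs:ontology:materials:concentration:As" uom="ppm">0.5</property>
  <property propertyType="urn:iugs:ontology:materials:concentration:Sb" uom="ppm">0.34</property>
</Specimen>"""


def soft_typed_values(xml_text):
    """Group a Specimen's soft-typed property values by propertyType."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for prop in root.findall("property"):
        grouped[prop.get("propertyType")].append((float(prop.text), prop.get("uom")))
    return dict(grouped)
```

Both Cu values come back under the same key, and nothing in this encoding records which procedure produced which value.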

The Specimen focus primarily concerns a description of the location from which the specimen was taken. The "specimen" view may be used at several stages during processing, but probably most importantly to investigate the full details of an identified anomaly. Information may be extracted from a set of specimen-oriented views to construct the property-oriented (coverage) view, provided the property of interest is recorded for each specimen. However, information provided in the specimen-oriented view does not support easy disaggregation into the cell (observation) view, and is normally incomplete anyway.

The table of Geochemistry Results supplied by WA follows this pattern.
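The condition "provided the property of interest is recorded for each specimen" can be sketched as follows. The in-memory structure, the specimens `ABC-124` and `ABC-125`, their locations, and the Au value 0.8 are all hypothetical:

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical specimen views: location plus soft-typed property values.
SPECIMENS: Dict[str, dict] = {
    "ABC-123": {"pos": (-33.0, 145.8), "props": {"Au": [1.23], "Cu": [3.45, 4.23]}},
    "ABC-124": {"pos": (-33.1, 145.9), "props": {"Cu": [3.12]}},
    "ABC-125": {"pos": (-33.2, 146.0), "props": {"Au": [0.8]}},
}


def coverage_from_specimens(specimens, prop):
    """Extract a (domain, range) coverage for one property.

    Specimens lacking the property are reported rather than silently
    dropped, because the coverage is only complete when every specimen
    carries it. Each range entry is a list, since a specimen may hold
    several values for the same property."""
    domain: List[Point] = []
    range_values: List[List[float]] = []
    missing: List[str] = []
    for sid, spec in specimens.items():
        if prop in spec["props"]:
            domain.append(spec["pos"])
            range_values.append(spec["props"][prop])
        else:
            missing.append(sid)
    return domain, range_values, missing
```

For Cu, ABC-125 is reported as missing, and ABC-123 yields two values with nothing in the specimen view to say which procedure each came from.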

Alternatively, the description of a Specimen may indicate that the values of the properties were obtained through observation, by explicitly attaching a set of Observations to it:

<Specimen gml:id="ABC-123">
   <position>
      <Point gml:id="pon456"><pos srsName="gda1994">-33.0 145.8</pos></Point>
   </position>
   <relatedObservation>
      <Observation gml:id="ABC-123-Au">
         <method>
            <Procedure gml:id="assay-Au">
               <description>Assay procedure Au which uses instrument I370 by company C621 at their office L890</description>
            </Procedure>
         </method>
         <observationTime>2003-12-10T16:35:30.00+08:00</observationTime>
         <property xlink:href="urn:iugs:ontology:materials:concentration:Au"/>
         <result uom="ppm">1.23</result>
         <quality> ... </quality>
      </Observation>
   </relatedObservation> 
   <relatedObservation>
      <Observation gml:id="ABC-123-Cu-A">
         <method>
            <Procedure gml:id="assay-a">
               <description>Assay procedure a which uses instrument I369 by company C621 at their office L890</description>
            </Procedure>
         </method>
         <observationTime>2003-12-10T16:35:30.00+08:00</observationTime>
         <property xlink:href="urn:iugs:ontology:materials:concentration:Cu"/>
         <result uom="percent">3.45</result>
         <quality> ... </quality>
      </Observation>
   </relatedObservation> 
   <relatedObservation ... /> 
   <relatedObservation ... /> 
   <relatedObservation ... /> 
</Specimen>

The "Observation" values are encoded in essentially the same way as the Observation feature above, except that the "subject" property is omitted because it is implicit from the context (the parent Specimen).
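Because the subject is implicit, a consumer has to restore it from the enclosing element when flattening such a document. A minimal sketch with `xml.etree.ElementTree`; the namespace declarations are added so the fragment parses, and the `method`, `observationTime` and `quality` properties are trimmed for brevity:

```python
import xml.etree.ElementTree as ET

GML_ID = "{http://www.opengis.net/gml}id"

# Trimmed version of the nested Specimen above (namespaces assumed).
NESTED_XML = """<Specimen xmlns:gml="http://www.opengis.net/gml"
          xmlns:xlink="http://www.w3.org/1999/xlink" gml:id="ABC-123">
  <relatedObservation>
    <Observation gml:id="ABC-123-Au">
      <property xlink:href="urn:iugs:ontology:materials:concentration:Au"/>
      <result uom="ppm">1.23</result>
    </Observation>
  </relatedObservation>
  <relatedObservation>
    <Observation gml:id="ABC-123-Cu-A">
      <property xlink:href="urn:iugs:ontology:materials:concentration:Cu"/>
      <result uom="percent">3.45</result>
    </Observation>
  </relatedObservation>
</Specimen>"""


def observations_with_subject(xml_text):
    """Flatten nested observations, restoring the omitted subject from
    the gml:id of the enclosing Specimen."""
    specimen = ET.fromstring(xml_text)
    subject = specimen.get(GML_ID)
    rows = []
    for obs in specimen.iterfind("relatedObservation/Observation"):
        result = obs.find("result")
        rows.append({
            "id": obs.get(GML_ID),
            "subject": subject,  # implicit in the XML, made explicit here
            "value": float(result.text),
            "uom": result.get("uom"),
        })
    return rows
```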

Clearly, encoding all of this inline in a single document is getting a little baroque, but the model is consistent. This last case is perhaps more appropriately managed using the alternative GmlProperty encoding, in which the descriptions of the observations are normalised into a different document or service and referenced by xlink:href:

<Specimen gml:id="ABC-123">
   <position>
      <Point gml:id="pon456"><pos srsName="gda1994">-33.0 145.8</pos></Point>
   </position>
   <relatedObservation xlink:href="http://my.big.org/observations/ABC-123-Au"/>
   <relatedObservation xlink:href="http://my.big.org/observations/ABC-123-Cu-A"/>
   <relatedObservation xlink:href="http://my.big.org/observations/ABC-123-Cu-B"/>
   <relatedObservation xlink:href="http://my.big.org/observations/ABC-123-As"/>
   <relatedObservation xlink:href="http://my.big.org/observations/ABC-123-Sb"/>
</Specimen>
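A client receiving the by-reference form must dereference each `xlink:href`. In this sketch the remote service is replaced by a hypothetical dictionary lookup passed in as `fetch`; a real client would issue an HTTP GET (for example with `urllib.request.urlopen`) and handle errors. Only two of the five references are included, and the returned Observation documents are trimmed to their result:

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

BY_REFERENCE_XML = """<Specimen xmlns:gml="http://www.opengis.net/gml"
          xmlns:xlink="http://www.w3.org/1999/xlink" gml:id="ABC-123">
  <relatedObservation xlink:href="http://my.big.org/observations/ABC-123-Au"/>
  <relatedObservation xlink:href="http://my.big.org/observations/ABC-123-Cu-A"/>
</Specimen>"""

# Hypothetical stand-in for the observation service: URL -> Observation XML.
FAKE_SERVICE = {
    "http://my.big.org/observations/ABC-123-Au":
        '<Observation><result uom="ppm">1.23</result></Observation>',
    "http://my.big.org/observations/ABC-123-Cu-A":
        '<Observation><result uom="percent">3.45</result></Observation>',
}


def resolve_related(xml_text, fetch):
    """Return (href, Observation element) for each relatedObservation.

    By-reference properties are fetched and parsed; inline (by-value)
    ones are returned with href None."""
    specimen = ET.fromstring(xml_text)
    resolved = []
    for rel in specimen.findall("relatedObservation"):
        href = rel.get(XLINK_HREF)
        if href is None:
            resolved.append((None, rel.find("Observation")))
        else:
            resolved.append((href, ET.fromstring(fetch(href))))
    return resolved
```

Usage: `resolve_related(BY_REFERENCE_XML, FAKE_SERVICE.__getitem__)` returns both observations, so the client ends up with the same information as the inline version at the cost of one request per reference.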

Conclusion

There is almost never a single "correct" way to represent information. Business requirements vary, and components can be reorganised and re-factored to make them "fit for a purpose". Different aspects of the information are emphasized depending on what operations are to be performed. In the example here,

  • the "observation" view puts the data-capture-event at the centre,
  • the "coverage" view supports the assessment of the variation of a property,
  • while the "specimen" view focusses on the complete description of a location.
For other types of information, different model variations will be applicable.

A corollary is that, when providing a data-service, it is wise to consider what your clients may want the information for. If it is expected that they will perform a variety of analyses, then it may be worth providing a variety of "views" of your data. As shown above, the same data-store can underlie several different data models. So, for example, one geochemistry database could have two Web Feature Service interfaces (or one WFS with two feature-types) serving observations and specimens, plus a Web Coverage Service. In this way additional value is added by the service provider to their data holdings purely by publishing the data differently for multiple use-cases.

However, implementing multiple service interfaces requires explicit support for SchemaMapping from the private schema (e.g. database table structure) to the public view(s).
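A minimal illustration of such a mapping, under clearly hypothetical assumptions: the private table's column names (`sample_no`, `analyte`, `method_code`, `analysed_at`, `value`, `units`) and the gml:id naming scheme are invented, and the subject and method are encoded by reference rather than inline. Each row of the private table becomes one public Observation element:

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("gml", GML)
ET.register_namespace("xlink", XLINK)

# Hypothetical private schema: one row of an assay results table.
ROW = {
    "sample_no": "ABC-123",
    "analyte": "Cu",
    "method_code": "assay-a",
    "analysed_at": "2003-12-10T16:35:30.00+08:00",
    "value": 3.45,
    "units": "percent",
}


def row_to_observation(row):
    """Map one private table row to the public Observation view."""
    # Hypothetical id scheme: specimen-analyte-procedure.
    obs_id = f"{row['sample_no']}-{row['analyte']}-{row['method_code']}"
    obs = ET.Element("Observation", {f"{{{GML}}}id": obs_id})
    # Subject and method by reference (the alternative GmlProperty encoding).
    ET.SubElement(obs, "subject", {f"{{{XLINK}}}href": "#" + row["sample_no"]})
    ET.SubElement(obs, "method", {f"{{{XLINK}}}href": "#" + row["method_code"]})
    ET.SubElement(obs, "observationTime").text = row["analysed_at"]
    ET.SubElement(
        obs,
        "property",
        {f"{{{XLINK}}}href": "urn:iugs:ontology:materials:concentration:" + row["analyte"]},
    )
    ET.SubElement(obs, "result", {"uom": row["units"]}).text = str(row["value"])
    return obs
```

A second function mapping the same rows to Specimen features would serve the other feature-type mentioned above without changing the underlying store; `ET.tostring(row_to_observation(ROW), encoding="unicode")` gives the serialised public view.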
Topic revision: r22 - 03 Mar 2011, SimonCox
 

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).