Units of Measure



GML standard

The Open GIS Consortium recommendation paper "Units of Measure Use and Definition Recommendations" explains the rationale for the GML encoding of quantities. It deals with three issues:
  • the requirement that the unit of measure (uom) must always accompany a quantity - resolved with the
    <myQuantity uom="myUnits">123.456</myQuantity> pattern
  • the set of units of measure, and the symbols for these, must be extensible:
    • this requires a mechanism for describing the definition of units, including new ones for local or restricted use - resolved with the GML units dictionary (XmmlSchemaRepository:trunk/gml/base/units.xsd)
    • references to entries in the dictionary are supported by making the value of the uom attribute a URI reference

Note that the abbreviated XPointer version of URI reference ("#anchor") allows the uom reference to be compact when the definition is in the same document as the data.
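
As a rough illustration of how a consumer might handle this pattern, the following Python sketch reads a quantity and follows a same-document "#anchor" reference to the element carrying the matching gml:id. The sample document, the unitDefinition and name elements, and the helper names read_quantity and resolve_local_uom are all hypothetical, and the sample is not a schema-valid GML units dictionary.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
GML_ID = "{%s}id" % GML_NS  # ElementTree's expanded name for the gml:id attribute

# Hypothetical sample: a placeholder definition plus a quantity that refers to it.
SAMPLE = """\
<data xmlns:gml="http://www.opengis.net/gml">
  <unitDefinition gml:id="m">
    <name>metre</name>
  </unitDefinition>
  <myQuantity uom="#m">123.456</myQuantity>
</data>
"""


def read_quantity(root, tag):
    """Return (value, uom) for the first child element named `tag`."""
    elem = root.find(tag)
    if elem is None:
        raise ValueError("no element named %r" % tag)
    uom = elem.get("uom")
    if uom is None:
        raise ValueError("a quantity must always carry a uom attribute")
    return float(elem.text), uom


def resolve_local_uom(root, uom):
    """Follow a '#anchor' reference to the element with that gml:id.

    Returns None when the reference is not a same-document anchor or
    when no element carries the requested gml:id.
    """
    if not uom.startswith("#"):
        return None
    anchor = uom[1:]
    for elem in root.iter():
        if elem.get(GML_ID) == anchor:
            return elem
    return None


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    value, uom = read_quantity(root, "myQuantity")
    definition = resolve_local_uom(root, uom)
    print(value, uom, definition.findtext("name"))
```

A reference to an external dictionary (a full URI rather than "#anchor") would need a fetch-and-parse step that this sketch deliberately leaves out.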

However, the downside of the URI approach is that familiar symbols for units, such as "m" for meters, are invalid. The shortest possible alternative is "#m", but even this requires another element in the same document with gml:id="m", where a definition or redirection is given. Other well-known symbols for units may not even be legal URIs. There has been widespread "civil disobedience" over this issue, with many people using the conventional symbols.

Furthermore, the ISO/IEC directive requires that ISO standards use ISO 31 as the basis for units of measure. This has led to a re-evaluation of the XML Schema type of the uom attribute in GML, undertaken in conjunction with the revision of GML for ISO 19136.

Relevant standards

ISO 31 Quantities and Units (13 parts, plus amendments) is the most recent ISO standard. It provides symbols for units of measure, and a grammar for combining these. However, it has the following limitations for our purposes:
  1. the standard provides recommendations for use of the symbols in print. This includes use of
    • non-Latin characters (e.g. Ω for electrical resistance, μ for "micro", ° for degrees of arc or Celsius)
    • superscripts for units raised to a power
  2. normative symbols are provided for the SI base and derived units, the SI prefixes for powers-of-ten, and a small number of additional units labelled "Units used with SI" - these are summarized in ISO 31-0. Informative annexes in the other parts of the standard provide symbols for many more "conventional" units, most of which can be defined in terms of algebraic combinations of the SI units, perhaps with additional numeric factors applied. However, the set of symbols available this way is not comprehensive, and with the "informative" label, the status of many of them is unclear.

NIST Special Publication 811 Guide for the Use of the International System of Units (SI)

ISO 2955:1983 Information processing - Representation of SI and other units in systems with limited character sets. This appears to be the most relevant ISO standard, since it is aimed specifically at 7-bit US-ASCII encoding. However, it was withdrawn in 2001 and was not replaced by another ISO standard.

ANSI X3.50-1986 Representations for U.S. Customary, SI, and Other Units to be Used in Systems with Limited Character Sets. This is essentially an extension of ISO 2955, including additional units of measure used in the U.S. However, it no longer appears to be available from ANSI.

Representation of numerical values and SI units in character strings for information interchanges is an internet draft explicitly aimed at providing a 7-bit ASCII encoding for interchange, apparently inspired by the Mars lander fiasco. It advocates the caret "^" for an exponent, "o" for degrees, "u" for the "micro" prefix, and spelling out "Ohm". However, it also proposes some rather unusual notation, including a period "." between the number and the uom in a quantity, and it notes that this requires modifications to the syntax of common programming languages! The draft now appears to be stale, though discussion is being maintained at http://swiss.csail.mit.edu/~jaffer/MIXF .
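
The substitutions the draft advocates can be sketched as a small transliteration routine. This is only an illustration of the four conventions mentioned above (caret, "o", "u", "Ohm"); it does not implement the draft's full syntax, and the function name to_ascii and the set of input characters handled are assumptions.

```python
import re

# Print-style characters mapped to the ASCII spellings mentioned in the text.
_CHAR_MAP = {
    "\u03a9": "Ohm",  # GREEK CAPITAL LETTER OMEGA
    "\u2126": "Ohm",  # OHM SIGN
    "\u03bc": "u",    # GREEK SMALL LETTER MU
    "\u00b5": "u",    # MICRO SIGN
    "\u00b0": "o",    # DEGREE SIGN
}

# Superscript minus and superscript digits 0-9, mapped to plain ASCII.
_SUPERSCRIPTS = str.maketrans(
    "\u207b\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
    "-0123456789",
)
_SUPERSCRIPT_RUN = re.compile("[\u207b\u2070\u00b9\u00b2\u00b3\u2074-\u2079]+")


def to_ascii(symbol):
    """Rewrite a print-style unit symbol using 7-bit ASCII characters only."""
    # A run of superscript characters becomes "^" followed by plain digits.
    out = _SUPERSCRIPT_RUN.sub(
        lambda m: "^" + m.group(0).translate(_SUPERSCRIPTS), symbol
    )
    for char, replacement in _CHAR_MAP.items():
        out = out.replace(char, replacement)
    return out


if __name__ == "__main__":
    for printed in ("m\u00b2", "s\u207b\u00b9", "\u03bc\u03a9", "\u00b0C"):
        print(printed.encode("unicode_escape").decode("ascii"), "->", to_ascii(printed))
```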

The Unified Code for Units of Measure (UCUM) from Gunther Schadow at the Regenstrief Institute contains a discussion of the difficulties with the existing standards, including the issues outlined above, plus other ambiguities such as those arising when the combination of a prefix and a symbol is identical to another symbol. UCUM proposes both case-sensitive and case-insensitive symbols for a large suite of units of measure. In general these follow the precedents from the earlier standards, with a small number of adjustments to remove ambiguities. Section 2.2 provides a grammar for generating additional symbols as follows:
  • "." and "/" are used to combine symbols, with parentheses "(" and ")" to remove ambiguity
  • "deg" for degrees, "Ohm" for Ohm, "u" for micro, "%" for percent
  • exponents are written simply as signed numbers - numbers only occur as exponents within uom so there is no ambiguity (e.g. m.s-1 equivalent to m/s)
  • subscripts are prefixed by the "_" character (e.g. cal_th for thermochemical calorie)
  • annotations are enclosed in curly braces "{ }" (e.g. %{vol} for volume fraction expressed in percent)
  • "customary" units are enclosed in square brackets "[ ]" (e.g. [ft_us] for U.S. foot)
  • where a term is a suffix used to modify the sense of the primary symbol, it is enclosed in square brackets (e.g. m[H2O] as a unit of pressure)
  • the apostrophe "'" is used to separate words when necessary (e.g. [todd'U]).
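
To make the "." / "/" and exponent rules concrete, here is a minimal Python sketch that reduces a simple unit expression to a mapping from symbol to integer power. It is not a UCUM parser: parentheses are rejected, prefixes are not split from symbols, numeric factors are not supported, and annotations or bracketed terms are treated as opaque symbol text (and only work if they contain no "." or "/"). It also assumes that "/" applies only to the component that immediately follows it, i.e. left-to-right evaluation. The function name parse_units is hypothetical.

```python
import re
from collections import defaultdict

# A component is a symbol optionally followed by a signed integer exponent,
# e.g. "s-1" -> ("s", -1) and "m2" -> ("m", 2).
_COMPONENT = re.compile(r"(?P<symbol>.+?)(?P<exponent>[+-]?\d+)?")


def parse_units(expr):
    """Return {symbol: power} for a simple '.'/'/' unit expression."""
    if "(" in expr or ")" in expr:
        raise ValueError("parentheses are not handled by this sketch")
    powers = defaultdict(int)
    sign = 1  # +1 after ".", -1 after "/"
    for token in re.split(r"([./])", expr):
        if token == ".":
            sign = 1
        elif token == "/":
            sign = -1
        elif token:
            match = _COMPONENT.fullmatch(token)
            if match is None:
                raise ValueError("cannot parse component %r" % token)
            powers[match["symbol"]] += sign * int(match["exponent"] or 1)
    # Drop symbols whose powers cancel out.
    return {symbol: power for symbol, power in powers.items() if power != 0}


if __name__ == "__main__":
    # The text's example: m.s-1 is equivalent to m/s.
    print(parse_units("m.s-1") == parse_units("m/s"))
```

Comparing the resulting mappings is one simple way to check that two differently written expressions denote the same combination of symbols.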

Overall, it appears that UCUM is the most up-to-date, systematic and comprehensive source available. The main difficulty in using it is its lack of a formal status within any standards-setting context, and the fact that its normative identifier is a rather informal-looking URL - http://aurora.regenstrief.org/UCUM.

Proposal

In the GML Schemas, and related application schemas, replace instances of

  <attribute name="uom" type="anyURI" use="required"/>

with

  <attribute name="uom" type="gml:UomIdentifier" use="required"/>

where

   <simpleType name="UomIdentifier">
       <union memberTypes="gml:UomSymbol gml:UomURI"/>
   </simpleType>

   <simpleType name="UomURI">
      <restriction base="anyURI">
         <pattern value="([a-zA-Z][a-zA-Z0-9\-\+\.]*:|\.\./|\./|#).*"/>
      </restriction>
   </simpleType>

   <simpleType name="UomSymbol">
      <restriction base="string">
         <pattern value="[^: \n\r\t]+"/>
      </restriction>
   </simpleType>

Thus, the URI form is still available, but is supplemented by the choice of a restricted string "UomSymbol".

  • the symbol may not contain colons or whitespace; the recommendation in the text of the specification is that it should use the symbols from the c/s (case-sensitive) column of the tables in UCUM, combined using the grammar given in Clause 2 of UCUM.
  • the URI is a subset of the complete URI syntax, being restricted to forms that start with one of the following:
    • <scheme>:
    • ../
    • ./
    • #
This restriction ensures that the URI can be easily distinguished from certain uom symbols which are valid (relative) URIs. It does not disallow the forms that have been used in the documentation and known implementations of earlier versions of GML.
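
The two patterns can be tried out directly. The sketch below transcribes them into Python regular expressions and classifies a uom value; re.fullmatch is used because XML Schema patterns must match the whole value. Testing the URI form first is a choice made here so that values such as "#m", which also satisfy the UomSymbol pattern, are reported as URIs. The function name classify_uom and the sample values are assumptions.

```python
import re

# Transcription of the UomURI pattern. The trailing ".*" is written as
# "[^\n\r]*" because "." in XML Schema excludes both newline and carriage return.
UOM_URI = re.compile(r"([a-zA-Z][a-zA-Z0-9\-\+\.]*:|\.\./|\./|#)[^\n\r]*")

# Transcription of the UomSymbol pattern: no colon, no whitespace, not empty.
UOM_SYMBOL = re.compile(r"[^: \n\r\t]+")


def classify_uom(value):
    """Return 'uri', 'symbol' or 'invalid' for a uom attribute value."""
    if UOM_URI.fullmatch(value):
        return "uri"
    if UOM_SYMBOL.fullmatch(value):
        return "symbol"
    return "invalid"


if __name__ == "__main__":
    # Hypothetical sample values, including one on example.org.
    for sample in ("m", "m.s-1", "#m", "./units.xml#m",
                   "http://example.org/units#m", "m s"):
        print(repr(sample), "->", classify_uom(sample))
```

A bare symbol such as "m" never matches the URI pattern because it lacks a scheme colon and does not start with "../", "./" or "#", which is what keeps the two forms distinguishable.
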
Topic revision: r10 - 15 Oct 2010, UnknownUser
 

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).