"Seegrid will be due for a migration to confluence on the 1st of August. Any update on or after the 1st of August will NOT be migrated"
Attached is my first set of observations and notes on trying to map data into a schema equivalent to that of the target document.

The target document is documented here: GeochemistryMeasurement

My attempt at creating an equivalent GeochemSpecimen feature type - GeochemSpecimen.xml

My attempt at creating an equivalent GeochemMeasurement feature type - GeochemMeasurement.xml

Both of these were generated by querying on a specific sample id. I could equally have done this by querying by analyte, but querying by sample id keeps things simple.

My comments are in the attached document. It's a first attempt, so please feel free to point out what else I could do to bring the examples I've generated into closer alignment with the target schema. Overall I would say the biggest problem we have is the inflexibility of the mapping from a result set to the schema elements (Pete and Rob A mentioned this to me recently, and it is elaborated on in the attached document). I'm not trying to be negative, just trying to illustrate the gap between our goal and what we can actually achieve. In that spirit I've attached two example responses, based on Peter's intuitive guesses at what a generic, non-XMML-compliant geochem schema might look like, so that Simon and Rob W can see how far SCO have got with the Geoserver extensions.

A response getting "all" analyses of Ag (the dataset is limited to 5 sites) - MultipleSiteAgGeneric.xml

A response getting all analytes for a single specimen/sample - SingleSiteAllGeneric.xml

Other schemas Simon has in XMML may be simpler to map to than the current target, but given how complex it is to map from a result set to a complicated XML document, I suspect we may end up with some kind of compromise. Hopefully the generic documents give an idea of what is currently possible.

Rob and Pete - for your benefit I've attached the info.xml and schema.xml documents I used in creating the GeochemSpecimen and GeochemMeasurement feature types. Again please feel free to point out anything else I could do to more closely resemble the target document.

Geochem Specimen files - info-gs.xml, schema-gs.xml

Geochem Measurement files - info-gm.xml, schema-gm.xml

-- StuartGirvan - 27 Sep 2004

Use of deprecated GML components

The example documents use some GML components that are deprecated in GML 3, viz:

  • gml:coordinates - use gml:pos and gml:posList instead
  • fid - when used to assign a handle or primary key - use gml:id instead

Incorrect use of GML components

  • fid - when used to carry a reference or foreign key - use xlink:href instead
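To make the two roles of fid concrete, here is a minimal Python sketch using the standard-library xml.etree.ElementTree. It assumes the caller already knows which element names use fid as a reference; that set, the prefix argument and the sample element names are illustrative, not part of any GeoServer or XMML API. The prefix exists because gml:id is typed xs:ID, which must be an XML name and so cannot start with a digit (numeric database keys such as 779943 need one).

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
XLINK_NS = "http://www.w3.org/1999/xlink"
GML_ID = f"{{{GML_NS}}}id"
XLINK_HREF = f"{{{XLINK_NS}}}href"


def upgrade_fids(root, reference_tags, prefix=""):
    """Rewrite GML 2 style fid attributes in place.

    Elements whose tag is in reference_tags use fid as a reference
    (foreign key), so it becomes xlink:href="#<prefix><fid>".
    Every other fid assigns a handle (primary key) and becomes gml:id.
    prefix lets purely numeric keys form valid xs:ID values.
    """
    for elem in root.iter():
        fid = elem.attrib.pop("fid", None)
        if fid is None:
            continue
        if elem.tag in reference_tags:
            elem.set(XLINK_HREF, "#" + prefix + fid)
        else:
            elem.set(GML_ID, prefix + fid)
    return root


# Hypothetical usage: the specimen's fid is a handle, the child's is a reference.
doc = ET.fromstring(
    '<GeochemSpecimen fid="90980153">'
    '<relatedObservation fid="779943"/>'
    "</GeochemSpecimen>"
)
upgrade_fids(doc, {"relatedObservation"}, prefix="id_")
```

ElementTree writes ns0:/ns1: prefixes on output unless ET.register_namespace("gml", GML_NS) and ET.register_namespace("xlink", XLINK_NS) are called before serializing.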

Comparison with XMML

GeochemMeasurement.xml

This is pretty close to the xmml:GeochemMeasurement, except that Geochem looks like a redundant extra tag - the first child element GeochemMeasurement should be promoted to be the container instead. The rest of the properties are then pretty close to correct. The main issue is that the element content mostly consists of a text literal ("Unknown"), which is incorrect. It must either be the appropriate element content, or you must use an xlink:href to point to the value. In the examples I have indicated how the link can be to an "exception" value if necessary.
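A rough sketch of the two fixes suggested above, again with ElementTree: unwrap a single-child container, and turn "Unknown" text literals into an empty property carrying xlink:href. The exception URI below is a hypothetical placeholder, not the value used in the XMML examples.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
# Hypothetical URI for a "value unknown" exception object.
UNKNOWN_HREF = "urn:x-example:exception:unknown"


def promote_single_child(root):
    """Drop a redundant wrapper (e.g. Geochem) that holds exactly one child."""
    return root[0] if len(root) == 1 else root


def link_unknown_values(root, placeholder="Unknown", href=UNKNOWN_HREF):
    """Replace leaf elements whose text is the placeholder with an xlink:href."""
    for elem in root.iter():
        if len(elem) == 0 and (elem.text or "").strip() == placeholder:
            elem.text = None
            elem.set(XLINK_HREF, href)
    return root
```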

Note that the XMML examples (GeochemistryMeasurement#Measurement_view_procedure_orien) bundle all the normalised information into a single XML document. This is for illustration only. A scenario analysis may show that a single feature type would be requested first (xmml:GeochemMeasurement), resulting in a homogeneous feature collection. The instances in this collection would then carry links to external supporting features (e.g. xmml:GeochemSpecimen, xmml:Station) and non-feature objects (xmml:AssayProcedure). Each link might take the form of a WFS request for feature-by-identifier, resolved through further hits on the WFS. The client or middleware would have to choreograph whether the secondary calls happen immediately with the results cached, or only when needed, following the "hyperlink" approach.
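The "hyperlink" choreography could look something like the following sketch. The endpoint is hypothetical, the KVP parameter names follow my reading of the WFS 1.0.0 GetFeature request (FEATUREID selects features by identifier), and the fetch function is injected so the caching logic can be shown without a live server. Calling prefetch corresponds to the "immediately and cached" option; calling resolve only on demand corresponds to the lazy one.

```python
from urllib.parse import urlencode

WFS_BASE = "http://example.org/geoserver/wfs"  # hypothetical endpoint


def feature_by_id_url(type_name, feature_id, base=WFS_BASE):
    """Build a WFS 1.0.0 KVP GetFeature request for one feature identifier."""
    query = urlencode({
        "SERVICE": "WFS",
        "VERSION": "1.0.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
        "FEATUREID": feature_id,
    })
    return f"{base}?{query}"


class LinkResolver:
    """Follow links through an injected fetch(url) callable, caching each response."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}

    def resolve(self, url):
        # Lazy: only hit the WFS the first time a link is needed.
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        return self._cache[url]

    def prefetch(self, urls):
        # Eager: resolve every link now so later lookups come from the cache.
        for url in urls:
            self.resolve(url)
```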

See GeochemistryMeasurement#Mixed_type_feature_collections for discussion of multiple feature-types.

GeochemSpecimen.xml

This one is pretty baffling - not quite sure what you are trying to do.

MultipleSiteAgGeneric.xml

SingleSiteAllGeneric.xml

  • Geochem appears to be a location-oriented container element, approximately equivalent to xmml:Station. See this variant example https://www.seegrid.csiro.au/subversion/xmml/trunk/Examples/geochem/GA_1_measurementsSpecimensAndStations.xml where I have added in this extra feature type. It requires another xlink:href to tie the gml:position of the Specimen to the gml:position of the Station.
  • Rock == xmml:GeochemSpecimen
    • SampleId == gml:name
    • QualifiedLithName == xmml:material
  • Analysis == xmml:GeochemMeasurement, but much of the content should be moved into an xmml:AssayProcedure object.
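The element correspondences in the list can be written down as a simple rename table. This sketch only renames tags; it does not move the procedure content out of Analysis into an xmml:AssayProcedure, and it writes the prefixes literally rather than handling namespaces properly.

```python
import copy
import xml.etree.ElementTree as ET

# Correspondences from the list above; prefixes are literal strings here.
GENERIC_TO_XMML = {
    "Geochem": "xmml:Station",
    "Rock": "xmml:GeochemSpecimen",
    "SampleId": "gml:name",
    "QualifiedLithName": "xmml:material",
    "Analysis": "xmml:GeochemMeasurement",
}


def rename_generic(root, mapping=GENERIC_TO_XMML):
    """Return a renamed copy of a generic tree; unmapped tags are kept."""
    out = copy.deepcopy(root)
    for elem in out.iter():
        elem.tag = mapping.get(elem.tag, elem.tag)
    return out
```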

See GeochemistryMeasurement#Non_feature_objects_needed for discussion of how to handle the xmml:AssayProcedure objects.

-- SimonCox - 27 Sep 2004

More Feedback

OK let me explain a bit more on the GeochemSpecimen.xml attempt.

1) The <place><gmlLocationString> nesting in the example can't be replicated because with the current extensions you can't arbitrarily map a value to a child element unless it actually has a "data" parent value. In this case I think <place> would need to have some value.

What does 'has a "data" parent value' mean? - SC

I made it up because we're working in a bizarre half-relational, half-tree-structure world. I meant that if you have a "column" in a result set, you can only map it to, say, <place><gmlLocationString> if you have previously mapped another, earlier "column" to <place> (which implies a "parent-child relationship", hence my strange name). There's quite a bit I've left unsaid here because it requires a long discussion based on the schema.xml and info.xml files used to configure the feature. - SG

So is the problem merely the fact that the value is wrapped in two tags instead of one? The reason for this is the GML Object-property model - see https://www.seegrid.csiro.au/twiki/bin/view/Xmml/GmlImplementation - SC
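Stuart's "data parent" restriction can be stated as an ordering rule over the column-to-element mapping. The following is only a model of the rule as described here, with hypothetical column names; it is not the code in the GeoServer extensions.

```python
def unmapped_parents(mappings):
    """Check the ordering rule described above.

    mappings is an ordered list of (column, element_path) pairs, with
    paths written like "place/gmlLocationString". A nested path is
    accepted only if its parent path was mapped by an earlier column.
    Returns the paths that break the rule.
    """
    mapped = set()
    problems = []
    for _column, path in mappings:
        parent = path.rpartition("/")[0]
        if parent and parent not in mapped:
            problems.append(path)
        mapped.add(path)
    return problems
```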

2) We can't add in attributes or xlinks to elements as yet so the relatedFeature bits don't exist in either of the schemas. This is obviously a major issue. As Simon has pointed out, if we can use xlinks cunningly we may be able to get around some of our other limitations.

3) I can change analyteType to material.

4) This is the important and difficult bit. The relatedObservations for a specimen in the target example are a simple listing that effectively points to the relevant measurement ids:

  <relatedObservation xlink:href="#GA_1_90980153_Ag" /> 
  <relatedObservation xlink:href="#GA_1_90980153_Al" /> 
  <relatedObservation xlink:href="#GA_1_90980153_As" /> 
  <relatedObservation xlink:href="#GA_1_90980153_Bi" /> 
  <relatedObservation xlink:href="#GA_1_90980153_Ca" />
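Producing this listing needs nothing but the measurement identifiers. A minimal ElementTree sketch (the ids are taken from the example above; the function name is illustrative):

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def add_related_observations(specimen, measurement_ids):
    """Append one empty relatedObservation per id, pointing at it by xlink:href."""
    for mid in measurement_ids:
        ET.SubElement(specimen, "relatedObservation", {XLINK_HREF: "#" + mid})
    return specimen
```

Only the identifiers are needed; no other measurement column appears in the output.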

My attempt at replicating this looked like:

<xmml:Analysis>
  <xmml:Result fid="779943">
    <xmml:Analyte>SiO2</xmml:Analyte>
  </xmml:Result>
  <xmml:Result fid="785671">
    <xmml:Analyte>TiO2</xmml:Analyte>
  </xmml:Result>
</xmml:Analysis>

The most important point is that the result id in my attempt is effectively equivalent to the measurement id given in the relatedObservation xlink:href in the target example. With the current extensions I can't show the result id (which, as Simon has pointed out, works like a primary key/foreign key in a relational structure) without also exposing the analyte of the result. More generally, you can only show an id that acts like a primary/foreign key in an Oracle table if you also expose at least one other column (element, in XML terms) from that table.

Additionally, you can't separate the fid (the foreign/primary key) from its element.

I also can't seem to get rid of the xmml:Analysis tag. I don't understand why, because it's not in my schema.xml or info.xml files. It may have something to do with the fact that in our relational model the result id is a "child" of analysis, but there's no Oracle column or table called Analysis, so I'm not sure what's going on - maybe something is getting stuck in memory?

-- StuartGirvan - 28 Sep 2004

In your discussion you refer consistently to "primary/foreign key" as if this is a single concept. In the GML pattern these roles are very clearly distinguished:
  • the value of gml:id (was fid) assigns a handle or "primary key" to an object;
  • the value of xlink:href makes reference to an object via its handle, so the value of the href is like a "foreign key".
I find it really hard to follow your argument unless these are teased apart.
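The asymmetry between the two roles can be shown in a few lines: gml:id values are collected into an index of handles, and an xlink:href on a property is looked up in that index. This sketch only handles same-document "#id" links; remote URIs would need a fetch step, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

GML_ID = "{http://www.opengis.net/gml}id"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def index_by_gml_id(root):
    """gml:id assigns a handle: map each handle to the object that carries it."""
    return {e.get(GML_ID): e for e in root.iter() if e.get(GML_ID) is not None}


def resolve_local_href(index, prop):
    """xlink:href on a property refers to a handle; follow a same-document '#id' link."""
    href = prop.get(XLINK_HREF)
    if href is None or not href.startswith("#"):
        return None
    return index.get(href[1:])
```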

In the example perhaps you are using "fid" to carry a reference (foreign key). Actually, "fid" is used for the other role, to assign a handle (primary key) (though "fid" is now deprecated in favour of gml:id). SG - Exactly!! - see below

-- SimonCox - 28 Sep 2004

The primary/foreign key thing depends on which table the key is in, in relational terms, so one table's foreign key is another table's primary key (or in GML, one feature's foreign key is another's primary key)

not quite - in GML one property's href is (related to) another Feature's id - this is important - see https://www.seegrid.csiro.au/twiki/bin/view/Xmml/LabelsAndHandles and https://www.seegrid.csiro.au/twiki/bin/view/Xmml/GmlImplementation#Implementation_of_properties - also, in the web-architecture, IDs and URI references are symmetrical but not identical - SC

- but it doesn't actually matter which one you select when retrieving a result set from a database when using a join (which is why I was being free and easy with the nomenclature - must remember, especially after yesterday's discussion, the non-relational perspective :-)).

At the moment, though, the closest we can get to creating a list of xlink:href values or "foreign keys", which is what we really want to do, is to generate a lot of fids (gml:ids) - the "primary keys" in GML terms, here actually populated with the foreign key values - and we are forced to generate each one along with at least one other attribute of the object the primary key belongs to.

It would actually be easier (currently) to produce all of the specimen feature's measurements as objects within the specimen. This is basically what SingleSiteAllGeneric.xml does.

-- StuartGirvan - 28 Sep 2004

My latest attempts at generating equivalents to the target schema are attached:

GeochemMeasurement3rd.xml

GeochemSpecimen3rd.xml

And here's the feedback for Rob and Pete - Feedback on Geoserver extensions 2.doc

-- StuartGirvan - 01 Oct 2004

Here's the latest feedback for Rob, Pete and Simon - it's a PowerPoint presentation so that you can get an immediate view of the differences between the target schema and what I've been able to achieve so far.

geosfeedback.ppt

The only bit of additional feedback is to reiterate the problem with only being allowed to pass in a single parameter to a single placeholder in the SQL statement.

Here's the dilemma:

Say I want to get all the geochemistry data for an area where Au values are > 0.5 ppm. I would need to send the parameters x=Au (analyte) and y=0.5 (analyte_result). In the SQL statement for the feature I'd need to be able to say

where ........ analyte=x and analyte_result > y

We can't do that because only one parameter can be passed in (either x or y).
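For comparison, a statement with two placeholders is routine for a database driver. The sketch below uses Python's built-in sqlite3 purely as a stand-in for the Oracle database; the table, column names and rows are made up.

```python
import sqlite3

# Two placeholders: analyte (x) and a threshold on analyte_result (y).
# Table and column names are hypothetical.
QUERY = (
    "SELECT sample_id, analyte, analyte_result FROM results "
    "WHERE analyte = ? AND analyte_result > ? "
    "ORDER BY sample_id"
)


def demo_connection():
    """In-memory stand-in for the real database, with made-up rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE results (sample_id INTEGER, analyte TEXT, analyte_result REAL)"
    )
    con.executemany(
        "INSERT INTO results VALUES (?, ?, ?)",
        [(10000, "Au", 0.7), (10001, "Au", 0.2), (10002, "Ag", 3.0)],
    )
    return con


con = demo_connection()
rows = con.execute(QUERY, ("Au", 0.5)).fetchall()  # both parameters bound at once
```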

Additionally if I wanted to get all the data for a Sample whose ID I already knew (say 10000, and we pass it as parameter x) I would need the SQL statement to have something like

where ........... sample_id=x

Because at the moment you can only define a feature with one SQL statement and one placeholder for an argument, I'd have to define two different feature type services to deal with the two different scenarios described above (even if the first one could be constructed in some manner).

I guess this could all be solved if it were possible to insert the SQL generated by the XML filter before the ORDER BY clause (which is in turn crucial for the schema mapping) in the constructed SQL statement, but from what Pete has told me I'm assuming this is pretty difficult.
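One way to picture the suggestion: translate the filter into conditions, append them as a WHERE clause, and only then add ORDER BY, so a single feature definition serves both the Au-threshold and the sample-id scenarios. The function and its arguments are hypothetical, and it assumes the base SELECT carries no WHERE clause of its own (joins written with JOIN ... ON).

```python
def build_query(base_select, order_by, conditions, allowed_columns):
    """Insert filter-derived conditions before ORDER BY.

    conditions: (column, operator, value) tuples, e.g. translated from
    an XML filter. Column names are interpolated only if whitelisted
    by the feature mapping; values are always bound as parameters.
    """
    operators = {"=", "<>", "<", "<=", ">", ">="}
    clauses, params = [], []
    for column, op, value in conditions:
        if column not in allowed_columns or op not in operators:
            raise ValueError(f"unsupported condition: {column} {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    sql = base_select
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    # ORDER BY goes last so the rows still arrive grouped for the schema mapping.
    return sql + " ORDER BY " + order_by, params
```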

-- StuartGirvan - 19 Oct 2004

Here are my comments on Stuart's PowerPoint:

  1. The pattern
 <propertyA>
    <ObjectB ... />
 </propertyA>
is pretty fundamental to GML, going back to GML 2, so this restriction is not acceptable in general. However, in this specific case, gml:LocationString/LocationKeyword is a little weird, since they are Object-level elements with simpleContent, which is not the common pattern in GML. So I can see some merit in making the property itself have simpleContent. I will make this change in the schema and examples. Thanks for drawing attention to this.

Thanks for that. I guess the problem is pretty fundamental for mapping to any XML but unfortunately there's nothing I can do about the mapping limitations at the moment and I'm guessing that a more flexible solution that can handle this will take a while to come up with as it's a very difficult problem to solve. I'm not expecting a solution in time for the end of November but it would be good to get clarification from Rob A and/or Pete as to whether or when this is likely to get done? I'm expecting an answer of "Not at all" as part of this project or at least not for a while yet. But I agree it's fundamental. -- StuartGirvan - 20 Oct 2004

I'm not sure it is an issue... the mapping limitations center on the use of multiple-value properties. So
<a>
   <relationshipX>
        <b>
              <bproperty1/>
              <bproperty2/>
        </b>
   </relationshipX>
   <relationshipY>
        <c>
              <cproperty1/>
              <cproperty2/>
        </c>
   </relationshipY>
   <relationshipY>
        <c>
              <cproperty1/>
              <cproperty2/>
        </c>
   </relationshipY>
</a>

is fine - as long as object b appears once only, and its properties do not require multi-valued nesting -- RobAtkinson - 20 Oct 2004
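A sketch of why this shape maps cleanly from a flat result set, as I read Rob's comment: rows ordered by the id of a repeat the single-valued b columns, and each row adds one c. Column names are hypothetical. A multi-valued b with its own nested multi-valued properties would need a second level of grouping, which this single pass does not attempt.

```python
def rows_to_features(rows):
    """Fold joined rows into nested features.

    rows must be ordered by a_id (hence the importance of ORDER BY).
    Each row repeats the single-valued b columns and carries one c,
    so b is taken from the first row of each group and every row
    contributes one relationshipY entry.
    """
    features = []
    current = None
    for row in rows:
        if current is None or current["id"] != row["a_id"]:
            current = {
                "id": row["a_id"],
                "relationshipX": {"bproperty1": row["b1"], "bproperty2": row["b2"]},
                "relationshipY": [],
            }
            features.append(current)
        current["relationshipY"].append({"cproperty1": row["c1"], "cproperty2": row["c2"]})
    return features
```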

  1. coordinates vs pos - This is fair enough from one POV - the current official version of WFS (1.0) specifies GML2 (though WFS 1.1 will specify GML 3.1). I can't imagine this would be a very hard change to make to GeoTools though - I briefly discussed this with RobA, maybe a GML2/3 "switch" in the request ...?

Need comment from Rob A or Pete on how difficult/likely to get done this is. -- StuartGirvan - 20 Oct 2004
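The content change from coordinates to pos/posList is mostly mechanical. A minimal sketch, assuming the GML 2 defaults (comma between ordinates, whitespace between tuples, "." as decimal point; the decimal attribute is not handled): a single tuple would go into gml:pos, several into gml:posList. The function name is illustrative and not a GeoTools API.

```python
def coordinates_to_poslist(text, cs=",", ts=None):
    """Convert gml:coordinates content to gml:pos / gml:posList content.

    Tuples are split on ts (default: any whitespace), ordinates on cs.
    Returns (space_separated_ordinates, dimension, tuple_count).
    """
    tuples = [t for t in (text.split(ts) if ts else text.split()) if t.strip()]
    ordinates = [t.split(cs) for t in tuples]
    dims = {len(o) for o in ordinates}
    if len(dims) != 1:
        raise ValueError("empty input or tuples with inconsistent dimension")
    flat = " ".join(v.strip() for o in ordinates for v in o)
    return flat, dims.pop(), len(tuples)
```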

  1. moving LDL to the result element: You can't just go adding arbitrary extra XML attributes wherever convenient!
This is not in the schema. My contention is that the sensitivity is Procedure information, so that's where it should be recorded. When the procedures are recorded in a more orderly manner, there will usually be many analyteDetails properties for a single Procedure. It's OK to have a unique procedure per measurement, but we shouldn't be creating new object types (Assay Procedure without analyteDetails) when there is a perfectly good object that already lifts the same load... So this one I pretty much reject.
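A sketch of the structure being argued for: the lower detection limit lives in per-analyte details on a shared procedure, and each measurement points at that procedure instead of carrying its own LDL. Class and field names are illustrative Python, not the XMML schema, and the limits are made-up numbers.

```python
from dataclasses import dataclass, field


@dataclass
class AnalyteDetails:
    analyte: str
    lower_detection_limit: float
    uom: str


@dataclass
class AssayProcedure:
    proc_id: str
    analyte_details: list = field(default_factory=list)  # many per procedure

    def ldl(self, analyte):
        for details in self.analyte_details:
            if details.analyte == analyte:
                return details.lower_detection_limit
        raise KeyError(analyte)


@dataclass
class GeochemMeasurement:
    meas_id: str
    analyte: str
    value: float
    procedure: AssayProcedure  # shared by reference, not copied per measurement

    def below_detection(self):
        # Assumes value and LDL are in the same unit of measure.
        return self.value < self.procedure.ldl(self.analyte)
```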

No problems. I can get around it I think. -- StuartGirvan - 20 Oct 2004

New build addresses most of the issues below:

-- RobAtkinson - 30 Sep 2004

Topic revision: r2 - 15 Oct 2010, UnknownUser
 

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).