
About Cocoon

This is only a quick introduction to reading a sitemap. There is a lot of material on Cocoon on the web (Introduction to Cocoon), and the best Cocoon book around is the Cocoon Developer's Handbook.

Introduction to Cocoon

Cocoon is an XML integration framework. The whole idea of Cocoon is a collection of components that exchange data as XML. The connections between those components are made through pipelines.

A Cocoon pipeline is a logical construct made of:
  • 1 generator: a component that generates SAX events (the most common generator is the ‘file’ generator, which reads an XML document from disk)
  • 0 or more transformers: components that alter the stream of SAX events by various means (the most common is XSLT)
  • 1 serializer: a component that receives the events at the end of the pipeline and reserializes them. It does the inverse of the generator's job: it takes SAX events and turns them into a file or a stream of bytes. A serializer often doesn't do much: it collects the resulting events, assigns a MIME type to the result, and sends it to the caller of the pipeline.

So you can picture a pipeline like this

Generator -> Transformer -> Transformer -> Transformer -> Serializer

Pipelines can then be connected together to do useful work.
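The generator, transformer and serializer roles can be pictured with a minimal Python sketch. This is not Cocoon code: plain strings stand in for SAX events, and every name here (run_pipeline, file_like_generator, upper_case, text_serializer) is hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, List

# Assumption: plain strings stand in for SAX events.
Event = str
GeneratorFn = Callable[[], Iterable[Event]]
TransformerFn = Callable[[Iterable[Event]], Iterable[Event]]
SerializerFn = Callable[[Iterable[Event]], bytes]


def run_pipeline(generate: GeneratorFn,
                 transformers: List[TransformerFn],
                 serialize: SerializerFn) -> bytes:
    """Run one generator, zero or more transformers, then one serializer."""
    events = generate()
    for transform in transformers:
        events = transform(events)
    return serialize(events)


def file_like_generator() -> Iterable[Event]:
    # Stands in for the 'file' generator; the events are hard-coded here.
    yield "hello"
    yield "world"


def upper_case(events: Iterable[Event]) -> Iterable[Event]:
    # A transformer alters the event stream and passes it on.
    for event in events:
        yield event.upper()


def text_serializer(events: Iterable[Event]) -> bytes:
    # The serializer turns the final events back into a stream of bytes.
    return " ".join(events).encode("utf-8")


result = run_pipeline(file_like_generator, [upper_case], text_serializer)
print(result)  # b'HELLO WORLD'
```

Passing an empty list of transformers is also valid, which mirrors the "0 or more transformers" rule.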

Those pipelines are declared in a special file called a sitemap. This file is a collection of component declarations and pipelines. Sitemaps are organised into hierarchies, from the topmost sitemap that sits under the cocoon directory (the root) down to application-specific sitemaps in subdirectories. A typical Cocoon installation is made of a series of directories and subdirectories, which represent sites and sub-sites.

When a URL is parsed by Cocoon, such as

http://localhost:8080/cocoon/iugs/testbed2/arcims

the topmost sitemap is used to match the portion after cocoon/ (iugs/testbed2/arcims) with a pipeline. In the default installation, if the sitemap does not contain any matching pipeline, Cocoon attempts to read one level lower, expecting in this case an iugs directory, and matches testbed2/arcims in the sub-sitemap, and so on. (This behaviour can be configured, but this is the default and the most commonly used.)


The key point to understand here is that none of this directory structure really needs to exist. Directories are only a convenience for managing your site; you could just as well have your whole site in a single sitemap, even if the URI suggests a complex directory structure. For Cocoon, a URI is just a structured string to match against a pipeline.

In my example, my sitemap.xmap file is located in the cocoon/iugs directory and I’ll use the default Cocoon behaviour: redirect all http://...:8080/cocoon/iugs/… to this sitemap. Therefore, http://localhost:8080/cocoon/iugs/testbed2/arcims will be handled by cocoon/iugs/sitemap.xmap.
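The "try one level lower" behaviour described above can be sketched roughly in Python. This is a simplification, not Cocoon's implementation: the SITEMAPS dictionary and the resolve function are hypothetical, and exact string comparison stands in for real pattern matching.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: each sitemap is keyed by its directory ('' is the root)
# and lists the URIs it can serve. Exact strings stand in for match patterns.
SITEMAPS: Dict[str, List[str]] = {
    "": [],                       # cocoon/sitemap.xmap: no matching pipeline
    "iugs": ["testbed2/arcims"],  # cocoon/iugs/sitemap.xmap
}


def resolve(uri: str, directory: str = "") -> Optional[Tuple[str, str]]:
    """Return (sitemap directory, remaining URI) for the sitemap handling uri."""
    if uri in SITEMAPS.get(directory, []):
        return directory, uri
    # No pipeline here: treat the first segment as a sub-directory and retry
    # one level lower with the rest of the URI.
    head, sep, rest = uri.partition("/")
    child = f"{directory}/{head}" if directory else head
    if sep and child in SITEMAPS:
        return resolve(rest, child)
    return None


print(resolve("iugs/testbed2/arcims"))  # ('iugs', 'testbed2/arcims')
```

Note that the sub-sitemap only ever sees the remaining part of the URI (testbed2/arcims), not the prefix that led to it.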

Within the sitemap, there are various ways to ‘match’ this URI to a particular pipeline. The sitemap contains a collection of pipelines, like a big Java switch statement, that is processed from top to bottom until a pipeline is selected for processing.

Here are some examples of matches:

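<map:match pattern="*.html">

will match a single-level URI, for example index.html or testbed2.html.

<map:match pattern="**.html">

will match those as well as URIs with a path, for example application/test/index.html.

A double star (**) means any number of directories. Note that application/test are not real directories, and index.html and testbed2.html are not real files. Sitemaps are disconnected from the directory structure.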

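<map:match pattern="testbed2/arcims">

will only match the URI testbed2/arcims itself, and not a longer URI that merely starts with it.

<map:match pattern="testbed2/arcims/**">

will match any URI that continues below testbed2/arcims/ (** means all the rest of the URI string).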

Note that matches don’t include the leading cocoon/iugs/ portion of the URI; this portion has already been matched by the higher-level sitemap (cocoon/sitemap.xmap).
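The difference between * and ** can be made concrete with a small Python sketch that translates a sitemap-style pattern into a regular expression. It assumes the semantics described above (* stays within one path segment, ** may cross directories); compile_wildcard and match_uri are hypothetical helpers, not part of Cocoon.

```python
import re
from typing import List, Optional


def compile_wildcard(pattern: str) -> "re.Pattern[str]":
    """Translate a sitemap-style wildcard pattern into a regular expression.

    Assumed semantics: '**' matches any characters including '/',
    '*' matches any characters except '/'.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append("(.*)")
            i += 2
        elif pattern[i] == "*":
            parts.append("([^/]*)")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def match_uri(pattern: str, uri: str) -> Optional[List[str]]:
    """Return the list of wildcard captures, or None if the URI does not match."""
    m = compile_wildcard(pattern).fullmatch(uri)
    return list(m.groups()) if m else None


print(match_uri("*.html", "index.html"))                    # ['index']
print(match_uri("*.html", "application/test/index.html"))   # None
print(match_uri("**.html", "application/test/index.html"))  # ['application/test/index']
```

The captured values correspond to the numbered substitutions ({1}, {2}, ...) that a pipeline can pass on as parameters.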

You must consider the order in which the matches are listed. For example:

<map:match pattern="**.html">
<map:match pattern="testbed/**.html">

The second pipeline will never be invoked, because the first match has already matched the URI before the second one gets a chance.

<map:match pattern="testbed/**.html">
<map:match pattern="**.html">

will do the trick: .html URIs under testbed/ will be matched by the first one, and all other .html URIs by the second one.
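The effect of ordering can be checked with a short Python sketch of "first match wins". The first_match helper is hypothetical, and fnmatchcase from the standard library is used only as a stand-in matcher: its * also crosses '/', which is enough for the ** patterns used here but is not how a single * behaves in a sitemap.

```python
from fnmatch import fnmatchcase
from typing import List, Optional


def first_match(patterns: List[str], uri: str) -> Optional[str]:
    """Return the first pattern, in declaration order, that matches the URI."""
    for pattern in patterns:
        # Stand-in matcher: good enough for the '**' patterns in this example.
        if fnmatchcase(uri, pattern):
            return pattern
    return None


general_first = ["**.html", "testbed/**.html"]
specific_first = ["testbed/**.html", "**.html"]

print(first_match(general_first, "testbed/page.html"))   # **.html
print(first_match(specific_first, "testbed/page.html"))  # testbed/**.html
print(first_match(specific_first, "other/page.html"))    # **.html
```

With the general pattern listed first, the more specific one is unreachable; listing the specific pattern first lets both be used.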

Hello world pipeline

Now, let’s see a full example

   <!-- matches any URI of this form; the middle part can be anything (*) -->
   <map:match pattern="test/*/hello.html">
      <!-- read the XML file -->
      <map:generate type="file" src="content/greeting.xml"/>
      <map:transform type="xslt" src="stylesheets/greet2html.xslt">
         <!-- pass what has been matched by * (the first and only substitution) as a parameter;
              the name must match the xsl:param declared in the stylesheet -->
         <map:parameter name="locale" value="{1}"/>
      </map:transform>
      <map:serialize type="html"/>
   </map:match>


This is the file (content/greeting.xml) that will be read by the generator:

<?xml version="1.0" encoding="ISO-8859-1"?>
<Greetings>
   <Greeting lang="fr">Bonjour</Greeting>
   <Greeting lang="en">Hello</Greeting>
   <Greeting lang="es">Hola</Greeting>
</Greetings>


This is the stylesheet (stylesheets/greet2html.xslt) applied by the transformer:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   <!-- read the parameter passed in by the sitemap; defaults to "en" -->
   <xsl:param name="locale">en</xsl:param>
   <xsl:template match="/">
      <!-- get the greeting in the correct language -->
      <xsl:value-of select="Greetings/Greeting[@lang=$locale]"/>
   </xsl:template>
</xsl:stylesheet>

Resulting HTML page

When test/fr/hello.html is called, you get a page containing

Bonjour

When test/en/hello.html is called, you get

Hello

and so on.

If you try test/it/hello.html... you get nothing (you could improve the stylesheet to handle this situation)
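The whole example can be imitated end to end in a few lines of Python, which may help when checking what each URI should return. This is only a sketch of the logic, not how Cocoon executes it: the greet function is hypothetical, a regular expression stands in for the test/*/hello.html match, and a loop over the elements stands in for the XSLT selection.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Same content as content/greeting.xml, inlined so the sketch is self-contained.
GREETING_XML = """<Greetings>
   <Greeting lang="fr">Bonjour</Greeting>
   <Greeting lang="en">Hello</Greeting>
   <Greeting lang="es">Hola</Greeting>
</Greetings>"""


def greet(uri: str) -> Optional[str]:
    """Mimic the pipeline: match the URI, pass {1} on, select the greeting."""
    m = re.fullmatch(r"test/([^/]*)/hello\.html", uri)
    if m is None:
        return None  # no pipeline matched this URI
    locale = m.group(1)  # plays the role of {1} in the sitemap
    root = ET.fromstring(GREETING_XML)
    for node in root.findall("Greeting"):
        if node.get("lang") == locale:
            return node.text
    # Unknown language: the result is empty, as with the stylesheet above.
    return ""


print(greet("test/fr/hello.html"))  # Bonjour
print(greet("test/en/hello.html"))  # Hello
```

Handling the empty case (for example by falling back to a default language) is the kind of improvement the stylesheet could also make.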

-- EricBoisvert - 12 Sep 2005
Topic revision: r4 - 15 Oct 2010, UnknownUser

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).