"Seegrid will be due for a migration to confluence on the 1st of August. Any update on or after the 1st of August will NOT be migrated"

Workflow Systems Survey

Overview

As part of the VGL project, a coarse-grained survey of other workflow tools was conducted. This review extends previous reviews of scientific workflow systems completed by team members, details of which can be found here: https://discovery.mdu.csiro.au/wiki/DataInfrastructure/Darda/ScientificWorkflowSystems (requires auth)

In this evaluation, we first look briefly at the differences between scientific workflows and business workflows, which helps us understand the characteristics of scientific workflows. With that understanding, we then investigate several well-known scientific workflow systems in the e-Research/e-Science community. This investigation tries to (a) find common patterns used in those scientific workflow systems (is there a one-size-fits-all workflow system?) and (b) briefly look at the features/functionality they implement. Last, we use what we have learned about other scientific workflow systems to evaluate our own Virtual Geophysics Laboratory (VGL) workflow system. The aim is to find room for improvement/innovation, which is the main goal of this evaluation exercise.

Scientific Workflows vs Business Workflows

(What differentiates a scientific workflow from a traditional business process workflow)
  • Sharing and reusing of workflows between scientists [1]
  • Scientific workflow systems provide an environment to aid the scientific discovery process through the combination of scientific data management, analysis, simulation, and visualisation.
  • Scientific workflows focus on innovation, rather than automation [11].
Comparison of scientific and business workflows, aspect by aspect:

  • Scientific workflow: Experiment-driven, i.e. the outcome of a computational experiment may validate/prove or invalidate a scientific hypothesis.
  • Business workflow: Business-driven, i.e. the outcome of a business workflow is known before the workflow starts, e.g. when applying for annual leave, the employee's leave application may either get approved or rejected.

  • Scientific workflow: Multiple related and interdependent workflow instances [2] [3], e.g. duplicating an executed workflow with different parameter values.
  • Business workflow: Commonly handles large numbers of independent workflow instances at any given time [3], e.g. a leave application workflow, an order workflow, etc.

  • Scientific workflow: Rarely requires human intervention (mostly in an early stage of workflow preparation, i.e. to provide input parameters, select computation resources, etc.).
  • Business workflow: Usually involves numerous people in different roles (esp. human interaction workflows) and in different stages of workflow execution.

  • Scientific workflow: Few people are involved in the same process; it usually involves only a scientist but several different complex computing units and complex data [2].
  • Business workflow: The fulfillment of a business goal is about a set of clear business rules within an organisation hierarchy context, a set of deontic rules, and clear coordination rules amongst the people involved in the process [2].

  • Scientific workflow: The computation model is dataflow oriented and execution control flows implicitly with the data (^2). E.g. A -> B, actor A produces data that B consumes.
  • Business workflow: Typically the computation model is control-flow/process oriented and dataflow is often implicit or modeled separately. E.g. A -> B, B can only start after A finishes.

  • Scientific workflow: Typically involves transportation and analysis of large quantities of data (big data) between distributed repositories. Requires consideration of stream-based and concurrent execution of tasks.
  • Business workflow: Light-weight control data.

  • Scientific workflow: Many different models of computation exist, e.g. dataflow process network (PN), synchronous dataflow (SDF), collection-oriented modeling and design (COMAD), etc. No single best or universal model of computation fits all needs equally. Dataflow-based computation models are widespread among scientific workflows. OMII-BPEL, led by the Open Middleware Infrastructure Institute, attempts to bring an industrial standard, BPEL, to scientific workflow modeling and Grid services orchestration [11] [19].
  • Business workflow: Petri nets (^1) are used as the underlying and unifying foundation for describing and analysing workflows. Standards: BPEL (2.0), BPMN (2.0), YAWL (1.2), XPDL (2.2) (see Glossary for details).

  • Scientific workflow: Provenance management or tracking [1] [2]. Scientists are very concerned with the intermediate steps, results and data of a scientific process.
  • Business workflow: Process mining [2]. Business people want to know which parts of their processes can be optimised based on previous runs to reduce maintenance costs.

  • Scientific workflow: Typical workflow life cycle [3]: (A) Hypothesis/Experiment Goals -> (B) Experiment/Workflow Design -> (C) Workflow Preparation -> (D) Workflow Execution -> (E) Post-Execution Analysis -> (A)
  • Business workflow: Typical workflow life cycle [12]: (A) Model -> (B) Implement -> (C) Execute -> (D) Monitor -> (E) Optimise -> (A)

  • Scientific workflow: Open source: Pegasus, Kepler, Taverna, Triana, etc. (see next section for further details). Commercial: InforSense Platform (http://www.inforsense.com).
  • Business workflow: Open source: Activiti (http://www.activiti.org), jBPM (http://www.jboss.org/jbpm), Enhydra Shark (http://www.together.at/prod/workflow/tws), Joget Workflow (http://www.joget.org), Spiff Workflow (https://github.com/knipknap/SpiffWorkflow/wiki); more Java-based workflow engines are listed at http://java-source.net/open-source/workflow-engines. Commercial: Microsoft InfoPath (http://office.microsoft.com/en-au/infopath), K2 Workflow & BPM (http://www.k2.com/en/index.aspx), NINTEX Workflow (http://www.nintex.com/en-US/Pages/default.aspx), IBM Lotus Workflow (http://www-01.ibm.com/software/lotus/products/workflow/).

^1 Petri nets have an exact mathematical definition of their execution semantics, with a well-developed mathematical theory for process analysis [4].
^2 Pro: Resulting model is often simpler and allows stream-based, pipeline-parallel execution [2].
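
The A -> B contrast in the comparison above can be illustrated with a minimal Python sketch. It is not taken from any of the surveyed systems; the names `actor_a`, `actor_b`, `run_dataflow` and `run_control_flow` are hypothetical. In the dataflow form, B consumes each item as soon as A produces it (stream-based); in the control-flow form, B starts only after A has finished.

```python
from typing import Iterable, Iterator, List

log: List[str] = []  # records the order in which work happens


def actor_a(n: int) -> Iterator[int]:
    """Actor A: emits items one at a time."""
    for i in range(n):
        log.append(f"A produces {i}")
        yield i


def actor_b(items: Iterable[int]) -> Iterator[int]:
    """Actor B: consumes items and emits a transformed value."""
    for item in items:
        log.append(f"B consumes {item}")
        yield item * 2


def run_dataflow(n: int) -> List[int]:
    # Execution order is driven by data availability (stream-based).
    return list(actor_b(actor_a(n)))


def run_control_flow(n: int) -> List[int]:
    # Task B may start only after task A has finished completely.
    produced = list(actor_a(n))      # task A runs to completion
    return list(actor_b(produced))   # then task B starts
```

Both functions return the same result; only the interleaving recorded in `log` differs, which is what makes pipeline-parallel execution possible in the dataflow case.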

Scientific Workflow Systems

Because of the large number of scientific workflow systems and the limited time available, it is impossible to cover all of them in this survey. Below we survey the most popular, vibrant and mature scientific workflow systems and leave out the others, providing their respective hyperlinks for future reference and investigation.

Pegasus

  • Website: http://pegasus.isi.edu/
  • Job-oriented "grid workflows" that employ a DAG-based execution model without loops.
  • Consists of a workflow mapper and a workflow executor/engine known as DAGMan.
  • Support for RESTful web services & OGC service consumption:
  • Support for the use of Cloud in workflow execution:
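
A DAG-based execution model without loops can be sketched as follows. This is not Pegasus or DAGMan code; it only illustrates the idea using Python's standard `graphlib` module (Python 3.9+), and the job names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical jobs: each key maps to the set of jobs it depends on.
dependencies = {
    "preprocess": set(),
    "simulate": {"preprocess"},
    "analyse": {"simulate"},
    "plot": {"simulate"},
    "report": {"analyse", "plot"},
}


def execution_order(deps):
    """Return one valid job order, or raise CycleError if the graph has a loop."""
    return list(TopologicalSorter(deps).static_order())


def has_loop(deps):
    """A DAG-based engine must reject graphs that contain cycles."""
    try:
        execution_order(deps)
    except CycleError:
        return True
    return False


order = execution_order(dependencies)
```

In this sketch, jobs with no mutual dependency ("analyse" and "plot") may appear in either order, which is also what allows an engine to run them in parallel.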

Kepler

  • Established in 2002 by members of the Science Environment for Ecological Knowledge (SEEK) project and the Scientific Data Management (SDM) project.
  • Built upon Ptolemy II framework, developed at University of California, Berkeley. (http://ptolemy.eecs.berkeley.edu/ptolemyII/)
  • Website: https://kepler-project.org/
  • Uniform access to computational components through actor model (containing executable code).
  • Many different models of computation are possible e.g. Synchronous - processing occurs one component at a time, Parallel - one or more components run simultaneously.
  • Focus on actor-oriented design: actors are the executable components of a workflow; a director controls the execution of the workflow; ports are used to produce and consume data and to communicate with other actors in the workflow; and parameters are values that can be attached to the workflow or to individual directors/actors. Every Kepler workflow needs a director.
  • Covered in CSIRO Workspace Product Comparison (see https://wiki.csiro.au/display/Workspace/Features for further details).
  • Support for RESTful web services & OGC service consumption:
  • Support for the use of Cloud in workflow execution:
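
The actor, port and director roles described above can be sketched in a few lines of Python. This is not the Kepler or Ptolemy II API; the classes `Actor` and `SequentialDirector` are hypothetical and model only a simple linear chain of actors fired one at a time.

```python
from typing import Callable, List


class Actor:
    """An executable component with one input port and one output port."""

    def __init__(self, name: str, fire: Callable[[int], int]):
        self.name = name
        self.fire = fire                    # the actor's executable code
        self.input_port: List[int] = []     # tokens waiting to be consumed
        self.output_port: List[int] = []    # tokens produced so far


class SequentialDirector:
    """Controls execution: fires one actor at a time, in the given order."""

    def __init__(self, actors: List[Actor]):
        self.actors = actors

    def run(self, tokens: List[int]) -> List[int]:
        self.actors[0].input_port.extend(tokens)
        for position, actor in enumerate(self.actors):
            # Consume every token on the input port, produce on the output port.
            while actor.input_port:
                actor.output_port.append(actor.fire(actor.input_port.pop(0)))
            # Connect this actor's output port to the next actor's input port.
            if position + 1 < len(self.actors):
                self.actors[position + 1].input_port.extend(actor.output_port)
        return self.actors[-1].output_port


scale = Actor("scale", lambda x: x * 10)
offset = Actor("offset", lambda x: x + 1)
result = SequentialDirector([scale, offset]).run([1, 2, 3])
```

The point of the separation is that the actors stay unchanged if the director is swapped for one with a different model of computation, for example one that fires actors concurrently.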

Taverna

  • Created by myGrid team (http://www.mygrid.org.uk/) and funded through the OMII-UK (http://www.omii.ac.uk/).
  • Website: http://www.taverna.org.uk/
  • Domain-independent and used in domains such as Bioinformatics, chemistry, astronomy, data and text mining, document and image analysis etc.
  • Implemented as a service-oriented architecture, based on Web service standards (such as SOAP/WSDL, REST, etc.). [6]
  • Uses a proprietary workflow language known as Simple Conceptual Unified Flow Language (SCUFL) for representing workflows as DAGs, which makes the workflows simple to share and manipulate outside the editor.
  • Predominately dataflow-oriented model of execution that supports loops [3] [21].
  • Nodes in the graph represent processors, which transform input data into output data. A processor with no inputs acts as a data source and a processor with no outputs acts as a data sink/output. The directed arcs between the nodes are generally channels for passing the output of one processor as input to another. Taverna supports iterative execution of a processor and a number of control flow constructs (if-else, switch-case, etc.) for organising control flow operations.
  • Covered in CSIRO Workspace Product Comparison (see https://wiki.csiro.au/display/Workspace/Features for further details).
  • Support for RESTful web services & OGC service consumption: Taverna can invoke generic WSDL-style or REST-style Web services. Two example Taverna workflows on the myExperiment website show (a) how to invoke a gdalinfo service and list various information about a GDAL-supported raster data set, and (b) how to read and process a GML file with a featureCollection in which the polygon is described using the gml:coordinate element structure (see http://www.myexperiment.org/workflows?query=OGC for further details).
  • Support for the use of Cloud in workflow execution: Taverna is deployed and used mainly as a standalone workbench. It also has a server that can be set up as a dedicated server for executing workflows remotely.

CSIRO Workspace

  • Developed and supported by the Computational Model Group in CMIS.
  • Designed to be a general purpose framework.
  • Website: https://wiki.csiro.au/display/Workspace/Main
  • Support for RESTful web services & OGC service consumption:
  • Support for the use of Cloud in workflow execution:

TWB/Project Trident

Feature comparison of Taverna (2.4), Kepler (2.3), Pegasus (4.1), Workspace (2.26.2) and Project Trident (1.2), in that order:
Domain(s)
  • Taverna: Various domains, including Arts, Astronomy, Biodiversity, Bioinformatics, Chemistry, Data and text mining, Geoinformatics, etc. [20]
  • Kepler: Numerous domains, including bioinformatics, ecoinformatics, geoinformatics, etc.
  • Pegasus: Various domains, including astronomy, bioinformatics, earthquake science, gravitational wave physics, ocean science, etc.
  • Workspace: General purpose.
  • Project Trident: General purpose. So far, it has been used in hydrology, geospatial, oceanography, astronomy and bioinformatics research projects.

Data Discovery, Access & Transformation
  • Taverna: Via web service registries, including BioMOBY (a registry of bioinformatics web services), caBIG (the cancer biomedical informatics grid), biomart (a bioinformatics data repository), soaplab (a web service framework specialised for bioinformatics programs), etc. [23] About 3000 services are available in Taverna. New services can be gathered from anywhere on the web if they comply with the WSDL standard. Provides Shim services to 'glue' together services that otherwise have incompatible outputs/inputs (similar to the concept of pre- or post-processing of data).
  • Kepler: Provides direct access to scientific data that has been archived in many of the commonly used data archives, e.g. access to data stored in the Knowledge Network for Biocomplexity (KNB) Metacat server and other data sources.
  • Pegasus: Via Replica Catalog.
  • Workspace: Supports any existing data type in the workflow, but the documentation doesn't mention how the data can be discovered.
  • Project Trident: Via the Trident registry. It serves as a catalog of known data sets, services, workflows and activities, and compute resources, as well as maintaining state for all active workflows. Links to data can be stored in the registry.

Model of Computation (Data Analysis & Computation)
  • Taverna: Predominately dataflow-oriented. It supports loops and provides a number of control flow constructs for organising control flow operations.
  • Kepler: Supports many different models of computation inherited from Ptolemy II, such as PN, DE, SDF, SR, CT, etc., via a software component called a director. Kepler defines control using directors that determine the workflow node behaviour.
  • Pegasus: Predominately dataflow-oriented.
  • Workspace: Predominately dataflow-oriented.
  • Project Trident: Dataflow-oriented like other scientific workflow systems. Supports basic workflow constructs such as the If Else activity, etc. Comes with out-of-box activities as workflow building blocks. Allows custom activity libraries to be built to handle tasks such as specialised data processing procedures.

Workflow Language
  • Taverna: SCUFL
  • Kepler: None, though workflows are saved as XML files.
  • Pegasus: DAX
  • Workspace: None
  • Project Trident: None

Workflow Composition
  • Taverna: Via Taverna Workbench, a desktop client application.
  • Kepler: Via an easily navigated, drag-and-drop visual interface.
  • Pegasus: Via Java, Perl and Python APIs. No GUI exists yet, but one is in their project plan and architecture diagram.
  • Workspace: Via the feature-rich graphical Workspace Editor. Supports arbitrary user interface creation via the Qt toolkit and allows these user interfaces to be tied into workflows.
  • Project Trident: Via the Trident Workflow Composer and library, which enable scientists to visually author a workflow using a catalog of existing activities and complete workflows. [22]

Workflow Execution
  • Taverna: On a dedicated server (Taverna Server), grid, cloud, behind a portal or bundled with products. A CLI tool exists for quick execution of workflows from a terminal without the overheads of a GUI.
  • Kepler: Depends on Ptolemy II for the execution of its workflows. Can run as a standalone application with GUI support.
  • Pegasus: On the DAGMan workflow engine. Can execute on a number of resources: local machine, campus clusters, grids and clouds (EC2, S3). DAGMan relies on the resources (compute, storage, and network) defined in the workflow to perform the necessary actions.
  • Workspace: Runs in the graphical editor, as a command-line task (in batch mode) or combined with a user interface to create a custom standalone application. Currently it doesn't support parallel (multi-core and distributed) workflow execution.
  • Project Trident: Platform dependent. Can run on a stand-alone Windows workstation or a distributed Windows HPC Server 2008 cluster. There are 2 types of workflow application: client and Silverlight-based. Allows users to schedule and queue workflow execution based on time, resource availability, etc.

Workflow Management/Monitoring
  • Taverna: Via Taverna Workbench. It can be used to monitor the running of a workflow and to examine the provenance of data produced.
  • Kepler: Unknown, as this is not mentioned anywhere in their documentation.
  • Pegasus: Uses pegasus-status, a CLI, to monitor the execution of a workflow. Uses pegasus-analyzer, a CLI, to analyse a workflow and provide a summary of the run. Supports event notification (defined in DAX) for workflows and tasks, such as on start, on end, on failure, on success, etc.
  • Workspace: (no entry)
  • Project Trident: Via Management Studio. It handles a variety of workflow-related tasks such as scheduling jobs, managing running jobs, examining completed jobs, etc. Supports a variety of event notifications, such as upon job completion, etc.

Data Visualisation & Interaction
  • Taverna: Comes with a set of built-in "renderers" for displaying data. Renderers are selected based on the MIME type associated with any given workflow output. [23] Also via third-party tools, e.g. UTOPIA (a visualisation tool for DNA and proteins).
  • Kepler: Bundled with several actors that provide visualisation support. [23] These include 2D/3D plots, bar charts and tables.
  • Pegasus: Not supported or out of scope.
  • Workspace: Supports 2D/3D visualisation & interaction capabilities.
  • Project Trident: Not supported or out of scope.

Error Handling/Recovery
  • Taverna: Provides failure notifications, retry, failover and automatic substitution of alternates.
  • Kepler: Unknown, as this is not mentioned anywhere in their documentation.
  • Pegasus: Provides fine- and coarse-grained error recovery support by retrying tasks, retrying the entire workflow, trying alternative data sources, etc.
  • Workspace: There is no mention of error recovery on the project website.
  • Project Trident: Has a fault-tolerance and recovery service for workflows. Facilitates smart reruns, what-if analysis, etc.

Workflows Sharing & Reuse
  • Taverna: Via myExperiment, a social networking site and Virtual Research Environment designed for people to share, discover and reuse workflows and other files. Taverna Workbench has built-in support for myExperiment.
  • Kepler: Workflows and customised components can be saved, reused, and shared using the Kepler archive format (KAR). Provides a centralised component repository where components and workflows can be uploaded, downloaded, searched and shared with the community or designated users.
  • Pegasus: (no entry)
  • Workspace: There is no mention of workflow sharing and reuse on the project website.
  • Project Trident: Uses myExperiment (similar to Taverna) as the community site for sharing workflows, along with provenance traces. A workflow can be packaged as an Open Packaging Convention (OPC) file.

Written In
  • Taverna: Java
  • Kepler: Java
  • Pegasus: Java 1.6, Python 2.4
  • Workspace: C++ (workflow engine), Qt toolkit (user interface)
  • Project Trident: .NET framework programming languages, Windows Workflow Foundation. WPF, Windows Forms or Silverlight (user interface).

Provenance
  • Taverna: Offers a provenance export capability to an OPM graph [14] or a Janus RDF graph [15]. Provides programmatic access to provenance data or direct access to the provenance DB.
  • Kepler: Support exists in the form of an optional add-on module suite known as the Provenance Module. That module includes a Java API to access provenance data stored in a relational DB.
  • Pegasus: The provenance data is collected in a database, and the data can be summarised with tools such as pegasus-statistics and pegasus-plots, or directly with SQL queries.
  • Workspace: There is no mention of provenance handling on the project website.
  • Project Trident: Provenance records are captured either locally or in the cloud. The provenance service records a detailed history of each Trident job.

Security
  • Taverna: Uses HTTPS for secure web services invocation. Provides secure management of users' credentials via Credential Manager.
  • Kepler: Unknown, as this is not mentioned anywhere in their documentation. It could very much depend on the individual application that uses Kepler as its workflow framework.
  • Pegasus: (no entry)
  • Workspace: (no entry)
  • Project Trident: (no entry)
     
Portal-based Access
  • Taverna: Designed to run workflows within the Taverna Workbench. However, workflows can also be run from web pages and behind a portal.
  • Kepler: Kepler is a desktop-based application. Hydrant [18] is a web-based portal which sits on top of the core Kepler engine.
  • Pegasus: None. However, portal-based access is in their project plan and architecture diagram.
  • Workspace: None. Workspace is a desktop-based application.
  • Project Trident: Includes a web portal written in Silverlight that allows scientists to launch and manage workflows from any internet location.

License
  • Taverna: LGPL
  • Kepler: BSD
  • Pegasus: Apache 2.0
  • Workspace: Flexible licensing model which allows closed and open source development. LGPL for the Qt toolkit.
  • Project Trident: Apache 2.0

Other functionality or selling points
  • Taverna: Supports pausing/resuming of workflow execution.
  • Kepler: Its functionality can be extended by creating new actors.
  • Pegasus: (no entry)
  • Workspace: Allows new functionality to be added via plugin development (in C).
  • Project Trident: Supports workflow execution monitoring with resource usage analysis and intelligent completion estimates. Allows serialisation and restoring of the entire working state of an in-progress workflow, hence allowing pausing and resuming of workflows and archiving of intermediate state to any capable storage device.

Others

Analysis of VGL Workflow (version 1.1)

  • Built on top of Spatial Information Services Stack (SISS)
  • Implemented as a service-oriented architecture (SOA), based on Web service, spatial data interoperability and metadata standards.
  • Domain: Scientific workflow portal for geophysicists.
  • Discovering and capturing of data is done via intuitive and interactive web user interface.
  • Using code to describe workflow logic/code-centric (code samples are provided)
  • Parallel execution (multi-core) of workflow is provided by individual toolbox/execution environment through the use of MPI programming library.
  • Cloud based (Nova for compute and Swift for storage)
  • Any spatial data type can be supported due to its code-centric nature.
  • Data, workflow logic and the workflow execution results are stored in S3 cloud storage and can be published to GeoNetwork. (Provenance tracking)
  • Uses OpenID for authentication; the current OpenID provider is myOpenID, run by Janrain, Inc. Integration with AAF is underway.
  • Provides role-based access control to computational toolbox/software.
  • Provenance can be tracked via the inspection of data sets, workflow logic (Python script), workflow execution results and execution log.
  • GNU Lesser General Public License Version 3.

Key Strengths and Weaknesses

+ Rich web user interface for spatial data discovery inherited from AuScope portal.
+ Portal-based access - no installation is required by the users.
+ Ability to reproduce or reuse a completed or executing workflow with ease (limited to the individual who created the workflow).
+ Ability to run workflows on an Amazon EC2 compatible cloud computing platform (such as OpenStack's Nova).
+ Ability to perform real-time monitoring and coarse-grained management of workflow execution.
+ Provenance data is stored permanently in a relational database and in Amazon S3 compatible cloud storage (such as OpenStack's Swift).
+ Ability to support multitude of computational models due to its code-centric nature.
+ Ability to import input files/data sets into the workflow via remote web service or from local filesystem.
+ Platform independent - it can run on any Java EE compliant web container (Tomcat Server, JBoss AS, GlassFish, Jetty Web Server, etc.)

- No online documentation exists yet (user guide and tutorials do exist in wiki).
- No visual authoring and representation of workflow logic (both the data and control flows) or data analytical steps.
- No support for custom user interface to be built on top of a workflow due to tightly coupled system architecture and its code-centric nature.
- No plugin support when it comes to the installation and integration of new toolbox/software, and also the development of code libraries for respective toolboxes/software.
- No scheduling and queuing support yet for workflow execution based on time, resource availability, etc.
- Workflow logic is language (Python) dependent and execution environment (toolbox or domain/sub-domain) specific.
- Distributed execution of workflows is not yet supported.
- Sharing of workflows (related files and meta-data) between scientists is possible (via GeoNetwork) but it requires some effort and manual process (no documentation on this yet).
- Limited code samples and libraries for respective toolboxes/software.
- Event notifications are not yet supported.
- No support for nested workflows (ability to execute sub-workflows from a workflow).
- No internationalisation support yet for the user interface.

Discussion/Conclusions

Thus far we have looked at the differences between scientific and business workflows, evaluated a number of scientific workflow systems and analysed our own VGL workflow portal. With this experience, we conclude this survey by choosing 5 aspects of the VGL workflow portal for further discussion. This section is intended to be open-ended so as to allow reviewers of this survey (especially those who are familiar with or interested in scientific workflow systems and the VGL project) to contribute their opinions, ideas and suggestions for the future direction and improvement of VGL.

Declarative vs Procedural Workflow

  • VGL is a code-centric scientific workflow system. It does not use a high-level workflow language to describe its workflows, and it presumes that all scientists doing science are also coders. This makes VGL hard to use for geophysicists who are non-coders. Most scientific workflow systems evaluated in this survey store workflows in a proprietary language (usually in XML format). No standard exists yet for a scientific workflow language, unlike the de facto Business Process Execution Language (BPEL) used in open source and commercial business workflow systems. Attempts have been made to adapt BPEL for scientific workflows, but none of the scientific workflow systems evaluated above supports BPEL for describing scientific workflows. The main reason is that science and business people work in different paradigms: science focuses more on data, whereas business focuses on processes. As to whether VGL should implement yet another proprietary scientific workflow language, I think more research is needed to see if we could either adapt an existing scientific workflow language or come up with a novel approach that makes VGL easy to use for non-coders and, ideally, produces workflows that can be run in other scientific workflow systems or engines.
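
To make the contrast concrete, the sketch below expresses the same three-step workflow twice: once procedurally, as a script the scientist writes, and once declaratively, as a data description interpreted by a generic runner. The step names and the description format are hypothetical; they are not VGL's and do not follow SCUFL, DAX or any other existing workflow language.

```python
from typing import Any, Callable, Dict, List


# Hypothetical processing steps that a toolbox might expose.
def load(values: List[float]) -> List[float]:
    return list(values)


def scale(values: List[float], factor: float) -> List[float]:
    return [v * factor for v in values]


def total(values: List[float]) -> float:
    return sum(values)


# Procedural (code-centric): the scientist writes the control logic.
def procedural(values: List[float]) -> float:
    return total(scale(load(values), factor=2.0))


# Declarative: the workflow is plain data; a generic runner interprets it.
STEPS: Dict[str, Callable[..., Any]] = {"load": load, "scale": scale, "total": total}

workflow = [
    {"step": "load"},
    {"step": "scale", "params": {"factor": 2.0}},
    {"step": "total"},
]


def run(description: List[Dict[str, Any]], data: Any) -> Any:
    """Apply each described step in order, passing the data along."""
    for node in description:
        data = STEPS[node["step"]](data, **node.get("params", {}))
    return data
```

Because the declarative form is plain data, it could be serialised, generated by a visual editor, or translated for another engine, which is the property a non-coder-friendly front end would rely on.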

Error Handling/Fault-tolerance

  • Error handling is particularly important for a workflow system designed and implemented on SOA. The web services and resources VGL depends on can become unavailable at any time without notification from the providers. This can upset or surprise prospective users and hinder the uptake of VGL even though it is not the fault of the VGL workflow portal. From observations, 3 out of 5 evaluated scientific workflow systems provide some kind of fault-tolerance and error recovery mechanism to deal with unexpected errors during workflow submission and execution. We could learn from those workflow systems. Here are a number of ideas to consider: (a) We probably shouldn't bother users with "online resources (compute or storage) not available" errors; instead, we should queue the user's job submission for a silent retry at a later time when those online resources are unavailable. (b) During job execution, if the web services or resources used by the workflow become unstable or unavailable, the workflow execution script could retry a pre-defined number of times before failing the job and sending a failure notification to the user. (c) In the worst-case scenario of a machine instance malfunction, the entire workflow could be retried automatically.
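
Idea (b) above can be sketched as a small retry helper. This is an illustration under stated assumptions, not VGL code: `call_with_retry`, its parameters and the `notify` callback are hypothetical names, and `flaky_service` merely simulates a web service that is unavailable for its first two calls.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retry(
    action: Callable[[], T],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
    notify: Callable[[str], None] = print,
) -> T:
    """Retry `action` a fixed number of times, then notify and fail."""
    last_error: Exception = RuntimeError("action was never attempted")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as error:  # a real script would catch narrower errors
            last_error = error
            if attempt < max_attempts:
                time.sleep(delay_seconds)  # wait before the next attempt
    notify(f"job failed after {max_attempts} attempts: {last_error}")
    raise last_error


# Simulated web service that is unavailable for the first two calls.
calls: List[int] = []


def flaky_service() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("service unavailable")
    return "ok"
```

A workflow script would wrap each remote call, e.g. `call_with_retry(flaky_service, max_attempts=3)`; the user is notified only once all attempts are exhausted.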

Security

  • VGL is built for geophysicists to do their science. Even though its computational and storage resources are protected by OpenID, VGL doesn't check or know whether a user is a geophysicist before allowing him or her to use those resources. Currently, VGL authenticates a user via myOpenID, an OpenID identity provider, so anyone who has an account with myOpenID can access and use those online resources. This is a known issue to the VGL development team, and a workaround exists to restrict access to scientific computational toolboxes/software. However, this workaround is less than ideal and requires manual configuration for each new user. One possible solution discussed by the team is to provide a simple registration process on top of the myOpenID sign-up process. From observations, none of the scientific workflow systems evaluated above has a solution to this problem; they may purposely omit it from their engine/framework because security requirements differ greatly from one application to another.
  • Other areas of security to look into are secure access to password-protected web services and the current issue of integrating with AAF (someone needs to provide more input).
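
The registration idea discussed above amounts to separating authentication (who the user is) from authorisation (what the user may use). A minimal sketch, assuming the OpenID identifier has already been verified elsewhere; the identifiers, role name and toolbox name are hypothetical and nothing here performs OpenID authentication itself.

```python
from typing import Dict, Set

# Hypothetical registry: verified OpenID identifier -> roles granted on registration.
registered_users: Dict[str, Set[str]] = {
    "https://alice.example.org/": {"geophysicist"},
}

# Hypothetical mapping from toolbox name to the role required to use it.
toolbox_roles: Dict[str, str] = {"inversion-toolbox": "geophysicist"}


def register(openid: str, role: str) -> None:
    """Record that an already-authenticated identity holds a role."""
    registered_users.setdefault(openid, set()).add(role)


def may_use(openid: str, toolbox: str) -> bool:
    """Authentication alone is not enough: the identity must hold the role."""
    required = toolbox_roles.get(toolbox)
    return required is not None and required in registered_users.get(openid, set())
```

With such a check, an identity that merely holds an account with the OpenID provider is refused until it has been registered with the required role.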

Provenance Handling

  • One of the requirements and desiderata of a scientific workflow system is that results must be reproducible. In VGL v1.1, a change or upgrade in a computational toolbox/software may produce different execution results when a completed job is duplicated and run with the same input data sets and workflow script. Some scientific workflow systems do provide versioning of provenance data; it is still an active area of research. As a workflow in VGL is execution environment (toolbox or domain/sub-domain) specific, we may consider supporting multiple versions of a toolbox/software as a possible solution. The downside of that solution is the maintenance overhead.
  • Another area of provenance we may want to look into is whether VGL should support the Open Provenance Model in the future. Taverna is most probably the first to provide export of its provenance data to OPM.
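
One way to at least detect the reproducibility problem described above is to record the toolbox version alongside fingerprints of the script and input data for every job, and to compare records when a job is duplicated. The sketch below is illustrative only; the record fields and function names are hypothetical and do not reflect how VGL stores provenance.

```python
import hashlib
from dataclasses import dataclass


def digest(content: bytes) -> str:
    """SHA-256 fingerprint of a script or data set."""
    return hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    toolbox: str
    toolbox_version: str
    script_digest: str
    input_digest: str


def record_job(toolbox: str, version: str, script: bytes, data: bytes) -> ProvenanceRecord:
    """Capture what a job ran with, without storing the content itself."""
    return ProvenanceRecord(toolbox, version, digest(script), digest(data))


def same_conditions(original: ProvenanceRecord, rerun: ProvenanceRecord) -> bool:
    """A rerun is comparable only if nothing that was recorded has changed."""
    return original == rerun
```

If `same_conditions` is false only because `toolbox_version` differs, the portal could warn the user that results may deviate from the original run.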

Code Library

  • A code-centric scientific workflow system like VGL has its own advantages too, despite being less user-friendly and harder to use for scientists who are non-coders. The most appealing advantages are (a) it is much simpler and more flexible in providing a variety of models of computation for the workflow (i.e. both dataflow and control-flow constructs are already provided in the underlying scripting language) and (b) it can reuse and integrate well with existing, heavily invested scientific code without major refactoring or rewriting. For this practical reason, it is probably worthwhile for VGL to remain a code-centric scientific workflow system. Having said that, we still need to find ways to improve the usability of VGL for non-technical users in workflow composition. One possible solution could be a categorised code library to assist users in workflow composition. A prior version of VGL seems to have had this kind of code library, and it is probably worthwhile to consider it again. Another idea we could try is to give power users or toolbox/scientific code developers the ability to develop custom code libraries that can be easily plugged into VGL's script builder.
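
A categorised, pluggable code library of the kind suggested above could look roughly like this. It is a sketch under assumptions, not VGL's script builder: `register_snippet`, `categories` and `build_script` are hypothetical names, and the snippets are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# category -> {snippet title -> script text}
_library: Dict[str, Dict[str, str]] = defaultdict(dict)


def register_snippet(category: str, title: str, code: str) -> None:
    """Called by a toolbox developer to plug a snippet into the library."""
    _library[category][title] = code


def categories() -> List[str]:
    """Categories shown to the user when composing a workflow."""
    return sorted(_library)


def build_script(selection: List[Tuple[str, str]]) -> str:
    """Concatenate the chosen (category, title) snippets into one script."""
    return "\n".join(_library[category][title] for category, title in selection)


register_snippet("input", "Read CSV", "rows = open('data.csv').read().splitlines()")
register_snippet("analysis", "Count rows", "count = len(rows)")
script = build_script([("input", "Read CSV"), ("analysis", "Count rows")])
```

A non-coder would pick snippets by category and title, while the generated script stays ordinary code that a power user can still edit by hand.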

Glossary

  • BPMN - Business Process Model and Notation is the de-facto standard for business process modeling that provides a graphical notation for specifying business processes in a Business Process Diagram (BPD).
  • BPEL - Business Process Execution Language, short for Web Services Business Process Execution Language (WS-BPEL) [7] is an OASIS standard executable language for specifying actions within business processes with web services. Currently, it is a de-facto standard way of orchestrating Web services [6].
  • YAWL - Yet Another Workflow Language is a workflow language based on rigorous analysis of workflow patterns (akin to design patterns in the discipline of software engineering) [8] [9]. It is seen as alternative to BPEL and has been designed based on the Petri Net paradigm to satisfy the full set of workflow patterns, under the assumption that this will satisfy the needs of both scientific and business communities [11]. This language is supported by YAWL (http://sourceforge.net/projects/yawl) an open source workflow system.
  • XPDL - XML Process Definition Language is a format standardised by the Workflow Management Coalition (WfMC) to interchange business process definitions (both the graphics and the semantics of a workflow business process) between different workflow products i.e. modelling tools and workflow engines. The following workflow engines support this language: WfMOpen (http://wfmopen.sourceforge.net), Joget Workflow, Enhydra Shark etc.
  • DAG - Directed Acyclic Graph is a directed graph that contains no cycles. E.g. a tree is a special kind of graph that contains no cycles.
  • OPM - Open Provenance Model is a community-driven model for provenance, which originates from the Provenance Challenge series (initiated in May 2006, at the first IPAW workshop), allowing provenance to be exchanged between systems. (http://openprovenance.org)
  • RDF - Resource Description Framework [16]
  • SCUFL - Simple Conceptual Unified Flow Language is a high level XML-based conceptual language for specifying Taverna workflows. The new version of SCUFL (i.e. SCUFL2) adopts Linked Data technology and preservation methodologies to create a platform-independent workflow language that can be inspected, modified, created and executed. See [17] for further details.
  • PN - Process Networks (http://ptolemy.eecs.berkeley.edu/ptolemyII/ptIIlatest/ptII/ptolemy/domains/pn/doc/index.htm)
  • DE - Discrete-events (http://ptolemy.eecs.berkeley.edu/ptolemyII/ptIIlatest/ptII/ptolemy/domains/de/doc/index.htm)
  • SDF - Synchronous Dataflow (http://ptolemy.eecs.berkeley.edu/ptolemyII/ptIIlatest/ptII/ptolemy/domains/sdf/doc/index.htm)
  • CT/CD - Continuous Time/Domain (http://ptolemy.eecs.berkeley.edu/ptolemyII/ptIIlatest/ptII/ptolemy/domains/continuous/doc/index.htm)
  • WSFL - Web Services Flow Language is an XML language for the description of Web Services compositions as part of a business process definition. (http://www.ebpml.org/wsfl.htm)
  • DAX - Directed Acyclic Graph in XML is a description of an abstract workflow in XML format that is used as primary input into Pegasus.

References

  1. Scientific workflow system (http://en.wikipedia.org/wiki/Scientific_workflow_system)
  2. Business versus Scientific Workflow: A Comparative Study (http://www.cs.ucdavis.edu/research/tech-reports/2009/CSE-2009-3.pdf)
  3. Scientific Workflows: Business as Usual? (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.149.8657&rep=rep1&type=pdf)
  4. Petri net (http://en.wikipedia.org/wiki/Petri_net)
  5. Business Process Model and Notation (http://en.wikipedia.org/wiki/BPMN)
  6. Adam Barker and Jano van Hemert. Scientific Workflow: A Survey and Research Directions. Parallel Processing and Applied Mathematics Lecture Notes in Computer Science Volume 4967, pp 746-753. (http://link.springer.com/chapter/10.1007%2F978-3-540-68111-3_78)
  7. Business Process Execution Language (http://en.wikipedia.org/wiki/Business_Process_Execution_Language)
  8. YAWL (http://en.wikipedia.org/wiki/YAWL)
  9. Workflow Patterns (http://en.wikipedia.org/wiki/Workflow_patterns)
  10. XPDL (http://en.wikipedia.org/wiki/XPDL)
  11. Scientific workflow system - can one size fit all?
  12. Business Process Management Life Cycle (http://www.pnmsoft.com/resources/bpm-tutorial/bpm-lifecycle)
  13. Directed acyclic graph (http://en.wikipedia.org/wiki/Directed_acyclic_graph)
  14. The OPM Provenance Model (http://openprovenance.org/)
  15. Janus: from Workflows to Semantic Provenance and Linked Open Data (http://www.bioontology.org/sites/default/files/Janus.pdf)
  16. Resource Description Framework (http://en.wikipedia.org/wiki/Resource_Description_Framework)
  17. SCUFL2 (http://dev.mygrid.org.uk/wiki/display/developer/SCUFL2)
  18. hydrant-kepler - Web front end for the kepler scientific workflow application (http://code.google.com/p/hydrant-kepler/)
  19. OMII-BPEL (http://www.omii.ac.uk/wiki/BPEL)
  20. Taverna in use - by domain (http://www.taverna.org.uk/introduction/taverna-in-use/by-domain/)
  21. Meta-Workflows: Pattern-based Interoperability between Galaxy and Taverna
  22. The Trident Scientific Workflow Workbench (http://research.microsoft.com/pubs/63956/barga2008TridentScientificWorkflowWorkbench.pdf)
  23. Craig Scott, Ben Morris, Lachlan Hetherton. Workspace Product Comparison (https://wiki.csiro.au/display/Workspace/Features)

-- RichardGoh - 27 Nov 2012
Topic revision: r31 - 11 Dec 2012, RichardGoh
 

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).