"Seegrid will be due for a migration to confluence on the 1st of August. Any update on or after the 1st of August will NOT be migrated"

Tutorial on using escript in VGL

Overview

In a nutshell, escript is a partial differential equation (PDE) modelling and solving toolkit with a Python interface (see the escript website, https://launchpad.net/escript-finley, for further details).

In VGL, you provide escript with 2D data (gravity anomaly, magnetic field, or both) that spans a geographic area, set a few parameters, and submit the job to the cloud for processing. Once processing is complete, you get a density or magnetic susceptibility value for each cell of the 3D domain (mesh) generated by your escript job, in either VTK or Silo format. These output files can then be used by scientists to infer where minerals such as gold are located within the specified geographic area.

This tutorial provides step-by-step guidance on how to perform the above tasks with a pre-defined dataset and an example escript script. The objectives are to familiarise you with VGL so that you can eventually run your own escript code in VGL.

Step 0 – Preparation

To run through this tutorial, you need an internet connection and a web browser installed on your computer.

To visualise the output generated in this tutorial, you need to download and install a separate data visualisation tool called VisIt (https://wci.llnl.gov/codes/visit).

This tutorial was successfully tested with the following web browsers: Internet Explorer 9, Firefox 16.x and Google Chrome 23.0.1271.95.

This tutorial has been prepared for VGL 1.1 Release Candidate 2.

Step 1 – Dataset Selection

The first step in using escript in VGL is to capture a subset of coverage data from the data selection page (see Part I of the VGL User Guide for detailed instructions) and to ensure the data requested from the remote service is of type NetCDF.

This tutorial will make use of the following dataset:

Coverage: Onshore Only Bouguer Geodetic
Data Type: NetCDF
Region: -13.5819209 (North), 133.50585938 (West), -14.78338079 (South), 134.296875 (East)
Location: /tmp/subset-request
Name: Subset of Onshore Only Bouguer Geodetic
Selection Size: Approximately 14,000 data points in total, or roughly 53.7 KB uncompressed (the data point count and size are auto-calculated as the region values are adjusted)
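
Before constructing a job, you can optionally sanity-check the captured subset. The sketch below is not part of VGL: it assumes you have downloaded the subset to a local file (called subset-request.nc here purely for illustration) and that the netCDF4 Python package is installed.

# Inspect a locally downloaded copy of the NetCDF subset
from netCDF4 import Dataset

nc = Dataset('subset-request.nc')  # hypothetical local filename
print(nc.dimensions)               # grid dimensions (e.g. latitude/longitude)
for name, var in nc.variables.items():
    print(name, var.shape)         # each variable and its grid shape
nc.close()

The reported grid shape should be consistent with the approximately 14,000 data points shown by VGL for the selected region.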

Step 2 – Job Construction

Once the coverage data is captured, the next step is to build a job to process the captured dataset using the "Job Wizard", which can be accessed via the "create a job" or "Submit Jobs" link. The "create a job" link can be found in the Request Saved notification pop-up window, whereas the "Submit Jobs" link can be found in the top right-hand corner of the VGL website.

To construct and submit a job in VGL, you need to sign in with your OpenID credentials (see this link if you have not yet set up an OpenID account).

Select Job Series

The first step in creating a job is to assign it to a series. Create a new series and name it "eScriptTutorial". You can provide your own series description.

Enter Job Details

Once the series name and description are provided, click Next to proceed and enter the job details. To run escript Python code, select "escript" from the Toolbox drop-down list. Name the job "eScriptJob1", as this name is referred to later in the tutorial.

This tutorial will use the Compute and Storage Providers from the National Computational Infrastructure (NCI) in Canberra to run the job and to store its input data files and execution results (see Compute and Storage Providers in the VGL Guide for further details). Alternatively, you could choose the Compute and Storage resources provided by NeCTAR to perform these tasks. Regardless of which Compute and Storage Providers you choose, the end results of your job execution will be the same.

Note: At the time this tutorial was prepared, the cloud computing infrastructure provided by NeCTAR was observed to perform better than that provided by NCI. You are therefore recommended to use NeCTAR as your Compute and Storage Provider for this tutorial.

Manage Job Input Files

VGL will show your input files after you enter the job details. You should see the dataset captured in Step 1 displayed in the "Input files" panel.

At this step, you can add more inputs to the job (see Part I of VGL User Guide for further details).

This tutorial does not require you to provide any further input.

Define Your Job Script

In VGL, you can write escript Python code from scratch, copy and paste it from elsewhere, or import it from an existing script template.

If you choose to write the code from scratch or paste it in, and want it to work on previously captured or added datasets, you must remember the paths to those datasets.

For the purpose of this tutorial, we will work from an existing script template. On the "Define your job script" page, expand the "escript Examples" tree and double-click the "Gravity Inversion" node to import the gravity inversion script template into the script builder.

When prompted to provide the path to a NetCDF input file, select "/tmp/subset-request" (i.e. the dataset captured in Step 1) from the Dataset drop-down list. You will also have the option of fine-tuning some of the inversion input parameters. For the purpose of this tutorial, we will use their default values (Max Depth: 4000, Air Buffer: 6000, Z Mesh Elements: 25, X Padding: 0.2 and Y Padding: 0.2).

Once the dataset path is provided, the following escript Python code will be inserted into the script builder (the DATASET value is the path you previously selected):

########################################################
#
# Copyright (c) 2003-2012 by University of Queensland
# Earth Systems Science Computational Center (ESSCC)
# http://www.uq.edu.au/esscc
#
# Primary Business: Queensland, Australia
# Licensed under the Open Software License version 3.0
# http://www.opensource.org/licenses/osl-3.0.php
#
########################################################

### Basic script to run gravity inversion with escript ###

# Filename for input data
DATASET='/tmp/subset-request'
# maximum depth (in meters)
DEPTH=4000
# buffer zone above data (in meters; 6-10km recommended)
AIR=6000
# number of mesh elements in vertical direction (~1 element per 2km recommended)
NE_Z=25
# amount of horizontal padding (this affects end result, about 20% recommended)
PAD_X=0.2
PAD_Y=0.2

####### Do not change anything below this line #######

import os
import subprocess
import sys

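# If the esys modules cannot be imported, this script is not yet running
# inside escript: relaunch it under run-escript (with 4 threads via -t4)
# and exit with the return code of that run.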
try:
    from esys.downunder import *
    from esys.weipa import saveSilo
except ImportError:
    line=["/opt/escript/bin/run-escript","-t4"]+sys.argv
    ret=subprocess.call(line)
    sys.exit(ret)

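# Write the given data to a Silo file, then upload it to the job's cloud
# storage with VGL's "cloud" helper, marking it publicly readable.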
def saveAndUpload(fn, **args):
    saveSilo(fn, **args)
    subprocess.call(["cloud", "upload", fn, fn, "--set-acl=public-read"])

# Load the gravity data from the NetCDF file
source=NetCdfData(DataSource.GRAVITY, DATASET)
# Build the 3D inversion domain around the data
db=DomainBuilder()
db.addSource(source)
db.setVerticalExtents(depth=DEPTH, air_layer=AIR, num_cells=NE_Z)
db.setPadding(PAD_X, PAD_Y)
# Set up and run the gravity inversion
inv=GravityInversion()
inv.setup(db)
# Retrieve the survey data (anomaly g and weighting chi) for the output file
g, chi = inv.getForwardModel().getSurvey(0)
density=inv.run()
# Save the results to a Silo file and upload it to cloud storage
saveAndUpload('result.silo', density_mask=inv.getRegularization().location_of_set_m, gravity_anomaly=g[2], gravity_weight=chi[2], density=density)

Once the above script template is loaded into the builder, you can modify the code before submitting it to the cloud for execution. For the purpose of this tutorial, we shall leave the script as it is and move on.
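
As noted in the Overview, escript can also invert magnetic data. Purely as an illustration of how little the structure of the script changes, here is a hedged sketch of the inversion section adapted for magnetic data. It is based on the esys.downunder API documentation, not copied from a VGL template (check the "escript Examples" tree for the corresponding template); in particular, the background magnetic flux density B must be supplied for your survey region:

# A sketch only, not a VGL template: magnetic inversion with esys.downunder
source=NetCdfData(DataSource.MAGNETIC, DATASET)
db=DomainBuilder()
db.addSource(source)
db.setVerticalExtents(depth=DEPTH, air_layer=AIR, num_cells=NE_Z)
db.setPadding(PAD_X, PAD_Y)
# B is a placeholder: the background magnetic flux density for your region
db.setBackgroundMagneticFluxDensity(B)
inv=MagneticInversion()
inv.setup(db)
susceptibility=inv.run()
saveAndUpload('result.silo', susceptibility=susceptibility)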

Review Job before Submission

The following review page is shown before your job is submitted. It gives you an opportunity to add further input files to your job and to examine the input files to be submitted for processing.

As this tutorial does not require any further input, click “Submit Job” to proceed.

If the job submission is successful, you will be redirected to the "Monitor Jobs" page, where you can monitor the status of the submitted job and view or download the job's input and output files. VGL will display an error message if it fails to submit the job to the cloud for execution.

Step 3 – Job Monitoring

At this point in time, you should have submitted a job named “eScriptJob1” for processing in the cloud.

A job belongs to a job series. To monitor the status of your submitted job, you must first select a series from the "Series List" pane; in our case, select the series named "eScriptTutorial". If you have a large number of series records, use the "Query" button to search for the series.

Once the “eScriptTutorial” series is selected, all jobs belonging to that series will be listed on the “Jobs of selected series” pane.

At any point in time, a job is in one of the following four states:

1. Saved: A job is in this state if it has not yet been submitted for processing, or if it was cancelled by the user shortly after submission. You can edit, submit or delete a "Saved" job, but not cancel or duplicate it.
2. Pending: A job is in this state if it has been successfully submitted to the cloud for processing and is waiting for a compute resource to process it. You can only cancel or duplicate a "Pending" job.
3. Active: A job is in this state while it is being processed by the compute resource. As with a "Pending" job, you can only cancel or duplicate an "Active" job.
4. Done: A job is in this state once its execution has completed. Completion does not guarantee the job executed successfully; VGL v1.1 does not provide a straightforward way to indicate success or failure, so the only way to find out is to inspect the files the job generated. You can only delete or duplicate a "Done" job.

The number of files generated by an active or completed job differs depending on which toolbox you selected to process the job. Every successfully executed job generates a file called "vegl.sh.log", which records the job execution log and can be used to troubleshoot why a job failed.

In this tutorial, we are only interested in our previously submitted job named "eScriptJob1". To update its status, use the "Refresh" button. A job normally (provided you do not cancel it during execution) goes through the following lifecycle: Saved -> Pending/Active -> Done.

The following screenshot demonstrates that the job named "eScriptJob1" is in "Pending" state:

Depending on the size of your input dataset, the computational logic, and the processing load at NCI or NeCTAR, the above job will take approximately 1-2 hours to finish on NeCTAR's research cloud.

Every job that executes generates a log file capturing the standard output of the running job. The log is written to a file called "vegl.sh.log", which can be downloaded or browsed through the built-in log viewer.

You can click on the "Files" tab on "Details" pane to browse through "eScriptJob1" job's input and output files:

You can use the "Logs" tab and its sub-tabs on "Details" pane to inspect the "eScriptJob1" job's execution log:

Step 4 – Job Registration

The last step of this tutorial is to register the results of a successfully executed job to GeoNetwork (see http://www.osgeo.org/geonetwork for further information). Only a job that has completed and has not yet been registered can be registered to GeoNetwork.

To register the results of the "eScriptJob1" job to GeoNetwork, first check that the job status has changed to "Done", then select the job and click the "Register to GeoNetwork" button. You will be prompted to provide or update contact and other details for the job; these will be pre-populated if you have previously provided them. If this is the first time you have registered a job, fill in the details (enter your own contact and other details) and click the "Register" button.