Virtual Geophysics Laboratory User Guide
This is a guide tailored for the end user of the Virtual Geophysics Laboratory (VGL). It is split into two main parts. The first part gives a general overview of what a typical "Job" or "Workflow" within VGL looks like and how you interact with it. The second part of this guide covers the specifics of a few workflows explicitly supported by VGL.
Part I General Usage
In general, VGL workflows can be broken down into three main phases: the first involves discovering and then selecting data sets, the second involves building a script to process that data, and the third involves collecting/publishing the results of the processing.
When VGL is first loaded you will be presented with the data selection page (see Figure 1). It consists of a list of available data sets and a viewport for visualising the spatial components of those data sets. The data sets are presented as a set of 'layers' that can be added to the map for visualisation.
Figure 1 - Overview of Data selection
To get more information about any given layer (see figure 2) you can either:
Figure 2 - Layer Interactions
- Select the plus icon to expand a longer description of the layer;
- Select the data/image icon to find out more information about what data services are powering the layer;
- Select the magnifying glass to highlight the layer's spatial region on the map; double-clicking the icon will pan the viewport to that spatial region.
Once a layer of interest has been identified it can be visualised on the map by selecting the layer and pressing the "Add Layer to Map" button. The layer's data services will be queried and the responses displayed on the map. The form of the visualisation depends entirely on the data service. For example, layers with a Web Map Service (WMS) will have the appropriate WMS layers overlaid on the map.
Some layers, when added to the map, will have additional visualisation options in the form of filters. If a layer has extra filter options they will be presented in the filter window whenever the layer is selected in the active layers panel (see Figures 3a and 3b).
Figure 3a - Layer filtering window for geophysics data sets grouped by project type
Figure 3b - Layer filtering window for map imagery
After interrogating a layer (and its data) visually, the next step is to 'select' the data so that it can be made available to your upcoming processing job. The selection process varies slightly depending on the type of data; the specific procedures are documented below.
Selecting Coverage Data
A coverage is defined as one or more data variables that vary over a continuous spatiotemporal region. Coverages are typically very large data sets that require subsetting so that they can be processed in manageable chunks. VGL allows a coverage to be subsetted spatially by drawing a bounding box on the viewport using the mouse.
Coverage data selection is initiated by clicking the 'Select Data' button on the viewport (Figure 4a). If the selected spatial bounding box or region contains more than one coverage data set, the coverage data selection window will be displayed with a list of the available coverage data sets. Each data set listed has an edit icon; when the icon is clicked, the data set's metadata will be displayed (Figure 4b), where you can change details such as the format in which the coverage should be captured, where the data should be stored, and so on. The data storage location is important because it is how you will access the data from your job script (more on this later). Remember to click the 'Save Changes' button once changes have been made in the coverage metadata editing window. To make the coverage data sets available to your job script, select one or more data sets using the checkboxes and press the 'Capture Data' button.
Figure 4a - Coverage data selection
Figure 4b - Coverage metadata editing window
Selecting Model Data
Simulation models differ from coverages in that there aren't any data services to subset or query against. Instead the entire model file will need to be downloaded and made available to a job for processing. To select one or more model files you will need to select the spatial region of that model in the viewport. Upon selection you will be shown a popup (Figure 5) containing information about the model, a link back to the library where this model is cataloged and a list of files associated with the model.
Figure 5 - Model Selection
The model files will have an edit icon (Figure 6) that, when clicked, shows the file metadata (Figure 7), where you can change the file's location, name and description. The most important piece of information is where the model file should be stored: the data storage location is how you will access the data from your job script (more on this later). Remember to click the 'Save Changes' button once changes have been made in the model file metadata editing window. To make these model files available to your job script, select one or more files using the checkboxes and press the 'Capture selected' button.
Figure 6 - Available model files selection window
Figure 7 - Model file metadata editing window
Once a suitable set of data has been collected, the next step is to build a processing job to actually do something useful with the data you've selected. To access this step select the 'Submit Jobs' link next to the VGL banner. You will be required to authenticate with an OpenID provider before continuing. Please note that all steps in the job construction phase come in the form of a 'Task Wizard', where you will be shown a sequence of forms that can be advanced/reversed by pressing the 'Next'/'Previous' buttons.
The first step in creating a new job is to assign it to a series (Figure 8). A series is a way of organising similar jobs for easier access. You can either create a new series here by selecting the 'New Series' radio button or you can select an existing one from the combo box. Selecting an existing series will show you a list of all jobs that currently belong to that series. Right click an existing job to show a list of actions that can be applied to the selected job. Once you are happy with the selected series, press 'Next'.
Figure 8 - Job series selection
The next step (Figure 9) involves adding a brief description of the job you are creating as well as selecting a compute provider, a storage provider, a toolbox and a resource selection.
A Compute/Storage Provider is a research or commercial entity that provides computing infrastructure (a pool of resources such as compute, storage and networking) for performing computationally intensive jobs or workflows. As of release 1.1, VGL provides compute and storage resources from the National Computational Infrastructure (NCI) in Canberra and National eResearch Collaboration Tools and Resources (NeCTAR) in Melbourne.
A Toolbox defines a set of pre-installed software and libraries that will be made available to your processing script (more on this later) at startup. Certain toolboxes will be restricted to authorised users only due to licensing reasons.
The resources selection allows you to choose how much computing power and memory you wish to allocate to this job.
After entering the job details, press Next.
Figure 9 - Job metadata
Now it's time to review the job input files (Figure 10) you selected during the 'Data selection' phase; they should all be listed on this page. You can also add additional inputs in the form of remote HTTP downloads or files uploaded from your PC. If you plan on processing a large dataset it is recommended that you make it accessible via a public URL instead of directly uploading it via this form.
Figure 10 - Job inputs
Finally it's time to define a Python script (Figure 11) that will be executed in an environment where it has access to all of the configured input files. The environment executing the script has a few pieces of important information that you should be aware of:
- The script will be executed using a Python 2.7 environment.
- All input files will be available as soon as the job starts executing.
- There will always be a utility program called 'cloud' installed on the PATH for simplifying access to cloud storage. It has the following commands:
cloud upload [uploadedFileName] [file]
cloud download [cloudFileName] [outputFile]
- You can create as many temporary/output files as you wish, up to the limit of available disk space.
- As soon as this script finishes executing the entire environment will be destroyed, including all results. To persist any outputs you will need to upload them using the cloud command.
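To make the points above concrete, the following sketch shows the general shape of a job script that processes a captured input file and persists its output with the 'cloud' utility. The file names and the trivial "processing" step are illustrative assumptions only, and the guard around the 'cloud' call exists solely so the sketch can run outside a VGL job environment; the syntax is also valid under the Python 2.7 environment VGL provides.

```python
# A minimal sketch of a VGL job script. The file names ('input.csv',
# 'results.csv') are hypothetical examples, not files VGL provides.
import os
import shutil
import subprocess

def on_path(prog):
    """Return True if 'prog' is an executable on the PATH (Python 2/3 safe)."""
    for d in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(d, prog)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return True
    return False

# Inside a real job the captured input files are already present in the
# working directory; here we create a stand-in so the sketch is runnable.
with open("input.csv", "w") as f:
    f.write("depth,gravity\n10,9.78\n")

# ...real processing of the input would go here; we simply copy the file.
shutil.copy("input.csv", "results.csv")

# Persist outputs before the environment is destroyed. The 'cloud' utility
# only exists inside a VGL job, so skip the upload when it is absent.
if on_path("cloud"):
    subprocess.call(["cloud", "upload", "results.csv", "results.csv"])
else:
    print("cloud utility not found; skipping upload")
```

Note that anything the script does not upload via the 'cloud' command is lost when the environment is destroyed.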
To aid in the construction of your Python script there are also a number of code snippets/templates that can be added to the code window. Most of these snippets are specific to a single workflow and are explained in more detail later in this guide.
Once the script has been finalised, hit next and you will have the option of reviewing all of the input files (and the script you just created) on the next form. Pressing 'Submit' will start your processing job. If the submission succeeds, you should be redirected to the job monitoring page.
Figure 11 - Script Builder with example script
The final piece of the workflow involves monitoring a job's execution and its outputs. Initially you will only be shown a list of series on this page; selecting a series will display the set of jobs belonging to that series. Selecting a job (Figure 12) will allow you to interrogate the input/output files and execution logs for the job along with the names/descriptions configured during job creation. A job whose status is 'Pending' or 'Active' may continue to create output files, so the displayed list of files may NOT be exhaustive.
Figure 12 - Job Monitoring
When a job is 'Pending' or 'Active', you can cancel its execution by first selecting the job to be cancelled and then invoking the 'Cancel job' action from either the job selection panel's Actions dropdown menu (Figure 12b) or the individual job's right-click context sensitive menu (Figure 12a). Once the job is cancelled, you can edit it and re-submit it for processing. All output files generated by the previous execution will be discarded.
Figure 12a - Cancelling a 'Pending' job using the context sensitive menu
Figure 12b - Cancelling a 'Pending' job using the Actions dropdown menu
If the results of a job are worth keeping, select the 'Register to GeoNetwork' button. A job registration details window (Figure 13b) will then be displayed for you to enter your contact details and other details associated with the job. These details will be stored in VGL for subsequent use. Once the 'Register' button is clicked, VGL will persist the results and generate an ISO 19115 metadata record that describes the entire process used to generate the job's results. The resulting metadata record will be stored in an instance of GeoNetwork associated with VGL. You can access the record (after registration) by selecting the registered job and inspecting the 'Registered URL' detail under the description tab (Figure 13a).
Figure 13a - Job monitoring with a registered Job
Figure 13b - Job registration details
Finally, a job can be deleted or duplicated by right clicking the job and selecting the appropriate action; the one exception is that a job with 'Saved' status cannot be duplicated. Duplicated jobs copy all metadata and remote service downloads by default; the remaining input/output files can optionally be copied across into the duplicated job (Figure 14).
Figure 14 - Duplicate job files
Part II Specific Workflows
The following sections provide usage notes for the various script snippets/toolboxes within VGL. Please make sure you read and understand the above guide first.
Using UBC-GIF
For these script snippets you will need to ensure that you have selected the coverage to be captured using CSV and the UBC-GIF toolbox. Please note that the UBC-GIF toolbox is restricted to licensed users only.
When using the UBC-GIF script templates you will be prompted for the input CSV file and the associated spatial bounds using UTM coordinates. If the bounds were selected using VGL these values will be auto-populated. The only remaining fields to be filled out are the sizes of the cells to use during the inversion process.
Using eScript (Gravity Inversions Only)
For this script snippet you will need to ensure that you have selected the coverage to be captured using NetCDF and the eScript toolbox.
When using the eScript script template you will be prompted for the input NetCDF file.
Using GOCAD Models + Underworld
For this script snippet you will need to have captured a GOCAD model and associated CSV key describing the various parameters inside the model.
Part III Step-by-Step Tutorials
- 15 Oct 2012