
APAC Project Requirements


Job Management


Use Cases



  • Figure: APAC-JobManager-UseCaseDiagram1.gif




CRCs



| SERVICE | COLLABORATOR |
| Job Submission | [Hardware Resource] |
| Job Scheduling | [Hardware Resource] |
| Job Migration | [Hardware Resource] |
| Calculating Scheduling Information | Accounting Module (AAA) |
| Resource and Service Brokerage | Resource and Server Registries |
| Resource Allocation | [Hardware Resource] |
| Check that user has authorisation for job | AAA |
| Job Control (resume, pause, cancel) | [Hardware Resource] |
| Spawn Job | |





Sequence Diagrams




  • Figure: APAC-JS001-SeqDiagram.GIF
  • Figure: APAC-JS012-SeqDiagram.GIF
  • Figure: APAC-JS017-SeqDiagram.GIF


Functional Requirements


| Req | Sub Req1 | Sub Req2 | Child Of | Requirement | Comment |
| JS001 | | | | Submit Job | The user will be able to submit a job to the APAC Grid in a uniform and consistent fashion |
| | JS002 | | JS001 | Setup Job | The user will be required to set up various job submission parameters, including the following: which certificate/project code/user ID to use |
| | | JS003 | JS002 | Manually Locate Resource and Service | The user will be able to manually search for and locate a resource and upload it to the service; the user will be able to manually search for and locate a resource and tell the system where it is, and the system then fetches it when it is required; the user will be able to manually search for and locate a service and select it for use by the job |
| | | JS004 | JS002 | Resource and Service Broker | The user should be able to describe or indicate the type of job they are submitting, and the system should be able to automatically locate a relevant service and resource(s) |
| | | JS005 | JS002 | Upload Data | The user should have the ability to upload local and remote data to a service |
| | | JS006 | JS002 | Specify Data Location | The user should have the ability to tell the system where data is located, and the system will fetch it when it is required |
| | JS007 | | JS001 | Schedule Job | The user should be able to schedule a job on the grid |
| | | JS008 | JS007 | Calculate schedule information | The user should be able to calculate an approximate run time for a job and, based on this and on the availability of resources, schedule its execution |
| | | JS009 | JS007 | Determine Execution Host | The user should be given a selection of available resources for their job and should be able to select which of these resources the job executes on |
| | JS010 | | JS001 | Accept (Start)/Reject (Cancel) Job | At any time while setting up a job, the user should be able to reject (cancel) it. The user must accept a job before it starts on a resource (this could be implemented by the user pressing an "Accept" or "Start" button) |
| JS011 | | | | Migrate Job | The system should provide the functionality for the user to manually migrate a job to a different resource. The system could also provide this functionality automatically; for example, the system should be able to migrate/move jobs to other resources that are not being used |
| JS012 | | | | Job Control | The user should have some ability to stop and pause a job's execution |
| | JS013 | | JS012 | Pause Job | The user should have the ability to pause a job's execution and later resume it from where it was stopped. In the event of a pause, the job is put aside and other jobs can be executed on the resource node. When the user resumes execution, the job will need to be re-queued |
| | JS014 | | JS012 | Stop Job | The user should have the ability to stop or cancel a job that is executing or one that is waiting in a queue. The job will be terminated and removed from the queue |
| JS015 | | | | Spawn Job | The user should have the ability to spawn a job on another server, either duplicating it or running it with different datasets |
| JS016 | | | | Load Job Setup (User Session collaboration) | The user should be able to load the setup information for a job and continue with a job submission |
| JS017 | | | | Save Job Setup (User Session collaboration) | The user should be able to save the setup information for a job and later be able to retrieve it |
| JS018 | | | | Modify Job - modify the job's setup | The user should be able to modify a job setup any number of times before it is submitted |
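
Read together, JS010, JS013, JS014 and JS018 imply a job lifecycle. The following is a minimal sketch of that lifecycle as a state machine. The state names, the FINISHED state and the exact set of allowed transitions are assumptions made for illustration; the requirements above do not define them.

```python
from enum import Enum


class JobState(Enum):
    SETUP = "setup"        # JS002/JS018: setup may still be modified
    QUEUED = "queued"      # JS010: accepted by the user, waiting for a resource
    RUNNING = "running"
    PAUSED = "paused"      # JS013: put aside so other jobs can use the node
    STOPPED = "stopped"    # JS010/JS014: rejected, cancelled or terminated
    FINISHED = "finished"  # hypothetical: not named in the requirements


# Allowed transitions: an assumption based on one reading of JS010, JS013 and JS014.
TRANSITIONS = {
    JobState.SETUP: {JobState.QUEUED, JobState.STOPPED},
    JobState.QUEUED: {JobState.RUNNING, JobState.STOPPED},
    JobState.RUNNING: {JobState.PAUSED, JobState.STOPPED, JobState.FINISHED},
    JobState.PAUSED: {JobState.QUEUED, JobState.STOPPED},  # resume means re-queue
    JobState.STOPPED: set(),
    JobState.FINISHED: set(),
}


class Job:
    def __init__(self, job_id):
        self.job_id = job_id
        self.state = JobState.SETUP

    def move_to(self, new_state):
        """Change state, refusing any transition not listed in TRANSITIONS."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"cannot go from {self.state.name} to {new_state.name}"
            )
        self.state = new_state
```

Under these assumptions a paused job returns to QUEUED rather than RUNNING, which mirrors the re-queue behaviour described in JS013.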



Non-Functional Requirements



  • Reliability
  • Performance Requirements - Max time required to submit a job (<1 minute)
  • Supportability

Possible Equipment-Based Job Requirements:
  • Safety mechanism to ensure that the minimum number of specialist technicians are available at the equipment site for safe remote operation of the equipment
  • Provide a "kill" switch for the equipment, both at the local site and remotely
  • Checks to ensure that values entered for the job are within the equipment's tolerance range (this could be quite application specific, and thus may need to be developed on a per-application basis, i.e. in the application's PDC file)
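
The tolerance check in the last point could look roughly like the sketch below. It assumes the tolerance ranges have already been loaded into a dictionary; how they would be read from a PDC file is not covered here, and the names `Tolerance` and `out_of_tolerance` and the parameter names in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    minimum: float
    maximum: float


def out_of_tolerance(values, tolerances):
    """Return the names of job values that have no tolerance entry
    or that fall outside their allowed range."""
    bad = []
    for name, value in values.items():
        limit = tolerances.get(name)
        if limit is None or not (limit.minimum <= value <= limit.maximum):
            bad.append(name)
    return bad
```

For example, with `{"speed_rpm": Tolerance(0, 3000)}` as the tolerances, a job value of `speed_rpm=4000` would be reported and the job could be rejected before it reaches the equipment. Treating a value with no tolerance entry as a failure is a deliberately conservative choice for this sketch.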


Related Works and Documents




Preliminary Discussions



  • "GUI" submits a job to the "Job Management Portal"
  • a "job request" contains a job/session, which can consist of multiple tasks, where each task uses 1 computational node
  • a "job request" includes an attachment which could be either: 1) ...... 2) ..............

  • The required response from the Portal includes:
      - validation of "job request" attachment(s)
      - find computational nodes available for the job
      - queue tasks and schedule
      - manage tasks
      - update task states

  • Portal could make use of the following:
      - SOAP comms
      - Registry of CNs/services
      - User DB (MySQL)
      - Notification services
      - Results archive
      - Visualisation from CNs/services
      - Error handling/reset of CNs/services
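
The flow above (a job request holding several tasks, validated by the Portal and then matched to computational nodes) can be sketched as follows. All class, field and state names are hypothetical, the validation rule is a placeholder, and no SOAP, registry or database access is modelled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    task_id: str
    state: str = "new"            # hypothetical state names
    node: Optional[str] = None    # each task uses 1 computational node


@dataclass
class JobRequest:
    job_id: str
    tasks: list = field(default_factory=list)
    attachments: list = field(default_factory=list)


def validate(request):
    """Placeholder check: at least one task and no blank attachment names."""
    return bool(request.tasks) and all(name.strip() for name in request.attachments)


def schedule(request, free_nodes):
    """Give each task one free node; tasks left without a node stay queued."""
    if not validate(request):
        raise ValueError("invalid job request")
    nodes = deque(free_nodes)
    for task in request.tasks:
        if nodes:
            task.node = nodes.popleft()
            task.state = "scheduled"
        else:
            task.state = "queued"
    return request
```

With three tasks and two free nodes, the first two tasks are scheduled and the third stays queued until a node becomes available.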

-- RyanFraser - 01 Feb 2005

Requirements: (Draft) Job Management

• Job Submission: It’s clear that users require easy-to-use tools for submitting jobs. In particular, we require robust support for batch job submission, including the ability to generate and submit appropriate batch scripts.

• Job Monitoring: Users should be able to monitor the status of their jobs online and receive notification when jobs complete or fail. In addition, they should be able to monitor the performance of their jobs.

• Job Migration: Given poor performance or the choice of better available resources, it should be simple for users to migrate their jobs. Furthermore, they should be able to programmatically specify conditions for migrations at time of job submission.

• Job History: Users should be able to examine their job history, along with status information, output to stdout and stderr, and performance information. This capability could be enhanced with the ability to archive associated executables, input files, and output files.
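
The Job Migration point mentions specifying migration conditions programmatically at submission time. One possible shape for such a condition is sketched below. The policy fields, the notion of a "progress rate" and the decision rule are all assumptions for illustration, not part of the draft requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationPolicy:
    min_progress_rate: float  # hypothetical metric, e.g. work units per minute
    min_speedup: float        # how many times faster a candidate must be


def should_migrate(policy, current_rate, candidate_rate):
    """Migrate only to a faster candidate, and only if the job is performing
    poorly or the candidate is better by the requested factor."""
    if candidate_rate <= current_rate:
        return False
    poor_performance = current_rate < policy.min_progress_rate
    much_better = candidate_rate >= current_rate * policy.min_speedup
    return poor_performance or much_better
```

A user could attach `MigrationPolicy(min_progress_rate=10.0, min_speedup=2.0)` to a job at submission, and the system would evaluate `should_migrate` whenever it has fresh performance figures.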

-- RyanFraser - 23 Feb 2005

Some comments on the last few paras. It is probable that users will want to query/kill individual tasks within a batch, which will require a finer grain of communication between the resource and the portal.
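
As a rough illustration of that finer grain, the sketch below tracks state per task rather than per batch, so a single task can be queried or killed without touching the others. The class, method and state names are hypothetical, and the actual communication with the resource is not modelled.

```python
class Batch:
    def __init__(self, task_ids):
        # Every task starts as "running"; the state names are hypothetical.
        self.states = {task_id: "running" for task_id in task_ids}

    def query(self, task_id):
        """Return the state of one task in the batch."""
        return self.states[task_id]

    def kill(self, task_id):
        """Kill one running task and leave the rest of the batch alone."""
        if self.states[task_id] == "running":
            self.states[task_id] = "killed"
        return self.states[task_id]
```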

Depending on how it's implemented, job migration may be very tricky. Some numeric codes are platform specific and will not/cannot support data files from different platforms, so one has to be a little careful about where a job is migrated. Even within the same platform, there are issues with licensed software only being able to run a specific job on one particular machine (e.g. Flac3D). In this instance, migration is not plausible.
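
These two constraints could be expressed as a filter over candidate resources, as in the sketch below. The data model (`Resource`, `JobConstraints`) and the platform strings used in the usage note are hypothetical; a real check would likely need more detail than a single platform label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    name: str
    platform: str


@dataclass(frozen=True)
class JobConstraints:
    platform: str                        # platform the code and data files target
    licensed_host: Optional[str] = None  # set when a licence ties the job to one machine


def migration_targets(job, current, candidates):
    """Return the resources a job may be migrated to, if any."""
    if job.licensed_host is not None:
        return []  # licence-bound jobs stay where they are
    return [
        r for r in candidates
        if r.platform == job.platform and r.name != current.name
    ]
```

A job running on `node-a` with platform `"linux-x86"` and no licence restriction would only be offered other `"linux-x86"` resources, while a licence-bound job gets an empty list.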

-- GordonGerman - 19 Apr 2005

The non-functional requirement of less than 1 minute might not be realistic, because we are dependent on the speed of the network links and on how much data a job encompasses (e.g. it might have to upload 1 GB of data).
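
A quick back-of-the-envelope calculation shows the scale of the problem. The link speeds used in the usage note are assumptions chosen for illustration, 1 GB is taken as 10^9 bytes, and protocol overhead and contention are ignored.

```python
def upload_seconds(size_bytes, link_mbit_per_s):
    """Ideal transfer time in seconds, ignoring overhead and contention."""
    return size_bytes * 8 / (link_mbit_per_s * 1_000_000)
```

Under these assumptions, `upload_seconds(1_000_000_000, 100)` gives 80 seconds, so a 1 GB upload over a 100 Mbit/s link already exceeds the 1 minute limit before any other submission work is done. Over a 1000 Mbit/s link the same upload takes 8 seconds.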

-- RobertCheung - 20 Apr 2005

Topic attachments
| Attachment | Size | Date | Who |
| APAC-JS001-SeqDiagram.GIF | 13.8 K | 04 Apr 2005 - 16:27 | RyanFraser |
| APAC-JS012-SeqDiagram.GIF | 10.0 K | 04 Apr 2005 - 16:27 | RyanFraser |
| APAC-JS017-SeqDiagram.GIF | 5.7 K | 04 Apr 2005 - 16:27 | RyanFraser |
| APAC-JobManager-UseCaseDiagram1.gif | 18.3 K | 18 Mar 2005 - 15:43 | RyanFraser |
Topic revision: r15 - 15 Oct 2010, UnknownUser
 

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).