"Seegrid will be due for a migration to confluence on the 1st of August. Any update on or after the 1st of August will NOT be migrated"

The Problem

A 'quota' was issued to a group for 'Maximum usage' - it is expressed in cores. A project or group can have up to xxx cores on the cloud provider.

The cloud provider then chose to oversubscribe the resource for some reason.

We now have trouble 'launching' instances within our allocation, and cannot assume that the cloud provider will keep its word as expressed by the allocation of resources.

Our current resource usage is a simple equation:

resources = current running jobs
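
To make the gap between the quota and the equation above concrete, here is a minimal sketch. It assumes each running job occupies a fixed number of cores (the page does not say how jobs map to cores), and the names `cores_in_use`, `nominal_headroom` and `cores_per_job` are hypothetical.

```python
def cores_in_use(running_jobs, cores_per_job=1):
    """resources = current running jobs, converted to cores.

    cores_per_job is an assumption; the page only counts jobs.
    """
    return running_jobs * cores_per_job


def nominal_headroom(quota_cores, running_jobs, cores_per_job=1):
    """Cores the quota says are still available to us.

    This is only the number on paper: if the provider has oversubscribed
    the resource, a launch can still fail while this value is positive.
    """
    return max(quota_cores - cores_in_use(running_jobs, cores_per_job), 0)
```

For example, with a made-up quota of 32 cores and 5 single-core jobs running, `nominal_headroom(32, 5)` reports 27 cores, none of which the provider is actually guaranteed to deliver.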

The 'Address it' Options

Provider quota stats

Get provider to allow for stats over API to

The 'Work around it' Options

Evaluation
  • Resources used
  • Current 'fit' - the 'gap' to use it
  • Cloud Ideal - only use what is needed, when it's needed
  • Multiple controllers - can we share the project with developers/servers

'Hot Spare'

For each user LOGGED ON - at the time the user logs in:
  1. Launch an instance - let it wait in a polling state
  2. Report the launch status in the UI as 'good to go'
  3. Instruct worker:
    1. Activate with job, start another spare
    2. Die on command or after timeout
  4. Always maintain the 'capacity' for every user to work, or know something about the queue
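
The steps above can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the actual controller: the cloud launch is replaced by a placeholder that always succeeds, time is passed in explicitly, and every name (`HotSpareController`, `Worker`, `State`, `spare_timeout`) is hypothetical.

```python
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    POLLING = "polling"  # launched, waiting for work (the hot spare)
    ACTIVE = "active"    # running a job
    DEAD = "dead"        # terminated on command or after timeout


@dataclass
class Worker:
    worker_id: int
    owner: str
    launched_at: float
    state: State = State.POLLING
    job: Optional[str] = None


class HotSpareController:
    def __init__(self, spare_timeout=600.0):
        self.spare_timeout = spare_timeout  # seconds; value is an assumption
        self.workers = {}
        self._ids = itertools.count(1)

    def _launch(self, user, now):
        # Placeholder for the real cloud launch, which may fail when the
        # provider is oversubscribed; here it always succeeds.
        worker = Worker(next(self._ids), user, now)
        self.workers[worker.worker_id] = worker
        return worker

    def on_login(self, user, now):
        """Steps 1 and 2: launch a spare and report the status for the UI."""
        self._launch(user, now)
        return "good to go"

    def submit(self, user, job, now):
        """Step 3.1: activate the user's spare with the job, start another."""
        spare = next(
            (w for w in self.workers.values()
             if w.owner == user and w.state is State.POLLING),
            None,
        )
        if spare is None:
            spare = self._launch(user, now)
        spare.state = State.ACTIVE
        spare.job = job
        self._launch(user, now)  # replacement spare
        return spare

    def reap(self, now):
        """Step 3.2: spares that polled longer than the timeout die."""
        for w in self.workers.values():
            if w.state is State.POLLING and now - w.launched_at > self.spare_timeout:
                w.state = State.DEAD

    def count(self, state):
        return sum(1 for w in self.workers.values() if w.state is state)
```

In this model a user who logs in and submits one job holds two instances, one active and one polling, until `reap` retires the idle spare.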

Evaluation
  • Resources used - jobs+users+1
  • Current fit - no 'idle controllers', comms to instance
  • Cloud Ideal - pretty close
  • Multiple controllers - yes: 'known' resource owners
     

'Dedicated pool'

Consume all our resources and try to make them function like an 'always on' cluster in the cloud.

Push work to a worker node, then reboot the node 'clean'
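
A minimal sketch of the free/busy bookkeeping such a pool would need. It assumes the pool is a fixed, named set of nodes and stands in for the 'clean' reboot with a simple state reset; `DedicatedPool`, `push`, `finish` and `free_nodes` are hypothetical names.

```python
class DedicatedPool:
    def __init__(self, node_names):
        # Maps each node to the job it is running, or None when free.
        self.assignments = {name: None for name in node_names}

    def push(self, job):
        """Push work to the first free node; return its name, or None if full."""
        for node, current in self.assignments.items():
            if current is None:
                self.assignments[node] = job
                return node
        return None  # every node is busy; the caller has to queue the job

    def finish(self, node):
        """Mark the node free again (stand-in for rebooting it 'clean')."""
        self.assignments[node] = None

    def free_nodes(self):
        return [n for n, job in self.assignments.items() if job is None]
```

Because the free/busy state lives in one object, a second controller would need access to the same record, which is the extra coordination this option requires.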

Evaluation
  • Resources used - all of them!
  • Current fit - no 'idle controllers', comms to instance
  • Cloud Ideal - far away
  • Multiple controllers - no: need a 'free/busy' method (extra controller)

'Meet the Quota'

-- TerryRankine - 05 Nov 2014
Topic revision: r2 - 05 Nov 2014, TerryRankine
 

Current license: All material on this collaboration platform is licensed under a Creative Commons Attribution 3.0 Australia Licence (CC BY 3.0).