Glasgow Ganga Quickstart Guide
Ganga Links
Also useful are introductions to python, e.g., the python tutorial. The iPython Documentation points out where iPython syntax differs from normal python scripts.
HOWTO for Glasgow
Getting Started
- Log in to svr020 using gsissh or ssh
- Type ganga
- Say yes to setup the standard config files
Your First Ganga Job
In [3]: j1 = Job(application=Executable(exe='/bin/echo', args=['Hello, World']))
In [4]: j1.submit()
Ganga.GPIDev.Lib.Job : INFO submitting job 0
Ganga.GPIDev.Adapters : INFO submitting job 0 to Local backend
Ganga.GPIDev.Lib.Job : INFO job 0 status changed to "submitted"
Out[4]: 1
In [5]:
Ganga.GPIDev.Lib.Job : INFO job 0 status changed to "running"
Ganga.GPIDev.Lib.Job : INFO job 0 status changed to "completed"
In [5]: print file(j1.outputdir+'stdout').read()
Hello, World
Note this job ran locally on the UI, which is not too interesting.
Your First Ganga Grid Job
Prequel
You should quit ganga and edit the VirtualOrganisation stanza in .gangarc to your VO, e.g.,
VirtualOrganisation = gridpp
You should also ensure that ganga maintains the validity of your grid proxy, so in the [GridProxy_Properties] section uncomment the lines validityAtCreation and minValidity, putting, e.g.,
validityAtCreation = 36:00
minValidity = 24:00
(See also the later section on certificates.)
LCG Backend
Running jobs on the grid is easy - just change the job's backend to LCG:
In [2]:gridJob=Job(backend=LCG(), application=Executable(exe='/bin/echo',args=['Hello, World']))
Targeting Glasgow
The above job can run anywhere your VO is supported. However, if you are preparing an environment to specifically target Glasgow, then you need to tell ganga not to send the job anywhere else. Do this by adding the CE's queue name to the job:
In [5]:gridJob.backend.CE='svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-gridpp'
(Change gridpp to the name of your VO, or the queue which your VO can access - you can check the queues in the information system monitor.)
Submitting The Job
Now run the job:
In [6]: gridJob.submit()
Ganga.GPIDev.Lib.Job : INFO submitting job 2
Ganga.GPIDev.Adapters : INFO submitting job 2 to LCG backend
Ganga.GPIDev.Lib.Job : INFO job 2 status changed to "submitted"
Ganga will submit the job to our resource broker, then poll it for status changes for you. When the job is done the output is retrieved and stored in the ganga work directory.
Ganga.GPIDev.Lib.Job : INFO job 2 status changed to "running"
Ganga.GPIDev.Lib.Job : INFO job 2 status changed to "completing"
Ganga.GPIDev.Lib.Job : INFO job 2 status changed to "completed"
This is much more convenient than having to poll edg-job-status by hand.
Job Output
Each job has an outputdir, and all output from the job will be stored there. You can process this inside ganga using standard python, or (more likely) process the output offline with other tools.
By default, ganga will store jobs' outputs in ~/gangadir/workspace/Local/JOB_ID/output, where JOB_ID is a sequential job number.
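As a sketch in plain python (the helper name job_output_path is invented here purely for illustration; inside a ganga session you would simply use j.outputdir), the default layout means a job's files can be located like this:

```python
import os

def job_output_path(job_id, filename='stdout'):
    # Default ganga workspace layout, as described above;
    # inside ganga itself, use j.outputdir instead of building this by hand.
    return os.path.join(os.path.expanduser('~'), 'gangadir',
                        'workspace', 'Local', str(job_id), 'output', filename)

# e.g. where job 2's stdout would land
print(job_output_path(2))
```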
Wrapper Scripts and Sandboxes
Wrapper to Start a Prepared Binary
When the job wakes up in the batch system it's probably not in the working directory you expect - it will usually be in a scratch directory for the job.
If you have prepared binaries in your $CLUSTER_SHARED area, and perhaps some input files and output directories, you might want to use a wrapper script that moves to the right directory, then starts up the correct code.
Here's an example, which uses some environment variables to make sure the job is running in a unique directory:
#! /bin/bash
#
# Make a structured directory to run the job in - the job's output files should go somewhere sensible
BASE_DIR=$CLUSTER_SHARED/sieve/run
cd $BASE_DIR || exit 1
JOB_DIR="$(date +'%Y-%m-%d')/$PBS_JOBID"
mkdir -p $JOB_DIR || exit 1
cd $JOB_DIR || exit 1

# Now invoke the program
BINARY=$CLUSTER_SHARED/sieve/sieve
echo "Invoking $BINARY $@"
$BINARY "$@"
STATUS=$?
if [ $STATUS -eq 0 ]; then
    echo "All done. Make tea..."
else
    echo "$BINARY failed with status $STATUS. Oh dear..."
fi
If this wrapper is in, say, $CLUSTER_SHARED/wrappers/sievewrap.sh then the ganga job can be defined as:
In [53]: import os
In [54]: sieveJob=Job(backend=LCG(CE='svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-gridpp'),
   ....: application=Executable(exe=os.environ['CLUSTER_SHARED']+'/wrappers/sievewrap.sh', args=['-s', '1000', '-e', '1000000000']))
A Little Python Aside
iPython is a fully functioning python shell, so it takes all the normal python commands. In the last example we imported the os module, which allows us to access the environment variables, such as CLUSTER_SHARED within python using os.environ.
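As a small standalone sketch of the same idea (plain python, runnable outside ganga; the fallback path is purely illustrative for when the variable is unset):

```python
import os

# Read CLUSTER_SHARED from the environment; the default used here
# is made up and only applies if the variable is not set
shared = os.environ.get('CLUSTER_SHARED', '/tmp/cluster_shared')

# Build the wrapper path the same way the job definition above does
wrapper = shared + '/wrappers/sievewrap.sh'
print(wrapper)
```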
Sandboxes
If you are just running on the Glasgow cluster then you probably don't need sandboxes (sets of files copied to/from the batch system with the job) - just work in the CLUSTER_SHARED directory. However, they can be useful, so here's how to use them:
Input Sandboxes
When a job's defined as
Executable(exe='/bin/echo', ...)
then it's the binary on the remote system which is executed. If you want to send a wrapper script with the job, then tell ganga the exe is a File:
Executable(exe=File('~/wrappers/sievewrap.sh'), ...)
Then the sievewrap.sh script is parceled up with the job and sent along with it. (In standard EGEE speak the file becomes part of the job's input sandbox.)
You can add other files to the job's sandbox using, e.g.,
gridJob.inputsandbox=[File('~/inputs/myJobInputs.dat')]
Again, these files will be in the job's working directory when the job starts.
Output Sandboxes
Output sandboxes are files which will be retrieved from the batch system once a job has run. They will be passed back to you as files in the output directory of that job.
gridJob.outputsandbox=['someOutput.txt', 'jobLogs.*']
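Assuming the wildcard follows ordinary shell-style globbing, plain python's fnmatch module illustrates which filenames a pattern such as 'jobLogs.*' would pick up (the filenames below are invented):

```python
import fnmatch

# Hypothetical files left behind by a job
produced = ['someOutput.txt', 'jobLogs.0', 'jobLogs.1', 'core.1234']

# Shell-style matching: 'jobLogs.*' matches any suffix after the dot
matched = [f for f in produced if fnmatch.fnmatch(f, 'jobLogs.*')]
print(matched)  # the two jobLogs files
```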
Grid Certificates in Ganga
Controlling Certificate Lifetime
Grid jobs need to have a valid proxy certificate for the entire lifetime of the job - and this has to include any queuing time. You can ensure that ganga will submit certificates with suitable lifetimes by changing the parameters in .gangarc. E.g., if your job takes 24 hours to run, and you want to allow for 24 hours of time in the queue, then perhaps
[GridProxy_Properties]
# Proxy validity at creation (hh:mm)
validityAtCreation = 72:00
# Minimum proxy validity (hh:mm), below which new proxy needs to be created
minValidity = 48:00
Ganga will now refuse to submit jobs unless a proxy valid for at least 48 hours exists. Use
gridProxy.renew()
to renew your proxy. gridProxy.info() will tell you how much time is left.
Using a MyProxy Server
The above method is quite risky, in that it exposes long-lived proxies on sites. Much better is to upload a long-lived proxy to a MyProxy server; the Glasgow resource broker will then renew the proxies of jobs which are running short. The default MyProxy server on svr020 is hosted at the RAL Tier-1, and the command to upload a proxy certificate to it is:
$ myproxy-init -d -n
The default lifetime of the proxy is 7 days. This can be increased, but it's better to just renew it as necessary.
To get information about your uploaded proxy use myproxy-info -d; to delete an uploaded proxy use myproxy-destroy -d.
For more details see https://edms.cern.ch/file/722398/1.1/gLite-3-UserGuide.html (Section on Proxy Renewal).
Bulk Job Submission
Ganga includes a very simple job splitter, which can be used to take an array of jobs, each with different input parameters, and then submit them in bulk to the cluster.
It's easiest to illustrate with a simple example:
In [87]: import os
In [88]: jobArray = list()
In [89]: for n in range(20):
   ....:     jobArray.append(Executable(exe=os.environ['CLUSTER_SHARED']+'/bin/myapp', args=["--verbose", "--logFile=runZ%03d.log" % n]))
   ....:
In [90]: jobArray[1]
Out[90]: Executable (
 exe = '/cluster/share/gla012/bin/myapp' ,
 env = {} ,
 args = ['--verbose', '--logFile=runZ001.log']
 )
Note we:
- Need to import the os module to get access to the environment list.
- Use the string % operator to zero pad the log file name. (See the python manual.)
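The zero padding can be checked in isolation (plain python, nothing ganga-specific):

```python
# %03d pads the run number with leading zeros to a width of three digits
for n in (1, 42, 137):
    print("--logFile=runZ%03d.log" % n)
# prints runZ001.log, runZ042.log, runZ137.log
```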
Now we use an ExeSplitter to define our multi-part job:
In [92]: bulkGridJob=Job(splitter=ExeSplitter(apps=jobArray), \
   ....: backend=LCG(CE='svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-gridpp'))
In [93]: bulkGridJob.submit()
Ganga.GPIDev.Lib.Job : INFO submitting job 19
Ganga.GPIDev.Adapters : INFO submitting job 19.0 to LCG backend
Ganga.GPIDev.Lib.Job : INFO job 19.0 status changed to "submitted"
Ganga.GPIDev.Adapters : INFO submitting job 19.1 to LCG backend
Ganga.GPIDev.Lib.Job : INFO job 19.1 status changed to "submitted"
Ganga.GPIDev.Adapters : INFO submitting job 19.2 to LCG backend
...
The submission of each sub-job is done separately, which can take a little time. As usual ganga will take care of polling the status of each job and retrieving the output when it becomes available.
In this way ganga can control the submission of several hundred jobs quite easily.
Note that the output of each subjob will be found in a numbered subdirectory of the main controlling job (in this case, job 19):
In [97]: bulkGridJob.subjobs[1].outputdir
Out[97]: /clusterhome/home/gla012/gangadir/workspace/Local/19/1/output/
And all the other job parameters can be queried in the same way:
In [99]: bulkGridJob.subjobs[1].backend
Out[99]: LCG (
 status = 'Scheduled' ,
 reason = 'Job successfully submitted to Globus' ,
 iocache = '' ,
 CE = 'svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-gridpp' ,
 middleware = 'EDG' ,
 actualCE = 'svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-gridpp' ,
 id = 'https://svr023.gla.scotgrid.ac.uk:9000/S_RBomCRMwFN0kG_rUp7Gg' ,
 jobtype = 'Normal' ,
 exitcode = None ,
 requirements = LCGRequirements (
    other = [] ,
    nodenumber = 1 ,
    memory = None ,
    software = [] ,
    ipconnectivity = 0 ,
    cputime = None ,
    walltime = None
    )
 )
Disconnecting and Reconnecting
Starting Up Again
Ganga keeps all state about your jobs in ~/gangadir. When you restart ganga it will reread the last state and take appropriate actions (querying running job statuses, downloading outputs, etc.). However, it will have forgotten any local names you gave your jobs; you can recover them using the jobs object, which contains all of your jobs.
svr020:~$ ganga
*** Welcome to Ganga ***
Version: Ganga-4-4-1
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.

Ganga.GPIDev.Lib.JobRegistry : INFO Found 3 jobs in jobs
Ganga.GPIDev.Lib.JobRegistry : INFO Found 0 jobs in templates

In [1]: jobs
Out[1]: Statistics: 3 jobs
--------------
#  id    status     name  subjobs  application  backend  backend.actualCE
#   0    failed                    Executable   LCG      svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcg
#   1    completed                 Executable   LCG      svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcg
#   2    new                       Executable   LCG

In [2]: myJob=jobs(2)
In [3]: myJob.submit()
...
Screen
It's also possible to run your ganga session in screen, which allows you to disconnect and log out while ganga keeps running. You can then reconnect when you log back in (possibly from a different machine); plenty of screen tutorials are available online. N.B. to reattach to a screen running on svr020 use:
screen -r