To answer this question you will learn to:
Summarise the basic building blocks of a cluster (servers, storage, network(s)).
Identify some cases when work is or is not suitable for running on a cluster.
Referencing modules: Introduction
To answer this question you will learn to:
Explain the differences between serial, threaded and MPI programs.
Identify what types of programs can run across nodes.
Referencing modules: Processors and Processes
To answer this question you will learn to:
Log in to Legion either from UCL or remotely.
Log in to a specific node.
Referencing modules: Legion
To answer this question you will learn to:
Copy compressed data onto Legion from your local machine using login05.
Decompress uploaded compressed data into its final location.
Explain the different quotas and how flexible they are.
Explain the read/write accessibility to the different areas.
Understand that ~/Scratch is a shortcut to /scratch/scratch/$USER.
Explain the backup policy of the areas.
Explain the performance differences between the areas.
Explain why you would want to write to $TMPDIR.
Referencing modules: Data Management on Legion
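The compress, copy and decompress objectives above can be sketched roughly as follows. This is a minimal illustration rather than the course's own procedure: the directory and file names (`demo_dir`, `mydata`, `final`) are hypothetical, and the transfer step is shown only as a comment with placeholder user and host names, because it needs a real account.

```shell
# Work in a throwaway directory so the sketch does not touch your own files.
# All names below (demo_dir, mydata, final) are hypothetical.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/mydata" "$demo_dir/final"
echo "sample" > "$demo_dir/mydata/input.txt"

# Compress the directory into a single gzipped archive before transfer.
tar -czf "$demo_dir/mydata.tar.gz" -C "$demo_dir" mydata

# Transfer step, left as a comment. The user and host are placeholders,
# not confirmed values:
#   scp mydata.tar.gz <userid>@<login05-hostname>:~/Scratch/

# Decompress the archive directly into its final location with -C.
tar -xzf "$demo_dir/mydata.tar.gz" -C "$demo_dir/final"
```

Transferring one compressed archive instead of many small files usually means fewer, larger writes, and extracting with `-C` avoids unpacking in one place and moving the files afterwards.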
To answer this question you will learn to:
Use module list to see the default modules.
Use module avail to see all the modules.
Load a module that has prerequisites and requires changes to the default modules.
Put a module load command in your .bashrc and start a new shell.
Start an X11 server on your local machine and run nedit on Legion.
Referencing modules: Using software on Legion
To answer this question you will learn to:
Understand what wallclock time is.
Explain the assignment of resources to users to maintain fairness.
Write and submit a simple job script that leaves some resources as default values.
Use qstat after submitting a job and qstat -j to see what resources you requested.
Write and submit a job script that specifies all resources appropriately.
Write and submit a job script that writes to $TMPDIR and copies data back.
Understand that SGE’s working directory and the working directory for the program you are running inside the script can be different.
Explain that #Local2Scratch happens outside of wallclock time, while other copying methods happen inside it.
Explain the difference in intended use for some of Legion’s nodes.
Explain and justify why you might want to run on a specific node.
Explain what common qstat statuses mean.
Use qexplain to identify faults with a submitted job.
Use qdel to delete a job.
Use jobhist after a job has ended.
Referencing modules: Jobs on Legion
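As a rough sketch of the job-script objectives above, the script below requests wallclock time, does its work in `$TMPDIR`, and copies the result back. It is an illustration under assumptions, not Legion's recommended template: the job name, the ten-minute request and the `results` directory are hypothetical, and only generic SGE options (`-l h_rt`, `-N`, `-cwd`) are used. The fallbacks to `mktemp -d` exist only so the sketch can also be tried outside the scheduler.

```shell
#!/bin/bash -l
# Hypothetical job script sketch; the values are placeholders.
# Request 10 minutes of wallclock time.
#$ -l h_rt=0:10:0
# Job name (hypothetical).
#$ -N tmpdir_example
# Use the submission directory as SGE's working directory.
#$ -cwd

# SGE sets SGE_O_WORKDIR to the directory the job was submitted from.
# Outside the scheduler, fall back to a temporary directory.
submit_dir="${SGE_O_WORKDIR:-$(mktemp -d)}"
results_dir="$submit_dir/results"
mkdir -p "$results_dir"

# Inside a job the scheduler provides $TMPDIR; fall back to mktemp elsewhere.
work_dir="${TMPDIR:-$(mktemp -d)}"
cd "$work_dir"

# The program's working directory is now $work_dir,
# which differs from SGE's working directory.
echo "job output" > output.txt

# Copy the data back before the job ends.
# A copy made inside the script counts towards wallclock time.
cp output.txt "$results_dir/"
```

Saved as, say, `job.sh` (a hypothetical name), a script like this would be submitted with `qsub job.sh`, checked with `qstat`, and removed with `qdel` followed by the job ID.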
To answer this question you will learn to:
Understand Legion’s usage policies.
Understand that data you have responsibilities for under the Data Protection Act may not be stored on Legion.
Understand the applicability of Research Data T&C for your work.
Know that CRAG will discuss additional resource requests and the requirements for making such a request.
Understand how data sharing is undertaken.
Know where to find help and the protocol for requesting assistance.
Know about different support options including drop-in sessions and the Research Programming Hub.
Referencing modules: Policies and Further Resources