Learning Outcomes

What is a cluster and why would I use one?

To answer this question you will learn to:

  1. Summarise the basic building blocks of a cluster (servers, storage, network(s)).

  2. Identify some cases when work is or is not suitable for running on a cluster.

Referencing modules: Introduction

What types of program are there and how do they differ?

To answer this question you will learn to:

  1. Explain the differences between serial, threaded and MPI programs.

  2. Identify which types of program can run across multiple nodes.

Referencing modules: Processors and Processes

How do I access and log in to Legion?

To answer this question you will learn to:

  1. Log in to Legion either from UCL or remotely.

  2. Log in to a specific node.

Referencing modules: Legion

What file stores are on Legion and how do I use them?

To answer this question you will learn to:

  1. Copy compressed data onto Legion from your local machine using login05.

  2. Decompress uploaded compressed data into its final location.

  3. Explain the different quotas and how flexible they are.

  4. Explain the read/write accessibility to the different areas.

  5. Understand that ~/Scratch is a shortcut to /scratch/scratch/$USER.

  6. Explain the backup policy of the areas.

  7. Explain the performance differences between the areas.

  8. Explain why you would want to write to $TMPDIR.

Referencing modules: Data Management on Legion
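As a rough illustration of outcomes 1 and 2 above, the following sketch bundles a directory into a compressed archive and unpacks it into a destination directory. The names demo_data, DEST and unpacked are hypothetical, and the transfer step is left as a comment because the login node address and user ID are placeholders that are not given here.

```shell
# Local machine: bundle a directory into a single compressed archive.
# demo_data is a hypothetical directory created only for this sketch.
mkdir -p demo_data
echo "sample" > demo_data/input.txt
tar -czf demo_data.tar.gz demo_data

# Transfer step, shown as a comment because the host and user ID are placeholders:
#   scp demo_data.tar.gz <userid>@<login-node>:~/Scratch/

# On the cluster: decompress the archive into its final location.
# DEST is a hypothetical variable; it defaults to ./unpacked so the sketch runs anywhere.
dest="${DEST:-$PWD/unpacked}"
mkdir -p "$dest"
tar -xzf demo_data.tar.gz -C "$dest"
```

Copying one compressed archive rather than many small files usually keeps the transfer simpler, and unpacking with -C places the contents directly in the intended directory.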

What software is available and how do you run it?

To answer this question you will learn to:

  1. Use module list to see the default modules.

  2. Use module avail to see all the modules.

  3. Load a module that has prerequisites and requires changes to the default modules.

  4. Put a module load command in your .bashrc and start a new shell.

  5. Start an X11 server on your local machine and run nedit on Legion.

Referencing modules: Using software on Legion

How do I run and manage jobs?

To answer this question you will learn to:

  1. Understand what wallclock time is.

  2. Explain the assignment of resources to users to maintain fairness.

  3. Write and submit a simple job script that leaves some resources as default values.

  4. Use qstat after submitting a job and qstat -j to see what resources you requested.

  5. Write and submit a job script that specifies all resources appropriately.

  6. Write and submit a job script that writes to $TMPDIR and copies data back.

  7. Understand that the working directory SGE uses for a job can differ from the working directory of the program you run inside the script.

  8. Explain that #Local2Scratch happens outside of wallclock time, while other copying methods happen inside it.

  9. Explain the difference in intended use for some of Legion’s nodes.

  10. Explain and justify why you might want to run on a specific node.

  11. Explain what common qstat statuses mean.

  12. Use qexplain to identify faults with a submitted job.

  13. Use qdel to delete a job.

  14. Use jobhist after a job has ended.

Referencing modules: Jobs on Legion
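A minimal sketch of the kind of job script these outcomes refer to is shown below: it writes to $TMPDIR and copies the result back. The job name, the wallclock value, the RESULTS_DIR variable and the output file name are illustrative assumptions rather than Legion's recommended settings, and the script falls back to /tmp so that it also runs outside the scheduler.

```shell
#!/bin/bash -l
# Hypothetical SGE job script; all resource values are placeholders.
#$ -N tmpdir_example      # job name (illustrative)
#$ -l h_rt=0:10:0         # requested wallclock time (illustrative)
#$ -cwd                   # use the submission directory as SGE's working directory

# Create a private working area under $TMPDIR (falls back to /tmp outside the scheduler).
workdir=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX")

# RESULTS_DIR is a hypothetical variable naming where the output should end up.
results_dir="${RESULTS_DIR:-$PWD/results}"

# Stand-in for the real program: write its output to the temporary area.
echo "finished on $(hostname)" > "$workdir/output.txt"

# Copy the data back before the job ends, then tidy up the temporary area.
mkdir -p "$results_dir"
cp "$workdir/output.txt" "$results_dir/"
rm -rf "$workdir"
```

With SGE, a script like this would typically be submitted with qsub and then monitored with qstat. Note that the copy back happens inside the job, so it counts towards the requested wallclock time.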

What are the rules for using Legion and where do I get more information?

To answer this question you will learn to:

  1. Understand Legion’s usage policies.

  2. Understand that data you have responsibilities for under the Data Protection Act may not be stored on Legion.

  3. Understand the applicability of Research Data T&C for your work.

  4. Know that CRAG will discuss additional resource requests and the requirements for making such a request.

  5. Understand how data sharing is undertaken.

  6. Know where to find help and the protocol for requesting assistance.

  7. Know about different support options including drop-in sessions and the Research Programming Hub.

Referencing modules: Policies and Further Resources