Welcome to The Carpentries Etherpad!

This pad is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.

Use of this service is restricted to members of The Carpentries community; this is not for general purpose use (for that, try https://etherpad.wikimedia.org).

Users are expected to follow our code of conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html

All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/

 ----------------------------------------------------------------------------

Call link:


Blackboard Collaborate: 

Participants



Instructors (in order of appearance)

Helpers (in no particular order)

Presenter - Room DJ

Learners / Check in



HPC - SHELL



Command notes



hostname - tells you the name of the machine
pwd - prints working directory ( ~ means home! i.e. your default directory)
ls  - lists the contents of the directory
  -l (long format)
  -a (all, including "hidden" files or directories, i.e. those whose names start with a ".")
  a directory name (even . -here- or .. -the directory above-)
cd - change directory (needs a directory name, or a path of several like dir1/dir2/...)
mkdir - create (make) a directory 
rmdir - deletes (removes) a directory
nano - a command-line text editor
  nano filename  - will "create" a file with that name if it doesn't exist.
  ^ = ctrl key; ^O = save; ^X = exit (it will ask if you want to save it if not done before)
cat filename1 filename2 - concatenates the files and prints them to the screen
cp filename1 filename_copy - copies a file into a new one
rm filename - deletes (removes) the file (⚠ there is no recycle bin! ⚠)
mv filename_old filename_new - renames (moves) the file. It can also move it somewhere else.
    mv filename_old directory/ - moves the file into that directory.
rm -rf directory  - deletes a directory even when there are files inside (-r for recursive, -f to "force")
tar - is a command to group/ungroup multiple files into one
  tar -xvf filename.tar.gz - decompress and extract the tar file (x for extract, v for verbose, and f to give the filename to read)
head filename - shows you the first 10 lines of a file
tail filename - the last 10 lines of a file
less filename - display the content of the file in a nice way that allows you to go up/down, search, ...
wc -l filename - tells you the number of lines of that file
  wc -l * - will run wc on all the files in the directory.
  wc -l *.fastq - will run wc only on the files whose names end in .fastq
> - redirection operator
echo set of words - will print that on the screen
  echo "this is a test" > test.txt  - redirects the output from echo into a file.
&> - redirects the normal output (standard output) and the error output.
>> - redirects like > but appends to the end of the file
grep something filename  - finds the lines containing "something" in filename
| - Pipe command, the output of a command is sent as input for the next
  grep something filename | head -n 1  (shows only the first line of the output of grep)
  grep something filename | wc -l (shows the number of lines where "something" appears in the filename)
zcat filename.gz - shows you the content of a gzip-compressed file without unzipping it to disk
chmod - changes file permissions (see "Writing Scripts", below)
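
A small sketch combining a few of the commands above (the file names and the ACGTACGT pattern are only examples):

wc -l *.fastq > line_counts.txt      # count the lines of every .fastq file and save the counts to a file
grep ACGTACGT *.fastq | wc -l        # how many lines across those files contain this sequence?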

Bonus Command Notes


  - If you've been using tar for a while, you might notice that we've skipped the "z" flag that indicates the compression format -- modern versions of tar don't need it and will work out the compression automatically.
  - On other systems, you may need to redirect the standard error explicitly: wc -l * > lines_lengths 2>&1  (redirects the standard output into lines_lengths, and the error stream (2) to wherever the standard output (1) is being redirected)
 - Unlike .zip files you might be familiar with, .gz files only compress one file. This is why we combine the gzip format with a .tar file, which can bundle a collection of files together.
 

Writing scripts


nano myscript.sh (Remember the .sh is not needed, but it helps us identify the file as a shell script)
#!/bin/bash - the first line tells Linux what program to use to execute the script.
Other lines that start with # are comments and are not executed.

Permissions: When running "ls -l" it shows some columns at the beginning of each line:
  -rw-r--r--: we can separate them into four: -  rw-  r-- r-- 
  1. whether it's a file (-) or a directory (d) or something more advanced
  2. rwx : read, write, execute permissions for the owner of the file. In the example above the owner can only read and write.
  3. and 4: same as 2 but for the group and for the rest of the world. In the example above, they can only read.
chmod +x myscript.sh - Adds "execute" permission to myscript.sh (for everyone, by default).
./myscript.sh - the script can then be run like this.
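
For example, a minimal script along these lines (the filename and the commands inside are just placeholders):

#!/bin/bash
# myscript.sh - a tiny example script; every line starting with # (except the first) is a comment
echo "I am running in:"
pwd
ls -l

After chmod +x myscript.sh, running ./myscript.sh prints the working directory and its contents.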

Variables
In bash you can define a variable as: VAR="this is something", but to refer to it you need to use $VAR. (echo $VAR)
Note, no spaces around the "=" sign!!

Save the output of a command into a variable: 
TEST=ls -l  -- It doesn't work!
TEST=$(ls -l) -- it does!!
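
Putting the two together - a small sketch (the *.fastq pattern is only an example):

NUM_FILES=$(ls *.fastq | wc -l)   # store the output of the pipeline in a variable
echo "There are $NUM_FILES fastq files here"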


Quiz #1 - Absolute vs Relative path
cd .. or cd ~  ✅
cd ..  ✅
cd ~  or cd .. or cd ~/data/.. ✅
cd .. ✅
cd .. ✅
5, 6, 7, 8, 9 (✅ all correct except 6)


Dataset to use later
wget http://rits.github-pages.ucl.ac.uk/hpc-shell/files/bash-lesson.tar.gz
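
Once downloaded, it can be unpacked with the tar command covered above:

tar -xvf bash-lesson.tar.gz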

Exercise 1:

Multiple wildcards


You can even use multiple *s at a time. How would you run wc -l on every file with "fb" in its name?
wc -l *fb* ✅
wc -l *fb* ✅

Using other commands


    1. Now let’s try cleaning up our working directory a bit. Create a folder called “fastq” and move all of our .fastq files there in one mv command.

mkdir fastq
mv *.fastq fastq
ls fastq/

Writing commands using pipes


How many files are there in the “fastq” directory we made earlier? (Use the shell to do this.)

ls fastq | wc -l ✅
ls -l fastq/ | wc -l ❌ (careful: ls -l adds a "total ..." line at the top, so the count comes out one too high!)
17
ls fastq | wc -l ✅


Writing our own scripts and loops


cd to our fastq directory from earlier and write a loop to print off the name and top 4 lines of every fastq file in that directory.
Is there a way to only run the loop on fastq files ending in _1.fastq?

#!/bin/bash

cd fastq
for FILE in *.fastq
do
   echo $FILE
   head -n 4 $FILE
done
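
For the second question, only the pattern in the for line needs to change (assuming file names like sample_1.fastq):

for FILE in *_1.fastq
do
   echo $FILE
   head -n 4 $FILE
done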



Feedback  https://jamboard.google.com/d/1YfdfV4d5xSYSWz4P-1K4qtWranudOr6wUp3480lkqa4/edit?usp=sharing


HPC - INTRO 28.7.2020



Instructors (in order of appearance)

Presenter - Room DJ

Helpers (in no particular order)

Learners / Check in

Shellshare:


wget -qO shellshare https://get.shellshare.net && python shellshare

ssh configuration (Include ProxyCommand only if outside the UCL VPN)


https://www.rc.ucl.ac.uk/docs/howto/#single-step-logins-using-tunnelling

Edit the file ~/.ssh/config and add the following:
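
Something along these lines (a sketch only - check the linked page for the exact gateway hostname, and replace <username> with your UCL user id):

Host myriad
    User <username>
    HostName myriad.rc.ucl.ac.uk
    # only needed from outside the UCL VPN/network:
    ProxyCommand ssh -W %h:%p <username>@ssh-gateway.ucl.ac.uk

After that, ssh myriad connects in a single step.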


Exercise: Sharpen a fuzzy image


  1. Clone the git repository https://github.com/davidhenty/sharpen to your Scratch directory on Myriad
  2. Compile one of the parallel versions of sharpen, for example F-MPI
    1. Use the command make
    2. You may need to load the compilers/intel module
  3. Write a job script to run it on 2 processes (a sketch of such a script follows this exercise)
    1. Add #$ -pe mpi N option to request N processes
    2. It will run in less than a minute and not require large amounts of memory or disk
  4. Submit the job and wait for it to run
  5. Examine the output
    1. Load the graphicsmagick module and use gm display command to view image files
    2. The scheduler output file contains performance information
  6. Repeat 3-5 with 4 processes, what happened to the run time?
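
A sketch of what the step-3 job script could look like (runtime, module name and executable name depend on which version you compiled - adapt as needed):

#!/bin/bash -l
#$ -N sharpen
#$ -l h_rt=0:10:0
#$ -pe mpi 2
#$ -cwd
# load the same compiler/MPI environment used to build it, if not already loaded by default
module load compilers/intel
# gerun is the mpirun wrapper on Myriad; it picks up the number of processes from the -pe request
gerun ./sharpen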

How do you use computing in your work?


Using Python/Matlab based scripts to process image sequences (videos) that take quite a long time on a PC
For genetic analysis and running pipelines (still in beginner phase)
optimisation of molecular structure
Using R to analyse large healthcare datasets

How would your work benefit from High Performance Computing?


Matlab parallel computing 
Myriad comes with a lot of useful modules for genetic analysis pre-installed
fast post processing
Being able to use this for my analysis; I have large data sets to analyse
intensive calculation
Manage large databases and run R which is too much for a standard computer

git clone https://github.com/lh3/seqtk.git

Commands:


nproc --all  -  Number of CPUs
free -h  -  Memory available
nodetypes - Shows the different types of nodes
qhost - gives you information about all the nodes
qhost -h <nodename> - gives you information about a specific node
qsub script - to submit the job
qstat - tells you whether the job is running, waiting, ... 
watch -n 10 <command> - runs a command every 10 seconds
ctrl+R - allows you to search backwards in the terminal history
qdel <jobnumber> - it cancels a particular job
qdel -u <user> - cancels all jobs belonging to your user
qrsh -l <and other options> - gets you an interactive shell on a compute node
module list - Shows a list of the modules that have been loaded
module avail - What's available in the system
   module avail python - will show only the ones related to python.
module load <modulename> - adds that module to your "environment".
module unload <modulename> - removes that module from your "environment".
module swap <modulename>/version - to swap between different versions
module purge - removes all the loaded modules. One way to get them back is to log out and in again.
    module load default-modules/2018 - will load the default environment too.
module show <modulename> - tells you what it does when you load the module
echo $PATH - shows you the places where the operating system looks for a program when you call it. $PATH is an environment variable and is modified when we load/unload modules.
lquota - shows the quota your account has (as in disk space)
scp <from> <to> - copies files via ssh
  - on your own machine: scp <username>@myriad.rc.ucl.ac.uk:/path/to/file/to/copy  .
     (note the last dot! It means "here", i.e. the directory where you run the command)
 - on your own machine: scp localfile.pdf  <username>@myriad.rc.ucl.ac.uk:~/hpc-test
    (note the ~ is a shortcut for /home/<username>/ )
rsync -avzP path/to/local/file.txt yourUsername@myriad.rc.ucl.ac.uk:path/on/Myriad
tar czvf my_job.tar.gz job* - will tar and gzip all the files whose names start with job


~/.bashrc  - a file that sets some variables when you log into the shell. You can add `module load <modulename>` commands there, so you don't need to run them every time you log into Myriad.
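
For example (the module name is just a placeholder), appending such a line from the shell:

echo "module load <modulename>" >> ~/.bashrc   # loaded automatically on your next login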

Options for the scheduler (man qsub)
#$ -N <jobname>  - the name of the job
#$ -wd /path/to/working/directory -  defines where you want the code to "run"
#$ -l h_rt=hh:mm:ss  - How much real-world time you need to run the job
#$ -pe mpi <N> - How many CPUs (processes) does your job need? 
#$ -l mem=<size> - How much memory per process does your job need? 
#$ -m be - sends an email when the job begins/ends (other letters: a = aborted, s = suspended)
#$ -M e-mail@address - the address to send those emails to (i.e. whom you want to spam with your results)
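
Putting several of these together - a minimal sketch of a job script (the name, runtime, memory, email address and the command itself are placeholders to adapt):

#!/bin/bash -l
#$ -N my_analysis
#$ -l h_rt=1:00:00
#$ -l mem=1G
#$ -pe mpi 4          # only needed for parallel jobs
#$ -m be
#$ -M your.name@ucl.ac.uk
#$ -cwd

# the actual work goes here
echo "Running on $(hostname)"

Submit it with qsub <scriptname> and check on it with qstat.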







https://jamboard.google.com/d/1YfdfV4d5xSYSWz4P-1K4qtWranudOr6wUp3480lkqa4/viewer?f=0