Welcome to The Carpentries Etherpad!

This pad is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.

Use of this service is restricted to members of The Carpentries community; this is not for general purpose use (for that, try https://etherpad.wikimedia.org).

Users are expected to follow our code of conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html

All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/

 ----------------------------------------------------------------------------

Call link:


Blackboard Collaborate: 

Participants



Instructors (in order of appearance)

Helpers (in no particular order)

Presenter - Room DJ

Learners / Check in



HPC - SHELL



Command notes



hostname - tells you the name of the machine
pwd - prints working directory ( ~ means home! i.e. your default directory)
ls  - lists the contents of the directory
  -l (long format)
  -a (all, including "hidden" files or directories, i.e. those whose names start with a ".")
  a directory name (even . -here- or .. -the directory above-)
cd - change directory (needs a directory name, or a path of several like dir1/dir2/...)
mkdir - create (make) a directory 
rmdir - deletes (removes) a directory
nano - a command-line text editor
  nano filename  - will "create" a file with that name if it doesn't exist.
  ^ = ctrl key; ^O = save; ^X = exit (it will ask if you want to save it if not done before)
cat filename1 filename2 - concatenates the files and prints them to the screen
cp filename1 filename_copy - copies a file into a new one
rm filename - deletes (removes) the file (⚠ there is no recycle bin! ⚠)
mv filename_old filename_new - renames (moves) the file. It can also move it somewhere else.
    mv filename_old directory/ - moves the file into that directory.
rm -rf directory  - deletes a directory even when there are files inside (-r for recursive, -f to "force")
tar - is a command to group/ungroup multiple files into one
  tar -xvf filename.tar.gz - decompress and extract the tar file (x for extract, v for verbose, and f to give the filename to read)
head filename - shows you the first 10 lines of a file
tail filename - the last 10 lines of a file
less filename - display the content of the file in a nice way that allows you to go up/down, search, ...
wc -l filename - tells you the number of lines of that file
  wc -l * - will run wc on all the files in the directory.
  wc -l *.fastq - will run wc only on the files whose names end in .fastq
> - redirection operator
echo set of words - will print that on the screen
  echo "this is a test" > test.txt  - redirects the output from echo into a file.
&> - redirects the normal output (standard output) and the error output.
>> - redirects like > but appends to the end of the file
grep something filename  - finds the lines containing "something" in filename
| - Pipe command, the output of a command is sent as input for the next
  grep something filename | head -n 1  (shows only the first line of the output of grep)
  grep something filename | wc -l (shows the number of lines where "something" appears in the filename)
zcat filename.gz - shows you the content of a gzip-compressed file without unzipping it to disk
chmod - changes file permissions (see "Writing Scripts", below)
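
A small sketch combining a few of the commands above (the file names and the ACGTACGT pattern are only examples):

wc -l *.fastq > line_counts.txt      # count the lines of every .fastq file and save the counts to a file
grep ACGTACGT *.fastq | wc -l        # how many lines across those files contain this sequence?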

Bonus Command Notes


  - If you've been using tar for a while, you might notice that we've skipped the "z" flag that indicates the compression format -- modern versions of tar don't need it and will work out the compression automatically.
  - On other systems, you may need to redirect the standard error explicitly: wc -l * > lines_lengths 2>&1  (redirects the standard output into lines_lengths, and the error stream (2) to wherever the standard output (1) is being redirected)
 - Unlike .zip files you might be familiar with, .gz files only compress one file. This is why we combine the gzip format with a .tar file, which can bundle a collection of files together.
 

Writing scripts


nano myscript.sh (Remember the .sh is not needed, but it helps us identify the file as a shell script)
#!/bin/bash - the first line tells Linux what program to use to execute the script.
Other lines that start with # are comments and are not executed.

Permissions: When running "ls -l" it shows some columns at the beginning of each line:
  -rw-r--r--: we can separate them into four: -  rw-  r-- r-- 
  1. whether it's a file (-) or a directory (d) or something more advanced
  2. rwx : read, write, execute permissions for the owner of the file. In the example above the owner can only read and write.
  3. and 4: same as 2 but for the group and for the rest of the world. In the example above, they can only read.
chmod +x myscript.sh - Adds "execute" permission to myscript.sh (for everyone, by default).
./myscript.sh - the script can then be run like this.
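
For example, a minimal script along these lines (the filename and the commands inside are just placeholders):

#!/bin/bash
# myscript.sh - a tiny example script; every line starting with # (except the first) is a comment
echo "I am running in:"
pwd
ls -l

After chmod +x myscript.sh, running ./myscript.sh prints the working directory and its contents.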

Variables
In bash you can define a variable as: VAR="this is something", but to refer to it you need to use $VAR. (echo $VAR)
Note, no spaces around the "=" sign!!

Save the output of a command into a variable: 
TEST=ls -l  -- It doesn't work!
TEST=$(ls -l) -- it does!!
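
Putting the two together - a small sketch (the *.fastq pattern is only an example):

NUM_FILES=$(ls *.fastq | wc -l)   # store the output of the pipeline in a variable
echo "There are $NUM_FILES fastq files here"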


Quiz #1 - Absolute vs Relative path
cd .. or cd ~  ✅
cd ..  ✅
cd ~  or cd .. or cd ~/data/.. ✅
cd .. ✅
cd .. ✅
5, 6, 7, 8, 9 (✅ all correct except 6)


Dataset to use later
wget http://rits.github-pages.ucl.ac.uk/hpc-shell/files/bash-lesson.tar.gz
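
Once downloaded, it can be unpacked with the tar command covered above:

tar -xvf bash-lesson.tar.gz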

Exercise 1:

Multiple wildcards


You can even use multiple *s at a time. How would you run wc -l on every file with "fb" in its name?
wc -l *fb* ✅
wc -l *fb* ✅

Using other commands


    1. Now let’s try cleaning up our working directory a bit. Create a folder called “fastq” and move all of our .fastq files there in one mv command.

mkdir fastq
mv *.fastq fastq
ls fastq/

Writing commands using pipes


How many files are there in the “fastq” directory we made earlier? (Use the shell to do this.)

ls fastq | wc -l ✅
ls -l fastq/ | wc -l ❌ (careful: ls -l adds a "total ..." line at the top, so the count comes out one too high!)
17
ls fastq | wc -l ✅


Writing our own scripts and loops


cd to our fastq directory from earlier and write a loop to print off the name and top 4 lines of every fastq file in that directory.
Is there a way to only run the loop on fastq files ending in _1.fastq?

#!/bin/bash

cd fastq
for FILE in *.fastq
do
   echo $FILE
   head -n 4 $FILE
done
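
For the second question, only the pattern in the for line needs to change (assuming file names like sample_1.fastq):

for FILE in *_1.fastq
do
   echo $FILE
   head -n 4 $FILE
done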



Feedback  https://jamboard.google.com/d/1YfdfV4d5xSYSWz4P-1K4qtWranudOr6wUp3480lkqa4/edit?usp=sharing


HPC - INTRO 28.7.2020



Instructors (in order of appearance)

Presenter - Room DJ

Helpers (in no particular order)

Learners / Check in

Shellshare:


wget -qO shellshare https://get.shellshare.net && python shellshare

ssh configuration (Include ProxyCommand only if outside the UCL VPN)


https://www.rc.ucl.ac.uk/docs/howto/#single-step-logins-using-tunnelling

Edit the file ~/.ssh/config and add the following:
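
Something along these lines (a sketch only - check the linked page for the exact gateway hostname, and replace <username> with your UCL user id):

Host myriad
    User <username>
    HostName myriad.rc.ucl.ac.uk
    # only needed from outside the UCL VPN/network:
    ProxyCommand ssh -W %h:%p <username>@ssh-gateway.ucl.ac.uk

After that, ssh myriad connects in a single step.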


Exercise: Sharpen a fuzzy image


  1. Clone the git repository https://github.com/davidhenty/sharpen to your Scratch directory on Myriad
  2. Compile one of the parallel versions of sharpen, for example F-MPI
    1. Use the command make
    2. You may need to load the compilers/intel module
  3. Write a job script to run it on 2 processes (a sketch of such a script follows this exercise)
    1. Add #$ -pe mpi N option to request N processes
    2. It will run in less than a minute and not require large amounts of memory or disk
  4. Submit the job and wait for it to run
  5. Examine the output
    1. Load the graphicsmagick module and use gm display command to view image files
    2. The scheduler output file contains performance information
  6. Repeat 3-5 with 4 processes, what happened to the run time?
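
A sketch of what the step-3 job script could look like (runtime, module name and executable name depend on which version you compiled - adapt as needed):

#!/bin/bash -l
#$ -N sharpen
#$ -l h_rt=0:10:0
#$ -pe mpi 2
#$ -cwd
# load the same compiler/MPI environment used to build it, if not already loaded by default
module load compilers/intel
# gerun is the mpirun wrapper on Myriad; it picks up the number of processes from the -pe request
gerun ./sharpen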

How do you use computing in your work?


Using Python/Matlab based scripts to process image sequences (videos) that take quite a long time on a PC
For genetic analysis and running pipelines (still in beginner phase)
optimisation of molecular structure
Using R to analyse large healthcare datasets

How would your work benefit from High Performance Computing?


Matlab parallel computing 
Myriad comes with a lot of useful modules for genetic analysis pre-installed
fast post processing
Being able to use this for my analysis; I have large data sets to analyse
intensive calculation
Manage large databases and run R which is too much for a standard computer

git clone https://github.com/lh3/seqtk.git

Commands:


nproc --all  -  Number of CPUs
free -h  -  Memory available
nodetypes - Shows the different types of nodes
qhost - gives you information about all the nodes
qhost -h <nodename> - gives you information about a specific node
qsub script - to submit the job
qstat - tells you whether the job is running, waiting, ... 
watch -n 10 <command> - runs a command every 10 seconds
ctrl+R - allows you to search backwards in the terminal history
qdel <jobnumber> - it cancels a particular job
qdel -u <user> - cancels all jobs belonging to your user
qrsh -l <and other options> - gets you an interactive shell on a compute node
module list - Shows a list of the modules that have been loaded
module avail - What's available in the system
   module avail python - will show only the ones related to python.
module load <modulename> - adds that module to your "environment".
module unload <modulename> - removes that module from your "environment".
module swap <modulename>/version - to swap between different versions
module purge - removes all the loaded modules. One way to get them back is to log out and in again.
    module load default-modules/2018 - will load the default environment too.
module show <modulename> - tells you what it does when you load the module
echo $PATH - shows you the places where the operating system looks for a program when you call it. $PATH is an environment variable and is modified when we load/unload modules.
lquota - shows the quota your account has (as in disk space)
scp <from> <to> - copies files via ssh
  - on your own machine: scp <username>@myriad.rc.ucl.ac.uk:/path/to/file/to/copy  .
     (note the last dot! It means "here", i.e. the directory where you run the command)
 - on your own machine: scp localfile.pdf  <username>@myriad.rc.ucl.ac.uk:~/hpc-test
    (note the ~ is a shortcut for /home/<username>/ )
rsync -avzP path/to/local/file.txt yourUsername@myriad.rc.ucl.ac.uk:path/on/Myriad
tar czvf my_job.tar.gz job* - will tar and gzip all the files whose names start with job


~/.bashrc  - a file that sets some variables when you log into the shell. You can add `module load <modulename>` commands there, so you don't need to run them every time you log into Myriad.
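
For example (the module name is just a placeholder), appending such a line from the shell:

echo "module load <modulename>" >> ~/.bashrc   # loaded automatically on your next login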

Options for the scheduler (man qsub)
#$ -N <jobname>  - the name of the job
#$ -wd /path/to/working/directory -  defines where you want the code to "run"
#$ -l h_rt=hh:mm:ss  - How much real-world time you need to run the job
#$ -pe mpi <N> - How many CPUs (processes) does your job need? 
#$ -l mem=<size> - How much memory per process does your job need? 
#$ -m be - sends an email when the job begins/ends (other letters: a = aborted, s = suspended)
#$ -M e-mail@address - the address to send those emails to (i.e. whom you want to spam with your results)
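
Putting several of these together - a minimal sketch of a job script (the name, runtime, memory, email address and the command itself are placeholders to adapt):

#!/bin/bash -l
#$ -N my_analysis
#$ -l h_rt=1:00:00
#$ -l mem=1G
#$ -pe mpi 4          # only needed for parallel jobs
#$ -m be
#$ -M your.name@ucl.ac.uk
#$ -cwd

# the actual work goes here
echo "Running on $(hostname)"

Submit it with qsub <scriptname> and check on it with qstat.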







https://jamboard.google.com/d/1YfdfV4d5xSYSWz4P-1K4qtWranudOr6wUp3480lkqa4/viewer?f=0