http://bit.ly/201909_UCL_sc

Socrative link:   https://b.socrative.com/login/student/  - room: RITS

Workshop page:   http://rits.github-pages.ucl.ac.uk/2019-09-25-UCL_software_carpentry/

Unix

Data needed for the workshop: http://swcarpentry.github.io/shell-novice/data/data-shell.zip

Download it to your desktop.

Acronyms

CLI - command line interface

GUI - graphical user interface

HPC - high performance computing

Bash - Bourne again shell

SSH - secure shell, used for remote access

Shell Basics

The user controls the shell by issuing commands. The shell executes each command and then returns control to the user. This loop repeats until the shell is stopped. The available commands and their syntax vary from system to system.

Here are basic commands:

Current/Working Directory

The shell always has a current directory (also called the working directory). When the shell starts, this is usually the current user’s home directory. The default shell prompt (the characters printed before the command) usually shows the current directory. The tilde sign (~) stands for the home directory.

The working directory can be manipulated over time by the following commands:
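A minimal sketch of moving the working directory around, assuming the usual “pwd” (print working directory) and “cd” (change directory) commands. The scratch directory created with “mktemp -d” is only there so the example can run anywhere.

```shell
# Print the absolute path of the current working directory.
pwd

# Create a throwaway scratch directory (an assumption of this sketch) and enter it.
workdir=$(mktemp -d)
cd "$workdir"
pwd

# Move one level up, then come back down again.
cd ..
pwd
cd "$workdir"
```

Running “cd” with no argument (or “cd ~”) returns to the home directory.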

Issuing Commands

Commands have names, arguments and options. The name determines what a command does, the arguments usually determine which objects are affected (files, directories), and options configure how the command behaves. Options are also sometimes called switches if they configure a binary (yes/no) setting.

Breaking up a command: the name is the first word (mandatory). It is usually followed by an arbitrarily long sequence of options and arguments; whether these are mandatory or optional depends on the command in question.

Example:  “ls -F /usr”

  1. Name: “ls” - list directory
  2. Options: “-F” - print symbolic markers at the end of file names, classifying their type
  3. Arguments: “/usr” - which directory to list
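To see the “-F” markers without depending on what “/usr” contains, here is a small sketch that lists a scratch directory instead. The directory and file names are made up for illustration.

```shell
# The example from the notes: name "ls", option "-F", argument "/usr".
ls -F /usr

# Same command shape on a scratch directory with known contents.
scratch=$(mktemp -d)
mkdir "$scratch/notes"        # a directory
touch "$scratch/todo.txt"     # a regular file

# Directories get a trailing "/", regular files get no marker.
ls -F "$scratch"
```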

Manipulating Directories

Basic commands:

The “ls” command has some useful options:

The “cp” command options:

Manipulating Files

Basic commands:

Paths

File system commands often include file or directory names. These can also be paths, so that it’s not necessary to change directories too often. The only difference between file/directory names and paths is that paths can include multiple nested directories. For instance, the path “a/b/c” points to a file “c” located in directory “b” that is located within another directory “a”. The slash (“/”) acts as a path separator or delimiter (note that this is different from Windows, where the path separator is the backslash “\”).

There are some special directory names that are often used as shorthands, even though they do not really exist on the file system. These are:

This way, the file system can be traversed and modified efficiently. For instance, “cd ../../a” changes to the directory “a” located two levels above the current directory.

Paths are classified as either absolute or relative. An absolute path starts with a slash and is resolved from the root of the file system, whereas a relative path depends on the current working directory. This is important because some commands require their arguments to be absolute paths, which are generally considered less prone to error (the same relative path may point to different places in different working directories). Relative paths, on the other hand, are often shorter to type.
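A short sketch of the difference, using the “a/b/c” layout from the paragraph above inside a scratch directory (the scratch location is an assumption of the example):

```shell
# Build the a/b/c layout inside a scratch directory.
top=$(mktemp -d)
mkdir -p "$top/a/b"
touch "$top/a/b/c"

# From inside "b", the file can be reached with short relative paths.
cd "$top/a/b"
ls c
ls ../b/c            # up one level, then back down into "b"

# From the top directory, the relative path to the same file is different...
cd "$top"
ls a/b/c

# ...whereas the absolute path works from any working directory.
ls "$top/a/b/c"
```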

Wildcards

For convenience, the shell allows paths to contain wildcards. These are special symbols that let a single path describe multiple files, so that we don’t have to type too much.

The simplest wildcard is an asterisk “*”. It stands for any sequence of characters, including an empty one. When used in a shell command, it gets expanded depending on the file system contents. For instance, if directory “a” contains files “b”, “c”, and “d”, the path “a/*” will be expanded to “a/b”, “a/c” and “a/d”. This way, we can affect all files in the “a” directory simultaneously. Wildcards can also be used repeatedly or with prefixes/suffixes. If the “a” directory additionally contains files “x1”, “x2” and “x3”, the path “a/x*” will be expanded to only the files prefixed with “x”, whereas the previous path “a/*” will be expanded to all files in the directory (prefixed with “x” or not).

There is a lot of fun to be had with the asterisk. Here are some basic examples:

Basic file system commands (e.g. cp, mv, rmdir, mkdir, rm and many more) accept not just one argument but an arbitrary number of them. This makes them compatible with wildcards. Here’s how they work:

Following up on the example above, the command “mv a/x* y” will be expanded to “mv a/x1 a/x2 a/x3 y”, a command that will move files “a/x1”, “a/x2” and “a/x3” to the “y” directory.
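The expansion can be checked with “echo”, which simply prints the arguments it receives after the shell has expanded them. A sketch using the same file names as above, in a scratch directory:

```shell
# Recreate the example layout in a scratch directory.
work=$(mktemp -d)
cd "$work"
mkdir a y
touch a/b a/c a/d a/x1 a/x2 a/x3

# "echo" shows what the shell expands each pattern to.
echo a/*      # prints: a/b a/c a/d a/x1 a/x2 a/x3
echo a/x*     # prints: a/x1 a/x2 a/x3

# The shell turns this into "mv a/x1 a/x2 a/x3 y" before mv runs.
mv a/x* y
ls y
ls a
```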

Asterisk is not the only wildcard available. There is also:
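A minimal sketch of two other standard shell wildcards: “?” matches exactly one character, and “[...]” matches exactly one character from the listed set. The file names are made up for the example.

```shell
work=$(mktemp -d)
cd "$work"
touch x1 x2 x3 x10

echo x?       # exactly one character after "x"; prints: x1 x2 x3
echo x[12]    # one character, either "1" or "2"; prints: x1 x2
echo x??      # exactly two characters after "x"; prints: x10
```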

Pipes

When commands are executed in the shell, they have a standard input and a standard output. By default, the standard input is whatever the user types on the keyboard, and the standard output is displayed in the console. This can be changed, however, using the vertical line character “|”. The vertical line creates pipes, which daisy-chain multiple commands together by connecting the standard output of one command to the standard input of the next. This way, an arbitrary number of commands can be connected to achieve more advanced objectives. The standard input of the first command and the standard output of the last command remain the defaults (keyboard and console).

Here is a simple example: “cat a | sort | wc -l”

Reading from the left to the right, this command:

  1. Prints the contents of a file called “a”
  2. Sorts the printed contents line-by-line alphabetically
  3. Displays the number of lines in the sorted output

At the end of the command above, only the line count is printed to the console. The outputs of “cat” and “sort” are consumed by the commands that follow them in the chain.
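To try the pipeline end to end, the sketch below first creates a small file called “a” (its three lines are sample data invented for the example) and then runs the pipeline one stage at a time.

```shell
work=$(mktemp -d)
cd "$work"

# Sample input: three unsorted lines in a file called "a".
printf 'pear\napple\nfig\n' > a

# Two stages: the file contents, sorted alphabetically.
cat a | sort

# Three stages: only the final line count (3) reaches the console.
cat a | sort | wc -l
```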

Furthermore, it’s possible to redirect standard input/output of commands to files. This is done using the “<” and “>” characters respectively, and can (but does not have to) be used in conjunction with pipes. Following up on the example above, the command “cat a >b” will not print the contents of file “a” to the console but will dump them into a file called “b”, since its output is redirected to that file. Similarly, the command “sort <a” will print the sorted contents of a file “a”. Combined, the command “sort <a >b” will do the same, but save the output in a file called “b”.

Note that extra care needs to be taken when redirecting standard output to files, as files can easily be overwritten this way. Sometimes, it’s safer to use “>>” instead of “>”. It works like “>”, except that the output is appended at the end of the file instead of overwriting it, preserving any previous contents. If the target file does not exist, both “>” and “>>” will create it. It is generally recommended to use at most a single redirection for each stream (input and output). If more redirections are used at the same time (e.g. “>” as well as “>>”, or “>” as well as “|”), no errors will be produced, but the results on the file system are likely to differ from the original expectations, as each redirection of a stream overrides the previous one (e.g. “cat a >b >c” will create both files “b” and “c”, but only “c” will contain the output of “cat a”).
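The sketch below walks through these redirections in bash, using a scratch directory and two lines of made-up sample data. The file names “a”, “b” and “c” follow the text; “d” is an extra name added for the example.

```shell
work=$(mktemp -d)
cd "$work"
printf 'b\na\n' > a        # create file "a" with two unsorted lines

cat a > b                  # "b" now holds a copy of "a"
sort < a > b               # "b" is overwritten with the sorted lines
echo "extra" >> b          # ">>" appends instead of overwriting
cat b                      # prints: a, b, extra (one per line)

# Two output redirections on one command: both files are created,
# but in bash only the last one ("d") receives the output.
cat a > c > d
wc -c c d
```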

Basic Scripting

Bash is a complete programming language, and it is possible to write any complex scripts or functions that you might desire. One simple example is a FOR loop:

for variable in <list of files, numerical values, etc.>
do
        command1
        command2
        commandN
done

The syntax so far is not remarkable in comparison to other programming and scripting languages; the keywords (“for”, “in”, “do” and “done”) denote a substructure of the script within which you can insert your own commands and variables. Variables can be assigned a value with a command such as “x=2” (with no spaces around the “=”), and later accessed using “$x” (e.g. “echo $x”).

As an alternative to writing a script on separate lines, semicolons “;” may be used instead of line breaks.
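A small runnable sketch of both forms. The variable names and the list items are arbitrary choices for the example.

```shell
# Assign a variable (no spaces around "=") and read it back with "$".
x=2
echo "$x"

# Multi-line form: the loop body runs once per item in the list.
for name in alpha beta gamma
do
    echo "Processing $name"
done

# The same kind of loop on one line, with ";" in place of line breaks.
for n in 1 2 3; do echo "$((n * x))"; done    # prints 2, 4, 6
```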

It will rapidly become inefficient to write long and complicated scripts directly into the terminal, especially if they are likely to be useful on more than one occasion. Bash scripts can be stored in a text file, by convention with the “.sh” extension. This text file should begin with the comment line:

#!/usr/bin/bash

which instructs the system to use a bash interpreter to execute the script. Scripts are executed in the CLI by the command “bash <filename>”.
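As a sketch, the commands below write a tiny script to a file and run it. The file name “hello.sh” and its contents are hypothetical; the first line is the comment line quoted above.

```shell
work=$(mktemp -d)
cd "$work"

# Write a three-line-output script into hello.sh using a here-document.
cat > hello.sh <<'EOF'
#!/usr/bin/bash
# Print a numbered greeting three times.
for n in 1 2 3
do
    echo "hello number $n"
done
EOF

# Run the script with an explicit bash interpreter.
bash hello.sh
```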

Searching

The “grep” command can search files line-by-line, printing lines that match given criteria. The basic usage is: “grep regular_expression file_name”. The regular expression is a string, which may contain special characters specifying the lines to print. Note that these are not the same wildcards as used in shell commands, so they usually need to be enclosed in single quotes so that the shell does not expand them. Sometimes “egrep” (extended grep, equivalent to “grep -E”) is used instead of “grep” if extended regular expressions are required.

Here are some options for grepping:
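A sketch of a few commonly used grep options, run against a small sample file. The file name and its contents are invented for the example, and the selection of options is illustrative rather than exhaustive.

```shell
work=$(mktemp -d)
cd "$work"
printf 'The cat sat\nA dog barked\nthe cat slept\n' > animals.txt

grep 'cat' animals.txt            # lines containing "cat"
grep -n 'cat' animals.txt         # same, prefixed with line numbers
grep -i 'the' animals.txt         # case-insensitive match
grep -v 'cat' animals.txt         # invert: lines that do NOT match
grep -c 'cat' animals.txt         # count matching lines instead of printing them
grep 'sat$' animals.txt           # regular expression: "sat" at the end of a line
grep -E 'dog|slept' animals.txt   # extended regex: either alternative
```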

The “find” command can search files and directories, printing paths of file names that match given criteria. The basic usage is: “find directory_name options”, where options are used to specify the search criteria.

The “locate” command is helpful for finding files if you are not sure where they are and need to search a very broad region of the filesystem. It is faster than “find” because it uses a cached database, though this database may not be up-to-date (see the “updatedb” command).
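A minimal sketch of two common search criteria, “-type” and “-name”, on a scratch directory tree. The directory layout is made up for the example.

```shell
work=$(mktemp -d)
mkdir -p "$work/data/raw"
touch "$work/notes.txt" "$work/data/results.txt" "$work/data/raw/run1.dat"

# All directories below (and including) the starting directory.
find "$work" -type d

# Regular files whose names end in ".txt"; the pattern is quoted so that
# find, not the shell, interprets the wildcard.
find "$work" -type f -name '*.txt'
```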

Git

Please make sure you have signed up for an account at GitHub.com before the session.

Cheatsheet for reference: http://swcarpentry.github.io/git-novice/reference

:)

;)

Why is Version Control important?

Because it is.

What’s the mouseover text for this one?

[xkcd comic “Git”. Mouseover text: “If that doesn't fix it, git.txt contains the phone number of a friend of mine who understands git. Just wait through a few minutes of 'It's really pretty simple, just think of branches as...' and eventually you'll learn the commands that will fix everything.”]

Nice

You’re welcome

:O

Git was originally developed by Linus Torvalds as a way of not having to deal with people, and it is counterintuitive on purpose to anyone familiar with other version control systems

:O

How to Set Up Git

The command “git config --list” should return your username and email address. If it does not:
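A minimal sketch of setting them, using the standard “user.name” and “user.email” settings. The name and email address below are placeholders to be replaced with your own.

```shell
# Placeholder identity: substitute your own name and email address.
git config --global user.name "Vlad Dracula"
git config --global user.email "vlad@tran.sylvan.ia"

# Check the result.
git config --list
```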

You can also set your default text editor using “git config --global core.editor <text editor>”. Vim is objectively the best editor.

Then, use “git config --global core.autocrlf <X>” where

so that line-endings are compatible across these platforms.
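As a sketch, assuming the values recommended in the Software Carpentry Git lesson (“input” on macOS and Linux, “true” on Windows), run only the line that matches your platform:

```shell
# macOS and Linux: convert CRLF to LF on commit, leave files untouched on checkout.
git config --global core.autocrlf input

# Windows (uncomment instead of the line above): convert LF to CRLF on checkout
# and back to LF on commit.
# git config --global core.autocrlf true
```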

Creating a Local Repository

In your chosen directory, create a directory called “recipes” (“mkdir recipes”) and enter it (“cd recipes”). The following commands will be useful:
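A minimal sketch of a first commit in that directory, using the standard “git init”, “git status”, “git add”, “git commit” and “git log” commands. The scratch location, the file name “guacamole.md” and the identity are assumptions made so the example runs on its own.

```shell
work=$(mktemp -d)          # stand-in for "your chosen directory"
cd "$work"
mkdir recipes
cd recipes

git init                                        # turn the directory into a repository
git config user.name "Vlad Dracula"             # placeholder identity, this repository only
git config user.email "vlad@tran.sylvan.ia"

echo "# Guacamole" > guacamole.md               # hypothetical first file
git status                                      # shows guacamole.md as untracked
git add guacamole.md                            # stage the file
git commit -m "Add guacamole recipe"            # record the staged snapshot
git log --oneline                               # one line per commit
```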

Working in a Remote Repository

In your chosen directory,

Markdown

Learn it it’s great (no) :(

Useful tricks/tools

Python

Setup

1st exercise!