Installing software§
If you want to request that software be installed centrally, you can email us at rc-support@ucl.ac.uk. When you send in a request, please address the following questions so that the install can be properly prioritised and planned:
- Can you provide some details as to why you need it, and give an idea of the timeline you would like us to build it in?
- Do you have an idea of the user base for this software within your community? For software on the MMM machines this can be the wider community; for UCL machines, it is the users in your ecosystem.
- If the software is only required by you, would you be open to trying to install it in your home space? We can provide some assistance here if you tell us what problems you are encountering.
The requests will be added to the issues in our buildscripts repository. The buildscripts themselves are there too, so you can see how we built and installed our central software stack.
You can install software yourself in your space on the cluster. Below are some tips for installing packages for languages such as Python or Perl as well as compiling software.
No sudo!§
You cannot install anything using sudo (and neither can we!). If the instructions tell you to do that, read further to see if they also have instructions for installing in user space, or for doing an install from source if they are RPMs.
Alternatively, just leave off the sudo from the command they tell you to run and look for an alternative way to give it an install location if it tries to install somewhere that isn't in your space (examples for some common build systems are below).
Download source code§
Use wget or curl to download the source code for the software you want to install to your account on the cluster. You can use tar to extract source archives named like .tar.gz or .tgz or .tar.bz2 among others, and unzip for .zip files. xz --decompress will expand .xz files.
wget https://www.example.com/program.tar.gz
tar -xvf program.tar.gz
You will not be able to use a package manager like yum, and will need to follow the manual installation instructions for a user-space install (not using sudo).
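As a rough sketch, the extraction commands above can be wrapped in one shell function that picks the tool from the file extension. The name extract_archive is made up for this example, and it assumes your tar detects the compression type itself (GNU tar does).

```shell
# extract_archive FILE: unpack a source archive into the current directory,
# choosing the tool from the file extension.
extract_archive() {
    case "$1" in
        *.tar.gz|*.tgz|*.tar.bz2|*.tar.xz|*.tar)
            tar -xf "$1" ;;                  # tar works out the compression itself
        *.zip)
            unzip -q "$1" ;;
        *.xz)
            xz --decompress --keep "$1" ;;   # --keep leaves the .xz file in place
        *)
            echo "extract_archive: unrecognised archive type: $1" >&2
            return 1 ;;
    esac
}

# Example:
#   wget https://www.example.com/program.tar.gz
#   extract_archive program.tar.gz
```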
Set up modules§
Before you start compiling, you need to make sure you have the right compilers, libraries and other tools available for your software. If you haven't changed anything, you will have the default modules loaded.
Check what the instructions for your software tell you about compiling it. If the website doesn't say much, the source code will hopefully have a README or INSTALL file.
You may want to use a different compiler - the default is the Intel compiler.
module avail compilers will show you all the compiler modules available. Most open source software tends to assume you're using GCC and OpenMPI (if it uses MPI) and is most tested with that combination, so if it doesn't tell you otherwise you may want to begin there (do check what the newest modules available are - the below is correct at time of writing):
# unload your current compiler and mpi modules
module unload -f compilers mpi
# load the GNU compiler
module load compilers/gnu/4.9.2
# these three modules are only needed on Myriad
module load numactl/2.0.12
module load binutils/2.29.1/gnu-4.9.2
module load ucx/1.8.0/gnu-4.9.2
# load OpenMPI
module load mpi/openmpi/4.0.3/gnu-4.9.2
Useful resources:
- Modules pt 1 (moodle) (UCL users)
- Modules pt 2 (moodle) (UCL users)
- Modules pt 1 (mediacentral) (non-UCL users)
- Modules pt 2 (mediacentral) (non-UCL users)
Newer versions of GCC and GLIBCXX§
The software you want to run may require newer compilers, or a precompiled binary may say that it needs a newer GLIBCXX to be able to run. You can access these as follows:
# make all the newer versions visible
module load beta-modules
# unload current compiler, mpi and gcc-libs modules
module unload -f compilers mpi gcc-libs
# load GCC 10.2.0
module load gcc-libs/10.2.0
module load compilers/gnu/10.2.0
The gcc-libs module contains the actual compiler and libraries, while the compilers/gnu module sets environment variables that are likely to be picked up by build systems, telling them what the C, C++ and Fortran compilers are called.
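To see what a build system would pick up after you change compiler modules, you can print the compiler-related environment variables. This is only a sketch: CC, CXX, FC, F77 and F90 are the conventional variable names, and which of them a given module actually sets is an assumption here (module show on the compiler module will tell you for certain). The function name is made up.

```shell
# Print the exported compiler variables that build systems commonly read.
show_compiler_vars() {
    for var in CC CXX FC F77 F90; do
        # printenv only reports exported variables, which is what a build would see
        if value=$(printenv "$var"); then
            printf '%s=%s\n' "$var" "$value"
        else
            printf '%s is not set\n' "$var"
        fi
    done
}

show_compiler_vars
```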
GLIBC version error§
If you get an error saying that a precompiled binary that you are installing needs a newer GLIBC (not GLIBCXX), then it was compiled on a newer operating system and will not work on our clusters. Look for a binary that was built for CentOS 7 (we have RHEL 7) or build the program from source if possible.
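If you want to check a binary before trying to run it, one approach is to list the GLIBC symbol versions it asks for and compare the newest one with what the machine has. The sketch below filters the output of objdump -T; the function name and ./program are placeholders.

```shell
# Read "objdump -T" output on stdin and print the GLIBC symbol versions it
# mentions, oldest first. GLIBCXX_ entries are deliberately not matched.
glibc_versions_needed() {
    grep -o 'GLIBC_[0-9][0-9.]*' | sort -u | sort -t. -k2,2n -k3,3n
}

# Typical use:
#   objdump -T ./program | glibc_versions_needed | tail -n 1   # newest version required
#   ldd --version | head -n 1                                  # GLIBC version on this machine
```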
Build systems§
Most software will use some kind of build system to manage how files are compiled and linked and in what order. Here are a few common ones.
Automake configure§
Automake will generate the Makefile for you and hopefully pick up sensible options through configuration. You can give it an install prefix to tell it where to install (or you can build it in place and not use make install at all).
./configure --prefix=/home/username/place/you/want/to/install
make
# if it has a test suite, good idea to use it
# (Automake projects usually call this target "check"; some use "test" instead)
make check
make install
If it has more configuration flags, you can use ./configure --help to view them.
Usually configure will create a config.log: you can look in there to find if any tests have failed or things you think should have been picked up haven't.
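A config.log can be thousands of lines long, so it can help to pull out only the checks that came back negative and any compiler errors. This is a minimal sketch that relies on the "result: no" and "error:" wording that configure scripts normally write to the log; the function name is made up.

```shell
# List the lines of a config.log that record a failed check or an error.
# Takes the path to config.log as $1 (defaults to ./config.log).
failed_configure_checks() {
    grep -n -E 'result: no$|error:' "${1:-config.log}"
}

# Example, run in the directory where you ran ./configure:
#   failed_configure_checks
```

Not every "no" is a problem: configure tests for many optional features, so look for the ones you expected it to find.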
CMake§
CMake is another build system. It will have a CMakeLists.txt file, or the instructions will ask you to use cmake or ccmake rather than make. It also generates Makefiles for you. ccmake is a terminal-based interactive interface where you can see what variables are set to and change them, then repeatedly configure until everything is correct, generate the Makefile and quit. cmake is the commandline version. The interactive process tends to go like this:
ccmake CMakeLists.txt
# press c to configure - will pick up some options
# press t to toggle advanced options
# keep making changes and configuring until no more errors or changes
# press g to generate and exit
make
# if it has a test suite, good idea to use it
make test
make install
The options that you set using ccmake can also be passed on the commandline to cmake with -D. This allows you to script an install and run it again later. CMAKE_INSTALL_PREFIX is how you tell it where to install.
# making a build directory allows you to clean it up more easily
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/home/username/place/you/want/to/install
If you need to rerun cmake/ccmake and reconfigure, remember to delete the CMakeCache.txt file first or it will still use your old options.
Turning on verbose Makefiles in cmake is also useful if your code
didn't compile first time - you'll be able to see what flags the
compiler or linker is actually being given when it fails.
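Putting the last few points together, a scripted CMake install might look like the sketch below. It starts from an empty build directory (so there is no old CMakeCache.txt to worry about), sets the install prefix and turns on verbose Makefiles with CMAKE_VERBOSE_MAKEFILE. The function name and the example paths are placeholders.

```shell
# cmake_fresh_build SRC_DIR PREFIX
# Configure SRC_DIR in a clean build directory with verbose Makefiles,
# then build and install into PREFIX.
cmake_fresh_build() {
    src=$1
    prefix=$2
    # refuse to run unless SRC_DIR really is a CMake source tree
    [ -n "$src" ] && [ -f "$src/CMakeLists.txt" ] || return 1
    rm -rf "$src/build"                 # also discards any stale CMakeCache.txt
    mkdir -p "$src/build" || return 1
    (
        cd "$src/build" || exit 1
        cmake .. -DCMAKE_INSTALL_PREFIX="$prefix" -DCMAKE_VERBOSE_MAKEFILE=ON &&
        make &&
        make install
    )
}

# Example:
#   cmake_fresh_build "$HOME/src/program" "$HOME/software/program"
```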
Make§
Your code may come with a Makefile and have no configure, in which case the generic way to compile it is as follows:
make targetname
There's usually a default target, which make on its own will use. make all is also frequently used.
If you need to change any configuration options, you'll need to edit those
sections of the Makefile (usually near the top, where the variables/flags are
defined).
Here are some typical variables you may want to change in a Makefile.
These are what compilers/mpi wrappers to use - these are also defined by the compiler modules, so you can see what they should be. Intel would be icc, icpc, ifort, while the GNU compiler would be gcc, g++, gfortran.
If this is a program that can be compiled using MPI and only has a variable for CC, then set that to mpicc.
CC=gcc
CXX=g++
FC=gfortran
MPICC=mpicc
MPICXX=mpicxx
MPIF90=mpif90
CFLAGS and LDFLAGS are flags for the compiler and linker respectively, and there might be LIBS or INCLUDE in the Makefile as well. When linking a library with the name libfoo, use -lfoo.
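# no quotes: in a Makefile they become part of the value, so several
# flags would reach the compiler or linker as a single argument
CFLAGS=-I/path/to/include
LDFLAGS=-L/path/to/foo/lib -L/path/to/bar/lib
LDLIBS=-lfoo -lbar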
Remember to make clean first if you are recompiling with new options. This will delete object files from previous attempts.
BLAS and LAPACK§
BLAS and LAPACK are linear algebra libraries that are provided as part of MKL, OpenBLAS or ATLAS. There are several different OpenBLAS and ATLAS modules for different compilers. MKL is available as part of each Intel compiler module.
Your code may try to link -lblas -llapack: this isn't the right way to use BLAS and LAPACK with MKL or ATLAS (though our OpenBLAS now has symlinks that mean this will work).
MKL§
When you have an Intel compiler module loaded, typing echo $MKLROOT will show you that MKL is available.
Easy linking of MKL§
If you can, try to use -mkl as a compiler flag - if that works, it should get all the correct libraries linked in the right order. Some build systems do not work with this however and need explicit linking.
Intel MKL link line advisor§
It can be complicated to get the correct link line for MKL, so Intel has provided a tool which will give you the link line with the libraries in the right order.
- https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
Pick the version of MKL you are using (for the Intel 2018 compiler it should be Intel(R) MKL 2018.0), and these options:
- OS: Linux
- Pick your compiler. BLAS and LAPACK are Fortran95 interfaces; to select them, pick a Fortran compiler.
- Architecture: Intel(R) 64
- You can choose what type of linking you prefer. Dynamic linking means the libraries are linked at runtime and use the .so library, while static means they are linked at compile time and use the .a library. With the Single Dynamic Library option in later MKL versions, you link one library and MKL selects the interface and threading layers at run time.
- Interface layer: 64-bit integer
- Threading layer: You probably want sequential threading in most cases.
- Select additional libraries (ScaLAPACK) if required.
- Select Intel MPI if required.
- Select 'Link with Intel MKL libraries explicitly'
You'll get something like this:
${MKLROOT}/lib/intel64/libmkl_blas95_ilp64.a ${MKLROOT}/lib/intel64/libmkl_lapack95_ilp64.a -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -lm -ldl
and compiler options:
-i8 -I${MKLROOT}/include/intel64/ilp64 -I${MKLROOT}/include
It is a good idea to double check the library locations given by the tool are correct: do an ls ${MKLROOT}/lib/intel64 and make sure the directory exists and contains the libraries. In the past there have been slight path differences between tool and install for some versions.
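Rather than reading through the ls output by eye, you could check each library named on the link line in turn. This is a small sketch: check_libs is a made-up helper, and the library names in the example are taken from the sample link line above.

```shell
# check_libs DIR NAME...: for each library NAME (the part after -l on a link
# line), report whether lib<NAME>.so or lib<NAME>.a exists in DIR.
# Returns non-zero if any library is missing.
check_libs() {
    dir=$1
    shift
    missing=0
    for name in "$@"; do
        if [ -e "$dir/lib$name.so" ] || [ -e "$dir/lib$name.a" ]; then
            echo "found:   $name"
        else
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Example, with an Intel compiler module loaded:
#   check_libs "$MKLROOT/lib/intel64" mkl_intel_ilp64 mkl_sequential mkl_core
```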
OpenBLAS§
We have native threads, OpenMP and serial versions of OpenBLAS.
Type module avail openblas to see the available versions.
Linking OpenBLAS§
Our OpenBLAS modules now contain symlinks for libblas and liblapack that both point to libopenblas. This means that the default -lblas -llapack will in fact work.
This is how you would normally link OpenBLAS:
-L${OPENBLASROOT}/lib -lopenblas
If code you are compiling requires separate entries for BLAS and LAPACK, set them both to -lopenblas.
Troubleshooting: OpenMP loop warning§
If you are running a threaded program and get this warning:
OpenBLAS Warning : Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option.
then tell OpenBLAS to use only one thread by adding the below to your jobscript (this overrides $OMP_NUM_THREADS for OpenBLAS only):
export OPENBLAS_NUM_THREADS=1
If it is your own code, you can also set it in the code with the function
void openblas_set_num_threads(int num_threads);
You can avoid this warning by compiling with one of the native-threads or serial OpenBLAS modules instead of the openmp one.
ATLAS§
We would generally recommend using OpenBLAS instead at present, but we do have ATLAS modules.
Dynamic linking ATLAS§
There is one combined library each for serial and threaded ATLAS (in most circumstances you probably want the serial version).
Serial:
-L${ATLASROOT}/lib -lsatlas
Threaded:
-L${ATLASROOT}/lib -ltatlas
Static linking ATLAS§
There are multiple libraries to link.
Serial:
-L${ATLASROOT}/lib -llapack -lf77blas -lcblas -latlas
Threaded:
-L${ATLASROOT}/lib -llapack -lptf77blas -lptcblas -latlas
Troubleshooting: libgfortran or libifcore cannot be found§
If you get a runtime error saying that libgfortran.so cannot be found, you need to add -lgfortran to your link line. The Intel equivalent is -lifcore.
You can do a module show on the compiler module you are using to see where the Fortran libraries are located if you need to give a full path to them.
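To find out which shared libraries a program cannot locate before you run it, you can filter the output of ldd. A minimal sketch, where the function name and ./program are placeholders:

```shell
# Read "ldd" output on stdin and print the names of the libraries
# that could not be resolved.
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example:
#   ldd ./program | unresolved_libs
```

If this prints nothing, every library was found with the modules you currently have loaded.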
Installing additional packages for an existing scripting language§
Python§
There are python2/recommended and python3/recommended module bundles you will see if you type module avail python. These use a virtualenv, have a lot of Python packages installed already, like numpy and scipy (see the Python package list) and have pip set up for you.
Load the GNU compiler§
Our Python installs were built with GCC. You can run them without problems with the default Intel compilers loaded because they also depend on the gcc-libs/4.9.2 module. However, when you are installing your own Python packages you should make sure you have the GNU compiler module loaded. This is to avoid the situation where you build your package with the Intel compiler and then try to run it with our GNU-based Python. If it compiled any C code, it will be unable to find Intel-specific symbols at run time and give you errors.
Change your compiler module:
module unload compilers
module load compilers/gnu/4.9.2
If you get an error like this when trying to run something, you built a package with the Intel compiler.
undefined symbol: __intel_sse2_strrchr
Install your own packages in the same virtualenv§
This will use our central virtualenv and the packages we have already installed.
# for Python 2
pip install --user <python2pkg>
# for Python 3
pip3 install --user <python3pkg>
These will install into .python2local or .python3local in your home directory.
If your own installed Python packages get into a mess, you can delete (or rename) the whole .python3local directory and start again.
Using your own virtualenv§
If you need different packages that are not compatible with the centrally installed versions (eg. what you are trying to install depends on a different version of something we have already installed) then you can create a new virtualenv and only packages you are installing yourself will be in it.
In this case, you do not want our virtualenv with our packages to also be active.
We have two types of Python modules. If you type module avail python there are "bundles" which are named like python3/3.7 - these include our virtualenv and packages. Then there are the base modules for just python itself, like python/3.7.4.
When using your own virtualenv, you want to load one of the base python modules.
# load a base python module (you will always need to do this)
module load python/3.7.4
# create the new virtualenv, with any name you want
virtualenv <DIR>
# activate it
source <DIR>/bin/activate
Your bash prompt will change to show you that a different virtualenv is active. (This one is called venv).
(venv) [uccacxx@login03 ~]$
deactivate will deactivate your virtualenv and your prompt will return to normal.
You only need to create the virtualenv the first time.
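In a jobscript or setup script it can be handy to have one command that creates the virtualenv on first use and just activates it afterwards. The sketch below assumes a base python module is already loaded so that virtualenv is on your path; activate_venv and the example directory are made-up names.

```shell
# activate_venv DIR: create the virtualenv in DIR if it does not exist yet,
# then activate it in the current shell.
activate_venv() {
    if [ ! -f "$1/bin/activate" ]; then
        virtualenv "$1" || return 1
    fi
    . "$1/bin/activate"
}

# Example:
#   module load python/3.7.4
#   activate_venv "$HOME/venvs/myproject"
```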
Error while loading shared libraries§
You will always need to load the base python module before activating your virtualenv or you will get an error like this:
python3: error while loading shared libraries: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory
Installing via setup.py§
If you need to install by downloading a package and using setup.py, you can use the --user flag and as long as one of our python module bundles is loaded, it will install into the same .python2local or .python3local as pip does and your packages will be found automatically.
python setup.py install --user
If you want to install to a different directory in your space to keep this package separate, you can use --prefix instead. You'll need to add that location to your $PYTHONPATH and $PATH as well so it can be found. Some install methods won't create the prefix directory you requested for you automatically, so you would need to create it yourself first.
This type of install makes it easier for you to only have this package in your paths when you want to use it, which is helpful if it conflicts with something else.
# add location to PYTHONPATH so Python can find it
export PYTHONPATH=/home/username/your/path/lib/python3.7/site-packages:$PYTHONPATH
# if necessary, create lib/pythonx.x/site-packages in your desired install location
mkdir -p /home/username/your/path/lib/python3.7/site-packages
# do the install
python setup.py install --prefix=/home/username/your/path
It will tend to tell you at install time if you need to change or create the $PYTHONPATH directory.
To use this package, you'll need to add it to your paths in your jobscript or .bashrc. Check that the PATH is where your Python executables were installed.
export PYTHONPATH=/home/username/your/path/lib/python3.7/site-packages:$PYTHONPATH
export PATH=/home/username/your/path/bin:$PATH
It is very important that you keep the :$PYTHONPATH or :$PATH at the end of these - you are putting your location at the front of the existing contents of the path. If you leave them out, then only your package location will be found and nothing else.
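If you add several locations like this, a small helper can keep the existing contents for you. This is a sketch rather than anything we provide: prepend_to is a made-up name. It also avoids leaving a stray trailing colon when the variable was empty to begin with.

```shell
# prepend_to DIR CURRENT: print DIR followed by the existing path contents.
# The colon is only added when CURRENT is not empty.
prepend_to() {
    printf '%s\n' "$1${2:+:$2}"
}

export PYTHONPATH="$(prepend_to /home/username/your/path/lib/python3.7/site-packages "${PYTHONPATH:-}")"
export PATH="$(prepend_to /home/username/your/path/bin "${PATH:-}")"
```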
Troubleshooting: remove your pip cache§
If you built something and it went wrong, and are trying to reinstall it with pip and keep getting errors that you think you should have fixed, you may still be using a previous cached version. The cache is in .cache/pip in your home directory, and you can delete it.
You can prevent caching entirely by installing using pip3 install --user --no-cache-dir <python3pkg>.
Troubleshooting: Python script executable paths§
If you have an executable Python script (eg. something you run using pyutility and not python pyutility.py) that begins like this:
#!/usr/bin/python2.6
and fails because that Python doesn't exist in that location or isn't the one that has the additional packages installed, then you should change it so it uses the first Python found in your environment instead, which will be the one from the Python module you've loaded.
#!/usr/bin/env python
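If a package installs several scripts with a hard-coded interpreter path, you could make this change with sed instead of editing each file by hand. The sketch below only touches the first line, and only when it names a python interpreter by absolute path; use_env_python is a made-up name, so try it on a copy first.

```shell
# use_env_python FILE: if the first line of FILE is a "#!" line that names a
# python interpreter by absolute path, replace it with "#!/usr/bin/env python".
use_env_python() {
    tmpfile=$(mktemp) || return 1
    sed '1s|^#! */[^ ]*/python[0-9.]*|#!/usr/bin/env python|' "$1" > "$tmpfile" &&
        cat "$tmpfile" > "$1"     # write back into the same file, keeping its permissions
    status=$?
    rm -f "$tmpfile"
    return "$status"
}

# Example:
#   use_env_python /home/username/your/path/bin/pyutility
```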