Python, R, Matlab and Perl¶
Scripting languages often support modules or libraries for additional functionality or convenience functions. We encourage users to install modules locally for only the current user.
Python¶
You can install many python and non-python packages yourself using conda or especially for bioinformatics software bioconda.
Conda enables you to easily install complex packages and software. Creating multiple enviroments enables you to have installations of the same software in different versions or incompatible software collections at once. You can easily share a list of the installed packages with collaborators or colleagues, so they can setup the same eniviroment in a matter of minutes.
Setup¶
First you load the miniconda module which is like a python and r package manager. Conda makes it easy to have multiple environments for example one python2 and one python3 based parallel to each other without interfering.
Start by removing all preloaded modules which can complicate things. We then display all installed version and load the newest one (4.6.14):
ml purge
ml avail Miniconda
ml Miniconda3/4.6.14
To install packages we first have to add the package repository to conda (we only have to do this once) and set up a new conda environment which we will call e.g. “python3” where we also specify which python version we want and which packages should be installed (matplotlib numpy):
conda config --add channels defaults
conda config --add channels conda-forge
conda create --name python3 python=3 matplotlib numpy
If you want install bioinformatics packages you should also add the bioconda channel:
conda config --add channels bioconda
In case you want to install the conda environment in another directory than the home, you can add –prefix PATH. This enables multiple users of a project to share the conda environment by installing it into their project folder instead of the users home.
To suppress the warning that a newer version of conda exists which is usually not important for most users and will be fixed by us by installing a new module:
conda config --set notify_outdated_conda false
Daily usage¶
To load this environment you have to use the following commands either on the command line or in your job script:
ml purge
ml Miniconda3/4.6.14
conda activate python3
Then you can use all software as usual.
To deactivate the current environment:
conda deactivate
If you need to install additional software or packages, we can search for it with:
conda search SOMESOFTWARE
and install it with:
conda install -n python3 SOMESOFTWARE
If the python package you are looking for is not available in conda you can use **pip** like usually from within a conda environment to install additional python packages:
pip install SOMEPACKAGE
To update the a single package with conda:
conda update -n python3 SOMESOFTWARE
or to update all packages:
conda update -n python3 --all
Additional Conda information¶
Cheatsheet and built-in help¶
See this cheatsheet for an overview over the most important conda commands.
In case you get confused by the conda commands and command line options you can get help by adding –help to any conda command or have a look at the conda documentation.
Miniconda vs. Anaconda¶
Both Miniconda and Anaconda are distributions of the conda repository managment system. But while Miniconda brings just the managment system (the conda command), Anaconda comes with a lot of built-in packages.
Both are installed on Stallo but we advise the use of Miniconda. By explicitly installing packages into your own enviroment the chance for unwanted effects and errors due to wrong or incomaptible versions is reduced. Also you can be sure that everything that happens with your setup is controlled by yourself.
Virtual Environments (deprecated)¶
We recommend using Conda , but there is another way to install modules locally is using virtual environments
As an example we install the Biopython package (and here we use
the Python/3.6.4-intel-2018a
module as an example):
$ module load Python/3.6.4-intel-2018a
$ virtualenv venv
$ source venv/bin/activate
$ pip install biopython
Next time you log into the machine you have to activate the virtual environment:
$ source venv/bin/activate
If you want to leave the virtual environment again, type:
$ deactivate
And you do not have to call it “venv”. It is no problem to have many virtual environments in your home directory. Each will start as a clean Python setup which you then can modify. This is also a great system to have different versions of the same module installed side by side.
If you want to inherit system site packages into your virtual environment, do this instead:
$ virtualenv --system-site-packages venv
$ source venv/bin/activate
$ pip install biopython
R¶
Load R¶
Using R on Stallo is quite straightforward. First check which versions are available:
ml avail -r '^R/'
To load a version:
ml R/3.5.0-iomkl-2018a-X11-20180131
Now you can use R from the command line just as you would on your local computer.
Install Packages¶
To install R packages use install.packages()
.
First open the R command line and then install a package e.g. “tidyverse”:
R
install.packages("tidyverse")
Note: The first time you install new packages, R will ask you
whether it should install these packages into your home folder.
Confirm both questions with y
and then choose a close download mirror
MATLAB¶
Load MATLAB¶
To use MATLAB simply load the module at the start of your jobscript or type them on the command line:
ml purge
ml avail matlab # To display all installed versions
ml MATLAB/R2018a-foss-2017a # or any other version you want
Interactice Shell¶
On the login nodes you can start a normal MATLAB session with an graphical user interface (GUI). You can use this to visualize and look at data. Just type matlab
.
But remember NOT to run calculations on the login nodes as this might slow down the system for all stallo users. If this happens we will kill the process without prior warning.
You can also start an interactive matlab shell on the command line without graphical user interface (headless) with:
matlab -nodesktop -nodisplay -nosplash
See matlab -h
for all command line options.
If you are on a compute node matlab
always starts a headless matlab shell.
Running MATLAB Scripts¶
You can run a matlab script by:
matlab -r -nodisplay -nosplash -r 'run("SCRIPT.m")'
In some instances it might be necessary to use an absolute file path to the script.
Tips¶
- You can reduce the memory usage by starting matlab without java support, just add
-nojvm
. - To get a graphical interface when starting
matlab
on a login node, you need to activate X11 forwarding for your ssh connection to stallo. If you connect to stallo from a linux machine usessh -X
to tunnel graphical output to your computer.
Perl¶
We will use Perl 5.28 and use the standard paths. This follows the general instruction given here: https://metacpan.org/pod/local::lib.
$ module load Perl/5.28.0-GCCcore-7.3.0
$ mkdir my_perl_installs # or however you want to call this temporary folder
$ cd my_perl_installs
# Check the newest version on metacpan.org and search for local::lib
$ wget https://cpan.metacpan.org/authors/id/H/HA/HAARG/local-lib-2.000024.tar.gz
$ tar xzf local-lib-2.000024.tar.gz
$ cd local-lib-2.000024
$ perl Makefile.PL --bootstrap
$ make test
$ make install
$ echo 'eval "$(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)"' >> ~/.bashrc
$ source ~/.bashrc
Now, the module local::lib is installed and the ~/.bashrc
changed such that Perl should now recognize your local folder as module folder.
All future modules will be installed to ~/perl5
.
If you want to install, for example, the module Math::Vector::Real, just call cpan:
$ cpan Math::Vector::Real
Remember to load the right Perl version first (module load ...
).
The first time you call cpan, it will ask you to do some configurations.
Just press enter (let it do its configurations).