Talapas and R

How to use R in talapas

You can find the Talapas knowledge base here.

Logging in to Talapas

Use ssh to login through the terminal:

ssh username@talapas-login.uoregon.edu

Use scp to copy files to/from your local machine. Adding the -r flag lets you copy whole directories and all of the files within that directory. Some examples of its usage:

scp ~/myfile.tsv username@talapas-login.uoregon.edu:./projects/my_project
scp username@talapas-login.uoregon.edu:./projects/my_project/myfile.tsv ~/
scp -r ~/my_directory username@talapas-login.uoregon.edu:./projects/my_project

To login to a virtual desktop environment go to talapas-login.uoregon.edu with your browser.

Submitting a job to slurm

When you login to talapas, you will be on one of the login nodes. To run analyses, you will have to access a compute node by submitting a job. From the terminal, there are two ways of submitting a job. You can either start an interactive session in your terminal using srun or submit a batch script using sbatch, which will run automatically in the background. In either case, you will have to wait in the queue until a node opens up for the amount of CPU, memory, and time that you need.

An example srun command:

srun --account=crobe --pty --partition=short --mem=1024M --time=240 bash

You must include --account=crobe, --pty, and bash. The other flags are optional, but let you specify memory, time, CPUs, etc. These flags (e.g., --time=240) are the same as what you would put into a batch script. --partition=short is the default partition you will use, --mem=1024M gives you a gig of RAM/memory, and --time=240 gives you 4 hours of time on the node.
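For example, a session requesting a bit more of everything might look like this (the exact values are just illustrative):

```
srun --account=crobe --pty --partition=short --cpus-per-task=4 --mem=4G --time=120 bash
```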

An example batch script:

#!/bin/bash
#SBATCH --partition=short       ### queue to submit to
#SBATCH --job-name=varcomp      ### job name
#SBATCH --output=varcomp.out    ### file in which to store job stdout
#SBATCH --error=varcomp.err     ### file in which to store job stderr
#SBATCH --time=30               ### wall-clock time limit, in minutes
#SBATCH --nodes=1               ### number of nodes to request
#SBATCH --ntasks-per-node=1     ### number of tasks per node
#SBATCH --cpus-per-task=28      ### number of cores/CPUs
#SBATCH -A crobe                ### account
module load R/3.6.1 # optional if you want to load a specific version
cd projects/Royal_Society/R
Rscript var_comp.R

A batch script is essentially just a normal bash script with some extra parameters at the top, which slurm reads. You should specify --partition=short, a --job-name, and where to put --output and --error. --time sets the time limit in minutes. --nodes and --ntasks-per-node will pretty much always be equal to 1 for R jobs. You can set how many cores/CPUs you want with --cpus-per-task and the amount of memory you want with --mem, which takes a number and a unit. For example, --mem=64G gives you 64 gigabytes of RAM on that node. I think the default is something like 4 GB per CPU. At the bottom of the script, you enter normal terminal commands as you would in a bash script or as you would enter them directly at the command line. This script can be in your home directory or in the project directory.
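For example, to request 64 gigabytes of memory you would add a line like this to the #SBATCH block at the top of the script (the value is just illustrative):

```
#SBATCH --mem=64G               ### total memory for the job
```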

Run a batch script by typing sbatch and then the name of the batch script (the filename extension doesn’t matter, so I usually just use .batch):

sbatch my_script.batch

To check on your jobs you can type in:

squeue -u username

(Replacing username with your username.) This will tell you whether they’re still in the queue, how long they’ve been running, etc.
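If you want more detail, squeue also has a long-format flag:

```
squeue -u username -l
```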

You can cancel a job by getting the job ID from squeue and then entering:

scancel job-id

(Replacing job-id with the actual number.) Any output that your commands print to the terminal will be saved in the output file that you specify. Any errors specifically from slurm, for example if your job crashes for some reason, will be printed to the error file that you specify.
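If you need to cancel everything at once, scancel also accepts a username, which cancels all of your jobs:

```
scancel -u username
```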

Using R

Check if R is installed

First, login to talapas using ssh. By default, when you login to talapas, R is already loaded with a recent installed version (at least on my account). Check by entering:

R --version

at the command line. If this returns an error or if you want a specific version of R you can see the installed versions with:

module spider R

and load the latest version with:

module load R

or a specific version with:

module load R/3.6.1

This is important to check first, because if it’s not automatically loaded, you will have to include module load R at the top of your batch file when you go to submit jobs.

(It’s probably safest to always load a specific version of R at the top of your batch file, but I don’t do this. 🤷)

Running R at the command line

Run R directly by typing

R

at the command line, which will open an R console. This is a good way of testing things and installing packages. You should manually install all of the packages you need this way (with the appropriate R module loaded) before submitting a batch script. When you run install.packages() from talapas, it will ask you to select a mirror from which to download the files. I usually choose the OSU one.
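If you want to skip the mirror prompt entirely, you can pass a repository URL directly with the repos argument (the package name here is just an example):

```
install.packages('vegan', repos = 'https://cloud.r-project.org')
```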

Running an R script from the command line

Use the function Rscript to run a script from the terminal:

Rscript my_script.R

By default, the output is not saved to a file, but is just printed to the terminal. So any output you need should be manually saved within your script.
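As a minimal sketch of what that looks like, here is a hypothetical R script that explicitly writes its results to disk (the filenames and analysis are made up for illustration):

```shell
# Create a hypothetical R script that saves its own results,
# since Rscript prints to the terminal instead of writing an .Rout file.
cat > my_script.R <<'EOF'
fit <- lm(mpg ~ wt, data = mtcars)            # example analysis
saveRDS(fit, "fit.rds")                       # save the fitted model object
write.csv(coef(summary(fit)), "coefs.csv")    # save a results table
EOF
```

Running `Rscript my_script.R` on a compute node would then leave fit.rds and coefs.csv on disk, regardless of what gets printed to the terminal.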

(If you’re running scripts manually at the command line without a batch job (i.e., without using sbatch), then remember to login to a compute node using srun instead of running your script on the login node.)

Running an R script from within a batch file using slurm

If you want to submit your R script as a job to talapas, just create a batch script in which you cd into the folder with your R script and use the Rscript command like you would at the command line.

Your batch script will look something like this:

#!/bin/bash
#SBATCH --partition=short       ### queue to submit to
#SBATCH --job-name=varcomp      ### job name
#SBATCH --output=varcomp.out    ### file in which to store job stdout
#SBATCH --error=varcomp.err     ### file in which to store job stderr
#SBATCH --time=30               ### wall-clock time limit, in minutes
#SBATCH --nodes=1               ### number of nodes to request
#SBATCH --ntasks-per-node=1     ### number of tasks per node
#SBATCH --cpus-per-task=28      ### number of cores you want to use
#SBATCH -A crobe                ### account
module load R/3.6.1 # optional if you want to load a specific version
cd projects/Royal_Society/R
Rscript var_comp.R

And you will run it at the command line like this (except replace varcomp.batch with whatever the name of your batch file is):

sbatch varcomp.batch

(It can be helpful to use require('package_name') instead of library('package_name'): if a package fails to load, require() prints a warning and lets the script continue, whereas library() stops with an error.)
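One common pattern is to check require()'s return value and stop with a clear message when a package is missing (the package name here is just an example):

```
if (!require('vegan')) stop('Package vegan is not installed')
```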

(If you’ve ever used R CMD BATCH to run R scripts at the command line, Rscript works a little differently. It prints output to the terminal (a.k.a., stdout) and does not save an .Rout file. Therefore, any output or errors will print to the --output file defined in your batch script.)

Using rstudio on talapas

If you want to use rstudio on talapas you can login to an Open OnDemand virtual desktop by opening your web browser and navigating to talapas-login.uoregon.edu. Enter your uoregon login info and then click on “Interactive Apps” at the top of the screen and then “Talapas Desktop” from the dropdown menu that appears. This will take you to a screen where you enter your normal srun or sbatch info for slurm including your account name (crobe), the length of your job, how many cpus you need, etc. Once your session begins, click on “Launch noVNC in New Tab” to open the desktop.

Once you have the talapas desktop open, go to the top of the screen and click on the terminal button. At the terminal type in

module load rstudio

and if you want a specific version of R you can type in

module load R/3.6.1

or whatever version you want. Finally, type in

rstudio

to start the rstudio application. Your working directory will be your normal talapas home directory and you can use rstudio like you would on your local machine but with the power 💪 of talapas.

Using qiime2 with modules and miniconda

If you want to use qiime2 on talapas, you should put the following commands into your batch script:

module load miniconda
conda activate qiime2-2019.10

where you should replace qiime2-2019.10 with the name of the conda environment that has the version of qiime2 that you need. You can see which qiime2 environments are installed with:

conda env list

How to install a specific version of qiime2 on Talapas using miniconda

When you do this, you should go to the qiime2 docs under “Installing QIIME 2” and run the commands from there to get the latest version. This will create a miniconda environment specific to your talapas user account. Below, I have copied how to do this for the current version (version 2020.2) as an example.

First, if you haven’t already, activate the miniconda module:

module load miniconda

Then you can install qiime2 in a new environment. Just update the url and environment name with whatever specific version you want to install:

wget https://data.qiime2.org/distro/core/qiime2-2020.2-py36-linux-conda.yml
conda env create -n qiime2-2020.2 --file qiime2-2020.2-py36-linux-conda.yml
# OPTIONAL CLEANUP
rm qiime2-2020.2-py36-linux-conda.yml

Then any time you want to use that version, simply run:

source activate qiime2-2020.2

If you forget what it’s called, you can list your environments again with:

conda env list
Andrew H. Morris
Post-doctoral Scholar

Community Ecology.