Author: Emmanuel Dollinger
How to run python on the HPC3
Sam Morabito recently showed me a trick to get python working on the HPC3. This makes running python on the HPC3 actually useable. This trick relies on using conda environments, you can read up on conda here and on conda virtual environments here.
Scope:
This will only work if you are running your code in a programming language such as python or R. If you are running a package directly from bash such as cellranger this isn’t what you want to do.
I won’t go over basic HPC3/bash code here, see this link for HPC3 stuff.
We will install Scanpy, which is a nice single cell sequencing package for python. This is a nice usecase because Scanpy is both very common and not in conda.
Create new conda environment
ssh -y edolling@hpc3.rcic.uci.edu
module load anaconda
conda init #do whatever it tells you to do, probably will have to quit and login in again to the HPC3.
conda create -n EnvName python=3.6 # create a new environment with specified python version
It will ask you to install packages, continue through.
Now we have a virtual conda environment, that we can install packages in directly. This totally circumvents the onerous requirement to have modulefiles and such for each package. Best practices are to create a virtual environment for each project.
Install packages that are in conda
Every time you want to use a virtual environment, you need to activate it (again see the env tutorial above).
conda activate EnvName
The environment name should change from (base) to (EnvName).
Best practice for packages that are not in conda is to first install everything that is in conda and then download via pip. The reason for this is conda can update its packages but not other packages, and doesn’t “see” pip updating packages. See this post for more.
conda install pandas
# Install a bunch of packages
conda install matplotlib
# Install moar packages
conda install seaborn
# INSTALL ALL THE PACKAGES
# etc
If you already have a conda env that has all the packages you want, you can do:
conda activate OldEnv
conda list -e > packagelist.txt
conda create -n NewEnv --file packagelist.txt
You can also create the conda env and pass the file to conda install
afterwards. NB this will error if you have pip installed packages in the old conda env.
Install packages not in conda
Once you’ve installed most/all of the packages in conda, you can simply pip install scanpy.
conda activate EnvName # Don't forget to do this
pip install scanpy
Load scanpy in python file
Now we will submit a job to the scheduler that just loads scanpy and quits. You need two scripts, a slurm script that submits the python script and the python script.
slurm.sub script:
#!/bin/bash
#SBATCH --job-name=scanpytest ## Name of the job.
#SBATCH -A qnie_lab ## account to charge
#SBATCH -p standard ## partition/queue name
#SBATCH --nodes=1 ## (-N) number of nodes to use
#SBATCH --ntasks=1 ## (-n) number of tasks to launch
#SBATCH --cpus-per-task=2 ## number of cores the job needs
#SBATCH --error=slurm-%J.err ## error log file
#SBATCH --output=../out ## out log file
# Run the following two lines every time you submit a python script to slurm, this tells slurm about your conda env and loads it.
source ~/.bashrc
conda activate trVAE
# This next line just runs the python script
python3 testscipt.py
testscipt.py script:
import scanpy as sc
print("Scanpy successfully loaded.")
In ../out:
Scanpy successfully loaded.
That’s it! NB this will also work with R and any other language that conda supports.