
GL4U: Amplicon Seq 2023 Pilot Individual Setup

Mike Lee edited this page Jul 13, 2023 · 2 revisions

This page details one path to set up the required environment for running the GL4U Amplicon Seq 2023 Pilot Jupyter notebooks. The processing is light enough to be run on a typical laptop. It requires a Unix-like environment and utilizes conda for installing all required tools as detailed below.

We will be helping folks who want to do this during the bootcamp, but if you try this later and run into any issues, feel free to reach out to Mike (Mike.Lee@nasa.gov) and/or Amanda (Amanda.M.Saravia-Butler@nasa.gov) for help 🙂


Accessing a Unix-like environment

A Unix-like environment is required.

  • On a Mac or Linux machine, this can be accessed by searching for and opening the "Terminal" app.
  • On a Windows computer, installing the Windows Subsystem for Linux (WSL) is required. You can try opening "PowerShell", running wsl --install, and following the prompts. After the installation is complete, open "Ubuntu" on the Windows computer to access your Unix-like environment.
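
If you are not sure whether the window you have open is a Unix-like environment, a quick check like the following may help. This is only a sketch: describe_platform is a hypothetical helper name, and it relies on uname -s printing "Linux" on Linux (including Ubuntu under WSL) and "Darwin" on a Mac.

```shell
# Hypothetical helper: turn a kernel name (as printed by `uname -s`) into a short hint.
describe_platform() {
    case "$1" in
        Linux)  echo "Linux (native or WSL): ready to continue" ;;
        Darwin) echo "macOS: ready to continue" ;;
        *)      echo "Not a recognized Unix-like environment: $1" ;;
    esac
}

# report on the shell we are currently in
describe_platform "$(uname -s)"
```

If the last message appears, you are likely in a shell other than the ones described above.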

Installing conda

Conda is a package and environment manager, and it is the method used here to set up the required environment to run the Jupyter notebooks. One place you can learn more about conda is this page at Happy Belly Bioinformatics, or you can just skip to this section and follow the installation instructions. Be sure to start with the curl command there that matches your machine (Mac, Windows, or Linux); these commands should be run in your Unix-like environment.

After finishing that installation, there should be a "(base)" at the start of your prompt at the command line. The first thing we are going to install with conda is mamba, to enable faster installations, by running the following:

conda install -y -n base -c conda-forge mamba
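
To confirm that both tools are reachable before moving on, something like the following could be run. It is a minimal sketch, and have_tool is a hypothetical helper name; it only checks that each command can be found, not that it works correctly.

```shell
# Hypothetical helper: report whether a command can be found by the shell.
have_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: NOT found"
        return 1
    fi
}

# check the two tools installed so far
for tool in conda mamba; do
    have_tool "$tool" || true
done
```

If either one is reported as not found, opening a new terminal window and trying again is a reasonable first step.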

Creating the needed environment

The following command will create a conda environment called "GL4U-amplicon-2023", and may take a few minutes to complete:

mamba create -n GL4U-amplicon-2023 -y -c conda-forge -c bioconda -c defaults \
             jupyterlab=3.6.0 bash_kernel=0.9.0 r-irkernel=1.3.2 coreutils=9.1 \
             r-base=4.1.3 r-tidyverse=1.3.2 r-vegan=2.6_4 r-dendextend=1.16.0 \
             bioconductor-dada2=1.22.0 bioconductor-decipher=2.22.0 \
             bioconductor-phyloseq=1.38.0 bioconductor-deseq2=1.34.0 \
             fastqc=0.11.9 multiqc=1.12 jupyter_contrib_nbextensions=0.7.0
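
Once that finishes, one way to confirm the environment was created is to look for its name in the output of conda env list. The sketch below assumes that output has the environment name as the first column of each line; env_listed is a hypothetical helper name.

```shell
# Hypothetical helper: reads `conda env list` output on stdin and succeeds
# only if an environment with the given name appears in the first column.
env_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

if command -v conda >/dev/null 2>&1 && conda env list | env_listed "GL4U-amplicon-2023"; then
    echo "environment found"
else
    echo "environment not found (or conda is not available in this shell)"
fi
```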

Downloading and launching the notebooks

Still in our Unix-like environment, running this next code block will download and unpack the Jupyter notebooks into our home directory:

# downloading notebooks
curl -L -o ~/GL4U-2023-amplicon-bootcamp-notebooks.zip https://figshare.com/ndownloader/files/41500734

# unpacking
unzip ~/GL4U-2023-amplicon-bootcamp-notebooks.zip -d ~/
# that includes:
    # 00-overview.ipynb
    # intro-notebooks/
    # amplicon-notebooks/

# removing zip
rm ~/GL4U-2023-amplicon-bootcamp-notebooks.zip
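
To double-check that the download and unpacking worked, we could look for the three items listed above in the home directory. This is a sketch only, and missing_items is a hypothetical helper name.

```shell
# Hypothetical helper: print each expected item that is absent from the given directory.
missing_items() {
    dir="$1"
    [ -f "$dir/00-overview.ipynb" ] || echo "00-overview.ipynb"
    [ -d "$dir/intro-notebooks" ] || echo "intro-notebooks/"
    [ -d "$dir/amplicon-notebooks" ] || echo "amplicon-notebooks/"
    return 0
}

# check the home directory, where the zip was unpacked
missing="$(missing_items "$HOME")"
if [ -z "$missing" ]; then
    echo "all notebooks are in place"
else
    echo "missing from $HOME:"
    echo "$missing"
fi
```

If anything is reported as missing, re-running the download and unzip commands is the simplest fix.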

Then we can activate the conda environment we created above, and launch the Jupyter notebooks like so:

# changing into home directory
cd ~/

# activating conda environment
conda activate GL4U-amplicon-2023

# launching jupyter lab
jupyter lab 00-overview.ipynb

That conda environment will always need to be active (so our prompt should start with "(GL4U-amplicon-2023)") if we want to run these Jupyter notebooks.
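
Besides looking at the prompt, the active environment can also be checked from the command line. The sketch below relies on the CONDA_DEFAULT_ENV variable, which conda activate sets to the name of the active environment; env_is_active is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the named conda environment is active.
# CONDA_DEFAULT_ENV is set by `conda activate` to the active environment's name.
env_is_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if env_is_active "GL4U-amplicon-2023"; then
    echo "environment is active"
else
    echo "run: conda activate GL4U-amplicon-2023"
fi
```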