Merge pull request #68 from NickEngland/fix-docker-build-2023
Merge changes to fix docker build and update dependencies for 2023 versions.
NickEngland authored Jul 26, 2023
2 parents c518e57 + b68b4a5 commit 62d2edb
Showing 19 changed files with 349 additions and 157 deletions.
3 changes: 3 additions & 0 deletions .dockerignore
@@ -0,0 +1,3 @@
Dockerfile
.git
venv
2 changes: 2 additions & 0 deletions .gitignore
@@ -76,3 +76,5 @@ unversioned/
# Various
mess/
Rplots.pdf
venv/
.DS_Store
129 changes: 81 additions & 48 deletions Dockerfile
@@ -2,71 +2,104 @@
# docker build -t bracer .

#start off with a plain Debian
FROM debian:latest
FROM debian:trixie-20230703-slim

#basic setup stuff, including bowtie2
RUN apt-get update && apt-get -y upgrade
RUN apt-get -y install wget curl unzip build-essential zlib1g-dev git python3 python3-pip bowtie2 openjdk-8-jre
#basic setup stuff, including bowtie2 for Bracer, libcurl4-openssl-dev r-base libxml2-dev for Alakazam
RUN apt-get update && apt-get -y upgrade && \
apt-get -y install \
bowtie2 \
build-essential \
cmake \
curl \
default-jre \
git \
graphviz \
jellyfish \
libcairo2-dev \
libcurl4-openssl-dev \
libfreetype6-dev \
libgirepository1.0-dev \
libxml2-dev \
pkg-config \
python3-dev \
python3-pip \
python3-venv \
r-base \
salmon \
samtools \
unzip \
wget \
zlib1g-dev \
&& \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

#Trinity - depends on zlib1g-dev and openjdk-8-jre installed previously
RUN wget https://github.com/trinityrnaseq/trinityrnaseq/archive/Trinity-v2.4.0.zip
RUN unzip Trinity-v2.4.0.zip && rm Trinity-v2.4.0.zip
RUN cd /trinityrnaseq-Trinity-v2.4.0 && make

#IgBLAST, plus the setup of its super weird internal_data thing. don't ask. just needs to happen
#and then on top of that, the environmental variable thing facilitates the creation of a shell wrapper. fun
RUN wget ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/1.4.0/ncbi-igblast-1.4.0-x64-linux.tar.gz
RUN tar -xzvf ncbi-igblast-1.4.0-x64-linux.tar.gz && rm ncbi-igblast-1.4.0-x64-linux.tar.gz
RUN cd /ncbi-igblast-1.4.0/bin/ && wget -r ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/internal_data && \
mv ftp.ncbi.nih.gov/blast/executables/igblast/release/internal_data . && rm -r ftp.ncbi.nih.gov

#aligners - kallisto and salmon
RUN wget https://github.com/pachterlab/kallisto/releases/download/v0.43.1/kallisto_linux-v0.43.1.tar.gz
RUN tar -xzvf kallisto_linux-v0.43.1.tar.gz && rm kallisto_linux-v0.43.1.tar.gz
RUN wget https://github.com/COMBINE-lab/salmon/releases/download/v0.8.2/Salmon-0.8.2_linux_x86_64.tar.gz
RUN tar -xzvf Salmon-0.8.2_linux_x86_64.tar.gz && rm Salmon-0.8.2_linux_x86_64.tar.gz

#graphviz, along with its sea of dependencies that otherwise trip up the dpkg -i
RUN apt-get -y install libgd3 libgts-0.7-5 liblasi0 libltdl7 freeglut3 libglade2-0 libglu1-mesa libglu1 libgtkglext1 libxaw7
RUN wget http://www.graphviz.org/pub/graphviz/stable/ubuntu/ub13.10/x86_64/libgraphviz4_2.38.0-1~saucy_amd64.deb
RUN dpkg -i libgraphviz4_2.38.0-1~saucy_amd64.deb && apt-get -y -f install
RUN wget http://www.graphviz.org/pub/graphviz/stable/ubuntu/ub13.10/x86_64/graphviz_2.38.0-1~saucy_amd64.deb
RUN dpkg -i graphviz_2.38.0-1~saucy_amd64.deb && apt-get -y -f install
RUN rm libgraphviz4_2.38.0-1~saucy_amd64.deb && rm graphviz_2.38.0-1~saucy_amd64.deb
RUN wget https://github.com/trinityrnaseq/trinityrnaseq/releases/download/Trinity-v2.15.1/trinityrnaseq-v2.15.1.FULL.tar.gz
RUN tar -xf trinityrnaseq-v2.15.1.FULL.tar.gz
# Currently trinity won't compile without a #include <string> in this file
RUN sed -i '1s;^;#include <string>\n;' /trinityrnaseq-v2.15.1/trinity-plugins/bamsifter/sift_bam_max_cov.cpp
RUN cd /trinityrnaseq-v2.15.1 && make

#IgBLAST
RUN wget https://ftp.ncbi.nih.gov/blast/executables/igblast/release/1.21.0/ncbi-igblast-1.21.0-x64-linux.tar.gz && \
tar -xf ncbi-igblast-1.21.0-x64-linux.tar.gz && \
rm ncbi-igblast-1.21.0-x64-linux.tar.gz


COPY docker_helper_files/gencode_parse.py /bracer/docker_helper_files/gencode_parse.py

#aligners - kallisto
RUN wget https://github.com/pachterlab/kallisto/releases/download/v0.48.0/kallisto_linux-v0.48.0.tar.gz && tar -xzvf kallisto_linux-v0.48.0.tar.gz && rm kallisto_linux-v0.48.0.tar.gz

#obtaining the transcript sequences, no need for kallisto/salmon indices
RUN mkdir GRCh38 && cd GRCh38 && wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/gencode.v43.transcripts.fa.gz && \
gunzip gencode.v43.transcripts.fa.gz && python3 /bracer/docker_helper_files/gencode_parse.py gencode.v43.transcripts.fa && rm gencode.v43.transcripts.fa

RUN mkdir GRCm38 && cd GRCm38 && wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M32/gencode.vM32.transcripts.fa.gz && \
gunzip gencode.vM32.transcripts.fa.gz && python3 /bracer/docker_helper_files/gencode_parse.py gencode.vM32.transcripts.fa && rm gencode.vM32.transcripts.fa

#regular BLAST
RUN wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.6.0/ncbi-blast-2.6.0+-x64-linux.tar.gz
RUN tar -xzvf ncbi-blast-2.6.0+-x64-linux.tar.gz && rm ncbi-blast-2.6.0+-x64-linux.tar.gz
RUN wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.14.0/ncbi-blast-2.14.0+-x64-linux.tar.gz \
&& tar -xzvf ncbi-blast-2.14.0+-x64-linux.tar.gz && rm ncbi-blast-2.14.0+-x64-linux.tar.gz

#phylip
RUN wget http://evolution.gs.washington.edu/phylip/download/phylip-3.696.tar.gz
RUN tar -xzvf phylip-3.696.tar.gz && rm phylip-3.696.tar.gz
RUN cd phylip-3.696/src && make -f Makefile.unx install
RUN wget http://evolution.gs.washington.edu/phylip/download/phylip-3.697.tar.gz && tar -xzvf phylip-3.697.tar.gz && rm phylip-3.697.tar.gz
RUN cd phylip-3.697/src && sed -i 's/^CFLAGS =/CFLAGS = -fcommon/g' Makefile.unx && make -f Makefile.unx install

#Trim Galore! plus its dependency FastqC
RUN wget http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.5.zip
RUN unzip fastqc_v0.11.5.zip && rm fastqc_v0.11.5.zip
RUN wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.12.1.zip && unzip fastqc_v0.12.1.zip && rm fastqc_v0.12.1.zip
RUN chmod 755 /FastQC/fastqc
RUN ln -s /FastQC/fastqc /usr/local/bin/fastqc
RUN curl -fsSL https://github.com/FelixKrueger/TrimGalore/archive/0.4.3.tar.gz -o trim_galore.tar.gz
RUN tar xvzf trim_galore.tar.gz && mv TrimGalore-0.4.3/trim_galore /usr/bin
RUN curl -fsSL https://github.com/FelixKrueger/TrimGalore/archive/refs/tags/0.6.10.tar.gz -o trim_galore.tar.gz
RUN tar xvzf trim_galore.tar.gz && mv TrimGalore-0.6.10/trim_galore /usr/bin

#R dependencies. libxml2-dev is a ghost dependency of an alakazam dependency not mentioned by the install crash
RUN apt-get -y install r-base libxml2-dev
#R dependencies
RUN R -e "install.packages('BiocManager')"
RUN R -e "BiocManager::install(c('GenomicAlignments', 'Biostrings', 'IRanges'))"
RUN R -e "install.packages(c('alakazam', 'ggplot2'), repos='http://cran.us.r-project.org')"


#Bowtie 2 as needs version 2.5.1 due to a bug in 2.5.0
RUN wget https://downloads.sourceforge.net/project/bowtie-bio/bowtie2/2.5.1/bowtie2-2.5.1-linux-x86_64.zip && unzip bowtie2-2.5.1-linux-x86_64.zip && rm bowtie2-2.5.1-linux-x86_64.zip

#bracer proper, no need to reposition resources as config will now know where this lives
COPY ./docker_helper_files/requirements_stable.txt ./bracer/docker_helper_files/
WORKDIR /bracer
RUN python3 -m venv venv
ENV VIRTUAL_ENV=/bracer/venv
ENV PATH=$VIRTUAL_ENV/bin:$PATH
RUN pip install --upgrade pip --break-system-packages
RUN pip3 install -r docker_helper_files/requirements_stable.txt --break-system-packages
COPY . /bracer
RUN cd /bracer && pip3 install -r docker_helper_files/requirements_stable.txt && python3 setup.py install

#obtaining the transcript sequences, no need for kallisto/salmon indices
RUN mkdir GRCh38 && cd GRCh38 && wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/gencode.v27.transcripts.fa.gz && \
gunzip gencode.v27.transcripts.fa.gz && python3 /bracer/docker_helper_files/gencode_parse.py gencode.v27.transcripts.fa && rm gencode.v27.transcripts.fa
RUN mkdir GRCm38 && cd GRCm38 && wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M15/gencode.vM15.transcripts.fa.gz && \
gunzip gencode.vM15.transcripts.fa.gz && python3 /bracer/docker_helper_files/gencode_parse.py gencode.vM15.transcripts.fa && rm gencode.vM15.transcripts.fa
RUN python3 setup.py install
WORKDIR /

#placing a preconfigured bracer.conf in ~/.bracerrc
RUN cp /bracer/docker_helper_files/docker_bracer.conf ~/.bracerrc
RUN cp /bracer/docker_helper_files/docker_bracer.conf /home/.bracerrc

ENV BRACER_CONF=/bracer/docker_helper_files/docker_bracer.conf
ENV IGDATA=/ncbi-igblast-1.21.0/

#this is a bracer container, so let's point it at a bracer wrapper that sets the silly IgBLAST environment variable thing
#this is a bracer container, so let's point it at bracer and set -e
ENTRYPOINT ["bash", "/bracer/docker_helper_files/docker_wrapper.sh"]
59 changes: 51 additions & 8 deletions README.md
@@ -38,7 +38,7 @@ The developmental version of BraCeR is tested on Ubuntu 12.04.5 LTS. The Docker
#### Software requirements
1. [Python3](https://www.python.org) - BraCeR requires Python (>=3.4.0), as one of the required tools has this as a requirement.
2. [Bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) - required for alignment of reads to synthetic BCR genomes. Bowtie1 is also required.
3. [Trinity](https://github.com/trinityrnaseq/trinityrnaseq/wiki) - required for assembly of reads into BCR contigs. BraCeR requires Trinity v2.4.0.
3. [Trinity](https://github.com/trinityrnaseq/trinityrnaseq/wiki) - required for assembly of reads into BCR contigs. BraCeR requires Trinity >v2.4.0.
4. [IgBLAST](http://www.ncbi.nlm.nih.gov/igblast/faq.html#standalone) - required for analysis of assembled contigs. (ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/).
5. [BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download) - required for determination of isotype. (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/).
6. [Kallisto](http://pachterlab.github.io/kallisto/) - software for quantification of BCR expression.
@@ -47,11 +47,13 @@ The developmental version of BraCeR is tested on Ubuntu 12.04.5 LTS. The Docker
9. [Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) - required for adapter and quality trimming (optional).

##### Software versions
BraCeR has been tested on the following versions of software dependencies: bowtie2 v2.2.8, bowtie v.1.1.2, IgBlast v.1.4.0 - v.1.7.0, BLAST v.2.2.31+, Kallisto v.0.43.0, Trinity v.2.4.0, graphwiz v.2.26.3, changeo v.0.3.7, RScript v.3.3.2, phylip (dnapars) v.3.696, Trim Galore v.0.4.4, cutadapt v.1.14, ggplot2 v.2.2.1, alakazam v.0.2.6
BraCeR 0.1 has been tested on the following versions of software dependencies: bowtie2 v2.2.8, bowtie v.1.1.2, IgBlast v.1.4.0 - v.1.7.0, BLAST v.2.2.31+, Kallisto v.0.43.0, Trinity v.2.4.0, graphwiz v.2.26.3, changeo v.0.3.7, RScript v.3.3.2, phylip (dnapars) v.3.696, Trim Galore v.0.4.4, cutadapt v.1.14, ggplot2 v.2.2.1, alakazam v.0.2.6

BraCeR 0.2 has been tested on the following versions of software dependencies: bowtie2 v2.5.1, IgBlast v1.21.0, BLAST v2.14.0, Kallisto v0.48.0, Trinity v2.15.1, graphviz v2.42.2, changeo v1.3.0, RScript v4.3.1, phylip v3.697, Trim Galore v0.6.10, cutadapt v4.4, ggplot2 v3.4.2, alakazam v1.2.1


##### Installing IgBlast
You should also ensure to set the `$IGDATA` environment variable to point to the location of the IgBlast executable. For example run `export IGDATA=/<path_to_igblast>/igblast/1.4.0/bin`.
You should also ensure to set the `$IGDATA` environment variable to point to the location of the IgBlast `internal_data` parent folder. For example run `export IGDATA=/<path_to_igblast>/igblast/1.4.0/bin` or with the latest version of IgBlast `export IGDATA=/<path_to_igblast>/ncbi-igblast-1.21.0/`
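
As a quick sanity check, the sketch below verifies that a directory contains an `internal_data` subfolder before it is used as `$IGDATA`. It is not part of BraCeR; the function name `check_igdata` is hypothetical, and the only assumption it encodes is the one stated above, that `$IGDATA` must be the parent folder of `internal_data`.

```shell
# Hypothetical helper (not shipped with BraCeR or IgBlast).
# Usage: check_igdata [dir]   (falls back to $IGDATA when no argument is given)
check_igdata() {
    dir="${1:-${IGDATA:-}}"
    if [ -z "$dir" ]; then
        echo "IGDATA is not set" >&2
        return 1
    fi
    # IGDATA must be the folder that directly contains internal_data
    if [ ! -d "$dir/internal_data" ]; then
        echo "no internal_data folder under $dir" >&2
        return 1
    fi
    echo "IGDATA looks valid: $dir"
}
```

For example, `check_igdata /<path_to_igblast>/ncbi-igblast-1.21.0 && export IGDATA=/<path_to_igblast>/ncbi-igblast-1.21.0/` only exports the variable when the folder layout matches.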

#### R packages
The following R packages are required if BraCeR is run with `--infer_lineage`.
@@ -132,11 +134,11 @@ Trinity needs to know the maximum memory available to it for the Jellyfish compo

Location of the transcriptome fasta file to which the specific BCR sequences will be appended from each cell. This must be a plain-text fasta file so decompress it if necessary. Transcriptome files for human or mice may be downloaded with the following code:

mkdir GRCh38 && cd GRCh38 && wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/gencode.v27.transcripts.fa.gz && \
gunzip gencode.v27.transcripts.fa.gz && python3 /path/to/bracer/docker_helper_files/gencode_parse.py gencode.v27.transcripts.fa && rm gencode.v27.transcripts.fa
mkdir GRCh38 && cd GRCh38 && wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/gencode.v43.transcripts.fa.gz && \
gunzip gencode.v43.transcripts.fa.gz && python3 /bracer/docker_helper_files/gencode_parse.py gencode.v43.transcripts.fa && rm gencode.v43.transcripts.fa

mkdir GRCm38 && cd GRCm38 && wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M15/gencode.vM15.transcripts.fa.gz && \
gunzip gencode.vM15.transcripts.fa.gz && python3 /bracer/docker_helper_files/gencode_parse.py gencode.vM15.transcripts.fa && rm gencode.vM15.transcripts.fa
mkdir GRCm38 && cd GRCm38 && wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M32/gencode.vM32.transcripts.fa.gz && \
gunzip gencode.vM32.transcripts.fa.gz && python3 /bracer/docker_helper_files/gencode_parse.py gencode.vM32.transcripts.fa && rm gencode.vM32.transcripts.fa
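
Because the transcriptome must be a plain-text fasta file, a small check can catch a file that is still compressed. The following is only a sketch: `fasta_is_plain_text` is a hypothetical helper, and it relies on `gzip -t`, which succeeds only when its input is valid gzip data.

```shell
# Hypothetical helper (not part of BraCeR).
# Returns 0 if the file exists and is NOT gzip-compressed,
# 1 if it is gzip-compressed, 2 if it does not exist.
fasta_is_plain_text() {
    if [ ! -f "$1" ]; then
        echo "no such file: $1" >&2
        return 2
    fi
    # gzip -t tests archive integrity; it fails on non-gzip input
    if gzip -t "$1" 2>/dev/null; then
        echo "$1 is still gzip-compressed; decompress it first" >&2
        return 1
    fi
    return 0
}
```

A call such as `fasta_is_plain_text /path/to/transcripts.fasta` can be run before pointing the configuration file at the transcriptome.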

### BraCeR directory

@@ -318,7 +320,7 @@ The following output files and subdirectories may be generated (depending on whi

## Docker image

BraCeR is also available as a standalone Docker image on [DockerHub](https://hub.docker.com/r/teichlab/bracer/), with all of its dependencies installed and configured appropriately. Running BraCeR from the image is very similar to running it from a normal installation. You can pass all the appropriate arguments to the Docker command with the usual syntax as described above. One difference is that you don't need to worry about specifying a configuration file. This is included in the container.
BraCeR 0.1 is also available as a standalone Docker image on [DockerHub](https://hub.docker.com/r/teichlab/bracer/), with all of its dependencies installed and configured appropriately. Running BraCeR from the image is very similar to running it from a normal installation. You can pass all the appropriate arguments to the Docker command with the usual syntax as described above. One difference is that you don't need to worry about specifying a configuration file. This is included in the container.

To run the BraCeR Docker image, run the following command from within a directory that contains your input data:

@@ -335,3 +337,44 @@ For example, if you wanted to run the test analysis, you should clone this GitHu
If you wish to use `bracer build`, you will need to specify `--resource_dir /scratch`, as otherwise the resulting resources will be saved in the default location of the container and subsequently get forgotten about when the build analysis completes, making them unuseable for any actual analyses you may want to perform. This will make the Docker container save the resulting resources in the volume you created, and you can use them for assemble/summarise by running the Dockerised BraCeR from the same directory as the one you used for the build and specifying `--resource_dir /scratch`.

You may need to explicitly tell Docker to increase the memory that it can use. Instructions for [Windows](https://docs.docker.com/docker-for-windows/#advanced) and [Mac](https://docs.docker.com/docker-for-mac/#advanced). Something like 6 or 8 GB is likely to be ok.

To build the BraCeR 0.2 docker image, run the following command:

docker build . -t bracer

## Singularity

If you want to convert the Docker image to a Singularity image and run it, you can do so as follows. For BraCeR 0.1 you need to specify the location of the config file within the Singularity container, because Docker and Singularity handle users and $HOME expansion differently:

singularity pull bracer.sif docker://teichlab/bracer
singularity run \
--bind $PWD \
--pwd $PWD \
--containall \
--cleanenv \
./bracer.sif test -c /bracer/docker_helper_files/docker_bracer.conf -o test_data

For BraCeR 0.2, you will need to first build the image yourself.

docker build . -t bracer
singularity build bracer_0.2.sif docker-daemon://bracer:latest
singularity run \
--bind $PWD \
--pwd $PWD \
--containall \
--cleanenv \
./bracer_0.2.sif

If you want to run the test data all the way through to the lineage pdf, then you'll first need to copy the cell2 and cell3 test data out of the image as follows.

mkdir test_data
singularity build --sandbox bracer_singularity bracer_0.2.sif
cp -r bracer_singularity/bracer/test_data/results/ test_data/
singularity run \
--bind $PWD \
--pwd $PWD \
--containall \
--cleanenv \
./bracer_0.2.sif test -o test_data --infer_lineage
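
The `singularity run` invocations above all share the same four flags. As a convenience, the sketch below prints the full command for a given image and set of BraCeR arguments so that it can be inspected before running. The function name `bracer_singularity_cmd` is hypothetical and not part of BraCeR; it only prints the command, and it does not quote arguments that contain spaces.

```shell
# Hypothetical helper: print the singularity command used in this README
# for a given image file and BraCeR arguments (nothing is executed).
bracer_singularity_cmd() {
    sif="$1"
    shift
    # Same flags as the README examples, bound to the current directory
    printf 'singularity run --bind %s --pwd %s --containall --cleanenv %s' \
        "$PWD" "$PWD" "$sif"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}
```

For example, `bracer_singularity_cmd ./bracer_0.2.sif test -o test_data --infer_lineage` prints the last command shown above with `$PWD` expanded.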

