Single B cell analysis of BND2 population

Writen by Kristen L. Wells

All scripts necessary to replicate the analysis from the manuscript Identification of an anergic BND cell-derived activate B cell population (BND2) in young-onset type 1 diabetes.

Data processing steps

Starting from raw fastq files

If you choose to start from the raw fastq files, you can run these through an existing snakemake pipeline. To run this pipeline, first download the raw fastq files from GEO (link) and then update the config file below. The snakemake pipline is built to work on an lsf server.

Otherwise, you can update the config file with your own structure.

To use the snakemake pipeline

Download and install miniconda3: For Linux

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh bash Miniconda3-latest-Linux-x86_64.sh

Install Snakemake:

conda install snakemake -c bioconda -c conda-forge

Update the config file (config.yaml)

RAW_DATA: Location of the raw data

SAMPLES: the list of samples you want to test. This is the name that will be in the output files. The order must be identical to the order of RNA_SAMPLES, ADT_SAMPLES, and VDJ_SAMPLES

AGGR_SAMPLES: Which samples should be aggregated together at the end of the pipeline. These should be samples that were split between multiple 10x runs.

RNA_SAMPLES: The name of the RNA samples. Not the full fastq name, but the name that is the same for all samples. Likely ends in "GEX"

ADT_SAMPLES: optional The name of the samples from CITE-seq or hashtagging. If CITE-seq and hashtagging files are separate, include them both separated by a comma. Put samples in the same order as their RNA counterparts. If CITE-seq or hashtagging were not performed, leave this blank.

VDJ_SAMPLES: optional The name of the samples from VDJ-seq. Put samples in the same order as their RNA counterparts. If VDJ-sequencing was not peformed, leave this blank.

RESULTS: Path to the output directory

GENOME: Path to the cellranger genome reference

ADT_REF: optional Path to the ADT-reference. Should be a comma separated file with the columns described in the 10x tutorial: id, name, read, pattern, sequence, and feature_type. The feature_type will be Antibody Capture. The name will be the name in the final output matrix. Leave this blank if CITE-seq or hashtagging were not performed.

VDJ_REF: Path to the cellranger VDJ reference. If VDJ sequencing were not performed, leave this blank.

MAX_JOBS: The maximum number of jobs that can be submitted by cell ranger at a time

LSF_TEMPLATE: Path to an LSF template. One is included in this git repo.

CHEMISTRY: optional Arguments to the --chemstiry flag in cellranger count. If left blank, chemistry will be auto. Only use this if the pipeline failed because of a failure to detect the chemistry. Can be filled in for only some samples.

Note: we sequenced the same capture two times. To use the config file as is for the raw data, create the following directory structure

- full_raw_data
|--- round_1
|---|--- JH_GEX_S34_L001_R1_001.fastq.gz # JH_round1_GEX_S34_L001_R1_001.fastq.gz
|---|--- JH_GEX_S34_L001_R2_001.fastq.gz # JH_round1_GEX_S34_L001_R2_001.fastq.gz
|---|--- JH_ADT_S7_L003_R1_001.fastq.gz # JH_round1_ADT_S7_L003_R1_001.fastq.gz
|---|--- JH_ADT_S7_L003_R2_001.fastq.gz # JH_round1_ADT_S7_L003_R2_001.fastq.gz
|--- round_2
|---|--- JH_GEX_S6_L002_R1_001.fastq.gz # JH_round2_GEX_S6_L002_R1_001.fastq.gz
|---|--- JH_GEX_S6_L002_R2_001.fastq.gz # JH_round2_GEX_S6_L002_R2_001.fastq.gz
|---|--- JH_ADT_S5_L002_R1_001.fastq.gz # JH_round2_ADT_S5_L002_R1_001.fastq.gz
|---|--- JH_ADT_S5_L002_R2_001.fastq.gz # JH_round2_ADT_S5_L002_R2_001.fastq.gz

Update snakecharmer.sh to your specific cluster specs.

change the -q argument to the queue you want to use

submit the job using bsub < snakecharmer.sh
I highly recommend looking at the csv files that are generated and passed to cell ranger to ensure that the correct fastq files have been detected for each sample.

Starting from the output of cellranger

If you start from the files output from cellranger either by running the above pipeline or by downloading the files from geo (here) you can run just the r scripts.

All the scripts rely on the scAnalysisR package on github

library(devtools)
install_github("CUAnschutzBDC/scAnalysisR")

To directly run the scripts, you must first put the filtered outpts (barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz) into a folder called results/healthy_bcells_all/outs/filtered_feature_bc_matrix)

Once this is done, you can run the scripts in the healthy_bcells_all directory in numeric order. And then run the scripts in the healthy_bcells_all_subset directory also in numeric order. This will produce the final UMAP from the figures.

If you only want the final umap, you can just download the metadata, subset to only cells that are "TRUE" for the column "in_final_obj". The UMAP and cluster data are contained with the metadata.

The metadata is also contained in this repository under geo/final_metadata.csv

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
files		files
geo		geo
notes		notes
renv		renv
results		results
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Smith_210825_b_cell_analysis.Rproj		Smith_210825_b_cell_analysis.Rproj
Snakefile		Snakefile
config.yaml		config.yaml
lsf.template		lsf.template
plotting_function_for_zach.R		plotting_function_for_zach.R
renv.lock		renv.lock
run_vdj.sh		run_vdj.sh
snakecharmer.sh		snakecharmer.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Single B cell analysis of BND2 population

Data processing steps

Starting from raw fastq files

To use the snakemake pipeline

Starting from the output of cellranger

About

Releases

Packages

Languages

License

CUAnschutzBDC/Smith_210825_b_cell_analysis

Folders and files

Latest commit

History

Repository files navigation

Single B cell analysis of BND2 population

Data processing steps

Starting from raw fastq files

To use the snakemake pipeline

Starting from the output of cellranger

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages