AgRenSeq is a pipeline to identify candidate resistance (R) genes in plants directly from a diversity panel. The diversity panel needs to be sequenced (R gene enrichment sequencing, RenSeq) and phenotyped. Phenotype scores need to be converted to AgRenSeq scores, which assign positive values to resistance and negative values to susceptibility. An intermediate phenotype should have an AgRenSeq score close to zero.
For RenSeq you will need a bait library that targets R genes in your plant species. A bait library for Aegilops tauschii can be found here. We recommend Arbor Biosciences for synthesis of baits. They also offer the enrichment service.
More about this method can be found in the manuscript http://biorxiv.org/cgi/content/short/248146v1. The final paper has been published here and includes an improved method to generate association scores.
Make sure you have Java Runtime Environment 1.6 or higher installed. Download from http://java.com
We recommend quality-trimming your sequences before k-mer counting. Check read quality with FastQC, then use a tool such as Trimmomatic (http://www.usadellab.org/cms/?page=trimmomatic) for read preprocessing.
Download and install jellyfish from http://www.genome.umd.edu/jellyfish.html
At one point we need assemblies of RenSeq data. We recommend using CLC assembly cell (https://www.qiagenbioinformatics.com/products/clc-assembly-cell/), but free software such as MaSuRCA (http://www.genome.umd.edu/masurca.html) works as well.
Download NLR-Parser from github.org/MutantHunter
To run it, you will also need the meme.xml containing the definitions of NLR-associated motifs.
For visualization, we use R. Download from https://cran.r-project.org/
zcat accession1_R?.fastq.gz | jellyfish count -C -m 51 -s 3G -o accession1.jf /dev/fd/0
jellyfish dump -L 10 -ct accession1.jf > accession1.dump.txt
This is a simple tab-separated file with the accession names in the first column and the paths to the Jellyfish dumps in the second column.
accession1 path/to/accession1.dump.txt
accession2 path/to/accession2.dump.txt
...
accessionN path/to/accessionN.dump.txt
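For a large panel, a file like this can be generated rather than typed by hand. The following is a minimal Python sketch, assuming every dump is named `<accession>.dump.txt` and all dumps sit in one directory. The function name `write_accession_table` and that naming scheme are illustrative assumptions, not part of AgRenSeq.

```python
import tempfile
from pathlib import Path


def write_accession_table(dump_dir, out_file, suffix=".dump.txt"):
    """Write one 'accession<TAB>path' line per dump file found in dump_dir.

    Assumes (hypothetically) that each dump is named <accession><suffix>.
    Returns the accession names in the order they were written.
    """
    names = []
    with open(out_file, "w") as out:
        for path in sorted(Path(dump_dir).glob("*" + suffix)):
            name = path.name[: -len(suffix)]  # strip the suffix to get the accession name
            out.write(f"{name}\t{path}\n")
            names.append(name)
    return names


if __name__ == "__main__":
    # Self-contained demonstration with empty placeholder dump files.
    with tempfile.TemporaryDirectory() as tmp:
        for acc in ("accession1", "accession2"):
            (Path(tmp) / f"{acc}.dump.txt").touch()
        table = Path(tmp) / "accessions.txt"
        write_accession_table(tmp, table)
        print(table.read_text())
```

Writing real tab characters matters here, because the file is described as tab-separated.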
java -jar AgRenSeq_CreatePresenceMatrix.jar -i accessions.txt -o AgRenSeq_k51_presencematrix.txt -t 3 -n 10
Parameter | Argument | Description |
---|---|---|
-i | accessions.txt | Mandatory. The path to the configuration file created in step 3. |
-o | outputMatrix.txt | Mandatory. The path to the output file that will contain the matrix. |
-n | integer | Default 10. The minimum k-mer count for a k-mer to be considered present. |
-t | integer | Default 3. A k-mer that is present in fewer accessions than this value, or in all accessions but this value, will not be printed. |
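
The effect of `-n` and `-t` can be illustrated with a small Python sketch. This is not the code of `AgRenSeq_CreatePresenceMatrix.jar`; it is one reading of the parameter descriptions above. In particular, the exact boundary behaviour of `-t` (whether a k-mer absent from exactly `t` accessions is kept) is an assumption, and the function names are hypothetical.

```python
def is_present(count, min_count=10):
    """-n: a k-mer counts as present in an accession when its count reaches min_count."""
    return count >= min_count


def keep_kmer(presence, threshold=3):
    """-t (assumed reading): drop k-mers present in fewer than `threshold`
    accessions, or absent from fewer than `threshold` accessions.

    `presence` is a list of booleans, one per accession.
    """
    n_present = sum(presence)
    n_total = len(presence)
    return threshold <= n_present <= n_total - threshold


if __name__ == "__main__":
    # Toy k-mer counts across eight accessions (made-up numbers).
    counts = {
        "kmer_a": [12, 0, 15, 30, 0, 11, 0, 0],   # present in 4 of 8
        "kmer_b": [50, 50, 50, 50, 50, 50, 50, 50],  # present everywhere
        "kmer_c": [10, 9, 0, 0, 0, 0, 0, 0],      # present in 1 of 8
    }
    for kmer, row in counts.items():
        presence = [is_present(c) for c in row]
        print(kmer, "kept" if keep_kmer(presence) else "dropped")
```

K-mers found in nearly all or nearly no accessions carry little information about the phenotype, which is presumably why they are left out of the matrix.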
This is a tab-separated file with accession names in the first column. The following columns contain AgRenSeq scores. The recorded score is the average of all scores in one line. For AgRenSeq, scores must be negative for susceptible and positive for resistant accessions.
This is an example of converting Stakman's infection types (IT, for wheat stem rust) to AgRenSeq scores.
Stakman's IT | AgRenSeq score |
---|---|
0 | 2 |
; | 1.67 |
1- | 1.33 |
1 | 1 |
1+ | 0.67 |
2- | 0.33 |
2 | 0 |
2+ | -0.33 |
3- | -0.67 |
3 | -1 |
3+ | -1.33 |
4 | -2 |
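
The conversion in the table above is easy to script. The sketch below is a minimal Python illustration: the dictionary copies the table, while the function names `mean_score` and `phenotype_line` are hypothetical helpers and not part of AgRenSeq. `mean_score` mirrors the statement that the recorded score is the average of all scores in one line.

```python
# Conversion table copied from the infection type table above.
IT_TO_SCORE = {
    "0": 2.0, ";": 1.67, "1-": 1.33, "1": 1.0, "1+": 0.67, "2-": 0.33,
    "2": 0.0, "2+": -0.33, "3-": -0.67, "3": -1.0, "3+": -1.33, "4": -2.0,
}


def mean_score(infection_types):
    """Average the AgRenSeq scores of one accession's infection types."""
    scores = [IT_TO_SCORE[it] for it in infection_types]
    return sum(scores) / len(scores)


def phenotype_line(accession, infection_types):
    """Format one tab-separated phenotype line: name first, then one score per column."""
    scores = [IT_TO_SCORE[it] for it in infection_types]
    return "\t".join([accession] + [f"{s:g}" for s in scores])


if __name__ == "__main__":
    # An accession scored twice, once as ';' and once as '1-'.
    print(phenotype_line("accession1", [";", "1-"]))
    print(round(mean_score([";", "1-"]), 2))
```

An unknown infection type raises a `KeyError` here, which is deliberate: a silently skipped score would shift the average.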
Pick an accession where you expect R genes to be (according to your phenotype). Run a de novo assembly on the RenSeq data.
We have good experience with CLC assembly cell.
This selects the contigs in the de novo assembly that are associated with NLRs and thereby removes off-target contigs.
java -jar NLR-Parser.jar -t <number of threads> -y <path/to/meme/bin/mast> -x <path/to/meme.xml> -i <sub-sequences.fasta> -o <output.nlr.txt>
This step sums the AgRenSeq scores of all accessions in which a k-mer is present and assigns that sum to the k-mer as its association score. In a second step, the association scores of all k-mers within each contig of the de novo assembly are written to a tab-separated file. For each contig, one line is written per unique association score, together with the number of k-mers that were assigned that score. Column 1 is the contig identifier, column 2 is a running number that increases with each contig, column 3 is the association score, and column 4 is the number of k-mers in that contig with that score.
java -jar AgRenSeq_RunAssociation.jar -i presenceMatrix -p phenotype -o AgRenSeqResult.txt
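The scoring described above can be sketched in a few lines of Python. This is an illustration of the idea, not the implementation in `AgRenSeq_RunAssociation.jar`. The in-memory data structures, the function names, the rounding of sums to two decimals, and the canonical k-mer lookup (chosen because `jellyfish count -C` stores canonical k-mers) are all assumptions made for the example.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def association_scores(presence, phenotype):
    """Sum the AgRenSeq scores of all accessions in which each k-mer is present.

    presence:  dict mapping canonical k-mer -> set of accession names
    phenotype: dict mapping accession name -> AgRenSeq score
    """
    return {
        kmer: round(sum(phenotype[acc] for acc in accessions), 2)
        for kmer, accessions in presence.items()
    }


def tally_contigs(contigs, scores, k):
    """Build rows of (contig id, running number, score, number of k-mers)."""
    rows = []
    for number, (contig_id, seq) in enumerate(contigs.items(), start=1):
        kmers = (canonical(seq[i:i + k]) for i in range(len(seq) - k + 1))
        tally = Counter(scores[km] for km in kmers if km in scores)
        for score, n_kmers in sorted(tally.items()):
            rows.append((contig_id, number, score, n_kmers))
    return rows


if __name__ == "__main__":
    # Toy data with k = 3; real runs use much longer k-mers such as k = 51.
    phenotype = {"A": 2.0, "B": 1.0, "C": -2.0}
    presence = {"ACG": {"A", "B"}, "CGA": {"A", "B"}, "GAA": {"C"}}
    contigs = {"c1": "ACGAA", "c2": "TTCG"}
    scores = association_scores(presence, phenotype)
    for row in tally_contigs(contigs, scores, k=3):
        print("\t".join(str(field) for field in row))
```

A contig carrying a candidate R gene should show many k-mers with a high positive score, which is what the plot in the next step makes visible.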
Plot the result from step 9 using R. A simple R script will look similar to this:
# Columns: contig ID, running contig number, association score, k-mer count.
result <- read.table("AgRenSeqResult.txt", sep = "\t")
x <- result$V2  # running contig number
y <- result$V3  # association score
z <- result$V4  # number of k-mers with that score
# Draw every point small, then overplot larger dots for higher k-mer counts.
plot(x, y, pch = 20, cex = 0.5, main = "TTKSK (Ug99)", ylab = "score", xlab = "NLR contigs BW_01077")
points(x[z > 25], y[z > 25], pch = 20, cex = 1)
points(x[z > 50], y[z > 50], pch = 20, cex = 1.5)
points(x[z > 70], y[z > 70], pch = 20, cex = 2)
points(x[z > 100], y[z > 100], pch = 20, cex = 2.5)
points(x[z > 125], y[z > 125], pch = 20, cex = 3)