Polishing after Trycycler

Ryan Wick edited this page May 26, 2020 · 37 revisions

Now that you've finished running Trycycler, you should have a cluster directory for each replicon in your genome, each containing a 7_final_consensus.fasta file.

You could at this point combine their consensus sequences into a single FASTA file like this:

cat trycycler/cluster_*/7_final_consensus.fasta > trycycler/consensus.fasta
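As a quick sanity check on the combined file, the sketch below compares the number of cluster directories with the number of FASTA records. It assumes that each 7_final_consensus.fasta holds exactly one sequence and that any discarded clusters have been deleted or renamed so they no longer match cluster_*. The function name check_consensus is hypothetical and not part of Trycycler.

```shell
# Hypothetical helper (not part of Trycycler): checks that the combined
# FASTA has one record per cluster directory.
check_consensus() {
    dir="$1"
    clusters=0
    # Count directories named cluster_* directly under $dir.
    for c in "$dir"/cluster_*/; do
        if [ -d "$c" ]; then
            clusters=$((clusters + 1))
        fi
    done
    # Count FASTA header lines; "|| true" keeps a zero count from
    # aborting scripts that run with "set -e".
    records=$(grep -c '^>' "$dir/consensus.fasta" || true)
    if [ "$clusters" -eq "$records" ]; then
        echo "OK: $records sequences from $clusters clusters"
    else
        echo "Mismatch: $records sequences from $clusters clusters" >&2
        return 1
    fi
}
```

You could then run it as: check_consensus trycycler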

Medaka

Assuming your long reads are from an Oxford Nanopore sequencer, you can run Medaka on Trycycler's consensus sequences to further increase their accuracy. Medaka takes basecalled FASTQ reads as input (as opposed to raw-signal FAST5 reads), which makes it easy to run on Trycycler's clusters with their partitioned reads. When I last checked, Medaka gave the highest-identity results for Nanopore-only assemblies.

The commands could look something like this:

for cluster in trycycler/cluster_*; do
    medaka_consensus -i "$cluster"/4_reads.fastq -d "$cluster"/7_final_consensus.fasta -o "$cluster"/medaka -m r941_min_high_g360
    mv "$cluster"/medaka/consensus.fasta "$cluster"/8_medaka.fasta
    rm -r "$cluster"/medaka "$cluster"/*.fai "$cluster"/*.mmi
done

Note that you need to change the model parameter (the -m option, r941_min_high_g360 in the example above) to whatever is most appropriate for your basecalling.
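Because the loop above reads two files from every cluster directory, it can help to confirm they are all present before starting a long Medaka run. The following is a minimal sketch of such a pre-flight check. The function name check_medaka_inputs is hypothetical, and the only file names it relies on are the ones used in the commands above.

```shell
# Hypothetical pre-flight check: reports every cluster that lacks a
# non-empty 4_reads.fastq or 7_final_consensus.fasta.
check_medaka_inputs() {
    dir="$1"
    missing=0
    # If no cluster_* directory exists, the unexpanded pattern is
    # reported as missing, so the check still fails.
    for cluster in "$dir"/cluster_*; do
        for f in 4_reads.fastq 7_final_consensus.fasta; do
            if [ ! -s "$cluster/$f" ]; then
                echo "missing or empty: $cluster/$f" >&2
                missing=$((missing + 1))
            fi
        done
    done
    # Succeed only when nothing was missing.
    [ "$missing" -eq 0 ]
}
```

For example: check_medaka_inputs trycycler && echo "ready for Medaka"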

You can then combine the Medaka-polished sequences into a single FASTA file:

cat trycycler/cluster_*/8_medaka.fasta > trycycler/consensus.fasta
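One optional way to see what polishing did is to compare sequence lengths before and after Medaka for each cluster. The sketch below prints the cluster name, the length of 7_final_consensus.fasta, the length of 8_medaka.fasta and their difference, separated by tabs. Both function names are hypothetical. What counts as a suspicious difference is left to your judgement, on the assumption that polishing normally makes only small changes.

```shell
# Hypothetical helper: total number of sequence characters in a FASTA
# file (header lines and whitespace are ignored).
fasta_length() {
    awk '!/^>/ { gsub(/[[:space:]]/, ""); total += length($0) }
         END { print total + 0 }' "$1"
}

# Print "cluster<TAB>before<TAB>after<TAB>difference" for each cluster.
compare_polish_lengths() {
    dir="$1"
    for cluster in "$dir"/cluster_*; do
        before=$(fasta_length "$cluster/7_final_consensus.fasta")
        after=$(fasta_length "$cluster/8_medaka.fasta")
        printf '%s\t%s\t%s\t%s\n' "$(basename "$cluster")" \
            "$before" "$after" "$((after - before))"
    done
}
```

For example: compare_polish_lengths trycycler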

Pilon

If you also have Illumina reads, you can use Pilon to polish the assembly further.

