Eukfinder

Overview

Eukfinder is a modular pipeline for classifying whole-genome shotgun (WGS) metagenomic data and recovering potential eukaryotic sequences. It supports both Illumina short reads (Eukfinder_short) and assembled contigs or long-read data (Eukfinder_long).

Key Features:

Automated classification of potential eukaryotic sequences.
Flexible design for short-read, long-read, or assembly data.
Optional binning workflow for refining nuclear and mitochondrial genomes.
Customizable databases for different environments (e.g., gut, ocean, soil).

Eukfinder has two modes of operation, chosen according to the input data:

(a) Illumina short-read workflow (Eukfinder_short): Short reads are first classified into five taxonomic categories (Archaeal, Bacterial, Viral, Eukaryotic, and Unknown) using Centrifuge (DB1) and PLAST (DB2). Reads classified as 'Eukaryotic' or 'Unknown' are assembled into contigs using metaSPAdes. These contigs are then reclassified with Centrifuge and PLAST. Contigs assigned as 'Eukaryotic' or 'Unknown' are combined and treated as potential eukaryotic sequences, which can then be passed to downstream binning and genome recovery.
(b) Metagenome-assembled contigs or long-read workflow (Eukfinder_long): For metagenome-assembled contigs, or for long reads generated on Nanopore or PacBio platforms, the workflow performs a single round of classification to select 'Eukaryotic' and 'Unknown' sequences. The selected sequences are combined and treated as potential eukaryotic sequences, ready for binning and downstream analysis.
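
The selection logic shared by both workflows can be illustrated with a minimal Python sketch. This is not Eukfinder's code or command-line interface. The function names (select_candidates, short_read_workflow, long_read_workflow) and the classify and assemble callables are hypothetical stand-ins for the Centrifuge/PLAST classification and metaSPAdes assembly steps, and the sketch assumes each sequence has already been given a single category label.

```python
from typing import Callable, Dict, List

# The five categories named in the workflow description.
CATEGORIES = ("Archaeal", "Bacterial", "Viral", "Eukaryotic", "Unknown")
# Categories carried forward as potential eukaryotic sequences.
KEEP = {"Eukaryotic", "Unknown"}

# Hypothetical type: maps a list of sequence IDs to {sequence ID: category}.
Classifier = Callable[[List[str]], Dict[str, str]]


def select_candidates(labels: Dict[str, str]) -> List[str]:
    """Return IDs labelled Eukaryotic or Unknown, preserving input order."""
    unexpected = set(labels.values()) - set(CATEGORIES)
    if unexpected:
        raise ValueError(f"unexpected categories: {sorted(unexpected)}")
    return [seq_id for seq_id, label in labels.items() if label in KEEP]


def short_read_workflow(
    reads: List[str],
    classify: Classifier,
    assemble: Callable[[List[str]], List[str]],
) -> List[str]:
    """Two rounds: classify reads, assemble the kept reads, reclassify contigs."""
    kept_reads = select_candidates(classify(reads))
    contigs = assemble(kept_reads)  # stand-in for the assembly step
    return select_candidates(classify(contigs))


def long_read_workflow(sequences: List[str], classify: Classifier) -> List[str]:
    """Single round: classify contigs or long reads and keep the candidates."""
    return select_candidates(classify(sequences))


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    toy = {
        "r1": "Bacterial", "r2": "Eukaryotic", "r3": "Unknown",
        "c1": "Eukaryotic", "c2": "Archaeal",
    }
    toy_classify = lambda ids: {i: toy[i] for i in ids}
    toy_assemble = lambda kept: ["c1", "c2"]
    print(short_read_workflow(["r1", "r2", "r3"], toy_classify, toy_assemble))
    print(long_read_workflow(["c1", "c2"], toy_classify))
```

The only difference between the two modes in this sketch is the number of classification rounds: the short-read path filters reads, assembles them, and filters again, while the long-read path filters once.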

Schematic representation of the Eukfinder pipeline: see Eukfinder_workflow.jpg in this repository.

The Eukfinder documentation is found on the wiki site.

All feedback is appreciated! Please open an issue on this repository if you would like to ask a question or make a comment.

Publication/Citation

Zhao, D., Salas-Leiva, D.E., Williams, S.K., Dunn, K.A. and Roger, A.J., 2023. Eukfinder: a pipeline to retrieve microbial eukaryote genomes from metagenomic sequencing data. bioRxiv, pp.2023-12.

Contact

Dandan Zhao (d.zhao@dal.ca)
Dayana Salas-Leiva (ds2000@cam.ac.uk)

