
1. Overview of lsaBGC-Pan

Rauf Salamzade edited this page Aug 13, 2024 · 7 revisions


The 12 steps of lsaBGC-Pan:

PART 1

  • Step 1: Assess inputs provided
  • Step 1a: If genomes are provided, perform gene calling with p(y)rodigal and annotate BGCs with GECCO.
  • Step 1b: If antiSMASH results are provided, extract genes from the full-genome GenBank files. If GECCO is also requested, it is run as well, and overlapping GECCO and antiSMASH BGC regions are consolidated by keeping the larger region.
  • Step 2: Run OrthoFinder/Panaroo for orthology inference.
  • Step 3: Create a species tree (phylogeny) from (near-)core ortholog groups.
  • Step 4: Infer populations at multiple "core AAI" cutoffs [no checkpoint, always rerun].
  • Step 5: Run lsaBGC-Cluster.py to determine evolutionary GCFs (by default in testing mode unless --auto-cluster is specified) [no checkpoint, always rerun].
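The consolidation rule in Step 1b can be sketched as interval merging. This is a minimal, hypothetical sketch, not lsaBGC-Pan's code: it assumes each BGC region is a record of scaffold, start, end, and source with inclusive coordinates, treats a chain of overlapping regions on the same scaffold as one group, and keeps the longest region in each group. The `Region` class and `consolidate` function are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Region:
    """A BGC region (hypothetical record; inclusive coordinates assumed)."""
    scaffold: str
    start: int
    end: int
    source: str  # e.g. "antiSMASH" or "GECCO"

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def consolidate(regions: list[Region]) -> list[Region]:
    """Group overlapping regions per scaffold and keep the longest of each group."""
    kept: list[Region] = []
    ordered = sorted(regions, key=lambda r: (r.scaffold, r.start, r.end))
    for _, scaffold_regions in groupby(ordered, key=lambda r: r.scaffold):
        group: list[Region] = []
        group_end = 0
        for region in scaffold_regions:
            if group and region.start <= group_end:
                # Overlaps the current group: extend it.
                group.append(region)
                group_end = max(group_end, region.end)
            else:
                # No overlap: close the previous group and start a new one.
                if group:
                    kept.append(max(group, key=lambda r: r.length))
                group = [region]
                group_end = region.end
        if group:
            kept.append(max(group, key=lambda r: r.length))
    return kept


# Toy input for illustration only.
regions = [
    Region("scaffold_1", 100, 5000, "antiSMASH"),
    Region("scaffold_1", 4000, 9000, "GECCO"),
    Region("scaffold_2", 1, 2000, "GECCO"),
]
consolidated = consolidate(regions)
for region in consolidated:
    print(region.scaffold, region.start, region.end, region.source)
# scaffold_1 4000 9000 GECCO
# scaffold_2 1 2000 GECCO
```

Here the GECCO region on scaffold_1 (5,001 bp) wins over the overlapping antiSMASH region (4,901 bp), while the non-overlapping region on scaffold_2 is kept unchanged.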

BREAK (optional but recommended; can be skipped by issuing --no-break)

  • Step 6a: Manually examine which parameters make the most sense for evolutionary clustering of GCFs, then restart the workflow with the adapted gene cluster clustering parameters.
  • Step 6b: Manually assess how population designations are structured along the species tree at different core AAI cutoffs and, if desired, adjust them.
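For Step 6b, it can help to see how population groupings split as the core AAI cutoff rises. The sketch below is a hypothetical illustration only: it assumes populations are the single-linkage groups (connected components) of genomes whose pairwise core AAI meets the cutoff, which may differ from how lsaBGC-Pan actually infers populations in Step 4. The function name, genome names, and AAI values are toy inputs, not results.

```python
from __future__ import annotations


def populations_at_cutoff(
    genomes: list[str],
    pairwise_aai: dict[tuple[str, str], float],
    cutoff: float,
) -> list[list[str]]:
    """Single-linkage grouping: genomes joined by any pair with AAI >= cutoff."""
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), aai in pairwise_aai.items():
        if aai >= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups: dict[str, list[str]] = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Toy input (hypothetical values; core AAI expressed as a percentage).
genomes = ["G1", "G2", "G3", "G4"]
pairwise_aai = {
    ("G1", "G2"): 99.5, ("G1", "G3"): 98.2, ("G2", "G3"): 98.4,
    ("G1", "G4"): 95.8, ("G2", "G4"): 95.9, ("G3", "G4"): 96.0,
}
by_cutoff = {c: populations_at_cutoff(genomes, pairwise_aai, c) for c in (95.0, 97.0, 99.0)}
for cutoff, pops in by_cutoff.items():
    print(cutoff, pops)
# 95.0 [['G1', 'G2', 'G3', 'G4']]
# 97.0 [['G1', 'G2', 'G3'], ['G4']]
# 99.0 [['G1', 'G2'], ['G3'], ['G4']]
```

Comparing such partitions against the species tree is the kind of check Step 6b describes: a cutoff whose groups line up with well-supported clades is usually easier to justify than one that splits a clade.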

PART 2

  • Step 7: Run zol per GCF in parallel, twice: once with and once without the -ace option.
  • Step 8: Run GSeeF, lsaBGC-See, and lsaBGC-ComprehenSeeIve per GCF in parallel.
  • Step 9: Run lsaBGC-MIBiGMapper in parallel.
  • Step 10: Run lsaBGC-Reconcile.
  • Step 11: Run lsaBGC-Sociate.
  • Step 12: Create consolidated report of zol, lsaBGC-MIBiGMapper, lsaBGC-Reconcile, and lsaBGC-Sociate results [no checkpoint, always rerun].
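Steps marked "[no checkpoint, always rerun]" matter when the workflow is restarted, for example after the break. Below is a minimal sketch of that behavior, assuming completed steps are recorded as marker files in a checkpoint directory. The `Step` class, the `run_steps` function, and the `.done` file naming are hypothetical and do not describe lsaBGC-Pan's internals.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Step:
    """A workflow step (hypothetical structure, for illustration only)."""
    name: str
    run: Callable[[], None]
    checkpoint: bool = True  # False mirrors "[no checkpoint, always rerun]"


def run_steps(steps: list[Step], checkpoint_dir: Path) -> list[str]:
    """Run steps in order, skipping checkpointed steps that already finished.

    Returns the names of the steps that actually executed in this call.
    """
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    executed = []
    for step in steps:
        marker = checkpoint_dir / f"{step.name}.done"  # assumed marker naming
        if step.checkpoint and marker.exists():
            continue  # finished in an earlier run, so skip it
        step.run()
        executed.append(step.name)
        if step.checkpoint:
            marker.touch()  # record completion only for checkpointed steps
    return executed


def noop() -> None:
    """Stand-in for real work such as orthology inference."""


steps = [
    Step("step2_orthology", noop),
    Step("step4_populations", noop, checkpoint=False),
]

with tempfile.TemporaryDirectory() as tmp:
    first_run = run_steps(steps, Path(tmp))   # initial run
    second_run = run_steps(steps, Path(tmp))  # restart, e.g. after the break

print(first_run)   # ['step2_orthology', 'step4_populations']
print(second_run)  # ['step4_populations']
```

On the restart, the checkpointed step is skipped while the always-rerun step executes again, which is why adjusted clustering or population parameters take effect without redoing orthology inference.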