Shostina edited this page Aug 20, 2023 · 10 revisions

Intro

Pre-built binaries

Pre-built Linux and Mac binaries are available from the Releases page.

Prerequisites (for building from the source code)

  • Qt 6
  • CMake
  • C++17-compliant compiler

Building from source

mkdir build
cd build
cmake ..
make

New features:

Rotate contig

To rotate a contig, hold down the right mouse button and drag one of the ends of the contig.

Hi-C links visualization

You can visualize Hi-C links between different contigs in the de Bruijn graph. Hi-C links are shown as dotted lines connecting the midpoints of contigs.

Load Hi-C metadata

To load Hi-C metadata in Bandage, choose the "Load Hi-C data" item in the "File" menu. A file with Hi-C data can be loaded only after a de Bruijn graph has been loaded.

image

Below is an example of a Hi-C metadata TXT file:

v1	v2	hic_w
1268598	831795	6516
1072702	831795	5454
1268598	524477	1548

The Hi-C metadata file should be in TXT format with three tab-separated (\t) columns. The first and second columns contain the IDs of the contigs connected by Hi-C links. The third column contains the weight (the number of Hi-C links). The first row should contain the column names.
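
The format above can be read with a few lines of Python. This is a minimal sketch, not code from Bandage: the function name read_hic_links is hypothetical, and it assumes the weight is an integer, as in the example rows.

```python
import csv
import io


def read_hic_links(handle):
    """Parse a tab-separated Hi-C metadata file into (contig1, contig2, weight) tuples."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)  # the first row holds the column names
    if header is None or len(header) != 3:
        raise ValueError(f"expected a header row with 3 columns, got {header!r}")
    links = []
    for line_no, row in enumerate(reader, start=2):
        if not row:  # tolerate blank lines
            continue
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        v1, v2, weight = row
        links.append((v1, v2, int(weight)))  # assumption: weights are integers
    return links


# The example file from the text, held in memory for the demonstration.
sample = (
    "v1\tv2\thic_w\n"
    "1268598\t831795\t6516\n"
    "1072702\t831795\t5454\n"
    "1268598\t524477\t1548\n"
)
links = read_hic_links(io.StringIO(sample))
print(links)
```

Contig IDs are kept as strings so that they can be compared directly with node names from the graph file.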

Draw graph with Hi-C links

To draw the de Bruijn graph with Hi-C links, click the "Draw graph" button after loading the Hi-C metadata.

image

Filter Hi-C links

After changing the Hi-C link filter settings, click the "Draw graph" button to redraw the graph.

image

  • You can choose a minimum Hi-C weight; Hi-C links with a weight less than the minimum will not be shown.

  • You can choose a minimum length of a contig's nucleic acid sequence. Hi-C links connecting shorter contigs will not be shown.

  • You can choose a filter for Hi-C link inclusion:

    • All edges - All Hi-C links will be shown

    image

    • All edges link groups - All Hi-C links connecting contigs from different connected components of the graph will be shown

    image

    • One edge links groups - Only one Hi-C link between different connected components of the graph will be shown

    image
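
The filters above can be sketched in Python as follows. This is an illustration, not Bandage's implementation. It assumes that a link is hidden when either of its contigs is shorter than the minimum length, and that the "One edge links groups" mode keeps the heaviest link per pair of components (the text does not say which link is kept). The contig lengths, the component numbers, and all function and mode names are hypothetical.

```python
def filter_by_thresholds(links, contig_lengths, min_weight=0, min_length=0):
    """Keep links that reach the minimum weight and whose contigs reach the minimum length."""
    kept = []
    for v1, v2, weight in links:
        if weight < min_weight:
            continue
        # Assumption: a link is hidden if EITHER contig is too short.
        if contig_lengths[v1] < min_length or contig_lengths[v2] < min_length:
            continue
        kept.append((v1, v2, weight))
    return kept


def filter_by_inclusion(links, component_of, mode="all"):
    """Apply one of the three inclusion modes described in the list above."""
    if mode == "all":  # "All edges"
        return list(links)
    between = [l for l in links if component_of[l[0]] != component_of[l[1]]]
    if mode == "between_components":  # "All edges link groups"
        return between
    if mode == "one_per_component_pair":  # "One edge links groups"
        best = {}
        for v1, v2, weight in between:
            key = frozenset((component_of[v1], component_of[v2]))
            # Assumption: keep the heaviest link for each pair of components.
            if key not in best or weight > best[key][2]:
                best[key] = (v1, v2, weight)
        return list(best.values())
    raise ValueError(f"unknown mode: {mode}")


links = [("1268598", "831795", 6516), ("1072702", "831795", 5454), ("1268598", "524477", 1548)]
lengths = {"1268598": 50000, "831795": 1200, "1072702": 30000, "524477": 8000}  # made up
component_of = {"1268598": 0, "831795": 1, "1072702": 0, "524477": 0}  # made up

heavy = filter_by_thresholds(links, lengths, min_weight=2000)
long_only = filter_by_thresholds(links, lengths, min_length=5000)
between = filter_by_inclusion(links, component_of, "between_components")
single = filter_by_inclusion(links, component_of, "one_per_component_pair")
print(heavy, long_only, between, single, sep="\n")
```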

Visualization of a multitude of decision trees

This feature allows you to visualize RandomForest, AdaBoost, or Gradient Boosted Decision Trees models. If you use decision tree models based on features extracted from a metagenomic dataset (for example, features extracted by MetaFX), you can visualize the forest model with BandageNG. This implementation of BandageNG also supports mapping the features used in the decision trees onto parts of contigs in the De Bruijn graph. To synchronize the forest model with the De Bruijn graph, click the "Map features to De Bruijn graph" button.

otchet_2_11

This implementation can display a note for every tree node: node ID, class, or a custom note. When a node of the forest is selected, all of its information is shown: ID, split rule description, class (class of the feature or class of the leaf), and the set of nucleotide sequences.

otchet_2_13

This implementation lets you choose one of the following colour schemes:

  1. uniform colour - all tree nodes have the same colour
  2. class colour - tree nodes are coloured according to their class. Nodes of the same class have the same colour, and nodes of different classes have different colours.
  3. BLAST hits (solid) - can be used only after mapping features to the De Bruijn graph. Tree nodes and the parts of contigs matched to those nodes are coloured randomly.
  4. BLAST hits (class colours) - can be used only after mapping features to the De Bruijn graph. Tree nodes and the parts of contigs matched to those nodes are coloured according to the tree node class.

otchet_2_4

To visualize an ML model, load a TXT file in the special format described below. You can write the TXT file yourself or use the build_model_for_bandage.py script to generate it. To run the script, set the parameters --source-dir, --res-file, and --model-file.

Parameter | Description
--model-file | Joblib dump of the trained model. Supported classes: sklearn.ensemble.RandomForestClassifier, sklearn.ensemble.AdaBoostClassifier, sklearn.ensemble.GradientBoostingClassifier
--res-file | Name of the file where the output is saved
--source-dir | Source directory that contains, for every class in the ML model, a FASTA file ${sourceDir}/contigs_/${fClass}/kmers_fasta/component.fasta with the nucleotide sequences of the features. Sequence names in the FASTA file should match the feature names or IDs

build_model_for_bandage.py --model-file '.\RandomForest.joblib' --res-file '.\RandomForestModel.txt' --source-dir '.\source-dir'
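
The naming requirement for the FASTA files can be checked before running the script. The sketch below is not part of build_model_for_bandage.py. It is a minimal standard-library FASTA reader, and the feature names (f_1, f_2, f_3) and sequences are made up for illustration.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            # The record name is the first word after ">".
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def features_without_sequences(feature_names, handle):
    """Return the feature names that have no record with the same name in the FASTA stream."""
    found = {name for name, _ in read_fasta(handle)}
    return sorted(set(feature_names) - found)


# Hypothetical FASTA content standing in for one component.fasta file.
fasta_text = ">f_1\nGGAGCG\nTTAC\n>f_2\nACGT\n"
records = list(read_fasta(io.StringIO(fasta_text)))
missing = features_without_sequences(["f_1", "f_2", "f_3"], io.StringIO(fasta_text))
print(records)
print(missing)
```

A feature reported as missing here would have no sequence to map onto the graph.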

Input file format

All trees should be described in one TXT file. All tree nodes in the forest should have unique IDs.

There are four types of rows in the input file:

Row format | Description | Example
N ${Node ID} ${Left child ID} ${Right child ID} | Describes a tree node: the node ID and, if needed, the IDs of its children | N 1 2 3
F ${Node ID} ${Feature ID} ${Threshold} | Describes a feature: the node ID, the feature name or ID, and the threshold value (float) used to split the tree node into children | F 1 f_1 0.25
C ${Node ID} ${Class} | Describes the node class: the node ID and the class of the leaf (for leaves) or the class of the feature (for inner nodes) | C 1 NonIBD
S ${Feature ID} ${Sequence} | Describes one nucleotide sequence of a feature: the feature ID or name and one nucleotide sequence | S 1 GGAGCG

All fields should be separated by a tab character ("\t"). Every row should start with one of the special symbols: N, F, C, or S.
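
As an illustration of the row format, the following sketch writes a description of one small tree by hand. The tree itself is hypothetical: node 1 splits on feature f_1, nodes 2 and 3 are leaves, and the class name IBD is made up next to the NonIBD class from the examples. It also assumes that a leaf is written as an N row with only the node ID.

```python
# Each tuple is one row; the first element is the row type (N, F, C or S).
rows = [
    ("N", "1", "2", "3"),        # inner node 1 with children 2 and 3
    ("F", "1", "f_1", "0.25"),   # node 1 splits on feature f_1 at threshold 0.25
    ("C", "1", "NonIBD"),        # class of the feature used in node 1
    ("N", "2"),                  # leaf (assumption: no child IDs for leaves)
    ("C", "2", "NonIBD"),
    ("N", "3"),
    ("C", "3", "IBD"),           # hypothetical second class
    ("S", "f_1", "GGAGCG"),      # one nucleotide sequence for feature f_1
]


def format_rows(rows):
    """Join the fields of every row with tabs and end each row with a newline."""
    return "".join("\t".join(fields) + "\n" for fields in rows)


model_text = format_rows(rows)
print(model_text, end="")
```

The resulting text can be saved with a .txt extension and then checked against the rules of the format before it is loaded.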

Some rules:

  • Every tree node should have exactly one row with prefix "N" and exactly one row with prefix "C" in the input file.
  • Every inner tree node should have exactly one row with prefix "F" in the input file.
  • Every feature can have one or more rows with prefix "S" in the input file. If a feature has no nucleotide sequences, it cannot be matched to contigs in the De Bruijn graph.
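
The rules above can be checked mechanically before loading a file. The following is a minimal sketch rather than the parser used by BandageNG. The function name check_model_rows is hypothetical, and it assumes that a leaf is written as an N row without child IDs.

```python
from collections import Counter


def check_model_rows(text):
    """Return (problems, unmapped_features) for a tab-separated model description."""
    n_rows, f_rows, c_rows = Counter(), Counter(), Counter()
    children = {}              # node ID -> list of child IDs (empty for leaves)
    node_feature = {}          # node ID -> feature ID used for the split
    features_with_seq = set()  # feature IDs that have at least one S row
    problems = []

    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split("\t")
        kind, args = fields[0], fields[1:]
        if kind == "N" and 1 <= len(args) <= 3:
            n_rows[args[0]] += 1
            children[args[0]] = args[1:]
        elif kind == "F" and len(args) == 3:
            f_rows[args[0]] += 1
            node_feature[args[0]] = args[1]
            try:
                float(args[2])  # the threshold must be a float
            except ValueError:
                problems.append(f"line {line_no}: threshold {args[2]!r} is not a number")
        elif kind == "C" and len(args) == 2:
            c_rows[args[0]] += 1
        elif kind == "S" and len(args) == 2:
            features_with_seq.add(args[0])
        else:
            problems.append(f"line {line_no}: malformed row {line!r}")

    for node in sorted(set(n_rows) | set(f_rows) | set(c_rows)):
        if n_rows[node] != 1:
            problems.append(f"node {node}: expected one N row, found {n_rows[node]}")
        if c_rows[node] != 1:
            problems.append(f"node {node}: expected one C row, found {c_rows[node]}")
        if children.get(node) and f_rows[node] != 1:
            problems.append(f"node {node}: expected one F row, found {f_rows[node]}")

    # Features without S rows are allowed but cannot be matched to contigs.
    unmapped = sorted(set(node_feature.values()) - features_with_seq)
    return problems, unmapped


# Hypothetical inputs: a valid three-node tree and a copy with two rows missing.
good_text = (
    "N\t1\t2\t3\nF\t1\tf_1\t0.25\nC\t1\tNonIBD\n"
    "N\t2\nC\t2\tNonIBD\n"
    "N\t3\nC\t3\tIBD\n"
    "S\tf_1\tGGAGCG\n"
)
bad_text = "N\t1\t2\t3\nF\t1\tf_1\t0.25\nC\t1\tNonIBD\nN\t2\nC\t2\tNonIBD\nN\t3\n"

print(check_model_rows(good_text))
print(check_model_rows(bad_text))
```

Features listed in the second return value are reported separately from rule violations because the format permits them.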
