Home
Pre-built Linux and Mac binaries are available from the Releases page.
- Qt 6
- CMake
- C++17-compliant compiler
mkdir build
cd build
cmake ..
make
To rotate a contig, hold down the right mouse button and drag one of the contig's ends.
You can visualize Hi-C links between different contigs in the de Bruijn graph. Hi-C links are shown as dotted lines connecting the midpoints of contigs.
To load Hi-C metadata in Bandage, choose the "Load Hi-C data" item in the "File" menu. A Hi-C data file can be loaded only after a de Bruijn graph has been loaded.
Below is an example of a Hi-C metadata TXT file:
v1 v2 hic_w
1268598 831795 6516
1072702 831795 5454
1268598 524477 1548
The Hi-C metadata file should be a TXT file with three tab-separated (\t) columns. The first and second columns contain the IDs of the contigs connected by Hi-C links. The third column contains the weight (the number of Hi-C links). The first row should contain the column names.
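Before loading a file, it can help to check that it matches the three-column layout described above. The following is a minimal Python sketch, not part of BandageNG: the function name `read_hic_links` is hypothetical, and it assumes that weights are integers because they are described as link counts.

```python
import csv
import io


def read_hic_links(handle):
    """Parse tab-separated Hi-C rows into (contig_a, contig_b, weight) tuples."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)  # the first row holds the column names
    if len(header) != 3:
        raise ValueError(f"expected 3 header columns, got {len(header)}")
    links = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        contig_a, contig_b, weight = row
        # Assumption: the weight is an integer count of Hi-C links.
        links.append((contig_a, contig_b, int(weight)))
    return links


# The example rows from this page, joined with tabs.
sample = (
    "v1\tv2\thic_w\n"
    "1268598\t831795\t6516\n"
    "1072702\t831795\t5454\n"
    "1268598\t524477\t1548\n"
)
links = read_hic_links(io.StringIO(sample))
```

For a file on disk, pass an open file object instead of the `io.StringIO` wrapper.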
To draw the de Bruijn graph with Hi-C links, click the "Draw graph" button after loading the Hi-C metadata.
After changing the Hi-C link filter settings, click the "Draw graph" button again to redraw the graph with the new filters.
- You can set a minimum Hi-C weight. Hi-C links with a weight below this minimum are not shown.
- You can set a minimum length for a contig's nucleic acid sequence. Hi-C links connecting shorter contigs are not shown.
- You can choose which Hi-C links are included:
  - All edges - all Hi-C links are shown
  - All edges link groups - all Hi-C links connecting contigs from different connected components of the graph are shown
  - One edge links groups - only one Hi-C link between each pair of different connected components of the graph is shown
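The filters above can be sketched in Python to make their effect concrete. This is an illustration only, not BandageNG's implementation. All function names, the contig lengths, and the component assignments are hypothetical. Two details are assumptions, because this page does not specify them: a link is dropped if either of its contigs is shorter than the minimum length, and the "one edge" mode keeps the heaviest link between two components.

```python
def filter_hic_links(links, contig_lengths, min_weight=0, min_length=0):
    """Keep links that pass the minimum weight and minimum contig length."""
    kept = []
    for a, b, weight in links:
        if weight < min_weight:
            continue
        # Assumption: a link is hidden if either contig is too short.
        if contig_lengths[a] < min_length or contig_lengths[b] < min_length:
            continue
        kept.append((a, b, weight))
    return kept


def between_components(links, component_of):
    """Keep only links whose contigs lie in different connected components."""
    return [(a, b, w) for a, b, w in links if component_of[a] != component_of[b]]


def one_per_component_pair(links, component_of):
    """Keep a single link per pair of components (assumed: the heaviest)."""
    best = {}
    for a, b, weight in between_components(links, component_of):
        key = frozenset((component_of[a], component_of[b]))
        if key not in best or weight > best[key][2]:
            best[key] = (a, b, weight)
    return list(best.values())


links = [
    ("1268598", "831795", 6516),
    ("1072702", "831795", 5454),
    ("1268598", "524477", 1548),
]
# Hypothetical contig lengths and connected-component labels.
lengths = {"1268598": 5000, "831795": 12000, "1072702": 300, "524477": 8000}
component_of = {"1268598": 0, "1072702": 0, "524477": 0, "831795": 1}

heavy = filter_hic_links(links, lengths, min_weight=2000)
long_only = filter_hic_links(links, lengths, min_length=1000)
cross = between_components(links, component_of)
single = one_per_component_pair(links, component_of)
```

With this sample data, the weight filter drops the third link, the length filter drops the link touching the 300 bp contig, and the component filters ignore the link whose contigs both sit in component 0.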
This feature lets you visualize RandomForest, AdaBoost, or Gradient Boosted Decision Trees models. If you use decision tree models based on features extracted from a metagenomic dataset (for example, features extracted by MetaFX), you can visualize the forest model with BandageNG. This implementation of BandageNG also supports mapping the features used in decision trees onto parts of contigs in the de Bruijn graph. To synchronize the forest model with the de Bruijn graph, click the "Map features to De Bruijn graph" button.
This implementation can display notes for every tree node: node ID, class, or custom notes. When you select a node in the forest, all of its information is shown: ID, split rule description, class (of the feature or of the leaf), and the set of nucleotide sequences.
This implementation lets you choose one of the following colour schemes:
- uniform colour - all tree nodes have the same colour
- class colour - tree nodes are coloured according to their class. Nodes of the same class share a colour, and nodes of different classes have different colours.
- BLAST hits (solid) - can be used only after mapping features to the de Bruijn graph. Tree nodes and the parts of contigs matched to those nodes are drawn in random colours.
- BLAST hits (class colours) - can be used only after mapping features to the de Bruijn graph. Tree nodes and the parts of contigs matched to those nodes are coloured according to the tree node class.
To visualize an ML model, load a TXT file in the special format described below. You can write the TXT file yourself or use the build_model_for_bandage.py script to generate it. To run the script, set the parameters --source-dir, --res-file, and --model-file.
Parameter | Description |
---|---|
--model-file | Joblib dump of the trained model. Supported classes: sklearn.ensemble.RandomForestClassifier, sklearn.ensemble.AdaBoostClassifier, sklearn.ensemble.GradientBoostingClassifier |
--res-file | Name of the file to save the output to |
--source-dir | Source directory containing, for every class in the ML model, a FASTA file ${sourceDir}/contigs_/${fClass}/kmers_fasta/component.fasta with the nucleotide sequences of the features. Sequence names in the FASTA file should match the feature names or IDs |
build_model_for_bandage.py --model-file '.\RandomForest.joblib' --res-file '.\RandomForestModel.txt' --source-dir '.\source-dir'
All trees should be described in one TXT file. All tree nodes in the forest should have unique IDs. Rows in input file could be in three
There are four types of rows in an input file:
Row format | Description | Example |
---|---|---|
N ${Node ID} ${Left child ID} ${Right child ID} | Describes a tree node: the node ID and, if needed, the IDs of its children | N 1 2 3 |
F ${Node ID} ${Feature ID} ${Threshold} | Describes a feature: the node ID, the feature name or ID, and the threshold value (float) used to split the tree node into children | F 1 f_1 0.25 |
C ${Node ID} ${Class} | Describes the node class: the node ID and the class of the leaf (for leaves) or the class of the feature (for inner nodes) | C 1 NonIBD |
S ${Feature ID} ${Sequence} | Describes one nucleotide sequence of a feature: the feature ID or name and one nucleotide sequence | S 1 GGAGCG |
All fields should be separated by tabs ("\t"). Every row should start with one of the special symbols: N, F, C or S.
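For a hand-written input file, the four row types can be generated from a small in-memory description of the forest. The sketch below is illustrative and is not the build_model_for_bandage.py script. The function name `forest_rows`, the dictionary layout, and the class name "IBD" are hypothetical. It also assumes that a leaf's N row carries only the node ID and that rows may be grouped per node.

```python
def forest_rows(nodes, sequences):
    """Render node and sequence records as tab-separated N/F/C/S rows."""
    rows = []
    for node in nodes:
        # N row: node ID, plus child IDs for inner nodes
        # (assumption: leaves list no children).
        rows.append(["N", node["id"], *node.get("children", ())])
        if "feature" in node:
            # F row: only inner nodes carry a split feature and threshold.
            rows.append(["F", node["id"], node["feature"], node["threshold"]])
        rows.append(["C", node["id"], node["cls"]])
    for feature, seqs in sequences.items():
        for seq in seqs:
            rows.append(["S", feature, seq])
    return "".join("\t".join(str(v) for v in row) + "\n" for row in rows)


# A single tree with one split: node 1 is the root, nodes 2 and 3 are leaves.
nodes = [
    {"id": 1, "children": (2, 3), "feature": "f_1", "threshold": 0.25, "cls": "NonIBD"},
    {"id": 2, "cls": "NonIBD"},
    {"id": 3, "cls": "IBD"},
]
sequences = {"f_1": ["GGAGCG"]}

text = forest_rows(nodes, sequences)
```

Writing `text` to a file gives eight rows, starting with the root's N row and ending with the S row for `f_1`.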
Some rules:
- Every tree node should have exactly one row with prefix "N" and exactly one row with prefix "C" in the input file.
- Every inner tree node should have exactly one row with prefix "F" in the input file.
- Every feature can have one or more rows with prefix "S" in the input file. If a feature has no nucleotide sequences, it cannot be matched to a contig in the de Bruijn graph.
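The rules above can be checked before loading a file. The following Python sketch is a hypothetical helper, not part of BandageNG. It assumes that an inner node is one whose N row lists children, and it reports features without S rows separately because the rules allow them but they cannot be mapped.

```python
from collections import Counter

# Minimum number of tab-separated fields per row type.
MIN_FIELDS = {"N": 2, "F": 4, "C": 3, "S": 3}


def check_forest_rows(text):
    """Return (problems, unmappable_features) for a forest TXT file body."""
    n_count, c_count, f_count = Counter(), Counter(), Counter()
    inner_nodes, used_features, features_with_seq = set(), set(), set()
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split("\t")
        kind = fields[0]
        if kind not in MIN_FIELDS:
            problems.append(f"line {line_no}: unknown row prefix {kind!r}")
            continue
        if len(fields) < MIN_FIELDS[kind]:
            problems.append(f"line {line_no}: too few fields for {kind} row")
            continue
        if kind == "N":
            n_count[fields[1]] += 1
            if len(fields) > 2:  # assumption: listed children mark an inner node
                inner_nodes.add(fields[1])
        elif kind == "F":
            f_count[fields[1]] += 1
            used_features.add(fields[2])
        elif kind == "C":
            c_count[fields[1]] += 1
        else:  # "S"
            features_with_seq.add(fields[1])
    for node in sorted(set(n_count) | set(c_count) | set(f_count)):
        if n_count[node] != 1:
            problems.append(f"node {node}: expected exactly one N row, found {n_count[node]}")
        if c_count[node] != 1:
            problems.append(f"node {node}: expected exactly one C row, found {c_count[node]}")
        if node in inner_nodes and f_count[node] != 1:
            problems.append(f"node {node}: expected exactly one F row, found {f_count[node]}")
    # Features without sequences are allowed but cannot be matched to contigs.
    unmappable = sorted(used_features - features_with_seq)
    return problems, unmappable


good = (
    "N\t1\t2\t3\nF\t1\tf_1\t0.25\nC\t1\tNonIBD\n"
    "N\t2\nC\t2\tNonIBD\nN\t3\nC\t3\tIBD\nS\tf_1\tGGAGCG\n"
)
# Same forest, but with a duplicate C row for node 1 and no S row for f_1.
bad = (
    "N\t1\t2\t3\nF\t1\tf_1\t0.25\nC\t1\tNonIBD\nC\t1\tIBD\n"
    "N\t2\nC\t2\tNonIBD\nN\t3\nC\t3\tIBD\n"
)

good_result = check_forest_rows(good)
bad_result = check_forest_rows(bad)
```

Here `good_result` contains no problems, while `bad_result` reports the duplicate C row for node 1 and lists `f_1` as a feature that cannot be mapped.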