Mixed Perfect Phylogeny v0.1

For questions or problems with this code, contact tomescu@cs.helsinki.fi, or open an issue on github.

This repository contains implementations of the two algorithms from:

Ademir Hujdurović, Urša Kačar, Martin Milanič, Bernard Ries, and Alexandru I. Tomescu, Complexity and algorithms for finding a perfect phylogeny from mixed tumor samples, 2016, submitted.

A shorter version of this paper appeared in the Proceedings of WABI 2015:

Ademir Hujdurović, Urša Kačar, Martin Milanič, Bernard Ries, and Alexandru I. Tomescu, Finding a Perfect Phylogeny from Mixed Tumor Samples. WABI 2015, LNCS 9289, pp. 80-92, extended version available at http://arxiv.org/abs/1506.07675.

Both algorithms address the MINIMUM CONFLICT-FREE ROW SPLIT problem, originally proposed by I. Hajirasouliha, B. Raphael, Reconstructing Mutational History in Multiply Sampled Tumors Using Perfect Phylogeny Mixtures. WABI 2014: 354-367 doi:10.1007/978-3-662-44753-6_27. A visual description of this problem is available in these slides. Our first algorithm is exact, runs in polynomial time, but only solves a particular class of inputs. The second algorithm is heuristic, i.e. runs on all inputs in polynomial time, but may not produce an optimal solution.

1. Input format

The input matrix must be in .csv format, with ';' as value separator. Such files can be written/edited with e.g. Excel. The first column must contain the row (i.e., sample) names, and the first row must contain the column (i.e., mutation location) names. For example, the following table

	c1	c2	c3	c4	c5	c3'
r1	1	0	0	0	0	0
r2	1	1	1	1	0	1
r3	0	1	0	1	0	0
r4	0	1	1	0	1	1
r5	0	1	0	0	0	0

is encoded as:

;c1;c2;c3;c4;c5;c3'
r1;1;0;0;0;0;0
r2;1;1;1;1;0;1
r3;0;1;0;1;0;0
r4;0;1;1;0;1;1
r5;0;1;0;0;0;0

2. Output format

The output matrix has the same .csv format. If a row labeled r is split into k rows in the output matrix, the labels of the resulting rows will be r_1, r_2, ..., r_k. For example:

;c1;c2;c3;c3';c4;c5
r1;1;0;0;0;0;0
r2_1;1;0;0;0;0;0
r2_2;0;1;1;1;0;0
r2_3;0;1;0;0;1;0
r3;0;1;0;0;1;0
r4;0;1;1;1;0;1
r5;0;1;0;0;0;0

The program also outputs the perfect phylogeny tree of this output matrix, in .dot format. This can be visualized by e.g., Graphviz. For example, the tree of this output matrix is

3. Installation

If you use Linux or Mac OS, type in the root directory:

make

This creates the executable:

mixedphylogeny

Our code requires the free Lemon library. (We provided a pre-compiled version for Linux/Mac OS in the directory lemon_binaries_linux.) If you use a different operating system, install Lemon as described in its manual, modify our makefile to point to your installation of Lemon, and run make again.

4. Running

If you want to run the first algorithm (the exact one for the polynomially solvable instances), run e.g.:

./mixedphylogeny -i example/matrix_alg1.csv -o example/matrix_alg1.out.csv

If your input matrix does not belong to the polynomially-solvable case, then the program will tell you so. In that case, you can run our second, heuristic, algorithm, by adding the parameter --heuristic, e.g.:

./mixedphylogeny -i example/matrix_alg2.csv -o example/matrix_alg2.out.csv --heuristic

Usually, you might want to ignore the columns that appear in the matrix few times (i.e., which repeat very few times as binary vectors). They might be caused to errors in the data. You may do so with the option --minsupport, e.g.:

./mixedphylogeny -i example/matrix_alg2.csv -o example/matrix_alg2.out.csv --heuristic --minsupport 2

The above command keeps only those columns that appear at least 2 times (as binary vectors) in the matrix.

5. Experimental results

We tested the heuristic algorithm on the ten binary matrices from:

M. Gerlinger et al., Genomic architecture and evolution of clear cell renal cell carcinomas defined by multiregion sequencing, Nature Genetics, 46(3), 2014, 225-232, doi:10.1038/ng.2891.

Our conflict-free row split matrices and the corresponding phylogenetic trees are available here.

Our results obtained with the option --minsupport 2 are available here.

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
example		example
lemon_binaries_linux		lemon_binaries_linux
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
makefile		makefile

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Mixed Perfect Phylogeny v0.1

1. Input format

2. Output format

3. Installation

4. Running

5. Experimental results

About

Releases

Packages

Languages

License

alexandrutomescu/MixedPerfectPhylogeny

Folders and files

Latest commit

History

Repository files navigation

Mixed Perfect Phylogeny v0.1

1. Input format

2. Output format

3. Installation

4. Running

5. Experimental results

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages