A perfect phylogeny is a rooted tree representing the evolutionary history of a set of n objects. The objects bijectively label the leaves of the tree and there are m binary characters, each labeling exactly one edge of the tree. For each leaf, the set of characters
that appear on the unique root-to-leaf path is the set of characters taking value 1 at the object labeling the leaf. While every perfect phylogeny naturally corresponds to an n x m binary matrix having objects as rows and characters as columns, the perfect phylogeny
problem asks the opposite question: Does a given binary matrix correspond to a perfect phylogeny? The problem is well known to be polynomially solvable: the yes instances are characterized by the absence of pairs of conflicting columns, where two columns of a binary
matrix are said to be in conflict if there exist three rows on which the two columns read 11, 10, and 01, respectively. The perfect phylogeny problem and various generalizations of it -many of which were proved intractable- have been extensively studied in computational biology.
We will discuss two generalizations of the perfect phylogeny problem, first considered by Hajirasouliha and Raphael in 2014 and motivated by applications in cancer genomics. Both problems are optimizations problems and can be defined as follows:
- The minimum conflict-free row split (MCRS) problem: split each row of a given binary matrix into a bitwise OR of a set of rows so that the resulting matrix has no pairs of conflicting columns (that is, it corresponds to a perfect phylogeny) and has the minimum number of rows among all matrices with this property.
- The minimum distinct conflict-free row split problem: the variant of the problem in which the task is to minimize the number of distinct rows of the resulting matrix.
The talk will focus on various graph theoretic and computational aspects of the two problems, including:
- formulations of the two problems in terms of branchings in a derived directed acyclic graph,
- a related characterization of cocomparability graphs,
- inapproximability results and approximation algorithms for the two problems,
- two polynomial time heuristic algorithms for the MCRS problem: an algorithm based on coloring cocomparability graphs, and an improvement of it that finds an optimal solution in a reduced search space via a new min-max result in weighted acyclic digraphs generalizing Dilworth's theorem.
The results presented in the talk were obtained in collaborations with Ademir Hujdurović, Edin Husić, Ura Kačar, Bernard Ries, Romeo Rizzi, and Alexandru I. Tomescu.