Reconstruction of phylogenetic trees is a fundamental problem
in computational biology. While excellent heuristic methods are
available for many variants of this problem, new advances in phylogeny
inference will be required if we are to be able to continue to make effective
use of the rapidly growing stores of variation data now being
gathered. In this paper, we introduce an integer linear programming formulation
to find the most parsimonious phylogenetic tree from a set of
binary variation data. The method uses a
ow-based formulation that
could use exponential numbers of variables and constraints in the worst
case. The method has, however, proved extremely efficient in practice on
datasets that are well beyond the reach of the available provably efficient
methods. The program solves several large mtDNA and Y-chromosome
instances within a few seconds, giving provably optimal results in times
competitive with fast heuristics than cannot guarantee optimality.