CONVEXMPDIST ------------ This, CONVEXMPDIST, is the second piece of software associated with the paper "A note on convex characters, Fibonacci numbers and exponential-time algorithms", by Kelk et al. It is an exponential-time algorithm for computing the maximum parsimony distance between two binary phylogenetic trees on n taxa. The trees should be provided in Newick format. The first piece of code from the article (CONVEXCOUNT), which counts, samples and lists convex characters where each state occurs on at least k taxa, can be found at: http://skelk.sdf-eu.org/convexcount RUNNING THE CODE ---------------- You will need Java to be installed on your computer (these days it is automatically installed on most platforms). To use the already-compiled .jar file, simply type (for example): java -jar ConvexMPDist.jar < taxa20.txt The output should look similar to the file taxa20.out. Note that, although maximum parsimony distance is oblivious to the presence or absence of a root, the Newick that you feed to the software should be 'rooted'. That is, at the top level of the parantheses there should be 2 children, not 3. So, if your tree has the form (A,B,C) this should be recoded for example as (A,(B,C)). This does not sacrifice generality or change the maximum parsimony distance. COMPILING THE CODE ------------------ To compile the code from scratch, you'll need to have both the Java Development Kit and the JavaCC parser software (used for parsing the input trees) installed on your system. Once installed type, javacc ConvexMPDist.jj javac ConvexMPDist.java Then you can run the software by typing, java ConvexMPDist < taxa20.txt OTHER ISSUES ------------ * If you specify --quiet on the command line the only output of the program will be the optimum. For example, java -jar ConvexMPDist.jar --quiet < taxa20.txt should return simply the number 10. * The running time of the code is \Theta( \phi^{n} . poly(n) ) where \phi is the golden ratio (roughly 1.618) and n is the number of taxa in the trees. As in the paper the poly(n) term is O(n^2). * In the article the convex characters are generated recursively by following the dynamic programming downwards from the root. Here we generate them in a non-recursive fashion to optimize the polynomial part of the running time. In particular, we observe that when Fitch's algorithm is applied to a convex character (where each state occurs at least twice) this generates extensions with a specific form. We loop through a superset of these extensions. The superset naturally has size O(3^n). Not all these extensions generate convex characters in which each state appears at least twice, and some generate the same character. We have special transition rules to ensure that these invalid characters are excluded and no time is wasted looping through exponential-size subsets of invalid characters. This guarantees the \Theta( \phi^{n} . poly(n) ) running time. * The code has been used in the experimental part of a second paper, to appear in Theoretical Computer Science (2016), entitled "Reduction rules for the maximum parsimony distance on phylogenetic trees" by Kelk et al. The arXiv version of the article is available at http://arxiv.org/abs/1512.07459 For more information don't hesitate to contact me at my Maastricht University email address. Kind regards, Steven Kelk, 10th August 2015 and 23rd July 2016