CONVEXMPDIST
------------

This, CONVEXMPDIST, is the second piece of software associated with the 
paper "A note on convex characters, Fibonacci numbers and exponential-time 
algorithms", by Kelk et al.

It is an exponential-time algorithm for computing the maximum parsimony
distance between two binary phylogenetic trees on n taxa. The trees
should be provided in Newick format.

The first piece of code from the article (CONVEXCOUNT), which counts, 
samples and lists convex characters where each state occurs on at 
least k taxa, can be found at:

	http://skelk.sdf-eu.org/convexcount

RUNNING THE CODE
----------------

You will need Java to be installed on your computer (these days it is 
automatically installed on most platforms).

To use the already-compiled .jar file, simply type (for example):

java -jar ConvexMPDist.jar < taxa20.txt

The output should look similar to the file taxa20.out.

Note that, although maximum parsimony distance is oblivious to the 
presence or absence of a root, the Newick that you feed to 
the software should be 'rooted'. That is, at the top level of the 
parantheses there should be 2 children, not 3. So, if your tree has the
form (A,B,C) this should be recoded for example as (A,(B,C)). This does 
not sacrifice generality or change the maximum parsimony distance.

COMPILING THE CODE
------------------

To compile the code from scratch, you'll need to have both the Java 
Development Kit and the JavaCC parser software (used for parsing
the input trees) installed on your system.

Once installed type,

javacc ConvexMPDist.jj
javac ConvexMPDist.java

Then you can run the software by typing,

java ConvexMPDist < taxa20.txt

OTHER ISSUES
------------

* If you specify --quiet on the command line the only output of the 
program will be the optimum. For example,

java -jar ConvexMPDist.jar --quiet < taxa20.txt

should return simply the number 10.

* The running time of the code is \Theta( \phi^{n} . poly(n) )
where \phi is the golden ratio (roughly 1.618) and n is the number of taxa 
in the trees. As in the paper the poly(n) term is O(n^2). 

* In the article the convex characters are generated recursively by 
following the dynamic programming downwards from the root. Here we 
generate them in a non-recursive fashion to optimize the polynomial part 
of the running time. In particular, we observe that when Fitch's 
algorithm is applied to a convex character (where each state occurs at 
least twice) this generates extensions with a specific form. We loop 
through a superset of these extensions. The superset naturally has size 
O(3^n). Not all these extensions generate convex characters in which 
each state appears at least twice, and some generate the same character. 
We have special transition rules to ensure that these invalid characters 
are excluded and no time is wasted looping through exponential-size 
subsets of invalid characters. This guarantees the \Theta( \phi^{n} . 
poly(n) ) running time.

* The code has been used in the experimental part of a second paper,
to appear in Theoretical Computer Science (2016), entitled
"Reduction rules for the maximum parsimony distance on phylogenetic 
trees" by Kelk et al. The arXiv version of the article is available at

	http://arxiv.org/abs/1512.07459

For more information don't hesitate to contact me at my Maastricht 
University email address.

Kind regards,

Steven Kelk, 10th August 2015 and 23rd July 2016