allfiles.tar.gz is an image of this directory and all its subdirectories, which you can unpack on your own computer if you wish. ----------- *** GADGETS ----------- The two folders, * unbounded_symmetry_gadget * 2state_symmetry_gadget contain trees, in Newick format, for the symmetry-breaking gadgets used in the two NP-hardness proofs. The unbounded_symmetry_gadget contains files corresponding to the t_a and t_b trees (6 taxa) and t_A and t_B trees (12 taxa) used in the proof that d_MP is NP-hard. The 2state_symmetry_gadget contains files corresponding to the t_a and t_b trees (8 taxa), t_A and t_B trees (16 taxa), and finally t_AA and t_BB trees (32 taxa) used in the proof that d^{2}_MP is NP-hard. Within these folders you will also find ILPs that have been generated to compute the parsimony distance for these trees. As explained below, the ILP solves in one direction only i.e. max_{f is a character}( parsimony score of second tree under f - parsimony score of first tree under f ). That's why there are two .txt and two .lp files for each pair of trees. To obain the true parsimony distance you should take the maximum of the two ILP runs. ----------- *** THE ILP ----------- The code for generating the ILP can be found in the folder 'ilp'. The code is written in JavaCC, which allows Java code to be extended with parser functionality (in this case: to read in and parse the input trees). You will need to install this. To compile, $ javacc BinaryILP.jj Then compile the .java files that are generated: $ javac BinaryILP.java The code is now ready to run. The Java reads in two trees from the standard input and generates an ILP (in LP / CPLEX format) that can be solved by any ILP solver capable of reading this format. $ java BinaryILP < trees.txt > trees.lp The ILP generated only solves 'half' the problem: it computes max_{characters c}( parsimony score of the second tree in the file under c - parsimony score of the first tree in the file under c). For example, let exampletrees.txt be the following file: (((((2,3),4),5),6),1); ((((2,6),(3,4)),5),1); $ java BinaryILP < exampletrees.txt > exampletrees.lp Solving exampletrees.lp with your favourite ILP solver should give optimum 1. For trees with few taxa a basic solver like GLPK will be OK but for slightly larger trees more powerful solvers such as the open-source SCIP or commercial solvers such as CPLEX or Gurobi are preferable. (The latter two are freely available to academics). If you reverse the order of the trees in the file: ((((2,6),(3,4)),5),1); (((((2,3),4),5),6),1); and re-build the .lp file, you should get optimum 2. So d_MP is max(1,2)=2 and we observe that there is asymmetry between the two trees. --------------------------------- *** BOUNDING THE NUMBER OF STATES --------------------------------- The default mode for the Java is to construct an ILP that does not bound the number of states available to the characters it considers. In other words it computes the normal, 'unbounded' parsimony distance. If, however, you wish to bound the number of states available, to 2 say, you can do this as follows: $ java BinaryILP -states 2 < exampletrees.txt > exampletrees.lp As you will notice this speeds the ILP considerably -- because there are far fewer binary variables in the ILP. --------------------- *** SCRIPTS FOR CPLEX --------------------- For those of you who are running under Linux and have CPLEX installed, the following two simple scripts are available: * computeMP_CPLEX.sh * computeMP_2state_CPLEX.sh These combine all the above steps in a single script. The first is for normal (i.e. unbounded) parsimony distance, the second is for d^{2}_MP. The script takes care of generating and running the ILP twice and (lightly) processing the output. Similar simple scripts can, without too much difficulty, be written for different solvers and for different operating systems. Java, JavaCC and the aforementioned ILP solvers are available on all 3 major operating systems and thse are the core ingredients of this software. Any questions please do not hesitate to contact me/us. Steven Kelk Mareike Fischer December 2014 ---- * Technical note, 13th December 2014. Around the time of uploading the first pre-print we discovered that, for the normal, unbounded parsimony distance, it is safe to bound the number of states in the considered characters to \lfloor n/2 \rfloor, where n is the number of taxa in the trees. This has already been incorporated into the ILP. To use a different upper bound (e.g. taxa-2) change the following line in the file BinaryILP.jj: maxStates = (int) Math.floor( taxa / 2.0 );