CONVEXCOUNT ----------- This, CONVEXCOUNT, is the first piece of software associated with the paper "A note on convex characters, Fibonacci numbers and exponential-time algorithms", by Kelk et al. For a given (unrooted) binary input tree the code counts, samples and lists convex characters where each state occurs on at least k taxa. The other piece of software associated with the paper (CONVEXMPDIST), which can be used to compute maximum parsimony distance, can be found at http://skelk.sdf-eu.org/convexmpdist/ RUNNING THE CODE ---------------- You will need Java to be installed on your computer (these days it is automatically installed on most platforms). To use the already-compiled .jar file, simply type (for example): java -jar ConvexCount.jar < tree.txt The output should look something like this: $ java -jar ConvexCount.jar < tree.txt // Taxa in tree: 9 // Only considering convex characters where each state has at least 1 taxa. // Total characters = 1597 If no command-line options are specified then the code simply counts the number of convex characters on the input tree. The input tree here has 9 taxa, so (as expected) the number of characters is equal to the ((2*9)-1)th Fibonacci number i.e. F(17) = 1597. Note that, although characters are oblivious to the presence or absence of a root, the Newick that you feed to the software should be 'rooted'. That is, at the top level of the parantheses there should be 2 children, not 3. So, if your tree has the form (A,B,C) this should be recoded for example as (A,(B,C)). This does not alter the results in any way. COMPILING THE CODE ------------------ To compile the code from scratch, you'll need to have both the Java Development Kit and the JavaCC parser software (used for parsing the input trees) installed on your system. Once installed type, javacc ConvexCount.jj javac ConvexCount.java Then you can run the software by typing, java ConvexCount < tree.txt COMMAND-LINE OPTIONS -------------------- If you add -help as a command-line option you will see the following. $ java ConvexCount -help java ConvexCount [options] < treeFile.txt ----------------------------------------- Options: -help : displays this message. -k [arg] : consider only convex characters in which each state occurs on at least [arg] taxa. If not specified, k defaults to 1. -list : list all the convex characters in which each state occurs on at least k taxa. -sample [arg] : generate u.a.r. [arg] convex characters in which each state occurs on at least k taxa. If [arg] is set to -1, an infinite number of samples are generated. -spectrum : generate the g_k spectrum for this tree. Here are a few examples. The files tree.txt and tree2.txt correspond to the two trees shown in Figure 2 in the paper. The paper claims that they have different g_3 values, so let's test this. First we analyse tree2.txt : $ java -jar ConvexCount.jar -k 3 -list < tree.txt // Taxa in tree: 9 // Only considering convex characters where each state has at least 3 taxa. // Total characters = 5 // Generating all the characters! 1: {{a,b,c},{d,e,f},{g,h,i}} 2: {{a,b,c,d,e,f},{g,h,i}} 3: {{d,e,f,g,h,i},{a,b,c}} 4: {{a,b,c,g,h,i},{d,e,f}} 5: {{a,b,c,d,e,f,g,h,i}} Next we analyse tree2.txt: $ java -jar ConvexCount.jar -k 3 -list < tree.txt // Taxa in tree: 9 // Only considering convex characters where each state has at least 3 taxa. // Total characters = 6 // Generating all the characters! 1: {{a,b,c},{d,e,f},{g,h,i}} 2: {{a,b,c},{d,e,f,g,h,i}} 3: {{a,b,c,d},{e,f,g,h,i}} 4: {{a,b,c,d,e},{f,g,h,i}} 5: {{a,b,c,d,e,f},{g,h,i}} 6: {{a,b,c,d,e,f,g,h,i}} As claimed in the article, the first tree has g_3 = 5 and the second tree has g_3 = 6. This can also be seen by generating the spectrum of each tree, which is simply the vector of values g_i for 1 <= i <= n where n is the number of taxa. $ java -jar ConvexCount.jar -spectrum < tree.txt // Taxa in tree: 9 // Generating g_k spectrum for this tree. 1597, 21, 5, 1, 1, 1, 1, 1, 1 $ java -jar ConvexCount.jar -spectrum < tree2.txt // Taxa in tree: 9 // Generating g_k spectrum for this tree. 1597, 21, 6, 3, 1, 1, 1, 1, 1 Let's confirm that the tree from Figure 1 (figure1.txt) indeed has 8 convex characters where each state appears on at least 2 taxa. $ java -jar ConvexCount.jar -k 2 -list < figure1.txt // Taxa in tree: 7 // Only considering convex characters where each state has at least 2 taxa. // Total characters = 8 // Generating all the characters! 1: {{a,b},{c,d},{e,g,f}} 2: {{a,b},{c,d,e},{g,f}} 3: {{a,b},{c,d,e,g,f}} 4: {{a,b,c},{d,e},{g,f}} 5: {{a,b,c},{d,e,g,f}} 6: {{a,b,c,d},{e,g,f}} 7: {{a,b,c,d,e},{g,f}} 8: {{a,b,c,d,e,g,f}} Finally, let's u.a.r. sample 15 convex characters (out of a total of 233) from this tree. $ java ConvexCount -k 1 -sample 15 < figure1.txt // Taxa in tree: 7 // Only considering convex characters where each state has at least 1 taxa. // Total characters = 233 // Sampling 15 characters u.a.r. 79: {{b,d},{a},{e,f},{g},{c}} 134: {{a,d},{b},{e,f},{g},{c}} 143: {{a,d,e,g},{b},{f},{c}} 6: {{a},{b},{c},{d,f},{g},{e}} 78: {{b,d},{a},{e},{g,f},{c}} 55: {{a,b},{c,e,g,f},{d}} 21: {{a},{b},{c,e,g,f},{d}} 61: {{a,b},{c,d,f},{g},{e}} 51: {{a,b},{c,e},{g},{f},{d}} 113: {{b,c,d},{a},{e,f},{g}} 47: {{a,b},{c},{d,e,g,f}} 50: {{a,b},{c,g,f},{e},{d}} 80: {{b,d},{a},{e,g},{f},{c}} 155: {{a,c},{b},{d,e,f},{g}} 204: {{a,b,c},{d},{e,g,f}} Note that it is sampling with replacement, so duplicates are possible. MISCELLANEOUS ------------- The files spectrum1.txt and spectrum2.txt contain two trees on 11 taxa that are non-isomorphic (in the unlabelled sense i.e. disregarding the taxa labels) but which do have the same spectrum: 10946, 55, 8, 3, 2, 1, 1, 1, 1, 1, 1 A picture of these two trees is shown in CounterExampleSpectrumConjecture.jpg. The files analysisSpectrum1.txt and analysisSpectrum2.txt list the 8,3,2 characters for g_3, g_4 and g_5 respectively. See the file proof_spectrum_conjecture_until_10taxa.txt for a computational proof that up until 10 taxa the spectrum *does* uniquely specify the tree topology. CONTACT ------- For more information don't hesitate to contact me at my Maastricht University email address. Kind regards, Steven Kelk, 23rd July 2016