The algorithm takes an input consisting of two not-necessarily binary trees on the same set of taxa X and produces a binary network with minimimum hybridization number that displays binary refinements of both the trees in the input. (As described in the article, the fact that the output is binary does not sacrifice generality). The algorithm is a faithful implementation of the algorithm described in the article, with the exception that it additionally checks for the existence of two conflicting size-2 clusters, allowing it to restrict its guess (at that iteration) to the three taxa involved.
Download the latest version of the code from here. Compile and run it like this (you'll need the JavaCC parser and the JavaC compiler installed):
javacc TerminusEst.jj javac TerminusEst.java java TerminusEst < twotrees.txt
...where twotrees.txt is a file containing two trees, in newick format, e.g.
((a,b,g,d),((c,e),f)); ((a,b),(c,d),(e,f,g));
If you specify the -nonetwork option it only computes hybridization number, it doesn't compute a network.
If the whole thing seems to have hung it's probably because you are not using the '<' operator on the command line: the code reads from stdin (i.e. the standard input).
The output should look something like this:
// This is TERMINUSEST, version 31st august 2012. // Usage: java TerminusEst < treeFile.txt // (If the program seems to have hung, you have probably // forgotten the '<' operator). // ------------------------------------- // We saw 7 taxa in total. // Trying r=0 // Trying r=1 // Trying r=2 // ----------------------------- // HYBRIDIZATION NUMBER = 2 // ----------------------------- strict digraph G0 { 1000 [shape=point]; 1001 [shape=point]; 1002 [shape=point]; 1003 [shape=point]; 1004 [shape=circle, width=0.3, label="a"]; 1005 [shape=circle, width=0.3, label="b"]; 1006 [shape=point]; 1007 [shape=circle, width=0.3, label="d"]; 1008 [shape=point]; 1009 [shape=circle, width=0.3, label="g"]; 1010 [shape=point]; 1011 [shape=point]; 1012 [shape=circle, width=0.3, label="f"]; 1013 [shape=point]; 1014 [shape=circle, width=0.3, label="e"]; 1015 [shape=point]; 1016 [shape=circle, width=0.3, label="c"]; 1003 -> 1004 1003 -> 1005 1002 -> 1003 1006 -> 1007 1006 -> 1015 1002 -> 1006 1001 -> 1002 1008 -> 1009 1001 -> 1008 1000 -> 1001 1011 -> 1012 1013 -> 1014 1015 -> 1016 1013 -> 1015 1011 -> 1013 1010 -> 1011 1010 -> 1008 1000 -> 1010 } strict digraph G1 { 1000 [shape=point]; 1001 [shape=point]; 1002 [shape=point]; 1003 [shape=point]; 1004 [shape=circle, width=0.3, label="a"]; 1005 [shape=circle, width=0.3, label="b"]; 1006 [shape=point]; 1007 [shape=circle, width=0.3, label="d"]; 1008 [shape=point]; 1009 [shape=circle, width=0.3, label="g"]; 1010 [shape=point]; 1011 [shape=point]; 1012 [shape=circle, width=0.3, label="f"]; 1013 [shape=point]; 1014 [shape=circle, width=0.3, label="e"]; 1015 [shape=point]; 1016 [shape=circle, width=0.3, label="c"]; 1003 -> 1004 [color=blue] 1003 -> 1005 [color=blue] 1002 -> 1003 [color=blue] 1006 -> 1007 [color=blue] 1006 -> 1015 [color=red] 1002 -> 1006 [color=blue] 1001 -> 1002 [color=blue] 1008 -> 1009 [color=blue] 1001 -> 1008 [color=blue] 1000 -> 1001 [color=blue] 1011 -> 1012 [color=blue] 1013 -> 1014 [color=blue] 1015 -> 1016 [color=blue] 1013 -> 1015 [color=blue] 1011 -> 1013 [color=blue] 1010 -> 1011 [color=blue] 1010 -> 1008 [color=red] 1000 -> 1010 [color=blue] } strict digraph G2 { 1000 [shape=point]; 1001 [shape=point]; 1002 [shape=point]; 1003 [shape=point]; 1004 [shape=circle, width=0.3, label="a"]; 1005 [shape=circle, width=0.3, label="b"]; 1006 [shape=point]; 1007 [shape=circle, width=0.3, label="d"]; 1008 [shape=point]; 1009 [shape=circle, width=0.3, label="g"]; 1010 [shape=point]; 1011 [shape=point]; 1012 [shape=circle, width=0.3, label="f"]; 1013 [shape=point]; 1014 [shape=circle, width=0.3, label="e"]; 1015 [shape=point]; 1016 [shape=circle, width=0.3, label="c"]; 1003 -> 1004 [color=blue] 1003 -> 1005 [color=blue] 1002 -> 1003 [color=blue] 1006 -> 1007 [color=blue] 1006 -> 1015 [color=blue] 1002 -> 1006 [color=blue] 1001 -> 1002 [color=blue] 1008 -> 1009 [color=blue] 1001 -> 1008 [color=red] 1000 -> 1001 [color=blue] 1011 -> 1012 [color=blue] 1013 -> 1014 [color=blue] 1015 -> 1016 [color=blue] 1013 -> 1015 [color=red] 1011 -> 1013 [color=blue] 1010 -> 1011 [color=blue] 1010 -> 1008 [color=blue] 1000 -> 1010 [color=blue] } // ((((a,b),(d,#H2)),(g)#H1),((f,(e,(c)#H2)),#H1))root; // First tree displayed by network! // Second tree displayed by network! // ----------------------------- // HYBRIDIZATION NUMBER = 2 // -----------------------------
The output is in dot format and can be visualized by feeding it to the Linux command of the same name, or any software capable of supporting the dot format (see the DOT wikipedia page). Note that in dot format all lines that begin with with // are ignored.
The second and third networks show how the trees are displayed. The file here output.pdf shows an example of what the output from the program dot should look like.
Just before the end of the output there is also a line of eNewick produced, which can be viewed (for example) with Dendroscope or at the NetTest Server of Arenas et al.