SIMPLISTIC is a heuristic based on the ideas described by Leo van Iersel and Steven Kelk in the article, Constructing the Simplest Possible Phylogenetic Network from Triplets. SIMPLISTIC has many command-line options with which its functionality can be adjusted, but the core of the idea is as follows. Given a dense set of triplets SIMPLISTIC always returns some network consistent with 100% of the dense input triplets. It does this by repeatedly constructing simple networks (networks with only trivial cut-edges) of minimum level consistent with the set of triplets induced by maximal SN-Sets. It is well known that, for a (dense) set of triplets on n leaves, there exists a level-(n-1) network consistent with those triplets, so the program always terminates. The fundamental features of SIMPLISTIC can be summarised as follows.
The following instructions assume a UNIX environment. There are no fundamental reasons however why the algorithm will not work under another operating system environment as long as Java and the graph drawing package DOT are available in that environment. (Please contact Steven Kelk if you are interested in an output format for a package other than DOT.)
We begin by showing how SIMPLISTIC can be run without any complicated command-line options.
We now explain the meaning of the various command-line options that SIMPLISTIC accepts. The command-line has the general form java Simplistic yourtripletdat.txt [options].
--simple | Forces the program to only look for simple networks. If this option is not used SIMPLISTIC will try and compute a general network i.e. will (recursively) compute the set of maximal SN-sets and try to compute a simple network consistent with the set of triplets induced by the maximal SN-sets. |
--all | (This should only be used in conjunction with --simple.) Produces all simple networks consistent with the input triplets. At the moment there is no isomorphism elimination implemented so the output can be rather large when --all is used. |
--enewick | Produces networks in eNewick format instead of DOT format. |
--snsplit [num] | Normally SIMPLISTIC computes the set of triplets induced by the set of maximal SN-sets and tries to build a simple network of a particular level consistent with those induced triplets. If it can't do that it then searches for a simple network of higher level, and so on. With the option --snsplit [num] ([num] should be greater than or equal to 0) SIMPLISTIC tries splitting all subsets of size [num] before moving on to the next higher level. 'Splitting' means replacing an SN-set S with the SN-sets that are maximal when the set of triplets is restricted to the leaf set S. We note that specifying --snsplit 1 guarantees that, if there is a level-2 network consistent with the input triplets, that such a network will be found. (If --snsplit is not used and there exists a level-1 network consistent with the input triplets, then SIMPLISTIC will find such a network.) It might be that using --snplit 2 or --snsplit 3 is enough to ensure that a level-3 network is found (if the input triplets are consistent with some level-3 network) but this remains an open question. |
--beginlev [num] | (This should only be used in conjunction with --simple.) Starts at level [num] when searching for simple networks consistent with the input. The argument [num] should be greater than or equal to 1. |
--stop | (This should only be used in conjunction with --simple.) Normally SIMPLISTIC does not stop once it has found a network. Indeed, even if it finds a solution of level-k then it will continue looking for solutions of level-(k+1) and so on, up to and including level-(n-1). If the --stop option is specified then it will not continue to higher levels. |
--percentage | If this option is switched on SIMPLISTIC will report (when constructing simple networks) how close to 100% consistency it came. Using this option heavily slows the code, however. This happens because SIMPLISTIC normally throws a candidate solution away as soon as it is inconsistent with 1 or more input triplets, but with --percentage specified all such solutions need to be carried through to the end. Note that the reported percentages relate to the set of triplets being considered at that moment: this in general will not be the original input set of triplets (it could be an induced set of triplets, for example.) |
--reflect | Makes SIMPLISTIC check for reflectivity, not consistency. A network reflects a set of triplets if and only if the set of all triplets consistent with the network is equal to the set of input triplets. (Consistency in contrast demands only that the former set is a superset of the latter.) With --reflect specified SIMPLISTIC becomes an implementation of the algorithm MINPITS from the paper Constructing the Simplest Possible Phylogenetic Network from Triplets. In other words, it will find a network of minimum level (and which minimises the number of reticulation vertices) that reflects the input triplets, if such a network exists. Note that it is still not known at what level (as a function of n) we can stop in the search for reflecting networks. Unless otherwise specified using --maxlev the search will be terminated after level-(n-1). |
--alltriplets | If this option is specified the file you provide as input is not a file of triplets, but a network in DOT format. SIMPLISTIC computes the set of all triplets consistent with this network and then uses that as the input. It is not necessary to be 100% in accordance with DOT format, only the edges have to be specified. So it suffices to provide a file where every line is of the form v1 -> v2 where v1 and v2 are the endpoints of the edge in question. It is, however, essential that leaves are numbered 1...n and that all non-leaves (root, split vertices, reticulation vertices) have numbers greater than or equal to 1000. (This matches with the conventions used by the DOT networks that SIMPLISTIC, MARLON and LEVEL2 output. So those networks can be fed back into SIMPLISTIC if desired.) |
--maxlev [num] | Normally the program doesn't go further than level-(n-1) but with this option it continues until level max(n-1,[num]). (Should only really be used with --simple.) |
--help | Prints a short help message. |
Various examples will appear here in due course. In the meantime please see the webpages for MARLON and LEVEL2 (see below) for some data sets.
The code uses the triplet consistency checking algorithm proposed in Worst-case optimal approximation algorithms for maximizing triplet consistency within phylogenetic networks by Jaroslaw Byrka, Pawel Gawrychowski, Katharina T. Huber and Steven Kelk. The code to generate combinations was taken from http://www.merriampark.com/perm.htm.