Phylogenetic Bayesian MCMC Trials
From BEAST Software
This is the beginning of a site devoted to creating a set of comparisons between different programs that perform Bayesian MCMC phylogenetics. The aim is to identify where these programs overlap in terms of the models they implement and to provide data sets that can be used to test these models. This is not intended as a competition but rather as a way to cross-validate programs using independent implementations of the same models and to compare the efficiency of different sampling strategies.
At the moment this is a proposal and a call for contributions.
The Programs
Here is a list of programs that may be able to participate in this trial. Please add to this list.
- BEAST
- http://beast.bio.ed.ac.uk/
- LAMARC
- http://evolution.gs.washington.edu/lamarc/
- Migrate-n
- http://popgen.scs.fsu.edu/
- MrBayes
- http://mrbayes.csit.fsu.edu/
I am aware that there are a number of other programs (BAMBE, Batwing, BayesPhylogenies, PHASE, PAML - MCMCTree), but I have not used these yet, so I am not certain whether they have models in common with the above. The aim of this website is to encourage the authors of the programs to do the actual testing and submit the results.
The Models
The aim is to identify the most complex model (or sets of models) which all the programs have in common. With the three initial programs listed, I think the most complex model intersection is a combination of:
- GTR model of nucleotide substitution
- Exponential-growth Coalescent
- Strict molecular clock
However, for individual pairs of programs, more complex model intersections can be identified:
- BEAST vs. MrBayes
- Birth-Death speciation process
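The idea of a model intersection can be made concrete with a small sketch. The program names below (`program_a`, `program_b`, `program_c`) are placeholders, and the table entries are illustrative assumptions that reuse the model names from this page; they are not verified statements about what any real program implements.

```python
from itertools import combinations

# Hypothetical capability table. Program names are placeholders and the
# entries are illustrative only; a real table would have to be filled in
# by the authors of each program.
MODELS = {
    "program_a": {"GTR", "Exponential-growth Coalescent",
                  "Strict molecular clock", "Birth-Death speciation process"},
    "program_b": {"GTR", "Exponential-growth Coalescent",
                  "Strict molecular clock"},
    "program_c": {"GTR", "Exponential-growth Coalescent",
                  "Strict molecular clock", "Birth-Death speciation process"},
}


def common_models(programs, table=MODELS):
    """Return the models implemented by every program in `programs`."""
    sets = [table[name] for name in programs]
    return set.intersection(*sets)


def pairwise_extras(table=MODELS):
    """Return models shared by a pair of programs but not by all programs."""
    shared_by_all = common_models(list(table), table)
    extras = {}
    for a, b in combinations(sorted(table), 2):
        extra = (table[a] & table[b]) - shared_by_all
        if extra:
            extras[(a, b)] = extra
    return extras


print(sorted(common_models(list(MODELS))))
print(pairwise_extras())
```

Keeping the table as plain data means a contributor only has to add one entry for a new program, and both the overall intersection and the richer pairwise intersections are recomputed from it.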
The Data
The aim here would be to present a range of data that are suitable for the models being evaluated. This might involve finding data that are particularly suited to the sampling strategy of a particular program to see how other strategies fare. Some simulated data with known generating parameters might be useful as well.
The Execution
Each program would ideally be run by its developers, and they would be encouraged to tune any sampling strategies to give the best efficiency for a given model and data set. Details of the tuning (for example, proposal move sizes) should be reported. Running Metropolis-coupled MCMC is obviously allowed, but the total CPU usage must be calculated for all chains. It may therefore be easier for these to be run in a non-parallel environment on a single processor.
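As a rough illustration of what a submitted run might record, here is a minimal sketch of a report that keeps the tuning details alongside the CPU time of every chain. The class name `RunReport`, its fields, and the example values are all hypothetical; no submission format has been agreed.

```python
from dataclasses import dataclass, field


@dataclass
class RunReport:
    """Hypothetical record of one run; field names are assumptions."""
    program: str
    dataset: str
    tuning: dict = field(default_factory=dict)             # e.g. proposal move sizes
    chain_cpu_seconds: list = field(default_factory=list)  # one entry per chain

    def total_cpu_hours(self):
        """Sum CPU time over all chains (cold and heated), in hours."""
        if not self.chain_cpu_seconds:
            raise ValueError("no chain timings recorded")
        return sum(self.chain_cpu_seconds) / 3600.0


# Made-up example: four coupled chains, each using 1800 s of CPU time.
report = RunReport(
    program="example_program",
    dataset="example_dataset",
    tuning={"example_proposal_window": 0.75},
    chain_cpu_seconds=[1800.0, 1800.0, 1800.0, 1800.0],
)
print(report.total_cpu_hours())  # prints 2.0
```

Charging a Metropolis-coupled run for every chain, not only the cold one, is what makes its cost comparable with a single-chain run.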
The Assessment
The aim of the assessment is not to "benchmark" the programs in terms of how long they take to run but rather to look at their sampling efficiency. We cannot simply ask how long it takes to run a chain of 1 million steps (generations), because programs differ in how they propose new states in the chain and thus in how autocorrelated the samples from the chain are. One possibility is to measure the number of effectively independent samples per CPU hour. The idea is to test not the speed of different processors or of code optimizations for a specific machine, but the efficiency of particular sampling strategies and proposal moves. Possible ways to control for hardware differences:
- A standard computer on which all tests must be run
- A virtual machine with an emulated processor (for example the 486 that is emulated by VirtualPC running on a G5 Macintosh). A particular virtual drive (running Linux) could be provided with all the required tools. A virtual machine that doesn't emulate the processor would not work for this because then the CPU hours would be dependent on the underlying processor.
- Run the complete trial or individual pairs on many different computers. The simple way to do this may be to provide scripts to run the programs, assess the results and then allow individuals to submit the results for a particular machine.
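The "effectively independent samples per CPU hour" measure proposed in this section could be computed along the following lines. This is a minimal sketch, not the estimator used by any of the listed programs: it assumes the common formula ESS = N / (1 + 2 Σ ρ_k) and truncates the sum at the first non-positive autocorrelation. The function names and the synthetic AR(1) trace are made up for illustration.

```python
import random


def autocorrelation(x, lag):
    """Sample autocorrelation of the trace x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0.0:
        raise ValueError("trace is constant")
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(x):
    """ESS = N / (1 + 2 * sum of autocorrelations), with the sum
    truncated at the first non-positive autocorrelation (an assumption)."""
    n = len(x)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = autocorrelation(x, lag)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def ess_per_cpu_hour(x, cpu_seconds):
    """Effectively independent samples per CPU hour for one parameter trace."""
    if cpu_seconds <= 0:
        raise ValueError("cpu_seconds must be positive")
    return effective_sample_size(x) / (cpu_seconds / 3600.0)


def make_ar1_trace(n, phi, seed=1):
    """Synthetic stand-in for a sampled parameter: an AR(1) series."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


# Two chains of equal length and CPU cost but different autocorrelation.
for phi in (0.0, 0.9):
    trace = make_ar1_trace(2000, phi)
    print(phi, round(ess_per_cpu_hour(trace, cpu_seconds=1800.0)))
```

The two synthetic traces have the same number of steps and the same nominal CPU cost, yet the strongly autocorrelated one yields far fewer effective samples per hour, which is the distinction a raw steps-per-hour figure would miss. In a real trial the statistic would need to be reported per parameter, and the choice of ESS estimator would have to be fixed in advance so that results are comparable.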
The Results
Here we will provide detailed results for each program assessed including supplementary information such as output files, summary trees, parameter estimates, speeds and times.
The Post-Match Analysis
Here we will have analysis of the results and discussion of the relative merits of different sampling strategies.

