Saharica Trial

From BEAST Software

WARNING: This page contains work in progress. Any results on this page are subject to change and have not undergone peer review. Use this information at your own risk.

The saharica data set consists of 30 sequences of 1378 sites with 77 site patterns. This is ten times fewer site patterns and an order of magnitude less sequence diversity than the Anolis data set. These sequences are all from the same species of frog, so the data are also more appropriate for coalescent-based analysis.

Initially I chose a comparison that made for a simple MrBayes input file, so the substitution model is F81 (a single substitution rate with unequal equilibrium base frequencies). By default the simplest model in BEAST is HKY with empirical base frequencies, but with a couple of small changes to the BEAST input XML I was able to support F81 with estimated base frequencies.

MrBayes 3.1.2 file: Saharica.nex
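
The relationship between the two models can be illustrated with a minimal Python sketch. It assumes that F81 is obtained from HKY by fixing the transition/transversion ratio kappa to 1, which is the standard relationship between the models; whether the BEAST XML changes mentioned above did exactly this is not stated here. The function names and the base frequencies are hypothetical and chosen only for illustration.

```python
# Sketch: F81 as the special case of HKY with kappa fixed to 1.
# All names and numbers here are illustrative assumptions, not BEAST or MrBayes code.
BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_rate_matrix(freqs, kappa):
    """Return the unnormalised HKY rate matrix as a dict of dicts."""
    q = {}
    for i in BASES:
        q[i] = {}
        for j in BASES:
            if i != j:
                # Transitions are scaled by kappa, transversions by 1.
                rate = kappa if (i, j) in TRANSITIONS else 1.0
                q[i][j] = rate * freqs[j]
        # Diagonal entry makes each row sum to zero.
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    return q


def f81_rate_matrix(freqs):
    """F81: every substitution i -> j occurs at a rate proportional to freqs[j]."""
    return hky_rate_matrix(freqs, kappa=1.0)


# Hypothetical base frequencies; in the analysis they would be estimated.
freqs = {"A": 0.3, "C": 0.2, "G": 0.25, "T": 0.25}
q = f81_rate_matrix(freqs)
```

With kappa equal to 1 the off-diagonal rate into each base depends only on that base's frequency, so the only free parameters of the substitution model are the base frequencies.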

Estimates

| Trial | Program | Theta (*=diploid [??]) | TL | rootHeight |
|---|---|---|---|---|
| Saharica Trial | MrBayes 3.1.2 | 0.03684 [0.0228 - 0.05285]* | 0.204 [0.178 - 0.228] (stderr 0.0001) | 0.06157 [0.0534 - 0.07067] |
| Saharica Trial | BEAST 1.4.3 | 0.01941 [0.012 - 0.0279] | 0.227 [0.201 - 0.252] (stderr 0.0002) | 0.07044 [0.0612 - 0.0796] |
| Saharica Trial | migrate 2.3 | 0.0387 [0.02389 - 0.05563] | x | x |

Mixing characteristics

| Trial | Program | Time | Chainlength | 5% credible set of clades | 50% credible set of clades | ESS LnL | ESS TL | ~ESS LnL / Hour | ~ESS TL / Hour | Tester | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Saharica Trial | MrBayes 3.1.2 | 51.6 CPU minutes | 2 * 10,000,000 | 157 / 150 | 234 / 231 | 7162 / 6008 | 15990 / 15640 | 15344 | 36850 | AJD | Macbook Pro Intel Core Duo 2GHz |
| Saharica Trial | BEAST 1.4.3 | 26.3 elapsed minutes | 10,000,000 | 159 | 274 | 2826 | 4675 | 6447 | 10706 | AJD | Macbook Pro Intel Core Duo 2GHz |
| Saharica Trial | migrate 2.3 | 8.3 minutes elapsed | 8,999,841 | x | x | 2077 | 12030 (Theta) | 15019 | 86963 (Theta) | PB | Macbook Pro Intel Core Duo 2.16GHz |
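
The "~ESS / Hour" columns can be approximately reproduced by dividing each ESS by the run time in hours. The Python sketch below assumes that the two MrBayes values are the ESS of its two runs and that they are summed over the reported 51.6 CPU minutes; that reading is an assumption, as are the helper and variable names. The recomputed figures land within about 1% of the tabulated ones, which is consistent with the "~" in the column headers.

```python
# Sketch: recompute approximate ESS per hour from the mixing table.
# Summing the two MrBayes runs is an assumption about how the table was built.


def ess_per_hour(ess, minutes):
    """Effective sample size generated per hour of run time."""
    return ess / (minutes / 60.0)


runs = {
    "MrBayes 3.1.2": {"minutes": 51.6, "ess_lnl": 7162 + 6008, "ess_tl": 15990 + 15640},
    "BEAST 1.4.3": {"minutes": 26.3, "ess_lnl": 2826, "ess_tl": 4675},
    # For migrate the second ESS is for Theta rather than TL, as noted in the table.
    "migrate 2.3": {"minutes": 8.3, "ess_lnl": 2077, "ess_tl": 12030},
}

rates = {
    name: (
        ess_per_hour(run["ess_lnl"], run["minutes"]),
        ess_per_hour(run["ess_tl"], run["minutes"]),
    )
    for name, run in runs.items()
}

for name, (lnl_rate, tl_rate) in rates.items():
    print(f"{name}: ~{lnl_rate:.0f} ESS LnL/hour, ~{tl_rate:.0f} ESS TL/hour")
```

Note that the MrBayes time is reported in CPU minutes while the BEAST and migrate times are elapsed minutes, so the per-hour figures are not strictly comparable across programs.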


As in the Anolis Trial, this test seems to suggest that MrBayes mixes the continuous parameters better, while BEAST mixes better in tree topology space. Because this data set is far more ambiguous about the tree topology (more than 18,000 distinct tree topologies were sampled in a run that recorded only 20,000 trees), I have summarized the size of the credible set of trees by the number of clades that appear in the given credible set. In particular, I report the total number of different clades in the 5% and 50% credible sets of trees. I use these numbers because they are more likely to be estimated accurately than the 95% credible set when there are this many trees.

Nevertheless, there appears to be a difference between the two programs, with BEAST finding more clades (159 versus 157 / 150 in the 5% set, and 274 versus 234 / 231 in the 50% set) and therefore slightly lower posterior clade probabilities than MrBayes. If this indicates inadequate mixing on the part of MrBayes, then it could in part explain the observation that posterior clade probabilities often seem to be overestimated by Bayesian analyses. Of course, BEAST may also be overestimating the clade probabilities, just not by quite as much. -AJD
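
The clade-count summary described above can be sketched in Python as follows. This is one plausible reading, not necessarily the exact procedure used for the table: topologies are ranked by how often they were sampled, the most frequent ones are accumulated until they cover the requested posterior mass, and the distinct clades in those topologies are counted. Each tree is represented here as a frozenset of clades, each clade being a frozenset of taxon names; the function name, this representation, and the toy sample are all hypothetical.

```python
# Sketch: number of distinct clades in the x% credible set of sampled trees.
# The tree representation and the toy data are illustrative assumptions.
from collections import Counter


def clades_in_credible_set(sampled_trees, level):
    """Count distinct clades in the smallest set of most-frequent topologies
    whose combined sample frequency reaches `level` (for example 0.05 or 0.5)."""
    counts = Counter(sampled_trees)
    needed = level * len(sampled_trees)
    clades = set()
    covered = 0
    for topology, n in counts.most_common():
        if covered >= needed:
            break
        clades |= topology  # add every clade of this topology
        covered += n
    return len(clades)


def clade(*taxa):
    return frozenset(taxa)


# Toy posterior sample over four taxa: three topologies sampled 5, 3 and 2 times.
t1 = frozenset({clade("A", "B"), clade("A", "B", "C")})
t2 = frozenset({clade("A", "B"), clade("A", "B", "D")})
t3 = frozenset({clade("C", "D"), clade("A", "C", "D")})
sample = [t1] * 5 + [t2] * 3 + [t3] * 2

n5 = clades_in_credible_set(sample, 0.05)
n50 = clades_in_credible_set(sample, 0.50)
n95 = clades_in_credible_set(sample, 0.95)
```

In the toy sample the 5% and 50% sets each contain only the most frequent topology, while the 95% set needs all three. When nearly every sampled topology is distinct, as in this trial, the ranking of topologies in the tail is poorly determined, which is one reason the smaller credible sets are the more reliable summaries.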