Saharica Trial
From BEAST Software
WARNING: This page contains work in progress -- any results on this page are subject to change and have not undergone any peer-review. Use this information at your own risk.
The saharica data set consists of 30 sequences of 1378 sites with 77 site patterns. It has ten times fewer site patterns and an order of magnitude less sequence diversity than the Anolis data set. These sequences are all from the same species of frog, so the data are also more appropriate for coalescent-based analysis.
Initially I chose a comparison that made for a simple MrBayes input file, so the substitution model was F81 (equal substitution rates with unequal equilibrium base frequencies). By default the simplest model in BEAST is HKY + empirical base frequencies, but with a couple of small changes to the BEAST input XML I was able to support F81 with estimated base frequencies:
MrBayes 3.1.2 file: Saharica.nex
Estimates
| Program | Theta (*=diploid [??]) | TL | rootHeight |
|---|---|---|---|
| Saharica Trial MrBayes 3.1.2 | 0.03684 [0.0228 - 0.05285]* | 0.204 [0.178 - 0.228] (stderr 0.0001) | 0.06157 [0.0534 - 0.07067] |
| Saharica Trial BEAST 1.4.3 | 0.01941 [0.012 - 0.0279] | 0.227 [0.201 - 0.252] (stderr 0.0002) | 0.07044 [0.0612 - 0.0796] |
| Saharica Trial migrate 2.3 | 0.0387 [0.02389 - 0.05563] | x | x |
Mixing characteristics
| Program | Time | Chain length | Clades in 5% credible set of trees | Clades in 50% credible set of trees | ESS LnL | ESS TL | ~ESS LnL / Hour | ~ESS TL / Hour | Tester | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| Saharica Trial MrBayes 3.1.2 | 51.6 CPU minutes | 2 * 10,000,000 | 157 / 150 | 234 / 231 | 7162 / 6008 | 15990 / 15640 | 15344 | 36850 | AJD | Macbook Pro Intel Core Duo 2GHz |
| Saharica Trial BEAST 1.4.3 | 26.3 elapsed minutes | 10,000,000 | 159 | 274 | 2826 | 4675 | 6447 | 10706 | AJD | Macbook Pro Intel Core Duo 2GHz |
| Saharica Trial migrate 2.3 | 8.3 minutes elapsed | 8,999,841 | x | x | 2077 | 12030 (Theta) | 15019 | 86963 (Theta) | PB | Macbook Pro Intel Core Duo 2.16GHz |
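The per-hour columns above appear to be the ESS divided by the run time in hours. The following is a minimal sketch of that arithmetic under this assumption; the function name `ess_per_hour` is hypothetical and not part of any of the three programs.

```python
def ess_per_hour(ess: float, minutes: float) -> float:
    """Effective samples per hour of run time (assumed formula: ESS / hours)."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return ess * 60.0 / minutes


# BEAST 1.4.3 row: ESS LnL = 2826 in 26.3 elapsed minutes.
beast_lnl = ess_per_hour(2826, 26.3)

# migrate 2.3 row: ESS for Theta = 12030 in 8.3 elapsed minutes.
migrate_theta = ess_per_hour(12030, 8.3)

print(f"BEAST ESS LnL / hour:     {beast_lnl:.0f}")
print(f"migrate ESS Theta / hour: {migrate_theta:.0f}")
```

These two values agree with the table to within one unit. Other entries, including the MrBayes row (where summing the ESS of the two runs gives values close to, but not exactly, the tabulated ones), differ slightly from this formula, which is consistent with the "~" in the column headers; how those entries were computed is not stated on this page.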
As in the Anolis Trial, this test seems to suggest that MrBayes mixes the continuous parameters better, but BEAST mixes better in tree topology space. Because this data set is far more ambiguous about the tree topology (>18,000 distinct tree topologies sampled in a run that only recorded 20,000 trees), I have summarized the size of the credible set of trees by the number of clades that appear in the given credible set. In particular, I report the total number of different clades in the 5% and 50% credible sets of trees. I use these numbers because, with this many trees, they are more likely to be estimated accurately than the 95% credible set.

Nevertheless, there appears to be a difference between the two programs, with BEAST finding slightly more clades, and therefore slightly lower posterior clade probabilities, than MrBayes. If this is an indication of inadequate mixing on the part of MrBayes then it could, in part, explain the observation that posterior clade probabilities often seem to be overestimated by Bayesian analyses. Of course BEAST may also be overestimating the clade probabilities, just not by quite as much. -AJD
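The clade-count summary described above can be sketched as follows. This is a minimal illustration, not the script used for the table: it assumes each sampled tree has already been reduced to its set of non-trivial clades, that the credible set is built by adding topologies from most to least frequently sampled until the requested level is reached, and that ties are broken arbitrarily. The names `Clade`, `Topology` and `clades_in_credible_set` are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Set

Clade = FrozenSet[str]       # the taxa descending from one internal node
Topology = FrozenSet[Clade]  # a topology is identified by its set of clades


def clades_in_credible_set(trees: Iterable[Topology], level: float) -> int:
    """Count distinct clades in the smallest set of most-frequent topologies
    whose cumulative sample frequency reaches `level` (e.g. 0.05 or 0.5)."""
    counts = Counter(trees)
    total = sum(counts.values())
    if total == 0:
        return 0
    clades: Set[Clade] = set()
    cumulative = 0
    # Visit topologies from most to least frequently sampled.
    for topology, n in counts.most_common():
        clades |= topology
        cumulative += n
        if cumulative >= level * total:
            break
    return len(clades)


def clade(*taxa: str) -> Clade:
    return frozenset(taxa)


# Toy posterior sample of rooted four-taxon trees (made-up data).
t1: Topology = frozenset({clade("A", "B"), clade("A", "B", "C")})  # ((A,B),C),D
t2: Topology = frozenset({clade("A", "B"), clade("C", "D")})       # (A,B),(C,D)
t3: Topology = frozenset({clade("A", "C"), clade("A", "B", "C")})  # ((A,C),B),D
sample = [t1] * 5 + [t2] * 3 + [t3] * 2

for level in (0.5, 0.75, 1.0):
    print(level, clades_in_credible_set(sample, level))
```

With almost every sampled topology distinct, as in this data set, the ordering of topologies is dominated by ties, so the clade count at a given level is itself only an approximate summary.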

