Talk:Phylogenetic Bayesian MCMC Trials

From BEAST Software


Interesting idea! I agree that a suite of tests could definitely be beneficial for the authors of these programs as well as the field in general. A few suggestions:

-Perhaps it doesn't need to be said, but while optimizing the run for a particular data set should of course be allowed, people should be discouraged from (or perhaps just reminded of the dangers of) optimizing the run for a particular random number seed. Perhaps the 'standard' results should be averages of three or more runs with different seeds?
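
To make the seed suggestion concrete, here is a minimal sketch in Python. The `run_sampler` function is a hypothetical stand-in for launching one run of whichever program is being tested (here it just draws noisy numbers around a made-up value of 1.0), and the seeds are arbitrary:

```python
import random
import statistics

def run_sampler(seed, n_samples=2000):
    """Hypothetical stand-in for one MCMC run; returns a point estimate.

    A real version would launch the program under test with this seed.
    Here we only draw noisy values around a made-up true value of 1.0.
    """
    rng = random.Random(seed)
    samples = [rng.gauss(1.0, 0.5) for _ in range(n_samples)]
    return statistics.mean(samples)

def summarize_over_seeds(seeds):
    """Run once per seed and report the mean and spread across runs."""
    estimates = [run_sampler(s) for s in seeds]
    return {
        "estimates": estimates,
        "mean": statistics.mean(estimates),
        "stdev": statistics.stdev(estimates),
    }

summary = summarize_over_seeds([101, 202, 303])
print(summary)
```

Reporting the spread across seeds alongside the average would also show at a glance whether one lucky seed is driving a result.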

-How well programs do on the most complex common model is one interesting question, but also of potential interest is how model violations affect the results (say, assuming the F84 model of DNA mutation when the data were actually produced under GTR). This is probably more easily tested with simulated data than with real data.
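
As a small illustration of the model-violation point, the sketch below uses a simpler pair of models than F84/GTR, purely because both have closed forms: data are assumed to arise under Kimura's two-parameter model (K80) and the distance is then estimated with the Jukes/Cantor formula. The function names, the true distance of 0.5, and the kappa values are assumptions chosen for illustration:

```python
import math

def k80_expected_differences(d, kappa):
    """Expected transition (P) and transversion (Q) proportions under K80.

    d is the true distance in substitutions per site; kappa is the ratio
    of the transition rate to each transversion rate.
    """
    beta_t = d / (kappa + 2.0)
    alpha_t = kappa * beta_t
    p_transitions = (0.25 + 0.25 * math.exp(-4.0 * beta_t)
                     - 0.5 * math.exp(-2.0 * (alpha_t + beta_t)))
    q_transversions = 0.5 - 0.5 * math.exp(-4.0 * beta_t)
    return p_transitions, q_transversions

def jc69_distance(p_diff):
    """Jukes/Cantor distance from the proportion of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p_diff / 3.0)

true_d = 0.5
for kappa in (1.0, 5.0, 20.0):
    P, Q = k80_expected_differences(true_d, kappa)
    # With kappa == 1 the two models coincide; otherwise JC underestimates.
    print(kappa, round(jc69_distance(P + Q), 4))
```

With kappa equal to 1 the Jukes/Cantor estimate recovers the true distance, and it falls further below it as the transition bias grows. A real test would of course simulate whole data sets and run the programs themselves.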

-Error bars. 95% confidence intervals should include the truth 95% of the time. They should also be as tight as possible, given that restriction. I'm not sure how best to test this, but I think it will be crucial if we want to seriously compare the programs.
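
One possible way to test this, sketched in Python under toy assumptions: simulate many data sets with a known truth, build a 95% interval from each, and record both how often the truth falls inside and how wide the intervals are on average. The estimator here (a normal mean with known sigma) is only a placeholder for whatever interval a real program reports, and all names and parameter values are hypothetical:

```python
import math
import random

def interval_for_mean(data, sigma):
    """95% interval for a normal mean with known sigma (toy estimator)."""
    n = len(data)
    centre = sum(data) / n
    half_width = 1.96 * sigma / math.sqrt(n)
    return centre - half_width, centre + half_width

def coverage_and_width(truth, sigma, n_obs, n_replicates, seed):
    """Fraction of intervals containing the truth, and their mean width."""
    rng = random.Random(seed)
    hits = 0
    total_width = 0.0
    for _ in range(n_replicates):
        data = [rng.gauss(truth, sigma) for _ in range(n_obs)]
        lo, hi = interval_for_mean(data, sigma)
        hits += lo <= truth <= hi
        total_width += hi - lo
    return hits / n_replicates, total_width / n_replicates

coverage, mean_width = coverage_and_width(
    truth=2.0, sigma=1.0, n_obs=25, n_replicates=2000, seed=7)
print(coverage, mean_width)
```

Coverage well below 0.95 would mean the intervals are too narrow; coverage at or above 0.95 with a smaller mean width than a competing program would count in a program's favour.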

-Finding the most complex common model among all the programs, or among arbitrary subsets of them, is going to be a lot easier if we have a big table people can add to. Here's a list off the top of my head of stuff LAMARC can handle (some of these terms may be LAMARC-specific; we can certainly change them if they're better understood under a different name):

Data types:

 -DNA
 -SNP
 -Microsatellite
 -K-Allele (e.g. electrophoretic data, phenotypic data)
 -unlinked combinations of the above
 -linked combinations of the above
 -Data from unlinked loci with different, known, relative effective population sizes (i.e. chromosome/sex chromosome/mitochondria)

Mutation rate variation:

 -Data arising from different, known, categories of mutation rates.
 -Data from (linked/unlinked) loci with different, known, relative mutation rates.
 -Data from multiple unlinked loci with different, unknown relative mutation rates.
 -No mutation rate variation between lineages (i.e. strict molecular clock).

Data models:

 -nucleotide:
    -Jukes/Cantor
    -F84
    -GTR
 -microsatellite:
    -Brownian
    -Stepwise
    -K-Allele
    -mixed Stepwise/K-Allele

Population parameters:

 -Exponential population growth/decline under the coalescent
 -Constant migration between static populations
 -Constant migration between growing/shrinking populations
 -Recombination (constant rate across populations/loci)

If we format this list into a table (er, not sure how to do that on this wiki), it could be expanded for the capabilities of other programs, and we could track which set each program could do.
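
Until there is a wiki table, the same bookkeeping can be sketched in a few lines of Python. The LAMARC entries below are taken (abbreviated) from the list above; `ProgramX` and `ProgramY` and their feature sets are hypothetical placeholders, not claims about any real program:

```python
capabilities = {
    "LAMARC": {"DNA", "SNP", "Microsatellite", "K-Allele", "Jukes/Cantor",
               "F84", "GTR", "Exponential growth", "Migration",
               "Recombination"},
    # Hypothetical programs, for illustration only.
    "ProgramX": {"DNA", "Jukes/Cantor", "GTR", "Exponential growth"},
    "ProgramY": {"DNA", "SNP", "GTR", "Migration"},
}

def common_features(table, programs):
    """Features supported by every program named in `programs`."""
    return set.intersection(*(table[name] for name in programs))

def format_table(table):
    """Plain-text grid: one row per feature, one column per program."""
    features = sorted(set.union(*table.values()))
    programs = sorted(table)
    width = max(len(f) for f in features)
    lines = [" " * width + "  " + "  ".join(programs)]
    for f in features:
        marks = "  ".join(
            ("x" if f in table[p] else "-").center(len(p)) for p in programs)
        lines.append(f.ljust(width) + "  " + marks)
    return "\n".join(lines)

print(format_table(capabilities))
print(sorted(common_features(capabilities, ["LAMARC", "ProgramX", "ProgramY"])))
```

The most complex common model for any subset of programs is then just the intersection of their feature sets, so adding a program only means adding one entry.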

-Lucian Smith (postdoc, Mary Kuhner's lab)
