You are here

Tutorial 8

BEAST Documentation->Tutorials->Tutorial 8


Analyzing multi-locus data

This tutorial describes how to create a BEAST XML file for analyzing multiple loci alignments sampled from the same population. Although the different loci have different genealogies, they are assumed to share the same demographic history. Having multiple unlinked loci can give a large improvement in the power to reconstruct the demographic history.

This tutorial is a rather involved exercise in editing XML with lots of copy and paste. Basically, certain elements must be duplicated so that each locus alignment has its own copy which can operate independently. Other elements (particularly the demographic model) are shared between the multiple alignments. This tutorial describes setting up a two locus model but it should be obvious how to extend it to more.

The easiest way to start to create this analysis is to use BEAUti to create an XML BEAST file for each individual locus' alignment. See this tutorial for instructions on how to do this. Now start editing the XML file for the first locus' alignment and start adding text. Look for the alignment element:

	<alignment id="alignment" dataType="nucleotide">
		<sequence>
			<taxon idref="individual1"/>
			AGAAATATGTCTGATA...
		</sequence>
		<sequence>
			<taxon idref="individual2"/>
			AGAAATATGTCTGATA...
		</sequence>
		<sequence>
			<taxon idref="individual3"/>
			AGAAATATGTCTGATA...
		</sequence>
		<sequence>
			<taxon idref="individual4"/>
			AGAAATATGTCTGATA...
		</sequence>
		<sequence>
			<taxon idref="individual5"/>
			AGAAATATGTCTGACA...
		</sequence>
		<sequence>
			<taxon idref="individual6"/>
			AGAAATACGTCTGACG...
		</sequence>
	</alignment>

Note actual sequences have been truncated for displaying here. The first thing to change is the id of the alignment element to "alignment1":

	<alignment id="alignment1" dataType="nucleotide">

Now paste the alignment element from the XML document for the second locus into the first file just after the first alignment element, changing the id to "alignment2":

	<alignment id="alignment1" dataType="nucleotide">
	.
	.
	.
	</alignment>

	<alignment id="alignment2" dataType="nucleotide">
		<sequence>
			<taxon idref="individual1"/>
			GCCCTCAGTAAGTTG...
		</sequence>
		<sequence>
			<taxon idref="individual2"/>
			GCCCTCAGCAAGTTA...
		</sequence>
		<sequence>
			<taxon idref="individual3"/>
			GCTCTCAGTAAGTTA...
		</sequence>
		<sequence>
			<taxon idref="individual4"/>
			GCCCTTAGTAAGTTA...
		</sequence>
		<sequence>
			<taxon idref="individual5"/>
			GCGAAATTTAGGTTA...
		</sequence>
		<sequence>
			<taxon idref="individual6"/>
			GCGAAATTTAGGTTA...
		</sequence>
	</alignment>

This assumes that the second locus has the same taxon names as the first (i.e., both loci where sampled from the same set of individuals). If this is not the case then the <taxa> element from the second locus XML file will need to be copied over as well.

Now duplicate the <patterns> element, changing the ids and the idrefs of the alignments they operate on:

	<patterns id="patterns1" from="1">
		<alignment idref="alignment1"/>
	</patterns>

	<patterns id="patterns2" from="1">
		<alignment idref="alignment2"/>
	</patterns>

Duplicate the starting <coalescentTree> element and the <treeModel>. This provides a different and independent tree for each locus and provides a different random starting tree for each:

	<coalescentTree id="startingTree1">
		<taxa idref="taxa"/>
		<constantSize idref="initialDemo"/>
	</coalescentTree>

	<coalescentTree id="startingTree2">
		<taxa idref="taxa"/>
		<constantSize idref="initialDemo"/>
	</coalescentTree>

	<treeModel id="treeModel1">
		<coalescentTree idref="startingTree1"/>
		<rootHeight>
			<parameter id="treeModel1.rootHeight"/>
		</rootHeight>
		<nodeHeights internalNodes="true">
			<parameter id="treeModel1.internalNodeHeights"/>
		</nodeHeights>
		<nodeHeights internalNodes="true" rootNode="true">
			<parameter id="treeModel1.allInternalNodeHeights"/>
		</nodeHeights>
	</treeModel>

	<treeModel id="treeModel2">
		<coalescentTree idref="startingTree2"/>
		<rootHeight>
			<parameter id="treeModel2.rootHeight"/>
		</rootHeight>
		<nodeHeights internalNodes="true">
			<parameter id="treeModel2.internalNodeHeights"/>
		</nodeHeights>
		<nodeHeights internalNodes="true" rootNode="true">
			<parameter id="treeModel2.allInternalNodeHeights"/>
		</nodeHeights>
	</treeModel>

Now duplicate the < coalescentLikelihood > element:

	<coalescentLikelihood id="coalescent1">
		<model>
			<constantSize idref="constant"/>
		</model>
		<populationTree>
			<treeModel idref="treeModel1"/>
		</populationTree>
	</coalescentLikelihood>

	<coalescentLikelihood id="coalescent2">
			<model>
				<constantSize idref="constant"/>
			</model>
			<populationTree>
				<treeModel idref="treeModel2"/>
			</populationTree>
		</coalescentLikelihood>

Now duplicate the <treeLikelihood> element:

	<treeLikelihood id="treeLikelihood1">
		<patterns idref="patterns1"/>
		<treeModel idref="treeModel1"/>
		<siteModel idref="siteModel"/>
	</treeLikelihood>

	<treeLikelihood id="treeLikelihood2">
		<patterns idref="patterns2"/>
		<treeModel idref="treeModel2"/>
		<siteModel idref="siteModel"/>
	</treeLikelihood>

Next we must duplicate the operators for the two trees:

	<operators id="operators">
		<scaleOperator scaleFactor="0.5" weight="1">
			<parameter idref="hky.kappa"/>
		</scaleOperator>
		<scaleOperator scaleFactor="0.5" weight="1" adapt="false">
			<parameter idref="constant.popSize"/>
		</scaleOperator>
		<scaleOperator scaleFactor="0.5" weight="1">
			<parameter idref="treeModel1.rootHeight"/>
		</scaleOperator>
		<uniformOperator weight="10">
			<parameter idref="treeModel1.internalNodeHeights"/>
		</uniformOperator>
		<subtreeSlide weight="5" gaussian="true" size="1.0">
			<treeModel idref="treeModel1"/>
		</subtreeSlide>
		<narrowExchange weight="1">
			<treeModel idref="treeModel1"/>
		</narrowExchange>
		<wideExchange weight="1">
			<treeModel idref="treeModel1"/>
		</wideExchange>
		<wilsonBalding weight="1">
			<treeModel idref="treeModel1"/>
			<constantSize idref="constant"/>
		</wilsonBalding>
		<scaleOperator scaleFactor="0.5" weight="1">
			<parameter idref="treeModel2.rootHeight"/>
		</scaleOperator>
		<uniformOperator weight="10">
			<parameter idref="treeModel2.internalNodeHeights"/>
		</uniformOperator>
		<subtreeSlide weight="5" gaussian="true" size="1.0">
			<treeModel idref="treeModel2"/>
		</subtreeSlide>
		<narrowExchange weight="1">
			<treeModel idref="treeModel2"/>
		</narrowExchange>
		<wideExchange weight="1">
			<treeModel idref="treeModel2"/>
		</wideExchange>
		<wilsonBalding weight="1">
			<treeModel idref="treeModel2"/>
			<constantSize idref="constant"/>
		</wilsonBalding>
	</operators>

Finally we must adjust the <mcmc>:

	<mcmc id="mcmc" chainLength="10000000" autoOptimize="true">
		<compoundLikelihood id="likelihood">
			<coalescentLikelihood idref="coalescent1"/>
			<coalescentLikelihood idref="coalescent2"/>
			<treeLikelihood idref="treeLikelihood1"/>
			<treeLikelihood idref="treeLikelihood2"/>
		</compoundLikelihood>
		<operators idref="operators"/>
		<log logEvery="10000">
			<column label="Likelihood" dp="4" width="12">
				<compoundLikelihood idref="likelihood"/>
			</column>
			<column label="Root Height 1" sf="6" width="12">
				<parameter idref="treeModel1.rootHeight"/>
			</column>
			<column label="Root Height 2" sf="6" width="12">
				<parameter idref="treeModel2.rootHeight"/>
			</column>
			<column label="Kappa" sf="6" width="12">
				<parameter idref="hky.kappa"/>
			</column>
		</log>
		<log logEvery="1000" fileName="multilocus.log">
			<compoundLikelihood idref="likelihood"/>
			<parameter idref="constant.popSize"/>
			<parameter idref="treeModel1.rootHeight"/>
			<parameter idref="treeModel2.rootHeight"/>
			<parameter idref="hky.kappa"/>
			<treeLikelihood idref="treeLikelihood1"/>
			<treeLikelihood idref="treeLikelihood2"/>
			<coalescentLikelihood idref="coalescent1"/>
			<coalescentLikelihood idref="coalescent2"/>
		</log>
		<logTree  logEvery="1000" nexusFormat="true" fileName="multilocus.locus1.trees">
			<treeModel idref="treeModel1"/>
		</logTree>
		<logTree  logEvery="1000" nexusFormat="true" fileName="multilocus.locus2.trees">
			<treeModel idref="treeModel2"/>
		</logTree>
	</mcmc>

It is also possible to duplicate the substitution model and gamma site heterogeneity models to allow for a mixed model between loci.


Return to Tutorials