Dating a node with BEAST2.0
For details, read http://beast2.cs.auckland.ac.nz/index.php/Main_Page and http://beast2.cs.auckland.ac.nz/index.php/FAQ . For any problem, do not hesitate to browse through the list of questions on the BEAST forum https://groups.google.com/forum/#!forum/beast-users .
Exercise description
The exercise is mainly based on the Divergence Dating tutorial, but also includes a few screen captures.
Sequences files for this exercise are taken from a very inspiring published work
Schnitzler, J., T.G. Barraclough, J.S. Boatwright, P. Goldblatt, J.C. Manning, M.P. Powell, T. Rebelo, and V. Savolainen. 2011. Causes of Plant Diversification in the Cape Biodiversity Hotspot of South Africa. Systematic Biology 60: 343–357.
Download the two files for the exercise containing aligned sequences, and rename each with .nex instead of .txt extension. Sequences in the two files are in the same order and have exactly the same name.
http://qcbs.ca/wp-content/uploads/2013/06/ProteaFaurea_ITS.txt
http://qcbs.ca/wp-content/uploads/2013/06/ProteaFaurea_trnL.txt
The files include sequences of the ITS and the trnL regions, from 74 species of the Protea genus, which comprises a total of 115 species, and the same loci from one specimen of Faurea, the sister genus of Protea. According to another study, the split leading to Faurea and to Protea is 28.4 Ma (central 95% range 24.4–32.3 Ma). With an approximate date for this single root node, and the DNA sequences in hand, we will try to answer the question: How old is one of the oldest clade in Protea, comprising P. cryophila, P. pruinosa, P. scabriuscula, P. scolopendrifolia ?
Disclaimer
The objective of the exercise is simply to show how to use the BEAUti and BEAST programs. It should not be taken, as is, as an example good for publication.
Step by step instructions
(Optional, if time allows) Before importing into BEATi, use any phylogenetic program you liked, to run an analysis of the sequence datasets provided, to confirm that the 5 species listed above truly group into a clade.
1 Import the two files in BEAUti, with File/Import Alignment. For each, we will now go over the top panel (Partitions / Tip dates / Site Model / Clock Model / Priors / MCMC), to choose all the required options. BEAUti will then create a .xml file readable by the BEAST program.
2 Tip dates: Nothing is added or modified to this panel
3 Setting the substitution model
It is best to determine which model of nucleotide substitution is best-fit to the alignment by ModelTest or JModelTest2.
For ITS alignment, we'll use the following settings
Substitution rate = 1 ; Check the “Estimates” boxes
Gamma Category Count =4 ; Check the “Estimates” boxes for Shape parameter
Substitution model: HKY Kappa = 2 ; Check the “Estimates” boxes
Frequencies : Estimated
Check the Fix mean mutation rates
For the trnL, we will choose the model selected by JModelTest2
4 Setting the clock model
The molecular clock is a hypothesis that mutation rates and substitution rates do not vary among lineages in a tree (see BEAST2 Glossary). Within closely related taxa, within a genus for instance, mutation rates are likely similar, and the strict clock could be used. For distantly related taxa, rate among lineages will certainly vary, and the clock will not be strict. For this exercise, we will select “Strict Clock”.
5 Setting the Priors
This step is the most critical and most complicated of the whole process.
For the Tree prior, we will select the Yule model. Yule models are generally appropriate with sequences from different species, while Coalescent models are for different populations of a same species. Do the same for both partitions (ITS and trnL)
For BirthRate, leave it at “Uniform”.
For ClockRate for trnL, select Gamma, Alpha=2 estimate, Beta=2,
For the GammaShape, leave it as it is, Exponential, Mean=1, check estimate.
For kappa for ITS, also leave it as is, LogNormal, M=1, S=1.25
We will now add two calibration nodes, to answer the original question. One node will be the root, it will include all taxa (including the single Faurea), and we will set a specific known age to it. The other node will include the 5 species listed above, and we will name this clade “OldClade_ITS” and “OldCladetrnL”. Beast will estimate an age for this clade.
Click the “+” sign at the bottom of the window. Choose which tree to use, begin with ITS.
Select all taxa, and send to the right window. Give a meaningful label to the taxon set e.g. AllTaxa_ITS . Do the same for the other partition (gene), and label it with a different name.
For the two created sets, check the monophyletic box, select Normal, Mean=28.4, Sigma=2.5, do not check the estimate box.
Create two new taxon sets, for the trnL and the ITS tree, each containing the four species P. cryophila, P. pruinosa, P. scabriuscula, P. scolopendrifolia. Label the sets OldClade_ITS and OldClade_trnL. For each set, Select Normal, check the monophyletic box, and add a reasonable number for Mean, Mean=11, check the estimate box, Sigma=1, check estimate.
You can notice that new parameters appeared. Leave them as for the exercise.
6 MCMC
Chain length= 4000000
Store every= 2000
Burnin (trees excluded from the beginning of analysis)= 500 000
Treelog - Log Every = 1000
Screenlog = log every 2000
7 Optional for the exercise
Using BEAUti, set up the same analysis but select the “Sample from prior only” option, under the MCMC options. This is to visualize their distribution in the absence of the sequence data.
8 Run the analysis
Save your the BEAUti settings as a .xml file. Open the BEAST program, Load the .xml file, and run the search. When completed, modify the name. This is because for you be able to re-launch an BEAST analysis with the same .xml file.
9 Analyze the results
Open the program Tracer. Load the newly created .log file.
Select the two mrcatime(OldClade) values, and look at the Estimates.
Try answer the questions http://qcbs.ca/wp-content/uploads/2013/09/Beast_questions3.pdf
10 Look at the trees
Open the TreeAnnotator program.
Burnin (in nb of trees)= 30000
Posterior probability limit=0.9 (a value of 0 will annotate all trees)
Target Tree Type= Maximum clade credibility tree
Node height= Mean heights
Choose the trnL input file and give a name for output file.
Open the FigTree program
Open the newly created tree file
Re-root at Faurea, and order nodes, decreasing or increasing at your preference
Check “Node Bars”, select “Height_95%_HPD”