This is an old revision of the document!


Testing evolutionary hypothesis with BEAST2.0

Exercise description

The exercise is mainly based on the Divergence Dating tutorial, but also includes a few screen captures. The screen captures images are grouped into a single .pdf file to be downloaded, due to a temporary inability of uploading image files to the QCBS wiki.

Sequences files for this exercise are taken from a very inspiring published work
Schnitzler, J., T.G. Barraclough, J.S. Boatwright, P. Goldblatt, J.C. Manning, M.P. Powell, T. Rebelo, and V. Savolainen. 2011. Causes of Plant Diversification in the Cape Biodiversity Hotspot of South Africa. Systematic Biology 60: 343–357.

Download the two files for the exercise containing aligned sequences, and rename each with .nex instead of .txt extension. Sequences in the two files are in the same order and have exactly the same name.
http://qcbs.ca/wp-content/uploads/2013/06/ProteaFaurea_ITS.txt
http://qcbs.ca/wp-content/uploads/2013/06/ProteaFaurea_trnL.txt

The files include sequences of the ITS and the trnL regions, from 74 species of the Protea genus, which comprises a total of 115 species, and the same loci from one specimen of Faurea, the sister genus of Protea. According to another study, the split leading to Faurea and to Protea is 28.4 Ma (central 95% range 24.4–32.3 Ma). With an approximate date for this single root node, and the DNA sequences in hand, we will try to answer the question: How old is one of the oldest clade in Protea, comprising P. cryophila, P. lorea, P. pruinosa, P. scabriuscula, P. scolopendrifolia ?

Disclaimer

The objective of the exercise is simply to show how to use the BEAUti and BEAST programs. It should not be taken, as is, as an example good for publication.

Step by step instructions

(Optional, if time allows) Before importing into BEATi, use any phylogenetic program you liked to run an analysis of the sequence datasets provided, to confirm that the 5 species listed above truly group into a clade.

1 Import the two files in BEAUti, with File/Import Alignment. For each, we will now go over the top panel (Partitions / Tip dates / Site Model / Clock Model / Priors / MCMC), to choose all the options required to answer the question. BEAUti will then create a .xml file readable by the BEAST program.

2 Tip dates: Nothing is added or modified to this panel

3 Setting the substitution model
It is best to determine which model of nucleotide substitution is best-fit to the alignment by ModelTest or JModelTest2.
For ITS alignment, we'll use the following settings
Substitution rate = 1 ; Check the “Estimates” boxes
Gamma Category Count =4 ; Check the “Estimates” boxes for Shape parameter
Substitution model: HKY Kappa = 2 ; Check the “Estimates” boxes
Frequencies : Estimated
Check the Fix mean mutation rates
Substitution models in BEAUti for ITS

For the trnL, we will choose the model selected by JModelTes2
Substitution models in BEAUti for trnL

4 Setting the clock model
The molecular clock is a hypothesis that mutation rates and substitution rates do not vary among lineages in a tree (see BEAST2 Glossary). Within closely related taxa, within a genus for instance, mutation rates are likely similar, and the strict clock could be used. For distantly related taxa, rate among lineages will certainly vary, and the clock will not be strict. For this exercise, we will select “Strict Clock”.

5 Setting the Priors
This step is the most complicated of the whole process. For the Tree prior, we will select the Yule model. Yule models are generally appropriate with sequences from different species, while Coalescent models are for different populations of a same species. Do the same for both partitions (ITS and trnL)

For BirthRate, leave it at “Uniform”.

For ClockRate for trnL, select Gamma, Alpha=2 estimate, Beta=2,
Clock Rate for trnL

For the GammaShape, leave it as it is, Exponential, Mean=1, check estimate.

For kappa for ITS, also leave it as it is, LogNormal, M=1, S=1.25

We will now add two calibration nodes, to answer the original question. One node will be the root, it will include all taxa (including the single Faurea), and we will set a specific known age to it. The other node will include the 5 species listed above, and we will name this clade “OldClade_ITS” and “OldCladetrnL”. Beast will estimate an age for this clade.
Click the “+” sign at the bottom of the window. Choose which tree to use, begin with ITS.

Select all taxa, and send to the right window. Give a meaningful label to the taxon set e.g. AllTaxa_ITS . Do the same for the other partition (gene), and label it with a different name.
For the two created sets, check the monophyletic box, select Normal, Mean=28.4, Sigma=2.5, do not check the estimate box.

Create two new taxon sets, for the trnL and the ITS tree, each containing the five species P. cryophila, P. lorea, P. pruinosa, P. scabriuscula, P. scolopendrifolia. Label the sets OldClade_ITS and OldClade_trnL. For each set, Select Normal, check the monophyletic box, and add a reasonable number for Mean, Mean=11, check the estimate box, Sigma=1, check estimate.

You can notice that new parameters appeared. Leave them as is for now.

6 MCMC
Chain length= 4000000 Store every= 2000 Burnin (trees excluded from the beginning of analysis)= 500 000 Treelog - Log Every = 1000 Screenlog = log every 2000

7
Using BEAUti, set up the same analysis but under the MCMC options, select the Sample from prior only option. This is to vizualize their distribution in the absence of the sequence data.