This tutorial is intended to describe how we investigate correlated trait evolution, through methods that examine trait associations between discrete traits. Conceptually, the challenge is to identify associated shifts in two discrete traits while accounting for phylogenetic non-independence (i.e., conditioning the data on the phylogeny). For trait associations between discrete changes, we use likelihood or Bayesian approaches. For continuous traits (described next time), we use one of several implementations of phylogenetic generalized least squares.
To run the analyses in this tutorial, please download the following files from our Git-course repository:
Additionally, The R-script for running the code found in this tutorial is found here:
As before, on your local computer, place them in a folder named ‘TutorialData’, which is found in the same directory as this tutorial.
Recall that in macroevolutionary studies, we are often interested in determining how two or more traits covary across a broad set of taxa. For continuous variables, this is a ‘regression’ problem: we statistically quantify the degree of covaraition between Y and X while accounting for the phylogenetic relationships among species. If a significant relationship is observed, we can conclude that there is an evolutionary association between Y and X, which may be due to some biological factor of interest (e.g., natural selection, adaptation, etc.).
As evolutionary biologists we may wish to ask very similar questions when our phenotypic data are not continuous, but are instead discrete. For example, we may have species with two color ‘states’: aposematic or cryptic. And these same species may be classified behaviorally as either solitary or gregarious. An interesting evolutionary question would be: Is there an association between behavior and coloration? Put another way, are cryptic species more often solitary, and aposematic more gregarious? From the data in Fig 1., we may wish to test this hypothesis statistically.
One might be tempted to simply perform a Chi-square test on these data. However, this ignores evolutionary history, and how these are distributed on the phylogeny can make an enormous difference! In case 1 (Fig. 2A) the traits are clustered in two sub-lineages, implying that a single evolutionary transition in each trait could have caused our pattern. This is not overly impressive evolutionarily, as it represents an \(N=1\) case. On the other hand, case 2 (Fig. 2B) shows many independent associated changes of the two traits. This is far more impressive. Clearly, taking phylogeny into consideration is absolutely essential for correct biological inference.
Before we perform our comparative analysis we must read in our data and our phylogeny into R, and ‘link’ them up by pruning them relative to one another so that all taxa in the tree are found in the data matrix and vice versa. Recall that for this to work properly, the rownames of our data matrix (or ‘names’ of a vector if a single variable) must match the names of the taxa at the tips of the phylogeny. These are found as: phy$tip.label
. It may be helpful to view previous tutorials if you need help reading in and matching data.
geiger
, phytools
, and corHMM
package with library()
.read.tree()
to read in the tree titled tree.64.tre
. This file can be found in the data folder of this tutorialread.csv()
function to read in the discrete trait data called DiscreteData.csv
treedata()
to match the phylogeny tips to the trait data. Be sure to set the sort
argument to TRUE
.tree
and the matched data to a new variable called mydata
. The remainder of the tutorial will assume these variable names for the phylogeny and data frmae with traits, respectively.Be sure to get a feel for the phylogeny (use plot()
) and the data (head()
can be useful here)! It is always nice to know the data we are working with before jumping into analyses.
It can also be helpful to plot the trait data on the tree to help visualize various scenarios. Here we are going to plot two traits side by side, V1
and V3
. We can see that both of these traits are binary as they only consist of \(1\) and \(0\). We’ll plot each of the states a different color for the trait. There are a lot of ways to plot traits on a tree, we’ll be using tiplabels()
to get the job done
plot.phylo(tree,show.tip.label = F) ## setting show.tip.label = F prevents the species names from appearing
## We make a colour key to denote which state gets which color
colorkey1 <- c('red','black') ##trait V1
colorkey2 <- c('green','orange') ##trait V3
names(colorkey1) <-c(0,1) ##Here we are saying that 0s get red and 1s get black
names(colorkey2) <-c(0,1) ##Here we are saying that 0s get green and 1s get orange
V1cols<-colorkey1[as.character(mydata[,1])] ##putting the trait values in as.character() is important!
V3cols<-colorkey2[as.character(mydata[,3])]
tiplabels(pch = 17,col = V1cols,cex=0.6)
tiplabels(pch = 19,col = V3cols,offset = 0.2,cex=0.6)
tiplabels()
This is a general plotting function for the tips of a plotted phylogeny tiplabels()
has many different arguments, it can be helpful to look at the help ?tiplabels
to learn more. We’ll just be going over the 3 arguments used for plotting the discrete character data:
1
shapekey <- c(17,19) ##triangles and circles
names(shapekey) <-c(0,1) ##0s get triangles and 19 get circles
shapes<-shapekey[as.character(mydata[,1])] ##putting the trait values in as.character() is important!
plot(tree,show.tip.label = F)
tiplabels(pch=shapes)
One component of phenotypic pattern that differs between discrete and continous traits concerns transition rates. With continuous data, we envision trait change following Brownian motion, where changes in trait values are independent from time step to time step, both in terms of their magnitude (how much change), and their direction (positive or negative change relative to the previous step). For discrete traits, one can also model trait change that way, and in fact several models do so based on MCMC properties (which also the discrete-form of a Brownian motion process). However, because there are a finite number of states, one can ask additional questions about pattern.
For instance, is the rate of change from state \(A \rightarrow B\) greater than the rate of change ‘back’ from \(B \rightarrow A\)? This question has little meaning for continuous traits, but could be very important for discrete traits. These values are embodied in the transition rate matrix Q, which is obtained from the estimated probability matrix (Fig. 3). Derivation of these matrices is described in Pagel (1994).
From these we can pose a question and test a hypothesis: Is the evolutionary transition rate from state \(A \rightarrow B\) the same as the transition rate from \(B \rightarrow A\)? Evaluating this hypothesis requires fitting two evolutioanry models to the data: one where the rates (qAB, qBA) are equal, and one where the two rates are allowed to vary. The fit of these models may then be evaluated using likelihood ratio tests (LRT), AIC values, or some other statistical approach. Below is an example. We’ll be using getStateMat4Dat()
to create our matrix of state transitions and corHMM()
to fit various models of discrete trait evolution.
getStateMat4Dat()
This creates a rate matrix for the data to describe the discrete character evolution. It has the following parameters:
This function returns a list of objects:
\(\$legend\) A named vector. The elements of the vector are all the unique state combinations in the user data. The names of the vector are the state number assigned to each combination.
\(\$rate.mat\) A rate index matrix describing a single rate class. This is the element we’ll use in corHMM()
.
library(corHMM)
## Loading required package: nloptr
## Loading required package: GenSA
plot.phylo(tree,show.tip.label = F)
tiplabels(pch = 17,col = V1cols)
#Set up data. corHMM is picky about how it wants the data formatted
trt1<-cbind(row.names(mydata),mydata[,1])
#Set up initial rate matrices
trait1_model_er <- getStateMat4Dat(trt1,"ER") #equal transition rates
trait1_model_ard <- getStateMat4Dat(trt1,"ARD") #All rates different
We can then view the models we made by simply typing out the variables. We can see the matrix of transitions in the \(\$rate.mat\). Numbers that are the same refer to those transitions having the same rate and \(0\) denotes a transition that never occurs. We can see the states the column and row numbers refer to by looking at \(\$legend\).
We can also use plotMkmodel()
to make a ball and stick visualization of the transition matrix.
trait1_model_er
## $legend
## 1 2
## "0" "1"
##
## $rate.mat
## (1) (2)
## (1) 0 1
## (2) 1 0
trait1_model_ard
## $legend
## 1 2
## "0" "1"
##
## $rate.mat
## (1) (2)
## (1) 0 2
## (2) 1 0
plotMKmodel(trait1_model_er$rate.mat,rate.cat = 1)
plotMKmodel(trait1_model_ard$rate.mat,rate.cat = 1)
After assessing that we’ve set up the models we wanted we can fit and compare the models with corHMM()
corHMM()
This function fits a model of discrete character evolution. This function has many parameters but we’ll be using the following ones:
1
. Numbers greater than 1
will create hidden rate categories.getStateMat4Dat()
here.This function returns the fit model with parameter estimates and summary statistics. Some elements of interest are:
#Fit models: TRAIT 1
trait1_fit_er <-corHMM(phy = tree, data = trt1, rate.cat = 1, rate.mat = trait1_model_er$rate.mat)
## State distribution in data:
## States: 1 2
## Counts: 32 32
## Beginning thorough optimization search -- performing 0 random restarts
## Finished. Inferring ancestral states using marginal reconstruction.
trait1_fit_ard <-corHMM(phy = tree, data = trt1, rate.cat = 1, rate.mat = trait1_model_ard$rate.mat)
## State distribution in data:
## States: 1 2
## Counts: 32 32
## Beginning thorough optimization search -- performing 0 random restarts
## Finished. Inferring ancestral states using marginal reconstruction.
##The estimated rates
trait1_fit_er$solution
## (1,R1) (2,R1)
## (1,R1) NA 0.008068182
## (2,R1) 0.008068182 NA
trait1_fit_ard$solution
## (1,R1) (2,R1)
## (1,R1) NA 0.008067153
## (2,R1) 0.00806795 NA
#compare models: logL and AIC
trait1_fit_er$loglik
## [1] -5.824139
trait1_fit_ard$loglik
## [1] -5.824139
##We can formally perform a likelihood ratio test of nested models by doing the following
summary_stat <- -2 * (trait1_fit_er$loglik - trait1_fit_ard$loglik) ## -2 * (model_constrained - model_full)
##This is our p-value
pchisq(summary_stat,df = 1,lower.tail=FALSE ) ##degrees of freedom is the difference in the number of parameters between the two models. In our case 2-1=1
## [1] 1
trait1_fit_er$AIC
## [1] 13.64828
trait1_fit_ard$AIC
## [1] 15.64828
First, we can see that both models came to roughly the same transition rate estimates. This would explain why the log likelihoods are markedly similar between the models. As a rule of thumb, when log likelihoods are sufficiently similar we chose the less complex model. Indeed, when we look at AIC scores that factor in model complexity we see a better score for the equal rates model. The comparison reveals that the more complex model (different rates) is not a better fit as compared to the simpler model (equal rates). Thus, one would conclude that the rate of change from state \(A \rightarrow B\) is equivalent to the rate of change from \(B \rightarrow A\) for these data.
Let’s analyze traits 2 and 3 as well to assess whether an equal rates model or all rates different model best describes the data. Specifically, for both traits 2 and 3:
ARD
model and a ER
modelcorHMM()
Which model fits better in each scenario? What are the transition rates of the better fitting models? What would you say about the evolution of the traits?
Having examined questions concerning rates of change between states of a single trait, we can now return to the question of correlated evolution. Here we wish to know whether two traits are associated phylogenetically. For continuous traits, this is akin to phylogenetic regression or correlation. However, for discrete traits, this question becomes one of evaluating transition rates between sets of trait changes.
In essence, this is a generalization of the rate matrix for a single trait, only now we consider two traits. Here, we have a 4 X 4 rate matrix describing the transition rates between character states for 2 traits (Fig. 4). We can see some zeros in the rate matrix for certain transitions, this is because we assume that only one trait changes at a given time (We assume \(0\_0 \leftrightarrow 1\_1\) or \(0\_1 \leftrightarrow 1\_0\) never occurs).
To evaluate trait associations we now fit two models. The first is our ‘independent’ model. Here transition rates are the same regardless of trait state (in other words, the rate is independent of the row or column in the matrix). For binary traits, we can specify an ‘independent mode’ by giving equal rates across the matrix.
The second model is describes correlated (dependent) trait evolution. Here the trait changes depend on the initial state. This model has more free parameters, as all transition rates are permitted to differ from one another. One then fits the data to the phylogeny under this second model.
We create and fit the models just like we did for the single trait case, however, this time our data frame will contain two traits instead of just one.
##first plot the data
V1cols<-colorkey1[as.character(mydata[,1])] ##putting the trait values in as.character() is important!
V2cols<-colorkey2[as.character(mydata[,2])]
plot(tree,show.tip.label = F)
tiplabels(pch = 17,col = V1cols)
tiplabels(pch = 19,col = V2cols,offset = 0.2)
##we set up the same format of data frame as before but now we include two traits: trait 1 and trait 2
trtset12<-cbind(row.names(mydata),mydata[,1:2])
trait12_model_er <- getStateMat4Dat(trtset12,"ER") #equal transition rates
trait12_model_ard <- getStateMat4Dat(trtset12,"ARD") #All rates different
trait12_model_er
## $legend
## 1 2
## "0_0" "1_1"
##
## $rate.mat
## (1) (2)
## (1) 0 0
## (2) 0 0
trait12_model_ard
## $legend
## 1 2
## "0_0" "1_1"
##
## $rate.mat
## (1) (2)
## (1) 0 0
## (2) 0 0
We have two traits so we should have a \(4\times4\) transition matrix but we don’t see that when visualizing, why not? This actually has to do with something sneaky that happens when we make our transition matrix. Since the states of two traits match each other perfectly in that we only ever see \(0\_1\) or \(1\_0\), the two traits effectively get collapsed into one combined trait with two states. We can fix this issue by making our function think all four trait combinations are present
##coax the full transition matrix by having a data frame with all trait combinations
## First get all the unique states for the traits in question
states1<-unique(mydata[,1])
states2<-unique(mydata[,2])
traits12_expanded<-expand.grid('species',states1,states2) ##Having some string at the beginning is important! Our data needs the first column to be populated with names, this achieves that
##try making matrices again
trait12_model_er <- getStateMat4Dat(traits12_expanded,"ER") #equal transition rates
trait12_model_ard <- getStateMat4Dat(traits12_expanded,"ARD") #All rates different
##The full transition matrix. Hot dog!
trait12_model_er
## $legend
## 1 2 3 4
## "0_0" "0_1" "1_0" "1_1"
##
## $rate.mat
## (1) (2) (3) (4)
## (1) 0 1 1 0
## (2) 1 0 0 1
## (3) 1 0 0 1
## (4) 0 1 1 0
trait12_model_ard
## $legend
## 1 2 3 4
## "0_0" "0_1" "1_0" "1_1"
##
## $rate.mat
## (1) (2) (3) (4)
## (1) 0 3 5 0
## (2) 1 0 0 7
## (3) 2 0 0 8
## (4) 0 4 6 0
plotMKmodel(trait12_model_er$rate.mat,rate.cat = 1)
## Warning in if (class(corhmm.obj) == "matrix" & is.null(rate.cat)) {: the
## condition has length > 1 and only the first element will be used
## Warning in if (class(corhmm.obj) == "corhmm") {: the condition has length > 1
## and only the first element will be used
## Warning in if (class(corhmm.obj) == "matrix") {: the condition has length > 1
## and only the first element will be used
## Warning in text.default(x, y, labels = labels, col = label.color, family =
## label.family, : font family not found in Windows font database
plotMKmodel(trait12_model_ard$rate.mat,rate.cat = 1)
## Warning in if (class(corhmm.obj) == "matrix" & is.null(rate.cat)) {: the
## condition has length > 1 and only the first element will be used
## Warning in if (class(corhmm.obj) == "corhmm") {: the condition has length > 1
## and only the first element will be used
## Warning in if (class(corhmm.obj) == "matrix") {: the condition has length > 1
## and only the first element will be used
## Warning in text.default(x, y, labels = labels, col = label.color, family =
## label.family, : font family not found in Windows font database
Neat-o! The matrix seems set up with all character states present. Now we could continue fitting the model exactly as we did before and comparing the log likelihoods and AIC scores. Additionally we can try Pagel’s test for correlated trait evolution.
tr1<-mydata[,1]; names(tr1)<-row.names(mydata)
tr2<-mydata[,2]; names(tr2)<-row.names(mydata)
pagel_fit <- fitPagel(tree,x=tr1,y=tr2)
pagel_fit
##
## Pagel's binary character correlation test:
##
## Assumes "ARD" substitution model for both characters
##
## Independent model rate matrix:
## 0|0 0|1 1|0 1|1
## 0|0 -0.016135 0.008068 0.008068 0.000000
## 0|1 0.008068 -0.016135 0.000000 0.008068
## 1|0 0.008068 0.000000 -0.016135 0.008068
## 1|1 0.000000 0.008068 0.008068 -0.016135
##
## Dependent (x & y) model rate matrix:
## 0|0 0|1 1|0 1|1
## 0|0 0.00000 0.00000 0.00000 0.00000
## 0|1 11.08174 -22.16347 0.00000 11.08174
## 1|0 11.08174 0.00000 -22.16347 11.08174
## 1|1 0.00000 0.00000 0.00000 0.00000
##
## Model fit:
## log-likelihood AIC
## independent -11.648278 31.29656
## dependent -2.079442 20.15888
##
## Hypothesis test result:
## likelihood-ratio: 19.1377
## p-value: 0.000738472
##
## Model fitting method used was fitMk
Pagel’s test finds high evidence for correlated evolution but this could be a farce! The two traits are always co-distributed and clustered. Essentially, 1 evolutionary shift could explain the pattern, this is the ‘BiSSE’ problem (need ancestral state estimation: next week) ALWAYS PLOT YOUR DATA! DON’T JUST RUN STATISTICS (statistical analysis alone here leads to mis-interpretation)
Sometimes we have evolutionary hypotheses that require directionality: “Did the evolution of aposematic coloration precipitate the evolution of gregariousness?” To answer such questions, one must create a model where changes in one trait facilitate changes in another trait. This entails alteration of components in the expected rate matrix \(Q\). For example:
trait_model_er <- getStateMat4Dat(traits12_expanded,"ER") #Transition matrix with equal transition rates for to two binary traits
##We will use the ER transition matrix as a starting point for our directional model
trait_model_dir<-trait_model_er
##Give some of the transitions their own rate
trait_model_dir$rate.mat[1,2]<-2
trait_model_dir$rate.mat[3,4]<-3
trait_model_dir
## $legend
## 1 2 3 4
## "0_0" "0_1" "1_0" "1_1"
##
## $rate.mat
## (1) (2) (3) (4)
## (1) 0 2 1 0
## (2) 1 0 0 1
## (3) 1 0 0 3
## (4) 0 1 1 0
In this case, the equal rate model is the null, whereas the directional rate model is the alternative. For this example, we are explicitly testing the hypothesis that shifts in trait \(Y\) from \(0\rightarrow1\) are facilitated by trait \(X\) being in state \(0\), because we are allowing \(Q(0\_0\rightarrow 0\_1) \neq Q(1\_0\rightarrow1\_1)\). In other words, evolutionary shifts to state \(1\) for \(Y\) are different, depending on whether \(X\) is a state \(0\) or a state \(1\).
IMPORTANT! One must think carefully about the hypothesis under investigation to properly adjust the alternative model to represent it!! Thinking is not only encouraged; it is required.
Example with traits 3 and 4
traits34<-cbind(row.names(mydata),mydata[,3:4])
trait34_fit_er <-corHMM(phy = tree, data = traits34, rate.cat = 1, rate.mat = trait_model_er$rate.mat)
## State distribution in data:
## States: 1 4
## Counts: 32 32
## Beginning thorough optimization search -- performing 0 random restarts
## Finished. Inferring ancestral states using marginal reconstruction.
trait34_fit_dir <-corHMM(phy = tree, data = traits34, rate.cat = 1, rate.mat = trait_model_dir$rate.mat)
## State distribution in data:
## States: 1 4
## Counts: 32 32
## Beginning thorough optimization search -- performing 0 random restarts
## Finished. Inferring ancestral states using marginal reconstruction.
trait34_fit_er$loglik
## [1] -85.97839
trait34_fit_dir$loglik
## [1] -75.81068
trait34_fit_er$AIC
## [1] 173.9568
trait34_fit_dir$AIC ##Strong preference for trait34_fit_dir
## [1] 157.6214
Here we see that there is strong support for directional dependent evolution. That is, evolutionary changes in Y from \(0 \rightarrow 1\) depend upon the state of X (here, X must be a ‘1’ for the transition to occur).
Porgs of the Star Wars Universe are small sea-dwelling birds that are native to the planet Anch-To. They are a hypercurious species that roosts on cliffs and evolved from seabirds.
Many porg species differ by their presence of webbed feet. These webbed feet help the porgs navigate water better and allow them to hunt many different types of sea creatures. Their land dwelling unwebbed-foot relatives typically have fewer niches to exploit as the terrestrial portions of Anch-To are largely devoid of life. These porgs typically rely on eating bugs in small crevices between rocks. Additionally, some porg species have long proboscis-like beaks while others are entirely beakless. Porgs use their beaks to spear fish or reach food in narrow spaces. The beakless porgs use their lack of beak to fit various crustaceans in their strong mouths, cracking their hard exteriors for sustenance.
Let’s analyze the evolution of beaks and webbed feet in porgs. Specifically, we might be interested in the transition rates between states. Do we see equal rates? Do we see transitions in both directions? Does having webbed feet seem to change the dynamics of transitioning between beaked state and a beakless state?
ARD
model for correlated evolution and an ER
model for ‘independent’ trait evolution.corHMM()
Now try fitting a directional character evolution model. How can we use information about their life strategies to inform a directional model? Would we expect to see webbed porgs to transition to a beakless state at the same rate as unwebbed porgs? Would it seem likely that an unwebbed porg with a beak would transtion to a beakless state?
Which model fits better in each scenario? What are the transition rates of the better fitting models? What would you say about the evolution of the traits?