Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
r_workshop6 [2016/09/05 11:14]
vincent_fugere [Workshop 5: Linear mixed effects models]
r_workshop6 [2021/03/23 12:35] (current)
lsherin
Line 5: Line 5:
  
 This series of [[r|10 workshops]] walks participants through the steps required to use R for a wide array of statistical analyses relevant to research in biology and ecology. These open-access workshops were created by members of the QCBS both for members of the QCBS and the larger community. This series of [[r|10 workshops]] walks participants through the steps required to use R for a wide array of statistical analyses relevant to research in biology and ecology. These open-access workshops were created by members of the QCBS both for members of the QCBS and the larger community.
-====== Workshop ​6: Linear mixed effects ​models ======+ 
 +//The content of this workshop has been peer-reviewed by several QCBS members. If you would like to suggest modifications,​ please contact the current series coordinators,​ listed on the main wiki page// 
 + 
 +<wrap em>​IMPORTANT NOTICE: MAJOR UPDATES</​wrap>​ 
 + 
 + 
 +**March 2021 update:** This wiki has been discontinued and is no longer being actively developed or updated. Updated materials for the QCBS R Workshop Series are now housed on the QCBS R Workshop [[https://​github.com/​QCBSRworkshops/​workshop07|GitHub page]].  
 + 
 +Available materials include; 
 +  - The [[https://​qcbsrworkshops.github.io/​workshop07/​pres-en/​workshop07-pres-en.html|Rmarkdown presentation]] for this workshop; 
 +  - An [[https://​qcbsrworkshops.github.io/​workshop07/​book-en/​workshop07-script-en.R|R script]] that follows the presentation;​ 
 +  - [[https://​qcbsrworkshops.github.io/​workshop07/​book-en/​index.html|Written materials]] that accompany the presentation in bookdown format. 
 + 
 +**January 2021 update:** The structure of Workshops 6 + 7 have been changed.  
 + 
 +If you are looking for the workshop covering generalized linear models (GLM), please refer to [[r_workshop7|Workshop 6]]. 
 + 
 +If you are looking for the workshop covering linear and generalized linear mixed models (LMM and GLMM), please refer to [[r_workshop6|Workshop 7]]. 
 + 
 +====== Workshop ​7: Linear ​and generalized linear ​mixed models ======
  
 Developed by: Catherine Baltazar, Dalal Hanna, Jacob Ziegler Developed by: Catherine Baltazar, Dalal Hanna, Jacob Ziegler
  
-**Summary:​** Mixed effects models allow ecologists to overcome a number of limitations associated with traditional linear models. In this workshop, you will learn when it is important to use a mixed effects model to analyze your data. We will walk you through the steps to conduct a linear mixed model analysis, check its assumptions,​ report results, and visually represent your model in R.+Section on GLMMs developed by: Cédric Frenette Dussault, Vincent Fugère, Thomas Lamy, Zofia Taranu 
 + 
 +**Summary:​** Mixed effects models allow ecologists to overcome a number of limitations associated with traditional linear models. In this workshop, you will learn when it is important to use a mixed effects model to analyze your data. We will walk you through the steps to conduct a linear mixed model analysis, check its assumptions,​ report results, and visually represent your model in R. We will also build on the previous workshop to combine knowledge on linear mixed models and extend it to generalized linear mixed effect models. 
 + 
 +**Link to new [[https://​qcbsrworkshops.github.io/​workshop07/​pres-en/​workshop07-pres-en.html|Rmarkdown presentation]]** 
 + 
 +Link to old [[http://​prezi.com/​5syafj73u0qe/​|Prezi presentation]]
  
-Link to associated Prezi: ​[[http://​prezi.com/​5syafj73u0qe/|Prezi]]+Link to old [[http://​prezi.com/​suxdvftadzl4/​|Prezi ​presentation covering GLMMs]]
  
 Download the R script and data for this lesson: ​ Download the R script and data for this lesson: ​
-  -[[http://​qcbs.ca/​wiki/​_media/​qcbs_w5_data.csv| Fish data]] +  -[[http://​qcbs.ca/​wiki/​_media/​qcbs_w6_data.csv| Fish data]] 
-  -[[http://​qcbs.ca/​wiki/​_media/​qcbs_w5_lmm.r| R Code]]+  -[[http://​qcbs.ca/​wiki/​_media/​lmm_e.r| R Code]] 
 +  -[[http://​qcbs.ca/​wiki/​_media/​banta_totalfruits.csv| Arabidopsis data]] 
 +  -[[http://​qcbs.ca/​wiki/​_media/​inverts.csv| Inverts data]] 
 +  -[[http://​qcbs.ca/​wiki/​_media/​glmm_e.r| GLMM Script]] //(this is the script for [[r_workshop7|Workshop 6]] but lines 434-784 cover the GLMM section of this workshop)//​ 
 +  -[[http://​qcbs.ca/​wiki/​_media/​glmm_funs.r| glmm_funs]] //(code to download for use in the GLMM section)//
  
 +In addition, you can look at [[lmm.workshop | a more up-to-date and detailed presentation mixed models]].
  
 ===== Learning Objectives ===== ===== Learning Objectives =====
Line 25: Line 55:
     -Model validation     -Model validation
     -Interpreting results and visualizing the model     -Interpreting results and visualizing the model
 +  -Using LMMs when the assumptions of normality and error distribution are not met (GLMMs).
  
 ===== Overview ===== ===== Overview =====
Line 41: Line 72:
  
 # Place all workshop material in one folder on your computer # Place all workshop material in one folder on your computer
-# Run the following line of code and use the browsing window to choose the QCBS_W5_Data.csv +# Run the following line of code and use the browsing window to choose the QCBS_W6_Data.csv 
 # file in the folder that contains the workshop material # file in the folder that contains the workshop material
 file.choose() file.choose()
Line 61: Line 92:
 library(AICcmodavg) library(AICcmodavg)
  
-data <- read.csv('​QCBS_W5_Data.csv')+data <- read.csv('​QCBS_W6_Data.csv')
 </​code>​ </​code>​
  
Line 669: Line 700:
 ++++ ++++
  
-=====Conclusion & Thought Experiment=====+=====Thought Experiment=====
 Mixed models are really good at accounting for variation in ecological data while not loosing too many degrees of freedom. Mixed models are really good at accounting for variation in ecological data while not loosing too many degrees of freedom.
  
Line 699: Line 730:
 ++++ ++++
 ---- ----
-=====Textbooks=====+ 
 +===== Generalized linear mixed models (GLMMs) ​===== 
 + 
 +As seen above, heterogeneous response distributions can be accurately modelled by relaxing the assumptions of normally distributed errors and letting the errors take on different distribution families (e.g., Poisson or negative binomial). What happens if there is a structure in the dataset such that observations sampled in certain sites are more similar (correlated) than observations sampled in different sites? As we saw in [[r_workshop6|Workshop 6]], we can account for this lack of independence by adding random effects in the model. Recall that intercepts and/or slopes are allowed to vary among grouping factors (random effects) and the variability in slopes/ intercepts among groups is assumed to follow a normal distribution. Random effects are also referred to as shrinkage estimates because they represent a weighted average of the group-level and the overall fit (fixed effect). The amount of shrinking towards the overall fit is more severe if the within-group variability is large compared to the among-group variability (i.e., when we have less confidence in the random estimate due to low sample size or high within-group variability,​ the estimate is '​pulled'​ towards the fixed effect).  
 + 
 +The properties of Linear Mixed Models (Workshop 5) which incorporate random effects and those of Generalized Linear Models (seen above) which handle non-normal data can be combined to effectively model such instances in a Generalized Linear Mixed Model framework. The extension of GLMs to account for this additional structure follows similar steps introduced in the mixed modelling workshop.\\  
 + 
 +To provide a brief overview of GLMMs, we will visit a case study presented by [[http://​www.facecouncil.org/​puf/​wp-content/​uploads/​2009/​10/​Generalized-linear-mixed-models.pdf|Bolker et al. (2009)]] and later by [[http://​www.cell.com/​cms/​attachment/​601623/​4742452/​mmc1.pdf|Bolker et al. (2011)]] where the structure (or lack of independence) in the dataset was due to observations being measured across different populations. The authors evaluated the extent to which the response of //​Arabidopsis thaliana// (mouse-ear cress) to its environmental drivers (nutrient availability and herbivory) varied as a result of the population and/or genotype they belonged to.\\ 
 + 
 +Using the //​Arabidopsis//​ dataset and detailed analysis presented by Bolker and colleagues, we will explore the eff 
 +ects of nutrient levels, herbivory and their interaction (  
 +fixed effects) on the production //​Arabidopsis thaliana// and the across-population and/or across-genotype variability in these relationships (random eff 
 +ects). 
 + 
 + 
 +<file rsplus | Loading > 
 +# First load and view the dataset 
 +dat.tf <- read.csv("​Banta_TotalFruits.csv"​) 
 +str(dat.tf) 
 + 
 +# X  observation number  
 +# reg  factor for one of three regions; Netherlands,​ Spain or Sweden 
 +# popu  factor with a level for each population (random effect) 
 +# gen  factor with a level for each genotype (random effect) 
 +# rack  nuisance factor for one of two greenhouse racks 
 +# nutrient ​ factor with levels for low (value = 1) or high (value = 8) nutrients (fixed effect) 
 +# amd  factor with levels for no damage or simulated herbivory (apical meristem damage) (fixed effect) 
 +# status ​ nuisance factor for germination method 
 +# total.fruits ​ the response variable; an integer indicating the number of fruits per plant  
 + 
 +# 2-3 genotypes nested within each of the 9 populations 
 +table(dat.tf$popu,​dat.tf$gen) 
 + 
 +# Housekeeping:​ make integers into factors, relevel clipping (amd) and rename nutrient levels 
 +dat.tf <- transform(dat.tf,​ 
 +                    X=factor(X),​ 
 +                    gen=factor(gen),​ 
 +                    rack=factor(rack),​ 
 +                    amd=factor(amd,​levels=c("​unclipped","​clipped"​)),​ 
 +                    nutrient=factor(nutrient,​label=c("​Low","​High"​))) 
 + 
 +# Install/​load packages 
 +if(!require(lme4)){install.packages("​lme4"​)} 
 +require(lme4) 
 +if(!require(coefplot2)){install.packages("​coefplot2",​repos="​http://​www.math.mcmaster.ca/​bolker/​R",​type="​source"​)} 
 +require(coefplot2) ​     
 +if(!require(reshape)){install.packages("​reshape"​)} 
 +require(reshape)  
 +if(!require(ggplot2)){install.packages("​ggplot2"​)} 
 +require(ggplot2) 
 +if(!require(plyr)){install.packages("​plyr"​)} 
 +require(plyr) 
 +if(!require(gridExtra)){install.packages("​gridExtra"​)} 
 +require(gridExtra) 
 +if(!require(emdbook)){install.packages("​emdbook"​)} 
 +require(emdbook) 
 +source("​glmm_funs.R"​) 
 +</​file>​ 
 + 
 +=== Structure in dataset === 
 + 
 +When we look at the interaction between nutrient and clipping across the 9 different populations,​ we note that there is a consistent nutrient e 
 +ffect (higher baseline under the High nutrient treatment). The effect of clipping is weaker, but in some populations we note a negative slope.  
 + 
 +<file rsplus| facet by popu> 
 +ggplot(dat.tf,​aes(x=amd,​y=log(total.fruits+1),​colour=nutrient)) + 
 +  geom_point() +  
 +  stat_summary(aes(x= as.numeric(amd)),​fun.y=mean,​geom="​line"​) + 
 +  theme_bw() + theme(panel.margin=unit(0,"​lines"​)) + 
 +  scale_color_manual(values=c("#​3B9AB2","#​F21A00"​)) + # from Wes Anderson Zissou palette 
 +  facet_wrap(~popu) 
 +</​file> ​  
 + 
 +{{ ::​workshp_6_glmm_popu.png?​600 |}} 
 + 
 +A similar plot can be made for the 24 different genotypes (changing facet_wrap(~gen)).\\ 
 +==== Choosing an error distribution ==== 
 + 
 +The response variable is count data which suggests that a Poisson distribution should be used. Recall that an important property of the Poisson distribution is that the variance is equal to the mean. However, as we will see below, the group variances increase with the mean much more rapidly than expected under the Poisson distribution. 
 + 
 +<file rsplus | Heterogeneity across groups>​ 
 + 
 +# Create new variables that represents every combination nutrient x clipping x random factor 
 +dat.tf <- within(dat.tf,​ 
 +
 +  # genotype x nutrient x clipping 
 +  gna <- interaction(gen,​nutrient,​amd) 
 +  gna <- reorder(gna,​ total.fruits,​ mean) 
 +  # population x nutrient x clipping 
 +  pna <- interaction(popu,​nutrient,​amd) 
 +  pna <- reorder(pna,​ total.fruits,​ mean) 
 +}) 
 + 
 + 
 +# Boxplot of total fruits vs new variable (genotype x nutrient x clipping) 
 +ggplot(data = dat.tf, aes(factor(x = gna),y = log(total.fruits + 1))) + 
 +   ​geom_boxplot(colour = "​skyblue2",​ outlier.shape = 21, outlier.colour = "​skyblue2"​) +  
 +   ​theme_bw() + theme(axis.text.x=element_text(angle=90)) +  
 +   ​stat_summary(fun.y=mean,​ geom="​point",​ colour = "​red"​)  
 + 
 +# Boxplot of total fruits vs new variable (population x nutrient x clipping) 
 +ggplot(data = dat.tf, aes(factor(x = pna),y = log(total.fruits + 1))) + 
 +  geom_boxplot(colour = "​skyblue2",​ outlier.shape = 21, outlier.colour = "​skyblue2"​) +  
 +  theme_bw() + theme(axis.text.x=element_text(angle=90)) +  
 +  stat_summary(fun.y=mean,​ geom="​point",​ colour = "​red"​)  
 +</​file>​ 
 + 
 +{{ :​workshp_6_glmm_boxplot.png?​750 |}}  
 +{{ ::​workshop_6_glmm_boxplot_popu.png?​750 |}} 
 + 
 + 
 +As illustrated above, there is a large amount of heterogeneity among the group variances even when the response variable is transformed. We also note that some groups have a mean and variance of zero. 
 + 
 +To determine __which distribution family to use__, we can run a diagnostic plot of the group variances vs their respective means. We provide an example below for the genotype x nutrient x clipping grouping. 
 + 
 +  - If we observe a linear relationship between the variance and the mean with a slope = 1, then the **Poisson** family is appropriate,​  
 +  - If we observe a linear mean-variance relationship with a slope > 1 (i.e. Var = φµ  where φ  > 1), then the **quasi-Poisson** family (as introduced above) should be applied, 
 +  - Finally, a quadratic relationship between the variance and the mean (i.e. Var = µ(1 + α 
 +) or µ(1 + µ/k)), is characteristic of overdispersed data that is driven by an underlying heterogeneity among samples. In this case, the **negative binomial** (Poisson-gamma) would be more appropriate. 
 +{{ :​worksp_6_error_dist.png?​450 |}} 
 + 
 +From the plot above we note that a linear quasi-Poisson may be better than the negative binomial, but additional modelling is needed. 
 +==== Poisson GLMM ==== 
 + 
 +Let's build a GLMM using the ''​glmer''​ function of the //lme4// package. This model has a random intercept for all genotype and population treatments. We include the nuisance variables (rack and germination method) as fixed effects. Given the mean-variance relationship from above, we most likely will need a model with overdispersion,​ but let's start with a Poisson model. 
 + 
 +<file rsplus | Poisson GLMM> 
 + 
 +mp1 <- glmer(total.fruits ~ nutrient*amd + rack + status + 
 +               ​(1|popu)+ 
 +               ​(1|gen),​ 
 +             ​data=dat.tf,​ family="​poisson"​) 
 +</​file>​ 
 + 
 +We can check for overdispersion using a function ''​overdisp_fun''​ provided by [[http://​www.cell.com/​cms/​attachment/​601623/​4742452/​mmc1.pdf|Bolker et al. (2011)]] which divides the Pearson residuals by the residual degrees of freedom and tests whether these values differ significantly (i.e., whether the ratio of residual deviance to residual df is significantly different from 1). A low p-value indicates that the data are over dispersed (i.e., ratio is significantly different from 1).  
 + 
 +<file rsplus | Overdispersion check> 
 +# Overdispersion?​ 
 +overdisp_fun(mp1) 
 + 
 +# Or as above, we can approximate this by dividing the residual deviance by the residual df 
 +summary(mp1) 
 +# residual deviance = 18253.7 and resid df = 616 
 +</​file>​ 
 + 
 +The ratio is significantly greater than unity, so as expected, we need to try a different distribution where the variance increases more rapidly than the mean, such as the negative binomial. 
 +==== Negative binomial (Poisson-gamma) GLMM ==== 
 + 
 +Recall that the negative binomial (or Poisson-gamma) distribution meets the assumption that the variance is **proportional to the square of the mean**. 
 +<file rsplus | NB> 
 +# Note: This model converges well if you are running R version 3.0.2, but may not converge with later  
 +# versions. If you are having convergence issues, please try with version 3.0.2. 
 + 
 +mnb1 <- glmer.nb(total.fruits ~ nutrient*amd + rack + status +  
 +               ​(1|popu)+ 
 +               ​(1|gen),​ 
 +             ​data=dat.tf,​ control=glmerControl(optimizer="​bobyqa"​)) 
 +# Although beyond the scope of this workshop, the new "​control"​ argument specifies the way we 
 +# optimize the parameter values (i.e. by taking the derivative of a function or proceeding by iteration).  
 +# When taking the derivative is not possible, an iterative algorithm such as bobyqa (Bound Optimization 
 +# BY Quadratic Approximation) is used. 
 + 
 +# Overdispersion?​ 
 +overdisp_fun(mnb1) 
 +</​file>​ 
 + 
 +The ratio is now much closer to unity (although the p-value is still less than 0.05). 
 +==== Poisson-lognormal GLMM ==== 
 + 
 +Another option that we have not yet seen is to use a Poisson-lognormal distribution. This approach deals with overdispersion by implementing an observation-level random effects (OLRE; see [[https://​peerj.com/​articles/​616.pdf|Harrison (2014)]]), which model the extra-Poisson variation in the response variable using a random effect with a unique level for every data point.  
 + 
 +A Poisson-lognormal model effectively places a lognormal prior on ε<​sub>​i</​sub>​. A Poisson-lognormal distribution with mean µ and lognormal prior variance σ<​sup>​2</​sup>​ has variance: 
 + 
 +var(y) = µ + µ<​sup>​2</​sup>​ [exp(σ<​sup>​2</​sup>​) - 1] 
 + 
 +In contrast, for the negative binomial, we saw that the distribution was given by: 
 + 
 +var(y) = µ + µ<​sup>​2</​sup>/​k 
 + 
 +More generally, the variance term σ<​sup>​2</​sup>​ in the Poisson-lognormal distribution will depend on the grouping level we select (e.g., at the individual, genotype or population level). That is, the Poisson-lognormal model can allow for a more flexible approach to assigning the observed aggregation to different sources of heterogeneity. Here, to implement the observation-level random effect, we will evaluate it at the individual level.  
 + 
 +<file rsplus | PL GLMM> 
 +mpl1 <- glmer(total.fruits ~ nutrient*amd + rack + status +  
 +               (1|X) + 
 +               ​(1|popu)+ 
 +               ​(1|gen),​ 
 +             ​data=dat.tf,​ family="​poisson",​ control=glmerControl(optimizer="​bobyqa"​)) 
 + 
 +overdisp_fun(mpl1) 
 +</​file>​ 
 + 
 +We did it! The ratio between residual deviance and residual df now meets our criterion (in fact, it is even //smaller// 1). 
 + 
 +=== Random intercepts === 
 + 
 +Now that we have the appropriate error distribution,​ we can test the importance of the random intercepts (for //​population//​ and //​genotype//​) by comparing nested models with and without the random effects of interest using either:  
 +  - an __information theoretic approach__ (such as, Akaike Information Criterion; AIC), which as we saw in [[r_workshop6#​coding_potential_models_and_model_selection|Workshop 6]] examines several competing hypotheses (models) simultaneously to identify the model with the highest predictive power given the data. As before, we will use the AICc to correct for small sample sizes. Or, 
 +  - a __frequentist approach__ (traditional null hypothesis testing or drop1 approach), where the significance of each term is evaluated in turn (by comparing nested models together using the ''​anova()''​ function and the significance of the likelihood ratio test; LRT). It's important to keep in mind that with this approach we are testing a null hypothesis of zero variance for the random effects, but given that we cannot have negative variance, we are testing the parameter on the boundary of its feasible region. Therefore, the reported p value is approximately twice what it should be (i.e., we've truncated half of the possible values that fall below 0). 
 + 
 +<file rsplus| Evaluating the random intercepts>​ 
 +summary(mpl1)$varcor 
 + 
 +# popu only 
 +mpl1.popu <- glmer(total.fruits ~ nutrient*amd + rack + status +  
 +                     (1|X) + 
 +                     ​(1|popu),​  
 +                     ​data=dat.tf,​ family="​poisson",​ control=glmerControl(optimizer="​bobyqa"​)) 
 + 
 +# gen only 
 +mpl1.gen <​-glmer(total.fruits ~ nutrient*amd + rack + status +  
 +                   (1|X) + 
 +                   ​(1|gen),​  
 +                   ​data=dat.tf,​ family="​poisson",​ control=glmerControl(optimizer="​bobyqa"​)) 
 + 
 +# IC approach using AICc 
 +ICtab(mpl1, mpl1.popu, mpl1.gen, type = c("​AICc"​)) 
 +#            dAICc df 
 +# mpl1       ​0.0 ​  10 
 +# mpl1.popu ​ 2.0   9  
 +# mpl1.gen ​  ​16.1 ​ 9  
 + 
 +# Frequentist approach using LRT 
 +anova(mpl1,​mpl1.popu) 
 +# Data: dat.tf 
 +# Df         ​AIC ​ BIC      logLik ​ deviance ​ Chisq   ​Chi ​   Df    Pr(>​Chisq) ​  
 +# mpl1.popu ​ 9    5017.4 ​  ​5057.4 ​ -2499.7 ​  ​4999.4 ​                           
 +# mpl1       ​10 ​  ​5015.4 ​  ​5059.8 ​ -2497.7 ​  ​4995.4 ​ 4.0639 ​ 1    0.04381 * 
 +#   --- 
 +#   ​Signif. codes: ​ 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
 + 
 +anova(mpl1,​mpl1.gen) 
 +# Df        AIC  BIC     ​logLik deviance ​ Chisq    Chi     ​Df ​  ​Pr(>​Chisq) ​    
 +# mpl1.gen ​ 9    5031.5 ​ 5071.5 -2506.8 ​  ​5013.5 ​                             
 +# mpl1      10   ​5015.4 ​ 5059.8 -2497.7 ​  ​4995.4 ​  ​18.177 ​ 1   ​2.014e-05 *** 
 +#   --- 
 +#   ​Signif. codes: ​ 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
 +</​file>​ 
 + 
 +The model without the random intercept for //​genotype//​ is within 2 AICc units of the full model, which indicates that they are equally plausible (i.e., little evidence for including a random intercept for genotype). However, when using the Likelihood approach, and keeping in mind the boundary effect (p-values are inflated by a factor of 2), we note that p << 0.05 in both anova tests. Thus the model with both random terms (mpl1) is selected. 
 + 
 +=== Parameter plots === 
 + 
 +A graphical representation of the model parameters can be obtained using the ''​coefplot2''​ function. For example, to view the variance terms of our three random intercepts:​ 
 + 
 +<file rsplus| Coefplot2>​ 
 +# Variance terms 
 +coefplot2(mpl1,​ptype="​vcov",​intercept=TRUE,​main="​Random effect variance"​) 
 + 
 +# Fixed effects 
 +coefplot2(mpl1,​intercept=TRUE,​main="​Fixed effect coefficient"​) 
 +</​file>​ 
 + 
 +{{::​glmm_coefplot_int.png?​450|}} {{::​glmm_coefplot_pars.png?​400|}} 
 + 
 +Note that the error bars are only shown for the fixed effects because ''​glmer''​ doesn'​t give us information on the uncertainty of the variance terms. 
 + 
 +=== Random intercept plots === 
 + 
 +We can also extract the random effect (or group-level) deviations from the fixed intercept using the ''​ranef''​ function. This will tell us how much the intercept is shifted up or down in particular //​populations//​ or //​genotypes//​ relative to the fixed intercept. The deviations can then be plotted using ''​dotplot'',​ which will return a two-facetted plot for each random effect (i.e., popu and gen). Note: the ''​grid.arrange''​ function was used to omit the observation-level random effect (i.e. (1|X)).  
 + 
 +<file rsplus| dotplot>​ 
 +pp <- list(layout.widths=list(left.padding=0,​ right.padding=0),​ 
 +           ​layout.heights=list(top.padding=0,​ bottom.padding=0)) 
 +r2 <- ranef(mpl1,​condVar=TRUE) 
 +d2 <- dotplot(r2, par.settings=pp) 
 +grid.arrange(d2$gen,​d2$popu,​nrow=1) 
 +</​file>​ 
 + 
 +{{ ::​glmm_dtoplot.png?​450 |}} 
 + 
 +From this plot we can see a hint of regional variability among populations where the Spanish populations (SP) have larger values than Swedish (SW) and Dutch (NL) populations. The difference among genotypes seems to be largely driven by genotype 34. 
 + 
 +As seen in Workshop 5, we could have also used the ''​coef()''​ function to plot the random effect predictions or "​estimates",​ where the sum of random intercept deviation (obtained from ''​ranef()''​) and the fixed effect (obtained from ''​fixef()''​) is equal to the parameter estimate obtained from ''​coef()''​. 
 + 
 +=== Random slopes === 
 + 
 +++++ Bonus Section: Random slopes| 
 + 
 +**NB**: We cannot test for random slopes in this dataset because whenever random slopes for the fixed effects are included in the model, we obtain a strong correlation between the random intercepts and slopes. These perfect correlations may result from our model being over-parametrized. Thus, while there could very well be some variation in the nutrient or clipping e 
 +ffect at the genotype or population levels, and while this variation may be what we are most interested in, there is simply not enough data to test these e 
 +ffects with confidence. 
 + 
 +Let's have a look at an example of this by testing the random slope terms for clipping: 
 + 
 +<file rsplus | Poisson-lognormal with random slopes>​ 
 +# Poisson-lognormal with random slopes for popu and amd 
 +mpl2 <- glmer(total.fruits ~ nutrient*amd + rack + status +  
 +                (1|X) + 
 +                (amd|popu) + 
 +                (amd|gen),​ 
 +              data=dat.tf,​ family="​poisson",​ control=glmerControl(optimizer="​bobyqa"​)) 
 +</​file>​ 
 + 
 +We can examine the variance components using the model ''​summary''​. Alternatively,​ we could examine the correlation matrix among the random clipping intercepts and slopes. A third option is to use the ''​printvc''​ function that prints the variance-covariance matrices along with an equals sign ( = ) next to any covariance component with a nearly perfect correlation. 
 + 
 +<file rsplus | VarCorr>​ 
 +summary(mpl2) # option 1 
 +attr(VarCorr(mpl2)$gen,"​correlation"​) # option 2 
 +printvc(mpl2) # option 3 
 +</​file>​ 
 + 
 +Notice the perfect correlation between (Intercept) and amdclipped (slope). 
 +++++ 
 + 
 +=== Final model === 
 + 
 +We end this section on GLMM with the evaluation of the fixed effects. We first code potential models and compare them using AICc.  
 + 
 +<file rsplus| AICc approach>​ 
 +mpl2 <- update(mpl1,​ . ~ . - rack) # model without rack 
 +mpl3 <- update(mpl1,​ . ~ . - status) # model without status 
 +mpl4 <- update(mpl1,​ . ~ . - amd:​nutrient) # without amd:​nutrient interaction 
 +ICtab(mpl1,​mpl2,​mpl3,​mpl4,​ type = c("​AICc"​)) 
 +#     dAICc df 
 +# mpl1  0.0  10 
 +# mpl4  1.4  9  
 +# mpl3  1.5  8  
 +# mpl2 55.0  9  
 +</​file>​ 
 + 
 +Alternatively,​ we can use the functions ''​drop1''​ and ''​dfun'',​ where dfun converts the AIC values returned by the drop1 into ΔAIC values (producing a similar table to ''​ICtab''​ above). 
 + 
 +<file rsplus| Frequentist approach (drop1) > 
 +(dd_LRT <- drop1(mpl1,​test="​Chisq"​)) 
 +# Model: 
 +#   ​total.fruits ~ nutrient * amd + rack + status + (1 | X) + (1 | popu) + (1 | gen) 
 +#              Df    AIC   ​LRT ​  ​Pr(Chi) ​    
 +# <​none> ​         5015.4 ​                     
 +# rack          1 5070.5 57.083 4.179e-14 *** 
 +# status ​       2 5017.0 ​ 5.612   ​0.06044 .   
 +# nutrient:​amd ​ 1 5016.8 ​ 3.444   ​0.06349 .   
 +# --- 
 +#   ​Signif. codes: ​ 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
 + 
 +(dd_AIC <- dfun(drop1(mpl1))) 
 +#               ​Df ​  ​dAIC 
 +# <​none> ​          ​0.000 
 +# rack          1 55.083 
 +# status ​       2  1.612 
 +# nutrient:​amd ​ 1  1.444 
 +</​file>​ 
 + 
 +There is a strong effect of rack (a change in AIC of 55 if we remove this variable), whereas the effects of status and of the interaction term are weak (change in AIC when either is dropped is less than 2). We can thus start by removing the non-signifi  
 +cant interaction term so that we can test the main e 
 +ffects of nutrient and clipping. 
 + 
 +<file rsplus| Dropping interaction term> 
 +mpl2 <- update(mpl1,​ . ~ . - and:​nutrient) 
 + 
 +# Use AICc: 
 +mpl3 <- update(mpl2,​ . ~ . - rack) # model without rack or interaction 
 +mpl4 <- update(mpl2,​ . ~ . - status) # model without status or interaction 
 +mpl5 <- update(mpl2,​ . ~ . - nutrient) # without nutrient or interaction 
 +mpl6 <- update(mpl2,​ . ~ . - amd) # without clipping or interaction 
 +ICtab(mpl2,​mpl3,​mpl4,​mpl5,​mpl6,​ type = c("​AICc"​)) 
 +#     dAICc df 
 +# mpl2   0.0 9  
 +# mpl4   1.2 7  
 +# mpl6  10.2 8  
 +# mpl3  54.2 8  
 +# mpl5 135.6 8  
 + 
 +# Or use drop1: 
 +(dd_LRT2 <- drop1(mpl2,​test="​Chisq"​)) 
 +# Model: 
 +#   ​total.fruits ~ nutrient + amd + rack + status + (1 | X) + (1 | popu) + (1 | gen) 
 +#            Df    AIC     ​LRT ​  ​Pr(Chi) ​    
 +#   <​none> ​     5016.8 ​                      
 +#   ​nutrient ​ 1 5152.5 137.688 < 2.2e-16 *** 
 +#   ​amd ​      1 5027.0 ​ 12.218 0.0004734 *** 
 +#   ​rack ​     1 5071.0 ​ 56.231 6.443e-14 *** 
 +#   ​status ​   2 5018.1 ​  5.286 0.0711639 .   
 +# --- 
 +#   ​Signif. codes: ​ 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
 + 
 +(dd_AIC2 <- dfun(drop1(mpl2))) 
 +#           ​Df ​   dAIC 
 +# <​none> ​       0.000 
 +# nutrient ​ 1 135.688 
 +# amd       ​1 ​ 10.218 
 +# rack      1  54.231 
 +# status ​   2   ​1.286 
 + 
 +summary(mpl2) 
 +</​file>​ 
 + 
 +Both the main effects of nutrient and clipping are strong (large change in AIC of 135.6 (mpl5) and 10.2 (mpl6) if either nutrient or clipping are dropped, respectively). Our fi  
 +nal model includes the   
 +fixed nutrient (strong positive) and clipping (negative) effects, the nuisance variables rack, the observation-level random e 
 +ffects (1 | X) to account for over-dispersion and the variation in overall fruit set at the //​population//​ and //​genotype//​ levels. 
 + 
 + 
 +** CHALLENGE 10 ** 
 + 
 +Using the //inverts// dataset (larval development times (PLD) of 74 marine invertebrate and vertebrate species reared at different temperatures and time), answer the following questions:​ 
 + 
 +  - What is the effect of feeding type and climate (fixed effects) on PLD? 
 +  - Does this relationship vary among taxa (random effects)? 
 +  - What is the best distribution family for this count data? 
 +  - Finally, once you determined the best distribution family, re-evaluate your random and fixed effects. 
 + 
 +++++ Challenge: Solution 3| 
 +<file rsplus | Effect of feeding type and temperature on PLD> 
 +# Load data 
 +inverts <- read.csv("​inverts.csv"​) 
 +str(inverts) 
 + 
 +mlm1 <- lm(PLD ~ temp*feeding.type,​ data=inverts) 
 +anova(mlm1) # all variables are significant 
 +</​file>​ 
 +++++ 
 + 
 +++++ Challenge: Solution 2| 
 +<file rsplus | Relationships across taxa > 
 + 
 +# Response vs fixed effects 
 +ggplot(inverts,​aes(x=temp,​y=log(PLD+1),​colour=feeding.type)) + 
 +  geom_point() + 
 +  stat_summary(aes(x=as.numeric(temp)),​fun.y=mean,​geom="​line"​) + 
 +  theme_bw() + 
 +  scale_color_manual(values=c("#​3B9AB2","#​F21A00"​)) + # from Wes Anderson Zissou palette 
 +  facet_wrap(~taxon) 
 + 
 +# Create new variables that represents every combination feeding type x temp x taxa (random effect) 
 +inverts <- within(inverts,​ 
 +
 +  # taxon x feeding.type 
 +  tft <- interaction(taxon,​feeding.type,​temp) 
 +  tft <- reorder(tft,​ PLD, mean) 
 +}) 
 + 
 +# Boxplot of total fruits vs new variable (feeding type x temp x taxa) 
 +ggplot(data = inverts, aes(factor(x = tft),y = log(PLD))) + 
 +  geom_boxplot(colour = "​skyblue2",​ outlier.shape = 21, outlier.colour = "​skyblue2"​) +  
 +  theme_bw() + theme(axis.text.x=element_text(angle=90)) +  
 +  stat_summary(fun.y=mean,​ geom="​point",​ colour = "​red"​)  
 +</​file>​ 
 +++++ 
 + 
 +++++ Challenge: Solution 3| 
 +<file rsplus | Comparison of distribution families>​ 
 + 
 +grpVars <- tapply(inverts$PLD,​ inverts$tft,​ var) 
 +summary(grpVars) 
 + 
 +grpMeans <- tapply(inverts$PLD,​inverts$tft,​ mean) 
 +summary(grpMeans) 
 + 
 +# Quasi-Poisson 
 +lm1 <- lm(grpVars~grpMeans-1)  
 +phi.fit <- coef(lm1) 
 +# The -1 specifies a model with the intercept set to zero 
 + 
 +# Negative binomial 
 +lm2 <- lm(grpVars ~ I(grpMeans^2) + offset(grpMeans)-1) 
 +k.fit <- 1/​coef(lm2) 
 +# Offset() used to specify that we want group means added as a term with its coefficient fixed to 1 
 + 
 +# Non-parametric loess fit 
 +Lfit <- loess(grpVars~grpMeans) 
 + 
 +plot(grpVars ~ grpMeans, xlab="​group means",​ ylab="​group variances"​ ) 
 +abline(a=0,​b=1,​ lty=2) 
 +text(60,​200,"​Poisson"​) 
 +curve(phi.fit*x,​ col=2,​add=TRUE) 
 +# bquote() is used to substitute numeric values in equations with symbols 
 +text(60,​800,​ 
 +     ​bquote(paste("​QP:​ ",​sigma^2==.(round(phi.fit,​1))*mu)),​col=2) 
 +curve(x*(1+x/​k.fit),​col=4,​add=TRUE) 
 +text(60,​1600,​paste("​NB:​ k=",​round(k.fit,​1),​sep=""​),​col=4) 
 +mvec <- 0:120 
 +lines(mvec,​predict(Lfit,​mvec),​col=5) 
 +text(50,​1300,"​loess",​col=5) 
 + 
 +# Poisson GLMM  
 +mp1 <- glmer(PLD ~ temp*feeding.type + 
 +               ​(1|taxon),​ 
 +             ​data=inverts,​ family="​poisson"​) 
 +overdisp_fun(mp1) 
 +# ratio is significantly >1 
 + 
 +# NB GLMM 
 +mnb1 <- glmer.nb(PLD ~ temp*feeding.type + 
 +                   ​(1|taxon),​ 
 +                 ​data=inverts) 
 +overdisp_fun(mnb1) 
 +# Looks good! 
 +</​file>​ 
 +++++ 
 + 
 +++++ Challenge: Solution 4| 
 +<file rsplus | Re-evaluating random and fixed effects > 
 + 
 +# Re-evaluating random intercepts 
 +summary(mnb1)$varcor 
 + 
 +mnb1.taxless <- glm.nb(PLD ~ temp*feeding.type,​ 
 +                       ​data=inverts) 
 +# Here, because we are comparing a glmer with a glm, we must do something different than calling anova() 
 +# to test importance of random intercept. We will compare the likelihood of each model: 
 +NL1 <- -logLik(mnb1) 
 +NL0 <- -logLik(mnb1.taxless) 
 +devdiff <- 2*(NL0-NL1) 
 +dfdiff <- attr(NL1,"​df"​)-attr(NL0,"​df"​) 
 +pchisq(devdiff,​dfdiff,​lower.tail=FALSE) 
 + 
 +# Could also look at dAIC to compare model with (mnb1) and without (mnb1.taxless) random effects 
 +# Using AICtab() function 
 +AICtab(mnb1,​mnb1.taxless)  
 +# Large change in AIC if drop random intercept. Therefore, worth keeping this in. 
 + 
 +# Diagnostic plots  
 +locscaleplot(mnb1) 
 + 
 +# Plotting variance terms  
 +coefplot2(mnb1,​ptype="​vcov",​intercept=TRUE,​main="​Random effect variance"​) 
 + 
 +# Plot of fixed effects  
 +coefplot2(mnb1,​intercept=TRUE,​main="​Fixed effect coefficient"​) 
 + 
 +# Plotting random intercepts  
 +pp <- list(layout.widths=list(left.padding=0,​ right.padding=0)) 
 +r2 <- ranef(mnb1,​condVar=TRUE) 
 +d2 <- dotplot(r2, par.settings=pp) 
 +grid.arrange(d2$taxon,​nrow=1) 
 + 
 +# Evaluate random slopes 
 +mnb2 <- glmer.nb(PLD ~ temp*feeding.type + 
 +                   ​(PLD|taxon),​ 
 +                 ​data=inverts) 
 +                  
 +# View variance-covariance components 
 +summary(mnb2) # option 1 
 +attr(VarCorr(mnb2)$taxon,"​correlation"​) # option 2 
 +printvc(mnb2) # option 3 
 +# Strong correlation between random effects -> not enough power to test random slopes 
 + 
 +# Re-evaluating fixed effects  
 +# Note: to run the drop1 we need to speficy the theta parameter and run the NB model with glmer: 
 +theta.mnb1 <- theta.md(inverts$PLD,​ fitted(mnb1),​ dfr = df.residual(mnb1)) 
 +mnb1 <- glmer(PLD ~ temp*feeding.type + 
 +                (1|taxon),​ 
 +              data=inverts,​ family=negative.binomial(theta=theta.mnb1)) 
 + 
 +(dd_LRT <- drop1(mnb1,​test="​Chisq"​)) 
 +(dd_AIC <- dfun(drop1(mnb1))) 
 +# dAIC when feeding.type x temp interaction is dropped is greater than 2, suggest to keep  
 +# interaction in model 
 +</​file>​ 
 +++++ 
 + 
 +=====Additional resources===== 
 + 
 +===LMM Textbooks===
 Gelman, A., and Hill, J. (2006). Data analysis using regression and multilevel/​hierarchical models (Cambridge University Press). Gelman, A., and Hill, J. (2006). Data analysis using regression and multilevel/​hierarchical models (Cambridge University Press).
  
 Zuur, A., Ieno, E.N., Walker, N., Saveliev, A.A., and Smith, G.M. (2009). Mixed effects models and extensions in ecology with R (Springer). Zuur, A., Ieno, E.N., Walker, N., Saveliev, A.A., and Smith, G.M. (2009). Mixed effects models and extensions in ecology with R (Springer).
  
 +===GLMM resources===
 +**Books**
 +
 +B. Bolker (2009) [[http://​ms.mcmaster.ca/​~bolker/​emdbook/​|Ecological Models and Data in R]]. Princeton University Press.\\
 +A. Zuur et al. (2009) [[http://​link.springer.com/​book/​10.1007/​978-0-387-87458-6|Mixed Effects Models and Extensions in Ecology with R]]. Springer.\\
 +
 +**Articles **
 +
 +Harrison, X. A., L. Donaldson, M. E. Correa-Cano,​ J. Evans, D. N. Fisher, C. E. D. Goodwin, B. S. Robinson, D. J. Hodgson, and R. Inger. 2018. [[https://​peerj.com/​preprints/​3113/​|A brief introduction to mixed effects modelling and multi-model inference in ecology]]. ​ PeerJ 6:​e4794–32.
 +
 +**Websites**
 +
 +[[http://​glmm.wikidot.com/​|GLMM for ecologists]]:​ A great website on GLMM with a Q&A section!