# Differences

This shows you the differences between two versions of the page.

r_workshop4 [2018/09/26 15:39] shaun.turney [3.2 T-test]
r_workshop4 [2019/08/08 17:52] (current) mariehbrice [Workshop 4: Linear models]

Line 9: Line 9:
  ====== Workshop 4: Linear models ======
- Developed by: Catherine Baltazar, Bérenger Bourgeois, Zofia Taranu, Shaun Turney, William Vieira
+ Developed by: Catherine Baltazar, Bérenger Bourgeois, Zofia Taranu, Shaun Turney, Willian Vieira
  **Summary:** In this workshop, you will learn how to implement basic linear models commonly used in ecology in R such as simple regression, analysis of variance (ANOVA), analysis of covariance (ANCOVA), and multiple regression. After verifying visually and statistically the assumptions of these models and transforming your data when necessary, the interpretation of model outputs and the plotting of your final model will no longer keep secrets from you!
- Link to associated Prezi: [[https://prezi.com/qk2xegtlj44b/|Prezi]]
+ **Link to new [[https://qcbsrworkshops.github.io/workshop04/workshop04-en/workshop04-en.html|Rmarkdown presentation]]**
+
+ Link to old [[https://prezi.com/qk2xegtlj44b/|Prezi presentation]]
  Download the R script and data for this lesson:

Line 94: Line 96:
  In the following sections, we do not always explicitly restate the above assumptions for every model. Be aware, however, that these assumption are implicit in all linear models, including all models presented below.
+ ====1.4 Test statistics and p-values====
+
+ Once you've run your model in R, you will receive a model output that includes many numbers. It takes practice to understand what each of these numbers means and which to pay the most attention to. The model output includes the estimation of the parameters (the β variables). The output also includes test statistics. The particular test statistic depends on the linear model you are using (t is the test statistic for the linear regression and the t test, and F is the test statistic for ANOVA).
- ====1.4 Work flow====
+ In linear models, the null hypothesis is typically that there is no relationship between two continuous variables, or that there is no difference in the levels of a categorical variable. The larger the absolute value of the test statistic, the more improbable that the null hypothesis is true. The exact probability is given in the model output and is called the p-value. You could think of the p-value as the probability that the null hypothesis is true, although that's a bit of a simplification. (Technically, the p-value is the probability that, given the assumption that the null hypothesis is true, the test statistic would be the same as or of greater magnitude than the actual observed test statistic.) By convention, we consider that if the p value is less than 0.05 (5%), then we reject the null hypothesis. This cut-off value is called α (alpha). If we reject the null hypothesis then we say that the alternative hypothesis is supported: there is a significant relationship or a significant difference. Note that we do not "prove" hypotheses, only support or reject them.
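
The technical definition of the p-value in the added section 1.4 can be checked numerically. The sketch below is only an illustration and is not part of either wiki revision: it assumes simulated data (the names `x`, `y` and `fit` are hypothetical, not the workshop's bird dataset), reads the t statistic and p-value for the slope from the `summary()` output, and recomputes the two-sided p-value by hand from the t distribution.

```r
# Simulated data (an assumption for illustration; not the workshop's bird data)
set.seed(42)
x <- runif(50, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)

# Fit a simple linear regression
fit <- lm(y ~ x)

# The coefficient table holds the parameter estimates, their standard
# errors, the t statistics and the p-values
coefs <- summary(fit)$coefficients
t_slope <- coefs["x", "t value"]
p_slope <- coefs["x", "Pr(>|t|)"]

# Two-sided p-value computed by hand: the probability, under the null
# hypothesis, of a t statistic at least as extreme as the observed one
p_manual <- 2 * pt(abs(t_slope), df = df.residual(fit), lower.tail = FALSE)

# Compare the p-value with the conventional cut-off alpha
alpha <- 0.05
reject_null <- p_slope < alpha

print(c(t = t_slope, p = p_slope, p_manual = p_manual))
print(reject_null)
```

The hand-computed value should agree with the one reported by `summary()`, which makes explicit that the p-value is a tail probability of the test statistic under the null hypothesis rather than the probability that the null hypothesis is true.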
+ ====1.5 Work flow====
  Below we will explore several kinds of linear models. The way you create and interpret each model will differ in the specifics, but the principles behind them and the general work flow will remain the same. For each model we will work through the following steps:
- - Plot the data
+ - Visualize the data (data visualization could also come later in your work flow)
  - Create a model
  - Test the model assumptions

Line 156: Line 162:
  ^ AvgAbund | The average abundance across all sites\\ where found in NA|Continuous/ numeric|
  ^ Mass     | The body size in grams| Continuous/ numeric|
- ^ Diet     | Type of food consumed| Discrete – 5 levels (Plant; PlantInsect;\\ Insect; InserctVert; Vertebrate)|
+ ^ Diet     | Type of food consumed| Discrete – 5 levels (Plant; PlantInsect;\\ Insect; InsectVert; Vertebrate)|
  ^ Passerine| Is it a songbird/ perching bird| Boolean (0/1)|
  ^ Aquatic  | Is it a bird that primarily lives in/ on/ next to the water| Boolean (0/1)|

Line 211: Line 217:
- # Plot Y ~ X and the regression line
  # Plot Y ~ X and the regression line
  plot(bird$MaxAbund ~ bird$Mass, pch=19, col="coral", ylab="Maximum Abundance",

Line 573: Line 578:
  === Running a t-test with lm() ===
- A t-test is a linear model and a specific case of ANOVA (see below) with one factor with 2 levels. As such, we can also run the t-test with the ''lm()'' function in R:
+ A t-test is a linear model and a specific case of ANOVA with one factor with 2 levels. As such, we can also run the t-test with the ''lm()'' function in R:

Line 600: Line 605:
  ==== 3.3 Running an ANOVA ====
- The t-test is only for a single categorical explanatory variable with 2 levels. For all other variables with categorical explanatory variables we use ANOVA. First, let's visualize the data using ''boxplot()''. Recall that by default, R will order you groups in alphabetical order. We can reorder the groups according to the median of each Diet level. \\ Another way to graphically view the effect sizes is to use ''plot.design()''. This function will illustrate the levels of a particular factor along a vertical line, and the overall value of the response is drawn as a horizontal line.
+ The t-test is only for a single categorical explanatory variable with 2 levels. For all other linear models with categorical explanatory variables we use ANOVA. First, let's visualize the data using ''boxplot()''. Recall that by default, R will order you groups in alphabetical order. We can reorder the groups according to the median of each Diet level. \\ Another way to graphically view the effect sizes is to use ''plot.design()''. This function will illustrate the levels of a particular factor along a vertical line, and the overall value of the response is drawn as a horizontal line.

Line 685: Line 690:
  ==== 3.6 Complementary test ====
- Importantly, ANOVA cannot identify which treatment is different from the others in terms of response variable. To determine this, post-hoc tests that compare the levels of the explanatory variables (i.e. the treatments) two by two, must be performed. While several post-hoc tests exist (e.g. Fischer’s least significant difference, Duncan’s new multiple range test, Newman-Keuls method, Dunnett’s test, etc.), the Tukey’s range test is used in this example using the function ''TukeyHSD'' as follows:
+ Importantly, ANOVA cannot identify which treatment is different from the others in terms of response variable. It can only identify that a difference is present. To determine the location of the difference(s), post-hoc tests that compare the levels of the explanatory variables (i.e. the treatments) two by two, must be performed. While several post-hoc tests exist (e.g. Fischer’s least significant difference, Duncan’s new multiple range test, Newman-Keuls method, Dunnett’s test, etc.), the Tukey’s range test is used in this example using the function ''TukeyHSD'' as follows:

Line 1018: Line 1023:
  ==== 6.1 Assumptions ====
- As with models seen above, to be valid ANCOVA models must meet the statistical assumptions of linear models that can be verified using diagnostic plots, i.e.:
+ As with models seen above, to be valid ANCOVA models must meet the statistical assumptions of linear models that can be verified using diagnostic plots. In addition, ANCOVA models must have:
- - Normal distribution of the model residuals
- - Homoscedasticty of the residual variance
- - Independence of the residuals
- - Equal variance between different levels of a given factor
- In addition, ANOVA models must have:
  - The same value range for all covariates
  - Variables that are //fixed//

Line 1280: Line 1280:
  ----
- CHALLENGE 7
+ **CHALLENGE 7**
+
+ Compare the different polynomial models in the previous example, and determine which model is the most appropriate. Extract the adjusted R squared, the regression coefficients, and the p-values of this chosen model.
  ++++ Challenge 7: Solution|