Differences

r_workshop2 [2018/10/02 15:52]
katherinehebert [Create and populate columns with ''mutate()'']
r_workshop2 [2021/10/13 16:03] (current)
lsherin
Line 1: Line 1:
 +<WRAP group>
 +<WRAP centeralign>​
 +<WRAP important>​
 +<wrap em> __MAJOR UPDATE__ </​wrap> ​
 +
 +<wrap em> As of Fall 2021, this wiki has been discontinued and is no longer being actively developed. </​wrap> ​
 +
 +<wrap em> All updated materials and announcements for the QCBS R Workshop Series are now housed on the [[https://​r.qcbs.ca/​workshops/​r-workshop-02/​|QCBS R Workshop website]]. Please update your bookmarks accordingly to avoid outdated material and/or broken links. </​wrap>​
 +
 +<wrap em> Thank you for your understanding,​ </​wrap>​
 +
 +<wrap em> Your QCBS R Workshop Coordinators. </​wrap>​
 +
 +</​WRAP>​
 +</​WRAP>​
 +<WRAP clear></​WRAP>​
 +
 ======= QCBS R Workshops ======= ======= QCBS R Workshops =======
  
Line 11: Line 28:
 Developed by: Johanna Bradie, Vincent Fugère, Thomas Lamy Developed by: Johanna Bradie, Vincent Fugère, Thomas Lamy
  
-**Summary:** In this workshop, you will learn how to load, view, and manipulate your data in R. You will learn basic commands to inspect and visualize your data, and learn how to fix errors that may have occurred while loading your data into R. In addition, you will learn how to write an R script, which is a text file that contains your R commands and allows you to rerun your analyses in one simple touch of a key (or maybe two, or three…)! We will then introduce tidyr and dplyr, two powerful tools to manage and re-format your dataset, as well as apply simple or complex functions on subsets of your data. This workshop will be useful for those progressing through the entire workshop series, but also for those who already have some experience in R and would like to become proficient with new tools and packages.+**Summary:** In this workshop, you will learn how to load, view, and manipulate your data in R. You will learn basic commands to inspect and visualize your data, and learn how to fix errors that may have occurred while loading your data into R. In addition, you will learn how to write an R script, which is a text file that contains your R commands and allows you to rerun your analyses in one simple touch of a key (or maybe two, or three…)! We have included an advanced users section where we will introduce tidyr and dplyr, two powerful tools to manage and re-format your dataset, as well as apply simple or complex functions on subsets of your data. This workshop will be useful for those progressing through the entire workshop series, but also for those who already have some experience in R and would like to become proficient with new tools and packages. 
 + 
 +**Link to new [[https://​qcbsrworkshops.github.io/​workshop02/​workshop02-en/​workshop02-en.html|Rmarkdown presentation]]**
  
-Link to associated Prezi: [[http://​prezi.com/​wg4rggjfqucv/​qcbs-r-workshop-2/​|Prezi]] 
  
 Download the R script and data for this lesson: ​ Download the R script and data for this lesson: ​
-  - [[http://​qcbs.ca/​wiki/​_media/​script_workshop2.R|Script]]+  - [[http://​qcbs.ca/​wiki/​_media/​script_workshop02-en.r|Script]]
   - [[http://​qcbs.ca/​wiki/​_media/​co2_good.csv|Dataset 1]]   - [[http://​qcbs.ca/​wiki/​_media/​co2_good.csv|Dataset 1]]
-  - [[http://qcbs.ca/wiki/_media/​co2_broken.csv|Dataset 2]]+  - [[https://raw.githubusercontent.com/QCBSRworkshops/workshop02/​dev/​workshop02-en/​data/​co2_broken.csv|Dataset 2]] (//After following this link, right-click on the page to save the file as .csv//).
  
 ===== Learning Objectives ===== ===== Learning Objectives =====
  
-  - Creating an R project
-  - Writing a script
-  - Loading, exploring and saving data
-  - Learn to manipulate data frames with tidyr, dplyr, magrittr
 +1. Creating an R project
 +
 +2. Writing a script
 +
 +3. Loading, exploring and saving data
 +
 +(For advanced users)
 +
 +4. Learn to manipulate data frames with tidyr, dplyr, magrittr
  
 ===== RStudio Projects ===== ===== RStudio Projects =====
  
 What is this?  What is this? 
-  - Within RStudio, ​Projects make it easy to separate and keep your work organized. ​  +  - Projects make it easy to keep your work organized. ​  
-  - All files, scripts, documentation related to a specific project are bound together+  - All files, scripts, documentation related to a specific project are bound together ​with a .Rproj file
  
-Encourages reproducibility and easy sharing+Encourages reproducibility and easy sharing
  
 ===== Create a new project ===== ===== Create a new project =====
Line 44: Line 67:
  
 One project = one folder One project = one folder
 +
 +Place similar files inside of their own folders ​
 +
 +Keep track of versions
  
 {{:​0_folderdata1.png?​400|}} {{:​0_folderdata1.png?​400|}}
Line 53: Line 80:
   * file -> save as .csv   * file -> save as .csv
  
-====Choose file names wisely====+====Naming files====
   * Good:    * Good: 
     * rawDatasetAgo2017.csv     * rawDatasetAgo2017.csv
Line 64: Line 91:
     * Dont.separate.names.with.dots.csv //(Can lead to reading file errors!)//     * Dont.separate.names.with.dots.csv //(Can lead to reading file errors!)//
  
-====Choose variable names wisely====+====Naming variables====
   * Use short informative titles (e.g. "Time_1" not "First time measurement")   * Use short informative titles (e.g. "Time_1" not "First time measurement")
     * Good: "​Measurements",​ "​SpeciesNames",​ "​Site"​     * Good: "​Measurements",​ "​SpeciesNames",​ "​Site"​
Line 72: Line 99:
  
  
-====Things to consider with your data====+====Common ​data preparation mistakes==== 
   * No text in numeric columns   * No text in numeric columns
   * Do not include spaces!   * Do not include spaces!
Line 87: Line 115:
  
 {{:​excel_notes.png|}} {{:​excel_notes.png|}}
-{{:​horribledata.png|}}+
  
 It is possible to do all your data preparation work within R. This has several benefits: ​ It is possible to do all your data preparation work within R. This has several benefits: ​
Line 249: Line 277:
   * Factors loaded as text (character) and vice versa   * Factors loaded as text (character) and vice versa
   * Factors including too many levels because of a typo   * Factors including too many levels because of a typo
-  * Numeric or integer data being loaded as character due to a typo (including ​space or using a comma instead of a "​."​ for a decimal)+  * Numeric or integer data being loaded as character due to a typo (including space or using a comma instead of a "​."​ for a decimal)
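
The last error in this list can be reproduced with a few lines of toy data. The sketch below is only an illustration: the column names and values are hypothetical, and the text is read from a string rather than from a file.

<code rsplus | >
# Hypothetical toy data: one value uses a decimal comma instead of a "."
csv_text <- 'site,uptake
A,16.0
B,"30,4"
C,34.8'

dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(dat)  # uptake is loaded as character because of the "30,4" entry

# Replace the comma with a dot, then convert the column to numeric
dat$uptake <- as.numeric(sub(",", ".", dat$uptake, fixed = TRUE))
str(dat)  # uptake is now numeric
</code>

Checking ''str()'' right after loading is usually the quickest way to spot this kind of problem.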
  
 **Exercise** ​ **Exercise** ​
Line 421: Line 449:
 Let's practice how to solve some common errors. ​ Let's practice how to solve some common errors. ​
  
-==== Fix a broken dataframe ​CHALLENGE ​====+==== Fix a broken dataframe ====
  
 # Read co2_broken.csv file into R and find the problems # Read co2_broken.csv file into R and find the problems
Line 451: Line 479:
 **HINT: There are 4 problems!** **HINT: There are 4 problems!**
  
-Answers: 
- 
-Answer #1 
-<​hidden>​ 
 Problem #1: The data appears to be lumped into one column Problem #1: The data appears to be lumped into one column
  
 Solution: Solution:
-<​hidden>​+
 Re-import the data, but specify the separation among entries. Re-import the data, but specify the separation among entries.
 The sep argument tells R what character separates the values on each line of the file. The sep argument tells R what character separates the values on each line of the file.
Line 466: Line 490:
 ?read.csv ?read.csv
 </​code>​ </​code>​
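
As a self-contained illustration of the ''sep'' argument, the sketch below reads a small, hypothetical space-separated text (not the workshop file) with and without declaring the separator.

<code rsplus | >
# Hypothetical space-separated text standing in for a file on disk
txt <- "Plant Type conc uptake
Qn1 Quebec 95 16.0
Qn1 Quebec 175 30.4"

# read.csv() expects commas by default, so each line becomes a single value
lumped <- read.csv(text = txt)
ncol(lumped)  # 1 column

# Declaring the separator fixes this; sep = "" means "any white space"
fixed <- read.csv(text = txt, sep = "")
ncol(fixed)   # 4 columns
</code>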
-</​hidden> ​ 
-</​hidden>​ 
  
-Answer #2 +
-<​hidden>​+
 Problem #2: The data does not start until the third line of the txt file, so you end up with notes on the file as the headings. Problem #2: The data does not start until the third line of the txt file, so you end up with notes on the file as the headings.
 <code rsplus | > <code rsplus | >
Line 477: Line 498:
  
 Solution: Solution:
-<​hidden>​+
 To fix this problem, you can tell R to skip the first two rows when reading in this file. To fix this problem, you can tell R to skip the first two rows when reading in this file.
 <code rsplus | > <code rsplus | >
Line 483: Line 504:
 head(CO2) # You can now see that the CO2 object has the appropriate headings head(CO2) # You can now see that the CO2 object has the appropriate headings
 </​code>​ </​code>​
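
The same idea can be tried without the workshop file. In this minimal sketch the two note lines and the data are hypothetical; only the ''skip'' argument itself comes from the solution above.

<code rsplus | >
# Hypothetical text with two lines of notes before the real header
txt <- "NOTE: toy data for illustration
NOTE: values are made up
Plant,conc,uptake
Qn1,95,16.0
Qn1,175,30.4"

# skip = 2 ignores the two note lines, so the third line becomes the header
good <- read.csv(text = txt, skip = 2)
names(good)
nrow(good)
</code>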
-</​hidden>​ 
-</​hidden>​ 
  
 Answer #3 Answer #3
-<​hidden>​+
 Problem #3: "​conc"​ and "​uptake"​ variables are considered factors instead of numbers, because there are comments/​text in the numeric columns. Problem #3: "​conc"​ and "​uptake"​ variables are considered factors instead of numbers, because there are comments/​text in the numeric columns.
 <code rsplus | > <code rsplus | >
Line 498: Line 517:
  
 Solution: Solution:
-<​hidden>​+
 <code rsplus | > <code rsplus | >
 ?read.csv ?read.csv
Line 517: Line 536:
 str(CO2) # You can see that the conc variable is now an integer and the uptake variable is now treated as numeric str(CO2) # You can see that the conc variable is now an integer and the uptake variable is now treated as numeric
 </​code>​ </​code>​
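
One way to handle text in numeric columns is to declare the offending strings as missing values when loading. The sketch below assumes a hypothetical comment string, ''not_measured'', and toy data; it is not the workshop dataset.

<code rsplus | >
# Hypothetical toy data with a comment and a lowercase "na" in numeric columns
txt <- "Plant,conc,uptake
Qn1,95,16.0
Qn1,not_measured,30.4
Qn2,175,na"

raw <- read.csv(text = txt, stringsAsFactors = FALSE)
sapply(raw, class)    # conc and uptake are character

# Tell R which strings should be treated as NA while reading
clean <- read.csv(text = txt, na.strings = c("NA", "na", "not_measured"))
sapply(clean, class)  # conc is integer, uptake is numeric
</code>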
-</​hidden>​ 
-</​hidden>​ 
  
-Answer #4 + 
-<​hidden>​+
 Problem #4: There are only two treatments (chilled and nonchilled) but there are spelling errors causing it to look like 4 different treatments. Problem #4: There are only two treatments (chilled and nonchilled) but there are spelling errors causing it to look like 4 different treatments.
 <code rsplus | > <code rsplus | >
Line 530: Line 547:
  
 Solution: Solution:
-<​hidden>​ 
 <code rsplus | > <code rsplus | >
 # You can use which() to find rows with the typo "​nnchilled"​ # You can use which() to find rows with the typo "​nnchilled"​
Line 550: Line 566:
 str(CO2) # Fixed! str(CO2) # Fixed!
 </​code>​ </​code>​
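
The same kind of fix can be sketched on a small stand-alone factor. The vector below is hypothetical, and the second typo (''chiled'') is made up for illustration.

<code rsplus | >
# Hypothetical treatment column with two misspelled entries
trt <- factor(c("chilled", "nonchilled", "nnchilled", "chiled", "nonchilled"))
levels(trt)                # four levels instead of two

which(trt == "nnchilled")  # position of the entry carrying this typo

# Renaming a level to an existing level name merges the two levels
levels(trt)[levels(trt) == "nnchilled"] <- "nonchilled"
levels(trt)[levels(trt) == "chiled"] <- "chilled"
levels(trt)                # only "chilled" and "nonchilled" remain
</code>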
-</​hidden>​ +--- 
-</​hidden>​+ 
 + 
 +=====Advanced users section===== 
 + 
  
----- 
  
 =====Learn to manipulate data with tidyr, dplyr, magrittr===== =====Learn to manipulate data with tidyr, dplyr, magrittr=====
Line 1003: Line 1022:
 ---- ----
  
-==== Ninja hint ====+===== Ninja hint =====
  
 Note that we can group the data frame using more than one factor, using the general syntax as follows: ''​group_by(group1,​ group2, ...)''​ Note that we can group the data frame using more than one factor, using the general syntax as follows: ''​group_by(group1,​ group2, ...)''​
Line 1009: Line 1028:
 Within ''​group_by()'',​ the multiple groups create a layered onion, and each subsequent single use of the ''​summarise()''​ function peels off the outer layer of the onion. In the above example, after we carried out a summary operation on ''​group2'',​ the resulting data set would remain grouped by ''​group1''​ for downstream operations. Within ''​group_by()'',​ the multiple groups create a layered onion, and each subsequent single use of the ''​summarise()''​ function peels off the outer layer of the onion. In the above example, after we carried out a summary operation on ''​group2'',​ the resulting data set would remain grouped by ''​group1''​ for downstream operations.
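
The onion behaviour can be checked directly. This is a minimal sketch that assumes the built-in ''mtcars'' dataset, with ''cyl'' and ''gear'' standing in for ''group1'' and ''group2''. ''summarise()'' may print a message about the remaining grouping.

<code rsplus | >
library(dplyr)

# Group by two variables, then summarise once
by_two <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(mean_mpg = mean(mpg))

group_vars(by_two)   # only "cyl" is left: the outer layer (gear) was peeled off

# A second summarise() peels off the remaining layer
by_none <- by_two %>%
  summarise(mean_mpg = mean(mean_mpg))

group_vars(by_none)  # no grouping variables remain
</code>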
  
-==== dplyr & magrittr NINJA CHALLENGE ====+===== dplyr & magrittr NINJA CHALLENGE ​=====
  
 //Using the ''​ChickWeight''​ dataset, create a summary table which displays, for each diet, the average individual difference in weight between the end and the beginning of the study. Employ ''​dplyr''​ verbs and the ''​%>​%''​ operator. (Hint: ''​first()''​ and ''​last()''​ may be useful here.)// //Using the ''​ChickWeight''​ dataset, create a summary table which displays, for each diet, the average individual difference in weight between the end and the beginning of the study. Employ ''​dplyr''​ verbs and the ''​%>​%''​ operator. (Hint: ''​first()''​ and ''​last()''​ may be useful here.)//
Line 1035: Line 1054:
 ---- ----
  
-==== dplyr - Merging data frames ====+===== dplyr - Merging data frames ​=====
  
 In addition to all the operations we have explored, ''​dplyr''​ also provides some functions that allow you to join two data frames together. The syntax in these functions is simple relative to alternatives in other ''​R''​ packages: In addition to all the operations we have explored, ''​dplyr''​ also provides some functions that allow you to join two data frames together. The syntax in these functions is simple relative to alternatives in other ''​R''​ packages:
Line 1046: Line 1065:
 These are beyond the scope of the current introductory workshop, but they provide extremely useful functionality you may eventually require for some more advanced data manipulation needs. These are beyond the scope of the current introductory workshop, but they provide extremely useful functionality you may eventually require for some more advanced data manipulation needs.
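
For readers who want a first look anyway, here is a minimal sketch of two ''dplyr'' join functions. The ''sites'' and ''counts'' data frames are hypothetical.

<code rsplus | >
library(dplyr)

# Two hypothetical data frames sharing a "site" column
sites  <- data.frame(site = c("A", "B", "C"),
                     region = c("north", "south", "south"))
counts <- data.frame(site = c("A", "B", "D"),
                     n = c(12, 7, 3))

# left_join() keeps every row of the first data frame (C gets NA for n)
left <- left_join(sites, counts, by = "site")

# inner_join() keeps only sites present in both data frames (A and B)
inner <- inner_join(sites, counts, by = "site")

left
inner
</code>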
  
-====More on data manipulation====+=====More ​resources ​on data manipulation=====
   * [[https://​www.rstudio.com/​wp-content/​uploads/​2015/​02/​data-wrangling-cheatsheet.pdf|The RStudio Data Wrangling Cheat Sheet]]   * [[https://​www.rstudio.com/​wp-content/​uploads/​2015/​02/​data-wrangling-cheatsheet.pdf|The RStudio Data Wrangling Cheat Sheet]]
   * [[http://​r4ds.had.co.nz/​transform.html|Learn more about ''​dplyr''​]]   * [[http://​r4ds.had.co.nz/​transform.html|Learn more about ''​dplyr''​]]