Differences

This shows you the differences between two versions of the page.

--- r_workshop2 [2018/10/02 15:52]
katherinehebert [Create and populate columns with ''mutate()'']
+++ r_workshop2 [2021/10/13 16:03] (current)
lsherin
@@ Line 1: / Line 1: @@
+<WRAP group>
+<WRAP centeralign>
+<WRAP important>
+<wrap em> __MAJOR UPDATE__ </wrap>
+<wrap em> As of Fall 2021, this wiki has been discontinued and is no longer being actively developed. </wrap>
+<wrap em> All updated materials and announcements for the QCBS R Workshop Series are now housed on the [[https://r.qcbs.ca/workshops/r-workshop-02/|QCBS R Workshop website]]. Please update your bookmarks accordingly to avoid outdated material and/or broken links. </wrap>
+<wrap em> Thank you for your understanding, </wrap>
+<wrap em> Your QCBS R Workshop Coordinators. </wrap>
+</WRAP>
+</WRAP>
+<WRAP clear></WRAP>
 ======= QCBS R Workshops =======
@@ Line 11: / Line 28: @@
 Developed by: Johanna Bradie, Vincent Fugère, Thomas Lamy
-**Summary:** In this workshop, you will learn how to load, view, and manipulate your data in R. You will learn basic commands to inspect and visualize your data, and learn how to fix errors that may have occurred while loading your data into R. In addition, you will learn how to write an R script, which is a text file that contains your R commands and allows you to rerun your analyses in one simple touch of a key (or maybe two, or three…)! We will then introduce tidyr and dplyr, two powerful tools to manage and re-format your dataset, as well as apply simple or complex functions on subsets of your data. This workshop will be useful for those progressing through the entire workshop series, but also for those who already have some experience in R and would like to become proficient with new tools and packages.
+**Summary:** In this workshop, you will learn how to load, view, and manipulate your data in R. You will learn basic commands to inspect and visualize your data, and learn how to fix errors that may have occurred while loading your data into R. In addition, you will learn how to write an R script, which is a text file that contains your R commands and allows you to rerun your analyses in one simple touch of a key (or maybe two, or three…)! We have included an advanced users section where we will introduce tidyr and dplyr, two powerful tools to manage and re-format your dataset, as well as apply simple or complex functions on subsets of your data. This workshop will be useful for those progressing through the entire workshop series, but also for those who already have some experience in R and would like to become proficient with new tools and packages.
+**Link to new [[https://qcbsrworkshops.github.io/workshop02/workshop02-en/workshop02-en.html|Rmarkdown presentation]]**
-Link to associated Prezi: [[http://prezi.com/wg4rggjfqucv/qcbs-r-workshop-2/|Prezi]]
 Download the R script and data for this lesson:
-  - [[http://qcbs.ca/wiki/_media/script_workshop2.R|Script]]
+  - [[http://qcbs.ca/wiki/_media/script_workshop02-en.r|Script]]
   - [[http://qcbs.ca/wiki/_media/co2_good.csv|Dataset 1]]
-  - [[http://qcbs.ca/wiki/_media/co2_broken.csv|Dataset 2]]
+  - [[https://raw.githubusercontent.com/QCBSRworkshops/workshop02/dev/workshop02-en/data/co2_broken.csv|Dataset 2]] (//After following this link, right-click on the page to save the file as .csv//).
 ===== Learning Objectives =====
-  - Creating an R project
+  - Creating an R project
-  - Writing a script
-  - Loading, exploring and saving data
+  - Writing a script
-  - Learn to manipulate data frames with tidyr, dplyr, maggritr
+  - Loading, exploring and saving data
+  - Learn to manipulate data frames with tidyr, dplyr, magrittr (for advanced users)
 ===== RStudio Projects =====
 What is this?
-  - Within RStudio, Projects make it easy to separate and keep your work organized.
+  - Projects make it easy to keep your work organized.
-  - All files, scripts, documentation related to a specific project are bound together
+  - All files, scripts, and documentation related to a specific project are bound together by a .Rproj file
-Encourages reproducibility and easy sharing.
+Encourages reproducibility and easy sharing
 ===== Create a new project =====
@@ Line 44: / Line 67: @@
 One project = one folder
+Place similar files inside their own folders
+Keep track of versions
 {{:0_folderdata1.png?400|}}
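A minimal R sketch of the "one project = one folder" idea, with similar files grouped into their own subfolders. The project name and the subfolder names (''data'', ''scripts'', ''outputs'') are hypothetical, and everything is created under ''tempdir()'' so nothing is written into your own files:

```r
# Hypothetical project root, created in a temporary directory for illustration
project_root <- file.path(tempdir(), "my_project")

# Hypothetical subfolders: one folder per kind of file
subfolders <- c("data", "scripts", "outputs")

for (folder in subfolders) {
  # recursive = TRUE also creates project_root if it does not exist yet
  dir.create(file.path(project_root, folder), recursive = TRUE, showWarnings = FALSE)
}

# List what the project folder now contains
list.files(project_root)
```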
@@ Line 53: / Line 80: @@
   * file -> save as .csv
-====Choose file names wisely====
+====Naming files====
   * Good:
     * rawDatasetAgo2017.csv
@@ Line 64: / Line 91: @@
     * Dont.separate.names.with.dots.csv //(Can lead to reading file errors!)//
-====Choose variable names wisely====
+====Naming variables====
   * Use short informative titles (e.g. "Time_1" not "First time measurement")
     * Good: "Measurements", "SpeciesNames", "Site"
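A short sketch with hypothetical toy data showing why spaces in column names cause trouble. By default ''read.csv()'' passes the header through ''make.names()'' (''check.names = TRUE''), which replaces each space with a dot. The ''text ='' argument is used only to keep the example self-contained; with a real file you would pass its file name instead:

```r
# A column name with spaces, as it might appear in a spreadsheet header
raw <- "First time measurement,Site\n12.5,A\n13.1,B\n"

# check.names = TRUE (the default) turns the spaces into dots
df <- read.csv(text = raw)
names(df)
# [1] "First.time.measurement" "Site"

# Rename the column to a short, informative name
names(df)[names(df) == "First.time.measurement"] <- "Time_1"
names(df)
```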
@@ Line 72: / Line 99: @@
-====Things to consider with your data====
+====Common data preparation mistakes====
   * No text in numeric columns
   * Do not include spaces!
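A minimal sketch with hypothetical toy data showing how a stray space in a text column creates an extra factor level. The ''strip.white'' argument of ''read.csv()'' removes leading and trailing spaces from unquoted fields:

```r
# Hypothetical toy data: the second "chilled" has a leading space
raw <- "Treatment,uptake\nchilled,16\n chilled,30\nnonchilled,34\n"

# Without stripping, " chilled" and "chilled" count as different levels
df_raw <- read.csv(text = raw, stringsAsFactors = TRUE)
nlevels(df_raw$Treatment)  # 3 levels instead of 2

# strip.white = TRUE removes the leading space before the factor is built
df_clean <- read.csv(text = raw, stringsAsFactors = TRUE, strip.white = TRUE)
levels(df_clean$Treatment)  # only "chilled" and "nonchilled" remain
```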
@@ Line 87: / Line 115: @@
 {{:excel_notes.png|}}
-{{:horribledata.png|}}
 It is possible to do all your data preparation work within R. This has several benefits:
@@ Line 249: / Line 277: @@
   * Factors loaded as text (character) and vice versa
   * Factors including too many levels because of a typo
-  * Numeric or integer data being loaded as character due to a typo (including a space or using a comma instead of a "." for a decimal)
+  * Numeric or integer data being loaded as character (text) because of a typo (e.g. a space, or a comma instead of a "." as the decimal separator)
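A minimal sketch of the comma-decimal case. It assumes a hypothetical file that uses '';'' between fields and '','' as the decimal mark. The ''dec'' argument tells ''read.csv()'' which decimal mark to expect (''read.csv2()'' uses ''sep = ";"'' and ''dec = ","'' by default). The column can also be repaired after loading:

```r
# Hypothetical toy data: ";" separates fields, "," is the decimal mark
raw <- "conc;uptake\n95;16,0\n175;30,4\n"

# With the default dec = ".", "16,0" is not recognised as a number
wrong <- read.csv(text = raw, sep = ";", stringsAsFactors = FALSE)
class(wrong$uptake)  # "character"

# Declaring the decimal mark gives a numeric column directly
right <- read.csv(text = raw, sep = ";", dec = ",")
class(right$uptake)  # "numeric"

# Alternative: swap the comma for a dot after loading, then convert
repaired <- as.numeric(sub(",", ".", wrong$uptake, fixed = TRUE))
```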
 **Exercise**
@@ Line 421: / Line 449: @@
 Let's practice how to solve some common errors.
-==== Fix a broken dataframe CHALLENGE ====
+==== Fix a broken dataframe ====
 # Read co2_broken.csv file into R and find the problems
@@ Line 451: / Line 479: @@
 **HINT: There are 4 problems!**
-Answers:
-Answer #1
-<hidden>
 Problem #1: The data appears to be lumped into one column
 Solution:
-<hidden>
 Re-import the data, this time specifying the separator between entries.
 The sep argument tells R what character separates the values on each line of the file.
@@ Line 466: / Line 490: @@
 ?read.csv
 </code>
-</hidden>
-</hidden>
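The same idea on a small hypothetical example. It assumes the fields are separated by '';''. The real separator in co2_broken.csv may differ, so open the file in a text editor to check which one it uses:

```r
# Hypothetical file contents with ";" between fields instead of ","
raw <- "Plant;Type;uptake\nQn1;Quebec;16.0\nMn1;Mississippi;10.6\n"

# With read.csv()'s default sep = ",", each line lands in a single column
lumped <- read.csv(text = raw)
ncol(lumped)  # 1

# Telling R the actual separator splits the values into columns
fixed <- read.csv(text = raw, sep = ";")
ncol(fixed)   # 3
str(fixed)
```

For a tab-separated file you would use ''sep = "\t"'' instead.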
-Answer #2
-<hidden>
 Problem #2: The data does not start until the third line of the file, so the notes at the top of the file end up as the column headings.
 <code rsplus | >
@@ Line 477: / Line 498: @@
 Solution:
-<hidden>
 To fix this problem, you can tell R to skip the first two rows when reading in this file.
 <code rsplus | >
@@ Line 483: / Line 504: @@
 head(CO2) # You can now see that the CO2 object has the appropriate headings
 </code>
-</hidden>
-</hidden>
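A self-contained sketch of the ''skip'' argument. Two hypothetical note lines stand in for the real ones at the top of co2_broken.csv:

```r
# Hypothetical file contents: two lines of notes before the real header
raw <- paste(
  "Notes: CO2 uptake in grass plants",
  "Some values were re-entered by hand",
  "Plant,conc,uptake",
  "Qn1,95,16.0",
  "Qn1,175,30.4",
  sep = "\n"
)

# skip = 2 discards the first two lines, so line 3 is used as the header
dat <- read.csv(text = raw, skip = 2)
names(dat)  # "Plant", "conc", "uptake"
```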
 Answer #3
-<hidden>
 Problem #3: The "conc" and "uptake" variables are read as factors (or as character vectors in R 4.0.0 and later) instead of numbers, because there are comments/text in the numeric columns.
 <code rsplus | >
@@ Line 498: / Line 517: @@
 Solution:
-<hidden>
 <code rsplus | >
 ?read.csv
@@ Line 517: / Line 536: @@
 str(CO2) # You can see that the conc variable is now an integer and the uptake variable is now numeric
 </code>
-</hidden>
-</hidden>
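A minimal sketch of the ''na.strings'' approach on hypothetical toy data. The note ''cannot_read_notes'' is a made-up placeholder. With the real file, list whatever text actually appears in the numeric columns:

```r
# Hypothetical toy data: a note was typed into the numeric uptake column
raw <- "conc,uptake\n95,16\n175,cannot_read_notes\n250,34.8\n"

# The note forces the whole uptake column to be read as text
as_text <- read.csv(text = raw, stringsAsFactors = FALSE)
class(as_text$uptake)  # "character"

# Declaring the note as a missing-value code lets R read the column as numbers
as_num <- read.csv(text = raw, na.strings = c("NA", "cannot_read_notes"))
class(as_num$uptake)   # "numeric", with NA where the note was
```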
-Answer #4
-<hidden>
 Problem #4: There are only two treatments (chilled and nonchilled) but there are spelling errors causing it to look like 4 different treatments.
 <code rsplus | >
@@ Line 530: / Line 547: @@
 Solution:
-<hidden>
 <code rsplus | >
 # You can use which() to find rows with the typo "nnchilled"
@@ Line 550: / Line 566: @@
 str(CO2) # Fixed!
 </code>
-</hidden>
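The same ''which()'' approach on a small hypothetical vector. The factor is converted to character first, so the replacement values do not have to be existing levels. Rebuilding the factor afterwards drops the misspelled levels. The typo ''chiled'' is invented for illustration:

```r
# Hypothetical treatment column with two misspellings
trt <- factor(c("chilled", "nonchilled", "nnchilled", "chiled", "nonchilled"))
nlevels(trt)  # 4 apparent treatments

# Work on a character copy so any replacement value is allowed
trt_chr <- as.character(trt)
trt_chr[which(trt_chr == "nnchilled")] <- "nonchilled"
trt_chr[which(trt_chr == "chiled")] <- "chilled"

# Rebuild the factor: only the two real treatments remain as levels
trt <- factor(trt_chr)
levels(trt)
```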
+----
-</hidden>
+=====Advanced users section=====
-----
 =====Learn to manipulate data with tidyr, dplyr, magrittr=====
@@ Line 1003: / Line 1022: @@
 ----
-==== Ninja hint ====
+===== Ninja hint =====
 Note that we can group the data frame by more than one factor, using the general syntax: ''group_by(group1, group2, ...)''
@@ Line 1009: / Line 1028: @@
 Within ''group_by()'', the multiple groups create a layered onion, and each subsequent call to ''summarise()'' peels off the outer layer of the onion, i.e. the last grouping variable listed. In the above example, after we carried out a summary operation on ''group2'', the resulting data set would remain grouped by ''group1'' for downstream operations.
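A minimal sketch of this layering, assuming ''dplyr'' 1.0.0 or later (the ''.groups'' argument of ''summarise()'' was added in that version). The data frame and its column names are hypothetical:

```r
library(dplyr)

# Hypothetical toy data with two grouping variables
df <- data.frame(
  group1 = c("a", "a", "a", "b", "b", "b"),
  group2 = c("x", "x", "y", "x", "y", "y"),
  value  = 1:6
)

# Summarising removes the last grouping variable (group2), so the result
# stays grouped by group1. .groups = "drop_last" is the default behaviour,
# written out here to make it explicit and silence the console message.
by_both <- df %>%
  group_by(group1, group2) %>%
  summarise(total = sum(value), .groups = "drop_last")
group_vars(by_both)  # "group1"

# A second summarise() peels off group1: one row per level of group1
by_group1 <- by_both %>%
  summarise(grand_total = sum(total))
```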
-==== dplyr & magrittr NINJA CHALLENGE ====
+===== dplyr & magrittr NINJA CHALLENGE =====
 //Using the ''ChickWeight'' dataset, create a summary table which displays, for each diet, the average individual difference in weight between the end and the beginning of the study. Employ ''dplyr'' verbs and the ''%>%'' operator. (Hint: ''first()'' and ''last()'' may be useful here.)//
@@ Line 1035: / Line 1054: @@
 ----
-==== dplyr - Merging data frames ====
+===== dplyr - Merging data frames =====
 In addition to all the operations we have explored, ''dplyr'' also provides some functions that allow you to join two data frames together. The syntax in these functions is simple relative to alternatives in other ''R'' packages:
@@ Line 1046: / Line 1065: @@
 These are beyond the scope of the current introductory workshop, but they provide extremely useful functionality you may eventually require for some more advanced data manipulation needs.
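For the basic joins, here is a minimal sketch of the ''dplyr'' functions ''left_join()'' and ''inner_join()''. The two toy tables are hypothetical and share a ''Plant'' key column:

```r
library(dplyr)

# Hypothetical toy tables sharing the key column "Plant"
plants <- data.frame(
  Plant = c("Qn1", "Qn2", "Mn1"),
  Type  = c("Quebec", "Quebec", "Mississippi")
)
uptake <- data.frame(
  Plant  = c("Qn1", "Mn1", "Mc1"),
  uptake = c(16.0, 10.6, 10.5)
)

# left_join(): keep every row of plants; Qn2 has no match, so uptake is NA
left <- left_join(plants, uptake, by = "Plant")

# inner_join(): keep only plants present in both tables (Qn1 and Mn1)
inner <- inner_join(plants, uptake, by = "Plant")
```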
-====More on data manipulation====
+=====More resources on data manipulation=====
   * [[https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf|The RStudio Data Wrangling Cheat Sheet]]
   * [[http://r4ds.had.co.nz/transform.html|Learn more about ''dplyr'']]