Differences

This shows you the differences between two versions of the page.

r_workshop2 [2019/08/08 17:48]
mariehbrice [Workshop 2: Loading and manipulating data]
r_workshop2 [2019/09/16 13:29] (current)
fgabriel1891 [Keep your files organized]
Line 11: Line 11:
 Developed by: Johanna Bradie, Vincent Fugère, Thomas Lamy
  
-**Summary:** In this workshop, you will learn how to load, view, and manipulate your data in R. You will learn basic commands to inspect and visualize your data, and learn how to fix errors that may have occurred while loading your data into R. In addition, you will learn how to write an R script, which is a text file that contains your R commands and allows you to rerun your analyses in one simple touch of a key (or maybe two, or three…)! We will then introduce tidyr and dplyr, two powerful tools to manage and re-format your dataset, as well as apply simple or complex functions on subsets of your data. This workshop will be useful for those progressing through the entire workshop series, but also for those who already have some experience in R and would like to become proficient with new tools and packages.
+**Summary:** In this workshop, you will learn how to load, view, and manipulate your data in R. You will learn basic commands to inspect and visualize your data, and learn how to fix errors that may have occurred while loading your data into R. In addition, you will learn how to write an R script, which is a text file that contains your R commands and allows you to rerun your analyses in one simple touch of a key (or maybe two, or three…)! We have included an advanced users section where we introduce tidyr and dplyr, two powerful tools to manage and re-format your dataset, as well as to apply simple or complex functions on subsets of your data. This workshop will be useful for those progressing through the entire workshop series, but also for those who already have some experience in R and would like to become proficient with new tools and packages.
  
 **Link to new [[https://qcbsrworkshops.github.io/workshop02/workshop02-en/workshop02-en.html|Rmarkdown presentation]]**
  
-Link to old [[http://prezi.com/wg4rggjfqucv/qcbs-r-workshop-2/|Prezi presentation]]
  
 Download the R script and data for this lesson:
Line 24: Line 23:
 ===== Learning Objectives =====
  
-  - Creating an R project
-  - Writing a script
-  - Loading, exploring and saving data
-  - Learn to manipulate data frames with tidyr, dplyr, magrittr
+1. Creating an R project
+
+2. Writing a script
+
+3. Loading, exploring and saving data
+
+(For advanced users)
+
+4. Manipulating data frames with tidyr, dplyr and magrittr
  
 ===== RStudio Projects =====
  
 What is this?
-  - Within RStudio, Projects make it easy to separate and keep your work organized.
+  - Projects make it easy to keep your work organized.
-  - All files, scripts, and documentation related to a specific project are bound together
+  - All files, scripts, and documentation related to a specific project are bound together with a .Rproj file
  
-Encourages reproducibility and easy sharing
+Encourages reproducibility and easy sharing
  
 ===== Create a new project =====
Line 46: Line 50:
  
 One project = one folder
+
+Place similar files in their own folders
+
+Keep track of versions
  
 {{:0_folderdata1.png?400|}}
Line 55: Line 63:
   * file -> save as .csv
  
-====Choose file names wisely====
+====Naming files====
   * Good:
     * rawDatasetAgo2017.csv
Line 66: Line 74:
     * Dont.separate.names.with.dots.csv //(Can cause errors when reading the file!)//
  
-====Choose variable names wisely====
+====Naming variables====
   * Use short informative titles (e.g. "Time_1" not "First time measurement")
     * Good: "Measurements", "SpeciesNames", "Site"
Line 74: Line 82:
  
  
-====Things to consider with your data====
+====Common data preparation mistakes====
   * No text in numeric columns
   * Do not include spaces!
Line 89: Line 98:
  
 {{:excel_notes.png|}}
-{{:horribledata.png|}}
+
  
 It is possible to do all your data preparation work within R. This has several benefits:
Line 251: Line 260:
   * Factors loaded as text (character) and vice versa
   * Factors with too many levels because of a typo
-  * Numeric or integer data being loaded as character due to a typo (including a space or using a comma instead of a "." for a decimal)
+  * Numeric or integer data being loaded as character due to a typo (e.g. a stray space, or a comma instead of a "." as the decimal separator)
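A minimal sketch of the last point. It uses a few made-up lines passed through the `text =` argument so it runs without the workshop files. The values use a comma as the decimal mark, so R keeps them as text until `dec = ","` is given. The column names and values are illustrative only.

<code rsplus>
# Made-up semicolon-separated data in which decimals are written with a comma
raw <- c("site;mass", "A;1,5", "B;2,25")

# With the default dec = ".", "1,5" is not recognized as a number
bad <- read.csv(text = raw, sep = ";")
str(bad)   # mass is character (a factor before R 4.0)

# Telling R which character marks the decimal fixes the column type
good <- read.csv(text = raw, sep = ";", dec = ",")
str(good)  # mass is now numeric
</code>

`read.csv2()` uses `sep = ";"` and `dec = ","` by default, which may suit files exported with European regional settings.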
  
 **Exercise**
Line 423: Line 432:
 Let's practice solving some common errors.
  
-==== Fix a broken dataframe CHALLENGE ====
+==== Fix a broken dataframe ====
  
 # Read co2_broken.csv file into R and find the problems
Line 453: Line 462:
 **HINT: There are 4 problems!**
  
-Answers: 
- 
-Answer #1 
-<hidden>
 Problem #1: The data appears to be lumped into one column
  
 Solution:
-<hidden>
+
 Re-import the data, but specify the separator between entries.
 The sep argument tells R what character separates the values on each line of the file.
Line 468: Line 473:
 ?read.csv
 </code>
-</hidden>
-</hidden>
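The effect of `sep` can be reproduced on a small scale. This sketch assumes tab-separated values, which may not be the separator actually used in co2_broken.csv. The inline lines are made up so the example runs on its own.

<code rsplus>
# Made-up tab-separated lines standing in for a file
raw <- c("Plant\tconc\tuptake", "Qn1\t95\t16", "Qn1\t175\t30.4")

# read.csv() expects commas by default, so each line becomes a single value
lumped <- read.csv(text = raw)
ncol(lumped)   # 1

# Naming the real separator splits the values into columns
fixed <- read.csv(text = raw, sep = "\t")
ncol(fixed)    # 3
str(fixed)
</code>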
  
-Answer #2
+
-<hidden>
 Problem #2: The data does not start until the third line of the file, so the notes at the top of the file end up as the column headings.
 <code rsplus | >
Line 479: Line 481:
  
 Solution:
-<hidden>
+
 To fix this problem, you can tell R to skip the first two rows when reading in this file.
 <code rsplus | >
Line 485: Line 487:
 head(CO2) # You can now see that the CO2 object has the appropriate headings
 </code>
-</hidden>
-</hidden>
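To see why `skip` helps, here is a sketch with two made-up note lines above the header. It also uses `count.fields()`, which reports how many comma-separated fields each line holds. That is one way to spot lines that are not part of the table before reading the file. The note text and values are illustrative only.

<code rsplus>
# Made-up file content: two lines of notes, then the actual table
raw <- c("CO2 uptake measurements", "Entered by hand",
         "Plant,conc,uptake", "Qn1,95,16", "Qn1,175,30.4")

# Fields per line: the note lines have 1 field, the table lines have 3
con <- textConnection(raw)
fields <- count.fields(con, sep = ",")
close(con)
fields

# Skip the two note lines so the third line is used as the header
clean <- read.csv(text = raw, skip = 2)
names(clean)   # "Plant" "conc" "uptake"
</code>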
  
 Answer #3
-<hidden>
+
 Problem #3: The "conc" and "uptake" variables are read as factors instead of numbers because there are comments/text in the numeric columns.
 <code rsplus | >
Line 500: Line 500:
  
 Solution:
-<hidden>
+
 <code rsplus | >
 ?read.csv
Line 519: Line 519:
 str(CO2) # You can see that the conc variable is now an integer and the uptake variable is now treated as numeric
 </code>
-</hidden>
-</hidden>
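The same kind of problem can be reproduced with a single comment typed into a numeric column. In this sketch, the comment text "cuvette broken" and the data are made up. Listing that text in `na.strings` makes R read it as a missing value, so the column stays numeric.

<code rsplus>
# Made-up data with a note typed into the numeric uptake column
raw <- c("Plant,conc,uptake",
         "Qn1,95,16",
         "Qn1,175,cuvette broken",
         "Qn1,250,30.4")

# The note forces the whole column to be read as text
bad <- read.csv(text = raw)
class(bad$uptake)   # "character" (a factor before R 4.0)

# Declare the note as a missing-value code
good <- read.csv(text = raw, na.strings = c("NA", "cuvette broken"))
str(good)
sum(is.na(good$uptake))   # 1
</code>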
  
-Answer #4
+
-<hidden>
 Problem #4: There are only two treatments (chilled and nonchilled), but spelling errors make it look like there are 4 different treatments.
 <code rsplus | >
Line 532: Line 530:
  
 Solution:
-<hidden>
 <code rsplus | >
 # You can use which() to find rows with the typo "nnchilled"
Line 552: Line 549:
 str(CO2) # Fixed!
 </code>
-</hidden>
+---
-</hidden>
+
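Here is a standalone sketch of the typo fix. It uses a made-up factor rather than the CO2 data. After the misspelled values are replaced, the empty "nnchilled" level still exists, and `droplevels()` removes it.

<code rsplus>
# Made-up treatment factor containing one misspelled level
treatment <- factor(c("chilled", "nonchilled", "nnchilled", "chilled"))
levels(treatment)   # "chilled" "nnchilled" "nonchilled"

# which() gives the positions of the typo; overwrite them with the correct level
typo_rows <- which(treatment == "nnchilled")
treatment[typo_rows] <- "nonchilled"

# The unused level is still listed until it is dropped
levels(treatment)   # still 3 levels
treatment <- droplevels(treatment)
levels(treatment)   # "chilled" "nonchilled"
</code>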
 + 
 +=====Advanced users section===== 
 + 
  
----- 
  
 =====Learn to manipulate data with tidyr, dplyr, magrittr=====