

QCBS R Workshops

This series of 8 workshops walks participants through the steps required to use R for a wide array of statistical analyses relevant to research in biology and ecology. These open-access workshops were created by members of the QCBS both for members of the QCBS and the larger community.

Workshop 8: Programming in R

Developed by: Johanna Bradie, Sylvain Christin, Ben Haller, Guillaume Larocque

Link to associated Prezi: POST LINK HERE

Download the R script and data for this lesson:

CO2 Dataset

Summary: This workshop focuses on basic programming in R. In this workshop, you will learn how to use for loops, write your own functions and run simulations in R. In addition, you will learn to use data.table to work quickly with large datasets and learn tips to program efficiently. The last part of the workshop will discuss code optimization, as well as parallel and multi-threaded computing.

  1. Flow control
  2. Writing functions in R
  3. Speeding up your code
  4. Useful packages for biologists

Flow control lets you run a series of commands repeatedly, or only when specified conditions are met. In this section, you will learn how to:

  1. Execute statements conditionally using: if, if/else
  2. Execute statements multiple times using: for loops, while loops, repeat loops
  3. Modify loop execution using: break statements, next statements

if and if/else statements

if and if/else statements are good for:

  • checking for problems or violations of assumptions
  • treating different rows of your data frame differently
  • testing for the existence of a file or variable
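As a small illustration of the last point, the sketch below checks whether a file and a variable exist before using them. The file name `my_data.csv` and the variable name `my_threshold` are hypothetical placeholders, not part of the workshop material.

```r
# Hypothetical file name, used only for illustration
data_file <- "my_data.csv"

if (file.exists(data_file)) {
  my_data <- read.csv(data_file)   # only read the file if it is there
  file_status <- "loaded"
} else {
  file_status <- "missing"
}
print(file_status)

# exists() takes the variable name as a character string
if (!exists("my_threshold")) {
  my_threshold <- 0.05             # fall back to a default value
}
print(my_threshold)
```

Checking first avoids an error from read.csv() when the file is absent, and lets the script fall back to a default instead of stopping.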

Syntax

if (condition) {
  expression   # The expression can be any command that you would like R to perform.
}

if (condition) {
  expression
} else {
  expression
}

For example,

if ((2+2)==4) {
  print("Arithmetic works.")   # the condition is TRUE, so this line runs
}

if ((2+1)==4) {
  print("Arithmetic works.")   # the condition is FALSE, so nothing is printed
}

Curly brackets { } are used so that R knows to expect more input. When using brackets, R waits to evaluate the command until the brackets have been closed.

If the curly brackets are not used, R may not behave as you are expecting. For example, try:

if ((2+1)==4) print("Arithmetic works.")
else print("Houston, we have a problem.")

The else statement doesn't work because R treats the first line as a complete if statement and evaluates it immediately. The else on the next line then has nothing to attach to, and R reports a syntax error.

Instead, use:

if ((2+2)==4) {
  print("Arithmetic works.")  # R does not evaluate this expression yet because the bracket isn't closed.
} else {
  print("Houston, we have a problem.")
}  # Since all brackets are now closed, R will evaluate the commands.
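The parsing behaviour described above can be checked directly. The sketch below passes the two-line version without brackets to parse(), which reads code the same way the console does, and catches the resulting error. The variable names are illustrative only.

```r
# Code as typed at the top level: the if statement is complete after line 1,
# so a following line that starts with else is a syntax error.
top_level <- 'if ((2+1)==4) print("Arithmetic works.")
else print("Houston, we have a problem.")'

result <- tryCatch({
  parse(text = top_level)
  "parsed"
}, error = function(e) "syntax error")
print(result)

# The same two lines wrapped in curly brackets parse without error,
# because R keeps reading until the closing bracket.
wrapped <- paste0("{\n", top_level, "\n}")
result_wrapped <- tryCatch({
  parse(text = wrapped)
  "parsed"
}, error = function(e) "syntax error")
print(result_wrapped)
```

This is why an else on its own line works inside a function body or a loop but not at the console. Writing `} else {` on one line works in both situations.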

Loops

Loops are good for:

  • doing something for every element of an object
  • doing something until the processed data runs out
  • doing something for every file in a folder
  • doing something that can fail, until it succeeds
  • iterating a calculation until it converges
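As an illustration of the last point, the sketch below approximates a square root by repeatedly refining a guess and stops once the estimate no longer changes. It uses a for loop (introduced in the next section) with a cap on the number of iterations and a break statement. The variable names, starting guess and tolerance are arbitrary choices made for this example.

```r
target <- 2          # we want the square root of this number
estimate <- 1        # starting guess
tolerance <- 1e-8    # stop when successive estimates differ by less than this

for (iteration in 1:100) {            # cap the number of iterations
  new_estimate <- (estimate + target / estimate) / 2
  if (abs(new_estimate - estimate) < tolerance) {
    estimate <- new_estimate
    break                             # converged: leave the loop early
  }
  estimate <- new_estimate
}
print(estimate)    # compare with sqrt(2)
print(iteration)   # number of iterations used
```

Capping the iterations guards against a loop that never ends if the calculation fails to converge.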

for loops

The for loop is the most common type of loop. Use a for loop to execute a block of code a known number of times.

Syntax

for (variable in sequence) {
expression
}

Each pass through the series of commands in a loop is known as an iteration.

For example:

for (i in 1:5) {
print(i)
}

In this example, R will evaluate the expression 5 times. In the first iteration, i takes the value 1, so R evaluates print(1). In the second iteration, i takes the value 2, and so on until i reaches 5.

The letter 'i' can be replaced with any valid variable name, and the sequence can be almost any vector: numbers, character strings, or the elements of a list.

Try:

for (m in 4:10) {
  print(m*2)
}

for (a in c("Hello","R","Programmers")) {
  print(a)
}

for (z in 1:30) {
  a <- rnorm(n=1, mean=5, sd=2)  # draw a value from a normal distribution with mean 5 and sd 2
  print(a)
}

Loops are often used to step through a dataset one row or column at a time. We will use loops to perform operations on a CO2 dataset.

CO2 <- read.csv("co2_data.csv")
for (i in 1:nrow(CO2)) { # for each row in the CO2 dataset
  print(CO2$conc[i]) # print the CO2 concentration
}

for (i in 1:nrow(CO2)) { # for each row in the CO2 dataset
  if (CO2$Type[i]=="Quebec") { # if the type is "Quebec"
    print(CO2$conc[i]) # print the CO2 concentration
  }
}
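One detail worth knowing when looping over rows: a sequence written as 1:n counts backwards when n is 0, so the loop body still runs on an empty dataset. The sketch below uses seq_len() to avoid this. It assumes that R's built-in CO2 data frame (columns Plant, Type, Treatment, conc, uptake) can stand in for co2_data.csv; the object names co2, quebec_conc and empty are made up for this example.

```r
# Assumption: the built-in CO2 data frame stands in for co2_data.csv
co2 <- as.data.frame(datasets::CO2)

quebec_conc <- numeric(0)              # empty vector to collect results
for (i in seq_len(nrow(co2))) {        # for each row of the data frame
  if (co2$Type[i] == "Quebec") {
    quebec_conc <- c(quebec_conc, co2$conc[i])
  }
}
length(quebec_conc)

# seq_len() is safe when there is nothing to loop over
empty <- co2[0, ]
length(1:nrow(empty))          # 2, because 1:0 counts down from 1 to 0
length(seq_len(nrow(empty)))   # 0, so a loop body would never run
```

Storing the values in a vector, rather than printing them, makes the result available for later steps in the script.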

The expression part of the loop can be almost anything and is usually a compound statement containing many commands.

for (i in 4:5) { # for columns 4 and 5 of the CO2 dataset
  print(colnames(CO2[i])) # print the name of that column
  print(mean(CO2[,i])) # print the mean of that column
}

Note that this could be done more quickly using apply(), but that wouldn't teach you about loops.
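For comparison, here is a minimal sketch of the loop-free alternatives. It again assumes that R's built-in CO2 data frame matches co2_data.csv, so that columns 4 and 5 are conc and uptake; the object names are illustrative.

```r
# Assumption: the built-in CO2 data frame stands in for co2_data.csv
co2 <- as.data.frame(datasets::CO2)

# apply() with MARGIN = 2 runs the function on each column
col_means_apply <- apply(co2[, 4:5], 2, mean)

# sapply() treats the data frame as a list of columns
col_means_sapply <- sapply(co2[, 4:5], mean)

# colMeans() is a dedicated function for this particular task
col_means_direct <- colMeans(co2[, 4:5])

print(col_means_apply)
```

All three calls return a named numeric vector holding one mean per column, which is the same information the loop prints line by line.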