QCBS R Workshops
This series of 8 workshops walks participants through the steps required to use R for a wide array of statistical analyses relevant to research in biology and ecology. These open-access workshops were created by members of the QCBS both for members of the QCBS and the larger community.
Workshop 8: Programming in R
Developed by: Johanna Bradie, Sylvain Christin, Ben Haller, Guillaume Larocque
Link to associated Prezi: POST LINK HERE
Download the R script and data for this lesson:
Summary: This workshop focuses on basic programming in R. In this workshop, you will learn how to use for loops, write your own functions and run simulations in R. In addition, you will learn to use data.table to work quickly with large datasets and learn tips to program efficiently. The last part of the workshop will discuss code optimization, as well as parallel and multi-threaded computing.
Learning Objectives
- Flow control
- Writing functions in R
- Speeding up your code
- Useful packages for biologists
Flow Control
Flow control lets you run a series of commands repeatedly, or only when specified conditions are met. In this section, you will learn how to:
- Execute statements conditionally using: if, if/else
- Execute statements multiple times using: for loops, while loops, repeat loops
- Modify loop execution using: break statements, next statements
if and if/else statements
if and if/else statements are good for:
- checking for problems or violations of assumptions
- treating different rows of your data frame differently
- testing for the existence of a file or variable
Syntax
if(condition) {
  expression
}
# The expression can be any command that you would like R to perform.

if(condition) {
  expression
} else {
  expression
}
For example,
if ((2+2)==4) {
  print("Arithmetic works.")
}
if ((2+1)==4) {
  print("Arithmetic works.")
}
Curly brackets { } are used so that R knows to expect more input. When using brackets, R waits to evaluate the command until the brackets have been closed. If the curly brackets are not used, R may not behave as you are expecting. For example, try:
if ((2+1)==4) print("Arithmetic works.")
else print("Houston, we have a problem.")
The else statement fails with an error because R evaluates the first line as a complete command before it reads the else on the second line.
Instead, use:
if ((2+2)==4) {
  print("Arithmetic works.") # R does not evaluate this expression yet because the bracket isn't closed.
} else {
  print("Houston, we have a problem.")
} # Since all brackets are now closed, R will evaluate the commands.
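The uses listed at the start of this section, such as testing for the existence of a file or variable, can be sketched as follows. This is only an illustration: the file name `my_data.csv` and the vector `x` are hypothetical and do not come from the workshop data.

```r
# Hypothetical file name, used only for illustration
data_file <- "my_data.csv"

# file.exists() returns TRUE or FALSE, so it can be used directly as a condition
if (file.exists(data_file)) {
  msg <- "File found."
} else {
  msg <- "File not found; check your working directory."
}
print(msg)

# exists() takes the variable name as a character string
x <- c(2, 4, 6)
if (exists("x")) {
  print("x is defined.")
}

# Checking an assumption before running a calculation
if (any(x < 0)) {
  print("Negative values found; sqrt() would return NaN for them.")
}
```

Note that each condition here is a single TRUE or FALSE value; any() collapses the vector comparison `x < 0` into one value so that it can be used with if.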
Note that if and if/else test a single condition. If you want to test a vector of conditions (and get a vector of results), you can use ifelse:
For example,
a<-1:10
ifelse(a>5,"yes","no")
You can also use ifelse inside a function call so that the function is applied only to the elements that meet a condition:
For example,
a<-(-4):5
sqrt(ifelse(a>=0,a,NA))
Exercise
Paws<-"cat"
Scruffy<-"dog"
Sassy<-"cat"
animals<-c(Paws,Scruffy,Sassy)
1. Use an if statement to print “meow” if Paws is a “cat”.
2. Use an if/else statement to print “woof” if you supply an object that is a “dog” and “meow” if it is not. Try it out with Paws and Scruffy.
3. Use an ifelse statement to display “woof” for animals that are dogs and “meow” for animals that are cats.
Loops
Loops are good for:
- doing something for every element of an object
- doing something until the processed data runs out
- doing something for every file in a folder
- doing something that can fail, until it succeeds
- iterating a calculation until it converges
for loops
The for loop is the most common type of loop. Use a for loop to execute a block of code a known number of times.
Syntax
for (variable in sequence) { expression }
Each execution of the series of commands in a loop is known as an iteration.
For example:
for (i in 1:5) { print(i) }
In this example, R evaluates the expression 5 times. In the first iteration, R replaces each instance of i with 1. In the second iteration, i is replaced with 2, and so on.
The letter 'i' can be replaced with any valid variable name, and the sequence can be almost any vector.
Try:
for (m in 4:10) {
  print(m*2)
}

for (a in c("Hello","R","Programmers")) {
  print(a)
}

for (z in 1:30) {
  a<-rnorm(n=1,mean=5,sd=2) # draw a value from a normal distribution with mean 5 and sd 2
  print(a)
}
Loops are often used to loop over a dataset. We will use loops to perform functions on a CO2 dataset.
CO2<-read.csv("co2_data.csv")

for (i in 1:length(CO2[,1])) { # for each row in the CO2 dataset
  print(CO2$conc[i]) # print the CO2 concentration
}

for (i in 1:length(CO2[,1])) { # for each row in the CO2 dataset
  if(CO2$Type[i]=="Quebec") { # if the type is "Quebec"
    print(CO2$conc[i]) # print the CO2 concentration
  }
}
The expression part of the loop can be almost anything and is usually a compound statement containing many commands.
for (i in 4:5) { # for i in 4 to 5
  print(colnames(CO2[i])) # print the name of that column
  print(mean(CO2[,i])) # print the mean of that column from the CO2 dataset
}
Note that this could be done more quickly using apply(), but that wouldn't teach you about loops.
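For comparison, here is a minimal sketch of the apply() version. It uses the CO2 dataset that ships with R, on the assumption that its columns match those of co2_data.csv (columns 4 and 5 being conc and uptake).

```r
# Built-in CO2 dataset from the datasets package,
# assumed to have the same layout as co2_data.csv
data(CO2)

# MARGIN = 2 applies the function to each column
col_means <- apply(CO2[, 4:5], 2, mean)
print(col_means)
```

The result is a named vector with one mean per column, so no explicit loop or print statement per column is needed.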
while loops and repeat loops operate similarly to for loops. Once you understand how for loops work, you should be able to use any type of loop. You will see some examples of while loops and repeat loops in the next section.
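As an illustration of a while loop used to iterate a calculation until it converges, the sketch below approximates a square root by repeated averaging (Newton's method). The variable names, starting guess, and tolerance are arbitrary choices made for this example.

```r
target <- 2        # number whose square root we want
estimate <- 1      # starting guess
tolerance <- 1e-8  # stop when successive estimates differ by less than this
change <- Inf      # ensures the loop body runs at least once
iterations <- 0

while (change > tolerance) {
  new_estimate <- (estimate + target / estimate) / 2
  change <- abs(new_estimate - estimate)
  estimate <- new_estimate
  iterations <- iterations + 1
}

print(estimate)    # close to sqrt(2)
print(iterations)  # number of iterations needed
```

Unlike a for loop, the number of iterations is not fixed in advance: the loop stops as soon as the condition in the parentheses becomes FALSE.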
Exercise
NEED TO WRITE IN AN EXERCISE
Loop Modifications
Normally, loops iterate over and over until they finish. To change this behavior, you can use break, which exits the loop entirely, or next, which stops executing the current iteration and jumps to the next one.
For example,
# Print the CO2 concentrations for "chilled" treatments and keep count of how many replications there were.
count=0
for (i in 1:length(CO2[,1])) {
  if (CO2$Treatment[i]=="nonchilled") next # Skip to next iteration if treatment is nonchilled
  count=count+1
  print(CO2$conc[i])
}
print(count) # The count and print command were performed 42 times.

# This could be equivalently written using a repeat loop:
count=0
i=0
repeat {
  i <- i + 1
  if (i > length(CO2[,1])) break # stop looping; checked first so that next can never skip it
  if (CO2$Treatment[i]=="nonchilled") next # skip this iteration
  count=count+1
  print(CO2$conc[i])
}
print(count)

# This could also be written using a while loop:
i <- 0
count=0
while (i < length(CO2[,1])) {
  i <- i + 1
  if (CO2$Treatment[i]=="nonchilled") next # skip this iteration
  count=count+1
  print(CO2$conc[i])
}
print(count)
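A smaller, self-contained sketch may help separate the two keywords. The vector `values` and the threshold of 10 are made up for illustration; the loop uses next to skip small values and break to stop at the first large one.

```r
values <- c(3, 8, 1, 12, 7, 15)
first_big <- NA

for (v in values) {
  if (v <= 10) next   # not large enough, go to the next value
  first_big <- v      # first value above 10
  break               # stop looping; remaining values are never examined
}

print(first_big)
```

Here the loop ends at 12, so the later values 7 and 15 are never reached.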
Exercise
NEED TO WRITE IN AN EXERCISE
Using flow control to make a complex plot
The idea here is that we have a dataset we want to plot, with concentration and uptake values, but each point has a type (Quebec or Mississippi) and a treatment (“chilled” or “nonchilled”) and we want to plot the points differently for these cases.
You can read more about mathematical typesetting with ?plotmath, and more about how different colors, sizes, rotations, etc. are used in ?par.
head(CO2) # Look at the dataset
unique(CO2$Type)
unique(CO2$Treatment)

# plot the dataset, showing each type and treatment as a different colour
# Type "n" tells R to not actually plot the points.
plot(x=CO2$conc, y=CO2$uptake, type="n", cex.lab=1.4, xlab="CO2 concentration", ylab="CO2 uptake")

for (i in 1:length(CO2[,1])) {
  if (CO2$Type[i]=="Quebec" && CO2$Treatment[i]=="nonchilled") {
    points(CO2$conc[i],CO2$uptake[i],col="red",type="p")
  }
  if (CO2$Type[i]=="Quebec" && CO2$Treatment[i]=="chilled") {
    points(CO2$conc[i],CO2$uptake[i],col="blue")
  }
  if (CO2$Type[i]=="Mississippi" && CO2$Treatment[i]=="nonchilled") {
    points(CO2$conc[i],CO2$uptake[i],col="orange")
  }
  if (CO2$Type[i]=="Mississippi" && CO2$Treatment[i]=="chilled") {
    points(CO2$conc[i],CO2$uptake[i],col="green")
  }
}
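The same colour assignment can also be sketched without a loop, using the vectorized ifelse introduced earlier. This is an alternative shown for comparison, not the workshop's method, and it assumes the CO2 dataset that ships with R has the same Type and Treatment values as co2_data.csv.

```r
# Built-in CO2 dataset, assumed to match co2_data.csv
data(CO2)

# One colour per row, chosen with nested ifelse() calls
point_col <- ifelse(CO2$Type == "Quebec",
                    ifelse(CO2$Treatment == "nonchilled", "red", "blue"),
                    ifelse(CO2$Treatment == "nonchilled", "orange", "green"))

# col accepts a vector, so each point is drawn in its own colour
plot(x = CO2$conc, y = CO2$uptake, col = point_col, cex.lab = 1.4,
     xlab = "CO2 concentration", ylab = "CO2 uptake")
```

The loop version is easier to extend when each group needs very different plotting commands; the vectorized version is shorter when only one argument, such as the colour, changes between groups.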
Exercise
NEED TO WRITE IN AN EXERCISE