Programming basics

R is easiest to use when you know how the R language works. This tutorial will teach you the implicit background knowledge that informs every piece of R code.

Welcome to R

In this tutorial, you’ll learn about:

  • functions and their arguments
  • objects
  • R’s basic data types
  • R’s basic data structures including vectors and lists
  • R’s package system

Functions

Run a function

Can you use the sqrt() function in the chunk below to compute the square root of 962?

Solution
sqrt(962)
Examine the function code

Use the code chunk below to examine the code that sqrt() runs.

Solution
sqrt
sqrt vs. lm

Compare the code in sqrt() to the code in another R function, lm(). Examine lm()’s code body in the chunk below.

Solution
lm
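
The difference between the two printouts can also be checked in code. The following is a minimal sketch using only base R; it assumes the default session, where the stats package (which provides lm()) is attached.

```r
# sqrt() is a primitive: it runs compiled code, so there is no R body to print.
is.primitive(sqrt)    # TRUE
is.primitive(lm)      # FALSE, lm() is written in R

# body() returns a function's R code, or NULL for a primitive.
is.null(body(sqrt))   # TRUE
is.null(body(lm))     # FALSE
```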
Help pages

Wow! lm() runs a lot of code. What does it do? Open the help page for lm() in the chunk below and find out.

Solution
?lm
Code comments

What do you think the chunk below will return? Run it and see. The result should be only 10. R will not run anything on a line after a # symbol. This is useful because it lets you write human-readable comments in your code: just place the comments after a #. Now delete the # and re-run the chunk. You should see both results.

Solution
sqrt(962)
sqrt(100)

Arguments

Find function arguments

rnorm() is a function that generates random variables from a normal distribution. Find the arguments of rnorm().

Solution
args(rnorm)
Optional arguments

Which arguments of rnorm are optional?




Using multiple arguments

Use rnorm() to generate 100 random normal values with a mean of 100 and a standard deviation of 15.

Solution
rnorm(100, mean = 100, sd = 15)
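
As a rough illustration of how R matches arguments, the sketch below calls rnorm() once with named arguments and once by position. The object names by_name and by_position and the seed value 42 are arbitrary choices for this example; set.seed() is used only so that both calls draw the same random values.

```r
# The argument names of rnorm(), in the order R expects them.
names(formals(rnorm))               # "n" "mean" "sd"

# Named and positional arguments describe the same call.
set.seed(42)                        # fix the seed so the draw can be repeated
by_name <- rnorm(5, mean = 100, sd = 15)
set.seed(42)
by_position <- rnorm(5, 100, 15)
identical(by_name, by_position)     # TRUE
```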
Spot the flaw

Can you spot the error in the code below? Fix the code and then re-run it.

Solution
rnorm(100, mean = 100, sd = 15)

Objects

You can choose almost any name you like for an object, as long as the name does not begin with a number or a special character like +, -, *, /, ^, !, @, or &.
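
One way to test a candidate name is with base R's make.names(), which returns its input unchanged when the input is already a valid name. This is only a sketch, and the candidate names below are made up for illustration.

```r
# Hypothetical candidate names, some valid and some not.
candidates <- c("today", "foo_bar", "1st", "+1", "n.obs")

# A name is valid if make.names() has nothing to repair.
candidates == make.names(candidates)   # TRUE TRUE FALSE FALSE TRUE
```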

Object names

Which of these would be valid object names?







Using objects

In the code chunk below, save the results of rnorm(100, mean = 100, sd = 15) to an object named data. Then, on a new line, call the hist() function on data to plot a histogram of the random values.

Solution
data <- rnorm(100, mean = 100, sd = 15)
hist(data)
What if?

What do you think would happen if you assigned data to a new object named copy, like this? Run the code and then inspect both data and copy.

Solution
data <- rnorm(100, mean = 100, sd = 15)
copy <- data
data
copy
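
A related question is whether the two objects stay linked after the assignment. The short sketch below suggests they do not: changing the copy leaves the first object as it was. The object name original is an assumption made for this example.

```r
original <- c(1, 2, 3)
copy <- original    # copy now holds the same values
copy[1] <- 100      # change only the copy

original            # 1 2 3
copy                # 100 2 3
```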
Data sets

Objects provide an easy way to store data sets in R. In fact, R comes with many toy data sets pre-loaded. Examine the contents of iris to see a classic toy data set. Hint: how could you learn more about the iris object?

Solution
iris
rm()

What if you accidentally overwrite an object? If that object came with R or one of its packages, you can restore the original version of the object by removing your version with rm(). Run rm() on iris below to restore the iris data set, examining the contents of iris afterwards to make sure it was restored.

Solution
iris <- 1
iris
rm(iris)
iris
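
Why does rm() bring the data set back? A plausible reading is that assigning to iris never touched the original. It created a second object that R finds first. The sketch below checks this with the :: operator, assuming the default session in which the datasets package is attached.

```r
iris <- 1                       # your own object now hides the built-in one
is.data.frame(iris)             # FALSE
is.data.frame(datasets::iris)   # TRUE, the original is still in its package

rm(iris)                        # remove your version
is.data.frame(iris)             # TRUE, R finds the original again
```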

Vectors

Create a vector

In the chunk below, create a vector that contains the integers from one to ten.

Solution
c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
The : shortcut

If your vector contains a sequence of contiguous integers, you can create it with the : shortcut. Run 1:10 in the chunk below. What do you get? What do you suppose 1:20 would return?

Solution
1:10
Element extraction

You can extract any element of a vector by placing a pair of brackets behind the vector. Inside the brackets place the number of the element that you’d like to extract. For example, vec[3] would return the third element of the vector named vec. Use the chunk below to extract the fourth element of vec.

Solution
vec <- c(1, 2, 4, 8, 16)
vec[4]
Vector subsetting

You can also use [] to extract multiple elements of a vector. Place the vector c(1,2,5) between the brackets below. What does R return?

Solution
vec <- c(1, 2, 4, 8, 16)
vec[c(1,2,5)]
Names

If the elements of your vector have names, you can extract them by name. To do so place a name or vector of names in the brackets behind a vector. Surround each name with quotation marks, e.g. vec2[c("alpha", "beta")].

Extract the element named gamma from the vector below.

Solution
vec2 <- c(alpha = 1, beta = 2, gamma = 3)
vec2["gamma"]
Vectorised operations

Predict what the code below will return. Then look at the result.

Solution
c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) + c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Vector recycling

Predict what the code below will return. Then look at the result.

Solution
1 + c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
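
Recycling is not limited to single values. As a small sketch with made-up numbers, a shorter vector is repeated until it matches the length of the longer one; the object names short_plus_long and scaled are assumptions for this example.

```r
# c(10, 20) is reused as 10, 20, 10, 20 to match the four-element vector.
short_plus_long <- c(1, 2, 3, 4) + c(10, 20)
short_plus_long    # 11 22 13 24

# A single number is recycled in the same way.
scaled <- c(1, 2, 3, 4) * 2
scaled             # 2 4 6 8
```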

Types

Atomic types

Which of these is not an atomic data type?








What type?

What type of data is "1L"?





Integers

Create a vector of integers from one to five. Can you imagine why you might want to use integers instead of numbers/doubles?

Solution
c(1L, 2L, 3L, 4L, 5L)
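
To see which type R assigns to a value, you can pass it to typeof(). The lines below are a brief sketch using only base R.

```r
typeof(1L)      # "integer"
typeof(1)       # "double"
typeof("1L")    # "character", the quotes make it a string
typeof(1:5)     # "integer", the : shortcut creates integers

identical(1:5, c(1L, 2L, 3L, 4L, 5L))   # TRUE
```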
Floating point arithmetic

Computers must use a finite amount of memory to store decimal numbers (which can sometimes require infinite precision). As a result, some decimals can only be saved as very precise approximations. From time to time you’ll notice side effects of this imprecision, like below.

Compute the square root of two, square the answer (e.g. multiply the square root of two by the square root of two), and then subtract two from the result. What answer do you expect? What answer do you get?

Solution
sqrt(2) * sqrt(2) - 2
# sqrt(2)^2 - 2 will also work!
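
Because of this imprecision, comparing doubles with == can give surprising answers. One common workaround, sketched here with base R, is to compare within a small tolerance using all.equal(). The object name x is an assumption for this example.

```r
x <- sqrt(2)^2

x == 2                     # FALSE, x differs from 2 by a tiny amount
isTRUE(all.equal(x, 2))    # TRUE, equal within a small tolerance
```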
Vectors

How many types of data can you put into a single vector?




Character or object?

One of the most common mistakes in R is to call an object when you mean to call a character string and vice versa.
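
The sketch below illustrates the mix-up with a made-up object named fruit. Without quotes, R looks up the object and uses its contents. With quotes, R treats the text itself as the value.

```r
fruit <- "banana"   # fruit is an object that stores a character string

nchar(fruit)        # 6, the length of "banana", the object's contents
nchar("fruit")      # 5, the length of the literal text "fruit"
```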

Which of these are object names? What is the difference between object names and character strings?







Lists

Lists vs. vectors

Which data structure(s) could you use to store these pieces of data in the same object? 1001, TRUE, "stories".




Make a list

Make a list that contains the elements 1001, TRUE, and "stories". Give each element a name.

Solution
list(number = 1001, logical = TRUE, string = "stories")
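
To see why a list is useful here, compare it with what c() does to the same values. This is a minimal sketch; the object names as_vector and as_list are assumptions for the example.

```r
# c() builds an atomic vector, so every element is coerced to one type.
as_vector <- c(1001, TRUE, "stories")
as_vector                  # "1001" "TRUE" "stories"
typeof(as_vector)          # "character"

# list() keeps each element's own type.
as_list <- list(1001, TRUE, "stories")
sapply(as_list, typeof)    # "double" "logical" "character"
```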
Extract an element

Extract the number 1001 from the list below.

Solution
things <- list(number = 1001, logical = TRUE, string = "stories")
things$number
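
The $ operator is one of several ways to reach into a list. The sketch below, using the same things list, contrasts double brackets, which return the element itself, with single brackets, which return a smaller list.

```r
things <- list(number = 1001, logical = TRUE, string = "stories")

things[["number"]]           # 1001, the same result as things$number
things["number"]             # a list of length one that contains 1001

is.list(things[["number"]])  # FALSE
is.list(things["number"])    # TRUE
```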
Data Frames

You can make a data frame with the data.frame() function, which works similarly to c() and list(). Assemble the vectors below into a data frame with the column names numbers, logicals, and strings.

Solution
nums <- c(1, 2, 3, 4)
logs <- c(TRUE, TRUE, FALSE, TRUE)
strs <- c("apple", "banana", "carrot", "duck")
data.frame(numbers = nums, logicals = logs, strings = strs)
Extract a column

Given that a data frame is a type of list (with named elements), how could you extract the strings column of the following df data frame?

nums <- c(1, 2, 3, 4)
logs <- c(TRUE, TRUE, FALSE, TRUE)
strs <- c("apple", "banana", "carrot", "duck")
df <- data.frame(numbers = nums, logicals = logs, strings = strs)

Extract the strings column below.

Solution
df$strings
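
Since a data frame is a list of columns, the list tools from the previous section should carry over. The following sketch rebuilds df so that it runs on its own and checks that claim.

```r
df <- data.frame(
  numbers  = c(1, 2, 3, 4),
  logicals = c(TRUE, TRUE, FALSE, TRUE),
  strings  = c("apple", "banana", "carrot", "duck")
)

is.list(df)                               # TRUE
names(df)                                 # "numbers" "logicals" "strings"
identical(df$strings, df[["strings"]])    # TRUE
```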

Packages

A common error

What does this common error message suggest? object _____ not found.




Load a package

In the code chunk below, load the tidyverse package. Whenever you load a package, R will also load all of the packages that the first package depends on. tidyverse takes advantage of this to create a shortcut for loading several common packages at once. Whenever you load tidyverse, tidyverse also loads ggplot2, dplyr, tibble, tidyr, readr, and purrr.

Solution
library(tidyverse)

Did you know that library() is a special function in R? You can pass library() a package name in quotes, like library("tidyverse"), or not in quotes, like library(tidyverse). Both will work! That’s often not the case with R functions.

In general, you should always use quotes unless you are writing the name of something that is already loaded into R’s memory, like a function, vector, or data frame.
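
The sketch below tries both forms. It uses the stats package, which ships with R, so that it runs without installing anything. The last call shows what happens when the package name is stored in an object (here the made-up name pkg): library() needs character.only = TRUE to look inside the object instead of treating pkg as the package name.

```r
library(stats)      # unquoted package name
library("stats")    # quoted package name, same effect

pkg <- "stats"
library(pkg, character.only = TRUE)   # use the string stored in pkg

"package:stats" %in% search()         # TRUE, stats is attached
```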

Install packages

But what if the package that you want to load is not installed on your computer? How would you install the dplyr package on your own computer?

Solution
install.packages("dplyr")
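
If you are not sure whether a package is already installed, you can check before installing. The helper below, ensure_installed(), is a hypothetical function written for this example and is not part of R. It is called on stats, which ships with R, so nothing is downloaded when the sketch runs.

```r
# Hypothetical helper: install a package only if it is missing.
ensure_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package is available afterwards.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

ensure_installed("stats")   # already installed, so no download happens
```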

Congratulations. You now have a formal sense of how the basics of R work. Although you may think of yourself as a Data Scientist, this brief Computer Science background will help you as you analyze data. Whenever R does something unexpected, you can apply your knowledge of how R works to figure out what went wrong.