1 Starting out in R

R is both a programming language and an interactive environment for data exploration and statistics. Today we will be concentrating on R as an interactive environment.

Working with R is primarily text-based. The basic mode of use for R is that the user types in a command in the R language and presses enter, and then R computes and displays the result.

We will be working in RStudio. The easiest way to get started is to go to RStudio Cloud and create a new project. Monash staff and students can log in using their Monash Google account.

The main way of working with R is the console, where you enter commands and view results. RStudio surrounds this with various conveniences. In addition to the console panel, RStudio provides panels containing:

  • A text editor, where R commands can be recorded for future reference.
  • A history of commands that have been typed on the console.
  • An “environment” pane with a list of variables, which contain values that R has been told to save from previous commands.
  • A file manager.
  • Help on the functions available in R.
  • A panel to show plots.

Open RStudio, click on the “Console” pane, type 1+1 and press enter. R displays the result of the calculation. In this document, we will show such an interaction with R as below.

1+1
## [1] 2

+ is called an operator. R has the operators you would expect for for basic mathematics: + - * / ^. It also has operators that do more obscure things.

* has higher precedence than +. We can use brackets if necessary ( ). Try 1+2*3 and (1+2)*3.

Spaces can be used to make code easier to read.

We can compare with == < > <= >= !=. This produces a logical value, TRUE or FALSE. Note the double equals, ==, for equality comparison.

2 * 2 == 4
## [1] TRUE

There are also character strings such as "string". A character string must be surrounded by either single or double quotes.

1.1 Variables

A variable is a name for a value. We can create a new variable by assigning a value to it using <-.

width <- 5

RStudio helpfully shows us the variable in the “Environment” pane. We can also print it by typing the name of the variable and hitting enter. In general, R will print to the console any object returned by a function or operation unless we assign it to a variable.

width
## [1] 5

Examples of valid variables names: hello, subject_id, subject.ID, x42. Spaces aren’t ok inside variable names. Dots (.) are ok in R, unlike in many other languages. Numbers are ok, except as the first character. Punctuation is not allowed, with two exceptions: _ and ..

We can do arithmetic with the variable:

# Area of a square
width * width
## [1] 25

and even save the result in another variable:

# Save area in "area" variable
area <- width * width

We can also change a variable’s value by assigning it a new value:

width <- 10
width
## [1] 10
area
## [1] 25

Notice that the value of area we calculated earlier hasn’t been updated. Assigning a new value to one variable does not change the values of other variables. This is different to a spreadsheet, but usual for programming languages.

1.2 Saving code in an R script

Once we’ve created a few variables, it becomes important to record how they were calculated so we can reproduce them later.

The usual workflow is to save your code in an R script (“.R file”). Go to “File/New File/R Script” to create a new R script. Code in your R script can be sent to the console by selecting it or placing the cursor on the correct line, and then pressing Control-Enter (Command-Enter on a Mac).

Tip

Add comments to code, using lines starting with the # character. This makes it easier for others to follow what the code is doing (and also for us the next time we come back to it).

Challenge: using variables

  1. Re-write this calculation so that it doesn’t use variables:
a <- 4*20
b <- 7
a+b
  1. Re-write this calcuation over multiple lines, using a variable:
2*2+2*2+2*2

1.3 Vectors

A vector of numbers is a collection of numbers. “Vector” means different things in different fields (mathematics, geometry, biology), but in R it is a fancy name for a collection of numbers. We call the individual numbers elements of the vector.

We can make vectors with c( ), for example c(1,2,3). c means “combine”. R is obsesssed with vectors, in R even single numbers are vectors of length one. Many things that can be done with a single number can also be done with a vector. For example arithmetic can be done on vectors as it can be on single numbers.

myvec <- c(10,20,30,40,50)
myvec
## [1] 10 20 30 40 50
myvec + 1
## [1] 11 21 31 41 51
myvec + myvec
## [1]  20  40  60  80 100
length(myvec)
## [1] 5
c(60, myvec)
## [1] 60 10 20 30 40 50
c(myvec, myvec)
##  [1] 10 20 30 40 50 10 20 30 40 50

When we talk about the length of a vector, we are talking about the number of numbers in the vector.

1.4 Types of vector

We will also encounter vectors of character strings, for example "hello" or c("hello","world"). Also we will encounter “logical” vectors, which contain TRUE and FALSE values. R also has “factors”, which are categorical vectors, and behave much like character vectors (think the factors in an experiment).

Challenge: mixing types

Sometimes the best way to understand R is to try some examples and see what it does.

What happens when you try to make a vector containing different types, using c( )? Make a vector with some numbers, and some words (eg. character strings like "test", or "hello").

Why does the output show the numbers surrounded by quotes " " like character strings are?

Because vectors can only contain one type of thing, R chooses a lowest common denominator type of vector, a type that can contain everything we are trying to put in it. A different language might stop with an error, but R tries to soldier on as best it can. A number can be represented as a character string, but a character string can not be represented as a number, so when we try to put both in the same vector R converts everything to a character string.

1.5 Indexing vectors

Access elements of a vector with [ ], for example myvec[1] to get the first element. You can also assign to a specific element of a vector.

myvec[1]
## [1] 10
myvec[2]
## [1] 20
myvec[2] <- 5
myvec
## [1] 10  5 30 40 50

Can we use a vector to index another vector? Yes!

myind <- c(4,3,2)
myvec[myind]
## [1] 40 30  5

We could equivalently have written:

myvec[c(4,3,2)]
## [1] 40 30  5

Challenge: indexing

We can create and index character vectors as well. A cafe is using R to create their menu.

items <- c("spam", "eggs", "beans", "bacon", "sausage")
  1. What does items[-3] produce? Based on what you find, use indexing to create a version of items without "spam".

  2. Use indexing to create a vector containing spam, eggs, sausage, spam, and spam.

  3. Add a new item, “lobster”, to items.

1.6 Sequences

Another way to create a vector is with ::

1:10
##  [1]  1  2  3  4  5  6  7  8  9 10

This can be useful when combined with indexing:

items[1:4]
## [1] "spam"  "eggs"  "beans" "bacon"

Sequences are useful for other things, such as a starting point for calculations:

x <- 1:10
x*x
##  [1]   1   4   9  16  25  36  49  64  81 100
plot(x, x*x)

1.7 Functions

Functions are the things that do all the work for us in R: calculate, manipulate data, read and write to files, produce plots. R has many built in functions and we will also be loading more specialized functions from “packages”.

We’ve already seen several functions: c( ), length( ), and plot( ). Let’s now have a look at sum( ).

sum(myvec)
## [1] 135

We called the function sum with the argument myvec, and it returned the value 135. We can get help on how to use sum with:

?sum

Some functions take more than one argument. Let’s look at the function rep, which means “repeat”, and which can take a variety of different arguments. In the simplest case, it takes a value and the number of times to repeat that value.

rep(42, 10)
##  [1] 42 42 42 42 42 42 42 42 42 42

As with many functions in R—which is obsessed with vectors—the thing to be repeated can be a vector with multiple elements.

rep(c(1,2,3), 10)
##  [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3

So far we have used positional arguments, where R determines which argument is which by the order in which they are given. We can also give arguments by name. For example, the above is equivalent to

rep(c(1,2,3), times=10)
##  [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
rep(x=c(1,2,3), 10)
##  [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
rep(times=10, x=c(1,2,3))
##  [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3

Arguments can have default values, and a function may have many different possible arguments that make it do obscure things. For example, rep can also take an argument each=. It’s typical for a function to be invoked with some number of positional arguments, which are always given, plus some less commonly used arguments, typically given by name.

rep(c(1,2,3), each=3)
## [1] 1 1 1 2 2 2 3 3 3
rep(c(1,2,3), each=3, times=5)
##  [1] 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3 1 1
## [39] 1 2 2 2 3 3 3

Challenge: using functions

  1. Use sum to sum from 1 to 10,000.

  2. Look at the documentation for the seq function. What does seq do? Give an example of using seq with either the by or length.out argument.