# 1 Starting out in R

R is both a programming language and an interactive environment for data exploration and statistics. Today we will be concentrating on R as an interactive environment.

Working with R is primarily text-based. The basic mode of use for R is that the user types in a command in the R language and presses enter, and then R computes and displays the result.

We will be working in RStudio. The easiest way to get started is to go to RStudio Cloud and create a new project. Monash staff and students can log in using their Monash Google account.

The main way of working with R is the console, where you enter commands and view results. RStudio surrounds this with various conveniences. In addition to the console panel, RStudio provides panels containing:

• A text editor, where R commands can be recorded for future reference.
• A history of commands that have been typed on the console.
• An “environment” pane with a list of variables, which contain values that R has been told to save from previous commands.
• A file manager.
• Help on the functions available in R.
• A panel to show plots. Open RStudio, click on the “Console” pane, type `1+1` and press enter. R displays the result of the calculation. In this document, we will show such an interaction with R as below.

``1+1``
``##  2``

`+` is called an operator. R has the operators you would expect for for basic mathematics: `+` `-` `*` `/` `^`. It also has operators that do more obscure things.

`*` has higher precedence than `+`. We can use brackets if necessary `( )`. Try `1+2*3` and `(1+2)*3`.

Spaces can be used to make code easier to read.

We can compare with `== < > <= >= !=`. This produces a logical value, `TRUE` or `FALSE`. Note the double equals, `==`, for equality comparison.

``2 * 2 == 4``
``##  TRUE``

There are also character strings such as `"string"`. A character string must be surrounded by either single or double quotes.

## 1.1 Variables

A variable is a name for a value. We can create a new variable by assigning a value to it using `<-`.

``width <- 5``

RStudio helpfully shows us the variable in the “Environment” pane. We can also print it by typing the name of the variable and hitting enter. In general, R will print to the console any object returned by a function or operation unless we assign it to a variable.

``width``
``##  5``

Examples of valid variables names: `hello`, `subject_id`, `subject.ID`, `x42`. Spaces aren’t ok inside variable names. Dots (`.`) are ok in R, unlike in many other languages. Numbers are ok, except as the first character. Punctuation is not allowed, with two exceptions: `_` and `.`.

We can do arithmetic with the variable:

``````# Area of a square
width * width``````
``##  25``

and even save the result in another variable:

``````# Save area in "area" variable
area <- width * width``````

We can also change a variable’s value by assigning it a new value:

``````width <- 10
width``````
``##  10``
``area``
``##  25``

Notice that the value of `area` we calculated earlier hasn’t been updated. Assigning a new value to one variable does not change the values of other variables. This is different to a spreadsheet, but usual for programming languages.

## 1.2 Saving code in an R script

Once we’ve created a few variables, it becomes important to record how they were calculated so we can reproduce them later.

The usual workflow is to save your code in an R script (“.R file”). Go to “File/New File/R Script” to create a new R script. Code in your R script can be sent to the console by selecting it or placing the cursor on the correct line, and then pressing Control-Enter (Command-Enter on a Mac).

### Tip

Add comments to code, using lines starting with the `#` character. This makes it easier for others to follow what the code is doing (and also for us the next time we come back to it).

### Challenge: using variables

1. Re-write this calculation so that it doesn’t use variables:
``````a <- 4*20
b <- 7
a+b``````
1. Re-write this calcuation over multiple lines, using a variable:
``2*2+2*2+2*2``

## 1.3 Vectors

A vector of numbers is a collection of numbers. “Vector” means different things in different fields (mathematics, geometry, biology), but in R it is a fancy name for a collection of numbers. We call the individual numbers elements of the vector.

We can make vectors with `c( )`, for example `c(1,2,3)`. `c` means “combine”. R is obsesssed with vectors, in R even single numbers are vectors of length one. Many things that can be done with a single number can also be done with a vector. For example arithmetic can be done on vectors as it can be on single numbers.

``````myvec <- c(10,20,30,40,50)
myvec``````
``##  10 20 30 40 50``
``myvec + 1``
``##  11 21 31 41 51``
``myvec + myvec``
``##   20  40  60  80 100``
``length(myvec)``
``##  5``
``c(60, myvec)``
``##  60 10 20 30 40 50``
``c(myvec, myvec)``
``##   10 20 30 40 50 10 20 30 40 50``

When we talk about the length of a vector, we are talking about the number of numbers in the vector.

## 1.4 Types of vector

We will also encounter vectors of character strings, for example `"hello"` or `c("hello","world")`. Also we will encounter “logical” vectors, which contain `TRUE` and `FALSE` values. R also has “factors”, which are categorical vectors, and behave much like character vectors (think the factors in an experiment).

### Challenge: mixing types

Sometimes the best way to understand R is to try some examples and see what it does.

What happens when you try to make a vector containing different types, using `c( )`? Make a vector with some numbers, and some words (eg. character strings like `"test"`, or `"hello"`).

Why does the output show the numbers surrounded by quotes `" "` like character strings are?

Because vectors can only contain one type of thing, R chooses a lowest common denominator type of vector, a type that can contain everything we are trying to put in it. A different language might stop with an error, but R tries to soldier on as best it can. A number can be represented as a character string, but a character string can not be represented as a number, so when we try to put both in the same vector R converts everything to a character string.

## 1.5 Indexing vectors

Access elements of a vector with `[ ]`, for example `myvec` to get the first element. You can also assign to a specific element of a vector.

``myvec``
``##  10``
``myvec``
``##  20``
``````myvec <- 5
myvec``````
``##  10  5 30 40 50``

Can we use a vector to index another vector? Yes!

``````myind <- c(4,3,2)
myvec[myind]``````
``##  40 30  5``

We could equivalently have written:

``myvec[c(4,3,2)]``
``##  40 30  5``

### Challenge: indexing

We can create and index character vectors as well. A cafe is using R to create their menu.

``items <- c("spam", "eggs", "beans", "bacon", "sausage")``
1. What does `items[-3]` produce? Based on what you find, use indexing to create a version of `items` without `"spam"`.

2. Use indexing to create a vector containing spam, eggs, sausage, spam, and spam.

3. Add a new item, “lobster”, to `items`.

## 1.6 Sequences

Another way to create a vector is with `:`:

``1:10``
``##    1  2  3  4  5  6  7  8  9 10``

This can be useful when combined with indexing:

``items[1:4]``
``##  "spam"  "eggs"  "beans" "bacon"``

Sequences are useful for other things, such as a starting point for calculations:

``````x <- 1:10
x*x``````
``##     1   4   9  16  25  36  49  64  81 100``
``plot(x, x*x)`` ## 1.7 Functions

Functions are the things that do all the work for us in R: calculate, manipulate data, read and write to files, produce plots. R has many built in functions and we will also be loading more specialized functions from “packages”.

We’ve already seen several functions: `c( )`, `length( )`, and `plot( )`. Let’s now have a look at `sum( )`.

``sum(myvec)``
``##  135``

We called the function `sum` with the argument `myvec`, and it returned the value 135. We can get help on how to use `sum` with:

``?sum``

Some functions take more than one argument. Let’s look at the function `rep`, which means “repeat”, and which can take a variety of different arguments. In the simplest case, it takes a value and the number of times to repeat that value.

``rep(42, 10)``
``##   42 42 42 42 42 42 42 42 42 42``

As with many functions in R—which is obsessed with vectors—the thing to be repeated can be a vector with multiple elements.

``rep(c(1,2,3), 10)``
``##   1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3``

So far we have used positional arguments, where R determines which argument is which by the order in which they are given. We can also give arguments by name. For example, the above is equivalent to

``rep(c(1,2,3), times=10)``
``##   1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3``
``rep(x=c(1,2,3), 10)``
``##   1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3``
``rep(times=10, x=c(1,2,3))``
``##   1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3``

Arguments can have default values, and a function may have many different possible arguments that make it do obscure things. For example, `rep` can also take an argument `each=`. It’s typical for a function to be invoked with some number of positional arguments, which are always given, plus some less commonly used arguments, typically given by name.

``rep(c(1,2,3), each=3)``
``##  1 1 1 2 2 2 3 3 3``
``rep(c(1,2,3), each=3, times=5)``
``````##   1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3 1 1
##  1 2 2 2 3 3 3``````

### Challenge: using functions

1. Use `sum` to sum from 1 to 10,000.

2. Look at the documentation for the `seq` function. What does `seq` do? Give an example of using `seq` with either the `by` or `length.out` argument.