## R history

### S - 1976

The **new** S language,
published 1988,
introduced functions, ...

Statistical Models in S,
published 1991,
introduced data frames, ...

### R - 1993

We will be focussing on the modern [Tidyverse](https://tidyverse.org) approach.

* **"tidy data"**: the data frame ("tibble") is the one true data structure
* `dplyr`, `ggplot2`, `tidyr`, `purrr`, etc
* mostly written by [Hadley Wickham](http://hadley.nz/)
* simpler, safer, and less "helpful"

## Analysis cycle

From the ["R for Data Science" book](http://r4ds.had.co.nz/)

Data analysis isn't just statistical modelling.

*The greatest value of a picture is when it forces us to notice what we never expected to see.* -- John Tukey

## Topics covered today

For modelling, get started with Data Fluency's "Linear Models in R" workshop.

## Programming

### Use structures such as loops to automate repetitive tasks

```{r eval=F}
for(i in 1:10) {
    do_the_thing(i)
}
```

### Follow a script

```{r eval=F}
source("myscript.R")
```

* Runs script, command by command, to completion (or first error).

Or: use Rmarkdown to produce a document.

### Define new functions for interactive or scripted use

```{r eval=F}
source("myfunctions.R")
```

Or: write an R package.

## Programming, from a scientist's perspective

If every step of your analysis is recorded in an R script, with no manual steps:

* you have a complete record of what you have done
* easy to run entire script with test data
* changes easily tested, poor early decisions easily fixed
* today's big project becomes a function in a package, serves as tomorrow's building block

Programming is an essential part of **reproducible research**.

* other researchers can precisely understand and verify your work

## Algorithmic thinking

Programming involves two very different activities.

### 1. Thinking through the problem

* Define the problem precisely.
* Anticipate things that might go wrong or need special handling.
* Come up with a step-by-step solution, using phrases like **"for each"** and **"if"**.

This is **algorithmic thinking**. It doesn't need a computer!

### 2. Turning the step-by-step solution into R code

## Algorithmic thinking

Suppose you want to add the numbers 5, 3, 9, 7. We can write down steps to solve this, then convert them to R code.

```
Start with a total of 0.
Add 5 to the total.
Add 3 to the total.
Add 9 to the total.
Add 7 to the total.
The total now is the answer.
```

```
total <- 0
total <- total + 5
total <- total + 3
total <- total + 9
total <- total + 7
total
```

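As a quick check on the steps above, base R's `cumsum()` returns the running total after each addition, so its last element should match the final value of `total`. This is only a sketch for checking the arithmetic; the name `running` is introduced here for illustration.

```r
# Running total after adding each number in turn.
running <- cumsum(c(5, 3, 9, 7))
running                    # 5 8 17 24

# The last running total is the final answer.
running[length(running)]   # 24
```
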
## Algorithmic thinking

Suppose you want to add up a collection of numbers **in general**. We can write down steps to solve this, then convert them to R code.

```
Start with a total of 0.
For each number x in the collection:
    Add x to the total.
The answer is the final value of total.
```

```
collection <- c(5, 3, 9, 7)

total <- 0
for(x in collection) {
    total <- total + x
}
total
```

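For comparison, base R already provides `sum()`, which performs the same accumulation in one call. A minimal sketch checking that the loop and the built-in agree (the names `loop_total` and `builtin_total` are introduced here for illustration):

```r
collection <- c(5, 3, 9, 7)

# Accumulate the total one element at a time, as in the steps above.
loop_total <- 0
for (x in collection) {
    loop_total <- loop_total + x
}

# Base R's sum() adds up a numeric vector in a single call.
builtin_total <- sum(collection)

loop_total == builtin_total   # TRUE
```

Writing the loop by hand is still worthwhile practice: the same "for each" pattern applies to problems that have no ready-made function.
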
## Algorithmic thinking

Like a cooking recipe or a lab protocol, we can write down the steps once, then use them whenever we need to solve this problem.

```
This is how to add up the numbers in a collection:
    Start with a total of 0.
    For each number x in the collection:
        Add x to the total.
    The answer is the final value of total.

Now add up 5, 3, 9, and 7.
```

```
addup <- function(collection) {
    total <- 0
    for(x in collection) {
        total <- total + x
    }
    total
}

addup( c(5,3,9,7) )
```

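Thinking through the problem also means anticipating special cases. The sketch below is one possible variant, not part of the original `addup()`: the name `addup_checked` and the decision to stop on non-numeric input are assumptions made for illustration. It uses `if` for the special handling, and relies on the loop body simply not running when the collection is empty.

```r
# Hypothetical variant of addup() with a guard for unexpected input.
addup_checked <- function(collection) {
    # Special handling: refuse anything that is not numeric.
    if (!is.numeric(collection)) {
        stop("collection must be numeric")
    }

    total <- 0
    for (x in collection) {
        total <- total + x
    }
    total
}

addup_checked(c(5, 3, 9, 7))   # 24
addup_checked(numeric(0))      # 0: the loop body never runs
```
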
## (do programming section)

## Data

As your programs get more complicated, you will also need ways to represent complex data.
Often the largest task is to get the data into the right form to apply a tool
such as `ggplot`, `summarize`, or `lm`.

## Data

### Vectors

* A collection of a single kind of data: numeric, character, factor, logical
* A single number is a vector of length 1.
* Can have `names( )`.

```{r eval=FALSE}
c(x=1, y=2, z=3)
```

### Lists

* A special kind of vector that can hold any kind of data, including other vectors and lists.
* If you need to bundle together a miscellaneous collection of data, lists are your solution. For example, a function that needs to return multiple results can return a list.
* Play the same role as both the `list` and `dict` types in Python, or `object` and `Array` types in JavaScript.
* Access individual elements with `[[ ]]` or `$`.

```{r eval=FALSE}
list(x=TRUE, y="two", z=c(1,2,3))
```

## Data

### Data frames

* Data frames hold tabular data where the columns may be different types.
* Under the hood, a list of column vectors.
* Tidyverse has an improved data frame called a "tibble".

### Others (not covered today)

* **Matrices** hold tabular data all of the same type, usually numeric.
Distinct from data frames in R!
Can have `rownames( )` and `colnames( )`.
* **"S3" objects** are (usually) a list, with a special class attribute.
Example: an `lm` object holding a linear model.
* **"S4" objects** are a more formal approach to object-orientation.
Most heavily used by the Bioconductor project.
Example: a `GRanges` object holding genomic ranges.

## Tidy data

Tidy data doesn't mean tidy for a person to read; it means the easiest form for the computer to work with.

* only use data frames
* put all the data in a single data frame
* each row is a single unit of observation
* each column is a single piece of information

Similar to database design.

The experimental design is in the body of the table alongside the data, *not* in row names or column names.

If you have multiple columns containing the same kind of information, this is a hint the data is not tidy.

## Not tidy ...

Example from: [Wickham, H. (2014) Tidy data. The Journal of Statistical Software, vol. 59.](http://vita.had.co.nz/papers/tidy-data.html)

## ... tidier ... tidy ("melt" = "gather" = "pivot_longer")

## Learning more

["R for Data Science" by Garrett Grolemund and Hadley Wickham](https://r4ds.had.co.nz/)

* good general introduction, more detail on most topics covered today

[RStudio's cheat sheets](https://www.rstudio.com/resources/cheatsheets/)

* good to discover what exists

[Hadley Wickham's website](http://hadley.nz/)

* further books on Advanced R and R package development

[Official R manuals](https://cran.r-project.org/manuals.html)

* complete description of the R language

## (do tidyverse section)