5 Thinking in R
The result of a t-test is actually a value we can manipulate further. Two functions help us here. The class function gives the “public face” of a value, and typeof gives its underlying type, the way R thinks of it internally. For example, numbers are “numeric” and have some representation in computer memory, either “integer” for whole numbers only, or “double” which can hold fractional numbers (stored in memory in a base-2 version of scientific notation).
class(42)
## [1] "numeric"
typeof(42)
## [1] "double"
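To see the “integer” case as well, here is a small sketch. It assumes nothing beyond base R: appending L to a whole number asks R to store it as an integer rather than a double. The variable names x and y are just for illustration.

```r
# A plain number is stored as a double, even when it is a whole number.
x <- 42
# The L suffix asks R to store the value as an integer instead.
y <- 42L

class(x)    # "numeric"
typeof(x)   # "double"
class(y)    # "integer"
typeof(y)   # "integer"

# Both kinds of value count as numeric.
is.numeric(x)   # TRUE
is.numeric(y)   # TRUE
```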
Let’s look at the result of a t-test:
result <- t.test(gap2010$life_exp, gap2000$life_exp, paired=TRUE)
class(result)
## [1] "htest"
typeof(result)
## [1] "list"
names(result)
## [1] "statistic" "parameter" "p.value" "conf.int" "estimate"
## [6] "null.value" "stderr" "alternative" "method" "data.name"
result$p.value
## [1] 4.301261e-29
In R, a t-test is just another function returning just another type of data, so it can also be a building block. The value it returns is a special type of vector called a “list”, but with a public face that presents itself nicely. This is a common pattern in R. Besides printing to the console nicely, this public face may alter the behaviour of generic functions such as plot and summary.
Similarly, a data frame is a list of vectors that is able to present itself nicely.
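As a minimal sketch of using a test result as a building block, the example below runs a paired t-test on two small made-up vectors (before and after are hypothetical stand-ins, not the life expectancy data), uses the p-value in further code, and then removes the public face with unclass to show the plain list underneath. It also checks that a data frame is a list underneath.

```r
# Hypothetical paired measurements, invented for illustration only.
before <- c(61.2, 70.5, 55.1, 78.3, 66.0, 72.4)
after  <- c(63.0, 71.9, 58.4, 79.1, 68.2, 73.0)

result <- t.test(after, before, paired=TRUE)

# Because the result is a list, its elements can be used in further code.
if (result$p.value < 0.05) {
    message("Mean difference: ", round(result$estimate, 2))
}

# Removing the class leaves the plain list underneath.
plain <- unclass(result)
class(plain)     # "list"

# A data frame is likewise a list of column vectors with a class attached.
df <- data.frame(before=before, after=after)
class(df)        # "data.frame"
typeof(df)       # "list"
df$after         # columns are accessed like list elements
```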
5.1 Lists
Lists are vectors that can hold anything as elements (even other lists!). Lists can be created with the list function. This becomes especially useful once you get into the programming side of R. For example, a function you write that needs to return multiple values can return them together as a list.
mylist <- list(hello=c("Hello","world"), numbers=c(1,2,3,4))
mylist
## $hello
## [1] "Hello" "world"
##
## $numbers
## [1] 1 2 3 4
class(mylist)
## [1] "list"
typeof(mylist)
## [1] "list"
names(mylist)
## [1] "hello" "numbers"
Elements of a list can be accessed by name with $ or by position with [[ ]].
mylist$hello
## [1] "Hello" "world"
mylist[[2]]
## [1] 1 2 3 4
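The idea of a function returning multiple values as a list can be sketched as follows. The function describe and its element names are hypothetical, chosen only for this illustration.

```r
# Hypothetical helper: summarise a numeric vector, returning several values at once.
describe <- function(x) {
    list(n=length(x), mean=mean(x), range=range(x))
}

info <- describe(c(1,2,3,4))

info$mean        # access by name: 2.5
info[[1]]        # access by position: 4
info[["range"]]  # [[ ]] also accepts a name: 1 4

# Single brackets return a shorter list rather than the element itself.
class(info["mean"])    # "list"
class(info[["mean"]])  # "numeric"
```

The caller can then pick out whichever pieces it needs, in the same way that result$p.value was extracted from the t-test result.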
5.2 Other types not covered here
Matrices are another tabular data type. These come up when doing more mathematical tasks in R. They are also commonly used in bioinformatics, for example to represent RNA-Seq count data. A matrix, as compared to a data frame:
- contains only one type of data, usually numeric (rather than different types in different columns).
- commonly has rownames as well as colnames. (Base R data frames can have rownames too, but it is easier to have any unique identifier as a normal column instead.)
- has individual cells as the unit of observation (rather than rows).
Matrices can be created using as.matrix from a data frame, matrix from a single vector, or using rbind or cbind with several vectors.
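A brief sketch of these ways of creating a matrix, using small made-up values (all variable names here are illustrative):

```r
# From a data frame whose columns are all numeric.
df <- data.frame(a=c(1,2,3), b=c(4,5,6))
m1 <- as.matrix(df)

# From a single vector, filled column by column by default.
m2 <- matrix(1:6, nrow=2, ncol=3)

# From several vectors, stacked as rows or bound together as columns.
m3 <- rbind(x=c(1,2,3), y=c(4,5,6))
m4 <- cbind(x=c(1,2,3), y=c(4,5,6))

dim(m2)         # 2 3
m2[2, 3]        # row 2, column 3: 6
rownames(m3)    # "x" "y"
colnames(m4)    # "x" "y"
typeof(m1)      # "double"
```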
You may also encounter “S4 objects”, especially if you use Bioconductor packages. The syntax for using these is different again, and uses @ to access elements (called “slots”).
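To give a feel for the @ syntax, here is a minimal sketch that defines a toy S4 class. The class name Sample and its slots are hypothetical and do not come from any Bioconductor package.

```r
library(methods)  # provides setClass and new; normally loaded already

# Define a hypothetical S4 class with two slots.
setClass("Sample", representation(name="character", counts="numeric"))

s <- new("Sample", name="sample1", counts=c(10, 0, 25))

isS4(s)          # TRUE
slotNames(s)     # "name" "counts"
s@name           # "sample1"
s@counts         # 10 0 25
```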
5.3 Programming
Once you have a useful data analysis, you may want to do it again with different data. You may have some task that needs to be done many times over. This is where programming comes in:
- Writing your own functions.
- For-loops to do things multiple times.
- If-statements to make decisions.
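These three ingredients can be sketched together in a few lines. The function classify, the threshold of 50, and the datasets list are all made up for this illustration.

```r
# A function: wraps up a calculation so it can be reused.
classify <- function(x) {
    # An if-statement: decide which label to return.
    if (mean(x) > 50) {
        "high"
    } else {
        "low"
    }
}

# Hypothetical data, stored as a named list of vectors.
datasets <- list(first=c(40, 45, 50), second=c(60, 65, 70))

# A for-loop: repeat the same step for each dataset.
for (name in names(datasets)) {
    label <- classify(datasets[[name]])
    cat(name, "is", label, "\n")
}
```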
The “R for Data Science” book is an excellent source to learn more. The Monash Data Fluency “Programming and Tidy data analysis in R” course also covers this.