5 Thinking in R
The result of a t-test is actually a value we can manipulate further. Two functions help us here.
- class gives the “public face” of a value, and
- typeof gives its underlying type, the way R thinks of it internally. For example, numbers are “numeric” and have some representation in computer memory, either “integer” for whole numbers only, or “double”, which can hold fractional numbers (stored in memory in a base-2 version of scientific notation).
class(42)
## [1] "numeric"
typeof(42)
## [1] "double"
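As a small sketch of the “integer” versus “double” distinction, the following compares an ordinary number with one written using the L suffix, which asks R to store it as an integer. The values 5 and 5L are arbitrary examples.

```r
# A plain number is stored as a double, even if it looks whole.
x <- 5
class(x)       # "numeric"
typeof(x)      # "double"

# The L suffix asks R for an integer instead.
y <- 5L
class(y)       # "integer"
typeof(y)      # "integer"

# Both still count as numbers.
is.numeric(x)  # TRUE
is.numeric(y)  # TRUE
```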
Let’s look at the result of a t-test:
result <- t.test(gap2010$life_exp, gap2000$life_exp, paired=TRUE)
class(result)
## [1] "htest"
typeof(result)
## [1] "list"
names(result)
## [1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"
## [6] "null.value"  "alternative" "method"      "data.name"
result$p.value
## [1] 4.301261e-29
In R, a t-test is just another function returning just another type of data, so it can also be a building block. The value it returns is a special type of vector called a “list”, but with a public face that presents itself nicely. This is a common pattern in R. Besides printing to the console nicely, this public face may alter the behaviour of generic functions.
Similarly, a data frame is a list of vectors that is able to present itself nicely.
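The following is a minimal sketch of this pattern. It uses R's built-in sleep data set instead of the gap2000 and gap2010 data frames, so the numbers it produces are unrelated to the result above. The variable names drug1 and drug2 are chosen only for illustration.

```r
# Paired t-test on the built-in sleep data (10 subjects, two drugs).
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]
result <- t.test(drug1, drug2, paired = TRUE)

class(result)            # "htest", the public face
is.list(result)          # TRUE, a list underneath

# Elements can be used as building blocks in further calculations.
result$p.value < 0.05    # TRUE for this data set
diff(result$conf.int)    # width of the confidence interval

# A data frame follows the same pattern: a list of column vectors.
class(sleep)             # "data.frame"
typeof(sleep)            # "list"
names(sleep)             # "extra" "group" "ID"
```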
Lists are vectors that can hold anything as elements (even other lists!). It’s possible to create lists with the list function. This becomes especially useful once you get into the programming side of R. For example, a function you write yourself that needs to return multiple values can return them in the form of a list.
mylist <- list(hello=c("Hello","world"), numbers=c(1,2,3,4))
mylist
## $hello
## [1] "Hello" "world"
##
## $numbers
## [1] 1 2 3 4
class(mylist)
## [1] "list"
typeof(mylist)
## [1] "list"
names(mylist)
## [1] "hello"   "numbers"
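The earlier point about a function returning multiple values can be sketched as follows. The function name describe and the element names it returns are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: summarise a numeric vector in one call,
# returning several values together as a named list.
describe <- function(x) {
    list(mean = mean(x), sd = sd(x), n = length(x))
}

stats <- describe(c(2, 4, 6, 8))
names(stats)   # "mean" "sd" "n"
stats$mean     # 5
stats$n        # 4
```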
Accessing list elements can be done by name with $ or by position with [[ ]].
mylist$hello
## [1] "Hello" "world"
mylist[[2]]
## [1] 1 2 3 4
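As a further sketch, double brackets also accept a name. Single brackets, which are not covered above, behave differently: they return a shorter list rather than the element itself.

```r
mylist <- list(hello = c("Hello", "world"), numbers = c(1, 2, 3, 4))

mylist$numbers        # by name: the numeric vector 1 2 3 4
mylist[["numbers"]]   # double brackets also accept a name
mylist[[2]]           # by position: the same vector

# Single brackets keep the list wrapper, giving a list of length 1.
sub <- mylist[2]
class(sub)            # "list"
length(sub)           # 1
```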
5.2 Other types not covered here
Matrices are another tabular data type. These come up when doing more mathematical tasks in R. They are also commonly used in bioinformatics, for example to represent RNA-Seq count data. A matrix, as compared to a data frame:
- contains only one type of data, usually numeric (rather than different types in different columns).
- commonly has rownames as well as colnames. (Base R data frames can have rownames too, but it is easier to have any unique identifier as a normal column instead.)
- has individual cells as the unit of observation (rather than rows).
Matrices can be created using as.matrix from a data frame, matrix from a single vector, or using cbind with several vectors.
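A minimal sketch of the three approaches is shown below. The data values and the row labels gene1 and gene2 are made up for illustration.

```r
# From a single vector, filled column by column by default.
m1 <- matrix(1:6, nrow = 2)
dim(m1)                       # 2 3

# From several vectors of equal length, one per column.
m2 <- cbind(a = c(1, 2, 3), b = c(10, 20, 30))
colnames(m2)                  # "a" "b"

# From a data frame whose columns are all numeric.
df <- data.frame(x = c(1.5, 2.5), y = c(3, 4))
m3 <- as.matrix(df)
rownames(m3) <- c("gene1", "gene2")   # hypothetical row labels
m3["gene2", "y"]              # 4
typeof(m3)                    # "double": one type for every cell
```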
You may also encounter “S4 objects”, especially if you use Bioconductor packages. The syntax for using these is different again, and uses @ to access elements.
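To give a feel for the @ syntax, here is a small sketch that defines a toy S4 class. The class name Sample and its slots are hypothetical and do not come from any Bioconductor package.

```r
library(methods)   # provides setClass and new; usually loaded already

# Hypothetical S4 class with two slots.
setClass("Sample", representation(name = "character", counts = "numeric"))

s <- new("Sample", name = "liver", counts = c(10, 0, 25))

isS4(s)         # TRUE
s@name          # "liver"
sum(s@counts)   # 35
```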
Once you have a useful data analysis, you may want to do it again with different data. You may have some task that needs to be done many times over. This is where programming comes in: