5 Thinking in R

The result of a t-test is actually a value we can manipulate further. Two functions help us here: class gives the “public face” of a value, and typeof gives its underlying type, the way R thinks of it internally. For example, numbers are “numeric” and have some representation in computer memory: either “integer”, for whole numbers only, or “double”, which can hold fractional numbers (stored in memory in a base-2 version of scientific notation).

## [1] "numeric"
## [1] "double"
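The two lines of output above are what class and typeof report for an ordinary number. The input that produced them is not shown here, so the following is only a sketch that assumes an arbitrary value of 42; any plain number typed into R behaves the same way.

```r
# 42 is an arbitrary example value (an assumption, not the original input)
x <- 42
class(x)    # the "public face" of the value
typeof(x)   # how R stores it internally

# A whole number is only stored as an integer when asked for explicitly,
# for example with the L suffix
y <- 42L
typeof(y)
```

Plain numbers are stored as “double” by default, even when they happen to be whole numbers.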

Let’s look at the result of a t-test:

## [1] "htest"
## [1] "list"
## [1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"   
## [6] "null.value"  "alternative" "method"      "data.name"
## [1] 4.301261e-29
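The four results above are the kind of output given by class, typeof, names and $p.value when applied to a t-test result. The data behind them is not shown here, so this sketch substitutes the built-in mtcars data set as a stand-in (an assumption). The p-value it gives will differ from the one above, and newer versions of R may list an extra element among the names.

```r
# Stand-in data (an assumption): fuel economy of manual vs automatic cars
manual    <- mtcars$mpg[mtcars$am == 1]
automatic <- mtcars$mpg[mtcars$am == 0]

# Store the result of the t-test instead of just printing it
result <- t.test(manual, automatic)

class(result)     # the public face of the result
typeof(result)    # the underlying type
names(result)     # the named elements it contains
result$p.value    # one element, pulled out for further use
```

Because the p-value is an ordinary number, it can be stored, compared, or passed to other functions like any other value.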

In R, a t-test is just another function returning just another type of data, so it can also be a building block. The value it returns is a special type of vector called a “list”, but with a public face that presents itself nicely. This is a common pattern in R. Besides printing to the console nicely, this public face may alter the behaviour of generic functions such as plot and summary.

Similarly, a data frame is a list of vectors (its columns) that is able to present itself nicely.
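The same two functions show this for a data frame. This is a minimal sketch using a small made-up data frame; the column names and values are purely illustrative.

```r
# A small made-up data frame (illustrative values only)
df <- data.frame(name = c("a", "b", "c"), value = c(1.5, 2.5, 3.5))

class(df)     # public face: "data.frame"
typeof(df)    # underlying type: "list"
names(df)     # the list elements are the columns
df$value      # each column is an ordinary vector
```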

5.1 Lists

Lists are vectors that can hold anything as elements (even other lists!). Lists are created with the list function. They become especially useful once you get into the programming side of R. For example, a function you write that needs to return multiple values can return them together in a list.

## $hello
## [1] "Hello" "world"
## 
## $numbers
## [1] 1 2 3 4
## [1] "list"
## [1] "list"
## [1] "hello"   "numbers"
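The output above is consistent with code along the following lines. The variable name mylist is an assumption; the element names and values are taken from the output itself.

```r
# Create a list with two named elements of different types
mylist <- list(hello = c("Hello", "world"), numbers = c(1, 2, 3, 4))

mylist           # print the whole list
class(mylist)    # a plain list has "list" as its public face...
typeof(mylist)   # ...and as its underlying type
names(mylist)    # the element names
```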

List elements can be accessed by name with $ or by position with [[ ]].

## [1] "Hello" "world"
## [1] 1 2 3 4
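A sketch of the access shown above, again assuming the list is stored in a variable called mylist (a hypothetical name):

```r
# Recreate the example list so this block runs on its own
mylist <- list(hello = c("Hello", "world"), numbers = c(1, 2, 3, 4))

mylist$hello           # access by name
mylist[[2]]            # access by position
mylist[["numbers"]]    # [[ ]] also accepts a name
```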

5.2 Other types not covered here

Matrices are another tabular data type. These come up when doing more mathematical tasks in R. They are also commonly used in bioinformatics, for example to represent RNA-Seq count data. A matrix, as compared to a data frame:

  • contains only one type of data, usually numeric (rather than different types in different columns).
  • commonly has rownames as well as colnames. (Base R data frames can have rownames too, but it is easier to have any unique identifier as a normal column instead.)
  • has individual cells as the unit of observation (rather than rows).

Matrices can be created from a data frame using as.matrix, from a single vector using matrix, or from several vectors using rbind or cbind.
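Each of these routes can be sketched with small made-up values; the variable names here are illustrative only.

```r
# From a data frame whose columns are all numeric
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
m1 <- as.matrix(df)

# From a single vector, filled column by column
m2 <- matrix(1:6, nrow = 2)

# From several vectors, stacked as rows or as columns
m3 <- rbind(c(1, 2, 3), c(4, 5, 6))
m4 <- cbind(c(1, 2, 3), c(4, 5, 6))

dim(m1)   # 3 rows, 2 columns
dim(m2)   # 2 rows, 3 columns
dim(m3)   # 2 rows, 3 columns
dim(m4)   # 3 rows, 2 columns
```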

You may also encounter “S4 objects”, especially if you use Bioconductor packages. The syntax for using these is different again, and uses @ rather than $ to access their elements (called “slots”).
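To give a feel for the @ syntax, here is a minimal sketch that defines a toy S4 class. The class name ToySample and its slots are hypothetical and do not come from any Bioconductor package.

```r
library(methods)   # provides setClass and new; usually already loaded

# Define a hypothetical S4 class with two slots
setClass("ToySample", slots = c(name = "character", counts = "numeric"))

# Create an object of that class
s <- new("ToySample", name = "sample1", counts = c(10, 0, 25))

isS4(s)     # TRUE for S4 objects
s@name      # access a slot with @
s@counts
```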

5.3 Programming

Once you have a useful data analysis, you may want to do it again with different data. You may have some task that needs to be done many times over. This is where programming comes in.
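As one small illustration, an analysis step can be wrapped in a function and reused with different data. This is only a sketch: the function name compare_groups is hypothetical, and the built-in mtcars data set is used as stand-in data.

```r
# Hypothetical helper: compare two groups of numbers with a t-test and
# return several values at once in a list
compare_groups <- function(x, y) {
  result <- t.test(x, y)
  list(
    mean_x  = mean(x),
    mean_y  = mean(y),
    p_value = result$p.value
  )
}

# Reuse the same function on a data set (mtcars is stand-in data)
comparison <- compare_groups(
  mtcars$mpg[mtcars$am == 1],
  mtcars$mpg[mtcars$am == 0]
)
names(comparison)
comparison$p_value
```

Returning a list lets one function hand back several results together, each of which can then be picked out by name.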

The “R for Data Science” book is an excellent source to learn more. The Monash Bioinformatics Platform “R more” course also covers this.