Linear models

“linear model” “general linear model” “linear predictor” “regression” “multiple regression” “multiple linear regression” “multiple linear regression model”

Learn to predict a response variable as

  • a straight line relationship with a predictor variable
  • or more than one predictor variable
  • actually it doesn’t have to be a straight line
  • some of the predictors could be categorical (ANOVA)
  • and there could be interactions

We will also learn to:

  • test which variables and interactions the data supports as predictors
  • give a confidence interval on any estimates

Linear models in R

Many features of the S language (predecessor to R) were created to support working with linear models and their generalizations:

  • data.frame type introduced to hold data for modelling.
  • factor type introduced to hold categorical data.
  • y ~ x1+x2+x3 formula syntax to specify terms in models.
  • Manipulation of “S3” objects holding fitted models.
  • Rich set of diagnostic visualization functions.
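As a rough illustration of how these pieces fit together, the sketch below builds a small data frame, fits a model using the formula syntax, and inspects the resulting S3 object. The column names and values are invented for this example.

```r
# Hypothetical toy data: names and values are invented for illustration
df <- data.frame(
    y     = c(1.1, 1.9, 3.2, 3.8, 5.1, 6.0),
    x1    = c(1, 2, 3, 4, 5, 6),
    group = factor(c("a", "a", "a", "b", "b", "b")))

# Formula syntax: response on the left of ~, terms on the right
fit <- lm(y ~ x1 + group, data=df)

class(fit)   # fitted models are S3 objects, here of class "lm"
coef(fit)    # extract the estimated coefficients
# plot(fit)  # diagnostic plots (opens a graphics device)
```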

Primary reference is “Statistical models in S”.

Linear maths and R

We will be using vectors and matrices extensively today.

In mathematics, we usually treat a vector as a matrix with a single column. In R, they are two different types.

R also makes a distinction between matrix and data.frame types. There is a good chance you have used data frames but not matrices before now.

Matrices contain all the same type of value, typically numeric, whereas data frames can have different types in different columns.

A matrix can be created from a vector using matrix, or from multiple vectors with cbind (bind columns) or rbind (bind rows), or from a data frame with as.matrix.
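The different ways of creating a matrix can be sketched as follows. The variable names and values are made up for illustration.

```r
v <- c(1, 2, 3)
is.matrix(v)          # FALSE: a plain vector is not a matrix
dim(v)                # NULL: vectors have no dimensions

# From a single vector, filled column by column
X <- matrix(c(1, 2, 3, 4, 5, 6), nrow=3, ncol=2)

# From several vectors
X_cols <- cbind(c(1, 2, 3), c(4, 5, 6))   # 3 rows, 2 columns
X_rows <- rbind(c(1, 2, 3), c(4, 5, 6))   # 2 rows, 3 columns

# From a data frame whose columns are all numeric
df <- data.frame(a=c(1, 2, 3), b=c(4, 5, 6))
X_df <- as.matrix(df)

dim(X)                # 3 2
```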

Matrix transpose exchanges rows and columns. In maths it is indicated with a superscript \(\top\) (or a small t), e.g. \(X^\top\). In R use the t function, e.g. t(X).

\[ X = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \quad \quad X^\top = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \]
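The same example in R, as a small sketch:

```r
X <- matrix(c(1, 2, 3, 4, 5, 6), nrow=3)   # the 3x2 matrix X shown above

t(X)         # rows become columns: first row is 1 2 3, second row is 4 5 6
dim(t(X))    # 2 3
```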

Dot products

“dot product” “weighted sum” “linear combination” “linear function” …

To take the dot product of two vectors of the same length, we multiply corresponding elements together and add up the results to obtain a total.

In mathematical notation:

\[ a^\top b = a_1 b_1 + a_2 b_2 + \dots + a_n b_n \]

In R:
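A minimal sketch, assuming a and b are numeric vectors of equal length (the values here are made up):

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

# Multiply corresponding elements, then add up the results
sum(a * b)      # 1*4 + 2*5 + 3*6 = 32
```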


Dot products and geometry

The dot product is our ruler and set-square.


A vector can be thought of as an arrow in a space.

The dot product of a vector with itself \(a^\top a\) is the square of its length.

  • Equivalent to Pythagoras, but in as many dimensions as we like.
  • Equivalent to Euclidean distance (squared).
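A short sketch of lengths and distances computed with dot products, using made-up vectors:

```r
a <- c(3, 4)
sum(a * a)          # squared length: 3^2 + 4^2 = 25
sqrt(sum(a * a))    # length: 5

# The squared Euclidean distance between two points is the
# dot product of their difference with itself
p <- c(1, 2, 3)
q <- c(2, 4, 5)
sum((p - q) * (p - q))    # 1 + 4 + 4 = 9
```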

Right angles

Two vectors at right angles have a dot product of zero. They are orthogonal.
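For example, with two made-up vectors in 2D:

```r
a <- c(1, 2)
b <- c(-2, 1)    # a rotated by 90 degrees

sum(a * b)       # 1*(-2) + 2*1 = 0, so a and b are orthogonal
```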

Matrix vector multiplication

Taking the product of a matrix \(X\) and a vector \(a\) with length matching the number of columns, the result is a vector containing the dot product of each row of the matrix \(X\) with the vector \(a\).

\[ Xa = \begin{bmatrix} x_{1,1} a_1 + x_{1,2} a_2 + \dots \\ x_{2,1} a_1 + x_{2,2} a_2 + \dots \\ \vdots \end{bmatrix} \]

We can also think of this as a weighted sum of the columns of \(X\).

\[ Xa = a_1 \begin{bmatrix} x_{1,1} \\ x_{2,1} \\ \vdots \end{bmatrix} + a_2 \begin{bmatrix} x_{1,2} \\ x_{2,2} \\ \vdots \end{bmatrix} + \dots \]

In R:

X %*% a

(It’s also possible to multiply two matrices with %*% but we won’t need this today.)
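Both views of the product can be checked numerically. A small sketch, with made-up values for X and a:

```r
X <- cbind(c(1, 2, 3), c(4, 5, 6))
a <- c(10, 1)

X %*% a    # a matrix with one column: 14, 25, 36

# View 1: dot product of each row of X with a
row_view <- c(sum(X[1,] * a), sum(X[2,] * a), sum(X[3,] * a))

# View 2: weighted sum of the columns of X
col_view <- a[1] * X[,1] + a[2] * X[,2]
```

Note that the result of %*% is a matrix with a single column rather than a plain vector. Use as.vector or drop if a plain vector is needed.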

Geometry – subspaces

Example: a 3x2 matrix puts 2D vectors into a 2D subspace in 3D.

  • A line in two or more dimensions, passing through the origin.
  • A plane in three or more dimensions, passing through the origin.
  • A point at the origin.

These are all examples of subspaces.

Think of all the vectors that could result from multiplying a matrix \(X\) with some arbitrary vector.

If the matrix \(X\) has \(n\) rows and \(p\) columns, we obtain an (at most) \(p\)-dimensional subspace within an \(n\)-dimensional space.

A \(p\)-dimensional subspace of an \(n\)-dimensional space has an orthogonal subspace with \(n-p\) dimensions. All the vectors in a subspace are orthogonal to all the vectors in its orthogonal subspace.
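One way to see this numerically is with a QR decomposition, which is a technique chosen for this illustration and not something the material above relies on. The sketch assumes a made-up 3x2 matrix X whose columns are linearly independent, so the orthogonal subspace has 3-2=1 dimension.

```r
X <- cbind(c(1, 2, 3), c(4, 5, 6))    # n=3 rows, p=2 columns

# The complete QR decomposition gives an orthonormal basis for the
# whole 3D space. The first p columns span the same subspace as the
# columns of X, the remaining n-p columns span the orthogonal subspace.
Q <- qr.Q(qr(X), complete=TRUE)
v <- Q[, 3]

# v is orthogonal to every column of X, and so to anything X %*% a can produce
t(X) %*% v                     # both entries zero, up to rounding error
sum((X %*% c(10, 1)) * v)      # also zero, up to rounding error
```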

Do section: Vectors and matrices


A model can be used to predict a response variable based on a set of predictor variables.

Alternative terms: dependent variable (the response), independent variables (the predictors).

We are using causal language, but really only describing an association.

The prediction will usually be imperfect, due to random noise.

Linear model

A response \(y\) is produced based on \(p\) predictors \(x_j\) plus noise \(\varepsilon\) (“epsilon”):

\[ y = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon \]

The model has \(p\) terms, plus a noise term. The model is specified by the choice of coefficients \(\beta\) (“beta”).

This can also be written as a dot product:

\[ y = \beta^\top x + \varepsilon \]

The noise is assumed to be normally distributed with mean zero and standard deviation \(\sigma\) (“sigma”), i.e. variance \(\sigma^2\):

\[ \varepsilon \sim \mathcal{N}(0,\sigma^2) \]

Typically \(x_1\) will always be 1, so \(\beta_1\) is a constant term in the model. We still count it as one of the \(p\) predictors.

This matches what R does, but may differ from other presentations!
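R's handling of the constant term can be seen with model.matrix, which builds the matrix of predictors from a formula. In this sketch the data frame and the column name x2 are invented for illustration.

```r
# Hypothetical predictor values, invented for illustration
df <- data.frame(x2 = c(0.5, 1.5, 2.5, 3.5))

# R adds a column of ones, named "(Intercept)", unless told otherwise
X <- model.matrix(~ x2, data=df)
X

ncol(X)    # p = 2: the constant term counts as one of the predictors
```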

Linear model in R code

For a vector of coefficients beta and a vector of predictors x for some particular case, the most probable outcome is:

y_predicted <- sum(beta*x)

A simulated possible outcome can be generated by adding random noise:

y_simulated <- sum(beta*x) + rnorm(1, mean=0, sd=sigma)
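To make these two lines runnable, here is a sketch with made-up values for beta, x and sigma. Repeating the simulation many times shows the outcomes scattering around the prediction.

```r
set.seed(1)                 # make the simulation reproducible

beta  <- c(2, 0.5)          # hypothetical coefficients: constant term and slope
x     <- c(1, 3)            # x[1] is 1, so beta[1] acts as a constant
sigma <- 0.25               # hypothetical noise standard deviation

y_predicted <- sum(beta*x)  # 2*1 + 0.5*3 = 3.5
y_simulated <- sum(beta*x) + rnorm(1, mean=0, sd=sigma)

# Repeated simulations scatter around the prediction
y_many <- replicate(1000, sum(beta*x) + rnorm(1, mean=0, sd=sigma))
mean(y_many)                # close to y_predicted
sd(y_many)                  # close to sigma
```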

But where do the coefficients \(\beta\) come from?

Model fitting – estimating \(\beta\)

Say we have observed \(n\) responses \(y_i\) with corresponding vectors of predictors \(x_i\):

\[ \begin{align} y_1 &= \beta_1 x_{1,1} + \beta_2 x_{1,2} + \dots + \beta_p x_{1,p} + \varepsilon_1 \\ y_2 &= \beta_1 x_{2,1} + \beta_2 x_{2,2} + \dots + \beta_p x_{2,p} + \varepsilon_2 \\ & \dots \\ y_n &= \beta_1 x_{n,1} + \beta_2 x_{n,2} + \dots + \beta_p x_{n,p} + \varepsilon_n \end{align} \]

This is conveniently written in terms of a vector of responses \(y\) and matrix of predictors \(X\):

\[ y = X \beta + \varepsilon \]

Each response is assumed to contain noise with the same standard deviation \(\sigma\), independent of the noise in the other responses:

\[ \varepsilon_i \sim \mathcal{N}(0,\sigma^2) \]
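The matrix form can be simulated directly. This is a sketch only: the "true" coefficients, noise level and predictor values are all invented, and in a real analysis beta and sigma are unknown.

```r
set.seed(1)

n     <- 5
beta  <- c(2, 0.5)                   # hypothetical "true" coefficients
sigma <- 0.25                        # hypothetical noise standard deviation

X <- cbind(1, c(1, 2, 3, 4, 5))      # n rows, p=2 columns; first column all ones
epsilon <- rnorm(n, mean=0, sd=sigma)    # one independent noise value per response

y <- as.vector(X %*% beta + epsilon)     # y = X beta + epsilon
```

Model fitting is the reverse problem: given only y and X, estimate beta.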

Model fitting – estimating \(\beta\) with geometry