This workshop is designed to work with RStudio running in Posit Cloud. Go to https://posit.cloud/ and create a new project. Monash users can log in with their Monash Google account. The workshop can also be done using R locally on your laptop (if doing this, we also recommend you create a new project to contain the files).
Running the R code below will download files and install packages used in this workshop.
# Download data
download.file(
    "https://monashdatafluency.github.io/r-linear/r-linear-files.zip",
    destfile="r-linear-files.zip")
unzip("r-linear-files.zip")

# Install some CRAN packages:
install.packages(c(
    "tidyverse", "multcomp", "emmeans",
    "lme4", "lmerTest", "pbkrtest", "BiocManager"))

# Install some Bioconductor packages:
BiocManager::install(c("limma","edgeR","topconfects"))
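If you would like to confirm that the setup worked before continuing, a quick check along the following lines may help. This is only a sketch: the package names are taken from the install commands above, and the path `r-linear-files/linear_models.R` assumes the zip file was unpacked into a folder of that name in your working directory.

```r
# Packages the setup code above is meant to install
pkgs <- c("tidyverse", "multcomp", "emmeans", "lme4", "lmerTest",
          "pbkrtest", "BiocManager", "limma", "edgeR", "topconfects")

# requireNamespace() returns TRUE if a package can be loaded (it does not attach it)
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) == 0) {
    message("All workshop packages are installed.")
} else {
    message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}

# Assumed location of the unzipped workshop script; FALSE means it was not found
file.exists("r-linear-files/linear_models.R")
```

Any package reported as missing can be installed again with the matching command above.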
Now load the file linear_models.R in the r-linear-files folder.
Built-in to R: lm, model.matrix, coef, sigma, df.residual, predict, confint, summary, anova, drop1, I, poly
splines – curve fitting: ns, bs
multcomp and emmeans – linear hypothesis tests and multiple comparisons: glht, mcp, confint, summary, emmeans
limma and edgeR – fitting many models to gene expression data: DGEList, calcNormFactors, cpm, lmFit, contrasts.fit, eBayes, plotSA, topTable
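To give a feel for how several of the built-in functions listed above fit together, here is a minimal sketch. It uses the `cars` dataset that ships with R, which is our choice for illustration and not necessarily data used in the workshop. The object names `fit_line`, `fit_curve` and `new_data` are likewise only illustrative.

```r
library(splines)  # ships with R; provides ns() and bs()

# Built-in example data: stopping distance (dist) versus speed for 50 cars
fit_line  <- lm(dist ~ speed, data = cars)
fit_curve <- lm(dist ~ ns(speed, df = 3), data = cars)

head(model.matrix(fit_line))   # design matrix: intercept and speed columns
coef(fit_line)                 # estimated coefficients
sigma(fit_line)                # residual standard deviation
df.residual(fit_line)          # residual degrees of freedom (50 rows - 2 coefficients)
confint(fit_line)              # 95% confidence intervals for the coefficients

# Predict at new speeds, with a confidence interval for the mean response
new_data <- data.frame(speed = c(10, 20))
predict(fit_line, new_data, interval = "confidence")

# F test: does the natural spline curve fit better than the straight line?
anova(fit_line, fit_curve)
```

The final `anova` call compares nested models: a straight line is a special case of the natural spline curve, so the F test asks whether the extra flexibility is justified.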
Postgraduate students at Monash can access statistical consulting, courtesy of the Data Science and AI Platform. This is a good service for beginner to intermediate statistical questions.
The Biostatistics Consulting Platform in the Monash Faculty of Medicine may be more suitable for advanced questions about experimental design and analysis.
Course notes for PH525x. Initial chapters of this edX course cover similar material to this workshop.
StatQuest videos on linear models. A friendly but thorough introduction to key ideas.
Harrell (2015) “Regression Modeling Strategies” has detailed practical advice for creating predictive models, such as models using biomarkers. Frank Harrell’s home page.
James, Witten, Hastie and Tibshirani (2013) “An Introduction to Statistical Learning” describes fundamental ideas and methods in machine learning.
Dance of the CIs app for intuition about Confidence Intervals.
The Art of Linear Algebra for intuition about matrices and vectors – sections 1-3 are relevant to this workshop.
Testing for differential gene expression often uses linear models. The developers of limma and edgeR at WEHI have written some good introductions to this topic:
voom or voomWithQualityWeights can be used to account for heteroscedasticity. If using a model other than ~ 0 + group, read the note in the documentation for contrasts.fit and consider using contrastAsCoef.
Mixed effects models are a popular next step beyond the fixed effects models covered in this workshop.
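Returning to the remark above about models other than `~ 0 + group`: one property that sets this formula apart is that the columns of its design matrix are orthogonal, which we understand to be the situation the contrasts.fit note is concerned with (check the limma documentation itself for the details). The sketch below uses only base R and a made-up `group` factor. The helper `is_diagonal` is hypothetical and not part of any package.

```r
# Hypothetical experiment: three groups with two samples each
group <- factor(rep(c("a", "b", "c"), each = 2))

design_means <- model.matrix(~ 0 + group)  # one column per group
design_diffs <- model.matrix(~ group)      # intercept plus differences from group "a"

# X'X for the group-means design: off-diagonal entries are all zero
crossprod(design_means)

# X'X for the intercept design: the intercept column overlaps the other columns
crossprod(design_diffs)

# Hypothetical helper: TRUE when every off-diagonal entry of a square matrix is zero
is_diagonal <- function(m) all(m[row(m) != col(m)] == 0)

is_diagonal(crossprod(design_means))  # TRUE: columns are orthogonal
is_diagonal(crossprod(design_diffs))  # FALSE: columns are not orthogonal
```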