6 Next steps

6.1 Deepen your understanding

Our number one recommendation is to read the book “R for Data Science” by Garrett Grolemund and Hadley Wickham.

Statistical tasks such as model fitting, hypothesis testing, confidence interval calculation and prediction are also a large part of R, and one we haven't fully demonstrated today. Linear models, and the model formula syntax built around ~ (as in y ~ x), are core to much of what R offers statistically. Many statistical techniques take linear models as their starting point. Examples include limma for differential gene expression, glm for generalized linear models such as logistic regression, coxph for survival analysis, and mixed models for characterizing variation within populations.

  • “Statistical Models in S” by J.M. Chambers and T.J. Hastie is the primary reference for linear models and the formula syntax, although there are some small differences between R and its predecessor S.

  • “An Introduction to Statistical Learning” by G. James, D. Witten, T. Hastie and R. Tibshirani can be seen as a further development of the ideas in “Statistical Models in S”, and is available online. It has more of a machine learning flavour than a statistics one (the distinction is fuzzy!).

  • “Modern Applied Statistics with S” by W.N. Venables and B.D. Ripley is a well-respected reference covering both R and S.

  • “Linear Models with R” and “Extending the Linear Model with R” by J. Faraway cover linear models, with many practical examples.
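To give a flavour of the formula syntax described above, here is a minimal sketch using only base R and the built-in mtcars dataset (our choice of example data, not part of the workshop material). It fits a linear model, computes confidence intervals, makes predictions, compares two nested models with an F-test, and then reuses the same formula syntax for a logistic regression with glm().

```r
# Linear model: mpg modelled as a linear function of car weight (wt, in 1000 lb).
# The formula reads "mpg is modelled by wt"; an intercept is included by default.
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)   # coefficient estimates, standard errors, t-tests, R-squared
confint(fit)   # 95% confidence intervals for the coefficients

# Prediction intervals for two hypothetical cars weighing 2500 and 3500 lb
new_cars <- data.frame(wt = c(2.5, 3.5))
pred <- predict(fit, newdata = new_cars, interval = "prediction")
pred           # a matrix with columns fit, lwr and upr

# Hypothesis test: does adding horsepower (hp) improve on the weight-only model?
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
comparison <- anova(fit, fit2)   # F-test comparing the nested models
comparison

# The same formula syntax drives glm(). Here, a logistic regression of
# transmission type (am: 0 = automatic, 1 = manual) on weight.
logit_fit <- glm(am ~ wt, family = binomial, data = mtcars)
summary(logit_fit)
```

Running this interactively and reading each printed summary is a good way to connect the output to the textbooks listed above.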

6.2 Expand your vocabulary

Have a look at these cheat sheets to see what is possible with R.

6.3 Join the community

Join the Data Fluency community at Monash.

  • Mailing list for workshop and event announcements.
  • Slack for discussion.
  • Drop-in sessions on Friday afternoons.

Meetups in Melbourne:

The Carpentries run intensive two-day workshops on scientific computing and data science topics worldwide, and the style of this workshop is closely based on theirs. For bioinformatics, COMBINE is an Australian organization for students and early-career researchers; it runs Carpentries workshops and similar events.