This short section looks at two topics in communicating your results professionally: making nice-looking plots, and producing documents that include R code and output.

Let’s continue our examination of the FastQC output. If you’re starting fresh for this lesson, you can load the necessary data frame with:

library(tidyverse)

bigtab <- read_csv("r-progtidy-files/fastqc.csv")

Publication quality ggplot2

ggplot2 is covered in our introductory R workshop. Recall that we could assign columns of a data frame to aesthetics (x and y position, color, etc.) and then add “geom”s to draw the data.

With ggplot2 we can view the whole data set.

ggplot(bigtab, aes(x=file,y=test,color=grade)) + 
    geom_point()

With categorical data on the x and y axes, a better geom to use is geom_tile.

ggplot(bigtab, aes(x=file,y=test,fill=grade)) + 
    geom_tile()

ggplot2 offers a very wide variety of ways to adjust a plot. For categorical aesthetics, usually the first step is ensuring the relevant column is a factor with a meaningful level order.

# Put x axis in order first found in data frame
x_order <- unique(bigtab$file)
bigtab$file <- factor(bigtab$file, levels=x_order)

# Only necessary if not continuing from previous lesson on programming!
color_order <- c("FAIL", "WARN", "PASS")
bigtab$grade <- factor(bigtab$grade, levels=color_order)

myplot <- ggplot(bigtab, aes(x=file, y=test, fill=grade)) + 
    geom_tile(color="black", linewidth=0.5) +      # Black border on tiles (ggplot2 >= 3.4.0)
    scale_y_discrete(limits=rev) +                 # Flip the y axis
    scale_fill_manual(                             # Colors, as color hex codes
        values=c("#D81B60","#FFC107","#1E88E5")) +
    labs(x="", y="", fill="") +                    # Remove axis labels
    coord_fixed() +                                # Square tiles
    theme_minimal() +                              # Minimal theme, no grey background
    theme(panel.grid=element_blank(),              # No underlying grid lines
          axis.text.x=element_text(                # Vertical text on x axis
              angle=90,vjust=0.5,hjust=0))              
myplot

Colors from: https://davidmathlogic.com/colorblind/

See also: forcats package for factor manipulation.
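
As a sketch of what forcats offers, the ordering steps above could also be written with fct_inorder(), which orders levels by first appearance, and fct_relevel(), which sets the order by hand. The small vectors here are made-up stand-ins for bigtab$file and bigtab$grade, used only for illustration.

```r
library(forcats)

# Made-up example values, standing in for bigtab$file and bigtab$grade
file_example  <- c("sample_b", "sample_a", "sample_b", "sample_c")
grade_example <- c("PASS", "WARN", "FAIL", "PASS")

# Levels in order of first appearance,
# equivalent to factor(x, levels=unique(x))
file_factor <- fct_inorder(file_example)
levels(file_factor)

# Levels in a manually chosen order
grade_factor <- fct_relevel(grade_example, "FAIL", "WARN", "PASS")
levels(grade_factor)
```

forcats is loaded as part of the tidyverse, so library(tidyverse) is enough in practice.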

Publication quality images

Plots can be saved with ggsave. Width and height are given in inches, and an image resolution in dots per inch (DPI) should also be given. Text and line sizes are fixed in physical units, so the width and height determine how large text and graphics appear relative to the plot. To increase the resolution, adjust the DPI rather than the width and height. Compare the files produced by:

ggsave("plot-bad.png",  myplot, width=10, height=10, dpi=300)
ggsave("plot-good.png", myplot, width=5,  height=5,  dpi=600)
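
To see why these two calls differ, it may help to work out the pixel dimensions, which are simply inches multiplied by DPI. The helper pixel_size below is a hypothetical name introduced for this illustration and is not part of ggplot2.

```r
# Hypothetical helper: pixel dimensions of a saved image
pixel_size <- function(width, height, dpi) {
    c(width=width*dpi, height=height*dpi)
}

pixel_size(10, 10, 300)   # plot-bad.png
pixel_size(5,  5,  600)   # plot-good.png
```

Both files come out at 3000 by 3000 pixels. Text is drawn at a fixed physical size, so it spans twice as much of each dimension in the 5 inch image as in the 10 inch one, which is why the second file is easier to read.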

It may be necessary to edit a plot further in a program such as Inkscape or Adobe Illustrator. To allow this, the image needs to be saved in a “vector” format such as SVG, EPS, or PDF. In this case, the DPI argument isn’t needed.
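
A minimal sketch of saving in a vector format follows. The plot and file name here are made up so that the example runs on its own; in practice you would pass myplot.

```r
library(ggplot2)

# Made-up stand-in for myplot, so that this sketch runs on its own
example_plot <- ggplot(data.frame(x=1:3, y=1:3), aes(x=x, y=y)) +
    geom_point()

# The file extension selects the format; no dpi is needed for PDF
ggsave("plot-example.pdf", example_plot, width=5, height=5)
```

Saving as SVG works the same way with a .svg extension, but ggsave relies on the svglite package for this, so it may need to be installed first.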

R Markdown documents

This is a very brief introduction to a large topic.

To share results and the R code that produced them, you can write a report in “R Markdown”. The idea is to write a file containing a mixture of text and R code. This is then “knit” into an HTML or PDF document, with the output from the R code and any plots also shown. This workshop is written in R Markdown!

Use R Markdown to:

  • Make a record of results for yourself.

  • Communicate with your colleagues or supervisor.

  • Write a thesis, book, or journal article (but also consider using LaTeX directly).

Create a file called report.Rmd containing:

---
title: Report
---

This is the result from running FastQC.

```{r}
library(tidyverse)
library(knitr)

bigtab <- read_csv("r-progtidy-files/fastqc.csv")

kable(bigtab)
```

Press the “knit” button or type:

rmarkdown::render("report.Rmd")

All of the R code in the file is run, but now an HTML file showing the code and output is created.

Code chunks

Further code chunks can be included:

```{r}
# ...your code here...
```

Code chunks can have options included in the curly brackets. In RStudio, click on the gear icon in the upper right of the code chunk to see some options.

More on chunk options: https://bookdown.org/yihui/rmarkdown/r-code.html
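
Chunk options can also be given defaults for the whole document using knitr's opts_chunk$set(). As a rough sketch, a chunk near the top of the document might contain something like the following; the particular options chosen are only an example.

```r
library(knitr)

# Set default options for all later chunks in the document.
# Options given on an individual chunk override these defaults.
opts_chunk$set(
    echo=TRUE,        # Show the code
    message=FALSE,    # Hide messages, such as those from library(tidyverse)
    warning=FALSE)    # Hide warnings
```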

Challenge

  1. Add a code chunk to produce the plot from the previous section.

  2. Adjust output of messages and warnings, and whether the code is shown.

Markdown

R Markdown is built on Markdown, a concise way of formatting text documents with headings, formatting, links, tables, etc.

Try including some of the following in your report:

# Heading

## Subheading

Some *italic* and **bold** text.

A [link](https://zombo.com/) to a website.

Further reading: https://bookdown.org/yihui/rmarkdown/markdown-syntax.html

Mathematics

R Markdown documents can include LaTeX-style mathematics.

Try including this in your document:

$$
c = \sqrt{a^2+b^2}
$$

The result should appear like:

\[ c = \sqrt{a^2+b^2} \]

More on math in rmarkdown: https://bookdown.org/yihui/rmarkdown/markdown-syntax.html#math-expressions

More on LaTeX math: https://en.wikibooks.org/wiki/LaTeX/Mathematics

A nice web editor: https://latexeditor.lagrida.com/

References

References can be included, with details given in a BibTeX file.

---
bibliography: r-progtidy-files/bibliography.bib
---

An experiment by @cleveland1984 compared different types of visual information.

Citations in R Markdown: https://bookdown.org/yihui/rmarkdown-cookbook/bibliography.html

About the BibTeX format: https://www.bibtex.com/format/

Get BibTeX entry from a DOI: https://www.bibtex.com/c/doi-to-bibtex-converter/

Gotcha: For some bibliography styles, BibTeX will convert some upper-case letters in titles to lower-case. Surround letters or the entire title in an extra { } to prevent this.

Output formats

By default, rmarkdown produces HTML output. It can also produce PDF and Word documents. To output PDF, you will need to install TeX. It’s also possible to produce slideshows.

In RStudio, click on the gear icon and select “Output options…” or go to “File/New File/R Markdown…” and try a few of the template documents.

The output format is determined by the output: ... line in the YAML header at the top of the document.
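
The format can also be chosen when rendering from the console, using the output_format argument of rmarkdown::render(). The sketch below writes a tiny made-up document called example.Rmd (a hypothetical file name) so that it runs on its own. It assumes pandoc is available, as it is within RStudio.

```r
library(rmarkdown)

# A tiny made-up document, so that this sketch runs on its own
writeLines(c(
    "---",
    "title: Example",
    "output: html_document",
    "---",
    "",
    "Some *italic* and **bold** text."),
    "example.Rmd")

# Uses the format named in the YAML header: HTML
render("example.Rmd")

# Overrides the YAML header: Word
render("example.Rmd", output_format="word_document")
```

Word is used here rather than PDF because it does not require a TeX installation.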

I don’t like R Notebooks

---
output: html_notebook
---

RStudio also offers “R Notebooks”. These are like R Markdown, but output can be produced chunk by chunk rather than all at once.

You can get in a muddle with R Notebooks by running code in an odd order or using results from code that you then alter.

R Notebooks may be appealing for long-running computations. Consider instead putting long computations in a .R script. Save results with saveRDS() and load them in an R Markdown file with readRDS(). Your R Markdown documents should knit quickly.
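
A minimal sketch of this pattern, shown in one block for brevity. The file names and the stand-in computation are made up for illustration.

```r
# --- In a script, say long-computation.R (a hypothetical file name) ---

# Stand-in for a long-running computation
result <- data.frame(x=1:10, y=(1:10)^2)

saveRDS(result, "result.rds")

# --- In the R Markdown document ---

result <- readRDS("result.rds")
head(result)
```

saveRDS() stores a single R object, and readRDS() returns it, so the object can be given any name when it is loaded.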

Under the hood

The process of turning R Markdown into the final output has several steps.

      rmarkdown and knitr                   pandoc
.Rmd ---------------------> .md (markdown) --------> .html


      rmarkdown and knitr                   pandoc                latex
.Rmd ---------------------> .md (markdown) -------> .tex (latex) -------> .pdf

For HTML output, a CSS file can be used to customize the appearance of the output, and you can also directly include HTML tags in the R Markdown document. For PDF output, you can directly use LaTeX commands. By learning HTML and CSS, or LaTeX, you can gain a lot more control over the appearance of the output.

For larger documents, the bookdown package can be used.

If your aim is to produce a PDF document such as a thesis or journal article, you may prefer to write LaTeX directly.

Languages and formats

R Markdown is a language for including R code and outputs in a document. It is based on Markdown. It is implemented in the R package rmarkdown, and builds on a package called knitr by Yihui Xie.

Bookdown is a package that lets you write large documents spanning multiple files, such as whole books. However, also consider the newer Quarto system.

Quarto is an overhaul of R Markdown released in 2022 by Posit, the authors of RStudio. It has many useful features, including features for working on larger projects such as books and websites. It is not necessary to use Quarto, but it is worth a look.

Markdown is a concise language for producing nicely formatted documents. It can be converted to formats such as HTML and PDF, using a command-line program called pandoc (this is done automatically during knitting). Markdown in various forms is widely used on the internet.

HTML (Hyper-Text Markup Language) is the language for documents on the web. You can write your own HTML, but it is more verbose than Markdown.

CSS (Cascading Style Sheets) is a language that defines how HTML appears on screen.

LaTeX is a venerable language for producing high quality documents. These days, the output is usually PDF. LaTeX is built on top of an earlier system called TeX written by Donald Knuth.

BibTeX is a format for bibliographic information, created for use with LaTeX.

MathJax is a JavaScript library that lets LaTeX-style math be used in HTML documents.

PDF (Portable Document Format) is a binary format for documents. You can’t edit it directly.