---
title: "ETC1010: Data Modelling and Computing"
subtitle: "Week of introduction: Rmarkdown"
author: "Professor Di Cook & Dr. Nicholas Tierney"
institute: "EBS, Monash U."
date: "`r Sys.Date()`"
output:
xaringan::moon_reader:
lib_dir: libs
css: ["shinobi", "ninjutsu", "slides.css"]
seal: true
self_contained: false
nature:
ratio: "16:9"
highlightStyle: github
highlightLines: true
countIncrementalSlides: false
---
```{r setup, include=FALSE}
library(knitr)
knitr::opts_chunk$set(echo = FALSE,
message = FALSE,
warning = FALSE,
collapse = TRUE,
fig.height = 4,
fig.width = 8,
fig.align = "center",
cache = FALSE)
```
```{r}
library(tidyverse)
library(lubridate)
library(gridExtra)
```
class: bg-blue
.vvvhuge.white.center.middle[
What is this song?
]
---
class: bg-black
.gigantic.white.middle.center[
Recap
]
---
class: center
# Traffic Light System
```{r tom-cruise, out.width = "90%"}
include_graphics("https://nypdecider.files.wordpress.com/2014/08/help-me-help-you.gif")
```
---
.center[
# Traffic Light System
]
--
.red.pull-left.huge.middle[
# Red Post it
--
* I need a hand
* Slow down
]
--
.green.pull-right.huge.middle[
# Green Post it
--
* I am up to speed
* I have completed the thing
]
---
class: bg-main1, center, bottom, white
```{r rstudio-cooking-example}
include_graphics("https://njtierney.updog.co/img/rstudio-cooking-example.jpeg")
```
From Jessica Ward (@JKRWard) of R Ladies Newcaslte (UK) - @RLadiesNCL
https://twitter.com/RLadiesNCL/status/1138812826917724160
---
background-image: url(https://njtierney.updog.co/img/unvotes-oz-usa.png)
background-size: contain
background-position: 50% 50%
class: center, bottom, white
---
background-image: url(https://njtierney.updog.co/img/tower-of-babel.gif)
background-size: contain
background-position: 50% 50%
class: center, bottom, white
---
background-image: url(https://njtierney.updog.co/img/edstem.png)
background-size: contain
background-position: 50% 50%
class: center, bottom, white
---
class: bg-main1
# R essentials: A short list (for now)
.large.white[
- Functions are (most often) verbs, followed by what they will be applied to in parentheses:
]
```{r eval=FALSE, echo = TRUE}
do_this(to_this)
do_that(to_this, to_that, with_those)
```
--
.large.white[
- Columns (variables) in data frames are accessed with `$`:
]
```{r eval=FALSE, echo = TRUE}
dataframe$var_name
```
--
.large.white[
- Packages are installed with the `install.packages` function and loaded with the `library` function, once per session:
]
```{r eval=FALSE, echo = TRUE}
install.packages("package_name")
library(package_name)
```
---
# Today: Outline
.huge[
* Why we care about Reproducibility
* R + markdown = Rmarkdown
* Controling output and input of rmarkdown
* Exercises on creating rmarkdown reports on the humble platypus
* Form up assignment groups
* Quiz
* Release assignment (later today)
]
???
* Reproducibility: Why we care
* Rmarkdown
* YAML
* Code
* text
* markdown (online quiz)
* rmarkdown - edit the existing one on platypus!
* code chunks
* code chunk names
* chunk options
* exercise on this
* setting different chunk options globally
* exercise extending the platypus report
* make assignment groups
* release assignment
* quiz
Should be able to answer the questions:
How should I start an rmarkdown document?
What do I put in the YAML metadata?
How do I create a code chunk?
What sort of options to I need to worry about for my code?
What is the value in a reproducible report?
What is markdown?
Can I combine my software and my writing?
---
class: bg-black
.vvhuge.white.center.middle[
We are in a tight spot with reproducibility
]
---
.blockquote.huge[
Only 6 out of 53 landmark results could be reproduced
-- [Amgen, 2014*](https://www.nature.com/articles/483531a)
]
.footnote[`*` Heard via Garret Grolemund's [great talk](https://www.youtube.com/watch?v=HVlwNayog-k)]
---
.blockquote.huge[
An estimated 75% - 90% of preclinical results cannot be reproduced
-- [Begley, 2015*](https://www.ncbi.nlm.nih.gov/pubmed/25552691)
]
.footnote[`*` Heard via Garret Grolemund's [great talk](https://www.youtube.com/watch?v=HVlwNayog-k)]
---
.blockquote.huge[
Estimated **annual** cost of irreproducibility for biomedical industry = 28 Billion USD
-- [Freedman, 2015*](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165)
]
.footnote[`*` Heard via Garret Grolemund's [great talk](https://www.youtube.com/watch?v=HVlwNayog-k)]
---
background-image: url(https://njtierney.updog.co/gifs/njt-gif-jaws-dolly-zoom.gif)
background-size: contain
background-position: 50% 50%
class: center, bottom, white
---
background-image: url(https://njtierney.updog.co/gifs/ice-climber-fall.gif)
background-size: contain
background-position: 50% 50%
class: center, bottom, white
---
background-image: url(https://njtierney.updog.co/img/open-science-berg.jpeg)
background-size: contain
background-position: 50% 50%
class: center, bottom, white
---
class: bg-main1 center middle
.vvhuge[
So what can we do about it?
]
---
# Reproducibility checklist
.huge[
Near-term goals:
- Are the tables and figures reproducible from the code and data?
- Does the code actually do what you think it does?
- In addition to what was done, is it clear **why** it was done?
(e.g., how were parameter settings chosen?)
]
---
# Reproducibility checklist
.huge[
Long-term goals:
- Can the code be used for other data?
- Can you extend the code to do other things?
]
---
# Literate programming is a partial solution
.huge[
* Literate programming shines some light on this dark area of science.
* An idea from [Donald Knuth](https://en.wikipedia.org/wiki/Donald_Knuth) where you combine your text with your code output to create a document.
* A _blend_ of your literature (**text**), and your programming (**code**), to create something you can read from top to bottom.
]
---
.huge[
So
]
--
.huge[
Imagine a report:
Introduction, methods, results, discussion, and conclusion,
]
--
.huge[
and all the bits of code that make each section.
]
.huge[
With rmarkdown, you can see all the pieces of your data analysis all together.
Each time you knit the analysis is ran from the beginning
]
---
class: bg-main1
.vhuge[
Markdown as a new player to legibility
]
--
.vhuge[
In 2004, [John Gruber](https://en.wikipedia.org/wiki/John_Gruber), of [daring fireball](https://daringfireball.net/) created [markdown](https://en.wikipedia.org/wiki/Markdown), a simple way to create text that rendered into a HTML webpage.
]
---
class: bg-main1
.pull-left[
```
- bullet list
- bullet list
- bullet list
```
]
--
.pull-right.huge.white[
- bullet list
- bullet list
- bullet list
]
---
class: bg-main1
.pull-left.huge[
```
1. numbered list
2. numbered list
3. numbered list
__bold__, **bold**, _italic_, *italic*
> quote of something profound
```
]
--
.pull-right.huge[
1. numbered list
2. numbered list
3. numbered list
__bold__, **bold**, _italic_, *italic*
> quote of something profound
]
---
class: bg-black
.vvhuge.white[
With very little marking up, we can create rich text, that **actually resembles** the text that we want to see.
]
---
class: bg-main1
.vvhuge.white[
**Learn to use markdown** In your small groups, spend five minutes working through [markdowntutorial.com](https://www.markdowntutorial.com/)
]
```{r cdown-md}
library(countdown)
countdown(minutes = 5)
```
---
class: bg-main1
# Rmarkdown helps complete the solution to the reproducibility problem
--
.huge[
* Q: How do we take `markdown` + `R code` = "literate programming environment"
* A: `Rmarkdown`
]
---
# Rmarkdown...
--
.huge[
Provides an environment where you can write your complete analysis, and marries your text, and code together into a rich document.
]
--
.huge[
You write your code as code chunks, put your text around that, and then hey presto, you have a document you can reproduce.
]
---
# Reminder: You've already used rmarkdown!
```{r remind-unvotes, out.width = "90%"}
include_graphics("https://njtierney.updog.co/img/unvotes-oz-usa.png")
```
---
# How will we use R Markdown?
.huge[
- Every assignment + project / is an R Markdown document
- You'll always have a template R Markdown document to start with
- The amount of scaffolding in the template will decrease over the semester
- These lecture notes are created using R Markdown (!)
]
---
# The anatomy of an rmarkdown document
.huge[
There are three parts to an rmarkdown document.
* Metadata (YAML)
* Text (markdown formatting)
* Code (code formatting)
]
--
.vvhuge.center[DEMO]
---
class: bg-main1
# Metadata: YAML (YAML Ain't Markup Language)
.huge[
* The metadata of the document tells you how it is formed - what the **title** is, what **date** to put, and other control information.
* If you're familiar with $\LaTeX$, this is ksimilar to how you specify document type, styles, fonts, options, etc in the front matter / preamble.
]
---
class: bg-main1
# Metadata: YAML
.huge[
* Rmarkdown documents use YAML to provide the metadata. It looks like this:
```YAML
---
title: "An example document"
author: "Nicholas Tierney"
output: html_document
---
```
It starts an ends with three dashes `---`, and has fields like the following: `title`, `author`, and `output`.
]
---
class: bg-main1
# Text
.huge[
Is markdown, as we discussed in the earlier section,
It provides a simple way to mark up text
]
.pull-left.huge[
```
1. bullet list
2. bullet list
3. bullet list
```
]
.pull-right.huge[
1. bullet list
1. bullet list
1. bullet list
]
---
# Code
.vhuge[
We refer to code in an rmarkdown document in two ways:
1. Code chunks, and
2. Inline code.
]
---
# Code: Code chunks
.huge[
`Code chunks` are marked by three backticks and curly braces with `r` inside them:
````markdown
```{r chunk-name}`r ''`
# a code chunk
```
````
]
---
class: bg-main1
.huge[
**a backtick** is a special character you might not have seen before, it is typically located under the tilde key (`~`). On USA / Australia keyboards, is under the escape key:
```{r show-backtick, out.width = "80%", echo = FALSE, fig.cap = "image from https://commons.wikimedia.org/wiki/File:ANSI_Keyboard_Layout_Diagram_with_Form_Factor.svg"}
knitr::include_graphics("https://njtierney.updog.co/img/ansi-keyboard.png")
```
]
---
# Code: Inline code
.huge[
Sometimes you want to run the code inside a sentence. This is called running the code "inline".
]
--
.huge[
You might want to run the code inline to name the number of variables or rows in a dataset in a sentence like:
> There are XXX observations in the airquality dataset, and XXX variables.
]
---
# Code: Inline code
.huge[
You can call code "inline" like so:
````markdown
There are `r "\u0060r nrow(airquality) \u0060"` observations in the airquality dataset,
and `r "\u0060r ncol(airquality) \u0060"` variables.
````
Which gives you the following sentence
]
--
.huge[
> There are `r nrow(airquality)` observations in the airquality dataset, and `r ncol(airquality)` variables.
]
---
# Code: Inline code
.huge[
What's great about this is that if your data changes upstream, then you don't need to work out where you mentioned your data, you just update the document.
]
---
class: bg-main1
# Your Turn: Put it together
.huge[
Go to `rstudio.cloud` and
* open the document "01-oz-atlas.Rmd"
* knit the document
* Change the data section at the top to be from a different state instead of "New South Wales"
* knit the document again
* How do the text and figures in the document change?
]
```{r cdown-rmd-inline}
countdown(minutes = 5)
```
---
class: bg-main1 center middle
.gigantic[
break
]
---
background-image: url(https://imgs.xkcd.com/comics/art_project.png)
background-size: contain
background-position: 50% 50%
class: center, bottom, black
---
class: bg-main1
# Code: Chunk names
.huge[
Straight after the ` ```{r ` you can use a text string to name the chunk:
` ```{r read-crime-data} `
]
````markdown
```{r read-crime-data}`r ''`
crime <- read_csv("data/crime-data.csv")
```
````
---
class: bg-main1
# Code: Chunk Names
.huge[
Naming code chunks has three advantages:
1. Navigate to specific chunks using the drop-down code navigator in the bottom-left of the script editor.
2. Graphics produced by chunks now have useful names.
3. You can set up networks of cached chunks to avoid re-performing expensive computations on every run.
]
---
# Code: Chunk names
.huge[
- Every chunk should ideally have a name.
- Naming things is hard, but follow these rules and you'll be fine:
1. One word that describes the action (e.g., "read")
2. One word that describes the thing inside the code (e.g, "gapminder")
3. Separate words with "-" (e.g., `read-gapminder`)
]
---
# Code: Chunk options
.huge[
You can control how the code is output by changing the code chunk options which follow the title.
]
````markdown
```{r read-gapminder, eval = FALSE, echo = TRUE}`r ''`
gap <- read_csv("gapminder.csv")
```
````
.huge[
What do you think this does?
]
```{r}
countdown(minutes = 0, seconds = 30)
```
---
class: bg-main1
# Code: Chunk options
.vlarge[
The code chunk options you need to know about right now are:
* `cache`: TRUE / FALSE. Do you want to save the output of the chunk so it doesn't have to run next time?
* `eval`: TRUE / FALSE Do you want to evaluate the code?
* `echo`: TRUE / FALSE Do you want to print the code?
* `include`: TRUE / FALSE Do you want to include code output in the final output document? Setting to `FALSE` means nothing is put into the output document, but the code is still run.
]
.large[
You can read more about the options at the official documentation: https://yihui.name/knitr/options/#code-evaluation
]
---
# Your turn
.huge[
* go to `rstudio.cloud`, open document `01-oz-atlas.Rmd` and change the document so that the code output is hidden, but the graphics are shown. (Hint: Google "rstudio rmarkdown cheatsheet" for some tips!)
* Re-Knit the document.
* Take a look at the [R Markdown Gallery](https://rmarkdown.rstudio.com/gallery.html).
]
```{r cdown-rmd-chunk-opts}
countdown(minutes = 5)
```
---
# Global options: Set and forget
.huge[
You can set the default chunk behaviour once at the top of the `.Rmd` file using a chunk like:
```r
knitr::opts_chunk$set(
echo = FALSE,
cache = TRUE
)
```
then you will only need to add chunk options when you have the occasional one that you'd like to behave differently.
]
---
class: bg-main1
# Your turn
.huge[
* Go to your `01-oz-atlas.Rmd` document on `rstudio.cloud` and change the global settings at the top of the rmarkdown document to `echo = FALSE`, and `cache = TRUE`
```r
knitr::opts_chunk$set(
echo = FALSE,
cache = TRUE
)
```
* Update the other code chunks by removing the code chunk options.
]
```{r countdown-cloud}
countdown(minutes = 3)
```
---
class: bg-indigo
.vvhuge.white.center[
DEMO
The many different outputs of rmarkdown
]
---
class: bg-main1
# Your turn: Different types of documents
.huge[
1. Change the output of your current R Markdown file to produce a **Word document**. Now try to produce pdf - this may not work! That's OK, we do'nt need it right now.
2. Create a new document that will produce a slide show `File > New R Markdown > Presentation`
3. Create a flexdashboard document - see this option in the `File > New R Markdown > From template` list.
]
---
class: bg-indigo
.vhuge.white[
Your Turn: Making the groups
]
.huge.white[
We are going to set up the groups for doing assignment work.
1. Choose a quote from the bag.
2. Find the other people in the class with the same quote as you
3. Grab your gear and claim a table to work together at.
]
---
class: bg-indigo
.huge.white[
Your Turn: Ask your team mates these questions:
1. What is one food you'd never want to taste again?
2. If you were a comic strip character, who would you be and why?
LASTLY, come up with a name for your team and tell this to a tutor, along with the names of members of the team.
]
```{r countdown-two}
countdown(minutes = 5,
left = 0,
right = 0,
padding = "15px",
margin = "3%",
font_size = "4em",
play_sound = TRUE)
```
---
class: bg-main1
# Your Turn
.vhuge.white[
* Go to rstudio.cloud to `oz-atlas-final.Rmd`
* Read through the document and add text where prompted to learn more about the Australian native platypus!
]
---
# Recap:
.huge[
- There is a Reproducibility Crisis
- rmarkdown = YAML + text + code
- rmarkdown has many different output types
- Platypus are interesting!
- Assignment will be announced later today
]
---
# Learning more:
.huge[
- [R Markdown cheat sheet](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf) and Markdown Quick Reference (Help -> Markdown Quick Reference) handy, we'll refer to it often as the course progresses
]
---
# Lab quiz
.huge[
Take the quiz for today from ED.
]
---
## Share and share alike
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
???
*Note on ethics:* When you use someone else's work, you need to (1) check if it's allowed, that it has a [creative commons license](http://creativecommons.org/licenses/by/4.0/), (2) reference them as the source.
- If you have a big document, build it up in pieces. You can run just one code chunk at a time, or the past several, or even one line of code. The "Run" button has a menu of options of doing the coding in pieces.
- The workspace of your R Markdown document is separate from the Console
- rmarkdown runs code from the start to finish in a new environment