---
title: "ETC1010: Data Modelling and Computing"
subtitle: "Week of introduction: Rmarkdown"
author: "Professor Di Cook & Dr. Nicholas Tierney"
institute: "EBS, Monash U."
date: "`r Sys.Date()`"
output:
  xaringan::moon_reader:
    lib_dir: libs
    css: ["shinobi", "ninjutsu", "slides.css"]
    seal: true
    self_contained: false
    nature:
      ratio: "16:9"
      highlightStyle: github
      highlightLines: true
      countIncrementalSlides: false
---

```{r setup, include=FALSE}
library(knitr)
knitr::opts_chunk$set(echo = FALSE,
                      message = FALSE,
                      warning = FALSE,
                      collapse = TRUE,
                      fig.height = 4,
                      fig.width = 8,
                      fig.align = "center",
                      cache = FALSE)
```

```{r}
library(tidyverse)
library(lubridate)
library(gridExtra)
```

class: bg-blue

.vvvhuge.white.center.middle[
What is this song?
]

---
class: bg-black

.gigantic.white.middle.center[
Recap
]

---
class: center

# Traffic Light System

```{r tom-cruise, out.width = "90%"}
include_graphics("https://nypdecider.files.wordpress.com/2014/08/help-me-help-you.gif")
```

---

.center[
# Traffic Light System
]

--

.red.pull-left.huge.middle[
# Red Post it

--

* I need a hand
* Slow down

]

--

.green.pull-right.huge.middle[
# Green Post it

--

* I am up to speed
* I have completed the thing

]

---
class: bg-main1, center, bottom, white

```{r rstudio-cooking-example}
include_graphics("https://njtierney.updog.co/img/rstudio-cooking-example.jpeg")
```

From Jessica Ward (@JKRWard) of R Ladies Newcastle (UK) - @RLadiesNCL https://twitter.com/RLadiesNCL/status/1138812826917724160

---
background-image: url(https://njtierney.updog.co/img/unvotes-oz-usa.png)
background-size: contain
background-position: 50% 50%
class: center, bottom, white

---
background-image: url(https://njtierney.updog.co/img/tower-of-babel.gif)
background-size: contain
background-position: 50% 50%
class: center, bottom, white

---
background-image: url(https://njtierney.updog.co/img/edstem.png)
background-size: contain
background-position: 50% 50%
class: center, bottom, white

---
class: bg-main1

# R essentials: A short list (for now)

.large.white[
- Functions are (most often) verbs, followed by what they will be applied to in parentheses:
]

```{r eval=FALSE, echo = TRUE}
do_this(to_this)
do_that(to_this, to_that, with_those)
```

--

.large.white[
- Columns (variables) in data frames are accessed with `$`:
]

```{r eval=FALSE, echo = TRUE}
dataframe$var_name
```

--

.large.white[
- Packages are installed once with the `install.packages` function, and loaded with the `library` function once per session:
]

```{r eval=FALSE, echo = TRUE}
install.packages("package_name")
library(package_name)
```

---

# Today: Outline

.huge[
* Why we care about Reproducibility
* R + markdown = Rmarkdown
* Controlling output and input of rmarkdown
* Exercises on creating rmarkdown reports on the humble platypus
* Form up assignment groups
* Quiz
* Release assignment (later today)
]

???

* Reproducibility: Why we care
* Rmarkdown
* YAML
* Code
* text
* markdown (online quiz)
* rmarkdown - edit the existing one on platypus!
* code chunks
* code chunk names
* chunk options
* exercise on this
* setting different chunk options globally
* exercise extending the platypus report
* make assignment groups
* release assignment
* quiz

Should be able to answer the questions:

* How should I start an rmarkdown document?
* What do I put in the YAML metadata?
* How do I create a code chunk?
* What sort of options do I need to worry about for my code?
* What is the value in a reproducible report?
* What is markdown?
* Can I combine my software and my writing?
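---

# R essentials: A worked example

.large[
A minimal sketch of the three essentials from the recap, assuming only the `airquality` data frame that ships with R. The object names `temps` and `avg_temp` are made up for illustration.
]

```r
# 1. `$` pulls a single column (variable) out of a data frame
temps <- airquality$Temp

# 2. Functions are verbs: mean() is applied to what is inside the parentheses
avg_temp <- mean(temps)

# Extra arguments change how the verb is applied, here rounding to 1 decimal place
round(avg_temp, digits = 1)

# 3. Packages: install once, then load with library() in every new session
# install.packages("tidyverse")  # commented out so the example runs offline
library(datasets)                # ships with R; this is where airquality lives
```

.large[
Swapping `Temp` for another column, such as `Wind`, follows exactly the same pattern.
]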
---
class: bg-black

.vvhuge.white.center.middle[
We are in a tight spot with reproducibility
]

---

.blockquote.huge[
Only 6 out of 53 landmark results could be reproduced

-- [Amgen, 2014*](https://www.nature.com/articles/483531a)
]

.footnote[`*` Heard via Garrett Grolemund's [great talk](https://www.youtube.com/watch?v=HVlwNayog-k)]

---

.blockquote.huge[
An estimated 75% - 90% of preclinical results cannot be reproduced

-- [Begley, 2015*](https://www.ncbi.nlm.nih.gov/pubmed/25552691)
]

.footnote[`*` Heard via Garrett Grolemund's [great talk](https://www.youtube.com/watch?v=HVlwNayog-k)]

---

.blockquote.huge[
Estimated **annual** cost of irreproducibility for biomedical industry = 28 Billion USD

-- [Freedman, 2015*](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165)
]

.footnote[`*` Heard via Garrett Grolemund's [great talk](https://www.youtube.com/watch?v=HVlwNayog-k)]

---
background-image: url(https://njtierney.updog.co/gifs/njt-gif-jaws-dolly-zoom.gif)
background-size: contain
background-position: 50% 50%
class: center, bottom, white

---
background-image: url(https://njtierney.updog.co/gifs/ice-climber-fall.gif)
background-size: contain
background-position: 50% 50%
class: center, bottom, white

---
background-image: url(https://njtierney.updog.co/img/open-science-berg.jpeg)
background-size: contain
background-position: 50% 50%
class: center, bottom, white

---
class: bg-main1 center middle

.vvhuge[
So what can we do about it?
]

---

# Reproducibility checklist

.huge[
Near-term goals:

- Are the tables and figures reproducible from the code and data?
- Does the code actually do what you think it does?
- In addition to what was done, is it clear **why** it was done? (e.g., how were parameter settings chosen?)
]

---

# Reproducibility checklist

.huge[
Long-term goals:

- Can the code be used for other data?
- Can you extend the code to do other things?
]

---

# Literate programming is a partial solution

.huge[
* Literate programming shines some light on this dark area of science.
* An idea from [Donald Knuth](https://en.wikipedia.org/wiki/Donald_Knuth) where you combine your text with your code output to create a document.
* A _blend_ of your literature (**text**), and your programming (**code**), to create something you can read from top to bottom.
]

---

.huge[
So
]

--

.huge[
Imagine a report: Introduction, methods, results, discussion, and conclusion,
]

--

.huge[
and all the bits of code that make each section.
]

.huge[
With rmarkdown, you can see all the pieces of your data analysis together.

Each time you knit, the analysis is run from the beginning.
]

---
class: bg-main1

.vhuge[
Markdown as a new player to legibility
]

--

.vhuge[
In 2004, [John Gruber](https://en.wikipedia.org/wiki/John_Gruber), of [daring fireball](https://daringfireball.net/) created [markdown](https://en.wikipedia.org/wiki/Markdown), a simple way to write text that renders into an HTML webpage.
]

---
class: bg-main1

.pull-left[
```
- bullet list
- bullet list
- bullet list
```
]

--

.pull-right.huge.white[
- bullet list
- bullet list
- bullet list
]

---
class: bg-main1

.pull-left.huge[
```
1. numbered list
2. numbered list
3. numbered list

__bold__, **bold**, _italic_, *italic*

> quote of something profound
```
]

--

.pull-right.huge[
1. numbered list
2. numbered list
3. numbered list

__bold__, **bold**, _italic_, *italic*

> quote of something profound
]

---
class: bg-black

.vvhuge.white[
With very little marking up, we can create rich text, that **actually resembles** the text that we want to see.
]

---
class: bg-main1

.vvhuge.white[
**Learn to use markdown**

In your small groups, spend five minutes working through [markdowntutorial.com](https://www.markdowntutorial.com/)
]

```{r cdown-md}
library(countdown)
countdown(minutes = 5)
```

---
class: bg-main1

# Rmarkdown helps complete the solution to the reproducibility problem

--

.huge[
* Q: How do we combine `markdown` + `R code` into a "literate programming environment"?
* A: `Rmarkdown`
]

---

# Rmarkdown...

--

.huge[
Provides an environment where you can write your complete analysis, and marries your text and code together into a rich document.
]

--

.huge[
You write your code as code chunks, put your text around that, and then hey presto, you have a document you can reproduce.
]

---

# Reminder: You've already used rmarkdown!

```{r remind-unvotes, out.width = "90%"}
include_graphics("https://njtierney.updog.co/img/unvotes-oz-usa.png")
```

---

# How will we use R Markdown?

.huge[
- Every assignment and project is an R Markdown document
- You'll always have a template R Markdown document to start with
- The amount of scaffolding in the template will decrease over the semester
- These lecture notes are created using R Markdown (!)
]

---

# The anatomy of an rmarkdown document

.huge[
There are three parts to an rmarkdown document.

* Metadata (YAML)
* Text (markdown formatting)
* Code (code formatting)
]

--

.vvhuge.center[DEMO]

---
class: bg-main1

# Metadata: YAML (YAML Ain't Markup Language)

.huge[
* The metadata of the document tells you how it is formed - what the **title** is, what **date** to put, and other control information.
* If you're familiar with $\LaTeX$, this is similar to how you specify document type, styles, fonts, options, etc. in the front matter / preamble.
]

---
class: bg-main1

# Metadata: YAML

.huge[
* Rmarkdown documents use YAML to provide the metadata.
It looks like this:

```YAML
---
title: "An example document"
author: "Nicholas Tierney"
output: html_document
---
```

It starts and ends with three dashes `---`, and has fields like the following: `title`, `author`, and `output`.

]

---
class: bg-main1

# Text

.huge[
Is markdown, as we discussed in the earlier section. It provides a simple way to mark up text.
]

.pull-left.huge[
```
1. bullet list
2. bullet list
3. bullet list
```
]

.pull-right.huge[
1. bullet list
1. bullet list
1. bullet list
]

---

# Code

.vhuge[
We refer to code in an rmarkdown document in two ways:

1. Code chunks, and
2. Inline code.
]

---

# Code: Code chunks

.huge[
`Code chunks` are marked by three backticks and curly braces with `r` inside them:

````markdown
```{r chunk-name}`r ''`
# a code chunk
```
````
]

---
class: bg-main1

.huge[
**A backtick** is a special character you might not have seen before. It is typically located under the tilde key (`~`). On USA / Australia keyboards, it is under the escape key:

```{r show-backtick, out.width = "80%", echo = FALSE, fig.cap = "image from https://commons.wikimedia.org/wiki/File:ANSI_Keyboard_Layout_Diagram_with_Form_Factor.svg"}
knitr::include_graphics("https://njtierney.updog.co/img/ansi-keyboard.png")
```

]

---

# Code: Inline code

.huge[
Sometimes you want to run the code inside a sentence. This is called running the code "inline".
]

--

.huge[
You might want to run the code inline to report the number of rows or variables in a dataset in a sentence like:

> There are XXX observations in the airquality dataset, and XXX variables.
]

---

# Code: Inline code

.huge[
You can call code "inline" like so:

````markdown
There are `r "\u0060r nrow(airquality) \u0060"` observations in the airquality dataset, and `r "\u0060r ncol(airquality) \u0060"` variables.
````

Which gives you the following sentence:
]

--

.huge[
> There are `r nrow(airquality)` observations in the airquality dataset, and `r ncol(airquality)` variables.
]

---

# Code: Inline code

.huge[
What's great about this is that if your data changes upstream, then you don't need to work out where you mentioned your data, you just knit the document again.
]

---
class: bg-main1

# Your Turn: Put it together

.huge[
Go to `rstudio.cloud` and

* open the document "01-oz-atlas.Rmd"
* knit the document
* Change the data section at the top to be from a different state instead of "New South Wales"
* knit the document again
* How do the text and figures in the document change?
]

```{r cdown-rmd-inline}
countdown(minutes = 5)
```

---
class: bg-main1 center middle

.gigantic[
break
]

---
background-image: url(https://imgs.xkcd.com/comics/art_project.png)
background-size: contain
background-position: 50% 50%
class: center, bottom, black

---
class: bg-main1

# Code: Chunk names

.huge[
Straight after the ` ```{r ` you can use a text string to name the chunk: ` ```{r read-crime-data} `
]

````markdown
```{r read-crime-data}`r ''`
crime <- read_csv("data/crime-data.csv")
```
````

---
class: bg-main1

# Code: Chunk Names

.huge[
Naming code chunks has three advantages:

1. You can navigate to specific chunks using the drop-down code navigator in the bottom-left of the script editor.
2. Graphics produced by chunks now have useful names.
3. You can set up networks of cached chunks to avoid re-performing expensive computations on every run.
]

---

# Code: Chunk names

.huge[
- Every chunk should ideally have a name.
- Naming things is hard, but follow these rules and you'll be fine:

1. One word that describes the action (e.g., "read")
2. One word that describes the thing inside the code (e.g., "gapminder")
3. Separate words with "-" (e.g., `read-gapminder`)
]

---

# Code: Chunk options

.huge[
You can control how the code is output by changing the code chunk options, which follow the chunk name.
]

````markdown
```{r read-gapminder, eval = FALSE, echo = TRUE}`r ''`
gap <- read_csv("gapminder.csv")
```
````

.huge[
What do you think this does?
]

```{r}
countdown(minutes = 0, seconds = 30)
```

---
class: bg-main1

# Code: Chunk options

.vlarge[
The code chunk options you need to know about right now are:

* `cache`: TRUE / FALSE. Do you want to save the output of the chunk so it doesn't have to run next time?
* `eval`: TRUE / FALSE. Do you want to evaluate the code?
* `echo`: TRUE / FALSE. Do you want to print the code?
* `include`: TRUE / FALSE. Do you want to include code output in the final output document? Setting to `FALSE` means nothing is put into the output document, but the code is still run.
]

.large[
You can read more about the options at the official documentation: https://yihui.name/knitr/options/#code-evaluation
]

---

# Your turn

.huge[
* Go to `rstudio.cloud`, open document `01-oz-atlas.Rmd` and change the document so that the code output is hidden, but the graphics are shown. (Hint: Google "rstudio rmarkdown cheatsheet" for some tips!)
* Re-knit the document.
* Take a look at the [R Markdown Gallery](https://rmarkdown.rstudio.com/gallery.html).
]

```{r cdown-rmd-chunk-opts}
countdown(minutes = 5)
```

---

# Global options: Set and forget

.huge[
You can set the default chunk behaviour once at the top of the `.Rmd` file using a chunk like:

```r
knitr::opts_chunk$set(
  echo = FALSE,
  cache = TRUE
)
```

then you will only need to add chunk options when you have the occasional one that you'd like to behave differently.
]

---
class: bg-main1

# Your turn

.huge[
* Go to your `01-oz-atlas.Rmd` document on `rstudio.cloud` and change the global settings at the top of the rmarkdown document to `echo = FALSE`, and `cache = TRUE`

```r
knitr::opts_chunk$set(
  echo = FALSE,
  cache = TRUE
)
```

* Update the other code chunks by removing the code chunk options.
]

```{r countdown-cloud}
countdown(minutes = 3)
```

---
class: bg-indigo

.vvhuge.white.center[
DEMO

The many different outputs of rmarkdown
]

---
class: bg-main1

# Your turn: Different types of documents

.huge[
1. Change the output of your current R Markdown file to produce a **Word document**. Now try to produce a PDF - this may not work! That's OK, we don't need it right now.
2. Create a new document that will produce a slide show `File > New R Markdown > Presentation`
3. Create a flexdashboard document - see this option in the `File > New R Markdown > From template` list.
]

---
class: bg-indigo

.vhuge.white[
Your Turn: Making the groups
]

.huge.white[
We are going to set up the groups for doing assignment work.

1. Choose a quote from the bag.
2. Find the other people in the class with the same quote as you.
3. Grab your gear and claim a table to work together at.
]

---
class: bg-indigo

.huge.white[
Your Turn: Ask your team mates these questions:

1. What is one food you'd never want to taste again?
2. If you were a comic strip character, who would you be and why?

LASTLY, come up with a name for your team and tell this to a tutor, along with the names of members of the team.
]

```{r countdown-two}
countdown(minutes = 5, left = 0, right = 0, padding = "15px", margin = "3%", font_size = "4em", play_sound = TRUE)
```

---
class: bg-main1

# Your Turn

.vhuge.white[
* Go to `rstudio.cloud` and open `oz-atlas-final.Rmd`
* Read through the document and add text where prompted to learn more about the Australian native platypus!
]

---

# Recap:

.huge[
- There is a Reproducibility Crisis
- rmarkdown = YAML + text + code
- rmarkdown has many different output types
- Platypus are interesting!
- Assignment will be announced later today
]

---

# Learning more:

.huge[
- Keep the [R Markdown cheat sheet](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf) and Markdown Quick Reference (Help -> Markdown Quick Reference) handy; we'll refer to them often as the course progresses
]

---

# Lab quiz

.huge[
Take the quiz for today from ED.
]

---

## Share and share alike

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

???

*Note on ethics:* When you use someone else's work, you need to (1) check that it's allowed, for example that it has a [creative commons license](http://creativecommons.org/licenses/by/4.0/), and (2) reference them as the source.

- If you have a big document, build it up in pieces. You can run just one code chunk at a time, or the past several, or even one line of code. The "Run" button has a menu of options for running the code in pieces.
- The workspace of your R Markdown document is separate from the Console
- rmarkdown runs code from start to finish in a new environment
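
---

# Appendix: A minimal document, all three parts

.large[
A hedged sketch of about the smallest document that has YAML, text, and code together. The title, author, heading, and chunk names are placeholders rather than anything from the course files, and it assumes only the `airquality` data that ships with R.
]

````markdown
---
title: "Air quality at a glance"
author: "Your Name"
output: html_document
---

```{r setup, include = FALSE}`r ''`
# global default: hide the code, keep the output
knitr::opts_chunk$set(echo = FALSE)
```

## Temperature

There are `r "\u0060r nrow(airquality) \u0060"` observations in the airquality dataset.

```{r plot-temperature}`r ''`
# a named chunk: action ("plot") + thing ("temperature")
hist(airquality$Temp)
```
````

.large[
Knitting a file like this should give an HTML page with the title, the sentence with the number filled in, and the histogram, with the code itself hidden by `echo = FALSE`.
]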