class: center, middle, inverse, title-slide # ETC1010: Data Modelling and Computing ## Week of introduction: Rmarkdown ### Professor Di Cook & Dr. Nicholas Tierney ### EBS, Monash U. ### 2019-08-07 --- class: bg-blue .vvvhuge.white.center.middle[ What is this song? ] --- class: bg-black .gigantic.white.middle.center[ Recap ] --- class: center # Traffic Light System <img src="https://nypdecider.files.wordpress.com/2014/08/help-me-help-you.gif" width="90%" style="display: block; margin: auto;" /> --- .center[ # Traffic Light System ] -- .red.pull-left.huge.middle[ # Red Post it -- * I need a hand * Slow down ] -- .green.pull-right.huge.middle[ # Green Post it -- * I am up to speed * I have completed the thing ] --- class: bg-main1, center, bottom, white <img src="https://njtierney.updog.co/img/rstudio-cooking-example.jpeg" style="display: block; margin: auto;" /> From Jessica Ward (@JKRWard) of R Ladies Newcaslte (UK) - @RLadiesNCL https://twitter.com/RLadiesNCL/status/1138812826917724160 --- background-image: url(https://njtierney.updog.co/img/unvotes-oz-usa.png) background-size: contain background-position: 50% 50% class: center, bottom, white --- background-image: url(https://njtierney.updog.co/img/tower-of-babel.gif) background-size: contain background-position: 50% 50% class: center, bottom, white --- background-image: url(https://njtierney.updog.co/img/edstem.png) background-size: contain background-position: 50% 50% class: center, bottom, white --- class: bg-main1 # R essentials: A short list (for now) .large.white[ - Functions are (most often) verbs, followed by what they will be applied to in parentheses: ] ```r do_this(to_this) do_that(to_this, to_that, with_those) ``` -- .large.white[ - Columns (variables) in data frames are accessed with `$`: ] ```r dataframe$var_name ``` -- .large.white[ - Packages are installed with the `install.packages` function and loaded with the `library` function, once per session: ] ```r install.packages("package_name") library(package_name) ``` --- # Today: Outline .huge[ * Why we care about Reproducibility * R + markdown = Rmarkdown * Controling output and input of rmarkdown * Exercises on creating rmarkdown reports on the humble platypus * Form up assignment groups * Quiz * Release assignment (later today) ] ??? * Reproducibility: Why we care * Rmarkdown * YAML * Code * text * markdown (online quiz) * rmarkdown - edit the existing one on platypus! * code chunks * code chunk names * chunk options * exercise on this * setting different chunk options globally * exercise extending the platypus report * make assignment groups * release assignment * quiz Should be able to answer the questions: How should I start an rmarkdown document? What do I put in the YAML metadata? How do I create a code chunk? What sort of options to I need to worry about for my code? What is the value in a reproducible report? What is markdown? Can I combine my software and my writing? --- class: bg-black .vvhuge.white.center.middle[ We are in a tight spot with reproducibility ] --- .blockquote.huge[ Only 6 out of 53 landmark results could be reproduced -- [Amgen, 2014*](https://www.nature.com/articles/483531a) ] .footnote[`*` Heard via Garret Grolemund's [great talk](https://www.youtube.com/watch?v=HVlwNayog-k)] --- .blockquote.huge[ An estimated 75% - 90% of preclinical results cannot be reproduced -- [Begley, 2015*](https://www.ncbi.nlm.nih.gov/pubmed/25552691) ] .footnote[`*` Heard via Garret Grolemund's [great talk](https://www.youtube.com/watch?v=HVlwNayog-k)] --- .blockquote.huge[ Estimated **annual** cost of irreproducibility for biomedical industry = 28 Billion USD -- [Freedman, 2015*](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165) ] .footnote[`*` Heard via Garret Grolemund's [great talk](https://www.youtube.com/watch?v=HVlwNayog-k)] --- background-image: url(https://njtierney.updog.co/gifs/njt-gif-jaws-dolly-zoom.gif) background-size: contain background-position: 50% 50% class: center, bottom, white --- background-image: url(https://njtierney.updog.co/gifs/ice-climber-fall.gif) background-size: contain background-position: 50% 50% class: center, bottom, white --- background-image: url(https://njtierney.updog.co/img/open-science-berg.jpeg) background-size: contain background-position: 50% 50% class: center, bottom, white --- class: bg-main1 center middle .vvhuge[ So what can we do about it? ] --- # Reproducibility checklist .huge[ Near-term goals: - Are the tables and figures reproducible from the code and data? - Does the code actually do what you think it does? - In addition to what was done, is it clear **why** it was done? (e.g., how were parameter settings chosen?) ] --- # Reproducibility checklist .huge[ Long-term goals: - Can the code be used for other data? - Can you extend the code to do other things? ] --- # Literate programming is a partial solution .huge[ * Literate programming shines some light on this dark area of science. * An idea from [Donald Knuth](https://en.wikipedia.org/wiki/Donald_Knuth) where you combine your text with your code output to create a document. * A _blend_ of your literature (**text**), and your programming (**code**), to create something you can read from top to bottom. ] --- .huge[ So ] -- .huge[ Imagine a report: Introduction, methods, results, discussion, and conclusion, ] -- .huge[ and all the bits of code that make each section. ] .huge[ With rmarkdown, you can see all the pieces of your data analysis all together. Each time you knit the analysis is ran from the beginning ] --- class: bg-main1 .vhuge[ Markdown as a new player to legibility ] -- .vhuge[ In 2004, [John Gruber](https://en.wikipedia.org/wiki/John_Gruber), of [daring fireball](https://daringfireball.net/) created [markdown](https://en.wikipedia.org/wiki/Markdown), a simple way to create text that rendered into a HTML webpage. ] --- class: bg-main1 .pull-left[ ``` - bullet list - bullet list - bullet list ``` ] -- .pull-right.huge.white[ - bullet list - bullet list - bullet list ] --- class: bg-main1 .pull-left.huge[ ``` 1. numbered list 2. numbered list 3. numbered list __bold__, **bold**, _italic_, *italic* > quote of something profound ``` ] -- .pull-right.huge[ 1. numbered list 2. numbered list 3. numbered list __bold__, **bold**, _italic_, *italic* > quote of something profound ] --- class: bg-black .vvhuge.white[ With very little marking up, we can create rich text, that **actually resembles** the text that we want to see. ] --- class: bg-main1 .vvhuge.white[ **Learn to use markdown** In your small groups, spend five minutes working through [markdowntutorial.com](https://www.markdowntutorial.com/) ]
05
:
00
--- class: bg-main1 # Rmarkdown helps complete the solution to the reproducibility problem -- .huge[ * Q: How do we take `markdown` + `R code` = "literate programming environment" * A: `Rmarkdown` ] --- # Rmarkdown... -- .huge[ Provides an environment where you can write your complete analysis, and marries your text, and code together into a rich document. ] -- .huge[ You write your code as code chunks, put your text around that, and then hey presto, you have a document you can reproduce. ] --- # Reminder: You've already used rmarkdown! <img src="https://njtierney.updog.co/img/unvotes-oz-usa.png" width="90%" style="display: block; margin: auto;" /> --- # How will we use R Markdown? .huge[ - Every assignment + project / is an R Markdown document - You'll always have a template R Markdown document to start with - The amount of scaffolding in the template will decrease over the semester - These lecture notes are created using R Markdown (!) ] --- # The anatomy of an rmarkdown document .huge[ There are three parts to an rmarkdown document. * Metadata (YAML) * Text (markdown formatting) * Code (code formatting) ] -- .vvhuge.center[DEMO] --- class: bg-main1 # Metadata: YAML (YAML Ain't Markup Language) .huge[ * The metadata of the document tells you how it is formed - what the **title** is, what **date** to put, and other control information. * If you're familiar with `\(\LaTeX\)`, this is ksimilar to how you specify document type, styles, fonts, options, etc in the front matter / preamble. ] --- class: bg-main1 # Metadata: YAML .huge[ * Rmarkdown documents use YAML to provide the metadata. It looks like this: ```YAML --- title: "An example document" author: "Nicholas Tierney" output: html_document --- ``` It starts an ends with three dashes `---`, and has fields like the following: `title`, `author`, and `output`. ] --- class: bg-main1 # Text .huge[ Is markdown, as we discussed in the earlier section, It provides a simple way to mark up text ] .pull-left.huge[ ``` 1. bullet list 2. bullet list 3. bullet list ``` ] .pull-right.huge[ 1. bullet list 1. bullet list 1. bullet list ] --- # Code .vhuge[ We refer to code in an rmarkdown document in two ways: 1. Code chunks, and 2. Inline code. ] --- # Code: Code chunks .huge[ `Code chunks` are marked by three backticks and curly braces with `r` inside them: ````markdown ```{r chunk-name} # a code chunk ``` ```` ] --- class: bg-main1 .huge[ **a backtick** is a special character you might not have seen before, it is typically located under the tilde key (`~`). On USA / Australia keyboards, is under the escape key: <div class="figure" style="text-align: center"> <img src="https://njtierney.updog.co/img/ansi-keyboard.png" alt="image from https://commons.wikimedia.org/wiki/File:ANSI_Keyboard_Layout_Diagram_with_Form_Factor.svg" width="80%" /> <p class="caption">image from https://commons.wikimedia.org/wiki/File:ANSI_Keyboard_Layout_Diagram_with_Form_Factor.svg</p> </div> ] --- # Code: Inline code .huge[ Sometimes you want to run the code inside a sentence. This is called running the code "inline". ] -- .huge[ You might want to run the code inline to name the number of variables or rows in a dataset in a sentence like: > There are XXX observations in the airquality dataset, and XXX variables. ] --- # Code: Inline code .huge[ You can call code "inline" like so: ````markdown There are `r nrow(airquality) ` observations in the airquality dataset, and `r ncol(airquality) ` variables. ```` Which gives you the following sentence ] -- .huge[ > There are 153 observations in the airquality dataset, and 6 variables. ] --- # Code: Inline code .huge[ What's great about this is that if your data changes upstream, then you don't need to work out where you mentioned your data, you just update the document. ] --- class: bg-main1 # Your Turn: Put it together .huge[ Go to `rstudio.cloud` and * open the document "01-oz-atlas.Rmd" * knit the document * Change the data section at the top to be from a different state instead of "New South Wales" * knit the document again * How do the text and figures in the document change? ]
05
:
00
--- class: bg-main1 center middle .gigantic[ break ] --- background-image: url(https://imgs.xkcd.com/comics/art_project.png) background-size: contain background-position: 50% 50% class: center, bottom, black --- class: bg-main1 # Code: Chunk names .huge[ Straight after the ` ```{r ` you can use a text string to name the chunk: ` ```{r read-crime-data} ` ] ````markdown ```{r read-crime-data} crime <- read_csv("data/crime-data.csv") ``` ```` --- class: bg-main1 # Code: Chunk Names .huge[ Naming code chunks has three advantages: 1. Navigate to specific chunks using the drop-down code navigator in the bottom-left of the script editor. 2. Graphics produced by chunks now have useful names. 3. You can set up networks of cached chunks to avoid re-performing expensive computations on every run. ] --- # Code: Chunk names .huge[ - Every chunk should ideally have a name. - Naming things is hard, but follow these rules and you'll be fine: 1. One word that describes the action (e.g., "read") 2. One word that describes the thing inside the code (e.g, "gapminder") 3. Separate words with "-" (e.g., `read-gapminder`) ] --- # Code: Chunk options .huge[ You can control how the code is output by changing the code chunk options which follow the title. ] ````markdown ```{r read-gapminder, eval = FALSE, echo = TRUE} gap <- read_csv("gapminder.csv") ``` ```` .huge[ What do you think this does? ]
00
:
30
--- class: bg-main1 # Code: Chunk options .vlarge[ The code chunk options you need to know about right now are: * `cache`: TRUE / FALSE. Do you want to save the output of the chunk so it doesn't have to run next time? * `eval`: TRUE / FALSE Do you want to evaluate the code? * `echo`: TRUE / FALSE Do you want to print the code? * `include`: TRUE / FALSE Do you want to include code output in the final output document? Setting to `FALSE` means nothing is put into the output document, but the code is still run. ] .large[ You can read more about the options at the official documentation: https://yihui.name/knitr/options/#code-evaluation ] --- # Your turn .huge[ * go to `rstudio.cloud`, open document `01-oz-atlas.Rmd` and change the document so that the code output is hidden, but the graphics are shown. (Hint: Google "rstudio rmarkdown cheatsheet" for some tips!) * Re-Knit the document. * Take a look at the [R Markdown Gallery](https://rmarkdown.rstudio.com/gallery.html). ]
05
:
00
--- # Global options: Set and forget .huge[ You can set the default chunk behaviour once at the top of the `.Rmd` file using a chunk like: ```r knitr::opts_chunk$set( echo = FALSE, cache = TRUE ) ``` then you will only need to add chunk options when you have the occasional one that you'd like to behave differently. ] --- class: bg-main1 # Your turn .huge[ * Go to your `01-oz-atlas.Rmd` document on `rstudio.cloud` and change the global settings at the top of the rmarkdown document to `echo = FALSE`, and `cache = TRUE` ```r knitr::opts_chunk$set( echo = FALSE, cache = TRUE ) ``` * Update the other code chunks by removing the code chunk options. ]
03
:
00
--- class: bg-indigo .vvhuge.white.center[ DEMO The many different outputs of rmarkdown ] --- class: bg-main1 # Your turn: Different types of documents .huge[ 1. Change the output of your current R Markdown file to produce a **Word document**. Now try to produce pdf - this may not work! That's OK, we do'nt need it right now. 2. Create a new document that will produce a slide show `File > New R Markdown > Presentation` 3. Create a flexdashboard document - see this option in the `File > New R Markdown > From template` list. ] --- class: bg-indigo .vhuge.white[ Your Turn: Making the groups ] .huge.white[ We are going to set up the groups for doing assignment work. 1. Choose a quote from the bag. 2. Find the other people in the class with the same quote as you 3. Grab your gear and claim a table to work together at. ] --- class: bg-indigo .huge.white[ Your Turn: Ask your team mates these questions: 1. What is one food you'd never want to taste again? 2. If you were a comic strip character, who would you be and why? LASTLY, come up with a name for your team and tell this to a tutor, along with the names of members of the team. ]
05
:
00
--- class: bg-main1 # Your Turn .vhuge.white[ * Go to rstudio.cloud to `oz-atlas-final.Rmd` * Read through the document and add text where prompted to learn more about the Australian native platypus! ] --- # Recap: .huge[ - There is a Reproducibility Crisis - rmarkdown = YAML + text + code - rmarkdown has many different output types - Platypus are interesting! - Assignment will be announced later today ] --- # Learning more: .huge[ - [R Markdown cheat sheet](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf) and Markdown Quick Reference (Help -> Markdown Quick Reference) handy, we'll refer to it often as the course progresses ] --- # Lab quiz .huge[ Take the quiz for today from ED. ] --- ## Share and share alike <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>. ??? *Note on ethics:* When you use someone else's work, you need to (1) check if it's allowed, that it has a [creative commons license](http://creativecommons.org/licenses/by/4.0/), (2) reference them as the source. - If you have a big document, build it up in pieces. You can run just one code chunk at a time, or the past several, or even one line of code. The "Run" button has a menu of options of doing the coding in pieces. - The workspace of your R Markdown document is separate from the Console - rmarkdown runs code from the start to finish in a new environment