
ETC1010: Data Modelling and Computing

Lecture 4B: Advanced topics in data visualisation

Dr. Nicholas Tierney & Professor Di Cook

EBS, Monash U.

2019-08-23

1 / 55

While the song is playing...

Draw a mental model / concept map of last lecture's content on joins.

2 / 55

recap

  • Joins
  • venn diagrams
  • feedback
3 / 55

Joins with a person and a coat, by Leight Tami

4 / 55

Upcoming Due Dates

  • Assignment 2: Thursday 5th September, 5pm
  • Stay tuned on ED for the upcoming dates
5 / 55

Making effective data plots

  1. Principles / science of data visualisation
  2. Features of graphics
6 / 55

Principles / science of data visualisation

  • Palettes and colour blindness
  • change blindness
  • using proximity
  • hierarchy of mappings
7 / 55

Features of graphics

  • Layering statistical summaries
  • Themes
  • adding interactivity
8 / 55

Palettes and colour blindness

There are three main types of colour palette:

  • Qualitative: categorical variables
  • Sequential: low to high numeric values
  • Diverging: negative to positive values
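
As a rough illustration of the three palette types, here is a minimal sketch using ggplot2's ColorBrewer scales. The `toy` data frame and its columns are hypothetical and exist only to give each scale something to colour.

```r
library(ggplot2)

# Hypothetical toy data: one categorical, one positive, one signed variable
set.seed(1)
toy <- data.frame(
  x = runif(50),
  y = runif(50),
  group = sample(c("a", "b", "c"), 50, replace = TRUE),
  amount = runif(50, 0, 100),
  change = runif(50, -1, 1)
)

# Qualitative: distinct hues with no implied order
p_qual <- ggplot(toy, aes(x, y, colour = group)) +
  geom_point() +
  scale_colour_brewer(palette = "Dark2")

# Sequential: light to dark as the value goes from low to high
p_seq <- ggplot(toy, aes(x, y, colour = amount)) +
  geom_point() +
  scale_colour_distiller(palette = "Blues", direction = 1)

# Diverging: two hues meeting at a neutral midpoint;
# symmetric limits keep that midpoint at zero
p_div <- ggplot(toy, aes(x, y, colour = change)) +
  geom_point() +
  scale_colour_distiller(palette = "PRGn", limits = c(-1, 1))
```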
9 / 55

Qualitative: categorical variables

10 / 55

Sequential: low to high numeric values

11 / 55

Diverging: negative to positive values

12 / 55

Example: TB data

## # A tibble: 157,820 x 5
##    country      year count gender age
##    <chr>       <dbl> <dbl> <chr>  <chr>
##  1 Afghanistan  1980    NA m      04
##  2 Afghanistan  1981    NA m      04
##  3 Afghanistan  1982    NA m      04
##  4 Afghanistan  1983    NA m      04
##  5 Afghanistan  1984    NA m      04
##  6 Afghanistan  1985    NA m      04
##  7 Afghanistan  1986    NA m      04
##  8 Afghanistan  1987    NA m      04
##  9 Afghanistan  1988    NA m      04
## 10 Afghanistan  1989    NA m      04
## # … with 157,810 more rows
13 / 55

Example: TB data - adding relative change from 2002 - 2012

## # A tibble: 219 x 4
##    country             `2002` `2012`  reldif
##    <chr>                <dbl>  <dbl>   <dbl>
##  1 Afghanistan           6509  13907  1.14
##  2 Albania                225    185 -0.178
##  3 Algeria               8246   7510 -0.0893
##  4 American Samoa           1      0 -1
##  5 Andorra                  2      2  0
##  6 Angola               17988  22106  0.229
##  7 Anguilla                 0      0  0
##  8 Antigua and Barbuda      4      1 -0.75
##  9 Argentina             5383   4787 -0.111
## 10 Armenia                511    316 -0.382
## # … with 209 more rows
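
The values in the reldif column are consistent with (2012 count - 2002 count) / 2002 count, for example (13907 - 6509) / 6509 ≈ 1.14 for Afghanistan. The slides do not show the code, so the sketch below is an assumption. In particular, treating a 2002 count of zero as "no change" is a guess based on the Anguilla row, and `tb_wide` is a hypothetical name for a three-row excerpt of the table.

```r
library(dplyr)

# Three rows copied from the table above
tb_wide <- tibble(
  country = c("Afghanistan", "Albania", "Anguilla"),
  `2002` = c(6509, 225, 0),
  `2012` = c(13907, 185, 0)
)

# Relative change from 2002 to 2012.
# Assumption: a 2002 count of 0 is reported as 0 rather than NaN.
tb_rel <- tb_wide %>%
  mutate(reldif = if_else(`2002` == 0, 0, (`2012` - `2002`) / `2002`))

tb_rel
```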
14 / 55

Example: Sequential colour with default palette

ggplot(tb_map) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = reldif)) +
  theme_map()

15 / 55

Example: (improved) sequential colour with viridis palette

library(viridis)
ggplot(tb_map) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = reldif)) +
  theme_map() +
  scale_fill_viridis(na.value = "white")

16 / 55

Example: Diverging colour with better palette

ggplot(tb_map) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = reldif)) +
  theme_map() +
  scale_fill_distiller(palette = "PRGn", na.value = "white", limits = c(-7, 7))
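
A diverging palette only reads correctly when its neutral midpoint lines up with zero, which is why the limits are symmetric. One way to pick such limits is to take the largest absolute value in the data. This is a sketch with a hypothetical excerpt of reldif values, not necessarily how the c(-7, 7) above was chosen.

```r
# Hypothetical excerpt of reldif values, including a missing one
reldif <- c(1.14, -0.178, -0.0893, -1, 0, 0.229, NA)

# Largest magnitude in either direction, ignoring missing values
lim <- max(abs(reldif), na.rm = TRUE)

# Symmetric limits so that zero maps to the middle of the palette
div_limits <- c(-lim, lim)
div_limits
```

The result could then be passed as `limits = div_limits` to `scale_fill_distiller()`.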

17 / 55

Summary: colour palettes

  • Different ways to map colour to values:
    • Qualitative: categorical variables
    • Sequential: low to high numeric values
    • Diverging: negative to positive values
18 / 55

Colour blindness

  • About 8% of men (about 1 in 12) and 0.5% of women (about 1 in 200) have difficulty distinguishing between red and green.
  • Several palettes have been tested for colour blindness: RColorBrewer has an associated web site, colorbrewer.org, where the palettes are labelled. See also viridis and scico.
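
The palette labels can also be checked from R. The sketch below assumes the RColorBrewer and dichromat packages are installed. It lists the palettes flagged as colour-blind friendly, then simulates how three Dark2 colours might appear with deuteranopia (a red-green deficiency).

```r
library(RColorBrewer)
library(dichromat)

# brewer.pal.info has one row per palette, with a logical `colorblind` column
cb_safe <- rownames(brewer.pal.info)[brewer.pal.info$colorblind]
cb_safe

# Three colours from Dark2, as hex strings
cols <- brewer.pal(3, "Dark2")

# Approximate appearance under deuteranopia
simulated <- dichromat(cols, type = "deutan")
simulated
```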
19 / 55

Plot of two coloured points: Normal Mode

20 / 55

Plot of two coloured points: dichromat mode

21 / 55

22 / 55
p2 <- p + scale_colour_brewer(palette = "Dark2")
p2

23 / 55

24 / 55
p3 <- p + scale_colour_viridis_d()
p3

25 / 55

26 / 55

Summary: colour blindness

  • Apply colourblind-friendly colourscales
    • + scale_colour_viridis()
    • + scale_colour_brewer(palette = "Dark2")
    • scico R package
27 / 55

Pre-attentiveness: Find the odd one out?

28 / 55

Pre-attentiveness: Find the odd one out?

29 / 55

Using proximity in your plots

Basic rule: place the groups that you want to compare close to each other

30 / 55

Which plot answers which question?

  • "Is the incidence similar for males and females in 2012 across age groups?"
  • "Is the incidence similar for age groups in 2012, across gender?"
31 / 55

"...incidence similar for: (males and females) or (age groups, across gender)?"

32 / 55

Here are two different arrangements of the TB data. To answer the question "Is the incidence similar for males and females in 2012 across age groups?" the first arrangement is better. It puts males and females right beside each other, so the relative heights of the bars can be seen quickly. The answer to the question would be "No, the numbers were similar in youth, but males are more affected with increasing age."

The second arrangement puts the focus on age groups, and is better to answer the question "Is the incidence similar for age groups in 2012, across gender?" To which the answer would be "No, among females, the incidence is higher at early ages. For males, the incidence is much more uniform across age groups."

"Is the incidence similar for males and females in 2012 across age groups?"

  • Males & females next to each other: the relative heights of the bars can be seen quickly.
  • Answer to the question: "No, the numbers were similar in youth, but males are more affected with increasing age."
33 / 55

"Is the incidence similar for age groups in 2012, across gender?"

  • Puts the focus on age groups
  • Answer to the question: "No, among females, the incidence is higher at early ages. For males, the incidence is much more uniform across age groups."
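
The two arrangements can be sketched by swapping which variable goes on the x axis and which defines the facets. The counts in `tb_2012` below are made up for illustration and are not the real TB figures.

```r
library(ggplot2)

# Hypothetical counts, NOT the real TB data
tb_2012 <- data.frame(
  gender = rep(c("f", "m"), each = 3),
  age = rep(c("15-24", "25-34", "35-44"), times = 2),
  count = c(100, 90, 70, 105, 130, 150)
)

# Arrangement 1: genders side by side within each age group,
# which suits "is incidence similar for males and females?"
p_by_gender <- ggplot(tb_2012, aes(x = gender, y = count, fill = gender)) +
  geom_col() +
  facet_wrap(~age)

# Arrangement 2: age groups side by side within each gender,
# which suits "is incidence similar across age groups?"
p_by_age <- ggplot(tb_2012, aes(x = age, y = count, fill = age)) +
  geom_col() +
  facet_wrap(~gender)
```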
34 / 55

Proximity wrap up

  • Faceting of plots and proximity are related to change blindness, an area of study in cognitive psychology.
  • Daniel Simons' lab has a series of fabulous videos illustrating how a visual break affects the way the mind processes a scene.
  • Here's one example:
    The door study
35 / 55

Layering

  • Statistical summaries: It is common to layer plots, particularly by adding statistical summaries, like a model fit, or means and standard deviations. The purpose is to show the trend in relation to the variation.
  • Maps: Commonly maps provide the framework for data collected spatially. One layer for the map, and another for the data.
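
As one possible illustration of layering means and standard deviations over raw data, the sketch below uses the built-in mtcars data. The summary data frame `summ` and its column names are hypothetical.

```r
library(ggplot2)

# Mean and standard deviation of mpg for each number of cylinders
avg <- tapply(mtcars$mpg, mtcars$cyl, mean)
sdev <- tapply(mtcars$mpg, mtcars$cyl, sd)
summ <- data.frame(
  cyl = as.numeric(names(avg)),
  avg = as.numeric(avg),
  sdev = as.numeric(sdev)
)

# Layer 1: raw data. Layers 2 and 3: mean +/- one standard deviation.
p_layers <- ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point(alpha = 0.4) +
  geom_errorbar(
    data = summ,
    aes(x = cyl, ymin = avg - sdev, ymax = avg + sdev),
    inherit.aes = FALSE, width = 0.3, colour = "red"
  ) +
  geom_point(data = summ, aes(y = avg), colour = "red", size = 3)
```

The red points show the trend across cylinder counts, and the error bars show how much variation surrounds each mean.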
36 / 55
ggplot(df, aes(x = x, y = y1)) + geom_point()

37 / 55
ggplot(df, aes(x = x, y = y1)) + geom_point() +
  geom_smooth(method = "lm", se = FALSE)

38 / 55
ggplot(df, aes(x = x, y = y1)) + geom_point() +
  geom_smooth(method = "lm")

39 / 55
ggplot(df, aes(x = x, y = y2)) + geom_point()

40 / 55
ggplot(df, aes(x = x, y = y2)) + geom_point() +
  geom_smooth(method = "lm", se = FALSE)

41 / 55
ggplot(df, aes(x = x, y = y2)) + geom_point() +
  geom_smooth(se = FALSE)

42 / 55
ggplot(df, aes(x = x, y = y2)) + geom_point() +
  geom_smooth(se = FALSE, span = 0.05)

43 / 55
p1 <- ggplot(df, aes(x = x, y = y2)) + geom_point() +
  geom_smooth(se = FALSE, span = 0.2)
p1
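
The slides do not show how df was built, so the data frame below (named `sim` to avoid clashing with it) is a hypothetical stand-in with one linear and one curved response. The sketch fits loess directly to show what span does: it sets the fraction of points used in each local fit, so a small span follows the data closely and a large span gives a smoother curve.

```r
library(ggplot2)

# Hypothetical stand-in for the df used in the slides
set.seed(2019)
sim <- data.frame(x = seq(-2, 2, length.out = 200))
sim$y1 <- 2 * sim$x + rnorm(200)     # roughly linear
sim$y2 <- 2 * sim$x^2 + rnorm(200)   # curved, so a straight line fits poorly

# Small span: each local fit uses about 5% of the points (wiggly)
fit_wiggly <- loess(y2 ~ x, data = sim, span = 0.05)

# Large span: each local fit uses about 75% of the points (smooth)
fit_smooth <- loess(y2 ~ x, data = sim, span = 0.75)

# The wiggly fit chases the noise, so its residuals are smaller
rss_wiggly <- sum(residuals(fit_wiggly)^2)
rss_smooth <- sum(residuals(fit_smooth)^2)
c(wiggly = rss_wiggly, smooth = rss_smooth)
```

Smaller residuals are not the goal here. A span that is too small shows noise rather than trend, which is what the span = 0.05 plot illustrates.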

44 / 55

Interactivity with magic plotly

library(plotly)
ggplotly(p1)
45 / 55

Themes: Add some style to your plot

p <- ggplot(mtcars) +
  geom_point(aes(x = wt,
                 y = mpg,
                 colour = factor(gear))) +
  facet_wrap(~am)
p

46 / 55

Theme: theme_minimal

p +
  theme_minimal()

47 / 55

Theme: ggthemes theme_few()

library(ggthemes)
p +
  theme_few() +
  scale_colour_few()

48 / 55

Theme: ggthemes theme_excel() 🤒

p +
  theme_excel() +
  scale_colour_excel()

49 / 55

Theme: for fun

library(wesanderson)
p +
  scale_colour_manual(
    values = wes_palette("Royal1")
  )

50 / 55

Summary: themes

  • The ggthemes package has many different styles for the plots.
  • Other packages such as xkcd, skittles, wesanderson, beyonce, ochre, ....
51 / 55

Hierarchy of mappings

  1. Position - common scale (BEST): axis system
  2. Position - nonaligned scale: boxes in a side-by-side boxplot
  3. Length, direction, angle: pie charts, regression lines, wind maps
  4. Area: bubble charts
  5. Volume, curvature: 3D plots
  6. Shading, color (WORST): maps, points coloured by numeric variable
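
To get a feel for the two ends of this hierarchy, the sketch below draws the same five values twice: once as position on a common scale and once as colour. The `vals` data frame is hypothetical.

```r
library(ggplot2)

# Hypothetical values that differ only slightly
vals <- data.frame(
  item = c("A", "B", "C", "D", "E"),
  value = c(10, 12, 11, 15, 14)
)

# Position on a common scale: small differences are easy to rank
p_position <- ggplot(vals, aes(x = value, y = item)) +
  geom_point(size = 3)

# Colour: the same differences are much harder to judge
p_colour <- ggplot(vals, aes(x = item, y = 1, fill = value)) +
  geom_tile()
```

Try ranking the items from each plot alone. Most readers find this quick with the first plot and slow with the second.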
52 / 55

Your Turn:

  • lab quiz open (requires answering questions from Lab exercise)
  • go to rstudio.cloud and check out exercise 4-B
  • If you want to use R / RStudio on your laptop:
    • Install R + RStudio (see )
    • open R
    • type the following:
      # install.packages("usethis")
      library(usethis)
      use_course("dmac.netlify.com/lectures/lecture4b/exercise/exercise-4b.zip")
53 / 55

Resources

54 / 55

Share and share alike

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

55 / 55

Variable types and mapping

Type of variable | How to map | Common errors
--- | --- | ---
Categorical, qualitative | Category + count/proportion displayed, often as an area plot or with a small number of categories mapped to colour or symbol | Not including 0 on the count/proportion axis. Not ordering categories.
Quantitative | Position along an axis | Displaying as a bar, especially when showing mean values. Mapping to colour.
Date/Time | Time-ordered axis, different temporal resolutions to study long term trend, or seasonal patterns. Lines typically connect measurements to indicate temporal dependence | Time order corrupted
Space | Conventional projections of the sphere, map aspect ratio | Wrong aspect ratio
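
The two common errors for categorical variables can be avoided as in the sketch below, which uses the mpg data that ships with ggplot2. The names `counts` and `p_ordered` are hypothetical.

```r
library(ggplot2)

# Count the cars in each class; table() names the count column Freq
counts <- as.data.frame(table(class = mpg$class))

# Order the categories from most to least frequent
ordered_class <- reorder(counts$class, -counts$Freq)
levels(ordered_class)

# geom_col() starts bars at zero, so the count axis includes 0
p_ordered <- ggplot(counts, aes(x = reorder(class, -Freq), y = Freq)) +
  geom_col() +
  labs(x = "class", y = "count")
```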

Coordinate systems

  • Cartesian, polar: most plots are made in Cartesian coordinates. Just a few are in polar coordinates, primarily the pie chart. Polar coordinates use radius and angle to describe position in 2D space. Occasionally measurements like wind (direction and speed) make sense to be plotted in polar coordinates.
  • fixed, equal: When variables are measured on scales that should be comparable, it may be important to reflect this in the axis limits and the page space that the plot takes. (This is different from theme(aspect.ratio = 1), which fixes the physical shape of the plot panel, regardless of the data units.)
  • map: Maps come in conventional formats, most often with a specific aspect ratio of vertical to horizontal axes, that depends on latitude.
  • flip: Useful for generating a plot with a categorical variable on the x axis and then flipping it sideways to look at.
df <- tibble(x = runif(100), y = runif(100) * 10)
ggplot(df, aes(x = x, y = y)) + geom_point() + coord_fixed()

ggplot(df, aes(x = x, y = y)) + geom_point() + coord_equal()

ggplot(df, aes(x = x, y = y)) + geom_point() + coord_fixed(ratio = 0.2)

ggplot(df, aes(x = x, y = y)) + geom_point() + theme(aspect.ratio = 1)
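
The polar and flip systems from the list above can be sketched with datasets that ship with R and ggplot2. The plot names are hypothetical.

```r
library(ggplot2)

# Polar: a pie chart is a stacked bar drawn in polar coordinates,
# with the count mapped to angle
p_pie <- ggplot(mtcars, aes(x = factor(1), fill = factor(cyl))) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")

# Flip: put the categorical variable on x, then turn the plot sideways
# so that the category labels read horizontally
p_flip <- ggplot(mpg, aes(x = class)) +
  geom_bar() +
  coord_flip()
```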

Adding interactivity to plots

Interaction on a plot can help de-clutter it, by making labels only show on mouse over. Occasionally it can be useful to zoom into parts of the plot. Often it is useful to change the aspect ratio.

The plotly package makes it easy to add interaction to ggplots.

library(plotly)
p <- passengers %>%
  filter(type_of_flight == "INTL") %>%
  spread(key = bound, value = amount) %>%
  ggplot() +
  geom_point(aes(x = IN, y = OUT, label = airport)) +
  facet_wrap(~Year, ncol = 8) +
  coord_equal() +
  scale_x_continuous("Incoming passengers (mil)", breaks = seq(0, 8000000, 2000000), labels = seq(0, 8, 2)) +
  scale_y_continuous("Outgoing passengers (mil)", breaks = seq(0, 8000000, 2000000), labels = seq(0, 8, 2))
ggplotly(p)
[Interactive plot: outgoing vs incoming passengers (mil), one panel per year from 1985-86 to 2016-17; both axes run from 0 to 8]
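
The passengers data is not included in these slides, so the sketch below uses the built-in mtcars data to show labels that appear only on mouse over. The label aesthetic is not drawn by geom_point(), but ggplotly() can pick it up through its tooltip argument. The `cars` data frame and its `model` column are hypothetical.

```r
library(ggplot2)
library(plotly)

# Copy mtcars and turn the row names into a label column
cars <- mtcars
cars$model <- rownames(mtcars)

# The label is carried along but not drawn on the static plot
p_cars <- ggplot(cars, aes(x = wt, y = mpg, label = model)) +
  geom_point()

# Show the label and both coordinates when hovering over a point
ip <- ggplotly(p_cars, tooltip = c("label", "x", "y"))
ip
```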
