ETC1010: Data Modelling and Computing

Lecture 9A: Networks and Graphs

Dr. Nicholas Tierney & Professor Di Cook

EBS, Monash U.

2019-09-25

1 / 33

Announcements

  • Assignment 3 has been released
  • NO LECTURE ON FRIDAY
  • Project deadlines:
    • Deadline 3 (11th October) : Electronic copy of your data, and a page of data description, and cleaning done, or needing to be done.
    • Deadline 4 (18th October) : Final version of story board uploaded.
  • Practical exam: 18th October in class at 8am
2 / 33

recap: Last week on tidy text data

3 / 33

Network analysis

A description of phone calls

  • Johnny --> Liz
  • Liz --> Anna
  • Johnny --> Dan
  • Dan --> Liz
  • Dan --> Lucy
4 / 33

As a graph

5 / 33

And as an association matrix

[DEMO]
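As a rough stand-in for the demo, the five phone calls listed earlier can be turned into an association (adjacency) matrix in base R. This is only a sketch: the object names `people`, `calls` and `adj` are made up for illustration, and rows are treated as callers and columns as receivers.

```r
# The five phone calls from the earlier slide, as a two-column edge list
people <- c("Johnny", "Liz", "Anna", "Dan", "Lucy")
calls <- data.frame(
  from = c("Johnny", "Liz", "Johnny", "Dan", "Dan"),
  to   = c("Liz", "Anna", "Dan", "Liz", "Lucy"),
  stringsAsFactors = FALSE
)

# Start from an all-zero matrix with one row and one column per person
adj <- matrix(
  0L,
  nrow = length(people),
  ncol = length(people),
  dimnames = list(caller = people, receiver = people)
)

# Put a 1 in cell (caller, receiver) for every call
adj[cbind(calls$from, calls$to)] <- 1L

print(adj)
```

Because the calls are directed, the matrix is not symmetric: the Johnny row has a 1 in the Liz column, but the Liz row has a 0 in the Johnny column.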

6 / 33

Nodes and edges?

Network data can be thought of as two related tables, nodes and edges:

  • nodes are connection points
  • edges are the connections between points
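A minimal sketch of the two-table idea, using the phone calls from the earlier slide and base R only. The names `calls`, `nodes` and `edges` are illustrative. The edges table refers to nodes by row number, which mirrors the integer `from` and `to` columns that appear in the printed graph objects later in these slides.

```r
# Raw description of who called whom
calls <- data.frame(
  caller   = c("Johnny", "Liz", "Johnny", "Dan", "Dan"),
  receiver = c("Liz", "Anna", "Dan", "Liz", "Lucy"),
  stringsAsFactors = FALSE
)

# Nodes table: one row per distinct person
nodes <- data.frame(
  name = unique(c(calls$caller, calls$receiver)),
  stringsAsFactors = FALSE
)

# Edges table: each call stored as a pair of row numbers into `nodes`
edges <- data.frame(
  from = match(calls$caller, nodes$name),
  to   = match(calls$receiver, nodes$name)
)

print(nodes)
print(edges)
```

Looking the row numbers back up (`nodes$name[edges$from]`) recovers the original caller names, which is a handy check that the two tables still agree.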
7 / 33

Example: Mad Men. (Nodes = characters from the series)

## # A tibble: 45 x 2
## label Gender
## <fct> <fct>
## 1 Betty Draper female
## 2 Don Draper male
## 3 Harry Crane male
## 4 Joan Holloway female
## 5 Lane Pryce male
## 6 Peggy Olson female
## 7 Pete Campbell male
## 8 Roger Sterling male
## 9 Sal Romano male
## 10 Henry Francis male
## # … with 35 more rows
8 / 33

Example: Mad Men. (Edges = how they are associated)

## # A tibble: 39 x 2
## Name1 Name2
## <fct> <fct>
## 1 Betty Draper Henry Francis
## 2 Betty Draper Random guy
## 3 Don Draper Allison
## 4 Don Draper Bethany Van Nuys
## 5 Don Draper Betty Draper
## 6 Don Draper Bobbie Barrett
## 7 Don Draper Candace
## 8 Don Draper Doris
## 9 Don Draper Faye Miller
## 10 Don Draper Joy
## # … with 29 more rows
9 / 33

Why care about these relationships?

  • Telephone exchanges: Nodes are the phone numbers. Edges would indicate a call was made between two numbers.
  • Book or movie plots: Nodes are the characters. Edges would indicate whether they appear together in a scene or chapter, or whether they speak to each other; there are various ways we might measure the association.
  • Social media: Nodes would be the people who post on Facebook, including comments. Edges would measure who comments on whose posts.
10 / 33

Drawing these relationships out:

One way to describe these relationships is to provide an association matrix between many objects.

(Image created by Sam Tyner.)

11 / 33

Example: Mad Men

Source: wikicommons

12 / 33

Generate a network view

  • Create a layout (in 2D) that places the most closely related nodes near each other
  • Plot the nodes as points and connect the related nodes with lines
  • Overlay other aspects, e.g. gender
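To make the layout step concrete, here is a small base R sketch. It is not the algorithm behind the ggraph layouts used later in these slides. It swaps in a simpler technique: classical multidimensional scaling (`stats::cmdscale`) applied to shortest-path distances, so that nodes a few steps apart end up near each other. It reuses the phone-call example, treats calls as undirected, and all object names are illustrative.

```r
people <- c("Johnny", "Liz", "Anna", "Dan", "Lucy")
calls <- data.frame(
  from = c("Johnny", "Liz", "Johnny", "Dan", "Dan"),
  to   = c("Liz", "Anna", "Dan", "Liz", "Lucy"),
  stringsAsFactors = FALSE
)

# Distance matrix: 0 on the diagonal, 1 for a direct call, Inf otherwise
n <- length(people)
d <- matrix(Inf, n, n, dimnames = list(people, people))
diag(d) <- 0
d[cbind(calls$from, calls$to)] <- 1
d[cbind(calls$to, calls$from)] <- 1   # ignore call direction

# Floyd-Warshall: shortest number of steps between every pair of people
for (k in seq_len(n)) {
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- min(d[i, j], d[i, k] + d[k, j])
    }
  }
}

# Classical MDS: 2D coordinates that try to preserve those distances
layout <- cmdscale(d, k = 2)

# Plot the nodes as points, then connect the people who called each other
plot(layout, pch = 19, xlab = "", ylab = "", axes = FALSE)
segments(layout[calls$from, 1], layout[calls$from, 2],
         layout[calls$to, 1],   layout[calls$to, 2])
text(layout, labels = people, pos = 3)
```

Anna and Lucy are three steps apart in this graph, so they land further from each other than any pair joined by a direct call.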
13 / 33

introducing tidygraph and ggraph

library(tidygraph)
library(ggraph)
madmen_graph <- tbl_graph(
  nodes = madmen$vertices,
  edges = madmen$edges,
  directed = FALSE
)
## # A tbl_graph: 45 nodes and 39 edges
## #
## # An undirected simple graph with 6 components
## #
## # Node Data: 45 x 2 (active)
## label Gender
## <fct> <fct>
## 1 Betty Draper female
## 2 Don Draper male
## 3 Harry Crane male
## 4 Joan Holloway female
## 5 Lane Pryce male
## 6 Peggy Olson female
## # … with 39 more rows
## #
## # Edge Data: 39 x 2
## from to
## <int> <int>
## 1 1 2
## 2 1 3
## 3 4 5
## # … with 36 more rows
14 / 33

plotting using ggraph

gg_madmen <- ggraph(madmen_graph, layout = "kk") +
  geom_edge_link() +
  geom_node_label(aes(colour = Gender,
                      label = label))
15 / 33

plotting using ggraph

gg_madmen

16 / 33

Which character was most connected?

madmen_graph %>%
  activate(nodes) %>%
  mutate(count = centrality_degree()) %>%
  arrange(-count) %>%
  as_tibble()
## # A tibble: 45 x 3
## label Gender count
## <fct> <fct> <dbl>
## 1 Joan Holloway female 14
## 2 Woman at the Clios party female 6
## 3 Janine female 5
## 4 Duck Phillips male 4
## 5 Betty Draper female 3
## 6 Rachel Menken female 3
## 7 Hildy female 3
## 8 Joy female 2
## 9 Vicky female 2
## 10 Don Draper male 1
## # … with 35 more rows
17 / 33

activate() what now?

madmen_graph
## # A tbl_graph: 45 nodes and 39 edges
## #
## # An undirected simple graph with 6 components
## #
## # Node Data: 45 x 2 (active)
## label Gender
## <fct> <fct>
## 1 Betty Draper female
## 2 Don Draper male
## 3 Harry Crane male
## 4 Joan Holloway female
## 5 Lane Pryce male
## 6 Peggy Olson female
## # … with 39 more rows
## #
## # Edge Data: 39 x 2
## from to
## <int> <int>
## 1 1 2
## 2 1 3
## 3 4 5
## # … with 36 more rows
  • We need to tell dplyr whether we are working on nodes or edges.
  • activate() means we don't need separate mutate_nodes or mutate_edges commands.
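A small sketch of switching between the two tables, assuming the tidygraph and dplyr packages are installed. The graph is built from the phone-call example rather than the Mad Men data, and the added columns `name_length` and `weight` are made up purely to show where a `mutate()` lands.

```r
library(tidygraph)
library(dplyr)

# Nodes and edges for the phone-call example; edges refer to nodes by name
nodes <- tibble(name = c("Johnny", "Liz", "Anna", "Dan", "Lucy"))
edges <- tibble(
  from = c("Johnny", "Liz", "Johnny", "Dan", "Dan"),
  to   = c("Liz", "Anna", "Dan", "Liz", "Lucy")
)
calls_graph <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)

# With nodes active, mutate() adds a column to the node table
node_tbl <- calls_graph %>%
  activate(nodes) %>%
  mutate(name_length = nchar(name)) %>%
  as_tibble()

# With edges active, the same verb adds a column to the edge table
edge_tbl <- calls_graph %>%
  activate(edges) %>%
  mutate(weight = 1) %>%
  as_tibble()

node_tbl
edge_tbl
```

In both pipelines `as_tibble()` returns whichever table is currently active, so it is an easy way to check which table a verb was applied to.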
18 / 33

centrality what now?

madmen_graph %>%
  activate(nodes) %>%
  mutate(count = centrality_degree()) %>%
  arrange(-count) %>%
  as_tibble()
## # A tibble: 45 x 3
## label Gender count
## <fct> <fct> <dbl>
## 1 Joan Holloway female 14
## 2 Woman at the Clios party female 6
## 3 Janine female 5
## 4 Duck Phillips male 4
## 5 Betty Draper female 3
## 6 Rachel Menken female 3
## 7 Hildy female 3
## 8 Joy female 2
## 9 Vicky female 2
## 10 Don Draper male 1
## # … with 35 more rows
  • How central is a node or edge in a graph?
  • definition is inherently vague
  • there are many different centrality scores that exist
  • centrality_degree() says: "What is the number of adjacent edges?"
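The "number of adjacent edges" idea can be checked by hand on a tiny example. The sketch below counts, in base R, how many edges touch each person in the phone-call example, ignoring direction as one would for an undirected graph. It illustrates the definition only and is not how tidygraph computes it internally; the object names are illustrative.

```r
calls <- data.frame(
  from = c("Johnny", "Liz", "Johnny", "Dan", "Dan"),
  to   = c("Liz", "Anna", "Dan", "Liz", "Lucy"),
  stringsAsFactors = FALSE
)

# Every edge touches two people, so stack both columns and count names
degree <- table(c(calls$from, calls$to))

# Most connected people first
sort(degree, decreasing = TRUE)
```

Liz and Dan each appear in three calls, Johnny in two, and Anna and Lucy in one each. The counts sum to ten, twice the number of edges, because each edge is counted once at each end.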
19 / 33

What do we learn?

  • Don Draper had a lot of affairs, all with loyal partners except for his wife Betty, who had two affairs herself
  • Followed by Woman at Clios party
20 / 33

Example: American college football

Early American football outfits were like Australian AFL today!

Source: wikicommons

21 / 33

Example: American college football

Fall 2000 Season of Division I college football.

  • Nodes are the teams, edges are the matches.
  • Teams are broken into "conferences" which are the primary competition, but they can play outside this group.
22 / 33

Example: American college football

foot_graph
## # A tbl_graph: 115 nodes and 613 edges
## #
## # A directed acyclic simple graph with 1 component
## #
## # Node Data: 115 x 3 (active)
## uni conference schools
## <chr> <chr> <chr>
## 1 BrighamYoung Mountain West <NA>
## 2 FloridaState Atlantic Coast <NA>
## 3 Iowa Big Ten <NA>
## 4 KansasState Big Twelve <NA>
## 5 NewMexico Mountain West <NA>
## 6 TexasTech Big Twelve <NA>
## # … with 109 more rows
## #
## # Edge Data: 613 x 3
## from to same.conf
## <int> <int> <dbl>
## 1 1 2 0
## 2 3 4 0
## 3 1 5 1
## # … with 610 more rows
23 / 33
# Note: the seed is evaluated as arithmetic (2019 - 9 - 25 - 1117 = 868)
set.seed(2019-09-25-1117)
gg_foot_graph <-
  ggraph(foot_graph, layout = "fr") +
  geom_edge_link(alpha = 0.2) +
  geom_node_point(size = 7,
                  alpha = 0.9,
                  aes(colour = conference)) +
  scale_colour_brewer(palette = "Paired") +
  theme(legend.position = "bottom")
24 / 33

25 / 33

What do we learn?

  • Remember layout is done to place nodes that are more similar close together in the display.
  • The colours indicate the conference the team belongs to. For the most part, conferences are clustered: teams are more similar to others in their own conference than to teams in other conferences.
  • There are some clusters of conference groups, e.g. Mid-American, Big East, and Atlantic Coast
  • The Independents are independent
  • Some teams play far afield from their conference.
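One way to put a number on "conferences are clustered" is the share of matches played within a conference. The sketch below uses only the first five nodes and three edges printed in the foot_graph output above, typed in by hand, so the resulting share says nothing about the full season. It also rebuilds the same.conf flag from the node table to show how that column relates to the conference labels.

```r
# First five teams and their conferences, copied from the printed node data
conference <- c(
  BrighamYoung = "Mountain West",
  FloridaState = "Atlantic Coast",
  Iowa         = "Big Ten",
  KansasState  = "Big Twelve",
  NewMexico    = "Mountain West"
)

# First three matches, copied from the printed edge data
edges <- data.frame(
  from      = c(1L, 3L, 1L),
  to        = c(2L, 4L, 5L),
  same.conf = c(0, 0, 1)
)

# Rebuild the flag: 1 when both teams share a conference, else 0
derived <- as.numeric(conference[edges$from] == conference[edges$to])

# Share of these matches that stay inside a conference
share_within <- mean(edges$same.conf == 1)
share_within
```

On the full edge table the same `mean()` call would give the season-wide share of within-conference matches.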
26 / 33

Example: Harry Potter characters

Source: wikicommons

27 / 33

There is a connection between two students if one provides emotional support to the other at some point in the book.

  • Code to pull the data together is provided by Sam Tyner here.
28 / 33

Harry Potter data as nodes and edges

hp
## # A tbl_graph: 64 nodes and 434 edges
## #
## # A directed multigraph with 29 components
## #
## # Node Data: 64 x 4 (active)
## name schoolyear gender house
## <chr> <dbl> <chr> <chr>
## 1 Adrian Pucey 1989 M Slytherin
## 2 Alicia Spinnet 1989 F Gryffindor
## 3 Angelina Johnson 1989 F Gryffindor
## 4 Anthony Goldstein 1991 M Ravenclaw
## 5 Blaise Zabini 1991 M Slytherin
## 6 C. Warrington 1989 M Slytherin
## # … with 58 more rows
## #
## # Edge Data: 434 x 3
## from to book
## <int> <int> <dbl>
## 1 11 25 1
## 2 11 26 1
## 3 11 44 1
## # … with 431 more rows
29 / 33

Let's plot the characters

ggraph_hp <-
  ggraph(hp,
         layout = "fr") +
  geom_edge_link(alpha = 0.2) +
  geom_node_point(aes(colour = house,
                      shape = gender)) +
  # geom_node_text(aes(label = name)) +
  facet_edges(~book,
              ncol = 2) +
  scale_colour_manual(values = c("#941B08", "#F1F31C",
                                 "#071A80", "#154C07"))
30 / 33

Let's plot the characters

ggraph_hp

31 / 33

Your turn: rstudio.cloud

  • Read in last semester's class data. The columns s1_name and s2_name hold the first names of class members and tutors, with the latter being the "go-to" person for the former.
  • Write the code to produce a class network that looks something like below

32 / 33

Share and share alike

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

33 / 33
