Instructions

Exercise

For the week of your birthday in 2016, read in the pedestrian counts for all the sensors in Melbourne, using code like this:

library(tidyverse)
library(rwalkr)
library(lubridate)
myweek <- walk_melb(ymd("2016-10-31"), ymd("2016-11-06")) # Monday through Sunday
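
If you are unsure which Monday and Sunday bracket your birthday, the dates can be computed rather than looked up. The following is a minimal base R sketch, assuming a made-up birthday of 3 November 2016; the names `birthday`, `week_start` and `week_end` are illustrative and not part of rwalkr.

```r
# Hypothetical birthday; replace with your own date in 2016.
birthday <- as.Date("2016-11-03")

# as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday, independent of locale.
# Shift it so that Monday = 0, ..., Sunday = 6.
days_since_monday <- (as.POSIXlt(birthday)$wday + 6) %% 7

week_start <- birthday - days_since_monday  # Monday on or before the birthday
week_end   <- week_start + 6                # the Sunday that closes that week

print(c(week_start, week_end))
```

The two resulting dates could then be supplied to `walk_melb()` in place of the hard-coded ones above.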

You can also use the Shiny app in the package to explore the data, and to select and download a subset.

shine_melb()
  1. Who is the author of the rwalkr package?
  2. How many sensors are there in your data set?
  3. Create a weekday variable with the days in the order Mon, Tue, …, and count the number of pedestrians each day at “QV Market-Peel St”. Which is the busiest day?
  4. Make a plot of Count by Time separately for each day, for “QV Market-Peel St”. Write a couple of sentences describing the pattern.
  5. Plot a Google map of Melbourne with the pedestrian sensor locations overlaid. Colour the points by the total number of pedestrians during the week. Describe the spatial pattern: where are most people walking?

  6. There are some mismatches in sensor names between the locations data and the counts data. Name one, if you find one.
  7. Create a new variable indicating weekday vs weekend. Fit a linear model with log(count + 1) as the response and hour of the day, coded as a factor, as the explanatory variable, for “Flinders St-Swanston St (West)”. Include an interaction between hour and the weekday/weekend variable. Explain why an interaction is a good idea.
  8. Make a plot of the fitted values, but on the raw count scale, not logs. Overlay this on a plot of the original data. Facet the plot by the weekend variable. It should look like this:

  9. Predict the number of pedestrians walking by at 5-6pm on a weekday.
  10. Explain why using time as a factor was a better approach than using it as a numerical variable for studying the pedestrian patterns during the day.
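
The modelling steps in the later questions can be tried out on toy data first. The sketch below uses base R and made-up hourly counts (not real sensor data) to show an ordered weekday factor, a weekday/weekend indicator, a log(count + 1) model with hour as a factor interacting with day type, and back-transformation of predictions to the raw count scale. All column names (`wday`, `day_type`, `hour_f`, `fitted_count`) are illustrative assumptions, and hour 17 is assumed to mean 5-6pm.

```r
# Toy hourly counts for one sensor over one week (made-up numbers, not real data).
set.seed(1)
toy <- data.frame(
  date = rep(seq(as.Date("2016-10-31"), by = "day", length.out = 7), each = 24),
  hour = rep(0:23, times = 7)
)

# Weekday labels with Monday first, built without relying on the locale.
day_labels <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
toy$wday <- factor(day_labels[(as.POSIXlt(toy$date)$wday + 6) %% 7 + 1],
                   levels = day_labels)
toy$day_type <- factor(ifelse(toy$wday %in% c("Sat", "Sun"), "weekend", "weekday"),
                       levels = c("weekday", "weekend"))

# Made-up pattern: two commuter peaks on weekdays, one midday bump on weekends.
mean_count <- ifelse(
  toy$day_type == "weekday",
  100 + 900 * exp(-(toy$hour - 8)^2 / 4) + 1200 * exp(-(toy$hour - 17)^2 / 4),
  100 + 600 * exp(-(toy$hour - 13)^2 / 18)
)
toy$count <- rpois(nrow(toy), mean_count)

# Hour as a factor gets its own coefficient per hour; the interaction lets
# the hourly profile differ between weekdays and weekends.
toy$hour_f <- factor(toy$hour)
fit_factor <- lm(log(count + 1) ~ hour_f * day_type, data = toy)

# Hour as a number forces log count to change linearly across the day.
fit_numeric <- lm(log(count + 1) ~ hour * day_type, data = toy)

# Fitted values back on the raw count scale (undo the log and the +1).
toy$fitted_count <- exp(fitted(fit_factor)) - 1

# Prediction for hour 17 on a weekday, back-transformed from the log scale.
new_obs <- data.frame(
  hour_f   = factor(17, levels = levels(toy$hour_f)),
  day_type = factor("weekday", levels = levels(toy$day_type))
)
pred_17 <- exp(predict(fit_factor, newdata = new_obs)) - 1

print(pred_17)
print(c(factor  = summary(fit_factor)$r.squared,
        numeric = summary(fit_numeric)$r.squared))
```

Comparing the two R-squared values on the toy data gives a feel for how much of a peaked daily profile a straight line in hour can capture. The real data will differ, so treat this only as a template for the mechanics.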

Grading

Points for the assignment will be based on: