Exercise

For the week of your birthday in 2016, read in the pedestrian counts for all the sensors in Melbourne, using code like this:

myweek <- walk_melb(ymd("2016-10-31"), ymd("2016-11-06")) # Monday through Sunday
  1. (1pt) Who is the author of the rwalkr package? EARO WANG
  2. (1pt) How many sensors are there in your data set? 43
  3. (2pts) Create a week day variable, which specifies that the day in this order Mon, Tue, … and count the number of pedestrians each day at “QV Market-Peel St”. What is the busiest day?
myweek <- myweek %>% mutate(day = wday(Date, label=TRUE, abbr=TRUE, week_start = 1))
qv <- myweek %>% 
  filter(Sensor == "QV Market-Peel St") %>% 
  group_by(day) %>% 
  summarise(n=sum(Count, na.rm=T))
qv
# A tibble: 7 x 2
  day       n
  <ord> <int>
1 Mon    2085
2 Tue    1630
3 Wed    2336
4 Thu    3258
5 Fri    3156
6 Sat    3347
7 Sun    3105
qv %>% filter(n==max(n))
# A tibble: 1 x 2
  day       n
  <ord> <int>
1 Sat    3347
  1. (2pts) Make a plot of Count by Time separately for each day, for “QV Market-Peel St”. Write a couple of sentences describing the pattern.

QV Market-Peel St has a big peak in the middle of the weekend days. The week days are mixed. Monday and Wednesday have triple peaks, which look like commuter and lunch time traffic. These patterns are not there on Tuesday, Thursday or Friday. Tuesday has the smallest number of people walking by. Thursday and Friday appear to have the middle of the day peak, but also possibly the commuter traffic as well, which makes for a difficult pattern detection. The opening hours for QV market are Tues, Thu-Sun, and it looks like the Thu-Sun are busy market days, as well as commuting, but Tues doesn’t have a large number of people attending.

  1. (3pts) Plot a google map of Melbourne, with the pedestrian sensor locations overlaid. Colour the points by the total number of pedestrians during the week. Describe the spatial pattern, where most people are walking.
melb <- get_map(location=c(144.9631, -37.8136), zoom=14)
loc <- pull_sensor()
loc <- loc %>% 
  filter(Sensor %in% myweek$Sensor)
# sens_count <- myweek %>% group_by(Sensor) %>% summarise(n=sum(Count, na.rm=TRUE)) %>%
#   left_join(loc, by="Sensor")
sens_count <- myweek %>% group_by(Sensor) %>% summarise(n=sum(Count, na.rm=TRUE)) %>%
  left_join(loc, by="Sensor")
ggmap(melb) + geom_point(data=sens_count, aes(x=Longitude, y=Latitude, colour=n), size=4, alpha=0.7) +
  scale_colour_distiller(palette="YlOrRd")

The corridor along Flinders to Swanston to Bourke is the busiest pedestrian traffic.

  1. (1pt) There are some mismatches in sensor names between the locations data and the counts data. Name one, if you find one.
# A tibble: 2 x 1
  Sensor                        
  <chr>                         
1 Chinatown-Lt Bourke St (South)
2 Lonsdale St (South)           

For me there are two missing sensors, but this might not be the case for everyone.

  1. (3pts) Create a new variable, indicating week day vs weekend. Fit a linear model with log count (+1) as the response variable, to hour of the day, coded as a factor, for “Flinders St-Swanston St (West)”. Use an interaction. Explain why an interaction is a good idea.
myweek <- myweek %>% mutate(weekend = ifelse(day %in% c("Sat", "Sun"), TRUE, FALSE)) %>%
  mutate(lCount = log10(Count+1))
flinders <- myweek %>% filter(Sensor == "Flinders St-Swanston St (West)")
mod <- lm(lCount~factor(Time)*weekend, data=flinders)
#tidy(mod)
glance(mod)
  r.squared adj.r.squared     sigma statistic     p.value df   logLik
1 0.9196537     0.8881847 0.1639844  29.22415 3.02108e-47 48 93.62326
        AIC      BIC deviance df.residual
1 -89.24651 63.82772 3.226908         120
fit <- augment(mod, flinders)
fit <- fit %>% mutate(.raw_fitted = 10^(.fitted)-1)

Using an interaction term allows weekends to have a different pattern to week days.

  1. (2pts) Make a plot of the fitted values, but on the raw count scale, not logs. Overlay this on a plot of the original data. Facet the plot by the weekend variable. It should look like this:
ggplot() + 
  geom_line(data=fit, aes(x=Time, y=Count, group=Date), colour="hotpink", alpha=0.9) +
  geom_line(data=fit, aes(x=Time, y=.raw_fitted), size=1.5) + facet_wrap(~weekend) 

  1. (2pts) Predict the number of pedestrians walking by 5-6pm, on a week day.
fit %>% filter((!weekend), Time == 17) %>% select(.raw_fitted) %>% distinct()
  .raw_fitted
1    4970.749
  1. (2pts) Explain why using time as a factor was a better approach than using it as a numerical variable, for studying the pedestrian patterns during the day.

With time treated as a factor there is flexibility in the type of pattern in counts that is captured by the model.

Grading

One point is reserved for easy to compile, spell-checked, nicely turned in work.