Instructions
- This is an individual assignment. You can talk to other class members, but you need to submit your own solution.
- You need to write a report that
- answers the questions. Using paragraphs and full sentences, to respond to each question.
- provides a choice of good plots to support your answers
- has a good summary of what you have learned about pedestrian patterns in Melbourne
- Turn in
html
output file, the Rmd
file.
- Total points for the assignment is 20.
Exercise
For the week of your birthday in 2016, read in the pedestrian counts for all the sensors in Melbourne, using code like this:
library(tidyverse)
library(rwalkr)
library(lubridate)
myweek <- walk_melb(ymd("2016-10-31"), ymd("2016-11-06")) # Monday through Sunday
You can also use the shiny app in the package to explore the data, select and download a subset.
shine_melb()
- Who is the author of the
rwalkr
package?
- How many sensors are there in your data set?
- Create a week day variable, which specifies that the day in this order Mon, Tue, … and count the number of pedestrians each day at “QV Market-Peel St”. What is the busiest day?
- Make a plot of Count by Time separately for each day, for “QV Market-Peel St”. Write a couple of sentences describing the pattern.
- Plot a google map of Melbourne, with the pedestrian sensor locations overlaid. Colour the points by the total number of pedestrians during the week. Describe the spatial pattern, where most people are walking.
6. There are some mismatches in sensor names between the locations data and the counts data. Name one, if you find one. 7. Create a new variable, indicating week day vs weekend. Fit a linear model with log count (+1) as the response variable, to hour of the day, coded as a factor, for “Flinders St-Swanston St (West)”. Use an interaction. Explain why an interaction is a good idea. 8. Make a plot of the fitted values, but on the raw count scale, not logs. Overlay this on a plot of the original data. Facet the plot by the weekend variable. It should look like this:
- Predict the number of pedestrians walking by 5-6pm, on a week day.
- Explain why using time as a factor was a better approach than using it as a numerical variable, for studying the pedestrian patterns during the day.
Grading
Points for the assignment will be based on:
- Whether the Rmd file, can read the data as coded, and produce the results in the report
- Analysis and plotting code clearly written, concise, and efficient
- Explanations clear and accurate
- Plots well-constructed