Instructions to Students

This assignment is designed to simulate a scenario where you are taking over someone’s existing work, and continuing with it to draw some further insights.

This is a real-world dataset taken from the Crime Statistics Agency Victoria (https://www.crimestatistics.vic.gov.au/download-data), specifically the data called “Data tables - Spotlight: Burglary/Break and Enter Offences Recorded in Victoria visualisation - year ending December 2018 (XLSX, 4.4 MB)”. The raw data is used for this assignment, with no changes made.

You are writing a quick summary of the data, following along from guidance from Amelia, and some of the questions your manager has. This is not a formal report, but rather something you are giving to your manager that describes the data, and some interesting insights. We have written example text for the first section on Monash, and would like you to explore another area. Our example writings are a good example of how to get full marks.

Your “colleague”, Amelia (in the text treatment below) has written some helpful hints throughout the assignment to help guide you.

Questions that are worth marks are indicated with ** at the start and end of the question, as well as the number of marks in parentheses.

Marking + Grades

This assignment will be worth 4% of your total grade, and will be marked out of 16 marks total.

  • 3 Marks for grammar and clarity. You must write in complete sentences and do a spell check.
  • 3 Marks for overall presentation of the data visualisations
  • 10 marks for the questions

  • Your marks will then be weighted according to peer evaluation.

  • Sections that contain marks are indicated with **, and will have the number of marks indicated in parentheses. For example:

# `**` What are the types of item divisions? How many are there? (0.5 Mark) `**`

A Note on skills

As of week 1, you have seen some of the code used here, but I do not expect you to know immediately what the code below does. This is a challenge for you! We will be covering skills on data summary and data visualisation in the next two weeks, but this assignment is designed to simulate a real life work situation - this means that there are some things where you need to “learn on the job”. But the vast majority of the assignment will cover things that you will have seen in class, or the readings.

Remember, you can look up the help file for functions by typing ?function_name. For example, ?mean. Feel free to google questions you have about how to do other kinds of plots, and post on the ED if you have any questions about the assignment.

How to complete this assignment.

To complete the assignment you will need to fill in the blanks for function names, arguments, or other names. These sections are marked with *** or ___. At a minimum, your assignment should be able to be “knitted” using the knit button for your Rmarkdown document.

If you want to look at what the assignment looks like in progress, but you do not have valid R code in all the R code chunks, remember that you can set the chunk options to eval = FALSE. If you do this, please remember to ensure that you remove this chunk option or set it to eval = TRUE when you submit the assignment, to ensure all your R code runs.

You will be completing this assignment in your assigned groups. A reminder regarding our recommendations for completing group assignments:

  • Each member of the group completes the entire assignment, as best they can.
  • Group members compare answers and combine them into one document for the final submission.

Your assignments will be peer reviewed, and results checked for reproducibility. This means:

  • 25% of the assignment grade will come from peer evaluation.
  • Peer evaluation is an important learning tool.

Each student will be randomly assigned another team’s submission to provide feedback on three things:

  1. Could you reproduce the analysis?
  2. Did you learn something new from the other team’s approach?
  3. What would you suggest to improve their work?

Due Date

This assignment is due by close of business (5pm) on Friday 16th August. You will submit the assignment via ED. Please change the file name to include your team’s name. For example, if you are team dplyr, your assignment file name could read: “assignment-1-2019-s2-team-dplyr.Rmd”

Treatment

You work as a data scientist in the well-named company, “The Security Company”, which sells security products: alarms, surveillance cameras, locks, screen doors, big doors, and so on.

It’s your second day at the company, and you’re taken to your desk. Your boss says to you:

Amelia has managed to find this treasure trove of data - get this: crime statistics on breaking and entering around Victoria for the past years! Unfortunately, Amelia just left on holiday to New Zealand. They discovered this dataset the afternoon before they left on holiday, and got started on doing some data analysis.

We’ve got a meeting coming up soon where we need to discuss some new directions for the company, and we want you to tell us about this dataset and what we can do with it. We want to focus on Monash, since we have a few big customers in that area, and then we want you to help us compare that with whatever area has the most burglaries.

You’re in with the new hires of data scientists here. We’d like you to take a look at the data and tell me what the spreadsheet tells us. I’ve written some questions on the report for you to answer, and there are also some questions from Amelia I would like you to look at as well.

Most importantly, can you get this to me by COB Friday 16th August (COB = Close of Business at 5pm)?

I’ve given this dataset to some of the other new hire data scientists as well; you’ll all be working as a team on this dataset. I’d like you to all try and work on the questions separately, and then combine your answers together to provide the best results.

From here, you are handed a USB stick. You load this into your computer, and you see a folder called “vic-crime”. In it is a folder called “data-raw”, and an Rmarkdown file. It contains the start of a data analysis. Your job is to explore the data and answer the questions in the document.

Note that the text that is written was originally written by Amelia, and you need to make sure that their name is kept up top, and to pay attention to what they have to say in the document!

Data read in.

Amelia: First, let’s read in the data using the function read_excel() from the readxl package, and clean up the names, using the rename function from dplyr.

library(readxl)
crime_raw <- read_excel("data-raw/Data_tables_spotlight_burglary_break_and_enter_visualisation_year_ending_December_2018_v3.xlsx",
                    sheet = 6)

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
crime <- crime_raw %>%
  rename(year = `Year ending December`,
         local_gov_area = `Local Government Area`,
         offence_subgroup = `Offence Subgroup`,
         item_division = `Property Item Division`,
         item_subdivision = `Property Item Subdivision`,
         n_property_items = `Number of Property Items`)
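
As a minimal sketch of what rename() is doing above, the toy tibble below (made-up values, not the real spreadsheet) has two columns with spaces in their names. Column names that contain spaces have to be wrapped in backticks, and rename() takes new_name = old_name pairs. The object names toy_raw and toy are only for illustration.

```r
library(dplyr)

# Toy stand-in for the raw sheet; the values are invented for illustration.
toy_raw <- tibble(
  `Year ending December` = c(2009, 2010),
  `Local Government Area` = c("Monash", "Casey")
)

# rename() takes new_name = old_name; backticks quote names with spaces.
toy <- toy_raw %>%
  rename(year = `Year ending December`,
         local_gov_area = `Local Government Area`)

names(toy)
```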

Amelia: Let’s print the data and look at the first few rows.

crime
## # A tibble: 43,830 x 6
##     year local_gov_area offence_subgroup   item_division  item_subdivision
##    <dbl> <chr>          <chr>              <chr>          <chr>           
##  1 2009. Alpine         B321 Residential … Cash/Document  Cash/Document   
##  2 2009. Alpine         B321 Residential … Cigarettes/Li… Cigarettes/Liqu…
##  3 2009. Alpine         B321 Residential … Electrical Ap… Other Electrica…
##  4 2009. Alpine         B321 Residential … Electrical Ap… Video Game Unit 
##  5 2009. Alpine         B321 Residential … Firearms/Ammu… Firearms/Ammuni…
##  6 2009. Alpine         B321 Residential … Food           Food            
##  7 2009. Alpine         B321 Residential … Garden Items   Garden Items    
##  8 2009. Alpine         B321 Residential … Household Ite… Household Items 
##  9 2009. Alpine         B321 Residential … Jewellery      Jewellery       
## 10 2009. Alpine         B321 Residential … Marine Proper… Marine Property 
## # ... with 43,820 more rows, and 1 more variable: n_property_items <dbl>

Amelia: And what are the names of the columns in the dataset?

names(crime)
## [1] "year"             "local_gov_area"   "offence_subgroup"
## [4] "item_division"    "item_subdivision" "n_property_items"

Amelia: How many years of data are there?

summary(crime$year)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2009    2011    2014    2014    2016    2018

Amelia: We have data that goes from 2009 until 2018. Counting both 2009 and 2018, that’s ten years of data!
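
When counting how many years a range covers, both endpoints are included. A small base R sketch, using a stand-in vector of years rather than the real column, shows the difference between counting distinct values and subtracting the endpoints:

```r
# Stand-in for crime$year: one value per year from 2009 to 2018.
years <- 2009:2018

# Number of distinct years in the data (both endpoints included).
n_years <- length(unique(years))

# Difference between the last and first year (one less than the count).
year_span <- max(years) - min(years)

n_years    # 10
year_span  # 9
```

On the real data, n_distinct(crime$year) should give the same kind of count.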

How many Local Government Areas (LGAs) are there? And what are the LGAs called?

There are 80 LGAs and they are listed in the table below.

n_distinct(crime$local_gov_area)
## [1] 80
unique(crime$local_gov_area)
##  [1] "Alpine"               "Ararat"               "Ballarat"            
##  [4] "Banyule"              "Bass Coast"           "Baw Baw"             
##  [7] "Bayside"              "Benalla"              "Boroondara"          
## [10] "Brimbank"             "Buloke"               "Campaspe"            
## [13] "Cardinia"             "Casey"                "Central Goldfields"  
## [16] "Colac-Otway"          "Corangamite"          "Darebin"             
## [19] "East Gippsland"       "Frankston"            "Gannawarra"          
## [22] "Glen Eira"            "Glenelg"              "Golden Plains"       
## [25] "Greater Bendigo"      "Greater Dandenong"    "Greater Geelong"     
## [28] "Greater Shepparton"   "Hepburn"              "Hindmarsh"           
## [31] "Hobsons Bay"          "Horsham"              "Hume"                
## [34] "Indigo"               "Kingston"             "Knox"                
## [37] "Latrobe"              "Loddon"               "Macedon Ranges"      
## [40] "Manningham"           "Mansfield"            "Maribyrnong"         
## [43] "Maroondah"            "Melbourne"            "Melton"              
## [46] "Mildura"              "Mitchell"             "Moira"               
## [49] "Monash"               "Moonee Valley"        "Moorabool"           
## [52] "Moreland"             "Mornington Peninsula" "Mount Alexander"     
## [55] "Moyne"                "Murrindindi"          "Nillumbik"           
## [58] "Northern Grampians"   "Port Phillip"         "Pyrenees"            
## [61] "Queenscliffe"         "South Gippsland"      "Southern Grampians"  
## [64] "Stonnington"          "Strathbogie"          "Surf Coast"          
## [67] "Swan Hill"            "Towong"               "Wangaratta"          
## [70] "Warrnambool"          "Wellington"           "West Wimmera"        
## [73] "Whitehorse"           "Whittlesea"           "Wodonga"             
## [76] "Wyndham"              "Yarra"                "Yarra Ranges"        
## [79] "Yarriambiack"         "Victoria"

Amelia: That’s a lot of areas - about 80!

What are the types of offence subgroups? How many are there?

There are 6 types of offence subgroups and they are listed in the table below.

unique(crime$offence_subgroup)
## [1] "B321 Residential non-aggravated burglary"    
## [2] "B322 Non-residential non-aggravated burglary"
## [3] "B311 Residential aggravated burglary"        
## [4] "B319 Unknown aggravated burglary"            
## [5] "B329 Unknown non-aggravated burglary"        
## [6] "B312 Non-residential aggravated burglary"
n_distinct(crime$offence_subgroup)
## [1] 6

Amelia: Remember that you can learn more about what these functions do by typing ?unique or ?n_distinct into the console.

** What are the types of item divisions? How many are there? (0.5 Mark) **

The types of item divisions are listed in the table below, and there are 25 of them.

unique(crime$item_division)
##  [1] "Cash/Document"         "Cigarettes/Liquor"    
##  [3] "Electrical Appliances" "Firearms/Ammunition"  
##  [5] "Food"                  "Garden Items"         
##  [7] "Household Items"       "Jewellery"            
##  [9] "Marine Property"       "Other"                
## [11] "Personal Property"     "Sporting Goods"       
## [13] "Tools"                 "Tv/Vcr"               
## [15] "Clothing"              "Photographic Equip"   
## [17] "Power Tools"           "Car Accessories"      
## [19] "Weapons"               "Domestic Pets"        
## [21] "Furniture"             "Timber/Build Mat"     
## [23] "Police Property"       "Livestock"            
## [25] "Explosives"
n_distinct(crime$item_division)
## [1] 25

** What are the types of item subdivisions? (0.5 Mark) **

The types of item subdivisions are listed in the table below.

unique(crime$item_subdivision)
##  [1] "Cash/Document"               "Cigarettes/Liquor"          
##  [3] "Other Electrical Appliances" "Video Game Unit"            
##  [5] "Firearms/Ammunition"         "Food"                       
##  [7] "Garden Items"                "Household Items"            
##  [9] "Jewellery"                   "Marine Property"            
## [11] "Other Property Items"        "Personal Property"          
## [13] "Sporting Goods"              "Tools"                      
## [15] "Tv/Vcr"                      "Clothing"                   
## [17] "Photographic Equip"          "Power Tools"                
## [19] "Car Accessories"             "Computer"                   
## [21] "Mobile Phone"                "Weapons"                    
## [23] "Key"                         "Domestic Pets"              
## [25] "Speaker"                     "Furniture"                  
## [27] "Timber/Build Mat"            "Police Property"            
## [29] "Livestock"                   "Explosives"                 
## [31] "Laptop"                      "Tablet Computer"
n_distinct(crime$item_subdivision)
## [1] 32

** What is the summary of the number of property items? (0.5 Mark) **

summary(crime$n_property_items)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     1.00     2.00     7.00    74.15    29.00 45027.00

** Can you tell me what each row represents, and what each of the columns measure? (1 Mark) **

Amelia: We need to describe what each row of the data represents, and take our best guess at what we think each column measures. It might be worthwhile looking through the excel sheet in the data folder, or on the website where the data was extracted.

Answer

Each row represents a different crime that has been recorded by Victoria Police, specifically a burglary. Each column represents a different fact about these burglaries. These recorded variables include:

  • ‘year’ - This tells us what year the burglary took place.
  • ‘local_gov_area’ - This tells us in what Local Government Area the crime took place. There are 80 values in this column, including ‘Victoria’ itself. The other 79 are Local Government Areas situated in Victoria.
  • ‘offence_subgroup’ - Any burglary offence is coded to the CSA offence classification category “B30 Burglary/Break and Enter”. This column tells us which of the 6 subgroups of this category the crime fell into.
  • ‘item_division’ - This tells us the category of property item that was stolen.
  • ‘item_subdivision’ - This tells us the specific item within the item category that was stolen.
  • ‘n_property_items’ - This tells us how many items were reported stolen.
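
One way to check a guess about what a row represents is to count how often each combination of the identifying columns appears. The sketch below uses a made-up tibble with the same column names as crime. If every combination appears only once, each row is a distinct year / area / offence / item combination, and n_property_items is the count attached to it. Whether the real data behaves this way is something to confirm by running the same check on crime.

```r
library(dplyr)

# Invented rows that mimic the shape of the crime data.
toy <- tibble(
  year = c(2009, 2009, 2010),
  local_gov_area = c("Monash", "Monash", "Monash"),
  offence_subgroup = c("B321", "B321", "B321"),
  item_division = c("Jewellery", "Food", "Jewellery"),
  item_subdivision = c("Jewellery", "Food", "Jewellery"),
  n_property_items = c(12, 3, 9)
)

# Count each combination of identifying columns and keep any repeats.
repeats <- toy %>%
  count(year, local_gov_area, offence_subgroup,
        item_division, item_subdivision) %>%
  filter(n > 1)

nrow(repeats)  # 0 here, so every combination is unique in the toy data
```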

Is there a yearly trend in the total number of items stolen per year?

Amelia: Let’s group by year and then sum up the number of property items. Then we can take this information and use ggplot to plot the year on the x axis, and n_items on the y axis, and number of items as a column with geom_col().

crime_year_n_items <- crime %>%
  group_by(year) %>%
  summarise(n_items = sum(n_property_items))

library(ggplot2)
ggplot(crime_year_n_items,
       aes(x = year,
           y = n_items)) +
  geom_col(aes(fill = -n_items)) +
  labs(x="Year",y="Number of Property Items Stolen") +
  labs(title="Yearly Trend in the Total Number of Items Stolen per Year")

Summary

This graph represents the total number of items stolen per year in Victoria. On the x-axis we have a timeline from 2009 until 2018. The y-axis represents the total number of items stolen in a given year. The darker the blue, the more items were stolen in that year; the lighter the blue, the fewer. From this graph we can see that the total number of items stolen peaked in 2012, with roughly 350 000 items stolen. The number of items stolen was relatively steady between 2009 and 2011, and again from 2013 until 2016, at roughly 320 000 per year. From 2016 onwards the number of items stolen decreased, suggesting that burglary in Victoria has declined in recent years. From the 2012 peak to the 2018 trough, the number of items stolen decreased by roughly 100 000.

Amelia: I try to write three complete sentences about what I learn from a graphic. You should start with a quick summary of what the graphic shows you. Then, describe what is on the x axis, the y axis, and any other colours used to separate the data. You then need to describe what you learn. So, I would say:

“A summary of the number of items stolen from burglaries for each year from 2009 until 2018. On the x axis is each year, and the y axis is the number of items stolen. We learn that the number of items stolen stays around 300,000 (3e+05 means the number 3 with 5 zeros after it), but from 201, the number of items stolen has decreased each year.”
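
The 3e+05 style of label is R’s scientific notation. As a small base R sketch (the value is the one mentioned in the example text above), format() can print the same number with thousands separators, which may be easier to read in a summary for a manager:

```r
# 3e+05 is scientific notation for 3 x 10^5.
x <- 3e+05

# The same number written out in full.
x == 300000

# Print with comma separators instead of scientific notation.
label <- format(x, big.mark = ",", scientific = FALSE)
label  # "300,000"
```

For axis labels in a plot, one option (assuming the scales package is installed) is to add scale_y_continuous(labels = scales::comma) to the ggplot call.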

Look at burglary around Monash and tell me about it?

Amelia: Let’s filter the data down to the ‘Monash’ LGA.

crime_monash <- crime %>% filter(local_gov_area == "Monash")

Is crime in Monash increasing?

Amelia: Let’s count the number of crimes per year.

crime_count_monash <- crime_monash %>% count(year) 

ggplot(crime_count_monash,
       aes(x = year,
           y = n)) + 
  geom_col(aes(fill = -n)) +
  labs(x="Year",y="Number of Crimes") +
  labs(title="Crime in Monash between 2009 - 2018") 
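
For readers new to count(), the sketch below shows, on invented rows rather than the Monash data, that count(year) gives the same numbers as grouping by year and counting rows with n(). Note that n is the number of rows in the data for that year, so it only equals a number of crimes if each row is one crime.

```r
library(dplyr)

# Invented rows standing in for crime_monash.
toy <- tibble(year = c(2009, 2009, 2010, 2010, 2010))

# count() is shorthand for grouping and then counting rows.
by_count <- toy %>% count(year)

by_summarise <- toy %>%
  group_by(year) %>%
  summarise(n = n())

by_count$n      # 2 3
by_summarise$n  # 2 3
```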

Amelia: This plot shows the number of burglary crimes per year in Monash. The x axis shows the year, and the y axis shows the number of crimes recorded for that year. There appears to be a slight upwards trend, but it looks variable for each year.

What are the most common offences in Monash across all years?

Amelia: We count the number of observations in each offence_subgroup to tell us which are the most common.

crime_monash %>% count(offence_subgroup)
## # A tibble: 6 x 2
##   offence_subgroup                                 n
##   <chr>                                        <int>
## 1 B311 Residential aggravated burglary           117
## 2 B312 Non-residential aggravated burglary        10
## 3 B319 Unknown aggravated burglary                 6
## 4 B321 Residential non-aggravated burglary       273
## 5 B322 Non-residential non-aggravated burglary   248
## 6 B329 Unknown non-aggravated burglary            35

Amelia: The top subgroups are “B321 Residential non-aggravated burglary”, at 273, followed by “B322 Non-residential non-aggravated burglary” at 248.
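
A possible refinement, sketched on invented offence labels rather than the Monash data, is to let count() sort the result and to add each subgroup’s share of the rows. The column name prop and the object names are only for illustration.

```r
library(dplyr)

# Invented offence labels standing in for crime_monash$offence_subgroup.
toy <- tibble(
  offence_subgroup = c("B321", "B321", "B321", "B322", "B322", "B311")
)

# sort = TRUE orders the result from most to least common;
# prop is each subgroup's share of all rows.
offence_share <- toy %>%
  count(offence_subgroup, sort = TRUE) %>%
  mutate(prop = n / sum(n))

offence_share
```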

Are any of these offences increasing over time?

Amelia: We take the Monash crime data, then group by year, and count the number of offences in each subgroup for each year. We then plot this data. On the x axis we have year. On the y axis we have n, the number of crimes that take place in a subgroup in a year. We colour the lines according to the offence subgroup, and make sure that the y axis limits go from 0 to 35.

crime_year_offence_monash <- crime_monash %>%
  group_by(year) %>%
  count(offence_subgroup)

ggplot(crime_year_offence_monash,
       aes(x = year,
           y = n,
           colour = offence_subgroup)) + 
  geom_line() + 
  lims(y = c(0, 35)) # Makes sure the y axis goes to zero
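
With lims(), any data outside the limits is dropped from the plot, so a hard-coded upper limit needs to be at least as large as the biggest count. A possible alternative is expand_limits(), which asks ggplot2 to include a value (here zero) in the axis range while leaving the upper end to be chosen from the data. This is a sketch on invented counts, not the Monash figures:

```r
library(ggplot2)

# Invented yearly counts for two offence subgroups.
toy <- data.frame(
  year = rep(2009:2012, times = 2),
  n = c(20, 24, 22, 27, 5, 4, 6, 3),
  offence_subgroup = rep(c("B321", "B312"), each = 4)
)

# expand_limits(y = 0) keeps zero on the y axis without fixing the maximum.
p <- ggplot(toy,
            aes(x = year,
                y = n,
                colour = offence_subgroup)) +
  geom_line() +
  expand_limits(y = 0)

p
```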

Amelia: This shows us that the most common offence is “residential non-aggravated burglary”.

Answer

No offence is increasing at a dramatic rate over time. Only ‘B311 - Residential aggravated burglary’ is on a slight upward trend; apart from this, all are fairly stationary. If anything, we could say that ‘B329 - Unknown non-aggravated burglary’ has decreased.

What are the most common items stolen in Monash?

Amelia: We count up the item subdivisions, which is the smallest category of items. We then plot the number of times an item is stolen, and reorder the y axis so that the items are in order from most to least.

crime_items_monash <- crime_monash %>% 
  count(item_subdivision)

# save an object of the maximum number of items stolen
# to help construct the plot below.
max_items_stolen <- max(crime_items_monash$n)

ggplot(crime_items_monash,
       aes(x = n,
           y = reorder(item_subdivision, n))) + 
  geom_point() + 
  lims(x = c(0, max_items_stolen)) + # make sure x axis goes from 0 
  labs(title = "Most Common Items Stolen in Monash", x="Number of Items Stolen",y="Types of Items Stolen")
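
To see what reorder() does to the y axis, here is a small base R sketch on invented item names and counts (not the Monash results). reorder() returns a factor whose levels are sorted by the second argument, and ggplot2 draws the levels of a discrete y axis from bottom to top.

```r
# Invented item names and counts.
items <- c("Laptop", "Jewellery", "Key")
n <- c(5, 12, 2)

# reorder() returns a factor whose levels are sorted by n (ascending),
# so the largest count is drawn last, at the top of the y axis.
ordered_items <- reorder(items, n)

levels(ordered_items)  # "Key" "Laptop" "Jewellery"
```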

Amelia:

Using all the crime data, what are the top 5 local government areas for total burglaries?

Amelia: This could be where we focus our next marketing campaign! Let’s take the crime data, then count the number of rows in each local_gov_area, and take the top 5 results using top_n, and arrange in descending order by the column “n”

crime %>%
  count(local_gov_area) %>%
  top_n(n = 5) %>%
  arrange(desc(n))
## Selecting by n
## # A tibble: 5 x 2
##   local_gov_area      n
##   <chr>           <int>
## 1 Victoria         1335
## 2 Casey             831
## 3 Wyndham           830
## 4 Greater Geelong   828
## 5 Brimbank          817
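
Because the “Victoria” row appears alongside the individual areas in this table, one option is to filter it out before ranking. The sketch below uses a small stand-in table (the first three counts are copied from the output above, and the Monash value is the sum of the Monash subgroup counts shown earlier) and slice_max(), which is available in dplyr 1.0.0 and later. The object names are only for illustration.

```r
library(dplyr)

# Small stand-in table of row counts per area.
toy_counts <- tibble(
  local_gov_area = c("Victoria", "Casey", "Wyndham", "Monash"),
  n = c(1335, 831, 830, 689)
)

# Drop the "Victoria" row, then keep the two largest remaining areas.
top_areas <- toy_counts %>%
  filter(local_gov_area != "Victoria") %>%
  slice_max(n, n = 2)

top_areas$local_gov_area  # "Casey" "Wyndham"
```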

** Which LGA had the most crime? (0.5 Mark) **

Earlier we saw that there were 80 Local Government Areas (LGAs); however, this number included Victoria as an LGA. According to the Crime Statistics Agency website there are only 79 LGAs, so Victoria should not count as one. Therefore the LGA with the most crime is the City of Casey, with 831 burglaries.

** Subset the data to be the LGA with the most crime. (0.5 Mark) **

(crime_Casey <- crime %>% 
  filter(local_gov_area == "Casey"))
## # A tibble: 831 x 6
##     year local_gov_area offence_subgroup    item_division item_subdivision
##    <dbl> <chr>          <chr>               <chr>         <chr>           
##  1 2009. Casey          B311 Residential a… Car Accessor… Car Accessories 
##  2 2009. Casey          B311 Residential a… Cash/Document Cash/Document   
##  3 2009. Casey          B311 Residential a… Electrical A… Computer        
##  4 2009. Casey          B311 Residential a… Jewellery     Jewellery       
##  5 2009. Casey          B311 Residential a… Other         Key             
##  6 2009. Casey          B311 Residential a… Other         Mobile Phone    
##  7 2009. Casey          B311 Residential a… Personal Pro… Personal Proper…
##  8 2009. Casey          B311 Residential a… Photographic… Photographic Eq…
##  9 2009. Casey          B311 Residential a… Tv/Vcr        Tv/Vcr          
## 10 2009. Casey          B312 Non-residenti… Cash/Document Cash/Document   
## # ... with 821 more rows, and 1 more variable: n_property_items <dbl>

Repeat the previous analysis, but compare Monash with the rest of the data.

** Is crime in Casey increasing? (1 Mark) **

crime_count_Casey <- crime_Casey %>% count(year) 
library(ggplot2)
ggplot(crime_count_Casey,
       aes(x = year,
           y = n)) + 
  geom_col(aes(fill = -n)) +
  labs(x="Year",y="Number of Crimes") +
  labs(title="Crime in Casey between 2009 - 2018") +
  lims(y = c(0, 95)) 

This graph represents crime in Casey from 2009 to 2018. On the x-axis we have a timeline on a yearly basis. The y-axis represents the number of crimes in Casey. Each column is colour coded: the darker the blue, the more burglaries recorded in Casey; the lighter the blue, the fewer.

From this we can see that Casey’s crime has gradually been increasing since 2009, with some peaks in 2012 and 2014. Crime in this context refers to burglaries.

** What are the most common offences at Casey across all years? (1 Mark) **

crime_Casey %>%
  count(offence_subgroup) %>%
  arrange(desc(n))
## # A tibble: 5 x 2
##   offence_subgroup                                 n
##   <chr>                                        <int>
## 1 B321 Residential non-aggravated burglary       290
## 2 B322 Non-residential non-aggravated burglary   255
## 3 B311 Residential aggravated burglary           181
## 4 B329 Unknown non-aggravated burglary            93
## 5 B312 Non-residential aggravated burglary        12

The most common offences in Casey are ‘B321 - Residential non-aggravated burglary’ with 290 offences, followed by ‘B322 - Non-residential non-aggravated burglary’ with 255 offences.

** Are any of these offences increasing over time? (1 Mark) **

crime_year_offence_Casey <- crime_Casey %>%
  group_by(year) %>%
  count(offence_subgroup)

ggplot(crime_year_offence_Casey,
       aes(x = year,
           y = n,
           colour = offence_subgroup)) + 
  geom_line() + 
  labs(x="Years",y="Number of Offences") +
  labs(title="Types of Offences - Casey: 2009 - 2018") +
  lims(y = c(0, 35)) # Makes sure the y axis goes to zero

This graph represents how the number of offences has changed from 2009 until 2018 in the City of Casey. On the x-axis we have the timeline in years, and on the y-axis we have the frequency at which each offence occurs. Each line is colour coded to represent what type of burglary took place. The legend on the right explains what colour represents what subgroup of burglary.

From this graph we can clearly see that ‘B311 - Residential aggravated burglary’ is on an upward trend, whereas ‘B321 - Residential non-aggravated burglary’ and ‘B312 - Non-residential aggravated burglary’ are on a slight upward trend since 2009.

Lastly, ‘B329 - Unknown non-aggravated burglary’ is on a downward trend.

Amelia: I would write three complete sentences about what I learn from this graphic. You should start with a quick summary of what the graphic shows you. Then, describe what is on the x axis, the y axis, and any other colours used to separate the data. You then need to describe what you learn.

What are the most common subdivision items stolen in Casey?

crime_items_Casey <- crime_Casey %>% 
  count(item_subdivision)

max_items_stolen <- max(crime_items_Casey$n)

ggplot(crime_items_Casey,
       aes(x = n,
           y = reorder(item_subdivision, n))) + 
  geom_point() +
  lims(x = c(0, max_items_stolen)) +
  labs(x="Number of Items Stolen",y="Items") + 
  labs(title="Most Common Subdivision Items Stolen in Casey")

Summary

The most common subdivision item stolen in Casey is electrical appliances (not including computers, mobile phones, Tv/Vcrs, tablet computers).

Combine Monash with the top crime LGA area into one data set using bind_rows()

Amelia: You can stack the data together using bind_rows().

crime_top_monash <- bind_rows(crime_monash,
                              crime_Casey)
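
As a sketch of an equivalent route, filtering once with %in% keeps the same rows as stacking two separately filtered subsets with bind_rows(). The rows below are invented, and toy, stacked and both are illustrative names only.

```r
library(dplyr)

# Invented rows standing in for the full crime data.
toy <- tibble(
  local_gov_area = c("Monash", "Casey", "Alpine", "Casey"),
  n_property_items = c(4, 9, 1, 7)
)

# Stacking two filtered subsets...
stacked <- bind_rows(
  toy %>% filter(local_gov_area == "Monash"),
  toy %>% filter(local_gov_area == "Casey")
)

# ...keeps the same rows as one filter with %in%.
both <- toy %>% filter(local_gov_area %in% c("Monash", "Casey"))

nrow(stacked)  # 3
nrow(both)     # 3
```

The row order can differ between the two results, which does not matter for the plots that follow.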

Amelia: Use ggplot to create two separate plots for each local government area using facet_wrap() on local government area.

crime_year_offence_both <- crime_top_monash %>%
  group_by(year, local_gov_area) %>%
  count(offence_subgroup)

gg_crime_offence <- ggplot(crime_year_offence_both,
       aes(x = year,
           y = n,
           colour = offence_subgroup)) + 
  geom_line() + 
  facet_wrap(~ local_gov_area,nrow=2)
gg_crime_offence

crime_items_both <- crime_top_monash %>% 
  group_by(local_gov_area) %>%
  count(item_subdivision)

ggplot(crime_items_both,
       aes(x = n,
           y = reorder(item_subdivision, n), # reorder the points
           colour = local_gov_area)) +
  geom_point()

** Do you have any recommendations about future directions with this dataset? Is there anything else in the excel spreadsheet we could look at? (2 Marks) **

Amelia: I was planning on looking at the other tabs in the spreadsheet to help us use information on the tool used to break in. How could we use what is in there? And what is in there that looks useful?

First, we can look at the most common tool that was used to break into a household. We can then check for trends among different tools. For instance, if physical force is increasing, then this may suggest that most households do not have sufficient standard security.

However, if ‘Jemmy/Screwdriver’ were increasing, then this may suggest that standard households do not have locks that are safe enough to prevent burglaries.

In addition, sheet 7 can be used; it includes the days of the week and the time frame in which the burglaries occurred. Using this data we will be able to see at what time the majority of the burglaries occurred. This information could be given to the police to increase patrol units at times of high burglary activity. It could also be split by season to show whether crime occurs more during public holiday periods or not.
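
Before reading another tab, it can help to list the sheet names so that a sheet can be picked by name rather than by position. A minimal sketch, assuming the readxl package is installed: list_sheets() is a hypothetical helper written for this example, wrapping readxl’s excel_sheets() and returning NULL when the file is not found.

```r
# Hypothetical helper: list the sheet names of a workbook,
# or return NULL when the file cannot be found.
list_sheets <- function(path) {
  if (!file.exists(path)) {
    return(NULL)
  }
  readxl::excel_sheets(path)
}

# Path taken from the read_excel() call earlier in this document.
list_sheets("data-raw/Data_tables_spotlight_burglary_break_and_enter_visualisation_year_ending_December_2018_v3.xlsx")
```

A name returned here could then be passed to the sheet argument of read_excel() in place of a number.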

** For our presentation to stakeholders, you get to pick one figure to show them. Which of the ones above would you choose? Why? Recreate the figure below here and write 3 sentences about it (2.5 Marks) **

I would include the following figure:

crime_year_offence_both <- crime_top_monash %>%
  group_by(year, local_gov_area) %>%
  count(offence_subgroup)

gg_crime_offence <- ggplot(crime_year_offence_both,
       aes(x = year,
           y = n,
           colour = offence_subgroup)) + 
  geom_line() + 
  facet_wrap(~ local_gov_area,nrow=2)
gg_crime_offence

Summary of Graph

This graph represents the movement in burglary offences from 2009 until 2018 in the City of Casey and Monash. The x-axis represents each year and the y-axis represents the number of burglaries that took place. Each line is colour coded to represent the type of burglary that took place. The legend on the right explains what colour represents what subgroup of burglary.

Looking at the similarities between Casey and Monash, we can see that residential aggravated and non-aggravated burglaries are on the rise. I suggest that we focus more on household-friendly security systems, specifically in the City of Monash, as we have big customers in this area. We should also note that there is a big market in the City of Casey for household security, as this city has the most burglaries in Victoria.

Amelia: Remember, when you are describing a data visualisation, you should start with a quick summary of what the graphic shows you. Then, describe what is on the x axis, the y axis, and any other colours used to separate the data. You then need to describe what you learn.

Amelia: Remember to include the graphic again below.

References

Amelia: I have got to remember to cite all the R packages that I have used, and any Stack Overflow questions, blog posts, textbooks, or other online sources that I have used to help me answer questions.

Data downloaded from https://www.crimestatistics.vic.gov.au/download-data

Packages used (look for things which were loaded with library()):

  • ggplot2
  • dplyr
  • readxl