Analyzing car accidents in the United Kingdom using PySpark and Python for big data processing.

ijeffries/car-accident-analysis

car-accident-analysis

Map of Accidents in the UK

Index

  1. Summary
  2. File Directory
  3. Language and Packages Used
  4. Installing PySpark
  5. Credits
  6. License

Summary

This project uses Python and PySpark to simulate big data processing on UK car crash data. The included Jupyter Notebook could also be used with Databricks to process the data on a real cluster.
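As a rough illustration of the local-versus-cluster idea above, the sketch below picks a Spark master URL and builds a session from it. It uses `SparkSession` (the entry point in Spark 2.0 and later) rather than the `SparkContext`/`SQLContext` pair the notebook imports. The `SPARK_MASTER` environment variable and both function names are hypothetical, chosen only for this example. `local[*]` tells Spark to run on the local machine with one worker thread per logical core.

```python
import os


def resolve_master(env=None):
    """Return the Spark master URL to use.

    SPARK_MASTER is a hypothetical variable name used for illustration;
    when it is not set, fall back to local mode on all available cores.
    """
    env = os.environ if env is None else env
    return env.get("SPARK_MASTER", "local[*]")


def build_session(app_name="car-accident-analysis"):
    """Create (or reuse) a SparkSession. Requires PySpark to be installed."""
    # Imported lazily so resolve_master() can be used without PySpark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(resolve_master())
        .appName(app_name)
        .getOrCreate()
    )
```

On Databricks, notebooks are already given a session named `spark`, so a helper like `build_session` would likely not be needed there.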

File Directory

  1. data - contains the four files used in analysis:
           a. Acc.csv - 2017 accident data reported by the UK police force.
           b. Cas.csv - 2017 casualty data reported by the UK police force.
           c. Veh.csv - 2017 vehicle data reported by the UK police force.
           d. dictionary.xls - Data dictionary used to define coded categorical values within datasets.

  2. images - contains visualizations:
           a. uk_accidents.png - Heatmap showing accidents in the UK by accident severity.

  3. car_crash.ipynb - Jupyter Notebook containing all analysis performed on the datasets, along with visualizations.
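A minimal sketch of how the three CSV files listed above might be read into Spark DataFrames. The file names come from the directory listing. The location of the `data` folder relative to the notebook, the presence of a header row, and the short table names (`accidents`, `casualties`, `vehicles`) are assumptions made for this example, and the notebook may load the data differently.

```python
from pathlib import Path

# File names as listed in the directory above. The short keys are
# hypothetical labels used only in this sketch.
CSV_FILES = {
    "accidents": "Acc.csv",
    "casualties": "Cas.csv",
    "vehicles": "Veh.csv",
}


def csv_paths(data_dir="data"):
    """Map each table label to the path of its CSV file."""
    return {name: str(Path(data_dir) / fname) for name, fname in CSV_FILES.items()}


def load_tables(spark, data_dir="data"):
    """Read each CSV into a Spark DataFrame.

    Assumes the first row of every file is a header and lets Spark
    infer the column types.
    """
    return {
        name: spark.read.csv(path, header=True, inferSchema=True)
        for name, path in csv_paths(data_dir).items()
    }
```

Given an active `SparkSession` named `spark`, `load_tables(spark)["accidents"].printSchema()` would show the columns Spark inferred for the accident data.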

Language and Packages Used

Python is used in conjunction with PySpark for all analysis performed.

The following statements import all necessary packages:

import os
import zipfile
import pyspark
import pandas as pd
import urllib.request
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from pyspark.sql import SQLContext
from pyspark import SparkContext
from pyspark.sql.types import IntegerType

Installing PySpark

PySpark requires extra configuration to install and run within Jupyter Notebook:

  1. If you're using Windows, Michael Galarnyk has an excellent tutorial on installing PySpark for Windows.

  2. If you are installing on Linux or macOS, Charles Bochet's article will get you started.
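After following either guide, a quick check like the one below can confirm that the Python interpreter behind the notebook can see PySpark. This is only a sketch: the function name `pyspark_status` is made up for this example, and it tests importability only, not whether Java or Spark itself is configured correctly.

```python
import importlib.util


def pyspark_status():
    """Return a short message saying whether PySpark can be imported."""
    if importlib.util.find_spec("pyspark") is None:
        return "PySpark not found: follow the installation guide for your platform."
    import pyspark

    return f"PySpark {pyspark.__version__} is importable."


print(pyspark_status())
```

Running this in a notebook cell uses the same kernel as the analysis, so it reflects the environment the notebook will actually run in.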

Credits

  1. Would like to thank the UK government for posting the data on their website.
  2. Would like to thank the Stack Overflow user whose function I stole. Because of you lot, I get to stand on the shoulders of giants.

License

MIT License. Copyright (c) 2019 Ian Jeffries