Using data to make Seattle streets safer

Data Reporting and Model Generation for public safety

BACKDROP
🚦 Seattle has a LOT of Traffic
Seattle has been climbing the rankings for ‘Worst Traffic’, and any disruption in the rainy city can cause hour-long backups. Over the past 10 years, collisions and traffic have cost the city nearly $2.6 billion. To find a solution, I led an investigative team to help improve road design and better distribute city infrastructure capital.
We built tools, predictive models, and reports from our investigation into how traffic collisions are affected by the weather, lighting, road conditions, and city infrastructure. Our work has laid the groundwork for urban engineers to build solutions with a long-lasting, data-driven impact.
Timeline: 14 days
Team Members: 5
My role: Building the Collision Intelligence Report & Data Exploration Tool
Tools used: Power BI, R, Python, JavaScript, Leaflet.js
📄 Collision Intelligence Report
Using R, Python, and Power BI, we identified spatial trends in collisions across dimensions such as time, temperature, precipitation, and lighting conditions. We generated an interactive report to communicate our insights, which included visualizations such as:
- a sunburst visualization accompanied by line graphs to highlight seasonality in collisions
- a bubble map to highlight the addresses and areas that need immediate resources.
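As a rough illustration of the kind of aggregation behind a seasonality view, the sketch below counts collisions per calendar month and per hour of day. The timestamps are made up for the example, and the function name `seasonality_counts` is hypothetical; the report itself was built in R, Python, and Power BI, and this is not its actual code.

```python
from collections import Counter
from datetime import datetime

# Made-up timestamps for illustration only; not real collision records.
sample_collisions = [
    "2019-01-04 17:30",
    "2019-01-18 08:05",
    "2019-06-21 17:45",
    "2019-11-02 17:10",
    "2019-11-15 22:40",
]

def seasonality_counts(timestamps):
    """Count collisions per calendar month and per hour of day."""
    by_month = Counter()
    by_hour = Counter()
    for raw in timestamps:
        ts = datetime.strptime(raw, "%Y-%m-%d %H:%M")
        by_month[ts.month] += 1
        by_hour[ts.hour] += 1
    return by_month, by_hour

by_month, by_hour = seasonality_counts(sample_collisions)
```

Counts like these could feed a line graph (collisions per month) or the rings of a sunburst (month, then hour).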
📊 Data Exploration Tool 
There have been over 200,000 traffic collisions in Seattle since 2004. That is a lot of data, with a lot of hidden insights, for the Seattle government to explore. Our dataset had more than 30 columns, each providing different details about an accident. To help the Seattle Urban Planning committee explore this data easily, we built a Power BI Exploration Tool.
With this tool, they can connect to their live database, explore the data interactively, and ask detailed questions of it, with filter options for each category.

Predicted vs Actual Collisions

👨‍🔬 Machine Learning Model to predict the total accidents for a day in Seattle
Initially, we wanted to predict collisions for each street in Seattle. However, the data contained records only for accidents and not for traffic flow, so it was not possible to estimate the likelihood of an accident on a particular street.
Instead, we trained a K-Nearest Neighbors model on the past decade's data to predict the total number of accidents in Seattle for a given date, with a mean absolute error of 4 accidents. The features we used were daylight hours, time of day, day of the week, week of the year, month, precipitation, and temperature (collected via the Weather API).
Note: We were unable to find free APIs to provide us with data related to live traffic counts, road conditions, road construction, and light conditions.
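To make the modeling idea concrete, here is a minimal, dependency-free sketch of K-Nearest Neighbors regression with a mean absolute error check. It is not our production code: the rows, the daily totals, the reduced feature set, and the helper names (`fit_scaler`, `knn_predict`, `mean_absolute_error`) are all assumptions for illustration. In practice a library implementation such as scikit-learn's `KNeighborsRegressor` would be the more usual choice.

```python
import math

def fit_scaler(rows):
    """Return a function that min-max scales each column to [0, 1],
    so that no single feature dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]  # avoid dividing by 0

    def scale(row):
        return [(v - lo) / span for v, lo, span in zip(row, lows, spans)]

    return scale

def knn_predict(train_X, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training rows."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted|; never negative."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical rows: (day_of_week, precipitation_mm, temperature_c, daylight_hours)
train_rows = [
    (0, 0.0, 18.0, 15.5),
    (1, 5.2, 12.0, 10.0),
    (4, 0.0, 21.0, 15.9),
    (5, 12.4, 8.0, 8.6),
    (6, 1.1, 10.0, 9.2),
]
train_totals = [30, 34, 41, 27, 22]  # made-up daily collision counts

test_rows = [(4, 0.3, 20.0, 15.7), (6, 9.8, 9.0, 8.9)]
test_totals = [39, 25]  # made-up

scale = fit_scaler(train_rows)
train_X = [scale(r) for r in train_rows]
predictions = [knn_predict(train_X, train_totals, scale(r), k=2) for r in test_rows]
mae = mean_absolute_error(test_totals, predictions)
```

Scaling matters here because the features live on very different ranges (day of week versus millimeters of rain), and KNN relies entirely on distances between rows.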
🗾 Interactive Granular Bubble Map of Collisions
Using Leaflet JS, we built a bubble map that provided insights about a region at a more granular level as the reader zoomed in. This provided a deeper understanding of the accidents in each region and allowed us to compare and contrast them against each other.
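The zoom-dependent behavior can be sketched as grid aggregation: points are grouped into cells that shrink as the zoom level grows, and each cell becomes one bubble sized by its count. The map itself was built with Leaflet JS; the Python below only illustrates the idea, and the cell-size formula, the function name `bubble_cells`, and the coordinates are assumptions rather than what our map used.

```python
import math
from collections import defaultdict

def bubble_cells(points, zoom):
    """Group (lat, lon) points into square grid cells that shrink as zoom grows.

    Returns one bubble per non-empty cell: its mean position and point count.
    """
    cell_size = 360.0 / (2 ** zoom)  # degrees per cell; an assumed scheme
    cells = defaultdict(list)
    for lat, lon in points:
        key = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        cells[key].append((lat, lon))

    bubbles = []
    for members in cells.values():
        n = len(members)
        bubbles.append({
            "lat": sum(p[0] for p in members) / n,
            "lon": sum(p[1] for p in members) / n,
            "count": n,
        })
    return bubbles

# Made-up coordinates in the Seattle area, for illustration only.
sample_points = [
    (47.6062, -122.3321),
    (47.6101, -122.3421),
    (47.6205, -122.3493),
    (47.6686, -122.3845),
]

city_view = bubble_cells(sample_points, zoom=5)     # coarse: few large bubbles
street_view = bubble_cells(sample_points, zoom=16)  # fine: many small bubbles
```

Whatever the zoom level, the bubble counts always add up to the number of input points, so zooming only changes how the same collisions are grouped.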
FIN.
📚 Key Learnings
When we started this project, each of us assumed that most accidents happened while it was raining or just after it had rained. However, this hypothesis was debunked as soon as we conducted our data analysis. This proved once again the value of letting the data do the talking.
While we don’t believe our knowledge resources can necessarily change the human behavior of distracted driving or the chance occurrences that lead to pile-ups, we do believe that knowledge of hot spots can lead to more efficient response and planning. While traffic collisions may be an inevitable part of city life, hopefully our work can ensure they do not take up any more precious hours of our daily lives than they need to.
🤔 What Next?
Given what we were able to produce in the two-week time frame, we believe this is a good start on understanding and reducing traffic collisions in Seattle. Towards the end, we began to realize that while we could continue to experiment with new machine learning algorithms, our analysis and models would always be limited by the quality of the incoming data.
We were limited in our scope of features by the lack of free APIs that would offer greater analytical strength, such as live traffic counts, road conditions, road construction, and more granular weather and light conditions (by roadway). Therefore, our recommendation to the city of Seattle is to invest in collecting more metadata for each collision record, such as traffic flow counts.