Thomas Lucas Knowledge is Beautiful

Motor Vehicle Collisions in New York City

This project analyzes the Motor Vehicle Collisions data for New York City published by the New York City Police Department, combined with historical weather data for New York City scraped from the Wunderground website. It is motivated by my belief that traffic accidents are not purely a matter of chance: a crash is often the result of a combination of circumstances. Using this data, I will try to identify the potentially risky areas of New York City in terms of motor vehicle collisions with respect to the time of day, the date, the season, the weather conditions, the type of vehicle, or a combination of several of these factors.

First Analysis of the Dataset

After collecting and aggregating the data, it is important to gain a good understanding of the dataset before formulating hypotheses and building predictive models. As mentioned in the introduction, I believe that motor vehicle collisions are in most cases not purely a matter of chance. For example, I think that both the weather and driver fatigue can be responsible for many car accidents.
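Aggregating the two sources mainly means attaching a daily weather label to every collision. The sketch below is a minimal illustration under stated assumptions: each collision is a dict with hypothetical keys "date" and "vehicle", the scraped weather data has already been reduced to one label per day (such as "Rain" or "Fog-Rain"), and days without a recorded event count as "Dry Weather". None of these names or rules come from the original project, and the records are made up.

```python
from datetime import date


def attach_weather(collisions, daily_weather, default="Dry Weather"):
    """Return copies of the collision records with a "weather" field added.

    daily_weather maps a date to that day's weather label (hypothetical
    layout). Days missing from it are assumed to have had no weather event
    and receive the `default` label.
    """
    return [
        {**c, "weather": daily_weather.get(c["date"], default)}
        for c in collisions
    ]


# Made-up records for illustration only (not NYPD or Wunderground data)
collisions = [
    {"date": date(2000, 1, 1), "vehicle": "Taxi"},
    {"date": date(2000, 1, 1), "vehicle": "Bus"},
    {"date": date(2000, 1, 2), "vehicle": "Passenger Vehicle"},
]
daily_weather = {date(2000, 1, 1): "Rain"}  # no event recorded on the 2nd

for row in attach_weather(collisions, daily_weather):
    print(row["date"], row["vehicle"], row["weather"])
```

Returning new dicts rather than mutating the input keeps the raw collision records untouched, which makes it easy to re-run the join with a different weather source.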

Using all this information, the objective is to build a predictive regression model that takes a list of inputs, such as the date, the hour, the vehicle, the weather… and returns the probability of an accident in a particular street of the city. Ideally, the final product would be an application in which users (taxi drivers, private individuals) could choose their destination and see which parts of their journey present higher risks given these parameters. They could then either be more vigilant in those particular areas or plan another route.
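The post does not say which regression model will be used. One common choice for a model that returns a probability is logistic regression. The following is a minimal, dependency-free sketch trained by batch gradient descent on a tiny made-up dataset. The two features (hour of day scaled to [0, 1] and a 0/1 rain flag) and the labels are hypothetical; a real model would more likely be built with a library such as scikit-learn's LogisticRegression.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability of the positive class (here: a collision) for features x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-loss with respect to the logit
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: [hour scaled to 0..1, rain flag] -> collision (1) or not (0)
X = [[0.0, 0], [0.2, 0], [0.8, 1], [1.0, 1]]
y = [0, 0, 1, 1]
w, b = train_logistic(X, y)
print(round(predict_proba(w, b, [0.9, 1]), 3))
```

The output is a probability rather than a yes/no answer, which matches the idea of showing users how risky each part of a journey is instead of a binary warning.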

That is why I want to start with two series of plots that give a better picture of how the collisions are distributed across NYC. I have chosen geographic plots based on the coordinates (latitude and longitude) of the collisions.

First group of charts: Vehicle type

The first two charts are heat maps showing the distribution of crashes in which the first vehicle involved is a bus (first map) or a taxi (second map). I have chosen these two types because their drivers could be potential users of the product.

Heat Map of the Collisions involving a bus as first vehicle
Heat Map of the Collisions involving a taxi as first vehicle

These two plots give a better understanding of the potential risk areas for both taxi drivers and bus drivers. Unsurprisingly, Manhattan is the borough with the highest number of collisions for both buses and taxis. However, buses appear in fewer collisions than taxis (13993 versus 31717 as first vehicle, see the table below), although these raw counts do not account for how many buses and taxis are on the road. This first exploration helps identify promising lines of study, and the analysis will be extended to the other types of vehicle.

To better understand the number of collisions for each type of first vehicle involved, I have summarized the data in the table below:

Vehicle Type Code 1                     Count
Ambulance                                2216
Bicycle                                   666
Bus                                     13993
Fire Truck                                763
Large Com Vehicle (6 or more tires)     13925
Livery Vehicle                           9615
Other                                   23147
Passenger Vehicle                      416348
Pedicab                                    23
Pick-up Truck                           11621
Scooter                                   287
Small Com Vehicle (4 tires)             12788
Sport Utility / Station Wagon          180740
Taxi                                    31717
Unknown                                 20708
Van                                     25375
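Both summary tables in this post are plain counts per category. A minimal sketch, assuming each collision is a dict with hypothetical keys such as "vehicle" (the Vehicle Type Code 1 column) or "weather", and assuming empty values are grouped under "Unknown" (the post does not say how blanks were handled):

```python
from collections import Counter


def count_by(records, field, missing="Unknown"):
    """Count records per value of `field`, sorted by label.

    Empty or None values are counted under `missing` (an assumption).
    """
    counts = Counter(r.get(field) or missing for r in records)
    return sorted(counts.items())


def print_table(rows, header):
    """Print (label, count) rows as two aligned plain-text columns."""
    width = max([len(header[0])] + [len(label) for label, _ in rows]) + 2
    print(f"{header[0]:<{width}}{header[1]:>8}")
    for label, n in rows:
        print(f"{label:<{width}}{n:>8}")


# Made-up records for illustration only
records = [{"vehicle": "Taxi"}, {"vehicle": "Bus"}, {"vehicle": "Taxi"}, {"vehicle": None}]
print_table(count_by(records, "vehicle"), ("Vehicle Type Code 1", "Count"))
```

The same function produces the weather table by passing "weather" as the field once each collision carries a weather label.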

Second group of charts: Weather conditions

The second series of charts shows heat maps of the distribution of collisions in two bad weather conditions: rain and snow. I have chosen these two conditions because I believe they cause many accidents.

Heat Map of the Collisions when the weather was rainy
Heat Map of the Collisions when the weather was snowy

These two plots suggest that the weather plays an important role in the number of collisions. In addition, even though Manhattan still has the highest number of vehicle collisions, bad weather conditions such as rain and snow also seem to cause problems outside Manhattan, for example in Brooklyn.

As for the type of first vehicle involved, I have summarized the weather conditions in the table below:

Weather Conditions    Count
Dry Weather          501252
Fog                    3198
Fog-Rain              40268
Fog-Rain-Snow          9981
Fog-Snow               8164
Rain                 173928
Rain-Snow              8573
Snow                  23690

It is also worth noting that, of the 1552 days in the study period, 362 were rainy, 49 were snowy and 74 were foggy and rainy. This gives about 480 collisions per day on rainy days (173928 / 362) and about 483 on snowy days (23690 / 49), so both weather conditions seem to have a similar impact on the daily number of collisions. Comparing these rates with the rate on dry days would show whether bad weather actually increases the number of collisions.
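The per-day figures above come from dividing each collision count in the weather table by the number of days with that condition. The sketch below reproduces them from the numbers quoted in this section. It assumes that the 74 foggy and rainy days correspond to the Fog-Rain row and, for the overall average, that the weather table covers every collision in the 1552-day period; neither assumption is stated in the post.

```python
# Collisions per weather label, as quoted in the weather table
collisions_by_weather = {
    "Dry Weather": 501252, "Fog": 3198, "Fog-Rain": 40268,
    "Fog-Rain-Snow": 9981, "Fog-Snow": 8164, "Rain": 173928,
    "Rain-Snow": 8573, "Snow": 23690,
}
# Day counts given in the text (assumption: "foggy and rainy" = "Fog-Rain")
days_by_weather = {"Rain": 362, "Snow": 49, "Fog-Rain": 74}
STUDY_DAYS = 1552


def collisions_per_day(collisions, days):
    """Average number of collisions per day for each label with a known day count."""
    return {label: collisions[label] / n_days for label, n_days in days.items()}


rates = collisions_per_day(collisions_by_weather, days_by_weather)
# Assumes the weather table covers every collision in the study period
overall = sum(collisions_by_weather.values()) / STUDY_DAYS

for label, rate in rates.items():
    print(f"{label:<10}{rate:8.1f}")
print(f"{'Overall':<10}{overall:8.1f}")
```

Run as-is, it prints about 480.5 for rain, 483.5 for snow and 544.2 for fog with rain, against an overall average of about 495.5 collisions per day. Under these assumptions, rainy and snowy days sit slightly below the overall average, which is why a day-level comparison with dry days matters before concluding that bad weather increases collisions.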

The next steps of the project will combine the weather information with the vehicle type, to try to identify safer means of transportation in bad weather and/or areas to avoid.