Motor Vehicle Collisions in New York City
18 Apr 2016
This project analyzes the Motor Vehicle Collisions dataset published by the New York City Police Department (data available here), combined with historical weather data for New York City scraped from the wunderground website. The main motivation is my belief that traffic accidents are not purely a matter of fate: a crash is often the result of a combination of circumstances. Using this data, I will try to identify the areas of New York City that are potentially risky in terms of motor vehicle collisions for a given time of day, date, season, weather condition, type of vehicle, or a combination of several of these factors.
First Analysis of the Dataset
After collecting and aggregating the data, it is important to understand the dataset well in order to formulate hypotheses and start building predictive models. As mentioned in the introduction, I believe that motor vehicle collisions are in most cases not purely a matter of fate. I think, for example, that the weather can be responsible for many car accidents, as can tiredness.
Using all this information, the objective will be to build a predictive regression model that takes a list of inputs (the date, the hour, the vehicle, the weather, and so on) and returns the probability of an accident on a particular street of the city. Ideally, the final product would be an application where users (taxi drivers, private individuals) could choose their destination and see which parts of their journey are more dangerous given these parameters. They could then either be more vigilant when reaching those areas or plan another route.
That is why I want to focus first on two series of plots that give a better picture of how the collisions are distributed across NYC. I have chosen geographic plots based on the coordinates of the collisions.
First group of charts: Vehicle type
The first two charts are heat maps showing the distribution of the crashes where the first vehicle involved is a bus (first map) or a taxi (second map). I have chosen these two types because I think their drivers could be potential users of the product.
Heat Map of the Collisions involving a bus as first vehicle
Heat Map of the Collisions involving a taxi as first vehicle
These two plots give a better understanding of the potential risk areas for both taxi drivers and bus drivers. Unsurprisingly, Manhattan is the borough with the highest number of collisions for both buses and taxis. However, buses are involved in fewer collisions than taxis (13,993 versus 31,717 as first vehicle), although these raw counts do not account for how many buses and taxis are on the road. This first look helps identify promising directions for further study, and the analysis will be extended to the other types of transportation.
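The density behind heat maps like these can be approximated by counting collisions per grid cell. The snippet below is only a minimal sketch of that idea, not the method used to draw the maps above: the function name `bin_collisions`, the `cell_size` of 0.01 degrees, and the sample points are all assumptions made for illustration.

```python
import math
from collections import Counter

def bin_collisions(points, cell_size=0.01):
    """Count collisions per grid cell; each cell spans cell_size degrees."""
    counts = Counter()
    for lat, lon in points:
        # Skip records with missing coordinates.
        if lat is None or lon is None:
            continue
        # Flooring the scaled coordinate gives the index of the cell.
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell] += 1
    return counts

# Hypothetical sample points, not taken from the dataset.
sample = [
    (40.7580, -73.9855),
    (40.7585, -73.9851),
    (40.6782, -73.9442),
    (None, None),
]
grid = bin_collisions(sample)
hottest_cell, hottest_count = grid.most_common(1)[0]
print(hottest_cell, hottest_count)
```

Filtering the records by first vehicle type before binning would give one grid per vehicle type, which is what each map represents.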
To have a better understanding of the numbers of collisions for each type of first vehicle involved I have chosen to summarize this data in the table below:
Vehicle Type Code 1 | Count |
---|---|
Ambulance | 2216 |
Bicycle | 666 |
Bus | 13993 |
Fire Truck | 763 |
Large Com Vehicle (6 or more tires) | 13925 |
Livery Vehicle | 9615 |
Other | 23147 |
Passenger Vehicle | 416348 |
Pedicab | 23 |
Pick-up Truck | 11621 |
Scooter | 287 |
Small Com Vehicle (4 tires) | 12788 |
Sport Utility / Station Wagon | 180740 |
Taxi | 31717 |
Unknown | 20708 |
Van | 25375 |
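A table like the one above can be produced by counting the values of the first-vehicle column. The following is a hedged sketch using only the standard library; the CSV column name `VEHICLE TYPE CODE 1`, the choice to map blank values to "Unknown", and the sample rows are assumptions, not details confirmed by the dataset.

```python
import csv
import io
from collections import Counter

def count_first_vehicle(rows, column="VEHICLE TYPE CODE 1"):
    """Count collisions by the type of the first vehicle involved."""
    counts = Counter()
    for row in rows:
        # Normalize case so "TAXI" and "taxi" fall in the same bucket.
        value = (row.get(column) or "").strip().title()
        # Assumption: blank values are grouped under "Unknown".
        counts[value or "Unknown"] += 1
    return counts

# Hypothetical sample file, not real records.
sample_csv = io.StringIO(
    "VEHICLE TYPE CODE 1,BOROUGH\n"
    "TAXI,MANHATTAN\n"
    "BUS,BROOKLYN\n"
    "taxi,QUEENS\n"
    ",BRONX\n"
)
counts = count_first_vehicle(csv.DictReader(sample_csv))
for vehicle, n in sorted(counts.items()):
    print(f"{vehicle} | {n} |")
```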
Second group of charts: Weather conditions
The second series of charts shows heat maps of the distribution of the collisions in bad weather conditions: rain and snow. I have chosen these two conditions because I think they cause many accidents.
Heat Map of the Collisions when the weather was rainy
Heat Map of the Collisions when the weather was snowy
According to the last two plots, the weather appears to play an important role in the number of collisions. In addition, although Manhattan still has the highest number of vehicle collisions, bad weather conditions (such as rain and snow) also seem to cause problems outside Manhattan, for example in Brooklyn.
Similarly to what I have done for the type of first vehicle involved in the collisions, I have chosen to summarize the weather conditions in the table below:
Weather Conditions | Count |
---|---|
Dry Weather | 501252 |
Fog | 3198 |
Fog-Rain | 40268 |
Fog-Rain-Snow | 9981 |
Fog-Snow | 8164 |
Rain | 173928 |
Rain-Snow | 8573 |
Snow | 23690 |
It is interesting to add that of the 1552 days in the study period, 362 were rainy days, 49 were snowy days and 74 were foggy and rainy days. This gives about 480 collisions per day on rainy days (173,928 / 362) and about 483 on snowy days (23,690 / 49). Both weather conditions therefore seem to have a similar impact on the number of collisions, although a comparison with the rate on dry days would be needed to quantify that impact.
Combining the weather information with the type of vehicle will be part of the next steps of the project, to try to identify safer means of transportation in bad weather and areas to avoid.
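The per-day figures can be checked with a few lines of arithmetic. This sketch uses the counts from the weather table and the day counts quoted in the text; matching the 74 "foggy and rainy" days to the "Fog-Rain" row is an assumption.

```python
# Collision counts from the weather table.
collisions = {"Rain": 173928, "Snow": 23690, "Fog-Rain": 40268}
# Day counts from the text; pairing "Fog-Rain" with the 74
# foggy and rainy days is an assumption.
days = {"Rain": 362, "Snow": 49, "Fog-Rain": 74}

per_day = {name: collisions[name] / days[name] for name in collisions}
for name, rate in per_day.items():
    print(f"{name}: {rate:.1f} collisions per day")
# Rain: 480.5 collisions per day
# Snow: 483.5 collisions per day
# Fog-Rain: 544.2 collisions per day
```

The number of dry days is not given in the text, so the same rate cannot be computed here for dry weather.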
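One possible way to combine the two sources is to look up each collision's weather by date and count (weather, vehicle) pairs. This is only a sketch of that next step under stated assumptions: the function name `cross_tabulate`, the date format, and the sample records are hypothetical and not part of the project.

```python
from collections import Counter

def cross_tabulate(collisions, weather_by_date):
    """Count collisions per (weather condition, first vehicle type) pair."""
    table = Counter()
    for date, vehicle in collisions:
        # Dates without a weather record fall under "Unknown".
        condition = weather_by_date.get(date, "Unknown")
        table[(condition, vehicle)] += 1
    return table

# Hypothetical daily weather labels and collision records.
weather_by_date = {"2016-01-23": "Snow", "2016-02-03": "Rain"}
collisions = [
    ("2016-01-23", "Taxi"),
    ("2016-01-23", "Bus"),
    ("2016-02-03", "Taxi"),
    ("2016-01-23", "Taxi"),
]
table = cross_tabulate(collisions, weather_by_date)
for (condition, vehicle), n in sorted(table.items()):
    print(f"{condition} | {vehicle} | {n} |")
```

Dividing each count by the number of days with that weather condition would make the rows comparable across conditions.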