This chapter uses big data analysis to identify emerging patterns and trends and to improve the prediction of motor vehicle collisions. The data set, obtained from the New York Police Department (NYPD), consists of 17 attributes and 998,193 collisions in New York City. The data set was tested with three classification algorithms: k-nearest neighbor (kNN), random forest, and Naive Bayes. Outputs were captured using k-fold cross-validation and used to compare classifier accuracy, random forest accuracy per node, and processing time. In addition, the raw data were analyzed for four different vehicle groups to test for significance within the recorded period. Finally, extreme cases of collision severity were identified using outlier analysis. Of the three classifiers, random forest achieved the highest accuracy at 95.03%, followed by kNN at 94.93%; Naive Bayes had the lowest accuracy at 70.13%, although it recorded the shortest processing time, 5.7938 seconds. Random forest also maintained stable, high accuracy across every node used. Random forest can therefore be identified as the most accurate prediction method among the tested classifiers. The statistical analysis further shows that each of the described vehicle groups is significantly related to the recorded period of years (p < 0.001). Overall, this chapter identifies a highly accurate classification model and establishes the significance of vehicle group, findings that could help reduce road risks and motor vehicle collisions and that provide new evidence to support future research.
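The evaluation protocol summarized above (k-fold cross-validation, with accuracy and processing time recorded per classifier) can be sketched as follows. This is a minimal, dependency-free illustration and not the chapter's code: the two-cluster synthetic data, the helper names (`k_fold_indices`, `cross_validate`, `make_knn`, `make_nb`), the choice of 5 folds and k = 5 neighbors are all assumptions, and random forest is left out to keep the sketch short.

```python
import math
import random
import time
from collections import Counter, defaultdict

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def make_knn(train_X, train_y, k=5):
    """Return a predictor: majority vote among the k nearest points (Euclidean)."""
    def predict(x):
        nearest = sorted(range(len(train_X)),
                         key=lambda i: math.dist(train_X[i], x))[:k]
        return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
    return predict

def make_nb(train_X, train_y):
    """Return a Gaussian Naive Bayes predictor fitted on the training split."""
    by_class = defaultdict(list)
    for x, label in zip(train_X, train_y):
        by_class[label].append(x)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(rows) for c in cols]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / len(rows) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(train_X), means, variances)

    def log_posterior(label, x):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
            for v, m, s in zip(x, means, variances))

    return lambda x: max(model, key=lambda label: log_posterior(label, x))

def cross_validate(X, y, make_predictor, k=5, seed=0):
    """Return (mean accuracy over the k folds, total wall-clock seconds)."""
    start = time.perf_counter()
    accuracies = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = make_predictor([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k, time.perf_counter() - start

# Hypothetical stand-in data: two Gaussian clusters, 100 points per class.
rng = random.Random(42)
X, y = [], []
for label, center in ((0, 0.0), (1, 3.0)):
    for _ in range(100):
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)

results = {name: cross_validate(X, y, maker)
           for name, maker in (("kNN", make_knn), ("naive Bayes", make_nb))}
for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.3f}, time={seconds:.4f}s")
```

Every classifier is scored on the same folds because the fold seed is fixed, which makes the accuracy and timing figures directly comparable. The numbers printed for this toy data say nothing about the results reported for the NYPD data set.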
Related Materials
1. ISBN: published in 9781119544456 (urn:isbn:9781119544456)