Article Details

Transportation Planning Journal (運輸計劃), TSSCI

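Title: 數據不平衡下以機器學習方法預測交通事故嚴重性之分析 (An Analysis of Traffic Accident Severity Prediction Using Machine Learning Methods under Imbalanced Data)
Volume/Issue: 51:4
Parallel title: Machine Learning Methods for Traffic Accident Severity Prediction under Imbalanced Data
Authors: 胡大瀛, 李岳洪
Pages: 275-301
Keywords: traffic accident severity, imbalanced data, machine learning, ensemble method
Publication date: December 2022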

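Chinese Abstract (English translation)

Reducing the severity of traffic accidents has been a worldwide goal in recent years. Many passive safety systems, such as seat belts, airbags, and brake assist systems, have been developed to mitigate accident severity, and building models that predict accident severity is also a goal of many researchers. In recent years, machine learning and deep learning methods have replaced statistical methods and can achieve higher accuracy and computational efficiency. Model training requires a large amount of data, however, and accident databases suffer from data imbalance, so handling this imbalance is an important issue. This study classifies traffic accident severity into three levels (fatal, injury, and no injury), which makes it a multi-class classification problem. It collects data from Tainan City's open database and resamples the imbalanced data with two preprocessing approaches, over-sampling and under-sampling, using the SMOTE and Cluster Centroid algorithms respectively. For model training, it adopts two classification models based on ensemble learning, Random Forest and CatBoost. The results show that, on the under-sampled and over-sampled data, both models achieve accuracies above 97.69% and 86.84%, respectively. These results could be applied to autonomous vehicles in the future or serve as evidence for relevant agencies when making policy decisions.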

English Abstract

Reducing traffic accident severity is an effective approach to improving road safety. Many passive safety systems, such as seat belts, airbags, and brake assist systems, have been developed to reduce accident severity. In recent years, building models to predict traffic accident severity has also become a subject that many researchers focus on. Machine learning and deep learning approaches have increasingly replaced statistical methods because they can achieve higher accuracy and faster computation. Training these models requires large datasets, but accident datasets are usually imbalanced, so the data must be preprocessed. This study divides traffic accident severity into three levels: death, injury, and non-injury, which makes it a multi-class classification problem. We collect data from Tainan open datasets and use over-sampling and under-sampling methods to resample the imbalanced data, applying the SMOTE and Cluster Centroid algorithms, respectively. For model training, we apply two classification models based on ensemble learning: Random Forest and CatBoost. The results indicate that both models achieve more than 97.69% and 86.84% accuracy on the under-sampled and over-sampled datasets, respectively. These results could be applied to autonomous vehicles in the future or provide relevant authorities with evidence for decision-making.
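
The over-sampling step mentioned in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation. The function name smote_oversample, the parameters k and seed, and the toy feature values are all assumptions made for illustration; a real study would more likely rely on a library such as imbalanced-learn.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points interpolated between minority-class neighbours.

    Hypothetical helper for illustration; k and seed are assumed parameters.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy data (hypothetical, not taken from the Tainan dataset): two features per record.
fatal = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]        # rare class
injury = [(5.0 + 0.1 * i, 5.0) for i in range(10)]  # majority class

# Generate enough synthetic fatal records to match the majority class size.
new_points = smote_oversample(fatal, n_new=len(injury) - len(fatal))
balanced_fatal = fatal + new_points
print(len(balanced_fatal), len(injury))
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by the rare class. The under-sampling counterpart named in the abstract works in the opposite direction, shrinking the majority class by replacing groups of its samples with cluster centroids.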

Related Literature