Article Details

先進工程學刊

Title: 一個調整不平衡資料以提升分類正確率的新方法
Volume/Issue: 18:1
Parallel title: A New Method to Adjust Imbalanced Data to Improve Classification Accuracy
Authors: 李維平, 周賢明, 林劭旻
Pages: 025-032
Keywords: synthetic minority oversampling technique (SMOTE), over-sampling, under-sampling, NearMiss, decision tree
Publication date: 2023-04

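Chinese Abstract (English translation)

Every field runs into its own difficulties when processing data, and imbalanced data is among the thornier ones. Existing research offers under-sampling of the majority class and over-sampling of the minority class, but if either is handled poorly, under-sampling tends to discard important information carried by the samples, and over-sampling tends to make the classifier overfit. Many studies also refine or optimize the classifier itself, but the quality of the data largely determines the classification result, so refining the classifier alone brings no significant gain. This study combines the Synthetic Minority Oversampling Technique (SMOTE) with NearMiss under-sampling to address data imbalance, and compares the result against over-sampling and SMOTE by building a decision tree classification model for each. The experiments show that the NMS (NearMiss-2 SMOTE) sampling method was the best sampling method on all four data sets tested, and that it also achieved the highest classification accuracy on minority-class samples among the sampling methods compared.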

English Abstract

Every field encounters its own problems when processing data, and imbalanced data is one of the more difficult. Current research under-samples the majority class or over-samples the minority class, but if either is handled improperly, under-sampling can lose important information carried by the samples and over-sampling can cause the classifier to overfit. Many studies also improve and optimize the classifier, but the quality of the data itself has a greater impact on the classification results, and improving the classifier alone does not help them significantly. This study combines SMOTE (Synthetic Minority Oversampling Technique) and NearMiss to solve the problem of data imbalance, and compares the combination with the over-sampling method and the SMOTE method by building a decision tree classification model for each. The experiments show that the NMS (NearMiss-2 SMOTE) sampling method is the best sampling method in the experiments on four different data sets, and that its classification accuracy on minority-class samples is also the highest among the sampling methods compared.
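The abstract names the two building blocks of NMS but does not say how they are chained, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that NearMiss-2 first shrinks the majority class to a target size and that SMOTE then grows the minority class to the same size. The function names (`nearmiss2`, `smote`, `nms_resample`), the parameter `n_target`, the neighbour count `k=3`, and the toy data are all hypothetical. The sketch uses only the Python standard library, and the minority class must contain at least two points.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearmiss2(majority, minority, n_keep, k=3):
    """NearMiss-2 rule: keep the n_keep majority points whose mean distance
    to their k FARTHEST minority points is smallest."""
    def score(point):
        farthest = sorted(
            (euclidean(point, m) for m in minority), reverse=True
        )[:k]
        return sum(farthest) / len(farthest)

    return sorted(majority, key=score)[:n_keep]


def smote(minority, n_new, k=3, rng=None):
    """SMOTE rule: build each synthetic point by interpolating between a
    random minority point and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        neighbours = sorted(
            (m for i, m in enumerate(minority) if i != base_idx),
            key=lambda m: euclidean(base, m),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (o - b) for b, o in zip(base, other))
        )
    return synthetic


def nms_resample(majority, minority, n_target, k=3, seed=0):
    """Assumed combination (not confirmed by the abstract): under-sample
    the majority class to n_target with NearMiss-2, then over-sample the
    minority class up to n_target with SMOTE."""
    kept = nearmiss2(majority, minority, n_target, k)
    n_new = max(0, n_target - len(minority))
    extra = smote(minority, n_new, k, random.Random(seed))
    return kept, list(minority) + extra


# Toy data: 20 majority points and 4 minority points in two dimensions.
majority = [(float(i), float(i % 3)) for i in range(20)]
minority = [(2.0, 5.0), (3.0, 6.0), (4.0, 5.5), (2.5, 6.5)]

kept, grown = nms_resample(majority, minority, n_target=8)
print(len(kept), len(grown))  # prints "8 8"
```

Both classes end up with eight samples, and the balanced set could then be passed to a decision tree learner for the comparison the abstract describes. In practice a maintained library implementation of NearMiss and SMOTE would normally be preferred over hand-written loops like these.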
