Article Details

Journal of Computers (EI, MEDLINE, Scopus)

Title: An E-mail Classification Algorithm based on Stacking Integrated Learning
Volume/Issue: 33:2
Authors: Li-Xia Wan, Wei-Xing Huang, Qing-Hua Tang
Pages: 105-114
Keywords: anti-spam system, integrated learning algorithm, TF-IDF algorithm, word vector space model, e-mail classification
Publication date: 2022-04
DOI 10.53106/199115992022043302009

English Abstract

Text filtering in traditional anti-spam systems relies mainly on keyword matching and text fingerprint analysis, which makes it difficult to identify and classify spam accurately. This paper therefore proposes an integrated learning algorithm based on stacking. The algorithm first takes manually labeled text data from several categories as samples and uses the TF-IDF algorithm to train a word vector space model. It then uses linear SVC, XGBoost, and logistic regression as base classifiers and a random forest as the meta classifier, combining them through stacking ensemble learning to build the classification model. The model divides e-mail into five categories: illegal, advertisement, news, bill, and recruitment. In the simulation results, the AUC values of the stacking integrated learning classification algorithm for the five categories are 0.92, 0.95, 1.00, 0.93, and 0.97 respectively, and the AP values are 0.86, 0.88, 1.00, 0.88, and 0.94 respectively, indicating high classification performance and precision.
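
The pipeline described in the abstract (TF-IDF features, three base classifiers, a random forest meta classifier) can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy corpus, the function name `build_model`, and all hyperparameters are hypothetical, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the example needs only one library.

```python
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# The five categories named in the abstract.
CATEGORIES = ["illegal", "advertisement", "news", "bill", "recruitment"]

# Hypothetical toy corpus (four short messages per category), invented
# purely for illustration; it is not the paper's data set.
TOY_EMAILS = {
    "illegal": [
        "buy forged passports and fake identity documents",
        "untraceable stolen credit card numbers for sale",
        "forged documents and fake licences delivered fast",
        "stolen account passwords sold cheap",
    ],
    "advertisement": [
        "huge discount sale this weekend only",
        "limited offer buy one get one free",
        "new product launch with special discount price",
        "exclusive sale offer for loyal customers",
    ],
    "news": [
        "government announces new policy on education",
        "breaking news earthquake reported in the region",
        "election results announced by officials today",
        "daily news summary of world events",
    ],
    "bill": [
        "your electricity bill for this month is due",
        "invoice attached please pay the amount due",
        "credit card statement and payment due date",
        "your monthly phone bill is ready to pay",
    ],
    "recruitment": [
        "we are hiring software engineers apply now",
        "job opening for marketing manager send resume",
        "interview invitation for the engineer position",
        "career opportunity apply with your resume",
    ],
}


def build_model(random_state=0):
    """Return a TF-IDF + stacking pipeline (hyperparameters are assumptions)."""
    base_classifiers = [
        ("linear_svc", LinearSVC(random_state=random_state)),
        # Stand-in for XGBoost, which the abstract names as a base classifier.
        ("boosting", GradientBoostingClassifier(n_estimators=20,
                                                random_state=random_state)),
        ("logreg", LogisticRegression(max_iter=1000)),
    ]
    stack = StackingClassifier(
        estimators=base_classifiers,
        # The meta classifier is trained on cross-validated base outputs.
        final_estimator=RandomForestClassifier(n_estimators=50,
                                               random_state=random_state),
        cv=2,  # two folds, because the toy corpus has only 4 samples per class
    )
    return make_pipeline(TfidfVectorizer(), stack)


texts = [text for cat in CATEGORIES for text in TOY_EMAILS[cat]]
labels = [cat for cat in CATEGORIES for _ in TOY_EMAILS[cat]]

model = build_model()
model.fit(texts, labels)

sample = ["please pay your electricity bill before the due date"]
predictions = model.predict(sample)
probabilities = model.predict_proba(sample)  # one column per category
print(predictions[0])
```

With a corpus this small the predicted label is not meaningful; the sketch only shows how the components fit together. Per-category AUC and AP values such as those reported in the abstract would require a held-out test set and are not computed here.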
