Article Details

Journal of Computers (EI, MEDLINE, Scopus)

Title: Applying the Chi-Square Test to Improve the Performance of the Decision Tree for Classification by Taking Baseball Database as an Example
Volume/Issue: 29:6
Authors: Chia-En Li, Ye-In Chang
Pages: 001-015
Keywords: chi-square test, classification, data mining, decision tree, significant factor
Publication date: December 2018
DOI: 10.3966/199115992018122906001

English Abstract

The chi-square test is a statistical test well suited to determining whether a categorical variable A is a significant factor for a categorical variable B. A decision tree, in turn, is a useful model for data classification. To achieve efficient knowledge discovery with a compact decision tree, we propose a method that uses the result of the chi-square test to reduce the number of attributes considered. As a preprocessing step, we use the P-value from the chi-square test to identify the significant factors and prune the insignificant ones before constructing the decision tree, which avoids building an inaccurate tree. We illustrate the method on a public baseball database. Our performance study shows that checking the most significant factor (i.e., the factor with the minimum P-value) first reduces the number of conditions (i.e., levels) to be decided. As a result, the compact decision tree constructed by our method offers lower storage cost, faster prediction, and higher classification accuracy than a decision tree built on all of the original factors.
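
The preprocessing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`chi2_sf`, `chi_square_p_value`, `significant_factors`), the 0.05 significance threshold, and the toy records (attributes `venue` and `day`, target `result`) are all assumptions made for illustration, and the data are invented rather than taken from the baseball database. The sketch computes a Pearson chi-square P-value for each categorical attribute against the class label, keeps the attributes below the threshold, and orders them by ascending P-value so that the most significant factor would be checked first. Tree construction itself is left out.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite series over odd double factorials.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def chi_square_p_value(rows, attribute, target):
    """P-value of the Pearson chi-square test of independence."""
    observed = Counter((r[attribute], r[target]) for r in rows)
    attr_totals = Counter(r[attribute] for r in rows)
    target_totals = Counter(r[target] for r in rows)
    n = len(rows)
    stat = 0.0
    for a, a_count in attr_totals.items():
        for t, t_count in target_totals.items():
            expected = a_count * t_count / n
            stat += (observed[(a, t)] - expected) ** 2 / expected
    df = (len(attr_totals) - 1) * (len(target_totals) - 1)
    # A constant attribute (df == 0) carries no information about the target.
    return chi2_sf(stat, df) if df > 0 else 1.0


def significant_factors(rows, attributes, target, alpha=0.05):
    """Return attributes with P-value < alpha, most significant first."""
    p_values = {a: chi_square_p_value(rows, a, target) for a in attributes}
    kept = [a for a in attributes if p_values[a] < alpha]
    return sorted(kept, key=p_values.get), p_values


def make_rows(venue, day, result, count):
    """Build `count` identical toy records (hypothetical data)."""
    return [{"venue": venue, "day": day, "result": result} for _ in range(count)]


# Invented records: `venue` is associated with `result`, `day` is not.
rows = (
    make_rows("home", "odd", "win", 4) + make_rows("home", "even", "win", 4)
    + make_rows("home", "odd", "loss", 1) + make_rows("home", "even", "loss", 1)
    + make_rows("away", "odd", "win", 1) + make_rows("away", "even", "win", 1)
    + make_rows("away", "odd", "loss", 4) + make_rows("away", "even", "loss", 4)
)

factors, p_values = significant_factors(rows, ["venue", "day"], "result")
print(factors)   # attributes that survive pruning, in checking order
print(p_values)  # P-value per attribute
```

On these toy records only `venue` survives the pruning step, so a tree built afterwards would never test `day`. In practice a library routine such as `scipy.stats.chi2_contingency` could replace the hand-written statistic; the closed-form tail probability is used here only to keep the sketch dependency-free.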
