Article Details

Journal of Computers (indexed in EI, MEDLINE, Scopus)

Title: Event-based Feature Synthesis: Autonomous Data Science Engine
Volume/Issue: Vol. 30, No. 2
Authors: Thirat Limsurut, Warasinee Chaisangmongkon
Pages: 55-67
Keywords: automated machine learning; classification; data science; feature engineering
Publication date: April 2019
DOI: 10.3966/199115992019043002005

Abstract

In this paper, we develop an autonomous data science tool that allows artificial intelligence to solve data science problems. Data exploration and feature extraction are the most time-consuming steps in a data science process. We develop an event-based feature synthesis algorithm that automatically recognizes relationships between the entities and events present in the data, extracts candidate features using statistical and mathematical functions, and retains only features of high importance. Our algorithm can generate features for data science problems with single or multiple data tables and use them to fit a random forest classifier. To test the robustness of our autonomous data science engine (ADE) framework against the well-established Deep Feature Synthesis (DFS) framework, we entered our data science bot in public data science challenges and assessed the usefulness of its feature sets. ADE achieves high accuracy in several competitions; for example, in the Employee Access Challenge (Kaggle, 2013) it predicts targets with accuracy as high as 89.5%, beating 74% of human participants. In MOOC dropout prediction (KDD 2015), features from ADE augment features from the DFS framework and improve accuracy from 85.3% to 86.3%.
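The feature-synthesis idea summarized above (aggregate events onto the entities they belong to, then keep only informative features) can be illustrated with a minimal Python sketch. It assumes one parent entity table and one child event table linked by a shared key, and uses a hypothetical set of aggregation primitives (event count, sum, mean, max, min, population standard deviation). The importance filter here is a simple absolute Pearson-correlation threshold against a binary label, a stand-in chosen so the example needs only the standard library; the abstract does not specify ADE's actual filtering criterion. All names (`synthesize_features`, `select_features`, `pearson`, `AGGREGATIONS`) are hypothetical and are not part of ADE.

```python
"""Hypothetical sketch of event-based feature aggregation (not the ADE implementation)."""
from collections import defaultdict
from math import sqrt
from statistics import mean, pstdev

# Assumed aggregation primitives applied to each numeric event field.
AGGREGATIONS = {"sum": sum, "mean": mean, "max": max, "min": min, "std": pstdev}


def synthesize_features(entities, events, key, fields):
    """Aggregate child events onto parent entities via the shared `key`.

    Returns {entity_key: {feature_name: value}}.
    """
    grouped = defaultdict(list)
    for ev in events:
        grouped[ev[key]].append(ev)

    features = {}
    for ent in entities:
        evs = grouped.get(ent[key], [])
        row = {"count(events)": float(len(evs))}
        for field in fields:
            values = [ev[field] for ev in evs]
            for name, fn in AGGREGATIONS.items():
                # Entities with no events get 0.0 instead of failing on empty input.
                row[f"{name}({field})"] = float(fn(values)) if values else 0.0
        features[ent[key]] = row
    return features


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_features(features, labels, threshold=0.3):
    """Stand-in importance filter: keep features with |r| >= threshold."""
    if not features:
        return []
    keys = list(labels)
    names = next(iter(features.values())).keys()
    ys = [labels[k] for k in keys]
    return [n for n in names
            if abs(pearson([features[k][n] for k in keys], ys)) >= threshold]


if __name__ == "__main__":
    users = [{"user_id": 1}, {"user_id": 2}, {"user_id": 3}, {"user_id": 4}]
    logs = [
        {"user_id": 1, "duration": 30.0},
        {"user_id": 1, "duration": 45.0},
        {"user_id": 2, "duration": 5.0},
        {"user_id": 3, "duration": 50.0},
        {"user_id": 3, "duration": 40.0},
    ]
    dropped_out = {1: 0, 2: 1, 3: 0, 4: 1}
    feats = synthesize_features(users, logs, "user_id", ["duration"])
    print(select_features(feats, dropped_out))
```

The retained columns could then be passed to a random forest classifier, for example scikit-learn's `RandomForestClassifier`, whose `feature_importances_` attribute offers a model-based alternative to the correlation filter used in this sketch.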
