Article Details

電子商務學報 TSSCI

Title: 高效率之遞增式資料探勘演算法-ICI
Volume/Issue: 8:3
Parallel title: An Efficient Incremental Mining Algorithm-ICI
Authors: 黃仁鵬, 錢依佩, 郭煌政
Pages: 393-414
Keywords: Data Mining, Association Rule, Apriori Algorithm, Frequent Itemsets, Incremental Mining
Publication date: 2006-09

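Chinese Abstract (translated)

Advances in information technology and the spread of computers have made collecting data easier, faster, and more convenient. Over time, however, databases accumulate large volumes of data that contain hidden knowledge, so mining that knowledge correctly and efficiently has become an important problem, and data mining techniques arose in response. Among them, association rule mining is the most widely used; it studies how to find frequent itemsets in a large database and derive useful knowledge from them. The most commonly used method for association rules is the Apriori algorithm. Although it can find association rules, it has two major drawbacks: first, it generates a large number of candidate itemsets while searching for frequent itemsets; second, it must scan the entire database repeatedly, which makes execution inefficient. Many later studies have tried to address these drawbacks, but none departed from the overall framework of the Apriori algorithm, so their efficiency gains were limited. The ICI algorithm proposed in this study departs from the Apriori framework: it scans the database only once when generating large itemsets, which reduces I/O access time, finds association rules quickly, and makes mining more efficient. In addition, the ICI algorithm can serve as an on-line incremental data mining algorithm without any modification.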
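The two Apriori drawbacks named in the abstract (candidate generation and repeated database scans) can be seen in a minimal sketch of the textbook level-wise procedure. This is an illustration only, not code from the paper; the function name `apriori`, the absolute-count `min_support`, and the `scans` counter are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Textbook level-wise Apriori (illustrative sketch).

    Returns (frequent itemsets with their counts, number of counting passes).
    min_support is an absolute transaction count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    scans = 0
    frequent = {}
    # Level-1 candidates: every distinct item in the database.
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        # Drawback 2: one full counting pass over the database per level.
        scans += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Drawback 1: candidate generation. Join frequent k-itemsets into
        # (k+1)-itemsets, then prune any whose k-subsets are not all frequent.
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent, scans
```

On a database whose longest frequent itemset has size k, this sketch makes up to k + 1 counting passes, which is the I/O cost the abstract refers to.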

English Abstract

Owing to advances in information technology and the popularization of computers, collecting information has become easier, faster, and more convenient than before. Over time, databases accumulate huge amounts of information that conceal knowledge. How to uncover and mine this hidden knowledge correctly and efficiently has therefore become an important issue, and data mining has emerged as one of the solutions. Among data mining techniques, association rule mining is one of the most popular. It explores approaches for extracting frequent itemsets from a large database and deriving the knowledge implicit in them. The Apriori algorithm is one of the most frequently used algorithms. Although it can successfully derive association rules from a database, it has two major defects. First, it produces a large number of candidate itemsets while extracting frequent itemsets from a large database. Second, it scans the whole database many times, which leads to poor performance. Many studies have tried to improve the performance of the Apriori algorithm, but they have not escaped its framework and have achieved only small improvements. In this paper we propose ICI (Incremental Combination Itemsets), which escapes the framework of the Apriori algorithm and needs to scan the whole database only once while extracting frequent itemsets from a large database. The ICI algorithm therefore reduces I/O time, extracts frequent itemsets rapidly, and makes data mining more efficient. Moreover, the ICI algorithm does not need to rescan the database or reconstruct its data structure when the database is updated or the minimum support changes. It can therefore be applied to online incremental mining applications without any modification.
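The abstract does not describe ICI's data structure, so the following is not the authors' algorithm. It is only a hedged sketch of one simple way the stated properties (a single pass over the data, no rescan when transactions arrive or the minimum support changes) could be obtained: count every item combination of each transaction as it is read. The class name `CombinationCounter` and the `max_size` cap are hypothetical choices for this illustration.

```python
from collections import Counter
from itertools import combinations


class CombinationCounter:
    """Hypothetical single-pass itemset counter (NOT the paper's ICI structure)."""

    def __init__(self, max_size=3):
        # Assumed cap on itemset size; without it the number of combinations
        # grows exponentially with transaction length.
        self.max_size = max_size
        self.counts = Counter()

    def add(self, transaction):
        """Fold one transaction into the counts; it is read exactly once."""
        items = sorted(set(transaction))
        for k in range(1, min(self.max_size, len(items)) + 1):
            for combo in combinations(items, k):
                self.counts[combo] += 1

    def frequent(self, min_support):
        """Filter stored counts by an absolute threshold, without rescanning."""
        return {c: n for c, n in self.counts.items() if n >= min_support}
```

Under these assumptions, an incremental update is just another call to `add`, and querying with a different `min_support` reuses the stored counts. The trade-off is memory: the counter keeps infrequent combinations as well, which is why the sketch bounds the itemset size.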
