文章詳目資料

電子商務學報 TSSCI

  • 加入收藏
  • 下載文章
篇名 高效率探勘關聯規則之演算法_GRA
卷期 8:4
並列篇名 An Efficient Algorithm for Mining Association Rules —GRA
作者 黃仁鵬藍國誠
頁次 469-498
關鍵字 資料探勘關聯法則階段縮減機制data miningassociation rulesthe gradation reduction mechanismsTSSCI
出刊日期 200612

中文摘要

資料探勘的技術變得日益重要,目前已廣泛應用在商業上的預測及決策的支援。 其中,關聯法則在資料探勘領域中,扮演著相當重要的角色,因此,許多探勘關聯法 則演算法不斷被提出及改進,主要目的在於增加執行效能或提升記憶體使用率。本研 究也朝著這個目標,試著改進關聯規則演算法為主要方向。 本研究主要是針對探勘關聯規則GDA演算法的特性及缺點來加以改進,雖然 GDA演算法已是很有效率的演算法之一,不過,它還是有兩個最主要的問题。第一, GDA無法探勘交易長度較長的交易資料庫。第二,GDA演算法在某階段須儲存大量 非必要的項目集時,將導致記憶體使用率會呈現不佳狀態;因此,GDA演算法的實用 性大打折扣。 基本於上述理由,本研究提出一個改良於GDA演算法的階段拆解核心的新演算 法GRA ( Gradation Reduction Approaches )。GRA演算法主要的特色就是加入階段縮 減交易長度機制,因為該縮減機制可大量減少非頻繁項目集的數量,將可更適用於探 勘交易長度較長的資料庫。在執行過程中,GRA演算法不需要產生任何候選項目集, 即可快速找出關聯規則。除此之外,GRA演算法透過階段縮減機制僅會產生可能成為 頻繁的項目集,所以,不需耗費龐大記憶體空間來儲存項目集,可有效提昇記憶體的 利用率。在現實生活中的資料庫容量通常都是大於記憶體容量,為了解決此問题,本 研究也以GRA演算法為主题提出另一修正版GAR-M演算法,GAR-M演算法將選擇 探用資料庫分割方式繼續執行探勘任務,每個子資料庫僅需進行四次實體I/O動作, 不隨著頻繁項目集的長度增長而增加實體I/O的次數,以避免耗費過多的I/O時間, 也可有效提高執行效率與實用性。

英文摘要

The technology of data mining becomes important in recent years, and it is generally applied to commercial forecast and decision supports. Association rules mining algorithms play the important role in the field of data mining. Many of association rules mining algorithms were proposed to improve the efficiency of data mining or save the utility rate of memory. Our major study also tries to improve the efficiency of association rules mining algorithms. In this paper, our major study is to improve the defects of the GDA algorithm. Although GDA algorithm was one of the most efficient algorithms, but it still has two serious problems; in the first place, GDA algorithm can^t mine the transactions of databases whose record length is very long; in the second place, GDA algorithm isrft very efficient at utility rate of the memory when it must store lots of unnecessary itemsets at one phase. Therefore, the GDA algorithm is not very practical. Based on above the reasons we propose a new algorithm - GRA (Gradation Reduction Approaches) that is improved from GDA algorithm. One of the characters of the GRA algorithm is the gradation reduction mechanisms because it can reduce lots of infrequent itemsets; the GRA algorithm is very suitable to mine the transactions of databases whose record length is very long. In the mining procedure, the GRA algorithm doesn't generate any candidate itemset to find association rules quickly. Besides, the GRA algorithm through gradation reduction mechanisms only generate those itemsets which are the most possible to be the frequent itemsets. So, the GRA algorithm canft waste lots of memory spaces to store infrequent itemsets; it can efficiently increase the utility rate of memory. The size of the databases in the real world is always greater than the size of the memory. In order to solve this problem, we propose a modifying algorithm - GRA-M (Gradation Reduction Approaches -Modifying); it divides a large database int o many sub-databases and mines association rules from those sub-databases. The GRA-M algorithm only scans database four times and will not be affect by the length of frequent itemsets. The GRA-M algorithm avoids wasting a lot of I/O time and increases the efficiency and the practicability in application.

相關文獻