HyRead Journal Taiwan Full-Text Database

Article Details

資訊管理展望 (literally "Information Management Prospects")

Social Sciences / Management

Title	從雜訊資料中探勘相似性頻繁項目集 (Mining Approximate Frequent Itemsets from Noisy Data)
Vol.:Issue	10:1
Parallel title	TAFI: An Efficient Algorithm for Mining Approximate Frequent Itemsets from Noisy Data
Authors	趙景明, 郭芳甄
Pages	89-110
Keywords	Data mining, Association rules, Frequent itemset mining, Approximate frequent itemsets, Noisy data
Publication date	June 2008

Chinese Abstract (English translation)

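… mine the frequent patterns in the data. Traditional frequent itemset mining has relied on
an exact mining model, which is ill-suited to real data because real data often contain noise.
Applying the exact model to real data therefore fails to produce the correct frequent itemsets,
and incorrect mining results in turn lead to incorrect decisions.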
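To make the failure of exact-mode mining concrete, here is a minimal Python sketch on a made-up five-basket dataset. The data, the function names `exact_support` and `exact_frequent_itemsets`, and the minimum count of 3 are all hypothetical and not taken from the paper. Exact support counts only baskets that contain every item of the itemset, so when noise drops one item from two baskets, {a, b, c} falls below the threshold even though the pairs {a, b} and {b, c} still pass.

```python
from itertools import combinations


def exact_support(transactions, itemset):
    """Count transactions containing every item of `itemset` (exact mode)."""
    target = set(itemset)
    return sum(1 for t in transactions if target <= set(t))


def exact_frequent_itemsets(transactions, min_count, max_size=3):
    """Brute-force enumeration of itemsets whose exact support >= min_count."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = exact_support(transactions, combo)
            if s >= min_count:
                result[combo] = s
    return result


# Clean toy data: {a, b, c} is bought together in four baskets.
clean = [["a", "b", "c"], ["a", "b", "c"], ["a", "b", "c"], ["a", "b", "c"], ["d"]]
# Same data with noise: one item is missing from two baskets (recording errors).
noisy = [["a", "b", "c"], ["a", "b"], ["a", "b", "c"], ["b", "c"], ["d"]]

print(exact_support(clean, ("a", "b", "c")))  # 4
print(exact_support(noisy, ("a", "b", "c")))  # 2
print(("a", "b", "c") in exact_frequent_itemsets(noisy, 3))  # False
```

Brute-force enumeration is used only for readability; practical miners such as Apriori or FP-growth prune the search space instead of testing every combination.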
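In recent years, researchers have studied how to extract frequent itemsets from noisy data;
however, their methods mine inefficiently when the data form a sparse matrix. In view of this,
this study proposes a new mining algorithm called TAFI (Trie Approximate Frequent Itemsets).
TAFI uses a reduced database ("Reduced basket") to substantially decrease the space required
during mining and to speed up computation. In addition, TAFI adopts a trie data structure to
make mining frequent items more efficient, and it uses item occurrence frequencies to reduce
the number of candidate items. Experimental results show that TAFI runs more efficiently than
the other algorithms compared and maintains good performance across different types of data.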
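The abstract names three ingredients (a reduced basket database, a trie, and frequency-based pruning of candidate items) without detailing them. The Python sketch below shows one plausible way such pieces can fit together; it is not the authors' TAFI implementation. It assumes that "reducing" a basket means dropping items whose overall frequency is below a hypothetical `min_item_count` and merging identical baskets with a multiplicity, and it stores the result in a prefix trie similar in spirit to an FP-tree, where each node counts the baskets sharing that prefix. All names here (`TrieNode`, `build_reduced_baskets`, `build_trie`) are hypothetical.

```python
from collections import Counter


class TrieNode:
    """Prefix-trie node; `count` = number of reduced baskets sharing this prefix."""

    def __init__(self):
        self.children = {}
        self.count = 0


def build_reduced_baskets(transactions, min_item_count):
    """Hypothetical 'reduced basket' step.

    Drops items whose overall frequency is below min_item_count, orders the
    remaining items by descending frequency (ties broken by name), and merges
    identical baskets into a Counter mapping basket -> multiplicity.
    """
    freq = Counter(item for t in transactions for item in set(t))
    keep = {item for item, c in freq.items() if c >= min_item_count}
    reduced = Counter(
        tuple(sorted((i for i in set(t) if i in keep), key=lambda x: (-freq[x], x)))
        for t in transactions
    )
    reduced.pop((), None)  # baskets emptied by pruning carry no information
    return reduced, freq


def build_trie(reduced):
    """Insert each distinct reduced basket once, weighted by its multiplicity."""
    root = TrieNode()
    for basket, multiplicity in reduced.items():
        node = root
        for item in basket:
            node = node.children.setdefault(item, TrieNode())
            node.count += multiplicity
    return root


transactions = [["a", "b", "c"], ["a", "b"], ["a", "b", "c"], ["b", "c"], ["d"]]
reduced, freq = build_reduced_baskets(transactions, min_item_count=2)
root = build_trie(reduced)
print(root.children["b"].count)  # 4: every surviving basket starts with b
```

Ordering items by descending frequency lets common prefixes be shared, which is what keeps such a trie compact. For noise-tolerant mining the pruning threshold must be looser than the target support: if, as assumed in AFI-style definitions, each item of an approximate itemset only has to appear in a (1 − ε_c) fraction of its supporting transactions, then an item can qualify only if it occurs in at least (1 − ε_c)·minsup·|D| transactions, which makes that bound a safe choice for `min_item_count`.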

English Abstract

To discover association rules, frequent itemset mining finds items that frequently appear
together in a dataset. Traditional frequent itemset mining operates in an “exact” mode, which
is not appropriate for real data because real data often contain noise. Mining noisy data in
exact mode cannot generate the correct frequent itemsets and may eventually lead to incorrect
decisions.
In recent years, many researchers have studied how to discover frequent itemsets from
noisy data. However, existing methods can become inefficient when the dataset is sparse, so
they are not suited to every kind of dataset. In this paper, we propose a new algorithm, called
TAFI, for mining approximate frequent itemsets. TAFI not only discovers approximate frequent
itemsets from noisy data correctly and efficiently, but also performs well on sparse datasets.
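The abstract does not spell out what "approximate" means. One common formulation in the noisy-itemset literature, assumed here and possibly differing from the paper's, uses a row tolerance ε_r and a column tolerance ε_c: itemset I is approximately frequent if there is a set T of at least `min_count` transactions in which every transaction contains at least (1 − ε_r)·|I| items of I, and every item of I appears in at least (1 − ε_c)·|T| transactions of T. The Python sketch below (function name `is_approx_frequent` is hypothetical) tests this by choosing T as all transactions that meet the row tolerance. This is a sufficient check only: when it fails, some smaller T might still satisfy the column tolerance.

```python
def is_approx_frequent(transactions, itemset, min_count, eps_r, eps_c):
    """Sufficient test for an AFI-style approximate frequent itemset (assumed definition).

    T is taken to be every transaction meeting the row tolerance. If T is large
    enough and also meets the column tolerance, the itemset qualifies. A False
    result does not rule out some smaller subset of T qualifying.
    """
    items = set(itemset)
    # Row tolerance: each supporting transaction may miss up to eps_r of the items.
    support_rows = [
        set(t) for t in transactions
        if len(items & set(t)) >= (1 - eps_r) * len(items)
    ]
    if len(support_rows) < min_count:
        return False
    # Column tolerance: each item may be absent from up to eps_c of the rows in T.
    return all(
        sum(item in row for row in support_rows) >= (1 - eps_c) * len(support_rows)
        for item in items
    )


# Toy noisy data: {a, b, c} with one item missing from two baskets.
noisy = [["a", "b", "c"], ["a", "b"], ["a", "b", "c"], ["b", "c"], ["d"]]

print(is_approx_frequent(noisy, ("a", "b", "c"), 3, 0.0, 0.0))  # False (exact mode)
print(is_approx_frequent(noisy, ("a", "b", "c"), 3, 0.4, 0.4))  # True
```

With ε_r = ε_c = 0 the check reduces to exact-mode support counting, which is why data that defeat exact mining can still yield {a, b, c} once modest tolerances are allowed.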
