Article Details

電子商務學報 TSSCI

Title: 混合型資料集的 k-means 分群演算法
Volume/Issue: 19:1
Parallel title: A k-means Based Clustering Algorithm for Mixed-Attribute Data Sets
Authors: 黃宇翔, 王品鈞, 方志強
Pages: 001-028
Keywords: Clustering analysis, k-means, ordinal attribute, distance measure, TSSCI
Publication date: June 2017

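Abstract (translated from Chinese)

Cluster analysis is one of the grouping techniques used in data mining. With the rapid development of the network environment, the types and number of data attributes have grown considerably, which sharply degrades the performance of traditional clustering techniques; the traditional k-means method struggles to cope. Subsequent research has therefore focused on handling numerical, categorical, and ordinal attribute data. Building on the k-means distance measure defined by Ahmad and Dey (2007), this study performs cluster analysis on data sets in which all three attribute types are present, applying a separate distance definition to each type, and proposes a genetic algorithm to find the combination of cluster centers that gives the best evaluation index. We hope the method can be widely applied to solve the practical clustering difficulties caused by data that mix the three attribute types.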

English Abstract

Clustering is one of the most important analysis methods in data mining. With the rapid development of network technology, the growing variety of data attributes and the large number of data items make clustering substantially less efficient. Among clustering approaches, partitioning clustering is relatively easy to implement and faster to run than the others. Handling different types of data attributes, however, complicates clustering. Most of the literature focuses either on numerical and categorical attributes or on ordinal attributes alone, and the results are less than satisfactory in terms of accuracy and execution time. The proposed clustering approach, based on the k-means method of Ahmad and Dey (2007), handles three attribute types simultaneously: numerical, categorical, and ordinal. Euclidean distance defines the numerical similarity, the frequency of each value’s rank indicates the categorical similarity, and the normalized distance measures the ordinal similarity. The effectiveness of the approach is evaluated with an essential criterion of clustering: minimizing the ratio of the within-cluster errors to the between-cluster errors. A genetic algorithm is also developed to reduce the execution time when clustering all three attribute types at once. We hope the proposed method provides a useful clustering technique for practical applications.
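
As a rough illustration of combining per-type distances on one record, the following Python sketch may help. It is not the paper's method: the function name `mixed_distance`, its parameters, and the unweighted sum are assumptions made for illustration, and the categorical term uses a plain mismatch count as a stand-in for the frequency-based measure described in the abstract.

```python
import math


def mixed_distance(a, b, numeric_idx, categorical_idx, ordinal_levels):
    """Distance between two records with numeric, categorical, and ordinal fields.

    Hypothetical helper: `ordinal_levels` maps a column index to the ordered
    list of levels for that column (lowest first).
    """
    # Numerical part: Euclidean distance over the numeric columns.
    numeric = math.sqrt(sum((a[i] - b[i]) ** 2 for i in numeric_idx))

    # Categorical part: simple mismatch count (assumed stand-in, not the
    # frequency-based measure used in the paper).
    categorical = sum(1 for i in categorical_idx if a[i] != b[i])

    # Ordinal part: rank difference normalized to [0, 1] for each column.
    ordinal = 0.0
    for i, levels in ordinal_levels.items():
        span = len(levels) - 1
        if span > 0:
            ordinal += abs(levels.index(a[i]) - levels.index(b[i])) / span

    # Assumption: the three parts are summed without weights.
    return numeric + categorical + ordinal


record_a = (1.0, 5.0, "red", "low")
record_b = (4.0, 9.0, "blue", "high")
levels = {3: ["low", "medium", "high"]}
print(mixed_distance(record_a, record_b, [0, 1], [2], levels))
```

In practice the numeric columns would usually be scaled first so that no single attribute type dominates the sum; how the paper balances the three terms is not stated in the abstract.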
