Article Details

Journal of Computers (EI, MEDLINE, Scopus)

Title: Canopy-MMD Text Clustering Algorithm Based on Simulated Annealing and Canopy Optimization
Volume/Issue: 34:1
Authors: Jun-Wu Zhai, Yu-Chen Tian, Wen-Tao Li, Kun Liang
Pages: 075-086
Publication date: February 2023
DOI: 10.53106/199115992023023401006

Abstract

Traditional K-means text clustering cannot determine the number of clusters automatically and is sensitive to the initial cluster centers. To address both problems, this paper proposes a Canopy-MMD text clustering algorithm based on simulated annealing and silhouette coefficient optimization. The algorithm combines simulated annealing with the silhouette coefficient to optimize the Canopy algorithm and find the optimal number of clusters, then uses that number to determine the scale coefficient in the MMD algorithm, which yields a better text clustering result. Experiments on the Sohu News dataset from Sogou Lab compare the proposed algorithm with traditional K-means and with algorithms from the literature. The results show that the proposed algorithm outperforms both. Compared with traditional K-means, accuracy, precision, recall, and F value improve by 8.02%, 8.91%, 8.02%, and 9.51%, respectively. The algorithm can be applied in fields such as text mining, knowledge graphs, and natural language processing.
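
The abstract does not give implementation details, so the following is only a minimal sketch of the first stage it describes: a simulated annealing search over a Canopy distance threshold, scored by the mean silhouette coefficient, to pick the number of clusters. Everything concrete here is an assumption made for illustration and is not taken from the paper: the single-threshold Canopy variant, the function names (`canopy_centers`, `assign`, `silhouette`, `anneal_threshold`), the cooling schedule, and the use of 2-D toy points in place of text feature vectors. The MMD stage and the final clustering step are omitted.

```python
import math
import random


def canopy_centers(points, t2):
    """Simplified Canopy pass (assumed variant): take the first remaining
    point as a center, then discard every remaining point within t2 of it."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def silhouette(points, labels):
    """Mean silhouette coefficient; -1.0 for degenerate clusterings."""
    k = len(set(labels))
    if k < 2 or k >= len(points):
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        sums, counts = {}, {}
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] = sums.get(labels[j], 0.0) + math.dist(p, q)
                counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if own not in counts:
            continue  # a singleton cluster contributes 0
        a = sums[own] / counts[own]                             # cohesion
        b = min(sums[c] / counts[c] for c in sums if c != own)  # separation
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def anneal_threshold(points, t_lo, t_hi, temp=1.0, cooling=0.9, steps=60, seed=0):
    """Simulated annealing over the Canopy threshold in [t_lo, t_hi].
    Returns (best silhouette, best threshold, number of canopies)."""
    rng = random.Random(seed)

    def score(t2):
        centers = canopy_centers(points, t2)
        return silhouette(points, assign(points, centers)), len(centers)

    cur = (t_lo + t_hi) / 2
    cur_s, cur_k = score(cur)
    best = (cur_s, cur, cur_k)
    for _ in range(steps):
        step = rng.gauss(0, 0.2 * (t_hi - t_lo))
        cand = min(t_hi, max(t_lo, cur + step))
        s, k = score(cand)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature cools.
        if s >= cur_s or rng.random() < math.exp((s - cur_s) / temp):
            cur, cur_s = cand, s
            if s > best[0]:
                best = (s, cand, k)
        temp *= cooling
    return best


# Toy data: three tight, well-separated groups of five points each.
offsets = [(0, 0), (0.3, 0), (0, 0.3), (-0.3, 0), (0, -0.3)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in offsets]
score, t2, k = anneal_threshold(points, t_lo=0.05, t_hi=8.0)
print(f"clusters={k}, threshold={t2:.2f}, silhouette={score:.3f}")
```

On this toy data the search settles on three canopies, one per group. In a text clustering setting the points would instead be document feature vectors, and the distance function would likely need to change accordingly; both choices are outside what the abstract specifies.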
