Title | A Method of Detecting Approximate Repetitive News Documents |
---|---|
Volume:Issue | 29:2 |
Authors | Xueping Liang, Xiaojun Wen |
Pages | 104-109 |
Keywords | approximate repetition of documents, document clusters, multi-feature fingerprint clusters, EI, MEDLINE, Scopus |
Publication date | 2018-04 |
DOI | 10.3966/199115992018042902011 |
Given the large number of duplicated webpages on the Internet, this paper proposes an algorithm and system for detecting approximately duplicate webpages that combines multi-feature fingerprint cluster detection with document similarity detection. In this scheme, multi-feature fingerprint cluster detection is applied first to ensure the precision and efficiency of the algorithm; for the small portion of documents that this step fails to recall, an approximate-duplicate detection algorithm based on document similarity is then used to guarantee the recall rate. The scheme improves both precision and recall rate while keeping a good balance with performance.
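The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which features form the fingerprints or which similarity measure is used. The sketch assumes two hypothetical features (the normalized title and the longest sentence of the body), uses Jaccard similarity over word 3-shingles as a stand-in for document similarity detection, and picks an arbitrary threshold of 0.5. All function names are illustrative.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def fingerprints(doc):
    """Hypothetical multi-feature fingerprints for a (title, body) pair.

    Assumed features: the normalized title and the longest sentence.
    """
    title, body = doc
    sentences = [normalize(s) for s in re.split(r"[.!?]", body)]
    features = {"title": normalize(title), "longest": max(sentences, key=len)}
    return {
        (name, hashlib.md5(value.encode("utf-8")).hexdigest())
        for name, value in features.items()
        if value  # skip empty features so blank fields never match
    }


def shingles(text, k=3):
    """Set of word k-grams; a stand-in representation for similarity."""
    words = normalize(text).split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_duplicates(docs, threshold=0.5):
    """Return (doc_index, duplicate_of_index) pairs for a list of documents."""
    seen = {}        # fingerprint -> index of the first document carrying it
    kept = []        # indices of documents judged unique so far
    duplicates = []
    for i, doc in enumerate(docs):
        fps = sorted(fingerprints(doc))
        # Stage 1: cheap exact match on any shared fingerprint.
        match = next((seen[f] for f in fps if f in seen), None)
        # Stage 2: similarity fallback for documents stage 1 did not recall.
        if match is None:
            sh = shingles(doc[1])
            match = next(
                (j for j in kept if jaccard(sh, shingles(docs[j][1])) >= threshold),
                None,
            )
        if match is None:
            kept.append(i)
            for f in fps:
                seen[f] = i
        else:
            duplicates.append((i, match))
    return duplicates
```

In this sketch the fingerprint lookup is a dictionary hit per feature, so most duplicates are resolved without any pairwise comparison; only documents with no shared fingerprint pay for the slower similarity pass, which mirrors the precision-first, recall-second ordering the abstract describes.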