Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title: Similarity Based Chinese Synonym Collocation Extraction
Volume/Issue: 10:1
Authors: Li, Wanyin; Lu, Qin; Xu, Ruifeng
Pages: 123-143
Keywords: Lexical Statistics; Semantic Information; Similarity; Synonymous Collocations; THCI Core
Publication date: 2005-03

Abstract

Collocation extraction systems based on purely statistical methods suffer from two major problems: relatively low precision and recall rates, and difficulty in dealing with sparse collocations. To improve performance, both statistical and lexicographic approaches should be considered. This paper presents a new method for extracting synonymous collocations using semantic information, which is obtained by calculating similarities from HowNet. With this method, we have successfully extracted synonymous collocations that normally cannot be extracted using lexical statistics. Our evaluation, conducted on a 60MB tagged corpus, shows that we can extract synonymous collocations that occur with very low frequency and that the improvement in the recall rate is close to 100%. In addition, compared with a collocation extraction system based on the Xtract system for English, our algorithm can improve the precision rate by about 44%.
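
The general idea in the abstract, using word similarity to find collocations too rare for frequency statistics to catch, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the similarity table is a hypothetical toy stand-in for a HowNet-based measure, the English words stand in for Chinese lexical items, and the function names, threshold, and scoring by product of similarities are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy similarity scores; a real system would derive these
# from a semantic resource such as HowNet.
TOY_SIMILARITY = {
    ("improve", "enhance"): 0.9,
    ("improve", "raise"): 0.8,
    ("level", "standard"): 0.85,
}


def similarity(a, b):
    """Symmetric lookup in the toy table; identical words score 1.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get((a, b), TOY_SIMILARITY.get((b, a), 0.0))


def synonymous_collocations(seed, bigram_counts, vocabulary, threshold=0.75):
    """Return attested word pairs whose members are similar to the seed pair.

    A candidate is kept even if it occurs only once in the corpus, which is
    the case a pure frequency threshold would miss.
    """
    head, collocate = seed
    found = []
    for h in vocabulary:
        head_sim = similarity(head, h)
        if head_sim < threshold:
            continue
        for c in vocabulary:
            coll_sim = similarity(collocate, c)
            if coll_sim < threshold:
                continue
            candidate = (h, c)
            if candidate == seed:
                continue
            if bigram_counts.get(candidate, 0) > 0:
                # Assumed scoring: product of the two word similarities.
                found.append((candidate, head_sim * coll_sim))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(found, key=lambda item: (-item[1], item[0]))


# Toy corpus counts: one frequent seed collocation and two rare variants.
bigram_counts = Counter({
    ("improve", "level"): 50,
    ("enhance", "level"): 1,
    ("raise", "standard"): 1,
    ("eat", "level"): 3,
})
vocabulary = sorted({w for pair in bigram_counts for w in pair})

for pair, score in synonymous_collocations(
    ("improve", "level"), bigram_counts, vocabulary
):
    print(pair, round(score, 2))
```

In this toy setup the seed pair is frequent enough to be found statistically, while the two variants occur only once each and are recovered solely through their similarity to the seed. The unrelated pair ("eat", "level") is rejected despite its higher count.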

Related Literature