Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title: Corpus Cleanup of Mistaken Agreement Using Word Sense Disambiguation
Volume/Issue: 13:4
Authors: Yu, Liang-chih; Wu, Chung-hsien; Yeh, Jui-feng; Hovy, Eduard
Pages: 405-419
Keywords: Corpus Cleanup; Semantic Analysis; Word Sense Disambiguation; Entropy; THCI Core
Publication Date: December 2008

Chinese Abstract

English Abstract

Word sense annotated corpora are useful resources for many text mining applications. Such corpora are only useful if their annotations are consistent. Most large-scale annotation efforts take special measures to reconcile inter-annotator disagreement. To date, however, nobody has investigated how to automatically determine exemplars in which the annotators agree but are wrong. In this paper, we use OntoNotes, a large-scale corpus of semantic annotations, including word senses, predicate-argument structure, ontology linking, and coreference. To determine the mistaken agreements in word sense annotation, we employ word sense disambiguation (WSD) to select a set of suspicious candidates for human evaluation. Experiments are conducted from three aspects (precision, cost-effectiveness ratio, and entropy) to examine the performance of WSD. The experimental results show that WSD is most effective in identifying erroneous annotations for highly-ambiguous words, while a baseline is better for other cases. The two methods can be combined to improve the cleanup process. This procedure allows us to find approximately 2% of the remaining erroneous agreements in the OntoNotes corpus. A similar procedure can be easily defined to check other annotated corpora.
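
The general idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the selection rule, so the sketch assumes that an instance is "suspicious" when a WSD prediction differs from the agreed annotation, and that a word's ambiguity is measured by the Shannon entropy of its annotated sense distribution. The names `sense_entropy`, `suspicious_candidates`, `toy_predict`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def sense_entropy(sense_labels):
    """Shannon entropy (in bits) of a word's annotated sense distribution."""
    counts = Counter(sense_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def suspicious_candidates(instances, predict):
    """Return instances whose annotated sense differs from the WSD prediction.

    Assumption: disagreement between the classifier and the agreed
    annotation is what marks an instance for human re-evaluation.
    """
    return [inst for inst in instances if predict(inst["context"]) != inst["sense"]]


# Toy annotated instances for one word (hypothetical data, not from OntoNotes).
instances = [
    {"context": "deposit money in the bank", "sense": "bank.1"},
    {"context": "sat on the river bank", "sense": "bank.1"},
    {"context": "the bank raised interest rates", "sense": "bank.1"},
    {"context": "fishing from the bank of the stream", "sense": "bank.2"},
]


def toy_predict(context):
    """Stand-in for a trained WSD classifier (keyword rule, illustration only)."""
    return "bank.2" if ("river" in context or "stream" in context) else "bank.1"


flagged = suspicious_candidates(instances, toy_predict)
entropy = sense_entropy([inst["sense"] for inst in instances])

for inst in flagged:
    print("review:", inst["context"], "annotated as", inst["sense"])
print("sense entropy (bits):", round(entropy, 4))
```

In a real setting `toy_predict` would be replaced by a trained classifier, and the entropy value could be used to decide, per word, whether to rely on WSD or on a simpler baseline, in line with the abstract's finding that WSD helps most for highly ambiguous words.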

Related Literature