HyRead Journal Taiwan Full-Text Database

Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Natural Sciences / Information / Technology

Title	Corpus Cleanup of Mistaken Agreement Using Word Sense Disambiguation
Volume:Issue	13:4
Authors	Yu, Liang-chih; Wu, Chung-hsien; Yeh, Jui-feng; Hovy, Eduard
Pages	405-419
Keywords	Corpus Cleanup, Semantic Analysis, Word Sense Disambiguation, Entropy, THCI Core
Publication Date	2008-12

Word sense annotated corpora are useful resources for many text mining applications, but only if their annotations are consistent. Most large-scale annotation efforts take special measures to reconcile inter-annotator disagreement. To date, however, no one has investigated how to automatically identify instances in which the annotators agree but are wrong. In this paper, we use OntoNotes, a large-scale corpus of semantic annotations that includes word senses, predicate-argument structure, ontology linking, and coreference. To detect mistaken agreements in the word sense annotation, we employ word sense disambiguation (WSD) to select a set of suspicious candidates for human evaluation. We examine the performance of WSD from three perspectives: precision, cost-effectiveness ratio, and entropy. The experimental results show that WSD is most effective at identifying erroneous annotations for highly ambiguous words, whereas a baseline performs better in other cases. The two methods can be combined to improve the cleanup process. This procedure allows us to find approximately 2% of the remaining erroneous agreements in the OntoNotes corpus. A similar procedure can easily be defined to check other annotated corpora.
