Title | Corpus Cleanup of Mistaken Agreement Using Word Sense Disambiguation |
---|---|
Volume/Issue | 13:4 |
Authors | Yu, Liang-chih; Wu, Chung-hsien; Yeh, Jui-feng; Hovy, Eduard |
Pages | 405-419 |
Keywords | Corpus Cleanup; Semantic Analysis; Word Sense Disambiguation; Entropy; THCI Core |
Publication date | 2008-12 |
Word sense annotated corpora are useful resources for many text mining applications, but only if their annotations are consistent. Most large-scale annotation efforts take special measures to reconcile inter-annotator disagreement. To date, however, nobody has investigated how to automatically identify exemplars on which the annotators agree but are wrong. In this paper, we use OntoNotes, a large-scale corpus of semantic annotations that includes word senses, predicate-argument structure, ontology linking, and coreference. To detect mistaken agreements in word sense annotation, we employ word sense disambiguation (WSD) to select a set of suspicious candidates for human evaluation. We evaluate the performance of WSD from three aspects: precision, cost-effectiveness ratio, and entropy. The experimental results show that WSD is most effective at identifying erroneous annotations for highly ambiguous words, while a baseline is better in other cases. The two methods can be combined to improve the cleanup process. This procedure allows us to find approximately 2% of the remaining erroneous agreements in the OntoNotes corpus. A similar procedure can easily be defined to check other annotated corpora.
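
The candidate-selection idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the WSD model, the baseline, or how entropy is computed, so everything here is an assumption made for illustration. The function names (`sense_entropy`, `suspicious_candidates`, `toy_predict`), the keyword-based classifier, the sense labels, and the toy data are all hypothetical. The sketch flags instances whose agreed annotation differs from a WSD prediction and computes the Shannon entropy of a word's sense distribution as one possible measure of ambiguity.

```python
import math
from collections import Counter


def sense_entropy(senses):
    """Shannon entropy (in bits) of a list of sense labels for one word."""
    counts = Counter(senses)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def suspicious_candidates(instances, predict):
    """Return the instances whose annotated sense differs from the WSD prediction.

    These are only candidates for human review, not confirmed errors.
    """
    return [inst for inst in instances if predict(inst) != inst["sense"]]


def toy_predict(inst):
    """Hypothetical stand-in for a trained WSD classifier (keyword lookup)."""
    financial = {"loan", "deposit", "account"}
    words = set(inst["context"].split())
    return "bank.2" if financial & words else "bank.1"


# Toy annotated instances; sense labels and contexts are invented.
instances = [
    {"word": "bank", "context": "river bank erosion", "sense": "bank.1"},
    {"word": "bank", "context": "bank loan approved", "sense": "bank.2"},
    {"word": "bank", "context": "deposit at the bank", "sense": "bank.1"},
    {"word": "bank", "context": "steep bank of the stream", "sense": "bank.1"},
]

flagged = suspicious_candidates(instances, toy_predict)
entropy = sense_entropy([inst["sense"] for inst in instances])

for inst in flagged:
    print("review:", inst["context"], "annotated as", inst["sense"])
print("sense entropy (bits):", round(entropy, 4))
```

In this toy run only the third instance is flagged, because its annotation disagrees with the stand-in classifier. A real pipeline would replace `toy_predict` with a trained WSD system and pass the flagged instances to human evaluators.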