Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title: Aligning More Words with High Precision for Small Bilingual Corpora
Volume/Issue: 2:2
Authors: Ker, J. Sue; Chang, S. Jason
Pages: 63-95
Keywords: Word alignment; machine readable dictionary and thesaurus; bilingual corpus; word sense disambiguation; THCI Core
Publication date: 1997-08

Chinese Abstract

English Abstract

In this paper, we propose an algorithm for identifying, for each word in a sentence, its translations in the paired translation. Previously proposed methods require enormous amounts of bilingual data to train statistical word-by-word translation models. By taking a word-based approach, these methods align frequent words that have consistent translations with high precision. However, less frequent words, or words with diverse translations, generally lack statistically significant evidence for confident alignment. Consequently, incomplete or incorrect alignments occur. Here, we attempt to improve coverage using class-based rules. An automatic procedure for acquiring such rules is also described. Experimental results confirm that the algorithm can align over 85% of word pairs while maintaining a comparably high precision rate, even when a small corpus is used in training.

Related Literature