Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title: Learning to Find Translations and Transliterations on the Web based on Conditional Random Fields
Volume/Issue: 18:1
Authors: Chang, Joseph Z.; Chang, Jason S.; Jang, Jyh-Shing
Pages: 19-45
Keywords: Machine Translation; Cross-lingual Information Extraction; Wikipedia; Conditional Random Fields; THCI Core
Publication Date: 2013-03

Chinese Abstract

English Abstract

In recent years, state-of-the-art cross-linguistic systems have been based on parallel corpora. Nevertheless, it is sometimes difficult to find translations of a given technical term or named entity, even with a very large parallel corpus. In this paper, we present a new method for learning to find translations on the Web for a given term. In our approach, we use a small set of terms and translations to obtain mixed-code snippets returned by a search engine. We then automatically annotate the data with translation tags, automatically generate features to augment the tagged data, and automatically train a conditional random fields model for identifying translations. At runtime, we obtain mixed-code webpages containing the given term and run the model to extract translations as output. Preliminary experiments and evaluation results show that our method cleanly combines various features, resulting in a system that outperforms previous work.
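
The automatic annotation step described in the abstract can be illustrated with a minimal sketch. It assumes a B/I/O tagging scheme, a simple regex tokenizer, and two toy features (`is_cjk`, `dist_to_term`); the function names `tokenize`, `annotate`, and `features` are hypothetical, and none of these details are confirmed by the abstract. The output of such a step would then be passed to a conditional random fields trainer, which is not shown here.

```python
import re

CJK = r"\u4e00-\u9fff"  # assumption: basic CJK Unified Ideographs range only

def tokenize(text):
    # English/digit runs stay whole, each CJK character is one token,
    # and any other non-space symbol is a token of its own.
    return re.findall(rf"[A-Za-z0-9]+|[{CJK}]|[^\sA-Za-z0-9{CJK}]", text)

def annotate(snippet, translation):
    # Tag every occurrence of the known translation with B (begin) and
    # I (inside); all other tokens get O (outside). Assumed tag scheme.
    tokens = tokenize(snippet)
    target = tokenize(translation)
    tags = ["O"] * len(tokens)
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            tags[i] = "B"
            for j in range(i + 1, i + n):
                tags[j] = "I"
    return tokens, tags

def features(tokens, term):
    # Two illustrative features per token: whether it is a CJK character,
    # and its signed distance to the first occurrence of the source term.
    term_tokens = [t.lower() for t in tokenize(term)]
    lowered = [t.lower() for t in tokens]
    n = len(term_tokens)
    term_pos = next(
        (i for i in range(len(tokens) - n + 1) if lowered[i:i + n] == term_tokens),
        None,
    )
    return [
        {
            "is_cjk": bool(re.fullmatch(rf"[{CJK}]", tok)),
            "dist_to_term": None if term_pos is None else i - term_pos,
        }
        for i, tok in enumerate(tokens)
    ]

# Toy mixed-code snippet with a known (term, translation) seed pair.
snippet = "維基百科 (Wikipedia) 是一個自由的百科全書"
tokens, tags = annotate(snippet, "維基百科")
feats = features(tokens, "Wikipedia")
for tok, tag, feat in zip(tokens, tags, feats):
    print(tok, tag, feat)
```

In this sketch the seed pair supplies the labels, so no manual tagging is needed; at runtime only the term is known, and a trained model would predict the tags instead.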

Related Literature