Title | Learning to Find Translations and Transliterations on the Web based on Conditional Random Fields |
---|---|
Volume:Issue | 18:1 |
Authors | Chang, Joseph Z.; Chang, Jason S.; Jang, Jyh-Shing |
Pages | 19-45 |
Keywords | Machine Translation, Cross-lingual Information Extraction, Wikipedia, Conditional Random Fields, THCI Core |
Publication date | 2013-03 |
In recent years, state-of-the-art cross-linguistic systems have been based on parallel corpora. Nevertheless, it is at times difficult to find translations of a given technical term or named entity, even in a very large parallel corpus. In this paper, we present a new method for learning to find translations on the Web for a given term. In our approach, we use a small set of terms and translations to obtain mixed-code snippets returned by a search engine. We then automatically annotate the data with translation tags, automatically generate features to augment the tagged data, and automatically train a conditional random fields model for identifying translations. At runtime, we obtain mixed-code webpages containing the given term and run the model to extract translations as output. Preliminary experiments and evaluation results show that our method cleanly combines various features, resulting in a system that outperforms previous work.
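
The annotation and feature-generation steps described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the character-level tokenizer, the `B`/`I`/`O` tag set, the feature names (`token`, `is_cjk`, `dist_to_term`), and the sample snippet. It shows only how a known term and translation pair might be used to label a mixed-code snippet and attach simple features to each token; the resulting sequences are the kind of input a conditional random fields trainer would consume.

```python
import re

# Hypothetical tokenizer: each CJK character is one token, each run of
# Latin letters or digits is one token, and any other symbol stands alone.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+|[^\s\u4e00-\u9fffA-Za-z0-9]")
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def tokenize(text):
    """Split mixed-code text into tokens, lowercasing Latin words."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def find_span(tokens, phrase_tokens):
    """Return the start index of the first occurrence of phrase_tokens, or -1."""
    n = len(phrase_tokens)
    if n == 0:
        return -1
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase_tokens:
            return i
    return -1


def annotate(snippet, term, translation):
    """Tag the known translation in a snippet with B/I labels (assumed scheme)."""
    tokens = tokenize(snippet)
    tags = ["O"] * len(tokens)
    trans_tokens = tokenize(translation)
    start = find_span(tokens, trans_tokens)
    if start >= 0:
        tags[start] = "B"
        for i in range(start + 1, start + len(trans_tokens)):
            tags[i] = "I"
    term_start = find_span(tokens, tokenize(term))
    return tokens, tags, term_start


def make_features(tokens, term_start):
    """Attach illustrative per-token features; the names are hypothetical."""
    features = []
    for i, tok in enumerate(tokens):
        features.append({
            "token": tok,
            "is_cjk": bool(CJK_RE.fullmatch(tok)),
            # Signed distance from the query term; None if the term is absent.
            "dist_to_term": (i - term_start) if term_start >= 0 else None,
        })
    return features


if __name__ == "__main__":
    snippet = "條件隨機場 (Conditional Random Fields, CRF) 是一種模型"
    tokens, tags, term_start = annotate(
        snippet, "Conditional Random Fields", "條件隨機場"
    )
    for tok, tag, feat in zip(tokens, tags, make_features(tokens, term_start)):
        print(tok, tag, feat["dist_to_term"])
```

In this sketch, the tagged token sequences and feature dictionaries would be passed to a sequence-labeling trainer; at runtime, the same feature function would be applied to unseen snippets and the tokens predicted as `B` or `I` read off as the candidate translation.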