Article Details

International Journal of Computational Linguistics and Chinese Language Processing (THCI)

Title: 統計式片語翻譯模型
Volume/Issue: 6:2
Parallel title: Statistical Translation Model for Phrases
Authors: 張俊盛; 游大緯; 李俊仁
Pages: 43-63
Keywords: Statistical Machine Translation; Cross-language Information Retrieval; Phrase Translation; THCI Core
Publication date: 2001-08

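Abstract (translated from the Chinese)

Machine translation is one of the most important topics in natural language processing research. In the past, the more successful applications of machine translation were mostly translations of documents in specific domains. Recently, with the spread of the Internet and search engines, attention has turned to the role of machine translation in Cross Language Information Retrieval. In cross-language retrieval, translation is usually applied to the query words or phrases (Query Translation). However, the translations must be highly relevant to the document collection being searched for retrieval to be effective. Current approaches to translating query keywords, whether they rely on off-the-shelf translation software or on a general-purpose bilingual dictionary, can hardly guarantee translations that are relevant to the documents. We therefore aim to translate query keywords with a Statistical Phrase Translation Model (SPTM) in order to improve the effectiveness of cross-language retrieval. In this paper, we propose a new statistical phrase translation model and report experiments with it. In the experiments, we used the BDC bilingual electronic dictionary to test SPTM on word alignment within phrases. Alignment with SPTM is faster and more accurate.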

English abstract

Machine Translation (MT) is one of the most difficult problems in the field of natural language processing. In the past, MT has been applied to professional communication, in particular to translating technical and corporate documents in a specific domain. Recent years have seen the rapid development of the Internet as a new form of communication and information exchange, and the need to access information across the language barrier has become apparent. People have begun to look into the role that MT can play in Cross Language Information Retrieval (CLIR). The prevalent approach to CLIR is based on translating the query, in particular query phrases. For CLIR, however, there is an additional objective: the translation must be relevant to the collection being searched. The current approach of using a general bilingual word list or off-the-shelf commercial MT software is therefore bound to be very ineffective at retrieving relevant documents. We propose a new approach, the Statistical Phrase Translation Model (SPTM), aimed at achieving a tighter estimation of phrase translation. Experiments were conducted using bilingual phrases in the BDC Electronic Chinese-English Dictionary. Preliminary results show that the approach is much faster and produces better word alignment for phrases, which has not been possible using previous approaches.

Related literature