HyRead Journal Taiwan Full-Text Database

Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Natural Sciences / Information / Technology

Title	Transliteration Extraction from Classical Chinese Buddhist Literature Using Conditional Random Fields with Language Models
Volume/Issue	19:3
Authors	Wang, Yu-chun; Karol Chang, Chia-Tien; Richard Tsai, Tzong-Han; Hsiang, Jieh
Pages	25-38
Keywords	Transliteration Extraction; Classical Chinese; Buddhist Literature; Language Model; Conditional Random Fields; CRF; THCI Core
Publication date	2014-09

Extracting plausible transliterations from historical literature is a key issue in historical linguistics and other research fields. In Chinese historical literature, the characters used to transliterate the same loanword may vary because of different translation eras or different Chinese language preferences among translators. To assist historical linguists and digital humanities researchers, this paper proposes a transliteration extraction method based on conditional random fields (CRF), with features derived from language models and from the characteristics of the Chinese characters used in transliterations. To evaluate our method, we compiled an evaluation set from two Buddhist texts, the Samyuktagama and the Lotus Sutra. We also constructed a baseline approach that combines suffix-array-based extraction with a phonetic similarity measure. Our method significantly outperforms the baseline, achieving a recall of 0.9561 and a precision of 0.9444. The results show that our method is effective for extracting transliterations from classical Chinese texts.
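
The abstract reports precision and recall for a CRF-based extractor. As a minimal sketch of how such an evaluation could work, the Python below assumes the task is framed as character-level sequence labeling with B/I/O tags, which is a common setup for CRFs but is not stated in the abstract. The function names `spans_from_bio` and `precision_recall`, the toy sentence, and its tags are all hypothetical and are not taken from the paper or its evaluation set.

```python
def spans_from_bio(chars, tags):
    """Collect substrings marked by a B (begin) tag followed by I (inside) tags."""
    spans, current = [], []
    for ch, tag in zip(chars, tags):
        if tag == "B":
            if current:
                spans.append("".join(current))
            current = [ch]
        elif tag == "I" and current:
            current.append(ch)
        else:  # "O", or a stray "I" with no open span
            if current:
                spans.append("".join(current))
            current = []
    if current:
        spans.append("".join(current))
    return spans


def precision_recall(predicted, gold):
    """Set-based precision and recall over extracted strings."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy input: the tagger finds 阿難 but truncates 舍利弗 to 舍利.
chars = list("佛告阿難及舍利弗")
tags = ["O", "O", "B", "I", "O", "B", "I", "O"]
predicted = spans_from_bio(chars, tags)
p, r = precision_recall(predicted, ["阿難", "舍利弗"])
print(predicted, p, r)
```

In this toy case one of the two predicted strings matches the gold list, so precision and recall are both 0.5. Matching whole strings exactly is only one possible scoring convention; the paper may count matches differently.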
