| Title | Reduced N-Grams for Chinese Evaluation |
| --- | --- |
| Volume/Issue | 10:1 |
| Authors | Ha, Le-quan; Seymour, R.; Hanna, P.; Smith, F.-J. |
| Pages | 19–34 |
| Keywords | Reduced n-grams; Chinese reduced model; Chinese reduced n-grams; reduced model; reduced n-gram algorithm / identification; THCI Core |
| Publication Date | 200503 |
Theoretically, a language model improves as the n-gram size increases from 3 to 5 or higher. However, if we attempt to store all possible combinations of n-grams, the number of parameters, the amount of computation, and the storage requirement all grow very rapidly with n. To avoid these problems, the reduced n-grams approach previously developed by O'Boyle and Smith [1993] can be applied. A reduced n-gram language model, called a reduced model, can efficiently store the phrase histories of an entire corpus within feasible storage limits. Another advantage of reduced n-grams is that they are usually semantically complete. In our experiments, the reduced n-gram creation method, i.e., the O'Boyle-Smith reduced n-gram algorithm, was applied to a large Chinese corpus. The Chinese reduced n-gram Zipf curves are presented here and compared with conventional Chinese n-grams obtained previously. The Chinese reduced model lowered perplexity by 8.74% and reduced the language model size by a factor of 11.49. This paper is the first attempt to model Chinese reduced n-grams, and it may provide important insights for Chinese linguistic research.
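To make the idea of reduction concrete, the following is a minimal toy sketch, not the O'Boyle-Smith algorithm itself: it counts all n-grams up to a maximum length and then discards any phrase whose every occurrence lies inside a one-token-longer phrase with the same frequency, keeping only "maximal" phrases. The function names, the reduction criterion, and the brute-force search over extensions are all illustrative assumptions.

```python
from collections import Counter


def ngram_counts(tokens, max_n):
    """Count every n-gram of length 1..max_n in the token sequence."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def reduced_ngrams(tokens, max_n=5):
    """Toy reduction (illustrative only): drop any phrase that is always
    absorbed by a one-token-longer phrase with the same count, since such
    a phrase never occurs independently and need not be stored."""
    counts = ngram_counts(tokens, max_n)
    reduced = {}
    for gram, c in counts.items():
        n = len(gram)
        if n < max_n:
            # Check left/right extensions by one token (brute force).
            absorbed = any(
                cc == c
                for ext, cc in counts.items()
                if len(ext) == n + 1 and (ext[:n] == gram or ext[1:] == gram)
            )
            if absorbed:
                continue
        reduced[gram] = c
    return reduced


if __name__ == "__main__":
    tokens = "a b c a b c a b d".split()
    red = reduced_ngrams(tokens, max_n=3)
    # Of the 12 distinct n-grams, only the maximal phrases survive.
    print(len(red), sorted(red.items()))
```

Even on this tiny corpus the model shrinks from 12 stored n-grams to 5, which is the same kind of compression (at much larger scale) behind the factor-11.49 size reduction reported for the Chinese reduced model.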