
International Journal of Computational Linguistics And Chinese Language Processing THCI

Title: Analyzing the Morphological Structures in Seediq Words
Volume/Issue: 25:2
Authors: Chuan-Jie Lin, Li-May Sung, Jing-Sheng You, Wei Wang, Cheng-Hsun Lee, Zih-Cyuan Liao
Pages: 001-020
Keywords: Seediq; Automatic Analysis of Morphological Structures; Deep Root; Natural Language Processing for Indigenous Languages in Taiwan; Formosan Languages; THCI Core
Publication Date: December 2020


NLP techniques offer an efficient way to build large datasets for low-resource languages, which in turn helps preserve and revitalize indigenous languages. This paper proposes approaches for automatically analyzing the morphological structures of Seediq words, as a first step toward developing NLP applications such as machine translation. Seediq has rich word inflection. Sets of morphological rules have been created according to the linguistic features described in the Seediq syntax book (Sung, 2018). Based on regular morpho-phonological processes in Seediq, a new idea of “deep root” is also proposed. The rule-based system proposed in this paper detects the existence of infixes and suffixes in Seediq with a precision of 98.88% and a recall of 89.59%. The structure of a prefix string is predicted by probabilistic models. We conclude that the best system is a bigram model with a back-off approach and Lidstone smoothing, which achieves an accuracy of 82.86%.
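To make the last step more concrete, the following is a minimal sketch of how a bigram model with back-off and Lidstone smoothing could rank candidate segmentations of a prefix string. It is not the authors' implementation: the class name `BigramBackoffLidstone`, the smoothing constant `gamma=0.1`, the simple unnormalized back-off to unigrams, and the toy segment labels `a`, `b`, `ab` (placeholders, not real Seediq prefixes) are all assumptions made for illustration.

```python
from collections import Counter


class BigramBackoffLidstone:
    """Toy bigram model over prefix segments (hypothetical, for illustration).

    Seen bigrams get a Lidstone-smoothed bigram estimate; unseen bigrams
    back off to a Lidstone-smoothed unigram estimate. The back-off here is
    deliberately simple and is not renormalized.
    """

    def __init__(self, sequences, gamma=0.1):
        self.gamma = gamma          # Lidstone constant (assumed value)
        self.unigrams = Counter()   # count of each segment
        self.bigrams = Counter()    # count of (previous, current) pairs
        self.contexts = Counter()   # count of each segment used as context
        for seq in sequences:
            padded = ["<s>"] + list(seq)  # "<s>" marks the start of the string
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1
                self.unigrams[cur] += 1
        self.total = sum(self.unigrams.values())
        # Vocabulary size plus one slot reserved for unseen segments.
        self.vocab_size = len(self.unigrams) + 1

    def unigram_prob(self, tok):
        return (self.unigrams[tok] + self.gamma) / (
            self.total + self.gamma * self.vocab_size
        )

    def prob(self, prev, tok):
        count = self.bigrams[(prev, tok)]
        if count > 0:
            return (count + self.gamma) / (
                self.contexts[prev] + self.gamma * self.vocab_size
            )
        return self.unigram_prob(tok)  # back off when the bigram is unseen

    def score(self, seq):
        p, prev = 1.0, "<s>"
        for tok in seq:
            p *= self.prob(prev, tok)
            prev = tok
        return p


# Toy training data: each item is one segmented prefix string.
training = [["a", "b"], ["a", "b"], ["ab"]]
model = BigramBackoffLidstone(training, gamma=0.1)

# Two candidate segmentations of the same surface string "ab".
candidates = [["a", "b"], ["ab"]]
best = max(candidates, key=model.score)
print(best)
```

On this toy data the two-segment analysis scores higher because both of its bigrams were observed twice in training, whereas the single-segment analysis was observed only once. A real system would train on annotated Seediq prefix strings and would likely tune `gamma` on held-out data.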

