Title | Analyzing the Morphological Structures in Seediq Words |
---|---|
Volume/Issue | 25:2 |
Authors | Chuan-Jie Lin, Li-May Sung, Jing-Sheng You, Wei Wang, Cheng-Hsun Lee, Zih-Cyuan Liao |
Pages | 001-020 |
Keywords | Seediq, Automatic Analysis of Morphological Structures, Deep Root, Natural Language Processing for Indigenous Languages in Taiwan, Formosan Languages, THCI Core |
Publication Date | 2020-12 |
NLP techniques offer an efficient way to build large datasets for low-resource languages, which in turn supports the preservation and revitalization of indigenous languages. This paper proposes approaches to automatically analyze the morphological structures of Seediq words, as a first step toward developing NLP applications such as machine translation. Seediq is rich in word inflections. Sets of morphological rules have been created according to the linguistic features described in the Seediq syntax book (Sung, 2018). Based on regular morpho-phonological processes in Seediq, a new notion of “deep root” is also proposed. The rule-based system proposed in this paper detects the presence of infixes and suffixes in Seediq words with a precision of 98.88% and a recall of 89.59%. The structure of a prefix string is predicted by probabilistic models. We conclude that the best system is a bigram model with a back-off approach and Lidstone smoothing, which achieves an accuracy of 82.86%.
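To make the last step more concrete, the following is a minimal sketch of how a bigram model with back-off and Lidstone smoothing could rank candidate segmentations of a prefix string. It is not the authors' implementation. The training segmentations are invented placeholder forms rather than real Seediq data, and the names `TRAIN`, `LIDSTONE`, `BACKOFF`, and `best_segmentation` are assumptions made for illustration. The back-off here is a simplified fixed-penalty variant, so the values are scores rather than a properly normalized probability distribution.

```python
from collections import Counter
from math import log

# Toy training data: each entry is one prefix string already split into prefixes.
# These forms are invented placeholders, NOT taken from the paper or from Seediq.
TRAIN = [
    ["ma", "ka"],
    ["ma", "ka"],
    ["ma", "ka"],
    ["m", "aka"],
    ["ka", "ma"],
    ["m", "ka"],
]

START = "<s>"      # marks the beginning of a prefix string
LIDSTONE = 0.5     # assumed additive smoothing constant (0 < lambda <= 1)
BACKOFF = 0.4      # assumed fixed penalty applied when backing off to unigrams

unigrams = Counter()   # count of each prefix
bigrams = Counter()    # count of each (previous, current) pair
context = Counter()    # count of each item used as a left context
for seq in TRAIN:
    prev = START
    for prefix in seq:
        unigrams[prefix] += 1
        bigrams[(prev, prefix)] += 1
        context[prev] += 1
        prev = prefix

VOCAB = sorted(unigrams)
V = len(VOCAB)
N = sum(unigrams.values())


def unigram_prob(prefix):
    """Lidstone-smoothed unigram estimate."""
    return (unigrams[prefix] + LIDSTONE) / (N + LIDSTONE * V)


def bigram_prob(prev, prefix):
    """Lidstone-smoothed bigram estimate; back off to unigrams for unseen pairs."""
    if bigrams[(prev, prefix)] > 0:
        return (bigrams[(prev, prefix)] + LIDSTONE) / (context[prev] + LIDSTONE * V)
    return BACKOFF * unigram_prob(prefix)


def segmentations(s):
    """Enumerate every way to split s into prefixes from the known vocabulary."""
    if not s:
        return [[]]
    results = []
    for prefix in VOCAB:
        if s.startswith(prefix):
            for rest in segmentations(s[len(prefix):]):
                results.append([prefix] + rest)
    return results


def score(seq):
    """Log-score of one candidate segmentation under the bigram model."""
    total, prev = 0.0, START
    for prefix in seq:
        total += log(bigram_prob(prev, prefix))
        prev = prefix
    return total


def best_segmentation(s):
    """Return the highest-scoring segmentation, or None if s cannot be split."""
    candidates = segmentations(s)
    return max(candidates, key=score) if candidates else None


print(best_segmentation("maka"))
```

With this toy data, the string `maka` has two candidate splits, `ma` + `ka` and `m` + `aka`, and the model prefers the one whose bigrams were seen more often in training. A pair that never occurs in training, such as `aka` followed by `ma`, falls back to the penalized unigram estimate instead of receiving a zero score.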