Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title: 以最佳化及機率分佈標記形聲字聲符之研究
Volume/Issue: 15:2
Parallel title: Annotating Phonetic Component of Chinese Characters Using Constrained Optimization and Pronunciation Distribution
Authors: 張嘉惠; 林書彥; 李淑瑩; 蔡孟峰; 李淑萍; 廖湘美 (Hsiang-Mei Liao); 孫致文; 黃鍔
Pages: 145-159
Keywords: 形聲字; 聲符; 發音相似度; 最佳化; 機率分佈; KL divergence; Picto-phonetic Compounds; Phonetic Component; Pronunciation Similarity; Pronunciation Distribution; Optimization; THCI Core
Publication date: 2010-06

Chinese Abstract

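Generally speaking, Chinese characters are graphic rather than alphabetic. Unlike English and other alphabetic scripts, learning a spelling system does not by itself give a reader basic reading ability. Learning to read and write Chinese characters therefore progresses rather slowly, and a learner must rely on Zhuyin symbols or another phonetic notation to know how each character is pronounced. In fact, about 80% of Chinese characters are picto-phonetic compounds (形聲字), in which a semantic component conveys meaning and a phonetic component conveys sound, so the pronunciation and meaning of even an unfamiliar character can be inferred from its components. The main difficulty is that a character is not necessarily pronounced the same as its phonetic component. The pronunciation may only be similar, and the rules governing this variation have not yet been investigated. For example, 泡, 抱, and 飽 are all pronounced similarly to 包, but how the pronunciation of 包 changes into the pronunciations of these three characters remains to be studied. As a first step toward studying the pronunciation rules of picto-phonetic compounds, this paper attempts to identify the phonetic component of Chinese characters automatically. Experiments show that, of the two proposed methods, the pronunciation similarity comparison method identifies the phonetic component with 90.7% accuracy on 9,593 picto-phonetic compounds, while the component pronunciation distribution comparison method reaches 98.1% accuracy. Both can reduce the substantial labor and time required to annotate phonetic components.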
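The 泡 / 抱 / 飽 example above can be illustrated with a small sketch of the pronunciation similarity idea: for each character, pick the component whose pronunciation is closest to that of the whole character. This is only an illustration under stated assumptions, not the authors' method. The toy lexicon, the (initial, final) pinyin split with tones ignored, and the scoring weights in `similarity` are all hypothetical.

```python
# Toy lexicon (assumption): pinyin split into (initial, final), tones ignored.
PRONUNCIATION = {
    "泡": ("p", "ao"),
    "抱": ("b", "ao"),
    "飽": ("b", "ao"),
    "包": ("b", "ao"),
    "水": ("sh", "ui"),
    "手": ("sh", "ou"),
    "食": ("sh", "i"),
}

# Toy decomposition (assumption): each character maps to its two components,
# with radicals written in their full forms (氵 as 水, 扌 as 手, 飠 as 食).
COMPONENTS = {
    "泡": ["水", "包"],
    "抱": ["手", "包"],
    "飽": ["食", "包"],
}


def similarity(a, b):
    """Hypothetical score: 1.0 for a matching final, 0.5 for a matching initial."""
    (initial_a, final_a), (initial_b, final_b) = a, b
    score = 0.0
    if final_a == final_b:
        score += 1.0
    if initial_a == initial_b:
        score += 0.5
    return score


def guess_phonetic_component(char):
    """Return the component whose pronunciation is most similar to the character's."""
    target = PRONUNCIATION[char]
    return max(COMPONENTS[char], key=lambda c: similarity(PRONUNCIATION[c], target))


if __name__ == "__main__":
    for ch in COMPONENTS:
        print(ch, guess_phonetic_component(ch))
```

In this toy setting 泡 still selects 包 even though the initials differ (p versus b), because the shared final outweighs the mismatch. That is the kind of near-match the abstract describes.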

English Abstract

Generally speaking, Chinese characters are graphic characters whose pronunciation cannot be read off directly unless they are accompanied by Mandarin phonetic symbols (zhuyin) or another phonetic notation (e.g., a romanization system). In fact, about 80 to 90 percent of Chinese characters are pictophonetic characters, each composed of a phonetic component and a semantic component. Therefore, even if one has not seen a character before, one can make a logical guess at its pronunciation and meaning from its phonetic and semantic components. To analyze such relations, we start by analyzing the characteristics of phonetic components. We found two interesting features that can be used to identify the phonetic components of Chinese characters automatically: pronunciation similarity and pronunciation distribution. Experiments show that these two methods achieve high accuracy (90.8% and 98.1% on 9,593 pictophonetic characters) in predicting the phonetic components of pictophonetic characters. These methods can save a lot of time and effort in the early stage of phonetic component annotation.

Related Literature