Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title: 應用記憶增強條件隨機場域與之深度學習及自動化詞彙特徵於中文命名實體辨識之研究
Volume/Issue: 24:1
Parallel title: Leveraging Memory Enhanced Conditional Random Fields with Gated CNN and Automatic BAPS Features for Chinese Named Entity Recognition
Authors: 簡國峻, 張嘉惠
Pages: 001-014
Keywords: Machine Learning; Named Entity Recognition; Neural Network; Memory Network; Feature Mining; THCI Core
Publication date: June 2019

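Chinese Abstract (English translation)

Named entity recognition (NER) is an important task in natural language processing. Current basic deep learning models perform well on named entity extraction from relatively high-quality data, but on social media datasets they fail to reach the baseline set by a traditional conditional random field (CRF). Because a named entity may be mentioned several times in a text, using contextual information to improve named entity extraction has also become a research direction in recent years. In this study, we extend the Memory Enhanced Conditional Random Field (MECRF) to Chinese named entity extraction, using gated convolutional networks and bidirectional GRU networks to strengthen the memory-enhanced CRF so that the model can capture long-range information in a document. In addition, we use feature mining to extract the words that appear before and after named entities as well as their prefix and suffix words (abbreviated as BAPS), and use trainable model parameters to automatically adjust the word embeddings and the BAPS lexical features. Finally, we use both character and word embeddings to improve the model's performance. The proposed model achieves 91.67% accuracy on person name recognition in online social media data, and on SIGHAN-MSRA it obtains the highest location-entity recognition performance of 92.45% and an overall recall of 90.95%.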

English Abstract

Named Entity Recognition (NER) is an essential task in Natural Language Processing. Memory Enhanced CRF (MECRF) integrates external memory to extend the Conditional Random Field (CRF) so that it can capture long-range dependencies with an attention mechanism. However, MECRF alone does not perform well on Chinese NER. In this paper, we enhance MECRF with stacked CNNs and a gated mechanism to capture better word and sentence representations for Chinese NER. We also combine character and word information to improve performance. We further improve performance by using feature mining to extract the vocabulary that commonly appears before and after named entities, together with entity prefixes and suffixes (the BAPS features). The BAPS features are then combined with character embedding features so that their weights are adjusted automatically. The proposed model achieves 91.67% tagging accuracy on online social media data for Chinese person name recognition, and reaches the highest F1-score of 92.45% for location name recognition and a 90.95% overall recall on the SIGHAN-MSRA dataset.
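
To make the BAPS idea more concrete, the following is a minimal sketch of how before/after/prefix/suffix lexicons might be mined from annotated text. It is not the authors' implementation. The function names `mine_baps` and `baps_features`, the frequency threshold `min_count`, the choice to work on single characters, and the toy sentences are all assumptions made for illustration. The abstract does not specify these details, and the step where trainable weights adjust the features is not shown.

```python
from collections import Counter

ORDER = ("before", "after", "prefix", "suffix")

def mine_baps(sentences, min_count=2):
    """Collect frequent context/boundary tokens around entities.

    sentences: list of (tokens, spans); each span is (start, end), end exclusive.
    min_count is a hypothetical frequency threshold, not taken from the paper.
    """
    counters = {name: Counter() for name in ORDER}
    for tokens, spans in sentences:
        for start, end in spans:
            if start > 0:                      # token right before the entity
                counters["before"][tokens[start - 1]] += 1
            if end < len(tokens):              # token right after the entity
                counters["after"][tokens[end]] += 1
            counters["prefix"][tokens[start]] += 1    # first token of the entity
            counters["suffix"][tokens[end - 1]] += 1  # last token of the entity
    return {name: {tok for tok, n in c.items() if n >= min_count}
            for name, c in counters.items()}

def baps_features(tokens, lexicons):
    """One 4-dimensional binary vector per token, in ORDER."""
    return [[int(tok in lexicons[name]) for name in ORDER] for tok in tokens]

# Toy, made-up training data: characters as tokens, one person name per sentence.
train = [
    (list("記者王小明表示"), [(2, 5)]),
    (list("記者王大同表示"), [(2, 5)]),
]
lexicons = mine_baps(train)
features = baps_features(list("者王表"), lexicons)
```

In a full model, vectors like `features` would presumably be concatenated with, or weighted against, the character embeddings before being passed to the sequence tagger. That step is omitted here because the abstract gives no details about it.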

Related Literature