HyRead Journal Taiwan Full-Text Database

Article Details

電腦與通訊

Natural Sciences / Information / Technology

Title	使用前後文篩選之快速具名實體擷取技術
Volume/Issue	154
Parallel title	Fast Named Entity Extraction Using Context Filtering
Authors	劉昭宏, 吳宗憲
Pages	041-047
Keywords	Named Entity Recognition (NER), Unsupervised Recognition, Context Filtering
Publication date	2013-12

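Chinese Abstract

Among natural language processing techniques, recognizing named entities in text is the most fundamental task. Named entities express semantic classes in text, such as person names and place names. Traditionally, named entity recognition has relied on large amounts of manually annotated text as training data for the recognizer. However, obtaining enough data this way takes a great deal of labor and time, so improvements in recognition rate have been very limited. In addition, whenever named entity definitions are added, removed, or otherwise changed, further data must be collected and processed, which is very inconvenient for natural language processing applications. This article therefore reviews related techniques and proposes a method that uses context filtering to quickly extract the training corpus needed by a named entity recognition system. With minimal time and human intervention, the method obtains ten to one hundred times as much training data as fully manual annotation, thereby improving the performance of Chinese named entity recognition, and it also allows the recognition system to be adapted quickly to different natural language processing application domains.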

English Abstract

Recognizing named entities in text is one of the fundamental tasks in natural language processing (NLP). Named entities are semantic classes used to represent concepts such as the names of people and places. Traditionally, training named entity recognition (NER) systems has relied on large annotated corpora, which require substantial human effort to produce. This has limited improvements in the accuracy of NER systems. The approach also lacks flexibility: when the definitions of entities change, a human-annotated corpus reflecting the new definitions is still required. This poses a major obstacle to many NLP applications. In this paper, we propose a fast extraction method that uses context filtering to retrieve the corpus required for training NER systems. The implemented method allows us to quickly collect training corpora that are orders of magnitude larger than fully manually annotated ones, which improves the performance of Chinese named entity recognition and allows the system to be adapted quickly to different genres of NLP tasks.
