Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title: Design and Development of a Bilingual Reading Comprehension Corpus
Volume/Issue: 10:2
Authors: Xu, Kui; Meng, Helen
Pages: 251-275
Keywords: bilingual; reading comprehension; corpus; THCI Core
Publication date: 2005-06

Chinese Abstract

English Abstract

This paper describes our initial attempt to design and develop a bilingual reading comprehension corpus (BRCC). Reading comprehension (RC) is a task that conventionally evaluates the reading ability of an individual. An RC system can automatically analyze a passage of natural language text and generate an answer for each question based on information in the passage. The RC task can be used to drive advances in the natural language processing (NLP) technologies embedded in automatic RC systems. Furthermore, an RC system presents a novel paradigm of information search when compared to the predominant paradigm of text retrieval in search engines on the Web. Previous work on automatic RC typically involved English-only language learning materials (Remedia and CBC4Kids) designed for children and students, which included stories, human-authored questions, and answer keys. These corpora are important for supporting empirical evaluation of RC performance. In the present work, we attempted to utilize RC as a driver for NLP techniques in both English and Chinese. We sought parallel English and Chinese learning materials and incorporated annotations deemed relevant to the RC task. We measured the comparative levels of difficulty among the three corpora (BRCC, Remedia, and CBC4Kids) by means of the baseline bag-of-words (BOW) approach. Our results show that the BOW approach achieves better RC performance on BRCC (67%) than on Remedia (29%) and CBC4Kids (63%). This indicates that BRCC has the highest degree of word overlap between questions and passages among the three corpora, which artificially simplifies the RC task. This result suggests that additional effort should be devoted to authoring questions with varying grades of difficulty in order for BRCC to better support RC research across the English and Chinese languages.
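
The bag-of-words baseline mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' tokenization, stop-word handling, or scoring details, so everything below is an assumption made for illustration: the function names `bag_of_words` and `answer_sentence` are hypothetical, the sample passage is invented, and the regex tokenizer handles English only (Chinese text would first need word segmentation).

```python
import re


def bag_of_words(text):
    """Lowercase the text and return its set of word tokens (English only)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer_sentence(passage_sentences, question):
    """Return the passage sentence that shares the most words with the question.

    Assumption: overlap is a plain count of shared word types, with no
    stemming or stop-word removal. On ties, max() keeps the earliest sentence.
    """
    q_words = bag_of_words(question)
    return max(passage_sentences, key=lambda s: len(bag_of_words(s) & q_words))


# Invented toy passage, not taken from BRCC, Remedia, or CBC4Kids.
passage = [
    "Tom found a small dog near the river.",
    "He took the dog home on Sunday.",
    "His mother gave it some milk.",
]

print(answer_sentence(passage, "Where did Tom find the dog?"))
```

A question that reuses many words from its answer sentence is easy for this kind of matcher, which is why a high BOW score points to heavy question-passage word overlap rather than to deeper comprehension.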

Related Literature