
圖書資訊學刊 (Journal of Library and Information Studies). Indexed in: CSSCI, Scopus, TSSCI

Title: Identifying Food-related Word Association and Topic Model Processing using LDA
Volume/Issue: Vol. 16, No. 1
Parallel title: 「食」類相關的詞彙聯想識別和主題模型處理:以LDA為例 (Identifying Food-Related Word Associations and Topic Model Processing: The Case of LDA)
Authors: 李郁錦, 胡宗智, 張國恩
Pages: 23-44
Keywords: LDA (latent Dirichlet allocation); Mandarin vocabulary study; semantic priming; Time-limited Multiple Divergent Thinking Test of Word Associative Strategy; word association
Publication date: June 2018
DOI: 10.6182/jlis.201806_16(1).023


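This interdisciplinary study combines natural language processing and psycholinguistics. To understand the mechanisms and processes by which humans cognitively process and acquire vocabulary, it uses LDA (latent Dirichlet allocation), a latent semantic topic model, to compute the semantic relatedness of words. To test how closely the output of the latent semantic model resembles human word association, data were collected at scale with the Time-limited Multiple Divergent Thinking Test of Word Associative Strategy, using three stimulus words; 101 participants took the test, producing a total of 4,251 individual words. The results were analyzed at two levels. (1) Expert classification: two experts classified the responses, first by the categories for word-association results proposed by Ross and Murphy (1999) (taxonomic and script categories), and second, following Mednick's (1962) associative hierarchy theory, into two types of association, steep and flat. This analysis indicated that human association is not only random but also general and extensible. (2) The experimental text was processed with the LDA latent semantic model, and cross-comparison of the two sets of results confirmed a highly significant correlation; the model output is consistent with the mechanisms of human learning and association. This study is a novel attempt to use data science to infer and predict human lexical and conceptual associations, and its results may in future help improve teaching and commercial applications.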
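In Mednick's associative hierarchy, a "steep" hierarchy is dominated by a few strong, stereotyped associates, while a "flat" one spreads over many associates of similar strength. The study assigned these labels by expert judgment, so the sketch below is only a hypothetical automatic proxy, not the authors' procedure: it calls a response set steep when its k most frequent associates cover at least a given share of all responses. The functions `top_k_share` and `classify_hierarchy`, the values k=3 and threshold=0.5, and the sample responses are all invented for illustration.

```python
from collections import Counter

# Hypothetical operationalization: the paper's steep/flat labels came from
# expert judgment, not from a formula. Here a response set is "steep" when
# its k most frequent associates cover at least `threshold` of all responses.
# Both k=3 and threshold=0.5 are illustrative assumptions.


def top_k_share(responses, k=3):
    """Fraction of all responses taken by the k most frequent associates."""
    counts = Counter(responses)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one response")
    return sum(count for _, count in counts.most_common(k)) / total


def classify_hierarchy(responses, k=3, threshold=0.5):
    """Label a response set 'steep' (few dominant associates) or 'flat'."""
    return "steep" if top_k_share(responses, k) >= threshold else "flat"


# Made-up responses to a single food stimulus, for illustration only.
STEEP_SAMPLE = ["rice"] * 6 + ["noodles"] * 3 + ["bowl"] * 2 + ["soup", "chopsticks", "hungry"]
FLAT_SAMPLE = ["rice", "noodles", "bowl", "soup", "chopsticks",
               "hungry", "farm", "market", "family", "steam"]

print(classify_hierarchy(STEEP_SAMPLE), round(top_k_share(STEEP_SAMPLE), 2))  # steep 0.79
print(classify_hierarchy(FLAT_SAMPLE), round(top_k_share(FLAT_SAMPLE), 2))    # flat 0.3
```

A share-of-top-responses rule is used here because it maps directly onto the idea of a few dominant associates; an entropy-based measure would be an equally plausible choice.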


This paper presents an interdisciplinary study that combines natural language processing and psycholinguistics. The latent Dirichlet allocation (LDA) topic model was used to compute semantic relatedness, in order to better understand the mechanisms and processes through which humans encode and retrieve lexical units. To test how closely the output of the topic model matches human word association, the “Time-limited Multiple Divergent Thinking Test of Word Associative Strategy” (TLM-DTTWAS) was used to collect data with three food-related stimulus words. A total of 101 subjects took the test, producing 4,251 words. The results were analyzed on two levels. (1) In an expert classification, the responses were sorted into the taxonomic and script categories proposed by Ross and Murphy (1999) and, following the associative hierarchy theory of Mednick (1962), into two associative hierarchies, “steep” and “flat.” This analysis indicated that human word association displays randomness as well as generalization and continuity. (2) The experimental text was then processed with the LDA model, whose output showed a highly significant correlation with the human associations. This study is a new attempt to train a data science model to infer and predict human conceptual associations, which could be useful in teaching as well as in commercial applications.
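The abstract reports a highly significant correlation between LDA output and human associations but does not state how LDA relatedness was scored or which correlation statistic was used. A minimal sketch, assuming one common way of turning a fitted LDA model into word-to-word scores, P(w2 | w1) = Σ_z P(w2 | z) P(z | w1) with P(z | w1) from Bayes' rule, paired with a Spearman rank correlation against how often participants produced each associate. The topic-word table `PHI`, the vocabulary, the counts in `HUMAN_COUNTS`, and all function names are hypothetical; in practice `PHI` would come from an LDA model fitted to a corpus.

```python
import math

# Hypothetical topic-word table: PHI[z][w] = P(w | topic z) for a toy
# two-topic model. A real table would come from an LDA model fitted to a corpus.
PHI = [
    {"rice": 0.40, "noodles": 0.30, "bowl": 0.20, "farm": 0.05, "market": 0.05},  # "eating"
    {"rice": 0.30, "noodles": 0.05, "bowl": 0.05, "farm": 0.40, "market": 0.20},  # "farming"
]

# Hypothetical counts of how many participants produced each associate of "rice".
HUMAN_COUNTS = {"noodles": 12, "bowl": 7, "farm": 9, "market": 2}


def associate_probs(phi, cue, topic_prior=None):
    """Score every other word w by P(w | cue) = sum_z P(w | z) * P(z | cue)."""
    k = len(phi)
    prior = topic_prior if topic_prior is not None else [1.0 / k] * k
    # Bayes' rule: P(z | cue) is proportional to P(cue | z) * P(z).
    joint = [phi[z][cue] * prior[z] for z in range(k)]
    total = sum(joint)
    posterior = [j / total for j in joint]
    return {w: sum(phi[z][w] * posterior[z] for z in range(k))
            for w in phi[0] if w != cue}


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


scores = associate_probs(PHI, "rice")
words = sorted(HUMAN_COUNTS)
rho = spearman([scores[w] for w in words], [HUMAN_COUNTS[w] for w in words])
print(f"Spearman rho = {rho:.2f}")  # Spearman rho = 0.80
```

A rank correlation is used here because association counts are skewed and only their ordering is compared with the model; with four toy items the result says nothing about significance, which would require the full response set and a significance test.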
