Title | A Comparative Study of Methods for Topic Modeling in Spoken Document Retrieval |
---|---|
Volume:Issue | 17:1 |
Authors | Lin, Shih-hsiang; Chen, Berlin |
Pages | 65-85 |
Keywords | Information Retrieval; Document Topic Models; Word Topic Models; Spoken Document Retrieval; THCI Core |
Publication date | 2012-03 |
Topic modeling for information retrieval (IR) has attracted significant attention and demonstrated good performance in a wide variety of tasks over the years. In this paper, we first present a comprehensive comparison of various topic modeling approaches, including the so-called document topic models (DTM) and word topic models (WTM), for Chinese spoken document retrieval (SDR). Moreover, different granularities of index features, including words, subword units, and their combinations, are also exploited to work in conjunction with various extensions of topic modeling presented in this paper, so as to alleviate SDR performance degradation caused by speech recognition errors. All of the experiments were performed on the TDT Chinese collection.