Article Details

International Journal of Computational Linguistics And Chinese Language Processing THCI

Title 基於訊息回應配對相似度估計的聊天記錄解構
Volume/Issue 24:2
Parallel Title Chatlog Disentanglement based on Similarity Evaluation Via Reply Message Pairs Prediction Task
Authors 劉至咸、張嘉惠
Pages 063-078
Keywords 對話解構 (Chatlog Disentanglement); 回覆關係預測 (Reply Relation Prediction); BERT模型應用 (BERT Neural Model)
Publication Date 2019/12

Chinese Abstract

Generally speaking, to build a retrieval-based chatbot, we can extract the required question-answer pairs from chat logs. However, question-answer pairs do not appear strictly contiguously in a chat log; pairs on different topics may be interleaved with one another. The task of separating conversations on different sub-topics from such interleaved messages is called conversation disentanglement. Most existing disentanglement research addresses the problem by computing the similarity between two messages. Prior work formulated the problem as judging whether two Reddit messages belong to the same topical conversation, but the proposed models perform poorly on unseen messages. In practice, we find that even for human users, it is very difficult to determine from two given messages alone, without any context, whether they belong to the same conversation. However, if the goal is instead to predict whether one message is a reply to the other, human judgments are considerably more consistent. In this paper, we therefore conduct experiments on IRC and Reddit datasets and perform conversation disentanglement on chat logs. A dataset synthesized from Reddit reply labels provides a large amount of training data for model building, and a BERT model ultimately achieves good performance on the newly defined reply relation prediction task.

English Abstract

To build a retrieval-based dialog system, we can exploit conversation logs to extract question-answer pairs. However, the question-answer pairs are hidden in the conversation log, interleaved with each other. The conversation task that separates different sub-topics from the interspersed messages is called conversation disentanglement. In this paper, we examined the task of judging whether two Reddit messages belong to the same topic dialogue and found that performance degrades when training and testing data are split by time. In practice, it is also a very hard task even for human beings, as there are only two messages and no context. However, if our goal is to predict whether one message is a reply to the other, the judgment becomes much easier. By changing the way of data preparation, we are able to achieve better performance through DA-LSTM (Dual Attention LSTM) and BERT-based models in the newly defined reply prediction task.
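The reply-pair formulation described above can be sketched in a few lines. The sketch below is illustrative only: the BERT-style `[CLS] … [SEP] … [SEP]` pair input mirrors how sentence-pair classifiers are typically fed, while the bag-of-words cosine score and the greedy threading loop are toy stand-ins for the trained model, not the paper's actual method; all function names are assumptions.

```python
from collections import Counter
from math import sqrt

def pair_input(msg_a, msg_b):
    """Format two messages as a BERT-style sentence-pair input."""
    return f"[CLS] {msg_a} [SEP] {msg_b} [SEP]"

def reply_score(msg_a, msg_b):
    """Toy stand-in for the learned pair classifier: cosine similarity
    of bag-of-words vectors (a real system would score pair_input()
    with a fine-tuned BERT model instead)."""
    va, vb = Counter(msg_a.lower().split()), Counter(msg_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = sqrt(sum(c * c for c in va.values()))
    nb = sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def disentangle(messages, threshold=0.2):
    """Greedy disentanglement: attach each message to the thread whose
    last message it most plausibly replies to, else start a new thread."""
    threads = []  # each thread is a list of message indices
    for i, msg in enumerate(messages):
        best, best_score = None, threshold
        for t_idx, thread in enumerate(threads):
            score = reply_score(messages[thread[-1]], msg)
            if score > best_score:
                best, best_score = t_idx, score
        if best is None:
            threads.append([i])
        else:
            threads[best].append(i)
    return threads
```

With three chat lines where the second answers the first and the third is off-topic, the greedy loop groups the first two into one thread and opens a second thread for the third.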
