HyRead Journal: Taiwan Full-Text Database

Article Details

Journal of Computers (indexed in: EI, MEDLINE, Scopus)

Natural Sciences / Information / Technology

Title	FOF: Fusing Object Features into Deep Learning Model to Generate Image Caption
Volume/Issue	30:4
Authors	Hang Zhou, Xue-Qiang Lv, Xin-Dong You, Zhi-An Dong, Kai Zhang
Pages	206-216
Keywords	convolutional neural network, image caption, object detection, recurrent neural network, EI, MEDLINE, Scopus
Publication date	2019-08
DOI	10.3966/199115992019083004020

To address the object category errors and object number errors in sentences generated by existing image captioning models, we propose an image captioning model fused with object features. Specifically, we integrate an object statistical feature and an object regional feature, both extracted from the image, into the Convolutional Neural Network (CNN) plus Recurrent Neural Network (RNN) image captioning framework. An object detection network extracts the object statistical feature and the object regional feature. The object statistical feature and the image convolutional feature are used as the input of a Long Short-Term Memory (LSTM) network, and an Attention Mechanism (AM) is used to concatenate the object regional feature with the output of the LSTM to generate sentences. The model thereby obtains additional information about object categories, object numbers, and object regions, which helps improve the quality of the generated descriptions. Experiments are conducted on the MSCOCO dataset. Compared with the Hard-attention model, BLEU-3 and BLEU-4 increase by 4.5% and 4.9%, respectively; compared with the g-LSTM model, they increase by 4.4% and 3.5%, respectively. The proposed model is of great significance for solving the problem of object category errors and object number errors in image description.
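
The data flow described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The category list, the use of per-category counts as the "object statistical feature", dot-product attention, and all function names (`object_statistical_feature`, `attend`, `fuse_step`) are assumptions made for illustration. The LSTM itself is omitted, and `hidden` stands in for its output at one decoding step.

```python
import math
from collections import Counter

# Hypothetical label set; the abstract does not name the detector or its categories.
CATEGORIES = ["person", "dog", "bicycle"]


def object_statistical_feature(detected_labels, categories=CATEGORIES):
    """Assumed form of the statistical feature: one object count per category."""
    counts = Counter(detected_labels)
    return [float(counts[c]) for c in categories]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions, hidden):
    """Soft attention (assumed dot-product scoring) over region features.

    Each region vector must have the same length as `hidden`.
    Returns the weighted sum of region vectors and the attention weights.
    """
    scores = [sum(r_i * h_i for r_i, h_i in zip(r, hidden)) for r in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return context, weights


def fuse_step(image_feature, detected_labels, regions, hidden):
    """One decoding step of the fusion, with the LSTM replaced by a given `hidden`."""
    # LSTM input: image convolutional feature joined with the statistical feature
    # (joining by concatenation is an assumption).
    lstm_input = image_feature + object_statistical_feature(detected_labels)
    # Attended regional feature concatenated with the LSTM output.
    context, weights = attend(regions, hidden)
    fused_output = hidden + context
    return lstm_input, fused_output, weights


if __name__ == "__main__":
    lstm_input, fused_output, weights = fuse_step(
        image_feature=[0.1, 0.2],
        detected_labels=["person", "person", "dog"],
        regions=[[1.0, 0.0], [0.0, 1.0]],
        hidden=[1.0, 0.0],
    )
    print(lstm_input)
    print(fused_output)
    print(weights)
```

In this sketch the count vector gives the decoder explicit access to how many objects of each category were detected, while the attention weights select which detected region to emphasize at the current word. In a full model, `fused_output` would feed a word-prediction layer.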
