Article Details

Journal of Computers (EI, MEDLINE, Scopus)

Title: FOF: Fusing Object Features into Deep Learning Model to Generate Image Caption
Volume/Issue: 30:4
Authors: Hang Zhou, Xue-Qiang Lv, Xin-Dong You, Zhi-An Dong, Kai Zhang
Pages: 206-216
Keywords: convolutional neural network, image caption, object detection, recurrent neural network
Publication date: August 2019
DOI: 10.3966/199115992019083004020

Abstract

To address object category errors and object count errors in the sentences generated by existing image captioning models, we propose an image captioning model fused with object features. Specifically, we integrate an object statistical feature and an object regional feature, both extracted from the image by an object detection network, into the Convolutional Neural Network (CNN) plus Recurrent Neural Network (RNN) image captioning framework. The object statistical feature and the image convolutional feature serve as the input to a Long Short-Term Memory (LSTM) network, and an Attention Mechanism (AM) is used to concatenate the object regional feature with the LSTM output to generate sentences. The model thus obtains additional information about object categories, object counts, and object regions, which helps improve the quality of the generated descriptions. Experiments are conducted on the MSCOCO dataset. Compared with the Hard-attention model, BLEU-3 and BLEU-4 increase by 4.5% and 4.9%, respectively; compared with the g-LSTM model, they increase by 4.4% and 3.5%, respectively. The proposed model is significant for addressing object category errors and object count errors in image description.
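
The data flow described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify how the object statistical feature is encoded, how it is combined with the image feature, or which attention scoring function is used. The sketch assumes a per-category count vector, plain concatenation for the LSTM input, and dot-product soft attention over region features that share the dimension of the LSTM output. All function names are hypothetical.

```python
import math
from collections import Counter


def object_statistical_feature(detected_labels, categories):
    """Assumed encoding: one count per category, in a fixed category order."""
    counts = Counter(detected_labels)
    return [float(counts.get(c, 0)) for c in categories]


def lstm_input(image_feature, stat_feature):
    """Assumed fusion: concatenate the image feature and the count vector."""
    return list(image_feature) + list(stat_feature)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_and_concat(hidden, region_features):
    """Weight region features by dot-product attention against the LSTM
    output, then concatenate the weighted context to that output.
    Assumes every region feature has the same length as `hidden`."""
    scores = [sum(h * r for h, r in zip(hidden, region))
              for region in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [sum(w * region[i] for w, region in zip(weights, region_features))
               for i in range(dim)]
    return list(hidden) + context, weights


# Toy usage with made-up detections and features.
stat = object_statistical_feature(["dog", "dog", "person"],
                                  ["person", "dog", "cat"])
x = lstm_input([0.5, 0.1], stat)
fused, weights = attend_and_concat([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setup the count vector carries the category and number information, while the attention weights indicate which detected region contributes most to the fused vector at a given decoding step.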
