| Field | Value |
|---|---|
| Title | Autoencoder-based Feature Learning from a 2D Depth Map and 3D Skeleton for Action Recognition |
| Volume/Issue | 29:4 |
| Authors | Zhi-Ze Wu, Shou-Hong Wan, Li Yan, Li-Hua Yue |
| Pages | 082-095 |
| Keywords | 2D depth map, 3D skeleton, action recognition, Auto-Encoder, back-propagation neural network |
| Indexed in | EI, MEDLINE, Scopus |
| Publication Date | 2018-08 |
| DOI | 10.3966/199115992018082904007 |
The 3D skeleton is a compact representation for human action recognition. Existing approaches focus mainly on performing action recognition with joint coordinates alone. However, they have difficulty recognizing similar or complex actions from skeleton data only, and they can perform poorly when the estimated skeletal joints are unreliable or when action sequences share a large overlap. Noting that 3D skeleton data is usually obtained from depth maps and that a neural network can approximate arbitrary relationships, in this paper we propose an efficient approach for action recognition that combines 2D depth information with 3D skeleton information using a new deep architecture called the Deep Multimodal Auto-Encoder (DMAE). First, the DMAE framework employs two Auto-Encoders to extract hidden representations of the 2D depth maps and the 3D skeletons. Second, DMAE uses a two-layer neural network to map the hidden representations of the 2D depth maps to those of the 3D skeletons. Third, based on a Back-propagation Neural Network (BP-NN), it jointly learns the hidden 2D/3D representations and the relationships between them. Finally, we apply Temporal Pyramid Matching (TPM) to the learned features to generate temporal representations and perform classification with a linear SVM. Extensive experiments on two popular action datasets demonstrate the effectiveness and efficiency of DMAE.
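The pipeline described in the abstract can be sketched as follows. This is a minimal, illustrative NumPy forward pass only, not the authors' implementation: the layer sizes (`depth_dim`, `skel_dim`, `hid`), the single-hidden-layer auto-encoders, the two-level temporal pyramid, and the random weights are all assumptions made for the sketch; training (back-propagation) and the SVM classifier are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AutoEncoder:
    """One-hidden-layer auto-encoder: x -> h -> x_hat (weights untrained here)."""
    def __init__(self, n_in, n_hidden):
        self.W_enc = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b_enc = np.zeros(n_hidden)
        self.W_dec = rng.normal(0, 0.1, (n_hidden, n_in))
        self.b_dec = np.zeros(n_in)

    def encode(self, x):
        return sigmoid(x @ self.W_enc + self.b_enc)

    def decode(self, h):
        return sigmoid(h @ self.W_dec + self.b_dec)

# Hypothetical dimensionalities: a flattened 2D depth patch and a 3D joint vector.
depth_dim, skel_dim, hid = 1024, 60, 128

ae_depth = AutoEncoder(depth_dim, hid)   # hidden code for 2D depth maps
ae_skel = AutoEncoder(skel_dim, hid)     # hidden code for 3D skeletons

# Two-layer network mapping depth-map codes to skeleton codes.
W1 = rng.normal(0, 0.1, (hid, hid)); b1 = np.zeros(hid)
W2 = rng.normal(0, 0.1, (hid, hid)); b2 = np.zeros(hid)

def map_depth_to_skel(h_depth):
    return sigmoid(sigmoid(h_depth @ W1 + b1) @ W2 + b2)

def tpm_pool(frames, levels=2):
    """Temporal pyramid pooling: mean-pool per-frame features over
    progressively finer temporal segments and concatenate."""
    parts = []
    for level in range(levels):
        for seg in np.array_split(frames, 2 ** level):
            parts.append(seg.mean(axis=0))
    return np.concatenate(parts)

batch = rng.normal(size=(8, depth_dim))   # 8 synthetic depth frames of one clip
h_d = ae_depth.encode(batch)              # (8, 128) depth-map codes
h_s_pred = map_depth_to_skel(h_d)         # (8, 128) predicted skeleton codes
video_feat = tpm_pool(h_s_pred)           # (384,) clip-level TPM feature
```

The resulting fixed-length `video_feat` vector is what a linear SVM would consume for classification; in the paper, the auto-encoder and mapping weights would first be learned jointly via back-propagation rather than left random as here.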