Human Motion Prediction Using Wavelet Transform

  • Wafaa Shihab Ahmed, University of Technology
Keywords: Wavelet Transform, Convolutional Neural Network (CNN), Variational Autoencoder (VAE), Long Short-Term Memory (LSTM).

Abstract

The goal of human motion prediction is to analyze a subject's behavior from observed sequences and to produce future body motions. This work proposes a deep neural network that combines the wavelet transform with a CNN-VAE model: the wavelet transform decomposes the input data into multiple scales and extracts features, which the CNN-VAE encoder then encodes. An LSTM model predicts the encoded data, and a CNN decoder decodes the prediction to produce the new predicted frames. The proposed system achieved the best results in PSNR, MSE, and SSIM and reduced the time needed for training and testing (prediction). The experiments were carried out on two datasets, KTH and Weizmann, and generated video of 1200 ms.
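
To make the multi-scale analysis and two of the reported metrics concrete, the following is a minimal sketch and not the author's implementation. It assumes a single-level orthonormal Haar wavelet (the abstract does not name the wavelet family), grayscale frames with even height and width, and 8-bit pixel values for PSNR. The function names haar_dwt2, haar_idwt2, mse, and psnr are illustrative. The CNN-VAE, LSTM, and SSIM parts of the pipeline are not shown.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt2(frame):
    """One level of a 2D orthonormal Haar transform.

    Assumption: `frame` is a 2D array with even height and width.
    Returns four half-resolution sub-bands (ll, lh, hl, hh).
    """
    x = np.asarray(frame, dtype=np.float64)
    # Filter along the width: pairwise sums (low-pass) and differences (high-pass).
    lo = (x[:, 0::2] + x[:, 1::2]) / SQRT2
    hi = (x[:, 0::2] - x[:, 1::2]) / SQRT2
    # Filter along the height in the same way.
    ll = (lo[0::2, :] + lo[1::2, :]) / SQRT2  # coarse approximation
    lh = (lo[0::2, :] - lo[1::2, :]) / SQRT2  # detail across rows
    hl = (hi[0::2, :] + hi[1::2, :]) / SQRT2  # detail across columns
    hh = (hi[0::2, :] - hi[1::2, :]) / SQRT2  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2 and return the full-resolution frame."""
    h, w = ll.shape
    lo = np.empty((2 * h, w))
    hi = np.empty((2 * h, w))
    # Undo the filtering along the height.
    lo[0::2, :] = (ll + lh) / SQRT2
    lo[1::2, :] = (ll - lh) / SQRT2
    hi[0::2, :] = (hl + hh) / SQRT2
    hi[1::2, :] = (hl - hh) / SQRT2
    # Undo the filtering along the width.
    x = np.empty((2 * h, 2 * w))
    x[:, 0::2] = (lo + hi) / SQRT2
    x[:, 1::2] = (lo - hi) / SQRT2
    return x


def mse(reference, predicted):
    """Mean squared error between two frames of equal shape."""
    a = np.asarray(reference, dtype=np.float64)
    b = np.asarray(predicted, dtype=np.float64)
    return float(np.mean((a - b) ** 2))


def psnr(reference, predicted, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit frames."""
    err = mse(reference, predicted)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / err))
```

Applying haar_dwt2 again to the returned ll sub-band yields the next coarser scale, which is one way a multi-scale decomposition could be built before features are passed to an encoder. Lower MSE and higher PSNR indicate that a predicted frame is closer to the ground-truth frame.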

 

Published
2021-08-19
How to Cite
Shihab Ahmed, W. (2021). Human Motion Prediction Using Wavelet Transform. Al-Qadisiyah Journal of Pure Science, 26(4), 444–458. Retrieved from https://journalsc.qu.edu.iq/index.php/JOPS/article/view/1354
Section
Special Issue (Silver Jubilee)