An Evaluation of 2D Human Pose Estimation based on ResNet Backbone
by Hai-Yen Tran 1, Trung-Minh Bui 2, Thi-Loan Pham 3, Van-Hung Le 2,*
1 Tan Trao University, Tuyen Quang, 22000, Vietnam
2 Vietnam Academy of Dance, Hanoi, 100000, Vietnam
3 Hai Duong College, Hai Duong, 02203, Vietnam
* Author to whom correspondence should be addressed.
Journal of Engineering Research and Sciences, vol. 1, no. 3, pp. 59–67, 2022; DOI: 10.55708/js0103007
Keywords: 2D Human Pose Estimation, Residual Networks backbone, Human 3.6M Dataset, Convolutional
Neural Networks
Received: 08 January 2022, Revised: 07 March 2022, Accepted: 26 February 2022, Published Online: 17 March 2022
2D Human Pose Estimation (2D-HPE) has been widely applied in many practical applications in daily life, such as human-robot interaction, using Convolutional Neural Networks (CNNs), which have achieved many good results. In particular, 2D-HPE results serve as an intermediate step in the 3D Human Pose Estimation (3D-HPE) process. In this paper, we perform a comparative study of 2D-HPE using versions of the Residual Network (ResNet/RN) (RN-10, RN-18, RN-50, RN-101, RN-152) on the Human 3.6M dataset (HU-3.6M-D). We transformed the original 3D annotation data of the Human 3.6M dataset into 2D human poses. The estimation models were fine-tuned on two protocols of the HU-3.6M-D with the same input parameters across the RN versions. The best estimate has an error of 34.96 pixels with Protocol #1 and 28.48 pixels with Protocol #3 when training for 10 epochs; increasing the number of training epochs reduces the estimation error (15.8 pixels for Protocol #3). Quantitative evaluation, comparison, analysis, and illustrative results are presented in the paper.
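The two steps summarized above — projecting 3D joint annotations into 2D image coordinates, and scoring predictions by mean per-joint error in pixels — can be sketched as follows. This is an illustrative outline only, not the authors' implementation: the function names (`project_to_2d`, `mean_pixel_error`) and the pinhole-camera intrinsics (`fx`, `fy`, `cx`, `cy`) are hypothetical placeholder values, not the Human 3.6M camera parameters.

```python
import math

def project_to_2d(joints_3d, fx=1145.0, fy=1145.0, cx=512.0, cy=512.0):
    """Project camera-space 3D joints [(x, y, z), ...] to 2D pixel
    coordinates with a simple pinhole model (intrinsics are illustrative)."""
    return [(fx * x / z + cx, fy * y / z + cy) for x, y, z in joints_3d]

def mean_pixel_error(pred_2d, gt_2d):
    """Mean Euclidean distance in pixels over all joints, matching the
    pixel-error metric used to compare the ResNet variants."""
    dists = [math.hypot(pu - gu, pv - gv)
             for (pu, pv), (gu, gv) in zip(pred_2d, gt_2d)]
    return sum(dists) / len(dists)

# Toy example: a 3-joint pose roughly 3 m from the camera.
gt_3d = [(0.0, 0.0, 3.0), (0.1, -0.2, 3.0), (-0.1, 0.4, 3.1)]
gt_2d = project_to_2d(gt_3d)
# A prediction offset by (3, 4) pixels at every joint -> error of 5 pixels.
pred_2d = [(u + 3.0, v + 4.0) for u, v in gt_2d]
print(round(mean_pixel_error(pred_2d, gt_2d), 2))  # 5.0
```

The same `mean_pixel_error` computation, averaged over all frames of a test protocol, yields a single pixel-error figure of the kind reported for each ResNet variant.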