An Evaluation of 2D Human Pose Estimation based on ResNet Backbone
by Hai-Yen -Tran1, Trung-Minh Bui2, Thi-Loan Pham3, Van-Hung Le 2,*
1 Tan Trao University, Tuyen Quang, 22000, Vietnam
2 Vietnam Academy of Dance, HaNoi, 100000, Vietnam
3 Hai Duong College, HaiDuong, 02203, Vietnam
* Author to whom correspondence should be addressed.
Journal of Engineering Research and Sciences, Volume 1, Issue 3, Page # 59-67, 2022; DOI: 10.55708/js0103007
Keywords: 2D Human Pose Estimation, Residual Networks backbone, Human 3.6M Dataset, Convolutional Neural Networks
Received: 08 January 2022, Revised: 07 March 2022, Accepted: 26 February 2022, Published Online: 17 March 2022
-Tran H-Y, Bui T-M, Pham T-L, Le V-H. An evaluation of 2D human pose estimation based on Resnet backbone. Journal of Engineering Research and Sciences. 2022;1(3):59-67. doi:10.55708/js0103007
-Tran, Hai-Yen, Trung-Minh Bui, Thi-Loan Pham, and Van-Hung Le. “An Evaluation of 2D Human Pose Estimation Based on Resnet Backbone.” Journal of Engineering Research and Sciences 1, no. 3 (2022): 59–67.
H.-Y. -Tran, T.-M. Bui, T.-L. Pham, and V.-H. Le, “An evaluation of 2D human pose estimation based on Resnet backbone,” Journal of Engineering Research and Sciences, vol. 1, no. 3, pp. 59–67, 2022.
2D Human Pose Estimation (2D-HPE) using Convolutional Neural Networks (CNNs) has been widely applied in many practical applications, such as […]ction and human-robot interaction, and has achieved many good results. In particular, 2D-HPE results serve as an intermediate step in the 3D Human Pose Estimation (3D-HPE) process. In this paper, we compare the results of 2D-HPE using several versions of the Residual Network (ResNet/RN) backbone (RN-10, RN-18, RN-50, RN-101, RN-152) on the Human 3.6M Dataset (HU-3.6M-D). We transformed the original 3D annotation data of the Human 3.6M dataset into 2D human poses. The models are fine-tuned under two protocols of the HU-3.6M-D, with the same input parameters for all RN versions. The best estimate has an error of 34.96 pixels with Protocol #1 and 28.48 pixels with Protocol #3 when training for 10 epochs. Increasing the number of training epochs reduces the estimation error (to 15.8 pixels with Protocol #3). The results of quantitative evaluation, comparison, analysis, and illustration are presented in the paper.
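To make the two quantities in the abstract concrete, the following is a minimal sketch of how 3D joint annotations might be turned into 2D poses and how an error in pixels might be computed. It assumes an ideal pinhole camera (no lens distortion) and a mean Euclidean distance over joints; the abstract does not state the exact transformation or metric, so both are assumptions. The function names `project_to_2d` and `mean_pixel_error`, the intrinsics `fx`, `fy`, `cx`, `cy`, and all numeric values are hypothetical and are not taken from the Human 3.6M dataset.

```python
import math


def project_to_2d(joints_3d, fx, fy, cx, cy):
    """Project camera-space 3D joints (x, y, z) to pixel coordinates.

    Assumption: ideal pinhole model without lens distortion.
    """
    points = []
    for x, y, z in joints_3d:
        if z <= 0:
            raise ValueError("joint must lie in front of the camera (z > 0)")
        points.append((fx * x / z + cx, fy * y / z + cy))
    return points


def mean_pixel_error(predicted, ground_truth):
    """Average Euclidean distance, in pixels, between matching joints."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty joint lists of equal length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


# Hypothetical values for illustration only (not real dataset annotations).
joints_3d = [(0.0, 0.0, 4000.0), (200.0, -400.0, 4000.0)]
gt_2d = project_to_2d(joints_3d, fx=1000.0, fy=1000.0, cx=500.0, cy=500.0)
pred_2d = [(503.0, 504.0), (550.0, 400.0)]

print(gt_2d)                             # [(500.0, 500.0), (550.0, 400.0)]
print(mean_pixel_error(pred_2d, gt_2d))  # 2.5
```

In this toy case the first joint is off by 5 pixels and the second is exact, so the mean error is 2.5 pixels. The errors reported in the abstract (for example, 34.96 pixels with Protocol #1) would be averages of this kind over all joints and test frames, if the metric is defined as assumed here.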
