An Integrated Approach to Manage Imbalanced Datasets using PCA with Neural Networks
by Swarup Kumar Mondal and Anindya Sen
1 Department of Electronics and Communication Engineering, Heritage Institute of Technology, Kolkata, 700107, India
* Author to whom correspondence should be addressed.
Journal of Engineering Research and Sciences, Volume 3, Issue 10, pp. 1-12, 2024; DOI: 10.55708/js0310001
Keywords: Imbalanced data, Regression, Deep Neural Network, Artificial Neural Network, Support Vector Machine
Received: 19 August 2024, Revised: 20 September 2024, Accepted: 21 September 2024, Published Online: 11 October 2024
(This article belongs to the Special Issue on Multidisciplinary Sciences and Advanced Technology 2024 and the Section Biochemical Research Methods (BRM))
Handling imbalanced datasets in real time is one of the most challenging tasks in predictive modelling. This work addresses the critical issues that arise with imbalanced datasets by implementing artificial neural network (ANN) and deep neural network (DNN) architectures. Conventional machine learning (ML) algorithms fail to achieve the desired throughput under certain input conditions because of mismatched class ratios in the sample dataset. Working with imbalanced data leads to performance degradation and interpretability issues in traditional ML architectures. For regression tasks, where the target variable is continuous, the skewed data distribution is a major issue. In this study, we present a detailed comparison of traditional ML algorithms and neural networks, combined with a dimensionality reduction method, to overcome this problem. Principal component analysis (PCA) is used for feature selection and analysis on a real-time, satellite-based air pollution dataset. Five regression algorithms (multilinear, Ridge, Lasso, Elastic Net, and SVM regression) are each evaluated with and without PCA to interpret the outcome. To address imbalanced datasets in real time, deep neural network and artificial neural network architectures have been developed. The experiments and mathematical modelling for each model are carried out independently. The deep neural network outperforms the conventional models on the performance measures for the target variable in imbalanced datasets.
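The comparison described in the abstract, five regressors each run with and without PCA, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, replaces the satellite air pollution dataset with a synthetic dataset that has a right-skewed target, and uses hypothetical hyperparameters (`alpha`, `C`, `n_components`) and helper names (`make_skewed_data`, `compare_regressors`) chosen only for the example. The neural network models are not included.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_skewed_data(n_samples=600, n_features=8, seed=0):
    """Synthetic stand-in data (assumption): correlated features, skewed target."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, 3))
    mixing = rng.normal(size=(3, n_features))
    X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))
    # Exponentiating gives a long right tail, so large target values are rare.
    y = np.exp(0.8 * latent[:, 0] + 0.3 * latent[:, 1])
    y = y + 0.05 * rng.normal(size=n_samples)
    return X, y


def compare_regressors(X, y, n_components=3, seed=0):
    """Return {(model_name, uses_pca): test RMSE} for five regressors."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    # Hyperparameters below are illustrative, not taken from the paper.
    factories = {
        "multilinear": lambda: LinearRegression(),
        "ridge": lambda: Ridge(alpha=1.0),
        "lasso": lambda: Lasso(alpha=0.01, max_iter=10000),
        "elastic_net": lambda: ElasticNet(alpha=0.01, l1_ratio=0.5, max_iter=10000),
        "svr": lambda: SVR(kernel="rbf", C=10.0),
    }
    results = {}
    for name, make_model in factories.items():
        for use_pca in (False, True):
            steps = [StandardScaler()]
            if use_pca:
                # PCA is fitted on the training split only, inside the pipeline.
                steps.append(PCA(n_components=n_components))
            steps.append(make_model())
            pipeline = make_pipeline(*steps)
            pipeline.fit(X_train, y_train)
            predictions = pipeline.predict(X_test)
            rmse = float(np.sqrt(mean_squared_error(y_test, predictions)))
            results[(name, use_pca)] = rmse
    return results


if __name__ == "__main__":
    X, y = make_skewed_data()
    for (name, use_pca), rmse in compare_regressors(X, y).items():
        label = "PCA" if use_pca else "no PCA"
        print(f"{name:12s} {label:7s} RMSE = {rmse:.3f}")
```

Placing the scaler and PCA inside a pipeline keeps the projection from seeing the test split, so the with-PCA and without-PCA scores are compared on equal terms. A single RMSE can hide poor accuracy in the sparse tail of a skewed target, so a fuller evaluation would probably also report errors separately for the rare value range.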