Integration of SMOTE and Ensemble Models for Predicting Airline Passenger Satisfaction

Bagus Laksono, Ika Kurniawati, Aditya Budi Sriwiyanta, Zen Zen Zaenudin, Johan Afrian Ramadha, Desri Alfian

Abstract

Strong public demand for air transportation has intensified competition among airline companies, which must continuously improve their services to remain viable. The passenger satisfaction survey data collected for this purpose suffers from several problems, including class imbalance, missing values, noise, biased records, and significant patterns that are difficult to detect. Class imbalance skews classification results toward the majority class, which can degrade the performance of the prediction model. SMOTE (Synthetic Minority Over-sampling Technique) is an over-sampling method that balances a dataset by generating synthetic minority-class samples, each interpolated between an existing minority sample and one of its k nearest minority-class neighbors. Boosting is a machine learning strategy that combines many weak, individually inaccurate prediction rules into a single highly accurate one. In this study, we experimented with models that integrate SMOTE and the AdaBoost ensemble with the Decision Tree (DT) and Naive Bayes (NB) classification algorithms to obtain the best performance metrics. The results showed that the DT + SMOTE and DT + SMOTE + AdaBoost models both achieved an accuracy of 91.88%, outperforming the traditional DT model. The NB + SMOTE + AdaBoost and NB + AdaBoost models also yielded significant improvements, increasing accuracy by around 5% over NB. However, applying SMOTE to NB decreased accuracy, because the synthetic samples produced by SMOTE can violate the independence assumption of NB. These results demonstrate that the proposed robust ensemble learning approach outperforms traditional machine learning classifiers. Both techniques, SMOTE and AdaBoost, are highly effective at improving classification performance, especially on complex and imbalanced data.

Keywords: AdaBoost, Customer satisfaction prediction, Data mining, Ensemble learning, Imbalanced data, SMOTE.
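The two mechanisms the abstract describes, SMOTE's k-nearest-neighbor interpolation and boosting's reweighting of weak prediction rules, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not state which libraries or hyperparameters were used, so the function names (`smote`, `fit_stump`, `adaboost`, `predict`), the use of decision stumps (depth-1 decision trees) as the weak learner, the ±1 label encoding, and the toy data are all assumptions made for illustration. In a real experiment, SMOTE should be applied only to the training split so that synthetic points do not leak into the evaluation data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling (sketch): interpolate between a minority sample
    and one of its k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # at most len-1 other points exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x within the minority class, excluding x itself
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment from x to nn
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def fit_stump(X, y, w):
    """Return the stump (feature, threshold, polarity, weighted error) with lowest error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (polarity if xi[f] <= t else -polarity) != yi)
                if best is None or err < best[3]:
                    best = (f, t, polarity, err)
    return best


def stump_predict(stump, x):
    f, t, polarity = stump[:3]
    return polarity if x[f] <= t else -polarity


def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost with decision stumps (assumed weak learner); y in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[3], 1e-10)  # clamp to avoid log of infinity for a perfect stump
        if err >= 0.5:
            break  # weak learner is no better than chance
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, then renormalise
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical toy data: 8 majority (-1) samples vs 3 minority (+1) samples
    majority = [(float(v),) for v in range(8)]
    minority = [(10.0,), (11.0,), (12.0,)]
    synthetic = smote(minority, n_synthetic=5, k=2)  # balance to 8 vs 8
    X = majority + minority + synthetic
    y = [-1] * len(majority) + [1] * (len(minority) + len(synthetic))
    model = adaboost(X, y, n_rounds=5)
    print(len(synthetic), predict(model, (11.5,)), predict(model, (2.0,)))  # prints: 5 1 -1
```

For actual experiments, library implementations such as imbalanced-learn's `SMOTE` and scikit-learn's `AdaBoostClassifier` would normally replace these hand-written versions; the sketch only makes the interpolation and reweighting steps explicit.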

Pages: 77-85

