Air Quality Classification Using Extreme Gradient Boosting (XGBOOST) Algorithm

Albi Mulyadi Sapari, Asep Id Hadiana, Fajri Rakhmat Umbara

Abstract

Air pollution is a serious problem caused by vehicle exhaust, factories, and accumulated garbage, and it harms both human health and the environment. Classification techniques make it possible to monitor air quality quickly and accurately. One efficient and accurate classification algorithm is XGBoost, an extension of the Gradient Boosting Decision Tree (GBDT) whose advantages include high scalability and protection against overfitting. The parameters used for classification are PM10, PM2.5, SO2, CO, O3, and NO2. This study classifies air quality into three categories: good, moderate, and unhealthy. Because the dataset is class-imbalanced, three resampling techniques are applied: SMOTE, Random UnderSampling, and Random OverSampling. Accuracy reaches up to 98.61% when SMOTE is used to handle the class imbalance. Accuracy is evaluated using a confusion matrix.
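The abstract does not show the study's implementation. As a rough illustration of two of the steps it names, the sketch below generates synthetic minority-class samples by SMOTE-style interpolation and computes accuracy from a confusion matrix. It uses only the Python standard library; the function names, the toy (PM10, PM2.5) readings, and the example labels are all hypothetical and are not taken from the paper's dataset. The XGBoost classifier itself is omitted.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class (itself excluded)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Share of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical minority-class ("unhealthy") readings: (PM10, PM2.5)
minority = [(160.0, 110.0), (172.0, 125.0), (181.0, 118.0), (165.0, 130.0)]
synthetic = smote(minority, n_new=4)

# Hypothetical true and predicted labels for six test samples
labels = ["good", "moderate", "unhealthy"]
y_true = ["good", "good", "moderate", "moderate", "moderate", "unhealthy"]
y_pred = ["good", "moderate", "moderate", "moderate", "moderate", "unhealthy"]
matrix = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(matrix)

print(matrix)  # [[1, 1, 0], [0, 3, 0], [0, 0, 1]]
print(f"accuracy = {acc:.4f}")
```

Each synthetic point lies on the line segment between two existing minority samples, which is what separates SMOTE from Random OverSampling, where existing samples are simply duplicated. In practice, resampling is normally applied to the training split only, so that the confusion matrix is computed on untouched test data.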

Full Text:

PDF (44-51)
