Air Quality Classification Using Extreme Gradient Boosting (XGBOOST) Algorithm

Albi Mulyadi Sapari, Asep Id Hadiana, Fajri Rakhmat Umbara


Air pollution is a serious problem caused by vehicle exhaust, factories, and piles of garbage, and it harms both human health and the environment. Classification techniques make it possible to monitor air quality quickly and accurately. One efficient and accurate classification algorithm is XGBoost, an extension of the Gradient Boosting Decision Tree (GBDT) that offers advantages such as high scalability and protection against overfitting. The parameters used for classification are PM10, PM2.5, SO2, CO, O3, and NO2. This study aims to classify air quality into three categories: good, moderate, and unhealthy. Because the dataset is class-imbalanced, three resampling techniques are applied to address the imbalance: SMOTE, Random UnderSampling, and Random OverSampling. The best result, an accuracy of up to 98.61%, is obtained with SMOTE. Accuracy is evaluated using a confusion matrix.
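The two supporting steps named above, oversampling the minority class and reading accuracy off a confusion matrix, can be illustrated with a minimal sketch in plain Python. It is not the study's implementation and it does not include the XGBoost classifier itself. The function names (`smote_like`, `confusion_matrix`, `accuracy`), the two-feature toy values, and the example predictions are all invented for illustration. The interpolation follows the core SMOTE idea only, without the refinements of a library implementation.

```python
import math
import random

LABELS = ["good", "moderate", "unhealthy"]


def smote_like(samples, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest neighbours from the same class."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # Candidate neighbours: every other sample of the same class.
        others = [s for j, s in enumerate(samples) if j != i]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Invented (PM10, PM2.5) readings for a rare "unhealthy" class.
unhealthy = [(160.0, 110.0), (180.0, 125.0), (170.0, 140.0)]
synthetic = smote_like(unhealthy, n_new=4)

# Invented true and predicted labels for six test records.
y_true = ["good", "good", "moderate", "unhealthy", "unhealthy", "moderate"]
y_pred = ["good", "moderate", "moderate", "unhealthy", "unhealthy", "moderate"]
matrix = confusion_matrix(y_true, y_pred)

print(synthetic)
print(matrix)
print(accuracy(matrix))
```

Every synthetic point lies on a line segment between two real minority samples, so the oversampled class stays inside the region the original samples occupy. Random oversampling, by contrast, would only duplicate existing records, and random undersampling would discard records from the larger classes.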

