Use of Support Vector Machines (SVM) for Classifying Pollution Sources in Urban Environments
Rishabh Bhardwaj, Centre of Research Impact and Outcome, Chitkara University, Rajpura, Punjab, India. rishabh.bhardwaj.orp@chitkara.edu; ORCID: 0009-0009-6075-8837
Rakhi Jha, Assistant Professor, Department of Computer Science & IT, ARKA JAIN University, Jamshedpur, Jharkhand, India. rakhi.j@arkajainuniversity.ac.in; ORCID: 0009-0007-2593-9072
Dr. Mercy Paul Selvan, Professor, Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, India. mercypaulselvan.cse@sathyabama.ac.in; ORCID: 0000-0001-8950-849X
Dr. Koushik Sar, Assistant Professor, Department of Agronomy, Institute of Agricultural Sciences, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India. koushiksar@soa.ac.in; ORCID: 0000-0002-7754-2663
M. Sunil Kumar, Assistant Professor, Department of Mechanical Engineering, Faculty of Engineering and Technology, JAIN (Deemed-to-be University), Ramnagar, Karnataka, India. sunilkumar.m@jainuniversity.ac.in; ORCID: 0000-0001-9054-4279
Lakshman Singh, School of Engineering & Computing, Dev Bhoomi Uttarakhand University, Dehradun, India. ce.lakshman@dbuu.ac.in; ORCID: 0009-0005-7018-3855
Keywords: Air pollution source, machine learning, meteorological data, support vector machines, urban air quality.
Abstract
This study investigates the application of Support Vector Machines (SVM) to classify major air pollution sources in Bengaluru, India, by integrating routinely collected air quality, meteorological, and land-use data from 2021 to 2023. The main objective is to assess whether commonly available datasets can accurately distinguish between vehicular, industrial, domestic, and biomass burning sources. Air pollutant concentrations (PM₂.₅, PM₁₀, NO₂, SO₂, CO, O₃) were combined with meteorological parameters, satellite-derived land-cover indices (NDVI, NDBI), and urban activity datasets to develop feature vectors for classification. Data preprocessing ensured quality control, synchronisation, and normalisation, while principal component analysis reduced dimensionality. An SVM with a radial basis function kernel was trained and evaluated using stratified cross-validation, with model stability improved through auxiliary Support Vector Regression (SVR) for temporal smoothing. The classifier achieved an overall accuracy of 70% (Cohen's kappa: 0.59), with best performance for biomass burning (F1-score: 0.78) and industrial emissions (F1-score: 0.68), and moderate success in differentiating vehicular (F1-score: 0.63) and domestic (F1-score: 0.64) sources. Predictor importance analysis revealed that road density, wind-adjusted pollutant concentrations, and land-cover indices were most influential. Spatial and temporal validation demonstrated consistency with external ground-truth activities. The findings suggest that SVM, supplemented by routine datasets, provides a robust, cost-effective alternative to traditional source apportionment for urban air quality management, with potential for real-time application in rapidly growing cities.
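For readers less familiar with the evaluation metrics quoted in the abstract, the sketch below shows how overall accuracy, Cohen's kappa, and per-class F1-scores can be derived from a multi-class confusion matrix. It is a minimal illustration and not the study's evaluation code: the function names are hypothetical, and the ten labelled samples are invented toy values chosen only to make the arithmetic easy to follow. They do not reproduce the reported results.

```python
from typing import Dict, List, Sequence

# Source classes named in the abstract.
LABELS = ["vehicular", "industrial", "domestic", "biomass"]

# Invented toy labels for illustration only (NOT data from the study).
Y_TRUE = ["vehicular"] * 3 + ["industrial"] * 2 + ["domestic"] * 2 + ["biomass"] * 3
Y_PRED = ["vehicular", "vehicular", "domestic",
          "industrial", "industrial",
          "domestic", "vehicular",
          "biomass", "biomass", "biomass"]


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (observed agreement)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def cohens_kappa(matrix: List[List[int]]) -> float:
    """kappa = (p_o - p_e) / (1 - p_e), with p_e the chance agreement."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    p_o = accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    p_e = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    if p_e == 1.0:
        # Kappa is undefined here; returning 0.0 is a convention assumed
        # for this sketch only.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


def f1_per_class(matrix: List[List[int]],
                 labels: Sequence[str]) -> Dict[str, float]:
    """Per-class F1 = 2*TP / (2*TP + FP + FN)."""
    n = len(matrix)
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(n)) - tp  # predicted i, truly other
        fn = sum(matrix[i]) - tp                       # truly i, predicted other
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


MATRIX = confusion_matrix(Y_TRUE, Y_PRED, LABELS)
print(f"accuracy = {accuracy(MATRIX):.2f}")
print(f"kappa    = {cohens_kappa(MATRIX):.2f}")
for name, score in f1_per_class(MATRIX, LABELS).items():
    print(f"F1[{name}] = {score:.2f}")
```

Because kappa subtracts the agreement expected by chance from the observed agreement, it is always at most equal to the raw accuracy. This is consistent with the abstract reporting a kappa of 0.59 alongside an accuracy of 70%.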