Enhanced prediction of occurrence forms of heavy metals in tailings: a systematic comparison of machine learning methods and model integration
-
Abstract
Tailings produced by mining and ore smelting are a major source of soil pollution. Understanding the speciation of heavy metals (HMs) in tailings is essential for soil remediation and sustainable development. Because traditional sequential laboratory extraction methods for determining the forms of HMs in tailings are complex and time-consuming, a rapid and precise identification approach is urgently required. To address this issue, a general empirical method for predicting HM occurrence forms was developed using machine learning (ML). The compositional information of the tailings, the properties of the HMs, and the sequential extraction steps were used as inputs to predict the percentages of the seven forms of HMs. After the models were tuned and compared, extreme gradient boosting, gradient boosting decision tree, and categorical boosting were found to be the three best-performing ML models, with coefficient of determination (R²) values on the testing set exceeding 0.859. Feature importance analysis of these three models indicated that electronegativity was the most important factor affecting the occurrence of HMs, with an average feature importance of 0.4522. Integrating the models by stacking further improved the prediction of HM occurrence forms, raising R² to 0.879. Overall, this study developed a robust technique for predicting the occurrence forms of HMs in tailings and provides an important reference for the environmental assessment and recycling of tailings.
-
-