TY - JOUR
T1 - Characterizing Spatial Variability of Soil Organic Carbon Through Improved Machine-Learning Modeling With In Situ Data Resampling
T2 - A Case Study in Alaska
AU - Peng, Wei
AU - Yi, Yonghong
AU - Mishra, Umakant
AU - Bakian-Dogaheh, Kazem
AU - Kimball, John S.
AU - Moghaddam, Mahta
AU - Chen, Hans W.
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Sparse and unevenly distributed soil samples across the northern high-latitude region greatly limit the accuracy of soil organic carbon (SOC) mapping. Substantial discrepancies therefore exist in SOC estimation in this region, which makes it challenging to characterize the SOC spatial variability and its potential responses to climate change and permafrost degradation. To address these challenges, we enhanced a machine-learning model for SOC mapping by developing a data resampling approach that accounts for the soil samples’ spatial heterogeneity, using Alaska as a case study. Specifically, in situ SOC data were resampled with weights proportional to the variance within a 15-km radius and then fit using a random forest (RF) regression model. Multiple features, including temporal composites of Sentinel-1 C-band radar backscatter, vegetation indices from Sentinel-2, climate indices including thawing and freezing indices from the Moderate Resolution Imaging Spectroradiometer (MODIS), and ancillary topography data, were selected as inputs for the RF model after recursive feature elimination (RFE) to generate top-layer (0–30 cm) SOC content (SOCC) maps in Alaska at a 250-m resolution. The enhanced RF model with data resampling showed improved accuracy compared to the original RF model, with the coefficient of determination (R²) increasing from 0.36 to 0.56 and the root-mean-square error (RMSE) decreasing from 16% to 11% for the surface (0–10 cm) SOCC, and slightly improved accuracy for the deeper (10–30 cm) SOCC. The enhanced RF model also captured local-scale variability of SOC better than the original RF model and the SoilGrids 2.0 dataset, with high-resolution remote sensing indices playing a major role. The improved SOCC estimates were then used to estimate soil bulk density (BD) and calculate total SOC stock for Alaska. Our results suggest that Alaskan topsoil (0–30 cm) stores approximately 25.21 ± 17.18 Pg C, with the largest SOC reserves found in shrublands. These findings highlight the importance of accounting for spatial heterogeneity in in situ samples and leveraging high-resolution remote sensing data for regional soil mapping.
KW - Data resampling
KW - machine learning
KW - multisource remote sensing
KW - soil organic carbon (SOC)
UR - https://www.scopus.com/pages/publications/105005838561
U2 - 10.1109/TGRS.2025.3572344
DO - 10.1109/TGRS.2025.3572344
M3 - Article
AN - SCOPUS:105005838561
SN - 0196-2892
VL - 63
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 4505414
ER -