Interpretable data-driven modeling of total phosphorus dynamics from 2005 to 2024 in a large shallow lake

  • Weipeng Lin
  • Yongqiang Zhou
  • Ze Ren
  • Wei Zou
  • Hongwei Guo
  • Na Li
  • Yunlin Zhang
  • James Elser
  • R. Iestyn Woolway
  • Kun Shi
  • Guangwei Zhu
  • Boqiang Qin
  • Yufei Xue

Research output: Contribution to journal › Article › peer-review

Abstract

Phosphorus (P) is a critical biogenic element driving aquatic productivity and eutrophication in freshwater systems. However, monitoring total phosphorus (TP) in shallow, dynamic lakes remains challenging due to its pronounced spatiotemporal variability and complex interactions with optically active constituents. While remote sensing provides a cost-effective supplement to in situ monitoring of TP, accurately estimating TP, a non-optically active component, from remote sensing requires models that balance high precision, reliability, and interpretability. Therefore, this study develops an interpretable machine learning framework using the Light Gradient Boosting Machine Regressor (LGBMR), trained on multi-source data including MODIS satellite reflectance, in situ measurements, and meteorological variables, to estimate TP dynamics. The LGBMR outperformed 13 other algorithms on independent validation datasets (N = 609, R² = 0.70, MAPE = 27.9 %), demonstrating superior predictive performance. Shapley Additive Explanations (SHAP) analysis revealed mechanistic controls of input variables on TP dynamics, enabling the model to effectively capture both seasonal-spatial TP variability and climate-induced extremes. Long-term analysis of Lake Taihu revealed a significant declining trend in TP concentration over the past two decades (R² = 0.26, P < 0.05, rate: -0.009 mg/L/decade), with an accelerated decline from 2017 to 2024 (R² = 0.77, P < 0.05, rate: -0.059 mg/L/decade). SHAP analysis revealed decreases of 12.4 % and 18.9 % in the number of pixels dominated by total suspended matter (TSM) and algal-associated P, respectively. The decline is attributed to reduced external loading resulting from improved watershed management and to reduced internal phosphorus release resulting from lower algal biomass and less sediment resuspension linked to weakened wind-driven mixing.
These findings underscore the effectiveness of integrated modeling approaches for tracking phosphorus dynamics in shallow eutrophic lakes, providing actionable insights for eutrophication management. The proposed framework advances interpretable machine learning in environmental monitoring by elucidating mechanistic linkages between hydrological, meteorological, and biogeochemical drivers.
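
For readers unfamiliar with the statistics quoted in the abstract, the following is a minimal sketch of how R², MAPE, and a trend rate in mg/L/decade are commonly computed. It is not the authors' code. The function names and all numeric values are hypothetical and made up for illustration, and it assumes R² is the coefficient of determination (1 - SS_res/SS_tot) and that the trend rate is an ordinary least-squares slope; the paper may define either differently.

```python
# Illustrative only: function names and data are hypothetical, not from the study.

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    ratios = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def trend_per_decade(years, values):
    """Ordinary least-squares slope of values against years, scaled to per decade."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return 10.0 * sxy / sxx


# Made-up TP concentrations in mg/L, purely to exercise the functions.
obs_tp = [0.10, 0.08, 0.12, 0.09]
pred_tp = [0.11, 0.07, 0.10, 0.09]
years = [2017, 2018, 2019, 2020]
annual_tp = [0.100, 0.094, 0.088, 0.082]

print(f"R2   = {r_squared(obs_tp, pred_tp):.3f}")
print(f"MAPE = {mape(obs_tp, pred_tp):.1f} %")
print(f"rate = {trend_per_decade(years, annual_tp):.3f} mg/L/decade")
```

Note that the R² values attached to the trend rates in the abstract describe the fit of the trend line itself, whereas the R² reported for the LGBMR describes agreement between predicted and measured TP.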

Original language: English
Article number: 125169
Journal: Water Research
Volume: 291
DOIs
State: Published - Mar 1 2026

Keywords

  • Eutrophication
  • Interpretable machine learning
  • Remote sensing
  • Shallow eutrophic lake
  • Total phosphorus
