Modelling post-fire tree mortality: Can random forest improve discrimination of imbalanced data?

Timothy M. Shearman, J. Morgan Varner, Sharon M. Hood, C. Alina Cansler, J. Kevin Hiers

Research output: Contribution to journal, Article, peer-review

29 Scopus citations

Abstract

  • Logistic regression (LR) models trained on class-imbalanced data can severely under-predict the minority class.
  • We show that balanced random forest (RF) models can be a potential solution to the class imbalance problem in post-fire tree mortality models.
  • Duff consumption and crown scorch were important predictors of tree mortality in both LR and RF models.
  • The RF model also suggested pre-fire duff depth as an important predictor.
  • Incorporating RF into decision-support tools could increase our understanding of the underlying mechanisms of fire-induced tree mortality.

Predicting post-fire tree mortality is a major area of research in fire-prone forests, woodlands, and savannas worldwide. Past research has relied overwhelmingly on logistic regression analysis (LR) that predicts post-fire tree status as a binary outcome (i.e. living or dead). One of the most problematic issues for LR (or any classification problem) occurs when there is a class imbalance in the training data. In these instances, predictions will be biased toward the majority class. Using a historical prescribed fire data set of longleaf pines (Pinus palustris) from northern Florida, USA, we compare results from standard LR and the machine-learning algorithm random forest (RF). First, we demonstrate the class imbalance problem using simulated data. We then show how a balanced RF model can be used to alleviate the bias in the model and improve mortality prediction results. In the simulated example, LR model sensitivity and specificity were clearly biased according to the degree of imbalance between the classes. The balanced RF models had consistent sensitivity and specificity throughout the simulated data sets. Re-analyzing the original longleaf pine data set with a balanced RF model showed that although the LR and RF models had similar areas under the receiver operating characteristic curve (AUC), the RF model had better discrimination for predicting new observations of dead trees. Both LR and RF models identified duff consumption and percent crown scorch as important predictors of tree mortality; however, the RF model also suggested pre-fire duff depth as an important predictor. Our analysis highlights LR limitations when data are imbalanced and supports using RF to develop post-fire tree mortality models. We suggest how RF can be incorporated into future tree mortality studies, as well as possible implementation in future decision-support tools.
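The class imbalance problem and the idea behind a balanced model can be illustrated with a minimal sketch. It is not the authors' code or data: the 5% mortality rate, the `always_live` baseline, and the helper names `sensitivity_specificity` and `balanced_bootstrap` are assumptions made for illustration. The abstract does not say how the balanced RF was built; the sketch uses one common construction, in which each tree is grown on a bootstrap sample containing equal numbers of observations from each class.

```python
import random


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def balanced_bootstrap(labels, rng):
    """Draw row indices for one tree of a balanced forest.

    Each class is sampled with replacement, and every class contributes
    as many rows as the minority class has observations.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(rows) for rows in by_class.values())
    sample = []
    for rows in by_class.values():
        sample.extend(rng.choices(rows, k=n_min))
    return sample


# Hypothetical imbalanced data set: 1 = dead, 0 = live (5% mortality).
labels = [1] * 50 + [0] * 950

# A classifier that always predicts the majority class looks accurate
# but never detects a dead tree.
always_live = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_live)) / len(labels)
sens, spec = sensitivity_specificity(labels, always_live)
print(f"accuracy={accuracy:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")

# A balanced bootstrap gives each tree equal numbers of dead and live trees.
rng = random.Random(42)
idx = balanced_bootstrap(labels, rng)
n_dead = sum(labels[i] for i in idx)
print(f"bootstrap size={len(idx)} dead={n_dead} live={len(idx) - n_dead}")
```

The majority-class baseline shows why overall accuracy hides the problem and why sensitivity and specificity are reported separately. In practice the balanced sample would be redrawn for every tree, so the forest as a whole still sees most of the majority-class observations.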
Original language: English
Number of pages: 1
Journal: Ecological Modelling
Volume: 414
State: Published - Dec 15 2019

Keywords

  • FLORIDA
  • DEAD trees
  • LONGLEAF pine
  • PRESCRIBED burning
  • LOGISTIC regression analysis
  • MORTALITY
  • Fire effects
  • Logistic regression
  • Machine learning
  • Model evaluation
  • Model validation
  • Pinus palustris
  • Prescribed fire
