TY - JOUR
T1 - ‘Highly-Informative’ Genetic Markers Can Bias Conclusions
T2 - Examples and General Solutions
AU - Lee, Andy
AU - Hemstrom, William
AU - Molea, Natalie
AU - Luikart, Gordon
AU - Christie, Mark R.
N1 - © 2025 The Author(s). Molecular Ecology Resources published by John Wiley & Sons Ltd.
PY - 2025/10
Y1 - 2025/10
N2 - High-grading bias is the overestimation of power in a subset of loci caused by model overfitting. Using both empirical and simulated datasets, we show that high-grading bias can cause severe overestimation of population structure, and thus mislead investigators, whenever highly informative or high-FST markers are chosen (i.e., ascertained) and then used for subsequent assessments, a common practice in population genetic studies. This problem can occur even in panmictic populations with no local adaptation. Biased results from choosing high-FST markers may have severe downstream implications for management and conservation, such as erroneous delineation of conservation units, which could squander limited conservation resources on protecting incorrectly defined ‘populations’. Furthermore, we caution that high-grading is not limited to FST approaches; high-grading bias is a concern whenever a small subset of markers is first chosen to explain differences among groups based on their degree of difference and is subsequently reused to estimate the degree of difference among those same groups. For example, both selecting high-FST loci for use in a GT-seq panel and using differentially expressed genes to plot sample membership in multivariate space can produce spurious structure where none exists. We illustrate that using statistically based outlier tests in place of arbitrary FST cut-offs can reduce bias. Alternatively, permutation tests or cross-evaluation can be used to detect high-grading bias. We provide an R package, PCAssess, to help researchers detect and prevent high-grading bias in genetic datasets by automating permutation tests and principal component analyses (https://github.com/hemstrow/PCAssess).
AB - High-grading bias is the overestimation of power in a subset of loci caused by model overfitting. Using both empirical and simulated datasets, we show that high-grading bias can cause severe overestimation of population structure, and thus mislead investigators, whenever highly informative or high-FST markers are chosen (i.e., ascertained) and then used for subsequent assessments, a common practice in population genetic studies. This problem can occur even in panmictic populations with no local adaptation. Biased results from choosing high-FST markers may have severe downstream implications for management and conservation, such as erroneous delineation of conservation units, which could squander limited conservation resources on protecting incorrectly defined ‘populations’. Furthermore, we caution that high-grading is not limited to FST approaches; high-grading bias is a concern whenever a small subset of markers is first chosen to explain differences among groups based on their degree of difference and is subsequently reused to estimate the degree of difference among those same groups. For example, both selecting high-FST loci for use in a GT-seq panel and using differentially expressed genes to plot sample membership in multivariate space can produce spurious structure where none exists. We illustrate that using statistically based outlier tests in place of arbitrary FST cut-offs can reduce bias. Alternatively, permutation tests or cross-evaluation can be used to detect high-grading bias. We provide an R package, PCAssess, to help researchers detect and prevent high-grading bias in genetic datasets by automating permutation tests and principal component analyses (https://github.com/hemstrow/PCAssess).
KW - ecological genetics
KW - genomics/proteomics
KW - natural selection and contemporary evolution
KW - population genetics—theoretical
KW - Bias
KW - Computer Simulation
KW - Genetic Markers
KW - Genetics, Population/methods
UR - https://www.scopus.com/pages/publications/105010454010
U2 - 10.1111/1755-0998.70011
DO - 10.1111/1755-0998.70011
M3 - Article
C2 - 40641441
AN - SCOPUS:105010454010
SN - 1755-098X
VL - 25
JO - Molecular Ecology Resources
JF - Molecular Ecology Resources
IS - 7
M1 - e70011
ER -