Transfer learning enables prediction of CYP2D6 haplotype function

  • Gregory McInnes
  • Rachel Dalton
  • Katrin Sangkuhl
  • Michelle Whirl-Carrillo
  • Seung Been Lee
  • Philip S. Tsao
  • Andrea Gaedigk
  • Russ B. Altman
  • Erica L. Woodahl

Research output: Contribution to journal › Article › peer-review

39 Scopus citations

Abstract

Cytochrome P450 2D6 (CYP2D6) is a highly polymorphic gene whose protein product metabolizes more than 20% of clinically used drugs. Genetic variations in CYP2D6 are responsible for interindividual heterogeneity in drug response that can lead to drug toxicity and ineffective treatment, making CYP2D6 one of the most important pharmacogenes. Prediction of CYP2D6 phenotype relies on curation of literature-derived functional studies to assign a functional status to CYP2D6 haplotypes. As the number of large-scale sequencing efforts grows, new haplotypes continue to be discovered, and assignment of function is challenging to maintain. To address this challenge, we have trained a convolutional neural network, called Hubble.2D6, to predict the functional status of CYP2D6 haplotypes. Hubble.2D6 predicts haplotype function from sequence data and was trained using two pre-training steps with a combination of real and simulated data. We find that Hubble.2D6 predicts CYP2D6 haplotype functional status with 88% accuracy in a held-out test set and explains 47.5% of the variance in in vitro functional data among star alleles with unknown function. Hubble.2D6 may be a useful tool for assigning function to haplotypes with uncurated function and for screening individuals who are at risk of being poor metabolizers.
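
The two figures quoted in the abstract (classification accuracy and variance explained) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the label strings, and the toy numbers below are all hypothetical, and "variance explained" is read here as the coefficient of determination, which is one common interpretation and may differ from the statistic used in the paper.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of haplotypes whose predicted label matches the curated label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def variance_explained(measured, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = sum(measured) / len(measured)
    ss_total = sum((m - mean) ** 2 for m in measured)
    ss_residual = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return 1 - ss_residual / ss_total


# Hypothetical toy data, not taken from the study.
curated = ["normal", "decreased", "no function", "normal"]
called = ["normal", "decreased", "normal", "normal"]
print(accuracy(curated, called))  # 3 of 4 labels match

in_vitro_activity = [1.0, 2.0, 3.0, 4.0]
model_score = [1.5, 2.0, 3.0, 3.5]
print(variance_explained(in_vitro_activity, model_score))
```

On real data the first function would be applied to a held-out set of haplotypes with curated function, and the second to measured in vitro activity for alleles whose function has not been curated.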

Original language: English
Article number: e1008399
Journal: PLoS Computational Biology
Volume: 16
Issue number: 11
State: Published - Nov 2 2020

Funding

G.M. is supported by the Big Data to Knowledge (BD2K) initiative of the National Institutes of Health (T32 LM012409). E.L.W. and R.D. are supported by the Northwest Alaska-Pharmacogenomics Research Network (NWAPGRN) (P01GM116691). K.S., M.W.C., and R.B.A. are supported by the NIH/NHGRI PharmGKB resource (U24 HG010615). R.B.A. is also supported by the Chan Zuckerberg Biohub and NIH GM102365. A.G. is supported by the National Institutes of Health for the Pharmacogene Variation Consortium (R24GM123930). P.S.T. is supported by 1HL-101388 (NIH-NHLBI). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank Kenneth Thummel at the University of Washington and Erin Schuetz at St. Jude Children’s Research Hospital for providing the liver bank resources to conduct this work.

Funder number
1HL-101388
P01GM116691
T32 LM012409
U24 HG010615, GM102365
R24GM123930
