Safety and Completeness in Flow Decompositions for RNA Assembly

  • Shahbaz Khan
  • , Milla Kortelainen
  • , Manuel Cáceres
  • , Lucia Williams
  • , Alexandru I. Tomescu

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

10 Scopus citations

Abstract

Flow decomposition has numerous applications, ranging from networking to bioinformatics. Some applications require any valid decomposition that optimizes some property as number of paths, robustness, or path lengths. Many bioinformatic applications require the specific decomposition which relates to the underlying data that generated the flow. Thus, no optimization criteria guarantees to identify the correct decomposition for real inputs. We propose to instead report the safe paths, which are subpaths of at least one path in every flow decomposition. Ma et al. [WABI 2020] addressed the existence of multiple optimal solutions in a probabilistic framework, which is referred to as non-identifiability. Later, they gave a quadratic-time algorithm [RECOMB 2021] based on a global criterion for solving a problem called AND-Quant, which generalizes the problem of reporting whether a given path is safe. We present the first local characterization of safe paths for flow decompositions in directed acyclic graphs, giving a practical algorithm for finding the complete set of safe paths. We also evaluated our algorithm against the trivial safe algorithms (unitigs, extended unitigs) and a popular heuristic (greedy-width) for flow decomposition on RNA transcripts datasets. Despite maintaining perfect precision our algorithm reports ≈ 50% higher coverage over trivial safe algorithms. Though greedy-width reports better coverage, it has significantly lower precision on complex graphs. On a unified metric (F-Score) of coverage and precision, our algorithm outperforms greedy-width by ≈ 20%, when the evaluated dataset has significant number of complex graphs. Also, it has superior time (3–5 × ) and space efficiency (1.2–2.2 × ), resulting in a better and more practical approach for bioinformatics applications of flow decomposition.

Original languageEnglish
Title of host publicationResearch in Computational Molecular Biology - 26th Annual International Conference, RECOMB 2022, Proceedings
EditorsItsik Pe’er
PublisherSpringer Science and Business Media Deutschland GmbH
Pages177-192
Number of pages16
ISBN (Print)9783031047480
DOIs
StatePublished - 2022
Event26th International Conference on Research in Computational Molecular Biology, RECOMB 2022 - San Diego, United States
Duration: May 22 2022May 25 2022

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume13278 LNBI
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference26th International Conference on Research in Computational Molecular Biology, RECOMB 2022
Country/TerritoryUnited States
CitySan Diego
Period05/22/2205/25/22

Keywords

  • DAGs
  • Flow decomposition
  • RNA assembly
  • Safety

Fingerprint

Dive into the research topics of 'Safety and Completeness in Flow Decompositions for RNA Assembly'. Together they form a unique fingerprint.

Cite this