TY - GEN
T1 - Fast, Flexible, and Exact Minimum Flow Decompositions via ILP
AU - Dias, Fernando H.C.
AU - Williams, Lucia
AU - Mumey, Brendan
AU - Tomescu, Alexandru I.
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Minimum flow decomposition (MFD)—the problem of finding a minimum set of paths that perfectly decomposes a flow—is a classical problem in Computer Science, and variants of it are powerful models in multiassembly problems in Bioinformatics (e.g. RNA assembly). However, because this problem and its variants are NP-hard, practical multiassembly tools either use heuristics or solve simpler, polynomial-time solvable versions of the problem, which may yield solutions that are not minimal or do not perfectly decompose the flow. Many RNA assemblers also use integer linear programming (ILP) formulations of such practical variants, having the major limitation they need to encode all the potentially exponentially many solution paths. Moreover, the only exact solver for MFD does not scale to large instances, and cannot be efficiently generalized to practical MFD variants. In this work, we provide the first practical ILP formulation for MFD (and thus the first fast and exact solver for MFD), based on encoding all of the exponentially many solution paths using only a quadratic number of variables. On both simulated and real flow graphs, our approach runs in under 13 s on average. We also show that our ILP formulation can be easily and efficiently adapted for many practical variants, such as incorporating longer or paired-end reads, or minimizing flow errors. We hope that our results can remove the current tradeoff between the complexity of a multiassembly model and its tractability, and can lie at the core of future practical RNA assembly tools. Our implementations are freely available at github.com/algbio/MFD-ILP.
AB - Minimum flow decomposition (MFD)—the problem of finding a minimum set of paths that perfectly decomposes a flow—is a classical problem in Computer Science, and variants of it are powerful models in multiassembly problems in Bioinformatics (e.g. RNA assembly). However, because this problem and its variants are NP-hard, practical multiassembly tools either use heuristics or solve simpler, polynomial-time solvable versions of the problem, which may yield solutions that are not minimal or do not perfectly decompose the flow. Many RNA assemblers also use integer linear programming (ILP) formulations of such practical variants, having the major limitation they need to encode all the potentially exponentially many solution paths. Moreover, the only exact solver for MFD does not scale to large instances, and cannot be efficiently generalized to practical MFD variants. In this work, we provide the first practical ILP formulation for MFD (and thus the first fast and exact solver for MFD), based on encoding all of the exponentially many solution paths using only a quadratic number of variables. On both simulated and real flow graphs, our approach runs in under 13 s on average. We also show that our ILP formulation can be easily and efficiently adapted for many practical variants, such as incorporating longer or paired-end reads, or minimizing flow errors. We hope that our results can remove the current tradeoff between the complexity of a multiassembly model and its tractability, and can lie at the core of future practical RNA assembly tools. Our implementations are freely available at github.com/algbio/MFD-ILP.
KW - Flow decomposition
KW - Integer linear programming
KW - Multiassembly
KW - Network flow
KW - RNA assembly
UR - https://www.scopus.com/pages/publications/85131149879
U2 - 10.1007/978-3-031-04749-7_14
DO - 10.1007/978-3-031-04749-7_14
M3 - Conference contribution
AN - SCOPUS:85131149879
SN - 9783031047480
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 230
EP - 245
BT - Research in Computational Molecular Biology - 26th Annual International Conference, RECOMB 2022, Proceedings
A2 - Pe’er, Itsik
PB - Springer Science and Business Media Deutschland GmbH
T2 - 26th International Conference on Research in Computational Molecular Biology, RECOMB 2022
Y2 - 22 May 2022 through 25 May 2022
ER -