TY - GEN
T1 - Optimizing UltraScan Job Scheduling with Deep Learning-Based Performance Prediction
AU - Householder, Aaron
AU - Zou, Cliff
AU - Brookes, Emre
N1 - Publisher Copyright: © 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2025/7/18
Y1 - 2025/7/18
AB - High-Performance Computing (HPC) resources are crucial for computationally intensive applications, yet efficient job scheduling remains a challenge due to inaccurate user-provided runtime estimates. UltraScan, software for analyzing analytical ultracentrifugation experiments, relies on queue-managed HPC resources where execution times vary significantly based on input parameters. To improve scheduling efficiency and resource utilization, we propose a deep learning-based approach for predicting UltraScan job run times. Unlike previous heuristic-based and regression models, our study employs deep neural networks trained on historical execution records, utilizing hyperparameter tuning and grid search to optimize predictive accuracy. However, feature selection methods such as LASSO regression, XGBoost, and Random Forest did not improve runtime prediction, suggesting that execution time is influenced by complex high-dimensional interactions rather than individual feature importance. Instead, deep learning models demonstrated better performance by capturing implicit patterns within the data. To further refine predictions, we introduce a Z-score-based outlier filtering strategy that adaptively adjusts acceptance thresholds, mitigating the impact of extreme cases on runtime estimation. Our results indicate that deep learning models, combined with outlier handling, provide a scalable approach for improving HPC scheduling, though challenges remain in reducing long-tail prediction errors. This study represents the first large-scale application of deep learning for UltraScan performance prediction.
KW - High-Performance Computing (HPC)
KW - Job Runtime Prediction
KW - Machine Learning for HPC
KW - Performance Prediction
UR - https://www.scopus.com/pages/publications/105013082808
U2 - 10.1145/3708035.3736016
DO - 10.1145/3708035.3736016
M3 - Conference contribution
AN - SCOPUS:105013082808
T3 - PEARC 2025 - Practice and Experience in Advanced Research Computing 2025: The Power of Collaboration
SP - 1
EP - 8
BT - PEARC 2025 - Practice and Experience in Advanced Research Computing 2025
PB - Association for Computing Machinery, Inc
T2 - 2025 Practice and Experience in Advanced Research Computing, PEARC 2025
Y2 - 20 July 2025 through 24 July 2025
ER -