On the Efficiency of Executing Hydro-environmental Models on Cloud
Abstract
Optimizing high-performance computing applications requires an understanding of the application and its parallelization approach, the system software stack, and the target architecture. Traditionally, performance tuning of parallel applications involves consideration of the underlying machine architecture, including floating-point performance, memory hierarchies and bandwidth, interconnect architecture, and data placement, among others. The shift to the utility computing model through the cloud has created attractive economies of scale across IT domains, and HPC is no exception as a candidate beneficiary. Nevertheless, the infrastructure abstraction and multi-tenancy inherent to cloud offerings pose significant challenges to HPC workloads, requiring a dedicated study of the applicability of cloud computing as a viable platform in terms of time-to-solution and efficiency. In this paper, we present an evaluation of a widely used hydro-environmental code, EFDC, on a cloud platform. Specifically, we evaluate the target parallel application on Linux containers managed by Docker. Unlike virtualization-based solutions that have been widely used for HPC cloud explorations, containers are more fit for purpose, offering, among other benefits, native execution and lightweight resource consumption. Many-core capability is provided by OpenMP in a hybrid configuration with MPI for cross-node data movement, and we explore this combination in the target setup. For the MPI part, the workflow is implemented as a data-parallel execution model, with all processing elements performing the same computation on different sub-domains, and with thread-level, fine-grained parallelism provided by OpenMP. Optimizing performance requires consideration of the overheads introduced by the OpenMP paradigm, such as thread initialization and synchronization. Features of the application make it an ideal test case for deployment on modern cloud architectures, including that it: 1) is legacy code written in Fortran 77, 2) has an implicit solver requiring non-local communication that poses a challenge to traditional partitioning methods, communication optimization, and scaling, and 3) is in widespread legacy use across academia, research organizations, governmental agencies, and consulting firms. These technical and practical considerations make this study a representative assessment of migrating legacy codes from traditional HPC systems to the cloud. We finally discuss challenges that stem from the containerized nature of the platform; the latter forms another novel contribution of this paper.
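As a minimal illustration of the hybrid configuration described above (a sketch only, not EFDC source), the following shows how MPI ranks owning sub-domains can combine with OpenMP thread-level loop parallelism; the program name, array dimensions, and the loop body are hypothetical.

```fortran
! Minimal hybrid MPI+OpenMP sketch (illustrative only, not EFDC code).
! Each MPI rank owns one sub-domain; OpenMP threads parallelize the
! local loop nest. Array sizes and the update formula are placeholders.
program hybrid_sketch
   use mpi
   implicit none
   integer :: ierr, rank, nprocs, provided
   integer, parameter :: ni = 256, nj = 256
   real(8) :: field(ni, nj)
   integer :: i, j

   ! Request funneled threading: only the master thread makes MPI calls.
   call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

   field = real(rank, 8)

   ! Fine-grained, thread-level parallelism over the local sub-domain.
!$omp parallel do private(i, j) collapse(2)
   do j = 1, nj
      do i = 1, ni
         field(i, j) = 0.25d0 * field(i, j) + 1.0d0
      end do
   end do
!$omp end parallel do

   ! Cross-node data movement would go here, e.g. a halo exchange
   ! between neighbouring sub-domains with MPI_Sendrecv.

   call MPI_Finalize(ierr)
end program hybrid_sketch
```

The thread-creation and synchronization overheads mentioned above correspond to entering and leaving the parallel region around the loop nest; keeping such regions coarse enough to amortize that cost is one of the tuning considerations studied in the paper.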