OpenFOAM is software for computational fluid dynamics (CFD), maintained by a core team at CFD Direct and licensed free and open source by the OpenFOAM Foundation. OpenFOAM provides parallel computing through domain decomposition and an interface to software written to the message passing interface (MPI) standard, e.g. Open MPI. Small parallel computations use multiple cores, e.g. 2-36, from a multi-core computer processor. Computer clusters connect multiple processors to increase the number of available cores. High performance computing (HPC) involves a larger number of cores, e.g. 100s, with fast networking to deliver good scalability.
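As a minimal sketch of that parallel workflow (assuming OpenFOAM and Open MPI are installed, and the working directory is a prepared case whose system/decomposeParDict specifies 36 subdomains), the domain is decomposed, the solver is run in parallel under MPI, and the results are reconstructed; simpleFoam is used here purely as an example solver.

```python
# Sketch of OpenFOAM's decompose-run-reconstruct workflow; assumes OpenFOAM and
# Open MPI are installed and the current directory is a prepared case with a
# system/decomposeParDict set up for 36 subdomains.
import subprocess

n_cores = 36  # one MPI process (subdomain) per physical core

# Domain decomposition: split the mesh and fields into n_cores subdomains.
subprocess.run(["decomposePar", "-force"], check=True)

# Run the solver in parallel through MPI; simpleFoam is only an example solver.
subprocess.run(["mpirun", "-np", str(n_cores), "simpleFoam", "-parallel"],
               check=True)

# Reassemble the decomposed results into a single dataset.
subprocess.run(["reconstructPar"], check=True)
```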
CFD Direct From the Cloud™
CFD Direct provides resources to run CFD in the cloud with OpenFOAM using Amazon Web Services (AWS). It produces CFD Direct From the Cloud™ (CFDDFC), a Marketplace Product for Amazon Elastic Compute Cloud (EC2), which includes a command line interface for simple management of CFDDFC instances, data transfer and running of OpenFOAM applications. EC2 includes Compute Optimized instances (the C-series) for compute-intensive workloads. For the C5 instances (5th generation C-series), AWS launched the C5n variant, whose network performance delivers good parallel scaling (70%-90% at 504 cores) when running OpenFOAM applications on clusters of instances.
Parallel Scaling with AWS Elastic Fabric Adapter
In November 2018, AWS announced Elastic Fabric Adapter (EFA), a network interface for HPC applications running on EC2. During the preview of the EFA technology, we ran the benchmark OpenFOAM simulations presented previously:
- strong scaling – 97 million (97 m) total cells, steady-state external aerodynamics around a car;
- weak scaling – 100 thousand (100 k) cells per core, transient flow over a weir with hydraulic jump.
The simulations used all available physical cores on a single c5n.18xlarge instance and on clusters of 4, 7, 14 and 28 instances. The c5n.18xlarge contains 36 physical cores, so that the largest cluster contained 28 × 36 = 1008 cores. For each simulation, we calculated a performance factor F = (t/T)·(M/C), where: t = simulated time; T = clock time; M = number of cells; C = number of cores. F is normalised by its value calculated for a single instance (36 cores) and presented as a percentage. On that basis, 100% denotes “linear scaling”, with super- and sub-linear scaling represented by percentages above and below 100%, respectively.
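A short sketch of that calculation is given below; the clock times and simulated time are placeholders for illustration, not measured results.

```python
# Performance factor F = (t/T)*(M/C), normalised by the single-instance
# (36-core) value; all timings below are placeholders, not measured data.
def performance_factor(simulated_time, clock_time, cells, cores):
    return (simulated_time / clock_time) * (cells / cores)

cells = 97e6              # example: the fixed mesh of the strong scaling tests
simulated_time = 500.0    # placeholder simulated time, s
clock_times = {36: 36000.0, 144: 8800.0, 1008: 1290.0}  # cores -> clock time, s (placeholders)

reference = performance_factor(simulated_time, clock_times[36], cells, 36)
for cores, clock_time in clock_times.items():
    f = performance_factor(simulated_time, clock_time, cells, cores)
    print(f"{cores:5d} cores: scaling = {100.0 * f / reference:.1f}%")
```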
External Aerodynamics around a Car
The simulation of external aerodynamics of a car was run as described previously. The scaling graph below shows super-linear performance for the cluster configurations until, at 1008 cores, the scaling returns to linear (98.4%, precisely). These strong scaling tests conflate the effects of network performance with other factors, such as CPU cache. As the number of cells per core decreases, the CPU can cache a greater proportion of the data structures used in CFD, e.g. fields and matrices, leading to super-linear scaling. At the same time, as the number of cells per core decreases with increasing cores, communication over the network accounts for a greater share of the run time, pushing the scaling towards sub-linear. Around 1000 cores and approximately 100 k cells per core, the two effects “cancel out” to deliver net linear scaling.
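For reference, the per-core load in these strong scaling tests works out as in the short sketch below, using the 97 m cell mesh and the 36 cores per c5n.18xlarge instance stated above.

```python
# Per-core load in the strong scaling tests: the 97 m cell mesh is fixed, so
# the number of cells per core falls as the core count increases.
cells = 97e6
for instances in (1, 4, 7, 14, 28):
    cores = 36 * instances
    print(f"{instances:2d} x c5n.18xlarge = {cores:4d} cores -> "
          f"{cells / cores:,.0f} cells per core")
```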
Flow over a Weir with Hydraulic Jump
The simulation of flow over a weir, with hydraulic jump, was run as described previously. The weak scaling tests isolate the effect of network performance, which shows sub-linear scaling down to 67% at 1008 cores (see below). The simulations ran with a fixed time step of 0.01 s, writing data every 50 time steps across the network to a file system on the main instance. As before, we repeated the simulations with data writing switched off, which increased the scaling to 72.6% at 1008 cores.
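The corresponding weak scaling arithmetic, with the per-core load and write interval fixed as stated above, is sketched below.

```python
# Weak scaling setup: the per-core load is fixed at 100 k cells, so the total
# mesh grows with the core count; with a 0.01 s time step and data written
# every 50 steps, results are written every 0.5 s of simulated time.
cells_per_core = 100e3
time_step = 0.01       # s
write_interval = 50    # time steps between writes

for instances in (1, 4, 7, 14, 28):
    cores = 36 * instances
    total_cells = cells_per_core * cores
    print(f"{cores:4d} cores -> {total_cells / 1e6:6.1f} m cells, "
          f"data written every {write_interval * time_step:.1f} s simulated")
```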
The simulation on 1008 cores had a weir width of 4.2 km and a mesh of 100 m cells. The image above shows a region of flow where the hydraulic jump at the base of the weir was no longer perpendicular to the flow direction. This phenomenon could cause an increase in solution time due to the change in flow physics, although it is unlikely to be particularly significant.
Summary of EFA
- C5n instances, with standard networking, deliver 70%-90% scaling at 504 cores.
- The addition of EFA delivered ~70% scaling at 1008 cores for meshes of 100 k cells per core.
- EFA delivered linear scaling at 1008 cores for a fixed mesh size of 97 m cells.
- With linear scaling, EFA enables faster solutions for the same cost simply by running on more cores.
- With open source software and C5n instances, EFA offers improved HPC, still at a price in the order of $100.
- OpenFOAM and AWS finally make HPC with CFD applications accessible to all.
Acknowledgments
The work was supported by the AWS Cloud Credits for Research program.