Running CFDDFC with the AWS Console
- Launch an Instance
- Connect to an Instance
- Using OpenFOAM on an Instance
- Data Transfer to an Instance
- Connect with the Remote Desktop
- Data Storage
- Creating a Cluster of Instances
Why a Cluster?
The latest generation of compute-optimized (C5) instances provides the highest performing processors and the lowest price/performance of the EC2 instance types. The largest C5 instance includes 36 physical cores. A CFD simulation requiring a larger number of cores can be performed using a networked cluster of instances. For clusters, we recommend the C5n instances (launched in November 2018), which provide significantly higher network performance across all instance sizes, compared to the standard C5 instances. Note that the C5n instances are not eligible for free tier and the user will be charged while instances are running.
Creating a Cluster using the CFDDFC CLI
The easy way to create a cluster of instances running CFDDFC is to use the command line interface (CLI). The CLI provides a cluster subcommand that enables the user to add slave instances to a running master instance. The instructions on this page demonstrate creating a cluster of 4 instances. The cluster is formed of 1 master instance, attached to the storage for the entire CFD simulation, and 3 slave instances with minimal attached storage. Using the CFDDFC CLI, the user would first launch an instance with the launch subcommand, e.g. with c5n.18xlarge and 300 GB of storage:
cfddfc launch -instance c5n.18xlarge -volume 300
The user would wait until the instance is running; this is the master instance. The user can then create a cluster of 4 instances by adding 3 slave instances using the cluster subcommand, e.g.
cfddfc cluster -slaves 3
Once the slave instances are running, the cluster is fully operational.
Creating a Cluster Manually (alternative to CLI)
1. Cluster Security and Placement Groups (one time only)
A security group is required that allows network access between instances. To do this, first create a new security group named cluster.
- From the left hand menu of the AWS console, select Security Groups.
- Click Create Security Group.
- In the Create Security Group panel, enter the Group Name cluster and a Description to remind you what this group does (e.g. “Cluster Group”).
- Click Add Rule, and for Inbound, select SSH Type from the drop-down menu and My IP (or Anywhere) Source.
- Click Create.
To allow instances within this security group to access each other, complete the following steps.
- In Security Groups, select the cluster security group and, in the Inbound rules tab, click Edit.
- Click Add Rule, selecting All TCP Type from the drop-down menu…
- …then under Source, select Custom and begin typing sg- in the text box until a panel pops up, listing the security groups; select the cluster group itself.
Instances need to be connected through a low latency, high speed network. To ensure this, a placement group is needed to group instances within a single Availability Zone. The user should therefore create a placement group, also named cluster.
- From the left hand menu of the AWS console, select Placement Groups.
- Click Create Placement Group.
- In the panel, enter the name cluster and click Create.
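For users working from a terminal, the following is a minimal sketch of equivalent AWS CLI commands for this step, assuming the AWS CLI is installed and configured for the intended region, and that the default VPC is used so that security groups can be referenced by name; A.A.A.A stands for your own public IP address. The console procedure above remains the reference.
# create the security group and allow inbound SSH from your own IP
aws ec2 create-security-group --group-name cluster --description "Cluster Group"
aws ec2 authorize-security-group-ingress --group-name cluster --protocol tcp --port 22 --cidr A.A.A.A/32
# allow all TCP traffic between members of the cluster group
aws ec2 authorize-security-group-ingress --group-name cluster --protocol tcp --port 0-65535 --source-group cluster
# create the placement group with the cluster strategy
aws ec2 create-placement-group --group-name cluster --strategy cluster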
2. Launching Instances
First launch a single master instance by following the same cloud launch instructions as before but observing the following details.
- Select the c5n.18xlarge instance (C5n largest size).
- Under Instance Details, select cluster from the Placement Group menu.
- Under Security Groups, check Select an existing security group and select the cluster group.
- Under Storage, set a volume size that can accommodate the data for the intended simulations.
- Click Launch.
Now launch the slave instances, selecting minimal storage but using the same security and placement groups. All 3 slave instances can be created at once by specifying 3 instances under Instance Details. Under Instance Details, the Auto-assign Public IP can be set to Disable, because the slave nodes only need to be accessed from the master, not from the outside world.
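For reference only, a hedged sketch of launching the 3 slave instances with the AWS CLI follows; the AMI ID placeholder ami-XXXXXXXX and the key pair name ec2 are assumptions and must be replaced with the CFDDFC AMI and your own key pair, and storage is left at the AMI default.
aws ec2 run-instances --image-id ami-XXXXXXXX --instance-type c5n.18xlarge --count 3 --key-name ec2 --security-groups cluster --placement GroupName=cluster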
3. SSH access to Slave Instances (once per login session)
To run CFD in parallel using domain decomposition, the master instance needs to execute processes on the slave instances. This is achieved by enabling SSH access from the master instance into the slave instances, which requires the private key, something the master instance does not have. However, SSH agent forwarding allows the private key on the local machine to be used, rather than storing it on the master instance (which is not advisable). To use agent forwarding, the private key must be added to the authentication agent with the following command.
ssh-add ${HOME}/.ssh/ec2.pem
Following that, the user will find that SSH login to an instance no longer requires the key to be specified with the -i option. Agent forwarding is applied by logging into the master instance using the -A option, i.e.
ssh -A ubuntu@M.M.M.M
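As a quick check (an addition to these instructions), the authentication agent can be queried to confirm the key is loaded; running the same command on the master instance after logging in with -A should list the same key, confirming that forwarding is working.
ssh-add -l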
4. Sharing the Master Instance Volume (one time only)
This cluster is set up so that all data is stored on a volume attached to the master instance. We therefore share the OpenFOAM directory on the master instance across the slave instances using the network file system (NFS) protocol. This involves exporting the OpenFOAM directory on the master instance, with exportfs (one time only), which is then mounted by all the slave instances using mount.
To export the OpenFOAM directory: from the master instance, add the OpenFOAM directory to the /etc/exports file as superuser (sudo), export the file, then start the NFS server with the following terminal commands.
sudo sh -c "echo '/home/ubuntu/OpenFOAM *(rw,sync,no_subtree_check)' >> /etc/exports"
sudo exportfs -ra
sudo service nfs-kernel-server start
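To confirm the export is active (an optional check, assuming the showmount utility is installed alongside the NFS server packages), the following command on the master instance should list /home/ubuntu/OpenFOAM.
showmount -e localhost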
5. Mounting the Master Volume from Slaves
For each slave instance we need to delete the empty directories in the OpenFOAM directory and use that directory as a mount point for mounting the OpenFOAM directory on the master instance. That requires the private IP address of the master instance, which can be obtained from the AWS console, denoted here as L.L.L.L. We also need the private IP addresses of each slave instance from the AWS console, denoted here as X.X.X.X, Y.Y.Y.Y and Z.Z.Z.Z. To simplify the process, define a SPIPS environment variable for the slave private IPs, then delete the directories and mount using non-interactive SSH for all slave instances with the following commands.
SPIPS="X.X.X.X Y.Y.Y.Y Z.Z.Z.Z" for IP in $SPIPS ; do ssh $IP 'rm -rf ${HOME}/OpenFOAM/*' ; done for IP in $SPIPS ; do ssh $IP 'sudo mount L.L.L.L:${HOME}/OpenFOAM ${HOME}/OpenFOAM' ; done
The mounting of the OpenFOAM directory on the master instance can be tested for each slave instance by the following command.
for IP in $SPIPS ; do ssh $IP 'ls ${HOME}/OpenFOAM' ; done
For each slave instance, it should return the ubuntu-〈version〉 directory, e.g. ubuntu-3.0.1.
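As an additional optional check, the filesystem type can be queried on each slave; each should report an nfs or nfs4 filesystem served from L.L.L.L.
for IP in $SPIPS ; do ssh $IP 'df -hT ${HOME}/OpenFOAM' ; done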
6. Running in Parallel on a Cluster
Users can test the cluster on the damBreak tutorial case from the User Guide, which simulates the collapse of a column of water under its own weight. The test involves:
- changing to the $FOAM_RUN directory with the run alias;
- copying the damBreak case files from the tutorials directory to the current directory;
- changing to the damBreak case directory;
- generating a mesh for the geometry with the blockMesh utility;
- refining the mesh by splitting each (2D) cell 2×2 using the refineMesh utility;
- creating the alpha.water field file from backup and initialising it with the setFields utility;
- decomposing the mesh and fields into 4 parts using the decomposePar utility;
- running the interFoam solver in parallel with 4 processes.
First log onto the master node using SSH with agent forwarding (-A).
ssh -A ubuntu@M.M.M.M
Then go into the run directory and execute all stages up to running the simulation in parallel with interFoam.
run
cp -r $FOAM_TUTORIALS/multiphase/interFoam/laminar/damBreak/damBreak .
cd damBreak
blockMesh
refineMesh -overwrite
cp -r 0/alpha.water.org 0/alpha.water
setFields
decomposePar
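As a quick sanity check before launching the parallel run (not part of the original steps), the decomposition can be confirmed by listing the processor directories; this should show processor0 to processor3.
ls -d processor*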
Parallel running across a cluster needs a list of the host machines that the user wishes to use. In this example, the host machine names are the private IP addresses of the master and slave instances. Open an editor and create such a file in the damBreak case directory, naming it machines, containing one IP address per line, e.g. as shown below (changing the addresses accordingly).
172.12.12.12
172.12.34.34
172.12.56.56
172.12.78.78
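Alternatively, a minimal sketch for generating the machines file from the shell, assuming the SPIPS variable from step 5 is still set in the current session and L.L.L.L is replaced by the master private IP:
echo L.L.L.L > machines
for IP in $SPIPS ; do echo $IP >> machines ; done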
Finally, the user can execute interFoam using the foamJob script with the -p option for parallel running. foamJob -p will automatically run on the number of cores indicated by the processor directories, using the host names in the machines file.
foamJob -p interFoam
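The foamJob script runs the solver in the background, redirecting its output to a file named log in the case directory, so progress can be monitored with, for example:
tail -f log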