
Why a Cluster?

The latest generation of compute-optimized (C5) instances provides the highest performing processors and the lowest price/performance ratio of the EC2 instance types. The largest C5 instance includes 36 physical cores. A CFD simulation requiring a larger number of cores can be performed using a networked cluster of instances. For clusters, we recommend the C5n instances (launched in November 2018), which provide significantly higher network performance across all instance sizes compared to the standard C5 instances. Note that the C5n instances are not eligible for the free tier, so the user is charged while instances are running.

Creating a Cluster using the CFDDFC CLI

The easiest way to create a cluster of instances running CFDDFC is to use the command line interface (CLI). The CLI provides a cluster subcommand that enables the user to add slave instances to a running master instance. The instructions on this page demonstrate creating a cluster of 4 instances: 1 master instance attached to the storage for the entire CFD simulation, and 3 slave instances with minimal attached storage. Using the CFDDFC CLI, the user would first launch an instance with the launch subcommand, e.g. with c5n.18xlarge and 300 GB of storage:

cfddfc launch -instance c5n.18xlarge -volume 300

The user would wait until the instance is running; this is the master instance.  The user can then create a cluster of 4 instances by adding 3 slave instances using the cluster subcommand, e.g.

cfddfc cluster -slaves 3

Once the slave instances are running, the cluster is fully operational.
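If the AWS CLI is installed and configured locally, the state of the instances can also be checked from the terminal rather than the console; the query below is a minimal sketch and the output fields can be adjusted as needed.

aws ec2 describe-instances --filters Name=instance-state-name,Values=running \
    --query 'Reservations[].Instances[].[InstanceId,InstanceType,PrivateIpAddress]' \
    --output table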

Creating a Cluster Manually (alternative to CLI)

1. Cluster Security and Placement Groups (one time only)

A security group is required that allows network access between instances.  To do this, first create a new security group named cluster.

  • From the left hand menu of the AWS console, select Security Groups.
  • Click Create Security Group.
  • In the Create Security Group panel, enter the Group Name cluster and a Description to remind you what this group does (e.g. “Cluster Group”).
  • Click Add Rule, and for Inbound, select SSH from the Type drop-down menu and My IP (or Anywhere) as the Source.
  • Click Create.

To allow instances within this security group to access each other, complete the following steps.

  • In Security Groups, select the cluster security group and, on the Inbound rules tab, click Edit.
  • Click Add Rule, selecting All TCP Type from the drop-down menu…
  • …then under Source, select Custom and begin typing sg- in the text box until a panel pops up, listing the security groups; select the cluster group itself.
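
For users working from a terminal, the same security group and rules can alternatively be created with the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and configured and the account has a default VPC; the 203.0.113.0/24 CIDR is a placeholder and should be replaced with the user's own IP range.

# create the group and capture its ID
SG_ID=$(aws ec2 create-security-group --group-name cluster \
    --description "Cluster Group" --query GroupId --output text)
# allow SSH from the user's own IP range (replace 203.0.113.0/24)
aws ec2 authorize-security-group-ingress --group-id $SG_ID \
    --protocol tcp --port 22 --cidr 203.0.113.0/24
# allow all TCP traffic between members of the group itself
aws ec2 authorize-security-group-ingress --group-id $SG_ID \
    --protocol tcp --port 0-65535 --source-group $SG_ID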

Instances need to be connected through a low latency, high speed network. To ensure this, a placement group is needed to group instances within a single Availability Zone. The user should therefore create a placement group, also named cluster.

  • From the left hand menu of the AWS console, select Placement Groups.
  • Click Create Placement Group.
  • In the panel, enter the name cluster and click Create.
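
If preferred, the placement group can also be created from the AWS CLI using the cluster strategy, e.g.

aws ec2 create-placement-group --group-name cluster --strategy cluster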

2. Launching Instances

First launch a single master instance by following the same cloud launch instructions as before but observing the following details.

  • Select the c5n.18xlarge instance (C5n largest size).
  • Under Instance Details, select cluster from the Placement Group menu.
  • Under Security Groups, check Select an existing security group and select the cluster group.
  • Under Storage, set a volume size that can accommodate the data for the intended simulations.
  • Click Launch.

Now launch the slave instances, selecting minimal storage but using the same security and placement groups. All 3 slave instances can be created at once by specifying 3 instances under Instance Details, where Auto-assign Public IP can also be set to Disable, because the slave nodes only need to be accessed from the master, not from the outside world.
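
Both launches can also be scripted with the AWS CLI. The following is a minimal sketch: ami-xxxxxxxx is a placeholder for the CFDDFC image ID, the key pair is assumed to be named ec2 (matching the ec2.pem key used later), the security group ID is taken from the SG_ID variable in the earlier sketch (or the sg- ID shown in the console), and /dev/sda1 is assumed to be the root device of the Ubuntu-based image.

# master instance with a 300 GB root volume
aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type c5n.18xlarge \
    --key-name ec2 --security-group-ids $SG_ID --placement GroupName=cluster \
    --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":300}}]'
# 3 slave instances with minimal storage and no public IP
aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type c5n.18xlarge \
    --count 3 --key-name ec2 --security-group-ids $SG_ID \
    --placement GroupName=cluster --no-associate-public-ip-address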

3. SSH access to Slave Instances (once per login session)

To run CFD in parallel using domain decomposition, the master instance needs to execute processes on the slave instances. This requires SSH access from the master instance into the slave instances, which in turn requires the private key, which the master instance does not have. However, SSH agent forwarding allows the private key on the local machine to be used, rather than storing it on the master instance (which is not advisable). To use agent forwarding, the private key must be added to the authentication agent on the local machine with the following command.

ssh-add ${HOME}/.ssh/ec2.pem

Following that, the user will find that SSH login to an instance no longer requires the key to be specified with the -i option. Agent forwarding is applied by logging into the master instance using the -A option, i.e.

ssh -A ubuntu@M.M.M.M
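
Once logged into the master with -A, forwarding can be checked by logging from the master into one of the slaves by its private IP address (denoted X.X.X.X here) without specifying a key, e.g.

ssh X.X.X.X hostname

The command should print the slave's hostname; the host key fingerprint may need to be accepted on the first connection to each slave.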

4. Sharing the Master Instance Volume (one time only)

This cluster is set up so that all data is stored on a volume attached to the master instance.  We therefore share the OpenFOAM directory on the master instance across the slave instances using the network file system (NFS) protocol.  This involves exporting the OpenFOAM directory on the master instance, with exportfs (one time only), which is then mounted by all the slave instances using mount.

To export the OpenFOAM directory: from the master instance, add the OpenFOAM directory to the /etc/exports file as superuser (sudo), re-export the file systems, then start the NFS server with the following terminal commands.

sudo sh -c "echo '/home/ubuntu/OpenFOAM  *(rw,sync,no_subtree_check)' >> /etc/exports"
sudo exportfs -ra
sudo service nfs-kernel-server start
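
The export can be checked on the master instance by listing the exported file systems; /home/ubuntu/OpenFOAM should appear in the output.

sudo exportfs -v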

5. Mounting the Master Volume from Slaves

For each slave instance, we need to delete the empty directories in the OpenFOAM directory and use that directory as a mount point for the OpenFOAM directory on the master instance. That requires the private IP address of the master instance, obtained from the AWS console and denoted here as L.L.L.L. We also need the private IP addresses of the slave instances from the AWS console, denoted here as X.X.X.X, Y.Y.Y.Y and Z.Z.Z.Z. To simplify the process, define a SPIPS environment variable containing the slave private IPs, then delete the directories and mount the master volume on all slave instances using non-interactive SSH, with the following commands.

SPIPS="X.X.X.X Y.Y.Y.Y Z.Z.Z.Z"
for IP in $SPIPS ; do ssh $IP 'rm -rf ${HOME}/OpenFOAM/*' ; done
for IP in $SPIPS ; do ssh $IP 'sudo mount L.L.L.L:${HOME}/OpenFOAM ${HOME}/OpenFOAM' ; done

The mounting of the OpenFOAM directory on the master instance can be tested for each slave instance by the following command.

for IP in $SPIPS ; do ssh $IP 'ls ${HOME}/OpenFOAM' ; done

For each slave instance, it should return the ubuntu-〈version〉 directory, e.g. ubuntu-3.0.1.
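
Optionally, the mount can also be inspected with df on each slave, which should report the master's private IP address (L.L.L.L) as the source of the file system.

for IP in $SPIPS ; do ssh $IP 'df -h ${HOME}/OpenFOAM' ; done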

6. Running in Parallel on a Cluster

Users can test the cluster on the damBreak tutorial case from the User Guide, which simulates the collapse of a column of water under its own weight. The test involves:

  1. changing to the $FOAM_RUN directory with the run alias;
  2. copying the damBreak case files from the tutorials directory to the current directory;
  3. changing to the damBreak case directory;
  4. generating a mesh for the geometry with the blockMesh utility;
  5. refining the mesh by splitting each (2D) cell 2×2 using the refineMesh utility;
  6. creating the alpha.water field file from its backup and initialising it with the setFields utility;
  7. decomposing the mesh and fields into 4 using the decomposePar utility;
  8. running the interFoam solver in parallel with 4 processes.

First log onto the master node using SSH with agent forwarding (-A).

ssh -A ubuntu@M.M.M.M

Then go into the run directory and execute all stages up to running the simulation in parallel with interFoam.

run
cp -r $FOAM_TUTORIALS/multiphase/interFoam/laminar/damBreak/damBreak .
cd damBreak
blockMesh
refineMesh -overwrite
cp -r 0/alpha.water.org 0/alpha.water
setFields
decomposePar

Parallel running across a cluster needs a list of the host machines that the user wishes to use. In this example, the host machine names are the private IP addresses of the master and slave instances. Open an editor and create such a file in the damBreak case directory, named machines, containing one IP address per line, e.g. as shown below (changing the addresses accordingly).

172.12.12.12
172.12.34.34
172.12.56.56
172.12.78.78
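
As an alternative to editing the file by hand, the machines file can be generated on the master, assuming the SPIPS variable defined earlier is still set in the current shell (otherwise redefine it first); hostname -I is used here to obtain the master's own private IP address.

{ hostname -I | awk '{print $1}' ; for IP in $SPIPS ; do echo $IP ; done ; } > machines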

Finally, the user can execute interFoam using the foamJob script with the -p option for parallel running. foamJob -p automatically runs on the number of cores indicated by the processor directories, using the host names in the machines file.

foamJob -p interFoam
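
foamJob runs the solver in the background, redirecting its output to a file named log in the case directory, so the progress of the simulation can be monitored with:

tail -f log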
