
Why a Cluster?

The latest generation of compute-optimized (C4) instances provides the highest-performing processors and the best price/performance ratio of the EC2 instance types. The largest C4 instance, c4.8xlarge, includes 18 physical cores (36 vCPUs). A CFD simulation requiring a larger number of cores can be performed using a networked cluster of instances. The following instructions demonstrate creating a cluster of 4 instances: 1 master instance, attached to the storage for the entire CFD simulation, and 3 slave instances with minimal attached storage. Note that C4 instances are not eligible for the free tier, so the user is charged while the instances are running.

Cluster Security and Placement Groups (one time only)

A security group is required that allows network access between the instances. To set this up, first create a new security group named cluster using the following steps; an equivalent AWS CLI sketch is given after the steps.

  • From the left hand menu of the AWS console, select Security Groups.
  • Click Create Security Group.
  • In the Create Security Group panel, enter the Group Name cluster and a Description to remind you what this group does (e.g. “Cluster Group”).
  • Click Add Rule, and for Inbound, select SSH from the Type drop-down menu and My IP (or Anywhere) as the Source.
  • Click Create.
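
The same group can also be created with the AWS CLI. The following is only a sketch, assuming the CLI is installed and configured with suitable credentials; the CIDR range is a placeholder for the user's own IP range.

# create the security group (sketch, assuming a configured AWS CLI)
aws ec2 create-security-group --group-name cluster --description "Cluster Group"
# allow inbound SSH; replace 203.0.113.0/24 with your own IP range, or 0.0.0.0/0 for Anywhere
aws ec2 authorize-security-group-ingress --group-name cluster --protocol tcp --port 22 --cidr 203.0.113.0/24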

To allow instances within this security group to access each other, complete the following steps.

  • In Security Groups, select the cluster security group, open the Inbound rules tab and click Edit.
  • Click Add Rule and select All TCP from the Type drop-down menu.
  • Under Source, select Custom and begin typing sg- in the text box until a panel pops up listing the security groups; then select the cluster group itself.
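
The self-referencing rule can similarly be added with the AWS CLI, again as a sketch assuming a configured CLI and a default VPC in which group names can be used directly.

# allow all TCP traffic between instances that are members of the cluster group
aws ec2 authorize-security-group-ingress --group-name cluster --protocol tcp --port 0-65535 --source-group cluster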

Instances need to be connected through a low latency, high speed network. To ensure this, a placement group is needed to group instances within a single Availability Zone. The user should therefore create a placement group, also named cluster.

  • From the left hand menu of the AWS console, select Placement Groups.
  • Click Create Placement Group.
  • In the panel, enter the name cluster and click Create.
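
The equivalent AWS CLI command is shown below as a sketch, assuming a configured CLI; the cluster strategy packs instances together for low-latency, high-bandwidth networking.

# create a placement group using the cluster strategy
aws ec2 create-placement-group --group-name cluster --strategy cluster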

Launching Instances

First launch a single master instance by following the same cloud launch instructions as before but observing the following details.

  • Select an instance that supports placement groups, i.e. any one of the C4 instances.
  • Under Instance Details, select cluster from the Placement Group menu.
  • Under Security Groups, check Select an existing security group and select the cluster group.
  • Under Storage, set a volume size that can accommodate the data for the intended simulations.
  • Click Launch.
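
For reference, a master instance can also be launched from the AWS CLI along the following lines. This is only a sketch: the AMI ID, key pair name and volume size are placeholders to be replaced with the user's own values.

# launch one master instance into the cluster security and placement groups (sketch)
aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type c4.8xlarge \
    --key-name ec2 --security-groups cluster --placement GroupName=cluster \
    --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=100}'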

Now launch the slave instances, selecting minimal storage but using the same security group and placement group. All 3 slave instances can be created at once by specifying 3 instances under Instance Details. There, Auto-assign Public IP can also be set to Disable, because the slave nodes only need to be accessed from the master instance, not from the outside world.
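
A corresponding CLI sketch for the slave instances adds a count of 3 and omits the larger volume, leaving the AMI's default root volume; the AMI ID and key pair name are again placeholders.

# launch 3 slave instances into the same security and placement groups (sketch)
aws ec2 run-instances --image-id ami-xxxxxxxx --count 3 --instance-type c4.8xlarge \
    --key-name ec2 --security-groups cluster --placement GroupName=cluster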

SSH Access to Slave Instances (once per login session)

To run CFD in parallel using domain decomposition, the master instance needs to execute processes on the slave instances.  This is achieved by enabling SSH access from the master instance into the slave instances, which requires a private key that the master instance does not have.  However, SSH agent forwarding allows the private key on the local machine to be used instead, rather than storing it on the master instance (which is not advisable). To use agent forwarding, the private key must first be added to the authentication agent on the local machine with the following command.

ssh-add ${HOME}/.ssh/ec2.pem
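
To confirm that the key has been loaded, list the identities held by the agent; the ec2 key should appear in the output.

ssh-add -l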

Following that, the user will find that SSH login to an instance no longer requires the key to be specified with the -i option. Agent forwarding is applied by logging into the master instance using the -A option, i.e.

ssh -A ubuntu@M.M.M.M
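
Once logged in with -A, agent forwarding can be verified on the master instance by listing the identities again; the key added on the local machine should appear, even though the key file itself is not stored on the master.

ssh-add -l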

Sharing the Master Instance Volume (one time only)

This cluster is set up so that all data is stored on a volume attached to the master instance.  We therefore share the OpenFOAM directory on the master instance across the slave instances using the network file system (NFS) protocol.  This involves exporting the OpenFOAM directory from the master instance with exportfs (one time only); the exported directory is then mounted by all the slave instances using mount.

To export the OpenFOAM directory: from the master instance, add the OpenFOAM directory to the /etc/exports file as superuser (sudo), export the directories listed in that file with exportfs, then start the NFS server with the following terminal commands.

sudo sh -c "echo '/home/ubuntu/OpenFOAM  *(rw,sync,no_subtree_check)' >> /etc/exports"
sudo exportfs -ra
sudo service nfs-kernel-server start
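
The export can be checked on the master instance by listing the currently exported directories, which should include /home/ubuntu/OpenFOAM with the options given in /etc/exports.

sudo exportfs -v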

Mounting the Master Volume from Slaves

For each slave instance, we need to delete the empty directories inside the OpenFOAM directory and use that directory as a mount point for the OpenFOAM directory on the master instance. That requires the private IP address of the master instance, which can be obtained from the AWS console and is denoted here as L.L.L.L. We also need the private IP addresses of the slave instances from the AWS console, denoted here as X.X.X.X, Y.Y.Y.Y and Z.Z.Z.Z. To simplify the process, define a SPIPS environment variable containing the slave private IPs, then delete the directories and mount the master directory on all slave instances using non-interactive SSH with the following commands.

SPIPS="X.X.X.X Y.Y.Y.Y Z.Z.Z.Z"
for IP in $SPIPS ; do ssh $IP 'rm -rf ${HOME}/OpenFOAM/*' ; done
for IP in $SPIPS ; do ssh $IP 'sudo mount L.L.L.L:${HOME}/OpenFOAM ${HOME}/OpenFOAM' ; done

The mounting of the OpenFOAM directory on the master instance can be tested for each slave instance by the following command.

for IP in $SPIPS ; do ssh $IP 'ls ${HOME}/OpenFOAM' ; done

For each slave instance, it should return the ubuntu-〈version〉 directory, e.g. ubuntu-3.0.1.
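
As a further check, the mounted filesystem can be inspected on each slave instance; the source of the directory should be reported as L.L.L.L:/home/ubuntu/OpenFOAM.

for IP in $SPIPS ; do ssh $IP 'df -h ${HOME}/OpenFOAM' ; done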

Running in Parallel on a Cluster

Users can test the cluster on the damBreak tutorial case from the User Guide, which simulates the collapse of a column of water under its own weight. The test involves:

  1. changing to the $FOAM_RUN directory with the run alias;
  2. copying the damBreak case files from the tutorials directory to the current directory;
  3. changing to the damBreak case directory;
  4. generating a mesh for the geometry with the blockMesh utility;
  5. refining the mesh by splitting each (2D) cell 2×2 using the refineMesh utility;
  6. creating the alpha.water field file from backup and initialising with the setFields utility;
  7. decomposing the mesh and fields into 4 subdomains using the decomposePar utility;
  8. running the interFoam solver in parallel with 4 processes.

First log onto the master node using SSH with agent forwarding (-A).

ssh -A ubuntu@M.M.M.M

Then go into the run directory and execute all stages up to running the simulation in parallel with interFoam.

run
cp -r $FOAM_TUTORIALS/multiphase/interFoam/laminar/damBreak .
cd damBreak
blockMesh
refineMesh -overwrite
cp -r 0/alpha.water.org 0/alpha.water
setFields
decomposePar
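
The decomposition into 4 parts is controlled by the system/decomposeParDict file in the case directory; the copy distributed with the damBreak tutorial typically specifies 4 subdomains, matching the 4 instances in this cluster, which can be confirmed as follows.

grep numberOfSubdomains system/decomposeParDict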

Parallel running across a cluster needs a list of the host machines that the user wishes to use. In this example, the host machine names are the private IP addresses of the master and slave instances. Open an editor and create such a file in the damBreak case directory, naming it machines and containing one IP address per line, e.g. as shown below (changing the addresses accordingly).

172.12.12.12
172.12.34.34
172.12.56.56
172.12.78.78

Finally, the user can execute interFoam using the foamJob script with the -p option for parallel running. foamJob -p automatically runs on the number of cores indicated by the processor directories, using the host names in the machines file.

foamJob -p interFoam
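
foamJob redirects the solver output to a file named log in the case directory, which can be monitored with tail -f log. Alternatively, the solver can be launched directly with mpirun; the following is only a sketch, assuming an OpenMPI-based installation, with the hostfile and number of processes given explicitly.

# direct alternative to foamJob, with explicit hostfile and process count (sketch)
mpirun --hostfile machines -np 4 foamExec interFoam -parallel > log 2>&1 &
# monitor the solver output
tail -f log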
