Parallelization
Alchemical free energy calculations can be parallelized over numerous domains. Some domains of parallelization can be used in any kind of molecular dynamics simulation, such as the spatial domain, where a simulation box is decomposed into smaller cells all run in parallel. These domains are, in general, harder to parallelize across than the ones we discuss here, which are specific to alchemical calculations. The two domains we focus on are repeat/ensemble simulations and alchemical windows.
Ensemble simulations are critical to control the aleatoric error inherent in chaotic molecular dynamics simulations. Each simulation in an ensemble has no communication with the other simulations, so this is an embarrassingly parallel problem, that is, a problem for which parallelization is easy to implement. Likewise, there is no communication between individual alchemical windows of the simulation, so parallelizing these windows is also easy. The remainder of this page explores how to achieve this parallelization using OpenMM and NAMD with TIES.
TIES-OpenMM
For reference, we will consider running an example system from our TIES MD GitHub page. This example can be run without parallelization using this line:
ties_md --exp_name=sys_solv
This would use 1 available GPU to execute all 8 alchemical windows and the 3 repeats specified in the config file TIES.cfg. If we wanted to parallelize the 3 repeats over 3 GPUs on one node, we would run:
ties_md --exp_name=sys_solv --devices=0,1,2
Each CUDA device will then run the 8 windows of one replica. Equally, this could be split into three separate runs of TIES MD, each masked to see only one device:
ties_md --exp_name=sys_solv --devices=0 --rep_id=0&
ties_md --exp_name=sys_solv --devices=1 --rep_id=1&
ties_md --exp_name=sys_solv --devices=2 --rep_id=2&
To run in this configuration, the options total_reps=3 and split_run=1 are set in TIES.cfg to tell TIES MD that there are a total of 3 replicas being run and that each execution of TIES MD should run only one. --rep_id determines which replica each instance will run, and only needs to be set when using split_run=1.
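For reference, the relevant TIES.cfg lines for this split configuration would be along these lines (a sketch showing only the two options discussed here; the real file contains other settings):
total_reps=3
split_run=1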
If we need further parallelization over alchemical windows, we can use the command line option --windows_mask. This option takes a Python range (start inclusive, end exclusive) of the windows which that instance of TIES MD should run:
ties_md --exp_name=sys_solv --windows_mask=0,1 --devices=0&
ties_md --exp_name=sys_solv --windows_mask=1,2 --devices=1&
ties_md --exp_name=sys_solv --windows_mask=2,3 --devices=2&
ties_md --exp_name=sys_solv --windows_mask=3,4 --devices=3&
ties_md --exp_name=sys_solv --windows_mask=4,5 --devices=4&
ties_md --exp_name=sys_solv --windows_mask=5,6 --devices=5&
ties_md --exp_name=sys_solv --windows_mask=6,7 --devices=6&
ties_md --exp_name=sys_solv --windows_mask=7,8 --devices=7&
Now, using the configuration options total_reps=3 and split_run=0, the above runs 3 replicas of each alchemical window, with each window on a different GPU.
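For brevity, the eight launches above could equally be generated with a small bash loop (a sketch, assuming a bash shell and eight visible CUDA devices):
# launch one ties_md instance per alchemical window, one window per GPU;
# with split_run=0 all three replicas of a window run inside one instance
for i in {0..7}; do
    ties_md --exp_name=sys_solv --windows_mask=$i,$((i+1)) --devices=$i &
done
wait    # block until all eight instances finish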
For maximum parallelism we combine parallelizing over replicas and alchemical windows. For clarity, we now consider the same example as above but with 6 alchemical windows, 2 replica simulations, and one simulation per GPU; that is, in TIES.cfg we set global_lambdas=0.0, 0.1, 0.4, 0.6, 0.9, 1.0, total_reps=2 and split_run=1. To scale over multiple nodes we could use the resource allocator of the HPC, for example jsrun on Summit, which would allow us to run 2 replicas of 6 windows as follows:
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=0,1 --rep_id=0&
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=1,2 --rep_id=0&
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=2,3 --rep_id=0&
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=3,4 --rep_id=0&
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=4,5 --rep_id=0&
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=5,6 --rep_id=0&
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=0,1 --rep_id=1&
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=1,2 --rep_id=1&
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=2,3 --rep_id=1&
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=3,4 --rep_id=1&
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=4,5 --rep_id=1&
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=5,6 --rep_id=1&
Note that here we do not set --devices, as the masking of GPUs is handled by the resource allocator; this is not the general case.
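The twelve jsrun launches above could likewise be generated with nested loops (a sketch; the jsrun flags are exactly those from the example):
# one jsrun resource set (1 core, 1 GPU) per window/replica pair
for rep in 0 1; do
    for win in {0..5}; do
        jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=$win,$((win+1)) --rep_id=$rep &
    done
done
wait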
If a resource allocator is not available, an alternative method to run multiple simulations across nodes is to use a message passing interface (MPI). The use of MPI can vary from system to system, and there is no universal solution to running across many nodes for all HPC systems; however, we provide an example (NAMD 3) which would work with ThetaGPU.
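As one illustration, with Open MPI one could launch one TIES MD instance per rank and derive the window mask from the rank index. This is a hypothetical sketch, not a built-in TIES MD feature: OMPI_COMM_WORLD_RANK is an Open MPI environment variable, and we assume split_run=0, eight windows, and GPU masking handled by the system.
# eight ranks, one rank per alchemical window; each rank runs one ties_md instance
mpirun -np 8 bash -c 'ties_md --exp_name=sys_solv --windows_mask=$OMPI_COMM_WORLD_RANK,$((OMPI_COMM_WORLD_RANK+1))'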
TIES-NAMD
The parallelization of TIES in NAMD2 follows the same ideas as OpenMM above. We want to run independent simulations for all alchemical windows and replicas. If split_run=0 in TIES.cfg, the submission script that TIES MD writes will use the NAMD option +replicas X; this makes each NAMD run X replicas, and the run lines in sub.sh will look something like:
for stage in {0..3}; do
    # one NAMD instance per lambda window; each instance runs 5 replicas internally
    for lambda in 0.00 0.05 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95 1.0; do
        srun -N $nodes_per_namd -n $cpus_per_namd namd2 +replicas 5 --tclmain run$stage-replicas.conf $lambda &
        sleep 1
    done
    wait
done
Alternatively, if split_run=1, the run lines will look like:
for stage in {0..3}; do
    for lambda in 0.00 0.05 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95 1.0; do
        # one NAMD instance per replica of each lambda window
        for i in {0..4}; do
            srun -N $nodes_per_namd -n $cpus_per_namd namd2 --tclmain run$stage.conf $lambda $i &
            sleep 1
        done
    done
    wait
done
Notice now the additional loop over $i. These run lines create 65 different instances of NAMD (13 windows × 5 replicas), each running one replica of one alchemical window. Anecdotally, using the +replicas option results in fewer crashes, and we have tested up to +replicas 135 on ARCHER 2 with no crashes. In the two above examples the parallelism over alchemical windows is achieved in the loop over lambda.
Using NAMD 3, parallelization can be achieved in the same way (NAMD 3); a rough sketch is given below. NAMD in general has extensive options to provision hardware and achieve parallelism; what we have outlined here is not exhaustive, and we suggest consulting the NAMD documentation for more comprehensive information.
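As an illustration only, a split NAMD 3 run might look like the following. This is a sketch under stated assumptions: +p1 and +devices are standard NAMD 3 options for pinning a process to one GPU, the device assignment simply round-robins over the node's visible GPUs, and the run lines TIES MD actually generates may differ.
gpus_per_node=8    # assumption: 8 visible GPUs per node
gpu=0
for stage in {0..3}; do
    for lambda in 0.00 0.05 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95 1.0; do
        for i in {0..4}; do
            # one namd3 process per window/replica pair, pinned to one GPU
            namd3 +p1 +devices $((gpu % gpus_per_node)) --tclmain run$stage.conf $lambda $i &
            gpu=$((gpu + 1))
            sleep 1
        done
    done
    wait
done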