HPC Submission scripts
Here we provide some example submission scripts for various HPC systems. TIES MD will attempt to automatically write sensible submission scripts for NAMD2 targeting ARCHER 2 and for OpenMM targeting Summit. In general, the user can write their own script for whichever HPC system or cluster they prefer. To aid with writing general scripts, TIES MD exposes three options in the API, called sub_header, pre_run_line and run_line. The strings passed with these options are injected into a general template for a NAMD2 or OpenMM submission. All generated submission scripts are written to the base TIES MD directory as sub.sh. An example of this is provided in the Running section.
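For reference, below is a minimal sketch of how these options could be set from the Python API. Only the three option names come from the text above; the import path, constructor arguments and the example strings (taken from the Summit script later on this page) are assumptions for illustration, so check the Running section and the API documentation for the exact interface.

# minimal sketch only: the import path and constructor arguments below are
# assumptions; consult the Running section / API docs for the exact interface
from TIES_MD.TIES import TIES  # assumed import path

# assumed constructor arguments; exp_name matches the examples on this page
md = TIES(cwd='./ties-l2-l1/com', exp_name='sys_solv')

# scheduler header written verbatim at the top of the generated sub.sh
md.sub_header = """#BSUB -P XXX
#BSUB -W 20
#BSUB -nnodes 1"""

# text placed in front of the run command (e.g. srun/jsrun resource flags)
md.pre_run_line = 'jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 '

# the command used to launch each simulation (site-specific example string)
md.run_line = 'ties_md --config_file=TIES.cfg --exp_name=sys_solv'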
NAMD
Here is an example of a submission script for a large system (≈100k atoms) running on SuperMUC-NG:
#!/bin/bash
#SBATCH --job-name=LIGPAIR
#SBATCH -o ./%x.%j.out
#SBATCH -e ./%x.%j.err
#SBATCH -D ./
#SBATCH --nodes=130
#SBATCH --tasks-per-node=48
#SBATCH --no-requeue
#SBATCH --export=NONE
#SBATCH --get-user-env
#SBATCH --account=XXX
#SBATCH --partition=general
#SBATCH --time=10:00:00
module load slurm_setup
module load namd/2.14-gcc8-impi
nodes_per_namd=10
cpus_per_namd=480
echo $nodes_per_namd
echo $cpus_per_namd
#change this line to point to your project
ties_dir=/hppfs/work/pn98ve/di67rov/test_TIES/study/prot/ties-l2-l1/com
cd $ties_dir/replica-confs
for stage in {0..3}; do
  for lambda in 0.00 0.05 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95 1.0; do
    srun -N $nodes_per_namd -n $cpus_per_namd namd2 +replicas 5 --tclmain run$stage-replicas.conf $lambda&
    sleep 1
  done
  wait
done
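The resource numbers in this script are coupled: each of the 13 lambda windows gets its own srun launch of nodes_per_namd nodes, and each launch runs its 5 replicas internally via +replicas 5, so the allocation needs 13 x nodes_per_namd nodes and each NAMD job sees nodes_per_namd x tasks-per-node cores. A small sketch of that bookkeeping (illustrative only, not part of TIES MD):

# bookkeeping for the SuperMUC-NG example above (illustrative only)
lambdas = [0.00, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1.0]
nodes_per_namd = 10    # nodes handed to each srun/NAMD job
tasks_per_node = 48    # matches #SBATCH --tasks-per-node
cpus_per_namd = nodes_per_namd * tasks_per_node    # 480, as set in the script
total_nodes = len(lambdas) * nodes_per_namd        # 130, matches #SBATCH --nodes
print(cpus_per_namd, total_nodes)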
The opening lines of the SuperMUC-NG script above (the header and NAMD resource settings) could be adapted for a smaller system (≈10k atoms) as follows:
#!/bin/bash
#SBATCH --job-name=LIGPAIR
#SBATCH -o ./%x.%j.out
#SBATCH -e ./%x.%j.err
#SBATCH -D ./
#SBATCH --nodes=13
#SBATCH --tasks-per-node=45
#SBATCH --no-requeue
#SBATCH --export=NONE
#SBATCH --get-user-env
#SBATCH --account=XXX
#SBATCH --partition=micro
#SBATCH --time=10:00:00
module load slurm_setup
module load namd/2.14-gcc8-impi
#--nodes and nodes_per_namd can be scaled up for large simulations
nodes_per_namd=1
cpus_per_namd=45
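With nodes_per_namd=1 and cpus_per_namd=45, the 13 windows need only 13 nodes in total (hence --nodes=13), and the 45 tasks on each node are shared by the 5 replicas of that window's single NAMD job, i.e. 9 cores per replica.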
OpenMM
Here we provide an example of TIES MD running with OpenMM on Summit:
#!/bin/bash
#BSUB -P XXX
#BSUB -W 20
#BSUB -nnodes 1
#BSUB -alloc_flags "gpudefault smt1"
#BSUB -J test
#BSUB -o otest.%J
#BSUB -e etest.%J
cd $LS_SUBCWD
export PATH="/gpfs/alpine/scratch/adw62/chm155/TIES_test/miniconda/bin:$PATH"
export ties_dir="/gpfs/alpine/scratch/adw62/chm155/TIES_test/TIES_MD/TIES_MD/examples/ethane/zero_sum/leg1"
module load cuda/10.1.168
date
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=0,1 --rep_id=0 > $ties_dir/0.out&
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=1,2 --rep_id=0 > $ties_dir/1.out&
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=2,3 --rep_id=0 > $ties_dir/2.out&
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=3,4 --rep_id=0 > $ties_dir/3.out&
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=4,5 --rep_id=0 > $ties_dir/4.out&
jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 ties_md --config_file=$ties_dir/TIES.cfg --exp_name='sys_solv' --windows_mask=5,6 --rep_id=0 > $ties_dir/5.out&
wait
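Each jsrun line above claims a single GPU, so the six launches fill the six GPUs of one Summit node; the masks --windows_mask=0,1 through --windows_mask=5,6 each select a single alchemical window, and --rep_id picks the replica. To scale this up, a short generator in the spirit of the ThetaGPU example below can write the jsrun lines. The sketch below reuses only the flags shown above; the per-run output names and the 6-GPUs-per-node assumption are choices made here, not part of TIES MD:

# sketch: generate jsrun lines for many windows/replicas (assumes 6 GPUs per Summit node)
windows = 6      # number of alchemical windows
replicas = 1     # replica simulations per window

# remember to raise #BSUB -nnodes so there are enough GPUs: ceil(windows*replicas/6)
template = ('jsrun --smpiargs="off" -n 1 -a 1 -c 1 -g 1 -b packed:1 '
            'ties_md --config_file=$ties_dir/TIES.cfg --exp_name=\'sys_solv\' '
            '--windows_mask={win},{win_end} --rep_id={rep} '
            '> $ties_dir/{win}_{rep}.out&')

for rep in range(replicas):
    for win in range(windows):
        print(template.format(win=win, win_end=win + 1, rep=rep))
print('wait')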
NAMD 3
Here we provide an example of TIES MD running with NAMD3 on ThetaGPU:
#!/bin/bash
#COBALT -A XXX
#COBALT -t 100
#COBALT -n 2
#COBALT -q full-node
export mpirun="/lus/theta-fs0/software/thetagpu/openmpi-4.0.5/bin/mpirun"
export namd3="/lus/theta-fs0/projects/CompBioAffin/awade/NAMD3/NAMD_3.0alpha9_Linux-x86_64-multicore-CUDA/namd3"
node1=$(sed "1q;d" $COBALT_NODEFILE)
node2=$(sed "2q;d" $COBALT_NODEFILE)
cd /lus/theta-fs0/projects/CompBioAffin/awade/many_reps/mcl1/l18-l39/com/replica-confs
for stage in {0..3}; do
  $mpirun -host $node1 --cpu-set 0 --bind-to core -np 1 $namd3 +devices 0 --tclmain run$stage.conf 0.00 0&
  $mpirun -host $node1 --cpu-set 1 --bind-to core -np 1 $namd3 +devices 1 --tclmain run$stage.conf 0.05 0&
  $mpirun -host $node1 --cpu-set 2 --bind-to core -np 1 $namd3 +devices 2 --tclmain run$stage.conf 0.10 0&
  $mpirun -host $node1 --cpu-set 3 --bind-to core -np 1 $namd3 +devices 3 --tclmain run$stage.conf 0.20 0&
  $mpirun -host $node1 --cpu-set 4 --bind-to core -np 1 $namd3 +devices 4 --tclmain run$stage.conf 0.30 0&
  $mpirun -host $node1 --cpu-set 5 --bind-to core -np 1 $namd3 +devices 5 --tclmain run$stage.conf 0.40 0&
  $mpirun -host $node1 --cpu-set 6 --bind-to core -np 1 $namd3 +devices 6 --tclmain run$stage.conf 0.50 0&
  $mpirun -host $node1 --cpu-set 7 --bind-to core -np 1 $namd3 +devices 7 --tclmain run$stage.conf 0.60 0&
  $mpirun -host $node2 --cpu-set 0 --bind-to core -np 1 $namd3 +devices 0 --tclmain run$stage.conf 0.70 0&
  $mpirun -host $node2 --cpu-set 1 --bind-to core -np 1 $namd3 +devices 1 --tclmain run$stage.conf 0.80 0&
  $mpirun -host $node2 --cpu-set 2 --bind-to core -np 1 $namd3 +devices 2 --tclmain run$stage.conf 0.90 0&
  $mpirun -host $node2 --cpu-set 3 --bind-to core -np 1 $namd3 +devices 3 --tclmain run$stage.conf 0.95 0&
  $mpirun -host $node2 --cpu-set 4 --bind-to core -np 1 $namd3 +devices 4 --tclmain run$stage.conf 1.00 0&
  wait
done
This script runs 13 alchemical windows with only 1 replica simulation per window, and 3 GPUs sit idle on node2; for real-world applications it needs to be scaled up. Currently TIES MD will not attempt to build NAMD3 HPC scripts automatically. For creating general scripts a Python script can be very helpful; the following script would allow us to scale up on ThetaGPU:
import os

if __name__ == "__main__":

    ###OPTIONS###
    #account name
    acc_name = 'XXX'
    #how many nodes we want
    nodes = 9
    #which thermodynamic leg to run (these may have different wall times)
    leg = 'com'
    #where the namd3 binary is installed
    namd3_exe = '/lus/theta-fs0/projects/CompBioAffin/awade/NAMD3/NAMD_3.0alpha9_Linux-x86_64-multicore-CUDA/namd3'
    #############

    cwd = os.getcwd()

    #give com and lig simulations different wall times if needed
    if leg == 'com':
        wall_time = 100
    else:
        wall_time = 60

    with open(os.path.join(cwd, 'thetagpu_{}.sub'.format(leg)), 'w') as f:
        #write the scheduler header
        f.write('#!/bin/bash\n')
        f.write('#COBALT -A {}\n'.format(acc_name))
        f.write('#COBALT -t {}\n'.format(wall_time))
        f.write('#COBALT -n {}\n'.format(nodes))
        f.write('#COBALT -q full-node\n')

        #export the mpirun and namd3 install locations
        f.write('export mpirun="/lus/theta-fs0/software/thetagpu/openmpi-4.0.5/bin/mpirun"\n')
        f.write('export namd3="{}"\n'.format(namd3_exe))

        #write lines that read the node names from the Cobalt node file
        for node in range(nodes):
            f.write('node{0}=$(sed "{1}q;d" $COBALT_NODEFILE)\n'.format(node+1, node+1))

        #move to the TIES directory
        f.write('cd {}\n'.format(os.path.join(cwd, 'replica-confs')))

        #iterate over minimization, NVT eq, NPT eq and production stages
        for stage in ['run0', 'run1', 'run2', 'run3']:
            count = 0
            node = 1
            #iterate over alchemical windows
            for lam in [0.00, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 1.00]:
                #iterate over replica simulations
                for rep in [0, 1, 2, 3, 4]:
                    #write the run line
                    f.write('$mpirun -host $node{} --cpu-set {} --bind-to core -np 1 $namd3 +devices {} --tclmain {}.conf {:.2f} {}&\n'.format(node, count%8, count%8, stage, lam, rep))
                    #count the GPUs used and move to the next node when all 8 are filled
                    count += 1
                    if count%8 == 0:
                        node += 1
            #make sure all simulations in one stage finish before the next stage starts
            f.write('wait\n')
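Running this generator writes thetagpu_com.sub (or thetagpu_lig.sub for the other leg) into the current working directory; the file can then be submitted to the ThetaGPU queue with Cobalt's qsub, with the exact invocation depending on your site's setup.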