Tutorial ========= Getting started --------------- ``TIES MD`` is a package for the preparation, running and analysis of binding free energy calculations. In this document we will outline what commands should be run to calculate binding free energies. To start with any free energy calculations we must first outline what are the expected input files to the ``TIES MD`` program. In this tutorial we will refer to example systems which can be found in the `source code `_. These examples can be download by running:: git clone https://github.com/UCL-CCS/TIES_MD.git and found by navigating to ``TIES_MD/TIES_MD/examples/`` Input ------ ``TIES MD`` expects a number of input files, these are two essential files, e.g. ``complex.pdb`` and ``complex.prmtop``. These files contain information about the position, topology and parameters for the system. Currently we only support the ``AMBER`` based format ``prmtop``. ``complex.pdb`` also contains the alchemical indexes denoting which atoms will appear and disappear during the simulation. There is also an optional input file, ``constraints.pdb``, and this contains indexes denoting which atoms, if any, are constrained during the pre-production simulation. This input should all be placed in a directory named build located where the user wishes to run the simulation. Examples of these files can be found `here `_. Please use a directory structure like ``study/system/ligand/thermodynamic_leg/build`` this will allow the analysis scripts to understand the structure and perform analysis automatically. ``study``, ``system``, ``ligand`` and ``thermodynamic_leg`` can be renamed to anything but the name of the ``build`` directory is fixed. If input for novel ligand transformations is desired the `TIES20 `_ program can be used to generate all required inputs. ``TIES 20`` can be used via our online `service `_ or locally and details of how to use this will be provided later in these documents. The only non standard input to ``TIES MD`` is a configuration file (``TIES.cfg``) which specifies options which the user my wish to occasionally change. This file must be placed alongside the build directory. Here we provide an example of such a file:: ################################################################# # # # TIES # # # ################################################################# #Which molecular dynamics engine will be used, valid options are openmm/namd2.14/namd3 engine = openmm #Target temperature for the thermostat temperature = 300*unit.kelvin #Target pressure for barostat pressure = 1*unit.atmospheres #How much production sampling to run per alchemical window (4ns recommended) sampling_per_window = 4*unit.nanoseconds #How much equilibration to run per alchemical window (2ns recommended) equili_per_window = 2*unit.nanoseconds #Which estimators to use. Valid options are: TI, FEP methods = FEP, TI #How many total replicas of each window are run (we recommend at least 5). total_reps = 5 #Boolean for if we will split all replicas into separate runs. (1 for maximum parallelism) split_run = 0 #Where in lambda schedule (0->1) should the electrostatic potentials begin, stop appearing. elec_edges = 0.5, 1.0 #Where in lambda schedule (0->1) should the Lennard_Jones potentials begin, stop appearing. ster_edges = 0.0, 0.5 #The value the global controlling parameter takes in each window (13 windows of this spacing recommended) global_lambdas = 0.00, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1.00 #The name of the pdb file with constraints in the build directory, i.e. cons.pdb If no constraints pass na constraint_file = na #Which column in pdb are constraints provided valid options are occupancy/beta_factor. (beta_factor is standard) constraint_column = beta_factor #Manually specify box vectors of this simulation, unit Angstrom. cell_basis_vec1 = 50, 0.0, 0.0 cell_basis_vec2 = 0.0, 50, 0.0 cell_basis_vec3 = 0.0, 0.0, 50 #What input type is provided, only AMBER supported. input_type = AMBER ``total_reps`` and ``split_run`` are options which can be used to achieve simple parallelism of the simulations. For example if you wished to run a total of 5 simulations on 5 GPUs in parallel one could use the settings ``total_reps = 5`` and ``split_run = 1``. See the :ref:`Parallelization` section for more details of how to achieve this. The following image shows ``TIES_MD`` applied to one alchemical transformation. .. image:: _static/images/one_leg.png :align: center :width: 600 :alt: Alternative text The setting ``global_lambdas`` defines the values the alchemical parameter ``λ`` will take in each alchemical windows. ``global_lambdas`` should run from 0 to 1. As ``global_lambdas`` varies from 0 to 1 some atoms in the system are turned on and off moving the system between two physical states (see :ref:`Alchemical Calculations`). The values given to ``elec_edges`` and ``ster_edges`` define exactly how ``λ`` modifies the potential energy functions of the system and for more information these settings please see the :ref:`Alchemical Pathways` section. Note the option ``constraint_column`` which determines if the constraint indexes will be read from the temperature factor or occupancy column of the constraints PDB. The alchemical indexes are always be read from the temperature factor column in the main PDB ``complex.pdb``. ``TIES20`` will populate a TIES.cfg automatically with the correct box size. Typically a constraint file may be used during preproduction of simulations involving proteins but possibly not a small drug like molecule in only solvent. It will be shown later in the :ref:`Binding Free Energy Tutorial` section when and why we use a constraints file. Command Line ------------ With all input present in the build directory and ``TIES MD`` installed we are almost ready to calculate binding free energies. ``TIES_MD`` can be invoked on the command line by just running ``TIES_MD`` and this will use all default settings and the settings found in ``TIES.cfg``. The setting which can be taken on the command line and their default values are as follows:: [--config_file=./TIES.cfg] A string for the path pointing to where the TIES.cfg file is located. [--exp_name=complex] This is the name of the experiment and the prefix that TIES OpenMM will expect on the input pdb and prmtop file. Note that the constraints pdb is specified separately in the config file. [--run_type=run] A string either `run`, `setup` or `class`. `run` will tell TIES OpenMM to execute the binding free energy calculation, `setup` will prep the output directories and `class` will halt the program after the construction of the TIES class, this can be used in testing or advanced analysis. [--windows_mask=None] Comma separated list of integers. These specify what alchemical windows the current instance of TIES OpenMM should run. By default all windows will be run. # Below are OpenMM specific options, these are silently ignored for NAMD runs. [--devices=0] A comma separated list of integers which tells TIES OpenMM which GPUs to run on. If multiple GPUs are specified then TIES OpenMM will parallelize requested replicas over the available GPUs. [--rep_id=0] An int which will be used to generate the names of output files. Should be used if many independent replicas of the same simulation are run on different nodes to ensure output is writen to unique location. Simulation Preparation ---------------------- ``TIES_MD`` is run in two stages first using the option ``--run_type=setup`` this prepares output directories and MD engine input scripts, this stage is not compute intensive and can be run on a PC or HPC head node. The setup stage can but does not have to be run of you are using ``OpenMM`` and directories are built when using ``--run_type=run`` also. Output directories are prepared with the structure ``LAMBDA_X`` where ``X`` is a float denoting what alchemical window that folder contains the output for. Within ``LAMBDA_X`` there are directories named ``repY`` where ``Y`` is an integer which denotes a replica in the ensemble. In the ``repY`` directories there are three more directories: ``equilibration``, ``simulation`` and ``results``. The ``equilibration`` and ``simulation`` directory will contain all the output for the pre-production and production stages of the simulation respectively. The ``results`` directory will contain the files with potentials and gradients output by ``TIES_MD`` or ``NAMD``. The files in the ``results`` directories will be analysed to calculate binding free energies. Considering the application of this setup stage to `this `_ example for the zero sum transformation of ethane to ethane the setup command would be:: ties_md --exp_name=sys_solv --run_type=setup The above sets up an ``OpenMM`` calculation. Alternatively to use ``NAMD`` some options must be changed please see this `modified `_ config file as an example of what to change. Running Simulations ------------------- The second stage of running ``TIES_MD``, after setup, involves the running of the MD simulations, this is compute intensive and can only be run on a HPC for all but the smallest systems. The execution of ``TIES_MD`` branches at this point dependant on what MD engine is being used. HPC submission scripts should be prepared with the target MD engine in mind. Consider the same `example, `_ used in the setup stage, for the transformation of ethane to ethane. If in TIES.cfg the option ``global_lambdas`` is set equal to ``0.0, 0.2, 0.4, 0.6, 0.8, 1.0`` there are 6 alchemical windows and the option ``total_reps`` is set equal to ``1``, there is therefore 6x1 = 6 total simulations to perform. If a HPC submission script was to request one node with 6 GPUS with each GPU running one alchemical window the run lines for an ``OpenMM`` calculation would look like:: ties_md --exp_name=sys_solv --windows_mask=0,1 --devices=0& ties_md --exp_name=sys_solv --windows_mask=1,2 --devices=1& ties_md --exp_name=sys_solv --windows_mask=2,3 --devices=2& ties_md --exp_name=sys_solv --windows_mask=3,4 --devices=3& ties_md --exp_name=sys_solv --windows_mask=4,5 --devices=4& ties_md --exp_name=sys_solv --windows_mask=5,6 --devices=5& There are a lot of options for how these ``OpenMM`` calculations can be structured and parallelized with ``TIES_MD`` see our :ref:`Parallelization` page for more information on this. For a ``NAMD`` calculation if the submission script requested 6 CPU nodes each with 128 cores the run lines in the submission script might look like:: cd $build/replica-confs for stage in {0..3}; do for lambda in in 0.0 0.2 0.4 0.6 0.8 1.0; do for i in {0..0}; do srun -N 1 -n 128 namd2 --tclmain run$stage.conf $lambda $i & sleep 1 done done wait done Notice in the ``NAMD`` example reference is made to a directory ``$build/replica-confs`` this is where the NAMD input scripts are writen during the ``TIES_MD`` setup stage. Also notice in the ``NAMD`` examples there is a loop over the ``stages`` these are three pre-production stages and one production stage. The preproduction stages are a minimization followed by an NVT equilibration and finishing with NPT equilibration. The production stage is NVT simulation, it is the production simulation which is analysed to calculate the results. These stages are performed automatically by ``TIES MD`` when running with ``OpenMM`` but must be explicitly executed when using ``NAMD`` as shown above. The exact submission script for a particular HPC and the settings with which each engine should be run to get good performance is a wide problem without a general solution to solve any issues we would suggest consulting user manuals of both HPC and MD engine, reading our example :ref:`HPC Submission scripts` or submitting an `issue `_ on ``Github``. Analysis --------- The analysis of the files found in the output can be performed by ``TIES_analysis`` which is a submodule of ``TIES_MD``. ``TIES_MD`` will create the input needed to perform the analysis. Input configuration files for ``TIES_analysis`` will be filled in with information such as the lambda schedule or which MD engine was used. If the directory structure ``study/system/ligand/thermodynamic_leg/build`` was used then these config files are written to the ``study`` directory. Some information is missing from these config files which must be filled out. The missing information is for the names of the ``thermodynamic_leg`` directories. Add the names of the ``thermodynamic_leg`` to the config file ``analysis.cfg`` under the option ``legs`` as an example see the option `legs `_ in this example script. This example analysis input also has an ``exp.dat`` `file `_ populated for the system named ``ethane`` and a ligand transformation in that system called ``zero_sum``, this transformation has an theoretical ΔG of 0.0 kcal/mol and an unknown standard deviation associated with that measurement. Any unknown values in ``exp.dat`` which need to be populated can be left as 0.0. The theoretical ΔG of this ethane zero sum system is zero because the transformation carried out is ethane into ethane so we should expect the result to sum to zero. This is not the result we would expect in general and is special only to a test case such as this. To save time an ``exp.dat`` file with all values set to 0.0 can be generated with ``TIES_analysis`` by running:: ties_ana --run_type=setup The information in the generated ``exp.dat`` will be inferred from the directory structure. If desired the user can populate the ``exp.dat`` correct (non-zero) values at a later date for their own reference/analysis. With ``analysis.cfg`` and ``exp.dat`` populated the analysis can then be executed on a HPC head node or PC by running ``TIES_analysis`` in the ``study`` directory using the command:: ties_ana This will produce as output a file ``results.dat`` in the ``study`` directory which contains a python dictionary keyed first by the methodology used, then the system name and then ligand name. Each value in the dictionary is a list, the first entry in that list is the calculated free energy change and the second entry is the standard deviation associated with that free energy change. So for example the ``results.dat`` output from the ethane to ethane transformation example would look something like:: #METHOD SYSTEM LIGAND {'OpenMM_FEP': {'ethane': {'zero_sum': [-0.023, 0.023]}}, 'OpenMM_TI': {'ethane': {'zero_sum': [0.003, 0.076]}}} To complement this main output ``TIES_analysis`` will also produce analysis figures which should help in determining the validity and accuracy of calculations. ``TIES_analysis`` will place these figures in a new directory named ``analysis/engine/method/system/ligand/thermodynamic_leg/`` .. image:: _static/images/analysis.png :align: center :width: 600 :alt: Alternative text Panel a) pertains to FEP results and measures the overlap in phase space between alchemical states (λ index). If the off diagonal element of this matrix are less then 0.03 then this can serve as a warning to the user if their states are too far apart. Please see the work of `Mey et al. `_ for more details on this analysis. Panel b) pertains to TI results and shows gradient of the potential with respect to the alchemical controlling parameter (λ) rapid changes in this gradient warn the user that the results may be inaccurate. As an example see the following figure: .. image:: _static/images/TI_grad.jpeg :align: center :width: 600 :alt: Alternative text Panel a) show the analysis for a calculation where the gradient of 'λ sterics disappear' is changing too quickly in states 12 - 13. This will lead to inaccuracies in numerical integration used in the TI method. Panel b) shows the results from a simulation of the same chemical system with modified alchemical settings which soften how the Van der Waals forces are removed. This `softening `_ is a common in alchemical methods and default in our implementation of ``TIES``, please see our ``TIES`` `paper `_ for more details.