Initial preparation

The BAC Builder script requires the input PDB structure to follow a particular format.

All chains to be included in the final model must have coordinates present (BIOMT records, etc. will not be applied).
Each residue and atom must have unambiguous coordinates (i.e. no alternative locations or residues).
All hydrogen atoms should be removed.
Chains must contain no gaps, and residues must be numbered sequentially.
Each chain must have a unique ID and end with a TER card.
Chains must be present in a particular order: protein, ligand, solvent.
All residues included must be compatible with the Amber forcefield. In the case of the ligand, this means that the included version of the coordinates should be those generated during parameterization (to ensure atom names are consistent).
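Several of the requirements above can be checked mechanically before running BAC Builder. The following is a minimal sketch, assuming a fixed-column PDB file with the element field (columns 77-78) populated. The function names are illustrative and are not part of BAC Builder or AmberTools.

```shell
# Sketch only: helper names are hypothetical, not part of BAC Builder.

# Count ATOM/HETATM records with a non-blank alternate-location flag (column 17).
count_altloc() {
    awk '/^(ATOM|HETATM)/ {
        alt = substr($0, 17, 1)
        if (alt != " " && alt != "") n++
    } END { print n + 0 }' "$1"
}

# Count ATOM/HETATM records whose element field (columns 77-78) is H.
count_hydrogens() {
    awk '/^(ATOM|HETATM)/ {
        elem = substr($0, 77, 2); gsub(/ /, "", elem)
        if (elem == "H") n++
    } END { print n + 0 }' "$1"
}

# Count TER cards.
count_ter() {
    awk '/^TER/ { n++ } END { print n + 0 }' "$1"
}

# Report jumps in residue numbering within a chain (ATOM records only).
find_gaps() {
    awk '/^ATOM/ {
        chain = substr($0, 22, 1); res = substr($0, 23, 4) + 0
        if (seen && chain == prev_chain && res > prev_res + 1)
            printf "chain %s: gap between %d and %d\n", chain, prev_res, res
        prev_chain = chain; prev_res = res; seen = 1
    }' "$1"
}
```

For a file that is ready for BAC Builder, count_altloc and count_hydrogens should both print 0, count_ter should match the number of chains, and find_gaps should print nothing. A jump in numbering does not always mean residues are missing, so any reported gap should still be inspected by eye.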

In addition, any ligands will need to be parameterized before they can be simulated. In this section we will prepare the protein and solvent elements of the PDB for incorporation in the starting model and the ligands for parameterization.

Models for preparation in BAC can come from many sources but a common scenario is that the starting point is a PDB containing coordinates for all components of the system. In this section we detail the steps necessary to create a BAC input PDB from such a model. The PDB 4BJX is used as an example to illustrate the general process (it can be downloaded from the link for use when following along).

In this example we assume that you begin with a PDB containing all elements of the system. Where multiple ligands are to be added to the same protein receptor, the protein structure only needs to be prepared once; for each further ligand you can skip to the ligand preparation section to create input for parameterization.

Protein model

The example structure contains a protein, ligand and solvent molecules (see picture below). The first step is to separate the protein chains and ensure they are ready for incorporation in the final model.

4BJX structure

Before proceeding, the protein should be checked to ensure that no residues are missing from the chains. If using a file directly from the Protein Data Bank, the header provides information on missing residues (in the REMARK 465 records). If not, you will have to check manually (using a program such as VMD).
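For files taken directly from the Protein Data Bank, missing residues are listed in the REMARK 465 records of the header, so a quick check can be made from the command line. The sketch below wraps this in a small function; the function name is illustrative only.

```shell
# Sketch only: print the REMARK 465 (missing residues) records of a PDB file.
# Prints nothing if the header reports no missing residues
# (or if the header has been stripped from the file).
list_missing_residues() {
    grep '^REMARK 465' "$1"
}
```

For example, list_missing_residues 4bjx.pdb prints the explanatory header lines followed by one line per missing residue. No output means either that nothing is missing or that the file has no header, in which case a manual check is still needed.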

The protein model must be extracted from the PDB. Simple ways of achieving this include the use of protein selections within viewers like VMD, or a text editor. In the case of 4BJX the protein residues are the only ones provided as ATOM records, meaning that a grep command (in Linux) can be used to write the protein residues alone to a separate PDB:

grep '^ATOM' 4bjx.pdb > 4bjx-protein.pdb

An example of the protein only file can be downloaded from here. To remove alternative conformers for residues and get a report on any non-standard residues use pdb4amber, part of the AmberTools package:

pdb4amber -i 4bjx-protein.pdb -o 4bjx-protein-stripped.pdb --nohyd

This will save the updated coordinates in 4bjx-protein-stripped.pdb and provide a short report to the console about the residues edited. The --nohyd flag removes hydrogens; you can leave this off if you are confident that the atoms present are correctly named for use in Amber. An example of the stripped protein file can be downloaded from here.

The END line at the end of the file should be replaced with TER (using either a text editor or sed). For proteins containing multiple chains, a TER line should be inserted between each pair of consecutive chains.
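The TER handling described above can also be scripted. The following is a minimal sketch, assuming that chains are distinguished by the chain ID in column 22 of the ATOM records. The function name is illustrative, and the sketch avoids writing two TER lines in a row if the file already contains one before END.

```shell
# Sketch only: add_ter FILE writes a copy of FILE to standard output in which
#   - a bare END line becomes TER (or is dropped if the previous line is TER)
#   - a TER line is inserted wherever the chain ID changes between ATOM records
add_ter() {
    awk '
    /^END[ \t]*$/ {
        if (last != "TER") print "TER"
        last = "TER"
        next
    }
    /^ATOM/ {
        chain = substr($0, 22, 1)
        if (seen && chain != prev_chain && last != "TER") print "TER"
        prev_chain = chain; seen = 1
    }
    { print; last = substr($0, 1, 3) }
    ' "$1"
}
```

For example, add_ter 4bjx-protein-stripped.pdb > 4bjx-protein-ter.pdb (the output file name here is arbitrary). Check the result in a text editor or viewer before using it.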

Non-standard residues

Non-standard residues (such as those incorporating post-translational modifications) cannot usually be incorporated in BAC models. However, if you have Amber frcmod and lib files for them, these can be incorporated in the system description.

Biological units

BAC Builder does not apply any transforms (such as BIOMT records), so every chain of the biological unit must already be present as explicit coordinates before processing of the PDB begins.
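In files from the Protein Data Bank, the BIOMT transforms are stored in the REMARK 350 records of the header. The sketch below prints them so that you can see whether anything other than the identity operator is listed; the function name is illustrative only.

```shell
# Sketch only: print the BIOMT1/BIOMT2/BIOMT3 rows from the REMARK 350 records.
# Fields are split on whitespace, so the exact spacing of the header is not assumed.
list_biomt() {
    awk '$1 == "REMARK" && $2 == "350" && $3 ~ /^BIOMT[123]$/' "$1"
}
```

For example, list_biomt 4bjx.pdb prints three rows per operator. If an operator other than the identity matrix with zero translation appears, the corresponding chains have to be generated with another tool before the file is passed to BAC Builder.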

Disulphide bonds

A file with the suffix '_sslink', detailing the disulphide bonds to be made, should be produced by pdb4amber. Retain this file as it will be included in the final system description.

Solvent molecules

Any solvent molecules you wish to be retained in the final model must be extracted from the PDB. Again, this can be done using a variety of methods (VMD, text editor, etc.). BAC Builder can only parse a small number of solvent species:

Solvent molecule	Residue Name
Water	HOH
Magnesium	MG
Zinc	ZN
Chlorine	CL
Sodium	NA

The solvent can be extracted from 4BJX using:

grep -E " HOH | MG | ZN | CL | NA " 4bjx.pdb | grep -E "^(ATOM|HETATM)" > 4bjx-solvent.pdb

This grep command matches the pattern anywhere in the line, so it can also pick up ligand atoms that are named 'CL' or have 'CL' in the element column. 4BJX is an example of this: delete the top line of 4bjx-solvent.pdb, which is the chlorine atom from the ligand. An example of the solvent-only file can be downloaded from here.
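One way to avoid matching ligand atoms is to test only the residue name field (columns 18-20 of the PDB record) rather than the whole line. The following is a minimal sketch of that approach, assuming a fixed-column PDB file. The function name is illustrative, and the residue names are those listed in the table above.

```shell
# Sketch only: print ATOM/HETATM records whose residue name (columns 18-20)
# is one of the solvent species BAC Builder can parse.
extract_solvent() {
    awk '/^(ATOM|HETATM)/ {
        res = substr($0, 18, 3); gsub(/ /, "", res)
        if (res == "HOH" || res == "MG" || res == "ZN" || res == "CL" || res == "NA")
            print
    }' "$1"
}
```

For example, extract_solvent 4bjx.pdb > 4bjx-solvent.pdb. Because atom names and the element column are ignored, a chlorine atom belonging to the ligand is not selected.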

Prepare ligand for processing

In order to parameterize the ligand we need to have a separate PDB containing only ligand atoms. Create this PDB using the same tools as for the protein and solvent atoms.

The ligand can be extracted from 4BJX using:

grep "73B" 4bjx.pdb | grep HETATM > 4bjx-ligand.pdb

An example of the ligand only file can be downloaded from here. In the next section we will see how to create a parameterization for the ligand.
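If a structure contains more than one copy of the ligand, the grep command above places all of them in the same file. As a quick check before parameterization (which we assume here is carried out on a single copy), the sketch below lists each distinct HETATM residue together with its atom count. The function name is illustrative only.

```shell
# Sketch only: for each distinct HETATM residue, print
#   residue name, chain ID, residue number, number of atoms
# in the order the residues first appear in the file.
list_het_residues() {
    awk '/^HETATM/ {
        res = substr($0, 18, 3); chain = substr($0, 22, 1)
        num = substr($0, 23, 4) + 0
        key = res " " chain " " num
        if (!(key in count)) order[++n] = key
        count[key]++
    }
    END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }' "$1"
}
```

For example, list_het_residues 4bjx-ligand.pdb should print a single line if the file holds one copy of the ligand. If several lines are printed, remove the unwanted copies before moving on to parameterization.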