topology_superimposer

The main module responsible for the superimposition.

Classes:

AtomPair –

An atom pair for networkx.
SuperimposedTopology –

SuperimposedTopology contains in the minimal case two sets of nodes S1 and S2, which are paired together and represent a strongly connected component.

Functions:

get_largest –

return a list of largest solutions
long_merge –

Carry out a merge and apply all checks.
merge_compatible_suptops –

Imagine mapping of two carbons C1 and C2 to another pair of carbons C1' and C2'.
superimpose_topologies –

The main function that manages the entire process.
is_mirror_of_one –

"Mirror" in the sense that it is an alternative topological way to traverse the molecule.
generate_nxg_from_list –

Helper function. Generates a graph from a list of atoms
get_starting_configurations –

Minimise the number of starting configurations to optimise the process speed.
get_atoms_bonds_from_file –

Use Parmed to load the files.
assign_coords_from_pdb –

Match the atoms from the ParmEd object based on a .pdb file

AtomPair

AtomPair(left_node, right_node)

An atom pair for networkx.

Source code in ties/topology_superimposer.py

def __init__(self, left_node, right_node):
    self.left_atom = left_node
    self.right_atom = right_node
    # generate the hash value for this match
    self.hash_value = self._gen_hash()

SuperimposedTopology

SuperimposedTopology(topology1=None, topology2=None, parmed_ligA=None, parmed_ligZ=None)

SuperimposedTopology contains in the minimal case two sets of nodes S1 and S2, which are paired together and represent a strongly connected component.

However, it can also represent the symmetrical versions that were superimposed.

Methods:

mcs_score –

Return a ratio of the superimposed atoms to the number of all atoms.
write_metadata –

Writes a .json file with a summary of which atoms are classified as appearing or disappearing, as well as all other metadata relevant to this superimposition/hybrid.
write_pdb –

param filename: name or a filepath of the new file. If None, standard preconfigured pattern will be used.
write_mol2 –

param filename: str location where the .mol2 file should be saved.
get_single_topology_region –

Return: matched atoms (even if they were unmatched for any reason)
get_single_topology_app –

fixme - called app but gives both app and dis
ringring –

Rings can only be matched to rings.
is_or_was_matched –

A helper function. For whatever reasons atoms get discarded.
get_unmatched_atoms –

Find the atoms in both topologies which were unmatched and return them.
get_unique_atom_count –

Requires that .assign_atoms_ids() has been called.
align_ligands_using_mcs –

Align the two ligands using the MCS (Maximum Common Substructure).
alchemical_overlap_check –

Calculate how well the alchemical regions overlap using distances between them.
rm_matched_pairs_with_different_bonds –

Scan the matched pairs. Assume you have three pairs
get_dual_topology_bonds –

Get the bonds between all the atoms.
largest_cc_survives –

CC - Connected Component.
assign_atoms_ids –

Assign an ID to each pair A1-B1. This means that if we request an atom ID
get_appearing_atoms –

fixme - should check first if atomName is unique
get_disappearing_atoms –

fixme - should check first if atomName is unique
remove_lonely_hydrogens –

You could also remove the hydrogens when you correct charges.
match_gaff2_nondirectional_bonds –

If needed, swap cc-cd with cd-cc.
get_net_charge –

Calculate the net charge difference across
get_matched_with_diff_q –

Returns a list of matched atom pairs that have a different q,
apply_net_charge_filter –

Averaging the charges across paired atoms introduced inequalities.
remove_attached_hydrogens –

The node_pair to which these hydrogens are attached was removed.
find_lowest_rmsd_mirror –

Walk through the different mirrors and out of all options select the one
is_subgraph_of_global_top –

Check if after superimposition, one graph is a subgraph of another
rmsd –

For each pair take the distance, and then get rmsd, so root(mean(square(deviation)))
link_pairs –

This helps take care of the bonds.
find_mirror_choices –

For each pair (A1, B1) find all the other options in the mirrors where (A1, B2)
add_alternative_mapping –

This means that there is another way to traverse and overlap the two molecules,
correct_for_coordinates –

Use the coordinates of the atoms, to figure out which symmetries are the correct ones.
is_area_overlapping_fully –

Each atom in one set has to be matched to an atom in the second set. And vice versa.
is_area_overlapping –

Even a small overlap will return True.
enforce_no_partial_rings –

Ensure that rings are either fully matched,
get_topology_similarity_score –

Having the superimposed A(Left) and B(Right), score the match.
unmatch_pairs_with_different_charges –

Removes the matched pairs where atom charges are more different
is_consistent_with –

Conditions:
get_circles –

Return circles found in the matched pairs.
get_original_circles –

Return the original circles present in the input topologies.
cycle_spans_multiple_cycles –

What if the circle is shared?
merge –

Absorb the other suptop by adding all the node pairs that are not present
validate_charges –

Check the original charges:
redistribute_charges –

After the match is made and the user commits to the superimposed topology,
contains_same_atoms_symmetric –

The atoms can be paired differently, but they are the same.
is_subgraph_of –

Checks if this superimposed topology is a subgraph of another superimposed topology.
subgraph_relationship –

Return
is_mirror_of –

this is a naive check
eq –

Check if the superimposed topology is "the same". This means that every pair has a corresponding pair in the
toJSON –

"

Source code in ties/topology_superimposer.py

def __init__(
    self, topology1=None, topology2=None, parmed_ligA=None, parmed_ligZ=None
):
    self.set_parmeds(parmed_ligA, parmed_ligZ)

    """
    @superimposed_nodes : a set of pairs of nodes that matched together
    """
    matched_pairs = []

    # TEST: with the list of matching nodes, check if each node was used only once,
    # the number of unique nodes should be equivalent to 2*len(common_pairs)
    all_matched_nodes = []
    [all_matched_nodes.extend(list(pair)) for pair in matched_pairs]
    assert len(matched_pairs) * 2 == len(all_matched_nodes)

    # fixme don't allow for initiating with matching pairs, it's not used anyway

    # todo convert to nx? some other graph theory package?
    matched_pairs.sort(key=lambda pair: pair[0].name)
    self.matched_pairs = matched_pairs
    self.top1 = topology1
    self.top2 = topology2
    # create graph representation for both in networkx library, initially to track the number of cycles
    # fixme

    self.mirrors = []
    self.alternative_mappings = []
    # this is a set of all nodes rather than their pairs
    self.nodes = set(all_matched_nodes)
    self.nodes_added_log = []

    self.internal_ids = None
    self.unique_atom_count = 0
    self.matched_pairs_bonds = {}

    # options
    # Ambertools ignores the bonds when creating the .prmtop from the hybrid.mol2 file,
    # so for now we can ignore the bond types
    self.ignore_bond_types = True

    # removed because
    # fixme - make this into a list
    self._removed_pairs_with_charge_difference = []  # atom-atom charge decided by qtol
    self._removed_because_disjointed_cc = []  # disjointed segment
    self._removed_due_to_net_charge = []
    self._removed_because_unmatched_rings = []
    self._removed_because_diff_bonds = []  # the atoms pair uses a different bond

    # save the cycles in the left and right molecules
    if self.top1 is not None and self.top2 is not None:
        self._init_nonoverlapping_cycles()

    self.id = SuperimposedTopology.COUNTER
    SuperimposedTopology.COUNTER += 1

mcs_score

mcs_score()

Return a ratio of the superimposed atoms to the number of all atoms. Specifically, (superimposed_atoms_number * 2) / (atoms_number_ligandA + atoms_number_ligandB).

Source code in ties/topology_superimposer.py

def mcs_score(self):
    """
    Return a ratio of the superimposed atoms to the number of all atoms.
    Specifically, (superimposed_atoms_number * 2) / (atoms_number_ligandA + atoms_number_ligandB)
    :return:
    """
    return (len(self.matched_pairs) * 2) / (len(self.top1) + len(self.top2))
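
The ratio can be checked by hand with a standalone sketch. The function below is not part of the library; its name and arguments are illustrative only, and it simply restates the formula from the docstring using plain counts instead of the topology objects.

```python
def mcs_score(n_matched_pairs: int, n_atoms_a: int, n_atoms_b: int) -> float:
    """Fraction of atoms, across both ligands, that belong to a matched pair."""
    # each matched pair accounts for one atom in each ligand, hence the factor of 2
    return (n_matched_pairs * 2) / (n_atoms_a + n_atoms_b)


# hypothetical sizes: 10 matched pairs, ligands of 12 and 14 atoms
score = mcs_score(10, 12, 14)
print(f"{score:.3f}")  # prints 0.769
```

A score of 1.0 means every atom in both ligands is part of a matched pair.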

write_metadata

write_metadata(filename=None)

Writes a .json file with a summary of which atoms are classified as appearing or disappearing, as well as all other metadata relevant to this superimposition/hybrid.

TODO add information:

- config class in general
  - relative paths to ligand 1, ligand 2 (the latest copies, i.e. renamed etc.)
  - general settings used
- pair? bonds? these can be restructured, so not necessary?

param filename: a location where the metadata should be saved

Source code in ties/topology_superimposer.py

def write_metadata(self, filename=None):
    """
    Writes a .json file with a summary of which atoms are classified as appearing, disappearing
    as well as all other metadata relevant to this superimposition/hybrid.
    TODO add information:
     - config class in general
     -- relative paths to ligand 1, ligand 2 (the latest copies, ie renamed etc)
     -- general settings used
     - pair? bonds? these can be restructured, so not necessary?

        param filename: a location where the metadata should be saved
    """

    # store at the root for now
    # fixme - should either be created or generated API
    if filename is None:
        matching_json = (
            self.config.workdir
            / f"meta_{self.morph.ligA.internal_name}_{self.morph.ligB.internal_name}.json"
        )
    else:
        matching_json = pathlib.Path(filename)

    matching_json.parent.mkdir(parents=True, exist_ok=True)

    json.dump(self.toJSON(), open(matching_json, "w"))
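
The path handling above (resolve the target, create missing parent directories, then dump JSON) can be sketched on its own. This is a minimal illustration under stated assumptions, not the library's code: `write_json_summary` and the example dictionary are hypothetical, and the real content would come from `toJSON()`. The sketch uses a context manager so that the file handle is closed once writing finishes.

```python
import json
import pathlib
import tempfile


def write_json_summary(summary: dict, filename) -> pathlib.Path:
    """Write `summary` as JSON to `filename`, creating parent directories."""
    path = pathlib.Path(filename)
    path.parent.mkdir(parents=True, exist_ok=True)
    # the context manager closes the file even if serialisation fails
    with open(path, "w") as fout:
        json.dump(summary, fout)
    return path


# hypothetical metadata; the real content comes from toJSON()
summary = {"appearing": ["C9"], "disappearing": ["H3"]}
with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "meta" / "meta_l1_l2.json"
    written = write_json_summary(summary, target)
    # read the file back to confirm the round trip
    restored = json.loads(written.read_text())
```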

write_pdb

write_pdb(filename=None)

param filename: name or a filepath of the new file. If None, standard preconfigured pattern will be used.

Source code in ties/topology_superimposer.py

def write_pdb(self, filename=None):
    """
    param filename: name or a filepath of the new file. If None, standard preconfigured pattern will be used.
    """
    if filename is None:
        morph_pdb_path = (
            self.config.workdir
            / f"{self.morph.ligA.internal_name}_{self.morph.ligB.internal_name}_morph.pdb"
        )
    else:
        morph_pdb_path = filename

    # def write_morph_top_pdb(filepath, mda_l1, mda_l2, suptop, hybrid_single_dual_top=False):
    if self.config.use_hybrid_single_dual_top:
        # the NAMD hybrid single dual topology
        # rename the ligand on the left to INI
        # and the ligand on the right to END

        # first, set all the matched pairs to -2 and 2 (single topology)
        # regardless of how they were mismatched
        raise NotImplementedError(
            "Cannot yet write hybrid single dual topology .pdb file"
        )

        # then, set the different atoms to -1 and 1 (dual topology)

        # save in a single PDB file
        # Note that the atoms from left to right
        # in the single topology region have to
        # be separated by the same number
        # fixme - make a check for that
        return
    # fixme - find another library that can handle writing to a PDB file, MDAnalysis
    # save the ligand with all the appropriate atomic positions, write it using the pdb format
    # pdb file format: http://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM
    # write a dual .pdb file
    with open(morph_pdb_path, "w") as FOUT:
        for atom in self.parmed_ligA.atoms:
            """
            There is only one forcefield which is shared across the two topologies.
            Basically, we need to check whether the atom is in both topologies.
            If that is the case, then the atom should have the same name, and therefore appear only once.
            However, if there is a new atom, it should be specifically outlined
            that it is 1) new and 2) the right type
            """
            # write all the atoms if they are matched, that's the common part
            # note that ParmEd does not have the information on a residue ID
            REMAINS = 0
            if self.contains_left_atom(atom.idx):
                line = (
                    f"ATOM  {atom.idx:>5d} {atom.name:>4s} {atom.residue.name:>3s}  "
                    f"{1:>4d}    "
                    f"{atom.xx:>8.3f}{atom.xy:>8.3f}{atom.xz:>8.3f}"
                    f"{1.0:>6.2f}{REMAINS:>6.2f}" + (" " * 11) + "  " + "  " + "\n"
                )
                FOUT.write(line)
            else:
                # this atom was not found, this means it disappears, so we should update the
                DISAPPEARING_ATOM = -1.0
                line = (
                    f"ATOM  {atom.idx:>5d} {atom.name:>4s} {atom.residue.name:>3s}  "
                    f"{1:>4d}    "
                    f"{atom.xx:>8.3f}{atom.xy:>8.3f}{atom.xz:>8.3f}"
                    f"{1.0:>6.2f}{DISAPPEARING_ATOM:>6.2f}"
                    + (" " * 11)
                    + "  "
                    + "  "
                    + "\n"
                )
                FOUT.write(line)
        # add atoms from the right topology,
        # which are going to be created
        for atom in self.parmed_ligZ.atoms:
            if not self.contains_right_atom(atom.idx):
                APPEARING_ATOM = 1.0
                line = (
                    f"ATOM  {atom.idx:>5d} {atom.name:>4s} {atom.residue.name:>3s}  "
                    f"{1:>4d}    "
                    f"{atom.xx:>8.3f}{atom.xy:>8.3f}{atom.xz:>8.3f}"
                    f"{1.0:>6.2f}{APPEARING_ATOM:>6.2f}"
                    + (" " * 11)
                    + "  "
                    + "  "
                    + "\n"
                )
                FOUT.write(line)
    self.pdb = morph_pdb_path
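
The three branches above differ only in the value written to the last numeric column: 0 for matched atoms, -1.0 for disappearing atoms and 1.0 for appearing atoms. The sketch below isolates that line format so the column layout can be inspected. `format_atom_line` is a hypothetical helper, not a library function, and the trailing padding that the source appends is left out.

```python
def format_atom_line(idx, name, resname, xyz, flag):
    """Format one dual-topology ATOM record; `flag` goes into the B-factor column."""
    x, y, z = xyz
    return (
        f"ATOM  {idx:>5d} {name:>4s} {resname:>3s}  "
        f"{1:>4d}    "
        f"{x:>8.3f}{y:>8.3f}{z:>8.3f}"
        f"{1.0:>6.2f}{flag:>6.2f}"
    )


# flag convention taken from the source above
MATCHED, DISAPPEARING, APPEARING = 0, -1.0, 1.0

line = format_atom_line(1, "C1", "HYB", (1.0, -2.5, 3.25), DISAPPEARING)
# columns 61-66 of the PDB ATOM record hold the flag
print(repr(line[60:66]))  # prints ' -1.00'
```

Reading the flag back from columns 61-66 is then enough to classify an atom in the written file.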

write_mol2

write_mol2(filename=None, use_left_charges=True, use_left_coords=True)

param filename: str location where the .mol2 file should be saved.

Source code in ties/topology_superimposer.py

def write_mol2(self, filename=None, use_left_charges=True, use_left_coords=True):
    """
    param filename: str location where the .mol2 file should be saved.
    """
    if filename is None:
        hybrid_mol2 = (
            self.config.workdir
            / f"{self.morph.ligA.internal_name}_{self.morph.ligB.internal_name}_morph.mol2"
        )
    else:
        hybrid_mol2 = filename

    # fixme - make this as a method of suptop as well
    # recreate the mol2 file that is merged and contains the correct atoms from both
    # mol2 format: http://chemyang.ccnu.edu.cn/ccb/server/AIMMS/mol2.pdf
    # fixme - build this molecule using the MDAnalysis builder instead of the current approach
    # however, MDAnalysis currently cannot convert pdb into mol2? ...
    # where the formatting is done manually
    with open(hybrid_mol2, "w") as FOUT:
        bonds = self.get_dual_topology_bonds()

        FOUT.write("@<TRIPOS>MOLECULE " + os.linesep)
        # name of the molecule
        FOUT.write("HYB " + os.linesep)
        # num_atoms [num_bonds [num_subst [num_feat [num_sets]]]]
        # fixme this is tricky
        FOUT.write(f"{self.get_unique_atom_count():d} {len(bonds):d}" + os.linesep)
        # mole type
        FOUT.write("SMALL " + os.linesep)
        # charge_type
        FOUT.write("NO_CHARGES " + os.linesep)
        FOUT.write(os.linesep)

        # write the atoms
        FOUT.write("@<TRIPOS>ATOM " + os.linesep)
        # atom_id atom_name x y z atom_type [subst_id [subst_name [charge [status_bit]]]]
        # e.g.
        #       1 O4           3.6010   -50.1310     7.2170 o          1 L39      -0.815300

        # so from the two topologies all the atoms are needed and they need to have a different atom_id
        # so we might need to name the atom_id for them, other details are however pretty much the same
        # the importance of atom_name is difficult to estimate

        # we are going to assign IDs in the superimposed topology in order to track which atoms have IDs
        # and which don't

        # fixme - for writing, modify things to achieve the desired output
        # note - we are modifying in place our atoms
        for left, right in self.matched_pairs:
            logger.debug(
                f"Aligned {left.original_name} id {left.id} with {right.original_name} id {right.id}"
            )
            if not use_left_charges:
                left.charge = right.charge
            if not use_left_coords:
                left.position = right.position

        subst_id = 1  # resid basically
        # write all the atoms that were matched first with their IDs
        # prepare all the atoms, note that we use primarily the left ligand naming
        all_atoms = [
            left for left, right in self.matched_pairs
        ] + self.get_unmatched_atoms()

        # reorder the list according to the ID
        all_atoms.sort(key=lambda atom: self.get_generated_atom_id(atom))

        resname = "HYB"
        for atom in all_atoms:
            FOUT.write(
                f"{self.get_generated_atom_id(atom)} {atom.name} "
                f"{atom.position[0]:.4f} {atom.position[1]:.4f} {atom.position[2]:.4f} "
                f"{atom.type.lower()} {subst_id} {resname} {atom.charge:.6f} {os.linesep}"
            )

        FOUT.write(os.linesep)

        # write bonds
        FOUT.write("@<TRIPOS>BOND " + os.linesep)

        # we have to list every bond:
        # 1) all the bonds between the paired atoms, so that part is easy
        # 2) bonds which link the disappearing atoms, and their connection to the paired atoms
        # 3) bonds which link the appearing atoms, and their connections to the paired atoms

        bond_counter = 1
        list(bonds)
        for bond_from_id, bond_to_id, bond_type in sorted(
            list(bonds), key=lambda b: b[:2]
        ):
            # Bond Line Format:
            # bond_id origin_atom_id target_atom_id bond_type [status_bits]
            FOUT.write(
                f"{bond_counter} {bond_from_id} {bond_to_id} {bond_type}"
                + os.linesep
            )
            bond_counter += 1

    self.mol2 = hybrid_mol2
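
The bond section is the simplest part of the writer to reason about in isolation: bonds are sorted by their two atom IDs and numbered from 1. The sketch below reproduces only that step. `format_bond_records` and the example bonds are hypothetical, and the tuples are assumed to have the `(from_id, to_id, bond_type)` shape that the loop above unpacks.

```python
def format_bond_records(bonds):
    """Return mol2 BOND lines for a set of (from_id, to_id, bond_type) tuples."""
    lines = []
    # sort on the two atom IDs only, so the output order is deterministic
    ordered = sorted(bonds, key=lambda bond: bond[:2])
    for bond_id, (from_id, to_id, bond_type) in enumerate(ordered, start=1):
        # bond_id origin_atom_id target_atom_id bond_type
        lines.append(f"{bond_id} {from_id} {to_id} {bond_type}")
    return lines


# hypothetical bonds, given as an unordered set
bonds = {(2, 3, "ar"), (1, 2, "1"), (1, 4, "2")}
records = format_bond_records(bonds)
print(records)  # prints ['1 1 2 1', '2 1 4 2', '3 2 3 ar']
```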

get_single_topology_region

get_single_topology_region()

Return: matched atoms (even if they were unmatched for any reason)

Source code in ties/topology_superimposer.py

def get_single_topology_region(self):
    """
    Return: matched atoms (even if they were unmatched for any reason)
    """
    # strip the pairs of the exact information about the charge differences
    removed_pairs_with_charge_difference = [
        (n1, n2) for (n1, n2), q_diff in self._removed_pairs_with_charge_difference
    ]

    # fixme: this should not work with disjointed cc and others?
    unpaired = (
        self._removed_because_disjointed_cc
        + self._removed_due_to_net_charge
        + removed_pairs_with_charge_difference
    )

    return self.matched_pairs + unpaired

get_single_topology_app

get_single_topology_app()

fixme - called app but gives both app and dis. Get the appearing and disappearing regions in the hybrid single topology: use the single topology region and classify all other atoms not in it as either appearing or disappearing.

Source code in ties/topology_superimposer.py

def get_single_topology_app(self):
    """
    fixme - called app but gives both app and dis
    get the appearing and disappearing region in the hybrid single topology
    use the single topology region and classify all other atoms not in it
    as either appearing or disappearing
    """
    single_top_area = self.get_single_topology_region()

    # turn it into a set
    single_top_set = set()
    for left, right in single_top_area:
        single_top_set.add(left)
        single_top_set.add(right)

    # these unmatched atoms could be due to charge etc.
    # so they historically refer to the dual-topology
    unmatched_app = self.get_appearing_atoms()
    app = {a for a in unmatched_app if a not in single_top_set}
    unmatched_dis = self.get_disappearing_atoms()
    dis = {a for a in unmatched_dis if a not in single_top_set}

    return app, dis
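
The filtering step can be illustrated with plain strings standing in for atom objects. This is a sketch under that assumption. `split_alchemical_atoms` is a hypothetical name, and in the library the inputs would come from `get_single_topology_region()`, `get_appearing_atoms()` and `get_disappearing_atoms()`.

```python
def split_alchemical_atoms(single_top_pairs, appearing, disappearing):
    """Drop atoms that belong to the single-topology region from both lists."""
    single_top_atoms = set()
    for left, right in single_top_pairs:
        single_top_atoms.add(left)
        single_top_atoms.add(right)

    app = {a for a in appearing if a not in single_top_atoms}
    dis = {a for a in disappearing if a not in single_top_atoms}
    return app, dis


# hypothetical case: the pair (C2, C12) was unmatched because of its charges,
# so it is still listed as disappearing/appearing in the dual topology
pairs = [("C1", "C11"), ("C2", "C12")]
app, dis = split_alchemical_atoms(pairs, ["C12", "O13"], ["C2", "H3"])
print(app, dis)  # prints {'O13'} {'H3'}
```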

ringring

ringring()

Rings can only be matched to rings.

Source code in ties/topology_superimposer.py

def ringring(self):
    """
    Rings can only be matched to rings.
    """
    l_circles, r_circles = self.get_original_circles()
    removed_h = []
    ringring_removed = []
    for left, right in self.matched_pairs[::-1]:
        if (left, right) in removed_h:
            continue

        l_ring = any([left in c for c in l_circles])
        r_ring = any([right in c for c in r_circles])
        if l_ring + r_ring == 1:
            removed_h.extend(self.remove_attached_hydrogens((left, right)))
            self.remove_node_pair((left, right))
            ringring_removed.append((left, right))

    if ringring_removed:
        logger.debug(
            f"(ST{self.id}) Ring only matches ring filter, removed: {ringring_removed} with hydrogens {removed_h}"
        )
    return ringring_removed, removed_h
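
The test `l_ring + r_ring == 1` above is true exactly when one atom of a pair sits in a ring and the other does not. The sketch below applies the same rule to plain strings. It is an illustration only: `ring_mismatched_pairs` is a hypothetical helper, rings are assumed to be given as sets of atom labels, and the removal of attached hydrogens performed by the real method is omitted.

```python
def ring_mismatched_pairs(matched_pairs, left_rings, right_rings):
    """Return the pairs where exactly one of the two atoms is a ring member."""
    mismatched = []
    for left, right in matched_pairs:
        left_in_ring = any(left in ring for ring in left_rings)
        right_in_ring = any(right in ring for ring in right_rings)
        # ring-to-ring and chain-to-chain are fine; ring-to-chain is not
        if left_in_ring != right_in_ring:
            mismatched.append((left, right))
    return mismatched


# hypothetical ligands: C7 is a chain atom on the left but a ring atom on the right
pairs = [("C1", "C1'"), ("C2", "C2'"), ("C7", "C7'")]
left_rings = [{"C1", "C2"}]
right_rings = [{"C1'", "C2'", "C7'"}]
mismatched = ring_mismatched_pairs(pairs, left_rings, right_rings)
print(mismatched)  # prints [('C7', "C7'")]
```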

is_or_was_matched

is_or_was_matched(atom_name1, atom_name2)

A helper function. Atoms can get discarded for various reasons, e.g. they had a different charge or were part of a disjointed component. This function simply checks whether the original match was made between the two atoms, which helps with verifying the original matching.

Source code in ties/topology_superimposer.py

def is_or_was_matched(self, atom_name1, atom_name2):
    """
    A helper function. For whatever reasons atoms get discarded.
    E.g. they had a different charge, or were part of the disjointed component, etc.
    This function simply checks if the most original match was made between the two atoms.
    It helps with verifying the original matching.
    """
    if self.contains_atom_name_pair(atom_name1, atom_name2):
        return True

    # check if it was unmatched
    unmatched_lists = [
        self._removed_because_disjointed_cc,
        # ignore the charges in this list
        [pair for pair, q in self._removed_due_to_net_charge],
        [pair for pair, q in self._removed_pairs_with_charge_difference],
    ]
    for unmatched_list in unmatched_lists:
        for atom1, atom2 in unmatched_list:
            if atom1.name == atom_name1 and atom2.name == atom_name2:
                return True

    return False

get_unmatched_atoms

get_unmatched_atoms()

Find the atoms in both topologies which were unmatched and return them. These include both appearing and disappearing atoms.

Note that some atoms were removed due to charges.

Source code in ties/topology_superimposer.py

def get_unmatched_atoms(self):
    """
    Find the atoms in both topologies which were unmatched and return them.
    These are both, appearing and disappearing.

    Note that some atoms were removed due to charges.
    """
    unmatched_atoms = []
    for node in self.top1:
        if not self.contains_node(node):
            unmatched_atoms.append(node)

    for node in self.top2:
        if not self.contains_node(node):
            unmatched_atoms.append(node)

    return unmatched_atoms

get_unique_atom_count

get_unique_atom_count()

Requires that .assign_atoms_ids() has been called. This should be rewritten, but basically it needs to count each matched pair as one atom, and the appearing and disappearing atoms separately.

Source code in ties/topology_superimposer.py

def get_unique_atom_count(self):
    """
    Requires that the .assign_atoms_ids() was called.
    This should be rewritten. But basically, it needs to count each matched pair as one atom,
    and the appearing and disappearing atoms separately.
    """
    return self.unique_atom_count

align_ligands_using_mcs

align_ligands_using_mcs(overwrite_original=False, use_disjointed=False)

Align the two ligands using the MCS (Maximum Common Substructure). The ligA here is the reference (docked) to which the ligZ is aligned.

:param overwrite_original: After aligning by MCS, update the internal coordinates which will be saved to a file at the end.

:type overwrite_original: bool

Source code in ties/topology_superimposer.py

def align_ligands_using_mcs(self, overwrite_original=False, use_disjointed=False):
    """
    Align the two ligands using the MCS (Maximum Common Substructure).
    The ligA here is the reference (docked) to which the ligZ is aligned.

    :param overwrite_original: After aligning by MCS, update the internal coordinates
        which will be saved to a file at the end.
    :type overwrite_original: bool
    """

    if self.mda_ligA is None or self.mda_ligB is None:
        # todo comment
        return self.rmsd()

    ligA = self.mda_ligA
    ligB = self.mda_ligB

    # back up
    ligA_original_positions = ligA.atoms.positions[:]
    ligB_original_positions = ligB.atoms.positions[:]

    left_disjointed_cc = []
    right_disjointed_cc = []

    if use_disjointed and self._removed_because_disjointed_cc:
        left_disjointed_cc = [
            left.id for left, _ in self._removed_because_disjointed_cc
        ]
        right_disjointed_cc = [
            right.id for _, right in self._removed_because_disjointed_cc
        ]

    # select the atoms for the MCS,
    # the following uses 0-based indexing
    mcs_ligA_ids = [
        left.id for left, right in self.matched_pairs
    ] + left_disjointed_cc
    mcs_ligB_ids = [
        right.id for left, right in self.matched_pairs
    ] + right_disjointed_cc

    ligA_fragment = ligA.atoms[mcs_ligA_ids]
    ligB_fragment = ligB.atoms[mcs_ligB_ids]

    # move all to the origin of the fragment
    ligA_mcs_centre = ligA_fragment.centroid()
    ligA.atoms.translate(-ligA_mcs_centre)
    ligB.atoms.translate(-ligB_fragment.centroid())

    rotation_matrix, rmsd = MDAnalysis.analysis.align.rotation_matrix(
        ligB_fragment.positions, ligA_fragment.positions
    )

    # apply the rotation to
    ligB.atoms.rotate(rotation_matrix)
    # move back to ligA
    ligB.atoms.translate(ligA_mcs_centre)

    # save the superimposed coordinates
    ligB_sup = self.mda_ligB.atoms.positions[:]

    # restore the MDAnalysis positions ("working copy")
    # in theory you do not need to do this every time
    self.mda_ligA.atoms.positions = ligA_original_positions
    self.mda_ligB.atoms.positions = ligB_original_positions

    if not overwrite_original:
        # return the RMSD of the superimposed matched pairs only
        return rmsd

    # use the aligned coordinates
    self.parmed_ligZ.coordinates = ligB_sup

    # ideally this would now be done with MDAnalysis which can now write .mol2
    # overwrite the internal atom positions with the final generated alignment
    for parmed_atom in self.parmed_ligZ.atoms:
        found = False
        for atom in self.top2:
            if parmed_atom.idx == atom.id:
                atom.position = parmed_atom.xx, parmed_atom.xy, parmed_atom.xz
                found = True
                break
        assert found

    return rmsd

alchemical_overlap_check

alchemical_overlap_check() -> tuple[float]

Calculate how well the alchemical regions overlap using distances between them.

For A (left) and B (right): for each atom in B, find the distance to the closest alchemical atom in A to get the B-A distances. Then apply RMS(B-A).

Do the same steps in reverse to get A-B.

For B-A:

0, B and A are the same size.

>0, B is growing

If both B-A and A-B are > 0, the alchemical regions are divergent.

This function takes the coordinates as they come.

:return: RMS(A-B), max(A-B), RMS(B-A), max(B-A)

Source code in ties/topology_superimposer.py

def alchemical_overlap_check(self) -> tuple[float]:
    """
    Calculate how well the alchemical regions overlap using distances between them.

    For A (left) and B (right). For each atom in B,
    find the distance to closest alchemical atom in A to get B-A distances.
    Then apply RMS(B-A).

    Do the same steps in reverse to get A-B.

    For B-A:
        0, B and A are the same size.
        >0, B is growing

    If both, B-A and A-B > 0, this means the alchemical regions are divergent.

    This function takes the coordinates as they come.

    :return: RMS(A-B), max(A-B), RMS(B-A), max(B-A)
    """

    # alchemical areas
    B_pos = np.array([a.position for a in self.get_appearing_atoms()])
    A_pos = np.array([a.position for a in self.get_disappearing_atoms()])

    if not B_pos.size or not A_pos.size:
        return 0, 0, 0, 0

    # shortest distances from B to any alchemical atom in A
    # B - A
    shortest_B_to_A = MDAnalysis.lib.distances.distance_array(B_pos, A_pos).min(
        axis=1
    )
    B_to_A_rmsd = np.sqrt(np.square(shortest_B_to_A).mean())

    # shortest distances from A to any alchemical atom in B
    # A - B
    shortest_A_to_B = MDAnalysis.lib.distances.distance_array(A_pos, B_pos).min(
        axis=1
    )
    A_to_B_rmsd = np.sqrt(np.square(shortest_A_to_B).mean())

    return A_to_B_rmsd, max(shortest_A_to_B), B_to_A_rmsd, max(shortest_B_to_A)
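
The same quantities can be reproduced with the standard library alone, which makes the asymmetry between A-B and B-A easy to see. This is a sketch rather than the library's implementation: it swaps `MDAnalysis.lib.distances.distance_array` for `math.dist`, and the helper names and coordinates are hypothetical.

```python
import math


def nearest_distances(source, target):
    """For each point in `source`, the distance to the closest point in `target`."""
    return [min(math.dist(s, t) for t in target) for s in source]


def rms(values):
    """Root mean square of a non-empty list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def overlap_check(a_pos, b_pos):
    """Return RMS(A-B), max(A-B), RMS(B-A), max(B-A) for two alchemical regions."""
    if not a_pos or not b_pos:
        return 0, 0, 0, 0
    a_to_b = nearest_distances(a_pos, b_pos)
    b_to_a = nearest_distances(b_pos, a_pos)
    return rms(a_to_b), max(a_to_b), rms(b_to_a), max(b_to_a)


# hypothetical coordinates: A has one disappearing atom, B has two appearing atoms
a_pos = [(0.0, 0.0, 0.0)]
b_pos = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
result = overlap_check(a_pos, b_pos)
```

Here A-B is 0 because the only atom in A coincides with an atom in B, while B-A is above 0 because B has an extra atom 5 units away. That is the "B is growing" case from the docstring.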

rm_matched_pairs_with_different_bonds

rm_matched_pairs_with_different_bonds()

Scan the matched pairs. Assume you have three pairs, A-B=C with the double bond on the right side, and the alternative bonds A=B-C. Remove all A, B and C pairs because of the different bonds. Remove them by finding that A-B is not A=B, and B=C is not B-C.

return: the list of removed pairs

Source code in ties/topology_superimposer.py

def rm_matched_pairs_with_different_bonds(self):
    """
    Scan the matched pairs. Assume you have three pairs
    A-B=C with the double bond on the right side,
    and the alternative bonds
    A=B-C remove all A, B and C pairs because of the different bonds
    Remove them by finding that A-B is not A=B, and B=C is not B-C

    return: the list of removed pairs
    """

    # extract the bonds for the matched molecules first
    removed_pairs = []
    for from_pair, bonded_pair_list in list(self.matched_pairs_bonds.items())[::-1]:
        for bonded_pair, bond_type in bonded_pair_list:
            # ignore if this combination was already checked
            if bonded_pair in removed_pairs and from_pair in removed_pairs:
                continue

            if bond_type[0] != bond_type[1]:
                # resolve this, remove the bonded pair from the matched atoms
                if from_pair not in removed_pairs:
                    self.remove_node_pair(from_pair)
                    removed_pairs.append(from_pair)
                if bonded_pair not in removed_pairs:
                    self.remove_node_pair(bonded_pair)
                    removed_pairs.append(bonded_pair)

                # keep the history
                self._removed_because_diff_bonds.append((from_pair, bonded_pair))

    return removed_pairs
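
The A-B=C versus A=B-C case from the docstring can be traced with a small dictionary. The sketch assumes the shape that the loop above implies for `matched_pairs_bonds`: each pair maps to a list of `(bonded_pair, (left_bond_type, right_bond_type))` entries. `pairs_with_different_bonds` is a hypothetical helper that only collects the affected pairs. Unlike the real method, it does not modify the topology or record the history, and it walks the dictionary in forward order.

```python
def pairs_with_different_bonds(matched_pairs_bonds):
    """Return the pairs joined by a bond whose type differs between the two ligands."""
    removed = []
    for from_pair, bonded in matched_pairs_bonds.items():
        for bonded_pair, (left_type, right_type) in bonded:
            if left_type == right_type:
                continue
            # both ends of the inconsistent bond are dropped, each only once
            for pair in (from_pair, bonded_pair):
                if pair not in removed:
                    removed.append(pair)
    return removed


# left ligand: A-B=C, right ligand: A=B-C (hypothetical pair labels)
A, B, C = ("A", "A'"), ("B", "B'"), ("C", "C'")
bonds = {
    A: [(B, ("1", "2"))],
    B: [(A, ("1", "2")), (C, ("2", "1"))],
    C: [(B, ("2", "1"))],
}
removed = pairs_with_different_bonds(bonds)
```

All three pairs end up in `removed`, because every bond in the fragment has a different order in the two ligands.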

get_dual_topology_bonds

get_dual_topology_bonds()

Get the bonds between all the atoms. Use the atom IDs for the bonds.

Source code in ties/topology_superimposer.py

def get_dual_topology_bonds(self):
    """
    Get the bonds between all the atoms.
    Use the atom IDs for the bonds.
    """
    assert self.top1 is not None and self.top2 is not None
    # fixme - check if the atoms IDs have been generated
    assert self.internal_ids is not None

    # extract the bonds for the matched molecules first
    bonds = set()
    for from_pair, bonded_pair_list in self.matched_pairs_bonds.items():
        from_pair_id = self.get_generated_atom_id(from_pair)
        for bonded_pair, bond_type in bonded_pair_list:
            if not self.ignore_bond_types:
                if bond_type[0] != bond_type[1]:
                    logger.error(
                        "ERROR: bond types do not match, even though they apply to the same atoms"
                    )
                    logger.error(
                        f'ERROR: left bond is "{bond_type[0]}" and right bond is "{bond_type[1]}"'
                    )
                    logger.error(f"ERROR: the bonded atoms are {bonded_pair}")
                    raise Exception(
                        "The bond types do not correspond to each other"
                    )
            # every bonded pair has to be in the topology
            assert bonded_pair in self.matched_pairs
            to_pair_id = self.get_generated_atom_id(bonded_pair)
            # before adding them to bonds, check if they are not already there
            bond_sorted = sorted([from_pair_id, to_pair_id])
            bond_sorted.append(bond_type[0])
            bonds.add(tuple(bond_sorted))

    # extract the bond information from the unmatched
    unmatched_atoms = self.get_unmatched_atoms()
    # for every atom, check to which "pair" the bond connects,
    # and use that pair's ID to make the link

    # several iterations of walking through the atoms,
    # this is to ensure that we remove each atom one by one
    # e.g. imagine this PAIR-SingleA1-SingleA2-SingleA3
    # so only the first SingleA1 is connected to a pair,
    # so the first iteration would take care of that,
    # the next iteration would connect SingleA2 to SingleA1, etc
    # first, remove the atoms that are connected to pairs
    for atom in unmatched_atoms:
        for bond in atom.bonds:
            unmatched_atom_id = self.get_generated_atom_id(atom)
            # check if the unmatched atom is bonded to any pair
            pair = self.find_pair_with_atom(bond.atom)
            if pair is not None:
                # this atom is bound to a pair, so add the bond to the pair
                pair_id = self.get_generated_atom_id(pair[0])
                # add the bond between the atom and the pair
                bond_sorted = sorted([unmatched_atom_id, pair_id])
                bond_sorted.append(bond.type)
                bonds.add(tuple(bond_sorted))
            else:
                # it is not directly linked to a matched pair,
                # simply add this missing bond to whatever atom it is bound
                another_unmatched_atom_id = self.get_generated_atom_id(bond.atom)
                bond_sorted = sorted([unmatched_atom_id, another_unmatched_atom_id])
                bond_sorted.append(bond.type)
                bonds.add(tuple(bond_sorted))

    # fixme - what about circles etc? these bonds
    # that form circles should probably be added while checking if the circles make sense etc
    # also, rather than checking if it is a circle, we could check if the new linked atom,
    # is in a pair to which the new pair refers (the same rule that is used currently)
    return bonds

largest_cc_survives

largest_cc_survives(verbose=True)

CC - Connected Component.

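Removes any disjoint components. Only the largest CC will be left. Ties between equally large CCs are broken by the number of heavy atoms, then by the number of rings, then by the number of heavy atoms in rings; if the CCs are still tied, the first one is kept.

How: Generates the graph where each pair is a single node, connecting the nodes if the bonds exist. Then uses networkx to find the CCs.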
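
The core step can be sketched without any dependencies. The method itself relies on networkx; the snippet below swaps in a plain breadth-first search so that it runs with the standard library only. The `connected_components` helper and the pair labels are hypothetical, and the tie-breaking rules on heavy atoms and rings are left out.

```python
from collections import deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as sets."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # breadth-first search from an unvisited node
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# each node stands for one matched pair
pairs = ["C1-C11", "C2-C12", "C3-C13", "H1-H11", "H2-H12"]
links = [("C1-C11", "C2-C12"), ("C2-C12", "C3-C13"), ("H1-H11", "H2-H12")]

components = connected_components(pairs, links)
largest = max(components, key=len)
removed = [c for c in components if c is not largest]
```

Here the three-carbon fragment survives and the disconnected hydrogen fragment ends up in `removed`.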

Source code in ties/topology_superimposer.py

def largest_cc_survives(self, verbose=True):
    """
    CC - Connected Component.

    Removes any disjoint components. Only the largest CC will be left.
    In the case of equal-size CCs, an arbitrary one is chosen.

    How:
    Generates the graph where each pair is a single node, connecting the nodes if the bonds exist.
    Then uses networkx to find the CCs.
    """

    if len(self) == 0:
        return self, []

    def lookup_up(pairs, tuple_pair):
        for pair in pairs:
            if pair.is_pair(tuple_pair):
                return pair

        raise Exception("Did not find the AtomPair")

    g = nx.Graph()
    atom_pairs = []
    for pair in self.matched_pairs:
        ap = AtomPair(pair[0], pair[1])
        atom_pairs.append(ap)
        g.add_node(ap)

    # connect the atom pairs
    for pair_from, pair_list in self.matched_pairs_bonds.items():
        # lookup the corresponding atom pairs
        ap_from = lookup_up(atom_pairs, pair_from)
        for tuple_pair, bond_type in pair_list:
            ap_to = lookup_up(atom_pairs, tuple_pair)
            g.add_edge(ap_from, ap_to)

    # check for connected components (CC)
    remove_ccs = []
    ccs = [g.subgraph(cc).copy() for cc in nx.connected_components(g)]
    largest_cc = max([len(cc) for cc in ccs])

    # there are disjoint fragments, remove the smaller one
    for cc in ccs[::-1]:
        # remove the cc if it is smaller than the largest component
        if len(cc) < largest_cc:
            remove_ccs.append(cc)
            ccs.remove(cc)

    # remove the cc that have a smaller number of heavy atoms
    largest_heavy_atom_cc = max(
        [len([p for p in cc.nodes() if p.is_heavy_atom()]) for cc in ccs]
    )
    for cc in ccs[::-1]:
        if len([p for p in cc if p.is_heavy_atom()]) < largest_heavy_atom_cc:
            if verbose:
                logger.debug("Found CC that had fewer heavy atoms. Removing. ")
            remove_ccs.append(cc)
            ccs.remove(cc)

    # remove the cc that has a smaller number of rings
    largest_cycle_num = max([len(nx.cycle_basis(cc)) for cc in ccs])
    for cc in ccs[::-1]:
        if len(nx.cycle_basis(cc)) < largest_cycle_num:
            if verbose:
                logger.debug("Found CC that had fewer cycles. Removing. ")
            remove_ccs.append(cc)
            ccs.remove(cc)

    # remove cc that has a smaller number of heavy atoms across rings
    most_heavy_atoms_in_cycles = 0
    for cc in ccs[::-1]:
        # count the heavy atoms across the cycles
        heavy_atom_counter = 0
        for cycle in nx.cycle_basis(cc):
            for a in cycle:
                if a.is_heavy_atom():
                    heavy_atom_counter += 1
        if heavy_atom_counter > most_heavy_atoms_in_cycles:
            most_heavy_atoms_in_cycles = heavy_atom_counter

    for cc in ccs[::-1]:
        # count the heavy atoms across the cycles
        heavy_atom_counter = 0
        for cycle in nx.cycle_basis(cc):
            for a in cycle:
                if a.is_heavy_atom():
                    heavy_atom_counter += 1

        if heavy_atom_counter < most_heavy_atoms_in_cycles:
            if verbose:
                logger.debug(
                    "Found CC that had fewer heavy atoms in cycles. Removing. "
                )
            remove_ccs.append(cc)
            ccs.remove(cc)

    if len(ccs) > 1:
        # there are equally large CCs
        if verbose:
            logger.debug(
                "The Connected Components are equally large! Picking the first one"
            )
        for cc in ccs[1:]:
            remove_ccs.append(cc)
            ccs.remove(cc)

    assert len(ccs) == 1, (
        "At this point there should be left only one main component"
    )

    # remove the worse cc
    for cc in remove_ccs:
        for atom_pair in cc:
            atom_tuple = (atom_pair.left_atom, atom_pair.right_atom)
            self.remove_node_pair(atom_tuple)
            self._removed_because_disjointed_cc.append(atom_tuple)

    return largest_cc, remove_ccs

assign_atoms_ids

assign_atoms_ids(id_start=1)

Assign an ID to each pair A1-B1. This means that if we request an atom ID for A1 or B1 it will be the same.

Then assign different IDs for the other atoms
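
A minimal sketch of the numbering scheme, assuming atoms are represented by plain strings. `assign_ids` is a hypothetical stand-in for the method and returns the ID map together with the next unused ID.

```python
def assign_ids(matched_pairs, left_atoms, right_atoms, id_start=1):
    """Give both atoms of a matched pair one shared ID, then number the rest."""
    ids = {}
    next_id = id_start
    for left, right in matched_pairs:
        # the pair itself is also a key, so the ID can be looked up either way
        ids[left] = ids[right] = ids[(left, right)] = next_id
        next_id += 1

    # unmatched atoms, left topology first, then right topology
    for atom in list(left_atoms) + list(right_atoms):
        if atom not in ids:
            ids[atom] = next_id
            next_id += 1
    return ids, next_id


ids, next_free = assign_ids(
    matched_pairs=[("A1", "B1"), ("A2", "B2")],
    left_atoms=["A1", "A2", "A3"],
    right_atoms=["B1", "B2", "B3", "B4"],
)
```

Matched atoms share an ID, so the dual topology contains each matched pair once and each unmatched atom once.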

Source code in ties/topology_superimposer.py

def assign_atoms_ids(self, id_start=1):
    """
    Assign an ID to each pair A1-B1. This means that if we request an atom ID
    for A1 or B1 it will be the same.

    Then assign different IDs for the other atoms
    """
    self.internal_ids = {}
    id_counter = id_start
    # for each pair assign an ID
    for left_atom, right_atom in self.matched_pairs:
        self.internal_ids[left_atom] = id_counter
        self.internal_ids[right_atom] = id_counter
        # make it possible to look up the atom ID with a pair
        self.internal_ids[(left_atom, right_atom)] = id_counter

        id_counter += 1
        self.unique_atom_count += 1

    # for each atom that was not mapped to any other atom,
    # but is still in the topology, generate an ID for it

    # find the not mapped atoms in the left topology and assign them an atom ID
    for node in self.top1:
        # check if this node was matched
        if not self.contains_node(node):
            self.internal_ids[node] = id_counter
            id_counter += 1
            self.unique_atom_count += 1

    # find the not mapped atoms in the right topology and assign them an atom ID
    for node in self.top2:
        # check if this node was matched
        if not self.contains_node(node):
            self.internal_ids[node] = id_counter
            id_counter += 1
            self.unique_atom_count += 1

    # return the next unused atom ID
    return id_counter

get_appearing_atoms

get_appearing_atoms()

fixme - should check first if atomName is unique

Return a list of appearing atoms (atomName) which are the atoms that are found in the right topology (top2), and that are not present in the matched_pairs

Source code in ties/topology_superimposer.py

def get_appearing_atoms(self):
    """
    # fixme - should check first if atomName is unique
    Return a list of appearing atoms (atomName) which are the
    atoms that are found in the right topology (top2), and that
    are not present in the matched_pairs
    """
    unmatched = []
    for top2_atom in self.top2:
        is_matched = False
        for _, matched_right_ligand_atom in self.matched_pairs:
            if top2_atom is matched_right_ligand_atom:
                is_matched = True
                break
        if not is_matched:
            unmatched.append(top2_atom)

    return unmatched

get_disappearing_atoms

get_disappearing_atoms()

fixme - should check first if atomName is unique

fixme - update to using the node set

Return a list of disappearing atoms (atomName) which are the atoms that are found in the topology, and that are not present in the matched_pairs
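
The appearing and disappearing sets are two sides of the same filter. The sketch below uses a hypothetical `Atom` class and an `unmatched` helper that are not part of ties; like the source, it compares atoms by identity.

```python
class Atom:
    """Hypothetical minimal atom; identity, not name, is what gets compared."""

    def __init__(self, name):
        self.name = name


def unmatched(topology_atoms, matched_atoms):
    """Return the atoms of a topology that do not occur among the matched atoms."""
    return [a for a in topology_atoms if not any(a is m for m in matched_atoms)]


left = [Atom("C1"), Atom("C2"), Atom("CL1")]
right = [Atom("C11"), Atom("C12"), Atom("BR1"), Atom("H9")]
matched_pairs = [(left[0], right[0]), (left[1], right[1])]

# left-only atoms disappear, right-only atoms appear
disappearing = unmatched(left, [l for l, _ in matched_pairs])
appearing = unmatched(right, [r for _, r in matched_pairs])
```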

Source code in ties/topology_superimposer.py

def get_disappearing_atoms(self):
    """
    # fixme - should check first if atomName is unique
    # fixme - update to using the node set
    Return a list of disappearing atoms (atomName) which are the
    atoms that are found in the topology, and that
    are not present in the matched_pairs
    """
    unmatched = []
    for top1_atom in self.top1:
        is_matched = False
        for matched_left_ligand_atom, _ in self.matched_pairs:
            if top1_atom is matched_left_ligand_atom:
                is_matched = True
                break
        if not is_matched:
            unmatched.append(top1_atom)

    return unmatched

remove_lonely_hydrogens

remove_lonely_hydrogens()

You could also remove the hydrogens when you correct charges.

Source code in ties/topology_superimposer.py

def remove_lonely_hydrogens(self):
    """
    You could also remove the hydrogens when you correct charges.
    """
    logger.error(
        "ERROR: function used that was not verified. It can create errors. "
        "Please verify that the code works first."
    )
    # in order to see any hydrogens that are by themselves, we check for any connection
    removed_pairs = []
    for A1, B1 in self.matched_pairs:
        # fixme - assumes hydrogens start their names with H*
        if not A1.name.upper().startswith("H"):
            continue

        # check if any of the bonded atoms can be found in this sup top
        if not self.contains_any(A1.bonds) or not self.contains_node(B1.bonds):
            # we appear disconnected, remove us
            pass
        for bonded_atom in A1.bonds:
            assert not bonded_atom.name.upper().startswith("H")
            if self.contains_node(bonded_atom):
                continue

    return removed_pairs

match_gaff2_nondirectional_bonds

match_gaff2_nondirectional_bonds()

If needed, swap cc-cd with cd-cc. If two pairs are linked: (CC/CD) - (CD/CC), replace them according to the left side: (CC/CC) - (CD/CD). Apply this rule to all other pairs in Table I (b) at http://ambermd.org/antechamber/gaff.html

These two define where the double bond is in a ring. GAFF decides which one is cc or cd depending on the arbitrary atom order. With this intervention we ensure that we do not remove atoms based on that arbitrary order.

This method is idempotent.
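
A small sketch of the type harmonisation and of why it is idempotent. The `Atom` dataclass and `harmonise_types` are hypothetical; only the table of interchangeable type pairs is taken from the source.

```python
from dataclasses import dataclass

# interchangeable GAFF type pairs, as listed in the method
NONDIRECTIONAL = (
    {"CC", "CD"}, {"CE", "CF"}, {"CP", "CQ"},
    {"PC", "PD"}, {"PE", "PF"}, {"NC", "ND"},
)


@dataclass
class Atom:
    name: str
    type: str


def harmonise_types(matched_pairs):
    """Overwrite the right atom type with the left one; return the number of changes."""
    changed = 0
    for left, right in matched_pairs:
        # a two-element set only equals a table entry if the types differ
        if {left.type, right.type} in NONDIRECTIONAL:
            right.type = left.type
            changed += 1
    return changed


pairs = [
    (Atom("C1", "CC"), Atom("C11", "CD")),
    (Atom("C2", "CD"), Atom("C12", "CC")),
    (Atom("C3", "CA"), Atom("C13", "CA")),
]
first_pass = harmonise_types(pairs)
second_pass = harmonise_types(pairs)
```

After the first pass every affected pair has identical types, so a second pass finds nothing left to change.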

Source code in ties/topology_superimposer.py

def match_gaff2_nondirectional_bonds(self):
    """
    If needed, swap cc-cd with cd-cc.
    If two pairs are linked: (CC/CD) - (CD/CC),
    replace them according to the left side: (CC/CC) - (CD/CD).
    Apply this rule to all other pairs in Table I (b) at http://ambermd.org/antechamber/gaff.html

    These two define where the double bond is in a ring.
    GAFF decides on which one is cc or cd depending on the arbitrary atom order.
    With this intervention we ensure that we do not remove atoms based on that arbitrary order.

    This method is idempotent.
    """
    nondirectionals = (
        {"CC", "CD"},
        {"CE", "CF"},
        {"CP", "CQ"},
        {"PC", "PD"},
        {"PE", "PF"},
        {"NC", "ND"},
    )

    for no_direction_pair in nondirectionals:
        corrected_pairs = []
        for A1, A2 in self.matched_pairs:
            # check if it is the right combination
            if (
                not {A1.type, A2.type} == no_direction_pair
                or (A1, A2) in corrected_pairs
            ):
                continue

            # ignore if they are already the same
            if A2.type == A1.type:
                continue

            # fixme - temporary solution
            # fixme - do we want to check if we are in a ring?
            # for now we are simply rewriting the types here so that it passes the "specific atom type" checks later
            # ie so that later CC-CC and CD-CD are compared
            # fixme - check if .type is used when writing the final output.
            # remember the old type so that the log message reports it correctly
            old_type = A2.type
            A2.type = A1.type
            logger.debug(
                "Arbitrary atom type correction. "
                f"Right atom type {old_type} (in {A2}) overwritten with left atom type {A1.type} (in {A1}). "
            )

            corrected_pairs.append((A1, A2))

    return 0

get_net_charge

get_net_charge()

Calculate the net charge difference across the matched pairs.

Source code in ties/topology_superimposer.py

def get_net_charge(self):
    """
    Calculate the net charge difference across
    the matched pairs.
    """
    net_charge = sum(n1.charge - n2.charge for n1, n2 in self.matched_pairs)
    return net_charge

get_matched_with_diff_q

get_matched_with_diff_q()

Returns a list of matched atom pairs that have a different q, sorted in the descending order (the first pair has the largest q diff).
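
Both charge helpers reduce to a sum and a sort over the matched pairs. The sketch below uses a hypothetical `Atom` dataclass with a single `charge` field, whereas the method above reads `united_charge` for the per-pair difference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str
    charge: float


pairs = [
    (Atom("C1", -0.10), Atom("C11", -0.10)),
    (Atom("N1", -0.50), Atom("N11", -0.35)),
    (Atom("O1", -0.40), Atom("O11", -0.45)),
]

# net charge difference across all matched pairs
net_charge = sum(left.charge - right.charge for left, right in pairs)

# pairs whose charges differ, largest absolute difference first
differing = sorted(
    [p for p in pairs if abs(p[0].charge - p[1].charge) > 0],
    key=lambda p: abs(p[0].charge - p[1].charge),
    reverse=True,
)
```

Note that individual differences can cancel in the net charge, which is why the sorted list uses absolute values.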

Source code in ties/topology_superimposer.py

def get_matched_with_diff_q(self):
    """
    Returns a list of matched atom pairs that have a different q,
    sorted in the descending order (the first pair has the largest q diff).
    """
    diff_q = [
        (n1, n2)
        for n1, n2 in self.matched_pairs
        if np.abs(n1.united_charge - n2.united_charge) > 0
    ]
    return sorted(
        diff_q,
        key=lambda p: abs(p[0].united_charge - p[1].united_charge),
        reverse=True,
    )

apply_net_charge_filter

apply_net_charge_filter(net_charge_threshold)

Averaging the charges across paired atoms introduces inequalities. Check if the sum of the charge differences across the matched pairs is within net_charge_threshold. If not, remove pairs until the threshold is met. Which pairs are removed depends on the approach. Greedy removal of the pairs with the highest difference can create disjoint blocks, which cause issues of their own.

Specifically, create copies for each strategy here and try a couple of them.

Returns: the total charge difference summed over the removed pairs. This suptop is modified in place so that net_charge_threshold is enforced.
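
Only the "greedy" idea is sketched here, under simplifying assumptions: pairs are `(name, left_charge, right_charge)` tuples, `greedy_net_charge_filter` is a hypothetical helper, and neither the other strategies nor the connected-component cleanup are modelled.

```python
def greedy_net_charge_filter(pairs, threshold):
    """Drop the pair with the largest charge difference until |net charge| <= threshold."""

    def net(ps):
        return sum(q_left - q_right for _, q_left, q_right in ps)

    kept = list(pairs)
    removed = []
    while kept and abs(net(kept)) > threshold:
        worst = max(kept, key=lambda p: abs(p[1] - p[2]))
        kept.remove(worst)
        removed.append(worst)
    return kept, removed


pairs = [
    ("C1", -0.10, -0.10),
    ("N1", -0.50, -0.20),
    ("O1", -0.40, -0.45),
    ("C2", 0.20, 0.12),
]
kept, removed = greedy_net_charge_filter(pairs, threshold=0.1)
```

In this example removing the worst pair overshoots in the other direction, so a second pair has to go as well. That is one reason the method compares several strategies and keeps the one that preserves the largest superimposition.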

Source code in ties/topology_superimposer.py

def apply_net_charge_filter(self, net_charge_threshold):
    """
    Averaging the charges across paired atoms introduces inequalities.
    Check if the sum of the charge differences across the matched pairs
    is within net_charge_threshold.
    If not, remove pairs until the threshold is met.
    Which pairs are removed depends on the approach.
    Greedy removal of the pairs with the highest difference
    can create disjoint blocks, which cause issues of their own.

    # Specifically, create copies for each strategy here and try a couple of them.
    Returns: the total charge difference summed over the removed pairs.
    This suptop is modified in place so that net_charge_threshold is enforced.
    """

    approaches = [
        "greedy",
        "terminal_alch_linked",
        "terminal",
        "alch_linked",
        "leftovers",
        "smart",
    ]
    rm_disjoint_at_each_step = [True, False]

    # best configuration info
    best_approach = None
    suptop_size = -1
    rm_disjoint_each_step_conf = False

    # try all confs
    for rm_disjoint_each_step in rm_disjoint_at_each_step:
        for approach in approaches:
            # make a shallow copy of the suptop
            next_approach = copy.copy(self)
            # first overall
            if rm_disjoint_each_step:
                next_approach.largest_cc_survives(verbose=False)

            # try the strategy
            while np.abs(next_approach.get_net_charge()) > net_charge_threshold:
                best_candidate_with_h = next_approach._smart_netqtol_pair_picker(
                    approach
                )
                for pair in best_candidate_with_h:
                    next_approach.remove_node_pair(pair)

                if rm_disjoint_each_step:
                    next_approach.largest_cc_survives(verbose=False)

            # regardless of whether the continuous disjoint removal is being tried or not,
            # it will be applied at the end
            # so apply it here at the end in order to make this comparison equivalent
            next_approach.largest_cc_survives(verbose=False)

            if len(next_approach) > suptop_size:
                suptop_size = len(next_approach)
                best_approach = approach
                rm_disjoint_each_step_conf = rm_disjoint_each_step

    # apply the best strategy to this suptop
    logger.debug(
        f"Pair removal strategy (q net tol): {best_approach} with disjoint CC removed at each step: {rm_disjoint_each_step_conf}"
    )

    total_diff = 0
    if rm_disjoint_each_step_conf:
        self.largest_cc_survives()
    while np.abs(self.get_net_charge()) > net_charge_threshold:
        best_candidate_with_h = self._smart_netqtol_pair_picker(best_approach)

        # remove them
        for pair in best_candidate_with_h:
            self.remove_node_pair(pair)
            diff_q_pairs = abs(pair[0].united_charge - pair[1].united_charge)
            # add to the list of removed because of the net charge
            self._removed_due_to_net_charge.append([pair, diff_q_pairs])
            total_diff += diff_q_pairs

        if rm_disjoint_each_step_conf:
            self.largest_cc_survives()

    return total_diff

remove_attached_hydrogens

remove_attached_hydrogens(node_pair)

The node_pair to which these hydrogens are attached was removed. Remove the dangling hydrogens.

Check if these hydrogens are matched/superimposed. If that is the case, remove the pairs.

Note that if the hydrogens are paired and attached to node_pairA, they have to be attached to node_pairB, as a rule of being a match.
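
A sketch of the same clean-up on plain data structures. Everything here is hypothetical: pairs are tuples of atom names, `bonds` maps a pair to its bonded pairs, and `element` is a lookup table standing in for the `element` attribute used by the method.

```python
def remove_attached_hydrogens(bonds, matched, element, parent_pair):
    """Remove matched hydrogen pairs that were bonded to an already removed pair."""
    removed = []
    # iterate over a copy, as the real method does, since entries may be removed
    for neighbour, _bond_types in list(bonds.get(parent_pair, [])):
        if element[neighbour[0]] != "H":
            continue
        matched.discard(neighbour)
        removed.append(neighbour)
    return removed


element = {"C1": "C", "H1": "H", "H2": "H", "O1": "O"}
parent = ("C1", "C11")
bonds = {
    parent: [
        (("H1", "H11"), ("1", "1")),
        (("H2", "H12"), ("1", "1")),
        (("O1", "O11"), ("1", "1")),
    ]
}
# the parent pair was removed earlier; its neighbours are still matched
matched = {("H1", "H11"), ("H2", "H12"), ("O1", "O11")}

removed = remove_attached_hydrogens(bonds, matched, element, parent)
```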

Source code in ties/topology_superimposer.py

def remove_attached_hydrogens(self, node_pair):
    """
    The node_pair to which these hydrogens are attached was removed.
    Remove the dangling hydrogens.

    Check if these hydrogens are matched/superimposed. If that is the case, remove the pairs.

    Note that if the hydrogens are paired and attached to node_pairA,
    they have to be attached to node_pairB, as a rule of being a match.
    """

    # skip if no hydrogens found
    if node_pair not in self.matched_pairs_bonds:
        return []

    attached_pairs = self.matched_pairs_bonds[node_pair]

    removed_pairs = []
    for pair, bond_types in list(attached_pairs):
        # ignore non hydrogens
        if not pair[0].element == "H":
            continue

        self.remove_node_pair(pair)
        logger.debug(f"Removed dangling hydrogen pair: {pair}")
        removed_pairs.append(pair)
    return removed_pairs

find_lowest_rmsd_mirror

find_lowest_rmsd_mirror()

Walk through the different mirrors and out of all options select the one that has the lowest RMSD. This way we increase the chance of getting a better match. However, long term it will be necessary to use the dihedrals to ensure that we match the atoms better.
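
The selection itself is a plain minimum search. In this sketch each candidate is a hypothetical `(label, rmsd)` tuple rather than a SuperimposedTopology, and `pick_lowest_rmsd` mirrors the three-value return of the method.

```python
def pick_lowest_rmsd(main, mirrors):
    """Return (rmsd, winner, is_mirror) for the candidate with the lowest RMSD."""
    winner = main
    for mirror in mirrors:
        # strict comparison: the main topology wins ties
        if mirror[1] < winner[1]:
            winner = mirror
    return winner[1], winner, winner is not main


main = ("main", 0.8)
mirrors = [("mirror_1", 0.5), ("mirror_2", 0.9)]
lowest, winner, is_mirror = pick_lowest_rmsd(main, mirrors)
```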

Source code in ties/topology_superimposer.py

def find_lowest_rmsd_mirror(self):
    """
    Walk through the different mirrors and out of all options select the one
    that has the lowest RMSD. This way we increase the chance of getting a better match.
    However, long term it will be necessary to use the dihedrals to ensure that we match
    the atoms better.
    """
    # fixme - you have to also take into account the "weird / other symmetries" besides mirrors
    winner = self
    lowest_rmsd = self.rmsd()
    for mirror in self.mirrors:
        mirror_rmsd = mirror.rmsd()
        if mirror_rmsd < lowest_rmsd:
            lowest_rmsd = mirror_rmsd
            winner = mirror

    if self is winner:
        # False here means that it is not a mirror
        return lowest_rmsd, self, False
    else:
        return lowest_rmsd, winner, True

is_subgraph_of_global_top

is_subgraph_of_global_top()

Check if, after superimposition, one graph is a subgraph of the other. Returns True if the matched pairs cover every atom of at least one of the two topologies.

Source code in ties/topology_superimposer.py

def is_subgraph_of_global_top(self):
    """
    Check if, after superimposition, one graph is a subgraph of the other
    :return: True if the matched pairs cover every atom of at least one of the two topologies
    """
    # check if one topology is a subgraph of another topology
    if len(self.matched_pairs) == len(self.top1) or len(self.matched_pairs) == len(
        self.top2
    ):
        return True

    return False

rmsd

rmsd()

For each pair take the distance, and then get rmsd, so root(mean(square(deviation)))
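
A worked example of the formula using only the standard library. The coordinates are made up, and `rmsd` here is a free function over position pairs rather than the method above, which reads `position` from each atom.

```python
import math


def rmsd(matched_positions):
    """Root of the mean of the squared distances between paired positions."""
    distances = [math.dist(a, b) for a, b in matched_positions]
    return math.sqrt(sum(d * d for d in distances) / len(distances))


pairs = [
    ((0.0, 0.0, 0.0), (0.0, 0.0, 0.0)),  # distance 0
    ((1.0, 0.0, 0.0), (1.0, 2.0, 0.0)),  # distance 2
    ((0.0, 1.0, 0.0), (0.0, 1.0, 2.0)),  # distance 2
]
value = rmsd(pairs)
```

With distances 0, 2 and 2 the mean squared deviation is 8/3, so the RMSD is its square root.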

Source code in ties/topology_superimposer.py

def rmsd(self):
    """
    For each pair take the distance, and then get rmsd, so root(mean(square(deviation)))
    """

    assert len(self.matched_pairs) > 0

    dsts = []
    for atomA, atomB in self.matched_pairs:
        dst = np.sqrt(np.sum(np.square((atomA.position - atomB.position))))
        dsts.append(dst)
    return np.sqrt(np.mean(np.square(dsts)))

link_pairs

link_pairs(from_pair, pairs)

This helps take care of the bonds.
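
The bookkeeping is a symmetric adjacency map. A minimal sketch with hypothetical name tuples as pairs follows; unlike the method, it uses a `defaultdict` instead of asserting that every pair is already registered.

```python
from collections import defaultdict


def link_pairs(bonds, from_pair, neighbours):
    """Record each bond in both directions."""
    for pair, bond_types in neighbours:
        bonds[from_pair].add((pair, bond_types))  # link X-Y
        bonds[pair].add((from_pair, bond_types))  # link Y-X


bonds = defaultdict(set)
c1 = ("C1", "C11")
c2 = ("C2", "C12")
o1 = ("O1", "O11")
link_pairs(bonds, c1, [(c2, ("ar", "ar")), (o1, ("1", "1"))])
```

Storing both directions means the neighbours of any pair can be read directly from its own entry.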

Source code in ties/topology_superimposer.py

def link_pairs(self, from_pair, pairs):
    """
    This helps take care of the bonds.
    """
    assert from_pair in self.matched_pairs_bonds
    for pair, bond_types in pairs:
        # the parent pair should have its list of pairs
        assert pair in self.matched_pairs_bonds, f"not found pair {pair}"

        # link X-Y
        self.matched_pairs_bonds[from_pair].add((pair, bond_types))
        # link Y-X
        self.matched_pairs_bonds[pair].add((from_pair, bond_types))

find_mirror_choices

find_mirror_choices()

For each pair (A1, B1) find all the other options in the mirrors, i.e. the pairs (A1, B2) in which A1 is matched to a different atom B2.

i.e. ignore the (X, B1) search: if we repair from A to B, then B to A should be repaired too

fixme - is this still necessary if we are traversing all paths?
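
A sketch of the lookup using atom names as stand-ins for atom objects. The function below is a hypothetical free-standing variant that compares by equality, whereas the method compares atoms by identity.

```python
def find_mirror_choices(main_pairs, mirrors):
    """Map each left atom to [current partner, alternative partners from mirrors]."""
    choices = {}
    for a1, b1 in main_pairs:
        options = [
            b2
            for mirror in mirrors
            for a2, b2 in mirror
            if a2 == a1 and b2 != b1
        ]
        if options:
            # the current partner always comes first
            choices[a1] = [b1] + options
    return choices


main = [("C1", "C11"), ("O1", "O11"), ("O2", "O12")]
mirror = [("C1", "C11"), ("O1", "O12"), ("O2", "O11")]
choices = find_mirror_choices(main, [mirror])
```

Atoms that are matched identically in every mirror, such as C1 here, get no entry.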

Source code in ties/topology_superimposer.py

def find_mirror_choices(self):
    """
    For each pair (A1, B1) find all the other options in the mirrors,
    i.e. the pairs (A1, B2) in which A1 is matched to a different atom B2.
    # i.e. ignore the (X, B1) search: if we repair from A to B, then B to A should be repaired too

    # fixme - is this still necessary if we are traversing all paths?
    """
    choices = {}
    for A1, B1 in self.matched_pairs:
        options_for_a1 = []
        for mirror in self.mirrors:
            for A2, B2 in mirror.matched_pairs:
                if A1 is A2 and B1 is not B2:
                    options_for_a1.append(B2)

        if options_for_a1:
            options_for_a1.insert(0, B1)
            choices[A1] = options_for_a1

    return choices

add_alternative_mapping

add_alternative_mapping(weird_symmetry)

This means that there is another way to traverse and overlap the two molecules, but that self is better (e.g. lower RMSD) than the other one

Source code in ties/topology_superimposer.py

def add_alternative_mapping(self, weird_symmetry):
    """
    This means that there is another way to traverse and overlap the two molecules,
    but that self is better (e.g. lower RMSD) than the other one
    """
    self.alternative_mappings.append(weird_symmetry)

correct_for_coordinates

correct_for_coordinates()

Use the coordinates of the atoms, to figure out which symmetries are the correct ones. Rearrange so that the overall topology represents the one that has appropriate coordinates, whereas all the mirrors represent the other poor matches.

fixme - ensure that each node is used only once at the end
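
The re-pairing step can be sketched as a greedy nearest-partner assignment. All names and coordinates below are hypothetical, and `greedy_closest` is a simplified stand-in for the loop in the method: in each round the globally closest remaining (left, right) combination is fixed and both atoms are taken out of consideration.

```python
import math


def greedy_closest(choices, positions):
    """Repeatedly fix the closest remaining (left, right) combination."""
    remaining = dict(choices)
    used_right = set()
    assigned = []
    while remaining:
        best = None
        for left, options in remaining.items():
            for right in options:
                if right in used_right:
                    continue
                distance = math.dist(positions[left], positions[right])
                if best is None or distance < best[0]:
                    best = (distance, left, right)
        if best is None:
            break  # no free partner left for the remaining atoms
        _, left, right = best
        assigned.append((left, right))
        used_right.add(right)
        del remaining[left]
    return assigned


positions = {
    "O1": (0.0, 0.0, 0.0),
    "O2": (2.0, 0.0, 0.0),
    "O11": (2.1, 0.0, 0.0),
    "O12": (0.2, 0.0, 0.0),
}
choices = {"O1": ["O11", "O12"], "O2": ["O12", "O11"]}
assigned = greedy_closest(choices, positions)
```

In this example the original pairing (O1 with O11, O2 with O12) is swapped, because the coordinates favour the mirrored assignment.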

Source code in ties/topology_superimposer.py

def correct_for_coordinates(self):
    """
    Use the coordinates of the atoms, to figure out which symmetries are the correct ones.
    Rearrange so that the overall topology represents the one that has appropriate coordinates,
    whereas all the mirrors represent the other poor matches.

    # fixme - ensure that each node is used only once at the end
    """

    # check if you have coordinates
    # fixme - rn we have it, check

    # superimpose the coordinates, ensure a good match
    # fixme - this was done before, so let's leave this way for now

    # fixme - consider putting this conf as a mirror, and then modifying this

    # check which are preferable for each of the mirrors
    # we have to match mirrors to each other, ie say we have (O1=O3) and (O2=O4)
    # we should find the mirror matching (O1=O4) and (O2=O3)
    # so note that we have a closure here: All 4 atoms are used in both cases, and each time are paired differently.
    # So this is how we defined the mirror - and therefore we can reduce this issue to the minimal mirrors.
    # fixme - is this a cycle? O1-O3-O2-O4-O1
    # Let's try to define a chain: O1 =O3, and O1 =O4, and O2 is =O3 or =O4
    # So we have to define how to find O1 matching to different parts, and then decide
    choices_mapping = self.find_mirror_choices()

    # fixme - rewrite this method to eliminate one by one the hydrogens that fit in perfectly,
    # some of them will have a plural significant match, while others might be hazy,
    # so we have to eliminate them one by one, searching the best matches and then eliminating them

    removed_nodes = set()
    for A1, choices in choices_mapping.items():
        # remove the old tuple
        # fixme - not sure if this is the right way to go,
        # but we break all the rules when applying this simplistic strategy
        self.remove_node_pair((A1, choices[0]))
        removed_nodes.add(A1)
        removed_nodes.add(choices[0])

    shortest_dsts = []

    added_nodes = set()

    # better matches
    # for each atom that mismatches, scan all molecules and find the best match and eliminate it
    blacklisted_bxs = []
    for _ in range(len(choices_mapping)):
        # fixme - optimisation of this could be such that if the two atoms are within 0.2A or something
        # then they are straight away fixed
        closest_dst = 9999999
        closest_a1 = None
        closest_bx = None
        for A1, choices in choices_mapping.items():
            # so we have several choices for A1, and now naively we are taking the one that is closest, and
            # assuming the superimposition is easy, this would work

            # FIXME - you cannot simply use distances: if for A1 and A2 the best is BX,
            # then there should be rules for that
            for BX in choices:
                if BX in blacklisted_bxs:
                    continue
                # plain Euclidean distance between the two atoms (no PBC correction is applied here)
                a1_bx_dst = np.sqrt(np.sum(np.square(A1.position - BX.position)))
                if a1_bx_dst < closest_dst:
                    closest_dst = a1_bx_dst
                    closest_bx = BX
                    closest_a1 = A1

        # across all the possible choices, found the best match now:
        blacklisted_bxs.append(closest_bx)
        shortest_dsts.append(closest_dst)
        logger.debug(f"{closest_a1.name} is matching best with {closest_bx.name}")

        # remove the old tuple and insert the new one
        self.add_node_pair((closest_a1, closest_bx))
        added_nodes.add(closest_a1)
        added_nodes.add(closest_bx)
        # remove from consideration
        del choices_mapping[closest_a1]
        # blacklist

    # fixme - check that the added and the removed nodes are the same set
    assert removed_nodes == added_nodes

    # this is the corrected region score (there might not be any)
    if len(shortest_dsts) != 0:
        avg_dst = np.mean(shortest_dsts)
    else:
        # fixme
        avg_dst = 0

    return avg_dst

is_area_overlapping_fully

is_area_overlapping_fully(l_atoms, r_atoms)

Each atom in one set has to be matched to an atom in the second set. And vice versa.

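:param l_atoms: atoms from the left topology. :param r_atoms: atoms from the right topology. :return: True if both sets have the same size and every atom in l_atoms is matched to an atom in r_atoms.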

Source code in ties/topology_superimposer.py

def is_area_overlapping_fully(self, l_atoms, r_atoms):
    """
    Each atom in one set has to be matched to an atom in the second set. And vice versa.

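    :param l_atoms: atoms from the left topology
    :param r_atoms: atoms from the right topology
    :return: True if both sets have the same size and every atom in l_atoms
        is matched to an atom in r_atoms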
    """
    if len(l_atoms) != len(r_atoms):
        return False

    for atom in l_atoms:
        if not self.contains_node(atom):
            return False

        _, matched_r = self.get_pair_with_atom(atom)
        if matched_r not in r_atoms:
            return False

    return True

is_area_overlapping

is_area_overlapping(l_atoms, r_atoms)

Even a small overlap will return True.

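:param l_atoms: atoms from the left topology. :param r_atoms: atoms from the right topology. :return: True if at least one atom in l_atoms is matched to an atom in r_atoms.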
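
The difference between the full and the partial overlap check can be sketched with a dictionary as the mapping. The helpers `overlaps_fully` and `overlaps_partially` and the atom names are hypothetical.

```python
def overlaps_fully(mapping, l_atoms, r_atoms):
    """Every left atom is matched, and its partner lies in r_atoms."""
    if len(l_atoms) != len(r_atoms):
        return False
    return all(a in mapping and mapping[a] in r_atoms for a in l_atoms)


def overlaps_partially(mapping, l_atoms, r_atoms):
    """At least one left atom is matched to an atom in r_atoms."""
    return any(mapping.get(a) in r_atoms for a in l_atoms)


left_ring = ["C1", "C2", "C3", "C4"]
right_ring = ["C11", "C12", "C13", "C14"]

# C4 is unmatched, so the rings overlap only partially
mapping = {"C1": "C11", "C2": "C12", "C3": "C13"}
partial = overlaps_partially(mapping, left_ring, right_ring)
full = overlaps_fully(mapping, left_ring, right_ring)

# once C4 is matched as well, the overlap is complete
full_after = overlaps_fully({**mapping, "C4": "C14"}, left_ring, right_ring)
```

Because the mapping is one-to-one and both sets have the same size, checking the left side is enough to cover the "vice versa" condition.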

Source code in ties/topology_superimposer.py

def is_area_overlapping(self, l_atoms, r_atoms):
    """
    Even a small overlap will return True.

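    :param l_atoms: atoms from the left topology
    :param r_atoms: atoms from the right topology
    :return: True if at least one atom in l_atoms is matched to an atom in r_atoms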
    """

    for atom in l_atoms:
        _, matched_r = self.get_pair_with_atom(atom)
        if matched_r in r_atoms:
            return True

    return False

enforce_no_partial_rings

enforce_no_partial_rings()

Ensure that rings are either fully matched, or not matched with anything at all.
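
A simplified sketch of the rule, assuming a dictionary mapping from left to right atom names and rings given as sets of names. `drop_partial_rings` is hypothetical and returns a filtered copy. The method above instead removes the pairs in place through `_remove_unmatched_ring_atom`, which is not shown in this section.

```python
def drop_partial_rings(mapping, l_rings, r_rings):
    """Keep ring atoms only if their ring is matched completely to another ring."""

    def fully_matched(l_ring, r_ring):
        return len(l_ring) == len(r_ring) and all(
            mapping.get(atom) in r_ring for atom in l_ring
        )

    # atoms that belong to at least one perfectly matched ring (covers fused rings)
    good = set()
    for l_ring in l_rings:
        for r_ring in r_rings:
            if fully_matched(l_ring, r_ring):
                good |= l_ring | r_ring

    ring_atoms = {atom for ring in l_rings + r_rings for atom in ring}

    def is_partial(atom):
        return atom in ring_atoms and atom not in good

    return {
        left: right
        for left, right in mapping.items()
        if not is_partial(left) and not is_partial(right)
    }


l_rings = [{"C1", "C2", "C3"}, {"C4", "C5", "C6"}]
r_rings = [{"C11", "C12", "C13"}, {"C14", "C15", "C16"}]
mapping = {
    "C1": "C11", "C2": "C12", "C3": "C13",  # complete ring
    "C4": "C14", "C5": "C15",               # C6 is unmatched: partial ring
    "O1": "O11",                            # not in any ring
}
cleaned = drop_partial_rings(mapping, l_rings, r_rings)
```

The fully matched ring and the non-ring atom survive, while both pairs of the partially matched ring are dropped.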

Source code in ties/topology_superimposer.py

def enforce_no_partial_rings(self):
    """
    Ensure that rings are either fully matched,
    or not matched with anything at all.
    """

    # circles from the original ligands
    l_cycles, r_cycles = self.get_original_circles()

    # keep track of the fully matched cycles
    atoms_in_good_cycles = set()

    # find the fully matched cycles
    for l_cycle in l_cycles[::-1]:
        for r_cycle in r_cycles[::-1]:
            if not self.is_area_overlapping_fully(l_cycle, r_cycle):
                continue

            # these rings are matched perfectly
            l_cycles.remove(l_cycle)
            r_cycles.remove(r_cycle)
            atoms_in_good_cycles.update(l_cycle)
            atoms_in_good_cycles.update(r_cycle)

    def remove_partial_rings(circles):
        for atom in [atom for sublist in circles for atom in sublist]:
            # account for the fused rings
            # the correct part of the fused ring should remain untouched
            if atom in atoms_in_good_cycles:
                continue

            if not self.contains_node(atom):
                continue

            left, right = self.get_pair_with_atom(atom)
            self._remove_unmatched_ring_atom(right)
            self._remove_unmatched_ring_atom(left)

    # remove any other matched ring atoms
    # at this point we cannot be certain what they are matched to
    remove_partial_rings(l_cycles)
    remove_partial_rings(r_cycles)

get_topology_similarity_score

get_topology_similarity_score()

Having the superimposed A (Left) and B (Right), score the match. This is a rather naive approach. It scores the A-B match by checking whether any matched nodes X and X' in A and B have a bond to another node Y that is not present in A-B, but that is directly reachable from both X and X' in a similar way. We ignore the charge of Y and focus here only on the topology.

For every "external bond" leaving the component we check how well it scores topologically. For any matched pair, we extend the topology and the score is equal to the size of such a component. Then we do this for all other matching nodes and sum the scores.

fixme - maybe you should use the entire graphs in order to see if this is good or not?

The simpler approach is to ignore charges for a moment and only consider the relative place in the topology. In other words, the question is: how similar are two nodes A and B compared with A and C? Traverse A and B together, then A and C together, ignoring the charges while doing so. If A and B traverse 20 parts together whereas A and C traverse 22 parts, then C is topologically the more suitable match, because it corresponds more closely to the actual atom.

Note that this approach has a problem: you can imagine A and B traversing where B is in a completely wrong global place, but happens to share a bigger part with A than C, which is globally correct. Answer to this: ideally B would be excluded anyway, because it should already have been matched to another topology.

Alternative approach: take into consideration other components and the distance from this component to them. Specifically, allows mismatches

FIXME - allow flexible mismatches. Meaning if someone mutates one bonded atom, then it might be noticed that

Source code in ties/topology_superimposer.py

def get_topology_similarity_score(self):
    """
    Having the superimposed A(Left) and B(Right), score the match.
    This is a rather naive approach. It compares A-B match by checking
    if any of the node X and X' in A and B have a bond to another node Y that is
    not present in A-B, but that is directly reachable from X and X' in a similar way.
    We ignore the charge of Y and focus here only on the topology.

    For every "external bond" from the component we try to see if topologically it scores well.
    So for any matched pair, we extend the topology and the score is equal to the size of
    such a component. Then we do this for all other matching nodes and sum the score.

    # fixme - maybe you should use the entire graphs in order to see if this is good or not?
    so the simpler approach is to ignore charges for a second to only understand the relative place in the topology,
    in other words, the question is, how similar are two nodes A and B vs A and C? let's traverse A and B together,
    and then A and C together, and while doing that, ignore the charges. In this case, A and B could
    get together 20 parts, whereas A and C traverses together 22 parts, meaning that topologically,
    it is a more suitable one, because it closer corresponds to the actual atom.
    Note that this approach has problem:
    - you can imagine A and B traversing where B is in a completely wrong global place, but it
    happens to have a bigger part common to A, than C which globally is correct. Answer to this:
    at the same time, ideally B would be excluded, because it should have been already matched to another
    topology.

    Alternative approach: take into consideration other components and the distance from this component
    to them. Specifically, allows mismatches

    FIXME - allow flexible mismatches. Meaning if someone mutates one bonded atom, then it might be noticed
    that
    """
    overall_score = 0
    for node_a, node_b in self.matched_pairs:
        # for every neighbour in Left
        for a_bond in node_a.bonds:
            # if this bonded atom is present in this superimposed topology (or component), ignore
            # fixme - surely this can be done better, you could have "contains this atom or something"
            in_this_sup_top = False
            for other_a, _ in self.matched_pairs:
                if a_bond.atom == other_a:
                    in_this_sup_top = True
                    break
            if in_this_sup_top:
                continue

            # a candidate is found that could make the node_a and node_b more similar,
            # so check if it is also present in node_b,
            # ignore the charges to focus only on the topology and put aside the parameterisation
            for b_bond in node_b.bonds:
                # fixme - what if the atom is mutated into a different atom? we have to be able
                # to relies on other measures than just this one, here the situation is that the topology
                # is enough to answer the question (because only charges were modified),
                # however, this gets more tricky
                # fixme - hardcoded
                score = len(_overlay(a_bond.atom, b_bond.atom))

                # this is a purely topology based score, the bigger the overlap the better the match
                overall_score += score

            # check if the neighbour points to any node X that is not used in Left,

            # if node_b leads to the same node X
    return overall_score
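The fixme in the listing notes that the membership test (scanning `matched_pairs` for every bond) could be done better. One possible sketch, assuming the atoms are hashable in a way consistent with `==`, is to build a set of matched left atoms once. The atom names and the `bonds_of` dictionary below are made-up stand-ins for the real atom objects and their `bonds`.

```python
matched_pairs = [("C1", "c1"), ("C2", "c2"), ("N1", "n1")]
bonds_of = {"C1": ["C2", "H1"], "C2": ["C1", "N1", "O1"], "N1": ["C2"]}

# Build the lookup once instead of scanning matched_pairs for every bond.
matched_left = {left for left, _ in matched_pairs}

# For every matched left atom, keep only the neighbours outside the component.
external_neighbours = {
    left: [n for n in bonds_of[left] if n not in matched_left]
    for left, _ in matched_pairs
}
```

Only the "external" neighbours (here H1 and O1) would then be passed on to the topological overlay step.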

unmatch_pairs_with_different_charges

unmatch_pairs_with_different_charges(atol)

Removes the matched pairs whose atom charges differ by more than the provided absolute tolerance atol (in units of the elementary charge, e).

Dangling hydrogens: after removing any pair, any hydrogen(s) bound to it are also removed.

Source code in ties/topology_superimposer.py

def unmatch_pairs_with_different_charges(self, atol):
    """
    Removes the matched pairs where atom charges are more different
    than the provided absolute tolerance atol (units in Electrons).

    remove_dangling_h: After removing any pair it also removes any bound hydrogen(s).
    """
    removed_hydrogen_pairs = []
    for node1, node2 in self.matched_pairs[::-1]:
        if (
            node1.united_eq(node2, atol=atol)
            or (node1, node2) in removed_hydrogen_pairs
        ):
            continue

        # remove this pair
        # use full logging for this kind of information
        # print('Q: removing nodes', (node1, node2)) # to do - consider making this into a logging feature
        self.remove_node_pair((node1, node2))

        # keep track of the removed atoms due to the charge
        self._removed_pairs_with_charge_difference.append(
            ((node1, node2), math.fabs(node2.united_charge - node1.united_charge))
        )

        # also remove the hydrogens attached to the removed pair
        removed_h_pairs = self.remove_attached_hydrogens((node1, node2))
        removed_hydrogen_pairs.extend(removed_h_pairs)
        for h_pair in removed_h_pairs:
            self._removed_pairs_with_charge_difference.append((h_pair, "dangling"))

    # sort the removed in a descending order
    self._removed_pairs_with_charge_difference.sort(
        key=lambda x: x[1], reverse=True
    )

    return self._removed_pairs_with_charge_difference
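The removal log stores a numeric charge difference for directly removed pairs and the string marker "dangling" for hydrogens. In Python 3, ordering a float against a str raises TypeError, so a sort keyed on `x[1]` alone only works while the list holds a single kind of value. Whether both kinds end up in the same list in practice depends on `remove_attached_hydrogens`, which is not shown here. The sketch below uses made-up entries and a hypothetical `sort_key` that tolerates both kinds.

```python
removed = [
    (("C1", "c1"), 0.12),
    (("H1", "h1"), "dangling"),
    (("N1", "n1"), 0.35),
]


def sort_key(entry):
    """Numeric differences first (largest first), string markers last."""
    _, reason = entry
    if isinstance(reason, str):
        return (1, 0.0)
    return (0, -reason)


ordered = sorted(removed, key=sort_key)

# For comparison: the plain key fails on mixed float/str values.
try:
    sorted(removed, key=lambda x: x[1], reverse=True)
    naive_ok = True
except TypeError:
    naive_ok = False
```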

is_consistent_with

is_consistent_with(suptop)

Conditions:

- There should be a minimal overlap of at least 1 node.
- There is no pair (Na=Nb) in this sup top such that (Na=Nc) or (Nb=Nc) for some Nc in the other suptop.
- The number of cycles in this suptop and the other suptop must be the same (?removing for now, fixme)
- Merging cannot lead to new cycles?? (fixme). What is the reasoning behind this? I mean, I guess the assumption is that, if the cycles were compatible, they would be created during the search, rather than now while merging. ??

Source code in ties/topology_superimposer.py

def is_consistent_with(self, suptop):
    """
    Conditions:
        - There should be a minimal overlap of at least 1 node.
        - There is no pair (Na=Nb) in this sup top such that (Na=Nc) or (Nb=Nc) for some Nc in the other suptop.
        - The number of cycles in this suptop and the other suptop must be the same (?removing for now, fixme)
        - merging cannot lead to new cycles?? (fixme). What is the reasoning behind this?
            I mean, I guess the assumption is that, if the cycles were compatible,
            they would be created during the search, rather than now while merging. ??
    """

    # confirm that there are no mismatches, ie (A=B) in suptop1 and (A=C) in suptop2 where (C!=B)
    for st1Na, st1Nb in self.matched_pairs:
        for st2Na, st2Nb in suptop.matched_pairs:
            if (
                (st1Na is st2Na)
                and st1Nb is not st2Nb
                or (st1Nb is st2Nb)
                and st1Na is not st2Na
            ):
                return False

    # ensure there is at least one common pair
    if self.count_common_node_pairs(suptop) == 0:
        return False

    # why do we need this?
    # if not self.is_consistent_cycles(suptop):
    #     return False

    return True
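The two checks that are active in the listing (no clashing pairs, at least one common pair) can be sketched with plain tuples. This is an illustration under assumptions: pairs are `(left, right)` name tuples, the function `is_consistent` is hypothetical, and it compares by equality whereas the listing compares node identity with `is`.

```python
def is_consistent(pairs_a, pairs_b):
    """Return True if the two pairings do not clash and share at least one pair."""
    left_to_right = dict(pairs_a)
    right_to_left = {r: l for l, r in pairs_a}
    for l, r in pairs_b:
        # the same left atom paired with a different right atom
        if l in left_to_right and left_to_right[l] != r:
            return False
        # the same right atom paired with a different left atom
        if r in right_to_left and right_to_left[r] != l:
            return False
    # at least one common pair is required
    return len(set(pairs_a) & set(pairs_b)) > 0


a = [("C1", "c1"), ("C2", "c2")]
b = [("C2", "c2"), ("C3", "c3")]   # shares (C2, c2), no clash
c = [("C2", "c5"), ("C3", "c3")]   # C2 is paired differently
d = [("C7", "c7")]                 # no clash, but no overlap either
```

Here only `a` and `b` would count as consistent: `c` clashes on C2, and `d` has no pair in common with `a`.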

get_circles

get_circles()

Return circles found in the matched pairs.

Source code in ties/topology_superimposer.py

def get_circles(self):
    """
    Return circles found in the matched pairs.
    """
    gl, gr = self.get_nx_graphs()
    gl_circles = [set(circle) for circle in nx.cycle_basis(gl)]
    gr_circles = [set(circle) for circle in nx.cycle_basis(gr)]
    return gl_circles, gr_circles

get_original_circles

get_original_circles()

Return the original circles present in the input topologies.

Source code in ties/topology_superimposer.py

def get_original_circles(self):
    """
    Return the original circles present in the input topologies.
    """
    # create a circles
    l_original = self._get_original_circle(self.top1)
    r_original = self._get_original_circle(self.top2)

    l_circles = [set(circle) for circle in nx.cycle_basis(l_original)]
    r_circles = [set(circle) for circle in nx.cycle_basis(r_original)]
    return l_circles, r_circles

cycle_spans_multiple_cycles

cycle_spans_multiple_cycles()

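Checks whether any cycle in the left molecule is paired with more than one cycle in the right molecule, or the other way round. What if the circle is shared? We are using cycles which exclude atoms that join different rings. fixme - could this lead to a special case?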

Source code in ties/topology_superimposer.py

def cycle_spans_multiple_cycles(self):
    # This filter checks whether a newly created suptop cycle spans multiple cycles
    # this is one of the filters (#106)
    # fixme - should this be applied whenever we work with more than 1 cycle?
    # it checks whether any cycles in the left molecule,
    # is paired with more than one cycle in the right molecule
    """
    What if the circle is shared?
    We are using cycles which exclude atoms that join different rings.
    fixme - could this lead to a special case?
    """

    for l_cycle in self._nonoverlapping_l_cycles:
        overlap_counter = 0
        for r_cycle in self._nonoverlapping_r_cycles:
            # check if the cycles overlap
            if self._cycles_overlap(l_cycle, r_cycle):
                overlap_counter += 1

        if overlap_counter > 1:
            return True

    for r_cycle in self._nonoverlapping_r_cycles:
        overlap_counter = 0
        for l_cycle in self._nonoverlapping_l_cycles:
            # check if the cycles overlap
            if self._cycles_overlap(l_cycle, r_cycle):
                overlap_counter += 1

        if overlap_counter > 1:
            return True

    return False
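The counting logic can be tried on toy data. In the sketch below the overlap test is an assumption (the real `_cycles_overlap` is not shown): two rings are taken to overlap when at least one matched pair has its left atom in the left ring and its right atom in the right ring. `spans_multiple` and `mapping` are hypothetical names.

```python
def spans_multiple(l_cycles, r_cycles, mapping):
    """True if any ring on one side overlaps more than one ring on the other."""

    def overlaps(l_cycle, r_cycle):
        # assumed criterion: some atom of l_cycle is mapped into r_cycle
        return any(mapping.get(a) in r_cycle for a in l_cycle)

    for l_cycle in l_cycles:
        if sum(overlaps(l_cycle, r) for r in r_cycles) > 1:
            return True
    for r_cycle in r_cycles:
        if sum(overlaps(l, r_cycle) for l in l_cycles) > 1:
            return True
    return False


l_cycles = [{"A1", "A2", "A3"}]
r_cycles = [{"b1", "b2", "b3"}, {"b4", "b5", "b6"}]

spread = spans_multiple(l_cycles, r_cycles, {"A1": "b1", "A2": "b4"})
contained = spans_multiple(l_cycles, r_cycles, {"A1": "b1", "A2": "b2"})
```

The first mapping spreads one left ring over two right rings, so the filter would flag it; the second keeps the ring within a single right ring.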

merge

merge(suptop)

Absorb the other suptop by adding all the node pairs that are not present in the current sup top.

WARNING: ensure that the other suptop is consistent with this sup top.

Source code in ties/topology_superimposer.py

def merge(self, suptop):
    """
    Absorb the other suptop by adding all the node pairs that are not present
    in the current sup top.

    WARNING: ensure that the other suptop is consistent with this sup top.
    """
    # assert self.is_consistent_with(suptop)

    # print("About the merge two sup tops")
    # self.print_summary()
    # other_suptop.print_summary()

    merged_pairs = []
    for pair in suptop.matched_pairs:
        # check if this pair is present
        if not self.contains(pair):
            n1, n2 = pair
            if self.contains_node(n1) or self.contains_node(n2):
                raise Exception("already uses that node")
            # pass the bonded pairs here
            self.add_node_pair(pair)
            merged_pairs.append(pair)
    # after adding all the nodes, now add the bonds
    for pair in merged_pairs:
        # add the connections
        bonded_pairs = suptop.matched_pairs_bonds[pair]
        assert len(bonded_pairs) > 0
        self.link_pairs(pair, bonded_pairs)

    # removed from the "merged" the ones that agree, so it contains only the new stuff
    # to make it easier to read
    self.nodes_added_log.append(("merged with", merged_pairs))

    # check for duplication, fixme - temporary
    return merged_pairs
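A reduced sketch of the pair-absorbing step, using lists of `(left, right)` name tuples instead of suptops. `merge_pairs` is a hypothetical helper; it leaves out the bond linking that the listing performs afterwards and assumes that left and right atom names never coincide.

```python
def merge_pairs(base, other):
    """Add to `base` the pairs of `other` that it lacks; return the pairs added."""
    used = {atom for pair in base for atom in pair}
    added = []
    for pair in other:
        if pair in base:
            continue
        # a new pair must not reuse an atom that is already paired
        if pair[0] in used or pair[1] in used:
            raise ValueError(f"already uses a node from {pair}")
        base.append(pair)
        used.update(pair)
        added.append(pair)
    return added


base = [("C1", "c1"), ("C2", "c2")]
added = merge_pairs(base, [("C2", "c2"), ("C3", "c3")])
```

As in the listing, the caller is expected to check consistency first; the exception is only a last line of defence.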

validate_charges `staticmethod`

validate_charges(atom_list_l, atom_list_right)

Check the original charges:

- ensure that the total charge of L and R are integers
- ensure that they are equal to the same integer

Source code in ties/topology_superimposer.py

@staticmethod
def validate_charges(atom_list_l, atom_list_right):
    """
    Check the original charges:
    - ensure that the total charge of L and R are integers
    - ensure that they are equal to the same integer
    """
    whole_left_charge = sum(a.charge for a in atom_list_l)
    np.testing.assert_almost_equal(
        whole_left_charge,
        round(whole_left_charge),
        decimal=2,
        err_msg=f"left charges are not integral. Expected {round(whole_left_charge)}"
        f" but found {whole_left_charge}",
    )

    whole_right_charge = sum(a.charge for a in atom_list_right)
    np.testing.assert_almost_equal(
        whole_right_charge,
        round(whole_right_charge),
        decimal=2,
        err_msg=f"right charges are not integral. Expected {round(whole_right_charge)}"
        f" but found {whole_right_charge}",
    )
    # same integer
    np.testing.assert_almost_equal(whole_left_charge, whole_right_charge, decimal=2)

    return round(whole_left_charge)
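The same two checks can be written with the standard library only. This is a sketch, not the library code: it takes plain lists of charges rather than atom objects, raises ValueError instead of AssertionError, and uses the tolerance that NumPy documents for `assert_almost_equal`, namely `abs(desired - actual) < 1.5 * 10**(-decimal)`.

```python
import math


def validate_charges(left_charges, right_charges, decimal=2):
    """Return the shared integer net charge, or raise ValueError."""
    tol = 1.5 * 10 ** (-decimal)
    total_l = math.fsum(left_charges)
    total_r = math.fsum(right_charges)
    # each total must be close to an integer
    for name, total in (("left", total_l), ("right", total_r)):
        if abs(total - round(total)) >= tol:
            raise ValueError(f"{name} charges are not integral: {total}")
    # and both totals must be the same integer
    if abs(total_l - total_r) >= tol:
        raise ValueError("net charges differ")
    return round(total_l)


net = validate_charges([-0.4, -0.6], [-0.5, -0.5])
```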

redistribute_charges

redistribute_charges()

After the match is made and the user commits to the superimposed topology, the charges can be revised. We calculate the average charges between every match, and check how that affects the rest of the molecule (the unmatched atoms). Then, we distribute the charges to the unmatched atoms to get the net charge as a whole number/integer.

This function should be called after removing the matches for whatever reason. ie at the end of anything that could modify the atom pairing.

Source code in ties/topology_superimposer.py

def redistribute_charges(self):
    """
    After the match is made and the user commits to the superimposed topology,
    the charges can be revised.
    We calculate the average charges between every match, and check how that affects
    the rest of the molecule (the unmatched atoms).
    Then, we distribute the charges to the unmatched atoms to get
    the net charge as a whole number/integer.

    This function should be called after removing the matches for whatever reason.
    ie at the end of anything that could modify the atom pairing.
    """

    SuperimposedTopology.validate_charges(self.top1, self.top2)

    # find the integral net charge of the molecule
    net_charge = round(sum(a.charge for a in self.top1))
    net_charge_test = round(sum(a.charge for a in self.top2))
    if net_charge != net_charge_test:
        raise Exception(
            "The internally computed net charges of the molecules are different"
        )
    # fixme - use the one passed by the user?
    logger.debug(f"Internally computed net charge: {net_charge}")

    # the total charge in the matched region before the changes
    matched_total_charge_l = sum(  # noqa: F841
        left.charge for left, right in self.matched_pairs
    )
    matched_total_charge_r = sum(  # noqa: F841
        right.charge for left, right in self.matched_pairs
    )

    # get the unmatched atoms in Left and Right
    l_unmatched = self.get_disappearing_atoms()
    r_unmatched = self.get_appearing_atoms()

    init_q_dis = sum(a.charge for a in l_unmatched)
    init_q_app = sum(a.charge for a in r_unmatched)
    logger.debug(
        f"Initial cumulative charge of the appearing={init_q_app:.6f}, disappearing={init_q_dis:.6f} "
        f"alchemical regions"
    )

    # average the charges between matched atoms in the joint area of the dual topology
    total_charge_matched = (
        0  # represents the net charge of the joint area minus molecule charge
    )
    for left, right in self.matched_pairs:
        avg_charge = (left.charge + right.charge) / 2.0
        # write the new charge
        left.charge = right.charge = avg_charge
        total_charge_matched += avg_charge
    # total_partial_charge_matched e.g. -0.9 (partial charges) - -1 (net molecule charge) = 0.1
    total_partial_charge_matched = total_charge_matched - net_charge
    logger.debug(
        f"Total partial charge in the joint area = {total_partial_charge_matched:.6f}"
    )

    # calculate what the correction should be in the alchemical regions
    r_delta_charge_total = -(total_partial_charge_matched + init_q_app)
    l_delta_charge_total = -(total_partial_charge_matched + init_q_dis)
    logger.debug(
        f"Total charge imbalance to be distributed in "
        f"dis={l_delta_charge_total:.6f} and app={r_delta_charge_total:.6f}"
    )

    if len(l_unmatched) == 0 and l_delta_charge_total != 0:
        logger.error(
            "----------------------------------------------------------------------------------------------"
        )
        logger.error(
            "ERROR? AFTER AVERAGING CHARGES, THERE ARE NO UNMATCHED ATOMS TO ASSIGN THE CHARGE TO: "
            "left ligand."
        )
        logger.error(
            "----------------------------------------------------------------------------------------------"
        )
    if len(r_unmatched) == 0 and r_delta_charge_total != 0:
        logger.error(
            "----------------------------------------------------------------------------------------------"
        )
        logger.error(
            "ERROR? AFTER AVERAGING CHARGES, THERE ARE NO UNMATCHED ATOMS TO ASSIGN THE CHARGE TO: "
            "right ligand. "
        )
        logger.error(
            "----------------------------------------------------------------------------------------------"
        )

    # distribute the charges over the alchemical regions
    if len(l_unmatched) != 0:
        l_delta_per_atom = float(l_delta_charge_total) / len(l_unmatched)
    else:
        # fixme - no unmatching atoms, so there should be no charge to redistribute
        l_delta_per_atom = 0

    if len(r_unmatched) != 0:
        r_delta_per_atom = float(r_delta_charge_total) / len(r_unmatched)
    else:
        r_delta_per_atom = 0
        # fixme - no unmatched atoms, so there should be no charge to redistribute
    logger.debug(
        f"Charge imbalance per atom in dis={l_delta_per_atom:.6f} and app={r_delta_per_atom:.6f}"
    )

    # redistribute that delta q over the atoms in the left and right molecule
    for atom in l_unmatched:
        atom.charge += l_delta_per_atom
    for atom in r_unmatched:
        atom.charge += r_delta_per_atom

    # check if the appearing atoms and the disappearing atoms have the same net charge
    dis_q_sum = sum(a.charge for a in l_unmatched)
    app_q_sum = sum(a.charge for a in r_unmatched)
    logger.debug(
        f"Final cumulative charge of the appearing={app_q_sum:.6f}, disappearing={dis_q_sum:.6f} "
        f"alchemical regions"
    )
    if not np.isclose(dis_q_sum, app_q_sum):
        logger.error(
            "The partial charges in app/dis region are not equal to each other. "
        )
        raise Exception(
            "The alchemical region in app/dis do not have equal partial charges."
        )

    # note that we are really modifying right now the original nodes.
    SuperimposedTopology.validate_charges(self.top1, self.top2)
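A small worked example may help to follow the arithmetic. The charges below are invented; the steps mirror the listing: average the matched pairs, compute the excess of the joint area over the net charge, then spread the correction evenly over the unmatched atoms on each side.

```python
import math

left = {"L1": -0.50, "L2": 0.20, "L3": 0.30}                # sums to 0
right = {"R1": -0.40, "R2": 0.20, "R3": 0.15, "R4": 0.05}   # sums to 0
matched = [("L1", "R1"), ("L2", "R2")]
net_charge = round(sum(left.values()))

matched_l = {l for l, _ in matched}
matched_r = {r for _, r in matched}
l_unmatched = [a for a in left if a not in matched_l]    # disappearing
r_unmatched = [a for a in right if a not in matched_r]   # appearing
init_q_dis = sum(left[a] for a in l_unmatched)
init_q_app = sum(right[a] for a in r_unmatched)

# average the charges of every matched pair
joint = 0.0
for l, r in matched:
    avg = (left[l] + right[r]) / 2.0
    left[l] = right[r] = avg
    joint += avg
joint_excess = joint - net_charge

# correction per unmatched atom so that each molecule keeps its net charge
l_delta = -(joint_excess + init_q_dis) / len(l_unmatched)
r_delta = -(joint_excess + init_q_app) / len(r_unmatched)
for a in l_unmatched:
    left[a] += l_delta
for a in r_unmatched:
    right[a] += r_delta
```

With these numbers the joint area carries -0.25 after averaging, so the single disappearing atom moves from 0.30 to 0.25 and the two appearing atoms each gain 0.025. Both molecules still sum to zero, and the two alchemical regions carry the same total charge.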

contains_same_atoms_symmetric

contains_same_atoms_symmetric(other_sup_top)

The atoms can be paired differently, but they are the same.

Source code in ties/topology_superimposer.py

def contains_same_atoms_symmetric(self, other_sup_top):
    """
    The atoms can be paired differently, but they are the same.
    """
    if len(self.nodes.symmetric_difference(other_sup_top.nodes)) == 0:
        return True

    return False

is_subgraph_of

is_subgraph_of(other_sup_top)

Checks if this superimposed topology is a subgraph of another superimposed topology. Or if any mirror topology is a subgraph.

Source code in ties/topology_superimposer.py

def is_subgraph_of(self, other_sup_top):
    """
    Checks if this superimposed topology is a subgraph of another superimposed topology.
    Or if any mirror topology is a subgraph.
    """
    # subgraph cannot be equivalent self.eq, it is only proper subgraph (ie proper subset)
    if len(self.matched_pairs) >= len(other_sup_top.matched_pairs):
        return False

    # self is smaller, so it might be a subgraph
    if other_sup_top.contains_all(self):
        return True

    # self is not a subgraph, but it could be a subgraph of one of the mirrors
    for mirror in self.mirrors:
        if other_sup_top.contains_all(mirror):
            return True

    # other is bigger than self, but self is not a subgraph of it
    return False

subgraph_relationship

subgraph_relationship(other_sup_top)

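Return:

- 1 if self (or one of its mirrors) is a supergraph of other,
- -1 if self (or one of its mirrors) is a subgraph of other,
- 0 if they have the same number of elements (regardless of what the nodes are), or if they differ in size but neither contains the other.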

Source code in ties/topology_superimposer.py

def subgraph_relationship(self, other_sup_top):
    """
    Return
    1 if self is a supergraph of other,
    -1 if self is a subgraph of other
    0 if they have the same number of elements (regardless of what the nodes are)
    """
    if len(self.matched_pairs) == len(other_sup_top.matched_pairs):
        return 0

    if len(self.matched_pairs) > len(other_sup_top.matched_pairs):
        # self is bigger than other,
        # check if self contains all nodes in other
        if self.contains_all(other_sup_top):
            return 1

        # other is not a subgraph, but check the mirrors if any of them are
        for mirror in self.mirrors:
            if mirror.contains_all(other_sup_top):
                return 1

        # other is smaller but not a subgraph of this graph or any of its mirrors
        return 0

    if len(self.matched_pairs) < len(other_sup_top.matched_pairs):
        # other is bigger, so self might be a subgraph
        # check if other contains all nodes in self
        if other_sup_top.contains_all(self):
            return -1

        # self is not a subgraph, but it could be a subgraph of one of the mirrors
        for mirror in self.mirrors:
            if other_sup_top.contains_all(mirror):
                return -1

        # other is bigger than self, but it is not a subgraph
        return 0
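The return convention can be reproduced on plain sets of pairs. The sketch below ignores mirrors and uses a hypothetical `relationship` function; it is meant to show that 0 covers two different situations.

```python
def relationship(a, b):
    """1 if a strictly contains b, -1 if b strictly contains a, else 0."""
    a, b = set(a), set(b)
    if len(a) == len(b):
        return 0
    if len(a) > len(b):
        return 1 if b <= a else 0
    return -1 if a <= b else 0


p1, p2, p3, p4, p5 = [("C%d" % i, "c%d" % i) for i in range(1, 6)]

bigger = relationship([p1, p2, p3], [p1, p2])
smaller = relationship([p1, p2], [p1, p2, p3])
same_size = relationship([p1, p2], [p3, p4])
unrelated = relationship([p1, p2, p3], [p4, p5])
```

Both `same_size` and `unrelated` evaluate to 0, so a caller that needs to tell these cases apart would have to compare the sizes itself.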

is_mirror_of

is_mirror_of(other_sup_top)

This is a naive check. fixme - check if the found superimposed topology is the same (ie the same matches), what then?

Some of the superimposed topologies represent symmetrical matches. For example, imagine T1A and T1B is a symmetrical version of T2A and T2B. This means that:

- the number of nodes in T1A, T1B, T2A, and T2B is the same,
- all the nodes in T1A are in T2A,
- all the nodes in T1B are in T2B.

Source code in ties/topology_superimposer.py

def is_mirror_of(self, other_sup_top):
    """
    this is a naive check
    fixme - check if the found superimposed topology is the same (ie the same matches), what then?

    some of the superimposed topologies represent symmetrical matches,
    for example, imagine T1A and T1B is a symmetrical version of T2A and T2B,
    this means that
     - the number of nodes in T1A, T1B, T2A, and T2B is the same
     - all the nodes in T1A are in T2A,
     - all the nodes in T1B are in T2B
    """

    if len(self.matched_pairs) != len(other_sup_top.matched_pairs):
        return False

    if self.contains_same_atoms_symmetric(other_sup_top):
        return True

    return False

eq

eq(sup_top)

Check if the superimposed topology is "the same". This means that every pair has a corresponding pair in the other topology (but possibly in a different order)

Source code in ties/topology_superimposer.py

def eq(self, sup_top):
    """
    Check if the superimposed topology is "the same". This means that every pair has a corresponding pair in the
    other topology (but possibly in a different order)
    """
    # fixme - should replace this with networkx
    if len(self) != len(sup_top):
        return False

    for pair in self.matched_pairs:
        # find for every pair the matching pair
        if not sup_top.contains(pair):
            return False

    return True

toJSON

toJSON()

Extract all the important information and return it as a summary dictionary.

Source code in ties/topology_superimposer.py

def toJSON(self):
    """
    Extract all the important information and return it as a summary dictionary.
    """
    summary = {}

    if self.config.unique_atom_names:
        # renamed atoms, new name : old name
        summary["renamed_atoms"] = {
            "start_ligand": {(a.name, a.id): a.original_name for a in self.top1},
            "end_ligand": {(a.name, a.id): a.original_name for a in self.top2},
        }

    # the dual topology information
    summary["superimposition"] = {
        "matched": {str(n1): str(n2) for n1, n2 in self.matched_pairs},
        "matched_id": {n1.id: n2.id for n1, n2 in self.matched_pairs},
        "appearing": list(map(str, self.get_appearing_atoms())),
        "disappearing": [str(a) for a in self.get_disappearing_atoms()],
        "appearing_id": [a.id for a in self.get_appearing_atoms()],
        "disappearing_id": [a.id for a in self.get_disappearing_atoms()],
        "removed": {  # because of:
            # replace atoms with their names
            "net_charge": [
                ((a1.name, a2.name), d)
                for (a1, a2), d in self._removed_due_to_net_charge
            ],
            "net_charge_id": [
                ((a1.id, a2.id), d)
                for (a1, a2), d in self._removed_due_to_net_charge
            ],
            "pair_q": [
                ((a1.name, a2.name), d)
                for (a1, a2), d in self._removed_pairs_with_charge_difference
            ],
            "pair_q_id": [
                ((a1.id, a2.id), d)
                for (a1, a2), d in self._removed_pairs_with_charge_difference
            ],
            "disjointed": [
                ((a1.name, a2.name),)
                for a1, a2 in self._removed_because_disjointed_cc
            ],
            "disjointed_id": [
                ((a1.id, a2.id),) for a1, a2 in self._removed_because_disjointed_cc
            ],
            "bonds": [
                ((a1.name, a2.name), d)
                for (a1, a2), d in self._removed_because_diff_bonds
            ],
            "unmatched_rings": [
                ((a1.name, a2.name), d)
                for (a1, a2), d in self._removed_because_unmatched_rings
            ],
        },
        "charges_delta": {
            "start_ligand": {
                a.name: a.charge - a._original_charge
                for a in self.top1
                if a._original_charge != a.charge
            },
            "end_ligand": {
                a.name: a.charge - a._original_charge
                for a in self.top2
                if a._original_charge != a.charge
            },
        },
    }
    summary["config"] = self.config.get_serializable()
    summary["internal"] = "atoms"

    return summary
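Note that the returned object is a dict, and that the `renamed_atoms` maps are keyed by `(a.name, a.id)` tuples. `json.dumps` accepts only str, int, float, bool or None as keys, so such a dict needs its keys converted before it can be written as JSON. How the library handles this when writing metadata is not shown here; the helper `stringify_keys` and the sample data below are hypothetical.

```python
import json


def stringify_keys(obj):
    """Recursively convert non-string dict keys (e.g. tuples) to strings."""
    if isinstance(obj, dict):
        return {
            (k if isinstance(k, str) else str(k)): stringify_keys(v)
            for k, v in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        return [stringify_keys(v) for v in obj]
    return obj


summary = {
    "renamed_atoms": {"start_ligand": {("C1", 1): "C"}},
    "matched_id": {1: 7},
}

# Tuple keys are rejected by json.dumps as-is.
try:
    json.dumps(summary)
    direct_ok = True
except TypeError:
    direct_ok = False

text = json.dumps(stringify_keys(summary))
restored = json.loads(text)
```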

get_largest

get_largest(lists)

Return a list of the largest solutions, that is, every entry whose length equals the maximum length found.

Source code in ties/topology_superimposer.py

def get_largest(lists):
    """
    return a list of largest solutions
    """
    solution_sizes = [len(st) for st in lists]
    largest_sol_size = max(solution_sizes)
    return list(filter(lambda st: len(st) == largest_sol_size, lists))
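A quick usage sketch with plain lists standing in for suptops (anything that supports `len` works). The function body is copied from the listing above; the sample data is made up.

```python
def get_largest(lists):
    """Return a list of the largest solutions."""
    solution_sizes = [len(st) for st in lists]
    largest_sol_size = max(solution_sizes)
    return list(filter(lambda st: len(st) == largest_sol_size, lists))


candidates = [["a"], ["b", "c"], ["d", "e"]]
largest = get_largest(candidates)

# An empty input has no maximum, so max() raises ValueError.
try:
    get_largest([])
    empty_ok = True
except ValueError:
    empty_ok = False
```

Ties are kept in their original order, and callers may want to guard against an empty input.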

long_merge

long_merge(suptop1, suptop2)

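Carry out a merge and apply all checks. Merge suptop2 into suptop1. Returns suptop1 unchanged if the two are the same object, are equal, or if suptop2 is a subgraph of suptop1; returns -1 if the two are not consistent; otherwise returns the list of newly added pairs.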

Source code in ties/topology_superimposer.py

def long_merge(suptop1, suptop2):
    """
    Carry out a merge and apply all checks.
    Merge suptop2 into suptop1.

    """
    if suptop1 is suptop2:
        return suptop1

    if suptop1.eq(suptop2):
        return suptop1

    if suptop2.is_subgraph_of(suptop1):
        return suptop1

    # check if the two are consistent
    # ie there are no clashes
    if not suptop1.is_consistent_with(suptop2):
        return -1

    # fixme - this can be removed because it is now taken care of in the other functions?
    # g1, g2 = suptop1.getNxGraphs()
    # assert len(nx.cycle_basis(g1)) == len(nx.cycle_basis(g2))
    # g3, g4 = suptop2.getNxGraphs()
    # assert len(nx.cycle_basis(g3)) == len(nx.cycle_basis(g4))
    #
    # assert suptop1.sameCircleNumber()
    newly_added_pairs = suptop1.merge(suptop2)

    # if not suptop1.sameCircleNumber():
    #     raise Exception('something off')
    # # remove sol2 from the solutions:
    # all_solutions.remove(sol2)
    return newly_added_pairs

merge_compatible_suptops

merge_compatible_suptops(suptops)

Imagine mapping of two carbons C1 and C2 to another pair of carbons C1' and C2'. If C1 was mapped to C1', and C2 to C2', and each created a suptop, then we have to join the two suptops.

fixme - appears to be doing too many combinations. Consider using a queue. Add the new combinations here rather than restarting again and again. You could keep a list of "combinations" in a queue, and each time you make a new element,

Source code in ties/topology_superimposer.py

def merge_compatible_suptops(suptops):
    """
    Imagine mapping of two carbons C1 and C2 to another pair of carbons C1' and C2'.
    If C1 was mapped to C1', and C2 to C2', and each created a suptop, then we have to join the two suptops.

    fixme - appears to be doing too many combinations
    Consider using a queue. Add the new combinations here rather than restarting again and again.
    You could keep a list of "combinations" in a queue, and each time you make a new element,

    """

    if len(suptops) == 1:
        return suptops

    # consider simplifying in case of "2"

    # keep track of which suptops have been used to build a bigger one
    # these can be likely later discarded
    ingredients = {}
    excluded = []
    while True:
        any_new_suptop = False
        for st1, st2 in itertools.combinations(suptops, r=2):
            if {st1, st2} in excluded:
                continue

            if st1 in ingredients.get(st2, []) or st2 in ingredients.get(st1, []):
                continue

            if st1.is_subgraph_of(st2) or st2.is_subgraph_of(st1):
                continue

            # fixme - verify this one
            if st1.eq(st2):
                continue

            # check if the two suptops are compatible
            elif st1.is_consistent_with(st2):
                # merge them!
                large_suptop = copy.copy(st1)
                # add both the pairs and the bonds that are not present in st1
                large_suptop.merge(st2)
                suptops.append(large_suptop)

                ingredients[large_suptop] = {st1, st2}.union(
                    ingredients.get(st1, set())
                ).union(ingredients.get(st2, set()))
                excluded.append({st1, st2})

                # break
                any_new_suptop = True

        if not any_new_suptop:
            break

    # flatten
    all_ingredients = list(itertools.chain(*ingredients.values()))

    # return the larger suptops, but not the constituents
    new_suptops = [st for st in suptops if st not in all_ingredients]
    return new_suptops
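The repeat-until-stable merging can be sketched with frozensets of `(left, right)` pairs in place of suptops. This is a simplified illustration rather than the library algorithm: `consistent` and `merge_compatible` are hypothetical, bonds and mirrors are ignored, and merging is a plain set union.

```python
import itertools


def consistent(a, b):
    """No atom is paired differently in a and b, and they share a pair."""
    l2r = dict(a)
    r2l = {r: l for l, r in a}
    clash = any(l2r.get(l, r) != r or r2l.get(r, l) != l for l, r in b)
    return not clash and bool(a & b)


def merge_compatible(suptops):
    """Merge compatible pairings until nothing new appears; drop the ingredients."""
    suptops = list(suptops)
    ingredients = set()
    changed = True
    while changed:
        changed = False
        for a, b in itertools.combinations(list(suptops), 2):
            if a <= b or b <= a or not consistent(a, b):
                continue
            merged = a | b
            ingredients.update({a, b})
            if merged not in suptops:
                suptops.append(merged)
                changed = True
    # return the larger pairings, but not the ones used to build them
    return [s for s in suptops if s not in ingredients]


s1 = frozenset({("C1", "c1"), ("C2", "c2")})
s2 = frozenset({("C2", "c2"), ("C3", "c3")})
s3 = frozenset({("C9", "c9")})

result = merge_compatible([s1, s2, s3])
```

Here `s1` and `s2` share the pair (C2, c2) and are joined into one three-pair solution, while `s3` has no overlap with anything and is returned as it is.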

superimpose_topologies

superimpose_topologies(top1_nodes, top2_nodes, pair_charge_atol=0.1, use_charges=True, use_coords=True, starting_node_pairs=None, force_mismatch=None, disjoint_components=False, net_charge_filter=True, net_charge_threshold=0.1, redistribute_charges_over_unmatched=True, ligA_pmd=None, ligB_pmd=None, align_molecules=True, partial_rings_allowed=False, ignore_charges_completely=False, ignore_bond_types=True, use_rmsd=True, use_general_type=True, use_only_element=False, starting_pairs_heuristics=0.2, starting_pair_seed=None, logging_key=None, config=None)

The main function that manages the entire process.

TODO: - check if each molecule topology is connected
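
The connectivity check named in the TODO is not implemented in the source shown here. One way it could be done is a breadth-first search over the bonds, sketched below. The input format (a dict mapping each atom name to its bonded neighbours) and the function name `is_connected` are assumptions made for illustration, not part of the module.

```python
from collections import deque


def is_connected(bonds_by_atom):
    """Return True if every atom is reachable from the first one (hypothetical helper)."""
    if not bonds_by_atom:
        return True
    start = next(iter(bonds_by_atom))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in bonds_by_atom[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(bonds_by_atom)


molecule = {"C1": ["C2"], "C2": ["C1", "O1"], "O1": ["C2"]}
with_stray_ion = dict(molecule, Na=[])
```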

Source code in ties/topology_superimposer.py

def superimpose_topologies(
    top1_nodes,
    top2_nodes,
    pair_charge_atol=0.1,
    use_charges=True,
    use_coords=True,
    starting_node_pairs=None,
    force_mismatch=None,
    disjoint_components=False,
    net_charge_filter=True,
    net_charge_threshold=0.1,
    redistribute_charges_over_unmatched=True,
    ligA_pmd=None,
    ligB_pmd=None,
    align_molecules=True,
    partial_rings_allowed=False,
    ignore_charges_completely=False,
    ignore_bond_types=True,
    use_rmsd=True,
    use_general_type=True,
    use_only_element=False,
    starting_pairs_heuristics=0.2,
    starting_pair_seed=None,
    logging_key=None,
    config=None,
):
    """
    The main function that manages the entire process.

    TODO:
    - check if each molecule topology is connected
    """

    if config is not None and config.logging_breakdown:
        file_log_handler = logging.FileHandler(config.workdir / f"{logging_key}.log")
        file_log_handler.setLevel(config.logging_level)
        file_log_handler.setFormatter(config.logging_formatter)
        logger.addHandler(file_log_handler)

    if not ignore_charges_completely:
        SuperimposedTopology.validate_charges(top1_nodes, top2_nodes)

    # deal with the situation where the config is not passed
    if config is None:
        weights = None
        align_add_removed_mcs = False
        use_rdkit_mcs = False
    else:
        # tmp solution
        weights = config.weights_ratio
        align_add_removed_mcs = config.align_add_removed_mcs
        use_rdkit_mcs = config.use_rdkit_mcs

    # Get the superimposed topology(/ies).
    suptops = _superimpose_topologies(
        top1_nodes,
        top2_nodes,
        ligA_pmd,
        ligB_pmd,
        starting_node_pairs=starting_node_pairs,
        use_rmsd=use_rmsd,
        use_general_type=use_general_type,
        starting_pairs_heuristics=starting_pairs_heuristics,
        starting_pairs=starting_pair_seed,
        weights=weights,
        use_rdkit_mcs=use_rdkit_mcs,
    )
    if not suptops:
        warnings.warn("Did not find a single superimposition state.")
        return None

    logger.debug(f"Phase 1: The number of SupTops found: {len(suptops)}")
    for i, st in enumerate(suptops):
        logger.debug(f"ST - {i} - len: {len(st)} - {st}")

    # ignore bond types
    # they are ignored when creating the run file with tleap anyway
    for st in suptops:
        # fixme - transition to config
        st.ignore_bond_types = ignore_bond_types

    # link the suptops to their original molecule data
    for suptop in suptops:
        # fixme - transition to config
        suptop.set_tops(top1_nodes, top2_nodes)
        suptop.set_parmeds(ligA_pmd, ligB_pmd)

    # align the 3D coordinates before applying further changes
    # use the largest suptop to align the molecules
    if align_molecules and use_rmsd:

        def take_largest(x, y):
            return x if len(x) > len(y) else y

        # align once on the largest suptop and report the RMSD of that overlay
        largest_suptop = reduce(take_largest, suptops)
        rmsd = largest_suptop.align_ligands_using_mcs()
        logger.debug(f"RMSD of the best overlay: {rmsd:.2f}")

    # fixme - you might not need because we are now doing this on the way back
    # if useCoords:
    #     for sup_top in sup_tops:
    #         sup_top.correct_for_coordinates()

    # mismatch atoms as requested
    if force_mismatch:
        for sp in suptops:
            for a1, a2 in sp.matched_pairs[::-1]:
                if (a1.name, a2.name) in force_mismatch:
                    sp.remove_node_pair((a1, a2))
                    logger.debug(f"Removing the pair: {((a1, a2))}, as requested")

    # ensure that ring-atoms are not matched to non-ring atoms
    for st in suptops:
        st.ringring()

    # introduce exceptions to the atom types so that certain
    # different atom types are seen as the same
    # ie allow to swap cc-cd with cd-cc (and other pairs)
    for st in suptops:
        st.match_gaff2_nondirectional_bonds()

    # remove matched atom pairs that have a different specific atom type
    if not use_only_element:
        for st in suptops:
            # fixme - rename
            st.enforce_matched_atom_types_are_the_same()

    # ensure that the bonds are used correctly.
    # If the bonds disagree, but atom types are the same, remove both bonded pairs
    # we cannot have A-B where the bonds are different. In this case, we have A-B=C and A=B-C in a ring,
    # we could in theory remove A,B,C which makes sense as these will show slightly different behaviour,
    # and this way we avoid tensions in the bonds, and represent both
    # fixme - apparently we are not relying on these?
    # turned off as this is reflected in the atom type
    if not ignore_bond_types and False:
        for st in suptops:
            removed = st.removeMatchedPairsWithDifferentBonds()
            if removed:
                logger.debug(f"Removed bonded pairs due to different bonds: {removed}")

    if not partial_rings_allowed:
        # remove partial rings, note this is a cascade problem if there are double rings
        for suptop in suptops:
            suptop.enforce_no_partial_rings()
            logger.debug(
                f"Removed pairs because partial rings are not allowed {suptop._removed_because_unmatched_rings}"
            )

    # note that charges need to be checked before assigning IDs.
    # ie if charges are different, the matched pair
    # becomes two different atoms with different IDs
    if use_charges and not ignore_charges_completely:
        for sup_top in suptops:
            removed = sup_top.unmatch_pairs_with_different_charges(
                atol=pair_charge_atol
            )
            if removed:
                logger.debug(
                    f"Removed pairs with charge incompatibility: "
                    f"{[(s[0], f'{s[1]:.3f}') for s in sup_top._removed_pairs_with_charge_difference]}"
                )

    if net_charge_filter and not ignore_charges_completely:
        # Note that we apply this rule to each suptop.
        # This is because we are only keeping one suptop right now.
        # However, if disjointed components are allowed, these numbers might change.
        # ensure that each suptop component has net charge differences < 0.1
        # Furthermore, disjointed components have not yet been applied,
        # even though it might have an effect, fixme - should disjointed be applied first?
        # to account for this implement #251
        logger.debug(f"Accounting for net charge limit of {net_charge_threshold:.3f}")
        for suptop in suptops[::-1]:
            suptop.apply_net_charge_filter(net_charge_threshold)

            # remove the suptop from the list if it's empty
            if len(suptop) == 0:
                suptops.remove(suptop)
                continue

            # Display information
            if suptop._removed_due_to_net_charge:
                logger.debug(
                    f"SupTop: Removed pairs due to net charge: "
                    f"{[[p[0], f'{p[1]:.3f}'] for p in suptop._removed_due_to_net_charge]}"
                )

    # remove the suptops that are empty
    for st in suptops[::-1]:
        if len(st) == 0:
            suptops.remove(st)

    if not disjoint_components:
        logger.debug(f"Checking for disjoint components in the {len(suptops)} suptops")
        # ensure that each suptop represents one CC
        # check if the graph was divided after removing any pairs (e.g. due to charge mismatch)
        # fixme - add the log about which atoms are removed?
        for st in suptops:
            st.largest_cc_survives()

        for st in suptops:
            logger.debug(
                f"Removed disjoint components: {st._removed_because_disjointed_cc}"
            )

        # fixme
        # remove the smaller suptop, or one arbitrary if they are equivalent
        # if len(suptops) > 1:
        #     max_len = max([len(suptop) for suptop in suptops])
        #     for suptop in suptops[::-1]:
        #         if len(suptop) < max_len:
        #             suptops.remove(suptop)
        #
        #     # if there are equal length suptops left, take only the first one
        #     if len(suptops) > 1:
        #         suptops = [suptops[0]]
        #
        # assert len(suptops) == 1, suptops

    if len(suptops) == 0:
        return None

    suptop = extract_best_suptop(suptops, use_rmsd, weights=weights, get_list=False)

    if redistribute_charges_over_unmatched and not ignore_charges_completely:
        # assume that none of the suptops are disjointed
        logger.debug("Assuming that all suptops are separate at this point")
        # fixme: apply distribution of q only on the first st, that's the best one anyway,

        # we only want to apply redistribution once on the largest piece for now
        suptop.redistribute_charges()

    # atom ID assignment has to come after any removal of atoms due to their mismatching charges
    suptop.assign_atoms_ids(1)

    # there might be several best solutions, order them according to the RMSDs
    # suptops.sort(key=lambda st: st.rmsd())

    # fixme - remove the hydrogens without attached heavy atoms

    # resolve_sup_top_multiple_match(sup_tops_charges)
    # sup_top_correct_chirality(sup_tops_charges, sup_tops_no_charges, atol=atol)

    logger.info("-------- Summary -----------")
    logger.info(
        f"Matched pairs: {len(suptop.matched_pairs)} out of {len(top1_nodes)}L/{len(top2_nodes)}R"
    )
    logger.info(
        f"Disappearing atoms: {(len(top1_nodes) - len(suptop.matched_pairs)) / len(top1_nodes) * 100:.1f}%"
    )
    logger.info(
        f"Appearing atoms: {(len(top2_nodes) - len(suptop.matched_pairs)) / len(top2_nodes) * 100:.1f}%"
    )

    # carry out a check. Each
    if align_molecules and not use_rmsd:
        main_rmsd = suptop.align_ligands_using_mcs()
        for mirror in suptop.mirrors:
            mirror_rmsd = mirror.align_ligands_using_mcs()
            if mirror_rmsd < main_rmsd:
                logger.debug("THE MIRROR RMSD IS LOWER THAN THE MAIN RMSD")
        rmsd = suptop.align_ligands_using_mcs(
            overwrite_original=True, use_disjointed=align_add_removed_mcs
        )
        logger.info(f"Aligned Common Area RMSD: {rmsd:.2f}")

    A_minus_B, A_minus_B_max, B_minus_A, B_minus_A_max = (
        suptop.alchemical_overlap_check()
    )
    logger.info(
        f"Alchemical Area Overlap:\n"
        f"\tRMS(A-B): {A_minus_B:.2f} Angstrom\n"
        f"\tmax(A-B): {A_minus_B_max:.2f} Angstrom\n"
        f"\tRMS(B-A): {B_minus_A:.2f} Angstrom\n"
        f"\tmax(B-A): {B_minus_A_max:.2f} Angstrom"
    )

    if config is not None and config.logging_breakdown:
        logger.removeHandler(file_log_handler)

    return suptop
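
Two of the steps above are easy to show in isolation: dropping pairs whose partial charges differ by more than `pair_charge_atol`, and the percentages printed in the summary. The sketch below is an approximation under stated assumptions. The helper names and the tuple-based atom format are hypothetical, and whether the library compares with `<=` or `<` is not visible in this listing.

```python
def filter_by_charge(pairs, atol=0.1):
    """Split pairs into kept/removed by absolute charge difference (assumed rule)."""
    kept, removed = [], []
    for left, right in pairs:
        # each atom is modelled as (name, charge)
        if abs(left[1] - right[1]) <= atol:
            kept.append((left, right))
        else:
            removed.append((left, right))
    return kept, removed


def summarise(n_matched, n_left, n_right):
    """Reproduce the arithmetic of the summary log lines."""
    return {
        "disappearing_pct": (n_left - n_matched) / n_left * 100,
        "appearing_pct": (n_right - n_matched) / n_right * 100,
    }


pairs = [
    (("C1", -0.10), ("C1", -0.12)),  # difference 0.02, kept
    (("O1", -0.50), ("O1", -0.20)),  # difference 0.30, removed
]
kept, removed = filter_by_charge(pairs, atol=0.1)
summary = summarise(n_matched=2, n_left=4, n_right=8)
```

With 2 matched pairs out of 4 left and 8 right atoms, half of the left atoms disappear and three quarters of the right atoms appear.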

is_mirror_of_one

is_mirror_of_one(candidate_suptop, suptops, ignore_coords, extract_weight_ratio)

"Mirror" in the sense that it is an alternative topological way to traverse the molecule.

extract_weight_ratio: refers to the extract_best_suptop function parameter

Of the two mirrors, the one with the better fit is kept.

Source code in ties/topology_superimposer.py

def is_mirror_of_one(candidate_suptop, suptops, ignore_coords, extract_weight_ratio):
    """
    "Mirror" in the sense that it is an alternative topological way to traverse the molecule.

    extract_weight_ratio: refers to the extract_best_suptop function parameter

    Depending on the "better" fit between the two mirrors, we pick the one that is better.
    """
    for next_suptop in suptops:
        if next_suptop.is_mirror_of(candidate_suptop):
            # the suptop saved as the mirror should be the suptop
            # that is judged to be of a lower quality
            best_suptop = extract_best_suptop(
                [candidate_suptop, next_suptop],
                ignore_coords,
                weights=extract_weight_ratio,
            )

            if next_suptop is best_suptop:
                next_suptop.add_mirror_suptop(candidate_suptop)
            else:
                suptops.remove(next_suptop)
                suptops.append(candidate_suptop)

            return True

    return False
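
The bookkeeping can be sketched with a toy class. Everything here is an assumption for illustration: `ToyMapping`, its `rmsd` score and the `register` helper are hypothetical, and "mirror" is approximated as "covers the same atoms on both sides but pairs them differently". Unlike the listing above, this sketch stores the weaker mapping as a mirror of the stronger one in both branches.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMapping:
    pairs: frozenset
    rmsd: float  # hypothetical quality score, lower is better
    mirrors: list = field(default_factory=list)

    def is_mirror_of(self, other):
        same_left = {l for l, _ in self.pairs} == {l for l, _ in other.pairs}
        same_right = {r for _, r in self.pairs} == {r for _, r in other.pairs}
        return same_left and same_right and self.pairs != other.pairs


def register(candidate, accepted):
    """Return True if the candidate was absorbed as a mirror of an accepted mapping."""
    for i, existing in enumerate(accepted):
        if existing.is_mirror_of(candidate):
            if existing.rmsd <= candidate.rmsd:
                best, worse = existing, candidate
            else:
                best, worse = candidate, existing
            best.mirrors.append(worse)
            accepted[i] = best
            return True
    return False


a = ToyMapping(frozenset({("C1", "C1"), ("C2", "C2")}), rmsd=0.5)
b = ToyMapping(frozenset({("C1", "C2"), ("C2", "C1")}), rmsd=0.2)
other = ToyMapping(frozenset({("N1", "N1")}), rmsd=0.1)
accepted = [a]
absorbed = register(b, accepted)
unrelated = register(other, accepted)
```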

generate_nxg_from_list

generate_nxg_from_list(atoms)

Helper function. Generates a graph from a list of atoms.

@parameter atoms: follow the internal format for atoms

Source code in ties/topology_superimposer.py

def generate_nxg_from_list(atoms):
    """
    Helper function. Generates a graph from a list of atoms
    @parameter atoms: follow the internal format for atoms
    """
    g = nx.Graph()
    # add atoms
    g.add_nodes_from(atoms)
    # add all the edges
    for a in atoms:
        # add the edges from nA
        for a_bonded in a.bonds:
            g.add_edge(a, a_bonded.atom)

    return g
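
The listing relies on each atom exposing a `bonds` list whose items have an `atom` attribute. The sketch below mimics that shape with hypothetical stand-in classes (`ToyAtom`, `ToyBond`) and builds a plain adjacency dict instead of a networkx graph, so the traversal can be followed without the library. It is an illustration of the assumed internal format, not the module's own Atom class.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class ToyAtom:
    name: str
    # repr=False avoids infinite recursion through the bond cycle
    bonds: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class ToyBond:
    atom: ToyAtom


def bond(a, b):
    """Register a bond on both atoms, as the internal format appears to do."""
    a.bonds.append(ToyBond(b))
    b.bonds.append(ToyBond(a))


def adjacency(atoms):
    """Same traversal as the listing, but collecting names into a dict of sets."""
    graph = {a.name: set() for a in atoms}
    for a in atoms:
        for a_bonded in a.bonds:
            graph[a.name].add(a_bonded.atom.name)
            graph.setdefault(a_bonded.atom.name, set()).add(a.name)
    return graph


c, o, h = ToyAtom("C1"), ToyAtom("O1"), ToyAtom("H1")
bond(c, o)
bond(o, h)
graph = adjacency([c, o, h])
```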

get_starting_configurations

get_starting_configurations(left_atoms, right_atoms, fraction=0.2, filter_ring_c=True)

Minimise the number of starting configurations to optimise the process speed.

Use:

* the rarity of the specific atom types,
* whether the atoms are bottlenecks (so they do not suffer from symmetry). The issue with symmetry is that it is impossible to find the proper symmetry match if you start from the wrong symmetry.

@parameter fraction: ensure that the number of atoms used to start the traversal is not more than the fraction value of the overall number of possible matches, counted as a fraction of the maximum possible number of pairs (MIN(LEFTNODES, RIGHTNODES))

@parameter filter_ring_c: filter out the carbon elements in the rings to avoid any issues with the symmetry. This assumes that a ring usually has one N element, etc.

Source code in ties/topology_superimposer.py

def get_starting_configurations(
    left_atoms, right_atoms, fraction=0.2, filter_ring_c=True
):
    """
    Minimise the number of starting configurations to optimise the process speed.
    Use:
     * the rarity of the specific atom types,
     * whether the atoms are bottlenecks (so they do not suffer from symmetry).
        The issue with symmetry is that it is impossible to find the proper
        symmetry match if you start from the wrong symmetry.
    @parameter fraction: ensure that the number of atoms used to start the traversal is not more
        than the fraction value of the overall number of possible matches, counted as
        a fraction of the maximum possible number of pairs (MIN(LEFTNODES, RIGHTNODES))
    @parameter filter_ring_c: filter out the carbon elements in the rings to avoid any issues
        with the symmetry. This assumes that a ring usually has one N element, etc.


    """
    logger.debug(
        "Superimposition: optimising the search by narrowing down the starting configuration. "
    )

    # ignore hydrogens
    left_atoms_noh = list(filter(lambda a: a.element != "H", left_atoms))
    right_atoms_noh = list(filter(lambda a: a.element != "H", right_atoms))

    # find out which atoms types are common across the two molecules
    # fixme - consider subclassing atom from MDAnalysis class and adding functions for some of these features
    # first, find the unique types for each molecule
    left_types = {left_atom.type for left_atom in left_atoms_noh}
    right_types = {right_atom.type for right_atom in right_atoms_noh}
    common_types = left_types.intersection(right_types)

    # for each atom type, check how many maximum atoms can theoretically be matched
    per_type_max_counter = {}
    for atom_type in common_types:
        left_count_by_type = sum(
            [1 for left_atom in left_atoms if left_atom.type == atom_type]
        )
        right_count_by_type = sum(
            [1 for right_atom in right_atoms if right_atom.type == atom_type]
        )
        per_type_max_counter[atom_type] = min(left_count_by_type, right_count_by_type)
    max_overlap_size = sum(per_type_max_counter.values())
    logger.debug(f"Largest MCS size: {max_overlap_size}")

    left_atoms_starting = left_atoms_noh[:]
    right_atoms_starting = right_atoms_noh[:]

    # ignore carbons in cycles
    # fixme - we should not use this for macrocycles, which should be ignored here
    if filter_ring_c:
        nxl = generate_nxg_from_list(left_atoms)
        for cycle in nx.cycle_basis(nxl):
            # ignore the carbons in the cycle
            cycle_carbons = list(filter(lambda a: a.element == "C", cycle))
            logger.debug(
                f"Superimposition of left atoms: Ignoring carbons as starting configurations because "
                f"they are carbons in a cycle: {cycle_carbons}"
            )
            [
                left_atoms_starting.remove(a)
                for a in cycle_carbons
                if a in left_atoms_starting
            ]
        nxr = generate_nxg_from_list(right_atoms_starting)
        for cycle in nx.cycle_basis(nxr):
            # ignore the carbons in the cycle
            cycle_carbons = list(filter(lambda a: a.element == "C", cycle))
            logger.debug(
                f"Superimposition of right atoms: Ignoring carbons as starting configurations because "
                f"they are carbons in a cycle: {cycle_carbons}"
            )
            [
                right_atoms_starting.remove(a)
                for a in cycle_carbons
                if a in right_atoms_starting
            ]

    # find out which atoms types are common across the two molecules
    # fixme - consider subclassing atom from MDAnalysis class and adding functions for some of these features
    # first, find the unique types for each molecule
    left_types = {left_atom.type for left_atom in left_atoms_starting}
    right_types = {right_atom.type for right_atom in right_atoms_starting}
    common_types = left_types.intersection(right_types)

    # for each atom type, check how many maximum atoms can theoretically be matched
    paired_by_type = []
    max_after_cycle_carbons = 0
    for atom_type in common_types:
        picked_left = list(filter(lambda a: a.type == atom_type, left_atoms_starting))
        picked_right = list(filter(lambda a: a.type == atom_type, right_atoms_starting))
        paired_by_type.append([picked_left, picked_right])
        max_after_cycle_carbons += min(len(picked_left), len(picked_right))
    logger.debug(
        f"Superimposition: simple max match of atoms after cycle carbons exclusion: {max_after_cycle_carbons}"
    )

    # sort atoms according to their type rarity
    # use the min across, since 1x4 mapping will give 4 options only, so we count this as one,
    # but 4x4 would give 16
    sorted_paired_by_type = sorted(
        paired_by_type, key=lambda p: min(len(p[0]), len(p[1]))
    )

    # find the atoms in each type and generate appropriate pairs,
    # use only a fraction of the maximum theoretical match
    desired_number_of_pairs = int(fraction * max_overlap_size)

    starting_configurations = []
    added_counter = 0
    for rare_left_atoms, rare_right_atoms in sorted_paired_by_type:
        # starting_configurations
        starting_configurations.extend(
            list(itertools.product(rare_left_atoms, rare_right_atoms))
        )
        added_counter += min(len(rare_left_atoms), len(rare_right_atoms))
        if added_counter > desired_number_of_pairs:
            break

    logger.debug(
        f"Superimposition: initial starting pairs for the search: {starting_configurations}"
    )
    return starting_configurations
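
The rarity-based selection at the end of the function can be reduced to a few lines. This is a simplified sketch under stated assumptions: atoms are plain (name, type) tuples, hydrogens and ring carbons are taken to be filtered out already, and `pick_starting_pairs` is a hypothetical name. The real function computes the maximum overlap before the ring-carbon filter, which this sketch does not reproduce.

```python
import itertools
from collections import defaultdict


def pick_starting_pairs(left, right, fraction=0.2):
    """Pair atoms of the rarest shared types first, up to a fraction of the max overlap."""
    left_by_type, right_by_type = defaultdict(list), defaultdict(list)
    for name, atom_type in left:
        left_by_type[atom_type].append(name)
    for name, atom_type in right:
        right_by_type[atom_type].append(name)

    # sorted() on the type names keeps the result deterministic for equal counts
    common = sorted(left_by_type.keys() & right_by_type.keys())
    groups = sorted(
        ((left_by_type[t], right_by_type[t]) for t in common),
        key=lambda g: min(len(g[0]), len(g[1])),
    )
    max_overlap = sum(min(len(ls), len(rs)) for ls, rs in groups)
    wanted = int(fraction * max_overlap)

    pairs, added = [], 0
    for lefts, rights in groups:
        pairs.extend(itertools.product(lefts, rights))
        added += min(len(lefts), len(rights))
        if added > wanted:
            break
    return pairs


left = [("N1", "n"), ("C1", "c3"), ("C2", "c3"), ("C3", "c3")]
right = [("N9", "n"), ("C7", "c3"), ("C8", "c3")]
rare_only = pick_starting_pairs(left, right, fraction=0.2)
everything = pick_starting_pairs(left, right, fraction=1.0)
```

With the default fraction only the single nitrogen pair is used as a starting point, while `fraction=1.0` also adds all six carbon combinations.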

get_atoms_bonds_from_file

get_atoms_bonds_from_file(ref_filename, mob_filename, use_general_type=True)

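Use ParmEd to load the files.

Returns a 6-tuple, in this order:

1) the atoms and the bonds of the reference file

2) the atoms and the bonds of the mobile file

3) the ParmEd structures of the reference and of the mobile file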

Source code in ties/topology_superimposer.py

def get_atoms_bonds_from_file(ref_filename, mob_filename, use_general_type=True):
    """
    Use Parmed to load the files.

    Returns a 6-tuple: the reference atoms and bonds, the mobile atoms and bonds,
    and the two ParmEd structures (reference, mobile).
    """

    universe_ref_atoms, universe_ref_bonds, ref = get_atoms_bonds_and_parmed_structure(
        ref_filename, use_general_type=use_general_type
    )
    universe_mob_atoms, universe_mob_bonds, mobile = (
        get_atoms_bonds_and_parmed_structure(
            mob_filename, use_general_type=use_general_type
        )
    )

    return (
        universe_ref_atoms,
        universe_ref_bonds,
        universe_mob_atoms,
        universe_mob_bonds,
        ref,
        mobile,
    )

assign_coords_from_pdb

assign_coords_from_pdb(atoms, pdb_atoms)

Match the atoms from the ParmEd object based on a .pdb file and overwrite the coordinates from ParmEd.

:param atoms: internal Atom representation (fixme: refer to it here in docu), will have their coordinates overwritten.

:param pdb_atoms: atoms loaded with ParmEd with the coordinates to be used

Source code in ties/topology_superimposer.py

def assign_coords_from_pdb(atoms, pdb_atoms):
    """
    Match the atoms from the ParmEd object based on a .pdb file
    and overwrite the coordinates from ParmEd.
    :param atoms: internal Atom representation (fixme: refer to it here in docu),
        will have their coordinates overwritten.
    :param pdb_atoms: atoms loaded with ParmEd with the coordinates to be used

    """
    for atom in atoms:
        # find the corresponding atom
        found_match = False
        for pdb_atom in pdb_atoms.atoms:
            if pdb_atom.name.upper() == atom.name.upper():
                # charges?
                atom.position = (pdb_atom.xx, pdb_atom.xy, pdb_atom.xz)
                found_match = True
                break
        if not found_match:
            logger.error(f"Did not find atom {atom.name} in the PDB atoms")
            raise ValueError(f"No PDB atom matches the name {atom.name}")
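
The same case-insensitive name matching can be sketched with a dictionary lookup instead of a nested loop. This is an illustrative variant, not the library's code: `assign_coords` is a hypothetical name, the atoms are `SimpleNamespace` stand-ins, and the PDB atoms are passed as a plain list rather than a ParmEd structure. As in the listing, the first atom with a given name wins, so the sketch assumes atom names are unique.

```python
from types import SimpleNamespace


def assign_coords(atoms, pdb_atoms):
    """Copy coordinates onto atoms by case-insensitive name (hypothetical helper)."""
    by_name = {}
    for pdb_atom in pdb_atoms:
        # setdefault keeps the first occurrence of a duplicated name
        by_name.setdefault(pdb_atom.name.upper(), pdb_atom)

    for atom in atoms:
        match = by_name.get(atom.name.upper())
        if match is None:
            raise ValueError(f"No PDB atom matches the name {atom.name}")
        atom.position = (match.xx, match.xy, match.xz)


atoms = [SimpleNamespace(name="c1", position=None)]
pdb_atoms = [SimpleNamespace(name="C1", xx=1.0, xy=2.0, xz=3.0)]
assign_coords(atoms, pdb_atoms)
```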
