1. AMSConformers

1.1. Python library for conformer generation

The conformers package holds a collection of tools for the generation of conformer sets for a molecule. It makes heavy use of the AMS Python library PLAMS. The main tools for conformer generation in the package are:

  • CREST conformer generation 1

  • RDKit ETKDG conformer generation 2

The top level classes in the conformers package are the UniqueConformers classes:

  • UniqueConformersCrest

  • UniqueConformersRMSD

  • UniqueConformersTFD

  • UniqueConformersAMS

The main task of the conformer class is to hold the conformers of a molecule and prune/filter out duplicates. There are several ways duplicates can be defined :

  • CREST duplicate recognition (UniqueConformersCrest) 1

  • Duplicate recognition based on RMSD (UniqueConformersRMSD)

  • Duplicate recognition based on the Torsion Fingerprint Difference (UniqueConformersTFD) 3

  • Duplicate recognition based on interatomic distances and torsion angles (UniqueConformersAMS)

Out of these conformer classes we recommend UniqueConformersCrest as the best compromise between accuracy and efficiency in detecting and filtering out duplicates.

In addition to duplicate pruning, all UniqueConformers classes are also linked to a default conformer generation method (which can be changed by the user). The conformer generation methods in turn are linked to default geometry optimization settings (also changeable by the user).

To create a conformer set, an empty instance of the conformers object is created, and then conformers can be added to it, read from file, or generated. The conformers in the set can then be manipulated and compared.

1.2. Simple example

In this example, conformers are generated using the RDKit ETKDG routines, and duplicates are filtered out using the AMS approach.

from scm.plams import Settings, from_smiles
from scm.plams import init, finish
from scm.conformers import UniqueConformersCrest

# Set up the molecular data
smiles = "CCC"
mol = from_smiles(smiles)
conformers = UniqueConformersCrest()
conformers.prepare_state(mol)

# Set up PLAMS file handling
init()

# Generate the conformers using the RDKit ETKDG method
# This method includes geometry optimization and pruning
conformers.generate(method="rdkit", nproc=1)

finish()

print("NConformers: ", len(conformers))
print(conformers)

1.3. Comparing conformer sets

In this example, two conformer sets are read in from file and evaluated based on three criteria:

  1. The energy of the most stable conformer in the set

  2. The number of unique conformers in the set

  3. The number of conformer-clusters (similar conformers) in the set

import os
import numpy
from scm.plams import Molecule, DCDTrajectoryFile
from scm.conformers import UniqueConformersAMS, UniqueConformersCrest

# import matplotlib
# matplotlib.use("TkAgg")

# Define some filenames
pathname = "."
filename = os.path.join(pathname, "mol.xyz")
mol = Molecule(filename)
print("nats: ", len(mol))

# Read in the enumerated conformers (the larger set)
conformers_enum = UniqueConformersAMS()
conformers_enum.accept_all = True
conformers_enum.prepare_state(mol)
conformers_enum.read(pathname, "ENUM")
print("Original number of enumerated conformers: ", len(conformers_enum))

# Read in the Crest conformers
conformers_crest = UniqueConformersAMS()
conformers_crest.accept_all = True
conformers_crest.prepare_state(mol)
conformers_crest.read(pathname, "crest")
print("Original number of CREST conformers: ", len(conformers_crest))

# Combine the two sets into one big set (remembering which conformers came from which set)
all_conformers, (indices1, indices2) = conformers_enum + conformers_crest
names = all_conformers.indices_to_names(indices1, indices2)
print("Total number of conformers: ", len(all_conformers))
print("Indices CREST conformers in full set:")
print(indices2)

# Check the lowest energy of the CREST set
print("\n=================================")
e_diff = conformers_crest.energies[0] - min(all_conformers.energies)
print("Lowest energy (CREST wrt enumeration)                  : %20.10f kcal/mol" % (e_diff))
print("Quality lowest energy conformer CREST (between 0 and 1): ", numpy.exp(-e_diff))
print("=================================\n")

# Print the percentage of conformers found by the testset
print("=================================")
print(
    "Fraction of conformers found by CREST      : ",
    len(conformers_crest) / len(all_conformers),
)
print(
    "Fraction of conformers found by enumeration: ",
    len(conformers_enum) / len(all_conformers),
)
print("=================================\n")

# Make a dendrogram plot for the big set
dendrogram = all_conformers.get_dendrogram()
figure = all_conformers.get_plot_dendrogram(dendrogram, names=names, fontsize=8)
figure.savefig("dendro.jpg")

# Now assign conformers to clusters
cluster_indices1, cluster_indices2 = all_conformers.find_clusters(0.11, "distance", indices=(indices1, indices2))
nclusters = max(cluster_indices1 + cluster_indices2)
print("=================================")
print("nclusters: ", nclusters)
print("Cluster number of each member of CREST set")
print(cluster_indices2)
print("Fraction of clusters in CREST set     : ", len(set(cluster_indices2)) / nclusters)
print("Fraction of clusters in enumerated set: ", len(set(cluster_indices1)) / nclusters)
print("=================================")

References

1(1,2)

DOI: 10.1039/c9cp06869d

2

DOI: 10.1021/acs.jcim.5b00654

3
  1. Schulz-Gasch, C. Schaerfe, W. Guba, M. Rarey, ‘TFD: Torsion Fingerprints As a New Measure To Compare Small Molecule Conformations’, 2012, 52, 6, 1499-1512r. DOI: 10.1021/ci2002318