RDKit interface¶
RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python. PLAMS interface to RDKit originates from QMFlows project and features functions for generating PLAMS molecules from string representations (SMARTS, SMILES) as well as a handful of tools for dealing with proteins and PDB files.
-
add_Hs(mol, forcefield=None, return_rdmol=False)[source]¶ Add hydrogens to protein molecules read from PDB. Makes sure that the hydrogens get the correct PDBResidue info.
- Parameters
- Returns
A molecule with explicit hydrogens added
- Return type
Moleculeor rdkit.Chem.Mol
-
apply_reaction_smarts(mol, reaction_smarts, complete=False, forcefield=None, return_rdmol=False)[source]¶ Applies reaction smirks and returns product. If returned as a PLAMS molecule, thismolecule.properties.orig_atoms is a list of indices of atoms that have not been changed (which can for example be used partially optimize new atoms only with the freeze keyword)
- Parameters
mol (
Moleculeor rdkit.Chem.Mol) – molecule to be modifiedreactions_smarts (str) – Reactions smarts to be applied to molecule
complete (bool or float (value between 0 and 1)) – Apply reaction until no further changes occur or given fraction of reaction centers have been modified
forcefield (str) – Specify ‘uff’ or ‘mmff’ to apply forcefield based geometry optimization of product structures.
return_rdmol (bool) – return a RDKit molecule if true, otherwise a PLAMS molecule
- Returns
(product molecule, list of unchanged atoms)
- Return type
(
Molecule, list of int)
-
apply_template(mol, template)[source]¶ Modifies bond orders in PLAMS molecule according template smiles structure.
-
get_backbone_atoms(mol)[source]¶ Return a list of atom indices corresponding to the backbone atoms in a peptide molecule. This function assumes PDB information in properties.pdb_info of each atom, which is the case if the molecule is generated with the “readpdb” or “from_sequence” functions.
-
modify_atom(mol, idx, element)[source]¶ Change atom “idx” in molecule “mol” to “element” and add or remove hydrogens accordingly
-
to_rdmol(plams_mol, sanitize=True, properties=True, assignChirality=False)[source]¶ Translate a PLAMS molecule into an RDKit molecule type. PLAMS
Molecule,AtomorBondproperties are pickled if they are neither booleans, floats, integers, floats nor strings, the resulting property names are appended with ‘_pickled’.- Parameters
plams_mol (
Molecule) – A PLAMS moleculesanitize (bool) – Kekulize, check valencies, set aromaticity, conjugation and hybridization
properties (bool) – If all
Molecule,AtomandBondproperties should be converted from PLAMS to RDKit format.assignChirality (bool) – Assign R/S and cis/trans information, insofar as this was not yet present in the PLAMS molecule.
- Returns
an RDKit molecule
- Return type
rdkit.Chem.Mol
-
from_rdmol(rdkit_mol, confid=- 1, properties=True)[source]¶ Translate an RDKit molecule into a PLAMS molecule type. RDKit properties will be unpickled if their name ends with ‘_pickled’.
-
from_sequence(sequence, nconfs=1, name=None, forcefield=None, rms=0.1)[source]¶ Generates PLAMS molecule from a peptide sequence. Includes explicit hydrogens and 3D coordinates.
- Parameters
sequence (str) – A peptide sequence, e.g. ‘HAG’
nconfs (int) – Number of conformers to be generated
name (str) – A name for the molecule
forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step.
rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations.
- Returns
A peptide molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1
- Return type
Moleculeor list of PLAMS Molecules
-
from_smiles(smiles, nconfs=1, name=None, forcefield=None, rms=0.1)[source]¶ Generates PLAMS molecule(s) from a smiles strings.
- Parameters
smiles (str) – A smiles string
nconfs (int) – Number of conformers to be generated
name (str) – A name for the molecule
forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step.
rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations
- Returns
A molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1
- Return type
Moleculeor list of PLAMS Molecules
-
from_smarts(smarts, nconfs=1, name=None, forcefield=None, rms=0.1)[source]¶ Generates PLAMS molecule(s) from a smarts strings. This allows for example to define hydrogens explicitly. However it is less suitable for aromatic molecules (use from_smiles in that case).
- Parameters
smarts (str) – A smarts string
nconfs (int) – Number of conformers to be generated
name (str) – A name for the molecule
forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step.
rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations.
- Returns
A molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1
- Return type
Moleculeor list of PLAMS Molecules
-
to_smiles(plams_mol, short_smiles=True, **kwargs)[source]¶ Returns the RDKit-generated SMILES string of a PLAMS molecule.
Note: SMILES strings are generated based on the molecule’s connectivity. If the input PLAMS molecule does not contain any bonds, “guessed bonds” will be used.
- Parameters
plams_mol – A PLAMS
Moleculeshort_smiles (bool) – whether or not to use some RDKit sanitization to get shorter smiles (e.g. for a water molecule, short_smiles=True -> “O”, short_smiles=False -> [H]O[H])
**kwargs – With ‘kwargs’ you can provide extra optional parameters to the rdkit.Chem method ‘MolToSmiles’. See the rdkit documentation for more info.
- Returns
the SMILES string
-
partition_protein(mol, residue_bonds=None, split_heteroatoms=True, return_rdmol=False)[source]¶ Splits a protein molecule into capped amino acid fragments and caps.
- Parameters
mol (
Moleculeor rdkit.Chem.Mol) – A protein moleculeresidue_bonds (tuple) – a tuple of pairs of residue number indicating which peptide bonds to split. If none, split all peptide bonds.
split_heteroatoms (bool) – if True, all bonds between a heteroatom and a non-heteroatom across residues are removed
- Returns
list of fragments, list of caps
-
readpdb(pdb_file, sanitize=True, removeHs=False, proximityBonding=False, return_rdmol=False)[source]¶ Generate a molecule from a PDB file
-
writepdb(mol, pdb_file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶ Write a PDB file from a molecule
- Parameters
mol (
Moleculeor rdkit.Chem.Mol) – molecule to be exported to PDBpdb_file (path- or file-like) – The PDB file to write to, or a filename
-
get_substructure(mol, func_list)[source]¶ Search for functional groups within a molecule based on a list of reference functional groups. SMILES strings, PLAMS and/or RDKit molecules can be used interchangeably in “func_list”.
Example:
>>> mol = from_smiles('OCCO') # Ethylene glycol >>> func_list = ['[H]O', 'C[N+]', 'O=PO'] >>> get_substructure(mol, func_list) {'[H]O': [(<scm.plams.mol.atom.Atom at 0x125183518>, <scm.plams.mol.atom.Atom at 0x1251836a0>), (<scm.plams.mol.atom.Atom at 0x125183550>, <scm.plams.mol.atom.Atom at 0x125183240>)]}
- Parameters
- Returns
A dictionary with functional groups from “func_list” as keys and a list of n-tuples with matching PLAMS
Atomas values.
-
get_conformations(mol, nconfs=1, name=None, forcefield=None, rms=- 1, enforceChirality=False, useExpTorsionAnglePrefs='default', constraint_ats=None, EmbedParameters='EmbedParameters', randomSeed=1, best_rms=- 1)[source]¶ Generates 3D conformation(s) for an rdkit_mol or a PLAMS Molecule
- Parameters
mol (rdkit.Chem.Mol or
Molecule) – RDKit or PLAMS Moleculenconfs (int) – Number of conformers to be generated
name (str) – A name for the molecule
forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step
rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations.
best_rms (float) – Root Mean Square deviation of best atomic permutation for removing similar/equivalent conformations.
enforceChirality (bool) – Enforce the correct chirality if chiral centers are present
useExpTorsionAnglePrefs (str) – Use experimental torsion angles preferences for the conformer generation by rdkit
constraint_ats (list) – List of atom indices to be constrained
EmbedParameters (str) – Name of RDKit EmbedParameters class (‘EmbedParameters’, ‘ETKDG’)
randomSeed (int) – The seed for the random number generator. If set to None the generated conformers will be non-deterministic.
- Returns
A molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1
- Return type
Moleculeor list of PLAMS Molecules
-
yield_coords(rdmol, id=- 1)[source]¶ Take an rdkit molecule and yield its coordinates as 3-tuples.
>>> from scm.plams import yield_coords >>> from rdkit import Chem >>> rdmol = Chem.Mol(...) # e.g. Methane >>> for xyz in yield_coords(rdmol): ... print(xyz) (-0.0, -0.0, -0.0) (0.6405, 0.6405, -0.6405) (0.6405, -0.6405, 0.6405) (-0.6405, 0.6405, 0.6405) (-0.6405, -0.6405, -0.6405)
The iterator produced by this function can, for example, be passed to
Molecule.from_array()the update the coordinates of a PLAMS Molecule in-place.>>> from scm.plams import Molecule >>> mol = Molecule(...) >>> xyz_iterator = yield_coords(rdmol) >>> mol.from_array(xyz_iterator)
- Parameters
rdmol (rdkit.Chem.Mol) – An RDKit mol.
id (int) – The ID of the desired conformer.
- Returns
An iterator yielding 3-tuples with rdmol’s Cartesian coordinates.
- Return type
iterator
-
canonicalize_mol(mol, inplace=False, **kwargs)[source]¶ Take a PLAMS molecule and sort its atoms based on their canonical rank.
Example:
>>> from scm.plams import Molecule, canonicalize_mol # Methane >>> mol: Molecule = ... >>> print(mol) Atoms: 1 H 0.640510 0.640510 -0.640510 2 H 0.640510 -0.640510 0.640510 3 C 0.000000 0.000000 0.000000 4 H -0.640510 0.640510 0.640510 5 H -0.640510 -0.640510 -0.640510 >>> print(canonicalize_mol(mol)) Atoms: 1 C 0.000000 0.000000 0.000000 2 H -0.640510 -0.640510 -0.640510 3 H -0.640510 0.640510 0.640510 4 H 0.640510 -0.640510 0.640510 5 H 0.640510 0.640510 -0.640510
- Parameters
mol (
Molecule) – The to-be canonicalized molecule.inplace (bool) – Whether to sort the atoms inplace or to return a new molecule.
**kwargs – Further keyword arguments for rdkit.Chem.CanonicalRankAtoms.
- Returns
Either
Noneor a newly sorted molecule, depending on the value ofinplace.- Return type
None or
Molecule