RDKit interface¶
RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python. PLAMS interface to RDKit originates from QMFlows project and features functions for generating PLAMS molecules from string representations (SMARTS, SMILES) as well as a handful of tools for dealing with proteins and PDB files.
- add_Hs(mol: Molecule | 'RDKitMol', forcefield: str | None = None, return_rdmol: Literal[False] = False) Molecule[source]¶
- add_Hs(mol: Molecule | 'RDKitMol', forcefield: str | None = None, return_rdmol: Literal[True] = False) RDKitMol
- add_Hs(mol: Molecule | 'RDKitMol', forcefield: str | None = None, return_rdmol: bool = False) Molecule | 'RDKitMol'
Add hydrogens to protein molecules read from PDB. Makes sure that the hydrogens get the correct PDBResidue info.
- Parameters:
- Returns:
A molecule with explicit hydrogens added
- Return type:
Moleculeor rdkit.Chem.Mol
- apply_reaction_smarts(mol: Molecule | 'RDKitMol', reaction_smarts: str, complete: bool = False, forcefield: str | None = None, return_rdmol: Literal[False] = False) Molecule[source]¶
- apply_reaction_smarts(mol: Molecule | 'RDKitMol', reaction_smarts: str, complete: bool = False, forcefield: str | None = None, return_rdmol: Literal[True] = False) RDKitMol
- apply_reaction_smarts(mol: Molecule | 'RDKitMol', reaction_smarts: str, complete: bool = False, forcefield: str | None = None, return_rdmol: bool = False) Molecule | 'RDKitMol'
Applies reaction smirks and returns product. If returned as a PLAMS molecule, thismolecule.properties.orig_atoms is a list of indices of atoms that have not been changed (which can for example be used partially optimize new atoms only with the freeze keyword)
- Parameters:
mol (
Moleculeor rdkit.Chem.Mol) – molecule to be modifiedreactions_smarts (str) – Reactions smarts to be applied to molecule
complete (bool or float (value between 0 and 1)) – Apply reaction until no further changes occur or given fraction of reaction centers have been modified
forcefield (str) – Specify ‘uff’ or ‘mmff’ to apply forcefield based geometry optimization of product structures.
return_rdmol (bool) – return a RDKit molecule if true, otherwise a PLAMS molecule
- Returns:
(product molecule, list of unchanged atoms)
- Return type:
(
Molecule, list of int)
- apply_template(mol, template)[source]¶
Modifies bond orders in PLAMS molecule according template smiles structure.
- get_backbone_atoms(mol)[source]¶
Return a list of atom indices corresponding to the backbone atoms in a peptide molecule. This function assumes PDB information in properties.pdb_info of each atom, which is the case if the molecule is generated with the “readpdb” or “from_sequence” functions.
- modify_atom(mol, idx, element)[source]¶
Change atom “idx” in molecule “mol” to “element” and add or remove hydrogens accordingly
- to_rdmol(plams_mol, sanitize=True, properties=True, assignChirality=False, presanitize=False)[source]¶
Translate a PLAMS molecule into an RDKit molecule type. PLAMS
Molecule,AtomorBondproperties are pickled if they are neither booleans, floats, integers, floats nor strings, the resulting property names are appended with ‘_pickled’.- Parameters:
plams_mol (
Molecule) – A PLAMS moleculesanitize (bool) – Kekulize, check valencies, set aromaticity, conjugation and hybridization
properties (bool) – If all
Molecule,AtomandBondproperties should be converted from PLAMS to RDKit format.assignChirality (bool) – Assign R/S and cis/trans information, insofar as this was not yet present in the PLAMS molecule.
presanitize (bool) – Iteratively adjust bonding and atomic charges, to avoid failure of sanitization. Only relevant is sanitize is set to True.
plams_mol –
sanitize –
properties –
assignChirality –
presanitize –
- Returns:
an RDKit molecule
- Return type:
rdkit.Chem.Mol
- from_rdmol(rdkit_mol, confid=-1, properties=True)[source]¶
Translate an RDKit molecule into a PLAMS molecule type. RDKit properties will be unpickled if their name ends with ‘_pickled’.
- Parameters:
- Returns:
a PLAMS molecule
- Return type:
- from_sequence(sequence: str, nconfs: Literal[1] = 1, name: str | None = None, forcefield: str | None = None, rms: float = 0.1) Molecule[source]¶
- from_sequence(sequence: str, nconfs: int, name: str | None = None, forcefield: str | None = None, rms: float = 0.1) List[Molecule] | Molecule
Generates PLAMS molecule from a peptide sequence. Includes explicit hydrogens and 3D coordinates.
- Parameters:
sequence (str) – A peptide sequence, e.g. ‘HAG’
nconfs (int) – Number of conformers to be generated
name (str) – A name for the molecule
forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step.
rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations.
- Returns:
A peptide molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1
- Return type:
Moleculeor list of PLAMS Molecules
- from_smiles(smiles: str, nconfs: Literal[1] = 1, name: str | None = None, forcefield: str | None = None, rms: float = 0.1) Molecule[source]¶
- from_smiles(smiles: str, nconfs: int = 1, name: str | None = None, forcefield: str | None = None, rms: float = 0.1) Molecule | List[Molecule]
Generates PLAMS molecule(s) from a smiles strings.
- Parameters:
smiles (str) – A smiles string
nconfs (int) – Number of conformers to be generated
name (str) – A name for the molecule
forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of conformations. The default value None results in skipping of the geometry optimization step.
rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations
- Returns:
A molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1
- Return type:
Moleculeor list of PLAMS Molecules
- from_smarts(smarts: str, nconfs: Literal[1] = 1, name: str | None = None, forcefield: str | None = None, rms: float = 0.1) Molecule[source]¶
- from_smarts(smarts: str, nconfs: int = 1, name: str | None = None, forcefield: str | None = None, rms: float = 0.1) Molecule | List[Molecule]
Generates PLAMS molecule(s) from a smarts strings. This allows for example to define hydrogens explicitly. However it is less suitable for aromatic molecules (use from_smiles in that case).
- Parameters:
smarts (str) – A smarts string
nconfs (int) – Number of conformers to be generated
name (str) – A name for the molecule
forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step.
rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations.
- Returns:
A molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1
- Return type:
Moleculeor list of PLAMS Molecules
- to_smiles(plams_mol, short_smiles=True, **kwargs)[source]¶
Returns the RDKit-generated SMILES string of a PLAMS molecule.
Note: SMILES strings are generated based on the molecule’s connectivity. If the input PLAMS molecule does not contain any bonds, “guessed bonds” will be used.
- Parameters:
short_smiles (bool) – whether or not to use some RDKit sanitization to get shorter smiles (e.g. for a water molecule, short_smiles=True -> “O”, short_smiles=False -> [H]O[H])
kwargs (Any) – With ‘kwargs’ you can provide extra optional parameters to the rdkit.Chem method ‘MolToSmiles’. See the rdkit documentation for more info.
plams_mol –
short_smiles –
kwargs –
- Returns:
the SMILES string
- Return type:
- partition_protein(mol: Molecule | 'RDKitMol', residue_bonds: Iterable[Tuple[int, int]] | None = None, split_heteroatoms: bool = True, return_rdmol: Literal[False] = False) Tuple[List[Molecule], List[Molecule]][source]¶
- partition_protein(mol: Molecule | 'RDKitMol', residue_bonds: Iterable[Tuple[int, int]] | None = None, split_heteroatoms: bool = True, return_rdmol: Literal[True] = False) Tuple[List['RDKitMol'], List['RDKitMol']]
- partition_protein(mol: Molecule | 'RDKitMol', residue_bonds: Iterable[Tuple[int, int]] | None = None, split_heteroatoms: bool = True, return_rdmol: bool = False) Tuple[List[Molecule | 'RDKitMol'], List[Molecule | 'RDKitMol']]
Splits a protein molecule into capped amino acid fragments and caps.
- Parameters:
mol (
Moleculeor rdkit.Chem.Mol) – A protein moleculeresidue_bonds (tuple) – a tuple of pairs of residue number indicating which peptide bonds to split. If none, split all peptide bonds.
split_heteroatoms (bool) – if True, all bonds between a heteroatom and a non-heteroatom across residues are removed
- Returns:
list of fragments, list of caps
- readpdb(pdb_file: str, sanitize: bool = True, removeHs: bool = False, proximityBonding: bool = False, return_rdmol: Literal[False] = False) Molecule[source]¶
- readpdb(pdb_file: str, sanitize: bool = True, removeHs: bool = False, proximityBonding: bool = False, return_rdmol: Literal[True] = False) RDKitMol
- readpdb(pdb_file: str, sanitize: bool = True, removeHs: bool = False, proximityBonding: bool = False, return_rdmol: bool = False) 'RDKitMol' | Molecule
Generate a molecule from a PDB file
- writepdb(mol, pdb_file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Write a PDB file from a molecule
- Parameters:
mol (
Moleculeor rdkit.Chem.Mol) – molecule to be exported to PDBpdb_file (path- or file-like) – The PDB file to write to, or a filename
mol –
- Return type:
None
- get_substructure(mol, func_list)[source]¶
Search for functional groups within a molecule based on a list of reference functional groups. SMILES strings, PLAMS and/or RDKit molecules can be used interchangeably in “func_list”.
Example:
>>> mol = from_smiles('OCCO') # Ethylene glycol >>> func_list = ['[H]O', 'C[N+]', 'O=PO'] >>> get_substructure(mol, func_list) {'[H]O': [(<scm.plams.mol.atom.Atom at 0x125183518>, <scm.plams.mol.atom.Atom at 0x1251836a0>), (<scm.plams.mol.atom.Atom at 0x125183550>, <scm.plams.mol.atom.Atom at 0x125183240>)]}
- Parameters:
- Returns:
A dictionary with functional groups from “func_list” as keys and a list of n-tuples with matching PLAMS
Atomas values.- Return type:
- get_conformations(mol: Molecule | 'RDKitMol', nconfs: Literal[1] = 1, name: str | None = None, forcefield: str | None = None, rms: float = -1, enforceChirality: bool = False, useExpTorsionAnglePrefs: str = 'default', constraint_ats: List[int] | None = None, EmbedParameters: str = 'EmbedParameters', randomSeed: int = 1, best_rms: float = -1) Molecule[source]¶
- get_conformations(mol: Molecule | 'RDKitMol', nconfs: int = 1, name: str | None = None, forcefield: str | None = None, rms: float = -1, enforceChirality: bool = False, useExpTorsionAnglePrefs: str = 'default', constraint_ats: List[int] | None = None, EmbedParameters: str = 'EmbedParameters', randomSeed: int = 1, best_rms: float = -1) List[Molecule] | Molecule
Generates 3D conformation(s) for an rdkit_mol or a PLAMS Molecule
- Parameters:
mol (rdkit.Chem.Mol or
Molecule) – RDKit or PLAMS Moleculenconfs (int) – Number of conformers to be generated
name (str) – A name for the molecule
forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step
rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations.
best_rms (float) – Root Mean Square deviation of best atomic permutation for removing similar/equivalent conformations.
enforceChirality (bool) – Enforce the correct chirality if chiral centers are present
useExpTorsionAnglePrefs (str) – Use experimental torsion angles preferences for the conformer generation by rdkit
constraint_ats (list) – List of atom indices to be constrained
EmbedParameters (str) – Name of RDKit EmbedParameters class (‘EmbedParameters’, ‘ETKDG’)
randomSeed (int) – The seed for the random number generator. If set to None the generated conformers will be non-deterministic.
- Returns:
A molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1
- Return type:
Moleculeor list of PLAMS Molecules
- yield_coords(rdmol, id=-1)[source]¶
Take an rdkit molecule and yield its coordinates as 3-tuples.
>>> from scm.plams import yield_coords >>> from rdkit import Chem >>> rdmol = Chem.Mol(...) # e.g. Methane >>> for xyz in yield_coords(rdmol): ... print(xyz) (-0.0, -0.0, -0.0) (0.6405, 0.6405, -0.6405) (0.6405, -0.6405, 0.6405) (-0.6405, 0.6405, 0.6405) (-0.6405, -0.6405, -0.6405)
The iterator produced by this function can, for example, be passed to
Molecule.from_array()the update the coordinates of a PLAMS Molecule in-place.>>> from scm.plams import Molecule >>> mol = Molecule(...) >>> xyz_iterator = yield_coords(rdmol) >>> mol.from_array(xyz_iterator)
- Parameters:
rdmol (rdkit.Chem.Mol) – An RDKit mol.
id (int) – The ID of the desired conformer.
rdmol –
id –
- Returns:
An iterator yielding 3-tuples with rdmol’s Cartesian coordinates.
- Return type:
iterator
- canonicalize_mol(mol: Molecule, inplace: Literal[True], **kwargs: Any) None[source]¶
- canonicalize_mol(mol: Molecule, inplace: Literal[False] = False, **kwargs: Any) Molecule
Take a PLAMS molecule and sort its atoms based on their canonical rank.
Example:
>>> from scm.plams import Molecule, canonicalize_mol # Methane >>> mol: Molecule = ... >>> print(mol) Atoms: 1 H 0.640510 0.640510 -0.640510 2 H 0.640510 -0.640510 0.640510 3 C 0.000000 0.000000 0.000000 4 H -0.640510 0.640510 0.640510 5 H -0.640510 -0.640510 -0.640510 >>> print(canonicalize_mol(mol)) Atoms: 1 C 0.000000 0.000000 0.000000 2 H -0.640510 -0.640510 -0.640510 3 H -0.640510 0.640510 0.640510 4 H 0.640510 -0.640510 0.640510 5 H 0.640510 0.640510 -0.640510
- Parameters:
mol (
Molecule) – The to-be canonicalized molecule.inplace (bool) – Whether to sort the atoms inplace or to return a new molecule.
**kwargs – Further keyword arguments for rdkit.Chem.CanonicalRankAtoms.
- Returns:
Either
Noneor a newly sorted molecule, depending on the value ofinplace.- Return type:
None or
Molecule
- to_image(mol, remove_hydrogens=True, filename=None, fmt='svg', size=(200, 100), as_string=True)[source]¶
Convert single molecule to single image object
mol– PLAMS Molecule objectremove_hydrogens– Wether or not to remove the H-atoms from the imagefilename– Optional: Name of image file to be created.fmt– One of “svg”, “png”, “eps”, “pdf”, “jpeg”size– Tuple/list containing width and height of image in pixels.as_string– Returns the image as a string or bytestring. If set to False, the original formatwill be returned, which can be either a PIL image or SVG text We do this because after converting a PIL image to a bytestring it is not possible to further edit it (with our version of PIL).
Returns – Image text file / binary of image text file / PIL Image object.
- get_reaction_image(reactants, products, filename=None, fmt='svg', size=(200, 100), as_string=True)[source]¶
Create a 2D reaction image from reactants and products (PLAMS molecules)
reactants– Iterable of PLAMS Molecule objects representing the reactants.products– Iterable of PLAMS Molecule objects representing the products.filename– Optional: Name of image file to be created.fmt– The format of the image (svg, png, eps, pdf, jpeg).The extension in the filename, if provided, takes precedence.
size– Tuple/list containing width and height of image in pixels.as_string– Returns the image as a string or bytestring. If set to False, the original formatwill be returned, which can be either a PIL image or SVG text
Returns – SVG image text file.