RDKit interface¶
RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python. PLAMS interface to RDKit originates from QMFlows project and features functions for generating PLAMS molecules from string representations (SMARTS, SMILES) as well as a handful of tools for dealing with proteins and PDB files.
-
from_rdmol(rdkit_mol, confid=-1, properties=True)[source]¶ Translate an RDKit molecule into a PLAMS molecule type. RDKit properties will be unpickled if their name ends with ‘_pickled’.
Parameters: Returns: a PLAMS molecule
Return type:
-
to_rdmol(plams_mol, sanitize=True, properties=True, assignChirality=False)[source]¶ Translate a PLAMS molecule into an RDKit molecule type. PLAMS
Molecule,AtomorBondproperties are pickled if they are neither booleans, floats, integers, floats nor strings, the resulting property names are appended with ‘_pickled’.Parameters: - plams_mol (
Molecule) – A PLAMS molecule - sanitize (bool) – Kekulize, check valencies, set aromaticity, conjugation and hybridization
- properties (bool) – If all
Molecule,AtomandBondproperties should be converted from PLAMS to RDKit format. - assignChirality (bool) – Assign R/S and cis/trans information, insofar as this was not yet present in the PLAMS molecule.
Returns: an RDKit molecule
Return type: rdkit.Chem.Mol
- plams_mol (
-
from_smiles(smiles, nconfs=1, name=None, forcefield=None, rms=0.1)[source]¶ Generates PLAMS molecule(s) from a smiles strings.
Parameters: - smiles (str) – A smiles string
- nconfs (int) – Number of conformers to be generated
- name (str) – A name for the molecule
- forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step.
- rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations
Returns: A molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1
Return type: Moleculeor list of PLAMS Molecules
-
from_smarts(smarts, nconfs=1, name=None, forcefield=None, rms=0.1)[source]¶ Generates PLAMS molecule(s) from a smarts strings. This allows for example to define hydrogens explicitly. However it is less suitable for aromatic molecules (use from_smiles in that case).
Parameters: - smarts (str) – A smarts string
- nconfs (int) – Number of conformers to be generated
- name (str) – A name for the molecule
- forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step.
- rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations.
Returns: A molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1
Return type: Moleculeor list of PLAMS Molecules
-
get_conformations(mol, nconfs=1, name=None, forcefield=None, rms=-1, enforceChirality=False)[source]¶ Generates 3D conformation(s) for an rdkit_mol or a PLAMS Molecule
Parameters: - mol (rdkit.Chem.Mol or
Molecule) – RDKit or PLAMS Molecule - nconfs (int) – Number of conformers to be generated
- name (str) – A name for the molecule
- forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step
- rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations.
- enforceChirality (bool) – Enforce the correct chirality if chiral centers are present
Returns: A molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1
Return type: Moleculeor list of PLAMS Molecules- mol (rdkit.Chem.Mol or
-
from_sequence(sequence, nconfs=1, name=None, forcefield=None, rms=0.1)[source]¶ Generates PLAMS molecule from a peptide sequence. Includes explicit hydrogens and 3D coordinates.
Parameters: - sequence (str) – A peptide sequence, e.g. ‘HAG’
- nconfs (int) – Number of conformers to be generated
- name (str) – A name for the molecule
- forcefield (str) – Choose ‘uff’ or ‘mmff’ forcefield for geometry optimization and ranking of comformations. The default value None results in skipping of the geometry optimization step.
- rms (float) – Root Mean Square deviation threshold for removing similar/equivalent conformations.
Returns: A peptide molecule with hydrogens and 3D coordinates or a list of molecules if nconfs > 1
Return type: Moleculeor list of PLAMS Molecules
-
modify_atom(mol, idx, element)[source]¶ Change atom “idx” in molecule “mol” to “element” and add or remove hydrogens accordingly
Parameters: Returns: Molecule with new element and possibly added or removed hydrogens
Return type:
-
apply_template(mol, template)[source]¶ Modifies bond orders in PLAMS molecule according template smiles structure.
Parameters: Returns: Molecule with correct chemical structure and provided 3D coordinates
Return type:
-
apply_reaction_smarts(mol, reaction_smarts, complete=False, forcefield=None, return_rdmol=False)[source]¶ Applies reaction smirks and returns product. If returned as a PLAMS molecule, thismolecule.properties.orig_atoms is a list of indices of atoms that have not been changed (which can for example be used partially optimize new atoms only with the freeze keyword)
Parameters: - mol (
Moleculeor rdkit.Chem.Mol) – molecule to be modified - reactions_smarts (str) – Reactions smarts to be applied to molecule
- complete (bool or float (value between 0 and 1)) – Apply reaction until no further changes occur or given fraction of reaction centers have been modified
- forcefield (str) – Specify ‘uff’ or ‘mmff’ to apply forcefield based geometry optimization of product structures.
- return_rdmol (bool) – return a RDKit molecule if true, otherwise a PLAMS molecule
Returns: (product molecule, list of unchanged atoms)
Return type: (
Molecule, list of int)- mol (
-
readpdb(pdb_file, removeHs=False, proximityBonding=False, return_rdmol=False)[source]¶ Generate a molecule from a PDB file
Parameters: Returns: The molecule
Return type: Moleculeor rdkit.Chem.Mol
-
writepdb(mol, pdb_file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]¶ Write a PDB file from a molecule
Parameters: - mol (
Moleculeor rdkit.Chem.Mol) – molecule to be exported to PDB - pdb_file (path- or file-like) – The PDB file to write to, or a filename
- mol (
-
add_Hs(mol, forcefield=None, return_rdmol=False)[source]¶ Add hydrogens to protein molecules read from PDB. Makes sure that the hydrogens get the correct PDBResidue info.
Parameters: Returns: A molecule with explicit hydrogens added
Return type: Moleculeor rdkit.Chem.Mol
-
partition_protein(mol, residue_bonds=None, split_heteroatoms=True, return_rdmol=False)[source]¶ Splits a protein molecule into capped amino acid fragments and caps.
Parameters: - mol (
Moleculeor rdkit.Chem.Mol) – A protein molecule - residue_bonds (tuple) – a tuple of pairs of residue number indicating which peptide bonds to split. If none, split all peptide bonds.
- split_heteroatoms (bool) – if True, all bonds between a heteroatom and a non-heteroatom across residues are removed
Returns: list of fragments, list of caps
- mol (
-
get_backbone_atoms(mol)[source]¶ Return a list of atom indices corresponding to the backbone atoms in a peptide molecule. This function assumes PDB information in properties.pdb_info of each atom, which is the case if the molecule is generated with the “readpdb” or “from_sequence” functions.
Parameters: mol ( Moleculeor rdkit.Chem.Mol) – a peptide moleculeReturns: a list of atom indices Return type: list
-
get_substructure(mol, func_list)[source]¶ Search for functional groups within a molecule based on a list of reference functional groups. SMILES strings, PLAMS and/or RDKit molecules can be used interchangeably in “func_list”.
Example:
>>> mol = from_smiles('OCCO') # Ethylene glycol >>> func_list = ['[H]O', 'C[N+]', 'O=PO'] >>> get_substructure(mol, func_list) {'[H]O': [(<scm.plams.mol.atom.Atom at 0x125183518>, <scm.plams.mol.atom.Atom at 0x1251836a0>), (<scm.plams.mol.atom.Atom at 0x125183550>, <scm.plams.mol.atom.Atom at 0x125183240>)]}
Parameters: Returns: A dictionary with functional groups from “func_list” as keys and a list of n-tuples with matching PLAMS
Atomas values.
-
yield_coords(rdmol, id=-1)[source]¶ Take an rdkit molecule and yield its coordinates as 3-tuples.
>>> from scm.plams import yield_coords >>> from rdkit import Chem >>> rdmol = Chem.Mol(...) # e.g. Methane >>> for xyz in yield_coords(rdmol): ... print(xyz) (-0.0, -0.0, -0.0) (0.6405, 0.6405, -0.6405) (0.6405, -0.6405, 0.6405) (-0.6405, 0.6405, 0.6405) (-0.6405, -0.6405, -0.6405)
The iterator produced by this function can, for example, be passed to
Molecule.from_array()the update the coordinates of a PLAMS Molecule in-place.>>> from scm.plams import Molecule >>> mol = Molecule(...) >>> xyz_iterator = yield_coords(rdmol) >>> mol.from_array(xyz_iterator)
Parameters: - rdmol (rdkit.Chem.Mol) – An RDKit mol.
- id (int) – The ID of the desired conformer.
Returns: An iterator yielding 3-tuples with rdmol’s Cartesian coordinates.
Return type: iterator