molsg
The molsg package generates molecular and atomic hashes. These hash values can be used to determine whether molecules or local atomic environments are identical. The chemical structure information used to generate the hashes is controlled by the user, so hash values can optionally include discriminating chemical information such as stereochemistry, torsion angles, surface aromaticity, ring membership, and more. Turning these options on or off changes the information used to calculate the hashes, and thereby the criteria for molecular and atomic environment equality.
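To make the effect of such options concrete, here is a minimal toy sketch in plain Python. It does not use molsg and is not molsg's algorithm: `ToyAtom`, `toy_hash`, and the `use_ring` / `use_stereo` flags are hypothetical names invented for illustration. It only shows the general idea that a feature which is switched off cannot distinguish two atoms.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAtom:
    """Hypothetical atom record; not a molsg class."""
    element: str
    in_ring: bool
    stereo: str  # "R", "S", or "" when the atom is not a stereocentre


def toy_hash(atom, use_ring=True, use_stereo=False):
    """Hash only the features that are switched on (illustrative only)."""
    features = [atom.element]
    if use_ring:
        features.append(f"ring={atom.in_ring}")
    if use_stereo:
        features.append(f"stereo={atom.stereo}")
    digest = hashlib.sha256("|".join(features).encode()).hexdigest()
    return digest[:16]


# Two carbons that differ only in their stereo label
a = ToyAtom("C", in_ring=False, stereo="R")
b = ToyAtom("C", in_ring=False, stereo="S")

# Stereo ignored: the two atoms hash identically
print(toy_hash(a) == toy_hash(b))
# Stereo included: the hashes differ
print(toy_hash(a, use_stereo=True) == toy_hash(b, use_stereo=True))
```

With stereo information excluded, the two atoms are treated as equal; once it is included, they are not. The molsg settings play the analogous role for real molecules, although the actual option names and hashing scheme are defined by the package itself.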
- Example 1: A molecular hash and atomic labeling
Let’s take a look at a simple example to demonstrate some basic functionality. In this example, we’ll use molsg to calculate hashes for diethyl ether. This molecule is chosen because it has a central plane of symmetry, meaning we should observe chemically equivalent atoms. We’ll do a calculation with default options and then output the molecular and atomic hashes.

    import scm_molsg as molsg

    # input a molecule in SMILES format
    mol = molsg.Input.read_smiles('CCOCC')

    # initialize a hash calculator with the default options
    calc = molsg.CalcSG()

    # calculate the hashes on our molecule
    calc.calc_sg_hashes(mol)

    # print the molecular and atomic hashes
    print("molecule hash: ", mol.hash)
    for atom in mol.atoms:
        print(f"Atom {atom.symbol} has hash value {atom.hash}")

    molecule hash: 7799821200906220727
    Atom C has hash value 17594037434007768629
    Atom C has hash value 12550518761658240532
    Atom O has hash value 9554973667697621739
    Atom C has hash value 12550518761658240532
    Atom C has hash value 17594037434007768629
    Atom H has hash value 143678654245410047
    Atom H has hash value 143678654245410047
    Atom H has hash value 143678654245410047
    Atom H has hash value 16555217933092745424
    Atom H has hash value 16555217933092745424
    Atom H has hash value 16555217933092745424
    Atom H has hash value 16555217933092745424
    Atom H has hash value 143678654245410047
    Atom H has hash value 143678654245410047
    Atom H has hash value 143678654245410047

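Note that we do observe the expected results. There are two sets of two equivalent carbons (the two methyl carbons and the two carbons bonded to oxygen) and two sets of equivalent hydrogens (six methyl hydrogens and four methylene hydrogens).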
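One way to check the equivalence classes without reading the hashes by eye is to group atoms by hash value. The sketch below is plain Python and does not call molsg: the `(symbol, hash)` pairs are copied by hand from the output above, and the positions are simply the order in which the atoms were printed.

```python
from collections import defaultdict

# (symbol, hash) pairs copied from the printed output above
atoms = [
    ("C", 17594037434007768629),
    ("C", 12550518761658240532),
    ("O", 9554973667697621739),
    ("C", 12550518761658240532),
    ("C", 17594037434007768629),
    ("H", 143678654245410047),
    ("H", 143678654245410047),
    ("H", 143678654245410047),
    ("H", 16555217933092745424),
    ("H", 16555217933092745424),
    ("H", 16555217933092745424),
    ("H", 16555217933092745424),
    ("H", 143678654245410047),
    ("H", 143678654245410047),
    ("H", 143678654245410047),
]

# Atoms that share a hash belong to the same equivalence class
groups = defaultdict(list)
for position, (symbol, hash_value) in enumerate(atoms):
    groups[(symbol, hash_value)].append(position)

for (symbol, hash_value), members in groups.items():
    print(f"{symbol}: {len(members)} equivalent atom(s) at positions {members}")
```

This reports five classes: two carbon classes of two atoms each, one oxygen, and hydrogen classes of six and four atoms. In a real script the pairs would presumably be built from `atom.symbol` and `atom.hash` as in the example above.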
- Example 2: Classifying molecules as same or different
In this example, we’ll input two identical molecules using different SMILES strings, which illustrates that the hashes are index-invariant. We’ll then input a slightly different third molecule and show that its hash value differs from that of the first two.

    import scm_molsg as molsg

    # input a molecule in SMILES format
    mol1 = molsg.Input.read_smiles('CC(O)CCN')

    # input an identical molecule using a different SMILES string
    mol2 = molsg.Input.read_smiles('NCCC(O)C')

    # input a third molecule that is slightly different from the first 2
    mol3 = molsg.Input.read_smiles('NCC(O)C')

    # initialize a hash calculator with the default options
    calc = molsg.CalcSG()

    for m in [mol1, mol2, mol3]:
        calc.calc_sg_hashes(m)

    print(f"mol1 hash = {mol1.hash}")
    print(f"mol2 hash = {mol2.hash}")
    print(f"mol3 hash = {mol3.hash}")

    mol1 hash = 9324307893686980071
    mol2 hash = 9324307893686980071
    mol3 hash = 7518558567789403784

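Because identical molecules share a hash, the hash can serve as a dictionary key for removing duplicates from a collection. The following sketch is plain Python and does not call molsg: the hash values are copied by hand from the output above, and the `records` and `unique` names are illustrative.

```python
# (SMILES, molecular hash) pairs copied from the example output above
records = [
    ("CC(O)CCN", 9324307893686980071),
    ("NCCC(O)C", 9324307893686980071),
    ("NCC(O)C", 7518558567789403784),
]

# Collect every SMILES string that maps to the same molecular hash
unique = {}
for smiles, mol_hash in records:
    unique.setdefault(mol_hash, []).append(smiles)

print(f"{len(records)} inputs, {len(unique)} distinct molecules")
for mol_hash, smiles_list in unique.items():
    print(mol_hash, smiles_list)
```

The two SMILES strings for the first molecule collapse into a single entry, while the third molecule keeps its own.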
molsg contents
Classes
The following are the main classes used to calculate molecular and atomic hashes.
- This class stores information about individual atoms, including individual atomic environment hashes.
- This class represents instances of molecules or groups of molecules.
- This class is the main access point for molsg hash calculations.
- This class is used to specify different settings for generating molecular and atomic hashes.
Submodules
This module contains a few functions for inputting