molsg

The molsg package generates molecular and atomic hashes. These hash values can be used to determine whether molecules, or local atomic environments, are identical. The user controls which chemical structure information goes into the hashes, so hash values can optionally include discriminating chemical information such as stereochemistry, torsion angles, surface aromaticity, ring membership, and more. Turning these options on or off changes the information used to calculate the hashes, and therefore changes the criteria for when two molecules or atomic environments are considered equal.

Example 1: A molecular hash and atomic labeling

Let’s take a look at a simple example to demonstrate some basic functionality. In this example, we’ll use molsg to calculate hashes for diethyl ether. This molecule is chosen because it has a central plane of symmetry, meaning we should observe chemically equivalent atoms. We’ll do a calculation with default options and then output the molecular and atomic hashes.

import scm_molsg as molsg

# input a molecule in SMILES format
mol  = molsg.Input.read_smiles('CCOCC')
# initialize a hash calculator with the default options
calc = molsg.CalcSG()

# calculate the hashes on our molecule
calc.calc_sg_hashes(mol)

# print the molecular and atomic hashes
print("molecule hash: ", mol.hash)
for atom in mol.atoms:
  print(f"Atom {atom.symbol} has hash value {atom.hash}")
Output:

molecule hash:  7799821200906220727
Atom C has hash value 17594037434007768629
Atom C has hash value 12550518761658240532
Atom O has hash value 9554973667697621739
Atom C has hash value 12550518761658240532
Atom C has hash value 17594037434007768629
Atom H has hash value 143678654245410047
Atom H has hash value 143678654245410047
Atom H has hash value 143678654245410047
Atom H has hash value 16555217933092745424
Atom H has hash value 16555217933092745424
Atom H has hash value 16555217933092745424
Atom H has hash value 16555217933092745424
Atom H has hash value 143678654245410047
Atom H has hash value 143678654245410047
Atom H has hash value 143678654245410047

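Note that we observe the expected results: the oxygen is unique, the carbons form two pairs of equivalent atoms (the two methyl carbons and the two methylene carbons), and the hydrogens form two sets of equivalent atoms (the six methyl hydrogens and the four methylene hydrogens).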
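The grouping above can be reproduced with a generic neighborhood-refinement scheme. The sketch below is not molsg's algorithm (which is not described here); it applies a Weisfeiler-Lehman (Morgan-like) refinement to a hand-built diethyl ether graph, and the helpers `stable_hash` and `atom_hashes` are hypothetical. Its hash values will not match molsg's, but it yields the same equivalence classes.

```python
import hashlib

def stable_hash(text: str) -> int:
    # Deterministic 64-bit integer from SHA-256 (Python's hash() on str is salted per process)
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")

def atom_hashes(symbols, bonds, rounds=3):
    """Weisfeiler-Lehman style refinement: each round, an atom's new hash
    combines its previous hash with the sorted hashes of its neighbors."""
    neighbors = {i: [] for i in range(len(symbols))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    # Initial invariant: element symbol and number of bonded neighbors
    h = [stable_hash(f"{s}|{len(neighbors[i])}") for i, s in enumerate(symbols)]
    for _ in range(rounds):
        h = [stable_hash(f"{h[i]}|{sorted(h[j] for j in neighbors[i])}")
             for i in range(len(symbols))]
    return h

# Diethyl ether (CCOCC) with explicit hydrogens: heavy atoms 0-4 in SMILES
# order, then hydrogens 5-14 listed carbon by carbon
symbols = ["C", "C", "O", "C", "C"] + ["H"] * 10
bonds = [(0, 1), (1, 2), (2, 3), (3, 4),
         (0, 5), (0, 6), (0, 7), (1, 8), (1, 9),
         (3, 10), (3, 11), (4, 12), (4, 13), (4, 14)]

# Group atom indices that share a hash value
classes = {}
for i, hv in enumerate(atom_hashes(symbols, bonds)):
    classes.setdefault(hv, []).append(i)
for members in classes.values():
    print(symbols[members[0]], members)
# C [0, 4]
# C [1, 3]
# O [2]
# H [5, 6, 7, 12, 13, 14]
# H [8, 9, 10, 11]
```

Three rounds suffice here: after one round the methyl and methylene carbons separate, and after two rounds the hydrogens attached to them separate as well.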

Example 2: Classifying molecules as same or different

In this example, we’ll input two identical molecules (4-aminobutan-2-ol) using different SMILES strings, which shows that the hashes do not depend on atom ordering (they are index-invariant). We’ll then input a third, slightly different molecule (1-aminopropan-2-ol, which has one fewer carbon) and show that its hash differs from those of the first two.

import scm_molsg as molsg

# input a molecule in SMILES format
mol1  = molsg.Input.read_smiles('CC(O)CCN')
# input an identical molecule using a different SMILES string
mol2  = molsg.Input.read_smiles('NCCC(O)C')
# input a third molecule that is slightly different from the first 2
mol3  = molsg.Input.read_smiles('NCC(O)C')
# initialize a hash calculator with the default options
calc = molsg.CalcSG()

for m in [mol1,mol2,mol3]:
  calc.calc_sg_hashes(m)

print(f"mol1 hash = {mol1.hash}")
print(f"mol2 hash = {mol2.hash}")
print(f"mol3 hash = {mol3.hash}")
Output:

mol1 hash = 9324307893686980071
mol2 hash = 9324307893686980071
mol3 hash = 7518558567789403784
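Index invariance follows whenever the molecular hash depends only on the multiset of atomic environment hashes and not on their order. The sketch below illustrates that idea with a generic Weisfeiler-Lehman refinement (not molsg's method); `molecule_hash` is a hypothetical helper, and hydrogens are left out of the graphs to keep them short.

```python
import hashlib

def stable_hash(text: str) -> int:
    # Deterministic 64-bit integer derived from SHA-256
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")

def molecule_hash(symbols, bonds, rounds=3):
    """Refine atom hashes from their neighborhoods, then hash the sorted
    multiset of atom hashes so that atom numbering cannot matter."""
    neighbors = {i: [] for i in range(len(symbols))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    h = [stable_hash(f"{s}|{len(neighbors[i])}") for i, s in enumerate(symbols)]
    for _ in range(rounds):
        h = [stable_hash(f"{h[i]}|{sorted(h[j] for j in neighbors[i])}")
             for i in range(len(symbols))]
    return stable_hash(str(sorted(h)))

# Heavy-atom graphs only, atoms numbered in SMILES order
mol1 = (["C", "C", "O", "C", "C", "N"], [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5)])  # CC(O)CCN
mol2 = (["N", "C", "C", "C", "O", "C"], [(0, 1), (1, 2), (2, 3), (3, 4), (3, 5)])  # NCCC(O)C
mol3 = (["N", "C", "C", "O", "C"], [(0, 1), (1, 2), (2, 3), (2, 4)])              # NCC(O)C

h1, h2, h3 = (molecule_hash(*m) for m in (mol1, mol2, mol3))
print(h1 == h2)  # True: same molecule, different atom numbering
print(h1 == h3)  # False: different molecule
```

Refinement of this kind is a common heuristic, but it cannot distinguish every pair of non-isomorphic graphs, and any fixed-width hash can collide, so equal hashes from such a scheme indicate likely rather than guaranteed identity.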

molsg contents

Classes

The following are the main classes used to calculate molecular and atomic hashes.

Atom

This class stores information about individual atoms, including individual atomic environment hashes.

Mol

This class represents instances of molecules or groups of molecules.

CalcSG

This class is the main access point for molsg hash calculations.

Options

This class is used to specify different settings for generating molecular and atomic hashes.
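The effect of such settings can be illustrated with a toy atom invariant whose inputs depend on which options are switched on. `HashOptions`, `AtomInfo`, and their fields are hypothetical stand-ins, not the attributes of molsg's Options class; the sketch only shows that enabling a feature (here, ring membership) can split atoms that would otherwise hash identically.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class HashOptions:
    # Hypothetical switches; not the attribute names of molsg.Options
    ring_membership: bool = False
    aromaticity: bool = False

@dataclass(frozen=True)
class AtomInfo:
    symbol: str
    in_ring: bool
    aromatic: bool

def atom_invariant(atom: AtomInfo, opts: HashOptions) -> int:
    # Only the features switched on in opts contribute to the hash
    parts = [atom.symbol]
    if opts.ring_membership:
        parts.append(f"ring={atom.in_ring}")
    if opts.aromaticity:
        parts.append(f"arom={atom.aromatic}")
    digest = hashlib.sha256("|".join(parts).encode()).digest()
    return int.from_bytes(digest[:8], "big")

ring_c = AtomInfo("C", in_ring=True, aromatic=False)    # e.g. a cyclohexane carbon
chain_c = AtomInfo("C", in_ring=False, aromatic=False)  # e.g. a hexane carbon

loose = HashOptions()
strict = HashOptions(ring_membership=True)
print(atom_invariant(ring_c, loose) == atom_invariant(chain_c, loose))    # True
print(atom_invariant(ring_c, strict) == atom_invariant(chain_c, strict))  # False
```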

Submodules

Input

This module contains functions for reading molecules into molsg, such as read_smiles.
