API

Classes

class scm.reactmap.Molecule(atoms=None, bonds=None, smiles=None, graph=None, plams_mol=None, active_atoms=None, active_bonds=None)

Bases: object

The main class for inputing/using reactants and products.

There are four possible ways of initializing the molecule:

  • by explicitly specifying atoms and bonds

  • using a SMILES string

  • using a networkx graph

  • using a PLAMS molecule

Keyword Arguments:
  • atoms (list(str)) – a list of atomic symbols. The order in the list corresponds to the index and will be used with the bonds argument.

  • bonds (list(tuple)) – a list of pairs (e1,e2) where e1 and e2 correspond to atom indices (integers) between which there is a bond

  • smiles (str) – a SMILES string for inputting molecular structure(s)

  • graph (networkx.graph) – a networkx graph representing the molecular structure(s)

  • plams_mol (scm.plams.Molecule) – a PLAMS molecule. If bonds are not specified in the plams molecule, they will be automatically guessed.

  • active_atoms (list(int)) – a subset of the atom indices which are allowed to be mapped to an atom with a different index. Atoms outside this set will automatically be mapped using the identity function on the indices (e.g., if atom 5 is not in the active set, atom 5 in the reactants will be mapped to atom 5 in the products).

  • active_bonds (list(int)) – a subset of the bonds that are allowed to be broken (reactant) or formed (product) in a reaction. Bonds outside this set will be preserved but not necessarily mapped to atoms with the same indices.

Examples

Let’s input acetone first using the atoms and bonds arguments:

import scm.reactmap as rm
reactant = rm.Molecule(atoms=['C','C','C','O','H','H','H','H','H','H'],bonds=[(0,1),(1,2),(1,3),(0,4),(0,5),(0,6),(3,7),(3,8),(3,9)])

And now as a SMILES sting using the smiles keyword argument:

reactant = rm.Molecule(smiles="CC(=O)C")
property H2_count

The number of H2 molecules in this Molecule object.

property active_atoms

A list of the indices of the active atoms for the Molecule instance

property active_bonds

A list of the active bonds for the Molecule instance

property assignable_atoms

Atoms which can be assigned to a different (non-identity) index. This also accounts for implicit H’s

get_atom_implicitH(a: int)

This function returns the number of implicit Hydrogens for the atom with index a. If implicit Hydrogens are not being used, this function returns 0.

Parameters:

a (int) – the index of the atom

Returns:

the number of implicit Hydrogens bonded to atom a

Return type:

int

get_graph(implicitH: bool | None = None)

The networkx graph representation of the chemical structure(s).

Note

The graph instance will only be calculated a single time as it is assumed that the reactant/product molecules will not change after input. This prevents redundant computations if the graph is needed in multiple places.

Keyword Arguments:

implicitH (bool) – whether or not to return the graph with implicit Hydrogens

Returns:

a networkx.graph instance

property implicitH_valid

True if the Molecule object can be used with the implicitH description of the system; Otherwise, False (e.g., if there are hypervalent Hydrogens).

Type:

(bool)

is_valid

a boolean denoting if input molecule is valid and can be used in reaction mapping; this may be False if the SMILES string is invalid, if one but not both of atoms and bonds is provided, or if something is incorrect about the active atoms/bonds input.

property neigh

A list of lists containing neighbor atom indices.

Type:

list

property neigh_type

A list of lists containing neighbor atom types

Type:

list

class scm.reactmap.Reaction(reactant, product, name=None)

Bases: object

The main class for inputting (reactant, product) pairs to specify a reaction and investigating the results of a mapping.

The Reaction class is intended as the minimal workspace for users of reactmap. The Reaction class contains the reactant and product molecules and, once the problem has been solved, the optimal mapping and its associated cost.

Keyword Arguments:
  • reactant (reactmap.Molecule) – The reactant in the reaction

  • product (reactmap.Molecule) – The product in the reaction

  • name (str) – A name for the reaction. If no name is supplied, a generic reaction name is generated.

Example

Let’s find the (hypothetical) mapping between CCCCO and CCOCC using the Reaction class to input the molecules. Then, we’ll print the mapping using the value stored in the Reaction instance.:

>>> import reactmap as rm
>>> reac = rm.Molecule(smiles='CCCCO')
>>> prod = rm.Molecule(smiles='CCOCC')
>>> reaction = rm.Reaction(reactant = reac, product = prod)
>>> rm.Map(reaction)
>>> print(reaction.mapping)
[4, 3, 0, 1, 2, 12, 13, 14, 10, 11, 5, 6, 8, 9, 7]
property bipartite_graph

A networkx bipartite graph between the reactant and product atoms.

property bond_breakages

The bonds that are removed from the reactant molecule in the optimal mapping.

property bond_formations

The bonds that are formed in the product molecule in the optimal mapping.

cost

The cost of the reaction mapping

Type:

(int)

error_message

The error message from CBC (if these is one)

Type:

(str)

found_solution

Whether a solution has been found.

Type:

(Bool)

property implicitH

True/False if this reaction is treating the reactants/products using implicit H’s

Type:

(Bool)

invert_solution()

When a mapping has been calculated from reactant to product. Return an inverted mapping from product to reactant.

is_feasible() bool

the reaction mapping problem has a solution if and only if the size of atom type sets is the same for reactant and product i.e., if we have 2 Oxygen atoms and 4 Carbon atoms for the reactants and 1 oxygen and 5 carbons for the products, then there is no feasible mapping

lb

The lower bound found by the heuristics

Type:

(int)

mapping

The atom mapping

Type:

(list)

output_image(filename)

Generates 2 .svg files showing an atom mapping between the reactants and the products.

Parameters:

filename (str) – the base name for the two output .svg files

The two output files will be called <filename>_reactants.svg and <filename>_products.svg. The reactants will show broken bonds in red and the products will show formed bonds in green. The atom indices in both images refer to the reactant indices. This means that the reactant indices correspond to the same indices as were given in the input. The product indices, however, correspond to the index of the reactant atom that was mapped to that product atom. As a concrete example, atom 15 in the reactants image always maps to atom 15 in the products image. Atom 15 in the reactants is also atom 15 in the original reactants input, but atom 15 in the products images may refer to atom 17 (for example) in the original product input.

prod

The product molecule

Type:

(reactmap.Reaction)

reac

The reactant molecule

Type:

(reactmap.Reaction)

property reaction_parity

Either 0 or 1, The parity of the feasible solutions. The cost of the solution will always have the same parity as all feasible mapping costs differ by 2.

Type:

(Int)

show_results(initial_index=0)

Displays the results of the reaction mapping in a human-readable way.

Keyword Arguments:

initial_index (int) – The index of the first atom. You may want to set this to 1 if you prefer the first atom to have index ‘1’.

The output shows several pieces of information.

  • Problem details

    • Number of atoms

    • Atom counts

    • Number of reactant/product bonds

  • Atom mapping details

    • Reactant index - the index of the reactant atom

    • Product index - the index of the product atom mapped to by the corresponding reactant atom

    • Atom type - the chemical symbol of the atom

    • Atom Status - any of the following

      • A - an active atom

      • I - an inactive atom

      • D - a (duplicate) atom that was filtered out of the reaction (and not mapped) because it was part of a molecule that occurred in both reactants and products

  • Bond change details - a list of bonds that are broken and formed given the atom mapping

  • Solution details

    • Time to find the solution

    • The cost of the mapping

solve_time

The total solve time for this problem

Type:

(float)

property top_nodes

A list of the reactant atoms to distinguish the reactant and product atoms.

ub

The upper bound found by the heuristics

Type:

(int)

valid_reaction

Whether the reaction is valid.

Type:

(Bool)

class scm.reactmap.Map(reaction, settings=None, mapping=None)

Bases: object

The class that solves the mapping problem on a Reaction instance.

This is the main class that handles feasibility checking, pre-processing, running heuristics, and setting up and running mixed-integer linear programming (MILP) problems to solve the reaction mapping problem. This class interfaces with Reaction instances and writes relevant mapping data to those instances. Optionally, a Settings object can be provided to specify problem settings.

Keyword Arguments:
  • reaction (reactmap.Reaction) – the reaction class that contains the reactant(s) and product(s). This can be a single Reaction object or a list of Reaction objects.

  • settings (reactmap.Settings) – the Settings class that contains user-specified problem options. If no Settings class is supplied, default values for all settings will be used.

Example

Let’s calculate the mapping cost between CCCCO and CCOCC:

>>> import scm.reactmap as rm
>>> reac = rm.Molecule(smiles='CCCCO')
>>> prod = rm.Molecule(smiles='CCOCC')
>>> reaction = rm.Reaction(reactant = reac, product = prod)
>>> rm.Map(reaction)
>>> print(reaction.cost)
4

We can also use a Settings object to turn off heuristics:

>>> settings = rm.Settings(use_heuristics = False)
>>> rm.Map(reaction, settings = settings)
>>> print(reaction.cost)
4
class scm.reactmap.Settings(**kwargs)

Bases: object

This is the main class for inputting settings for solving the problem(s).

Table 1 General problem solving options

Keyword

Options/Description

implicitH

  • True (default) - Treat all hydrogens as implicit. This greatly reduces the dimension of the optimization problem and can lead to significant performance gains. Note that these constraints assume all Hydrogens are bound to at most one other atom.

  • False - Do not use an implicit Hydrogen representation of the problem.

formulation

These options allow the user to choose between MIP formulations.
  • ‘bondmapping’ - use a larger MIP formulation that explicitly maps all preserved bonds.

  • ‘bondchange’ - (default) - use a smaller MIP formulation that implicitly maps all preserved bonds. This is typically the superior formulation.

cbc_path

A path to the cbc executable if it is not automatically detected by the system (e.g., it is not in the $PATH variable)

warmstart

  • True - Uses the mapping in the reaction class (either given by the user, or from the heuristics) as an initial guess.

  • False (default) - Do not use the mappings as a starting guess.

preprocess

  • True (default) - Attempt to remove redundancies (e.g. identical molecular structures in both reactants and products) to speed up the problem.

  • False - Do not attempt to simplify the problem.

preprocess_map

  • True - Also solve the mappings for molecules that were filtered out by pre-processing (these maps should be obvious between identical molecular structures, but calculating them requires more computational effort)

  • False (default) - Ignore the mappings for filtered structures

time_limit

The total time the MILP solver is allowed to run. Default is no time limit.

Table 2 Heuristics options

Keyword

Options/Description

use_heuristics

  • True (default) - Attempt to solve the problem using inexpensive heuristics rather than the MIP.

  • False - Directly solve the MIP.

lb_bonds

  • True (default) - Calculate an extremely cheap but often loose lower bound based on the minimum number of required bond changes.

  • False - Do not calculate this lower bound.

lb_assignment

  • True (default) - Calculate a cheap and relatively tight lower bound based on solving an assignment problem with local valence difference costs.

  • False - Do not calculate this lower bound.

lb_branchtight

  • True - Calculate an expensive lower bound using an iterative procedure.

  • False (default) - Do not calculate this lower bound.

ub_assignment

  • True (default) - Calculate a cheap and often good upper bound using an assignment problem with costs based on local neighborhood difference between reactant and product atoms.

  • False - Do not calculate this upper bound.

ub_greedy

  • True - Use an expensive, greedy algorithm to try to find an initial atom mapping.

  • False (default) - Do not calculate this upper bound.

ub_qap

  • True - Solve the quadratic assignment problem to local optimality to find an upper bound for the reaction mapping problem. This can be expensive but can often produce a tight upper bound.

  • False (default) - Do not calculate this upper bound.

Table 3 MIP Constraint Options

Keyword

Options/Description

con_valance

  • True (default) - Use valence constraints. These constraints involve limiting the possible number of mapped bonds based on the valences of mapped atoms (e.g., if we map a carbon with 2 oxygen neighbors to a carbon with only 1 oxygen neighbor, we know that at most one of the C-O bonds can be preserved.)

  • False - Do not use valence constraints.

con_tightbonds

  • True - Use tightbonds constraints. These constraints tighten the model by taking into account that a certain bond can only be mapped once. These constraints are typically expensive and usually do not improve performance.

  • False (default) - Do not use these constraints.

con_symmetry

  • True - Take problem symmetries into account to reduce the number of redundant solutions. For example, in the CO2 molecule, the mappings involving the oxygens are interchangeable. These constraints typically do not improve performance.

  • False (default) - Do not use these constraints.

con_symmetry_form

These options are only used if con_symmetry is True
  • 17 - Use a sparse symmetry-breaking constraint.

  • 18 - Use a medium-sparse symmetry-breaking constraint.

  • 19 (default) - Use a dense symmetry-breaking constraint.

con_symmetry_version

These options are only used if con_symmetry is True
  • ‘cycle’ - consider cyclic symmetries

  • ‘node’ - consider nodal symmetries

  • ‘nodecycle’ (default) - consider both nodal and cyclic symmetries

con_symmetry_side

These options are only used if con_symmetry is True
  • ‘reactant’ - use symmetries in the reactant molecule(s) to make constraints

  • ‘product’ - use the symmetries in the product molecule(s)

  • ‘reactantproduct’ (default) - use the symmetries in both reactants and products

Table 4 Output Options

Keyword

Options/Description

print_progress

  • True (default) - In solving a large problem set of reactions, write a one-line summary to stdout after each reaction map is solved.

  • False - Do not write anything to stdout.

Example

We can set options in two ways. First by initializing the settings class, and by changing them directly. We can see the set options with the show_option attribute. Here one can see that the implictH and formulation options have been changed:

>>> import reactmap as rm
>>> settings = rm.Settings(implicitH = False)
>>> settings.formulation = 'bondmapping'
>>> settings.show_options()
──────────────────────────────────────────────────────
                     RM Settings
──────────────────────────────────────────────────────
Key                   Value            Default
──────────────────────────────────────────────────────
print_level           status           status
log_level             info             info
print_progress        False            False
cbc_path              None             None
warmstart             False            False
preprocess            True             True
preprocess_map        False            False
use_heuristics        True             True
lb_bonds              False            False
lb_assignment         True             True
lb_branchtight        False            False
ub_assignment         True             True
ub_qap                False            False
ub_greedy             False            False
implicitH             False            True
con_valence           True             True
con_tightbonds        False            False
formulation           bondmapping      bondchange
con_symmetry          False            False
con_symmetry_version  cycle            cycle
con_symmetry_form     19               19
con_symmetry_side     reactantproduct  reactantproduct
write_log             False            False
time_limit            None             None
──────────────────────────────────────────────────────
__str__()

‘ Print a table of all settings, the current value and the default setting.

restore_defaults()

Resets all options to the default values.

show_options()

This displays all settings options as well as their current and default values.

class scm.reactmap.RMLogger(print_level='error', log_level='info', write_log=False, log_name='RM_Log')

Bases: object

This is the logger class for reactmap. This class handles all output from other reactmap components. An instance of this class is available at the package level (as reactmap.logger) and is used throughout the solution process.

Table 5 Useful Output and Logging Options

Keyword

Options/Description

print_level

These options indicate the level of information to write to stdout.
  • ‘info’ - output every step of the solution process

  • ‘status’ (default) - output a summary of the solution process

  • ‘warning’ - only output warnings

  • ‘error’ - only output errors

  • ‘critical’ - only output critical errors

  • ‘none’ - do not write to stdout

log_level

These options indicate the level of information to write to the logfile. write_log must be True for these options to take effect.
  • ‘info’ - output every step of the solution process

  • ‘status’ (default) - output a summary of the solution process

  • ‘warning’ - only output warnings

  • ‘error’ - only output errors

  • ‘critical’ - only output critical errors

  • ‘none’ - do not write to log

write_log

  • True - Write a log file

  • False (default) - Do not write a log file

add_to_log(message)

Adds a message to the log. This function also checks if the internal log string becomes to large, and will flush the log to disk.

Keyword Arguments:

message (str) – The message to add to the log.

conclude_mapping(reaction)

When a mapping is finished. Write a small concluding message about the reaction results.

Keyword Arguments:

reaction (reactmap.Reaction) – The reaction that will is about to be solved.

critical(message)

Create a message of level critical.

Keyword Arguments:

message (str) – The message to be logged and/or printed.

error(message)

Create a message of level error.

Keyword Arguments:

message (str) – The message to be logged and/or printed.

info(message)

Create a message of level info.

Keyword Arguments:

message (str) – The message to be logged and/or printed.

new_reaction(reaction)

When starting to solve a new problem, this function will show some information about the problem, like the name of the reaction, the amount of atoms and number of bonds for both the reactant and product molecules.

Keyword Arguments:

reaction (reactmap.Reaction) – The reaction that will is about to be solved.

open_new_log_file()

Open a log file named after the set log name.

print_message(message)

Prints a message to the command line.

Keyword Arguments:

message (str) – The message to be printed.

set_log_level(output)
Set the log level that messages need to exceed in order to be saved to the log. The options are:
  • info : Show all messages. Also the least important messages as trying individual heuristics.

  • status : Show mayor points in the solving process, like trying heuristics or starting with solving the MILP model.

  • warning : Show warnings. For example when certain settings cannot be used and are thus turned off.

  • error : Show errors.

  • critical : Shows critical errors that force the solver to quit with this problem.

  • none : Show no messages.

set_print_level(output)
Set the print level that messages need to exceed in order to be printed in the command line. The options are:
  • info : Show all messages. Also the least important messages as trying individual heuristics.

  • status : Show mayor points in the solving process, like trying heuristics or starting with solving the MILP model.

  • warning : Show warnings. For example when certain settings cannot be used and are thus turned off.

  • error : Show errors.

  • critical : Shows critical errors that force the solver to quit with this problem.

  • none : Show no messages.

start_new_log_file(log_name='RM_Log')

Start a new log file. If a previous log is still open, it will be closed.

Keyword Arguments:

log_name (str) – The name of the file. The default is ‘RM_Log’.

status(message)

Create a message of level status.

Keyword Arguments:

message (str) – The message to be logged and/or printed.

warning(message)

Create a message of level warning.

Keyword Arguments:

message (str) – The message to be logged and/or printed.

write_data()

Flushes the current log to disk.

property write_log

True/False if a log file should be written (defaults to False)

Submodules

heuristics is a submodule for using heuristic techniques to generate upper and lower bounds for reaction mapping problems. The HeuristicManager class interfaces with all heuristic techniques and manages the best values for upper and lower bounds to the problem. The HeuristicManager class should be sufficient for most users to try different heuristics.

Classes:
  • HeuristicsManager

Submodules:
  • lower_bound

  • upper_bound

class scm.reactmap.heuristics.HeuristicsManager(settings)

Bases: object

The HeuristicsManager class is responsible for executing the heuristics (upper and lower bounds) on a reaction problem. The class will automatically be called by the Map class during a normal reactmap run. Advanced users may sometimes find it useful to work with instances of the HeuristicsManager class, but this will likely be for finding approximate solutions to very difficult problems or other advanced case studies.

Keyword Arguments:
  • reaction (reactmap.Reaction) – The reaction problem for which to find the upper and lower bounds.

  • settings (reactmap.Settings) – The settings class that details which heuristics to try.

Example

We will use the HeuristicsManager class to find an upper bound (a feasible solution) to the reaction mapping problem between CCCCO and CCOCC. Here we find that the default upper bound options are sufficient in finding a tight upper bound.

>>> import reactmap as rm
>>> reac = rm.Molecule(smiles='CCCCO')
>>> prod = rm.Molecule(smiles='CCOCC')
>>> reaction = rm.Reaction(reactant = reac, product = prod)
>>> heurManager = rm.heuristics.HeuristicsManager(rm.Settings(ub_greedy = True))
>>> heurManager.run(reaction)
>>> print(reaction.ub)
4
>>> rm.Map(reaction)
>>> print(reaction.cost)
4
run(reaction)
try_lb(f, heur_name)

This function tries a lower bound. If a better lower bound was found it updates the appropriate fields in the reaction class.

Keyword Arguments:
  • f (func) – The function handle for a lower bound heuristic

  • heur_name (str) – The settings name of the lower bound. This is required for logging.

try_ub(f, heur_name)

This function tries a upper bound. If a better upper bound was found it updates the appropriate fields in the reaction class.

Keyword Arguments:
  • f (func) – The function handle for a upper bound heuristic

  • heur_name (str) – The settings name of the upper bound. This is required for logging.