Automating Workflows

Identifying systems with certain physical properties out of a large database of molecules is a typical task that can be easily automatized. In this tutorial we will show how this can be achieved with a simple PLAMS script.

Note: this tutorial focuses on scripting aspects rather than studying physical phenomena. Thus, the computational methods used might not be entirely appropriate to describe the physical properties of interest.

Introducing the case study

Imagine that, out of a large database of test systems, we want to identify those molecules that absorb light with an energy smaller than 4 eV.

A simple approach would be to calculate the excitation energies (and the corresponding oscillator strengths) using ADF’s time-dependent DFT (TD-DFT) for all the systems of the database. As TD-DFT is a computationally expensive method, this procedure can be quite demanding for larger numbers of molecules.

A feasible approach may thus be realized by pre-screening the large database of molecules first, e.g. by using a less accurate but more efficient method like time-dependent DFTB (TD-DFTB). The (computationally expensive) TD-DFT calculations with ADF need then to be conducted only for those systems exhibiting the most promising adsorption energies in the previous TD-DFTB calculations. Thus, the basic workflow may look as follows:

Step 1
  • Calculate excitation energies and oscillator strengths with TD-DFTB.
  • Select those molecules with electronic excitations of energies below 6 eV and nonvanishing oscillator strengths (since TD-DFTB is less accurate than TD-DFT, we opt for a larger energy threshold in this step). This should significantly reduce the number of molecules to be considered in the following steps.
Step 2
  • Optimize the structures of the promising molecules with ADF.
Step 3
  • Compute the electronic excitations energies using ADF’s TD-DFT and select the molecules with excitations of energies below 4 eV and significant oscillator strengths.

Depending on your command of the Python language, you may want to read the following Python tutorials before proceeding:

Workflow script

Click here to download a tarball file (workflow_tutorial.tar) containing the scripts workflow.py, my_settings.py and a folder molecules with the molecular structures in .xyz format.

You can extract the files by typing in your terminal:

tar -xf workflow_tutorial.tar

The workflow.py script performs the three steps described above:

# workflow.py
# Workflow automation with PLAMS

import os
import sys
sys.path.append(os.getcwd()) # make sure we can import from the current dir
import my_settings # my_settings.py contains some predefined settings

def get_good_excitations(job, energy, osc_str):
  """Returns the excitations satisfying the condition e < en_thresh and o > osc_thresh.
     Units: energies in eV and oscillator strengths in Debye"""
  good_excitations = []
  # Before reading the results, make sure that the job terminated normally
  if job.check():
    # The results on the binary KF file are in atomic units
    exc = [Units.convert(e,'au','eV')
           for e in job.results.readkf('Excitations SS A','excenergies')]
    osc = [Units.convert(o,'au','Debye')
           for o in job.results.readkf('Excitations SS A','oscillator strengths')]
    for e,o in zip(exc,osc):
      # Check whether the excitation satisfies our condition:
      if e < energy and o > osc_str:
        good_excitations.append((e,o))
  return good_excitations

@add_to_class(DFTBResults)
def get_opt_molecule(self):
   return self.get_molecule('Molecule','Coords')

@add_to_class(ADFResults)
def get_opt_molecule(self):
   return self.get_molecule('Geometry','xyz')

################################ Step 0 ################################

# Create a list of Molecules from the .xyz files in the directory 'molecules':

molecules_dir = 'molecules'

molecules = []
# Loop over all the file in the directory
for mol_name in os.listdir(molecules_dir):
  mol = Molecule(os.path.join(molecules_dir,mol_name))
  # the properties attribute of the Molecule object is a PLAMS’ Settings object
  mol.properties.name = os.path.splitext(mol_name)[0]
  molecules.append(mol)

################################ Step 1 ################################

print("Step 1: optimize molecules and calculate excitations with DFTB")

promising_molecules = []
for mol in molecules:
  # The settings are defined in the module ``my_settings``,
  # which we imported at the beginning of our script
  dftb_job = DFTBJob(molecule=mol, settings=my_settings.dftb_GO_plus_excitations(),
                     name='DFTB_' + mol.properties.name)
  dftb_job.run()
  good_excitations = get_good_excitations(dftb_job, energy=6.0, osc_str=1.0E-4)
  if good_excitations:
    print ("Found a molecule with a promising excitation: " + mol.properties.name)
    promising_molecules.append(dftb_job.results.get_opt_molecule())

print("Number of promising molecules: %i " % len(promising_molecules))

################################ Step 2 ################################

print("Step 2: Geometry optimizations with ADF")

optimized_molecules = []
for mol in promising_molecules:
  adf_GO_job = ADFJob(molecule=mol, settings=my_settings.adf_GO(basis='DZP', XC='LDA'),
                      name='ADF_GO_' + mol.properties.name)
  adf_GO_job.run()

  # Before reading the results, make sure that the job terminated normally
  if adf_GO_job.check():
    optimized_molecules.append(adf_GO_job.results.get_opt_molecule())

################################ Step 3 ################################

print("Step 3: Calculate the excitations with ADF")

for mol in optimized_molecules:
  adf_exci_job = ADFJob(molecule=mol, settings=my_settings.adf_excitations(n_excitations=5),
                        name='ADF_EXCI_' + mol.properties.name)
  adf_exci_job.run()
  good_excitations = get_good_excitations(adf_exci_job, energy=4.0, osc_str=1.0E-4)
  if good_excitations:
    print ("Found a molecule with a good excitation: " + mol.properties.name)
    print ("Excitations (energy [eV], oscillator strength [Debye]:" + str(good_excitations))

Go to the folder workflow_tutorial and execute the script by typing:

$ADFBIN/plams workflow.py

The calculation will take about a minute and should yield:

Step 1: optimize molecules and calculate excitations with DFTB
[13:34:33] Job DFTB_AlF3 started
...
[13:34:41] Job DFTB_S2Cl2 finished with status 'successful'
Found a molecule with a promising excitation: S2Cl2
Number of promising molecules: 2
Step 2: Geometry optimizations with ADF
[13:34:41] Job ADF_GO_CSCl2 started
...
[13:34:53] Job ADF_GO_S2Cl2 finished with status 'successful'
Step 3: Calculate the excitations with ADF
[13:34:53] Job ADF_EXCI_CSCl2 started
...
[13:35:18] Job ADF_EXCI_S2Cl2 finished with status 'successful'
Found a molecule with a good excitation: S2Cl2
Excitations (energy [eV], oscillator strength [Debye]):
[(3.426, 0.011), (3.563, 0.015), (3.568, 0.001)]

In this simple example the large database of molecules consists of 5 systems. 2 of the 5 molecules pass the TD-DFTB screening step (Step 1), and eventually only one molecule, S2Cl2, has electronic excitations satisfying our condition of an excitation energy smaller than 4 eV with a significant corresponding oscillator strength.

Settings library

The Settings objects specifying the input option for the computational engines are defined in my_settings.py. We encourage you to develop your own library of input options.

# my_settings.py
# Small library of predefined settings

from scm.plams import *

# Simple immutable settings:

def dftb_GO_plus_excitations():
  sett = Settings()
  sett.input.Task.RunType = 'GO'
  sett.input.DFTB.ResourcesDir = 'QUASINANO2015'
  sett.input.Properties.Excitations.TDDFTB.calc = 'singlet'
  sett.input.Properties.Excitations.TDDFTB.lowest = 10
  return sett

# You can improve flexibility by having optional arguments:

def adf_GO(basis='TZP', XC='GGA PBE'):
  sett = Settings()
  sett.input.Basis.Type = basis
  sett.input.XC['_1'] = XC
  sett.input.NumericalQuality = 'Basic'
  sett.input.Geometry
  return sett

def adf_excitations(basis='TZP', XC='GGA PBE', n_excitations=10):
  sett = Settings()
  sett.input.Basis.Type = basis
  sett.input.XC['_1'] = XC
  sett.input.Excitations.lowest = n_excitations
  sett.input.Excitations.OnlySing = ''
  sett.input.NumericalQuality = 'Basic'
  sett.input.Symmetry = 'NoSym'
  return sett

Miscellaneous remarks

  • Before accessing a Results object of a PLAMS’ job you should assert the correct termination of the corresponding job via the check() member function:

    job.run()
    
    if job.check():
      r = job.results.readkf(section_name, variable_name)
    
  • Further details about the usage of decorator functions can be found in the corresponding section of the PLAMS documentation.