Automating Workflows

Identifying systems with certain physical properties out of a large database of molecules is a typical task that can be easily automatized. In this tutorial we will show how this can be achieved with a simple PLAMS script.

Note: this tutorial focuses on scripting aspects rather than studying physical phenomena. Thus, the computational methods used might not be entirely appropriate to describe the physical properties of interest.

Introducing the case study

Imagine that, out of a large database of test systems, we want to identify those molecules that absorb light with an energy smaller than 4 eV.

A simple approach would be to calculate the excitation energies (and the corresponding oscillator strengths) using ADF’s time-dependent DFT (TD-DFT) for all the systems of the database. As TD-DFT is a computationally expensive method, this procedure can be quite demanding for larger numbers of molecules.

A feasible approach may thus be realized by pre-screening the large database of molecules first, e.g. by using a less accurate but more efficient method like time-dependent DFTB (TD-DFTB). The (computationally expensive) TD-DFT calculations with ADF need then to be conducted only for those systems exhibiting the most promising adsorption energies in the previous TD-DFTB calculations. Thus, the basic workflow may look as follows:

Step 1
  • Calculate excitation energies and oscillator strengths with TD-DFTB.
  • Select those molecules with electronic excitations of energies below 6 eV and nonvanishing oscillator strengths (since TD-DFTB is less accurate than TD-DFT, we opt for a larger energy threshold in this step). This should significantly reduce the number of molecules to be considered in the following steps.
Step 2
  • Optimize the structures of the promising molecules with ADF.
Step 3
  • Compute the electronic excitations energies using ADF’s TD-DFT and select the molecules with excitations of energies below 4 eV and significant oscillator strengths.

Depending on your command of the Python language, you may want to read the following Python tutorials before proceeding:

Workflow script

Click here to download a tarball file (workflow_tutorial.tar) containing the scripts workflow.py, my_settings.py and a folder molecules with the molecular structures in .xyz format.

You can extract the files by typing in your terminal:

tar -xf workflow_tutorial.tar

The workflow.py script performs the three steps described above:

# workflow.py
# Workflow automation with PLAMS

import os, sys
import my_settings # my_settings.py contains some predefined settings


@add_to_class(AMSResults)
def get_dftb_excitations(results):
  if results.job.ok():
    exc_energies   = Units.convert(results.readrkf('Excitations SS A','excenergies','dftb'),'au','eV')
    oscillator_str = results.readrkf('Excitations SS A','oscillator strengths','dftb')
  else:
    exc_energies, oscillator_str = [], []
  return exc_energies, oscillator_str


@add_to_class(ADFResults)
def get_excitations(results):
  if results.job.ok():
    exc_energies   = Units.convert(results.readkf('Excitations SS A','excenergies'),'au','eV')
    oscillator_str = Units.convert(results.readkf('Excitations SS A','oscillator strengths'),'au','Debye')
  else:
    exc_energies, oscillator_str = [], []
  return exc_energies, oscillator_str


################################ Step 0 ################################

# Create a list of Molecules from the .xyz files in the directory 'molecules':

molecules_dir = 'molecules'
molecules = read_molecules(molecules_dir)


################################ Step 1 ################################

print("Step 1: optimize molecules and calculate excitations with DFTB")

promising_molecules = {}
for name, mol in molecules.items():
  # The settings are defined in the module ``my_settings``,
  # which we imported at the beginning of our script
  dftb_job = AMSJob(name='DFTB_'+name, molecule=mol, settings=my_settings.dftb_GO_plus_excitations())
  dftb_job.run()

  exc_energies, oscillator_str = dftb_job.results.get_dftb_excitations()

  # Check whether the excitation satisfies our condition:
  for e,o in zip(exc_energies, oscillator_str):
    if e < 6.0 and o > 1.0E-4:
      print ("Found a molecule with a promising excitation: " + name)
      promising_molecules[name] = dftb_job.results.get_main_molecule()
      break

print("Number of promising molecules: {}".format(len(promising_molecules)))


################################ Step 2 ################################

print("Step 2: Geometry optimizations with ADF")

optimized_molecules = {}
for name, mol in promising_molecules.items():
  adf_GO_job = ADFJob(name='GO_'+name, molecule=mol, settings=my_settings.adf_GO(basis='DZP', XC_fun=('LDA', '')))
  adf_GO_job.run()

  # Before reading the results, make sure that the job terminated normally
  if adf_GO_job.ok():
    optimized_molecules[name] = adf_GO_job.results.get_main_molecule()

# ################################ Step 3 ################################

print("Step 3: Calculate the excitations with ADF")

for name, mol in optimized_molecules.items():
  adf_exc_job = ADFJob(name='EXC_'+name, molecule=mol, settings=my_settings.adf_excitations(n_excitations=5))
  adf_exc_job.run()

  exc_energies, oscillator_str = adf_exc_job.results.get_excitations()

  good_excitations = []
  for e,o in zip(exc_energies, oscillator_str):
    if e < 4.0 and o > 1.0E-4:
      good_excitations.append((e,o))

  if good_excitations:
    print ("Found a molecule with a good excitation: " + name)
    print ("Excitations (energy [eV], oscillator strength):")
    for e,o in good_excitations:
      print("   {:10.6f}   {:10.8f}".format(e,o))

Go to the folder workflow_tutorial and execute the script by typing:

$ADFBIN/plams workflow.py

The calculation will take about a minute and should yield

Step 1: optimize molecules and calculate excitations with DFTB
[13:34:33] JOB DFTB_AlF3 STARTED
...
[13:34:41] JOB DFTB_CSCl2 SUCCESSFUL
Found a molecule with a promising excitation: S2Cl2
Number of promising molecules: 2
Step 2: Geometry optimizations with ADF
[13:34:41] JOB GO_CSCl2 STARTED
...
[13:34:53] JOB GO_S2Cl2 SUCCESSFUL
Step 3: Calculate the excitations with ADF
[13:34:53] JOB EXC_CSCl2 STARTED
...
[13:35:18] JOB EXC_S2Cl2 SUCCESSFUL
Found a molecule with a good excitation: S2Cl2
Excitations (energy [eV], oscillator strength):
     3.434510   0.01157421
     3.555938   0.00112001
     3.561667   0.01508418

In this simple example the large database of molecules consists of 5 systems. 2 of the 5 molecules pass the TD-DFTB screening step (Step 1), and eventually only one molecule, S2Cl2, has electronic excitations satisfying our condition of an excitation energy smaller than 4 eV with a significant corresponding oscillator strength.

Settings library

The Settings objects specifying the input option for the computational engines are defined in my_settings.py. We encourage you to develop your own library of input options.

# my_settings.py
# Small library of predefined settings

from scm.plams import Settings

# Simple immutable settings:

def dftb_GO_plus_excitations():
  settings = Settings()
  settings.input.AMS.Task = 'GeometryOptimization'
  settings.input.AMS.GeometryOptimization.Convergence.Gradients = 1.0e-5
  settings.input.DFTB.ResourcesDir = 'QUASINANO2015'
  settings.input.DFTB.Properties.Excitations.TDDFTB.calc = 'singlet'
  settings.input.DFTB.Properties.Excitations.TDDFTB.lowest = 10
  settings.input.DFTB.Occupation.Temperature = 5.0
  return settings

# You can improve flexibility by having optional arguments:

def adf_GO(basis='TZP', XC_fun=('GGA','PBE')):
  settings = Settings()
  settings.input.Basis.Type = basis
  settings.input.XC[XC_fun[0]] = XC_fun[1]
  settings.input.NumericalQuality = 'Basic'
  settings.input.Geometry
  settings.input.Geometry.converge = 'grad=1.0e-5'
  return settings

def adf_excitations(basis='TZP', XC_fun=('GGA','PBE'), n_excitations=10):
  settings = Settings()
  settings.input.Basis.Type = basis
  settings.input.XC[XC_fun[0]] = XC_fun[1]
  settings.input.Excitations.lowest = n_excitations
  settings.input.Excitations.OnlySing = ''
  settings.input.NumericalQuality = 'Basic'
  settings.input.Symmetry = 'NoSym'
  return settings

Miscellaneous remarks

  • Before accessing a Results object of a PLAMS’ job you should assert the correct termination of the corresponding job via the ok() method:

    results = job.run()
    
    if results.job.ok():
      r = results.readkf(section_name, variable_name)
    
  • Further details about the usage of decorator functions can be found in the corresponding section of the PLAMS documentation.