7.6. Job and Engine Collections

Collections represent containers for AMS jobs that need calculation before a Data Set can be evaluated. They store all settings necessary for the execution in a human-readable YAML format. We divide these settings into two Collections: Job and Engine Collection.

See also

YAML: see the project homepage and the Wikipedia article.

The Job Collection holds information relevant to the AMS driver, such as the chemical system, the driver task, and the properties to calculate. The Engine Collection stores different AMS engine blocks.
Combining an entry from the Job Collection with any entry from the Engine Collection ensures that results are comparable: Nothing in the job should change apart from the engine that is used to execute it.
Fundamentally, collections behave like dictionaries with key-value pairs. We refer to the key as ID (or jobID in case of the Job Collection).

Important

Each ID within a Collection needs to be unique due to the dict-like nature of the classes.


7.6.1. Job Collection

The Job Collection stores input that can be read by the AMS driver, along with optional metadata. When stored to disk, the data looks similar to the following:

---
ID: H2O_001
ReferenceEngineID: myDFTengine_1
AMSInput: |
   Task SinglePoint
   system
     atoms
        H     -0.7440600000      1.1554900000     -0.0585900000
        O      0.6438200000      1.2393700000      0.0060400000
        H     -3.3407000000      0.2702700000      0.7409900000
     end
   end
Source: ThisAwesomePaper
PlotIt: True
---
ID: CH4_001
ReferenceEngineID: DFTB_1
AMSInput: |
   Task GeometryOptimization
   properties
     Gradients True
   end
   system
     atoms
        C     -0.7809900000      1.1572800000     -0.0369200000
        H      0.6076000000      1.2309400000      0.0140600000
        H      1.3758800000      0.0685800000      0.0285600000
        H      0.7425100000     -1.1714200000     -0.0084500000
        H     -0.6465400000     -1.2538900000     -0.0595900000
     end
   end
...

The collection above contains two entries. We can recognize the ID, by which each entry is labeled. Everything else is a textual representation of the value stored under this ID. At runtime, each entry is stored as a JCEntry instance. Basic usage is discussed below.
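To make the mapping between the on-disk layout and the entries concrete, the following sketch pulls the ID of every entry out of such a text. It uses plain string handling instead of a YAML parser, and the helper name `list_entry_ids` is hypothetical; it only illustrates the file layout and is not how ParAMS loads a collection.

```python
def list_entry_ids(text: str) -> list:
    """Return the value of every top-level 'ID:' line, in file order."""
    ids = []
    for line in text.splitlines():
        # Top-level keys start in column 0; the AMSInput block is indented.
        # 'ReferenceEngineID:' does not start with 'ID:' and is therefore skipped.
        if line.startswith("ID:"):
            ids.append(line[len("ID:"):].strip())
    return ids


# Shortened version of the two-entry collection shown above.
sample = """\
---
ID: H2O_001
ReferenceEngineID: myDFTengine_1
AMSInput: |
   Task SinglePoint
Source: ThisAwesomePaper
---
ID: CH4_001
ReferenceEngineID: DFTB_1
AMSInput: |
   Task GeometryOptimization
...
"""

ids = list_entry_ids(sample)
print(ids)
```

Each `---` line starts a new entry, and the `ID` line is what becomes the key of that entry in the collection.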

The Job Collection inherits basic dictionary functionality. Consequently, all commonly known methods are available – with a few additions:

>>> jc.keys() # All IDs:
dict_keys(['H2O_001'])
>>> jc() # A shortcut to list(jc.keys())
['H2O_001']
>>> jc.values() # All JCEntries:
dict_values([<scm.params.core.jobcollection.JCEntry object at 0x7f5a0510db38>])
>>> for k,v in jc.items(): # Or both of the above
...     ...

7.6.1.1. Adding Jobs

Important

The Job Collection only stores instances of JCEntry.

An instance of the Job Collection can be initialized without further parameters. The instance can be populated with JCEntry objects only. Every JCEntry instance needs to have at least the attributes settings and molecule defined. Adding a reference_engine string and metadata is optional:

>>> jc = JobCollection()
>>> jce = JCEntry()
>>> jce.settings.input.AMS.Task = 'SinglePoint' # Must be a PLAMS Settings() instance
>>> jce.molecule = h2o_mol                      # Must be a PLAMS Molecule() instance
>>> jce.reference_engine = 'myDFTengine'        # Optional: A string that matches the ID of an EngineCollection entry
>>> jce.metadata['Source'] = 'SomePaper'        # Optional: Metadata can be added to the `metadata` dictionary
>>> jc.add_entry('H2O_001',jce)                 # Adding `jce` with ID 'H2O_001'.
>>> # jc['H2O_001'] = jce                       # This is the same as the line above.
>>> # Adding more entries with the same ID to `jc` is not possible anymore.

All attributes can also be assigned when instantiating the object:

>>> jce = JCEntry(settings, molecule, refengine, **metadata)

See below for a textual representation of jce.

7.6.1.2. Lookup

>>> 'H2O_001' in jc
True
>>> jc['H2O_001']      # Get the respective JCEntry
>>> jc('H2O_001')      # Same
<scm.params.core.jobcollection.JCEntry object at 0x7f5a0510db38>

7.6.1.3. Removing Entries

>>> del jc['H2O_001']
>>> len(jc)
0
>>> # Alternatively, instead of the `del` statement above:
>>> # jc.remove_entry('H2O_001')

7.6.1.4. Renaming Entries

>>> oldkey = 'H2O_001'
>>> newkey = 'H2O_002'
>>> jc.rename_entry(oldkey, newkey)

Collections can be added. Duplicate keys will use the value of the first argument (jc rather than jc2):

>>> added_jc = jc + jc2
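The precedence rule for duplicate keys can be mimicked with ordinary dictionaries. The sketch below uses plain dicts with placeholder strings as stand-ins for two collections (an assumption made so that it runs without AMS). It contrasts the addition described here with an in-place update, which the API reference describes as possibly overwriting existing entries.

```python
# Plain dicts standing in for two collections; the values are placeholders.
jc = {"H2O_001": "entry from jc", "CH4_001": "entry from jc"}
jc2 = {"H2O_001": "entry from jc2", "NH3_001": "entry from jc2"}

# Emulate `jc + jc2`: start from jc and add only IDs that jc does not hold yet.
added = dict(jc)
for key, value in jc2.items():
    added.setdefault(key, value)

# Emulate an in-place update, where entries from jc2 overwrite those in jc.
updated = dict(jc)
updated.update(jc2)

print(added["H2O_001"])    # entry from jc
print(updated["H2O_001"])  # entry from jc2
```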

7.6.1.5. Comparison

>>> jc == jc
True

Metadata can be stored per-entry, as shown above. For the storage of global metadata or comments, the header attribute can be used to store a string:

>>> comments = """this is a multiline
... header comment"""
>>> jc.header = comments
>>> jc.store('jc_with_header.yaml') # The header string will be stored when writing to YAML

The header is also available to the Data Set and Engine Collection classes.

7.6.1.6. Saving and loading

Storing and loading collections can be done with:

>>> jc.store('jobs.yml')       # Store the collection in a YAML format.

This produces:

---
ID: H2O_001
ReferenceEngineID: myDFTengine
AMSInput: |
   system
     atoms
              H      0.0000000000      0.0000000000      0.3753600000
              H      0.0000000000      0.0000000000     -0.3753600000
     end
   end
   task SinglePoint
Source: SomePaper
...

The textual representation of a single JCEntry can also be obtained by calling str() on it. Calling print(jc['H2O_001']) would produce the same output as above (since our Job Collection only has one entry).

The file can then be loaded:

>>> jc2 = JobCollection('jobs.yml') # From YAML

7.6.1.7. Generating AMSJobs

The JobCollection.to_amsjobs() method can be used to quickly generate plams.AMSJob instances from all the entries in a Job Collection. You can limit the output to a specific subset of entries by providing the jobids argument. An additional engine_settings argument can be passed to be added to all AMSJob.settings, making the returned AMSJobs executable:

>>> engine_settings = Settings()
>>> engine_settings.input.BAND # Assuming we would like to run all jobs in `jc` with BAND
>>> jobids = ['job1', 'job2']
>>> jobs = jc.to_amsjobs(jobids, engine_settings)
>>> all(isinstance(job, AMSJob) for job in jobs)
True
>>> [job.run() for job in jobs] # The jobs can now be executed by PLAMS

7.6.1.8. Running Collection Jobs

All entries in a Job Collection can be calculated at once with the JobCollection.run() method, which returns a dictionary of {jobID : plams.AMSResults} pairs. This can be useful when manual interaction with the job results is needed for a specific engine (for example when calculating the reference data):

>>> len(jc)
20
>>> engine = Settings() # The JCEntries do not include engine settings
>>> engine.input.BAND   # We would like to run all stored jobs with BAND
>>> results = jc.run(engine) # Will run all jobs in jc and return their results object
>>> all(r.ok() for r in results.values()) # The returned value is a dict of {jobID : AMSResults}
True
>>> energies = [r.get_energy() for r in results.values()] # We can now process the results

Alternatively, a subset of jobs can be calculated by providing the jobids argument:

>>> ids_to_run = ['myjob1', 'myotherjob']
>>> results = jc.run(engine, jobids=ids_to_run)
>>> len(results)
2

Note

This method uses the AMSWorker interface where possible. Use the use_pipe keyword to disable it.

7.6.2. Engine Collection

Engine Collections are very similar to Job Collections and can be used in exactly the same manner. The main difference is that an Engine Collection stores Engine instances instead of JCEntry instances. A textual representation looks similar to this:

---
ID: DFTB_1
AMSInput: |
   engine DFTB
     Model DFTB3
     ResourcesDir DFTB.org/3ob-3-1
   endengine
Comment: My favourite engine.
...

Important

The Engine Collection only stores instances of Engine.

Within each entry, only the settings attribute must be defined. The remaining metadata is optional.

>>> ec = EngineCollection()
>>> e  = Engine()
>>> e.settings.input.DFTB.Model = 'DFTB3' # e.settings is a PLAMS Settings() instance.
>>> e.settings.input.DFTB.ResourcesDir = 'DFTB.org/3ob-3-1'
>>> e.metadata['Comment'] = 'My favourite engine.' # This is optional.
>>> ec.add_entry('DFTB_1',e)
>>> # print(ec['DFTB_1']) reproduces the textual representation above

See also

For further examples on how to work with the collection, please refer to the Job Collection section.

7.6.3. Collections API

7.6.3.1. JCEntry

class JCEntry(settings=None, molecule=None, reference_engine: Optional[str] = None, extra_engine: Optional[str] = None, **metadata)

A class representing a single job collection entry, i.e., an AMS job with optionally an associated reference engine and metadata.

Attributes

settings: plams.Settings

plams.Settings() instance, holding the input for the job.
If no settings are provided, a new object representing a single point calculation will be created.

Additionally, the following strings can be used as shortcuts to automatically initialize a Settings instance with an appropriate AMS Task:

  • ‘sp’: SinglePoint

  • ‘go’: GeometryOptimization

  • ‘md’: MolecularDynamics

  • ‘pes’: PESScan

  • ‘ts’: TransitionStateSearch

Important

Can not be empty when adding a class instance to the JobCollection.

molecule: plams.Molecule

plams.Molecule() for the system of interest.

Important

Can not be empty when adding a class instance to the JobCollection.

reference_engine: optional, str

ID of the reference engine, used for lookup in the EngineCollection.

extra_engine: optional, str

ID of the extra engine, used for lookup in the EngineCollection. Specifying extra_engine allows you to have different per-job engine settings during the parametrization. When parametrizing DFTB or xTB, this can for example be used to have different k-space samplings for different jobs.

Important

The respective settings of the Engine instance should include the complete Engine block, which should match the parametrization interface. For example, if parametrizing XTBParameters, appropriate Engine settings would be Settings.input.dftb.kspace.quality = 'Basic'.

metadata: optional

Additional keyword arguments will be interpreted as metadata and stored in this variable.
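The task shortcuts listed for the settings attribute amount to a simple lookup table. The following sketch spells that table out in plain Python. The helper `expand_task` is hypothetical and only illustrates the mapping; how ParAMS resolves the strings internally (for example with respect to upper and lower case) is not covered by this section.

```python
# Shortcut strings and the AMS tasks they stand for, as listed for `settings`.
TASK_SHORTCUTS = {
    "sp": "SinglePoint",
    "go": "GeometryOptimization",
    "md": "MolecularDynamics",
    "pes": "PESScan",
    "ts": "TransitionStateSearch",
}


def expand_task(shortcut: str) -> str:
    """Translate a shortcut into the full task name (hypothetical helper)."""
    try:
        return TASK_SHORTCUTS[shortcut]
    except KeyError:
        raise ValueError(f"Unknown task shortcut: {shortcut!r}") from None


print(expand_task("go"))  # GeometryOptimization
```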

__init__(settings=None, molecule=None, reference_engine: Optional[str] = None, extra_engine: Optional[str] = None, **metadata)

Creates a new job collection entry.

classmethod from_amsjob(amsjob, reference_engine=None, extra_engine=None, task=None, molecule='final', remove_bonds=False, **metadata) -> Tuple[str, scm.params.core.jobcollection.JCEntry]

Returns a 2-tuple (suggested_name, JCEntry)

JCEntry contains AMS settings and system from amsjob. suggested_name == amsjob.name

amsjob can either be an AMSJob, an AMSResults, or a string pointing to the job directory or ams.rkf file

The task is by default the same as in the amsjob, but can be changed with the task argument.

molecule: str

Determines which system is read from amsjob:

  • ‘initial’: the initial system of a finished AMSJob

  • ‘final’: the final system of a finished AMSJob

  • ‘jobmolecule’: the AMSJob.molecule

  • ‘first_history_indices’: the frame given by the first entry in PESScan%HistoryIndices in History

This method adds the following additional metadata to the resulting JCEntry instance:
  • Origin - path to the calculation from which this instance originated

  • OriginalEnergyHartree - calculated energy of the system (if present in the original job)

__str__()

Returns a string representation of a job collection entry.

copy()

Create a copy of this entry.

is_pipeable() -> bool

Based on settings, return whether the job can be calculated using the AMSWorker interface.

__eq__(other)

Check if two entries are the same.

7.6.3.2. JobCollection

See also

This class inherits from BaseCollection. Most methods can be found there.

class JobCollection(*a, **kw)

A class representing a job collection, i.e. a collection of JCEntry instances.

Attributes

header: dict

A dictionary with global metadata that will be printed at the beginning of the file when store() is called. Will always contain the ParAMS version number and class name.

engines: EngineCollection

An EngineCollection instance attached to this collection. Used in run() and run_reference().

__init__(*a, **kw)

Creates a new collection, optionally populating it with entries from yamlfile.

load(fpath='job_collection.yaml')

Alias for load_yaml().

load_yaml(yamlfile)

Loads all job collection entries from a (compressed) YAML file and adds them to the job collection.

duplicate_entry(key, newkey)

Duplicates this collection’s entry associated with key and stores it under newkey.

store(yamlfile='job_collection.yaml')

Stores the JobCollection to a (compressed) YAML file.
The file will be automatically compressed when the file ending is .gz or .gzip. If at least one Engine is defined in the self.engines attribute, the Engine Collection is also stored, under the same name as yamlfile with ‘_engines’ appended.
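Compression by file ending can be pictured with the standard library alone. The sketch below is a minimal stand-in, assuming nothing about the ParAMS internals: the helpers `write_text` and `read_text` are hypothetical, and only the rule "compress when the name ends in .gz or .gzip" is taken from the description above.

```python
import gzip
import os
import tempfile


def write_text(path: str, text: str) -> None:
    """Write text to path, gzip-compressing when the name ends in .gz or .gzip."""
    if path.endswith((".gz", ".gzip")):
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            fh.write(text)
    else:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text)


def read_text(path: str) -> str:
    """Read text back, handling the same two endings transparently."""
    opener = gzip.open if path.endswith((".gz", ".gzip")) else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return fh.read()


content = "---\nID: H2O_001\n...\n"
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "jobs.yaml")
    packed = os.path.join(tmp, "jobs.yaml.gz")
    write_text(plain, content)
    write_text(packed, content)
    roundtrip_ok = read_text(plain) == content and read_text(packed) == content
    with open(packed, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"  # gzip magic number
print(roundtrip_ok, is_gzip)
```

With the collection classes themselves, the same effect is obtained by choosing the file name, for example jc.store('jobs.yaml.gz').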

from_jobids(jobids: Set[str]) -> scm.params.core.jobcollection.JobCollection

Generates a subset of self, reduced to entries in jobids.

writexyz(filename: str, jobids: Optional[Sequence] = None)

Writes geometries in this instance to one xyz trajectory file.

Parameters

filename: str

Path to the xyz file that will be written

jobids: optional, sequence of strings

Write only the jobIDs present in this Sequence.

to_amsjobs(jobids: Optional[Sequence] = None, engine_settings: Optional[scm.plams.core.settings.Settings] = None) -> List[scm.plams.interfaces.adfsuite.ams.AMSJob]

Batch-generate a list of plams.AMSJob from entries in the Job Collection.
If engine_settings is provided, will __add__() the instance to each entry’s settings when generating the jobs.

This method is equivalent to:

engine_settings = Settings()
engine_settings.input.BAND
jobs = [AMSJob(name=ename, molecule=e.molecule, settings=e.settings+engine_settings)
        for ename, e in jc.items() if ename in jobids]  # `jc` is the JobCollection instance

Parameters

jobids: optional, sequence of strings

A sequence of keys that will be used to generate the AMSJobs. Defaults to all jobs in the collection.

engine_settings: optional, plams.Settings

A plams.Settings instance that will be added to every AMSJob.settings.

Returns

List[plams.AMSJob]

run(engine_settings: Union[scm.plams.core.settings.Settings, Type[scm.params.parameterinterfaces.base.BaseParameters]], jobids: Optional[Sequence] = None, parallel: Optional[scm.params.common.parallellevels.ParallelLevels] = None, use_pipe=True, _skip_normjobs=False) -> Dict[str, Union[scm.plams.interfaces.adfsuite.ams.AMSResults, scm.plams.interfaces.adfsuite.amsworker.AMSWorkerResults]]

Run all jobs in the job collection using engine settings from engine_settings. If a jobcollection entry has an extra_engine defined, you must also specify an engine_collection which contains the definition of the extra_engine that is used to augment the engine settings on a per-job basis.

Returns the respective AMSResults dict.

When running jobs that are incompatible with the AMSWorker interface or when use_pipe=False, this method will use the regular PLAMS backend. Note that when plams.init() is not called prior to this method, all executed job results will be stored in the system’s temporary directory only for as long as the return value is referenced at runtime. You can make the results storage persistent or change the PLAMS working directory by manually calling plams.init before calling this method.

Parameters

engine_settings: plams.Settings or Parameter Interface type

A plams.Settings instance representing the AMS engine block, or a parameter interface.
Every entry will be executed with this engine. The engine can be augmented if there is an engine_collection and the job collection entry has an extra_engine (ExtraEngineID) defined.

jobids: optional, Sequence[str]

A Sequence of jobids that will be calculated.
Defaults to all jobs in the collection.

parallel: optional, ParallelLevels

Parallelization for running the jobs from the collection.

use_pipe: bool

Whether to use the AMSWorker interface or not.

_skip_normjobs: bool

When both plams.AMSWorker jobs and plams.AMSJobs need to be computed, skip the computation of the latter if any of the preceding plams.AMSWorkerResults is not results.ok(). By default, this is set to True during an optimization to save time, because a single failed job results in the cost function being inf.

Returns

results: dict

Dictionary mapping the jobID to a plams.AMSResults or plams.AMSWorkerResults.
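The effect of _skip_normjobs described for run() can be pictured as a two-stage loop. The sketch below is a simplified stand-in and not the ParAMS implementation: jobs are modelled as zero-argument callables that return True on success, and the function name `run_two_stage` is hypothetical.

```python
from typing import Callable, Dict


def run_two_stage(
    pipeable: Dict[str, Callable[[], bool]],
    regular: Dict[str, Callable[[], bool]],
    skip_normjobs: bool = False,
) -> Dict[str, bool]:
    """Run pipeable jobs first; optionally skip the regular ones after a failure.

    Each job is a zero-argument callable returning True when it succeeded.
    """
    results = {jobid: job() for jobid, job in pipeable.items()}
    if skip_normjobs and not all(results.values()):
        return results  # a failure already occurred, so the rest is skipped
    results.update({jobid: job() for jobid, job in regular.items()})
    return results


fast = {"sp_1": lambda: True, "sp_2": lambda: False}  # sp_2 fails
slow = {"md_1": lambda: True}

print(run_two_stage(fast, slow, skip_normjobs=True))   # md_1 is never run
print(run_two_stage(fast, slow, skip_normjobs=False))  # md_1 is run
```

Skipping the second stage saves time during an optimization, where one failed job already makes the cost function infinite.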

run_reference(jobids: Optional[Sequence] = None, parallel: Optional[scm.params.common.parallellevels.ParallelLevels] = None, use_pipe=True, _skip_normjobs=False) -> Dict[str, Union[scm.plams.interfaces.adfsuite.ams.AMSResults, scm.plams.interfaces.adfsuite.amsworker.AMSWorkerResults]]

Only useful if not all reference engines per entry are the same (otherwise same functionality as run()).
Runs multiple jobs with different engines, as defined by each entry’s reference_engine attribute. The corresponding settings will be obtained from self.engines. Assumes that all entries have a reference_engine defined, and all values are also present in self.engines. Will raise a ValueError otherwise.

See run() for a description of the remaining parameters.
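The preconditions of run_reference() can be illustrated with a small grouping step. The sketch below uses plain dictionaries and a hypothetical helper `group_by_reference_engine`; it only mirrors the documented rule that every entry needs a reference engine that is present in the attached engine collection, and that a ValueError is raised otherwise.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def group_by_reference_engine(
    reference_engines: Dict[str, Optional[str]],
    engine_ids: set,
) -> Dict[str, List[str]]:
    """Map each engine ID to the job IDs that reference it.

    Raises ValueError if a job has no reference engine or names an unknown one.
    """
    groups = defaultdict(list)
    for jobid, engine_id in reference_engines.items():
        if engine_id is None:
            raise ValueError(f"{jobid}: no reference engine defined")
        if engine_id not in engine_ids:
            raise ValueError(f"{jobid}: engine {engine_id!r} not in engine collection")
        groups[engine_id].append(jobid)
    return dict(groups)


# jobID -> reference engine ID, as they would be stored in the entries.
jobs = {"H2O_001": "myDFTengine_1", "CH4_001": "DFTB_1", "NH3_001": "DFTB_1"}
groups = group_by_reference_engine(jobs, {"myDFTengine_1", "DFTB_1"})
print(groups)
```

Each group can then be thought of as one run() call with the settings of the corresponding engine.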

set_extra_engine(settings: scm.plams.core.settings.Settings, engine_id: str = 'ParAMS', overwrite: bool = False)

Function to set the extra_engine (ExtraEngineID) attribute of all job collection entries, and at the same time define the engine in the linked engine collection.

This is mostly useful to ensure that the ‘ParAMS’ engine used by the ParAMS GUI is defined and used by all job collection entries.

settings: Settings

Engine settings (settings.input.dftb)

engine_id: str

The name of the engine

overwrite: bool

If True, overwrite the extra_engine in all job collection entries. If False, only set extra_engine if it hasn’t been defined. The engine definition in the engine collection is always overwritten.
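The interplay of engine_id and overwrite can be sketched with plain dictionaries. This is a stand-in under simplifying assumptions, not the ParAMS code: `extra_engines` maps each jobID to its current extra engine ID (or None), `engines` plays the role of the linked engine collection, and the settings are ordinary dicts.

```python
from typing import Dict, Optional


def set_extra_engine(
    extra_engines: Dict[str, Optional[str]],
    engines: Dict[str, dict],
    settings: dict,
    engine_id: str = "ParAMS",
    overwrite: bool = False,
) -> None:
    """Stand-in for the behaviour described above, using plain dicts."""
    engines[engine_id] = settings  # the engine definition is always replaced
    for jobid, current in extra_engines.items():
        if overwrite or current is None:
            extra_engines[jobid] = engine_id


extra = {"job1": None, "job2": "custom_kspace"}
engines = {"custom_kspace": {"kspace": "Good"}}

set_extra_engine(extra, engines, {"kspace": "Basic"})
print(extra)  # job2 keeps its own engine

set_extra_engine(extra, engines, {"kspace": "Basic"}, overwrite=True)
print(extra)  # both entries now point to 'ParAMS'
```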

7.6.3.3. Engine

class Engine(settings=None, metadata=None)

A class representing an AMS engine, i.e. its input (the engine block) and optional metadata.

Attributes:

settings: plams.Settings

A plams.Settings instance, holding the AMS input information for the Engine.

Important

Can not be empty when adding the class instance to the EngineCollection.

metadata: dict

Additional metadata entries can be stored in this variable.

type: str

String representation of the engine used. Will be generated automatically.

__init__(settings=None, metadata=None)

Create a new Engine entry.

__str__()

Returns a string representation of an AMS engine.

__eq__(other)

Check if two engines are the same.

copy()

Return a copy of this entry

7.6.3.4. EngineCollection

See also

This class inherits from BaseCollection. Most methods can be found there.

class EngineCollection(yamlfile=None, _gui=False)

A class representing a collection of engines.

Attributes

header: dict

A dictionary with global metadata that will be printed at the beginning of the file when store() is called. Will always contain the ParAMS version number and class name.

load(yamlfile='engine_collection.yaml')

Loads all engines from a yaml file and adds them to the collection.

store(yamlfile='engine_collection.yaml')

Stores the EngineCollection to a (compressed) YAML file.
The file will be automatically compressed when the file ending is .gz or .gzip.

7.6.3.5. Collection Base Class

All collections inherit from this base class.

class BaseCollection(yamlfile=None, _gui=False)

Base class for JobCollection and EngineCollection

Attributes

header: dict

A dictionary with global metadata that will be printed at the beginning of the file when store() is called. Will always contain the ParAMS version number and class name.

__init__(yamlfile=None, _gui=False)

Creates a new collection, optionally populating it with entries from yamlfile.

load(yamlfile) -> str

Abstract method, call in the child’s load method. This method returns the raw string from file and extracts the header. Call it (with super()) before or after the actual loading.

store(yamlfile)

Stores the entire collection in a (compressed) yamlfile.

add_entry(eid: str, entry: Any, replace=False)

Adds an entry to the collection.

Parameters:

eid: str

Unique ID for the entry. Will warn and convert to string when a non-string ID is provided.

entry: subject to _check_entry()

This subclass is meant to store the actual contents. The structure of the subclass will be different, depending on the collection.

replace: bool

By default, adding entries with an ID that is already present will raise a KeyError. Set this to True if you would like to overwrite the entry stored at that ID instead.
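The contract of add_entry() can be reproduced with a small dict subclass. `MiniCollection` below is a hypothetical stand-in written for illustration only; it follows the documented behaviour (KeyError on duplicate IDs unless replace=True, IDs converted to strings) but omits the warning and the entry checks of the real classes.

```python
class MiniCollection(dict):
    """Hypothetical dict-based stand-in mimicking the add_entry() contract."""

    def add_entry(self, eid, entry, replace=False):
        eid = str(eid)  # IDs are kept as strings (the warning is omitted here)
        if eid in self and not replace:
            raise KeyError(f"ID {eid!r} is already present")
        super().__setitem__(eid, entry)


col = MiniCollection()
col.add_entry("H2O_001", "first entry")

try:
    col.add_entry("H2O_001", "second entry")
except KeyError:
    print("duplicate rejected")

col.add_entry("H2O_001", "second entry", replace=True)
print(col["H2O_001"])  # second entry
```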

add_entry_nonstrict(eid, entry, reuse_existing=False)

Adds an entry to the collection. If the eid already exists, creates a new unique name by appending an integer.

reuse_existing: bool

If True, compare the contents of the current entry to each existing entry. If the new entry duplicates an old one, do not add anything and return the existing eid. The type added to the collection (e.g. JCEntry or Engine) must implement the __eq__ method to compare values.
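A possible shape of this logic is sketched below with a plain dict as the collection. The `<eid>_<n>` naming scheme and the starting integer are assumptions made for the sketch, since the exact form of the generated ID is not specified here. The function only illustrates the two documented ideas: derive a unique ID on a clash, and optionally reuse an existing equal entry.

```python
def add_entry_nonstrict(collection: dict, eid: str, entry, reuse_existing=False) -> str:
    """Add entry under eid or a derived unique ID; return the ID actually used.

    The '<eid>_<n>' naming scheme is an assumption made for this sketch.
    """
    if reuse_existing:
        for existing_id, existing in collection.items():
            if existing == entry:  # relies on the entry type's __eq__
                return existing_id
    new_id, counter = eid, 1
    while new_id in collection:
        new_id = f"{eid}_{counter}"
        counter += 1
    collection[new_id] = entry
    return new_id


col = {}
print(add_entry_nonstrict(col, "H2O", {"task": "sp"}))  # H2O
print(add_entry_nonstrict(col, "H2O", {"task": "go"}))  # H2O_1
print(add_entry_nonstrict(col, "H2O", {"task": "sp"}, reuse_existing=True))  # H2O
```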

remove_entry(eid)

Removes an entry matching eid from the collection, or throws an exception if the entry is not found.

rename_entry(oldkey, newkey)

Rename an entry in the collection to be associated with newkey

duplicate_entry(oldid: str, newid: str)

Maps an entry with an existing oldid to a new entry with newid (without removing the old one)

_check_entry(eid, entry, replace=False)

Abstract method. Add additional checks here, then call super()._check_entry(eid,entry).

__str__()

Return str(self).

items()

Return all key:value pairs in collection.

values()

Return all entries in collection.

keys()

Return all IDs in collection.

__getitem__(key)

Get the entry with matching key (ID).

__setitem__(key, value)

Same as add_entry() with replace=True.

__delitem__(key)

Same as remove_entry().

__len__()

Return number of entries in collection.

__iter__()

Iterate over key:value pairs.

__add__(other)

Add two collections to return a new collection. Entries from other will only be added if not already present in self.

update(other)

Update the instance with entries from other (possibly overwriting existing entries).

__contains__(key)

Check if ID == key is in collection.

__eq__(other)

Check if two collections are the same.

__ne__(other)

Return self!=value.

__repr__()

Return repr(self).