3.2. Jobs

The job is without doubt the most important object in the PLAMS library. A job is the basic unit of computational work, and running jobs is the main goal of PLAMS scripts.

Various jobs may differ in details quite a lot, but they all follow the common set of rules defined in the abstract class Job.

Note

Being an abstract class means that Job class has some abstract methods – methods that are declared but not implemented (they do nothing). Those methods are supposed to be defined in subclasses of Job. When a subclass of an abstract class defines all required abstract methods, it is called a concrete class. You should never create an instance of an abstract class, because when you try to use it, empty abstract methods are called and your script crashes.
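The difference between an abstract and a concrete class can be illustrated with a small, self-contained sketch. It uses Python's standard abc module and hypothetical class names (AbstractTask, ConcreteTask); it is not how PLAMS defines Job. With abc the failure happens already at instantiation, whereas the empty abstract methods described above fail only when they are used.

```python
from abc import ABC, abstractmethod


class AbstractTask(ABC):
    """Hypothetical abstract class: declares execute() without implementing it."""

    @abstractmethod
    def execute(self):
        ...


class ConcreteTask(AbstractTask):
    """Concrete subclass: defines every required abstract method."""

    def execute(self):
        return 'done'


def can_instantiate(cls):
    """Return True if cls() succeeds, False if Python refuses with TypeError."""
    try:
        cls()
        return True
    except TypeError:
        return False


print(can_instantiate(AbstractTask))  # False
print(can_instantiate(ConcreteTask))  # True
```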

Every job has its own unique name and a separate folder (called job folder, with the same name as the job) located in the main working folder. All files regarding that particular job (input, output, runscript, other files produced by job execution) end up in the job folder.

In general a job can be of one of two types: a single job or a multijob. These types are defined as subclasses of the Job class: SingleJob and MultiJob.

A single job represents a single calculation, usually done by executing an external binary (ADF, Dirac etc.). It creates a runscript that is then either executed locally or submitted to an external queueing system. Running a single job produces a handful of files, including dumps of the standard output and standard error streams together with any other files produced by the external binary. SingleJob is still an abstract class that is further subclassed by program-specific concrete classes such as ADFJob.

A multijob, on the other hand, does not run any calculation by itself. It is a container for other jobs, used to aggregate smaller jobs into bigger ones. A multijob produces no runscript. Instead, it contains a list of subjobs called children that are run together when the parent job is executed. Child jobs can in turn be either single jobs or multijobs. The job folder of each child job is a subfolder of its parent’s job folder, so the folder hierarchy fully corresponds to the child/parent hierarchy of jobs. MultiJob is a concrete class, so you can create its instances and run them.

3.2.1. Preparing a job

The first step to run a job using PLAMS is to create a job object. You need to pick a concrete class that defines a type of job you want to run (ADFJob will be used as an example in our case) and create its instance:

>>> myjob = ADFJob(name='myfirstjob')

Various keyword arguments (arguments of the form arg=value, like name in the example above) can be passed to a job constructor, depending on the type of your job. However, the following keyword arguments are common for all types of jobs:

  • name – a string containing the name of the job. If not supplied, the default name plamsjob is used. A job’s name cannot contain the path separator (/ on Linux, \ on Windows).
  • settings – a Settings instance to be used by this job. It gets copied (using copy()) so you can pass the same instance to several different jobs and changes made afterwards won’t interfere. Any instance of Job can be also passed as a value of this argument. In that case Settings associated with the passed job are copied.
  • depend – a list of jobs that need to be finished before this job can start. This is useful when you want to execute your jobs in parallel. Usually there is no need to use this argument, since dependencies between jobs are resolved automatically (see Synchronization of parallel job executions). However, sometimes one needs to explicitly state such a dependency and this option is then helpful.

Those values do not need to be passed to the constructor, they can be set or changed later (but they should be fixed before the job starts to run):

>>> myjob = ADFJob()
>>> myjob.name = 'myfirstjob'
>>> myjob.settings.runscript.pre = 'echo HelloWorld'

Single jobs can be supplied with another keyword argument, molecule. It is supposed to be a Molecule object. Multijobs, in turn, accept keyword argument children that stores the list of children jobs.

The most meaningful part of each job object is its Settings instance. It is used to store information about contents of job’s input file, runscript as well as other tweaks of job’s behavior. Thanks to tree-like structure of Settings, this information is organized in a convenient way: the top level (myjob.settings.) stores general settings, myjob.settings.input. is a branch for specifying input settings, myjob.settings.runscript. holds information for runscript creation and so on. Some types of jobs will make use of their own myjob.settings branches and not every kind of job will always require input or runscript branches (like multijob for example). The nice thing is that all the unnecessary data present in job’s settings is simply ignored, so accidentally plugging settings with too much data will not cause any problem (except some cases where the whole content of some branch is used, like for example the input branch in SCMJob).

3.2.1.1. Contents of job’s settings

The following keys and branches of job’s settings are meaningful for all kinds of jobs:

  • myjob.settings.input. is a branch storing settings regarding input file of a job. The way data present in this branch is used depends on the type of job and is specified in respective subclasses of Job.
  • myjob.settings.runscript. holds runscript information, either program-specific or general:
    • myjob.settings.runscript.shebang – the first line of the runscript, starting with #!, describing interpreter to use
    • myjob.settings.runscript.pre – an arbitrary string that will be placed in the runscript file just below the shebang line, before the actual contents
    • myjob.settings.runscript.post – an arbitrary string to put at the end of the runscript.
    • myjob.settings.runscript.stdout_redirect – boolean flag defining if standard output redirection should be handled inside the runscript. If set to False, the redirection will be done by Python outside the runscript. If set to True, standard output will be redirected inside the runscript using >.
  • myjob.settings.run. branch stores run flags for the job (see below)
  • myjob.settings.pickle is a boolean defining if job object should be pickled after finishing
  • myjob.settings.keep and myjob.settings.save are keys adjusting Cleaning job folder.
  • myjob.settings.link_files decides if files from job folder can be linked rather than copied when copying is requested

3.2.1.2. Default settings

Every job instance has an attribute called default_settings that stores a list of Settings instances that serve as default templates for that job. Initially this list contains only one element, global defaults for all jobs stored in config.job. You can add other templates just like adding elements to a list:

>>> myjob.default_settings.append(sometemplate)
>>> myjob.default_settings += [temp1, temp2]

During job execution (just after prerun() is finished) job’s own settings are soft-updated with all elements of the default_settings list, one by one, starting with the last. That way, if you want to adjust some setting for all jobs run in your script, you don’t need to set it in each job separately; one change in config.job is enough. Similarly, if you have a group of jobs that need the same settings adjustments, you can create an empty Settings instance, put those adjustments in it and add it to each job’s default_settings. Keep in mind that soft_update() is used, so any key in a template in default_settings will end up in job’s settings only if such a key is not yet present there. As a result, the order of templates in default_settings defines their priority: data from a preceding template will never override data from a following one, it can only enrich it.
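The effect of the template order can be sketched with plain nested dictionaries. The soft_update() and apply_defaults() functions below are simplified stand-ins written for this illustration, not the PLAMS Settings implementation, and the template contents are made up.

```python
def soft_update(target, template):
    """Copy keys from template into target only where target lacks them (recursively)."""
    for key, value in template.items():
        if isinstance(value, dict):
            if key not in target:
                target[key] = {}
            if isinstance(target[key], dict):
                soft_update(target[key], value)
        elif key not in target:
            target[key] = value
    return target


def apply_defaults(settings, default_settings):
    """Soft-update settings with every template, one by one, starting with the last."""
    for template in reversed(default_settings):
        soft_update(settings, template)
    return settings


# Illustrative data only
global_defaults = {'pickle': True, 'runscript': {'shebang': '#!/bin/sh'}}
group_template = {'pickle': False, 'runscript': {'pre': 'echo start'}}
job_settings = {'runscript': {'pre': 'echo HelloWorld'}}

result = apply_defaults(job_settings, [global_defaults, group_template])
print(result['pickle'])     # False: the later template wins over the earlier one
print(result['runscript'])  # {'pre': 'echo HelloWorld', 'shebang': '#!/bin/sh'}
```

The job's own value of runscript.pre survives, the later template supplies pickle, and the earlier template only fills in the key that was still missing.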

3.2.2. Running a job

After creating a job instance and adjusting its settings you can finally run it. It is done by invoking job’s run() method, which returns a Results instance:

>>> myresults = myjob.run()

Again, various keyword arguments can be passed here. With jobrunner and jobmanager you can specify which JobRunner and JobManager to use for your job. If those arguments are omitted, the default instances stored in config.default_jobrunner and config.jm are taken. All other keyword arguments passed here are collected and stored in myjob.settings.run branch as one flat level. They can be used later by various objects involved in running your job, for example GridRunner uses them to build command executed to submit runscript to the queueing system.

The following steps are taken after the run() method is called:

  1. myjob.settings.run is soft-updated with run() keyword arguments.
  2. If a parallel JobRunner was used, a new thread is spawned and all further steps of this list happen in this thread.
  3. Explicit dependencies from myjob.depend are resolved. This means waiting for all jobs listed there to finish.
  4. Job’s name gets registered in the job manager and the job folder is created.
  5. Job’s prerun() method is called.
  6. myjob.settings are updated according to contents of myjob.default_settings.
  7. The hash of a job is calculated and checked (see Rerun prevention). If the same job was found as previously run, its results are copied (or linked) to the current job’s folder and run() finishes.
  8. Now the real job execution happens. If your job is a single job, an input file and a runscript are produced and passed to job runner’s method call(). In case of multijob, run() method is called for all children jobs.
  9. After the execution is finished, result files produced by the job are collected and check() is used to test if the execution was successful.
  10. The job folder is cleaned using myjob.settings.keep. See Cleaning job folder for details.
  11. Job’s postrun() method is called.
  12. If myjob.settings.pickle is true, the whole job instance gets pickled and saved to the [jobname].dill file in the job folder.
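The order of steps 4 to 12 can be summarised with a toy class that only records what would happen at each point. ToyJob is a hypothetical stand-in written for this illustration; it performs no real work and does not reproduce PLAMS internals such as threading, dependency resolution or the job manager.

```python
class ToyJob:
    """Hypothetical stand-in that only records the order of the steps."""

    def __init__(self, name='plamsjob'):
        self.name = name
        self.log = []

    def prerun(self):
        self.log.append('prerun')

    def postrun(self):
        self.log.append('postrun')

    def check(self):
        return True

    def run(self):
        self.log.append('register name and create folder')  # step 4
        self.prerun()                                        # step 5
        self.log.append('apply default_settings')            # step 6
        self.log.append('check hash')                        # step 7
        self.log.append('execute')                           # step 8
        self.log.append('collect results')                   # step 9
        self.log.append('check: {}'.format(self.check()))    # step 9
        self.log.append('clean folder')                      # step 10
        self.postrun()                                       # step 11
        self.log.append('pickle')                            # step 12
        return self.log


steps = ToyJob().run()
print(steps[1], steps[-2])  # prerun postrun
```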

3.2.2.1. Name conflicts

Jobs are identified by their names and hence those names need to be unique. This is also required because a job’s name corresponds to the name of its folder. Setting unique names manually is usually recommended, as it makes navigating through results easier. For some applications, however, especially those that run large numbers of similar jobs, this is neither convenient nor necessary.

PLAMS automatically resolves conflicts between jobs’ names. During step 4. of the above list, if a job with the same name was already registered, the new job is renamed. The new name is created by appending some number to the old one. For example, the second job with the name plamsjob will be renamed to plamsjob.002, third to plamsjob.003 and so on. Number of digits used in this counter can be adjusted in config.jobmanager.counter_len and the default value is 3. Overflowing the counter will not cause any problems, the job coming after plamsjob.999 will be called plamsjob.1000.
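The renaming rule can be sketched as follows. NameRegistry is a hypothetical helper written for this illustration; only the naming pattern (plamsjob, plamsjob.002, ..., plamsjob.1000) and the default counter length of 3 are taken from the text above.

```python
class NameRegistry:
    """Hypothetical stand-in for the job manager's name bookkeeping."""

    def __init__(self, counter_len=3):
        self.counter_len = counter_len
        self.seen = {}

    def register(self, name):
        """Return name unchanged the first time, name.NNN for later duplicates."""
        count = self.seen.get(name, 0) + 1
        self.seen[name] = count
        if count == 1:
            return name
        # Zero-pad to counter_len digits; longer numbers are simply not padded
        return '{}.{}'.format(name, str(count).zfill(self.counter_len))


registry = NameRegistry()
names = [registry.register('plamsjob') for _ in range(1000)]
print(names[0], names[1], names[2])  # plamsjob plamsjob.002 plamsjob.003
print(names[998], names[999])        # plamsjob.999 plamsjob.1000
```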

3.2.2.2. Prerun and postrun methods

prerun() and postrun() methods are intended for further customization of your jobs. They can contain arbitrary pieces of code that are executed before and after the actual execution of your job. prerun() takes place after job’s folder is created but before hash checking. Here are some ideas for what can be put there:

  • adjusting job settings
  • copying files required for running to the job folder
  • extracting results of some other job, processing them and plugging them into the job
  • generating children jobs in multijobs

See also Synchronization of parallel job executions for explanation how to use prerun() to automatically handle dependencies in parallel workflows.

The other method, postrun(), is called after job execution is finished, the results are collected and the job folder is cleaned. It is supposed to contain any kind of essential results postprocessing that needs to be done before results of this job can be pushed further in the workflow. For that purpose code contained in postrun() has some special privileges. At the time the method is executed the job is not yet considered done, so all threads requesting its results are waiting. However, the guardian restricting the access to results of unfinished jobs can recognize code coming from postrun() and allow it to access and modify results. So calling Results methods can be safely done there and you can be sure that everything you put in postrun() is done before other jobs have access to this job’s results.

prerun() and postrun() methods can be added to your jobs in multiple ways:

  • you can create a tiny subclass which redefines the method:

    >>> class MyJobWithPrerun(MyJob):
    >>>     def prerun(self):
    >>>         pass  # do stuff

    It can be done right inside your script. After the above definition you can create instances of the new class and treat them in exactly the same way you would treat MyJob instances. The only difference is that they will be equipped with the prerun() method you just defined.

  • you can bind the method to an existing class using add_to_class() decorator:

    >>> @add_to_class(MyJob)
    >>> def prerun(self):
    >>>     pass  # do stuff
    

    That change affects all instances of MyJob, even those created before the above code was executed (obviously it won’t affect instances previously run and finished).

  • you can bind the method directly to an instance using add_to_instance() decorator:

    >>> j = MyJob(...)
    >>> @add_to_instance(j)
    >>> def prerun(self):
    >>>     pass  # do stuff
    

    Only one specified instance (j) is affected this way.

All the above works for postrun() as well.
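To make the difference between the class-level and instance-level binding concrete, here is a self-contained sketch. The two decorators are re-implemented locally in a few lines purely for illustration (the PLAMS versions may be implemented differently), and ToyJob is a hypothetical stand-in for a real job class.

```python
import types


def add_to_class(cls):
    """Sketch: attach the decorated function to cls as a method."""
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator


def add_to_instance(obj):
    """Sketch: bind the decorated function to one instance only."""
    def decorator(func):
        setattr(obj, func.__name__, types.MethodType(func, obj))
        return func
    return decorator


class ToyJob:
    def prerun(self):
        return 'default'


a = ToyJob()  # created before the class is modified
b = ToyJob()


@add_to_class(ToyJob)
def prerun(self):
    return 'class-level'


@add_to_instance(b)
def prerun(self):
    return 'instance-level'


print(a.prerun())  # class-level: existing instances see the new class method
print(b.prerun())  # instance-level: only b is affected
```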

3.2.2.3. Preview mode

Preview mode is a special way of running jobs without the actual runscript execution. In this mode the procedure of running a job is interrupted just after input and runscript files are written to job folder. Preview mode can be used to check if your jobs generate proper input and runscript files, without having to run the full calculation.

You can enable preview mode by putting the following line at the beginning of your script:

>>> config.preview = True

3.2.3. Job API

class Job(name='plamsjob', settings=None, depend=None)

General abstract class for all kinds of computational tasks.

Methods common for all kinds of jobs are gathered here. Instances of Job should never be created. It should not be subclassed either. If you wish to define a new type of job please subclass either SingleJob or MultiJob.

Methods that are meant to be explicitly called by the user are run() and occasionally pickle(). In most cases Pickling is done automatically, but if for some reason you wish to do it manually, you can use pickle() method.

Methods that can be safely overridden in subclasses are check(), hash(), prerun() and postrun().

Other methods should remain unchanged.

Class attribute _result_type defines the type of results associated with this job. It should point to a class and it must be a Results subclass.

Every job instance has the following attributes. Values of these attributes are adjusted automatically and should not be set by the user:

  • status – current status of the job in human-readable format.
  • results – reference to a results instance. An empty instance of the type stored in _result_type is created when the job constructor is called.
  • path – an absolute path to the job folder.
  • jobmanager – a job manager associated with this job.
  • parent – a pointer to the parent job if this job is a child job of some MultiJob. None otherwise.

These attributes can be modified, but only before run() is called:

  • name – the name of the job.
  • settings – settings of the job.
  • default_settings – see Default settings.
  • depend – a list of explicit dependencies.
  • _dont_pickle – additional list of this instance’s attributes that will be removed before pickling. See Pickling for details.

__getstate__()

Prepare an instance for pickling.

Attributes jobmanager, parent, default_settings and _lock are removed, as well as all attributes listed in self._dont_pickle.
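The mechanism can be sketched with the standard pickle module. ToyJob is a hypothetical stand-in whose attribute names follow the description above; the scratch attribute is invented for the example. Python calls __getstate__() automatically when an object is pickled; here it is called directly to show the resulting state.

```python
import pickle
import threading


class ToyJob:
    """Hypothetical stand-in; not the PLAMS Job class."""

    def __init__(self, name):
        self.name = name
        self.jobmanager = object()
        self.parent = None
        self.default_settings = []
        self._lock = threading.Lock()   # lock objects cannot be pickled
        self.scratch = lambda: None     # made-up attribute that pickle cannot handle
        self._dont_pickle = ['scratch']

    def __getstate__(self):
        """Return the instance dictionary without the attributes to be dropped."""
        remove = ['jobmanager', 'parent', 'default_settings', '_lock'] + self._dont_pickle
        return {k: v for k, v in self.__dict__.items() if k not in remove}


state = ToyJob('myfirstjob').__getstate__()
payload = pickle.dumps(state)            # succeeds: the unpicklable attributes are gone
print(sorted(pickle.loads(payload)))     # ['_dont_pickle', 'name']
```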

run(jobrunner=None, jobmanager=None, **kwargs)

Run the job using jobmanager and jobrunner (or defaults, if None). Other keyword arguments (**kwargs) are stored in run branch of job’s settings. Returned value is the Results instance associated with this job.

Note

This method should not be overridden.

Technical

This method does not do too much by itself. After simple initial preparation it passes control to job runner, which decides if a new thread should be started for this job. The role of the job runner is to execute three methods that make the full job life cycle: _prepare(), _execute() and _finalize(). During _execute() the job runner is called once again to execute the runscript (only in case of SingleJob).

pickle(filename=None)

Pickle this instance and save to a file indicated by filename. If None, save to [jobname].dill in the job folder.

check()

Check if the calculation was successful.

This method can be overridden in concrete subclasses for different types of jobs. It should return a boolean value.

The definition here serves as a default, to prevent crashing if a subclass does not define its own check(). It always returns True.

hash()

Calculate the hash of this instance. Abstract method.

prerun()

Actions to take before the actual job execution.

This method is initially empty, it can be defined in subclasses or directly added to either whole class or a single instance using Binding decorators.

postrun()

Actions to take just after the actual job execution.

This method is initially empty, it can be defined in subclasses or directly added to either whole class or a single instance using Binding decorators.

_prepare(jobmanager)

Prepare the job for execution. This method collects steps 1-7 from Running a job. Should not be overridden. Returned value indicates if job execution should continue (Rerun prevention did not find this job previously run).

_get_ready()

Get ready for _execute(). This is the last step before _execute() is called. Abstract method.

_execute(jobrunner)

Execute the job. Abstract method.

_finalize()

Gather the results of job execution and organize them. This method collects steps 9-12 from Running a job. Should not be overridden.

3.2.4. Single jobs

class SingleJob(molecule=None, name='plamsjob', settings=None, depend=None)

Abstract class representing a job consisting of a single execution of some external binary (or arbitrary shell script in general).

In addition to constructor arguments and attributes defined by Job, the constructor of this class accepts the keyword argument molecule that should be a Molecule instance. The constructor creates a copy of the supplied Molecule and stores it as the molecule attribute.

Class attribute _filenames defines default names for input, output, runscript and error files. If you wish to override this attribute it should be a dictionary with string keys 'inp', 'out', 'run', 'err'. The value for each key should be a string describing corresponding file’s name. Shortcut $JN can be used for job’s name. The default value is defined in the following way:

>>> _filenames = {'inp':'$JN.in', 'run':'$JN.run', 'out':'$JN.out', 'err': '$JN.err'}

This class defines no new methods that could be directly called in your script. Methods that can and should be overridden are get_input() and get_runscript().

_filename(t)

Return filename for file of type t. t can be any key from _filenames dictionary. $JN is replaced with job name in returned string.
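A minimal sketch of the $JN substitution, assuming it is a plain string replacement. The standalone function filename() is written for this illustration and takes the job name explicitly instead of reading it from a job instance.

```python
# Default value quoted from the text above
_filenames = {'inp': '$JN.in', 'run': '$JN.run', 'out': '$JN.out', 'err': '$JN.err'}


def filename(job_name, t):
    """Return the file name of type t with $JN replaced by the job name."""
    return _filenames[t].replace('$JN', job_name)


print(filename('myfirstjob', 'inp'))  # myfirstjob.in
print(filename('myfirstjob', 'err'))  # myfirstjob.err
```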

get_input()

Generate the input file. Abstract method.

This method should return a single string with full content of the input file. It should process information stored in input branch of job’s settings and in molecule attribute.

get_runscript()

Generate runscript. Abstract method.

This method should return a single string with runscript contents. It can process information stored in runscript branch of job’s settings. In general the full runscript has the following form:

[first line defined by job.settings.runscript.shebang]

[contents of job.settings.runscript.pre, if any]

[value returned by get_runscript()]

[contents of job.settings.runscript.post, if any]

When overridden, this method should pay attention to .runscript.stdout_redirect key in job’s settings.
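The layout of the full runscript can be sketched as a simple string assembly. The function full_runscript() and the command mybinary are hypothetical names used only for this illustration; the exact whitespace PLAMS puts between the parts is not specified here, so single newlines are an assumption.

```python
def full_runscript(shebang, body, pre='', post=''):
    """Assemble shebang, optional pre, body and optional post into one script."""
    parts = [shebang]
    if pre:
        parts.append(pre)
    parts.append(body)   # stands in for the value returned by get_runscript()
    if post:
        parts.append(post)
    return '\n'.join(parts) + '\n'


script = full_runscript('#!/bin/sh', 'mybinary <job.in >job.out', pre='echo HelloWorld')
print(script, end='')
# #!/bin/sh
# echo HelloWorld
# mybinary <job.in >job.out
```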

hash_input()

Calculate SHA256 hash of the input file.

hash_runscript()

Calculate SHA256 hash of the runscript.

hash()

Calculate unique hash of this instance.

The behavior of this method is adjusted by the value of hashing key in JobManager settings. If no JobManager is yet associated with this job, default setting from config.jobmanager.hashing is used.

Methods hash_input() and hash_runscript() are used to obtain hashes of, respectively, input and runscript.
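How the hashing modes listed below could combine the two partial hashes is sketched here with the standard hashlib module. The function job_hash(), the UTF-8 encoding and the use of hexadecimal digests for the concatenation are assumptions made for this illustration; the actual PLAMS implementation may differ in those details.

```python
import hashlib


def sha256(text):
    """Return the hexadecimal SHA256 digest of a string (UTF-8 encoding assumed)."""
    return hashlib.sha256(text.encode('utf-8')).hexdigest()


def job_hash(input_text, runscript_text, mode='input'):
    """Return a hash according to mode, or None if hashing is disabled."""
    if not mode:  # False or None disables hashing
        return None
    if mode == 'input':
        return sha256(input_text)
    if mode == 'runscript':
        return sha256(runscript_text)
    if mode == 'input+runscript':
        # Hash of the concatenation of the two partial hashes
        return sha256(sha256(input_text) + sha256(runscript_text))
    raise ValueError('unsupported hashing mode: {!r}'.format(mode))


same = job_hash('input A', 'script', 'input') == job_hash('input A', 'other script', 'input')
print(same)  # True: in 'input' mode the runscript does not influence the hash
```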

Currently supported values for hashing are:

  • False or None – returns None and disables Rerun prevention.
  • input – returns hash of the input file.
  • runscript – returns hash of the runscript.
  • input+runscript – returns SHA256 hash of the concatenation of hashes of input and runscript.

_full_runscript()

Generate full runscript, including shebang line and contents of pre and post, if any.

Technical

In practice this method is just a wrapper around get_runscript().

_get_ready()

Generate input and runscript files in the job folder. Methods get_input() and get_runscript() are used for that purpose.

_execute(jobrunner)

Execute previously created runscript using jobrunner.

The method call() of jobrunner is used. Working directory is self.path. self.settings.run is passed as runflags argument.

If preview mode is on, this method does nothing.

3.2.4.1. Subclassing SingleJob

SingleJob class was designed in a way that makes subclassing it quick and easy. Thanks to that it takes very little effort to create PLAMS interface for a new external binary.

Your new class has to, of course, be a subclass of SingleJob and define methods get_input() and get_runscript():

>>> class MyJob(SingleJob):
>>>     def get_input(self):
>>>         ...
>>>         return 'string with input file'
>>>     def get_runscript(self):
>>>         ...
>>>         return 'string with runscript'

Note

get_runscript() method should properly handle output redirection based on the value of myjob.settings.runscript.stdout_redirect. When False, no redirection should occur inside runscript. If True, runscript should be constructed in such a way that all standard output is redirected (using >) to the proper file (its name is “visible” as self._filename('out') from inside get_runscript() body).
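The redirection logic described in this note could look roughly like the sketch below. build_runscript() is a standalone hypothetical helper and mybinary is a made-up command; inside a real get_runscript() the flag would come from self.settings.runscript.stdout_redirect and the output name from self._filename('out').

```python
def build_runscript(command, input_name, output_name, stdout_redirect):
    """Return the runscript body, redirecting stdout only when requested."""
    line = '{} <{}'.format(command, input_name)
    if stdout_redirect:
        # Redirection handled inside the runscript using >
        line += ' >{}'.format(output_name)
    return line + '\n'


print(build_runscript('mybinary', 'myjob.in', 'myjob.out', True), end='')
# mybinary <myjob.in >myjob.out
print(build_runscript('mybinary', 'myjob.in', 'myjob.out', False), end='')
# mybinary <myjob.in
```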

This is sufficient for your new job to work properly with other PLAMS components. However, there are other useful attributes and methods that can be overridden:

  • check() – the default version of this method defined in Job always returns True and hence effectively disables correctness checking. If you wish to enable checking for your new class, you need to define check() method in it, just like get_input() and get_runscript() in the example above. It should take no other arguments than self and return a boolean value indicating if job execution was successful. This method is privileged to have an early access to Results methods in exactly the same way as postrun().

  • if you wish to create a special Results subclass for results of your new job, make sure your job class knows about it by setting _result_type:

    >>> class MyResults(Results):
    >>>     def some_method(self, ...):
    >>>         ...
    >>>
    >>> class MyJob(SingleJob):
    >>>     _result_type = MyResults
    >>>     def get_input(self):
    >>>         ...
    >>>         return 'string with input file'
    >>>     def get_runscript(self):
    >>>         ...
    >>>         return 'string with runscript'
    
  • hash_input() and hash_runscript() – see Rerun prevention for details

  • if your new job requires some special preparations regarding input or runscript files these preparations can be done for example in prerun(). However, if you wish to leave prerun() clean for further subclassing or adjusting in instance-based fashion, you can use another method called _get_ready(). This method is responsible for input and runscript creation, so if you decide to override it you must call its parent version in your version:

    >>> def _get_ready(self):
    >>>     # do some stuff
    >>>     SingleJob._get_ready(self)
    >>>     # do some other stuff
    

Warning

Whenever you are subclassing any kind of job, either single or multi, and you wish to override its constructor (__init__ method) it is absolutely essential to call the parent constructor and pass all unused keyword arguments to it:

>>> class MyJob(SomeOtherJob):
>>>     def __init__(self, myarg1, myarg2=default2, **kwargs):
>>>         SomeOtherJob.__init__(self, **kwargs)
>>>         # do stuff with myarg1 and myarg2

Technical

Usually when you need to call some method from a parent class it is a good idea to use super(). However, there exists a known bug in Python 3 that causes the dill package to crash when super() is used. For that reason, if you’re using Python 3, please do not use super().

3.2.5. Multijobs

class MultiJob(children=None, name='plamsjob', settings=None, depend=None)

Concrete class representing a job that is a container for other jobs.

In addition to constructor arguments and attributes defined by Job, the constructor of this class accepts two keyword arguments:

  • children – should be a list (or other iterable container) containing children jobs.
  • childrunner – by default all the children jobs are run using the same JobRunner as the parent job. If you wish to use a different JobRunner for children, you can pass it using this argument.

Values passed as children and childrunner are stored as instance attributes and can be adjusted later, but before the run() method is called.

This class defines no new methods that could be directly called in your script.

When executed, a multijob runs all its children using the same run() arguments. If you need to specify different run flags for children you can do it by manually setting them in children job Settings:

>>> childjob.settings.run.arg = 'value'

Since run branch of settings gets soft-updated by run flags, value set this way is not overwritten by parent job.

Job folder of a multijob gets cleaned independently of its children. See Cleaning job folder for details.

new_children()

Generate new children jobs.

This method is useful when some of children jobs are not known beforehand and need to be generated based on other children jobs, like for example in any kind of self-consistent procedure.

The goal of this method is to produce a new portion of children jobs. Newly created jobs should be returned in a container compatible with self.children (e.g. list for list, dict for dict). No adjustment of newly created jobs’ parent attribute is needed. This method cannot modify _active_children attribute.

The method defined here is a default template, returning None, which means no new children jobs are generated and the entire execution of the parent job consists only of running jobs initially found in self.children. To modify this behavior you can override this method in MultiJob subclass or use one of Binding decorators, just like with Prerun and postrun methods.

hash()

Hashing for multijobs is disabled by default. Return None.

check()

Check if the calculation was successful. Returns True if every child job has its status attribute set to 'successful'.

other_jobs()

Iterate through other jobs that belong to this MultiJob, but are not in children.

Sometimes prerun() or postrun() methods create and run some small jobs that don’t end up in the children collection, but are still considered a part of a MultiJob instance (their parent attribute points to the MultiJob and their working folder is inside MultiJob’s working folder). This method provides an iterator that goes through all such jobs.

Each attribute of self that is an instance of Job and has its parent pointing to self is returned, in arbitrary order.

_get_ready()

Get ready for _execute(). Count children jobs and set their parent attribute.

__iter__()

Iterate through children. If it is a dictionary, iterate through its values.

_notify()

Notify this job that one of its children has finished.

Decrement _active_children by one. Use _lock to ensure thread safety.

_execute(jobrunner)

Run all children from children. Then use new_children() and run all jobs produced by it. Repeat this procedure until new_children() returns nothing (None or an empty container). Wait for all started jobs to finish.
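The interplay between _execute() and new_children() can be sketched with a toy class. ToyMultiJob is a hypothetical stand-in in which children are plain strings and "running" a child only records its name; threading, job runners and waiting for completion are left out.

```python
class ToyMultiJob:
    """Hypothetical stand-in: runs children, then keeps asking for new ones."""

    def __init__(self, children):
        self.children = list(children)
        self.ran = []
        self._rounds = 0

    def new_children(self):
        """Produce one extra batch based on earlier jobs, then return None."""
        self._rounds += 1
        if self._rounds == 1:
            return ['refine-{}'.format(c) for c in self.ran]
        return None

    def execute(self):
        for child in self.children:
            self.ran.append(child)       # stands in for child.run(...)
        new = self.new_children()
        while new:                       # None or an empty container ends the loop
            for child in new:
                self.ran.append(child)
            self.children.extend(new)
            new = self.new_children()
        return self.ran


mj = ToyMultiJob(['job1', 'job2'])
print(mj.execute())  # ['job1', 'job2', 'refine-job1', 'refine-job2']
```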

3.2.5.1. Using MultiJob

Since MultiJob is a concrete class, it can be used in two ways: either by creating instances of it or subclassing it. The simplest application is just to use an instance of MultiJob as a container grouping similar jobs that you wish to run at the same time using the same job runner:

>>> mj = MultiJob(name='somejobs', children=[job1, job2, job3])
>>> mj.children.append(job4)
>>> mj.run(...)

You can of course use it together with Prerun and postrun methods to further customize the behavior of mj.

A more flexible way of using multijobs is subclassing. You can subclass directly from MultiJob or from any of its subclasses. Defining your own multijob is the best solution when you need to run many similar jobs and later compare their results. In that case prerun() method can be used for populating children and postrun() for extracting results and merging them.